WO2022142006A1

WO2022142006A1 - Semantic recognition-based verbal skill recommendation method and apparatus, device, and storage medium

Info

Publication number: WO2022142006A1
Application number: PCT/CN2021/090170
Authority: WO
Inventors: 南海顺
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-12-30
Filing date: 2021-04-27
Publication date: 2022-07-07
Also published as: CN112732911A; CN112732911B

Abstract

A semantic recognition-based verbal skill recommendation method and apparatus, a device, and a storage medium, relating to a natural language processing technology in the technical field of artificial intelligence. The method comprises: performing semantic recognition on training language materials; classifying the training language material to obtain positive samples and negative samples; randomly combining the positive samples and the negative samples to obtain a training sample set; training a preset initial intention recognition model by means of the training sample set to obtain a call intention model; importing the call content of a current call into the call intention model, and outputting a call intention; and finally importing the call intention into a pre-trained verbal skill recommendation model to obtain a target verbal skill matching the call intention. The present application further relates to blockchain technology, and the current call content can be stored in a blockchain. The customer intention is identified to obtain a valid tag, and the reply content corresponding to the valid tag is recommended, so that the user experience is improved.

Description

Method, apparatus, device and storage medium for speech recommendation based on semantic recognition

This application claims priority to the Chinese patent application filed on December 30, 2020, with application number 202011607652.3 and entitled "Method, Apparatus, Equipment and Storage Medium for Speech Recommendation Based on Semantic Recognition", the entire contents of which are incorporated herein by reference.

Technical field

The present application belongs to the technical field of artificial intelligence, and specifically relates to a method, apparatus, device and storage medium for speech recommendation based on semantic recognition.

Background technique

Artificial intelligence (AI) language is a kind of computer programming language with symbolic processing and logical reasoning capabilities, suitable for artificial intelligence and knowledge engineering. It can be used to write programs that solve various complex problems requiring intelligence, such as non-numerical computing, knowledge processing, reasoning, planning, and decision-making. Typical artificial intelligence languages mainly include LISP, Prolog, Smalltalk, and C++.

At present, the most widespread application of artificial intelligence (AI) language is the call robot, and for the call robot, the design of the dialogue flow is the key to the entire conversation. A good dialogue flow lets the call robot extract valid labels from the customer's answers during the dialogue, so that the customer experience is better and closer to that of a human agent. However, while researching the speech recommendation schemes currently used in the industry, the inventor realized that in task-based dialogue, dialogue nodes usually transition according to fixed labels, so the design of the dialogue flow is not flexible enough and the customer experience is poor.

SUMMARY OF THE INVENTION

The purpose of the embodiments of the present application is to propose a speech recommendation method, device, computer equipment and storage medium based on semantic recognition, so as to solve the technical problem that existing speech recommendation schemes transition according to fixed labels, so that the design of the dialogue flow is not flexible enough and the customer experience is poor.

In order to solve the above technical problems, the embodiments of the present application provide a method for recommending speech based on semantic recognition, which adopts the following technical solutions:

A speech recommendation method based on semantic recognition, including:

Obtain the training corpus from the preset historical corpus, perform semantic recognition on the training corpus, and obtain the semantic recognition result of the training corpus, wherein the training corpus is the voice information generated during the communication between the user and the talking robot stored in the historical corpus;

Classify the training corpus based on the semantic recognition results to obtain positive samples and negative samples;

Randomly combine positive samples and negative samples to obtain training sample sets and validation data sets;

The preset initial intent recognition model is trained through the training sample set, and the trained call intent model is verified through the verification data set, and the verified call intent model is obtained;

Receive the intent recognition instruction, and obtain the call content of the current call corresponding to the intent recognition instruction;

Import the call content of the current call into the verified call intent model, and output the call intent matching the current call content;

Import the call intent into the pre-trained speech recommendation model, and get the target speech that matches the call intent.

Further, the steps of obtaining the training corpus from the preset historical corpus, and performing semantic recognition on the training corpus to obtain the semantic recognition result of the training corpus, specifically include:

Obtain the training corpus from the preset historical corpus, and preprocess the training corpus;

Semantic recognition is performed on the preprocessed training corpus based on a preset dictionary library, and the semantic recognition result of the training corpus is obtained.

Further, the steps of randomly combining positive samples and negative samples to obtain a training sample set and a verification data set include:

Label the positive and negative samples respectively;

The labeled positive samples and negative samples are randomly combined to obtain a training sample set and a verification data set, and the training sample set and the verification data set are stored in a preset historical corpus.

Further, the steps of training the preset initial intent recognition model through the training sample set specifically include:

Import the training sample set into the preset initial intent recognition model, perform word segmentation processing on the training corpus in the training sample set, and perform vector feature conversion processing on the training corpus after word segmentation to obtain word vectors;

Perform a convolution operation on the word vector to extract the feature data corresponding to the word vector;

The similarity between the feature data and the preset intent label is calculated, and the initial intent recognition model is iteratively updated based on the similarity calculation result until the model is fitted, and the trained call intent model is output.

Further, the steps of calculating the similarity between the feature data and the preset intent label, and iteratively updating the initial intent recognition model based on the similarity calculation result until the model is fitted, and outputting the trained call intent model, specifically including:

Calculate the similarity between the feature data and the preset intent label, and output the recognition result with the largest similarity as the intent recognition result corresponding to the training corpus;

Based on the intent recognition result and the preset standard result, the back-propagation algorithm is used for fitting to obtain the recognition error;

Compare the recognition error with the preset threshold, and if the recognition error is greater than the preset threshold, iteratively update the call intent model until the recognition error is less than or equal to the preset threshold;

The call intent model with the recognition error less than or equal to the preset threshold is used as the trained call intent model, and the trained call intent model is output.

Further, after the step of outputting the trained call intent model by using the call intent model with the recognition error less than or equal to the preset threshold as the trained call intent model, the method further includes:

Obtain the verification samples in the verification data set, import the verification samples into the trained call intent model, and obtain the model verification results;

Compare the model verification results with the labels of the verification samples, and verify the trained call intent model according to the comparison results.

Further, the steps of importing the call intention into the pre-trained speech recommendation model to obtain the target speech matching the call intention include:

Label the call intent to get the intent label of the current call;

Determine all historical calls associated with the current call, and obtain the intent labels corresponding to all historical calls;

Sort the intent tags of the current call and the intent tags corresponding to all historical calls based on the preset sorting rules to obtain the intent tag sequence;

Import the intent label sequence into the pre-trained speech recommendation model, and output the target speech matching the intent label sequence.

In order to solve the above technical problems, the embodiments of the present application also provide a device for recommending speech based on semantic recognition, which adopts the following technical solutions:

A speech recommendation device based on semantic recognition, comprising:

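The semantic recognition module is used to obtain the training corpus from the preset historical corpus, perform semantic recognition on the training corpus, and obtain the semantic recognition result of the training corpus, wherein the training corpus is the voice information generated during the communication between the user and the talking robot stored in the historical corpus;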

The corpus classification module is used to classify the training corpus based on the semantic recognition results to obtain positive samples and negative samples;

The sample combination module is used to randomly combine positive samples and negative samples to obtain training sample sets and validation data sets;

The model training module is used to train the preset initial intent recognition model through the training sample set, and verify the trained call intent model through the verification data set, and obtain the verified call intent model;

The instruction receiving module is used to receive the intention recognition instruction, and obtain the call content of the current call corresponding to the intention recognition instruction;

The intent recognition module is used to import the call content of the current call into the verified call intent model, and output the call intent matching the current call content;

The speech generation module is used to import the call intention into the pre-trained speech recommendation model, and obtain the target speech that matches the call intention.

In order to solve the above-mentioned technical problems, the embodiment of the present application also provides a computer device, which adopts the following technical solutions:

A computer device includes a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the following speech recommendation method based on semantic recognition is implemented:

In order to solve the above technical problems, the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:

A computer-readable storage medium, where computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the following speech recommendation method based on semantic recognition is implemented:

Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:

The present application discloses a method, device, equipment and storage medium for speech recommendation based on semantic recognition, which belong to the technical field of artificial intelligence. The method performs semantic recognition on the training corpus to obtain its semantic recognition result, uses that result to determine the attributes of the training samples, and classifies the training corpus based on the semantic recognition results to obtain positive samples and negative samples, where positive samples are valid calls and negative samples are invalid calls. The training sample set composed of positive and negative samples is used to train a call intent model, which can identify the call intent. Finally, the call intent is imported into the pre-trained speech recommendation model to obtain the target speech that matches the call intent. The present application enables the call robot to identify the customer's intention to obtain a valid intent label, and recommends the reply content corresponding to the valid intent label, so that the reply content of the call robot can be closer to the performance of human customer service, and the user experience is improved.

Description of drawings

In order to illustrate the solutions in the present application more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.

FIG. 1 shows an exemplary system architecture diagram to which the present application can be applied;

FIG. 2 shows a flow chart of an embodiment of a speech recommendation method based on semantic recognition according to the present application;

FIG. 3 shows a schematic structural diagram of an embodiment of a speech recommendation device based on semantic recognition according to the present application;

FIG. 4 shows a schematic structural diagram of an embodiment of a computer device according to the present application.

Detailed description

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field of this application. The terms used in the specification of the application are for the purpose of describing specific embodiments only and are not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the description and claims of this application and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the description and claims of the present application or the above drawings are used to distinguish different objects, rather than to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor a separate or alternative embodiment that is mutually exclusive of other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.

In order to make those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.

The user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.

The terminal devices 101, 102, and 103 can be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, and desktop computers.

The server 105 may be a server that provides various services, such as a background server that provides support for the pages displayed on the terminal devices 101, 102, and 103.

It should be noted that the speech recommendation method based on semantic recognition provided by the embodiments of the present application is generally executed by a server, and accordingly, the speech recommendation device based on semantic recognition is generally set in the server.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.

Continuing to refer to FIG. 2 , a flowchart of one embodiment of a method for speech recommendation based on semantic recognition according to the present application is shown. The speech recommendation method based on semantic recognition includes the following steps:

S201, acquiring training corpus from a preset historical corpus, and performing semantic recognition on the training corpus to obtain a semantic recognition result of the training corpus, wherein the training corpus is the voice information, stored in the historical corpus, that was generated during the communication between the user and the talking robot.

Specifically, the training corpus is obtained from a preset historical corpus, and semantic recognition is performed on the training corpus through a pre-built dictionary base to obtain a semantic recognition result of the training corpus, wherein the training corpus is the voice information, stored in the historical corpus, that was generated during the communication between the user and the calling robot.

S202: Classify the training corpus based on the semantic recognition result to obtain positive samples and negative samples.

Specifically, the training corpus is classified based on the semantic recognition results to obtain positive samples and negative samples. Among them, in the specific embodiment of the present application, positive samples are valid calls, and negative samples are invalid calls. For example, in a product recommendation scenario, the content of a training corpus is as follows:

"-Talking Robot: What do you think about this product?

-User: I think it's good! "

After performing semantic recognition on the above training corpus, it is found that the user has a certain interest in the products mentioned by the talking robot, and it can be seen from the semantic recognition results that the user is highly satisfied with this call. In this application, a training corpus with high satisfaction is used as a positive sample, that is, a valid call, and the above training corpus is marked with a positive sample label. For another example, the content of another training corpus is as follows:

"-Talking Robot: What do you think about this product?

-User: Sorry, I'm not interested! Stop recommending me. "

After performing semantic recognition on the above training corpus, it is found that the user is not interested in the products mentioned by the talking robot, and it can be seen from the semantic recognition results that the user's satisfaction with this call is very low. In this application, a training corpus with low satisfaction is regarded as a negative sample, that is, an invalid call, and the above training corpus is marked with a negative sample label.

In this application, all training corpora are classified based on the semantic recognition results into positive samples and negative samples, and a training sample set randomly composed of positive and negative samples is used to train a call intent model. The call intent model can identify the call intent corresponding to the input call content. For example, in a product recommendation scenario, whether the user is willing to purchase the recommended product can be identified from the content of the call.
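The classification described above can be sketched as follows. This is only an illustrative sketch: it assumes the semantic recognition result has already been reduced to a satisfaction score between 0 and 1, and the score, the 0.5 cut-off and the function name `classify_corpora` are hypothetical and not specified in the application.

```python
# Hypothetical sketch: split training corpora into positive samples (valid calls)
# and negative samples (invalid calls) from an assumed satisfaction score.

def classify_corpora(corpora, threshold=0.5):
    """corpora: list of (text, satisfaction_score) pairs; threshold is an assumption."""
    positive, negative = [], []
    for text, score in corpora:
        if score >= threshold:
            positive.append((text, 1))  # label 1: positive sample, valid call
        else:
            negative.append((text, 0))  # label 0: negative sample, invalid call
    return positive, negative


corpora = [
    ("I think it's good!", 0.9),
    ("Sorry, I'm not interested! Stop recommending me.", 0.1),
]
positive, negative = classify_corpora(corpora)
```

Here the first user reply ends up in the positive samples and the second in the negative samples, mirroring the two example dialogues above.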

S203, randomly combining the positive samples and the negative samples to obtain a training sample set and a verification data set.

Specifically, positive samples and negative samples can be randomly combined to obtain a corpus sample set, and the corpus sample set can be randomly grouped to obtain a training sample set and a verification data set. The training sample set is used for model training of the initial intent recognition model, and the validation dataset is used to verify the trained call intent model.

S204, train the preset initial intent recognition model by using the training sample set, and verify the trained call intent model by using the verification data set, and obtain the verified call intent model.

The preset initial intent recognition model can be a deep convolutional neural network (CNN) model. A convolutional neural network (Convolutional Neural Network, CNN) is a kind of feedforward neural network (Feedforward Neural Network) that includes convolution calculations and has a deep structure, and it is one of the representative algorithms of deep learning. A convolutional neural network has the ability of representation learning and can perform shift-invariant classification of input information according to its hierarchical structure, so it is also called a "shift-invariant artificial neural network". Convolutional neural networks are constructed by imitating the visual perception mechanism of living organisms and can perform both supervised and unsupervised learning. They can learn grid-like topological features, such as pixels and audio, with a small computational effort, with stable results and no additional feature engineering requirements on the data.

Specifically, after the training sample set and the verification data set are obtained, the training samples in the obtained training sample set are used to train the preset initial intention recognition model to obtain the call intention model. After the training of the call intent model is completed, the trained call intent model is verified through the verification data set, and the verified call intent model is obtained. The call intent model is used to identify the user's intent during the call between the calling robot and the user, for example, in a business handling scenario, to identify the user's willingness to handle business.
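The verification step can be illustrated with a small sketch that compares the model outputs with the labels of the verification samples. The accuracy metric, the 0.9 acceptance level, and the names `verify_model` and `toy_predict` are assumptions made for illustration; the application does not state how the comparison result is evaluated.

```python
# Hypothetical sketch: verify a trained model by comparing its predictions
# with the labels of the verification samples.

def verify_model(predict, validation_set, min_accuracy=0.9):
    """predict: callable mapping text to a label; min_accuracy is an assumed criterion."""
    correct = sum(1 for text, label in validation_set if predict(text) == label)
    accuracy = correct / len(validation_set)
    return accuracy, accuracy >= min_accuracy


def toy_predict(text):
    # Stand-in for the trained call intent model (not a real model).
    return 0 if "not interested" in text else 1


validation_set = [
    ("I think it's good", 1),
    ("I am not interested", 0),
    ("Maybe later", 0),
]
accuracy, passed = verify_model(toy_predict, validation_set)
```

With this toy stand-in, two of the three verification samples match their labels, so the model would not pass the assumed 0.9 criterion and would need further training.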

S205: Receive the intent identification instruction, and acquire the call content of the current call corresponding to the intent identification instruction.

Specifically, when there is an intention recognition requirement, the intention recognition instruction is received, the call recording of the current call corresponding to the intention recognition instruction is acquired in real time, and the audio-to-text processing is performed on the call recording of the current call to obtain the call text of the current call. The call text of the current call is preprocessed to obtain the call content of the current call, wherein the preprocessing includes error correction, deduplication, punctuation removal, and the like.
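A minimal sketch of the text preprocessing mentioned above, covering only punctuation removal and deduplication. Error correction is omitted, lowercasing is an added assumption, and the function name `preprocess_call_text` is hypothetical.

```python
import string

# Hypothetical sketch: clean the call text obtained from audio-to-text conversion.

def preprocess_call_text(sentences):
    """Remove ASCII punctuation, normalize whitespace and case, drop repeated sentences."""
    table = str.maketrans("", "", string.punctuation)
    seen, cleaned = set(), []
    for sentence in sentences:
        text = " ".join(sentence.translate(table).lower().split())
        if text and text not in seen:  # deduplicate while preserving order
            seen.add(text)
            cleaned.append(text)
    return cleaned


call_content = preprocess_call_text(
    ["Hello, hello!", "Hello, hello!", "I want to buy it."]
)
```

For Chinese call text, the punctuation table would need to include full-width punctuation as well; the sketch only handles ASCII punctuation.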

In this embodiment, the electronic device (e.g., the server shown in FIG. 1) on which the semantic recognition-based speech recommendation method runs may receive the intent recognition instruction through a wired connection or a wireless connection. It should be pointed out that the above wireless connection methods may include but are not limited to 3G/4G connection, WiFi connection, Bluetooth connection, WiMAX connection, Zigbee connection, UWB (ultra wideband) connection, and other wireless connection methods currently known or developed in the future.

S206, import the call content of the current call into the verified call intent model, and output a call intent matching the current call content.

Specifically, when the user makes a call to communicate with the call robot, the call robot sends the call content into the call intent model in real time to identify the user's intent, and obtains the call intent recognition result by analyzing the call content.

S207, import the call intention into the pre-trained speech recommendation model, and obtain a target speech matching the call intention.

The speech recommendation model may be a speech generation model that recognizes the intent label sequence and outputs a speech matching the intent label sequence. When recommending speech, the call intents of the user's historical calls and the corresponding replies of the human agents or call robots can be input into the speech recommendation model together as a sequence, so that the generated speech can comprehensively take into account the call intents of the historical calls. In a specific embodiment of the present application, the speech recommendation model may be an RNN model, an LSTM model, or the like.

Specifically, when training the speech recommendation model, the call intents of the customer's multiple calls and the reply speech of the human agent can be labeled, and the labeled call intents of the user's multiple calls can be sorted according to call time to obtain an intent label sequence. The intent label sequence is then mapped to the corresponding reply speech of the human agent, the intent label sequence and the successfully mapped reply speech are combined to form the training samples of the speech recommendation model, and the training samples are input into the initial speech recommendation model to obtain the trained speech recommendation model. When in use, the call intent is imported into the pre-trained speech recommendation model to obtain the target speech that matches the call intent.
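The construction of the intent label sequence can be sketched as follows, assuming that the sorting rule is ascending call time and that each call is represented as a dictionary with `time` and `intent` keys. Both the data layout and the label names are illustrative assumptions.

```python
# Hypothetical sketch: merge the current call with its associated historical
# calls and order the intent labels by call time.

def build_intent_sequence(current_call, historical_calls):
    """Each call is a dict with 'time' (ISO 8601 string) and 'intent' keys (assumed layout)."""
    calls = historical_calls + [current_call]
    # ISO 8601 timestamps of equal length sort chronologically as plain strings.
    calls_sorted = sorted(calls, key=lambda call: call["time"])
    return [call["intent"] for call in calls_sorted]


history = [
    {"time": "2021-03-02T10:00", "intent": "ask_price"},
    {"time": "2021-03-01T09:30", "intent": "greeting"},
]
current = {"time": "2021-03-03T14:15", "intent": "willing_to_buy"}
sequence = build_intent_sequence(current, history)
```

The resulting sequence is what would be fed to the sequence model (for example an RNN or LSTM) that outputs the matching reply speech.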

At present, for the call robot, the design of the dialogue flow is the key to the entire conversation. A good dialogue flow allows the call robot to obtain valid information from the customer's answers in the dialogue and to respond according to that information, so that the customer experience is better. However, in the current industry, for task-based conversations, call robots usually use dialogue nodes that transition according to fixed labels. The design of the dialogue flow is not flexible enough, and the customer experience is poor.

Based on the above technical problems, the present application discloses a speech recommendation method based on semantic recognition, which belongs to the technical field of artificial intelligence. The method performs semantic recognition on the training corpus to obtain its semantic recognition result, uses that result to determine the attributes of the training samples, and classifies the training corpus based on the semantic recognition results to obtain positive samples and negative samples, where positive samples are valid calls and negative samples are invalid calls. The training sample set composed of positive and negative samples is used to train a call intent model, which can identify the call intent. Finally, the call intent is imported into the pre-trained speech recommendation model to obtain the target speech that matches the call intent. The present application enables the call robot to identify the customer's intention to obtain a valid intent label, and recommends the reply content corresponding to the valid intent label, so that the reply content of the call robot can be closer to the performance of human customer service, and the user experience is improved.

Specifically, the training corpus is obtained from a preset historical corpus, and the training corpus is preprocessed, wherein the training corpus is the voice information, stored in the historical corpus, that was generated during the communication between the user and the calling robot. Semantic recognition is performed on the preprocessed training corpus based on a preset dictionary library, and the semantic recognition result of the training corpus is obtained.

In the above embodiment, the present application pre-establishes a dictionary base that contains all the words in the training corpus, with each word corresponding to a unique identification number, and uses one-hot text representation to obtain the semantic recognition results of the training corpus through text mapping.
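A small sketch of the dictionary base and one-hot representation described above. Whitespace tokenization is an assumption made to keep the example short (Chinese text would first need a word segmenter), and the function names are hypothetical.

```python
# Hypothetical sketch: build a dictionary that assigns each word a unique
# identification number, then map a word to its one-hot vector.

def build_dictionary(corpora):
    """Assign ids in order of first appearance; whitespace tokenization is assumed."""
    dictionary = {}
    for text in corpora:
        for word in text.split():
            if word not in dictionary:
                dictionary[word] = len(dictionary)
    return dictionary


def one_hot(word, dictionary):
    """Return a vector with a single 1 at the position of the word's id."""
    vector = [0] * len(dictionary)
    vector[dictionary[word]] = 1
    return vector


dictionary = build_dictionary(["i think it is good", "i am not interested"])
vector = one_hot("good", dictionary)
```

The vector length equals the dictionary size, so one-hot vectors grow with the vocabulary of the training corpus.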

Label the positive and negative samples respectively;

Specifically, the positive samples and negative samples are marked respectively, and the marked positive samples and negative samples are randomly combined to obtain a corpus sample set. For example, the training corpus in the corpus sample set is randomly divided into 10 equal sample subsets, wherein 9 sample subsets are randomly combined as the training sample set, and the remaining sample subsets are used as the validation data set. Import the training sample set into the initial intent recognition model for model training to obtain a trained call intent model, verify the trained call intent model through the verification data set, and output the verified call intent model. In the above embodiment, by constructing a training sample set and a verification data set, and respectively training and verifying the initial recognition model through the training sample set and the verification data set, the user intent recognition model can be quickly obtained.
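The 10-subset split in the example above can be sketched as follows. The fixed random seed and the choice of the last subset as the verification data set are assumptions made so that the example is reproducible.

```python
import random

# Hypothetical sketch: shuffle labeled samples, divide them into 10 subsets,
# use 9 subsets for training and the remaining one for verification.

def split_corpus(samples, n_subsets=10, seed=0):
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # random combination of samples
    subsets = [shuffled[i::n_subsets] for i in range(n_subsets)]
    validation = subsets[-1]               # one held-out subset
    training = [s for subset in subsets[:-1] for s in subset]
    return training, validation


samples = [(f"corpus {i}", i % 2) for i in range(100)]
training_set, validation_set = split_corpus(samples)
```

With 100 labeled samples this yields 90 training samples and 10 verification samples, and no sample appears in both sets.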

Specifically, the preset initial intent recognition model includes an input layer, a convolution layer and an output layer. After the training sample set is imported into the CNN model, the training corpus in the training sample set first undergoes word segmentation and vector feature conversion at the input layer of the CNN to obtain the word vector corresponding to each segmented word in the training corpus. The word vector of each segmented word is then input to the convolution layer of the CNN for feature extraction to obtain the feature data of each segmented word. Finally, the similarity between the feature data and the preset intent labels is calculated in the output layer of the CNN, and the recognition result with the largest similarity is output as the intent recognition result corresponding to the training corpus. The initial intent recognition model is iteratively updated based on the recognition result with the largest similarity until the model is fitted, and the trained call intent model is output.
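The convolution and similarity steps can be illustrated with a toy forward pass in plain Python. Several details are assumptions that the application does not specify: the ReLU activation, max pooling over positions, cosine similarity as the similarity measure, and the kernel values and intent label vectors, which are made up for the example.

```python
import math

# Hypothetical sketch: 1D convolution over word vectors, max pooling to a
# feature vector, then the intent label with the largest similarity wins.

def conv1d(word_vectors, kernel):
    """Slide a kernel (one weight vector per position) over the word vectors."""
    width = len(kernel)
    outputs = []
    for start in range(len(word_vectors) - width + 1):
        window = word_vectors[start:start + width]
        total = sum(w * x for k_row, x_row in zip(kernel, window)
                    for w, x in zip(k_row, x_row))
        outputs.append(max(0.0, total))  # ReLU activation (assumption)
    return outputs


def extract_features(word_vectors, kernels):
    """One feature per kernel, taken as the maximum response (assumed pooling)."""
    return [max(conv1d(word_vectors, kernel)) for kernel in kernels]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize_intent(features, intent_labels):
    """Return the intent label whose vector is most similar to the features."""
    return max(intent_labels, key=lambda name: cosine(features, intent_labels[name]))


word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
intent_labels = {"willing_to_buy": [1.0, 1.0], "not_interested": [1.0, -1.0]}
features = extract_features(word_vectors, kernels)
intent = recognize_intent(features, intent_labels)
```

In a trained model the kernels and label vectors would be learned parameters rather than hand-written values.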

In a specific embodiment of the present application, the recognition result is output through a softmax function to implement intent classification. When building the initial recognition model, the corresponding loss function is set, where the loss function is the cross-entropy loss function. During the training of the call intent model, the call intent model is iteratively updated to obtain the fitted call intent model. The establishment and training of the call intent model can be completed with the TensorFlow library in Python.
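For reference, the softmax output and cross-entropy loss named above can be written out in a few lines of plain Python. This sketch only illustrates the two formulas and does not use TensorFlow; the input scores are made-up values.

```python
import math

# Illustrative sketch of softmax and cross-entropy for a single sample.

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probabilities, true_index):
    """Negative log-probability assigned to the correct intent class."""
    return -math.log(probabilities[true_index])


probs = softmax([2.0, 0.5, -1.0])            # made-up scores for three intents
loss_if_first_is_true = cross_entropy(probs, 0)
loss_if_last_is_true = cross_entropy(probs, 2)
```

The loss is small when the model assigns high probability to the correct intent and large otherwise, which is what drives the iterative updates during training.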

The back-propagation algorithm, that is, the error back-propagation algorithm (BP algorithm), is a learning algorithm suitable for multi-layer neural networks. It is based on the gradient descent method and is used for error calculation in deep learning networks. The input-output relationship of a BP network is essentially a mapping: a BP neural network with n inputs and m outputs implements a continuous mapping from n-dimensional Euclidean space to a finite region of m-dimensional Euclidean space, and this mapping is highly nonlinear. The learning process of the BP algorithm consists of a forward propagation process and a back propagation process. In forward propagation, the input information passes from the input layer through the hidden layers, is processed layer by layer, and is transmitted to the output layer. The process then turns to back propagation, in which the partial derivative of the objective function with respect to the weight of each neuron is obtained layer by layer; these partial derivatives constitute the gradient of the objective function with respect to the weight vector, which serves as the basis for modifying the weights.

Specifically, the training sample set is obtained from a preset database and imported into the initial intent recognition model for model training, and the intent recognition result corresponding to the training corpus is output. Based on the intent recognition result and the preset standard result, the back-propagation algorithm is used to perform a fitting calculation and obtain the recognition error, which is compared with the preset error threshold. If the recognition error is greater than the preset error threshold, the call intent model is iteratively updated based on its loss function until the recognition error is less than or equal to the preset error threshold, at which point the trained call intent model is obtained. The preset standard result and the preset error threshold may be set in advance. In the above embodiment, the call intent model is iterated through the back-propagation algorithm to obtain the fitted call intent model.
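The iterate-until-threshold loop can be sketched as below. To stay self-contained, the sketch replaces the CNN with a single-neuron logistic model, for which the back-propagated gradient reduces to one layer; the toy samples, learning rate, error threshold, and the name `train_until_threshold` are all assumptions made for illustration.

```python
import math


def train_until_threshold(samples, error_threshold=0.1, learning_rate=0.5,
                          max_iterations=10_000):
    """Update the weights by gradient descent until the mean error is small enough."""
    weight, bias = 0.0, 0.0
    for iteration in range(1, max_iterations + 1):
        grad_w = grad_b = error = 0.0
        for x, y in samples:
            # Forward propagation: predicted probability of the positive label.
            p = 1.0 / (1.0 + math.exp(-(weight * x + bias)))
            # Cross-entropy error against the preset standard result y.
            error += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            # Back propagation: partial derivatives of the error w.r.t. weight and bias.
            grad_w += (p - y) * x
            grad_b += p - y
        error /= len(samples)
        if error <= error_threshold:
            return weight, bias, error, iteration
        weight -= learning_rate * grad_w / len(samples)
        bias -= learning_rate * grad_b / len(samples)
    raise RuntimeError("error threshold not reached within max_iterations")


# Hypothetical one-feature samples: (feature value, standard label).
samples = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
weight, bias, error, iteration = train_until_threshold(samples)
print(error <= 0.1)  # True
```

The `max_iterations` guard is an addition of the sketch: the text only states the threshold condition, and a practical loop needs some way to stop if that condition is never met.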

Specifically, after the iteration of the call intent model is completed, a verification sample is obtained from the verification data set in the preset historical corpus and imported into the trained call intent model to obtain the model verification result. The model verification result is compared with the label of the verification sample, and the trained call intent model is verified according to the comparison result. If the model verification result matches the label of the verification sample, the performance of the call intent model meets the requirements. Otherwise, the positive samples and negative samples need to be recombined to form a new training sample set, and the initial intent recognition model is trained with the new training sample set.
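A minimal sketch of this verification step is shown below. The `predict` argument stands in for the trained call intent model; here it is a hypothetical keyword rule so that the example runs on its own. The `required_accuracy` parameter is also an assumption, since the text only says that the verification result should match the label.

```python
def verify_model(predict, verification_set, required_accuracy=1.0):
    """Compare model outputs with the labels of the verification samples."""
    matches = sum(1 for text, label in verification_set if predict(text) == label)
    accuracy = matches / len(verification_set)
    return accuracy >= required_accuracy, accuracy


def keyword_predict(text):
    """Hypothetical stand-in for the trained call intent model."""
    return "unable to repay" if "no money" in text else "accept repayment"


verification_set = [
    ("I have no money recently", "unable to repay"),
    ("I will pay it back tomorrow", "accept repayment"),
]

passed, accuracy = verify_model(keyword_predict, verification_set)
if not passed:
    # Per the description: recombine positive and negative samples and retrain.
    print("verification failed, rebuild the training sample set")
print(passed, accuracy)  # True 1.0
```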

Label the call intent to get the intent label of the current call;

Specifically, the intent label of the current call and the intent labels corresponding to all historical calls are sorted by call time to obtain the intent label sequence. For example, in a loan collection scenario, the intent label sequence of a user's five calls is "accept repayment, unable to repay, accept repayment, accept repayment, unable to repay". The intent label sequence is encoded according to preset coding rules: for example, if "accept repayment" is encoded as "1" and "unable to repay" is encoded as "0", the above intent label sequence is expressed as "10110" after encoding. The encoding result "10110" is imported into the pre-trained speech recommendation model, and the target speech matching the encoding result "10110" of the intent label sequence is output. Because the call intentions of the user's historical calls and the call intention of the current call are combined into an intent label sequence that serves as the sequence input of the speech recommendation model, the call intentions of historical calls are taken into account when recommending speech for the current dialogue.
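The sorting and encoding in this example can be sketched as follows. The call dates are invented for illustration, and `ENCODING` and `encode_intent_sequence` are hypothetical names; only the two labels and the resulting code "10110" come from the text.

```python
from datetime import datetime

# Preset coding rule from the example in the text.
ENCODING = {"accept repayment": "1", "unable to repay": "0"}


def encode_intent_sequence(calls):
    """Sort (call time, intent label) pairs by time and encode the labels."""
    ordered = sorted(calls, key=lambda call: call[0])
    return "".join(ENCODING[label] for _, label in ordered)


# Four historical calls plus the current call, deliberately out of order.
calls = [
    (datetime(2020, 3, 1), "accept repayment"),
    (datetime(2020, 1, 1), "accept repayment"),
    (datetime(2020, 5, 1), "unable to repay"),   # current call
    (datetime(2020, 2, 1), "unable to repay"),
    (datetime(2020, 4, 1), "accept repayment"),
]

print(encode_intent_sequence(calls))  # 10110
```

The resulting string is what would be passed to the speech recommendation model as its sequence input.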

In a specific embodiment of the present application, suppose, for example, that a customer is being urged to repay a loan, that the customer appears uncooperative about repayment in the current call, and that the customer has never behaved this way in historical calls. Different replies from the customer service may then have different effects:

(1) Recommending a reply speech consistent with the call intention of the current call only: if the customer is unwilling to repay, a fixed speech related to unwillingness to repay is recommended, such as:

"- User: I have no money recently and can't pay it back.

- Call robot: You should hurry up and think of a way, otherwise it will affect your credit report."

(2) Recommending a reply speech that matches the intent label sequence: the customer is currently unwilling to repay, but has shown a high willingness to repay in the past and has good credit, so a combined speech matching the intent label sequence is recommended, such as:

"- User: I have no money recently and can't pay it back.

- Call robot: I know about your previous repayments, and your credit has been good.

- Call robot: Have you had some difficulties recently?

- Call robot: Do you need to apply for a deferment of repayment?

- Call robot: Don't let an overdue repayment affect your good credit!

…”

In the above embodiment, the call intentions of the user's historical calls and the call intention of the current call are combined into an intent label sequence that serves as the sequence input of the speech recommendation model, so that the call intentions of historical calls are taken into account when recommending speech for the current dialogue. The call robot can thus identify the customer's intention to obtain a valid intent label and recommend the reply content corresponding to that label, so that the replies of the call robot come closer to the performance of human customer service and the user experience is improved.

It should be emphasized that, in order to further ensure the privacy and security of the above-mentioned current call content, the above-mentioned current call content may also be stored in a node of a blockchain.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
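The idea of data blocks linked by cryptographic methods can be illustrated with a small hash chain. This sketch is not a blockchain platform: it has no network, consensus mechanism, or distributed storage, and the block layout, the use of SHA-256, and the names `make_block` and `chain_is_valid` are assumptions chosen for the example.

```python
import hashlib
import json


def make_block(call_content, previous_hash):
    """Create a block whose hash covers its content and the previous block's hash."""
    payload = json.dumps(
        {"content": call_content, "previous_hash": previous_hash}, sort_keys=True
    )
    block_hash = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return {"content": call_content, "previous_hash": previous_hash, "hash": block_hash}


def chain_is_valid(chain):
    """Recompute every hash and check that each block points at its predecessor."""
    for index, block in enumerate(chain):
        expected = make_block(block["content"], block["previous_hash"])["hash"]
        if block["hash"] != expected:
            return False
        if index > 0 and block["previous_hash"] != chain[index - 1]["hash"]:
            return False
    return True


# Hypothetical call contents stored as two linked blocks.
genesis = make_block("call content A", "0" * 64)
second = make_block("call content B", genesis["hash"])
chain = [genesis, second]
print(chain_is_valid(chain))  # True

# Editing stored content without recomputing the hashes is detected.
tampered = [dict(genesis, content="edited content"), second]
print(chain_is_valid(tampered))  # False
```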

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or a random access memory (RAM), or the like.

It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to this order, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily executed at the same time, but may be executed at different times, and they need not be performed sequentially; they may instead be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.

Further referring to FIG. 3, as an implementation of the method shown in FIG. 2 above, the present application provides an embodiment of a speech recommendation apparatus based on semantic recognition. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.

As shown in FIG. 3, the speech recommendation apparatus based on semantic recognition described in this embodiment includes:

The semantic recognition module 301 is used for acquiring training corpus from a preset historical corpus and performing semantic recognition on the training corpus to obtain a semantic recognition result of the training corpus, wherein the training corpus is the voice information, stored in the historical corpus, that was generated during communication between the user and the call robot;

The corpus classification module 302 is used to classify the training corpus based on the semantic recognition result to obtain positive samples and negative samples;

The sample combination module 303 is used to randomly combine positive samples and negative samples to obtain a training sample set and a verification data set;

The model training module 304 is configured to train the preset initial intent recognition model through the training sample set, and verify the trained call intent model through the verification data set, and obtain the verified call intent model;

The instruction receiving module 305 is used for receiving an intent recognition instruction and obtaining the call content of the current call corresponding to the intent recognition instruction;

The intent recognition module 306 is used for importing the call content of the current call into the verified call intent model and outputting the call intention matching the current call content;

The speech generation module 307 is used for importing the call intention into the pre-trained speech recommendation model to obtain the target speech matching the call intention.

Further, the semantic recognition module 301 specifically includes:

The corpus preprocessing unit is used to obtain the training corpus from the preset historical corpus and preprocess the training corpus;

The semantic recognition unit is used to perform semantic recognition on the preprocessed training corpus based on a preset dictionary library, and obtain the semantic recognition result of the training corpus.

Further, the sample combination module 303 specifically includes:

The sample labeling unit is used to label positive samples and negative samples respectively;

The sample combination unit is used to randomly combine the labeled positive samples and negative samples to obtain a training sample set and a verification data set, and store the training sample set and the verification data set in a preset historical corpus.

Further, the model training module 304 specifically includes:

The feature conversion unit is used to import the training sample set into the preset initial intention recognition model, perform word segmentation processing on the training corpus in the training sample set, and perform vector feature transformation processing on the training corpus after word segmentation to obtain word vectors;

The convolution operation unit is used to perform the convolution operation on the word vector to extract the feature data corresponding to the word vector;

The similarity calculation unit is used to calculate the similarity between the feature data and the preset intent label, and based on the similarity calculation result, iteratively updates the initial intent recognition model until the model is fitted, and outputs the trained call intent model.

Further, the similarity calculation unit specifically includes:

The similarity calculation subunit is used to calculate the similarity between the feature data and the preset intent label, and output the recognition result with the largest similarity as the intent recognition result corresponding to the training corpus;

The fitting subunit is used to perform fitting based on the intent recognition result and the preset standard result using the back-propagation algorithm to obtain the recognition error;

The iteration subunit is used to compare the recognition error with a preset threshold and, if the recognition error is greater than the preset threshold, iteratively update the call intent model until the recognition error is less than or equal to the preset threshold;

The model output subunit is used for taking the call intent model with the recognition error less than or equal to the preset threshold as the trained call intent model, and outputting the trained call intent model.

Further, the model training module 304 also includes:

The model verification subunit is used to obtain the verification samples in the verification data set, and import the verification samples into the trained call intent model to obtain the model verification results;

The verification and comparison subunit is used to compare the model verification result with the label of the verification sample, and verify the trained call intent model according to the comparison result.

Further, the speech generation module 307 specifically includes:

The intent labeling unit is used to label the call intent and obtain the intent label of the current call;

The association unit is used to determine all historical calls that are associated with the current call and obtain the intent labels corresponding to all historical calls;

The sorting unit is used to sort the intent label of the current call and the intent labels corresponding to all historical calls based on a preset sorting rule to obtain an intent label sequence;

The speech generation unit is used to import the intent label sequence into the pre-trained speech recommendation model, and output the target speech matching the intent label sequence.

The present application discloses a speech recommendation apparatus based on semantic recognition, which belongs to the technical field of artificial intelligence. The present application performs semantic recognition on the training corpus to obtain its semantic recognition result, determines the attributes of the training samples from that result, and classifies the training corpus based on the semantic recognition result to obtain positive samples and negative samples, where the positive samples are valid calls and the negative samples are invalid calls. A call intent model, which can identify the call intention, is trained with the training sample set composed of positive and negative samples. Finally, the call intention is imported into the pre-trained speech recommendation model to obtain the target speech matching the call intention. The present application enables the call robot to identify the customer's intention to obtain a valid intent label and to recommend the reply content corresponding to that label, so that the replies of the call robot come closer to the performance of human customer service and the user experience is improved.

To solve the above technical problems, the embodiments of the present application also provide computer equipment. Please refer to FIG. 4 for details. FIG. 4 is a block diagram of a basic structure of a computer device according to this embodiment.

The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that communicate with each other through a system bus. It should be noted that only the computer device 4 with components 41-43 is shown in the figure, but it should be understood that not all of the shown components are required, and more or fewer components may be implemented instead. Those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, etc.

The computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment. The computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.

The memory 41 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or internal memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card. Of course, the memory 41 may also include both the internal storage unit of the computer device 4 and its external storage device. In this embodiment, the memory 41 is generally used to store the operating system and the various application software installed on the computer device 4, such as the computer-readable instructions of the speech recommendation method based on semantic recognition. In addition, the memory 41 can also be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute the computer-readable instructions stored in the memory 41 or to process data, for example to execute the computer-readable instructions of the semantic recognition-based speech recommendation method.

The network interface 43 may include a wireless network interface or a wired network interface, and the network interface 43 is generally used to establish a communication connection between the computer device 4 and other electronic devices.

The present application discloses a computer device, which belongs to the technical field of artificial intelligence. The present application performs semantic recognition on the training corpus to obtain its semantic recognition result, determines the attributes of the training samples from that result, and classifies the training corpus based on the semantic recognition result to obtain positive samples and negative samples, where the positive samples are valid calls and the negative samples are invalid calls. A call intent model, which can identify the call intention, is trained with the training sample set composed of positive and negative samples. Finally, the call intention is imported into the pre-trained speech recommendation model to obtain the target speech matching the call intention. The present application enables the call robot to identify the customer's intention to obtain a valid intent label and to recommend the reply content corresponding to that label, so that the replies of the call robot come closer to the performance of human customer service and the user experience is improved.

The present application also provides another implementation, namely a computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile, and it stores computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the semantic recognition-based speech recommendation method described above.

The present application discloses a storage medium, which belongs to the technical field of artificial intelligence. The present application performs semantic recognition on the training corpus to obtain its semantic recognition result, determines the attributes of the training samples from that result, and classifies the training corpus based on the semantic recognition result to obtain positive samples and negative samples, where the positive samples are valid calls and the negative samples are invalid calls. A call intent model, which can identify the call intention, is trained with the training sample set composed of positive and negative samples. Finally, the call intention is imported into the pre-trained speech recommendation model to obtain the target speech matching the call intention. The present application enables the call robot to identify the customer's intention to obtain a valid intent label and to recommend the reply content corresponding to that label, so that the replies of the call robot come closer to the performance of human customer service and the user experience is improved.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of this application.

Obviously, the above-described embodiments are only some of the embodiments of the present application, not all of them. The accompanying drawings show the preferred embodiments of the present application, but do not limit the patent scope of the present application. This application may be embodied in many different forms; these embodiments are provided so that the disclosure of this application will be understood thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments, or make equivalent replacements for some of their technical features. Any equivalent structure made using the contents of the description and drawings of the present application, and used directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims

1. A speech recommendation method based on semantic recognition, including:

Acquire training corpus from a preset historical corpus, perform semantic recognition on the training corpus, and obtain a semantic recognition result of the training corpus, wherein the training corpus is the voice information, stored in the historical corpus, that was generated during communication between the user and the call robot;

Classify the training corpus based on the semantic recognition result to obtain positive samples and negative samples;

Randomly combining the positive samples and the negative samples to obtain a training sample set and a verification data set;

Train a preset initial intent recognition model by using the training sample set, and verify the trained call intent model by using the verification data set, and obtain a verified call intent model;

Receive the intent recognition instruction, and obtain the call content of the current call corresponding to the intent recognition instruction;

importing the call content of the current call into the verified call intent model, and outputting a call intent matching the current call content;

The call intention is imported into a pre-trained speech recommendation model, and a target speech matching the call intention is obtained.
2. The speech recommendation method based on semantic recognition according to claim 1, wherein the step of acquiring training corpus from a preset historical corpus, performing semantic recognition on the training corpus, and obtaining a semantic recognition result of the training corpus specifically includes:

Obtain training corpus from a preset historical corpus, and preprocess the training corpus;

Perform semantic recognition on the preprocessed training corpus based on a preset dictionary library to obtain a semantic recognition result of the training corpus.
3. The speech recommendation method based on semantic recognition according to claim 1, wherein the step of randomly combining the positive samples and the negative samples to obtain a training sample set and a verification data set specifically includes:

label the positive samples and the negative samples respectively;

The labeled positive samples and the negative samples are randomly combined to obtain a training sample set and a verification data set, and the training sample set and the verification data set are stored in the preset historical corpus.
4. The speech recommendation method based on semantic recognition according to claim 1, wherein the step of training a preset initial intent recognition model by using the training sample set specifically includes:

Importing the training sample set into a preset initial intent recognition model, performing word segmentation processing on the training corpus in the training sample set, and performing vector feature conversion processing on the training corpus after word segmentation to obtain word vectors;

Perform a convolution operation on the word vector to extract feature data corresponding to the word vector;

The similarity between the feature data and the preset intent label is calculated, and the initial intent recognition model is iteratively updated based on the similarity calculation result until the model is fitted, and the trained call intent model is output.
5. The speech recommendation method based on semantic recognition according to claim 4, wherein the step of calculating the similarity between the feature data and the preset intent label, iteratively updating the initial intent recognition model based on the similarity calculation result until the model is fitted, and outputting the trained call intent model specifically includes:

Calculate the similarity between the feature data and the preset intent label, and output the recognition result with the largest similarity as the intent recognition result corresponding to the training corpus;

Based on the intent recognition result and the preset standard result, the back-propagation algorithm is used for fitting to obtain the recognition error;

Compare the recognition error with the preset threshold, and if the recognition error is greater than the preset threshold, iteratively update the call intent model until the recognition error is less than or equal to the preset threshold;

The call intent model with the recognition error less than or equal to the preset threshold is used as the trained call intent model, and the trained call intent model is output.
6. The speech recommendation method based on semantic recognition according to claim 5, wherein, after the step of taking the call intent model with the recognition error less than or equal to the preset threshold as the trained call intent model and outputting the trained call intent model, the method further includes:

Obtaining the verification samples in the verification data set, and importing the verification samples into the trained call intent model, to obtain the model verification result;

The model verification result is compared with the label of the verification sample, and the trained call intent model is verified according to the comparison result.
7. The speech recommendation method based on semantic recognition according to any one of claims 1 to 6, wherein the step of importing the call intention into a pre-trained speech recommendation model to obtain a target speech matching the call intention specifically includes:

Labeling the call intention to obtain the intention label of the current call;

Determine all historical calls that are associated with the current call, and acquire intent labels corresponding to all historical calls;

Sorting the intent tags of the current call and the intent tags corresponding to all historical calls based on a preset sorting rule, to obtain an intent tag sequence;

The intent label sequence is imported into the pre-trained speech recommendation model, and the target speech matching the intent label sequence is output.
8. A speech recommendation apparatus based on semantic recognition, comprising:

A semantic recognition module, used for acquiring training corpus from a preset historical corpus, and performing semantic recognition on the training corpus to obtain a semantic recognition result of the training corpus, wherein the training corpus is the voice information, stored in the historical corpus, that was generated during communication between the user and the call robot;

A corpus classification module, configured to classify the training corpus based on the semantic recognition result to obtain positive samples and negative samples;

a sample combination module for randomly combining the positive samples and the negative samples to obtain a training sample set and a verification data set;

A model training module, configured to train a preset initial intent recognition model through the training sample set, and verify the trained call intent model through the verification data set, and obtain a verified call intent model;

an instruction receiving module, configured to receive an intent identification instruction, and obtain the call content of the current call corresponding to the intent identification instruction;

an intention identification module, used for importing the call content of the current call into the call intention model that has passed the verification, and outputting a call intention matching the current call content;

The speech generation module is used for importing the call intention into a pre-trained speech recommendation model to obtain a target speech matching the call intention.
9. A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and when the processor executes the computer-readable instructions, the processor implements the following speech recommendation method based on semantic recognition:

Acquire training corpus from a preset historical corpus, perform semantic recognition on the training corpus, and obtain a semantic recognition result of the training corpus, wherein the training corpus is voice information, stored in the historical corpus, that is generated during communication between the user and the call robot;

Classify the training corpus based on the semantic recognition result to obtain positive samples and negative samples;

Randomly combining the positive samples and the negative samples to obtain a training sample set and a verification data set;

Train a preset initial intent recognition model by using the training sample set, and verify the trained call intent model by using the verification data set, and obtain a verified call intent model;

Receive the intent recognition instruction, and obtain the call content of the current call corresponding to the intent recognition instruction;

importing the call content of the current call into the verified call intent model, and outputting a call intent matching the current call content;

The call intention is imported into a pre-trained speech recommendation model, and a target speech matching the call intention is obtained.
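The last three steps (receive the call content, output a call intention, obtain a matching target speech) can be sketched as follows. This is a minimal illustration only: `keyword_intent_model` stands in for the verified call intent model, and the `SCRIPTS` lookup table stands in for the pre-trained speech recommendation model. Both, along with every name and string in the block, are assumptions and not part of the application.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for the pre-trained speech recommendation model:
# a lookup table from call intent to a recommended reply.
SCRIPTS: Dict[str, str] = {
    "cancellation": "I can help you cancel that. May I ask the reason?",
    "price_inquiry": "Let me walk you through the current pricing.",
}
FALLBACK = "Could you tell me a bit more about what you need?"


def keyword_intent_model(call_content: str) -> str:
    """Hypothetical stand-in for the verified call intent model."""
    text = call_content.lower()
    if "cancel" in text:
        return "cancellation"
    if "price" in text or "cost" in text:
        return "price_inquiry"
    return "unknown"


def handle_call(call_content: str,
                intent_model: Callable[[str], str] = keyword_intent_model,
                scripts: Dict[str, str] = SCRIPTS) -> Tuple[str, str]:
    """Output the call intent, then the target speech matching that intent."""
    intent = intent_model(call_content)
    return intent, scripts.get(intent, FALLBACK)
```

Passing the intent model in as a parameter keeps the recommendation step independent of how the intent was produced, which mirrors the two-model split described in the steps above.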
The computer device according to claim 9, wherein the step of acquiring training corpus from a preset historical corpus, and performing semantic recognition on the training corpus to obtain a semantic recognition result of the training corpus, specifically includes:

Obtain training corpus from a preset historical corpus, and preprocess the training corpus;

Perform semantic recognition on the preprocessed training corpus based on a preset dictionary library to obtain a semantic recognition result of the training corpus.
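As a rough illustration of the two steps above, the sketch below preprocesses an utterance and then matches it against a dictionary. The application does not specify the preprocessing operations or the dictionary format, so the lowercase-and-strip-punctuation preprocessing, the `DICTIONARY` contents, and the function names are all assumptions.

```python
import re

# Hypothetical dictionary library: keyword -> semantic tag (illustrative only).
DICTIONARY = {
    "refund": "refund_request",
    "price": "price_inquiry",
    "cancel": "cancellation",
}


def preprocess(utterance: str) -> list:
    """Lowercase the text, replace punctuation with spaces, and split into tokens."""
    cleaned = re.sub(r"[^\w\s]", " ", utterance.lower())
    return cleaned.split()


def recognize(utterance: str) -> list:
    """Return the semantic tags of dictionary entries found in the utterance,
    in order of first appearance and without duplicates."""
    tags = []
    for token in preprocess(utterance):
        tag = DICTIONARY.get(token)
        if tag is not None and tag not in tags:
            tags.append(tag)
    return tags
```

An utterance that matches no dictionary entry yields an empty tag list, which a later classification step could treat as a negative sample; that treatment is likewise an assumption.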
The computer device according to claim 9, wherein the step of randomly combining the positive samples and the negative samples to obtain a training sample set and a verification data set specifically includes:

Label the positive samples and the negative samples respectively;

The labeled positive samples and the negative samples are randomly combined to obtain a training sample set and a verification data set, and the training sample set and the verification data set are stored in the preset historical corpus.
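The labeling and random combination can be sketched as below. The label values `1` and `0`, the 20% validation share, and the fixed seed are assumptions chosen for illustration; the application does not state them.

```python
import random


def build_datasets(positives, negatives, val_fraction=0.2, seed=0):
    """Label the samples, shuffle them together, and split them into a
    training sample set and a verification data set."""
    # Assumed labels: 1 for positive samples, 0 for negative samples.
    labeled = [(text, 1) for text in positives] + [(text, 0) for text in negatives]
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    rng.shuffle(labeled)
    n_val = max(1, int(len(labeled) * val_fraction))
    return labeled[n_val:], labeled[:n_val]
```

Shuffling before the split is what makes the combination random, so that positive and negative samples end up mixed across both sets rather than grouped by class.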
The computer device according to claim 9, wherein the step of training a preset initial intent recognition model by using the training sample set specifically includes:

Importing the training sample set into a preset initial intention recognition model, performing word segmentation processing on the training corpus in the training sample set, and performing vector feature conversion processing on the training corpus after word segmentation to obtain a word vector;

Perform a convolution operation on the word vector to extract feature data corresponding to the word vector;

The similarity between the feature data and the preset intent label is calculated, and the initial intent recognition model is iteratively updated based on the similarity calculation result until the model is fitted, and the trained call intent model is output.
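The forward pass described above (segmentation, word vectors, convolution, similarity to intent labels) can be sketched in plain Python. The embedding values, the intent label vectors, the averaging kernel, the max pooling, and the use of cosine similarity are all assumptions; the application names none of them, and whitespace splitting is used here only as a stand-in for word segmentation.

```python
import math

# Hypothetical 3-dimensional word vectors (illustrative values only).
EMBEDDINGS = {
    "cancel": [1.0, 0.0, 0.0],
    "my": [0.0, 0.0, 0.0],
    "order": [0.0, 1.0, 0.0],
    "price": [0.0, 0.0, 1.0],
}
# Hypothetical intent label vectors in the same feature space.
INTENT_VECTORS = {
    "cancellation": [1.0, 1.0, 0.0],
    "price_inquiry": [0.0, 0.0, 1.0],
}


def embed(tokens):
    """Map each token to its word vector; unknown tokens map to zeros."""
    return [EMBEDDINGS.get(t, [0.0, 0.0, 0.0]) for t in tokens]


def conv_max_pool(vectors, kernel=(0.5, 0.5)):
    """Slide a 1-D kernel over token positions in each dimension, then take
    the per-dimension maximum. Assumes at least len(kernel) tokens."""
    k, dims = len(kernel), len(vectors[0])
    windows = [
        [sum(kernel[j] * vectors[i + j][d] for j in range(k)) for d in range(dims)]
        for i in range(len(vectors) - k + 1)
    ]
    return [max(w[d] for w in windows) for d in range(dims)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_intent(text):
    """Return the intent label whose vector is most similar to the features."""
    features = conv_max_pool(embed(text.split()))
    return max(INTENT_VECTORS, key=lambda label: cosine(features, INTENT_VECTORS[label]))
```

In a trained model the kernel and the embeddings would be learned parameters; they are fixed here so the similarity computation can be followed by hand.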
The computer device according to claim 12, wherein the step of calculating the similarity between the feature data and the preset intent label, iteratively updating the initial intent recognition model based on the similarity calculation result until the model is fitted, and outputting the trained call intent model specifically includes:

Calculate the similarity between the feature data and the preset intent label, and output the recognition result with the largest similarity as the intent recognition result corresponding to the training corpus;

Based on the intent recognition result and the preset standard result, the back-propagation algorithm is used for fitting to obtain the recognition error;

Compare the recognition error with the preset threshold, and if the recognition error is greater than the preset threshold, iteratively update the call intent model until the recognition error is less than or equal to the preset threshold;

The call intent model with the recognition error less than or equal to the preset threshold is used as the trained call intent model, and the trained call intent model is output.
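The update-until-threshold loop can be illustrated with a deliberately small model. The sketch below uses gradient descent on a single-layer logistic classifier, the simplest case of back-propagation, with mean cross-entropy as the recognition error. The model, the loss, the learning rate, the threshold value, and the `max_epochs` safety cap are assumptions and not details taken from the application.

```python
import math


def train_until_threshold(samples, threshold=0.1, lr=0.5, max_epochs=10_000):
    """Train on (feature_vector, label in {0, 1}) pairs until the mean
    cross-entropy error is at or below the threshold, or max_epochs is hit."""
    dims = len(samples[0][0])
    weights, bias = [0.0] * dims, 0.0
    error = float("inf")
    for _ in range(max_epochs):
        error, grad_w, grad_b = 0.0, [0.0] * dims, 0.0
        for x, y in samples:
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
            error -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            for d in range(dims):
                grad_w[d] += (p - y) * x[d]  # gradient of the loss w.r.t. weight d
            grad_b += p - y
        error /= len(samples)
        if error <= threshold:  # error within the preset threshold: stop updating
            break
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias, error
```

The epoch cap is not part of the described steps; it is included only so that the sketch terminates on data for which the threshold cannot be reached.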
The computer device according to claim 13, wherein, after the step of using the call intent model whose recognition error is less than or equal to the preset threshold as the trained call intent model and outputting the trained call intent model, the method further comprises:

Obtaining the verification samples in the verification data set, and importing the verification samples into the trained call intent model, to obtain the model verification result;

The model verification result is compared with the label of the verification sample, and the trained call intent model is verified according to the comparison result.
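One way to read the verification step is as an accuracy check on the held-out samples. In the sketch below, the `min_accuracy` pass criterion and the function name are assumptions; the application only states that the model output is compared with the labels of the verification samples.

```python
def verify_model(predict, validation_set, min_accuracy=0.9):
    """Compare model outputs with the verification labels and report whether
    the share of matches reaches the assumed pass criterion."""
    correct = sum(1 for sample, label in validation_set if predict(sample) == label)
    accuracy = correct / len(validation_set)
    return accuracy >= min_accuracy, accuracy
```

For example, a model that labels three of four verification samples correctly has an accuracy of 0.75 and would fail the default criterion.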
A computer-readable storage medium, wherein computer-readable instructions are stored on the computer-readable storage medium, and when the computer-readable instructions are executed by a processor, the following speech recommendation method based on semantic recognition is implemented:

Acquire training corpus from a preset historical corpus, perform semantic recognition on the training corpus, and obtain a semantic recognition result of the training corpus, wherein the training corpus is voice information, stored in the historical corpus, that is generated during communication between the user and the call robot;

Classify the training corpus based on the semantic recognition result to obtain positive samples and negative samples;

Randomly combining the positive samples and the negative samples to obtain a training sample set and a verification data set;

Train a preset initial intent recognition model by using the training sample set, and verify the trained call intent model by using the verification data set, and obtain a verified call intent model;

Receive the intent recognition instruction, and obtain the call content of the current call corresponding to the intent recognition instruction;

Import the call content of the current call into the call intention model that has passed the verification, and output the call intention matching the current call content;

The call intention is imported into a pre-trained speech recommendation model, and a target speech matching the call intention is obtained.
The computer-readable storage medium according to claim 15, wherein the step of acquiring training corpus from a preset historical corpus and performing semantic recognition on the training corpus to obtain a semantic recognition result of the training corpus specifically includes:

Obtain training corpus from a preset historical corpus, and preprocess the training corpus;

Perform semantic recognition on the preprocessed training corpus based on a preset dictionary library to obtain a semantic recognition result of the training corpus.
The computer-readable storage medium according to claim 15, wherein the step of randomly combining the positive samples and the negative samples to obtain a training sample set and a verification data set specifically includes:

Label the positive samples and the negative samples respectively;

The labeled positive samples and the negative samples are randomly combined to obtain a training sample set and a verification data set, and the training sample set and the verification data set are stored in the preset historical corpus.
The computer-readable storage medium according to claim 15, wherein the step of training a preset initial intent recognition model by using the training sample set specifically includes:

Importing the training sample set into a preset initial intention recognition model, performing word segmentation processing on the training corpus in the training sample set, and performing vector feature conversion processing on the training corpus after word segmentation to obtain a word vector;

Perform a convolution operation on the word vector to extract feature data corresponding to the word vector;

The similarity between the feature data and the preset intent label is calculated, and the initial intent recognition model is iteratively updated based on the similarity calculation result until the model is fitted, and the trained call intent model is output.
The computer-readable storage medium according to claim 18, wherein the step of calculating the similarity between the feature data and the preset intent label, iteratively updating the initial intent recognition model based on the similarity calculation result until the model is fitted, and outputting the trained call intent model specifically includes:

Calculate the similarity between the feature data and the preset intent label, and output the recognition result with the largest similarity as the intent recognition result corresponding to the training corpus;

Based on the intent recognition result and the preset standard result, the back-propagation algorithm is used for fitting to obtain the recognition error;

Compare the recognition error with the preset threshold, and if the recognition error is greater than the preset threshold, iteratively update the call intent model until the recognition error is less than or equal to the preset threshold;

The call intent model with the recognition error less than or equal to the preset threshold is used as the trained call intent model, and the trained call intent model is output.
The computer-readable storage medium according to claim 19, wherein, after the step of using the call intent model whose recognition error is less than or equal to the preset threshold as the trained call intent model and outputting the trained call intent model, the method further comprises:

Obtaining the verification samples in the verification data set, and importing the verification samples into the trained call intent model, to obtain the model verification result;

The model verification result is compared with the label of the verification sample, and the trained call intent model is verified according to the comparison result.