CN112839137A - Call processing method, device, equipment and storage medium based on background environment

Info

Publication number: CN112839137A
Application number: CN202011623437.2A
Authority: CN (China)
Prior art keywords: call, client, background, data, outbound
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 钱先洋 (Qian Xianyang)
Current Assignee: Ping An Puhui Enterprise Management Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Ping An Puhui Enterprise Management Co Ltd
Application filed by Ping An Puhui Enterprise Management Co Ltd
Priority to CN202011623437.2A
Publication of CN112839137A

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M3/00 - Automatic or semi-automatic exchanges
    • H04M3/42 - Systems providing special services or facilities to subscribers
    • H04M3/50 - Centralised arrangements for answering calls; centralised arrangements for recording messages for absent or busy subscribers
    • H04M3/527 - Centralised call answering arrangements not requiring operator intervention
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1822 - Parsing for meaning understanding
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating
    • G10L21/0308 - Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The scheme relates to the technical field of artificial intelligence, and provides a call processing method, apparatus, device, and storage medium based on the background environment. The method comprises the following steps: acquiring call data corresponding to a client ID; separating the call data with a voice separation model to obtain background sound data and client voice data; analyzing the client voice data to obtain an intention analysis result, and obtaining an initial response script based on the intention analysis result; identifying the background sound data to obtain a background environment type; if the background environment type is the call environment type, adjusting the initial response script according to the background environment type to obtain a target response script; and determining a broadcast strategy according to the background environment type, broadcasting the target response script according to the broadcast strategy, and acquiring an outbound result corresponding to the client ID. The method flexibly adjusts the outbound process, effectively improves the client's experience during the call, and raises the outbound success rate.

Description

Call processing method, device, equipment and storage medium based on background environment
Technical Field
The present invention relates to the field of voice call processing, and in particular to a call processing method, apparatus, device, and storage medium based on a background environment.
Background
With the development of artificial intelligence, human-machine dialogue systems have been deployed in fields such as intelligent sales, smart homes, and intelligent assistants, bringing great convenience to people's lives. By conducting efficient, multi-round interactive communication with clients in the manner of a professional, human-like sales or customer-service expert, an intelligent robot greatly reduces sales costs and improves customer-acquisition efficiency.
At present, during communication between an intelligent robot and a client, the influence of the background environment on the client is ignored, so the outbound call success rate is low.
Disclosure of Invention
The embodiments of the invention provide a call processing method, apparatus, computer device, and storage medium based on the background environment. The call data of a user is processed with technologies such as voice processing and voice recognition, so that the background environment type is analyzed while the call processing system talks with the user, and the system adjusts itself intelligently according to that type. This raises the intelligence of the outbound phone call and ensures a better outbound effect, thereby solving the problem that the outbound success rate is low because the influence of the background environment on the client is ignored while the intelligent robot communicates with the client.
A call processing method based on a background environment comprises the following steps:
acquiring call data corresponding to a client ID;
separating the call data with a voice separation model to obtain background sound data and client voice data;
analyzing the client voice data to obtain an intention analysis result, and obtaining an initial response script based on the intention analysis result;
identifying the background sound data to obtain a background environment type;
if the background environment type is a call environment type, adjusting the initial response script according to the background environment type to obtain a target response script;
and determining a broadcast strategy according to the background environment type, broadcasting the target response script according to the broadcast strategy, and acquiring an outbound result corresponding to the client ID.
A call processing apparatus based on a background environment comprises:
a call data acquisition module, configured to acquire call data corresponding to a client ID;
a separation processing module, configured to separate the call data with a voice separation model to obtain background sound data and client voice data;
an analysis processing module, configured to analyze the client voice data to obtain an intention analysis result, and to obtain an initial response script based on the intention analysis result;
a recognition processing module, configured to identify the background sound data to obtain a background environment type;
a target response script obtaining module, configured to adjust the initial response script according to the background environment type to obtain a target response script if the background environment type is a call environment type;
and an outbound result acquisition module, configured to determine a broadcast strategy according to the background environment type, broadcast the target response script according to the broadcast strategy, and acquire an outbound result corresponding to the client ID.
A computer device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above call processing method based on a background environment when executing the computer program.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the above call processing method based on a background environment.
According to the call processing method, apparatus, computer device, and storage medium based on the background environment, call data corresponding to the client ID is acquired; because the call data obtained by the server contains only the client voice data and the background sound data, and the original response script broadcast by the intelligent robot is not captured, the subsequent voice separation processing is simplified. The voice separation model separates the call data into background sound data and client voice data, so a broadcast strategy can later be generated from the background sound data; the background sound data is thus put to reasonable use to adjust the script broadcast by the intelligent robot, raising the intelligence of the outbound call and ensuring a better outbound effect. The client voice data is analyzed to obtain an intention analysis result, and an initial response script is obtained from that result, which helps serve the client quickly, achieves the purpose of the voice outbound call, and lets the intelligent robot communicate smoothly with the client. The background sound data is identified to obtain a background environment type, providing additional reference factors for the intelligent outbound call, so that a broadcast strategy can later be formulated from the background environment type, the success rate of the intelligent outbound call is improved, the initial response script is adjusted flexibly, and the degree of intelligence is raised.
If the background environment type is the call environment type, the initial response script is adjusted according to the background environment type to obtain a target response script, ensuring that the script, volume, speech rate, and so on during broadcasting better suit the environment the client is in, which effectively improves the outbound success rate. A broadcast strategy is determined according to the background environment type, the target response script is broadcast according to that strategy, and the outbound result corresponding to the client ID is acquired; different broadcast strategies are generated intelligently according to the background environment the client is actually in, the outbound process is adjusted flexibly, the outbound purpose is achieved, and the client's experience during the call is effectively improved.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required for the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive labor.
FIG. 1 is a diagram of an application environment of a call processing method based on a background environment in an embodiment of the present invention;
FIG. 2 is a flowchart of a call processing method based on a background environment in an embodiment of the present invention;
FIG. 3 is another flowchart of the call processing method based on a background environment in an embodiment of the present invention;
FIG. 4 is another flowchart of the call processing method based on a background environment in an embodiment of the present invention;
FIG. 5 is another flowchart of the call processing method based on a background environment in an embodiment of the present invention;
FIG. 6 is another flowchart of the call processing method based on a background environment in an embodiment of the present invention;
FIG. 7 is a schematic block diagram of a call processing apparatus based on a background environment in an embodiment of the present invention;
FIG. 8 is a schematic diagram of a computer device in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The call processing method based on a background environment provided by the embodiments of the invention can be applied in the application environment shown in FIG. 1. Specifically, the method is applied in a call processing system based on a background environment, which comprises the client and the server shown in FIG. 1. The client and the server communicate over a network; the system analyzes the background environment type while the call processing system talks with the user, adjusts itself intelligently according to that type, raises the intelligence of the outbound phone call, and ensures a better outbound effect. The client, also called the user side, is the program that corresponds to the server and provides local services to the user. The client may be installed on, but is not limited to, personal computers, laptops, smartphones, tablets, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a call processing method based on a background environment is provided. The method is described by taking its application to the server in FIG. 1 as an example, and comprises the following steps:
s201: and acquiring the call data corresponding to the client ID.
The client ID is an identifier that uniquely identifies the client, and may be, for example, a client name or a client number.
The call data is the data formed when the client answers the intelligent robot. For example, once the call is connected, the intelligent robot broadcasts a preset original response script, such as "Dear customer, hello, may I ask you xxx", and the client then replies with "xxx". When the client starts to talk, the call data of the client's reply is collected so that it can be separated later and processed according to the client's background environment, improving call efficiency. The original response script is preset according to the purpose of the outbound call, such as product promotion or an after-sales follow-up visit, so that the outbound purpose can be achieved. For example, if the purpose of the outbound call is to promote product xx, the original response script describes the basic information and advantages of product xx, so as to attract clients to buy the target product. If the purpose is an after-sales follow-up visit for product xx, the original response script asks about the client's experience with product xx or other after-sales information, so that the service can be improved according to the follow-up results and the outbound purpose is achieved.
In this embodiment, the intelligent outbound system stores an outbound list containing at least one client ID and the communication number corresponding to each client ID. The system provides an intelligent outbound interface with an outbound button and a recording button: the outbound button calls a client in the outbound list and communicates with the client using the preset original response script, and when the client starts speaking, the recording button is triggered to capture the call data. Because recording starts only when the client starts speaking, the call data obtained by the server contains only the client voice data and the background sound data, and the original response script broadcast by the intelligent robot is not captured, which simplifies the subsequent voice separation processing.
S202: separating the call data with a voice separation model to obtain background sound data and client voice data.
The voice separation model is the model used to separate the call data into background sound data and client voice data, and it provides technical support for the subsequent determination of the background environment type. The voice separation model provided in this embodiment comprises a feature extraction layer and two fully connected layers, which ensures the accuracy of the resulting background sound data and client voice data. The feature extraction layer includes, but is not limited to, a Transformer, an RNN (recurrent neural network), a CNN (convolutional neural network), and the like.
The separation process is a process of separating call data into background sound data and client voice data.
The background sound data is the data formed by the background environment the client is in during the call; the background environment may be either suitable or unsuitable for a call. Here, an environment suitable for a call is a low-decibel environment, such as a quiet room, and an environment unsuitable for a call is a high-decibel environment, such as a construction site. The client voice data is the voice data of the client replying to the intelligent robot; that is, it contains only the client's voice.
In this embodiment, the pre-trained voice separation model separates the call data and quickly yields the background sound data and the client voice data. A broadcast strategy can later be generated from the background sound data, so the background sound data is put to reasonable use to adjust the script broadcast by the intelligent robot, raising the intelligence of the outbound call and ensuring a better outbound effect.
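The separation step can be sketched as follows. This is a minimal, untrained illustration of the "feature extraction layer plus two fully connected layers" structure, assuming a mask-based separation applied to spectral frames; the weights, dimensions, and masking scheme are hypothetical stand-ins, not details taken from the patent.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, v):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def separate_frame(frame, w_feat, w_bg, w_client):
    """Separate one spectral frame of call data into a background-sound
    part and a client-voice part: a shared feature-extraction layer feeds
    two fully connected heads, each emitting a soft mask over the frame."""
    h = [math.tanh(x) for x in matvec(w_feat, frame)]        # shared features
    bg_mask = [sigmoid(x) for x in matvec(w_bg, h)]          # head 1: background sound
    client_mask = [sigmoid(x) for x in matvec(w_client, h)]  # head 2: client voice
    background = [f * m for f, m in zip(frame, bg_mask)]
    client = [f * m for f, m in zip(frame, client_mask)]
    return background, client

# Toy weights for a 2-dimensional frame; a real model learns these.
frame = [1.0, -2.0]
w_feat = [[0.5, 0.0], [0.0, 0.5]]
w_bg = [[1.0, 0.0], [0.0, 1.0]]
w_client = [[-1.0, 0.0], [0.0, -1.0]]
background, client = separate_frame(frame, w_feat, w_bg, w_client)
```

In practice the two heads would be trained so that the masks assign ambient noise energy to one stream and the client's speech energy to the other; the toy weights here only demonstrate the data flow.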
S203: analyzing the client voice data to obtain an intention analysis result, and obtaining an initial response script based on the intention analysis result.
The analysis processing analyzes the client voice data to obtain the client's intention.
The intention analysis result indicates the intention expressed by the client voice data. For example, if the client voice data asks what product xx is like, the intention analysis result is an inquiry into the uses and benefits of product xx; if it asks how product xx is used, the result is an inquiry into the usage method of product xx.
The initial response script is generated according to the client voice data so that the intelligent robot can answer the client. It uses professional and accurate wording, avoids misleading the client, and lets the intelligent robot serve the client quickly to achieve the purpose of the voice outbound call. It should be understood that the initial response script is preset and is used during the conversation to guide the client through product information, after-sales experience feedback, and the like, which helps complete the purpose of the voice outbound call.
In this embodiment, the client voice data is analyzed to obtain the initial response script, which helps serve the client quickly, achieves the purpose of the voice outbound call, and lets the intelligent robot communicate smoothly with the client.
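As a toy illustration of the mapping in S203 from client voice data to an initial response script, the sketch below matches keywords in a transcript of the client's speech. The intent labels, keyword lists, and script texts are invented for illustration only; a production system would use a trained natural-language-understanding model rather than keyword lookup.

```python
# Hypothetical intent table: keywords that suggest each intent.
INTENT_KEYWORDS = {
    "ask_benefit": ["what is", "benefit", "effect"],
    "ask_usage": ["how to use", "how do i use", "usage"],
}
# Hypothetical preset scripts keyed by intent label.
INITIAL_SCRIPTS = {
    "ask_benefit": "Product xx offers the following benefits: ...",
    "ask_usage": "Product xx is used as follows: ...",
    "unknown": "Could you tell me a little more about what you need?",
}

def analyze_intent(client_text):
    """Return the first intent whose keyword list matches the transcript."""
    text = client_text.lower()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in text for k in keywords):
            return intent
    return "unknown"

def initial_response_script(client_text):
    """Look up the preset response script for the analyzed intent."""
    return INITIAL_SCRIPTS[analyze_intent(client_text)]
```

The "unknown" fallback mirrors the need for a safe default script when the client's intention cannot be determined.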
S204: identifying the background sound data to obtain the background environment type.
The background environment type is the type of background environment the client is in during the call; it may be, for example, a call environment type or a non-call environment type. The call environment type corresponds to an environment suitable for a call, and the non-call environment type corresponds to an environment unsuitable for a call.
At present, intelligent outbound systems usually extract only the client's voice data and analyze the client's intention, ignoring the influence of the background environment type on the client; the outbound process is therefore fixed and cannot be adjusted flexibly according to the background environment type to improve the success rate. In this embodiment, a background classification model processes the background sound data to obtain the background environment type, providing additional reference factors for the intelligent outbound call, so that a broadcast strategy can later be formulated from the background environment type, the success rate of the intelligent outbound call is improved, the initial response script is adjusted flexibly, and the degree of intelligence is raised.
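A minimal stand-in for the background classification step, assuming the decision can be made from the decibel level of the separated background-sound data: quiet, low-decibel backgrounds map to the call environment type and loud ones to the non-call environment type. A real classification model would use richer acoustic features, and the threshold here is an arbitrary illustrative value.

```python
import math

def mean_decibels(samples):
    """RMS level of the separated background-sound samples, in dB
    relative to full scale (samples assumed in [-1, 1])."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log10(0)

def classify_background(samples, threshold_db=-30.0):
    """Toy stand-in for the background classification model: a quiet
    background yields the call environment type, a loud one the
    non-call environment type."""
    return "call_env" if mean_decibels(samples) < threshold_db else "non_call_env"
```

The two labels correspond to the call environment type and non-call environment type used in steps S205 and S207.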
S205: if the background environment type is the call environment type, adjusting the initial response script according to the background environment type to obtain the target response script.
The call environment type is the environment type in which the client's surroundings are suitable for a call; for example, if the background environment is quiet, the background environment type is the call environment type.
The target response script is the script obtained by adjusting the initial response script according to the background environment type. Because the target response script takes the background environment type into account, it is closer to the client's actual situation, which helps improve the client's experience during the call and raises the outbound success rate.
In this embodiment, the initial response script is adjusted according to the background environment type so that the script flexibly fits the user's actual environment. For example, when the background environment type is the call environment type, the user can be asked whether the volume should be turned up or down, and the initial response script is adjusted accordingly, ensuring that the script, volume, speech rate, and so on during broadcasting fit the user's environment, which effectively improves the outbound success rate.
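The adjustment in S205 can be sketched as a simple rule, assuming the "call_env" label comes from the classification step; the wording of the inserted volume question is illustrative only, not the patent's text.

```python
def adjust_script(initial_script, env_type):
    """In a call environment, prepend a volume check so the response
    suits the client's surroundings; otherwise keep the script as-is."""
    if env_type == "call_env":
        return ("Is the current volume comfortable for you, "
                "or should I turn it up or down? " + initial_script)
    return initial_script
```

The non-call branch is left unchanged here because, per step S207, that case is handled by sending recommendation information instead of continuing the call.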
S206: determining a broadcast strategy according to the background environment type, broadcasting the target response script according to the broadcast strategy, and acquiring the outbound result corresponding to the client ID.
The broadcast strategy adjusts broadcast parameters of the intelligent robot, such as broadcast volume and speech rate, making the broadcasting process more intelligent. For example, when the client's background environment is quiet, the broadcast volume is lowered, improving the user's call experience; the strategy is generated flexibly according to the client's environment, which effectively improves the outbound success rate. The outbound result is the result of recommending the product to the client through the outbound call.
In this embodiment, the broadcast strategy is generated according to the background environment type, so that different broadcast strategies are generated intelligently according to the client's actual background environment, the outbound process is adjusted flexibly, the outbound purpose is ensured, and the client's experience during the call is effectively improved.
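One possible shape for the broadcast-strategy mapping, with hypothetical gain and speech-rate values chosen only to illustrate the quiet-versus-noisy adjustment described above:

```python
def broadcast_strategy(env_type):
    """Map the background environment type to broadcast parameters.
    The gain and rate values are illustrative defaults, not values
    specified by the patent."""
    if env_type == "call_env":
        # Quiet surroundings: lower the volume, keep a normal rate.
        return {"volume_gain_db": -3.0, "speech_rate": 1.0}
    # Noisy surroundings: speak louder and slightly slower.
    return {"volume_gain_db": 6.0, "speech_rate": 0.9}
```

A text-to-speech engine would then broadcast the target response script with these parameters applied.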
According to the call processing method based on the background environment, the call data corresponding to the client ID is acquired; because the call data obtained by the server contains only the client voice data and the background sound data, and the original response script broadcast by the intelligent robot is not captured, the subsequent voice separation processing is simplified. The voice separation model separates the call data into background sound data and client voice data, so a broadcast strategy can later be generated from the background sound data; the background sound data is put to reasonable use to adjust the script broadcast by the intelligent robot, raising the intelligence of the outbound call and ensuring a better outbound effect. The client voice data is analyzed to obtain the intention analysis result, and the initial response script is obtained from that result, which helps serve the client quickly, achieves the purpose of the voice outbound call, and lets the intelligent robot communicate smoothly with the client. The background sound data is identified to obtain the background environment type, providing additional reference factors for the intelligent outbound call, so that a broadcast strategy can later be formulated from the background environment type, the success rate of the intelligent outbound call is improved, the initial response script is adjusted flexibly, and the degree of intelligence is raised. If the background environment type is the call environment type, the initial response script is adjusted according to the background environment type to obtain the target response script, so that the script, volume, speech rate, and so on during broadcasting better suit the client's environment, which effectively improves the outbound success rate.
The broadcast strategy is determined according to the background environment type, the target response script is broadcast according to that strategy, and the outbound result corresponding to the client ID is acquired; different broadcast strategies are generated intelligently according to the background environment the client is actually in, the outbound process is adjusted flexibly, the outbound purpose is achieved, and the client's experience during the call is effectively improved.
In an embodiment, after step S204, that is, after the background environment type is obtained, the method further comprises:
s207: and if the background environment type is the non-call environment type, generating recommendation information according to the initial answering, and sending the recommendation information to the mobile phone terminal corresponding to the customer ID.
The recommendation information is information used to recommend a product to, or revisit, the client corresponding to the client ID; specifically, it is a short message generated from the client's name and the initial response script.
Specifically, when the background environment type is the non-call environment type, the client is in a high-decibel environment where talking is inconvenient, so problems such as a missed call or unclear speech are hard to avoid during communication. In that case, recommendation information is generated according to the initial response script, which improves the recommendation success rate, ensures that the client can learn the content of the call intuitively, and avoids errors.
In this embodiment, the recommendation information corresponding to the client ID is formed according to the initial response script. The specific process is as follows: query the database with the client ID to obtain the client portrait information corresponding to the client ID; analyze the client portrait information to obtain the client's preferred style; query the database with the preferred style to obtain the corresponding short-message template; and generate personalized recommendation information from the short-message template and the initial response script, improving the recommendation success rate. The database is a library storing client data. The preferred style represents what the client likes and is determined from the client's portrait information, which ensures that it is targeted; for example, it may be a literary style or a rigorous style. The client portrait information may include basic personal information, social attributes, lifestyle habits, and the like; for example, the basic personal information may be the client's age, address, web-browsing information, and gender.
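The recommendation-information flow above can be sketched as below. The two style labels match the "literary" and "rigorous" examples in the text, but the templates, the age-based style rule, and the portrait fields are invented stand-ins for the database queries the embodiment describes.

```python
# Hypothetical short-message templates keyed by preferred style.
SMS_TEMPLATES = {
    "literary": "Dear {name}, a small note just for you: {script}",
    "rigorous": "{name}: please review the following information. {script}",
}

def preferred_style(portrait):
    """Toy stand-in for analyzing client portrait information; a real
    system would query the client database and a style model."""
    return "literary" if portrait.get("age", 99) < 40 else "rigorous"

def recommendation_sms(name, portrait, initial_script):
    """Fill the style-matched template with the client's name and the
    initial response script to form the recommendation short message."""
    return SMS_TEMPLATES[preferred_style(portrait)].format(
        name=name, script=initial_script)
```

The resulting message is what would be sent to the mobile phone terminal corresponding to the client ID in step S207.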
In an embodiment, as shown in fig. 3, after step S206, that is, after the target answer is broadcasted according to the broadcasting policy and the outbound result corresponding to the client ID is obtained, the method further includes:
S301: updating the total number of outbound calls and the number of successful outbound calls corresponding to the background environment type according to the background environment type and the outbound result.
The number of successful outbound calls counts the calls in which the outbound purpose was achieved; for example, if the outbound call recommends a product, the product is successfully promoted, and the client purchases it, the number of successful outbound calls is increased by 1. It will be appreciated that if the promotion fails and the client does not purchase the product, the number of failed outbound calls is increased by 1. The total number of outbound calls is the number of calls made to the client ID, which equals the number of successful outbound calls plus the number of failed outbound calls.
S302: and generating the outbound success frequency corresponding to the client ID according to the total outbound times and the outbound success times.
The call statistics result is a result of counting successes and failures in calling the client. The outbound success frequency equals the number of successful outbound calls divided by the total number of outbound calls.
S303: if the total number of outbound calls is greater than the preset number and the outbound success frequency is less than the preset frequency, deleting the client ID from the call list.
S304: if the total number of outbound calls is not greater than the preset number, or the outbound success frequency is not less than the preset frequency, generating a call policy corresponding to the client ID according to the background environment type and recording the call policy in the call list.
The preset frequency is a frequency threshold set in advance; for example, it may be 50%. The preset number is a number of calls set in advance; for example, it may be 10.
The call policy is a policy for the next call to the client corresponding to the client ID and is generated according to the background environment type. For example, when the background environment type is the call environment type and the number of successful outbound calls is high, it is determined that the client ID is in a time period corresponding to the call environment type, and the call policy may specify the time period, volume, speech speed, and the like for calling the client. As an example, the policy table is queried according to the background environment type and the client ID to determine whether an original policy corresponding to the background environment type and the client ID exists in the policy table. If such an original policy exists, the original policy is determined as the call policy corresponding to the background environment type; if it does not exist, a call policy corresponding to the background environment type and the client ID is generated and recorded in the policy table.
In this embodiment, corresponding processing is performed according to the actual values of the total number of outbound calls and the outbound success frequency, so that subsequent work tasks can be reasonably distributed and work efficiency is improved. When the total number of outbound calls is greater than the preset number and the outbound success frequency is less than the preset frequency, the client is not a potential client, and the client ID is deleted from the call list, which simplifies the subsequent process steps. When the total number of outbound calls is not greater than the preset number, or the outbound success frequency is not less than the preset frequency, a call policy corresponding to the client ID is generated according to the background environment type, so as to improve the probability of outbound success.
In the call processing method based on the background environment provided by this embodiment, the total number of outbound calls and the number of successful outbound calls corresponding to the background environment type are updated according to the background environment type and the outbound result; the outbound success frequency corresponding to the client ID is generated from the total number of outbound calls and the number of successful ones; if the total number of outbound calls is greater than the preset number and the outbound success frequency is less than the preset frequency, the client ID is deleted from the call list; otherwise, a call policy corresponding to the client ID is generated according to the background environment type and recorded in the call list. Processing according to the actual outbound statistics allows subsequent work tasks to be reasonably distributed and improves work efficiency.
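Steps S301 to S304 can be sketched as simple bookkeeping over the call list. The thresholds (10 calls, 50%) follow the examples given in the text; the data structures and function names are illustrative, not from the patent.

```python
# Sketch of the call-list pruning logic (S301-S304): compute the outbound
# success frequency and either delete the client ID or record a call policy.

PRESET_NUMBER = 10      # preset total number of outbound calls (example value)
PRESET_FREQUENCY = 0.5  # preset outbound success frequency (example value)

call_list = {"C1001": {"total": 12, "success": 2},
             "C1002": {"total": 12, "success": 9}}

def update_call_list(client_id: str, background_env_type: str):
    stats = call_list[client_id]
    frequency = stats["success"] / stats["total"]  # outbound success frequency
    if stats["total"] > PRESET_NUMBER and frequency < PRESET_FREQUENCY:
        del call_list[client_id]                   # S303: not a potential client
        return None
    # S304: record a call policy derived from the background environment type
    policy = {"env": background_env_type, "volume": "normal"}
    stats["policy"] = policy
    return policy

update_call_list("C1001", "call_environment")   # pruned: 2/12 below 50%
update_call_list("C1002", "call_environment")   # kept: 9/12 at or above 50%
print(sorted(call_list))  # ['C1002']
```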
In an embodiment, as shown in fig. 4, in step S202, performing separation processing on call data by using a voice separation model to obtain background voice data and client voice data, including:
S401: preprocessing the call data to obtain a mixed speech vector.
The preprocessing is the process of converting the call data into a vectorized mixed speech vector.
A mixed speech vector is a vector that contains both the client speech and the background sound.
Specifically, the call data is encoded to obtain encoded data, and the encoded data is input into a feature extraction layer, which performs feature extraction on the call data to obtain a feature representation of the initialized data, that is, mixed speech vectorized features containing both the client speech and the background sound. The encoding process digitizes the call data to obtain a digital sequence corresponding to the call data in a format recognizable by a computer. The feature extraction layer extracts features from the encoded data and forms vectors recognizable by a computer; in this embodiment, the feature extraction layer is a Transformer feature extraction layer that forms the mixed speech vectors.
In this embodiment, the client speech and the background sound share the basic feature extraction layer when the call data is processed, which effectively reduces the number of model parameters and accelerates processing. If two separate feature extraction layers were used to respectively process the initialized data containing the client speech and the background sound to obtain their basic feature representations, the amount of computation would be large and the processing efficiency low.
S402: and carrying out client voice filtering processing on the mixed voice vector by adopting a first full connection layer to obtain client voice data.
In this embodiment, the client speech filtering process filters out the background sound components of the mixed speech vector to obtain cleaner client speech data, providing technical support for subsequently obtaining the initial answer according to the client speech data.
S403: and performing background sound filtering processing on the mixed voice vector by adopting a second full-connection layer to obtain background sound data.
In this embodiment, the background sound filtering process filters out the client speech components of the mixed speech vector to obtain the background sound data, providing technical support for subsequently determining the background environment type corresponding to the background sound data.
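The forward pass of steps S401 to S403 — one shared feature extractor feeding two fully connected heads — can be sketched in a few lines. This is a hedged illustration only: a single linear projection stands in for the patent's Transformer layer, and the weights are random placeholders rather than a trained model.

```python
import numpy as np

# Sketch of the separation forward pass: shared feature extraction layer,
# then a first fully connected head for client speech and a second for
# background sound. Dimensions and weights are illustrative.

rng = np.random.default_rng(0)

FRAME_DIM, FEAT_DIM = 64, 32
W_shared = rng.standard_normal((FRAME_DIM, FEAT_DIM)) * 0.1      # shared encoder
W_client = rng.standard_normal((FEAT_DIM, FRAME_DIM)) * 0.1      # first FC head
W_background = rng.standard_normal((FEAT_DIM, FRAME_DIM)) * 0.1  # second FC head

def separate(call_frames: np.ndarray):
    """call_frames: (n_frames, FRAME_DIM) encoded call data.
    Returns (client_speech, background_sound) estimates of the same shape."""
    mixed = np.tanh(call_frames @ W_shared)       # shared mixed speech vector
    client = mixed @ W_client                     # client-speech filtering head
    background = mixed @ W_background             # background-sound filtering head
    return client, background

frames = rng.standard_normal((10, FRAME_DIM))
client, background = separate(frames)
print(client.shape, background.shape)  # both (10, 64)
```

The point of the shared `W_shared` stage is the parameter saving the text describes: both outputs reuse one feature representation instead of each head running its own extractor.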
Before the speech separation model is used, it needs to be trained. The training stage is as follows: clean background sound data and clean client speech data are collected in advance, where each piece of clean background sound data carries an environment type identifier. The background sound data carrying the environment type identifiers and the client speech data are mixed to obtain a plurality of training samples; understandably, each training sample corresponds to an environment type identifier. The feature extraction layer of the original model extracts features from a training sample to obtain training vector features; the first fully connected layer performs client speech filtering on the training vector features to obtain first training data, and the second fully connected layer performs background sound filtering on them to obtain second training data. A first similarity between the first training data and the clean client speech data and a second similarity between the second training data and the clean background sound data are calculated; a prediction error loss is calculated from the first similarity and the second similarity, and the parameters of the original model are updated according to the loss. When the original model converges, the training of the speech separation model is complete.
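The training-data construction and loss just described can be sketched as follows. This is a hedged illustration: cosine similarity is an assumed choice (the patent does not name a specific similarity measure), and the signals are synthetic.

```python
import numpy as np

# Sketch of the training stage: mix clean client speech with labelled clean
# background sound to form a training sample, then score the separated
# outputs against their clean references.

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def make_sample(clean_client, clean_background, env_label):
    """Mix the two clean signals; the sample inherits the environment label."""
    return clean_client + clean_background, env_label

def separation_loss(pred_client, pred_background, clean_client, clean_background):
    """Higher similarity to the clean references means lower prediction error."""
    sim_client = cosine(pred_client, clean_client)          # first similarity
    sim_background = cosine(pred_background, clean_background)  # second similarity
    return 2.0 - sim_client - sim_background  # 0 when both match perfectly

rng = np.random.default_rng(1)
client = rng.standard_normal(100)
background = rng.standard_normal(100)
mixed, label = make_sample(client, background, env_label="subway")

# A perfect separator would reproduce the clean signals exactly, giving loss ~0:
print(separation_loss(client, background, client, background))
```

In actual training this loss would drive gradient updates of the shared extractor and both fully connected heads until convergence.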
The call processing method based on the background environment provided by this embodiment preprocesses the call data to obtain the mixed speech vector, so that the number of model parameters is effectively reduced and processing is accelerated when the call data is processed. The first fully connected layer performs client speech filtering on the mixed speech vector to obtain cleaner client speech data, providing technical support for subsequently obtaining the initial answer according to the client speech data. The second fully connected layer performs background sound filtering on the mixed speech vector to obtain the background sound data, providing technical support for subsequently determining the background environment type corresponding to the background sound data.
In an embodiment, as shown in fig. 5, step S203, that is, analyzing the client speech data to obtain an intention analysis result and obtaining an initial answer based on the intention analysis result, includes:
S501: recognizing the client speech data by using a speech recognition model to obtain current text data.
The speech recognition model is a model for recognizing the speech in the client speech data. The current text data is the text obtained by performing speech recognition on the client speech data.
Specifically, the client speech data is pre-emphasized; the processed client speech data is framed and windowed; fast Fourier transform and logarithm operations are then applied; and finally a discrete cosine transform is performed to quickly obtain the current speech features. This eliminates interference information in the client speech data and ensures that the obtained current speech features retain effective data for speech recognition. The current speech features are recognized by the speech recognition model, so that the obtained current text data is accurate and objective. The current speech features include, but are not limited to, prosodic features, psychoacoustic features, spectral features, lexical features, and voiceprint features.
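The front-end chain named above (pre-emphasis, framing, windowing, FFT, logarithm, DCT) can be sketched with numpy. Frame length, hop size, and the pre-emphasis coefficient 0.97 are conventional assumed values, not specified by the patent.

```python
import numpy as np

# Sketch of the speech feature front end: pre-emphasis -> framing ->
# windowing -> FFT -> log -> DCT (cepstral step).

def speech_features(signal: np.ndarray, frame_len=256, hop=128):
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)      # framing + windowing
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # fast Fourier transform
    log_power = np.log(power + 1e-10)                     # logarithm operation
    # DCT-II along the frequency axis, implemented directly:
    n = log_power.shape[1]
    k = np.arange(n)
    dct_basis = np.cos(np.pi / n * (k[:, None] + 0.5) * k[None, :])
    return log_power @ dct_basis                          # cepstral features

sig = np.sin(2 * np.pi * 440 * np.arange(4000) / 8000)    # 0.5 s of a 440 Hz tone
feats = speech_features(sig)
print(feats.shape)  # (30, 129): 30 frames, 129 coefficients per frame
```

A production system would typically keep only the first dozen or so DCT coefficients per frame; the full matrix is kept here to keep the sketch short.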
Before the speech recognition model is used to recognize the current speech features, it needs to be trained, so that the trained model can recognize the client speech data and convert it into the corresponding current text data. Obtaining the current text data with the speech recognition model gives the current text data higher accuracy. The training process of the speech recognition model is specifically as follows: clean historical speech data (containing no background sound) of a plurality of different clients is selected as recognition samples, each recognition sample corresponding to a speech text (which can be understood as the characters corresponding to the historical speech data); the recognition samples are recognized by a speech recognition model with initial parameters to obtain recognition texts, and the initial parameters are adjusted gradually according to the degree of deviation between each recognition text and its speech text. When, after a recognition sample is input into the speech recognition model, the deviation between the output recognition text and the speech text corresponding to that sample can be controlled within a preset threshold, the training of the speech recognition model has succeeded, and the trained model is used to perform speech recognition on the client speech data.
S502: and performing semantic analysis on the current text data by adopting an intention analysis model to obtain an intention analysis result.
The intention analysis model is a classification model adjusted and optimized on the basis of a pre-trained model, so that the pre-trained model analyzes the current text data, determines the client's intention, and forms an intention analysis result, realizing an automatic conversation process. The pre-trained models include, but are not limited to, word2vec, BERT, and GPT models.
The intention analysis model is trained on speech texts and their corresponding intention labels; for example, if the speech text is "what is the weather today", the corresponding intention analysis result is "asking about the weather".
S503: querying a dialog table based on the intention analysis result to obtain the initial answer.
Specifically, different initial answers and the client intentions (i.e., intention analysis results) corresponding to them are configured in the database in advance to form the dialog table. When the intention analysis result is obtained, a matching algorithm queries the dialog table to match the corresponding initial answer to the intention analysis result, and the corresponding initial answer is used to communicate with the client, which improves outbound efficiency, ensures that the outbound call is targeted, and fulfills the purpose of the voice outbound call.
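The dialog-table lookup reduces to a keyed match. In this sketch the table entries and fallback answer are illustrative, and exact dictionary matching stands in for whatever matching algorithm the system actually uses.

```python
# Minimal sketch of S503: map an intention analysis result to a
# pre-configured initial answer via the dialog table.

DIALOG_TABLE = {  # intention analysis result -> initial answer (example entries)
    "asking about the weather": "It looks sunny today; shall we continue?",
    "asking about price": "The monthly fee is shown in the brochure we sent.",
    "not interested": "Understood, thank you for your time.",
}

def initial_answer(intention: str) -> str:
    """Match the intention analysis result against the dialog table;
    fall back to a clarifying prompt if no entry matches."""
    return DIALOG_TABLE.get(intention, "Could you please repeat that?")

print(initial_answer("asking about price"))
print(initial_answer("unrecognized intent"))  # falls back to the clarifying prompt
```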
The call processing method based on the background environment provided by this embodiment recognizes the client speech data with the speech recognition model to obtain the current text data, so that the obtained current text data is accurate and objective. The intention analysis model performs semantic analysis on the current text data to obtain the intention analysis result, realizing an automatic conversation process. The dialog table is queried based on the intention analysis result to obtain the initial answer, the corresponding initial answer is matched to the intention analysis result, and the corresponding initial answer is used to communicate with the client, which improves outbound efficiency, ensures that the outbound call is targeted, and fulfills the purpose of the voice outbound call.
In one embodiment, as shown in fig. 6, step S204, that is, identifying the background sound data to obtain a background environment type, includes:
S601: and extracting the features of the background sound data to obtain the background sound vector features.
The background sound vector features of this embodiment are features for representing the background sound signal and include, but are not limited to, Mel-frequency cepstral coefficients, time-frequency features of the background sound signal, and power features of the background sound.
The Mel-frequency cepstral coefficients are obtained as follows. The background sound data is pre-emphasized to obtain pre-emphasized data, which emphasizes the speech-band content, improves the signal-to-noise ratio, removes the influence of lip radiation, and increases the high-frequency resolution of the speech. The pre-emphasized data is framed to obtain framed data, ensuring smooth transitions of the speech and maintaining continuity. The framed data is windowed to obtain windowed data, which better reflects the frequency characteristics of the short-time signal. The windowed data is processed by FFT to obtain a windowed spectrum, and the windowed spectrum is passed through a Mel filter bank to obtain a Mel spectrum, bringing the data closer to human auditory perception and ensuring subsequent classification accuracy. Finally, cepstral analysis is performed on the Mel spectrum to obtain the Mel-frequency cepstral coefficients.
The time-frequency features of the background sound signal are obtained by using a short-time Fourier transform.
In this embodiment, feature extraction is performed on the background sound data to obtain background sound vector features, so as to ensure that the background sound data, which is originally a voice signal, is converted into a voice feature vector that can be processed by a computer, thereby facilitating subsequent determination of the background environment type.
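The Mel filter-bank stage of the pipeline above can be sketched as follows. The sample rate, FFT size, and filter count are assumed values chosen for illustration; the patent does not fix them.

```python
import numpy as np

# Sketch of the Mel step: power spectrum -> Mel filter bank -> log
# (the cepstral-analysis stage then applies a DCT to these log values).

SR, N_FFT, N_MELS = 8000, 256, 12

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank():
    """Triangular filters spaced evenly on the Mel scale."""
    mels = np.linspace(hz_to_mel(0), hz_to_mel(SR / 2), N_MELS + 2)
    bins = np.floor((N_FFT + 1) * mel_to_hz(mels) / SR).astype(int)
    fb = np.zeros((N_MELS, N_FFT // 2 + 1))
    for i in range(1, N_MELS + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for j in range(left, center):
            fb[i - 1, j] = (j - left) / max(center - left, 1)
        for j in range(center, right):
            fb[i - 1, j] = (right - j) / max(right - center, 1)
    return fb

rng = np.random.default_rng(2)
power_spectrum = rng.random(N_FFT // 2 + 1)        # stand-in windowed spectrum
mel_spectrum = mel_filter_bank() @ power_spectrum  # Mel filter bank
log_mel = np.log(mel_spectrum + 1e-10)             # log step of cepstral analysis
print(log_mel.shape)  # (12,): one value per Mel filter
```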
S602: and identifying the background sound vector characteristics by adopting a background classification model to obtain the background environment type.
In this embodiment, the background classification model is a model for identifying the background sound data and outputting the corresponding background environment type, so that the background environment type is determined intelligently and more reference factors are provided for the intelligent outbound system; the background environment type is then used to formulate the broadcasting policy, improving the success rate of the intelligent outbound call. The background classification model used in this embodiment is a support vector machine model, which can be trained on a small amount of sample data, effectively reducing training time; the algorithm is simple and has good robustness.
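Step S602 can be sketched with a classifier over background sound vector features. To keep the example dependency-free, a nearest-centroid classifier stands in for the patent's support vector machine, and the feature vectors and environment labels are synthetic.

```python
import numpy as np

# Hedged sketch of S602: classify a background-sound feature vector into
# a background environment type. Nearest-centroid is a stand-in for the SVM.

rng = np.random.default_rng(3)

# Synthetic training set: background-sound vector features per environment type.
train = {
    "call_environment":     rng.normal(0.0, 0.3, size=(20, 12)),
    "non_call_environment": rng.normal(2.0, 0.3, size=(20, 12)),
}
centroids = {label: feats.mean(axis=0) for label, feats in train.items()}

def classify_background(vector_features: np.ndarray) -> str:
    """Return the environment type whose centroid is closest to the features."""
    return min(centroids,
               key=lambda lbl: np.linalg.norm(vector_features - centroids[lbl]))

quiet = rng.normal(0.0, 0.3, size=12)   # resembles the call environment
noisy = rng.normal(2.0, 0.3, size=12)   # resembles a high-decibel environment
print(classify_background(quiet), classify_background(noisy))
```

With an actual SVM the training and decision steps change, but the interface is the same: feature vector in, background environment type out, which then feeds the broadcasting-policy decision.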
The call processing method based on the background environment provided by this embodiment performs feature extraction on the background sound data to obtain the background sound vector features, ensuring that the background sound data, originally a sound signal, is converted into a feature vector that a computer can process, which facilitates the subsequent determination of the background environment type. The background classification model identifies the background sound vector features to obtain the background environment type, realizing intelligent determination of the background environment type and providing more reference factors for the intelligent outbound system; the background environment type is then used to formulate the broadcasting policy, improving the success rate of the intelligent outbound call.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
In an embodiment, a call processing apparatus based on a background environment is provided, and the call processing apparatus corresponds one-to-one to the call processing method based on the background environment in the foregoing embodiments. As shown in fig. 7, the call processing apparatus based on the background environment includes a call data obtaining module 701, a separation processing module 702, an analysis processing module 703, an identification processing module 704, a target answer obtaining module 705, and an outbound result obtaining module 706. The functional modules are explained in detail as follows:
a call data obtaining module 701, configured to obtain call data corresponding to the client ID.
And the separation processing module 702 is configured to separate the call data by using a voice separation model to obtain background voice data and client voice data.
The analysis processing module 703 is configured to perform analysis processing on the client voice data, obtain an intention analysis result, and obtain an initial answer based on the intention analysis result.
And the identification processing module 704 is configured to perform identification processing on the background sound data to obtain a background environment type.
A target answer obtaining module 705, configured to, if the background environment type is the call environment type, adjust the initial answer according to the background environment type to obtain the target answer.
And the outbound result obtaining module 706 is configured to determine a broadcasting policy according to the background environment type, broadcast a target answer according to the broadcasting policy, and obtain an outbound result corresponding to the client ID.
Preferably, after the identification processing module 704, the apparatus further comprises: a recommendation information generation module 707.
And a recommendation information generating module 707, configured to generate recommendation information according to the initial answer if the background environment type is a non-call environment type, and send the recommendation information to a mobile phone terminal corresponding to the client ID.
Preferably, after the outbound result obtaining module 706, the apparatus further comprises: the system comprises an updating module, an outbound success frequency generating module, a client ID deleting module and a calling strategy generating module.
And the updating module is used for updating the total outbound times and the successful outbound times corresponding to the background environment type according to the background environment type and the outbound result.
And the outbound success frequency generation module is used for generating the outbound success frequency corresponding to the client ID according to the total outbound times and the outbound success times.
And the client ID deleting module is used for deleting the client ID from the calling list if the total outbound frequency is greater than the preset frequency and the outbound success frequency is less than the preset frequency.
And the calling strategy generating module is used for generating a calling strategy corresponding to the client ID according to the background environment type and recording the calling strategy in a calling list if the total outbound frequency is not more than the preset frequency or the outbound success frequency is not less than the preset frequency.
Preferably, the separation processing module 702 includes: the system comprises a preprocessing unit, a client voice data acquisition unit and a background sound data acquisition unit.
And the preprocessing unit is used for preprocessing the call data to obtain a mixed voice vector.
And the client voice data acquisition unit is used for carrying out client voice filtering processing on the mixed voice vector by adopting the first full connection layer to obtain client voice data.
And the background sound data acquisition unit is used for performing background sound filtering processing on the mixed voice vector by adopting a second full-connection layer to obtain background sound data.
Preferably, the analysis processing module 703 includes a current text data obtaining unit, an intention analysis result obtaining unit, and an initial answer obtaining unit.
And the current text data acquisition unit is used for identifying the voice data of the client by adopting the voice identification model to acquire the current text data.
And the intention analysis result acquisition unit is used for performing semantic analysis on the current text data by adopting an intention analysis model to acquire an intention analysis result.
And the initial answer obtaining unit is used for inquiring the answer table based on the intention analysis result to obtain the initial answer.
Preferably, the recognition processing module 704 includes a feature extraction unit and a recognition unit.
And the feature extraction unit is used for extracting features of the background sound data to acquire background sound vector features.
And the identification unit is used for identifying the background sound vector characteristics by adopting the background classification model and acquiring the background environment type.
For the specific definition of the call processing device based on the background environment, reference may be made to the above definition of the call processing method based on the background environment, which is not described herein again. The various modules in the context-based call processing apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 8. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device stores a call list. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a context based call processing method.
In an embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the call processing method based on the background environment in the foregoing embodiments are implemented, for example, steps S201 to S206 shown in fig. 2 or steps shown in fig. 3 to fig. 6, which are not described herein again to avoid repetition. Alternatively, the processor implements the functions of each module/unit in the call processing apparatus based on the background environment when executing the computer program, for example, the functions of the call data obtaining module 701, the separation processing module 702, the analysis processing module 703, the identification processing module 704, the target answer call obtaining module 705 and the outbound result obtaining module 706 shown in fig. 7, and are not described herein again to avoid repetition.
In an embodiment, a computer-readable storage medium is provided, where a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the call processing method based on the background environment in the foregoing embodiments, such as steps S201 to S206 shown in fig. 2 or steps shown in fig. 3 to fig. 6, which are not described herein again to avoid repetition. Alternatively, the processor implements the functions of each module/unit in the call processing apparatus based on the background environment when executing the computer program, for example, the functions of the call data obtaining module 701, the separation processing module 702, the analysis processing module 703, the identification processing module 704, the target answer call obtaining module 705 and the outbound result obtaining module 706 shown in fig. 7, and are not described herein again to avoid repetition.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method for call processing based on a background environment, comprising:
acquiring call data corresponding to a client ID;
separating the call data by adopting a voice separation model to obtain background voice data and client voice data;
analyzing and processing the client voice data to obtain an intention analysis result, and obtaining an initial answer based on the intention analysis result;
identifying the background sound data to obtain a background environment type;
if the background environment type is a call environment type, adjusting the initial answer according to the background environment type to obtain a target answer;
and determining a broadcast strategy according to the background environment type, broadcasting the target answer according to the broadcast strategy, and acquiring an outbound result corresponding to the client ID.
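The six claimed steps can be sketched end to end as a short pipeline. This is an illustrative sketch only: the `process_call` helper, the model callables, the environment labels, and the script table are hypothetical stand-ins, not part of the claim.

```python
# Hypothetical sketch of the claimed call-processing flow. All names
# (separator, intent, background, "quiet", "street", ...) are invented
# placeholders for the models and types named in the claim.

def process_call(call_data, client_id, models, script_table):
    # client_id is kept only for parity with the claim's "call data
    # corresponding to a client ID"; it is not used in this toy sketch.
    # Step 2: split the recording into background sound and client speech.
    background, client_speech = models["separator"](call_data)
    # Step 3: derive the client's intent and look up an initial answer script.
    intent = models["intent"](client_speech)
    answer = script_table[intent]
    # Step 4: classify the acoustic background environment.
    env_type = models["background"](background)
    # Step 5: in a noisy call environment, adapt the script before broadcast.
    if env_type != "quiet":
        answer = f"{answer} (spoken slowly, key figures repeated)"
    # Step 6: choose a broadcast strategy keyed on the environment type.
    strategy = {"quiet": "normal", "street": "loud_slow"}.get(env_type, "normal")
    return answer, strategy
```

With stubbed-out models, `process_call(b"raw", "C1", models, table)` returns the adapted script together with the chosen strategy label.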
2. The call processing method based on the background environment according to claim 1, wherein after the background environment type is obtained, the method further comprises:
and if the background environment type is a non-call environment type, generating recommendation information according to the initial answer, and sending the recommendation information to a mobile phone terminal corresponding to the client ID.
3. The call processing method based on the background environment according to claim 1, wherein after the target answer is broadcast according to the broadcast strategy and the outbound result corresponding to the client ID is acquired, the method further comprises:
updating a total outbound count and a successful outbound count corresponding to the background environment type according to the background environment type and the outbound result;
generating an outbound success rate corresponding to the client ID from the total outbound count and the successful outbound count;
if the total outbound count is greater than a preset count and the outbound success rate is less than a preset rate, deleting the client ID from a call list;
and if the total outbound count is not greater than the preset count or the outbound success rate is not less than the preset rate, generating a call strategy corresponding to the client ID according to the background environment type and recording the call strategy in the call list.
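The bookkeeping in this claim can be sketched as follows. The counter dictionary, the `min_total` and `min_success_rate` thresholds, and the `"retry"` strategy label are assumptions for illustration; the claim specifies only the comparisons against preset values, not concrete numbers.

```python
# Illustrative sketch of the outbound bookkeeping in claim 3.
# stats holds per-environment counters; call_list maps client IDs to
# their recorded call strategies. Thresholds are invented defaults.

def update_call_list(stats, call_list, client_id, env_type, outbound_ok,
                     min_total=5, min_success_rate=0.2):
    # Update the counters with the latest outbound result.
    stats["total"] += 1
    if outbound_ok:
        stats["success"] += 1
    success_rate = stats["success"] / stats["total"]
    if stats["total"] > min_total and success_rate < min_success_rate:
        # Repeatedly unsuccessful client: drop from the outbound call list.
        call_list.pop(client_id, None)
    else:
        # Otherwise record an environment-aware strategy for the next attempt.
        call_list[client_id] = {"env": env_type, "strategy": "retry"}
    return success_rate
```

A client whose calls keep failing past the thresholds is removed from the list; any other outcome records a strategy keyed on the observed background environment.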
4. The call processing method based on the background environment according to claim 1, wherein the separating the call data by using the voice separation model to obtain the background sound data and the client voice data comprises:
preprocessing the call data to obtain a mixed voice vector;
performing client voice filtering on the mixed voice vector by using a first fully connected layer to obtain the client voice data;
and performing background sound filtering on the mixed voice vector by using a second fully connected layer to obtain the background sound data.
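Assuming the two fully connected layers are plain linear projections over a fixed-size mixed voice vector, the separation step might look like the sketch below. The feature dimension, the `tanh` activation, and the random weights (standing in for a trained separation model) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT = 128  # assumed dimension of the preprocessed mixed voice vector

# First fully connected layer: filters the mixture to estimate client voice.
W_client = rng.standard_normal((FEAT, FEAT)) / np.sqrt(FEAT)
# Second fully connected layer: filters the mixture to estimate background sound.
W_background = rng.standard_normal((FEAT, FEAT)) / np.sqrt(FEAT)

def separate(mixed_vec):
    client = np.tanh(mixed_vec @ W_client)          # client voice branch
    background = np.tanh(mixed_vec @ W_background)  # background sound branch
    return background, client

mixed = rng.standard_normal(FEAT)  # stands in for preprocessed call data
background, client = separate(mixed)
```

In a trained model the two weight matrices would be learned so that each branch suppresses the other signal; here they merely show the two-branch structure the claim describes.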
5. The call processing method based on the background environment according to claim 1, wherein the analyzing the client voice data to obtain the intention analysis result, and obtaining the initial answer based on the intention analysis result, comprises:
recognizing the client voice data by using a voice recognition model to obtain current text data;
performing semantic analysis on the current text data by using an intention analysis model to obtain the intention analysis result;
and querying a dialog table based on the intention analysis result to obtain the initial answer.
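The recognition, analysis, and lookup chain can be stubbed as below. `asr_stub`, `intent_stub`, and the contents of `DIALOG_TABLE` are invented placeholders for the voice recognition model, the intention analysis model, and the dialog table named in the claim.

```python
# Stubbed sketch of claim 5's three-stage chain: voice -> text -> intent -> answer.

def asr_stub(client_voice):
    # Placeholder for the voice recognition model; returns current text data.
    return "when is my repayment due"

def intent_stub(text):
    # Placeholder for the intention analysis model; returns an intent label.
    return "ask_due_date" if "due" in text else "other"

# A plain dictionary is one simple realization of the claimed dialog table.
DIALOG_TABLE = {
    "ask_due_date": "Your next installment is due on the 15th.",
    "other": "Could you please describe your question in more detail?",
}

def initial_answer(client_voice):
    text = asr_stub(client_voice)   # voice -> current text data
    intent = intent_stub(text)      # text -> intention analysis result
    return DIALOG_TABLE[intent]     # intent -> initial answer
```

Swapping the stubs for real models leaves the lookup structure unchanged: the dialog table is keyed solely on the intent label.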
6. The method of claim 1, wherein the identifying the background sound data to obtain the background environment type comprises:
performing feature extraction on the background sound data to obtain a background sound feature vector;
and identifying the background sound feature vector by using a background classification model to obtain the background environment type.
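A minimal sketch of feature extraction followed by classification, assuming hand-crafted features (RMS energy and zero-crossing rate) and a simple threshold rule in place of the trained background classification model; the feature choice, threshold, and environment labels are all illustrative assumptions.

```python
import numpy as np

def extract_features(samples):
    # Two toy features computed from raw background audio samples.
    samples = np.asarray(samples, dtype=float)
    rms = np.sqrt(np.mean(samples ** 2))                  # loudness
    zcr = np.mean(np.abs(np.diff(np.sign(samples))) > 0)  # noisiness proxy
    return np.array([rms, zcr])

def classify_background(samples, rms_threshold=0.3):
    # Toy decision rule standing in for the background classification model:
    # a loud background maps to "street", otherwise "quiet".
    rms, _ = extract_features(samples)
    return "street" if rms > rms_threshold else "quiet"
```

A production system would feed the feature vector to a trained classifier; the two-step shape (feature vector in, environment type out) matches the claim either way.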
7. A call processing apparatus based on a background environment, comprising:
the call data acquisition module is used for acquiring call data corresponding to the client ID;
the separation processing module is used for separating the call data by adopting a voice separation model to obtain background voice data and client voice data;
the analysis processing module is used for analyzing the client voice data to obtain an intention analysis result and obtaining an initial answer based on the intention analysis result;
the recognition processing module is used for recognizing the background sound data to obtain a background environment type;
the target answer obtaining module is used for adjusting the initial answer according to the background environment type to obtain a target answer if the background environment type is a call environment type;
and the outbound result acquisition module is used for determining a broadcast strategy according to the background environment type, broadcasting the target answer according to the broadcast strategy, and acquiring an outbound result corresponding to the client ID.
8. The call processing apparatus based on the background environment according to claim 7, further comprising:
a recommendation information generation module, used for generating recommendation information according to the initial answer if the background environment type is a non-call environment type, and sending the recommendation information to a mobile phone terminal corresponding to the client ID.
9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the call processing method based on the background environment according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the call processing method based on the background environment according to any one of claims 1 to 6.
CN202011623437.2A 2020-12-30 2020-12-30 Call processing method, device, equipment and storage medium based on background environment Pending CN112839137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011623437.2A CN112839137A (en) 2020-12-30 2020-12-30 Call processing method, device, equipment and storage medium based on background environment


Publications (1)

Publication Number Publication Date
CN112839137A true CN112839137A (en) 2021-05-25

Family

ID=75924269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011623437.2A Pending CN112839137A (en) 2020-12-30 2020-12-30 Call processing method, device, equipment and storage medium based on background environment

Country Status (1)

Country Link
CN (1) CN112839137A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104378485A (en) * 2014-11-28 2015-02-25 小米科技有限责任公司 Volume adjustment method and volume adjustment device
CN108076230A (en) * 2017-12-28 2018-05-25 努比亚技术有限公司 In Call adjusting method, mobile terminal and computer readable storage medium
CN109819375A (en) * 2019-01-11 2019-05-28 平安科技(深圳)有限公司 Adjust method and apparatus, storage medium, the electronic equipment of volume
US20190227767A1 (en) * 2016-09-27 2019-07-25 Huawei Technologies Co., Ltd. Volume Adjustment Method and Terminal
CN110113497A (en) * 2019-04-12 2019-08-09 深圳壹账通智能科技有限公司 Voice calling-out method, device and terminal based on interactive voice
WO2019196312A1 (en) * 2018-04-10 2019-10-17 平安科技(深圳)有限公司 Method and apparatus for adjusting sound volume by robot, computer device and storage medium
CN111028827A (en) * 2019-12-10 2020-04-17 深圳追一科技有限公司 Interaction processing method, device, equipment and storage medium based on emotion recognition


Similar Documents

Publication Publication Date Title
CN111028827B (en) Interaction processing method, device, equipment and storage medium based on emotion recognition
CN111105782B (en) Session interaction processing method and device, computer equipment and storage medium
CN109451188B (en) Method and device for differential self-help response, computer equipment and storage medium
CN112037799B (en) Voice interrupt processing method and device, computer equipment and storage medium
CN109065052B (en) Voice robot
CN106847305B (en) Method and device for processing recording data of customer service telephone
CN107818798A (en) Customer service quality evaluating method, device, equipment and storage medium
CN110661927A (en) Voice interaction method and device, computer equipment and storage medium
CN113314119B (en) Voice recognition intelligent household control method and device
CN110766442A (en) Client information verification method, device, computer equipment and storage medium
CN114818649A (en) Service consultation processing method and device based on intelligent voice interaction technology
CN112201275A (en) Voiceprint segmentation method, voiceprint segmentation device, voiceprint segmentation equipment and readable storage medium
CN116631412A (en) Method for judging voice robot through voiceprint matching
CN115050372A (en) Audio segment clustering method and device, electronic equipment and medium
CN112087726B (en) Method and system for identifying polyphonic ringtone, electronic equipment and storage medium
CN112102807A (en) Speech synthesis method, apparatus, computer device and storage medium
EP4093005A1 (en) System method and apparatus for combining words and behaviors
CN112839137A (en) Call processing method, device, equipment and storage medium based on background environment
CN116016779A (en) Voice call translation assisting method, system, computer equipment and storage medium
CN113506565B (en) Speech recognition method, device, computer readable storage medium and processor
CN115985320A (en) Intelligent device control method and device, electronic device and storage medium
CN113593580B (en) Voiceprint recognition method and device
CN113111157B (en) Question-answer processing method, device, computer equipment and storage medium
CN110765242A (en) Method, device and system for providing customer service information
CN111613226B (en) Voice interaction method and device and test robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination