CN112714221A - Federated intelligent voice detection method, system, and related device - Google Patents

Federated intelligent voice detection method, system, and related device Download PDF

Info

Publication number
CN112714221A
CN112714221A (application CN202011553435.0A)
Authority
CN
China
Prior art keywords
federal
audio data
customer service
reasoning
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011553435.0A
Other languages
Chinese (zh)
Inventor
孔令炜
王健宗
黄章成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202011553435.0A
Publication of CN112714221A
Legal status: Pending (Current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 3/00 Automatic or semi-automatic exchanges
    • H04M 3/42 Systems providing special services or facilities to subscribers
    • H04M 3/50 Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M 3/51 Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M 3/5175 Call or contact centers supervision arrangements
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • G10L 15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Abstract

The application provides a federated intelligent voice detection method, system, and related device. The method comprises the following: the audio of the call between a customer service agent and a customer is transmitted to a federal intelligent center; the federal intelligent center encrypts the audio and transmits it to a federal inference engine in the cloud; and the federal inference engine performs speech recognition and quality inspection evaluation on the audio data based on a neural network model and determines whether a violation occurs in the call, thereby achieving voice detection for online customer service.

Description

Federated intelligent voice detection method, system, and related device
Technical Field
The application relates to the field of customer service quality inspection, and in particular to a federated intelligent voice detection method, system, and related device.
Background
Telephone customer service has long been present in many industries and fields, mainly to give customers a better experience and service and to solve the problems users encounter when using a service. With the development and progress of the internet, telephone customer service has gradually evolved into online customer service, and the types of services provided keep increasing. Customer service quality inspection aims to improve customer satisfaction and perfect customer service while also evaluating the work of customer service staff.
The method commonly used in the industry at present performs quality inspection on call recordings based on speech recognition and semantic analysis technologies, but this mode of quality inspection suffers from poor real-time performance of system updates, poor quality inspection results, weak data privacy protection, and other problems.
Disclosure of Invention
The application provides a federated intelligent voice detection method, system, and related device, which can monitor and detect a call between a customer service agent and a customer while the call is in progress, determine whether either party commits a violation during the call, and handle the violation.
In a first aspect, the present application provides a voice detection method, which is applied to a system comprising a federal intelligent center, a federal inference engine and a customer service system background. The federal intelligent center obtains first audio data, converts the first audio data into a ciphertext signal and sends it to the federal inference engine, wherein the first audio data is audio data of a customer service agent or a customer; the federal inference engine converts the ciphertext signal back into the first audio data, performs speech recognition and quality inspection evaluation on the first audio data to obtain an inference result, and encrypts and sends the inference result to the federal intelligent center, the inference result indicating whether the customer service agent or the customer has committed a violation; the federal intelligent center decrypts the data to obtain the inference result and sends it to the customer service system background; and the customer service system background performs processing according to the inference result.
With this voice detection method based on federal intelligence, the audio of the call between the customer service agent and the customer undergoes speech recognition and quality inspection evaluation and is handled promptly when a violation occurs, so that online customer service is monitored and detected while data security is guaranteed.
In one possible implementation manner, before the first audio data is converted into the ciphertext signal, the federal intelligent center performs voice enhancement processing on the first audio data.
The environment of the customer service agent and the customer is likely to suffer outside interference, which degrades recording quality and manifests as background noise. By performing speech enhancement on the received call recording, the federal intelligent center can reduce background noise, improve voice quality, and improve the accuracy of subsequent speech recognition.
In one possible implementation, before the federal intelligent center acquires the first audio data, the federal inference engine trains a neural network model based on training data, and the model parameters of the trained neural network model are transmitted, encrypted, to a federal center node; the federal inference engine receives aggregation parameters sent by the federal center node, and trains the neural network model according to the training data and the aggregation parameters to obtain a trained neural network model; the aggregation parameters are obtained by the federal center node aggregating the model parameters sent by a plurality of participants, and the plurality of participants include the federal inference engine.
Training the neural network model through federated learning guarantees the security of each party's data and reveals no private data; at the same time, joint training on multi-party data expands the data set and improves the accuracy of the recognition and analysis results.
In one possible implementation, the first audio data is subjected to speech recognition and quality inspection evaluation. The federal inference engine decrypts the ciphertext signal to obtain the first audio data, and the neural network model performs speech recognition and quality inspection evaluation on the first audio data to obtain the inference result. The first audio data is processed using a neural network model built through federated-learning modeling to obtain the text output and semantic analysis of the agent's and customer's speech.
In one possible implementation, the quality inspection evaluation includes performing prescribed-wording recognition, emotion-sensitive word recognition, and prohibited-word recognition on the speech recognition result of the first audio data.
According to preset rules, keywords appearing in the call are identified: prescribed-wording recognition determines whether the customer service agent uses the prescribed wording, emotion-sensitive word recognition yields the emotion parameters of the call parties, and prohibited-word recognition determines whether illegal or non-compliant expressions appear in the call.
In one possible implementation, the customer service system background performs processing according to the inference result: when the customer service agent or the customer commits a violation, the customer service system background cuts off the call between the agent and the customer.
When the inference result indicates that a violation has occurred in the call between the customer service agent and the customer, the customer service system background cuts off the current call to prevent the violation from continuing.
In a second aspect, the present application provides a voice detection system, which comprises a federal intelligent center, a federal inference engine, and a customer service system background. The federal intelligent center is used for obtaining first audio data, converting the first audio data into a ciphertext signal and sending it to the federal inference engine, wherein the first audio data is audio data of a customer service agent or a customer; the federal inference engine is used for converting the ciphertext signal back into the first audio data, performing speech recognition and quality inspection evaluation on the first audio data to obtain an inference result, and encrypting and sending the inference result to the federal intelligent center, the inference result indicating whether the customer service agent or the customer has committed a violation; the federal intelligent center decrypts the data to obtain the inference result and sends it to the customer service system background; and the customer service system background is used for performing processing according to the inference result.
In a third aspect, the present application provides a server device, including a module configured to perform operations as performed by the federal intelligent center in the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, the present application provides a server device comprising means for performing operations as performed by the federated inference engine in the first aspect or any possible implementation of the first aspect.
In a fifth aspect, the present application provides a computer storage medium storing a computer program which, when executed by a processor, implements a method as set forth in the first aspect or any possible implementation manner of the first aspect.
Drawings
FIG. 1 is a diagram of a federated learning model provided in the prior art;
fig. 2 is a flowchart of a federated intelligent voice detection method provided in the embodiment of the present application;
FIG. 3 is a schematic diagram of a method for training a neural network model by a federated intelligence engine according to an embodiment of the present application;
fig. 4A is a schematic diagram of a federated intelligent voice detection system according to an embodiment of the present application;
FIG. 4B is a schematic diagram of another federated intelligent voice detection system provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a server provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
First, an application scenario of the embodiments of the present application is described. Customer service quality inspection performs online detection of the conversation between a customer and a customer service agent to prevent either party from violating company regulations or national laws and regulations during the service process. With the development of technology, online customer service has diversified and can roughly be divided into text, telephone, and video customer service. Except for text customer service, the common quality inspection approach today stores the recordings of calls between the customer and the agent and inspects the call content after the call has ended; violations by the agent therefore cannot be detected in real time, and real-time performance is poor. For today's varied forms of online customer service, the traditional quality inspection system no longer meets service requirements.
The embodiment of the application establishes an intelligent voice detection system based on a federal intelligent center: the call audio of both the customer service agent and the customer is transmitted to the federal intelligent center in real time, a federal voice inference engine performs speech recognition and semantic analysis, the customer service process is detected in real time, and violations can be handled promptly when they occur during the call.
In order to facilitate understanding of the technical solutions of the present application, some terms related to the present application are explained below. It is worthy to note that the terminology used in the description of the embodiments section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
Federated learning is a machine learning framework that helps organizations use data and build machine learning models while meeting requirements of user privacy protection, data security and government regulation. Traditional deep learning methods cannot train on fragmented data samples; federated learning provides a training mode in which data remains local and invisible to other parties, solving the data-silo problem.
Referring to fig. 1, fig. 1 is a schematic diagram of a federated learning model. Each federated-learning modeling participant trains the model using local data and returns the parameters to be updated to a central server; the central server aggregates the parameters returned by each party and feeds the latest model parameters back to each party. The training process typically includes the following four steps:
Step one: each participant computes the training gradient locally and sends the encrypted gradient to the central server using an encryption sharing technique;
Step two: the central server securely aggregates the encrypted gradients sent by the participants without learning any participant's specific data;
Step three: the central server sends the securely aggregated result back to the participants;
Step four: the participants decrypt the result to obtain the aggregated gradient and update their respective models.
Throughout this process, all participants hold the same complete model, do not communicate with one another, and remain independent, which ensures the independence and privacy of the data; at prediction time, each model can make predictions independently.
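The four steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the patent's implementation: the model is a toy linear regressor, the participants' data are random placeholders, and the `encrypt`/`decrypt` helpers are hypothetical stand-ins for the encryption sharing technology the text assumes.

```python
import numpy as np

def encrypt(grad):
    return grad        # placeholder for the encryption sharing technology (assumption)

def decrypt(agg):
    return agg         # placeholder for participant-side decryption (assumption)

def local_gradient(weights, X, y):
    """Step one: a participant computes the gradient of a toy linear model on its local data."""
    residual = X @ weights - y
    return X.T @ residual / len(y)

def secure_aggregate(encrypted_grads):
    """Steps two and three: the central server averages the encrypted gradients and returns the result."""
    return sum(encrypted_grads) / len(encrypted_grads)

# Toy setup: three participants, each holding private data, sharing one model.
rng = np.random.default_rng(0)
weights = np.zeros(4)
participants = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(3)]

for _ in range(50):
    encrypted = [encrypt(local_gradient(weights, X, y)) for X, y in participants]  # step one
    aggregated = secure_aggregate(encrypted)                                       # steps two and three
    weights -= 0.1 * decrypt(aggregated)                                           # step four: local update
```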
The federal intelligent voice detection method in the embodiment of the present application is described below. Referring to fig. 2, fig. 2 is a flowchart of a federated intelligent voice detection method provided in the embodiment of the present application.
S201, the federal intelligent center obtains first audio data.
The customer service agent and the customer establish a call in the customer service system background, generating an audio data stream of voice information, for example the agent saying "Good afternoon, how may I help you?" and the customer saying "Hello, I would like to …". During this process, the customer service system background records the audio data of the agent or the customer in real time, and the federal intelligent center obtains the audio data of the agent or the customer as the first audio data.
The customer service system background can monitor calls between multiple groups of agents and customers at the same time, and the federal intelligent center can simultaneously receive multiple audio data streams from multiple customer service system backgrounds.
S202, the federal intelligent center processes and encrypts the first audio data to obtain ciphertext signals, and the ciphertext signals are pushed to a federal inference engine.
After the federal intelligent center obtains the first audio data, the first audio data is encrypted using a chaotic encryption algorithm based on the Logistic map.
x_{n+1} = μ * x_n * (1 - x_n),  n = 0, 1, …, N    (1)
Equation (1) generates a chaotic sequence using the Logistic function. Let N be the number of feature points of the current first audio data that need to be encrypted, n the index of the most recently encrypted feature point, n + 1 the index of the feature point to be encrypted next, and x_n the encryption base used for the feature point with index n. Using this equation requires setting two values, an initial value x_0 and an iterative chaos factor μ; preferably, when 0 < x_0 < 1 and 3.6 ≤ μ ≤ 4, the Logistic function performs well.
Iterating formula (1) n times yields x_1, x_2, …, x_n, a total of n values, i.e. the chaotic sequence. The first audio data is XORed with the chaotic sequence to obtain the ciphertext signal of the current audio data, and the ciphertext signal and the public key are pushed to the federal inference engine.
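A minimal sketch of this encryption step is given below, assuming the "feature points" are the raw audio bytes and that each chaotic value is quantized to one key byte before the XOR; the key values and the quantization scheme are illustrative choices, not values specified in the text.

```python
import numpy as np

def logistic_keystream(x0: float, mu: float, n: int) -> np.ndarray:
    """Iterate x_{k+1} = mu * x_k * (1 - x_k) n times and quantize each value to a byte."""
    assert 0.0 < x0 < 1.0 and 3.6 <= mu <= 4.0   # preferred ranges from the text
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = mu * x * (1.0 - x)
        xs[i] = x
    return (xs * 256).astype(np.uint8)            # one key byte per chaotic value (assumption)

def chaotic_xor(audio_bytes: bytes, x0: float = 0.37, mu: float = 3.99) -> bytes:
    """XOR the audio bytes with the chaotic sequence; (x0, mu) acts as the key."""
    keystream = logistic_keystream(x0, mu, len(audio_bytes))
    samples = np.frombuffer(audio_bytes, dtype=np.uint8)
    return np.bitwise_xor(samples, keystream).tobytes()

# XOR is its own inverse, so applying the same keystream again decrypts:
cipher = chaotic_xor(b"first audio data")
assert chaotic_xor(cipher) == b"first audio data"
```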
Optionally, before the first audio data is encrypted, it may be subjected to speech enhancement processing. The environment of the customer service agent and the customer is likely to suffer outside interference, which degrades the quality of the first audio data and manifests as background noise. The federal intelligent center performs speech enhancement on the first audio data to suppress and reduce noise interference; commonly used speech enhancement algorithms include spectral subtraction and adaptive noise cancellation. The federal intelligent center then performs an XOR operation between the chaotic sequence generated by the Logistic function and the first audio data to obtain the encrypted chaotic voice sequence of the current first audio data, forming the ciphertext signal.
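For illustration, a rough spectral-subtraction sketch follows, assuming the first few frames of the recording contain only background noise; the frame length, hop size and noise-frame count are illustrative parameters, not values given in the text.

```python
import numpy as np

def spectral_subtraction(signal: np.ndarray, frame_len: int = 512,
                         hop: int = 256, noise_frames: int = 10) -> np.ndarray:
    """Subtract an estimated noise magnitude spectrum from every frame of a float signal."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len, hop)]
    spectra = np.array([np.fft.rfft(f) for f in frames])
    noise_mag = np.abs(spectra[:noise_frames]).mean(axis=0)   # noise estimate from leading frames
    mag = np.maximum(np.abs(spectra) - noise_mag, 0.0)        # subtract and floor at zero
    cleaned = mag * np.exp(1j * np.angle(spectra))            # keep the noisy phase
    out = np.zeros(len(frames) * hop + frame_len)             # overlap-add resynthesis
    for k, spec in enumerate(cleaned):
        out[k * hop:k * hop + frame_len] += np.fft.irfft(spec, n=frame_len)
    return out[:len(signal)]
```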
S203, the federal inference engine decrypts the ciphertext signal to obtain the first audio data, completes speech recognition and quality inspection evaluation to obtain an inference result, and encrypts and returns the inference result to the federal intelligent center.
After the cloud-based federal inference engine obtains the ciphertext signal and the public key pushed by the federal intelligent center, it decrypts them through the reverse process of the chaotic algorithm to recover the original first audio data. The federal inference engine then performs speech recognition and quality inspection evaluation on the original first audio data using a jointly modeled neural network model that fuses multiple parties' data, obtaining the inference result.
Specifically, as shown in fig. 3, the federal inference engine trains a neural network model based on training data and transmits the model parameters of the trained neural network model, encrypted, to the federal central node; the federal central node aggregates the model parameters sent by a plurality of participants; and the federal inference engine receives the aggregation parameters sent by the federal central node and trains the neural network model according to the training data and the aggregation parameters to obtain the trained neural network model. Each federal inference engine is one participant.
Speech recognition comprises feature extraction and pattern matching. Feature extraction applies pre-emphasis, framing, windowing, the Fast Fourier Transform (FFT), a Mel filter bank, a logarithm operation, and the Discrete Cosine Transform (DCT) to obtain 12-dimensional Mel-Frequency Cepstral Coefficient (MFCC) features. Pattern matching labels the MFCC features with phonemes, uses a dictionary to generate a character sequence from the MFCC features, and then outputs the character sequence with the highest probability according to language probability statistics; this output character sequence is the speech recognition result.
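A sketch of this MFCC front end is shown below using the librosa library, which is an assumed dependency (the text names no library); the 16 kHz sample rate and the 25 ms / 10 ms framing are common defaults chosen for illustration, and the phoneme labeling and language-model decoding stage is omitted.

```python
import librosa
import numpy as np

def extract_mfcc(wav_path: str) -> np.ndarray:
    """Return 12-dimensional MFCC features, one row per frame."""
    y, sr = librosa.load(wav_path, sr=16000)           # 16 kHz sample rate (assumption)
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])         # pre-emphasis
    # Framing, windowing, FFT, Mel filter bank, log and DCT happen inside librosa.feature.mfcc.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12,
                                n_fft=400, hop_length=160)   # 25 ms frames, 10 ms hop
    return mfcc.T                                      # shape: (num_frames, 12)
```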
The quality inspection evaluation includes performing prescribed-wording recognition, emotion-sensitive word recognition, and prohibited-word recognition on the speech recognition result of the first audio data, and determining whether the customer or the customer service agent has committed a violation. When the prescribed wording is not identified, a violation is judged to exist and the prescribed wording missed by the agent is recorded; when emotion-sensitive words or prohibited words are identified, a violation is judged to exist and the identified emotion-sensitive words and prohibited words are recorded.
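The three checks can be sketched as simple keyword matching over the recognized text; the word lists below are illustrative placeholders, since the actual prescribed wording and prohibited vocabulary are business-specific rules not given in the text.

```python
# Illustrative word lists; a real deployment would load business-specific rules.
PRESCRIBED = ["good afternoon", "how may i help you", "is there anything else"]
SENSITIVE = ["angry", "complaint", "useless"]
PROHIBITED = ["shut up", "scam"]

def quality_inspect(agent_text: str, customer_text: str) -> dict:
    """Run the three keyword checks on recognized text and flag violations."""
    agent, customer = agent_text.lower(), customer_text.lower()
    missed = [p for p in PRESCRIBED if p not in agent]             # prescribed wording the agent skipped
    sensitive = [w for w in SENSITIVE if w in agent or w in customer]
    prohibited = [w for w in PROHIBITED if w in agent or w in customer]
    return {
        "violation": bool(missed or sensitive or prohibited),
        "missed_prescribed": missed,
        "sensitive_words": sensitive,
        "prohibited_words": prohibited,
    }

# Example: quality_inspect("Good afternoon, how may I help you?", "I want to file a complaint")
# flags a violation because one prescribed phrase is missing and a sensitive word appears.
```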
The federal inference engine encrypts and returns the inference result to the federal intelligent center, the inference result indicating whether a violation exists in the call between the customer and the customer service agent.
S204, after the federal intelligent center decrypts and obtains the inference result, the inference result is pushed to the customer service system background for corresponding processing.
After receiving the encrypted feedback from the federal inference engine, the federal intelligent center decrypts it to obtain the inference result and pushes the inference result to the customer service system background. The customer service system background performs corresponding processing according to the quality inspection evaluation in the inference result.
If a violation has occurred in the call between the customer service agent and the customer, the current call is cut off immediately to prevent the violation from continuing. If no violation has occurred, or a violation has occurred but the call is allowed to continue, the speech recognition result is pushed to the front end for display, and the customer service system background prompts, according to the quality inspection result, whether the agent has missed any problem points or prescribed wording, displays the agent's non-compliant phrasing in real time, and reminds the agent to avoid it in the next call.
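A minimal sketch of this handling logic is given below; the `session` object, its `cut_off` and `display` methods, and the fields of the inference result are assumed names used only for illustration.

```python
def handle_inference_result(session, result: dict, allow_continue: bool = False) -> None:
    """Process one inference result pushed from the federal intelligent center."""
    if result.get("violation") and not allow_continue:
        session.cut_off()   # cut off the current call so the violation cannot continue
        return
    # No violation, or the call is allowed to continue: push the recognition
    # text and quality-inspection prompts to the agent's front end.
    session.display(transcript=result.get("transcript", ""),
                    missed_prescribed=result.get("missed_prescribed", []),
                    prohibited_words=result.get("prohibited_words", []))
```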
Steps S201-S204 are repeated to continuously monitor the call between the customer service agent and the customer until the call ends, and the complete recording of the call is stored.
In addition to the above embodiments, the present application can also support offline quality inspection as a subsequent auxiliary evaluation scheme. In one possible implementation, recordings from before a preset period of time are transmitted back to the federal intelligent center for speech enhancement processing, and the processed audio is sent, encrypted, to the federal inference engine.
Since real-time speech recognition usually transmits speech over UDP, recording-quality problems such as speed-up, stuttering and missing segments easily arise. Using the stored, returned recordings for offline re-inspection effectively mitigates the problems caused by real-time UDP transmission. Meanwhile, the environment of the customer service agent and the customer may also suffer outside interference, which likewise degrades recording quality and manifests as background noise. The federal intelligent center performs speech enhancement on the received call recording to suppress and reduce noise interference; commonly used speech enhancement algorithms include spectral subtraction and adaptive noise cancellation.
Speech enhancement reduces background noise, improves voice quality, and improves the accuracy of subsequent speech recognition. The federal intelligent center completes speech recognition through the federal inference engine, analyzes the result, and generates a report. The corresponding recognition results and analysis report are transmitted back to the customer service system background, which reviews the results and the report to find problems and loopholes.
The federal intelligent voice detection system provided in the embodiment of the present application is introduced below, and is used to implement the federal intelligent voice detection method provided in the embodiment of the present application. Referring to fig. 4A, fig. 4A is a schematic diagram of a federated intelligent voice detection system provided in the embodiment of the present application.
The federal intelligent voice detection system comprises a customer service system background 401, a federal intelligent center 402 and a federal inference engine 403. The customer service system background 401 is used for monitoring the call between the customer and the customer service agent; the federal intelligent center 402 is used for the encrypted transmission of the first audio data; and the federal inference engine 403 is located in the cloud and is used for performing speech recognition and quality inspection evaluation on the first audio data in combination with the multi-party speech recognition model.
Referring to fig. 4B, customer service system backgrounds 401A, 401B and 401C can simultaneously monitor calls between multiple groups of agents and customers. The federal intelligent center 402 can simultaneously receive multiple audio data streams from multiple customer service system backgrounds, such as customer service system backgrounds 401A, 401B and 401C, and sends the multiple audio data streams to the federal inference engine 403A. Multiple federal inference engines, such as federal inference engines 403A, 403B and 403C, are deployed in the cloud.
The federal intelligent center 402 obtains the audio data of the customer service agent and the customer, respectively, as the first audio data, encrypts the first audio data to obtain a ciphertext signal, and pushes the ciphertext signal to the federal inference engine.
The federal inference engine 403 decrypts the ciphertext signal to obtain the original first audio data, completes speech recognition and quality inspection evaluation to obtain an inference result, and encrypts and transmits the inference result back to the federal intelligent center 402.
After decrypting and obtaining the inference result, the federal intelligent center 402 pushes it to the customer service system background 401 for corresponding processing. If a violation has occurred in the call between the agent and the customer, the customer service system background immediately cuts off the current call to prevent the violation from continuing; if no violation has occurred, or a violation has occurred but the call is allowed to continue, the customer service system background pushes the speech recognition result of the first audio data to the front end for display and prompts, according to the quality inspection evaluation, whether the agent has missed any problem points or prescribed wording, so that non-compliant phrasing is displayed in real time and the agent is reminded to avoid it in the next call.
Based on the idea of federated learning, the embodiment of the application provides a method for detecting the call between the customer service agent and the customer in real time, which guarantees the privacy and security of data during detection, achieves a better detection effect through federated modeling, and monitors and promptly handles violations in the call.
Fig. 5 is a schematic structural diagram of a server provided in an embodiment of the present application, where the federal intelligent center 402 is disposed on the server 500. Server 500 includes modules for implementing operations performed by a federal intelligent center, including: one or more processors 510, a communications interface 520, and a memory 530. Optionally, the processor 510, the communication interface 520, and the memory 530 are interconnected by a bus 540, wherein,
the processor 510 is configured to implement the operations executed by the processing unit, and the processor 510 is configured to execute the steps of processing and encrypting the first audio data by the federal intelligent center in S201-S204 in fig. 2, which are not described herein again.
The processor 510 may be implemented in various ways, for example, the processor 510 may be a central processing unit or an image processor, the processor 510 may also be a single-core processor or a multi-core processor, and the processor 510 may also be a combination of a CPU and a hardware chip.
The communication interface 520 may be a wired interface, such as an Ethernet interface or a Local Interconnect Network (LIN) interface, or a wireless interface, such as a cellular network interface or a wireless LAN interface, and is used for communicating with other modules or devices.
In this embodiment, the communication interface 520 may be specifically configured to perform operations of acquiring the first audio data, sending a ciphertext signal, receiving an inference result, and the like in S201 to S204 in fig. 2. Specifically, the actions performed by the communication interface 520 may refer to the above method embodiments, and are not described herein again.
The memory 530 may be a non-volatile memory, such as a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. Memory 530 may also be volatile memory, which may be Random Access Memory (RAM), that acts as external cache memory.
The memory 530 may also be used to store instructions and data, so that the server 500 can invoke the instructions stored in the memory 530 to implement the operations performed in S201-S204 above. Further, the server 500 may include more or fewer components than shown in FIG. 5, or have a different arrangement of components.
The bus 540 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 540 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Optionally, the server 500 may further include an input/output interface 550, and the input/output interface 550 is connected with an input/output device for receiving input information and outputting an operation result.
Fig. 6 is a schematic structural diagram of a server according to an embodiment of the present application, where the federal inference engine 403 is disposed on the server 600 and deployed in a cloud. The server 600 includes modules for implementing operations performed by the federated inference engine, including: one or more processors 610, a communication interface 620, and a memory 630. Optionally, the processor 610, the communication interface 620, and the memory 630 are connected to each other through a bus 640, wherein,
the processor 610 is configured to implement the operations executed by the processing unit, and the processor 610 is configured to execute the steps of decrypting the ciphertext signal by the federal inference engine in S201-S204 in fig. 2, and completing the speech recognition and quality inspection judgment, which are not described herein again.
The processor 610 may be implemented in various ways, for example, the processor 610 may be a central processing unit or an image processor, the processor 610 may also be a single-core processor or a multi-core processor, and the processor 610 may also be a combination of a CPU and a hardware chip.
The communication interface 620 may be a wired interface, such as an Ethernet interface or a Local Interconnect Network (LIN) interface, or a wireless interface, such as a cellular network interface or a wireless LAN interface, and is used for communicating with other modules or devices.
In this embodiment, the communication interface 620 may be specifically configured to perform operations of receiving the ciphertext signal, exchanging parameters with the federal center node, sending an inference result, and the like in S201 to S204 in fig. 2. Specifically, the actions performed by the communication interface 620 may refer to the above method embodiments, and are not described herein again.
The memory 630 may be a non-volatile memory, such as a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. Memory 630 may also be volatile memory, which may be Random Access Memory (RAM), which acts as external cache memory.
The memory 630 may also be used to store instructions and data, so that the server 600 can invoke the instructions stored in the memory 630 to implement the operations performed in S201-S204 above. In addition, the server 600 may contain more or fewer components than shown in FIG. 6, or have a different arrangement of components.
The bus 640 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 640 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
Optionally, the server 600 may further include an input/output interface 650, and the input/output interface 650 is connected with an input/output device for receiving input information and outputting an operation result.
The embodiments of the present application further provide a non-transitory computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program runs on a processor, the method steps executed in the foregoing method embodiments may be implemented, and specific implementation of the processor of the computer-readable storage medium in executing the method steps may refer to specific operations of S201 to S204 in the foregoing method embodiments, and details are not described herein again.
Those of ordinary skill in the art will appreciate that the units and method steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, electronic devices and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, electronic device and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the part of the technical solution of the present application that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A voice detection method is characterized in that the method is applied to a system comprising a federal intelligent center, a federal inference engine and a customer service system background, and comprises the following steps:
the method comprises the steps that the federal intelligent center obtains first audio data, converts the first audio data into ciphertext signals and sends the ciphertext signals to the federal inference engine, wherein the first audio data are audio data of customer service or clients;
the federal inference engine converts the ciphertext signal into the first audio data, performs speech recognition and quality inspection evaluation on the first audio data to obtain an inference result, and encrypts and sends the inference result to the federal intelligent center, the inference result indicating whether the customer service or the customer has committed a violation;
the federal intelligent center decrypts the data to obtain the inference result, and sends the inference result to the customer service system background;
and the customer service system background performs processing according to the inference result.
2. The method of claim 1, wherein before converting the first audio data into the ciphertext signal, further comprising: and the federal intelligent center performs voice enhancement processing on the first audio data.
3. The method according to claim 1 or 2, wherein before the obtaining the first audio data, the federal intelligent center further comprises:
the federal inference engine trains a neural network model based on training data, and transmits model parameters of the trained neural network model, encrypted, to a federal center node;
the federal inference engine receives aggregation parameters sent by the federal center node, and trains the neural network model according to the training data and the aggregation parameters to obtain a trained neural network model; the aggregation parameters are obtained by the federal center node aggregating model parameters sent by a plurality of participants, and the plurality of participants comprise the federal inference engine.
4. The method of claim 3, wherein the performing speech recognition and quality inspection evaluation on the first audio data comprises:
the federal inference engine decrypting the ciphertext signal to obtain the first audio data, and performing speech recognition and quality inspection evaluation on the first audio data using the trained neural network model to obtain the inference result.
5. The method of claim 4, wherein the quality inspection evaluation comprises performing prescribed-wording recognition, emotion-sensitive word recognition, and prohibited-word recognition on the speech recognition result of the first audio data.
6. The method of claim 1, wherein the customer service system background performs processing based on the inference result, comprising:
and when the customer service or the customer breaks rules, the background of the customer service system cuts off the communication between the customer service and the customer.
7. A voice detection system is characterized in that the system comprises a federal intelligent center, a federal inference engine and a customer service system background,
the federal intelligent center is used for acquiring first audio data, converting the first audio data into ciphertext signals and sending the ciphertext signals to the federal inference engine, wherein the first audio data is audio data of customer service or a client;
the federal inference engine is used for converting the ciphertext signal into the first audio data, performing speech recognition and quality inspection evaluation on the first audio data to obtain an inference result, and encrypting and sending the inference result to the federal intelligent center, the inference result indicating whether the customer service or the customer has committed a violation;
the federal intelligent center decrypts the data to obtain the inference result, and sends the inference result to the customer service system background;
and the customer service system background is used for performing processing according to the inference result.
8. A server device, characterized in that it comprises means for implementing the operations performed by the Federal Intelligent center of any of claims 1 to 6.
9. A server device, characterized in that the server comprises means for implementing the operations performed by the federal inference engine as claimed in any of claims 1 to 6.
10. A computer storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 6.
CN202011553435.0A 2020-12-24 2020-12-24 Federated intelligent voice detection method, system, and related device Pending CN112714221A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011553435.0A CN112714221A (en) 2020-12-24 2020-12-24 Federated intelligent voice detection method, system, and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011553435.0A CN112714221A (en) 2020-12-24 2020-12-24 Federated intelligent voice detection method, system, and related device

Publications (1)

Publication Number Publication Date
CN112714221A true CN112714221A (en) 2021-04-27

Family

ID=75544335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011553435.0A Pending CN112714221A (en) 2020-12-24 2020-12-24 Federated intelligent voice detection method, system, and related device

Country Status (1)

Country Link
CN (1) CN112714221A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060035375A (en) * 2004-10-22 2006-04-26 이익수 A chaotic speech secure communication system to binary cdma using a digital chaotic cell
WO2016015542A1 (en) * 2014-07-29 2016-02-04 华为技术有限公司 Quality inspection method and apparatus for contact center
CN109461442A (en) * 2018-09-10 2019-03-12 上海力自高实业有限公司 A kind of multimedia intelligent customer service system
CN109327632A (en) * 2018-11-23 2019-02-12 深圳前海微众银行股份有限公司 Intelligent quality inspection system, method and the computer readable storage medium of customer service recording
CN110874484A (en) * 2019-10-16 2020-03-10 众安信息技术服务有限公司 Data processing method and system based on neural network and federal learning
CN111428265A (en) * 2020-03-20 2020-07-17 深圳前海微众银行股份有限公司 Statement quality inspection method, device, equipment and storage medium based on federal learning
CN111402095A (en) * 2020-03-23 2020-07-10 温州医科大学 Method for detecting student behaviors and psychology based on homomorphic encrypted federated learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIU WENHAO; SUN KEHUI; ZHU CONGXU: "A hyperchaotic digital speech encryption algorithm for mobile communication", Journal of Cryptologic Research, no. 01, 15 February 2017 (2017-02-15), pages 1 - 14 *
CHEN LING; PAN ZHONGLIANG: "An audio encryption method based on the Tent chaotic system", Equipment Manufacturing Technology, no. 05, 15 May 2011 (2011-05-15), pages 1 - 3 *

Similar Documents

Publication Publication Date Title
US11210461B2 (en) Real-time privacy filter
US10120859B2 (en) Message sentiment analyzer and message preclusion
US11582237B2 (en) Systems and methods for privacy-protecting hybrid cloud and premise stream processing
US8433915B2 (en) Selective security masking within recorded speech
US8145562B2 (en) Apparatus and method for fraud prevention
EP1902442B1 (en) Selective security masking within recorded speech utilizing speech recognition techniques
US20080059198A1 (en) Apparatus and method for detecting and reporting online predators
US20240127798A1 (en) Training speech recognition systems using word sequences
US11562731B2 (en) Word replacement in transcriptions
CN110309299B (en) Communication anti-fraud method, device, computer readable medium and electronic equipment
EP3598444B1 (en) Method and system for muting classified information from an audio
WO2021184837A1 (en) Fraudulent call identification method and device, storage medium, and terminal
CN110704618B (en) Method and device for determining standard problem corresponding to dialogue data
CN110766442A (en) Client information verification method, device, computer equipment and storage medium
EP4016355B1 (en) Anonymized sensitive data analysis
JP2010273130A (en) Device for determining progress of fraud, dictionary generator, method for determining progress of fraud, and method for generating dictionary
JP2016143909A (en) Telephone conversation content analysis display device, telephone conversation content analysis display method, and program
US10446138B2 (en) System and method for assessing audio files for transcription services
US11488604B2 (en) Transcription of audio
CN112714221A (en) Method, system and related equipment for detecting intelligent voice of federated
CN115423323A (en) Security management method and device, electronic equipment and computer storage medium
JP6885217B2 (en) User dialogue support system, user dialogue support method and program
US20230066915A1 (en) Obfuscation of a section of audio based on context of the audio
CN116610790B (en) Method, device, equipment and medium for acquiring response data
US11869511B2 (en) Using speech mannerisms to validate an integrity of a conference participant

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination