WO2022227188A1 - Intelligent customer service staff answering method and apparatus for speech, and computer device - Google Patents


Info

Publication number
WO2022227188A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
speech
code
customer service
feature
Prior art date
Application number
PCT/CN2021/096981
Other languages
French (fr)
Chinese (zh)
Inventor
孙奥兰 (Sun Aolan)
王健宗 (Wang Jianzong)
程宁 (Cheng Ning)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2022227188A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Definitions

  • the present application relates to the field of artificial intelligence, and in particular, to a voice-based intelligent customer service answering method, device, and computer equipment.
  • the traditional intelligent customer service Q&A system can be roughly divided into three independent parts.
  • the voice recognition technology is used to identify the questioner's speech content and convert it into text, and then the text-level Q&A system is used to automatically generate a proposed answer based on the text of the question.
  • the text is finally converted into voice output through the speech synthesis system.
  • this type of system relies on intermediate text and stacks multiple models on top of one another; its accuracy degrades as the errors of each model compound, resulting in a low accuracy rate, and running the input through several models in sequence is cumbersome, which also makes it inefficient.
  • the main purpose of this application is to provide a voice-based intelligent customer service answering method, device, and computer equipment, aiming to solve the problem that the traditional intelligent customer service question answering system relies on intermediate text and stacks multiple models, resulting in low efficiency.
  • This application provides a voice intelligent customer service answering method, including:
  • the voice encoder and the voice decoder are obtained through synchronous training: in the manual customer service scenario, the first voice segment of the question raised by the customer is input into the voice encoder to be trained and subjected to timbre standardization to obtain the voice code corresponding to the first voice segment; the corresponding voice code and the second voice segment corresponding to the question answered by the human customer service agent are then synchronously input into the voice decoder to be trained for training;
  • the answering voice is sent to the client.
  • the application also provides a voice intelligent customer service answering device, including:
  • an acquisition unit, which is used to acquire the voice segment containing the customer's question
  • a first input unit for inputting the speech fragment into a speech encoder to obtain the encoded first speech code
  • a processing unit configured to perform timbre standardization processing on the first speech code to obtain a second speech code
  • the second input unit is configured to input the second speech code into the speech decoder to obtain the answer speech; wherein the speech encoder and the speech decoder are obtained through synchronous training: in the manual customer service scenario, the first voice segment of the question raised by the customer is input into the voice encoder to be trained and subjected to timbre standardization to obtain the voice code corresponding to the first voice segment, and the corresponding voice code and the second voice segment corresponding to the question answered by the human customer service agent are synchronously input into the voice decoder to be trained for training;
  • a sending unit configured to send the answering voice to the client.
  • the application also provides a computer device, including a memory and a processor; the memory stores a computer program, and when the processor executes the computer program, it implements the steps of the voice-based intelligent customer service answering method:
  • the voice encoder and the voice decoder are obtained through synchronous training: in the manual customer service scenario, the first voice segment of the question raised by the customer is input into the voice encoder to be trained and subjected to timbre standardization to obtain the voice code corresponding to the first voice segment; the corresponding voice code and the second voice segment corresponding to the question answered by the human customer service agent are then synchronously input into the voice decoder to be trained for training;
  • the answering voice is sent to the client.
  • the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the voice-based intelligent customer service answering method are realized:
  • the voice encoder and the voice decoder are obtained through synchronous training: in the manual customer service scenario, the first voice segment of the question raised by the customer is input into the voice encoder to be trained and subjected to timbre standardization to obtain the voice code corresponding to the first voice segment; the corresponding voice code and the second voice segment corresponding to the question answered by the human customer service agent are then synchronously input into the voice decoder to be trained for training;
  • the answering voice is sent to the client.
  • the speech encoder and the speech decoder are trained synchronously on sample data composed of the first voice segment of the customer's question and the second voice segment of the human agent's answer, so that the corresponding answer voice can be obtained simply by acquiring the voice segment containing the customer's question. This realizes direct voice-to-voice answering and simplifies the intelligent customer service Q&A system: the voice segment never needs to be converted into text, which improves both accuracy and computational efficiency and, in turn, customer satisfaction.
  • the pre-trained voiceprint model is used to supervise the training of the answer voice so that the generated timbre is consistent, giving the customer a better experience.
  • FIG. 1 is a schematic flowchart of a voice-based intelligent customer service answering method according to an embodiment of the present application
  • FIG. 2 is a schematic structural block diagram of a voice intelligent customer service answering device according to an embodiment of the present application
  • FIG. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
  • the present application proposes a voice-based intelligent customer service answering method, including:
  • S4 Input the second voice code into the voice decoder to obtain the answer voice; wherein the voice encoder and the voice decoder are obtained through synchronous training: in the manual customer service scenario, the first voice segment of the question raised by the customer is input into the voice encoder to be trained and subjected to timbre standardization to obtain the voice code corresponding to the first voice segment, and the corresponding voice code and the second voice segment corresponding to the question answered by the human agent are synchronously input into the voice decoder to be trained for training;
  • S5 Send the answering voice to the client.
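The S1-S5 flow above can be sketched end to end. Every function below is a placeholder standing in for a trained model: the names, the toy encoding, and the dummy voiceprint code are all illustrative assumptions, not the patent's implementation.

```python
# Hedged sketch of the S1-S5 pipeline with placeholder models.

def speech_encoder(segment):
    # S2: encode the customer's speech segment into a first speech code
    # (toy encoding: one small integer per character of input)
    return [ord(ch) % 16 for ch in segment]

def timbre_standardize(first_code):
    # S3: prepend a (dummy) voiceprint code to unify the output timbre
    voiceprint_code = [7, 7]
    return voiceprint_code + first_code

def speech_decoder(second_code):
    # S4: decode the standardized second speech code into an answer voice
    return "answer:" + "".join(str(v) for v in second_code)

def answer(segment):
    first_code = speech_encoder(segment)           # S2
    second_code = timbre_standardize(first_code)   # S3
    return speech_decoder(second_code)             # S4; S5 sends this back

print(answer("hi"))  # prints "answer:7789"
```

The point of the sketch is the shape of the data flow: one encode, one timbre-standardize, one decode, with no text transcript anywhere in between.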
  • the voice segment containing the problem is obtained from the client.
  • the voice clip occurs during the conversation between the intelligent customer service and the customer, that is, during the user's questioning process.
  • the voice can be obtained as voice data transmitted from the mobile phone terminal: after the phone's microphone collects the voice uttered by the customer, the voice is sent to the terminal or server where the intelligent customer service runs.
  • the voice segment is input into the voice encoder to obtain the encoded first voice code.
  • the speech encoder can be any one of a waveform encoder, a vocoder, and a hybrid encoder, and can implement the encoding process of the speech segment.
  • answering the voice segment is not simply a matter of restoring it, so the encoding must cooperate with the subsequent voice decoder; preferably, the first recurrent neural network is used for encoding.
  • the timbre standardization process is performed on the first speech code to obtain the second speech code. Since many customers and customer service agents appear in the training sample data, the timbre of the final generated answer voice can easily end up inconsistent.
  • a pre-trained voiceprint model can be set up to supervise the answer-voice generation process: acting as a speaker encoder, it constantly corrects the timbre in the answer voice so that the final answer voice aligns with the speaker encoder, thereby unifying the timbre of the answer voice.
  • the concatenation works as follows: the pre-trained voiceprint model has trained voiceprint features, which are generally expressed as character strings, and the first speech code is likewise a character string;
  • if the voiceprint feature is not a character string, it can first be digitized, that is, converted into a number according to the magnitude of the voiceprint, and then converted into the character string corresponding to the voiceprint feature;
  • the character string corresponding to the voiceprint feature and the character string of the first speech code are then joined into a single string, for example with a concat function that connects the two strings end to end;
  • the second voice code therefore contains both the character string corresponding to the voiceprint model and the character string corresponding to the first voice code.
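A minimal sketch of this string concatenation. The feature values, the two-digit digitization scheme, and the sample first speech code are all made up, since the patent fixes none of them.

```python
# Sketch: digitize a voiceprint feature, render it as a string, and
# concatenate it with the first speech code string to form the second
# speech code. All concrete values below are illustrative.

voiceprint_feature = [0.31, 0.82, 0.55]   # from the pre-trained voiceprint model (invented)
first_speech_code = "3f9a1c"              # first speech code as a string (invented)

# digitize: map each feature value to a two-digit token by magnitude
voiceprint_string = "".join(f"{int(round(v * 99)):02d}" for v in voiceprint_feature)

# concat the two strings into the second speech code
second_speech_code = voiceprint_string + first_speech_code
print(second_speech_code)  # prints "3181543f9a1c"
```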
  • the second speech code is input into the speech decoder to obtain the answer speech.
  • the speech encoder and the speech decoder are trained on sample data consisting of first speech segments in which a customer asks a question and the corresponding second speech segments in which a human agent answers that question during manual customer service.
  • the training method is to input the customer's voice from the manual customer service session into the voice encoder and perform timbre standardization to obtain the voice code corresponding to the first voice segment; that voice code is input into the voice decoder, and the human agent's answer is fed to the voice decoder as the output target. The corresponding answer voice is trained, and the parameters of the voice encoder and voice decoder are continuously adjusted so that the answer voice approaches, or equals, the human agent's answer. This trains the voice decoder and the voice encoder, so that the corresponding answer voice can be obtained simply by inputting the corresponding second voice code into the voice decoder.
  • the answering voice is sent to the client; that is, the answer voice is sent to the customer in response to the customer's voice segment, without the complicated speech recognition, intent recognition, and speech synthesis pipeline.
  • for customers, this reduces waiting time and improves the experience; for the server, it reduces the amount of computation and frees more computing capacity.
  • before the step S3 of performing timbre standardization processing on the first speech code to obtain the second speech code, the method further includes:
  • S203 Screen out the voiceprint model with the greatest similarity as a pre-trained voiceprint model according to the calculation result, so as to preprocess the first speech coding.
  • the selection of the voiceprint model is realized.
  • a voiceprint model similar to the customer's timbre can be found.
  • the first voiceprint feature is extracted from the voice segment: the customer's voice is first collected through the microphone array, and the customer's voiceprint is extracted from it to obtain the first voiceprint feature.
  • the extraction method can be any one of Linear Prediction Coefficients (LPC), Perceptual Linear Predictive coefficients (PLP), Tandem features, and Bottleneck features. The similarity between the second voiceprint feature corresponding to each voiceprint model and the first voiceprint feature is calculated according to the similarity calculation formula, in which one quantity represents the second voiceprint feature, another represents the first voiceprint feature, and the result indicates the similarity between the two. According to the calculation result, the voiceprint model with the greatest similarity, i.e. the one most similar to the customer's voice, is then selected as the pre-trained voiceprint model; using the voiceprint model most similar to the customer's voice can improve the customer's goodwill and satisfaction.
  • the similarity may also be calculated using the Pearson correlation coefficient, the Jaccard coefficient, the Tanimoto coefficient (generalized Jaccard similarity coefficient), log-likelihood similarity, and so on.
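Since the exact similarity formula is elided in the text above, here is a sketch of the screening step using cosine similarity as one of the possible stand-ins; the model library and the feature vectors are invented for illustration.

```python
import math

# Hedged sketch: pick the voiceprint model whose second voiceprint
# feature is most similar to the customer's first voiceprint feature.
# Cosine similarity is an assumed stand-in for the patent's formula.

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

first_voiceprint = [1.0, 0.0, 1.0]            # customer's feature (invented)
model_library = {                              # voiceprint model library (invented)
    "model_a": [0.0, 1.0, 0.0],
    "model_b": [1.0, 0.1, 0.9],
}

# screen out the model with the greatest similarity (S203)
best = max(model_library,
           key=lambda m: cosine_similarity(model_library[m], first_voiceprint))
print(best)  # prints "model_b"
```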
  • the step S2 of inputting the speech segment into a speech encoder to obtain the encoded first speech encoding includes:
  • S211 In the speech encoder, preprocess the speech segment to obtain a speech signal, wherein the speech signal is a one-dimensional signal formed in time sequence;
  • S212 Perform compressed sensing processing on the one-dimensional signal according to the first predetermined formula to obtain a target feature signal
  • S213 Input the target feature signal into a first recurrent neural network to obtain the first speech code.
  • the acquisition of the first speech code is realized: the speech segment is first preprocessed, where the preprocessing method is any one of Linear Prediction Coefficients (LPC), Perceptual Linear Predictive coefficients (PLP), Tandem features, and Bottleneck features, to obtain the digital signal of the corresponding speech segment, i.e. the one-dimensional signal.
  • the target feature signal is obtained, and then the target feature signal is input into the first recurrent neural network for processing to obtain the first speech code. The processing method will be described later, and will not be repeated here.
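The "first predetermined formula" is not reproduced on this page. As one plausible reading of the compression step, a generic compressed-sensing measurement y = Φx with a random Gaussian sensing matrix can be sketched; the signal, the compressed length, and the matrix are all illustrative assumptions.

```python
import numpy as np

# Hedged sketch of compressed sensing on a one-dimensional speech
# signal: project the length-n signal to a much shorter length-m
# "target feature signal" with a random sensing matrix.

rng = np.random.default_rng(0)
x = rng.standard_normal(256)                         # one-dimensional signal (invented)
m = 64                                               # compressed length, m << len(x)
phi = rng.standard_normal((m, x.size)) / np.sqrt(m)  # sensing matrix (assumed Gaussian)
y = phi @ x                                          # target feature signal
print(x.size, y.size)  # prints "256 64"
```

Under standard compressed-sensing assumptions a sparse x can later be recovered from y, which is why the shorter signal can still feed the encoder.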
  • the step S213 of inputting the target feature signal into the first recurrent neural network to obtain the first speech encoding includes:
  • S2132 Sort the codes corresponding to each of the characteristic signal points according to the sequence of each of the characteristic signal points in the target characteristic signal to obtain the first speech code.
  • the second predetermined formula fully considers the value of the previous code and uses convolution for encoding, so that the first speech code captures the data more comprehensively and computations based on it give better results; specifically, the corresponding answer voice can draw on more parameters, so the obtained result is more accurate.
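A toy version of the S213/S2132 encoding loop: each feature-signal point is encoded in sequence, each code depends on the previous one, and each code keeps its point's position so the final ordering matches S2132. The fixed weights stand in for the trained first recurrent neural network and are not from the patent.

```python
import numpy as np

# Hedged sketch of recurrent encoding with explicit ordering (S2132).

def rnn_encode(signal, w=0.5, u=1.0):
    h, codes = 0.0, []
    for idx, x in enumerate(signal):     # feature points in time sequence
        h = np.tanh(w * h + u * x)       # each code considers the previous code
        codes.append((idx, h))           # keep the point's position with its code
    codes.sort(key=lambda c: c[0])       # S2132: sort codes by point order
    return [c[1] for c in codes]

first_speech_code = rnn_encode([0.1, -0.2, 0.3])  # invented target feature signal
print(len(first_speech_code))  # prints "3"
```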
  • the step S4 of inputting the second speech code into the speech decoder to obtain the answering speech includes:
  • S402 Decode the speech coding sequence based on the second recurrent neural network to obtain a decoded intermediate feature signal
  • S403 Obtain the answer voice according to a preset correspondence between the intermediate feature signal and the answer voice; wherein the preset correspondence is obtained by training with corresponding sample data.
  • the parsing of the second speech code is realized: the speech coding sequence of the second speech code is obtained, mainly in order to extract the first speech code contained in the second speech code.
  • the voiceprint model is in fact a way to control the timbre of the generated voice: the sequence is first decoded by the second recurrent neural network, after which the voice information of the corresponding voice segment, i.e. the intermediate feature signal, is obtained.
  • the speech decoder is trained on the corresponding sample data; that is, inputting the corresponding question speech into the speech decoder yields the corresponding answer speech.
  • the speech decoder also decodes the speech and converts it into the corresponding intermediate feature signal; in addition, the speech decoder holds a preset correspondence between the answer voice and the intermediate feature signal, which can be expressed as a_i = Σ_{j=1}^{l} c_ij · b_ij, where a_i represents the i-th speech value of the answer voice, b_ij represents the value corresponding to the j-th syllable of the i-th speech, c_ij represents the weight corresponding to the j-th syllable of the i-th speech, and l represents the length of the voice, so that the corresponding answer voice is obtained.
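Reading the preset correspondence as a weighted sum over syllables (an assumption, since the formula itself is elided on this page), the computation looks like this with illustrative matrices:

```python
import numpy as np

# Hedged sketch of the preset correspondence a_i = sum_j c_ij * b_ij.
# b[i][j]: value of the j-th syllable of the i-th speech (invented)
# c[i][j]: weight of the j-th syllable of the i-th speech (invented)

b = np.array([[1.0, 2.0], [3.0, 4.0]])
c = np.array([[0.5, 0.5], [0.25, 0.75]])

a = (b * c).sum(axis=1)   # a_i combines all l syllables of speech i
print(a)                  # prints "[1.5  3.75]"
```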
  • before the step S5 of sending the answer voice to the client, the method further includes:
  • S411 Extract the first voiceprint feature in the voice segment and the third voiceprint feature in the answer voice;
  • S412 Detect the similarity between the first voiceprint feature and the third voiceprint feature, and determine whether the similarity is greater than a similarity threshold;
  • the detection of the answer voice is realized, that is, the first voiceprint feature in the voice segment and the third voiceprint feature in the answer voice are first extracted.
  • the extraction method is described above.
  • the similarity can still be computed with the similarity calculation formula, after which it is judged whether the similarity value is greater than the similarity threshold.
  • if the similarity is greater than the threshold, the correction has taken effect and the answer can be sent to the customer; if it is less than or equal to the threshold, the correction has not had the intended effect and the timbre of the answer voice differs considerably from the customer's timbre. In that case one can choose whether or not to send the answer to the customer, or gather statistics and retrain the pre-trained model so that the timbre of the answer voice becomes similar to the customer's timbre.
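The pre-send check of S411 and S412 can be sketched as a simple gate: compare the voiceprint of the question with the voiceprint of the answer and only send when the similarity clears the threshold. The dot-product similarity and the threshold value below are toy stand-ins, not the patent's formula.

```python
# Hedged sketch of the S411-S412 gate before sending the answer voice.

def similarity(f1, f2):
    # toy stand-in for the similarity calculation formula
    return sum(x * y for x, y in zip(f1, f2))

def should_send(first_voiceprint, third_voiceprint, threshold=0.8):
    # send only if the answer's timbre is close enough to the customer's
    return similarity(first_voiceprint, third_voiceprint) > threshold

print(should_send([0.6, 0.8], [0.7, 0.7]))  # 0.42 + 0.56 = 0.98 > 0.8, prints "True"
```

When the gate returns False, the fallback described above applies: withhold the answer, or log the case and retrain the pre-trained voiceprint model.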
  • the voice encoder and the voice decoder are trained synchronously on sample data composed of the first voice segment of the customer's question and the second voice segment of the human agent's answer, so that the corresponding answer voice can be obtained simply by acquiring the voice segment containing the customer's question. This realizes direct voice-to-voice answering and simplifies the intelligent customer service Q&A system without converting the voice segment into text, thereby improving accuracy and computational efficiency and, further, customer satisfaction.
  • the pre-trained voiceprint model is used to supervise the training of the answer voice so that the generated timbre is consistent, giving the customer a better experience.
  • the present application also provides a voice intelligent customer service answering device, including:
  • an acquisition unit 10, configured to acquire the voice segment containing the customer's question
  • the first input unit 20 is used for inputting the speech fragment into the speech encoder to obtain the encoded first speech code
  • a processing unit 30 configured to perform timbre standardization processing on the first speech code to obtain a second speech code
  • the second input unit 40 is configured to input the second speech code into the speech decoder to obtain the answer speech; wherein the speech encoder and the speech decoder are obtained through synchronous training: the first voice segment of the question raised by the customer in the manual customer service scenario is input into the voice encoder to be trained and subjected to timbre standardization to obtain the voice code corresponding to the first voice segment, and the corresponding voice code and the second voice segment corresponding to the question answered by the human agent are synchronously input into the voice decoder to be trained for training;
  • the sending unit 50 is configured to send the answer voice to the client.
  • the voice intelligent customer service answering device further includes:
  • a voiceprint feature extraction unit configured to extract the first voiceprint feature in the voice segment
  • a computing unit configured to calculate the similarity between the second voiceprint feature corresponding to each voiceprint model in the voiceprint model library and the first voiceprint feature
  • the screening unit is configured to screen out the voiceprint model with the greatest similarity as a pre-trained voiceprint model according to the calculation result, so as to perform timbre standardization processing on the first speech code.
  • the first input unit 20 includes:
  • a preprocessing subunit configured to preprocess the speech segment in the speech encoder to obtain a speech signal; wherein the speech signal is a one-dimensional signal formed in time sequence;
  • a compressed sensing processing subunit configured to perform compressed sensing processing on the one-dimensional signal according to a first predetermined formula to obtain a target characteristic signal
  • the feature signal input subunit is used for inputting the target feature signal into the first recurrent neural network to obtain the first speech code.
  • the characteristic signal input subunit includes:
  • the sorting module is configured to sort the codes corresponding to each of the characteristic signal points according to the order of each of the characteristic signal points in the target characteristic signal to obtain the first speech code.
  • the second input unit 40 includes:
  • a coding sequence acquisition subunit used for acquiring the speech coding sequence in the second speech coding
  • a decoding subunit configured to decode the speech coding sequence based on the second recurrent neural network to obtain a decoded intermediate feature signal
  • the answer voice obtaining subunit is configured to obtain the answer voice according to the preset correspondence between the intermediate feature signal and the answer voice; wherein, the preset correspondence is obtained by training corresponding sample data.
  • the intelligent customer service answering device for voice further includes:
  • a third voiceprint feature extraction unit configured to extract the first voiceprint feature in the voice segment and the third voiceprint feature in the answer voice
  • a similarity detection unit configured to detect the similarity between the first voiceprint feature and the third voiceprint feature, and determine whether the similarity is greater than a similarity threshold
  • An execution unit configured to execute the step of sending the answer voice to the client if the similarity is greater than the similarity threshold.
  • an embodiment of the present application further provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3 .
  • the computer device includes a processor, a memory, a network interface, and a database connected by a system bus, wherein the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the nonvolatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • the database of the computer equipment is used to store various voice data and the like.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, the voice-based intelligent customer service answering method described in any of the above embodiments can be implemented.
  • FIG. 3 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer equipment to which the solution of the present application is applied.
  • Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, and the computer-readable storage medium may be non-volatile or volatile.
  • the voice-based intelligent customer service answering method described in any of the above embodiments can be implemented.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
  • Blockchain is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the underlying platform of the blockchain can include processing modules such as user management, basic services, smart contracts, and operation monitoring.
  • the user management module is responsible for the identity information management of all blockchain participants, including maintenance of public and private key generation (account management), key management, and maintenance of the corresponding relationship between the user's real identity and blockchain address (authority management), etc.
  • the basic service module is deployed on all blockchain node devices to verify the validity of business requests and, after reaching consensus on valid requests, record them in storage.
  • for a new business request, the basic service first performs interface adaptation for parsing and authentication, then encrypts the business information through the consensus algorithm (consensus management); after encryption, it is transmitted completely and consistently to the shared ledger (network communication), and the record is stored. The smart contract module is responsible for registering and issuing contracts, as well as triggering and executing them.
  • contract logic is defined in a programming language and published to the blockchain (contract registration); according to the logic of the contract terms, a key or other event triggers execution, completing the contract logic; the module also provides contract upgrade and cancellation functions;
  • the operation monitoring module is mainly responsible for the deployment in the product release process , configuration modification, contract settings, cloud adaptation, and visual output of real-time status during product operation, such as: alarms, monitoring network conditions, monitoring node equipment health status, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A voice-based intelligent customer service answering method and apparatus, and a computer device. The method comprises: acquiring a speech segment of a customer that contains a question (S1); inputting the speech segment into a speech encoder to obtain an encoded first speech code (S2); performing timbre standardization on the first speech code to obtain a second speech code (S3); inputting the second speech code into a speech decoder to obtain an answer speech (S4); and sending the answer speech to the customer (S5). Because the speech encoder and the speech decoder are trained jointly on sample data composed of first speech segments in which customers ask questions during human customer service and the corresponding second speech segments in which human agents answer them, the corresponding answer speech can be obtained directly from the customer's speech segment, without converting the segment into text, which improves accuracy and computational efficiency and thereby customer satisfaction.

Description

Voice-based intelligent customer service answering method, apparatus, and computer device
This application claims priority to Chinese patent application No. 202110462426.9, filed with the Chinese Patent Office on April 27, 2021 and entitled "Voice-based intelligent customer service answering method, apparatus, and computer device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a voice-based intelligent customer service answering method and apparatus, and a computer device.
Background
A traditional intelligent customer service question-answering system can roughly be divided into three independent parts: speech recognition first transcribes the questioner's utterance into text; a text-level question-answering system then automatically generates answer text from the question text; finally, a speech synthesis system converts that text into speech output. The inventors realized that such a system relies on intermediate text and stacks multiple models, so its accuracy suffers from the compounded errors of those models, and computing through several models in sequence is cumbersome, which also makes it inefficient.
Technical Problem
The main purpose of the present application is to provide a voice-based intelligent customer service answering method and apparatus, and a computer device, aiming to solve the problem that a traditional intelligent customer service question-answering system relies on intermediate text and requires multiple stacked models, resulting in low efficiency.
Technical Solution
The present application provides a voice-based intelligent customer service answering method, comprising:
acquiring a speech segment of a customer that contains a question;
inputting the speech segment into a speech encoder to obtain an encoded first speech code;
performing timbre standardization on the first speech code to obtain a second speech code;
inputting the second speech code into a speech decoder to obtain an answer speech, wherein the speech encoder and the speech decoder are obtained through joint training, in which a first speech segment of a customer asking a question during human customer service is input into the speech encoder to be trained and subjected to timbre standardization to obtain the speech code corresponding to the first speech segment, and that speech code together with the second speech segment of the human agent answering the question is fed into the speech decoder to be trained; and
sending the answer speech to the customer.
The present application further provides a voice-based intelligent customer service answering apparatus, comprising:
an acquisition unit, configured to acquire a speech segment of a customer that contains a question;
a first input unit, configured to input the speech segment into a speech encoder to obtain an encoded first speech code;
a processing unit, configured to perform timbre standardization on the first speech code to obtain a second speech code;
a second input unit, configured to input the second speech code into a speech decoder to obtain an answer speech, wherein the speech encoder and the speech decoder are obtained through joint training, in which a first speech segment of a customer asking a question during human customer service is input into the speech encoder to be trained and subjected to timbre standardization to obtain the speech code corresponding to the first speech segment, and that speech code together with the second speech segment of the human agent answering the question is fed into the speech decoder to be trained; and
a sending unit, configured to send the answer speech to the customer.
The present application further provides a computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps of the voice-based intelligent customer service answering method:
acquiring a speech segment of a customer that contains a question;
inputting the speech segment into a speech encoder to obtain an encoded first speech code;
performing timbre standardization on the first speech code to obtain a second speech code;
inputting the second speech code into a speech decoder to obtain an answer speech, wherein the speech encoder and the speech decoder are obtained through joint training, in which a first speech segment of a customer asking a question during human customer service is input into the speech encoder to be trained and subjected to timbre standardization to obtain the speech code corresponding to the first speech segment, and that speech code together with the second speech segment of the human agent answering the question is fed into the speech decoder to be trained; and
sending the answer speech to the customer.
The present application further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the following steps of the voice-based intelligent customer service answering method:
acquiring a speech segment of a customer that contains a question;
inputting the speech segment into a speech encoder to obtain an encoded first speech code;
performing timbre standardization on the first speech code to obtain a second speech code;
inputting the second speech code into a speech decoder to obtain an answer speech, wherein the speech encoder and the speech decoder are obtained through joint training, in which a first speech segment of a customer asking a question during human customer service is input into the speech encoder to be trained and subjected to timbre standardization to obtain the speech code corresponding to the first speech segment, and that speech code together with the second speech segment of the human agent answering the question is fed into the speech decoder to be trained; and
sending the answer speech to the customer.
Beneficial Effects
By jointly training the speech encoder and the speech decoder on sample data composed of first speech segments in which customers ask questions during human customer service and the corresponding second speech segments in which human agents answer them, the corresponding answer speech can be obtained simply by acquiring the customer's speech segment containing the question. This realizes a speech-to-speech approach, simplifies the intelligent customer service question-answering system, and removes the need to convert speech segments into text, thereby improving accuracy and computational efficiency and, in turn, customer satisfaction. In addition, a pre-trained voiceprint model supervises the generation of the answer speech so that its timbre is consistent, giving the customer a better experience.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain its principles.
Fig. 1 is a schematic flowchart of a voice-based intelligent customer service answering method according to an embodiment of the present application;
Fig. 2 is a schematic structural block diagram of a voice-based intelligent customer service answering apparatus according to an embodiment of the present application;
Fig. 3 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
Best Mode for Carrying Out the Application
Referring to Fig. 1, the present application proposes a voice-based intelligent customer service answering method, comprising:
S1: acquiring a speech segment of a customer that contains a question;
S2: inputting the speech segment into a speech encoder to obtain an encoded first speech code;
S3: performing timbre standardization on the first speech code to obtain a second speech code;
S4: inputting the second speech code into a speech decoder to obtain an answer speech, wherein the speech encoder and the speech decoder are obtained through joint training, in which a first speech segment of a customer asking a question during human customer service is input into the speech encoder to be trained and subjected to timbre standardization to obtain the speech code corresponding to the first speech segment, and that speech code together with the second speech segment of the human agent answering the question is fed into the speech decoder to be trained;
S5: sending the answer speech to the customer.
As described in step S1 above, a speech segment of the customer that contains a question is acquired. The segment occurs during the conversation between the intelligent customer service and the customer, i.e., while the user is asking a question; for example, it is the speech the customer produces after the intelligent customer service has issued a guiding utterance such as "How may I help you?". It may be acquired as voice data transmitted from a mobile phone: after the phone's microphone captures the customer's speech, the phone sends it to the terminal or server where the intelligent customer service runs.
As described in step S2 above, the speech segment is input into the speech encoder to obtain the encoded first speech code. The speech encoder may be any of a waveform coder, a vocoder, or a hybrid coder, as long as it can carry out the encoding of the speech segment. Because the answer speech answers the question in the segment rather than simply reconstructing the segment, the encoding must cooperate with the subsequent speech decoder; encoding with a first recurrent neural network is therefore preferred. The encoding process is described in detail below and is not repeated here.
As described in step S3 above, timbre standardization is performed on the first speech code to obtain the second speech code. Because the training samples involve many different customers and agents, the timbre of the generated answer speech can easily become inconsistent. Specifically, a pre-trained voiceprint model can be set up to supervise the generation of the answer speech: the pre-trained voiceprint model acts as a speaker encoder that keeps correcting the timbre of the answer speech so that the final answer aligns with the speaker encoder, thereby unifying the timbre of the answer speech. The concat function is a function that joins multiple strings into one. The pre-trained voiceprint model holds a trained voiceprint feature, which is generally represented in the model as a string, and the first speech code is itself a string; if the voiceprint feature is not a string, it can be digitized, i.e., converted into numbers according to the magnitude of the voiceprint and then into the corresponding character string. The string corresponding to the voiceprint feature is then merged with the string of the first speech code into a single string using the concat function, which connects two strings into one. The second speech code therefore contains both the string corresponding to the voiceprint model and the string corresponding to the first speech code. In subsequent computation there is no need to analyze the voiceprint feature: the speaker's timbre information is ignored and only the user's speech information needs attention, so the system can focus on generating the answer speech.
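The concat step described above can be sketched as follows. The function name, digitization precision, and separator are illustrative assumptions, not part of the patent:

```python
def concat_voice_code(first_speech_code, voiceprint_feature):
    # Digitize the voiceprint feature if it is not already a string:
    # join its numeric components into a fixed-precision string form
    # (precision and separator are assumptions for illustration).
    if not isinstance(voiceprint_feature, str):
        voiceprint_feature = ",".join(f"{v:.4f}" for v in voiceprint_feature)
    # concat: merge the two strings into the single "second speech code"
    return voiceprint_feature + "|" + first_speech_code
```

The resulting string carries both the voiceprint-model part and the first speech code, matching the structure the paragraph describes.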
As described in step S4 above, the second speech code is input into the speech decoder to obtain the answer speech. The speech encoder and the speech decoder are trained on sample data composed of first speech segments in which customers ask questions during human customer service and the corresponding second speech segments in which human agents answer them. Training proceeds by inputting the customer speech from the human customer service session into the speech encoder, performing timbre standardization to obtain the speech code corresponding to the first speech segment, inputting that code into the speech decoder, and using the human agent's recorded answer as the output correction. The parameters of the speech encoder and the speech decoder are adjusted continuously so that the generated answer speech approaches or equals the human agent's answer, thereby training both networks; afterwards, inputting the corresponding second speech code into the speech decoder suffices to obtain the corresponding answer speech.
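The joint training loop described above can be sketched with a toy example. The linear encoder/decoder, dimensions, learning rate, and squared-error loss below are stand-ins for the recurrent networks of the actual method, chosen only to make the "adjust both parameter sets toward the human answer" idea concrete:

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.5, size=(4, 4))  # encoder parameters (toy linear map)
W_dec = rng.normal(scale=0.5, size=(4, 4))  # decoder parameters (toy linear map)

q = rng.normal(size=4)   # first speech segment: customer question signal
a = rng.normal(size=4)   # second speech segment: human agent's answer signal

lr = 0.02
start_err = np.linalg.norm(W_dec @ (W_enc @ q) - a)
for _ in range(2000):
    code = W_enc @ q                      # speech code from the encoder
    err = W_dec @ code - a                # deviation from the human answer
    # gradient steps on both parameter sets together (joint training)
    W_dec -= lr * np.outer(err, code)
    W_enc -= lr * np.outer(W_dec.T @ err, q)
end_err = np.linalg.norm(W_dec @ (W_enc @ q) - a)
```

After training, the encode-then-decode pipeline reproduces the human answer far more closely than at initialization, which is the correction behavior the paragraph describes.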
As described in step S5 above, the answer speech is sent to the customer to answer the customer's speech segment, without the cumbersome speech recognition → intent recognition → speech synthesis pipeline. For the customer this shortens the waiting time and gives a better experience; for the server it reduces the computation load and frees up computing capacity.
In one embodiment, before step S3 of performing timbre standardization on the first speech code to obtain the second speech code, the method further comprises:
S201: extracting a first voiceprint feature from the speech segment;
S202: calculating the similarity between the first voiceprint feature and the second voiceprint feature corresponding to each voiceprint model in a voiceprint model library;
S203: selecting, according to the calculation results, the voiceprint model with the greatest similarity as the pre-trained voiceprint model used to preprocess the first speech code.
As described in steps S201-S203 above, the voiceprint model is selected. To suit customers from different regions and give them a sense of familiarity, a voiceprint model close to the customer's timbre can be found. Specifically, the first voiceprint feature is first extracted from the speech segment: the customer's voiceprint is captured through the microphone array and voiceprint extraction is performed on it to obtain the first voiceprint feature, where the extraction method may be any of Linear Prediction Coefficients (LPC), Perceptual Linear Predictive (PLP) coefficients, Tandem features, or Bottleneck features. The similarity between the second voiceprint feature corresponding to each voiceprint model and the first voiceprint feature is then calculated with a similarity formula (given only as an image in the original document, with one symbol denoting the first voiceprint feature, one the second voiceprint feature, and one their similarity). According to the calculation results, the voiceprint model with the greatest similarity, i.e., the one most similar to the customer's voice, is selected as the pre-trained voiceprint model; using it improves the customer's goodwill and satisfaction. In addition, different voiceprint models are trained on different training data, for example dialects of different regions or speech of different age groups. In other embodiments, the similarity may also be computed as the Pearson correlation coefficient, the Jaccard similarity coefficient, the Tanimoto coefficient (generalized Jaccard similarity), the log-likelihood similarity/log-likelihood ratio, and so on.
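The similarity formula itself survives only as an image in the source, so the sketch below assumes cosine similarity, which is consistent with the Pearson/Jaccard-style alternatives the paragraph lists; the function names are illustrative:

```python
import math

def voiceprint_similarity(f1, f2):
    # Cosine similarity between two voiceprint feature vectors
    # (an assumption: the patent's own formula is not recoverable here).
    dot = sum(x * y for x, y in zip(f1, f2))
    return dot / (math.sqrt(sum(x * x for x in f1)) *
                  math.sqrt(sum(y * y for y in f2)))

def pick_pretrained_model(first_feature, model_library):
    # S202/S203: score each model's second voiceprint feature against
    # the customer's first feature and keep the most similar model.
    return max(model_library,
               key=lambda name: voiceprint_similarity(first_feature,
                                                      model_library[name]))
```

`model_library` here maps model names to their second voiceprint features, e.g. one entry per regional dialect model.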
In one embodiment, step S2 of inputting the speech segment into the speech encoder to obtain the encoded first speech code comprises:
S211: in the speech encoder, preprocessing the speech segment to obtain a speech signal, the speech signal being a one-dimensional signal formed in time order;
S212: performing compressed-sensing processing on the one-dimensional signal according to a first predetermined formula to obtain a target feature signal;
S213: inputting the target feature signal into a first recurrent neural network to obtain the first speech code.
As described in steps S211-S213 above, the first speech code is obtained. The speech segment is first preprocessed, using any of Linear Prediction Coefficients (LPC), Perceptual Linear Predictive (PLP) coefficients, Tandem features, or Bottleneck features, to obtain the digital signal of the speech segment, i.e., a one-dimensional signal. It is then compressed according to the first predetermined formula t_i = p_i·s_i, where t_i is the compressed value of the i-th signal point, s_i is the value of the i-th signal point in the speech segment, and p_i is the compression coefficient of the i-th signal point, which depends on s_i, i.e., p_i = f(s_i). This yields the target feature signal, which is input into the first recurrent neural network for processing to obtain the first speech code; the processing is described below and not repeated here.
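The first predetermined formula t_i = p_i·s_i with p_i = f(s_i) translates almost directly into code. The coefficient function f in the usage line is a placeholder, since the patent leaves it unspecified:

```python
def compress_signal(s, f):
    # t_i = p_i * s_i, where the coefficient p_i = f(s_i) depends on
    # the i-th sample value itself.
    return [f(si) * si for si in s]

# hypothetical coefficient function that attenuates large samples
target_feature_signal = compress_signal([0.5, -2.0, 1.0],
                                        lambda x: 1.0 / (1.0 + abs(x)))
```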
In one embodiment, step S213 of inputting the target feature signal into the first recurrent neural network to obtain the first speech code comprises:
S2131: in the hidden layer of the first recurrent neural network, encoding each feature signal point of the target feature signal according to a second predetermined formula h(i) = σ[z(i)] = σ(U·z(i) + W·h(i-1) + b), where σ is the activation function of the first recurrent neural network, b is a first linear offset coefficient, U is the first linear relationship coefficient of the first recurrent neural network, W is its second linear relationship coefficient, z(i) is the i-th feature signal point of the target feature signal, and h(i) is the code value corresponding to the i-th feature signal point;
S2132: sorting the codes corresponding to the feature signal points according to the order of the feature signal points in the target feature signal to obtain the first speech code.
As described in steps S2131-S2132 above, in the hidden layer of the first recurrent neural network each feature point of the target feature signal is encoded according to the second predetermined formula, so each code depends on the value of the corresponding signal point: the formula h(i) = σ[z(i)] = σ(U·z(i) + W·h(i-1) + b) is applied, where h(i) is the code value of the i-th feature signal point and h(i-1) that of the (i-1)-th. The codes are then sorted in the order of the feature signal points to obtain the first speech code. Note that the second predetermined formula fully takes the previous code value into account and encodes in a convolutional manner, so the resulting first speech code carries more complete information and computation based on it performs better: the corresponding answer speech can draw on more parameters, and the result is more accurate.
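The hidden-layer recurrence of S2131/S2132 can be sketched as follows; tanh stands in for the unspecified activation σ, and h(0) is assumed to be the zero vector:

```python
import numpy as np

def encode_sequence(z, U, W, b):
    # h(i) = sigma(U z(i) + W h(i-1) + b), applied in order over the
    # target feature signal; the returned codes keep the signal-point
    # order, as S2132 requires.
    h = np.zeros(W.shape[0])
    codes = []
    for zi in z:
        h = np.tanh(U @ zi + W @ h + b)
        codes.append(h.copy())
    return codes  # the ordered codes form the first speech code
```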
In one embodiment, step S4 of inputting the second speech code into the speech decoder to obtain the answer speech comprises:
S401: obtaining the speech coding sequence in the second speech code;
S402: decoding the speech coding sequence with a second recurrent neural network to obtain a decoded intermediate feature signal;
S403: obtaining the answer speech according to a preset correspondence between the intermediate feature signal and the answer speech, the preset correspondence being obtained by training on corresponding sample data.
As described in steps S401-S403 above, the second speech code is parsed: its speech coding sequence is obtained, chiefly the first code it contains, because the voiceprint-model part of the second speech code actually regulates the timbre after the speech has been generated. The sequence is first decoded by the second recurrent neural network; decoding yields the speech information of the corresponding speech segment, i.e., the intermediate feature signal. Since the speech encoder and the speech decoder are both trained on corresponding sample data, inputting the corresponding question speech yields the corresponding answer speech, the decoder likewise decoding the speech into the corresponding intermediate feature signal. In addition, the speech decoder holds a preset correspondence between the answer speech and the intermediate feature signal (the formula is given only as an image in the original document), in which a_i denotes the i-th segment of the answer speech, b_ij the value of the j-th syllable of the i-th segment, c_ij the weight of the j-th syllable of the i-th segment, and l the length of the speech; from this correspondence the corresponding answer speech is obtained.
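The preset correspondence survives only as formula images, so the sketch below is one plausible reading of the surrounding symbol definitions (a_i, b_ij, c_ij, l): a weighted sum of per-syllable values. This is an assumption, not the patent's verbatim formula:

```python
def answer_segment(b_i, c_i):
    # a_i: combine the value b_ij of each of the l syllables of the i-th
    # segment with its weight c_ij (weighted-sum reading of the image
    # formula -- an assumption, not a verbatim reconstruction).
    l = len(b_i)            # l: length of the speech in syllables
    assert len(c_i) == l
    return sum(b_i[j] * c_i[j] for j in range(l))
```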
In one embodiment, before step S5 of sending the answer speech to the customer, the method further comprises:
S411: extracting the first voiceprint feature from the speech segment and a third voiceprint feature from the answer speech;
S412: detecting the similarity between the first voiceprint feature and the third voiceprint feature, and judging whether the similarity is greater than a similarity threshold;
S413: if it is greater than the similarity threshold, executing the step of sending the answer speech to the customer.
As described in steps S411-S413 above, the answer speech is checked: the first voiceprint feature is extracted from the speech segment and the third voiceprint feature from the answer speech (the extraction methods were described above and are not repeated here), and their similarity is computed with the similarity formula and compared with the similarity threshold. If it exceeds the threshold, the pre-trained voiceprint model's correction of the answer speech has taken effect and the speech can be sent to the customer. If it is less than or equal to the threshold, the correction has not taken effect and the timbre of the answer speech differs considerably from the customer's; in that case one may choose whether to send it, or collect statistics and retrain the pre-trained model so that the timbre of the answer speech becomes similar to the customer's.
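Steps S411-S413 amount to a gate on the generated answer. The function name and the 0.8 threshold below are illustrative; `similarity_fn` stands for whichever similarity measure from the earlier list is in use:

```python
def should_send(first_feature, third_feature, similarity_fn, threshold=0.8):
    # Send only if the answer's timbre (third voiceprint feature) is
    # close enough to the customer's (first voiceprint feature);
    # otherwise the caller may hold the answer back or flag the
    # pre-trained model for retraining.
    return similarity_fn(first_feature, third_feature) > threshold
```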
Beneficial effects of the present application: the speech encoder and the speech decoder are synchronously trained on sample data composed of first voice segments in which customers raise questions during human customer service and the corresponding second voice segments in which human agents answer those questions, so that the corresponding answer voice can be obtained from nothing more than the customer's voice segment containing the question. This realizes a speech-to-speech approach and simplifies the intelligent customer service question answering system: the voice segment does not need to be converted into text, which improves accuracy and computational efficiency, and thus customer satisfaction. In addition, the answer voice is supervised with a pre-trained voiceprint model so that the generated timbre is uniform, giving the customer a better experience.
Referring to FIG. 2, the present application further provides a voice-based intelligent customer service answering apparatus, including:
an acquisition unit 10, configured to acquire a voice segment containing a question from a customer;
a first input unit 20, configured to input the voice segment into a speech encoder to obtain an encoded first speech code;
a processing unit 30, configured to perform timbre standardization processing on the first speech code to obtain a second speech code;
a second input unit 40, configured to input the second speech code into a speech decoder to obtain an answer voice, wherein the speech encoder and the speech decoder are obtained through synchronous training, the synchronous training being performed by inputting, into the speech encoder to be trained, a first voice segment in which a customer raises a question during human customer service, performing timbre standardization processing to obtain a speech code corresponding to the first voice segment, and synchronously inputting the corresponding speech code and a second voice segment in which the human customer service agent answers the question into the speech decoder to be trained; and
a sending unit 50, configured to send the answer voice to the customer.
In one embodiment, the voice-based intelligent customer service answering apparatus further includes:
a voiceprint feature extraction unit, configured to extract a first voiceprint feature from the voice segment;
a calculation unit, configured to calculate a similarity between the first voiceprint feature and a second voiceprint feature corresponding to each voiceprint model in a voiceprint model library; and
a screening unit, configured to screen out, according to the calculation results, the voiceprint model with the greatest similarity as a pre-trained voiceprint model for performing timbre standardization processing on the first speech code.
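The screening unit's selection step can be sketched as an argmax over the model library. The dictionary layout, the cosine measure, and all names here are illustrative assumptions; the patent specifies only that the model with the greatest similarity is selected.

```python
import numpy as np

def select_voiceprint_model(first_feat: np.ndarray, model_library: dict) -> str:
    """Pick the library model whose second voiceprint feature is most similar
    to the first voiceprint feature extracted from the customer's voice segment.

    `model_library` maps a model name to its second voiceprint feature; this
    layout and the cosine measure are assumptions, not the patent's definition."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    # Screen out the model with the greatest similarity (the screening unit's job).
    return max(model_library, key=lambda name: cos(first_feat, model_library[name]))

# Hypothetical two-model library with 3-dimensional features.
library = {
    "model_a": np.array([1.0, 0.0, 0.0]),
    "model_b": np.array([0.7, 0.7, 0.0]),
}
print(select_voiceprint_model(np.array([0.9, 0.1, 0.0]), library))  # model_a
```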
In one embodiment, the first input unit 20 includes:
a preprocessing subunit, configured to preprocess, in the speech encoder, the voice segment to obtain a speech signal, the speech signal being a one-dimensional signal formed in chronological order;
a compressed sensing processing subunit, configured to perform compressed sensing processing on the one-dimensional signal according to a first predetermined formula to obtain a target feature signal; and
a feature signal input subunit, configured to input the target feature signal into a first recurrent neural network to obtain the first speech code.
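The compressed sensing step can be sketched as below. The patent does not disclose the "first predetermined formula", so the standard compressed-sensing measurement y = Phi x with a random Gaussian measurement matrix is used here purely as an assumed stand-in.

```python
import numpy as np

rng = np.random.default_rng(0)

def compress(signal: np.ndarray, m: int) -> np.ndarray:
    """Compressed-sensing measurement y = Phi @ x.

    The patent only names a "first predetermined formula"; the random Gaussian
    measurement matrix used here is a common textbook choice, not the patent's."""
    n = signal.shape[0]
    # m x n measurement matrix, m << n, with variance scaled by 1/m.
    phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    return phi @ signal

# A one-dimensional speech signal formed in chronological order (toy values).
x = np.sin(np.linspace(0, 2 * np.pi, 256))
y = compress(x, m=64)  # target feature signal, 4x shorter than the input
print(y.shape)  # (64,)
```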
In one embodiment, the feature signal input subunit includes:
an encoding module, configured to encode, in a hidden layer of the first recurrent neural network, each feature signal point of the target feature signal according to a second predetermined formula h(i)=σ[z(i)]=σ(Uz(i)+Wh(i-1)+b), where σ is the activation function of the first recurrent neural network, b is a first linear offset coefficient, U is a first linear relationship coefficient of the first recurrent neural network, W is a second linear relationship coefficient of the first recurrent neural network, z(i) denotes the i-th feature signal point of the target feature signal, and h(i) denotes the code value corresponding to the i-th feature signal point; and
a sorting module, configured to sort the codes corresponding to the feature signal points according to the order of the feature signal points in the target feature signal to obtain the first speech code.
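The second predetermined formula h(i)=σ(Uz(i)+Wh(i-1)+b) is the classic recurrent hidden-state update, and the sorting step simply keeps the per-point codes in input order. A minimal sketch follows; the patent does not fix σ, the dimensions, or the initial state, so tanh, the shapes, and the zero initial state are assumptions.

```python
import numpy as np

def encode(features: np.ndarray, U: np.ndarray, W: np.ndarray,
           b: np.ndarray) -> np.ndarray:
    """Hidden-layer encoding h(i) = sigma(U z(i) + W h(i-1) + b), sigma = tanh.

    U and W are the first and second linear relationship coefficients, b is the
    first linear offset coefficient; tanh and the zero initial state are assumed."""
    h = np.zeros(W.shape[0])
    codes = []
    for z in features:                  # z(i): the i-th feature signal point
        h = np.tanh(U @ z + W @ h + b)  # h(i): code value of the i-th point
        codes.append(h)
    # Keeping the codes in the order of the feature signal points yields
    # the first speech code.
    return np.stack(codes)

# Toy target feature signal: 5 points, each a 3-dimensional vector.
rng = np.random.default_rng(1)
feats = rng.normal(size=(5, 3))
U = rng.normal(size=(4, 3))
W = rng.normal(size=(4, 4))
b = np.zeros(4)
print(encode(feats, U, W, b).shape)  # (5, 4): one 4-dim code per point
```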
In one embodiment, the second input unit 40 includes:
a code sequence acquisition subunit, configured to acquire a speech code sequence in the second speech code;
a decoding subunit, configured to decode the speech code sequence based on a second recurrent neural network to obtain a decoded intermediate feature signal; and
an answer voice acquisition subunit, configured to obtain the answer voice according to a preset correspondence between the intermediate feature signal and the answer voice, the preset correspondence being obtained by training on corresponding sample data.
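The decoding path above can be sketched in two steps: run the code sequence through a recurrence and look the resulting intermediate feature up in the preset correspondence. Everything below is an assumption-laden illustration: the patent does not specify the second recurrent network's form, and the "preset correspondence" is realized here as a nearest-neighbour table, which is only one possible reading.

```python
import numpy as np

def decode(code_seq: np.ndarray, V: np.ndarray, W: np.ndarray,
           b: np.ndarray) -> np.ndarray:
    """Second recurrent network: fold the speech code sequence into a final
    hidden state used as the intermediate feature signal (form assumed)."""
    h = np.zeros(W.shape[0])
    for c in code_seq:
        h = np.tanh(V @ c + W @ h + b)
    return h

def lookup_answer(intermediate: np.ndarray, correspondence: dict) -> str:
    """Preset correspondence between intermediate features and answer voices,
    realized here as a nearest-neighbour table over features learned from
    sample data; the patent does not specify the mapping's form."""
    return min(correspondence,
               key=lambda key: np.linalg.norm(intermediate - correspondence[key]))

# Hypothetical table mapping answer-voice identifiers to trained features.
corr = {"answer_1": np.array([1.0, 0.0]), "answer_2": np.array([0.0, 1.0])}
print(lookup_answer(np.array([0.9, 0.2]), corr))  # answer_1 (closest feature)
```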
In one embodiment, the voice-based intelligent customer service answering apparatus further includes:
a third voiceprint feature extraction unit, configured to extract the first voiceprint feature from the voice segment and the third voiceprint feature from the answer voice;
a similarity detection unit, configured to detect the similarity between the first voiceprint feature and the third voiceprint feature, and determine whether the similarity is greater than a similarity threshold; and
an execution unit, configured to, if the similarity is greater than the similarity threshold, perform the step of sending the answer voice to the customer.
Referring to FIG. 3, an embodiment of the present application further provides a computer device, which may be a server whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus, where the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used for storing various voice data and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. When the computer program is executed by the processor, the voice-based intelligent customer service answering method described in any of the above embodiments can be implemented.
Those skilled in the art can understand that the structure shown in FIG. 3 is merely a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; the computer-readable storage medium may be non-volatile or volatile. When the computer program is executed by a processor, the voice-based intelligent customer service answering method described in any of the above embodiments can be implemented.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium; when executed, the computer program may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media provided in the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration rather than limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, herein, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method including a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.
Blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association using cryptographic methods; each data block contains a batch of network transaction information and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
The underlying blockchain platform may include processing modules such as user management, basic services, smart contracts, and operation monitoring. The user management module is responsible for managing the identity information of all blockchain participants, including maintaining public/private key generation (account management), key management, and maintaining the correspondence between users' real identities and blockchain addresses (authority management); when authorized, it supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk control audit). The basic service module is deployed on all blockchain node devices to verify the validity of service requests and to record valid requests to storage after consensus is reached; for a new service request, the basic service first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the service information through a consensus algorithm (consensus management), transmits it completely and consistently to the shared ledger after encryption (network communication), and stores the record. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution: developers can define contract logic in a programming language and publish it on the blockchain (contract registration), and according to the logic of the contract terms, a key or another event triggers execution to complete the contract logic; the module also provides functions for upgrading and canceling contracts. The operation monitoring module is mainly responsible for deployment, configuration modification, contract settings, and cloud adaptation during product release, as well as for visual output of the real-time status of product operation, for example alarms, monitoring of network conditions, and monitoring of node device health.
The above descriptions are only preferred embodiments of the present application and are not intended to limit the present application. For those skilled in the art, the present application may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (20)

1. A voice-based intelligent customer service answering method, comprising:
    acquiring a voice segment containing a question from a customer;
    inputting the voice segment into a speech encoder to obtain an encoded first speech code;
    performing timbre standardization processing on the first speech code to obtain a second speech code;
    inputting the second speech code into a speech decoder to obtain an answer voice, wherein the speech encoder and the speech decoder are obtained through synchronous training, the synchronous training being performed by inputting, into the speech encoder to be trained, a first voice segment in which a customer raises a question during human customer service, performing timbre standardization processing to obtain a speech code corresponding to the first voice segment, and synchronously inputting the corresponding speech code and a second voice segment in which the human customer service agent answers the question into the speech decoder to be trained; and
    sending the answer voice to the customer.
2. The voice-based intelligent customer service answering method according to claim 1, wherein before the step of performing timbre standardization processing on the first speech code to obtain the second speech code, the method further comprises:
    extracting a first voiceprint feature from the voice segment;
    calculating a similarity between the first voiceprint feature and a second voiceprint feature corresponding to each voiceprint model in a voiceprint model library; and
    screening out, according to the calculation results, the voiceprint model with the greatest similarity as a pre-trained voiceprint model for performing timbre standardization processing on the first speech code.
3. The voice-based intelligent customer service answering method according to claim 1, wherein the step of inputting the voice segment into the speech encoder to obtain the encoded first speech code comprises:
    preprocessing, in the speech encoder, the voice segment to obtain a speech signal, the speech signal being a one-dimensional signal formed in chronological order;
    performing compressed sensing processing on the one-dimensional signal according to a first predetermined formula to obtain a target feature signal; and
    inputting the target feature signal into a first recurrent neural network to obtain the first speech code.
4. The voice-based intelligent customer service answering method according to claim 3, wherein the step of inputting the target feature signal into the first recurrent neural network to obtain the first speech code comprises:
    encoding, in a hidden layer of the first recurrent neural network, each feature signal point of the target feature signal according to a second predetermined formula h(i)=σ[z(i)]=σ(Uz(i)+Wh(i-1)+b), where σ is the activation function of the first recurrent neural network, b is a first linear offset coefficient, U is a first linear relationship coefficient of the first recurrent neural network, W is a second linear relationship coefficient of the first recurrent neural network, z(i) denotes the i-th feature signal point of the target feature signal, and h(i) denotes the code value corresponding to the i-th feature signal point; and
    sorting the codes corresponding to the feature signal points according to the order of the feature signal points in the target feature signal to obtain the first speech code.
5. The voice-based intelligent customer service answering method according to claim 1, wherein the step of inputting the second speech code into the speech decoder to obtain the answer voice comprises:
    acquiring a speech code sequence in the second speech code;
    decoding the speech code sequence based on a second recurrent neural network to obtain a decoded intermediate feature signal; and
    obtaining the answer voice according to a preset correspondence between the intermediate feature signal and the answer voice, the preset correspondence being obtained by training on corresponding sample data.
6. The voice-based intelligent customer service answering method according to claim 1, wherein before the step of sending the answer voice to the customer, the method further comprises:
    extracting a first voiceprint feature from the voice segment and a third voiceprint feature from the answer voice;
    detecting a similarity between the first voiceprint feature and the third voiceprint feature, and determining whether the similarity is greater than a similarity threshold; and
    if the similarity is greater than the similarity threshold, performing the step of sending the answer voice to the customer.
7. A voice-based intelligent customer service answering apparatus, comprising:
    an acquisition unit, configured to acquire a voice segment containing a question from a customer;
    a first input unit, configured to input the voice segment into a speech encoder to obtain an encoded first speech code;
    a processing unit, configured to perform timbre standardization processing on the first speech code to obtain a second speech code;
    a second input unit, configured to input the second speech code into a speech decoder to obtain an answer voice, wherein the speech encoder and the speech decoder are obtained through synchronous training, the synchronous training being performed by inputting, into the speech encoder to be trained, a first voice segment in which a customer raises a question during human customer service, performing timbre standardization processing to obtain a speech code corresponding to the first voice segment, and synchronously inputting the corresponding speech code and a second voice segment in which the human customer service agent answers the question into the speech decoder to be trained; and
    a sending unit, configured to send the answer voice to the customer.
8. The voice-based intelligent customer service answering apparatus according to claim 7, further comprising:
    a voiceprint feature extraction unit, configured to extract a first voiceprint feature from the voice segment;
    a calculation unit, configured to calculate a similarity between the first voiceprint feature and a second voiceprint feature corresponding to each voiceprint model in a voiceprint model library; and
    a screening unit, configured to screen out, according to the calculation results, the voiceprint model with the greatest similarity as a pre-trained voiceprint model for performing timbre standardization processing on the first speech code.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of a voice-based intelligent customer service answering method:
    acquiring a voice segment containing a question from a customer;
    inputting the voice segment into a speech encoder to obtain an encoded first speech code;
    performing timbre standardization processing on the first speech code to obtain a second speech code;
    inputting the second speech code into a speech decoder to obtain an answer voice, wherein the speech encoder and the speech decoder are obtained through synchronous training, the synchronous training being performed by inputting, into the speech encoder to be trained, a first voice segment in which a customer raises a question during human customer service, performing timbre standardization processing to obtain a speech code corresponding to the first voice segment, and synchronously inputting the corresponding speech code and a second voice segment in which the human customer service agent answers the question into the speech decoder to be trained; and
    sending the answer voice to the customer.
10. The voice-based intelligent customer service answering method according to claim 9, wherein before the step of performing timbre standardization processing on the first speech code to obtain the second speech code, the method further comprises:
    extracting a first voiceprint feature from the voice segment;
    calculating a similarity between the first voiceprint feature and a second voiceprint feature corresponding to each voiceprint model in a voiceprint model library; and
    screening out, according to the calculation results, the voiceprint model with the greatest similarity as a pre-trained voiceprint model for performing timbre standardization processing on the first speech code.
11. The voice-based intelligent customer service answering method according to claim 9, wherein the step of inputting the voice segment into the speech encoder to obtain the encoded first speech code comprises:
    preprocessing, in the speech encoder, the voice segment to obtain a speech signal, the speech signal being a one-dimensional signal formed in chronological order;
    performing compressed sensing processing on the one-dimensional signal according to a first predetermined formula to obtain a target feature signal; and
    inputting the target feature signal into a first recurrent neural network to obtain the first speech code.
12. The voice-based intelligent customer service answering method according to claim 11, wherein the step of inputting the target feature signal into the first recurrent neural network to obtain the first speech code comprises:
    encoding, in a hidden layer of the first recurrent neural network, each feature signal point of the target feature signal according to a second predetermined formula h(i)=σ[z(i)]=σ(Uz(i)+Wh(i-1)+b), where σ is the activation function of the first recurrent neural network, b is a first linear offset coefficient, U is a first linear relationship coefficient of the first recurrent neural network, W is a second linear relationship coefficient of the first recurrent neural network, z(i) denotes the i-th feature signal point of the target feature signal, and h(i) denotes the code value corresponding to the i-th feature signal point; and
    sorting the codes corresponding to the feature signal points according to the order of the feature signal points in the target feature signal to obtain the first speech code.
13. The voice-based intelligent customer service answering method according to claim 9, wherein the step of inputting the second speech code into the speech decoder to obtain the answer voice comprises:
    acquiring a speech code sequence in the second speech code;
    decoding the speech code sequence based on a second recurrent neural network to obtain a decoded intermediate feature signal; and
    obtaining the answer voice according to a preset correspondence between the intermediate feature signal and the answer voice, the preset correspondence being obtained by training on corresponding sample data.
14. The voice-based intelligent customer service answering method according to claim 9, wherein before the step of sending the answer voice to the customer, the method further comprises:
    extracting a first voiceprint feature from the voice segment and a third voiceprint feature from the answer voice;
    detecting a similarity between the first voiceprint feature and the third voiceprint feature, and determining whether the similarity is greater than a similarity threshold; and
    if the similarity is greater than the similarity threshold, performing the step of sending the answer voice to the customer.
15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of a voice-based intelligent customer service answering method:
    acquiring a voice segment containing a customer's question;
    inputting the voice segment into a speech encoder to obtain an encoded first speech code;
    performing timbre standardization on the first speech code to obtain a second speech code;
    inputting the second speech code into a speech decoder to obtain an answer voice, wherein the speech encoder and the speech decoder are obtained through synchronous training, in which a first voice segment of a question asked by a customer during manual customer service is input into the speech encoder to be trained and subjected to timbre standardization, yielding a speech code corresponding to the first voice segment, and that speech code and a second voice segment of the manual customer service agent's answer are synchronously input into the speech decoder to be trained; and
    sending the answer voice to the customer.
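The encode → standardize → decode pipeline recited in claim 15 can be sketched as a short Python function. All component names (`encoder`, `standardizer`, `decoder`, `send`) are hypothetical placeholders; the claim does not disclose concrete model implementations, so this is only a minimal structural sketch of the claimed data flow.

```python
def answer_customer(voice_segment, encoder, standardizer, decoder, send):
    """Run one customer question through the claimed pipeline and send the answer."""
    first_code = encoder(voice_segment)      # first speech code
    second_code = standardizer(first_code)   # timbre-standardized second speech code
    answer_voice = decoder(second_code)      # answer voice from the speech decoder
    send(answer_voice)                       # deliver the answer to the customer
    return answer_voice

# Toy usage with placeholder callables standing in for trained models:
result = answer_customer(
    [0.1, 0.2],
    encoder=lambda seg: [x * 2 for x in seg],
    standardizer=lambda code: [round(x, 3) for x in code],
    decoder=lambda code: "answer",
    send=lambda voice: None,
)
```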
16. The voice-based intelligent customer service answering method of claim 15, wherein before the step of performing timbre standardization on the first speech code to obtain the second speech code, the method further comprises:
    extracting a first voiceprint feature from the voice segment;
    calculating the similarity between the first voiceprint feature and a second voiceprint feature corresponding to each voiceprint model in a voiceprint model library; and
    selecting, according to the calculation results, the voiceprint model with the greatest similarity as the pre-trained voiceprint model used to perform timbre standardization on the first speech code.
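The model-selection step in claim 16 reduces to an argmax over feature similarities. A minimal sketch follows, using cosine similarity as the metric; the claim itself does not name a similarity measure, so that choice and the `feature`/`name` record layout are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def select_voiceprint_model(first_feature, model_library):
    """Return the library model whose stored (second) voiceprint feature
    is most similar to the caller's first voiceprint feature."""
    return max(model_library,
               key=lambda m: cosine_similarity(first_feature, m["feature"]))

# Toy library of two voiceprint models:
library = [
    {"name": "model_a", "feature": [1.0, 0.0]},
    {"name": "model_b", "feature": [0.6, 0.8]},
]
best = select_voiceprint_model([0.0, 1.0], library)
```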
17. The voice-based intelligent customer service answering method of claim 15, wherein the step of inputting the voice segment into a speech encoder to obtain an encoded first speech code comprises:
    preprocessing, in the speech encoder, the voice segment to obtain a speech signal, wherein the speech signal is a one-dimensional signal ordered in time;
    performing compressed-sensing processing on the one-dimensional signal according to a first predetermined formula to obtain a target feature signal; and
    inputting the target feature signal into a first recurrent neural network to obtain the first speech code.
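The "first predetermined formula" is not disclosed in this excerpt, but a generic compressed-sensing step projects the one-dimensional signal onto a small set of random measurement vectors (y = Φ·x). The sketch below shows that generic form only; the Gaussian measurement matrix and the compression ratio are assumptions, not the patent's actual formula.

```python
import random

def compressed_sense(signal, m, seed=0):
    """Project an n-sample 1-D signal onto m random measurement vectors,
    i.e. compute y = Phi @ x with a random Gaussian Phi (a generic
    compressed-sensing measurement, stand-in for the patent's formula)."""
    rng = random.Random(seed)  # fixed seed so the projection is repeatable
    n = len(signal)
    phi = [[rng.gauss(0, 1) / m for _ in range(n)] for _ in range(m)]
    return [sum(p * x for p, x in zip(row, signal)) for row in phi]

# Compress a 6-sample signal down to a 3-point target feature signal:
target = compressed_sense([0.5, -0.2, 0.1, 0.9, -0.4, 0.3], m=3)
```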
18. The voice-based intelligent customer service answering method of claim 17, wherein the step of inputting the target feature signal into the first recurrent neural network to obtain the first speech code comprises:
    encoding, in a hidden layer of the first recurrent neural network, each feature signal point of the target feature signal according to a second predetermined formula h(i) = σ(U·z(i) + W·h(i−1) + b), where σ is the activation function of the first recurrent neural network, b is a first linear offset coefficient, U is a first linear relationship coefficient of the first recurrent neural network, W is a second linear relationship coefficient of the first recurrent neural network, z(i) denotes the i-th feature signal point of the target feature signal, and h(i) denotes the code value corresponding to the i-th feature signal point; and
    sorting the codes corresponding to the feature signal points, in the order in which the feature signal points appear in the target feature signal, to obtain the first speech code.
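The recurrence in claim 18 can be implemented directly. In this sketch σ is taken to be tanh and U, W, b are scalars; the claim specifies neither the activation nor the coefficient dimensions, so those are illustrative assumptions.

```python
import math

def encode_signal(z, U, W, b):
    """Apply h(i) = sigma(U*z(i) + W*h(i-1) + b) over a scalar sequence,
    with sigma = tanh and an initial hidden state h(0) = 0."""
    h_prev = 0.0
    codes = []
    for z_i in z:
        h_i = math.tanh(U * z_i + W * h_prev + b)
        codes.append(h_i)   # appended in input order, so the codes are
        h_prev = h_i        # already sorted as the claim requires
    return codes

codes = encode_signal([0.2, -0.1, 0.4], U=0.5, W=0.3, b=0.0)
```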
19. The voice-based intelligent customer service answering method of claim 15, wherein the step of inputting the second speech code into a speech decoder to obtain the answer voice comprises:
    acquiring a speech coding sequence from the second speech code;
    decoding the speech coding sequence with a second recurrent neural network to obtain a decoded intermediate feature signal; and
    obtaining the answer voice according to a preset correspondence between the intermediate feature signal and the answer voice, wherein the preset correspondence is obtained by training on corresponding sample data.
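The final lookup step of claim 19 maps a decoded intermediate feature to an answer voice through a trained correspondence. A toy stand-in is a nearest-neighbour lookup over stored feature keys; the distance metric and the table layout are assumptions, since the claim only says the correspondence is learned from sample data.

```python
def decode_to_answer(intermediate, correspondence):
    """Return the answer whose stored key feature is closest (squared L2
    distance) to the decoded intermediate feature signal."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    key = min(correspondence, key=lambda k: sq_dist(k, intermediate))
    return correspondence[key]

# Toy trained correspondence: feature key -> answer voice identifier.
table = {(1.0, 0.0): "answer_refund", (0.0, 1.0): "answer_shipping"}
answer = decode_to_answer((0.1, 0.9), table)
```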
20. The voice-based intelligent customer service answering method of claim 15, wherein before the step of sending the answer voice to the customer, the method further comprises:
    extracting a first voiceprint feature from the voice segment and a third voiceprint feature from the answer voice;
    detecting a similarity between the first voiceprint feature and the third voiceprint feature, and determining whether the similarity is greater than a similarity threshold; and
    if the similarity is greater than the similarity threshold, performing the step of sending the answer voice to the customer.
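The gating check of claim 20 is a simple threshold test before delivery. The sketch below uses cosine similarity between the two voiceprint features; the claim does not fix a similarity measure, so that choice (and the callback-style `send` parameter) is an illustrative assumption.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def send_if_similar(first_feature, third_feature, answer_voice, threshold, send):
    """Send the answer voice only when the voiceprint similarity between
    question and answer exceeds the threshold; return whether it was sent."""
    if cosine_similarity(first_feature, third_feature) > threshold:
        send(answer_voice)
        return True
    return False

sent = []
ok = send_if_similar([1.0, 0.1], [0.9, 0.2], "answer", 0.9, sent.append)
```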
PCT/CN2021/096981 2021-04-27 2021-05-28 Intelligent customer service staff answering method and apparatus for speech, and computer device WO2022227188A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110462426.9 2021-04-27
CN202110462426.9A CN112951215B (en) 2021-04-27 2021-04-27 Voice intelligent customer service answering method and device and computer equipment

Publications (1)

Publication Number Publication Date
WO2022227188A1 true WO2022227188A1 (en) 2022-11-03

Family

ID=76233541

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096981 WO2022227188A1 (en) 2021-04-27 2021-05-28 Intelligent customer service staff answering method and apparatus for speech, and computer device

Country Status (2)

Country Link
CN (1) CN112951215B (en)
WO (1) WO2022227188A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556087B (en) * 2023-10-30 2024-04-26 广州圈量网络信息科技有限公司 Customer service reply data processing method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202238A (en) * 2016-06-30 2016-12-07 马根昌 Real person's analogy method
CN106448670A (en) * 2016-10-21 2017-02-22 竹间智能科技(上海)有限公司 Dialogue automatic reply system based on deep learning and reinforcement learning
CN108172209A (en) * 2018-01-09 2018-06-15 上海大学 Build voice idol method
KR20180100001A (en) * 2017-02-28 2018-09-06 서울대학교산학협력단 System, method and recording medium for machine-learning based korean language conversation using artificial intelligence
CN112669863A (en) * 2020-12-28 2021-04-16 科讯嘉联信息技术有限公司 Man-machine relay service method based on sound changing capability

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108648745B (en) * 2018-03-15 2020-09-01 上海电力学院 Method for converting lip image sequence into voice coding parameter
CN109003614A (en) * 2018-07-31 2018-12-14 上海爱优威软件开发有限公司 A kind of voice transmission method, voice-transmission system and terminal
CN110265008A (en) * 2019-05-23 2019-09-20 中国平安人寿保险股份有限公司 Intelligent return-visit method and apparatus, computer device and storage medium
CN110990543A (en) * 2019-10-18 2020-04-10 平安科技(深圳)有限公司 Intelligent conversation generation method and device, computer equipment and computer storage medium
CN111312228A (en) * 2019-12-09 2020-06-19 中国南方电网有限责任公司 End-to-end-based voice navigation method applied to electric power enterprise customer service
CN111883140B (en) * 2020-07-24 2023-07-21 中国平安人寿保险股份有限公司 Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition
CN111986675A (en) * 2020-08-20 2020-11-24 深圳Tcl新技术有限公司 Voice conversation method, device and computer readable storage medium


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116564280A (en) * 2023-07-05 2023-08-08 深圳市彤兴电子有限公司 Display control method and device based on voice recognition and computer equipment
CN116564280B (en) * 2023-07-05 2023-09-08 深圳市彤兴电子有限公司 Display control method and device based on voice recognition and computer equipment

Also Published As

Publication number Publication date
CN112951215B (en) 2024-05-07
CN112951215A (en) 2021-06-11

Similar Documents

Publication Publication Date Title
KR101963993B1 (en) Identification system and method with self-learning function based on dynamic password voice
CN112256825B (en) Medical field multi-round dialogue intelligent question-answering method and device and computer equipment
WO2022227188A1 (en) Intelligent customer service staff answering method and apparatus for speech, and computer device
CN112712813B (en) Voice processing method, device, equipment and storage medium
CN113724695B (en) Electronic medical record generation method, device, equipment and medium based on artificial intelligence
KR102625184B1 (en) Speech synthesis training to create unique speech sounds
WO2021047319A1 (en) Voice-based personal credit assessment method and apparatus, terminal and storage medium
CN111862934B (en) Method for improving speech synthesis model and speech synthesis method and device
CN112530409B (en) Speech sample screening method and device based on geometry and computer equipment
CN110750774B (en) Identity recognition method and device
CN110047510A (en) Audio identification methods, device, computer equipment and storage medium
CN110704618B (en) Method and device for determining standard problem corresponding to dialogue data
CN111883140A (en) Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition
CN110265008A (en) Intelligent return-visit method and apparatus, computer device and storage medium
CN115563290B (en) Intelligent emotion recognition method based on context modeling
CN116434741A (en) Speech recognition model training method, device, computer equipment and storage medium
CN114997174B (en) Intention recognition model training and voice intention recognition method and device and related equipment
Revathi et al. Digital speech watermarking to enhance the security using speech as a biometric for person authentication
CN113823257B (en) Speech synthesizer construction method, speech synthesis method and device
WO2022126969A1 (en) Service voice quality inspection method, apparatus and device, and storage medium
CN114398487A (en) Method, device, equipment and storage medium for outputting reference information of online session
Londhe et al. Extracting Behavior Identification Features for Monitoring and Managing Speech-Dependent Smart Mental Illness Healthcare Systems
CN115829592A (en) Anti-fraud propaganda method and system thereof
CN112652314A (en) Method, device, equipment and medium for verifying disabled object based on voiceprint shading
Springenberg et al. Predictive Auxiliary Variational Autoencoder for Representation Learning of Global Speech Characteristics.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21938658

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21938658

Country of ref document: EP

Kind code of ref document: A1