CN110580899A - Voice recognition method and device, storage medium and computing equipment - Google Patents

Voice recognition method and device, storage medium and computing equipment

Info

Publication number
CN110580899A
CN110580899A (application CN201910967019.6A)
Authority
CN
China
Prior art keywords
voice data
emotion
data
detection model
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910967019.6A
Other languages
Chinese (zh)
Inventor
李君浩
邹婷婷
顾少丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lake Information Technology Co Ltd
Original Assignee
Shanghai Lake Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lake Information Technology Co Ltd (2019-10-12)
Priority to CN201910967019.6A
Publication of CN110580899A (2019-12-17)
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G10L15/26 - Speech to text systems
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques specially adapted for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques specially adapted for estimating an emotional state
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04M - TELEPHONIC COMMUNICATION
    • H04M3/00 - Automatic or semi-automatic exchanges
    • H04M3/42 - Systems providing special services or facilities to subscribers
    • H04M3/50 - Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers; Centralised arrangements for recording messages
    • H04M3/51 - Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
    • H04M3/5175 - Call or contact centers supervision arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A voice recognition method and device, a storage medium and a computing device are provided. The voice recognition method comprises the following steps: extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data; training an emotion detection model based on the emotion feature vectors and the text data, wherein the emotion detection model is used for calculating an emotion score; calculating the emotion score of voice data to be detected based on the voice data to be detected and the emotion detection model; and judging, based on the emotion score, whether the voice data to be detected has a violation risk. The technical solution provided by the invention can detect voice data efficiently and accurately, improving the detection rate of illegal voice.

Description

Voice recognition method and device, storage medium and computing equipment
Technical Field
The invention relates to the technical field of voice detection, in particular to a voice recognition method and device, a storage medium and computing equipment.
Background
With the development of communication technology, call centers generate huge numbers of telephone recording files every day. In traditional quality inspection of call content, a small number of recording files are manually spot-checked at random to judge whether the call content of customer service personnel violates the rules. This traditional method is inefficient: it cannot check every recording file one by one, and it is difficult to assess the work quality of customer service staff from the recordings in a timely manner.
Disclosure of Invention
The invention solves the technical problem of how to efficiently and accurately identify illegal voices.
To solve the foregoing technical problem, an embodiment of the present invention provides a speech recognition method, including: extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data; training an emotion detection model based on the emotion feature vectors and the text data, wherein the emotion detection model is used for calculating an emotion score; calculating the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model; and judging whether the voice data to be detected has a violation risk based on the emotion score.
Optionally, the judging whether the voice data to be detected has a violation risk based on the emotion score includes: determining that the voice data to be detected has a violation risk when the emotion score is higher than a preset threshold.
Optionally, the speech recognition method further includes: labeling the voice data to be detected that has a violation risk.
Optionally, the training to obtain the emotion detection model based on the emotion feature vector and the text data includes: training with a neural network algorithm, based on the emotion feature vector and the text data, to obtain the emotion detection model.
Optionally, the training to obtain the emotion detection model based on the emotion feature vector and the text data includes: training with a logistic regression algorithm, based on the emotion feature vector and the text data, to obtain the emotion detection model.
Optionally, the emotion feature vector is used to represent an emotion type, and the emotion type is selected from: happiness, sadness, anger, fear, disgust.
Optionally, the converting the set of voice data into text data includes: converting the voice data into the text data using a speech-to-text technique.
Optionally, the voice data includes voice data of a first role and voice data of a second role, and the extracting an emotion feature vector from a set of voice data and converting the set of voice data into text data includes: distinguishing the voice data of the first role from the voice data of the second role in the set of voice data to obtain the voice data of the first role and the voice data of the second role; and extracting emotion feature vectors of the voice data of the first role and of the voice data of the second role respectively, and converting the voice data of the first role and the voice data of the second role into text data respectively.
In order to solve the above technical problem, an embodiment of the present invention further provides a speech recognition apparatus, including: an extraction module, configured to extract emotion feature vectors from a set of voice data and convert the set of voice data into text data; a training module, configured to train an emotion detection model based on the emotion feature vectors and the text data, the emotion detection model being used for calculating an emotion score; a calculation module, configured to calculate the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model; and a judging module, configured to judge whether the voice data to be detected has a violation risk based on the emotion score.
To solve the above technical problem, an embodiment of the present invention further provides a storage medium having computer instructions stored thereon, where the computer instructions, when executed, perform the steps of the above method.
In order to solve the above technical problem, an embodiment of the present invention further provides a computing device, including a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor executes the computer instructions to perform the steps of the above method.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
The embodiment of the invention provides a voice recognition method, which comprises the following steps: extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data; training an emotion detection model based on the emotion feature vectors and the text data, wherein the emotion detection model is used for calculating an emotion score; calculating the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model; and judging whether the voice data to be detected has a violation risk based on the emotion score. In the embodiment of the invention, the emotion feature vectors extracted from the voice data and the corresponding text data are used as input data to train the emotion detection model. Because a large amount of voice data can be used as training input, statistical regularities can be exploited and an emotion detection model with high accuracy can be obtained. Judging the voice data to be detected with this highly accurate model allows detection of the voice data to be completed more efficiently and accurately, improving the detection rate of illegal voice. Furthermore, the embodiment of the invention is suitable for detecting massive amounts of voice and extends the range of voice detection scenarios.
Further, the training of the emotion detection model based on the emotion feature vector and the text data includes: training with a neural network algorithm, based on the emotion feature vector and the text data, to obtain the emotion detection model. By adopting a neural network as the emotion detection model, an emotion detection model with higher accuracy can be trained by virtue of the advantages of neural networks, further improving the detection rate of illegal voice.
Drawings
FIG. 1 is a flow chart of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a speech recognition method in an exemplary scenario according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention.
Detailed Description
As noted in the background section, the prior art relies on manual spot checks to find illegal voice, which is inefficient.
The inventors of the present application found that the prior art may also determine whether voice data is illegal voice as follows: first, the voice data to be detected is converted into text data, and an emotion feature vector of the voice data to be detected is extracted; second, voice features are determined from the emotion feature vector, and the converted text data is searched for preset keywords; then, whether the audio data is illegal voice data is determined by combining the voice features and the preset keywords.
However, when this prior-art scheme is used to analyze each recording in a large number of telephone recording files, statistical regularities common to illegal voice data cannot be captured, and the accuracy is low.
The embodiment of the invention provides a voice recognition method, which comprises the following steps: extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data; training an emotion detection model based on the emotion feature vectors and the text data, wherein the emotion detection model is used for calculating an emotion score; calculating the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model; and judging whether the voice data to be detected has a violation risk based on the emotion score.
In the embodiment of the invention, the emotion feature vectors extracted from the voice data and the corresponding text data are used as input data to train the emotion detection model. Because a large amount of voice data can be used as training input, statistical regularities can be exploited and an emotion detection model with high accuracy can be obtained. Judging the voice data to be detected with this highly accurate model allows detection of the voice data to be completed more efficiently and accurately, improving the detection rate of illegal voice. Furthermore, the embodiment of the invention is suitable for detecting massive amounts of voice and extends the range of voice detection scenarios.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 is a flowchart illustrating a speech recognition method according to an embodiment of the present invention. The speech recognition method may be performed by a computing device, such as a server, a personal terminal, or the like.
Specifically, the speech recognition method may include the steps of:
Step S101, extracting emotion feature vectors from a set of voice data, and converting the set of voice data into text data;
Step S102, training to obtain an emotion detection model based on the emotion feature vector and the text data, wherein the emotion detection model is used for calculating an emotion score;
Step S103, calculating emotion scores of the voice data to be detected based on the voice data to be detected and the emotion detection model;
And step S104, judging whether the voice data to be detected has a violation risk based on the emotion score.
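Read together, steps S101 to S104 form a train-then-score pipeline. The following is a minimal Python sketch of how the four steps fit together; every callable in it (extract_emotion, asr_transcribe, train_model and the model's score method) is a hypothetical placeholder, not an interface named by the patent:

    # Hedged sketch of steps S101-S104; all callables are hypothetical.
    def train_emotion_detection_model(voice_dataset, extract_emotion,
                                      asr_transcribe, train_model):
        """Steps S101-S102: build training inputs, then fit the model."""
        emotion_vectors = [extract_emotion(v) for v in voice_dataset]  # S101
        transcripts = [asr_transcribe(v) for v in voice_dataset]       # S101
        return train_model(emotion_vectors, transcripts)               # S102

    def detect_violation_risk(voice_to_check, model, extract_emotion,
                              asr_transcribe, threshold):
        """Steps S103-S104: score one recording, then judge the risk."""
        score = model.score(extract_emotion(voice_to_check),
                            asr_transcribe(voice_to_check))            # S103
        return score > threshold                                       # S104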
More specifically, each recording file of a call center can be treated as one piece of voice data, so a huge amount of voice data can be obtained.
In step S101, at least a portion of this massive voice data may be treated as a set of voice data. An emotion feature vector is extracted from each piece of voice data in the set, yielding a plurality of emotion feature vectors.
The emotion feature vector can be used to represent or describe an emotion type, where the emotion type can be happiness, sadness, anger, fear, or disgust.
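As a minimal illustration (not taken from the patent itself), an emotion feature vector can be represented as a score distribution over these five emotion types; the values below are invented for the example:

    import numpy as np

    EMOTION_TYPES = ["happiness", "sadness", "anger", "fear", "disgust"]

    # Hypothetical vector for an utterance scored mostly as "anger"
    # by an upstream acoustic emotion model.
    emotion_vector = np.array([0.05, 0.10, 0.70, 0.10, 0.05])

    dominant = EMOTION_TYPES[int(np.argmax(emotion_vector))]
    print(dominant)  # -> anger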
Those skilled in the art will understand that each piece of voice data may contain speech output by multiple roles. For example, a recording made at a call center typically includes speech output by two roles, e.g., by a customer service person and by a customer.
Taking voice data that includes two roles as an example, the voice data may include voice data of a first role and voice data of a second role. In this case, the voice data of the first role and the voice data of the second role may first be distinguished from each other to obtain the voice data of the first role and the voice data of the second role.
In a specific implementation, the voice data of the customer service person and the voice data of the customer can be distinguished in advance in the recorded voice data to be evaluated. For example, the customer service person outputs voice at a first frequency and the customer outputs voice at a second frequency, the second frequency being different from the first frequency. As another example, the distinction may be made by keywords or by expressions commonly used by the different roles.
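A hedged sketch of the frequency-based distinction mentioned above, assuming the two speakers differ in average pitch; librosa's pyin pitch tracker is used purely for illustration, and the 165 Hz cutoff is an invented value that a real system would calibrate per recording:

    import librosa
    import numpy as np

    def assign_role_by_pitch(segment, sr, cutoff_hz=165.0):
        # Estimate the fundamental frequency track of one speech segment.
        f0, voiced_flag, voiced_probs = librosa.pyin(
            segment, fmin=librosa.note_to_hz("C2"),
            fmax=librosa.note_to_hz("C7"), sr=sr)
        # Label the segment by its median pitch; the cutoff is hypothetical.
        return "first_role" if np.nanmedian(f0) >= cutoff_hz else "second_role"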
Thereafter, emotion feature vectors of the voice data of the first role and of the voice data of the second role may be extracted, respectively.
Further, each piece of voice data may be converted into text data. In one embodiment, an Automatic Speech Recognition (ASR) technique may be used to convert each piece of voice data into text data, thereby obtaining a plurality of pieces of text data.
Take, again, voice data that includes voice data of a first role and voice data of a second role. In a specific implementation, the voice data of the first role and the voice data of the second role can be distinguished in advance, and then the voice data of the first role and the voice data of the second role are converted into text data respectively.
In step S102, an emotion detection model may be trained based on the emotion feature vectors and the text data. The voice data to be detected then serves as the input of the emotion detection model, which outputs the emotion score of the voice data to be detected.
In one embodiment, the emotion detection model may be obtained by training with a neural network algorithm based on the emotion feature vector and the text data. Preferably, the neural network algorithm may be a Long Short-Term Memory (LSTM) network, a type of recurrent neural network.
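A minimal sketch of the LSTM variant, assuming each training sample carries a tokenized transcript (integer ids) plus the five-dimensional emotion feature vector from above; the vocabulary size, sequence length and layer widths are arbitrary choices, not values from the patent:

    import tensorflow as tf

    VOCAB_SIZE, MAX_LEN, EMOTION_DIM = 10000, 200, 5

    # Two inputs: transcript token ids, and the emotion feature vector.
    text_in = tf.keras.Input(shape=(MAX_LEN,), dtype="int32", name="text")
    emo_in = tf.keras.Input(shape=(EMOTION_DIM,), name="emotion")

    x = tf.keras.layers.Embedding(VOCAB_SIZE, 64)(text_in)
    x = tf.keras.layers.LSTM(64)(x)                 # LSTM over the transcript
    x = tf.keras.layers.Concatenate()([x, emo_in])  # fuse text + emotion
    score = tf.keras.layers.Dense(1, activation="sigmoid", name="score")(x)

    model = tf.keras.Model([text_in, emo_in], score)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    # model.fit([token_ids, emotion_vectors], violation_labels, epochs=...)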
In another embodiment, the emotion feature vector and the text data may be used as input data of a Logistic Regression (LR) algorithm, and the emotion detection model is obtained through training.
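And a corresponding sketch of the logistic-regression variant, here assuming the transcripts are vectorized with TF-IDF and concatenated with the emotion feature vectors, with the predicted probability serving as the emotion score; the feature sizes are arbitrary:

    import numpy as np
    from scipy.sparse import hstack, csr_matrix
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    def train_lr(transcripts, emotion_vectors, violation_labels):
        vectorizer = TfidfVectorizer(max_features=5000)
        X = hstack([vectorizer.fit_transform(transcripts),
                    csr_matrix(np.asarray(emotion_vectors, dtype=float))])
        clf = LogisticRegression(max_iter=1000).fit(X, violation_labels)
        return vectorizer, clf

    def emotion_score(vectorizer, clf, transcript, emotion_vector):
        X = hstack([vectorizer.transform([transcript]),
                    csr_matrix(np.asarray([emotion_vector], dtype=float))])
        return float(clf.predict_proba(X)[0, 1])  # probability as the score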
In a specific implementation, the voice data and the text data of each role can be input to the emotion detection model together to train it. For example, the voice data and text data of each role are marked so that the voice data and text data of different roles can be distinguished.
In step S103, the emotion score of the voice data to be detected is calculated based on the voice data to be detected and the emotion detection model. The voice data to be detected can be voice data from a preset time period or a recorded voice file. The voice data to be detected is input into the emotion detection model, which calculates its emotion score.
In one embodiment, the voice data to be detected is a voice file that includes voice data of a first role and voice data of a second role. Suppose the voice data of the first role is the voice data of the customer service person, and the voice data of the second role is the voice data of the customer. After the roles in the voice file have been distinguished and marked, the voice file may be input to the emotion detection model, and the emotion score it outputs is the emotion score of the first role (e.g., the customer service person).
It should be noted that the voice data of the second role helps the emotion detection model calculate the emotion score of the first role.
In step S104, it may be determined whether the voice data to be detected has a violation risk based on the emotion score. Taking a call center customer service person as an example, a violation can mean that the person uses aggressive, abusive or similar language during a conversation with the customer.
In a specific implementation, a preset threshold may be set for the emotion detection model, and whether the voice data to be detected has a violation risk is determined against this preset threshold.
If the emotion score is not higher than the preset threshold, it may be determined that the voice data to be detected carries no violation risk.
If the emotion score is higher than the preset threshold, it may be determined that the voice data to be detected has a violation risk. Further, voice data to be detected that has a violation risk can be labeled.
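The judging step then reduces to a comparison against the preset threshold; a small sketch, with the 0.8 default being illustrative only:

    def flag_if_violation(score, threshold=0.8):
        # Mark a recording as at risk when its emotion score exceeds the
        # preset threshold; flagged items go on to manual review.
        at_risk = score > threshold
        return {"emotion_score": score,
                "violation_risk": at_risk,
                "label": "needs_manual_review" if at_risk else None}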
In practical applications, the labeled voice data may be passed to manual review for further confirmation.
Fig. 2 is a flowchart illustrating a speech recognition method in a typical scenario according to an embodiment of the present invention. As shown in Fig. 2, in a typical scenario, a recording file recorded by a call center may be used as voice data; after the emotion detection model is obtained, it is used to determine whether a recording file has a violation risk.
Specifically, first, operation S201 may be performed to acquire voice data, for example, to acquire a recording file of a call center.
Next, operation S202 may be performed to convert the voice data into text data. Specifically, ASR technology can be used to obtain the text content corresponding to each audio file and to distinguish the two conversational roles, i.e., the customer service person and the customer.
Again, operation S203 may be performed to extract an emotion feature vector from the voice data. Specifically, an acoustic emotion model in the related art can be used to determine which of the five emotions, i.e., happiness, sadness, anger, fear and disgust, the emotion of each of the two conversational roles belongs to, and to output the corresponding emotion feature vector.
Further, operation S204 may be performed to train the emotion detection model. Specifically, the text content and the emotion feature vectors can be used as training input, and the emotion detection model is obtained by training with a neural network algorithm or a logistic regression algorithm.
Thereafter, operations S205 and S206 may be performed to input the voice file to be detected to the emotion detection model and calculate an emotion score. Specifically, a voice file to be detected is input to the emotion detection model, and an emotion score is output.
Further, if the emotion score output by the emotion detection model exceeds a preset threshold, the recording file may be tagged (not shown).
Further, the tagged recordings may be provided to a human reviewer for further confirmation. The preset threshold may be determined comprehensively according to the available review manpower and accuracy-related indices (not shown).
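One hedged way to derive that threshold from review manpower, assuming emotion scores for a recent batch of recordings are available: choose the cutoff so that the expected number of flagged files matches what reviewers can recheck.

    import numpy as np

    def threshold_for_capacity(scores, daily_review_capacity, daily_call_volume):
        # E.g. 50 reviewable files out of 10,000 calls -> flag the top 0.5%
        # of scores. Purely illustrative; not a rule stated in the patent.
        flag_fraction = daily_review_capacity / daily_call_volume
        return float(np.quantile(np.asarray(scores), 1.0 - flag_fraction))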
Therefore, the embodiment of the invention makes full use of massive voice data for training, so as to obtain a training model (namely the emotion detection model) with higher accuracy. The model is suitable for detecting massive amounts of voice, can complete the detection of voice data efficiently and accurately, and improves the detection rate of illegal voice.
Fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention. The speech recognition apparatus 3 may implement the method solutions shown in Fig. 1 and Fig. 2 and may be executed by a computing device.
Specifically, the speech recognition apparatus 3 may include: an extraction module 31, configured to extract emotion feature vectors from a set of voice data and convert the set of voice data into text data; a training module 32, configured to train an emotion detection model based on the emotion feature vectors and the text data, wherein the emotion detection model is used for calculating an emotion score; a calculation module 33, configured to calculate the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model; and a judging module 34, configured to judge whether the voice data to be detected has a violation risk based on the emotion score.
In a specific implementation, the judging module 34 may include: a determining submodule 341, configured to determine that the voice data to be detected has a violation risk when the emotion score is higher than a preset threshold.
In a specific implementation, the speech recognition apparatus 3 may further include: a marking module 35, configured to label the voice data to be detected that has a violation risk.
In one embodiment, the training module 32 may include: a first training submodule 321, which obtains the emotion detection model by training with a neural network algorithm based on the emotion feature vector and the text data.
In another embodiment, the training module 32 may include: a second training submodule 322, which obtains the emotion detection model by training with a logistic regression algorithm based on the emotion feature vector and the text data.
In a specific implementation, the emotion feature vector may be used to represent an emotion type, which may be selected from: happiness, sadness, anger, fear, disgust.
In a specific implementation, the extraction module 31 may include: a conversion submodule 311, configured to convert the voice data into the text data using a speech-to-text technique.
In a specific implementation, the voice data may include voice data of a first role and voice data of a second role, and the extraction module 31 may include: a distinguishing submodule 312, configured to distinguish the voice data of the first role from the voice data of the second role in the set of voice data to obtain the voice data of the first role and the voice data of the second role; and an extracting submodule 313, configured to extract emotion feature vectors of the voice data of the first role and of the voice data of the second role respectively, and to convert the voice data of the first role and the voice data of the second role into text data respectively.
For more details of the operating principle and operating mode of the speech recognition apparatus 3, reference may be made to the related descriptions of Fig. 1 and Fig. 2, which are not repeated here.
Further, the embodiment of the present invention also discloses a storage medium on which computer instructions are stored; when the computer instructions are executed, the technical solution of the method in the embodiments shown in Fig. 1 and Fig. 2 is performed. Preferably, the storage medium may include a computer-readable storage medium such as a non-volatile memory or a non-transitory memory. The storage medium may include ROM, RAM, magnetic disks or optical disks, etc.
Further, the embodiment of the present invention also discloses a computing device, which includes a memory and a processor, where the memory stores computer instructions capable of running on the processor, and the processor executes the computer instructions to perform the technical solutions of the methods described in the embodiments shown in Fig. 1 and Fig. 2.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (11)

1. A speech recognition method, comprising:
Extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data;
Training to obtain an emotion detection model based on the emotion feature vector and the text data, wherein the emotion detection model is used for calculating an emotion score;
Calculating the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model;
And judging whether the voice data to be detected has a violation risk based on the emotion score.
2. The voice recognition method according to claim 1, wherein the judging whether the voice data to be detected has a violation risk based on the emotion score comprises:
And determining that the voice data to be detected has a violation risk when the emotion score is higher than a preset threshold.
3. The speech recognition method of claim 2, further comprising:
And labeling the voice data to be detected that has a violation risk.
4. The speech recognition method of any one of claims 1 to 3, wherein training to obtain the emotion detection model based on the emotion feature vector and the text data comprises:
And training by adopting a neural network algorithm based on the emotion feature vector and the text data to obtain the emotion detection model.
5. The speech recognition method of any one of claims 1 to 3, wherein training to obtain the emotion detection model based on the emotion feature vector and the text data comprises:
And training by adopting a logistic regression algorithm based on the emotion feature vector and the text data to obtain the emotion detection model.
6. A speech recognition method according to any one of claims 1 to 3, wherein the emotion feature vector is used to represent an emotion type selected from: happiness, sadness, anger, fear, disgust.
7. The speech recognition method of any one of claims 1 to 3, wherein the converting the set of speech data into text data comprises:
And converting the voice data into the text data by adopting a voice-to-text technology.
8. The speech recognition method of any one of claims 1 to 3, wherein the voice data comprises voice data of a first role and voice data of a second role, and wherein extracting an emotion feature vector from a set of voice data and converting the set of voice data into text data comprises:
Distinguishing the voice data of the first role from the voice data of the second role in the set of voice data to obtain the voice data of the first role and the voice data of the second role;
And extracting emotion feature vectors of the voice data of the first role and of the voice data of the second role respectively, and converting the voice data of the first role and the voice data of the second role into text data respectively.
9. A speech recognition apparatus, comprising:
The extraction module is used for extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data;
The training module is used for training to obtain an emotion detection model based on the emotion feature vector and the text data, and the emotion detection model is used for calculating an emotion score;
The calculation module is used for calculating the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model;
And the judging module is used for judging whether the voice data to be detected has a violation risk based on the emotion score.
10. A storage medium having stored thereon computer instructions, characterized in that the computer instructions are operative to perform the steps of the method of any one of claims 1 to 8.
11. A computing device comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any of claims 1 to 8.
CN201910967019.6A 2019-10-12 2019-10-12 Voice recognition method and device, storage medium and computing equipment Pending CN110580899A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910967019.6A CN110580899A (en) 2019-10-12 2019-10-12 Voice recognition method and device, storage medium and computing equipment

Publications (1)

Publication Number Publication Date
CN110580899A 2019-12-17

Family

ID=68814476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910967019.6A Pending CN110580899A (en) 2019-10-12 2019-10-12 Voice recognition method and device, storage medium and computing equipment

Country Status (1)

Country Link
CN (1) CN110580899A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886951A (en) * 2016-09-29 2018-04-06 百度在线网络技术(北京)有限公司 A kind of speech detection method, device and equipment
CN107705807A (en) * 2017-08-24 2018-02-16 平安科技(深圳)有限公司 Voice quality detecting method, device, equipment and storage medium based on Emotion identification
CN107885723A (en) * 2017-11-03 2018-04-06 广州杰赛科技股份有限公司 Conversational character differentiating method and system
CN108122552A (en) * 2017-12-15 2018-06-05 上海智臻智能网络科技股份有限公司 Voice mood recognition methods and device
US20190198040A1 (en) * 2017-12-22 2019-06-27 Beijing Baidu Netcom Science And Technology Co., Ltd. Mood recognition method, electronic device and computer-readable storage medium
CN109102805A (en) * 2018-09-20 2018-12-28 北京长城华冠汽车技术开发有限公司 Voice interactive method, device and realization device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291184A (en) * 2020-01-20 2020-06-16 百度在线网络技术(北京)有限公司 Expression recommendation method, device, equipment and storage medium
CN111291184B (en) * 2020-01-20 2023-07-18 百度在线网络技术(北京)有限公司 Expression recommendation method, device, equipment and storage medium
CN111405128A (en) * 2020-03-24 2020-07-10 中国—东盟信息港股份有限公司 Call quality inspection system based on voice-to-text conversion


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191217)