CN110580899A - Voice recognition method and device, storage medium and computing equipment - Google Patents
Voice recognition method and device, storage medium and computing equipment
Info
- Publication number
- CN110580899A CN110580899A CN201910967019.6A CN201910967019A CN110580899A CN 110580899 A CN110580899 A CN 110580899A CN 201910967019 A CN201910967019 A CN 201910967019A CN 110580899 A CN110580899 A CN 110580899A
- Authority
- CN
- China
- Prior art keywords
- voice data
- emotion
- data
- detection model
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
- H04M3/5175—Call or contact centers supervision arrangements
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Child & Adolescent Psychology (AREA)
- General Health & Medical Sciences (AREA)
- Hospice & Palliative Care (AREA)
- Psychiatry (AREA)
- Business, Economics & Management (AREA)
- Marketing (AREA)
- Telephonic Communication Services (AREA)
Abstract
A voice recognition method and device, a storage medium and a computing device are provided. The voice recognition method comprises the following steps: extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data; training an emotion detection model based on the emotion feature vectors and the text data, wherein the emotion detection model is used for calculating an emotion score; calculating the emotion score of voice data to be detected based on the voice data to be detected and the emotion detection model; and judging, based on the emotion score, whether the voice data to be detected carries a violation risk. The technical scheme provided by the invention can complete the detection of voice data efficiently and accurately and improve the detection rate of illegal voice.
Description
Technical Field
The invention relates to the technical field of voice detection, in particular to a voice recognition method and device, a storage medium and computing equipment.
Background
With the development of communication technology, call centers generate huge numbers of telephone recording files every day. In traditional quality inspection of conversation content, a small number of telephone recording files are randomly spot-checked by hand to judge whether the conversation content of customer service personnel violates rules. However, this traditional method is inefficient, cannot check every telephone recording file one by one, and makes it difficult to assess the work quality of customer service staff from the recording files in a timely manner.
Disclosure of Invention
The invention solves the technical problem of how to efficiently and accurately identify illegal voices.
To solve the foregoing technical problem, an embodiment of the present invention provides a speech recognition method, including: extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data; training to obtain an emotion detection model based on the emotion feature vector and the text data, wherein the emotion detection model is used for calculating an emotion score; calculating the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model; and judging whether the voice data to be detected has violation risk or not based on the emotion score.
Optionally, the determining whether the voice data to be detected has the violation risk based on the emotion score includes: and when the emotion score is higher than a preset threshold value, determining that the voice data to be detected has violation risk.
Optionally, the speech recognition method further includes: and marking the voice data to be detected with violation risk.
Optionally, the training to obtain the emotion detection model based on the emotion feature vector and the text data includes: training by adopting a neural network algorithm based on the emotion feature vector and the text data to obtain the emotion detection model.
Optionally, the training to obtain the emotion detection model based on the feature vector and the text data includes: and training by adopting a logistic regression algorithm based on the emotion feature vector and the text data to obtain the emotion detection model.
Optionally, the emotion feature vector is used to represent an emotion type, and the emotion type is selected from: happiness, sadness, anger, fear, disgust.
Optionally, the converting the set of voice data into text data includes: converting the voice data into the text data by adopting a voice-to-text technology.
Optionally, the voice data includes voice data of a first character and voice data of a second character, and the extracting an emotion feature vector from a set of voice data and converting the set of voice data into text data includes: distinguishing voice data of a first role from voice data of a second role in the group of voice data to obtain the voice data of the first role and the voice data of the second role; and extracting emotion characteristic vectors of the voice data of the first role and the voice data of the second role respectively, and converting the voice data of the first role and the voice data of the second role into text data respectively.
In order to solve the above technical problem, an embodiment of the present invention further provides a speech recognition apparatus, including: the extraction module is used for extracting emotion characteristic vectors from a group of voice data and converting the group of voice data into text data; the training module is used for training to obtain an emotion detection model based on the emotion feature vector and the text data, and the emotion detection model is used for calculating an emotion score; the calculation module is used for calculating the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model; and the judging module is used for judging whether the voice data to be detected has violation risks or not based on the emotion scores.
To solve the above technical problem, an embodiment of the present invention further provides a storage medium having computer instructions stored thereon, wherein the computer instructions, when executed, perform the steps of the above method.
To solve the above technical problem, an embodiment of the present invention further provides a computing device, including a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor executes the computer instructions to perform the steps of the above method.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:
The embodiment of the invention provides a voice recognition method, which comprises the following steps: extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data; training an emotion detection model based on the emotion feature vectors and the text data, wherein the emotion detection model is used for calculating an emotion score; calculating the emotion score of voice data to be detected based on the voice data to be detected and the emotion detection model; and judging, based on the emotion score, whether the voice data to be detected carries a violation risk. In the embodiment of the invention, the emotion feature vectors extracted from the voice data and the text data are used as input data, and the emotion detection model is obtained through training. Because a large amount of voice data can serve as training input, statistical advantages can be exploited, and an emotion detection model with high accuracy can be obtained. Judging the voice data to be detected with this highly accurate model allows the detection of voice data to be completed more efficiently and accurately, improving the detection rate of illegal voices. Furthermore, the embodiment of the invention is suitable for mass voice detection and can extend voice detection scenarios.
Further, the training of the emotion detection model based on the emotion feature vector and the text data includes: and training by adopting a neural network algorithm based on the emotion feature vector and the text data to obtain the emotion detection model. According to the embodiment of the invention, the neural network model is adopted as the emotion detection model, and the emotion detection model with higher accuracy can be trained by virtue of the advantages of the neural network, so that the detection rate of illegal voices is further improved.
Drawings
FIG. 1 is a flow chart of a speech recognition method according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a speech recognition method in an exemplary scenario according to an embodiment of the present invention;
fig. 3 is a flowchart illustrating a speech recognition method according to an embodiment of the present invention.
Detailed Description
As noted in the Background, the prior art relies on manual spot inspection to search for illegal voices, which is inefficient.
The inventor of the present application has found that the prior art may also determine whether voice data is illegal voice as follows: first, the voice data to be detected is converted into text data, and an emotion feature vector of the voice data to be detected is extracted; second, voice features are determined from the emotion feature vector, and the converted text data is searched for preset keywords; then, the voice features and the preset keywords are combined to decide whether the voice data is illegal voice data.
However, when this prior-art scheme is applied to each telephone recording in a large number of recording files, it cannot exploit the statistical regularities common to illegal voice data, and its accuracy is low.
The embodiment of the invention provides a voice recognition method, which comprises the following steps: extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data; training to obtain an emotion detection model based on the emotion feature vector and the text data, wherein the emotion detection model is used for calculating an emotion score; calculating the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model; and judging whether the voice data to be detected has violation risk or not based on the emotion score.
In the embodiment of the invention, the emotion feature vectors extracted from the voice data and the text data are used as input data, and the emotion detection model is obtained through training. Because a large amount of voice data can serve as training input, statistical advantages can be exploited, and an emotion detection model with high accuracy can be obtained. Judging the voice data to be detected with this highly accurate model allows the detection of voice data to be completed more efficiently and accurately, improving the detection rate of illegal voices. Furthermore, the embodiment of the invention is suitable for mass voice detection and can extend voice detection scenarios.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Fig. 1 is a flowchart illustrating a speech recognition method according to an embodiment of the present invention. The speech recognition method may be performed by a computing device, such as a server, a personal terminal, or the like.
Specifically, the speech recognition method may include the steps of:
Step S101, extracting emotion characteristic vectors from a group of voice data, and converting the group of voice data into text data;
Step S102, training to obtain an emotion detection model based on the emotion feature vector and the text data, wherein the emotion detection model is used for calculating an emotion score;
Step S103, calculating emotion scores of the voice data to be detected based on the voice data to be detected and the emotion detection model;
And step S104, judging whether the voice data to be detected has violation risk or not based on the emotion score.
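The four steps above can be sketched end-to-end as follows. This is an illustrative Python sketch only: `extract_emotion_vector`, `to_text`, the toy "training" rule, and the 0.5 threshold are hypothetical placeholders, not the actual models described in this patent.

```python
# Illustrative sketch of steps S101-S104 with toy placeholder functions.
# The feature extraction, ASR, and scoring rule are hypothetical stand-ins.

def extract_emotion_vector(voice):
    # Placeholder: pretend acoustic analysis already yields a 5-dim emotion
    # vector (happiness, sadness, anger, fear, disgust).
    return voice["acoustic_emotion"]

def to_text(voice):
    # Placeholder for ASR (speech-to-text) conversion.
    return voice["transcript"]

def train_emotion_model(samples):
    # Placeholder "training": the model simply weights the anger and disgust
    # components and counts flagged keywords in the transcript.
    def model(voice):
        vec = extract_emotion_vector(voice)
        text = to_text(voice)
        keyword_hits = sum(w in text for w in ("stupid", "shut up"))
        return 0.6 * vec[2] + 0.2 * vec[4] + 0.1 * keyword_hits
    return model

def has_violation_risk(score, threshold=0.5):
    # Step S104: flag the recording when the emotion score exceeds the threshold.
    return score > threshold

# Steps S101/S102: "train" on a set of voice data (toy records).
training_set = [
    {"acoustic_emotion": [0.9, 0.0, 0.1, 0.0, 0.0], "transcript": "thank you"},
    {"acoustic_emotion": [0.0, 0.1, 0.8, 0.0, 0.1], "transcript": "shut up"},
]
model = train_emotion_model(training_set)

# Steps S103/S104: score a recording to be detected and judge the risk.
suspect = {"acoustic_emotion": [0.0, 0.0, 0.9, 0.0, 0.3], "transcript": "stupid"}
score = model(suspect)
print(has_violation_risk(score))  # an angry, abusive call scores above the threshold
```

In the real scheme, the toy scoring closure would be replaced by a trained neural network or logistic regression model, as described below in the embodiments.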
More specifically, each recording file of the call center can be used as one voice data, so that a huge amount of voice data can be obtained.
In step S101, at least a portion of the mass voice data may be treated as a set of voice data. The emotion feature vector of each piece of voice data in the set is extracted, yielding a plurality of emotion feature vectors.
Here, an emotion feature vector may be used to represent or describe an emotion type, and the emotion type may be happiness, sadness, anger, fear, or disgust.
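One possible encoding, purely as an assumption for illustration (the patent does not fix a feature layout), is a vector with one component per emotion type, read like a probability distribution:

```python
# Hypothetical encoding of an emotion feature vector: one component per
# emotion type, summing to 1 like a probability distribution.
EMOTIONS = ("happiness", "sadness", "anger", "fear", "disgust")

def dominant_emotion(vector):
    # Map a 5-dim emotion feature vector back to its most likely label.
    return EMOTIONS[max(range(len(vector)), key=vector.__getitem__)]

angry_call = [0.05, 0.10, 0.70, 0.05, 0.10]
print(dominant_emotion(angry_call))  # -> anger
```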
Those skilled in the art understand that each piece of voice data may generally contain voice output by a plurality of characters. For example, a voice recording made at a call center will typically include voices output by two roles, e.g., a customer service person and a customer.
Taking the example that the voice data includes voice data of two characters, the voice data may include voice data of a first character and voice data of a second character. At this time, the voice data of the first character and the voice data of the second character may be first distinguished to obtain the voice data of the first character and the voice data of the second character.
In a specific implementation, the voice data of the customer service person and the voice data of the customer can be distinguished in advance in the recorded voice data. For example, the customer service person outputs voice at a first frequency, and the customer outputs voice at a second frequency different from the first frequency. As another example, the distinction may be made through keywords or common phrases of the different characters.
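The frequency-based distinction above can be sketched as routing each utterance by its average pitch. The 180 Hz threshold and the per-utterance pitch values are hypothetical assumptions; a production system would more likely use speaker diarization or separate stereo channels.

```python
# Sketch of separating two roles by fundamental frequency (pitch).
# The threshold and pitch values are illustrative assumptions only.
def split_roles(utterances, pitch_threshold_hz=180.0):
    first_role, second_role = [], []
    for utt in utterances:
        # Route each utterance by its average pitch.
        if utt["pitch_hz"] < pitch_threshold_hz:
            first_role.append(utt["text"])
        else:
            second_role.append(utt["text"])
    return first_role, second_role

calls = [
    {"pitch_hz": 120.0, "text": "How can I help you?"},
    {"pitch_hz": 220.0, "text": "My order never arrived."},
    {"pitch_hz": 125.0, "text": "Let me check that for you."},
]
agent, customer = split_roles(calls)
print(agent)     # utterances attributed to the first role
print(customer)  # utterances attributed to the second role
```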
Thereafter, emotion feature vectors of the voice data of the first character and the voice data of the second character may be extracted, respectively.
Further, each piece of voice data may be converted into text data. In one embodiment, automatic speech recognition (ASR) technology may be used to convert each piece of voice data into text data, so as to obtain a plurality of pieces of text data.
Take the example that the voice data includes voice data of a first character and voice data of a second character. In a specific implementation, the voice data of the first character and the voice data of the second character can be distinguished in advance. And then, respectively converting the voice data of the first role and the voice data of the second role into text data.
In step S102, the emotion feature vectors and the text data may be trained, so as to obtain an emotion detection model. And taking the voice data to be detected as the input of the emotion detection model, and outputting the emotion score of the voice data to be detected.
In one embodiment, the emotion detection model may be obtained by training with a neural network algorithm based on the emotion feature vectors and the text data. Preferably, the neural network algorithm may adopt a Long Short-Term Memory (LSTM) algorithm. LSTM is a type of recurrent neural network.
In another embodiment, the emotion feature vectors and the text data may be used as input data of a Logistic Regression (LR) algorithm, and the emotion detection model is obtained through training.
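A minimal logistic-regression sketch of this embodiment, trained with plain gradient descent, is shown below. The feature design (anger component of the emotion vector plus an abusive-keyword count from the text) and the toy labels are assumptions for illustration; the patent does not fix a feature design.

```python
import math

# Minimal logistic-regression sketch for computing an emotion score.
# Features and labels are toy assumptions, not the patent's actual data.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_lr(features, labels, lr=0.5, epochs=2000):
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log-loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# x = [anger component of the emotion vector, abusive-keyword count in text]
X = [[0.1, 0], [0.2, 0], [0.8, 1], [0.9, 2]]
y = [0, 0, 1, 1]
w, b = train_lr(X, y)

def emotion_score(x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

print(emotion_score([0.85, 1]) > 0.5)  # abusive, angry sample scores high
print(emotion_score([0.15, 0]) < 0.5)  # calm sample scores low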
In a specific implementation, the speech data and the text data of each character can be input to the emotion detection model together to train the emotion detection model. For example, voice data and text data of each character are marked so as to distinguish voice data and text data of different characters.
In step S103, an emotion score of the voice data to be detected is calculated based on the voice data to be detected and the emotion detection model. The voice data to be detected can be voice data in a preset time period or a recorded voice file. The voice data to be detected is input into the emotion detection model, and the emotion score of the voice data to be detected is calculated through the emotion detection model.
In one embodiment, the voice data to be detected is a voice file that includes voice data of a first role and voice data of a second role. Assume the voice data of the first role is that of the customer service person, and the voice data of the second role is that of the customer. After the roles in the voice file are distinguished and marked, the voice file may be input to the emotion detection model, which outputs an emotion score for the first character (e.g., the customer service person).
It should be noted that the voice data of the second character helps the emotion detection model calculate the emotion score of the first character.
In step S104, it may be determined, based on the emotion score, whether the voice data to be detected carries a violation risk. Taking a call center attendant as an example, a violation can refer to the attendant using aggressive, abusive, or similar language during a conversation with the customer.
In a specific implementation, a preset threshold may be set for the emotion detection model, and the preset threshold is used to determine whether the voice data to be detected carries a violation risk.
If the emotion score is not higher than the preset threshold, it may be determined that the voice data to be detected is not at risk of violation.
If the emotion score is higher than the preset threshold, it may be determined that the voice data to be detected has a violation risk. Further, the voice data to be detected with violation risk can be labeled.
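The threshold comparison and the labeling step can be sketched together. The 0.8 threshold value and the record layout are illustrative assumptions:

```python
# Sketch of step S104 plus labeling: compare each emotion score with a preset
# threshold and tag recordings that carry a violation risk.
def tag_violations(scored_records, threshold=0.8):
    for record in scored_records:
        # Tag only files whose score exceeds the preset threshold.
        record["violation_risk"] = record["emotion_score"] > threshold

records = [
    {"file": "call_001.wav", "emotion_score": 0.35},
    {"file": "call_002.wav", "emotion_score": 0.91},
]
tag_violations(records)
flagged = [r["file"] for r in records if r["violation_risk"]]
print(flagged)  # only the high-score recording is marked for manual review
```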
In practical applications, the voice data with the tag may be further confirmed manually to review the voice data.
Fig. 2 is a flowchart illustrating a speech recognition method in a typical scenario according to an embodiment of the present invention. As shown in fig. 2, in a typical scenario, a recording file recorded by a call center may be used as voice data, and after the emotion detection model is obtained, it is used to determine whether a recording file carries a violation risk.
Specifically, first, operation S201 may be performed to acquire voice data, for example, to acquire a recording file of a call center.
Next, operation S202 may be performed to convert the voice data into text data. Specifically, ASR technology can be used to obtain the text content corresponding to each audio file and to distinguish the two conversational roles, i.e., the customer service person and the customer.
Then, operation S203 may be performed to extract an emotion feature vector from the voice data. Specifically, an acoustic emotion model in the related art may be used to determine which of five emotions (happiness, sadness, anger, fear, or neutral) the emotion of each of the two conversational characters belongs to, and to output the corresponding emotion feature vector.
Further, an operation S204 may be performed to train the emotion detection model. Specifically, the text content and the emotion feature vector can be used as the input of the emotion detection model, and the emotion detection model is obtained by training through a neural network algorithm or a logistic regression algorithm.
Thereafter, operations S205 and S206 may be performed to input the voice file to be detected to the emotion detection model and calculate an emotion score. Specifically, a voice file to be detected is input to the emotion detection model, and an emotion score is output.
Further, if the emotion score output by the emotion detection model exceeds a preset threshold, the recording file may be tagged (not shown).
Further, the tagged recordings may be provided to a human reviewer for further confirmation. The preset threshold may be determined comprehensively according to the available review manpower and accuracy-related metrics (not shown).
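One way to pick a threshold from review manpower, sketched under the assumption that the reviewers can recheck a fixed number of files per day (the capacity figure below is hypothetical):

```python
# Sketch of choosing the preset threshold from review capacity: given the
# scores of a day's recordings and how many files human reviewers can
# recheck, pick the threshold so roughly that many files are flagged.
def threshold_for_capacity(scores, review_capacity):
    ranked = sorted(scores, reverse=True)
    if review_capacity >= len(ranked):
        return 0.0  # reviewers can handle every file
    # Flag the top-`review_capacity` scores: the threshold is the lowest of them.
    return ranked[review_capacity - 1]

daily_scores = [0.12, 0.95, 0.47, 0.88, 0.30, 0.76]
t = threshold_for_capacity(daily_scores, review_capacity=2)
print(t)                                  # the second-highest score
print(sum(s >= t for s in daily_scores))  # exactly 2 files reach the threshold
```

In practice the threshold would also be cross-checked against precision and recall on labeled data, per the accuracy-related metrics mentioned above.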
Therefore, the embodiment of the invention makes full use of mass voice data for training to obtain a training model (namely the emotion detection model) with high accuracy. The model is suitable for mass voice detection, can complete the detection of voice data efficiently and accurately, and improves the detection rate of illegal voice.
Fig. 3 is a schematic structural diagram of a speech recognition apparatus according to an embodiment of the present invention. The speech recognition device 3 may implement the method solutions shown in fig. 1 and 2, and be executed by a computing device.
Specifically, the speech recognition apparatus 3 may include: an extraction module 31, configured to extract an emotion feature vector from a set of voice data, and convert the set of voice data into text data; a training module 32, configured to train an emotion detection model based on the emotion feature vector and the text data, wherein the emotion detection model is used for calculating an emotion score; a calculation module 33, configured to calculate the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model; and a judging module 34, configured to judge, based on the emotion score, whether the voice data to be detected carries a violation risk.
In a specific implementation, the determining module 34 may include: a determining submodule 341, configured to determine that the voice data to be detected has a violation risk when the emotion score is higher than a preset threshold.
In a specific implementation, the speech recognition apparatus 3 may further include: and the marking module 35 is used for marking the voice data to be detected with violation risk.
In one embodiment, the training module 32 may include: the first training submodule 321 obtains the emotion detection model by training with a neural network algorithm based on the emotion feature vector and the text data.
In another embodiment, the training module 32 may include: a second training submodule 322, which obtains the emotion detection model by training with a logistic regression algorithm based on the emotion feature vector and the text data.
In a specific implementation, the emotional feature vector may be used to represent a type of emotion, which may be selected from: happiness, sadness, anger, fear, disgust.
In a specific implementation, the extraction module 31 may include: a conversion sub-module 311, configured to convert the voice data into the text data by using a voice-to-text technique.
In a specific implementation, the voice data may include voice data of a first character and voice data of a second character, and the extraction module 31 may include: a distinguishing submodule 312, configured to distinguish voice data of a first role from voice data of a second role in the group of voice data to obtain voice data of the first role and voice data of the second role; the extracting sub-module 313 is configured to extract emotion feature vectors of the voice data of the first character and the voice data of the second character, and convert the voice data of the first character and the voice data of the second character into text data.
For more details of the operation principle and the operation mode of the speech recognition apparatus 3, reference may be made to the related description in fig. 1 and fig. 2, and details are not repeated here.
Further, the embodiment of the present invention also discloses a storage medium having computer instructions stored thereon; when the computer instructions are executed, the technical solutions of the methods in the embodiments shown in fig. 1 and fig. 2 are performed. Preferably, the storage medium may be a computer-readable storage medium such as a non-volatile or non-transitory memory, and may include a ROM, a RAM, a magnetic disk, or an optical disk.
Further, the embodiment of the present invention also discloses a computing device, which includes a memory and a processor, where the memory stores computer instructions executable on the processor, and the processor executes the computer instructions to perform the technical solutions of the methods described in the embodiments shown in fig. 1 and fig. 2.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (11)
1. A speech recognition method, comprising:
Extracting emotion feature vectors from a set of voice data and converting the set of voice data into text data;
Training to obtain an emotion detection model based on the emotion feature vector and the text data, wherein the emotion detection model is used for calculating an emotion score;
Calculating the emotion score of the voice data to be detected based on the voice data to be detected and the emotion detection model;
And judging whether the voice data to be detected has violation risk or not based on the emotion score.
2. The voice recognition method according to claim 1, wherein the determining whether the voice data to be detected has a violation risk based on the emotion score comprises:
And when the emotion score is higher than a preset threshold value, determining that the voice data to be detected has violation risk.
3. The speech recognition method of claim 2, further comprising:
And marking the voice data to be detected with violation risk.
4. The speech recognition method of any one of claims 1 to 3, wherein the training to obtain an emotion detection model based on the emotion feature vector and the text data comprises:
and training by adopting a neural network algorithm based on the emotion feature vector and the text data to obtain the emotion detection model.
5. The speech recognition method of any one of claims 1 to 3, wherein the training to obtain an emotion detection model based on the emotion feature vector and the text data comprises:
And training by adopting a logistic regression algorithm based on the emotion feature vector and the text data to obtain the emotion detection model.
6. A speech recognition method according to any one of claims 1 to 3, wherein the emotion feature vector is used to represent an emotion type selected from: happiness, sadness, anger, fear, disgust.
7. The speech recognition method of any one of claims 1 to 3, wherein the converting the set of speech data into text data comprises:
And converting the voice data into the text data by adopting a voice-to-text technology.
8. The speech recognition method of any one of claims 1 to 3, wherein the voice data comprises voice data of a first role and voice data of a second role, and wherein extracting an emotion feature vector from a set of voice data and converting the set of voice data into text data comprises:
separating the voice data of the first role from the voice data of the second role in the set of voice data to obtain the voice data of the first role and the voice data of the second role; and
extracting emotion feature vectors from the voice data of the first role and the voice data of the second role respectively, and converting the voice data of the first role and the voice data of the second role into text data respectively.
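The role-separation step of claim 8 presupposes some diarization component that labels each audio segment with a speaker. Given such labels (a hypothetical upstream output), the grouping itself is straightforward:

```python
def split_by_role(segments):
    """segments: list of (role_id, audio_chunk) pairs from a hypothetical
    diarization step. Returns the chunks grouped per role, ready for
    per-role feature extraction and transcription as claim 8 describes."""
    roles = {}
    for role, chunk in segments:
        roles.setdefault(role, []).append(chunk)
    return roles
```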
9. A speech recognition apparatus, comprising:
an extraction module configured to extract an emotion feature vector from a set of voice data and convert the set of voice data into text data;
a training module configured to train an emotion detection model based on the emotion feature vector and the text data, the emotion detection model being used to calculate an emotion score;
a calculation module configured to calculate the emotion score of voice data to be detected based on the voice data to be detected and the emotion detection model; and
a judging module configured to determine whether the voice data to be detected has a violation risk based on the emotion score.
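The four modules of claim 9 can be mirrored as a single object with injected callables; this is only a structural sketch of the apparatus, not the claimed implementation:

```python
class SpeechRecognitionApparatus:
    """Structural mirror of claim 9's modules; extract/train/score are
    hypothetical stand-ins for the real module logic."""

    def __init__(self, extract, train, score, threshold=0.8):
        self.extract = extract      # extraction module
        self.train = train          # training module
        self.score = score          # calculation module
        self.threshold = threshold  # preset value used by the judging module

    def judge(self, voice_data, model) -> bool:
        """Judging module: flag a violation risk when the score exceeds
        the preset threshold (0.8 here is illustrative)."""
        return self.score(voice_data, model) > self.threshold
```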
10. A storage medium having computer instructions stored thereon, wherein the computer instructions, when executed, perform the steps of the method of any one of claims 1 to 8.
11. A computing device comprising a memory and a processor, the memory having stored thereon computer instructions executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910967019.6A CN110580899A (en) | 2019-10-12 | 2019-10-12 | Voice recognition method and device, storage medium and computing equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110580899A true CN110580899A (en) | 2019-12-17 |
Family
ID=68814476
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910967019.6A Pending CN110580899A (en) | 2019-10-12 | 2019-10-12 | Voice recognition method and device, storage medium and computing equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110580899A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107705807A (en) * | 2017-08-24 | 2018-02-16 | 平安科技(深圳)有限公司 | Voice quality detecting method, device, equipment and storage medium based on Emotion identification |
CN107886951A (en) * | 2016-09-29 | 2018-04-06 | 百度在线网络技术(北京)有限公司 | A kind of speech detection method, device and equipment |
CN107885723A (en) * | 2017-11-03 | 2018-04-06 | 广州杰赛科技股份有限公司 | Conversational character differentiating method and system |
CN108122552A (en) * | 2017-12-15 | 2018-06-05 | 上海智臻智能网络科技股份有限公司 | Voice mood recognition methods and device |
CN109102805A (en) * | 2018-09-20 | 2018-12-28 | 北京长城华冠汽车技术开发有限公司 | Voice interactive method, device and realization device |
US20190198040A1 (en) * | 2017-12-22 | 2019-06-27 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Mood recognition method, electronic device and computer-readable storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111291184A (en) * | 2020-01-20 | 2020-06-16 | 百度在线网络技术(北京)有限公司 | Expression recommendation method, device, equipment and storage medium |
CN111291184B (en) * | 2020-01-20 | 2023-07-18 | 百度在线网络技术(北京)有限公司 | Expression recommendation method, device, equipment and storage medium |
CN111405128A (en) * | 2020-03-24 | 2020-07-10 | 中国—东盟信息港股份有限公司 | Call quality inspection system based on voice-to-text conversion |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110910901B (en) | Emotion recognition method and device, electronic equipment and readable storage medium | |
CN109658923B (en) | Speech quality inspection method, equipment, storage medium and device based on artificial intelligence | |
CN106847305B (en) | Method and device for processing recording data of customer service telephone | |
CN112951275B (en) | Voice quality inspection method and device, electronic equipment and medium | |
CN109192225B (en) | Method and device for recognizing and marking speech emotion | |
CN111597818B (en) | Call quality inspection method, device, computer equipment and computer readable storage medium | |
CN112966082A (en) | Audio quality inspection method, device, equipment and storage medium | |
CN111462758A (en) | Method, device and equipment for intelligent conference role classification and storage medium | |
CN112509561A (en) | Emotion recognition method, device, equipment and computer readable storage medium | |
CN110580899A (en) | Voice recognition method and device, storage medium and computing equipment | |
CN113505606B (en) | Training information acquisition method and device, electronic equipment and storage medium | |
CN112364622A (en) | Dialog text analysis method, dialog text analysis device, electronic device and storage medium | |
CN108962228B (en) | Model training method and device | |
CN113516994B (en) | Real-time voice recognition method, device, equipment and medium | |
US9875236B2 (en) | Analysis object determination device and analysis object determination method | |
WO2020199590A1 (en) | Mood detection analysis method and related device | |
CN114254088A (en) | Method for constructing automatic response model and automatic response method | |
CN114418320A (en) | Customer service quality evaluation method, apparatus, device, medium, and program product | |
CN115273854B (en) | Service quality determining method and device, electronic equipment and storage medium | |
CN111061815A (en) | Conversation data classification method | |
CN117271778B (en) | Insurance outbound session information output method and device based on generation type large model | |
CN115759113B (en) | Method and device for identifying sentence semantics in dialogue information | |
CN107798480B (en) | Service quality evaluation method and system for customer service | |
CN115050370A (en) | Target text acquisition method and device based on voice recognition and storage medium | |
CN113889149B (en) | Speech emotion recognition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191217 |