CN116092481A - Scoring method and device based on voice data, electronic device and storage medium

Scoring method and device based on voice data, electronic device and storage medium

Info

Publication number: CN116092481A
Application number: CN202211714558.7A
Authority: CN (China)
Prior art keywords: voice, voice data, analysis, analysis result, determining
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 谢基有 (Xie Jiyou), 李亚桐 (Li Yatong)
Current Assignee: Shenzhen Digital Miracle Technology Co ltd; Voiceai Technologies Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Shanghai Shengyang Yunhan Information Technology Co ltd; Voiceai Technologies Co ltd
Priority date: 2022-12-29 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2022-12-29
Application filed by: Shanghai Shengyang Yunhan Information Technology Co ltd and Voiceai Technologies Co ltd


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/01: Assessment or evaluation of speech recognition systems
    • G10L 15/26: Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides a scoring method and apparatus based on voice data, an electronic device, and a storage medium. The scoring method includes: acquiring contextual chat information, where the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, and the voice chat messages correspond to the sparring chat messages one to one; determining voice data input by a user according to the plurality of voice chat messages in the contextual chat information; analyzing the voice data to obtain an analysis result; and determining a score corresponding to the voice data according to the analysis result. Because the voice data is determined from the voice chat messages and its score is determined from the analysis result of the voice data, the voice data input by a user who answers by voice can be scored automatically, without manual scoring, which reduces labor cost and improves scoring accuracy.

Description

Scoring method and device based on voice data, electronic device and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a scoring method, apparatus, electronic device, and storage medium based on voice data.
Background
With the development of technology, more and more learning takes place on the internet; for example, enterprises can train their staff through the network. In some scenarios, in order to simulate real situations, users are often required to answer by inputting voice.
However, when the user answers by voice, the answer currently has to be scored manually. This makes the labor cost of enterprises high; moreover, manual scoring is subjective and cannot reflect the user's true level, so the training effect is poor.
Disclosure of Invention
In view of the above, the present application proposes a scoring method and apparatus based on voice data, an electronic device, and a storage medium, so as to improve upon the above problems.
In a first aspect, an embodiment of the present application provides a scoring method based on voice data, where the method includes: acquiring contextual chat information, where the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, and the voice chat messages correspond to the sparring chat messages one to one; determining voice data input by a user according to the plurality of voice chat messages in the contextual chat information; analyzing the voice data to obtain an analysis result; and determining a score corresponding to the voice data according to the analysis result.
In a second aspect, an embodiment of the present application further provides a scoring apparatus based on voice data, where the apparatus includes: an acquisition module, configured to acquire contextual chat information, where the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, and the voice chat messages correspond to the sparring chat messages one to one; a first determining module, configured to determine voice data input by a user according to the plurality of voice chat messages in the contextual chat information; an analysis module, configured to analyze the voice data to obtain an analysis result; and a second determining module, configured to determine a score corresponding to the voice data according to the analysis result.
In a third aspect, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor, where the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the scoring method based on voice data according to the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions for enabling an electronic device to perform the scoring method based on voice data according to the first aspect.
The application thus provides a scoring method and apparatus based on voice data, an electronic device, and a storage medium, the scoring method including: acquiring contextual chat information, where the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, and the voice chat messages correspond to the sparring chat messages one to one; determining voice data input by a user according to the plurality of voice chat messages in the contextual chat information; analyzing the voice data to obtain an analysis result; and determining a score corresponding to the voice data according to the analysis result. Because the voice data is determined from the voice chat messages and its score is determined from the analysis result of the voice data, the voice data input by a user who answers by voice can be scored automatically, without manual scoring, which reduces labor cost and improves scoring accuracy.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings required for the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, not all of them; all other embodiments and figures obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the present invention.
Fig. 1 is an application scenario schematic diagram of a scoring method based on voice data according to an embodiment of the present application.
Fig. 2 is a flowchart of a scoring method based on voice data according to an embodiment of the present application.
Fig. 3 is a schematic flow chart of a scoring method based on voice data according to an embodiment of the present application.
Fig. 4 is another flow chart of a scoring method based on voice data according to an embodiment of the present application.
Fig. 5 is a schematic flow chart of another scoring method based on voice data according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a scoring device based on voice data according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 8 is a block diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings. The described embodiments are only some, not all, embodiments of the present invention; all other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
With the development of technology, more and more learning takes place online; for example, enterprises can train their staff through the internet. In real scenarios, such as a bank teller communicating with a client, or customer service receiving consultation calls about products or services, clients and staff increasingly interact directly by voice, so current staff training often requires users to answer by voice in order to improve the training effect.
However, at present, when a user answers by voice, the answer usually has to be scored manually, which increases the labor cost of enterprises. Moreover, manual scoring is subjective and cannot reflect the user's true level: after finishing training, users may still face frequent complaints and poor communication when dealing with clients on their own, so the training effect is poor.
To improve upon these problems, the inventors propose a scoring method and apparatus based on voice data, an electronic device, and a storage medium. The method includes: acquiring contextual chat information, where the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, and the voice chat messages correspond to the sparring chat messages one to one; determining voice data input by a user according to the plurality of voice chat messages in the contextual chat information; analyzing the voice data to obtain an analysis result; and determining a score corresponding to the voice data according to the analysis result.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic diagram of an application scenario of a scoring method based on voice data according to an embodiment of the present application. The scoring method based on voice data provided in the embodiments of the present application may be applied to the electronic device 10 in fig. 1. The electronic device 10 may be provided with a display screen, and the display screen may display a chat interface, for example, the contextual chat scene 100 in fig. 1; as shown in fig. 1, the contextual chat scene 100 includes a plurality of sparring chat messages 110 and a plurality of voice chat messages 120.
In some implementations, the electronic device also has a voice receiving device, such as a microphone, through which the user can input the voice chat messages 120.
In some implementations, the electronic device also has a voice output device, such as a speaker, through which the electronic device can output a sparring chat message 110 when that message is audio.
In some embodiments, contextual chat information may be obtained through the contextual chat scene 100; the contextual chat information includes a plurality of voice chat messages 120 and a plurality of sparring chat messages 110, where the voice chat messages 120 correspond to the sparring chat messages 110 one to one.
Optionally, the contextual chat scene 100 further includes an input control 130. After the electronic device outputs a sparring chat message 110, if the user triggers the input control 130, the voice chat message 120 input by the user is acquired and a correspondence between that voice chat message 120 and the sparring chat message 110 is established.
In some embodiments, if the sparring chat messages 110 are audio information, the sparring chat messages 110 and the voice chat messages 120 need not be displayed in the contextual chat scene 100 but may instead be integrated into one piece of audio; that is, recording starts when the electronic device 10 outputs the first sparring chat message 110 and continues until the last voice chat message 120 is received, so that the sparring chat messages 110 and the voice chat messages 120 are recorded in the audio in time order.
In some embodiments, the contextual chat information further includes context information corresponding to the contextual chat scene 100, and the context information may include scenario attributes, client attributes, and the like.
Optionally, the scenario attributes include banking counter service, payment reminder notification, customer service reception, return visit, product information update notification, etc.; the client attributes include whether the client is in arrears, whether the client is overdue, whether the client has made a purchase, communication difficulty, client category, etc.
Further, the client category includes information for classifying clients, such as male, female, good credit, poor credit, domestic client, foreign client, and the like.
In some implementations, the sparring chat messages 110 are determined by the context information. For example, if the scenario attribute is "payment reminder notification" and the client attributes are "overdue: yes" and "good credit", the sparring chat messages 110 are used to simulate dialogue sent by an overdue client with good credit who needs to be reminded to pay; if the scenario attribute is "payment reminder notification" and the client attributes are "overdue: no" and "foreign client", the sparring chat messages 110 are used to simulate dialogue sent by a foreign client who is not yet overdue and needs to be reminded about payment.
In some embodiments, the sparring chat messages 110 corresponding to different context information may be stored in advance, so that they are automatically acquired and displayed when the user performs a scenario exercise, that is, when the user is in the contextual chat scene 100. It is understood that a sparring chat message may be text information or audio information.
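As a minimal sketch of such a prestored mapping, the following snippet (not part of the patent; the attribute keys and message texts are hypothetical placeholders) keys sparring scripts by scenario and client attributes:

```python
# Hypothetical prestored mapping from context information to sparring chat
# messages; the attribute keys and message texts are illustrative only.
SPARRING_SCRIPTS = {
    ("payment reminder notification", "overdue: yes", "good credit"): [
        "Hello, why are you calling me?",
        "I forgot about the bill. When is it due?",
    ],
    ("payment reminder notification", "overdue: no", "foreign client"): [
        "Hi, what is this call about?",
    ],
}

def get_sparring_messages(scenario_attr, *client_attrs):
    """Look up the prestored sparring messages for a given context."""
    return SPARRING_SCRIPTS.get((scenario_attr, *client_attrs), [])
```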
Referring to fig. 2, fig. 2 is a flowchart of a scoring method based on voice data according to an embodiment of the present application. As shown in fig. 2, the method includes: step 210 to step 240.
Step 210: acquiring contextual chat information; the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, where the voice chat messages correspond to the sparring chat messages one to one.
In some embodiments, the contextual chat information further includes context information, where the context information may include scenario attributes, client attributes, and the like; for details, refer to the description in the foregoing part of the specification, which is not repeated herein.
In some embodiments, the contextual chat information further includes an input time, an output time, and a user identity identifier for each voice chat message. Optionally, the input time is the time when the user starts inputting the voice chat message, the output time is the time when the input of the voice chat message is completed, and the user identity identifier is used to reflect the user's identity.
Illustratively, when the user inputs a voice chat message by triggering the input control 130 in fig. 1, the time when the user triggers the input control 130 is the input time, and the time when the user stops triggering the input control 130 is the output time. Further, the user identity identifier may be a user ID, a user account, or other information that uniquely reflects the user's identity.
In some embodiments, the contextual chat information further includes the output time of each sparring chat message. It will be appreciated that after the system obtains a sparring chat message according to the context information, the sparring chat message may be directly output in a display interface, for example, the contextual chat scene 100 in fig. 1, for the user to view or interact with.
Step 220: determining voice data input by the user according to the plurality of voice chat messages in the contextual chat information.
In some embodiments, each voice chat message is input by the user, so the voice data input by the user can be determined from the plurality of voice chat messages.
Step 230: analyzing the voice data to obtain an analysis result.
In some embodiments, the voice data may be analyzed in multiple dimensions to obtain multiple analysis results. The multidimensional analysis includes one or more of a voice analysis mode, a text analysis mode, and a flow completion analysis, which yield one or more of a voice analysis result, a text analysis result, and a flow completion degree, and the final analysis result is obtained from one or more of these. It can be understood that the more dimensions are analyzed, the more accurately the voice data can be scored; obtaining multidimensional analysis results through multidimensional analysis therefore makes the scoring more accurate.
In some embodiments, the voice analysis mode directly analyzes the voice data, the text analysis mode analyzes text information obtained from the voice data, and the flow completion analysis analyzes the contextual chat information from which the voice data was obtained; the analysis results of the three modes are combined to obtain the final score of the voice data.
Step 240: determining the score corresponding to the voice data according to the analysis result.
In some embodiments, the score corresponding to the voice data may be determined from the analysis result according to a preset scoring rule, as described in the following parts of the specification.
Referring to fig. 3 again, fig. 3 is a schematic flow chart of a scoring method based on voice data according to an embodiment of the present application. As shown in fig. 3, the method 300 includes: steps 310 to 340.
Step 310: acquiring contextual chat information; the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, where the voice chat messages correspond to the sparring chat messages one to one.
Step 320: determining voice data input by the user according to the plurality of voice chat messages in the contextual chat information.
The specific content of steps 310-320 can refer to steps 210-220, and will not be described herein.
Step 330: analyzing the voice data according to a voice analysis mode to obtain a voice analysis result; the voice analysis mode includes at least one of identity analysis, speech rate analysis, volume analysis, emotion analysis, and timeout analysis.
Step 340: determining the score corresponding to the voice data according to the voice analysis result.
In some embodiments, according to the voice analysis result, the score corresponding to the voice data may be determined according to a preset scoring rule.
In some implementations, the voice analysis mode includes timeout analysis, and step 330 includes:
(1) Determining a reply interval between each voice chat message and the corresponding sparring chat message.
(2) Determining a timeout analysis result according to all reply intervals.
At this point, step 340 includes:
(3) Determining the score corresponding to the voice data according to the timeout analysis result.
In some embodiments, the sparring chat message is text information; in this case, the reply interval may be determined according to the output time of the voice chat message and the output time of the corresponding sparring chat message. Illustratively, the reply interval is the output time of the voice chat message minus the output time of the corresponding sparring chat message.
In some embodiments, the sparring chat message is voice information; in this case, contextual chat audio may be generated from the plurality of voice chat messages and the plurality of sparring chat messages in time order, and the reply interval between each voice chat message and the corresponding sparring chat message may be determined by silence detection.
In some embodiments, the sound intensity of the contextual chat audio may be identified in real time; when the sound intensity is less than or equal to a preset sound intensity, the current state is determined to be a silence state, and the reply interval between each voice chat message and the corresponding sparring chat message can be determined from the duration of the silence state between them.
In some implementations, the timeout analysis results include "timeout" and "not timeout".
In some embodiments, the timeout analysis result is "timeout" whenever a certain reply interval is greater than a preset reply interval, and "not timeout" if all reply intervals are less than or equal to the preset reply interval.
In some embodiments, a reference score may be preset for the user; when the timeout analysis result is "timeout", points are deducted from the reference score to determine the score corresponding to the voice data, and when the timeout analysis result is "not timeout", no points are deducted.
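A minimal sketch of this timeout check, assuming each reply is paired with its sparring message's output timestamp in seconds (the 5-second threshold and 10-point deduction are illustrative assumptions, not values from the patent):

```python
PRESET_REPLY_INTERVAL = 5.0  # assumed threshold, in seconds
TIMEOUT_DEDUCTION = 10       # assumed deduction value

def timeout_analysis(pairs):
    """pairs: list of (sparring_output_time, reply_output_time) tuples.
    Returns 'timeout' if any reply interval exceeds the preset interval."""
    intervals = [reply_t - sparring_t for sparring_t, reply_t in pairs]
    if any(iv > PRESET_REPLY_INTERVAL for iv in intervals):
        return "timeout"
    return "not timeout"

def timeout_score(pairs, reference_score=100):
    """Deduct from the reference score only when a timeout occurred."""
    result = timeout_analysis(pairs)
    return reference_score - (TIMEOUT_DEDUCTION if result == "timeout" else 0)
```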
In this way, the user is urged to reply to the client's messages in time, a real dialogue scenario is simulated, and the training effect is improved.
In some embodiments, the voice analysis mode includes identity analysis, and step 330 includes:
(1) Performing voiceprint recognition on the voice data to obtain voice voiceprint features.
(2) Acquiring preset voiceprint features corresponding to the user.
(3) Performing identity analysis according to the voice voiceprint features and the preset voiceprint features, and determining an identity analysis result.
At this point, step 340 includes:
(4) Determining the score corresponding to the voice data according to the identity analysis result.
In some embodiments, voiceprint recognition is performed on the voice data by a trained voiceprint recognition model to obtain the voice voiceprint features corresponding to the voice data.
In some embodiments, the user is also required to input voice when registering an account, and voiceprint recognition is performed on the voice input during registration to obtain the preset voiceprint features; the preset voiceprint features may be stored on a server or in a database, so that they can be fetched after the voice voiceprint features are obtained.
In some embodiments, if the voice voiceprint features match the preset voiceprint features, for example, the similarity between them is greater than a preset similarity, such as 80%, this indicates that the user is answering in person, and the identity analysis result may be "the user"; if the voice voiceprint features do not match the preset voiceprint features, this indicates that someone else is answering, and the identity analysis result may be "not the user".
In some embodiments, a reference score may be preset for the user; when the identity analysis result is "not the user", points are deducted from the reference score to determine the score corresponding to the voice data, and when the identity analysis result is "the user", no points are deducted.
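A sketch of the identity check under the 80% example above, assuming voiceprints are compared as embedding vectors via cosine similarity (the embedding extraction itself is outside this sketch):

```python
import numpy as np

PRESET_SIMILARITY = 0.80  # the "greater than 80%" example threshold

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identity_analysis(voice_voiceprint, preset_voiceprint):
    """Return 'the user' if the reply's voiceprint matches the enrolled one."""
    similarity = cosine_similarity(voice_voiceprint, preset_voiceprint)
    return "the user" if similarity > PRESET_SIMILARITY else "not the user"
```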
In this way, other people can be prevented from taking the user's place in the scenario exercise, which increases the difficulty of cheating.
In some embodiments, the voice analysis mode includes speech rate analysis, and step 330 includes:
(1) Performing speech rate analysis on the voice data, and determining the speech rate of the voice data at each moment.
(2) Determining a speech rate analysis result according to the speech rate of the voice data at each moment.
At this point, step 340 includes:
(3) Determining the score corresponding to the voice data according to the speech rate analysis result.
In some embodiments, the speech rate analysis result may be "speech rate too low", "speech rate normal", or "speech rate too high"; or it may be a specific speech rate value, in words per minute.
For example, the speech rate analysis result may be determined as "too high" whenever the speech rate at a certain moment is higher than the maximum value of the preset speech rate range, as "too low" whenever the speech rate at a certain moment is lower than the minimum value of the preset speech rate range, and as "normal" when the speech rate at all moments is in the preset speech rate range.
For example, the average speech rate may be determined according to the speech rate at all times, and when the average speech rate is higher than the maximum value of the preset speech rate range, the speech rate analysis result is determined as "too high", when the average speech rate is lower than the minimum value of the preset speech rate range, the speech rate analysis result is determined as "too low", and when the average speech rate is within the preset speech rate range, the speech rate analysis result is determined as "normal".
Further, a reference score may be preset for the user; when the speech rate analysis result is "speech rate too low" or "speech rate too high", points are deducted from the reference score to obtain the score corresponding to the voice data. When the analysis result is a specific speech rate value and the user's speech rate falls outside the preset speech rate range, points may be deducted on the basis of the reference score according to the deviation; for example, the further the user's speech rate is from the preset range, the more points are deducted.
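A sketch of the average-speech-rate variant, with a deduction proportional to the deviation from the range (the range and the deduction rate are illustrative assumptions):

```python
SPEECH_RATE_RANGE = (120, 200)  # assumed preset range, words per minute

def speech_rate_analysis(rates):
    """Classify the average of per-moment speech rates against the range."""
    avg = sum(rates) / len(rates)
    low, high = SPEECH_RATE_RANGE
    if avg > high:
        return "speech rate too high", avg
    if avg < low:
        return "speech rate too low", avg
    return "speech rate normal", avg

def speech_rate_score(rates, reference_score=100, points_per_10wpm=2):
    """Deduct more points the further the average rate is from the range."""
    _, avg = speech_rate_analysis(rates)
    low, high = SPEECH_RATE_RANGE
    deviation = max(low - avg, avg - high, 0.0)
    return reference_score - points_per_10wpm * int(deviation // 10)
```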
In some implementations, the voice analysis mode includes volume analysis, and step 330 includes:
(1) Performing volume analysis on the voice data, and determining the volume of the voice data at each moment.
(2) Determining a volume analysis result according to the volume of the voice data at each moment.
At this point, step 340 includes:
(3) Determining the score corresponding to the voice data according to the volume analysis result.
In some embodiments, the volume analysis result may be "volume too low", "volume normal", or "volume too high"; or it may be a specific volume value, in decibels.
For example, the volume analysis result may be determined as "volume too high" whenever the volume at a certain time is higher than the maximum value of the preset volume range, as "volume too low" whenever the volume at a certain time is lower than the minimum value of the preset volume range, and as "volume normal" when the volume at all times is within the preset volume range.
For example, the average volume may be determined according to the volume at all times, and when the average volume is higher than the maximum value of the preset volume range, the volume analysis result is determined as "volume too high", when the average volume is lower than the minimum value of the preset volume range, the volume analysis result is determined as "volume too low", and when the average volume is within the preset volume range, the volume analysis result is determined as "volume normal".
Further, a reference score may be preset for the user; when the volume analysis result is "volume too low" or "volume too high", points are deducted from the reference score to obtain the user's answer score. When the analysis result is a specific volume value and the user's volume falls outside the preset volume range, points may be deducted on the basis of the reference score according to the deviation; for example, the further the user's volume is from the preset range, the more points are deducted.
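The volume check follows the same pattern; a sketch of the per-moment variant, assuming decibel values and an illustrative range:

```python
VOLUME_RANGE_DB = (40, 75)  # assumed preset volume range, in decibels

def volume_analysis(volumes_db):
    """Flag the utterance if the volume at any moment leaves the range."""
    low, high = VOLUME_RANGE_DB
    if any(v > high for v in volumes_db):
        return "volume too high"
    if any(v < low for v in volumes_db):
        return "volume too low"
    return "volume normal"
```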
In some embodiments, the voice analysis mode includes emotion analysis, and step 330 includes:
(1) Performing emotion analysis on the voice data, and determining an emotion analysis result.
At this point, step 340 includes:
(2) Determining the score corresponding to the voice data according to the emotion analysis result.
In some embodiments, the emotion analysis result may be "normal emotion" or "abnormal emotion".
In some embodiments, semantic analysis and volume analysis may be performed on the voice data, semantic analysis results and volume analysis results may be determined, and emotion analysis results may be determined based on the semantic analysis results and the volume analysis results.
In some embodiments, text recognition may be performed on the voice data to obtain text information corresponding to the voice data, and semantic analysis may be performed according to the text information using a semantic recognition model to obtain a semantic analysis result.
Further, the step of obtaining the volume analysis result may refer to the description of the above part of the specification, and will not be described herein.
In some embodiments, the type of neural network employed in the semantic recognition model may be, for example, a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), or a DNN (Deep Neural Network), and can be selected according to actual needs.
In some embodiments, the emotion may be determined as one of positive, neutral, and negative according to the volume analysis result and the semantic analysis result. Illustratively, the louder the volume, the more likely the emotion is negative; when the semantics include certain keywords, for example profanity, the emotion is more likely to be negative, whereas when the semantics include keywords such as "please" or "excuse me", the emotion is more likely to be neutral or positive.
In some embodiments, emotion analysis may be performed on the voice data directly through the emotion recognition model, to obtain emotion analysis results. Further, the results obtained by the emotion recognition model can be in three categories of positive emotion, neutral emotion and negative emotion.
In some embodiments, the emotion analysis result is determined to be "normal emotion" when the emotion is positive or neutral, and may be determined to be "abnormal emotion" when the emotion is negative.
Further, a reference score may be preset for the user; points are deducted from the reference score when the emotion analysis result is "abnormal emotion" and are not deducted when the emotion analysis result is "normal emotion".
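A sketch of the rule-based combination of volume and semantic cues described above (the keyword lists and the volume threshold are illustrative assumptions; a trained emotion recognition model could be substituted for these rules):

```python
NEGATIVE_KEYWORDS = {"stupid", "nonsense"}       # stand-ins for profanity
POLITE_KEYWORDS = {"please", "sorry", "excuse"}  # softening keywords
LOUD_DB = 75  # assumed volume above which negative emotion is more likely

def emotion_analysis(avg_volume_db, words):
    """Map volume and semantic cues to an emotion, then to the result label."""
    tokens = set(words)
    if tokens & NEGATIVE_KEYWORDS or avg_volume_db > LOUD_DB:
        emotion = "negative"
    elif tokens & POLITE_KEYWORDS:
        emotion = "positive"
    else:
        emotion = "neutral"
    return "normal emotion" if emotion != "negative" else "abnormal emotion"
```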
In some embodiments, after the voice data is analyzed in multiple analysis modes, multiple scores corresponding to the voice data are obtained, for example, one score from the identity analysis mode and one from the volume analysis mode. In this case, different weight values can be set for the different analysis modes, and the final score of the voice data is calculated from those weight values.
In some embodiments, the identity analysis mode may not be assigned a weight value: if the identity analysis fails, the final score of the voice data is directly set to the minimum value; if the identity analysis passes, the final score of the voice data is calculated from the scores obtained by the remaining analysis modes and their corresponding weight values.
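A sketch of this gated weighting (the weight values are illustrative assumptions):

```python
# Assumed weights for the non-identity analysis modes; they sum to 1.0.
WEIGHTS = {"timeout": 0.2, "speech_rate": 0.3, "volume": 0.2, "emotion": 0.3}
MIN_SCORE = 0

def final_voice_score(identity_ok, sub_scores):
    """Set the minimum score if identity analysis fails; otherwise take a
    weighted sum of the remaining sub-scores."""
    if not identity_ok:
        return MIN_SCORE
    return sum(WEIGHTS[name] * score for name, score in sub_scores.items())

# Example: final_voice_score(True, {"timeout": 90, "speech_rate": 100,
#                                   "volume": 80, "emotion": 100}) -> 94.0
```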
Referring to fig. 4, fig. 4 is another flowchart of a scoring method based on voice data according to an embodiment of the present application. As shown in fig. 4, the method 400 includes: steps 410 to 470.
Step 410: acquiring scene chat information; the contextual chat information comprises a plurality of voice chat information and a plurality of opposite chat information; wherein, the voice chat information corresponds to the training chat information one by one.
The details of step 410 may refer to step 210, and will not be described herein.
Step 420: and determining voice data input by the user according to the plurality of voice chat information in the contextual chat information.
The details of step 420 may refer to step 220, and will not be described herein.
Step 430: determining the number of voice chat messages in the voice data.
In some embodiments, the voice data is derived from the plurality of voice chat messages, so the number of voice chat messages in the voice data can be determined directly.
Step 440: determining the context information corresponding to the contextual chat information.
Specifically, the above portions of the specification have already explained that the contextual chat information may include contextual information, which is not further explained herein.
Step 450: determining a preset number threshold corresponding to the voice chat messages according to the context information.
Specifically, different context information corresponds to different scenario flows, so the number of voice chat messages the user is required to input differs, and the determined preset number threshold differs accordingly. For example, when the context information is a product information update notification, the scenario flow is usually short and the user is required to input only a few voice chat messages; when the context information is banking counter service, the scenario flow is usually long and the user is required to input more voice chat messages.
Step 460: determining the flow completion degree according to the number of voice chat messages and the preset number threshold.
In some embodiments, flow completion degree = (number of voice chat messages / preset number threshold) × 100%.
Step 470: determining the score corresponding to the voice data according to the flow completion degree.
In some embodiments, a reference score may be preset for the user, and different deduction values may be determined according to different flow completion degrees. Illustratively, no points are deducted when the flow completion degree is 100%, and 10 points are deducted when the flow completion degree is 50%.
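A sketch of steps 430 to 470, with the deduction rule chosen so that 50% completion costs 10 points as in the example (the per-context threshold values are illustrative assumptions):

```python
# Hypothetical preset number thresholds per context; values are assumptions.
PRESET_COUNT_THRESHOLDS = {
    "product information update notification": 3,
    "banking counter service": 8,
}

def flow_completion(num_voice_msgs, scenario_attr):
    """Flow completion degree = (message count / preset threshold) * 100%."""
    threshold = PRESET_COUNT_THRESHOLDS[scenario_attr]
    return min(num_voice_msgs / threshold, 1.0) * 100

def flow_completion_score(num_voice_msgs, scenario_attr, reference_score=100):
    """Deduct more points the lower the completion, e.g. -10 points at 50%."""
    completion = flow_completion(num_voice_msgs, scenario_attr)
    return reference_score - round((100 - completion) / 5)
```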
In this way, the user is prevented from ending the dialogue early, which would leave the dialogue flow incomplete and omit important dialogue information; the training effect is thereby improved.
Referring to fig. 5, fig. 5 is a schematic flowchart of another scoring method based on voice data according to an embodiment of the present application. As shown in fig. 5, the method 500 includes: steps 510 to 550.
Step 510: acquiring contextual chat information; the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, where the voice chat messages correspond to the sparring chat messages one to one.
The details of step 510 may refer to step 210, and will not be described herein.
Step 520: determining voice data input by the user according to the plurality of voice chat messages in the contextual chat information.
The details of step 520 may refer to step 220, and will not be described herein.
Step 530: performing text recognition on the voice data to obtain text information corresponding to the voice data.
In some embodiments, the voice data may be recognized by a preset, trained speech recognition model to obtain the text information.
Step 540: analyzing the text information according to a text analysis mode to obtain a text analysis result; the text analysis mode includes at least one of sensitive word analysis, keyword analysis, and speaking accuracy analysis.
In some embodiments, the text information may be analyzed according to the trained text information analysis model to obtain a text analysis result.
Step 550: determining the score corresponding to the voice data according to the text analysis result.
In some embodiments, according to the text analysis result, the score corresponding to the voice data can be determined according to a preset scoring rule.
In some implementations, the text analysis mode includes speaking accuracy analysis, and step 540 includes:
(1) Determining reference text information according to the plurality of sparring chat messages in the contextual chat information.
(2) Performing speaking accuracy analysis according to the text information and the reference text information, and determining a speaking accuracy analysis result.
At this point, step 550 includes:
(3) Determining the score corresponding to the voice data according to the speaking accuracy analysis result.
In some embodiments, each sparring chat message corresponds to a piece of reference text information; a first correspondence between sparring chat messages and reference text information may be preset, and after a sparring chat message is determined, its reference text information is determined according to the first correspondence.
In some embodiments, a first semantic meaning corresponding to the text information and a second semantic meaning corresponding to the reference text information may be identified separately, and the semantic similarity between them, that is, the speaking accuracy, may be determined through a semantic similarity algorithm or a trained semantic similarity model, so as to obtain the speaking accuracy analysis result.
In some embodiments, the semantic similarity between the first semantic and the second semantic may also be calculated by determining a Chinese character similarity and a pinyin similarity.
In some implementations, the speaking accuracy analysis result is the percentage similarity between the first semantic meaning and the second semantic meaning.
In some embodiments, a reference score may be preset for the user, and different deductions may be determined according to different similarity percentages. Illustratively, no points are deducted when the similarity percentage is greater than or equal to 80%, and 10 points are deducted when it is less than 80%.
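A sketch of the accuracy scoring, using difflib string similarity as a crude stand-in for the semantic similarity model (the model itself is not specified here; the 80% threshold and 10-point deduction follow the example above):

```python
from difflib import SequenceMatcher

def speaking_accuracy(text, reference_text):
    """Percentage similarity; a trained semantic model would replace this."""
    return SequenceMatcher(None, text, reference_text).ratio() * 100

def speaking_accuracy_score(text, reference_text, reference_score=100):
    """No deduction at >= 80% similarity, 10 points deducted below it."""
    accurate = speaking_accuracy(text, reference_text) >= 80
    return reference_score - (0 if accurate else 10)
```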
In some implementations, the text analysis mode includes sensitive word analysis, and step 540 includes:
(1) Performing sensitive word analysis on the text information to obtain a sensitive word analysis result.
At this point, step 550 includes:
(2) Determining the score corresponding to the voice data according to the sensitive word analysis result.
In some embodiments, the sensitive word analysis result may be "contains sensitive words" or "does not contain sensitive words", or may be the specific sensitive words themselves.
Further, a reference score may be preset for the user; points are deducted from the reference score when the sensitive word analysis result is "contains sensitive words" and are not deducted when the result is "does not contain sensitive words". Alternatively, points may be deducted on the basis of the reference score according to the deduction value corresponding to each detected sensitive word, where different sensitive words may have different deduction values.
In some implementations, after a sensitive word is detected, it may be highlighted in the text information to inform the user which sensitive word caused the deduction. Further, sensitive words with different deduction values may be highlighted in different manners according to their deduction values; for example, the sensitive word with the highest deduction value may be highlighted in red and the others in yellow. It can be understood that sensitive words may also be highlighted by bolding or italicizing; the specific highlighting manner is not limited in this application.
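A sketch of per-word deductions (the sensitive words and their deduction values are hypothetical placeholders):

```python
# Hypothetical sensitive words mapped to their deduction values.
SENSITIVE_WORD_DEDUCTIONS = {"idiot": 5, "shut up": 3}

def sensitive_word_analysis(text):
    """Return the detected sensitive words and the total deduction."""
    hits = [w for w in SENSITIVE_WORD_DEDUCTIONS if w in text]
    return hits, sum(SENSITIVE_WORD_DEDUCTIONS[w] for w in hits)

def sensitive_word_score(text, reference_score=100):
    _, deduction = sensitive_word_analysis(text)
    return reference_score - deduction
```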
In this way, users can develop the habit of avoiding sensitive words during communication, which meets actual communication requirements and improves the training effect.
In some implementations, the text analysis mode includes keyword analysis, and step 540 includes:
(1) Performing keyword analysis on the text information to obtain a keyword analysis result.
At this point, step 550 includes:
(2) Determining the score corresponding to the voice data according to the keyword analysis result.
In some embodiments, the keyword analysis result may be "contains keywords" or "does not contain keywords".
In some embodiments, a reference score may be preset for the user; points are deducted from the reference score when the keyword analysis result is "does not contain keywords" and are not deducted when the result is "contains keywords". Further, if there are multiple keywords, the deduction may be determined according to the number of keywords included in the text information; for example, the more keywords the text information includes, the smaller the deduction.
In some embodiments, the keywords may be determined from the reference text information; for example, the keywords are certain words or phrases in the reference text information. Further, a second correspondence between keywords and reference text information may be preset, and the keywords corresponding to the reference text information may be determined according to the second correspondence.
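A sketch of keyword counting with a per-missing-keyword deduction (the deduction rate is an illustrative assumption; the keyword set would come from the preset second correspondence):

```python
def keyword_analysis(text, keywords):
    """Count how many of the required keywords the reply contains."""
    hits = [k for k in keywords if k in text]
    result = "contains keywords" if hits else "does not contain keywords"
    return result, len(hits)

def keyword_score(text, keywords, reference_score=100, per_missing=2):
    """Deduct fewer points the more required keywords are included."""
    _, num_hits = keyword_analysis(text, keywords)
    return reference_score - per_missing * (len(keywords) - num_hits)
```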
In this way, the user can be prevented from outputting a large amount of invalid information during communication, improving the efficiency of communication between the user and clients.
In some implementations, the scoring method 200 based on voice data provided in the embodiments of the present application further includes the following steps:
(1) Determining the score corresponding to the voice data according to multiple of the voice analysis result, the flow completion degree, and the text analysis result.
In some embodiments, a first score may be obtained from the voice analysis mode, a second score from the flow completion degree, and a third score from the text analysis result, and the score corresponding to the voice data may be determined from two or more of the first score, the second score, and the third score.
For example, if multiple scores are finally obtained for the voice data, for example a first score determined from the voice analysis result, a second score determined from the flow completion degree, and a third score determined from the text analysis result, different weights may be set for these scores respectively, so as to compute the final score of the voice data comprehensively.
In this way, the voice data can be scored along multiple dimensions, which improves the rationality of the scoring.
The application provides a scoring method based on voice data, the method including: acquiring contextual chat information, where the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, and the voice chat messages correspond to the sparring chat messages one to one; determining voice data input by a user according to the plurality of voice chat messages in the contextual chat information; analyzing the voice data to obtain an analysis result; and determining a score corresponding to the voice data according to the analysis result. Because the voice data is determined from the voice chat messages and its score is determined from the analysis result of the voice data, the voice data input by a user who answers by voice can be scored automatically, without manual scoring, which reduces labor cost and improves scoring accuracy.
Referring to fig. 6 again, fig. 6 is a schematic structural diagram of a scoring device based on voice data according to an embodiment of the present application. As shown in fig. 6, the voice data-based scoring apparatus 600 includes: the system comprises an acquisition module 610, a first determination module 620, an analysis module 630 and a second determination module 640.
An acquisition module 610, configured to acquire contextual chat information; the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, where the voice chat messages correspond to the sparring chat messages one to one.
The first determining module 620 is configured to determine the voice data input by the user according to the plurality of voice chat messages in the contextual chat information.
The analysis module 630 is configured to analyze the voice data to obtain an analysis result.
And the second determining module 640 is configured to determine a score corresponding to the voice data according to the analysis result.
It should be noted that, for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and the relevant points are referred to in the description of the method embodiment. Any of the described processing manners in the method embodiment may be implemented by a corresponding processing module in the device embodiment, which is not described in detail in the device embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated modules may be implemented in hardware or in software functional modules.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device 700 includes: one or more processors 710, and a memory 720, one processor 710 being illustrated in fig. 7.
Processor 710 and memory 720 may be connected by a bus or otherwise, for example in fig. 7.
A processor 710, configured to acquire contextual chat information, where the contextual chat information includes a plurality of voice chat messages and a plurality of sparring chat messages, and the voice chat messages correspond to the sparring chat messages one to one; determine voice data input by a user according to the plurality of voice chat messages in the contextual chat information; analyze the voice data to obtain an analysis result; and determine a score corresponding to the voice data according to the analysis result.
The memory 720 is a non-volatile computer readable storage medium that can be used to store non-volatile software programs, non-volatile computer executable programs, and modules, such as program instructions/modules of the voice data based scoring method in the embodiments of the present application. The processor 710 executes various functional applications of the electronic device and data processing, i.e., implements the voice data-based scoring method of the above-described method embodiments, by running non-volatile software programs, instructions, and modules stored in the memory 720.
Memory 720 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 720 may optionally include memory located remotely from processor 710, which may be connected to the controller via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 720; when executed by the one or more processors 710, they perform the scoring method based on voice data in any of the above method embodiments, for example, method steps 210 to 240 in fig. 2 described above.
Referring to fig. 8, fig. 8 is a block diagram illustrating a computer readable storage medium according to an embodiment of the present application. The computer readable storage medium 800 has stored therein program code 810, the program code 810 being executable by a processor to perform the voice data based scoring method described in the method embodiments above.
The computer readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium comprises a non-volatile computer readable medium (non-transitory computer-readable storage medium). The computer readable storage medium 800 has storage space for program code to perform any of the method steps of the speech data based scoring method described above. The program code can be read from or written to one or more computer program products. The program code may be compressed, for example, in a suitable form.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. The technical features of the above embodiments, or of different embodiments, may be combined within the idea of the invention, and the steps may be implemented in any order; many other variations of the different aspects of the invention exist that are not described in detail for the sake of brevity. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention. From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a general-purpose hardware platform, or by hardware alone. A program implementing all or part of the above method steps may instruct the related hardware by means of a computer program stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.

Claims (10)

1. A scoring method based on voice data, the method comprising:
acquiring contextual chat information; the contextual chat information comprises a plurality of voice chat messages and a plurality of sparring chat messages; wherein the voice chat messages correspond to the sparring chat messages one to one;
determining voice data input by a user according to the plurality of voice chat messages in the contextual chat information;
analyzing the voice data to obtain an analysis result;
and determining the score corresponding to the voice data according to the analysis result.
2. The method of claim 1, wherein analyzing the voice data to obtain an analysis result comprises:
analyzing the voice data according to a voice analysis mode to obtain a voice analysis result; the voice analysis mode comprises at least one of identity analysis, speech rate analysis, volume analysis, emotion analysis, and timeout analysis;
the determining the score corresponding to the voice data according to the analysis result comprises the following steps:
and determining the score corresponding to the voice data according to the voice analysis result.
3. The method according to claim 2, wherein the voice analysis mode includes a timeout analysis, and the analyzing the voice data according to the voice analysis mode to obtain a voice analysis result includes:
determining a reply interval between each voice chat message and the corresponding sparring chat message;
determining a timeout analysis result according to all the reply intervals;
the determining the score corresponding to the voice data according to the voice analysis result comprises the following steps:
and determining the score corresponding to the voice data according to the overtime analysis result.
4. The method according to claim 2, wherein the voice analysis mode includes identity analysis, and the analyzing the voice data according to the voice analysis mode to obtain a voice analysis result includes:
voiceprint recognition is carried out on the voice data to obtain voice voiceprint characteristics;
acquiring preset voiceprint features corresponding to the user;
carrying out identity analysis according to the voice voiceprint characteristics and the preset voiceprint characteristics, and determining an identity analysis result;
the determining the score corresponding to the voice data according to the voice analysis result comprises the following steps:
and determining the score corresponding to the voice data according to the identity analysis result.
5. The method of claim 1, wherein analyzing the voice data to obtain an analysis result comprises:
determining the number of voice chat messages in the voice data;
determining context information corresponding to the contextual chat information;
determining a preset quantity threshold corresponding to the voice chat messages according to the context information;
determining a flow completion degree according to the number of voice chat messages and the preset quantity threshold;
and wherein determining the score corresponding to the voice data according to the analysis result comprises:
determining the score corresponding to the voice data according to the flow completion degree.
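A minimal sketch of the flow completion degree of claim 5, assuming a per-scenario table of expected turn counts; the scenario names and thresholds are illustrative assumptions.

SCENARIO_THRESHOLDS = {"sales_call": 8, "after_sales": 5}  # expected turn counts

def flow_completion(num_voice_messages: int, scenario: str) -> float:
    # Completion degree: actual user turns over the expected turns, capped at 1.0.
    expected = SCENARIO_THRESHOLDS.get(scenario, 1)
    return min(1.0, num_voice_messages / expected)

print(flow_completion(6, "sales_call"))  # 0.75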
6. The method of claim 1, wherein analyzing the voice data to obtain an analysis result comprises:
performing text recognition on the voice data to obtain text information corresponding to the voice data;
analyzing the text information according to a text analysis mode to obtain a text analysis result, the text analysis mode comprising at least one of sensitive word analysis, keyword analysis and speaking accuracy analysis;
and wherein determining the score corresponding to the voice data according to the analysis result comprises:
determining the score corresponding to the voice data according to the text analysis result.
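A minimal sketch of the sensitive word and keyword analyses of claim 6 on the recognized transcript; the word lists are illustrative assumptions, and the speech-to-text step itself is out of scope here.

SENSITIVE_WORDS = {"guarantee", "no refund"}    # penalized if present
KEYWORDS = {"warranty", "delivery", "invoice"}  # rewarded if covered

def text_analysis(transcript: str) -> dict:
    text = transcript.lower()
    return {
        "sensitive_hits": [w for w in SENSITIVE_WORDS if w in text],
        "keyword_coverage": sum(w in text for w in KEYWORDS) / len(KEYWORDS),
    }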
7. The method of claim 6, wherein the text analysis mode comprises speaking accuracy analysis, and analyzing the text information according to the text analysis mode to obtain the text analysis result comprises:
determining reference text information according to the plurality of training chat messages in the contextual chat information;
performing speaking accuracy analysis according to the text information and the reference text information to determine a speaking accuracy analysis result;
and wherein determining the score corresponding to the voice data according to the text analysis result comprises:
determining the score corresponding to the voice data according to the speaking accuracy analysis result.
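A minimal sketch of the speaking accuracy analysis of claim 7, using difflib's SequenceMatcher as a stand-in similarity measure between the recognized text and the reference text; the patent does not specify this particular measure.

from difflib import SequenceMatcher

def speaking_accuracy(text_info: str, reference_text: str) -> float:
    # Ratio in [0, 1]; 1.0 means the recognized text matches the reference exactly.
    return SequenceMatcher(None, text_info.lower(), reference_text.lower()).ratio()

print(speaking_accuracy(
    "thanks for calling, how can I help you today",
    "thank you for calling, how may I help you today"))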
8. A scoring apparatus based on voice data, the apparatus comprising:
an acquisition module, configured to acquire contextual chat information, the contextual chat information comprising a plurality of voice chat messages and a plurality of training chat messages, wherein the voice chat messages correspond to the training chat messages one by one;
a first determining module, configured to determine voice data input by a user according to the plurality of voice chat messages in the contextual chat information;
an analysis module, configured to analyze the voice data to obtain an analysis result;
and a second determining module, configured to determine the score corresponding to the voice data according to the analysis result.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the scoring method based on voice data according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer-executable instructions for enabling an electronic device to perform the scoring method based on voice data according to any one of claims 1 to 7.
CN202211714558.7A 2022-12-29 2022-12-29 Scoring method and device based on voice data, electronic equipment and storage medium Pending CN116092481A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211714558.7A CN116092481A (en) 2022-12-29 2022-12-29 Scoring method and device based on voice data, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116092481A (en) 2023-05-09

Family

ID=86198549

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211714558.7A Pending CN116092481A (en) 2022-12-29 2022-12-29 Scoring method and device based on voice data, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116092481A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230717

Address after: 518000 Room 201, building A, 1 front Bay Road, Shenzhen Qianhai cooperation zone, Shenzhen, Guangdong

Applicant after: VOICEAI TECHNOLOGIES Co.,Ltd.

Applicant after: Shenzhen Digital Miracle Technology Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: VOICEAI TECHNOLOGIES Co.,Ltd.

Applicant before: Shanghai Shengyang Yunhan Information Technology Co.,Ltd.