CN113823329A - Data processing method and computer device - Google Patents

Data processing method and computer device

Info

Publication number
CN113823329A
Authority
CN
China
Prior art keywords
target
pronunciation
looseness
score
identifier
Legal status
Pending
Application number
CN202110876158.5A
Other languages
Chinese (zh)
Inventor
林炳怀
王丽园
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110876158.5A
Publication of CN113823329A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the analysis technique

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The embodiments of the present application disclose a data processing method and a computer device. The method includes: acquiring a voice quality recognition request, the request including audio data and a target looseness identifier; extracting audio features corresponding to the audio data in the request, and generating a pronunciation quality score for the audio data according to those features; and acquiring a target looseness adjustment parameter corresponding to the target looseness identifier in the request, then performing score adjustment on the pronunciation quality score according to that parameter to obtain a target pronunciation quality score matched with the target looseness identifier. With this method, pronunciation scores under multiple scoring standards can be provided to users efficiently, and both usage cost and development cost can be reduced.

Description

Data processing method and computer device
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data processing method and a computer device.
Background
As spoken English is used more and more widely in daily life, spoken-English practice receives growing attention. Users of different ages have different requirements for the evaluation standard when using spoken-English evaluation software.
In practice, different evaluation objects call for different degrees of evaluation looseness: evaluation should be relatively loose for young children and relatively strict for adults, and even among adults different proficiency levels require different evaluation criteria.
Spoken-language evaluation applications currently on the market are each designed for a single, fixed proficiency level of one population type, for example pre-school, elementary school, middle school, or university. If the same person wants to take tests at different proficiency levels, he or she must download a separate application for each level and use them one by one to obtain pronunciation scores under different scoring standards. This undoubtedly increases the usage cost for the user, and maintaining multiple spoken-language evaluation applications for different proficiency levels likewise increases the development cost. Because each current application targets a single proficiency level, it cannot satisfy a user who needs to test at several levels.
Disclosure of Invention
The embodiments of the present application provide a data processing method and a computer device that can efficiently provide users with pronunciation scores under multiple scoring standards while reducing usage cost and development cost.
One aspect of the present application provides a data processing method, including:
acquiring a voice quality recognition request; the voice quality recognition request comprises audio data and a target looseness identifier;
extracting audio features corresponding to the audio data in the voice quality recognition request, and generating a pronunciation quality score corresponding to the audio data according to the audio features;
acquiring a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request,
and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier.
Further, extracting the audio features corresponding to the audio data in the voice quality recognition request and generating the pronunciation quality score corresponding to the audio data according to the audio features includes:
extracting acoustic features corresponding to the audio data in the voice quality recognition request, recognizing text information of the audio data, extracting text features corresponding to the text information, determining the acoustic features and the text features as the audio features, and inputting the audio features into a voice quality recognition model;
and performing convolution processing on the audio features through the voice quality recognition model to obtain audio hidden features, inputting the audio hidden features into a classification layer in the voice quality recognition model, and outputting the pronunciation quality score through the classification layer.
Further, the target looseness adjustment parameter comprises a target pronunciation score mean and a target pronunciation score standard deviation; acquiring the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain the target pronunciation quality score matched with the target looseness identifier includes:
acquiring a looseness parameter set; the looseness parameter set comprises a pronunciation score mean and a pronunciation score standard deviation respectively corresponding to at least two looseness identifiers;
acquiring, from the looseness parameter set, the target pronunciation score mean and the target pronunciation score standard deviation corresponding to the target looseness identifier in the voice quality recognition request;
and performing score adjustment on the pronunciation quality score according to the target pronunciation score mean and the target pronunciation score standard deviation to obtain the target pronunciation quality score matched with the target looseness identifier.
Further, performing score adjustment on the pronunciation quality score according to the target pronunciation score mean and the target pronunciation score standard deviation to obtain the target pronunciation quality score matched with the target looseness identifier includes:
determining the difference between the pronunciation quality score and a sample pronunciation score mean as a first target difference, and determining the ratio of the first target difference to a sample pronunciation score standard deviation as a first target ratio; the sample pronunciation score mean and the sample pronunciation score standard deviation are determined based on the sample pronunciation scores corresponding to a sample audio data set;
and determining the product of the first target ratio and the target pronunciation score standard deviation as a first target product, and determining the sum of the first target product and the target pronunciation score mean as the target pronunciation quality score.
Further, the target looseness adjustment parameter includes a looseness adjustment parameter corresponding to a first looseness identifier and a looseness adjustment parameter corresponding to a second looseness identifier;
acquiring the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain the target pronunciation quality score matched with the target looseness identifier includes:
if the target looseness identifier is a non-integer, acquiring the minimum integer interval in which the target looseness identifier lies, determining the minimum integer value of that interval as the first looseness identifier, and determining the maximum integer value of that interval as the second looseness identifier;
acquiring a first weight corresponding to the first looseness identifier and a second weight corresponding to the second looseness identifier;
performing, according to the first weight and the second weight, a weighted summation of the looseness adjustment parameter corresponding to the first looseness identifier and the looseness adjustment parameter corresponding to the second looseness identifier to obtain a fused looseness adjustment parameter;
and performing score adjustment on the pronunciation quality score according to the fused looseness adjustment parameter to obtain the target pronunciation quality score matched with the target looseness identifier.
Further, the looseness adjustment parameter comprises a pronunciation score mean and a pronunciation score standard deviation;
performing, according to the first weight and the second weight, the weighted summation of the looseness adjustment parameter corresponding to the first looseness identifier and the looseness adjustment parameter corresponding to the second looseness identifier to obtain the fused looseness adjustment parameter includes:
performing, according to the first weight and the second weight, a weighted summation of the pronunciation score mean corresponding to the first looseness identifier and the pronunciation score mean corresponding to the second looseness identifier to obtain a fused pronunciation score mean;
performing, according to the square of the first weight and the square of the second weight, a weighted summation of the square of the pronunciation score standard deviation corresponding to the first looseness identifier and the square of the pronunciation score standard deviation corresponding to the second looseness identifier to obtain the square of a fused pronunciation score standard deviation, and taking the square root to obtain the fused pronunciation score standard deviation;
and determining the fused pronunciation score mean and the fused pronunciation score standard deviation as the fused looseness adjustment parameter.
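For illustration only, a minimal Python sketch of this parameter fusion (not part of the original application; splitting a non-integer identifier across its two neighboring integer identifiers with linear weights is an assumption, since the claim only requires some pair of weights):

```python
import math

def fuse_parameters(target_id: float, params: dict) -> tuple:
    """params maps an integer looseness identifier to (mean, std)."""
    lo_id, hi_id = math.floor(target_id), math.ceil(target_id)
    w_hi = target_id - lo_id          # second weight (assumed linear)
    w_lo = 1.0 - w_hi                 # first weight
    mean_lo, std_lo = params[lo_id]
    mean_hi, std_hi = params[hi_id]
    # Means are fused by a plain weighted sum.
    fused_mean = w_lo * mean_lo + w_hi * mean_hi
    # Variances are weighted by the squared weights; the square root then
    # recovers the fused standard deviation, as the claim describes.
    fused_std = math.sqrt(w_lo**2 * std_lo**2 + w_hi**2 * std_hi**2)
    return fused_mean, fused_std

# e.g. looseness identifier 2.6, lying between identifiers 2 and 3
print(fuse_parameters(2.6, {2: (65.0, 10.0), 3: (75.0, 8.0)}))  # (71.0, ~6.25)
```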
Further, the target looseness adjustment parameter includes a target interpolation table; acquiring the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain the target pronunciation quality score matched with the target looseness identifier includes:
acquiring a looseness parameter set; the looseness parameter set comprises interpolation tables respectively corresponding to at least two looseness identifiers;
acquiring, from the looseness parameter set, the target interpolation table corresponding to the target looseness identifier in the voice quality recognition request;
and performing score adjustment on the pronunciation quality score according to the target interpolation table to obtain the target pronunciation quality score matched with the target looseness identifier.
Further, performing score adjustment on the pronunciation quality score according to the target interpolation table to obtain the target pronunciation quality score matched with the target looseness identifier includes:
acquiring, in the target interpolation table, the smooth transformation interval in which the pronunciation quality score lies, and acquiring the maximum original pronunciation quality score and the minimum original pronunciation quality score of that interval;
acquiring the maximum adjusted pronunciation quality score mapped from the maximum original pronunciation quality score and the minimum adjusted pronunciation quality score mapped from the minimum original pronunciation quality score in the smooth transformation interval;
and mapping the pronunciation quality score to the target pronunciation quality score according to the maximum original pronunciation quality score, the maximum adjusted pronunciation quality score, the minimum original pronunciation quality score, and the minimum adjusted pronunciation quality score.
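For illustration only, a minimal Python sketch of this table-based mapping: find the smooth transformation interval containing the raw score, then map linearly between the interval's original and adjusted endpoint scores (the anchor points in the example table are made-up values, not values from the application):

```python
def map_with_table(p: float, table: list) -> float:
    """table: sorted list of (original_score, adjusted_score) anchor points."""
    for (x0, y0), (x1, y1) in zip(table, table[1:]):
        if x0 <= p <= x1:  # smooth transformation interval containing p
            # linear mapping between the original and adjusted endpoints
            return y0 + (p - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("score outside the table's range")

table = [(0, 0), (60, 70), (80, 88), (100, 100)]
print(map_with_table(72, table))  # 70 + 12 * 18/20 = 80.8
```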
Further, the target looseness adjustment parameter comprises an interpolation table corresponding to a third looseness identifier and an interpolation table corresponding to a fourth looseness identifier; acquiring the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain the target pronunciation quality score matched with the target looseness identifier includes:
if the target looseness identifier is a non-integer, acquiring the minimum integer interval in which the target looseness identifier lies, determining the minimum integer value of that interval as the third looseness identifier, and determining the maximum integer value of that interval as the fourth looseness identifier;
acquiring a third weight corresponding to the third looseness identifier and a fourth weight corresponding to the fourth looseness identifier;
performing score adjustment on the pronunciation quality score according to the interpolation table corresponding to the third looseness identifier to obtain a first interpolated pronunciation quality score;
performing score adjustment on the pronunciation quality score according to the interpolation table corresponding to the fourth looseness identifier to obtain a second interpolated pronunciation quality score;
and performing, according to the third weight and the fourth weight, a weighted summation of the first interpolated pronunciation quality score and the second interpolated pronunciation quality score to obtain the target pronunciation quality score matched with the target looseness identifier.
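For illustration only, a sketch of this non-integer case, blending the two table-mapped scores; it reuses map_with_table from the previous sketch, and the linear weighting is again an assumption:

```python
import math

def adjust_non_integer(p: float, target_id: float, tables: dict) -> float:
    """tables maps an integer looseness identifier to an interpolation table."""
    lo_id, hi_id = math.floor(target_id), math.ceil(target_id)
    w_hi = target_id - lo_id                   # fourth weight (assumed linear)
    w_lo = 1.0 - w_hi                          # third weight
    first = map_with_table(p, tables[lo_id])   # first interpolated score
    second = map_with_table(p, tables[hi_id])  # second interpolated score
    return w_lo * first + w_hi * second        # target pronunciation quality score
```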
One aspect of the present application provides a data processing method, including:
the terminal device acquires audio data and a target looseness identifier, and generates a voice quality recognition request comprising the audio data and the target looseness identifier;
the terminal device sends the voice quality recognition request to a server, so that the server acquires a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and performs score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier; the pronunciation quality score is generated according to the audio features corresponding to the audio data in the voice quality recognition request;
and the terminal device outputs the target pronunciation quality score returned by the server.
Further, the terminal device acquiring the audio data and the target looseness identifier and generating the voice quality recognition request including the audio data and the target looseness identifier includes:
the terminal device responding to a looseness input operation on a voice input page, and determining the input parameter as the target looseness identifier;
and responding to an audio acquisition operation on the voice input page, acquiring the audio data, and generating the voice quality recognition request containing the audio data and the target looseness identifier.
One aspect of the present application provides a data processing apparatus, including:
the acquisition module is used for acquiring a voice quality recognition request; the voice quality recognition request comprises audio data and a target looseness identifier;
the scoring module is used for extracting the audio features corresponding to the audio data in the voice quality recognition request and generating a pronunciation quality score corresponding to the audio data according to the audio features;
and the looseness adjustment module is used for acquiring a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier.
Wherein the scoring module includes:
the feature extraction unit, used for extracting acoustic features corresponding to the audio data in the voice quality recognition request, recognizing text information of the audio data, extracting text features corresponding to the text information, determining the acoustic features and the text features as audio features, and inputting the audio features into the voice quality recognition model;
and the voice quality recognition unit, used for performing convolution processing on the audio features through the voice quality recognition model to obtain audio hidden features, inputting the audio hidden features into a classification layer in the voice quality recognition model, and outputting the pronunciation quality score through the classification layer.
The target looseness adjustment parameter comprises a target pronunciation score mean and a target pronunciation score standard deviation; the looseness adjustment module includes:
a first parameter acquisition unit, configured to acquire a looseness parameter set; the looseness parameter set comprises a pronunciation score mean and a pronunciation score standard deviation respectively corresponding to at least two looseness identifiers;
the first parameter acquisition unit is further used for acquiring, from the looseness parameter set, the target pronunciation score mean and the target pronunciation score standard deviation corresponding to the target looseness identifier in the voice quality recognition request;
and a first score adjustment unit, used for performing score adjustment on the pronunciation quality score according to the target pronunciation score mean and the target pronunciation score standard deviation to obtain the target pronunciation quality score matched with the target looseness identifier.
The first score adjustment unit is specifically configured to determine the difference between the pronunciation quality score and a sample pronunciation score mean as a first target difference, determine the ratio of the first target difference to a sample pronunciation score standard deviation as a first target ratio, determine the product of the first target ratio and the target pronunciation score standard deviation as a first target product, and determine the sum of the first target product and the target pronunciation score mean as the target pronunciation quality score; the sample pronunciation score mean and the sample pronunciation score standard deviation are determined based on the sample pronunciation scores corresponding to a sample audio data set.
The target looseness adjustment parameter comprises the looseness adjustment parameter corresponding to the first looseness identifier and the looseness adjustment parameter corresponding to the second looseness identifier;
the looseness adjustment module includes:
a second parameter acquisition unit, configured to, if the target looseness identifier is a non-integer, acquire the minimum integer interval in which the target looseness identifier lies, determine the minimum integer value of that interval as the first looseness identifier, and determine the maximum integer value of that interval as the second looseness identifier;
the second parameter acquisition unit is further used for acquiring the first weight corresponding to the first looseness identifier and the second weight corresponding to the second looseness identifier;
a first fusion unit, used for performing, according to the first weight and the second weight, a weighted summation of the looseness adjustment parameter corresponding to the first looseness identifier and the looseness adjustment parameter corresponding to the second looseness identifier to obtain a fused looseness adjustment parameter;
and a second score adjustment unit, used for performing score adjustment on the pronunciation quality score according to the fused looseness adjustment parameter to obtain the target pronunciation quality score matched with the target looseness identifier.
The looseness adjustment parameter comprises a pronunciation score mean and a pronunciation score standard deviation;
the first fusion unit is specifically configured to perform, according to the first weight and the second weight, a weighted summation of the pronunciation score mean corresponding to the first looseness identifier and the pronunciation score mean corresponding to the second looseness identifier to obtain a fused pronunciation score mean; perform, according to the square of the first weight and the square of the second weight, a weighted summation of the square of the pronunciation score standard deviation corresponding to the first looseness identifier and the square of the pronunciation score standard deviation corresponding to the second looseness identifier to obtain the square of a fused pronunciation score standard deviation, and take the square root to obtain the fused pronunciation score standard deviation; and determine the fused pronunciation score mean and the fused pronunciation score standard deviation as the fused looseness adjustment parameter.
Wherein the target looseness adjustment parameter comprises a target interpolation table;
the looseness adjustment module includes:
a third parameter acquisition unit, configured to acquire a looseness parameter set; the looseness parameter set comprises interpolation tables respectively corresponding to at least two looseness identifiers;
the third parameter acquisition unit is further configured to acquire, from the looseness parameter set, the target interpolation table corresponding to the target looseness identifier in the voice quality recognition request;
and a third score adjustment unit, used for performing score adjustment on the pronunciation quality score according to the target interpolation table to obtain the target pronunciation quality score matched with the target looseness identifier.
The third score adjustment unit is specifically configured to acquire the smooth transformation interval of the pronunciation quality score in the target interpolation table, acquire the maximum original pronunciation quality score and the minimum original pronunciation quality score of that interval, acquire the maximum adjusted pronunciation quality score mapped from the maximum original pronunciation quality score and the minimum adjusted pronunciation quality score mapped from the minimum original pronunciation quality score in the smooth transformation interval, and map the pronunciation quality score to the target pronunciation quality score according to the maximum original pronunciation quality score, the maximum adjusted pronunciation quality score, the minimum original pronunciation quality score, and the minimum adjusted pronunciation quality score.
The target looseness adjustment parameter comprises the interpolation table corresponding to the third looseness identifier and the interpolation table corresponding to the fourth looseness identifier;
the looseness adjustment module includes:
a fourth parameter acquisition unit, configured to, if the target looseness identifier is a non-integer, acquire the minimum integer interval in which the target looseness identifier lies, determine the minimum integer value of that interval as the third looseness identifier, and determine the maximum integer value of that interval as the fourth looseness identifier;
the fourth parameter acquisition unit is further configured to acquire the third weight corresponding to the third looseness identifier and the fourth weight corresponding to the fourth looseness identifier;
the interpolation scoring unit, used for performing score adjustment on the pronunciation quality score according to the interpolation table corresponding to the third looseness identifier to obtain a first interpolated pronunciation quality score;
the interpolation scoring unit is further used for performing score adjustment on the pronunciation quality score according to the interpolation table corresponding to the fourth looseness identifier to obtain a second interpolated pronunciation quality score;
and a second fusion unit, used for performing, according to the third weight and the fourth weight, a weighted summation of the first interpolated pronunciation quality score and the second interpolated pronunciation quality score to obtain the target pronunciation quality score matched with the target looseness identifier.
One aspect of the present application provides a data processing apparatus, including:
the request generation module is used for acquiring audio data and a target looseness identifier and generating a voice quality recognition request comprising the audio data and the target looseness identifier;
the sending module is used for sending the voice quality recognition request to the server, so that the server acquires a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and performs score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier; the pronunciation quality score is generated according to the audio features corresponding to the audio data in the voice quality recognition request;
and the output module is used for outputting the target pronunciation quality score returned by the server.
The request generation module is specifically used for responding to a looseness input operation on the voice input page, determining the input parameter as the target looseness identifier, responding to an audio acquisition operation on the voice input page, acquiring the audio data, and generating the voice quality recognition request containing the audio data and the target looseness identifier.
Another aspect of the present application provides a computer device, including: a processor, a memory, and a network interface;
the processor is coupled to the memory and the network interface, wherein the network interface is configured to provide data communication functionality, the memory is configured to store program code, and the processor is configured to invoke the program code to perform a method as in an aspect of an embodiment of the present application.
Another aspect of the present application provides a computer storage medium storing a computer program adapted to be loaded by a processor and to perform a method as in one aspect of the embodiments of the present application.
In the embodiments of the present application, a voice quality recognition request is acquired; the voice quality recognition request comprises audio data and a target looseness identifier. Audio features corresponding to the audio data in the voice quality recognition request are extracted, and a pronunciation quality score corresponding to the audio data is generated according to the audio features. A target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request is then acquired, and the pronunciation quality score is adjusted according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier. The method introduces a looseness identifier that the user can select during spoken-language scoring. When users have different scoring requirements, they can select different looseness adjustment parameters simply by setting different looseness identifiers, and through those parameters the pronunciation quality score can be adjusted to the scoring effect the user expects. Because setting the looseness identifier reproduces the functions otherwise provided by multiple distinct spoken-language evaluation applications, users who need pronunciation scores under different scoring standards no longer have to download and run multiple applications, and developers no longer have to build a separate application for each scoring standard. The method can therefore efficiently provide users with pronunciation scores under multiple scoring standards while reducing usage cost and development cost.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of a network architecture provided in an embodiment of the present application;
fig. 2 is a scene diagram illustrating a method for adjusting pronunciation quality scores according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 5 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 6a is a schematic flowchart of a data processing method according to an embodiment of the present application;
FIG. 6b is a schematic diagram illustrating an exemplary embodiment of adjusting pronunciation quality scores based on an interpolation table;
fig. 7 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 9a is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 9b is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure;
fig. 11 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic diagram of a network architecture according to an embodiment of the present application. The network architecture may include a server 100 and a plurality of terminal devices (as shown in fig. 1, specifically a terminal device 200a, a terminal device 200b, a terminal device 200c, and the like). The server 100 may communicate with each terminal device through a network. Each terminal device may install a spoken-language evaluation application, and the server 100 may be the background server corresponding to that application, so that each terminal device can exchange data with the server 100 through the application's client. The terminal devices may include mobile phones, tablet computers, notebook computers, handheld computers, Mobile Internet Devices (MID), Point of Sale (POS) machines, and wearable devices (e.g., smart watches and smart bracelets).
Referring to fig. 2, fig. 2 is a schematic view of a scene for adjusting pronunciation quality scores. In fig. 2, the spoken-language evaluation application can be applied to objective questions, such as read-aloud questions, and also to subjective questions, such as picture description or spoken composition. Taking the terminal device 200c as an example, the terminal device 200c may open the spoken-language evaluation application interface and input a target looseness identifier in the looseness input edit box 201c, after which spoken-language evaluation may start; for example, the target looseness identifier may be 3. After the target looseness identifier is entered, clicking the start-follow-reading button starts sentence follow-reading. Once follow-reading starts, the looseness input edit box displays the target looseness identifier 3 and becomes uneditable, and the start-follow-reading button switches to an end-follow-reading button. When the user finishes reading the sentence and clicks the end-follow-reading button, follow-reading ends and the target pronunciation quality score is output and displayed. See also fig. 2 for the processing flow inside the server.
In fig. 2, the terminal device 200c may package the target looseness identifier input by the user and the audio collected by the device into a voice quality recognition request and send it to the server 100. The server 100 obtains the audio data and the target looseness identifier from the voice quality recognition request, extracts the audio features corresponding to the audio data through its feature extraction module, and generates a pronunciation quality score from those features through its scoring module. The server 100 then obtains, through its looseness adjustment module, the target looseness adjustment parameter corresponding to the target looseness identifier in the request, and adjusts the pronunciation quality score according to that parameter to obtain a target pronunciation quality score matched with the target looseness identifier, thereby achieving efficient conversion of the pronunciation quality score. The server 100 sends the target pronunciation quality score to the terminal device 200c, which receives and displays it as shown in fig. 2: display area 202c shows the target pronunciation quality score, and display area 203c lights a number of stars according to the score's percentage of the total score. For example, if the total score is 100 points and the target pronunciation quality score is 80 points, the score is 80% of the total, so 80% of 5 stars, i.e. 4 stars, are lit.
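For illustration only, the star display described above can be computed as follows (a sketch; truncating rather than rounding the star count is an assumption):

```python
def lit_stars(score: float, total: float = 100.0, stars: int = 5) -> int:
    # light the fraction of the stars equal to the score's share of the total
    return int(stars * score / total)

print(lit_stars(80))  # 80/100 of 5 stars -> 4
```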
With this looseness adjustment method, spoken-language evaluation can be performed with different degrees of looseness for users of different ages, which keeps spoken-language evaluation software practical.
Optionally, the process in which the server 100 obtains the voice quality recognition request, obtains the audio data and the target looseness identifier from it, extracts the audio features corresponding to the audio data, generates the pronunciation quality score according to the audio features, obtains the target looseness adjustment parameter corresponding to the target looseness identifier, and adjusts the pronunciation quality score according to that parameter to obtain the target pronunciation quality score matched with the target looseness identifier may also be implemented entirely in the terminal device 200c, which is not described again here.
Referring to fig. 3, a flowchart of a data processing method provided in an embodiment of the present application; the method may be executed by a computer device, which may be a terminal device or a server, and may include:
s301, acquiring a voice quality identification request; the voice quality recognition request comprises audio data and a target looseness identifier;
wherein, the computer equipment can obtain a voice quality identification request; the voice quality recognition request includes audio data and a target slack identification. The audio data is extracted from the voice, the target looseness identification is a looseness value input by an evaluating person after the spoken language evaluating application is opened, for example, the value range of the looseness value can be 0-4, and the value of the looseness value can be a non-integer, for example, the value of the looseness value can be 1.2, 2.6 or 3.7. For example, after the spoken language evaluation software is opened, the terminal device may have an indication box for prompting to input the degree of looseness, and 2 is input in the indication box, that is, the indication representing the degree of looseness is 2.
Specifically, digitized sound data is audio data. Digitizing sound is the process of performing analog-to-digital conversion, at a certain frequency, on the continuous analog audio signal from a voice input device to obtain audio data; playing digitized sound is the reverse, converting the audio data back into an analog audio signal through digital-to-analog conversion and outputting it. Two metrics matter when digitizing sound: the sampling frequency and the sample size. The sampling frequency is the number of samples taken per unit time; the higher it is, the smaller the interval between sampling points and the more faithful the digitized sound, but the larger and harder to process the resulting data. The sample size is the number of bits used to record each sample value, which determines the dynamic range of the sampling; more bits capture finer variations in the recorded sound, again at the cost of more data.
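For illustration only, a worked example of how these two metrics drive the volume of digitized audio (the 16 kHz, 16-bit, mono figures are illustrative assumptions, not values from this application):

```python
sample_rate = 16_000      # samples per second (sampling frequency)
sample_bits = 16          # bits recorded per sample (sample size)
channels = 1              # mono
seconds = 10

bytes_total = sample_rate * (sample_bits // 8) * channels * seconds
print(f"{bytes_total} bytes, i.e. {bytes_total / 1024:.0f} KiB")  # 320000 bytes, 312 KiB
```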
Further, when the target looseness identifier has been entered into the computer device and speech is then input, the computer device analyzes the input data; once it finds that both the target looseness identifier and the speech are present, the triggering condition is met and the voice quality recognition request can be generated. The audio data can be extracted from the input speech by an audio extraction component.
S302, extracting the audio features corresponding to the audio data in the voice quality recognition request, and generating a pronunciation quality score corresponding to the audio data according to the audio features;
specifically, the computer device may extract the corresponding audio features by analyzing the audio data, and the process of generating a pronunciation quality score from the audio features may currently be based on a convolutional-neural-network recognition method. The computer device can train the convolutional neural network model on manually labeled data, and the trained model can then output the pronunciation quality score corresponding to the audio data. For example, professionals with an English-teaching background can assign appropriate manual spoken-language evaluation labels for each target looseness identifier; the computer device trains the convolutional neural network on these labels to obtain a model that accurately recognizes the pronunciation quality score corresponding to audio data, after which the server can feed the audio features corresponding to the audio data in a voice quality recognition request into the trained model and obtain the pronunciation quality score as output.
S303, acquiring a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier;
specifically, the target looseness identifier in the voice quality recognition request takes values within a range, and different target looseness identifiers can correspond to different target looseness adjustment parameters. Selecting different target looseness adjustment parameters yields target pronunciation quality scores matched with different target looseness identifiers: the pronunciation quality score is adjusted by the obtained target looseness adjustment parameter to produce the required target pronunciation quality score. For example, the target looseness identifier may be 2, with corresponding target looseness adjustment parameter A, or 4, with corresponding target looseness adjustment parameter B. If parameter A is looser than parameter B, then for the same audio data the target pronunciation quality score adjusted with parameter A will be higher than the one adjusted with parameter B.
Further, the pronunciation quality score is adjusted according to the target pronunciation score mean and the target pronunciation score standard deviation to obtain a pronunciation quality score to be optimized. If the score to be optimized is smaller than the minimum of the pronunciation-quality-score value range, the minimum of that range is determined as the target pronunciation quality score matched with the target looseness identifier; if it is larger than the maximum of the range, the maximum of the range is determined as the target pronunciation quality score. In one embodiment, the pronunciation quality score to be optimized is denoted p(improve), the minimum of the value range is C, and the maximum is D. If C is 0 and D is 100, the target pronunciation quality score is min(max(p(improve), 0), 100).
In this embodiment of the application, the computer device obtains the voice quality recognition request and thereby obtains the audio data and the target looseness identifier. The computer device then extracts the audio features corresponding to the audio data in the request and generates a pronunciation quality score from those features. Finally, it obtains the target looseness adjustment parameter corresponding to the target looseness identifier in the request and adjusts the pronunciation quality score according to that parameter to obtain a target pronunciation quality score matched with the target looseness identifier, thereby converting the pronunciation quality score quickly. The method introduces a user-selectable looseness identifier into spoken-language scoring: users with different scoring requirements can select different looseness adjustment parameters by setting different looseness identifiers, and through those parameters the pronunciation quality score can be adjusted to the scoring effect the user expects. Since setting the looseness identifier reproduces what would otherwise require multiple distinct spoken-language evaluation applications, users no longer need to download and run multiple applications to obtain pronunciation scores under different scoring standards, and developers no longer need to build a separate application for each standard. The method can therefore efficiently provide pronunciation scores under multiple scoring standards while reducing usage cost and development cost.
Referring to fig. 4, a flowchart of a data processing method provided in an embodiment of the present application; the method may be executed by a computer device, which may be a terminal device or a server, and may include:
s401, acquiring a voice quality identification request; the voice quality recognition request comprises audio data and a target looseness identifier;
the specific process of this step may refer to S301 in the embodiment corresponding to fig. 3, which is not described herein again.
S402, extracting acoustic features corresponding to the audio data in the voice quality recognition request, recognizing text information of the audio data, extracting text features corresponding to the text information, determining the acoustic features and the text features as the audio features, and inputting the audio features into a voice quality recognition model;
specifically, after the audio data in the voice quality recognition request is obtained, the effective features corresponding to the audio data may be extracted; these include text features and acoustic features. The text features mainly comprise semantic features, pragmatic features, keyword features, and text disfluency features. The keyword features mainly involve extracting the keywords of the standard answer and of the answer content and computing the precision, recall, and so on. The pragmatic features include the lexical diversity and sentence-pattern diversity of the answer content and its grammatical accuracy as analyzed by a language model. The semantic features include the topic features of the answer content, the term frequency-inverse document frequency (tf-idf) features, and the like. tf-idf is proportional to the number of occurrences of a word in a document and inversely proportional to the number of its occurrences in the whole corpus, and measures the importance of a word in a document. The acoustic features mainly cover pronunciation accuracy, pronunciation fluency, and pronunciation rhythm. Pronunciation accuracy refers to pronunciation scores at the phoneme, word, and sentence levels. Pronunciation fluency comprises speech-rate features during pronunciation and duration-based statistics, such as the average duration of voiced segments and the average pause duration between them. Pronunciation rhythm comprises the evaluation of rhythmic sense, of word-stress correctness within a sentence, of sentence-boundary intonation, and the like. The text features and the acoustic features are collectively referred to as audio features. After the audio features are obtained, they may be input into a voice quality recognition model, which may be a regression model or a classification model, either conventional or more recent; the regression model chosen may be a k-nearest-neighbors (KNN) model, a support vector regression (SVR) model, a gradient-boosted decision tree (GBDT) model, a deep neural network model, or the like.
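For illustration only, a sketch of computing the tf-idf features mentioned above; the use of scikit-learn's TfidfVectorizer and the toy sentences are assumptions, as the application does not name an implementation:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

answers = [
    "the cat sat on the mat",
    "a dog ran in the park",
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(answers)   # one row of tf-idf weights per answer
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))
```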
S403, performing convolution processing on the audio features through the voice quality recognition model to obtain audio hidden features, inputting the audio hidden features into a classification layer in the voice quality recognition model, and outputting the pronunciation quality score through the classification layer;
specifically, after the audio features are input into the voice quality recognition model, convolution processing first analyzes the local information of the audio features to obtain the audio hidden features; the hidden features are then input into the classification layer of the model, which matches them to classes and produces the corresponding pronunciation quality score.
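For illustration only, a minimal sketch of the kind of model S402 and S403 describe: a 1-D convolution over the audio-feature sequence followed by a classification layer. The use of PyTorch, the layer sizes, and treating the integer scores 0 to 100 as classes are all assumptions; the application does not fix an architecture:

```python
import torch
import torch.nn as nn

class SpeechQualityModel(nn.Module):
    def __init__(self, feat_dim: int = 64, hidden_dim: int = 128, num_scores: int = 101):
        super().__init__()
        # Convolution over time extracts local patterns ("audio hidden features").
        self.conv = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # pool the time axis away
        )
        # Classification layer: one class per integer score 0..100.
        self.classifier = nn.Linear(hidden_dim, num_scores)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, feat_dim, time)
        hidden = self.conv(feats).squeeze(-1)   # (batch, hidden_dim)
        logits = self.classifier(hidden)        # (batch, num_scores)
        return logits.argmax(dim=-1)            # predicted score, at inference time

model = SpeechQualityModel()
score = model(torch.randn(1, 64, 200))  # e.g. 200 frames of 64-dim features
```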
S404, acquiring a looseness parameter set;
specifically, a looseness parameter set is obtained; it includes a pronunciation score mean and a pronunciation score standard deviation corresponding to each of at least two looseness identifiers. For example, the set may cover looseness identifiers in the range 0-4; if it contains the two looseness identifiers 2 and 3, it includes the pronunciation score mean and pronunciation score standard deviation corresponding to identifier 2 and the pronunciation score mean and pronunciation score standard deviation corresponding to identifier 3.
S405, acquiring, from the looseness parameter set, the target pronunciation score mean and the target pronunciation score standard deviation corresponding to the target looseness identifier in the voice quality recognition request;
specifically, the target pronunciation score mean and the target pronunciation score standard deviation corresponding to the target looseness identifier obtained from the voice quality recognition request are the parameters of that identifier; different means and standard deviations characterize different target looseness identifiers. For a looser target pronunciation quality score the target pronunciation score mean is larger and the target pronunciation score standard deviation is smaller; for a stricter one, the mean is smaller and the standard deviation is larger. For example, the pronunciation score standard deviation of looseness identifier 2 may be less than that of looseness identifier 3. Once the target looseness identifier is determined, the corresponding target pronunciation score mean and target pronunciation score standard deviation can be obtained from the looseness parameter set.
S406, determining the difference value between the pronunciation quality score and the sample pronunciation score mean value as a first target difference value, and determining the ratio of the first target difference value to the sample pronunciation score standard deviation as a first target ratio; the sample pronunciation score mean value and the sample pronunciation score standard deviation are determined based on the sample pronunciation score corresponding to the sample audio data set;
specifically, a certain number of voice samples are randomly sampled from a large body of previously evaluated data, acoustic features and text features are extracted from the voice data of these samples, convolution processing is performed on those features to obtain audio hidden features, and a batch of sample pronunciation scores p1, p2, p3, p4, ..., pn is obtained through the classification layer. If the pronunciation scores of this batch of samples follow a Gaussian distribution, a sample pronunciation score mean mean(raw) and a sample pronunciation score standard deviation std(raw) are obtained; the difference between the pronunciation quality score and the sample pronunciation score mean is determined as the first target difference, and the ratio of the first target difference to the sample pronunciation score standard deviation is determined as the first target ratio. For example, if p denotes the pronunciation quality score, then p - mean(raw) is the first target difference, and

(p - mean(raw)) / std(raw)

is the first target ratio.
And S407, determining the product of the first target ratio and the target pronunciation score standard deviation as a first target product, and determining the sum of the first target product and the target pronunciation score mean as the target pronunciation quality score.
Specifically, based on step S406, the product of the first target ratio and the target pronunciation score standard deviation is determined as the first target product, and the sum of the first target product and the target pronunciation score mean is the target pronunciation quality score. For example, if the target looseness identifier is i, the target pronunciation score mean may be mean_i and the target pronunciation score standard deviation may be std_i; with p denoting the pronunciation quality score,

((p - mean(raw)) / std(raw)) × std_i

is the first target product, and

((p - mean(raw)) / std(raw)) × std_i + mean_i

is the target pronunciation quality score.
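As a minimal sketch of steps S406 and S407 (function name, sample scores, and target parameters are illustrative assumptions), the adjustment is a z-score renormalization onto the target looseness identifier's score distribution:

```python
import statistics

def adjust_score(p, sample_scores, mean_i, std_i):
    # Steps S406-S407: standardize p against the sample distribution,
    # then rescale to the target looseness identifier's distribution.
    mean_raw = statistics.mean(sample_scores)   # sample pronunciation score mean
    std_raw = statistics.stdev(sample_scores)   # sample pronunciation score standard deviation
    ratio = (p - mean_raw) / std_raw            # first target ratio
    return ratio * std_i + mean_i               # first target product plus target mean

sample_scores = [55, 63, 70, 74, 80, 88]        # assumed batch of sample pronunciation scores
target = adjust_score(72, sample_scores, mean_i=78.0, std_i=9.0)
```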
The method introduces a target looseness identifier that the user can select during spoken-language scoring. When users have different scoring requirements, they can set different target looseness identifiers to select different target pronunciation score means and target pronunciation score standard deviations, through which the pronunciation quality score can be adjusted to the target pronunciation quality score the user expects. By setting the target looseness identifier, the method can realize the functions of multiple different spoken-language evaluation applications, so users do not need to separately download and run several such applications when they need pronunciation scores under different scoring standards, and developers do not need to develop a separate application for each scoring standard. The method can therefore efficiently provide pronunciation scores under a variety of scoring standards while reducing both usage cost and development cost.
Referring to fig. 5, which is a flowchart illustrating a data processing method provided in an embodiment of the present application, where the method may be executed by a computer device, where the computer device may be a terminal device or a server, and the method may include:
S501, acquiring a voice quality recognition request; the voice quality recognition request comprises audio data and a target looseness identifier;
the specific process of this step may refer to S301 in the embodiment corresponding to fig. 3, which is not described herein again.
S502, extracting acoustic features corresponding to the audio data in the voice quality recognition request, identifying text information of the audio data, extracting text features corresponding to the text information, determining the acoustic features and the text features as audio features, and inputting the audio features into a voice quality recognition model;
the specific process of this step may refer to S402 in the embodiment corresponding to fig. 4, which is not described herein again.
S503, carrying out convolution processing on the audio features through the voice quality recognition model to obtain audio hidden features, inputting the audio hidden features into a classification layer in the voice quality recognition model, and outputting pronunciation quality scores through the classification layer;
the specific process of this step may refer to S403 in the embodiment corresponding to fig. 4, which is not described herein again.
S504, if the target looseness identifier is a non-integer, obtaining a minimum integer interval in which the target looseness identifier is located, determining a minimum integer value in the minimum integer interval as a first looseness identifier, and determining a maximum integer value in the minimum integer interval as a second looseness identifier;
specifically, if the target looseness identifier is a non-integer, a minimum integer interval where the target looseness identifier is located is obtained, parameters of the non-integer target looseness identifier are obtained by means of the looseness identifiers of two integers in the minimum integer interval, the minimum integer value in the minimum integer interval is determined as a first looseness identifier, and the maximum integer value in the minimum integer interval is determined as a second looseness identifier. For example, if the target looseness identifier is 2.2, the corresponding minimum integer interval is 2 to 3, 2 is the first looseness identifier, and 3 is the second looseness identifier.
S505, acquiring a first weight corresponding to the first looseness identifier and a second weight corresponding to the second looseness identifier;
specifically, using the minimum-integer and maximum-integer looseness identifiers of the minimum integer interval, a first weight corresponding to the first looseness identifier and a second weight corresponding to the second looseness identifier may first be obtained. For example, if floor(i) denotes the looseness identifier i rounded down to the minimum integer of the minimum integer interval and ceil(i) denotes i rounded up to the maximum integer of that interval, then in one embodiment, for i = 2.2, floor(i) is 2 and ceil(i) is 3. The first weight may be (1 - (i - floor(i))) and the second weight may be (i - floor(i)).
S506, according to the first weight and the second weight, carrying out weighted summation on the pronunciation score mean value corresponding to the first looseness identifier and the pronunciation score mean value corresponding to the second looseness identifier to obtain a fusion pronunciation score mean value;
specifically, the fusion pronunciation score mean value may be generated from the pronunciation score mean of the minimum integer (i.e., the first looseness identifier) and the pronunciation score mean of the maximum integer (i.e., the second looseness identifier) of the minimum integer interval, by weighted summation of the pronunciation score mean corresponding to the first looseness identifier and the pronunciation score mean corresponding to the second looseness identifier. For example, with floor(i) and ceil(i) as above, the pronunciation score mean corresponding to the first looseness identifier may be mean(floor(i)) and the pronunciation score mean corresponding to the second looseness identifier may be mean(ceil(i)). The closer the looseness identifier i lies to one of the two integers, that is, the smaller its rounding distance to that integer, the greater the influence that integer's pronunciation score mean should have on the adjustment parameters for i; in other words, each integer's weight is inversely related to its rounding distance from i. Therefore, based on the first weight (1 - (i - floor(i))) and the second weight (i - floor(i)), the weighted sum (1 - (i - floor(i))) × mean(floor(i)) + (i - floor(i)) × mean(ceil(i)) is the fusion pronunciation score mean. In one embodiment, if i is 2.2, then mean(floor(i)) is mean(2) and mean(ceil(i)) is mean(3); since 2.2 is closer to 2 than to 3, the first weight (0.8) is greater than the second weight (0.2), and the fusion pronunciation score mean is (1 - (2.2 - 2)) × mean(2) + (2.2 - 2) × mean(3).
S507, according to the square of the first weight and the square of the second weight, performing weighted summation on the square of the pronunciation score standard deviation corresponding to the first looseness identifier and the square of the pronunciation score standard deviation corresponding to the second looseness identifier to obtain the square of a fusion pronunciation score standard deviation, and obtaining the fusion pronunciation score standard deviation from that square;
specifically, the fusion pronunciation score standard deviation may be generated from the pronunciation score standard deviation corresponding to the minimum integer (i.e., the first looseness identifier) and the pronunciation score standard deviation corresponding to the maximum integer (i.e., the second looseness identifier) of the minimum integer interval. Concretely, the square of the fusion pronunciation score standard deviation is obtained by weighted summation of the square of the pronunciation score standard deviation corresponding to the first looseness identifier and the square of the pronunciation score standard deviation corresponding to the second looseness identifier, and the fusion pronunciation score standard deviation is then obtained from that square. For example, with floor(i) and ceil(i) as above, the first weight may be (1 - (i - floor(i))), the second weight may be (i - floor(i)), the squared standard deviation corresponding to the first looseness identifier may be std(floor(i))², and the squared standard deviation corresponding to the second looseness identifier may be std(ceil(i))². The square of the fusion pronunciation score standard deviation is then std_i² = (1 - (i - floor(i)))² × std(floor(i))² + (i - floor(i))² × std(ceil(i))², and taking its square root gives the fusion pronunciation score standard deviation.
And S508, determining the fusion pronunciation score mean and the fusion pronunciation score standard deviation as the fusion looseness adjustment parameters.
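A minimal Python sketch of steps S504 to S508 (the helper name and the numeric parameter values are assumptions for illustration); note that it is the squared standard deviations, i.e. the variances, that are combined with squared weights:

```python
import math

def fuse_looseness_params(i, params):
    # Steps S504-S508: blend the parameters of the two integer looseness
    # identifiers enclosing a non-integer identifier i.
    lo, hi = math.floor(i), math.ceil(i)
    w_hi = i - lo                  # second weight, (i - floor(i))
    w_lo = 1.0 - w_hi              # first weight, (1 - (i - floor(i)))
    mean_lo, std_lo = params[lo]
    mean_hi, std_hi = params[hi]
    fusion_mean = w_lo * mean_lo + w_hi * mean_hi
    fusion_var = w_lo ** 2 * std_lo ** 2 + w_hi ** 2 * std_hi ** 2
    return fusion_mean, math.sqrt(fusion_var)    # fusion mean and standard deviation

params = {2: (70.0, 12.0), 3: (78.0, 9.0)}       # illustrative values
fusion_mean, fusion_std = fuse_looseness_params(2.2, params)  # weights 0.8 and 0.2
```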
S509, performing score adjustment on the pronunciation quality score according to the fusion looseness adjustment parameters to obtain a target pronunciation quality score matched with the target looseness identifier;
specifically, based on step S508, after the fusion looseness adjustment parameters are obtained, the pronunciation quality score may be adjusted according to the fusion looseness adjustment parameters to obtain a target pronunciation quality score matched with the target looseness identifier. The score adjustment process may refer to step S407 in the embodiment corresponding to fig. 4, which is not described herein again.

The method introduces a looseness identifier that the user can select during spoken-language scoring. When users have different scoring requirements, they can set different target looseness identifiers, including non-integer ones, and thereby select different looseness adjustment parameters through which the pronunciation quality score can be adjusted to the scoring effect the user expects. By setting the target looseness identifier, the method can realize the functions of multiple spoken-language evaluation applications; by handling non-integer target looseness identifiers it offers more choices, makes the granularity of the target pronunciation quality score finer, refines the usable range of the target looseness identifier, and provides more comprehensive coverage of proficiency levels. Users therefore do not need to separately download and run several spoken-language evaluation applications when they need pronunciation scores under different scoring standards, and developers do not need to develop a separate application for each scoring standard, so the method can efficiently provide pronunciation scores under a variety of scoring standards while reducing both usage cost and development cost.
Referring to fig. 6a, which is a schematic flowchart of a data processing method provided in an embodiment of the present application, where the method may be executed by a computer device, and the computer device may be a terminal device or a server, and the method may include:
S601, acquiring a voice quality recognition request; the voice quality recognition request comprises audio data and a target looseness identifier;
the specific process of this step may refer to S301 in the embodiment corresponding to fig. 3, which is not described herein again.
S602, extracting acoustic features corresponding to the audio data in the voice quality recognition request, identifying text information of the audio data, extracting text features corresponding to the text information, determining the acoustic features and the text features as audio features, and inputting the audio features into a voice quality recognition model;
the specific process of this step may refer to S402 in the embodiment corresponding to fig. 4, which is not described herein again.
S603, performing convolution processing on the audio features through the voice quality recognition model to obtain audio hidden features, inputting the audio hidden features into a classification layer in the voice quality recognition model, and outputting pronunciation quality scores through the classification layer;
the specific process of this step may refer to S403 in the embodiment corresponding to fig. 4, which is not described herein again.
S604, acquiring a looseness parameter set;
specifically, a looseness parameter set is obtained, where the looseness parameter set includes interpolation tables corresponding to each of at least two looseness identifiers. An interpolation table specifies a piecewise linear interpolation, linear interpolation being the interpolation mode in which the interpolation function is a first-order polynomial. The method adjusts the looseness of the score by linear interpolation: given points (x0, y0) and (x1, y1), the value y is predicted for a sample x lying between x0 and x1, and the interpolation result y is finally obtained. By designing different interpolation tables, different looseness transformations are realized. For example, the looseness parameter set may include looseness identifiers in the range 0-4; if the two looseness identifiers included in the set are 2 and 3, the looseness parameter set includes an interpolation table corresponding to looseness identifier 2 and an interpolation table corresponding to looseness identifier 3.
S605, acquiring a target interpolation table corresponding to a target looseness identifier in the voice quality identification request from the looseness parameter set;
specifically, the looseness parameter set may include interpolation tables corresponding to a plurality of looseness identifiers; after the target looseness identifier is obtained, the target interpolation table corresponding to the target looseness identifier in the voice quality recognition request may be obtained from the looseness parameter set.
S606, acquiring a smooth transformation interval of the pronunciation quality score in the target interpolation table, and acquiring a maximum original pronunciation quality score and a minimum original pronunciation quality score in the smooth transformation interval;
specifically, the interpolation table may have a plurality of different smooth transformation intervals; by looking up the interpolation table, the smooth transformation interval containing the pronunciation quality score may be found, together with the maximum original pronunciation quality score and the minimum original pronunciation quality score of that interval. Referring to fig. 6b, which is a schematic diagram of adjusting the pronunciation quality score based on an interpolation table: the horizontal axis may be the pronunciation quality score, the vertical axis may be the target pronunciation quality score, x0 may be the minimum original pronunciation quality score, and x1 may be the maximum original pronunciation quality score. In an interpolation table, if the pronunciation quality score is 74, the smooth transformation interval corresponding to 74 is found from the table; if that interval is 73-75, the maximum original pronunciation quality score of the smooth transformation interval is 75 and the minimum original pronunciation quality score is 73.
S607, acquiring the maximum adjustment pronunciation quality score mapped by the maximum original pronunciation quality score and the minimum adjustment pronunciation quality score mapped by the minimum original pronunciation quality score in the smooth transformation interval;
specifically, by looking up the interpolation table, the maximum adjusted pronunciation quality score mapped from the maximum original pronunciation quality score and the minimum adjusted pronunciation quality score mapped from the minimum original pronunciation quality score can be obtained for the smooth transformation interval. Referring again to fig. 6b, y0 may be the minimum adjusted pronunciation quality score and y1 the maximum adjusted pronunciation quality score. In an interpolation table, if the pronunciation quality score is 74 and the smooth transformation interval corresponding to 74 is 73-75, then according to the mapping of the interpolation table, the maximum adjusted pronunciation quality score mapped from the maximum original pronunciation quality score is 87, and the minimum adjusted pronunciation quality score mapped from the minimum original pronunciation quality score is 86.
And S608, mapping the pronunciation quality score to a target pronunciation quality score according to the maximum original pronunciation quality score, the maximum adjusted pronunciation quality score, the minimum original pronunciation quality score and the minimum adjusted pronunciation quality score.
Specifically, according to the maximum original pronunciation quality score, the maximum adjusted pronunciation quality score, the minimum original pronunciation quality score, and the minimum adjusted pronunciation quality score, the mapping relation of the smooth transformation interval can be obtained from the interpolation table, and the pronunciation quality score can be mapped to the target pronunciation quality score. Referring again to fig. 6b, for given points (x0, y0) and (x1, y1), a pronunciation quality score x between x0 and x1 is mapped to

y = y0 + (x - x0) × (y1 - y0) / (x1 - x0),

which is the target pronunciation quality score y. In an interpolation table, if the pronunciation quality score is 74 and the smooth transformation interval corresponding to 74 is 73-75, then through the mapping of the interpolation table the maximum adjusted pronunciation quality score mapped from the maximum original pronunciation quality score is 87 and the minimum adjusted pronunciation quality score mapped from the minimum original pronunciation quality score is 86, and the final adjusted score is 86.5. Each interpolation table corresponds to one looseness identifier.
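A minimal Python sketch of steps S606 to S608, assuming an interpolation table represented as sorted (original score, adjusted score) breakpoints; the 73-75 interval of the hypothetical table reproduces the worked example above:

```python
def interp_adjust(p, table):
    # Steps S606-S608: find the smooth transformation interval containing p
    # and map p linearly between the interval's endpoint scores.
    for (x0, y0), (x1, y1) in zip(table, table[1:]):
        if x0 <= p <= x1:
            return y0 + (p - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("pronunciation quality score outside the table range")

# (original score, adjusted score) breakpoints; values are assumptions.
table_loose = [(0.0, 0.0), (60.0, 70.0), (73.0, 86.0), (75.0, 87.0), (100.0, 100.0)]
print(interp_adjust(74.0, table_loose))   # 86.5, as in the example above
```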
The method introduces a looseness identifier that the user can select during spoken-language scoring. When users have different scoring requirements, they can set different looseness identifiers to select different looseness adjustment parameters, through which the pronunciation quality score can be adjusted to the scoring effect the user expects. By setting the looseness identifier, the method can realize the functions of multiple spoken-language evaluation applications, and using interpolation tables as the target looseness adjustment parameters allows the scoring behavior to be shaped more freely and increases the diversity of the looseness parameter set. Users therefore do not need to separately download and run several spoken-language evaluation applications when they need pronunciation scores under different scoring standards, and developers do not need to develop a separate application for each scoring standard, so the method can efficiently provide suitable pronunciation scores for different groups of users while reducing both usage cost and development cost.
Referring to fig. 7, which is a flowchart illustrating a data processing method provided in an embodiment of the present application, where the method may be executed by a computer device, where the computer device may be a terminal device or a server, and the method may include:
S701, acquiring a voice quality recognition request; the voice quality recognition request comprises audio data and a target looseness identifier;
the specific process of this step may refer to S301 in the embodiment corresponding to fig. 3, which is not described herein again.
S702, extracting acoustic features corresponding to the audio data in the voice quality recognition request, identifying text information of the audio data, extracting text features corresponding to the text information, determining the acoustic features and the text features as audio features, and inputting the audio features into a voice quality recognition model;
the specific process of this step may refer to S402 in the embodiment corresponding to fig. 4, which is not described herein again.
S703, performing convolution processing on the audio features through the voice quality recognition model to obtain audio hidden features, inputting the audio hidden features into a classification layer in the voice quality recognition model, and outputting pronunciation quality scores through the classification layer;
the specific process of this step may refer to S403 in the embodiment corresponding to fig. 4, which is not described herein again.
S704, if the target looseness identifier is a non-integer, obtaining a minimum integer interval in which the target looseness identifier is located, determining a minimum integer value in the minimum integer interval as a third looseness identifier, and determining a maximum integer value in the minimum integer interval as a fourth looseness identifier;
specifically, if the target looseness identifier is a non-integer, the minimum integer interval where the target looseness identifier is located is obtained, and the parameters of the non-integer target looseness identifier are derived from the looseness identifiers of the two integers of that interval: the minimum integer value of the minimum integer interval is determined as the third looseness identifier, and the maximum integer value of the minimum integer interval is determined as the fourth looseness identifier. For example, if the target looseness identifier is 2.2, the corresponding minimum integer interval is 2 to 3, 2 is the third looseness identifier, and 3 is the fourth looseness identifier.
S705, acquiring a third weight corresponding to the third looseness identifier and a fourth weight corresponding to the fourth looseness identifier;
for example, if floor(i) denotes the looseness identifier i rounded down to the minimum integer of the minimum integer interval and ceil(i) denotes i rounded up to the maximum integer of that interval, the third weight may be (1 - (i - floor(i))) and the fourth weight may be (i - floor(i)).
S706, carrying out score adjustment on the pronunciation quality score according to the interpolation table corresponding to the third looseness identifier to obtain a first interpolation pronunciation quality score;
specifically, the interpolation table corresponding to the third looseness identifier is obtained from the looseness parameter set, which may include interpolation tables corresponding to a plurality of looseness identifiers, and the pronunciation quality score is adjusted according to that interpolation table to obtain the first interpolation pronunciation quality score. For example, if the target looseness identifier is 2.2, the corresponding minimum integer interval is 2 to 3 and 2 is the third looseness identifier; the pronunciation quality score is adjusted according to the interpolation table corresponding to the third looseness identifier 2 to obtain the first interpolation pronunciation quality score.
S707, carrying out score adjustment on the pronunciation quality score according to an interpolation table corresponding to the fourth looseness identifier to obtain a second interpolation pronunciation quality score;
specifically, an interpolation table corresponding to the fourth looseness identifier is obtained from the looseness parameter set, and the pronunciation quality score is subjected to score adjustment according to the interpolation table corresponding to the fourth looseness identifier, so that a second interpolation pronunciation quality score is obtained. For example, if the target looseness identifier is 2.2, the corresponding minimum integer interval is 2 to 3, and 3 is the fourth looseness identifier, and the score of the pronunciation quality score is adjusted according to the interpolation table corresponding to the fourth looseness identifier 3 to obtain a second interpolation pronunciation quality score.
And S708, according to the third weight and the fourth weight, carrying out weighted summation on the first interpolation pronunciation quality score and the second interpolation pronunciation quality score to obtain a target pronunciation quality score matched with the target looseness identifier.
Specifically, the first interpolation pronunciation quality score and the second interpolation pronunciation quality score may be weighted and summed, using the third weight corresponding to the third looseness identifier and the fourth weight corresponding to the fourth looseness identifier, to obtain the target pronunciation quality score matched with the target looseness identifier. For example, if the target looseness identifier is 2.2, the corresponding minimum integer interval is 2 to 3; 2 is the third looseness identifier, and the pronunciation quality score is adjusted according to the interpolation table corresponding to the third looseness identifier 2 to obtain the first interpolation pronunciation quality score; 3 is the fourth looseness identifier, and the pronunciation quality score is adjusted according to the interpolation table corresponding to the fourth looseness identifier 3 to obtain the second interpolation pronunciation quality score. With floor(i) denoting the looseness identifier i rounded down to the minimum integer of the minimum integer interval and ceil(i) denoting i rounded up to the maximum integer, the third weight may be (1 - (i - floor(i))) and the fourth weight may be (i - floor(i)); if the first interpolation pronunciation quality score is A and the second interpolation pronunciation quality score is B, the target pronunciation quality score may be (1 - (i - floor(i))) × A + (i - floor(i)) × B.
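A minimal sketch of steps S704 to S708, reusing the interp_adjust helper from the previous sketch; both tables and all numeric values are illustrative assumptions:

```python
import math

def adjust_with_non_integer_id(p, i, tables):
    # Steps S704-S708: adjust p with the interpolation tables of the two
    # integer looseness identifiers enclosing i, then blend the results.
    lo, hi = math.floor(i), math.ceil(i)
    w_hi = i - lo                       # fourth weight, (i - floor(i))
    w_lo = 1.0 - w_hi                   # third weight, (1 - (i - floor(i)))
    a = interp_adjust(p, tables[lo])    # first interpolation pronunciation quality score
    b = interp_adjust(p, tables[hi])    # second interpolation pronunciation quality score
    return w_lo * a + w_hi * b          # target pronunciation quality score

tables = {
    2: [(0.0, 0.0), (73.0, 80.0), (75.0, 82.0), (100.0, 100.0)],  # stricter table (assumed)
    3: [(0.0, 0.0), (73.0, 86.0), (75.0, 87.0), (100.0, 100.0)],  # looser table (assumed)
}
target = adjust_with_non_integer_id(74.0, 2.2, tables)  # 0.8 * 81.0 + 0.2 * 86.5
```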
In the embodiment of the present application, it can be seen that after the looseness transformation is performed, as the looseness increases, the overall score distribution shifts toward higher scores.
The method introduces a looseness identifier that the user can select during spoken-language scoring. When users have different scoring requirements, they can set different looseness identifiers to select different looseness adjustment parameters, through which the pronunciation quality score can be adjusted to the scoring effect the user expects. By setting the looseness identifier, the method can realize the functions of multiple spoken-language evaluation applications; by combining interpolation-table looseness adjustment parameters with the handling of non-integer target looseness identifiers, the granularity of the target pronunciation quality score becomes finer and a more comprehensive coverage of proficiency levels is provided. Users therefore do not need to separately download and run several spoken-language evaluation applications when they need pronunciation scores under different scoring standards, and developers do not need to develop a separate application for each scoring standard, so the method can efficiently provide suitable pronunciation scores for different groups of users while reducing both usage cost and development cost.
Please refer to fig. 8, which is a flowchart illustrating a data processing method according to an embodiment of the present application, where the method may be executed by a terminal device, and the method may include:
S801, the terminal device acquires audio data and a target looseness identifier, and generates a voice quality recognition request comprising the audio data and the target looseness identifier;
specifically, the terminal device responds to a looseness input operation on the voice input page and determines the input parameter as the target looseness identifier; it then responds to an audio acquisition operation on the voice input page, acquires audio data, and generates a voice quality recognition request containing the audio data and the target looseness identifier.
S802, sending the voice quality recognition request to a server so that the server can acquire a target looseness adjustment parameter corresponding to a target looseness identifier in the voice quality recognition request, and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier; the pronunciation quality score is generated according to the audio characteristics corresponding to the audio data in the voice quality recognition request;
specifically, the terminal device sends the voice quality recognition request to the server; the server obtains the target looseness adjustment parameter according to the correspondence between the target looseness identifier in the voice quality recognition request and the looseness adjustment parameters, and then adjusts the pronunciation quality score using the target looseness adjustment parameter, so that the obtained target pronunciation quality score matches the target looseness identifier. The server-side score adjustment process may refer to the embodiments corresponding to fig. 3 to fig. 7, and will not be described herein again.
S803, outputting the target pronunciation quality score returned by the server;
specifically, the terminal device receives the target pronunciation quality score sent by the server, and outputs and displays the received target pronunciation quality score.
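The patent does not specify a wire protocol between the terminal device and the server; purely as a hypothetical sketch (the endpoint URL, JSON field names, and response format below are all assumptions), the terminal-side flow of S801 to S803 might look like:

```python
import base64
import json
import urllib.request

def request_target_score(audio_bytes: bytes, looseness: float, url: str) -> float:
    # S801: build a voice quality recognition request carrying the audio
    # data and the target looseness identifier (payload format assumed).
    payload = json.dumps({
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
        "target_looseness_id": looseness,
    }).encode("utf-8")
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"})
    # S802-S803: the server adjusts the score; the terminal outputs the result.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["target_pronunciation_quality_score"]
```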
According to this embodiment of the application, the terminal device obtains the audio data and the target looseness identifier, generates a voice quality recognition request comprising them, and sends the request to the server; the server obtains the target looseness adjustment parameter corresponding to the target looseness identifier in the request and performs score adjustment on the pronunciation quality score, which is generated from the audio features corresponding to the audio data, according to that parameter, obtaining a target pronunciation quality score matched with the target looseness identifier; the terminal device then outputs the target pronunciation quality score returned by the server, realizing rapid transformation of the pronunciation quality score. The method introduces a user-selectable looseness identifier into spoken-language scoring and streamlines the interaction flow between the terminal device and the server. When users have different scoring requirements, they can set different looseness identifiers to select different looseness adjustment parameters, through which the pronunciation quality score can be adjusted to the scoring effect the user expects. By setting the looseness identifier, the method can realize the functions of multiple spoken-language evaluation applications and improves the efficiency of the terminal-server interaction, so users do not need to separately download and run several spoken-language evaluation applications when they need pronunciation scores under different scoring standards, and developers do not need to develop a separate application for each scoring standard; the method can therefore efficiently provide pronunciation scores under a variety of scoring standards while reducing both usage cost and development cost.
Please refer to fig. 9a, which is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 9a, the data processing apparatus 1 may be applied to any terminal device in the embodiment corresponding to fig. 1, and the data processing apparatus 1 may include: an obtaining module 11, a scoring module 12, and a looseness adjustment module 13;
an obtaining module 11, configured to obtain a voice quality recognition request; the voice quality recognition request comprises audio data and a target looseness identifier;
the specific implementation of the obtaining module 11 may refer to step S301 in the embodiment of fig. 3, which is not described herein again.
The scoring module 12 is configured to extract an audio feature corresponding to the audio data in the voice quality recognition request, and generate a pronunciation quality score corresponding to the audio data according to the audio feature;
the specific implementation of the scoring module 12 may refer to step S302 in the embodiment of fig. 3, which is not described herein again.
The looseness adjustment module 13 is configured to acquire a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and perform score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier;
the specific implementation of the looseness adjusting module 13 may refer to step S303 in the embodiment of fig. 3, and details are not described here.
Wherein, the scoring module 12 includes:
a feature extraction unit 121, configured to extract an acoustic feature corresponding to the audio data in the voice quality recognition request, recognize text information of the audio data, extract a text feature corresponding to the text information, determine the acoustic feature and the text feature as an audio feature, and input the audio feature into the voice quality recognition model;
the specific implementation of the feature extraction unit 121 may refer to step S402 in the embodiment of fig. 4, which is not described herein again.
The voice quality recognition unit 122 is configured to perform convolution processing on the audio features through the voice quality recognition model to obtain audio hidden features, input the audio hidden features into a classification layer in the voice quality recognition model, and output pronunciation quality scores through the classification layer;
the specific implementation of the voice quality recognition unit 122 can refer to step S403 in the embodiment of fig. 4, which is not described herein again.
The target looseness adjustment parameters comprise a target pronunciation score mean and a target pronunciation score standard deviation; the looseness adjustment module 13 includes:
a first parameter obtaining unit 131, configured to obtain a looseness parameter set; the looseness parameter set comprises a pronunciation score mean and a pronunciation score standard deviation corresponding to each of at least two looseness identifiers;
the specific implementation of the first parameter obtaining unit 131 may refer to step S404 in the embodiment of fig. 4, which is not described herein again.
The first parameter obtaining unit 131 is further configured to obtain, from the looseness parameter set, the target pronunciation score mean and the target pronunciation score standard deviation corresponding to the target looseness identifier in the voice quality recognition request;
the specific implementation of the first parameter obtaining unit 131 may refer to step S405 in the embodiment of fig. 4, which is not described herein again.
The first score adjusting unit 132 is configured to perform score adjustment on the pronunciation quality score according to the target pronunciation score mean value and the target pronunciation score standard deviation, so as to obtain a target pronunciation quality score matched with the target looseness identifier;
the specific implementation of the first score adjustment unit 132 may refer to steps S406 and S407 in the embodiment of fig. 4, which is not described herein again.
The first score adjusting unit 132 is configured to determine a difference between the pronunciation quality score and the sample pronunciation score mean as a first target difference, determine a ratio of the first target difference to the sample pronunciation score standard deviation as a first target ratio, determine a product of the first target ratio and the target pronunciation score standard deviation as a first target product, and determine that a sum of the first target product and the target pronunciation score mean is the target pronunciation quality score; the sample pronunciation score mean value and the sample pronunciation score standard deviation are determined based on the sample pronunciation score corresponding to the sample audio data set;
the specific implementation of the first score adjustment unit 132 may refer to steps S406 and S407 in the embodiment of fig. 4, which is not described herein again.
The target looseness adjustment parameters comprise looseness adjustment parameters corresponding to the first looseness identifications and looseness adjustment parameters corresponding to the second looseness identifications;
the looseness adjustment module 13 further includes:
a second parameter obtaining unit 133, configured to, if the target looseness identifier is a non-integer, obtain a minimum integer interval in which the target looseness identifier is located, determine a minimum integer value in the minimum integer interval as the first looseness identifier, and determine a maximum integer value in the minimum integer interval as the second looseness identifier;
the specific implementation of the second parameter obtaining unit 133 may refer to step S504 in the example of fig. 5, which is not described herein again.
The second parameter obtaining unit 133 is further configured to obtain a first weight corresponding to the first looseness identifier and a second weight corresponding to the second looseness identifier;
the specific implementation of the second parameter obtaining unit 133 may refer to step S505 in the embodiment of fig. 5, which is not described herein again.
The first fusion unit 134 is configured to perform weighted summation on the looseness adjustment parameter corresponding to the first looseness identifier and the looseness adjustment parameter corresponding to the second looseness identifier according to the first weight and the second weight, to obtain a fusion looseness adjustment parameter;
the specific implementation of the first fusing unit 134 may refer to step S506 in the embodiment of fig. 5, which is not described herein again.
The second score adjusting unit 135 is configured to perform score adjustment on the pronunciation quality score according to the fusion looseness adjustment parameter, so as to obtain a target pronunciation quality score matched with the target looseness identifier;
the specific implementation of the second score adjustment unit 135 may refer to step S509 in the embodiment of fig. 5, which is not described herein again.
The looseness adjustment parameters comprise pronunciation score means and pronunciation score standard deviations;
the first fusion unit 134 is specifically configured to perform weighted summation on the pronunciation score mean corresponding to the first looseness identifier and the pronunciation score mean corresponding to the second looseness identifier according to the first weight and the second weight to obtain a fusion pronunciation score mean, perform weighted summation on the square value of the pronunciation score standard deviation corresponding to the first looseness identifier and the square value of the pronunciation score standard deviation corresponding to the second looseness identifier according to the square value of the first weight and the square value of the second weight to obtain a square value of the fusion pronunciation score standard deviation, obtain a fusion pronunciation score standard deviation from the square value of the fusion pronunciation score standard deviation, and determine the fusion pronunciation score mean and the fusion pronunciation score standard deviation as a fusion looseness adjustment parameter;
the specific implementation of the first fusing unit 134 may refer to steps S507 and S508 in the embodiment of fig. 5, which is not described herein again.
Wherein the target looseness adjustment parameter comprises a target interpolation table;
the looseness adjustment module 13 further includes:
a third parameter obtaining unit 136, configured to obtain a looseness parameter set; the looseness parameter set comprises interpolation tables respectively corresponding to at least two looseness identifications;
for a specific implementation of the third parameter obtaining unit 136, refer to step S604 in the embodiment of fig. 6a, which is not described herein again.
The third parameter obtaining unit 136 is further configured to obtain, in the looseness parameter set, a target interpolation table corresponding to the target looseness identifier in the voice quality recognition request;
for a specific implementation of the third parameter obtaining unit 136, refer to step S605 in the embodiment of fig. 6a, which is not described herein again.
The third score adjusting unit 137 is configured to perform score adjustment on the pronunciation quality score according to the target interpolation table to obtain a target pronunciation quality score matched with the target looseness identifier;
the specific implementation of the third score adjustment unit 137 may refer to steps S606, S607, and S608 in the embodiment of fig. 6a, which are not described herein again.
The third score adjusting unit 137 is specifically configured to obtain a smooth transformation interval of the pronunciation quality score in the target interpolation table, obtain a maximum original pronunciation quality score and a minimum original pronunciation quality score in the smooth transformation interval, obtain a maximum adjusted pronunciation quality score mapped by the maximum original pronunciation quality score and a minimum adjusted pronunciation quality score mapped by the minimum original pronunciation quality score in the smooth transformation interval, and map the pronunciation quality score into a target pronunciation quality score according to the maximum original pronunciation quality score, the maximum adjusted pronunciation quality score, the minimum original pronunciation quality score and the minimum adjusted pronunciation quality score;
the specific implementation of the third score adjustment unit 137 may refer to steps S606, S607, and S608 in the embodiment of fig. 6a, which are not described herein again.
The target looseness adjustment parameters comprise an interpolation table corresponding to a third looseness identifier and an interpolation table corresponding to a fourth looseness identifier;
the looseness adjustment module 13 further includes:
a fourth parameter obtaining unit 138, configured to, if the target looseness identifier is a non-integer, obtain a minimum integer interval in which the target looseness identifier is located, determine a minimum integer value in the minimum integer interval as a third looseness identifier, and determine a maximum integer value in the minimum integer interval as a fourth looseness identifier;
the specific implementation of the fourth parameter obtaining unit 138 may refer to step S704 in the embodiment of fig. 7, which is not described herein again.
The fourth parameter obtaining unit 138 is further configured to obtain a third weight corresponding to the third looseness identifier and a fourth weight corresponding to the fourth looseness identifier;
the specific implementation of the fourth parameter obtaining unit 138 may refer to step S705 in the embodiment of fig. 7, which is not described herein again.
The interpolation scoring unit 139 is configured to perform scoring adjustment on the pronunciation quality score according to the interpolation table corresponding to the third looseness identifier, so as to obtain a first interpolation pronunciation quality score;
the specific implementation of the interpolation scoring unit 139 may refer to step S706 in the embodiment of fig. 7, which is not described herein again.
The interpolation scoring unit 139 is further configured to perform scoring adjustment on the pronunciation quality score according to the interpolation table corresponding to the fourth looseness identifier, so as to obtain a second interpolation pronunciation quality score;
the specific implementation of the interpolation scoring unit 139 may refer to step S707 in the embodiment of fig. 7, which is not described herein again.
And a second fusion unit 1310, configured to perform weighted summation on the first interpolation pronunciation quality score and the second interpolation pronunciation quality score according to the third weight and the fourth weight, so as to obtain a target pronunciation quality score matching the target looseness identifier.
The specific implementation of the second fusion unit 1310 may refer to step S708 in the embodiment of fig. 7, which is not described herein again.
Please refer to fig. 9b, which is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. As shown in fig. 9b, the data processing apparatus 2 may be applied to any terminal device in the embodiment corresponding to fig. 1, and the data processing apparatus 2 may include: a request generation module 21, a sending module 22, and an output module 23; the request generation module 21 is configured to obtain the audio data and the target looseness identifier, and generate a voice quality recognition request including the audio data and the target looseness identifier;
the specific implementation of the request generating module 21 may refer to step S801 in the embodiment of fig. 8, which is not described herein again.
The sending module 22 is configured to send the voice quality recognition request to the server, so that the server obtains a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and performs score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier; the pronunciation quality score is generated according to the audio characteristics corresponding to the audio data in the voice quality recognition request;
the specific implementation of the sending module 22 may refer to step S802 in the embodiment of fig. 8, which is not described herein again.
And the output module 23 is configured to output the target pronunciation quality score returned by the server.
The specific implementation of the output module 23 may refer to step S803 in the embodiment of fig. 8, which is not described herein again.
The request generating module 21 is specifically configured to respond to a looseness input operation on the voice input page, determine the input parameter as the target looseness identifier, respond to an audio acquisition operation on the voice input page, acquire audio data, and generate a voice quality recognition request including the audio data and the target looseness identifier.
The specific implementation of the request generating module 21 may refer to step S801 in the embodiment of fig. 8, which is not described herein again.
According to this embodiment of the application, the terminal device obtains the voice quality recognition request, which carries the audio data and the target looseness identifier. The terminal device then extracts the audio features corresponding to the audio data in the voice quality recognition request and generates a pronunciation quality score corresponding to the audio data according to the audio features. The terminal device acquires the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performs score adjustment on the pronunciation quality score according to the target looseness adjustment parameter, obtaining a target pronunciation quality score matched with the target looseness identifier and thereby realizing rapid transformation of the pronunciation quality score. With this looseness transformation method, spoken-language evaluation can be performed with different degrees of looseness for different age groups, ensuring the practicality of spoken-language evaluation software.
Fig. 10 is a schematic structural diagram of another computer device according to an embodiment of the present application. As shown in fig. 10, the computer device may be applied to the terminal device in the embodiment corresponding to fig. 1. The computer device 1000 includes: a processor 1001, a network interface 1004, and a memory 1005; the computer device 1000 may further include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, for example, at least one disk memory. The memory 1005 may optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 10, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in fig. 10, the network interface 1004 may provide a network communication function for communicating with a server; the user interface 1003 provides an input interface for the user; and the processor 1001 may be used to invoke the device control application stored in the memory 1005 to implement:
the processor 1001 obtains a voice quality recognition request; the voice quality recognition request comprises audio data and a target looseness identifier; extracting audio features corresponding to the audio data in the voice quality identification request, and generating pronunciation quality scores corresponding to the audio data according to the audio features; and acquiring a target looseness adjustment parameter corresponding to the target looseness identification in the voice quality recognition request, and carrying out score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identification.
In an embodiment, when the processor 1001 extracts an audio feature corresponding to the audio data in the speech quality recognition request and generates a pronunciation quality score corresponding to the audio data according to the audio feature, the following steps are specifically performed:
extracting acoustic features corresponding to the audio data in the voice quality recognition request, recognizing text information of the audio data, extracting text features corresponding to the text information, determining the acoustic features and the text features as audio features, and inputting the audio features into a voice quality recognition model; and carrying out convolution processing on the audio features through the voice quality recognition model to obtain audio hiding features, inputting the audio hiding features into a classification layer in the voice quality recognition model, and outputting pronunciation quality scores through the classification layer.
In an embodiment, when acquiring the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier, the processor 1001 specifically performs the following steps:
acquiring a looseness parameter set; the looseness parameter set comprises a pronunciation score mean value and a pronunciation score standard deviation respectively corresponding to each of at least two looseness identifiers; acquiring, from the looseness parameter set, a target pronunciation score mean value and a target pronunciation score standard deviation corresponding to the target looseness identifier in the voice quality recognition request; and performing score adjustment on the pronunciation quality score according to the target pronunciation score mean value and the target pronunciation score standard deviation to obtain a target pronunciation quality score matched with the target looseness identifier.
In one embodiment, when the processor 1001 performs score adjustment on the pronunciation quality score according to the target pronunciation score mean and the target pronunciation score standard deviation to obtain a target pronunciation quality score matching the target looseness identifier, the following steps are specifically performed:
determining the difference value between the pronunciation quality score and the sample pronunciation score mean value as a first target difference value, and determining the ratio of the first target difference value to the sample pronunciation score standard deviation as a first target ratio; the sample pronunciation score mean value and the sample pronunciation score standard deviation are determined based on the sample pronunciation score corresponding to the sample audio data set; and determining the product of the first target ratio and the target pronunciation score standard deviation as a first target product, and determining the sum of the first target product and the target pronunciation score mean as the target pronunciation quality score.
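In other words, the pronunciation quality score is standardized against the sample distribution and re-scaled onto the distribution of the selected looseness level. A minimal sketch of this computation, with purely illustrative numbers:

```python
def adjust_score(raw_score: float, sample_mean: float, sample_std: float,
                 target_mean: float, target_std: float) -> float:
    """Re-normalize a pronunciation quality score from the sample distribution
    to the distribution associated with the selected looseness identifier."""
    # first target difference / sample std -> first target ratio
    first_target_ratio = (raw_score - sample_mean) / sample_std
    # first target product + target mean -> target pronunciation quality score
    return first_target_ratio * target_std + target_mean

# e.g. a raw score of 72 under a sample distribution with mean 70, std 10,
# mapped to a looser standard with mean 80, std 8:
print(adjust_score(72, 70, 10, 80, 8))  # (72-70)/10 * 8 + 80 = 81.6
```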
In one embodiment, when obtaining a target looseness adjustment parameter corresponding to a target looseness identifier in a voice quality recognition request, where the target looseness adjustment parameter includes a looseness adjustment parameter corresponding to a first looseness identifier and a looseness adjustment parameter corresponding to a second looseness identifier, and performing score adjustment on a pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matching the target looseness identifier, the processor 1001 specifically performs the following steps:
if the target looseness identifier is a non-integer, acquiring a minimum integer interval in which the target looseness identifier is located, determining a minimum integer value in the minimum integer interval as a first looseness identifier, and determining a maximum integer value in the minimum integer interval as a second looseness identifier; acquiring a first weight corresponding to the first looseness identifier and a second weight corresponding to the second looseness identifier; according to the first weight and the second weight, carrying out weighted summation on the width adjustment parameter corresponding to the first width identification and the width adjustment parameter corresponding to the second width identification to obtain a fused width adjustment parameter; and carrying out score adjustment on the pronunciation quality score according to the fusion looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier.
In an embodiment, when the processor 1001 performs weighted summation on the looseness adjustment parameter corresponding to the first looseness identifier and the looseness adjustment parameter corresponding to the second looseness identifier according to the first weight and the second weight to obtain the fusion looseness adjustment parameter, the following steps are specifically performed:
according to the first weight and the second weight, performing weighted summation on the pronunciation score mean value corresponding to the first looseness identifier and the pronunciation score mean value corresponding to the second looseness identifier to obtain a fusion pronunciation score mean value; according to the square value of the first weight and the square value of the second weight, performing weighted summation on the square value of the pronunciation score standard deviation corresponding to the first looseness identifier and the square value of the pronunciation score standard deviation corresponding to the second looseness identifier to obtain the square value of a fusion pronunciation score standard deviation, and taking the square root of that value to obtain the fusion pronunciation score standard deviation; and determining the fusion pronunciation score mean value and the fusion pronunciation score standard deviation as the fusion looseness adjustment parameter.
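A minimal sketch of this fusion follows. Since this passage does not fix how the first and second weights are obtained, the sketch assumes they come from the distance of the non-integer target identifier to its two neighboring integers; the parameter values are illustrative.

```python
import math

def fuse_looseness_params(target_id: float, params: dict) -> tuple:
    """Fuse the adjustment parameters of the two integer looseness identifiers
    surrounding a non-integer target identifier.
    `params` maps integer identifier -> (pronunciation score mean, std)."""
    lo, hi = math.floor(target_id), math.ceil(target_id)
    w_hi = target_id - lo          # second weight (distance-based; an assumption)
    w_lo = 1.0 - w_hi              # first weight (assumption)
    mean_lo, std_lo = params[lo]
    mean_hi, std_hi = params[hi]
    fused_mean = w_lo * mean_lo + w_hi * mean_hi
    # Standard deviations are fused through the squares of the weights
    fused_std = math.sqrt(w_lo ** 2 * std_lo ** 2 + w_hi ** 2 * std_hi ** 2)
    return fused_mean, fused_std

# Looseness 2.3 blends levels 2 and 3 with weights 0.7 and 0.3:
print(fuse_looseness_params(2.3, {2: (70, 10), 3: (80, 8)}))  # (73.0, ~7.4)
```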
In an embodiment, when acquiring the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier, the processor 1001 specifically performs the following steps:
acquiring a looseness parameter set; the looseness parameter set comprises interpolation tables respectively corresponding to at least two looseness identifiers; acquiring, from the looseness parameter set, a target interpolation table corresponding to the target looseness identifier in the voice quality recognition request; and performing score adjustment on the pronunciation quality score according to the target interpolation table to obtain a target pronunciation quality score matched with the target looseness identifier.
In one embodiment, when the processor 1001 performs score adjustment on the pronunciation quality score according to the target interpolation table to obtain a target pronunciation quality score matching the target looseness identifier, the following steps are specifically performed:
acquiring the smooth transformation interval in which the pronunciation quality score falls in the target interpolation table, and acquiring the maximum original pronunciation quality score and the minimum original pronunciation quality score of the smooth transformation interval; acquiring the maximum adjusted pronunciation quality score to which the maximum original pronunciation quality score is mapped and the minimum adjusted pronunciation quality score to which the minimum original pronunciation quality score is mapped in the smooth transformation interval; and mapping the pronunciation quality score to the target pronunciation quality score according to the maximum original pronunciation quality score, the maximum adjusted pronunciation quality score, the minimum original pronunciation quality score, and the minimum adjusted pronunciation quality score.
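That is, within the smooth transformation interval the score is mapped linearly between the interval's original endpoints and their adjusted counterparts. A minimal sketch, in which the table contents are illustrative:

```python
def map_with_interpolation_table(raw_score: float, table: list) -> float:
    """Map a raw pronunciation quality score through an interpolation table by
    linear interpolation inside the smooth transformation interval that
    contains it. `table` is a sorted list of (original, adjusted) pairs."""
    for (orig_min, adj_min), (orig_max, adj_max) in zip(table, table[1:]):
        if orig_min <= raw_score <= orig_max:
            ratio = (raw_score - orig_min) / (orig_max - orig_min)
            return adj_min + ratio * (adj_max - adj_min)
    raise ValueError("score outside the table's range")

# An illustrative looser standard that lifts mid-range scores:
table = [(0, 0), (60, 70), (100, 100)]
print(map_with_interpolation_table(45, table))  # 45/60 * 70 = 52.5
```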
In an embodiment, when acquiring the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier, the processor 1001 specifically performs the following steps:
if the target looseness identifier is a non-integer, acquiring the minimum integer interval in which the target looseness identifier is located, determining the minimum integer value in the minimum integer interval as a third looseness identifier, and determining the maximum integer value in the minimum integer interval as a fourth looseness identifier; acquiring a third weight corresponding to the third looseness identifier and a fourth weight corresponding to the fourth looseness identifier; performing score adjustment on the pronunciation quality score according to the interpolation table corresponding to the third looseness identifier to obtain a first interpolation pronunciation quality score; performing score adjustment on the pronunciation quality score according to the interpolation table corresponding to the fourth looseness identifier to obtain a second interpolation pronunciation quality score; and according to the third weight and the fourth weight, performing weighted summation on the first interpolation pronunciation quality score and the second interpolation pronunciation quality score to obtain a target pronunciation quality score matched with the target looseness identifier.
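A minimal sketch of this variant follows; the distance-based choice of the third and fourth weights is again an assumption, since this passage leaves it open, and the mapping helper repeats the linear interpolation from the earlier sketch.

```python
import math

def table_map(raw_score: float, table: list) -> float:
    """Linear interpolation inside the table's smooth transformation interval
    (same mapping as the earlier sketch; table contents are illustrative)."""
    for (o_min, a_min), (o_max, a_max) in zip(table, table[1:]):
        if o_min <= raw_score <= o_max:
            return a_min + (raw_score - o_min) / (o_max - o_min) * (a_max - a_min)
    raise ValueError("score outside the table's range")

def adjust_non_integer(raw_score: float, target_id: float, tables: dict) -> float:
    """Blend the two interpolation results for a non-integer looseness
    identifier. `tables` maps integer identifier -> interpolation table."""
    lo, hi = math.floor(target_id), math.ceil(target_id)
    w_hi = target_id - lo          # fourth weight (assumption)
    w_lo = 1.0 - w_hi              # third weight (assumption)
    first = table_map(raw_score, tables[lo])   # first interpolation score
    second = table_map(raw_score, tables[hi])  # second interpolation score
    return w_lo * first + w_hi * second
```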
The embodiment of the application acquires a voice quality recognition request, where the voice quality recognition request comprises audio data and a target looseness identifier; extracts the audio features corresponding to the audio data in the voice quality recognition request and generates a pronunciation quality score corresponding to the audio data according to the audio features; and acquires the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performs score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier. The method introduces a looseness identifier that the user can select during spoken language scoring. When users have different scoring requirements, they can select different looseness adjustment parameters by setting different looseness identifiers, and those parameters adjust the pronunciation quality score to the scoring effect the user expects. Because setting a looseness identifier reproduces the functions of multiple different spoken language evaluation applications, a user who needs pronunciation scoring under different scoring standards does not have to download and run several spoken language evaluation applications, and developers do not have to develop a separate application for each scoring standard. The method can therefore efficiently provide pronunciation scoring under multiple different scoring standards while reducing both use cost and development cost.
It should be understood that the computer device 1000 described in this embodiment of the present application can carry out the data processing method described in the embodiment corresponding to any one of fig. 3, fig. 4, fig. 5, fig. 6a, fig. 7, and fig. 8, and can also implement the computer device described in the embodiment corresponding to fig. 8, which is not repeated here. Likewise, the beneficial effects of the same method are not repeated.
Please refer to fig. 11, which is a schematic structural diagram of another computer device according to an embodiment of the present application. As shown in fig. 11, the computer device may serve as the terminal device in the embodiment corresponding to fig. 1. The computer device 1100 includes a processor 1101, a network interface 1104, and a memory 1105, and may further include a user interface 1103 and at least one communication bus 1102. The communication bus 1102 is used to implement connection and communication between these components. The user interface 1103 may include a display (Display) and a keyboard (Keyboard); optionally, it may also include a standard wired interface and a standard wireless interface. The network interface 1104 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1105 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory); optionally, it may also be at least one storage device located remotely from the processor 1101. As shown in fig. 11, the memory 1105, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1100 shown in fig. 11, the network interface 1104 may provide a network communication function for communicating with a server; the user interface 1103 is mainly used to provide an interface for user input; and the processor 1101 may be configured to invoke the device control application stored in the memory 1105 to implement the following:
the processor 1101 acquires the audio data and the target looseness identifier and generates a voice quality recognition request comprising the audio data and the target looseness identifier; sends the voice quality recognition request to a server, so that the server acquires the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performs score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier, where the pronunciation quality score is generated according to the audio features corresponding to the audio data in the voice quality recognition request; and outputs the target pronunciation quality score returned by the server.
In an embodiment, when acquiring the audio data and the target looseness identifier and generating a voice quality recognition request comprising the audio data and the target looseness identifier, the processor 1101 specifically performs the following steps:
in response to a looseness input operation on the voice input page, the terminal device determines the input parameter as the target looseness identifier; and in response to an audio acquisition operation on the voice input page, acquires the audio data and generates a voice quality recognition request containing the audio data and the target looseness identifier.
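For illustration, the request assembled on the terminal side might look as follows. The field names and the JSON/base64 encoding are assumptions, since this application does not specify a wire format.

```python
import base64
import json

def build_voice_quality_request(audio_bytes: bytes, target_looseness: float) -> str:
    """Assemble a voice quality recognition request from the looseness value
    entered on the voice input page and the recorded audio data.
    Field names and encoding are illustrative assumptions."""
    return json.dumps({
        "target_looseness": target_looseness,  # e.g. 2.3, entered by the user
        "audio_data": base64.b64encode(audio_bytes).decode("ascii"),
    })

# e.g.: request = build_voice_quality_request(recorded_pcm, 2.3), then send to server
```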
The embodiment of the application acquires a voice quality recognition request, where the voice quality recognition request comprises audio data and a target looseness identifier; extracts the audio features corresponding to the audio data in the voice quality recognition request and generates a pronunciation quality score corresponding to the audio data according to the audio features; and acquires the target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request and performs score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier. The method introduces a looseness identifier that the user can select during spoken language scoring. When users have different scoring requirements, they can select different looseness adjustment parameters by setting different looseness identifiers, and those parameters adjust the pronunciation quality score to the scoring effect the user expects. Because setting a looseness identifier reproduces the functions of multiple different spoken language evaluation applications, a user who needs pronunciation scoring under different scoring standards does not have to download and run several spoken language evaluation applications, and developers do not have to develop a separate application for each scoring standard. The method can therefore efficiently provide pronunciation scoring under multiple different scoring standards while reducing both use cost and development cost.
It should be understood that the computer device 1100 described in this embodiment of the present application can carry out the data processing method described in the embodiment corresponding to any one of fig. 3, fig. 4, fig. 5, fig. 6a, fig. 7, and fig. 8, and can also implement the computer device described in the embodiment corresponding to fig. 8, which is not repeated here. Likewise, the beneficial effects of the same method are not repeated.
It should further be noted that an embodiment of the present application also provides a computer storage medium, which stores the computer program executed by the aforementioned computer device; the computer program includes program instructions, and when the processor executes those program instructions, it can carry out the data processing method described in any one of the embodiments corresponding to fig. 2, fig. 3, fig. 4, fig. 5, fig. 6a, fig. 7, and fig. 8, which is not repeated here. Likewise, the beneficial effects of the same method are not repeated. For technical details not disclosed in the computer storage medium embodiments of the present application, refer to the description of the method embodiments of the present application.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program; the program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure describes only preferred embodiments of the present application and is not intended to limit the scope of the claims of the present application; equivalent variations and modifications made in accordance with the claims of the present application remain within the scope of the present application.

Claims (10)

1. A data processing method, comprising:
acquiring a voice quality identification request; the voice quality recognition request comprises audio data and a target looseness identifier;
extracting audio features corresponding to the audio data in the voice quality recognition request, and generating pronunciation quality scores corresponding to the audio data according to the audio features;
and acquiring a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier.
2. The method according to claim 1, wherein the extracting an audio feature corresponding to the audio data in the speech quality recognition request and generating a pronunciation quality score corresponding to the audio data according to the audio feature comprises:
extracting acoustic features corresponding to audio data in the voice quality recognition request, recognizing text information of the audio data, extracting text features corresponding to the text information, determining the acoustic features and the text features as audio features, and inputting the audio features into a voice quality recognition model;
and carrying out convolution processing on the audio features through the voice quality recognition model to obtain audio hidden features, inputting the audio hidden features into a classification layer in the voice quality recognition model, and outputting pronunciation quality scores through the classification layer.
3. The method of claim 1, wherein the target looseness adjustment parameter comprises a target pronunciation score mean and a target pronunciation score standard deviation; the acquiring a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier comprises:
acquiring a looseness parameter set; the looseness parameter set comprises a pronunciation score mean value and a pronunciation score standard deviation respectively corresponding to each of at least two looseness identifiers;
acquiring, from the looseness parameter set, the target pronunciation score mean value and the target pronunciation score standard deviation corresponding to the target looseness identifier in the voice quality recognition request;
and performing score adjustment on the pronunciation quality score according to the target pronunciation score mean value and the target pronunciation score standard deviation to obtain a target pronunciation quality score matched with the target looseness identifier.
4. The method according to claim 3, wherein the performing score adjustment on the pronunciation quality score according to the target pronunciation score mean and the target pronunciation score standard deviation to obtain a target pronunciation quality score matched with the target looseness identifier comprises:
determining the difference value between the pronunciation quality score and the mean value of the sample pronunciation score as a first target difference value, and determining the ratio of the first target difference value to the standard deviation of the sample pronunciation score as a first target ratio; the sample pronunciation score mean and the sample pronunciation score standard deviation are determined based on the sample pronunciation score corresponding to the sample audio data set;
and determining the product of the first target ratio and the target pronunciation score standard deviation as a first target product, and determining the sum of the first target product and the target pronunciation score mean as the target pronunciation quality score.
5. The method of claim 1, wherein the target looseness adjustment parameter comprises a looseness adjustment parameter corresponding to a first looseness identifier and a looseness adjustment parameter corresponding to a second looseness identifier;
the acquiring a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier includes:
if the target looseness identifier is a non-integer, acquiring a minimum integer interval in which the target looseness identifier is located, determining a minimum integer value in the minimum integer interval as a first looseness identifier, and determining a maximum integer value in the minimum integer interval as a second looseness identifier;
acquiring a first weight corresponding to the first looseness identifier and a second weight corresponding to the second looseness identifier;
according to the first weight and the second weight, carrying out weighted summation on a looseness adjustment parameter corresponding to the first looseness identifier and a looseness adjustment parameter corresponding to the second looseness identifier to obtain a fused looseness adjustment parameter;
and performing score adjustment on the pronunciation quality score according to the fusion looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier.
6. The method of claim 5, wherein the looseness adjustment parameter comprises a pronunciation score mean value and a pronunciation score standard deviation;
the performing weighted summation on the looseness adjustment parameter corresponding to the first looseness identifier and the looseness adjustment parameter corresponding to the second looseness identifier according to the first weight and the second weight to obtain a fusion looseness adjustment parameter comprises:
according to the first weight and the second weight, carrying out weighted summation on the pronunciation score mean value corresponding to the first looseness identifier and the pronunciation score mean value corresponding to the second looseness identifier to obtain a fusion pronunciation score mean value;
according to the square value of the first weight and the square value of the second weight, performing weighted summation on the square value of the pronunciation score standard deviation corresponding to the first looseness identifier and the square value of the pronunciation score standard deviation corresponding to the second looseness identifier to obtain the square value of a fusion pronunciation score standard deviation, and taking the square root of that value to obtain the fusion pronunciation score standard deviation;
and determining the fusion pronunciation score mean value and the fusion pronunciation score standard deviation as the fusion looseness adjustment parameter.
7. The method of claim 1, wherein the target looseness adjustment parameter comprises a target interpolation table; the acquiring a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier comprises:
acquiring a looseness parameter set; the looseness parameter set comprises interpolation tables respectively corresponding to at least two looseness identifiers;
acquiring, from the looseness parameter set, a target interpolation table corresponding to the target looseness identifier in the voice quality recognition request;
and performing score adjustment on the pronunciation quality score according to the target interpolation table to obtain a target pronunciation quality score matched with the target looseness identifier.
8. The method of claim 7, wherein the performing score adjustment on the pronunciation quality score according to the target interpolation table to obtain a target pronunciation quality score matched with the target looseness identifier comprises:
acquiring a smooth transformation interval of the pronunciation quality score in the target interpolation table, and acquiring a maximum original pronunciation quality score and a minimum original pronunciation quality score in the smooth transformation interval;
acquiring the maximum adjusted pronunciation quality score to which the maximum original pronunciation quality score is mapped and the minimum adjusted pronunciation quality score to which the minimum original pronunciation quality score is mapped in the smooth transformation interval;
and mapping the pronunciation quality score to the target pronunciation quality score according to the maximum original pronunciation quality score, the maximum adjusted pronunciation quality score, the minimum original pronunciation quality score and the minimum adjusted pronunciation quality score.
9. The method of claim 1, wherein the target looseness adjustment parameter comprises an interpolation table corresponding to a third looseness identifier and an interpolation table corresponding to a fourth looseness identifier; the acquiring a target looseness adjustment parameter corresponding to the target looseness identifier in the voice quality recognition request, and performing score adjustment on the pronunciation quality score according to the target looseness adjustment parameter to obtain a target pronunciation quality score matched with the target looseness identifier comprises:
if the target looseness identifier is a non-integer, acquiring a minimum integer interval in which the target looseness identifier is located, determining a minimum integer value in the minimum integer interval as a third looseness identifier, and determining a maximum integer value in the minimum integer interval as a fourth looseness identifier;
acquiring a third weight corresponding to the third looseness identifier and a fourth weight corresponding to the fourth looseness identifier;
performing score adjustment on the pronunciation quality score according to the interpolation table corresponding to the third looseness identifier to obtain a first interpolation pronunciation quality score;
performing score adjustment on the pronunciation quality score according to the interpolation table corresponding to the fourth looseness identifier to obtain a second interpolation pronunciation quality score;
and according to the third weight and the fourth weight, carrying out weighted summation on the first interpolation pronunciation quality score and the second interpolation pronunciation quality score to obtain a target pronunciation quality score matched with the target looseness identifier.
10. A computer device, comprising: a processor, a memory, and a network interface;
the processor is coupled to the memory and the network interface, wherein the network interface is configured to provide data communication functionality, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the method of any of claims 1-9.