CN115175233A - Voice quality evaluation method and device, electronic equipment and storage medium - Google Patents

Voice quality evaluation method and device, electronic equipment and storage medium

Info

Publication number
CN115175233A
Authority
CN
China
Prior art keywords
training
data
voice quality
quality evaluation
test
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210787995.5A
Other languages
Chinese (zh)
Inventor
杨飞虎
刘贤松
欧大春
张忠平
许国平
陈旻
张硕伟
佘士钊
石旭荣
李珊珊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202210787995.5A priority Critical patent/CN115175233A/en
Publication of CN115175233A publication Critical patent/CN115175233A/en
Pending legal-status Critical Current

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04W - WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 - Supervisory, monitoring or testing arrangements
    • H04W24/08 - Testing, supervising or monitoring using real traffic

Abstract

The application provides a voice quality evaluation method, a voice quality evaluation device, electronic equipment and a storage medium, and relates to the field of communication. The method comprises the following steps: acquiring call ticket data of a terminal to be evaluated in a time period to be evaluated; obtaining feature data of the terminal to be evaluated from the call ticket data of the terminal to be evaluated; wherein the characteristic data comprises real-time transmission protocol communication parameters and time delay parameters; inputting the characteristic data of the terminal to be evaluated into a voice quality evaluation model, and obtaining a voice quality evaluation result of the terminal to be evaluated in a time period to be evaluated, which is output by the voice quality evaluation model; the voice quality evaluation model is a model which is established based on an extreme gradient lifting algorithm and is trained. According to the scheme, the voice quality can be monitored and analyzed.

Description

Voice quality evaluation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of communications, and in particular, to a method and an apparatus for evaluating voice quality, an electronic device, and a storage medium.
Background
With the development of mobile network communication technology, quality optimization of data and voice services is increasingly important; for example, in network optimization and evaluation, the evaluation of voice service quality is one of the most important indicators.
Conventional drive testing requires professional testers, drive test instruments, vehicles, and the like: the professional testers drive along a target route and carry out on-site testing to obtain an evaluation result for the voice service indicators. The conventional drive test mode therefore places certain requirements on personnel skills, needs dedicated equipment and vehicles, and collects data on site for evaluation, so the evaluation efficiency is low and a large amount of time and cost are consumed.
Disclosure of Invention
The application provides a voice quality assessment method, a voice quality assessment device, electronic equipment and a storage medium, which are used for realizing convenient and rapid voice quality assessment.
In one aspect, the present application provides a speech quality assessment method, including: acquiring call ticket data of a terminal to be evaluated in a time period to be evaluated; obtaining feature data of the terminal to be evaluated from the call ticket data of the terminal to be evaluated; the characteristic data comprises real-time transmission protocol communication parameters and time delay parameters; inputting the characteristic data of the terminal to be evaluated into a voice quality evaluation model, and obtaining a voice quality evaluation result of the terminal to be evaluated in a time period to be evaluated, which is output by the voice quality evaluation model; the voice quality evaluation model is a model which is established based on an extreme gradient lifting algorithm and is trained.
In one embodiment, the method further comprises: establishing an initial model based on an extreme gradient lifting algorithm; acquiring training data, wherein the training data comprises voice quality evaluation results of all test terminals and characteristic data corresponding to the voice quality evaluation results of all the test terminals; and training the initial model based on the training data until the voice quality evaluation model is obtained.
In some embodiments, the method further comprises: acquiring training drive test data of each test terminal at each moment, wherein the training drive test data comprises an identifier of the test terminal and a voice quality evaluation result of the test terminal; acquiring training call ticket data of each test terminal in each period, wherein the training call ticket data comprises an identifier of the test terminal and characteristic data of the test terminal; and aiming at each test terminal, establishing association between the voice quality evaluation result of the test terminal and the characteristic data of the test terminal based on the identification of the test terminal so as to obtain the training data.
In some embodiments, after establishing, for each test terminal, an association between the voice quality evaluation result of the test terminal and the feature data of the test terminal based on the identifier of the test terminal to obtain the training data, the method further includes: preprocessing the training data, the preprocessing comprising at least one of: missing value filling, outlier handling, normalization, one-hot encoding, and feature fusion and splitting.
In some embodiments, for each test terminal, establishing an association between the voice quality evaluation result of the test terminal and the feature data of the test terminal based on the identity of the test terminal to obtain the training data includes: aiming at each test terminal, acquiring training drive test data of the test terminal at each moment and training call ticket data of the test terminal at each time period based on the identification of the test terminal; determining training ticket data corresponding to each training drive test data of the test terminal according to the corresponding time of each training drive test data of the test terminal and the corresponding time period of each training ticket data of the test terminal; the time corresponding to the training drive test data is positioned in the time period corresponding to the training ticket data corresponding to the training drive test data; aiming at each training call ticket data of the test terminal, if one training drive test data corresponding to the training call ticket data is available, establishing association between a voice quality evaluation result in the training drive test data and feature data in the training call ticket data; and if the training drive test data corresponding to the training call ticket data are multiple, taking the average value of the voice quality evaluation results in the multiple training drive test data as a final voice quality evaluation result, and establishing the association between the final voice quality evaluation result and the feature data in the training call ticket data.
In some embodiments, the training the initial model based on the training data until the speech quality assessment model is obtained includes: dividing the training data to obtain a training set and a test set; training the initial model based on the training data in the training set to obtain a current model; inputting the feature data in the test set into a current model to obtain a first voice quality evaluation result output by the model; and calculating a correlation coefficient between the first voice quality evaluation result and the corresponding voice quality evaluation result in the test set until the current correlation coefficient meets a preset requirement, and judging that the training is finished.
In another aspect, the present application provides a speech quality assessment apparatus, including: the first acquisition module is used for acquiring the call ticket data of the terminal to be evaluated in the time period to be evaluated; the first obtaining module is further configured to obtain feature data of the terminal to be evaluated from the ticket data of the terminal to be evaluated; the characteristic data comprises real-time transmission protocol communication parameters and time delay parameters; the processing module is used for inputting the characteristic data of the terminal to be evaluated into a voice quality evaluation model and obtaining a voice quality evaluation result of the terminal to be evaluated in a time period to be evaluated, wherein the voice quality evaluation result is output by the voice quality evaluation model; the voice quality evaluation model is a model which is established based on an extreme gradient lifting algorithm and is trained.
In some embodiments, the apparatus further comprises: the model establishing module is used for establishing an initial model based on an extreme gradient lifting algorithm; the second acquisition module is used for acquiring training data, wherein the training data comprises voice quality evaluation results of all the test terminals and characteristic data corresponding to the voice quality evaluation results of all the test terminals; and the training module is used for training the initial model based on the training data until the voice quality evaluation model is obtained.
In some embodiments, the second obtaining module includes: the system comprises an acquisition unit, a processing unit and a control unit, wherein the acquisition unit is used for acquiring training drive test data of each test terminal at each moment, and the training drive test data comprises an identifier of the test terminal and a voice quality evaluation result of the test terminal; the acquisition unit is further used for acquiring training call ticket data of each test terminal in each time period, wherein the training call ticket data comprises an identifier of the test terminal and feature data of the test terminal; and the association unit is used for establishing association between the voice quality evaluation result of the test terminal and the characteristic data of the test terminal based on the identification of the test terminal aiming at each test terminal so as to obtain the training data.
In some embodiments, the apparatus further comprises: a preprocessing module for preprocessing the training data, the preprocessing including at least one of: missing value filling, outlier handling, normalization, one-hot encoding, and feature fusion and splitting.
In some embodiments, the association unit is specifically configured to: aiming at each test terminal, acquiring training drive test data of the test terminal at each moment and training call ticket data of the test terminal at each time period based on the identification of the test terminal; determining training ticket data corresponding to each training drive test data of the test terminal according to the corresponding time of each training drive test data of the test terminal and the corresponding time period of each training ticket data of the test terminal; the time corresponding to the training drive test data is positioned in the time period corresponding to the training ticket data corresponding to the training drive test data; aiming at each training call ticket data of the test terminal, if one training drive test data corresponding to the training call ticket data is available, establishing association between a voice quality evaluation result in the training drive test data and feature data in the training call ticket data; and if the training drive test data corresponding to the training call ticket data are multiple, taking the average value of the voice quality evaluation results in the multiple training drive test data as a final voice quality evaluation result, and establishing the association between the final voice quality evaluation result and the feature data in the training call ticket data.
In some embodiments, the training module is specifically configured to: dividing the training data to obtain a training set and a test set; training the initial model based on the training data in the training set to obtain a current model; inputting the feature data in the test set into a current model to obtain a first voice quality evaluation result output by the model; and calculating a correlation coefficient between the first voice quality evaluation result and the corresponding voice quality evaluation result in the test set, and judging that the training is finished until the current correlation coefficient meets a preset requirement.
In yet another aspect, the present application provides an electronic device comprising: a processor, and a memory communicatively coupled to the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored by the memory to implement the method as previously described.
In yet another aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the method as described above when executed by a processor.
According to the voice quality evaluation method, the voice quality evaluation device, the electronic equipment and the storage medium, call ticket data of the terminal to be evaluated in the time period to be evaluated are obtained; obtaining feature data of the terminal to be evaluated from the ticket data of the terminal to be evaluated; and inputting the characteristic data of the terminal to be evaluated into a voice quality evaluation model, and obtaining a voice quality evaluation result of the terminal to be evaluated in a time period to be evaluated, which is output by the voice quality evaluation model. In the scheme of the application, the voice quality assessment model is established based on the extreme gradient lifting algorithm and trained, an accurate voice quality assessment result can be output based on input feature data, and the scheme does not need to rely on professionals and does not need to carry out field data acquisition and assessment, so that the efficiency of voice quality assessment is effectively improved, and a large amount of time and cost are saved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic view of a conventional road voice quality assessment method;
fig. 2 is a schematic flowchart of a voice quality evaluation method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a speech quality assessment method according to a second embodiment of the present application;
fig. 4 is a schematic structural diagram of a speech quality assessment apparatus according to a third embodiment of the present application;
fig. 5 is a schematic structural diagram of a speech quality assessment apparatus according to a fourth embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
It should be noted that the brief descriptions of the terms in the present application are only for convenience of understanding of the embodiments described below, and are not intended to limit the embodiments of the present application. These terms should be understood in their ordinary and customary meaning unless otherwise indicated.
The terms "first," "second," and the like in the description and claims of this application and in the above-described drawings are used for distinguishing between similar or analogous objects or entities and are not necessarily meant to limit the order or sequence Unless otherwise indicated (Unless second indicated). It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device. The term "module," as used herein, refers to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
Fig. 1 is a scene diagram of a conventional road voice quality assessment method. As shown in fig. 1, an intersection 101 is a road voice quality assessment section, a vehicle 104 passes through the intersection 101, and a base station 102 is located near the intersection and is a mobile communication switching center.
Professional testers drive vehicles 104 through the intersection 101, and the voice quality evaluation result can be measured directly by on-site testing with a drive test apparatus. In an actual scene, voice services usually occur within the road voice quality evaluation section. For example, when the pedestrian 103 passes through the intersection 101, high-definition voice services can be carried out with terminal devices such as a mobile terminal, a tablet computer, or a smart watch. Wireless terminals and/or wired terminals may also be employed. A wireless terminal may refer to a device that provides voice and/or other data connectivity to a user, a handheld device having wireless connection capability, or another processing device connected to a wireless modem. A wireless terminal may be a mobile terminal such as a mobile phone (also called a "cellular" phone) or a computer with a mobile terminal, for example a portable, pocket, hand-held, computer-built-in or vehicle-mounted mobile device, and may communicate with one or more core network devices via a Radio Access Network (RAN) and exchange voice and/or data with the RAN. For another example, the wireless terminal may be a Personal Communication Service (PCS) phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA), and the like. A wireless terminal may also be referred to as a system, a Subscriber Unit, a Subscriber Station, a Mobile Station, a Remote Station, a Remote Terminal, an Access Terminal, a User Terminal, a User Agent, or User Equipment, which is not limited herein. The terminal device transmits the voice communication information to the base station 102 through an uplink channel, and the base station 102 transmits the voice communication information to other users through a downlink channel. The drive test apparatus can then test the voice quality evaluation result for the voice quality evaluation road section based on the voice service condition within that section.
However, the implementation of this solution requires the provision of specialized testers, drive test equipment and vehicles, and the efficiency of voice quality assessment is low, requiring a lot of time and cost. Therefore, the voice quality assessment model is established, the voice quality assessment result similar to that of the traditional drive test is obtained by analyzing the user data through the model, and the voice quality assessment efficiency is improved while the time and the cost are saved.
The technical solutions of the present application will be described in detail below with specific embodiments. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. In the description of the present application, unless otherwise explicitly specified and defined, each term should be understood in its broad sense in the art. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a schematic flow chart of a voice quality assessment method according to an embodiment of the present application, where an implementation subject of the embodiment may be a voice quality assessment apparatus, as shown in fig. 2, the method includes:
s201, acquiring call bill data of a terminal to be evaluated in a time period to be evaluated;
s202, obtaining feature data of the terminal to be evaluated from the call ticket data of the terminal to be evaluated; the characteristic data comprises real-time transmission protocol communication parameters and time delay parameters;
s203, inputting the characteristic data of the terminal to be evaluated into a voice quality evaluation model, and obtaining a voice quality evaluation result of the terminal to be evaluated in a time period to be evaluated, which is output by the voice quality evaluation model; the voice quality evaluation model is a model which is established based on an extreme gradient lifting algorithm and is trained.
The call ticket data may be External Data Representation (XDR) call ticket data. The XDR ticket data contains many data types: a basic information class, such as the service start time and session ID; a user information class, such as the user identifier, mobile station device identifier, and network access identifier; a quality information class, such as the time delay, number of out-of-order messages, number of retransmitted bytes, and record closing reason; and a traffic information class, such as the number of messages and the number of bytes. In practical applications, the XDR call ticket acquisition system can flexibly generate XDR call ticket data for the time period to be evaluated as needed.
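For readability only, the sketch below models one XDR call ticket record covering the four information classes listed above; the Python dataclass and every field name in it are assumptions introduced for illustration, not fields defined by this application.

```python
from dataclasses import dataclass

@dataclass
class XdrTicketRecord:
    """Hypothetical XDR call ticket record; field names are illustrative only."""
    # Basic information class
    service_start_time: str      # e.g. "2022-07-01 10:00:00"
    session_id: str
    # User information class
    user_id: str                 # subscriber identifier, e.g. IMSI or MSISDN
    device_id: str               # mobile station device identifier
    network_access_id: str
    # Quality information class
    delay_ms: float
    out_of_order_packets: int
    retransmitted_bytes: int
    record_close_reason: str
    # Traffic information class
    packet_count: int
    byte_count: int
```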
In practical applications, the execution subject of the speech quality assessment method may be a speech quality assessment apparatus, and the speech quality assessment apparatus may be implemented in various ways, for example by a computer program such as application software, or by a chip, and so on. It may also be implemented as a medium storing an associated computer program, for example a USB disk or a cloud disk; or it may be implemented by a physical device, such as a server, in which the relevant computer program is integrated or installed.
In practical applications, the characteristic data is the part of the call ticket data related to the voice quality evaluation method. For example, factors such as speech coding, delay, packet loss, and jitter affect speech quality. Specifically, in S202, feature data of the terminal to be evaluated is obtained from the ticket data of the terminal to be evaluated, where the characteristic data comprises real-time transport protocol (RTP) communication parameters and time delay parameters. For example, the real-time transport protocol communication parameters may include: RTP data packet count, RTP jitter, RTP packet loss count, real-time transport control protocol (RTCP) data packet count, RTCP jitter, and RTCP packet loss count; the delay parameters may include: average codec rate, loop delay, average delay, and the like.
Correspondingly, in S203, inputting the feature data of the terminal to be evaluated to a voice quality evaluation model, and obtaining a voice quality evaluation result of the terminal to be evaluated in a time period to be evaluated, which is output by the voice quality evaluation model; the voice quality evaluation model is a model which is established based on an extreme gradient lifting algorithm and is trained. For example, the characteristic data of the terminal to be evaluated may include: real-time transport protocol communication parameters, such as real-time transport protocol jitter, real-time transport protocol packet loss; or a delay parameter, such as average codec rate, loop delay. And inputting the characteristic data of the terminal to be evaluated into the voice quality evaluation model, and obtaining a voice quality evaluation result of the terminal to be evaluated in the time period to be evaluated, which is output by the voice quality evaluation model.
In the voice quality evaluation method provided by the embodiment, call ticket data of a terminal to be evaluated in a time period to be evaluated is acquired; obtaining feature data of the terminal to be evaluated from the ticket data of the terminal to be evaluated; the characteristic data comprises real-time transmission protocol communication parameters and time delay parameters; inputting the characteristic data of the terminal to be evaluated into a voice quality evaluation model, and obtaining a voice quality evaluation result of the terminal to be evaluated in a time period to be evaluated, which is output by the voice quality evaluation model; the voice quality evaluation model is a model which is established based on an extreme gradient lifting algorithm and is trained. In the embodiment, the voice quality evaluation result of the terminal to be evaluated in the time period to be evaluated, which is output by the voice quality evaluation model, is obtained by inputting the feature data of the terminal to be evaluated in the time period to be evaluated to the voice quality evaluation model, so that the voice quality evaluation result is obtained by the voice quality evaluation model, the efficiency of voice quality evaluation is effectively improved, and the cost of obtaining the voice quality evaluation result by the traditional drive test is saved.
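As an illustration of the S201-S203 flow, the following sketch assumes the trained model was saved with the open-source xgboost library and that the ticket data has already been exported to a table whose columns hold the RTP communication parameters and delay parameters; the file formats, column names, and function are all hypothetical.

```python
import pandas as pd
import xgboost as xgb

FEATURE_COLUMNS = [
    "rtp_packet_count", "rtp_jitter", "rtp_packet_loss",
    "rtcp_packet_count", "rtcp_jitter", "rtcp_packet_loss",
    "avg_codec_rate", "loop_delay", "avg_delay",
]  # hypothetical column names for the RTP and delay parameters

def evaluate_voice_quality(ticket_csv: str, model_path: str) -> pd.DataFrame:
    # S201: acquire ticket data of the terminal to be evaluated in the period to be evaluated
    tickets = pd.read_csv(ticket_csv)
    # S202: obtain the feature data (RTP communication parameters and delay parameters)
    features = tickets[FEATURE_COLUMNS]
    # S203: feed the features into the trained XGBoost model to get the estimated MOS
    model = xgb.XGBRegressor()
    model.load_model(model_path)
    tickets["predicted_mos"] = model.predict(features)
    return tickets[["predicted_mos"] + FEATURE_COLUMNS]
```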
Fig. 3 is a schematic flow chart of a speech quality assessment method according to a second embodiment of the present application, and as shown in fig. 3, the method further includes:
s301, establishing an initial model based on an extreme gradient lifting algorithm;
s302, training data are obtained, wherein the training data comprise voice quality evaluation results of all test terminals and feature data corresponding to the voice quality evaluation results of all the test terminals;
s303, training the initial model based on the training data until the voice quality evaluation model is obtained.
The voice quality assessment result here is obtained through conventional drive tests. In practical applications, the voice quality evaluation result may be a Mean Opinion Score (MOS). For example, the 3-fold Quality Evaluation of Speech in Telecommunications (3QUEST) standard applies three parameters, the Speech Mean Opinion Score (SMOS), the Noise Mean Opinion Score (NMOS), and the Global Mean Opinion Score (GMOS), to express speech quality in three dimensions: speech naturalness, noise presence, and overall quality. The MOS value range is 0-5, and the larger the MOS value, the better the voice quality.
In practical applications, there are many types of feature data that affect voice quality, and considering multiple feature data together obviously yields a more accurate voice quality evaluation result. The extreme gradient lifting (extreme gradient boosting, XGBoost) algorithm obtains the predicted value of the voice quality evaluation result in an iterative, serial manner. Each iteration of the algorithm constructs a new decision tree, each decision tree corresponds to a type of feature data, and each iteration produces a new predicted value, which equals the sum of the predicted value of the current decision tree and the previous predicted value.
Specifically, in S301 an initial model is established based on the extreme gradient lifting algorithm. For example, assume that after t-1 iterations the predicted value of the i-th sample x_i is

$$\hat{y}_i^{(t-1)} = \sum_{k=1}^{t-1} f_k(x_i),$$

where k indexes the decision trees. In the next iteration, a model that minimizes the loss of the newly generated ensemble on the training set is trained, and after the t-th iteration the predicted value of the i-th sample x_i is

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i).$$

The objective function of the extreme gradient lifting algorithm includes a loss function L and a regularization term \Omega, and can be written as

$$Obj = \sum_{i=1}^{n} L\big(y_i, \hat{y}_i\big) + \sum_{k} \Omega\big(f_k\big).$$

Substituting the prediction f_t(x_i) of the i-th sample at the t-th iteration, the objective function at this step can be written as

$$Obj^{(t)} = \sum_{i=1}^{n} L\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega\big(f_t\big) + \text{const}.$$

Applying a second-order Taylor expansion to the objective function using the Taylor formula gives

$$Obj^{(t)} \approx \sum_{i=1}^{n} \Big[ L\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \Big] + \Omega\big(f_t\big),$$

where g_i is the first derivative of the loss function and h_i is the second derivative, i.e.

$$g_i = \partial_{\hat{y}^{(t-1)}} L\big(y_i, \hat{y}^{(t-1)}\big), \qquad h_i = \partial^{2}_{\hat{y}^{(t-1)}} L\big(y_i, \hat{y}^{(t-1)}\big).$$

When the loss function is taken as the squared loss, the constant terms can be dropped and the objective function is approximated as

$$Obj^{(t)} \approx \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \Big] + \Omega\big(f_t\big).$$

Further, the base function is taken as a decision tree model f_t(x) = \omega_{q(x)}, where q(x) denotes the leaf node in which sample x falls, and the number of leaf nodes of the decision tree is set to T. This value determines the complexity of the decision tree: the larger it is, the more complex the model. The regularization term of the objective function is then expressed as

$$\Omega\big(f_t\big) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^{2}.$$

Since each sample x_i eventually falls on one leaf node, and each leaf node contains multiple samples, summing the loss over all samples x_i is equivalent to summing it over all leaf nodes. Let the sample set contained in the j-th leaf node be I_j = \{ i \mid q(x_i) = j \}; the objective function is then

$$Obj^{(t)} \approx \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big)\,\omega_j + \tfrac{1}{2} \Big(\sum_{i \in I_j} h_i + \lambda\Big)\,\omega_j^{2} \Big] + \gamma T.$$

To simplify the formula, define G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i, so that the objective function becomes

$$Obj^{(t)} = \sum_{j=1}^{T} \Big[ G_j \omega_j + \tfrac{1}{2} \big( H_j + \lambda \big) \omega_j^{2} \Big] + \gamma T.$$

Taking the first derivative with respect to \omega_j and setting it to 0, the weight corresponding to leaf node j and the optimal objective function are obtained as

$$\omega_j^{*} = -\frac{G_j}{H_j + \lambda}, \qquad Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T.$$

The t-th decision tree is constructed according to the weights of the leaf nodes and the optimal objective function, and the current predicted value is obtained from the leaf nodes of the t-th decision tree.
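The closed-form leaf weight derived above can be checked numerically. The small sketch below assumes the squared loss, for which g_i is the residual (current prediction minus label) and h_i = 1, a common choice used here only for illustration, and verifies that \omega_j^* = -G_j/(H_j + \lambda) minimizes the per-leaf part of the objective.

```python
import numpy as np

def leaf_weight(g: np.ndarray, h: np.ndarray, lam: float = 1.0) -> float:
    """Optimal weight of one leaf: w* = -G / (H + lambda)."""
    return -g.sum() / (h.sum() + lam)

def leaf_objective(w: float, g: np.ndarray, h: np.ndarray, lam: float = 1.0) -> float:
    """Per-leaf part of the objective: G*w + 0.5*(H + lambda)*w^2."""
    return g.sum() * w + 0.5 * (h.sum() + lam) * w ** 2

# Toy leaf under squared loss: three samples fall into the same leaf node
y_true = np.array([4.2, 3.8, 4.0])   # drive-test MOS labels (illustrative values)
y_prev = np.array([3.5, 3.5, 3.5])   # prediction after t-1 trees
g = y_prev - y_true                   # first derivatives of the squared loss
h = np.ones_like(g)                   # second derivatives of the squared loss

w_star = leaf_weight(g, h)
# The closed-form optimum should score no worse than nearby weights
assert leaf_objective(w_star, g, h) <= min(
    leaf_objective(w_star + 0.1, g, h), leaf_objective(w_star - 0.1, g, h)
)
print(f"optimal leaf weight: {w_star:.3f}")
```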
In this embodiment, an initial model is established based on the extreme gradient lifting algorithm. Training data comprising the voice quality evaluation results of the test terminals and the feature data corresponding to those results is obtained, and the initial model is trained on the training data to obtain the voice quality evaluation model. The voice quality evaluation model is then used to produce the voice quality evaluation result, which saves the cost of obtaining that result through actual drive tests.
In addition, in order to establish training data in consideration of different voice quality evaluation results of different test terminals, in an embodiment, S302 specifically includes: acquiring training drive test data of each test terminal at each moment, wherein the training drive test data comprises an identifier of the test terminal and a voice quality evaluation result of the test terminal; acquiring training call ticket data of each test terminal in each period, wherein the training call ticket data comprises an identifier of the test terminal and characteristic data of the test terminal; and aiming at each test terminal, establishing association between the voice quality evaluation result of the test terminal and the characteristic data of the test terminal based on the identification of the test terminal so as to obtain the training data.
The identifier of the test terminal is the user identifier of the test terminal user, including but not limited to: the International Mobile Subscriber Identity (IMSI), the Mobile Subscriber Number (MSISDN), and the Temporary Mobile Subscriber Identity (TMSI).
In practical application, training drive test data of a plurality of test terminals and training call ticket data of the plurality of test terminals may be obtained, the training drive test data and the training call ticket data both include identifiers of the test terminals, and a voice quality evaluation result and feature data of the same test terminal are associated according to the identifiers of the test terminals. For example, the obtained training drive test data includes training drive test data of the test terminal 1 (including an identifier and a voice quality evaluation result of the test terminal 1) and training drive test data of the test terminal 2 (including an identifier and a voice quality evaluation result of the test terminal 2), and the obtained training call ticket data includes training call ticket data of the test terminal 2 (including an identifier and feature data of the test terminal 2) and training call ticket data of the test terminal 3 (including an identifier and feature data of the test terminal 3). Based on the identifier of the test terminal, the drive test data and the call ticket data corresponding to the same identifier may be associated, for example, the voice quality evaluation result in the training drive test data of the test terminal 2 and the feature data in the training call ticket data of the test terminal 2 are associated.
In the embodiment, according to the obtained test terminal identification in the training drive test data and the training call ticket data, the voice quality evaluation result and the characteristic data corresponding to the same test terminal are associated. Therefore, the characteristic data corresponding to the test terminal is input into the initial model, a first voice quality evaluation result corresponding to the same test terminal can be obtained, the voice quality evaluation result and the first voice quality evaluation result of the same test terminal are compared, and a more reliable comparison effect can be obtained.
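A minimal sketch of this association step is given below; it assumes the drive test results and ticket records have been loaded into two pandas tables keyed by a terminal identifier column, and the column names are hypothetical. An inner join keeps only terminals present in both sources, which matches the example above where only test terminal 2 appears in both.

```python
import pandas as pd

def build_training_table(drive_test: pd.DataFrame, tickets: pd.DataFrame) -> pd.DataFrame:
    """Associate drive-test MOS results with ticket feature data by terminal identifier.

    drive_test: columns ["terminal_id", "mos", ...]
    tickets:    columns ["terminal_id", <feature columns>, ...]
    Column names are hypothetical.
    """
    # Inner join keeps only terminals that occur in both sources (e.g. test terminal 2)
    return drive_test.merge(tickets, on="terminal_id", how="inner")
```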
In addition, in order to handle possible abnormal data in the training data, in an embodiment, after establishing, for each test terminal, the association between the voice quality evaluation result of the test terminal and the feature data of the test terminal based on the identifier of the test terminal to obtain the training data, the method further includes: preprocessing the training data, the preprocessing comprising at least one of: missing value filling, outlier handling, normalization, one-hot encoding, and feature fusion and splitting.
In practical applications, in the process of training the initial model, the training data is preprocessed after it is obtained in order to achieve a better training effect. For example, if the training data has missing values, they can be filled by methods such as forward filling, backward filling, mean filling, mode filling, or median filling. Abnormal values in the training data may cause a serious deviation in the training result of the initial model, and they can be handled by methods such as deletion, mean correction, regression interpolation, or multiple interpolation. The training data may also lie in different ranges, and normalization is applied to limit it to a certain range. The one-hot encoding process expands discrete data in the training data into a Euclidean space, making it possible to calculate the similarity between training samples. Finally, fusing several fields of the training data, or splitting a certain field, can improve the performance of model training, which is what the feature fusion and splitting processing does. The preprocessing methods described above may be applied alone or in combination.
In the embodiment, the training data is preprocessed, so that reliable training data can be obtained, and the cost of subsequent training caused by abnormal data is saved.
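The sketch below strings together one representative choice for each preprocessing step named above (median filling for missing values, quantile clipping for outliers, min-max normalization, and one-hot encoding); these particular choices and the column arguments are assumptions, since the application allows any combination of the listed techniques.

```python
import pandas as pd

def preprocess(training: pd.DataFrame,
               numeric_cols: list[str],
               categorical_cols: list[str]) -> pd.DataFrame:
    df = training.copy()
    # Missing value filling (median filling chosen here; mean/mode/forward fill also possible)
    df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
    # Outlier handling: clip to the 1st-99th percentile range instead of deleting rows
    low, high = df[numeric_cols].quantile(0.01), df[numeric_cols].quantile(0.99)
    df[numeric_cols] = df[numeric_cols].clip(low, high, axis=1)
    # Normalization: min-max scaling to [0, 1]
    span = (df[numeric_cols].max() - df[numeric_cols].min()).replace(0, 1)
    df[numeric_cols] = (df[numeric_cols] - df[numeric_cols].min()) / span
    # One-hot encoding of discrete fields
    df = pd.get_dummies(df, columns=categorical_cols)
    return df
```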
In addition, considering that the training call ticket data in a certain period may correspond to the training drive test data at multiple times, in an embodiment, for each test terminal, based on the identifier of the test terminal, establishing an association between the voice quality evaluation result of the test terminal and the feature data of the test terminal to obtain the training data specifically includes:
aiming at each test terminal, acquiring training drive test data of the test terminal at each moment and training call bill data of the test terminal at each time period based on the identification of the test terminal;
determining training ticket data corresponding to each training drive test data of the test terminal according to the corresponding time of each training drive test data of the test terminal and the corresponding time period of each training ticket data of the test terminal; the time corresponding to the training drive test data is positioned in the time period corresponding to the training ticket data corresponding to the training drive test data;
aiming at each training call ticket data of the test terminal, if one training drive test data corresponding to the training call ticket data is available, establishing association between a voice quality evaluation result in the training drive test data and feature data in the training call ticket data; and if the training drive test data corresponding to the training call ticket data are multiple, taking the average value of the voice quality evaluation results in the multiple training drive test data as a final voice quality evaluation result, and establishing the association between the final voice quality evaluation result and the feature data in the training call ticket data.
In practical applications, the voice quality evaluation result is collected at fixed time intervals. For example, the operator test specification specifies that, for the same test terminal, the drive test apparatus acquires an MOS value every 9 seconds; for test terminal A, MOS1 is measured at 10:00:00, MOS2 at 10:00:09, and MOS3 at 10:00:18.
In practical applications, for the same test terminal, the call ticket data is collected by a deep packet inspection system and, while being transmitted to the call ticket server through the unified collection and instruction adaptation platform, is divided into training call ticket data of fixed duration. For example, training call ticket data 1 of test terminal A covers the period from 10:00:00 to 10:00:12, and training call ticket data 2 covers the period from 10:00:12 to 10:00:24. It should be noted that the duration of the training data period is consistent with the duration of the period to be evaluated, and the duration may be specified as needed.
Specifically, for each piece of training call ticket data of the test terminal, if there is one piece of training drive test data corresponding to it, an association is established between the voice quality evaluation result in that training drive test data and the feature data in the training call ticket data; if there are several pieces of training drive test data corresponding to the training call ticket data, the average of their voice quality evaluation results is taken as the final voice quality evaluation result, and an association is established between the final voice quality evaluation result and the feature data in the training call ticket data. Following the above example, the acquisition moments of MOS1 and MOS2 both fall within the time period corresponding to training call ticket data 1, while the acquisition moment of MOS3 falls within the time period corresponding to training call ticket data 2. The MOS value associated with the feature data in training call ticket data 1 is therefore the average of MOS1 and MOS2, and the MOS value associated with the feature data in training call ticket data 2 is MOS3.
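As an illustration of this time alignment, the sketch below assigns each drive-test MOS sample to the ticket window whose time period contains its timestamp and averages the MOS values that land in the same window. The 12-second window length follows the example above; the column names and the assumption that windows start on 12-second boundaries are illustrative only.

```python
import pandas as pd

WINDOW_SECONDS = 12  # duration of one training ticket record in the example above

def attach_mos_labels(tickets: pd.DataFrame, drive_test: pd.DataFrame) -> pd.DataFrame:
    """tickets: one row per ticket window, with datetime column 'window_start'.
    drive_test: one row per MOS sample, with datetime column 'timestamp' and column 'mos'.
    Both frames belong to the same test terminal; column names are hypothetical."""
    dt = drive_test.copy()
    # Map each MOS timestamp onto the start of the window that contains it
    # (assumes windows are aligned to WINDOW_SECONDS boundaries)
    dt["window_start"] = dt["timestamp"].dt.floor(f"{WINDOW_SECONDS}s")
    # One MOS per window is used directly; several MOS values are averaged
    labels = dt.groupby("window_start", as_index=False)["mos"].mean()
    return tickets.merge(labels, on="window_start", how="inner")

# With the example above: MOS1 (10:00:00) and MOS2 (10:00:09) fall in window 1,
# so its label is (MOS1 + MOS2) / 2; MOS3 (10:00:18) alone labels window 2.
```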
In the embodiment, for the same test terminal, a one-to-one correspondence relationship between the feature data and the voice evaluation result in a certain period is established, so that the first voice evaluation result corresponding to the feature data output by the model in a certain period can be conveniently compared with the voice evaluation result, and the model training efficiency is improved.
In addition, in order to determine whether the model is trained, in an embodiment, S303 specifically includes: dividing the training data to obtain a training set and a test set; training the initial model based on the training data in the training set to obtain a current model; inputting the feature data in the test set into a current model to obtain a first voice quality evaluation result output by the model; and calculating a correlation coefficient between the first voice quality evaluation result and the corresponding voice quality evaluation result in the test set until the current correlation coefficient meets a preset requirement, and judging that the training is finished.
In practical applications, the magnitude of the correlation coefficient is the basis for determining whether the current model has finished training, and the preset requirement on the correlation coefficient can be set in advance. The larger the correlation coefficient, the higher the similarity between the first speech quality assessment result and the corresponding speech quality assessment result in the test set. For example, when the correlation coefficient is greater than 2, the current model meets the application requirements and training is complete.
Specifically, the correlation coefficient between the first speech quality assessment result and the corresponding speech quality assessment result in the test set is calculated, and when the current correlation coefficient meets the preset requirement, the training is judged to be complete. For example, if the correlation coefficient is 1.9, the current model does not meet the preset requirement, and the current model continues to be trained based on the training data in the training set. For another example, if the correlation coefficient is 2.4, the current model meets the preset requirement, and it is determined that the training is complete.
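A compact sketch of this training loop follows. It splits the training data, fits an XGBoost regressor, and computes the Pearson correlation coefficient between the model output on the test set and the drive-test MOS labels; the hyperparameters, the 0.9 threshold, and the strategy of enlarging the ensemble when the requirement is not met are placeholders chosen for illustration rather than values prescribed by this application.

```python
import numpy as np
import xgboost as xgb
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split

def train_until_correlated(features: np.ndarray, mos: np.ndarray,
                           threshold: float = 0.9,
                           max_rounds: int = 10) -> xgb.XGBRegressor:
    """Train an XGBoost regressor until the test-set correlation meets the preset requirement.

    threshold, max_rounds, and the hyperparameters below are illustrative placeholders.
    """
    x_train, x_test, y_train, y_test = train_test_split(
        features, mos, test_size=0.2, random_state=0)
    n_estimators = 100
    for _ in range(max_rounds):
        model = xgb.XGBRegressor(n_estimators=n_estimators, max_depth=6, learning_rate=0.1)
        model.fit(x_train, y_train)
        # First voice quality evaluation result output by the current model
        predicted = model.predict(x_test)
        corr, _ = pearsonr(predicted, y_test)
        if corr >= threshold:      # preset requirement met: training finished
            return model
        n_estimators += 100        # otherwise continue training with a larger ensemble
    return model
```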
In this embodiment, whether the current model has finished training is judged by calculating the correlation coefficient between the first voice quality evaluation result and the corresponding voice quality evaluation result in the test set, so that the trained voice quality evaluation model can be obtained in time and the cost of repeated training is avoided.
In the speech quality assessment method provided by this embodiment, an initial model is established based on an extreme gradient boosting algorithm; acquiring training data, wherein the training data comprises voice quality evaluation results of all test terminals and characteristic data corresponding to the voice quality evaluation results of all the test terminals; and training the initial model based on the training data until a voice quality evaluation model is obtained. In the initial model training process based on the extreme gradient lifting algorithm, the initial model is trained by using the training data, a voice quality evaluation model capable of replacing actual drive test is obtained, the voice quality evaluation model is used for obtaining a voice quality evaluation result, and the cost of obtaining the voice quality evaluation result in the actual drive test is saved.
Fig. 4 is a schematic structural diagram of a speech quality assessment apparatus according to a third embodiment of the present application, and as shown in fig. 4, the apparatus includes:
the first obtaining module 41 is used for obtaining the ticket data of the terminal to be evaluated in the time period to be evaluated;
the first obtaining module 41 is further configured to obtain feature data of the terminal to be evaluated from the ticket data of the terminal to be evaluated; wherein the characteristic data comprises real-time transmission protocol communication parameters and time delay parameters;
the processing module 42 is configured to input the feature data of the terminal to be evaluated to a voice quality evaluation model, and obtain a voice quality evaluation result of the terminal to be evaluated in a time period to be evaluated, which is output by the voice quality evaluation model; the voice quality evaluation model is a model which is established based on an extreme gradient lifting algorithm and is trained.
The call ticket data may be External Data Representation (XDR) call ticket data. In practical applications, the XDR call ticket acquisition system can flexibly generate XDR call ticket data for the time period to be evaluated as needed.
In practical application, the characteristic data is part of data related to the voice quality evaluation method in the call ticket data. And the processing module 42 inputs the characteristic data of the terminal to be evaluated into the voice quality evaluation model, and obtains a voice quality evaluation result of the terminal to be evaluated in the time period to be evaluated, which is output by the voice quality evaluation model.
In the voice quality evaluation device provided by this embodiment, call ticket data of a terminal to be evaluated in a time period to be evaluated is acquired; obtaining feature data of the terminal to be evaluated from the ticket data of the terminal to be evaluated; the characteristic data comprises real-time transmission protocol communication parameters and time delay parameters; inputting the characteristic data of the terminal to be evaluated into a voice quality evaluation model, and obtaining a voice quality evaluation result of the terminal to be evaluated in a time period to be evaluated, which is output by the voice quality evaluation model; the voice quality evaluation model is a model which is established based on an extreme gradient lifting algorithm and is trained. In the embodiment, the voice quality evaluation result of the terminal to be evaluated in the time period to be evaluated, which is output by the voice quality evaluation model, is obtained by inputting the feature data of the terminal to be evaluated in the time period to be evaluated into the voice quality evaluation model, so that the voice quality evaluation result is obtained through the voice quality evaluation model, the cost of obtaining the voice quality evaluation result through the traditional drive test is saved, and effective voice quality monitoring and analysis are realized.
Fig. 5 is a schematic structural diagram of a speech quality assessment apparatus according to a fourth embodiment of the present application, and as shown in fig. 5, the apparatus further includes:
a model establishing module 51, configured to establish an initial model based on an extreme gradient lifting algorithm;
a second obtaining module 52, configured to obtain training data, where the training data includes a voice quality evaluation result of each test terminal and feature data corresponding to the voice quality evaluation result of each test terminal;
and a training module 53, configured to train the initial model based on the training data until the speech quality assessment model is obtained.
Wherein the voice quality assessment result is obtained through traditional drive tests. In practical applications, the speech quality assessment result may be a mean opinion value. The MOS value range is 0-5, and the larger the MOS value is, the better the voice quality is.
In practical application, there are many feature data affecting the voice quality, and obviously, the voice quality evaluation result can be obtained more accurately by comprehensively considering a plurality of feature data together. And the extreme gradient lifting algorithm adopts an iterative serial form to obtain a predicted value of a voice quality evaluation result. And constructing a new decision tree by each iteration of the extreme gradient lifting algorithm, wherein each decision tree represents a characteristic data type, a new predicted value is obtained by each iteration, and the new predicted value is equal to the sum of the predicted value of the current decision tree and the previous predicted value.
In this embodiment, an initial model is established based on the extreme gradient lifting algorithm. Training data comprising the voice quality evaluation results of the test terminals and the feature data corresponding to those results is obtained, and the initial model is trained on the training data to obtain the voice quality evaluation model. The voice quality evaluation model is then used to produce the voice quality evaluation result, which saves the cost of obtaining that result through actual drive tests.
In addition, in order to establish training data in consideration of different voice quality evaluation results of different test terminals, in one embodiment, the second obtaining module 52 includes: the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring training drive test data of each test terminal at each moment, and the training drive test data comprises an identifier of the test terminal and a voice quality evaluation result of the test terminal; the acquisition unit is further used for acquiring training call ticket data of each test terminal in each time period, wherein the training call ticket data comprises an identifier of the test terminal and feature data of the test terminal; and the association unit is used for establishing association between the voice quality evaluation result of the test terminal and the characteristic data of the test terminal based on the identification of the test terminal aiming at each test terminal so as to obtain the training data.
The identification of the test terminal is the user identification of the test terminal user. In practical application, training drive test data of a plurality of test terminals and training call ticket data of the plurality of test terminals may be obtained, the training drive test data and the training call ticket data both include identifiers of the test terminals, and a voice quality evaluation result and feature data of the same test terminal are associated according to the identifiers of the test terminals.
In the embodiment, according to the obtained test terminal identification in the training drive test data and the training call ticket data, the voice quality evaluation result and the feature data corresponding to the same test terminal are associated. Therefore, the characteristic data corresponding to the test terminal is input into the initial model, a first voice quality evaluation result corresponding to the same test terminal can be obtained, the voice quality evaluation result and the first voice quality evaluation result of the same test terminal are compared, and a more reliable comparison effect can be obtained.
Furthermore, in order to handle possible abnormal data in the training data, in one embodiment, the apparatus further comprises: a preprocessing module configured to preprocess the training data, the preprocessing including at least one of: missing value filling, outlier handling, normalization, one-hot encoding, and feature fusion and splitting.
In practical applications, in the process of training the initial model, the training data is preprocessed after it is obtained in order to achieve a better training effect. The preprocessing methods described above may be applied alone or in combination. In this embodiment, preprocessing the training data yields reliable training data and saves the cost that abnormal data would otherwise cause in subsequent training.
In addition, in consideration of a situation that training ticket data in a certain period may correspond to training drive test data at multiple times, in an embodiment, the association unit is specifically configured to: aiming at each test terminal, acquiring training drive test data of the test terminal at each moment and training call ticket data of the test terminal at each time period based on the identification of the test terminal; determining training call ticket data corresponding to each piece of training drive test data of the test terminal according to the corresponding time of each piece of training drive test data of the test terminal and the corresponding time period of each piece of training call ticket data of the test terminal; the time corresponding to the training drive test data is positioned in the time period corresponding to the training ticket data corresponding to the training drive test data; aiming at each training call ticket data of the test terminal, if one training drive test data corresponding to the training call ticket data is available, establishing association between a voice quality evaluation result in the training drive test data and feature data in the training call ticket data; and if the training drive test data corresponding to the training call ticket data are multiple, taking the average value of the voice quality evaluation results in the multiple training drive test data as a final voice quality evaluation result, and establishing the association between the final voice quality evaluation result and the feature data in the training call ticket data.
In practical applications, the voice quality evaluation results are collected at fixed intervals. For the same test terminal, the call ticket data is collected by a deep packet inspection system and, while being transmitted to the call ticket server through a unified collection and instruction adaptation platform, is divided into training call ticket data of fixed duration. It should be noted that the duration of each training data period is consistent with the duration of the period to be evaluated, and this duration may be specified as needed.
In this embodiment, a one-to-one correspondence is established, for the same test terminal, between the feature data and the voice quality evaluation result of a given time period. This makes it convenient to compare the first voice quality evaluation result output by the model for the feature data of that time period with the corresponding voice quality evaluation result, improving model training efficiency.
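The association described above can be sketched as follows; the field names (terminal_id, timestamp, period_start, period_end, mos) are illustrative assumptions rather than terms defined in this application.

```python
import pandas as pd

def associate(drive_tests: pd.DataFrame, tickets: pd.DataFrame) -> pd.DataFrame:
    """Associate drive test evaluation results with call ticket feature data.

    drive_tests: columns [terminal_id, timestamp, mos]
    tickets:     columns [terminal_id, period_start, period_end, <features>]
    All field names are assumptions made for this sketch.
    """
    rows = []
    for _, ticket in tickets.iterrows():
        # Drive test records of the same terminal whose timestamps fall
        # within this ticket's time period.
        mask = (
            (drive_tests["terminal_id"] == ticket["terminal_id"])
            & (drive_tests["timestamp"] >= ticket["period_start"])
            & (drive_tests["timestamp"] < ticket["period_end"])
        )
        matched = drive_tests[mask]
        if matched.empty:
            continue  # no evaluation result for this period
        # One matching record: use its result; several records: use their mean.
        row = ticket.to_dict()
        row["mos"] = matched["mos"].mean()
        rows.append(row)
    return pd.DataFrame(rows)
```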
In addition, in order to determine whether the model has finished training, in one embodiment the training module 53 is specifically configured to: divide the training data to obtain a training set and a test set; train the initial model based on the training data in the training set to obtain a current model; input the feature data in the test set into the current model to obtain a first voice quality evaluation result output by the model; and calculate a correlation coefficient between the first voice quality evaluation result and the corresponding voice quality evaluation result in the test set, and determine that training is finished when the correlation coefficient meets a preset requirement.
In practical applications, the value of the correlation coefficient is the basis for determining whether the current model has finished training, and the preset requirement on the correlation coefficient can be set in advance. The larger the correlation coefficient, the higher the similarity between the first voice quality evaluation result and the corresponding voice quality evaluation result in the test set.
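This application does not name a particular correlation coefficient; assuming the commonly used Pearson correlation between the first voice quality evaluation results $\hat{y}_i$ and the reference results $y_i$ over the $n$ test-set samples, it would be

$$ r = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(\hat{y}_i - \bar{\hat{y}})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}, $$

which approaches 1 as the two sets of results become more similar.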
In this embodiment, whether the current model has finished training is determined by calculating the correlation coefficient between the first voice quality evaluation result and the corresponding voice quality evaluation result in the test set, so that a trained voice quality evaluation model can be obtained in time and the cost of repeated rounds of training is avoided.
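A minimal training sketch of this procedure, assuming XGBoost's scikit-learn interface and the Pearson correlation as the stopping criterion, is given below; the hyperparameters, the 0.85 threshold, and the retry loop are placeholders, since this application only states that the correlation coefficient must meet a preset requirement.

```python
import xgboost as xgb
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split

def train_until_correlated(X, y, threshold=0.85, max_rounds=5):
    """Train an initial XGBoost model and check the test-set correlation.

    threshold and max_rounds are illustrative placeholders only.
    """
    # Divide the training data into a training set and a test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    n_estimators = 100
    for _ in range(max_rounds):
        model = xgb.XGBRegressor(
            n_estimators=n_estimators, max_depth=6, learning_rate=0.1
        )
        model.fit(X_train, y_train)
        # First voice quality evaluation result output for the test set.
        y_pred = model.predict(X_test)
        r, _ = pearsonr(y_pred, y_test)
        if r >= threshold:      # preset requirement met: training finished
            return model
        n_estimators += 100     # otherwise train a larger model and re-check
    return model
```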
In the voice quality evaluation apparatus provided in this embodiment, an initial model is established based on the extreme gradient boosting algorithm; training data is acquired, the training data including the voice quality evaluation results of the test terminals and the feature data corresponding to those results; and the initial model is trained on the training data until the voice quality evaluation model is obtained. By training the initial model, built on the extreme gradient boosting algorithm, with this training data, a voice quality evaluation model that can replace actual drive tests is obtained; using the voice quality evaluation model to obtain voice quality evaluation results saves the cost of obtaining those results through actual drive tests.
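On the same assumptions, using the resulting model in place of a drive test then reduces to scoring the preprocessed ticket features of the terminal to be evaluated, as in the short sketch below.

```python
def evaluate_terminal(model, ticket_features):
    """Return the voice quality evaluation result for a terminal's
    preprocessed call ticket features over the period to be evaluated.

    model is assumed to be a trained regressor such as the one returned
    by train_until_correlated above.
    """
    return model.predict(ticket_features)
```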
Fig. 6 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present application, and as shown in fig. 6, the electronic device includes:
a processor (processor) 61; the electronic device further comprises a memory (memory) 62, and may also include a communication interface 63 and a bus 64. The processor 61, the memory 62, and the communication interface 63 may communicate with each other through the bus 64. The communication interface 63 may be used for information transfer. The processor 61 may call logic instructions in the memory 62 to perform the method of the above-described embodiments.
Furthermore, the logic instructions in the memory 62 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium.
The memory 62, as a computer-readable storage medium, is used to store software programs and computer-executable programs, such as the program instructions/modules corresponding to the methods in the embodiments of the present application. By running the software programs, instructions and modules stored in the memory 62, the processor 61 executes functional applications and performs data processing, that is, implements the method in the above method embodiments.
The memory 62 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal device, and the like. Further, the memory 62 may include high-speed random access memory and may also include non-volatile memory.
The present application provides a non-transitory computer-readable storage medium, in which computer-executable instructions are stored, and when executed by a processor, the computer-executable instructions are used to implement the method according to the foregoing embodiments.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program which, when executed by a processor, carries out the above-mentioned method.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (14)

1. A voice quality evaluation method, comprising:
acquiring call ticket data of a terminal to be evaluated in a time period to be evaluated;
obtaining feature data of the terminal to be evaluated from the call ticket data of the terminal to be evaluated; wherein the feature data comprises Real-time Transport Protocol (RTP) communication parameters and delay parameters;
inputting the feature data of the terminal to be evaluated into a voice quality evaluation model, and obtaining a voice quality evaluation result of the terminal to be evaluated in the time period to be evaluated output by the voice quality evaluation model; wherein the voice quality evaluation model is a trained model established based on an extreme gradient boosting algorithm.
2. The method of claim 1, further comprising:
establishing an initial model based on an extreme gradient boosting algorithm;
acquiring training data, wherein the training data comprises voice quality evaluation results of all test terminals and feature data corresponding to the voice quality evaluation results of all the test terminals;
and training the initial model based on the training data until the voice quality evaluation model is obtained.
3. The method of claim 2, wherein the obtaining training data comprises:
acquiring training drive test data of each test terminal at each moment, wherein the training drive test data comprises an identifier of the test terminal and a voice quality evaluation result of the test terminal;
acquiring training call ticket data of each test terminal in each period, wherein the training call ticket data comprises an identifier of the test terminal and characteristic data of the test terminal;
and for each test terminal, establishing an association between the voice quality evaluation result of the test terminal and the feature data of the test terminal based on the identifier of the test terminal, so as to obtain the training data.
4. The method according to claim 3, wherein after the establishing, for each test terminal, an association between the voice quality evaluation result of the test terminal and the feature data of the test terminal based on the identifier of the test terminal to obtain the training data, the method further comprises:
preprocessing the training data, the preprocessing comprising at least one of: missing value filling, abnormal value processing, normalization processing, one-hot encoding processing, and feature fusion and splitting processing.
5. The method according to claim 3, wherein the establishing, for each test terminal, an association between the voice quality evaluation result of the test terminal and the feature data of the test terminal based on the identifier of the test terminal to obtain the training data comprises:
for each test terminal, acquiring, based on the identifier of the test terminal, the training drive test data of the test terminal at each moment and the training call ticket data of the test terminal in each time period;
determining the training call ticket data corresponding to each piece of training drive test data of the test terminal according to the moment corresponding to each piece of training drive test data of the test terminal and the time period corresponding to each piece of training call ticket data of the test terminal; wherein the moment corresponding to a piece of training drive test data falls within the time period corresponding to the training call ticket data corresponding to that piece of training drive test data;
for each piece of training call ticket data of the test terminal, if there is one piece of training drive test data corresponding to the training call ticket data, establishing an association between the voice quality evaluation result in that training drive test data and the feature data in the training call ticket data; and if there are multiple pieces of training drive test data corresponding to the training call ticket data, taking the average of the voice quality evaluation results in the multiple pieces of training drive test data as a final voice quality evaluation result, and establishing an association between the final voice quality evaluation result and the feature data in the training call ticket data.
6. The method according to any one of claims 2-5, wherein the training the initial model until the speech quality assessment model is obtained based on the training data comprises:
dividing the training data to obtain a training set and a test set;
training the initial model based on the training data in the training set to obtain a current model;
inputting the feature data in the test set into the current model to obtain a first voice quality evaluation result output by the model;
and calculating a correlation coefficient between the first voice quality evaluation result and the corresponding voice quality evaluation result in the test set, and determining that training is finished when the correlation coefficient meets a preset requirement.
7. A voice quality evaluation apparatus, comprising:
a first obtaining module, configured to obtain the call ticket data of a terminal to be evaluated in a time period to be evaluated;
the first obtaining module being further configured to obtain feature data of the terminal to be evaluated from the call ticket data of the terminal to be evaluated, wherein the feature data comprises Real-time Transport Protocol (RTP) communication parameters and delay parameters;
and a processing module, configured to input the feature data of the terminal to be evaluated into a voice quality evaluation model and obtain a voice quality evaluation result of the terminal to be evaluated in the time period to be evaluated output by the voice quality evaluation model; wherein the voice quality evaluation model is a trained model established based on an extreme gradient boosting algorithm.
8. The apparatus of claim 7, further comprising:
a model establishing module, configured to establish an initial model based on an extreme gradient boosting algorithm;
a second obtaining module, configured to obtain training data, wherein the training data comprises voice quality evaluation results of the test terminals and feature data corresponding to the voice quality evaluation results of the test terminals;
and a training module, configured to train the initial model based on the training data until the voice quality evaluation model is obtained.
9. The apparatus of claim 8, wherein the second obtaining module comprises:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring training drive test data of each test terminal at each moment, and the training drive test data comprises an identifier of the test terminal and a voice quality evaluation result of the test terminal;
the acquisition unit is further used for acquiring training call ticket data of each test terminal in each time period, wherein the training call ticket data comprises an identifier of the test terminal and feature data of the test terminal;
and the association unit is used for establishing association between the voice quality evaluation result of the test terminal and the characteristic data of the test terminal based on the identification of the test terminal aiming at each test terminal so as to obtain the training data.
10. The apparatus of claim 9, further comprising:
a preprocessing module, configured to preprocess the training data, the preprocessing including at least one of: missing value filling, abnormal value processing, normalization processing, one-hot encoding processing, and feature fusion and splitting processing.
11. The apparatus according to claim 9, wherein the association unit is specifically configured to:
for each test terminal, acquiring, based on the identifier of the test terminal, the training drive test data of the test terminal at each moment and the training call ticket data of the test terminal in each time period;
determining the training call ticket data corresponding to each piece of training drive test data of the test terminal according to the moment corresponding to each piece of training drive test data of the test terminal and the time period corresponding to each piece of training call ticket data of the test terminal; wherein the moment corresponding to a piece of training drive test data falls within the time period corresponding to the training call ticket data corresponding to that piece of training drive test data;
for each piece of training call ticket data of the test terminal, if there is one piece of training drive test data corresponding to the training call ticket data, establishing an association between the voice quality evaluation result in that training drive test data and the feature data in the training call ticket data; and if there are multiple pieces of training drive test data corresponding to the training call ticket data, taking the average of the voice quality evaluation results in the multiple pieces of training drive test data as a final voice quality evaluation result, and establishing an association between the final voice quality evaluation result and the feature data in the training call ticket data.
12. The apparatus according to any one of claims 8-11, wherein the training module is specifically configured to:
dividing the training data to obtain a training set and a test set;
training the initial model based on the training data in the training set to obtain a current model;
inputting the feature data in the test set into the current model to obtain a first voice quality evaluation result output by the model;
and calculating a correlation coefficient between the first voice quality evaluation result and the corresponding voice quality evaluation result in the test set, and determining that training is finished when the correlation coefficient meets a preset requirement.
13. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1-6.
14. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the method of any one of claims 1-6.
CN202210787995.5A 2022-07-06 2022-07-06 Voice quality evaluation method and device, electronic equipment and storage medium Pending CN115175233A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210787995.5A CN115175233A (en) 2022-07-06 2022-07-06 Voice quality evaluation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210787995.5A CN115175233A (en) 2022-07-06 2022-07-06 Voice quality evaluation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115175233A true CN115175233A (en) 2022-10-11

Family

ID=83490271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210787995.5A Pending CN115175233A (en) 2022-07-06 2022-07-06 Voice quality evaluation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115175233A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082704A1 (en) * 2015-11-30 2018-03-22 Huawei Technologies Co., Ltd. Voice Quality Evaluation Method, Apparatus, and Device
US20180006957A1 (en) * 2016-06-30 2018-01-04 Verizon Patent And Licensing Inc. METHODS AND SYSTEMS FOR EVALUATING VOICE OVER WI-FI (VoWiFi) CALL QUALITY
US20190180771A1 (en) * 2016-10-12 2019-06-13 Iflytek Co., Ltd. Method, Device, and Storage Medium for Evaluating Speech Quality
CN110401622A (en) * 2018-04-25 2019-11-01 中国移动通信有限公司研究院 A kind of speech quality assessment method, device, electronic equipment and storage medium
CN111314691A (en) * 2018-12-11 2020-06-19 中国移动通信集团广东有限公司 Video call quality assessment method and device
CN112562724A (en) * 2020-11-30 2021-03-26 携程计算机技术(上海)有限公司 Speech quality evaluation model, training evaluation method, system, device, and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dhany Arifianto: "Subjective evaluation of voice quality over GSM network for quality of experience (QoE) measurement", 2015 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), 14 March 2016 (2016-03-14) *
Qin Mengmeng: "Research on Speech Quality Assessment Methods Based on Deep Learning", China Master's Theses Full-text Database, 15 January 2022 (2022-01-15) *

Similar Documents

Publication Publication Date Title
CN110719605B (en) Network speed detection system based on 5G technology
US20210027173A1 (en) Indicator determining method and related device
CN107171831B (en) Network deployment method and device
CN108495364B (en) Mobile terminal positioning method and device
CN114118748B (en) Service quality prediction method and device, electronic equipment and storage medium
CN109994128B (en) Voice quality problem positioning method, device, equipment and medium
CN113660687A (en) Network difference cell processing method, device, equipment and storage medium
CN105591842B (en) A kind of method and apparatus obtaining mobile terminal operating system version
CN115175233A (en) Voice quality evaluation method and device, electronic equipment and storage medium
CN108521435B (en) Method and system for user network behavior portrayal
CN110992936A (en) Method and apparatus for model training using private data
CN108024222B (en) Traffic ticket generating method and device
TWI580288B (en) Action online quality analysis system and method
CN108234228B (en) Method, device and system for acquiring network energy efficiency
CN111212376B (en) Method, apparatus, device and medium for correlating real-time location and voice quality results
CN111314489B (en) Method, server and device for identifying type of access network
CN115866235A (en) Video quality evaluation method and device, electronic equipment and storage medium
CN111711946A (en) IoT (Internet of things) equipment identification method and identification system under encrypted wireless network
CN111372073A (en) Video quality evaluation method, device, equipment and medium
Chen et al. Meta-NWDAF: A Meta-Learning based Network Data Analytic Function for Internet Traffic Prediction
CN116192997B (en) Event detection method and system based on network flow
CN111314926B (en) Coverage relation determination method and device and computer readable storage medium
CN113555037B (en) Method and device for detecting tampered area of tampered audio and storage medium
CN114679393B (en) Satellite internet bandwidth control method, system and device based on flow analysis
CN113839794B (en) Data processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination