CN114358089A - Training method and device of speech evaluation model based on electroencephalogram and electronic equipment - Google Patents

Training method and device of speech evaluation model based on electroencephalogram and electronic equipment

Info

Publication number
CN114358089A
Authority
CN
China
Prior art keywords
electroencephalogram
corpus
degraded
response test
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210081634.9A
Other languages
Chinese (zh)
Inventor
宋奇蔚
刘鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yun Lan Technology Co ltd
Original Assignee
Beijing Yun Lan Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yun Lan Technology Co ltd filed Critical Beijing Yun Lan Technology Co ltd
Priority to CN202210081634.9A priority Critical patent/CN114358089A/en
Publication of CN114358089A publication Critical patent/CN114358089A/en
Pending legal-status Critical Current

Landscapes

  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

The application provides a training method and device for an electroencephalogram-based speech evaluation model, and an electronic device, and relates to the technical field of speech evaluation. The method comprises: acquiring an original corpus sample set, wherein the original corpus sample set comprises a plurality of degraded corpus samples; for each degraded corpus sample, acquiring the electroencephalogram response test results and manual scores of a plurality of testers for the degraded corpus sample, and processing the electroencephalogram response test results to obtain quantifiable electroencephalogram response test results for the degraded corpus sample; and training the speech evaluation model based on each degraded corpus sample, its quantifiable electroencephalogram response test results, and the corresponding voice quality score label. The training method enables the speech evaluation model to incorporate electroencephalogram weights and improves the evaluation accuracy of the speech evaluation model.

Description

Training method and device of speech evaluation model based on electroencephalogram and electronic equipment
Technical Field
The application relates to the technical field of speech evaluation, in particular to a training method and device of a speech evaluation model based on electroencephalogram and electronic equipment.
Background
Voice is the most basic service requirement of communication network services, and as communication network technologies develop, users place higher demands on voice quality. To provide better service, technologies such as VoLTE (voice transmitted over a 4G network) and VoNR have been developed in succession. The need to understand how users experience voice quality, and to optimize communication network services in a more targeted way based on that experience, has given rise to and promoted the rapid development of Quality of Experience (QoE) concepts and technologies.
Quality of experience refers to the user's subjective experience of the service and of network service quality; it is the overall psychological impression that the user forms while using the service, and it involves every aspect of the interaction among people, networks, and services. Quality of experience reflects the relationship between current service and network quality and the user experience, integrates the influencing factors at the service, user, and network levels, and directly reflects the user's degree of approval of the network service.
Quality evaluation methods can be divided, according to the basis of evaluation, into subjective evaluation and objective evaluation. Subjective evaluation relies on user scoring and can be affected by individual bias, so massive amounts of data must be collected for model training, which consumes considerable resources; it is also difficult to obtain highly consistent evaluation results, which hinders subsequent problem analysis and network optimization based on those results. Objective evaluation methods are therefore commonly adopted in the field of voice quality evaluation at present, mainly the PESQ (Perceptual Evaluation of Speech Quality) and POLQA (Perceptual Objective Listening Quality Analysis) systems. These systems have been developed for more than 20 years: they mainly use digital signal processing to decompose the voice signal, establish a subjective scale through listening experiments, and finally use a conversion formula as the objective algorithm model to obtain the voice quality evaluation result. Although the digital signal processing methods and objective algorithm models have been improved for new network technologies, problems of adaptability to emerging services remain; the subjective scale derived from listening experiments has only a single kind of parameter and does not reflect the comprehensive response of the human body, so these systems have difficulty accurately evaluating emerging services in fields such as audio/video, VR, and games.
Disclosure of Invention
In view of this, the present application provides a method and a device for training an electroencephalogram-based speech evaluation model, and an electronic device, so as to solve the technical problem that speech evaluation models in the prior art do not reflect the comprehensive human response to speech.
In one aspect, an embodiment of the present application provides a method for training a speech evaluation model based on electroencephalogram, including:
acquiring an original corpus sample set, wherein the original corpus sample set comprises a plurality of degraded corpus samples;
for each degraded corpus sample, acquiring the electroencephalogram response test results and manual scores of a plurality of testers for the degraded corpus sample, and processing the electroencephalogram response test results to obtain quantifiable electroencephalogram response test results for the degraded corpus sample;
and training the speech evaluation model based on each degraded corpus sample, its quantifiable electroencephalogram response test results, and the corresponding voice quality score label.
Further, when the testers are experts, acquiring the electroencephalogram response test results and manual scores of the plurality of testers for the degraded corpus sample, and processing the electroencephalogram response test results to obtain quantifiable electroencephalogram response test results for the degraded corpus sample, comprises:
acquiring electroencephalogram signals and manual scores of each expert on the degraded corpus samples;
denoising and cleaning each electroencephalogram signal of the degraded corpus sample;
and extracting corresponding characteristic values from each de-noised and cleaned electroencephalogram signal to serve as quantifiable electroencephalogram response test experiment results of the degraded corpus samples.
Further, when the testers are ordinary testers, acquiring the electroencephalogram response test results and manual scores of the plurality of testers for the degraded corpus sample, and processing the electroencephalogram response test results to obtain quantifiable electroencephalogram response test results for the degraded corpus sample, comprises:
acquiring electroencephalogram signals and manual scores of the common testers on the degraded corpus samples;
denoising and cleaning each electroencephalogram signal of the degraded corpus sample;
extracting corresponding characteristic values from each de-noised and cleaned electroencephalogram signal;
clustering all electroencephalogram signal feature values of the degraded corpus samples to obtain a classifiable electroencephalogram signal feature group;
and taking all electroencephalogram signal characteristic values in the classifiable electroencephalogram signal characteristic group as quantifiable electroencephalogram response test experiment results of the degraded corpus samples.
Further, the voice quality scoring label is a corrected manual score or a corrected initial voice quality score; the initial speech quality score is obtained by a scoring tool.
Further, correcting the initial speech quality score includes:
establishing an electroencephalogram subjective scale according to the plurality of electroencephalogram feature values of the degraded corpus samples and the corresponding manual scores;
and screening and correcting abnormal scores in the initial voice quality scores of the degraded corpus samples based on the electroencephalogram subjective scale to obtain the voice quality score labels of each degraded corpus sample.
Further, training the speech evaluation model based on each degraded corpus sample, a quantifiable electroencephalogram response test experiment result and a corresponding speech quality scoring label, comprising:
generating a time domain oscillogram of each degraded corpus sample;
generating a spectrogram of each degraded corpus sample by adopting short-time Fourier transform;
performing inference and mapping on the electroencephalogram feature values, the time-domain waveform, and the spectrogram using a convolutional neural network to obtain a predicted voice quality score;
and calculating a loss function based on the predicted voice quality score and the voice quality score label, and updating the parameters of the convolutional neural network by using the loss function.
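The training step above can be written out as a short worked formula. The text does not name the loss function or the optimizer, so the mean squared error and the plain gradient-descent update below are assumptions chosen for illustration. Here e_i denotes the electroencephalogram feature values, w_i the time-domain waveform, and S_i the spectrogram of the i-th degraded corpus sample, s_i is its voice quality score label, f_θ is the convolutional neural network with parameters θ, and η is a hypothetical learning rate.

```latex
\hat{s}_i = f_\theta\left(e_i, w_i, S_i\right), \qquad
\mathcal{L}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{s}_i - s_i\right)^2, \qquad
\theta \leftarrow \theta - \eta\,\nabla_\theta \mathcal{L}(\theta)
```

Under this reading, the label s_i is the only route by which the electroencephalogram-corrected scores reach the network when the evaluation-time input contains only the waveform and the spectrogram.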
Further, the method further comprises: performing voice quality evaluation on the degraded linguistic data to be evaluated by utilizing the trained voice evaluation model; the method specifically comprises the following steps:
generating a time domain oscillogram of the degraded corpus to be evaluated;
generating a spectrogram of the degraded corpus to be evaluated by adopting short-time Fourier transform;
performing inference and mapping on the time-domain waveform and the spectrogram using the convolutional neural network to obtain the voice quality score of the degraded corpus to be evaluated.
In another aspect, an embodiment of the present application provides a device for training an electroencephalogram-based speech evaluation model, comprising:
an acquisition unit, configured to acquire an original corpus sample set, wherein the original corpus sample set comprises a plurality of degraded corpus samples;
an electroencephalogram testing unit, configured to, for each degraded corpus sample, acquire the electroencephalogram response test results and manual scores of a plurality of testers for the degraded corpus sample, and process the electroencephalogram response test results to obtain quantifiable electroencephalogram response test results for the degraded corpus sample;
and a training unit, configured to train the speech evaluation model based on each degraded corpus sample, its quantifiable electroencephalogram response test results, and the corresponding voice quality score label.
In another aspect, an embodiment of the present application provides an electronic device, including: the device comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the computer program to realize the electroencephalogram-based speech evaluation model training method of the embodiment of the application.
In another aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements the electroencephalogram-based speech evaluation model training method according to the embodiment of the present application.
According to the embodiment of the application, an original corpus sample set comprising a plurality of degraded corpus samples is acquired; for each degraded corpus sample, the electroencephalogram response test results and manual scores of a plurality of testers are acquired, and the electroencephalogram response test results are processed to obtain quantifiable electroencephalogram response test results for the degraded corpus sample; and the speech evaluation model is trained based on each degraded corpus sample, its quantifiable electroencephalogram response test results, and the corresponding voice quality score label. In this way the speech evaluation model incorporates electroencephalogram weights, and the accuracy of its evaluations is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a structural diagram of a speech quality assessment system provided in an embodiment of the present application;
Fig. 2 is a flowchart of a method for training an electroencephalogram-based speech evaluation model according to an embodiment of the present application;
Fig. 3 is a schematic diagram of audio corpus data converted into a spectrogram by short-time Fourier transform according to an embodiment of the present application;
Fig. 4 is a functional block diagram of a training device for an electroencephalogram-based speech evaluation model according to an embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
First, the design idea of the embodiment of the present application is briefly introduced.
At present, objective evaluation methods are commonly adopted in the field of voice quality evaluation, mainly the PESQ (Perceptual Evaluation of Speech Quality) and POLQA (Perceptual Objective Listening Quality Analysis) systems. These systems have been developed for more than 20 years: they mainly use digital signal processing to decompose the voice signal, establish a subjective scale through listening experiments, and finally use a conversion formula as the objective scale algorithm model to obtain the voice quality evaluation result. These methods do not capture the comprehensive response of the human body, so the voice quality evaluation result lacks people's subjective reaction.
To solve the above technical problem, the present application first constructs a layered speech quality evaluation system. As shown in Fig. 1, the system comprises a perception layer, a feature layer, and an artificial intelligence algorithm model. The perception layer is the bottom layer and comprises one or more of corpora, network, and QoS parameters. The corpora cover language, grammar and speech rate, frequency difference, and environment scene; the network covers network performance, network coding, and audio coding; the QoS parameters cover frequency distortion, scene noise, and jitter and frame loss. Preferably, the bottom layer of the embodiment of the present application comprises corpora. The feature layer sits above the perception layer and extracts abstract feature parameters from the bottom layer; the feature indexes include corpus features, human sensory parameters, and EEG. The artificial intelligence algorithm model sits above the feature layer and comprises a cognitive reasoning model and score mapping; the extracted features pass through the inference and mapping of the artificial intelligence algorithm model to yield the voice quality evaluation result. The speech quality evaluation system uses deep learning to achieve unsupervised efficient feature learning and layered feature extraction.
The application provides an electroencephalogram-based speech quality evaluation system, which expresses the subjective scale of speech quality through the electrical response of the human brain to speech, and builds an objective scale algorithm model with an artificial intelligence algorithm to evaluate speech quality. Because the brain is the response and control center of the human body, it reflects the body's intrinsic reaction to external stimuli more fully, which improves the accuracy of the model. Because of the structure and working principle of the human brain, brain responses tend to be consistent, which greatly reduces the data acquisition requirement. In addition, as a core integrative system of the human body, the electroencephalogram responds to a wide range of stimuli and can readily adapt to the development of new services and to the expansion to different service types.
In addition, whereas the prior art needs a large amount of data to train a speech quality evaluation model, the training method of the present application does not: a good model can be trained with 500 samples, which improves training efficiency.
After introducing the design concept of the embodiments of the present application, the following describes the technical solutions provided by the embodiments of the present application.
As shown in fig. 2, an embodiment of the present application provides a method for training a speech evaluation model based on electroencephalogram, including:
step 101: acquiring an original corpus sample set, wherein the original corpus sample set comprises a plurality of degraded corpus samples;
The degraded corpus is the audio obtained after an original corpus has been transmitted through a voice transmission device. For example, the speech of an ordinary call becomes a degraded corpus after passing through the device, the environment, communication distortion, and so on. The original corpus serves as the reference corpus. A degraded corpus is not necessarily of lower quality than its original: for example, when the original corpus is of very poor quality, processing by the equipment may improve the quality of the degraded corpus.
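As an illustration of how a reference corpus and a degraded corpus are related, the following minimal sketch simulates a transmission path in software. It is not part of the described method, which collects degraded corpora from real devices; the function name `degrade`, the frame length, the loss rate, and the noise level are all hypothetical values chosen for the example.

```python
import random

def degrade(samples, frame_len=160, loss_rate=0.1, noise_std=0.01, seed=0):
    """Return a degraded copy of `samples` (a list of floats).

    Assumed impairments: whole frames are lost with probability `loss_rate`
    and replaced by silence; all other frames receive additive Gaussian noise.
    """
    rng = random.Random(seed)  # seeded so the simulation is repeatable
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        if rng.random() < loss_rate:
            # Simulated frame loss: the receiver plays silence.
            out.extend([0.0] * len(frame))
        else:
            # Simulated channel noise added to every sample of the frame.
            out.extend(x + rng.gauss(0.0, noise_std) for x in frame)
    return out
```

The output has the same length as the input, so a reference sample and its degraded counterpart stay aligned for later comparison.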
The terminal that collects the original corpus is one or more of hardware, an SDK, and an APP, where the hardware comprises one or more of a mobile phone, a fixed-line telephone, a recorder, and a loudspeaker. The collected original corpus is transmitted over a network to a data processing device, which may be an APP, hardware, or a cloud service; the data processing device holds all database storage, parameters, and predictors, and data processing and function control are performed through a user interface (UI).
Step 102: for each degraded corpus sample, acquiring the electroencephalogram response test results and manual scores of a plurality of testers for the degraded corpus sample, and processing the electroencephalogram response test results to obtain quantifiable electroencephalogram response test results for the degraded corpus sample;
In this embodiment, wireless sensors are used to carry out the electroencephalogram response test experiment on a plurality of degraded corpus samples; in general, 500 samples are sufficient to meet the training requirement.
Depending on whether an external stimulus is present, electroencephalogram signals can be divided into endogenous signals and evoked signals. An evoked potential, also called an evoked response, is the weak electrical change in the human brain caused by applying a stimulus to the subject. An event-related potential (ERP) is a special kind of evoked potential: it is the electrical signal generated by the cerebral cortex in response to a stimulus, belongs to the near-field potentials (on one view, event-related potentials and near-field potentials are equivalent), and reflects the neuro-electrophysiological changes of the brain during cognitive processing. Endogenous ERP signals include the P300 component, the N200 component, the P100 component, and the like. These ERP components are triggered only when the brain engages in higher-level cognitive processing, that is, the subject must complete a certain mental task, but no long-term repetitive training is required.
Depending on the type of tester (expert or ordinary tester), this step has two embodiments:
the first embodiment:
acquiring electroencephalogram signals and manual scores of each expert on the degraded corpus samples;
denoising and cleaning each electroencephalogram signal of the degraded corpus sample;
and extracting corresponding characteristic values from each de-noised and cleaned electroencephalogram signal to serve as quantifiable electroencephalogram response test experiment results of the degraded corpus samples.
A corresponding feature value is extracted from each denoised and cleaned electroencephalogram signal. The feature value may be a P300 component, an N200 component, or a P100 component, but is not limited to these types; the specific type can be chosen according to the application scenario. The feature values serve as the quantifiable electroencephalogram response test experiment results of the degraded corpus sample.
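The denoising and feature extraction of the first embodiment might be sketched as follows. The text does not specify the denoising method or how a component is quantified, so this sketch assumes moving-average smoothing, baseline correction against the pre-stimulus samples, averaging across epochs, and taking the largest positive peak in a 250 to 500 ms post-stimulus window as a P300-like feature. The function names and the window bounds are assumptions for illustration.

```python
def moving_average(signal, width=5):
    """Smooth a 1-D signal with a centred moving average (simple denoising)."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def baseline_correct(epoch, n_baseline):
    """Subtract the mean of the pre-stimulus samples from the whole epoch."""
    base = sum(epoch[:n_baseline]) / n_baseline
    return [x - base for x in epoch]

def erp_peak(epochs, fs, n_baseline, window_s=(0.25, 0.5)):
    """Average the cleaned epochs and return (peak amplitude, latency in s).

    Each epoch is a list of samples whose first `n_baseline` samples precede
    the stimulus; latency is measured from stimulus onset.
    """
    cleaned = [baseline_correct(moving_average(e), n_baseline) for e in epochs]
    n = len(cleaned[0])
    avg = [sum(e[i] for e in cleaned) / len(cleaned) for i in range(n)]
    lo = n_baseline + int(window_s[0] * fs)
    hi = min(n, n_baseline + int(window_s[1] * fs))
    peak_idx = max(range(lo, hi), key=lambda i: avg[i])
    return avg[peak_idx], (peak_idx - n_baseline) / fs
```

The returned amplitude and latency are one possible form of the "quantifiable" result; other components such as N200 would need a different window and a negative-going peak search.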
The second embodiment:
acquiring electroencephalogram signals and manual scores of the common testers on the degraded corpus samples;
denoising and cleaning each electroencephalogram signal of the degraded corpus sample;
extracting corresponding characteristic values from each de-noised and cleaned electroencephalogram signal;
clustering all electroencephalogram signal feature values of the degraded corpus samples to obtain a classifiable electroencephalogram signal feature group;
wherein unclassifiable abnormal data are deleted, and feature values at the edge of a cluster are corrected based on the minimum distance principle;
and taking all electroencephalogram signal characteristic values in the classifiable electroencephalogram signal characteristic group as quantifiable electroencephalogram response test experiment results of the degraded corpus samples.
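A minimal sketch of the clustering in the second embodiment is given below. The text does not name a clustering algorithm, so k-means with caller-supplied initial centroids is assumed. The "minimum distance principle" is interpreted here as assigning each edge point to its nearest cluster centre, and a hypothetical threshold `max_dist` decides which points count as unclassifiable.

```python
import math

def nearest(point, centroids):
    """Return (index, distance) of the centroid closest to `point`."""
    dists = [math.dist(point, c) for c in centroids]
    idx = min(range(len(centroids)), key=dists.__getitem__)
    return idx, dists[idx]

def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd iterations starting from the supplied centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)[0]].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

def classifiable_features(points, init_centroids, max_dist):
    """Group points by nearest centroid, dropping those farther than `max_dist`."""
    centroids = kmeans(points, init_centroids)
    groups = {i: [] for i in range(len(centroids))}
    for p in points:
        idx, d = nearest(p, centroids)
        if d <= max_dist:
            groups[idx].append(p)  # edge points go to the nearest cluster
        # Points beyond max_dist are treated as unclassifiable and discarded.
    return centroids, groups
```

Note that in this sketch outliers still influence the centroids during the k-means iterations and are discarded only afterwards; a stricter variant could re-run the clustering once they have been removed.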
Step 103: and training the voice evaluation model based on each degraded corpus, the quantifiable EEG response test experiment result and the corresponding voice quality scoring label.
For each degraded corpus, the voice quality score label can be obtained in two ways:
the first mode is as follows:
correcting the manual scores that each tester gave to the degraded corpus samples in step 102, and taking the corrected manual scores as the voice quality score labels;
wherein, the correction is mainly to eliminate abnormal values in manual scoring.
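One way to eliminate abnormal values from the manual scores of a single degraded corpus sample is sketched below. The text does not say how abnormal values are identified, so this sketch assumes a median absolute deviation (MAD) rule with a hypothetical cut-off `k`, and takes the mean of the remaining scores as the label.

```python
import statistics

def correct_manual_scores(scores, k=3.0):
    """Drop scores farther than k scaled MADs from the median.

    Returns the kept scores and their mean (the assumed score label).
    """
    med = statistics.median(scores)
    mad = statistics.median(abs(s - med) for s in scores)
    scale = 1.4826 * mad  # makes the MAD comparable to a standard deviation
    # When the scale is zero (most scores identical) nothing is removed.
    kept = [s for s in scores if scale == 0 or abs(s - med) <= k * scale]
    return kept, statistics.mean(kept)
```

For instance, with the scores 3.8, 4.0, 4.2, 4.1, 3.9 and 1.0, the score of 1.0 lies far outside the cut-off and is removed, leaving a label of 4.0.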
The second mode is as follows:
scoring each degraded corpus sample by adopting the existing scoring tool (PESQ or POLQA) to obtain an initial voice quality score;
establishing an electroencephalogram subjective scale according to the plurality of electroencephalogram feature values of the degraded corpus samples and the corresponding manual scores;
and screening and correcting abnormal scores in the initial voice quality scores of the degraded corpus samples based on the electroencephalogram subjective scale to obtain the voice quality score labels of each degraded corpus sample.
For example, the objective evaluation system PESQ is used to score degraded corpora transmitted over an audio telephone call, with scores in the range 1-5.
Taking 8 s original corpora as an example: a plurality of 8 s original corpora and their corresponding degraded corpora are acquired, electroencephalogram experiments and manual scoring are carried out, and an electroencephalogram subjective scale is established; PESQ scoring then yields a batch of scores, abnormal scores are found by data analysis, and the abnormal scores are corrected using the electroencephalogram subjective scale.
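The second mode can be sketched as follows. The form of the electroencephalogram subjective scale is not given in the text, so this sketch assumes a least-squares line from a single electroencephalogram feature value to the manual score. An initial score is treated as abnormal when it differs from the scale's prediction by more than a hypothetical tolerance `tol`, and is then replaced by that prediction clamped to the 1-5 range mentioned above.

```python
def fit_scale(features, manual_scores):
    """Fit a least-squares line mapping an EEG feature value to a manual score."""
    n = len(features)
    mx = sum(features) / n
    my = sum(manual_scores) / n
    sxx = sum((x - mx) ** 2 for x in features)
    sxy = sum((x - mx) * (y - my) for x, y in zip(features, manual_scores))
    slope = sxy / sxx
    return slope, my - slope * mx

def correct_scores(initial_scores, features, scale, tol=1.0):
    """Replace initial scores that disagree with the scale by more than `tol`."""
    slope, intercept = scale
    labels = []
    for score, x in zip(initial_scores, features):
        # Score predicted by the scale, clamped to the 1-5 range.
        expected = min(5.0, max(1.0, slope * x + intercept))
        labels.append(expected if abs(score - expected) > tol else score)
    return labels
```

A usage example under these assumptions: with features 1, 2, 3, 4 and manual scores 2, 3, 4, 5 the fitted scale is score = feature + 1, so an initial score of 4.9 for a sample with feature value 2 would be flagged and replaced by 3.0, while scores within the tolerance are kept unchanged.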
The speech evaluation model is the above artificial intelligence algorithm model (objective scale algorithm model).
Because voice corpus data with different score grades show distinct characteristics in the frequency spectrum, a time-domain waveform of each degraded corpus sample is first generated, and each degraded corpus sample is then converted into a spectrogram by short-time Fourier transform. Whereas the conventional Fourier transform loses time-domain features, the short-time Fourier transform divides the audio data into segmented time windows and performs a Fourier transform within each window (255 data points), so that sufficient time-domain features are retained; a time window may contain 16 to 2048 data points.
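The short-time Fourier transform described above can be sketched in a few lines. This is an illustrative implementation only: the Hann window, the 50% overlap, and the default window length of 256 points are assumptions (the text mentions 255 data points and a permitted range of 16 to 2048), and a naive discrete Fourier transform is used in place of an FFT to keep the example self-contained.

```python
import cmath
import math

def stft_magnitude(samples, frame_len=256, hop=128):
    """Return the magnitude spectrogram as a list of frames.

    Each frame holds frame_len // 2 + 1 magnitudes (bins 0 .. Nyquist).
    """
    # Periodic Hann window applied to every frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spectrogram = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of one windowed frame (O(N^2); adequate for a sketch).
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        spectrogram.append(bins)
    return spectrogram
```

The result is a two-dimensional array indexed by time frame and frequency bin, which is the form a convolutional neural network would consume as an image-like input. A shorter window gives finer time resolution at the cost of coarser frequency resolution, which is the trade-off behind the 16 to 2048 range.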
The short-time Fourier transform generates a corpus spectrogram (as shown in Fig. 3), which is then converted into a standard two-dimensional tensor so that image processing methods can be used for further feature extraction. A convolutional neural network (CNN) is therefore used to process the spectrogram, yielding an objective scale algorithm model (that is, the subjective scale is trained into an objective scale) that can be used for actual voice quality evaluation. Because the speech evaluation model uses a convolutional neural network, unsupervised efficient feature learning and layered feature extraction can be achieved.
When the artificial intelligence algorithm model is used for voice evaluation, an intrusive system is adopted: standard original voice corpora preset in the intrusive system are sent to the system under test, the degraded corpora received by the system under test are collected and returned to the intrusive system, and the voice quality score of the degraded corpora is obtained through inference by the built-in objective scale algorithm model (the trained artificial intelligence algorithm model).
The method further comprises the following steps: performing voice quality evaluation on the degraded linguistic data to be evaluated by utilizing the trained voice evaluation model; the method specifically comprises the following steps:
generating a time domain oscillogram of the degraded corpus to be evaluated;
generating a spectrogram of the degraded corpus to be evaluated by adopting short-time Fourier transform;
performing inference and mapping on the time-domain waveform and the spectrogram using the convolutional neural network to obtain the voice quality score of the degraded corpus to be evaluated.
Specifically, when the system under test is a telephone, the preset standard original voice corpus is stored in the New MOS device; the degraded corpus produced when the standard original corpus passes through the telephone is collected, and its voice quality score is obtained through inference by the objective scale algorithm model built into the New MOS device, thereby testing the voice quality of the telephone.
Based on the foregoing embodiments, an embodiment of the present application provides a training device for a speech evaluation model based on electroencephalogram, and referring to fig. 4, the training device 200 for a speech evaluation model based on electroencephalogram provided by the embodiment of the present application at least includes:
an obtaining unit 201, configured to obtain an original corpus sample set, where the original corpus sample set includes a plurality of degraded corpus samples;
the electroencephalogram testing unit 202 is configured to, for each degraded corpus sample, obtain electroencephalogram response testing experiment results and manual scores of multiple testers for the degraded corpus sample, and process the electroencephalogram response testing experiment results of the degraded corpus sample to obtain quantifiable electroencephalogram response testing experiment results of the degraded corpus sample;
and the training unit 203 is used for training the voice evaluation model based on each degraded corpus, the quantifiable electroencephalogram response test experiment result and the corresponding voice quality scoring label.
As a possible implementation, when the tester is an expert, the brain electrical testing unit 202 is specifically configured to:
acquiring electroencephalogram signals and manual scores of each expert on the degraded corpus samples;
denoising and cleaning each electroencephalogram signal of the degraded corpus sample;
and extracting corresponding characteristic values from each de-noised and cleaned electroencephalogram signal to serve as quantifiable electroencephalogram response test experiment results of the degraded corpus samples.
As a possible implementation, when the testers are common testers, the electroencephalogram testing unit 202 is specifically configured to:
acquire the electroencephalogram signals and manual scores of the common testers for the degraded corpus sample;
denoise and clean each electroencephalogram signal of the degraded corpus sample;
extract corresponding feature values from each denoised and cleaned electroencephalogram signal;
cluster all electroencephalogram signal feature values of the degraded corpus sample to obtain a classifiable electroencephalogram signal feature group;
and take all electroencephalogram signal feature values in the classifiable electroencephalogram signal feature group as the quantifiable electroencephalogram response test experiment results of the degraded corpus sample.
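The clustering step above can be sketched as follows, under stated assumptions. The application does not say which clustering algorithm is used or how the classifiable feature group is chosen; this sketch assumes plain k-means with two clusters and treats the largest cluster as the classifiable group, so that feature vectors from inconsistent common testers are set aside. The names `kmeans_labels` and `classifiable_group` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_labels(points, k=2, iterations=20):
    """Plain k-means; centroids are initialised to the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def classifiable_group(points, k=2):
    """Keep the feature vectors of the largest cluster (an assumption)."""
    labels = kmeans_labels(points, k)
    biggest = max(set(labels), key=labels.count)
    return [p for p, label in zip(points, labels) if label == biggest]

# Example: three consistent testers and one outlier.
tester_features = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [8.0, 9.0]]
group = classifiable_group(tester_features)
```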
As a possible implementation, the voice quality score label is the corrected manual score.
As a possible implementation, the voice quality score label is a corrected initial voice quality score, which is obtained specifically by:
performing voice evaluation on the degraded corpus sample by using a scoring tool to obtain an initial voice quality score;
establishing an electroencephalogram subjective scale according to the plurality of electroencephalogram feature values of the degraded corpus samples and the corresponding manual scores;
and screening and correcting abnormal scores in the initial voice quality scores of the degraded corpus samples based on the electroencephalogram subjective scale to obtain the voice quality score label of each degraded corpus sample.
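One way to picture the scale-based correction above is the following minimal sketch. The application does not define the form of the electroencephalogram subjective scale or the criterion for an abnormal score. The sketch assumes that the scale is a least-squares line from a single electroencephalogram feature value to the manual score, and that an initial score is abnormal when it deviates from the scale by more than a tolerance `tol`; both are assumptions, as are the function names.

```python
def fit_scale(feature_values, manual_scores):
    """Fit score = slope * feature + intercept by ordinary least squares."""
    n = len(feature_values)
    mean_x = sum(feature_values) / n
    mean_y = sum(manual_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in feature_values)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(feature_values, manual_scores)
    )
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def correct_scores(initial_scores, feature_values, scale, tol=1.0):
    """Replace scores that deviate from the scale by more than tol."""
    slope, intercept = scale
    labels = []
    for score, feature in zip(initial_scores, feature_values):
        expected = slope * feature + intercept
        # Assumed rule: an abnormal score is replaced by the scale's value.
        labels.append(expected if abs(score - expected) > tol else score)
    return labels

# Example: the second tool score (4.5) disagrees with the scale.
scale = fit_scale([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
labels = correct_scores([1.2, 4.5, 3.1, 3.9], [1.0, 2.0, 3.0, 4.0], scale)
```

Scores within the tolerance are kept unchanged, so the scoring tool's output is only overridden where it conflicts with the electroencephalogram evidence.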
As a possible implementation, the training unit 203 is specifically configured to:
generate a time-domain oscillogram of each degraded corpus sample;
generate a spectrogram of each degraded corpus sample by short-time Fourier transform;
use a convolutional neural network to perform inference and mapping on the electroencephalogram signal feature values together with the time-domain oscillogram and the spectrogram of the degraded corpus sample to obtain a predicted voice quality score;
and compute a loss function based on the predicted voice quality score and the voice quality score label, and update the parameters of the convolutional neural network by using the loss function.
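The spectrogram and loss steps above can be illustrated with the following sketch. It computes a magnitude spectrogram by short-time Fourier transform using only the standard library, and a mean squared error between predicted scores and score labels. The frame length, hop size, Hann window and the choice of mean squared error are assumptions; the application does not specify them, and the convolutional neural network itself is not shown.

```python
import cmath
import math

def stft_magnitude(samples, frame_len=8, hop=4):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    # Periodic Hann window (assumed; the application names no window).
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
        for n in range(frame_len)
    ]
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        # Direct DFT of the windowed frame, non-negative frequencies only.
        for k in range(frame_len // 2 + 1):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            row.append(abs(coeff))
        rows.append(row)
    return rows

def mse_loss(predicted, labels):
    """Mean squared error between predicted scores and score labels."""
    return sum((p - y) ** 2 for p, y in zip(predicted, labels)) / len(labels)

# Example: a sinusoid that falls exactly on frequency bin 2 of each frame.
audio = [math.sin(2 * math.pi * 2 * n / 8) for n in range(16)]
spectrogram = stft_magnitude(audio)
loss = mse_loss([3.0, 4.0], [3.0, 2.0])
```

A practical implementation would use a fast Fourier transform and a deep learning framework's optimizer to update the network parameters from the loss; the direct transform here only keeps the example self-contained.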
As a possible implementation, the device further includes a voice quality evaluation module 204, which is specifically configured to:
generate a time-domain oscillogram of the degraded corpus to be evaluated;
generate a spectrogram of the degraded corpus to be evaluated by short-time Fourier transform;
and use the convolutional neural network to perform inference and mapping on the time-domain oscillogram and the spectrogram to obtain the voice quality score of the degraded corpus to be evaluated.
Based on the foregoing embodiments, an embodiment of the present application further provides an electronic device, and referring to fig. 5, an electronic device 300 provided in an embodiment of the present application at least includes: the device comprises a processor 301, a memory 302 and a computer program stored on the memory 302 and capable of running on the processor 301, wherein the processor 301 implements the electroencephalogram-based speech evaluation model training method provided by the embodiment of the application when executing the computer program.
The electronic device 300 provided by the embodiment of the present application may further include a bus 303 connecting different components (including the processor 301 and the memory 302). Bus 303 represents one or more of any of several types of bus structures, including a memory bus, a peripheral bus, a local bus, and so forth.
The memory 302 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 3021 and/or cache memory 3022, and may further include Read Only Memory (ROM) 3023.
The memory 302 may also include a program tool 3024 having a set (at least one) of program modules 3025, the program modules 3025 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may include an implementation of a network environment.
The electronic device 300 may also communicate with one or more external devices 304 (e.g., a keyboard, a remote control, etc.), with one or more devices that enable a user to interact with the electronic device 300 (e.g., a cell phone, a computer, etc.), and/or with any device that enables the electronic device 300 to communicate with one or more other electronic devices 300 (e.g., a router, a modem, etc.). Such communication may take place through an Input/Output (I/O) interface 305. The electronic device 300 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 306. As shown in fig. 5, the network adapter 306 communicates with the other modules of the electronic device 300 via the bus 303. It should be understood that, although not shown in fig. 5, other hardware and/or software modules may be used in conjunction with the electronic device 300, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Array of Independent Disks (RAID) subsystems, tape drives, and data backup storage subsystems.
It should be noted that the electronic device 300 shown in fig. 5 is only an example and should not impose any limitation on the functions or the scope of application of the embodiments.
The embodiment of the application also provides a computer-readable storage medium, which stores computer instructions, and the computer instructions, when executed by a processor, implement the electroencephalogram-based speech evaluation model training method provided by the embodiment of the application.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided so as to be embodied by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
Finally, it should be noted that: the above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A speech evaluation model training method based on electroencephalogram is characterized by comprising the following steps:
acquiring an original corpus sample set, wherein the original corpus sample set comprises a plurality of degraded corpus samples;
for each degraded corpus sample, acquiring electroencephalogram response test experiment results and manual scores of a plurality of testers for the degraded corpus sample, and processing the electroencephalogram response test experiment results of the degraded corpus sample to obtain quantifiable electroencephalogram response test experiment results of the degraded corpus sample;
and training the speech evaluation model based on each degraded corpus sample, the quantifiable electroencephalogram response test experiment results and the corresponding voice quality score label.
2. The electroencephalogram-based speech evaluation model training method according to claim 1, wherein, when the testers are experts, acquiring the electroencephalogram response test experiment results and manual scores of the plurality of testers for the degraded corpus sample and processing the electroencephalogram response test experiment results of the degraded corpus sample to obtain the quantifiable electroencephalogram response test experiment results of the degraded corpus sample comprises:
acquiring electroencephalogram signals and manual scores of each expert on the degraded corpus samples;
denoising and cleaning each electroencephalogram signal of the degraded corpus sample;
and extracting corresponding characteristic values from each de-noised and cleaned electroencephalogram signal to serve as quantifiable electroencephalogram response test experiment results of the degraded corpus samples.
3. The electroencephalogram-based speech evaluation model training method according to claim 1, wherein, when the testers are common testers, acquiring the electroencephalogram response test experiment results and manual scores of the plurality of testers for the degraded corpus sample and processing the electroencephalogram response test experiment results of the degraded corpus sample to obtain the quantifiable electroencephalogram response test experiment results of the degraded corpus sample comprises:
acquiring electroencephalogram signals and manual scores of the common testers on the degraded corpus samples;
denoising and cleaning each electroencephalogram signal of the degraded corpus sample;
extracting corresponding characteristic values from each de-noised and cleaned electroencephalogram signal;
clustering all electroencephalogram signal feature values of the degraded corpus samples to obtain a classifiable electroencephalogram signal feature group;
and taking all electroencephalogram signal characteristic values in the classifiable electroencephalogram signal characteristic group as quantifiable electroencephalogram response test experiment results of the degraded corpus samples.
4. The electroencephalogram-based speech evaluation model training method according to claim 2 or 3, wherein the voice quality score label is a corrected manual score or a corrected initial voice quality score, the initial voice quality score being obtained by a scoring tool.
5. The electroencephalogram-based speech evaluation model training method according to claim 4, wherein correcting the initial voice quality score comprises:
establishing an electroencephalogram subjective scale according to the plurality of electroencephalogram characteristic values of the degraded corpus samples and the corresponding manual scores;
and screening and correcting abnormal scores in the initial voice quality scores of the degraded corpus samples based on the electroencephalogram subjective scale to obtain the voice quality score labels of each degraded corpus sample.
6. The electroencephalogram-based speech evaluation model training method according to claim 5, wherein training the speech evaluation model based on each degraded corpus sample, the quantifiable electroencephalogram response test experiment results and the corresponding voice quality score label comprises:
generating a time domain oscillogram of each degraded corpus sample;
generating a spectrogram of each degraded corpus sample by adopting short-time Fourier transform;
using a convolutional neural network to perform inference and mapping on the electroencephalogram signal characteristic values together with the time domain oscillogram and the spectrogram of the degraded corpus sample to obtain a predicted voice quality score;
and calculating a loss function based on the predicted voice quality score and the voice quality score label, and updating the parameters of the convolutional neural network by using the loss function.
7. The electroencephalogram-based speech evaluation model training method according to claim 6, further comprising: performing voice quality evaluation on a degraded corpus to be evaluated by using the trained speech evaluation model, which specifically comprises:
generating a time domain oscillogram of the degraded corpus to be evaluated;
generating a spectrogram of the degraded corpus to be evaluated by adopting short-time Fourier transform;
and performing inference and mapping on the time domain oscillogram and the spectrogram by using the convolutional neural network to obtain the voice quality score of the degraded corpus to be evaluated.
8. An electroencephalogram-based speech evaluation model training device, characterized by comprising:
an obtaining unit, configured to obtain an original corpus sample set, wherein the original corpus sample set comprises a plurality of degraded corpus samples;
an electroencephalogram testing unit, configured to, for each degraded corpus sample, acquire electroencephalogram response test experiment results and manual scores of a plurality of testers for the degraded corpus sample, and process the electroencephalogram response test experiment results of the degraded corpus sample to obtain quantifiable electroencephalogram response test experiment results of the degraded corpus sample;
and a training unit, configured to train the speech evaluation model based on each degraded corpus sample, the quantifiable electroencephalogram response test experiment results and the corresponding voice quality score label.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the electroencephalogram-based speech evaluation model training method of any one of claims 1-7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the electroencephalogram-based speech evaluation model training method according to any one of claims 1-7.
CN202210081634.9A 2022-01-24 2022-01-24 Training method and device of speech evaluation model based on electroencephalogram and electronic equipment Pending CN114358089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210081634.9A CN114358089A (en) 2022-01-24 2022-01-24 Training method and device of speech evaluation model based on electroencephalogram and electronic equipment

Publications (1)

Publication Number Publication Date
CN114358089A true CN114358089A (en) 2022-04-15

Family

ID=81094267

Country Status (1)

Country Link
CN (1) CN114358089A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220415