CN116865887B - Emotion classification broadcasting system and method based on knowledge distillation - Google Patents

Emotion classification broadcasting system and method based on knowledge distillation

Info

Publication number
CN116865887B
CN116865887B (application CN202310828957.4A)
Authority
CN
China
Prior art keywords
program
broadcasting
emotion
emotion classification
text content
Prior art date
Legal status
Active
Application number
CN202310828957.4A
Other languages
Chinese (zh)
Other versions
CN116865887A (en)
Inventor
刘海章
王祥
张长娟
田才林
黄大池
朱静宁
赵开宇
杜限
黄河
靳晶晶
王佩
邹雪
Current Assignee
Sichuan Institute Of Radio And Television Science And Technology
Original Assignee
Sichuan Institute Of Radio And Television Science And Technology
Priority date
Filing date
Publication date
Application filed by Sichuan Institute Of Radio And Television Science And Technology filed Critical Sichuan Institute Of Radio And Television Science And Technology
Priority to CN202310828957.4A
Publication of CN116865887A
Application granted
Publication of CN116865887B
Legal status: Active
Anticipated expiration


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04H: BROADCAST COMMUNICATION
    • H04H 20/00: Arrangements for broadcast or for distribution combined with broadcast
    • H04H 20/53: Arrangements specially adapted for specific applications, e.g. for traffic information or for mobile receivers
    • H04H 20/59: Arrangements specially adapted for specific applications, e.g. for traffic information or for mobile receivers, for emergency or urgency
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor, of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G06F 16/353: Clustering; Classification into predefined classes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/0455: Auto-encoder networks; Encoder-decoder networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/09: Supervised learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/096: Transfer learning
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks, in wireless communication networks


Abstract

The invention discloses an emotion classification broadcasting system and method based on knowledge distillation. The system comprises a program broadcast control service subsystem, which aggregates program text content; an AI service subsystem, which performs knowledge distillation on an emotion classification BERT pre-trained large model and trains an edge-side emotion classification small model; and an intelligent broadcast terminal subsystem, which transmits the emotion factor obtained by classification together with the program text to a text-to-speech engine, generates an audio program carrying the emotion factor, and broadcasts it. The invention not only gives the broadcasting system a program broadcast effect with emotional color, but also greatly reduces the amount of transmitted data, because the transmitted program data changes from audio to text; the program transmission time is therefore shorter and the emergency broadcasting capability of the system is stronger.

Description

Emotion classification broadcasting system and method based on knowledge distillation
Technical Field
The invention belongs to the technical field of intelligent broadcasting, and particularly relates to an emotion classification broadcasting system and method based on knowledge distillation.
Background
In current broadcasting systems, broadcast content is collected from safe and compliant data sources by a news collector and a web crawler. The broadcasting system first converts the collected broadcast text into audio programs in the cloud using text-to-speech, and then transmits the audio programs to the broadcast terminals over a streaming-media protocol such as RTMP for playback.
The main problems with this approach are:
(1) Broadcast program content without emotion color
Because the broadcast programs are produced by a text-to-speech (TTS) engine without emotion classification of the program text, all programs share the same speaking rate and tone. The broadcast therefore sounds flat and unaffecting and cannot effectively influence the audience.
(2) The program transmission time is too long
Because the broadcast terminal directly plays the converted audio file, the transmission time is much longer than transmitting and playing text directly; under poor network conditions, transmission-quality problems may even cause playback to fail.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art and provides an emotion classification broadcasting system and method based on knowledge distillation, so as to solve the problems that existing broadcast programs lack emotional color and take too long to transmit.
In order to achieve the above purpose, the invention adopts the following technical scheme:
In a first aspect, an emotion classification broadcasting system based on knowledge distillation comprises:
the program broadcasting control service subsystem is used for carrying out content aggregation on the text content of the program, storing the content into a broadcasting program library and sending the text content of the program to the intelligent broadcasting terminal subsystem according to broadcasting requirements;
the AI service subsystem is used for carrying out knowledge distillation on the emotion classification BERT pre-training large model, training to obtain an edge side emotion classification small model, and storing and transmitting the edge side emotion classification small model to the intelligent broadcasting terminal subsystem;
and the intelligent broadcasting terminal subsystem is used for receiving the edge side emotion classification small model and the program text, performing emotion classification on the program text, transmitting the emotion factors and the program text obtained by classification to the text-to-speech conversion engine, generating an audio program with the emotion factors, and broadcasting the audio program.
In a second aspect, a broadcasting method of an emotion classification broadcasting system based on knowledge distillation is characterized by comprising the following steps:
s1, content aggregation is carried out on the text content of the program, the content is stored in a broadcasting program library, and the text content of the program is sent to an intelligent broadcasting terminal subsystem according to broadcasting requirements;
s2, carrying out knowledge distillation on the emotion classification BERT pre-training large model, training to obtain an edge side emotion classification small model, and storing and transmitting the edge side emotion classification small model to the intelligent broadcast terminal subsystem;
s3, carrying out emotion classification on the program text by adopting an edge side emotion classification small model, transmitting emotion factors and the program text obtained by classification to a text-to-speech conversion engine, generating an audio program with emotion factors, and broadcasting.
Further, step S1 includes:
the method comprises the steps of collecting content through a news collector and a web crawler for emergency broadcasting, geological disaster information, emergency release of other three parties, appointed news information or government notices, cleaning data of collected content, storing aggregated program data in a broadcasting program library, storing the broadcasting program library in a data and file separation mode, and carrying out CDN release on programs of the broadcasting program library based on broadcasting service.
Further, in step S2, knowledge distillation is performed on the emotion classification BERT pre-training large model, and the training is performed to obtain an edge side emotion classification small model, which includes:
s2.1, initializing a temperature parameter T;
s2.2, training a student model by adopting a current temperature parameter T and a loss function;
s2.3, evaluating the performance of the trained student model under the temperature parameter T by using the accuracy on the verification set;
s2.4, if the performance on the verification set does not meet the preset condition, increasing the value of the temperature parameter T, and returning to S2.2 for continuous training; if the performance on the verification set meets the preset condition, the current temperature parameter T is saved, the value of the temperature parameter T is reduced, and then the training is continued by returning to S2.2;
training stops when the value of the temperature parameter T is smaller than the threshold value or the number of training rounds exceeds the maximum number of training rounds.
Further, step S2.2 includes:
A temperature parameter T is introduced and temperature scaling is performed with the softmax function; the output probability distribution $q_i$ of the scaled student model is:

$$q_i = \mathrm{softmax}\!\left(\frac{f_S(x_i;\theta)}{T}\right)$$

where $f_S(x_i;\theta)$ is the output of the student model for input text content sample $x_i$, and $\theta$ are the parameters of the student model;
The temperature-scaled student model is trained with the loss function $L_{S,T}$:

$$L_{S,T} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{ik}\,\log q_{ik}$$

where $y_i$ is the true label of the i-th text content training sample, $N$ is the total number of text content training samples, and $K$ is the number of label categories, i.e. the number of emotion classes.
Further, step S2.3 includes:
$$A_T = \frac{1}{N_v}\sum_{i=1}^{N_v}\mathbb{1}\!\left(\arg\max_k q_{ik} = y_i\right)$$

where $N_v$ is the total number of validation samples and $A_T$ is the model accuracy.
Further, step S3 includes:
carrying out emotion classification on the program text with the edge-side emotion classification small model, where the emotion classes of the program text are urgent, pleasant, peaceful, and sad, and the class with the highest probability among the four is taken as the emotion factor of the broadcast program text;
and transmitting the obtained emotion factors and the program text to a text-to-speech conversion engine, performing speech conversion on the program text, generating an audio program with the emotion factors, and broadcasting the audio program.
The emotion classification broadcasting system and method based on knowledge distillation provided by the invention have the following beneficial effects:
according to the invention, the emotion classification method based on knowledge distillation is adopted to carry out emotion analysis on the program text at the edge side, and emotion factors obtained by analysis are input into the text-to-speech conversion engine to carry out audio program conversion, so that the broadcast audio program has emotion factors and has more infectivity. Meanwhile, as the transmitted program data is changed from audio frequency to text, the transmission data volume is greatly reduced, the program transmission time is shorter, and the emergency broadcasting capability of the system is stronger.
The invention not only gives the broadcasting system a program broadcast effect with emotional color; because the program issued by the cloud is text, the compressed data is only a few KB, whereas the files transmitted by existing broadcasting systems are audio files whose encoded size is hundreds of KB or more, so the amount of transmitted program data is reduced by a factor of a hundred or more and the transmission time is greatly shortened. Compared with existing broadcasting systems, the invention greatly reduces the transmitted data volume and transmission time and enhances the emergency broadcasting capability of the system; at the same time, the emotion inference over the full text is performed on the edge-side intelligent terminal, which greatly reduces the computing pressure on the cloud of the broadcasting system.
The core idea of the invention is to determine the value of the temperature parameter T adaptively when distilling the small model. Specifically, when the performance on the validation set does not improve, the temperature parameter can be increased to enlarge the search space of the model, giving a greater chance of finding a better model. When the performance on the validation set improves, the current temperature parameter T is saved and its value is reduced, so that the model focuses on the knowledge of the original pre-trained model. By iterating the training and continuously adjusting the value of T, the algorithm adaptively determines the optimal temperature value and thereby improves the performance of the student model.
Drawings
Fig. 1 is a system block diagram of an emotion classification broadcast system based on knowledge distillation.
FIG. 2 is a flow chart of adaptive temperature scaling knowledge distillation.
Detailed Description
The following description of the embodiments of the invention is provided to help those skilled in the art understand the invention, but the invention is not limited to the scope of the embodiments; all inventions that make use of the inventive concept fall within the spirit and scope of the invention as defined in the appended claims.
Example 1
This embodiment provides an emotion classification broadcasting system based on knowledge distillation, aimed at the problems of the existing methods such as broadcast programs without emotional color and overly long program transmission times. Knowledge-distillation-based emotion classification is applied to the program text at the edge side, and the emotion factor obtained by the analysis is fed into the text-to-speech engine for audio conversion, so the broadcast audio program carries an emotion factor and is more affecting; meanwhile, because the transmitted program data changes from audio to text, the amount of transmitted data is greatly reduced, the program transmission time is shorter, and the emergency broadcasting capability of the system is stronger. Referring to Fig. 1, the system specifically comprises:
program broadcasting control service subsystem
Used for aggregating program content such as emergency message texts, news information websites, and emergency audio, storing it in the broadcast program library, and issuing the broadcast content to the intelligent broadcast terminal according to broadcast requirements;
specifically, the subsystem collects contents according to set rules through a news collector and a web crawler program on websites such as emergency broadcasting, geological disaster information, emergency release of other three parties, appointed news information, government notices and the like, cleans the collected contents only by data and stores the collected program data in a broadcasting program library. The broadcasting program library is stored in a mode of separating data from files, and the functions of automatic classification, automatic labeling and automatic cleaning of expired data are provided. The broadcasting service distributes the program of the broadcasting program library to CDN. Compared with the prior art, the broadcasting system of the invention changes the form of issuing programs from audio to text;
AI service subsystem
Performs knowledge distillation on the emotion classification BERT (Bidirectional Encoder Representations from Transformers) pre-trained large model, trains an edge-side emotion classification small model that can run on the intelligent broadcast terminal, and stores and issues the edge-side small model as required;
in order for a broadcast system to have emotion broadcasting capability, it is necessary to use AI to perform emotion classification on broadcast text content. In the system, emotion classification is to classify emotion of a program and judge which category of "urgent", "pleasant", "peaceful" and "sad" the program belongs to. Considering that the number of broadcasting terminals is large, if emotion classification is carried out at the cloud, the calculation force requirement is huge and the reasoning response time is too long, so that the invention carries out the emotion classification reasoning task on the intelligent broadcasting terminals. Because the intelligent broadcasting terminal has limited calculation power, the intelligent broadcasting terminal cannot directly lower the estrus classification BERT pre-training large model (comprising a multi-layer encoder), and the large model can be efficiently operated at the edge end only by compressing and cutting;
intelligent broadcasting terminal subsystem
Uses the received emotion classification small model to classify the emotion of the broadcast text, transmits the program text and the emotion factor to the text-to-speech engine, generates an audio program with the emotion factor, and broadcasts it;
specifically, the intelligent broadcasting terminal adopts multi-channel receiving, and can receive 4G/5G, wiFi, bluetooth and other transmission data. And the data analysis module analyzes the received data to obtain a broadcasting program text and an emotion classification BERT small model. The AI emotion analysis module loads an emotion classification BERT small model by using an AI framework such as PyTorch and the like, performs emotion classification reasoning on the broadcast program text, and then takes the category with the highest emotion classification probability of ' urgent ', ' pleasant ', ' mild ' sad ' and the like as an emotion factor of the broadcast program text. Finally, the text-to-speech engine TTS performs speech conversion on the program text according to the input emotion factors, and generates an audio program with emotion colors for broadcasting.
Example 2
This embodiment provides a broadcasting method for the emotion classification broadcasting system based on knowledge distillation. The BERT large model is distilled with an adaptive temperature-scaling knowledge distillation method to obtain a corresponding BERT small model (only 3 encoder layers). This greatly reduces the computation and storage cost of the model while essentially preserving the performance and generalization ability of the large model, so that the small model can perform efficient emotion classification inference on the intelligent broadcast terminal. The adaptive temperature-scaling knowledge distillation is as follows:
the goal of knowledge distillation is to migrate knowledge in a large BERT emotion classification large model teacher network T into a small student model S that will be trained to mimic the behavior of the teacher network; f (f) T And f S Representing the behavioural functions of the teacher network and the student network, respectively, the goal of the behavioural functions is to convert the input of the network into a corresponding information encoded representation, knowledge distillation can be modeled as a minimization process of the following objective functions, namely:
$$L_{S,T} = \sum_{x\in E} \mathrm{Loss}\left(f_T(x),\, f_S(x)\right)$$
the loss (·) is a loss function for measuring the difference between a teacher network and a student network, x is an input sample, and E is a sample set; the focus of knowledge distillation is to select and construct a loss function with which to correlate an effective behavioral function.
In this method, the student model is a BERT model structurally consistent with the teacher model, but with far fewer layers. The output of the prediction layer and the attention weights are chosen as the corresponding behavior functions, and the cross-entropy function is chosen as the basic loss function. The dataset is constructed from the broadcast program library, with emotion divided into four categories: "urgent", "pleasant", "peaceful", "sad". The whole dataset comprises 10000 text content training samples, 5000 text content validation samples, and 5000 text content test samples.
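The generic distillation objective, a sum over samples of a loss between teacher and student outputs with cross-entropy as the basic loss, can be sketched in plain Python as follows. The helper names are illustrative; real teacher and student outputs would come from the prediction layer of each BERT model.

```python
import math

def cross_entropy(p_teacher, q_student):
    """Basic loss between the two networks: H(p, q) = -sum_k p_k * log(q_k)."""
    return -sum(p * math.log(q) for p, q in zip(p_teacher, q_student) if p > 0)

def distill_objective(teacher_outs, student_outs):
    """L_{S,T}: sum over samples x in E of Loss(f_T(x), f_S(x))."""
    return sum(cross_entropy(t, s) for t, s in zip(teacher_outs, student_outs))
```

When the student distribution matches the teacher distribution exactly, the per-sample loss reduces to the entropy of the teacher distribution, its minimum over student outputs.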
The adaptive temperature-scaling method is designed to optimize the whole distillation process and improve the generalization ability of the student model. Specifically, when the performance on the validation set does not improve, the value of the temperature parameter can be increased to enlarge the search space of the model, giving a greater chance of finding a better model. When the performance on the validation set improves, the current temperature parameter T is saved and its value is reduced, so that the model focuses on the knowledge of the original pre-trained model. By iterating the training and continuously adjusting the value of T, the algorithm adaptively determines the optimal temperature value and thereby improves the performance of the student model. Referring to Fig. 2, the method specifically comprises the following steps:
step S1, content aggregation is carried out on the program text content, the content is stored in a broadcasting program library, and the program text content is sent to an intelligent broadcasting terminal subsystem according to broadcasting requirements;
specifically, the news collector and the web crawler collect content of emergency broadcast, geological disaster information, other three-party emergency delivery, appointed news information or government notices, clean the collected content data, store the collected program data in a broadcasting program library, store the broadcasting program library in a mode of separating data from files, and carry out CDN delivery on the programs of the broadcasting program library based on broadcasting service.
S2, carrying out knowledge distillation on the emotion classification BERT pre-training large model, training to obtain an edge side emotion classification small model, and storing and transmitting the edge side emotion classification small model to an intelligent broadcast terminal subsystem, wherein the method specifically comprises the following steps of:
step S2.1, initializing a temperature parameter T, wherein the temperature parameter T is a smaller value;
s2.2, training a student model by adopting a current temperature parameter T and a loss function;
A temperature parameter T is introduced and temperature scaling is performed with the softmax function; the output probability distribution $q_i$ of the scaled student model is:

$$q_i = \mathrm{softmax}\!\left(\frac{f_S(x_i;\theta)}{T}\right)$$

where $f_S(x_i;\theta)$ is the output of the student model for input text content sample $x_i$, and $\theta$ are the parameters of the student model;
The temperature-scaled student model is trained with the loss function $L_{S,T}$:

$$L_{S,T} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{K} y_{ik}\,\log q_{ik}$$

where $y_i$ is the true label of the i-th text content training sample; $N$ is the total number of text content training samples (10000); and $K$ is the number of label categories (4), i.e. the number of emotion classes: "urgent", "pleasant", "peaceful", "sad";
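A dependency-free sketch of the temperature-scaled softmax and the training loss described above, assuming logits from the student's prediction layer; the function names are illustrative, not from the patent.

```python
import math

def softmax_T(logits, T):
    """q = softmax(f_S(x; theta) / T); a larger T flattens the distribution."""
    scaled = [z / T for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def loss_ST(batch_logits, labels, T):
    """L_{S,T} = -(1/N) * sum_i log q_{i, y_i}: cross-entropy with
    one-hot labels y, averaged over the N samples in the batch."""
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        total -= math.log(softmax_T(logits, T)[y])
    return total / len(batch_logits)
```

Raising T spreads probability mass over all four emotion classes, which is exactly the softening effect the adaptive schedule below exploits.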
s2.3, evaluating the performance of the trained student model under the temperature parameter T by using the accuracy on the verification set;
$$A_T = \frac{1}{N_v}\sum_{i=1}^{N_v}\mathbb{1}\!\left(\arg\max_k q_{ik} = y_i\right)$$

where $N_v$ is the total number of validation samples (5000) and $A_T$ is the model accuracy;
step S2.4, if the performance on the verification set does not meet the preset condition, increasing the value of the temperature parameter T, and returning to the step S2.2 to continue training; if the performance on the verification set meets the preset condition, the current temperature parameter T is saved, the value of the temperature parameter T is reduced, and then the step S2.2 is returned to continue training;
training stops when the value of the temperature parameter T is smaller than the threshold value or the number of training rounds exceeds the maximum number of training rounds.
As shown in Fig. 2, $A_T(l)$ is the model accuracy in training round $l$, $l_{max}$ is the maximum number of training rounds, $T_{min}$ is the set minimum temperature value, and $\alpha$, $\beta$ are constants.
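The adaptive loop of steps S2.1 to S2.4 can be sketched as follows. `train_one_round` and `validate` stand in for the real training and validation-accuracy code, and the multiplicative update by alpha and beta is one plausible reading of the constants in Fig. 2; the patent does not fix their exact role.

```python
def adaptive_temperature_distill(train_one_round, validate,
                                 T=2.0, T_min=0.5, l_max=20,
                                 alpha=1.5, beta=0.8):
    """Adaptive temperature-scaling schedule: shrink T by beta when the
    validation accuracy improves, grow it by alpha when it does not."""
    best_T, best_acc = T, -1.0
    for l in range(l_max):                 # stop at the maximum round count
        train_one_round(T)                 # S2.2: train the student at current T
        acc = validate()                   # S2.3: accuracy A_T on the val set
        if acc > best_acc:                 # S2.4: improved -> save T, reduce it
            best_acc, best_T = acc, T
            T *= beta
        else:                              # not improved -> widen the search
            T *= alpha
        if T < T_min:                      # stop when T falls below threshold
            break
    return best_T, best_acc
```

With a validation curve that improves for two rounds and then plateaus, the loop returns the temperature in force at the best round.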
S3, perform emotion classification on the program text with the edge-side emotion classification small model, where the emotion classes of the program text are urgent, pleasant, peaceful, and sad, and the class with the highest probability among the four is taken as the emotion factor of the broadcast program text;
and transmitting the obtained emotion factors and the program text to a text-to-speech conversion engine, performing speech conversion on the program text, generating an audio program with the emotion factors, and broadcasting the audio program.
Although specific embodiments of the invention have been described in detail with reference to the accompanying drawings, this should not be construed as limiting the scope of protection of the patent. Modifications and variations that those skilled in the art can make without creative effort remain within the scope of the patent as described in the claims.

Claims (1)

1. An emotion classification broadcast system based on knowledge distillation, comprising:
the program broadcasting control service subsystem is used for carrying out content aggregation on the text content of the program, storing the content into a broadcasting program library and sending the text content of the program to the intelligent broadcasting terminal subsystem according to broadcasting requirements;
the AI service subsystem is used for carrying out knowledge distillation on the emotion classification BERT pre-training large model, training to obtain an edge side emotion classification small model, storing the edge side emotion classification small model and transmitting the edge side emotion classification small model to the intelligent broadcasting terminal subsystem;
the intelligent broadcasting terminal subsystem comprises a text-to-speech conversion engine and is used for receiving the edge side emotion classification small model and the program text content, performing emotion classification on the received program text content through the edge side emotion classification small model, transmitting emotion factors and the program text content obtained by classification to the text-to-speech conversion engine, generating an audio program with emotion factors and broadcasting the audio program;
the step of content aggregation of the program text content, storing the content in a broadcasting program library and sending the program text content to an intelligent broadcasting terminal subsystem according to broadcasting requirements comprises the following steps:
collecting program text content, cleaning the collected program text content, storing it in the broadcasting program library, and distributing the program text content of the broadcasting program library via CDN according to broadcasting requirements based on the broadcasting service, so as to send it to the intelligent broadcasting terminal subsystem;
performing knowledge distillation on the emotion classification BERT pre-training large model and training to obtain the edge side emotion classification small model comprises the following steps:
s2.1, initializing a temperature parameter T;
s2.2, training a student model by adopting a current temperature parameter T and a loss function;
s2.3, evaluating the performance of the student model under the current temperature parameter T by adopting the accuracy of the verification set;
s2.4, if the performance of the student model under the current temperature parameter T does not meet the preset condition, increasing the value of the temperature parameter T, and returning to S2.2 for continuous training; if the performance of the student model under the current temperature parameter T meets the preset condition, the current temperature parameter T is saved, the value of the temperature parameter T is reduced, and then the student model returns to S2.2 to continue training;
stopping training until the value of the temperature parameter T is smaller than a threshold value or the training round number is larger than the maximum training round number;
the accuracy of the verification set in S2.3 is obtained by: obtaining a label for each verification sample in the verification set through the student model, and taking the ratio of the number of verification samples whose obtained label is consistent with the corresponding real label to the total number of verification samples in the verification set as the accuracy of the verification set;
the method for receiving the edge side emotion classification small model and the program text content, performing emotion classification on the received program text content through the edge side emotion classification small model, transmitting emotion factors and the program text content obtained by classification to a text-to-speech conversion engine, generating and broadcasting an audio program with emotion factors, and comprises the following steps:
the intelligent broadcasting terminal subsystem further comprises a data analysis module, wherein the data analysis module receives data sent by the program broadcasting control service subsystem and the AI service subsystem through multiple channels (4G, 5G, WiFi or Bluetooth), and analyzes the received data to obtain the program text content and the edge side emotion classification small model;
the intelligent broadcasting terminal subsystem further comprises an AI emotion analysis module, wherein the AI emotion analysis module loads an edge side emotion classification small model by using a PyTorch, performs emotion classification on received program text content through the edge side emotion classification small model, and uses the category with the highest probability of four types of emotion classification as emotion factors of the received program text content, wherein the emotion classification of the program text content comprises urgency, pleasure, peace and sadness;
and transmitting the obtained emotion factors and the program text content to the text-to-speech conversion engine, performing speech conversion on the program text content through the text-to-speech conversion engine, and generating and broadcasting an audio program with the emotion factors.
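The claim leaves the distillation loss of step S2.2 unspecified. A common choice, shown here purely as an assumption rather than the patented method, is the standard Hinton-style combination: cross-entropy on the hard labels plus a T²-scaled KL divergence between the temperature-softened teacher and student distributions.

```python
import math

def softmax(logits, t=1.0):
    """Temperature-softened softmax over a list of logits."""
    m = max(z / t for z in logits)               # shift for numerical stability
    exps = [math.exp(z / t - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_idx, t, alpha):
    """alpha * CE(hard label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    hard = -math.log(softmax(student_logits)[true_idx])
    q_teacher = softmax(teacher_logits, t)
    q_student = softmax(student_logits, t)
    soft = sum(qt * math.log(qt / qs)
               for qt, qs in zip(q_teacher, q_student))
    return alpha * hard + (1.0 - alpha) * t * t * soft
```

The T² factor compensates for the 1/T² shrinkage of the soft-target gradients, which is why lowering T during the schedule changes the balance between the hard and soft terms.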
CN202310828957.4A 2023-07-06 2023-07-06 Emotion classification broadcasting system and method based on knowledge distillation Active CN116865887B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310828957.4A CN116865887B (en) 2023-07-06 2023-07-06 Emotion classification broadcasting system and method based on knowledge distillation


Publications (2)

Publication Number Publication Date
CN116865887A CN116865887A (en) 2023-10-10
CN116865887B true CN116865887B (en) 2024-03-01

Family

ID=88235340

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310828957.4A Active CN116865887B (en) 2023-07-06 2023-07-06 Emotion classification broadcasting system and method based on knowledge distillation

Country Status (1)

Country Link
CN (1) CN116865887B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109347586A (en) * 2018-11-24 2019-02-15 合肥龙泊信息科技有限公司 An emergency broadcast system with a terminal broadcast speech monitoring function for teletext broadcasting
CN111767740A (en) * 2020-06-23 2020-10-13 北京字节跳动网络技术有限公司 Sound effect adding method and device, storage medium and electronic equipment
CN114241282A (en) * 2021-11-04 2022-03-25 河南工业大学 Knowledge distillation-based edge device scene identification method and device
CN114863226A (en) * 2022-04-26 2022-08-05 江西理工大学 Intrusion detection method for cyber-physical systems
CN116260642A (en) * 2023-02-27 2023-06-13 南京邮电大学 Lightweight IoT malicious traffic identification method based on knowledge-distilled spatio-temporal neural networks




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant