CN117153144A - Battery information voice broadcasting method and device based on terminal calculation - Google Patents


Info

Publication number
CN117153144A
CN117153144A (application CN202311425249.2A)
Authority
CN
China
Prior art keywords
data
model
network
battery
edsa
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311425249.2A
Other languages
Chinese (zh)
Other versions
CN117153144B (en
Inventor
李朝
钟逸晨
丁东辉
肖劼
胡始昌
杨斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yugu Technology Co ltd
Original Assignee
Hangzhou Yugu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yugu Technology Co ltd filed Critical Hangzhou Yugu Technology Co ltd
Priority to CN202311425249.2A priority Critical patent/CN117153144B/en
Publication of CN117153144A publication Critical patent/CN117153144A/en
Application granted granted Critical
Publication of CN117153144B publication Critical patent/CN117153144B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H02GENERATION; CONVERSION OR DISTRIBUTION OF ELECTRIC POWER
    • H02JCIRCUIT ARRANGEMENTS OR SYSTEMS FOR SUPPLYING OR DISTRIBUTING ELECTRIC POWER; SYSTEMS FOR STORING ELECTRIC ENERGY
    • H02J7/00Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries
    • H02J7/00032Circuit arrangements for charging or depolarising batteries or for supplying loads from batteries characterised by data exchange
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/02Methods for producing synthetic speech; Speech synthesisers
    • G10L13/027Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00Speech synthesis; Text to speech systems
    • G10L13/08Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Abstract

The application relates to a battery information voice broadcasting method and device based on end calculation, wherein the battery information voice broadcasting method is applied to a battery management system in battery swap equipment and comprises the following steps: acquiring parameter information of a target battery in its current state, and performing text conversion on the parameter information to obtain text data; converting the text data into a speech signal according to a pre-trained speech synthesis model; and controlling a player to carry out voice broadcasting according to the speech signal. The speech synthesis model is configured as a Transformer TTS model fused with an EDSA network, and specifically includes an encoder and a decoder, the decoder comprising the EDSA network and a self-attention layer in data connection with the EDSA network. The EDSA network is used for performing linear computation on the input multidimensional feature data to obtain processed data; the self-attention layer is used for processing the processed data together with the data output by the encoder.

Description

Battery information voice broadcasting method and device based on terminal calculation
Technical Field
The application relates to the field of deep learning and battery equipment, in particular to a battery information voice broadcasting method and device based on end calculation.
Background
In order to ensure that various parameters are in a normal range when the battery equipment operates, information such as voltage, temperature, current and the like of the battery needs to be monitored and broadcasted in real time.
At present, data from battery equipment is generally uploaded to a cloud server, where a text-to-speech model processes the battery equipment data to provide the battery information broadcasting function.
The text-to-speech models used in the prior art have large parameter scales and high computational complexity, cannot be deployed at the battery end, and can only be computed by relying on a cloud server. Computing through the cloud server, however, requires a large amount of computing resources and long network transmission time, which causes a certain broadcast delay and affects the efficiency of battery information broadcasting.
Disclosure of Invention
The embodiment of the application provides a battery information voice broadcasting method and device based on end calculation, which are used for at least solving the problem of low efficiency of battery information broadcasting in the related technology.
In a first aspect, an embodiment of the present application provides a method for voice broadcasting battery information based on end calculation, where the method is applied to a battery management system in a battery exchange device, and the method includes:
acquiring parameter information of a target battery in a current state, and performing text conversion on the parameter information to obtain text data;
converting the text data into a speech signal according to a pre-trained speech synthesis model;
controlling a player to carry out voice broadcasting according to the voice signal;
the speech synthesis model is configured as a transducer TTS model fused to an EDSA network, and specifically includes an encoder and a decoder, the decoder including the EDSA network, and a self-attention layer in data link with the EDSA network;
the EDSA network is used for carrying out linear calculation on the input multidimensional characteristic data to obtain processing data;
the self-attention layer is used for processing the processing data and the data output by the encoder.
In an embodiment, the EDSA network comprises two fully connected layers, the EDSA network being specifically configured to:
receiving the multi-dimensional feature, obtaining a weight matrix of each dimension in the multi-dimensional feature,wherein V is t Is a multidimensional feature, w t Is a weight matrix, < >>The method is characterized in that the method is a weight parameter initialized randomly, linear1 and Linear2 are full-connection layers, and a d-dimensional vector can be obtained through Linear1 and Linear2 respectively;
calculating an output vector, normalizing the weight matrix through a softmax function, regularizing through a dropout function to obtain processed data,wherein (1)>Is processing data, j is an integer from 0 to d, < >>∈R d ,V j Features representing the first j-dimensional vector.
In an embodiment, the obtaining the parameter information of the target battery in the current state, and performing text conversion on the parameter information to obtain text data includes:
receiving target battery parameter information acquired and sent by a sensor, wherein the parameter information comprises voltage, current and temperature;
filtering the parameter information to remove abnormal values and obtain filtered data;
the filtered data is converted into structured text data.
In an embodiment, the speech synthesis model is configured to be obtained by:
training a Transformer TTS model fused with the EDSA network to obtain an intermediate model;
the intermediate model is converted to the speech synthesis model by a deep learning framework.
In an embodiment, the obtaining the intermediate model includes: carrying out batch normalization processing on model parameters:
in the batch normalization processing process, adjusting the weight corresponding to the characteristic parameters extracted by the characteristic channels by adjusting the coefficient of the regularization term in the batch normalization function; and deleting the characteristic channels corresponding to the weights smaller than or equal to the preset threshold value.
In an embodiment, the obtaining the intermediate model further comprises:
in the training process, parameter data of each network is processed in a model quantization mode so as to convert floating point type data of high-order numbers into integer type data of low-order numbers.
In an embodiment, the converting the intermediate model into the speech synthesis model by a deep learning framework includes:
the intermediate model is converted, by adopting a deep learning conversion framework, into a format file that the battery management system can run, so as to obtain the speech synthesis model; the deep learning conversion framework corresponds to the framework used to train the model.
In a second aspect, an embodiment of the present application provides a battery information voice broadcasting device, including:
the acquisition module is used for: the method comprises the steps of obtaining parameter information of a target battery in a current state, and performing text conversion on the parameter information to obtain text data;
and a conversion module: for converting the text data into a speech signal according to a pre-trained speech synthesis model;
and a broadcasting module: the voice broadcasting device is used for controlling the player to carry out voice broadcasting according to the voice signal;
the speech synthesis model is configured as a transducer TTS model fused to an EDSA network, and specifically includes an encoder and a decoder, the decoder including the EDSA network, and a self-attention layer in data link with the EDSA network;
the EDSA network is used for carrying out linear calculation on the input multidimensional characteristic data to obtain processing data;
the self-attention layer is used for processing the processing data and the data output by the encoder.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method for voice broadcasting battery information based on end calculation according to any one of the examples in the first aspect when the processor executes the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method for voice broadcasting battery information based on end calculation according to any one of the examples in the first aspect.
The battery information voice broadcasting method and device based on the end calculation provided by the embodiment of the application have at least the following technical effects.
According to the application, the parameter information of the target battery is structured, which ensures the quality of the data and the accuracy of converting text into voice data. By replacing the original self-attention layer in the Transformer TTS model with EDSA, the weight matrix is calculated in a linear manner, which reduces the computational complexity of the speech synthesis model and improves computational efficiency. Text-to-speech conversion is performed directly at the battery equipment end without relying on a cloud server, which saves data transmission time and improves the real-time performance of model conversion and the flexibility of application. Voice broadcasting of the battery information helps the user grasp the working state of the battery in real time, ensures that the parameters of the battery equipment stay within the normal range during operation, and makes it easier to handle abnormal battery states in time. In this way, the real-time performance of battery information voice broadcasting is improved, and the prior-art problems of heavy computation and cloud-only processing, which delay the broadcasting of battery information, are solved.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a flowchart illustrating a battery information voice broadcasting method according to an exemplary embodiment;
FIG. 2 is a data processing method shown in accordance with an exemplary embodiment;
FIG. 3 is a block diagram of a speech synthesis model shown in accordance with an exemplary embodiment;
fig. 4 is a block diagram of a battery voice broadcast device according to an exemplary embodiment;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person of ordinary skill in the art based on the embodiments provided by the present application without making any inventive effort, are intended to fall within the scope of the present application.
It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and those of ordinary skill in the art can apply the present application to other similar situations according to these drawings without inventive effort. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and should therefore not be construed as inventive.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents in the context of the application are not to be construed as limiting the quantity, but rather as singular or plural. The terms "comprising," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in connection with the present application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "and/or" describes an association relationship of an association object, meaning that there may be three relationships, e.g., "a and/or B" may mean: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship. The terms "first," "second," "third," and the like, as used herein, are merely distinguishing between similar objects and not representing a particular ordering of objects.
In a first aspect, an embodiment of the present application provides a method for voice broadcasting battery information based on end calculation, where the method is applied to a battery management system in a battery exchange device. Fig. 1 is a flowchart illustrating a battery information voice broadcasting method according to an exemplary embodiment, and as shown in fig. 1, the battery information voice broadcasting method based on end calculation includes:
step S101, obtaining parameter information of the target battery in the current state, and performing text conversion on the parameter information to obtain text data. Optionally, the parameter information characterizes an operational state performance of the battery, for example, characterizes an electrical parameter (current, voltage or charge) of the battery, or characterizes an operational state (temperature, whether in a charged state, etc.) of the battery.
In one example, fig. 2 is a data processing method shown according to an exemplary embodiment, in which parameter information includes voltage, current, and temperature as examples. As shown in fig. 2, step S101 includes:
step S1011, receiving the target battery parameter information acquired and transmitted by the sensor.
Optionally, temperature, voltage and current information of the battery are acquired through a temperature sensor, a voltage sensor and a current sensor, respectively.
In step S1012, the parameter information is filtered to remove the outlier and obtain filtered data.
Optionally, abnormal values include missing values, format errors, and data whose type or length does not meet preset conditions.
In step S1013, the filtered data is converted into structured text data.
Optionally, the filtered data is converted into structured text data by means of natural language processing or similar techniques. For example, the filtered data is converted into structured text using a natural language processing model such as a Transformer or BERT.
In this example, the quality of the data is ensured by screening and filtering the data collected by the sensor, and the filtered sensor data is converted into structured text data which is easier to understand and read, which is beneficial to ensuring the accuracy in the subsequent conversion of text into voice data.
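As an illustration, the screening and structuring described above can be sketched in a few lines of Python; the plausibility limits, field names, and sentence format here are hypothetical placeholders, not values taken from the application:

```python
def filter_readings(readings, limits):
    """Drop readings that are missing, malformed, or outside preset ranges."""
    filtered = {}
    for name, value in readings.items():
        lo, hi = limits.get(name, (float("-inf"), float("inf")))
        if isinstance(value, (int, float)) and lo <= value <= hi:
            filtered[name] = value
    return filtered

def to_structured_text(filtered):
    """Render the filtered readings as a fixed-format, easy-to-read sentence."""
    units = {"voltage": "volts", "current": "amperes", "temperature": "degrees Celsius"}
    parts = [f"{name} {filtered[name]:.1f} {units[name]}"
             for name in ("voltage", "current", "temperature") if name in filtered]
    return "Battery status: " + ", ".join(parts) + "."

# Hypothetical plausibility limits; the temperature reading below is an outlier.
limits = {"voltage": (2.5, 4.5), "current": (0.0, 10.0), "temperature": (-20.0, 80.0)}
readings = {"voltage": 3.7, "current": 1.2, "temperature": 125.0}
text = to_structured_text(filter_readings(readings, limits))
```

A production system would instead apply whatever screening rules and text template the battery management firmware defines.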
With continued reference to fig. 1, step S102 is performed after step S101.
Step S102, converting text data into voice signals according to a pre-trained voice synthesis model.
The speech synthesis model is configured as a Transformer TTS model fused with an EDSA network. The model specifically includes an encoder and a decoder, wherein the decoder includes the EDSA network and a self-attention layer in data connection with the EDSA network. The EDSA network is used for performing linear computation on the input multidimensional feature data to obtain processed data, and the self-attention layer is used for processing the processed data together with the data output by the encoder.
Alternatively, fig. 3 is a block diagram of a speech synthesis model shown according to an exemplary embodiment, and as shown in fig. 3, the speech synthesis model includes: encoding module 1, decoding module 6, and vocoder 5. The encoding module 1 is used for converting the input text data into feature vectors. The decoding module 6 is configured to process the input feature vector and the existing mel spectrogram 2 to obtain a mel spectrogram 4 matched with the input vector. The vocoder 5 is used to convert the mel-spectrogram 4 into a speech waveform.
The speech synthesis model provided by the embodiment of the application replaces the original self-attention layer in the Transformer TTS model with the Efficient Decoding Self-Attention layer 3 (EDSA), and the EDSA network performs linear computation on the input multidimensional feature data. In this way, the computational complexity of the whole model is reduced and the operating efficiency is effectively improved; furthermore, the hardware requirements of the whole method are lowered, so that the model can be deployed on terminal equipment such as a battery swap cabinet.
In one example, the EDSA network includes two fully connected layers, and is used for performing linear computation processing on the input multi-dimensional characteristic data, and is specifically configured to:
step S301, receiving a multidimensional feature V t Wherein
Step S302, obtaining a weight matrix w of each dimension in the multi-dimension feature through two full connection layers tWherein w is t For the weight matrix of each dimension, +.>The weight parameters are randomly initialized, and Linear1 and Linear2 are full connection layers, so that d-dimensional vectors can be obtained respectively.
Step S303, calculating an output vectorSpecifically, normalizing the weight matrix through a softmax function, and regularizing through a dropout function to obtain +.>. Output vector +.>As data that is processed linearly through the EDSA network. />
Wherein j is an integer from 0 to d,∈R d ,V j features representing the first j-dimensional vector.
In the example, the weight matrix of each dimension of the feature is obtained by adding the weight matrix initialized randomly and the two full connection layers, and the weight matrix is calculated in a linear mode, so that the complexity of model calculation is reduced, the calculation efficiency is improved, and the conversion efficiency when text data is converted into voice data in the follow-up process is facilitated.
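Under one plausible reading of the formula above (the V_j taken as cached d-dimensional feature vectors, dropout omitted at inference, and biases omitted), the EDSA step can be sketched in plain Python; all matrix values and dimensions below are illustrative assumptions, not parameters from the application:

```python
import math

def linear(W, x):
    """Fully connected layer without bias: y[i] = sum_k W[i][k] * x[k]."""
    return [sum(W[i][k] * x[k] for k in range(len(x))) for i in range(len(W))]

def softmax(w):
    m = max(w)
    e = [math.exp(v - m) for v in w]
    s = sum(e)
    return [v / s for v in e]

def edsa(values, theta, W1, W2):
    """One EDSA decoding step (sketch).

    values : the n cached d-dimensional feature vectors V_0 .. V_{n-1}
    theta  : randomly initialized d-dimensional weight parameter
    W1, W2 : n x d matrices of the two fully connected layers

    The attention weights come from two linear layers instead of a Q*K^T
    product, so the cost of a step grows linearly with n.
    """
    v_t = values[-1]                             # current step's feature vector
    w = [a + b for a, b in zip(linear(W1, v_t), linear(W2, theta))]
    attn = softmax(w)                            # normalize; dropout skipped at inference
    d = len(v_t)
    # Output vector: h = sum_j attn[j] * V_j (a convex combination of the V_j)
    return [sum(attn[j] * values[j][i] for j in range(len(values))) for i in range(d)]

values = [[0.1, 0.2], [0.3, -0.1], [0.0, 0.5]]   # n = 3 cached vectors, d = 2
theta = [0.05, -0.02]
W1 = [[0.2, 0.1], [0.0, 0.3], [-0.1, 0.2]]       # 3 x 2
W2 = [[0.1, 0.0], [0.2, -0.1], [0.0, 0.1]]
h = edsa(values, theta, W1, W2)
```

Because the softmax weights sum to one, each output coordinate lies between the minimum and maximum of the corresponding coordinate across the cached vectors.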
In one example, the speech synthesis model is configured to be obtained by:
step S401, collecting battery equipment state text information disclosed on a website as a first data set, and taking structured text data synthesized according to battery equipment state data collected by a sensor as a self-test data set. Training a transducer TTS model of the fused EDSA network based on the first data set and the self-testing data set to obtain an intermediate model. Optionally, obtaining the intermediate model by taking the set training times as the training ending condition. The intermediate model is a model after model compression, and the model volume is compressed in the training process in the following manner.
The first mode is to prune the model, which specifically includes:
in the batch normalization process, adjusting the weight corresponding to the characteristic parameters extracted by the characteristic channels by adjusting the coefficient of a regularization term in a batch normalization function; and deleting the characteristic channels corresponding to the weights smaller than or equal to the preset threshold value.
Optionally, in the model training process, regularization term coefficients in the batch normalization function are adjusted, weights corresponding to the feature parameters are adjusted, and parameter channels corresponding to the weights close to 0 are removed. The regularization mode can adopt L1 norm regularization or L2 regularization.
In this example, the smaller the weight parameter of the model, the lower the contribution of the parameter to the model is indicated, and the parameter channel corresponding to the weight approaching 0 in the model is removed in a regularization mode. In this way, the model accuracy is ensured as much as possible, the calculation amount of the model parameters and the model is reduced, and the calculation speed of the model is improved.
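The pruning rule described above — train with a sparsity-inducing penalty on the batch-normalization scale factors, then delete the channels whose factor has been driven toward zero — can be sketched as follows; the gamma values and threshold are made up for illustration:

```python
def prune_channels(gammas, threshold):
    """Keep only channels whose batch-norm scale factor exceeds the threshold.

    After training with an L1 penalty on the scale factors, channels whose
    gamma has been driven toward zero contribute little and can be deleted.
    """
    return [i for i, g in enumerate(gammas) if abs(g) > threshold]

# Hypothetical scale factors after sparsity-inducing training.
gammas = [0.91, 0.002, 0.47, 0.0005, 0.33]
kept = prune_channels(gammas, threshold=0.01)   # channels 1 and 3 are pruned
```

In a real network the corresponding rows of the following layer's weight matrix are removed along with each pruned channel.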
The second mode, quantizes the model parameters, specifically includes:
in the training process, parameter data of each network is processed in a model quantization mode so as to convert floating point type data of high-order numbers into integer type data of low-order numbers. Alternatively, 32-bit single precision floating point type data is converted into 8-bit integer type data.
The parameter digit is reduced in a quantization mode in the training process, the calculated amount is reduced, the model volume is reduced, and the memory occupation in operation is reduced, so that the method is beneficial to the subsequent high-efficiency application at the battery management equipment end.
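A minimal sketch of the float-to-int8 conversion mentioned above, using symmetric per-tensor quantization (one common scheme; the application does not specify which scheme is used):

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map float weights onto int8 codes."""
    scale = max(abs(w) for w in weights) / 127.0   # assumes not all weights are zero
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]

weights = [0.50, -0.25, 0.127, -0.5, 0.0]          # hypothetical float32 parameters
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)                     # originals recovered within one step
```

Each 32-bit parameter is thus stored in 8 bits, at the cost of a rounding error of at most half a quantization step.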
The third mode is to prune the model and quantize the model parameters, and specifically includes:
in the training process, the first mode and the second mode are adopted simultaneously, pruning treatment is carried out on the model, and quantization treatment is carried out on model parameters. In this way, model parameter channels with low contribution to the model are eliminated, the number of parameter bits is reduced, the calculation amount of the model is reduced, the volume of the model and the memory occupation during operation are reduced, and the method is beneficial to the subsequent efficient application at the battery management equipment end.
Step S402, converting the intermediate model into a speech synthesis model through a deep learning framework. Optionally, the intermediate model is a model trained and compressed in step S401.
In one example, step S402 specifically includes: the middle model is converted into a format file which can be operated by a battery management system by adopting a deep learning conversion frame, so as to obtain a voice synthesis model; the deep learning transformation framework corresponds to the framework of the training model.
Optionally, when the intermediate model is converted into a format that the battery management system can run using the deep learning conversion framework: if the model was trained with the TensorFlow framework, it is converted with the TensorFlow Lite framework; if the model was trained with the PyTorch framework, it is converted with the PyTorch Lite framework. The converted model is then deployed to the battery management system for subsequent text-to-speech conversion.
Deploying the compressed speech synthesis model to the battery management system through a mobile deep learning framework effectively compresses the model volume without relying on additional deep learning plug-ins and operators at application time, reduces the memory footprint of the model at runtime, and improves the real-time performance of the speech synthesis model.
In summary, by replacing the original self-attention layer in the Transformer TTS model with EDSA, the weight matrix is calculated in a linear manner, which reduces the computational complexity of the speech synthesis model and improves computational efficiency. The parameters of the speech synthesis model are reduced through step S401, which effectively reduces the computation of the model and reduces the model volume and runtime memory footprint. Through step S402, the model is ported to the battery management system, so that the speech synthesis model does not need to rely on a cloud server, which saves data transmission time and improves the real-time performance of model conversion and the flexibility of application.
With continued reference to fig. 1, step S103 is performed after step S102.
Step S103, controlling a player to perform voice broadcasting according to the voice signal;
and controlling the player to perform voice broadcasting according to the voice signal identified in the step S102. Optionally, the player is a speaker, a loudspeaker, or the like mounted in the battery exchange device. The battery information is broadcasted in real time in the mode, so that a user can grasp the working state of the battery in real time, various parameters of the battery equipment in the running process are ensured to be in a normal range, and the abnormal state of the battery can be processed in time.
In summary, the data collected by the sensor is filtered and structured in step S101, so that the quality of the data is ensured, and the accuracy of converting the text into the voice data in the following process is ensured.
Through step S102, the original self-attention layer in the Transformer TTS model is replaced by EDSA, so that the weight matrix is calculated in a linear manner. This reduces the computational complexity of the speech synthesis model, improves computational efficiency, and further lowers the hardware requirements of the whole method, so that the model can be deployed on terminal equipment such as a battery swap cabinet. Pruning and quantizing the model effectively reduces the computation of the model and reduces the model volume and runtime memory footprint. Porting the model to the battery management system allows the speech synthesis model to be applied directly at the battery end without relying on a cloud server, which saves data transmission time and improves the real-time performance of model conversion and the flexibility of application.
Through step S103, the battery information is subjected to voice broadcasting, so that a user can grasp the working state of the battery in real time, various parameters of the battery equipment in the running process are ensured to be in a normal range, and the abnormal state of the battery can be processed in time.
In a second aspect, an embodiment of the present application provides a battery information voice broadcasting device, and fig. 4 is a block diagram of a battery voice broadcasting device according to an exemplary embodiment, where the device includes:
the acquisition module 100: for acquiring the parameter information of the target battery in the current state and converting the parameter information into text data;
the conversion module 200: for converting the text data into a speech signal according to a pre-trained speech synthesis model;
the broadcasting module 300: for controlling the player to perform voice broadcasting according to the speech signal.
The speech synthesis model is configured as a Transformer TTS model fused with an EDSA network, and specifically includes an encoder and a decoder, the decoder including the EDSA network and a self-attention layer in data connection with the EDSA network.
The EDSA network is used for carrying out linear calculation on the input multi-dimensional characteristic data to obtain processing data.
The self-attention layer is used for processing the processed data and the data output by the encoder.
In one example, the EDSA network in the conversion module 200 includes two fully connected layers, and specifically includes:
a weight unit: for receiving the multi-dimensional features and obtaining a weight matrix for each dimension in the multi-dimensional features,
wherein V_t is the multi-dimensional feature, w_t is the weight matrix, the weight parameters are randomly initialized, Linear1 and Linear2 are fully connected layers, and a d-dimensional vector is obtained through Linear1 and Linear2 respectively;
an output unit: for calculating the output vector, normalizing the weight matrix through a softmax function and regularizing it through a dropout function to obtain the processed data, wherein ŷ is the processed data, j is an integer from 0 to d, ŷ ∈ R^d, and V_j represents the features of the first j dimensions of the vector.
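As a concrete illustration, the linear weighting described for the weight unit and output unit can be sketched as follows. This is a minimal NumPy sketch under one plausible reading of the unit: the tensor shapes, the composition of Linear1 and Linear2, and the omission of dropout at inference are assumptions, not the patent's exact formulation.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def edsa(V, W1, W2):
    # V  : (n, d) sequence of d-dimensional features
    # W1 : (d, h) plays the role of Linear1, W2 : (h, 1) of Linear2,
    #      both randomly initialized (hypothetical shapes)
    w = (V @ W1) @ W2           # raw weights from two linear maps, O(n*d*h)
    a = softmax(w, axis=0)      # normalize the weights with softmax
    # dropout regularization would randomly mask `a` during training
    return (a * V).sum(axis=0)  # (d,) processed data: weighted sum of features

rng = np.random.default_rng(0)
V = rng.standard_normal((5, 8))
W1 = 0.1 * rng.standard_normal((8, 16))
W2 = 0.1 * rng.standard_normal((16, 1))
y = edsa(V, W1, W2)
```

The key property is that the cost grows linearly with the sequence length n, unlike the quadratic cost of dot-product self-attention, which is what makes the layer attractive for terminal deployment.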
In one example, the acquisition module 100 includes:
a receiving unit: for receiving the target battery parameter information acquired and sent by the sensor, wherein the parameter information includes voltage, current, and temperature;
a filtering unit: for filtering the parameter information to remove abnormal values and obtain filtered data;
a conversion unit: for converting the filtered data into structured text data.
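The receive-filter-structure pipeline of the acquisition module can be sketched as follows; the field names, valid ranges, and report wording are hypothetical, chosen only to illustrate the idea of dropping outliers and rendering structured text for the TTS model.

```python
# Hypothetical valid ranges per parameter (not taken from the patent)
LIMITS = {"voltage": (2.5, 4.5), "current": (-50.0, 50.0), "temperature": (-20.0, 80.0)}

def build_report(samples):
    # 1) filter: drop readings outside the valid range (abnormal values)
    clean = {k: [v for v in vs if LIMITS[k][0] <= v <= LIMITS[k][1]]
             for k, vs in samples.items()}
    # 2) aggregate the remaining samples
    avg = {k: sum(vs) / len(vs) for k, vs in clean.items()}
    # 3) structure: render text the speech synthesis model can consume
    return ("Battery status: voltage {voltage:.2f} V, "
            "current {current:.2f} A, temperature {temperature:.1f} C"
            .format(**avg))

report = build_report({
    "voltage": [3.70, 3.72, 9.99],   # 9.99 is an outlier and is dropped
    "current": [1.2, 1.4],
    "temperature": [30.5, 31.5],
})
```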
In one example, the speech synthesis model in the conversion module 200 is configured to be obtained by:
the acquisition unit 210: for training the Transformer TTS model fused with the EDSA network to obtain an intermediate model;
the conversion unit 220: for converting the intermediate model into a speech synthesis model by means of a deep learning framework.
In one example, the acquisition unit 210 is configured to perform batch normalization on the model parameters:
during batch normalization, the weight corresponding to the feature parameters extracted by each feature channel is adjusted by tuning the coefficient of the regularization term in the batch normalization function, and the feature channels whose weights are less than or equal to a preset threshold are deleted.
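The channel-deletion step can be illustrated with a small sketch: the batch-norm scale factor of each feature channel serves as its importance weight, and channels at or below the threshold are removed. The gamma values and threshold below are illustrative only, not values from the patent.

```python
import numpy as np

gamma = np.array([0.91, 0.02, 0.47, 0.004, 0.63, 0.01])  # BN scale per channel
threshold = 0.05                                          # preset threshold
keep = np.flatnonzero(np.abs(gamma) > threshold)          # surviving channels

# the corresponding layer weights are then sliced to drop pruned channels:
W = np.ones((6, 10))      # toy weight matrix, one row per feature channel
W_pruned = W[keep]        # smaller model, fewer multiply-accumulates
```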
In one example, the acquisition unit 210 includes:
in the training process, the parameter data of each network is processed by model quantization, so that high-bit-width floating-point data is converted into low-bit-width integer data.
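The float-to-integer conversion can be sketched with symmetric linear quantization to int8. This is only a minimal illustration of the idea; production frameworks additionally calibrate activations and keep per-layer or per-channel scales.

```python
import numpy as np

def quantize_int8(x):
    # map float weights onto the int8 range [-128, 127] with a single scale
    scale = max(float(np.abs(x).max()), 1e-12) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # recover an approximation of the original floats
    return q.astype(np.float32) * scale

w = np.array([-0.52, 0.0, 0.13, 0.98], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)   # reconstruction error is bounded by scale/2
```

Storing int8 instead of float32 shrinks the weights by roughly 4x, which is what reduces the model volume and runtime memory footprint on the terminal device.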
In one example, the conversion unit 220 includes:
the intermediate model is converted, using a deep learning conversion framework, into a format file that can be run by the battery management system, thereby obtaining the speech synthesis model; the deep learning conversion framework corresponds to the framework used to train the model.
In summary, the data acquired by the sensor is filtered and structured by the acquisition module 100, which ensures the quality of the data and, in turn, the accuracy of converting text into voice data. The conversion module 200 replaces the original self-attention layer in the Transformer TTS model with the EDSA network, so that the weight matrix is computed in a linear manner. This reduces the computational complexity of the speech synthesis model, improves computational efficiency, and thus lowers the hardware requirements of the whole method, allowing the model to be deployed on terminal devices such as battery swap cabinets. Pruning and quantizing the model effectively reduces its computation, its size, and its memory footprint at runtime. Porting the model to the battery management system allows the speech synthesis model to be applied directly at the battery end without relying on a cloud server, which saves data transmission time and improves the real-time performance and application flexibility of model conversion. Through the broadcasting module 300, the battery information is voice-broadcast, which helps the user grasp the working state of the battery in real time, ensures that each parameter of the battery equipment during operation stays within the normal range, and allows abnormal battery states to be handled in time.
In a third aspect, an embodiment of the present application provides an electronic device. Fig. 5 is a schematic structural diagram of the electronic device provided in the embodiment of the present application. The electronic device includes a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the program, the processor implements the battery information voice broadcasting method based on terminal calculation provided in the first aspect. The electronic device 60 shown in fig. 5 is merely an example and should not impose any limitation on the functions or scope of application of the embodiments of the present application.
The electronic device 60 may be in the form of a general purpose computing device, which may be a server device, for example. Components of electronic device 60 may include, but are not limited to: the at least one processor 61, the at least one memory 62, a bus 63 connecting the different system components, including the memory 62 and the processor 61.
The bus 63 includes a data bus, an address bus, and a control bus.
Memory 62 may include volatile memory such as Random Access Memory (RAM) 621 and/or cache memory 622, and may further include Read Only Memory (ROM) 623.
Memory 62 may also include a program/utility 625 having a set (at least one) of program modules 624, such program modules 624 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 61 executes various functional applications and data processing, such as the end-calculation-based battery information voice broadcasting method of the first aspect of the present application, by running a computer program stored in the memory 62.
The electronic device 60 may also communicate with one or more external devices 64 (e.g., a keyboard, a pointing device, etc.). Such communication may occur through an input/output (I/O) interface 65. The electronic device 60 may also communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 66. As shown, the network adapter 66 communicates with the other modules of the electronic device 60 via the bus 63. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 60, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present application. Conversely, the features and functions of one unit/module described above may be further divided into ones that are embodied by a plurality of units/modules.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon a program that, when executed by a processor, implements a method for voice broadcasting battery information based on end calculation as provided in the first aspect.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation manner, the present application may also be implemented in the form of a program product, which includes a program code for causing a terminal device to perform the steps of implementing the end-calculation based battery information voice broadcast method provided in the first aspect, when the program product is run on the terminal device.
Wherein the program code for carrying out the application may be written in any combination of one or more programming languages, which program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on the remote device or entirely on the remote device.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A battery information voice broadcasting method based on terminal calculation, wherein the method is applied to a battery management system in a battery replacement device and comprises the following steps:
acquiring parameter information of a target battery in a current state, and performing text conversion on the parameter information to obtain text data;
converting the text data into a speech signal according to a pre-trained speech synthesis model;
controlling a player to carry out voice broadcasting according to the voice signal;
the speech synthesis model is configured as a Transformer TTS model fused with an EDSA network, and specifically includes an encoder and a decoder, the decoder including the EDSA network and a self-attention layer in data connection with the EDSA network;
the EDSA network is used for carrying out linear calculation on the input multidimensional characteristic data to obtain processing data;
the self-attention layer is used for processing the processing data and the data output by the encoder.
2. The end-computing-based battery information voice broadcast method of claim 1, wherein the EDSA network comprises two fully connected layers, the EDSA network being specifically configured to:
receiving the multi-dimensional features, and obtaining a weight matrix of each dimension in the multi-dimensional features,
wherein V_t is the multi-dimensional feature, w_t is the weight matrix, the weight parameters are randomly initialized, Linear1 and Linear2 are fully connected layers, and a d-dimensional vector is obtained through Linear1 and Linear2 respectively;
calculating an output vector, normalizing the weight matrix through a softmax function, and regularizing through a dropout function to obtain the processed data,
wherein ŷ is the processed data, j is an integer from 0 to d, ŷ ∈ R^d, and V_j represents the features of the first j dimensions of the vector.
3. The method for voice broadcasting battery information based on end calculation according to claim 1, wherein the obtaining parameter information of the target battery in the current state, performing text conversion on the parameter information to obtain text data, includes:
receiving target battery parameter information acquired and sent by a sensor, wherein the parameter information comprises voltage, current and temperature;
filtering the parameter information to remove abnormal values and obtain filtered data;
the filtered data is converted into structured text data.
4. The end-computing-based battery information voice broadcast method of claim 1, wherein the voice synthesis model is configured to be obtained by:
training the Transformer TTS model fused with the EDSA network to obtain an intermediate model;
the intermediate model is converted to the speech synthesis model by a deep learning framework.
5. The method for voice broadcasting battery information based on end calculation according to claim 4, wherein obtaining the intermediate model comprises: performing batch normalization on the model parameters:
during batch normalization, the weight corresponding to the feature parameters extracted by each feature channel is adjusted by tuning the coefficient of the regularization term in the batch normalization function; and the feature channels whose weights are less than or equal to a preset threshold are deleted.
6. The method for voice broadcasting battery information based on end calculation according to claim 4 or 5, wherein the obtaining the intermediate model further comprises:
in the training process, the parameter data of each network is processed by model quantization, so that high-bit-width floating-point data is converted into low-bit-width integer data.
7. The end-computing-based battery information voice broadcasting method of claim 4, wherein the converting the intermediate model into the speech synthesis model by a deep learning framework comprises:
the intermediate model is converted, using a deep learning conversion framework, into a format file that can be run by the battery management system, thereby obtaining the speech synthesis model; the deep learning conversion framework corresponds to the framework used to train the model.
8. A battery information voice broadcast device, comprising:
an acquisition module: for acquiring parameter information of a target battery in a current state and performing text conversion on the parameter information to obtain text data;
a conversion module: for converting the text data into a speech signal according to a pre-trained speech synthesis model;
a broadcasting module: for controlling a player to perform voice broadcasting according to the speech signal;
the speech synthesis model is configured as a Transformer TTS model fused with an EDSA network, and specifically includes an encoder and a decoder, the decoder including the EDSA network and a self-attention layer in data connection with the EDSA network;
the EDSA network is used for carrying out linear calculation on the input multidimensional characteristic data to obtain processing data;
the self-attention layer is used for processing the processing data and the data output by the encoder.
9. An electronic device, comprising:
a memory,
a processor, and
a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the battery information voice broadcasting method based on end calculation according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the end-calculation based battery information voice broadcast method according to any one of claims 1 to 7.
CN202311425249.2A 2023-10-31 2023-10-31 Battery information voice broadcasting method and device based on terminal calculation Active CN117153144B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311425249.2A CN117153144B (en) 2023-10-31 2023-10-31 Battery information voice broadcasting method and device based on terminal calculation


Publications (2)

Publication Number Publication Date
CN117153144A true CN117153144A (en) 2023-12-01
CN117153144B CN117153144B (en) 2024-02-06

Family

ID=88910516

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311425249.2A Active CN117153144B (en) 2023-10-31 2023-10-31 Battery information voice broadcasting method and device based on terminal calculation

Country Status (1)

Country Link
CN (1) CN117153144B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN205282538U (en) * 2015-12-31 2016-06-01 长安大学 New energy automobile battery compartment cooling firebreak device
CN110534089A (en) * 2019-07-10 2019-12-03 西安交通大学 A kind of Chinese speech synthesis method based on phoneme and rhythm structure
US20200082807A1 (en) * 2018-01-11 2020-03-12 Neosapience, Inc. Text-to-speech synthesis method and apparatus using machine learning, and computer-readable storage medium
CN111721535A (en) * 2020-06-23 2020-09-29 中国人民解放军战略支援部队航天工程大学 Bearing fault detection method based on convolution multi-head self-attention mechanism
US20210035551A1 (en) * 2019-08-03 2021-02-04 Google Llc Controlling Expressivity In End-to-End Speech Synthesis Systems
CN112638242A (en) * 2018-08-24 2021-04-09 马塞洛·马利尼·拉梅戈 Monitoring device and system
CN114842826A (en) * 2022-04-25 2022-08-02 马上消费金融股份有限公司 Training method of speech synthesis model, speech synthesis method and related equipment
CN114882862A (en) * 2022-04-29 2022-08-09 华为技术有限公司 Voice processing method and related equipment
US20230343319A1 (en) * 2022-04-22 2023-10-26 Papercup Technologies Limited speech processing system and a method of processing a speech signal
CN116941947A (en) * 2023-08-29 2023-10-27 武义零智不锈钢制品有限公司 AI intelligent cover


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI ZHAO ET AL.: "Enhancing Local Dependencies for Transformer-Based Text-to-Speech via Hybrid Lightweight Convolution", IEEE Access, vol. 9 *
WU Bangyu; ZHOU Yue; ZHAO Qunfei; ZHANG Pengzhu: "A Chinese Dialogue Model Using Pinyin Dimensionality Reduction", Journal of Chinese Information Processing, no. 05 *

Also Published As

Publication number Publication date
CN117153144B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
CN111627418B (en) Training method, synthesizing method, system, device and medium for speech synthesis model
CN110718211B (en) Keyword recognition system based on hybrid compressed convolutional neural network
CN112185352A (en) Voice recognition method and device and electronic equipment
US11507324B2 (en) Using feedback for adaptive data compression
CN107240396B (en) Speaker self-adaptation method, device, equipment and storage medium
CN111627428B (en) Method for constructing compressed speech recognition model
US20220207356A1 (en) Neural network processing unit with network processor and convolution processor
CN112565777A (en) Deep learning model-based video data transmission method, system, medium and device
EP3507799A1 (en) Quantizer with index coding and bit scheduling
CN117153144B (en) Battery information voice broadcasting method and device based on terminal calculation
CN111816197B (en) Audio encoding method, device, electronic equipment and storage medium
CN111653261A (en) Speech synthesis method, speech synthesis device, readable storage medium and electronic equipment
CN110570877A (en) Sign language video generation method, electronic device and computer readable storage medium
CN114330239A (en) Text processing method and device, storage medium and electronic equipment
CN115936092A (en) Neural network model quantization method and device, storage medium and electronic device
CN111104951A (en) Active learning method and device and terminal equipment
CN111583902B (en) Speech synthesis system, method, electronic device and medium
CN114139703A (en) Knowledge distillation method and device, storage medium and electronic equipment
JP4603429B2 (en) Client / server speech recognition method, speech recognition method in server computer, speech feature extraction / transmission method, system, apparatus, program, and recording medium using these methods
KR20020081586A (en) Data processing device
CN115102852B (en) Internet of things service opening method and device, electronic equipment and computer medium
CN116343781A (en) Training method and device of voice recognition model, storage medium and electronic equipment
US20230075562A1 (en) Audio Transcoding Method and Apparatus, Audio Transcoder, Device, and Storage Medium
CN114970955B (en) Short video heat prediction method and device based on multi-mode pre-training model
WO2020149511A1 (en) Electronic device and control method therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant