CN112837702A - Voice emotion distributed system and voice signal processing method - Google Patents
- Publication number: CN112837702A
- Application number: CN202011630403.6A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/568—Storing data temporarily at an intermediate stage, e.g. caching
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Health & Medical Sciences (AREA)
- Hospice & Palliative Care (AREA)
- General Health & Medical Sciences (AREA)
- Child & Adolescent Psychology (AREA)
- Psychiatry (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention relates to a voice emotion distributed system and a voice signal processing method. The processing method comprises: S1, receiving a voice signal; S2, selecting an offline analysis mode or a real-time analysis mode. In the offline analysis mode, the voice signal is cut into voice blocks and emotion analysis is performed on the voice blocks; in the real-time analysis mode, real-time emotion analysis is performed on the voice stream formed by the voice signal. The method can apply speech emotion analysis technology to real scenarios at large scale and low cost, ensuring the accuracy of speech emotion analysis while meeting requirements such as high performance and high scalability and reducing usage and maintenance costs.
Description
Technical Field
The invention relates to the field of signal processing, in particular to a voice emotion distributed system and a voice signal processing method.
Background
Speech is an important behavioral signal reflecting human emotion, and emotion recognition based on speech signals has received wide attention and study in recent years. Current emotion recognition from speech falls mainly into two categories, corresponding to two ways of representing emotion. The first representation uses discrete categories; the six most commonly used basic emotions are happiness, sadness, anger, disgust, fear, and surprise. The second representation is based on dimensional vectors, the most common being the arousal-valence space (a 2-dimensional emotion space). Neither technique has yet been widely popularized and applied in real-world scenarios; two example scenarios follow.
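As a concrete illustration of the two emotion representations just described, the sketch below maps points in the dimensional (arousal-valence) space to the nearest basic-emotion category. The coordinates are illustrative assumptions for this sketch, not values taken from this patent or any cited work.

```python
# The six basic emotions (categorical representation).
BASIC_EMOTIONS = {"happiness", "sadness", "anger", "disgust", "fear", "surprise"}

# Hypothetical (arousal, valence) coordinates in [-1, 1] for a few emotions
# (dimensional representation); a real system would learn these from data.
AROUSAL_VALENCE = {
    "happiness": (0.6, 0.8),
    "sadness": (-0.5, -0.6),
    "anger": (0.8, -0.7),
}

def nearest_basic_emotion(arousal: float, valence: float) -> str:
    """Map a point in the 2-D emotion space to the closest labelled emotion."""
    return min(
        AROUSAL_VALENCE,
        key=lambda e: (AROUSAL_VALENCE[e][0] - arousal) ** 2
                    + (AROUSAL_VALENCE[e][1] - valence) ** 2,
    )
```

Under these assumed coordinates, a high-arousal, negative-valence point such as (0.7, -0.8) falls nearest to anger.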
scene 1: in the question-answering process of signature, the prior medical history and the prior diseases are concealed, or the real age, the real occupation and the like are concealed; in the process of claim settlement, there are cases such as intentionally hiding facts and events, imposition, and the like. At present, a speech emotion system is used in the scenes, but most of the scenes can only analyze speech emotion, the requirement on speech quality is high, and a uniform scheduling center, a report center and the like are not provided.
Scene 2: the public security and inspection authorities use technical means to discover the emotion of the interlocutor in real time in the case conversation and auditing process, so that the whole processing process is more accurate. In the current market, part of speech emotion recognition systems are used, but the speech emotion recognition systems are basically single integrated machines combining software and hardware, cannot be networked with multiple terminals for simultaneous processing and use, are high in use cost and poor in maintenance and upgrading performance.
Disclosure of Invention
To address the defects of the prior art, the invention discloses a voice emotion distributed system and a voice signal processing method.
The technical scheme adopted by the invention is as follows:
a speech signal processing method based on a speech emotion distributed system comprises the following steps:
S1, receiving a voice signal;
S2, selecting an offline analysis mode or a real-time analysis mode; in the offline analysis mode, cutting the voice signal into voice blocks and performing emotion analysis on the voice blocks; in the real-time analysis mode, performing real-time emotion analysis on the voice stream formed by the voice signal.
The further technical scheme is that the real-time analysis mode specifically comprises the following steps:
S101, entering a real-time analysis mode;
S102, starting a voice simultaneous transmission function and simultaneously transmitting the voice signals to a plurality of terminals of the distributed system;
S103, performing identification authorization to verify the identity of the user; after step S103, step S104 is executed or step S106 is executed directly;
S104, stream pushing: pushing the voice stream formed by the voice signal to a local cache module;
S105, caching the pushed voice signals locally;
S106, task allocation or service switching: sequencing and scheduling the voice signals to be analyzed, transmitting the voice signals with high processing priority to a voice stream real-time analysis module, and executing S107 on the other voice signals with low processing priority;
S107, stream receiving and buffering: receiving and buffering the other voice signals with low processing priority;
S108, real-time voice stream analysis: receiving the voice signals to be analyzed and performing real-time emotion analysis on them;
S109, feeding back the analysis result;
S110, processing and displaying the analysis result;
S111, ending the authorization.
The further technical scheme is that the offline analysis mode specifically comprises the following steps:
S200, transmitting the voice signal into the system and cutting it into voice blocks;
S204, performing emotion analysis on the voice blocks;
S205, feeding back the analysis result;
S206, processing and displaying the analysis result.
A further technical scheme is that the step S200 specifically includes:
S201, uploading a voice source file;
S202, voice cutting and transcoding: automatically cutting the voice source file into voice blocks convenient to analyze;
S203, task allocation or service switching: sequencing and scheduling the voice signals to be analyzed, and transmitting the voice signals with high processing priority to a voice block analysis module;
step S204 is then executed.
Alternatively:
S207, manually cutting the voice signals into voice blocks through a voice cutting interface;
S208, locally caching the cut voice blocks;
S209, performing identification authorization to verify the identity of the user;
S213, task allocation or service switching: sequencing and scheduling the voice signals to be analyzed, and transmitting the voice signals with high processing priority to the voice block analysis module;
step S204 is then performed.
A voice emotion distributed system comprises a voice signal receiving module, a mode selection judging module, an offline analysis module and a real-time analysis module. The voice signal receiving module is used for receiving and uploading voice signals; the mode selection judging module receives the voice signal uploaded by the voice signal receiving module and selects an offline analysis mode or a real-time analysis mode; the mode selection judging module then transmits the voice signal either to the offline analysis module for offline analysis of the emotion of the voice signal, or to the real-time analysis module for real-time analysis of the emotion of the voice signal.
The further technical scheme is as follows:
the real-time analysis module comprises:
the real-time analysis starting module is used for starting and entering a real-time analysis mode;
the voice simultaneous transmission module is used for starting a voice simultaneous transmission function and simultaneously transmitting the received voice signals to a plurality of terminals of the distributed system;
an identification authorization module for verifying the identity of the user prior to the analysis and for turning off the authorization after the analysis;
the stream pushing module is used for pushing a voice stream formed by the voice signal to the voice local cache module;
the voice local cache module is used for caching voice signals;
the task allocation or service switching module is used for sequencing and scheduling the voice signals to be analyzed, transmitting the voice signals with high processing priority to the voice stream real-time analysis module, and transmitting the other voice signals with low processing priority to the stream receiving and buffering module;
the stream receiving and buffering module is used for receiving and buffering the voice signals with low processing priority;
the voice stream real-time analysis module is used for receiving the voice signals to be analyzed and performing real-time emotion analysis on them;
the analysis result feedback module is used for feeding back the analysis result;
and the analysis result processing and displaying module is used for processing and displaying the analysis result.
The further technical scheme is as follows:
the offline analysis module includes:
the voice signal primary processing module is used for transmitting the voice signal into the system and cutting the voice signal into voice blocks;
the voice block analysis module is used for carrying out emotion analysis on the voice block;
the analysis result feedback module is used for feeding back an analysis result;
and the analysis result processing and displaying module is used for processing and displaying the analysis result.
The further technical scheme is that the voice signal preliminary processing module comprises:
the voice source uploading module is used for uploading a voice source file;
the voice cutting transcoding module is used for automatically cutting the voice source file into voice blocks and converting the voice blocks into a format suitable for processing;
the task allocation or service switching module is used for receiving the voice signals needing to be analyzed and performing emotion analysis on the voice signals;
the voice cutting interface module is used for manually cutting the voice signals into voice blocks;
the voice local cache module is used for locally caching the cut voice blocks;
and the identification authorization module is used for verifying the identity of the user before analysis.
The invention has the following beneficial effects:
The invention can apply speech emotion analysis technology to real scenarios at large scale and low cost. The system uses a mature core algorithm, builds a distributed platform, and supports two analysis scenarios, offline analysis and real-time analysis, thereby ensuring the accuracy of speech emotion analysis while meeting requirements such as high performance and high scalability and reducing usage and maintenance costs. The system has been widely used in actual enterprises and has been tested in real production environments.
Drawings
FIG. 1 is a flow chart of a real-time analysis mode of the present invention.
FIG. 2 is a flow chart of an offline analysis mode according to the present invention.
Detailed Description
The following describes embodiments of the present invention with reference to the drawings.
A speech signal processing method based on a speech emotion distributed system comprises the following steps:
S1, receiving a voice signal;
S2, selecting an offline analysis mode or a real-time analysis mode; in the offline analysis mode, cutting the voice signal into voice blocks and performing emotion analysis on the voice blocks; in the real-time analysis mode, performing real-time emotion analysis on the voice stream formed by the voice signal.
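The S1/S2 dispatch above can be sketched as follows. This is a minimal illustration in Python (the system itself is described later as built with .NET Core and C++), and `analyse` is a hypothetical placeholder for the emotion-analysis kernel, not part of the patent.

```python
from enum import Enum

class AnalysisMode(Enum):
    OFFLINE = "offline"
    REALTIME = "realtime"

def analyse(segment: bytes) -> dict:
    # Placeholder for the emotion-analysis kernel; returns a dummy result.
    return {"n_bytes": len(segment), "emotion": "neutral"}

def process_signal(signal: bytes, mode: AnalysisMode, block_size: int = 4) -> list:
    """S1/S2: receive a voice signal, then dispatch by analysis mode.

    Offline: cut the signal into fixed-size voice blocks and analyse each
    block.  Real-time: treat the whole signal as one continuous stream.
    """
    if mode is AnalysisMode.OFFLINE:
        blocks = [signal[i:i + block_size]
                  for i in range(0, len(signal), block_size)]
        return [analyse(b) for b in blocks]
    return [analyse(signal)]
```

A 10-byte signal in offline mode with the default block size thus yields three voice blocks (4, 4, and 2 bytes), while real-time mode yields a single result for the whole stream.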
FIG. 1 is a flow chart of a real-time analysis mode of the present invention. As shown in fig. 1, the real-time analysis mode specifically includes:
S101, entering the real-time analysis mode;
S102, starting the voice simultaneous transmission function and simultaneously transmitting the voice signals to a plurality of terminals of the distributed system;
S103, performing identification authorization to verify the identity of the user; the identification authorization can be performed through the client. After step S103, step S106 is executed to sequence and schedule the voice signals to be analyzed, or step S104 is executed to temporarily cache the voice signals locally;
S104, stream pushing: the voice stream formed by the voice signal is pushed to the voice local cache module, and step S105 is then executed;
S105, local voice caching: the voice signals are cached, and step S106 is then executed to sequence and schedule the cached voice signals;
S106, task allocation or service switching: the voice signals to be analyzed are sequenced and scheduled, the voice signals with high processing priority are transmitted to the voice stream real-time analysis module, and step S107 is executed on the other voice signals with low processing priority;
S107, stream receiving and buffering: the other voice signals with low processing priority are received and buffered;
S108, real-time voice stream analysis: the voice signals to be analyzed are received and real-time emotion analysis is performed on them;
S109, feeding back the analysis result;
S110, processing and displaying the analysis result;
S111, corresponding to step S103, ending the client identification authorization.
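Steps S106/S107 amount to priority scheduling. The sketch below illustrates the idea under two assumptions made only for this illustration: a fixed analysis capacity, and a convention that a lower number means higher processing priority.

```python
from dataclasses import dataclass, field

@dataclass(order=True)
class VoiceTask:
    priority: int                  # lower value = higher processing priority
    stream_id: str = field(compare=False)

def allocate(tasks: list, capacity: int) -> tuple:
    """S106/S107: sequence the pending voice signals by priority.

    The `capacity` highest-priority tasks go straight to the real-time
    analysis module; the rest are handed to the stream receiving and
    buffering module to wait their turn.
    """
    ordered = sorted(tasks)
    return ordered[:capacity], ordered[capacity:]
```

For example, with capacity 2, tasks of priority 1 and 2 would be dispatched for immediate analysis while a priority-3 task is buffered.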
FIG. 2 is a flow chart of an offline analysis mode according to the present invention. As shown in fig. 2, the offline analysis mode specifically includes the following steps that are sequentially executed:
S200, transmitting the voice signal into the system and cutting it into voice blocks convenient for analysis;
S204, performing emotion analysis on the voice blocks;
S205, feeding back the analysis result;
S206, processing and displaying the analysis result.
Further, step S200 specifically includes the following steps that are sequentially executed:
S201, uploading a voice source file;
S202, voice cutting and transcoding: the voice source file is automatically cut into voice blocks convenient to analyze;
S203, task allocation or service switching: sequencing and scheduling the voice signals to be analyzed, and transmitting the voice signals with high processing priority to the voice block analysis module.
Step S204 is then performed.
Alternatively, step S200 specifically includes the following steps that are sequentially executed:
S207, manually cutting the received voice signal into voice blocks through the voice cutting interface;
S208, locally caching the voice blocks;
S209, performing identification authorization to verify the identity of the user;
S203, task allocation or service switching: sequencing and scheduling the voice signals to be analyzed, and transmitting the voice signals with high processing priority to the voice block analysis module.
Step S204 is then performed.
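The two offline entry paths (automatic cutting, S201-S203, versus manual cutting, S207-S209) converge at step S204. The sketch below illustrates that control flow; `auto_cut` and `analyse_block` are hypothetical stand-ins, and a `'|'` character marks an assumed cut point for the example.

```python
def auto_cut(source: str) -> list:
    """Stand-in for S202 automatic cutting; a '|' marks a cut point."""
    return [block for block in source.split("|") if block]

def analyse_block(block: str) -> tuple:
    # Placeholder for the voice block analysis module (S204).
    return (block, "neutral")

def offline_analyse(source: str, manual_blocks: list = None) -> list:
    """Offline mode: both entry paths converge at S204.

    Either the uploaded source file is cut automatically (S201-S203), or
    manually cut blocks from the voice cutting interface (S207-S209) are
    used directly; every resulting block is then analysed.
    """
    blocks = manual_blocks if manual_blocks is not None else auto_cut(source)
    return [analyse_block(b) for b in blocks]
```

Calling `offline_analyse("ab|cd")` exercises the automatic path; passing `manual_blocks` exercises the manual path.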
The invention also discloses a voice emotion distributed system for realizing the above voice signal processing method, comprising a voice signal receiving module, a mode selection judging module, an offline analysis module and a real-time analysis module. The voice signal receiving module is used for receiving and uploading voice signals; the mode selection judging module receives the voice signal uploaded by the voice signal receiving module and selects an offline analysis mode or a real-time analysis mode; it then transmits the voice signal either to the offline analysis module for offline analysis of the emotion of the voice signal, or to the real-time analysis module for real-time analysis of the emotion of the voice signal.
Specifically, the real-time analysis module includes:
the voice simultaneous transmission module is used for starting the voice simultaneous transmission function and simultaneously transmitting the received voice signals to a plurality of terminals of the distributed system;
the identification authorization module is used for verifying the identity of the user before real-time analysis and ending the authorization after the real-time analysis is finished;
the stream pushing module is used for pushing the voice stream formed by the voice signal to the voice local cache module;
the voice local cache module is used for receiving and caching the voice signals pushed by the stream pushing module;
the task allocation or service switching module is used for sequencing and scheduling the voice signals to be analyzed, transmitting the voice signals with high processing priority to the voice stream real-time analysis module, and transmitting the other voice signals with low processing priority to the stream receiving and buffering module;
the stream receiving and buffering module is used for receiving and buffering the voice signals with low processing priority;
the voice stream real-time analysis module is used for receiving the voice signals to be analyzed and performing real-time emotion analysis on them;
the analysis result feedback module is used for feeding back the analysis result;
and the analysis result processing and displaying module is used for processing and displaying the analysis result.
The offline analysis module includes:
the voice signal primary processing module is used for transmitting the voice signal into the system and cutting the voice signal into voice blocks;
the voice block analysis module is used for carrying out emotion analysis on the voice block;
the analysis result feedback module is used for feeding back an analysis result;
and the analysis result processing and displaying module is used for processing and displaying the analysis result.
Specifically, the voice signal preliminary processing module includes:
the voice source uploading module is used for uploading a voice source file;
the voice cutting and transcoding module is used for automatically cutting the voice source file into voice blocks convenient for analysis and converting them into a format suitable for processing;
the task allocation or service switching module is used for receiving the voice signals to be analyzed and performing emotion analysis on them;
the voice cutting interface module is used for manually cutting the voice signals into voice blocks;
the voice local cache module is used for locally caching the manually cut voice blocks;
and the identification authorization module is used for verifying the identity of the user before analysis.
The system is built with .NET Core and the C++ language, and adopts a distributed B/S (browser/server) architecture.
According to the disclosure, the voice cutting or voice cutting and transcoding algorithms and modules take two specific forms. For noiseless voice signals, such as voice in question-and-answer form, a manual cutting workbench (the voice cutting interface module) is provided; an automatic cutting service (the voice cutting and transcoding module) can also be used, generally implemented with the ffmpeg component. For noisy voice signals, the system can also provide a public API interface for integration, supporting the docking of source audio files after voice cutting; for example, voice cutting systems from third-party speech-technology vendors can be integrated.
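As a sketch of the automatic-cutting service, the helper below builds an ffmpeg command line that splits a source file into fixed-length blocks using ffmpeg's segment muxer with stream copy (no re-encoding). The 30-second default block length and the file names are assumptions made for illustration, not values from the patent.

```python
def ffmpeg_segment_cmd(src: str, out_pattern: str = "block%03d.wav",
                       seconds: int = 30) -> list:
    """Build an ffmpeg command that cuts `src` into fixed-length voice blocks.

    Uses ffmpeg's segment muxer (`-f segment`) with stream copy (`-c copy`),
    so the audio is split on block boundaries without re-encoding.
    """
    return ["ffmpeg", "-i", src,
            "-f", "segment", "-segment_time", str(seconds),
            "-c", "copy", out_pattern]
```

The returned list can be handed to a process runner (e.g. `subprocess.run`) on a machine where ffmpeg is installed.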
Regarding the speech analysis techniques mentioned above:
The offline analysis algorithm for voice signals and the related modules can adopt the speech emotion analysis algorithm kernel of Nemesysco, with the voice analysis background service built using .NET Core technologies; microservice and distributed development techniques and components are used to support distributed deployment of the system.
The online analysis algorithm for voice signals and the related modules can likewise adopt the Nemesysco speech emotion analysis algorithm kernel, with the distributed online voice analysis background service built using .NET Core and C++; microservice and distributed development techniques and components are used to support distributed deployment of the system.
The task allocation or service switching algorithm and the related modules can adopt the .NET Core Task pool, a callback mechanism, Socket technology and the like to build a unified analysis-task management function that manages the process and progress of analysis tasks, and supports automatic scheduling, manual scheduling, task progress reminders and the like.
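A Python analogue of the task-pool-plus-callback pattern described above: the patent names the .NET Core Task pool, and `concurrent.futures` plays the same role here, with a callback collecting the result of each analysis task as it completes.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

results = []
lock = threading.Lock()

def analyse_task(stream_id: str) -> tuple:
    # Stand-in for one analysis task managed by the task pool.
    return (stream_id, "done")

def on_complete(future):
    # Callback fired when a task finishes; a progress reminder or result
    # feedback hook would live here.
    with lock:
        results.append(future.result())

with ThreadPoolExecutor(max_workers=4) as pool:
    for sid in ("a", "b", "c"):
        pool.submit(analyse_task, sid).add_done_callback(on_complete)

# Leaving the `with` block waits for every task, so all callbacks have run.
```

The lock is needed because callbacks run on worker threads; in the real system the callback would instead report progress to the scheduling center.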
The analysis result processing and displaying algorithm and the related modules can adopt VueJS, WebApi, ECharts and other technologies to build a report center that graphically displays the results of speech emotion analysis and supports export of PDF reports.
In summary, the specific implementation methods of speech signal segmentation, transcoding, emotion analysis, task allocation or service switching, report export and the like in the present invention are all prior art, and those skilled in the art can combine them to build the system as needed.
The foregoing description is illustrative of the present invention and is not to be construed as limiting thereof, the scope of the invention being defined by the appended claims, which may be modified in any manner without departing from the basic structure thereof.
Claims (8)
1. A speech signal processing method based on a speech emotion distributed system is characterized by comprising the following steps:
S1, receiving a voice signal;
S2, selecting an offline analysis mode or a real-time analysis mode; in the offline analysis mode, cutting the voice signal into voice blocks and performing emotion analysis on the voice blocks; in the real-time analysis mode, performing real-time emotion analysis on the voice stream formed by the voice signal.
2. The speech signal processing method based on the speech emotion distributed system as claimed in claim 1, wherein the real-time analysis mode specifically includes:
S101, entering a real-time analysis mode;
S102, starting a voice simultaneous transmission function and simultaneously transmitting the voice signals to a plurality of terminals of the distributed system;
S103, performing identification authorization to verify the identity of the user; after step S103, step S104 is executed or step S106 is executed directly;
S104, stream pushing: pushing the voice stream formed by the voice signal to a local cache module;
S105, caching the pushed voice signals locally;
S106, task allocation or service switching: sequencing and scheduling the voice signals to be analyzed, transmitting the voice signals with high processing priority to a voice stream real-time analysis module, and executing S107 on the other voice signals with low processing priority;
S107, stream receiving and buffering: receiving and buffering the other voice signals with low processing priority;
S108, real-time voice stream analysis: receiving the voice signals to be analyzed and performing real-time emotion analysis on them;
S109, feeding back the analysis result;
S110, processing and displaying the analysis result;
S111, ending the authorization.
3. The speech signal processing method based on the speech emotion distributed system as claimed in claim 1, wherein the offline analysis mode specifically includes:
S200, transmitting the voice signal into the system and cutting it into voice blocks;
S204, performing emotion analysis on the voice blocks;
S205, feeding back the analysis result;
S206, processing and displaying the analysis result.
4. The speech signal processing method based on the speech emotion distributed system according to claim 3, wherein step S200 specifically includes:
S201, uploading a voice source file;
S202, voice cutting and transcoding: automatically cutting the voice source file into voice blocks convenient to analyze;
S203, task allocation or service switching: sequencing and scheduling the voice signals to be analyzed, and transmitting the voice signals with high processing priority to a voice block analysis module;
step S204 is then executed.
Alternatively:
S207, manually cutting the voice signals into voice blocks through a voice cutting interface;
S208, locally caching the cut voice blocks;
S209, performing identification authorization to verify the identity of the user;
S213, task allocation or service switching: sequencing and scheduling the voice signals to be analyzed, and transmitting the voice signals with high processing priority to the voice block analysis module;
step S204 is then performed.
5. A voice emotion distributed system, characterized by comprising a voice signal receiving module, a mode selection judging module, an offline analysis module and a real-time analysis module; the voice signal receiving module is used for receiving and uploading voice signals; the mode selection judging module receives the voice signal uploaded by the voice signal receiving module and selects an offline analysis mode or a real-time analysis mode; the mode selection judging module transmits the voice signal to the offline analysis module for offline analysis of the emotion of the voice signal, or transmits it to the real-time analysis module for real-time analysis of the emotion of the voice signal.
6. The distributed system of speech emotion of claim 5, wherein:
the real-time analysis module comprises:
the real-time analysis starting module is used for starting and entering the real-time analysis mode;
the voice simultaneous transmission module is used for starting the voice simultaneous transmission function and simultaneously transmitting the received voice signals to a plurality of terminals of the distributed system;
the identification authorization module is used for verifying the identity of the user before the analysis and revoking the authorization after the analysis;
the stream pushing module is used for pushing the voice stream formed by the voice signal to the voice local cache module;
the voice local cache module is used for caching the voice signals;
the task allocation or service switching module is used for sequencing and scheduling the voice signals to be analyzed, transmitting the voice signals with high processing priority to the voice stream real-time analysis module, and transmitting the other voice signals with low processing priority to the stream receiving cache module;
the stream receiving cache module is used for receiving and buffering the voice signals with low processing priority;
the voice stream real-time analysis module is used for receiving the voice signals to be analyzed and performing real-time emotion analysis on them;
the analysis result feedback module is used for feeding back the analysis result;
and the analysis result processing and displaying module is used for processing and displaying the analysis result.
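The sequencing and scheduling behavior of the task allocation or service switching module in claim 6 can be sketched as a small priority scheduler. This is an illustrative assumption, not the patent's implementation: the class name `TaskScheduler`, the numeric priority convention (lower value = higher priority), and the `threshold` parameter are all invented for the example.

```python
import heapq


class TaskScheduler:
    """Sketch of the task allocation or service switching module:
    high-priority voice signals go to real-time analysis, the rest
    are placed in the stream-receiving cache."""

    def __init__(self):
        self._queue = []
        self._counter = 0          # preserves FIFO order within one priority
        self.buffered = []         # stands in for the stream receiving cache module

    def submit(self, signal, priority):
        # Lower number means higher processing priority.
        heapq.heappush(self._queue, (priority, self._counter, signal))
        self._counter += 1

    def route(self, threshold=0):
        """Drain the queue in priority order; return the signals to be
        analyzed in real time, buffering the low-priority ones."""
        analyzed = []
        while self._queue:
            priority, _, signal = heapq.heappop(self._queue)
            if priority <= threshold:
                analyzed.append(signal)
            else:
                self.buffered.append(signal)
        return analyzed
```

A heap keeps the highest-priority signals first while the counter guarantees stable ordering among signals of equal priority, which matches the claimed "sequencing and scheduling" of signals awaiting analysis.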
7. The distributed system of speech emotion of claim 5, wherein:
the offline analysis module includes:
the voice signal primary processing module is used for transmitting the voice signal into the system and cutting the voice signal into voice blocks;
the voice block analysis module is used for carrying out emotion analysis on the voice block;
the analysis result feedback module is used for feeding back an analysis result;
and the analysis result processing and displaying module is used for processing and displaying the analysis result.
8. The distributed system of speech emotion according to claim 7, characterized in that the voice signal preliminary processing module comprises:
the voice source uploading module is used for uploading a voice source file;
the voice cutting transcoding module is used for automatically cutting the voice source file into voice blocks and converting the voice blocks into a format suitable for processing;
the task allocation or service switching module is used for receiving the voice signals needing to be analyzed and performing emotion analysis on the voice signals;
the voice cutting interface module is used for manually cutting the voice signals into voice blocks;
the voice local cache module is used for locally caching the cut voice blocks;
and the identification authorization module is used for verifying the identity of the user before analysis.
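The cutting step performed by the voice cutting transcoding module in claim 8 can be sketched as splitting a voice source into fixed-length blocks for per-block emotion analysis. The function name, the block length, and the list-of-samples representation are assumptions for illustration; the patent does not specify how the cutting is performed.

```python
def cut_into_blocks(samples, sample_rate, block_seconds=5.0):
    """Cut a voice source into fixed-length voice blocks.

    samples: a sequence of audio samples (e.g. decoded PCM values)
    sample_rate: samples per second of the source
    block_seconds: assumed block length; the final block may be shorter
    """
    block_len = int(sample_rate * block_seconds)
    return [samples[i:i + block_len] for i in range(0, len(samples), block_len)]
```

Each returned block would then be handed to the voice block analysis module of claim 7 for emotion analysis, with transcoding to a processing-friendly format happening before or after the cut.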
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011630403.6A | 2020-12-31 | 2020-12-31 | Voice emotion distributed system and voice signal processing method
Publications (1)
Publication Number | Publication Date |
---|---|
CN112837702A (en) | 2021-05-25 |
Family
ID=75924822
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011630403.6A | Voice emotion distributed system and voice signal processing method | 2020-12-31 | 2020-12-31
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112837702A (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030055654A1 (en) * | 2001-07-13 | 2003-03-20 | Oudeyer Pierre Yves | Emotion recognition method and device
CN101485188A (en) * | 2006-07-06 | 2009-07-15 | KTF Telecom Co. | Method and system for providing voice analysis service, and apparatus therefor
US20140257820A1 (en) * | 2013-03-10 | 2014-09-11 | Nice-Systems Ltd | Method and apparatus for real time emotion detection in audio interactions
CN104538043A (en) * | 2015-01-16 | 2015-04-22 | Beijing University of Posts and Telecommunications | Real-time emotion reminder for call
US20170084295A1 (en) * | 2015-09-18 | 2017-03-23 | Sri International | Real-time speaker state analytics platform
CN107105109A (en) * | 2016-02-23 | 2017-08-29 | Ping An Technology (Shenzhen) Co., Ltd. | Voice broadcast method and system
CN109495496A (en) * | 2018-12-11 | 2019-03-19 | Taikang Insurance Group Co., Ltd. | Speech processing method, apparatus, electronic device and computer-readable medium
CN110931004A (en) * | 2019-10-22 | 2020-03-27 | Beijing Zhihe Dafang Technology Co., Ltd. | Voice conversation analysis method and device based on docking technology
CN111081279A (en) * | 2019-12-24 | 2020-04-28 | Shenzhen OneConnect Smart Technology Co., Ltd. | Voice emotion fluctuation analysis method and device
CN210535352U (en) * | 2019-07-26 | 2020-05-15 | Chen Zhijun | Device of sensorless speech emotion analysis system
CN111311327A (en) * | 2020-02-19 | 2020-06-19 | Ping An Technology (Shenzhen) Co., Ltd. | Service evaluation method, device, equipment and storage medium based on artificial intelligence
CN112074900A (en) * | 2018-05-04 | 2020-12-11 | Qualcomm Incorporated | Audio analysis for natural language processing
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110769124B (en) | Electric power marketing customer communication system | |
JP4466666B2 (en) | Minutes creation method, apparatus and program thereof | |
CN110661927B (en) | Voice interaction method and device, computer equipment and storage medium | |
WO2019205271A1 (en) | Conference speech management method and apparatus | |
CN107657017A (en) | Method and apparatus for providing voice service | |
CN111489765A (en) | Telephone traffic service quality inspection method based on intelligent voice technology | |
CN110895940A (en) | Intelligent voice interaction method and device | |
CN104867492A (en) | Intelligent interaction system and method | |
MXPA02002811A (en) | System and method for transmitting voice input from a remote location over a wireless data channel. | |
CN108257594A (en) | A kind of conference system and its information processing method | |
CN113691900B (en) | Light sound management method and system with emotion analysis | |
CN107808007A (en) | Information processing method and device | |
CN113823334B (en) | Environment simulation method applied to vehicle-mounted equipment, related device and equipment | |
CN115643341A (en) | Artificial intelligence customer service response system | |
CN112838978A (en) | System and method of real-time short message robot supporting man-machine cooperation | |
HUP0400876A2 (en) | Conversational prompting method for voice-controlled information and inquiry services involving computer telephony | |
EP2335239A1 (en) | Mass electronic question filtering and enhancement system for audio broadcasts and voice conferences | |
JP2010109898A (en) | Photographing control apparatus, photographing control method and program | |
CN112837702A (en) | Voice emotion distributed system and voice signal processing method | |
CN105869631B (en) | The method and apparatus of voice prediction | |
CN115220682A (en) | Method and device for driving virtual portrait by audio and electronic equipment | |
CN106372203A (en) | Information response method and device for smart terminal and smart terminal | |
CN111261168A (en) | Speech recognition engine and method supporting multi-task and multi-model | |
CN117150338A (en) | Task processing, automatic question and answer and multimedia data identification model training method | |
US7853451B1 (en) | System and method of exploiting human-human data for spoken language understanding systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |