WO2016090762A1 - Method for processing a voice signal, terminal, and computer storage medium - Google Patents

Method for processing a voice signal, terminal, and computer storage medium

Info

Publication number
WO2016090762A1
WO2016090762A1 (PCT/CN2015/074740, CN2015074740W)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
emotion type
emotion
voice emotion
type
Prior art date
Application number
PCT/CN2015/074740
Other languages
English (en)
French (fr)
Inventor
安斌
张慕辉
赵金
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016090762A1 publication Critical patent/WO2016090762A1/zh

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Definitions

  • The present invention relates to the field of signal processing, and in particular to a method for processing a voice signal, a terminal, and a computer storage medium.
  • Embodiments of the present invention are intended to provide a method for processing a voice signal, a terminal, and a computer storage medium, so as to intelligently process the voice signal during a user's voice call, improve the intelligence of the terminal, and provide a good user experience.
  • An embodiment of the present invention provides a method for processing a voice signal, where the method includes: in the process of a user's voice communication, obtaining a first voice emotion type, wherein the first voice emotion type is used to reflect the user's emotion when the user inputs a voice signal; determining, according to a pre-stored correspondence between input voice emotion types and output voice emotion types, an output second voice emotion type corresponding to the first voice emotion type, wherein the input voice emotion type is different from the output voice emotion type; and processing the voice signal to output a voice signal reflecting the second voice emotion type.
  • Obtaining the first voice emotion type includes: parsing the voice signal input by the user and extracting a voice emotion parameter; and, when a voice emotion type corresponding to the parameter value of the voice emotion parameter is found in a preset voice emotion reference library, determining the voice emotion type corresponding to the parameter value as the first voice emotion type.
  • After the voice emotion parameter is extracted, the method further includes: when no voice emotion type corresponding to the parameter value is found in the voice emotion reference library, determining the first voice emotion type according to a preset condition.
  • the voice emotion parameter includes at least an average spectral energy and/or a fundamental frequency front end rising slope.
  • When the first voice emotion type is a negative emotion type, determining the output second voice emotion type corresponding to the first voice emotion type according to the pre-stored correspondence between input voice emotion types and output voice emotion types includes: determining a neutral or positive emotion type as the second voice emotion type according to the correspondence.
  • An embodiment of the present invention provides a terminal, where the terminal includes an obtaining unit, a determining unit, and a processing unit, where the obtaining unit is configured to obtain a first voice emotion type in the process of a user's voice communication, wherein the first voice emotion type is used to reflect the user's emotion when the user inputs a voice signal;
  • the determining unit is configured to determine, according to a pre-stored correspondence between input voice emotion types and output voice emotion types, an output second voice emotion type corresponding to the first voice emotion type, wherein the input voice emotion type is different from the output voice emotion type;
  • the processing unit is configured to process the speech signal and output a speech signal reflecting the second speech emotion type.
  • The obtaining unit is configured to parse the voice signal input by the user and extract a voice emotion parameter; when a voice emotion type corresponding to the parameter value of the voice emotion parameter is found in the preset voice emotion reference library, the voice emotion type corresponding to the parameter value is determined as the first voice emotion type.
  • The determining unit is further configured to, after the obtaining unit extracts the voice emotion parameter, determine the first voice emotion type according to the preset condition when no voice emotion type corresponding to the parameter value is found in the voice emotion reference library.
  • the voice emotion parameter includes at least an average spectral energy and/or a fundamental frequency front end rising slope.
  • the determining unit is configured to determine, according to the correspondence relationship, a neutral or positive emotion type as the second voice emotion type when the first voice emotion type is a negative emotion type.
  • the embodiment of the present invention further provides a computer storage medium, the storage medium comprising a set of computer executable instructions for performing a method for processing a voice signal according to an embodiment of the present invention.
  • With the method, the terminal, and the computer storage medium provided by the embodiments of the present invention, in the process of the user's voice communication the terminal first obtains a first voice emotion type reflecting the user's emotion when inputting a voice signal, then determines, according to the pre-stored correspondence between input voice emotion types and output voice emotion types, an output second voice emotion type corresponding to the first voice emotion type, wherein the input voice emotion type is different from the output voice emotion type, and finally processes the voice signal based on the second voice emotion type and outputs the processed voice signal. That is, when the user inputs a voice signal, the terminal can obtain, according to the correspondence, an output voice emotion type different from the voice emotion type input by the user, and then intelligently process the voice signal input by the user based on that output voice emotion type, so that the voice emotion reflected by the processed voice signal differs from the emotion at input. This prevents the emotion of one party in a conversation from affecting the other party, effectively improving the intelligence of the terminal and the user experience.
  • FIG. 1 is a schematic flowchart of a method for processing a voice signal according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a method for processing a voice signal reflecting an angry mood according to an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • the embodiment of the invention provides a method for processing a voice signal, and the method is applied to a terminal, and the terminal may be a device such as a smart phone or a tablet computer.
  • FIG. 1 is a schematic flowchart of a method for processing a voice signal according to an embodiment of the present invention. Referring to FIG. 1, the method includes:
  • S101: In the process of a user's voice communication, obtain a first voice emotion type, where the first voice emotion type is used to reflect the user's emotion when the user inputs a voice signal.
  • When the user uses the terminal to make a phone call, a video call, or an instant voice chat with other users, the terminal collects the voice signal input by the user in real time, pre-processes the voice signal (for example, amplification and filtering) through an encoder chip, or a band-pass filter and an analog-to-digital converter (ADC), then parses the voice signal, extracts the corresponding voice emotion parameters, and queries the preset voice emotion reference library for the voice emotion type corresponding to the parameter values of the voice emotion parameters; when such a type is found, it is determined as the first voice emotion type corresponding to the voice signal, where the voice emotion parameters include at least the average spectral energy and/or the front-end rising slope of the fundamental frequency.
  • The voice emotion types mentioned in the embodiments of the present invention may refer to negative emotions such as sadness, anger, and fear, to positive emotions such as happiness, joy, and delight, or to neutral emotions such as calm, peaceful, and steady.
  • When the terminal performs the above preprocessing on the voice signal, the voice emotion parameters can be detected at the same time and the detection conditions are well defined, and the voice emotion reference library stores a large number of voice emotion reference models, so the voice signal is processed quickly; the other party does not perceive any obvious delay in the output processed voice signal, and normal voice communication between the users is therefore guaranteed.
  • There are many ways to preprocess the voice signal: an encoder chip can be used, or a band-pass filter, an ADC, a code modulator, and the like; of course, other methods can also be used, which are not specifically limited in the embodiments of the present invention.
  • For example, user A wants to chat with user B: he enables the voice emotion recognition function through a control key on the smartphone, enters user B's phone number on the dial pad, and places the call, conducting a voice call with user B through a microphone or a headset.
  • During the chat, the smartphone collects user A's voice signal in real time. Because user A may become angry over something or be in a suddenly bad mood during the chat, the smartphone receives the voice signal input by user A through the microphone or a Bluetooth headset, performs analog-to-digital conversion, amplification, filtering, and other preprocessing on it, parses the voice signal, and extracts the corresponding voice emotion parameters, that is, the average spectral energy, the front-end rising slope of the fundamental frequency, and so on. Then, according to the corresponding parameter values, such as an average spectral energy of 60 dB and a fundamental-frequency front-end rising slope of 3.28, it queries the preset local voice emotion reference library or the network voice emotion reference library for the voice emotion type corresponding to those parameter values, such as anger; in this case, the smartphone determines anger as the first voice emotion type. Alternatively, for parameter values such as an average spectral energy of 58 dB and a fundamental-frequency front-end rising slope of 0.45, the smartphone finds in the preset local or network voice emotion reference library that the corresponding voice emotion type is happiness, and determines happiness as the first voice emotion type. Further, for parameter values such as an average spectral energy of 40 dB and a fundamental-frequency front-end rising slope of 2.5, the corresponding voice emotion type found in the preset local or network voice emotion reference library is calm, and the smartphone determines calm as the first voice emotion type.
  • It should be noted that, in practical applications, the preset voice emotion reference library includes at least a local voice emotion reference library and a network voice emotion reference library. The local voice emotion reference library is preset in the terminal; the user can store some commonly used voice emotion reference models by recording them, and during later use the terminal learns from the user's habits and adds the user's new voice emotion types to the local library to expand it. The network voice emotion reference library stores different types of voice emotion reference models, and the terminal can connect to it through the network provided by the operator or through the terminal's wireless network. The voice emotion type of the voice signal input by the user may be queried in the network voice emotion reference library or based on the local voice emotion reference library; of course, other manners are possible, which are not specifically limited in the embodiments of the present invention.
  • In a specific implementation, after extracting the voice emotion parameters, the terminal may fail to find, in the preset voice emotion reference library, a voice emotion type corresponding to the parameter values of the voice emotion parameters. In this case, the terminal may determine the first voice emotion type corresponding to the voice signal input by the user at least according to the fundamental-frequency front-end rising slope value. For example, when the parameter values extracted by the smartphone from the voice signal input by user A match no voice emotion type in the preset voice emotion reference library, the fundamental-frequency front-end rising slope value of 3.28 may be compared with a preset threshold of 2.5; because the slope is greater than the preset threshold, the first voice emotion type is determined to be anger. Alternatively, a slope value of 0.45 is compared with the preset threshold of 2.5; because the slope is smaller than the preset threshold, the first voice emotion type is determined to be happiness. Or, a slope value of 2.5 is compared with the preset threshold of 2.5; because the slope equals the preset threshold, the voice emotion type of the voice signal is determined to be the calm emotion type. Of course, the preset threshold may take other values depending on the actual application, which is not limited by the embodiments of the present invention.
  • To reduce the power consumption of the terminal and simplify its data processing flow, S101 may also be: during the user's voice communication, the terminal first preprocesses the voice signal input by the user and obtains the decibel value of the voice signal; when the decibel value is outside a preset decibel threshold range, the first voice emotion type is obtained.
  • S102: Determine, according to the pre-stored correspondence between input voice emotion types and output voice emotion types, the output second voice emotion type corresponding to the first voice emotion type.
  • After the terminal determines the first voice emotion type of the user's voice signal, it determines, according to the pre-stored correspondence between input voice emotion types and output voice emotion types shown in Table 1, the output emotion type corresponding to the first voice emotion type, that is, the second voice emotion type.
  • For example, when the smartphone determines that the first voice emotion type is anger, it can determine that the corresponding output voice emotion type is calm; in this case, the smartphone determines calm as the second voice emotion type.
  • In practical applications, there may be other correspondences between the input and output voice emotion types: the input voice emotion type is a negative emotion and the corresponding output voice emotion type may be a positive emotion; or the input voice emotion type is a positive emotion and the corresponding output voice emotion type may be a neutral emotion; or the input voice emotion type is a neutral emotion and the corresponding output voice emotion type may be a positive emotion, which is not specifically limited in the embodiments of the present invention.
  • In another embodiment, the terminal may also process only negative emotions. In that case, when S101 determines that the first voice emotion type is a negative emotion, S102 is performed; when S101 determines that the first voice emotion type is a positive or neutral emotion, the terminal does not process the input voice signal and outputs it directly.
  • S103: Process the voice signal and output a voice signal reflecting the second voice emotion type. The terminal may process the voice signal based on the output second voice emotion type and then output the processed voice signal.
  • For example, after determining from Table 1 that the second voice emotion type is the calm emotion type, the smartphone performs modulation and demodulation on the voice signal through its internal encoder chip or code modulator, converts the anger-reflecting voice signal input by user A into a voice signal reflecting calm, and then outputs the processed voice signal to user B; alternatively, the smartphone determines that the second voice emotion type is happiness, and in that case converts the anger-reflecting voice signal input by user A into a voice signal reflecting happiness and outputs it to user B.
  • Specifically, the terminal may modify registers in the encoder chip or code modulator so that the output voice passes through a low-pass filter, filtering out the high-frequency part of the negative-emotion speech above a certain threshold (for example, 3 kHz) and passing only the low-frequency part below that threshold, so that the output is a calm voice signal.
  • FIG. 2 is a schematic flowchart of a method for processing a voice signal reflecting an angry mood according to an embodiment of the present invention. Referring to FIG. 2, the method includes:
  • S201: During a phone call between user A and user B, the mobile phone obtains the voice signal input by user A;
  • S202: The mobile phone preprocesses the voice signal input by user A;
  • S203: The mobile phone parses the voice signal and extracts the parameter values of the average spectral energy and the front-end rising slope of the fundamental frequency, where the average spectral energy value is 60 dB and the fundamental-frequency front-end rising slope value is 3.28;
  • S204: According to the average spectral energy value and the fundamental-frequency front-end rising slope value, the mobile phone finds in the preset voice emotion reference library that the corresponding voice emotion type is anger;
  • S205: According to the correspondence between input voice emotion types and output voice emotion types, the mobile phone determines that the output voice emotion type corresponding to anger is calm;
  • S206: Based on calm, the mobile phone processes the voice signal input by user A and outputs a voice signal reflecting calm.
  • It can be seen from the above that, when the user inputs a voice signal, the terminal can obtain, according to the preset correspondence between input voice emotion types and output voice emotion types, an output voice emotion type different from the voice emotion type input by the user; the terminal then intelligently processes the voice signal input by the user based on that output voice emotion type, so that the voice emotion reflected by the processed voice signal differs from the emotion at input, which prevents the emotion of one party in a call from affecting the other party, improves the intelligence of the terminal, and improves the user experience.
  • an embodiment of the present invention provides a terminal that is consistent with the terminal described in one or more of the foregoing embodiments.
  • the terminal includes: an obtaining unit 31, a determining unit 32, and a processing unit 33;
  • The obtaining unit 31 is configured to obtain a first voice emotion type in the process of the user's voice communication, wherein the first voice emotion type is used to reflect the user's emotion when inputting a voice signal; the determining unit 32 is configured to determine, according to the pre-stored correspondence between input voice emotion types and output voice emotion types, the output second voice emotion type corresponding to the first voice emotion type, wherein the input voice emotion type is different from the output voice emotion type; and the processing unit 33 is configured to process the voice signal and output a voice signal reflecting the second voice emotion type.
  • The obtaining unit 31 is configured to parse the voice signal input by the user and extract the voice emotion parameters; when a voice emotion type corresponding to the parameter values of the voice emotion parameters is found in the preset voice emotion reference library, the voice emotion type corresponding to the parameter values is determined as the first voice emotion type.
  • The determining unit 32 is further configured to, after the obtaining unit extracts the voice emotion parameters, determine the first voice emotion type according to the preset condition when no voice emotion type corresponding to the parameter values is found in the voice emotion reference library.
  • the speech emotion parameter includes at least an average spectral energy and/or a fundamental frequency front end rising slope.
  • the determining unit 32 is configured to determine the neutral or positive emotion type as the second voice emotion type according to the correspondence relationship when the first voice emotion type is a negative emotion type.
  • The obtaining unit 31, the determining unit 32, and the processing unit 33 may all be arranged in a processor of the terminal such as a CPU, an ARM processor, or an audio processor, or in, for example, an embedded controller or a system-on-chip, which is not specifically limited in the embodiments of the present invention.
  • the embodiment of the present invention further provides a computer storage medium, the storage medium comprising a set of computer executable instructions for performing a method for processing a voice signal according to an embodiment of the present invention.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)

Abstract

A method for processing a voice signal, a terminal, and a computer storage medium. The method includes: in the process of a user's voice communication, obtaining a first voice emotion type, where the first voice emotion type is used to reflect the user's emotion when inputting a voice signal (S101); determining, according to a pre-stored correspondence between input voice emotion types and output voice emotion types, an output second voice emotion type corresponding to the first voice emotion type (S102), where the input voice emotion type is different from the output voice emotion type; and processing the voice signal and outputting a voice signal reflecting the second voice emotion type (S103).

Description

Method for processing a voice signal, terminal, and computer storage medium
Technical Field
The present invention relates to the field of signal processing, and in particular to a method for processing a voice signal, a terminal, and a computer storage medium.
Background
With the rapid development of smartphones, the smartphone has become an important communication tool, and voice calls between people through mobile phones, computers, and other devices are now very common. Communicating with family and friends in this way not only deepens mutual affection but also brings people closer to each other.
Taking a mobile phone as an example, people can chat with friends through the phone to deepen their relationship. However, during a chat the phone does not perform any processing on the voice signal input by the user and passes it directly to the other end. The following situation may therefore arise: if user A is in a bad mood or disagrees with user B and becomes angry, the voice signal he inputs reflects that emotion, and the phone passes the signal directly to user B; when user B receives the signal, he can sense user A's emotion, which may in turn affect user B's emotion and may cause user A and user B to end the call unhappily. This not only affects both parties' moods but may also damage the relationship between them, with a series of undesirable consequences.
Therefore, the related art suffers from the technical problem that the terminal has a low level of intelligence and cannot intelligently process the voice signal during a user's voice call.
Summary
In view of this, embodiments of the present invention are intended to provide a method for processing a voice signal, a terminal, and a computer storage medium, so as to intelligently process the voice signal during a user's voice call, improve the intelligence of the terminal, and provide a good user experience.
To achieve the above objective, the technical solutions of the embodiments of the present invention are implemented as follows:
In a first aspect, an embodiment of the present invention provides a method for processing a voice signal. The method includes: in the process of a user's voice communication, obtaining a first voice emotion type, where the first voice emotion type is used to reflect the user's emotion when inputting a voice signal; determining, according to a pre-stored correspondence between input voice emotion types and output voice emotion types, an output second voice emotion type corresponding to the first voice emotion type, where the input voice emotion type is different from the output voice emotion type; and processing the voice signal and outputting a voice signal reflecting the second voice emotion type.
In the above solution, obtaining the first voice emotion type includes: parsing the voice signal input by the user and extracting voice emotion parameters; and, when a voice emotion type corresponding to the parameter values of the voice emotion parameters is found in a preset voice emotion reference library, determining the voice emotion type corresponding to the parameter values as the first voice emotion type.
In the above solution, after the voice emotion parameters are extracted, the method further includes: when no voice emotion type corresponding to the parameter values is found in the voice emotion reference library, determining the first voice emotion type according to a preset condition.
In the above solution, the voice emotion parameters include at least the average spectral energy and/or the front-end rising slope of the fundamental frequency.
In the above solution, when the first voice emotion type is a negative emotion type, determining the output second voice emotion type corresponding to the first voice emotion type according to the pre-stored correspondence between input voice emotion types and output voice emotion types includes: determining a neutral or positive emotion type as the second voice emotion type according to the correspondence.
In a second aspect, an embodiment of the present invention provides a terminal. The terminal includes an obtaining unit, a determining unit, and a processing unit, where the obtaining unit is configured to obtain a first voice emotion type in the process of a user's voice communication, where the first voice emotion type is used to reflect the user's emotion when inputting a voice signal; the determining unit is configured to determine, according to a pre-stored correspondence between input voice emotion types and output voice emotion types, an output second voice emotion type corresponding to the first voice emotion type, where the input voice emotion type is different from the output voice emotion type; and the processing unit is configured to process the voice signal and output a voice signal reflecting the second voice emotion type.
In the above solution, the obtaining unit is configured to parse the voice signal input by the user and extract voice emotion parameters; and, when a voice emotion type corresponding to the parameter values of the voice emotion parameters is found in the preset voice emotion reference library, determine the voice emotion type corresponding to the parameter values as the first voice emotion type.
In the above solution, the determining unit is further configured to, after the obtaining unit extracts the voice emotion parameters, determine the first voice emotion type according to a preset condition when no voice emotion type corresponding to the parameter values is found in the voice emotion reference library.
In the above solution, the voice emotion parameters include at least the average spectral energy and/or the front-end rising slope of the fundamental frequency.
In the above solution, the determining unit is configured to, when the first voice emotion type is a negative emotion type, determine a neutral or positive emotion type as the second voice emotion type according to the correspondence.
An embodiment of the present invention further provides a computer storage medium. The storage medium includes a set of computer-executable instructions for performing the method for processing a voice signal described in the embodiments of the present invention.
With the method for processing a voice signal, the terminal, and the computer storage medium provided by the embodiments of the present invention, during a user's voice communication the terminal first obtains a first voice emotion type reflecting the user's emotion when inputting a voice signal, then determines, according to the pre-stored correspondence between input voice emotion types and output voice emotion types, an output second voice emotion type corresponding to the first voice emotion type, where the input voice emotion type is different from the output voice emotion type, and finally processes the voice signal based on the second voice emotion type and outputs the processed voice signal. That is, when the user inputs a voice signal, the terminal can obtain, according to the above correspondence, an output voice emotion type different from the voice emotion type input by the user, and then intelligently process the voice signal input by the user based on that output voice emotion type, so that the voice emotion reflected by the processed voice signal differs from the emotion at input. This prevents the emotion of one party in a call from affecting the other party, effectively solves the technical problem in the related art that the terminal has a low level of intelligence and cannot intelligently process the voice signal during a user's voice call, improves the intelligence of the terminal, and improves the user experience.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for processing a voice signal according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a method for processing a voice signal reflecting anger according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings of the embodiments of the present invention.
An embodiment of the present invention provides a method for processing a voice signal. The method is applied to a terminal, and the terminal may be a device such as a smartphone or a tablet computer.
FIG. 1 is a schematic flowchart of a method for processing a voice signal according to an embodiment of the present invention. Referring to FIG. 1, the method includes:
S101: In the process of a user's voice communication, obtain a first voice emotion type, where the first voice emotion type is used to reflect the user's emotion when inputting a voice signal.
When the user uses the terminal to make a phone call, a video call, or an instant voice chat with other users, the terminal collects the voice signal input by the user in real time, pre-processes the voice signal (for example, amplification and filtering) through an encoder chip, or a band-pass filter and an analog-to-digital converter (ADC), then parses the voice signal, extracts the corresponding voice emotion parameters, and queries a preset voice emotion reference library for the voice emotion type corresponding to the parameter values of the voice emotion parameters; when such a type is found, the voice emotion type corresponding to the parameter values is determined as the first voice emotion type corresponding to the voice signal, where the voice emotion parameters include at least the average spectral energy and/or the front-end rising slope of the fundamental frequency.
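The two parameters named in S101 can be approximated with standard signal-processing steps. The following is only an illustrative sketch, not the patent's implementation: the frame length, hop size, FFT-based energy measure, and the choice of the first ten voiced frames as the "front end" of the fundamental-frequency contour are all assumptions.

    import numpy as np

    def average_spectral_energy_db(signal, frame_len=512, hop=256):
        # Mean per-frame spectral energy of the signal, expressed in dB.
        frames = [signal[i:i + frame_len]
                  for i in range(0, len(signal) - frame_len + 1, hop)]
        energies = [np.sum(np.abs(np.fft.rfft(f)) ** 2) / frame_len for f in frames]
        return 10.0 * np.log10(np.mean(energies) + 1e-12)

    def f0_front_end_rising_slope(f0_contour, n_front=10):
        # Slope (Hz per frame) of a straight line fitted to the first voiced F0 values;
        # f0_contour is a NumPy array of per-frame F0 estimates, with 0 for unvoiced frames.
        voiced = f0_contour[f0_contour > 0][:n_front]
        if voiced.size < 2:
            return 0.0
        slope, _intercept = np.polyfit(np.arange(voiced.size), voiced, 1)
        return float(slope)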
It should be noted that the voice emotion types referred to in the embodiments of the present invention may be negative emotions such as sadness, anger, and fear, positive emotions such as happiness, joy, and delight, or neutral emotions such as calm, peaceful, and steady.
When the terminal performs the above preprocessing on the voice signal, the voice emotion parameters can be detected at the same time and the detection conditions are well defined, and the voice emotion reference library stores a large number of voice emotion reference models, so the voice signal is processed quickly; the other party does not perceive any obvious delay in the output processed voice signal, and normal voice communication between the users is therefore guaranteed. There are many ways to preprocess the voice signal: an encoder chip can be used, or a band-pass filter, an ADC, a code modulator, and the like; of course, other methods can also be used, which are not specifically limited in the embodiments of the present invention.
For example, user A wants to chat with user B: he enables the voice emotion recognition function through a control key on the smartphone, enters user B's phone number on the dial pad, and places the call, conducting a voice call with user B through a microphone or a headset. During the chat, the smartphone collects user A's voice signal in real time. Because user A may become angry over something or be in a suddenly bad mood during the chat, the smartphone receives the voice signal input by user A through the microphone or a Bluetooth headset, performs analog-to-digital conversion, amplification, filtering, and other preprocessing on it, parses the voice signal, and extracts the corresponding voice emotion parameters, that is, the average spectral energy, the front-end rising slope of the fundamental frequency, and so on. Then, according to the corresponding parameter values, such as an average spectral energy of 60 dB and a fundamental-frequency front-end rising slope of 3.28, it queries the preset local voice emotion reference library or the network voice emotion reference library for the voice emotion type corresponding to those parameter values, such as anger; in this case, the smartphone determines anger as the first voice emotion type. Alternatively, for parameter values such as an average spectral energy of 58 dB and a fundamental-frequency front-end rising slope of 0.45, the smartphone finds in the preset local or network voice emotion reference library that the corresponding voice emotion type is happiness, and determines happiness as the first voice emotion type. Further, for parameter values such as an average spectral energy of 40 dB and a fundamental-frequency front-end rising slope of 2.5, the corresponding voice emotion type found in the preset local or network voice emotion reference library is calm, and the smartphone determines calm as the first voice emotion type.
It should be noted that, in practical applications, the preset voice emotion reference library includes at least a local voice emotion reference library and a network voice emotion reference library. The local voice emotion reference library is preset in the terminal; the user can store some commonly used voice emotion reference models by recording them, and during later use the terminal learns from the user's habits and adds the user's new voice emotion types to the local library to expand it. The network voice emotion reference library stores different types of voice emotion reference models, and the terminal can connect to it through the network provided by the operator or through the terminal's wireless network. The voice emotion type of the voice signal input by the user may be queried in the network voice emotion reference library or based on the local voice emotion reference library; of course, other manners are possible, which are not specifically limited in the embodiments of the present invention.
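As a rough illustration of the two-tier lookup just described, the sketch below matches extracted parameter values against stored reference models by nearest distance, falls back to a network library when the local one has no close enough model, and can be expanded with newly learned models. The distance measure, the tolerance value, and the class layout are assumptions, not the patent's data structures.

    class EmotionReferenceLibrary:
        def __init__(self, local_models, network_client=None, tolerance=5.0):
            # local_models maps (average_energy_db, f0_slope) tuples to emotion types.
            self.local_models = dict(local_models)
            self.network_client = network_client   # optional remote library, queried second
            self.tolerance = tolerance             # assumed acceptance threshold

        def _nearest_local(self, energy_db, slope):
            best, best_dist = None, float("inf")
            for (ref_energy, ref_slope), emotion in self.local_models.items():
                dist = abs(ref_energy - energy_db) + abs(ref_slope - slope)
                if dist < best_dist:
                    best, best_dist = emotion, dist
            return best if best_dist <= self.tolerance else None

        def lookup(self, energy_db, slope):
            emotion = self._nearest_local(energy_db, slope)
            if emotion is None and self.network_client is not None:
                emotion = self.network_client.lookup(energy_db, slope)   # hypothetical remote call
            return emotion   # None means "not found"; the caller then applies the preset condition

        def learn(self, energy_db, slope, emotion):
            # Expand the local library with a newly observed model, as described above.
            self.local_models[(energy_db, slope)] = emotion

    # Reference values taken from the examples in the text.
    library = EmotionReferenceLibrary({(60.0, 3.28): "anger",
                                       (58.0, 0.45): "happiness",
                                       (40.0, 2.5): "calm"})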
In a specific implementation, after extracting the voice emotion parameters, the terminal may fail to find, in the preset voice emotion reference library, a voice emotion type corresponding to the parameter values of the voice emotion parameters. In this case, the terminal may determine the first voice emotion type corresponding to the voice signal input by the user at least according to the fundamental-frequency front-end rising slope value. For example, when the parameter values extracted by the smartphone from the voice signal input by user A match no voice emotion type in the preset voice emotion reference library, the fundamental-frequency front-end rising slope value of 3.28 may be compared with a preset threshold of 2.5; because the slope is greater than the preset threshold, the first voice emotion type is determined to be anger. Alternatively, a slope value of 0.45 is compared with the preset threshold of 2.5; because the slope is smaller than the preset threshold, the first voice emotion type is determined to be happiness. Or, a slope value of 2.5 is compared with the preset threshold of 2.5; because the slope equals the preset threshold, the voice emotion type of the voice signal is determined to be the calm emotion type. Of course, the preset threshold may take other values depending on the actual application, which is not specifically limited in the embodiments of the present invention.
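The fallback rule can be written down directly from the example values given above; the threshold of 2.5 comes from the text, and in practice it would be calibrated rather than hard-coded.

    def classify_by_slope(slope, threshold=2.5):
        # Fallback used when the reference library has no match for the parameter values.
        if slope > threshold:
            return "anger"        # e.g. slope 3.28 > 2.5
        if slope < threshold:
            return "happiness"    # e.g. slope 0.45 < 2.5
        return "calm"             # slope exactly equal to the threshold, e.g. 2.5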
To reduce the power consumption of the terminal and simplify its data processing flow, S101 may also be: during the user's voice communication, the terminal first preprocesses the voice signal input by the user and obtains the decibel value of the voice signal; when the decibel value is outside a preset decibel threshold range, the first voice emotion type is obtained.
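One possible reading of this power-saving gate is sketched below: the emotion pipeline runs only when the level of the preprocessed signal leaves a preset decibel window. The 16-bit full-scale reference and the example window are assumptions.

    import numpy as np

    def level_dbfs(samples):
        # RMS level of 16-bit PCM samples (NumPy array) relative to full scale, in dB.
        rms = np.sqrt(np.mean(np.square(samples.astype(np.float64)))) + 1e-12
        return 20.0 * np.log10(rms / 32768.0)

    def should_analyse_emotion(samples, low_dbfs=-50.0, high_dbfs=-10.0):
        # Trigger emotion recognition only outside the assumed "ordinary speech" window.
        level = level_dbfs(samples)
        return level < low_dbfs or level > high_dbfs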
S102: Determine, according to the pre-stored correspondence between input voice emotion types and output voice emotion types, the output second voice emotion type corresponding to the first voice emotion type.
After the terminal determines the first voice emotion type of the user's voice signal, it determines, according to the pre-stored correspondence between input voice emotion types and output voice emotion types shown in Table 1, the output emotion type corresponding to the first voice emotion type, that is, the second voice emotion type.

Input voice emotion type    Output voice emotion type
Sadness                     Calm
Anger                       Calm
Fear                        Calm

Table 1
For example, referring to Table 1, when the smartphone determines that the first voice emotion type is anger, it can determine that the corresponding output voice emotion type is calm; in this case, the smartphone determines calm as the second voice emotion type.
In practical applications, there may be other correspondences between the input and output voice emotion types. For example, the input voice emotion type is a negative emotion and the corresponding output voice emotion type may be a positive emotion; or the input voice emotion type is a positive emotion and the corresponding output voice emotion type may be a neutral emotion; or the input voice emotion type is a neutral emotion and the corresponding output voice emotion type may be a positive emotion, which is not specifically limited in the embodiments of the present invention.
In another embodiment, the terminal may also process only negative emotions. In that case, when S101 determines that the first voice emotion type is a negative emotion, S102 is performed; when S101 determines that the first voice emotion type is a positive or neutral emotion, the terminal does not process the input voice signal and outputs it directly.
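Table 1, together with the negative-only variant described in the previous paragraph, can be held as a simple lookup table; the sketch below is illustrative, and any entries beyond Table 1 would be configuration choices.

    NEGATIVE_EMOTIONS = {"sadness", "anger", "fear"}
    OUTPUT_EMOTION = {"sadness": "calm", "anger": "calm", "fear": "calm"}   # Table 1

    def second_voice_emotion_type(first_type):
        if first_type in NEGATIVE_EMOTIONS:
            return OUTPUT_EMOTION.get(first_type, "calm")
        return first_type   # positive or neutral input is output without further processing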
S103: Process the voice signal and output a voice signal reflecting the second voice emotion type.
The terminal may process the voice signal based on the output second voice emotion type and then output the processed voice signal.
For example, after determining from Table 1 that the second voice emotion type is the calm emotion type, the smartphone performs modulation and demodulation on the voice signal through its internal encoder chip or code modulator, converts the anger-reflecting voice signal input by user A into a voice signal reflecting calm, and then outputs the processed voice signal to user B. Alternatively, the smartphone determines that the second voice emotion type is happiness; in this case, the smartphone converts the anger-reflecting voice signal input by user A into a voice signal reflecting happiness and outputs it to user B. Specifically, the terminal may modify registers in the encoder chip or code modulator so that the output voice passes through a low-pass filter, which filters out the high-frequency part of the negative-emotion speech above a certain threshold (for example, 3 kHz) and passes only the low-frequency part below that threshold, so that the output is a calm voice signal.
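The register change described here amounts to routing the output through a low-pass filter. The sketch below approximates that step in software with a Butterworth design from SciPy; the 16 kHz sampling rate and 4th-order filter are assumptions, while the 3 kHz cut-off is the example value from the text.

    from scipy.signal import butter, lfilter

    def soften_negative_emotion(samples, sample_rate=16000, cutoff_hz=3000.0, order=4):
        # Attenuate content above the cut-off so a negative-emotion signal sounds calmer.
        nyquist = sample_rate / 2.0
        b, a = butter(order, cutoff_hz / nyquist, btype="low")
        return lfilter(b, a, samples)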
The method for processing a voice signal described in one or more of the above embodiments is described below with a specific example.
FIG. 2 is a schematic flowchart of a method for processing a voice signal reflecting anger according to an embodiment of the present invention. Referring to FIG. 2, the method includes:
S201: During a phone call between user A and user B, the mobile phone obtains the voice signal input by user A;
S202: The mobile phone preprocesses the voice signal input by user A;
S203: The mobile phone parses the voice signal and extracts the parameter values of the average spectral energy and the front-end rising slope of the fundamental frequency, where the average spectral energy value is 60 dB and the fundamental-frequency front-end rising slope value is 3.28;
S204: According to the average spectral energy value and the fundamental-frequency front-end rising slope value, the mobile phone finds in the preset voice emotion reference library that the corresponding voice emotion type is anger;
S205: According to the correspondence between input voice emotion types and output voice emotion types, the mobile phone determines that the output voice emotion type corresponding to anger is calm;
S206: Based on calm, the mobile phone processes the voice signal input by user A and outputs a voice signal reflecting calm.
It can be seen from the above that, when the user inputs a voice signal, the terminal can obtain, according to the preset correspondence between input voice emotion types and output voice emotion types, an output voice emotion type different from the voice emotion type input by the user; the terminal then intelligently processes the voice signal input by the user based on that output voice emotion type, so that the voice emotion reflected by the processed voice signal differs from the emotion at input, which prevents the emotion of one party in a call from affecting the other party, improves the intelligence of the terminal, and improves the user experience.
Based on the same inventive concept, an embodiment of the present invention provides a terminal consistent with the terminal described in one or more of the above embodiments.
FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention. Referring to FIG. 3, the terminal includes: an obtaining unit 31, a determining unit 32, and a processing unit 33.
The obtaining unit 31 is configured to obtain a first voice emotion type in the process of a user's voice communication, where the first voice emotion type is used to reflect the user's emotion when inputting a voice signal; the determining unit 32 is configured to determine, according to the pre-stored correspondence between input voice emotion types and output voice emotion types, the output second voice emotion type corresponding to the first voice emotion type, where the input voice emotion type is different from the output voice emotion type; and the processing unit 33 is configured to process the voice signal and output a voice signal reflecting the second voice emotion type.
The obtaining unit 31 is configured to parse the voice signal input by the user and extract the voice emotion parameters; when a voice emotion type corresponding to the parameter values of the voice emotion parameters is found in the preset voice emotion reference library, the voice emotion type corresponding to the parameter values is determined as the first voice emotion type.
The determining unit 32 is further configured to, after the obtaining unit extracts the voice emotion parameters, determine the first voice emotion type according to a preset condition when no voice emotion type corresponding to the parameter values is found in the voice emotion reference library.
The voice emotion parameters include at least the average spectral energy and/or the front-end rising slope of the fundamental frequency.
The determining unit 32 is configured to, when the first voice emotion type is a negative emotion type, determine a neutral or positive emotion type as the second voice emotion type according to the correspondence.
The obtaining unit 31, the determining unit 32, and the processing unit 33 may all be arranged in a processor of the terminal such as a CPU, an ARM processor, or an audio processor, or in, for example, an embedded controller or a system-on-chip, which is not specifically limited in the embodiments of the present invention.
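To picture how the three units fit together, the sketch below wires them into a single pipeline. All names are placeholders; extract_parameters and convert_emotion stand for the feature-extraction and filtering steps sketched earlier and are not the patent's API.

    class Terminal:
        def __init__(self, reference_library, correspondence):
            self.reference_library = reference_library   # e.g. the EmotionReferenceLibrary sketched above
            self.correspondence = correspondence         # e.g. the OUTPUT_EMOTION table sketched above

        def obtaining_unit(self, samples):
            energy_db, slope = extract_parameters(samples)            # hypothetical helper
            emotion = self.reference_library.lookup(energy_db, slope)
            return emotion if emotion is not None else classify_by_slope(slope)

        def determining_unit(self, first_type):
            return self.correspondence.get(first_type, first_type)    # second voice emotion type

        def processing_unit(self, samples, second_type):
            return convert_emotion(samples, second_type)              # hypothetical helper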
An embodiment of the present invention further provides a computer storage medium. The storage medium includes a set of computer-executable instructions for performing the method for processing a voice signal described in the embodiments of the present invention.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to the flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.

Claims (11)

  1. A method for processing a voice signal, applied to a terminal, the method comprising:
    in the process of a user's voice communication, obtaining a first voice emotion type, wherein the first voice emotion type is used to reflect the user's emotion when the user inputs a voice signal;
    determining, according to a pre-stored correspondence between input voice emotion types and output voice emotion types, an output second voice emotion type corresponding to the first voice emotion type, wherein the input voice emotion type is different from the output voice emotion type; and
    processing the voice signal and outputting a voice signal reflecting the second voice emotion type.
  2. The method according to claim 1, wherein obtaining the first voice emotion type comprises:
    parsing the voice signal input by the user and extracting voice emotion parameters; and
    when a voice emotion type corresponding to parameter values of the voice emotion parameters is found in a preset voice emotion reference library, determining the voice emotion type corresponding to the parameter values as the first voice emotion type.
  3. The method according to claim 2, wherein, after the extracting of the voice emotion parameters, the method further comprises:
    when no voice emotion type corresponding to the parameter values is found in the voice emotion reference library, determining the first voice emotion type according to a preset condition.
  4. The method according to claim 2 or 3, wherein the voice emotion parameters comprise at least an average spectral energy and/or a front-end rising slope of the fundamental frequency.
  5. The method according to claim 1, wherein, when the first voice emotion type is a negative emotion type, the determining, according to the pre-stored correspondence between input voice emotion types and output voice emotion types, of the output second voice emotion type corresponding to the first voice emotion type comprises:
    determining a neutral or positive emotion type as the second voice emotion type according to the correspondence.
  6. A terminal, comprising: an obtaining unit, a determining unit, and a processing unit, wherein
    the obtaining unit is configured to obtain a first voice emotion type in the process of a user's voice communication, wherein the first voice emotion type is used to reflect the user's emotion when the user inputs a voice signal;
    the determining unit is configured to determine, according to a pre-stored correspondence between input voice emotion types and output voice emotion types, an output second voice emotion type corresponding to the first voice emotion type, wherein the input voice emotion type is different from the output voice emotion type; and
    the processing unit is configured to process the voice signal and output a voice signal reflecting the second voice emotion type.
  7. The terminal according to claim 6, wherein the obtaining unit is configured to parse the voice signal input by the user and extract voice emotion parameters; and, when a voice emotion type corresponding to parameter values of the voice emotion parameters is found in a preset voice emotion reference library, determine the voice emotion type corresponding to the parameter values as the first voice emotion type.
  8. The terminal according to claim 7, wherein the determining unit is further configured to, after the obtaining unit extracts the voice emotion parameters, determine the first voice emotion type according to a preset condition when no voice emotion type corresponding to the parameter values is found in the voice emotion reference library.
  9. The terminal according to claim 7 or 8, wherein the voice emotion parameters comprise at least an average spectral energy and/or a front-end rising slope of the fundamental frequency.
  10. The terminal according to claim 6, wherein the determining unit is configured to, when the first voice emotion type is a negative emotion type, determine a neutral or positive emotion type as the second voice emotion type according to the correspondence.
  11. A computer storage medium, comprising a set of computer-executable instructions for performing the method for processing a voice signal according to any one of claims 1 to 5.
PCT/CN2015/074740 2014-12-12 2015-03-20 Method for processing a voice signal, terminal, and computer storage medium WO2016090762A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410768174.2A CN105741854A (zh) 2014-12-12 2014-12-12 Method for processing a voice signal and terminal
CN201410768174.2 2014-12-12

Publications (1)

Publication Number Publication Date
WO2016090762A1 (zh)

Family

ID=56106536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/074740 WO2016090762A1 (zh) Method for processing a voice signal, terminal, and computer storage medium

Country Status (2)

Country Link
CN (1) CN105741854A (zh)
WO (1) WO2016090762A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108494952A (zh) * 2018-03-05 2018-09-04 广东欧珀移动通信有限公司 Voice call processing method and related device
CN109697290A (zh) * 2018-12-29 2019-04-30 咪咕数字传媒有限公司 Information processing method, device, and computer storage medium
CN111833907A (zh) * 2020-01-08 2020-10-27 北京嘀嘀无限科技发展有限公司 Human-computer interaction method, terminal, and computer-readable storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818787B (zh) * 2017-10-31 2021-02-05 努比亚技术有限公司 Voice information processing method, terminal, and computer-readable storage medium
CN107995370B (zh) * 2017-12-21 2020-11-24 Oppo广东移动通信有限公司 Call control method and apparatus, storage medium, and mobile terminal
CN108900706B (zh) * 2018-06-27 2021-07-02 维沃移动通信有限公司 Call voice adjustment method and mobile terminal
CN109820522A (zh) * 2019-01-22 2019-05-31 苏州乐轩科技有限公司 Emotion detection apparatus, system, and method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007271655A (ja) * 2006-03-30 2007-10-18 Brother Ind Ltd Emotion adding apparatus, emotion adding method, and emotion adding program
CN101370195A (zh) * 2007-08-16 2009-02-18 英华达(上海)电子有限公司 Method and apparatus for implementing emotion adjustment in a mobile terminal
CN103543979A (zh) * 2012-07-17 2014-01-29 联想(北京)有限公司 Voice output method, voice interaction method, and electronic device
US20140046660A1 (en) * 2012-08-10 2014-02-13 Yahoo! Inc Method and system for voice based mood analysis
CN103903627A (zh) * 2012-12-27 2014-07-02 中兴通讯股份有限公司 Voice data transmission method and apparatus
CN104113634A (zh) * 2013-04-22 2014-10-22 三星电子(中国)研发中心 Voice processing method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102184731A (zh) * 2011-05-12 2011-09-14 北京航空航天大学 Emotional voice conversion method combining prosodic and voice-quality parameters
CN104050965A (zh) * 2013-09-02 2014-09-17 广东外语外贸大学 English pronunciation quality evaluation system and method with emotion recognition

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007271655A (ja) * 2006-03-30 2007-10-18 Brother Ind Ltd Emotion adding apparatus, emotion adding method, and emotion adding program
CN101370195A (zh) * 2007-08-16 2009-02-18 英华达(上海)电子有限公司 Method and apparatus for implementing emotion adjustment in a mobile terminal
CN103543979A (zh) * 2012-07-17 2014-01-29 联想(北京)有限公司 Voice output method, voice interaction method, and electronic device
US20140046660A1 (en) * 2012-08-10 2014-02-13 Yahoo! Inc Method and system for voice based mood analysis
CN103903627A (zh) * 2012-12-27 2014-07-02 中兴通讯股份有限公司 Voice data transmission method and apparatus
CN104113634A (zh) * 2013-04-22 2014-10-22 三星电子(中国)研发中心 Voice processing method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108494952A (zh) * 2018-03-05 2018-09-04 广东欧珀移动通信有限公司 Voice call processing method and related device
CN108494952B (zh) * 2018-03-05 2021-07-09 Oppo广东移动通信有限公司 Voice call processing method and related device
CN109697290A (zh) * 2018-12-29 2019-04-30 咪咕数字传媒有限公司 Information processing method, device, and computer storage medium
CN109697290B (zh) * 2018-12-29 2023-07-25 咪咕数字传媒有限公司 Information processing method, device, and computer storage medium
CN111833907A (zh) * 2020-01-08 2020-10-27 北京嘀嘀无限科技发展有限公司 Human-computer interaction method, terminal, and computer-readable storage medium

Also Published As

Publication number Publication date
CN105741854A (zh) 2016-07-06

Similar Documents

Publication Publication Date Title
WO2016090762A1 (zh) Method for processing a voice signal, terminal, and computer storage medium
CN109801644B (zh) Method and apparatus for separating mixed sound signals, electronic device, and readable medium
CN104616666B (zh) Method and apparatus for improving dialogue communication based on voice analysis
WO2016180100A1 (zh) Method and apparatus for improving audio processing performance
US10270736B2 (en) Account adding method, terminal, server, and computer storage medium
CN105489221A (zh) Voice recognition method and apparatus
CN104468930B (zh) Method and apparatus for adjusting playback loudness
WO2016023317A1 (zh) Voice information processing method and terminal
WO2020253128A1 (zh) Communication service method and apparatus based on voice recognition, computer device, and storage medium
CN104883437B (zh) Method and system for adjusting prompt volume through environment-based voice analysis
US11062708B2 (en) Method and apparatus for dialoguing based on a mood of a user
WO2016183961A1 (zh) Interface switching method, system, and device for a smart device, and non-volatile computer storage medium
KR101559364B1 (ko) Mobile device for face-to-face interaction monitoring, interaction monitoring method using the same, interaction monitoring system including the same, and interaction monitoring mobile application executed thereby
CN105120063A (zh) Volume prompting method for input voice and electronic device
CN108665889B (zh) Voice signal endpoint detection method, apparatus, device, and storage medium
CN104412258A (zh) Method and apparatus for communication using text information
WO2017181615A1 (zh) Method and apparatus for handling calls from unknown numbers, and mobile terminal
CN104202485B (zh) Secure call method and apparatus, and mobile terminal
CN110931028B (zh) Voice processing method and apparatus, and electronic device
CN104078045A (zh) Recognition method and electronic device
US10884696B1 (en) Dynamic modification of audio signals
WO2018032760A1 (zh) Voice information processing method and apparatus
CN110970015B (zh) Voice processing method and apparatus, and electronic device
TW201637003A (zh) Audio processing system
US20210082405A1 (en) Method for Location Reminder and Electronic Device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15868225

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15868225

Country of ref document: EP

Kind code of ref document: A1