CN110364177A - Speech processing method, mobile terminal and computer-readable storage medium - Google Patents

Speech processing method, mobile terminal and computer-readable storage medium

Info

Publication number
CN110364177A
CN110364177A (application CN201910623577.0A)
Authority
CN
China
Prior art keywords
speech signal
mobile terminal
subject
vocal tract model
pitch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910623577.0A
Other languages
Chinese (zh)
Inventor
王冲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nubia Technology Co Ltd filed Critical Nubia Technology Co Ltd
Priority to CN201910623577.0A
Publication of CN110364177A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 — Speech recognition
    • G10L 15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00–G10L 21/00
    • G10L 25/03 — Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00–G10L 21/00, characterised by the type of extracted parameters
    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00–G10L 21/00
    • G10L 25/75 — Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00–G10L 21/00, for modelling vocal tract parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)

Abstract

This application provides a speech processing method, comprising: collecting a subject speech signal or an object speech signal through a microphone; when the mobile terminal is in a first operating mode, analyzing the vocal tract model of the subject speech signal to extract vocal tract model parameters of the subject speech signal; when the mobile terminal is in a second operating mode, analyzing the pitch of the object speech signal to extract a pitch feature of the object speech signal; and processing the object speech signal based on the vocal tract model parameters and the pitch feature to obtain a target speech signal. The application also provides a mobile terminal and a computer-readable storage medium. In this way, the object speech signal can be processed based on the extracted vocal tract model parameters of the subject speech signal and the pitch feature of the object speech signal, yielding a target speech signal that matches the vocal tract model of the subject speech signal while keeping the pitch feature of the object speech signal unchanged, which improves the user experience.

Description

Speech processing method, mobile terminal and computer-readable storage medium
Technical field
This application relates to the field of speech processing technology, and in particular to a speech processing method, a mobile terminal, and a computer-readable storage medium.
Background art
A speech signal usually carries characteristic features by which a listener can distinguish speakers. Because people's hearing abilities differ, the same speech rate may feel too fast to some listeners, so that they cannot hear clearly, while it may feel too slow to others, so that they feel time is being wasted. A mobile terminal therefore needs to adapt the intonation and speech rate of a received speech signal to the listener's actual needs.
However, changing the speech rate simply by speeding up or slowing down playback usually produces a playback speed that does not match the user's listening habits, degrading the user experience. Moreover, changing the speech rate also changes the intonation, so the user can no longer identify the speaker from the modified speech signal, which seriously affects the user's impression.
Summary of the invention
The main purpose of this application is to provide a speech processing method, a mobile terminal, and a computer-readable storage medium that process an object speech signal based on the extracted vocal tract model parameters of a subject speech signal and the pitch feature of the object speech signal, so as to obtain a target speech signal that matches the vocal tract model of the subject speech signal while keeping the pitch feature of the object speech signal unchanged — that is, a target speech signal that matches the user's speech rate while preserving the speaker's intonation — thereby improving the user experience.
To achieve the above object, this application provides a speech processing method applied to a mobile terminal configured with a microphone. The speech processing method comprises: collecting a subject speech signal or an object speech signal through the microphone; when the mobile terminal is in a first operating mode, analyzing the vocal tract model of the subject speech signal to extract vocal tract model parameters of the subject speech signal; when the mobile terminal is in a second operating mode, analyzing the pitch of the object speech signal to extract a pitch feature of the object speech signal; and processing the object speech signal based on the vocal tract model parameters and the pitch feature to obtain a target speech signal.
Optionally, the mobile terminal is further configured with a switch unit, and after the step of collecting the subject speech signal or the object speech signal through the microphone, the method further comprises: obtaining the operating mode of the mobile terminal, where the mobile terminal is in the first operating mode when the switch unit is in the on state, and in the second operating mode when the switch unit is in the off state.
Optionally, the step of analyzing the vocal tract model of the subject speech signal comprises: analyzing the vocal tract model of the subject speech signal using a pitch-synchronous overlap-add algorithm or a waveform-similarity overlap-add algorithm.
Optionally, the vocal tract model parameters of the subject speech signal include at least instantaneous amplitude, instantaneous frequency, and instantaneous phase.
Optionally, the step of analyzing the pitch of the object speech signal comprises: analyzing the pitch of the object speech signal using the pitch-synchronous overlap-add algorithm or the waveform-similarity overlap-add algorithm.
Optionally, the speech rate of the target speech signal is the same as that of the subject speech signal, the speech rate of the target speech signal differs from that of the object speech signal, and the intonation of the target speech signal is the same as that of the object speech signal.
Optionally, the step of processing the object speech signal based on the vocal tract model parameters and the pitch feature to obtain the target speech signal comprises: convolving the pitch feature with the vocal tract model parameters, and passing the convolved speech signal through a radiation model to obtain the target speech signal.
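The convolve-then-radiate step can be pictured with the classical source-filter model of speech production. The sketch below is a minimal illustration under assumed signals — an impulse-train pitch excitation and a decaying exponential standing in for a vocal tract impulse response; the function name and the first-difference radiation model are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def synthesize(pitch_excitation, vocal_tract_ir):
    """Source-filter sketch: convolve a pitch excitation with a
    vocal tract impulse response, then apply a simple radiation
    model (first-order difference, a common approximation)."""
    glottal_out = np.convolve(pitch_excitation, vocal_tract_ir)
    radiated = np.diff(glottal_out, prepend=0.0)  # lip-radiation high-pass
    return radiated

# Toy excitation: impulses every 80 samples (a constant pitch period)
excitation = np.zeros(400)
excitation[::80] = 1.0
tract_ir = np.exp(-np.arange(64) / 8.0)  # decaying resonance stand-in
target = synthesize(excitation, tract_ir)
print(target.shape)  # (463,) = 400 + 64 - 1 from the full convolution
```

Swapping in a different excitation (the object's pitch feature) while keeping the filter (the subject's vocal tract parameters) is the intuition behind combining the two analyses.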
Optionally, the mobile terminal is further configured with an earphone, and after the step of processing the object speech signal based on the vocal tract model parameters and the pitch feature to obtain the target speech signal, the method further comprises: outputting the target speech signal through the earphone.
This application also provides a mobile terminal, comprising: a processor; and a memory connected to the processor, the memory containing control instructions which, when read by the processor, cause the mobile terminal to implement the above speech processing method.
This application also provides a computer-readable storage medium storing one or more programs which, when executed by one or more processors, implement the above speech processing method.
With the speech processing method, mobile terminal, and computer-readable storage medium provided by this application, a subject speech signal or an object speech signal is first collected through the microphone; when the mobile terminal is in the first operating mode, the vocal tract model of the subject speech signal is analyzed and its vocal tract model parameters are extracted; when the mobile terminal is in the second operating mode, the pitch of the object speech signal is analyzed and its pitch feature is extracted; the object speech signal is then processed based on the vocal tract model parameters and the pitch feature to obtain a target speech signal that matches the vocal tract model of the subject speech signal while keeping the pitch feature of the object speech signal unchanged, improving the user experience. Further, the speech rate of the target speech signal is the same as that of the subject speech signal and differs from that of the object speech signal, while the intonation of the target speech signal is the same as that of the object speech signal; the resulting target speech signal thus matches the user's speech rate while its intonation remains identical to the speaker's, so the user can follow the speaker's speech at his or her accustomed rate, preserving the user's subjective experience.
The above is only an overview of the technical solution of the present invention. To make the technical means of the present invention clearer, so that it can be implemented in accordance with the contents of the specification, and to make the above and other objects, features, and advantages of the present invention more comprehensible, specific embodiments of the present invention are set forth below.
Brief description of the drawings
Fig. 1 is a schematic hardware diagram of an optional mobile terminal for implementing the embodiments of this application;
Fig. 2 is a schematic diagram of a communication network system for the mobile terminal shown in Fig. 1;
Fig. 3 is a flowchart of the speech processing method provided by an embodiment of this application;
Fig. 4 is a schematic framework diagram of speech processing performed by the mobile terminal provided by an embodiment of this application;
Fig. 5 is a structural diagram of the mobile terminal provided by an embodiment of this application.
The realization, functional characteristics, and advantages of this application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described here are merely illustrative of the present invention and are not intended to limit it.
In the following description, suffixes such as "module", "component", or "unit" attached to an element are used only to facilitate the description of the invention and have no specific meaning in themselves; "module", "component", and "unit" may therefore be used interchangeably.
A terminal may be implemented in various forms. For example, the terminals described in the present invention may include mobile terminals such as mobile phones, tablet computers, laptops, palmtop computers, personal digital assistants (PDA), portable media players (PMP), navigation devices, wearable devices, smart bracelets, pedometers, and hearing-aid devices, as well as fixed terminals such as digital TVs and desktop computers.
The following description takes a mobile terminal as an example; those skilled in the art will appreciate that, apart from elements used specifically for mobile purposes, the constructions of the embodiments of the present invention can also be applied to fixed-type terminals.
Referring to Fig. 1, a schematic hardware diagram of a mobile terminal for implementing the embodiments of the present invention, the mobile terminal 100 may include components such as an RF (Radio Frequency) unit 101, a WiFi module 102, an audio output unit 103, an A/V (audio/video) input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, a processor 110, and a power supply 111. Those skilled in the art will understand that the structure shown in Fig. 1 does not limit the mobile terminal; the mobile terminal may include more or fewer components than shown, combine certain components, or arrange components differently.
The components of the mobile terminal 100 are described below with reference to Fig. 1:
The radio frequency unit 101 may be used to receive and send signals during messaging or a call; specifically, it receives downlink information from a base station and passes it to the processor 110 for handling, and sends uplink data to the base station. In general, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 may also communicate with networks and other devices by wireless communication, which may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplexing-Long Term Evolution), and TDD-LTE (Time Division Duplexing-Long Term Evolution).
WiFi is a short-range wireless transmission technology. Through the WiFi module 102, the mobile terminal can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although Fig. 1 shows the WiFi module 102, it is understood that it is not an essential part of the mobile terminal and may be omitted as needed without changing the essence of the invention.
When the mobile terminal 100 is in a mode such as a call-signal reception mode, a call mode, a recording mode, a speech recognition mode, or a broadcast reception mode, the audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102, or stored in the memory 109, into an audio signal and output it as sound. Moreover, the audio output unit 103 may also provide audio output related to a specific function performed by the mobile terminal 100 (for example, a call-signal reception sound or a message reception sound). The audio output unit 103 may include a loudspeaker, a buzzer, and the like.
The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The GPU 1041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The processed image frames may be displayed on the display unit 106, stored in the memory 109 (or another storage medium), or sent via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 can receive sound (audio data) in operating modes such as a call mode, a recording mode, or a speech recognition mode, and can process such sound into audio data. In the call mode, the processed audio (voice) data can be converted into a format transmittable to a mobile communication base station via the radio frequency unit 101. The microphone 1042 may implement various noise-cancellation (or suppression) algorithms to eliminate (or suppress) noise or interference generated while sending and receiving audio signals.
The mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor; the ambient light sensor can adjust the brightness of the display panel 1061 according to the ambient light, and the proximity sensor can turn off the display panel 1061 and/or the backlight when the mobile terminal 100 is moved to the ear. As a kind of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used for applications that recognize phone posture (such as portrait/landscape switching, related games, and magnetometer pose calibration) and for vibration-recognition functions (such as pedometer and tapping). The phone may also be configured with other sensors such as a fingerprint sensor, pressure sensor, iris sensor, molecular sensor, gyroscope, barometer, hygrometer, thermometer, and infrared sensor, which are not described in detail here.
The display unit 106 is used to display information input by the user or information provided to the user. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The user input unit 107 may be used to receive input numeric or character information and to generate key-signal input related to user settings and function control of the mobile terminal. Specifically, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also called a touch screen, collects the user's touch operations on or near it (for example, operations by the user on or near the touch panel 1071 with a finger, a stylus, or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. The touch panel 1071 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch orientation and the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 110, and receives and executes commands sent by the processor 110. The touch panel 1071 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may also include other input devices 1072, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, without specific limitation here.
Further, the touch panel 1071 may cover the display panel 1061. When the touch panel 1071 detects a touch operation on or near it, it transmits the operation to the processor 110 to determine the type of touch event, and the processor 110 then provides a corresponding visual output on the display panel 1061 according to the type of touch event. Although in Fig. 1 the touch panel 1071 and the display panel 1061 implement the input and output functions of the mobile terminal as two independent components, in some embodiments the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the mobile terminal, without specific limitation here.
The interface unit 108 serves as an interface through which at least one external device can connect to the mobile terminal 100. For example, external devices may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device with an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and so on. The interface unit 108 may be used to receive input from an external device (for example, data information or power) and transfer the received input to one or more elements within the mobile terminal 100, or to transmit data between the mobile terminal 100 and an external device.
The memory 109 may be used to store software programs and various data. The memory 109 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required for at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created according to the use of the phone (such as audio data and a phone book). In addition, the memory 109 may include high-speed random access memory and may also include non-volatile memory, such as at least one disk storage device, a flash memory device, or other solid-state storage devices.
The processor 110 is the control center of the mobile terminal. It connects all parts of the entire mobile terminal through various interfaces and lines, and executes the various functions of the mobile terminal and processes data by running or executing the software programs and/or modules stored in the memory 109 and invoking the data stored in the memory 109, thereby monitoring the mobile terminal as a whole. The processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, and application programs, and the modem processor mainly handles wireless communication. It is understood that the modem processor may also not be integrated into the processor 110.
The mobile terminal 100 may also include a power supply 111 (such as a battery) that supplies power to the components. Preferably, the power supply 111 may be logically connected to the processor 110 through a power management system, so as to implement functions such as charging management, discharging management, and power-consumption management through the power management system.
Although not shown in Fig. 1, the mobile terminal 100 may also include a Bluetooth module and the like, which are not described in detail here.
To facilitate understanding of the embodiments of the present invention, the communication network system on which the mobile terminal of the present invention is based is described below.
Referring to Fig. 2, Fig. 2 is an architecture diagram of a communication network system provided by an embodiment of the present invention. The communication network system is an LTE system of universal mobile communication technology, and includes, in successive communication connection, a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an operator's IP services 204.
Specifically, the UE 201 may be the above-described terminal 100, which is not repeated here.
The E-UTRAN 202 includes an eNodeB 2021, other eNodeBs 2022, and so on. The eNodeB 2021 may be connected to the other eNodeBs 2022 by backhaul (for example, an X2 interface); the eNodeB 2021 is connected to the EPC 203 and can provide the UE 201 with access to the EPC 203.
The EPC 203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and so on. The MME 2031 is a control node that handles signaling between the UE 201 and the EPC 203, providing bearer and connection management. The HSS 2032 provides registers for managing functions such as the home location register (not shown) and stores user-specific information such as service features and data rates. All user data may be sent through the SGW 2034; the PGW 2035 may provide IP address allocation for the UE 201 and other functions; and the PCRF 2036 is the policy and charging control decision point for service data flows and IP bearer resources, selecting and providing available policy and charging control decisions for the policy and charging enforcement function unit (not shown).
The IP services 204 may include the Internet, an intranet, an IMS (IP Multimedia Subsystem), or other IP services.
Although the above description takes the LTE system as an example, those skilled in the art should know that the present invention applies not only to the LTE system but also to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems, without limitation here.
Based on the above mobile terminal hardware structure and communication network system, the method embodiments of the present invention are proposed.
Fig. 3 is a flowchart of an embodiment of the speech processing method provided by this application. Once the method of this embodiment is triggered by the user, the process in this embodiment runs automatically on the mobile terminal 100, where the steps may be carried out in sequence as in the flowchart, or several steps may be carried out simultaneously according to the actual situation, without limitation here. Referring also to Fig. 4, the mobile terminal 100 is configured with a microphone 205. The speech processing method provided by this application includes the following steps:
Step S310: collecting a subject speech signal or an object speech signal through the microphone 205;
Step S330: when the mobile terminal 100 is in the first operating mode, analyzing the vocal tract model of the subject speech signal to extract vocal tract model parameters of the subject speech signal;
Step S350: when the mobile terminal 100 is in the second operating mode, analyzing the pitch of the object speech signal to extract a pitch feature of the object speech signal;
Step S370: processing the object speech signal based on the vocal tract model parameters and the pitch feature to obtain a target speech signal.
Through the above embodiment, the object speech signal is processed based on the extracted vocal tract model parameters of the subject speech signal and the pitch feature of the object speech signal, so as to obtain a target speech signal that matches the vocal tract model of the subject speech signal while keeping the pitch feature of the object speech signal unchanged, improving the user experience.
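As a rough illustration of the two-mode flow of steps S310–S370, the sketch below dispatches collected frames by operating mode and combines the two analyses at synthesis time. All names (`Terminal`, `extract_vocal_tract`, `extract_pitch`, `synthesize`) and the placeholder analyses are hypothetical — the patent describes the flow but specifies no code.

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    FIRST = 1   # switch unit on: analyze the subject's speech (S330)
    SECOND = 2  # switch unit off: analyze the object's speech (S350)

@dataclass
class Terminal:
    tract_params: list = None
    pitch_feature: list = None

    def on_frame(self, mode: Mode, samples: list) -> None:
        """S310 has already collected `samples`; route by mode."""
        if mode is Mode.FIRST:
            self.tract_params = self.extract_vocal_tract(samples)
        else:
            self.pitch_feature = self.extract_pitch(samples)

    def extract_vocal_tract(self, samples):   # placeholder analysis
        return [max(samples), min(samples)]

    def extract_pitch(self, samples):         # placeholder analysis
        return [sum(samples) / len(samples)]

    def synthesize(self):                     # S370 needs both analyses
        if self.tract_params is None or self.pitch_feature is None:
            raise RuntimeError("need both analyses before synthesis")
        return self.tract_params + self.pitch_feature

t = Terminal()
t.on_frame(Mode.FIRST, [0.1, 0.5, -0.2])
t.on_frame(Mode.SECOND, [0.2, 0.4, 0.0])
print(len(t.synthesize()))  # 3
```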
The above steps are described in detail below in conjunction with specific embodiments.
In step S310, a subject speech signal or an object speech signal is collected through the microphone 205.
Specifically, the subject speech signal is the speech signal of the user, that is, the listener. The object speech signal is the speech signal of the speaker.
In this embodiment, collection of the subject speech signal and the object speech signal can be realized by monitoring call services and application programs with a voice playback function; the signal may be the speech of a phone call being made or answered, or speech played by an application program with a voice playback function. It should be noted that the language of the speech signal in the present invention may be Chinese, English, Japanese, German, French, and so on; the language of the speech signal is not limited here. In other embodiments, the subject speech signal or the object speech signal may also be collected by other voice collection units.
In this embodiment, since the collected subject speech signal or object speech signal may be contaminated by noise introduced by the surrounding environment and the transmission medium, the received speech signal is not the pure original speech signal but a noisy speech signal. Therefore, after the subject speech signal or the object speech signal is collected, it may first be filtered to obtain a pure original speech signal.
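The patent only states that the collected signal may first be filtered; it does not name a filter. As one minimal illustration under that assumption, a moving-average low-pass can attenuate broadband noise before analysis — the filter choice below is purely demonstrative, not the patent's design.

```python
import numpy as np

def moving_average_denoise(x, width=5):
    """Minimal pre-filtering sketch: a moving-average low-pass to
    attenuate broadband noise before further speech analysis."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 1000))
noisy = clean + 0.3 * rng.standard_normal(1000)
filtered = moving_average_denoise(noisy)
# Filtering should reduce the error relative to the clean signal
print(np.mean((filtered - clean) ** 2) < np.mean((noisy - clean) ** 2))  # True
```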
In this embodiment, the mobile terminal 100 is further configured with a switch unit 300, and after the step of collecting the subject speech signal or the object speech signal through the microphone 205, the method comprises:
Step S3101: obtaining the operating mode of the mobile terminal 100, where the mobile terminal 100 is in the first operating mode when the switch unit 300 is in the on state, and in the second operating mode when the switch unit 300 is in the off state.
Specifically, referring to Fig. 4, the subject may be the user, that is, the listener, and the object may be the speaker. In one embodiment, the subject speech signal uttered by the subject may propagate through air or another medium without any processing and be received by the object. In another embodiment, when the switch unit 300 is in different states, the mobile terminal 100 may be in two different operating modes, so that the microphone 205 can receive either the subject speech signal uttered by the subject itself or the object speech signal uttered by the object. Specifically, when the switch unit 300 is in the on state, the mobile terminal 100 is in the first operating mode and only the subject speech signal is collected through the microphone 205, so that subsequently only the vocal tract model of the subject speech signal entering the signal processing unit 200 is analyzed, and the vocal tract model parameters of the subject speech signal are extracted. When the switch unit 300 is in the off state, the mobile terminal 100 is in the second operating mode and only the object speech signal is collected through the microphone 205, so that subsequently only the pitch of the object speech signal entering the signal processing unit 200 is analyzed, and the pitch feature of the object speech signal is extracted.
In step S330, when the mobile terminal 100 is in the first operating mode, the channel model of the main body voice signal is analyzed to extract the channel model parameters of the main body voice signal.
Specifically, the channel model can determine the speech rate of the main body voice signal. Speech rate is a property unique to human speech: when a person uses vocabulary to communicate or convey information, the speech rate is the amount of vocabulary uttered per unit time. In the present embodiment, since each person speaks with different characteristics, the speech rate of each person's voice signal is generally different. It will be appreciated that in other embodiments, external factors such as imitation may cause the speech rates of an imitator and the person being imitated to be roughly the same.
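The definition above, vocabulary uttered per unit time, is directly computable. A minimal sketch, assuming word counts are supplied by some upstream segmentation (the patent does not say how words are counted, and the figures below are hypothetical):

```python
def speech_rate(n_words, duration_s):
    """Speech rate as defined in the text: words uttered per second."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return n_words / duration_s

# Hypothetical figures: a slower listener vs. a faster speaker.
listener_rate = speech_rate(90, 60.0)    # 1.5 words/s
speaker_rate = speech_rate(180, 60.0)    # 3.0 words/s
slowdown = speaker_rate / listener_rate  # stretch factor to match the listener
```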
In the present embodiment, the channel model parameters of the main body voice signal include at least the instantaneous amplitude, the instantaneous frequency, and the instantaneous phase. When the switch unit 300 is in the conducting state, the mobile terminal 100 is in the first operating mode; the channel model of the main body voice signal is analyzed, and parameters such as the instantaneous amplitude, instantaneous frequency, and instantaneous phase of the main body voice signal are extracted and saved.
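One standard way to obtain exactly these three parameters is through the analytic signal (a Hilbert transform). This is an illustrative assumption: the patent lists the parameters but does not say how they are computed.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (a numpy-only Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)

def channel_model_parameters(x, fs):
    """Instantaneous amplitude, phase, and frequency of a voice frame --
    the three parameters the embodiment lists."""
    z = analytic_signal(x)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    freq = np.diff(phase) * fs / (2 * np.pi)  # in Hz, one sample shorter
    return amplitude, phase, freq
```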
In the present embodiment, the step of analyzing the channel model of the main body voice signal includes:
Step S3301: analyze the channel model of the main body voice signal using a pitch-synchronous overlap-add algorithm or a waveform-similarity overlap-add algorithm.
Specifically, the signal processing unit 200 of the mobile terminal 100 may use the pitch-synchronous overlap-add (PSOLA) algorithm or the waveform-similarity overlap-add (WSOLA) algorithm to analyze the channel model of the main body voice signal. It will be appreciated that in other embodiments, the signal processing unit 200 of the mobile terminal 100 may use other algorithms to analyze the channel model of the main body voice signal.
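For concreteness, the sketch below shows the core of a PSOLA-style time-scale modification: pitch-synchronous Hann-windowed grains are extracted around given pitch marks and overlap-added at stretched intervals, changing the speech rate without shifting the pitch. Pitch marks are assumed to be supplied by an earlier analysis stage, and all names are illustrative; the patent cites PSOLA/WSOLA but gives no implementation details.

```python
import numpy as np

def psola_time_stretch(x, pitch_marks, factor):
    """Stretch (factor > 1) or compress (factor < 1) a voiced signal
    pitch-synchronously: each synthesis instant copies the Hann-windowed
    grain from the nearest analysis pitch mark, then hops one local
    pitch period, so the fundamental frequency is preserved."""
    pm = np.asarray(pitch_marks)
    periods = np.diff(pm)
    out_len = int(len(x) * factor)
    out = np.zeros(out_len + 2 * int(periods.max()))
    t = float(pm[1]) * factor
    while t < out_len:
        i = int(np.argmin(np.abs(pm - t / factor)))  # nearest analysis mark
        i = min(max(i, 1), len(pm) - 2)
        p = int(pm[i])
        half = int(min(p - pm[i - 1], pm[i + 1] - p))  # one local period
        c = int(t)
        if c - half >= 0 and p - half >= 0 and p + half <= len(x):
            grain = x[p - half:p + half] * np.hanning(2 * half)
            out[c - half:c + half] += grain  # overlap-add
        t += periods[i]  # advance one local pitch period
    return out[:out_len]
```

Because the two-period Hann grains are re-added at one-period hops, overlapping windows sum to unity and a periodic signal is reconstructed at its original pitch, only longer or shorter in time.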
In step S350, when the mobile terminal 100 is in the second operating mode, the fundamental tone of the object voice signal is analyzed to extract the fundamental tone feature of the object voice signal.
Specifically, the fundamental tone can determine the intonation of the object voice signal. Intonation is the pitch of speech: a high-pitched voice sounds light, short, and thin, while a low-pitched voice sounds heavy, long, and thick. In the present embodiment, since each person speaks with different characteristics, the intonation of each person's voice signal is generally different. It will be appreciated that in other embodiments, external factors such as imitation may cause the intonations of an imitator and the person being imitated to be roughly the same.
In the present embodiment, the step of analyzing the fundamental tone of the object voice signal includes:
Step S3501: analyze the fundamental tone of the object voice signal using the pitch-synchronous overlap-add algorithm or the waveform-similarity overlap-add algorithm.
Specifically, the signal processing unit 200 of the mobile terminal 100 may use the pitch-synchronous overlap-add algorithm or the waveform-similarity overlap-add algorithm to analyze the fundamental tone of the object voice signal. It will be appreciated that in other embodiments, the signal processing unit 200 of the mobile terminal 100 may use other algorithms to analyze the fundamental tone of the object voice signal.
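As one illustration of the fundamental-tone analysis (the text names PSOLA/WSOLA, but notes any estimator could supply the fundamental tone), an autocorrelation-based F0 estimate for a single voiced frame might look like the following. The search range and function name are assumptions:

```python
import numpy as np

def estimate_pitch(frame, fs, fmin=50.0, fmax=500.0):
    """Fundamental-frequency estimate for one voiced frame: pick the
    autocorrelation peak inside the plausible pitch-lag range."""
    frame = frame - np.mean(frame)
    # One-sided autocorrelation (lag 0 .. len(frame)-1).
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag
```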
In step S370, the object voice signal is processed based on the channel model parameters and the fundamental tone feature to obtain the target voice signal.
In the present embodiment, the step of processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain the target voice signal includes:
Step S3701: perform convolution processing on the fundamental tone feature using the channel model parameters, and pass the convolved voice signal through a radiation model to obtain the target voice signal.
Specifically, the speech rate of the target voice signal is the same as the speech rate of the main body voice signal and different from the speech rate of the object voice signal, while the intonation of the target voice signal is the same as the intonation of the object voice signal and different from the intonation of the main body voice signal.
In the present embodiment, the signal processing unit 200 of the mobile terminal 100 may perform convolution processing on the fundamental tone feature using the channel model parameters and pass the convolved voice signal through a radiation model, thereby obtaining a target voice signal that matches the channel model of the main body voice signal while keeping the fundamental tone feature of the object voice signal unchanged; that is, a target voice signal whose speech rate is the same as that of the main body voice signal and whose intonation is the same as that of the object voice signal. The user can therefore follow the speaker's voice information at his or her own accustomed speech rate, which safeguards the user's subjective perception and improves the user experience. For example, when the user, that is, the listener, is an elderly person, the elderly person's comprehension and speech rate may be slow while a typical speaker's rate may be fast; in this case, the channel model of the elderly person's voice signal and the fundamental tone feature of the speaker's voice signal can be extracted, and the fundamental tone feature processed using the channel model parameters, to obtain a target voice signal that matches the listener's speech-rate habit while keeping the speaker's intonation unchanged, ensuring that the listener can understand the speaker's voice information at his or her own accustomed rate.
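Step S3701's "convolve the fundamental tone feature with the channel model parameters, then radiate" matches the classical source-filter model of speech. The sketch below is one interpretation under that model: a pulse-train excitation at the object's fundamental frequency is convolved with an impulse response standing in for the main body's channel model, then passed through a first-difference lip-radiation filter. The patent says only "radiation", so the radiation filter, the pulse-train excitation, and every name here are assumptions:

```python
import numpy as np

def pitch_pulse_train(f0_hz, fs, n_samples):
    """Excitation built from the pitch feature: one impulse per period."""
    e = np.zeros(n_samples)
    e[::max(1, int(round(fs / f0_hz)))] = 1.0
    return e

def synthesize_target(excitation, vocal_tract_ir, lip_radiation=True):
    """Source-filter synthesis: convolve the excitation with a vocal-tract
    impulse response, then apply H(z) = 1 - z^-1 as a radiation model."""
    y = np.convolve(excitation, vocal_tract_ir)
    if lip_radiation:
        y = np.diff(y, prepend=0.0)  # first difference, length preserved
    return y
```

In this reading, the output keeps the object's pitch (carried by the pulse spacing) while the filter, derived from the main body's channel model parameters, shapes timing and timbre.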
In the present embodiment, the mobile terminal 100 is also configured with an earphone 206, and after the step of processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain the target voice signal, the method further includes:
Step S3702: output the target voice information through the earphone 206.
Specifically, after obtaining the target voice information, the mobile terminal 100 can output the target voice information through the earphone 206 and deliver it to the listener.
Through the above embodiments, the main body voice signal or the object voice signal is first acquired through the microphone 205; when the mobile terminal 100 is in the first operating mode, the channel model of the main body voice signal is analyzed and the channel model parameters of the main body voice signal are extracted; when the mobile terminal 100 is in the second operating mode, the fundamental tone of the object voice signal is analyzed and the fundamental tone feature of the object voice signal is extracted; the object voice signal is then processed based on the channel model parameters and the fundamental tone feature to obtain the target voice signal, i.e., a target voice signal that matches the channel model of the main body voice signal while keeping the fundamental tone feature of the object voice signal unchanged, which improves the user experience. Further, the speech rate of the target voice signal is the same as that of the main body voice signal and different from that of the object voice signal, and the intonation of the target voice signal is the same as that of the object voice signal; thus the speech rate of the obtained target voice signal matches the user's speech rate while the intonation remains the speaker's, allowing the user to follow the speaker's voice information at his or her own accustomed rate and safeguarding the user's subjective perception.
Fig. 5 is a schematic structural diagram of the mobile terminal 100 provided by the embodiments of the present application. The mobile terminal 100 includes: a processor 110; and a memory 109 connected to the processor 110, the memory 109 containing control instructions which, when read by the processor 110, control the mobile terminal 100 to implement the following steps:
acquiring the main body voice signal or the object voice signal through a microphone configured on the mobile terminal 100; when the mobile terminal 100 is in the first operating mode, analyzing the channel model of the main body voice signal to extract the channel model parameters of the main body voice signal; when the mobile terminal 100 is in the second operating mode, analyzing the fundamental tone of the object voice signal to extract the fundamental tone feature of the object voice signal; and processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain the target voice signal.
Optionally, the mobile terminal 100 is also configured with a switch unit, and after the step of acquiring the main body voice signal or the object voice signal through the microphone, the method further includes: obtaining the operating mode of the mobile terminal 100; when the switch unit is in the conducting state, the mobile terminal 100 is in the first operating mode, and when the switch unit is in the off state, the mobile terminal 100 is in the second operating mode.
Optionally, the step of analyzing the channel model of the main body voice signal includes: analyzing the channel model of the main body voice signal using a pitch-synchronous overlap-add algorithm or a waveform-similarity overlap-add algorithm.
Optionally, the channel model parameters of the main body voice signal include at least the instantaneous amplitude, the instantaneous frequency, and the instantaneous phase.
Optionally, the step of analyzing the fundamental tone of the object voice signal includes: analyzing the fundamental tone of the object voice signal using the pitch-synchronous overlap-add algorithm or the waveform-similarity overlap-add algorithm.
Optionally, the speech rate of the target voice signal is the same as the speech rate of the main body voice signal and different from the speech rate of the object voice signal, and the intonation of the target voice signal is the same as the intonation of the object voice signal.
Optionally, the step of processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain the target voice signal includes: performing convolution processing on the fundamental tone feature using the channel model parameters, and passing the convolved voice signal through a radiation model to obtain the target voice signal.
Optionally, the mobile terminal 100 is also configured with an earphone, and after the step of processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain the target voice signal, the method further includes: outputting the target voice information through the earphone.
Through the above mobile terminal 100, the main body voice signal or the object voice signal is first acquired through the microphone; when the mobile terminal 100 is in the first operating mode, the channel model of the main body voice signal is analyzed and the channel model parameters of the main body voice signal are extracted; when the mobile terminal 100 is in the second operating mode, the fundamental tone of the object voice signal is analyzed and the fundamental tone feature of the object voice signal is extracted; the object voice signal is then processed based on the channel model parameters and the fundamental tone feature to obtain the target voice signal, i.e., a target voice signal that matches the channel model of the main body voice signal while keeping the fundamental tone feature of the object voice signal unchanged, which improves the user experience. Further, the speech rate of the target voice signal is the same as that of the main body voice signal and different from that of the object voice signal, and the intonation of the target voice signal is the same as that of the object voice signal; thus the speech rate of the obtained target voice signal matches the user's speech rate while the intonation remains the speaker's, allowing the user to follow the speaker's voice information at his or her own accustomed rate and safeguarding the user's subjective perception.
The embodiments of the present application also provide a computer-readable storage medium. The computer-readable storage medium stores one or more programs, which are executed by one or more processors to implement the following steps:
acquiring the main body voice signal or the object voice signal through a microphone configured on the mobile terminal; when the mobile terminal is in the first operating mode, analyzing the channel model of the main body voice signal to extract the channel model parameters of the main body voice signal; when the mobile terminal is in the second operating mode, analyzing the fundamental tone of the object voice signal to extract the fundamental tone feature of the object voice signal; and processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain the target voice signal.
Optionally, the mobile terminal is also configured with a switch unit, and after the step of acquiring the main body voice signal or the object voice signal through the microphone, the method further includes: obtaining the operating mode of the mobile terminal; when the switch unit is in the conducting state, the mobile terminal is in the first operating mode, and when the switch unit is in the off state, the mobile terminal is in the second operating mode.
Optionally, the step of analyzing the channel model of the main body voice signal includes: analyzing the channel model of the main body voice signal using a pitch-synchronous overlap-add algorithm or a waveform-similarity overlap-add algorithm.
Optionally, the channel model parameters of the main body voice signal include at least the instantaneous amplitude, the instantaneous frequency, and the instantaneous phase.
Optionally, the step of analyzing the fundamental tone of the object voice signal includes: analyzing the fundamental tone of the object voice signal using the pitch-synchronous overlap-add algorithm or the waveform-similarity overlap-add algorithm.
Optionally, the speech rate of the target voice signal is the same as the speech rate of the main body voice signal and different from the speech rate of the object voice signal, and the intonation of the target voice signal is the same as the intonation of the object voice signal.
Optionally, the step of processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain the target voice signal includes: performing convolution processing on the fundamental tone feature using the channel model parameters, and passing the convolved voice signal through a radiation model to obtain the target voice signal.
Optionally, the mobile terminal is also configured with an earphone, and after the step of processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain the target voice signal, the method further includes: outputting the target voice information through the earphone.
Through the above computer-readable storage medium, the main body voice signal or the object voice signal is first acquired through the microphone; when the mobile terminal is in the first operating mode, the channel model of the main body voice signal is analyzed and the channel model parameters of the main body voice signal are extracted; when the mobile terminal is in the second operating mode, the fundamental tone of the object voice signal is analyzed and the fundamental tone feature of the object voice signal is extracted; the object voice signal is then processed based on the channel model parameters and the fundamental tone feature to obtain the target voice signal, i.e., a target voice signal that matches the channel model of the main body voice signal while keeping the fundamental tone feature of the object voice signal unchanged, which improves the user experience. Further, the speech rate of the target voice signal is the same as that of the main body voice signal and different from that of the object voice signal, and the intonation of the target voice signal is the same as that of the object voice signal; thus the speech rate of the obtained target voice signal matches the user's speech rate while the intonation remains the speaker's, allowing the user to follow the speaker's voice information at his or her own accustomed rate and safeguarding the user's subjective perception.
The embodiment of the present application also provides a kind of computer readable storage mediums.Here computer readable storage medium is deposited Contain one or more program.Wherein, computer readable storage medium may include volatile memory, such as arbitrary access Memory;Memory also may include nonvolatile memory, such as read-only memory, flash memory, hard disk or solid-state are hard Disk;Memory can also include the combination of the memory of mentioned kind.
The corresponding technical features in the above embodiments may be combined with one another, provided the combination does not create a contradiction and remains implementable.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The serial numbers of the above embodiments of the present application are for description only and do not imply any ranking of the embodiments.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, or the part that contributes to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. Under the teaching of the present application, those skilled in the art can devise many further forms without departing from the purpose of the present application and the scope of protection claimed, and all of these fall within the protection of the present application.

Claims (10)

1. A speech processing method, characterized in that the speech processing method is applied to a mobile terminal configured with a microphone, the speech processing method comprising:
acquiring a main body voice signal or an object voice signal through the microphone;
when the mobile terminal is in a first operating mode, analyzing a channel model of the main body voice signal to extract channel model parameters of the main body voice signal;
when the mobile terminal is in a second operating mode, analyzing a fundamental tone of the object voice signal to extract a fundamental tone feature of the object voice signal; and
processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain a target voice signal.
2. The speech processing method according to claim 1, characterized in that the mobile terminal is also configured with a switch unit, and after the step of acquiring the main body voice signal or the object voice signal through the microphone, the method further comprises:
obtaining the operating mode of the mobile terminal, wherein when the switch unit is in a conducting state the mobile terminal is in the first operating mode, and when the switch unit is in an off state the mobile terminal is in the second operating mode.
3. The speech processing method according to claim 1, characterized in that the step of analyzing the channel model of the main body voice signal comprises:
analyzing the channel model of the main body voice signal using a pitch-synchronous overlap-add algorithm or a waveform-similarity overlap-add algorithm.
4. The speech processing method according to claim 1, characterized in that the channel model parameters of the main body voice signal include at least an instantaneous amplitude, an instantaneous frequency, and an instantaneous phase.
5. The speech processing method according to claim 3, characterized in that the step of analyzing the fundamental tone of the object voice signal comprises:
analyzing the fundamental tone of the object voice signal using the pitch-synchronous overlap-add algorithm or the waveform-similarity overlap-add algorithm.
6. The speech processing method according to claim 1, characterized in that the speech rate of the target voice signal is the same as the speech rate of the main body voice signal, the speech rate of the target voice signal is different from the speech rate of the object voice signal, and the intonation of the target voice signal is the same as the intonation of the object voice signal.
7. The speech processing method according to claim 1, characterized in that the step of processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain the target voice signal comprises:
performing convolution processing on the fundamental tone feature using the channel model parameters, and passing the convolved voice signal through a radiation model to obtain the target voice signal.
8. The speech processing method according to claim 1, characterized in that the mobile terminal is also configured with an earphone, and after the step of processing the object voice signal based on the channel model parameters and the fundamental tone feature to obtain the target voice signal, the method further comprises:
outputting the target voice information through the earphone.
9. A mobile terminal, characterized in that the mobile terminal comprises:
a processor; and
a memory connected to the processor, the memory containing control instructions which, when read by the processor, control the mobile terminal to implement the speech processing method according to any one of claims 1-8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more programs, which are executed by one or more processors to implement the speech processing method according to any one of claims 1-8.
CN201910623577.0A 2019-07-11 2019-07-11 Method of speech processing, mobile terminal and computer readable storage medium Pending CN110364177A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910623577.0A CN110364177A (en) 2019-07-11 2019-07-11 Method of speech processing, mobile terminal and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110364177A true CN110364177A (en) 2019-10-22

Family

ID=68218753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910623577.0A Pending CN110364177A (en) 2019-07-11 2019-07-11 Method of speech processing, mobile terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110364177A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0643380A2 (en) * 1993-09-10 1995-03-15 Hitachi, Ltd. Speech speed conversion method and apparatus
CN1197976A (en) * 1997-04-28 1998-11-04 苏勇 Orthoscopic speed-changing audio signal playback method and equipment
CN1270356A (en) * 1999-04-08 2000-10-18 英业达股份有限公司 Method for changing pronunciation speed
TW200529175A (en) * 2004-02-26 2005-09-01 Univ Southern Taiwan Tech Instant speech speed varying processor
CN1885405A (en) * 2005-06-22 2006-12-27 富士通株式会社 Speech speed converting device and speech speed converting method
CN101740034A (en) * 2008-11-04 2010-06-16 刘盛举 Method for realizing sound speed-variation without tone variation and system for realizing speed variation and tone variation
CN108156317A (en) * 2017-12-21 2018-06-12 广东欧珀移动通信有限公司 call voice control method, device and storage medium and mobile terminal
CN109616131A (en) * 2018-11-12 2019-04-12 南京南大电子智慧型服务机器人研究院有限公司 A kind of number real-time voice is changed voice method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination