CN106356065A - Mobile terminal and voice conversion method - Google Patents

Info

Publication number
CN106356065A
Authority
CN
China
Prior art keywords
speech
language
speaker speech
mobile terminal
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610969015.8A
Other languages
Chinese (zh)
Inventor
陈小翔
张腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nubia Technology Co Ltd
Priority to CN201610969015.8A
Publication of CN106356065A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/005 Language recognition
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72436 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for text messaging, e.g. SMS or e-mail

Abstract

The invention provides a mobile terminal which comprises an analysis module and a conversion module, wherein the analysis module is used for obtaining the speaker voice and analyzing the language habits of the speaker according to the speaker voice and judging the language type of the speaker voice according to the language habits; and the conversion module is used for performing voice recognition of the speaker voice according to the judged language type and converting the speaker voice into text content. The invention also provides a voice conversion method. Through the mobile terminal and voice conversion method provided by the invention, the speaker language can be converted into corresponding text information, and complete text content corresponding to the speaker language is generated to facilitate user lookup.

Description

Mobile terminal and voice conversion method
Technical field
The present invention relates to the field of communications and, more particularly, to a mobile terminal and a voice conversion method.
Background
With the spread of social software, more and more users hold voice chats through it. The speech used in such chats may be in any of several languages, such as a dialect, Mandarin, or a foreign language. Besides making real-time mobile communication convenient, the speech data is also a rich source of information that can serve as research material. However, because speakers' language habits vary widely, existing technology cannot convert a speaker's speech into the corresponding text content.
Summary of the invention
The invention provides a mobile terminal that can convert a speaker's speech into corresponding text information and generate complete text content corresponding to that speech, which is convenient for the user to consult. The mobile terminal includes an analysis module and a conversion module.
The analysis module is configured to obtain speaker speech, analyze the speaker's language habits according to the speaker speech, and judge the language type of the speaker speech according to the language habits.
The conversion module is configured to perform speech recognition on the speaker speech according to the judged language type and convert the speaker speech into text content.
Further, the analysis module includes a dialect recognition module, which is configured to recognize the speaker speech according to dialect habits and judge the dialect type of the speaker speech; and
the conversion module is further configured to perform speech recognition according to the judged dialect type and convert the speaker speech into text content.
Further, the analysis module also includes a foreign language recognition module, which is configured to recognize the speaker speech according to foreign language habits and judge the foreign language type of the speaker speech; and
the conversion module is further configured to perform speech recognition according to the judged foreign language type and convert the speaker speech into text content.
Further, the mobile terminal also includes:
an integration module, configured to integrate all speech of the speaker into complete voice data in chronological order, and to integrate the text content corresponding to all speech of the speaker into complete text data in chronological order.
Further, the mobile terminal also includes:
a display module, configured to display the complete text data; and
a playback module, configured to play back the complete voice data.
The mobile terminal provided by the invention determines the language type of the speaker speech by analyzing the speaker's language habits and performs targeted recognition according to that type: dialect content is recognized according to the dialect type, and foreign language content according to the foreign language type. The recognized content is converted into text, producing complete text content corresponding to the speaker's speech. This provides rich text information, is convenient for the user to consult, and helps the user with subsequent editing.
The invention also provides a voice conversion method that can convert a speaker's speech into corresponding text information and generate complete text content corresponding to that speech, which is convenient for the user to consult. The voice conversion method includes:
obtaining speaker speech, analyzing the speaker's language habits according to the speaker speech, and judging the language type of the speaker speech according to the language habits; and
performing speech recognition on the speaker speech according to the judged language type, and converting the speaker speech into text content.
Further, the voice conversion method also includes:
recognizing the speaker speech according to dialect habits, and judging the dialect type of the speaker speech; and
performing speech recognition according to the judged dialect type, and converting the speaker speech into text content.
Further, the voice conversion method also includes:
recognizing the speaker speech according to foreign language habits, and judging the foreign language type of the speaker speech; and
performing speech recognition according to the judged foreign language type, and converting the speaker speech into text content.
Further, the voice conversion method also includes:
integrating all speech of the speaker into complete voice data in chronological order; and
integrating the text content corresponding to all speech of the speaker into complete text data in chronological order.
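As an illustration of the integration step, the following minimal Python sketch sorts a speaker's segments by start time and joins the audio and the text separately. The `Utterance` structure, its field names, and the `integrate` function are assumptions introduced here for illustration; the patent does not specify data structures or an audio format, and real audio would generally need format-aware concatenation rather than a plain byte join.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Utterance:
    """One recorded speaker segment (hypothetical structure, not from the patent)."""
    start_s: float  # start time of the segment, in seconds
    audio: bytes    # audio payload of the segment (format left unspecified)
    text: str       # text already produced for this segment by the conversion step


def integrate(utterances: List[Utterance]) -> Tuple[bytes, str]:
    """Merge segments in chronological order into one audio blob and one transcript."""
    ordered = sorted(utterances, key=lambda u: u.start_s)
    full_audio = b"".join(u.audio for u in ordered)
    full_text = "\n".join(u.text for u in ordered)
    return full_audio, full_text
```

Sorting once and deriving both outputs from the same ordered list keeps the voice data and the text data aligned segment by segment, so the display and playback steps operate on material in the same order.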
Further, the voice conversion method also includes:
displaying the complete text data; and
playing back the complete voice data.
The voice conversion method provided by the invention determines the language type of the speaker speech by analyzing the speaker's language habits and performs targeted recognition according to that type: dialect content is recognized according to the dialect type, and foreign language content according to the foreign language type. The recognized content is converted into text, producing complete text content corresponding to the speaker's speech. This provides rich text information, is convenient for the user to consult, and helps the user with subsequent editing.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hardware structure of a mobile terminal implementing each embodiment of the invention;
Fig. 2 is a schematic diagram of a wireless communication system for the mobile terminal shown in Fig. 1;
Fig. 3 is a functional block diagram of the mobile terminal of embodiment one of the invention;
Fig. 4 is a functional block diagram of the mobile terminal of embodiment two of the invention;
Fig. 5 is a voice conversion diagram of the mobile terminal of embodiment three of the invention;
Fig. 6 is a flow chart of the voice conversion method of embodiment four of the invention;
Fig. 7 is a flow chart of the voice conversion method of embodiment five of the invention.
The realization of the object, the functional characteristics, and the advantages of the invention are further described below in conjunction with the embodiments and with reference to the drawings.
Detailed description
It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The mobile terminal implementing each embodiment of the invention is now described with reference to the drawings. In the following description, suffixes such as "module", "part", or "unit" used to denote elements are used only to facilitate the description of the invention and have no specific meaning in themselves. Therefore, "module" and "part" may be used interchangeably.
A mobile terminal can be implemented in various forms. For example, the terminal described in the invention may include mobile terminals such as a mobile phone, smartphone, notebook computer, digital broadcast receiver, PDA (personal digital assistant), PAD (tablet computer), PMP (portable media player), and navigation device, as well as fixed terminals such as a digital TV and a desktop computer. In the following, the terminal is assumed to be a mobile terminal. However, those skilled in the art will understand that, apart from elements used specifically for mobile purposes, the construction according to the embodiments of the invention can also be applied to fixed terminals.
Fig. 1 is the hardware architecture diagram of the mobile terminal realizing each embodiment of the present invention.
The mobile terminal 10 may include, but is not limited to, a memory 20, a controller 30, a wireless communication unit 40, an output unit 50, an input unit 60, a camera 70, a microphone 71, an interface unit 80, and a power supply unit 90. Fig. 1 shows the mobile terminal 10 with various components, but it should be understood that not all of the illustrated components are required. More or fewer components may alternatively be implemented. The elements of the mobile terminal 10 are described in detail below.
The wireless communication unit 40 generally includes one or more components that allow radio communication between the mobile terminal 10 and a wireless communication system or network. For example, the wireless communication unit may include at least one of a broadcast receiving module, a mobile communication module, a wireless Internet module, a short-range communication module, and a location information module.
The broadcast receiving module receives broadcast signals and/or broadcast-related information from an external broadcast management server via a broadcast channel. The broadcast channel may include a satellite channel and/or a terrestrial channel. The broadcast management server may be a server that generates and sends broadcast signals and/or broadcast-related information, or a server that receives previously generated broadcast signals and/or broadcast-related information and sends them to the terminal. The broadcast signal may include a TV broadcast signal, a radio broadcast signal, a data broadcast signal, and so on. The broadcast signal may further include a broadcast signal combined with a TV or radio broadcast signal. Broadcast-related information may also be provided via a mobile communication network, in which case it can be received by the mobile communication module. Broadcast-related information may exist in various forms, for example as an electronic program guide (EPG) of digital multimedia broadcasting (DMB) or an electronic service guide (ESG) of digital video broadcasting-handheld (DVB-H). The broadcast receiving module can receive signal broadcasts using various types of broadcast systems. In particular, it can receive digital broadcasts using digital broadcast systems such as digital multimedia broadcasting-terrestrial (DMB-T), digital multimedia broadcasting-satellite (DMB-S), digital video broadcasting-handheld (DVB-H), the MediaFLO (media forward link only) data broadcasting system, and integrated services digital broadcasting-terrestrial (ISDB-T). The broadcast receiving module may be constructed to suit the various broadcast systems that provide broadcast signals as well as the digital broadcast systems above. Broadcast signals and/or broadcast-related information received via the broadcast receiving module may be stored in the memory 20 (or another type of storage medium).
The mobile communication module sends radio signals to, and/or receives radio signals from, at least one of a base station (for example, an access point or a Node B), an external terminal, and a server. Such radio signals may include voice call signals, video call signals, or various types of data sent and/or received as text and/or multimedia messages.
The wireless Internet module supports wireless Internet access for the mobile terminal. This module may be internally or externally coupled to the terminal. The wireless Internet access technologies involved may include WLAN (wireless LAN) (Wi-Fi), WiBro (wireless broadband), WiMAX (worldwide interoperability for microwave access), HSDPA (high-speed downlink packet access), and so on.
The short-range communication module is a module for supporting short-range communication. Some examples of short-range communication technology include Bluetooth™, radio frequency identification (RFID), Infrared Data Association (IrDA), ultra-wideband (UWB), ZigBee™, and so on.
The location information module is a module for checking or obtaining the position information of the mobile terminal. A typical example of the location information module is GPS (global positioning system). According to current technology, the GPS module calculates distance information from three or more satellites together with accurate time information, and applies triangulation to the calculated information, thereby accurately calculating three-dimensional current position information in terms of longitude, latitude, and altitude. Currently, the method for calculating position and time information uses three satellites and corrects the error of the calculated position and time information using one additional satellite. In addition, the GPS module can calculate speed information by continuously calculating the current position information in real time.
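The statement that speed can be derived from continuously calculated positions can be illustrated with a short sketch. It assumes position fixes given as hypothetical (timestamp, latitude, longitude) tuples and uses the haversine great-circle distance on a spherical Earth, which ignores altitude; neither the function names nor this distance formula come from the patent.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speeds_mps(fixes: List[Tuple[float, float, float]]) -> List[float]:
    """Average speed (m/s) between consecutive (timestamp_s, lat, lon) fixes."""
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(fixes, fixes[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("fixes must be in strictly increasing time order")
        speeds.append(haversine_m(lat0, lon0, lat1, lon1) / dt)
    return speeds
```

Each returned value is an average over the interval between two fixes, so shorter intervals between fixes give a closer approximation to instantaneous speed.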
The output unit 50 is configured to provide output signals (for example, audio signals, video signals, alarm signals, and vibration signals) in a visual, audio, and/or tactile manner. The output unit 50 may include a display unit 51, an audio output module 52, an alarm unit 53, and so on.
The display unit 51 can display information processed in the mobile terminal 10. For example, when the mobile terminal 10 is in phone call mode, the display unit 51 can display a user interface (UI) or graphical user interface (GUI) related to the call or to other communication (for example, text messaging or multimedia file downloading). When the mobile terminal 10 is in video call mode or image capture mode, the display unit 51 can display captured and/or received images, and a UI or GUI showing the video or image and related functions.
Meanwhile, when the display unit 51 and a touch pad are superimposed on each other as layers to form a touch screen, the display unit 51 can serve as both an input device and an output device. The display unit 51 may include at least one of a liquid crystal display (LCD), a thin-film-transistor LCD (TFT-LCD), an organic light-emitting diode (OLED) display, a flexible display, a three-dimensional (3D) display, and so on. Some of these displays may be constructed to be transparent so that the user can see through them from outside; these are called transparent displays, a typical example being a TOLED (transparent organic light-emitting diode) display. Depending on the desired embodiment, the mobile terminal 10 may include two or more display units (or other display devices); for example, the mobile terminal may include an external display unit (not shown) and an internal display unit (not shown). The touch screen can be used to detect touch input pressure as well as touch input position and touch input area.
The audio output module 52 can, when the mobile terminal is in a mode such as call signal reception mode, call mode, recording mode, speech recognition mode, or broadcast reception mode, convert audio data received by the wireless communication unit 40 or stored in the memory 20 into an audio signal and output it as sound. The audio output module 52 can also provide audio output related to a specific function performed by the mobile terminal 10 (for example, a call signal reception sound or a message reception sound). The audio output module 52 may include a speaker, a buzzer, and so on.
The alarm unit 53 can provide output to notify the mobile terminal 10 of the occurrence of an event. Typical events include call reception, message reception, key signal input, and touch input. In addition to audio or video output, the alarm unit 53 can provide output in other ways to notify of the occurrence of an event. For example, the alarm unit 53 can provide output in the form of vibration: when a call, a message, or some other incoming communication is received, the alarm unit 53 can provide tactile output (that is, vibration) to notify the user. With such tactile output, the user can recognize the occurrence of various events even when the mobile phone is in the user's pocket. The alarm unit 53 can also provide output notifying of the occurrence of an event via the display unit 51 or the audio output module 52.
The input unit 60 can generate key input data according to commands input by the user to control various operations of the mobile terminal. The input unit 60 allows the user to input various types of information and may include a keyboard, a dome switch, a touch pad (for example, a touch-sensitive component that detects changes in resistance, pressure, capacitance, and so on caused by being touched), a scroll wheel, a joystick, and so on. In particular, when the touch pad is superimposed on the display unit 51 as a layer, a touch screen can be formed. In the embodiments of the invention, the input unit 60 includes a touch screen and an ink screen. The camera 70 is used to capture image data, and the microphone 71 is used to record audio data.
The interface unit 80 serves as an interface through which at least one external device can connect to the mobile terminal 10. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and so on. The identification module may store various information for verifying the user's authority to use the mobile terminal 10 and may include a user identity module (UIM), a subscriber identity module (SIM), a universal subscriber identity module (USIM), and so on. In addition, the device having the identification module (hereinafter the "identifying device") may take the form of a smart card, so the identifying device can be connected to the mobile terminal 10 via a port or other connecting means. The interface unit 80 can be used to receive input (for example, data or electric power) from an external device and transfer the received input to one or more elements in the mobile terminal 10, or to transfer data between the mobile terminal and an external device.
In addition, when the mobile terminal 10 is connected to an external cradle, the interface unit 80 can serve as a path through which electric power is supplied from the cradle to the mobile terminal 10, or as a path through which various command signals input from the cradle are transferred to the mobile terminal. The command signals or electric power input from the cradle can serve as signals for recognizing whether the mobile terminal is accurately mounted on the cradle.
The memory 20 can store software programs for the processing and control operations executed by the controller 30, or can temporarily store data that has been output or is to be output (for example, a phone book, messages, still images, and video). The memory 20 can also store data about the various patterns of vibration and audio signals that are output when a touch is applied to the touch screen.
The memory 20 may include at least one type of storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and so on. The mobile terminal 10 may also cooperate with a network storage device that performs the storage function of the memory 20 over a network connection.
The controller 30 generally controls the overall operation of the mobile terminal. For example, the controller 30 performs the control and processing related to voice calls, data communication, video calls, and so on. In addition, the controller 30 may include a multimedia module for reproducing (or playing back) multimedia data; the multimedia module may be constructed within the controller 30 or separately from it. The controller 30 can perform pattern recognition processing to recognize handwriting input or drawing input performed on the touch screen as characters or images.
The power supply unit 90 receives external or internal power under the control of the controller 30 and provides the appropriate electric power needed to operate each element and component.
The various embodiments described here can be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination of the two. For a hardware implementation, the embodiments described here can be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, and an electronic unit designed to perform the functions described here; in some cases, such embodiments can be implemented in the controller 30. For a software implementation, embodiments such as processes or functions can be implemented with separate software modules that each perform at least one function or operation. The software code can be implemented by a software application (or program) written in any suitable programming language, stored in the memory 20, and executed by the controller 30.
So far, the mobile terminal has been described in terms of its functions. Below, for the sake of brevity, a slide-type mobile terminal, among various types such as folder-type, bar-type, swing-type, and slide-type mobile terminals, is described as an example. The invention can nevertheless be applied to any type of mobile terminal and is not limited to slide-type mobile terminals.
The mobile terminal 10 shown in Fig. 1 may be constructed to operate with wired and wireless communication systems, and with satellite-based communication systems, that transmit data via frames or packets.
The communication system in which the mobile terminal according to the invention can operate is now described with reference to Fig. 2.
Such communication systems can use different air interfaces and/or physical layers. For example, the air interfaces used by communication systems include frequency division multiple access (FDMA), time division multiple access (TDMA), code division multiple access (CDMA), universal mobile telecommunications system (UMTS) (in particular, long term evolution (LTE)), global system for mobile communications (GSM), and so on. As a non-limiting example, the description below relates to a CDMA communication system, but the teaching applies equally to other types of system.
With reference to Fig. 2, a CDMA wireless communication system may include multiple mobile terminals 10, multiple base stations (BS) 270, base station controllers (BSC) 275, and a mobile switching center (MSC) 280. The MSC 280 is constructed to interface with a public switched telephone network (PSTN) 290. The MSC 280 is also constructed to interface with the BSCs 275, which can be coupled to the base stations 270 via backhaul lines. The backhaul lines can be constructed according to any of several known interfaces, including, for example, E1/T1, ATM, IP, PPP, frame relay, HDSL, ADSL, or xDSL. It will be appreciated that the system shown in Fig. 2 may include multiple BSCs 275.
Each BS 270 can serve one or more sectors (or regions), each sector being covered by an omnidirectional antenna or by an antenna pointing in a particular direction radially away from the BS 270. Alternatively, each sector can be covered by two or more antennas for diversity reception. Each BS 270 may be constructed to support multiple frequency assignments, each frequency assignment having a specific spectrum (for example, 1.25 MHz or 5 MHz).
The intersection of a sector and a frequency assignment may be referred to as a CDMA channel. A BS 270 may also be referred to as a base transceiver station (BTS) or by other equivalent terms. In that case, the term "base station" may be used to refer collectively to a single BSC 275 and at least one BS 270. A base station may also be referred to as a "cell site". Alternatively, the individual sectors of a particular BS 270 may be referred to as cell sites.
As shown in Fig. 2, a broadcast transmitter (BT) 295 sends broadcast signals to the mobile terminals 10 operating in the system. The broadcast receiving module 111 shown in Fig. 1 is provided at the mobile terminal 10 to receive the broadcast signals sent by the BT 295. Fig. 2 also shows several global positioning system (GPS) satellites 300. The satellites 300 help locate at least one of the multiple mobile terminals 10.
Although Fig. 2 depicts multiple satellites 300, it should be understood that useful positioning information can be obtained with any number of satellites. The GPS module 115 shown in Fig. 1 is generally constructed to cooperate with the satellites 300 to obtain the desired positioning information. Instead of, or in addition to, GPS tracking technology, other technologies capable of tracking the position of the mobile terminal can be used. In addition, at least one GPS satellite 300 can optionally or additionally handle satellite DMB transmission.
In a typical operation of the wireless communication system, the BSs 270 receive reverse-link signals from various mobile terminals 10. The mobile terminals 10 typically take part in calls, messaging, and other types of communication. Each reverse-link signal received by a particular base station 270 is processed within that BS 270. The resulting data is forwarded to the associated BSC 275. The BSC provides call resource allocation and mobility management functions, including the coordination of soft handoff processes between BSs 270. The BSC 275 also routes the received data to the MSC 280, which provides additional routing services for interfacing with the PSTN 290. Similarly, the PSTN 290 interfaces with the MSC 280, the MSC interfaces with the BSCs 275, and the BSCs 275 in turn control the BSs 270 to send forward-link signals to the mobile terminals 10.
The embodiments of the invention are proposed on the basis of the mobile terminal hardware structure and communication system described above.
Referring to Fig. 3, Fig. 3 is a functional block diagram of the mobile terminal of embodiment one of the invention. The mobile terminal 10 shown in Fig. 3 includes an analysis module 101 and a conversion module 103. Each functional module is described in detail below. The analysis module 101 obtains speaker speech, analyzes the speaker's language habits according to the speaker speech, and judges the language type of the speaker speech according to the language habits. The speaker speech can be recorded through the microphone of the mobile terminal or obtained over a network. The language habits include the speaker's personal habits, dialect habits, and foreign language habits; the personal habits include the interjections and modal particles that the speaker commonly uses. The conversion module 103 performs speech recognition on the speaker speech according to the judged language type and converts the speaker speech into text content.
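The patent does not say how the analysis module derives a language type from language habits. Purely as a toy illustration, the sketch below assumes that a coarse first pass has already produced a list of tokens, and scores each candidate language type by counting its marker words while ignoring the speaker's personal filler words. The `LanguageHabits` structure, the marker-word approach, the example words, and the Mandarin default are all assumptions introduced here, not the patented method.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LanguageHabits:
    """Hypothetical habit profile; field names are illustrative only."""
    # Interjections and modal particles the speaker habitually uses.
    personal: Set[str] = field(default_factory=set)
    # Marker words per candidate language type (dialects and foreign languages).
    markers: Dict[str, Set[str]] = field(default_factory=dict)


def judge_language_type(tokens: List[str], habits: LanguageHabits,
                        default: str = "mandarin") -> str:
    """Return the language type with the most marker hits, or the default if none hit."""
    scores = {
        lang: sum(1 for t in tokens if t in words and t not in habits.personal)
        for lang, words in habits.markers.items()
    }
    best = max(scores, key=scores.get, default=default)
    return best if scores.get(best, 0) > 0 else default
```

Excluding the personal filler words before scoring reflects the idea that a speaker's habitual interjections should not be mistaken for evidence of a particular dialect or foreign language.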
The mobile terminal that the present embodiment provides analyzes the language of speaker speech by analyzing the language convention of talker Species, carries out targetedly identifying processing according to speech category, and will identify that the Content Transformation coming becomes word content, can give birth to Become complete word content corresponding with talker's language, abundant Word message is provided, facilitates user to consult, be conducive to user Subsequently carry out the editing and processing of correlation.
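The two-module flow of this embodiment can be illustrated with a minimal sketch. It assumes that a first pass has already split the speech into word tokens, and it scores each language category by counting habit markers (interjections and modal particles). The marker lists and the names `HABIT_MARKERS`, `judge_language_category`, and `convert_to_text` are hypothetical illustrations, not details given in this document.

```python
from collections import Counter

# Hypothetical habit markers per language category (illustrative assumption only).
HABIT_MARKERS = {
    "mandarin": {"吧", "呢", "嘛"},
    "cantonese": {"嘅", "咗", "喺"},
    "english": {"well", "um", "okay"},
}


def judge_language_category(tokens, default="mandarin"):
    """Analysis step: pick the category whose habit markers occur most often."""
    counts = Counter()
    for token in tokens:
        for category, markers in HABIT_MARKERS.items():
            if token.lower() in markers:
                counts[category] += 1
    if not counts:
        # No marker matched, so fall back to the default category.
        return default
    return counts.most_common(1)[0][0]


def convert_to_text(tokens, category):
    """Conversion step stand-in: label the transcript with the judged category."""
    return f"[{category}] " + " ".join(tokens)


if __name__ == "__main__":
    tokens = ["well", "um", "hello", "everyone"]
    category = judge_language_category(tokens)
    print(convert_to_text(tokens, category))
```

A real analysis module would work on acoustic features rather than tokens; the token-counting approach is used here only because it keeps the habit-based judgment easy to follow.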
Referring to Fig. 4, Fig. 4 is a functional block diagram of the mobile terminal of embodiment two of the present invention. The mobile terminal 10 shown in Fig. 4 includes an analysis module 101, a conversion module 103, an integration module 109, a display module 111, and a playing module 113; the analysis module 101 includes a dialect recognition module 105 and a foreign language recognition module 107. Each functional module is described in detail below.
The analysis module 101 obtains the speaker's speech, analyzes the speaker's language habits from that speech, and judges the language category of the speech according to the language habits. The speech may be recorded through the microphone of the mobile terminal or downloaded over a network. The conversion module 103 performs speech recognition on the speaker's speech according to the judged language category and converts the speech into text content.
To explain further, the speaker's speech may be of several types: dialect, foreign language, or Mandarin. Dialects include the dialects found across China, such as Northeastern dialect, Sichuan dialect, Hunan dialect, and Cantonese; foreign languages include English, French, German, Russian, and other languages. After the mobile terminal 10 obtains the speaker's speech, the dialect recognition module 105 identifies the speech according to dialect habits and judges its dialect type. The conversion module 103 then performs speech recognition according to the judged dialect type and converts the speech into text content. Likewise, after the mobile terminal 10 obtains the speaker's speech, the foreign language recognition module 107 identifies the speech according to foreign language habits and judges its foreign language type. The conversion module 103 then performs speech recognition according to the judged foreign language type and converts the speech into text content.
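One way to picture how the conversion module might act on the judged dialect or foreign language type is a lookup from category to recognizer. The sketch below is an assumption for illustration: the registries, the stub recognizers, and `select_recognizer` are hypothetical names, and a real recognizer would decode audio with a model trained for that category.

```python
from typing import Callable, Dict

Recognizer = Callable[[bytes], str]


def _stub(name: str) -> Recognizer:
    # Placeholder recognizer: returns a description instead of decoding audio.
    return lambda audio: f"<{name} transcript of {len(audio)} bytes>"


# Hypothetical registries mirroring the dialect and foreign language modules.
DIALECT_RECOGNIZERS: Dict[str, Recognizer] = {
    "sichuan": _stub("sichuan"),
    "cantonese": _stub("cantonese"),
}
FOREIGN_RECOGNIZERS: Dict[str, Recognizer] = {
    "english": _stub("english"),
    "french": _stub("french"),
}
MANDARIN_RECOGNIZER: Recognizer = _stub("mandarin")


def select_recognizer(category: str) -> Recognizer:
    """Choose a recognizer for the judged category, defaulting to Mandarin."""
    if category in DIALECT_RECOGNIZERS:
        return DIALECT_RECOGNIZERS[category]
    if category in FOREIGN_RECOGNIZERS:
        return FOREIGN_RECOGNIZERS[category]
    return MANDARIN_RECOGNIZER


if __name__ == "__main__":
    print(select_recognizer("cantonese")(b"\x00\x01\x02"))
```

Keeping dialects and foreign languages in separate registries follows the split between modules 105 and 107; a single table would work equally well.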
When the speaker's speech consists of multiple short voice segments, the integration module 109 integrates all of the speaker's voice segments in chronological order into complete voice material, and integrates the text content corresponding to all of the speaker's voice segments in chronological order into complete text material. After the complete voice material is obtained, the display module 111 displays the complete text material for the user to consult. The display module 111 includes a display screen and a touch screen. The display screen includes a TFT liquid crystal display, a UFB liquid crystal display, an STN screen, or an active-matrix organic light-emitting diode (AMOLED) panel; the touch screen includes a capacitive touch screen, an infrared touch screen, a surface acoustic wave touch screen, or an MTK touch screen. The touch screen receives touch signals that control whether the complete text content is displayed. The playing module 113 plays back the complete voice material so that the user can listen to all of it smoothly. The playing module 113 also denoises the complete voice material, adjusts the fluency of the speaker's complete voice material, and sets a suitable playback volume according to the current playback scene. After receiving a control signal to play the speaker's complete speech, the playing module 113 plays the processed speech.
The mobile terminal provided by this embodiment determines the language category of the speaker's speech by analyzing the speaker's language habits and performs targeted recognition according to that category: it recognizes the dialect content of the speech according to the dialect type and the foreign language content of the speech according to the foreign language type, and converts the recognized content into text content. It can thus generate complete text content corresponding to the speaker's speech and provide rich text information that is convenient for the user to consult and that helps the user carry out related editing afterwards.
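The chronological integration described above can be sketched as a sort followed by concatenation. The `Segment` type, its fields, and the `integrate` function are assumptions introduced for illustration; the document does not specify how segments or timestamps are represented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    start: float      # start time in seconds (assumed representation)
    audio: List[int]  # PCM samples of this short voice segment
    text: str         # text content recognized for this segment


def integrate(segments: List[Segment]) -> Tuple[List[int], str]:
    """Merge short segments into complete audio and text in chronological order."""
    ordered = sorted(segments, key=lambda s: s.start)
    audio = [sample for seg in ordered for sample in seg.audio]
    text = " ".join(seg.text for seg in ordered)
    return audio, text


if __name__ == "__main__":
    parts = [
        Segment(5.0, [3, 4], "how are you"),
        Segment(1.0, [1, 2], "hello"),
    ]
    print(integrate(parts))
```

Sorting by start time before joining means the segments can arrive in any order, for example when some are recorded locally and others are downloaded.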
Referring to Fig. 5, Fig. 5 is a voice conversion diagram of the mobile terminal of embodiment three of the present invention. In the voice conversion diagram of this embodiment, the analysis module 101 of the mobile terminal 10 shown on the left obtains the speakers' speech, which includes the first speech of speaker A, the third speech of speaker A, the second speech of speaker B, and the fourth speech of speaker B. The analysis module analyzes the language habits of speakers A and B from their speech and judges the language category of each speaker's speech according to that speaker's language habits. The speech of speakers A and B may be recorded through the microphone of the mobile terminal or downloaded over a network. The conversion module 103 performs speech recognition on the speech of speakers A and B according to the judged language categories and converts the speech into text content.
To explain further, the speakers' speech may be of several types: dialect, foreign language, Mandarin, and so on. Dialects include the dialects found across China, such as Northeastern dialect, Sichuan dialect, and Hunan dialect; foreign languages include English, French, German, Russian, and other languages. After the mobile terminal 10 obtains the speakers' speech, the dialect recognition module 105 identifies the speech of speakers A and B according to dialect habits and judges its dialect type. The conversion module 103 then performs speech recognition according to the judged dialect type and converts the speech of speakers A and B into text content. Likewise, after the mobile terminal 10 obtains the speech of speakers A and B, the foreign language recognition module 107 identifies the speech according to foreign language habits and judges its foreign language type. The conversion module 103 then performs speech recognition according to the judged foreign language type and converts the speech of speakers A and B into text content.
The integration module 109 integrates the first and third speech of speaker A in chronological order into complete voice material, and integrates the text content corresponding to all of speaker A's speech in chronological order into complete text material. This text material includes the first text content and the third text content, where the first text content corresponds to the content of the first speech and the third text content corresponds to the content of the third speech. Similarly, the integration module 109 integrates the second and fourth speech of speaker B in chronological order into complete voice material, and integrates the text content corresponding to all of speaker B's speech in chronological order into complete text material. This text material includes the second text content and the fourth text content, where the second text content corresponds to the content of the second speech and the fourth text content corresponds to the content of the fourth speech, that is, the content shown on the right side of Fig. 5.
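The per-speaker integration for speakers A and B can be sketched as grouping interleaved utterances by speaker after ordering them by time. The tuple layout `(timestamp, speaker, text)` and the function name `integrate_by_speaker` are assumptions made for this example.

```python
from collections import defaultdict


def integrate_by_speaker(utterances):
    """Group text content by speaker, keeping each speaker's chronological order.

    `utterances` is an iterable of (timestamp, speaker, text) tuples.
    """
    grouped = defaultdict(list)
    # Sorting the tuples orders them by timestamp first.
    for _timestamp, speaker, text in sorted(utterances):
        grouped[speaker].append(text)
    return {speaker: " ".join(texts) for speaker, texts in grouped.items()}


if __name__ == "__main__":
    conversation = [
        (3, "A", "third"),
        (1, "A", "first"),
        (4, "B", "fourth"),
        (2, "B", "second"),
    ]
    print(integrate_by_speaker(conversation))
```

With the interleaved input above, speaker A's material contains the first and third utterances and speaker B's contains the second and fourth, matching the arrangement described for Fig. 5.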
After the complete voice material is obtained, the display module 111 displays the complete text material of speaker A and speaker B for the user to consult. The display module 111 includes a display screen and a touch screen. The display screen includes a TFT liquid crystal display, a UFB liquid crystal display, an STN screen, or an active-matrix organic light-emitting diode (AMOLED) panel; the touch screen may be a capacitive touch screen, an infrared touch screen, a surface acoustic wave touch screen, or an MTK touch screen. The touch screen receives touch signals that control whether the complete text content is displayed. The playing module 113 plays back the complete voice material so that the user can listen to all of it smoothly. The playing module 113 also denoises the complete voice material, adjusts the fluency of the speaker's complete voice material, and sets a suitable playback volume according to the current playback scene. After receiving a control signal to play the speaker's complete speech, the playing module 113 plays the processed speech.
The mobile terminal provided by the present invention determines the language category of the speaker's speech by analyzing the speaker's language habits and performs targeted recognition according to that category: it recognizes the dialect content of the speech according to the dialect type and the foreign language content of the speech according to the foreign language type, and converts the recognized content into text content. It can thus generate complete text content corresponding to the speaker's speech and provide rich text information that is convenient for the user to consult and that helps the user carry out related editing afterwards.
The present invention also provides a voice conversion method, which is applied to the mobile terminal 10 shown in Fig. 3 or Fig. 4. The voice conversion method of this embodiment is described in detail below.
Referring to Fig. 6, Fig. 6 is a flowchart of the voice conversion method of embodiment four of the present invention.
In step S601, the analysis module 101 obtains the speaker's speech, analyzes the speaker's language habits from that speech, and judges the language category of the speech according to the language habits. The speech may be recorded through the microphone of the mobile terminal or downloaded over a network.
In step S603, the conversion module 103 performs speech recognition on the speaker's speech according to the judged language category and converts the speech into text content.
To explain further, the speaker's speech may be of several types: dialect, foreign language, Mandarin, and so on. Dialects include the dialects found across China, such as Northeastern dialect, Sichuan dialect, and Hunan dialect; foreign languages include English, French, German, Russian, and other languages. After the mobile terminal 10 obtains the speaker's speech, the dialect recognition module 105 identifies the speech according to dialect habits and judges its dialect type. The conversion module 103 then performs speech recognition according to the judged dialect type and converts the speech into text content. Likewise, after the mobile terminal 10 obtains the speaker's speech, the foreign language recognition module 107 identifies the speech according to foreign language habits and judges its foreign language type. The conversion module 103 then performs speech recognition according to the judged foreign language type and converts the speech into text content.
Referring to Fig. 7, Fig. 7 is a flowchart of the voice conversion method of embodiment five of the present invention.
In step S701, when the speaker's speech consists of multiple short voice segments, the integration module 109 integrates all of the speaker's voice segments in chronological order into complete voice material.
In step S703, the integration module 109 integrates the text content corresponding to all of the speaker's voice segments in chronological order into complete text material.
In step S705, after the complete voice material is obtained, the display module 111 displays the complete text material for the user to consult. The display module 111 includes a display screen and a touch screen. The display screen includes a TFT liquid crystal display, a UFB liquid crystal display, an STN screen, or an active-matrix organic light-emitting diode (AMOLED) panel; the touch screen includes a capacitive touch screen, an infrared touch screen, a surface acoustic wave touch screen, or an MTK touch screen. The touch screen receives touch signals that control whether the complete text content is displayed.
In step S707, the playing module 113 plays back the complete voice material so that the user can listen to all of it smoothly. The playing module 113 also denoises the complete voice material, adjusts the fluency of the speaker's complete voice material, and sets a suitable playback volume according to the current playback scene. After receiving a control signal to play the speaker's complete speech, the playing module 113 plays the processed speech.
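The playback processing in step S707 can be illustrated with a small sketch. The document does not say which denoising method or volume rule is used, so the example substitutes a simple noise gate and a fixed gain per playback scene; the scene names, gain values, threshold, and function names are all assumptions. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
# Hypothetical gain per playback scene (illustrative values only).
SCENE_GAIN = {"quiet": 0.5, "normal": 1.0, "noisy": 1.5}


def noise_gate(samples, threshold=0.02):
    """Crude denoising stand-in: silence samples below the threshold."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def apply_volume(samples, scene):
    """Scale samples by the scene gain and clamp them to [-1.0, 1.0]."""
    gain = SCENE_GAIN.get(scene, 1.0)
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def prepare_playback(samples, scene):
    """Denoise first, then adjust the volume for the current scene."""
    return apply_volume(noise_gate(samples), scene)


if __name__ == "__main__":
    print(prepare_playback([0.01, 0.5, -0.8], "noisy"))
```

Gating before applying gain keeps low-level noise from being amplified in a noisy scene; the clamp prevents the boosted signal from exceeding the valid sample range.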
As a supplementary note, the order of the steps of this embodiment may be changed. The processing performed after the analysis module 101 of the mobile terminal 10 obtains the first speech of speaker A, the third speech of speaker A, the second speech of speaker B, and the fourth speech of speaker B is described in detail below. The analysis module 101 of the mobile terminal 10 analyzes the language habits of speakers A and B from their speech and judges the language category of each speaker's speech according to that speaker's language habits. The speech of speakers A and B may be recorded through the microphone of the mobile terminal or downloaded over a network. The conversion module 103 performs speech recognition on the speech of speakers A and B according to the judged language categories and converts the speech into text content.
To explain further, the speakers' speech may be of several types: dialect, foreign language, Mandarin, and so on. After the mobile terminal 10 obtains the speakers' speech, the dialect recognition module 105 identifies the speech of speakers A and B according to dialect habits and judges its dialect type. The conversion module 103 then performs speech recognition according to the judged dialect type and converts the speech of speakers A and B into text content. Likewise, after the mobile terminal 10 obtains the speech of speakers A and B, the foreign language recognition module 107 identifies the speech according to foreign language habits and judges its foreign language type. The conversion module 103 then performs speech recognition according to the judged foreign language type and converts the speech of speakers A and B into text content.
The integration module 109 integrates the first and third speech of speaker A in chronological order into complete voice material, and integrates the text content corresponding to all of speaker A's speech in chronological order into complete text material. This text material includes the first text content and the third text content, where the first text content corresponds to the content of the first speech and the third text content corresponds to the content of the third speech. Similarly, the integration module 109 integrates the second and fourth speech of speaker B in chronological order into complete voice material, and integrates the text content corresponding to all of speaker B's speech in chronological order into complete text material. This text material includes the second text content and the fourth text content, where the second text content corresponds to the content of the second speech and the fourth text content corresponds to the content of the fourth speech.
After the complete voice material is obtained, the display module 111 displays the complete text material of speaker A and speaker B for the user to consult. The display module 111 includes a display screen and a touch screen. The display screen includes a TFT liquid crystal display, a UFB liquid crystal display, an STN screen, or an active-matrix organic light-emitting diode (AMOLED) panel; the touch screen includes a capacitive touch screen, an infrared touch screen, a surface acoustic wave touch screen, or an MTK touch screen. The touch screen receives touch signals that control whether the complete text content is displayed. The playing module 113 plays back the complete voice material so that the user can listen to all of it smoothly. The playing module 113 also denoises the complete voice material, adjusts the fluency of the speaker's complete voice material, and sets a suitable playback volume according to the current playback scene. After receiving a control signal to play the speaker's complete speech, the playing module 113 plays the processed speech.
The mobile terminal provided by the present invention determines the language category of the speaker's speech by analyzing the speaker's language habits and performs targeted recognition according to that category: it recognizes the dialect content of the speech according to the dialect type and the foreign language content of the speech according to the foreign language type, and converts the recognized content into text content. It can thus generate complete text content corresponding to the speaker's speech and provide rich text information that is convenient for the user to consult and that helps the user carry out related editing afterwards.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A mobile terminal, characterized by comprising:
An analysis module, configured to obtain a speaker's speech, analyze the speaker's language habits according to the speaker's speech, and judge the language category of the speaker's speech according to the language habits; and
A conversion module, configured to perform speech recognition on the speaker's speech according to the judged language category and convert the speaker's speech into text content.
2. The mobile terminal of claim 1, characterized in that the analysis module comprises a dialect recognition module, the dialect recognition module being configured to identify the speaker's speech according to dialect habits and judge the dialect type of the speaker's speech; and
The conversion module is further configured to perform speech recognition according to the judged dialect type and convert the speaker's speech into text content.
3. The mobile terminal of claim 1, characterized in that the analysis module further comprises a foreign language recognition module, the foreign language recognition module being configured to identify the speaker's speech according to foreign language habits and judge the foreign language type of the speaker's speech; and
The conversion module is further configured to perform speech recognition according to the judged foreign language type and convert the speaker's speech into text content.
4. The mobile terminal of any one of claims 1-3, characterized by further comprising:
An integration module, configured to integrate all of the speaker's speech in chronological order into complete voice material, and to integrate the text content corresponding to all of the speaker's speech in chronological order into complete text material.
5. The mobile terminal of claim 4, characterized by further comprising:
A display module, configured to display the complete text material; and
A playing module, configured to play back the complete voice material.
6. A voice conversion method, characterized by comprising:
Obtaining a speaker's speech, analyzing the speaker's language habits according to the speaker's speech, and judging the language category of the speaker's speech according to the language habits; and
Performing speech recognition on the speaker's speech according to the judged language category and converting the speaker's speech into text content.
7. The voice conversion method of claim 6, characterized by further comprising:
Identifying the speaker's speech according to dialect habits and judging the dialect type of the speaker's speech; and
Performing speech recognition according to the judged dialect type and converting the speaker's speech into text content.
8. The voice conversion method of claim 6, characterized by further comprising:
Identifying the speaker's speech according to foreign language habits and judging the foreign language type of the speaker's speech; and
Performing speech recognition according to the judged foreign language type and converting the speaker's speech into text content.
9. The voice conversion method of any one of claims 6-8, characterized by further comprising:
Integrating all of the speaker's speech in chronological order into complete voice material; and
Integrating the text content corresponding to all of the speaker's speech in chronological order into complete text material.
10. The voice conversion method of claim 9, characterized by further comprising:
Displaying the complete text material; and
Playing back the complete voice material.
CN201610969015.8A 2016-10-31 2016-10-31 Mobile terminal and voice conversion method Pending CN106356065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610969015.8A CN106356065A (en) 2016-10-31 2016-10-31 Mobile terminal and voice conversion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610969015.8A CN106356065A (en) 2016-10-31 2016-10-31 Mobile terminal and voice conversion method

Publications (1)

Publication Number Publication Date
CN106356065A (en) 2017-01-25

Family

ID=57864300

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610969015.8A Pending CN106356065A (en) 2016-10-31 2016-10-31 Mobile terminal and voice conversion method

Country Status (1)

Country Link
CN (1) CN106356065A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1774715A (en) * 2003-04-14 2006-05-17 皇家飞利浦电子股份有限公司 System and method for performing automatic dubbing on an audio-visual stream
CN201403118Y (en) * 2008-12-12 2010-02-10 康佳集团股份有限公司 Device with dialect translating function and mobile terminal
CN102831195A (en) * 2012-08-03 2012-12-19 河南省佰腾电子科技有限公司 Individualized voice collection and semantics determination system and method
CN103327181A (en) * 2013-06-08 2013-09-25 广东欧珀移动通信有限公司 Voice chatting method capable of improving efficiency of voice information learning for users
CN104123932A (en) * 2014-07-29 2014-10-29 科大讯飞股份有限公司 Voice conversion system and method
CN104361888A (en) * 2014-11-28 2015-02-18 上海斐讯数据通信技术有限公司 Device and method for informing hearing-impaired person of voice message through vibration signal
CN106024014A (en) * 2016-05-24 2016-10-12 努比亚技术有限公司 Voice conversion method and device and mobile terminal
CN106057193A (en) * 2016-07-13 2016-10-26 深圳市沃特沃德股份有限公司 Conference record generation method based on telephone conference and device

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106952645B (en) * 2017-03-24 2020-11-17 广东美的制冷设备有限公司 Voice instruction recognition method, voice instruction recognition device and air conditioner
CN106952645A (en) * 2017-03-24 2017-07-14 广东美的制冷设备有限公司 The recognition methods of phonetic order, the identifying device of phonetic order and air-conditioner
CN107316637A (en) * 2017-05-31 2017-11-03 广东欧珀移动通信有限公司 Audio recognition method and Related product
CN108255917A (en) * 2017-09-15 2018-07-06 广州市动景计算机科技有限公司 Image management method, equipment and electronic equipment
CN108255917B (en) * 2017-09-15 2020-12-18 阿里巴巴(中国)有限公司 Image management method and device and electronic device
CN110197656A (en) * 2018-02-26 2019-09-03 付明涛 It is a kind of can fast recording conference content and the equipment that is converted into text
CN108572764A (en) * 2018-03-13 2018-09-25 努比亚技术有限公司 A kind of word input control method, equipment and computer readable storage medium
CN108572764B (en) * 2018-03-13 2022-01-14 努比亚技术有限公司 Character input control method and device and computer readable storage medium
WO2019218467A1 (en) * 2018-05-14 2019-11-21 平安科技(深圳)有限公司 Method and apparatus for dialect recognition in voice and video calls, terminal device, and medium
CN110349564A (en) * 2019-07-22 2019-10-18 苏州思必驰信息科技有限公司 Across the language voice recognition methods of one kind and device
CN110349564B (en) * 2019-07-22 2021-09-24 思必驰科技股份有限公司 Cross-language voice recognition method and device
WO2021134547A1 (en) * 2019-12-31 2021-07-08 李庆远 Sound recording device based on general-purpose mobile device
CN111461946A (en) * 2020-04-14 2020-07-28 山东致群信息技术有限公司 Intelligent public security interrogation system
WO2022057759A1 (en) * 2020-09-21 2022-03-24 华为技术有限公司 Voice conversion method and related device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170125
