CN103546623A - Method, device and equipment for sending voice information and text description information thereof

Info

Publication number: CN103546623A
Application number: CN201210242430.5A
Authority: CN
Inventors: 陈莹
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2012-07-12
Filing date: 2012-07-12
Publication date: 2014-01-29
Anticipated expiration: 2032-07-12
Also published as: CN103546623B

Abstract

The invention provides a method, a device and equipment for sending voice information in association with its text description information. The method comprises: first, acquiring text recognition result information obtained by performing voice recognition processing on the voice information to be recognized; second, generating, from the text recognition result information, text description information that describes the voice content of the voice information; and finally, sending the text description information in association with the voice information. Compared with the prior art, a mobile terminal can send voice information together with its text description information, so that the receiving user obtains the voice information combined with text description information containing its main content, can learn the content of the voice information without listening to it, and can obtain the voice content intuitively through a combination of hearing and vision, which improves the user experience.

Description

Method, device and equipment for sending voice information and its text description information

Technical field

The present invention relates to the field of information sending by mobile terminals, and in particular to a method, device and equipment for sending voice information in association with its text description information.

Background art

As the mobile Internet plays an increasingly important role in people's lives, people can send information and interact over the Internet through mobile terminals anytime and anywhere. In the prior art, however, a mobile terminal receives the voice information input by the user and sends it directly. The receiving user must therefore download the voice information locally and listen to it in order to learn its content; the user cannot obtain the voice content intuitively without listening to the voice message, nor obtain it through a combination of hearing and vision, which results in a poor user experience.

Summary of the invention

The object of the present invention is to provide a method, device and equipment for sending voice information in association with its text description information.

According to one aspect of the present invention, a method for sending voice information in association with its text description information in a mobile terminal is provided, the method comprising the following steps:

a. acquiring text recognition result information obtained by performing voice recognition processing on voice information to be recognized;

b. generating, according to the text recognition result information, text description information for describing the voice content of the voice information;

c. sending the text description information in association with the voice information.
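Steps a to c can be read as a three-stage pipeline. The following is a minimal sketch in Python, not the patented implementation; the function name `send_with_description` and the three callables passed to it are hypothetical placeholders for whatever recognizer, description generator and transport a real terminal would use.

```python
from typing import Callable


def send_with_description(
    voice: bytes,
    recognize: Callable[[bytes], str],
    describe: Callable[[str], str],
    send: Callable[[bytes, str], None],
) -> str:
    """Run steps a-c in order and return the description that was sent."""
    recognized_text = recognize(voice)       # step a: text recognition result
    description = describe(recognized_text)  # step b: text description
    send(voice, description)                 # step c: associated sending
    return description
```

Passing the three stages in as callables keeps the sketch independent of any particular recognition engine or messaging protocol.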

According to another aspect of the present invention, a sending device for sending voice information in association with its text description information is also provided, the sending device comprising:

a recognition result acquisition device, for acquiring text recognition result information obtained by performing voice recognition processing on voice information to be recognized;

a description information generation device, for generating, according to the text recognition result information, text description information for describing the voice content of the voice information;

an associated sending device, for sending the text description information in association with the voice information.

Compared with the prior art, the present invention has the following advantages: 1) the mobile terminal can send voice information together with its text description information, so that the receiving user obtains the voice information combined with text description information containing its main content; the user can learn the content of the voice information without listening to it and can obtain the voice content intuitively through a combination of hearing and vision, which improves the user's information acquisition experience; 2) because the receiving user can obtain the main content of the voice information intuitively merely by browsing the text description information, the receiving user's information acquisition efficiency is improved when the voice information is long and would take considerable time to listen to; 3) by sending the text description information in association with the voice information in a variety of forms, diversified ways of sending voice information combined with text information are realized, which improves the user experience; 4) further, the receiving user can browse the text description information first to decide whether to listen to the voice information, and can thus pre-judge junk voice information and avoid listening to it.

Brief description of the drawings

Other features, objects and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings:

Fig. 1 shows a schematic structural diagram of a sending device for sending voice information in association with its text description information according to one aspect of the present invention;

Fig. 2 shows a schematic diagram of a sending device for sending voice information in association with its text description information according to a preferred embodiment of the present invention;

Fig. 3 shows a description information generation device for generating text description information describing the voice content of the voice information according to another preferred embodiment of the present invention;

Fig. 4 shows a flow chart of a method for sending voice information in association with its text description information according to another aspect of the present invention;

Fig. 5 shows a flow chart of a method for sending voice information in association with its text description information according to a preferred embodiment of the present invention;

Fig. 6 shows a flow chart of a method for generating text description information describing the voice content of the voice information according to another preferred embodiment of the present invention.

In the drawings, the same or similar reference numerals denote the same or similar components.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings.

Fig. 1 shows a schematic structural diagram of a sending device for sending voice information in association with its text description information according to one aspect of the present invention. The sending device of this embodiment is contained in a mobile terminal and comprises a recognition result acquisition device 1, a description information generation device 2 and an associated sending device 3.

The mobile terminal is an electronic device that can automatically perform numerical computation and information processing according to instructions set or stored in advance; its hardware may include, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device and the like. The mobile terminal includes, but is not limited to, any electronic product applicable to the present invention that can interact with a user through a keyboard, a touch screen or the like, such as a tablet computer, a mobile phone, a PDA, a palmtop computer (PPC) or a handheld game console (PSP).

Those skilled in the art will understand that the above mobile terminals are only examples; other existing or future mobile terminals, if applicable to the present invention, shall also fall within the scope of protection of the present invention and are incorporated herein by reference.

First, the recognition result acquisition device 1 acquires text recognition result information obtained by performing voice recognition processing on the voice information to be recognized.

The sources of the voice information to be recognized include, but are not limited to:

1) voice information input through the mobile terminal by the user of the mobile terminal;

2) voice information to be sent that is stored in a local voice library of the mobile terminal;

3) voice information to be sent that is forwarded to the mobile terminal by a third-party device.

Those skilled in the art can determine the voice recognition mode according to actual conditions and requirements. Preferably, the voice recognition modes include, but are not limited to:

1) a voice recognition mode based on DTW (Dynamic Time Warping) and template matching; DTW and template matching directly use the extracted voice features as templates and can be used for isolated-word voice recognition.

2) a voice recognition mode based on the hidden Markov model (HMM); this mode builds statistical recognition models from statistics over a large amount of voice data, then extracts features from the voice to be recognized, matches them against these models, and obtains the recognition result by comparing the matches.

3) a voice recognition mode based on artificial neural networks (ANN); this mode builds on an algorithmic mathematical model that imitates the behavior of animal neural networks and performs distributed, parallel information processing.

It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any implementation of voice recognition falls within the scope of the present invention.
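As an illustration of recognition mode 1), the sketch below computes a textbook DTW distance and picks the closest stored template. It is a simplified assumption-laden example: real systems compare sequences of multi-dimensional feature vectors (for example MFCC frames), whereas here each frame is a single number, and the names `dtw_distance` and `recognize_isolated_word` are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize_isolated_word(features, templates):
    """Return the label of the template closest to `features` under DTW."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))
```

Because DTW lets either sequence repeat frames, an utterance spoken more slowly than its template can still align with zero or small cost.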

Specifically, the ways in which the recognition result acquisition device 1 acquires the text recognition result information obtained by performing voice recognition processing on the voice information to be recognized include, but are not limited to:

1) the recognition result acquisition device 1 acquires the text recognition result information obtained by the mobile terminal performing voice recognition processing on the voice information to be recognized;

In one example, the user of the mobile terminal first inputs the voice information to be recognized through a voice interaction device of the mobile terminal, such as a microphone. The mobile terminal receives this voice information; the recognition result acquisition device 1 of the mobile terminal then performs voice recognition processing on it, for example by extracting voice features from the voice to be recognized, matching them against voice models generated by an HMM-based recognition algorithm, and obtaining the text recognition result information by comparing the matches.

2) the recognition result acquisition device 1 acquires the text recognition result information obtained by a network device performing voice recognition processing on the voice information to be recognized that the mobile terminal provides to it;

The network device is an electronic device that can automatically perform numerical computation and information processing according to instructions set or stored in advance; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers. Here, the cloud consists of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers.

The mobile terminal and the network device can communicate by any means of communication, including but not limited to mobile communication based on 3GPP, LTE or WiMAX, computer network communication based on the TCP/IP and UDP protocols, and short-range wireless transmission based on the Bluetooth and infrared transmission standards. The network connecting the mobile terminal and the network device includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network and the like.

In one example, the recognition result acquisition device 1 sends the voice information to be sent, which is stored in the local voice library of the mobile terminal, to the network device. The network device performs voice recognition processing on this voice information to obtain text recognition result information and then sends it to the mobile terminal, which receives the text recognition result information returned by the network device.

3) preferably, the recognition result acquisition device 1 combines the text recognition result obtained locally on the mobile terminal with the text recognition result obtained from the network device, to obtain a combined text recognition result. Here, the recognition result acquisition device comprises a first result acquisition device (not shown) and a second result acquisition device (not shown). The first result acquisition device acquires the local text recognition result information obtained by the mobile terminal performing voice recognition processing on the voice information to be recognized, and the network text recognition result information obtained by the network device performing voice recognition processing on the same voice information provided to it by the mobile terminal. The second result acquisition device then obtains the text recognition result information from the local text recognition result information and the network text recognition result information.

Specifically, the first result acquisition device acquires the local and the network text recognition result information as described above. The second result acquisition device then analyzes the local text recognition result information and the network text recognition result information separately using natural language analysis rules, such as sentence segmentation, part-of-speech tagging, name extraction, chunking and parsing, to determine the semantically correct text in the two results and use it as the text recognition result information. Here, natural language analysis rules are rules that enable natural-language communication between human and machine, that is, that enable a computer to understand natural language; they can be implemented with natural language processing tools such as OpenNLP, FudanNLP, Stanford NLP and the Language Technology Platform (LTP).

For example, the first result acquisition device acquires the local text recognition result information obtained by the mobile terminal, which is "6 o'clock egg tonight, we meet at the Dongdan subway exit, don't forget to bring the file", and the network text recognition result information obtained by the network device, which is "6:30 tonight, we meet at Exit B of Dongdan subway station, don't forget to bring". The second result acquisition device first determines that the intersection of the two results is "6 X tonight, we meet at Exit X of Dongdan subway station, don't forget to bring XXX". Using natural language analysis rules such as sentence segmentation, part-of-speech tagging, name extraction, chunking and parsing, it then analyzes the local and the network text recognition result information separately and determines that the correct text for the non-intersecting parts of the two results is "thirty", "B" and "the file". The merged text recognition result is therefore "6:30 tonight, we meet at Exit B of Dongdan subway station, don't forget to bring the file".
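The merge in the example above can be sketched as an alignment of the two results followed by a per-disagreement choice. The sketch below uses Python's `difflib.SequenceMatcher` for the alignment; the patent instead describes natural language analysis for the choice, so `prefer_known_words` and its `KNOWN_WORDS` vocabulary are hypothetical stand-ins for that analysis, and the English tokens are only illustrative.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary standing in for real natural language analysis.
KNOWN_WORDS = {"thirty", "B", "file"}


def prefer_known_words(local_part, network_part):
    """Pick the candidate containing more known words (ties favor local)."""
    def score(tokens):
        return sum(token in KNOWN_WORDS for token in tokens)
    return max((local_part, network_part), key=score)


def merge_results(local, network, pick=prefer_known_words):
    """Keep tokens both results agree on; let `pick` settle each disagreement."""
    merged = []
    matcher = SequenceMatcher(a=local, b=network, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(local[i1:i2])   # the intersection of both results
        else:
            merged.extend(pick(local[i1:i2], network[j1:j2]))
    return merged
```

Usage: with `local = ["tonight", "6", "egg", "meet", "exit", "bring", "file"]` and `network = ["tonight", "6", "thirty", "meet", "exit", "B", "bring"]`, the agreed tokens are kept and the three disagreements are resolved in favor of "thirty", "B" and "file".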

It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any implementation of acquiring the text recognition result information obtained by performing voice recognition processing on the voice information to be recognized falls within the scope of the present invention.

By obtaining text recognition results both locally on the mobile terminal and from the network, the network device can support the mobile terminal with stronger voice recognition capability when local recognition capability is weak. This helps ensure the accuracy and reliability of the text recognition result obtained, and in turn supports obtaining text description information that describes the voice content of the voice information accurately.

Then, the description information generation device 2 generates, according to the text recognition result information acquired by the recognition result acquisition device 1, text description information for describing the voice content of the voice information.

The ways in which the description information generation device 2 generates the text description information from the text recognition result information include, but are not limited to:

1) the description information generation device 2 generates the text description information from the leading characters of the text recognition result information, up to a first predetermined number of characters, or from its leading complete statements, up to a second predetermined number of statements;

For example, if the text recognition result information recognized from the voice information is "I really want to see Mission: Impossible 4, not sure where it is still showing, let's look it up", the description information generation device 2 extracts the first complete statement, "I really want to see Mission: Impossible 4", as the text description information describing the voice content of this voice information.

Alternatively, when the number of characters in the text recognition result information is less than or equal to a third predetermined number of characters, or the number of complete statements it contains is less than or equal to a fourth predetermined number of statements, the description information generation device 2 uses the entire text recognition result information as the text description information describing the voice content of the voice information;

For example, if the text recognition result information recognized from the voice information is "What time do you get off work", it contains only 1 complete statement, which is fewer than the fourth predetermined number of statements, 2; the description information generation device 2 therefore uses the entire text, "What time do you get off work", as the text description information describing the voice content of the voice information.
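The statement-count variant of mode 1) can be sketched as follows. This is a minimal illustration, assuming that statements are delimited by common punctuation (as in the examples above, where a comma ends a statement); the function names and the default thresholds, which mirror the "second" and "fourth" predetermined numbers, are hypothetical.

```python
import re

# Assumed statement delimiters, covering ASCII and common CJK punctuation.
_DELIMITERS = re.compile(r"[.!?;,。！？；，]")


def split_statements(text):
    """Split text into non-empty, stripped statements."""
    return [part.strip() for part in _DELIMITERS.split(text) if part.strip()]


def describe_by_statements(text, first_statements=1, whole_if_at_most=2):
    """Return the whole text if it is short, else its leading statements."""
    statements = split_statements(text)
    if len(statements) <= whole_if_at_most:
        return text  # short message: use the entire recognition result
    return ", ".join(statements[:first_statements])
```

A character-count variant would follow the same shape, comparing `len(text)` with a threshold and returning `text[:n]`. Treating "." as a delimiter would split decimals such as "4.5", which a real implementation would need to handle.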

2) the sending device further comprises a keyword extraction device (not shown), which extracts at least one main-body keyword from the text recognition result information acquired by the recognition result acquisition device 1; the description information generation device 2 comprises a sub-generation device (not shown), which generates the text description information from some or all of the at least one main-body keyword.

Specifically, the keyword extraction device extracts the subject, predicate and object of each statement from the text recognition result information acquired by the recognition result acquisition device 1, and preferably also elements such as attributives and adverbials; alternatively, the keyword extraction device extracts the notional (content) words from that text recognition result information. These serve as the main-body keywords of the text recognition result information. The sub-generation device then generates the text description information from some or all of the at least one main-body keyword. Preferably, when the text recognition result information contains only one word, the keyword extraction device uses this word as the main-body keyword.

Preferably, the ways in which the sub-generation device generates the text description information from some or all of the at least one main-body keyword include, but are not limited to:

a. combining those of the at least one main-body keyword that fall within the first sixth predetermined number of characters, to generate the text description information;

b. combining all of the at least one main-body keyword, to generate the text description information.

In one example, the text recognition result information recognized from the voice information is "It may rain today; when you go out, take the umbrella behind the door". Using natural language analysis rules such as sentence segmentation, part-of-speech tagging, name extraction and parsing, the keyword extraction device extracts from the first sentence the subject "today" and the predicate "rain", and from the second sentence the subject "you", the predicate "take" and the object "umbrella". Using all of these main-body keywords, the sub-generation device merges the subject and predicate of the first sentence into "rain today" and the subject, predicate and object of the second sentence into "you take umbrella", and generates the text description information "rain today, you take umbrella".

It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art should understand that any implementation of generating, from the text recognition result information, text description information describing the voice content of the voice information falls within the scope of the present invention; for example, the text recognition result information may be matched against the keywords in a predetermined keyword library, and the keywords that match may be used as the keywords extracted from it.
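The keyword-library variant mentioned in the note above is the simplest to sketch, because it needs no parser. The example below is an assumption-based illustration: it uses plain case-insensitive substring matching, and the `max_keywords` cap is a simplification introduced here rather than the character-count bound of option a. All names are hypothetical.

```python
def extract_keywords(text, keyword_library):
    """Return library keywords found in `text`, ordered by first appearance."""
    lowered = text.lower()
    hits = [
        (lowered.find(keyword.lower()), keyword)
        for keyword in keyword_library
        if keyword.lower() in lowered
    ]
    return [keyword for _, keyword in sorted(hits)]


def describe_by_keywords(text, keyword_library, max_keywords=None):
    """Combine some or all matched keywords into a short description."""
    keywords = extract_keywords(text, keyword_library)
    if max_keywords is not None:
        keywords = keywords[:max_keywords]  # use only part of the keywords
    return " ".join(keywords)
```

Substring matching is deliberately naive: "rain" would also match inside "train", so a real implementation would tokenize the text first.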

Using the text description information to describe the voice content of the voice information ensures that the receiving user can obtain the main content of the voice information intuitively and accurately merely by browsing the text description information; when the voice information is long and would take considerable time to listen to, this also improves that user's information acquisition efficiency.

Subsequently, the associated sending device 3 sends the text description information obtained by the description information generation device 2 in association with the voice information.

Preferably, the text description information is associated with the voice information in at least one of the following ways:

1) the text description information serves as title information of the voice information;

2) the text description information serves as summary information of the voice information;

3) the text description information serves as attribute information of the voice information, for example by adding the text description information to the attribute description of the voice file.

The ways in which the associated sending device 3 sends the text description information in association with the voice information include, but are not limited to, the following:

1) the text description information is sent in association with the voice information as title information of the voice information;

For example, the associated sending device 3 uses the title information as the file name of the voice information and sends the voice information under that file name; on receiving the voice information, the receiving user can learn its voice content directly by reading its file name.

2) the text description information is sent in association with the voice information as summary information of the voice information;

For example, the associated sending device 3 attaches the voice information to the summary information as a voice attachment and sends the summary information carrying this attachment; on receiving the summary information, the receiving user can learn the voice content of the voice information directly from the text of the summary information.

As another example, the associated sending device 3 sends the summary information in hidden form in association with the voice information, for instance by hiding the summary information behind a clickable graphic object; when the receiving user receives the voice information and clicks on or hovers over the graphic object, the hidden summary information is displayed.

3) the text description information is sent in association with the voice information as attribute information of the voice information;

For example, the associated sending device 3 uses the text description information as the attribute title, attribute subject or similar field in the attribute information of the voice information and sends it in association with the voice information; on receiving the voice information, the receiving user can view the text description information by clicking to inspect the attribute information of the voice information.

It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any implementation of sending the text description information in association with the voice information falls within the scope of the present invention.
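The three association forms above can be sketched as three ways of filling in one outgoing message. This is an illustrative sketch only: the `OutgoingMessage` container, the `associate` function, the mode names and the ".amr" extension are all assumptions made for the example, not a format defined by the patent.

```python
from dataclasses import dataclass, field


@dataclass
class OutgoingMessage:
    filename: str                 # file name of the voice attachment
    audio: bytes                  # the voice information itself
    body: str = ""                # visible text sent alongside the audio
    attributes: dict = field(default_factory=dict)  # file attribute metadata


def associate(audio, description, mode, extension=".amr"):
    """Attach `description` to `audio` as a title, summary or attribute."""
    if mode == "title":
        # Keep only characters that are safe in a file name.
        safe = "".join(c for c in description if c.isalnum() or c in " _-").strip()
        return OutgoingMessage(filename=(safe or "voice") + extension, audio=audio)
    if mode == "summary":
        return OutgoingMessage(filename="voice" + extension, audio=audio, body=description)
    if mode == "attribute":
        return OutgoingMessage(
            filename="voice" + extension, audio=audio, attributes={"title": description}
        )
    raise ValueError(f"unknown association mode: {mode!r}")
```

In the title form the receiver sees the description without opening anything, while the attribute form keeps the message list uncluttered at the cost of an extra click.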

Because the mobile terminal can send voice information together with its text description information, the receiving user obtains the voice information combined with text description information containing its main content; the user can learn the content of the voice information without listening to it and can obtain the voice content intuitively through a combination of hearing and vision, which improves the user's information acquisition experience. At the same time, by sending the text description information in association with the voice information in a variety of forms, diversified ways of sending voice information combined with text information are realized, which improves the user experience. Further, the receiving user can browse the text description information first to decide whether to listen to the voice information, and can thus pre-judge junk voice information and avoid listening to it.

As one preferred variant of this embodiment (with reference to Fig. 1), the sending device further comprises a configuration information acquisition device (not shown), which acquires the sending configuration information of the mobile terminal. The operation in which the associated sending device 3 sends the text description information in association with the voice information then comprises: sending the text description information in association with the voice information according to the sending configuration information.

Specifically, the ways in which the configuration information acquisition device acquires the sending configuration information of the mobile terminal include, but are not limited to:

1) reading preset sending configuration information from the configuration information library of the application used for sending information on the mobile terminal; for example, the sending configuration information specifies that the text recognition result information is sent as title information;

2) obtaining, in real time, the sending configuration information set by the user by interacting with the user of the mobile terminal.

Then, when the associated sending device 3 needs to send the text description information in association with the voice information, it sends them in the associated sending mode indicated by the sending configuration information acquired by the configuration information acquisition device.

In one example, the configuration information acquisition device first reads the configuration information library of the information sending application on the mobile terminal and obtains sending configuration information specifying that the text description information is sent in association with the voice information as summary information of the voice information. The associated sending device 3 then, according to this sending configuration information, attaches the voice information to the summary information as a voice attachment and sends the summary information carrying this attachment.

It should be noted that the above examples serve only to better illustrate the technical solution of the present invention and do not limit it; those skilled in the art should understand that any implementation of acquiring the sending configuration information of the mobile terminal and sending the text description information in association with the voice information accordingly falls within the scope of the present invention.
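The two configuration sources above can be combined in a small resolver. The sketch below is hypothetical throughout: the JSON layout, the `association_mode` key, the mode names and the rule that a real-time user choice takes precedence over the preset are assumptions made for illustration, since the text lists the two sources without ranking them.

```python
import json

VALID_MODES = {"title", "summary", "attribute"}


def resolve_send_mode(preset_json, user_choice=None, default="summary"):
    """Return the association mode from the user's choice or the stored preset."""
    if user_choice is not None:
        mode = user_choice  # source 2: set interactively by the user
    else:
        preset = json.loads(preset_json)  # source 1: stored configuration
        mode = preset.get("association_mode", default)
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported association mode: {mode!r}")
    return mode
```

Usage: `resolve_send_mode('{"association_mode": "title"}')` yields the preset mode, while passing `user_choice="attribute"` overrides it for a single message.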

By textual description information and voice messaging being sent explicitly according to sending configuration information, realized configurable associated send mode, user can select its desired send mode preset or in real time, improves the controllability of information sender formula, and then improves user's experience.
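The two ways of obtaining the transmission configuration information described above can be sketched as follows. This is only an illustration under stated assumptions: the key name `association_mode`, the mode labels and the `ask_user` callback are hypothetical and do not come from the specification.

```python
# Hypothetical labels for the association modes named in the text.
VALID_MODES = ("title", "summary", "attribute")


def get_send_mode(config_store, ask_user=None):
    """Return the association mode to use when sending.

    config_store: dict standing in for the application's configuration store.
    ask_user: optional callback that shows the valid modes to the user and
              returns the user's choice (real-time configuration).
    """
    # Way 2: a real-time choice made by the user takes precedence.
    if ask_user is not None:
        choice = ask_user(VALID_MODES)
        if choice in VALID_MODES:
            return choice
    # Way 1: fall back to the preset value, defaulting to "title".
    mode = config_store.get("association_mode", "title")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown association mode: {mode}")
    return mode
```

Letting the interactive choice override the preset is a design assumption of this sketch; the specification only states that both sources may be used.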

Fig. 2 is a schematic diagram of a dispensing device for sending voice information and its textual description information in association, in accordance with a preferred embodiment of the present invention. The dispensing device in this embodiment comprises a recognition result acquisition device 1, a descriptor generating apparatus 2, an associated dispensing device 3, a positional information extraction element 4 and a cartographic information acquisition device 5, wherein the associated dispensing device 3 comprises a sub associated dispensing device 31.

The recognition result acquisition device 1 and the descriptor generating apparatus 2 have been described in detail with reference to the embodiment shown in Fig. 1 and are not described again here.

The positional information extraction element 4 extracts geographical location information from the text recognition result information. Specifically, it may do so in various ways, including but not limited to:

1) the positional information extraction element 4 matches a predetermined regular expression for identifying geographical location information against the text recognition result information, to determine the geographical location information in the text recognition result information that matches the regular expression;

For example, taking the C# language as an example, a predetermined regular expression representing geographical location information of the form "XX city, XX district" is:

"(?<c>.*?)市(?<d>.*?)区.*",

where 市 means "city" and 区 means "district". This expression is matched against the text recognition result information to determine that the geographical location information matching it is "Haidian District, Beijing City" (北京市海淀区). C# is an object-oriented programming language.

2) the positional information extraction element 4 matches predetermined geographic information character strings against the text recognition result information, to determine the geographical location information in the text recognition result information that matches one of the predetermined strings;

For example, the predetermined geographic information character strings are "Haidian District, Beijing City", "Dongcheng District, Beijing City" and "Chaoyang District, Beijing City". Each of these three strings is matched against the text recognition result information; if "Dongcheng District, Beijing City" is found to match a text sequence contained in the text recognition result, the text sequence corresponding to that string is extracted as the geographical location information.
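The string-matching variant can be sketched in a few lines. The list contents mirror the example above; the function name and the plain substring test are assumptions of this sketch.

```python
# Predetermined geographic information strings from the example.
PREDETERMINED_LOCATIONS = [
    "Haidian District, Beijing City",
    "Dongcheng District, Beijing City",
    "Chaoyang District, Beijing City",
]


def match_known_locations(text, known=PREDETERMINED_LOCATIONS):
    """Return every predetermined string that occurs verbatim in text."""
    return [location for location in known if location in text]
```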

Then, the cartographic information acquisition device 5 obtains, according to the geographical location information, cartographic information for the geographic area to which the geographical location information belongs. Specifically, it may do so in various ways, including but not limited to:

1) the cartographic information acquisition device 5 calls the application programming interface (API) of a map application local to the mobile terminal, passing the geographical location information as an input parameter, and performs a location query in the map application to generate cartographic information for the geographic area, for example a map picture;

2) the cartographic information acquisition device 5 performs a matching query, according to the geographical location information, in a cartographic information store local to the mobile terminal or on the network equipment, to obtain prestored cartographic information corresponding to the geographical location.

Subsequently, the sub associated dispensing device 31 sends the cartographic information and the textual description information in association with the voice information; for example, by using the cartographic information and the textual description information as picture-and-text summary information of the voice information, by using the textual description information as title information for both the voice information and the cartographic information, or by using the cartographic information and the textual description information as attribute information of the voice information.
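The lookup in a prestored cartographic information store, followed by the picture-and-text summary association, might look like the sketch below. The store contents, the field names of the message dictionary and both function names are hypothetical; a real implementation would call a map application API or a network service instead of a dictionary.

```python
from typing import Optional

# Hypothetical prestored store: location string -> map image path.
MAP_STORE = {"Haidian District, Beijing City": "maps/haidian.png"}


def get_map_info(location: str, store: dict = MAP_STORE) -> Optional[str]:
    """Matching query in the prestored cartographic information store."""
    return store.get(location)


def build_message(description, voice_file, location=None, store=MAP_STORE):
    """Bundle description, optional map and voice into one outgoing message."""
    message = {"summary_text": description, "voice_attachment": voice_file}
    map_info = get_map_info(location, store) if location else None
    if map_info is not None:
        # Picture-and-text summary: the map accompanies the description.
        message["summary_image"] = map_info
    return message
```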

It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art will understand that any implementation that extracts geographical location information from the text recognition result information, obtains cartographic information for the geographic area to which that geographical location information belongs, and then sends the cartographic information and the text recognition result information in association with the voice information falls within the scope of the present invention.

When the voice information contains geographical location information, sending the cartographic information for the corresponding geographic area in association with the text recognition result information lets the receiving user see the exact location of that geographic area directly, without having to search for it, which improves the user's viewing experience.

Fig. 3 illustrates, according to another preferred embodiment of the present invention, the descriptor generating apparatus 2 for generating textual description information describing the voice content of the voice information. The descriptor generating apparatus 2 comprises an original text generating apparatus 21, a presenting device 22 and a descriptor acquisition device 23.

The original text generating apparatus 21 generates, according to the text recognition result information, original textual description information describing the voice content of the voice information. It does so in the same or a similar way as the descriptor generating apparatus 2 shown in Fig. 1 generates textual description information from the text recognition result information, which is not described again here.

The presenting device 22 presents the original textual description information to the user of the mobile terminal.

For example, the presenting device 22 presents the original textual description information generated by the original text generating apparatus 21 to the user on the display screen of the mobile terminal, in a certain order and format, using a page technology such as JSP, ASP or PHP, for the user of the mobile terminal to browse.

Then, the descriptor acquisition device 23 obtains adjusted textual description information according to the user's adjustment operation on the original textual description information.

For example, the descriptor acquisition device 23 obtains, through interaction with the user, the user's adjustment operation(s) on the textual description information, such as deletion, insertion or word-order adjustment, and then adjusts the textual description information accordingly to obtain the adjusted textual description information.
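The adjustment operations mentioned above (deletion, insertion, word-order adjustment) can be modeled as a small list of edits applied to the words of the original description. The tuple encoding of the operations is an assumption of this sketch, not something the specification defines.

```python
def apply_adjustments(words, operations):
    """Apply user adjustment operations to a list of words, in order.

    Each operation is a tuple (hypothetical encoding):
      ("delete", index)        remove the word at index
      ("insert", index, word)  insert word before index
      ("move", src, dst)       move the word at src to position dst
    """
    result = list(words)  # leave the original description untouched
    for op in operations:
        kind = op[0]
        if kind == "delete":
            del result[op[1]]
        elif kind == "insert":
            result.insert(op[1], op[2])
        elif kind == "move":
            result.insert(op[2], result.pop(op[1]))
        else:
            raise ValueError(f"unknown adjustment operation: {kind}")
    return result
```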

It should be noted that the above examples are given only to better illustrate the technical solution of the present invention and do not limit it. Those skilled in the art will understand that any implementation that generates, according to the text recognition result information, original textual description information describing the voice content of the voice information, presents that original textual description information to the user of the mobile terminal, and then obtains adjusted textual description information according to the user's adjustment operation falls within the scope of the present invention.

Obtaining the user's adjustment operation on the original textual description information makes it possible for the user to edit that information when it is inaccurate. This improves the accuracy of the textual description information and ensures that the adjusted, more accurate textual description information is sent in association with the voice information.

Fig. 4 illustrates a flowchart of a method for sending voice information and its textual description information in association, according to one aspect of the present invention. The method of this embodiment comprises step S1, step S2 and step S3, and is mainly implemented by a mobile terminal.

First, in step S1, the mobile terminal obtains text recognition result information produced by performing voice recognition processing on the voice information to be recognized.

Specifically, in step S1, the ways in which the mobile terminal obtains this text recognition result information include, but are not limited to:

1) in step S1, the mobile terminal obtains text recognition result information produced by the mobile terminal itself performing voice recognition processing on the voice information to be recognized;

In one example, the user of the mobile terminal first inputs the voice information to be recognized through voice interaction with the mobile terminal, for example via a microphone, and the mobile terminal receives it. Then, in step S1, the mobile terminal performs voice recognition processing on this voice information, for example by extracting speech features from it and matching them against a speech model generated with an HMM-based recognition algorithm, and obtains the text recognition result information through this comparison and matching.

2) in step S1, the mobile terminal obtains text recognition result information produced by the network equipment performing voice recognition processing on the voice information to be recognized that the mobile terminal provided to it;

In one example, in step S1, the mobile terminal sends voice information to be sent, stored in its local voice library, to the network equipment; the network equipment performs voice recognition processing on this voice information to obtain text recognition result information and then sends that text recognition result information to the mobile terminal, which receives it.

3) preferably, in step S1, the mobile terminal combines the text recognition result obtained locally with the text recognition result obtained from the network equipment to obtain a combined text recognition result. Here, step S1 comprises step S11 (not shown) and step S12 (not shown). In step S11, the mobile terminal obtains local text recognition result information produced by the mobile terminal performing voice recognition processing on the voice information to be recognized, and network text recognition result information produced by the network equipment performing voice recognition processing on the voice information to be recognized that the mobile terminal provided to it; then, in step S12, the mobile terminal obtains the text recognition result information according to the local text recognition result information and the network text recognition result information.

Specifically, after obtaining the local and network text recognition result information in step S11, the mobile terminal in step S12 analyzes each of them using natural language analysis rules, such as sentence segmentation, part-of-speech tagging, name extraction, chunking and parsing, to determine the semantically correct text in the local and network text recognition result information and to use it as the text recognition result information. Here, natural language analysis rules are rules that enable natural-language communication between human and machine, that is, computer understanding of natural language; they can be implemented with natural language processing tools such as OpenNLP, FudanNLP, Stanford NLP and the Language Technology Platform (LTP).

For example, in step S11, the mobile terminal obtains the local text recognition result information "Tonight at 6 egg, see you at the Dongdan subway exit, don't forget to bring the file" (which contains a misrecognition), and the network text recognition result information "Tonight at 6 thirty, see you at exit B of Dongdan subway station, don't forget to bring" (which is missing its last words). Then, in step S12, the mobile terminal first determines that the part common to the local and network results is "Tonight at 6 X, see you at Dongdan subway station exit X, don't forget to bring XXX". Using natural language analysis rules such as sentence segmentation, part-of-speech tagging, name extraction, chunking and parsing, it analyzes both results to determine that the correct text for the non-intersecting parts is "thirty", "B" and "the file", and obtains the merged text recognition result "Tonight at 6 thirty, see you at exit B of Dongdan subway station, don't forget to bring the file".
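The merge in step S12 can be approximated by aligning the two results and deciding, for each span where they differ, which candidate to keep. The sketch below uses the standard-library `difflib.SequenceMatcher` for the alignment. The `choose` callback stands in for the natural language analysis described above and is an assumption of this sketch; its default (prefer the longer candidate, local on ties) is only a placeholder and not a semantic check.

```python
from difflib import SequenceMatcher


def merge_results(local, network, choose=None):
    """Merge two token lists produced by local and network recognition.

    choose(local_span, network_span) returns the span to keep wherever
    the two results disagree.
    """
    if choose is None:
        # Placeholder heuristic: keep the longer candidate span.
        choose = lambda a, b: a if len(a) >= len(b) else b
    merged = []
    matcher = SequenceMatcher(a=local, b=network, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            merged.extend(local[i1:i2])  # common part of both results
        else:
            merged.extend(choose(local[i1:i2], network[j1:j2]))
    return merged
```

In a fuller implementation, `choose` would score each candidate with a language model or parser so that a misrecognized word of equal length can be rejected.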

Then, in step S2, the mobile terminal generates, according to the text recognition result information obtained in step S1, textual description information describing the voice content of the voice information.

In step S2, the ways in which the mobile terminal generates this textual description information from the text recognition result information include, but are not limited to:

1) in step S2, the mobile terminal generates the textual description information from the leading characters of the text recognition result information, up to a first predetermined character count, or from its leading complete statements, up to a second predetermined statement count;

For example, the text recognition result information recognized from the voice information is "I really want to see Mission: Impossible 4; I don't know where it is still showing; let's look it up". In step S2, the mobile terminal extracts the first complete statement, "I really want to see Mission: Impossible 4", as the textual description information describing the voice content of the voice information.

Alternatively, when the number of characters in the text recognition result information is less than or equal to a third predetermined character count, or the number of complete statements it contains is less than or equal to a fourth predetermined statement count, the mobile terminal in step S2 uses the whole of the text recognition result information as the textual description information;

For example, the text recognition result information recognized from the voice information is "What time do you get off work", which contains only 1 complete statement, fewer than the fourth predetermined statement count of 2. In step S2, the mobile terminal therefore uses the whole text, "What time do you get off work", as the textual description information describing the voice content of the voice information.
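The statement-count rule can be sketched as follows. The delimiter set, the default thresholds (1 leading statement, whole text when there are at most 2 statements) and the function names are assumptions chosen to match the two examples above; a character-count variant would slice the text instead.

```python
import re

# Punctuation treated as ending a "complete statement" in this sketch.
_DELIMITERS = r"[.!?;。!?;]+"


def split_statements(text):
    """Split text into statements, dropping delimiters and blank pieces."""
    return [s.strip() for s in re.split(_DELIMITERS, text) if s.strip()]


def make_description(text, first_statements=1, whole_if_at_most=2):
    """Return the leading statements, or the whole text when it is short."""
    statements = split_statements(text)
    if len(statements) <= whole_if_at_most:
        return text.strip()
    return "; ".join(statements[:first_statements])
```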

2) the method further comprises step S6 (not shown), in which the mobile terminal extracts at least one main body keyword from the text recognition result information obtained in step S1; step S2 comprises step S24 (not shown), in which the mobile terminal generates the textual description information according to some or all of the at least one main body keyword.

Specifically, in step S6, the mobile terminal extracts the subject, predicate and object of each statement in the text recognition result information obtained in step S1, and preferably also elements such as attributives and adverbials; alternatively, in step S6, the mobile terminal obtains the content words of the text recognition result information. These serve as the main body keywords of the text recognition result information. Then, in step S24, the mobile terminal generates the textual description information according to some or all of the at least one main body keyword. Preferably, when the text recognition result information contains only one word, the mobile terminal in step S6 uses that word as the main body keyword.

Preferably, in step S24, the ways in which the mobile terminal generates the textual description information according to some or all of the at least one main body keyword include, but are not limited to, the following.

In one example, the text recognition result information recognized from the voice information is "It may rain today; when you go out, take the umbrella behind the door". In step S6, using natural language analysis rules such as sentence segmentation, part-of-speech tagging, name extraction and parsing, the mobile terminal extracts from the text recognition result the subject "today" and the predicate "rain" of the first statement, and the subject "you", the predicate "take" and the object "umbrella" of the second statement. In step S24, using all of these main body keywords, the mobile terminal merges the subject and predicate of the first statement into "rain today" and the subject, predicate and object of the second statement into "you take umbrella", and generates the textual description information "rain today, you take umbrella".
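Once a parser has labeled the subject, predicate and object of each statement, composing the description is a simple join. The sketch below assumes the parsing has already been done by a natural language processing tool and starts from its output; the `Clause` type and the function name are hypothetical, and the fixed subject-predicate-object order is a simplification that does not reproduce language-specific word order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clause:
    """Main body keywords of one statement, as labeled by a parser."""
    subject: Optional[str] = None
    predicate: Optional[str] = None
    obj: Optional[str] = None


def description_from_keywords(clauses: List[Clause]) -> str:
    """Join the available keywords of each clause, then join the clauses."""
    parts = []
    for clause in clauses:
        words = [w for w in (clause.subject, clause.predicate, clause.obj) if w]
        if words:
            parts.append(" ".join(words))
    return ", ".join(parts)
```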

Subsequently, in step S3, the mobile terminal sends the textual description information obtained in step S2 in association with the voice information.

In step S3, the ways in which the mobile terminal sends the textual description information and the voice information in association include, but are not limited to, the following:

1) using the textual description information as title information of the voice information;

For example, in step S3, the mobile terminal uses the title information as the file name of the voice information and sends the voice information under that file name; when the receiving user receives the voice information, the user can learn its voice content directly by reading its file name.

2) using the textual description information as summary information of the voice information;

For example, in step S3, the mobile terminal attaches the voice information to the summary information as a voice attachment and sends the summary information with that attachment; when the receiving user receives the summary information, the user can learn the voice content of the voice information directly from the text of the summary.

As another example, in step S3, the mobile terminal sends the summary information in hidden form in association with the voice information, for example by hiding the summary information behind a clickable graphic object; when the receiving user receives the voice information and clicks on or hovers over the graphic object, the hidden summary information is presented.

3) using the textual description information as attribute information of the voice information;

For example, in step S3, the mobile terminal sends the textual description information, in association with the voice information, as an attribute title, attribute subject or similar field in the attribute information of the voice information; when the receiving user receives the voice information and clicks to view its attribute information, the user obtains the textual description information.
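The association modes described for step S3 can be pictured as different ways of filling in one outgoing message. The following sketch is illustrative only: the `OutgoingMessage` fields, the mode labels and the `.amr` file extension are assumptions, not part of the specification.

```python
from dataclasses import dataclass, field


@dataclass
class OutgoingMessage:
    voice_data: bytes
    file_name: str = "voice.amr"   # assumed default name and format
    body_text: str = ""            # summary shown with the attachment
    hidden_summary: bool = False   # summary revealed on click or hover
    attributes: dict = field(default_factory=dict)


def associate(description: str, voice_data: bytes, mode: str) -> OutgoingMessage:
    """Attach the description to the voice data using the given mode."""
    if mode == "title":
        # Title information: the description becomes the file name.
        return OutgoingMessage(voice_data, file_name=f"{description}.amr")
    if mode == "summary":
        # Summary information: voice travels as an attachment of the text.
        return OutgoingMessage(voice_data, body_text=description)
    if mode == "hidden_summary":
        return OutgoingMessage(voice_data, body_text=description, hidden_summary=True)
    if mode == "attribute":
        # Attribute information: stored in the file's attribute fields.
        return OutgoingMessage(voice_data, attributes={"title": description})
    raise ValueError(f"unknown association mode: {mode}")
```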

As one preferred variant of the present embodiment (with reference to Fig. 4), the method further comprises step S7 (not shown), in which the mobile terminal obtains the transmission configuration information of the mobile terminal; in step S3, the step in which the mobile terminal sends the textual description information and the voice information in association comprises: sending the textual description information and the voice information in association according to the transmission configuration information.

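Specifically, in step S7, the ways in which the mobile terminal obtains its transmission configuration information include, but are not limited to, reading preset transmission configuration information from the configuration information store of the information-sending application on the mobile terminal, and obtaining in real time, through interaction with the user, the transmission configuration information set by the user.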

Then, when the mobile terminal needs to send the textual description information and the voice information in association in step S3, it sends them using the association mode indicated by the transmission configuration information obtained in step S7.

In one example, in step S7, the mobile terminal first reads the configuration information store of the information-sending application on the mobile terminal and obtains transmission configuration information specifying that the textual description information is to be sent, in association with the voice information, as summary information of the voice information; subsequently, in step S3, the mobile terminal, according to this transmission configuration information, attaches the voice information to the summary information as a voice attachment and sends the summary information together with that attachment.

Fig. 5 illustrates a flowchart of a method for sending voice information and its textual description information in association, in accordance with a preferred embodiment of the present invention. The method in this embodiment comprises step S1, step S2, step S31, step S4 and step S5.

Step S1 and step S2 have been described in detail with reference to the embodiment shown in Fig. 4 and are not described again here.

In step S4, the mobile terminal extracts geographical location information from the text recognition result information. Specifically, in step S4, the mobile terminal may do so in various ways, including but not limited to:

1) in step S4, the mobile terminal matches a predetermined regular expression for identifying geographical location information against the text recognition result information, to determine the geographical location information in the text recognition result information that matches the regular expression;

For example, the predetermined regular expression may be "(?<c>.*?)市(?<d>.*?)区.*", where 市 means "city" and 区 means "district";

2) in step S4, the mobile terminal matches predetermined geographic information character strings against the text recognition result information, to determine the geographical location information in the text recognition result information that matches one of the predetermined strings;

Then, in step S5, the mobile terminal obtains, according to the geographical location information, cartographic information for the geographic area to which the geographical location information belongs. Specifically, in step S5, the mobile terminal may do so in various ways, including but not limited to:

1) in step S5, the mobile terminal calls the application programming interface (API) of a map application local to the mobile terminal, passing the geographical location information as an input parameter, and performs a location query in the map application to generate cartographic information for the geographic area, for example a map picture;

2) in step S5, the mobile terminal performs a matching query, according to the geographical location information, in a cartographic information store local to the mobile terminal or on the network equipment, to obtain prestored cartographic information corresponding to the geographical location.

Subsequently, in step S31, the mobile terminal sends the cartographic information and the textual description information in association with the voice information; for example, by using the cartographic information and the textual description information as picture-and-text summary information of the voice information, by using the textual description information as title information for both the voice information and the cartographic information, or by using the cartographic information and the textual description information as attribute information of the voice information.

Fig. 6 illustrates, according to another preferred embodiment of the present invention, a flowchart of a method for generating textual description information describing the voice content of the voice information. In the method of this embodiment, step S2 comprises step S21, step S22 and step S23.

In step S21, the mobile terminal generates, according to the text recognition result information, original textual description information describing the voice content of the voice information. It does so in the same or a similar way as the mobile terminal generates textual description information from the text recognition result information in step S2 shown in Fig. 4, which is not described again here.

In step S22, the mobile terminal presents the original textual description information to the user of the mobile terminal.

For example, in step S22, the mobile terminal presents the original textual description information generated in step S21 to the user on the display screen of the mobile terminal, in a certain order and format, using a page technology such as JSP, ASP or PHP, for the user of the mobile terminal to browse.

Then, in step S23, the mobile terminal obtains adjusted textual description information according to the user's adjustment operation on the original textual description information.

For example, in step S23, the mobile terminal obtains, through interaction with the user, the user's adjustment operation(s) on the textual description information, such as deletion, insertion or word-order adjustment, and then adjusts the textual description information accordingly to obtain the adjusted textual description information.

It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware; for example, it may be implemented using an application-specific integrated circuit (ASIC) or any other similar hardware device. In one embodiment, the software program of the present invention may be executed by a processor to implement the steps or functions described above. Similarly, the software program of the present invention (including related data structures) may be stored in a computer-readable recording medium, for example RAM memory, a magnetic or optical drive, a floppy disk or similar devices. In addition, some steps or functions of the present invention may be implemented in hardware, for example as circuitry that cooperates with a processor to perform the steps or functions.

To those skilled in the art, it is evident that the invention is not limited to the details of the exemplary embodiments above, and that it can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive; the scope of the invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced in the invention. No reference numeral in a claim should be construed as limiting that claim. Moreover, the word "comprising" does not exclude other steps, and the singular does not exclude the plural. A plurality of devices recited in a device claim may also be implemented by a single device in software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims

1. A method for sending voice information and its textual description information in association in a mobile terminal, the method comprising the following steps:

2. The method according to claim 1, wherein the method further comprises:

- extracting at least one main body keyword from the text recognition result information;

wherein step b comprises:

- generating the textual description information according to some or all of the at least one main body keyword.

3. The method according to claim 1 or 2, wherein the manner of associating the textual description information with the voice information includes, but is not limited to, at least one of the following:

- using the textual description information as title information of the voice information;

- using the textual description information as summary information of the voice information;

- using the textual description information as attribute information of the voice information.

4. The method according to any one of claims 1 to 3, wherein the method further comprises:

- extracting geographical location information from the text recognition result information;

- obtaining, according to the geographical location information, cartographic information for the geographic area to which the geographical location information belongs;

wherein step c comprises:

- sending the cartographic information and the textual description information in association with the voice information.

5. The method according to any one of claims 1 to 4, wherein the method further comprises:

- obtaining transmission configuration information of the mobile terminal;

wherein the step of sending the textual description information and the voice information in association comprises:

- sending the textual description information and the voice information in association according to the transmission configuration information.

6. The method according to any one of claims 1 to 5, wherein step b comprises:

- generating, according to the text recognition result information, initial text description information for describing the voice content of the voice information;

- presenting the initial text description information to the user of the mobile terminal;

- obtaining adjusted text description information according to the user's adjustment operation on the text description information.

7. The method according to any one of claims 1 to 6, wherein step a comprises the following steps:

- obtaining local text recognition result information produced by the mobile terminal performing voice recognition processing on the voice information to be recognized, and network text recognition result information produced by a network device performing voice recognition processing on the voice information to be recognized that the mobile terminal provided to it;

- obtaining the text recognition result information according to the local text recognition result information and the network text recognition result information.
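
Claim 7 leaves open how the local and network results are combined. One possible reading, shown purely as an assumption, is to keep whichever result carries the higher confidence score and to fall back to the other when one is unavailable. The `RecognitionResult` type, its `confidence` field, and `combine_results` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionResult:
    text: str
    confidence: float  # hypothetical score in [0, 1]; not defined in the claims


def combine_results(local: Optional[RecognitionResult],
                    network: Optional[RecognitionResult]) -> Optional[str]:
    """Pick the recognized text from the more confident source.

    Assumption: selection by confidence is only one of many ways the two
    results could be combined; on a tie the local result is preferred.
    """
    candidates = [r for r in (local, network) if r is not None]
    if not candidates:
        return None
    # max() returns the first maximal element, so local wins ties.
    return max(candidates, key=lambda r: r.confidence).text
```

Accepting `None` for either argument lets the same function cover the case where the network device is unreachable or local recognition is not supported.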

8. A sending device for sending voice information in association with its text description information, the sending device comprising:

9. The sending device according to claim 8, wherein the sending device further comprises:

a keyword extraction device, configured to extract at least one subject keyword from the text recognition result information;

wherein the description information generation device comprises:

a sub-generation device, configured to generate the text description information according to some or all of the at least one subject keyword.

10. The sending device according to claim 8 or 9, wherein the manner in which the text description information is associated with the voice information includes, but is not limited to, at least one of the following:

11. The sending device according to any one of claims 8 to 10, wherein the sending device further comprises:

a location information extraction device, configured to extract geographical location information from the text recognition result information;

a map information acquisition device, configured to obtain, according to the geographical location information, map information of the geographic area to which the geographical location information belongs;

wherein the associated sending device comprises:

a sub-associated sending device, configured to send the map information and the text description information in association with the voice information.

12. The sending device according to any one of claims 8 to 11, wherein the sending device further comprises:

a configuration information acquisition device, configured to obtain sending configuration information of the mobile terminal;

wherein the operation by which the associated sending device sends the text description information in association with the voice information comprises:

sending the text description information in association with the voice information according to the sending configuration information.

13. The sending device according to any one of claims 8 to 12, wherein the description information generation device comprises:

an initial text generation device, configured to generate, according to the text recognition result information, initial text description information for describing the voice content of the voice information;

a presentation device, configured to present the initial text description information to the user of the mobile terminal;

a description information acquisition device, configured to obtain adjusted text description information according to the user's adjustment operation on the text description information.

14. The sending device according to any one of claims 8 to 13, wherein the recognition result acquisition device comprises:

a first result acquisition device, configured to obtain local text recognition result information produced by the mobile terminal performing voice recognition processing on the voice information to be recognized, and network text recognition result information produced by a network device performing voice recognition processing on the voice information to be recognized that the mobile terminal provided to it;

a second result acquisition device, configured to obtain the text recognition result information according to the local text recognition result information and the network text recognition result information.

15. A mobile terminal, comprising the sending device according to at least one of claims 8 to 14.