CN104462058B - Character string identification method and device

Info

Publication number: CN104462058B
Application number: CN201410579684.5A
Authority: CN
Inventors: 戴强; 刘骁
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2014-10-24
Filing date: 2014-10-24
Publication date: 2018-10-02
Anticipated expiration: 2034-10-24
Also published as: CN104462058A

Abstract

The present invention relates to a character string identification method and device. In one embodiment, the method comprises the following steps: obtaining a character string composed of substrings of multiple types; segmenting the character string according to the substring types of the multiple types of substrings and combinations thereof, so that the character string is divided into at least one substring; judging whether the at least one substring is a word, a word being a vocabulary item that has a unique meaning in the language to which the substring belongs; if the substring is judged not to be a word, performing identification processing on the at least one substring; and synthesizing all identified substrings into continuous speech. The method and device according to the embodiments of the present invention can accurately identify the meaning of a character string.

Description

Character string identification method and device

Technical field

The present invention relates to the field of computer technology, and more particularly to a character string identification method and device.

Background

With the development of computer technology, speech synthesis has emerged. Speech synthesis converts arbitrary text information in real time into standard, fluent speech that is read aloud. In terms of content, storage, transmission, convenience and timeliness, this makes it easier for users to send and read messages. However, a large number of character strings have several pronunciations, and different pronunciations carry different meanings; only the correct pronunciation conveys the appropriate meaning after speech is synthesized. Accurately identifying the meaning of a character string is therefore particularly important in speech synthesis.

Summary of the invention

In view of this, the present invention provides a character string identification method and device that can accurately identify the meaning of a character string.

A character string identification method comprises the following steps:

obtaining a character string composed of substrings of multiple types;

segmenting the character string according to the substring types of the multiple types of substrings and combinations thereof, so that the character string is divided into at least one substring;

judging whether the at least one substring is a word, a word being a vocabulary item that has a unique meaning in the language to which the substring belongs;

if the substring is judged not to be a word, performing identification processing on the at least one substring; and

synthesizing all identified substrings into continuous speech.

A character string identification device comprises the following modules:

an acquisition module, for obtaining a character string composed of substrings of multiple types;

a word segmentation module, for segmenting the character string according to the substring types of the multiple types of substrings and combinations thereof, so that the character string is divided into at least one substring;

a judgment module, for judging whether the at least one substring is a word, a word being a vocabulary item that has a unique meaning in the language to which the substring belongs;

a processing module, for performing identification processing on the at least one substring if the substring is judged not to be a word; and

a synthesis module, for synthesizing all identified substrings into continuous speech.

According to the method and device of the above embodiments, the character string is segmented according to the types of its substrings and then identified word by word, which improves the accuracy of character string identification.

To make the above and other objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.

Description of the drawings

Fig. 1 is a block diagram of an electronic device.

Fig. 2 is a flow chart of the character string identification method provided by the first embodiment.

Fig. 3 is a flow chart of the character string identification method provided by the second embodiment.

Fig. 4 is a flow chart of the character string identification method provided by the third embodiment.

Fig. 5 is a flow chart of the character string identification method provided by the fourth embodiment.

Fig. 6 is a flow chart of the character string identification method provided by the fifth embodiment.

Fig. 7 is a block diagram of the character string identification device provided by the sixth embodiment.

Fig. 8 is a block diagram of the character string identification device provided by the seventh embodiment.

Fig. 9 is a block diagram of the character string identification device provided by the eighth embodiment.

Fig. 10 is a block diagram of the character string identification device provided by the ninth embodiment.

Fig. 11 is a block diagram of the character string identification device provided by the tenth embodiment.

Detailed description

To further explain the technical means adopted by the present invention to achieve its intended purpose, and their effects, the specific implementations, structures, features and effects of the present invention are described in detail below with reference to the accompanying drawings and preferred embodiments.

The character string identification method and device of the embodiments of the present invention can be used to identify character strings in speech synthesis, and in particular can be used in an electronic device.

Fig. 1 is a block diagram of the above electronic device. As shown in Fig. 1, the electronic device 100 includes one or more processors 102 (only one is shown in the figure), a memory 104, an RF (Radio Frequency) module 106, a network module 108, an audio module 110, an input module 112 and a display module 114. A person of ordinary skill in the art will understand that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the electronic device 100. For example, the electronic device 100 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. Specific examples of the electronic device 100 include, but are not limited to, handheld computers, mobile phones, media players, in-vehicle devices, personal digital assistants and various combinations of these devices.

A person of ordinary skill in the art will understand that, relative to the processor 102, all other components are peripherals, and the processor 102 is coupled to these peripherals through multiple peripheral interfaces 124. The peripheral interfaces 124 can be implemented based on the following standards, but are not limited to them: Universal Asynchronous Receiver/Transmitter (UART), General Purpose Input/Output (GPIO), Serial Peripheral Interface (SPI) and Inter-Integrated Circuit (I2C). In some examples, the peripheral interface 124 may include only a bus; in other examples, it may also include other elements such as one or more controllers, for example a display controller for connecting a liquid crystal display panel or a storage controller 122 for connecting the memory. In addition, these controllers can also be separated from the peripheral interface 124 and integrated in the processor 102 or in the corresponding peripheral.

The memory 104 can be used to store software programs and modules, such as the program instructions/modules corresponding to the character string identification method/device in the embodiments of the present invention. The processor 102 runs the software programs and modules stored in the memory 104 to execute various functional applications and data processing, that is, to implement the above character string identification method. The memory 104 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, and this remote memory can be connected to the electronic device 100 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.

The RF module 106 receives and transmits electromagnetic waves and converts between electromagnetic waves and electrical signals, so as to communicate with a communication network or other devices. The RF module 106 may include various existing circuit elements for performing these functions, for example an antenna, an RF transceiver, a digital signal processor, an encryption/decryption chip, a subscriber identity module (SIM) card and memory. The RF module 106 can communicate with various networks such as the Internet, an intranet or a wireless network, or communicate with other devices through a wireless network. The wireless network may include a cellular telephone network, a wireless local area network or a metropolitan area network. The wireless network can use various communication standards, protocols and technologies, including but not limited to Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Wireless Fidelity (WiFi) (such as the Institute of Electrical and Electronics Engineers standards IEEE 802.11a, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n), Voice over Internet Protocol (VoIP), Worldwide Interoperability for Microwave Access (Wi-Max), other protocols for mail, instant messaging and short messages, and any other suitable communication protocol, including protocols that have not yet been developed.

The network module 108 receives and transmits network signals. The network signals may include wireless signals or wired signals. In one example, the network signal is a WiFi signal; since the working frequency of WiFi also lies within the radio frequency range, the network module can then have a hardware structure similar to that of the RF module 106, that is, it may include elements such as an antenna, an RF transceiver, a digital signal processor and an encryption/decryption chip. In another example, the network signal is a wired network signal. In this case, the network module 108 may include elements such as a processor, random access memory, a converter and a crystal oscillator.

The audio circuit 110, a speaker, an audio jack and a microphone together provide an audio interface between the user and the electronic device 100. Specifically, the audio circuit 110 receives sound data from the processor 102, converts the sound data into an electrical signal and transmits the electrical signal to the speaker. The speaker 101 converts the electrical signal into sound waves audible to the human ear. The audio circuit 110 also receives electrical signals from the microphone, converts them into sound data and transmits the sound data to the processor 102 for further processing. Audio data can be obtained from the memory 104 or through the RF module 106 or the network module 108. In addition, audio data can also be stored in the memory 104 or sent through the RF module 106 and the network module 108.

The input unit 112 can be used to receive input character information and to generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control. Specifically, the input unit 112 may include buttons and a touch surface. The buttons may include, for example, character keys for inputting characters and control buttons for triggering control functions. Examples of control buttons include a "return to home screen" button, a power on/off button and a camera button. The touch surface collects touch operations by the user on or near it (for example, operations performed by the user on or near the touch surface with a finger, a stylus or any other suitable object or accessory) and drives the corresponding connected device according to a preset program. Optionally, the touch surface may include a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation and transmits the signal to the touch controller. The touch controller receives touch information from the touch detection device, converts it into contact coordinates and sends them to the processor 102; it can also receive and execute commands sent by the processor 102. The touch surface can be implemented in several types, such as resistive, capacitive, infrared and surface acoustic wave. In addition to the touch surface, the input unit 112 may include other input devices, which include but are not limited to one or more of a physical keyboard, a trackball, a mouse and a joystick.

The display module 114 displays information input by the user, information provided to the user and the various graphical user interfaces of the electronic device 100. These graphical user interfaces can be composed of graphics, text, icons, video and any combination thereof. In one example, the display module 114 includes a display panel. The display panel may be, for example, a liquid crystal display (LCD) panel, an organic light-emitting diode (OLED) display panel or an electrophoretic display (EPD) panel. Further, the touch surface may be arranged on the display panel so as to form a single unit with it. In other embodiments, the display module 114 may also include other types of display devices, for example a projection display device. Compared with an ordinary display panel, a projection display device also needs components for projection, such as a lens group.

First embodiment

Fig. 2 is a flow chart of the character string identification method provided by this embodiment. As shown in Fig. 2, the method of this embodiment includes the following steps.

Step S101: obtain a character string composed of substrings of multiple types.

The character string can be one entered by the user in real time, or one that already exists in the current electronic device. In one example, the method of this embodiment is used in an instant messaging tool in which a first user terminal and a second user terminal send character strings to each other; the obtained character string can be a character string received in the current interface, or a character string in the history of the messaging tool. In another example, the method of this embodiment is used in translation software, and the character string can be one that the electronic device receives as user input.

It can be understood that character strings come in many types, for example Arabic, Chinese, English, digits, symbols and arbitrary combinations thereof. Each of the multiple character string types is also matched with a corresponding configuration file, which marks the target types that a pre-stored character string type may correspond to. For example, the type digits plus symbol plus digits, "Number2Punction2Number", can denote a decimal, a telephone number, a numerical value and so on, for example "2.13" or "010-88888888". The corresponding configuration is "Number2Punction2Number: Decimal, Telephone". Further, the configuration file can be modified and extended with additional meanings for a character string type. For example, the character string "3,247" belongs to the "Number2Punction2Number" type, but it is a numerical value, which is not among the target types set in the configuration file. The target type "Numerical" can then be added to the configuration file for that character string type.
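The configuration described above can be pictured as a mapping from a character string type to its candidate target types. The following is a minimal sketch only: the dictionary layout and the helper name `add_target_type` are assumptions made for illustration, since the patent does not specify a file format.

```python
# Hypothetical in-memory form of the configuration file: each composite
# character string type maps to the target types it may denote.
TYPE_CONFIG = {
    "Number2Punction2Number": ["Decimal", "Telephone"],
}


def add_target_type(config, string_type, target_type):
    """Register a target type for a string type, avoiding duplicates."""
    targets = config.setdefault(string_type, [])
    if target_type not in targets:
        targets.append(target_type)
    return config


# "3,247" has the type Number2Punction2Number but is neither a decimal nor
# a telephone number, so the target type "Numerical" is added.
add_target_type(TYPE_CONFIG, "Number2Punction2Number", "Numerical")
```

Keeping the candidate target types in data rather than in code matches the statement that the configuration can be changed and extended.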

Step S102: segment the character string according to the substring types of the multiple types of substrings and combinations thereof, so that the character string is divided into at least one substring.

In one embodiment, character strings are divided into four major classes: English (English), Chinese characters (Kanji), symbols (Punctuation) and digits (Number). The four classes can also be combined arbitrarily, for example: English2Number denotes English letters followed by digits, with a type length of 2, such as "CA1419"; Number2Punctuation2Number denotes digits, a symbol and digits, with a type length of 3, such as "010-88888888"; Number2Kanji denotes digits followed by Chinese characters, with a type length of 2, such as "2014" followed by the Chinese character for "year". Segmentation can be performed according to English (English), Chinese characters (Kanji), symbols (Punctuation), digits (Number) and combinations thereof.

In one example, the sentence "China Mobile (0941) announced its fiscal 2005 operating results in Hong Kong on March 16" is segmented as "China / Mobile / ( / 0941 / ) / March / 16th / in / Hong Kong / announce / 2005 / fiscal year / operating / results". Further, each substring is also tagged with a type label (its "part of speech") during segmentation. For example, "China" is tagged "Kanji" and "March" is tagged "Number2Kanji". The label of a substring can then serve as reference information for the neighbouring substrings during identification processing.
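The type labels used in step S102 can be illustrated with a short sketch. It only shows how a token could be split into runs of one character class and given a composite label such as "Number2Punctuation2Number"; the function names are hypothetical, the CJK range check is an assumed stand-in for "Kanji", and real word segmentation of Chinese text would additionally need a dictionary or a trained segmenter, which is not shown.

```python
from itertools import groupby


def char_class(ch):
    """Classify one character into one of the four basic classes."""
    if ch.isascii() and ch.isalpha():
        return "English"
    if ch.isascii() and ch.isdigit():
        return "Number"
    if "\u4e00" <= ch <= "\u9fff":  # assumption: CJK Unified Ideographs block
        return "Kanji"
    return "Punctuation"


def typed_runs(text):
    """Split text into maximal runs of one class, ignoring whitespace."""
    visible = (ch for ch in text if not ch.isspace())
    return [("".join(group), cls) for cls, group in groupby(visible, key=char_class)]


def type_label(token):
    """Build a composite label such as 'English2Number' for a token."""
    return "2".join(cls for _, cls in typed_runs(token))
```

The number of runs returned by `typed_runs` corresponds to the "type length" mentioned above: "CA1419" yields two runs and "010-88888888" yields three.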

Step S103: judge whether the at least one substring is a word.

A word is a vocabulary item that has a unique meaning in the language to which the substring belongs, so that it has only one pronunciation when output as speech. For example, if the substring is the Chinese word for "China", it has a unique meaning in Chinese, so the substring can be determined to be a word. Likewise, the substring "China" has a unique meaning in English and can also be judged to be a word.

In the embodiments of the present invention, based on the four character string classes above, it can be understood that if the substring is a Chinese or English word, its meaning can generally be identified directly without ambiguity. For example, "China" can be read out directly, in order, during speech synthesis. It is therefore judged whether the substring is English or Chinese: a Chinese or English word can be read directly, and its meaning does not need to be identified again. If the substring is not a Chinese or English word, its ambiguity must be resolved; for example, the year "2001" can be read digit by digit ("two zero zero one") or as a number ("two thousand and one").
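The check in step S103 can be approximated as follows. This is a rough sketch that assumes "word" simply means "consists only of English letters or only of Chinese characters"; the function name is hypothetical, and a real system would more likely consult a lexicon, since strings such as "jpg" are letters only yet are handled by a separate default rule.

```python
def is_word(substring):
    """Return True for a purely English or purely Chinese substring."""
    if not substring:
        return False
    is_english = all(ch.isascii() and ch.isalpha() for ch in substring)
    # Assumption: Chinese characters are those in the CJK Unified Ideographs block.
    is_chinese = all("\u4e00" <= ch <= "\u9fff" for ch in substring)
    return is_english or is_chinese
```

Substrings for which `is_word` returns False, such as "120" or a year written in digits, are the ones passed on to the identification processing of step S104.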

Step S104: if the substring is judged not to be a word, perform identification processing on the at least one substring.

In one embodiment, the method of this embodiment is used for speech synthesis. Speech synthesis, also known as text-to-speech (Text to Speech) technology, converts arbitrary text information in real time into standard, fluent speech that is read aloud, which is equivalent to fitting the machine with an artificial mouth. To synthesize speech, besides relying on various rules, including semantic, lexical and phonetic rules, the content of the text must also be well understood, which involves the problem of natural language understanding.

Character strings that combine the four major types above may be ambiguous: character strings of the same type can denote several kinds of content, so the meaning of the substring within the current character string needs to be identified.

For example, "120" can denote the ambulance telephone number, read digit by digit as "one two zero", or a numerical value, read as "one hundred and twenty". It can be identified from the meaning of the preceding and following substrings. In one example, "dial the 120 ambulance number", the "120" can be judged to be a telephone number from the following substring "ambulance number".

The "Number2Punction2Number" type can denote a decimal, a telephone number, a numerical value and so on. For example, in "2014 / China / Mobile / revenue / 3,247 / hundred million yuan / RMB", the substring "3,247" can be judged to be a numerical value from the adjacent substring "hundred million yuan". As another example, "010-88888888" is also of the "Number2Punction2Number" type and denotes a telephone number. The "2014" in the example above could mean either a quantity of 2,014 years or the calendar year 2014. A matching model can then be established from the information in the preceding and following character strings; the input is processed by the model, and the model's result is taken as the final recognition result. In one example, a conditional random field model (CRF model) can be used. A conditional random field is an undirected graphical model in which the vertices represent random variables and the edges between vertices represent dependencies between random variables. In a conditional random field, the distribution of the random variable Y is a conditional probability, conditioned on the observed random variable X. In principle, the graph layout of a conditional random field can be arbitrary; the most common layout is a linear chain. The "2014" in the example above can be judged from the several following substrings "China / Mobile / revenue" to be the calendar year 2014 rather than a quantity of 2,014 years.

As another example, a number followed by a percent sign has an exact meaning: it denotes a percentage. Such cases are identified by matching a general rule, for example the rule that a number plus a percent sign denotes a percentage.
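The percentage case is simple enough to express as a single pattern. The following sketch assumes tokens of the form digits, optional decimal part, percent sign; the pattern, the function name and the English reading "percent" are illustrative choices, not taken from the patent.

```python
import re

# Digits, an optional decimal part, then a percent sign, and nothing else.
PERCENT_RE = re.compile(r"^(\d+(?:\.\d+)?)%$")


def expand_percentage(token):
    """Return 'N percent' for tokens such as '15%', otherwise None."""
    match = PERCENT_RE.match(token)
    if match is None:
        return None
    return match.group(1) + " percent"
```

Because the rule needs no context, it can run before any context-based or model-based identification.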

As another example, character strings such as "jpg" and "gif" denote picture types. A default rule can then be set: when "BMP", "JPG", "GIF" or "PNG" appears, it is identified as a picture format and can be read out directly, letter by letter and digit by digit, in the order of the characters in the string.

Step S105: synthesize all identified substrings into continuous speech.

The identified character strings are converted into fluent, intelligible spoken output.

Further, in the method of this embodiment, the identified character string can also be synthesized into speech.

According to the method of this embodiment, the character string to be identified is segmented and identification processing is then performed on each substring separately, which improves the accuracy of identification.

Second embodiment

This embodiment provides a character string identification method. It is similar to the first embodiment, the difference being that, as shown in Fig. 3, step S104 specifically further includes:

Step S201: identify the substring according to the content of the character strings before and after it.

Step S202: synthesize the identified substring into speech.

In the method of this embodiment, a substring can be identified from the substrings before or after it. For some character strings, the preceding and following substrings remove the ambiguity, so a result can be obtained.

For example, "120" can denote the ambulance telephone number, read digit by digit as "one two zero", or a numerical value, read as "one hundred and twenty". It can be identified from the meaning of the preceding and following substrings. In one example, "dial the 120 ambulance number", the "120" can be judged to be a telephone number from the following substring "ambulance number". As another example, in "2014 / China / Mobile / revenue / 3,247 / hundred million yuan / RMB", the substring "3,247" can be judged to be a numerical value from the adjacent substring "hundred million yuan". Once an accurate result has been identified, the currently identified substring is synthesized into speech.
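Context-based identification of this kind can be sketched as a lookup in a window of neighbouring substrings. The cue words, the window size and both function names below are assumptions chosen for illustration; the patent does not list concrete cues.

```python
# Hypothetical cue words; a real system would use a much larger lexicon.
PHONE_CUES = {"dial", "call", "telephone", "ambulance"}

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def classify_number(tokens, index, window=2):
    """Guess the target type of tokens[index] from its neighbours."""
    before = tokens[max(0, index - window):index]
    after = tokens[index + 1:index + 1 + window]
    if any(tok.lower() in PHONE_CUES for tok in before + after):
        return "Telephone"
    return "Numerical"


def read_digits(token):
    """Read a digit string one digit at a time, as for a telephone number."""
    return " ".join(DIGIT_NAMES[int(ch)] for ch in token)
```

With the token list `["dial", "120", "ambulance", "number"]`, the cue words on both sides of "120" select the telephone reading, which `read_digits` then spells out digit by digit.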

According to the method of this embodiment, when identification processing is performed on a substring, its meaning is identified from the information in the preceding and following substrings, which avoids interference from character strings with several meanings and achieves higher accuracy.

Third embodiment

This embodiment provides a character string identification method. It is similar to the first embodiment, the difference being that, as shown in Fig. 4, step S104 specifically further includes:

Step S301: establish a character string matching model and identify the meaning of the substring according to the matching model.

Step S302: synthesize the identified substring into speech.

Each of the multiple substring types is also matched with a corresponding configuration file, which marks the target types that a pre-stored character string type may correspond to. For example, the type digits plus symbol plus digits, "Number2Punction2Number", can denote a decimal, a telephone number, a numerical value and so on. The corresponding configuration is "Number2Punction2Number: Decimal, Telephone, Numerical". When a character string is identified, it can be identified according to the configuration file matched to the type of the substring.

For example, in "2014 / China / Mobile / revenue / 3,247 / hundred million yuan / RMB", the substring "3,247" can be judged to be a numerical value from the adjacent substring "hundred million yuan". As another example, "010-88888888" is also of the "Number2Punction2Number" type and denotes a telephone number. The "2014" in the example above could mean either a quantity of 2,014 years or the calendar year 2014. A matching model can then be established from the information in the preceding and following character strings; the input is processed by the model, and the model's result is taken as the final recognition result. In one example, a conditional random field model (CRF model) can be used. A conditional random field is an undirected graphical model in which the vertices represent random variables and the edges between vertices represent dependencies between random variables. In a conditional random field, the distribution of the random variable Y is a conditional probability, conditioned on the observed random variable X. In principle, the graph layout of a conditional random field can be arbitrary; the most common layout is a linear chain. The "2014" in the example above can be judged from the several following substrings "China / Mobile / revenue" to be the calendar year 2014 rather than a quantity of 2,014 years. It can be understood that the matching model can also be another statistical model, such as a hidden Markov model (HMM), a conditional random field model (CRF model) or a maximum entropy model (ME model). Finally, the identified character string is synthesized into speech.
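For reference, the linear-chain layout mentioned above is usually written in the following standard textbook form. The patent itself gives no formula, so the feature functions f_k and weights λ_k are generic symbols rather than anything the patent defines. Here X is the observed sequence of substrings, Y is the sequence of target-type labels, and T is the sequence length.

```latex
P(Y \mid X) = \frac{1}{Z(X)}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, X, t) \Big),
\qquad
Z(X) = \sum_{Y'} \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, X, t) \Big)
```

In this setting a feature function might, for instance, fire when the current substring is a four-digit number and the label is "year"; the recognition result is the label sequence Y that maximizes P(Y | X).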

For some character strings, ambiguity may remain even after the preceding and following information has been considered. According to the method of this embodiment, a matching model is established and the context information is compared to identify the meaning of the current substring, which further improves the accuracy of character string identification.

Fourth embodiment

This embodiment provides a character string identification method. It is similar to the first embodiment, the difference being that, as shown in Fig. 5, step S104 specifically further includes:

Step S401: identify the substring directly according to its meaning.

Step S402: synthesize the identified substring into speech.

According to the method of this embodiment, character strings with a direct, unambiguous meaning are identified directly, which saves processing resources while maintaining a high accuracy rate.

Fifth embodiment

This embodiment provides a character string identification method. It is similar to the first embodiment, the difference being that, as shown in Fig. 6, step S104 specifically further includes:

Step S501: identify the recognizable character strings in the substring according to default types.

Step S502: synthesize the identified substring into speech.

Some character strings have a corresponding default meaning, so default recognition rules can be set for them.

For example, character strings such as "jpg" and "gif" denote picture types. A default rule can then be set: when "BMP", "JPG", "GIF" or "PNG" appears, it is identified as a picture format and can be read out directly, letter by letter and digit by digit, in the order of the characters in the string. During speech synthesis, the sounds of the letters and digits in the character string are then synthesized directly in order.
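A default rule of this kind amounts to a set lookup followed by spelling the string out. The sketch below uses the four formats named above; the function names are hypothetical.

```python
# Picture formats named in the text; the set can be extended with more defaults.
IMAGE_FORMATS = {"BMP", "JPG", "GIF", "PNG"}


def is_image_format(token):
    """Return True if the token is one of the default picture formats."""
    return token.upper() in IMAGE_FORMATS


def spell_out(token):
    """Return the characters of the token in order, to be voiced one by one."""
    return list(token.upper())
```

A synthesizer would then voice each element returned by `spell_out` in turn instead of trying to pronounce the string as a word.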

According to the method of this embodiment, some special character strings can be identified directly by default rules, and special rules can be defined, which improves the recognition accuracy of character strings.

Sixth embodiment

The present embodiment provides a kind of character string identification devices, as shown in fig. 7, the device of the present embodiment includes：Acquisition module 601, word-dividing mode 602, judgment module 603, processing module 604 and synthesis module 605.

Acquisition module 601, for obtaining character string, the character string is made of multiple types substring.

Word-dividing mode 602 is used for the character string according to the sub- character of the multiple types substring and combinations thereof String type is segmented, and the character string is divided at least one substring.

Judgment module 603, for judging whether at least one substring is that word converges.

It is the vocabulary for having unique meaning in the affiliated languages of the substring that the word, which converges,.

Processing module 604, if being that word converges for judging the substring not, by least one substring Processing is identified.

Substring there may be ambiguity is identified, obtains accurate result.

Synthesis module 605, for synthesizing all the identified substrings into continuous speech. The identified character string is thereby converted into fluent spoken output that a listener can understand.

According to the device of this embodiment, the character string to be identified is segmented and each substring is identified separately, which improves the accuracy of identification.
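The cooperation of modules 601 to 605 can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the regular expression, the `VOCABULARY` set, and the function names are hypothetical, and the identification step is reduced to spelling a substring out character by character.

```python
import re

# One alternative per substring type named in the description: Chinese
# characters, English letters, digits, and symbols (any other non-space run).
TOKEN_PATTERN = re.compile(
    r"(?P<chinese>[\u4e00-\u9fff]+)"
    r"|(?P<english>[A-Za-z]+)"
    r"|(?P<number>[0-9]+)"
    r"|(?P<symbol>[^\sA-Za-z0-9\u4e00-\u9fff]+)"
)

VOCABULARY = {"hello", "world"}  # hypothetical stand-in for a real lexicon


def segment(text):
    """Word segmentation (module 602): split text into (substring, type) pairs."""
    return [(m.group(), m.lastgroup) for m in TOKEN_PATTERN.finditer(text)]


def is_word(substring):
    """Judgment (module 603): is the substring a known vocabulary word?"""
    return substring.lower() in VOCABULARY


def identify(substring, kind):
    """Processing (module 604), placeholder: spell the substring out.

    The type tag `kind` is accepted but unused in this simplified sketch.
    """
    return " ".join(substring)


def to_speech_text(text):
    """Chain the modules; return the text a synthesizer (module 605) would read."""
    parts = []
    for substring, kind in segment(text):
        parts.append(substring if is_word(substring) else identify(substring, kind))
    return " ".join(parts)
```

Tagging each substring with its type at segmentation time lets later steps choose a type-specific identification rule without re-scanning the text.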

Seventh embodiment

This embodiment provides a character string identification device. This embodiment is similar to the sixth embodiment; the difference is that, as shown in Fig. 8, the device further includes:

First recognition unit 6041, for identifying the substring according to the content of the character strings before and after the substring;

Speech synthesis unit 6042, for synthesizing the identified substring into speech.

For other details of the device of this embodiment, refer to the second embodiment; they are not repeated here.

According to the device of this embodiment, when a substring is identified, its meaning is determined from the information in the preceding and following substrings, which avoids interference from character strings with multiple meanings and achieves higher accuracy.
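A minimal sketch of how a unit such as the first recognition unit 6041 might use neighboring substrings is given below. The specific cues (a following "年" marking a year, a preceding currency symbol marking an amount) and the function name `identify_with_context` are hypothetical examples chosen for illustration; the description does not prescribe them.

```python
def identify_with_context(substrings, index):
    """Label substrings[index] using the substrings before and after it."""
    current = substrings[index]
    previous = substrings[index - 1] if index > 0 else ""
    following = substrings[index + 1] if index + 1 < len(substrings) else ""

    if current.isdigit():
        if following == "年":
            return "year"    # hypothetical cue: digits followed by "年"
        if previous in {"$", "¥"}:
            return "amount"  # hypothetical cue: digits after a currency symbol
        return "number"
    return "unknown"         # non-digit substrings are outside this sketch
```

The same digits thus receive different labels depending on their neighbors, which is the point of consulting the preceding and following character strings.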

Eighth embodiment

This embodiment provides a character string identification device. This embodiment is similar to the seventh embodiment; the difference is that, as shown in Fig. 9, the device further includes:

Second recognition unit 6043, for establishing a character string matching model and identifying the meaning of the substring according to the matching model.

Speech synthesis unit 6042, for synthesizing the identified substring into speech after the substring in the character string has been identified.

For other details of the device of this embodiment, refer to the third embodiment; they are not repeated here.

According to the device of this embodiment, some character strings may remain ambiguous even when the preceding and following information is considered. By establishing a matching model and comparing it against the character string information of the context, the meaning of the current substring is identified, which further improves the accuracy of character string identification.
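One simple way to picture a matching model is as an ordered list of context templates, each paired with a meaning. The sketch below takes that form purely as an assumption: the templates, the labels, and the name `identify_with_model` are hypothetical, and the description does not state how the matching model is actually built.

```python
import re

# Hypothetical templates: each pairs a context pattern with the meaning
# assigned to the "-" symbol when the whole context matches the pattern.
MATCHING_MODEL = [
    (re.compile(r"^\d{4}-\d{1,2}-\d{1,2}$"), "date_separator"),
    (re.compile(r"^\d+-\d+$"), "range"),
]


def identify_with_model(context):
    """Return the meaning of the first template matching the context, else None."""
    for pattern, meaning in MATCHING_MODEL:
        if pattern.match(context):
            return meaning
    return None
```

With these templates, the context "2014-10-24" is matched as a date and "3-5" as a range, while a context matching no template is left for other recognition units.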

Ninth embodiment

This embodiment provides a character string identification device. This embodiment is similar to the seventh embodiment; the difference is that, as shown in Fig. 10, the device further includes:

Third recognition unit 6044, for directly identifying the substring according to its meaning.

For other details of the device of this embodiment, refer to the fourth embodiment; they are not repeated here.

According to the device of this embodiment, character strings that have a direct and clear meaning are identified directly, which saves processing resources while still achieving high accuracy.

Tenth embodiment

This embodiment provides a character string identification device. This embodiment is similar to the seventh embodiment; the difference is that, as shown in Fig. 11, the device further includes:

Fourth recognition unit 6045, for identifying the substring as a preset type, based on a recognizable character string contained in the substring.

For other details of the device of this embodiment, refer to the fifth embodiment; they are not repeated here.

According to the device of this embodiment, some special character strings can be identified directly according to default rules, and special rules can be defined, which improves the recognition accuracy of character strings.

In addition, an embodiment of the present invention also provides a computer-readable storage medium storing computer-executable instructions. The computer-readable storage medium is, for example, a non-volatile memory such as an optical disc, a hard disk, or flash memory. The computer-executable instructions cause a computer or a similar computing device to perform the various operations of the character string identification method described above.

The above are merely preferred embodiments of the present invention and do not limit the invention in any form. Although the invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make minor changes or modifications that amount to equivalent embodiments. Any simple modification, equivalent change, or refinement made to the above embodiments according to the technical essence of the invention, without departing from the content of its technical solution, still falls within the scope of the technical solution of the invention.

Claims

1. A character string identification method, characterized in that the method comprises the following steps:

obtaining a character string, the character string being composed of multiple types of substrings, the multiple types of substrings including: an English type, a numeric type, a symbol type, a Chinese character type, and combinations thereof, wherein the multiple types of substrings are matched against corresponding configuration files, and the configuration files are used to label pre-stored character string types so as to determine the corresponding target type;

segmenting the character string according to the substring types of the multiple types of substrings and combinations thereof, dividing the character string into at least one substring, wherein each substring is tagged with a part of speech during segmentation, the part of speech indicating the type of that substring;

judging whether the at least one substring is a vocabulary word, a vocabulary word being a word that has a unique meaning in the language to which the substring belongs;

if the substring is judged not to be a vocabulary word, performing identification processing on the at least one substring; and

synthesizing all the identified substrings into continuous speech;

wherein identifying the substring specifically comprises:

establishing a character string matching model according to the preceding and following character string information, identifying the meaning of the substring according to the matching model, and selecting the processing result of the matching model as the identification result; and

synthesizing the identified substring into speech.

2. The character string identification method according to claim 1, characterized in that identifying the substring specifically comprises:

identifying the substring according to the content of the character strings before and after the substring; and

synthesizing the identified substring into speech.

3. The character string identification method according to claim 1, characterized in that identifying the substring specifically comprises:

directly identifying the substring according to its meaning; and

synthesizing the identified substring into speech.

4. The character string identification method according to claim 1, characterized in that identifying the substring specifically comprises:

identifying the substring as a preset type, based on a recognizable character string contained in the substring; and

synthesizing the identified substring into speech.

5. A character string identification device, characterized in that the device comprises the following modules:

an acquisition module, for obtaining a character string, the character string being composed of multiple types of substrings, the multiple types of substrings including: an English type, a numeric type, a symbol type, a Chinese character type, and combinations thereof, wherein the multiple types of substrings are matched against corresponding configuration files, and the configuration files are used to label pre-stored character string types so as to determine the corresponding target type;

a word segmentation module, for segmenting the character string according to the substring types of the multiple types of substrings and combinations thereof, dividing the character string into at least one substring, wherein each substring is tagged with a part of speech during segmentation, the part of speech indicating the type of that substring;

a judgment module, for judging whether the at least one substring is a vocabulary word, a vocabulary word being a word that has a unique meaning in the language to which the substring belongs;

a processing module, for performing identification processing on the at least one substring if the substring is judged not to be a vocabulary word; and

a synthesis module, for synthesizing all the identified substrings into continuous speech;

wherein the processing module specifically includes:

a second recognition unit, for establishing a character string matching model according to the preceding and following character string information, identifying the meaning of the substring according to the matching model, and selecting the processing result of the matching model as the identification result; and

a speech synthesis unit, for synthesizing the identified substring into speech.

6. The character string identification device according to claim 5, characterized in that the processing module specifically includes:

a first recognition unit, for identifying the substring according to the content of the character strings before and after the substring; and

a speech synthesis unit, for synthesizing the identified substring into speech.

7. The character string identification device according to claim 5, characterized in that the processing module specifically includes:

a third recognition unit, for directly identifying the substring according to its meaning; and

a speech synthesis unit, for synthesizing the identified substring into speech.

8. The character string identification device according to claim 5, characterized in that the processing module specifically includes:

a fourth recognition unit, for identifying the substring as a preset type, based on a recognizable character string contained in the substring; and

a speech synthesis unit, for synthesizing the identified substring into speech.