CN109684638A

CN109684638A - Sentence segmentation method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN109684638A
Application number: CN201811579742.9A
Authority: CN
Inventors: 史文丽
Original assignee: Beijing Kingsoft Internet Security Software Co Ltd
Current assignee: Beijing Kingsoft Internet Security Software Co Ltd
Priority date: 2018-12-24
Filing date: 2018-12-24
Publication date: 2019-04-26
Anticipated expiration: 2038-12-24
Also published as: CN109684638B

Abstract

Embodiments of the present invention provide a sentence segmentation method and device, electronic equipment, and a computer readable storage medium. The method comprises: acquiring a text to be segmented; performing word segmentation on the text to generate a word segmentation sequence corresponding to the text; determining the part of speech of each word in the word segmentation sequence; and segmenting the word segmentation sequence into a plurality of sentences according to the part of speech of each word. By performing word segmentation on the text and generating rearranged sentences according to the parts of speech of the words, the rearranged sentences are easier for a language model to analyze and process, which improves the accuracy of sentence correction. This addresses the technical problem in the prior art that directly correcting everyday conversation with a language model yields inaccurate results.

Description

Sentence segmentation method and device, electronic equipment, and computer readable storage medium

Technical field

The present invention relates to the technical field of natural language processing, and in particular to a sentence segmentation method and device, electronic equipment, and a computer readable storage medium.

Background art

With the development of natural language processing technology, sentences need to be corrected so that they are more complete and well-formed, allowing computers to understand human natural language more accurately.

In the related art, everyday conversation is corrected directly with a language model. However, colloquial everyday conversation often lacks punctuation marks, so the result of correcting it directly with a language model is inaccurate. A method is therefore needed that can rearrange and standardize relatively non-standard everyday conversation so that it can be processed further.

Summary of the invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, a first object of the present invention is to propose a sentence segmentation method that performs word segmentation on a text and generates rearranged sentences according to the parts of speech of the words, so that the rearranged sentences are easier for a language model to analyze and process, thereby improving the accuracy of sentence correction.

A second object of the present invention is to propose a sentence segmentation device.

A third object of the present invention is to propose electronic equipment.

A fourth object of the present invention is to propose a computer readable storage medium.

To achieve the above objects, the sentence segmentation method of an embodiment of the first aspect of the present invention comprises: acquiring a text to be segmented; performing word segmentation on the text to be segmented to generate a word segmentation sequence corresponding to the text; determining the part of speech of each word in the word segmentation sequence; and segmenting the word segmentation sequence into a plurality of sentences according to the part of speech of each word.

In addition, the sentence segmentation method of the embodiment of the present invention has the following additional technical features:

Optionally, the word segmentation sequence includes the Chinese symbols and foreign-language symbols of the text to be segmented, and performing word segmentation on the text to be segmented to generate the word segmentation sequence corresponding to the text comprises: removing garbled characters from the text to be segmented; and/or converting the foreign-language symbols in the text to be segmented into Chinese symbols, wherein the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; and performing word segmentation on the text to be segmented using a word segmentation algorithm to generate the word segmentation sequence corresponding to the text.

Optionally, segmenting the word segmentation sequence into a plurality of sentences according to the part of speech of each word comprises: selecting, according to a preset sentence-final word part-of-speech table, the parts of speech corresponding to sentence-final words from the parts of speech of the words, wherein a sentence-final word is a word that appears at the end of a sentence; determining the sentence-final words in the word segmentation sequence corresponding to the text according to the parts of speech corresponding to sentence-final words; and segmenting the word segmentation sequence corresponding to the text according to the sentence-final words and/or the Chinese punctuation marks to generate a plurality of sentences.

Optionally, segmenting the word segmentation sequence into a plurality of sentences according to the part of speech of each word further comprises: judging whether a generated sentence includes a word of a preset part of speech; and, if the generated sentence includes a word of the preset part of speech, predicting and supplementing in the generated sentence the punctuation mark corresponding to the preset part of speech.

Optionally, after segmenting the word segmentation sequence into a plurality of sentences according to the part of speech of each word, the method further comprises: performing word segmentation on the sentences using a word segmentation algorithm to obtain word segmentation results of the sentences; constructing an N-Gram language model according to the word segmentation results of the sentences; and correcting the sentences using the N-Gram language model.

Optionally, correcting the sentences using the N-Gram language model comprises: combining words in the sentences into phrases using the N-Gram language model; predicting words for the sentences using the N-Gram language model; and predicting punctuation marks for the sentences using the N-Gram language model.

The sentence segmentation device of an embodiment of the second aspect of the present invention comprises: an acquisition module, configured to acquire a text to be segmented; a first word segmentation module, configured to perform word segmentation on the text to be segmented to generate a word segmentation sequence corresponding to the text; a determining module, configured to determine the part of speech of each word in the word segmentation sequence; and a sentence segmentation module, configured to segment the word segmentation sequence into a plurality of sentences according to the part of speech of each word.

In addition, the sentence segmentation device of the embodiment of the present invention has the following additional technical features:

Optionally, the word segmentation sequence includes the Chinese symbols and foreign-language symbols of the text to be segmented, and the first word segmentation module comprises: a removal submodule, configured to remove garbled characters from the text to be segmented; and/or a conversion submodule, configured to convert the foreign-language symbols in the text to be segmented into Chinese symbols, wherein the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; and a word segmentation submodule, configured to perform word segmentation on the text to be segmented using a word segmentation algorithm to generate the word segmentation sequence corresponding to the text.

Optionally, the sentence segmentation module comprises: a selection submodule, configured to select, according to a preset sentence-final word part-of-speech table, the parts of speech corresponding to sentence-final words from the parts of speech of the words, wherein a sentence-final word is a word that appears at the end of a sentence; a determining submodule, configured to determine the sentence-final words in the word segmentation sequence corresponding to the text according to the parts of speech corresponding to sentence-final words; and a sentence segmentation submodule, configured to segment the word segmentation sequence corresponding to the text according to the sentence-final words and/or the Chinese punctuation marks to generate a plurality of sentences.

Optionally, the sentence segmentation module further comprises: a judging submodule, configured to judge whether a generated sentence includes a word of a preset part of speech; and a prediction and supplement submodule, configured to predict and supplement in the generated sentence the punctuation mark corresponding to the preset part of speech when the judging submodule determines that the generated sentence includes a word of the preset part of speech.

Optionally, the device further comprises: a second word segmentation module, configured to perform word segmentation on the sentences using a word segmentation algorithm to obtain word segmentation results of the sentences; a construction module, configured to construct an N-Gram language model according to the word segmentation results of the sentences; and a correction module, configured to correct the sentences using the N-Gram language model.

Optionally, the correction module comprises: a word combination submodule, configured to combine words in the sentences into phrases using the N-Gram language model; a first prediction submodule, configured to predict words for the sentences using the N-Gram language model; and a second prediction submodule, configured to predict punctuation marks for the sentences using the N-Gram language model.

The electronic equipment of an embodiment of the third aspect of the present invention comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the sentence segmentation method described in the foregoing method embodiments.

The computer readable storage medium of an embodiment of the fourth aspect of the present invention has a computer program stored thereon, and the program, when executed by a processor, implements the sentence segmentation method described in the foregoing method embodiments.

The technical solutions provided by the embodiments of the present invention may have the following beneficial effects. A text to be segmented is acquired, and word segmentation is performed on it to generate a word segmentation sequence corresponding to the text, wherein the word segmentation sequence includes the Chinese punctuation marks of the text to be segmented. The part of speech of each word in the word segmentation sequence is determined, and the word segmentation sequence is segmented into a plurality of sentences according to the part of speech of each word. Thus, by performing word segmentation on the text and generating rearranged sentences according to the parts of speech of the words, the rearranged sentences are easier for a language model to analyze and process, which improves the accuracy of sentence correction.

Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the invention.

Brief description of the drawings

Fig. 1 is a schematic flowchart of a sentence segmentation method provided by an embodiment of the present invention;

Fig. 2 is a schematic flowchart of another sentence segmentation method proposed by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a sentence segmentation device provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of another sentence segmentation device provided by an embodiment of the present invention; and

Fig. 5 is a schematic diagram of the hardware structure of electronic equipment according to an embodiment of the present invention.

Detailed description of the embodiments

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and should not be construed as limiting it.

The sentence segmentation method and device, electronic equipment, and computer readable storage medium of the embodiments of the present invention are described below with reference to the drawings.

As noted in the description of the prior art above, everyday conversation is corrected directly with a language model, but colloquial everyday conversation often lacks punctuation marks, so the result of correcting it directly with a language model is inaccurate.

To address this problem, an embodiment of the present invention provides a sentence segmentation method that performs word segmentation on a text and generates rearranged sentences according to the parts of speech of the words, so that the rearranged sentences are easier for a language model to analyze and process, thereby improving the accuracy of sentence correction.

Fig. 1 is a schematic flowchart of a sentence segmentation method provided by an embodiment of the present invention. As shown in Fig. 1, the sentence segmentation method includes the following steps:

S101: acquire the text to be segmented.

It should be understood that although the sentence segmentation method proposed by the embodiment of the present invention is aimed at everyday conversation that lacks punctuation marks, it is not limited to everyday conversation and can also be used to segment more standardized text.

S102: perform word segmentation on the text to be segmented to generate the word segmentation sequence corresponding to the text.

It should be understood that everyday conversation, besides lacking punctuation marks, may also contain garbled characters, for example: (* ' ▽ `) ノノ, today is a good day.

In addition, the text to be segmented may contain not only Chinese symbols but also foreign-language symbols. Therefore, when word segmentation is performed on the text to be segmented, the generated word segmentation sequence includes both the Chinese symbols and the foreign-language symbols of the text. For example, the English comma is "," and the Chinese comma is "，", and the word segmentation sequence may contain both "," and "，".

So that the word segmentation algorithm can segment the text more accurately, one possible implementation is to remove the garbled characters from the text to be segmented, and/or convert the foreign-language symbols in the text to be segmented into Chinese symbols, wherein the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks, for example converting "," into "，". Word segmentation is then performed on the text to be segmented using a word segmentation algorithm to generate the word segmentation sequence corresponding to the text.

It should be understood that foreign-language symbols include other symbols besides punctuation marks, such as currency symbols. Although symbols such as currency symbols do not affect sentence segmentation, they may affect the subsequent sentence correction. Therefore, the foreign-language symbols in the text can be converted into Chinese symbols to improve the accuracy of sentence correction.
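The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the mapping `FOREIGN_TO_CHINESE`, the set of characters treated as valid, and the function names are all hypothetical, since the text does not list the exact symbol pairs or define what counts as a garbled character.

```python
import re

# Hypothetical mapping from ASCII punctuation to full-width Chinese punctuation.
# The source only gives the comma as an example; the other pairs are assumptions.
FOREIGN_TO_CHINESE = {
    ",": "，",
    "?": "？",
    "!": "！",
    ";": "；",
    ":": "：",
}

# Assumption: a character is "valid" if it is a CJK ideograph, an ASCII letter
# or digit, whitespace, or one of the Chinese punctuation marks listed here.
# Everything else is treated as a garbled character and dropped.
_VALID_CHAR = re.compile(r"[\u4e00-\u9fffA-Za-z0-9\s，。？！；：]")


def normalize_symbols(text: str) -> str:
    """Convert foreign-language punctuation into Chinese punctuation."""
    return "".join(FOREIGN_TO_CHINESE.get(ch, ch) for ch in text)


def remove_garbled(text: str) -> str:
    """Drop every character that is not in the assumed valid set."""
    return "".join(ch for ch in text if _VALID_CHAR.fullmatch(ch))


def preprocess(text: str) -> str:
    """Normalize symbols first, then remove garbled characters."""
    return remove_garbled(normalize_symbols(text)).strip()
```

Normalizing before filtering means that converted punctuation survives the cleaning step. A real system would likely need a more careful definition of garbled characters than this whitelist.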

It should be noted that the word segmentation algorithm may be any known word segmentation algorithm, as long as it can break the text to be segmented down to word granularity, so that the words can be recombined and the text can then be segmented into sentences.

S103: determine the part of speech of each word in the word segmentation sequence.

The parts of speech include one or more of: noun, verb, adjective, numeral, measure word, pronoun, adverb, preposition, conjunction, auxiliary word, interjection, and onomatopoeia.

S104: segment the word segmentation sequence into a plurality of sentences according to the part of speech of each word.

It should be understood that in Chinese usage, modal particles are usually placed at the end of a sentence, such as: " ", " ", " ", " "; auxiliary words also often appear at the end of a sentence, such as: " ", " ".

Therefore, a sentence-final word part-of-speech table can be preset, in which sentence-final words and their parts of speech are collected and organized.

In addition, both the Chinese punctuation marks originally present in the text to be segmented and the Chinese punctuation marks converted from foreign-language punctuation marks in S102 can serve as sentence boundary markers.

One possible implementation is as follows. According to the preset sentence-final word part-of-speech table, the parts of speech corresponding to sentence-final words are selected from the parts of speech of the words, wherein a sentence-final word is a word that appears at the end of a sentence. According to the parts of speech corresponding to sentence-final words, the sentence-final words are determined in the word segmentation sequence corresponding to the text. The word segmentation sequence corresponding to the text is then segmented according to the sentence-final words and/or the Chinese punctuation marks to generate a plurality of sentences.

In summary, in the sentence segmentation method provided by the embodiment of the present invention, a text to be segmented is acquired, and word segmentation is performed on it to generate a word segmentation sequence corresponding to the text, wherein the word segmentation sequence includes the Chinese punctuation marks of the text to be segmented. The part of speech of each word in the word segmentation sequence is determined, and the word segmentation sequence is segmented into a plurality of sentences according to the part of speech of each word. Thus, by performing word segmentation on the text and generating rearranged sentences according to the parts of speech of the words, the rearranged sentences are easier for a language model to analyze and process, which improves the accuracy of sentence correction.

To illustrate the sentence segmentation method provided by the embodiment of the present invention more clearly, an example is given below.

The texts to be segmented are "is it serious only coughing? sore throat what symptoms tell me and I will buy medicine for you" and "mm remember to bring it back for me. I am going to take a bath now". The word segmentation sequences and corresponding parts of speech obtained with a word segmentation algorithm are "severe/a, not severe/a, heavy/a, eh/y, only/d, cough/vi, /y, ?, sore throat/n, /y, what/ry, symptom/n, eh/y, tell/v, for/p, you/rr, buy medicine/nz, eh/y" and "mm/ng, remember/v, for/p, me/rr, bring/v, back/v, good/a, /ule, ., I/rr, now/t, go/vf, bathe/vi, /ule".

Here, /a denotes an adjective, /y a modal particle, /d an adverb, /vi an intransitive verb, /n a noun, /ry an interrogative pronoun, /v a verb, /p a preposition, /rr a personal pronoun, /nz another proper noun, /ng a nominal morpheme, /ule the particle "了" (le), /t a time word, and /vf a directional verb.

Segmenting the word segmentation sequences yields the following sentences: "is it serious"; "only coughing?"; "sore throat"; "what symptoms"; "tell me and I will buy medicine for you"; "mm"; "remember to bring it back for me."; "I am going to take a bath now".
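The splitting step for the first example text can be sketched as follows. This is a minimal sketch under stated assumptions: the contents of `SENTENCE_FINAL_POS`, the punctuation set, the "w" tag for punctuation, and the function name are hypothetical, because the patent does not publish its sentence-final word part-of-speech table. The tokens are English glosses, with "(particle)" as a placeholder where the original particle is not preserved.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Assumed contents of the preset sentence-final word part-of-speech table.
SENTENCE_FINAL_POS = {"y", "ule"}
# Assumed set of Chinese punctuation marks that act as boundary markers.
CHINESE_PUNCT = {"，", "。", "？", "！", "；"}


def split_sentences(tokens: List[Token]) -> List[List[str]]:
    """Cut after punctuation, or after a sentence-final word that is not
    immediately followed by punctuation (so the mark stays attached)."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for i, (word, pos) in enumerate(tokens):
        current.append(word)
        next_word: Optional[str] = tokens[i + 1][0] if i + 1 < len(tokens) else None
        if word in CHINESE_PUNCT:
            boundary = True
        elif pos in SENTENCE_FINAL_POS:
            boundary = next_word not in CHINESE_PUNCT
        else:
            boundary = False
        if boundary:
            sentences.append(current)
            current = []
    if current:  # keep any trailing words that had no boundary marker
        sentences.append(current)
    return sentences


# English glosses of the first example sequence; "w" marks punctuation.
EXAMPLE: List[Token] = [
    ("severe", "a"), ("not severe", "a"), ("heavy", "a"), ("eh", "y"),
    ("only", "d"), ("cough", "vi"), ("(particle)", "y"), ("？", "w"),
    ("sore throat", "n"), ("(particle)", "y"),
    ("what", "ry"), ("symptom", "n"), ("eh", "y"),
    ("tell", "v"), ("for", "p"), ("you", "rr"), ("buy medicine", "nz"), ("eh", "y"),
]

for sentence in split_sentences(EXAMPLE):
    print(" ".join(sentence))
```

With these assumptions the first text is cut into five sentences, matching the grouping listed above. The second text is left out because reproducing its grouping would require knowing more of the table than the description reveals.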

Further, it is judged whether a generated sentence includes a word of a preset part of speech; if so, the punctuation mark corresponding to the preset part of speech is predicted and supplemented in the generated sentence. For example, the generated sentence "what symptoms" contains the interrogative pronoun "what", so the punctuation mark predicted and supplemented for the sentence is "?", and the sentence becomes "what symptoms?".
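A rule of this kind can be sketched as a lookup from preset parts of speech to punctuation marks. This is an illustrative sketch only: the table `PRESET_POS_TO_PUNCT` contains just the interrogative-pronoun case from the example, and the function name, the "w" punctuation tag, and the choice to append the mark at the end of the sentence are assumptions.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Only the "ry" (interrogative pronoun) entry comes from the example in the
# text; a real table would presumably hold more entries.
PRESET_POS_TO_PUNCT: Dict[str, str] = {"ry": "？"}
CHINESE_PUNCT = {"，", "。", "？", "！", "；"}


def supplement_punctuation(sentence: List[Token]) -> List[Token]:
    """Append the mark tied to the first preset part of speech found,
    unless the sentence is empty or already ends with punctuation."""
    if not sentence or sentence[-1][0] in CHINESE_PUNCT:
        return list(sentence)
    for _, pos in sentence:
        mark = PRESET_POS_TO_PUNCT.get(pos)
        if mark is not None:
            return list(sentence) + [(mark, "w")]  # "w": assumed punctuation tag
    return list(sentence)
```

For instance, `supplement_punctuation([("what", "ry"), ("symptom", "n")])` appends a question mark, while a sentence without any preset part of speech is returned unchanged.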

So that the sentences obtained after sentence segmentation can also be corrected, an embodiment of the present invention further proposes another sentence segmentation method. Fig. 2 is a schematic flowchart of another sentence segmentation method proposed by an embodiment of the present invention. As shown in Fig. 2, based on the method flow shown in Fig. 1, after S104 (segmenting the word segmentation sequence into a plurality of sentences according to the part of speech of each word), the method further includes:

S201: perform word segmentation on the sentences using a word segmentation algorithm to obtain the word segmentation results of the sentences.

It should be understood that the sentences generated in S104 are produced by re-punctuating the text to be segmented; each is an independent, complete sentence that can be corrected.

To correct the sentences with an N-Gram language model, the sentences first need to be segmented into words again.

S202: construct an N-Gram language model according to the word segmentation results of the sentences.

S203: correct the sentences using the N-Gram language model.

It can be understood that the N-Gram language model uses the collocation information between adjacent words in the context. When continuous text without spaces needs to be converted into a sentence, the model can compute the sentence with the maximum probability, thereby correcting the sentence.
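For reference, the standard formulation of an N-Gram language model is given below. The patent does not state a formula, so this is the usual textbook form rather than the author's own definition: a sentence of words w_1 through w_m is scored by conditioning each word on the preceding n-1 words, and each conditional probability is estimated from counts C(.) in the training data.

```latex
P(w_1, \dots, w_m) \approx \prod_{i=1}^{m} P\left(w_i \mid w_{i-n+1}, \dots, w_{i-1}\right),
\qquad
P\left(w_i \mid w_{i-n+1}, \dots, w_{i-1}\right) \approx
\frac{C(w_{i-n+1}, \dots, w_i)}{C(w_{i-n+1}, \dots, w_{i-1})}
```

Selecting the sentence with the maximum probability then amounts to choosing the candidate word sequence that maximizes this product.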

Further, correcting a sentence using the N-Gram language model specifically includes the following. When multiple consecutive words in the sentence need to be joined into a phrase, the N-Gram language model is used to combine the words in the sentence. When words are missing from the sentence, the N-Gram language model is used to predict words for the sentence. When punctuation marks are missing from the sentence, the N-Gram language model is used to predict punctuation marks for the sentence.

In this way, the sentences generated after sentence segmentation are corrected.
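Steps S202 and S203 can be sketched with a bigram (N = 2) model built from the word segmentation results. This is a simplified sketch under stated assumptions: the patent does not fix the value of N, the smoothing scheme, or how predictions are applied, and all function names and boundary markers here are hypothetical. Punctuation marks are treated as ordinary tokens, so the same lookup predicts either a missing word or a missing punctuation mark.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, List, Optional

BOS, EOS = "<s>", "</s>"  # assumed sentence boundary markers

Model = DefaultDict[str, Counter]


def build_bigram_model(sentences: List[List[str]]) -> Model:
    """Count how often each token follows each preceding token."""
    model: Model = defaultdict(Counter)
    for words in sentences:
        padded = [BOS] + words + [EOS]
        for prev, nxt in zip(padded, padded[1:]):
            model[prev][nxt] += 1
    return model


def predict_next(model: Model, prev: str) -> Optional[str]:
    """Return the most frequent follower of `prev`, or None if unseen."""
    followers = model.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def sentence_probability(model: Model, words: List[str]) -> float:
    """Unsmoothed bigram probability of a token sequence."""
    padded = [BOS] + words + [EOS]
    prob = 1.0
    for prev, nxt in zip(padded, padded[1:]):
        total = sum(model[prev].values()) if prev in model else 0
        if total == 0:
            return 0.0
        prob *= model[prev][nxt] / total
    return prob
```

Candidate corrections of a sentence can then be compared with `sentence_probability`, keeping the one with the highest score. A practical model would add smoothing so that unseen bigrams do not force the probability to zero.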

To implement the above embodiments, an embodiment of the present invention further proposes a sentence segmentation device. Fig. 3 is a schematic structural diagram of a sentence segmentation device provided by an embodiment of the present invention. As shown in Fig. 3, the sentence segmentation device includes: an acquisition module 310, a first word segmentation module 320, a determining module 330, and a sentence segmentation module 340.

The acquisition module 310 is configured to acquire the text to be segmented.

The first word segmentation module 320 is configured to perform word segmentation on the text to be segmented to generate the word segmentation sequence corresponding to the text.

The determining module 330 is configured to determine the part of speech of each word in the word segmentation sequence.

The sentence segmentation module 340 is configured to segment the word segmentation sequence into a plurality of sentences according to the part of speech of each word.

Further, so that the word segmentation algorithm can segment the text more accurately, one possible implementation is that the word segmentation sequence includes the Chinese symbols and foreign-language symbols of the text to be segmented, and the first word segmentation module 320 includes: a removal submodule 321, configured to remove garbled characters from the text to be segmented; and/or a conversion submodule 322, configured to convert the foreign-language symbols in the text to be segmented into Chinese symbols, wherein the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; and a word segmentation submodule 323, configured to perform word segmentation on the text to be segmented using a word segmentation algorithm to generate the word segmentation sequence corresponding to the text.

Further, to segment the word segmentation sequence into a plurality of sentences according to the part of speech of each word, one possible implementation is that the sentence segmentation module 340 includes: a selection submodule 341, configured to select, according to a preset sentence-final word part-of-speech table, the parts of speech corresponding to sentence-final words from the parts of speech of the words, wherein a sentence-final word is a word that appears at the end of a sentence; a determining submodule 342, configured to determine the sentence-final words in the word segmentation sequence corresponding to the text according to the parts of speech corresponding to sentence-final words; and a sentence segmentation submodule 343, configured to segment the word segmentation sequence corresponding to the text according to the sentence-final words and/or the Chinese punctuation marks to generate a plurality of sentences.

Further, to predict and supplement punctuation marks in the generated sentences, one possible implementation is that the sentence segmentation module 340 further includes: a judging submodule 344, configured to judge whether a generated sentence includes a word of a preset part of speech; and a prediction and supplement submodule 345, configured to predict and supplement in the generated sentence the punctuation mark corresponding to the preset part of speech when the judging submodule 344 determines that the generated sentence includes a word of the preset part of speech.

It should be noted that the foregoing explanation of the sentence segmentation method embodiments also applies to the sentence segmentation device of this embodiment, and details are not repeated here.
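The module structure of Fig. 3 can be read as a simple pipeline. The following is a minimal sketch, assuming each module is a plain callable; the class name `SentenceSegmentationDevice`, its field names, and the type signatures are hypothetical and only mirror modules 310 to 340 as described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


@dataclass
class SentenceSegmentationDevice:
    """Wires the four modules together in the order given in the text."""

    acquire: Callable[[], str]                       # acquisition module 310
    segment_words: Callable[[str], List[str]]        # first word segmentation module 320
    tag_pos: Callable[[List[str]], List[Token]]      # determining module 330
    split: Callable[[List[Token]], List[List[str]]]  # sentence segmentation module 340

    def run(self) -> List[List[str]]:
        text = self.acquire()
        words = self.segment_words(text)
        tagged = self.tag_pos(words)
        return self.split(tagged)
```

Keeping the modules as injected callables makes it possible to swap in any word segmentation algorithm or part-of-speech tagger, which is consistent with the statement that any known word segmentation algorithm may be used.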

In summary, the sentence segmentation device provided by the embodiment of the present invention acquires a text to be segmented and performs word segmentation on it to generate a word segmentation sequence corresponding to the text, wherein the word segmentation sequence includes the Chinese punctuation marks of the text to be segmented. The part of speech of each word in the word segmentation sequence is determined, and the word segmentation sequence is segmented into a plurality of sentences according to the part of speech of each word. Thus, by performing word segmentation on the text and generating rearranged sentences according to the parts of speech of the words, the rearranged sentences are easier for a language model to analyze and process, which improves the accuracy of sentence correction.

To implement the above embodiments, an embodiment of the present invention further proposes another sentence segmentation device. Fig. 4 is a schematic structural diagram of another sentence segmentation device provided by an embodiment of the present invention. As shown in Fig. 4, based on the device structure shown in Fig. 3, the sentence segmentation device further includes: a second word segmentation module 350, a construction module 360, and a correction module 370.

The second word segmentation module 350 is configured to perform word segmentation on the sentences using a word segmentation algorithm to obtain the word segmentation results of the sentences.

The construction module 360 is configured to construct an N-Gram language model according to the word segmentation results of the sentences.

The correction module 370 is configured to correct the sentences using the N-Gram language model.

Further, to correct the sentences using the N-Gram language model, one possible implementation is that the correction module 370 includes: a word combination submodule 371, configured to combine words in the sentences into phrases using the N-Gram language model; a first prediction submodule 372, configured to predict words for the sentences using the N-Gram language model; and a second prediction submodule 373, configured to predict punctuation marks for the sentences using the N-Gram language model.

To implement the above embodiments, the present invention further proposes electronic equipment comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the sentence segmentation method described in the foregoing method embodiments.

Fig. 5 is a schematic diagram of the hardware structure of electronic equipment according to an embodiment of the present invention. The electronic equipment can be implemented in various forms. The electronic equipment in the present invention may include, but is not limited to, mobile electronic equipment such as a mobile phone, smartphone, notebook computer, digital broadcast receiver, PDA (personal digital assistant), PAD (tablet computer), PMP (portable media player), navigation device, in-vehicle electronic equipment, in-vehicle display equipment, and in-vehicle electronic rearview mirror, as well as stationary electronic equipment such as a digital TV and desktop computer.

As shown in Fig. 5, electronic equipment 1100 may include a wireless communication unit 1110, an A/V (audio/video) input unit 1120, a user input unit 1130, a sensing unit 1140, an output unit 1150, a memory 1160, an interface unit 1170, a controller 1180, a power supply unit 1190, and so on. Fig. 5 shows electronic equipment with various components, but it should be understood that not all of the illustrated components are required; more or fewer components may alternatively be implemented.

The wireless communication unit 1110 allows radio communication between electronic equipment 1100 and a wireless communication system or network. The A/V input unit 1120 is configured to receive audio or video signals. The user input unit 1130 can generate key input data according to commands input by the user in order to control various operations of the electronic equipment. The sensing unit 1140 detects the current state of electronic equipment 1100, such as the position of electronic equipment 1100, the presence or absence of a user's touch input to electronic equipment 1100, the orientation of electronic equipment 1100, and the acceleration or deceleration movement and direction of electronic equipment 1100, and generates commands or signals for controlling the operation of electronic equipment 1100. The interface unit 1170 serves as an interface through which at least one external device can connect to electronic equipment 1100. The output unit 1150 is configured to provide output signals in a visual, audio, and/or tactile manner. The memory 1160 can store software programs for the processing and control operations executed by the controller 1180, or can temporarily store data that has been output or is to be output. The memory 1160 may include at least one type of storage medium. Moreover, electronic equipment 1100 can cooperate, over a network connection, with a network storage device that performs the storage function of the memory 1160. The controller 1180 usually controls the overall operation of the electronic equipment. In addition, the controller 1180 may include a multimedia module for reproducing or playing back multimedia data. The controller 1180 can perform pattern recognition processing to recognize handwriting input or picture drawing input performed on the touch screen as characters or images. The power supply unit 1190 receives external or internal power and, under the control of the controller 1180, provides the appropriate power required to operate each element and component.

The various embodiments of the sentence segmentation method proposed by the present invention may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein; in some cases, the embodiments may be implemented in the controller 1180. For a software implementation, the embodiments may be implemented with separate software modules, each of which performs at least one function or operation. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in the memory 1160 and executed by the controller 1180.
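As a non-limiting illustration of such a software implementation, the following minimal Python sketch arranges the steps of the method (word segmentation, part-of-speech determination, and sentence segmentation by sentence-final words and/or Chinese punctuation marks) as separate functions. The lexicon, the part-of-speech tags, the sentence-final part-of-speech table, and all function names are hypothetical and chosen for illustration only; forward maximum matching merely stands in for whichever word segmentation algorithm an actual embodiment uses.

```python
# Hypothetical toy lexicon: word -> part-of-speech tag (tags are illustrative).
LEXICON = {
    "你": "r", "吃饭": "v", "了": "u", "吗": "y",
    "我": "r", "吃": "v", "过": "u", "啦": "y",
}
# Assumed "sentence-final word part-of-speech table": here only modal particles.
SENTENCE_FINAL_POS = {"y"}
# Chinese punctuation marks that end a sentence.
SENTENCE_END_PUNCT = {"。", "！", "？"}


def segment_words(text, lexicon=LEXICON, max_len=4):
    """Forward maximum matching; unknown characters become one-character words."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def tag_words(words, lexicon=LEXICON):
    """Pair each word with a part of speech ("w" = punctuation, "x" = unknown)."""
    tagged = []
    for word in words:
        if word in SENTENCE_END_PUNCT:
            tagged.append((word, "w"))
        else:
            tagged.append((word, lexicon.get(word, "x")))
    return tagged


def split_sentences(tagged):
    """Cut after a sentence-ending punctuation mark, or after a word whose part
    of speech is in the sentence-final table when no such mark follows it."""
    sentences, current = [], []
    for i, (word, pos) in enumerate(tagged):
        current.append(word)
        next_is_punct = (i + 1 < len(tagged)
                         and tagged[i + 1][0] in SENTENCE_END_PUNCT)
        if word in SENTENCE_END_PUNCT or (
                pos in SENTENCE_FINAL_POS and not next_is_punct):
            sentences.append("".join(current))
            current = []
    if current:  # trailing words that never reached a sentence boundary
        sentences.append("".join(current))
    return sentences


def segment_text(text):
    """Full pipeline: word segmentation -> part-of-speech tagging -> sentences."""
    return split_sentences(tag_words(segment_words(text)))
```

Under this toy lexicon, `segment_text("你吃饭了吗我吃过啦")` returns `["你吃饭了吗", "我吃过啦"]`: the unpunctuated input is cut after the modal particles 吗 and 啦.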

To implement the above embodiments, the present invention further proposes a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the sentence segmentation method described in the foregoing method embodiments.

In the description of this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict with one another, those skilled in the art may combine the different embodiments or examples described in this specification, as well as the features of those different embodiments or examples.

In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless specifically defined otherwise.

Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.

The logic and/or steps represented in a flowchart, or otherwise described herein, may for example be regarded as an ordered list of executable instructions for implementing logic functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that the parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and the like.

Those of ordinary skill in the art will understand that all or some of the steps carried by the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.

Claims

1. A sentence segmentation method, characterized by comprising:

acquiring a text to be segmented into sentences;

performing word segmentation on the text to be segmented, to generate a word segmentation sequence corresponding to the text;

determining the part of speech of each segmented word in the word segmentation sequence; and

segmenting the word segmentation sequence into sentences according to the part of speech of each segmented word, to generate a plurality of sentences.

2. The method according to claim 1, characterized in that the word segmentation sequence includes Chinese symbols and foreign-language symbols of the text to be segmented, and performing word segmentation on the text to be segmented, to generate the word segmentation sequence corresponding to the text, comprises:

removing garbled characters from the text to be segmented; and/or

converting the foreign-language symbols in the text to be segmented into the Chinese symbols, wherein the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; and

performing word segmentation on the text to be segmented using a word segmentation algorithm, to generate the word segmentation sequence corresponding to the text.

3. The method according to claim 2, characterized in that segmenting the word segmentation sequence into sentences according to the part of speech of each segmented word, to generate a plurality of sentences, comprises:

selecting, from the parts of speech of the segmented words and according to a preset table of sentence-final word parts of speech, the parts of speech corresponding to sentence-final words, wherein a sentence-final word is a word that occurs at the end of a sentence;

determining the sentence-final words in the word segmentation sequence corresponding to the text according to the parts of speech corresponding to the sentence-final words; and

segmenting the word segmentation sequence corresponding to the text into sentences according to the sentence-final words and/or the Chinese punctuation marks, to generate the plurality of sentences.

4. The method according to claim 3, characterized in that segmenting the word segmentation sequence into sentences according to the part of speech of each segmented word, to generate a plurality of sentences, further comprises:

judging whether a generated sentence includes a word of a preset part of speech; and

if the generated sentence includes a word of the preset part of speech, predicting and supplementing, in the generated sentence, the punctuation mark corresponding to the preset part of speech.

5. The method according to any one of claims 1 to 4, characterized in that, after segmenting the word segmentation sequence into sentences according to the part of speech of each segmented word to generate a plurality of sentences, the method further comprises:

performing word segmentation on the sentences using a word segmentation algorithm, to obtain word segmentation results of the sentences;

constructing an N-Gram language model according to the word segmentation results of the sentences; and

correcting the sentences using the N-Gram language model.

6. The method according to claim 5, characterized in that correcting the sentences using the N-Gram language model comprises:

performing word formation on the sentences using the N-Gram language model;

performing word prediction on the sentences using the N-Gram language model;

performing punctuation mark prediction on the sentences using the N-Gram language model.

7. A sentence segmentation device, characterized by comprising:

an acquisition module, configured to acquire a text to be segmented into sentences;

a first word segmentation module, configured to perform word segmentation on the text to be segmented, to generate a word segmentation sequence corresponding to the text;

a determination module, configured to determine the part of speech of each segmented word in the word segmentation sequence; and

a sentence segmentation module, configured to segment the word segmentation sequence into sentences according to the part of speech of each segmented word, to generate a plurality of sentences.

8. The device according to claim 7, characterized in that the word segmentation sequence includes Chinese symbols and foreign-language symbols of the text to be segmented, and the first word segmentation module comprises:

a removal submodule, configured to remove garbled characters from the text to be segmented; and/or

a conversion submodule, configured to convert the foreign-language symbols in the text to be segmented into the Chinese symbols, wherein the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; and

a word segmentation submodule, configured to perform word segmentation on the text to be segmented using a word segmentation algorithm, to generate the word segmentation sequence corresponding to the text.

9. The device according to claim 8, characterized in that the sentence segmentation module comprises:

a selection submodule, configured to select, from the parts of speech of the segmented words and according to a preset table of sentence-final word parts of speech, the parts of speech corresponding to sentence-final words, wherein a sentence-final word is a word that occurs at the end of a sentence;

a determination submodule, configured to determine the sentence-final words in the word segmentation sequence corresponding to the text according to the parts of speech corresponding to the sentence-final words; and

a sentence segmentation submodule, configured to segment the word segmentation sequence corresponding to the text into sentences according to the sentence-final words and/or the Chinese punctuation marks, to generate a plurality of sentences.

10. The device according to claim 9, characterized in that the sentence segmentation module further comprises:

a judging submodule, configured to judge whether a generated sentence includes a word of a preset part of speech; and

a prediction and supplement submodule, configured to predict and supplement, in the generated sentence, the punctuation mark corresponding to the preset part of speech when the judging submodule determines that the generated sentence includes a word of the preset part of speech.
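By way of illustration only, and without limiting the claims, the removal, conversion, and prediction-and-supplement steps recited above might be sketched in Python as follows. Everything named here is an assumption made for the sketch: the punctuation mapping, the two illustrative part-of-speech tags, the toy particle lexicon, and the function names. In particular, "garbled characters" is narrowed to the Unicode replacement character and non-whitespace control characters, and punctuation "prediction" is reduced to a table lookup on the final character of a sentence, whereas an actual embodiment may use a model.

```python
import re

# Assumed mapping from foreign-language (ASCII) punctuation to Chinese punctuation.
# The full stop "." is left out on purpose so that decimals such as "3.14" survive.
FOREIGN_TO_CHINESE = {",": "，", "?": "？", "!": "！", ";": "；", ":": "："}

# Hypothetical toy lexicon: sentence-final particle -> illustrative part of speech.
PARTICLE_POS = {"吗": "y_question", "呢": "y_question",
                "啊": "y_exclaim", "啦": "y_exclaim"}

# Assumed table: preset part of speech -> punctuation mark to supplement.
POS_TO_PUNCT = {"y_question": "？", "y_exclaim": "！"}

CHINESE_END_PUNCT = "。！？"


def remove_garbled(text):
    """Drop U+FFFD and control characters (tab, newline, carriage return kept)."""
    return re.sub(r"[\ufffd\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)


def to_chinese_punct(text):
    """Convert foreign-language punctuation marks to their Chinese counterparts."""
    return text.translate(str.maketrans(FOREIGN_TO_CHINESE))


def preprocess(text):
    """Removal followed by conversion, applied before word segmentation."""
    return to_chinese_punct(remove_garbled(text))


def supplement_punct(sentence):
    """Append the mark tied to the part of speech of the final character,
    unless the sentence already ends with a Chinese sentence-ending mark."""
    if not sentence or sentence[-1] in CHINESE_END_PUNCT:
        return sentence
    mark = POS_TO_PUNCT.get(PARTICLE_POS.get(sentence[-1]))
    return sentence + mark if mark else sentence
```

With these assumed tables, `supplement_punct("你吃饭了吗")` returns `"你吃饭了吗？"`, while a sentence that already ends in a Chinese sentence-ending mark, or whose final character is not in the particle lexicon, is returned unchanged.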