CN109684638A - Sentence segmentation method and device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN109684638A (application number CN201811579742.9A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- word
- text
- speech
- participle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Path shared by all entries: G (Physics) > G06 (Computing; calculating or counting) > G06F (Electric digital data processing) > G06F40/00 (Handling natural language data) > G06F40/20 (Natural language analysis)
- G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars (under G06F40/205: Parsing)
- G06F40/253: Grammatical analysis; Style critique
- G06F40/284: Lexical analysis, e.g. tokenisation or collocates (under G06F40/279: Recognition of textual entities)
Abstract
An embodiment of the invention provides a sentence segmentation method and device, electronic equipment, and a computer readable storage medium. The method comprises: acquiring a text to be segmented into sentences, and performing word segmentation on the text to generate a token sequence corresponding to the text; determining the part of speech of each token in the token sequence; and splitting the token sequence into a plurality of sentences according to the part of speech of each token. Because the text is word-segmented and well-formed sentences are generated from the parts of speech of the tokens, the resulting sentences are easier for a language model to analyze and process, which improves the accuracy of sentence correction. This solves the technical problem in the prior art that correcting everyday conversation directly with a language model gives inaccurate results.
Description
Technical field
The present invention relates to the field of natural language processing technology, and in particular to a sentence segmentation method and device, electronic equipment, and a computer readable storage medium.
Background art
With the development of natural language processing technology, sentences need to be corrected so that they are more complete and well-formed, allowing computers to understand human natural language more accurately.
In the related art, everyday conversation is corrected directly with a language model. However, colloquial everyday dialogue often lacks punctuation marks, so correcting it directly with a language model gives inaccurate results. A method is therefore needed that can rearrange and standardize relatively non-standard everyday conversation so that it can be processed further.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, a first object of the present invention is to propose a sentence segmentation method that performs word segmentation on a text and generates well-formed sentences according to the parts of speech of the tokens, so that the sentences are easier for a language model to analyze and process, thereby improving the accuracy of sentence correction.
A second object of the present invention is to propose a sentence segmentation device.
A third object of the present invention is to propose an electronic equipment.
A fourth object of the present invention is to propose a computer readable storage medium.
To achieve the above objects, a sentence segmentation method according to an embodiment of the first aspect of the present invention comprises: acquiring a text to be segmented; performing word segmentation on the text to generate a token sequence corresponding to the text; determining the part of speech of each token in the token sequence; and splitting the token sequence into a plurality of sentences according to the part of speech of each token.
In addition, the sentence segmentation method according to embodiments of the present invention may have the following additional technical features:
Optionally, the token sequence includes the Chinese symbols and foreign-language symbols of the text to be segmented, and performing word segmentation on the text to generate the token sequence comprises: removing garbled characters from the text; and/or converting the foreign-language symbols in the text into Chinese symbols, where the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; and performing word segmentation on the text using a word segmentation algorithm to generate the token sequence corresponding to the text.
Optionally, splitting the token sequence into a plurality of sentences according to the part of speech of each token comprises: selecting, according to a preset sentence-final part-of-speech table, the parts of speech that correspond to sentence-final words from the parts of speech of the tokens, where a sentence-final word is a word that occurs at the end of a sentence; determining the sentence-final words in the token sequence corresponding to the text according to those parts of speech; and splitting the token sequence into a plurality of sentences according to the sentence-final words and/or the Chinese punctuation marks.
Optionally, the part of speech according to each participle carries out subordinate sentence to generate multiple sentences, also to the segmentation sequence
Include: judge the generation sentence whether include default part of speech word;If the sentence of the generation includes the default word
Property word, then the corresponding punctuation mark of the prediction supplement default part of speech in the sentence of the generation.
Optionally, after the token sequence is split into a plurality of sentences according to the part of speech of each token, the method further comprises: performing word segmentation on the sentences using a word segmentation algorithm to obtain their word segmentation results; constructing an N-Gram language model according to the word segmentation results of the sentences; and correcting the sentences using the N-Gram language model.
Optionally, correcting the sentences using the N-Gram language model comprises: grouping words in a sentence using the N-Gram language model; predicting words for the sentence using the N-Gram language model; and predicting punctuation marks for the sentence using the N-Gram language model.
A sentence segmentation device according to an embodiment of the second aspect of the present invention comprises: an acquisition module for acquiring a text to be segmented; a first word segmentation processing module for performing word segmentation on the text to generate a token sequence corresponding to the text; a determining module for determining the part of speech of each token in the token sequence; and a sentence segmentation module for splitting the token sequence into a plurality of sentences according to the part of speech of each token.
In addition, the sentence segmentation device according to embodiments of the present invention may have the following additional technical features:
Optionally, the token sequence includes the Chinese symbols and foreign-language symbols of the text to be segmented, and the first word segmentation processing module comprises: a removal submodule for removing garbled characters from the text; and/or a conversion submodule for converting the foreign-language symbols in the text into Chinese symbols, where the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; and a word segmentation processing submodule for performing word segmentation on the text using a word segmentation algorithm to generate the token sequence corresponding to the text.
Optionally, the sentence segmentation module comprises: a selection submodule for selecting, according to a preset sentence-final part-of-speech table, the parts of speech corresponding to sentence-final words from the parts of speech of the tokens, where a sentence-final word is a word that occurs at the end of a sentence; a determining submodule for determining the sentence-final words in the token sequence corresponding to the text according to those parts of speech; and a sentence segmentation submodule for splitting the token sequence according to the sentence-final words and/or the Chinese punctuation marks to generate a plurality of sentences.
Optionally, the sentence segmentation module further comprises: a judging submodule for determining whether a generated sentence includes a word of a preset part of speech; and a prediction and supplementation submodule for predicting and supplementing, in the generated sentence, the punctuation mark corresponding to the preset part of speech when the judging submodule determines that the sentence includes such a word.
Optionally, the device further comprises: a second word segmentation processing module for performing word segmentation on the sentences using a word segmentation algorithm to obtain their word segmentation results; a construction module for constructing an N-Gram language model according to the word segmentation results of the sentences; and a correction module for correcting the sentences using the N-Gram language model.
Optionally, the correction module comprises: a word grouping submodule for grouping words in a sentence using the N-Gram language model; a first prediction submodule for predicting words for the sentence using the N-Gram language model; and a second prediction submodule for predicting punctuation marks for the sentence using the N-Gram language model.
An electronic equipment according to an embodiment of the third aspect of the present invention comprises a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the sentence segmentation method described in the foregoing method embodiments is implemented.
A computer readable storage medium according to an embodiment of the fourth aspect of the present invention stores a computer program which, when executed by a processor, implements the sentence segmentation method described in the foregoing method embodiments.
The technical solutions provided by embodiments of the present invention may have the following beneficial effects. The text to be segmented is acquired, and word segmentation is performed on it to generate the corresponding token sequence, where the token sequence includes the Chinese punctuation marks of the text. The part of speech of each token in the token sequence is determined, and the token sequence is split into a plurality of sentences according to the part of speech of each token. Because the text is word-segmented and well-formed sentences are generated from the parts of speech of the tokens, the sentences are easier for a language model to analyze and process, which improves the accuracy of sentence correction.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the following description or be learned through practice of the invention.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a sentence segmentation method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another sentence segmentation method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a sentence segmentation device according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of another sentence segmentation device according to an embodiment of the present invention; and
Fig. 5 is a schematic hardware structure diagram of an electronic equipment according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and are not to be construed as limiting it.
The sentence segmentation method and device, electronic equipment, and computer readable storage medium of embodiments of the present invention are described below with reference to the drawings.
As the description of the prior art above shows, everyday conversation is corrected directly with a language model, but colloquial dialogue often lacks punctuation marks, so the results of correcting it directly with a language model are inaccurate.
To address this problem, an embodiment of the present invention provides a sentence segmentation method that performs word segmentation on a text and generates well-formed sentences according to the parts of speech of the tokens, so that the sentences are easier for a language model to analyze and process, thereby improving the accuracy of sentence correction.
Fig. 1 is a kind of flow diagram of subordinate sentence method provided by the embodiment of the present invention.As shown in Figure 1, the subordinate sentence side
Method the following steps are included:
S101: acquire the text to be segmented.
It should be understood that although the method is proposed for everyday conversation that lacks punctuation marks, it is not limited to everyday conversation and can also be used to split more standardized text into sentences.
S102: perform word segmentation on the text to be segmented to generate the token sequence corresponding to the text.
It should be understood that, in addition to lacking punctuation marks, everyday conversation also contains garbled characters, for example:
(*'▽`)ノノ, today is an auspicious day.
In addition, the text to be segmented may contain both Chinese symbols and foreign-language symbols, so the token sequence produced by word segmentation includes both. For example, the English comma is "," while the Chinese comma is "，", and the token sequence may contain both "," and "，".
To allow the word segmentation algorithm to segment the text more accurately, one possible implementation is to remove the garbled characters from the text and/or convert the foreign-language symbols in the text into Chinese symbols, where the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; for example, "," is converted into "，".
Word segmentation is then performed on the text using a word segmentation algorithm to generate the token sequence corresponding to the text.
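The clean-up in S102 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that "garbled characters" means anything other than CJK ideographs, ASCII letters and digits, whitespace, and known punctuation, and the ASCII-to-full-width punctuation table (beyond the comma example given above) is likewise an assumption. The sample input is an illustrative string modelled on the emoticon example.

```python
import re

# Assumed ASCII -> full-width (Chinese) punctuation table; the text itself only
# gives the comma example.
FOREIGN_TO_CHINESE = {",": "，", ".": "。", "?": "？", "!": "！", ":": "：", ";": "；"}
CHINESE_PUNCT = set(FOREIGN_TO_CHINESE.values())


def is_allowed(ch: str) -> bool:
    """Crude stand-in for 'not garbled': CJK ideographs, ASCII letters/digits,
    whitespace, and the punctuation marks we know how to handle."""
    return (
        "\u4e00" <= ch <= "\u9fff"
        or (ch.isascii() and ch.isalnum())
        or ch.isspace()
        or ch in FOREIGN_TO_CHINESE
        or ch in CHINESE_PUNCT
    )


def normalize(text: str) -> str:
    # 1. Remove characters treated as garbled (emoticons, stray symbols, ...).
    kept = "".join(ch for ch in text if is_allowed(ch))
    # 2. Convert foreign-language punctuation into Chinese punctuation.
    converted = "".join(FOREIGN_TO_CHINESE.get(ch, ch) for ch in kept)
    # 3. Drop whitespace around Chinese punctuation, then any separator left
    #    dangling at the start once the garbled prefix is gone.
    converted = re.sub(r"\s*([，。？！：；])\s*", r"\1", converted)
    return converted.strip().lstrip("，；：")


if __name__ == "__main__":
    print(normalize("(*'▽`)ノノ, 今天是吉日."))
```

A character whitelist is blunt: it would also delete legitimate symbols such as currency signs, which the description suggests converting rather than removing, so a practical system would need a richer rule set.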
It should be understood that, besides punctuation marks, foreign-language symbols also include other symbols, such as currency symbols. Although symbols of this kind do not affect sentence segmentation, they may affect subsequent sentence correction, so the foreign-language symbols in the text can be converted into Chinese symbols to improve the accuracy of sentence correction.
It should be noted that the word segmentation algorithm can be any known word segmentation algorithm, as long as it can break the text to be segmented down to word granularity, so that the words can be recombined and the text can be split into sentences.
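Because the choice of algorithm is left open, here is one illustration (not the patent's own algorithm): a minimal dictionary-based forward maximum matching segmenter. The lexicon, the function name, and the maximum word length are hypothetical; in practice a statistical segmenter would normally be used instead.

```python
from typing import List, Set


def fmm_segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position, take the longest lexicon
    word that matches; fall back to a single character if none does."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking until a lexicon word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    lexicon = {"现在", "洗澡", "什么", "症状"}  # hypothetical toy lexicon
    print(fmm_segment("我现在去洗澡了", lexicon))
```

Greedy longest-match is simple and fast but cannot resolve overlapping ambiguities; that limitation is why probabilistic segmenters are more common.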
S103: determine the part of speech of each token in the token sequence.
The parts of speech include one or more of: noun, verb, adjective, numeral, measure word, pronoun, adverb, preposition, conjunction, auxiliary word, interjection, and onomatopoeia.
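S103 does not say how parts of speech are assigned. The minimal sketch below assumes a hypothetical lexicon that maps each word to its most frequent tag, using the tag labels that appear in the worked example later in this description. The "w" tag for punctuation and the "x" fallback for unknown words are also assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical lexicon (word -> most frequent tag), using the tag labels from the
# worked example (rr, t, vf, vi, ule, ry, n); the entries are illustrative.
LEXICON_TAGS: Dict[str, str] = {
    "我": "rr", "现在": "t", "去": "vf", "洗澡": "vi",
    "了": "ule", "什么": "ry", "症状": "n",
}
CHINESE_PUNCT = set("，。？！：；")


def tag_tokens(tokens: List[str]) -> List[Tuple[str, str]]:
    """Attach a part-of-speech tag to every token by dictionary lookup."""
    tagged = []
    for tok in tokens:
        if tok in CHINESE_PUNCT:
            tagged.append((tok, "w"))  # assumed tag for punctuation
        else:
            tagged.append((tok, LEXICON_TAGS.get(tok, "x")))  # "x": assumed unknown-word tag
    return tagged


if __name__ == "__main__":
    print(tag_tokens(["我", "现在", "去", "洗澡", "了", "。"]))
```

A lookup table ignores context. A real tagger would disambiguate words that carry more than one part of speech, for example with a hidden Markov model.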
S104: split the token sequence into a plurality of sentences according to the part of speech of each token.
It should be understood that in Chinese usage, modal particles are usually placed at the end of a sentence, and auxiliary words also frequently appear at the end of a sentence.
Therefore, a sentence-final part-of-speech table can be preset, in which sentence-final words and their parts of speech are collected and organized.
In addition, both the Chinese punctuation marks originally present in the text and the Chinese punctuation marks converted from foreign-language punctuation in S102 can serve as sentence boundary markers.
One possible implementation is as follows. According to the preset sentence-final part-of-speech table, the parts of speech corresponding to sentence-final words are selected from the parts of speech of the tokens, where a sentence-final word is a word that occurs at the end of a sentence. According to these parts of speech, the sentence-final words are located in the token sequence corresponding to the text. The token sequence is then split according to the sentence-final words and/or the Chinese punctuation marks to generate a plurality of sentences.
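The splitting rule above can be sketched as a single pass over the tagged tokens. This is a minimal illustration under assumptions: the sentence-final table here holds only the modal-particle tag y and the 了 tag ule (the patent does not publish its actual table), and the function and constant names are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical sentence-final part-of-speech table: modal particles (y) and the
# particle 了 (ule). The patent does not publish its actual table.
SENTENCE_FINAL_TAGS = {"y", "ule"}
CHINESE_PUNCT = set("，。？！；")


def split_sentences(tokens: List[Token]) -> List[List[Token]]:
    """Split a tagged token sequence after sentence-final words and Chinese punctuation."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for word, pos in tokens:
        # A punctuation mark right after a boundary belongs to the previous sentence.
        if word in CHINESE_PUNCT and not current and sentences:
            sentences[-1].append((word, pos))
            continue
        current.append((word, pos))
        if pos in SENTENCE_FINAL_TAGS or word in CHINESE_PUNCT:
            sentences.append(current)
            current = []
    if current:  # trailing words with no explicit boundary
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    tokens = [("只", "d"), ("咳嗽", "vi"), ("吗", "y"), ("？", "w"),
              ("嗓子疼", "n"), ("呢", "y"),
              ("我", "rr"), ("现在", "t"), ("去", "vf"), ("洗澡", "vi"), ("了", "ule")]
    print(["".join(w for w, _ in s) for s in split_sentences(tokens)])
```

Attaching a punctuation mark that immediately follows a boundary to the previous sentence prevents the sketch from emitting a "sentence" consisting only of "？".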
In conclusion a kind of subordinate sentence method provided by the embodiment of the present invention, obtains the text to subordinate sentence, treats subordinate sentence
Text carries out word segmentation processing, to generate the corresponding segmentation sequence of text, wherein segmentation sequence includes the Chinese of the text to subordinate sentence
Punctuation mark.The part of speech for determining each participle in segmentation sequence, according to the part of speech of each participle, to segmentation sequence carry out subordinate sentence with
Generate multiple sentences.Hereby it is achieved that generating the sentence after arranging according to the part of speech of participle by carrying out word segmentation processing to text
Son so that the sentence after arranging easily facilitates and analyzed and handled by language model, and then improves the accurate of sentence correction
Rate.
To illustrate the sentence segmentation method provided by embodiments of the present invention more clearly, an example is given below.
Suppose the texts to be segmented are the following (English glosses of the original Chinese, which is largely unpunctuated): "serious or not eh only coughing [particle]? sore throat [particle] what symptoms eh said I'd buy you medicine eh" and "mm remember to bring it back for me. I'm going to take a bath now". Applying a word segmentation algorithm yields these token sequences and parts of speech: "severe/a, not severe/a, heavy/a, eh/y, only/d, cough/vi, [particle]/y, ?, sore throat/n, [particle]/y, what/ry, symptom/n, eh/y, said/v, for/p, you/rr, buy medicine/nz, eh/y" and "mm/ng, remember/v, for/p, me/rr, bring/v, back/v, good/a, le/ule, ., I/rr, now/t, go/vf, bathe/vi, le/ule".
Here, /a denotes an adjective, /y a modal particle, /d an adverb, /vi an intransitive verb, /n a noun, /ry an interrogative pronoun, /v a verb, /p a preposition, /rr a personal pronoun, /nz another proper noun, /ng a nominal morpheme, /ule the particle 了 (le), /t a time word, and /vf a directional verb.
Splitting the token sequences generates the following sentences: "Is it serious"; "Only coughing?"; "Sore throat"; "What symptoms"; "Said I'd buy you medicine"; "Mm"; "Remember to bring it back for me."; "I'm going to take a bath now".
Further, it is determined whether a generated sentence includes a word of a preset part of speech; if it does, the punctuation mark corresponding to that part of speech is predicted and supplemented in the sentence. For example, the generated sentence "What symptoms" contains the interrogative pronoun "what", so the punctuation mark "?" can be predicted and supplemented, and the sentence becomes "What symptoms?".
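The supplementation rule can be sketched as a table lookup. In this minimal illustration only the interrogative-pronoun case (ry mapped to "？") comes from the example above; the table and function names are hypothetical, and the choice to skip sentences that already end in punctuation is an assumption.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical table: preset part of speech -> punctuation to supplement.
# Only the interrogative-pronoun case (ry -> ？) comes from the description.
POS_TO_PUNCT = {"ry": "？"}
CHINESE_PUNCT = set("，。？！；")


def supplement_punctuation(sentence: List[Token]) -> List[Token]:
    """Append the punctuation mapped to the first preset POS found in the
    sentence, unless the sentence is empty or already ends with punctuation."""
    if not sentence or sentence[-1][0] in CHINESE_PUNCT:
        return sentence
    for _, pos in sentence:
        if pos in POS_TO_PUNCT:
            return sentence + [(POS_TO_PUNCT[pos], "w")]
    return sentence


if __name__ == "__main__":
    s = [("什么", "ry"), ("症状", "n")]
    print("".join(w for w, _ in supplement_punctuation(s)))
```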
So that the sentences produced by the sentence segmentation method can be corrected, an embodiment of the present invention further proposes another sentence segmentation method. Fig. 2 is a schematic flowchart of another sentence segmentation method proposed by an embodiment of the present invention. As shown in Fig. 2, based on the method flow shown in Fig. 1, after the token sequence is split into a plurality of sentences according to the part of speech of each token in S104, the method further comprises:
S201: perform word segmentation on the sentences using a word segmentation algorithm to obtain their word segmentation results.
It should be understood that the sentences generated in S104 are produced by re-punctuating the text to be segmented. Each sentence is an independent, complete sentence and can therefore be corrected.
Before the sentences can be corrected with an N-Gram language model, they first need to be segmented into words again.
S202: construct an N-Gram language model according to the word segmentation results of the sentences.
S203: correct the sentences using the N-Gram language model.
It will be appreciated that an N-Gram language model uses the collocation information between adjacent words in context. When a run of consecutive words written without spaces needs to be converted into a sentence, the model can compute the sentence with the highest probability, thereby correcting the sentence.
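S202 and S203 can be illustrated with a bigram (N = 2) model. This is a minimal sketch, not the patent's model: add-one (Laplace) smoothing and the <s>/</s> sentence markers are assumptions, and the class name and training sentences are hypothetical. It counts adjacent word pairs in segmented sentences, then scores candidate word orders so that the highest-probability candidate can be kept.

```python
import math
from collections import Counter
from typing import List, Sequence


class BigramLM:
    """Bigram (N = 2) language model with add-one (Laplace) smoothing.
    Smoothing and the <s>/</s> markers are assumptions, not from the patent."""

    def __init__(self, corpus: List[List[str]]):
        self.unigrams: Counter = Counter()
        self.bigrams: Counter = Counter()
        for sent in corpus:
            toks = ["<s>"] + sent + ["</s>"]
            self.unigrams.update(toks)
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab_size = len(self.unigrams)

    def prob(self, prev: str, word: str) -> float:
        """Smoothed P(word | prev) from adjacent-word collocation counts."""
        return (self.bigrams[(prev, word)] + 1) / (self.unigrams[prev] + self.vocab_size)

    def log_prob(self, sent: Sequence[str]) -> float:
        """Log probability of a whole segmented sentence."""
        toks = ["<s>"] + list(sent) + ["</s>"]
        return sum(math.log(self.prob(a, b)) for a, b in zip(toks, toks[1:]))


if __name__ == "__main__":
    # Hypothetical training sentences, already word-segmented as in S201.
    lm = BigramLM([["我", "现在", "去", "洗澡", "了"], ["我", "去", "买药"]])
    candidates = [["洗澡", "去", "现在", "我"], ["我", "现在", "去", "洗澡"]]
    print(max(candidates, key=lm.log_prob))  # keep the most probable candidate
```

N = 2 keeps the example small. Larger N captures longer collocations but needs more training data and stronger smoothing.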
Further, correcting a sentence using the N-Gram language model specifically includes the following. When several consecutive words in the sentence need to be joined into a phrase, the words are grouped using the N-Gram language model. When a word is missing from the sentence, the word is predicted using the N-Gram language model. When a punctuation mark is missing from the sentence, the punctuation mark is predicted using the N-Gram language model. In this way, the sentences generated by sentence segmentation are corrected.
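Word prediction and punctuation prediction can both be framed as filling a one-token gap. The sketch below assumes an add-one-smoothed bigram model; the corpus and the function names are hypothetical. It picks the candidate w that maximises P(w | previous) · P(next | w), so the same function predicts a missing word or a missing punctuation mark depending on which tokens appear in the training sentences.

```python
from collections import Counter
from typing import List, Tuple


def train_bigrams(corpus: List[List[str]]) -> Tuple[Counter, Counter]:
    """Count unigrams and adjacent pairs, with <s>/</s> boundary markers."""
    uni: Counter = Counter()
    bi: Counter = Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi


def fill_gap(prev: str, nxt: str, uni: Counter, bi: Counter) -> str:
    """Return the vocabulary item w maximising P(w|prev) * P(nxt|w) (add-one smoothing)."""
    vocab_size = len(uni)

    def p(a: str, b: str) -> float:
        return (bi[(a, b)] + 1) / (uni[a] + vocab_size)

    candidates = [w for w in uni if w not in ("<s>", "</s>")]
    return max(candidates, key=lambda w: p(prev, w) * p(w, nxt))


if __name__ == "__main__":
    uni, bi = train_bigrams([["什么", "症状", "？"], ["什么", "时间", "？"],
                             ["我", "去", "洗澡", "了", "。"]])
    print(fill_gap("症状", "</s>", uni, bi))  # punctuation prediction
    print(fill_gap("我", "洗澡", uni, bi))    # word prediction
```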
In order to realize above-described embodiment, the embodiment of the present invention also proposed a kind of subordinate sentence device, and Fig. 3 is the embodiment of the present invention
A kind of structural schematic diagram of provided subordinate sentence device.As shown in figure 3, the subordinate sentence device includes: to obtain module 310, first point
Word processing module 320, determining module 330, subordinate sentence module 340.
The acquisition module 310 is configured to acquire the text to be segmented.
The first word segmentation processing module 320 is configured to perform word segmentation on the text to be segmented to generate the token sequence corresponding to the text.
The determining module 330 is configured to determine the part of speech of each token in the token sequence.
The sentence segmentation module 340 is configured to split the token sequence into a plurality of sentences according to the part of speech of each token.
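As a software analogy for the module structure of Fig. 3 (not the patent's implementation), each module can be modelled as an injected callable composed in order. The class name, the toy splitting function, and the whitespace "segmenter" used in the wiring example are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


@dataclass
class SentenceSegmentationDevice:
    """Software analogue of Fig. 3: each module is an injected callable."""

    acquire: Callable[[], str]                          # acquisition module 310
    segment: Callable[[str], List[str]]                 # first word segmentation module 320
    determine_pos: Callable[[List[str]], List[Token]]   # determining module 330
    split: Callable[[List[Token]], List[List[Token]]]   # sentence segmentation module 340

    def run(self) -> List[List[Token]]:
        text = self.acquire()
        tokens = self.segment(text)
        tagged = self.determine_pos(tokens)
        return self.split(tagged)


def split_on_particles(tagged: List[Token]) -> List[List[Token]]:
    """Toy module 340: end a sentence after a modal particle (y) or 了 (ule)."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for tok in tagged:
        current.append(tok)
        if tok[1] in {"y", "ule"}:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    # Toy wiring: whitespace split and a tiny tag table stand in for real modules.
    tags = {"呢": "y", "了": "ule"}
    device = SentenceSegmentationDevice(
        acquire=lambda: "嗓子疼 呢 我 去 洗澡 了",
        segment=str.split,
        determine_pos=lambda toks: [(t, tags.get(t, "x")) for t in toks],
        split=split_on_particles,
    )
    print(["".join(w for w, _ in s) for s in device.run()])
```

Injecting the modules keeps each stage replaceable, which mirrors how the description lets the word segmentation algorithm be any known algorithm.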
Further, to allow the word segmentation algorithm to segment the text more accurately, in one possible implementation the token sequence includes the Chinese symbols and foreign-language symbols of the text to be segmented, and the first word segmentation processing module 320 includes: a removal submodule 321 configured to remove garbled characters from the text; and/or a conversion submodule 322 configured to convert the foreign-language symbols in the text into Chinese symbols, where the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; and a word segmentation processing submodule 323 configured to perform word segmentation on the text using a word segmentation algorithm to generate the token sequence corresponding to the text.
Further, to split the token sequence into a plurality of sentences according to the part of speech of each token, in one possible implementation the sentence segmentation module 340 includes: a selection submodule 341 configured to select, according to the preset sentence-final part-of-speech table, the parts of speech corresponding to sentence-final words from the parts of speech of the tokens, where a sentence-final word is a word that occurs at the end of a sentence; a determining submodule 342 configured to determine the sentence-final words in the token sequence corresponding to the text according to those parts of speech; and a sentence segmentation submodule 343 configured to split the token sequence according to the sentence-final words and/or the Chinese punctuation marks to generate a plurality of sentences.
Further, to predict and supplement punctuation marks in the generated sentences, in one possible implementation the sentence segmentation module 340 further includes: a judging submodule 344 configured to determine whether a generated sentence includes a word of a preset part of speech; and a prediction and supplementation submodule 345 configured to predict and supplement, in the generated sentence, the punctuation mark corresponding to the preset part of speech when the judging submodule 344 determines that the sentence includes such a word.
It should be noted that the foregoing explanation of the sentence segmentation method embodiments also applies to the sentence segmentation device of this embodiment and is not repeated here.
In conclusion a kind of subordinate sentence device provided by the embodiment of the present invention, obtains the text to subordinate sentence, treats subordinate sentence
Text carries out word segmentation processing, to generate the corresponding segmentation sequence of text, wherein segmentation sequence includes the Chinese of the text to subordinate sentence
Punctuation mark.The part of speech for determining each participle in segmentation sequence, according to the part of speech of each participle, to segmentation sequence carry out subordinate sentence with
Generate multiple sentences.Hereby it is achieved that generating the sentence after arranging according to the part of speech of participle by carrying out word segmentation processing to text
Son so that the sentence after arranging easily facilitates and analyzed and handled by language model, and then improves the accurate of sentence correction
Rate.
In order to realize above-described embodiment, the embodiment of the present invention also proposed another subordinate sentence device, and Fig. 4 is that the present invention is implemented
The structural schematic diagram of another kind subordinate sentence device provided by example.As shown in figure 4, it is based on apparatus structure shown in Fig. 3, subordinate sentence dress
It sets further include: the second word segmentation processing module 350 constructs module 360, rectification module 370.
The second word segmentation processing module 350 is configured to perform word segmentation on the sentences using a word segmentation algorithm to obtain their word segmentation results.
The construction module 360 is configured to construct an N-Gram language model according to the word segmentation results of the sentences.
The correction module 370 is configured to correct the sentences using the N-Gram language model.
Further, to correct the sentences using the N-Gram language model, in one possible implementation the correction module 370 includes: a word grouping submodule 371 configured to group words in a sentence using the N-Gram language model; a first prediction submodule 372 configured to predict words for the sentence using the N-Gram language model; and a second prediction submodule 373 configured to predict punctuation marks for the sentence using the N-Gram language model.
It should be noted that the foregoing explanation of the sentence segmentation method embodiments also applies to the sentence segmentation device of this embodiment and is not repeated here.
In this way, the sentences generated by sentence segmentation are corrected.
In order to realize above-described embodiment, the present invention also proposes a kind of electronic equipment, comprising: memory, processor and storage
On a memory and the computer program that can run on a processor, it when the processor executes described program, realizes as aforementioned
Subordinate sentence method described in embodiment of the method.
Fig. 5 is a schematic hardware structure diagram of an electronic equipment according to an embodiment of the present invention. The electronic equipment can be implemented in various forms. The electronic equipment in the present invention can include, but is not limited to, mobile electronic devices such as mobile phones, smartphones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable media players), navigation devices, in-vehicle electronic devices, in-vehicle display terminals, and in-vehicle electronic rearview mirrors, as well as stationary electronic devices such as digital TVs and desktop computers.
As shown in Fig. 5, the electronic device 1100 may include a wireless communication unit 1110, an A/V (audio/video) input unit 1120, a user input unit 1130, a sensing unit 1140, an output unit 1150, a memory 1160, an interface unit 1170, a controller 1180, a power supply unit 1190, and so on. Although Fig. 5 shows an electronic device with various components, it should be understood that not all of the illustrated components are required; more or fewer components may alternatively be implemented.
The wireless communication unit 1110 allows radio communication between the electronic device 1100 and a wireless communication system or network. The A/V input unit 1120 is used to receive audio or video signals. The user input unit 1130 can generate key input data according to commands input by the user, to control various operations of the electronic device. The sensing unit 1140 detects the current state of the electronic device 1100, such as its position, the presence or absence of the user's touch input to the electronic device 1100, the orientation of the electronic device 1100, and the acceleration or deceleration and direction of movement of the electronic device 1100, and generates commands or signals for controlling the operation of the electronic device 1100. The interface unit 1170 serves as an interface through which at least one external device can be connected to the electronic device 1100. The output unit 1150 is configured to provide output signals in a visual, audio and/or tactile manner. The memory 1160 may store software programs for the processing and control operations executed by the controller 1180, or may temporarily store data that has been output or is to be output. The memory 1160 may include at least one type of storage medium. Moreover, the electronic device 1100 may cooperate, over a network connection, with a network storage device that performs the storage function of the memory 1160. The controller 1180 generally controls the overall operation of the electronic device. In addition, the controller 1180 may include a multimedia module for reproducing or playing back multimedia data. The controller 1180 may perform pattern recognition processing to recognize handwriting input or picture drawing input performed on a touch screen as characters or images. The power supply unit 1190 receives external or internal power under the control of the controller 1180 and provides the appropriate power required to operate each element and component.
The various embodiments of the sentence segmentation method proposed by the present invention may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, the various embodiments of the sentence segmentation method may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein. In some cases, the various embodiments of the sentence segmentation method may be implemented in the controller 1180. For a software implementation, the various embodiments of the sentence segmentation method may be implemented with separate software modules, each of which allows at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in the memory 1160 and executed by the controller 1180.
To implement the above embodiments, the present invention further provides a computer-readable storage medium storing a program which, when executed by a processor, implements the sentence segmentation method described in the foregoing method embodiments.
In the description of this specification, references to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples" and the like mean that specific features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present invention. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not contradict each other, those skilled in the art may join and combine the different embodiments or examples described in this specification, and the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined with "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means at least two, for example two or three, unless otherwise specifically defined.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially simultaneously or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention pertain.
The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus, or device). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented with software or firmware stored in a memory and executed by a suitable instruction execution system. If, as in another embodiment, they are implemented in hardware, any one or a combination of the following techniques well known in the art may be used: discrete logic circuits having logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gate circuits, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.
Those skilled in the art can understand that all or some of the steps of the methods in the above embodiments may be completed by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the present invention; those skilled in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.
Claims (10)
1. A sentence segmentation method, characterized by comprising:
obtaining text to be segmented;
performing word segmentation on the text to be segmented, to generate a word segmentation sequence corresponding to the text;
determining the part of speech of each segmented word in the word segmentation sequence; and
segmenting the word segmentation sequence into sentences according to the part of speech of each segmented word, to generate a plurality of sentences.
2. The method according to claim 1, wherein the word segmentation sequence includes Chinese symbols and foreign-language symbols of the text to be segmented, and performing word segmentation on the text to be segmented to generate the word segmentation sequence corresponding to the text comprises:
removing garbled characters from the text to be segmented; and/or
converting the foreign-language symbols in the text to be segmented into the Chinese symbols, wherein the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; and
performing word segmentation on the text to be segmented using a word segmentation algorithm, to generate the word segmentation sequence corresponding to the text.
3. The method according to claim 2, wherein segmenting the word segmentation sequence into sentences according to the part of speech of each segmented word to generate a plurality of sentences comprises:
selecting, according to a preset sentence-final word part-of-speech table, the parts of speech corresponding to sentence-final words from the parts of speech of the segmented words, wherein a sentence-final word is a word that occurs at the end of a sentence;
determining the sentence-final words in the word segmentation sequence corresponding to the text according to the parts of speech corresponding to the sentence-final words; and
segmenting the word segmentation sequence corresponding to the text into sentences according to the sentence-final words and/or the Chinese punctuation marks, to generate a plurality of sentences.
4. The method according to claim 3, wherein segmenting the word segmentation sequence into sentences according to the part of speech of each segmented word to generate a plurality of sentences further comprises:
determining whether a generated sentence includes a word of a preset part of speech; and
if the generated sentence includes a word of the preset part of speech, predicting and supplementing, in the generated sentence, the punctuation mark corresponding to the preset part of speech.
5. The method according to any one of claims 1 to 4, further comprising, after segmenting the word segmentation sequence into sentences according to the part of speech of each segmented word to generate a plurality of sentences:
performing word segmentation on the sentences using a word segmentation algorithm, to obtain word segmentation results of the sentences;
constructing an N-Gram language model according to the word segmentation results of the sentences; and
correcting the sentences using the N-Gram language model.
6. The method according to claim 5, wherein correcting the sentences using the N-Gram language model comprises:
grouping words in the sentences using the N-Gram language model;
predicting words in the sentences using the N-Gram language model; and
predicting punctuation marks in the sentences using the N-Gram language model.
7. A sentence segmentation device, characterized by comprising:
an obtaining module, configured to obtain text to be segmented;
a first word segmentation processing module, configured to perform word segmentation on the text to be segmented, to generate a word segmentation sequence corresponding to the text;
a determining module, configured to determine the part of speech of each segmented word in the word segmentation sequence; and
a sentence segmentation module, configured to segment the word segmentation sequence into sentences according to the part of speech of each segmented word, to generate a plurality of sentences.
8. The device according to claim 7, wherein the word segmentation sequence includes Chinese symbols and foreign-language symbols of the text to be segmented, and the first word segmentation processing module comprises:
a removal submodule, configured to remove garbled characters from the text to be segmented; and/or
a conversion submodule, configured to convert the foreign-language symbols in the text to be segmented into the Chinese symbols, wherein the foreign-language symbols include foreign-language punctuation marks and the Chinese symbols include Chinese punctuation marks; and
a word segmentation processing submodule, configured to perform word segmentation on the text to be segmented using a word segmentation algorithm, to generate the word segmentation sequence corresponding to the text.
9. The device according to claim 8, wherein the sentence segmentation module comprises:
a selection submodule, configured to select, according to a preset sentence-final word part-of-speech table, the parts of speech corresponding to sentence-final words from the parts of speech of the segmented words, wherein a sentence-final word is a word that occurs at the end of a sentence;
a determination submodule, configured to determine the sentence-final words in the word segmentation sequence corresponding to the text according to the parts of speech corresponding to the sentence-final words; and
a sentence segmentation submodule, configured to segment the word segmentation sequence corresponding to the text into sentences according to the sentence-final words and/or the Chinese punctuation marks, to generate a plurality of sentences.
10. The device according to claim 9, wherein the sentence segmentation module further comprises:
a judgment submodule, configured to determine whether a generated sentence includes a word of a preset part of speech; and
a prediction supplementation submodule, configured to, when the judgment submodule determines that the generated sentence includes a word of the preset part of speech, predict and supplement, in the generated sentence, the punctuation mark corresponding to the preset part of speech.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811579742.9A CN109684638B (en) | 2018-12-24 | 2018-12-24 | Clause method and device, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109684638A true CN109684638A (en) | 2019-04-26 |
CN109684638B CN109684638B (en) | 2023-08-11 |
Family
ID=66188816
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811579742.9A Active CN109684638B (en) | 2018-12-24 | 2018-12-24 | Clause method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109684638B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0981568A (en) * | 1995-09-11 | 1997-03-28 | Matsushita Electric Ind Co Ltd | Chinese language generation device for machine translation |
CN107038163A (en) * | 2016-02-03 | 2017-08-11 | 常州普适信息科技有限公司 | A kind of text semantic modeling method towards magnanimity internet information |
CN106446116A (en) * | 2016-09-18 | 2017-02-22 | 深圳麦亚信科技股份有限公司 | Business rule parameter interaction method and business rule parameter interaction device applied to rule engine |
CN107247706A (en) * | 2017-06-16 | 2017-10-13 | 中国电子技术标准化研究院 | Text punctuate method for establishing model, punctuate method, device and computer equipment |
CN108170674A (en) * | 2017-12-27 | 2018-06-15 | 东软集团股份有限公司 | Part-of-speech tagging method and apparatus, program product and storage medium |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111950237A (en) * | 2019-04-29 | 2020-11-17 | 深圳市优必选科技有限公司 | Sentence rewriting method, sentence rewriting device and electronic equipment |
CN111950237B (en) * | 2019-04-29 | 2023-06-09 | 深圳市优必选科技有限公司 | Sentence rewriting method, sentence rewriting device and electronic equipment |
CN110222194A (en) * | 2019-05-21 | 2019-09-10 | 深圳壹账通智能科技有限公司 | Data drawing list generation method and relevant apparatus based on natural language processing |
CN110222194B (en) * | 2019-05-21 | 2022-10-04 | 深圳壹账通智能科技有限公司 | Data chart generation method based on natural language processing and related device |
CN110348013A (en) * | 2019-07-05 | 2019-10-18 | 武汉莱博信息技术有限公司 | Writing householder method, equipment and readable storage medium storing program for executing based on artificial intelligence |
CN110705261A (en) * | 2019-09-26 | 2020-01-17 | 浙江蓝鸽科技有限公司 | Chinese text word segmentation method and system thereof |
CN110705261B (en) * | 2019-09-26 | 2023-03-24 | 浙江蓝鸽科技有限公司 | Chinese text word segmentation method and system thereof |
CN111259163A (en) * | 2020-01-14 | 2020-06-09 | 北京明略软件系统有限公司 | Knowledge graph generation method and device and computer readable storage medium |
CN112507714A (en) * | 2020-12-22 | 2021-03-16 | 北京百度网讯科技有限公司 | Text segmentation method and device |
CN112507714B (en) * | 2020-12-22 | 2023-06-23 | 北京百度网讯科技有限公司 | Text segmentation method and device |
Also Published As
Publication number | Publication date |
---|---|
CN109684638B (en) | 2023-08-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |