CN108363693A - Text handling method and device - Google Patents
Text handling method and device
- Publication number
- CN108363693A CN201810149309.5A CN201810149309A
- Authority
- CN
- China
- Prior art keywords
- text
- short sentence
- knowledge point
- processing
- answer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
An embodiment of the present invention provides a text processing method and device that solve the problems of high development cost, long project cycle, and difficult maintenance that arise when text-processing rules are modified or added under existing text-processing approaches. The text processing method provides one or more knowledge points, each comprising a question and an answer; the content of the question corresponds to standard text content, and the content of the answer corresponds to a text-processing mode. The method comprises: splitting the text to be processed into multiple short sentences; matching each short sentence against the question of each knowledge point; and, when a short sentence successfully matches the question of a knowledge point, performing text processing on the short sentence according to the answer of that knowledge point.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a text processing method and device.
Background
With the continuous development of artificial intelligence technology and people's rising expectations of interactive experience, intelligent interaction has gradually begun to replace some traditional human-computer interaction modes and has become a research hotspot. When performing semantic analysis on speech, existing intelligent interaction approaches apply text processing to the text content of the speech. Existing text processing, however, searches for the required content with regular expressions and then applies the corresponding processing to the content found. Although this approach can achieve simple text-processing effects in uncomplicated cases, it is difficult to debug and difficult to extend with new text-processing rules, so the resulting text processing system is essentially a "disposable" product. Modifying or adding a text-processing rule requires updating the regular expressions and/or the search algorithm, at a cost close to that of redeveloping an entire text processing system; this greatly increases development cost, lengthens the project cycle, and makes maintenance difficult.
Summary of the invention
In view of this, embodiments of the present invention provide a text processing method and device that solve the problems of high development cost, long project cycle, and difficult maintenance that arise when text-processing rules are modified or added under existing text-processing approaches.
According to one aspect of the present invention, a text processing method is provided. One or more knowledge points are provided, each comprising a question and an answer; the content of the question corresponds to standard text content, and the content of the answer corresponds to a text-processing mode. The method comprises: splitting the text to be processed into multiple short sentences; matching each short sentence against the question of each knowledge point; and, when a short sentence successfully matches the question of a knowledge point, performing text processing on the short sentence according to the answer of that knowledge point.
In one embodiment, the one or more knowledge points are stored in advance in an intelligent analysis engine. Matching each short sentence against the question of each knowledge point comprises: inputting each short sentence into the intelligent analysis engine, where the engine is configured to match the input short sentence against the question of each knowledge point and to output the answer of the knowledge point matched by the short sentence. Performing text processing on the short sentence according to the answer of the knowledge point comprises: performing text processing, according to the output of the intelligent analysis engine, on the short sentence that matched the question of the knowledge point.
In one embodiment, inputting each short sentence into the intelligent analysis engine comprises: inputting the short sentences obtained by splitting into the engine one by one. Performing text processing according to the output of the engine then comprises: performing the corresponding text processing on the short sentence associated with each result that the engine outputs, one result at a time.
In one embodiment, the method further comprises: adding, modifying, or deleting knowledge points in the intelligent analysis engine.
In one embodiment, before each short sentence is matched against the question of each knowledge point, the method further comprises: recording the text format of the short sentence and/or its position in the text to be processed.
In one embodiment, matching each short sentence against the question of each knowledge point comprises: calculating the text similarity between each short sentence and the question of each knowledge point, and taking the knowledge point with the highest text similarity as the knowledge point successfully matched with the short sentence. Performing text processing on the short sentence when it successfully matches the question of a knowledge point then comprises: performing text processing on the short sentence according to the answer of the successfully matched knowledge point.
In one embodiment, splitting the text to be processed into multiple short sentences comprises: identifying preset split delimiters in the text to be processed; and splitting out the text content between two adjacent preset split delimiters as one short sentence.
In one embodiment, the preset split delimiters include one or more of the following: punctuation marks, line breaks, and preset split words.
In one embodiment, the text-processing mode includes one or a combination of the following: adjusting a preset text format, extracting the short sentence before or after a preset character, arranging text content according to a preset rule, and adding a preset mark.
In one embodiment, arranging text content according to a preset rule comprises: filling in a preset table or filling in a preset text template.
In one embodiment, the text template takes the form of a semantic expression or of a regular expression.
In one embodiment, the text to be processed is obtained through a speech conversion process.
According to another aspect of the present invention, a text processing apparatus is provided, comprising: a knowledge database containing one or more knowledge points, each comprising a question and an answer, where the content of the question corresponds to standard text content and the content of the answer corresponds to a text-processing mode; a splitting module configured to split the text to be processed into multiple short sentences; a matching module configured to match each short sentence against the question of each knowledge point; and a text processing module configured to perform text processing on a short sentence according to the answer of a knowledge point when the short sentence successfully matches the question of that knowledge point.
According to another aspect of the present invention, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executed by the processor, where the processor implements the steps of any of the text processing methods described above when executing the computer program.
According to another aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the text processing methods described above.
The text processing method and device provided by embodiments of the present invention provide one or more knowledge points comprising a question and an answer and split the content of the text to be processed into short sentences. The process of searching the text to be processed and determining the applicable text-processing rule is thereby converted into a question-answering process whose unit is the short sentence: the content of a short sentence corresponds to the question being asked, and the question in a knowledge point corresponds to the standard question. Once the corresponding knowledge point has been matched, text processing can be performed directly according to the answer of that knowledge point, so that large volumes of text content can be arranged automatically and efficiently. Moreover, when a text-processing rule needs to be added or modified, only the questions and answers in the knowledge points need to be added or modified. Text-processing rules can therefore be edited easily and flexibly, which keeps the product general-purpose, allows rapid deployment and debugging, and greatly reduces development cost.
Brief description of the drawings
Fig. 1 is a flow diagram of a text processing method provided by an embodiment of the present invention.
Fig. 2 is a flow diagram of the process of splitting the text to be processed in a text processing method provided by an embodiment of the present invention.
Fig. 3 is a flow diagram of the text similarity calculation in a text processing method provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of the principle of a text processing method provided by an embodiment of the present invention.
Fig. 5 is a structural schematic diagram of a text processing apparatus provided by an embodiment of the present invention.
Fig. 6 is a structural schematic diagram of a text processing apparatus provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flow diagram of a text processing method provided by an embodiment of the present invention. The text processing method provides one or more knowledge points, each comprising a question and an answer; the content of the question corresponds to standard text content, and the content of the answer corresponds to a text-processing mode. The correspondence between questions and answers can be established in advance through a pre-learning process. As shown in Fig. 1, the text processing method includes the following steps:
Step 101: Split the text to be processed into multiple short sentences.
Splitting the text to be processed into multiple short sentences makes each short sentence a basic unit of text processing; each short sentence obtained by splitting is subsequently matched against the knowledge points to determine the corresponding text-processing mode. The text to be processed can be data content from a third party (such as a third-party contract text or customer information text), or text content obtained through a speech conversion process (such as text converted from voice recordings in a customer service scenario). The present invention does not limit the specific content of the text to be processed.
In an embodiment of the present invention, the text to be processed can be split as follows. As shown in Fig. 2, the splitting process may include the following steps:
Step 1011: Identify the preset split delimiters in the text to be processed.
The preset split delimiters may include one or more of the following: punctuation marks, line breaks, and preset split words. The specific content of the preset split words can depend on the actual application scenario and is not limited by the present invention.
For example, consider the following contract text:
"Borrower: Company I
Legal representative (responsible person): User H
Legal address: No. Z, Y Road, X District, S City"
The colon ":" and the hidden line break can be set as the preset split delimiters. When the text to be processed is scanned, the preset split delimiters in it can be identified in a predetermined traversal order (for example, line by line from left to right).
Step 1012: Split out the text content between two adjacent preset split delimiters as one short sentence.
For example, after the splitting process the contract text above yields the following short sentences: "Borrower:", "Company I", "Legal representative (responsible person):", "User H", "Legal address:", and "No. Z, Y Road, X District, S City". Each short sentence obtained by splitting then serves as a basic unit of text processing in the subsequent matching against the knowledge points.
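The splitting in steps 1011 and 1012 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `split_short_sentences`, the choice of colon and line break as the preset split delimiters, and the decision to keep a colon attached to the short sentence before it (as in the example short sentences above) are all assumptions of this sketch.

```python
import re

# Assumed preset split delimiters: ASCII or full-width colon, and line break.
SPLIT_PATTERN = re.compile(r"([::]|\n)")


def split_short_sentences(text):
    """Split text into short sentences at the assumed preset delimiters.

    A colon stays attached to the short sentence that precedes it;
    line breaks are dropped.
    """
    sentences = []
    current = ""
    # The capturing group makes re.split return the delimiters as well.
    for piece in SPLIT_PATTERN.split(text):
        if piece == "\n":
            if current.strip():
                sentences.append(current.strip())
            current = ""
        elif piece in (":", ":"):
            current += piece
            sentences.append(current.strip())
            current = ""
        else:
            current += piece
    if current.strip():
        sentences.append(current.strip())
    return sentences


contract = (
    "Borrower: Company I\n"
    "Legal representative (responsible person): User H\n"
    "Legal address: No. Z, Y Road, X District, S City"
)
short_sentences = split_short_sentences(contract)
print(short_sentences)
```

Preset split words could be supported in the same way by adding them as alternatives to the delimiter pattern.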
Step 102: Match each short sentence against the question of each knowledge point.
As described above, each knowledge point comprises a question and an answer; the content of the question corresponds to standard text content, and the content of the answer corresponds to a text-processing mode. Matching each short sentence against the questions of the knowledge points determines the matched standard text content, and thus the corresponding answer, that is, the corresponding text-processing mode. Taking the contract text above as an example, the knowledge points provided may include two whose questions are "Please provide borrower information" and "Please provide legal representative information". When the short sentence "Borrower:" is matched against the questions of the knowledge points provided, the question of the matched knowledge point is determined to be "Please provide borrower information", and the short sentence "Borrower:" can be processed according to the answer of that knowledge point. Likewise, when the short sentence "Legal representative (responsible person):" is matched, the question of the matched knowledge point is determined to be "Please provide legal representative information", and the short sentence "Legal representative (responsible person):" can be processed according to the answer of that knowledge point.
In an embodiment of the present invention, the matching of each short sentence against the question of each knowledge point can be implemented by text similarity calculation. Specifically, as shown in Fig. 3, the text similarity calculation may include the following steps:
Step 1021: Calculate the text similarity between each short sentence and the question of each knowledge point.
Step 1022: Take the knowledge point with the highest text similarity as the knowledge point successfully matched with the short sentence.
When a short sentence successfully matches the question of a knowledge point, text processing is subsequently performed on the short sentence according to the answer of that matched knowledge point. The text similarity calculation can be implemented with one or more of the following methods: edit distance, n-gram, Jaro-Winkler, and Soundex. The present invention does not strictly limit how the similarity calculation is implemented.
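Steps 1021 and 1022 can be illustrated with edit distance, one of the methods listed above. The sketch below is an assumption-laden example rather than the embodiment's implementation: the normalization of the distance into a similarity score, the lower-casing, the answer names `mark_purple` and `mark_yellow`, and the use of the bare standard text content as the question are all choices made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalization: 1.0 for identical strings, 0.0 for nothing shared."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def best_match(short_sentence, knowledge_points):
    """Step 1022: pick the knowledge point whose question is most similar."""
    return max(
        knowledge_points,
        key=lambda point: similarity(short_sentence.lower(), point["question"].lower()),
    )


# Hypothetical knowledge points; the questions are simplified to the
# standard text content they stand for.
knowledge_points = [
    {"question": "borrower", "answer": "mark_purple"},
    {"question": "legal representative", "answer": "mark_yellow"},
]

print(best_match("Borrower:", knowledge_points)["answer"])
print(best_match("Legal representative (responsible person):", knowledge_points)["answer"])
```

Because the highest-scoring knowledge point always wins, a practical system would probably also need a minimum score below which no knowledge point is considered matched; the source does not specify one.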
In an embodiment of the present invention, because the short sentences obtained by splitting may appear in many variant forms, the questions in the knowledge points can be implemented as semantic templates to make the text processing more intelligent.
A semantic template can be a set of one or more abstract semantic expressions that represent a certain semantic content. It is generated by developers according to predetermined rules in combination with the semantic content, so that a single semantic template can describe sentences that express the corresponding semantic content in many different ways, which accommodates the possible variants of the short sentences obtained by splitting. Matching the short sentences against preset semantic templates in this way avoids the limitation of identifying user messages with a standard semantic template that can describe only one way of expression.
Each abstract semantic expression mainly consists of semantic component words and semantic rule words. Semantic component words are represented by semantic component symbols; once these symbols are filled with corresponding values (that is, content), they can express a wide variety of concrete meanings.
The semantic component symbols of abstract semantics may include:
[concept]: a word or phrase acting as the subject or object.
For example: "ring-back tone" in "how to activate the ring-back tone".
[action]: a word or phrase acting as the action.
For example: "apply for" in "how to apply for a credit card".
[attribute]: a word or phrase acting as the attribute.
For example: "colors" in "which colors does the iPhone come in".
[adjective]: a word or phrase acting as the modifier.
For example: "cheap" in "which brand of refrigerator is cheap".
Some of the main abstract semantic categories, with an example pattern for each, are:
Concept explanation: what is [concept]
Attribute composition: which [attribute] does [concept] have
Behavior manner: how to [action] [concept]
Behavior place: where to [action] [concept]
Behavior reason: why does [concept] [action]
Behavior prediction: will [concept] [action] or not
Behavior judgment: does [concept] have [attribute] or not
Attribute status: is the [attribute] of [concept] [adjective]
Attribute judgment: whether [concept] has [attribute]
Attribute reason: why is the [attribute] of [concept] so [adjective]
Concept comparison: where is the difference between [concept1] and [concept2]
Attribute comparison: what is the difference between the [attribute] of [concept1] and [concept2]
How a question sentence is composed at the abstract semantic level can generally be judged through part-of-speech tagging: [concept] corresponds to a noun, [action] to a verb, [attribute] to a noun, and [adjective] to an adjective.
Take the abstract semantics "how to [action] [concept]" of the category "behavior manner" as an example. The abstract semantic set of this category may include multiple abstract semantic expressions:
Abstract semantic category: behavior manner
Abstract semantic expressions:
a. [concept] [need | should] [how] <[can]> <carry out> [action]
b. { [concept]~[action] }
c. [concept] <'s> [action] <method | way | step>
d. <which | what | whether there is> <through | use |> [concept] [action] <'s> [method]
e. [how] [action]~[concept]
The four abstract semantic expressions a, b, c, and d above all describe the abstract semantic category "behavior manner". The semantic symbol "|" indicates an "or" relationship, and the semantic symbol "?" indicates that a component is optional.
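To make the idea of one expression covering many wordings concrete, the sketch below enumerates the sentences generated by an expression written in a deliberately simplified syntax. The syntax is an assumption of this example and is not the notation of the embodiment: `[a|b]` is treated as a required slot with alternatives, `<a|b>` as an optional slot, and any text outside slots is ignored. The template string is hypothetical.

```python
import itertools
import re

# Assumed simplified syntax: [a|b] is a required slot, <a|b> an optional slot.
SLOT = re.compile(r"\[([^\]]*)\]|<([^>]*)>")


def expand(template):
    """Enumerate every sentence the simplified expression can produce."""
    slots = []
    for required, optional in SLOT.findall(template):
        if required:
            choices = [alt.strip() for alt in required.split("|")]
        else:
            # An optional slot may also contribute nothing at all.
            choices = [""] + [alt.strip() for alt in optional.split("|")]
        slots.append(choices)
    sentences = []
    for combination in itertools.product(*slots):
        sentences.append(" ".join(word for word in combination if word))
    return sentences


expansions = expand("<please provide>[borrower|creditor][information|identity]")
for sentence in expansions:
    print(sentence)
```

With one optional slot and two two-way slots, this hypothetical template yields eight wordings, from "borrower information" to "please provide creditor identity".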
It should be understood, however, that although some examples of semantic component words, semantic rule words, and semantic symbols are given above, the specific content and parts of speech of the semantic component words, the specific content and parts of speech of the semantic rule words, and the definition and combination of the semantic symbols can all be preset by developers according to the specific interactive service scenario in which the intelligent interaction method is applied. The present invention does not limit them.
In a further embodiment, once the semantic component words and semantic rule words in a short sentence obtained by splitting have been identified, the semantic component words and semantic rule words contained in the user voice operation information and in the semantic templates can also be converted into simplified text strings. The similarity calculation is then performed on these text strings, which makes it more efficient.
In an embodiment of the present invention, as described above, a semantic template can consist of semantic component words and semantic rule words, and these words are in turn related to the parts of speech of the words in the semantic template and to the grammatical relations between them. The similarity calculation can therefore proceed as follows: first identify the words in the user voice operation information text, their parts of speech, and their grammatical relations; then identify the semantic component words and semantic rule words among them according to those parts of speech and grammatical relations; finally, introduce the identified semantic component words and semantic rule words into a vector space model to calculate the text similarities between the short sentence obtained by splitting and multiple preset semantic templates. In an embodiment of the present invention, one or more of the following word segmentation methods can be used to identify the words in the short sentences obtained by splitting, the parts of speech of the words, and the grammatical relations between words: the hidden Markov model method, the forward maximum matching method, the reverse maximum matching method, and the named entity recognition method.
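Of the segmentation methods listed, forward maximum matching is the simplest to show. The sketch below is a generic illustration of that technique, not code from the embodiment; the vocabulary, the `max_word_length` parameter, and the fallback to single characters for unknown text are assumptions. The input is the Chinese form of the earlier example "how to apply for a credit card".

```python
def forward_maximum_match(text, vocabulary, max_word_length=4):
    """Segment text by repeatedly taking the longest vocabulary word at the cursor."""
    words = []
    position = 0
    while position < len(text):
        longest = min(max_word_length, len(text) - position)
        # Try the longest candidate first and shrink it one character at a time.
        for length in range(longest, 0, -1):
            candidate = text[position:position + length]
            # A single character is always accepted so the scan cannot stall.
            if length == 1 or candidate in vocabulary:
                words.append(candidate)
                position += length
                break
    return words


# Hypothetical vocabulary: credit card, credit, how, apply for.
vocabulary = {"信用卡", "信用", "怎么", "办理"}
tokens = forward_maximum_match("信用卡怎么办理", vocabulary)
print(tokens)
```

Reverse maximum matching works the same way but scans from the end of the text toward the beginning. Either method yields only the words; tagging their parts of speech and mapping them to [concept] or [action] would be a separate step.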
In an embodiment of the present invention, as described above, a semantic template can be a set of multiple abstract semantic expressions representing a certain semantic content. A single semantic template can then describe sentences that express the corresponding semantic content in many different ways, corresponding to multiple extension questions of the same standard question. Therefore, when calculating the text similarity between a short sentence obtained by splitting and the preset semantic templates, the similarity is calculated between the short sentence and at least one extension question expanded from each of the preset semantic templates, and the semantic template corresponding to the extension question with the highest text similarity is taken as the matched semantic template. These expanded extension questions can be obtained from the semantic component words and/or semantic rule words and/or semantic symbols included in the semantic template.
For example, "Please provide borrower information" and "Who is the borrower?" can be two different extension questions expanded from the same semantic template "[please provide] [borrower | creditor] [information | identity] [is | be] [who]". When the short sentence split out is "Borrower:", text similarity calculation determines that the extension question with the highest similarity is "Who is the borrower?", and the matched semantic template is thereby determined. When the short sentence split out is "Borrower information:", text similarity calculation determines that the extension question with the highest similarity is "Please provide borrower information", yet the matched semantic template is still the same one. Representing the questions of knowledge points as semantic templates thus avoids the limitation of identifying user messages with a standard semantic template that can describe only one way of expression.
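The mapping from a short sentence to an extension question and then back to its semantic template can be sketched as follows. The example is illustrative only: the template names, the hand-listed extension questions, and the use of `difflib.SequenceMatcher` from the Python standard library as the similarity measure are assumptions, not details given in the source.

```python
import difflib

# Hypothetical semantic templates, each listed with its expanded extension questions.
TEMPLATES = {
    "borrower_info": [
        "who is the borrower",
        "please provide borrower information",
    ],
    "legal_representative_info": [
        "who is the legal representative",
        "please provide legal representative information",
    ],
}


def match_template(short_sentence):
    """Return (template name, best extension question, similarity score)."""
    best = (None, None, 0.0)
    for name, extension_questions in TEMPLATES.items():
        for question in extension_questions:
            score = difflib.SequenceMatcher(
                None, short_sentence.lower(), question
            ).ratio()
            if score > best[2]:
                best = (name, question, score)
    return best


for sentence in ("Borrower:", "Borrower information:"):
    name, question, score = match_template(sentence)
    print(f"{sentence!r} -> {question!r} (template {name})")
```

Both short sentences resolve to the same hypothetical template through different extension questions, which mirrors the behavior described in the example above.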
Step 103: When a short sentence successfully matches the question of a knowledge point, perform text processing on the short sentence according to the answer of that knowledge point.
Once the matched knowledge point has been determined by the matching process (for example, by the text similarity calculation described above), the answer in that knowledge point is determined as well, and the short sentence can be processed according to the answer of the matched knowledge point. In an embodiment of the present invention, the text-processing mode may include one or a combination of the following: adjusting a preset text format (such as font size, typeface, or font color), extracting the short sentence before or after a preset character, arranging text content according to a preset rule (such as filling in a preset table or a preset text template), and adding a preset mark.
In an embodiment of the present invention, when the text-processing mode includes filling in a preset text template, the text template can take the form of a semantic expression or of a regular expression. The present invention does not limit the specific form of the text template.
For example, taking the contract text above, the answers corresponding to the two standard text contents "borrower" and "legal representative" may be "mark in purple" and "mark in yellow" respectively, so that the short sentences "Borrower:" and "Legal representative (responsible person):" obtained by splitting are marked in purple and yellow in the text to be processed. As another example, the answers corresponding to "borrower" and "legal representative" may each be to fill in a preset table, with the short sentence that follows the matched short sentence also filled into the preset position of that table. The resulting text-processing result can be as shown in the following table:
Borrower: | Company I |
Legal representative (responsible person): | User H |
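The table-filling answer can be sketched as follows. The sketch assumes that matching has already taken place and that `labels` holds the short sentences whose matched answer is "fill in the preset table"; the function names and the row layout are likewise assumptions made for illustration.

```python
def fill_table(short_sentences, labels):
    """Pair each labelled short sentence with the short sentence that follows it."""
    rows = []
    for index, sentence in enumerate(short_sentences):
        if sentence in labels and index + 1 < len(short_sentences):
            rows.append((sentence, short_sentences[index + 1]))
    return rows


def render(rows):
    """Render rows in the same layout as the table above."""
    return "\n".join(f"{label} | {value} |" for label, value in rows)


short_sentences = [
    "Borrower:", "Company I",
    "Legal representative (responsible person):", "User H",
    "Legal address:", "No. Z, Y Road, X District, S City",
]
# Assumed outcome of matching: only these two short sentences map to "fill table".
labels = {"Borrower:", "Legal representative (responsible person):"}

rows = fill_table(short_sentences, labels)
print(render(rows))
```

"Legal address:" is left out of the table because no knowledge point in this hypothetical setup asks for it.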
In another implementation of the present invention, in addition to processing the current short sentence, the text-processing mode may also include extracting the short sentence before or after a preset character and processing the extracted short sentence as well. Taking the contract text above as an example, the answers corresponding to the two standard text contents "borrower" and "legal representative" may be "mark in purple" and "mark in yellow" respectively, while the short sentence "Company I" that follows "Borrower:" can also be marked in purple and the short sentence "User H" that follows "Legal representative (responsible person):" can also be marked in yellow. Which approach is used depends on the text-processing mode configured by the user.
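The two marking variants described above differ only in whether the mark extends to the following short sentence. A minimal sketch, assuming matching has already produced a `rules` mapping from short sentence to color and using a hypothetical `include_following` switch to stand for the user's configuration:

```python
def apply_marks(short_sentences, rules, include_following=False):
    """Return (short sentence, color or None) pairs.

    rules maps a matched short sentence to its mark color. When
    include_following is set, the short sentence right after a matched
    one receives the same color.
    """
    marks = [None] * len(short_sentences)
    for index, sentence in enumerate(short_sentences):
        color = rules.get(sentence)
        if color is None:
            continue
        marks[index] = color
        if include_following and index + 1 < len(short_sentences):
            marks[index + 1] = color
    return list(zip(short_sentences, marks))


short_sentences = [
    "Borrower:", "Company I",
    "Legal representative (responsible person):", "User H",
]
rules = {
    "Borrower:": "purple",
    "Legal representative (responsible person):": "yellow",
}

labels_only = apply_marks(short_sentences, rules)
labels_and_values = apply_marks(short_sentences, rules, include_following=True)
print(labels_only)
print(labels_and_values)
```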
It should be understood, however, that the specific content and processing rules of the answer in a knowledge point (that is, the text-processing mode) can be designed or adjusted according to the actual application scenario. The present invention does not specifically limit the form of the answer in a knowledge point.
It can be seen that the text processing method provided by this embodiment of the present invention provides one or more knowledge points comprising a question and an answer and splits the content of the text to be processed into short sentences. The process of searching the text to be processed and determining the applicable text-processing rule is thereby converted into a question-answering process whose unit is the short sentence: the content of a short sentence corresponds to the question being asked, and the question in a knowledge point corresponds to the standard question. Once the corresponding knowledge point has been matched, text processing can be performed directly according to the answer of that knowledge point, so that large volumes of text content can be arranged automatically and efficiently. Moreover, when a text-processing rule needs to be added or modified, only the questions and answers in the knowledge points need to be added or modified. Text-processing rules can therefore be edited easily and flexibly, which keeps the product general-purpose, allows rapid deployment and debugging, and greatly reduces development cost.
Fig. 4 is a schematic diagram of the principle of a text processing method provided by an embodiment of the present invention. As shown in Fig. 4, the text processing method of this embodiment is implemented on the basis of an intelligent analysis engine in which one or more knowledge points are stored in advance. Matching each short sentence against the question of each knowledge point then amounts to inputting each short sentence into the intelligent analysis engine, which is configured to match the input short sentence against the question of each knowledge point and to output the answer of the knowledge point matched by the short sentence. Text processing is then performed, according to the output of the intelligent analysis engine, on the short sentence that matched the question of the knowledge point.
It should be understood that the intelligent analysis engine can be implemented by a software program deployed on the front end of the text processing system or on a cloud server. Inputting the short sentences split from the text to be processed into the engine directly determines the corresponding text-processing mode, so the whole process is convenient and fast and works generically across different types of text to be processed. Moreover, when a text-processing rule is to be changed, it suffices to add, modify, or delete knowledge points in the intelligent analysis engine; the text processing system does not have to be redeveloped, which further reduces development cost.
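A toy version of such an engine shows why changing a rule does not require touching the processing code. Everything in the sketch is hypothetical: the class and method names, the use of `difflib.SequenceMatcher` as the similarity measure, and in particular the `threshold` below which no knowledge point counts as matched, which the source does not specify.

```python
import difflib


class AnalysisEngine:
    """Toy engine: stores knowledge points and answers for short sentences."""

    def __init__(self):
        self._knowledge_points = {}  # question -> answer

    def upsert(self, question, answer):
        """Add a knowledge point or modify the answer of an existing one."""
        self._knowledge_points[question] = answer

    def delete(self, question):
        """Remove a knowledge point if it exists."""
        self._knowledge_points.pop(question, None)

    def answer_for(self, short_sentence, threshold=0.5):
        """Return the answer of the most similar question, or None.

        threshold is an assumed cutoff, not part of the source.
        """
        best_question, best_score = None, 0.0
        for question in self._knowledge_points:
            score = difflib.SequenceMatcher(
                None, short_sentence.lower(), question.lower()
            ).ratio()
            if score > best_score:
                best_question, best_score = question, score
        if best_question is None or best_score < threshold:
            return None
        return self._knowledge_points[best_question]


engine = AnalysisEngine()
engine.upsert("borrower", "mark_purple")
first = engine.answer_for("Borrower:")

engine.upsert("borrower", "fill_table")  # the rule changes, the code does not
second = engine.answer_for("Borrower:")

engine.delete("borrower")
third = engine.answer_for("Borrower:")
print(first, second, third)
```

The three calls to `answer_for` use the same short sentence and the same code path; only the stored knowledge point differs between them.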
In an embodiment of the present invention, because splitting the text to be processed can yield many short sentences, these short sentences need to be processed in an orderly manner to ensure the efficiency and accuracy of text processing. The short sentences obtained by splitting can therefore be input into the intelligent analysis engine one by one, and the corresponding text processing is performed on the short sentence associated with each result that the engine outputs, one result at a time.
In a further embodiment, after the short sentences obtained by splitting are input into the intelligent analysis engine, the determined text-processing manner may need to be carried out in the original text to be processed (for example, adjusting the font size or background color of the text content of the current short sentence). In that case, before each short sentence is matched against the question of each knowledge point, the text format of the short sentence and/or its position in the text to be processed is additionally recorded, so that the corresponding text processing can be carried out once the text-processing manner has been determined.
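As a minimal sketch of this recording step, each short sentence can be stored together with its character offsets in the text to be processed and a copy of its format. The record layout, the function name `record_positions` and the format keys are assumptions for illustration; the specification does not prescribe how format or position is represented.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ShortSentenceRecord:
    text: str
    start: int            # offset of the short sentence in the text to be processed
    end: int              # end offset (exclusive)
    fmt: Dict[str, object]  # recorded text format; the keys are illustrative only


def record_positions(text: str, short_sentences: List[str],
                     fmt: Dict[str, object]) -> List[ShortSentenceRecord]:
    """Locate each short sentence in order and remember where it came from."""
    records = []
    cursor = 0
    for sentence in short_sentences:
        start = text.find(sentence, cursor)
        if start < 0:
            raise ValueError(f"short sentence not found in source text: {sentence!r}")
        end = start + len(sentence)
        records.append(ShortSentenceRecord(sentence, start, end, dict(fmt)))
        cursor = end  # keep searching after the previous short sentence
    return records
```

With the offsets recorded, a later step can apply a format change to `text[start:end]` in the original document rather than to a detached copy of the short sentence.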
Fig. 5 is a schematic structural diagram of a text processing apparatus provided by an embodiment of the present invention. As shown in Fig. 5, the text processing apparatus 50 includes a knowledge database 51, a splitting module 52, a matching module 53 and a text processing module 54. Specifically, the knowledge database 51 includes one or more knowledge points, each knowledge point includes a question and an answer, the content of the question corresponds to standard text content, and the content of the answer corresponds to a text-processing manner. The splitting module 52 is configured to split the text to be processed into multiple short sentences. The matching module 53 is configured to match each short sentence against the question of each knowledge point. The text processing module 54 is configured to perform text processing on a short sentence according to the answer of a knowledge point when the short sentence is successfully matched with the question of that knowledge point.
The text processing apparatus 50 provided by this embodiment of the present invention provides one or more knowledge points, each including a question and an answer, and splits the content of the text to be processed into short sentences. The process of searching the content of the text to be processed and determining the text-processing rule is thereby converted into a question-answering process that takes the short sentence as its unit: the content of the short sentence corresponds to the question being asked, and the question in the knowledge point corresponds to a standard question. Once the corresponding knowledge point is matched, text processing is performed directly according to the answer of that knowledge point, so automated organization of large volumes of text content can be achieved efficiently. Moreover, when text-processing rules need to be added or changed, only the questions and answers in the knowledge points need to be added or modified, so the rules can be edited easily and flexibly; this preserves the generality of the product, enables rapid deployment and debugging, and greatly reduces development cost.
In an embodiment of the present invention, as shown in Fig. 6, the text processing apparatus 50 may further include an input module 55 and an intelligent analysis engine 56. The input module 55 is configured to input each short sentence into the intelligent analysis engine 56. The knowledge database 51 and the matching module 53 are integrated in advance in the intelligent analysis engine 56, which is configured to match the input short sentence against the question of each knowledge point and to output the answer of the knowledge point matched by the short sentence. The text processing module 54 is further configured to perform text processing, according to the output of the intelligent analysis engine 56, on the short sentence that matched the question of the knowledge point.
Matching each short sentence against the question of each knowledge point thus amounts to inputting each short sentence into the intelligent analysis engine 56, which is configured to match the input short sentence against the question of each knowledge point and to output the answer of the knowledge point matched by the short sentence. Text processing is then performed, according to the output of the intelligent analysis engine 56, on the short sentence that matched the question of the knowledge point.
It should be understood that the intelligent analysis engine 56 may be implemented by a software program deployed on the front end of the text processing system or on a cloud server. Inputting the short sentences obtained by splitting the text to be processed into the intelligent analysis engine 56 directly determines the corresponding text-processing manner, so the whole text-processing procedure is convenient and fast and can be applied generally to different types of text to be processed. In addition, in an embodiment of the present invention, as shown in Fig. 6, the text processing apparatus 50 may further include an editing module 57 configured to add, modify or delete the knowledge points in the intelligent analysis engine 56. When a text-processing rule needs to be changed, the text processing system therefore does not have to be redeveloped, which further reduces development cost.
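The role of the editing module 57 can be illustrated with a small in-memory store. This is a sketch under stated assumptions: the class `KnowledgeBase`, its method names, and the use of a plain string to name the text-processing manner are all hypothetical, and a real deployment would more likely persist knowledge points in a database.

```python
from typing import Dict, Optional


class KnowledgeBase:
    """Hypothetical in-memory store mapping question text to an answer name."""

    def __init__(self) -> None:
        self._points: Dict[str, str] = {}

    def add(self, question: str, answer: str) -> None:
        if question in self._points:
            raise KeyError(f"knowledge point already exists: {question!r}")
        self._points[question] = answer

    def modify(self, question: str, answer: str) -> None:
        if question not in self._points:
            raise KeyError(f"no such knowledge point: {question!r}")
        self._points[question] = answer

    def delete(self, question: str) -> None:
        # Raises KeyError if the knowledge point does not exist.
        del self._points[question]

    def answer_for(self, question: str) -> Optional[str]:
        return self._points.get(question)
```

Because rules live in data rather than in code, changing a rule in this sketch is a call to `add`, `modify` or `delete` and requires no change to the matching or processing logic.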
In an embodiment of the present invention, because splitting the text to be processed can yield many short sentences, these short sentences need to be processed in an orderly manner to ensure the efficiency and accuracy of text processing. The input module 55 may therefore be further configured to input the multiple short sentences obtained by splitting into the intelligent analysis engine 56 one by one. In this case, the text processing module 54 may be further configured to perform the corresponding text processing on the short sentence corresponding to each output that the intelligent analysis engine 56 produces one by one.
In an embodiment of the present invention, after the short sentences obtained by splitting are input into the intelligent analysis engine 56, the determined text-processing manner may need to be carried out in the original text to be processed (for example, adjusting the font size or background color of the text content of the current short sentence). In that case, as shown in Fig. 6, the text processing apparatus 50 may further include a recording module 58 configured to record, before each short sentence is matched against the question of each knowledge point, the text format of the short sentence and/or its position in the text to be processed.
In an embodiment of the present invention, the matching module 53 is further configured to compute the text similarity between each short sentence and the question of each knowledge point, and to take the knowledge point with the highest text similarity as the knowledge point successfully matched with the short sentence. In this case, the text processing module 54 may be further configured to perform text processing on the short sentence according to the answer of the successfully matched knowledge point. The text similarity computation may use one or more of the following methods: edit distance, n-gram, Jaro-Winkler and Soundex. However, the present invention does not strictly limit the specific implementation of the similarity computation.
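Of the methods listed, edit distance is the simplest to show. The sketch below computes the Levenshtein distance with the standard dynamic-programming recurrence, normalizes it to a similarity in [0, 1], and selects the question with the highest similarity. The normalization by the longer string's length and the function names are assumptions; the specification names the methods but does not define a scoring formula.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance by the longer length (an assumed convention)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def best_match(short_sentence: str, questions: List[str]) -> str:
    """Return the question with the highest similarity to the short sentence."""
    return max(questions, key=lambda q: similarity(short_sentence, q))
```

Taking the maximum always returns some knowledge point, even for an unrelated short sentence; a practical system would probably also require the best score to exceed a threshold, which is not something the specification states.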
In an embodiment of the present invention, as shown in Fig. 6, the splitting module 52 may include a recognition unit 521 and a split execution unit 522. The recognition unit 521 is configured to recognize preset split symbols in the text to be processed. The split execution unit 522 is configured to split off the text content between two adjacent preset split symbols as one short sentence.
In an embodiment of the present invention, the preset split symbols may include one or more of the following: punctuation marks, line breaks and preset split words.
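A minimal sketch of the splitting step, assuming a small illustrative set of split symbols and one example split word (none of which come from the specification), is:

```python
import re
from typing import List, Sequence


def split_text(text: str,
               split_symbols: Sequence[str] = (",", ".", ";", "\n"),
               split_words: Sequence[str] = ("and then",)) -> List[str]:
    """Split text into short sentences at preset split symbols and split words."""
    # Longer tokens first so that a multi-character split word wins over a shorter one.
    tokens = sorted(list(split_symbols) + list(split_words), key=len, reverse=True)
    if not tokens:
        return [text]
    pattern = "|".join(re.escape(t) for t in tokens)
    # Drop empty fragments that arise when two split symbols are adjacent.
    return [part.strip() for part in re.split(pattern, text) if part.strip()]
```

Note that this sketch also keeps the text before the first split symbol and after the last one as short sentences, which is a convenience assumption beyond the "between two adjacent split symbols" wording.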
In an embodiment of the present invention, the text-processing manner may include one or a combination of the following: adjusting the text format to a preset format, extracting the part of the short sentence before or after a preset character, organizing text content according to a preset rule, and adding a preset mark.
In an embodiment of the present invention, organizing text content according to a preset rule includes filling in a preset table or filling in a preset text template.
It should be understood that the specific content of the answer in a knowledge point (i.e. the text-processing manner and processing rule) may be designed or adjusted according to the actual application scenario; the present invention does not specifically limit the form of the answer in the knowledge point.
In an embodiment of the present invention, the text template may take the form of a semantic expression or of a regular expression.
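Two of the processing manners above can be sketched together: extracting the part of a short sentence after a preset character, and filling a preset table using a template written as a regular expression. The pattern `TEMPLATE`, the colon as the preset character, and the function names are illustrative assumptions, and treating the regular-expression template as a field extractor is only one possible reading of the embodiment.

```python
import re
from typing import Dict, List, Optional

# Hypothetical template: "<field>: <value>", expressed as a regular expression.
TEMPLATE = re.compile(r"(?P<field>[^:]+):\s*(?P<value>.+)")


def extract_after(short_sentence: str, marker: str = ":") -> Optional[str]:
    """Return the text after the preset character, or None if it is absent."""
    _, sep, tail = short_sentence.partition(marker)
    return tail.strip() if sep else None


def fill_table(short_sentences: List[str]) -> Dict[str, str]:
    """Fill a preset table (modelled as a dict) from short sentences that fit the template."""
    table: Dict[str, str] = {}
    for sentence in short_sentences:
        m = TEMPLATE.fullmatch(sentence.strip())
        if m:
            table[m.group("field").strip()] = m.group("value").strip()
    return table
```

Short sentences that do not fit the template are simply skipped here; how unmatched content should be handled is left open by the specification.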
In an embodiment of the present invention, the text to be processed may be obtained through a speech conversion process. The text to be processed may be data content from a third party (for example, contract text or customer information text from a third party), or text content obtained through the above speech conversion process (for example, a voice recording from a customer service scenario converted into text content); the present invention does not limit the specific content of the text to be processed.
An embodiment of the present invention also provides a computer device including a memory, a processor, and a computer program stored in the memory and executed by the processor, wherein the processor, when executing the computer program, implements the steps of the text processing method of any of the foregoing embodiments.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the text processing method of any of the foregoing embodiments. The computer storage medium may be any tangible medium, such as a floppy disk, CD-ROM, DVD, hard disk drive, or even a network medium.
It should be understood that each module or unit described in the text processing apparatus 50 provided by the above embodiments corresponds to a method step described earlier. The operations and features described for the foregoing method steps therefore apply equally to the text processing apparatus 50 and to the corresponding modules and units it contains, and the repeated content is not described again here.
It should be understood that although one implementation form of the embodiments of the present invention described above may be a computer program product, the methods or apparatuses of the embodiments of the present invention may be implemented in software, hardware, or a combination of software and hardware. The hardware part may be implemented with dedicated logic; the software part may be stored in a memory and executed by an appropriate instruction execution system, such as a microprocessor or specially designed hardware. Those of ordinary skill in the art will understand that the above methods and devices may be implemented using computer-executable instructions and/or embodied in processor control code, for example code provided on a carrier medium such as a disk, CD or DVD-ROM, in programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The methods and devices of the present invention may be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; by software executed by various types of processors; or by a combination of such hardware circuits and software, for example firmware.
It should be understood that although several modules or units of the apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to exemplary embodiments of the present invention, the features and functions of two or more modules/units described above may be implemented in a single module/unit; conversely, the features and functions of one module/unit described above may be further divided and implemented by multiple modules/units. In addition, some of the modules/units described above may be omitted in certain application scenarios.
It should be understood that, in order not to obscure the embodiments of the present invention, the specification describes only some key, and not necessarily essential, techniques and features, and may not explain some features that those skilled in the art are able to implement.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (15)
1. A text processing method, characterized in that one or more knowledge points are provided, each knowledge point including a question and an answer, the content of the question corresponding to standard text content and the content of the answer corresponding to a text-processing manner, the method comprising:
splitting a text to be processed into multiple short sentences;
matching each short sentence against the question of each knowledge point; and
when the short sentence is successfully matched with the question of the knowledge point, performing text processing on the short sentence according to the answer of the knowledge point.
2. The method according to claim 1, characterized in that the one or more knowledge points are stored in advance in an intelligent analysis engine;
wherein matching each short sentence against the question of each knowledge point comprises: inputting each short sentence into the intelligent analysis engine, wherein the intelligent analysis engine is configured to match the input short sentence against the question of each knowledge point and to output the answer of the knowledge point matched by the short sentence;
wherein performing text processing on the short sentence according to the answer of the knowledge point comprises: performing text processing, according to the output of the intelligent analysis engine, on the short sentence that matched the question of the knowledge point.
3. The method according to claim 2, characterized in that inputting each short sentence into the intelligent analysis engine comprises: inputting the multiple short sentences obtained by splitting into the intelligent analysis engine one by one;
wherein performing text processing, according to the output of the intelligent analysis engine, on the short sentence that matched the question of the knowledge point comprises: performing the corresponding text processing on the short sentence corresponding to each output that the intelligent analysis engine produces one by one.
4. The method according to claim 2, characterized by further comprising:
adding, modifying or deleting the knowledge points in the intelligent analysis engine.
5. The method according to claim 1, characterized in that, before each short sentence is matched against the question of each knowledge point, the method further comprises:
recording the text format of the short sentence and/or its position in the text to be processed.
6. The method according to claim 1, characterized in that matching each short sentence against the question of each knowledge point comprises:
computing the text similarity between each short sentence and the question of each knowledge point, and taking the knowledge point with the highest text similarity as the knowledge point successfully matched with the short sentence; wherein, when the short sentence is successfully matched with the question of the knowledge point, performing text processing on the short sentence according to the answer of the knowledge point comprises:
performing text processing on the short sentence according to the answer of the successfully matched knowledge point.
7. The method according to claim 1, characterized in that splitting the text to be processed into multiple short sentences comprises:
recognizing preset split symbols in the text to be processed; and
splitting off the text content between two adjacent preset split symbols as one short sentence.
8. The method according to claim 7, characterized in that the preset split symbols include one or more of the following: punctuation marks, line breaks and preset split words.
9. The method according to claim 1, characterized in that the text-processing manner includes one or a combination of the following: adjusting the text format to a preset format, extracting the part of the short sentence before or after a preset character, organizing text content according to a preset rule, and adding a preset mark.
10. The method according to claim 9, characterized in that organizing text content according to a preset rule comprises: filling in a preset table or filling in a preset text template.
11. The method according to claim 10, characterized in that the text template takes the form of a semantic expression or of a regular expression.
12. The method according to claim 1, characterized in that the text to be processed is obtained through a speech conversion process.
13. A text processing apparatus, characterized by comprising:
a knowledge database including one or more knowledge points, each knowledge point including a question and an answer, the content of the question corresponding to standard text content and the content of the answer corresponding to a text-processing manner;
a splitting module configured to split a text to be processed into multiple short sentences;
a matching module configured to match each short sentence against the question of each knowledge point; and
a text processing module configured to perform text processing on the short sentence according to the answer of the knowledge point when the short sentence is successfully matched with the question of the knowledge point.
14. A computer device, including a memory, a processor, and a computer program stored in the memory and executed by the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 12.
15. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810149309.5A CN108363693A (en) | 2018-02-13 | 2018-02-13 | Text handling method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108363693A true CN108363693A (en) | 2018-08-03 |
Family
ID=63002713
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810149309.5A Pending CN108363693A (en) | 2018-02-13 | 2018-02-13 | Text handling method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108363693A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103927358A (en) * | 2014-04-15 | 2014-07-16 | 清华大学 | Text search method and system |
CN104850539A (en) * | 2015-05-28 | 2015-08-19 | 宁波薄言信息技术有限公司 | Natural language understanding method and travel question-answering system based on same |
CN105893524A (en) * | 2016-03-31 | 2016-08-24 | 上海智臻智能网络科技股份有限公司 | Intelligent asking and answering method and device |
CN106649209A (en) * | 2016-12-30 | 2017-05-10 | 深圳天珑无线科技有限公司 | Text display method and device |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109460453A (en) * | 2018-10-09 | 2019-03-12 | 北京来也网络科技有限公司 | Data processing method and device for positive negative sample |
CN109614463A (en) * | 2018-10-24 | 2019-04-12 | 阿里巴巴集团控股有限公司 | Text matches processing method and processing device |
CN109614463B (en) * | 2018-10-24 | 2023-02-03 | 创新先进技术有限公司 | Text matching processing method and device |
CN109840274B (en) * | 2018-12-28 | 2021-11-30 | 北京百度网讯科技有限公司 | Data processing method and device and storage medium |
CN109840274A (en) * | 2018-12-28 | 2019-06-04 | 北京百度网讯科技有限公司 | Data processing method and device, storage medium |
CN111191421A (en) * | 2019-12-30 | 2020-05-22 | 出门问问信息科技有限公司 | Text processing method and device, computer storage medium and electronic equipment |
CN111191421B (en) * | 2019-12-30 | 2023-09-12 | 出门问问创新科技有限公司 | Text processing method and device, computer storage medium and electronic equipment |
CN111507082A (en) * | 2020-04-23 | 2020-08-07 | 北京奇艺世纪科技有限公司 | Text processing method and device, storage medium and electronic device |
CN111967270A (en) * | 2020-08-16 | 2020-11-20 | 云知声智能科技股份有限公司 | Method and equipment based on character and semantic fusion |
CN111967270B (en) * | 2020-08-16 | 2023-11-21 | 云知声智能科技股份有限公司 | Method and equipment based on fusion of characters and semantics |
CN112100976A (en) * | 2020-09-24 | 2020-12-18 | 上海松鼠课堂人工智能科技有限公司 | Knowledge point relation marking method and system |
CN112100976B (en) * | 2020-09-24 | 2021-11-16 | 上海松鼠课堂人工智能科技有限公司 | Knowledge point relation marking method and system |
CN112632258A (en) * | 2020-12-30 | 2021-04-09 | 太平金融科技服务(上海)有限公司 | Text data processing method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180803 |