CN108363693A - Text handling method and device - Google Patents

Info

Publication number
CN108363693A
Authority
CN
China
Prior art keywords
text
short sentence
knowledge point
processing
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810149309.5A
Other languages
Chinese (zh)
Inventor
李陟
朱频频
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhizhen Intelligent Network Technology Co Ltd
Original Assignee
Shanghai Zhizhen Intelligent Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhizhen Intelligent Network Technology Co Ltd
Priority to CN201810149309.5A
Publication of CN108363693A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Abstract

An embodiment of the present invention provides a text handling method and device that address the high development cost, long project cycle, and difficult maintenance incurred when existing text processing approaches modify or add text processing rules. The text handling method provides one or more knowledge points, each comprising a question and an answer; the content of the question corresponds to standard text content, and the content of the answer corresponds to a text processing mode. The method includes: splitting the text to be processed into multiple short sentences; matching each short sentence against the question of each knowledge point; and, when a short sentence successfully matches the question of a knowledge point, performing text processing on the short sentence according to the answer of that knowledge point.

Description

Text handling method and device
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a text handling method and device.
Background
With the continuing development of artificial intelligence technology and rising expectations for interactive experience, intelligent interaction is gradually replacing some traditional human-computer interaction modes and has become a research hotspot. When performing semantic analysis on speech, existing intelligent interaction approaches must apply text processing to the text content of the speech. However, existing text processing searches for the required content with regular expressions and then applies the corresponding processing to the content found. Although this approach can achieve simple text processing in less complex cases, it is difficult to debug and difficult to extend with new text processing rules, so the resulting text processing system is essentially a "single-use" product. Modifying or adding a text processing rule requires updating the regular expressions and/or the search algorithm, at a cost close to that of developing a new text processing system, which greatly increases development cost, lengthens the project cycle, and makes maintenance difficult.
Summary of the invention
In view of this, embodiments of the present invention provide a text handling method and device that address the high development cost, long project cycle, and difficult maintenance incurred when existing text processing approaches modify or add text processing rules.
According to one aspect of the present invention, a text handling method is provided. One or more knowledge points are provided, each comprising a question and an answer; the content of the question corresponds to standard text content, and the content of the answer corresponds to a text processing mode. The method includes: splitting the text to be processed into multiple short sentences; matching each short sentence against the question of each knowledge point; and, when a short sentence successfully matches the question of a knowledge point, performing text processing on the short sentence according to the answer of that knowledge point.
In one embodiment, the one or more knowledge points are stored in advance in an intelligent analysis engine. Matching each short sentence against the question of each knowledge point includes: inputting each short sentence into the intelligent analysis engine, which is configured to match the input short sentence against the question of each knowledge point and to output the answer of the knowledge point matched by the short sentence. Performing text processing on the short sentence according to the answer of the knowledge point includes: performing text processing, according to the output of the intelligent analysis engine, on the short sentence that matched the question of the knowledge point.
In one embodiment, inputting each short sentence into the intelligent analysis engine includes: inputting the short sentences obtained by splitting into the intelligent analysis engine one by one. Performing text processing according to the output of the intelligent analysis engine includes: performing the corresponding text processing on the short sentence that corresponds to each output the intelligent analysis engine produces, one by one.
In one embodiment, the method further includes: adding, modifying, or deleting knowledge points in the intelligent analysis engine.
In one embodiment, before each short sentence is matched against the question of each knowledge point, the method further includes: recording the text format of the short sentence and/or its position in the text to be processed.
In one embodiment, matching each short sentence against the question of each knowledge point includes: computing the text similarity between each short sentence and the question of each knowledge point, and taking the knowledge point with the highest text similarity as the knowledge point successfully matched with the short sentence. Performing text processing on the short sentence when it successfully matches the question of a knowledge point then includes: performing text processing on the short sentence according to the answer of the successfully matched knowledge point.
In one embodiment, splitting the text to be processed into multiple short sentences includes: identifying preset split symbols in the text to be processed; and splitting off the text content between two adjacent preset split symbols as one short sentence.
In one embodiment, the preset split symbols include one or more of the following: punctuation marks, line feeds, and preset split words.
In one embodiment, the text processing mode includes one or a combination of the following: adjusting to a preset text format, extracting the short sentence before or after a preset character, arranging text content according to a preset rule, and adding a preset mark.
In one embodiment, arranging text content according to a preset rule includes: filling in a preset table or filling in a preset text template.
In one embodiment, the text template takes the form of a semantic expression or a regular expression.
In one embodiment, the text to be processed is obtained by speech conversion.
According to another aspect of the present invention, a text processing device is provided, including: a knowledge database comprising one or more knowledge points, each comprising a question and an answer, where the content of the question corresponds to standard text content and the content of the answer corresponds to a text processing mode; a splitting module configured to split the text to be processed into multiple short sentences; a matching module configured to match each short sentence against the question of each knowledge point; and a text processing module configured to perform text processing on a short sentence according to the answer of a knowledge point when the short sentence successfully matches the question of that knowledge point.
According to another aspect of the present invention, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable by the processor; when executing the computer program, the processor implements the steps of any of the text handling methods described above.
According to another aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements the steps of any of the text handling methods described above.
With the text handling method and device provided by the embodiments of the present invention, one or more knowledge points comprising a question and an answer are provided and the content of the text to be processed is split into short sentences, so that searching the text to be processed and determining the text processing rule becomes a question-answering process in units of short sentences: the content of a short sentence corresponds to the question being asked, and the question in a knowledge point corresponds to the standard question. Once the corresponding knowledge point is matched, text processing can be carried out directly according to its answer, so that large volumes of text content can be organized automatically and efficiently. In addition, when a text processing rule needs to be added or modified, only the questions and answers in the knowledge points need to be added or modified, so the text processing rules can be edited easily and flexibly, which keeps the product general-purpose, enables rapid deployment and debugging, and greatly reduces development cost.
Description of the drawings
Fig. 1 is a flow diagram of a text handling method provided by one embodiment of the present invention.
Fig. 2 is a flow diagram of the process of splitting the text to be processed in a text handling method provided by one embodiment of the present invention.
Fig. 3 is a flow diagram of the text similarity computation in a text handling method provided by one embodiment of the present invention.
Fig. 4 is a schematic diagram of the principle of a text handling method provided by one embodiment of the present invention.
Fig. 5 is a structural diagram of a text processing device provided by one embodiment of the present invention.
Fig. 6 is a structural diagram of a text processing device provided by one embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a flow diagram of a text handling method provided by one embodiment of the present invention. The text handling method provides one or more knowledge points, each comprising a question and an answer; the content of the question corresponds to standard text content, and the content of the answer corresponds to a text processing mode. The correspondence between questions and answers can be established in advance through a pre-learning process. As shown in Fig. 1, the text handling method includes the following steps:
Step 101: Split the text to be processed into multiple short sentences.
Splitting the text to be processed into multiple short sentences makes each short sentence a basic unit of text processing; each short sentence obtained by splitting is subsequently matched against the knowledge points to determine the corresponding text processing mode. The text to be processed may be data content from a third party (such as a contract text or a customer information text), or text content obtained by a prior speech conversion process (such as text converted from a voice recording in a customer service scenario). The present invention does not limit the specific content of the text to be processed.
In an embodiment of the present invention, the text to be processed can be split as follows. As shown in Fig. 2, the splitting process may include the following steps:
Step 1011: Identify the preset split symbols in the text to be processed.
The preset split symbols may include one or more of the following: punctuation marks, line feeds, and preset split words. The specific content of the preset split words can depend on the actual application scenario and is not limited by the present invention.
For example, consider the following contract text:
"Borrower: Company I
Legal representative (responsible person): User H
Legal address: No. Z, Y Road, X District, S City"
The colon ":" and the hidden line feed can be set as preset split symbols. When the text to be processed is scanned, the preset split symbols in it can be identified in a predetermined traversal order (for example, line by line from left to right).
Step 1012: Split off the text content between two adjacent preset split symbols as one short sentence.
For example, splitting the contract text above yields the following short sentences: "Borrower:", "Company I", "Legal representative (responsible person):", "User H", "Legal address:", and "No. Z, Y Road, X District, S City". Each short sentence obtained by splitting then serves as a basic unit of text processing in the subsequent matching against the knowledge points.
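Steps 1011 and 1012 can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: the function name `split_into_short_sentences`, the regular expression, and the choice to keep a colon attached to the text before it (as in the short sentences listed above) are all assumptions.

```python
import re

# Assumed preset split symbols: ASCII and full-width colons, plus line feeds.
SPLIT_PATTERN = re.compile(r"([:：]|\n)")


def split_into_short_sentences(text: str) -> list[str]:
    """Split text at the preset split symbols.

    A colon stays attached to the text before it; a line feed is discarded.
    """
    sentences = []
    current = ""
    # The capturing group makes re.split return the split symbols as well.
    for piece in SPLIT_PATTERN.split(text):
        if piece in (":", "："):
            current += piece
            if current.strip():
                sentences.append(current.strip())
            current = ""
        elif piece == "\n":
            if current.strip():
                sentences.append(current.strip())
            current = ""
        else:
            current += piece
    if current.strip():
        sentences.append(current.strip())
    return sentences


contract = (
    "Borrower: Company I\n"
    "Legal representative (responsible person): User H\n"
    "Legal address: No. Z, Y Road, X District, S City"
)
short_sentences = split_into_short_sentences(contract)
```

Preset split words could be handled the same way by adding them as alternatives to the pattern.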
Step 102: Match each short sentence against the question of each knowledge point.
As mentioned above, each knowledge point comprises a question and an answer; the content of the question corresponds to standard text content, and the content of the answer corresponds to a text processing mode. Matching each short sentence against the questions of the knowledge points determines the matched standard text content and, from it, the corresponding answer, that is, the corresponding text processing mode. For example, for the contract text above, the knowledge points provided may include two knowledge points whose questions are "Please provide the borrower's information" and "Please provide the legal representative's information". When the short sentence "Borrower:" is matched against the questions of the knowledge points provided, the matched question can be determined to be "Please provide the borrower's information", and the corresponding text processing can be applied to the short sentence "Borrower:" according to the answer of that knowledge point. Likewise, when the short sentence "Legal representative (responsible person):" is matched against the questions of the knowledge points provided, the matched question can be determined to be "Please provide the legal representative's information", and the corresponding text processing can be applied to that short sentence according to the answer of that knowledge point.
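One way to picture a knowledge point is as a small record pairing a question with an answer. The sketch below assumes that representation; the class name `KnowledgePoint`, the answer labels `mark_purple` and `mark_yellow`, and the helper `answer_for` are hypothetical and only illustrate the question-to-answer lookup, not the matching itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgePoint:
    question: str  # corresponds to standard text content
    answer: str    # names a text processing mode (hypothetical labels)


KNOWLEDGE_POINTS = [
    KnowledgePoint("Please provide the borrower's information", "mark_purple"),
    KnowledgePoint(
        "Please provide the legal representative's information", "mark_yellow"
    ),
]


def answer_for(question, knowledge_points=KNOWLEDGE_POINTS):
    """Return the answer stored with a question, or None if it is unknown."""
    for point in knowledge_points:
        if point.question == question:
            return point.answer
    return None
```

Because a rule is just one record, adding or changing a processing rule means editing this list rather than the matching code.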
In an embodiment of the present invention, matching each short sentence against the question of each knowledge point can be implemented by text similarity computation. Specifically, as shown in Fig. 3, the text similarity computation may include the following steps:
Step 1021: Compute the text similarity between each short sentence and the question of each knowledge point.
Step 1022: Take the knowledge point with the highest text similarity as the knowledge point successfully matched with the short sentence.
When a short sentence successfully matches the question of a knowledge point, text processing is subsequently applied to the short sentence according to the answer of the successfully matched knowledge point. The text similarity computation can use one or more of the following methods: edit distance, n-gram, Jaro-Winkler, and Soundex. However, the present invention does not strictly limit how the similarity computation is implemented.
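Steps 1021 and 1022 can be illustrated with edit distance, one of the methods listed above. The sketch below is an assumption-laden example: normalising the Levenshtein distance by the longer string's length and lower-casing both strings are choices made here, and the names `similarity` and `best_match` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map edit distance to a score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def best_match(short_sentence: str, questions: list[str]) -> str:
    """Step 1022: return the question with the highest text similarity."""
    return max(
        questions,
        key=lambda q: similarity(short_sentence.lower(), q.lower()),
    )


matched = best_match(
    "Borrower:",
    ["Who is the borrower?", "Who is the legal representative?"],
)
```

A real deployment would probably also require a minimum score before declaring a match; the text above only specifies taking the maximum.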
In an embodiment of the present invention, since the short sentences obtained by splitting may appear in many variant forms, the questions in the knowledge points can be implemented as semantic templates to make the text processing more intelligent.
A semantic template can be a set of one or more abstract semantic expressions representing a certain semantic content. It is generated by developers from the semantic content according to predetermined rules, so that a single semantic template can describe sentences that express the corresponding semantic content in many different ways, which accommodates the possible variants of the short sentences obtained by splitting. Matching the short sentences obtained by splitting against preset semantic templates thus avoids the limitation of identifying user messages with a standard semantic template that can describe only one way of expression.
Each abstract semantic expression mainly includes semantic component words and semantic rule words. Semantic component words are represented by semantic component symbols; once these symbols are filled with corresponding values (that is, content), they can express a wide variety of specific semantics.
The semantic component symbols of abstract semantics may include:
[concept]: a word or phrase expressing the subject or object component.
For example: "CRBT" in "how is CRBT activated".
[action]: a word or phrase expressing the action component.
For example: "handled" in "how is a credit card handled".
[attribute]: a word or phrase expressing the attribute component.
For example: "color" in "which colors does the iPhone have".
[adjective]: a word or phrase expressing the modifier component.
For example: "cheap" in "which brand of refrigerator is cheap".
Examples of some main abstract semantic categories are:
Concept explanation: what is [concept]
Attribute composition: which [attribute] does [concept] have
Behavior: how does [concept] [action]
Behavior place: where does [concept] [action]
Behavior reason: why does [concept] [action]
Behavior prediction: will [concept] [action] or not
Behavior judgment: does [concept] have [attribute] or not
Attribute status: the [attribute] of [concept] is [adjective]
Attribute judgment: whether [concept] has [attribute]
Attribute reason: why the [attribute] of [concept] is so [adjective]
Concept comparison: where is the difference between [concept1] and [concept2]
Attribute comparison: how does the [attribute] of [concept1] and [concept2] differ
At the abstract semantic level, the components of a question can generally be judged by part-of-speech tagging: the part of speech corresponding to concept is noun, to action is verb, to attribute is noun, and to adjective is adjective.
Taking the abstract semantics "[concept] how [action]" of the category "behavior" as an example, the abstract semantic set of this category may include multiple abstract semantic expressions:
Abstract semantics classification:Behavior
Abstract semantics expression formula:
a. [concept] [need | should] [how] <[can with]> <carry out> [action]
b. {[concept]~[action]}
c. [concept] <'s> [action] <method | mode | step>
d. <which has | what has | either with or without> <pass through | use |> [concept] [action] <'s> [method]
e. [how] [action]~[concept]
The four abstract semantic expressions a, b, c, and d above all describe the abstract semantic category "behavior". The semantic symbol "|" denotes an "or" relationship, and the semantic symbol "" indicates that the component is optional.
It should be understood, however, that although some examples of semantic component words, semantic rule words, and semantic symbols are given above, the specific content and parts of speech of the semantic component words, the specific content and parts of speech of the semantic rule words, and the definition and combination of the semantic symbols can all be preset by developers according to the specific interactive business scenario in which the method is applied. The present invention does not limit this.
In a further embodiment, when semantic component words and semantic rule words are identified in the short sentences obtained by splitting, the semantic component words and semantic rule words contained in the user's speech information and in the semantic templates can also be converted into simplified text strings, and the similarity can then be computed on these strings to make the similarity computation more efficient.
In an embodiment of the present invention, as mentioned above, a semantic template can be composed of semantic component words and semantic rule words, and these words are in turn related to the parts of speech of the words in the semantic template and to the grammatical relations between them. The similarity computation can therefore proceed as follows: first identify the words in the text of the user's speech information together with their parts of speech and grammatical relations; then identify the semantic component words and semantic rule words among them according to those parts of speech and grammatical relations; and finally feed the identified semantic component words and semantic rule words into a vector space model to compute the text similarities between the short sentence obtained by splitting and the multiple preset semantic templates. In an embodiment of the present invention, the words in a short sentence obtained by splitting, their parts of speech, and the grammatical relations between them can be identified with one or more of the following word segmentation methods: the hidden Markov model method, the forward maximum matching method, the reverse maximum matching method, and the named entity recognition method.
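As a rough illustration of two of the ingredients named above, the sketch below segments a string by forward maximum matching and then compares two token lists with cosine similarity over bag-of-words vectors, a simple vector space model. The dictionary contents, the `max_len` limit, the Chinese sample string (a rendering of "how is a credit card handled"), and the function names are assumptions; part-of-speech tagging and grammatical relations are left out.

```python
import math
from collections import Counter


def forward_maximum_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest dictionary word that starts there;
    fall back to a single character when nothing in the dictionary matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical dictionary: "credit card", "how", "handle".
dictionary = {"信用卡", "怎么", "办理"}
tokens = forward_maximum_match("信用卡怎么办理", dictionary)
```

Reverse maximum matching follows the same idea but scans from the end of the string toward the start.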
In an embodiment of the present invention, as mentioned above, a semantic template can be a set of multiple abstract semantic expressions representing a certain semantic content. A single semantic template can then describe sentences that express the corresponding semantic content in many different ways, corresponding to multiple extended expressions of the same standard expression. Therefore, when computing the text similarity between a short sentence obtained by splitting and the preset semantic templates, it is necessary to compute the text similarity between the short sentence and at least one extension question expanded from each of the multiple preset semantic templates, and then take the semantic template corresponding to the extension question with the highest text similarity as the matched semantic template. These extension questions can be derived from the semantic component words and/or semantic rule words and/or semantic symbols that the semantic template contains.
For example, "Please provide the borrower's information" and "Who is the borrower?" can be two different extension questions expanded from the same semantic template "[please provide] [borrower | creditor] [information | identity] [is | be] [who]". When the short sentence split out is "Borrower:", the text similarity computation may determine that the extension question with the highest text similarity is "Who is the borrower?", which determines the matched semantic template. When the short sentence split out is "Borrower information:", the text similarity computation may instead determine that the extension question with the highest text similarity is "Please provide the borrower's information", yet the matched semantic template is still the same one. Expressing the questions of knowledge points as semantic templates thus avoids the limitation of identifying user messages with a standard semantic template that can describe only one way of expression.
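The expansion of a semantic template into extension questions, and the mapping from the best extension question back to its template, might look like the sketch below. It uses a deliberately simplified template notation (a list of slots, each listing alternatives, with an empty string marking an optional slot) rather than the semantic expression syntax of the original, and it swaps in `difflib.SequenceMatcher` from the Python standard library as the similarity measure. The template names and slot contents are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical templates in a simplified notation: one list per slot.
TEMPLATES = {
    "borrower_info": [
        ["who is", "please provide"],
        ["the"],
        ["borrower", "creditor"],
        ["", "information", "identity"],
    ],
    "legal_rep_info": [
        ["who is", "please provide"],
        ["the"],
        ["legal representative"],
        ["", "information"],
    ],
}


def expand(slots):
    """Enumerate every extension question a template can produce."""
    return [" ".join(word for word in combo if word) for combo in product(*slots)]


def match_template(short_sentence, templates=TEMPLATES):
    """Return (template name, extension question) with the highest similarity."""
    def score(question):
        return SequenceMatcher(None, short_sentence.lower(), question).ratio()

    candidates = (
        (name, question)
        for name, slots in templates.items()
        for question in expand(slots)
    )
    return max(candidates, key=lambda pair: score(pair[1]))
```

Different short sentences such as "Borrower:" and "Borrower information:" can land on different extension questions and still resolve to the same template, which is the point made in the paragraph above.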
Step 103: When a short sentence successfully matches the question of a knowledge point, perform text processing on the short sentence according to the answer of that knowledge point.
When the matched knowledge point has been determined through the matching process (for example, by the text similarity computation described above), the answer in that knowledge point is determined as well, and text processing can be applied to the short sentence according to that answer. In an embodiment of the present invention, the text processing mode may include one or a combination of the following: adjusting to a preset text format (for example, font size, typeface, or font color), extracting the short sentence before or after a preset character, arranging text content according to a preset rule (for example, filling in a preset table or a preset text template), and adding a preset mark.
In an embodiment of the present invention, when the text processing mode includes filling in a preset text template, the text template can take the form of a semantic expression or a regular expression. The present invention does not limit the concrete form of the text template.
For example, taking the contract text above, the answers corresponding to the two items of standard text content "borrower" and "legal representative" may be marking in purple and marking in yellow respectively, so that the short sentences "Borrower:" and "Legal representative (responsible person):" obtained by splitting are marked purple and yellow in the text to be processed. As another example, the answers corresponding to "borrower" and "legal representative" may each be to fill in a preset table and also to fill the short sentence that follows the corresponding short sentence into a preset position of that table; the resulting text processing result can be as shown in the following table:
Borrower:                                     Company I
Legal representative (responsible person):    User H
In another implementation of the present invention, in addition to processing the current short sentence, the text processing mode may also include extracting the short sentence before or after a preset character and applying text processing to the extracted short sentence as well. Taking the contract text above as an example, the answers corresponding to "borrower" and "legal representative" may be marking in purple and marking in yellow respectively, while "Company I", the short sentence following "Borrower:", may also be marked purple, and "User H", the short sentence following "Legal representative (responsible person):", may also be marked yellow. Which approach is used depends on the text processing mode set by the user.
It should be understood, however, that the specific content and processing rules of the answer in a knowledge point (that is, the text processing mode) can be designed or adjusted according to the actual application scenario. The present invention does not specifically limit the concrete form of the answers in the knowledge points.
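The two processing modes in the contract example (marking in a colour, and filling the following short sentence into a table) can be sketched together as follows. This is only an illustration under assumptions: the `RULES` mapping stands in for answers already obtained by matching, the colour names are plain strings rather than real formatting, and `process` is a hypothetical helper.

```python
# Hypothetical answers, keyed by the label short sentence they apply to.
RULES = {
    "Borrower:": "purple",
    "Legal representative (responsible person):": "yellow",
}


def process(short_sentences, rules=RULES):
    """Mark each label and the short sentence after it; also build a table.

    Returns (marks, table): marks maps a short sentence to its colour,
    table maps a label (without the colon) to the short sentence after it.
    """
    marks = {}
    table = {}
    for index, sentence in enumerate(short_sentences):
        colour = rules.get(sentence)
        if colour is None:
            continue  # no knowledge point answer applies to this short sentence
        marks[sentence] = colour
        if index + 1 < len(short_sentences):
            following = short_sentences[index + 1]
            marks[following] = colour
            table[sentence.rstrip(":")] = following
    return marks, table


marks, table = process(
    [
        "Borrower:",
        "Company I",
        "Legal representative (responsible person):",
        "User H",
    ]
)
```

Changing a colour or dropping the table step only touches `RULES` or the body of `process`, not the splitting or matching stages.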
It can be seen that, with the text handling method provided by the embodiments of the present invention, one or more knowledge points comprising a question and an answer are provided and the content of the text to be processed is split into short sentences, so that searching the text to be processed and determining the text processing rule becomes a question-answering process in units of short sentences: the content of a short sentence corresponds to the question being asked, and the question in a knowledge point corresponds to the standard question. Once the corresponding knowledge point is matched, text processing can be carried out directly according to its answer, so that large volumes of text content can be organized automatically and efficiently. In addition, when a text processing rule needs to be added or modified, only the questions and answers in the knowledge points need to be added or modified, so the text processing rules can be edited easily and flexibly, which keeps the product general-purpose, enables rapid deployment and debugging, and greatly reduces development cost.
Fig. 4 show a kind of principle schematic of text handling method of one embodiment of the invention offer.As shown in figure 4, The text handling method that the embodiment is provided is based on an intellectual analysis engine implementation, is prestored in the intellectual analysis engine One or more knowledge points.In this way when being matched each short sentence with the problem of each knowledge point, being in fact will be each Short sentence inputs intellectual analysis engine, and intellectual analysis engine is configured to the short sentence that will be inputted and progress the problem of each knowledge point Match, and exports the answer of the knowledge point of short sentence.According to the output result of intellectual analysis engine again to matching knowledge point the problem of Short sentence carries out text-processing.
It should be understood that the intelligent analysis engine may be implemented by a software program deployed on the front end of the text processing system or on a cloud server. By inputting the short sentences obtained by splitting the text to be processed into the intelligent analysis engine, the corresponding text processing mode can be determined directly. The whole text processing procedure is convenient and fast, and the same procedure can be applied to different types of text to be processed. Moreover, when a text processing rule is to be changed, the knowledge points in the intelligent analysis engine are simply added, modified, or deleted, and the text processing system does not need to be redeveloped, which further reduces development cost.
In an embodiment of the present invention, since splitting the text to be processed may yield many short sentences, these short sentences need to be processed in an orderly manner to ensure the efficiency and accuracy of the text processing. In this case, the short sentences obtained by splitting may be input into the intelligent analysis engine one by one, while the corresponding text processing is performed on the short sentence that corresponds to each output result produced one by one by the intelligent analysis engine.
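One way to picture this one-by-one handling is a generator that pairs every short sentence with its processed result in input order. This is only a sketch: `process_one_by_one`, `toy_engine`, and `toy_apply` are hypothetical names, and leaving unmatched sentences unchanged is an assumption rather than something the specification states.

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple


def process_one_by_one(
    short_sentences: Iterable[str],
    engine: Callable[[str], Optional[str]],
    apply: Callable[[str, str], str],
) -> Iterator[Tuple[str, str]]:
    """Feed short sentences to the engine in order and process each result."""
    for sentence in short_sentences:
        answer = engine(sentence)  # one input, one output
        if answer is None:
            # Assumption: an unmatched short sentence is left unchanged.
            yield sentence, sentence
        else:
            yield sentence, apply(answer, sentence)


# Toy engine: a dictionary lookup from short sentence to answer label.
toy_engine = {"total amount": "upper"}.get


def toy_apply(answer: str, sentence: str) -> str:
    # Toy processing mode: upper-case the sentence when the answer says so.
    return sentence.upper() if answer == "upper" else sentence


results = list(process_one_by_one(["total amount", "thanks"], toy_engine, toy_apply))
print(results)
```

Because each output is consumed before the next input is submitted, results cannot be attributed to the wrong short sentence.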
In a further embodiment, after the short sentences obtained by splitting are input into the intelligent analysis engine, the determined text processing mode may have to be carried out in the original text to be processed (for example, adjusting the font size or background color of the text content of the current short sentence). In this case, before each short sentence is matched against the question of each knowledge point, the text format of the short sentence and/or its position in the text to be processed is additionally recorded, so that the corresponding text processing can be carried out once the text processing mode has been determined.
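Recording the position of each short sentence could look like the following sketch, which keeps character offsets into the original text so that a later processing step can locate the short sentence again. The `ShortSentence` type, the helper names, and the default splitter pattern are assumptions; recording text format (font, color) is omitted because it depends on the document format in use.

```python
import re
from dataclasses import dataclass


@dataclass
class ShortSentence:
    text: str
    start: int  # offset of the first character in the original text
    end: int    # offset one past the last character


def _append_piece(pieces, text, start, end):
    raw = text[start:end]
    stripped = raw.strip()
    if not stripped:
        return  # skip empty segments between consecutive splitters
    offset = start + (len(raw) - len(raw.lstrip()))
    pieces.append(ShortSentence(stripped, offset, offset + len(stripped)))


def split_with_positions(text: str, splitters: str = r"[,.;!?\n]"):
    """Split text on the splitter pattern and keep each piece's span."""
    pieces = []
    cursor = 0
    for match in re.finditer(splitters, text):
        _append_piece(pieces, text, cursor, match.start())
        cursor = match.end()
    _append_piece(pieces, text, cursor, len(text))
    return pieces


doc = "Hello, world. Bye"
for piece in split_with_positions(doc):
    print(piece, repr(doc[piece.start:piece.end]))
```

Slicing the original text with the recorded offsets returns exactly the short sentence, which is what an in-place operation such as changing a background color would need.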
Fig. 5 is a schematic structural diagram of a text processing apparatus provided by an embodiment of the present invention. As shown in Fig. 5, the text processing apparatus 50 includes a knowledge database 51, a splitting module 52, a matching module 53, and a text processing module 54. Specifically, the knowledge database 51 includes one or more knowledge points, each knowledge point includes a question and an answer, the content of the question corresponds to standard text content, and the content of the answer corresponds to a text processing mode. The splitting module 52 is configured to split the text to be processed into a plurality of short sentences. The matching module 53 is configured to match each short sentence against the question of each knowledge point. The text processing module 54 is configured to perform text processing on a short sentence according to the answer of a knowledge point when the short sentence is successfully matched with the question of that knowledge point.
The text processing apparatus 50 provided by the embodiments of the present invention provides one or more knowledge points, each including a question and an answer, and splits the content of the text to be processed into short sentences. The process of searching the text to be processed and determining the text processing rule is thereby converted into a question-answering process that takes the short sentence as its unit: the content of a short sentence corresponds to the question being asked, and the question in a knowledge point corresponds to a standard question. Once a short sentence is matched to a knowledge point, text processing can be carried out directly according to the answer of that knowledge point, so that large volumes of text content can be organized automatically and efficiently. Moreover, when a text processing rule needs to be added or changed, only the questions and answers in the knowledge points need to be added or modified, so the rules can be edited easily and flexibly. This preserves the generality of the product, allows rapid deployment and debugging, and greatly reduces development cost.
In an embodiment of the present invention, as shown in Fig. 6, the text processing apparatus 50 may further include an input module 55 and an intelligent analysis engine 56. The input module 55 is configured to input each short sentence into the intelligent analysis engine 56. The knowledge database 51 and the matching module 53 are integrated in advance in the intelligent analysis engine 56, which is configured to match the input short sentence against the question of each knowledge point and to output the answer of the knowledge point matched by the short sentence. The text processing module 54 is further configured to perform text processing, according to the output of the intelligent analysis engine 56, on the short sentence that matched the question of the knowledge point.
Matching each short sentence against the question of each knowledge point thus amounts to inputting each short sentence into the intelligent analysis engine 56, which matches the input short sentence against the question of each knowledge point and outputs the answer of the matched knowledge point. Text processing is then performed, according to the output of the intelligent analysis engine 56, on the short sentence that matched the question of the knowledge point.
It should be understood that the intelligent analysis engine 56 may be implemented by a software program deployed on the front end of the text processing system or on a cloud server. By inputting the short sentences obtained by splitting the text to be processed into the intelligent analysis engine 56, the corresponding text processing mode can be determined directly. The whole text processing procedure is convenient and fast, and the same procedure can be applied to different types of text to be processed. In addition, in an embodiment of the present invention, as shown in Fig. 6, the text processing apparatus 50 may further include an editing module 57 configured to add, modify, or delete knowledge points in the intelligent analysis engine 56. In this way, when a text processing rule needs to be changed, the text processing system does not have to be redeveloped, which further reduces development cost.
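The add, modify, and delete operations of the editing module can be sketched as a small in-memory store. The class name `KnowledgeBase`, its method names, and the choice of a dictionary keyed by question text are assumptions for illustration only; a real deployment might use a database instead.

```python
class KnowledgeBase:
    """Hypothetical in-memory store of knowledge points (question -> answer)."""

    def __init__(self):
        self._points = {}

    def add(self, question: str, answer: str) -> None:
        if question in self._points:
            raise KeyError(f"knowledge point already exists: {question!r}")
        self._points[question] = answer

    def modify(self, question: str, answer: str) -> None:
        if question not in self._points:
            raise KeyError(f"no such knowledge point: {question!r}")
        self._points[question] = answer

    def delete(self, question: str) -> None:
        # Raises KeyError if the question is unknown.
        del self._points[question]

    def lookup(self, question: str):
        return self._points.get(question)


kb = KnowledgeBase()
kb.add("effective date", "highlight")
kb.modify("effective date", "bold")   # change the rule without touching any code
print(kb.lookup("effective date"))
kb.delete("effective date")
print(kb.lookup("effective date"))
```

The point of the sketch is that a rule change is a data edit: the processing code that consumes `lookup` stays the same.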
In an embodiment of the present invention, since splitting the text to be processed may yield many short sentences, these short sentences need to be processed in an orderly manner to ensure the efficiency and accuracy of the text processing. The input module 55 may therefore be further configured to input the short sentences obtained by splitting into the intelligent analysis engine 56 one by one. In this case, the text processing module 54 may be further configured to perform the corresponding text processing on the short sentence that corresponds to each output result produced one by one by the intelligent analysis engine 56.
In an embodiment of the present invention, after the short sentences obtained by splitting are input into the intelligent analysis engine 56, the determined text processing mode may have to be carried out in the original text to be processed (for example, adjusting the font size or background color of the text content of the current short sentence). In this case, as shown in Fig. 6, the text processing apparatus 50 may further include a recording module 58 configured to record the text format of each short sentence and/or its position in the text to be processed before the short sentence is matched against the question of each knowledge point.
In an embodiment of the present invention, the matching module 53 is further configured to compute the text similarity between each short sentence and the question of each knowledge point, and to take the knowledge point with the greatest text similarity as the knowledge point successfully matched with the short sentence. In this case, the text processing module 54 may be further configured to perform text processing on the short sentence according to the answer of the successfully matched knowledge point. The text similarity may be computed using one or more of the following methods: edit distance, n-gram, Jaro-Winkler, and Soundex. However, the present invention does not strictly limit the specific way in which the similarity is computed.
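As an illustration of the first listed option, the sketch below computes Levenshtein edit distance, normalizes it to a similarity in the range 0 to 1, and picks the knowledge point whose question is most similar. The function names are hypothetical, knowledge points are represented as plain `(question, answer)` tuples, and the optional `threshold` parameter is an added assumption: the specification only says that the knowledge point with the greatest similarity is taken as the match.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def best_match(short_sentence, knowledge_points, threshold=0.0):
    """Return the (question, answer) pair with the greatest similarity.

    threshold is an assumption of this sketch: matches scoring below it
    are rejected and None is returned.
    """
    best = max(knowledge_points,
               key=lambda kp: similarity(short_sentence, kp[0]),
               default=None)
    if best is None or similarity(short_sentence, best[0]) < threshold:
        return None
    return best


points = [("signing date", "highlight"), ("party a", "bold")]
print(best_match("sign date", points))
print(best_match("sign date", points, threshold=0.9))
```

Without a threshold, every short sentence matches some knowledge point, however weakly; a cut-off is one way to let unrelated short sentences pass through unprocessed.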
In an embodiment of the present invention, as shown in Fig. 6, the splitting module 52 may include a recognition unit 521 and a splitting execution unit 522. The recognition unit 521 is configured to identify the preset split symbols in the text to be processed. The splitting execution unit 522 is configured to split off the text content between two adjacent preset split symbols as one short sentence.
In an embodiment of the present invention, the preset split symbols may include one or more of the following: punctuation marks, line breaks, and preset split words.
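A splitter covering all three kinds of preset split symbols can be sketched with a single regular expression. The function names, the default punctuation set, and the example split word `"and then"` are assumptions chosen for illustration; the specification leaves the concrete symbols open.

```python
import re


def build_splitter(punctuation=",.;!?", split_words=("and then",)):
    """Compile one pattern that matches any preset split symbol."""
    parts = [re.escape(ch) for ch in punctuation]   # punctuation marks
    parts.append(r"\n")                             # line breaks
    # Preset split words, matched only on word boundaries.
    parts.extend(r"\b" + re.escape(word) + r"\b" for word in split_words)
    return re.compile("|".join(parts))


def split_short_sentences(text, splitter):
    """Return the non-empty text between adjacent split symbols."""
    return [piece.strip() for piece in splitter.split(text) if piece.strip()]


text = "Name: Li\nopen the file and then save it. Done"
print(split_short_sentences(text, build_splitter()))
```

The colon is deliberately not in the default punctuation set here, so a field such as `Name: Li` survives as one short sentence.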
In an embodiment of the present invention, the text processing mode may include one or a combination of the following: adjusting the text format, extracting the short sentence before or after a preset character, organizing the text content according to preset rules, and adding a preset mark.
In an embodiment of the present invention, organizing the text content according to preset rules includes filling in a preset table or filling in a preset text template.
It should be understood that the specific content of the answer in a knowledge point (that is, the text processing mode) and the processing rules may be designed or adjusted according to the actual application scenario; the present invention does not specifically limit the concrete form of the answer in a knowledge point.
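Since the answer of a knowledge point only names a text processing mode, one plausible design is a dispatch table from answer labels to handler functions. The labels `"extract_before"`, `"extract_after"`, and `"add_mark"`, the default separator `":"`, and the default mark `"[KEY]"` are all hypothetical; format adjustment is left out because it depends on the document format being edited.

```python
def extract_before(sentence: str, char: str = ":") -> str:
    """Keep the part of the short sentence before the preset character."""
    head, sep, _ = sentence.partition(char)
    return head.strip() if sep else sentence


def extract_after(sentence: str, char: str = ":") -> str:
    """Keep the part of the short sentence after the preset character."""
    _, sep, tail = sentence.partition(char)
    return tail.strip() if sep else sentence


def add_mark(sentence: str, mark: str = "[KEY]") -> str:
    """Prefix the short sentence with a preset mark."""
    return f"{mark} {sentence}"


# Answer label -> handler; adding a processing mode means adding an entry.
HANDLERS = {
    "extract_before": extract_before,
    "extract_after": extract_after,
    "add_mark": add_mark,
}


def apply_answer(answer: str, sentence: str) -> str:
    try:
        handler = HANDLERS[answer]
    except KeyError:
        raise ValueError(f"unknown text processing mode: {answer!r}") from None
    return handler(sentence)


print(apply_answer("extract_after", "Phone: 555-0100"))
print(apply_answer("add_mark", "Phone: 555-0100"))
```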
In an embodiment of the present invention, the text template may take the form of a semantic expression or the form of a regular expression.
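For the regular-expression form, a template could be realized as a pattern with named groups plus an output format string, as in the sketch below. `PATTERN`, `TEMPLATE`, and `fill_template` are hypothetical names, and the `field: value` layout is an assumed example; the semantic-expression form is not shown because the specification does not define its syntax here.

```python
import re

# Assumed layout of a short sentence: "<field>: <value>".
PATTERN = re.compile(r"(?P<field>[^:]+):\s*(?P<value>.+)")
TEMPLATE = "{field} = {value}"


def fill_template(short_sentence: str):
    """Fill the output template from a short sentence, or return None."""
    match = PATTERN.fullmatch(short_sentence.strip())
    if match is None:
        return None  # the short sentence does not fit this template
    return TEMPLATE.format(field=match.group("field").strip(),
                           value=match.group("value").strip())


print(fill_template("Customer name: Zhang San"))
print(fill_template("no colon here"))
```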
In an embodiment of the present invention, the text to be processed may be obtained through a speech conversion process. The text to be processed may be data content from a third party (for example, a contract text or a customer information text from a third party), or it may be text content obtained through the above speech conversion process (for example, a voice recording from a customer service scenario converted into text). The present invention does not limit the specific content of the text to be processed.
An embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executed by the processor, wherein the processor, when executing the computer program, implements the steps of the text processing method described in any of the foregoing embodiments.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the text processing method described in any of the foregoing embodiments. The computer storage medium may be any tangible medium, such as a floppy disk, CD-ROM, DVD, or hard disk drive, or even a network medium.
It should be understood that each module or unit described in the text processing apparatus 50 provided by the above embodiments corresponds to one of the method steps described earlier. The operations and features described for the method steps therefore apply equally to the text processing apparatus 50 and to the corresponding modules and units it contains, and the repeated content is not described again here.
It should be understood that although one implementation form of the embodiments described above may be a computer program product, the method or apparatus of the embodiments of the present invention may be implemented in software, in hardware, or in a combination of software and hardware. The hardware part may be implemented using dedicated logic; the software part may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will understand that the above methods and devices may be implemented using computer-executable instructions and/or processor control code, with such code provided, for example, on a carrier medium such as a magnetic disk, CD, or DVD-ROM, in a programmable memory such as read-only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The methods and devices of the present invention may be implemented by hardware circuits such as very-large-scale integrated circuits or gate arrays, by semiconductors such as logic chips and transistors, or by programmable hardware devices such as field-programmable gate arrays and programmable logic devices; they may also be implemented by software executed by various types of processors, or by a combination of such hardware circuits and software, for example firmware.
It should be understood that although several modules or units of the apparatus are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to exemplary embodiments of the present invention, the features and functions of two or more of the modules or units described above may be implemented in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units. In addition, some of the modules or units described above may be omitted in certain application scenarios.
It should be understood that, in order not to obscure the embodiments of the present invention, the specification describes only some key techniques and features, which are not necessarily essential, and may not describe some features that those skilled in the art are able to implement.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention. Any modification or equivalent replacement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (15)

1. A text processing method, characterized in that one or more knowledge points are provided, each knowledge point comprising a question and an answer, the content of the question corresponding to standard text content and the content of the answer corresponding to a text processing mode, the method comprising:
splitting a text to be processed into a plurality of short sentences;
matching each short sentence against the question of each knowledge point; and
when a short sentence is successfully matched with the question of a knowledge point, performing text processing on the short sentence according to the answer of the knowledge point.
2. The method according to claim 1, characterized in that the one or more knowledge points are stored in advance in an intelligent analysis engine;
wherein matching each short sentence against the question of each knowledge point comprises: inputting each short sentence into the intelligent analysis engine, wherein the intelligent analysis engine is configured to match the input short sentence against the question of each knowledge point and to output the answer of the knowledge point matched by the short sentence;
wherein performing text processing on the short sentence according to the answer of the knowledge point comprises: performing text processing, according to the output result of the intelligent analysis engine, on the short sentence that matched the question of the knowledge point.
3. The method according to claim 2, characterized in that inputting each short sentence into the intelligent analysis engine comprises: inputting the plurality of short sentences obtained by splitting into the intelligent analysis engine one by one;
wherein performing text processing, according to the output result of the intelligent analysis engine, on the short sentence that matched the question of the knowledge point comprises: performing the corresponding text processing on the short sentence corresponding to each output result output one by one by the intelligent analysis engine.
4. The method according to claim 2, characterized by further comprising:
adding, modifying, or deleting the knowledge points in the intelligent analysis engine.
5. The method according to claim 1, characterized in that, before each short sentence is matched against the question of each knowledge point, the method further comprises:
recording the text format of the short sentence and/or its position in the text to be processed.
6. The method according to claim 1, characterized in that matching each short sentence against the question of each knowledge point comprises:
computing the text similarity between each short sentence and the question of each knowledge point, and taking the knowledge point with the greatest text similarity as the knowledge point successfully matched with the short sentence; wherein, when a short sentence is successfully matched with the question of a knowledge point, performing text processing on the short sentence according to the answer of the knowledge point comprises:
performing text processing on the short sentence according to the answer of the successfully matched knowledge point.
7. The method according to claim 1, characterized in that splitting the text to be processed into a plurality of short sentences comprises:
identifying the preset split symbols in the text to be processed; and
splitting off the text content between two adjacent preset split symbols as one short sentence.
8. The method according to claim 7, characterized in that the preset split symbols comprise one or more of the following: punctuation marks, line breaks, and preset split words.
9. The method according to claim 1, characterized in that the text processing mode comprises one or a combination of the following: adjusting the text format, extracting the short sentence before or after a preset character, organizing the text content according to preset rules, and adding a preset mark.
10. The method according to claim 9, characterized in that organizing the text content according to preset rules comprises: filling in a preset table or filling in a preset text template.
11. The method according to claim 10, characterized in that the text template takes the form of a semantic expression or the form of a regular expression.
12. The method according to claim 1, characterized in that the text to be processed is obtained through a speech conversion process.
13. A text processing apparatus, characterized by comprising:
a knowledge database comprising one or more knowledge points, each knowledge point comprising a question and an answer, the content of the question corresponding to standard text content and the content of the answer corresponding to a text processing mode;
a splitting module configured to split a text to be processed into a plurality of short sentences;
a matching module configured to match each short sentence against the question of each knowledge point; and
a text processing module configured to perform text processing on a short sentence according to the answer of a knowledge point when the short sentence is successfully matched with the question of the knowledge point.
14. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executed by the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 12.
15. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 12.
CN201810149309.5A 2018-02-13 2018-02-13 Text handling method and device Pending CN108363693A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810149309.5A CN108363693A (en) 2018-02-13 2018-02-13 Text handling method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810149309.5A CN108363693A (en) 2018-02-13 2018-02-13 Text handling method and device

Publications (1)

Publication Number Publication Date
CN108363693A true CN108363693A (en) 2018-08-03

Family

ID=63002713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810149309.5A Pending CN108363693A (en) 2018-02-13 2018-02-13 Text handling method and device

Country Status (1)

Country Link
CN (1) CN108363693A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927358A (en) * 2014-04-15 2014-07-16 清华大学 Text search method and system
CN104850539A (en) * 2015-05-28 2015-08-19 宁波薄言信息技术有限公司 Natural language understanding method and travel question-answering system based on same
CN105893524A (en) * 2016-03-31 2016-08-24 上海智臻智能网络科技股份有限公司 Intelligent asking and answering method and device
CN106649209A (en) * 2016-12-30 2017-05-10 深圳天珑无线科技有限公司 Text display method and device

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460453A (en) * 2018-10-09 2019-03-12 北京来也网络科技有限公司 Data processing method and device for positive negative sample
CN109614463A (en) * 2018-10-24 2019-04-12 阿里巴巴集团控股有限公司 Text matches processing method and processing device
CN109614463B (en) * 2018-10-24 2023-02-03 创新先进技术有限公司 Text matching processing method and device
CN109840274B (en) * 2018-12-28 2021-11-30 北京百度网讯科技有限公司 Data processing method and device and storage medium
CN109840274A (en) * 2018-12-28 2019-06-04 北京百度网讯科技有限公司 Data processing method and device, storage medium
CN111191421A (en) * 2019-12-30 2020-05-22 出门问问信息科技有限公司 Text processing method and device, computer storage medium and electronic equipment
CN111191421B (en) * 2019-12-30 2023-09-12 出门问问创新科技有限公司 Text processing method and device, computer storage medium and electronic equipment
CN111507082A (en) * 2020-04-23 2020-08-07 北京奇艺世纪科技有限公司 Text processing method and device, storage medium and electronic device
CN111967270A (en) * 2020-08-16 2020-11-20 云知声智能科技股份有限公司 Method and equipment based on character and semantic fusion
CN111967270B (en) * 2020-08-16 2023-11-21 云知声智能科技股份有限公司 Method and equipment based on fusion of characters and semantics
CN112100976A (en) * 2020-09-24 2020-12-18 上海松鼠课堂人工智能科技有限公司 Knowledge point relation marking method and system
CN112100976B (en) * 2020-09-24 2021-11-16 上海松鼠课堂人工智能科技有限公司 Knowledge point relation marking method and system
CN112632258A (en) * 2020-12-30 2021-04-09 太平金融科技服务(上海)有限公司 Text data processing method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN108363693A (en) Text handling method and device
CN109241538B (en) Chinese entity relation extraction method based on dependency of keywords and verbs
CN105718586B (en) The method and device of participle
WO2018000272A1 (en) Corpus generation device and method
CN106997341B (en) A kind of innovation scheme matching process, device, server and system
CN107463553A (en) For the text semantic extraction, expression and modeling method and system of elementary mathematics topic
CN108984661A (en) Entity alignment schemes and device in a kind of knowledge mapping
CN103020230A (en) Semantic fuzzy matching method
CN109614620B (en) HowNet-based graph model word sense disambiguation method and system
CN109960756A (en) Media event information inductive method
Rachman et al. CBE: Corpus-based of emotion for emotion detection in text document
CN106649250A (en) Method and device for identifying emotional new words
Gómez-Adorno et al. A graph based authorship identification approach
CN110717045A (en) Letter element automatic extraction method based on letter overview
CN113392182A (en) Knowledge matching method, device, equipment and medium fusing context semantic constraints
Iosif et al. From speaker identification to affective analysis: a multi-step system for analyzing children’s stories
CN113312922A (en) Improved chapter-level triple information extraction method
Wax Automated grammar engineering for verbal morphology
CN113361252B (en) Text depression tendency detection system based on multi-modal features and emotion dictionary
CN107622047B (en) Design decision knowledge extraction and expression method
CN110008807A (en) A kind of training method, device and the equipment of treaty content identification model
CN110888983B (en) Positive and negative emotion analysis method, terminal equipment and storage medium
Han et al. A novel part of speech tagging framework for nlp based business process management
CN110413779B (en) Word vector training method, system and medium for power industry
Khankasikam Knowledge capture for Thai word segmentation by using CommonKADS

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180803
