CN110502752A - Text processing method, apparatus, device, and computer storage medium - Google Patents

Text processing method, apparatus, device, and computer storage medium

Info

Publication number
CN110502752A
CN110502752A (application CN201910777842.0A)
Authority
CN
China
Prior art keywords
text
processed
word
answer
storage region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910777842.0A
Other languages
Chinese (zh)
Inventor
王妙心
Current Assignee
Beijing First Chain Digital Cloud Technology Co Ltd
Original Assignee
Beijing First Chain Digital Cloud Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing First Chain Digital Cloud Technology Co Ltd filed Critical Beijing First Chain Digital Cloud Technology Co Ltd
Priority to CN201910777842.0A priority Critical patent/CN110502752A/en
Publication of CN110502752A publication Critical patent/CN110502752A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Abstract

The present invention discloses a text processing method, apparatus, device, and computer storage medium. First, a target feature word is obtained from a first text to be processed. A first target question text matching the first text to be processed is obtained from a preset storage region, a first answer text associated with the first target question text is determined, and the first answer text is displayed. A second text to be processed is then segmented into words, and the sentence components corresponding to the feature words of the second text are determined from the segmentation result. If the segmentation result does not contain the target feature word and the sentence components do not include the target sentence component, the target feature word is combined with the second text to be processed to generate a new second text to be processed. A second target question text matching the new second text to be processed is obtained from the preset storage region, the second answer text associated with it is determined, and the second answer text is displayed. In this way, a customer-service robot can still effectively provide the user with a satisfactory answer after the user asks several successive questions.

Description

Text processing method, apparatus, device, and computer storage medium
Technical field
The present invention relates to the field of artificial intelligence, and more particularly to a text processing method, apparatus, device, and computer storage medium.
Background technique
With the rapid development of artificial intelligence, NLP (Natural Language Processing) technology has been widely applied in practical business environments such as enterprise customer service.
In practical applications, conversational interaction robots have been formed on the basis of several such functional capabilities.
In text similarity analysis, however, most existing models remain at the stage of theoretical research or shallow interactive demonstration. In practical business environments such as enterprise customer service (for example, conversational interaction robots), it is difficult to accurately identify and locate answers through an enterprise knowledge base; expert data are commonly lacking, so such models cannot be deployed quickly and effectively in business scenarios. Compared with the large datasets used for long-term NLP training, the FAQ knowledge-base data of a professional domain can only be regarded as a small-sample dataset and cannot satisfy the training-set requirements of deep learning.
For example, when a user submits a question to a website, the customer service the user first encounters is a robot. After the user types text into the dialog box, the website backend parses the input and feeds back an answer text based on the parsing result. The user may then, based on the first answer fed back by the robot, enter a new question; however, this new question may contain grammatical problems, or the website's backend database may be missing the relevant expert data. The backend then cannot correctly parse the new question and cannot return an answer that satisfies the user, so the user finally has to call the website's human customer service. This not only wastes the user's time but also forces the website to employ more human agents, increasing labor costs.
The above content is provided only to facilitate understanding of the technical solution of the present invention and does not constitute an admission that it is prior art.
Summary of the invention
The main purpose of the present invention is to provide a text processing method, apparatus, device, and computer storage medium, so as to solve the problem that a customer-service robot cannot effectively provide a satisfactory answer when facing repeated questions from a user.
To achieve the above object, the present invention provides a text processing method comprising the following steps:
receiving a first text to be processed input by a user;
obtaining a target feature word from the first text to be processed;
obtaining, from a preset storage region, a first target question text matching the first text to be processed, determining a first answer text associated with the first target question text, and displaying the first answer text;
receiving a second text to be processed input by the user;
performing word segmentation on the second text to be processed, and determining, from the segmentation result, the sentence components corresponding to the feature words of the second text to be processed;
if the segmentation result does not contain the target feature word and the sentence components of the second text to be processed do not include the target sentence component, combining the target feature word with the second text to be processed to generate a new second text to be processed;
obtaining, from the preset storage region, a second target question text matching the new second text to be processed, determining a second answer text associated with the second target question text, and displaying the second answer text.
Preferably, the step of obtaining a target feature word from the first text to be processed comprises:
performing word segmentation on the first text to be processed based on a dynamic programming algorithm, so that the first text to be processed has multiple feature words;
calculating the weight of each feature word of the first text to be processed within that text, and taking the feature words whose weight exceeds a preset weight threshold as target feature words.
Correspondingly, the step of performing word segmentation on the second text to be processed and determining, from the segmentation result, the sentence components corresponding to the feature words of the second text to be processed comprises:
performing word segmentation on the second text to be processed based on the dynamic programming algorithm, so that the second text to be processed has multiple feature words;
determining, from the segmentation result, the sentence components corresponding to the feature words of the second text to be processed.
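The weight-threshold selection described above can be sketched in a few lines (a minimal illustration only; the token list, the weight values, and the threshold are hypothetical, since the description does not fix a concrete weighting scheme):

```python
def target_feature_words(tokens, weights, threshold):
    """Keep the feature words whose preset weight exceeds the threshold."""
    return [t for t in tokens if weights.get(t, 0) > threshold]

# Hypothetical segmentation result of a first text to be processed:
tokens = ["security", "transfer", "bank", "how"]
weights = {"security": 3, "transfer": 2, "bank": 3}  # unlisted words weigh 0
print(target_feature_words(tokens, weights, 1))
# → ['security', 'transfer', 'bank']
```

Words absent from the dedicated dictionary default to weight 0, matching the handling of unregistered words described later in the embodiment.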
Preferably, the step of obtaining, from the preset storage region, a first target question text matching the first text to be processed, determining a first answer text associated with the first target question text, and displaying the first answer text specifically comprises:
comparing, by a Euclidean distance algorithm, the similarity between the first text to be processed and the candidate question texts in the preset storage region, and selecting from the comparison results the first target question text with the greatest similarity to the first text to be processed;
looking up, in the preset storage region, the first answer text associated with the first target question text;
displaying the first answer text.
Correspondingly, the step of obtaining, from the preset storage region, a second target question text matching the new second text to be processed, determining a second answer text associated with the second target question text, and displaying the second answer text specifically comprises:
comparing, by the Euclidean distance algorithm, the similarity between the new second text to be processed and the candidate question texts in the preset storage region, and selecting from the comparison results the second target question text with the greatest similarity to the new second text to be processed;
looking up, in the preset storage region, the second answer text associated with the second target question text;
displaying the second answer text.
Preferably, the preset storage region stores multiple pre-stored feature words.
Correspondingly, the step of performing word segmentation on the first text to be processed based on the dynamic programming algorithm, so that the first text to be processed has multiple feature words, specifically comprises:
performing word segmentation on the first text to be processed based on the dynamic programming algorithm in combination with the multiple pre-stored feature words, and removing stop words from the segmentation result, so that the first text to be processed has multiple feature words.
Correspondingly, the step of performing word segmentation on the second text to be processed based on the dynamic programming algorithm, so that the second text to be processed has multiple feature words, specifically comprises:
performing word segmentation on the second text to be processed based on the dynamic programming algorithm in combination with the multiple pre-stored feature words, and removing stop words from the segmentation result, so that the second text to be processed has multiple feature words.
Preferably, after the step of displaying the second answer text, the method further comprises:
obtaining history texts input by different users, and saving the history texts input by the different users in the preset storage region as a history text set;
clustering the history texts in the history text set to obtain multiple clusters and the cluster granularity corresponding to each cluster;
saving the clusters and the cluster granularity corresponding to each cluster as a clustering result.
The step of clustering the history texts in the history text set specifically comprises:
traversing the history text set, and performing similarity matching between the currently traversed history text and each of the remaining history texts to be compared in the history text set;
generating a similarity matching threshold according to the cluster granularity corresponding to the current history text, where the smaller the similarity matching threshold, the more clusters are produced, and the larger the similarity matching threshold, the fewer clusters are produced;
saving each history text to be compared whose matching probability exceeds the target cluster granularity as a topic text similar to the current history text.
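The traversal-and-threshold clustering described above can be sketched as a single greedy pass (an illustration only; the Jaccard word-overlap function and the sample texts are assumptions standing in for the similarity matching and cluster-granularity mapping, which the description leaves abstract):

```python
def greedy_cluster(texts, similarity, threshold):
    """Each unassigned text seeds a cluster and absorbs every remaining
    text whose similarity to the seed exceeds the threshold."""
    clusters, used = [], set()
    for i, seed in enumerate(texts):
        if i in used:
            continue
        group = [seed]
        used.add(i)
        for j in range(i + 1, len(texts)):
            if j not in used and similarity(seed, texts[j]) > threshold:
                group.append(texts[j])
                used.add(j)
        clusters.append(group)
    return clusters

def jaccard(a, b):  # stand-in word-overlap similarity
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

history = ["bank transfer failed", "bank transfer failed again", "reset my password"]
print(len(greedy_cluster(history, jaccard, 0.4)))  # → 2
```

Here the two transfer-related questions merge into one cluster (overlap 3/4 = 0.75) while the password question seeds its own.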
Preferably, after the step of saving the clusters and the cluster granularity corresponding to each cluster as a clustering result, the method further comprises:
filtering out business-irrelevant words according to the multiple pre-stored feature words of the preset storage region, so as to remove invalid information from the clustering result.
Preferably, the step of filtering out business-irrelevant words according to the multiple pre-stored feature words of the preset storage region so as to remove invalid information from the clustering result specifically comprises:
performing data distribution processing on the result from which invalid information has been removed: calculating the mean and standard deviation of the number of occurrences of each cluster, and binning the counts at mean minus standard deviation, mean, and mean plus standard deviation;
saving the binning result;
when the number of occurrences of a cluster reaches a preset count, recalculating the mean and standard deviation over the saved binning results;
according to the reprocessed mean and standard deviation, adjusting the weights of the multiple pre-stored feature words of the preset storage region, and increasing the pre-stored feature words in the preset storage region that match the reprocessing result, so that the weight of the matched pre-stored feature words becomes larger.
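The mean-and-standard-deviation binning described above can be illustrated as follows (a sketch under the assumption that binning means grading each occurrence count against the three edges mean minus std, mean, and mean plus std; the counts are hypothetical):

```python
import statistics

def bin_counts(counts):
    """Grade each cluster-occurrence count against the edges
    mean - std, mean, and mean + std (population std)."""
    mean = statistics.mean(counts)
    std = statistics.pstdev(counts)
    edges = [mean - std, mean, mean + std]
    grades = {c: sum(c >= e for e in edges) for c in counts}
    return edges, grades

edges, grades = bin_counts([2, 4, 4, 4, 5, 5, 7, 9])
print(edges)      # → [3.0, 5.0, 7.0]
print(grades[9])  # → 3 (above mean + std: a hot topic whose words get boosted)
```

Clusters landing in the top bin would be the ones whose matching pre-stored feature words have their weights increased.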
In addition, to achieve the above object, the present invention also proposes a text processing apparatus, the apparatus comprising:
a first receiving module, configured to receive a first text to be processed input by a user;
a first word segmentation module, configured to obtain a target feature word from the first text to be processed;
a first display module, configured to obtain, from a preset storage region, a first target question text matching the first text to be processed, determine a first answer text associated with the first target question text, and display the first answer text;
a second receiving module, configured to receive a second text to be processed input by the user;
a second word segmentation module, configured to perform word segmentation on the second text to be processed and determine, from the segmentation result, the sentence components corresponding to the feature words of the second text to be processed;
a subject-completion module, configured to, if the segmentation result does not contain the target feature word and the sentence components of the second text to be processed do not include the target sentence component, combine the target feature word with the second text to be processed to generate a new second text to be processed;
a second display module, configured to obtain, from the preset storage region, a second target question text matching the new second text to be processed, determine a second answer text associated with the second target question text, and display the second answer text.
In addition, to achieve the above object, the present invention also proposes a text processing device, the device comprising a memory, a processor, and a text processing program stored on the memory and runnable on the processor, where the text processing program is configured to carry out the steps of the text processing method described above.
In addition, to achieve the above object, the present invention also proposes a computer storage medium storing a text processing program, where the text processing program is configured to carry out the steps of the text processing method described above.
In the present invention, a target feature word is first obtained from the first text to be processed; a first target question text matching the first text to be processed is obtained from the preset storage region, a first answer text associated with the first target question text is determined, and the first answer text is displayed. A second text to be processed input by the user is then received; word segmentation is performed on the second text to be processed, and the sentence components corresponding to its feature words are determined from the segmentation result. If the segmentation result does not contain the target feature word and the sentence components of the second text to be processed do not include the target sentence component, the target feature word is combined with the second text to be processed to generate a new second text to be processed. A second target question text matching the new second text to be processed is obtained from the preset storage region, a second answer text associated with the second target question text is determined, and the second answer text is displayed. In this way, the customer-service robot can still effectively provide the user with a satisfactory answer after the user asks repeated questions.
Detailed description of the invention
Fig. 1 is a schematic structural diagram of the text processing device in the hardware operating environment involved in the embodiments of the present invention;
Fig. 2 is a schematic flowchart of a first embodiment of the text processing method of the present invention;
Fig. 3 is a schematic flowchart of a second embodiment of the text processing method of the present invention;
Fig. 4 is a structural block diagram of the text processing apparatus of the present invention.
The realization of the objects, functional characteristics, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
Referring to Fig. 1, Fig. 1 is a schematic structural diagram of the text processing device in the hardware operating environment involved in the embodiments of the present invention.
As shown in Fig. 1, the text processing device of this embodiment may correspond to a server together with the client corresponding to that server; the website hosted by the server, or the client corresponding to the server, is equipped with a customer-service robot.
The text processing device may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to realize connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include standard wired and wireless interfaces. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a magnetic disk memory; optionally, it may also be a storage device independent of the aforementioned processor 1001.
Those skilled in the art will understand that the structure shown in Fig. 1 does not constitute a limitation on the text processing device, which may include more or fewer components than illustrated, combine certain components, or use a different component arrangement.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user receiving module, and a text processing program.
In the text processing device shown in Fig. 1, the network interface 1004 is mainly used for data communication with a backend database. The text processing device of the present invention calls, through the processor 1001, the text processing program stored in the memory 1005 and executes the steps of the text processing method.
Based on the above hardware structure, an embodiment of the text processing method of the present invention is proposed. Referring to Fig. 2, Fig. 2 is a schematic flowchart of a first embodiment of the text processing method of the present invention.
In this embodiment, the text processing method comprises the following steps:
Step S10: receiving a first text to be processed input by a user.
It should be noted that the executing subject of this embodiment is the processor of the above text processing device. The text processing device may correspond to a server together with the client corresponding to that server; the website hosted by the server, or the client corresponding to the server, is equipped with a customer-service robot software program.
The user of this embodiment can access the website through a browser, communicate with the customer-service robot, and enter text information (the first text to be processed) into the dialog box.
Step S20: obtaining a target feature word from the first text to be processed.
In a specific implementation, word segmentation is performed on the first text to be processed based on a dynamic programming algorithm, so that the first text to be processed has multiple feature words; the weight of each feature word within the first text to be processed is then calculated, and the feature words whose weight exceeds a preset weight threshold are taken as target feature words.
It should be noted that multiple pre-stored feature words may be saved in the preset storage region before segmentation. A dedicated dictionary is first established rapidly in the preset storage region by manually combining the business scenario, and is merged into the existing general dictionary in the database. The dedicated dictionary needs to be specified manually by business personnel and classified into three levels (high, medium, low), serving as the basis for later similarity calculation and dynamic adjustment of segmentation weights.
Specifically, word segmentation is performed on the first text to be processed based on the dynamic programming algorithm in combination with the multiple pre-stored feature words, and stop words are removed from the segmentation result, so that the first text to be processed has multiple feature words.
Segmentation is based on a dynamic programming algorithm (the Viterbi algorithm) and the dedicated dictionary. First, all words contained in the sentence are found; then the sum of weights of every possible segmentation is calculated. The segmentation objectives are: the fewest words separated out, with the largest total weight. If a word not registered in the dedicated dictionary is separated out, its weight is counted as 0 and it is handled according to the general dictionary.
For example, for the sentence "the security-to-bank transfer fails, what to do", all candidate words are obtained from the backend library as a list List. All words in List are traversed, and recursion is performed according to the dynamic programming method (choosing the word or not): if the word is chosen, it is deleted from the sentence and recursion continues on the remaining sentence; if not, the word is deleted from List directly and recursion continues on the same sentence.
The following code can be referred to (a cleaned-up sketch of the recursion; F is a class holding the segmentation result list, the sum of word weights k, and the final score T = k / list.length; remove(str, w) and rest(list, w) denote deleting w from the sentence and from the candidate list, respectively):
F segment(String str, List<Word> list) {
    // str is the input sentence; list holds all candidate words present in str.
    // Goal of the segmentation: the fewest words with the highest total weight.
    if (list.isEmpty()) return new F();    // base case: no candidates left
    Word w = list.get(0);                  // take the first candidate word
    F f1 = segment(remove(str, w), list);  // segment the remainder, using w
    int k1 = f1.k + w.k;                   // sum of weights when w is chosen
    double d1 = f1.list.size() + 1;        // segmentation length when w is chosen
    F f2 = segment(str, rest(list, w));    // segment without using this word w
    int k2 = f2.k;                         // sum of weights when w is skipped
    double d2 = f2.list.size();            // segmentation length when w is skipped
    if (k1 / d1 > k2 / d2) return f1;      // higher average weight: keep w's branch
    else return f2;
}
Since the time complexity of the dynamic programming algorithm is high, a Viterbi-style pruning idea can be used for optimization: for a given word, if splitting it would produce a fragment that is not in the dictionary, the word is not split. For example, splitting "GEM" (创业板, the growth enterprise board) would produce the fragments 创 and 业板, and fragments outside the dictionary lower the score of the segmentation, so the word is left unsplit.
Finally, the top N segmentation results can be taken as segmentation schemes and saved in the database. After the user asks a question, similarity matching is performed against all schemes in the database, and the question with the highest similarity is taken as the matching result.
Step S30: obtaining, from the preset storage region, a first target question text matching the first text to be processed, determining a first answer text associated with the first target question text, and displaying the first answer text.
In a specific implementation, the similarity between the first text to be processed and the candidate question texts in the preset storage region is compared by a Euclidean distance algorithm; the first target question text with the greatest similarity to the first text to be processed is selected from the comparison results; the first answer text associated with the first target question text is looked up in the preset storage region; and the first answer text is displayed.
It should be noted that the similarity matching algorithm of this embodiment uses a cosine-law text similarity calculation method.
For example, suppose the feature words of the first text to be processed input by the user are A, B, and C, where B and C have weight 2 and A has weight 1, and the feature words of the first target question text in the dedicated dictionary of the preset storage region are B, C, D, and E, where D and E have weight 1. Then the similarity is P = (wB² + wC²) / [√(wA² + wB² + wC²) × √(wB² + wC² + wD² + wE²)], where wX denotes the weight of feature word X.
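The worked example above can be checked with a short weighted-cosine routine (a sketch; the dictionaries below mirror the example's hypothetical weights A = 1, B = C = 2, D = E = 1):

```python
from math import sqrt

def cosine_similarity(wa, wb):
    """Weighted cosine similarity between two bags of feature words."""
    num = sum(wa[w] * wb[w] for w in set(wa) & set(wb))
    den = sqrt(sum(v * v for v in wa.values())) * sqrt(sum(v * v for v in wb.values()))
    return num / den

query = {"A": 1, "B": 2, "C": 2}           # first text to be processed
stored = {"B": 2, "C": 2, "D": 1, "E": 1}  # first target question text
print(round(cosine_similarity(query, stored), 3))  # → 0.843
```

Because B and C carry the same weight on both sides, the numerator 2·2 + 2·2 coincides with the wB² + wC² of the formula in the example; the general cosine uses the product of the two weights.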
Step S40: receiving a second text to be processed input by the user.
Specifically, based on the first answer text fed back by the customer-service robot, the user may enter a new question (i.e., the second text to be processed).
Step S50: performing word segmentation on the second text to be processed, and determining, from the segmentation result, the sentence components corresponding to the feature words of the second text to be processed.
In a specific implementation, word segmentation is performed on the second text to be processed based on the dynamic programming algorithm, that is, in combination with the multiple pre-stored feature words, and stop words are removed from the segmentation result, so that the second text to be processed has multiple feature words; the sentence components corresponding to the feature words of the second text to be processed are then determined from the segmentation result.
It will be appreciated that the sentence components here include at least the subject, predicate, object, complement, attributive, and adverbial of a Chinese clause.
Step S60: if the segmentation result does not contain the target feature word and the sentence components of the second text to be processed do not include the target sentence component, combining the target feature word with the second text to be processed to generate a new second text to be processed.
It should be noted that the target sentence component can be understood as the most representative sentence component, such as the subject, which best represents the theme of the sentence. The theme is stored and remembered according to the preset segmentation weights: the word with the highest weight in the segmentation result is selected and remembered as the theme. This function is mainly used for theme completion when the user's new text is missing the theme or is missing a noun subject.
Secondary processing of the theme memory performs secondary confirmation and adjustment of the theme according to a general training model (a large dataset outside this processing flow), thereby generating the new second text to be processed.
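The theme-completion behaviour of Step S60 can be sketched as follows (an illustration only; the component labels, the sample tokens, and the rule that the remembered theme is prepended are assumptions about how the combination is realized):

```python
def complete_with_theme(tokens, components, theme):
    """If the follow-up question contains neither the remembered theme nor a
    subject, prepend the theme to form the new second text to be processed."""
    if theme in tokens or "subject" in components:
        return tokens
    return [theme] + tokens

# Hypothetical follow-up "won't transfer, what to do" lacks both theme and subject:
tokens = ["will", "not", "transfer", "what", "to", "do"]
print(complete_with_theme(tokens, ["adverbial", "predicate"], "security"))
# → ['security', 'will', 'not', 'transfer', 'what', 'to', 'do']
```

The completed token sequence is then matched against the stored question texts exactly as the first text was.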
Step S70: the second target to match with the second new text to be processed is obtained from default storage region and is asked Sentence text, and determining the second answer text with the second target question sentence textual association, carry out the second answer text It shows.
It in the concrete realization, will be in the second new text to be processed and default storage region by Euclidean distance algorithm Question sentence text to be matched carry out similarity-rough set, chosen from comparison result and the second new text degree of approximation to be processed Maximum second target question sentence text;The second solution with the second target question sentence textual association is searched from default storage region Answer text;The second answer text is shown.
The present embodiment obtains target signature word first from the first text to be processed;It is obtained and first from default storage region The first object question sentence text that text to be processed matches, and determining the first answer text with first object question sentence textual association This, is shown the first answer text;The second text to be processed of user's input is received again;Second text to be processed is carried out Participle determines the corresponding sentence element of Feature Words of the second text to be processed from word segmentation result;If not wrapping in word segmentation result Target signature word is included, and the sentence element of the second text to be processed does not include target sentences ingredient, by target signature word and second Text to be processed is combined, and generates the text to be processed of new second;It is obtained from default storage region to be processed with second newly The second target question sentence text that text matches, and determining the second answer text with the second target question sentence textual association, to the Two answer texts are shown, and then robot customer service can effectively provide a user reason when facing the multiple enquirement of user The answer thought.
In addition, the present embodiment can also access it is following the utility model has the advantages that
Compared to the implementation method relied solely on now in depth learning technology, required basis sample set very little, base It can quickly be established in the knowledge base of enterprise itself, operational feasibility is strong;
Compared with deep learning, which requires costly hardware servers, this method can be run on lightweight cloud computing services, making future expansion and upgrades more convenient.
Based on the first embodiment of the above text processing method, a second embodiment of the text processing method of the present invention is proposed. Referring to Fig. 3, Fig. 3 is a flow diagram of the second embodiment of the text processing method of the present invention.
In this embodiment, after step S70, the text processing method further includes:
Step S80: obtaining history texts input by different users, and saving the history texts input by the different users in the preset storage region as a history text set;
Step S90: clustering the history texts in the history text set to obtain multiple clusters and the cluster granularity corresponding to each cluster;
Step S100: saving the clusters and the cluster granularity corresponding to each cluster as a clustering result.
Wherein, step S90 specifically includes:
Step S91: traversing the history text set, and performing similarity matching between the current history text being traversed and each of the remaining history texts to be compared in the history text set;
Step S92: generating a similarity matching threshold according to the cluster granularity corresponding to the current history text, the similarity matching threshold being characterized in that the smaller the similarity matching threshold, the more clusters are produced, and the larger the similarity matching threshold, the fewer clusters are produced;
Step S93: saving the history texts to be compared whose matching probability is greater than the target cluster granularity as subject texts similar to the current history text.
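Steps S91–S93 can be sketched as a single labeling pass over the history text set; the Jaccard word-overlap score below stands in for whatever similarity matcher the method actually uses and is purely an assumption of the example:

```python
def jaccard(a, b):
    # Word-overlap similarity between two texts, in [0, 1].
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb)

def cluster_texts(texts, granularity, sim=jaccard):
    # Each text joins the cluster of the first earlier text it matches above
    # the granularity threshold; otherwise it starts a new cluster.
    labels = [-1] * len(texts)
    next_label = 0
    for i, t in enumerate(texts):
        for j in range(i):
            if sim(t, texts[j]) >= granularity:
                labels[i] = labels[j]
                break
        if labels[i] == -1:
            labels[i] = next_label
            next_label += 1
    return labels
```

Raising the threshold makes matches rarer and so produces more, finer clusters; lowering it merges more texts into fewer clusters.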
It will be appreciated that the segmentation results of all historical question texts in the preset storage region are saved in a database. After a user submits a question text, similarity matching is computed against every sentence in the database, yielding a score Pi for each. A threshold Y = 80% is set and MaxPi is taken; if MaxPi is greater than or equal to Y, the answer of the corresponding sentence is pushed directly; if MaxPi is less than Y, the answers of the five sentences with the largest Pi are pushed to the user, thereby smoothing out computation error.
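The push rule just described (MaxPi against Y = 80%, with a top-5 fallback) might be sketched as follows; the score list and answer list are assumed to be index-aligned:

```python
def select_answers(scores, answers, threshold=0.8, fallback_k=5):
    # scores[i] is the similarity Pi between the user question and stored question i.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if scores[ranked[0]] >= threshold:
        # MaxPi clears the threshold Y: push the single best answer directly.
        return [answers[ranked[0]]]
    # Otherwise push the top-k candidate answers to smooth out matching error.
    return [answers[i] for i in ranked[:fallback_k]]
```

Returning several candidates when no match is confident lets the user pick the intended question instead of receiving one possibly wrong answer.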
Further, after step S100, the text processing method further includes:
Step S101: filtering out business-irrelevant words according to the multiple pre-stored feature words in the preset storage region, so as to perform invalid-information rejection on the clustering result;
Step S102: performing data-distribution processing on the result of the invalid-information rejection: computing the mean and standard deviation of the numbers of occurrences of the clustering results, and binning at the mean plus the standard deviation, the mean, and the mean minus the standard deviation; saving the binning results; when the number of occurrences of clustering results reaches a preset count, computing the mean and standard deviation again over the preset count of mean-and-standard-deviation binning results; and, according to the mean and standard deviation of this secondary processing, adjusting the weights of the multiple pre-stored feature words in the preset storage region and incrementing the pre-stored feature words in the preset storage region that match the secondary processing result, so that the weights of the pre-stored feature words matching the secondary processing result become larger.
In this embodiment, all user texts collected over a period are clustered. The full set is scanned, and any text S is similarity-matched against the remaining N-1 texts. If the matching probability is greater than or equal to a threshold Y (Y represents the cluster granularity: the larger Y is, the smaller the cluster granularity and the more, finer clusters result; conversely, the larger the granularity, the fewer and coarser the clusters), the two texts are jointly labeled with a unified number; if neither document has been labeled, a new cluster label is created; otherwise the already-assigned cluster label is applied. Invalid information is then rejected from the clustering result: business-irrelevant words are filtered out according to the common-word information in a general dictionary. The filtered result undergoes data-distribution processing: the mean and standard deviation of the numbers of occurrences of the clustering results are computed, and binning is performed at mean + standard deviation, mean, and mean - standard deviation. The binning results are saved and accumulated; after 100 rounds (the preset count in this embodiment is 100), the mean and standard deviation are computed again over the 100 mean + standard deviation binning results, and weights in the dedicated dictionary table are adjusted according to the mean + standard deviation of this secondary processing. Words in the dedicated dictionary that match the secondary processing result are automatically incremented by 1, raising their weights.
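The mean/standard-deviation binning and the dictionary weight increment can be sketched as follows; the 0–3 bucket encoding and the increment step are assumptions of the example, not details fixed by the disclosure:

```python
import statistics

def bin_counts(counts):
    # Bin occurrence counts at mean - std, mean, and mean + std.
    mu = statistics.mean(counts)
    sigma = statistics.pstdev(counts)
    edges = (mu - sigma, mu, mu + sigma)
    # Bucket index 0..3: how many bin edges the count reaches or exceeds.
    return {c: sum(c >= e for e in edges) for c in counts}, edges

def bump_weights(weights, matched_words, step=1):
    # Increment the dedicated-dictionary weight of each word matched by the
    # secondary pass, so those feature words gain influence in later matching.
    for w in matched_words:
        weights[w] = weights.get(w, 0) + step
    return weights
```

Counts far above the mean land in the top bucket, flagging the feature words of hot topics for the weight bump.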
By clustering and by completing user intents with a dynamic programming algorithm, this embodiment can follow up on specific business, effectively enlarging the data volume available for later improvement of the computation model and the training sample set. Compared with deep learning methods, it better fits the demands of business scenarios and the real needs of users.
In addition, referring to Fig. 4, the present invention further proposes a text processing apparatus, the apparatus including:
a first receiving module 10, configured to receive a first text to be processed input by a user;
a first segmentation module 20, configured to obtain a target feature word from the first text to be processed;
a first display module 30, configured to obtain, from a preset storage region, a first target question text matching the first text to be processed, determine a first answer text associated with the first target question text, and display the first answer text;
a second receiving module 40, configured to receive a second text to be processed input by the user;
a second segmentation module 50, configured to segment the second text to be processed and determine, from the segmentation result, the sentence components corresponding to the feature words of the second text to be processed;
a subject-absence completion module 60, configured to, if the segmentation result does not contain the target feature word and the sentence components of the second text to be processed do not include the target sentence component, combine the target feature word with the second text to be processed to generate a new second text to be processed;
a second display module 70, configured to obtain, from the preset storage region, a second target question text matching the new second text to be processed, determine a second answer text associated with the second target question text, and display the second answer text.
It will be appreciated that the text processing apparatus of this embodiment may be an application program loaded in the text-processing device of the above embodiment. For the specific implementation of the text processing apparatus of the present invention, reference may be made to the above embodiments of the text processing method, and details are not repeated here.
In addition, the present invention further provides a computer storage medium. A text processing program is stored in the computer storage medium, and when the text processing program is executed by a processor, the steps of the text processing method described above are realized.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or system that includes that element.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and certainly also by hardware alone, but in many cases the former is the preferable implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) as described above and includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above is only a preferred embodiment of the present invention and does not limit the patent scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (10)

1. A text processing method, characterized in that the method includes:
receiving a first text to be processed input by a user;
obtaining a target feature word from the first text to be processed;
obtaining, from a preset storage region, a first target question text matching the first text to be processed, determining a first answer text associated with the first target question text, and displaying the first answer text;
receiving a second text to be processed input by the user;
segmenting the second text to be processed, and determining, from the segmentation result, the sentence components corresponding to the feature words of the second text to be processed;
if the segmentation result does not contain the target feature word and the sentence components of the second text to be processed do not include the target sentence component, combining the target feature word with the second text to be processed to generate a new second text to be processed;
obtaining, from the preset storage region, a second target question text matching the new second text to be processed, determining a second answer text associated with the second target question text, and displaying the second answer text.
2. The method of claim 1, characterized in that the step of obtaining the target feature word from the first text to be processed includes:
segmenting the first text to be processed based on a dynamic programming algorithm, so that the first text to be processed has multiple feature words;
separately computing the weight value of each feature word of the first text to be processed within the first text to be processed, and taking the feature words whose weight values are greater than a preset weight threshold as target feature words;
correspondingly, the step of segmenting the second text to be processed and determining, from the segmentation result, the sentence components corresponding to the feature words of the second text to be processed includes:
segmenting the second text to be processed based on a dynamic programming algorithm, so that the second text to be processed has multiple feature words;
determining, from the segmentation result, the sentence components corresponding to the feature words of the second text to be processed.
3. The method of claim 2, characterized in that the step of obtaining, from the preset storage region, the first target question text matching the first text to be processed, determining the first answer text associated with the first target question text, and displaying the first answer text specifically includes:
comparing, by a Euclidean distance algorithm, the similarity between the first text to be processed and the question texts to be matched in the preset storage region, and selecting from the comparison results the first target question text with the greatest similarity to the first text to be processed;
looking up, in the preset storage region, the first answer text associated with the first target question text;
displaying the first answer text;
correspondingly, the step of obtaining, from the preset storage region, the second target question text matching the new second text to be processed, determining the second answer text associated with the second target question text, and displaying the second answer text specifically includes:
comparing, by a Euclidean distance algorithm, the similarity between the new second text to be processed and the question texts to be matched in the preset storage region, and selecting from the comparison results the second target question text with the greatest similarity to the new second text to be processed;
looking up, in the preset storage region, the second answer text associated with the second target question text;
displaying the second answer text.
4. The method of claim 2 or 3, characterized in that the preset storage region holds multiple pre-stored feature words;
correspondingly, the step of segmenting the first text to be processed based on a dynamic programming algorithm so that the first text to be processed has multiple feature words specifically includes:
segmenting the first text to be processed based on a dynamic programming algorithm in combination with the multiple pre-stored feature words, and performing stop-word processing on the segmentation result, so that the first text to be processed has multiple feature words;
correspondingly, the step of segmenting the second text to be processed based on a dynamic programming algorithm so that the second text to be processed has multiple feature words specifically includes:
segmenting the second text to be processed based on a dynamic programming algorithm in combination with the multiple pre-stored feature words, and performing stop-word processing on the segmentation result, so that the second text to be processed has multiple feature words.
5. The method of claim 4, characterized in that after the step of displaying the second answer text, the method further includes:
obtaining history texts input by different users, and saving the history texts input by the different users in the preset storage region as a history text set;
clustering the history texts in the history text set to obtain multiple clusters and the cluster granularity corresponding to each cluster;
saving the clusters and the cluster granularity corresponding to each cluster as a clustering result;
wherein the step of clustering the history texts in the history text set specifically includes:
traversing the history text set, and performing similarity matching between the current history text being traversed and each of the remaining history texts to be compared in the history text set;
generating a similarity matching threshold according to the cluster granularity corresponding to the current history text, the similarity matching threshold being characterized in that the smaller the similarity matching threshold, the more clusters are produced, and the larger the similarity matching threshold, the fewer clusters are produced;
saving the history texts to be compared whose matching probability is greater than the target cluster granularity as subject texts similar to the current history text.
6. The method of claim 5, characterized in that after the step of saving the clusters and the cluster granularity corresponding to each cluster as a clustering result, the method further includes:
filtering out business-irrelevant words according to the multiple pre-stored feature words in the preset storage region, so as to perform invalid-information rejection on the clustering result.
7. The method of claim 6, characterized in that the step of filtering out business-irrelevant words according to the multiple pre-stored feature words in the preset storage region so as to perform invalid-information rejection on the clustering result specifically includes:
performing data-distribution processing on the result of the invalid-information rejection: computing the mean and standard deviation of the numbers of occurrences of the clustering results, and binning at the mean plus the standard deviation, the mean, and the mean minus the standard deviation;
saving the binning results;
when the number of occurrences of clustering results reaches a preset count, computing the mean and standard deviation again over the preset count of mean-and-standard-deviation binning results;
according to the mean and standard deviation of the secondary processing, adjusting the weights of the multiple pre-stored feature words in the preset storage region, and incrementing the pre-stored feature words in the preset storage region that match the secondary processing result, so that the weights of the pre-stored feature words matching the secondary processing result become larger.
8. A text processing apparatus, characterized in that the apparatus includes:
a first receiving module, configured to receive a first text to be processed input by a user;
a first segmentation module, configured to obtain a target feature word from the first text to be processed;
a first display module, configured to obtain, from a preset storage region, a first target question text matching the first text to be processed, determine a first answer text associated with the first target question text, and display the first answer text;
a second receiving module, configured to receive a second text to be processed input by the user;
a second segmentation module, configured to segment the second text to be processed and determine, from the segmentation result, the sentence components corresponding to the feature words of the second text to be processed;
a subject-absence completion module, configured to, if the segmentation result does not contain the target feature word and the sentence components of the second text to be processed do not include the target sentence component, combine the target feature word with the second text to be processed to generate a new second text to be processed;
a second display module, configured to obtain, from the preset storage region, a second target question text matching the new second text to be processed, determine a second answer text associated with the second target question text, and display the second answer text.
9. A text processing device, characterized in that the text processing device includes: a memory, a processor, and a text processing program stored on the memory and runnable on the processor, the text processing program being configured to realize the steps of the text processing method of any one of claims 1 to 7.
10. A computer storage medium, characterized in that the computer storage medium stores a text processing program, the text processing program being configured to realize the steps of the text processing method of any one of claims 1 to 7.
CN201910777842.0A 2019-08-21 2019-08-21 A kind of text handling method, device, equipment and computer storage medium Pending CN110502752A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910777842.0A CN110502752A (en) 2019-08-21 2019-08-21 A kind of text handling method, device, equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910777842.0A CN110502752A (en) 2019-08-21 2019-08-21 A kind of text handling method, device, equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN110502752A true CN110502752A (en) 2019-11-26

Family

ID=68588770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910777842.0A Pending CN110502752A (en) 2019-08-21 2019-08-21 A kind of text handling method, device, equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN110502752A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708862A (en) * 2020-06-02 2020-09-25 上海硬通网络科技有限公司 Text matching method and device and electronic equipment
CN111859902A (en) * 2020-07-16 2020-10-30 微医云(杭州)控股有限公司 Text processing method, device, equipment and medium
WO2021128246A1 (en) * 2019-12-27 2021-07-01 拉克诺德(深圳)科技有限公司 Voice data processing method, apparatus, computer device and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108776677A (en) * 2018-05-28 2018-11-09 深圳前海微众银行股份有限公司 Creation method, equipment and the computer readable storage medium of parallel statement library
CN109635091A (en) * 2018-12-14 2019-04-16 上海钛米机器人科技有限公司 A kind of method for recognizing semantics, device, terminal device and storage medium
WO2019153613A1 (en) * 2018-02-09 2019-08-15 平安科技(深圳)有限公司 Chat response method, electronic device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153613A1 (en) * 2018-02-09 2019-08-15 平安科技(深圳)有限公司 Chat response method, electronic device and storage medium
CN108776677A (en) * 2018-05-28 2018-11-09 深圳前海微众银行股份有限公司 Creation method, equipment and the computer readable storage medium of parallel statement library
CN109635091A (en) * 2018-12-14 2019-04-16 上海钛米机器人科技有限公司 A kind of method for recognizing semantics, device, terminal device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yu Zhengtao et al., "Restricted-Domain Automatic Answering System Based on a Question-Sentence Corpus", Computer Engineering and Applications (《计算机工程与应用》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021128246A1 (en) * 2019-12-27 2021-07-01 拉克诺德(深圳)科技有限公司 Voice data processing method, apparatus, computer device and storage medium
CN111708862A (en) * 2020-06-02 2020-09-25 上海硬通网络科技有限公司 Text matching method and device and electronic equipment
CN111708862B (en) * 2020-06-02 2024-03-15 上海硬通网络科技有限公司 Text matching method and device and electronic equipment
CN111859902A (en) * 2020-07-16 2020-10-30 微医云(杭州)控股有限公司 Text processing method, device, equipment and medium

Similar Documents

Publication Publication Date Title
US11334635B2 (en) Domain specific natural language understanding of customer intent in self-help
US11004013B2 (en) Training of chatbots from corpus of human-to-human chats
CN109522556B (en) Intention recognition method and device
US11003863B2 (en) Interactive dialog training and communication system using artificial intelligence
CN112346567B (en) Virtual interaction model generation method and device based on AI (Artificial Intelligence) and computer equipment
US10891322B2 (en) Automatic conversation creator for news
US20160293034A1 (en) Question answering system-based generation of distractors using machine learning
US10395641B2 (en) Modifying a language conversation model
WO2022237253A1 (en) Test case generation method, apparatus and device
CN110502752A (en) A kind of text handling method, device, equipment and computer storage medium
US11847423B2 (en) Dynamic intent classification based on environment variables
CN111198817B (en) SaaS software fault diagnosis method and device based on convolutional neural network
US11645288B2 (en) Reassigning gamer clusters based on engagement
US20220138770A1 (en) Method and apparatus for analyzing sales conversation based on voice recognition
CN111694940A (en) User report generation method and terminal equipment
CN112270546A (en) Risk prediction method and device based on stacking algorithm and electronic equipment
CN110347840A (en) Complain prediction technique, system, equipment and the storage medium of text categories
US20230281389A1 (en) Topic suggestion in messaging systems
CN114722839A (en) Man-machine collaborative dialogue interaction system and method
CN111309288B (en) Analysis method and device of software requirement specification file suitable for banking business
US10970490B2 (en) Automatic evaluation of artificial intelligence-based processes
US20230237276A1 (en) System and Method for Incremental Estimation of Interlocutor Intents and Goals in Turn-Based Electronic Conversational Flow
CN111159370A (en) Short-session new problem generation method, storage medium and man-machine interaction device
CN109885668A (en) A kind of expansible field interactive system status tracking method and apparatus
US11922129B2 (en) Causal knowledge identification and extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20231110
