CN107346344A - Method and apparatus for text matching
- Publication number
- CN107346344A (application number CN201710607397.4A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and apparatus for text matching and relates to the computer field. One embodiment of the method comprises: determining at least one feature word of a current text, and obtaining a feature vector of the current text from the at least one feature word; using the feature vector of the current text to calculate the similarity between the current text and any historical text among a plurality of historical texts; and selecting a historical text whose similarity to the current text satisfies a preset rule as the matching text of the current text. This embodiment can identify the historical text that matches the text of a current event, and thereby provide suggestions for resolving the current event.
Description
Technical field
The present invention relates to the computer field, and in particular to a method and apparatus for text matching.
Background
At present, process-oriented CRM (Customer Relationship Management) systems are widely used in enterprise customer service work. Their main function is to record the questions customers raise, create CRM event work orders, dispatch them to the appropriate customer service staff for handling, and archive the record once the customer's problem has been resolved.
In the course of making the present invention, the inventors found at least the following problem in the prior art:
Among the large number of CRM event work orders accumulated in daily work, many concern events that occur repeatedly. For example, user A asks about returns and exchanges; after handling the inquiry, the customer service staff record the customer's problem, the handling process, and the result in a CRM event, yet other users later continue to ask the same question. The prior art does not use these historical CRM event work orders to provide suggestions for current CRM events, which leads to low customer service efficiency and a serious waste of resources.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for text matching that can identify the historical text matching the text of a current event, and thereby provide suggestions for resolving the current event.
To achieve the above object, according to one aspect of the invention, a method for text matching is provided.
The text matching method of an embodiment of the invention comprises: determining at least one feature word of a current text, and obtaining a feature vector of the current text from the at least one feature word; using the feature vector of the current text to calculate the similarity between the current text and any historical text among a plurality of historical texts; and selecting a historical text whose similarity to the current text satisfies a preset rule as the matching text of the current text.
Optionally, obtaining the feature vector of the current text from the at least one feature word comprises: calculating the weight value of each of the at least one feature word in the current text, and generating the feature vector of the current text.
Optionally, the method further comprises: calculating the weight value of each feature word of the current text in the current text according to the following formula:
where i is a positive integer, W_i1 is the weight value of the i-th feature word of the current text in the current text, t_i1 is the occurrence probability of the i-th feature word in the current text, d_i1 is the number of texts in the text library that contain the i-th feature word, N is the total number of texts in the text library, and the text library consists of the current text and the historical texts.
Optionally, using the feature vector of the current text to calculate the similarity between the current text and any historical text among the plurality of historical texts comprises: obtaining the feature vector of the historical text; and using the feature vector of the current text and the feature vector of the historical text to calculate the similarity between the current text and the historical text.
Optionally, obtaining the feature vector of any historical text comprises: determining at least one feature word of the historical text; and calculating the weight value of each of the at least one feature word in the historical text to generate the feature vector of the historical text.
Optionally, the method further comprises: calculating the similarity between the current text and any historical text using the following formula:
$$S = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where i and n are positive integers, A_i is the i-th component of the feature vector of the current text, B_i is the i-th component of the feature vector of the historical text, and S is the similarity between the current text and the historical text.
Optionally, selecting a historical text whose similarity to the current text satisfies a preset rule as the matching text of the current text comprises:
comparing the similarity between each historical text and the current text with a preset threshold, and selecting the historical texts whose similarity exceeds the preset threshold; and, among those historical texts, selecting the one with the greatest similarity as the matching text of the current text.
To achieve the above object, according to another aspect of the invention, an apparatus for text matching is provided.
The text matching apparatus of an embodiment of the invention comprises: a feature vector acquisition module, configured to determine at least one feature word of a current text and obtain a feature vector of the current text from the at least one feature word; and a matching module, configured to use the feature vector of the current text to calculate the similarity between the current text and any historical text among a plurality of historical texts, and to select a historical text whose similarity to the current text satisfies a preset rule as the matching text of the current text.
Optionally, the feature vector acquisition module is configured to: calculate the weight value of each of the at least one feature word in the current text, and generate the feature vector of the current text.
Optionally, the feature vector acquisition module is configured to: calculate the weight value of each feature word of the current text in the current text according to the following formula:
where i is a positive integer, W_i1 is the weight value of the i-th feature word of the current text in the current text, t_i1 is the occurrence probability of the i-th feature word in the current text, d_i1 is the number of texts in the text library that contain the i-th feature word, N is the total number of texts in the text library, and the text library consists of the current text and the historical texts.
Optionally, the matching module is configured to: obtain the feature vector of any historical text; and use the feature vector of the current text and the feature vector of the historical text to calculate the similarity between the current text and the historical text.
Optionally, the matching module is configured to: calculate the similarity between the current text and any historical text using the following formula:
$$S = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where i and n are positive integers, A_i is the i-th component of the feature vector of the current text, B_i is the i-th component of the feature vector of the historical text, and S is the similarity between the current text and the historical text.
To achieve the above object, according to another aspect of the invention, an electronic device is provided.
An electronic device of the invention comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the text matching method provided by the invention.
To achieve the above object, according to a further aspect of the invention, a computer-readable storage medium is provided.
A computer-readable storage medium of the invention stores a computer program which, when executed by a processor, implements the text matching method provided by the invention.
According to the technical solution of the invention, an embodiment of the invention has the following advantages or beneficial effects. Extracting the feature words of CRM event texts to determine their feature vectors gives the text data a numerical representation that is convenient for subsequent similarity calculation. Calculating similarity from the feature vectors makes it possible to judge accurately how well two texts match. Selecting a historical text that satisfies the preset rule as the matching text of the current text achieves automatic, fast text matching that does not depend on human experience, so that historical reference information can be pushed automatically to customer service staff. This overcomes the prior-art defect that customer service staff rely entirely on their own experience to solve customer problems.
Further effects of the above non-conventional optional implementations are described below in conjunction with the specific embodiments.
Brief description of the drawings
The accompanying drawings are provided for a better understanding of the invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the text matching method according to an embodiment of the invention;
Fig. 2 is a schematic diagram of the main parts of the text matching apparatus according to an embodiment of the invention;
Fig. 3 is an overall flow chart of CRM event handling in the prior art;
Fig. 4 is an overall flow chart of CRM event handling using the text matching method according to an embodiment of the invention;
Fig. 5 is a detailed flow chart of CRM event handling using the text matching method according to an embodiment of the invention;
Fig. 6 is a diagram of an exemplary system architecture to which embodiments of the invention can be applied;
Fig. 7 is a structural diagram of an electronic device for implementing the text matching method of an embodiment of the invention.
Detailed description
Exemplary embodiments of the invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
The technical solution of the embodiments of the invention extracts the feature words of CRM event texts to determine their feature vectors, which gives the text data a numerical representation that is convenient for subsequent similarity calculation. Calculating similarity from the feature vectors makes it possible to judge accurately how well two texts match. Selecting a historical text that satisfies the preset rule as the matching text of the current text achieves automatic, fast text matching that does not depend on human experience, so that historical reference information can be pushed automatically to customer service staff. This overcomes the prior-art defect that customer service staff rely entirely on their own experience to solve customer problems.
Embodiment one
Fig. 1 is a schematic diagram of the main steps of the text matching method according to this embodiment.
As shown in Fig. 1, the text matching method of this embodiment mainly comprises the following steps:
Step S101: determine at least one feature word of the current text, and obtain the feature vector of the current text from the at least one feature word.
In this step, word segmentation and stop-word removal are applied to the current text to obtain the feature words that carry substantive content. Once the feature words of the current text have been obtained, the weight value of each feature word in the current text is calculated, which yields the feature vector of the current text.
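The segmentation and stop-word removal in this step can be sketched as follows. This is only an illustration under stated assumptions: it tokenizes English text with a regular expression as a stand-in for a Chinese word segmenter, and both the `STOP_WORDS` set and the `extract_feature_words` name are hypothetical, not taken from the patent.

```python
import re

# Hypothetical stop-word list; a real system would load a much larger one.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "was", "for", "on", "in"}


def extract_feature_words(text, stop_words=STOP_WORDS):
    """Split text into lowercase tokens and drop stop words.

    A regular expression stands in for a real word segmenter here;
    Chinese text would need a dedicated segmentation tool instead.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in stop_words]


print(extract_feature_words("The customer asked for a refund"))
```

Repeated words are kept in the returned list on purpose, since the weighting step needs occurrence counts.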
Specifically, the weight value of a feature word of the current text in the current text can be obtained by the following formula:
where i is a positive integer, W_i1 is the weight value of the i-th feature word of the current text in the current text, t_i1 is the occurrence probability of the i-th feature word in the current text, d_i1 is the number of texts in the text library that contain the i-th feature word, N is the total number of texts in the text library, and the text library consists of the current text and the historical texts. The occurrence probability t_i1 is the ratio of the number of occurrences of the i-th feature word in the current text to the total number of feature words in the text library.
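The weighting formula itself is not reproduced in this text. One form that agrees with the symbol definitions (weight increasing with t_i1 and decreasing with d_i1) is the familiar TF-IDF product shown below. It is offered as an assumption: the use of a logarithm, its base, and the absence of any smoothing term are not confirmed by the source, and the symbols c_i1 (occurrences of the i-th feature word in the current text) and M (total number of feature words in the text library) are introduced here only for illustration.

```latex
% Assumed TF-IDF form; c_{i1} and M are illustrative symbols, not from the source.
W_{i1} = t_{i1} \cdot \log\frac{N}{d_{i1}},
\qquad
t_{i1} = \frac{c_{i1}}{M}
```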
In this embodiment, the feature vector of any historical text among the plurality of historical texts can be obtained before, during, or after step S101. Specifically, the feature words of the historical text are obtained first, the weight value of each feature word in that historical text is then calculated, and the feature vector of the historical text is finally generated from the weight values of its feature words.
In practice, the weight value of a feature word of a historical text in that historical text can be obtained by the following formula:
where i is a positive integer, W_i2 is the weight value of the i-th feature word of the historical text in that historical text, t_i2 is the occurrence probability of the i-th feature word in the historical text, and d_i2 is the number of texts in the text library that contain the i-th feature word. The occurrence probability t_i2 is the ratio of the number of occurrences of the i-th feature word in the historical text to the total number of feature words in the text library.
It will be understood that the weight value of a feature word represents how important that feature word is in the text, so the vector formed from the weight values of several feature words can serve as the feature vector of the text and be used for matching. In general, the higher the occurrence probability of a feature word in a text, the larger its weight value. At the same time, the more texts in the text library a feature word appears in, the less distinctive it is, and the smaller its weight value. The weight values of the feature words of the current text and the historical texts can therefore be calculated with the above formulas for W_i1 and W_i2.
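The two tendencies described above (weight grows with occurrence probability, shrinks with the number of texts containing the word) can be sketched in code. The function below assumes a plain TF-IDF product with a natural logarithm; the formula in the source is not reproduced in this text, so the exact form, the function name `feature_weight`, and its parameter names are all assumptions.

```python
import math


def feature_weight(count_in_text, total_feature_words, texts_containing, total_texts):
    """Assumed TF-IDF weight: occurrence probability times log(N / d).

    count_in_text        occurrences of the feature word in this text
    total_feature_words  total number of feature words in the text library
    texts_containing     d, number of texts that contain the feature word
    total_texts          N, total number of texts in the text library
    """
    occurrence_probability = count_in_text / total_feature_words
    return occurrence_probability * math.log(total_texts / texts_containing)


# A word that occurs more often in the text gets a larger weight ...
print(feature_weight(3, 22, 2, 3) > feature_weight(1, 22, 2, 3))
# ... and a word that appears in more texts gets a smaller one.
print(feature_weight(1, 22, 2, 3) < feature_weight(1, 22, 1, 3))
```

Under this assumed form, a feature word that appears in every text of the library receives a weight of zero, because log(N / N) = 0.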
In particular, the feature vector of the current text and the feature vector of any historical text use feature words as vector dimensions and the weight value of each feature word as the component in that dimension. Both are K-dimensional vectors, where K, a positive integer, is the total number of feature words of the current text and the historical text. When the feature vector of a text is generated, the component in the dimension of a feature word that the text contains is the weight value of that feature word, and the component in the dimension of a feature word that the text does not contain is zero.
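A minimal sketch of this construction, assuming the weight values have already been computed and are held in dictionaries keyed by feature word. The function name `build_vectors` and the alphabetical ordering of dimensions are illustrative choices, not part of the source.

```python
def build_vectors(weights_current, weights_history):
    """Align two {feature word: weight} maps on a shared K-dimensional space.

    Dimensions are the union of both texts' feature words; a text gets a
    zero component for every feature word it does not contain.
    """
    dimensions = sorted(set(weights_current) | set(weights_history))
    vec_current = [weights_current.get(word, 0.0) for word in dimensions]
    vec_history = [weights_history.get(word, 0.0) for word in dimensions]
    return dimensions, vec_current, vec_history


dims, cur, hist = build_vectors(
    {"refund": 0.05, "customer": 0.018},
    {"customer": 0.018, "call": 0.05},
)
print(dims)  # shared dimensions
print(cur)   # current-text components
print(hist)  # historical-text components
```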
It should be noted that in this step, after obtaining current text and the characteristic vector of history text, it can also preserve
Characteristic vector is used for subsequent match.In other words, subsequently each time in matching process, can't to history text repeated characteristic to
The above-mentioned calculation procedure of amount, but the characteristic vector of current matching process is directly obtained according to the characteristic vector of preservation.It is actual to answer
In, if the Feature Words of the current text of matching process are all contained in text library next time, the feature of each history text
Vector is constant;If if next time there are the Feature Words not having in text library in the current text of matching process, in preservation
The null component for increasing new feature word dimension in the characteristic vector of each history text is the current signature vector for forming each history text.
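The reuse of saved vectors can be sketched as below. The data layout (a shared `vocabulary` list plus a dictionary of saved vectors keyed by event number) and the name `extend_saved_vectors` are assumptions made for illustration.

```python
def extend_saved_vectors(vocabulary, saved_vectors, current_feature_words):
    """Add a zero component to every saved vector for each unseen feature word.

    vocabulary            list of feature words, one per vector dimension
    saved_vectors         {event number: list of weights}, aligned with vocabulary
    current_feature_words feature words of the new current text
    Both vocabulary and saved_vectors are modified in place.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order.
    new_words = [w for w in dict.fromkeys(current_feature_words) if w not in vocabulary]
    if new_words:
        vocabulary.extend(new_words)
        for vector in saved_vectors.values():
            vector.extend([0.0] * len(new_words))
    return new_words


vocab = ["customer", "refund"]
saved = {1: [0.018, 0.05]}
print(extend_saved_vectors(vocab, saved, ["customer", "urge", "urge"]))
print(vocab, saved)
```

When the current text introduces no new feature words, the function returns an empty list and leaves the saved vectors untouched, which mirrors the "remains unchanged" case in the text.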
Step S102: use the feature vector of the current text to calculate the similarity between the current text and any historical text among the plurality of historical texts, and select a historical text whose similarity to the current text satisfies the preset rule as the matching text of the current text.
In this step, the similarity between the current text and a historical text is calculated from the feature vector of the current text and the feature vector of that historical text.
Specifically, in this step, the similarity between the current text and any historical text is calculated by the following formula:
$$S = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where n is a positive integer, A_i is the i-th component of the feature vector of the current text, B_i is the i-th component of the feature vector of the historical text, and S is the similarity between the current text and the historical text.
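As a sketch, and assuming the similarity S is the cosine of the angle between the two feature vectors (an assumption that reproduces the value 0.22 reported for d1 and d432 later in this description), the calculation can be written as follows. Returning 0.0 for an all-zero vector is a convention chosen here, not something the source specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length weight vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumed convention for a text with no weighted feature words
    return dot / (norm_a * norm_b)


print(cosine_similarity([3.0, 4.0], [4.0, 3.0]))
```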
In this embodiment of the invention, the preset rule is as follows: the similarity between each historical text and the current text is compared with a preset threshold to determine the historical texts whose similarity exceeds the threshold; among those historical texts, the one with the greatest similarity is selected as the matching text of the current text. It will be understood that the preset rule can be set flexibly according to actual needs. For example, the historical text with the greatest similarity to the current text can also be selected directly as the matching text of the current text.
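The preset rule can be sketched as a filter followed by a maximum. The function name `select_match`, the dictionary input, and the choice to return `None` when nothing exceeds the threshold are illustrative assumptions.

```python
def select_match(similarities, threshold):
    """Pick the historical text with the greatest similarity above the threshold.

    similarities: {historical text number: similarity to the current text}
    Returns the number of the matching text, or None if no similarity
    exceeds the threshold.
    """
    candidates = {num: s for num, s in similarities.items() if s > threshold}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


print(select_match({1: 0.22, 2: 0.61, 3: 0.58}, threshold=0.5))
print(select_match({1: 0.22}, threshold=0.5))
```

The simpler variant mentioned in the text, which always takes the most similar historical text, corresponds to calling the function with a threshold below every possible similarity value.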
Through the above steps, the text that best matches the current text can be determined from the historical texts, and relevant information can then be obtained from the matching text.
As the text matching method of this embodiment shows, extracting the feature words of CRM event texts to determine their feature vectors gives the text data a numerical representation that is convenient for subsequent similarity calculation. Calculating similarity from the feature vectors makes it possible to judge accurately how well two texts match. Selecting a historical text that satisfies the preset rule as the matching text of the current text achieves automatic, fast text matching that does not depend on human experience, so that historical reference information can be pushed automatically to customer service staff. This overcomes the prior-art defect that customer service staff rely entirely on their own experience to solve customer problems.
Embodiment two
Fig. 2 is a schematic diagram of the main parts of the text matching apparatus according to this embodiment.
As shown in Fig. 2, the text matching apparatus 200 of this embodiment may include a feature vector acquisition module 201 and a matching module 202, where:
The feature vector acquisition module 201 may be used to determine at least one feature word of the current text and to obtain the feature vector of the current text from the at least one feature word.
The matching module 202 may be used to calculate, using the feature vector of the current text, the similarity between the current text and any historical text among a plurality of historical texts, and to select a historical text whose similarity to the current text satisfies the preset rule as the matching text of the current text.
Preferably, in this embodiment, the feature vector acquisition module 201 may be used to calculate the weight value of each of the at least one feature word in the current text and to generate the feature vector of the current text.
In this embodiment, the feature vector acquisition module 201 may be used to calculate the weight value of each feature word of the current text in the current text according to the following formula:
where i is a positive integer, W_i1 is the weight value of the i-th feature word of the current text in the current text, t_i1 is the occurrence probability of the i-th feature word in the current text, d_i1 is the number of texts in the text library that contain the i-th feature word, N is the total number of texts in the text library, and the text library consists of the current text and the historical texts.
In practice, the matching module 202 may be used to obtain the feature vector of any historical text, and to calculate the similarity between the current text and the historical text using the feature vector of the current text and the feature vector of the historical text.
In an optional implementation of this embodiment, the matching module 202 may be used to calculate the similarity between the current text and any historical text using the following formula:
$$S = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where i and n are positive integers, A_i is the i-th component of the feature vector of the current text, B_i is the i-th component of the feature vector of the historical text, and S is the similarity between the current text and the historical text.
As can be seen from the above, the text matching apparatus of this embodiment extracts the feature words of CRM event texts to determine their feature vectors, which gives the text data a numerical representation that is convenient for subsequent similarity calculation. Calculating similarity from the feature vectors makes it possible to judge accurately how well two texts match. Selecting a historical text that satisfies the preset rule as the matching text of the current text achieves automatic, fast text matching that does not depend on human experience, so that historical reference information can be pushed automatically to customer service staff. This overcomes the prior-art defect that customer service staff rely entirely on their own experience to solve customer problems.
Embodiment three
It will be understood that the text matching method of the embodiments of the invention can be used for text matching in most technical fields. The method is described below using the matching of CRM event texts as an example. It should be pointed out that the specific technical content of the CRM event text matching described below does not limit the text matching method of the embodiments of the invention in any way.
The CRM event handling flow of the prior art is shown in Fig. 3. As Fig. 3 shows, a customer first calls in with an inquiry. After answering, the customer service staff communicate with the customer, record the customer's problem together with an event summary category judged from experience, and generate a new CRM event. The system then dispatches the CRM event work order to the designated customer service staff. After receiving the dispatched task, the customer service staff resolve the customer's problem by following the standardized service process prescribed by the enterprise, based on the problem description and event summary category previously recorded in the system. Once the work order has been handled, the event is closed and the CRM event is stored in the CRM event library.
The CRM system described above serves only as a workflow management tool that implements work-order dispatch and event record storage. It does not take into account the useful information carried in historical event work orders, so front-line customer service staff keep handling recurring events in the course of their service, which leads to low efficiency and a serious waste of resources.
To address this problem, this embodiment provides the CRM event handling flow shown in Fig. 4. As Fig. 4 shows, a text matching apparatus is set up first. When a new CRM event is created, it is sent to the text matching apparatus, and the matching information obtained automatically is provided to the customer service staff who will solve the problem. In this embodiment, the text information of a historical CRM event includes: number, event summary category text, problem description text, and processing result text. The text information of a new CRM event includes: number, event summary category text, problem description text, and creation time. In what follows, the problem description text of the new CRM event is taken as the current text, the matching text is determined from the problem description texts of the historical CRM events (i.e., the historical texts), and the processing result text corresponding to the matching text is then output as a suggestion to the customer service staff handling the new CRM event.
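The two record layouts listed above can be sketched as simple data classes. The class and field names below are hypothetical English renderings of the listed fields, and `suggestion_for` is an illustrative helper, not something defined in the source.

```python
from dataclasses import dataclass


@dataclass
class HistoricalCrmEvent:
    number: int
    summary_category: str     # event summary category text
    problem_description: str  # serves as the historical text for matching
    processing_result: str    # returned to staff as the suggestion


@dataclass
class NewCrmEvent:
    number: int
    summary_category: str
    problem_description: str  # serves as the current text for matching
    created_at: str           # creation time, kept as a plain string here


def suggestion_for(matched_event: HistoricalCrmEvent) -> str:
    """Return the processing result text of the matched historical event."""
    return matched_event.processing_result


past = HistoricalCrmEvent(1, "after-sales", "customer requests exchange", "exchange approved")
print(suggestion_for(past))
```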
In a specific application, the text matching apparatus includes a feature vector acquisition module and a matching module, where:
The feature vector acquisition module is used to collect the text information of the new CRM event and the historical CRM events, determine the feature words of the current text and the historical texts in that information, and calculate the feature vectors of the current text and the historical texts from the feature words.
The matching module is used to calculate the similarity between the current text and each historical text from the feature vectors, and to select a historical text whose similarity satisfies the preset rule as the matching text.
Specifically, the feature vector acquisition module performs the following steps:
1. Collect the text information of the new CRM event and the historical CRM events.
For example, the collected text information of the historical CRM events is shown in the following table:
The collected text information of the new CRM event is shown in the following table:
2. Perform word segmentation and stop-word removal on the current text (i.e., the problem description text of the new CRM event) and on the historical texts (i.e., the problem description texts of the historical CRM events) to obtain the feature words of the current text and the historical texts.
For example, processing the current text and historical texts of the preceding example yields the following feature words:
3. For each text among the historical texts and the current text, calculate the weight value of each of its feature words in that text.
Specifically, the weight value is calculated by the following formula:
where i is a positive integer, W_i is the weight value of the i-th feature word of the text in that text, t_i is the occurrence probability of that feature word in the text, d_i is the number of texts in the text library that contain that feature word, N is the total number of texts in the text library, and the text library consists of the current text and the historical texts. The occurrence probability t_i is the ratio of the number of occurrences of the feature word in the text to the total number of feature words in the text library.
4. Construct the feature vector of each text from the weight values of its feature words.
For example, for the historical text numbered 1 in the example above (d1) and the current text (d432), the calculation yields the weight values W_i shown in the following table:
Feature word | d1 | d432
customer | 0.055 | 0.018
incoming call | 0 | 0.05
feedback | 0.05 | 0
receive | 0.018 | 0.018
repair | 0.05 | 0
replace | 0.05 | 0
hardware | 0.05 | 0
forcibly | 0.05 | 0
request | 0.018 | 0.018
exchange goods | 0.05 | 0
after-sales | 0.05 | 0
verify | 0.05 | 0
accept | 0.05 | 0
urge | 0.018 | 0.018
review | 0.018 | 0.018
That is, the feature vectors of the historical text and the current text are, respectively:
D1 = {0.055, 0, 0.05, 0.018, 0.05, 0.05, 0.05, 0.05, 0.018, 0.05, 0.05, 0.05, 0.05, 0.018, 0.018}
D432 = {0.018, 0.05, 0, 0.018, 0, 0, 0, 0, 0.018, 0, 0, 0, 0, 0.018, 0.018}
The matching module performs the following steps:
1. Calculate the similarity between any historical text and the current text using the following formula:
$$S = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2}\,\sqrt{\sum_{i=1}^{n} B_i^2}}$$
where n is a positive integer, A_i is the i-th component of the feature vector of the current text, B_i is the i-th component of the feature vector of the historical text, and S is the similarity between the current text and the historical text.
For example, calculating with d1 and d432 from the example above gives a similarity of 0.22.
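The value 0.22 can be checked by hand if the similarity is taken to be the cosine of the two vectors D1 and D432 listed above (an assumption about the formula, which the arithmetic below supports). Only five dimensions are non-zero in both vectors, so the dot product has five terms:

```latex
% Worked check, assuming cosine similarity between D1 and D432.
\sum_i A_i B_i = 0.055 \cdot 0.018 + 4 \cdot 0.018^2 = 0.002286

\lVert D_1 \rVert = \sqrt{0.055^2 + 9 \cdot 0.05^2 + 4 \cdot 0.018^2}
                  = \sqrt{0.026821} \approx 0.1638

\lVert D_{432} \rVert = \sqrt{0.05^2 + 5 \cdot 0.018^2}
                      = \sqrt{0.00412} \approx 0.0642

S \approx \frac{0.002286}{0.1638 \cdot 0.0642} \approx 0.217 \approx 0.22
```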
2. Take the historical text whose similarity satisfies the preset rule as the matching text of the current text. In general, the preset rule can be: first select the historical texts whose similarity exceeds the preset threshold, then choose from them the historical text with the greatest similarity and output it as the matching text.
3. Output the processing result text corresponding to the matching text (the solution step in Fig. 5) to the customer service staff as a suggestion for solving the problem.
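The steps of both modules can be combined into one short end-to-end sketch. It assumes a plain TF-IDF weight (occurrence probability times log(N / d)) and cosine similarity; neither formula is reproduced in this text, so both are assumptions, and the function names and sample feature words below are purely illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(feature_words_by_text):
    """Build aligned weight vectors for every text in the library.

    feature_words_by_text: {text id: list of feature words, repeats allowed}
    Assumes weight = (count / total feature words in library) * log(N / d).
    """
    total_words = sum(len(words) for words in feature_words_by_text.values())
    n_texts = len(feature_words_by_text)
    doc_freq = Counter()
    for words in feature_words_by_text.values():
        doc_freq.update(set(words))  # count each text at most once per word
    dims = sorted(doc_freq)
    vectors = {}
    for text_id, words in feature_words_by_text.items():
        counts = Counter(words)
        vectors[text_id] = [
            counts[w] / total_words * math.log(n_texts / doc_freq[w]) for w in dims
        ]
    return dims, vectors


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(current_id, feature_words_by_text, threshold):
    """Return the id of the best-matching historical text, or None."""
    _, vectors = tfidf_vectors(feature_words_by_text)
    current = vectors[current_id]
    scores = {
        tid: cosine(current, vec) for tid, vec in vectors.items() if tid != current_id
    }
    above = {tid: s for tid, s in scores.items() if s > threshold}
    return max(above, key=above.get) if above else None


library = {
    "h1": ["refund", "delay", "urge"],
    "h2": ["repair", "hardware", "replace"],
    "new": ["refund", "urge", "review"],
}
print(best_match("new", library, threshold=0.1))
```

The processing result text stored with the returned historical event would then be shown to the customer service staff as the suggestion.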
Through the above steps of the feature vector acquisition module and the matching module, this embodiment matches the current text automatically and makes full use of the valuable information in historical events, which greatly improves the efficiency and quality of customer service work.
Fig. 5 is a detailed flow chart of CRM event handling using the text matching method according to an embodiment of the invention. The specific steps performed by the feature vector acquisition module and the matching module can be seen in Fig. 5.
It should be emphasized that the application of the text matching method of this embodiment in a CRM system does not limit the invention in any way. In fact, the text matching method of the invention can be used in any technical field and technical environment characterized by feature words and historical texts. For example, the method is applicable to product recommendation based on search text, replies to user inquiry texts, and customer satisfaction evaluation processing in the internet field, and to replies to readers' suggestions in the publishing field.
Fig. 6 shows an exemplary system architecture 600 to which the text matching method or text matching apparatus of an embodiment of the present invention can be applied.
As shown in Fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605 (this architecture is only an example; the components included in a specific architecture can be adjusted according to the specific application). The network 604 is the medium that provides communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 through the network 604 in order to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 601, 602, 603, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, and social platform software (examples only).
The terminal devices 601, 602, 603 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 605 may be a server that provides various services, for example a back-end management server (example only) that supports shopping websites browsed by users with the terminal devices 601, 602, 603. The back-end management server may analyze and otherwise process received data such as information query requests, and feed the processing results (for example, targeted push information or product information; examples only) back to the terminal devices.
It should be noted that the text matching method provided by the embodiments of the present invention is generally executed by the server 605; accordingly, the text matching apparatus is generally provided in the server 605.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 6 are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.
The present invention also provides an electronic device.
The electronic device of an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the text matching method provided by the present invention.
Referring now to Fig. 7, a schematic structural diagram of a computer system 700 suitable for implementing the electronic device of an embodiment of the present invention is shown. The electronic device shown in Fig. 7 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores the various programs and data required for the operation of the computer system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT) or liquid crystal display (LCD) and the like, as well as a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.
In particular, according to the embodiments disclosed by the present invention, the process described above with reference to the main-step diagram may be implemented as a computer software program. For example, an embodiment of the present invention includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the main-step diagram. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above-described functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and any combination of blocks in a block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor including a sample conversion rate calculation module and a confidence interval determination module. The names of these units do not, under certain circumstances, limit the units themselves; for example, the feature vector acquisition module may also be described as "a unit that sends the feature vector of the current text to the connected matching module".
In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to perform steps including: determining at least one feature word of a current text, and obtaining a feature vector of the current text from the at least one feature word; calculating, using the feature vector of the current text, the similarity between the current text and any historical text among multiple historical texts; and selecting a historical text whose similarity to the current text satisfies a preset rule as the matched text of the current text.
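The three steps carried by such a program can be sketched end to end in Python. This is only an illustration under stated assumptions: texts are assumed to be already segmented into word lists, every distinct word is treated as a feature word (the patent selects feature words by its own procedure), the weight uses the formula given in claim 3, and the names `feature_vector`, `cosine`, `match`, and the sample texts are hypothetical.

```python
import math
from collections import Counter


def feature_vector(words, feature_words, library):
    """Weight each feature word as t * ln((1 + N) / d), as in claim 3."""
    counts = Counter(words)
    n_texts = len(library)  # N: total number of texts in the text library
    vector = []
    for term in feature_words:
        # t: share of the text's words that are this feature word
        t = counts[term] / len(words) if words else 0.0
        # d: number of texts in the library that contain the feature word
        d = sum(1 for text in library if term in text)
        vector.append(t * math.log((1 + n_texts) / d) if d else 0.0)
    return vector


def cosine(a, b):
    """Cosine similarity; an all-zero vector yields 0 (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match(current_words, history, threshold):
    """Return the id of the most similar historical text above the threshold."""
    # The text library consists of the current text plus the historical texts.
    library = [current_words] + list(history.values())
    # Assumption: every distinct word in the library is a feature word.
    feature_words = sorted({w for text in library for w in text})
    current_vec = feature_vector(current_words, feature_words, library)
    best_id, best_sim = None, threshold
    for text_id, words in history.items():
        sim = cosine(current_vec, feature_vector(words, feature_words, library))
        if sim > best_sim:
            best_id, best_sim = text_id, sim
    return best_id


# Hypothetical, pre-segmented texts for illustration only.
history = {
    "h1": ["refund", "order", "delay"],
    "h2": ["password", "reset", "login"],
}
print(match(["order", "refund", "delay"], history, threshold=0.5))  # prints h1
```

Building all vectors over one shared list of feature words keeps the components aligned, which the similarity formula requires.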
According to the technical solution of the embodiments of the present invention, the feature vector of a text is determined by extracting the feature words of the CRM event text, which gives the text data a numerical representation and facilitates the subsequent similarity calculation; the similarity calculation based on feature vectors makes it possible to judge accurately how well texts match; and by selecting the historical text that satisfies the preset rule as the matched text of the current text, texts are matched automatically and quickly without relying on human experience. Historical reference information can then be pushed automatically to customer service staff, which remedies the defect of the prior art that customer service staff rely entirely on experience to solve customer problems.
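As a worked illustration of the numerical representation, the weight formula stated in claim 3 can be evaluated for a single feature word. The values are hypothetical and chosen only to keep the arithmetic simple: the feature word makes up one tenth of the words of the current text ($t_{i1} = 0.1$), the text library holds $N = 999$ texts, and $d_{i1} = 10$ of them contain the word.

```latex
W_{i1} = t_{i1} \times \ln\left(\frac{1+N}{d_{i1}}\right)
       = 0.1 \times \ln\left(\frac{1+999}{10}\right)
       = 0.1 \times \ln(100)
       \approx 0.46
```

By the same formula, a word that appears in every text of the library ($d_{i1} = N$) gets a weight close to zero when N is large, so common words contribute little to the feature vector.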
The specific embodiments above do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.
Claims (14)
- 1. A text matching method, characterized by comprising: determining at least one feature word of a current text, and obtaining a feature vector of the current text from the at least one feature word; calculating, using the feature vector of the current text, the similarity between the current text and any historical text among multiple historical texts; and selecting a historical text whose similarity to the current text satisfies a preset rule as the matched text of the current text.
- 2. The method according to claim 1, characterized in that obtaining the feature vector of the current text from the at least one feature word comprises: calculating the weight of each of the at least one feature word in the current text, and generating the feature vector of the current text.
- 3. The method according to claim 2, characterized in that the method further comprises: calculating the weight of the at least one feature word in the current text according to the following formula: $W_{i1} = t_{i1} \times \ln\left(\frac{1+N}{d_{i1}}\right)$, where i is a positive integer, $W_{i1}$ is the weight in the current text of the i-th feature word of the current text, $t_{i1}$ is the probability of occurrence of the i-th feature word in the current text, $d_{i1}$ is the number of texts in the text library that contain the i-th feature word, and N is the total number of texts in the text library, the text library consisting of the current text and the historical texts.
- 4. The method according to claim 1, characterized in that calculating, using the feature vector of the current text, the similarity between the current text and any historical text among multiple historical texts comprises: obtaining the feature vector of the historical text; and calculating the similarity between the current text and the historical text using the feature vector of the current text and the feature vector of the historical text.
- 5. The method according to claim 4, characterized in that obtaining the feature vector of the historical text comprises: determining at least one feature word of the historical text; and calculating the weight of each of the at least one feature word in the historical text, and generating the feature vector of the historical text.
- 6. The method according to claim 4, characterized in that the method further comprises: calculating the similarity between the current text and the historical text using the following formula: $S = \frac{\sum_{i=1}^{n} (A_i \times B_i)}{\sqrt{\left(\sum_{i=1}^{n} A_i^2\right)\left(\sum_{i=1}^{n} B_i^2\right)}}$, where i and n are positive integers, $A_i$ is the i-th component of the feature vector of the current text, $B_i$ is the i-th component of the feature vector of the historical text, and S is the similarity between the current text and the historical text.
- 7. The method according to any one of claims 1-6, characterized in that selecting a historical text whose similarity to the current text satisfies a preset rule as the matched text of the current text comprises: comparing the similarity between each historical text and the current text with a preset threshold, and selecting the historical texts whose similarity is greater than the preset threshold; and, among the historical texts whose similarity is greater than the preset threshold, selecting the historical text with the greatest similarity as the matched text of the current text.
- 8. A text matching apparatus, characterized by comprising: a feature vector acquisition module, configured to determine at least one feature word of a current text and to obtain a feature vector of the current text from the at least one feature word; and a matching module, configured to calculate, using the feature vector of the current text, the similarity between the current text and any historical text among multiple historical texts, and to select a historical text whose similarity to the current text satisfies a preset rule as the matched text of the current text.
- 9. The apparatus according to claim 8, characterized in that the feature vector acquisition module is configured to: calculate the weight of each of the at least one feature word in the current text, and generate the feature vector of the current text.
- 10. The apparatus according to claim 9, characterized in that the feature vector acquisition module is configured to: calculate the weight of the at least one feature word in the current text according to the following formula: $W_{i1} = t_{i1} \times \ln\left(\frac{1+N}{d_{i1}}\right)$, where i is a positive integer, $W_{i1}$ is the weight in the current text of the i-th feature word of the current text, $t_{i1}$ is the probability of occurrence of the i-th feature word in the current text, $d_{i1}$ is the number of texts in the text library that contain the i-th feature word, and N is the total number of texts in the text library, the text library consisting of the current text and the historical texts.
- 11. The apparatus according to claim 8, characterized in that the matching module is configured to: obtain the feature vector of the historical text; and calculate the similarity between the current text and the historical text using the feature vector of the current text and the feature vector of the historical text.
- 12. The apparatus according to claim 8, characterized in that the matching module is configured to: calculate the similarity between the current text and the historical text using the following formula: $S = \frac{\sum_{i=1}^{n} (A_i \times B_i)}{\sqrt{\left(\sum_{i=1}^{n} A_i^2\right)\left(\sum_{i=1}^{n} B_i^2\right)}}$, where i and n are positive integers, $A_i$ is the i-th component of the feature vector of the current text, $B_i$ is the i-th component of the feature vector of the historical text, and S is the similarity between the current text and the historical text.
- 13. An electronic device, characterized by comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
- 14. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710607397.4A CN107346344A (en) | 2017-07-24 | 2017-07-24 | The method and apparatus of text matches |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710607397.4A CN107346344A (en) | 2017-07-24 | 2017-07-24 | The method and apparatus of text matches |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107346344A true CN107346344A (en) | 2017-11-14 |
Family
ID=60256940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710607397.4A Pending CN107346344A (en) | 2017-07-24 | 2017-07-24 | The method and apparatus of text matches |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107346344A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103389987A (en) * | 2012-05-09 | 2013-11-13 | 阿里巴巴集团控股有限公司 | Text similarity comparison method and system |
CN103207899A (en) * | 2013-03-19 | 2013-07-17 | 新浪网技术(中国)有限公司 | Method and system for recommending text files |
CN104239512A (en) * | 2014-09-16 | 2014-12-24 | 电子科技大学 | Text recommendation method |
CN105335496A (en) * | 2015-10-22 | 2016-02-17 | 国网山东省电力公司电力科学研究院 | Customer service repeated call treatment method based on cosine similarity text mining algorithm |
Non-Patent Citations (1)
Title |
---|
段利国 等: "综合句法结构及语义相似度的问题推荐技术", 《计算机科学》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107958061A (en) * | 2017-12-01 | 2018-04-24 | 厦门快商通信息技术有限公司 | The computational methods and computer-readable recording medium of a kind of text similarity |
CN109102157A (en) * | 2018-07-11 | 2018-12-28 | 交通银行股份有限公司 | A kind of bank's work order worksheet processing method and system based on deep learning |
CN109242516A (en) * | 2018-09-06 | 2019-01-18 | 北京京东尚科信息技术有限公司 | The single method and apparatus of processing service |
CN110457430A (en) * | 2019-07-02 | 2019-11-15 | 北京瑞卓喜投科技发展有限公司 | A kind of Traceability detection method of text, device and equipment |
CN113762846A (en) * | 2020-10-22 | 2021-12-07 | 北京京东振世信息技术有限公司 | Method and device for distinguishing facial sheet text |
CN113762846B (en) * | 2020-10-22 | 2024-04-16 | 北京京东振世信息技术有限公司 | Method and device for distinguishing face sheet text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171114 |