CN107346344A - The method and apparatus of text matches - Google Patents

Publication number
CN107346344A
CN107346344A CN201710607397.4A CN201710607397A CN107346344A CN 107346344 A CN107346344 A CN 107346344A CN 201710607397 A CN201710607397 A CN 201710607397A CN 107346344 A CN107346344 A CN 107346344A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710607397.4A
Other languages
Chinese (zh)
Inventor
李建星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710607397.4A
Publication of CN107346344A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and apparatus for text matching and relates to the computer field. One embodiment of the method includes: determining at least one feature word of a current text and obtaining a feature vector of the current text from the at least one feature word; using the feature vector of the current text to calculate the similarity between the current text and any one of multiple history texts; and selecting a history text whose similarity with the current text satisfies a preset rule as the matched text of the current text. The embodiment can identify the history text that matches the text of a current event and thereby provide suggestions for resolving the current event.

Description

Method and apparatus for text matching
Technical Field
The present invention relates to the computer field, and in particular to a method and apparatus for text matching.
Background
At present, process-oriented CRM (Customer Relationship Management) systems are widely used in the customer service work of enterprises. Their main function is to record the problem a customer asks about, generate a CRM event work order, dispatch it to the appropriate customer service staff for handling, and archive the record once the customer's problem is solved.
In the course of implementing the present invention, the inventor found that the prior art has at least the following problems:
Among the large number of CRM event work orders accumulated in daily work, many concern recurring events. For example, user A asks about returning or exchanging goods; after handling the request, the customer service staff record the customer's problem, the handling process, and the result in a CRM event, yet other users later ask the same question. The prior art does not use these history CRM event work orders to provide suggestions for current CRM events, so customer service efficiency is low and resources are seriously wasted.
Summary of the Invention
In view of this, embodiments of the present invention provide a method and apparatus for text matching that can identify the history text matching the text of a current event, and thereby provide suggestions for resolving the current event.
To achieve the above object, according to one aspect of the present invention, a method for text matching is provided.
The method for text matching of the embodiment of the present invention includes: determining at least one feature word of a current text, and obtaining a feature vector of the current text from the at least one feature word; using the feature vector of the current text to calculate the similarity between the current text and any one of multiple history texts; and selecting a history text whose similarity with the current text satisfies a preset rule as the matched text of the current text.
Optionally, obtaining the feature vector of the current text from the at least one feature word includes: calculating the weight of each of the at least one feature word in the current text, and generating the feature vector of the current text.
Optionally, the method further includes: calculating the weight of a feature word of the current text in the current text according to the following formula:
where i is a positive integer, $W_{i1}$ is the weight of the i-th feature word of the current text in the current text, $t_{i1}$ is the occurrence probability of the i-th feature word in the current text, $d_{i1}$ is the number of texts in the text library that contain the i-th feature word, and $N$ is the total number of texts in the text library, which consists of the current text and the history texts.
Optionally, using the feature vector of the current text to calculate the similarity between the current text and any one of the multiple history texts includes: obtaining the feature vector of the history text; and using the feature vector of the current text and the feature vector of the history text to calculate the similarity between the current text and the history text.
Optionally, obtaining the feature vector of any history text includes: determining at least one feature word of the history text; and calculating the weight of each of the at least one feature word in the history text to generate the feature vector of the history text.
Optionally, the method further includes: calculating the similarity between the current text and any history text using the following formula:
where i and n are positive integers, $A_i$ is the i-th component of the feature vector of the current text, $B_i$ is the i-th component of the feature vector of the history text, and $S$ is the similarity between the current text and the history text.
Optionally, selecting a history text whose similarity with the current text satisfies a preset rule as the matched text of the current text includes:
comparing the similarity between each history text and the current text with a preset threshold and selecting the history texts whose similarity exceeds the preset threshold; and, among those history texts, selecting the one with the greatest similarity as the matched text of the current text.
To achieve the above object, according to another aspect of the present invention, an apparatus for text matching is provided.
The apparatus for text matching of the embodiment of the present invention includes: a feature vector acquisition module for determining at least one feature word of a current text and obtaining a feature vector of the current text from the at least one feature word; and a matching module for using the feature vector of the current text to calculate the similarity between the current text and any one of multiple history texts, and for selecting a history text whose similarity with the current text satisfies a preset rule as the matched text of the current text.
Optionally, the feature vector acquisition module is configured to: calculate the weight of each of the at least one feature word in the current text, and generate the feature vector of the current text.
Optionally, the feature vector acquisition module is configured to: calculate the weight of a feature word of the current text in the current text according to the following formula:
where i is a positive integer, $W_{i1}$ is the weight of the i-th feature word of the current text in the current text, $t_{i1}$ is the occurrence probability of the i-th feature word in the current text, $d_{i1}$ is the number of texts in the text library that contain the i-th feature word, and $N$ is the total number of texts in the text library, which consists of the current text and the history texts.
Optionally, the matching module is configured to: obtain the feature vector of any history text; and use the feature vector of the current text and the feature vector of the history text to calculate the similarity between the current text and the history text.
Optionally, the matching module is configured to: calculate the similarity between the current text and any history text using the following formula:
where i and n are positive integers, $A_i$ is the i-th component of the feature vector of the current text, $B_i$ is the i-th component of the feature vector of the history text, and $S$ is the similarity between the current text and the history text.
To achieve the above object, according to a further aspect of the present invention, an electronic device is provided.
An electronic device of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for text matching provided by the present invention.
To achieve the above object, according to yet another aspect of the present invention, a computer-readable storage medium is provided.
A computer-readable storage medium of the present invention stores a computer program which, when executed by a processor, implements the method for text matching provided by the present invention.
According to the technical solution of the present invention, an embodiment of the above invention has the following advantages or beneficial effects: extracting the feature words of CRM event texts to determine the feature vector of a text gives a numerical representation of the text data, which facilitates subsequent similarity calculation; calculating similarity from feature vectors allows the degree of matching between texts to be judged accurately; and selecting a history text that satisfies the preset rule as the matched text of the current text achieves automatic, fast text matching that does not depend on human experience, so that history reference information can be pushed automatically to customer service staff. This overcomes the prior-art defect that customer service staff rely entirely on their own experience to solve customer problems.
Further effects of the above optional implementations are explained below in conjunction with the detailed description.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of the method for text matching according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main parts of the apparatus for text matching according to an embodiment of the present invention;
Fig. 3 is an overall flowchart of CRM event handling in the prior art;
Fig. 4 is an overall CRM event flowchart of the method for text matching according to an embodiment of the present invention;
Fig. 5 is a detailed CRM event flowchart of the method for text matching according to an embodiment of the present invention;
Fig. 6 is a diagram of an exemplary system architecture to which embodiments of the present invention can be applied;
Fig. 7 is a schematic structural diagram of an electronic device for implementing the method for text matching of an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. They include various details of the embodiments to aid understanding, and these should be regarded as merely exemplary. Those of ordinary skill in the art should therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
The technical solution of the embodiments of the present invention extracts the feature words of CRM event texts to determine the feature vector of a text, which gives a numerical representation of the text data and facilitates subsequent similarity calculation. Calculating similarity from feature vectors allows the degree of matching between texts to be judged accurately. Selecting a history text that satisfies the preset rule as the matched text of the current text achieves automatic, fast text matching that does not depend on human experience, so that history reference information can be pushed automatically to customer service staff. This overcomes the prior-art defect that customer service staff rely entirely on their own experience to solve customer problems.
Embodiment 1
Fig. 1 is a schematic diagram of the main steps of the method for text matching according to this embodiment.
As shown in Fig. 1, the method for text matching of the embodiment of the present invention mainly includes the following steps:
Step S101: determine at least one feature word of the current text, and obtain the feature vector of the current text from the at least one feature word.
In this step, word segmentation and stop-word removal are applied to the current text to obtain feature words that carry substantive content. Once the feature words of the current text are obtained, the weight of each feature word in the current text is calculated, and the feature vector of the current text can be generated.
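The segmentation and stop-word removal in this step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `feature_words` and the stop-word list are hypothetical, and a regular expression over English text stands in for the Chinese word segmenter a real CRM deployment would need.

```python
import re

# Hypothetical stop-word list; a real system would load a much larger one.
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "for", "my"}

def feature_words(text):
    """Segment a text into lowercase tokens and drop stop words.

    Repeated words are kept, because later steps count occurrences.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

print(feature_words("The customer asks to return the goods"))
# -> ['customer', 'asks', 'return', 'goods']
```

Keeping duplicates matters here because the occurrence probability used in the weight calculation depends on how often each feature word appears.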
Specifically, the weight of a feature word of the current text in the current text can be obtained by the following formula:
where i is a positive integer, $W_{i1}$ is the weight of the i-th feature word of the current text in the current text, $t_{i1}$ is the occurrence probability of the i-th feature word in the current text, $d_{i1}$ is the number of texts in the text library that contain the i-th feature word, and $N$ is the total number of texts in the text library. The text library consists of the current text and the history texts, and the occurrence probability $t_{i1}$ is the ratio of the number of occurrences of the i-th feature word in the current text to the total number of feature words in the text library.
In this embodiment, before, during, or after step S101, the feature vector of any one of the multiple history texts can be obtained. Specifically, the feature words of the history text are obtained first, then the weight of each feature word in the history text is calculated, and finally the feature vector of the history text is generated from the weights of the feature words.
In practice, the weight of a feature word of a history text in that history text can be obtained by the following formula:
where i is a positive integer, $W_{i2}$ is the weight of the i-th feature word of the history text in that history text, $t_{i2}$ is the occurrence probability of the i-th feature word in the history text, and $d_{i2}$ is the number of texts in the text library that contain the i-th feature word. The occurrence probability $t_{i2}$ is the ratio of the number of occurrences of the i-th feature word in the history text to the total number of feature words in the text library.
It will be understood that the weight of a feature word represents its importance in the text, so the vector formed by the weights of multiple feature words can serve as the feature vector of the text and be used for matching. In general, the higher the occurrence probability of a feature word in a text, the larger its weight; and the more often a feature word occurs across the text library, the less distinctive it is and the smaller its weight. The weights of the feature words of the current text and the history texts can therefore be calculated by the above formulas for $W_{i1}$ and $W_{i2}$.
In particular, the feature vector of the current text and the feature vector of any history text use feature words as vector dimensions and take the weight of each feature word as the component in its dimension. Both are K-dimensional vectors, where K, a positive integer, is the total number of feature words of the current text and the history texts. When the feature vector of a text is generated, the component in the dimension of a feature word the text contains is the weight of that feature word, and the component in the dimension of a feature word the text does not contain is zero.
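The construction of K-dimensional feature vectors described above can be sketched as follows. The weight formula is not shown in this text, so the sketch assumes a TF-IDF form with a natural logarithm; the function name `build_vectors` and the sample texts are hypothetical.

```python
import math
from collections import Counter

def build_vectors(texts):
    """Map each text's feature words to a K-dimensional weight vector.

    texts: dict of text id -> list of feature words (repeats allowed).
    Assumption: weight = t * ln(N / d); the source does not show the formula.
    """
    # One dimension per distinct feature word across current and history texts.
    vocab = sorted({w for words in texts.values() for w in words})
    total_words = sum(len(words) for words in texts.values())  # denominator of t
    n_texts = len(texts)                                       # N
    # d: number of texts that contain each feature word.
    doc_freq = Counter(w for words in texts.values() for w in set(words))
    vectors = {}
    for text_id, words in texts.items():
        counts = Counter(words)  # a missing word counts as 0, giving a zero component
        vectors[text_id] = [
            counts[w] / total_words * math.log(n_texts / doc_freq[w])
            for w in vocab
        ]
    return vocab, vectors

texts = {
    "current": ["return", "goods"],
    "h1": ["return", "refund"],
    "h2": ["repair", "screen"],
}
vocab, vectors = build_vectors(texts)
print(vocab)  # ['goods', 'refund', 'repair', 'return', 'screen']
```

Every vector has the same length and ordering, so components at the same index always refer to the same feature word. Under the assumed formula, a word that appears in every text gets weight zero.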
It should be noted that in this step, after the feature vectors of the current text and the history texts are obtained, they can be saved for subsequent matching. In other words, in each subsequent matching process the above feature vector calculation is not repeated for the history texts; instead, the feature vectors for the current matching process are obtained directly from the saved ones. In practice, if the feature words of the current text in the next matching process are all already contained in the text library, the feature vector of each history text is unchanged. If the current text in the next matching process contains feature words not present in the text library, a zero component is added to each saved history feature vector for each new feature word dimension, forming the current feature vector of each history text.
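Reusing saved vectors in this way can be sketched as below. The function name `extend_saved_vectors` and the sample data are hypothetical; the sketch follows the text literally and leaves the saved weights unchanged, only appending zero components for new dimensions.

```python
def extend_saved_vectors(vocab, saved, new_words):
    """Append a zero component for each feature word not yet in the vocabulary.

    vocab: ordered list of known feature words (one per dimension).
    saved: dict of history text id -> saved feature vector.
    new_words: feature words of the incoming current text.
    """
    # dict.fromkeys removes duplicates while keeping first-seen order.
    added = [w for w in dict.fromkeys(new_words) if w not in vocab]
    new_vocab = list(vocab) + added
    extended = {
        text_id: list(vector) + [0.0] * len(added)
        for text_id, vector in saved.items()
    }
    return new_vocab, extended

new_vocab, extended = extend_saved_vectors(
    ["return", "goods"], {"h1": [0.1, 0.2]}, ["goods", "invoice", "invoice"]
)
print(new_vocab)       # ['return', 'goods', 'invoice']
print(extended["h1"])  # [0.1, 0.2, 0.0]
```

When the current text introduces no new feature words, `added` is empty and the saved vectors are returned as copies with identical contents.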
Step S102: using the feature vector of the current text, calculate the similarity between the current text and any one of the multiple history texts; select a history text whose similarity with the current text satisfies a preset rule as the matched text of the current text.
In this step, the similarity between the current text and a history text is calculated from the feature vector of the current text and the feature vector of that history text.
Specifically, in this step, the similarity between the current text and any history text is calculated by the following formula:
where n is a positive integer, $A_i$ is the i-th component of the feature vector of the current text, $B_i$ is the i-th component of the feature vector of the history text, and $S$ is the similarity between the current text and the history text.
In the embodiment of the present invention, the preset rule is: compare the similarity between each history text and the current text with a preset threshold and determine the history texts whose similarity exceeds the preset threshold; among those history texts, select the one with the greatest similarity as the matched text of the current text. It will be understood that the preset rule can be set flexibly according to actual needs. For example, the history text with the greatest similarity to the current text can also be selected directly as the matched text of the current text.
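The preset rule can be sketched as a threshold filter followed by a maximum. The function name `select_match`, the text ids, and the threshold value are hypothetical; returning `None` when nothing exceeds the threshold is a design choice of this sketch, since the text does not say what happens in that case.

```python
def select_match(similarities, threshold):
    """Return the id of the best history text above the threshold, or None.

    similarities: dict of history text id -> similarity with the current text.
    """
    # Keep only history texts whose similarity exceeds the preset threshold.
    candidates = {tid: s for tid, s in similarities.items() if s > threshold}
    if not candidates:
        return None
    # Among the candidates, pick the one with the greatest similarity.
    return max(candidates, key=candidates.get)

print(select_match({"h1": 0.22, "h2": 0.61, "h3": 0.40}, threshold=0.3))  # h2
```

Setting the threshold to a value below every possible similarity reduces this to the simpler rule mentioned above, which selects the most similar history text directly.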
Through the above steps, the text that best matches the current text can be determined from the history texts, and relevant information can then be obtained from the matched text.
As the method for text matching of the embodiment of the present invention shows, extracting the feature words of CRM event texts to determine the feature vector of a text gives a numerical representation of the text data and facilitates subsequent similarity calculation. Calculating similarity from feature vectors allows the degree of matching between texts to be judged accurately. Selecting a history text that satisfies the preset rule as the matched text of the current text achieves automatic, fast text matching that does not depend on human experience, so that history reference information can be pushed automatically to customer service staff. This overcomes the prior-art defect that customer service staff rely entirely on their own experience to solve customer problems.
Embodiment 2
Fig. 2 is a schematic diagram of the main parts of the apparatus for text matching according to this embodiment.
As shown in Fig. 2, the apparatus 200 for text matching of this embodiment may include a feature vector acquisition module 201 and a matching module 202, where:
The feature vector acquisition module 201 can be used to determine at least one feature word of the current text and obtain the feature vector of the current text from the at least one feature word.
The matching module 202 can be used to calculate, using the feature vector of the current text, the similarity between the current text and any one of multiple history texts, and to select a history text whose similarity with the current text satisfies a preset rule as the matched text of the current text.
Preferably, in this embodiment, the feature vector acquisition module 201 can be used to: calculate the weight of each of the at least one feature word in the current text, and generate the feature vector of the current text.
In this embodiment, the feature vector acquisition module 201 can be used to: calculate the weight of a feature word of the current text in the current text according to the following formula:
where i is a positive integer, $W_{i1}$ is the weight of the i-th feature word of the current text in the current text, $t_{i1}$ is the occurrence probability of the i-th feature word in the current text, $d_{i1}$ is the number of texts in the text library that contain the i-th feature word, and $N$ is the total number of texts in the text library, which consists of the current text and the history texts.
In practice, the matching module 202 can be used to: obtain the feature vector of any history text; and use the feature vector of the current text and the feature vector of the history text to calculate the similarity between the current text and the history text.
In an optional implementation of this embodiment, the matching module 202 can be used to: calculate the similarity between the current text and any history text using the following formula:
where i and n are positive integers, $A_i$ is the i-th component of the feature vector of the current text, $B_i$ is the i-th component of the feature vector of the history text, and $S$ is the similarity between the current text and the history text.
As can be seen from the above, the apparatus for text matching of the embodiment of the present invention extracts the feature words of CRM event texts to determine the feature vector of a text, which gives a numerical representation of the text data and facilitates subsequent similarity calculation. Calculating similarity from feature vectors allows the degree of matching between texts to be judged accurately. Selecting a history text that satisfies the preset rule as the matched text of the current text achieves automatic, fast text matching that does not depend on human experience, so that history reference information can be pushed automatically to customer service staff. This overcomes the prior-art defect that customer service staff rely entirely on their own experience to solve customer problems.
Embodiment 3
It will be understood that the method for text matching of the embodiment of the present invention can be used for text matching in most technical fields. Below it is described using the text matching of CRM events as an example. Note that the specific technical content of the CRM event text matching below does not limit the method for text matching of the embodiment of the present invention in any way.
The prior-art CRM event handling flow is shown in Fig. 3. As Fig. 3 shows, a customer first calls in with an inquiry; the customer service staff answer, communicate with the customer, record the customer's problem together with an event summary category judged from experience, and generate a new CRM event. The system then dispatches the CRM event work order to the designated customer service staff. After receiving the dispatched task, those staff solve the customer's problem following the standard service procedure prescribed by the enterprise, based on the problem description and event summary category previously recorded in the system. Once handling is complete, the event work order is closed and the CRM event is stored in the CRM event library.
The above CRM system serves only as a workflow management tool that implements work order dispatch and event record storage. It does not take into account the useful information carried in history event work orders, so front-line customer service staff keep handling repetitive events during service, which results in low efficiency and serious waste of resources.
In view of the above problems, this embodiment provides the CRM event handling flow shown in Fig. 4. As Fig. 4 shows, an apparatus for text matching is first set up; when a new CRM event is created, it is sent to the apparatus for text matching, and the matching information obtained automatically is provided to the customer service staff solving the problem. In this embodiment, history CRM event text information includes: number, event summary category text, problem description text, and result text. New CRM event text information includes: number, event summary category text, problem description text, and creation time. Below, the problem description text of the new CRM event is taken as the current text, a matched text is determined from the problem description texts of history CRM events (the history texts), and the result text corresponding to the matched text is output as a suggestion for the customer service staff handling the new CRM event.
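The two record types listed above can be sketched as simple data classes. The class names, field names, field types, and sample values are all hypothetical; only the list of fields comes from the text.

```python
from dataclasses import dataclass

@dataclass
class HistoryEvent:
    number: int
    summary_category: str
    problem_text: str   # used as the history text during matching
    result_text: str    # returned to staff as the suggestion

@dataclass
class NewEvent:
    number: int
    summary_category: str
    problem_text: str   # used as the current text during matching
    created_at: str

def suggestion_from(matched):
    """Return the result text of the matched history event, if there is one."""
    return matched.result_text if matched is not None else None

history = HistoryEvent(1, "returns", "customer asks to return goods", "return approved")
print(suggestion_from(history))  # return approved
```

Only `problem_text` takes part in matching; `result_text` is carried along so it can be handed to the customer service staff once a match is found.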
In a concrete application, the above apparatus for text matching includes a feature vector acquisition module and a matching module, where:
The feature vector acquisition module collects the text information of the new CRM event and the history CRM events, determines the feature words of the current text and the history texts in that information, and calculates the feature vectors of the current text and the history texts from the feature words.
The matching module calculates the similarity between the current text and each history text from the feature vectors, and selects a history text whose similarity satisfies the preset rule as the matched text.
Specifically, the feature vector acquisition module performs the following steps:
1. Collect the text information of the new CRM event and the history CRM events.
For example, the collected history CRM event text information is shown in the table below:
The collected new CRM event text information is shown in the table below:
2. Apply word segmentation and stop-word removal to the current text (the problem description text of the new CRM event) and the history texts (the problem description texts of the history CRM events) to obtain the feature words of the current text and the history texts.
For example, processing the current text and history texts of the preceding example yields the following feature words:
3. For each text among the history texts and the current text, calculate the weight of each of its feature words in that text.
Specifically, the weight is calculated by the following formula:
where i is a positive integer, $W_i$ is the weight of the i-th feature word of the text in that text, $t_i$ is the occurrence probability of the feature word in the text, $d_i$ is the number of texts in the text library that contain the feature word, and $N$ is the total number of texts in the text library. The text library consists of the current text and the history texts, and the occurrence probability $t_i$ is the ratio of the number of occurrences of the feature word in the text to the total number of feature words in the text library.
4. Construct the feature vector of each text from the weights of its feature words.
For example, calculating for the history text numbered 1 in the above example (d1) and the current text (d432) gives the feature vectors shown in the table below:
Feature word       Weight in d1   Weight in d432
customer           0.055          0.018
incoming call      0              0.05
feedback           0.05           0
receive            0.018          0.018
repair             0.05           0
replace            0.05           0
hardware           0.05           0
forcibly           0.05           0
request            0.018          0.018
exchange goods     0.05           0
after-sales        0.05           0
verify             0.05           0
accept             0.05           0
urge               0.018          0.018
review             0.018          0.018
That is, the feature vectors of the history text and the current text are respectively:
d1 = {0.055, 0, 0.05, 0.018, 0.05, 0.05, 0.05, 0.05, 0.018, 0.05, 0.05, 0.05, 0.05, 0.018, 0.018}
d432 = {0.018, 0.05, 0, 0.018, 0, 0, 0, 0, 0.018, 0, 0, 0, 0, 0.018, 0.018}
The matching module performs the following steps:
1. Calculate the similarity between any history text and the current text using the following formula:
where n is a positive integer, $A_i$ is the i-th component of the feature vector of the current text, $B_i$ is the i-th component of the feature vector of the history text, and $S$ is the similarity between the current text and the history text.
For example, calculating on d1 and d432 from the above example gives a similarity of 0.22.
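The figure of 0.22 can be checked with a short calculation. The similarity formula is not shown in this text, so the sketch assumes cosine similarity; the function name `cosine_similarity` is hypothetical, while the two vectors are the ones listed above.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

d1 = [0.055, 0, 0.05, 0.018, 0.05, 0.05, 0.05, 0.05,
      0.018, 0.05, 0.05, 0.05, 0.05, 0.018, 0.018]
d432 = [0.018, 0.05, 0, 0.018, 0, 0, 0, 0,
        0.018, 0, 0, 0, 0, 0.018, 0.018]

similarity = cosine_similarity(d1, d432)
print(round(similarity, 2))  # 0.22
```

The result agrees with the similarity stated in the example, which supports reading the missing formula as cosine similarity.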
2. similarity is met to matched text of the history text as current text of preset rules.Usually, rule are preset Can be then:The history text that similarity is more than to predetermined threshold value first is chosen, and therefrom chooses the maximum history of similarity afterwards Text exports as matched text.
3. being exported the solution link of text in Figure 5 is handled corresponding to matched text to contact staff, asked as solution The suggestion of topic.
Through the above steps of the feature vector acquisition module and the matching module, this embodiment matches the current text automatically, makes full use of the valuable information in historical events, and greatly improves the efficiency and quality of the customer service staff's work.
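The preset rule in step 2 of the matching module (threshold first, then maximum) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `select_match`, the text identifiers other than d1, the similarity values other than 0.22, and the threshold of 0.3 are all hypothetical, and returning `None` when no history text passes the threshold is an assumption the source does not address.

```python
def select_match(similarities, threshold):
    """Pick the matched history text under the threshold-then-maximum rule.

    similarities -- dict mapping history text id -> similarity to the current text
    threshold    -- preset threshold; only strictly greater similarities qualify
    Returns the id of the best candidate, or None if nothing qualifies.
    """
    candidates = {tid: s for tid, s in similarities.items() if s > threshold}
    if not candidates:
        return None  # assumption: no suggestion is pushed in this case
    return max(candidates, key=candidates.get)


# Hypothetical similarities between the current text and three history texts.
scores = {"d1": 0.22, "d17": 0.61, "d85": 0.48}
best = select_match(scores, threshold=0.3)
```

Here d1 is filtered out by the threshold, and d17 is chosen over d85 because its similarity is the highest.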
Fig. 5 is a detailed CRM event flowchart of the text matching method according to an embodiment of the invention. The specific steps performed by the feature vector acquisition module and the matching module can be seen in Fig. 5.
It should be emphasized that applying the text matching method of this embodiment in a CRM system does not limit the invention in any way. In fact, the text matching method of the invention can be used in any technical field and environment that has feature words and history text features. For example, it is applicable to product recommendation from search text, replies to user inquiry text, and customer satisfaction evaluation processing in the Internet field, and to replies to reader suggestions in the publishing field.
Fig. 6 shows an exemplary system architecture 600 to which the text matching method or text matching device of an embodiment of the invention can be applied.
As shown in Fig. 6, the system architecture 600 may include terminal devices 601, 602, and 603, a network 604, and a server 605 (this architecture is only an example; the components of a specific architecture can be adjusted to suit the application). The network 604 provides the medium for communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 601, 602, 603 to interact with the server 605 through the network 604, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 601, 602, 603, such as shopping applications, web browsers, search applications, instant messaging tools, email clients, and social platform software (examples only).
The terminal devices 601, 602, 603 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 605 may be a server that provides various services, such as a back-end management server that supports a shopping website browsed by users on the terminal devices 601, 602, 603 (example only). The back-end management server may analyze and otherwise process received data such as information query requests, and feed the processing results (such as targeted push information or product information; examples only) back to the terminal devices.
It should be noted that the text matching method provided by the embodiments of the invention is generally executed by the server 605; accordingly, the text matching device is generally provided in the server 605.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 6 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
The invention also provides an electronic device.
The electronic device of an embodiment of the invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the text matching method provided by the invention.
Referring now to Fig. 7, it shows a structural diagram of a computer system 700 suitable for implementing the electronic device of an embodiment of the invention. The electronic device shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the invention.
As shown in Fig. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage part 708 into a random access memory (RAM) 703. The RAM 703 also stores the various programs and data required for the operation of the computer system 700. The CPU 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
The following components are connected to the I/O interface 705: an input part 706 including a keyboard, a mouse, and the like; an output part 707 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage part 708 including a hard disk and the like; and a communication part 709 including a network interface card such as a LAN card or a modem. The communication part 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage part 708 as needed.
In particular, according to the disclosed embodiments of the invention, the processes described above with reference to the main-step diagrams may be implemented as computer software programs. For example, an embodiment of the invention includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the main-step diagram. In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the functions defined above in the system of the invention are performed.
It should be noted that the computer-readable medium described in the invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF, and so on, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the invention may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor including a feature vector acquisition module and a matching module. The names of these units do not, in some cases, limit the units themselves; for example, the feature vector acquisition module may also be described as "a unit that sends the feature vector of the current text to the connected matching module".
In another aspect, the invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to perform steps including: determining at least one feature word of a current text and obtaining a feature vector of the current text from the at least one feature word; using the feature vector of the current text to calculate the similarity between the current text and any history text among a plurality of history texts; and selecting a history text whose similarity with the current text satisfies a preset rule as the matched text of the current text.
According to the technical solution of the embodiments of the invention, the feature vector of a text is determined by extracting the feature words of the CRM event text, which gives the text data a numerical representation and facilitates the subsequent similarity calculation; the similarity calculation based on feature vectors makes it possible to judge accurately how well texts match; and selecting a history text that satisfies the preset rule as the matched text of the current text achieves automatic, fast text matching that does not depend on human experience. History reference information can then be pushed automatically to customer service staff, which addresses the defect in the prior art that customer service staff rely entirely on experience to solve customer problems.
The above specific embodiments do not limit the scope of protection of the invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can occur depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (14)

  1. A text matching method, characterized by comprising:
    determining at least one feature word of a current text, and obtaining a feature vector of the current text from the at least one feature word;
    using the feature vector of the current text to calculate the similarity between the current text and any history text among a plurality of history texts; and selecting a history text whose similarity with the current text satisfies a preset rule as the matched text of the current text.
  2. The method according to claim 1, characterized in that obtaining the feature vector of the current text from the at least one feature word comprises:
    calculating the weight of each of the at least one feature word in the current text, and generating the feature vector of the current text.
  3. The method according to claim 2, characterized in that the method further comprises: calculating the weight of the at least one feature word in the current text according to the following formula:
    $$W_{i1} = t_{i1} \times \ln\left(\frac{1 + N}{d_{i1}}\right)$$
    where i is a positive integer, W_{i1} is the weight of the i-th feature word of the current text in the current text, t_{i1} is the occurrence probability of the i-th feature word in the current text, d_{i1} is the number of texts in the text library that contain the i-th feature word, and N is the total number of texts in the text library, which consists of the current text and the history texts.
  4. The method according to claim 1, characterized in that using the feature vector of the current text to calculate the similarity between the current text and any history text among the plurality of history texts comprises:
    obtaining the feature vector of the history text;
    calculating the similarity between the current text and the history text using the feature vector of the current text and the feature vector of the history text.
  5. The method according to claim 4, characterized in that obtaining the feature vector of the history text comprises:
    determining at least one feature word of the history text;
    calculating the weight of each of the at least one feature word in the history text, and generating the feature vector of the history text.
  6. The method according to claim 4, characterized in that the method further comprises: calculating the similarity between the current text and the history text using the following formula:
    $$S = \frac{\sum_{i=1}^{n} (A_i \times B_i)}{\sqrt{\left(\sum_{i=1}^{n} A_i^2\right)\left(\sum_{i=1}^{n} B_i^2\right)}}$$
    where i and n are positive integers, A_i is the i-th component of the feature vector of the current text, B_i is the i-th component of the feature vector of the history text, and S is the similarity between the current text and the history text.
  7. The method according to any one of claims 1-6, characterized in that selecting a history text whose similarity with the current text satisfies a preset rule as the matched text of the current text comprises:
    comparing the similarity between each history text and the current text with a preset threshold, and selecting the history texts whose similarity is greater than the preset threshold;
    selecting, from the history texts whose similarity is greater than the preset threshold, the history text with the greatest similarity as the matched text of the current text.
  8. A text matching device, characterized by comprising:
    a feature vector acquisition module, configured to determine at least one feature word of a current text and obtain a feature vector of the current text from the at least one feature word;
    a matching module, configured to use the feature vector of the current text to calculate the similarity between the current text and any history text among a plurality of history texts, and to select a history text whose similarity with the current text satisfies a preset rule as the matched text of the current text.
  9. The device according to claim 8, characterized in that the feature vector acquisition module is configured to:
    calculate the weight of each of the at least one feature word in the current text, and generate the feature vector of the current text.
  10. The device according to claim 9, characterized in that the feature vector acquisition module is configured to:
    calculate the weight of the at least one feature word in the current text according to the following formula:
    $$W_{i1} = t_{i1} \times \ln\left(\frac{1 + N}{d_{i1}}\right)$$
    where i is a positive integer, W_{i1} is the weight of the i-th feature word of the current text in the current text, t_{i1} is the occurrence probability of the i-th feature word in the current text, d_{i1} is the number of texts in the text library that contain the i-th feature word, and N is the total number of texts in the text library, which consists of the current text and the history texts.
  11. The device according to claim 8, characterized in that the matching module is configured to:
    obtain the feature vector of the history text; and calculate the similarity between the current text and the history text using the feature vector of the current text and the feature vector of the history text.
  12. The device according to claim 8, characterized in that the matching module is configured to:
    calculate the similarity between the current text and the history text using the following formula:
    $$S = \frac{\sum_{i=1}^{n} (A_i \times B_i)}{\sqrt{\left(\sum_{i=1}^{n} A_i^2\right)\left(\sum_{i=1}^{n} B_i^2\right)}}$$
    where i and n are positive integers, A_i is the i-th component of the feature vector of the current text, B_i is the i-th component of the feature vector of the history text, and S is the similarity between the current text and the history text.
  13. An electronic device, characterized by comprising:
    one or more processors;
    a storage device for storing one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
  14. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
CN201710607397.4A 2017-07-24 2017-07-24 The method and apparatus of text matches Pending CN107346344A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710607397.4A CN107346344A (en) 2017-07-24 2017-07-24 The method and apparatus of text matches

Publications (1)

Publication Number Publication Date
CN107346344A true CN107346344A (en) 2017-11-14

Family

ID=60256940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710607397.4A Pending CN107346344A (en) 2017-07-24 2017-07-24 The method and apparatus of text matches

Country Status (1)

Country Link
CN (1) CN107346344A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207899A (en) * 2013-03-19 2013-07-17 新浪网技术(中国)有限公司 Method and system for recommending text files
CN103389987A (en) * 2012-05-09 2013-11-13 阿里巴巴集团控股有限公司 Text similarity comparison method and system
CN104239512A (en) * 2014-09-16 2014-12-24 电子科技大学 Text recommendation method
CN105335496A (en) * 2015-10-22 2016-02-17 国网山东省电力公司电力科学研究院 Customer service repeated call treatment method based on cosine similarity text mining algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Duan Liguo et al.: "Question recommendation technology combining syntactic structure and semantic similarity", Computer Science *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107958061A (en) * 2017-12-01 2018-04-24 厦门快商通信息技术有限公司 The computational methods and computer-readable recording medium of a kind of text similarity
CN109102157A (en) * 2018-07-11 2018-12-28 交通银行股份有限公司 A kind of bank's work order worksheet processing method and system based on deep learning
CN109242516A (en) * 2018-09-06 2019-01-18 北京京东尚科信息技术有限公司 The single method and apparatus of processing service
CN110457430A (en) * 2019-07-02 2019-11-15 北京瑞卓喜投科技发展有限公司 A kind of Traceability detection method of text, device and equipment
CN113762846A (en) * 2020-10-22 2021-12-07 北京京东振世信息技术有限公司 Method and device for distinguishing facial sheet text
CN113762846B (en) * 2020-10-22 2024-04-16 北京京东振世信息技术有限公司 Method and device for distinguishing face sheet text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171114