CN109726283A - Method for identifying power-service customer demands based on text similarity measurement - Google Patents
Method for identifying power-service customer demands based on text similarity measurement
- Publication number: CN109726283A
- Application number: CN201811463322.4A
- Authority: CN (China)
- Prior art keywords: text, work order, customer, demand, classification
- Legal status: Pending (status as listed by Google, which states that it is an assumption and not a legal conclusion)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a method for identifying power-service customer demands based on text similarity measurement. The method proceeds as follows. First, a customer-demand hotspot system table is established. Second, the text is preprocessed: the work-order text is segmented into words and vectorized, long text content is split, and stop words are removed. Stop words are modal particles, measure words, punctuation marks, and other high-frequency words that carry no information for text analysis. Third, the text is classified automatically: based on the identified topics and their corresponding dictionaries, a classification algorithm classifies the full set of customer-service work orders. The advantage of the invention is that cosine similarity can automatically and accurately identify multiple topics in a text. The invention therefore innovatively combines text similarity measurement with work-order data to identify precisely all of a customer's demands in each work order.
Description
Technical field:
The present invention relates to methods specifically applicable to electric power customer-service departments. In particular, it relates to a method for identifying power-service customer demands based on text similarity measurement.
Background:
Information technologies such as Internet Plus, big data, and cloud computing have developed rapidly. As a result, most information has moved from paper to electronic media, and much of it is unstructured or semi-structured text. Effectively managing, mining, and analyzing the information contained in massive unstructured data has become a new challenge in the big-data field.

Text plays an important role among unstructured data. For enterprises that hold large volumes of text, how effectively they use this data resource shapes their future development.

A power-industry customer service center faces a specific version of this problem. It must process work-order data so as to accurately identify the customer demands in each work order. It should also mine implicit demands and promptly detect sudden surges of new demands. These capabilities are essential for improving service quality and customer satisfaction.
There are two common text-similarity methods for mining information from text data. One is the SimHash algorithm. The other is the cosine similarity algorithm, which is related to cosine distance (defined as 1 minus the similarity).

Cosine similarity uses the cosine of the angle between two vectors in a vector space to measure how different two items are. The closer the cosine is to 1, the closer the angle is to 0 degrees, and the more similar the two vectors are. Because the method reduces text processing to vector operations in a vector space, it gives accurate results and is well suited to short texts.
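The cosine measure described above computes cos θ = (A · B) / (‖A‖ ‖B‖) for two text vectors A and B. The following minimal Python sketch applies it to bag-of-words term-count vectors. The source does not say how texts are vectorized, so raw term counts are an assumption here, and the function name `cosine_similarity` is hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Cosine of the angle between two bag-of-words count vectors.

    For non-negative counts the result lies in [0, 1]; 1.0 means the two
    vectors point in the same direction (angle of 0 degrees).
    """
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Dot product: only terms present in both texts contribute.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty text as dissimilar to everything
    return dot / (norm_a * norm_b)


# Two short, already-segmented work-order snippets (illustrative data).
print(cosine_similarity(["outage", "phone", "verify"], ["outage", "phone"]))
# about 0.816, i.e. 2 / sqrt(6)
```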
Power-customer work orders do not always contain a single demand, so it is especially important to identify all the demands in each work order. Machine-learning text classification algorithms can identify only a single demand per text. They therefore cannot handle work orders that contain several demands.

The work-order text itself adds further difficulty. Customer-service staff record and transcribe customer demands, so the text is long and not well condensed or standardized. A single work order may contain multiple demands, and the same demand may be recorded in different ways.
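The limitation above can be made concrete. A single-label classifier keeps only the highest-scoring topic. Thresholding per-topic similarity scores, by contrast, can return several topics for one work order. The topic names, scores, and the 0.3 threshold in this sketch are hypothetical values chosen for illustration, not figures from the source.

```python
def single_label(scores: dict[str, float]) -> str:
    """Single-label decision: keep only the top-scoring topic."""
    return max(scores, key=scores.get)


def multi_label(scores: dict[str, float], threshold: float = 0.3) -> list[str]:
    """Multi-label decision: keep every topic whose score reaches the
    threshold, ordered from most to least similar."""
    hits = [topic for topic, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda t: scores[t], reverse=True)


# Hypothetical similarity scores of one work order against three demand topics.
scores = {"multi-household outage": 0.62, "electricity bill query": 0.41, "meter fault": 0.05}
print(single_label(scores))  # multi-household outage
print(multi_label(scores))   # ['multi-household outage', 'electricity bill query']
```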
Summary of the invention:
The invention identifies demands in the text data of power-service customer work orders mainly through text similarity measurement. A cosine similarity algorithm mines and analyzes the preprocessed text data to identify all of a customer's demands in each work order. This allows each customer's electricity-use problems to be located accurately. The specific technical solution is as follows:
A method for identifying power-service customer demands based on text similarity measurement comprises the following steps:
Step 0: Establish a customer-demand hotspot system table. Randomly select N samples from the full set of work orders as training and test samples. Identify the customer demands contained in the work orders with the cosine similarity algorithm. Then, drawing on domain expertise and logic, define the business meaning of each topic to form the customer-demand hotspot system table.
Step 1: Preprocess the text. Segment the work-order text into words and vectorize it, split long text content, and remove stop words. Stop words are modal particles, measure words, punctuation marks, and other high-frequency words that carry no information for text analysis.
Step 2: Classify the text automatically. Based on the identified topics and their corresponding dictionaries, apply a classification algorithm to classify the full set of customer-service work orders.
Preferably, N is 10000 in Step 0.
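The sampling in Step 0 can be sketched as follows, assuming the full set of work orders is available as a Python list. The source does not say how the N samples are divided between training and test use. The 80/20 split (`train_fraction`) and the fixed random seed are therefore assumptions, and `sample_work_orders` is a hypothetical helper name.

```python
import random


def sample_work_orders(all_orders: list[str], n: int = 10_000,
                       train_fraction: float = 0.8, seed: int = 42):
    """Randomly draw n work orders (without replacement) and split them
    into training and test subsets. train_fraction is an assumption."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    picked = rng.sample(all_orders, k=min(n, len(all_orders)))
    cut = int(len(picked) * train_fraction)
    return picked[:cut], picked[cut:]


# Illustrative full set of 50,000 placeholder work-order IDs.
full_set = [f"WO-{i:05d}" for i in range(50_000)]
train, test = sample_work_orders(full_set)
print(len(train), len(test))  # 8000 2000
```

Capping `k` at the size of the full set keeps the helper usable on small pilot data sets.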
The advantages of the invention are as follows:
(1) The method applies the cosine similarity algorithm to the full set of customer-service work orders to identify customer demands precisely. Text data are thus fully mined and put to use in practical work.
(2) Cosine similarity can automatically and accurately identify multiple topics in a text. The invention therefore innovatively combines text similarity measurement with work-order data to identify precisely all of a customer's demands in each work order.
Detailed description:
Embodiment:
A method for identifying power-service customer demands based on text similarity measurement comprises the following steps:
Step 0: Establish a customer-demand hotspot system table. Randomly select 10,000 samples from the full set of work orders as training and test samples. Identify the customer demands contained in the work orders with the cosine similarity algorithm. Then, drawing on domain expertise and logic, define the business meaning of each topic to form the customer-demand hotspot system table.
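The source does not specify how cosine similarity groups the sampled work orders into candidate topics. One simple possibility, shown here purely as an assumption, is greedy leader clustering. Each work order joins the first cluster whose leader is similar enough to it, and otherwise starts a new cluster. Domain experts would then name each cluster, which is the manual part of Step 0. The threshold and helper names are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(docs: list[list[str]], threshold: float = 0.5) -> list[list[int]]:
    """Greedy single-pass clustering. Each document joins the first cluster
    whose leader (its first member) is at least `threshold` similar to it;
    otherwise it starts a new cluster. Returns lists of document indices."""
    leaders: list[Counter] = []
    clusters: list[list[int]] = []
    for i, doc in enumerate(docs):
        vec = Counter(doc)
        for k, leader in enumerate(leaders):
            if cosine(vec, leader) >= threshold:
                clusters[k].append(i)
                break
        else:  # no existing leader was similar enough
            leaders.append(vec)
            clusters.append([i])
    return clusters


docs = [
    ["multi-household outage", "phone", "verify"],
    ["multi-household outage", "phone"],
    ["bill", "query", "amount"],
    ["bill", "amount"],
]
print(leader_cluster(docs))  # [[0, 1], [2, 3]]
```

Leader clustering needs only one pass over the data, but its result depends on document order. It is a starting point for expert review rather than a final topic model.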
Step 1: Segment the work-order text into words and vectorize it. In particular, split long text content according to certain rules, and remove stop words. Stop words are modal particles, measure words, punctuation marks, and other high-frequency words that carry no information for text analysis. Text preprocessing produces a specialized dictionary and a synonym dictionary, which improve the accuracy and validity of segmenting new data.

Segmentation is performed through a packaged jar file in this project. A Java program developed in that package invokes the ICTCLAS word-segmentation tool. To keep segmentation results accurate and valid, a power-industry specialized dictionary and a synonym dictionary are added. For example:
- The professional terms 'three-phase imbalance', 'three-phase load', and 'three-phase balance' are uniformly defined as the synonym 'three-phase problem'.
- The expressions 'time should not be so long', 'overlong time', 'time span is long', 'time is too long', and 'time is long' are uniformly defined as the synonym 'overlong time'.

After refinement, the dictionaries contain 2,835 power-industry terms and 1,305 synonyms.
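The synonym normalization and stop-word removal in Step 1 can be sketched as follows. This sketch does not call ICTCLAS. It assumes the text has already been segmented with the power-industry dictionary loaded, so multi-word professional terms arrive as single tokens. The synonym entries are the examples given above. The full dictionary (2,835 terms and 1,305 synonyms) is not reproduced, and the stop-word list is a hypothetical English stand-in.

```python
# Synonym table built only from the examples in the text.
SYNONYMS = {
    "three-phase imbalance": "three-phase problem",
    "three-phase load": "three-phase problem",
    "three-phase balance": "three-phase problem",
    "time should not be so long": "overlong time",
    "time span is long": "overlong time",
    "time is too long": "overlong time",
    "time is long": "overlong time",
}

# Hypothetical stop words standing in for particles, measure words, punctuation.
STOP_WORDS = {"the", "a", "of", "please", ",", ".", "!", "?"}


def normalize(tokens: list[str]) -> list[str]:
    """Drop stop words and map each remaining token to its canonical synonym."""
    out = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        out.append(SYNONYMS.get(tok, tok))  # unknown tokens pass through unchanged
    return out


tokens = ["customer", "reports", "three-phase imbalance", ",", "repair",
          "time is too long", "."]
print(normalize(tokens))
# ['customer', 'reports', 'three-phase problem', 'repair', 'overlong time']
```

Mapping synonyms to one canonical term before vectorization means differently worded records of the same demand produce overlapping vectors, which is what the cosine comparison relies on.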
Step 2: Classify the text automatically. Based on the identified topics and their corresponding dictionaries, a classification algorithm classifies the full set of customer-service work orders.

For example, the dictionary for the multi-household outage demand topic includes 'processing', 'causing', 'phone', 'multi-household outage', 'report', 'verification', 'incoming call', and 'request'. This dictionary is then enriched with terms from other work orders that contain the multi-household outage demand. The process is repeated until each demand topic has its own dictionary.

The classification algorithm then classifies the full set of customer-service work orders automatically. When new work-order data are generated, the same classification algorithm is applied to them to identify customer demands.
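Step 2 can be sketched under the following assumptions. Each demand topic is represented by its keyword dictionary, treated as a 0/1 term vector. A work order is assigned every topic whose cosine similarity with its preprocessed tokens reaches a threshold. The 'multi-household outage' keywords come from the example above. The second topic, its keywords, the 0.3 threshold, and the names `classify` and `cosine_to_dictionary` are hypothetical. The source does not name the classification algorithm it combines with the dictionaries.

```python
import math
from collections import Counter

# Topic -> keyword dictionary. The first entry uses the example terms from
# the text; the second is a hypothetical topic added for illustration.
TOPIC_DICTIONARIES = {
    "multi-household outage": {"processing", "causing", "phone",
                               "multi-household outage", "report",
                               "verification", "incoming call", "request"},
    "electricity bill query": {"bill", "amount", "payment", "query"},
}


def cosine_to_dictionary(tokens: list[str], keywords: set[str]) -> float:
    """Cosine similarity between a work order's term counts and a topic
    dictionary treated as a 0/1 vector over its keywords."""
    counts = Counter(tokens)
    dot = sum(counts[k] for k in keywords)  # missing keys count as 0
    norm_doc = math.sqrt(sum(c * c for c in counts.values()))
    norm_dict = math.sqrt(len(keywords))
    if norm_doc == 0 or norm_dict == 0:
        return 0.0
    return dot / (norm_doc * norm_dict)


def classify(tokens: list[str], threshold: float = 0.3) -> list[str]:
    """Return every demand topic whose similarity reaches the threshold,
    most similar first, so one work order can carry several demands."""
    scores = {topic: cosine_to_dictionary(tokens, kw)
              for topic, kw in TOPIC_DICTIONARIES.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, s in ranked if s >= threshold]


order = ["incoming call", "multi-household outage", "verification",
         "bill", "amount", "query"]
print(classify(order))  # ['electricity bill query', 'multi-household outage']
```

Because every topic is scored independently, a new work order receives as many demand labels as pass the threshold. This matches the stated goal of identifying all demands in each work order.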
Claims (2)
1. A method for identifying power-service customer demands based on text similarity measurement, characterized in that it comprises the following steps:
Step 0: establishing a customer-demand hotspot system table: randomly selecting N samples from the full set of work orders as training samples and test samples; identifying the customer demands contained in the work orders according to a cosine similarity algorithm; and, in combination with domain expertise and logic, defining the business meaning of each topic to form the customer-demand hotspot system table;
Step 1: text preprocessing: performing word segmentation and text vectorization on the text in the work orders, splitting long text content, and removing stop words, where stop words are modal particles, measure words, punctuation marks, and other high-frequency words that carry no information for text analysis;
Step 2: automated text classification: according to the identified topics and their corresponding dictionaries, applying a classification algorithm to classify the full set of customer-service work orders automatically.
2. The method for identifying power-service customer demands based on text similarity measurement according to claim 1, characterized in that N is 10000 in Step 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811463322.4A (published as CN109726283A) | 2018-12-03 | 2018-12-03 | Method for identifying power-service customer demands based on text similarity measurement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109726283A | 2019-05-07 |
Family ID: 66295531
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112116173A (en) * | 2019-06-19 | 2020-12-22 | 中国石油化工股份有限公司 | Invalid operation reduction system |
CN112667812A (en) * | 2020-12-30 | 2021-04-16 | 云南电网有限责任公司 | Method for identifying power supply service customer electricity quantity and electricity charge demand |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107861942A (en) * | 2017-10-11 | 2018-03-30 | 国网浙江省电力公司电力科学研究院 | Deep-learning-based method for identifying suspected complaint work orders in electric power |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
WO2018000269A1 (en) | Data annotation method and system based on data mining and crowdsourcing | |
CN109389418A (en) | Electric service client's demand recognition methods based on LDA model | |
CN109726283A (en) | A kind of electric service client's demand recognition methods based on text similarity measurement | |
CN107766560B (en) | Method and system for evaluating customer service flow | |
CN107516370A (en) | The automatic test and evaluation method of a kind of bank slip recognition | |
Müller et al. | Comparison of preprocessing approaches for text data in digital shop floor management systems | |
US11741318B2 (en) | Open information extraction from low resource languages | |
CN111221873A (en) | Inter-enterprise homonym identification method and system based on associated network | |
CN107704529A (en) | The recognition methods of information uniqueness, application server, system and storage medium | |
Pham et al. | A hybrid approach to vietnamese word segmentation using part of speech tags | |
CN110909162B (en) | Text quality inspection method, storage medium and electronic equipment | |
CN109388804A (en) | Report core views extracting method and device are ground using the security of deep learning model | |
CN110866394A (en) | Company name identification method and device, computer equipment and readable storage medium | |
CN110826991B (en) | Electronic receipt processing system and method | |
CN107886233B (en) | Service quality evaluation method and system for customer service | |
CN116340172A (en) | Data collection method and device based on test scene and test case detection method | |
CN115618264A (en) | Method, apparatus, device and medium for topic classification of data assets | |
CN113778875B (en) | System test defect classification method, device, equipment and storage medium | |
CN112328951B (en) | Processing method of experimental data of analysis sample | |
CN108255887B (en) | Method and device for verifying industry text | |
CN110083807B (en) | Contract modification influence automatic prediction method, device, medium and electronic equipment | |
CN114564391A (en) | Method and device for determining test case, storage medium and electronic equipment | |
CN110826330B (en) | Name recognition method and device, computer equipment and readable storage medium | |
Park et al. | Identify the failure mode of weapon system (or equipment) using machine learning | |
CN106779396A (en) | A kind of system for recognizing business standing degree |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190507 |