CN111831809A - Method for extracting keywords of question text - Google Patents
- Publication number
- CN111831809A (application number CN202010694075.XA)
- Authority
- CN
- China
- Prior art keywords
- keyword
- keywords
- text
- idf value
- idf
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for extracting keywords from a problem text and relates to the technical field of text processing. The method comprises: receiving a problem text input by a user; performing word segmentation on the problem text with the IKAnalyzer word segmentation tool to obtain a plurality of keywords of the problem text; calculating a first TF-IDF value of each keyword with the TF-IDF algorithm; calculating a second TF-IDF value of each keyword according to the attenuation function corresponding to that keyword; sorting the keywords by their second TF-IDF values to generate a keyword set; and taking the keyword with the largest TF-IDF value as the keyword of the problem text. The method can accurately extract keywords from the problem text reported by a user and correct the preset problem type label and the preset problem description label, which provides a basis for product version iteration, supports subsequent operation and maintenance performance evaluation and function improvement work, and improves the user experience.
Description
Technical Field
The invention relates to the technical field of text processing, in particular to a method for extracting keywords of a question text.
Background
As society develops, people increasingly travel by shared vehicles (shared bicycles, shared cars and the like). Because usage scenarios are varied, the channels for using vehicles are complex, and vehicle sources are numerous, users frequently encounter problems while using a vehicle. After a user reports a problem text through the system's problem report function or the customer service message function, the key issue is how to accurately extract keywords from that text.
At present, a CRM system is mainly used to maintain the problem type labels and problem classification labels of the problem texts reported by users. In the reporting scenario, the system presents the problem type labels together with the problem description labels corresponding to each problem type; the user confirms the scope of the problem by selecting a problem type label and a problem description label, and then enters the specific content. Selecting these two labels is particularly important in the reporting process, but in practice the user may select them incorrectly or not select them at all (for user experience reasons, selection cannot be made mandatory). For example, when the wiper cannot be used normally because the vehicle has no glass water, a user who selects only the wiper label and not the glass water label produces a report that omits the glass water problem. As a result, subsequent operation and maintenance performance evaluation and function improvement work cannot be completed, and the user experience is poor.
Disclosure of Invention
To overcome the defects of the prior art, an embodiment of the invention provides a problem text keyword extraction method, which comprises the following steps:
receiving a problem text input by a user, and performing word segmentation on the problem text with the IKAnalyzer word segmentation tool to obtain a plurality of keywords of the problem text;
calculating a first TF-IDF value of each keyword with the TF-IDF algorithm;
calculating a second TF-IDF value of each keyword according to the attenuation function corresponding to that keyword;
and sorting the keywords by their second TF-IDF values to generate a keyword set, and taking the keyword with the largest TF-IDF value as the keyword of the problem text.
Preferably, calculating the second TF-IDF value of each keyword comprises:
calculating the second TF-IDF value of each keyword under its corresponding attenuation function with the formula y = f(x) · t, where f(x) is the attenuation function of the keyword and t is the first TF-IDF value of the keyword.
Preferably, after the keyword with the largest TF-IDF value is extracted as the keyword of the problem text, the method further comprises:
judging whether each of the top preset number of keywords in the keyword set hits a preset problem type label, and if not, replacing the preset problem type label with the keyword.
Preferably, after the keyword with the largest TF-IDF value is extracted as the keyword of the problem text, the method further comprises:
judging whether each of the top preset number of keywords in the keyword set hits a preset problem description label, and if not, replacing the preset problem description label with the keyword.
Preferably, the question text comprises question text of a plurality of question types.
The problem text keyword extraction method provided by the embodiment of the invention has the following beneficial effects:
by using a TF-IDF algorithm based on attenuation functions, the method accurately extracts keywords from the problem text reported by a user and corrects the preset problem type label and the preset problem description label. This provides a basis for product version iteration, facilitates subsequent operation and maintenance performance evaluation and function improvement work, and improves the user experience.
Detailed Description
The present invention will be described in detail with reference to the following embodiments.
The problem text keyword extraction method provided by the embodiment of the invention comprises the following steps:
S101, receiving a problem text input by a user, and performing word segmentation on the problem text with the IKAnalyzer word segmentation tool to obtain a plurality of keywords of the problem text.
S102, calculating the first TF-IDF value of each keyword respectively by using a TF-IDF algorithm.
As a specific embodiment of the present invention, the TF-IDF value is the product of a keyword's word frequency and its inverse document frequency. The word frequency of the keyword is the number of times the keyword appears in the problem texts reported for a vehicle on the same day and the number of times it appears in the problem texts reported for that vehicle within 30 days. The inverse document frequency is calculated as f = log(m/n), where m is the number of times the keyword is described across all vehicle problems in the city, and n is the number of times the keyword is described across all vehicle problems in all cities.
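The calculation in this embodiment can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the same-day count and the 30-day count are combined, so the sketch simply adds them, and it uses the natural logarithm because the base of "log" is not given. The function name and the sample counts are hypothetical.

```python
import math


def first_tf_idf(count_same_day, count_30_days, m, n):
    """First TF-IDF value: word frequency multiplied by inverse document frequency."""
    # Assumption: the two word-frequency counts are summed; the text lists
    # both counts but does not specify how they are combined.
    tf = count_same_day + count_30_days
    # f = log(m/n) as literally stated in the text; the log base is an assumption.
    # m: keyword count in the city, n: keyword count in all cities,
    # so m/n <= 1 and f is non-positive under these definitions.
    idf = math.log(m / n)
    return tf * idf


# Hypothetical counts for one keyword.
value = first_tf_idf(count_same_day=2, count_30_days=8, m=50, n=500)
print(value)
```

The sketch follows the formula exactly as written and does not attempt to reinterpret the definitions of m and n.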
S103, respectively calculating the second TF-IDF value of each keyword according to the attenuation function corresponding to each keyword.
In a specific embodiment of the present invention, assume that the maintenance cycle of the vehicle is one month. For the vehicle problem keyword glass water, the time attenuation function is y = lgX/10 + 1, X ∈ (0, 1), which indicates that the TF-IDF value of the glass water problem increases over the interval from 0 to 1 month, until one maintenance cycle has elapsed. The attenuation functions of other keywords, such as abnormal engine noise or vehicle sanitation conditions, are maintained according to actual conditions.
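The attenuation function in this example can be evaluated with a short sketch. It assumes the formula is read as (lg X)/10 + 1 with lg the base-10 logarithm, and that X is the elapsed fraction of the one-month maintenance cycle; the text gives the interval as (0, 1), and the sketch also accepts X = 1 so that the end of the cycle can be evaluated. The function names are hypothetical.

```python
import math


def example_decay(x):
    """Attenuation factor y = lg(x)/10 + 1 for the example keyword."""
    # Assumption: x is the elapsed fraction of the maintenance cycle, 0 < x <= 1.
    if not 0 < x <= 1:
        raise ValueError("x must lie in (0, 1]")
    return math.log10(x) / 10 + 1


def second_tf_idf(first_value, decay, x):
    """Second TF-IDF value y = f(x) * t, with t the first TF-IDF value."""
    return decay(x) * first_value


# The factor grows as the cycle progresses and reaches 1 at the end of the cycle.
for x in (0.1, 0.5, 1.0):
    print(x, example_decay(x), second_tf_idf(2.0, example_decay, x))
```

Under this reading the factor stays at or below 1 and rises toward 1 as X approaches 1, which matches the statement that the value increases within the cycle.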
S104, sorting the keywords by their second TF-IDF values to generate a keyword set, and taking the keyword with the largest TF-IDF value as the keyword of the problem text.
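The sorting step can be illustrated with a minimal sketch. The keyword names and scores below are hypothetical second TF-IDF values, not data from the patent.

```python
def rank_keywords(second_values):
    """Return the keywords sorted by second TF-IDF value, largest first."""
    return sorted(second_values, key=second_values.get, reverse=True)


# Hypothetical second TF-IDF values for three keywords.
scores = {"wiper": 0.42, "glass water": 0.57, "parking": 0.13}
keyword_set = rank_keywords(scores)   # ordered keyword set
top_keyword = keyword_set[0]          # keyword with the largest value
print(keyword_set, top_keyword)
```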
Optionally, calculating the second TF-IDF value of each keyword comprises:
calculating the second TF-IDF value of each keyword under its corresponding attenuation function with the formula y = f(x) · t, where f(x) is the attenuation function of the keyword and t is the first TF-IDF value of the keyword.
Optionally, after the keyword with the largest TF-IDF value is extracted as the keyword of the problem text, the method further includes:
judging whether each of the top preset number of keywords in the keyword set hits a preset problem type label, and if not, replacing the preset problem type label with the keyword.
Optionally, after the keyword with the largest TF-IDF value is extracted as the keyword of the problem text, the method further includes:
judging whether each of the top preset number of keywords in the keyword set hits a preset problem description label, and if not, replacing the preset problem description label with the keyword.
As a specific embodiment of the invention, suppose the problem text reported by the user is "the vehicle has a peculiar smell like engine oil or gasoline, the air conditioner cannot remove the smell even after the vehicle is started, and this causes dizziness when driving", and the problem description labels selected by the user are air conditioner and engine oil. After replacement, the description labels are peculiar smell and engine oil.
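One possible reading of the label replacement step, applied to this example, is sketched below. The text does not spell out which label a non-hitting keyword replaces, so the sketch assumes that each preset label not found among the top keywords is replaced, in order, by a top keyword that hit no label. The function name, the keyword ranking, and the cutoff of two keywords are hypothetical.

```python
def correct_labels(ranked_keywords, preset_labels, top_n):
    """Replace preset labels that are not hit by the top-ranked keywords."""
    top = ranked_keywords[:top_n]
    # Top keywords that do not hit any preset label.
    missed = [kw for kw in top if kw not in preset_labels]
    corrected = []
    for label in preset_labels:
        if label in top:
            corrected.append(label)          # label confirmed by a top keyword
        elif missed:
            corrected.append(missed.pop(0))  # assumption: replace in rank order
        else:
            corrected.append(label)          # nothing available to replace it with
    return corrected


# Hypothetical ranking for the reported text in this example.
ranked = ["peculiar smell", "engine oil", "air conditioner"]
selected = ["air conditioner", "engine oil"]
print(correct_labels(ranked, selected, top_n=2))
```

With this ranking, "air conditioner" is not among the top two keywords and is replaced by "peculiar smell", while "engine oil" is kept, which agrees with the outcome described in the example.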
Optionally, the question text includes question text for a plurality of question types.
As a specific embodiment of the present invention, the problem text includes descriptions of problems with the vehicle itself and of problems with the experience of using the vehicle (parking difficulties, etc.).
The problem text keyword extraction method provided by the embodiment of the invention comprises: receiving a problem text input by a user; performing word segmentation on the problem text with the IKAnalyzer word segmentation tool to obtain a plurality of keywords of the problem text; calculating a first TF-IDF value of each keyword with the TF-IDF algorithm; calculating a second TF-IDF value of each keyword according to the attenuation function corresponding to that keyword; sorting the keywords by their second TF-IDF values to generate a keyword set; and taking the keyword with the largest TF-IDF value as the keyword of the problem text. The method can accurately extract keywords from the problem text reported by a user and correct the preset problem type label and the preset problem description label, which provides a basis for product version iteration, supports subsequent operation and maintenance performance evaluation and function improvement work, and improves the user experience.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the relevant features of the method and apparatus described above are referred to one another. In addition, "first", "second", and the like in the above embodiments are for distinguishing the embodiments, and do not represent merits of the embodiments.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
Claims (5)
1. A question text keyword extraction method, characterized by comprising the following steps:
receiving a problem text input by a user, and performing word segmentation on the problem text with the IKAnalyzer word segmentation tool to obtain a plurality of keywords of the problem text;
calculating a first TF-IDF value of each keyword with the TF-IDF algorithm;
calculating a second TF-IDF value of each keyword according to the attenuation function corresponding to that keyword;
and sorting the keywords by their second TF-IDF values to generate a keyword set, and taking the keyword with the largest TF-IDF value as the keyword of the problem text.
2. The question text keyword extraction method of claim 1, wherein calculating the second TF-IDF value of each keyword separately comprises:
calculating the second TF-IDF value of each keyword under its corresponding attenuation function with the formula y = f(x) · t, where f(x) is the attenuation function of the keyword and t is the first TF-IDF value of the keyword.
3. The question text keyword extraction method according to claim 1, wherein after extracting a keyword corresponding to a TF-IDF value having a maximum value as a keyword of the question text, the method further comprises:
judging whether each of the top preset number of keywords in the keyword set hits a preset problem type label, and if not, replacing the preset problem type label with the keyword.
4. The question text keyword extraction method according to claim 1, wherein after extracting a keyword corresponding to a TF-IDF value having a maximum value as a keyword of the question text, the method further comprises:
judging whether each of the top preset number of keywords in the keyword set hits a preset problem description label, and if not, replacing the preset problem description label with the keyword.
5. The question text keyword extraction method according to claim 1, wherein the question text includes question texts of a plurality of question types.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010694075.XA CN111831809A (en) | 2020-07-17 | 2020-07-17 | Method for extracting keywords of question text |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010694075.XA CN111831809A (en) | 2020-07-17 | 2020-07-17 | Method for extracting keywords of question text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111831809A (en) | 2020-10-27 |
Family
ID=72924322
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010694075.XA Pending CN111831809A (en) | 2020-07-17 | 2020-07-17 | Method for extracting keywords of question text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111831809A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107864301A (en) * | 2017-10-26 | 2018-03-30 | 平安科技(深圳)有限公司 | Client's label management method, system, computer equipment and storage medium |
CN108446274A (en) * | 2018-03-15 | 2018-08-24 | 北京科技大学 | A kind of keyword extracting method based on time-sensitive tf-idf |
CN108509482A (en) * | 2018-01-23 | 2018-09-07 | 深圳市阿西莫夫科技有限公司 | Question classification method, device, computer equipment and storage medium |
CN109241203A (en) * | 2018-09-27 | 2019-01-18 | 天津理工大学 | A kind of user preference and distance weighted clustering method of time of fusion factor |
CN110032632A (en) * | 2019-04-04 | 2019-07-19 | 平安科技(深圳)有限公司 | Intelligent customer service answering method, device and storage medium based on text similarity |
CN110245214A (en) * | 2019-05-09 | 2019-09-17 | 苏宁易购集团股份有限公司 | A kind of method and system of online service |
- 2020-07-17 CN CN202010694075.XA (CN111831809A), active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20201027 |