CN108153738A - Chat record analysis method and device based on hierarchical clustering
- Publication number
- CN108153738A CN201810137784.0A
- Authority
- CN
- China
- Prior art keywords
- chat record
- expert
- data
- chat
- grade
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Databases & Information Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to the field of computer technology and provides a chat record analysis method and device based on hierarchical clustering. The method includes: obtaining chat records and related data, and preprocessing the chat records for the DBSCAN clustering algorithm; clustering the preprocessed data with the DBSCAN clustering algorithm; and, for the result data of the DBSCAN clustering, extracting keywords as hot words with the TF-IDF algorithm, counting the occurrences of the hot words in the data entries, and using the hot word with the most occurrences as the label of the chat record. By combining the performance characteristics of the DBSCAN clustering algorithm and the TF-IDF algorithm, the invention tags existing unorganized chat records with characteristic labels, so that subsequent processing steps can use the chat records in a simplified way.
Description
【Technical field】
The present invention relates to the field of computer technology, and in particular to a chat record analysis method and device based on hierarchical clustering.
【Background】
With the rapid development of mobile Internet technology, people are increasingly accustomed to communicating online. This has created massive amounts of text data (such as chat records or question-and-answer data), and mining and analyzing these data can often yield very rich information. Text data mining has become one of the research hotspots in the information field and delivers great value in customer service and corporate decision-making.
However, unlike structured data, text data is highly unstructured and also highly ambiguous, which makes concrete analysis work challenging.
In view of this, overcoming the defects of the prior art is a problem to be solved urgently in this technical field.
【Summary of the invention】
The technical problem to be solved by the present invention is as follows: text data mining has become one of the research hotspots in the information field and delivers great value in customer service and corporate decision-making; however, unlike structured data, text data is highly unstructured and also highly ambiguous, which makes concrete analysis work difficult.
The present invention adopts the following technical solutions:
In a first aspect, the present invention provides a chat record analysis method based on hierarchical clustering, including:
Obtaining chat records and related data, and preprocessing the chat records for the DBSCAN clustering algorithm;
Clustering the preprocessed data with the DBSCAN clustering algorithm;
For the result data of the DBSCAN clustering, extracting keywords as hot words with the TF-IDF algorithm, counting the occurrences of the hot words in the data entries, and using the hot word with the most occurrences as the label of the chat record.
Preferably, the chat records include one or more of: customer question records extracted from system logs, chat records between customers, chat records between customers and experts, and replies to articles published by customers. The related data include a financial-domain vocabulary, a Chinese stop-word list, and pre-trained word vector data.
Preferably, preprocessing the chat records for the DBSCAN clustering algorithm includes:
Uniformly replacing the stock names and stock codes in the question data with a specified identifier, and then performing one or more of traditional/simplified Chinese conversion, case conversion, and stop-word removal on the text data;
Converting the text data into a vector representation composed of its entries.
Preferably, clustering the preprocessed data with the DBSCAN clustering algorithm includes:
Setting the minimum number of data items per category to: total data count / a, where the value range of a is [100, 300];
Setting the maximum distance to the center point to: average data distance / b, where the average data distance may be estimated by random sampling and the value range of b is [0.1, 0.3].
Preferably, extracting keywords as hot words with the TF-IDF algorithm specifically includes:
Computing the importance of each word in the result data entries one by one with the formula $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where the numerator $n_{i,j}$ is the number of occurrences of the word in the chat record and the denominator is the sum of the occurrences of all words in the chat record;
Computing the general importance of the word with the formula $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where $|D|$ is the total number of chat records in the corpus;
Computing the comprehensive importance of each word according to the formula $tfidf_{i,j} = tf_{i,j} \times idf_i$, and screening out the entries whose comprehensive importance is less than a preset threshold to obtain the keywords used as hot words.
Preferably, the method further includes:
Identifying the one or more user identifiers contained in the chat record, and assigning the label obtained by analyzing the chat record to the hobby/specialty information field of each corresponding user identifier;
According to the labels recorded in the hobby/specialty information field of the corresponding user identifier, pushing information that matches the labels to the intelligent terminal on which that user identifier is logged in.
Preferably, the method further includes:
Analyzing the accuracy of the information contained in the chat sentences or entries of each user identifier in the chat record, and updating the expert level points of the corresponding user identifier under the label of the chat record according to that accuracy;
The expert level points are used as follows: when the server receives an expert opinion request message sent by user A, the server screens, from the user identifiers it manages, at least one user identifier whose chat record label has the highest similarity to the question asked by user A and whose expert level matches user A's request, and establishes a chat window between the at least one user identifier and user A.
Preferably, after user A has obtained the corresponding reply, the method further includes:
Rewarding the account of the at least one user identifier according to user A's rating; and, based on the ratings given by past questioners, adding a reputation level dimension to the expert level under each user identifier, so that a questioner sending a question request to the server can set an expert level and/or reputation level to select experts within a specified range to assist in replying.
Preferably, the information includes one or more of a stock code, a stock price, a stock trend, and peripheral information about a listed company, and updating the expert level points of the corresponding user identifier under the label of the chat record according to the information accuracy specifically includes:
Matching the stock code, stock price, and stock trend against the real stock information for the corresponding time; if the matching error is less than a preset threshold, increasing the expert level points of the corresponding user identifier under the label of the chat record, and otherwise decreasing them; wherein the expert level points correspond to the expert levels;
For peripheral information about a listed company, giving a preset verification time; if, within that verification time, information matching the peripheral information is found in reality through big data, increasing the expert level points of the corresponding user identifier under the label of the chat record, and otherwise decreasing them.
In a second aspect, the present invention also provides a chat record analysis device based on hierarchical clustering, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are configured by a program to perform the chat record analysis method based on hierarchical clustering described in the first aspect.
In a third aspect, the present invention also provides a non-volatile computer storage medium storing computer-executable instructions which, when executed by one or more processors, carry out the chat record analysis method based on hierarchical clustering described in the first aspect.
The present invention proposes a chat record analysis method based on hierarchical clustering. By combining the performance characteristics of the DBSCAN clustering algorithm and the TF-IDF algorithm, it tags existing unorganized chat records with characteristic labels, so that subsequent processing steps can use the chat records in a simplified way.
【Description of the drawings】
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a chat record analysis method based on hierarchical clustering provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of the preprocessing in a chat record analysis method based on hierarchical clustering provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of the TF-IDF algorithm processing in a chat record analysis method based on hierarchical clustering provided by an embodiment of the present invention;
Fig. 4 is a schematic flowchart of the first application scenario of a chat record analysis method based on hierarchical clustering provided by an embodiment of the present invention;
Fig. 5 is a schematic flowchart of the second application scenario of a chat record analysis method based on hierarchical clustering provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a chat record analysis device based on hierarchical clustering provided by an embodiment of the present invention.
【Specific embodiments】
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
In the description of the present invention, orientations or positional relationships indicated by terms such as "inner", "outer", "longitudinal", "transverse", "upper", "lower", "top", and "bottom" are based on those shown in the drawings. They are used only for convenience of description and do not require that the present invention be constructed and operated in a specific orientation, so they should not be construed as limiting the present invention.
In addition, the technical features involved in the embodiments described below can be combined with one another as long as they do not conflict.
Embodiment 1:
Embodiment 1 of the present invention provides a chat record analysis method based on hierarchical clustering which, as shown in Fig. 1, includes:
In step 201, chat records and related data are obtained, and the chat records are preprocessed for the DBSCAN clustering algorithm.
In the embodiment of the present invention, the chat records include one or more of: customer question records extracted from system logs, chat records between customers, chat records between customers and experts, and replies to articles published by customers. The related data include a financial-domain vocabulary, a Chinese stop-word list, and pre-trained word vector data. The chat records and related data can be obtained by, for example, web-wide data crawling and tools such as word2vec.
In step 202, the preprocessed data are clustered with the DBSCAN clustering algorithm.
In step 203, for the result data of the DBSCAN clustering, keywords are extracted as hot words with the TF-IDF algorithm, the occurrences of the hot words in the data entries are counted, and the hot word with the most occurrences is used as the label of the chat record.
Preferably, in the embodiment of the present invention, after the hot words are extracted they are used as the distinguishing keywords of the category, all chat record contents occurring in the category are counted, and the chat record content with the most occurrences is used as the representative content of the category.
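The label selection of step 203 and the choice of representative content can be sketched in Python as follows. This is a minimal illustration under assumptions: each cluster entry is already tokenized, and the function names `cluster_label` and `representative_content` are hypothetical, not taken from the patent.

```python
from collections import Counter

def cluster_label(entries, hot_words):
    """entries: token lists of one cluster; returns the hot word seen most often."""
    counts = Counter(
        tok for tokens in entries for tok in tokens if tok in hot_words
    )
    return counts.most_common(1)[0][0] if counts else None

def representative_content(raw_entries):
    """Return the chat text that occurs most often within the cluster."""
    return Counter(raw_entries).most_common(1)[0][0] if raw_entries else None

# Toy cluster (illustrative data only).
entries = [["open", "account"], ["account", "fee"], ["account", "open"]]
label = cluster_label(entries, {"account", "fee", "open"})
content = representative_content(
    ["how to open account", "fees?", "how to open account"]
)
```

Both helpers return `None` for an empty cluster, which is a design choice of this sketch rather than something the text specifies.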
The embodiment of the present invention proposes a chat record analysis method based on hierarchical clustering. By combining the performance characteristics of the DBSCAN clustering algorithm and the TF-IDF algorithm, it tags existing unorganized chat records with characteristic labels, so that subsequent processing steps can use the chat records in a simplified way.
For the preprocessing of the chat records for the DBSCAN clustering algorithm involved in step 201, the embodiment of the present invention also provides a specific implementation which, as shown in Fig. 2, includes:
In step 2011, the stock names and stock codes in the question data are uniformly replaced with a specified identifier, and then one or more of traditional/simplified Chinese conversion, case conversion, and stop-word removal are performed on the text data.
In step 2012, the text data are converted into a vector representation composed of its entries. Specifically, each word in the text data is represented by its word vector, and the word vectors are then summed to obtain the vector representation of one piece of data.
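A minimal Python sketch of steps 2011 and 2012 follows. It rests on several assumptions that are not in the patent: the placeholder token `<STOCK>`, six-digit stock codes, the toy stock names, stop words and word vectors, and whitespace tokenization of English text standing in for Chinese word segmentation. Traditional/simplified conversion is omitted because it would need an external library.

```python
import re

# Hypothetical placeholder and toy resources; the patent does not name them.
STOCK_TOKEN = "<STOCK>"
STOCK_NAMES = ["Kweichow Moutai", "Ping An Bank"]  # illustrative stock names
STOCK_CODE = re.compile(r"\b\d{6}\b")              # assumes six-digit codes
STOP_WORDS = {"the", "is", "a", "of", "to"}        # stand-in stop-word list

# Toy vectors standing in for pre-trained word vector data.
WORD_VECTORS = {
    "<stock>": [1.0, 0.0],
    "price": [0.0, 1.0],
    "rise": [0.5, 0.5],
}

def normalize(text):
    """Step 2011: mask stock names/codes, fold case, drop stop words."""
    for name in STOCK_NAMES:
        text = text.replace(name, STOCK_TOKEN)
    text = STOCK_CODE.sub(STOCK_TOKEN, text)
    tokens = [t.strip(".,?!") for t in text.lower().split()]
    return [t for t in tokens if t and t not in STOP_WORDS]

def embed(tokens, dim=2):
    """Step 2012: sum the word vectors of the tokens; unknown words are skipped."""
    total = [0.0] * dim
    for tok in tokens:
        vec = WORD_VECTORS.get(tok)
        if vec is not None:
            total = [a + b for a, b in zip(total, vec)]
    return total

tokens = normalize("Is the price of 600519 to rise?")
vector = embed(tokens)
```

Masking stock names and codes before vectorization means questions about different stocks map to the same token, so the clustering groups them by question type rather than by stock.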
Several definitions used by the DBSCAN algorithm in the embodiment of the present invention are introduced first:
Ε-neighborhood: the region within radius Ε of a given object is called the Ε-neighborhood of that object;
Core object: if the number of sample points in the Ε-neighborhood of a given object is greater than or equal to MinPts, the object is called a core object;
Directly density-reachable: for a sample set D, if sample point q is in the Ε-neighborhood of p and p is a core object, then object q is directly density-reachable from object p.
Density-reachable: for a sample set D, given a chain of sample points p1, p2, ..., pn with p = p1 and q = pn, if each object pi is directly density-reachable from pi-1, then object q is density-reachable from object p.
Density-connected: if there is a point o in the sample set D such that object p and object q are both density-reachable from o, then p and q are density-connected.
It can be seen that density-reachability is the transitive closure of direct density-reachability and that this relation is asymmetric, whereas density-connectedness is a symmetric relation. The purpose of DBSCAN is to find maximal sets of density-connected objects.
Example: assume radius Ε = 3 and MinPts = 3. The Ε-neighborhood of point p contains the points {m, p, p1, p2, o}, that of point m contains {m, q, p, m1, m2}, that of point q contains {q, m}, that of point o contains {o, p, s}, and that of point s contains {o, s, s1}.
Then the core objects are p, m, o, and s (q is not a core object, because the number of points in its Ε-neighborhood is 2, which is less than MinPts = 3);
Point m is directly density-reachable from point p, because m is in the Ε-neighborhood of p and p is a core object;
Point q is density-reachable from point p, because point q is directly density-reachable from point m and point m is directly density-reachable from point p;
Point q and point s are density-connected, because point q is density-reachable from point p and point s is also density-reachable from point p.
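The worked example above can be checked with a short Python sketch. The neighborhoods are copied from the example; the function names are hypothetical helpers that only encode the definitions, and this is not a full DBSCAN implementation.

```python
# Ε-neighborhoods copied from the example above (Ε = 3, MinPts = 3).
NEIGHBORHOODS = {
    "p": {"m", "p", "p1", "p2", "o"},
    "m": {"m", "q", "p", "m1", "m2"},
    "q": {"q", "m"},
    "o": {"o", "p", "s"},
    "s": {"o", "s", "s1"},
}
MIN_PTS = 3

def is_core(point):
    """A core object has at least MinPts points in its Ε-neighborhood."""
    return len(NEIGHBORHOODS.get(point, ())) >= MIN_PTS

def directly_reachable(q, p):
    """True if q is directly density-reachable from p."""
    return is_core(p) and q in NEIGHBORHOODS[p]

def reachable_from(p):
    """All points density-reachable from p (transitive closure)."""
    seen, frontier = set(), [p]
    while frontier:
        current = frontier.pop()
        if not is_core(current):
            continue  # only core objects extend the chain
        for nxt in NEIGHBORHOODS[current]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def density_connected(a, b):
    """True if some point o reaches both a and b."""
    return any(
        a in reached and b in reached
        for reached in (reachable_from(o) for o in NEIGHBORHOODS)
    )

core_objects = sorted(x for x in NEIGHBORHOODS if is_core(x))
```

Because q is not a core object, nothing is density-reachable from q even though q is density-reachable from p, which illustrates the asymmetry noted above.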
For clustering the preprocessed data with the DBSCAN clustering algorithm, the embodiment of the present invention also provides a set of effective parameter settings, including:
Setting the minimum number of data items per category to: total data count / a, where the value range of a is [100, 300];
Setting the maximum distance to the center point to: average data distance / b, where the average data distance may be estimated by random sampling and the value range of b is [0.1, 0.3].
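The two parameter rules above can be sketched in Python as follows. Several points are assumptions of this sketch: the minimum item count is read as DBSCAN's MinPts and the maximum distance as the radius Ε, Euclidean distance is used, the quotient is rounded down to an integer, and the sample count and seed are arbitrary. The translation is not entirely clear about how b enters the second rule; the sketch follows the division as written.

```python
import math
import random

def average_distance(vectors, samples=1000, seed=0):
    """Estimate the mean pairwise Euclidean distance by random sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        u, v = rng.sample(vectors, 2)  # two distinct data items
        total += math.dist(u, v)
    return total / samples

def dbscan_params(vectors, a=200, b=0.2):
    """Return (min_pts, eps) following the two rules above."""
    if not (100 <= a <= 300 and 0.1 <= b <= 0.3):
        raise ValueError("a or b is outside the ranges given in the text")
    min_pts = max(1, len(vectors) // a)   # total data count / a
    eps = average_distance(vectors) / b   # average data distance / b
    return min_pts, eps

# Toy data: 400 points on a line (illustrative only).
data = [(float(i), 0.0) for i in range(400)]
min_pts, eps = dbscan_params(data)
```

The resulting pair could then be passed to any DBSCAN implementation as its minimum-points and radius parameters.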
For the extraction of keywords as hot words with the TF-IDF algorithm involved in step 203, the embodiment of the present invention, as shown in Fig. 3, specifically includes:
In step 2031, the importance of each word in the result data entries is computed one by one with the formula $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$ (1), where the numerator $n_{i,j}$ is the number of occurrences of the word in the chat record and the denominator is the sum of the occurrences of all words in the chat record;
In step 2032, the general importance of the word is computed with the formula $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$ (2), where $|D|$ is the total number of chat records in the corpus and $|\{j : t_i \in d_j\}|$ is the number of chat records containing the word $t_i$; if the word is not in the corpus, the denominator would be zero, so $1 + |\{j : t_i \in d_j\}|$ is generally used instead;
In step 2033, the comprehensive importance of each word is computed according to the formula $tfidf_{i,j} = tf_{i,j} \times idf_i$ (3), and the entries whose comprehensive importance is less than a preset threshold are screened out to obtain the keywords used as hot words.
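Steps 2031 to 2033 can be sketched in Python as follows. The sketch assumes tokenized chat records and uses the unsmoothed inverse document frequency, which is safe here because every scored word comes from the corpus itself; the function names and the threshold value are hypothetical.

```python
import math
from collections import Counter

def tf_idf(records):
    """records: list of token lists, one per chat record.
    Returns one {word: score} dict per record."""
    doc_freq = Counter()
    for tokens in records:
        doc_freq.update(set(tokens))  # number of records containing each word
    n_docs = len(records)
    scores = []
    for tokens in records:
        counts = Counter(tokens)
        total = sum(counts.values())
        scores.append({
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in counts.items()
        })
    return scores

def hot_words(records, threshold):
    """Keep words whose TF-IDF score in some record reaches the threshold."""
    kept = set()
    for record_scores in tf_idf(records):
        kept.update(w for w, s in record_scores.items() if s >= threshold)
    return kept

# Toy corpus (illustrative data only).
records = [["stock", "price", "stock"], ["price", "fund"], ["price", "bond"]]
selected = hot_words(records, threshold=0.6)
```

In this toy corpus "price" appears in every record, so its inverse document frequency is zero and it is never selected, while "stock" scores highest in the first record.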
Based on the chat record labels proposed by the embodiment of the present invention, the embodiment also provides a way of using them. After step 203 of Embodiment 1 is performed, as shown in Fig. 4, the method further includes:
In step 301, the one or more user identifiers contained in the chat record are identified, and the label obtained by analyzing the chat record is assigned to the hobby/specialty information field of each corresponding user identifier.
In step 302, according to the labels recorded in the hobby/specialty information field of the corresponding user identifier, information that matches the labels is pushed to the intelligent terminal on which that user identifier is logged in.
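Steps 301 and 302 can be sketched in Python as follows. The in-memory `profiles` mapping, the `catalog` of (item, label) pairs, and the function names are hypothetical; actual delivery to a terminal is outside the scope of the sketch.

```python
from collections import defaultdict

# user identifier -> labels in the hobby/specialty information field
profiles = defaultdict(set)

def assign_label(record_user_ids, label):
    """Step 301: attach the chat record's label to every participant."""
    for user_id in record_user_ids:
        profiles[user_id].add(label)

def select_push_items(user_id, catalog):
    """Step 302: pick catalog items whose label matches the user's profile."""
    return [item for item, label in catalog if label in profiles[user_id]]

# Toy data (illustrative only).
assign_label(["u1", "u2"], "fund")
catalog = [("Fund weekly", "fund"), ("Bond brief", "bond")]
to_push = select_push_items("u1", catalog)
```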
Steps 301 and 302 above are only one application scenario (called the first application scenario) of the chat record labels obtained by the embodiment of the present invention. Fig. 5 shows another application scenario (called the second application scenario) obtained on the basis of Embodiment 1. The second application scenario can also be combined with the first application scenario. The second application scenario is implemented as follows:
In step 401, the accuracy of the information contained in the chat sentences or entries of each user identifier in the chat record is analyzed, and the expert level points of the corresponding user identifier under the label of the chat record are updated according to that accuracy.
Different expert levels correspond to respective point thresholds; that is, exceeding the relevant point threshold moves the user to the corresponding expert level.
In step 402, the expert level points are used as follows: when the server receives an expert opinion request message sent by user A, the server screens, from the user identifiers it manages, at least one user identifier whose chat record label has the highest similarity to the question asked by user A and whose expert level matches user A's request, and establishes a chat window between the at least one user identifier and user A.
To further improve the practicality of the second application scenario, the at least one user identifier in it (that is, one or more other users who are rated as experts and can solve the problem raised by user A) needs a driving force and a supervising force so that the question-answering ecosystem can be maintained effectively. Therefore, preferably, after user A has obtained the corresponding reply, the method further includes:
Rewarding the account of the at least one user identifier according to user A's rating; and, based on the ratings given by past questioners, adding a reputation level dimension to the expert level under each user identifier, so that a questioner sending a question request to the server can set an expert level and/or reputation level to select experts within a specified range to assist in replying.
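The screening by expert level and reputation level can be sketched in Python as follows. The `Expert` record, its integer level fields, and the ordering rule are assumptions of this sketch; the step that maps user A's question to the most similar label is taken as already done and is not shown.

```python
from dataclasses import dataclass

@dataclass
class Expert:
    user_id: str
    label: str
    expert_level: int   # hypothetical integer scale
    reputation: int     # hypothetical integer scale

def match_experts(experts, label, min_level=0, min_reputation=0):
    """Return ids of experts under the label that meet both minimums, best first."""
    hits = [
        e for e in experts
        if e.label == label
        and e.expert_level >= min_level
        and e.reputation >= min_reputation
    ]
    hits.sort(key=lambda e: (e.expert_level, e.reputation), reverse=True)
    return [e.user_id for e in hits]

# Toy data (illustrative only).
experts = [
    Expert("u1", "fund", 3, 5),
    Expert("u2", "fund", 2, 4),
    Expert("u3", "bond", 5, 5),
]
by_level = match_experts(experts, "fund", min_level=2)
```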
In the embodiment of the present invention, the information includes one or more of a stock code, a stock price, a stock trend, and peripheral information about a listed company, and updating the expert level points of the corresponding user identifier under the label of the chat record according to the information accuracy specifically includes:
Matching the stock code, stock price, and stock trend against the real stock information for the corresponding time; if the matching error is less than a preset threshold, increasing the expert level points of the corresponding user identifier under the label of the chat record, and otherwise decreasing them; wherein the expert level points correspond to the expert levels;
For peripheral information about a listed company, giving a preset verification time; if, within that verification time, information matching the peripheral information is found in reality through big data, increasing the expert level points of the corresponding user identifier under the label of the chat record, and otherwise decreasing them.
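The point update for stock price information can be sketched in Python as follows. The relative-error measure, the tolerance, the step size, and the level names and thresholds are all hypothetical; the text only states that points rise when the matching error is below a preset threshold and fall otherwise, and that points correspond to expert levels.

```python
def update_points(points, claimed_price, actual_price, tolerance=0.02, step=1):
    """Raise points when the relative matching error is below the tolerance,
    otherwise lower them."""
    error = abs(claimed_price - actual_price) / actual_price
    return points + step if error < tolerance else points - step

# Hypothetical point thresholds per expert level, highest first.
LEVEL_THRESHOLDS = [(100, "senior"), (50, "intermediate"), (0, "junior")]

def expert_level(points):
    """Map accumulated points to the highest level whose threshold is met."""
    for threshold, level in LEVEL_THRESHOLDS:
        if points >= threshold:
            return level
    return "unrated"

# A user with 49 points quotes 10.1 for a stock that actually traded at 10.0.
new_points = update_points(49, claimed_price=10.1, actual_price=10.0)
new_level = expert_level(new_points)
```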
Embodiment 2:
Fig. 6 is a schematic architecture diagram of the chat record analysis device based on hierarchical clustering according to an embodiment of the present invention. The device of this embodiment includes one or more processors 21 and a memory 22; Fig. 6 takes one processor 21 as an example.
The processor 21 and the memory 22 may be connected by a bus or in other ways; Fig. 6 takes a bus connection as an example.
As a non-volatile computer-readable storage medium for the chat record analysis method and device based on hierarchical clustering, the memory 22 can store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the chat record analysis method based on hierarchical clustering in Embodiment 1. By running the non-volatile software programs, instructions, and modules stored in the memory 22, the processor 21 executes the various functional applications and data processing of the device, that is, implements the chat record analysis method based on hierarchical clustering of Embodiment 1.
The memory 22 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 22 optionally includes memory located remotely from the processor 21, and such remote memory can be connected to the processor 21 through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The program instructions/modules are stored in the memory 22 and, when executed by the one or more processors 21, perform the chat record analysis method based on hierarchical clustering of Embodiment 1 above, for example each of the steps shown in Fig. 1 to Fig. 5 described above.
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A chat record analysis method based on hierarchical clustering, characterized by including:
Obtaining chat records and related data, and preprocessing the chat records for the DBSCAN clustering algorithm;
Clustering the preprocessed data with the DBSCAN clustering algorithm;
For the result data of the DBSCAN clustering, extracting keywords as hot words with the TF-IDF algorithm, counting the occurrences of the hot words in the data entries, and using the hot word with the most occurrences as the label of the chat record.
2. The chat record analysis method based on hierarchical clustering according to claim 1, characterized in that the chat records include one or more of: customer question records extracted from system logs, chat records between customers, chat records between customers and experts, and replies to articles published by customers; and the related data include a financial-domain vocabulary, a Chinese stop-word list, and pre-trained word vector data.
3. The chat record analysis method based on hierarchical clustering according to claim 1, characterized in that preprocessing the chat records for the DBSCAN clustering algorithm includes:
Uniformly replacing the stock names and stock codes in the question data with a specified identifier, and then performing one or more of traditional/simplified Chinese conversion, case conversion, and stop-word removal on the text data;
Converting the text data into a vector representation composed of its entries.
4. The chat record analysis method based on hierarchical clustering according to claim 1, characterized in that clustering the preprocessed data with the DBSCAN clustering algorithm includes:
Setting the minimum number of data items per category to: total data count / a, where the value range of a is [100, 300];
Setting the maximum distance to the center point to: average data distance / b, where the average data distance may be estimated by random sampling and the value range of b is [0.1, 0.3].
5. The chat record analysis method based on hierarchical clustering according to claim 1, characterized in that extracting keywords as hot words with the TF-IDF algorithm specifically includes:
Computing the importance of each word in the result data entries one by one with the formula $tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$, where the numerator $n_{i,j}$ is the number of occurrences of the word in the chat record and the denominator is the sum of the occurrences of all words in the chat record;
Computing the general importance of the word with the formula $idf_i = \log \frac{|D|}{|\{j : t_i \in d_j\}|}$, where $|D|$ is the total number of chat records in the corpus;
Computing the comprehensive importance of each word according to the formula $tfidf_{i,j} = tf_{i,j} \times idf_i$, and screening out the entries whose comprehensive importance is less than a preset threshold to obtain the keywords used as hot words.
6. The chat record analysis method based on hierarchical clustering according to claim 1, characterized in that the method further includes:
Identifying the one or more user identifiers contained in the chat record, and assigning the label obtained by analyzing the chat record to the hobby/specialty information field of each corresponding user identifier;
According to the labels recorded in the hobby/specialty information field of the corresponding user identifier, pushing information that matches the labels to the intelligent terminal on which that user identifier is logged in.
7. The chat record analysis method based on hierarchical clustering according to claim 1, characterized in that the method further comprises:
Analyzing the accuracy of the information contained in the chat sentences or terms attributed to each user identifier in the chat record, and updating, according to the information accuracy, the expert-level score of the corresponding user identifier under the label of the chat record;
Based on the expert-level scores, when the server receives an expert consultation request message sent by user A, the server selects, from the user identifiers it manages, at least one user identifier that holds the label of the chat record most similar to user A's question and whose expert level matches user A's request; and establishes a chat window between the at least one user identifier and user A.
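The expert selection step in this claim can be sketched as follows. This is only an illustration under stated assumptions: the claim does not name a similarity measure, so Jaccard similarity over tokens is swapped in here, and the function names, the dictionary shapes, and the reading of "matches user A's request" as a minimum expert level are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def match_experts(question_tokens, label_keywords, expert_levels, min_level):
    """Pick the label closest to the question, then users at or above min_level.

    label_keywords: {label: keyword list}
    expert_levels:  {user_id: {label: expert level}}
    """
    best_label = max(
        label_keywords,
        key=lambda label: jaccard(question_tokens, label_keywords[label]),
    )
    experts = [
        user_id
        for user_id, levels in expert_levels.items()
        if levels.get(best_label, 0) >= min_level
    ]
    return best_label, experts
```

The returned user identifiers are the candidates with whom the server would open a chat window for user A.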
8. The chat record analysis method based on hierarchical clustering according to claim 7, characterized in that, after user A has obtained the corresponding reply, the method further comprises:
Rewarding the account of the at least one user identifier according to user A's rating; and, according to the ratings given by past questioning users, adding a reputation-level dimension to the expert level under each user identifier, so that a questioning user, when sending a question request to the server, can set the corresponding expert level and/or reputation level in order to select experts within a specified range to assist with the reply.
9. The chat record analysis method based on hierarchical clustering according to claim 7, characterized in that the information comprises one or more of stock code, stock price, stock trend, and peripheral information about listed companies, and updating, according to the information accuracy, the expert-level score of the corresponding user identifier under the label of the chat record specifically comprises:
Matching the stock code, stock price, and stock trend against the real stock information for the corresponding time; if the matching error is less than a preset threshold, increasing the expert-level score of the corresponding user identifier under the label of the chat record; otherwise, decreasing the expert-level score of the corresponding user identifier under the label of the chat record; wherein the expert-level scores correspond to the respective expert levels;
For peripheral information about listed companies, allowing a preset verification period; if, within the corresponding verification period, real-world information obtained through big data matches the peripheral information about the listed company, increasing the expert-level score of the corresponding user identifier under the label of the chat record; otherwise, decreasing the expert-level score of the corresponding user identifier under the label of the chat record.
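The score update for stock price claims can be sketched as below. The relative-error measure, the 2% threshold, the step size of 1, and the level thresholds are all assumed values chosen for illustration; the claim only requires a preset error threshold and a correspondence between scores and expert levels.

```python
def update_expert_score(score, claimed_price, actual_price,
                        max_relative_error=0.02, step=1):
    """Raise the score when the claim matches reality, else lower it."""
    error = abs(claimed_price - actual_price) / actual_price
    return score + step if error < max_relative_error else score - step


def expert_level(score, level_thresholds=(0, 10, 50, 100)):
    """Map a score to a level: the number of thresholds it has reached."""
    return sum(score >= t for t in level_thresholds)
```

For example, a user with a score of 10 who quoted 101.0 for a stock that actually traded at 100.0 would move to 11 under these assumed settings, while a quote of 110.0 would move the score down to 9.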
10. A chat record analysis device based on hierarchical clustering, characterized by comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being programmed to perform the chat record analysis method based on hierarchical clustering according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810137784.0A CN108153738A (en) | 2018-02-10 | 2018-02-10 | A kind of chat record analysis method and device based on hierarchical clustering |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810137784.0A CN108153738A (en) | 2018-02-10 | 2018-02-10 | A kind of chat record analysis method and device based on hierarchical clustering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108153738A true CN108153738A (en) | 2018-06-12 |
Family
ID=62459939
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810137784.0A Pending CN108153738A (en) | 2018-02-10 | 2018-02-10 | A kind of chat record analysis method and device based on hierarchical clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108153738A (en) |
-
2018
- 2018-02-10 CN CN201810137784.0A patent/CN108153738A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105159998A (en) * | 2015-09-08 | 2015-12-16 | 海南大学 | Keyword calculation method based on document clustering |
CN106874292A (en) * | 2015-12-11 | 2017-06-20 | 北京国双科技有限公司 | Topic processing method and processing device |
US20170200205A1 (en) * | 2016-01-11 | 2017-07-13 | Medallia, Inc. | Method and system for analyzing user reviews |
CN106095988A (en) * | 2016-06-21 | 2016-11-09 | 上海智臻智能网络科技股份有限公司 | Automatic question-answering method and device |
CN106126690A (en) * | 2016-06-29 | 2016-11-16 | 合肥民众亿兴软件开发有限公司 | A kind of info web filter method based on content of text |
CN107103043A (en) * | 2017-03-29 | 2017-08-29 | 国信优易数据有限公司 | A kind of Text Clustering Method and system |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108920675A (en) * | 2018-07-09 | 2018-11-30 | 北京百悟科技有限公司 | A kind of method, apparatus of information processing, computer storage medium and terminal |
CN109242479A (en) * | 2018-09-04 | 2019-01-18 | 深圳市百宝廊珠宝首饰有限公司 | A kind of method and apparatus that red packet is got based on integrating system |
CN109522415A (en) * | 2018-10-17 | 2019-03-26 | 厦门快商通信息技术有限公司 | A kind of corpus labeling method and device |
CN109857943A (en) * | 2018-12-22 | 2019-06-07 | 深圳市珍爱捷云信息技术有限公司 | Permission Levels determine method, apparatus, computer equipment and readable storage medium storing program for executing |
CN109857943B (en) * | 2018-12-22 | 2023-04-18 | 深圳市珍爱捷云信息技术有限公司 | Permission level determination method and device, computer equipment and readable storage medium |
CN109766422A (en) * | 2018-12-29 | 2019-05-17 | 上海智臻智能网络科技股份有限公司 | Information processing method, apparatus and system, storage medium, terminal |
CN111309859A (en) * | 2020-01-21 | 2020-06-19 | 上饶市中科院云计算中心大数据研究院 | Scenic spot network public praise emotion analysis method and device |
CN115099586A (en) * | 2022-06-10 | 2022-09-23 | 上海异工同智信息科技有限公司 | Method and device for identifying operation risk |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108153738A (en) | A kind of chat record analysis method and device based on hierarchical clustering | |
US11218500B2 (en) | Methods and systems for automated parsing and identification of textual data | |
US11868733B2 (en) | Creating a knowledge graph based on text-based knowledge corpora | |
US11663405B2 (en) | Machine learning applications for temporally-related events | |
CN111414479B (en) | Label extraction method based on short text clustering technology | |
CN110909165B (en) | Data processing method, device, medium and electronic equipment | |
US8312049B2 (en) | News group clustering based on cross-post graph | |
US7783642B1 (en) | System and method of identifying web page semantic structures | |
US11048712B2 (en) | Real-time and adaptive data mining | |
US9110985B2 (en) | Generating a conceptual association graph from large-scale loosely-grouped content | |
WO2016179938A1 (en) | Method and device for question recommendation | |
WO2020108063A1 (en) | Feature word determining method, apparatus, and server | |
US20140304267A1 (en) | Suffix tree similarity measure for document clustering | |
US20240163684A1 (en) | Method and System for Constructing and Analyzing Knowledge Graph of Wireless Communication Network Protocol, and Device and Medium | |
CN105045875B (en) | Personalized search and device | |
US20110191270A1 (en) | Intelligent decision supporting system and method for making intelligent decision | |
CN108885623A (en) | The lexical analysis system and method for knowledge based map | |
CN111625658A (en) | Voice interaction method, device and equipment based on knowledge graph and storage medium | |
JP2005523533A (en) | Processing mixed numeric and / or non-numeric data | |
Gu et al. | [Retracted] Application of Fuzzy Decision Tree Algorithm Based on Mobile Computing in Sports Fitness Member Management | |
CN114579397A (en) | Anomaly detection method and system based on data mining | |
Ramkissoon et al. | Legitimacy: an ensemble learning model for credibility based fake news detection | |
CN114491079A (en) | Knowledge graph construction and query method, device, equipment and medium | |
Xiao et al. | Traffic peak period detection from an image processing view | |
CN115660695A (en) | Customer service personnel label portrait construction method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180612 |