CN109478191A - Text mining method, text mining program and text mining device - Google Patents
- Publication number
- CN109478191A (CN201780043375.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/358—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
In a text analysis step (S109 to S110), hierarchical cluster analysis is performed on words extracted from input text data. In a screen generation step (S111), m clusters are obtained from the analysis result of the text analysis step according to the number of groups (m) and the maximum number of data items per group (n), and screen data is generated for displaying, on a screen, groups each containing n or fewer words belonging to a cluster. In an analysis result display step (S112), the screen is displayed according to the generated screen data. In this way, the result of the hierarchical cluster analysis is displayed on the screen in a form the user can understand intuitively.
Description
Technical field
The present invention relates to text mining, and more particularly to a text mining method, a text mining program, and a text mining device that display the analysis result of text data on a screen.
Background art
In recent years, text mining, which parses large amounts of text data written in free form and obtains useful information from the parsing result, has attracted attention. In text mining, for example, words are extracted from the text data to be analyzed, and information is obtained by examining the frequency of occurrence, the occurrence trends, and the like of the words.
The following considers a text mining device that performs hierarchical cluster analysis on words extracted from text data and displays the analysis result on a screen. In hierarchical cluster analysis, clusters containing highly similar words are created hierarchically according to the similarity between words. In general, the result of hierarchical cluster analysis is presented to the user (analyst) using a dendrogram (tree diagram) as shown in Fig. 15.
In connection with the invention of the present application, Patent Document 1 describes a clustering device having a hierarchical clustering unit that constructs a dendrogram, searches the dendrogram, generates an index that can be traversed from lower layers to upper layers, and stores the index in a storage unit. Patent Document 2 describes a query providing device including: a distance matrix calculation unit that calculates the distances between keywords, generates distance matrix data in which the distance between keywords can be looked up, and stores the data in a storage unit; and a clustering unit that hierarchically clusters the keywords using the distance matrix and stores the constructed dendrogram in the storage unit as a bottom-up index searchable from lower layers to upper layers.
Prior art documents
Patent documents
Patent Document 1: Japanese Patent Laid-Open No. 2011-216021
Patent Document 2: Japanese Patent Laid-Open No. 2012-150539
Summary of the invention
Problem to be solved by the invention
A conventional text mining device displays the result of hierarchical cluster analysis on a screen using a dendrogram. However, such a text mining device has the problem that the user cannot understand the analysis result intuitively. For example, when the user sets the number of clusters to 4 for the analysis result shown in Fig. 15, a cut line can be drawn on the dendrogram as shown in Fig. 16. Even so, the user cannot intuitively identify the words contained in each cluster just by looking at the dendrogram. Moreover, when there are many words and the number of clusters is changed, the user cannot intuitively grasp how the words contained in each cluster change. In addition, because the dendrogram does not record the frequency of occurrence of words, the user cannot tell which words are more important. Furthermore, when the text data to be analyzed is time-series data carrying information such as dates or times, the user sometimes wishes to know how the analysis result changes over time. A conventional text mining device cannot satisfy this wish.
Therefore, an object of the present invention is to provide a text mining method, a text mining program, and a text mining device that display the result of hierarchical cluster analysis on a screen in a form the user can understand intuitively.
Means for solving the problem
A first embodiment of the present invention is a text mining method for displaying the analysis result of text data on a screen, characterized by comprising:
a text analysis step of performing hierarchical cluster analysis on words (each of which may be a single character and/or a word) extracted from input text data;
a screen generation step of generating screen data according to the analysis result of the text analysis step; and
an analysis result display step of displaying a screen according to the screen data,
wherein, in the screen generation step, clusters equal in number to the number of groups are obtained from the analysis result according to the number of groups and the maximum number of data items per group, and screen data is generated for displaying, on the screen, groups each containing no more than the maximum number of data items of words belonging to a cluster.
A second embodiment of the present invention is characterized in that, in the first embodiment of the present invention,
the words contained in a group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to the group.
A third embodiment of the present invention is characterized in that, in the second embodiment of the present invention,
on the screen, a group has a size corresponding to the total frequency of occurrence of the words belonging to the cluster corresponding to the group.
A fourth embodiment of the present invention is characterized in that, in the third embodiment of the present invention,
on the screen, each word contained in a group has a size corresponding to the frequency of occurrence of that word.
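The selection and sizing rules of the second and third embodiments can be sketched as follows. This is only an illustrative sketch: the function name `build_group`, the dictionary-based frequency table, and the alphabetical tie-breaking are assumptions, not taken from the patent text.

```python
def build_group(cluster_words, frequency, n):
    """Return the words to display for one cluster and the group's weight.

    cluster_words: words belonging to the cluster
    frequency:     dict mapping each word to its occurrence count
    n:             maximum number of data items (words) shown per group
    """
    # Rank by descending frequency; ties are broken alphabetically (an assumption).
    ranked = sorted(cluster_words, key=lambda w: (-frequency[w], w))
    shown = ranked[:n]
    # The group's size follows the total over all words in the cluster,
    # not only over the words that are displayed.
    weight = sum(frequency[w] for w in cluster_words)
    return shown, weight


# Hypothetical sample data for illustration only.
frequency = {"battery": 9, "charge": 4, "cable": 2}
shown, weight = build_group(["battery", "charge", "cable"], frequency, n=2)
print(shown, weight)  # ['battery', 'charge'] 15
```

How the weight is translated into an on-screen size is left open here; any monotonic mapping would be consistent with the description.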
A fifth embodiment of the present invention is characterized in that, in the first embodiment of the present invention,
the method further comprises an instruction input step of inputting an instruction from a user, and
either of the text analysis step and the screen generation step is executed according to the instruction input in the instruction input step.
A sixth embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, an instruction setting the number of groups is received, and
in the screen generation step, the screen data is generated according to the number of groups set in the instruction input step.
A seventh embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, an instruction setting the maximum number of data items is received, and
in the screen generation step, the screen data is generated according to the maximum number of data items set in the instruction input step.
An eighth embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, an instruction setting an analysis period is received, and
in the text analysis step, the hierarchical cluster analysis is performed on the words contained in the portion of the text data that falls within the analysis period set in the instruction input step.
A ninth embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, an instruction setting an analysis target is received, and
in the text analysis step, words of a type corresponding to the analysis target set in the instruction input step are extracted from the text data, and the hierarchical cluster analysis is performed.
A tenth embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, a word exclusion instruction is received, and
in the text analysis step, the hierarchical cluster analysis is performed with the words indicated in the instruction input step excluded.
An eleventh embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, a near-synonym registration instruction is received, and
in the text analysis step, the plurality of words indicated in the instruction input step are treated as the same word, and the hierarchical cluster analysis is performed.
A twelfth embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, a compound word registration instruction is received, and
in the text analysis step, the plurality of words indicated in the instruction input step are merged into one word, and the hierarchical cluster analysis is performed.
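The word exclusion, near-synonym registration, and compound word registration of the tenth to twelfth embodiments can be pictured as preprocessing applied to the extracted words before clustering. The sketch below is a minimal illustration under assumptions: the function name `preprocess`, the order of the three operations, and joining compound parts without a separator (which suits Japanese text) are not specified in the patent.

```python
def preprocess(tokens, excluded=(), near_synonyms=None, compounds=()):
    """Merge compound words, unify near-synonyms, then drop excluded words.

    tokens:        list of extracted words in order of appearance
    excluded:      words to remove from the analysis
    near_synonyms: dict mapping a word to the representative word it is treated as
    compounds:     sequences of consecutive words to merge into one word
    """
    near_synonyms = near_synonyms or {}
    excluded = set(excluded)
    merged = []
    i = 0
    while i < len(tokens):
        for parts in compounds:
            k = len(parts)
            # Merge when the next k tokens match a registered compound word.
            if k and tuple(tokens[i:i + k]) == tuple(parts):
                merged.append("".join(parts))
                i += k
                break
        else:
            merged.append(tokens[i])
            i += 1
    unified = [near_synonyms.get(t, t) for t in merged]
    return [t for t in unified if t not in excluded]


# Hypothetical sample input for illustration only.
result = preprocess(
    ["text", "mining", "is", "fast", "quick"],
    excluded=["is"],
    near_synonyms={"quick": "fast"},
    compounds=[("text", "mining")],
)
print(result)  # ['textmining', 'fast', 'fast']
```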
A thirteenth embodiment of the present invention is characterized in that, in the first embodiment of the present invention,
in the screen generation step, screen data is generated for displaying an analysis result screen containing the groups and an analysis setting screen for setting the display mode of the analysis result screen.
A fourteenth embodiment of the present invention is a text mining program for displaying the analysis result of text data on a screen, characterized in that the text mining program causes a CPU of a computer to execute, using a memory:
a text analysis step of performing hierarchical cluster analysis on words extracted from input text data;
a screen generation step of generating screen data according to the analysis result of the text analysis step; and
an analysis result display step of displaying a screen according to the screen data,
wherein, in the screen generation step, clusters equal in number to the number of groups are obtained from the analysis result according to the number of groups and the maximum number of data items per group, and screen data is generated for displaying, on the screen, groups each containing no more than the maximum number of data items of words belonging to a cluster.
A fifteenth embodiment of the present invention is characterized in that, in the fourteenth embodiment of the present invention,
the words contained in a group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to the group.
A sixteenth embodiment of the present invention is characterized in that, in the fifteenth embodiment of the present invention,
on the screen, a group has a size corresponding to the total frequency of occurrence of the words belonging to the cluster corresponding to the group.
A seventeenth embodiment of the present invention is characterized in that, in the sixteenth embodiment of the present invention,
on the screen, each word contained in a group has a size corresponding to the frequency of occurrence of that word.
An eighteenth embodiment of the present invention is characterized in that, in the fourteenth embodiment of the present invention,
the text mining program further causes the computer to execute an instruction input step of inputting an instruction from a user, and
either of the text analysis step and the screen generation step is executed according to the instruction input in the instruction input step.
A nineteenth embodiment of the present invention is characterized in that, in the fourteenth embodiment of the present invention,
in the screen generation step, screen data is generated for displaying an analysis result screen containing the groups and an analysis setting screen for setting the display mode of the analysis result screen.
A twentieth embodiment of the present invention is a text mining device that displays the analysis result of text data on a screen, comprising:
a text analysis unit that performs hierarchical cluster analysis on words extracted from input text data;
a screen generation unit that generates screen data according to the analysis result of the text analysis unit; and
an analysis result display unit that displays a screen according to the screen data,
wherein the screen generation unit obtains, from the analysis result, clusters equal in number to the number of groups according to the number of groups and the maximum number of data items per group, and generates screen data for displaying, on the screen, groups each containing no more than the maximum number of data items of words belonging to a cluster.
A twenty-first embodiment of the present invention is characterized in that, in the twentieth embodiment of the present invention,
the words contained in a group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to the group.
A twenty-second embodiment of the present invention is characterized in that, in the twenty-first embodiment of the present invention,
on the screen, a group has a size corresponding to the total frequency of occurrence of the words belonging to the cluster corresponding to the group.
A twenty-third embodiment of the present invention is characterized in that, in the twenty-second embodiment of the present invention,
on the screen, each word contained in a group has a size corresponding to the frequency of occurrence of that word.
A twenty-fourth embodiment of the present invention is characterized in that, in the twentieth embodiment of the present invention,
the text mining device further comprises an instruction input unit for inputting an instruction from a user, and
either of the text analysis unit and the screen generation unit operates according to the instruction input to the instruction input unit.
A twenty-fifth embodiment of the present invention is characterized in that, in the twentieth embodiment of the present invention,
the screen generation unit generates screen data for displaying an analysis result screen containing the groups and an analysis setting screen for setting the display mode of the analysis result screen.
Effects of the invention
According to the first, fourteenth, or twentieth embodiment of the present invention, groups containing the words that belong to clusters are displayed on the screen on the basis of the result of hierarchical cluster analysis performed on the words contained in the text data. In addition, the number of words contained in a group is limited to the maximum number of data items or fewer. The user can therefore understand the result of the hierarchical cluster analysis intuitively when looking at the screen.
According to the second, fifteenth, or twenty-first embodiment of the present invention, the words with a higher frequency of occurrence among the words contained in a cluster are displayed inside the group. The user can therefore easily recognize the high-frequency words contained in each cluster.
According to the third, sixteenth, or twenty-second embodiment of the present invention, a group has, on the screen, a size corresponding to the total frequency of occurrence of the words contained in the cluster. The user can therefore easily recognize the clusters whose total word frequency is large.
According to the fourth, seventeenth, or twenty-third embodiment of the present invention, a word has, on the screen, a size corresponding to its frequency of occurrence. The user can therefore easily recognize the words with a high frequency of occurrence.
According to the fifth, eighteenth, or twenty-fourth embodiment of the present invention, the display mode of the result of hierarchical cluster analysis can be switched according to an instruction from the user.
According to the sixth embodiment of the present invention, the number of groups (number of clusters) displayed on the screen can be switched according to an instruction from the user.
According to the seventh embodiment of the present invention, the upper limit on the number of words contained in a group can be switched according to an instruction from the user.
According to the eighth embodiment of the present invention, the screen displays the result of hierarchical cluster analysis performed on the words contained in the text data within the analysis period indicated by the user. The user can therefore easily recognize how the result of hierarchical cluster analysis changes over time.
According to the ninth embodiment of the present invention, the type of words to be analyzed can be switched according to the analysis target indicated by the user, and the result of the resulting hierarchical cluster analysis can be displayed on the screen.
According to the tenth embodiment of the present invention, the result of hierarchical cluster analysis performed after excluding the words indicated by the user can be displayed on the screen.
According to the eleventh embodiment of the present invention, the result of hierarchical cluster analysis performed with the plurality of words indicated by the user treated as the same word can be displayed on the screen.
According to the twelfth embodiment of the present invention, the result of hierarchical cluster analysis performed with the plurality of words indicated by the user merged into one word can be displayed on the screen.
According to the thirteenth, nineteenth, or twenty-fifth embodiment of the present invention, the analysis result screen and the analysis setting screen are displayed. The user can therefore use the analysis setting screen to easily switch the display mode of the result of hierarchical cluster analysis.
Brief description of the drawings
Fig. 1 is a block diagram showing the configuration of a text mining device according to an embodiment of the present invention.
Fig. 2 is a block diagram showing the configuration of a computer that functions as the text mining device shown in Fig. 1.
Fig. 3 is a diagram showing the display screen of the text mining device shown in Fig. 1.
Fig. 4 is a flowchart showing the operation of the text mining device shown in Fig. 1.
Fig. 5 is a flowchart of the screen data generation processing of the text mining device shown in Fig. 1.
Fig. 6 is a diagram showing the data designation screen of the text mining device shown in Fig. 1.
Fig. 7 is a diagram showing an example of text data input to the text mining device shown in Fig. 1.
Fig. 8 is a diagram showing the target designation screen of the text mining device shown in Fig. 1.
Fig. 9 is a diagram showing the near-synonym list selection screen of the text mining device shown in Fig. 1.
Fig. 10 is a diagram showing the compound word list selection screen of the text mining device shown in Fig. 1.
Fig. 11A is a diagram showing the analysis result screen before the analysis period is set in the text mining device shown in Fig. 1.
Fig. 11B is a diagram showing the analysis result screen after the analysis period is set in the text mining device shown in Fig. 1.
Fig. 12A is a diagram showing the analysis result screen before word exclusion is performed in the text mining device shown in Fig. 1.
Fig. 12B is a diagram showing the analysis result screen after word exclusion is performed in the text mining device shown in Fig. 1.
Fig. 13A is a diagram showing the analysis result screen before near-synonym registration is performed in the text mining device shown in Fig. 1.
Fig. 13B is a diagram showing the analysis result screen after near-synonym registration is performed in the text mining device shown in Fig. 1.
Fig. 14A is a diagram showing the analysis result screen before compound word registration is performed in the text mining device shown in Fig. 1.
Fig. 14B is a diagram showing the analysis result screen after compound word registration is performed in the text mining device shown in Fig. 1.
Fig. 15 is a diagram showing an example of a dendrogram.
Fig. 16 is a diagram showing the dendrogram of Fig. 15 with a number of clusters set.
Fig. 17 is a diagram showing the words that appear in the drawings and their explanations.
Detailed description of embodiments
Hereinafter, a text mining method, a text mining program, and a text mining device according to an embodiment of the present invention are described with reference to the drawings. The text mining method of this embodiment is typically executed using a computer. The text mining program of this embodiment is a program for carrying out the text mining method using a computer. The text mining device of this embodiment is typically configured using a computer. A computer executing the text mining program functions as the text mining device.
Fig. 1 is a block diagram showing the configuration of the text mining device according to the embodiment of the present invention. The text mining device 10 shown in Fig. 1 has an instruction input unit 11, a text analysis unit 12, a screen generation unit 13, and an analysis result display unit 14. Text data 5 to be analyzed is input to the text mining device 10. The text mining device 10 performs hierarchical cluster analysis on the words extracted from the input text data 5 and displays the analysis result on a screen.
The operation of the text mining device 10 is outlined below. Instructions from the user are input to the instruction input unit 11. The text analysis unit 12 extracts words from the input text data 5 and performs hierarchical cluster analysis on the extracted words. The screen generation unit 13 generates screen data according to the analysis result of the text analysis unit 12. The analysis result display unit 14 displays a screen according to the screen data generated by the screen generation unit 13.
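As a rough illustration of the text analysis unit's role, the sketch below clusters words agglomeratively and stops once m clusters remain, which corresponds to cutting a dendrogram at m clusters. The patent text does not fix a similarity measure or a linkage rule, so the Jaccard distance over the sets of records in which each word occurs, the average linkage, and the name `cluster_words` are all assumptions made for this example.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Distance between two words, each given as the set of record ids it occurs in."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster_words(occurrences, m):
    """Average-linkage agglomerative clustering, stopped when m clusters remain.

    occurrences: dict mapping each word to the set of record ids containing it
    m:           desired number of clusters (number of groups)
    """
    clusters = [[w] for w in sorted(occurrences)]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of the two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        total = sum(jaccard_distance(occurrences[x], occurrences[y]) for x, y in pairs)
        return total / len(pairs)

    while len(clusters) > max(m, 1):
        # Merge the closest pair of clusters; i < j always holds here.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical sample data: record ids in which each word occurs.
occurrences = {
    "battery": {1, 2}, "charge": {1, 2},
    "screen": {3, 4}, "display": {3, 4},
}
print(cluster_words(occurrences, 2))
# [['battery', 'charge'], ['display', 'screen']]
```

This brute-force loop is meant for reading, not for large vocabularies; a production system would more likely build the full hierarchy once and cut it at different values of m.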
The instructions from the user that are input to the instruction input unit 11 include setting the number of groups, setting the maximum number of data items per group, setting the analysis period, excluding words, registering near-synonyms, registering compound words, and so on. When the text data 5 is time-series data carrying information such as dates or times, the text analysis unit 12 performs hierarchical cluster analysis on the words contained in the portion of the input text data 5 that falls within the analysis period set via the instruction input unit 11.
The screen generation unit 13 generates the screen data according to the number of groups and the maximum number of data items per group (details are described later). When the user inputs a new instruction, the indicated processing is carried out, the screen generation unit 13 generates new screen data, and the analysis result display unit 14 displays a new screen. In this way, the text mining device 10 switches the analysis mode of the text data 5 and the display mode of the analysis result according to instructions from the user.
Fig. 2 is a block diagram showing the configuration of a computer that functions as the text mining device 10. The computer 20 shown in Fig. 2 has a CPU (Central Processing Unit) 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reading unit 27. The main memory 22 uses, for example, DRAM (Dynamic Random Access Memory). The storage unit 23 uses, for example, a hard disk or a solid state drive. The input unit 24 includes, for example, a keyboard 28 and a mouse 29. The display unit 25 uses, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reading unit 27 is an interface circuit for a recording medium 30 that stores programs and the like. The recording medium 30 is a non-transitory recording medium such as a CD-ROM (Compact Disc Read-Only Memory), a DVD-ROM (Digital Versatile Disc Read-Only Memory), or a USB (Universal Serial Bus) memory.
When the computer 20 executes the text mining program 31, the storage unit 23 stores the text mining program 31 and the text data 5. The text mining program 31 and the text data 5 may, for example, be received from a server or another computer using the communication unit 26, or be read from the recording medium 30 using the recording medium reading unit 27.
When the text mining program 31 is executed, the text mining program 31 and the text data 5 are copied to the main memory 22. The CPU 21 uses the main memory 22 as working memory and executes the text mining program 31 stored in the main memory 22, thereby processing the text data 5 stored in the main memory 22. At this time, the computer 20 functions as the text mining device 10. The configuration of the computer 20 described above is only an example; the text mining device 10 can be configured using any computer.
In the following, Japanese data containing Japanese words is used as the text data 5. Fig. 17 is a diagram showing the words that appear in the drawings and their explanations. Each row of Fig. 17 lists a word (a Japanese word) and its meaning. In the following description, when a Japanese word is mentioned, its meaning is sometimes given in parentheses after the word. The text data 5 may also be data in any other language.
Fig. 3 is a diagram showing the display screen of the text mining device 10. The display screen 40 shown in Fig. 3 includes an analysis result screen 41 and an analysis setting screen 42. The analysis result screen 41 displays the analysis result of the text analysis unit 12. The analysis setting screen 42 displays GUI (Graphical User Interface) components for setting the analysis mode of the text analysis unit 12 and the characteristics of the screen data generated by the screen generation unit 13.
Once a number of clusters is set for the result of hierarchical cluster analysis, the words contained in each cluster are determined. When displaying on a screen the result of hierarchical cluster analysis performed on the words extracted from the text data 5, the text mining device 10 displays groups corresponding to the clusters in the manner shown in Fig. 3, in place of a dendrogram.
In the following description, a cluster displayed on the screen is also called a group. The user uses the instruction input unit 11 to specify the number of groups (number of clusters) and the maximum number of data items per group (the upper limit on the number of words a group contains). Hereinafter, the former is denoted m and the latter n.
In the text mining device 10, the words contained in the text data 5 are classified into m clusters, each containing one or more words. The analysis result screen 41 displays m groups, with words displayed inside each group. Each group is drawn as a cloud-shaped figure, and each word contained in the group is displayed inside an elliptical region. The number of words displayed in each group is limited to n or fewer. For example, when n = 5 and a cluster contains 10 words, 5 words are displayed inside the group on the analysis result screen 41.
The analysis setting screen 42 displays a first slider bar and two first buttons (labeled "+" and "-") for setting the number of groups m, a second slider bar and two second buttons for setting the maximum number of data items per group n, and four boxes and two third buttons (showing a left arrow or a right arrow) for setting the analysis period.
The user specifies the number of groups m by operating the mouse 29 to move the slider of the first slider bar left or right, or by pressing a first button. The number of groups m increases when the first button labeled "+" is pressed and decreases when the first button labeled "-" is pressed. The initial value of m is set, for example, to the square root of the number of distinct words contained in the analysis result of the text analysis unit 12, or to an integer close to that square root. For example, when the analysis result of the text analysis unit 12 contains 16 distinct words, the initial value of m is set to 4.
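The initial value of m described above amounts to a one-line calculation. The sketch below assumes rounding to the nearest integer and a lower bound of 1; the text only says "an integer close to the square root", so both choices, and the name `initial_group_count`, are illustrative.

```python
import math


def initial_group_count(num_distinct_words):
    """Integer nearest to the square root of the number of distinct words (at least 1)."""
    return max(1, round(math.sqrt(num_distinct_words)))


print(initial_group_count(16))  # 4
```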
The user specifies the maximum number of data items per group n by operating the mouse 29 to move the slider of the second slider bar left or right, or by pressing a second button. The maximum number of data items per group n increases or decreases when a second button is pressed. The initial value of n is set, for example, to 5.
When the text data 5 is time series data, the user specifies the analysis period either by operating the keyboard 28 or the mouse 29 to enter dates and times in the four boxes, or by pressing a third button. The analysis period moves toward the past by a specified amount (for example, one month) when the third button marked with a left arrow is pressed, and moves by the specified amount in the opposite direction when the third button marked with a right arrow is pressed. The initial value of the analysis period is set, for example, to the period from the oldest time to the newest time in the text data 5. When the text data 5 is not time series data, the user cannot specify an analysis period.
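The period shift triggered by the third buttons can be sketched as follows. This is a minimal illustration, assuming that the specified amount is one calendar month and that both ends of the period move together; the names `shift_months` and `shift_period` are hypothetical and do not come from the source.

```python
import calendar
from datetime import datetime


def shift_months(moment: datetime, months: int) -> datetime:
    """Move a date by a whole number of months, clamping the day to the month length."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return moment.replace(year=year, month=month0 + 1, day=min(moment.day, last_day))


def shift_period(start: datetime, end: datetime, months: int):
    """Shift both ends of the analysis period: negative for the left arrow, positive for the right."""
    return shift_months(start, months), shift_months(end, months)


# Pressing the left-arrow button once moves the period one month into the past.
earlier = shift_period(datetime(2014, 3, 1), datetime(2014, 9, 30), -1)
```

Clamping the day keeps the result valid when the target month is shorter, for example when moving back from March 31 to February.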
The analysis result screen 41 displays at least 1 and at most m groups, and each group shows at least 1 and at most n words. On the screen, the larger the total frequency of occurrence of the words contained in the cluster corresponding to a group, the larger that group is displayed. When a cluster contains more than n words, the n words with the highest frequencies of occurrence are displayed inside the group. For each word contained in a group and the elliptical region containing that word, the higher the frequency of occurrence of the word, the larger the word and its elliptical region are displayed. Each group is labeled with a title. The title of a group is the word with the highest frequency of occurrence among the words contained in the cluster. The title is underlined and displayed inside the group. When a word cannot be displayed inside its elliptical region, the symbol "..." is displayed in place of the word.
The analysis result screen 41 also displays a third slider bar and two fourth buttons (marked "+" and "-") for specifying the zoom factor. The user sets the zoom factor by operating the mouse 29 to move the slider of the third slider bar left or right, or by pressing a fourth button. On the analysis result screen 41, the groups containing words are displayed enlarged or reduced according to the set zoom factor. The initial value of the zoom factor is set to 100%. In its initial state, the analysis result screen 41 displays all groups.
When the user changes the number of groups m, the maximum number of data items per group n, or the analysis period on the analysis setting screen 42, the content of the analysis result screen 41 changes accordingly. When the user instructs word exclusion, synonym registration, or compound word registration on the analysis result screen 41, the content of the analysis result screen 41 likewise changes according to the instruction.
When performing hierarchical cluster analysis on the words extracted from the text data 5, the text mining device 10 refers to an exclusion word list storing words to be excluded, a synonym list storing words to be treated as synonyms, and a compound word list storing words to be treated as compound words. In the synonym list, multiple words having the same (or roughly the same) meaning are stored in association with one word that represents them. In the compound word list, multiple words that form one compound word when joined are stored in association with the compound word obtained by joining them. For example, "daigakusei (university student)" and "gakusei (student)" are stored in the synonym list in association with "daigakusei", which represents both. Likewise, "nintai (endurance)" and "tsuyoi (strong)" are stored in the compound word list in association with "nintaizuyoi (having strong endurance)", which is obtained by joining the two. The text mining device 10 may have multiple synonym lists and multiple compound word lists.
Fig. 4 is a flowchart showing the operation of the text mining device 10. Fig. 5 is a flowchart showing the details of the screen data generation processing (step S111 shown in Fig. 4) of the text mining device 10. The input unit 24 and the CPU 21 executing step S113 function as the instruction input unit 11. The CPU 21 executing steps S109 to S110 functions as the text analysis unit 12. The CPU 21 executing step S111 functions as the screen generation unit 13. The display unit 25 and the CPU 21 executing step S112 function as the analysis result display unit 14. The operation of the text mining device 10 is described below with reference to Fig. 4 and Fig. 5.
First, the CPU 21 causes the display unit 25 to display the data specification screen 51 shown in Fig. 6 (step S101). The data specification screen 51 displays a box for specifying a file name and a box for specifying a folder name. The user specifies the text data 5 to be analyzed by specifying a file name or a folder name on the data specification screen 51. The text data 5 may be stored in the storage unit 23, such as a hard disk, or in a server or another computer connected via the communication unit 26.
Next, the CPU 21 transfers the text data 5 specified on the data specification screen 51 to the main memory 22. In this way, the text data 5 is input to the text mining device 10 (step S102). Fig. 7 shows an example of the text data 5. The text data shown in Fig. 7 consists of reports written by university students and is time series data with date information. From the top, the text data shown in Fig. 7 reads "This lecture was about the relationship between university students and society ...", "Most university students work before graduating and entering society, or ...", "We students must be aware that we are studying after paying expensive tuition fees ..." and "Student life is a valuable time for personal growth. And ...". The type of text data 5 analyzed by the text mining device 10 is arbitrary.
Next, the CPU 21 causes the display unit 25 to display the target specification screen 52 shown in Fig. 8 (step S103). The target specification screen 52 displays three radio buttons corresponding to content, feature, and evaluation. The user selects an analysis target from among content, feature, and evaluation by operating the mouse 29 and pressing one of the radio buttons. The CPU 21 then receives the analysis target specified on the target specification screen 52. In this way, the analysis target is input to the text mining device 10 (step S104).
Next, the CPU 21 causes the display unit 25 to display the synonym list selection screen 53 shown in Fig. 9 (step S105). The synonym list selection screen 53 displays the names of the synonym lists held by the text mining device 10 and the synonyms registered in each synonym list. The user specifies the synonym list to be used by operating the mouse 29 and selecting one of the synonym lists on the synonym list selection screen 53. In this way, a synonym list is selected in the text mining device 10 (step S106).
Next, the CPU 21 causes the display unit 25 to display the compound word list selection screen 54 shown in Fig. 10 (step S107). The compound word list selection screen 54 displays the names of the compound word lists held by the text mining device 10 and the compound words registered in each compound word list. The user specifies the compound word list to be used by operating the mouse 29 and selecting one of the compound word lists on the compound word list selection screen 54. In this way, a compound word list is selected in the text mining device 10 (step S108).
Next, taking into account the exclusion word list, the synonym list, and the compound word list, the CPU 21 extracts, from the portion of the text data 5 input in step S102 that belongs to the analysis period, words of the types corresponding to the analysis target specified in step S104 (step S109). When the analysis target is "content", the CPU 21 extracts nouns, proper nouns, place names, and personal names from the text data 5. When the analysis target is "feature", the CPU 21 extracts nouns, proper nouns, sa-hen nouns (verbal nouns), and verbs from the text data 5. When the analysis target is "evaluation", the CPU 21 extracts adjectives, adjectival verbs, and interjections from the text data 5. The text mining device 10 may also support analysis targets other than the three above. The CPU 21 may also extract word types different from those above for each analysis target.
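The mapping from analysis target to extracted word types can be written as a small lookup table. The sketch below assumes that a morphological analyzer has already produced (word, part-of-speech) pairs; the tag names and the function `extract_words` are hypothetical labels chosen for illustration, not names from the source.

```python
# Word types extracted for each analysis target, following the description of step S109.
# The tag strings are illustrative; a real morphological analyzer uses its own tag set.
TARGET_WORD_TYPES = {
    "content": {"noun", "proper_noun", "place_name", "person_name"},
    "feature": {"noun", "proper_noun", "verbal_noun", "verb"},
    "evaluation": {"adjective", "adjectival_verb", "interjection"},
}


def extract_words(tagged_tokens, target):
    """Keep only the words whose part of speech matches the chosen analysis target."""
    allowed = TARGET_WORD_TYPES[target]
    return [word for word, tag in tagged_tokens if tag in allowed]


tokens = [("shakai", "noun"), ("tsuyoi", "adjective"), ("manabu", "verb")]
evaluation_words = extract_words(tokens, "evaluation")
```

Keeping the mapping in a table makes it straightforward to add further analysis targets or to change the word types per target, both of which the description allows.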
When the text data 5 is time series data, the CPU 21, in executing step S109, extracts words only from the portion of the text data 5 that falls within the analysis period specified by the user. When a word W1 is stored in the exclusion word list, the CPU 21 completely ignores every occurrence of the word W1 in the text data 5 when executing step S109. When words W2 and W3 are stored in the selected synonym list in association with the word W2 that represents both, the CPU 21 treats every occurrence of the word W3 in the text data 5 as the word W2 when executing step S109. When words W4 and W5 are stored in the selected compound word list in association with a word W6 obtained by joining the two, the CPU 21 treats every joined occurrence of the words W4 and W5 in the text data 5 as the word W6 when executing step S109.
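The handling of the three lists in step S109 can be illustrated with a short normalization routine. It is a sketch under assumptions: the order of operations (compound words first, then synonyms, then exclusions) is not stated in the source, and the function name `normalize` and the dictionary layout are hypothetical.

```python
def normalize(tokens, exclusions, synonyms, compounds):
    """Apply compound word, synonym, and exclusion lists to a token sequence.

    tokens:     words in the order they appear in the text
    exclusions: set of words to ignore (W1)
    synonyms:   dict mapping a word (W3) to its representative word (W2)
    compounds:  dict mapping an adjacent pair (W4, W5) to the joined word (W6)
    """
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in compounds:
            merged.append(compounds[pair])  # adjacent W4, W5 become W6
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    result = []
    for word in merged:
        word = synonyms.get(word, word)  # W3 is counted as W2
        if word not in exclusions:       # W1 is dropped entirely
            result.append(word)
    return result


words = normalize(
    ["gakusei", "nintai", "tsuyoi", "shakai", "daigakusei"],
    exclusions={"shakai"},
    synonyms={"gakusei": "daigakusei"},
    compounds={("nintai", "tsuyoi"): "nintaizuyoi"},
)
```

The compound list is stored as a mapping rather than built by concatenation because the joined form can differ from the plain concatenation, as with "nintaizuyoi".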
Next, the CPU 21 performs hierarchical cluster analysis on the words extracted in step S109 (step S110). In step S110, the CPU 21 obtains the similarity between two words based on, for example, the distance between the two words in the text data 5 (how far apart the two words appear). Based on the obtained similarities between words, the CPU 21 performs the hierarchical cluster analysis using a predetermined method (for example, the nearest neighbor method, the longest distance method, the group average method, the decimal system method, or Ward's method). In step S110, the CPU 21 also obtains the frequency of occurrence of each word.
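One way to obtain the two inputs of step S110, the word frequencies and a distance between each pair of words, is sketched below. The source does not define the distance precisely, so this sketch assumes it is the smallest gap in token positions between any occurrences of the two words; the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def word_frequencies(tokens):
    """Count how often each word occurs."""
    return Counter(tokens)


def pairwise_distances(tokens):
    """Return the smallest positional gap between every pair of distinct words.

    Assumption: a smaller gap means the two words are more similar.
    Keys are (a, b) tuples with a < b in alphabetical order.
    """
    positions = defaultdict(list)
    for index, word in enumerate(tokens):
        positions[word].append(index)
    distances = {}
    for a, b in combinations(sorted(positions), 2):
        distances[(a, b)] = min(abs(i - j) for i in positions[a] for j in positions[b])
    return distances


sample = ["a", "b", "x", "x", "c"]
freq = word_frequencies(sample)
dist = pairwise_distances(sample)
```

A similarity can then be derived from the distance, for instance as its reciprocal, before a linkage method is applied.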
Next, the CPU 21 generates screen data for displaying the analysis result, based on the result of the hierarchical cluster analysis obtained in step S110 (step S111). In step S111, the CPU 21 performs the processing shown in Fig. 5.
The CPU 21 sets the number of groups to m and the maximum number of data items per group to n (step S201). Next, the CPU 21 sets the number of clusters to m and obtains m clusters from the result of the hierarchical cluster analysis (step S202). Next, for each cluster, the CPU 21 obtains the total frequency of occurrence of the words contained in the cluster (step S203). Next, the CPU 21 determines the display size of each group based on the total frequency of occurrence obtained in step S203 (step S204). In step S204, the larger the total frequency of occurrence of the words contained in a cluster, the larger the display size of the corresponding group is set.
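Steps S202 to S204 can be sketched in plain Python as follows. The sketch uses the nearest neighbor (single linkage) method, one of the methods named in the description, and merges clusters until m remain; the linear size formula and the names `cut_into_clusters` and `group_sizes` are assumptions made for illustration.

```python
def cut_into_clusters(words, distance, m):
    """Merge the two closest clusters repeatedly until only m clusters remain (S202)."""
    clusters = [[w] for w in words]

    def linkage(c1, c2):
        # Nearest neighbor method: distance between the closest pair of members.
        return min(distance(a, b) for a in c1 for b in c2)

    while len(clusters) > m:
        pairs = ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters


def group_sizes(clusters, freq, base=40.0, scale=10.0):
    """Total the word frequencies per cluster (S203) and map them to a display size (S204)."""
    return [base + scale * sum(freq[w] for w in cluster) for cluster in clusters]


position = {"a": 0, "b": 1, "c": 10, "d": 11}
clusters = cut_into_clusters(list(position), lambda x, y: abs(position[x] - position[y]), 2)
sizes = group_sizes(clusters, {"a": 3, "b": 1, "c": 1, "d": 1})
```

Because the merge order depends only on the distances, changing m requires only re-cutting the same hierarchy, which is consistent with the flow returning to step S111 rather than step S109 when the user changes m.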
Next, for each cluster, the CPU 21 selects the words to be displayed from among the words contained in the cluster (step S205). In step S205, n or fewer words are selected from the words contained in each cluster in descending order of frequency of occurrence. Next, for each word selected in step S205, the CPU 21 determines the display size of the word based on its frequency of occurrence (step S206). In step S206, the higher the frequency of occurrence of a word, the larger the display size of the word is set.
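Steps S205 and S206 amount to a top-n selection followed by a monotonic size mapping. The sketch below is illustrative only: the linear size formula, its constants, and the function names are assumptions, since the source states only that a higher frequency gives a larger display size.

```python
def select_display_words(cluster, freq, n):
    """Pick at most n words from a cluster in descending order of frequency (S205)."""
    ranked = sorted(cluster, key=lambda w: freq[w], reverse=True)
    return ranked[:n]


def word_display_size(frequency, min_size=10, step=2):
    """Map a frequency to a display size; higher frequency gives a larger size (S206)."""
    return min_size + step * frequency


freq = {"w%d" % k: k for k in range(1, 11)}  # ten words with frequencies 1 to 10
cluster = list(freq)
shown = select_display_words(cluster, freq, n=5)
title = shown[0]  # the group title is the most frequent word in the cluster
sizes = {w: word_display_size(freq[w]) for w in shown}
```

With n = 5 and a cluster of 10 words, only the five most frequent words are kept, matching the example given for the analysis result screen 41.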
Next, the CPU 21 generates screen data for displaying the result of the hierarchical cluster analysis (step S207). The screen data generated in step S207 contains m groups (each represented by a cloud shape), and the m groups have the sizes determined in step S204. Each group contains n or fewer words, and these words have the sizes determined in step S206. On the screen, the words are displayed inside the groups. After executing step S207, the CPU 21 ends the screen data generation processing.
Next, the CPU 21 causes the display unit 25 to display a screen based on the screen data generated in step S111 (step S112). Next, the CPU 21 receives an instruction from the user (step S113). Then, depending on the type of instruction received in step S113, the CPU 21 proceeds to one of steps S115 to S120 (step S114).
When the instruction received in step S113 is "set number of groups", the CPU 21 proceeds to step S115. In this case, the CPU 21 sets the number of groups m to the value specified by the user (step S115) and proceeds to step S111. Screen data is then generated according to the set number of groups m, and a new screen is displayed. In this way, an analysis result screen containing the specified number of groups is displayed.
When the instruction received in step S113 is "set maximum number of data items per group", the CPU 21 proceeds to step S116. In this case, the CPU 21 sets the maximum number of data items per group n to the value specified by the user (step S116) and proceeds to step S111. Screen data is then generated according to the set value of n, and a new screen is displayed. In this way, an analysis result screen is displayed in which the number of words contained in each group is limited to the specified value or fewer.
When the instruction received in step S113 is "set analysis period", the CPU 21 proceeds to step S117. In this case, the CPU 21 sets the analysis period to the period specified by the user (step S117) and proceeds to step S109. Hierarchical cluster analysis is then performed with reference to the set analysis period, screen data for displaying the new analysis result is generated, and a new screen is displayed. In this way, the screen displays the result of hierarchical cluster analysis performed on the words contained in the text data within the specified analysis period.
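Restricting the analysis to the specified period is a simple filter over time-stamped records. The sketch below assumes that each record is a (timestamp, text) pair and that both ends of the period are inclusive; the function name `select_period` is hypothetical.

```python
from datetime import datetime


def select_period(records, start, end):
    """Return the texts whose timestamp falls within the analysis period (inclusive)."""
    return [text for stamp, text in records if start <= stamp <= end]


records = [
    (datetime(2014, 1, 10), "report A"),
    (datetime(2014, 5, 20), "report B"),
    (datetime(2015, 2, 3), "report C"),
]
# Only records dated between March 1 and the end of September 30, 2014 are analyzed.
selected = select_period(records, datetime(2014, 3, 1), datetime(2014, 9, 30, 23, 59, 59))
```

Because the set of extracted words changes with the period, the flow returns to step S109 so that extraction and clustering are both repeated.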
Fig. 11A shows the analysis result screen before the analysis period is set. Fig. 11B shows the analysis result screen after the analysis period is set. The analysis result screen 61 before setting, shown in Fig. 11A, displays the result of hierarchical cluster analysis performed on the words contained in the portion of the input text data 5 from 00:00 on January 1, 2014 to 24:00 on December 31, 2015. The analysis result screen 62 after setting, shown in Fig. 11B, displays the result of hierarchical cluster analysis performed on the words contained in the portion of the input text data 5 from 00:00 on March 1, 2014 to 24:00 on September 30, 2014. The display content of the analysis result screen 61 differs from that of the analysis result screen 62. By comparing the analysis result screens before and after the analysis period is set, the user can easily recognize how the result of the hierarchical cluster analysis changes over time.
When the instruction received in step S113 is "word exclusion", the CPU 21 proceeds to step S118. In this case, the CPU 21 adds the specified word to the exclusion word list (step S118) and proceeds to step S109. Hierarchical cluster analysis is then performed with the specified word excluded, screen data for displaying the new analysis result is generated, and a new screen is displayed. In this way, the screen displays the result of hierarchical cluster analysis performed with the specified word excluded.
Fig. 12A shows the analysis result screen before word exclusion. Fig. 12B shows the analysis result screen after word exclusion. The user operates the mouse 29 to select the word to be excluded and then instructs word exclusion. On the analysis result screen 63 before word exclusion, shown in Fig. 12A, "shakai (society)" is selected, and "word exclusion" is chosen from the menu. The screen then displays the result of hierarchical cluster analysis performed with "shakai" excluded. On the analysis result screen 64 after word exclusion, shown in Fig. 12B, "shingaku (advancing to higher education)" is displayed in place of "shakai". Among the words contained in the same cluster as "shakai", "shingaku" has the highest frequency of occurrence after the five words displayed on the analysis result screen 63.
When the instruction received in step S113 is "synonym registration", the CPU 21 proceeds to step S119. In this case, the CPU 21 adds the specified words to the synonym list in use (step S119) and proceeds to step S109. Hierarchical cluster analysis is then performed taking the specified synonyms into account, screen data for displaying the new analysis result is generated, and a new screen is displayed. In this way, the screen displays the result of hierarchical cluster analysis performed with the specified words treated as synonyms.
Fig. 13A shows the analysis result screen before synonym registration. Fig. 13B shows the analysis result screen after synonym registration. The user operates the mouse 29 to select the multiple words to be registered as synonyms and then instructs synonym registration. On the analysis result screen 65 before synonym registration, shown in Fig. 13A, "daigakusei (university student)" and "gakusei (student)" are selected, and "synonym registration" is chosen from the menu. The screen then displays the result of hierarchical cluster analysis performed with "daigakusei" and "gakusei" treated as synonyms. On the analysis result screen 66 after synonym registration, shown in Fig. 13B, "daigakusei" is displayed at a larger size than on the analysis result screen 65, and "shingaku (advancing to higher education)" is displayed in place of "gakusei". "daigakusei" is displayed at a larger size than on the analysis result screen 65 because its size now reflects the sum of the frequencies of occurrence of "daigakusei" and "gakusei".
When the instruction received in step S113 is "compound word registration", the CPU 21 proceeds to step S120. In this case, the CPU 21 adds the specified words to the compound word list in use (step S120) and proceeds to step S109. Hierarchical cluster analysis is then performed taking the specified compound word into account, screen data for displaying the new analysis result is generated, and a new screen is displayed. In this way, the screen displays the result of hierarchical cluster analysis performed with the specified words treated as a compound word.
Fig. 14A shows the analysis result screen before compound word registration. Fig. 14B shows the analysis result screen after compound word registration. The user operates the mouse 29 to select the multiple words to be registered as a compound word and then instructs compound word registration. On the analysis result screen 67 before compound word registration, shown in Fig. 14A, "nintai (endurance)" and "tsuyoi (strong)" are selected, and "compound word registration" is chosen from the menu. The screen then displays the result of hierarchical cluster analysis performed with "nintai" and "tsuyoi" treated as a compound word. On the analysis result screen 68 after compound word registration, shown in Fig. 14B, "nintaizuyoi (having strong endurance)" is displayed in place of "nintai" and "tsuyoi", at a size no larger than that of "nintai" and "tsuyoi".
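The branching of steps S114 to S120 described above can be summarized as a dispatch table that records, for each instruction type, the step that applies the setting and the step from which processing resumes. The step labels come from the description; the instruction keys and function names are hypothetical.

```python
# instruction type -> (step that applies the instruction, step at which processing resumes)
DISPATCH = {
    "set_group_count": ("S115", "S111"),    # only the screen data is regenerated
    "set_max_words": ("S116", "S111"),      # only the screen data is regenerated
    "set_period": ("S117", "S109"),         # words are extracted and clustered again
    "exclude_word": ("S118", "S109"),
    "register_synonym": ("S119", "S109"),
    "register_compound": ("S120", "S109"),
}


def next_steps(instruction):
    """Return the (apply, resume) step labels for an instruction received in step S113."""
    return DISPATCH[instruction]


def needs_reanalysis(instruction):
    """True when the hierarchical cluster analysis itself must be repeated."""
    return DISPATCH[instruction][1] == "S109"
```

The table makes the design visible: changes to m and n affect only how an existing analysis result is displayed, whereas changes to the period or to the word lists change the input of the analysis.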
As described above, the text mining method of this embodiment includes: a text analysis step of performing hierarchical cluster analysis on words extracted from input text data; a screen generation step of generating screen data based on the analysis result of the text analysis step; and an analysis result display step of displaying a screen based on the screen data. In the screen generation step, m clusters are obtained from the analysis result based on the number of groups m and the maximum number of data items per group n, and screen data is generated for displaying on the screen groups each containing n or fewer of the words contained in the corresponding cluster. According to the text mining method of this embodiment, groups containing the words of each cluster can be displayed on the screen based on the result of hierarchical cluster analysis of the words contained in the text data. In addition, the number of words contained in each group is limited to n or fewer. Therefore, the user can intuitively understand the result of the hierarchical cluster analysis by looking at the screen.
The words contained in a group are selected, in descending order of frequency of occurrence, from the words contained in the cluster corresponding to the group. Inside each group, therefore, the words with the higher frequencies of occurrence among the words contained in the cluster are displayed, and the user can easily recognize the high-frequency words of each cluster. On the screen, each group has a size corresponding to the total frequency of occurrence of the words contained in the cluster corresponding to the group. Therefore, the user can easily recognize clusters whose total word frequency is large. The words contained in a group also have, on the screen, sizes corresponding to their frequencies of occurrence. Therefore, the user can easily recognize words with high frequencies of occurrence.
The text mining method also includes an instruction input step of inputting an instruction from the user, and either the text analysis step or the screen generation step is executed according to the instruction input in the instruction input step. Therefore, the display mode of the result of the hierarchical cluster analysis can be switched according to the instruction from the user. In particular, an instruction to set the number of groups m is received in the instruction input step, and screen data is generated in the screen generation step according to the number of groups m specified in the instruction input step. In this way, the number of regions displayed on the screen (the number of clusters) is switched according to the instruction from the user. Likewise, an instruction to set the maximum number of data items per group n is received in the instruction input step, and screen data is generated in the screen generation step according to the value of n specified in the instruction input step. In this way, the number of words displayed in each region is switched according to the instruction from the user.
An instruction specifying the analysis period is also received in the instruction input step, and in the text analysis step, hierarchical cluster analysis is performed on the words contained in the portion of the text data that falls within the analysis period specified in the instruction input step. The screen therefore displays the result of hierarchical cluster analysis performed on the words contained in the text data within the analysis period specified by the user, and the user can easily recognize how the result of the hierarchical cluster analysis changes over time. An instruction to set the analysis target is also received in the instruction input step, and in the text analysis step, words of the types corresponding to the analysis target set in the instruction input step are extracted from the text data 5 and subjected to hierarchical cluster analysis. In this way, the screen can display the result of hierarchical cluster analysis performed after switching the types of words to be analyzed according to the analysis target specified by the user.
A word exclusion instruction is also received in the instruction input step, and in the text analysis step, hierarchical cluster analysis is performed with the word specified in the instruction input step excluded. In this way, the result of hierarchical cluster analysis performed with the user-specified word excluded can be displayed. A synonym registration instruction is also received in the instruction input step, and in the text analysis step, hierarchical cluster analysis is performed with the multiple words specified in the instruction input step treated as the same word. In this way, the screen can display the result of hierarchical cluster analysis performed with the user-specified words treated as the same word. A compound word registration instruction is also received in the instruction input step, and in the text analysis step, hierarchical cluster analysis is performed with the multiple words specified in the instruction input step merged into one word. In this way, the screen can display the result of hierarchical cluster analysis performed with the user-specified words merged into one word.
In the screen generation step, screen data is generated for displaying an analysis result screen containing the groups and an analysis setting screen for setting the display mode of the analysis result screen. The analysis result screen and the analysis setting screen are therefore displayed together, and the user can use the analysis setting screen to easily switch the display mode of the result obtained by the hierarchical cluster analysis.
The text mining program 31 and the text mining device 10 of this embodiment have the same features as the text mining method of this embodiment and provide the same effects.
According to the text mining method, text mining program, and text mining device of this embodiment, groups each containing no more than the maximum number of data items of the words contained in a cluster can be displayed on the screen, based on the result of hierarchical cluster analysis of the words contained in the text data. Therefore, the user can intuitively understand the result of the hierarchical cluster analysis by looking at the screen.
This application claims priority based on Japanese Patent Application No. 2016-145065, entitled "Text mining method, text mining program and text mining device" and filed on July 25, 2016, the content of which is incorporated into this application by reference.
Explanation of reference numerals
5 Text data
10 Text mining device
11 Instruction input unit
12 Text analysis unit
13 Screen generation unit
14 Analysis result display unit
20 Computer
21 CPU
22 Main memory
23 Storage unit
24 Input unit
25 Display unit
30 Recording medium
31 Text mining program
40 Display screen
41, 61 to 68 Analysis result screen
42 Analysis setting screen
51 Data specification screen
52 Target specification screen
53 Synonym list selection screen
54 Compound word list selection screen
Claims (25)
1. A text mining method for displaying an analysis result of text data on a screen, characterized by comprising:
a text analysis step of performing hierarchical cluster analysis on words extracted from input text data,
a screen generation step of generating screen data based on the analysis result of the text analysis step, and
an analysis result display step of displaying a screen based on the screen data;
wherein, in the screen generation step, clusters equal in number to a number of groups are obtained from the analysis result based on the number of groups and a maximum number of data items per group, and screen data is generated for displaying, on the screen, groups each containing no more than the maximum number of data items of the words belonging to the corresponding cluster.
2. The text mining method according to claim 1, characterized in that
the words contained in each group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to the group.
3. The text mining method according to claim 2, characterized in that
on the screen, each group has a size corresponding to the total frequency of occurrence of the words belonging to the cluster corresponding to the group.
4. The text mining method according to claim 3, characterized in that
on the screen, each word contained in a group has a size corresponding to the frequency of occurrence of the word.
5. The text mining method according to claim 1, characterized by
further comprising an instruction input step of inputting an instruction from a user,
wherein either the text analysis step or the screen generation step is executed according to the instruction input in the instruction input step.
6. The text mining method according to claim 5, characterized in that
in the instruction input step, an instruction to set the number of groups is received, and
in the screen generation step, the screen data is generated according to the number of groups set in the instruction input step.
7. The text mining method according to claim 5, characterized in that
in the instruction input step, an instruction to set the maximum number of data items is received, and
in the screen generation step, the screen data is generated according to the maximum number of data items set in the instruction input step.
8. The text mining method according to claim 5, characterized in that
in the instruction input step, an instruction to set an analysis period is received, and
in the text analysis step, the hierarchical cluster analysis is performed on the words contained in the portion of the text data that falls within the analysis period set in the instruction input step.
9. The text mining method according to claim 5, characterized in that
in the instruction input step, an instruction to set an analysis target is received, and
in the text analysis step, words of the types corresponding to the analysis target set in the instruction input step are extracted from the text data, and the hierarchical cluster analysis is performed.
10. The text mining method according to claim 5, characterized in that
in the instruction input step, a word exclusion instruction is received, and
in the text analysis step, the words designated in the instruction input step are excluded and the hierarchical cluster analysis is performed.
11. The text mining method according to claim 5, characterized in that
in the instruction input step, a synonym registration instruction is received, and
in the text analysis step, the multiple words designated in the instruction input step are treated as the same word and the hierarchical cluster analysis is performed.
12. The text mining method according to claim 5, characterized in that
in the instruction input step, a compound word registration instruction is received, and
in the text analysis step, the multiple words designated in the instruction input step are merged into one word and the hierarchical cluster analysis is performed.
13. The text mining method according to claim 1, characterized in that
in the screen generation step, screen data is generated for displaying an analysis result screen that includes the groups and an analysis setting screen for setting the display mode of the analysis result screen.
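The user-driven settings of claims 8 and 10 to 12 can be read as preprocessing applied before the cluster analysis. The following is a minimal sketch under stated assumptions, not the patented implementation: the function name `preprocess`, its parameter names, and the input shape (a date paired with a list of already extracted words) are all hypothetical.

```python
from datetime import date

def preprocess(records, period=None, excluded=(), synonyms=None, compounds=()):
    """Filter and normalize words before clustering (illustrative only).

    records:   iterable of (date, [word, ...]) pairs (assumed input shape)
    period:    optional (start, end) dates, inclusive (claim 8)
    excluded:  words to drop (claim 10)
    synonyms:  dict mapping a word to its representative word (claim 11)
    compounds: tuples of adjacent words to merge into one word (claim 12)
    """
    synonyms = synonyms or {}
    excluded = set(excluded)
    result = []
    for day, words in records:
        if period and not (period[0] <= day <= period[1]):
            continue  # record lies outside the analysis target period
        merged = []
        i = 0
        while i < len(words):
            for parts in compounds:
                n = len(parts)
                if n and tuple(words[i:i + n]) == tuple(parts):
                    # Concatenation is an assumed way of forming the compound.
                    merged.append("".join(parts))
                    i += n
                    break
            else:
                merged.append(words[i])
                i += 1
        unified = [synonyms.get(w, w) for w in merged]  # treat synonyms as one word
        result.append([w for w in unified if w not in excluded])
    return result
```

In this sketch compound merging runs first, then synonym unification, then exclusion; the claims do not fix an order, so this ordering is a design choice of the example.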
14. A text mining program for displaying an analysis result of text data on a screen, characterized in that the text mining program causes a CPU of a computer, using a memory, to execute the following steps:
a text analysis step of performing hierarchical cluster analysis on words extracted from input text data,
a screen generation step of generating screen data according to the analysis result of the text analysis step, and
an analysis result display step of displaying a screen according to the screen data;
wherein, in the screen generation step, clusters equal in number to the number of groups are obtained from the analysis result according to the number of groups and the maximum number of data items in a group, and screen data is generated for displaying, on the screen, groups each containing no more than the maximum number of data items of the words belonging to the corresponding cluster.
15. The text mining program according to claim 14, characterized in that
the words included in each group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to that group.
16. The text mining program according to claim 15, characterized in that
on the screen, each group has a size corresponding to the total of the frequencies of occurrence of the words belonging to the cluster corresponding to that group.
17. The text mining program according to claim 16, characterized in that
on the screen, each word included in a group has a size corresponding to the frequency of occurrence of that word.
18. The text mining program according to claim 14, characterized in that
the text mining program further causes the computer to execute an instruction input step of inputting an instruction from a user, and
either of the text analysis step and the screen generation step is executed according to the instruction input in the instruction input step.
19. The text mining program according to claim 14, characterized in that
in the screen generation step, screen data is generated for displaying an analysis result screen that includes the groups and an analysis setting screen for setting the display mode of the analysis result screen.
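The core of claims 14 and 15 (cluster the words hierarchically, obtain as many clusters as the set number of groups, then keep at most the maximum number of words per group in descending order of frequency) can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the claims do not name a distance measure or linkage rule, so the Jaccard distance over document sets and average linkage used here, along with the name `cluster_words` and its parameters, are hypothetical choices.

```python
from collections import Counter
from itertools import combinations

def cluster_words(docs, group_count, max_words):
    """Group words by agglomerative clustering and keep the top words per group.

    docs:        list of word lists, one per document (assumed input shape)
    group_count: number of groups to display
    max_words:   maximum number of words shown in each group
    """
    freq = Counter(w for doc in docs for w in doc)
    # Represent each word by the set of documents it occurs in.
    occurs = {w: {i for i, doc in enumerate(docs) if w in doc} for w in freq}

    def distance(a, b):
        # Jaccard distance between document sets (an assumed measure).
        return 1.0 - len(occurs[a] & occurs[b]) / len(occurs[a] | occurs[b])

    def linkage(c1, c2):
        # Average linkage between two clusters (also an assumption).
        return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[w] for w in sorted(freq)]
    # Merge the closest pair until only group_count clusters remain,
    # i.e. stop building the hierarchy at the level with that many clusters.
    while len(clusters) > max(group_count, 1):
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] += clusters.pop(j)

    groups = []
    for members in clusters:
        # Descending frequency; ties are broken alphabetically for determinism.
        ranked = sorted(members, key=lambda w: (-freq[w], w))
        groups.append(ranked[:max_words])
    return groups
```

The naive pairwise search is far slower than a library routine on a real vocabulary; it is written out only to make each merge step visible.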
20. A text mining device that displays an analysis result of text data on a screen, comprising:
a text analysis unit that performs hierarchical cluster analysis on words extracted from input text data,
a screen generation unit that generates screen data according to the analysis result of the text analysis unit, and
an analysis result display unit that displays a screen according to the screen data;
wherein the screen generation unit obtains, from the analysis result, clusters equal in number to the number of groups according to the number of groups and the maximum number of data items in a group, and generates screen data for displaying, on the screen, groups each containing no more than the maximum number of data items of the words belonging to the corresponding cluster.
21. The text mining device according to claim 20, characterized in that
the words included in each group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to that group.
22. The text mining device according to claim 21, characterized in that
on the screen, each group has a size corresponding to the total of the frequencies of occurrence of the words belonging to the cluster corresponding to that group.
23. The text mining device according to claim 22, characterized in that
on the screen, each word included in a group has a size corresponding to the frequency of occurrence of that word.
24. The text mining device according to claim 20, characterized in that
the text mining device further has an instruction input unit for inputting an instruction from a user, and
either of the text analysis unit and the screen generation unit operates according to the instruction input to the instruction input unit.
25. The text mining device according to claim 20, characterized in that
the screen generation unit generates screen data for displaying an analysis result screen that includes the groups and an analysis setting screen for setting the display mode of the analysis result screen.
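The sizing rules of claims 22 and 23 (group size follows the total frequency of the cluster's words, word size follows each word's own frequency) can be illustrated with a small screen-data builder. The claims only require that sizes "correspond" to the frequencies; the linear scaling, the `base_size` factor, the dictionary layout, and the name `build_screen_data` are assumptions made for this sketch.

```python
def build_screen_data(clusters, freq, max_words, base_size=10.0):
    """Build display data: one entry per group, with sized words.

    clusters:  list of word lists, one per cluster (assumed input shape)
    freq:      dict mapping each word to its frequency of occurrence
    max_words: maximum number of words shown in each group
    base_size: hypothetical scale factor for display sizes
    """
    total = sum(freq[w] for members in clusters for w in members)
    max_freq = max(freq.values()) if freq else 1
    screen = []
    for members in clusters:
        cluster_total = sum(freq[w] for w in members)
        shown = sorted(members, key=lambda w: (-freq[w], w))[:max_words]
        screen.append({
            # Group size uses the total over the whole cluster,
            # including words that are not displayed.
            "size": base_size * cluster_total / total if total else 0.0,
            # Word size scales with each word's own frequency.
            "words": [{"text": w, "size": base_size * freq[w] / max_freq}
                      for w in shown],
        })
    return screen
```

Because the group size is computed from every word in the cluster rather than only the displayed ones, a group can appear large even when the maximum number of data items limits it to a few visible words.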
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016145065A JP6794162B2 (en) | 2016-07-25 | 2016-07-25 | Text mining methods, text mining programs, and text mining equipment |
JP2016-145065 | 2016-07-25 | ||
PCT/JP2017/020922 WO2018020842A1 (en) | 2016-07-25 | 2017-06-06 | Text mining method, text mining program, and text mining apparatus |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109478191A (en) | 2019-03-15 |
CN109478191B (en) | 2022-04-08 |
Family
ID=61015910
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201780043375.8A Active CN109478191B (en) | 2016-07-25 | 2017-06-06 | Text mining method, recording medium, and text mining device |
Country Status (5)
Country | Link |
---|---|
JP (1) | JP6794162B2 (en) |
KR (1) | KR102180487B1 (en) |
CN (1) | CN109478191B (en) |
TW (1) | TWI686716B (en) |
WO (1) | WO2018020842A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7078429B2 (en) * | 2018-03-20 | 2022-05-31 | 株式会社Screenホールディングス | Text mining methods, text mining programs, and text mining equipment |
EP3882786A4 (en) | 2019-05-17 | 2022-03-23 | Aixs, Inc. | Cluster analysis method, cluster analysis system, and cluster analysis program |
JP7456486B2 (en) * | 2020-02-25 | 2024-03-27 | 日本電気株式会社 | Item classification support system, method and program |
JPWO2022130547A1 (en) * | 2020-12-16 | 2022-06-23 |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0991314A (en) * | 1995-07-14 | 1997-04-04 | Fuji Xerox Co Ltd | Information search device |
JP2000227917A (en) * | 1999-02-05 | 2000-08-15 | Agency Of Ind Science & Technol | Thesaurus browsing system and method therefor and recording medium recording its processing program |
JP2003044491A (en) * | 2001-07-30 | 2003-02-14 | Toshiba Corp | Knowledge analytic system. method for setting analytic condition, saving analytic condition and re-analyzing processing in the system |
JP2005107688A (en) * | 2003-09-29 | 2005-04-21 | Nippon Telegr & Teleph Corp <Ntt> | Information display method and system and information display program |
CN1934570A (en) * | 2004-03-18 | 2007-03-21 | 日本电气株式会社 | Text mining device, method thereof, and program |
JP2010039671A (en) * | 2008-08-04 | 2010-02-18 | Nippon Telegr & Teleph Corp <Ntt> | Text mining apparatus, method, program, and recording medium |
CN104142918A (en) * | 2014-07-31 | 2014-11-12 | 天津大学 | Short text clustering and hotspot theme extraction method based on TF-IDF characteristics |
CN104504024A (en) * | 2014-12-11 | 2015-04-08 | 中国科学院计算技术研究所 | Method and system for mining keywords based on microblog content |
CN105550365A (en) * | 2016-01-15 | 2016-05-04 | 中国科学院自动化研究所 | Visualization analysis system based on text topic model |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6611825B1 (en) * | 1999-06-09 | 2003-08-26 | The Boeing Company | Method and system for text mining using multidimensional subspaces |
KR20090069874A (en) * | 2007-12-26 | 2009-07-01 | 한국과학기술정보연구원 | Method of selecting keyword and similarity coefficient for knowledge map analysis, and system thereof and media that can record computer program sources for method therof |
JP5439261B2 (en) | 2010-04-01 | 2014-03-12 | 日本電信電話株式会社 | Clustering apparatus, clustering method, and clustering program |
JP5545876B2 (en) | 2011-01-17 | 2014-07-09 | 日本電信電話株式会社 | Query providing apparatus, query providing method, and query providing program |
US9477704B1 (en) * | 2012-12-31 | 2016-10-25 | Teradata Us, Inc. | Sentiment expression analysis based on keyword hierarchy |
TW201516713A (en) * | 2013-10-16 | 2015-05-01 | Chunghwa Telecom Co Ltd | File classification method based on group characteristic values |
2016
- 2016-07-25 JP JP2016145065A patent/JP6794162B2/en active Active
2017
- 2017-06-06 WO PCT/JP2017/020922 patent/WO2018020842A1/en active Application Filing
- 2017-06-06 CN CN201780043375.8A patent/CN109478191B/en active Active
- 2017-06-06 KR KR1020197000933A patent/KR102180487B1/en active IP Right Grant
- 2017-06-30 TW TW106122011A patent/TWI686716B/en active
Non-Patent Citations (1)
Title |
---|
孙露乔: "文本挖掘的研究及其在主题搜索引擎中的应用", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Also Published As
Publication number | Publication date |
---|---|
CN109478191B (en) | 2022-04-08 |
JP2018018118A (en) | 2018-02-01 |
TW201807597A (en) | 2018-03-01 |
KR102180487B1 (en) | 2020-11-18 |
JP6794162B2 (en) | 2020-12-02 |
TWI686716B (en) | 2020-03-01 |
WO2018020842A1 (en) | 2018-02-01 |
KR20190018480A (en) | 2019-02-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||