CN109478191A - Text mining method, text mining program and text mining device - Google Patents


Info

Publication number
CN109478191A
Authority
CN
China
Prior art keywords
mentioned
words
picture
text
analysis
Prior art date
Legal status
Granted
Application number
CN201780043375.8A
Other languages
Chinese (zh)
Other versions
CN109478191B (en)
Inventor
秋田正史
中村康则
周景龙
Current Assignee
Screen Holdings Co Ltd
Original Assignee
Screen Holdings Co Ltd
Priority date
Filing date
Publication date
Application filed by Screen Holdings Co Ltd
Publication of CN109478191A
Application granted
Publication of CN109478191B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/00 › G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/30 › G06F16/35 Clustering; Classification › G06F16/358 Browsing; Visualisation therefor
    • G06F16/30 › G06F16/31 Indexing; Data structures therefor; Storage structures › G06F16/313 Selection or weighting of terms for indexing
    • G06F16/30 › G06F16/34 Browsing; Visualisation therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

In a text analysis step (S109 to S110), hierarchical cluster analysis is performed on the words extracted from the input text data. In a screen generation step (S111), m clusters are obtained from the analysis result of the text analysis step according to the number of groups (m) and the maximum number of data items per group (n), and screen data is generated for displaying on the screen groups each containing no more than n of the words belonging to a cluster. In an analysis result display step (S112), the screen is displayed according to the generated screen data. In this way, the result of hierarchical cluster analysis is displayed on the screen in a manner that the user can understand intuitively.

Description

Text mining method, text mining program and text mining device
Technical field
The present invention relates to text mining, and more particularly to a text mining method, a text mining program, and a text mining device that display the analysis result of text data on a screen.
Background art
In recent years, text mining, which analyzes large volumes of free-form text data and extracts useful information from the analysis results, has attracted attention. In text mining, for example, words are extracted from the text data to be analyzed, and information is obtained by analyzing how often the words occur and how their occurrence trends.
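A minimal Python sketch of the word extraction and frequency counting described above. The regular-expression tokenizer is an assumption that only works for space-delimited text such as English; the Japanese text used in the embodiment would need a morphological analyzer instead. The function names are hypothetical.

```python
import re
from collections import Counter


def extract_words(text: str) -> list[str]:
    """Hypothetical tokenizer: lower-cases the text and keeps runs of ASCII letters.
    Japanese input would need a morphological analyzer instead."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(texts: list[str]) -> Counter:
    """Count how often each extracted word occurs across all input texts."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(extract_words(text))
    return counts


if __name__ == "__main__":
    freq = word_frequencies(["Paper jam again", "Paper tray empty", "Screen blank"])
    print(freq.most_common(3))
```

The resulting frequencies are what later steps use to rank words and size them on screen.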
The following discusses a text mining device that performs hierarchical cluster analysis on words extracted from text data and displays the analysis result on a screen. In hierarchical cluster analysis, clusters containing highly similar words are created hierarchically according to the similarity between words. The result of hierarchical cluster analysis is generally presented to the user (analyst) as a tree diagram (dendrogram) such as the one shown in Fig. 15.
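The Python sketch below illustrates agglomerative (hierarchical) cluster analysis over words. The similarity measure is an assumption, since the document does not specify one: two words count as similar when they occur in the same documents (Jaccard index of their document sets), and clusters are compared by average linkage. The returned merge history is the information a dendrogram like the one in Fig. 15 draws.

```python
from itertools import combinations


def jaccard(a: set[int], b: set[int]) -> float:
    """Assumed word similarity: overlap of the sets of documents containing each word."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_link(a: frozenset, b: frozenset, occurs: dict[str, set[int]]) -> float:
    """Average pairwise similarity between the words of two clusters."""
    sims = [jaccard(occurs[x], occurs[y]) for x in a for y in b]
    return sum(sims) / len(sims)


def hierarchical_clustering(docs: list[list[str]]) -> list[tuple[frozenset, frozenset, float]]:
    """Repeatedly merge the two most similar clusters until one remains.
    Returns the merge history (earliest first), i.e. the content of a dendrogram."""
    occurs: dict[str, set[int]] = {}
    for i, doc in enumerate(docs):
        for word in doc:
            occurs.setdefault(word, set()).add(i)
    clusters = [frozenset([w]) for w in sorted(occurs)]
    merges = []
    while len(clusters) > 1:
        # max() keeps the first pair on ties, so the result is deterministic.
        a, b = max(combinations(clusters, 2), key=lambda p: average_link(p[0], p[1], occurs))
        merges.append((a, b, average_link(a, b, occurs)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


if __name__ == "__main__":
    docs = [["printer", "paper", "jam"], ["printer", "paper"], ["screen", "blank"]]
    for a, b, sim in hierarchical_clustering(docs):
        print(sorted(a), sorted(b), round(sim, 2))
```

This brute-force loop is quadratic per merge and only meant to show the idea; a library routine would normally be used on real vocabularies.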
In relation to the present invention, Patent Document 1 describes a clustering device having a hierarchical clustering unit that constructs a dendrogram, searches the dendrogram, generates an index that can be traced from lower layers to upper layers, and stores the index in a storage unit. Patent Document 2 describes a query device that includes a distance matrix calculation unit, which calculates the distances between keywords and stores in a storage unit distance matrix data from which the distance between any two keywords can be looked up, and a clustering unit, which hierarchically clusters the keywords using the distance matrix and stores in the storage unit a bottom-up index of the constructed dendrogram that can be searched from lower layers to upper layers.
Prior art documents
Patent documents
Patent Document 1: Japanese Patent Laid-Open No. 2011-216021
Patent Document 2: Japanese Patent Laid-Open No. 2012-150539
Summary of the invention
Problem to be solved by the invention
Conventional text mining devices display the result of hierarchical cluster analysis on the screen as a dendrogram. Such devices, however, have the problem that the user cannot intuitively understand the analysis result. For example, when the user sets the number of clusters to 4 for the analysis result shown in Fig. 15, a cut line is set on the dendrogram as shown in Fig. 16. Merely by looking at the dendrogram, however, the user cannot intuitively identify the words contained in each cluster. Moreover, when there are many words and the user changes the number of clusters, the user cannot intuitively grasp how the words contained in each cluster change.
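To make the problem concrete, here is a hedged Python sketch of what setting a cut line on a dendrogram amounts to: replaying all but the last m - 1 merges leaves m clusters. The merge history below is hypothetical example data, not taken from Fig. 15, and the function name is illustrative.

```python
# Hypothetical merge history (earliest merge first), as produced by agglomerative clustering.
WORDS = ["blank", "flicker", "jam", "paper", "printer", "screen"]
MERGES = [
    (frozenset({"blank"}), frozenset({"screen"})),
    (frozenset({"paper"}), frozenset({"printer"})),
    (frozenset({"blank", "screen"}), frozenset({"flicker"})),
    (frozenset({"paper", "printer"}), frozenset({"jam"})),
    (frozenset({"blank", "flicker", "screen"}), frozenset({"jam", "paper", "printer"})),
]


def cut_dendrogram(words, merges, m):
    """Return the m clusters left after applying the first len(words) - m merges."""
    if not 1 <= m <= len(words):
        raise ValueError("m must be between 1 and the number of words")
    clusters = {frozenset([w]) for w in words}
    for a, b in merges[: len(words) - m]:
        clusters -= {a, b}
        clusters.add(a | b)
    return sorted(clusters, key=sorted)  # stable, alphabetical order for display


if __name__ == "__main__":
    for m in (2, 4):
        print(m, [sorted(c) for c in cut_dendrogram(WORDS, MERGES, m)])
```

Comparing m = 2 with m = 4 shows every cluster's membership shifting with the cluster count, which is exactly what is hard to read off the dendrogram itself.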
In addition, because the dendrogram does not show how often each word occurs, the user cannot tell which words are more important. Furthermore, when the text data to be analyzed is time-series data carrying information such as dates or times, the user sometimes wants to know how the analysis result changes over time. Conventional text mining devices cannot meet these needs.
It is therefore an object of the present invention to provide a text mining method, a text mining program, and a text mining device that display the result of hierarchical cluster analysis on a screen in a manner that the user can understand intuitively.
Means for solving the problem
A first embodiment of the present invention is a text mining method for displaying the analysis result of text data on a screen, characterized by comprising:
a text analysis step of performing hierarchical cluster analysis on words (which may be individual characters and/or words) extracted from input text data;
a screen generation step of generating screen data on the basis of the analysis result of the text analysis step; and
an analysis result display step of displaying a screen on the basis of the screen data;
wherein, in the screen generation step, as many clusters as the number of groups are obtained from the analysis result according to the number of groups and the maximum number of data items per group, and screen data is generated for displaying on the screen groups each containing no more than the maximum number of data items of the words belonging to a cluster.
A second embodiment of the present invention is characterized in that, in the first embodiment of the present invention,
the words contained in each group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to that group.
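A minimal sketch of how the first and second embodiments turn clusters into display groups, assuming the clusters have already been obtained and word frequencies are available as a dictionary. The function name and the alphabetical tie-break are hypothetical.

```python
def build_groups(clusters, freq, n):
    """Turn each cluster into a display group holding at most n words,
    taken in descending order of frequency (ties broken alphabetically, an assumption)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [sorted(cluster, key=lambda w: (-freq[w], w))[:n] for cluster in clusters]


if __name__ == "__main__":
    clusters = [{"paper", "printer", "jam"}, {"screen"}]
    freq = {"paper": 7, "printer": 4, "jam": 9, "screen": 2}
    print(build_groups(clusters, freq, n=2))
```

Only the words shown are capped at n; the cluster itself keeps all of its words.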
A third embodiment of the present invention is characterized in that, in the second embodiment of the present invention,
on the screen, each group has a size corresponding to the total frequency of occurrence of the words belonging to the cluster corresponding to that group.
A fourth embodiment of the present invention is characterized in that, in the third embodiment of the present invention,
on the screen, each word contained in a group has a size corresponding to the frequency of occurrence of that word.
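The sizing rules of the third and fourth embodiments can be sketched as follows. The text only states that size corresponds to frequency; the linear mapping and the pixel and point ranges are assumptions chosen for illustration.

```python
def scale(value, lo, hi, out_min, out_max):
    """Linearly map value from [lo, hi] onto [out_min, out_max] (assumed mapping)."""
    if hi == lo:
        return out_max  # all inputs equal: draw everything at full size
    return out_min + (value - lo) * (out_max - out_min) / (hi - lo)


def group_sizes(clusters, freq, px_range=(80.0, 240.0)):
    """Group diameter grows with the total frequency of ALL words in its cluster."""
    totals = [sum(freq[w] for w in cluster) for cluster in clusters]
    return [scale(t, min(totals), max(totals), *px_range) for t in totals]


def word_sizes(words, freq, pt_range=(10.0, 32.0)):
    """Font size of each displayed word grows with that word's own frequency."""
    values = [freq[w] for w in words]
    return {w: scale(freq[w], min(values), max(values), *pt_range) for w in words}


if __name__ == "__main__":
    freq = {"paper": 7, "jam": 9, "screen": 2}
    print(group_sizes([{"paper", "jam"}, {"screen"}], freq))
    print(word_sizes(["paper", "jam", "screen"], freq))
```

Summing over the whole cluster, not just the displayed words, follows the third embodiment's wording.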
A fifth embodiment of the present invention is characterized in that, in the first embodiment of the present invention,
the method further comprises an instruction input step of inputting an instruction from the user,
and either the text analysis step or the screen generation step is executed according to the instruction input in the instruction input step.
A sixth embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, an instruction to set the number of groups is received,
and in the screen generation step, the screen data is generated according to the number of groups set in the instruction input step.
A seventh embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, an instruction to set the maximum number of data items is received,
and in the screen generation step, the screen data is generated according to the maximum number of data items set in the instruction input step.
An eighth embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, an instruction to set an analysis target period is received,
and in the text analysis step, the hierarchical cluster analysis is performed on the words contained in the part of the text data that falls within the analysis target period set in the instruction input step.
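For time-series text data, restricting the analysis to the analysis target period can be sketched as a simple filter. The sketch assumes each record is a (timestamp, text) pair and that the period bounds are inclusive; neither detail is fixed by the document.

```python
from datetime import datetime


def texts_in_period(records, start, end):
    """Keep only the texts whose timestamp lies in [start, end] (inclusive, an assumption)."""
    return [text for timestamp, text in records if start <= timestamp <= end]


if __name__ == "__main__":
    records = [
        (datetime(2017, 1, 5), "paper jam in tray 2"),
        (datetime(2017, 2, 9), "screen stays blank"),
        (datetime(2017, 3, 1), "paper jam again"),
    ]
    print(texts_in_period(records, datetime(2017, 2, 1), datetime(2017, 3, 31)))
```

Only the surviving texts are passed on to word extraction and clustering, so the clusters reflect that period alone.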
A ninth embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, an instruction to set an analysis target is received,
and in the text analysis step, words of the type corresponding to the analysis target set in the instruction input step are extracted from the text data and subjected to the hierarchical cluster analysis.
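The document does not say what the word types are; one natural reading is part-of-speech classes. The sketch below assumes tokens arrive already tagged (for example by a morphological analyzer) and that each analysis target maps to a set of tags. The target names and tag labels are hypothetical.

```python
# Hypothetical mapping from an analysis target to the word types it keeps.
TARGET_TYPES = {
    "nouns": {"NOUN"},
    "nouns_and_verbs": {"NOUN", "VERB"},
    "all": {"NOUN", "VERB", "ADJ", "ADV"},
}


def words_for_target(tagged_tokens, target):
    """Keep the words whose type matches the selected analysis target."""
    allowed = TARGET_TYPES[target]
    return [word for word, tag in tagged_tokens if tag in allowed]


if __name__ == "__main__":
    tokens = [("paper", "NOUN"), ("jams", "VERB"), ("often", "ADV")]
    print(words_for_target(tokens, "nouns"))
```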
A tenth embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, a word exclusion instruction is received,
and in the text analysis step, the hierarchical cluster analysis is performed excluding the words designated in the instruction input step.
An eleventh embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, a near-synonym registration instruction is received,
and in the text analysis step, the plurality of words designated in the instruction input step are treated as the same word when the hierarchical cluster analysis is performed.
A twelfth embodiment of the present invention is characterized in that, in the fifth embodiment of the present invention,
in the instruction input step, a compound word registration instruction is received,
and in the text analysis step, the plurality of words designated in the instruction input step are merged into a single word before the hierarchical cluster analysis is performed.
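The word exclusion, near-synonym, and compound-word instructions of the tenth to twelfth embodiments can be sketched as one preprocessing pass applied before clustering. The order of the three operations, the choice of a representative word for each synonym set, and the longest-match rule for compounds are assumptions; the document only states the effect of each instruction.

```python
def preprocess(tokens, excluded=frozenset(), synonyms=None, compounds=(), joiner=""):
    """Apply compound-word merging, near-synonym unification and word exclusion.

    compounds: token sequences to merge into one word (longest match first).
    synonyms:  dict mapping each near synonym to its representative word.
    joiner:    "" suits Japanese; use " " for space-delimited languages.
    """
    synonyms = synonyms or {}
    patterns = sorted((tuple(c) for c in compounds if c), key=len, reverse=True)
    merged, i = [], 0
    while i < len(tokens):
        for pattern in patterns:
            if tuple(tokens[i : i + len(pattern)]) == pattern:
                merged.append(joiner.join(pattern))
                i += len(pattern)
                break
        else:  # no compound starts here
            merged.append(tokens[i])
            i += 1
    unified = [synonyms.get(t, t) for t in merged]  # near synonyms -> one word
    return [t for t in unified if t not in excluded]  # drop excluded words


if __name__ == "__main__":
    print(preprocess(["paper", "jam", "in", "printer", "sheet"],
                     excluded={"in"}, synonyms={"sheet": "paper"},
                     compounds=[("paper", "jam")], joiner=" "))
```

Because synonyms are unified after compound merging in this sketch, a registered compound is matched on the words as originally written.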
A thirteenth embodiment of the present invention is characterized in that, in the first embodiment of the present invention,
in the screen generation step, screen data is generated for displaying an analysis result screen containing the groups and an analysis setting screen for setting the display mode of the analysis result screen.
A fourteenth embodiment of the present invention is a text mining program for displaying the analysis result of text data on a screen, characterized in that the text mining program causes a CPU of a computer to execute the following steps using a memory:
a text analysis step of performing hierarchical cluster analysis on words extracted from input text data;
a screen generation step of generating screen data on the basis of the analysis result of the text analysis step; and
an analysis result display step of displaying a screen on the basis of the screen data;
wherein, in the screen generation step, as many clusters as the number of groups are obtained from the analysis result according to the number of groups and the maximum number of data items per group, and screen data is generated for displaying on the screen groups each containing no more than the maximum number of data items of the words belonging to a cluster.
A fifteenth embodiment of the present invention is characterized in that, in the fourteenth embodiment of the present invention,
the words contained in each group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to that group.
A sixteenth embodiment of the present invention is characterized in that, in the fifteenth embodiment of the present invention,
on the screen, each group has a size corresponding to the total frequency of occurrence of the words belonging to the cluster corresponding to that group.
A seventeenth embodiment of the present invention is characterized in that, in the sixteenth embodiment of the present invention,
on the screen, each word contained in a group has a size corresponding to the frequency of occurrence of that word.
An eighteenth embodiment of the present invention is characterized in that, in the fourteenth embodiment of the present invention,
the text mining program further causes the computer to execute an instruction input step of inputting an instruction from the user,
and either the text analysis step or the screen generation step is executed according to the instruction input in the instruction input step.
A nineteenth embodiment of the present invention is characterized in that, in the fourteenth embodiment of the present invention,
in the screen generation step, screen data is generated for displaying an analysis result screen containing the groups and an analysis setting screen for setting the display mode of the analysis result screen.
A twentieth embodiment of the present invention is a text mining device that displays the analysis result of text data on a screen, comprising:
a text analysis unit that performs hierarchical cluster analysis on words extracted from input text data;
a screen generation unit that generates screen data on the basis of the analysis result of the text analysis unit; and
an analysis result display unit that displays a screen on the basis of the screen data;
wherein the screen generation unit obtains from the analysis result as many clusters as the number of groups, according to the number of groups and the maximum number of data items per group, and generates screen data for displaying on the screen groups each containing no more than the maximum number of data items of the words belonging to a cluster.
A twenty-first embodiment of the present invention is characterized in that, in the twentieth embodiment of the present invention,
the words contained in each group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to that group.
A twenty-second embodiment of the present invention is characterized in that, in the twenty-first embodiment of the present invention,
on the screen, each group has a size corresponding to the total frequency of occurrence of the words belonging to the cluster corresponding to that group.
A twenty-third embodiment of the present invention is characterized in that, in the twenty-second embodiment of the present invention,
on the screen, each word contained in a group has a size corresponding to the frequency of occurrence of that word.
A twenty-fourth embodiment of the present invention is characterized in that, in the twentieth embodiment of the present invention,
the text mining device further comprises an instruction input unit for inputting an instruction from the user,
and either the text analysis unit or the screen generation unit operates according to the instruction input to the instruction input unit.
A twenty-fifth embodiment of the present invention is characterized in that, in the twentieth embodiment of the present invention,
the screen generation unit generates screen data for displaying an analysis result screen containing the groups and an analysis setting screen for setting the display mode of the analysis result screen.
Effects of the invention
According to the 1st, 14th, or 20th embodiment of the present invention, groups containing the words belonging to each cluster are displayed on the screen on the basis of the result of hierarchical cluster analysis performed on the words contained in the text data. In addition, the number of words contained in each group is limited to the maximum number of data items or fewer. The user can therefore understand the result of hierarchical cluster analysis intuitively by looking at the screen.
According to the 2nd, 15th, or 21st embodiment of the present invention, the words with the highest frequency of occurrence among those contained in a cluster are displayed inside the group. The user can therefore easily identify the frequently occurring words in each cluster.
According to the 3rd, 16th, or 22nd embodiment of the present invention, each group on the screen has a size corresponding to the total frequency of occurrence of the words contained in its cluster. The user can therefore easily identify the clusters with a large total word frequency.
According to the 4th, 17th, or 23rd embodiment of the present invention, each word on the screen has a size corresponding to its frequency of occurrence. The user can therefore easily identify frequently occurring words.
According to the 5th, 18th, or 24th embodiment of the present invention, the display mode of the result of hierarchical cluster analysis can be switched according to instructions from the user.
According to the 6th embodiment of the present invention, the number of groups displayed on the screen (the number of clusters) can be switched according to an instruction from the user.
According to the 7th embodiment of the present invention, the upper limit on the number of words contained in each group can be switched according to an instruction from the user.
According to the 8th embodiment of the present invention, the screen displays the result of hierarchical cluster analysis performed on the words contained in the text data within the analysis target period designated by the user. The user can therefore easily recognize how the result of hierarchical cluster analysis changes over time.
According to the 9th embodiment of the present invention, the type of words to be analyzed can be switched according to the analysis target designated by the user, and the result of the resulting hierarchical cluster analysis is displayed on the screen.
According to the 10th embodiment of the present invention, the screen can display the result of hierarchical cluster analysis performed after excluding the words designated by the user.
According to the 11th embodiment of the present invention, the screen can display the result of hierarchical cluster analysis performed with a plurality of words designated by the user treated as the same word.
According to the 12th embodiment of the present invention, the screen can display the result of hierarchical cluster analysis performed after merging a plurality of words designated by the user into a single word.
According to the 13th, 19th, or 25th embodiment of the present invention, the analysis result screen and the analysis setting screen are displayed. The user can therefore easily switch the display mode of the result of hierarchical cluster analysis by using the analysis setting screen.
Brief description of the drawings
Fig. 1 is a block diagram showing the configuration of a text mining device according to an embodiment of the present invention.
Fig. 2 is a block diagram showing the configuration of a computer that functions as the text mining device shown in Fig. 1.
Fig. 3 is a diagram showing the display screen of the text mining device shown in Fig. 1.
Fig. 4 is a flowchart showing the operation of the text mining device shown in Fig. 1.
Fig. 5 is a flowchart of the screen data generation processing of the text mining device shown in Fig. 1.
Fig. 6 is a diagram showing the data designation screen of the text mining device shown in Fig. 1.
Fig. 7 is a diagram showing an example of text data input to the text mining device shown in Fig. 1.
Fig. 8 is a diagram showing the target designation screen of the text mining device shown in Fig. 1.
Fig. 9 is a diagram showing the near-synonym list selection screen of the text mining device shown in Fig. 1.
Fig. 10 is a diagram showing the compound word list selection screen of the text mining device shown in Fig. 1.
Fig. 11A is a diagram showing the analysis result screen before the analysis target period is set in the text mining device shown in Fig. 1.
Fig. 11B is a diagram showing the analysis result screen after the analysis target period is set in the text mining device shown in Fig. 1.
Fig. 12A is a diagram showing the analysis result screen before word exclusion is performed in the text mining device shown in Fig. 1.
Fig. 12B is a diagram showing the analysis result screen after word exclusion is performed in the text mining device shown in Fig. 1.
Fig. 13A is a diagram showing the analysis result screen before near-synonym registration is performed in the text mining device shown in Fig. 1.
Fig. 13B is a diagram showing the analysis result screen after near-synonym registration is performed in the text mining device shown in Fig. 1.
Fig. 14A is a diagram showing the analysis result screen before compound word registration is performed in the text mining device shown in Fig. 1.
Fig. 14B is a diagram showing the analysis result screen after compound word registration is performed in the text mining device shown in Fig. 1.
Fig. 15 is a diagram showing an example of a dendrogram.
Fig. 16 is a diagram showing a case where the number of clusters is set for the dendrogram shown in Fig. 15.
Fig. 17 is a diagram showing the words that appear in the drawings and their meanings.
Description of embodiments
A text mining method, a text mining program, and a text mining device according to embodiments of the present invention are described below with reference to the drawings. The text mining method of the present embodiment is typically executed using a computer. The text mining program of the present embodiment is a program for implementing the text mining method on a computer. The text mining device of the present embodiment is typically configured using a computer. A computer that executes the text mining program functions as the text mining device.
Fig. 1 is a block diagram showing the configuration of a text mining device according to an embodiment of the present invention. The text mining device 10 shown in Fig. 1 includes an instruction input unit 11, a text analysis unit 12, a screen generation unit 13, and an analysis result display unit 14. Text data 5 to be analyzed is input to the text mining device 10. The text mining device 10 performs hierarchical cluster analysis on the words extracted from the input text data 5 and displays the analysis result on a screen.
The operation of the text mining device 10 is outlined as follows. Instructions from the user are input to the instruction input unit 11. The text analysis unit 12 extracts words from the input text data 5 and performs hierarchical cluster analysis on the extracted words. The screen generation unit 13 generates screen data on the basis of the analysis result of the text analysis unit 12. The analysis result display unit 14 displays a screen on the basis of the screen data generated by the screen generation unit 13.
The user instructions input to the instruction input unit 11 include setting the number of groups, setting the maximum number of data items per group, setting the analysis target period, excluding words, registering near synonyms, and registering compound words. When the text data 5 is time-series data carrying information such as dates or times, the text analysis unit 12 performs hierarchical cluster analysis on the words contained in the part of the input text data 5 that falls within the analysis target period set through the instruction input unit 11.
When generating screen data, the screen generation unit 13 uses the number of groups and the maximum number of data items per group (details are described later). In addition, when the user inputs a new instruction, the indicated processing is performed, after which the screen generation unit 13 generates new screen data and the analysis result display unit 14 displays the new screen. In this way, the text mining device 10 switches the analysis mode of the text data 5 and the display mode of the analysis result according to instructions from the user.
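The fifth to twelfth embodiments state which step each instruction drives: the number of groups and the maximum number of data items go to screen generation, while the analysis target period, analysis target, word exclusion, near synonyms, and compound words go to text analysis. A hedged Python sketch of that dispatch follows; the Settings fields, instruction keys, and stage names are hypothetical.

```python
from dataclasses import dataclass, replace

# Instructions that change which words are analysed re-run the text analysis step;
# the others only require regenerating the screen data.
NEEDS_REANALYSIS = {"period", "target", "excluded", "synonyms", "compounds"}
SCREEN_ONLY = {"m", "n"}


@dataclass(frozen=True)
class Settings:
    m: int = 4              # number of groups
    n: int = 5              # maximum number of data items per group
    period: tuple = ()      # analysis target period, empty = whole data set
    target: str = "all"     # analysis target
    excluded: frozenset = frozenset()
    synonyms: tuple = ()
    compounds: tuple = ()


def apply_instruction(settings, key, value):
    """Return the updated settings and the first processing stage that must run."""
    if key not in NEEDS_REANALYSIS | SCREEN_ONLY:
        raise KeyError(f"unknown instruction: {key}")
    updated = replace(settings, **{key: value})
    first_stage = "text_analysis" if key in NEEDS_REANALYSIS else "screen_generation"
    return updated, first_stage


if __name__ == "__main__":
    s, stage = apply_instruction(Settings(), "m", 6)
    print(s.m, stage)
```

Re-running text analysis is always followed by screen generation and display, so recording only the earliest affected stage is enough.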
Fig. 2 is a block diagram showing the configuration of a computer that functions as the text mining device 10. The computer 20 shown in Fig. 2 includes a CPU (Central Processing Unit) 21, a main memory 22, a storage unit 23, an input unit 24, a display unit 25, a communication unit 26, and a recording medium reading unit 27. The main memory 22 is, for example, a DRAM (Dynamic Random Access Memory). The storage unit 23 is, for example, a hard disk or a solid state drive. The input unit 24 includes, for example, a keyboard 28 and a mouse 29. The display unit 25 is, for example, a liquid crystal display. The communication unit 26 is an interface circuit for wired or wireless communication. The recording medium reading unit 27 is an interface circuit for a recording medium 30 that stores programs and the like. The recording medium 30 is, for example, a non-transitory recording medium such as a CD-ROM (Compact Disc Read-Only Memory), a DVD-ROM (Digital Versatile Disc Read-Only Memory), or a USB (Universal Serial Bus) memory.
When the computer 20 executes the text mining program 31, the storage unit 23 stores the text mining program 31 and the text data 5. The text mining program 31 and the text data 5 may, for example, be received from a server or another computer via the communication unit 26, or be read from the recording medium 30 via the recording medium reading unit 27.
When the text mining program 31 is executed, the text mining program 31 and the text data 5 are copied to the main memory 22. The CPU 21 uses the main memory 22 as a working memory and executes the text mining program 31 stored there to process the text data 5, also stored in the main memory 22. At this time, the computer 20 functions as the text mining device 10. The configuration of the computer 20 described above is merely an example, and the text mining device 10 may be configured using any computer.
In the following, Japanese data containing Japanese words is used as the text data 5. Fig. 17 is a diagram showing the words that appear in the drawings and their explanation. Each row of Fig. 17 lists a word (a Japanese word) and its meaning. In the following description, when a Japanese word is mentioned, its meaning is sometimes given in parentheses after the word. The text data 5 may be data in any language.
Fig. 3 is a diagram showing the display screen of the text mining device 10. The display screen 40 shown in Fig. 3 includes an analysis result screen 41 and an analysis setting screen 42. The analysis result screen 41 displays the analysis result of the text analysis unit 12. The analysis setting screen 42 displays GUI (Graphical User Interface) components for setting the analysis mode of the text analysis unit 12 and the characteristics of the screen data generated by the screen generation unit 13.
Once the number of clusters is set for the result of hierarchical cluster analysis, the words contained in each cluster are determined. When displaying the result of hierarchical cluster analysis performed on the words extracted from the text data 5, the text mining device 10 displays groups corresponding to the clusters, as shown in Fig. 3, instead of a dendrogram.
In the following description, a cluster displayed on the screen is also referred to as a group. Using the instruction input unit 11, the user designates the number of groups (the number of clusters) and the maximum number of data items per group (the upper limit on the number of words contained in a group). Hereinafter, the former is denoted by m and the latter by n.
In the text mining device 10, the words contained in the text data 5 are classified into m clusters, each containing one or more words. The analysis result screen 41 displays m groups, with words displayed inside each group. The groups are drawn as a cloud-like diagram, and the words contained in a group are displayed inside elliptical regions. The number of words displayed in each group is limited to n or fewer. For example, when n = 5 and a cluster contains 10 words, only 5 words are displayed inside the corresponding group on the analysis result screen 41.
The analysis setting screen 42 displays a first slider bar and two first buttons (marked "+" and "-") for setting the number of groups m, a second slider bar and two second buttons for setting the maximum number of data items per group n, and four boxes and two third buttons (marked with a left arrow and a right arrow) for setting the analysis target period.
The user specifies the number of groups m by operating the mouse 29 to move the knob of the first slider bar left or right, or by pressing a first button. Pressing the first button labeled "+" increases m, and pressing the first button labeled "-" decreases it. The initial value of m is set, for example, to the square root of the number of distinct words contained in the analysis result of the text analysis unit 12, or to an integer close to that square root. For example, when the analysis result contains 16 distinct words, the initial value of m is set to 4.
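The square-root rule for the initial group count can be sketched as follows. Rounding to the nearest integer is an assumption; the description only says "the square root, or an integer close to it", and the function name is hypothetical.

```python
import math


def initial_group_count(num_distinct_words: int) -> int:
    """Hypothetical helper: initial m close to sqrt(number of distinct words).

    Rounding to the nearest integer is an assumed choice of "close integer".
    """
    if num_distinct_words <= 0:
        raise ValueError("the analysis result must contain at least one word")
    # max(1, ...) keeps at least one group even for a single distinct word.
    return max(1, round(math.sqrt(num_distinct_words)))


print(initial_group_count(16))  # 4, matching the example in the text
print(initial_group_count(50))  # 7, since sqrt(50) is about 7.07
```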
The user specifies the maximum number of data items per group n by operating the mouse 29 to move the knob of the second slider bar left or right, or by pressing a second button. Pressing a second button increases or decreases n. The initial value of n is set, for example, to 5.
When the text data 5 is time-series data, the user specifies the analysis target period either by entering dates and times in the four boxes using the keyboard 28 or the mouse 29, or by pressing a third button. Pressing the third button labeled with a left arrow shifts the analysis target period into the past by a specified amount (for example, one month), and pressing the third button labeled with a right arrow shifts it by the same amount in the opposite direction. The initial analysis target period is set, for example, to span from the oldest time to the newest time in the text data 5. When the text data 5 is not time-series data, the user cannot specify an analysis target period.
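A one-month shift of the analysis target period, as triggered by the arrow buttons, might look like the sketch below. The description does not say how month ends are handled; clamping the day to the length of the target month is an assumption, and the function names are hypothetical.

```python
import calendar
from datetime import datetime


def shift_months(t: datetime, months: int) -> datetime:
    """Shift a timestamp by whole months, clamping the day to the month length (assumed policy)."""
    total = t.year * 12 + (t.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    day = min(t.day, calendar.monthrange(year, month)[1])
    # Replace year, month and day together so no invalid intermediate date is built.
    return t.replace(year=year, month=month, day=day)


def shift_period(start: datetime, end: datetime, months: int):
    """Shift both ends of the period; months=-1 for the left arrow, +1 for the right arrow."""
    return shift_months(start, months), shift_months(end, months)
```

With this clamping policy a period ending on September 30 shifts back to end on August 30 rather than August 31; an implementation that wants whole calendar months would need to snap month-end dates explicitly.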
The analysis result screen 41 displays between 1 and m groups, and each group displays between 1 and n words inside it. On the screen, the larger the total frequency of occurrence of the words contained in the cluster corresponding to a group, the larger the group is displayed. When a cluster contains more than n words, the n words with the highest frequency of occurrence are displayed inside the group. On the screen, the higher a word's frequency of occurrence, the larger the word and the elliptical region enclosing it are displayed. Each group is given a title, which is the word with the highest frequency of occurrence among the words contained in the cluster. The title is underlined and displayed inside the group. When a word cannot fit inside its elliptical region, the symbol "..." is displayed in place of the word.
The analysis result screen 41 displays a third slider bar and two fourth buttons (labeled "+" and "-") for specifying a zoom ratio. The user sets the zoom ratio by operating the mouse 29 to move the knob of the third slider bar left or right, or by pressing a fourth button. On the analysis result screen 41, the groups and the words they contain are enlarged or reduced according to the set zoom ratio. The initial zoom ratio is 100%. In the initial state, the analysis result screen 41 displays all groups.
When the user changes the number of groups m, the maximum number of data items per group n, or the analysis target period on the analysis setting screen 42, the content of the analysis result screen 41 changes accordingly. When the user instructs word exclusion, near-synonym registration, or compound-word registration on the analysis result screen 41, the content of the analysis result screen 41 likewise changes according to the instruction.
When performing hierarchical cluster analysis on the words extracted from the text data 5, the text mining apparatus 10 refers to an exclusion word list storing words to be excluded, a near-synonym list storing words to be treated as near-synonyms, and a compound-word list storing words to be treated as compound words. In the near-synonym list, multiple words having the same (or approximately the same) meaning are stored in association with one word that represents them. In the compound-word list, multiple words that should be treated as a single compound word when they appear connected are stored in association with the compound word formed by connecting them. For example, "daigakusei (university student)" and "gakusei (student)" are stored in a near-synonym list in association with "daigakusei", which represents both. Likewise, "nintai (patience)" and "tsuyoi (strong)" are stored in a compound-word list in association with "nintaizuyoi (enduring)", formed by connecting the two. The text mining apparatus 10 may have multiple near-synonym lists and multiple compound-word lists.
Fig. 4 is a flowchart showing the operation of the text mining apparatus 10. Fig. 5 is a flowchart showing details of the screen data generation process (step S111 in Fig. 4) of the text mining apparatus 10. The input unit 24 and the CPU 21 executing step S113 function as the instruction input unit 11. The CPU 21 executing steps S109 to S110 functions as the text analysis unit 12. The CPU 21 executing step S111 functions as the screen generation unit 13. The display unit 25 and the CPU 21 executing step S112 function as the analysis result display unit 14. The operation of the text mining apparatus 10 is described below with reference to Figs. 4 and 5.
First, the CPU 21 causes the display unit 25 to display the data specification screen 51 shown in Fig. 6 (step S101). The data specification screen 51 displays a box for specifying a file name and a box for specifying a folder name. The user specifies the text data 5 to be analyzed by entering a file name or folder name on the data specification screen 51. The text data 5 may be stored in the storage unit 23, such as a hard disk, or in a server or another computer connected via the communication unit 26.
Next, the CPU 21 transfers the text data 5 specified on the data specification screen 51 to the main memory 22. The text data 5 is thereby input to the text mining apparatus 10 (step S102). Fig. 7 is a diagram showing an example of the text data 5. The text data shown in Fig. 7 consists of reports written by university students and is time-series data carrying date information. From the top, the entries read "Regarding the relationship between university students and society in this lecture's content...", "Ordinary university students work part-time jobs before graduating and entering society, or...", "We students must recognize that we have paid expensive tuition in order to learn..." and "Student life is a valuable time for growing self-confidence. And...". The text data 5 analyzed by the text mining apparatus 10 may be of any type.
Next, the CPU 21 causes the display unit 25 to display the target specification screen 52 shown in Fig. 8 (step S103). The target specification screen 52 displays three radio buttons corresponding to content, feature, and evaluation. The user selects an analysis target from content, feature, and evaluation by clicking one of the radio buttons with the mouse 29. The CPU 21 then receives the analysis target specified on the target specification screen 52. The analysis target is thereby input to the text mining apparatus 10 (step S104).
Next, the CPU 21 causes the display unit 25 to display the near-synonym list selection screen 53 shown in Fig. 9 (step S105). The near-synonym list selection screen 53 displays the names of the near-synonym lists held by the text mining apparatus 10 and the near-synonyms registered in each list. The user specifies the near-synonym list to be used by selecting one of the lists on the near-synonym list selection screen 53 with the mouse 29. A near-synonym list is thereby selected in the text mining apparatus 10 (step S106).
Next, the CPU 21 causes the display unit 25 to display the compound-word list selection screen 54 shown in Fig. 10 (step S107). The compound-word list selection screen 54 displays the names of the compound-word lists held by the text mining apparatus 10 and the compound words registered in each list. The user specifies the compound-word list to be used by selecting one of the lists on the compound-word list selection screen 54 with the mouse 29. A compound-word list is thereby selected in the text mining apparatus 10 (step S108).
Next, taking the exclusion word list, the near-synonym list, and the compound-word list into account, the CPU 21 extracts words of the types corresponding to the analysis target specified in step S104 from the portion of the text data 5 input in step S102 that falls within the analysis target period (step S109). When the analysis target is "content", the CPU 21 extracts nouns, proper nouns, place names, and person names from the text data 5. When the analysis target is "feature", it extracts nouns, proper nouns, sa-hen nouns (nouns that form verbs with "suru"), and verbs. When the analysis target is "evaluation", it extracts adjectives, adjectival nouns (na-adjectives), and interjections. The text mining apparatus 10 may also support analysis targets other than these three, and the CPU 21 may extract word types different from those above for each analysis target.
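The target-dependent word-type filter can be expressed as a lookup table from analysis target to allowed parts of speech. The sketch below assumes the text has already been split into (word, part-of-speech) pairs by a morphological analyzer; the tag names are hypothetical placeholders, not the tag set of any particular analyzer.

```python
# Hypothetical part-of-speech labels; a real system would use its analyzer's own tag set.
POS_BY_TARGET = {
    "content": {"noun", "proper_noun", "place_name", "person_name"},
    "feature": {"noun", "proper_noun", "sahen_noun", "verb"},
    "evaluation": {"adjective", "adjectival_noun", "interjection"},
}


def extract_words(tagged_tokens, target):
    """Keep only words whose part of speech matches the analysis target (step S109)."""
    allowed = POS_BY_TARGET[target]
    return [word for word, pos in tagged_tokens if pos in allowed]


tokens = [("daigakusei", "noun"), ("tsuyoi", "adjective"),
          ("benkyou", "sahen_noun"), ("suru", "verb")]
print(extract_words(tokens, "evaluation"))  # ['tsuyoi']
```

Keeping the mapping as data makes it straightforward to add further analysis targets, which the description allows for.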
When the text data 5 is time-series data, in step S109 the CPU 21 extracts words only from the portion of the text data 5 that falls within the analysis target period specified by the user. When a word W1 is stored in the exclusion word list, the CPU 21 ignores every occurrence of W1 in the text data 5 in step S109. When words W2 and W3 are stored in the selected near-synonym list in association with W2, which represents both, the CPU 21 treats every occurrence of W3 in the text data 5 as W2 in step S109. When words W4 and W5 are stored in the selected compound-word list in association with a word W6 formed by connecting them, the CPU 21 treats every occurrence in the text data 5 of W4 immediately followed by W5 as W6 in step S109.
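The period filter and the three word lists of step S109 can be sketched together as below. The description does not fix the order in which the lists are applied; merging compounds first, then mapping near-synonyms, then dropping excluded words is an assumption, as are the function names and the list representations (a set, a variant-to-representative dict, and a dict keyed by word pairs).

```python
from datetime import datetime


def normalize(tokens, excluded, synonyms, compounds):
    """Apply compound, near-synonym and exclusion lists to one token sequence.

    Assumed order: merge adjacent (W4, W5) into W6, map W3 -> W2, then drop W1.
    """
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in compounds:
            word, i = compounds[pair], i + 2  # connected W4 W5 treated as W6
        else:
            word, i = tokens[i], i + 1
        word = synonyms.get(word, word)       # W3 treated as representative W2
        if word not in excluded:              # W1 ignored entirely
            out.append(word)
    return out


def words_for_analysis(docs, start, end, excluded, synonyms, compounds):
    """docs: list of (timestamp, tokens); keep only documents inside [start, end]."""
    words = []
    for timestamp, tokens in docs:
        if start <= timestamp <= end:
            words.extend(normalize(tokens, excluded, synonyms, compounds))
    return words


docs = [(datetime(2014, 3, 5), ["gakusei", "nintai", "tsuyoi", "shakai"]),
        (datetime(2016, 1, 1), ["daigakusei"])]
print(words_for_analysis(docs, datetime(2014, 3, 1), datetime(2014, 9, 30, 23, 59),
                         {"shakai"}, {"gakusei": "daigakusei"},
                         {("nintai", "tsuyoi"): "nintaizuyoi"}))
# ['daigakusei', 'nintaizuyoi']
```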
Next, the CPU 21 performs hierarchical cluster analysis on the words extracted in step S109 (step S110). In step S110, the CPU 21 obtains the similarity between two words from, for example, the distance between the two words in the text data 5 (how far apart the two words appear). Based on the obtained similarities, the CPU 21 performs hierarchical cluster analysis using a predetermined method (for example, the nearest neighbor method, the furthest neighbor method, the group average method, or Ward's method). In step S110, the CPU 21 also obtains the frequency of occurrence of each word.
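A minimal pure-Python sketch of step S110 with the group average method, plus the later cut into m clusters (step S202). The word distance used here, the smallest token gap between two words within any one document with a fixed large value for words that never co-occur, is an assumed reading of "how far apart the two words appear"; all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def word_distance(docs, a, b, far=100.0):
    """Assumed distance: smallest token gap between a and b in any one document."""
    best = far
    for tokens in docs:
        pos_a = [i for i, w in enumerate(tokens) if w == a]
        pos_b = [i for i, w in enumerate(tokens) if w == b]
        for i in pos_a:
            for j in pos_b:
                best = min(best, abs(i - j))
    return best


def build_hierarchy(docs):
    """Group-average agglomerative clustering; the merge history stands in for the dendrogram."""
    freq = Counter(w for tokens in docs for w in tokens)  # word frequencies, also from S110
    words = sorted(freq)
    dist = {frozenset(p): word_distance(docs, *p) for p in combinations(words, 2)}

    def group_average(pair):
        a, b = pair
        return sum(dist[frozenset((u, v))] for u in a for v in b) / (len(a) * len(b))

    clusters = [frozenset([w]) for w in words]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=group_average)
        merges.append((a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return words, freq, merges


def cut(words, merges, m):
    """Replay the first len(words) - m merges to obtain m clusters (step S202)."""
    clusters = {frozenset([w]) for w in words}
    for a, b in merges[: len(words) - m]:
        clusters -= {a, b}
        clusters.add(a | b)
    return clusters
```

Because the whole hierarchy is kept, changing m only requires replaying a different number of merges rather than re-clustering. For larger vocabularies a library routine such as SciPy's hierarchical clustering would be the practical choice.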
Next, based on the result of the hierarchical cluster analysis obtained in step S110, the CPU 21 generates screen data for displaying the analysis result (step S111). In step S111, the CPU 21 performs the process shown in Fig. 5.
The CPU 21 sets the number of groups to m and the maximum number of data items per group to n (step S201). Next, the CPU 21 sets the number of clusters to m for the result of the hierarchical cluster analysis and thereby obtains m clusters (step S202). Next, for each cluster, the CPU 21 obtains the total frequency of occurrence of the words contained in the cluster (step S203). Next, the CPU 21 determines the display size of each group based on the totals obtained in step S203 (step S204). In step S204, the larger the total frequency of occurrence of the words contained in a cluster, the larger the display size of the corresponding group.
Next, for each cluster, the CPU 21 selects the words to be displayed from among the words contained in the cluster (step S205). In step S205, up to n words are selected from each cluster in descending order of frequency of occurrence. Next, for each word selected in step S205, the CPU 21 determines the word's display size based on its frequency of occurrence (step S206). In step S206, the higher a word's frequency of occurrence, the larger its display size.
Next, the CPU 21 generates screen data for displaying the result of the hierarchical cluster analysis (step S207). The screen data generated in step S207 contains m groups, represented by cloud-shaped figures, with the sizes determined in step S204. Inside each group are n or fewer words with the sizes determined in step S206, so that on the screen the words are displayed inside their groups. After executing step S207, the CPU 21 ends the screen data generation process.
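Steps S203 to S206 can be sketched as one layout function. The description only requires sizes to grow with frequency; the linear scale between a minimum and maximum size, the alphabetical tie-break, and the dictionary output format are assumptions for illustration.

```python
def scale(value, top, lo=1.0, hi=3.0):
    """Hypothetical linear size scale: `top` maps to `hi`, 0 maps to `lo`."""
    return lo + (hi - lo) * value / top


def layout_groups(clusters, freq, n):
    """Sketch of steps S203-S206 for a list of clusters (iterables of words)."""
    totals = [sum(freq[w] for w in c) for c in clusters]        # S203: total frequency per cluster
    top_total = max(totals)
    top_word = max(freq[w] for c in clusters for w in c)
    groups = []
    for cluster, total in zip(clusters, totals):
        # S205: at most n words, highest frequency first (ties broken alphabetically).
        shown = sorted(cluster, key=lambda w: (-freq[w], w))[:n]
        groups.append({
            "title": shown[0],                                  # most frequent word is the title
            "size": scale(total, top_total),                    # S204: group size
            "words": [(w, scale(freq[w], top_word)) for w in shown],  # S206: word sizes
        })
    return groups


freq = {"daigakusei": 5, "gakusei": 3, "shakai": 2, "nintai": 1}
groups = layout_groups([{"daigakusei", "gakusei", "shakai"}, {"nintai"}], freq, 2)
print(groups[0]["title"], [w for w, _ in groups[0]["words"]])  # daigakusei ['daigakusei', 'gakusei']
```

Word sizes are scaled against the most frequent word across all groups, so that equal frequencies render at equal sizes in different groups.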
Next, the CPU 21 causes the display unit 25 to display a screen based on the screen data generated in step S111 (step S112). Next, the CPU 21 receives an instruction from the user (step S113). The CPU 21 then proceeds to one of steps S115 to S120 according to the type of instruction received in step S113 (step S114).
When what CPU 21 was received in step S113 is designated as the situation of " setting of group number ", towards before step S115 Into.In the situation, group number m is set as value (step S115) indicated by user by CPU 21, and towards step S111 Advance.Thereafter, picture data is generated according to set group number m, and shows new picture.In this way, display includes institute The analysis result screen of the group of specified number.
When what CPU 21 was received in step S113 is designated as the situation of " settings of most data numbers in group ", court Advance to step S116.In the situation, most data number n in group are set as the (step of value indicated by user by CPU 21 Rapid S116), and advance towards step S111.Thereafter, picture data is generated according to most data number n in set group, And show new picture.In this way, show that the words number that each group is included is limited in specified below point of value Analyse result screen.
When what CPU 21 was received in step S113 is designated as the situation of " setting during analysis object ", towards step S117 advances.In the situation, CPU 21 during analyzing object by during being set as indicated by user (step S117), and court Advance to step S109.Thereafter, it referring to class type cluster analysis is carried out during set analysis object, generates new for showing Analysis result picture data, and show new picture.In this way, display is directed to specified analysis object in picture During text data included words carry out class type cluster analysis obtained from result.
Fig. 11A shows the analysis result screen before the analysis target period is set, and Fig. 11B shows the analysis result screen after it is set. The analysis result screen 61 before setting, shown in Fig. 11A, displays the result of hierarchical cluster analysis of only the words contained in the portion of the input text data 5 dated from 00:00 on January 1, 2014 to 24:00 on December 31, 2015. The analysis result screen 62 after setting, shown in Fig. 11B, displays the result of hierarchical cluster analysis of the words contained in the portion of the input text data 5 dated from 00:00 on March 1, 2014 to 24:00 on September 30, 2014. The display content of the analysis result screen 61 differs from that of the analysis result screen 62. By comparing the analysis result screens before and after setting the analysis target period, the user can easily recognize how the result of hierarchical cluster analysis changes over time.
When what CPU 21 was received in step S113 is designated as the situation of " words exclusion ", advance towards step S118. In the situation, specified words is appended to and excludes word list (step S118) by CPU 21, and towards before step S109 Into.Thereafter, specified words is excluded into row order laminar cluster analysis of going forward side by side, generates the picture for showing new analysis result Data, and show new picture.In this way, it is shown in picture and specified words is excluded into row order laminar cluster point of going forward side by side Result obtained from analysis.
Fig. 12A shows the analysis result screen before word exclusion, and Fig. 12B shows the analysis result screen after word exclusion. The user operates the mouse 29 to select the word to be excluded and then instructs word exclusion. On the analysis result screen 63 before word exclusion, shown in Fig. 12A, "shakai (society)" is selected and "word exclusion" is chosen from the menu. The screen then displays the result of hierarchical cluster analysis with "shakai" excluded. On the analysis result screen 64 after word exclusion, shown in Fig. 12B, "shingaku (advancing to higher education)" is displayed in place of "shakai". Among the words in the same cluster as "shakai", "shingaku" has the highest frequency of occurrence after the five words displayed on the analysis result screen 63.
When what CPU 21 was received in step S113 is designated as the situation of " near synonym registration ", towards before step S119 Into.In the situation, indicated words is appended to the near synonym list (step S119) in being used, and court by CPU 21 Advance to step S109.Thereafter, consider indicated near synonym and carry out class type cluster analysis, generate for showing new point The picture data of result is analysed, and shows new picture.In this way, it shows in picture using indicated words as near synonym Result obtained from row order of going forward side by side laminar cluster analysis.
Fig. 13A shows the analysis result screen before near-synonym registration, and Fig. 13B shows the analysis result screen after near-synonym registration. The user operates the mouse 29 to select the words to be registered as near-synonyms and then instructs near-synonym registration. On the analysis result screen 65 before near-synonym registration, shown in Fig. 13A, "daigakusei (university student)" and "gakusei (student)" are selected and "near-synonym registration" is chosen from the menu. The screen then displays the result of hierarchical cluster analysis with "daigakusei" and "gakusei" treated as near-synonyms. On the analysis result screen 66 after near-synonym registration, shown in Fig. 13B, "daigakusei" is displayed at a larger size than on the analysis result screen 65, and "shingaku (advancing to higher education)" is displayed in place of "gakusei". "daigakusei" is displayed larger because its frequency now corresponds to the sum of the frequencies of occurrence of "daigakusei" and "gakusei".
When what CPU 21 was received in step S113 is designated as the situation of " compound word registration ", towards before step S120 Into.In the situation, indicated words is appended to by CPU 21 be used in compound word list (step S120), and court Advance to step S109.Thereafter, consider that indicated compound word is gone forward side by side row order laminar cluster analysis, is generated for showing new point The picture data of result is analysed, and shows new picture.In this way, it shows in picture using specified words as compound word Result obtained from row order of going forward side by side laminar cluster analysis.
Fig. 14A shows the analysis result screen before compound-word registration, and Fig. 14B shows the analysis result screen after compound-word registration. The user operates the mouse 29 to select the words to be registered as a compound word and then instructs "compound-word registration". On the analysis result screen 67 before compound-word registration, shown in Fig. 14A, "nintai (patience)" and "tsuyoi (strong)" are selected and "compound-word registration" is chosen from the menu. The screen then displays the result of hierarchical cluster analysis with "nintai" and "tsuyoi" treated as a compound word. On the analysis result screen 68 after compound-word registration, shown in Fig. 14B, "nintaizuyoi (enduring)" is displayed in place of "nintai" and "tsuyoi", at a size no larger than that of "nintai" and "tsuyoi".
As described above, the text mining method of this embodiment includes: a text analysis step of performing hierarchical cluster analysis on words extracted from input text data; a screen generation step of generating screen data according to the analysis result of the text analysis step; and an analysis result display step of displaying a screen according to the screen data. In the screen generation step, m clusters are obtained from the analysis result according to the number of groups m and the maximum number of data items per group n, and screen data is generated for displaying on the screen groups each containing n or fewer of the words contained in a cluster. With the text mining method of this embodiment, groups containing the words of each cluster can be displayed on the screen based on the result of hierarchical cluster analysis of the words in the text data, and the number of words in each group is limited to n or fewer. The user can therefore grasp the result of hierarchical cluster analysis intuitively by looking at the screen.
The words contained in a group are selected from the words in the corresponding cluster in descending order of frequency of occurrence, so the most frequent words of each cluster are displayed inside the group and the user can easily recognize them. On the screen, each group has a size corresponding to the total frequency of occurrence of the words in its cluster, so the user can easily recognize clusters with a large total word frequency. Each word in a group also has a size corresponding to its own frequency of occurrence, so the user can easily recognize frequently occurring words.
The text mining method also includes an instruction input step of inputting instructions from the user, and either the text analysis step or the screen generation step is executed according to the instruction input in the instruction input step. The display mode of the result of hierarchical cluster analysis can therefore be switched according to the user's instructions. In particular, an instruction setting the number of groups m is received in the instruction input step, and in the screen generation step the screen data is generated according to the value of m specified in the instruction input step, so the number of groups (number of clusters) displayed on the screen is switched according to the user's instruction. Likewise, an instruction setting the maximum number of data items per group n is received in the instruction input step, and in the screen generation step the screen data is generated according to the value of n specified in the instruction input step, so the number of words displayed in each group is switched according to the user's instruction.
An instruction specifying the analysis target period is also received in the instruction input step, and in the text analysis step hierarchical cluster analysis is performed on the words contained in the portion of the text data within the analysis target period specified in the instruction input step. The screen therefore displays the result of hierarchical cluster analysis of the words in the text data within the user-specified period, so the user can easily recognize how the result changes over time. An instruction setting the analysis target is also received in the instruction input step, and in the text analysis step words of the types corresponding to that analysis target are extracted from the text data 5 and subjected to hierarchical cluster analysis. The screen can thereby display the result of hierarchical cluster analysis performed after switching the types of words analyzed according to the analysis target specified by the user.
A word exclusion instruction is also received in the instruction input step, and in the text analysis step the word indicated in the instruction input step is excluded before hierarchical cluster analysis is performed. The screen can thereby display the result of hierarchical cluster analysis performed with the user-specified word excluded. A near-synonym registration instruction is also received in the instruction input step, and in the text analysis step the words indicated in the instruction input step are treated as the same word before hierarchical cluster analysis is performed. The screen can thereby display the result of hierarchical cluster analysis performed with the user-specified words treated as the same word. A compound-word registration instruction is also received in the instruction input step, and in the text analysis step the words indicated in the instruction input step are merged into one word before hierarchical cluster analysis is performed. The screen can thereby display the result of hierarchical cluster analysis performed with the user-specified words merged into one word.
In the screen generation step, screen data is generated for displaying both an analysis result screen containing the groups and an analysis setting screen for setting the display mode of the analysis result screen. Because both screens are displayed, the user can easily switch the display mode of the hierarchical cluster analysis result using the analysis setting screen.
The text mining program 31 and the text mining apparatus 10 of this embodiment have the same features as the text mining method of this embodiment and therefore provide the same effects.
With the text mining method, text mining program, and text mining apparatus of this embodiment, groups each containing no more than the maximum number of data items of the words in a cluster can be displayed on the screen, based on the result of hierarchical cluster analysis of the words contained in the text data. The user can therefore grasp the result of hierarchical cluster analysis intuitively by looking at the screen.
This application claims priority based on Japanese Patent Application No. 2016-145065, entitled "Text mining method, text mining program and text mining device", filed on July 25, 2016, the contents of which are incorporated herein by reference.
Reference Signs List
5 text data
10 text mining apparatus
11 instruction input unit
12 text analysis unit
13 screen generation unit
14 analysis result display unit
20 computer
21 CPU
22 main memory
23 storage unit
24 input unit
25 display unit
30 recording medium
31 text mining program
40 display screen
41, 61 to 68 analysis result screen
42 analysis setting screen
51 data specification screen
52 target specification screen
53 near-synonym list selection screen
54 compound-word list selection screen

Claims (25)

1. A text mining method for displaying an analysis result of text data on a screen, comprising:
a text analysis step of performing hierarchical cluster analysis on words extracted from input text data;
a screen generation step of generating screen data according to the analysis result of the text analysis step; and
an analysis result display step of displaying a screen according to the screen data;
wherein, in the screen generation step, clusters equal in number to a number of groups are obtained from the analysis result according to the number of groups and a maximum number of data items per group, and screen data is generated for displaying on the screen groups each containing no more than the maximum number of data items of the words belonging to a cluster.
2. The text mining method according to claim 1, wherein
the words contained in each group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to that group.
3. The text mining method according to claim 2, wherein
on the screen, each group has a size corresponding to the total frequency of occurrence of the words belonging to the cluster corresponding to that group.
4. The text mining method according to claim 3, wherein
on the screen, each word contained in a group has a size corresponding to the frequency of occurrence of that word.
5. The text mining method according to claim 1,
further comprising an instruction input step of inputting an instruction from a user,
wherein either of the text analysis step and the screen generation step is executed according to the instruction input in the instruction input step.
6. The text mining method according to claim 5, wherein
in the instruction input step, an instruction setting the number of groups is received, and
in the screen generation step, the screen data is generated according to the number of groups set in the instruction input step.
7. The text mining method according to claim 5, wherein
in the instruction input step, an instruction setting the maximum number of data items is received, and
in the screen generation step, the screen data is generated according to the maximum number of data items set in the instruction input step.
8. The text mining method according to claim 5, characterized in that
in the instruction input step, an instruction setting an analysis period is received, and
in the text analysis step, the hierarchical cluster analysis is performed on the words contained in the portion of the text data that falls within the analysis period set in the instruction input step.
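Claim 8 restricts the analysis to a user-set period. The following is a minimal sketch. It assumes each input text arrives as a (date, text) record with an inclusive period, and this record layout is hypothetical. Only the texts that pass this filter would then go on to word extraction and clustering.

```python
from datetime import date


def texts_in_period(records: list[tuple[date, str]], start: date, end: date) -> list[str]:
    """Keep the text of every (date, text) record dated within [start, end], inclusive."""
    return [text for day, text in records if start <= day <= end]
```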
9. The text mining method according to claim 5, characterized in that
in the instruction input step, an instruction setting an analysis target is received, and
in the text analysis step, words of a type corresponding to the analysis target set in the instruction input step are extracted from the text data and subjected to the hierarchical cluster analysis.
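Claim 9 keeps only words of a type that matches the chosen analysis target. This sketch reads "type" as part of speech. It assumes a morphological analyzer (not shown) has already tagged each token with a tag such as `NOUN` or `ADJ`. Both the target names and the target-to-tag mapping are hypothetical.

```python
# Hypothetical mapping from an analysis target to the word types (tags) it keeps.
TARGET_TYPES = {
    "topics": {"NOUN"},
    "opinions": {"ADJ", "VERB"},
}


def words_for_target(tagged_docs, target):
    """Keep only the tokens whose tag belongs to the selected analysis target."""
    keep = TARGET_TYPES[target]
    return [[word for word, tag in doc if tag in keep] for doc in tagged_docs]
```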
10. The text mining method according to claim 5, characterized in that
in the instruction input step, a word exclusion instruction is received, and
in the text analysis step, the hierarchical cluster analysis is performed with the words designated in the instruction input step excluded.
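For claim 10, word exclusion can be read as a filter applied to the tokenized documents before clustering. The following is a minimal sketch, and the function name is hypothetical.

```python
def exclude_words(docs, excluded):
    """Drop every user-excluded word before the cluster analysis runs."""
    excluded = set(excluded)
    return [[w for w in doc if w not in excluded] for doc in docs]
```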
11. The text mining method according to claim 5, characterized in that
the instruction input step receives a synonym registration instruction, and
the text analysis step performs the hierarchical cluster analysis while treating the plurality of words designated in the instruction input step as the same word.
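For claim 11, one way to treat registered synonyms as the same word is to rewrite each of them to a single representative before counting and clustering. This sketch assumes the first word of each registered group serves as the representative, which is a hypothetical convention.

```python
def merge_synonyms(docs, synonym_groups):
    """Replace each registered synonym with its group's first word, so they count as one."""
    canonical = {w: group[0] for group in synonym_groups for w in group}
    return [[canonical.get(w, w) for w in doc] for doc in docs]
```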
12. The text mining method according to claim 5, characterized in that
the instruction input step receives a compound word registration instruction, and
the text analysis step performs the hierarchical cluster analysis after merging the plurality of words designated in the instruction input step into a single word.
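For claim 12, compound word registration can be sketched as a pass over each token sequence that joins registered adjacent runs into one token. Longer registrations take priority. The default empty separator assumes languages such as Japanese or Chinese, where words are not space-separated. The function and its parameters are hypothetical.

```python
def merge_compounds(doc, compounds, sep=""):
    """Greedily join registered token sequences (longest first) into single words."""
    patterns = sorted({tuple(c) for c in compounds if c}, key=len, reverse=True)
    out, i = [], 0
    while i < len(doc):
        for p in patterns:
            if tuple(doc[i:i + len(p)]) == p:
                out.append(sep.join(p))  # registered compound becomes one word
                i += len(p)
                break
        else:
            out.append(doc[i])  # no compound starts here; keep the token
            i += 1
    return out
```

For instance, `merge_compounds(["text", "mining", "tool"], [["text", "mining"]], sep=" ")` gives `["text mining", "tool"]`.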
13. The text mining method according to claim 1, characterized in that
in the screen generation step, screen data is generated for displaying an analysis result screen containing the groups and an analysis settings screen for setting the display mode of the analysis result screen.
14. A text mining program for displaying an analysis result of text data on a screen, characterized in that the text mining program causes a CPU of a computer to execute, using a memory, the following steps:
a text analysis step of performing hierarchical cluster analysis on words extracted from input text data;
a screen generation step of generating screen data in accordance with the analysis result of the text analysis step; and
an analysis result display step of displaying a screen in accordance with the screen data;
wherein, in the screen generation step, clusters equal in number to a group count are obtained from the analysis result in accordance with the group count and a maximum number of data items per group, and screen data is generated for displaying, on the screen, groups each containing at most the maximum number of words belonging to the corresponding cluster.
15. The text mining program according to claim 14, characterized in that
the words included in each group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to that group.
16. The text mining program according to claim 15, characterized in that
on the screen, each group has a size corresponding to the total frequency of occurrence of the words belonging to the cluster corresponding to that group.
17. The text mining program according to claim 16, characterized in that
on the screen, each word included in a group has a size corresponding to the frequency of occurrence of that word.
18. The text mining program according to claim 14, characterized in that
the text mining program further causes the computer to execute an instruction input step of inputting an instruction from a user, and
either of the text analysis step and the screen generation step is executed in accordance with the instruction input in the instruction input step.
19. The text mining program according to claim 14, characterized in that
in the screen generation step, screen data is generated for displaying an analysis result screen containing the groups and an analysis settings screen for setting the display mode of the analysis result screen.
20. A text mining device for displaying an analysis result of text data on a screen, comprising:
a text analysis unit that performs hierarchical cluster analysis on words extracted from input text data;
a screen generation unit that generates screen data in accordance with the analysis result of the text analysis unit; and
an analysis result display unit that displays a screen in accordance with the screen data;
wherein the screen generation unit obtains, from the analysis result, clusters equal in number to a group count in accordance with the group count and a maximum number of data items per group, and generates screen data for displaying, on the screen, groups each containing at most the maximum number of words belonging to the corresponding cluster.
21. The text mining device according to claim 20, characterized in that
the words included in each group are selected, in descending order of frequency of occurrence, from the words belonging to the cluster corresponding to that group.
22. The text mining device according to claim 21, characterized in that
on the screen, each group has a size corresponding to the total frequency of occurrence of the words belonging to the cluster corresponding to that group.
23. The text mining device according to claim 22, characterized in that
on the screen, each word included in a group has a size corresponding to the frequency of occurrence of that word.
24. The text mining device according to claim 20, characterized in that
the text mining device further comprises an instruction input unit for inputting an instruction from a user, and
either of the text analysis unit and the screen generation unit operates in accordance with the instruction input through the instruction input unit.
25. The text mining device according to claim 20, characterized in that
the screen generation unit generates screen data for displaying an analysis result screen containing the groups and an analysis settings screen for setting the display mode of the analysis result screen.

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
JP2016145065A JP6794162B2 (en) 2016-07-25 2016-07-25 Text mining methods, text mining programs, and text mining equipment
JP2016-145065 2016-07-25
PCT/JP2017/020922 WO2018020842A1 (en) 2016-07-25 2017-06-06 Text mining method, text mining program, and text mining apparatus

Publications (2)

Publication Number Publication Date
CN109478191A true CN109478191A (en) 2019-03-15
CN109478191B CN109478191B (en) 2022-04-08

Family

ID=61015910

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780043375.8A Active CN109478191B (en) 2016-07-25 2017-06-06 Text mining method, recording medium, and text mining device

Country Status (5)

Country Link
JP (1) JP6794162B2 (en)
KR (1) KR102180487B1 (en)
CN (1) CN109478191B (en)
TW (1) TWI686716B (en)
WO (1) WO2018020842A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7078429B2 (en) * 2018-03-20 2022-05-31 株式会社Screenホールディングス Text mining methods, text mining programs, and text mining equipment
EP3882786A4 (en) 2019-05-17 2022-03-23 Aixs, Inc. Cluster analysis method, cluster analysis system, and cluster analysis program
JP7456486B2 (en) * 2020-02-25 2024-03-27 日本電気株式会社 Item classification support system, method and program
JPWO2022130547A1 (en) * 2020-12-16 2022-06-23

Citations (9)

Publication number Priority date Publication date Assignee Title
JPH0991314A (en) * 1995-07-14 1997-04-04 Fuji Xerox Co Ltd Information search device
JP2000227917A (en) * 1999-02-05 2000-08-15 Agency Of Ind Science & Technol Thesaurus browsing system and method therefor and recording medium recording its processing program
JP2003044491A (en) * 2001-07-30 2003-02-14 Toshiba Corp Knowledge analytic system. method for setting analytic condition, saving analytic condition and re-analyzing processing in the system
JP2005107688A (en) * 2003-09-29 2005-04-21 Nippon Telegr & Teleph Corp <Ntt> Information display method and system and information display program
CN1934570A (en) * 2004-03-18 2007-03-21 日本电气株式会社 Text mining device, method thereof, and program
JP2010039671A (en) * 2008-08-04 2010-02-18 Nippon Telegr & Teleph Corp <Ntt> Text mining apparatus, method, program, and recording medium
CN104142918A (en) * 2014-07-31 2014-11-12 天津大学 Short text clustering and hotspot theme extraction method based on TF-IDF characteristics
CN104504024A (en) * 2014-12-11 2015-04-08 中国科学院计算技术研究所 Method and system for mining keywords based on microblog content
CN105550365A (en) * 2016-01-15 2016-05-04 中国科学院自动化研究所 Visualization analysis system based on text topic model

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US6611825B1 (en) * 1999-06-09 2003-08-26 The Boeing Company Method and system for text mining using multidimensional subspaces
KR20090069874A (en) * 2007-12-26 2009-07-01 한국과학기술정보연구원 Method of selecting keyword and similarity coefficient for knowledge map analysis, and system thereof and media that can record computer program sources for method therof
JP5439261B2 (en) 2010-04-01 2014-03-12 日本電信電話株式会社 Clustering apparatus, clustering method, and clustering program
JP5545876B2 (en) 2011-01-17 2014-07-09 日本電信電話株式会社 Query providing apparatus, query providing method, and query providing program
US9477704B1 (en) * 2012-12-31 2016-10-25 Teradata Us, Inc. Sentiment expression analysis based on keyword hierarchy
TW201516713A (en) * 2013-10-16 2015-05-01 Chunghwa Telecom Co Ltd File classification method based on group characteristic values


Non-Patent Citations (1)

Title
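孙露乔 (Sun Luqiao): "文本挖掘的研究及其在主题搜索引擎中的应用" [Research on text mining and its application in topic search engines], 《中国优秀硕士学位论文全文数据库 信息科技辑》 [China Master's Theses Full-text Database, Information Science and Technology] *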

Also Published As

Publication number Publication date
CN109478191B (en) 2022-04-08
JP2018018118A (en) 2018-02-01
TW201807597A (en) 2018-03-01
KR102180487B1 (en) 2020-11-18
JP6794162B2 (en) 2020-12-02
TWI686716B (en) 2020-03-01
WO2018020842A1 (en) 2018-02-01
KR20190018480A (en) 2019-02-22

Similar Documents

Publication Publication Date Title
CN109478191A (en) Text mining method, text mining program and text mining device
Klemmer et al. Where do web sites come from? Capturing and interacting with design history
CN104036040B (en) Report form generation method and device
US7493570B2 (en) User interface options of a data lineage tool
US8081198B1 (en) Compact clustered 2-D layout
US20120054653A1 (en) Visualizing user interfaces
EP2793147A2 (en) Computer-implemented system and method for visual search construction, document triage, and coverage tracking
EP2180700A1 (en) Interface system for editing video data
CN110728124B (en) Method, apparatus, device and storage medium for visualizing electronic forms
Stigall How is biodiversity produced? Examining speciation processes during the GOBE
JP2010079534A (en) Information display apparatus, information display method, and program
CN107256266A (en) Query content display method and system
CN103309892A (en) Method and equipment for information processing and Web browsing history navigation and electronic device
Wang et al. Evaluating the effectiveness of tree visualization systems for knowledge discovery.
Elias Enhancing User Interaction with Business Intelligence Dashboards
JPH0836585A (en) Table type database working method
Flood et al. A systematic evaluation of mobile spreadsheet apps
CN102402567A (en) System and program for enumerating local alignments
CN112800246B (en) Policy pedigree construction method and device and electronic equipment
CN106294404A (en) A kind of method and apparatus retrieving location information in list
US20180095644A1 (en) Navigation of data set preparation
Nizamee et al. Visualizing the web search results with web search visualization using scatter plot
Javahery et al. Pattern-based UI design: adding rigor with user and context variables
CN110309260A (en) Text mining method, text mining storage medium and text mining device
Sajedi et al. Improving learnability and usability of software applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant