CN102937985A - Method for classifying, optimizing and analyzing website based on user mental model - Google Patents

Publication number
CN102937985A
Authority
CN
China
Prior art keywords
concept
user
cognition
matrix
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012104137748A
Other languages
Chinese (zh)
Other versions
CN102937985B (en)
Inventor
吴鹏
张佩佩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN201210413774.8A
Publication of CN102937985A
Application granted
Publication of CN102937985B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a website classification optimization analysis method based on a user mental model. The method comprises the following steps: first, preprocess the website's log data, which contain concepts that users have submitted for optimizing the website classification catalog based on their own cognition of that catalog, and extract those concepts through the preprocessing; next, using the mental model classification theory, determine the co-occurrence relations between the user-submitted concepts and the concepts in the website classification catalog, where a concept is a specific category name in the catalog, such as "books" or "daily necessities"; convert the co-occurrence relations into a co-occurrence matrix; convert the co-occurrence matrix into a similarity matrix using the Pearson correlation coefficient; and finally perform cluster analysis and multidimensional scaling analysis to analyze the similarity and spatiality among the concepts in users' cognition of the website classification catalog. Through these six steps, the method provides quantitative decision support, grounded in the website users' mental model, for optimizing the website classification catalog.

Description

A website classification optimization analysis method based on a user mental model
Technical field
The present invention relates to a method for analyzing and optimizing website classification, and in particular to such a method based on a user mental model.
Background art
Optimizing a website's information classification system means first assessing the existing classification system, then deciding whether it needs to be adjusted and, if so, how. Research on methods for optimizing website information classification systems is still scarce. It concentrates mainly on theory, such as classification bases, standards, and principles, and on specific issues such as classification depth and granularity. Most work only looks for defects in existing classification methods; few studies explore the problem in depth with the support of an established theoretical method.
Norman, in his book "The Design of Everyday Things", first proposed that three kinds of mental models exist in interaction design: the represented model, the user mental model, and the system model. He held that the closer the represented model is to the user mental model, the better the user understands the website's organizational structure and the more efficiently the user can acquire information. Therefore, when optimizing a website's information classification system, the user mental model, that is, the user's cognition of the website classification system, should be examined as far as possible.
In measuring the mental model of website users, psychological proximity data are subjective evaluations of the relationships between concepts as perceived by individuals, and "similarity" and "spatiality" are the two main angles of measurement. Most quantitative measurements of mental models start from concept similarity: the relevant concepts of the research topic are extracted, an experiment is designed using some classification technique, the subjects' similarity ratings of the concepts are collected, and the data are analyzed to characterize the users' mental model of the topic. Cluster analysis is the usual way to process concept similarity data; it groups concepts according to their similarity. The spatiality of concepts refers to the relative positions of different concepts in the subjects' psychological space (Rusbult C.E., Onizuka R.K., Lipkus I. What Do We Really Want: Mental Models of Ideal Romantic Involvement Explored through Multidimensional Scaling[J]. Journal of Experimental Social Psychology, 1993, 29(9): 493-527). Multidimensional scaling can be used to measure concept spatiality; it yields a spatial representation of the users' concepts in which their mental model of a given domain can be observed directly. No existing research, however, combines the two to measure the user mental model for website catalog optimization.
Current research on website users also remains at the stage of traditional user investigation. The methods used mainly include scenario methods, focus groups, usability testing, in-depth interviews, and observation. All of these have limitations: the valid data they yield are very limited, they are costly, and an investigation cannot cover too many questions, so the information obtained is too coarse to provide the detailed data that user behavior research actually needs.
Existing methods for optimizing website classification systems therefore still have two problems: (1) it is difficult to conduct effective user research and to collect users' cognition of the website comprehensively; (2) website classification systems are seldom optimized in a "user-centered" way.
Summary of the invention
The technical problem solved by the present invention is to provide a website classification optimization analysis method based on a user mental model.
The technical solution that achieves the object of the invention is a website classification optimization analysis method based on a user mental model, with the following steps:
Step 1: preprocess the website log data, specifically:
Step 1-1: clean the website log data by deleting from the log file any data that are irrelevant to the analysis or contain errors. Data irrelevant to the analysis include records that contain concepts already in the classification catalog and records in which products are represented by codes; data containing errors include misspellings and incorrect product descriptions. Then select the attributes needed for data processing: user name, user region, user cognition concept, and product category. The user cognition concept is a concept for optimizing the website classification catalog that the user submits based on the user's own cognition of that catalog;
Step 1-2: convert the format of the data cleaned in step 1-1 so that the three extracted attributes (user cognition concept, region, and name) have a uniform format, specifically by removing numbering, unifying letter case, and unifying singular and plural forms;
Step 1-3: determine the frequency with which each user cognition concept occurs and then set a threshold, chosen according to the actual data volume and the number of extracted user cognition concepts; select the user cognition concepts whose frequency is greater than the threshold and record their frequencies;
Step 2: determine whether each user cognition concept co-occurs with the concepts in the website classification catalog. Specifically, applying the mental model classification theory, submit each user cognition concept to the website as a search keyword and count the classification catalog concepts that appear in the search results, together with their frequencies;
Step 3: generate a co-occurrence matrix. The co-occurrence matrix is symmetric; its first row and first column list the concepts, comprising both the user cognition concepts and the concepts in the website classification catalog, and each remaining cell holds the co-occurrence frequency between the concept in its row header and the concept in its column header;
Step 4: generate a similarity matrix from the co-occurrence matrix of step 3;
Step 5: perform cluster analysis on the result of step 4. Specifically, cluster the similarity matrix with the hierarchical clustering method and then determine the clustering of the concepts from the clustering statistics; the concepts comprise the extracted user cognition concepts and the concepts in the website classification catalog;
Step 6: analyze the similarity matrix of step 4 by multidimensional scaling to obtain a multidimensional scaling space diagram of the corresponding dimensionality, thereby completing the website classification optimization analysis.
Compared with the prior art, the present invention has the following notable advantages: (1) it uses website log data directly for user research, which saves the cost of user surveys and allows user information to be obtained comprehensively; (2) it uses quantitative calculation, so the results are accurate and the final analysis results can directly inform optimization of the website classification system; (3) cluster analysis and multidimensional scaling capture the two key aspects of the user mental model, "similarity" and "spatiality", and the two sets of results can validate each other and be presented visually.
The present invention is described in further detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flowchart of the website classification optimization analysis method based on a user mental model according to the present invention.
Fig. 2 is the cluster tree of the concepts in the second-level classification catalog and the user-defined group names.
Fig. 3 is the multidimensional scaling space diagram of the concepts in the second-level classification catalog and the user-defined group names.
Detailed description
A website classification optimization analysis method based on a user mental model comprises the following steps:
Step 1: preprocess the website log data, specifically:
Step 1-1: clean the website log data by deleting from the log file any data that are irrelevant to the analysis or contain errors. Data irrelevant to the analysis include records that contain concepts already in the classification catalog and records in which products are represented by codes; data containing errors include misspellings and incorrect product descriptions. Then select the attributes needed for data processing: user name, user region, user cognition concept, and product category. The user cognition concept is a concept for optimizing the website classification catalog that the user submits based on the user's own cognition of that catalog; that is, when a user browsing the website classification catalog cannot find the most suitable concept, the user submits through the website's interactive interface the concept the user considers more suitable. For example, a user of Joyo.com who uses the classification catalog to look for books on data mining finds that such books are filed under the catalog category "database" and considers this inappropriate; the user thinks that data mining should appear directly as a category in the classification catalog. In this case "data mining" is the user cognition concept.
Step 1-2: convert the format of the data cleaned in step 1-1 so that the three extracted attributes (user cognition concept, region, and name) have a uniform format, specifically by removing numbering, unifying letter case, and unifying singular and plural forms;
Step 1-3: determine the frequency with which each user cognition concept occurs and then set a threshold, chosen according to the actual data volume and the number of extracted user cognition concepts. For example, when both the actual data volume and the number of user cognition concepts are small, a smaller threshold can be set so that a sufficient amount of data is retained. Select the user cognition concepts whose frequency is greater than the threshold and record their frequencies;
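The normalization of step 1-2 and the frequency threshold of step 1-3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `normalize` and `frequent_concepts`, the regular expression used to strip numbering, and the naive plural rule are all introduced here for illustration, and the sample names are invented.

```python
import re
from collections import Counter

def normalize(name):
    """Remove leading numbering, unify case, and crudely unify singular/plural."""
    # Assumed numbering pattern: leading digits followed by ".", ")", "-" or spaces.
    name = re.sub(r"^\s*\d+[\.\)\-\s]*", "", name)
    name = re.sub(r"\s+", " ", name).strip().lower()
    # Naive plural rule (assumption): drop one trailing "s" unless the word ends in "ss".
    if name.endswith("s") and not name.endswith("ss"):
        name = name[:-1]
    return name

def frequent_concepts(raw_names, threshold):
    """Count normalized concepts and keep those whose frequency exceeds the threshold."""
    counts = Counter(normalize(n) for n in raw_names if n.strip())
    return {concept: freq for concept, freq in counts.items() if freq > threshold}

# Invented sample data standing in for user cognition concepts taken from a log.
raw = ["01. LED Bulbs", "led bulb", "LED Bulb", "2) Torch", "torches", "Spotlight"]
print(frequent_concepts(raw, threshold=2))
```

The plural rule is deliberately simple; it leaves "torches" as "torche", so a real pipeline would need a proper lemmatizer or a manual mapping.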
Step 2: determine whether each user cognition concept co-occurs with the concepts in the website classification catalog. Specifically, applying the mental model classification theory, submit each user cognition concept to the website as a search keyword and count the classification catalog concepts that appear in the search results, together with their frequencies;
According to the mental model classification theory, when users use the classification catalog to acquire information on a website, they mainly click in a horizontal, vertical, or balanced (horizontal and vertical in equal measure) pattern, and during clicking they choose the highly correlated concepts according to the correlations between concepts in the classification catalog. Using this principle, each user cognition concept is submitted to the website as a search keyword, and the classification catalog concepts that appear in the search results are counted together with their frequencies, in order to analyze the correlation between the user cognition concepts and the concepts in the website classification catalog.
The mental model classification theory comes from Charles Cole, Yang Lin, and others, who found through experiments that the three most common types of human mental model are the vertical type (26%), the horizontal type (31%), and the balanced type (21%), which together account for 78% of people's mental model types. A mental model is classified according to the number of levels and the number of regions in its mental model diagram. The three common types have the following characteristics:
● Vertical: a mental model with more levels in the vertical dimension than in the horizontal dimension
● Horizontal: a mental model with more levels in the horizontal dimension than in the vertical dimension
● Balanced: a mental model with equal numbers of levels in the vertical and horizontal dimensions
The invention extends this theory to the process in which users acquire information through a classification catalog, on the assumption that users of a website's classification catalog also click in a vertical, horizontal, or balanced pattern.
Step 3: generate a co-occurrence matrix. The co-occurrence matrix is symmetric; its first row and first column list the concepts, comprising both the user cognition concepts and the concepts in the website classification catalog, and each remaining cell holds the co-occurrence frequency between the concept in its row header and the concept in its column header;
The co-occurrence frequencies between concepts in the co-occurrence matrix are determined as follows:
Step 3-1: determine the co-occurrence frequency between a user cognition concept and a concept in the classification catalog. There are two cases. The first is the co-occurrence frequency between a user cognition concept and a concept in the second-level classification catalog, denoted $F_1$:
$$F_1 = p \times x$$
where $p$ is the frequency with which the second-level classification catalog concept occurs in the search results and $x$ is the frequency with which the user cognition concept occurs.
The second is the co-occurrence frequency between a user cognition concept and a concept in the third-level classification catalog, which is the frequency with which the user cognition concept occurs;
Step 3-2: determine the co-occurrence frequency between two concepts in the classification catalog. For each user cognition concept, take the smaller of the two catalog concepts' co-occurrence frequencies with that user cognition concept, then sum these values over all user cognition concepts; the result is denoted $F_2$. With $m$ and $n$ denoting the co-occurrence frequencies of catalog concepts A and B with a user cognition concept, the formula is:
$$F_2 = \mathrm{SUM}(\mathrm{MIN}(m, n))$$
Step 3-3: determine the co-occurrence frequency between user cognition concepts; the co-occurrence frequency between any two user cognition concepts is 0.
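As a rough illustration of step 3, the sketch below assembles a symmetric co-occurrence matrix from the three rules above: F1 = p*x between a user cognition concept and a catalog concept, F2 = SUM(MIN(m, n)) between two catalog concepts, and 0 between user cognition concepts. The function name `build_cooccurrence`, the dictionary layout, the sample data, and the choice to leave diagonal cells at 0 are assumptions made for this example only.

```python
def build_cooccurrence(user_freq, hits, catalog):
    """
    user_freq: {user_concept: x}, the frequency of each user cognition concept.
    hits:      {user_concept: {catalog_concept: p}}, occurrences in the search results.
    catalog:   list of catalog concept names.
    Returns (labels, matrix), where matrix is a nested dict indexed by label.
    """
    users = list(user_freq)
    labels = list(catalog) + users
    # F1 = p * x for every (user concept, catalog concept) pair.
    f1 = {u: {c: hits.get(u, {}).get(c, 0) * user_freq[u] for c in catalog}
          for u in users}
    # Start from all zeros, which already covers user-user cells (step 3-3).
    matrix = {a: {b: 0 for b in labels} for a in labels}
    for u in users:
        for c in catalog:
            matrix[u][c] = matrix[c][u] = f1[u][c]
    # F2 = SUM(MIN(m, n)) over all user concepts, for each pair of catalog concepts.
    for i, a in enumerate(catalog):
        for b in catalog[i + 1:]:
            f2 = sum(min(f1[u][a], f1[u][b]) for u in users)
            matrix[a][b] = matrix[b][a] = f2
    return labels, matrix

# Invented sample data.
user_freq = {"led tube": 10, "torch": 5}
hits = {"led tube": {"interior_lighting": 1, "led_lighting": 1},
        "torch": {"led_lighting": 1, "portable_lighting": 1}}
catalog = ["interior_lighting", "led_lighting", "portable_lighting"]
labels, matrix = build_cooccurrence(user_freq, hits, catalog)
for row in labels:
    print(row, [matrix[row][col] for col in labels])
```

Storing the matrix as nested dictionaries keeps the example dependency-free; a list of rows in `labels` order is easy to derive when a numeric matrix is needed.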
Step 4: generate a similarity matrix from the co-occurrence matrix of step 3;
The similarity matrix is generated using the Pearson correlation coefficient as the similarity measure. The formula used is
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \, \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
In the formula, $r$ measures the strength of the linear correlation between two variables and satisfies $-1 \le r \le 1$ (equivalently $0 \le |r| \le 1$); $n$ is the sample size; $X_i$, $Y_i$ and $\bar{X}$, $\bar{Y}$ are the observed values and the means of the two variables, respectively.
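A minimal Python sketch of step 4 follows, computing the Pearson correlation coefficient between every pair of rows of a co-occurrence matrix. The embodiment later reports using SAS for this step, so the code is only an illustration; the function names `pearson` and `similarity_matrix`, the sample rows, and the convention of returning 0.0 for a constant row are assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    # Assumption: a constant row has undefined correlation; report 0.0 instead.
    return num / den if den else 0.0

def similarity_matrix(rows):
    """Pairwise Pearson similarity between the rows of a co-occurrence matrix."""
    return [[pearson(r1, r2) for r2 in rows] for r1 in rows]

# Invented 3x3 co-occurrence rows.
rows = [[0, 10, 0],
        [10, 0, 5],
        [0, 5, 0]]
sim = similarity_matrix(rows)
for line in sim:
    print([round(v, 3) for v in line])
```

Rows 0 and 2 are proportional, so their similarity is 1, while rows 0 and 1 are negatively correlated, which shows that the coefficient can fall below zero.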
Step 5: perform cluster analysis on the result of step 4. Specifically, cluster the similarity matrix with the hierarchical clustering method and then determine the clustering of the concepts from the clustering statistics; the concepts comprise the extracted user cognition concepts and the concepts in the website classification catalog;
Clustering the similarity matrix with the hierarchical clustering method and then determining the clustering of the concepts from the clustering statistics comprises the following steps:
Step 5-1: determine the distances between samples to form a symmetric distance matrix. Let $T_i$, $T_j$ denote samples $i$ and $j$, and let $d(T_i, T_j)$, abbreviated $d_{ij}$, denote the distance between them. The variance-weighted distance formula used is
$$d_{ij} = \left[ \sum_{k=1}^{p} \frac{(T_{ik} - T_{jk})^2}{S_k^2} \right]^{1/2}$$
where $S_k^2$ is the variance of the $k$-th variable.
Treat the $N$ samples as $N$ classes. Let $M_p$, $M_q$ denote two classes containing $N_p$, $N_q$ samples respectively, and let $D_{pq}$ denote the distance between $M_p$ and $M_q$. Compute the pairwise distances between samples to form a symmetric distance matrix D(0);
Step 5-2: merge classes and generate a new distance matrix. Specifically, select the smallest off-diagonal element of D(0) and denote it $D_{pq}$. At this point $M_p = \{X_p\}$ and $M_q = \{X_q\}$; merge $M_p$ and $M_q$ into a new class $M_r = \{X_p, X_q\}$, delete the rows and columns of D(0) corresponding to $M_p$ and $M_q$, and add one row and one column holding the distances between the new class $M_r$ and each of the remaining classes not yet merged. This gives the new distance matrix D(1), a square matrix of order $N-1$;
Step 5-3: repeat step 5-2 until the $N$ samples are merged into a single class;
Step 5-4: determine the clustering of the concepts from the statistics of the hierarchical clustering method, which include the $R^2$ statistic, the semi-partial $R^2$ statistic, the pseudo-F statistic, and the pseudo-$t^2$ statistic.
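The merge loop of steps 5-1 to 5-3 can be sketched in Python as below. This is an illustrative sketch only. Step 5-2 does not fix how the distance between a merged class and the remaining classes is computed, so complete linkage is assumed here (the embodiment reports that Ward's method gave the best result in SAS). The function names, the sample-variance estimate with an n-1 divisor, the zero-variance guard, and the sample data are likewise assumptions, and the statistics of step 5-4 are not computed.

```python
from math import sqrt

def column_variances(samples):
    """Sample variance of each variable (column); zero variance is replaced by 1.0."""
    n = len(samples)
    variances = []
    for col in zip(*samples):
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / (n - 1)
        variances.append(var if var > 0 else 1.0)
    return variances

def weighted_distance(a, b, variances):
    """Variance-weighted Euclidean distance between two samples."""
    return sqrt(sum((x - y) ** 2 / s for x, y, s in zip(a, b, variances)))

def agglomerate(samples, n_clusters=1):
    """Merge the two closest classes repeatedly until n_clusters classes remain."""
    variances = column_variances(samples)
    clusters = [[i] for i in range(len(samples))]  # each sample starts as its own class

    def class_distance(p, q):
        # Complete linkage (assumption): largest distance between members of p and q.
        return max(weighted_distance(samples[i], samples[j], variances)
                   for i in p for j in q)

    while len(clusters) > n_clusters:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        p, q = min(pairs, key=lambda ab: class_distance(clusters[ab[0]], clusters[ab[1]]))
        clusters[p] = clusters[p] + clusters[q]  # merge class q into class p
        del clusters[q]
    return clusters

# Invented sample data: two tight groups of two samples each.
samples = [[1.0, 0.9], [0.9, 1.0], [0.1, 0.0], [0.0, 0.2]]
print(agglomerate(samples, n_clusters=2))
```

Calling `agglomerate(samples)` with the default `n_clusters=1` runs the merging all the way to a single class, as step 5-3 describes; stopping earlier corresponds to cutting the cluster tree at a chosen number of classes.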
Step 6: analyze the similarity matrix of step 4 by multidimensional scaling to obtain a multidimensional scaling space diagram of the corresponding dimensionality, thereby completing the website classification optimization analysis. Analyzing the similarity matrix by multidimensional scaling to generate the space diagram comprises the following steps:
Step 6-1: generate the observation matrix. Specifically, describe the space with a Euclidean stimulus space and compute distances with the Minkowski distance function. Assume that, for the website classification catalog, the subjects' cognition of the relationships between concepts serves as the basic input data. With $n$ objects there are $n(n-1)/2$ object-pair distances $S_{ij}$, and the distance between points $i$ and $j$ is denoted $d_{ij}$. The formula used is:
$$S_{ij} = \left[ \sum_{a=1}^{v} (x_{ia} - x_{ja})^2 \right]^{1/2}$$
where $v$ is the number of dimensions, $x_{ia}$ is the coordinate of point $i$ on dimension $a$, and $x_{ja}$ is the coordinate of point $j$ on dimension $a$;
Step 6-2: homomorphic mapping. Specifically, seek a reduced $q$-dimensional space and apply a homomorphic mapping so that the distance $d_{ij}$ between objects in the $q$-dimensional space matches the original distance $S_{ij}$ in the $p$-dimensional space. If $d_{ij}$ and $S_{ij}$ match completely, the distances between paired objects satisfy $d_{i1} > d_{i2} > \dots > d_{im}$; that is, this descending order of distances is consistent with the ascending order of the original similarities;
Step 6-3: check reliability and validity and determine the optimal number of dimensions. Specifically, compute the difference degree K, called the Kruskal coefficient, which is used to check whether the resulting space diagram is validly representative, and the stress index Stress, which is a goodness-of-fit value defined as the deviation between the theoretical distances represented by the similarity rating data and the computed distances. Stress is given by:
$$\mathrm{Stress} = \sqrt{\frac{\sum_i \sum_j (d_{ij} - \hat{d}_{ij})^2}{\sum_i \sum_j d_{ij}^2}}$$
where $\hat{d}_{ij}$ is the reference value that satisfies the rank order of the subjects' original input concept distances while also minimizing the stress index. The larger the K value the better; a value of 0.60 or above is generally acceptable. A Stress value within 0.20 is generally acceptable. Table 1 gives the relationship between the stress index and goodness of fit.
Table 1. Stress index and goodness of fit
Stress   Goodness of fit
0.200    Poor
0.100    Fair
0.050    Good
0.025    Excellent
0.000    Perfect
Step 6-4: generate the multidimensional scaling space diagram using the optimal number of dimensions determined in step 6-3.
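The stress check of step 6-3 can be illustrated with the short Python sketch below. It assumes that the fitted distances and the reference values are already available as square matrices; producing them (the scaling itself) is not shown. The square root follows Kruskal's usual definition of stress, and the function names `stress` and `is_acceptable` and the sample matrices are hypothetical.

```python
from math import sqrt

def stress(d, d_hat):
    """Stress between fitted distances d and reference values d_hat (square matrices)."""
    n = len(d)
    num = sum((d[i][j] - d_hat[i][j]) ** 2 for i in range(n) for j in range(n))
    den = sum(d[i][j] ** 2 for i in range(n) for j in range(n))
    return sqrt(num / den)

def is_acceptable(value, limit=0.20):
    """A stress value within 0.20 is treated as generally acceptable."""
    return value <= limit

# Invented 3-object example: one pair of distances deviates from its reference value.
d = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 1.0],
     [2.0, 1.0, 0.0]]
d_hat = [[0.0, 1.0, 2.0],
         [1.0, 0.0, 1.2],
         [2.0, 1.2, 0.0]]
s = stress(d, d_hat)
print(round(s, 4), is_acceptable(s))
```

In practice the same check would be repeated for each candidate number of dimensions, keeping the smallest dimensionality whose stress is acceptable.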
The present invention is described in further detail below with reference to an embodiment:
Research objective: analyze and optimize the classification catalog of lighting products on Made-in-China.com.
Data: user-defined group name data (6872 records) from users in the four provinces and municipalities of Zhejiang, Shanghai, Jiangsu, and Guangdong, under the Lights & Lighting category of the product classification catalog of Made-in-China.com (international site, http://www.made-in-china.com/). Made-in-China.com refers to user cognition concepts as user-defined group names.
Step 1: preprocess the website log data, specifically:
1) after cleaning the website log data, select the attributes needed for data processing, namely company name, province, city, and user-defined group name; the resulting format is shown in Table 2:
Table 2. Data format after data cleaning [table supplied as an image in the original publication]
2) first remove the numbering contained in the user-defined group names, then convert the names to lowercase, remove plural forms, and sort them alphabetically by first letter;
3) because very many user-defined group names have a low frequency, the threshold is set to 4 and the user-defined group names with a frequency greater than 4 are selected; in the end 114 user-defined group names are selected and their frequencies recorded. The selected names are shown in Table 3:
Table 3. Selected user-defined group names [table supplied as an image in the original publication]
Step 2: determine whether the user-defined group names co-occur with the concepts in the website classification catalog. The procedure is as follows:
1) open the international site of Made-in-China.com, http://www.made-in-china.com/;
2) enter the user-defined group name to be searched in the search box, select "Lights & Lighting" in the "all categories" drop-down menu, and click search;
3) count the second-level classification catalog concepts that appear under "catalog" in the search results;
4) click in turn each concept that appears under "catalog"; the concepts that then appear under "catalog" are the corresponding third-level classification catalog concepts;
Record the second-level and third-level classification catalog concepts that appear under "catalog" by entering 1 in the corresponding cells. This gives the original co-occurrence statistics table, part of which is shown in Table 4:
Table 4. Partial co-occurrence statistics
In the subsequent processing, the user-defined group names are handled in the same way for second-level and third-level classification catalog concepts; the co-occurrence between second-level classification catalog concepts and user-defined group names is used as the example below.
Step 3: generate the co-occurrence matrix, specifically:
1) determine the co-occurrence frequency between each second-level classification catalog concept and each user-defined group name, by multiplying the frequency of the user-defined group name by the frequency with which the second-level classification catalog concept occurs; part of the result is shown in Table 5:
Table 5. Partial co-occurrence frequencies between second-level classification catalog concepts and user-defined group names [table supplied as an image in the original publication]
2) determine the co-occurrence frequency between concepts in the second-level classification catalog;
This is calculated from the result of the previous step. For example, to obtain the co-occurrence frequency of Interior lighting and LED lighting, whose columns in Excel are named B and C, the formula is SUM(MIN(B, C)): first take the smaller of the two columns' values in each row, then sum;
3) the co-occurrence frequencies between user-defined group names are all set to 0. The resulting co-occurrence matrix is shown in Table 6:
Table 6. Partial co-occurrence matrix of second-level classification catalog concepts and user-defined group names
                           interior_lighting  led_lighting  lighting_fixtures  bulb_lamp  lighting_decoration
interior_lighting                          -         14441               6587      11403                10697
led_lighting                           14441             -               6643      12204                11108
lighting_fixtures                       6587          6643                  -       6467                 5836
bulb_lamp                              11403         12204               6467          -                 9433
lighting_decoration                    10697         11108               5836       9433                    -
outdoor_lighting                       14640         17255               6620      12498                11189
camping_light                           1116          1116                995       1100                 1110
emergency_indicator_light               2245          2240               2129       2226                 2205
torch                                    653           653                582        645                  648
portable_lighting                       1364          1388               1289       1356                 1353
Step 4: generate the similarity matrix. SAS software is used with the Pearson correlation coefficient to compute the similarity matrix; part of the result is shown in Table 7:
Table 7. Partial similarity matrix
Step 5: cluster analysis. SAS software is used with the hierarchical clustering method. The ward, complete, and single methods, among others, were tried for the between-class distance; comparison showed that method=ward gave the best result. Two classes are merged at each step, and the results of the last 15 merges are shown in Table 8:
Table 8. SAS clustering results with method=ward
Based on three statistics of the hierarchical clustering method, the semi-partial R² statistic (SPRSQ), the pseudo-F statistic (PSF), and the pseudo-t² statistic (PST2), the optimal number of classes is chosen as 4. The clustering covers 127 concepts in total, of which 114 are user-defined group names and 13 are second-level classification catalog concepts; with the optimal number of classes of 4, the 13 second-level catalog concepts fall into two of the classes. The clustering result is shown in Table 9 (bold entries are second-level classification catalog concepts; entries not in bold are user-defined group names).
Table 9. Clustering result for user-defined group names and second-level catalog concepts [table supplied as an image in the original publication]
The four classes marked in the cluster tree of Fig. 2 correspond to the clustering result in Table 9. Concepts grouped into the same class are those with the greatest correlation among themselves. For example, in the fourth class, led_plug_light, induction_lamp, led_module, led_rigid_bar, led_moving_head, led_rope_light, led_dance_floor, and led_recessed_light are grouped into one class, which indicates that, among all concepts, these eight are the most strongly correlated with one another and can be placed in the same category.
Step 6: multidimensional scaling. In this example the multidimensional scaling function of SAS software is used directly; multidimensional scaling both verifies the accuracy of the clustering result and presents it visually.
To make the multidimensional scaling result clearer, the concepts are replaced by variables X1 to X127, numbered consistently with the concept sequence numbers in the clustering result. The multidimensional scaling space diagram shows that the 127 concepts are divided into four classes, which directly confirms the clustering result and also presents it intuitively, thereby completing the classification catalog optimization analysis for lighting products on Made-in-China.com.
As the above example shows, the method of the present invention uses website log data directly for user research, saves the cost of user surveys, and can obtain user information comprehensively.

Claims (6)

1. A website classification optimization analysis method based on a user mental model, characterized by the following steps:
Step 1: preprocess the website log data, specifically:
Step 1-1: cleanse the website log data by deleting from the log file any data that are irrelevant to the purpose of the analysis or that contain errors. The irrelevant data comprise: data containing concepts that are already in the classified catalog, and data in which products are represented by codes. The erroneous data comprise: misspellings and product description errors. Then select the attributes needed for data processing, namely user name, user region, user cognition concept, and product category, where a user cognition concept is a concept about catalog optimization that a user submits based on his or her cognition of the website catalog;
Step 1-2: convert the format of the data cleansed in Step 1-1, unifying the format of the three extracted attributes (user cognition concept, region, and name), specifically by removing numbering, unifying letter case, and unifying singular and plural forms;
Step 1-3: determine the frequency with which each user cognition concept occurs, set a threshold, then select the user cognition concepts whose frequency exceeds the threshold and record their frequencies;
Step 2: determine whether user cognition concepts co-occur with concepts in the website catalog; specifically, applying mental model classification theory, use each user cognition concept as a search keyword to search the website, and count the classified-catalog concepts that appear in the search results together with their frequencies;
Step 3: generate a co-occurrence matrix. The co-occurrence matrix is symmetric; its first row and first column hold the concepts, comprising both the user cognition concepts and the concepts in the website catalog, and each remaining cell holds the co-occurrence frequency between the concept in the corresponding first-row cell and the concept in the corresponding first-column cell;
Step 4: generate a similarity matrix from the co-occurrence matrix of Step 3;
Step 5: carry out cluster analysis on the basis of Step 4; specifically, cluster the similarity matrix using the hierarchical clustering method, then determine the clustering result of the concepts according to the clustering statistics, the concepts comprising the extracted user cognition concepts and the concepts in the website catalog;
Step 6: analyze the similarity matrix of Step 4 using multidimensional scaling to obtain the multidimensional scaling space diagram of the corresponding dimensionality, thereby completing the website classification optimization analysis.
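A minimal sketch of Steps 1-2 and 1-3 (format unification and frequency thresholding) might look as follows. The function names `normalize` and `frequent_concepts`, the numbering pattern, and the naive plural rule are illustrative assumptions; the claim does not prescribe how numbering, letter case, or singular and plural forms are unified.

```python
import re
from collections import Counter


def normalize(concept: str) -> str:
    """Unify one raw concept string (Step 1-2).

    Assumption: "numbering" means a leading integer followed by at least
    one separator such as ".", ")", "-", or whitespace.
    """
    text = re.sub(r"^\s*\d+[.)\-\s]+", "", concept).strip().lower()
    # Naive singular/plural unification (assumption, not a real stemmer):
    # drop one trailing "s" unless the word ends in "ss".
    if text.endswith("s") and not text.endswith("ss"):
        text = text[:-1]
    return text


def frequent_concepts(raw_concepts, threshold):
    """Count normalized concepts and keep those whose frequency
    is strictly greater than the threshold (Step 1-3)."""
    counts = Counter(normalize(c) for c in raw_concepts)
    return {c: n for c, n in counts.items() if n > threshold}


if __name__ == "__main__":
    raw = ["1. LED Lamps", "led lamp", "2) Led Lamp", "Glass", "ceiling lights"]
    print(frequent_concepts(raw, threshold=1))
```

The retained dictionary maps each surviving user cognition concept to its recorded frequency, which is the value x used later when co-occurrence frequencies are computed.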
2. The website classification optimization analysis method based on a user mental model according to claim 1, characterized in that, in Step 2, according to mental model classification theory, when users use the classified catalog on the website to acquire information they mainly adopt horizontal, vertical, and balanced horizontal-vertical click patterns, and during clicking they select concepts of high relevance according to the relevance between concepts in the classified catalog; using this principle, each user cognition concept is used as a search keyword to search the website, and the classified-catalog concepts appearing in the search results and their frequencies are counted, in order to analyze the relevance between the user cognition concepts and the concepts in the website catalog.
3. The website classification optimization analysis method based on a user mental model according to claim 1, characterized in that the co-occurrence frequencies between concepts in the co-occurrence matrix of Step 3 are determined as follows:
Step 3-1: determine the co-occurrence frequency between user cognition concepts and classified-catalog concepts, in two cases. The first is the co-occurrence frequency between a user cognition concept and a concept in the second-level classified catalog, denoted $F_1$:
$$F_1 = p \times x$$
where $p$ is the frequency with which the second-level catalog concept occurs in the search results, and $x$ is the frequency with which the user cognition concept occurs.
The second is the co-occurrence frequency between a user cognition concept and a concept in the third-level classified catalog, which equals the frequency with which the user cognition concept occurs;
Step 3-2: determine the co-occurrence frequency between concepts in the classified catalog: for two classified-catalog concepts A and B, take, for every user cognition concept, the smaller of the two concepts' co-occurrence frequencies with that user cognition concept, then sum these values; the result is denoted $F_2$. With $m$ and $n$ denoting the co-occurrence frequencies of A and B, respectively, with a user cognition concept, the formula is:
$$F_2 = \mathrm{SUM}(\mathrm{MIN}(m, n))$$
Step 3-3: determine the co-occurrence frequency between user cognition concepts, which is 0.
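The rules of Steps 3-1 to 3-3 can be sketched as a small matrix builder. This is an illustration under stated assumptions: the function name `cooccurrence_matrix`, the input layout (`user_freq` mapping each user cognition concept to x, `hits` mapping a concept pair to p), and the sample data are hypothetical, and only the second-level case F1 = p * x is shown.

```python
def cooccurrence_matrix(user_freq, hits, catalog):
    """Build a symmetric co-occurrence matrix.

    user_freq: {user_concept: x}, frequency of each user cognition concept.
    hits: {(user_concept, catalog_concept): p}, frequency of the catalog
          concept in the search results for that user concept.
    catalog: list of classified-catalog concepts.
    Returns (labels, matrix) with user concepts first, then catalog concepts.
    """
    users = list(user_freq)
    labels = users + list(catalog)
    idx = {c: i for i, c in enumerate(labels)}
    size = len(labels)
    mat = [[0] * size for _ in range(size)]

    # Step 3-1: F1 = p * x for each (user concept, catalog concept) pair.
    for (u, c), p in hits.items():
        f1 = p * user_freq[u]
        mat[idx[u]][idx[c]] = mat[idx[c]][idx[u]] = f1

    # Step 3-2: F2 = SUM(MIN(m, n)) over all user concepts, where m and n
    # are the co-occurrence frequencies of catalog concepts A and B with
    # the same user concept.
    for a in catalog:
        for b in catalog:
            if a != b:
                mat[idx[a]][idx[b]] = sum(
                    min(mat[idx[u]][idx[a]], mat[idx[u]][idx[b]]) for u in users
                )

    # Step 3-3: user-to-user cells are left at 0.
    return labels, mat


if __name__ == "__main__":
    user_freq = {"led lamp": 3, "desk light": 2}
    catalog = ["lamps", "bulbs"]
    hits = {("led lamp", "lamps"): 2, ("led lamp", "bulbs"): 1,
            ("desk light", "lamps"): 1}
    labels, mat = cooccurrence_matrix(user_freq, hits, catalog)
    for label, row in zip(labels, mat):
        print(label, row)
```

Because the catalog-to-catalog cells are derived only from the user-to-catalog cells, they can be filled in place after Step 3-1 without affecting the values they read.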
4. The website classification optimization analysis method based on a user mental model according to claim 1, characterized in that Step 4 generates the similarity matrix using the Pearson correlation coefficient as the similarity measure, with the formula
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
where $r$ measures the strength of the linear correlation between two variables and satisfies $0 \le |r| \le 1$; $n$ is the sample size; and $X_i$, $Y_i$ and $\bar{X}$, $\bar{Y}$ are the observed values and the means of the two variables, respectively.
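As a sketch of how the Pearson formula could be applied to a co-occurrence matrix, the following code treats each row as one variable and correlates every pair of rows. Treating rows as the variables and returning 0.0 for a constant row (where the coefficient is undefined) are assumptions of this example, and the function names are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    # Assumption: a constant row has no defined correlation; report 0.0.
    return num / den if den else 0.0


def similarity_matrix(cooccurrence):
    """Correlate every pair of rows of a co-occurrence matrix."""
    return [[pearson(r1, r2) for r2 in cooccurrence] for r1 in cooccurrence]


if __name__ == "__main__":
    mat = [[0, 0, 6, 3], [0, 0, 2, 0], [6, 2, 0, 3], [3, 0, 3, 0]]
    for row in similarity_matrix(mat):
        print([round(v, 3) for v in row])
```

The resulting matrix is symmetric, and every row that is not constant has 1 on the diagonal.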
5. The website classification optimization analysis method based on a user mental model according to claim 1, characterized in that, in Step 5, clustering the similarity matrix using the hierarchical clustering method and then determining the clustering result of the concepts according to the clustering statistics specifically comprises the following steps:
Step 5-1: determine the distances between samples and form a symmetric distance matrix. Let $T_i$, $T_j$ denote samples $i$ and $j$, and let $d(T_i, T_j)$ denote the distance between $i$ and $j$, abbreviated $d_{ij}$. The variance-weighted distance formula is
$$d_{ij} = \left[ \sum_{k=1}^{p} \frac{(T_{ik} - T_{jk})^2}{S_k^2} \right]^{1/2}$$
Treat the $N$ samples as $N$ classes. Let $M_p$, $M_q$ denote two classes containing $N_p$, $N_q$ samples respectively, and let $D_{pq}$ be the distance between $M_p$ and $M_q$. Compute the pairwise distances between samples to form a symmetric distance matrix $D(0)$;
Step 5-2: merge classes and generate a new distance matrix. Specifically, select the smallest off-diagonal element of $D(0)$; if this element is $D_{pq}$, where at this point $M_p = \{X_p\}$ and $M_q = \{X_q\}$, merge $M_p$ and $M_q$ into a new class $M_r = \{X_p, X_q\}$; delete the rows and columns corresponding to $M_p$ and $M_q$ from $D(0)$, and add a row and a column formed by the distances between the new class $M_r$ and the remaining classes not yet merged, obtaining a new distance matrix $D(1)$, which is a square matrix of order $N-1$;
Step 5-3: repeat Step 5-2 until the $N$ samples are merged into one large class;
Step 5-4: determine the clustering result of the concepts according to the statistics of the hierarchical clustering method, the statistics comprising: the $R^2$ statistic, the semi-partial $R^2$ statistic, the pseudo-F statistic, and the pseudo-$t^2$ statistic.
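Steps 5-1 to 5-3 can be sketched as a plain agglomerative loop. Two points are assumptions of this example rather than statements of the claim: $S_k^2$ is taken to be the sample variance of variable $k$, and the distance between two classes is taken as the minimum pairwise distance (single linkage), since the claim does not name a linkage rule. The statistics of Step 5-4 are not computed here.

```python
import math


def column_variances(samples):
    """Sample variance of each variable (assumed meaning of S_k^2).
    Every variable must have non-zero variance."""
    n = len(samples)
    variances = []
    for col in zip(*samples):
        mean = sum(col) / n
        variances.append(sum((v - mean) ** 2 for v in col) / (n - 1))
    return variances


def weighted_distance(ti, tj, variances):
    """Variance-weighted distance d_ij between two samples (Step 5-1)."""
    return math.sqrt(
        sum((a - b) ** 2 / s2 for a, b, s2 in zip(ti, tj, variances))
    )


def agglomerate(samples):
    """Merge the two closest classes until one class remains.

    Returns the merge history as (class_a, class_b, distance) tuples,
    where each class is a tuple of sample indices.
    """
    s2 = column_variances(samples)
    clusters = [(i,) for i in range(len(samples))]

    def dist(ca, cb):
        # Single linkage (assumption): closest pair across the two classes.
        return min(
            weighted_distance(samples[i], samples[j], s2) for i in ca for j in cb
        )

    history = []
    while len(clusters) > 1:
        # Step 5-2: pick the smallest off-diagonal distance and merge.
        a, b = min(
            ((p, q) for i, p in enumerate(clusters) for q in clusters[i + 1:]),
            key=lambda pq: dist(*pq),
        )
        history.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history


if __name__ == "__main__":
    for step in agglomerate([[0, 0], [0, 1], [10, 10], [10, 12]]):
        print(step)
```

For N samples the history always holds N-1 merges, matching the shrinking sequence of distance matrices D(0), D(1), and so on described in Step 5-2.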
6. The website classification optimization analysis method based on a user mental model according to claim 1, characterized in that, in Step 6, analyzing the similarity matrix using multidimensional scaling and generating the multidimensional scaling space diagram specifically comprises the following steps:
Step 6-1: generate the observation matrix. Specifically, describe the space using a Euclidean stimulus space, computed with the Minkowski distance function: assume that the subjects' cognition of the relationships among concepts in the website catalog serves as the basic input data; given $n$ objects, a distance $S_{ij}$ is obtained for each pair of objects, and the distance between points $i$ and $j$ is denoted $d_{ij}$. The formula is:
$$S_{ij} = \left[ \sum_{a=1}^{v} (x_{ia} - x_{ja})^2 \right]^{1/2}$$
where $v$ is the number of dimensions, $x_{ia}$ is the coordinate of point $i$ on dimension $a$, and $x_{ja}$ is the coordinate of point $j$ on dimension $a$;
Step 6-2: homomorphic mapping. Specifically, seek a reduced $q$-dimensional space and apply a homomorphic mapping so that the object-pair distances $d_{ij}$ in the $q$-dimensional space match the original distances $S_{ij}$ in the $p$-dimensional space; if $d_{ij}$ matches $S_{ij}$ completely, the distances between the paired objects satisfy $d_{i1} > d_{i2} > \dots > d_{im}$, that is, this descending order of distances is consistent with the ascending order of the original similarities;
Step 6-3: test reliability and validity and determine the optimal number of dimensions. Specifically, compute the degree of difference K, called the Kruskal coefficient, which is used to check whether the resulting space diagram is validly representative, and the stress index Stress, i.e., the goodness-of-fit value, defined as the deviation between the theoretical distances represented by the similarity assessment data and the computed distances. The formula for Stress is:
$$\mathrm{Stress} = \frac{\sum_{i} \sum_{j} (d_{ij} - \hat{d}_{ij})^2}{\sum_{i} \sum_{j} d_{ij}^2}$$
where $\hat{d}_{ij}$ is the reference value that satisfies the order relation of the subjects' original input concept distances while minimizing the stress index;
Step 6-4: generate the multidimensional scaling space diagram according to the optimal number of dimensions determined in Step 6-3.
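The distance of Step 6-1 and the fit measure of Step 6-3 can be sketched as follows. The code evaluates Stress as a plain ratio of sums, following the formula as it appears in this claim; Kruskal's stress-1 is commonly written as the square root of this ratio, so values are not directly comparable with tools that report the rooted form. The function names and the sample configuration are illustrative assumptions, and finding the configuration itself (Step 6-2) is not shown.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Euclidean distance between every pair of points, keyed by (i, j), i < j."""
    return {
        (i, j): math.dist(coords[i], coords[j])
        for i, j in combinations(range(len(coords)), 2)
    }


def stress(d, d_hat):
    """Ratio of squared deviations to squared distances.

    d: distances computed from the low-dimensional configuration.
    d_hat: reference distances for the same pairs.
    Summing over i < j instead of all ordered pairs leaves the ratio unchanged.
    """
    num = sum((d[k] - d_hat[k]) ** 2 for k in d)
    den = sum(v ** 2 for v in d.values())
    return num / den


if __name__ == "__main__":
    config = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    d = pairwise_distances(config)
    reference = {(0, 1): 4.0, (0, 2): 10.0, (1, 2): 5.0}
    print(stress(d, reference))
```

One way to use this for Step 6-3 is to compute the value for candidate configurations of increasing dimensionality and keep the smallest dimensionality whose value is acceptably low; the acceptance threshold is a choice the claim leaves open.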
CN201210413774.8A 2012-10-25 2012-10-25 Website classification optimization analysis method based on a user mental model Active CN102937985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210413774.8A CN102937985B (en) 2012-10-25 2012-10-25 A kind of websites collection method for optimization analysis based on user's mental model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210413774.8A CN102937985B (en) 2012-10-25 2012-10-25 A kind of websites collection method for optimization analysis based on user's mental model

Publications (2)

Publication Number Publication Date
CN102937985A true CN102937985A (en) 2013-02-20
CN102937985B CN102937985B (en) 2016-07-06

Family

ID=47696882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210413774.8A Active CN102937985B (en) 2012-10-25 2012-10-25 Website classification optimization analysis method based on a user mental model

Country Status (1)

Country Link
CN (1) CN102937985B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793504A (en) * 2014-01-24 2014-05-14 北京理工大学 Cluster initial point selection method based on user preference and project properties
CN104199828A (en) * 2014-07-26 2014-12-10 复旦大学 Method for establishing social network based on transaction log data
CN106202572A (en) * 2016-08-18 2016-12-07 广州视睿电子科技有限公司 E-book catalog indication method and device
CN109166180A (en) * 2018-08-03 2019-01-08 贵州大学 VR system user Experience design method under mental model driving
CN112347318A (en) * 2020-10-26 2021-02-09 杭州数智政通科技有限公司 Method, device and medium for dividing industry classes of enterprises

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
吴鹏等: "《网站用户信息获取中的心智模型研究》", 《情报学报》 *
甘利人等: "《信息用户检索决策中的心智模型分析》", 《信息用户检索决策中的心智模型分析》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793504A (en) * 2014-01-24 2014-05-14 北京理工大学 Cluster initial point selection method based on user preference and project properties
CN103793504B (en) * 2014-01-24 2018-02-27 北京理工大学 A kind of cluster initial point system of selection based on user preference and item attribute
CN104199828A (en) * 2014-07-26 2014-12-10 复旦大学 Method for establishing social network based on transaction log data
CN104199828B (en) * 2014-07-26 2017-07-07 复旦大学 A kind of community network construction method based on transaction journal data
CN106202572A (en) * 2016-08-18 2016-12-07 广州视睿电子科技有限公司 E-book catalog indication method and device
CN109166180A (en) * 2018-08-03 2019-01-08 贵州大学 VR system user Experience design method under mental model driving
CN109166180B (en) * 2018-08-03 2022-12-13 贵州大学 VR system user experience design method under drive of mental model
CN112347318A (en) * 2020-10-26 2021-02-09 杭州数智政通科技有限公司 Method, device and medium for dividing industry classes of enterprises
CN112347318B (en) * 2020-10-26 2022-08-02 杭州数智政通科技有限公司 Method, device and medium for dividing industry classes of enterprises

Also Published As

Publication number Publication date
CN102937985B (en) 2016-07-06

Similar Documents

Publication Publication Date Title
CN103823896B (en) Subject characteristic value algorithm and subject characteristic value algorithm-based project evaluation expert recommendation algorithm
CN106055539B (en) The method and apparatus that name disambiguates
CN101409634B (en) Quantitative analysis tools and method for internet news influence based on information retrieval
Baccini et al. How many performance measures to evaluate information retrieval systems?
CN103559191B (en) Based on latent space study and Bidirectional sort study across media sort method
CN106339502A (en) Modeling recommendation method based on user behavior data fragmentation cluster
CN102937985B (en) A kind of websites collection method for optimization analysis based on user's mental model
CN107203849B (en) Regional talent supply quantitative analysis method based on big data
CN101751447A (en) Network image retrieval method based on semantic analysis
CN107016501A (en) A kind of efficient industrial big data multidimensional analysis method
Li et al. Identifying patent conflicts: TRIZ-led patent mapping
CN105354325A (en) Document retrieval and analysis system
CN101702167A (en) Method for extracting attribution and comment word with template based on internet
CN111192176A (en) Online data acquisition method and device supporting education informatization assessment
CN106354799A (en) Subject data set multi-layer facet filtration method and system based on data quality
CN106611016A (en) Image retrieval method based on decomposable word pack model
CN102902984B (en) Remote-sensing image semi-supervised projection dimension reducing method based on local consistency
Petrovich et al. Exploring knowledge dynamics in the humanities. Two science mapping experiments
Sahu et al. Image Mining: A New Approach for data mining based on texture
CN106941419B (en) visual analysis method and system for network architecture and network communication mode
Ritze Web-scale web table to knowledge base matching
CN106709824A (en) Method for architecture evaluation based on network text semantic analysis
Vasconcelos et al. The utility of open-access biodiversity information in representing anurans in the Brazilian Atlantic Forest and Cerrado
Murata Modularities for bipartite networks
Xiaohuan et al. Visual exploration for time series data using multivariate analysis method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Wu Peng

Inventor after: Zhang Peipei

Inventor after: Zhang Jingjing

Inventor before: Wu Peng

Inventor before: Zhang Peipei

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: WU PENG ZHANG PEIPEI TO: WU PENG ZHANG PEIPEI ZHANG JINGJING

C53 Correction of patent of invention or patent application
CB03 Change of inventor or designer information

Inventor after: Wu Peng

Inventor after: Zhang Peipei

Inventor after: Zhang Jingjing

Inventor after: Zeng Huaxiang

Inventor before: Wu Peng

Inventor before: Zhang Peipei

Inventor before: Zhang Jingjing

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: WU PENG ZHANG PEIPEI ZHANG JINGJING TO: WU PENG ZHANG PEIPEI ZHANG JINGJING CENG HUAXIANG

C14 Grant of patent or utility model
GR01 Patent grant