CN102937985B

CN102937985B - A website classification optimization analysis method based on users' mental models

Info

Publication number: CN102937985B
Application number: CN201210413774.8A
Authority: CN
Inventors: Wu Peng (吴鹏); Zhang Peipei (张佩佩); Zhang Jingjing (张晶晶); Zeng Huaxiang (曾华翔)
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2012-10-25
Filing date: 2012-10-25
Publication date: 2016-07-06
Anticipated expiration: 2032-10-25
Also published as: CN102937985A

Abstract

The invention discloses a website classification optimization analysis method based on users' mental models. The method first preprocesses web log data. These are log records in which users, drawing on their cognition of the website classification catalogue, submit concepts for optimizing that catalogue, and preprocessing extracts these concepts from the logs. Next, using mental model classification theory, the method determines the co-occurrence relations between these concepts and the concepts in the website classification catalogue, where a catalogue concept is a specific category name such as books or daily necessities. The co-occurrence relations are then converted into a co-occurrence matrix, which the Pearson correlation coefficient converts into a similarity matrix. Finally, cluster analysis and multidimensional scaling are performed to analyze the similarity and spatial relations among the concepts in users' cognition of the website catalogue. Together these six steps provide quantitative decision support for optimizing website classification based on website users' mental models.

Description

A website classification optimization analysis method based on users' mental models

Technical field

The present invention relates to a website classification optimization analysis method, and in particular to a website classification optimization analysis method based on users' mental models.

Background art

Optimizing a website's information classification system means first evaluating the existing system and then deciding whether it should be adjusted and, if so, how. Research on such optimization methods is still scarce. It concentrates mainly on theoretical questions such as classification bases, standards, and principles, and on specific issues such as classification levels and granularity. It also tends merely to look for defects in existing classification methods. Few studies carry out in-depth exploration supported by a well-defined theoretical method.

In his book "The Design of Everyday Things", Norman first proposed that interaction design involves three mental models: the presentation model, the user's mental model, and the system model. He argued that the closer the presentation model is to the user's mental model, the better users understand how a website is organized and the more efficiently they acquire information. Optimizing a website's information classification system should therefore take the user's mental model into account as far as possible, that is, the user's cognition of the website classification system.

When measuring the mental models of website users, psychological proximity data are individuals' subjective assessments of the relations between concepts as they perceive them. "Similarity" and "spatiality" are the two main measurement perspectives.

Quantitative measurement of mental models generally starts from concept similarity. The relevant concepts of the research topic are extracted, and experiments are designed with various sorting techniques to obtain subjects' similarity ratings of the concepts. These data are then analyzed to characterize users' mental models of the topic. Cluster analysis is commonly used to process concept similarity data and to group concepts by similarity.

The spatiality of concepts refers to the relative positions of different concepts in the subject's psychological space (Rusbult C.E., Onizuka R.K., Lipkus I. What Do We Really Want?: Mental Models of Ideal Romantic Involvement Explored through Multidimensional Scaling [J]. Journal of Experimental Social Psychology, 1993, 29(9): 493-527). Multidimensional scaling can measure this concept space, yielding a spatial representation of users' concepts and an intuitive view of users' mental model of a given domain.

However, no existing research combines the two methods to quantitatively measure users' mental models for optimizing website classification catalogues.

Current research on website users mostly remains at the stage of traditional user surveys. It relies mainly on field studies, focus groups, usability testing, in-depth interviews, and observation. All of these methods have limitations: the valid data they yield are very limited and costly to collect. Because a survey cannot include too many questions, the information obtained is rather general and rarely detailed enough to be useful for research on user behavior.

Website classification system optimization methods therefore still have two problems: (1) it is difficult to conduct effective user research and to collect users' cognition of a website comprehensively; (2) optimization is seldom carried out in a user-centered way.

Summary of the invention

The technical problem solved by the invention is to provide a website classification optimization analysis method based on users' mental models.

The technical solution that achieves this object is a website classification optimization analysis method based on users' mental models, comprising the following steps:

Step 1, preprocess the web log data, as follows:

Step 1-1, cleanse the web log data by deleting records that are irrelevant to the analysis or that contain errors. Irrelevant records include records whose concepts already appear in the classification catalogue and records expressed as product codes. Erroneous records include misspellings and incorrect product descriptions. Then select the attributes needed for data processing: user name, user region, user cognitive concept, and product category. A user cognitive concept is a concept that a user submits, based on his or her cognition of the website classification catalogue, for optimizing that catalogue;

Step 1-2, convert the format of the data cleansed in Step 1-1 by unifying the format of three attributes: the extracted user cognitive concept, the region, and the name. Specifically, remove numbering, unify upper and lower case, and unify singular and plural forms;

Step 1-3, count the frequency of each user cognitive concept and then set a threshold, determined by the actual data volume and the number of extracted user cognitive concepts. Select the user cognitive concepts whose frequency exceeds the threshold and record their frequencies;
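Steps 1-1 to 1-3 can be sketched in Python as follows. The source does not specify three details, so they are assumptions made for illustration: the product-code pattern, the punctuation handling, and the naive plural-to-singular rule. A real pipeline would more likely use a proper lemmatizer.

```python
import re
from collections import Counter

# Hypothetical heuristic for "data represented by product codes", e.g. "AB-1234".
PRODUCT_CODE = re.compile(r"[A-Za-z]{0,4}[-_]?\d{3,}[A-Za-z0-9-]*")


def normalize_concept(raw: str) -> str:
    """Step 1-2: remove numbering, unify case, reduce plurals to singular."""
    text = re.sub(r"\d+", " ", raw)            # remove numbering such as "01"
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # drop punctuation (assumption)
    words = []
    for w in text.lower().split():
        # Naive singularization (assumption): "accessories" -> "accessory",
        # "lamps" -> "lamp"; words ending in "ss" are left unchanged.
        if len(w) > 3 and w.endswith("ies"):
            w = w[:-3] + "y"
        elif len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
            w = w[:-1]
        words.append(w)
    return "_".join(words)


def select_concepts(records, catalogue_concepts, threshold):
    """Steps 1-1 and 1-3: filter records, count concepts, keep frequency > threshold."""
    catalogue = {normalize_concept(c) for c in catalogue_concepts}
    counts = Counter()
    for raw in records:
        if PRODUCT_CODE.fullmatch(raw.strip()):
            continue                            # product-code record: irrelevant
        concept = normalize_concept(raw)
        if not concept or concept in catalogue:
            continue                            # empty, or already a catalogue concept
        counts[concept] += 1
    return {c: n for c, n in counts.items() if n > threshold}


# Usage with made-up records (not data from the source):
records = ["LED Strip Lights"] * 4 + ["led strip light 01", "Interior Lighting",
                                      "AB-1234", "Solar Lamps", "solar lamp"]
print(select_concepts(records, ["Interior Lighting"], 4))  # {'led_strip_light': 5}
```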

Step 2, determine whether each user cognitive concept co-occurs with the concepts in the website classification catalogue. Guided by mental model classification theory, use each user cognitive concept as a search keyword on the website, then count the catalogue concepts that appear in the search results and their frequencies;

Step 3, generate the co-occurrence matrix. The matrix is symmetric. Its first row and first column list the concepts, namely the user cognitive concepts and the catalogue concepts. Each remaining cell holds the co-occurrence frequency of the two concepts heading its row and column;

Step 4, generate a similarity matrix from the co-occurrence matrix of Step 3;

Step 5, perform cluster analysis on the result of Step 4. Cluster the similarity matrix with hierarchical clustering, then determine the cluster assignment of the concepts from the clustering statistics. The concepts comprise the extracted user cognitive concepts and the concepts in the website classification catalogue;

Step 6, analyze the similarity matrix of Step 4 with multidimensional scaling to obtain a multidimensional scaling space diagram of the corresponding dimension. This completes the website classification optimization analysis.

Compared with the prior art, the invention has the following notable advantages:

(1) it conducts user research directly on web log data, which saves the cost of user surveys and captures user information comprehensively;

(2) it uses quantitative calculation, so the results are accurate and the final analysis can directly support optimization of the website classification system;

(3) cluster analysis and multidimensional scaling capture the two key aspects of users' mental models, "similarity" and "spatiality". Their results can verify each other and present the outcome visually and intuitively.

The invention is described in further detail below with reference to the accompanying drawings.

Brief description of the drawings

Fig. 1 is a flowchart of the website classification optimization analysis method based on users' mental models according to the invention.

Fig. 2 is the clustering tree (dendrogram) of the second-level catalogue concepts and the custom group names.

Fig. 3 is the multidimensional scaling space diagram of the second-level catalogue concepts and the custom group names.

Detailed description of the invention

A website classification optimization analysis method based on users' mental models comprises the following steps:

Step 1, preprocess the web log data, as follows:

Step 1-1, cleanse the web log data by deleting records that are irrelevant to the analysis or that contain errors. Irrelevant records include records whose concepts already appear in the classification catalogue and records expressed as product codes. Erroneous records include misspellings and incorrect product descriptions. Then select the attributes needed for data processing: user name, user region, user cognitive concept, and product category.

A user cognitive concept is a concept that a user submits, based on his or her cognition of the website classification catalogue, for optimizing that catalogue. When a user browses with the catalogue and cannot find the most suitable concept, the user submits through the site's interactive interface a concept he or she considers more appropriate. For example, suppose a user looks for books on data mining through the classification catalogue of Joyo.com and finds that such books fall under the category "Database". If the user considers this inappropriate and thinks data mining should appear directly as a catalogue category, then "data mining" is a user cognitive concept.

Step 1-3, count the frequency of each user cognitive concept and then set a threshold, determined by the actual data volume and the number of extracted user cognitive concepts. For example, when both the data volume and the number of user cognitive concepts are small, a lower threshold can be set to retain enough data. Select the user cognitive concepts whose frequency exceeds the threshold and record their frequencies;

According to mental model classification theory, users acquiring information through a website's classification catalogue mainly click in one of three patterns: horizontal, vertical, or balanced horizontal-vertical. At each click they choose, from the catalogue concepts, the one most relevant to what they are looking for. Based on this principle, each user cognitive concept is used as a search keyword on the website. The catalogue concepts appearing in the search results, together with their frequencies, are counted to analyze the relevance between user cognitive concepts and catalogue concepts.

Mental model classification theory comes from experiments by Charles Cole, Yang Lin, et al. They found that the three most common types of human mental model are vertical (26%), horizontal (31%), and balanced (21%), which together cover 78% of people. A mental model's type is determined by the number of levels and the number of regions in the person's mental model diagram. The three common types are characterized as follows:

● Vertical: a mental model with more levels in the vertical dimension than in the horizontal dimension

● Horizontal: a mental model with more levels in the horizontal dimension than in the vertical dimension

● Balanced: a mental model with equal numbers of levels in the vertical and horizontal dimensions

Extending this theory to information access through a classification catalogue, the method assumes that users browsing a website's catalogue likewise click in vertical, horizontal, or balanced horizontal-vertical patterns.
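The retrieval results of Step 2 can be recorded as a 0/1 table, as in Table 4 of the embodiment. Below is a minimal sketch of that recording step. The search itself goes through the website's own search box, so its results are assumed to be already collected in a dictionary. The function name and input structure are hypothetical.

```python
def indicator_table(search_results, columns):
    """Record which catalogue concepts co-occur with each user cognitive concept.

    search_results: {user_concept: iterable of catalogue concepts listed in the
    results when user_concept is used as the search keyword} (hypothetical input).
    columns: catalogue concepts in a fixed order.
    Returns {user_concept: [1 if the column concept appeared, else 0, ...]}.
    """
    table = {}
    for concept, shown in search_results.items():
        shown = set(shown)
        table[concept] = [1 if col in shown else 0 for col in columns]
    return table


# Usage with illustrative data (not from the source):
results = {"solar_lamp": ["outdoor_lighting", "led_lighting"], "torch_light": ["torch"]}
cols = ["interior_lighting", "led_lighting", "outdoor_lighting", "torch"]
print(indicator_table(results, cols))
# {'solar_lamp': [0, 1, 1, 0], 'torch_light': [0, 0, 0, 1]}
```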

The co-occurrence frequencies between concepts in the co-occurrence matrix are determined as follows:

Step 3-1, determine the co-occurrence frequency between user cognitive concepts and catalogue concepts. There are two cases.

In the first case, the co-occurrence frequency between a user cognitive concept and a concept in the second-level catalogue is denoted F₁:

F₁ = p × x

where p is the frequency with which the second-level catalogue concept appears in the search results, and x is the frequency of the user cognitive concept.

In the second case, the co-occurrence frequency between a user cognitive concept and a concept in the third-level catalogue equals the frequency of the user cognitive concept;

Step 3-2, determine the co-occurrence frequency between catalogue concepts. For two catalogue concepts A and B, let m and n be their respective co-occurrence frequencies with a given user cognitive concept. Take the smaller of the two for each user cognitive concept and sum over all user cognitive concepts. The result, denoted F₂, is:

F₂ = SUM(MIN(m, n))

Step 3-3, determine the co-occurrence frequency between user cognitive concepts: it is set to 0.
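Steps 3-1 to 3-3 can be combined into a short Python sketch that builds the full symmetric co-occurrence matrix. It implements F₁ = p × x. The third-level case corresponds to p = 1, since F₁ then equals the user concept's frequency. The input structures are assumptions. The diagonal is left at 0 because the source does not define it.

```python
def cooccurrence_matrix(user_freq, hits, catalogue):
    """Build the symmetric co-occurrence matrix of Step 3.

    user_freq: {user_concept: x}, frequency of each selected user concept.
    hits: {user_concept: {catalogue_concept: p}}, frequency p of each catalogue
          concept in the search results for that user concept (assumed structure).
    catalogue: ordered list of catalogue concepts.
    Returns (labels, matrix); labels = user concepts followed by catalogue concepts.
    """
    users = list(user_freq)
    labels = users + list(catalogue)
    idx = {name: i for i, name in enumerate(labels)}
    size = len(labels)
    m = [[0] * size for _ in range(size)]
    # Step 3-1: F1 = p * x between a user concept and a catalogue concept.
    for u in users:
        for c in catalogue:
            f1 = hits.get(u, {}).get(c, 0) * user_freq[u]
            m[idx[u]][idx[c]] = m[idx[c]][idx[u]] = f1
    # Step 3-2: F2 = SUM(MIN(m, n)) over all user concepts.
    for i, a in enumerate(catalogue):
        for b in catalogue[i + 1:]:
            f2 = sum(min(m[idx[u]][idx[a]], m[idx[u]][idx[b]]) for u in users)
            m[idx[a]][idx[b]] = m[idx[b]][idx[a]] = f2
    # Step 3-3: user-user cells stay 0 (diagonal also left at 0, an assumption).
    return labels, m


labels, matrix = cooccurrence_matrix({"u1": 5, "u2": 3},
                                     {"u1": {"A": 1, "B": 1}, "u2": {"A": 1}},
                                     ["A", "B"])
print(labels)  # ['u1', 'u2', 'A', 'B']
print(matrix)  # [[0, 0, 5, 5], [0, 0, 3, 0], [5, 3, 0, 5], [5, 0, 5, 0]]
```

In this toy example, F₂(A, B) = min(5, 5) + min(3, 0) = 5. This is the same row-wise MIN-then-SUM that the embodiment performs in Excel.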

The similarity matrix is generated using the Pearson correlation coefficient as the similarity measure:

r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

where r measures the strength of the linear correlation between the two variables and lies between −1 and 1; for the co-occurrence data here it generally satisfies 0 ≤ r ≤ 1. n is the sample size, X_i and Y_i are the observations of the two variables, and X̄ and Ȳ are their means.
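The similarity computation can be sketched in plain Python; the embodiment uses SAS instead. Treating each column of the co-occurrence matrix as one variable is an assumption about how the matrix is fed to the correlation routine. The second usage line shows that r can be negative in general.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # A constant column has zero variance; r is then undefined (ZeroDivisionError).
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(cooc):
    """Correlate every pair of columns of the co-occurrence matrix."""
    cols = [list(c) for c in zip(*cooc)]
    return [[pearson(a, b) for b in cols] for a in cols]


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson([1, 2, 3], [3, 2, 1]))  # -1.0
```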

Hierarchical clustering is applied to the similarity matrix, and the cluster assignment of the concepts is then determined from the clustering statistics. This involves the following steps:

Step 5-1, determine the distances between samples and form a symmetric distance matrix. Let T_i and T_j denote samples i and j, and let d(T_i, T_j), abbreviated d_ij, denote the distance between them. The variance-weighted distance is

d_{ij} = \left[ \sum_{k=1}^{p} \frac{(T_{ik} - T_{jk})^2}{S_k^2} \right]^{1/2}

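Here p is the number of variables, T_ik is the value of sample i on variable k, and S_k² is the variance of variable k. Initially each of the N samples forms its own class. Let M_p and M_q be two classes containing N_p and N_q samples, respectively, and denote the distance between them D_pq. Computing the distances between all pairs of samples gives the symmetric distance matrix D(0);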
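Step 5-1 can be sketched as below. The source does not say which variables describe each sample. One plausible assumption is that each concept's row of the similarity matrix serves as its variable vector. Two further choices are also assumptions: S_k² is taken as the sample variance, and zero-variance variables are skipped.

```python
import math
from statistics import variance


def variance_weighted_distances(samples):
    """Build D(0) with d_ij = sqrt(sum_k (T_ik - T_jk)^2 / S_k^2).

    samples: list of N rows, each holding the p variable values of one sample.
    S_k^2 is the sample variance (n - 1 denominator) of variable k; variables
    with zero variance are skipped to avoid division by zero.
    """
    n, p = len(samples), len(samples[0])
    s2 = [variance([row[k] for row in samples]) for k in range(p)]
    used = [k for k in range(p) if s2[k] > 0]
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            total = sum((samples[i][k] - samples[j][k]) ** 2 / s2[k] for k in used)
            d[i][j] = d[j][i] = math.sqrt(total)
    return d


print(variance_weighted_distances([[0, 0], [1, 0], [2, 0]]))
# [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
```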

Step 5-2, merge classes and generate a new distance matrix. Select the smallest off-diagonal element of D(0); call it D_pq, where M_p = {X_p} and M_q = {X_q}. Merge M_p and M_q into a new class M_r = {X_p, X_q}. Delete the rows and columns of M_p and M_q from D(0), and add a row and a column holding the distances between M_r and each remaining class. The result is the new distance matrix D(1), an (N−1)×(N−1) square matrix;

Step 5-3, repeat Step 5-2 until all N samples have been merged into one class;
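The merge loop of Steps 5-2 and 5-3 can be sketched as follows. These steps leave the between-class distance open, while the embodiment compares ward, complete, and single in SAS, so the sketch supports those three updates. The Ward variant uses the standard Lance-Williams recurrence on squared Euclidean distances. This is an assumption about how method=ward could be reproduced, not a description of SAS internals.

```python
def agglomerate(d, method="single"):
    """Steps 5-2 and 5-3: repeatedly merge the two closest classes.

    d: symmetric distance matrix D(0) as a list of lists.
    method: "single" (minimum), "complete" (maximum) or "ward"; for "ward",
    d should contain squared Euclidean distances (Lance-Williams update).
    Returns the merge history as (members of M_p, members of M_q, D_pq).
    """
    n = len(d)
    members = {i: [i] for i in range(n)}
    dist = {(i, j): d[i][j] for i in range(n) for j in range(i + 1, n)}

    def key(a, b):
        return (a, b) if a < b else (b, a)

    history, next_id = [], n
    while len(members) > 1:
        # Smallest off-diagonal element D_pq of the current distance matrix.
        (p, q), dpq = min(dist.items(), key=lambda kv: kv[1])
        mp, mq = members.pop(p), members.pop(q)
        history.append((sorted(mp), sorted(mq), dpq))
        r, next_id = next_id, next_id + 1
        # Add the row/column of distances between the new class M_r and the rest.
        for k, mk in members.items():
            dpk, dqk = dist[key(p, k)], dist[key(q, k)]
            if method == "single":
                drk = min(dpk, dqk)
            elif method == "complete":
                drk = max(dpk, dqk)
            elif method == "ward":
                np_, nq, nk = len(mp), len(mq), len(mk)
                drk = ((np_ + nk) * dpk + (nq + nk) * dqk - nk * dpq) / (np_ + nq + nk)
            else:
                raise ValueError(f"unknown method: {method}")
            dist[key(r, k)] = drk
        # Delete the rows and columns that belonged to M_p and M_q.
        dist = {pair: v for pair, v in dist.items() if p not in pair and q not in pair}
        members[r] = mp + mq
    return history


# Three points on a line at 0, 1 and 5.
print(agglomerate([[0, 1, 5], [1, 0, 4], [5, 4, 0]]))  # [([0], [1], 1), ([2], [0, 1], 4)]
```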

Step 5-4, determine the cluster assignment of the concepts from the statistics of the hierarchical clustering: the R² statistic, the semi-partial R² statistic, the pseudo F statistic, and the pseudo t² statistic.
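The four statistics of Step 5-4 can be sketched for concepts represented as coordinate vectors. The formulas follow the common definitions used by SAS PROC CLUSTER and are restated from general knowledge, not from the source:

- R² = 1 − P_G/T
- semi-partial R² = B_KL/T
- pseudo F
- pseudo t²

The helper names are hypothetical. A common rule of thumb is to look for a local peak in pseudo F together with a small pseudo t² that jumps at the next merge.

```python
def _ss(points):
    """Sum of squared deviations of points (tuples) from their centroid."""
    dim = len(points[0])
    centroid = [sum(pt[a] for pt in points) / len(points) for a in range(dim)]
    return sum((pt[a] - centroid[a]) ** 2 for pt in points for a in range(dim))


def r_squared(groups):
    """R^2 = 1 - (pooled within-cluster SS P_G) / (total SS T) for a partition."""
    total = _ss([pt for g in groups for pt in g])
    return 1 - sum(_ss(g) for g in groups) / total


def pseudo_f(groups):
    """Pseudo F = [(T - P_G) / (G - 1)] / [P_G / (n - G)]."""
    pts = [pt for g in groups for pt in g]
    total, within = _ss(pts), sum(_ss(g) for g in groups)
    n, g = len(pts), len(groups)
    return ((total - within) / (g - 1)) / (within / (n - g))


def semipartial_r2(group_k, group_l, all_points):
    """B_KL / T: loss of between-cluster SS when clusters K and L are merged."""
    b_kl = _ss(group_k + group_l) - _ss(group_k) - _ss(group_l)
    return b_kl / _ss(all_points)


def pseudo_t2(group_k, group_l):
    """B_KL / ((W_K + W_L) / (N_K + N_L - 2)); undefined for two singletons."""
    w_k, w_l = _ss(group_k), _ss(group_l)
    b_kl = _ss(group_k + group_l) - w_k - w_l
    return b_kl / ((w_k + w_l) / (len(group_k) + len(group_l) - 2))


# Two well-separated one-dimensional clusters (illustrative data).
groups = [[(0.0,), (2.0,)], [(10.0,), (12.0,)]]
print(pseudo_f(groups), pseudo_t2(*groups))  # 50.0 50.0
```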

Step 6, analyze the similarity matrix of Step 4 with multidimensional scaling to obtain a multidimensional scaling space diagram of the corresponding dimension. This completes the website classification optimization analysis. Generating the space diagram from the similarity matrix involves the following steps:

Step 6-1, generate the observation matrix. The space is described as a Euclidean stimulus space, and distances are computed with the Minkowski distance function. The basic input data are the subjects' cognition of the relations between concepts in the website classification catalogue. With n objects there are n(n−1)/2 object pairs and hence n(n−1)/2 distances S_ij. The distance between points i and j in the fitted space is denoted d_ij. The formula is:

S_{ij} = \left[ \sum_{a=1}^{v} (x_{ia} - x_{ja})^2 \right]^{1/2}

where v is the number of dimensions, and x_ia and x_ja are the coordinates of points i and j on dimension a;
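The distance in Step 6-1 is the Minkowski distance with exponent 2. A small sketch with hypothetical function names, which also enumerates the n(n−1)/2 object pairs:

```python
def minkowski(x_i, x_j, r=2.0):
    """Minkowski distance between two coordinate vectors; r = 2 gives the Euclidean S_ij."""
    return sum(abs(a - b) ** r for a, b in zip(x_i, x_j)) ** (1.0 / r)


def all_pair_distances(points, r=2.0):
    """Distances for all n(n-1)/2 object pairs (i < j)."""
    n = len(points)
    return {(i, j): minkowski(points[i], points[j], r)
            for i in range(n) for j in range(i + 1, n)}


print(minkowski((0, 0), (3, 4)))        # 5.0 (Euclidean)
print(minkowski((0, 0), (3, 4), r=1))   # 7.0 (city-block)
```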

Step 6-2, homomorphic (order-preserving) mapping. Find a reduced q-dimensional space and map the objects into it so that the distances d_ij between object pairs in the q-dimensional space match the original distances S_ij. If d_ij and S_ij match completely, the pairwise distances satisfy d_i1 > d_i2 > … > d_im. In other words, the decreasing order of the distances is consistent with the increasing order of the original similarities;
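In nonmetric MDS, the order-preserving fit of Step 6-2 is usually obtained by monotone (isotonic) regression. The sketch below uses the pool-adjacent-violators algorithm to compute the reference values d̂_ij used in the stress formula of Step 6-3. Treating this as the procedure behind the "homomorphic mapping" is an assumption, and the function name is hypothetical.

```python
def disparities(distances_by_similarity):
    """Pool-adjacent-violators: least-squares non-increasing fit.

    Input: configuration distances d_ij listed in order of increasing original
    similarity, so ideally they fall (d_i1 > d_i2 > ...).
    Output: the reference values d-hat, the closest non-increasing sequence.
    """
    blocks = []  # each block is [sum of values, count]
    for d in distances_by_similarity:
        blocks.append([d, 1])
        # Pool neighbouring blocks while the non-increasing order is violated.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out


print(disparities([5, 3, 4, 1]))  # [5.0, 3.5, 3.5, 1.0]
```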

Step 6-3, test reliability and validity and determine the optimal number of dimensions. Two quantities are computed. The first is the difference degree K (the Kruskal coefficient), which tests whether the resulting space diagram is sufficiently representative. The second is the stress index, a goodness-of-fit measure defined as the deviation between the theoretical distances represented by the similarity assessment data and the computed distances. Stress is computed as:

\mathrm{Stress} = \sqrt{ \sum_{i} \sum_{j} (d_{ij} - \hat{d}_{ij})^2 \Big/ \sum_{i} \sum_{j} d_{ij}^2 }

where d̂_ij is the reference value that preserves the order of the subjects' original concept distances while minimizing the stress value. The larger K is, the better; values above 0.60 are generally acceptable. Stress values within 0.20 are generally acceptable. Table 1 gives the detailed relation between stress and goodness of fit.

Table 1 Stress and goodness of fit

Stress	Goodness of fit
0.200	Poor
0.100	Fair
0.050	Good
0.025	Excellent
0.000	Perfect
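Stress from Step 6-3 and the Table 1 wording can be computed as in the following sketch. Reading the tabulated values as upper bounds of each band is an assumption, since Table 1 lists only point values. Under that reading, values above 0.10 are labelled "poor".

```python
import math


def kruskal_stress(d, d_hat):
    """Stress = sqrt(sum (d_ij - d_hat_ij)^2 / sum d_ij^2) over all listed pairs."""
    num = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    den = sum(a ** 2 for a in d)
    return math.sqrt(num / den)


def fit_label(stress):
    """Map a stress value to the Table 1 wording (band edges are an assumption)."""
    for limit, label in ((0.0, "perfect"), (0.025, "excellent"), (0.05, "good"), (0.10, "fair")):
        if stress <= limit:
            return label
    return "poor"


s = kruskal_stress([3, 4], [3, 4.5])   # sqrt(0.25 / 25), about 0.1
print(round(s, 3), fit_label(s))
```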

Step 6-4, generate the multidimensional scaling space diagram using the optimal number of dimensions determined in Step 6-3.
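The embodiment runs the multidimensional scaling function of SAS. As a self-contained stand-in, the sketch below implements classical (Torgerson) metric MDS in plain Python. It obtains the top q eigenvectors of the double-centred squared-distance matrix by power iteration with deflation. This is a swapped-in technique, not the nonmetric fitting described in Steps 6-2 and 6-3. It also assumes the input distances are close to Euclidean, so that the leading eigenvalues are positive.

```python
import math
import random


def classical_mds(dist, q=2, iters=500, seed=0):
    """Return n points in q dimensions whose distances approximate `dist`."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]
    grand = sum(row_mean) / n
    # Double centring: B = -1/2 * J D^2 J (dist is symmetric, so column means = row means).
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * q for _ in range(n)]
    for dim in range(q):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):                      # power iteration
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))            # drop non-positive eigenvalues
        for i in range(n):
            coords[i][dim] = v[i] * scale
        # Deflate B so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Usage: recover the four corners of a 4 x 1 rectangle (up to rotation/reflection).
pts = [(0, 0), (4, 0), (0, 1), (4, 1)]
dmat = [[math.dist(a, b) for b in pts] for a in pts]
coords = classical_mds(dmat, q=2)
```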

The invention is further described below with reference to an embodiment:

Research objective: optimization analysis of the lighting product classification catalogue of Made-in-China.com.

Data: 6872 records of user-defined group names from four provinces and municipalities (Zhejiang, Shanghai, Jiangsu, and Guangdong). The records come from the Lights&Lighting top-level category of the product classification catalogue of Made-in-China.com (international site http://www.made-in-china.com/). On Made-in-China.com, user cognitive concepts are called custom group names.

Step 1, preprocess the web log data, as follows:

1) After cleansing the web log data, select the attributes needed for data processing: company name, province, city, and custom group name. The format is shown in Table 2:

Table 2 Data format after cleansing

2) First remove the numbering contained in the custom group names. Then convert them to lower case, remove plural forms, and sort them alphabetically;

3) Because a very large number of custom group names occur with low frequency, the threshold is set to 4 and only custom group names with frequency greater than 4 are kept. This yields 114 custom group names, whose frequencies are recorded. The selected custom group names are shown in Table 3:

Table 3 Selected custom group names

Step 2, determine whether the custom group names co-occur with the concepts in the website classification catalogue, as follows:

1) Log in to the Made-in-China.com international site, http://www.made-in-china.com/;

2) Enter the custom group name to be searched in the search box, select "Lights&Lighting" in the "All Categories" drop-down menu, and click Search;

3) Record the second-level catalogue concepts listed under "Catalog" in the search results;

4) Click each concept listed under "Catalog" in turn; the concepts then listed under "Catalog" are the corresponding third-level catalogue concepts;

Record the second-level and third-level catalogue concepts listed under "Catalog" and enter 1 in the corresponding cells. This gives the raw co-occurrence statistics table; partial results are shown in Table 4:

Table 4 Co-occurrence statistics (partial)

In the subsequent processing, custom group names are handled in the same way with second-level and third-level catalogue concepts. The following takes the co-occurrence between second-level catalogue concepts and custom group names as the example.

Step 3, generate the co-occurrence matrix, as follows:

1) Determine the co-occurrence frequency between second-level catalogue concepts and custom group names. Multiply the frequency of each custom group name by the frequency with which the second-level catalogue concept appears. Partial results are shown in Table 5:

Table 5 Co-occurrence frequencies of second-level catalogue concepts and custom group names (partial results)

2) Determine the co-occurrence frequencies between second-level catalogue concepts;

These are computed from the result of the previous step. Take the co-occurrence frequency of Interior lighting and LED lighting as an example. Their columns in Excel are B and C, so the formula is SUM(MIN(B, C)): take the smaller of the two values in each row, then sum these minima;

3) The co-occurrence frequencies between custom group names are all set to 0. The resulting co-occurrence matrix is shown in Table 6:

Table 6 Co-occurrence matrix of second-level catalogue concepts and custom group names (partial)

Concept	Interior_lighting	led_lighting	lighting_fixtures	bulb_lamp	lighting_decoration
Interior_lighting	-	14441	6587	11403	10697
led_lighting	14441	-	6643	12204	11108
lighting_fixtures	6587	6643	-	6467	5836
bulb_lamp	11403	12204	6467	-	9433
lighting_decoration	10697	11108	5836	9433	-
outdoor_lighting	14640	17255	6620	12498	11189
camping_light	1116	1116	995	1100	1110
emergency_indicator_light	2245	2240	2129	2226	2205
torch	653	653	582	645	648
portable_lighting	1364	1388	1289	1356	1353

Step 4, generate the similarity matrix by computing Pearson correlation coefficients in SAS. Partial results are shown in Table 7:

Table 7 Similarity matrix (partial results)

Step 5, cluster analysis: apply hierarchical clustering in SAS. Several between-class distance methods were tried, including ward, complete, and single, and comparison showed that method=ward gives the best result. Two classes are merged at each step; the results of the last 15 merges are shown in Table 8:

Table 8 SAS clustering history, method=ward

Three statistics of the hierarchical clustering indicate that the optimal number of classes is 4: the semi-partial R² (SPRSQ), the pseudo F statistic (PSF), and the pseudo t² statistic (PST2). The clustering covers 127 concepts, namely 114 custom group names and 13 second-level catalogue concepts. With 4 classes, the 13 second-level catalogue concepts fall into two of the classes. The clustering results are shown in Table 9, where concepts in bold are second-level catalogue concepts and the others are custom group names.

Table 9 Clustering results of custom group names and second-level catalogue concepts

The four classes marked on the clustering tree in Fig. 2 correspond to the clustering results in Table 9. Concepts clustered into the same class are the most strongly correlated. For example, the fourth class contains eight concepts: led_plug_light, induction_lamp, led_module, led_rigid_bar, led_moving_head, led_rope_light, led_dance_floor, and led_recessed_light. This indicates that these eight concepts are more strongly correlated with one another than with the other concepts, so they are placed in the same category.

Step 6, multidimensional scaling: this example uses the multidimensional scaling function of SAS directly. Multidimensional scaling verifies the accuracy of the clustering results and visualizes them.

To make the multidimensional scaling result clearer, the concepts are replaced by variables X1–X127, numbered to match the concept numbers in the clustering results. The space diagram shows the 127 concepts divided into four classes. This confirms the clustering results and presents them intuitively, completing the optimization analysis of the lighting product classification catalogue of Made-in-China.com.

The above example shows that the method of the invention conducts user research directly on web log data. This saves the cost of user surveys and captures user information comprehensively.

Claims

1. A website classification optimization analysis method based on users' mental models, characterized by the following steps:

Step 1, preprocess the web log data, as follows:

Step 1-1, cleanse the web log data by deleting records that are irrelevant to the analysis or that contain errors. Irrelevant records include records whose concepts already appear in the classification catalogue and records expressed as product codes. Erroneous records include misspellings and incorrect product descriptions. Then select the attributes needed for data processing: user name, user region, user cognitive concept, and product category. A user cognitive concept is a concept that a user submits, based on his or her cognition of the website classification catalogue, for optimizing that catalogue;

Step 1-2, convert the format of the data cleansed in Step 1-1 by unifying the format of three attributes: the extracted user cognitive concept, the region, and the name. Specifically, remove numbering, unify upper and lower case, and unify singular and plural forms;

Step 1-3, count the frequency of each user cognitive concept, then set a threshold. Select the user cognitive concepts whose frequency exceeds the threshold and record their frequencies;

Step 2, determine whether each user cognitive concept co-occurs with the concepts in the website classification catalogue. Using mental model classification theory, use each user cognitive concept as a search keyword on the website, then count the catalogue concepts that appear in the search results and their frequencies;

Step 3, generate the co-occurrence matrix. The matrix is symmetric. Its first row and first column list the concepts, namely the user cognitive concepts and the catalogue concepts. Each remaining cell holds the co-occurrence frequency of the two concepts heading its row and column;

Step 5, perform cluster analysis on the result of Step 4. Cluster the similarity matrix with hierarchical clustering, then determine the cluster assignment of the concepts from the clustering statistics. The concepts comprise the extracted user cognitive concepts and the concepts in the website classification catalogue. This step includes the following sub-steps:

Step 5-1, determine the distances between samples and form a symmetric distance matrix. Let T_i and T_j denote samples i and j, and let d(T_i, T_j), abbreviated d_ij, denote the distance between them. The variance-weighted distance formula used is

d_{ij} = \left[ \sum_{k=1}^{p} \frac{(T_{ik} - T_{jk})^2}{S_k^2} \right]^{1/2}

Step 5-2, merge classes and generate a new distance matrix. Select the smallest off-diagonal element of D(0) and denote its value D_pq, where M_p = {X_p} and M_q = {X_q}. Merge M_p and M_q into a new class M_r = {X_p, X_q}. Delete the rows and columns of M_p and M_q from D(0), and add a row and a column holding the distances between M_r and each remaining unmerged class. The result is the new distance matrix D(1), an (N−1)×(N−1) square matrix;

Step 5-4, determine the cluster assignment of the concepts from the statistics of the hierarchical clustering: the R² statistic, the semi-partial R² statistic, the pseudo F statistic, and the pseudo t² statistic;

Step 6, analyze the similarity matrix of Step 4 with multidimensional scaling to obtain a multidimensional scaling space diagram of the corresponding dimension, which completes the website classification optimization analysis. Generating the space diagram from the similarity matrix includes the following sub-steps:

Step 6-1, generate the observation matrix. The space is described as a Euclidean stimulus space, and distances are computed with the Minkowski distance function. The basic input data are the subjects' cognition of the relations between concepts in the website classification catalogue. With n objects there are n(n−1)/2 distances S_ij between object pairs. The distance between points i and j in the fitted space is denoted d_ij. The formula used is:

S_{ij} = \left[ \sum_{a=1}^{v} (x_{ia} - x_{ja})^2 \right]^{1/2}

Step 6-2, homomorphic (order-preserving) mapping. Find a reduced q-dimensional space and map the objects into it so that the distances d_ij in the q-dimensional space match the original distances S_ij, where d_ij is the distance between an object pair in the q-dimensional space. If d_ij and S_ij match completely, the pairwise distances satisfy d_i1 > d_i2 > … > d_im. In other words, the decreasing order of the distances is consistent with the increasing order of the original similarities;

Step 6-3, test reliability and validity and determine the optimal number of dimensions. Two quantities are computed. The first is the difference degree K (the Kruskal coefficient), which tests whether the resulting space diagram is sufficiently representative. The second is the stress index, a goodness-of-fit measure defined as the deviation between the theoretical distances represented by the similarity assessment data and the computed distances. Stress is computed as:

\mathrm{Stress} = \sqrt{ \sum_{i} \sum_{j} (d_{ij} - \hat{d}_{ij})^2 \Big/ \sum_{i} \sum_{j} d_{ij}^2 }

where d̂_ij is the reference value that preserves the order of the subjects' original concept distances while minimizing the stress value;

2. The website classification optimization analysis method based on users' mental models according to claim 1, characterized in that in Step 2, according to mental model classification theory, users acquiring information through a website's classification catalogue mainly click in horizontal, vertical, or balanced horizontal-vertical patterns, and at each click they select the catalogue concept most relevant to their need. Based on this principle, each user cognitive concept is used as a search keyword on the website. The catalogue concepts appearing in the search results and their frequencies are counted to analyze the relevance between user cognitive concepts and catalogue concepts.

3. The website classification optimization analysis method based on users' mental models according to claim 1, characterized in that in Step 3 the co-occurrence frequencies between concepts in the co-occurrence matrix are determined as follows:

Step 3-1, determine the co-occurrence frequency between user cognitive concepts and catalogue concepts. There are two cases:

In the first case, the co-occurrence frequency between a user cognitive concept and a second-level catalogue concept is denoted F₁:

F₁ = p × x

where p is the frequency with which the second-level catalogue concept appears in the search results, and x is the frequency of the user cognitive concept;

In the second case, the co-occurrence frequency between a user cognitive concept and a third-level catalogue concept equals the frequency of the user cognitive concept;

Step 3-2, determine the co-occurrence frequency between catalogue concepts. For two catalogue concepts A and B, let m and n be their respective co-occurrence frequencies with a given user cognitive concept. Take the smaller of the two for each user cognitive concept and sum over all user cognitive concepts. The result, denoted F₂, is:

F₂ = SUM(MIN(m, n))

4. The website classification optimization analysis method based on users' mental models according to claim 1, characterized in that in Step 4 the similarity matrix is generated using the Pearson correlation coefficient as the similarity measure, with the formula

r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

where r measures the strength of the linear correlation between the two variables and lies between −1 and 1; for the co-occurrence data here it generally satisfies 0 ≤ r ≤ 1. n is the sample size, X_i and Y_i are the observations of the two variables, and X̄ and Ȳ are their means.