CN102937985B - Website classification optimization analysis method based on a user mental model

Info

Publication number
CN102937985B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210413774.8A
Other languages
Chinese (zh)
Other versions
CN102937985A (en
Inventor
吴鹏
张佩佩
张晶晶
曾华翔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN201210413774.8A priority Critical patent/CN102937985B/en
Publication of CN102937985A publication Critical patent/CN102937985A/en
Application granted granted Critical
Publication of CN102937985B publication Critical patent/CN102937985B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a website classification optimization analysis method based on a user mental model. The method first preprocesses website log data. The log data contain concepts for optimizing the website's classified catalogue that users submitted based on their cognition of that catalogue, and preprocessing extracts these concepts from the logs. Mental model classification theory is then used to determine the co-occurrence relations between the extracted concepts and the concepts in the website's classified catalogue, where a catalogue concept is a specific category name such as books or daily necessities. The co-occurrence relations are converted into a co-occurrence matrix, which is in turn converted into a similarity matrix using the Pearson correlation coefficient. Finally, cluster analysis and multidimensional scaling analysis are performed to analyze the similarity and spatiality of the concepts in users' cognition of the website's classified catalogue. These six steps provide quantitative decision support for website classification optimization based on the mental model of website users.

Description

Website classification optimization analysis method based on a user mental model
Technical field
The present invention relates to a method for analyzing website classification optimization, in particular a website classification optimization analysis method based on a user mental model.
Background art
Optimizing a website's information classification system means assessing the website's existing classification system and, on that basis, deciding whether to adjust it and, if so, how. Research on methods for optimizing website information classification systems is still very scarce. It concentrates on theory (the basis, standards and principles of classification) and on specific issues such as classification levels and granularity, or simply looks for defects in existing classification methods. Little research explores the problem in depth with the support of a definite theory and method.
In his book "The Design of Everyday Things", Norman first proposed that interaction design involves three kinds of mental model: the presentation model, the user mental model and the system model. He held that the closer the presentation model is to the user mental model, the better users understand a website's organizational structure and the more efficiently they can obtain information. When optimizing a website's information classification system, the user mental model, i.e. the user's cognition of the website classification system, should therefore be examined as far as possible.
When measuring the mental model of website users, psychological proximity data are subjective assessments of the relations between concepts as perceived by individuals, and "similarity" and "spatiality" are the main angles of measurement. Quantitative measurement of mental models generally starts from concept similarity: the relevant concepts of the research topic are extracted, experiments are designed with different classification methods, the subjects' similarity assessments of the concepts are collected and analyzed, and the user mental model of the research topic is characterized. Cluster analysis was originally used to process concept similarity data, grouping concepts according to their similarity. The spatiality of concepts refers to the relative positions of different concepts in the subject's psychological space (Rusbult C.E, Onizuka R.K, Lipkus I. What Do We Really Want?: Mental Models of Ideal Romantic Involvement Explored through Multidimensional Scaling[J]. Journal of Experimental Social Psychology, 1993, 29 (9): 493-527). Multidimensional scaling can be used to measure concept spatiality, yielding the user's spatial representation of the concepts and allowing the user mental model of a given domain to be observed intuitively. No existing research, however, combines the two to measure user mental models for optimizing website classified catalogues.
Current research on website users mostly remains at the stage of traditional user surveys. The survey methods used mainly include the scenario method, focus groups, usability testing, in-depth interviews and observation. All of these have limitations: the valid data they yield are very limited and their cost is high, and a survey conducted this way cannot cover too many questions. The information obtained is therefore rather macroscopic, and the detailed information that is actually useful for studying user behavior is hard to obtain.
Website classification system optimization methods therefore still have the following problems: (1) effective user research is difficult to carry out, and users' cognition of a website is hard to collect comprehensively; (2) website classification systems are seldom optimized in a "user-centered" way.
Summary of the invention
The technical problem solved by the present invention is to provide a website classification optimization analysis method based on a user mental model.
The technical solution that achieves the object of the invention is a website classification optimization analysis method based on a user mental model, with the following steps:
Step 1, preprocess the website log data, specifically:
Step 1-1, purify the website log data by deleting data in the log file that are irrelevant to the analysis or contain errors. The irrelevant data include entries containing concepts that are already in the classified catalogue and entries expressed as product codes; the erroneous data include misspellings and wrong product descriptions. Then select the attributes needed for data processing: user name, user region, user cognitive concept and product category. A user cognitive concept is a concept for optimizing the website's classified catalogue that a user submitted based on his or her cognition of that catalogue;
Step 1-2, convert the format of the data purified in step 1-1, unifying the format of the three extracted attributes (user cognitive concept, region and name), specifically by removing numbering, unifying letter case, and unifying singular and plural forms;
Step 1-3, determine the frequency with which each user cognitive concept occurs, then set a threshold according to the actual data volume and the number of extracted user cognitive concepts, select the user cognitive concepts whose frequency exceeds the threshold, and record their frequencies;
Step 2, determine whether the user cognitive concepts co-occur with concepts in the website's classified catalogue: applying mental model classification theory, use each user cognitive concept as a search keyword on the website and count the classified-catalogue concepts that appear in the search results, together with their frequencies;
Step 3, generate a co-occurrence matrix. The co-occurrence matrix is symmetric. Its first row and first column hold the concepts, comprising the user cognitive concepts and the concepts in the website's classified catalogue, and each remaining cell holds the co-occurrence frequency between the concepts in the corresponding first row and first column;
Step 4, generate a similarity matrix from the co-occurrence matrix of step 3;
Step 5, perform cluster analysis on the basis of step 4: cluster the similarity matrix with hierarchical clustering, then determine the clustering result of the concepts from the clustering statistics. The concepts comprise the extracted user cognitive concepts and the concepts in the website's classified catalogue;
Step 6, analyze the similarity matrix of step 4 with multidimensional scaling to obtain a multidimensional scaling space diagram of the corresponding dimensionality, thereby completing the website classification optimization analysis.
Compared with the prior art, the present invention has the following notable advantages: (1) it uses website log data directly for user research, which saves the cost of user surveys and allows user information to be obtained comprehensively; (2) it uses quantitative calculation, so the results are accurate and the final analysis results can directly provide a basis for optimizing the website classification system; (3) cluster analysis and multidimensional scaling analysis represent the two key aspects of the user mental model, "similarity" and "spatiality", and their results can verify each other and be presented as an intuitive visualization.
The present invention is described in further detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flow chart of the website classification optimization analysis method based on a user mental model according to the present invention.
Fig. 2 is the clustering tree of the concepts in the second-level classified catalogue and the user-defined group names.
Fig. 3 is the multidimensional scaling space diagram of the concepts in the second-level classified catalogue and the user-defined group names.
Detailed description of the embodiments
A website classification optimization analysis method based on a user mental model, with the following steps:
Step 1, preprocess the website log data, specifically:
Step 1-1, purify the website log data by deleting data in the log file that are irrelevant to the analysis or contain errors. The irrelevant data include entries containing concepts that are already in the classified catalogue and entries expressed as product codes; the erroneous data include misspellings and wrong product descriptions. Then select the attributes needed for data processing: user name, user region, user cognitive concept and product category. A user cognitive concept is a concept for optimizing the website's classified catalogue that a user submitted based on his or her cognition of that catalogue. That is, when a user browsing the website's classified catalogue cannot find the most suitable concept, it is the concept the user considers more suitable and submits through the website's interactive interface. For example, a user searching Joyo.com for books on data mining through the classified catalogue finds that such books belong to the catalogue category "database" and considers this inappropriate, believing that data mining should appear directly as a category in the classified catalogue. In this case "data mining" is the user cognitive concept.
Step 1-2, convert the format of the data purified in step 1-1, unifying the format of the three extracted attributes (user cognitive concept, region and name), specifically by removing numbering, unifying letter case, and unifying singular and plural forms;
Step 1-3, determine the frequency with which each user cognitive concept occurs, then set a threshold according to the actual data volume and the number of extracted user cognitive concepts. For example, when the actual data volume and the number of user cognitive concepts are both small, a smaller threshold can be set in order to retain a sufficient amount of data. Select the user cognitive concepts whose frequency exceeds the threshold, and record their frequencies;
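Steps 1-2 and 1-3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `normalize` and `frequent_concepts` are hypothetical, the singularization rule (dropping a trailing "s") is deliberately naive, and joining words with underscores simply mirrors the concept names that appear in the embodiment's tables.

```python
import re
from collections import Counter


def normalize(name: str) -> str:
    """Unify a raw concept name: drop numbering, lower-case, crude singular form."""
    # Lower-case, then replace digit runs (numbering) with a space.
    name = re.sub(r"\d+", " ", name.lower())
    # Split on anything that is not a letter and drop empty pieces.
    words = [w for w in re.split(r"[^a-z]+", name) if w]
    # Naive singularization (assumption): strip one trailing "s", but not "ss".
    words = [
        w[:-1] if len(w) > 3 and w.endswith("s") and not w.endswith("ss") else w
        for w in words
    ]
    return "_".join(words)


def frequent_concepts(raw_names, threshold):
    """Count normalized concepts and keep those whose frequency exceeds the threshold."""
    counts = Counter(normalize(n) for n in raw_names)
    counts.pop("", None)  # discard names that were nothing but numbering
    return {concept: freq for concept, freq in counts.items() if freq > threshold}


raw = ["01 LED Bulbs", "LED bulb", "led bulbs 3", "Torch"]
print(frequent_concepts(raw, threshold=2))
```

A real pipeline would likely need a proper lemmatizer and a spelling check for the purification in step 1-1; neither is attempted here.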
Step 2, determine whether the user cognitive concepts co-occur with concepts in the website's classified catalogue: applying mental model classification theory, use each user cognitive concept as a search keyword on the website and count the classified-catalogue concepts that appear in the search results, together with their frequencies;
According to mental model classification theory, a user who uses a website's classified catalogue to obtain information mainly clicks in a horizontal, vertical, or equally horizontal and vertical manner, and while clicking selects concepts that are highly correlated according to the correlations between the concepts in the catalogue. Using this principle, each user cognitive concept is used as a search keyword on the website, and the classified-catalogue concepts appearing in the search results and their frequencies are counted in order to analyze the correlation between the user cognitive concepts and the concepts in the website's classified catalogue.
Mental model classification theory comes from Charles Cole, Yang Lin et al., who found through experiments that the three most common types of human mental model are the vertical type (26%), the horizontal type (31%) and the equal type (21%), which together account for the mental model type of 78% of people. The type of a mental model is determined by the number of levels and the number of regions in the mental model diagram. The three common types have the following characteristics:
● Vertical: a mental model with more levels in the vertical dimension than in the horizontal dimension
● Horizontal: a mental model with more levels in the horizontal dimension than in the vertical dimension
● Equal: a mental model with the same number of levels in the vertical and horizontal dimensions
This theory is extended to the process by which users obtain information through a classified catalogue: it is assumed that a user obtaining information through a website's classified catalogue likewise clicks in a vertical, horizontal, or equally horizontal and vertical manner.
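The three characteristics listed above amount to a comparison of two level counts. A minimal sketch, assuming the level counts of each dimension have already been read off a mental model diagram (the function name and its parameters are hypothetical, not part of the cited theory):

```python
def mental_model_type(vertical_levels: int, horizontal_levels: int) -> str:
    """Classify a mental model by comparing the level counts of its two dimensions."""
    if vertical_levels > horizontal_levels:
        return "vertical"
    if horizontal_levels > vertical_levels:
        return "horizontal"
    return "equal"


print(mental_model_type(4, 2))
```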
Step 3, generate a co-occurrence matrix. The co-occurrence matrix is symmetric. Its first row and first column hold the concepts, comprising the user cognitive concepts and the concepts in the website's classified catalogue, and each remaining cell holds the co-occurrence frequency between the concepts in the corresponding first row and first column;
The co-occurrence frequencies between the concepts in the co-occurrence matrix are determined as follows:
Step 3-1, determine the co-occurrence frequency between the user cognitive concepts and the concepts in the classified catalogue, which covers two cases. The first is the co-occurrence frequency between a user cognitive concept and a concept in the second-level classified catalogue, denoted F1:
F1 = p * x
where p is the frequency with which the second-level catalogue concept appears in the search results and x is the frequency with which the user cognitive concept occurs.
The second is the co-occurrence frequency between a user cognitive concept and a concept in the third-level classified catalogue, which equals the frequency with which the user cognitive concept occurs;
Step 3-2, determine the co-occurrence frequency between two concepts in the classified catalogue: for each user cognitive concept, take the smaller of the two catalogue concepts' co-occurrence frequencies with that user cognitive concept, then sum these values over all user cognitive concepts. The result is denoted F2. With m and n denoting the co-occurrence frequencies of catalogue concepts A and B with a user cognitive concept, the formula is:
F2 = SUM(MIN(m, n))
Step 3-3, determine the co-occurrence frequency between user cognitive concepts: the co-occurrence frequency between any two user cognitive concepts is 0.
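Steps 3-1 to 3-3 can be sketched as follows for the second-level catalogue case. The function name, the input layout (`user_freq` for x, `hits` for p) and the choice of 0 on the diagonal are assumptions made for illustration; the text does not say what the diagonal cells hold.

```python
def cooccurrence_matrix(user_freq, hits, catalog):
    """Build the symmetric co-occurrence matrix of steps 3-1 to 3-3.

    user_freq: {user concept: frequency x of that concept in the logs}
    hits:      {user concept: {catalogue concept: frequency p in the search results}}
    catalog:   list of second-level catalogue concepts
    """
    users = sorted(user_freq)
    labels = list(catalog) + users
    index = {concept: i for i, concept in enumerate(labels)}
    size = len(labels)
    matrix = [[0] * size for _ in range(size)]  # diagonal left at 0 (assumption)

    # Step 3-1: F1 = p * x for every (user concept, catalogue concept) pair.
    f1 = {
        u: {c: hits.get(u, {}).get(c, 0) * user_freq[u] for c in catalog}
        for u in users
    }
    for u in users:
        for c in catalog:
            matrix[index[u]][index[c]] = matrix[index[c]][index[u]] = f1[u][c]

    # Step 3-2: F2 = SUM(MIN(m, n)) over all user concepts, per catalogue pair.
    for i, a in enumerate(catalog):
        for b in catalog[i + 1:]:
            f2 = sum(min(f1[u][a], f1[u][b]) for u in users)
            matrix[index[a]][index[b]] = matrix[index[b]][index[a]] = f2

    # Step 3-3: user-to-user cells keep their initial value of 0.
    return labels, matrix


labels, m = cooccurrence_matrix(
    user_freq={"led_bulb": 5, "torch": 4},
    hits={"led_bulb": {"led_lighting": 1, "bulb_lamp": 1}, "torch": {"led_lighting": 1}},
    catalog=["led_lighting", "bulb_lamp"],
)
print(labels)
print(m)
```

The sample frequencies above are made up purely to exercise the function and are not taken from the embodiment.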
Step 4, generate a similarity matrix from the co-occurrence matrix of step 3;
The similarity matrix is generated using the Pearson correlation coefficient as the similarity measure. The formula is

$$ r = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2\,\sum_{i=1}^{n}(Y_i-\bar{Y})^2}} $$

where r measures the strength of the linear correlation between two variables and satisfies 0 ≤ |r| ≤ 1, n is the sample size, and $X_i$, $Y_i$ and $\bar{X}$, $\bar{Y}$ are the observations and the means of the two variables respectively.
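A minimal pure-Python sketch of step 4 follows. It assumes that each concept is represented by its column of the co-occurrence matrix and that the similarity of two concepts is the Pearson coefficient of their columns; returning 0.0 for a constant column is also an assumption, made only to avoid a division by zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    # A constant column has zero variance; report 0.0 in that case (assumption).
    return numerator / denominator if denominator else 0.0


def similarity_matrix(cooccurrence):
    """Pairwise Pearson coefficients between the columns of a square matrix."""
    columns = list(zip(*cooccurrence))
    return [[pearson(a, b) for b in columns] for a in columns]


sim = similarity_matrix([[0, 5, 5], [5, 0, 3], [5, 3, 0]])
print(sim)
```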
Step 5, perform cluster analysis on the basis of step 4: cluster the similarity matrix with hierarchical clustering, then determine the clustering result of the concepts from the clustering statistics. The concepts comprise the extracted user cognitive concepts and the concepts in the website's classified catalogue;
Clustering the similarity matrix with hierarchical clustering and then determining the clustering result of the concepts from the clustering statistics specifically comprises the following steps:
Step 5-1, determine the distances between samples to form a symmetric distance matrix. Let $T_i$ and $T_j$ denote samples i and j, and let $d(T_i, T_j)$, abbreviated $d_{ij}$, denote the distance between them. The variance-weighted distance formula is

$$ d_{ij} = \left[\sum_{k=1}^{p}\frac{(T_{ik}-T_{jk})^2}{S_k^2}\right]^{\frac{1}{2}} $$

where $T_{ik}$ is the value of sample i on the k-th variable, p is the number of variables and $S_k^2$ is the variance of the k-th variable.
Treat the N samples as N classes. Let $M_p$ and $M_q$ denote two classes containing $N_p$ and $N_q$ samples respectively, and denote the distance between $M_p$ and $M_q$ by $D_{pq}$. Compute the pairwise distances between the samples to form a symmetric distance matrix D(0);
Step 5-2, merge classes and generate a new distance matrix: select the smallest off-diagonal element of D(0) and let it be $D_{pq}$, where $M_p = \{X_p\}$ and $M_q = \{X_q\}$. Merge $M_p$ and $M_q$ into a new class $M_r = \{X_p, X_q\}$. Delete the rows and columns of D(0) corresponding to $M_p$ and $M_q$, and add a row and a column consisting of the distances between the new class $M_r$ and the remaining unmerged classes. This gives the new distance matrix D(1), a square matrix of order N-1;
Step 5-3, repeat step 5-2 until the N samples are merged into one large class;
Step 5-4, determine the clustering result of the concepts from the statistics of the hierarchical clustering method, which include the R² statistic, the semi-partial R² statistic, the pseudo-F statistic and the pseudo-t² statistic.
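Steps 5-1 to 5-3 can be sketched as below. Two points are assumptions rather than the patent's procedure: the sample variance (dividing by N-1) is used for the weights, and the distance between merged classes is taken as the single-linkage (nearest-member) distance purely for brevity, whereas the embodiment reports that Ward's method gave the best result. The statistics of step 5-4 are not computed here.

```python
import math


def variance_weighted_distances(samples):
    """Step 5-1: symmetric matrix of variance-weighted distances between samples."""
    n, p = len(samples), len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(p)]
    # Sample variance of each variable (dividing by n - 1 is an assumption).
    var = [sum((s[k] - means[k]) ** 2 for s in samples) / (n - 1) for k in range(p)]

    def dist(a, b):
        # Variables with zero variance are skipped to avoid dividing by zero.
        return math.sqrt(
            sum((a[k] - b[k]) ** 2 / var[k] for k in range(p) if var[k] > 0)
        )

    return [[dist(a, b) for b in samples] for a in samples]


def agglomerate(dist):
    """Steps 5-2 and 5-3: merge the two closest classes until one class remains.

    Single linkage is used as the between-class distance (assumption).
    Returns a list of (members of the merged class, merge distance).
    """
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        keys = sorted(clusters)
        d, a, b = min(
            (min(dist[i][j] for i in clusters[a] for j in clusters[b]), a, b)
            for pos, a in enumerate(keys)
            for b in keys[pos + 1:]
        )
        merges.append((sorted(clusters[a] + clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


samples = [[0, 0], [0, 1], [10, 10], [10, 11]]
for members, d in agglomerate(variance_weighted_distances(samples)):
    print(members, round(d, 4))
```

Cutting the resulting merge history at a chosen number of classes would then rely on statistics such as those named in step 5-4.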
Step 6, analyze the similarity matrix of step 4 with multidimensional scaling to obtain a multidimensional scaling space diagram of the corresponding dimensionality, thereby completing the website classification optimization analysis. Analyzing the similarity matrix with multidimensional scaling and generating the multidimensional scaling space diagram specifically comprises the following steps:
Step 6-1, generate the observation matrix: describe the space using a Euclidean stimulus space and compute with the Minkowski distance function. Assume that the subjects' cognition of the relations between the concepts in the website's classified catalogue serves as the basic input data. With n objects, the distances $S_{ij}$ between n(n-1)/2 object pairs are obtained. The distance between points i and j is denoted $d_{ij}$. The formula used is:

$$ S_{ij} = \left[\sum_{a=1}^{v}(x_{ia}-x_{ja})^2\right]^{\frac{1}{2}} $$

where v is the number of dimensions, and $x_{ia}$ and $x_{ja}$ are the coordinates of points i and j on dimension a;
Step 6-2, homomorphic mapping: find a reduced q-dimensional space and apply a homomorphic mapping so that the distances $d_{ij}$ between object pairs in the q-dimensional space match the original distances $S_{ij}$. If $d_{ij}$ matches $S_{ij}$ completely, the distances between paired objects satisfy $d_{i1} > d_{i2} > \dots > d_{im}$, i.e. this descending order of distances is consistent with the ascending order of the original similarities;
Step 6-3, reliability and validity testing to determine the optimal number of dimensions: compute the degree of difference K, called the Kruskal coefficient, which tests whether the resulting space diagram is sufficiently representative, and the stress index Stress, a goodness-of-fit value defined as the deviation between the theoretical distances represented by the similarity assessment data and the computed distances. Stress is computed as:

$$ \mathrm{Stress} = \sqrt{\frac{\sum_{i}\sum_{j}(d_{ij}-\hat{d}_{ij})^2}{\sum_{i}\sum_{j}d_{ij}^2}} $$

where $\hat{d}_{ij}$ is the reference value that satisfies the order relation of the subjects' original input concept distances while minimizing the stress index. The larger the K value the better, and a value above 0.60 is generally acceptable. A Stress value within 0.20 is generally acceptable. The detailed relation between the size of the stress index and the goodness of fit is shown in Table 1.
Table 1 Relation between stress index and goodness of fit
Stress    Goodness of fit
0.200     Poor
0.100     Fair
0.050     Good
0.025     Excellent
0.000     Perfect
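The stress check of step 6-3 can be sketched as follows. The sketch assumes the square-root (Kruskal) form of the stress index, computes the configuration distances with the Euclidean formula of step 6-1, and sums over each unordered pair once, which leaves the ratio unchanged. Treating the Table 1 values as upper bounds of each fit category is also an assumption, and the function names are hypothetical.

```python
import math


def euclidean(xi, xj):
    """Distance between two points over all v dimensions (formula of step 6-1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))


def stress(coords, disparities):
    """Stress between configuration distances d_ij and reference values d-hat_ij.

    coords:      list of point coordinates in the reduced space
    disparities: square matrix of reference values, indexed like coords
    """
    numerator = denominator = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = euclidean(coords[i], coords[j])
            numerator += (d - disparities[i][j]) ** 2
            denominator += d ** 2
    return math.sqrt(numerator / denominator)


def fit_label(value):
    """Map a stress value to a Table 1 category, reading the table as upper bounds."""
    for bound, label in [(0.0, "perfect"), (0.025, "excellent"), (0.05, "good"), (0.10, "fair")]:
        if value <= bound:
            return label
    return "poor"


s = stress([(0, 0), (3, 4)], [[0, 4], [4, 0]])
print(round(s, 3), fit_label(s))
```

The function does not guard against a configuration in which all points coincide, since the denominator would then be zero.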
Step 6-4, generate the multidimensional scaling space diagram according to the optimal number of dimensions determined in step 6-3.
The present invention is described in further detail below with reference to an embodiment:
Research objective: optimization analysis of the classified catalogue of lighting products on Made-in-China.com.
Data description: user-defined group name data (6872 records) from four provinces and municipalities (Zhejiang, Shanghai, Jiangsu and Guangdong) under the Lights&Lighting category of the product classified catalogue of Made-in-China.com (international site, http://www.made-in-china.com/). Made-in-China.com refers to user cognitive concepts as user-defined group names.
Step 1, preprocess the website log data, specifically:
1) After the website log data are purified, the attributes needed for data processing are filtered out, including company name, province, city and user-defined group name. The format is shown in Table 2:
Table 2 Data format after data purification
2) First remove the numbering contained in the user-defined group names, then convert the names to lower case, remove plural forms, and sort them by initial letter;
3) Because a very large number of the user-defined group names have a low frequency, the threshold is set to 4. The user-defined group names with a frequency greater than 4 are selected, giving 114 user-defined group names, and their frequencies are recorded. The selected user-defined group names are shown in Table 3:
Table 3 Selected user-defined group names
Step 2, determine whether the user-defined group names co-occur with the concepts in the website's classified catalogue. The operations are as follows:
1) Log in to the international site of Made-in-China.com, http://www.made-in-china.com/;
2) Enter the user-defined group name to be searched in the search box, select "Lights&Lighting" in the "all categories" drop-down menu, and click search;
3) Count the concepts of the second-level classified catalogue that appear under "catalog" in the search results;
4) Click each of the concepts appearing under "catalog" in turn; the concepts that then appear under "catalog" are the corresponding concepts in the third-level classified catalogue;
Record the concepts of the second-level and third-level classified catalogues that appear under "catalog" by entering 1 in the corresponding cells. This gives the original co-occurrence statistics table, part of which is shown in Table 4:
Table 4 Partial co-occurrence statistics
In the subsequent processing, the user-defined group names are handled in a similar way for the concepts in the second-level classified catalogue and for those in the third-level classified catalogue. The co-occurrence relations between the concepts in the second-level classified catalogue and the user-defined group names are taken as the example below.
Step 3, generate the co-occurrence matrix, specifically:
1) Determine the co-occurrence frequency between the concepts in the second-level classified catalogue and the user-defined group names: multiply the frequency of each user-defined group name by the frequency with which the second-level catalogue concept appears. Part of the result is shown in Table 5:
Table 5 Partial co-occurrence frequencies between second-level classified catalogue concepts and user-defined group names
2) Determine the co-occurrence frequency between the concepts in the second-level classified catalogue;
The calculation builds on the result of the previous step. For example, for the co-occurrence frequency of Interior lighting and LED lighting, whose columns in Excel are B and C, the formula is SUM(MIN(B, C)): first take the smaller value in each row of the two columns, then sum these values;
3) The co-occurrence frequency between user-defined group names is set to 0 throughout. The resulting co-occurrence matrix is shown in Table 6:
Table 6 Partial co-occurrence matrix of second-level classified catalogue concepts and user-defined group names
                            interior_lighting  led_lighting  lighting_fixtures  bulb_lamp  lighting_decoration
interior_lighting           -                  14441         6587               11403      10697
led_lighting                14441              -             6643               12204      11108
lighting_fixtures           6587               6643          -                  6467       5836
bulb_lamp                   11403              12204         6467               -          9433
lighting_decoration         10697              11108         5836               9433       -
outdoor_lighting            14640              17255         6620               12498      11189
camping_light               1116               1116          995                1100       1110
emergency_indicator_light   2245               2240          2129               2226       2205
torch                       653                653           582                645        648
portable_lighting           1364               1388          1289               1356       1353
Step 4, generate the similarity matrix: using SAS software, the Pearson correlation coefficient is selected for the calculation, giving the similarity matrix. Part of the result is shown in Table 7:
Table 7 Partial similarity matrix
Step 5, cluster analysis: using SAS software, the hierarchical clustering method is selected for the cluster analysis. The ward, complete and single methods, among others, were tried as the between-class distance method, and comparison showed that method=ward gives the best result. The samples are merged two classes at a time, and the results of the last 15 merges are shown in Table 8:
Table 8 SAS clustering results with method=ward
Based on three statistics of the hierarchical clustering method, the semi-partial R² statistic (SPRSQ), the pseudo-F statistic (PSF) and the pseudo-t² statistic (PST2), the optimal number of classes is chosen as 4. The clustering result contains 127 concepts in total, of which 114 are user-defined group names and 13 are second-level classified catalogue concepts. The optimal number of classes is 4, and the 13 second-level catalogue concepts fall into two of these classes. The clustering result is shown in Table 9 (concepts in bold are from the second-level classified catalogue; concepts not in bold are user-defined group names).
Table 9 Clustering result of user-defined group names and second-level catalogue concepts
The four classes marked in the clustering tree of Fig. 2 correspond to the clustering result in Table 9. The clustering result indicates that the concepts grouped into one class are the most strongly correlated with each other. For example, the eight concepts led_plug_light, induction_lamp, led_module, led_rigid_bar, led_moving_head, led_rope_light, led_dance_floor and led_recessed_light in the fourth class are grouped into one class, which shows that among all the concepts these eight are the most strongly correlated with each other and should be placed in the same category.
Step 6, multidimensional scaling analysis: this example uses the multidimensional scaling function of SAS software directly. Multidimensional scaling analysis can verify the accuracy of the clustering result and present the clustering result visually.
To make the multidimensional scaling result clearer, the concepts are replaced by the variables X1 to X127, with the variable numbering matching the concept numbering in the clustering result. The multidimensional scaling space diagram shows that the 127 concepts are divided into four classes. This result confirms the clustering result well and shows the clustering of the concepts more intuitively, thereby completing the classified catalogue optimization analysis for lighting products on Made-in-China.com.
As the above example shows, the method of the present invention uses website log data directly for user research, which saves the cost of user surveys and allows user information to be obtained comprehensively.

Claims (4)

1. a websites collection method for optimization analysis based on user's mental model, it is characterised in that step is as follows:
Step 1, web log file data are pre-processed, particularly as follows:
Step 1-1, web log file data are purified, delete number that is unrelated with analysis purpose in journal file or that there is mistake According to, the described data unrelated with analysis purpose include: comprise the data of concept in classified catalogue, comprise what product code represented Data;The described data that there is mistake include: misspelling, product description mistake;Select the genus that Data processing needs afterwards Property, described attribute includes user's name, user region, user cognition concept, product category, and described user cognition concept is for using The concept optimized about web catalogue that family is submitted to based on the cognition to web catalogue;
Step 1-2, data cleaned in step 1-1 are carried out format conversion, by the user cognition concept extracted and region, name The form claiming three attributes is unified, and specially removes numbering, capital and small letter plural number unification unified, single;
Step 1-3, determine the frequency that user cognition concept occurs, set threshold value afterwards, choose the frequency user more than this threshold value Cognitive concept, and record the frequency;
Step 2, the concept determined in user cognition concept and web catalogue whether co-occurrence, specifically utilizes mental model to divide Class is theoretical, user cognition concept is retrieved as search key to website, the classification mesh occurred in statistics retrieval result Concept in record and the frequency;
Step 3, generation co-occurrence matrix, described co-occurrence matrix is symmetrical matrix, and the first row and first is classified as concept, recognizes including user Knowing the concept in concept and web catalogue, the co-occurrence frequency that remaining element lattice are between concept, specially cell are corresponding The first row and first row in the co-occurrence frequency between concept;
Similarity matrix is generated on the basis of step 4, in step 3 co-occurrence matrix;
Step 5: on the basis of step 4, perform cluster analysis; specifically, cluster the similarity matrix by hierarchical (pedigree) clustering and then determine the clustering result for the concepts from the clustering statistics, where the concepts comprise the extracted user cognitive concepts and the concepts in the web catalogue. This step comprises:
Step 5-1: determine the distances between samples to form a symmetric distance matrix. Let $T_i$ and $T_j$ denote samples i and j, and let $d(T_i, T_j)$, abbreviated $d_{ij}$, denote the distance between them, computed with the variance-weighted distance formula.
Treat the N samples as N classes. Let $M_p$ and $M_q$ denote two classes containing $N_p$ and $N_q$ samples respectively, and let $D_{pq}$ denote the distance between $M_p$ and $M_q$. Compute the distance between every pair of samples to form a symmetric distance matrix D(0);
Step 5-2: merge classes and generate a new distance matrix. Specifically, select the smallest off-diagonal element of D(0) and assign its value to $D_{pq}$; at this point $M_p = \{X_p\}$ and $M_q = \{X_q\}$. Merge $M_p$ and $M_q$ into a new class $M_r = \{X_p, X_q\}$, delete from D(0) the rows and columns corresponding to $M_p$ and $M_q$, and add one row and one column made up of the distances between the new class $M_r$ and the remaining unmerged classes. The result is a new distance matrix D(1), a square matrix of order N-1;
Step 5-3: repeat step 5-2 until the N samples are merged into a single class;
Step 5-4: determine the clustering result for the concepts from the statistics of the hierarchical clustering method, namely the $R^2$ statistic, the semi-partial $R^2$ statistic, the pseudo-F statistic, and the pseudo-$t^2$ statistic;
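The merge loop of steps 5-2 and 5-3 can be sketched in plain Python as follows. The claim does not state how the distance between a merged class and the remaining classes is computed, so this sketch assumes single linkage (the minimum of the two old distances); the function name `agglomerate` is hypothetical, and the statistics of step 5-4 are not computed here.

```python
def agglomerate(dist, labels):
    """Repeatedly merge the two closest classes; return the merge history.

    dist: symmetric matrix (list of lists) of pairwise sample distances, D(0)
    labels: one name per sample
    Returns a list of (members of the new class, merge distance) tuples.
    """
    clusters = {k: [labels[k]] for k in range(len(labels))}
    # Class-to-class distances keyed by the unordered pair of class ids.
    d = {frozenset((i, j)): dist[i][j]
         for i in clusters for j in clusters if i < j}
    history = []
    next_id = len(labels)
    while len(clusters) > 1:
        pair = min(d, key=d.get)          # smallest off-diagonal element
        p, q = sorted(pair)
        height = d[pair]
        merged = clusters.pop(p) + clusters.pop(q)
        # Distances from the new class M_r to every remaining class
        # (single linkage is an assumption of this sketch).
        to_new = {k: min(d[frozenset((p, k))], d[frozenset((q, k))])
                  for k in clusters}
        # Delete the rows/columns of M_p and M_q, then add those of M_r.
        d = {key: v for key, v in d.items() if p not in key and q not in key}
        for k, v in to_new.items():
            d[frozenset((next_id, k))] = v
        clusters[next_id] = merged
        history.append((merged, height))
        next_id += 1
    return history


if __name__ == "__main__":
    names = ["a", "b", "c", "d"]
    D0 = [[0, 1, 4, 5],
          [1, 0, 3, 6],
          [4, 3, 0, 2],
          [5, 6, 2, 0]]
    for members, height in agglomerate(D0, names):
        print(members, height)
```

Because the input to step 5 is a similarity matrix, it would first have to be turned into distances; one common choice, assumed here rather than taken from the claim, is 1 minus the similarity.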
Step 6: analyse the similarity matrix of step 4 by multidimensional scaling to obtain a multidimensional scaling space map of the corresponding dimensionality, thereby completing the website classification optimization analysis. Analysing the similarity matrix by multidimensional scaling and generating the space map comprises:
Step 6-1: generate the observation matrix. Specifically, describe the space as a Euclidean stimulus space and compute distances with the Minkowski distance function. Take the subjects' perception of the relations between concepts in the web catalogue as the basic input data. With n objects there are $n(n-1)/2$ distances $S_{ij}$ between object pairs; the distance between points i and j is denoted $d_{ij}$ and, in the Euclidean case, is given by:
$$d_{ij} = \left[ \sum_{a=1}^{v} \left( X_{ia} - X_{ja} \right)^{2} \right]^{1/2}$$
where $v$ is the number of dimensions, $X_{ia}$ is the coordinate of point i on dimension a, and $X_{ja}$ is the coordinate of point j on dimension a;
Step 6-2: homomorphic mapping. Specifically, find a reduced q-dimensional space and apply a homomorphic mapping so that the distances $d_{ij}$ in the q-dimensional space match the original distances $S_{ij}$, where $d_{ij}$ is the distance between an object pair in the q-dimensional space. If $d_{ij}$ matches $S_{ij}$ completely, the distances between objects satisfy $d_{i1} > d_{i2} > \dots > d_{im}$, that is, this descending order of distances agrees with the ascending order of the original similarities;
Step 6-3: test reliability and validity and determine the optimal number of dimensions. Specifically, compute the degree of difference K, known as the Kruskal coefficient, which tests whether the resulting space map is representative, and the stress index (Stress), a goodness-of-fit measure defined as the deviation between the distances implied by the similarity-evaluation data and the computed distances. In Kruskal's standard form, Stress is:
$$\mathrm{Stress} = \sqrt{ \frac{\sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^{2}}{\sum_{i<j} d_{ij}^{2}} }$$
where $\hat{d}_{ij}$ is the reference value that satisfies the order relation of the concept distances originally input by the subjects while minimizing the stress index;
Step 6-4: generate the multidimensional scaling space map using the optimal number of dimensions determined in step 6-3.
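Step 6 can be illustrated with a short NumPy sketch. Two simplifications are assumed and should be read as such: the embedding uses classical (Torgerson) multidimensional scaling rather than the nonmetric, order-preserving fit described in step 6-2, and the stress value compares the embedded distances directly with the input dissimilarities instead of with monotone-regression reference values. The function names are hypothetical.

```python
import numpy as np


def pairwise_euclidean(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, q):
    """Embed n objects in q dimensions from an n x n dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:q]       # q largest eigenvalues
    scale = np.sqrt(np.clip(vals[order], 0.0, None))
    return vecs[:, order] * scale


def stress(D, X):
    """Stress of configuration X against dissimilarities D (simplified)."""
    D = np.asarray(D, dtype=float)
    d = pairwise_euclidean(X)
    iu = np.triu_indices_from(D, k=1)        # each pair i < j once
    return float(np.sqrt(((d[iu] - D[iu]) ** 2).sum() / (d[iu] ** 2).sum()))


if __name__ == "__main__":
    pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
    D = pairwise_euclidean(pts)
    for q in (1, 2):
        print(q, round(stress(D, classical_mds(D, q)), 4))
```

Comparing the stress across candidate values of q, as in the loop above, mirrors the dimension selection of step 6-3: the smallest q whose stress is acceptably low would be used for the space map. A similarity matrix would first need converting to dissimilarities, for example as 1 minus the similarity, which is an assumption of this sketch.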
2. The website classification optimization analysis method based on the user's mental model according to claim 1, characterized in that step 2 relies on mental model classification theory: when users use the classified catalogue of a website to obtain information, they mainly click in horizontal, vertical, or evenly mixed horizontal-vertical patterns and, while clicking, choose concepts that are highly relevant to one another in the classified catalogue. On this principle, each user cognitive concept is submitted to the website as a search keyword, and the classified-catalogue concepts appearing in the search results are counted together with their frequencies, in order to analyse the correlation between the user cognitive concepts and the concepts in the web catalogue.
3. The website classification optimization analysis method based on the user's mental model according to claim 1, characterized in that the co-occurrence frequencies between concepts in the co-occurrence matrix of step 3 are determined as follows:
Step 3-1: determine the co-occurrence frequency between a user cognitive concept and a concept in the classified catalogue, distinguishing two cases:
In the first case, the co-occurrence frequency between a user cognitive concept and a concept in the second-level classified catalogue, denoted F1, is
F1 = p * x, where p is the frequency with which the second-level catalogue concept occurs in the search results and
x is the frequency with which the user cognitive concept occurs;
In the second case, the co-occurrence frequency between a user cognitive concept and a concept in the third-level classified catalogue is the frequency with which the user cognitive concept occurs;
Step 3-2: determine the co-occurrence frequency between two concepts in the classified catalogue, denoted F2: for each user cognitive concept, take the smaller of the two catalogue concepts' co-occurrence frequencies with that user cognitive concept, then sum these values over all user cognitive concepts. With m and n denoting the co-occurrence frequencies of catalogue concepts A and B, respectively, with a user cognitive concept, the formula is:
F2 = SUM(MIN(m, n))
Step 3-3: determine the co-occurrence frequency between user cognitive concepts, which is 0.
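The three rules of steps 3-1 to 3-3 can be sketched as a small NumPy routine. The input layout (`user_freq`, `hits`) and the function name `build_cooccurrence` are assumptions made for illustration, and the diagonal, which the claim does not specify, is simply left at 0.

```python
import numpy as np


def build_cooccurrence(user_freq, hits, catalogue_concepts):
    """Return (labels, matrix) for the symmetric co-occurrence matrix.

    user_freq: {user concept: frequency x recorded in step 1-3}
    hits: {(user concept, catalogue concept): (level, p)}, where level is 2 or 3
          and p is how often the catalogue concept appeared in the search results
    """
    users = list(user_freq)
    cats = list(catalogue_concepts)
    labels = users + cats
    pos = {name: k for k, name in enumerate(labels)}
    m = np.zeros((len(labels), len(labels)))

    # Step 3-1: user concept vs. catalogue concept.
    for (user, cat), (level, p) in hits.items():
        x = user_freq[user]
        f1 = p * x if level == 2 else x   # level 2: F1 = p * x; level 3: x
        m[pos[user], pos[cat]] = m[pos[cat], pos[user]] = f1

    # Step 3-2: catalogue concept vs. catalogue concept, F2 = SUM(MIN(m, n)).
    for a in cats:
        for b in cats:
            if a != b:
                m[pos[a], pos[b]] = sum(
                    min(m[pos[u], pos[a]], m[pos[u], pos[b]]) for u in users
                )

    # Step 3-3: user concept vs. user concept stays 0.
    return labels, m


if __name__ == "__main__":
    user_freq = {"novel": 4, "soap": 2}
    hits = {("novel", "books"): (2, 3),
            ("novel", "fiction"): (3, 1),
            ("soap", "books"): (2, 1)}
    labels, m = build_cooccurrence(user_freq, hits, ["books", "fiction"])
    print(labels)
    print(m)
```

In this toy input, "novel" co-occurs with the second-level concept "books" with F1 = 3 * 4 = 12 and with the third-level concept "fiction" with frequency 4, so F2 for "books" and "fiction" is min(12, 4) + min(2, 0) = 4.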
4. The website classification optimization analysis method based on the user's mental model according to claim 1, characterized in that the similarity matrix in step 4 is generated using the Pearson correlation coefficient as the similarity measure, computed as:
$$r = \frac{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)\left( Y_i - \bar{Y} \right)}{\sqrt{\sum_{i=1}^{n} \left( X_i - \bar{X} \right)^{2} \, \sum_{i=1}^{n} \left( Y_i - \bar{Y} \right)^{2}}}$$
where r measures the strength of the linear correlation between the two variables and in general satisfies $0 \le |r| \le 1$, n is the sample size, $X_i$ and $Y_i$ are the observations of the two variables, and $\bar{X}$ and $\bar{Y}$ are their means.
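As an illustration of step 4, the following sketch computes the Pearson coefficient between two rows of a co-occurrence matrix and builds the full similarity matrix with `numpy.corrcoef`, which treats each row as one variable. The helper names are hypothetical, and the small matrix is made-up example data, not data from the patent.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient between two equally long vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


def similarity_matrix(cooc):
    """Row-by-row Pearson similarity of a co-occurrence matrix."""
    return np.corrcoef(np.asarray(cooc, dtype=float))


if __name__ == "__main__":
    cooc = [[0, 12, 4],
            [12, 0, 4],
            [4, 4, 0]]
    print(np.round(similarity_matrix(cooc), 3))
```

A row with no variation (for example, all zeros) makes the denominator zero and yields NaN, so such concepts would need to be filtered out or handled separately before clustering.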
CN201210413774.8A 2012-10-25 2012-10-25 A kind of websites collection method for optimization analysis based on user's mental model Active CN102937985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210413774.8A CN102937985B (en) 2012-10-25 2012-10-25 A kind of websites collection method for optimization analysis based on user's mental model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210413774.8A CN102937985B (en) 2012-10-25 2012-10-25 A kind of websites collection method for optimization analysis based on user's mental model

Publications (2)

Publication Number Publication Date
CN102937985A CN102937985A (en) 2013-02-20
CN102937985B true CN102937985B (en) 2016-07-06

Family

ID=47696882

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210413774.8A Active CN102937985B (en) 2012-10-25 2012-10-25 A kind of websites collection method for optimization analysis based on user's mental model

Country Status (1)

Country Link
CN (1) CN102937985B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793504B (en) * 2014-01-24 2018-02-27 北京理工大学 A kind of cluster initial point system of selection based on user preference and item attribute
CN104199828B (en) * 2014-07-26 2017-07-07 复旦大学 A kind of community network construction method based on transaction journal data
CN106202572B (en) * 2016-08-18 2020-03-06 广州视睿电子科技有限公司 Method and device for displaying e-book catalog
CN109166180B (en) * 2018-08-03 2022-12-13 贵州大学 VR system user experience design method under drive of mental model
CN112347318B (en) * 2020-10-26 2022-08-02 杭州数智政通科技有限公司 Method, device and medium for dividing industry classes of enterprises

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
《信息用户检索决策中的心智模型分析》;甘利人等;《信息用户检索决策中的心智模型分析》;20100831;第29卷(第4期);641-651 *
《网站用户信息获取中的心智模型研究》;吴鹏等;《情报学报》;20110930;第30卷(第9期);935-945 *

Also Published As

Publication number Publication date
CN102937985A (en) 2013-02-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Wu Peng

Inventor after: Zhang Peipei

Inventor after: Zhang Jingjing

Inventor before: Wu Peng

Inventor before: Zhang Peipei

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: WU PENG ZHANG PEIPEI TO: WU PENG ZHANG PEIPEI ZHANG JINGJING

C53 Correction of patent for invention or patent application
CB03 Change of inventor or designer information

Inventor after: Wu Peng

Inventor after: Zhang Peipei

Inventor after: Zhang Jingjing

Inventor after: Zeng Huaxiang

Inventor before: Wu Peng

Inventor before: Zhang Peipei

Inventor before: Zhang Jingjing

COR Change of bibliographic data

Free format text: CORRECT: INVENTOR; FROM: WU PENG ZHANG PEIPEI ZHANG JINGJING TO: WU PENG ZHANG PEIPEI ZHANG JINGJING CENG HUAXIANG

C14 Grant of patent or utility model
GR01 Patent grant