CN106682129B - Hierarchical concept vectorization increment processing method in personal big data management - Google Patents

Hierarchical concept vectorization increment processing method in personal big data management

Info

Publication number
CN106682129B
Authority
CN
China
Prior art keywords
concept
vector
node
concepts
tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611154347.7A
Other languages
Chinese (zh)
Other versions
CN106682129A (en)
Inventor
杨良怀
汪庆顺
庄慧
范玉雷
龚卫华
方文菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN201611154347.7A priority Critical patent/CN106682129B/en
Publication of CN106682129A publication Critical patent/CN106682129A/en
Application granted granted Critical
Publication of CN106682129B publication Critical patent/CN106682129B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3347 - Query execution using vector based model

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The hierarchical concept vectorization incremental processing method in personal big data management comprises the following steps: 1) when the system first runs, vectorize all concepts and perform the concept vector merging operation on all branch nodes; 2) execute the following whenever the user operates on the concept tree: 2.1) obtain the concept vector and total word count of the operated node and of its parent node; 2.2) modify the concept vector of the parent node according to a formula; 2.3) take the parent node as the operated node and execute recursively from 2.1) up to the root node; 2.4) update the inverse document frequency vector; 3) when the accumulated error reaches a certain extent: 3.1) acquire the current inverse document frequency vector and the inverse document frequency initial value vector; 3.2) update all vector weights in the vector space in batch; 3.3) update the inverse document frequency initial value vector. The invention realizes a hierarchical concept vectorization incremental calculation method in personal big data management, can quickly adjust concept vectors in the concept space, and improves execution efficiency.

Description

Hierarchical concept vectorization increment processing method in personal big data management
Technical Field
The invention relates to the management, organization, query and retrieval technology of personal big data, in particular to a hierarchical concept vectorization method based on a vector space model and an incremental calculation method thereof.
Background
With the development of information technology, personal data is growing explosively, including personal documents (text, images, voice), mails, health data, personal mobile phone contact information (WeChat, QQ), Internet data and the like; we have entered the age of personal big data. The spread of wearable devices will further accelerate this growth, as people can record what they smell and see and collect physiological health data all day long. How to manage and organize personal big data so that accurate, appropriate, complete and high-quality information can always be obtained in the right place through simple operations is the goal of a personal information management system. However, even for the electronic documents piled up by individuals, current approaches fall short of this goal. For example, as time passes, people's memory of previously stored information gradually becomes fuzzy, while existing retrieval tools rely on keyword matching; they cannot fully exploit the fuzzy and associative query clues in the user's mind, and retrieval efficiency is therefore often low. In addition, information retrieval based on exact matching can hardly help the user discover potentially relevant information.
In the face of mass data, data abstraction helps to grasp and understand the data. The personal big data management system provided by the invention organizes data effectively with a concept space, where a concept refers to a set formed by information resources that are similar or related to one another; such a set can represent a certain class of things, affairs, tasks and the like. The user can establish a series of concepts according to work needs, personal preferences, personal habits and so on; the concepts are associated with each other and with their respective data sources, so that a whole semantic association network is formed and efficient management of personal information is realized. Associations between concepts may be superordinate-subordinate (upper-lower) relations, identity, proper-inclusion relations in both directions, cross relations, aggregation and so on; among these, the upper-lower relation is the one most used for information management. The concept space is composed of the concepts and the semantic relation network that takes the concepts as nodes. In practice, each concept is taken as a node and a multi-level tree structure is organized according to the upper-lower relations among the concepts, which is referred to as a concept tree in the invention and is easy for users to accept and use. How to fully utilize the semantics contained in the concepts to improve query quality is an issue worth considering.
For unstructured text data, document vectorization is a technique that can exploit the semantic information contained in documents and is a basic technique for addressing the above problems. In document vectorization, a document is regarded as a set of feature items (words), processing of document content is reduced to vector operations in a vector space, the semantic similarity of texts is expressed by their similarity in the vector space, and semantically related documents can be provided to users, which broadens information retrieval. The semantic similarity can also serve as a clue for further retrieval, guiding the user and deepening information retrieval. The document vectorization method can be extended to a concept space: a concept can be treated analogously to a document, and concept vectorization can then be carried out.
In general, the concept vectorization process is computationally expensive because a concept usually has thousands of feature items. If the traditional document vectorization method is used for concept vectorization, then whenever the number of concepts changes, for example when a new concept is added or an old concept is deleted, all existing concept vectors deviate; and if the vector space is reconstructed from scratch, the amount of computation is usually large.
In addition, most conventional document vectorization techniques are based on a single-level document classification structure and are not suitable for direct application to a concept tree. In a concept tree, when the concept corresponding to a branch node is vectorized, in order to express the semantic information of the concept more completely, the concepts corresponding to its lower-level nodes should be fused in addition to calculating the concept corresponding to the node itself; concretely, the concept vector of the branch node is merged with the concept vectors of its child nodes.
The invention aims to solve the problem of efficient concept vectorization in personal big data management, and develops a concept vectorization method based on a hierarchical concept structure on top of the vector space model. To address the deviation of the vector space caused by changes of the concept tree structure, a vector increment calculation method is introduced to adjust the vector space efficiently, and the errors generated during incremental calculation are accumulated and then repaired.
Disclosure of Invention
To overcome the defects that the existing document vectorization technology is not suitable for a concept tree structure and that reconstructing the vector space after the concept tree structure changes requires a huge amount of computation, the invention provides a concept vectorization method for massive personal big data based on a hierarchical concept structure.
The hierarchical concept vectorization incremental processing method in personal big data management is applied to the concept space layer of a personal big data management model. The invention can be divided into a vector space initialization stage and a vector increment calculation stage, and the vector space initialization stage can be further subdivided into a preprocessing stage and a concept vector merging stage. In the preprocessing stage, the concept of each node in the concept tree is vectorized and expressed as a concept vector, and the total word count of each node and the inverse document frequency of each feature item are recorded; the weight of each feature item is calculated with the tf-idf method during vectorization, and the total word count of a node refers to the total number of words contained in the concept corresponding to that node. It should be noted that a concept may contain multiple documents, and all documents within the same concept are computed as a whole. In the preprocessing stage, only the concept corresponding to the node itself is calculated for a branch node. The concept vector merging phase comprises running the following steps on a computer:
1) taking a root node of the concept tree as a target node;
2) for the target node, obtain all of its m child nodes C1, C2, …, Cm;
3) obtain the concept vectors VC1, VC2, …, VCm corresponding to C1, C2, …, Cm, and the concept vector V corresponding to the target node;
(3.1) if a child node Ci is a branch node whose corresponding concept vector has not yet been merged, take Ci as the target node and execute from step 2) to merge its concept vector first;
4) calculate the sum L of the total word counts of the target node and all of its child nodes, and create a new concept vector Vnew in the vector space;
5) assume there are n different feature terms T1, T2, …, Tn in the vector space; for a given concept vector V, the weight corresponding to feature term Ti is denoted V.Wi, the total word count of V is denoted LV, and the total word count of VCi is denoted LCi; calculate Vnew.Wi = (V.Wi*LV + VC1.Wi*LC1 + VC2.Wi*LC2 + … + VCm.Wi*LCm)/L, where i = 1, 2, …, n (a small numeric illustration of this formula is given after these steps);
6) change the concept vector corresponding to the target node to Vnew and change its total word count to L.
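By way of an illustrative numeric example (the values are hypothetical, not taken from the invention): suppose the target node has LV = 100 and V.Wi = 0.30 for some feature term Ti, and it has two children with LC1 = 50, VC1.Wi = 0.10 and LC2 = 50, VC2.Wi = 0.50. Then L = 200 and Vnew.Wi = (0.30*100 + 0.10*50 + 0.50*50)/200 = (30 + 5 + 25)/200 = 0.30; the merged weight is simply the word-count-weighted average of the weights of the target node and its children.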
The vector increment calculation stage can be divided into an incremental calculation process and an error completion process. The incremental calculation process is performed immediately after each update operation that the user performs on the concept tree. Update operations on the concept tree comprise adding, deleting or moving concept nodes, wherein moving a concept node is regarded as a two-step operation of deleting and then adding. For adding or deleting a node, the following steps are executed on the computer:
A1. take the node Nc to be added or deleted as the target node;
A2. find the parent node Np of the target node; if Np does not exist, end the incremental calculation process;
A3. obtain the concept vector Vc and total word count Lc corresponding to Nc, and the concept vector Vp and total word count Lp corresponding to Np;
A4. suppose the vector space contains n different feature items in total, denoted T1, T2, …, Tn, with corresponding weight components W1, W2, …, Wn; perform the following operations on Vp:
(A4.1) if it is an add-node operation, Vp.Wi = (Lp*Vp.Wi + Lc*Vc.Wi)/(Lp + Lc), i = 1, 2, …, n, and the total word count of Np is changed to (Lp + Lc);
(A4.2) if it is a delete-node operation, Vp.Wi = (Lp*Vp.Wi - Lc*Vc.Wi)/(Lp - Lc), i = 1, 2, …, n, and the total word count of Np is changed to (Lp - Lc);
A5. take Np as the target node and execute from step A2 (a short check of these two formulas is given after these steps).
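For intuition (this is an observation, not an additional claimed step): applying (A4.1) and then (A4.2) for the same node shows that a delete exactly inverts an add at a given parent. After the add, Vp'.Wi = (Lp*Vp.Wi + Lc*Vc.Wi)/(Lp + Lc) and Lp' = Lp + Lc; the delete then yields (Lp'*Vp'.Wi - Lc*Vc.Wi)/(Lp' - Lc) = (Lp*Vp.Wi + Lc*Vc.Wi - Lc*Vc.Wi)/Lp = Vp.Wi, restoring the parent vector and word count exactly. The residual error that accumulates during incremental calculation therefore stems from the inverse document frequencies, which are not recomputed here and are repaired by the error completion process described below.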
Further, the error completion process can be subdivided into an inverse document frequency error accumulation vector updating part and a feature item weight batch updating part. Note that the inverse document frequency follows the convention of the conventional tf-idf algorithm; in the present invention, the inverse document frequency of a feature item is calculated from the total number of concepts and the number of concepts containing that feature item. Here, a "concept" corresponds to a "document" in the conventional tf-idf algorithm.
There are several global values in the whole concept space, including the inverse document frequency vector Vidf and the inverse document frequency initial value vector Vini. Suppose there are n different feature items in the vector space, denoted T1, T2, …, Tn. Given a concept vector V, the weight corresponding to feature item Ti is denoted V.Wi; for feature item Ti, the total number of concepts containing Ti is denoted Ti.F. The inverse document frequency error accumulation vector updating part is executed immediately after each incremental calculation process ends, and comprises the following steps executed on a computer:
D1. obtain the total number of concepts A in the current concept tree, the inverse document frequency vector Vidf and the inverse document frequency initial value vector Vini;
D2. perform the following operations on Vidf and Vini:
(D2.1) if Vidf.Wi == 0, then Vini.Wi = log((A/(Ti.F + 0.01)) + 0.01), i = 1, 2, …, n;
(D2.2) Vidf.Wi = log((A/(Ti.F + 0.01)) + 0.01), i = 1, 2, …, n.
The feature item weight batch updating part is executed after several incremental calculation processes; it does not need to run immediately after a particular incremental calculation finishes, and its frequency can be adjusted as required. The execution process comprises the following steps executed on a computer:
E1. obtain the current inverse document frequency vector Vidf and the inverse document frequency initial value vector Vini;
E2. for each node N in the concept tree, perform the following operation on its corresponding concept vector V:
V.Wi = V.Wi * Vidf.Wi / Vini.Wi, i = 1, 2, …, n;
E3. Vini.Wi = Vidf.Wi, i = 1, 2, …, n (the rationale for this rescaling is sketched below).
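The rationale (an observation under the stated tf-idf convention, not an additional claimed step): a stored weight has the form V.Wi = tf_i * Vini.Wi, where the term frequency component tf_i is not affected by structural changes of the concept tree; multiplying by Vidf.Wi / Vini.Wi therefore yields tf_i * Vidf.Wi, i.e. the weight re-expressed with the current inverse document frequency, and step E3 resets Vini to Vidf so that later repairs start from this new baseline.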
further, the personal big data management model is used for completing a series of functions of organization, storage, management, processing and the like of personal information. The personal big data management model comprises a resource layer, a concept space layer and an application layer:
F1) the resource layer includes a large amount of personal data stored in the DBMS, file system, and other systems. Wherein the personal information in the file system includes textual data and non-textual data. The text data comprises data such as email, pdf files, office files and html files, and the non-text data comprises data such as video, audio and pictures;
F2) in the concept space layer, a concept refers to a set of information resources that are similar or related to one another; the layer uses concepts to uniformly identify data of different types and formats and establishes associations between them, which facilitates the abstraction and management of information resources by users;
F3) the application layer is responsible for interacting with a user and providing applications including navigation technology, visualization technology, editing tools and the like.
The concept space layer organizes personal information in a concept tree manner. The concept tree is formed by semantic associations between concepts. Thus, the concept tree satisfies the following condition:
G1) the hierarchical relation of all concepts forms a tree structure, the nodes in the tree represent the concepts, and the edges represent the upper and lower relations among the concepts;
G2) the root node is used as a concept complete set identifier, the branch node is a concept with lower child nodes, and the leaf node is a concept without child nodes;
G3) each branch node has no less than one child node.
Still further, the whole stage takes a vector space model as a support. The vector space model comprises four parts of constructing concept vectors, storing the concept vectors, maintaining the concept vectors and calculating the similarity:
H1) the constructed concept vector is a vector formed by representing concepts into feature items and feature weights according to information resource sets contained in the concepts;
H2) the concept vector storage is to store the related information of the concept vector obtained in the concept vector construction process into a database;
H3) the maintenance concept vector is used for reflecting the changes to the concept vector of the related concept after the concept tree structure is changed and accumulated for a certain number of times;
H4) the similarity calculation is to calculate the similarity between the selected concept and other concepts according to the concept vector of the selected concept and other concepts.
Compared with the prior art, the invention extends the document vectorization method of the existing single-level classification structure, realizes concept vectorization for a hierarchical concept structure, and provides the basis for similarity calculation between concepts. For massive personal big data, a vector increment calculation method is provided that can efficiently adjust the vector space according to changes of the concept tree structure; the errors generated in the incremental calculation process are accumulated, and after they reach a certain degree the vector space is updated in batches to repair them. This not only ensures the vectorization effect but also greatly reduces the amount of calculation.
The invention has the advantages that: the concept vector in the concept space in the massive big data processing can be adjusted rapidly, and the execution efficiency is improved.
Drawings
FIG. 1 is a schematic diagram of the personal big data management model and vector space model of the present invention.
FIG. 2 is a schematic diagram of feature vectors in the vector space model of the present invention.
FIG. 3 is a general flow diagram of the method of the present invention.
Fig. 4 is a flow chart of the concept vector merging phase in the present invention.
FIG. 5 is a flow chart of the incremental computation process of the present invention.
FIG. 6 is a flowchart of the updating portion of the inverse document frequency error accumulation vector in the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Referring to fig. 1, a concept vectorization method based on a hierarchical concept structure is applied to a concept space layer of a personal big data management model. The personal big data management model is used for finishing a series of functions of organization, storage, management, processing and the like of personal information, and comprises a resource layer, a concept space layer and an application layer:
F1. the resource layer includes personal information stored in the DBMS, file system, and other systems. The personal information in the file system comprises text data and non-text data, wherein the text data comprises data such as email, pdf files, office files and html files, and the non-text data comprises data such as video, audio and pictures;
F2. in the concept space layer, a concept refers to a set of information resources that are similar or related to one another; the layer uses concepts to uniformly identify data of different types and formats and establishes associations between them, which facilitates the abstraction and management of information resources by users. The concept space layer organizes the personal data space in a concept tree manner. The concept tree is formed by semantic associations between concepts. Thus, the concept tree satisfies the following conditions: the hierarchical relations of all concepts form a tree structure, the nodes in the tree represent concepts, and the edges represent the upper and lower relations among the concepts; the root node serves as the identifier of the complete set of concepts, a branch node is a concept with lower-level child nodes, and a leaf node is a concept without child nodes; each branch node has no less than one child node.
F3. The application layer is responsible for interacting with users and providing applications including navigation technology, visualization technology, editing tools and the like. Visualization techniques present a concept tree of concept space layers and provide view support for navigation techniques, editing tools. Editing tools provide operations to add concepts, render concepts, establish semantic associations, merge concepts, move concepts, and the like.
Concept vectorization based on a hierarchical concept structure includes a vector space initialization stage and a vector increment calculation stage.
The vector space initialization stage may be further subdivided into a pre-processing stage and a conceptual vector merging stage. In the preprocessing stage, a vector space model is used as a support, the concept of each node on the concept tree is vectorized and expressed into a concept vector, and the total number of words of each node is recorded. Referring to fig. 1, the vector space model includes four parts of constructing concept vectors, storing concept vectors, maintaining concept vectors, and calculating similarity:
G1. constructing a concept vector means representing a concept as a vector consisting of feature items and corresponding weights, according to the personal information collection it contains. If the personal information is text data, the following steps can be adopted to construct the concept vector:
G11) perform word segmentation on the personal information text data with a word segmenter to obtain the feature items;
G12) calculate the weight of each feature item with the tf-idf method; the weight of feature t in concept d is tf*idf, where tf represents the frequency of occurrence of feature t in concept d, and idf represents the inverse document frequency, whose value is log((N/(A + 0.01)) + 0.01); N represents the total number of concepts contained in the concept tree, A represents the number of concepts containing feature t, and the +0.01 terms prevent the denominator or the argument of the logarithm from being less than or equal to 0 (a minimal sketch of this weighting is given after this list);
G13) select feature items with the information gain method; information gain is an index commonly used in machine learning to measure the importance of a feature item, and the amount of information carried by a feature item is calculated according to whether the feature appears in a text or not;
G14) according to the personal information file set contained in the concept, assign a weight to each feature item, so that the concept is also represented as a vector consisting of feature items and feature weights; each row in fig. 2 is a feature vector representing one concept, and each component is the weight corresponding to one feature item;
G2. storing the characteristic vector is to store the related information of the concept vector obtained in the process of constructing the concept vector into a database;
G3. maintaining concept vectors is to reflect changes to concept vectors of related concepts after the concept tree structure is changed and accumulated for a certain number of times;
G4. the similarity calculation is to calculate the similarity between the selected concept and other concepts based on the concept vector of the selected concept and other concepts.
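As referenced in G12), the following is a minimal Python sketch of this weighting under the stated convention; the tokenized input, the helper name and the data structures are illustrative assumptions, not part of the invention (feature selection by information gain, G13, is omitted).

```python
import math
from collections import Counter

def build_concept_vectors(concept_terms):
    """concept_terms: dict mapping a concept id to its list of feature items
    (already produced by a word segmenter, step G11).  Returns a dict mapping
    each concept id to {feature item: tf-idf weight}.  Illustrative helper only."""
    n_concepts = len(concept_terms)                       # N: total number of concepts
    containing = Counter()                                # A(t): concepts containing item t
    for terms in concept_terms.values():
        containing.update(set(terms))
    # idf(t) = log((N / (A + 0.01)) + 0.01), the convention stated in G12)
    idf = {t: math.log((n_concepts / (a + 0.01)) + 0.01) for t, a in containing.items()}
    vectors = {}
    for cid, terms in concept_terms.items():
        counts = Counter(terms)
        total = len(terms)                                # total word count of the concept
        # tf(t, d): relative frequency of item t within concept d
        vectors[cid] = {t: (c / total) * idf[t] for t, c in counts.items()}
    return vectors
```

The similarity calculation of G4. can then be performed on such vectors, for example with the cosine measure, which is one common choice for vector space models.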
Referring to fig. 4, the concept vector merging phase comprises running on a computer the following steps:
1) taking a root node of the concept tree as a target node;
2) for the target node, obtain all of its m child nodes C1, C2, …, Cm;
3) obtain the concept vectors VC1, VC2, …, VCm corresponding to C1, C2, …, Cm, and the concept vector V corresponding to the target node;
(3.1) if a child node Ci is a branch node whose corresponding concept vector has not yet been merged, take Ci as the target node and execute from step 2) to merge its concept vector first;
4) calculate the sum L of the total word counts of the target node and all of its child nodes, and create a new concept vector Vnew in the vector space;
5) assume there are n different feature terms T1, T2, …, Tn in the vector space; for a given concept vector V, the weight corresponding to feature term Ti is denoted V.Wi, the total word count of V is denoted LV, and the total word count of VCi is denoted LCi; calculate Vnew.Wi = (V.Wi*LV + VC1.Wi*LC1 + VC2.Wi*LC2 + … + VCm.Wi*LCm)/L, where i = 1, 2, …, n;
6) change the concept vector corresponding to the target node to Vnew and change its total word count to L. A minimal sketch of this merging procedure is given below.
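The following is a minimal Python sketch of steps 1)-6), written as a post-order recursion; the node structure (vector as a term-to-weight dictionary, words as the total word count, children as a list) and the function name are illustrative assumptions, not part of the invention.

```python
def merge_concept_vectors(node):
    """Steps 1)-6): recursively merge the concept vectors of all child nodes into
    each branch node.  node.vector: {feature term: weight}; node.words: total word
    count of the node's own concept; node.children: list of child nodes."""
    if not node.children:                                 # leaf node: nothing to merge
        return
    for child in node.children:                           # step (3.1): merge branch children first
        merge_concept_vectors(child)
    total = node.words + sum(c.words for c in node.children)   # step 4): the sum L
    merged = {}
    parts = [(node.vector, node.words)] + [(c.vector, c.words) for c in node.children]
    for vec, words in parts:                              # step 5): word-count-weighted average
        for term, w in vec.items():
            merged[term] = merged.get(term, 0.0) + w * words
    node.vector = {t: w / total for t, w in merged.items()}    # step 6): replace the vector
    node.words = total                                    # step 6): new total word count
```

Calling merge_concept_vectors on the root node corresponds to taking the root as the initial target node in step 1).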
The vector increment calculation stage can be divided into an increment calculation process and an error completion process. Referring to fig. 5, the incremental computation is performed after each operation (such as adding, deleting, and moving a concept node) performed on the concept tree by the user, and includes the following steps (the following operations are performed after adding or deleting a node) executed on the computer:
A1. take the node Nc to be added or deleted as the target node;
A2. find the parent node Np of the target node; if Np does not exist, end the incremental calculation process;
A3. obtain the concept vector Vc and total word count Lc corresponding to Nc, and the concept vector Vp and total word count Lp corresponding to Np;
A4. suppose the vector space contains n different feature items in total, denoted T1, T2, …, Tn, with corresponding weight components W1, W2, …, Wn; perform the following operations on Vp:
(4.1) if it is an add-node operation, Vp.Wi = (Lp*Vp.Wi + Lc*Vc.Wi)/(Lp + Lc), i = 1, 2, …, n, and the total word count of Np is changed to (Lp + Lc);
(4.2) if it is a delete-node operation, Vp.Wi = (Lp*Vp.Wi - Lc*Vc.Wi)/(Lp - Lc), i = 1, 2, …, n, and the total word count of Np is changed to (Lp - Lc);
A5. take Np as the target node and execute from step A2.
In particular, a move-node operation can be regarded as a delete-node operation followed by an add-node operation, with the same node as the target of both operations. A minimal sketch of the incremental update is given below.
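A minimal Python sketch of steps A1-A5, under the same illustrative node structure as above (with an additional parent link); it follows the reading in which the vector and word count of the added or deleted node Nc are folded into, or removed from, each ancestor in turn, which keeps every ancestor's merged average exact.

```python
def incremental_update(node, added=True):
    """Steps A1-A5: after node Nc is added to (or deleted from) the concept tree,
    fold its vector into, or remove it from, every ancestor's merged vector.
    node.vector, node.words and node.parent are illustrative attributes."""
    vc, lc = node.vector, node.words                      # A1/A3: Vc and Lc of the operated node
    sign = 1 if added else -1
    parent = node.parent
    while parent is not None:                             # A2: stop above the root
        lp = parent.words
        new_words = lp + sign * lc                        # (A4.1)/(A4.2): new total word count
        terms = set(parent.vector) | set(vc)
        parent.vector = {
            t: (lp * parent.vector.get(t, 0.0) + sign * lc * vc.get(t, 0.0)) / new_words
            for t in terms
        }
        parent.words = new_words
        parent = parent.parent                            # A5: continue toward the root
```

A move can then be expressed as incremental_update(node, added=False) at the old position followed by incremental_update(node, added=True) at the new one.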
Further, the error completion process can be subdivided into an inverse document frequency error accumulation vector updating part and a feature item weight batch updating part. Note that the inverse document frequency follows the convention of the conventional tf-idf algorithm; in the present invention, the inverse document frequency of a feature item is calculated from the total number of concepts and the number of concepts containing that feature item. Here, a "concept" corresponds to a "document" in the conventional tf-idf algorithm.
There are several global values in the whole concept space, including the inverse document frequency vector Vidf and the inverse document frequency initial value vector Vini. Suppose there are n different feature items in the vector space, denoted T1, T2, …, Tn. Given a concept vector V, the weight corresponding to feature item Ti is denoted V.Wi; for feature item Ti, the total number of concepts containing Ti is denoted Ti.F. The inverse document frequency error accumulation vector updating part is executed immediately after each incremental calculation process ends; referring to fig. 6, it comprises the following steps executed on a computer:
D3. obtain the total number of concepts A in the current concept tree, the inverse document frequency vector Vidf and the inverse document frequency initial value vector Vini;
D4. perform the following operations on Vidf and Vini:
(D2.1) if Vidf.Wi == 0, then Vini.Wi = log((A/(Ti.F + 0.01)) + 0.01), i = 1, 2, …, n;
(D2.2) Vidf.Wi = log((A/(Ti.F + 0.01)) + 0.01), i = 1, 2, …, n. A minimal sketch of this updating part is given below.
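A minimal Python sketch of these D steps, assuming Vidf and Vini are dictionaries keyed by feature item and that the count of concepts containing each item (the role of Ti.F) is available; all names are illustrative assumptions, not part of the invention.

```python
import math

def update_idf_vectors(v_idf, v_ini, total_concepts, containing_counts):
    """Refresh the inverse document frequency vector after an incremental update.
    v_idf / v_ini play the roles of Vidf and Vini; containing_counts[t] plays the
    role of Ti.F, the number of concepts containing feature item t."""
    for term, count in containing_counts.items():
        new_idf = math.log((total_concepts / (count + 0.01)) + 0.01)
        if v_idf.get(term, 0.0) == 0.0:                   # (D2.1): seed the initial value
            v_ini[term] = new_idf
        v_idf[term] = new_idf                             # (D2.2): record the current idf
```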
The feature item weight batch updating part is executed after several incremental calculation processes; it does not need to run immediately after a particular incremental calculation finishes, and its frequency can be adjusted as required. The execution process comprises the following steps executed on a computer:
E4. obtain the current inverse document frequency vector Vidf and the inverse document frequency initial value vector Vini;
E5. for each node N in the concept tree, perform the following operation on its corresponding concept vector V:
V.Wi = V.Wi * Vidf.Wi / Vini.Wi, i = 1, 2, …, n;
E6. Vini.Wi = Vidf.Wi, i = 1, 2, …, n. A minimal sketch of this batch repair is given below.
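A minimal Python sketch of this batch repair (steps E4-E6), under the same illustrative data structures; the iterable of nodes is an assumed helper, not part of the invention.

```python
def batch_reweight(all_nodes, v_idf, v_ini):
    """Rescale every stored weight by the ratio of the current inverse document
    frequency to the one in force when the weight was computed, then reset
    Vini := Vidf.  all_nodes is an assumed iterable over all concept nodes."""
    for node in all_nodes:                                # E5: every concept vector in the tree
        for term in list(node.vector):
            if v_ini.get(term):                           # guard against a missing/zero initial idf
                node.vector[term] *= v_idf.get(term, 0.0) / v_ini[term]
    v_ini.update(v_idf)                                   # E6: Vini.Wi = Vidf.Wi for every item
```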
The above embodiments are only for illustrating the invention; the steps may be varied, and all equivalent changes and modifications based on the technical scheme of the invention should not be excluded from the protection scope of the invention.

Claims (4)

1. A hierarchical concept vectorization increment processing method in personal big data management, comprising a vector space initialization stage and a vector increment calculation stage, wherein the vector space initialization stage can be further subdivided into a preprocessing stage and a concept vector merging stage, and the vector increment calculation stage can be divided into an incremental calculation process and an error completion process; the preprocessing stage vectorizes the concept of each node in the concept tree so that it is expressed as a concept vector, and records the total word count of each node and the inverse document frequency of each feature item; the concept vector merging phase comprises running the following steps on a computer:
1) taking a root node of the concept tree as a target node;
2) for the target node, obtain all of its m child nodes C1, C2, …, Cm;
3) obtain the concept vectors VC1, VC2, …, VCm corresponding to C1, C2, …, Cm, and the concept vector V corresponding to the target node;
(3.1) if a child node Ci is a branch node whose corresponding concept vector has not yet been merged, take Ci as the target node and execute from step 2) to merge its concept vector first;
4) calculate the sum L of the total word counts of the target node and all of its child nodes; create a new concept vector Vnew in the vector space;
5) assume there are n different feature terms T1, T2, …, Tn in the vector space; for a given concept vector V, the weight corresponding to feature term Ti is denoted V.Wi, the total word count of V is denoted LV, and the total word count of VCi is denoted LCi; calculate Vnew.Wi = (V.Wi*LV + VC1.Wi*LC1 + VC2.Wi*LC2 + … + VCm.Wi*LCm)/L, where i = 1, 2, …, n;
6) change the concept vector corresponding to the target node to Vnew and change its total word count to L;
the incremental calculation process is executed immediately after each update operation performed by a user on the concept tree; update operations on the concept tree comprise adding, deleting or moving concept nodes, wherein moving a concept node is regarded as a two-step operation of deleting and then adding; for adding or deleting a node, the following steps are executed on the computer:
A1. take the node Nc to be added or deleted as the target node;
A2. find the parent node Np of the target node; if Np does not exist, end the incremental calculation process;
A3. obtain the concept vector Vc and total word count Lc corresponding to Nc, and the concept vector Vp and total word count Lp corresponding to Np;
A4. suppose the vector space contains n different feature items in total, denoted T1, T2, …, Tn, with corresponding weight components W1, W2, …, Wn; perform the following operations on Vp:
(A4.1) if it is an add-node operation, Vp.Wi = (Lp*Vp.Wi + Lc*Vc.Wi)/(Lp + Lc), i = 1, 2, …, n, and the total word count of Np is changed to (Lp + Lc);
(A4.2) if it is a delete-node operation, Vp.Wi = (Lp*Vp.Wi - Lc*Vc.Wi)/(Lp - Lc), i = 1, 2, …, n, and the total word count of Np is changed to (Lp - Lc);
A5. take Np as the target node and execute from step A2;
the error completion process can be subdivided into an inverse document frequency error accumulation vector updating part and a feature item weight batch updating part; there are several global values in the whole concept space, including the inverse document frequency vector Vidf and the inverse document frequency initial value vector Vini; suppose there are n different feature items in the vector space, denoted T1, T2, …, Tn; given a concept vector V, the weight corresponding to feature item Ti is denoted V.Wi; for feature item Ti, the total number of concepts containing Ti is denoted Ti.F; the inverse document frequency error accumulation vector updating part is executed immediately after each incremental calculation process ends, and comprises the following steps executed on a computer:
D1. obtain the total number of concepts A in the current concept tree, the inverse document frequency vector Vidf and the inverse document frequency initial value vector Vini;
D2. perform the following operations on Vidf and Vini:
(D2.1) if Vidf.Wi == 0, then Vini.Wi = log((A/(Ti.F + 0.01)) + 0.01), i = 1, 2, …, n;
(D2.2) Vidf.Wi = log((A/(Ti.F + 0.01)) + 0.01), i = 1, 2, …, n;
the feature item weight batch updating part is executed after several incremental calculation processes; it does not need to run immediately after a particular incremental calculation finishes, and its frequency can be adjusted as required; the execution process comprises the following steps executed on a computer:
E1. obtain the current inverse document frequency vector Vidf and the inverse document frequency initial value vector Vini;
E2. for each node N in the concept tree, perform the following operation on its corresponding concept vector V: V.Wi = V.Wi * Vidf.Wi / Vini.Wi, i = 1, 2, …, n;
E3. Vini.Wi = Vidf.Wi, i = 1, 2, …, n.
2. The method for vectorized incremental processing of concept hierarchy in personal big data management as claimed in claim 1, wherein: the personal big data management model is used for finishing the functions of organization, storage, management and processing of personal information; the personal big data management model comprises a resource layer, a concept space layer and an application layer:
F1) the resource layer includes personal information stored in the DBMS, file system, and other systems;
wherein the personal information in the file system includes textual data and non-textual data; the text data comprises email, pdf files, office files and html file data, and the non-text data comprises video, audio and picture data;
F2) in the concept space layer, a concept refers to a set of information resources that are similar or related to one another; the layer uses concepts to uniformly identify data of different types and formats and establishes associations between them, which facilitates the abstraction and management of information resources by users;
F3) the application layer is responsible for interacting with a user and provides applications including navigation technology, visualization technology and editing tools.
3. The method as claimed in claim 2, wherein the concept space layer organizes personal information in a concept tree manner; the concept tree is formed by semantic associations between concepts; thus, the concept tree satisfies the following condition:
G1) the hierarchical relation of all concepts forms a tree structure, the nodes in the tree represent the concepts, and the edges represent the upper and lower relations among the concepts;
G2) the root node is used as a concept complete set identifier, the branch node is a concept with lower child nodes, and the leaf node is a concept without child nodes;
G3) each branch node has no less than one child node.
4. The method for vectorized incremental processing of concept hierarchy in personal big data management as claimed in claim 1, wherein: the whole stage takes a vector space model as a support; the vector space model comprises four parts of constructing concept vectors, storing the concept vectors, maintaining the concept vectors and calculating the similarity:
H1) the constructed concept vector is a vector formed by representing concepts into feature items and feature weights according to information resource sets contained in the concepts;
H2) the concept vector storage is to store the related information of the concept vector obtained in the concept vector construction process into a database;
H3) the maintenance concept vector is used for reflecting the changes to the concept vector of the related concept after the concept tree structure is changed and accumulated for a certain number of times;
H4) the similarity calculation is to calculate the similarity between the selected concept and other concepts according to the concept vector of the selected concept and other concepts.
CN201611154347.7A 2016-12-14 2016-12-14 Hierarchical concept vectorization increment processing method in personal big data management Active CN106682129B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611154347.7A CN106682129B (en) 2016-12-14 2016-12-14 Hierarchical concept vectorization increment processing method in personal big data management

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611154347.7A CN106682129B (en) 2016-12-14 2016-12-14 Hierarchical concept vectorization increment processing method in personal big data management

Publications (2)

Publication Number Publication Date
CN106682129A CN106682129A (en) 2017-05-17
CN106682129B true CN106682129B (en) 2020-02-21

Family

ID=58868490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611154347.7A Active CN106682129B (en) 2016-12-14 2016-12-14 Hierarchical concept vectorization increment processing method in personal big data management

Country Status (1)

Country Link
CN (1) CN106682129B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112307219B (en) * 2020-10-22 2022-11-04 首都师范大学 Method and system for updating vocabulary database for website search and computer storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1460947A (en) * 2003-06-13 2003-12-10 北京大学计算机科学技术研究所 Text classification incremental training learning method supporting vector machine by compromising key words
CN104794168A (en) * 2015-03-30 2015-07-22 明博教育科技有限公司 Correlation method and system for knowledge points
CN105868366A (en) * 2016-03-30 2016-08-17 浙江工业大学 Concept space navigation method based on concept association

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research and Application of Incremental Clustering in Dynamic Multi-Document Summarization; 郭海蓉; China Master's Theses Full-text Database, Information Science and Technology; 2016-02-15 (No. 02, 2016); I138-1968 *
Research on Key Technologies of Multi-Level Text Classification and Incremental Learning; 冯佳; China Master's Theses Full-text Database, Information Science and Technology; 2012-03-15 (No. 03, 2012); I138-2676 *

Also Published As

Publication number Publication date
CN106682129A (en) 2017-05-17

Similar Documents

Publication Publication Date Title
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
CN101404015B (en) Automatically generating a hierarchy of terms
US9110922B2 (en) Joint embedding for item association
US8131684B2 (en) Adaptive archive data management
CN111190997B (en) Question-answering system implementation method using neural network and machine learning ordering algorithm
US7895195B2 (en) Method and apparatus for constructing a link structure between documents
CN111382276B (en) Event development context graph generation method
US20140149429A1 (en) Web search ranking
CN108399213B (en) User-oriented personal file clustering method and system
CN101404016A (en) Determining a document specificity
CN105243083B (en) Document subject matter method for digging and device
CN112115232A (en) Data error correction method and device and server
CN111325030A (en) Text label construction method and device, computer equipment and storage medium
WO2015051481A1 (en) Determining collection membership in a data graph
CN108733745B (en) Query expansion method based on medical knowledge
CN112818121A (en) Text classification method and device, computer equipment and storage medium
Omri et al. Towards an efficient big data indexing approach under an uncertain environment
CN112685452B (en) Enterprise case retrieval method, device, equipment and storage medium
Manne et al. Text categorization with K-nearest neighbor approach
Drakopoulos et al. A semantically annotated JSON metadata structure for open linked cultural data in Neo4j
CN106682129B (en) Hierarchical concept vectorization increment processing method in personal big data management
WO2016206044A1 (en) Extracting enterprise project information
Manne et al. A Query based Text Categorization using K-nearest neighbor Approach
CN112199461A (en) Document retrieval method, device, medium and equipment based on block index structure
CN101493823B (en) Identifying clusters of words according to word affinities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant