CN107577690B - Recommendation method and recommendation device for mass information data - Google Patents

Publication number
CN107577690B
Authority
CN
China
Prior art keywords
user
metadata
cluster
template
static
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710346631.2A
Other languages
Chinese (zh)
Other versions
CN107577690A (en
Inventor
白鹤
侯斌
刘东海
杨帆
颜斯泰
罗亚林
王云福
涂红兵
戴伟琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen China Guangdong Nuclear Engineering Design Co Ltd
Original Assignee
China General Nuclear Power Corp
China Nuclear Power Engineering Co Ltd
Shenzhen China Guangdong Nuclear Engineering Design Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China General Nuclear Power Corp, China Nuclear Power Engineering Co Ltd, Shenzhen China Guangdong Nuclear Engineering Design Co Ltd filed Critical China General Nuclear Power Corp
Priority to CN201710346631.2A priority Critical patent/CN107577690B/en
Publication of CN107577690A publication Critical patent/CN107577690A/en
Application granted granted Critical
Publication of CN107577690B publication Critical patent/CN107577690B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention belongs to the technical field of information processing, and provides a recommendation method and a recommendation device for mass information data. The recommendation method comprises the following steps: acquiring metadata information from an enterprise content management system (ECM); generating a metadata clustering template according to the metadata set sample space of the metadata information; acquiring a static attribute space of a user according to related information of the user; acquiring a corresponding static mass data template according to the static attribute space of the user and the metadata clustering template; monitoring the behavior log of the user, and acquiring the attention word of the user within preset time according to the behavior log of the user; forming a text index according to text analysis of the massive data unstructured document; and searching the content to be recommended according to the text index, the attention words of the user in the preset time and the static mass data template. The invention effectively solves the problem that the user can not timely and effectively obtain the required information.

Description

Recommendation method and recommendation device for mass information data
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to a recommendation method and a recommendation device for mass information data.
Background
The content information data of a nuclear power engineering enterprise is complex, and the volume of document material is huge, reaching the million-record level. It includes, in particular, project engineering files, technical documents, business contracts, correspondence, and material for various technology lines (such as the AP1000 and EPR third-generation nuclear power technologies). Most technical data is stored in semi-structured form in an Enterprise Content Management (ECM) platform; because the amount of information is so large, technicians cannot obtain relevant knowledge updates in time.
Therefore, a new technical solution is needed to solve the above technical problems.
Disclosure of Invention
In view of this, embodiments of the present invention provide a recommendation method and a recommendation apparatus for mass information data, which aim to solve the problem that a user cannot timely and effectively obtain required information.
In a first aspect of the embodiments of the present invention, a recommendation method for massive information data is provided, where the recommendation method includes:
acquiring metadata information from an enterprise content management system (ECM);
generating a metadata clustering template according to the metadata set sample space of the metadata information;
acquiring a static attribute space of a user according to related information of the user;
acquiring a corresponding static mass data template according to the static attribute space of the user and the metadata clustering template;
monitoring the behavior log of the user, and acquiring the attention word of the user within preset time according to the behavior log of the user;
forming a text index according to text analysis of the massive data unstructured document;
and searching the content to be recommended according to the text index, the attention words of the user in the preset time and the static mass data template.
In a second aspect of the embodiments of the present invention, a recommendation device for mass information data is provided, where the recommendation device includes:
the metadata information acquisition module is used for acquiring metadata information from an enterprise content management system (ECM);
the metadata clustering template generation module is used for generating a metadata clustering template according to the metadata set sample space of the metadata information;
the static attribute space acquisition module is used for acquiring the static attribute space of the user according to the relevant information of the user;
the static mass data template acquisition module is used for acquiring a corresponding static mass data template according to the static attribute space of the user and the metadata clustering template;
the attention word acquisition module is used for monitoring the behavior log of the user and acquiring the attention words of the user within preset time according to the behavior log of the user;
the text index forming module is used for forming a text index according to the text analysis of the massive data unstructured document;
and the recommended content searching module is used for searching the content to be recommended according to the text index, the attention words of the user in the preset time and the static mass data template.
Compared with the prior art, the embodiment of the invention has the following beneficial effects: a corresponding static mass data template is obtained according to the static attribute space of the user and the metadata clustering template; the behavior log of the user is monitored, and the attention words of the user within a preset time are obtained from that log; and a text index is formed from text analysis of the massive unstructured documents. The content to be recommended can therefore be searched quickly according to the text index, the attention words of the user within the preset time, and the static mass data template. The embodiment of the invention combines static information with dynamic data and quickly completes knowledge pushing for nuclear power professionals, thereby ensuring that the professionals can obtain accurate, well-matched information in a timely and effective manner.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of an implementation of a recommendation method for mass information data according to a first embodiment of the present invention;
Fig. 2 is a flowchart of an implementation of a recommendation method for mass information data according to a second embodiment of the present invention;
Fig. 3 is a schematic composition diagram of a recommendation device for mass information data according to a third embodiment of the present invention;
Fig. 4 is a schematic composition diagram of a recommendation device for mass information data according to a fourth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention realizes a recommendation system for massive semi-structured nuclear power information. On the one hand, using the concept of a knowledge ontology, professional clustering analysis is performed on the structured metadata of the technical information, and, in combination with the technical background and summarized preferences of nuclear power professionals, a static mass data template in the hypothesis space is obtained through a mass data learning and analysis algorithm. On the other hand, a text index is formed from text analysis of the massive unstructured documents and, combined with the dynamic requirements of nuclear power professionals, index retrieval of the data is carried out within the static mass data template. Static information and dynamic data are thus used together to complete knowledge recommendation for nuclear power professionals.
The invention realizes a method for matching the static data (including metadata and text) of massive semi-structured nuclear power technical documents against the requirements of nuclear power professionals (including static knowledge background and dynamic requirements). It includes: configurable constraints on the basic information of nuclear power technical documents and a technique for analyzing and recognizing the background of nuclear power professional technicians; a method for establishing a structured metadata clustering template and a static mass data template; a combination of dynamic log capture analysis and text analysis; a weighted ranking algorithm for text matching based on inverted index technology; and a functional scheme for recommending nuclear power professional knowledge that integrates static information and dynamic requirements. The method meets the information propagation and reconstruction requirements of enterprise knowledge management, and ensures that professional technicians can obtain accurate, well-matched information in a timely and effective manner.
Example one:
Fig. 1 shows an implementation process of a recommendation method for mass information data according to a first embodiment of the present invention, where the implementation process is detailed as follows:
Step S101, metadata information is acquired from the enterprise content management system (ECM).
In the embodiment of the present invention, the ECM may be a nuclear power enterprise content management system, and the ECM includes a large amount of enterprise content, including but not limited to metadata information, unstructured file text content, system access and retrieval related logs, and personnel information.
Step S102, generating a metadata clustering template according to the metadata set sample space of the metadata information.
Specifically, the complex metadata structure is simplified to generate a metadata clustering template; that is, the content represented by the structured metadata is classified by a clustering method in order to extract a core metadata structure.
Step S103, obtaining the static attribute space of the user according to the relevant information of the user.
Specifically, the static attribute space of each technician is obtained from the technician's background, such as profession, department, projects participated in, project stage, and position, and the static attribute space of each technician is recorded.
Step S104, acquiring a corresponding static mass data template according to the static attribute space of the user and the metadata clustering template.
Specifically, the static mass data template is obtained by combining the nuclear power technical knowledge clusters derived from the metadata clustering template in step S102 with the professional background analysis data obtained in step S103.
Step S105, monitoring the behavior log of the user, and acquiring the attention word of the user within the preset time according to the behavior log of the user.
Specifically, the user's points of attention are analyzed by a time-series-based method for monitoring and recording user behavior logs, so that user behaviors and expectations are mined from the log data.
First, the content that the user has searched, viewed, and followed, as recorded by the system, is collected. Second, each retrieved item is decomposed into several keywords, and the frequency and count of each keyword are recorded against a time factor (time series). Finally, the user's recent hot attention words are formed.
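The keyword counting described for step S105 can be sketched as follows. This is a minimal illustration, not the patented implementation: the log record layout `(timestamp, query_text)`, whitespace tokenization, the seven-day window, and the name `attention_words` are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def attention_words(log, now, window=timedelta(days=7), top_n=5):
    """Return the most frequent query keywords inside the time window.

    `log` is a list of (timestamp, query_text) pairs. The record layout,
    whitespace tokenization, and `top_n` are illustrative assumptions.
    """
    counts = Counter()
    for ts, query in log:
        # Keep only behavior recorded within the preset time.
        if now - window <= ts <= now:
            counts.update(query.lower().split())
    return counts.most_common(top_n)


log = [
    (datetime(2017, 5, 10), "EPR reactor coolant pump"),
    (datetime(2017, 5, 12), "coolant pump seal"),
    (datetime(2017, 3, 1), "AP1000 containment"),  # outside the window
]
hot = attention_words(log, now=datetime(2017, 5, 14))
```

A production system would presumably segment queries with the nuclear power dictionary mentioned in step S106 rather than splitting on whitespace.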
Step S106, forming a text index according to the text analysis of the unstructured documents of the mass data.
Specifically, information is obtained from the text set; the text is analyzed and preprocessed according to a nuclear power dictionary, the vocabulary in the text is screened and recognized, and useless words are removed according to a stop word list. Feature extraction weights and ranks the words according to their word frequency in the text set and the ratio of the number of texts in which a word appears to the total number of texts; words found in the dictionary receive a higher weight. The number of words that form the feature vector is chosen according to the ranking of the feature words, the massive texts are indexed by a MapReduce algorithm, and the feature results and abstract of each document are output.
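As a rough illustration of step S106, the sketch below builds a small inverted index with TF-IDF weights. TF-IDF is substituted here as a common weighting of the kind described (term frequency combined with document frequency); the patent does not give its exact formula. The stop word list, `top_k`, and the function names are hypothetical, and the sketch runs in a single process rather than with MapReduce.

```python
import math
from collections import Counter, defaultdict

STOP_WORDS = {"the", "of", "and", "a", "in"}  # hypothetical stop word list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def build_index(docs, top_k=3):
    """Map each term to {doc_id: weight}, keeping top_k terms per document."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df = Counter()  # number of documents containing each term
    for words in tokens.values():
        df.update(set(words))
    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, words in tokens.items():
        tf = Counter(words)
        weights = {
            w: (c / len(words)) * math.log(n_docs / df[w])
            for w, c in tf.items()
        }
        # The highest-weighted terms form the document's feature vector.
        for w in sorted(weights, key=weights.get, reverse=True)[:top_k]:
            index[w][doc_id] = weights[w]
    return index


docs = {
    "d1": "coolant pump seal inspection of the coolant pump",
    "d2": "containment design of the reactor",
    "d3": "reactor coolant chemistry",
}
index = build_index(docs)
```

Looking up `index["pump"]` returns the documents whose feature vector contains that term, which is the access pattern the later retrieval step relies on.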
Step S107, searching the content to be recommended according to the text index, the attention words of the user within the preset time, and the static mass data template.
Specifically, dynamic index retrieval is built on the sample space obtained under the static data space model algorithm and on the index of the unstructured texts, and the recommended knowledge information is finally selected by ranking the index results.
The embodiment of the invention combines static information with dynamic data and quickly completes knowledge pushing for nuclear power professionals, thereby ensuring that the professionals can obtain accurate, well-matched information in a timely and effective manner.
Example two:
Fig. 2 shows an implementation process of the recommendation method for mass information data according to the second embodiment of the present invention, where the implementation process is detailed as follows:
Step S201, metadata information is acquired from the enterprise content management system (ECM).
The step is the same as step S101, and reference may be made to the related description of step S101, which is not repeated herein.
Step S202, generating a metadata clustering template according to the metadata set sample space of the metadata information.
The step is the same as step S102, and reference may be made to the related description of step S102, which is not repeated herein.
Optionally, the generating a metadata clustering template according to the metadata set sample space of the metadata information includes:
step one, randomly selecting K objects from the metadata set sample space as initial cluster centers, wherein K is an integer larger than zero, and one cluster object corresponds to one class of technical documents;
step two, calculating the similarity between every object in the metadata set sample space and the K cluster centers, and assigning each object to the cluster with which it has the highest similarity;
step three, recalculating the cluster center of each cluster according to the objects in that cluster, so as to obtain K recalculated cluster centers;
step four, if any of the K recalculated cluster centers has changed, recalculating the similarity between every object and the K recalculated cluster centers, and assigning each object to the cluster with the highest corresponding similarity to form new cluster objects;
step five, repeating step three and step four until the K cluster centers no longer change, wherein the K cluster centers form the metadata clustering template.
The metadata attribute set space is composed of a collection of independent attribute sets, which may span multiple dimensions. K objects are randomly selected from the metadata set sample space as the initial cluster centers (K may be greater than or equal to the total number of professional technical specialties). The similarity between each object and the K cluster centers is calculated, each object is assigned to the most similar cluster, and a new mean (center) is calculated from the objects in each cluster. The similarity between each object and the K new cluster centers is then calculated, and each object is reassigned to the most similar cluster to form new cluster objects. The cluster means are updated in this way until they no longer change, and the final cluster centers form the metadata clustering template.
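The five steps above follow the familiar k-means procedure, which can be sketched in a few lines. The sketch assumes that metadata objects have already been encoded as numeric tuples and that similarity is measured by negative squared Euclidean distance; neither assumption is stated in the source, and the function name `kmeans` is illustrative.

```python
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster numeric tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # step one: K objects as initial centers

    def nearest(p):
        # Highest similarity = smallest squared Euclidean distance (assumed).
        return min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
        )

    labels = []
    for _ in range(max_iter):
        labels = [nearest(p) for p in points]  # steps two and four: assign
        new_centers = []
        for c in range(k):  # step three: recompute each cluster mean
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep center of empty cluster
        if new_centers == centers:  # step five: stop when centers are stable
            break
        centers = new_centers
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

In this toy data the two groups of three points end up in separate clusters; the resulting `centers` play the role of the metadata clustering template.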
It should be noted that the static massive data template includes a plurality of cluster objects, each cluster object includes knowledge contents with the same technical features, that is, one cluster object is a class of technical documents.
Step S203, obtaining the static attribute space of the user according to the relevant information of the user.
The step is the same as step S103, and reference may be made to the related description of step S103, which is not described herein again.
Step S204, acquiring a corresponding static mass data template according to the static attribute space of the user and the metadata clustering template.
The static attribute space of the user is matched against the technical characteristic parameters described by the metadata clustering template: the intersection of the user's attribute parameters and those of the metadata clustering template is taken, and the attribute weights are then adjusted according to actual business needs to form the static data model template.
Optionally, each user belongs to a category of technology-concerned groups; the obtaining of the corresponding static massive data template according to the static attribute space of the user and the metadata clustering template includes:
calculating the matching relation between each class of technical documents and each class of attention group μ according to the attribute parameters in the static attribute space of the user and the attribute parameters in the metadata clustering template, using the formula
[formula image: Figure GDA0001470423080000071]
to obtain the static mass data template, wherein att_i is the i-th attribute parameter in the intersection of the attribute parameters in the static attribute space of the user and the attribute parameters in the metadata clustering template, n is the number of attribute parameters in the intersection, Meta(att_i) is the attribute information of att_i in the metadata clustering template, Speciality(att_i) is the attribute information of att_i in the static attribute space of the user, and the symbol shown in [formula image: Figure GDA0001470423080000072] is the weight of att_i.
For any document belonging to the static sample space D of the user group μ, the static support strength V(μ) is inversely related to the variance between the attribute information of att_i in the metadata clustering template and its attribute information in the static attribute space of the user. This value is further multiplied by the importance indicator of att_i [formula image: Figure GDA0001470423080000073], namely its weight. Finally, the contributions of all attributes are aggregated to form the static support strength.
The greater the support strength, the higher the degree of attention from the group, so an attention matrix for each profession can be formed from the ranking for use by the subsequent modules.
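The exact formula for the static support strength appears only as an image in the source, so the following sketch is one possible reading of the prose description, not the patented formula. It assumes numerically encoded attributes and the form V = Σ w_i / (1 + (Meta(att_i) − Speciality(att_i))²) over the shared attributes, which is inversely related to the squared difference and weighted per attribute. The function name, the default weight of 1.0, and the sample values are hypothetical.

```python
def static_support(meta, speciality, weights):
    """Assumed form: sum of w_i / (1 + (meta_i - spec_i)^2) over shared attributes.

    `meta` holds a cluster's attribute values from the metadata clustering
    template; `speciality` holds the user's static attribute values. Both
    are assumed to be numerically encoded.
    """
    shared = meta.keys() & speciality.keys()  # intersection of attributes
    return sum(
        weights.get(a, 1.0) / (1.0 + (meta[a] - speciality[a]) ** 2)
        for a in shared
    )


cluster_meta = {"profession": 1.0, "stage": 2.0, "doc_type": 4.0}
user_profile = {"profession": 1.0, "stage": 3.0, "department": 7.0}
weights = {"profession": 2.0, "stage": 1.0}
v = static_support(cluster_meta, user_profile, weights)
```

Only `profession` and `stage` are shared here, so the exact match on `profession` contributes its full weight while the mismatch on `stage` is discounted.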
The step is the same as step S104, and reference may be made to the related description of step S104, which is not repeated herein.
Step S205, monitoring the behavior log of the user, and acquiring the attention word of the user within the preset time according to the behavior log of the user.
The step is the same as step S105, and reference may be made to the related description of step S105, which is not repeated herein.
Step S206, forming a text index according to the text analysis of the massive data unstructured document.
The step is the same as step S106, and reference may be made to the related description of step S106, which is not repeated herein.
Step S207, searching the content to be recommended according to the text index, the attention words of the user within the preset time, and the static mass data template.
Dynamic index retrieval is built on the sample space obtained under the static data space model algorithm and on the index of the unstructured texts, and the recommended knowledge information is finally selected by ranking the index results.
The dynamic index retrieval analysis considers two aspects: content support strength and time support strength.
The content support strength covers the sample space in the static mass data template, in which each piece of data has a corresponding support strength calculated from the metadata of the nuclear power document. It also covers the text index formed from text analysis of the massive unstructured documents; this part is called the full-text support strength and is the result obtained through the full-text index of the documents.
The time support strength can be understood as freshness. From the document perspective, the time factor of document generation is called document freshness. The knowledge content viewed, retrieved, downloaded, and followed by the user, as monitored in step S205, is also time-related; this is called attention freshness. The content of each point of attention and its freshness are obtained by calculation along the time dimension.
Finally, the recommended content result is calculated from the user's latest points of attention and the index ranking of the sample space.
Optionally, the searching for the content to be recommended according to the text index, the attention word of the user within a preset time, and the static massive data template includes:
acquiring the frequency [formula image: Figure GDA0001470423080000081] with which each attention word of the user within the preset time appears in the text index, wherein [formula image: Figure GDA0001470423080000082] denotes the j-th attention word of the user within the preset time;
calculating the recommendation strength [formula image: Figure GDA0001470423080000092] of each class of technical documents according to [formula image: Figure GDA0001470423080000091] and the static support strength V, wherein m is the number of attention words of the user within the preset time, [formula image: Figure GDA0001470423080000093] is the attention time freshness weight, [formula image: Figure GDA0001470423080000094] is the attention frequency weight, and τ() is the update time parameter of the document;
and generating, according to the recommendation strength of each class of technical documents, recommended content in list form from the technical documents whose recommendation strength meets the preset condition.
The preset time may be a period time set by a user, for example, one week, and is not limited herein. The preset condition may be recommendation strength greater than a preset threshold, and the technical documents corresponding to the recommendation strengths may be arranged in a descending order according to the recommendation strength.
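Because the recommendation strength formula is also given only as an image, the sketch below shows one plausible way to combine the named ingredients, under clearly labeled assumptions: the strength is taken as the static support V multiplied by an exponential document freshness decay and by the weighted sum of attention word frequencies. The half-life decay, the dictionary layout of `docs`, and the function name `recommend` are all hypothetical.

```python
import math


def recommend(docs, words, threshold=0.0, half_life_days=30.0):
    """Rank documents by an assumed recommendation strength.

    docs:  list of dicts with "id", "support" (static support V),
           "age_days" (time since last update), and "term_freq".
    words: list of (word, frequency_weight, freshness_weight) triples.
    """
    ranked = []
    for d in docs:
        # Weighted frequency of the user's attention words in the document.
        match = sum(fw * tw * d["term_freq"].get(w, 0) for w, fw, tw in words)
        # Assumed document freshness: halves every `half_life_days`.
        decay = math.exp(-math.log(2) * d["age_days"] / half_life_days)
        strength = d["support"] * decay * match
        if strength > threshold:  # preset condition: exceed the threshold
            ranked.append((d["id"], strength))
    # Descending order of recommendation strength, as a list.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


docs = [
    {"id": "A", "support": 2.5, "age_days": 0, "term_freq": {"pump": 3}},
    {"id": "B", "support": 1.0, "age_days": 30, "term_freq": {"pump": 4, "seal": 1}},
    {"id": "C", "support": 3.0, "age_days": 0, "term_freq": {}},
]
words = [("pump", 1.0, 1.0), ("seal", 0.5, 1.0)]
result = recommend(docs, words)
```

Document C has the highest static support but contains none of the attention words, so it is filtered out, which reflects the intended interplay of static and dynamic signals.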
Step S208, recording the searched content to be recommended and the static mass data template.
The operation process is recorded: on the one hand the static support vector results, and on the other hand the dynamic requirement update process and the dynamic index information.
Example three:
Fig. 3 is a schematic composition diagram of a recommendation device for mass information data according to a third embodiment of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, which are detailed as follows:
a metadata information acquisition module 31, configured to acquire metadata information from the enterprise content management system (ECM);
a metadata clustering template generation module 32, configured to generate a metadata clustering template according to a metadata set sample space of the metadata information;
a static attribute space obtaining module 33, configured to obtain a static attribute space of a user according to relevant information of the user;
a static mass data template obtaining module 34, configured to obtain a corresponding static mass data template according to the static attribute space of the user and the metadata clustering template;
an attention word obtaining module 35, configured to monitor the behavior log of the user, and obtain the attention words of the user within a preset time according to the behavior log of the user;
a text index forming module 36, configured to form a text index according to text analysis of the massive unstructured documents;
a recommended content searching module 37, configured to search for content to be recommended according to the text index, the attention words of the user within the preset time, and the static mass data template.
The metadata information acquisition module 31 is an interface module between the recommendation device for mass information data and the enterprise content management platform, and is responsible for data interaction with the ECM (nuclear power enterprise content management system). The enterprise content mainly includes: metadata information, unstructured file text content, system access and retrieval related logs, and personnel information. This information is stored centrally in the metadata information acquisition module 31 for the other modules to call; its main user is the metadata clustering template generation module 32.
In addition, the metadata information acquisition module 31 is also responsible for updating the system integration data.
The recommendation device for mass information data provided in the embodiment of the present invention may be used in the aforementioned first recommendation method embodiment, and for details, refer to the description of the aforementioned first recommendation method embodiment, which is not described herein again.
Example four:
Fig. 4 is a schematic composition diagram of a recommendation device for mass information data according to a fourth embodiment of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, which are detailed as follows:
a metadata information acquisition module 41, configured to acquire metadata information from the enterprise content management system (ECM);
a metadata clustering template generating module 42, configured to generate a metadata clustering template according to a metadata set sample space of the metadata information;
a static attribute space obtaining module 43, configured to obtain a static attribute space of a user according to relevant information of the user;
a static mass data template obtaining module 44, configured to obtain a corresponding static mass data template according to the static attribute space of the user and the metadata clustering template;
an attention word obtaining module 45, configured to monitor a behavior log of the user, and obtain an attention word of the user within a preset time according to the behavior log of the user;
a text index forming module 46, configured to form a text index according to text analysis of the unstructured documents with mass data;
a recommended content searching module 47, configured to search for content to be recommended according to the text index, the attention words of the user within the preset time, and the static mass data template;
a log recording module 48, configured to record the searched content to be recommended and the static mass data template.
The metadata clustering template generating module 42 includes:
a selecting unit 421, configured to arbitrarily select K objects from the metadata set sample space as initial cluster centers, where K is an integer greater than zero, and one of the cluster objects corresponds to one class of technical documents;
a first calculating unit 422, configured to calculate similarities of all objects in the metadata set sample space and K cluster centers, and classify each object in all the objects into a cluster with the highest similarity to the object;
a second calculating unit 423 for recalculating the cluster center of each cluster according to the objects in the cluster to recalculate K cluster centers;
a third calculating unit 424, configured to, if any cluster center of the recalculated K cluster centers changes, recalculate the similarity between the all objects and the recalculated K cluster centers, and classify each object of the all objects into a cluster with the highest corresponding similarity, so as to form a new cluster object;
a forming unit 425 configured to repeatedly execute the second calculating unit and the third calculating unit until K cluster centers are no longer changed, the K cluster centers forming the metadata clustering template.
Each user belongs to a technical concern group; the static massive data template obtaining module 44 is specifically configured to:
calculate, according to the attribute parameters in the static attribute space of the user and the attribute parameters in the metadata clustering template, the matching relation between each class of technical documents and each class of technical concern group μ
Figure GDA0001470423080000111
to obtain the static mass data template, where att_i is the i-th attribute parameter in the intersection of the attribute parameters in the static attribute space of the user and the attribute parameters in the metadata clustering template, n is the number of attribute parameters in the intersection, Meta(att_i) is the value of att_i in the static attribute space of the user, Special(att_i) is the value of att_i in the metadata clustering template, and
Figure GDA0001470423080000121
is the weight of att_i.
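The matching formula itself is given only in the figure and is not reproduced in this text. Purely to illustrate how the quantities defined above could be combined, the following Python sketch assumes a normalized, weighted agreement over the intersection of attribute parameters. The function name `match_degree`, the equality test between the two values, the default weight of 1.0, and the normalization are all assumptions of this sketch and should not be read as the formula of the embodiment.

```python
def match_degree(user_attrs, template_attrs, weights):
    """Assumed matching relation between one user and one cluster template.

    user_attrs:     attribute name -> value in the user's static attribute space
    template_attrs: attribute name -> value in the metadata clustering template
    weights:        attribute name -> weight (hypothetical default of 1.0 if absent)
    """
    # Only attributes present in both spaces take part (the intersection of size n).
    shared = sorted(set(user_attrs) & set(template_attrs))
    total = sum(weights.get(a, 1.0) for a in shared)
    if total == 0:
        return 0.0
    # Assumed scoring rule: an attribute contributes its weight when both values agree.
    agreed = sum(
        weights.get(a, 1.0) for a in shared if user_attrs[a] == template_attrs[a]
    )
    return agreed / total


# Hypothetical attribute names and values, for illustration only.
user = {"specialty": "I&C", "reactor_type": "PWR", "department": "design"}
template = {"specialty": "I&C", "reactor_type": "BWR", "doc_type": "drawing"}
score = match_degree(user, template, {"specialty": 3.0, "reactor_type": 1.0})
```

Under these assumptions, evaluating the function for every pair of concern group and document class would yield a table of matching values that stands in for the static mass data template.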
The recommended content search module 47 includes:
a frequency obtaining unit 471, configured to obtain the frequency with which the attention words of the user within a preset time appear in the text index
Figure GDA0001470423080000122
where
Figure GDA0001470423080000123
is the j-th attention word of the user within the preset time;
a recommendation strength calculation unit 472, configured to calculate, according to
Figure GDA0001470423080000124
and V(μ, ), the recommendation strength of each class of technical documents
Figure GDA0001470423080000125
where m is the number of attention words of the user within the preset time,
Figure GDA0001470423080000126
is the attention-time freshness weight,
Figure GDA0001470423080000127
is the attention-frequency weight, and τ() is the update time parameter of the document;
a recommended content generating unit 473, configured to generate, according to the recommendation strength of each class of technical documents, recommended content in list form from the technical documents whose recommendation strength meets a preset condition.
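The formulas used by units 471 and 472 appear only in the figures. As a rough sketch of the data flow through units 471 to 473, the Python example below assumes that the recommendation strength of a document class is the matching value V multiplied by a weighted sum of attention-word frequencies. The layout of `text_index`, the triple form of `attention_words`, the multiplicative combination, and the `threshold` used as the preset condition are assumptions; the update time parameter τ() is left out because its role is not recoverable from the text.

```python
from collections import Counter


def recommend(text_index, attention_words, match, threshold=0.0):
    """Rank document classes by an assumed recommendation strength.

    text_index:      doc class -> Counter of term occurrences (hypothetical layout)
    attention_words: list of (word, freshness_weight, frequency_weight) triples
    match:           doc class -> matching value V for the user's concern group
    """
    scored = []
    for doc_class, terms in text_index.items():
        # Unit 471: look up how often each attention word occurs in the text index.
        # Unit 472 (assumed form): weight each frequency and scale by the matching value.
        weighted = sum(
            fresh * freq_w * terms.get(word, 0)
            for word, fresh, freq_w in attention_words
        )
        strength = match.get(doc_class, 0.0) * weighted
        # Unit 473: keep only classes meeting the preset condition (here a threshold).
        if strength > threshold:
            scored.append((doc_class, strength))
    # Present the result as a list, strongest recommendation first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Hypothetical index, attention words and matching values, for illustration only.
index = {
    "piping": Counter({"valve": 4, "pump": 1}),
    "electrical": Counter({"cable": 5, "valve": 1}),
}
words = [("valve", 1.0, 0.5), ("pump", 0.5, 1.0)]
ranking = recommend(index, words, {"piping": 0.5, "electrical": 1.0})
```

A stricter preset condition can be expressed by raising `threshold`, which shortens the generated list.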
The recommendation device for mass information data provided in this embodiment of the present invention may be used to carry out the corresponding recommendation method of the second embodiment described above; for details, refer to the description of the second embodiment, which is not repeated here.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the foregoing function distribution may be completed by different functional modules as required, that is, the internal structure of the apparatus is divided into different functional modules, and the functional modules may be implemented in a hardware form or a software form. In addition, the specific names of the functional modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application.
In conclusion, the embodiments of the invention fill a gap in the recommendation of structured mass nuclear power information: recommendations are combined with the user's attention information according to the characteristics of nuclear power technical documents and the professional attributes of practitioners, and the approach can adapt to various nuclear power technical routes. The system dynamically records user attention information and logs the related operations. The invention provides an intelligent knowledge extraction and matching method for nuclear power technical data, which improves the efficiency and accuracy with which nuclear power technical knowledge is disseminated, improves working efficiency, reduces production cost, and is stable and reliable.
It will be further understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing relevant hardware, and that the program may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (6)

1. A recommendation method for massive information data is characterized by comprising the following steps:
acquiring metadata information from an enterprise content management system (ECM);
generating a metadata clustering template according to the metadata set sample space of the metadata information;
acquiring a static attribute space of a user according to related information of the user;
each user belongs to one class of technical concern group; acquiring a corresponding static mass data template according to the static attribute space of the user and the metadata clustering template, which comprises: calculating, according to the attribute parameters in the static attribute space of the user and the attribute parameters in the metadata clustering template, the matching relation between each class of technical documents and each class of technical concern group μ
Figure FDA0002681020900000011
to obtain the static mass data template, where att_i is the i-th attribute parameter in the intersection of the attribute parameters in the static attribute space of the user and the attribute parameters in the metadata clustering template, n is the number of attribute parameters in the intersection, Meta(att_i) is the value of att_i in the static attribute space of the user, Special(att_i) is the value of att_i in the metadata clustering template, and
Figure FDA0002681020900000012
is the weight of att_i;
monitoring the behavior log of the user, and acquiring the attention word of the user within preset time according to the behavior log of the user;
forming a text index according to text analysis of the massive data unstructured document;
searching contents to be recommended according to the text index, the attention words of the user in a preset time and the static mass data template;
and recording the searched content to be recommended and the static mass data template.
2. The recommendation method according to claim 1, wherein the generating a metadata clustering template according to the metadata set sample space of the metadata information comprises:
step one, randomly selecting K objects from the metadata set sample space as initial cluster centers, wherein K is an integer larger than zero, and one cluster object corresponds to one type of technical documents;
step two, calculating the similarity between all objects in the metadata set sample space and the K cluster centers, and classifying each of the objects into the cluster with the highest similarity to that object;
step three, recalculating the cluster center of each cluster according to the objects in that cluster, so as to recalculate the K cluster centers;
step four, if any of the recalculated K cluster centers has changed, recalculating the similarity between all the objects and the recalculated K cluster centers, and classifying each of the objects into the cluster with the highest corresponding similarity to form new cluster objects;
and step five, repeating the step three and the step four until K cluster centers are not changed any more, wherein the K cluster centers form the metadata clustering template.
3. The recommendation method according to claim 2, wherein the searching for the content to be recommended according to the text index, the word of interest of the user within a preset time, and the static massive data template comprises:
acquiring the frequency with which the attention words of the user within the preset time appear in the text index
Figure FDA0002681020900000021
where
Figure FDA0002681020900000022
is the j-th attention word of the user within the preset time;
calculating, according to
Figure FDA0002681020900000023
and V(μ, ), the recommendation strength of each class of technical documents
Figure FDA0002681020900000024
where m is the number of attention words of the user within the preset time,
Figure FDA0002681020900000025
is the attention-time freshness weight,
Figure FDA0002681020900000026
is the attention-frequency weight, and τ() is the update time parameter of the document;
and generating, according to the recommendation strength of each class of technical documents, recommended content in list form from the technical documents whose recommendation strength meets a preset condition.
4. A recommendation apparatus for mass information data, the recommendation apparatus comprising:
the metadata information acquisition module is used for acquiring metadata information from an enterprise content management system (ECM);
the metadata clustering template generation module is used for generating a metadata clustering template according to the metadata set sample space of the metadata information;
the static attribute space acquisition module is used for acquiring the static attribute space of the user according to the relevant information of the user;
the static mass data template acquisition module is used for acquiring a corresponding static mass data template according to the static attribute space of the user and the metadata clustering template; each user belongs to one class of technical concern group; the static mass data template acquisition module is specifically configured to: calculate, according to the attribute parameters in the static attribute space of the user and the attribute parameters in the metadata clustering template, the matching relation between each class of technical documents and each class of technical concern group μ
Figure FDA0002681020900000031
to obtain the static mass data template, where att_i is the i-th attribute parameter in the intersection of the attribute parameters in the static attribute space of the user and the attribute parameters in the metadata clustering template, n is the number of attribute parameters in the intersection, Meta(att_i) is the value of att_i in the static attribute space of the user, Special(att_i) is the value of att_i in the metadata clustering template, and
Figure FDA0002681020900000032
is the weight of att_i;
the attention word acquisition module is used for monitoring the behavior log of the user and acquiring the attention words of the user within preset time according to the behavior log of the user;
the text index forming module is used for forming a text index according to the text analysis of the massive data unstructured document;
the recommended content searching module is used for searching contents to be recommended according to the text index, the attention words of the user in a preset time and the static mass data template;
and the log recording module is used for recording the searched content to be recommended and the static mass data template.
5. The recommendation device of claim 4, wherein the metadata clustering template generation module comprises:
a selecting unit, configured to arbitrarily select K objects from the metadata set sample space as initial cluster centers, where K is an integer greater than zero, and one of the cluster objects corresponds to one class of technical documents;
the first calculating unit is used for calculating the similarity between all objects in the metadata set sample space and K cluster centers and classifying each object in all the objects into a cluster with the highest similarity to the object;
a second calculating unit, configured to recalculate a cluster center of each cluster according to an object in the cluster, so as to recalculate K cluster centers;
a third calculating unit, configured to recalculate the similarity between all the objects and the recalculated K cluster centers if any one of the recalculated K cluster centers is changed, and classify each of the objects into a cluster with the highest corresponding similarity, so as to form a new cluster object;
and the forming unit is used for repeatedly executing the second calculating unit and the third calculating unit until K cluster centers are not changed any more, and the K cluster centers form the metadata clustering template.
6. The recommendation device of claim 5, wherein the recommended content search module comprises:
a frequency obtaining unit, configured to obtain the frequency with which the attention words of the user within a preset time appear in the text index
Figure FDA0002681020900000041
where
Figure FDA0002681020900000042
is the j-th attention word of the user within the preset time;
a recommendation strength calculation unit, configured to calculate, according to
Figure FDA0002681020900000043
and V(μ, ), the recommendation strength of each class of technical documents
Figure FDA0002681020900000044
where m is the number of attention words of the user within the preset time,
Figure FDA0002681020900000045
is the attention-time freshness weight,
Figure FDA0002681020900000046
is the attention-frequency weight, and τ() is the update time parameter of the document;
and a recommended content generating unit, configured to generate, according to the recommendation strength of each class of technical documents, recommended content in list form from the technical documents whose recommendation strength meets a preset condition.
CN201710346631.2A 2017-05-17 2017-05-17 Recommendation method and recommendation device for mass information data Active CN107577690B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710346631.2A CN107577690B (en) 2017-05-17 2017-05-17 Recommendation method and recommendation device for mass information data

Publications (2)

Publication Number Publication Date
CN107577690A CN107577690A (en) 2018-01-12
CN107577690B true CN107577690B (en) 2021-01-05

Family

ID=61049374

Country Status (1)

Country Link
CN (1) CN107577690B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20181225

Address after: 518124 Office Building of Daya Bay Nuclear Power Base Engineering Company, Pengfei Road, Dapeng New District, Shenzhen City, Guangdong Province

Applicant after: China Nuclear Power Engineering Co., Ltd.

Applicant after: Shenzhen China Nuclear Power Design Co., Ltd.

Applicant after: China General Nuclear Power Corporation

Address before: 518124 Office Building of Daya Bay Nuclear Power Base Engineering Company, Pengfei Road, Dapeng New District, Shenzhen City, Guangdong Province

Applicant before: China Nuclear Power Engineering Co., Ltd.

Applicant before: China General Nuclear Power Corporation

GR01 Patent grant