CN109933717B - Academic conference recommendation system based on hybrid recommendation algorithm - Google Patents

Publication number
CN109933717B
Authority
CN
China
Prior art keywords
conference
mail
user
webpage
recommendation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910042396.9A
Other languages
Chinese (zh)
Other versions
CN109933717A (en)
Inventor
张凌
徐傲雪
张晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT

Landscapes

  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses an academic conference recommendation system based on a hybrid recommendation algorithm. The system acquires academic conference information from users' personal mail and from websites that publicly publish conference notices, and generates conference information summaries by applying content filtering and effective-information extraction to the original information. Based on users' historical behavior and an academic conference text representation that combines TF-IDF with word vectors, it realizes personalized recommendation of academic conference information through a hybrid recommendation algorithm that fuses user-based collaborative filtering with a content-based method, and it pushes information through WEB-based display and network disk filing. The system helps improve the information processing efficiency of scientific researchers and effectively alleviates the problem of having too many academic conferences to choose from.

Description

Academic conference recommendation system based on hybrid recommendation algorithm
Technical Field
The invention relates to the technical field of computer networks, in particular to an academic conference recommendation system based on a hybrid recommendation algorithm.
Background
Today, users can access a large number of network applications through a variety of devices and services. As mobile platforms become more capable, users can reach the network resources they care about anytime and anywhere, and the amount of information on the Internet keeps growing; recommendation systems have therefore become an effective strategy for overcoming information overload. A recommendation system can effectively address the problem of excessive choice hidden in huge network resources, is of considerable practical value, and has been widely adopted in many network applications. However, the diversity of users and resources makes the results of a recommendation engine built on a single model unsatisfactory, so research on hybrid recommendation algorithms is of great significance.
Information overload also exists in universities and research institutions. Academic conference information is of great interest to, and frequently encountered by, university teachers, students, and researchers. Its sources include research-related mail received by researchers, websites that publicly publish academic conference notices, and internal resource sharing within researchers' institutions. In recent years the number of academic conferences has kept growing while their quality has been uneven, which causes several common problems. First, spam, subscription mail, and advertising mail flood the network, so researchers must spend considerable time and energy picking out academic conference notification mail when processing their mail. Second, university staff are exposed to a great deal of academic conference information, most of which does not match their research fields or research level. Third, academic conference information often contains unimportant parts, so time is spent locating the effective information when processing it. Personalized push of academic conference information therefore improves the efficiency with which researchers process information, stimulates academic enthusiasm, and promotes academic research, and is thus worth studying.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing an academic conference recommendation system based on a hybrid recommendation algorithm. The system can acquire academic conference information from users' personal mail and from websites that publicly publish conference notices, realizes personalized recommendation of academic conference information according to users' historical behavior and the intrinsic characteristics of academic conferences, and pushes information through two methods: WEB-based display and network disk filing.
To achieve this purpose, the technical scheme provided by the invention is as follows. An academic conference recommendation system based on a hybrid recommendation algorithm obtains academic conference notification information through two methods, user mail filtering and information collection from public conference notification websites, and applies mail data preprocessing and HTML webpage data extraction to that information respectively. After processing, conference information in a uniform format is generated and persistently stored on a server. A timing task is scheduled at a suitable interval chosen according to server performance and data update speed; the timing task calculates the relevance between users and items using a hybrid recommendation algorithm that combines user-based collaborative filtering with a content-based method, where the content-based algorithm relies on a text representation that combines TF-IDF with word vectors. Conferences are recommended to users according to this relevance, and the recommendation results are pushed to users through two methods: webpage display and filing to a network disk. The system specifically comprises the following modules:
the mail information processing module, which receives e-mail data, decodes the mail data to form a mail summary, screens out academic conference notification mail, classifies the academic conference field from the mail body using an SVM (support vector machine), extracts effective information from the screened mail body using rules, and stores the processed academic conference information metadata and the mail summary;
the webpage information processing module, which specifies target webpages according to the system configuration, checks the status of the target webpages in real time, collects academic conference notification resources newly published on the target webpages, records invalid webpages that cannot be reached or whose structure has changed, extracts effective information from the webpages using a tag-based method, and stores the processed academic conference information metadata and webpage summaries;
the academic conference recommendation module, which preprocesses the user data and the conference data stored by the mail information processing module and the webpage information processing module, obtains the user-item relevance at specified times through a timing task by using a hybrid recommendation algorithm that fuses user-based collaborative filtering with a content-based method, and generates and caches recommendation results according to the relevance and the user configuration;
the network disk filing module, which obtains the recommendation results of the academic conference recommendation module, checks in turn the network disk associated with each user contained in the recommendation results, and files the academic conference notification mail summaries or webpage texts recommended to each user to that user's network disk;
and the conference information display and configuration management module, which lets users manage their personal information and configure system settings, displays summaries of relevant academic conferences, collects feedback on recommendation results, and handles subscriptions to conference websites.
Further, the mail information processing module comprises a mail collection component and a mail processing component. The mail collection component implements mail receiving, mail parsing, mail filtering, and mail caching; the mail processing component implements SVM-based field classification of academic conference notification mail, rule-based extraction of effective information, and storage of academic conference information metadata and original mail files; wherein:
the mail collection component creates a socket listening on port 25, receives SMTP commands over the connection, and processes them according to the protocol specification. If the recipient address given in an RCPT TO command is not in the system's user list, the transmission is refused; for transmissions that meet the condition, the component obtains the mail content, decodes the mail message data according to the SMTP and MIME protocols, extracts the mail header and mail body, reads the recipient address, sender address, and subject from the header, and filters the mail body with a keyword-rule-based method to screen out academic conference notification mail. To avoid blocking mail reception, a unique file name is generated for each mail that passes the screening, the mail header data and mail body are written to that file in the format [field name: field] to form a mail summary, a mail processing work queue is created through a Redis database, and the file name is added to the queue to notify the downstream mail processing component;
after obtaining a file to be processed by polling the mail processing work queue, the mail processing component uses an SVM model trained offline to perform multi-class field classification on the mail body and obtain the conference field of the academic conference notification mail. It extracts the effective information in the mail body using regular-expression rules, including the conference name, conference edition number, conference start time, conference end time, submission deadline, and submission method. Together with the conference field, the conference source (that is, the recipient address of the mail), and the address of the mail summary file, this information is stored in a relational database as conference information metadata. A conference data preprocessing work queue is created through a Redis database, and the conference ID of each newly added conference is added to that queue.
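As a rough illustration of the decoding and keyword-filtering steps of the mail collection component, the following minimal Python sketch parses a raw MIME message with the standard `email` package and applies a keyword rule. The keyword list, the `min_hits` threshold, and the function names are assumptions made for illustration; the patent does not disclose the actual rules, and the SMTP listener and Redis queue are omitted.

```python
from email import message_from_bytes
from email.policy import default

# Hypothetical keyword rules; the actual keywords are not given in the patent.
KEYWORDS = ("call for papers", "cfp", "conference", "submission deadline")


def parse_mail(raw: bytes) -> dict:
    """Decode a raw MIME message into header fields and a plain-text body."""
    msg = message_from_bytes(raw, policy=default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "to": str(msg["To"] or ""),
        "from": str(msg["From"] or ""),
        "subject": str(msg["Subject"] or ""),
        "body": body,
    }


def is_conference_notice(mail: dict, min_hits: int = 2) -> bool:
    """Keyword rule: keep the mail if enough distinct keywords occur."""
    text = (mail["subject"] + "\n" + mail["body"]).lower()
    hits = sum(1 for kw in KEYWORDS if kw in text)
    return hits >= min_hits


def to_summary(mail: dict) -> str:
    """Render the mail as '[field name: field]' lines (assumed layout)."""
    return "\n".join(f"[{name}: {value}]" for name, value in mail.items())
```

In a deployment along the lines described above, the text returned by `to_summary` would be written to a uniquely named file whose name is then pushed onto the mail processing work queue.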
Further, the webpage information processing module comprises a webpage collection component and a webpage processing component. The webpage collection component implements target webpage update checking, target webpage information collection, and target webpage failure alarms; the webpage processing component implements tag-based extraction of effective webpage information and storage of academic conference information metadata and webpage summaries; wherein:
the webpage collection component runs a collection task at 0:00 every day. It reads a configuration file to obtain the short name, URL, and latest update date of each target website, where a target website is a website that publicly publishes conference notices. It collects the webpage resources of conference notices published on the website after the latest update date, passes them on as HTML files, and updates the latest update date in the configuration file. If collection fails, it obtains the error information, notifies the administrator by mail, and changes the status of the target website in the configuration file to unavailable;
the webpage processing component implements HTML-tag-based effective-information extraction functions designed by examining the webpage structure of each target website. It selects the extraction function to apply to a webpage resource according to the short name of the target website and obtains the conference name, conference edition number, conference start time, conference end time, submission deadline, conference field, and submission method. It extracts the text of the conference notice, saves the source website short name, the collection time, and the notice text as a webpage summary file, stores the conference's effective information and the address of the webpage summary file in a relational database, and adds the conference ID of each newly added conference to the conference data preprocessing work queue.
Further, the academic conference recommendation module comprises a data preprocessing component and a core recommendation component. The data preprocessing component implements user data preprocessing, conference information preprocessing, intermediate data storage, and checking and cleaning of invalid data; the core recommendation component implements relevance calculation, reading of user configuration, recommendation result generation, and recommendation result caching; wherein:
the data preprocessing component polls the conference data preprocessing queue and the user data preprocessing queue. Before preprocessing, it checks for invalid data: any conference whose start time is earlier than the current date is marked as a historical conference and is no longer a recommendation candidate. It then implements the following two data preprocessing methods:
for user data, an item-user inverted table is built from the data stored by the system and the preference similarity of related users is calculated. The preference similarity between users is measured by cosine similarity: let N(u) be the set of conferences that user u is interested in and N(v) the set of conferences that user v is interested in; the preference similarity of user u and user v is then
$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$
for conference data, the text of each conference is segmented into words and stop words are removed; the text is then represented by combining TF-IDF with word vectors pre-trained on a large corpus, and the document vector is expressed in terms of the word vectors as
Figure BDA0001947978240000052
where $D_i$ denotes the i-th document, $K(t, D_i)$ denotes the TF-IDF value of word $t$ in $D_i$, and $v_t$ denotes the word vector of word $t$; after the document vectors are obtained, the similarity between every pair of documents is calculated using the Euclidean distance;
in order to save the computing time, the intermediate data obtained after the preprocessing is stored in a local file system in a persistent mode;
the core recommendation component executes the recommendation task at a fixed time through the timed task setting, and the specific steps are as follows:
a. find the K users most similar to user u according to the user preference similarity matrix and denote them by the set S(u, K); extract all conferences that the users in S(u, K) are interested in and remove those that u is already interested in; for each candidate conference i, the degree of interest of user u is calculated using the following formula:
$$p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv}$$
where N(i) denotes the set of users interested in conference i and $w_{uv}$ denotes the preference similarity of user u and user v; according to the number M of recommended conferences that the user has configured to receive, the 2M conferences with the largest p(u, i) are selected as the candidate conferences;
b. according to the text representation vectors of the conferences and their similarities, find, within the candidate conference set J(u) obtained in step a, the conferences most similar to the set I(u) of conferences that user u is interested in; for each candidate conference i ∈ J(u), the degree of interest of user u is calculated using the formula:
Figure BDA0001947978240000061
where $d_{ij}$ denotes the distance between conference i and conference j; after the candidate conferences are screened, the M conferences finally recommended to user u are obtained;
c. create a network disk filing queue through Redis, and add the obtained recommendation result to the network disk filing queue as a JSON record of the form (user: conference 1, conference 2, conference 3 …).
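Steps a and b above can be sketched as a two-stage ranking. The sketch below is only one plausible reading: the collaborative-filtering score sums the similarities of the K nearest users who are interested in a conference, and the content stage ranks the 2M candidates by their mean Euclidean distance to the conferences the user already likes. The mean-distance score is an assumption standing in for the content-based formula, which is not reproduced in the text, and all function and parameter names are illustrative.

```python
from math import dist


def recommend(u, interests, sim, doc_vecs, K=2, M=1):
    """Two-stage hybrid recommendation for user u.

    interests: user -> set of conference ids the user is interested in
    sim:       user -> {other user: preference similarity w_uv}
    doc_vecs:  conference id -> document vector (sequence of floats)
    """
    # Step a: user-based collaborative filtering over the K nearest users.
    neighbours = sorted(sim.get(u, {}).items(),
                        key=lambda kv: kv[1], reverse=True)[:K]
    scores = {}
    for v, w_uv in neighbours:
        for conf in interests[v] - interests[u]:
            scores[conf] = scores.get(conf, 0.0) + w_uv
    candidates = sorted(scores, key=scores.get, reverse=True)[:2 * M]

    if not interests[u]:
        return candidates[:M]

    # Step b (assumed scoring): prefer candidates whose text vectors lie
    # close to the conferences the user already likes.
    def mean_distance(i):
        liked = interests[u]
        return sum(dist(doc_vecs[i], doc_vecs[j]) for j in liked) / len(liked)

    return sorted(candidates, key=mean_distance)[:M]
```

With this arrangement, a conference that scores highly in step a but whose text is far from the user's interests can be dropped in step b, which is the screening effect the hybrid scheme aims for.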
Further, the network disk filing module needs to integrate a third-party network disk storage service; the DCampus WebLib cloud disk system is used as that service. When the network disk filing module detects a recommendation result while polling the network disk filing queue, it looks up, for each user in the recommendation result in turn, the user name and password of the associated network disk, logs in to the corresponding account through the HTTP interface provided by the third-party cloud service WebLib, creates a new directory named after the current date under the specified directory, and uploads the conference mail summaries or webpage summaries in the user's recommendation list to the specified directory in turn.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. By aggregating a user's multiple mailboxes and the public information of multiple academic conference websites, rich and reliable information sources can be ensured.
2. Through effective information extraction, the key information about an academic conference can be displayed more intuitively.
3. Through the hybrid recommendation algorithm, false recommendations in the results of user-based collaborative filtering that do not match the user's preferences can be screened out.
4. By representing text with a combination of TF-IDF and word vectors, key feature information can be obtained while the semantic information of the text is retained.
5. Through WEB-based display and filing to the user's associated network disk, users can conveniently view recommendation results, and the resources are also stored.
Drawings
Fig. 1 is an architecture diagram of an academic conference recommendation system based on a hybrid recommendation algorithm.
Detailed Description
The present invention will be further described with reference to the following specific examples.
The academic conference recommendation system based on the hybrid recommendation algorithm provided by this embodiment is developed mainly in Python and runs on the CentOS operating system. The mail filing cloud disk supported by the system is the DCampus WebLib cloud disk system developed by Guangzhou county network company and the Guangdong Province Key Laboratory of Computer Networks. It is an enterprise-level cloud disk service, and several versions, such as a standard version, a research institution version, and a medical institution version, have been developed for the knowledge management needs of different organizations. It supports file management, user management, full-library search, file cabinet management, and so on, as well as multi-terminal access. As shown in fig. 1, the academic conference recommendation system is implemented with internal modules and the HTTP interface of the third-party cloud service WebLib, and includes:
The mail information processing module, which receives e-mail data, decodes the mail data to form a mail summary, screens out academic conference notification mail, classifies the academic conference field using an SVM, extracts effective information from the screened mail body using rules, and stores the processed academic conference information metadata and the mail summary.
The webpage information processing module, which specifies target webpages according to the system configuration, checks the status of the target webpages in real time, collects academic conference notification resources newly published on the target webpages, records invalid webpages that cannot be reached or whose structure has changed, extracts effective information from the webpages using a tag-based method, and stores the processed academic conference information metadata and webpage summaries.
The academic conference recommendation module, which preprocesses the user data and the conference data stored by the mail information processing module and the webpage information processing module, obtains the user-item relevance at specified times through a timing task by using a hybrid recommendation algorithm that fuses user-based collaborative filtering with a content-based method, and generates and caches recommendation results according to the relevance and the user configuration.
The network disk filing module, which obtains the recommendation results of the academic conference recommendation module, checks in turn the network disk associated with each user contained in the recommendation results, and files the academic conference notification mail summaries or webpage texts recommended to each user to that user's network disk.
The conference information display and configuration management module, which lets users manage their personal information and configure system settings, displays summaries of relevant academic conferences, collects feedback on recommendation results, and handles subscriptions to conference websites.
The mail information processing module comprises a mail collection component and a mail processing component. The mail collection component implements mail receiving, mail parsing, mail filtering, and mail caching; the mail processing component implements SVM-based field classification of academic conference notification mail, rule-based extraction of effective information, and storage of academic conference information metadata and original mail files.
The mail collection component creates a socket listening on port 25, receives SMTP commands over the connection, and processes them according to the protocol specification. If the recipient address given in an RCPT TO command is not in the system's user list, the transmission is refused; for transmissions that meet the condition, the component obtains the mail content, decodes the mail message data according to the SMTP and MIME protocols, extracts the mail header and mail body, reads the recipient address, sender address, and subject from the header, and filters the mail body with a keyword-rule-based method to screen out academic conference notification mail. To avoid blocking mail reception, a unique file name is generated for each mail that passes the screening, the mail header data and mail body are written to that file in the format [field name: field] to form a mail summary, a mail processing work queue is created through a Redis database, and the file name is added to the queue to notify the downstream mail processing component.
After obtaining a file to be processed by polling the mail processing work queue, the mail processing component uses an SVM model trained offline to perform multi-class field classification on the mail body and obtain the conference field of the academic conference notification mail. It extracts the effective information in the mail body using regular-expression rules, including the conference name, conference edition number, conference start time, conference end time, submission deadline, and submission method. Together with the conference field, the conference source (that is, the recipient address of the mail), and the address of the mail summary file, this information is stored in a relational database as conference information metadata. A conference data preprocessing work queue is created through a Redis database, and the conference ID of each newly added conference is added to that queue.
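The rule-based extraction step can be pictured with a small set of regular expressions, one per metadata field. The patterns and field labels below are purely hypothetical, since the patent does not list its rules; real conference mail is far less regular and would need many more patterns per field.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE
DATE = r"(\d{4}-\d{2}-\d{2})"

# Hypothetical line-oriented rules, one compiled pattern per field.
PATTERNS = {
    "name": re.compile(r"^(?:Conference|Name)[ \t]*:[ \t]*(.+)$", FLAGS),
    "start": re.compile(r"^Start(?:[ \t]+date)?[ \t]*:[ \t]*" + DATE, FLAGS),
    "end": re.compile(r"^End(?:[ \t]+date)?[ \t]*:[ \t]*" + DATE, FLAGS),
    "deadline": re.compile(
        r"^(?:Submission[ \t]+)?Deadline[ \t]*:[ \t]*" + DATE, FLAGS),
    "submission": re.compile(r"^Submit(?:[ \t]+via)?[ \t]*:[ \t]*(\S+)", FLAGS),
}


def extract_fields(body: str) -> dict:
    """Apply each rule to the mail body; fields with no match stay None."""
    result = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(body)
        result[field] = match.group(1).strip() if match else None
    return result
```

Returning `None` for unmatched fields keeps the metadata record uniform, so a later stage can decide whether an incomplete record is still worth storing.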
The webpage information processing module comprises a webpage collection component and a webpage processing component. The webpage collection component implements target webpage update checking, target webpage information collection, and target webpage failure alarms; the webpage processing component implements tag-based extraction of effective webpage information and storage of academic conference information metadata and webpage summaries.
The webpage collection component runs a collection task at 0:00 every day. It reads a configuration file to obtain the short name, URL, and latest update date of each target website, where a target website is a website that publicly publishes conference notices. It collects the webpage resources of conference notices published on the website after the latest update date, passes them on as HTML files, and updates the latest update date in the configuration file. If collection fails, it obtains the error information, notifies the administrator by mail, and changes the status of the target website in the configuration file to unavailable.
The webpage processing component implements HTML-tag-based effective-information extraction functions designed by examining the webpage structure of each target website. It selects the extraction function to apply to a webpage resource according to the short name of the target website and obtains the conference name, conference edition number, conference start time, conference end time, submission deadline, conference field, and submission method. It extracts the text of the conference notice, saves the source website short name, the collection time, and the notice text as a webpage summary file, stores the conference's other effective information and the address of the webpage summary file in a relational database, and adds the conference ID of each newly added conference to the conference data preprocessing work queue.
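The incremental collection logic of the webpage collection component might look roughly like the sketch below, which keeps only notices published after the stored latest update date and flags a site as unavailable on failure. The `SiteConfig` fields mirror the configuration entries named above; the class, the function names, and the in-memory representation are assumptions, and the actual HTTP fetching and administrator mail are left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SiteConfig:
    """One target-website entry of the (assumed) configuration file."""
    short_name: str
    url: str
    last_update: date
    available: bool = True


def select_new_notices(site, notices):
    """Return the HTML of notices published after site.last_update.

    notices is a list of (published_date, html) pairs already fetched from
    the site; the stored latest update date is advanced as a side effect.
    """
    fresh = [(d, html) for d, html in notices if d > site.last_update]
    if fresh:
        site.last_update = max(d for d, _ in fresh)
    return [html for _, html in sorted(fresh)]


def mark_failed(site, error):
    """Mark the site unavailable and build the text of the admin alert."""
    site.available = False
    return f"Collection failed for {site.short_name} ({site.url}): {error}"
```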
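One way to picture the per-site, tag-based extraction is a table of extractor functions keyed by the website short name, each built on the standard `html.parser` module. The site names `site_a` and `site_b`, the tags they use, and the returned fields are hypothetical; real extractors would be written after inspecting each target site's markup.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside the first occurrence of each wanted tag."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.current = None
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag in self.wanted and tag not in self.found:
            self.current = tag
            self.found[tag] = ""

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current] += data


def extract_site_a(html):
    # Hypothetical layout: name in <h1>, start date in <time>.
    parser = TagTextCollector({"h1", "time"})
    parser.feed(html)
    return {"name": parser.found.get("h1", "").strip(),
            "start": parser.found.get("time", "").strip()}


def extract_site_b(html):
    # Hypothetical layout: only the page <title> carries the name.
    parser = TagTextCollector({"title"})
    parser.feed(html)
    return {"name": parser.found.get("title", "").strip(), "start": ""}


EXTRACTORS = {"site_a": extract_site_a, "site_b": extract_site_b}


def extract(short_name, html):
    """Dispatch to the extraction function registered for the website."""
    return EXTRACTORS[short_name](html)
```

Keeping the dispatch table separate from the extractors means that a site whose structure changes only requires replacing one function.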
The academic conference recommendation module comprises a data preprocessing component and a core recommendation component. The data preprocessing component implements user data preprocessing, conference information preprocessing, intermediate data storage, and checking and cleaning of invalid data; the core recommendation component implements relevance calculation, reading of user configuration, recommendation result generation, and recommendation result caching.
The data preprocessing component polls the conference data preprocessing queue and the user data preprocessing queue. Before preprocessing, it checks for invalid data: any conference whose start time is earlier than the current date is marked as a historical conference and is no longer a recommendation candidate. The two data preprocessing methods are implemented as follows, and to save computing time the intermediate data obtained after preprocessing is persistently stored in the local file system.
For user data, an item-user inverted table is built from the data stored by the system and the preference similarity of related users is calculated. The system measures the preference similarity between users by cosine similarity: let N(u) be the set of conferences that user u is interested in and N(v) the set of conferences that user v is interested in; the preference similarity of user u and user v is then
$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$
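The user-similarity preprocessing can be sketched as follows, assuming the usual set-based form of cosine similarity, in which the number of conferences two users share is divided by the square root of the product of their set sizes. The conference-to-users inverted table means only user pairs that share at least one conference are ever compared. The function name and data layout are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def user_similarity(interests):
    """Compute w_uv for all user pairs sharing at least one conference.

    interests maps each user to the set of conference ids N(u).
    """
    # Item-user inverted table: conference id -> users interested in it.
    inverted = defaultdict(set)
    for user, conferences in interests.items():
        for conf in conferences:
            inverted[conf].add(user)

    # Count |N(u) & N(v)| by walking the inverted table.
    common = defaultdict(lambda: defaultdict(int))
    for users in inverted.values():
        for u, v in combinations(sorted(users), 2):
            common[u][v] += 1
            common[v][u] += 1

    sim = {user: {} for user in interests}
    for u, row in common.items():
        for v, shared in row.items():
            sim[u][v] = shared / sqrt(len(interests[u]) * len(interests[v]))
    return sim
```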
For conference data, the text of each conference is segmented into words and stop words are removed; the text is then represented by combining TF-IDF with word vectors pre-trained on a large corpus, and the document vector is expressed as
Figure BDA0001947978240000102
where $D_i$ denotes the i-th document, $K(t, D_i)$ denotes the TF-IDF value of word $t$ in $D_i$, and $v_t$ denotes the word vector of word $t$. After the document vectors are obtained, the similarity between every pair of documents is calculated using the Euclidean distance.
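A minimal sketch of this text representation is given below. It assumes that the document vector is the TF-IDF-weighted sum of the word vectors of the document's words; the exact formula is contained in a figure that is not reproduced in the text, so any normalization it may include is unknown. The TF-IDF variant (relative term frequency times the logarithm of inverse document frequency) and the toy two-dimensional word vectors used in testing are likewise assumptions.

```python
from collections import Counter
from math import log


def tfidf(docs):
    """Return one {word: TF-IDF weight} dict per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


def doc_vector(weights, word_vecs, dim):
    """Weighted sum of word vectors: sum over t of K(t, D_i) * v_t."""
    vec = [0.0] * dim
    for word, k in weights.items():
        v_t = word_vecs.get(word)
        if v_t is None:  # word missing from the pre-trained vocabulary
            continue
        for axis in range(dim):
            vec[axis] += k * v_t[axis]
    return vec
```

The Euclidean distance between two such vectors can then be obtained with `math.dist`, so documents that share heavily weighted, semantically close words end up near each other.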
The core recommendation component executes the recommendation task at a fixed time through the timed task setting, and the specific steps are as follows:
a. Find the K users most similar to user u according to the user preference similarity matrix and denote them by the set S(u, K); extract all conferences that the users in S(u, K) are interested in and remove those that u is already interested in. For each candidate conference i, the degree of interest of user u is calculated using the following formula:
$$p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv}$$
where N(i) denotes the set of users interested in conference i and $w_{uv}$ denotes the preference similarity of user u and user v. According to the number M of recommended conferences that the user has configured to receive, the 2M conferences with the largest p(u, i) are selected as the candidate conferences.
b. According to the text representation vectors of the conferences and their similarities, find, within the candidate conference set J(u) obtained in step a, the conferences most similar to the set I(u) of conferences that user u is interested in. For each candidate conference i ∈ J(u), the degree of interest of user u is calculated using the following formula:
Figure BDA0001947978240000111
where $d_{ij}$ denotes the distance between conference i and conference j. After the candidate conferences are screened, the M conferences finally recommended to user u are obtained.
c. Create a network disk filing queue through Redis, and add the obtained recommendation result to the network disk filing queue as a JSON record of the form (user: conference 1, conference 2, conference 3 …).
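The hand-off in step c can be illustrated with JSON records passed through a queue. In the sketch below a `collections.deque` stands in for the Redis list so that the example stays self-contained; with a Redis client the two operations would correspond to list push and pop commands such as RPUSH and LPOP. The record layout, a single-key object mapping the user to a list of conference IDs, is an assumed reading of the format described above.

```python
import json
from collections import deque

# In-memory stand-in for the Redis-backed network disk filing queue.
filing_queue = deque()


def enqueue_result(queue, user, conferences):
    """Serialize one user's recommendation list and append it to the queue."""
    queue.append(json.dumps({user: conferences}, ensure_ascii=False))


def poll(queue):
    """Pop the oldest record, or return None when the queue is empty."""
    if not queue:
        return None
    record = json.loads(queue.popleft())
    (user, conferences), = record.items()
    return user, conferences
```

The network disk filing module would call something like `poll` in a loop and file the summaries of the returned conferences to the user's network disk.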
The network disk filing module needs to integrate a third-party network disk storage service. The system uses DCampus WebLib as the network disk storage service; this network disk system was developed by Guangzhou county network company and the Key Laboratory of Computer Networks of Guangdong Province. When the network disk filing module detects a recommendation result while polling the network disk filing queue, it looks up, for each user in the recommendation result in turn, the user name and password of the associated network disk, logs in to the corresponding account through the HTTP interface provided by WebLib, creates a new directory named after the current date under the specified directory, and uploads the conference mail abstracts or webpage abstracts in the user's recommendation list to the specified directory in turn.
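The filing flow described above can be sketched as follows. The WebLib HTTP interface is not described in this document, so the sketch does not call it: `FakeNetDisk` and its `login`, `make_directory`, and `upload` methods are hypothetical stand-ins, a Python list stands in for the Redis queue, and it is assumed that the abstracts are uploaded into the newly created date-named directory.

```python
import json
from datetime import date

class FakeNetDisk:
    """In-memory stand-in for a network disk; this is NOT the WebLib API."""
    def __init__(self):
        self.directories = {}
        self.current_user = None

    def login(self, username, password):
        self.current_user = username

    def make_directory(self, path):
        self.directories.setdefault(path, [])

    def upload(self, directory, file_name):
        self.directories[directory].append(file_name)

def file_recommendations(queue, credentials, abstracts, disk, base_dir, today):
    """Drain the filing queue and upload each user's recommended abstracts."""
    while queue:
        message = json.loads(queue.pop(0))
        for user, conferences in message.items():
            username, password = credentials[user]   # associated disk account
            disk.login(username, password)
            target = f"{base_dir}/{today.isoformat()}"  # directory named by date
            disk.make_directory(target)
            for conference_id in conferences:
                disk.upload(target, abstracts[conference_id])

# Toy data (hypothetical).
queue = [json.dumps({"alice": ["c2", "c7"]})]
credentials = {"alice": ("alice_disk", "secret")}
abstracts = {"c2": "c2_mail_abstract.txt", "c7": "c7_webpage_abstract.txt"}
disk = FakeNetDisk()
file_recommendations(queue, credentials, abstracts, disk,
                     "/conferences", date(2019, 1, 17))
```

Passing the date in explicitly keeps the function easy to test; a deployed poller would presumably use the current date and loop with a sleep or a blocking queue read.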
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that the changes in the shape and principle of the present invention should be covered within the protection scope of the present invention.

Claims (4)

1. An academic conference recommendation system based on a hybrid recommendation algorithm, characterized in that: the system acquires academic conference notification information by two methods, namely user mail filtering and information collection from public conference notification websites; mail data preprocessing and HTML webpage data extraction are performed on the acquired information respectively; after processing, conference information in a uniform format is generated and persistently stored on a server; a timed task is scheduled at a suitable time interval set according to server performance and data update speed; the timed task calculates the degree of correlation between users and items using a hybrid recommendation algorithm that combines user-based collaborative filtering with a content-based algorithm, wherein the content-based algorithm is based on a text representation combining TF-IDF with word vectors; conferences are recommended to the user according to the degree of correlation, and the recommendation result is pushed to the user by two methods, namely webpage display and network disk filing; the system specifically comprises the following modules:
the mail information processing module is used for receiving the electronic mail data, decoding the mail data to form a mail abstract, screening academic conference notification mails, classifying academic conference fields based on an SVM (support vector machine) according to mail texts, extracting effective information based on rules from the screened mail texts, and storing the processed academic conference information metadata and the mail abstract;
the webpage information processing module is used for specifying a target webpage according to system configuration, checking the condition of the target webpage in real time, collecting academic conference notification resources updated by the target webpage, recording invalid webpages which cannot be connected or have structure change, extracting effective information in the webpage by a label-based method, and storing processed academic conference information metadata and webpage summaries;
the academic conference recommendation module is used for preprocessing the user data and the conference data stored by the mail information processing module and the webpage information processing module, obtaining the user-item correlation degree at a specified time through a timed task by means of the hybrid recommendation algorithm that fuses user-based collaborative filtering with a content-based algorithm, and generating and caching the recommendation result according to the correlation degree and the user configuration;
the network disk filing module is used for acquiring the recommendation result of the academic conference recommendation module, sequentially checking the associated network disks according to the users contained in the recommendation result, and filing the academic conference notification mail abstract or the webpage text recommended to the corresponding users to the network disk of the users;
the conference information display and configuration management module is used for realizing the management of personal information of a user and the relevant setting of a user configuration system, displaying the abstract of a relevant academic conference, feeding back a recommendation result and subscribing a conference website;
the academic conference recommendation module comprises a data preprocessing component and a core recommendation component; the data preprocessing component is used for realizing user data preprocessing, conference information preprocessing, intermediate data storage and failure data checking and cleaning; the core recommendation component realizes correlation calculation, user configuration reading, recommendation result generation and recommendation result caching; wherein:
the data preprocessing component checks the conference data preprocessing queue and the user data preprocessing queue by polling, and performs a failure data check before data preprocessing: a conference whose start time is earlier than the current date has its state set to historical conference and no longer serves as a candidate for recommendation; the component implements the following two data preprocessing methods respectively:
for the preprocessing of user data, an item-user inverted list, stored by the system, is created, and the preference similarity of related users is calculated; the preference similarity between users is measured by cosine similarity; let N(u) be the set of conferences that user u is interested in and N(v) be the set of conferences that user v is interested in; the preference similarity between user u and user v is
Figure FDA0002935400500000021
for the preprocessing of conference data, the text data of each conference is segmented into words and stop words are removed; the text is then represented by combining TF-IDF with word vectors pre-trained on a large corpus, and the document vector is expressed in terms of the word vectors as
Figure FDA0002935400500000022
where D_i denotes the i-th document, K(t, D_i) denotes the TF-IDF value of word t in D_i, and v_t denotes the word vector of word t; after the document vectors are obtained, the similarity between every two documents is calculated using the Euclidean distance;
in order to save the computing time, the intermediate data obtained after the preprocessing is stored in a local file system in a persistent mode;
the core recommendation component executes the recommendation task at a fixed time through the timed task setting, and the specific steps are as follows:
a. finding the K users most similar to user u according to the user preference similarity matrix, denoted by the set S(u, K), extracting all conferences that the users in S are interested in, and removing the conferences that u is already interested in; for each candidate conference i, the degree to which user u is interested in it is calculated using the following formula:
Figure FDA0002935400500000031
where N(i) denotes the set of users interested in conference i, and w_uv denotes the preference similarity between user u and user v; according to the number M of recommended conferences that the user has configured to receive, the 2M conferences with the largest p(u, i) are selected as the candidate conferences;
b. according to the text representation vectors of the conferences and their similarity, finding, in the candidate conference set J(u) obtained in step a, the conferences most similar to the set I(u) of conferences that user u is interested in; for each candidate conference i ∈ J(u), the degree to which user u is interested in it is calculated by the following formula:
Figure FDA0002935400500000032
where d_ij denotes the distance between conference i and conference j; after M of the candidate conferences are screened out, the M conferences finally recommended to user u are obtained;
c. creating a network disk filing queue through Redis, and adding the obtained recommendation result to the network disk filing queue in a JSON format of the form (user: conference 1, conference 2, conference 3 …).
2. The academic conference recommendation system based on the hybrid recommendation algorithm according to claim 1, wherein: the mail information processing module comprises a mail acquisition component and a mail processing component; the mail collection component realizes mail receiving, mail analysis, mail filtering and mail caching; the mail processing component realizes classification of academic conference notification mail field based on SVM, effective information extraction based on rules, and storage of academic conference information metadata and mail original files; wherein:
the mail collection component creates a socket listening on port 25, accepts SMTP connection commands over the channel and processes them according to the protocol specification; if the recipient address identified by the RCPT TO command is not in the user list of the system, the transmission is refused; for a transmission that meets the conditions, the specific content of the mail is acquired, the mail message data is decoded according to the SMTP and MIME protocols, and the mail header and mail body are extracted; the recipient address, sender address and subject are obtained from the mail header, and the mail body is filtered by a keyword-rule-based method to screen out academic conference notification mails; to avoid blocking mail reception, a unique file name is generated for each mail that passes the screening, the mail header data and the mail body are written to a file in the format [field name: field] to form a mail abstract, a mail processing work queue is created through a Redis database, and the file name is added to the processing queue to notify the subsequent mail processing component;
after the mail processing component obtains a file to be processed by polling the mail processing work queue, it performs multi-class field classification on the mail text using an SVM model obtained by offline training, to obtain the conference field of the academic conference notification mail; the effective information in the mail text is extracted by regular-expression rules and comprises the conference name, conference edition number, conference start time, conference end time, conference submission deadline and conference submission method; the conference field, the conference source (namely the mail recipient address) and the mail abstract file address are added, and the result is stored in a relational database as conference information metadata; a conference data preprocessing work queue is created through a Redis database, and the conference ID of the newly added conference is added to the data preprocessing work queue.
3. The academic conference recommendation system based on the hybrid recommendation algorithm according to claim 1, wherein: the webpage information processing module comprises a webpage acquisition component and a webpage processing component; the webpage acquisition component realizes target webpage updating inspection, target webpage information acquisition and target webpage failure alarm; the webpage processing component realizes the extraction of effective webpage information based on tags and the storage of academic conference information metadata and webpage abstract; wherein:
the webpage collection component executes a collection task at 00:00 every day, and obtains the short name, URL and latest update date of each target website by reading a configuration file, wherein a target website is a website that publicly publishes conference notifications; the webpage resources corresponding to conference notifications published after the latest update date are collected from the webpage and passed on as HTML-format files, and the latest update date in the configuration file is modified; if collection fails, the error information is obtained, an administrator is notified by mail, and the state of the target website in the configuration file is changed to unavailable;
the webpage processing component implements effective information extraction functions based on HTML tags, written by observing the webpage structure design of each target website; it selects the information extraction function that matches the short name of the target website to process the webpage resources, acquires the conference name, conference edition number, conference start time, conference end time, conference submission deadline, conference field and conference submission method, and saves the conference notification text; the short name of the source website, the webpage collection time and the conference notification text are saved as a webpage abstract file; the conference effective information and the webpage abstract file address are stored in a relational database, and the conference ID of the newly added conference is added to the conference data preprocessing work queue.
4. The academic conference recommendation system based on the hybrid recommendation algorithm according to claim 1, wherein: the network disk archiving module needs to integrate third-party network disk storage service, a DCampus WebLib cloud disk system is used as the network disk storage service, the network disk archiving module sequentially inquires user names and passwords of associated network disks for users in recommendation results when detecting the recommendation results in the process of polling a network disk archiving queue, logs in corresponding account numbers through HTTP interfaces provided by the third-party cloud service WebLib, creates a new directory taking the current date as the directory name under the specified directory, and sequentially uploads meeting mail summaries or webpage summaries in a user recommendation list to the specified directory.
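As an illustration of the user data preprocessing recited in claim 1, the following minimal sketch builds an item-user inverted list and computes the preference similarity only for users who share at least one conference. The similarity formula itself is in the figure and is not reproduced here; the sketch assumes the usual set-based cosine similarity, |N(u) ∩ N(v)| / sqrt(|N(u)| · |N(v)|). All names and data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def user_similarity(interests):
    """interests: {user: set of conference ids}. Returns {u: {v: w_uv}}."""
    # Item-user inverted list: conference -> users interested in it.
    inverted = defaultdict(set)
    for user, conferences in interests.items():
        for conference in conferences:
            inverted[conference].add(user)

    # Count shared conferences only for user pairs that co-occur somewhere.
    common = defaultdict(int)
    for users in inverted.values():
        for a, b in combinations(sorted(users), 2):
            common[(a, b)] += 1

    # Assumed cosine similarity on interest sets.
    w = defaultdict(dict)
    for (a, b), shared in common.items():
        score = shared / math.sqrt(len(interests[a]) * len(interests[b]))
        w[a][b] = score
        w[b][a] = score
    return w

# Toy data (hypothetical).
interests = {"u": {"c1", "c2"}, "v": {"c2", "c3"}, "x": {"c9"}}
w = user_similarity(interests)
```

Going through the inverted list avoids comparing every pair of users: user x, who shares no conference with anyone, never enters the pair loop and gets no similarity entries.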
CN201910042396.9A 2019-01-17 2019-01-17 Academic conference recommendation system based on hybrid recommendation algorithm Active CN109933717B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910042396.9A CN109933717B (en) 2019-01-17 2019-01-17 Academic conference recommendation system based on hybrid recommendation algorithm

Publications (2)

Publication Number Publication Date
CN109933717A CN109933717A (en) 2019-06-25
CN109933717B CN109933717B (en) 2021-05-14

Family

ID=66985105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910042396.9A Active CN109933717B (en) 2019-01-17 2019-01-17 Academic conference recommendation system based on hybrid recommendation algorithm

Country Status (1)

Country Link
CN (1) CN109933717B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111796830B (en) * 2020-06-08 2023-09-19 成都数之联科技股份有限公司 Protocol analysis processing method, device, equipment and medium
CN112687272B (en) * 2020-12-18 2023-03-21 北京金山云网络技术有限公司 Conference summary recording method and device and electronic equipment
CN113077235B (en) * 2021-04-12 2024-03-22 上海明略人工智能(集团)有限公司 Conference schedule conflict management method, system, electronic equipment and storage medium
CN113127633B (en) * 2021-06-17 2021-09-21 平安科技(深圳)有限公司 Intelligent conference management method and device, computer equipment and storage medium
CN113420058B (en) * 2021-07-01 2022-07-01 宁波大学 Conversational academic conference recommendation method based on combination of user historical behaviors

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101755283A (en) * 2007-07-24 2010-06-23 三星电子株式会社 Method and apparatus for recommending information using hybrid algorithm
CN103049575A (en) * 2013-01-05 2013-04-17 华中科技大学 Topic-adaptive academic conference searching system
CN104572874A (en) * 2014-12-19 2015-04-29 北京锐安科技有限公司 Webpage information extraction method and device
CN105787068A (en) * 2016-03-01 2016-07-20 上海交通大学 Academic recommendation method and system based on citation network and user proficiency analysis
CN106610970A (en) * 2015-10-21 2017-05-03 上海文广互动电视有限公司 Collaborative filtering-based content recommendation system and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10043191B2 (en) * 2006-07-18 2018-08-07 Buzzfeed, Inc. System and method for online product promotion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
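Hybrid recommendation technology based on content and collaborative filtering; Hu Yingsong (胡迎松) et al.; 2nd National Conference on Web Information Systems and Applications (WISA2005); 20070429; Section 2 of the main text *
Design of a processing system for academic notification mails; Xu Aoxue (徐傲雪) et al.; China Education Network (《中国教育网络》); 20181005 (Issue 10, 2018); abstract, sections "Introduction" through "Experimental Evaluation" of the main text, Figure 1 *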

Also Published As

Publication number Publication date
CN109933717A (en) 2019-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant