CN109933717A - Academic conference recommender system based on a hybrid recommendation algorithm - Google Patents


Info

Publication number
CN109933717A
Authority
CN
China
Prior art keywords
meeting
user
mail
information
data
Prior art date
Legal status
Granted
Application number
CN201910042396.9A
Other languages
Chinese (zh)
Other versions
CN109933717B (en)
Inventor
张凌
徐傲雪
张晶
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201910042396.9A
Publication of CN109933717A
Application granted
Publication of CN109933717B
Status: Active
Anticipated expiration


Abstract

The invention discloses an academic conference recommender system based on a hybrid recommendation algorithm. The system obtains academic conference information from users' personal mail and from public conference-announcement websites. It filters the raw information and extracts the useful content to generate conference information summaries. Recommendations are personalized from two inputs: users' historical behavior, and a conference document representation that combines TF-IDF with word vectors. A hybrid recommendation algorithm fuses user-based collaborative filtering with content-based recommendation to produce them. The results are pushed in two ways, a web-based display and archiving to a cloud drive. The system helps researchers process information more efficiently and effectively eases the problem of choosing among too many academic conferences.

Description

Academic conference recommender system based on a hybrid recommendation algorithm
Technical field
The present invention relates to the technical field of computer networks, and in particular to an academic conference recommender system based on a hybrid recommendation algorithm.
Background art
Today, users can access a large number of network applications and services from a variety of devices. As mobile platforms become more capable, users can reach the Internet resources they are interested in anytime and anywhere, and the amount of information on the Internet keeps growing. Recommender systems have become an effective strategy for overcoming this kind of information overload. They mitigate problems hidden under huge network resources, such as having too many choices. Their practical value is considerable, and they are widely used in many network applications. However, the diversity of users and resources means that recommendation engines built on a single model perform unsatisfactorily, so research on hybrid recommendation algorithms is of great significance.
Universities and research institutions face the same information overload, and academic conference information is a prominent example. Teachers, students, and researchers pay particular attention to it and deal with it frequently. Its sources include:
- research-related mail received by researchers;
- websites that publish academic conference notices;
- internal resource sharing within a researcher's institution.

The number of academic conferences has kept growing in recent years and their quality varies widely, which causes several common problems in practice:
- Spam, subscription mail, and advertising are rampant, so researchers spend considerable time and energy picking out academic conference notices when handling mail.
- University staff come across a great deal of academic conference information, but the overwhelming majority of it does not match their research field or level.
- Conference information often contains unimportant parts, so locating the useful information takes time.

Personalized push of academic conference information therefore helps researchers process information more efficiently, stimulating academic enthusiasm and promoting academic work, and is a worthwhile research topic.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the prior art by proposing an academic conference recommender system based on a hybrid recommendation algorithm. The system obtains academic conference information from users' personal mail and from public conference-announcement websites. It delivers personalized recommendations according to users' historical behavior and the intrinsic features of the conferences. It pushes the information in two ways: a web-based display and archiving to a cloud drive.
To achieve the above object, the present invention provides the following technical solution: an academic conference recommender system based on a hybrid recommendation algorithm.

The system obtains academic conference notices in two ways: user mail filtering, and information collection from public conference-notice websites. These are processed by mail data preprocessing and HTML web page data extraction, respectively. The processed conference information is stored persistently on the server in a unified format.

A scheduled task runs at an interval chosen according to server performance and data update speed. The task computes the relevance between users and items using a hybrid recommendation algorithm that fuses user-based collaborative filtering with content-based recommendation. The content-based part rests on a text representation combining TF-IDF with word vectors. Conferences are recommended to users according to relevance, and the results are pushed to users by web page display and by archiving to a cloud drive.

The system specifically includes the following modules:
Mail information processing module: receives mail data and decodes it to form mail summaries; screens out academic conference notice mails and classifies them by academic conference field with an SVM applied to the message body; extracts the useful information from the screened message bodies with rules; and stores the processed academic conference metadata and mail summaries;
Web information processing module: specifies target web pages according to the system configuration and checks them in real time; collects the academic conference notices newly published on them; records failed pages that cannot be reached or whose structure has changed; extracts the useful information from the pages with a tag-based method; and stores the processed academic conference metadata and web page summaries;
Academic conference recommending module: preprocesses the user data and the conference data stored by the mail and web information processing modules. At the appointed time, a scheduled task computes user-item relevance with the hybrid recommendation algorithm (user-based collaborative filtering fused with content-based recommendation). It then generates and caches recommendation results according to the relevance and the user configuration;
Cloud drive archiving module: obtains the recommendation results from the academic conference recommending module and looks up, in turn, the cloud drive linked to each user in the results. It archives the academic conference notice mail summaries or web page texts recommended to each user into that user's cloud drive;
Conference information display and configuration management module: lets users manage their personal information and system settings, displays the relevant academic conference summaries, collects feedback on recommendation results, and lets users subscribe to conference websites.
Further, the mail information processing module includes a mail collection component and a mail processing component. The mail collection component implements mail reception, mail parsing, mail filtering, and mail caching. The mail processing component implements:
- SVM-based field classification of academic conference notice mails;
- rule-based extraction of useful information;
- storage of academic conference metadata and the original mails.

Specifically:
The mail collection component creates a socket listening on port 25 and handles incoming SMTP connection commands on a channel according to the protocol specification. If the recipient address identified by the RCPT TO command is not in the system's user list, the transmission is refused.

For an accepted transmission, the component obtains the mail content and decodes the message data according to the SMTP and MIME protocols. It extracts the mail header and body, and reads the recipient address, sender address, and subject from the header. It then filters the mail body (the message text) with keyword rules to pick out academic conference notice mails.

To avoid blocking mail reception, each screened mail is given a unique filename. The header data and message text are written to that file in the format [field name: field value] to form the mail summary. A mail-processing work queue is created in a Redis database, and adding the filename to this queue notifies the subsequent mail processing component;
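A minimal Python sketch of the acceptance and filtering steps. It assumes the SMTP session itself is handled elsewhere, so only the RCPT TO addresses and the raw message bytes reach this code. The user list and keyword list are hypothetical. The in-memory `deque` and `dict` stand in for the Redis work queue and the summary files; with redis-py, the `append` would become an `rpush` on a named list.

```python
import uuid
from collections import deque
from email import message_from_bytes
from email.policy import default
from typing import Optional

# Hypothetical user list and keyword rules; the patent does not publish either.
SYSTEM_USERS = {"alice@lab.example.edu"}
CONFERENCE_KEYWORDS = ("call for papers", "conference", "submission deadline")

mail_queue = deque()  # stand-in for the Redis mail-processing work queue
summaries = {}        # stand-in for summary files on disk: filename -> text


def accept_rcpt(address: str) -> bool:
    """RCPT TO check: accept only recipients in the system's user list."""
    return address.strip().strip("<>").lower() in SYSTEM_USERS


def handle_message(raw: bytes) -> Optional[str]:
    """Decode a MIME message and keep it only if it looks like a conference notice."""
    msg = message_from_bytes(raw, policy=default)
    part = msg.get_body(preferencelist=("plain", "html"))
    body = part.get_content() if part is not None else ""
    if not any(k in body.lower() for k in CONFERENCE_KEYWORDS):
        return None  # filtered out by the keyword rules
    filename = f"{uuid.uuid4().hex}.txt"  # unique name, so reception never waits on processing
    fields = {"to": msg["To"], "from": msg["From"], "subject": msg["Subject"], "body": body.strip()}
    summaries[filename] = "\n".join(f"[{k}: {v}]" for k, v in fields.items())
    mail_queue.append(filename)  # notify the mail processing component
    return filename
```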
The mail processing component polls the mail-processing work queue to obtain a file. It then applies an SVM model trained offline to perform multi-class field classification on the message text, which yields the conference field of the notice mail.

Regular-expression rules extract the useful information from the message text: conference title, conference number, start date, end date, submission deadline, and submission method. These fields are stored in a relational database as conference metadata, together with:
- the conference field;
- the conference source (the mail recipient address);
- the mail summary file path.

A conference-data preprocessing work queue is created in the Redis database, and the conference ID of each newly added conference is added to it.
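The regular-expression extraction step might look like the following sketch. The field labels (`Conference:`, `Dates:`, `Submission deadline:`, `Submit via`) and the returned keys are hypothetical, because the patent does not publish its rules. The SVM classifier is not reproduced; its output is simply passed in as `field`.

```python
import re
from typing import Dict, Optional

# Hypothetical patterns for an English-language notice; the patent's rules are not published.
TITLE = re.compile(r"^Conference:\s*(.+)$", re.M)
EDITION = re.compile(r"\b(\d+)(?:st|nd|rd|th)\b")
DATES = re.compile(r"Dates?:\s*(\d{4}-\d{2}-\d{2})\s*to\s*(\d{4}-\d{2}-\d{2})")
DEADLINE = re.compile(r"Submission deadline:\s*(\d{4}-\d{2}-\d{2})", re.I)
SUBMIT = re.compile(r"Submit (?:via|at)\s+(\S+)", re.I)


def _first(pattern, text: str) -> Optional[str]:
    """Return the first capture group of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(1).strip() if m else None


def extract_metadata(body: str, field: str, recipient: str,
                     summary_path: str) -> Dict[str, Optional[str]]:
    """Build one conference-metadata row from a notice already labelled by the classifier."""
    title = _first(TITLE, body)
    dates = DATES.search(body)
    return {
        "title": title,
        "number": _first(EDITION, title) if title else None,
        "start_date": dates.group(1) if dates else None,
        "end_date": dates.group(2) if dates else None,
        "submission_deadline": _first(DEADLINE, body),
        "submission_method": _first(SUBMIT, body),
        "field": field,              # label from the SVM classifier (not shown here)
        "source": recipient,         # conference source = mail recipient address
        "summary_path": summary_path,
    }
```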
Further, the web information processing module includes a web collection component and a web page processing component. The web collection component implements target page update checking, target page information collection, and target page failure alerts. The web page processing component implements tag-based extraction of useful web page information and storage of academic conference metadata and web page summaries. Specifically:
The web collection component runs a scheduled collection task at 0:00 every day. It reads the configuration file to obtain each target website's abbreviation, URL, and most recent update date; the target websites are sites that publish conference notices. It collects the conference notice pages published after the most recent update date, transfers them as HTML files, and updates the most recent update date in the configuration file. If collection fails, it obtains the error message, notifies the administrator by mail, and sets the status of that target website in the configuration file to unavailable;
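A sketch of the daily collection task for one configured site, assuming a scheduler (for example cron at 0:00) calls it once per site entry. `fetch` and `alert` are hypothetical injected callables that stand in for the HTTP download and the administrator notification mail. The config keys `abbr`, `url`, `last_update`, and `status` are assumptions about a file format the patent does not specify.

```python
import datetime as dt
from typing import Callable, Dict, List, Tuple

Notice = Tuple[dt.date, str]  # (publication date, HTML of the notice page)


def collect_site(site: Dict[str, str],
                 fetch: Callable[[str], List[Notice]],
                 alert: Callable[[str], None]) -> List[str]:
    """Collect one target site's notices newer than its recorded last-update date."""
    if site.get("status") == "unavailable":
        return []  # skipped until an administrator re-enables it
    try:
        notices = fetch(site["url"])
    except Exception as exc:  # site unreachable or page structure changed
        alert(f"collection failed for {site['abbr']}: {exc}")  # e.g. mail the administrator
        site["status"] = "unavailable"
        return []
    last = dt.date.fromisoformat(site["last_update"])
    fresh = [html for published, html in notices if published > last]
    newest = max((published for published, _ in notices), default=last)
    site["last_update"] = max(newest, last).isoformat()  # written back to the config
    return fresh
```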
The web page processing component extracts the useful information from HTML tags. Its extraction functions are written by studying the page structure of each target website. For each page, it selects the extraction function that matches the target website's abbreviation.

The extracted fields are the conference title, conference number, start date, end date, submission deadline, conference field, submission method, and notice text. The source website abbreviation, collection time, and notice text are saved as a web page summary file. The useful conference information and the summary file path are stored in the relational database. Finally, the conference ID of each newly added conference is added to the conference-data preprocessing work queue.
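Tag-based extraction with one hand-written extractor per site abbreviation can be sketched with the standard-library `html.parser`. The site abbreviation `demo` and its label/value table layout are made up for illustration. Real extractors would be written against each target site's actual markup.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional


class CellParser(HTMLParser):
    """Collect the text of every <td> cell in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: List[str] = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data


def extract_label_table(html: str) -> Dict[str, Optional[str]]:
    """Extractor for a hypothetical site that lays notices out as label/value table rows."""
    parser = CellParser()
    parser.feed(html)
    cells = [c.strip() for c in parser.cells]
    pairs = dict(zip(cells[0::2], cells[1::2]))
    return {"title": pairs.get("Name"), "submission_deadline": pairs.get("Deadline")}


# One extractor per target-site abbreviation (the abbreviation here is made up).
EXTRACTORS: Dict[str, Callable[[str], Dict[str, Optional[str]]]] = {"demo": extract_label_table}


def process_page(abbr: str, html: str) -> Dict[str, Optional[str]]:
    """Dispatch to the extractor registered for this site."""
    return EXTRACTORS[abbr](html)
```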
Further, the academic conference recommending module includes a data preprocessing component and a core recommendation component. The data preprocessing component implements user data preprocessing, conference information preprocessing, intermediate data storage, and expired-data checking and cleaning. The core recommendation component implements relevance computation, user configuration reading, recommendation result generation, and recommendation result caching. Specifically:
The data preprocessing component polls the conference-data preprocessing queue and the user-data preprocessing queue. Before preprocessing, it checks for expired data. A conference whose start date is earlier than the current date is marked as a historical conference and is no longer a recommendation candidate. The two preprocessing methods are implemented as follows:
For user data, the system builds an item-user inverted table and computes the preference similarity between related users, measured with cosine similarity. Let N(u) be the set of conferences user u is interested in, and N(v) the corresponding set for user v. The preference similarity of users u and v is $w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$;
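The following sketches the user-data preprocessing. It computes the cosine similarity of interest sets through the item-user inverted table, so only user pairs that share at least one conference are ever compared. The input format (a mapping from each user to the set N(u)) is an assumption.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set


def user_similarity(interests: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """w_uv = |N(u) & N(v)| / sqrt(|N(u)| * |N(v)|), via an item-user inverted table."""
    inverted = defaultdict(set)  # conference -> users interested in it
    for user, confs in interests.items():
        for conf in confs:
            inverted[conf].add(user)
    overlap = defaultdict(int)  # co-occurrence counts; pairs sharing nothing are never touched
    for users in inverted.values():
        for u, v in combinations(sorted(users), 2):
            overlap[(u, v)] += 1
    w: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (u, v), n in overlap.items():
        w[u][v] = w[v][u] = n / math.sqrt(len(interests[u]) * len(interests[v]))
    return dict(w)
```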
For conference data, the text of each conference record is segmented into words and stop words are removed. The text is then represented by combining TF-IDF with word vectors pre-trained on a large corpus, giving the document vector $\mathrm{vec}(D_i) = \sum_{t \in D_i} K(t, D_i)\, v_t$. Here $D_i$ denotes the i-th document, $K(t, D_i)$ the TF-IDF value of word $t$ in $D_i$, and $v_t$ the word vector of $t$. Once the document vectors are obtained, the similarity between documents is computed from their Euclidean distance;
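The document representation can be sketched in plain Python as below, as a TF-IDF-weighted sum of word vectors. Two things are assumptions: the TF-IDF variant (tf = count / document length, idf = ln(N / df)) and the toy two-dimensional `word_vecs`. In practice, the word vectors would come from a model pre-trained on a large corpus. Segmentation and stop-word removal are assumed to have happened already.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """K(t, D_i) = tf * idf, with tf = count / len(D_i) and idf = ln(N / df(t))."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return [{t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def doc_vector(weights: Dict[str, float], word_vecs: Dict[str, List[float]],
               dim: int) -> List[float]:
    """Sum of K(t, D_i) * v_t; words without a pre-trained vector contribute nothing."""
    vec = [0.0] * dim
    for t, k in weights.items():
        for j, x in enumerate(word_vecs.get(t, ())):
            vec[j] += k * x
    return vec


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two document vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```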
To save computation time, the intermediate data obtained after preprocessing is persisted to the local file system;
The core recommendation component is scheduled by a timed task and runs the recommendation task at a fixed time, with the following steps:
a. Using the user preference similarity matrix, find the K users most similar to user u, denoted by the set S(u, K). Extract all conferences that the users in S(u, K) are interested in, and remove those user u is already interested in. For each candidate conference i, user u's degree of interest is $p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv}$, where N(i) is the set of users interested in conference i and $w_{uv}$ is the preference similarity of users u and v. Let M be the number of recommended conferences the user wishes to receive, as set in the user configuration. The 2M conferences with the largest p(u, i) form the candidate set;
b. Using the conferences' text representation vectors and their similarities, find the conferences in the candidate set J(u) from step a that are most similar to I(u), the set of conferences user u is interested in. For each candidate conference i ∈ J(u), user u's degree of interest is computed from $d_{ij}$, the distance between conference i and conference j. The M best candidates are the final recommendations for user u;
c. Create a cloud-drive archiving queue in Redis and add the recommendation results to it in the JSON format {user: [conference 1, conference 2, conference 3, ...]}.
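Steps a and b can be combined into a small ranking pipeline. Step a uses the standard user-based collaborative filtering score, summing $w_{uv}$ over the neighbours v in S(u, K) who are interested in i. For step b, the exact interest formula is not legible in this text. The score used here is therefore a hypothetical stand-in that favors candidates close to what the user already likes: the sum of $1/(1 + d_{ij})$ over the conferences j in I(u).

```python
import heapq
import math
from typing import Dict, List, Set


def user_cf_candidates(u: str, interests: Dict[str, Set[str]],
                       w: Dict[str, Dict[str, float]], K: int, M: int) -> List[str]:
    """Step a: p(u, i) = sum of w_uv over v in S(u, K) who like i; keep the top 2M."""
    neighbours = heapq.nlargest(K, w.get(u, {}).items(), key=lambda kv: kv[1])  # S(u, K)
    scores: Dict[str, float] = {}
    for v, w_uv in neighbours:
        for i in interests[v] - interests[u]:  # drop conferences u already likes
            scores[i] = scores.get(i, 0.0) + w_uv
    return heapq.nlargest(2 * M, scores, key=scores.__getitem__)


def content_rerank(candidates: List[str], liked: Set[str],
                   vectors: Dict[str, List[float]], M: int) -> List[str]:
    """Step b with a hypothetical score: sum over j in I(u) of 1 / (1 + d_ij)."""
    def dist(i: str, j: str) -> float:
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j])))

    score = {i: sum(1.0 / (1.0 + dist(i, j)) for j in liked) for i in candidates}
    return sorted(candidates, key=score.__getitem__, reverse=True)[:M]


def recommend(u: str, interests: Dict[str, Set[str]], w: Dict[str, Dict[str, float]],
              vectors: Dict[str, List[float]], K: int, M: int) -> List[str]:
    """Hybrid: collaborative filtering proposes 2M candidates, content similarity keeps M."""
    candidates = user_cf_candidates(u, interests, w, K, M)
    return content_rerank(candidates, interests[u], vectors, M)
```

Consider a toy run with interests {"u": {"a"}, "v": {"a", "b"}, "x": {"a", "c"}}, similarities {"u": {"v": 0.9, "x": 0.3}}, K = 2 and M = 1. Collaborative filtering ranks b above c. If b's text vector is far from conference a and c's is close, the content step keeps c instead. This is the kind of pseudo-recommendation filtering that the advantages section attributes to the hybrid design.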
Further, the cloud drive archiving module integrates a third-party cloud storage service; the DCampus WebLib cloud drive system is used. The module polls the cloud-drive archiving queue. When it detects recommendation results, it does the following for each user in the results, in turn:
- looks up the username and password of the user's linked cloud drive;
- logs in to that account through the HTTP interface provided by the third-party cloud service WebLib;
- creates, under a specified directory, a new directory named after the current date;
- uploads the conference mail summaries or web page summaries in the user's recommendation list to that directory one by one.
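Step c and the archiving module form a producer/consumer pair around the queue. In this sketch, the `deque` stands in for the Redis list; with redis-py, `rpush` and `blpop` would play these roles. `DriveClient` is a hypothetical interface. WebLib's real HTTP endpoints are not described in the patent, so a concrete client would have to be written against its own documentation.

```python
import datetime as dt
import json
from collections import deque
from typing import Dict, List, Protocol, Tuple

archive_queue = deque()  # stand-in for the Redis cloud-drive archiving queue


def enqueue_results(results: Dict[str, List[str]]) -> None:
    """Step c: push one JSON message {user: [conference, ...]} per user."""
    for user, conferences in results.items():
        archive_queue.append(json.dumps({user: conferences}))


class DriveClient(Protocol):
    """Hypothetical cloud-drive client; WebLib's real API is not given in the patent."""

    def login(self, username: str, password: str) -> None: ...
    def mkdir(self, path: str) -> None: ...
    def upload(self, local_path: str, remote_dir: str) -> None: ...


def archive_next(credentials: Dict[str, Tuple[str, str]], summary_paths: Dict[str, str],
                 client: DriveClient, base_dir: str, today: dt.date) -> None:
    """Consume one queue message: log in, create a dated directory, upload each summary."""
    payload = archive_queue.popleft()
    for user, conferences in json.loads(payload).items():
        username, password = credentials[user]  # the user's linked cloud-drive account
        client.login(username, password)
        target = f"{base_dir}/{today.isoformat()}"  # new directory named after today's date
        client.mkdir(target)
        for conf_id in conferences:
            client.upload(summary_paths[conf_id], target)
```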
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. By aggregating each user's multiple mailboxes and the information from multiple public academic conference websites, the system ensures that its information sources are rich and reliable.
2. Extracting the useful information presents the key facts about each academic conference more intuitively.
3. The hybrid recommendation algorithm filters out false recommendations in the user-based collaborative filtering results that do not match the user's preferences.
4. Representing text by combining TF-IDF with word vectors captures key feature information while retaining the semantic information of the text.
5. Pushing results both through a web-based display and by archiving them to the user's linked cloud drive lets users view recommendations conveniently and also preserves the resources.
Description of drawings
Fig. 1 is an architecture diagram of the academic conference recommender system based on a hybrid recommendation algorithm.
Detailed embodiment
The present invention is further explained below with reference to a specific embodiment.
The academic conference recommender system of this embodiment is developed mainly in the Python language and runs on the CentOS operating system.

The cloud drive the system supports for mail archiving is the DCampus WebLib cloud drive system, developed by Guangzhou Shuo Yuan Network Company and the Guangdong Provincial Key Laboratory of Computer Networks. It is an enterprise-grade cloud drive service. To meet the information management needs of different organizations, it comes in several editions, such as a standard edition, a research institution edition, and a medical institution edition. It supports file management, user management, full-library search, and file cabinet management, as well as access from multiple terminals.

As shown in Fig. 1, the academic conference recommender system is implemented by its internal modules together with the HTTP interface of the third-party cloud service WebLib. It includes:
Mail information processing module: receives mail data and decodes it to form mail summaries; screens out academic conference notice mails and classifies them by academic conference field with an SVM; extracts the useful information from the screened message bodies with rules; and stores the processed academic conference metadata and mail summaries.
Web information processing module: specifies target web pages according to the system configuration and checks them in real time; collects the academic conference notices newly published on them; records failed pages that cannot be reached or whose structure has changed; extracts the useful information from the pages with a tag-based method; and stores the processed academic conference metadata and web page summaries.
Academic conference recommending module: preprocesses the user data and the conference data stored by the mail and web information processing modules described above. At the appointed time, a scheduled task computes user-item relevance with the hybrid recommendation algorithm (user-based collaborative filtering fused with content-based recommendation). It then generates and caches recommendation results according to the relevance and the user configuration.
Cloud drive archiving module: obtains the recommendation results of the academic conference recommending module described above and looks up, in turn, the cloud drive linked to each user in the results. It archives the academic conference notice mail summaries or web page texts recommended to each user into that user's cloud drive.
Conference information display and configuration management module: lets users manage their personal information and system settings, displays the relevant academic conference summaries, collects feedback on recommendation results, and lets users subscribe to conference websites.
The mail information processing module includes a mail collection component and a mail processing component. The mail collection component implements mail reception, mail parsing, mail filtering, and mail caching. The mail processing component implements:
- SVM-based field classification of academic conference notice mails;
- rule-based extraction of useful information;
- storage of academic conference metadata and the original mails.
The mail collection component creates a socket listening on port 25 and handles incoming SMTP connection commands on a channel according to the protocol specification. If the recipient address identified by the RCPT TO command is not in the system's user list, the transmission is refused.

For an accepted transmission, the component obtains the mail content and decodes the message data according to the SMTP and MIME protocols. It extracts the mail header and body, and reads the recipient address, sender address, and subject from the header. It then filters the mail body (the message text) with keyword rules to pick out academic conference notice mails.

To avoid blocking mail reception, each screened mail is given a unique filename. The header data and message text are written to that file in the format [field name: field value] to form the mail summary. A mail-processing work queue is created in a Redis database, and adding the filename to this queue notifies the subsequent mail processing component.
The mail processing component polls the mail-processing work queue to obtain a file. It then applies an SVM model trained offline to perform multi-class field classification on the message text, which yields the conference field of the notice mail.

Regular-expression rules extract the useful information from the message text: conference title, conference number, start date, end date, submission deadline, and submission method. These fields are stored in a relational database as conference metadata, together with:
- the conference field;
- the conference source (the mail recipient address);
- the mail summary file path.

A conference-data preprocessing work queue is created in the Redis database, and the conference ID of each newly added conference is added to it.
The web information processing module includes a web collection component and a web page processing component. The web collection component implements target page update checking, target page information collection, and target page failure alerts. The web page processing component implements tag-based extraction of useful web page information and storage of academic conference metadata and web page summaries.
The web collection component runs a scheduled collection task at 0:00 every day. It reads the configuration file to obtain each target website's abbreviation, URL, and most recent update date; the target websites are sites that publish conference notices. It collects the conference notice pages published after the most recent update date, transfers them as HTML files, and updates the most recent update date in the configuration file. If collection fails, it obtains the error message, notifies the administrator by mail, and sets the status of that target website in the configuration file to unavailable.
The web page processing component extracts the useful information from HTML tags. Its extraction functions are written by studying the page structure of each target website. For each page, it selects the extraction function that matches the target website's abbreviation.

The extracted fields are the conference title, conference number, start date, end date, submission deadline, conference field, submission method, and notice text. The source website abbreviation, collection time, and notice text are saved as a web page summary file. The other useful conference information and the summary file path are stored in the relational database. Finally, the conference ID of each newly added conference is added to the conference-data preprocessing work queue.
The academic conference recommending module includes a data preprocessing component and a core recommendation component. The data preprocessing component implements user data preprocessing, conference information preprocessing, intermediate data storage, and expired-data checking and cleaning. The core recommendation component implements relevance computation, user configuration reading, recommendation result generation, and recommendation result caching.
The data preprocessing component polls the conference-data preprocessing queue and the user-data preprocessing queue. Before preprocessing, it checks for expired data. A conference whose start date is earlier than the current date is marked as a historical conference and is no longer a recommendation candidate. To save computation time, the intermediate data obtained after preprocessing is persisted to the local file system. The two preprocessing methods are implemented as follows:
For user data, the system builds an item-user inverted table and computes the preference similarity between related users, which this system measures with cosine similarity. Let N(u) be the set of conferences user u is interested in, and N(v) the corresponding set for user v. The preference similarity of users u and v is $w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$.
For conference data, the text of each conference record is segmented into words and stop words are removed. The text is then represented by combining TF-IDF with word vectors pre-trained on a large corpus, giving the document vector $\mathrm{vec}(D_i) = \sum_{t \in D_i} K(t, D_i)\, v_t$. Here $D_i$ denotes the i-th document, $K(t, D_i)$ the TF-IDF value of word $t$ in $D_i$, and $v_t$ the word vector of $t$. Once the document vectors are obtained, the similarity between documents is computed from their Euclidean distance.
The core recommendation component is scheduled by a timed task and runs the recommendation task at a fixed time, with the following steps:
a. Using the user preference similarity matrix, find the K users most similar to user u, denoted by the set S(u, K). Extract all conferences that the users in S(u, K) are interested in, and remove those user u is already interested in. For each candidate conference i, user u's degree of interest is $p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv}$, where N(i) is the set of users interested in conference i and $w_{uv}$ is the preference similarity of users u and v. Let M be the number of recommended conferences the user wishes to receive, as set in the user configuration. The 2M conferences with the largest p(u, i) form the candidate set.
b. Using the conferences' text representation vectors and their similarities, find the conferences in the candidate set J(u) from step a that are most similar to I(u), the set of conferences user u is interested in. For each candidate conference i ∈ J(u), user u's degree of interest is computed from $d_{ij}$, the distance between conference i and conference j. The M best candidates are the final recommendations for user u.
c. Create a cloud-drive archiving queue in Redis and add the recommendation results to it in the JSON format {user: [conference 1, conference 2, conference 3, ...]}.
The cloud drive archiving module integrates a third-party cloud storage service. This system uses DCampus WebLib, developed by Guangzhou Shuo Yuan Network Company and the Guangdong Provincial Key Laboratory of Computer Networks. The module polls the cloud-drive archiving queue. When it detects recommendation results, it does the following for each user in the results, in turn:
- looks up the username and password of the user's linked cloud drive;
- logs in to that account through the HTTP interface provided by WebLib;
- creates, under a specified directory, a new directory named after the current date;
- uploads the conference mail summaries or web page summaries in the user's recommendation list to that directory one by one.
The embodiment described above is only a preferred embodiment of the invention and is not intended to limit its scope. All changes made according to the shape and principle of the present invention shall fall within its scope of protection.

Claims (5)

1. An academic conference recommender system based on a hybrid recommendation algorithm, characterized in that:
- the system obtains academic conference notices in two ways: user mail filtering, and information collection from public conference-notice websites;
- the notices are processed by mail data preprocessing and HTML web page data extraction, respectively;
- the processed conference information is stored persistently on the server in a unified format;
- a scheduled task runs at an interval chosen according to server performance and data update speed, and computes the relevance between users and items;
- the relevance is computed with a hybrid recommendation algorithm that fuses user-based collaborative filtering with content-based recommendation, the content-based part being based on a text representation combining TF-IDF with word vectors;
- conferences are recommended to users according to the relevance, and the results are pushed to users in two ways, web page display and archiving to a cloud drive.

The system specifically includes the following modules:
An e-mail information processing module, configured to receive e-mail data, decode the mail data into mail attributes, screen out academic conference notification e-mails, classify them by academic conference domain with an SVM according to the message body, perform rule-based extraction of effective information from the screened message bodies, and store the processed academic conference information metadata and mail attributes;
A web information processing module, configured to designate target web pages according to the system configuration, check the status of the target pages in real time, collect the academic conference notification resources newly published on them, record failed pages (pages that cannot be connected to or whose structure has changed), extract the effective information from the pages with tag-based and similar methods, and store the processed academic conference information metadata and web-page summaries;
An academic conference recommendation module, configured to preprocess the user data and the conference data stored by the e-mail information processing module and the web information processing module. At a specified time, a timed task obtains the user-item relevance using the hybrid recommendation algorithm that fuses user-based collaborative filtering with content-based recommendation. The module then generates and caches recommendation results according to this relevance and the user configuration;
A cloud-drive archiving module, configured to obtain the recommendation results of the academic conference recommendation module, look up in turn the cloud drive associated with each user contained in the results, and archive the summaries of the academic conference notification e-mails or the web-page text recommended to that user into the user's cloud drive;
A conference information display and configuration management module, configured to let users manage their personal information and the system settings related to their configuration, display the relevant academic conference summaries, give feedback on recommendation results, and subscribe to conference websites.
2. The academic conference recommendation system based on a hybrid recommendation algorithm according to claim 1, characterized in that: the e-mail information processing module comprises a mail acquisition component and a mail processing component. The mail acquisition component performs mail reception, mail parsing, mail filtering and mail caching. The mail processing component performs SVM-based domain classification of academic conference notification e-mails, rule-based extraction of effective information, and storage of the academic conference information metadata and the original mails; wherein:
The mail acquisition component creates a socket listening on port 25, receives SMTP connection commands over this channel, and processes them according to the protocol specification. If the recipient address identified by the RCPT TO command is not in the system's user list, the transfer is refused. For an accepted transfer, the component obtains the mail content and decodes the e-mail message data according to the SMTP and MIME protocols. It extracts the mail header and mail body, and obtains the recipient address, sender address and subject from the header. It then filters the mail body (the message text) with a keyword-rule-based method to select academic conference notification e-mails. To avoid blocking mail reception, each screened mail is given a unique file name, and the header data and message body are written to a file in the format [field name: field value] to form the mail attributes. A mail processing work queue is created in the Redis database, and the file name is added to this queue to notify the subsequent mail processing component;
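The acceptance and filtering steps can be sketched with Python's standard `email` package. This is a minimal sketch, and several parts are assumptions:

- The SMTP listener on port 25 is not shown.
- The keyword list `CONFERENCE_KEYWORDS`, the helper names and the exact `[field: value]` layout are illustrative assumptions.
- A `collections.deque` stands in for the Redis work queue. With redis-py, the file name would be pushed with `rpush` instead.

```python
import uuid
from collections import deque
from email import message_from_string
from email.message import Message
from pathlib import Path
from typing import Optional, Set

# Hypothetical keyword rules; the actual rule set is not given in the source.
CONFERENCE_KEYWORDS = ("call for papers", "cfp", "submission deadline", "conference")


def accept_recipient(rcpt_to: str, registered: Set[str]) -> bool:
    """RCPT TO check: refuse the transfer unless the address belongs to a system user."""
    return rcpt_to.strip("<> ").lower() in registered


def body_text(msg: Message) -> str:
    """Decode and join the text/plain parts of a (possibly multipart) MIME message."""
    chunks = []
    for part in msg.walk():  # walk() also yields a non-multipart message itself
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def is_conference_notice(text: str) -> bool:
    """Keyword-rule filter: keep mails whose body mentions any keyword."""
    lowered = text.lower()
    return any(k in lowered for k in CONFERENCE_KEYWORDS)


def store_mail(raw: str, spool: Path, queue: deque) -> Optional[str]:
    """Write a screened mail as '[field: value]' lines and enqueue its file name."""
    msg = message_from_string(raw)
    text = body_text(msg)
    if not is_conference_notice(text):
        return None
    name = f"{uuid.uuid4().hex}.mail"  # unique file name
    fields = {"to": msg.get("To", ""), "from": msg.get("From", ""),
              "subject": msg.get("Subject", ""), "body": text}
    lines = [f"[{k}: {v}]" for k, v in fields.items()]
    (spool / name).write_text("\n".join(lines), encoding="utf-8")
    queue.append(name)  # notify the mail processing component
    return name
```

Writing the mail to disk and enqueuing only its file name keeps the receiving path short. Slower steps such as classification and extraction then run in the separate processing component.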
The mail processing component polls the mail processing work queue to obtain a file to be processed. It then applies an SVM model trained offline to perform multi-class domain classification of the message body, which yields the conference domain of the academic conference notification e-mail. It uses regular-expression rules to extract the effective information from the message body: conference title, conference session number, conference start date, conference end date, submission deadline and submission method. This information is stored in a relational database as conference information metadata, together with the conference domain, the conference source (i.e., the mail recipient address) and the address of the mail summary file. A conference data preprocessing work queue is created in the Redis database, and the conference ID of each newly added conference is added to it.
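Rule-based extraction can be illustrated with a few regular expressions. The patterns and field names below are hypothetical; real notices vary widely, and the source does not publish its rules. The offline-trained SVM domain classifier is not reproduced. The sketch only shows that each field is matched independently and that fields with no match stay empty.

```python
import re
from datetime import date
from typing import Dict, Optional

# Hypothetical patterns for a few of the fields named in the claim.
DATE = r"(\d{4})-(\d{1,2})-(\d{1,2})"
PATTERNS = {
    "title": re.compile(r"^\s*(?:Conference|Title)\s*:\s*(.+)$", re.I | re.M),
    "start_date": re.compile(r"start\s*(?:date)?\s*:\s*" + DATE, re.I),
    "submission_deadline": re.compile(r"(?:submission\s+)?deadline\s*:\s*" + DATE, re.I),
    "submission_method": re.compile(r"submit\s+via\s+(\w+)", re.I),
}
DATE_FIELDS = {"start_date", "submission_deadline"}


def _to_date(m: re.Match) -> date:
    """Build a date from the last three captured groups (year, month, day)."""
    year, month, day = (int(g) for g in m.groups()[-3:])
    return date(year, month, day)


def extract_fields(body: str) -> Dict[str, Optional[object]]:
    """Apply each regex rule to the message body; unmatched fields are None."""
    result: Dict[str, Optional[object]] = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(body)
        if m is None:
            result[name] = None
        elif name in DATE_FIELDS:
            result[name] = _to_date(m)
        else:
            result[name] = m.group(1).strip()
    return result
```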
3. The academic conference recommendation system based on a hybrid recommendation algorithm according to claim 1, characterized in that: the web information processing module comprises a web page collection component and a web page processing component. The web page collection component performs update checks on the target web pages, collects information from them, and raises alarms when they fail. The web page processing component performs tag-based extraction of effective information from web pages and stores the academic conference information metadata and web-page summaries; wherein:
The web page collection component runs a collection task at 00:00 every day. It reads the configuration file to obtain each target website's abbreviation, URL and most recent update date; a target website is one that publishes conference notifications. It collects the web resources for conference notifications published on the site after the most recent update date, transfers them as HTML files, and updates the most recent update date in the configuration file. If collection fails, the component obtains the error message, notifies the administrator by e-mail, and sets the status of the corresponding target website in the configuration file to unavailable;
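The daily collection task reduces to a loop over the configured sites. In this sketch the configuration is a list of dicts. `fetch_since` and `notify_admin` are hypothetical callables: one downloads the notices published after the last update date, the other e-mails the administrator. Running the task at 00:00 would be handled externally, for example by cron. The code also assumes that sites already marked unavailable are skipped; the source does not say.

```python
from datetime import date
from typing import Callable, Dict, List

SiteConfig = Dict[str, object]  # keys: abbr, url, last_update, status


def run_collection(
    sites: List[SiteConfig],
    fetch_since: Callable[[str, date], List[str]],  # hypothetical HTTP fetcher
    notify_admin: Callable[[str], None],            # hypothetical alert mailer
    today: date,
) -> Dict[str, List[str]]:
    """Collect new notice pages per site; mark failing sites unavailable."""
    collected: Dict[str, List[str]] = {}
    for site in sites:
        if site.get("status") == "unavailable":
            continue  # assumption: failed sites wait for an administrator
        try:
            pages = fetch_since(site["url"], site["last_update"])
        except Exception as exc:  # connection failure, changed structure, ...
            notify_admin(f"{site['abbr']}: {exc}")
            site["status"] = "unavailable"
            continue
        collected[site["abbr"]] = pages  # HTML for the processing component
        site["last_update"] = today      # advance the recent-update date
    return collected
```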
Based on the page-structure design of each target website, the web page processing component implements HTML-tag-based functions for acquiring effective information. According to the target website's abbreviation, it selects the corresponding extraction function to process the web resources above. This yields the conference title, session number, start date, end date, submission deadline, conference domain, submission method and notification text. The source website abbreviation, collection time and notification text are saved as a web-page summary file. The effective conference information and the address of the web-page summary file are stored in the relational database, and the conference ID of each newly added conference is added to the conference data preprocessing work queue.
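Per-site, tag-based extraction can be expressed as a dispatch table keyed by site abbreviation. The sketch uses the standard-library `html.parser`. The site abbreviation `"EXAMPLE"` is invented for illustration, as is the assumption that its notices put the title in `<h1>` and the body in `<div class="notice">`.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List


class TagTextCollector(HTMLParser):
    """Collect text inside a given tag, optionally restricted to one CSS class."""

    def __init__(self, tag: str, css_class: str = ""):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.depth = 0  # > 0 while inside a matching element
        self.texts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1  # nested element of the same tag
        elif tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if not self.css_class or self.css_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.texts.append(data.strip())


def text_of(html: str, tag: str, css_class: str = "") -> str:
    parser = TagTextCollector(tag, css_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.texts)


def extract_example_site(html: str) -> Dict[str, str]:
    # Hypothetical page layout for a site abbreviated "EXAMPLE".
    return {"title": text_of(html, "h1"),
            "notice_text": text_of(html, "div", "notice")}


EXTRACTORS: Dict[str, Callable[[str], Dict[str, str]]] = {
    "EXAMPLE": extract_example_site,
}


def extract(site_abbr: str, html: str) -> Dict[str, str]:
    """Choose the extraction function by the target website's abbreviation."""
    return EXTRACTORS[site_abbr](html)
```

Keeping one small function per site means a page redesign breaks only that site's entry. This fits the claim's practice of recording failed pages whose structure has changed.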
4. The academic conference recommendation system based on a hybrid recommendation algorithm according to claim 1, characterized in that: the academic conference recommendation module comprises a data preprocessing component and a core recommendation component. The data preprocessing component performs user data preprocessing, conference information preprocessing, intermediate data storage, and expired-data checking and cleaning. The core recommendation component performs relevance computation, user configuration reading, recommendation result generation and recommendation result caching; wherein:
The data preprocessing component polls the conference data preprocessing queue and the user data preprocessing queue. Before preprocessing, it checks for expired data: conferences whose start date is earlier than the current date are set to historical status and are no longer used as recommendation candidates. It implements the following two data preprocessing methods:
Preprocessing of user data: an item-user inverted table is built from the records saved by the system, and the preference similarity between related users is computed, measured by cosine similarity. Let N(u) be the set of conferences in which user u is interested and N(v) the set in which user v is interested. The preference similarity of users u and v is then

$$w_{uv} = \frac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$$
Preprocessing of conference data: the text of the conference data is segmented into words and stop words are removed. The text is then represented by combining TF-IDF with word vectors pre-trained on a large corpus, the document vector being

$$\vec{D_i} = \sum_{t \in D_i} K(t, D_i)\,\vec{v_t}$$

where $D_i$ denotes the i-th document, $K(t, D_i)$ the TF-IDF value of word t in $D_i$, and $\vec{v_t}$ the word vector of word t. Once the document vectors are obtained, the similarity between documents is computed from their Euclidean distance;
To save computation time, the intermediate data obtained from preprocessing is persisted in the local file system;
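The expired-data check and the two preprocessing methods can be sketched as follows. Word segmentation and the pre-trained word vectors are assumed to be given; a toy dictionary stands in for the vectors. TF-IDF is computed with one common variant, tf × log(N / df), because the source does not say which variant it uses. The user-similarity function builds the item-user inverted table so that only users sharing at least one conference are compared.

```python
import math
from collections import Counter, defaultdict
from datetime import date
from itertools import combinations
from typing import Dict, List, Set


def mark_history(start_dates: Dict[str, date], today: date) -> Set[str]:
    """Conferences that have already started are historical and not recommended."""
    return {cid for cid, start in start_dates.items() if start < today}


def user_similarity(interests: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """w_uv = |N(u) & N(v)| / sqrt(|N(u)| |N(v)|), computed via an item-user inverted table."""
    item_users: Dict[str, Set[str]] = defaultdict(set)
    for user, items in interests.items():
        for item in items:
            item_users[item].add(user)
    overlap: Dict[str, Counter] = defaultdict(Counter)  # |N(u) & N(v)| counts
    for users in item_users.values():
        for u, v in combinations(sorted(users), 2):
            overlap[u][v] += 1
            overlap[v][u] += 1
    return {u: {v: c / math.sqrt(len(interests[u]) * len(interests[v]))
                for v, c in row.items()}
            for u, row in overlap.items()}


def doc_vectors(docs: Dict[str, List[str]],
                wv: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """D_i = sum over words t of TF-IDF(t, D_i) * v_t; words without a vector are skipped."""
    n_docs = len(docs)
    df = Counter(t for words in docs.values() for t in set(words))
    dim = len(next(iter(wv.values())))
    vectors = {}
    for doc_id, words in docs.items():
        vec = [0.0] * dim
        for t, count in Counter(words).items():
            if t not in wv:
                continue
            weight = (count / len(words)) * math.log(n_docs / df[t])
            vec = [a + weight * b for a, b in zip(vec, wv[t])]
        vectors[doc_id] = vec
    return vectors


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

In the toy case below, "conference" occurs in every document, so its IDF is log(1) = 0 and it adds nothing to either vector. This is how the TF-IDF weighting suppresses generic words before the word vectors are summed.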
The core recommendation component is scheduled by a timed task and executes the recommendation task at a fixed time, with the following steps:
A. From the user preference similarity matrix, find the K users most similar to user u, denoted by the set S(u, K). Extract all conferences in which the users in S(u, K) are interested, and remove those in which u is already interested. For each candidate conference i, user u's degree of interest in it is computed as

$$p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv}$$

where N(i) denotes the set of users interested in conference i and $w_{uv}$ the preference similarity of users u and v. Let M be the number of recommended conferences the user wishes to receive, as set in the user configuration. The 2M conferences with the largest p(u, i) form the candidate conference set J(u);
B. Using the text representation vectors of the conferences and their similarities, find the conferences in the candidate set J(u) obtained in step A that are most similar to the set I(u) of conferences in which user u is interested. For each candidate conference $i \in J(u)$, user u's degree of interest in it is computed from the distances $d_{ij}$, where $d_{ij}$ denotes the distance between conference i and a conference $j \in I(u)$. Selecting M of the candidates gives the M conferences finally recommended to user u;
C. Create a cloud-drive archiving queue in Redis and add the resulting recommendations to it in the JSON format {user: [conference 1, conference 2, conference 3, ...]}.
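Steps A to C can be sketched as below. The inputs are a user-similarity dictionary and each user's set of interesting conferences. Step A follows the formula in the claim. The exact scoring formula for step B is not recoverable from this text. The sketch therefore assumes a hypothetical score: the negative of the smallest distance $d_{ij}$ to any conference j in I(u), so closer is better. Any score that decreases with distance could be swapped in. A `deque` stands in for the Redis queue; with redis-py, the JSON string would be pushed with `rpush`.

```python
import heapq
import json
from collections import deque
from typing import Callable, Dict, List, Set


def usercf_candidates(u: str, K: int, M: int,
                      sims: Dict[str, Dict[str, float]],
                      interests: Dict[str, Set[str]]) -> List[str]:
    """Step A: p(u, i) = sum of w_uv over the K most similar users v interested in i; keep top 2M."""
    neighbours = heapq.nlargest(K, sims.get(u, {}).items(), key=lambda kv: kv[1])
    scores: Dict[str, float] = {}
    for v, w_uv in neighbours:
        for i in interests[v] - interests[u]:  # drop conferences u already likes
            scores[i] = scores.get(i, 0.0) + w_uv
    return heapq.nlargest(2 * M, scores, key=scores.get)


def content_rerank(u: str, candidates: List[str], M: int,
                   interests: Dict[str, Set[str]],
                   distance: Callable[[str, str], float]) -> List[str]:
    """Step B (assumed score): rank candidates by closeness to the nearest conference in I(u)."""
    liked = interests[u]
    if not liked:
        return candidates[:M]
    score = {i: -min(distance(i, j) for j in liked) for i in candidates}
    return sorted(candidates, key=lambda i: score[i], reverse=True)[:M]


def enqueue_result(queue: deque, user: str, conferences: List[str]) -> str:
    """Step C: push {user: [conference, ...]} as JSON onto the archiving queue."""
    payload = json.dumps({user: conferences})
    queue.append(payload)
    return payload
```

Keeping 2M candidates in step A and letting step B choose M makes this a cascade hybrid. Collaborative filtering decides which conferences are eligible, and content similarity decides their final order.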
5. The academic conference recommendation system based on a hybrid recommendation algorithm according to claim 1, characterized in that: the cloud-drive archiving module must integrate a third-party cloud-drive storage service, and the DCampus WebLib cloud-drive system is used as that service. The module polls the cloud-drive archiving queue. Whenever it detects a recommendation result, it processes each user in the result in turn. It looks up the user name and password of the associated cloud drive and logs in to the corresponding account through the HTTP interface provided by the third-party cloud service WebLib. It then creates, under a specified directory, a new directory named with the current date, and uploads the conference e-mail summaries or web-page summaries in the user's recommendation list to that directory one by one.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910042396.9A CN109933717B (en) 2019-01-17 2019-01-17 Academic conference recommendation system based on hybrid recommendation algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910042396.9A CN109933717B (en) 2019-01-17 2019-01-17 Academic conference recommendation system based on hybrid recommendation algorithm

Publications (2)

Publication Number Publication Date
CN109933717A true CN109933717A (en) 2019-06-25
CN109933717B CN109933717B (en) 2021-05-14

Family

ID=66985105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910042396.9A Active CN109933717B (en) 2019-01-17 2019-01-17 Academic conference recommendation system based on hybrid recommendation algorithm

Country Status (1)

Country Link
CN (1) CN109933717B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080033781A1 (en) * 2006-07-18 2008-02-07 Jonah Holmes Peretti System and method for online product promotion
CN101755283A (en) * 2007-07-24 2010-06-23 三星电子株式会社 Method and apparatus for recommending information using hybrid algorithm
CN103049575A (en) * 2013-01-05 2013-04-17 华中科技大学 Topic-adaptive academic conference searching system
CN104572874A (en) * 2014-12-19 2015-04-29 北京锐安科技有限公司 Webpage information extraction method and device
CN105787068A (en) * 2016-03-01 2016-07-20 上海交通大学 Academic recommendation method and system based on citation network and user proficiency analysis
CN106610970A (en) * 2015-10-21 2017-05-03 上海文广互动电视有限公司 Collaborative filtering-based content recommendation system and method

Non-Patent Citations (2)

Title
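徐傲雪 (Xu Aoxue) et al.: "Design of a Processing System for Academic Notification E-mails", China Education Network *
胡迎松 (Hu Yingsong) et al.: "Hybrid Recommendation Technique Based on Content and Collaborative Filtering", Proceedings of the 2nd National Conference on Web Information Systems and Applications (WISA 2005) *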

Cited By (8)

Publication number Priority date Publication date Assignee Title
CN111796830A (en) * 2020-06-08 2020-10-20 成都数之联科技有限公司 Protocol analysis processing method, device, equipment and medium
CN111796830B (en) * 2020-06-08 2023-09-19 成都数之联科技股份有限公司 Protocol analysis processing method, device, equipment and medium
CN112687272A (en) * 2020-12-18 2021-04-20 北京金山云网络技术有限公司 Conference summary recording method and device and electronic equipment
CN112687272B (en) * 2020-12-18 2023-03-21 北京金山云网络技术有限公司 Conference summary recording method and device and electronic equipment
CN113077235A (en) * 2021-04-12 2021-07-06 上海明略人工智能(集团)有限公司 Conference schedule conflict management method and system, electronic equipment and storage medium
CN113077235B (en) * 2021-04-12 2024-03-22 上海明略人工智能(集团)有限公司 Conference schedule conflict management method, system, electronic equipment and storage medium
CN113127633A (en) * 2021-06-17 2021-07-16 平安科技(深圳)有限公司 Intelligent conference management method and device, computer equipment and storage medium
CN113420058A (en) * 2021-07-01 2021-09-21 宁波大学 Conversational academic conference recommendation method based on combination of user historical behaviors

Also Published As

Publication number Publication date
CN109933717B (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN109933717A (en) Academic conference recommendation system based on hybrid recommendation algorithm
US8601004B1 (en) System and method for targeting information items based on popularities of the information items
US7949714B1 (en) System and method for targeting advertisements or other information using user geographical information
US20120233209A1 (en) Enterprise search over private and public data
US9002725B1 (en) System and method for targeting information based on message content
Vosecky et al. Searching for quality microblog posts: Filtering and ranking based on content analysis and implicit links
CN112486917A (en) Method and system for automatically generating information-rich content from multiple microblogs
Milanesi et al. How do you depict sustainability? An analysis of images posted on Instagram by sustainable fashion companies
WO2012137215A1 (en) A system and method for communication
US9697527B2 (en) Centralized social network response tracking
US10516643B2 (en) Client side social network response tracking
Abel et al. Linkage, aggregation, alignment and enrichment of public user profiles with Mypes
Wang et al. Spade: a social-spam analytics and detection framework
CN102844757A (en) Company network
Joly et al. Contextual recommendation of social updates, a tag-based framework
Arif et al. Social network extraction: a review of automatic techniques
JP2009510598A (en) Communication and collaboration system
Liu et al. Research on the application of SNS in university libraries: A case study of microblogs in Chinese “211 project” universities
Steele et al. Putting the public into public health information dissemination: social media and health-related web pages
Xianlei et al. Finding domain experts in microblogs
US20180101615A1 (en) Systems, methods and techniques for customizable domain-based searching
Lutu Web 2.0 computing and social media as solution enablers for economic development in Africa
US20160028659A1 (en) System and Method for Targeting Advertisements or Other Information Based on Recently Sent Message or Messages
Yang et al. Micro-blog friend recommendation algorithms based on content and social relationship
Feng et al. Negative Examples Sampling Based on Factorization Machines for OCCF

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant