CN109933717A - An academic conference recommender system based on a hybrid recommendation algorithm
- Publication number
- CN109933717A (application number CN201910042396.9A)
- Authority
- CN
- China
- Prior art keywords
- meeting
- user
- information
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses an academic conference recommender system based on a hybrid recommendation algorithm. The system obtains academic conference information from users' personal mail and from public conference announcement websites, processes the raw information through filtering and effective-information extraction to generate conference summaries, and delivers personalized recommendations of academic conference information. Recommendations are computed from the users' historical behavior and from a conference text representation that combines TF-IDF with word vectors, using a hybrid algorithm that fuses user-based collaborative filtering with content-based recommendation. Results are pushed in two ways: web-based display and archiving to a network disk. The system helps researchers process information more efficiently and mitigates the choice overload caused by the large number of academic conferences.
Description
Technical field
The present invention relates to the technical field of computer networks, and in particular to an academic conference recommender system based on a hybrid recommendation algorithm.
Background
Nowadays, users can access a large number of network applications through various devices and services. As mobile platforms become more capable, users can reach the Internet resources they are interested in anytime and anywhere, and the amount of information on the Internet keeps growing. Recommender systems have become an effective strategy for overcoming this kind of information overload. They can effectively alleviate problems hidden beneath massive network resources, such as choice overload, and their practical value should not be underestimated, so they are widely used in many network applications. However, the diversity of users and resources means that a recommendation engine built on a single model often performs unsatisfactorily, so research on hybrid recommendation algorithms is of great significance.
Information overload also exists in universities and research institutions, where academic conference information receives particular attention. Teachers, students, and researchers pay close attention to this information and encounter it often. Its sources include the research mail that researchers receive, websites that publish academic conference notices, and resources shared within a researcher's own institution. The number of academic conferences has kept growing in recent years, and their quality varies widely, so several problems are common in practice. First, spam, subscription mail, and advertisements are rampant, and researchers spend considerable time and energy picking out academic conference notices when handling mail. Second, university staff encounter a great deal of academic conference information, most of which does not match their research field or research level. Third, conference notices often contain unimportant parts, so time must be spent locating the useful information. Personalized pushing of academic conference information helps researchers process information more efficiently, stimulates academic enthusiasm, and promotes academic work, and is therefore well worth studying.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by proposing an academic conference recommender system based on a hybrid recommendation algorithm. The system obtains academic conference information from users' personal mail and public conference announcement websites, delivers personalized recommendations of that information according to the users' historical behavior and the intrinsic features of the conferences, and pushes the information in two ways: web-based display and archiving to a network disk.
To achieve the above purpose, the technical solution provided by the present invention is an academic conference recommender system based on a hybrid recommendation algorithm. The system obtains academic conference notices in two ways, by filtering user mail and by collecting information from public conference notice websites, and applies mail data preprocessing and HTML web page data extraction to them respectively. After processing, conference information in a unified format is generated and persisted on the server. A timed task is scheduled at an interval chosen according to server performance and the data update rate. The timed task computes the relevance between users and items with a hybrid recommendation algorithm that fuses user-based collaborative filtering with content-based recommendation, where the content-based algorithm relies on a text representation combining TF-IDF with word vectors. Conferences are recommended to users according to this relevance, and the results are pushed to users by web page display and by archiving to a network disk. The system specifically comprises the following modules:
A mail information processing module, used to receive e-mail data, decode the mail data to form a mail summary, screen out academic conference notice mails, classify their academic conference domain with an SVM according to the mail body, perform rule-based effective-information extraction on the bodies of mails that pass the screening, and store the processed academic conference metadata and mail summaries;
A web page information processing module, used to designate target web pages according to the system configuration, check the state of the target web pages in real time, collect the academic conference notice resources newly published on them, record failed web pages that cannot be reached or whose structure has changed, extract the effective information in the web pages with a tag-based method, and store the processed academic conference metadata and web page summaries;
An academic conference recommendation module, used to preprocess the user data and the conference data stored by the mail information processing module and the web page information processing module, obtain the user-item relevance at specified times through a timed task using the hybrid recommendation algorithm that fuses user-based collaborative filtering with content-based recommendation, and generate and cache recommendation results according to the relevance and the user configuration;
A network disk archiving module, used to obtain the recommendation results of the academic conference recommendation module, check in turn the associated network disk of each user contained in the results, and archive the recommended academic conference notice mail summaries or web page text to that user's network disk;
A conference information display and configuration management module, used to let users manage their personal information and configure system settings, display relevant academic conference summaries, give feedback on recommendation results, and subscribe to conference websites.
Further, the mail information processing module comprises a mail acquisition component and a mail processing component. The mail acquisition component implements mail reception, mail parsing, mail filtering, and mail caching. The mail processing component implements SVM-based domain classification of academic conference notice mails, rule-based effective-information extraction, and storage of the academic conference metadata and the original mails. Specifically:
The mail acquisition component creates a socket listening on port 25 and implements a channel that receives the commands of an SMTP connection and handles them according to the protocol specification. If the recipient address given in the RCPT TO command is not in the system's user list, the transmission is rejected. For an accepted transmission, the component obtains the mail content, decodes the message data according to the SMTP and MIME protocols, extracts the mail header and mail body, and reads the recipient address, sender address, and subject from the header. It then filters the mail body with a keyword-rule method to screen out academic conference notice mails. To avoid blocking mail reception, a unique file name is generated for each mail that passes the screening, and the header data and body are written to that file in the format [field name: field] to form the mail summary. A mail processing work queue is created in a Redis database, and the file name is added to it to notify the downstream mail processing component;
After the mail processing component obtains a file to process by polling the mail processing work queue, it applies an SVM model trained offline to perform multi-class domain classification on the mail body, yielding the conference domain of the notice mail. It extracts the effective information in the body with regular-expression rules, including the conference title, conference number, conference start time, conference end time, submission deadline, and submission method. These fields, together with the conference domain, the conference source (the mail recipient address), and the mail summary file path, are stored in a relational database as conference metadata. A conference data preprocessing work queue is created in the Redis database, and the conference ID of each newly added conference is added to it.
Further, the web page information processing module comprises a web page acquisition component and a web page processing component. The web page acquisition component implements update checking of target web pages, collection of target web page information, and failure alarms for target web pages. The web page processing component implements tag-based extraction of effective web page information and storage of the academic conference metadata and web page summaries. Specifically:
The web page acquisition component runs its collection task at 00:00 every day. It reads a configuration file to obtain the abbreviation, URL, and most recent update date of each target website, a target website being one that publishes conference notices. It collects the notice pages published on the site after the most recent update date, passes them on as HTML files, and updates the most recent update date in the configuration file. If collection fails, it obtains the error message, notifies the administrator by mail, and marks the corresponding target website as unavailable in the configuration file;
The web page processing component implements HTML-tag-based information extraction functions designed by inspecting the page structure of each target website. It selects the extraction function that matches the target website's abbreviation to process the collected pages, obtaining the conference title, conference number, conference start time, conference end time, submission deadline, conference domain, submission method, and the notice text. The source website abbreviation, the collection time, and the notice text are saved as a web page summary file. The effective conference information and the web page summary file path are stored in the relational database, and the conference ID of each newly added conference is added to the conference data preprocessing work queue.
Further, the academic conference recommendation module comprises a data preprocessing component and a core recommendation component. The data preprocessing component implements user data preprocessing, conference information preprocessing, intermediate data storage, and checking and cleaning of stale data. The core recommendation component implements relevance computation, reading of user configuration, generation of recommendation results, and caching of recommendation results. Specifically:
The data preprocessing component polls the conference data preprocessing queue and the user data preprocessing queue. Before preprocessing, it checks for stale data: a conference whose start time has already passed relative to the current date is marked as a historical conference and is no longer a recommendation candidate. The two preprocessing methods are as follows:
For user data, the component builds the item-user inverted table kept by the system and computes the preference similarity of related users, measured with cosine similarity. Let N(u) be the set of conferences user u is interested in and N(v) the set user v is interested in; the preference similarity of user u and user v is then w_uv = |N(u) ∩ N(v)| / sqrt(|N(u)| · |N(v)|);
For conference data, the text of each conference is segmented into words and stop words are removed. The text is represented by combining TF-IDF with word vectors pre-trained on a large corpus: the document vector of D_i is built from K(t, D_i) and v_t, where D_i denotes the i-th document, K(t, D_i) the TF-IDF value of word t in D_i, and v_t the word vector of word t. Once the document vectors are obtained, the similarity between documents is computed with the Euclidean distance;
To save computation time, the intermediate data obtained from preprocessing is persisted in the local file system;
The core recommendation component is scheduled by a timed task and runs the recommendation task at a fixed time. The specific steps are as follows:
a. According to the user preference similarity matrix, find the K users most similar to user u, denoted by the set S(u, K). Extract all conferences that the users in S(u, K) are interested in and remove those that u is already interested in. For each candidate conference i, compute the degree of interest p(u, i) of user u in it, where N(i) denotes the set of users interested in conference i and w_uv denotes the preference similarity of user u and user v. Given the number M of recommendations the user has configured to receive, select the 2*M conferences with the largest p(u, i) as the candidate set;
b. According to the text representation vectors of the conferences and their similarities, find in the candidate set J(u) obtained in step a the conferences most similar to the set I(u) of conferences user u is interested in. For each candidate conference i ∈ J(u), compute the degree of interest of user u in it, where d_ij denotes the distance between conference i and conference j. After M candidate conferences are screened out, the final M conferences recommended to user u are obtained;
c. Create a network disk archiving queue in Redis and add the recommendation results to it in the JSON format {user: [conference 1, conference 2, conference 3, ...]}.
Further, the network disk archiving module requires integration with a third-party network disk storage service and uses the DCampus WebLib cloud disk system for this purpose. While polling the network disk archiving queue, when the module detects recommendation results it looks up, for each user in the results in turn, the user name and password of the associated network disk, logs in to the corresponding account through the HTTP interface provided by the third-party cloud service WebLib, creates under a specified directory a new directory named after the current date, and uploads the conference mail summaries or web page summaries in the user's recommendation list to that directory one by one.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
1. By aggregating a user's multiple mailboxes and the information of multiple public academic conference websites, the system ensures that its information sources are rich and reliable.
2. Effective-information extraction presents the key information about a conference more intuitively.
3. The hybrid recommendation algorithm filters out spurious recommendations in the user-based collaborative filtering results that do not match the user's preferences.
4. Representing text by combining TF-IDF with word vectors captures key feature information while retaining the semantic information of the text.
5. Web-based display and archiving to the user's associated network disk let users check recommendation results conveniently and keep the resources stored.
Brief description of the drawings
Fig. 1 is the architecture diagram of the academic conference recommender system based on a hybrid recommendation algorithm.
Detailed description of the embodiments
The present invention is further described below with reference to a specific embodiment.
The academic conference recommender system based on a hybrid recommendation algorithm provided by this embodiment is developed mainly in Python and runs on the CentOS operating system. The cloud disk it supports for mail archiving is the DCampus WebLib cloud disk system developed by Guangzhou Shuo Yuan network company and the Guangdong Provincial Key Laboratory of Computer Network. This is an enterprise-grade cloud disk service that has been released in several editions, such as a standard edition, a research institution edition, and a medical institution edition, to meet the information management needs of different organizations. It supports file management, user management, full-library search, and file cabinet management, as well as access from multiple kinds of terminals. As shown in Fig. 1, the academic conference recommender system is implemented with internal modules and the HTTP interface of the third-party cloud service WebLib, and comprises:
A mail information processing module, used to receive e-mail data, decode the mail data to form a mail summary, screen out academic conference notice mails and classify their academic conference domain with an SVM, perform rule-based effective-information extraction on the bodies of mails that pass the screening, and store the processed academic conference metadata and mail summaries.
A web page information processing module, used to designate target web pages according to the system configuration, check the state of the target web pages in real time, collect the academic conference notice resources newly published on them, record failed web pages that cannot be reached or whose structure has changed, extract the effective information in the web pages with a tag-based method, and store the processed academic conference metadata and web page summaries.
An academic conference recommendation module, used to preprocess the user data and the conference data stored by the above mail information processing module and web page information processing module, obtain the user-item relevance at specified times through a timed task using the hybrid recommendation algorithm that fuses user-based collaborative filtering with content-based recommendation, and generate and cache recommendation results according to the relevance and the user configuration.
A network disk archiving module, used to obtain the recommendation results of the above academic conference recommendation module, check in turn the associated network disk of each user contained in the results, and archive the recommended academic conference notice mail summaries or web page text to that user's network disk.
A conference information display and configuration management module, used to let users manage their personal information and configure system settings, display relevant academic conference summaries, give feedback on recommendation results, and subscribe to conference websites.
The mail information processing module comprises a mail acquisition component and a mail processing component. The mail acquisition component implements mail reception, mail parsing, mail filtering, and mail caching. The mail processing component implements SVM-based domain classification of academic conference notice mails, rule-based effective-information extraction, and storage of the academic conference metadata and the original mails.
The mail acquisition component creates a socket listening on port 25 and implements a channel that receives the commands of an SMTP connection and handles them according to the protocol specification. If the recipient address given in the RCPT TO command is not in the system's user list, the transmission is rejected. For an accepted transmission, the component obtains the mail content, decodes the message data according to the SMTP and MIME protocols, extracts the mail header and mail body, and reads the recipient address, sender address, and subject from the header. It then filters the mail body with a keyword-rule method to screen out academic conference notice mails. To avoid blocking mail reception, a unique file name is generated for each mail that passes the screening, and the header data and body are written to that file in the format [field name: field] to form the mail summary. A mail processing work queue is created in a Redis database, and the file name is added to it to notify the downstream mail processing component.
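The decoding and keyword screening performed by the mail acquisition component can be sketched with Python's standard `email` package. This is a minimal illustration under stated assumptions, not the patented implementation: the keyword list, the `min_hits` threshold, and the function names `parse_mail`, `is_conference_notice`, and `to_summary` are hypothetical, and the SMTP listener and the Redis queue are left out.

```python
import email
from email import policy

# Hypothetical keyword rules; the source does not list the actual keywords.
CONFERENCE_KEYWORDS = ("call for papers", "cfp", "conference", "submission deadline")


def parse_mail(raw_bytes):
    """Decode a raw MIME message into the header fields and body used downstream."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "to": str(msg["To"] or ""),
        "from": str(msg["From"] or ""),
        "subject": str(msg["Subject"] or ""),
        "body": body,
    }


def is_conference_notice(mail, min_hits=2):
    """Keyword-rule filter: keep a mail when enough distinct keywords occur."""
    text = (mail["subject"] + "\n" + mail["body"]).lower()
    hits = sum(1 for keyword in CONFERENCE_KEYWORDS if keyword in text)
    return hits >= min_hits


def to_summary(mail):
    """Serialize header fields in the '[field name: field]' layout, then the body."""
    lines = ["[{}: {}]".format(name, mail[name]) for name in ("to", "from", "subject")]
    return "\n".join(lines) + "\n\n" + mail["body"]
```

In a full pipeline the string returned by `to_summary` would be written to a uniquely named file, and that file name pushed onto the work queue.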
After the mail processing component obtains a file to process by polling the mail processing work queue, it applies an SVM model trained offline to perform multi-class domain classification on the mail body, yielding the conference domain of the notice mail. It extracts the effective information in the body with regular-expression rules, including the conference title, conference number, conference start time, conference end time, submission deadline, and submission method. These fields, together with the conference domain, the conference source (the mail recipient address), and the mail summary file path, are stored in a relational database as conference metadata. A conference data preprocessing work queue is created in the Redis database, and the conference ID of each newly added conference is added to it.
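Rule-based extraction with regular expressions could look like the sketch below. The patterns are hypothetical and only cover a tidy English-language notice with ISO dates; the actual rules are not given in the source and would need to be tuned to real mail. The field names and `extract_fields` are illustrative.

```python
import re

DATE = r"(\d{4}-\d{2}-\d{2})"

# Hypothetical rules: one pattern per metadata field.
FIELD_PATTERNS = {
    "title": re.compile(r"^Conference\s*:\s*(.+)$", re.I | re.M),
    "start_date": re.compile(r"^Dates?\s*:\s*" + DATE, re.I | re.M),
    "end_date": re.compile(r"^Dates?\s*:\s*" + DATE + r"\s*(?:to|-)\s*" + DATE, re.I | re.M),
    "submission_deadline": re.compile(r"^Submission deadline\s*:\s*" + DATE, re.I | re.M),
    "submission_method": re.compile(r"^Submit (?:via|at)\s*:?\s*(\S+)", re.I | re.M),
}


def extract_fields(body):
    """Apply every rule to the mail body; fields without a match are None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(body)
        # In each rule the last capture group holds the value of interest.
        fields[name] = match.group(match.lastindex).strip() if match else None
    return fields
```

Keeping one named pattern per field makes it easy to add or adjust a rule without touching the extraction loop.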
The web page information processing module comprises a web page acquisition component and a web page processing component. The web page acquisition component implements update checking of target web pages, collection of target web page information, and failure alarms for target web pages. The web page processing component implements tag-based extraction of effective web page information and storage of the academic conference metadata and web page summaries.
The web page acquisition component runs its collection task at 00:00 every day. It reads a configuration file to obtain the abbreviation, URL, and most recent update date of each target website, a target website being one that publishes conference notices. It collects the notice pages published on the site after the most recent update date, passes them on as HTML files, and updates the most recent update date in the configuration file. If collection fails, it obtains the error message, notifies the administrator by mail, and marks the corresponding target website as unavailable in the configuration file.
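The update check and failure handling of the collection task can be illustrated as follows. This is a sketch under assumptions: the configuration entry is modeled as a plain dict with hypothetical keys (`abbr`, `url`, `last_update`, `status`), and `fetch_notices` is a hypothetical hook standing in for the site-specific download code, so no real network or mail-sending API is implied.

```python
from datetime import date


def collect_updates(site, fetch_notices, today):
    """Return (new notice pages, error message) and update the site entry in place.

    site: dict with 'abbr', 'url', 'last_update' (a date) and 'status'.
    fetch_notices: callable(url) -> list of (publish_date, html) pairs.
    """
    try:
        notices = fetch_notices(site["url"])
    except Exception as exc:  # unreachable site, changed page structure, ...
        site["status"] = "unavailable"
        return [], str(exc)  # the caller would mail this message to the administrator
    # Keep only notices published after the recorded update date.
    fresh = [html for published, html in notices if published > site["last_update"]]
    site["last_update"] = today
    site["status"] = "available"
    return fresh, None


# Example configuration entry (all values are illustrative).
example_site = {
    "abbr": "site_a",
    "url": "https://example.org/cfp",
    "last_update": date(2019, 1, 10),
    "status": "available",
}
```

Injecting `fetch_notices` keeps the date filtering and status bookkeeping testable without network access.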
The web page processing component implements HTML-tag-based information extraction functions designed by inspecting the page structure of each target website. It selects the extraction function that matches the target website's abbreviation to process the collected pages, obtaining the conference title, conference number, conference start time, conference end time, submission deadline, conference domain, submission method, and the notice text. The source website abbreviation, the collection time, and the notice text are saved as a web page summary file. The remaining effective conference information and the web page summary file path are stored in the relational database, and the conference ID of each newly added conference is added to the conference data preprocessing work queue.
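Selecting a tag-based extraction function by website abbreviation can be sketched with the standard `html.parser` module. The site abbreviations, tag names, and CSS classes below are invented for illustration; real extractors would be written against each site's actual page structure.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every element with a given tag and CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.depth = 0      # nesting depth of the matching tag we are inside
        self.chunks = []    # one text chunk per matched element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag and self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.chunks.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks[-1] += data


def first_text(html, tag, css_class):
    parser = TagTextCollector(tag, css_class)
    parser.feed(html)
    return parser.chunks[0].strip() if parser.chunks else None


def extract_site_a(html):
    return {"title": first_text(html, "h1", "conf-title"),
            "submission_deadline": first_text(html, "span", "deadline")}


def extract_site_b(html):
    return {"title": first_text(html, "div", "name"),
            "submission_deadline": first_text(html, "td", "due")}


# Dispatch table keyed by the website abbreviation from the configuration file.
EXTRACTORS = {"site_a": extract_site_a, "site_b": extract_site_b}


def extract_conference(abbr, html):
    return EXTRACTORS[abbr](html)
```

A dispatch table means that supporting a new target website only requires one new extractor function and one new entry.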
The academic conference recommendation module comprises a data preprocessing component and a core recommendation component. The data preprocessing component implements user data preprocessing, conference information preprocessing, intermediate data storage, and checking and cleaning of stale data. The core recommendation component implements relevance computation, reading of user configuration, generation of recommendation results, and caching of recommendation results.
The data preprocessing component polls the conference data preprocessing queue and the user data preprocessing queue. Before preprocessing, it checks for stale data: a conference whose start time has already passed relative to the current date is marked as a historical conference and is no longer a recommendation candidate. The two preprocessing methods are described below. To save computation time, the intermediate data obtained from preprocessing is persisted in the local file system.
For user data, the component builds the item-user inverted table kept by the system and computes the preference similarity of related users. This system measures the preference similarity between users with cosine similarity. Let N(u) be the set of conferences user u is interested in and N(v) the set user v is interested in; the preference similarity of user u and user v is then w_uv = |N(u) ∩ N(v)| / sqrt(|N(u)| · |N(v)|).
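The user-data preprocessing can be sketched as below, assuming the set form of cosine similarity, |N(u) ∩ N(v)| / sqrt(|N(u)| · |N(v)|). The function name `user_similarity` and the input layout (a dict from user to a set of conference IDs) are illustrative choices, not taken from the source.

```python
import math
from collections import defaultdict


def user_similarity(interests):
    """interests maps each user u to N(u), the set of conferences u is interested in."""
    # Item-user inverted table: conference -> users interested in it.
    item_users = defaultdict(set)
    for user, items in interests.items():
        for item in items:
            item_users[item].add(user)

    # Count |N(u) ∩ N(v)| only for user pairs that share at least one conference.
    overlap = defaultdict(lambda: defaultdict(int))
    for users in item_users.values():
        for u in users:
            for v in users:
                if u != v:
                    overlap[u][v] += 1

    # Cosine similarity over the two interest sets.
    return {
        u: {v: count / math.sqrt(len(interests[u]) * len(interests[v]))
            for v, count in related.items()}
        for u, related in overlap.items()
    }
```

Going through the inverted table avoids comparing user pairs that have nothing in common, which is the usual reason for building it when the interest data is sparse.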
For conference data, the text of each conference is segmented into words and stop words are removed. The text is represented by combining TF-IDF with word vectors pre-trained on a large corpus: the document vector of D_i is built from K(t, D_i) and v_t, where D_i denotes the i-th document, K(t, D_i) the TF-IDF value of word t in D_i, and v_t the word vector of word t. Once the document vectors are obtained, the similarity between documents is computed with the Euclidean distance.
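One plausible reading of this representation is a TF-IDF-weighted sum of word vectors, D_i = Σ_t K(t, D_i) · v_t. The exact formula is not recoverable from the text, so the sketch below is an assumption, as are the TF-IDF variant (term frequency times log(n / df)) and the toy two-dimensional word vectors used to exercise it. Documents are assumed to be already segmented and stripped of stop words.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs is a list of token lists; returns one {word: K(t, D_i)} dict per document."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()})
    return weights


def doc_vector(weights, word_vectors, dim):
    """Assumed form: sum of word vectors v_t, each scaled by its TF-IDF value."""
    vec = [0.0] * dim
    for word, k in weights.items():
        v_t = word_vectors.get(word)
        if v_t is None:  # word missing from the pre-trained vocabulary
            continue
        for idx in range(dim):
            vec[idx] += k * v_t[idx]
    return vec


def euclidean(a, b):
    """Euclidean distance between two document vectors; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

With real data, `word_vectors` would be loaded from a pre-trained embedding file rather than written by hand.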
The core recommendation component is scheduled by a timed task and runs the recommendation task at a fixed time. The specific steps are as follows:
a. According to the user preference similarity matrix, find the K users most similar to user u, denoted by the set S(u, K). Extract all conferences that the users in S(u, K) are interested in and remove those that u is already interested in. For each candidate conference i, compute the degree of interest p(u, i) of user u in it, where N(i) denotes the set of users interested in conference i and w_uv denotes the preference similarity of user u and user v. Given the number M of recommendations the user has configured to receive, select the 2*M conferences with the largest p(u, i) as the candidate set.
b. According to the text representation vectors of the conferences and their similarities, find in the candidate set J(u) obtained in step a the conferences most similar to the set I(u) of conferences user u is interested in. For each candidate conference i ∈ J(u), compute the degree of interest of user u in it, where d_ij denotes the distance between conference i and conference j. After M candidate conferences are screened out, the final M conferences recommended to user u are obtained.
c. Create a network disk archiving queue in Redis and add the recommendation results to it in the JSON format {user: [conference 1, conference 2, conference 3, ...]}.
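Steps a to c can be sketched end to end as follows. Because the two scoring formulas are not recoverable from the text, both are assumptions here: step a uses the common user-based form p(u, i) = Σ w_uv over the users v in S(u, K) who are interested in i, and step b ranks candidates by their mean distance d_ij to the conferences in I(u), smaller being better. All function names are illustrative, and pushing the JSON string onto the Redis queue is omitted.

```python
import json
from collections import defaultdict


def candidates_by_user_cf(u, interests, sim, k, m):
    """Step a: score unseen conferences via the K most similar users; keep the top 2*M."""
    neighbours = sorted(sim.get(u, {}).items(), key=lambda kv: kv[1], reverse=True)[:k]
    scores = defaultdict(float)
    for v, w_uv in neighbours:
        for i in interests[v] - interests[u]:
            scores[i] += w_uv  # assumed form of p(u, i)
    ranked = sorted(scores, key=lambda i: (-scores[i], i))  # ties broken by conference ID
    return ranked[:2 * m]


def rerank_by_content(u, candidates, interests, distance, m):
    """Step b: keep the M candidates closest in text to what u already likes."""
    liked = interests[u]

    def mean_distance(i):
        return sum(distance(i, j) for j in liked) / len(liked)

    return sorted(candidates, key=lambda i: (mean_distance(i), i))[:m]


def recommend(u, interests, sim, distance, k=3, m=2):
    """Step c: return the result in the {user: [conference, ...]} JSON layout."""
    pool = candidates_by_user_cf(u, interests, sim, k, m)
    final = rerank_by_content(u, pool, interests, distance, m)
    return json.dumps({u: final})
```

Oversampling to 2*M candidates in step a gives the content-based stage room to discard conferences that similar users liked but that are textually far from the target user's own interests.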
The network disk archiving module requires integration with a third-party network disk storage service. This system uses DCampus WebLib as that service, a cloud disk system developed by Guangzhou Shuo Yuan network company and the Guangdong Provincial Key Laboratory of Computer Network. While polling the network disk archiving queue, when the module detects recommendation results it looks up, for each user in the results in turn, the user name and password of the associated network disk, logs in to the corresponding account through the HTTP interface provided by WebLib, creates under a specified directory a new directory named after the current date, and uploads the conference mail summaries or web page summaries in the user's recommendation list to that directory one by one.
The embodiment described above is only a preferred embodiment of the present invention and is not intended to limit its scope of implementation; therefore any change made according to the shape and principle of the present invention shall fall within the scope of protection of the present invention.
Claims (5)
1. a kind of academic conference recommender system based on mixing proposed algorithm, it is characterised in that: the system passes through user mail
Two methods of filtering, open session notice site information acquisition obtain academic conference notification information, to academic conference notification information
The pretreatment and html web page data for carrying out mail data respectively are extracted, and the conferencing information for generating unified format after processing is lasting
Change storage on the server, suitable time interval be arranged according to server performance and data renewal speed and arranges timed task,
The timed task realizes the calculating of user and the article degree of correlation, and calculation method is a kind of collaborative filtering and base of the fusion based on user
In the mixing proposed algorithm of content, wherein based on the algorithm of content based on the text representation of TF-IDF combination term vector, and
Meeting recommendation is carried out to user according to the degree of correlation, recommendation results are pushed to use to two methods of Dropbox by web page display and filing
Family;The system is specifically included with lower module:
E-mail messages processing module, for realizing e-mail data is received, decoding mail data forms mail property, and screening is learned
Art notice of meeting class mail simultaneously carries out the academic conference domain classification based on SVM according to message body, to the mail by screening
Text carries out rule-based effective information extraction, storage treated academic conference information metadata and mail property;
a web information processing module, configured to designate target web pages according to the system configuration, check the status of the target web pages in real time, collect the academic conference notices newly published on the target web pages, record failed web pages that cannot be reached or whose structure has changed, extract the effective information in the web pages with methods such as tag-based extraction, and store the processed academic conference information metadata and the web page summary;
an academic conference recommendation module, configured to preprocess the user data and the conference data stored by the mail information processing module and the web information processing module, to obtain the user-item relevance at the scheduled time through a timed task, using the hybrid recommendation algorithm that fuses user-based collaborative filtering with the content-based algorithm, and to generate and cache the recommendation results according to the relevance and the user configuration;
a network-disk archiving module, configured to obtain the recommendation results of the academic conference recommendation module, check in turn the network disk associated with each user contained in the recommendation results, and archive the mail summaries or web page summaries of the academic conference notices recommended to each user into that user's network disk;
a conference information display and configuration management module, configured to let users manage their personal information and user-level system settings, view summaries of relevant academic conferences, give feedback on recommendation results, and subscribe to conference websites.
2. The academic conference recommendation system based on a hybrid recommendation algorithm according to claim 1, characterized in that: the mail information processing module comprises a mail acquisition component and a mail processing component; the mail acquisition component implements mail reception, mail parsing, mail filtering and mail caching; the mail processing component implements SVM-based field classification of academic conference notice mail, rule-based extraction of effective information, and storage of the academic conference information metadata and the original mail; wherein:
the mail acquisition component creates a socket listening on port 25, providing a channel that receives the commands of an SMTP connection and handles them according to the protocol specification; if the recipient address given in the RCPT TO command is not in the system's user list, the transmission is refused; for a qualifying transmission, the component obtains the mail content, decodes the mail message data according to the SMTP and MIME protocols, extracts the mail header and mail body, obtains the recipient address, sender address and subject from the mail header, and filters the mail body (the message text) with a keyword-rule-based method to select academic conference notice mail; to avoid blocking mail reception, a unique file name is generated for each mail that passes screening, the mail header data and message body are written to a file in the format [field name: field] to form the mail summary, a mail processing work queue is created in the Redis database, and the file name is added to that queue to notify the subsequent mail processing component;
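The acquisition steps above (recipient check, MIME decoding, keyword filtering, summary file, work queue) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the keyword list, the user list, the hit threshold, the dict used in place of the file system, and the in-memory `deque` used in place of a Redis list are all assumptions made so the sketch runs with the Python standard library alone.

```python
import uuid
from collections import deque
from email import message_from_bytes, policy

# Hypothetical keyword rules and user list; the claim does not enumerate them.
KEYWORDS = ("call for papers", "submission deadline", "conference")
USERS = {"alice@example.org"}

# Stand-in for the Redis mail processing work queue.
work_queue = deque()


def accept_recipient(rcpt_to):
    """Mirror of the RCPT TO check: refuse recipients not in the user list."""
    return rcpt_to.lower() in USERS


def is_conference_notice(body, min_hits=2):
    """Keyword-rule filter: require at least `min_hits` distinct keywords."""
    text = body.lower()
    return sum(keyword in text for keyword in KEYWORDS) >= min_hits


def summarize(raw):
    """Decode a raw message and return its summary text, or None if filtered out."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain", "html"))
    body = part.get_content() if part is not None else ""
    if not is_conference_notice(body):
        return None
    # One "[field name: field]" line per header, followed by the body.
    lines = [f"{name}: {msg.get(name, '')}" for name in ("To", "From", "Subject")]
    lines.append(f"Body: {body.strip()}")
    return "\n".join(lines)


def enqueue(raw, store):
    """Save the summary under a unique name and notify the processing component."""
    summary = summarize(raw)
    if summary is None:
        return None
    filename = f"{uuid.uuid4().hex}.txt"
    store[filename] = summary      # stand-in for writing the summary file
    work_queue.append(filename)    # stand-in for pushing onto the Redis queue
    return filename
```

Keeping the receive path down to "write a file, push a name" is what lets the slower classification and extraction work happen later without blocking the SMTP session.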
after the mail processing component obtains a file to be processed by polling the mail processing work queue, it uses an SVM model obtained by offline training to perform multi-class field classification on the message body, yielding the conference field of the academic conference notice mail; it uses regular-expression rules to extract the effective information in the message body, including the conference title, conference number, conference start time, conference end time, submission deadline and submission method; these, together with the conference field, the conference source (that is, the mail recipient address) and the address of the mail summary file, are stored in a relational database as conference information metadata; a conference data preprocessing work queue is created in the Redis database, and the conference ID of each newly added conference is added to it.
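The regular-expression extraction step can be illustrated with a small sketch. The patterns and field names below are hypothetical, chosen for an English-language notice with a simple layout; the claim does not list the actual rules, and the SVM classification step is left out.

```python
import re

# Hypothetical extraction rules, one pattern per metadata field.
RULES = {
    "title": re.compile(r"^Conference:\s*(.+)$", re.M | re.I),
    "start": re.compile(r"^Dates?:\s*(\d{4}-\d{2}-\d{2})", re.M | re.I),
    "submission_deadline": re.compile(
        r"submission deadline:\s*(\d{4}-\d{2}-\d{2})", re.I
    ),
    "submission_method": re.compile(r"submit (?:via|through|at)\s+(\S+)", re.I),
}


def extract_metadata(body, recipient, summary_path):
    """Apply each rule to the message body; unmatched fields stay None."""
    meta = {"source": recipient, "summary_file": summary_path}
    for field, pattern in RULES.items():
        match = pattern.search(body)
        meta[field] = match.group(1).strip() if match else None
    return meta
```

Leaving unmatched fields as `None` rather than raising keeps a partially parsed notice usable as a metadata row.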
3. The academic conference recommendation system based on a hybrid recommendation algorithm according to claim 1, characterized in that: the web information processing module comprises a web page acquisition component and a web page processing component; the web page acquisition component implements target web page update checking, target web page information collection and target web page failure alarms; the web page processing component implements tag-based extraction of effective web page information and storage of the academic conference information metadata and web page summaries; wherein:
the web page acquisition component runs its collection task on a schedule at 0:00 every day; it reads a configuration file to obtain the abbreviation, URL and most recent update date of each target website, a target website being a website that publishes conference announcements; it collects the web page resources of conference announcements published after the most recent update date, passes them on as HTML files, and updates the most recent update date in the configuration file; if collection fails, it obtains the error message, notifies the administrator by mail, and sets the state of the corresponding target website in the configuration file to unavailable;
the web page processing component implements effective-information extraction functions based on HTML tags, written by observing the page structure of each target website; it selects the appropriate extraction function according to the abbreviation of the target website to process the above web page resources, obtaining the conference title, conference number, conference start time, conference end time, submission deadline, conference field, submission method and the text of the conference announcement; it saves the source website abbreviation, the collection time and the announcement text as a web page summary file, stores the effective conference information and the address of the web page summary file in a relational database, and adds the conference ID of each newly added conference to the conference data preprocessing work queue.
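The per-site, tag-based extraction can be sketched with the standard-library HTML parser. The site abbreviations (`siteA`, `siteB`), tag names and class names are hypothetical; real extractors would be written against each target website's actual markup. The collector is deliberately simple and does not handle an element nested inside another element of the same tag.

```python
from html.parser import HTMLParser


class TagText(HTMLParser):
    """Collects the text inside elements that match a tag name and a CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag, self.css_class = tag, css_class
        self.capturing = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.capturing = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.texts[-1] += data


def first_text(html, tag, css_class):
    """Return the stripped text of the first matching element, or None."""
    parser = TagText(tag, css_class)
    parser.feed(html)
    return parser.texts[0].strip() if parser.texts else None


def extract_site_a(html):
    return {"title": first_text(html, "h1", "conf-title"),
            "submission_deadline": first_text(html, "span", "deadline")}


def extract_site_b(html):
    return {"title": first_text(html, "td", "name"),
            "submission_deadline": first_text(html, "td", "due")}


# Dispatch table keyed by the target website's abbreviation.
EXTRACTORS = {"siteA": extract_site_a, "siteB": extract_site_b}


def process_page(site_abbr, html):
    """Select the extraction function by site abbreviation and apply it."""
    return EXTRACTORS[site_abbr](html)
```

A dispatch table keeps each site's layout knowledge in one function, so a structure change on one website only invalidates one extractor.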
4. The academic conference recommendation system based on a hybrid recommendation algorithm according to claim 1, characterized in that: the academic conference recommendation module comprises a data preprocessing component and a core recommendation component; the data preprocessing component implements user data preprocessing, conference information preprocessing, intermediate data storage, and checking and cleaning of expired data; the core recommendation component implements relevance computation, reading of user configuration, generation of recommendation results and caching of recommendation results; wherein:
the data preprocessing component polls the conference data preprocessing queue and the user data preprocessing queue; before preprocessing, it checks for expired data and sets the status of any conference whose start time has already passed to historical conference, so that it is no longer a candidate for recommendation; the two data preprocessing methods are implemented as follows:
for the preprocessing of user data, the item-user inverted index kept by the system is built and the preference similarity of related users is calculated; the preference similarity between users is measured by cosine similarity: if $N(u)$ is the set of conferences user $u$ is interested in and $N(v)$ is the set of conferences user $v$ is interested in, the preference similarity of user $u$ and user $v$ is $w_{uv} = \dfrac{|N(u) \cap N(v)|}{\sqrt{|N(u)|\,|N(v)|}}$;
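As an illustration of this preprocessing step, the sketch below builds the item-user inverted index and derives pairwise similarities from it. It assumes the usual set-based form of cosine similarity, $|N(u) \cap N(v)| / \sqrt{|N(u)|\,|N(v)|}$; the function name and the input layout (a mapping from user to a set of conference IDs) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def user_similarity(interests):
    """interests maps each user to the set of conference IDs they are interested in."""
    # Item-user inverted index: conference -> users interested in it.
    inverted = defaultdict(set)
    for user, conferences in interests.items():
        for conference in conferences:
            inverted[conference].add(user)

    # Count shared conferences only for user pairs that co-occur in the index.
    overlap = defaultdict(int)
    for users in inverted.values():
        for u, v in combinations(sorted(users), 2):
            overlap[(u, v)] += 1

    # Normalise the overlap into a symmetric cosine similarity.
    sim = defaultdict(dict)
    for (u, v), shared in overlap.items():
        w = shared / sqrt(len(interests[u]) * len(interests[v]))
        sim[u][v] = sim[v][u] = w
    return sim
```

Going through the inverted index means user pairs with no conference in common are never visited, which matters when the interest matrix is sparse.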
for the preprocessing of conference data, the text of the conference data is segmented into words and stop words are removed; the text is represented by combining TF-IDF with word vectors pre-trained on a large corpus, the document vector being expressed as $\vec{D_i} = \sum_{t \in D_i} K(t, D_i)\, v_t$, where $D_i$ denotes the $i$-th document, $K(t, D_i)$ denotes the TF-IDF value of word $t$ in $D_i$, and $v_t$ denotes the word vector of word $t$; after the document vectors are obtained, the similarity between documents is calculated using the Euclidean distance;
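A small sketch of this text representation follows. It assumes the document vector is the TF-IDF-weighted sum of word vectors, that documents arrive already segmented with stop words removed, and that a plain `tf * log(n / df)` weighting is acceptable; the toy two-dimensional word vectors stand in for vectors pre-trained on a large corpus.

```python
from collections import Counter
from math import dist, log


def tfidf(docs):
    """Return one {word: TF-IDF weight} dict per tokenised document."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: (c / len(doc)) * log(n / df[w]) for w, c in tf.items()})
    return weights


def doc_vector(weights, word_vectors, dim):
    """TF-IDF-weighted sum of word vectors; words without a vector are skipped."""
    vec = [0.0] * dim
    for word, k in weights.items():
        v = word_vectors.get(word)
        if v is None:
            continue
        for idx in range(dim):
            vec[idx] += k * v[idx]
    return vec


# Toy data: three tokenised notices and hypothetical 2-d word vectors.
docs = [
    ["deep", "learning", "conference"],
    ["database", "systems", "conference"],
    ["deep", "learning", "workshop"],
]
word_vectors = {
    "deep": [1.0, 0.0], "learning": [0.9, 0.1], "database": [0.0, 1.0],
    "systems": [0.1, 0.9], "conference": [0.5, 0.5], "workshop": [0.5, 0.4],
}
vectors = [doc_vector(w, word_vectors, 2) for w in tfidf(docs)]

# Euclidean distance between document vectors; smaller means more similar.
d_same_topic = dist(vectors[0], vectors[2])
d_other_topic = dist(vectors[0], vectors[1])
```

In this toy data the two deep-learning notices end up closer to each other than to the database notice, which is the behaviour the content-based stage relies on.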
to save computation time, the intermediate data obtained after preprocessing is persisted in the local file system;
the core recommendation component is scheduled as a timed task and executes the recommendation task at a fixed time, with the following specific steps:
a. according to the user preference similarity matrix, the $K$ users most similar to user $u$ are found, denoted by the set $S(u, K)$; all conferences that the users in $S$ are interested in are extracted, and the conferences $u$ is already interested in are removed; for each candidate conference $i$, user $u$'s degree of interest in it is calculated as $p(u, i) = \sum_{v \in S(u, K) \cap N(i)} w_{uv}$, where $N(i)$ denotes the set of users interested in conference $i$ and $w_{uv}$ denotes the preference similarity of user $u$ and user $v$; according to the number $M$ of recommended conferences the user has configured to receive, the $2M$ conferences with the largest $p(u, i)$ are selected to form the candidate set;
b. according to the text representation vectors of the conferences and their similarity, the conferences in the candidate set $J(u)$ obtained in step a that are most similar to the set $I(u)$ of conferences user $u$ is interested in are found; for each candidate conference $i \in J(u)$, user $u$'s degree of interest in it is calculated by a formula over the distances $d_{ij}$, where $d_{ij}$ denotes the distance between conference $i$ and conference $j$; after $M$ candidate conferences are screened out, the final $M$ conferences to recommend to user $u$ are obtained;
c. a network-disk archiving queue is created in Redis, and the recommendation results are added to it in the JSON format {user: [conference 1, conference 2, conference 3, ...]}.
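The three steps can be sketched end to end as below. Several points are assumptions rather than details taken from the claim: step a scores a candidate by summing the similarities of the neighbours interested in it, step b ranks candidates by their mean distance to the conferences the user already likes (the claim's own scoring formula for this step is not reproduced in this text), ties are broken by conference ID, and the queue entry is simply returned as a JSON string instead of being pushed to Redis.

```python
import json
from collections import defaultdict


def candidates_by_user_cf(u, interests, sim, k, m):
    """Step a: score unseen conferences via the k nearest neighbours, keep 2*m."""
    neighbours = sorted(sim.get(u, {}), key=lambda v: sim[u][v], reverse=True)[:k]
    scores = defaultdict(float)
    for v in neighbours:
        for conference in interests[v] - interests[u]:
            scores[conference] += sim[u][v]
    ranked = sorted(scores, key=lambda c: (-scores[c], c))
    return ranked[:2 * m]


def rerank_by_content(u, candidates, interests, distance, m):
    """Step b: keep the m candidates closest to what the user already likes."""
    liked = interests[u]

    def mean_distance(i):
        # Assumed scoring rule: average distance to the liked conferences.
        return sum(distance(i, j) for j in liked) / len(liked) if liked else 0.0

    return sorted(candidates, key=lambda i: (mean_distance(i), i))[:m]


def to_queue_entry(u, recommended):
    """Step c: serialise one user's result as {user: [conference, ...]}."""
    return json.dumps({u: recommended})
```

Taking $2M$ candidates from the collaborative stage and letting the content stage cut them to $M$ gives the second stage room to reorder without having to score every conference in the system.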
5. The academic conference recommendation system based on a hybrid recommendation algorithm according to claim 1, characterized in that: the network-disk archiving module integrates a third-party network disk storage service, the DCampus WebLib cloud disk system being used as that service; while polling the network-disk archiving queue, when the module detects a recommendation result, it looks up in turn the associated network disk user name and password of each user in the recommendation result, logs in to the corresponding account through the HTTP interface provided by the third-party cloud service WebLib, creates under the designated directory a new directory named with the current date, and uploads in turn the mail summaries or web page summaries of the conferences in the user's recommendation list to the designated directory.
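The archiving flow can be illustrated with a sketch that keeps the storage service behind a small interface. `FakeDisk` is an in-memory stand-in written for this example and does not reflect the WebLib HTTP API, whose endpoints are not described here; the root directory name, the credential table, and the choice to upload into the dated directory are likewise assumptions.

```python
import json
from datetime import date


class FakeDisk:
    """In-memory stand-in for a cloud storage client (hypothetical interface)."""

    def __init__(self):
        self.files = {}
        self.logged_in = None

    def login(self, user, password):
        self.logged_in = user

    def mkdir(self, path):
        self.files.setdefault(path, [])

    def upload(self, directory, name):
        self.files[directory].append(name)


def archive(entry, credentials, summaries, disk, root="/recommendations", today=None):
    """Consume one queue entry of the form {user: [conference, ...]}."""
    today = today or date.today()
    for user, conferences in json.loads(entry).items():
        name, password = credentials[user]        # associated storage account
        disk.login(name, password)
        target = f"{root}/{today.isoformat()}"    # directory named by current date
        disk.mkdir(target)
        for conference in conferences:
            disk.upload(target, summaries[conference])
```

Passing the client in as a parameter keeps the queue-handling logic testable without network access and leaves the real HTTP calls in one place.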
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910042396.9A CN109933717B (en) | 2019-01-17 | 2019-01-17 | Academic conference recommendation system based on hybrid recommendation algorithm |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109933717A (en) | 2019-06-25 |
CN109933717B (en) | 2021-05-14 |
Family
ID=66985105
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910042396.9A Active CN109933717B (en) | 2019-01-17 | 2019-01-17 | Academic conference recommendation system based on hybrid recommendation algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109933717B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080033781A1 (en) * | 2006-07-18 | 2008-02-07 | Jonah Holmes Peretti | System and method for online product promotion |
CN101755283A (en) * | 2007-07-24 | 2010-06-23 | 三星电子株式会社 | Method and apparatus for recommending information using hybrid algorithm |
CN103049575A (en) * | 2013-01-05 | 2013-04-17 | 华中科技大学 | Topic-adaptive academic conference searching system |
CN104572874A (en) * | 2014-12-19 | 2015-04-29 | 北京锐安科技有限公司 | Webpage information extraction method and device |
CN106610970A (en) * | 2015-10-21 | 2017-05-03 | 上海文广互动电视有限公司 | Collaborative filtering-based content recommendation system and method |
CN105787068A (en) * | 2016-03-01 | 2016-07-20 | 上海交通大学 | Academic recommendation method and system based on citation network and user proficiency analysis |
Non-Patent Citations (2)
Title |
---|
徐傲雪 等: "学术相关通知类邮件处理系统设计", 《中国教育网络》 * |
胡迎松 等: "基于内容和协同过滤的混合推荐技术", 《第二届全国WEB信息系统及其应用会议(WISA2005")》 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111796830A (en) * | 2020-06-08 | 2020-10-20 | 成都数之联科技有限公司 | Protocol analysis processing method, device, equipment and medium |
CN111796830B (en) * | 2020-06-08 | 2023-09-19 | 成都数之联科技股份有限公司 | Protocol analysis processing method, device, equipment and medium |
CN112687272A (en) * | 2020-12-18 | 2021-04-20 | 北京金山云网络技术有限公司 | Conference summary recording method and device and electronic equipment |
CN112687272B (en) * | 2020-12-18 | 2023-03-21 | 北京金山云网络技术有限公司 | Conference summary recording method and device and electronic equipment |
CN113077235A (en) * | 2021-04-12 | 2021-07-06 | 上海明略人工智能(集团)有限公司 | Conference schedule conflict management method and system, electronic equipment and storage medium |
CN113077235B (en) * | 2021-04-12 | 2024-03-22 | 上海明略人工智能(集团)有限公司 | Conference schedule conflict management method, system, electronic equipment and storage medium |
CN113127633A (en) * | 2021-06-17 | 2021-07-16 | 平安科技(深圳)有限公司 | Intelligent conference management method and device, computer equipment and storage medium |
CN113420058A (en) * | 2021-07-01 | 2021-09-21 | 宁波大学 | Conversational academic conference recommendation method based on combination of user historical behaviors |
Also Published As
Publication number | Publication date |
---|---|
CN109933717B (en) | 2021-05-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109933717A (en) | A kind of academic conference recommender system based on mixing proposed algorithm | |
US8601004B1 (en) | System and method for targeting information items based on popularities of the information items | |
US7949714B1 (en) | System and method for targeting advertisements or other information using user geographical information | |
US20120233209A1 (en) | Enterprise search over private and public data | |
US9002725B1 (en) | System and method for targeting information based on message content | |
Vosecky et al. | Searching for quality microblog posts: Filtering and ranking based on content analysis and implicit links | |
CN112486917A (en) | Method and system for automatically generating information-rich content from multiple microblogs | |
Milanesi et al. | How do you depict sustainability? An analysis of images posted on Instagram by sustainable fashion companies | |
WO2012137215A1 (en) | A system and method for communication | |
US9697527B2 (en) | Centralized social network response tracking | |
US10516643B2 (en) | Client side social network response tracking | |
Abel et al. | Linkage, aggregation, alignment and enrichment of public user profiles with Mypes | |
Wang et al. | Spade: a social-spam analytics and detection framework | |
CN102844757A (en) | Company network | |
Joly et al. | Contextual recommendation of social updates, a tag-based framework | |
Arif et al. | Social network extraction: a review of automatic techniques | |
JP2009510598A (en) | Communication and collaboration system | |
Liu et al. | Research on the application of SNS in university libraries: A case study of microblogs in Chinese “211 project” universities | |
Steele et al. | Putting the public into public health information dissemination: social media and health-related web pages | |
Xianlei et al. | Finding domain experts in microblogs | |
US20180101615A1 (en) | Systems, methods and techniques for customizable domain-based searching | |
Lutu | Web 2.0 computing and social media as solution enablers for economic development in Africa | |
US20160028659A1 (en) | System and Method for Targeting Advertisements or Other Information Based on Recently Sent Message or Messages | |
Yang et al. | Micro-blog friend recommendation algorithms based on content and social relationship | |
Feng et al. | Negative Examples Sampling Based on Factorization Machines for OCCF |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||