CN112380457A - Accurate personalized recommendation method based on purchase information - Google Patents
Accurate personalized recommendation method based on purchase information
- Publication number
- CN112380457A (application number CN202011417355.2A)
- Authority
- CN
- China
- Prior art keywords
- information
- recommendation
- supplier
- data
- suppliers
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/958—Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/087—Inventory or stock management, e.g. order filling, procurement or balancing against orders
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0631—Item recommendations
Abstract
The invention provides an accurate personalized recommendation method based on purchase information. The method comprises the following steps: acquiring information on suppliers and purchasers, and storing the data; preprocessing the acquired information, collecting the words that match the specified parts of speech, and collecting candidate labels from those words; selecting the recommendation algorithm that best meets the requirements from among collaborative-filtering-based recommendation, content-based recommendation and combined recommendation, iterating over the candidate labels of each purchase, performing label extraction on the candidate labels to obtain a label set, and selecting the top-ranked suppliers; and recommending the purchase information to the matched purchasers. The method can accurately and individually match purchase information with suppliers, and recommends purchase information carrying personalized recommendation characteristics to each supplier.
Description
Technical Field
The invention relates to the technical field of computers, in particular to an accurate personalized recommendation method based on purchase information.
Background
In recent years, the volume of information on the internet has grown exponentially. Because a recommendation system can help users find items of interest, such systems are widely used in e-commerce, search engines, video and music websites, social networks and the like. When a user shops online, the recommendation system helps select satisfactory goods; when the user wants to catch up on events, it prepares news of interest; when the user wants to study, it offers suitable courses; when the user wants to relax, it serves the short videos the user is looking for; and when the user wants to rest, it plays music that fits the moment. Recommendation systems now influence people's daily lives as never before.
With the development of internet technology, suppliers can look up ever more types of purchase information, in ever larger volumes, on the internet, for example on national bidding and purchasing information platforms and government purchasing information networks, where tens of thousands of purchase notices are published every day. The technical problem to be solved is how to cope with this mass of complex information and accurately match each piece of purchase information to the right supplier.
Disclosure of Invention
The invention provides an accurate personalized recommendation method based on purchase information, and aims to solve the technical problems identified in the background: that the information is voluminous and complex, and that each piece of purchase information must be accurately matched to the right supplier within massive data.
To achieve the above object, the present invention provides an accurate personalized recommendation method based on purchase information, comprising the following steps:
step S1, acquiring information on suppliers and purchasers, and storing the data;
step S2, preprocessing the acquired information, collecting the words that match the specified parts of speech, and collecting candidate labels from the collected words;
step S3, selecting the recommendation algorithm that best meets the requirements from among collaborative-filtering-based recommendation, content-based recommendation and combined recommendation, iterating over the candidate labels of each purchase, performing label extraction on the candidate labels to obtain a label set, and selecting the top-ranked suppliers;
and step S4, repeating step S3 until all candidate labels of the purchases have been recommended to the matched purchasers.
Preferably, acquiring the information on suppliers and purchasers in step S1 includes: step S11, collecting information from the network, specifically: starting from a given initial URL seed set, and using the system-configured crawl depth and number of URLs downloaded per layer, carrying out the web page crawling task in a breadth-first traversal loop until the crawler's termination condition is met.
Preferably, acquiring the information on suppliers and purchasers in step S1 further includes: step S12, obtaining the information on suppliers and purchasers from existing system data, specifically including the following steps:
step S121, suppliers and purchasers register as system users;
and step S122, the registered suppliers and purchasers supplement their basic information, including the purchase information issued by the purchasers, the product information of the suppliers, and the characteristic data, preferences and classification information of the purchasers and suppliers.
Preferably, collecting information from the network in step S11 specifically includes the following steps:
step S111, writing a crawler program capable of bypassing anti-crawler measures;
and step S112, acquiring supplier information and purchase information data from the internet through the crawler program.
Preferably, step S112 specifically includes the following steps:
step S1121, selecting a seed file to be searched from the seed set, whereupon the distributed web crawler selects a URL from the seed file and starts crawling;
step S1122, after obtaining the URL, the web crawler establishes an HTTP connection with the corresponding web server according to the URL; if the connection succeeds, the method proceeds to step S1123, and if it fails, the link is marked;
step S1123, fetching the page over the HTTP protocol;
step S1124, parsing the fetched page to extract the valid key information;
step S1125, if the parsed web page contains duplicate URL links, filtering out the duplicate URLs;
step S1126, saving the filtered URL links to a URL link library in preparation for the next stage of crawling;
and step S1127, crawling according to the updated URLs and judging whether the user-defined stop condition is met; if so, crawling stops, and if not, the loop continues.
Preferably, the data preprocessing in step S2 includes data cleaning, Chinese word segmentation, part-of-speech tagging and stop-word filtering, and specifically includes the following steps:
step S21, data cleaning: preliminarily filtering out useless information from the acquired information and retaining the useful information, so that a text set containing only feature words remains;
step S22, performing word segmentation and part-of-speech tagging: taking the words that match the specified parts of speech as candidate words;
step S23, calculating the TF-IDF value of each word;
and step S24, sorting the words in descending order of TF-IDF value, collecting the candidate labels, and outputting the specified number of likely keywords.
Preferably, the label extraction in step S3 is performed by methods including a word-frequency-based method and a support-vector-machine-based method, and includes the following steps:
step S31, obtaining a user attribute database and a candidate item set;
step S32, extracting features from the user attribute database as feature vectors, and obtaining initial related item recommendations from the candidate item set;
and step S33, determining the final recommendation result by combining candidate item set filtering, ranking and recommendation explanation.
Preferably, the collaborative-filtering-based recommendation in step S3 includes a supplier-based collaborative filtering algorithm recommendation and a purchase-information-based collaborative filtering algorithm recommendation, where:
the supplier-based collaborative filtering algorithm recommendation is specifically: when a supplier is newly added, the data in the data storage of step S1 that suppliers in the same industry with a similar business scope are interested in is recommended to the new supplier;
the purchase-information-based collaborative filtering algorithm recommendation is specifically: based on the purchase data a supplier was previously interested in, data in the data storage of step S1 that carries the same label as that purchase data is also recommended to the supplier.
Preferably, the content-based recommendation in step S3 is specifically: constructing a supplier preference document from the supplier's history information, calculating the similarity between each candidate piece of purchase information and the supplier preference document, and recommending the most similar purchase information to the supplier.
Preferably, selecting the recommendation algorithm that best meets the requirements in step S3 is specifically:
when the amount of data in the data storage of step S1 is small, a single algorithm may be used to obtain the corresponding supplier matches;
when the amount of data in the data storage of step S1 is large, a rough recommendation result is first generated by the supplier-based collaborative filtering algorithm, the purchase-information-based collaborative filtering algorithm is then used to eliminate unsuitable results and refine the remainder, and finally the content-based recommendation algorithm is used to make a more accurate recommendation on the basis of the preceding result.
The technical effect achieved by the invention is as follows: the invention matches relevant purchase information to suppliers by digital computation and recommends that purchase information to the suppliers quickly and accurately.
Drawings
FIG. 1 is a general flow chart of the accurate personalized recommendation method based on purchase information according to the present invention;
FIG. 2 is a flow chart of data collection from the internet in the accurate personalized recommendation method based on purchase information according to the present invention;
FIG. 3 is a part-of-speech tagging flow chart of the accurate personalized recommendation method based on purchase information according to the present invention;
FIG. 4 is a data recommendation flow chart of label extraction in the accurate personalized recommendation method based on purchase information according to the present invention;
FIG. 5 is a diagram of the supplier-based collaborative filtering algorithm of the accurate personalized recommendation method based on purchase information according to the present invention;
FIG. 6 is a diagram of the purchase-information-based collaborative filtering algorithm of the accurate personalized recommendation method based on purchase information according to the present invention;
FIG. 7 is a diagram of the content-based recommendation algorithm of the accurate personalized recommendation method based on purchase information according to the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
Aiming at the existing problems, the invention provides an accurate personalized recommendation method based on purchase information.
FIG. 1 is a flowchart of the method of the present invention. As shown in FIG. 1, the accurate personalized recommendation method based on purchase information comprises the following steps:
step S1, acquiring information on suppliers and purchasers, and storing the data;
step S2, preprocessing the acquired information, collecting the words that match the specified parts of speech, and collecting candidate labels from the collected words;
step S3, selecting the recommendation algorithm that best meets the requirements from among collaborative-filtering-based recommendation, content-based recommendation and combined recommendation, iterating over the candidate labels of each purchase, performing label extraction on the candidate labels to obtain a label set, and selecting the suppliers ranked in the top five;
and step S4, repeating step S3 until all candidate labels of the purchases have been recommended to the matched purchasers.
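The ranking in step S3 and the loop in step S4 can be illustrated with a minimal sketch. The description does not fix a scoring function, so the Jaccard overlap between label sets used here is an assumption, as are the function names `top_suppliers` and `recommend_all` and the dictionary-based data layout.

```python
from typing import Dict, List, Set


def top_suppliers(purchase_labels: Set[str],
                  supplier_labels: Dict[str, Set[str]],
                  k: int = 5) -> List[str]:
    """Rank suppliers by Jaccard overlap with one purchase's label set (assumed score)."""
    scored = []
    for name, labels in supplier_labels.items():
        union = purchase_labels | labels
        score = len(purchase_labels & labels) / len(union) if union else 0.0
        if score > 0:  # suppliers sharing no label are never recommended
            scored.append((score, name))
    # Highest score first; ties are broken by name so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def recommend_all(purchases: Dict[str, Set[str]],
                  supplier_labels: Dict[str, Set[str]],
                  k: int = 5) -> Dict[str, List[str]]:
    """Step S4 as a loop: repeat the step S3 selection for every purchase."""
    return {pid: top_suppliers(labels, supplier_labels, k)
            for pid, labels in purchases.items()}
```

With `k=5` this mirrors the "top five" selection of this embodiment; any other similarity measure could be substituted without changing the loop structure.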
Acquiring the information on suppliers and purchasers in step S1 includes:
step S11, collecting information from the network, specifically: starting from a given initial URL seed set, and using the system-configured crawl depth and number of URLs downloaded per layer, carrying out the web page crawling task in a breadth-first traversal loop until the crawler's termination condition is met;
step S12, obtaining the information on suppliers and purchasers from existing system data, specifically including the following steps:
step S121, suppliers and purchasers register as system users;
and step S122, the registered suppliers and purchasers supplement their basic information, including the purchase information issued by the purchasers, the product information of the suppliers, and the characteristic data, preferences and classification information of the purchasers and suppliers.
Collecting information from the network in step S11 specifically includes the following steps:
step S111, writing a crawler program capable of bypassing anti-crawler measures. Specifically, the crawler program acquires website data effectively by disabling the Robots protocol, forging the request header, using IP proxies, using Cookies and limiting its access rate. Although the Robots protocol is called a "gentleman's agreement" between the two parties, in many cases leaving it enabled prevents the crawler from obtaining the data. Forging the request header means the following: the server learns who is visiting the website from the User-Agent field, and every mainstream browser has a fixed User-Agent, so the server cannot tell the crawler apart as long as the crawler disguises itself as a mainstream browser. Using IP proxies means the following: several IP proxies access the website in turn, which lengthens the interval between accesses from any one address and lowers the access frequency, so that the server finds the crawler difficult to detect. Using Cookies means the following: the Cookie expiry times of the target website can be studied and a browser simulated, with Cookies regenerated at regular intervals so that the website can be visited without being blocked. Limiting the access rate means the following: if the crawler fetches data in a tight loop without sleeping, its IP may be blocked at any time; rate-limited access is easy to implement, at the cost of a longer capture time, and combining it with IP proxies allows the target content to be crawled quickly;
and step S112, acquiring supplier information and purchase information data from the internet through the crawler program.
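Of the measures listed in step S111, speed-limited access is the simplest to illustrate. The following is a minimal sketch only: the class name `Throttle`, the fixed minimum interval, and the injectable `clock` and `sleep` parameters (added so the behaviour can be checked without real waiting) are assumptions, not details given in the description.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests (hypothetical helper)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so tests can use a fake clock
        self._sleep = sleep
        self._last = None      # time at which the previous request was released

    def wait(self) -> float:
        """Sleep until min_interval has passed since the last call; return the delay."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

A crawler would call `throttle.wait()` immediately before each page fetch, so that a single address never requests pages faster than once per `min_interval` seconds.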
As shown in FIG. 2, step S112 specifically includes the following steps:
step S1121, selecting a seed file to be searched from the seed set, whereupon the distributed web crawler selects a URL from the seed file and starts crawling;
step S1122, after obtaining the URL, the web crawler establishes an HTTP connection with the corresponding web server according to the URL; if the connection succeeds, the method proceeds to step S1123, and if it fails, the link is marked;
step S1123, fetching the page over the HTTP protocol;
step S1124, parsing the fetched page to extract the valid key information;
step S1125, if the parsed web page contains duplicate URL links, filtering out the duplicate URLs;
step S1126, saving the filtered URL links to a URL link library in preparation for the next stage of crawling;
and step S1127, crawling according to the updated URLs and judging whether the user-defined stop condition is met; if so, crawling stops, and if not, the loop continues.
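The loop of steps S1121 to S1127, together with the crawl depth and per-layer URL limit of step S11, can be sketched as a layer-by-layer breadth-first traversal. This is an illustration under stated assumptions: `fetch_links` is a hypothetical caller-supplied function that fetches a page and returns the URLs found on it, connection failures are assumed to surface as `OSError`, and the stop condition is simply "depth reached or no new URLs".

```python
from typing import Callable, Iterable, List, Set, Tuple


def crawl_bfs(seeds: Iterable[str],
              fetch_links: Callable[[str], List[str]],
              max_depth: int,
              max_urls_per_layer: int) -> Tuple[List[str], List[str]]:
    """Breadth-first crawl; returns (crawled URLs in visit order, failed URLs)."""
    visited: Set[str] = set()
    crawled: List[str] = []
    failed: List[str] = []
    layer = list(dict.fromkeys(seeds))        # S1121: seed URLs, duplicates removed
    for _depth in range(max_depth + 1):       # seeds count as depth 0
        next_layer: List[str] = []
        queued: Set[str] = set()
        for url in layer[:max_urls_per_layer]:
            visited.add(url)
            try:
                links = fetch_links(url)      # S1122-S1124: connect, fetch, parse
            except OSError:
                failed.append(url)            # S1122: mark the link on failure
                continue
            crawled.append(url)
            for link in links:                # S1125: filter duplicate URLs
                if link not in visited and link not in queued:
                    queued.add(link)
                    next_layer.append(link)   # S1126: save for the next stage
        layer = [u for u in next_layer if u not in visited]
        if not layer:                         # S1127: stop when nothing is left
            break
    return crawled, failed
```

URLs beyond the per-layer cap are simply dropped in this sketch; a production crawler would more likely keep them in the URL link library for a later pass.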
The data preprocessing in step S2 includes data cleaning, Chinese word segmentation, part-of-speech tagging and stop-word filtering, and specifically includes the following steps:
step S21, data cleaning: useless information in the acquired information, such as irrelevant symbols, dates and the like, is preliminarily filtered out and the useful information is retained, so that a text set containing only feature words remains;
step S22, performing word segmentation and part-of-speech tagging: taking the words that match the specified parts of speech as candidate words;
wherein Chinese word segmentation means identifying the individual words in a Chinese sentence, which is written without separators between words;
and, as shown in FIG. 3, part-of-speech tagging means attaching a suitable tag to each word obtained by segmenting the text, that is, determining whether each word is a noun, a verb, an adjective or another part of speech;
step S23, calculating the TF-IDF (term frequency-inverse document frequency) value of each word;
and step S24, sorting the words in descending order of TF-IDF value, collecting the candidate labels, and outputting the specified number of likely keywords.
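Steps S22 to S24 can be sketched as follows. The sketch assumes the text has already been segmented and tagged, so each document is a list of (word, part-of-speech flag) pairs; the function name `candidate_labels`, the flag `"n"` for nouns, and the plain unsmoothed weighting tf × ln(N / df) are assumptions, since the description does not specify a TF-IDF variant.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech flag), e.g. ("computer", "n")


def candidate_labels(docs: Sequence[Sequence[Token]],
                     doc_index: int,
                     allowed_pos=frozenset({"n"}),
                     stop_words=frozenset(),
                     top_n: int = 5) -> List[Tuple[str, float]]:
    """Return the top_n (word, TF-IDF) pairs for one document, highest score first."""
    # S22 plus stop-word filtering: keep only words with an allowed part of speech.
    filtered = [[w for w, pos in doc if pos in allowed_pos and w not in stop_words]
                for doc in docs]
    # Document frequency: the number of documents in which each word occurs.
    df = Counter()
    for words in filtered:
        df.update(set(words))
    words = filtered[doc_index]
    if not words:
        return []
    tf = Counter(words)
    n_docs = len(filtered)
    # S23: term frequency times inverse document frequency.
    scores = {w: (count / len(words)) * math.log(n_docs / df[w])
              for w, count in tf.items()}
    # S24: descending TF-IDF order, ties broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]
```

A word that occurs in every document receives a score of zero under this variant, which is why very common terms drop to the bottom of the candidate label list.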
As shown in FIG. 4, the label extraction in step S3 is performed by methods including a word-frequency-based method and a support-vector-machine-based method, and includes the following steps:
step S31, obtaining a user attribute database and a candidate item set;
step S32, extracting features from the user attribute database as feature vectors, and obtaining initial related item recommendations from the candidate item set;
and step S33, determining the final recommendation result by combining candidate item set filtering, ranking and recommendation explanation.
The recommendation explanation is specifically: predicting the purchase information a supplier is interested in through the interplay of the relevant purchase information issued by purchasers, the supplier's industry and the business scope of each supplier, so as to recommend the most suitable purchase information to a specific supplier.
The collaborative-filtering-based recommendation in step S3 includes a supplier-based collaborative filtering algorithm recommendation and a purchase-information-based collaborative filtering algorithm recommendation, where:
as shown in FIG. 5, the supplier-based collaborative filtering algorithm recommendation is specifically: when a supplier is newly added, the data in the data storage of step S1 that suppliers in the same industry with a similar business scope are interested in is recommended to the new supplier;
as shown in FIG. 6, the purchase-information-based collaborative filtering algorithm recommendation is specifically: based on the purchase data a supplier was previously interested in, data in the data storage of step S1 that carries the same label as that purchase data is also recommended to the supplier.
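The two collaborative filtering variants can be sketched as below. The data layout is hypothetical: `suppliers` maps a supplier name to its industry and business scope, `interests` maps a supplier name to the set of purchase notice IDs it showed interest in, and `notice_labels` maps a notice ID to its label set. Treating "similar business scope" as "at least `min_overlap` scope terms in common" is likewise an assumption.

```python
from collections import Counter
from typing import Dict, List, Set


def supplier_based(new_supplier: str,
                   suppliers: Dict[str, dict],
                   interests: Dict[str, Set[str]],
                   min_overlap: int = 1) -> List[str]:
    """Recommend notices that same-industry, similar-scope suppliers were interested in."""
    me = suppliers[new_supplier]
    votes = Counter()
    for name, info in suppliers.items():
        if name == new_supplier or info["industry"] != me["industry"]:
            continue
        if len(info["scope"] & me["scope"]) >= min_overlap:
            votes.update(interests.get(name, set()))
    seen = interests.get(new_supplier, set())
    # Notices liked by more neighbours come first; ties are ordered by notice ID.
    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [notice for notice, _ in ranked if notice not in seen]


def purchase_info_based(supplier: str,
                        interests: Dict[str, Set[str]],
                        notice_labels: Dict[str, Set[str]]) -> List[str]:
    """Recommend notices sharing a label with notices the supplier already liked."""
    liked = interests.get(supplier, set())
    liked_labels = set().union(*(notice_labels[n] for n in liked))
    return sorted(n for n, labels in notice_labels.items()
                  if n not in liked and labels & liked_labels)
```

The first function addresses the new-supplier case, where no interest history exists yet; the second relies entirely on that history, so the two complement each other.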
As shown in FIG. 7, the content-based recommendation in step S3 is specifically: constructing a supplier preference document from the supplier's history information, calculating the similarity between each candidate piece of purchase information and the supplier preference document, and recommending the most similar purchase information to the supplier.
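A minimal sketch of this content-based step follows. The description does not name a similarity measure, so cosine similarity over raw term counts is an assumption, as are the function names and the representation of the supplier preference document as a bag of words built from previously viewed notices.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def content_based(history: Sequence[Sequence[str]],
                  candidates: Dict[str, Sequence[str]],
                  top_n: int = 3) -> List[str]:
    """Rank candidate notices by similarity to the supplier preference document."""
    # The preference document is the merged word counts of the supplier's history.
    profile = Counter()
    for tokens in history:
        profile.update(tokens)
    scored = [(cosine(profile, Counter(tokens)), nid)
              for nid, tokens in candidates.items()]
    scored = [(s, nid) for s, nid in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [nid for _, nid in scored[:top_n]]
```

Weighting the term counts by TF-IDF before computing the similarity would be a natural refinement, since the preprocessing of step S2 already produces those values.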
Selecting the recommendation algorithm that best meets the requirements in step S3 is specifically:
when the amount of data in the data storage of step S1 is small, a single algorithm may be used to obtain the corresponding supplier matches;
when the amount of data in the data storage of step S1 is large, a rough recommendation result is first generated by the supplier-based collaborative filtering algorithm, the purchase-information-based collaborative filtering algorithm is then used to eliminate unsuitable results and refine the remainder, and finally the content-based recommendation algorithm is used to make a more accurate recommendation on the basis of the preceding result.
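The selection rule above amounts to choosing between one algorithm and a three-stage cascade. The sketch below shows only that control flow: the name `select_and_run`, the `large_threshold` value and the convention that every stage is a function taking a supplier and a candidate list are assumptions, because the description gives no concrete boundary between "small" and "large" data volumes.

```python
from typing import Callable, Iterable, List, Sequence

# A stage takes (supplier, candidate notices) and returns a narrowed or reordered list.
Stage = Callable[[str, list], list]


def select_and_run(supplier: str,
                   notices: Iterable,
                   single: Stage,
                   cascade: Sequence[Stage],
                   large_threshold: int = 10_000) -> List:
    """Use one algorithm on small data, or a coarse-to-fine cascade on large data."""
    candidates = list(notices)
    if len(candidates) <= large_threshold:
        return single(supplier, candidates)
    # Large data: supplier-based CF, then purchase-information-based CF,
    # then content-based recommendation, each refining the previous result.
    for stage in cascade:
        candidates = stage(supplier, candidates)
    return candidates
```

Because each stage only sees the output of the previous one, the most expensive computation (content similarity) runs on the smallest candidate set.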
In another preferred embodiment of the invention, the crawler program captures purchase information for the item category "computer equipment" together with supplier data, and a correlation analysis between supplier labels and purchase information is used to test their topical correlation, which makes it feasible to identify relevant purchase information from supplier labels. The embodiment specifically comprises the following steps:
Step 1: acquiring supplier data. Suppliers in the "science and technology and research" industry are taken as the research objects, and data on 245 suppliers are captured by the crawler program, including supplier name, supplier type, industry, address, supplier overview, business scope, date of establishment and the like.
Step 2: acquiring purchase information data. Purchase information for "computer equipment" and "software equipment" is collected, and 15377 pieces of purchase information are captured by the crawler program, each including the purchase information name, purchase code, purchase information description, requirements for the operation party, deadline, delivery time, delivery place and the like.
Step 3: data preprocessing and label matching.
1) Data cleaning: during data preparation, the crawler program is used to acquire user-defined labels. Of the 245 suppliers acquired, those with zero labels are removed, leaving 207 suppliers and 14399 pieces of purchase information.
2) Word segmentation: on the basis of the above data, the supplier and purchase information is segmented. The supplier information is segmented through a balance segmentation system, and the words in the result carry part-of-speech marks, such as noun/n, verb/v and adjective/a. The labels are mainly nouns, and data of other parts of speech are excluded. Finally, one station and one station are obtained. After some meaningless descriptors are removed, the labels finally assigned to the supplier are: computer equipment, hardware equipment, network hardware and Beijing.
3) The supplier labels are matched against the purchase information labels.
Step 4: matching the purchasers with the purchase information one by one according to the purchase-information-based collaborative filtering algorithm.
The invention matches relevant purchase information to suppliers by digital computation and recommends that purchase information to the suppliers quickly and accurately. Specifically, the method addresses information overload by finding, within massive purchase information, the purchase information that fits a given supplier. Its most important function is to generate personalized recommendations by analyzing the behavior of the supplier and of other suppliers, so as to "guess" the supplier's preferences and interests. A personalized recommendation service based on purchase information not only improves the utilization of purchase information but also lets a supplier quickly find purchase information that meets its requirements. The invention is based mainly on personalization algorithms, information-seeking behavior and supplier preferences, and takes the labels of the purchase information, the supplier's industry and the supplier's business scope as intermediate variables when examining how personalized recommendation characteristics affect which purchase information is recommended to a supplier.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. An accurate personalized recommendation method based on purchase information is characterized by comprising the following steps:
step S1, obtaining information of suppliers and purchasers, and storing data;
step S2, data preprocessing is carried out on the acquired information, words meeting the appointed part of speech are collected, and candidate labels are collected based on the collected words;
step S3, selecting the recommendation algorithm that best meets the requirements from among collaborative filtering recommendation, content-based recommendation and combined recommendation, cyclically traversing the candidate labels of the purchase, extracting labels from the candidate labels to obtain a label set, and selecting the top-ranked suppliers;
and step S4, looping step S3 until all candidate tags of the purchase are recommended to the matched buyers.
2. The method for accurate personalized recommendation based on purchasing information as claimed in claim 1, wherein obtaining the information of suppliers and purchasers in step S1 includes: step S11, collecting information from the network, which specifically comprises: starting from a given initial URL seed set, and according to the crawling depth and the number of URLs downloaded in each layer set as system parameters, performing the webpage crawling task in a breadth-first traversal loop until the condition under which the crawler finishes the task is met.
3. The method for accurate personalized recommendation based on purchasing information as claimed in claim 1, wherein the step of obtaining the information of the supplier and the purchaser in step S1 further comprises: step S12, obtaining the information of the supplier and the buyer from the existing system data, specifically including the following steps:
step S121, registering a supplier and a buyer to become a system user;
and step S122, the registered suppliers and the buyers supplement the corresponding basic information, including the purchasing information issued by the buyers, the product information of the suppliers, and the characteristic data, preference and classification information of the buyers and the suppliers.
4. The method for accurate personalized recommendation based on procurement information of claim 2, characterized in that the step S11 of collecting information from the internet specifically comprises the following steps:
step S111, compiling a crawler program with the ability of bypassing the anti-crawler;
and step S112, acquiring supplier information and purchasing information data from the Internet through a crawler program.
5. The method as claimed in claim 4, wherein the step S112 specifically includes the following steps:
step S1121, selecting a seed file to be searched in the seed set, and selecting a URL from the seed file and starting crawling work by the distributed web crawler;
step S1122, after the WEB crawler program obtains the URL, establishing an Http link with a related WEB server according to the URL, if the link is successful, entering step S1123, and if the link is unsuccessful, marking the link;
step S1123, capturing the page by using an Http protocol;
step S1124, comprehensively analyzing the captured page to extract effective key information;
step S1125, if the analyzed webpage contains repeated URL links, filtering the repeated URLs;
step S1126, continuously saving the filtered URL link to a URL link library to prepare for crawling a webpage for a web crawler at the next stage;
and step S1127, crawling is carried out according to the updated URL, whether the crawling stopping condition set by the user is met or not is judged, if yes, the crawling is stopped, and if not, the crawling is executed in a circulating mode all the time.
6. The method as claimed in claim 1, wherein the data preprocessing in step S2 includes data cleaning, Chinese word segmentation, part-of-speech tagging, and stop word filtering, and comprises the following steps:
step S21, data cleaning: filtering useless information in the acquired information preliminarily, reserving the useful information, and finally reserving a text set only containing the feature words;
step S22, performing word segmentation and part-of-speech tagging: taking words meeting the specified part of speech as candidate words;
step S23, calculating TF-IDF value of each word;
and step S24, according to the TF-IDF value descending order of each word, collecting candidate labels and outputting the possible keywords with the specified number.
7. The method for accurate personalized recommendation based on procurement information of claim 1, wherein the label extraction in step S3 is specifically performed by methods including a term-frequency-based method and a support-vector-machine-based method, comprising the following steps:
step S31, obtaining a user attribute database and a candidate item set;
step S32, extracting the characteristics of the user attribute database through the characteristic vector, and obtaining the related recommendation of the initial characteristic article from the candidate article set;
and step S33, determining a final recommendation result by combining the characteristics of candidate item set filtering, ranking and recommendation interpretation selection.
8. The method of claim 1, wherein the collaborative filtering based recommendation in step S3 includes a supplier-based collaborative filtering algorithm recommendation and a procurement information-based collaborative filtering algorithm recommendation, wherein:
the supplier-based collaborative filtering algorithm recommendation specifically comprises: when a supplier is newly added, recommending the data which is in the data storage of the step S1 and is interested by the suppliers with the same industry and similar operation range to the supplier;
the collaborative filtering algorithm recommendation based on the purchase information specifically comprises: based on the purchase data the supplier was previously interested in, data in the data storage of step S1 that has the same tag as that purchase data is also recommended to the supplier.
9. The method as claimed in claim 8, wherein the content-based recommendation in step S3 is implemented by constructing a supplier preference document, calculating the similarity between the purchasing information to be recommended and the supplier preference document, and recommending the most similar purchasing information to the supplier.
10. The accurate personalized recommendation method based on procurement information of claim 9, characterized in that the recommendation algorithm that best meets the requirements in step S3 is specifically:
when the data amount in the data storage is not large in step S1, a single algorithm may be used to obtain a corresponding supplier data match;
when the data volume in the data storage is larger in step S1, a rough recommendation result is generated by the supplier-based collaborative filtering algorithm, then the purchasing information-based collaborative filtering algorithm is used for removing and further refining, and finally the content-based collaborative filtering algorithm is used for making a more accurate recommendation on the basis of the previous recommendation result.
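The staged selection described in claim 10 can be illustrated as a cascade in which each stage scores the surviving candidates and keeps only the best ones. This is a sketch under stated assumptions: the size threshold, the per-stage cut-offs, the choice of the last stage as the "single algorithm", and the toy scores are hypothetical, since the claim does not quantify them.

```python
def cascade(candidates, stages, keep):
    """Apply scoring stages in order, keeping the top keep[i] after stage i."""
    pool = list(candidates)
    for score_fn, k in zip(stages, keep):
        # Highest score first; ties broken by id for a stable order.
        pool = sorted(pool, key=lambda c: (-score_fn(c), c))[:k]
    return pool


def recommend(candidates, stages, keep, large_threshold=1000):
    """Use a single stage for small data, the full cascade otherwise.

    large_threshold is an assumed cut-off for what counts as "large".
    """
    if len(candidates) < large_threshold:
        return cascade(candidates, stages[-1:], keep[-1:])
    return cascade(candidates, stages, keep)


# Toy scores standing in for the three algorithms named in the claim.
supplier_based = {"p1": 0.9, "p2": 0.8, "p3": 0.7, "p4": 0.1}
purchase_based = {"p1": 1, "p2": 3, "p3": 2, "p4": 5}
content_based = {"p1": 0.2, "p2": 0.4, "p3": 0.9, "p4": 1.0}
stages = [supplier_based.get, purchase_based.get, content_based.get]
keep = [3, 2, 1]

full = recommend(list(supplier_based), stages, keep, large_threshold=2)
single = recommend(list(supplier_based), stages, keep, large_threshold=100)
```

With these toy scores the cascade discards `p4` in the coarse first stage, so the final answer differs from what the last stage alone would pick; that trade-off is what the coarse-to-fine ordering implies.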
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011417355.2A CN112380457A (en) | 2020-12-07 | 2020-12-07 | Accurate personalized recommendation method based on purchase information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112380457A (en) | 2021-02-19 |
Family
ID=74590625
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011417355.2A Pending CN112380457A (en) | 2020-12-07 | 2020-12-07 | Accurate personalized recommendation method based on purchase information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112380457A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113239319A (en) * | 2021-05-17 | 2021-08-10 | 云工工业科技(深圳)有限公司 | Method for automatically matching and pushing supplier to bid and quote |
CN113420231A (en) * | 2021-05-25 | 2021-09-21 | 国网浙江省电力有限公司物资分公司 | Data recommendation algorithm applied to purchasing system |
CN114387010A (en) * | 2021-12-07 | 2022-04-22 | 北京隆道网络科技有限公司 | Information pushing method and system based on supply chain management |
CN116680268A (en) * | 2023-06-09 | 2023-09-01 | 四川观想科技股份有限公司 | Intelligent equipment full life cycle comprehensive guarantee data management method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104156450A (en) * | 2014-08-15 | 2014-11-19 | 同济大学 | Item information recommending method based on user network data |
CN108256024A (en) * | 2018-01-10 | 2018-07-06 | 链家网(北京)科技有限公司 | A kind of source of houses recommends method |
CN108960986A (en) * | 2018-06-26 | 2018-12-07 | 西安交通大学 | A kind of supplier's recommended method based on web crawlers |
CN109767292A (en) * | 2018-12-20 | 2019-05-17 | 厦门笨鸟电子商务有限公司 | A kind of buyer company recommended method |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113239319A (en) * | 2021-05-17 | 2021-08-10 | 云工工业科技(深圳)有限公司 | Method for automatically matching and pushing supplier to bid and quote |
CN113420231A (en) * | 2021-05-25 | 2021-09-21 | 国网浙江省电力有限公司物资分公司 | Data recommendation algorithm applied to purchasing system |
CN114387010A (en) * | 2021-12-07 | 2022-04-22 | 北京隆道网络科技有限公司 | Information pushing method and system based on supply chain management |
CN114387010B (en) * | 2021-12-07 | 2022-07-12 | 北京隆道网络科技有限公司 | Information pushing method and system based on supply chain management |
CN116680268A (en) * | 2023-06-09 | 2023-09-01 | 四川观想科技股份有限公司 | Intelligent equipment full life cycle comprehensive guarantee data management method |
CN116680268B (en) * | 2023-06-09 | 2024-02-27 | 四川观想科技股份有限公司 | Intelligent equipment full life cycle comprehensive guarantee data management method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210219 |