CN110263022A - Hotel's data matching method and device - Google Patents


Info

Publication number
CN110263022A
Authority
CN
China
Prior art keywords
hotel
data
matching
platform
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910380145.1A
Other languages
Chinese (zh)
Other versions
CN110263022B (en)
Inventor
殷际峰
乔扬
杨德峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Silk Road Tiandi Electronic Commerce Co Ltd
Original Assignee
Shenzhen Silk Road Tiandi Electronic Commerce Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Silk Road Tiandi Electronic Commerce Co Ltd
Priority to CN201910380145.1A
Publication of CN110263022A
Application granted
Publication of CN110263022B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/12 Hotels or restaurants
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Quality & Reliability (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a hotel data matching method and device in the field of hotel data management. The method acquires platform hotel data from different platforms and processes it to obtain original hotel data, which is written to an original hotel data index. According to platform priority, the original hotel data of the first platform is selected and written to a matching hotel index table; hotels are then matched in turn against the original hotel data of the remaining platforms according to the similarity of matching attributes, yielding the matching hotel index table. A matching table and a basic information table are generated from the matching hotel index table, and matching accuracy is verified using both tables. The invention builds its indexes in ElasticSearch (ES), applies reverse verification, and then checks the result with a similarity algorithm. Execution is efficient, poorly matched hotel links can be filtered out, and hotels from multiple platforms can be matched automatically with a matching accuracy above 90%, which reduces the share of manual work and improves hotel matching efficiency.

Description

Hotel data matching method and device
Technical field
The present invention relates to the field of hotel data management, and in particular to a hotel data matching method and device.
Background
There are many travel platforms today, and the same hotel may appear on several of them, so online travel websites need to match hotels in order to merge records that refer to the same property. In existing hotel matching schemes, the mainstream approach parses the name and address, adds further dimensions such as telephone number, scores each dimension, and computes a weighted sum using preset weights to obtain a final similarity. Whether two hotel POIs are linked as the same hotel is then decided against a preset similarity threshold. However, the share of manual work in this process remains high. For example, in the Meituan hotel and travel scenario, 60% or more of POI entity-link matching tasks are processed fully automatically, while the remaining POI entity links (up to 40%) require manual handling, mainly deduplication and later verification. This consumes human resources and reduces matching efficiency.
A hotel matching method is therefore needed that reduces the share of manual work and frees up human resources, thereby improving matching efficiency.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, the object of the invention is to provide a hotel matching method and device that improve matching efficiency.
The technical solution adopted by the invention is as follows:
In a first aspect, the invention provides a hotel data matching method that uses ElasticSearch to carry out the matching process, comprising:
acquiring platform hotel data from different platforms, processing the platform hotel data to obtain original hotel data, and writing the original hotel data to an original hotel data index;
selecting the original hotel data of the first platform according to a preset platform priority order and writing it to a matching hotel index table, then matching hotels one by one against the original hotel data of the remaining platforms according to the similarity of matching attributes, to obtain the matching hotel index table;
generating a matching table and a basic information table from the matching hotel index table, the basic information table including the hotel standard number and the original numbers on the different platforms;
verifying matching accuracy using the matching table and the basic information table.
Further, the hotel matching process is specifically:
The original hotel data of the first platform is queried record by record in the original hotel data index. When the original hotel data of another platform with the highest similarity is found, that data is reverse-verified in the original hotel data index; if the reverse query returns original hotel data of the other platform, the hotel numbers of the two records of that platform (before and after the reverse query) are compared for consistency;
if the two hotel numbers are consistent and the similarity of the matching attributes is greater than a first similarity threshold, the hotel number of the other platform's original hotel data is written into the matching hotel number of the first platform's original hotel data in the matching hotel index table; otherwise, the other platform's original hotel data is written to the matching hotel index table as a new record.
Further, the matching accuracy verification is specifically:
searching the matching table for hotels with identical names and identical addresses, and taking offline those with incorrect latitude/longitude and those matched on few platforms;
and/or randomly selecting a preset number of hotels from the matching table as verification hotels, calculating the name and address similarity of each verification hotel in turn in the matching hotel index table, and assessing matching quality from the results.
Further, the data processing includes data cleaning and/or per-platform data verification and/or data deduplication.
Further, the data cleaning includes:
distinguishing Chinese hotels from international hotels in the platform hotel data using a reverse geocoding function;
and cross-checking the reverse geocoding functions of different providers so that hotel coordinates correspond to hotel addresses.
Further, the per-platform data verification includes:
verifying whether the latitude and longitude of each hotel in the platform hotel data fall within a second preset range, and taking offline platform hotel data that does not;
verifying whether the name, address, and latitude/longitude fields of each hotel in the platform hotel data are missing content, and taking the corresponding hotel data offline if so;
verifying whether the star rating and/or score of each hotel in the platform hotel data falls within a third preset range, and correcting hotels that do not by normalizing their data.
Further, the data deduplication includes:
deduplicating hotels with identical names and addresses;
and/or grouping hotels whose names are identical and whose addresses lie within a fourth preset range into a duplicate hotel library, fetching prices for the hotels in each group at preset time intervals, and taking offline the hotels for which no price is obtained;
if no hotel in a group has a price, the group is stored in a standby hotel library and prices are fetched periodically; once a price is obtained, the hotel with the price is activated and re-enters the platform hotel data.
In a second aspect, the invention provides a hotel data matching device, comprising:
a platform hotel data crawling device, configured to acquire platform hotel data from different platforms, process the platform hotel data to obtain original hotel data, and write the original hotel data to an original hotel data index;
a hotel matching device, configured to select the original hotel data of the first platform according to a preset platform priority order, write it to a matching hotel index table, and match hotels one by one against the original hotel data of the remaining platforms according to the similarity of matching attributes, to obtain the matching hotel index table;
a matching table and basic information table generating device, configured to generate a matching table and a basic information table from the matching hotel index table, the basic information table including the hotel standard number and the original numbers on the different platforms;
a matching accuracy verification device, configured to verify matching accuracy using the matching table and the basic information table.
In a third aspect, the invention provides hotel data matching equipment, comprising:
at least one processor, and a memory communicatively connected to the at least one processor;
wherein the processor, by calling a computer program stored in the memory, executes the hotel data matching method of any one of the first aspect.
In a fourth aspect, the invention provides a computer-readable storage medium storing computer-executable instructions that cause a computer to execute the method of any one of the first aspect.
The beneficial effects of the embodiments of the invention are as follows:
The embodiments of the invention acquire a large amount of platform hotel data from different platforms and process it to obtain original hotel data, which is written to the original hotel data index. Then, according to platform priority, the original hotel data of the first platform is selected and written to the matching hotel index table, and hotels are matched in turn against the original hotel data of the remaining platforms according to the similarity of the matching attributes, yielding the matching hotel index table. A matching table and a basic information table are generated from the matching hotel index table, and matching accuracy is verified on the resulting matching table. The invention builds indexes with ElasticSearch, applies reverse verification, and then checks results with a similarity algorithm. Execution is efficient and poorly matched hotel links can be filtered out. Case verification shows that hotels from multiple platforms can be matched fully automatically with an automatic matching accuracy above 90%, so only a small amount of human effort is needed to complete the matching process, which reduces the share of manual work and improves hotel matching efficiency.
The invention is applicable to various types of matching or price comparison projects.
Brief description of the drawings
Fig. 1 is an implementation flowchart of the hotel data matching method of Embodiment 1 of the invention;
Fig. 2 is a schematic diagram of the original hotel data index and the matching hotel index table of the hotel data matching method of Embodiment 1 of the invention;
Fig. 3 is a schematic diagram of the overall implementation framework of the hotel data matching method of Embodiment 1 of the invention;
Fig. 4 is a structural block diagram of the hotel data matching device of Embodiment 2 of the invention.
Detailed description of the embodiments
To explain the embodiments of the invention or the technical solutions in the prior art more clearly, specific embodiments of the invention are described below with reference to the drawings. Evidently, the drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings and other embodiments from them without creative effort.
Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the technical field of the invention. The terms used in this specification are intended only to describe specific embodiments and are not intended to limit the invention.
Embodiment one:
Embodiment 1 of the invention provides a hotel data matching method. Fig. 1 is an implementation flowchart of the method. As shown in Fig. 1, the method uses ElasticSearch (denoted ES) to carry out the matching process; in this embodiment it could also be implemented with other tools such as MongoDB, which is not described further here. The method specifically includes the following steps:
S1: Obtain original hotel data from multiple platforms. Specifically, a large amount of platform hotel data is obtained from different platforms, such as Ctrip, eLong, Road Trip, and Qunar, either through a crawling platform or through the platforms' API interfaces. The platform hotel data is processed to obtain original hotel data, which is written to the original hotel data index in ES, denoted all_plats_in.
In practice, hotel data obtained from the list pages of the crawling platform has been found to be of higher quality with less duplicate data, so the preferred approach in this embodiment is to use the crawling platform.
S2: Perform hotel matching. Specifically, according to the preset platform priority, the original hotel data of the first platform is selected and written to ES to obtain a matching hotel index table, denoted all_hotel_in. Hotels are then matched in turn against the original hotel data of the remaining platforms according to the similarity of the matching attributes, yielding the matching hotel index table.
The preset platform priority ranks the data of the different platforms and can be modified as the business requires. For example, Ctrip may be set as the first platform, that is, the most important business platform, so that Ctrip's hotel data is the original hotel data with first platform priority.
In this embodiment, the matching attributes include name, address, latitude/longitude, phone, postcode, and so on. Weights and priorities are assigned to the matching attributes according to business needs. For example, in one implementation of this embodiment, latitude/longitude is used as the screening condition (restricting candidates to hotels within 3 kilometers), followed by name, address, phone, and postcode. When computing matching attribute similarity, ES searches the hotel data of the other platforms and scores the hotels found to obtain a similarity, which determines the hotel data with the highest similarity.
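The screening and scoring described above can be illustrated outside ES with a minimal Python sketch. It is not the scoring ES performs: the weights in WEIGHTS, the field names, and the use of difflib for text similarity are all assumptions made for illustration; only the 3-kilometer screening radius and the attribute order come from the text.

```python
import math
from difflib import SequenceMatcher

# Hypothetical weights; the text only says weights are set per business need.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.1, "postcode": 0.1}
MAX_DISTANCE_KM = 3.0  # screening radius mentioned in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def text_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]; missing values score 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(h1, h2):
    """Return 0.0 for hotels more than 3 km apart, else a weighted similarity."""
    if haversine_km(h1["lat"], h1["lon"], h2["lat"], h2["lon"]) > MAX_DISTANCE_KM:
        return 0.0
    return sum(w * text_similarity(h1.get(k), h2.get(k)) for k, w in WEIGHTS.items())
```

Using the distance check as a hard filter rather than a weighted term mirrors the text's description of latitude/longitude as a screening condition.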
S3: Generate a matching table (denoted hotel_map_in) and a basic information table (hotel_in) in ES from the matching hotel index table. The matching table hotel_map_in contains the standard number of each hotel and the original numbers of the matched hotels on the different platforms; the standard number is assigned to each hotel according to a numbering standard defined by business needs. The basic information table hotel_in is extracted from the matching hotel index table combined with each platform's original hotel data and contains each hotel's basic information, such as name, address, phone, star rating, number of matches, latitude/longitude, and regional information. The number of matches can later serve as a factor when recommending hotels to users.
The regional information is obtained as follows: according to the preset platform priority, basic information is extracted from each platform's original hotel data to generate a platform information table in ES. Based on the latitude and longitude in the platform information table, city information is saved to a database through the Google or Baidu reverse geocoding interface. After later cleanup and refinement, regional information such as city (city), region (state), and country (country) is extracted and added to the basic information table hotel_in.
S4: Verify matching accuracy using the matching table and the basic information table.
The detailed flow of the hotel data matching method of this embodiment is described below.
In S1, the data processing specifically includes data cleaning, per-platform data verification, and data deduplication.
1) Data cleaning includes:
Chinese hotels and international hotels in the platform hotel data are distinguished using a reverse geocoding function, and the reverse geocoding functions of different providers are cross-checked so that hotel coordinates correspond to hotel addresses. This is needed because the platforms' original hotel data may contain some dirty data with problematic latitude/longitude or address information, which would cause hidden problems in subsequent matching if left unprocessed. In one implementation of this embodiment, the reverse geocoding functions of Baidu and Google can be cross-checked against each other.
2) Per-platform data verification includes:
Verify whether the latitude and longitude of each hotel in the platform hotel data fall within the second preset range, and take offline platform hotel data that does not. In this embodiment, the second preset range optionally requires that latitude lie between -90 and 90, that longitude lie between -180 and 180, and that latitude and longitude are not both between -1 and 1, since positions with both values between -1 and 1 are theoretically not on land and therefore contain no hotels. In this embodiment, the second preset range can be further refined using detailed data on land masses, continents, and countries in order to catch more data problems.
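As a rough illustration, the second preset range can be expressed as a single predicate. The sketch below assumes that the "-1 to 1" rule rejects a record only when both latitude and longitude fall in that band; the function name is hypothetical.

```python
def is_valid_coordinate(lat, lon):
    """Check a hotel's coordinates against the second preset range.

    Assumed reading of the text: a record is rejected only when BOTH
    latitude and longitude lie between -1 and 1 (open ocean near 0, 0).
    """
    if lat is None or lon is None:
        return False
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return False
    if -1.0 <= lat <= 1.0 and -1.0 <= lon <= 1.0:
        return False
    return True
```

Records for which this returns False would be taken offline rather than passed on to matching.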
Verify whether the name, address, and latitude/longitude fields of each hotel in the platform hotel data are missing content, and take the corresponding hotel data offline if so. The hotels here include both domestic and international hotels: the name covers Chinese and foreign-language names, the address covers Chinese and foreign-language addresses, and the latitude/longitude field covers both the Chinese and the foreign-language latitude/longitude fields. Because translation can make foreign-language hotel names inconsistent, some hotel data has Chinese characters mixed into the foreign-language name and address fields; such details need attention during verification.
Verify whether the star rating or score of each hotel in the platform hotel data falls within the third preset range, whether any field is missing, and whether all formats meet the requirements; hotels that do not meet the third preset range are corrected by normalizing their data. In this embodiment, the third preset range optionally requires that the star rating or score be unified to between 0 and 5.
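A minimal sketch of the two checks above follows. The field names in REQUIRED_FIELDS and the source_max parameter (the scale a platform uses for its ratings) are assumptions for illustration; the text only states that ratings are unified to the range 0 to 5.

```python
REQUIRED_FIELDS = ("name", "address", "lat", "lon")  # hypothetical field names


def has_required_fields(hotel):
    """True if the name, address, and coordinate fields are all present and non-empty."""
    return all(hotel.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def normalize_rating(value, source_max=5.0):
    """Rescale a rating from a 0..source_max scale to 0..5 and clamp it."""
    if source_max <= 0:
        raise ValueError("source_max must be positive")
    scaled = float(value) * 5.0 / source_max
    return max(0.0, min(5.0, scaled))
```

For example, a score of 8 on a platform that rates out of 10 would be stored as 4.0, and out-of-range values are clamped rather than discarded.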
3) Data deduplication includes:
Hotels with identical names and addresses are deduplicated. In this embodiment, domestic hotels on the same platform whose names and addresses are completely identical are deduplicated and placed in the duplicate hotel library. International hotels whose foreign-language names and foreign-language addresses are identical are likewise deduplicated and placed in the duplicate hotel library.
Hotels whose names are identical and whose addresses lie within the fourth preset range are grouped into the duplicate hotel library. For the hotels in each group, prices are fetched at preset time intervals, and hotels for which no price is obtained are taken offline.
If no hotel in a group has a price, the group is stored in the standby hotel library and prices are fetched periodically; once a price is obtained, the hotel with the price is activated and re-enters the platform hotel data.
In this embodiment, the fourth preset range is optionally within 3 kilometers by latitude/longitude (the parameter is continuously tuned according to actual matching results); that is, hotels with the same name whose coordinates lie within 3 kilometers of each other are placed in the duplicate hotel library. For each pair of duplicate hotels, prices are crawled for 5 future time intervals, chosen to avoid weekends and holidays. If one hotel of the pair yields a price and the other does not, the hotel without a price is taken offline and is not considered during matching; repeating the price query over several different time intervals can improve the accuracy of this decision. If both hotels have prices, they are checked manually and the duplicate is rejected. If neither hotel yields a price, the pair is stored in the standby hotel library, and prices are fetched periodically for hotels in that library; once a price is obtained, the hotel is activated and re-enters the matching process.
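The price-based handling of a duplicate pair reduces to a small decision table, sketched below. The function name and the returned labels are hypothetical; each argument is assumed to be the list of prices fetched for the sampled time intervals, with None wherever no price was returned.

```python
def resolve_duplicate_pair(prices_a, prices_b):
    """Decide what to do with a pair of suspected duplicate hotels.

    prices_a / prices_b: prices fetched for the sampled dates (None = no price).
    Returns one of: "manual_review", "offline_a", "offline_b", "standby".
    """
    a_has_price = any(p is not None for p in prices_a)
    b_has_price = any(p is not None for p in prices_b)
    if a_has_price and b_has_price:
        return "manual_review"  # both sellable: a person rejects the duplicate
    if a_has_price:
        return "offline_b"      # only A has a price: take B offline
    if b_has_price:
        return "offline_a"      # only B has a price: take A offline
    return "standby"            # neither: park the pair and retry later
```

Sampling several dates before deciding, as the text suggests, makes it less likely that a hotel is taken offline merely because it was sold out on one date.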
In S2, the hotel matching process is specifically:
S21: The original hotel data of the current platform is used to query the original hotel data of another platform in ES. After the hotel with the highest matching attribute similarity is found, reverse verification is performed. Specifically, the other platform's original hotel data is reverse-queried in all_plats_in; if original hotel data of the other platform is returned, the hotel numbers of that platform's records before and after the reverse query are compared for consistency.
For example, in one case, platform A has one hotel, named Beautiful Hotel, while platform B has a Beautiful Hotel and also a Fine Hotel. When platform B's Fine Hotel is queried against platform A's original hotel data, platform A's Beautiful Hotel is returned according to matching similarity. If platform A's Beautiful Hotel is then queried back against platform B, platform B's Beautiful Hotel is returned, so the hotel numbers before and after the round trip are inconsistent. Reverse verification is therefore needed to exclude this situation.
S22: After ES matching and reverse verification have provisionally identified two records as the same hotel, a further similarity check is performed outside ES. When the hotel numbers are consistent and the similarity of the matching attributes is greater than the preset first similarity threshold, the hotel number of the other platform's original hotel data is written into the matching hotel number (denoted hotel_id) of the first platform's original hotel data in the matching hotel index table. If the hotel numbers are inconsistent, the other platform's original hotel data is written to the matching hotel index table as a new record.
The similarity check is performed outside ES in this embodiment because, when ES retrieves hotels, it normally returns a result from the pool of data even if the two hotel records are actually barely related: after ES matching, the hotel data with the highest similarity is still output. In other words, even the hotel ES reports as most similar may have a low actual similarity. In particular, when each of two platforms has only one hotel within the 3-kilometer range, ES tends to match those two hotels together.
For example, platform A has a hotel named quality inn&suites and platform B has a hotel named holiday inn brownsville. The two pass ES matching and reverse verification but are in fact different hotels. Therefore, after reverse verification, a similarity check against the preset first similarity threshold is still required. In one implementation of this embodiment, the first similarity threshold is optionally 30%; this value is an empirical parameter obtained from a large amount of hotel matching and can be adjusted as the business requires.
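The round trip in S21 and the threshold check in S22 can be sketched as follows. This is an illustrative stand-in, not the ES queries the method uses: best_match replaces an ES search with a name-only difflib comparison, and the record layout (dicts with "id" and "name") is an assumption. Only the 30% threshold comes from the text.

```python
from difflib import SequenceMatcher

FIRST_SIMILARITY_THRESHOLD = 0.30  # empirical value quoted in the text


def similarity(a, b):
    """Name similarity in [0, 1]; a stand-in for the real attribute score."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(query, candidates):
    """Stand-in for an ES search: the candidate most similar to the query."""
    return max(candidates, key=lambda c: similarity(query["name"], c["name"]), default=None)


def verified_match(hotel, own_platform, other_platform):
    """Return the other platform's hotel only if the match survives both checks."""
    forward = best_match(hotel, other_platform)
    if forward is None:
        return None
    # Reverse verification: querying back must land on the hotel we started from.
    backward = best_match(forward, own_platform)
    if backward is None or backward["id"] != hotel["id"]:
        return None
    # Check outside the search engine: the top hit may still be a poor match.
    if similarity(hotel["name"], forward["name"]) <= FIRST_SIMILARITY_THRESHOLD:
        return None
    return forward
```

With platform A holding only Beautiful Hotel and platform B holding Beautiful Hotel and Fine Hotel, starting from B's Fine Hotel fails the reverse check, while starting from B's Beautiful Hotel returns A's Beautiful Hotel.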
S23: In the same way, the original hotel data of all remaining platforms is matched against all_hotel_in (which already holds the first platform's original hotel data). After the first platform's hotel data has been matched against the data of multiple platforms one by one, a hotel may have been matched to several hotels, so hotel_id may be followed by the hotel numbers of several different platforms.
Fig. 2 is a schematic diagram of the original hotel data index and the matching hotel index table of this embodiment. As the figure shows, all_plats_in stores the original hotel information of each platform. In the figure, the same hotel has hotel data on two platforms, eLong and Qunar; after matching, only the eLong hotel enters all_hotel_in, and the Qunar hotel id is written into hotel_id.
In actual use, all_plats_in may store about 4 to 5 million hotel records, many of which are duplicates from different platforms. After the hotel matching of this embodiment, the hotels entering all_hotel_in are reduced to somewhat over 1 million distinct hotels.
In S4, matching accuracy is verified using the matching table and the basic information table, specifically:
S41: Using the matching table and the basic information table, search for hotels with identical names and identical addresses, and take offline those with incorrect latitude/longitude and those matched on few platforms. The aim is to take offline hotels with incorrect coordinates that may have been missed earlier. For example, if two hotels with the same name and the same address have coordinates that differ by more than 6 kilometers, or even by tens of kilometers, one of them must be wrong. The error can be corrected by looking up the coordinates, or the hotel matched on fewer platforms can be taken offline.
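The S41 check can be sketched as a grouping pass followed by a pairwise distance test. The record layout and function names are assumptions for illustration; the 6-kilometer gap is the figure given in the text.

```python
import math
from collections import defaultdict
from itertools import combinations


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def find_coordinate_conflicts(hotels, max_gap_km=6.0):
    """Return id pairs of same-name, same-address hotels that lie too far apart."""
    groups = defaultdict(list)
    for hotel in hotels:
        key = (hotel["name"].strip().lower(), hotel["address"].strip().lower())
        groups[key].append(hotel)
    conflicts = []
    for group in groups.values():
        for a, b in combinations(group, 2):
            if haversine_km(a["lat"], a["lon"], b["lat"], b["lon"]) > max_gap_km:
                conflicts.append((a["id"], b["id"]))
    return conflicts
```

Each reported pair contains at least one record with wrong coordinates; which one to correct or take offline is decided separately, for example by the number of platforms each record was matched on.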
S42: Alternatively, randomly select a preset number of hotels from the matching table as verification hotels, calculate the name and address similarity of each verification hotel in turn in the matching hotel index table or the basic information table, and assess matching quality from the results.
In one implementation of this embodiment, 100 hotels are randomly sampled from the basic information table hotel_in and their similarity is queried through ES in the matching hotel index table all_hotel_in. If matching quality is poor, a single hotel is likely to return several highly similar hotels; if matching quality is good, highly similar hotels rarely appear in all_hotel_in. Matching quality can therefore be assessed from the results.
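A rough offline version of this spot check is sketched below. It replaces the ES similarity query with a difflib comparison over name and address, and the 0.9 cut-off for "highly similar" is a hypothetical value, not one given in the text; only the sample size of 100 is.

```python
import random
from difflib import SequenceMatcher


def record_similarity(a, b):
    """Average of name and address similarity, each in [0, 1]."""
    name = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    addr = SequenceMatcher(None, a["address"].lower(), b["address"].lower()).ratio()
    return (name + addr) / 2


def residual_duplicate_rate(index, sample_size=100, threshold=0.9, seed=None):
    """Fraction of sampled hotels that still have a near-duplicate in the index."""
    rng = random.Random(seed)
    sample = rng.sample(index, min(sample_size, len(index)))
    if not sample:
        return 0.0
    flagged = 0
    for hotel in sample:
        if any(other is not hotel and record_similarity(hotel, other) >= threshold
               for other in index):
            flagged += 1
    return flagged / len(sample)
```

A rate near zero suggests that few unmerged duplicates remain in the matched index, while a high rate indicates poor matching quality.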
Fig. 3 is a schematic diagram of the overall implementation framework of the hotel data matching method of this embodiment. As the figure shows, platform hotel data of multiple platforms are obtained by crawling the platforms or through the API interfaces of the different platforms. Data cleaning, per-platform data verification and data deduplication are then performed to obtain the original hotel data of the multiple platforms; during deduplication, some hotels may be stored in a duplicate hotel library or a spare hotel library. Hotel matching is then performed on the original hotel data to obtain the matching table and the basic information table, and finally matching accuracy verification is performed.
This embodiment builds indexes with ElasticSearch and verifies matches by combining a similarity algorithm with reverse verification. Execution is efficient, and poorly matched hotel links can be filtered out. Verification on real cases shows that hotels from multiple platforms can be matched fully automatically with an automatic matching accuracy above 90%, so only a small amount of manual supplementation is needed to complete the matching process, which reduces the share of manual work and improves hotel matching efficiency.
Embodiment two:
Embodiment two of the present invention provides a hotel data matching device. Fig. 4 is a structural block diagram of the hotel data matching device of this embodiment; as the figure shows, it comprises the following components:
a platform hotel data crawling device 10, configured to obtain the platform hotel data of different platforms, perform data processing on the platform hotel data to obtain original hotel data, and write the original hotel data into the original hotel data index;
a hotel matching device 20, configured to select the original hotel data of the first platform according to a preset platform priority order, write it into the matching hotel index table, and, according to the similarity of the matching attributes, perform hotel matching one by one against the original hotel data of the remaining platforms to obtain the matching hotel index table;
a matching table and basic information table generating device 30, configured to generate the matching table and the basic information table from the matching hotel index table, the basic information table including the hotel standard number and the original numbers on the different platforms;
a matching accuracy verification device 40, configured to perform matching accuracy verification in combination with the matching table and the basic information table.
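The behaviour of the hotel matching device 20 can be sketched in memory as follows. This is a loose illustration under stated assumptions, not the patented implementation: plain Python lists stand in for the ElasticSearch indexes, `difflib.SequenceMatcher` stands in for the similarity query, the matching attributes are assumed to be name and address, and the 0.8 threshold and all function and field names are hypothetical. Reverse verification is modelled as "the best candidate's own best match must be the hotel we started from".

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Average of name and address similarity (stand-in for an ES score)."""
    name = SequenceMatcher(None, a["name"], b["name"]).ratio()
    addr = SequenceMatcher(None, a["address"], b["address"]).ratio()
    return (name + addr) / 2


def best_match(hotel, candidates):
    """Return (most similar candidate, score), or (None, 0.0) if nothing scores."""
    best, best_score = None, 0.0
    for c in candidates:
        s = similarity(hotel, c)
        if s > best_score:
            best, best_score = c, s
    return best, best_score


def match_platforms(platform_data, priority, threshold=0.8):
    """Build a matching index: one entry per distinct hotel with per-platform ids."""
    first = priority[0]
    # Seed the matching index with the highest-priority platform.
    index = [{"hotel": h, "ids": {first: h["id"]}} for h in platform_data[first]]

    for platform in priority[1:]:
        candidates = platform_data[platform]
        snapshot = list(index)                      # entries present before this platform
        known = [e["hotel"] for e in snapshot]
        matched_ids = set()
        for entry in snapshot:
            cand, score = best_match(entry["hotel"], candidates)
            if cand is None:
                continue
            back, _ = best_match(cand, known)       # reverse verification
            if back is entry["hotel"] and score > threshold:
                entry["ids"][platform] = cand["id"]
                matched_ids.add(cand["id"])
        # Hotels of this platform that matched nothing become new entries.
        for c in candidates:
            if c["id"] not in matched_ids:
                index.append({"hotel": c, "ids": {platform: c["id"]}})
    return index
```

Each returned entry plays the role of one row of the matching hotel index table: the `ids` mapping holds the original number of the hotel on every platform where it was matched.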
In addition, the present invention also provides a hotel data matching apparatus, comprising:
at least one processor, and a memory communicatively connected to the at least one processor;
wherein the processor, by calling a computer program stored in the memory, executes the method described in embodiment one.
In addition, the present invention also provides a computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions cause a computer to execute the method described in embodiment one.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and they shall all be covered by the scope of the claims and the description of the present invention.

Claims (10)

1. A hotel data matching method, characterized in that the matching process is executed using ElasticSearch, the method comprising:
obtaining platform hotel data of different platforms, performing data processing on the platform hotel data to obtain original hotel data, and writing the original hotel data into an original hotel data index;
selecting the original hotel data of a first platform according to a preset platform priority order, writing it into a matching hotel index table, and, according to the similarity of matching attributes, performing hotel matching one by one against the original hotel data of the remaining platforms to obtain the matching hotel index table;
generating a matching table and a basic information table from the matching hotel index table, the basic information table including a hotel standard number and the original numbers on the different platforms;
performing matching accuracy verification in combination with the matching table and the basic information table.
2. The hotel data matching method according to claim 1, characterized in that the hotel matching process is specifically:
querying the original hotel data of the first platform one by one in the original hotel data index until the original hotel data of another platform with the highest similarity is found, then reverse-verifying the original hotel data of the other platform in the original hotel data index, and, if original hotel data of the other platform is found, comparing whether the hotel numbers of the two pieces of original hotel data of the other platform are consistent;
if the two hotel numbers are consistent and the similarity of the matching attributes is greater than a first similarity threshold, writing the hotel number of the original hotel data of the other platform into the matching hotel number of the original hotel data of the first platform in the matching hotel index table; otherwise, writing the original hotel data of the other platform into the matching hotel index table as a new record.
3. The hotel data matching method according to claim 1, characterized in that the matching accuracy verification is specifically:
finding, from the matching table, hotels with identical names and identical addresses, and taking offline the hotels whose longitude and latitude are wrong and the hotels matched on few platforms;
and/or randomly selecting a preset number of hotels from the matching table as verification hotels, calculating in turn the name and address similarity of each verification hotel in the matching hotel index table, and assessing the matching quality according to the calculation results.
4. The hotel data matching method according to claim 1, characterized in that the data processing comprises: data cleaning and/or per-platform data verification and/or data deduplication.
5. The hotel data matching method according to claim 4, characterized in that the data cleaning comprises:
distinguishing hotels in China from international hotels in the platform hotel data according to a coordinate reverse lookup (reverse geocoding) function;
and cross-verifying with the coordinate reverse lookup functions of different providers, so that the hotel coordinates correspond to the hotel address.
6. The hotel data matching method according to claim 5, characterized in that the per-platform data verification comprises:
verifying whether the longitude and latitude of each hotel in the platform hotel data fall within a second preset range, and taking offline the platform hotel data that do not fall within the second preset range;
verifying whether the name, address, and longitude and latitude fields of each hotel in the platform hotel data have missing content, and, if content is missing, taking the corresponding hotel data offline;
verifying whether the star rating and/or score of each hotel in the platform hotel data fall within a third preset range, and modifying the hotels that do not fall within the third preset range by performing data normalization on them.
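The three checks recited in claim 6 can be illustrated with a short sketch. The concrete ranges are assumptions: the claim leaves the second and third preset ranges unspecified, so the valid geographic range and a 0 to 5 star scale are used here as placeholders, and clamping is only one possible form of normalization. The function and field names are hypothetical.

```python
def verify_hotel(hotel, lat_range=(-90.0, 90.0), lon_range=(-180.0, 180.0), star_range=(0, 5)):
    """Return a cleaned copy of the hotel, or None if it should be taken offline."""
    # Missing-content check: name, address and coordinates must all be present.
    for field in ("name", "address", "lat", "lon"):
        if hotel.get(field) in (None, ""):
            return None

    # Coordinate range check (the "second preset range", assumed here).
    if not (lat_range[0] <= hotel["lat"] <= lat_range[1]
            and lon_range[0] <= hotel["lon"] <= lon_range[1]):
        return None

    # Star rating check (the "third preset range", assumed here): out-of-range
    # values are normalized by clamping rather than taking the hotel offline.
    cleaned = dict(hotel)
    star = cleaned.get("star")
    if star is not None:
        cleaned["star"] = min(max(star, star_range[0]), star_range[1])
    return cleaned
```

A caller would keep every hotel for which the function returns a record and take the rest offline; a narrower coordinate range (for example a bounding box for one country) can be passed through `lat_range` and `lon_range`.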
7. The hotel data matching method according to any one of claims 5 to 6, characterized in that the data deduplication comprises:
deduplicating hotels with identical names and identical addresses, and placing them by group into a duplicate hotel library;
and/or placing hotels with identical names whose addresses lie within a fourth preset range into the duplicate hotel library by group, obtaining prices at a preset time interval for the hotels of the same group in the duplicate hotel library, and taking offline the hotels for which no price is obtained;
if none of the hotels in the same group has a price, storing them in a spare hotel library and periodically attempting to obtain prices; once a price is obtained, activating the hotel that has a price and re-entering it into the platform hotel data.
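The grouping and price-based handling in claim 7 can be sketched as a single pass. Several simplifications are assumed: only the exact name-and-address variant is grouped (the "fourth preset range" variant is omitted), the periodic re-check is left to the caller, hotels that do have a price are assumed to stay active, and `get_price` is a hypothetical callback that returns `None` when no price can be obtained.

```python
from collections import defaultdict


def dedupe_by_price(hotels, get_price):
    """Split hotels into (active, offline, spare) lists.

    Hotels sharing a name and address form a duplicate group. Within a group,
    hotels without a price go offline if some group member has a price; if no
    member has a price, the whole group is parked in the spare library.
    """
    groups = defaultdict(list)
    for h in hotels:
        groups[(h["name"], h["address"])].append(h)

    active, offline, spare = [], [], []
    for group in groups.values():
        if len(group) == 1:
            active.extend(group)  # not a duplicate, nothing to resolve
            continue
        priced, unpriced = [], []
        for h in group:
            (priced if get_price(h) is not None else unpriced).append(h)
        if priced:
            active.extend(priced)
            offline.extend(unpriced)
        else:
            spare.extend(unpriced)  # re-check later; reactivate once priced
    return active, offline, spare
```

Running the function again over the spare list at a later time models the periodic price acquisition: any group that has gained a priced member moves back to the active list.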
8. A hotel data matching device, characterized by comprising:
a platform hotel data crawling device, configured to obtain platform hotel data of different platforms, perform data processing on the platform hotel data to obtain original hotel data, and write the original hotel data into an original hotel data index;
a hotel matching device, configured to select the original hotel data of a first platform according to a preset platform priority order, write it into a matching hotel index table, and, according to the similarity of matching attributes, perform hotel matching one by one against the original hotel data of the remaining platforms to obtain the matching hotel index table;
a matching table and basic information table generating device, configured to generate a matching table and a basic information table from the matching hotel index table, the basic information table including a hotel standard number and the original numbers on the different platforms;
a matching accuracy verification device, configured to perform matching accuracy verification in combination with the matching table and the basic information table.
9. A hotel data matching apparatus, characterized by comprising:
at least one processor; and a memory communicatively connected to the at least one processor;
wherein the processor, by calling a computer program stored in the memory, executes the hotel data matching method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions, the computer-executable instructions causing a computer to execute the hotel data matching method according to any one of claims 1 to 7.
CN201910380145.1A 2019-05-08 2019-05-08 Hotel data matching method and device Active CN110263022B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910380145.1A CN110263022B (en) 2019-05-08 2019-05-08 Hotel data matching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910380145.1A CN110263022B (en) 2019-05-08 2019-05-08 Hotel data matching method and device

Publications (2)

Publication Number Publication Date
CN110263022A true CN110263022A (en) 2019-09-20
CN110263022B CN110263022B (en) 2023-03-14

Family

ID=67914352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910380145.1A Active CN110263022B (en) 2019-05-08 2019-05-08 Hotel data matching method and device

Country Status (1)

Country Link
CN (1) CN110263022B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325638A (en) * 2020-02-10 2020-06-23 北京蚂蜂窝网络科技有限公司 Hotel identification processing method, device, equipment and storage medium
CN111639253A (en) * 2020-05-22 2020-09-08 北京百度网讯科技有限公司 Data duplication judging method, device, equipment and storage medium
CN113361920A (en) * 2021-06-04 2021-09-07 上海华客信息科技有限公司 Hotel service optimization index recommendation method, system, equipment and storage medium
CN113628003A (en) * 2021-07-22 2021-11-09 上海泛宥信息科技有限公司 Hotel matching method, system, terminal and storage medium
CN114358979A (en) * 2022-01-12 2022-04-15 平安科技(深圳)有限公司 Hotel matching method and device, electronic equipment and storage medium
CN114860771A (en) * 2022-04-28 2022-08-05 北京合思信息技术有限公司 Hotel information data processing method and device

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050149507A1 (en) * 2003-02-05 2005-07-07 Nye Timothy G. Systems and methods for identifying an internet resource address
CN104158885A (en) * 2014-08-21 2014-11-19 中南大学 Method and system of streaming loading of application based on location information
CN104751232A (en) * 2015-04-27 2015-07-01 携程计算机技术(上海)有限公司 Automatic matching method for hotels
CN104778637A (en) * 2014-01-10 2015-07-15 携程计算机技术(上海)有限公司 Hotel data processing system and method
CN104809141A (en) * 2014-01-29 2015-07-29 携程计算机技术(上海)有限公司 Matching system and method of hotel data
CN105205699A (en) * 2015-09-17 2015-12-30 北京众荟信息技术有限公司 User label and hotel label matching method and device based on hotel comments
CN105761173A (en) * 2016-02-26 2016-07-13 姜恒 Hotel self-service management method and system
CN106447881A (en) * 2016-12-27 2017-02-22 安恒世通(北京)网络科技有限公司 Self-help registration management system for apartments
CN107291939A (en) * 2017-07-06 2017-10-24 携程计算机技术(上海)有限公司 The clustering match method and system of hotel information
CN107316231A (en) * 2017-06-27 2017-11-03 携程计算机技术(上海)有限公司 Hotel's pricing information method for pushing, system and storage medium

Also Published As

Publication number Publication date
CN110263022B (en) 2023-03-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant