CN110263022A - Hotel data matching method and device - Google Patents
Hotel data matching method and device
- Publication number
- CN110263022A (application No. CN201910380145.1A)
- Authority
- CN
- China
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/12—Hotels or restaurants
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Databases & Information Systems (AREA)
- Tourism & Hospitality (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Primary Health Care (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- Quality & Reliability (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a hotel data matching method and device, relating to the field of hotel data management. The method obtains platform hotel data from different platforms and processes them to obtain original hotel data, which are written into an original hotel data index. Then, according to platform priority, the original hotel data of the first platform are selected and written into a matched hotel index, and hotels are matched in turn against the original hotel data of the remaining platforms according to the similarity of matching attributes, yielding the matched hotel index. A matching table and a basic information table are generated from the matched hotel index, and matching accuracy is verified by combining the matching table and the basic information table. The invention builds indexes with ElasticSearch (ES), performs reverse verification, and then checks the results with a similarity algorithm. Execution is efficient, poorly matched hotel links can be filtered out, and hotels across multiple platforms can be matched automatically with a matching accuracy above 90%, which reduces the share of manual work and improves hotel matching efficiency.
Description
Technical field
The present invention relates to the field of hotel data management, and in particular to a hotel data matching method and device.
Background art
There are many online travel platforms, and the same hotel is likely to appear on several of them. Online travel websites therefore need to match hotels across platforms so that records describing the same hotel can be merged. In existing hotel matching schemes, the mainstream approach is to parse the name and address, add further dimensions such as the telephone number, score each dimension, and combine the scores as a weighted sum using preset weights to obtain a final similarity; whether two hotel POI links refer to the same hotel is then decided against a preset similarity threshold. However, the share of manual work in this matching process is still high. For example, in Meituan's hotel and travel business, 60% or more of POI entity-link matching tasks are handled fully automatically, while the remaining 40% of POI entity links require manual processing, mainly deduplication and later-stage verification, which consumes considerable human resources and limits matching efficiency.
Therefore, a hotel matching method is needed that reduces the share of manual work, frees up human resources, and thereby improves matching efficiency.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, an object of the present invention is to provide a hotel matching method and device that improve matching efficiency.
The technical solution adopted by the present invention is as follows:
In a first aspect, the present invention provides a hotel data matching method that performs the matching process with ElasticSearch, comprising:
obtaining platform hotel data from different platforms, processing the platform hotel data to obtain original hotel data, and writing the original hotel data into an original hotel data index;
according to a preset platform priority order, selecting the original hotel data of the first platform and writing them into a matched hotel index, then matching hotels one by one against the original hotel data of each remaining platform according to the similarity of matching attributes, to obtain the matched hotel index;
generating a matching table and a basic information table from the matched hotel index, the basic information table including hotel standard numbers and the original numbers on the different platforms;
performing matching accuracy verification by combining the matching table and the basic information table.
Further, the hotel matching process is specifically as follows:
querying the original hotel data of the first platform one by one in the original hotel data index; when the original hotel data of another platform with the highest similarity are found, performing a reverse query in the original hotel data index; if the reverse query returns original hotel data of the other platform, comparing whether the hotel numbers of the two other-platform records (the one originally queried and the one returned by the reverse query) are consistent;
if the two hotel numbers are consistent and the similarity of the matching attributes is greater than a first similarity threshold, writing the hotel number of the other platform's original hotel data into the matched hotel number of the first platform's original hotel data in the matched hotel index; otherwise, writing the other platform's original hotel data into the matched hotel index as a new record.
Further, the matching accuracy verification specifically comprises:
searching the matching table for hotels with the same name and the same address, and taking offline those with incorrect latitude/longitude and those matched on fewer platforms;
and/or randomly selecting a preset number of hotels from the matching table as verification hotels, calculating the name and address similarity of each verification hotel in turn in the matched hotel index, and evaluating matching quality according to the results.
Further, the data processing includes data cleansing and/or per-platform data verification and/or data deduplication.
Further, the data cleansing includes:
distinguishing domestic (Chinese) hotels from international hotels in the platform hotel data using a reverse geocoding (coordinate lookup) function;
and cross-checking with the reverse geocoding functions of different providers so that each hotel's coordinates correspond to its address.
Further, the per-platform data verification includes:
verifying whether the latitude and longitude of each hotel in the platform hotel data fall within a second preset range, and taking offline the platform hotel data that do not;
verifying whether the name, address, and latitude/longitude fields of each hotel in the platform hotel data have missing content, and if so, taking the corresponding hotel data offline;
verifying whether each hotel's star rating and/or score in the platform hotel data falls within a third preset range, and correcting hotels that do not by normalizing their data.
Further, the data deduplication includes:
deduplicating hotels with the same name and address;
and/or grouping hotels that have the same name and whose addresses lie within a fourth preset range into a duplicate hotel library, fetching prices for the hotels of each group in the duplicate hotel library at preset time intervals, and taking offline the hotels for which no price is obtained;
if no hotel in a group has a price, moving the group into a standby hotel library and fetching prices periodically; once a price is obtained, reactivating the hotel that has a price and returning it to the platform hotel data.
In a second aspect, the present invention provides a hotel data matching device, comprising:
a platform hotel data crawling device, configured to obtain the platform hotel data of different platforms, process the platform hotel data to obtain original hotel data, and write the original hotel data into the original hotel data index;
a hotel matching device, configured to select the original hotel data of the first platform according to the preset platform priority order, write them into the matched hotel index, and match hotels one by one against the original hotel data of the remaining platforms according to the similarity of matching attributes, to obtain the matched hotel index;
a matching table and basic information table generating device, configured to generate a matching table and a basic information table from the matched hotel index, the basic information table including hotel standard numbers and the original numbers on the different platforms;
a matching accuracy verification device, configured to perform matching accuracy verification by combining the matching table and the basic information table.
In a third aspect, the present invention provides a hotel data matching apparatus, comprising:
at least one processor, and a memory communicatively connected to the at least one processor;
wherein the processor, by calling a computer program stored in the memory, executes the hotel data matching method according to any implementation of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the method according to any implementation of the first aspect.
The beneficial effects of the embodiments of the present invention are as follows:
The embodiments obtain a large volume of platform hotel data from different platforms, process it to obtain original hotel data, and write the original hotel data into the original hotel data index. Then, according to platform priority, the original hotel data of the first platform are selected and written into the matched hotel index, and hotels are matched in turn against the original hotel data of the remaining platforms according to the similarity of matching attributes, yielding the matched hotel index. A matching table and a basic information table are generated from the matched hotel index, and matching accuracy is verified on the resulting matching table. The invention builds indexes with ElasticSearch, performs reverse verification, and then checks the results with a similarity algorithm; execution is efficient, and poorly matched hotel links can be filtered out. Case studies show that hotels across multiple platforms can be matched fully automatically with an automatic matching accuracy above 90%, so only a small amount of manual effort is needed to complete the matching process, which reduces the share of manual work and improves hotel matching efficiency.
The present invention is applicable to various kinds of matching or price-comparison projects.
Description of the drawings
Fig. 1 is an implementation flowchart of the hotel data matching method of Embodiment 1 of the present invention;
Fig. 2 is a schematic diagram of the original hotel data index and the matched hotel index in the hotel data matching method of Embodiment 1;
Fig. 3 is a schematic diagram of the overall implementation framework of the hotel data matching method of Embodiment 1;
Fig. 4 is a structural block diagram of the hotel data matching device of Embodiment 2.
Detailed description of embodiments
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, specific embodiments of the present invention are described below with reference to the drawings. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings and other embodiments from them without creative effort.
Unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the technical field to which the present invention belongs. The terms used in this specification serve only to describe specific embodiments and are not intended to limit the present invention.
Embodiment one:
Embodiment 1 of the present invention provides a hotel data matching method. Fig. 1 is an implementation flowchart of the hotel data matching method provided by this embodiment. As shown in Fig. 1, the method uses ElasticSearch (denoted ES) to perform the matching process; in this embodiment it can also be implemented with other tools such as MongoDB, which is not described further here. The method specifically includes the following steps:
S1: obtain the original hotel data of multiple platforms. Specifically, through a crawler platform or the API interfaces of the different platforms, a large volume of platform hotel data is obtained from platforms such as Ctrip, eLong, Road Trip, and Qunar; the platform hotel data are processed to obtain original hotel data, which are written into the original hotel data index of ES, denoted all_plats_in.
In practice, hotel data crawled from the platforms' list pages were found to be of higher quality, with little duplicate data; therefore, crawling is the preferred approach in this embodiment.
S2: perform hotel matching. Specifically, according to the preset platform priority, the original hotel data of the first platform are selected and written into ES to form the matched hotel index, denoted all_hotel_in; hotels are then matched in turn against the original hotel data of the remaining platforms according to the similarity of matching attributes, yielding the matched hotel index.
The preset platform priority is an ordering of the data of the different platforms and can be changed according to business needs. For example, Ctrip may be set as the first platform, i.e., the most important business platform, so that Ctrip's hotel data are the original hotel data ranked first in platform priority.
In this embodiment, the matching attributes include name, address, latitude/longitude, telephone, postal code, and so on, and the weights and priorities of the matching factors are set according to business needs. For example, in one implementation of this embodiment, latitude/longitude is used as the screening condition (restricting candidates to hotels within 3 kilometers), followed by name, address, telephone, and postal code. When calculating matching-attribute similarity, ES searches the hotel data of the other platform and scores the retrieved hotels to obtain their similarities, thereby determining the hotel data with the highest similarity.
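The screening-then-scoring order above can be expressed as a single Elasticsearch bool query: a hard `geo_distance` filter for the 3-kilometer radius plus boosted `match` clauses that only affect the score. The sketch below just builds the query body; the field names (`platform`, `location` mapped as `geo_point`, `phone`, `postcode`) and the boost values are hypothetical, since the text gives the attribute order but not the index mapping or weights.

```python
from typing import Any

# Hypothetical boosts: the text ranks name > address > phone > postcode
# but does not publish actual weights.
FIELD_BOOSTS = {"name": 4.0, "address": 3.0, "phone": 2.0, "postcode": 1.0}


def build_match_query(hotel: dict[str, Any], target_platform: str,
                      radius_km: float = 3.0, size: int = 5) -> dict[str, Any]:
    """Build an ES query body that finds candidate hotels on another platform.

    The geo_distance clause is a hard filter (hotels within radius_km); the
    weighted match clauses only influence the relevance score.
    """
    should = [
        {"match": {field: {"query": hotel[field], "boost": boost}}}
        for field, boost in FIELD_BOOSTS.items()
        if hotel.get(field)  # skip attributes this record does not have
    ]
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"platform": target_platform}},  # hypothetical keyword field
                    {"geo_distance": {
                        "distance": f"{radius_km:g}km",
                        "location": {"lat": hotel["lat"], "lon": hotel["lon"]},
                    }},
                ],
                "should": should,
                "minimum_should_match": 1,  # require at least one textual match
            }
        },
    }
```

The body would be sent to the `_search` endpoint of `all_plats_in`; the top hit is the highest-similarity candidate that the following steps then verify.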
S3: generate, in ES, a matching table (denoted hotel_map_in) and a basic information table (hotel_in) from the matched hotel index. The matching table hotel_map_in contains the standard number of each hotel and the original numbers of the matched hotel on the different platforms; the standard number is a hotel number that follows a numbering standard defined according to business needs. The basic information table hotel_in is extracted from the matched hotel index combined with each platform's original hotel data and contains basic information for each hotel, such as name, address, telephone, star rating, number of matches, latitude/longitude, and regional information; the number of matches can later serve as a factor when recommending hotels to users.
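As a rough illustration of how the two S3 tables relate to the matched hotel index, the sketch below flattens in-memory entries into one matching-table row per (standard number, platform, original number) and one basic-information row per hotel. The entry layout (`hotel_id`, `sources`, `original_id`, `match_count`) is an assumption for illustration, not the patent's schema, and only a few of the listed basic-information fields are carried over.

```python
from typing import Any


def build_tables(all_hotel_in: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Derive hotel_map_in (one row per platform record) and hotel_in (one row per hotel).

    Each all_hotel_in entry is assumed to look like
    {"hotel_id": ..., "name": ..., "address": ..., "sources": [(platform, original_id), ...]}.
    """
    hotel_map_in, hotel_in = [], []
    for entry in all_hotel_in:
        for platform, original_id in entry["sources"]:
            hotel_map_in.append({"hotel_id": entry["hotel_id"],
                                 "platform": platform,
                                 "original_id": original_id})
        hotel_in.append({"hotel_id": entry["hotel_id"],
                         "name": entry["name"],
                         "address": entry["address"],
                         # number of platforms matched; a possible recommendation signal
                         "match_count": len(entry["sources"])})
    return hotel_map_in, hotel_in
```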
The regional information is obtained as follows: according to the preset platform priority, basic information is extracted from each platform's original hotel data to generate a platform information table in ES; based on the latitude/longitude in the platform information table, city information is retrieved through the Google or Baidu reverse geocoding interface and saved to a database; after later tidying and refinement, regional information such as city, state (area), and country is extracted and added to the basic information table hotel_in.
S4: perform matching accuracy verification by combining the matching table and the basic information table.
The detailed process of the hotel data matching method of this embodiment is described below.
In S1, the data processing specifically includes data cleansing, per-platform data verification, and data deduplication.
1) Data cleansing includes:
Domestic (Chinese) hotels and international hotels in the platform hotel data are distinguished using a reverse geocoding function, and the reverse geocoding functions of different providers are cross-checked so that hotel coordinates correspond to hotel addresses. This is necessary because the platforms' original hotel data may contain dirty data with problematic latitude/longitude or address information, which, if left unprocessed, would cause problems later in the matching process. In one implementation of this embodiment, the reverse geocoding functions of Baidu and Google can be used to cross-check each other.
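One possible shape for this cross-check is sketched below. Each provider is modeled as a callable taking `(lat, lon)` and returning a dict with `country` and `city`; that signature is hypothetical, and real Baidu or Google reverse-geocoding clients would need a wrapper to fit it. The "city appears in the address" test is a deliberately simple consistency rule chosen for illustration.

```python
from typing import Callable, Optional

# Hypothetical provider signature: (lat, lon) -> {"country": ..., "city": ...} or None.
ReverseGeocoder = Callable[[float, float], Optional[dict]]


def classify_and_check(hotel: dict, providers: list[ReverseGeocoder]) -> tuple[str, bool]:
    """Return (scope, consistent) for one hotel record.

    scope is "domestic", "international", "conflict" (providers disagree on the
    country) or "unknown" (no provider answered); consistent is True only if
    every answering provider's city appears in the hotel's address.
    """
    answers = [r for r in (p(hotel["lat"], hotel["lon"]) for p in providers) if r]
    if not answers:
        return "unknown", False
    countries = {r["country"] for r in answers}
    if len(countries) > 1:
        return "conflict", False  # providers disagree: treat as dirty data
    scope = "domestic" if countries.pop() == "China" else "international"
    consistent = all(r["city"] in hotel["address"] for r in answers)
    return scope, consistent
```

Records that come back as `"conflict"` or inconsistent are the dirty coordinates/addresses the paragraph warns about and would be held back from matching.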
2) Per-platform data verification includes:
Verify whether each hotel's latitude and longitude in the platform hotel data fall within the second preset range, and take offline the platform hotel data that do not. In this embodiment, the second preset range optionally requires latitude between -90 and 90 and longitude between -180 and 180, and excludes positions where both latitude and longitude lie between -1 and 1: such positions are not on land, so in theory no hotel can exist there. In this embodiment, the second preset range used for latitude/longitude verification can be further refined with detailed data on land areas, continents, countries, and so on, to fix more data problems.
Verify whether the name, address, and latitude/longitude fields of each hotel in the platform hotel data have missing content; if so, take the corresponding hotel data offline. The hotels here include both domestic and international hotels: the names include Chinese and foreign-language names, the addresses include Chinese and foreign-language addresses, and the latitude/longitude fields include Chinese and foreign-language latitude/longitude fields. Because foreign-language hotel names may be inconsistent for translation reasons, some hotel data contain Chinese characters mixed into the foreign-language name and address fields; these details need attention during verification.
Verify whether each hotel's star rating or score in the platform hotel data falls within the third preset range, and whether each field is missing or has a non-conforming format; hotels outside the third preset range are corrected by normalizing their data. In this embodiment, the third preset range optionally unifies star ratings or scores to between 0 and 5.
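The three checks above can be combined into one per-record validator, sketched here under a few assumptions: records are flat dicts with `name`, `address`, `lat`, `lon`, and an optional `star`, and out-of-range stars are clamped into 0 to 5. Clamping is only one reading of "normalize"; a platform that uses a 10-point scale would need rescaling instead.

```python
from typing import Any, Optional

REQUIRED_FIELDS = ("name", "address", "lat", "lon")


def verify_record(hotel: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a normalized copy of a platform hotel record, or None to take it offline."""
    # Missing name, address or coordinates: take the record offline.
    if any(hotel.get(f) in (None, "") for f in REQUIRED_FIELDS):
        return None
    lat, lon = float(hotel["lat"]), float(hotel["lon"])
    # Second preset range: valid latitude/longitude bounds ...
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    # ... excluding the open-sea box around (0, 0), where no hotel can exist.
    if -1 <= lat <= 1 and -1 <= lon <= 1:
        return None
    cleaned = dict(hotel, lat=lat, lon=lon)
    # Third preset range: star/score unified to 0-5 (clamping is an assumption).
    if cleaned.get("star") is not None:
        cleaned["star"] = min(max(float(cleaned["star"]), 0.0), 5.0)
    return cleaned
```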
3) Data deduplication includes:
Hotels with the same name and address are deduplicated. In this embodiment, domestic hotels on the same platform whose names and addresses are completely identical are deduplicated and placed in the duplicate hotel library; international hotels with identical foreign-language names and foreign-language addresses are likewise deduplicated and placed in the duplicate hotel library.
Hotels with the same name whose addresses lie within the fourth preset range are grouped into the duplicate hotel library; for the hotels of each group in the duplicate hotel library, prices are fetched at preset time intervals, and hotels for which no price is obtained are taken offline.
If no hotel in a group has a price, the group is moved into the standby hotel library and prices are fetched periodically; once a price is obtained, the hotel with a price is reactivated and returned to the platform hotel data.
In this embodiment, the fourth preset range is optionally a distance of 3 kilometers between coordinates (this parameter is continually tuned according to actual matching results); that is, hotels with the same name located within 3 kilometers of each other are placed in the duplicate hotel library. For each pair of duplicate hotels, prices are crawled for 5 future time intervals, chosen to avoid weekends and holidays. If one hotel of the duplicate pair obtains a price and the other cannot, the hotel without a price is taken offline and is not considered during matching; querying prices over several different time intervals may improve the accuracy of this offline decision. Pairs in which both hotels have prices are checked manually, and the duplicate hotel is removed. If neither hotel obtains a price, the duplicate pair is moved into the standby hotel library, and prices are fetched periodically for the hotels in the standby hotel library; once a price is obtained, the hotel is reactivated and re-enters the matching process.
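The deduplication rules can be sketched for the records of a single platform as follows. Grouping is greedy (each hotel joins the first same-name group whose first member lies within 3 km), and `has_price` is a hypothetical callable standing in for the multi-date price crawl; neither detail is specified in the text.

```python
import math
from collections import defaultdict
from typing import Callable


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def deduplicate(hotels: list[dict], has_price: Callable[[dict], bool], radius_km: float = 3.0):
    """Split one platform's hotels into (kept, offline, standby, manual_review)."""
    # Step 1: identical name + address -> keep the first record only.
    first: dict[tuple, dict] = {}
    for h in hotels:
        first.setdefault((h["name"], h["address"]), h)
    # Step 2: same name within radius_km -> one duplicate group (greedy clustering).
    by_name: dict[str, list[dict]] = defaultdict(list)
    for h in first.values():
        by_name[h["name"]].append(h)
    kept, offline, standby, review = [], [], [], []
    for same_name in by_name.values():
        groups: list[list[dict]] = []
        for h in same_name:
            for g in groups:
                if haversine_km(h["lat"], h["lon"], g[0]["lat"], g[0]["lon"]) <= radius_km:
                    g.append(h)
                    break
            else:
                groups.append([h])
        # Step 3: resolve each duplicate group by price availability.
        for g in groups:
            if len(g) == 1:
                kept.append(g[0])
                continue
            flags = [has_price(h) for h in g]
            priced = [h for h, ok in zip(g, flags) if ok]
            if not priced:
                standby.extend(g)  # no price anywhere: retry periodically
                continue
            offline.extend(h for h, ok in zip(g, flags) if not ok)
            if len(priced) == 1:
                kept.append(priced[0])
            else:
                review.append(priced)  # several priced duplicates: manual check
    return kept, offline, standby, review


hotels = [
    {"name": "Beauty Hotel", "address": "1 A Rd", "lat": 31.23, "lon": 121.47},
    {"name": "Beauty Hotel", "address": "1 A Rd", "lat": 31.23, "lon": 121.47},  # exact duplicate
    {"name": "Beauty Hotel", "address": "3 B Rd", "lat": 31.24, "lon": 121.47},  # about 1.1 km away
    {"name": "Solo Inn", "address": "9 C Rd", "lat": 30.00, "lon": 120.00},
]
# Hypothetical price lookup: only "3 B Rd" never returns a price.
kept, offline, standby, review = deduplicate(hotels, lambda h: h["address"] != "3 B Rd")
```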
In S2, the hotel matching process is specifically as follows:
S21: the original hotel data of the current platform are queried in ES against the original hotel data of another platform; after the hotel with the highest matching-factor similarity is found, reverse verification is performed. Specifically, the other platform's original hotel data are reverse-queried in all_plats_in; if original hotel data of the other platform are returned, the hotel numbers of the two other-platform records (the one queried first and the one returned by the reverse query) are compared for consistency.
For example, in one embodiment, platform A has one hotel, named Beauty Hotel, while platform B has Beautiful Hotel and another hotel, Fine Hotel. When B's Fine Hotel is queried against A's original hotel data, A's Beauty Hotel is returned by matching similarity; if A's Beauty Hotel is then queried back against platform B, B's Beautiful Hotel is returned. The hotel numbers before and after therefore differ, and reverse verification is needed to exclude this situation.
S22: after ES matching and reverse verification tentatively judge two records to be the same hotel, a further similarity check is performed outside ES. When the hotel numbers are consistent and the similarity of the matching attributes is greater than the preset first similarity threshold, the hotel number of the other platform's original hotel data is written into the matched hotel number (denoted hotel_id) of the first platform's original hotel data in the matched hotel index; otherwise, the other platform's original hotel data are written into the matched hotel index as a new record.
This embodiment performs a similarity check outside ES because, when retrieving hotels, ES normally returns a result from the data even if the two hotels are not actually related: after ES matching, the hotel with the highest similarity is always output. Put the other way around, even the highest-similarity hotel returned by ES may have a low actual similarity. In particular, when each of two platforms has only one hotel within 3 kilometers, ES will often match these two hotels together.
For example, platform A has a hotel named quality inn&suites and platform B has a hotel named holiday inn brownsville. The two pass ES matching and reverse verification, but they are actually not the same hotel, so after reverse verification a further similarity check against the preset first similarity threshold is needed. In one implementation of this embodiment, the first similarity threshold is optionally 30%; this is an empirical parameter obtained from matching a large number of hotels and can be adjusted according to business needs.
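The text does not name the similarity measure used outside ES, so the sketch below swaps in token-level Jaccard similarity, averaged over name and address, with the 30% threshold. Under this measure the example pair shares only the token "inn" out of five distinct tokens, scoring 0.2, and is rejected.

```python
import re

# Empirical value reported in the text; adjust to business needs.
FIRST_SIMILARITY_THRESHOLD = 0.30


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens; '&' and other punctuation act as separators.

    Suits Latin-script names; Chinese names would need character n-grams instead.
    """
    return set(re.findall(r"[^\W_]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens the two strings have in common."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def passes_similarity_check(h1: dict, h2: dict, fields=("name", "address"),
                            threshold: float = FIRST_SIMILARITY_THRESHOLD) -> bool:
    """Average Jaccard over the fields both records have; must exceed the threshold."""
    scores = [jaccard(h1[f], h2[f]) for f in fields if h1.get(f) and h2.get(f)]
    return bool(scores) and sum(scores) / len(scores) > threshold
```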
S23: in the same way, the original hotel data of all remaining platforms are matched against all_hotel_in (which already holds the first platform's original hotel data). After the first platform's hotel data have been matched one by one against the data of multiple platforms, one hotel may be matched to several hotels, so a hotel_id may then be followed by the hotel numbers of several different platforms.
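The overall S2 loop can be sketched in memory: platforms are walked in priority order, each record either joins an existing all_hotel_in entry or becomes a new one, and an entry collects the `(platform, id)` pairs that follow its hotel_id. The linear scan replaces the ES candidate query, `is_same_hotel` stands in for ES matching plus reverse verification plus the threshold check, and the `H000001`-style numbering and the one-record-per-platform rule are illustrative assumptions.

```python
from itertools import count
from typing import Callable


def build_all_hotel_in(plats_in: dict[str, list[dict]], priority: list[str],
                       is_same_hotel: Callable[[dict, dict], bool]) -> dict[str, dict]:
    """Merge platform records into all_hotel_in, walking platforms in priority order.

    Returns {hotel_id: {"record": first-seen record, "sources": [(platform, id), ...]}}.
    """
    all_hotel_in: dict[str, dict] = {}
    new_id = count(1)
    for platform in priority:
        for hotel in plats_in.get(platform, []):
            # Linear scan stands in for the ES query; an entry accepts at most
            # one record per platform.
            target = next((e for e in all_hotel_in.values()
                           if platform not in {p for p, _ in e["sources"]}
                           and is_same_hotel(e["record"], hotel)), None)
            if target is None:
                all_hotel_in[f"H{next(new_id):06d}"] = {"record": hotel,
                                                        "sources": [(platform, hotel["id"])]}
            else:
                target["sources"].append((platform, hotel["id"]))
    return all_hotel_in


# Example: eLong ranked first, Qunar second; the toy matcher compares names case-insensitively.
plats_in = {
    "elong": [{"id": "E1", "name": "Beauty Hotel"}],
    "qunar": [{"id": "Q7", "name": "beauty hotel"}, {"id": "Q8", "name": "Other Inn"}],
}
index = build_all_hotel_in(plats_in, ["elong", "qunar"],
                           lambda a, b: a["name"].lower() == b["name"].lower())
```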
Fig. 2 is a schematic diagram of the original hotel data index and the matched hotel index of this embodiment. As can be seen from the figure, all_plats_in stores the original hotel information of each platform; the figure shows the same hotel's data on two platforms, eLong and Qunar. After matching, only the eLong hotel enters all_hotel_in, and the Qunar hotel ID is written into its hotel_id.
In actual use, all_plats_in may store about 4-5 million hotel records. Because many of these are duplicate records of the same hotels from different platforms, after the hotel matching of this embodiment the number of hotels in all_hotel_in is reduced to just over 1 million distinct hotels.
In S4, matching accuracy is verified by combining the matching table and the basic information table, specifically:
S41: combining the matching table and the basic information table, find hotels with the same name and the same address, and take offline those with incorrect latitude/longitude and those matched on fewer platforms. The purpose is to take offline hotels with wrong coordinates that may have been missed earlier. For example, if two hotels with the same name and address have coordinates more than 6 kilometers apart, or even tens of kilometers apart, one of them must be wrong; the error can be corrected by looking up the coordinates, or the hotel matched on fewer platforms can be taken offline.
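A minimal sketch of the S41 check, assuming hotel_in rows carry `name`, `address`, `lat`, `lon`, and `match_count`: rows are grouped by normalized name and address, and for any group whose coordinates spread more than 6 km, the member matched on the fewest platforms is returned as the offline candidate. Choosing a single candidate per group is a simplification for illustration.

```python
import math
from collections import defaultdict


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def coordinate_conflicts(hotel_in: list[dict], max_km: float = 6.0) -> list[dict]:
    """Offline candidates: for each same-name, same-address group whose coordinates
    spread more than max_km, the member matched on the fewest platforms."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for h in hotel_in:
        groups[(h["name"].strip().lower(), h["address"].strip().lower())].append(h)
    candidates = []
    for g in groups.values():
        if len(g) < 2:
            continue
        spread = max(haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
                     for i, a in enumerate(g) for b in g[i + 1:])
        if spread > max_km:
            candidates.append(min(g, key=lambda h: h["match_count"]))
    return candidates


hotel_in = [
    {"hotel_id": "H1", "name": "Beauty Hotel", "address": "1 A Rd", "lat": 31.23, "lon": 121.47, "match_count": 4},
    {"hotel_id": "H2", "name": "Beauty Hotel", "address": "1 A Rd", "lat": 31.40, "lon": 121.47, "match_count": 1},  # ~19 km off
    {"hotel_id": "H3", "name": "Solo Inn", "address": "9 C Rd", "lat": 30.00, "lon": 120.00, "match_count": 2},
    {"hotel_id": "H4", "name": "Solo Inn", "address": "9 C Rd", "lat": 30.01, "lon": 120.00, "match_count": 1},  # ~1.1 km: tolerated
]
```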
S42: alternatively, randomly select a preset number of hotels from the matching table as verification hotels, calculate the name and address similarity of each verification hotel in turn in the matched hotel index or the basic information table, and evaluate matching quality according to the results.
In one implementation of this embodiment, 100 hotels are randomly sampled from the basic information table hotel_in, and their similarity is queried in ES against the matched hotel index all_hotel_in. If matching quality is poor, a single hotel is likely to retrieve several highly similar hotels. If matching quality is good, highly similar hotels rarely appear in all_hotel_in. The matching quality can therefore be assessed from the query results.
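The S42 spot check can be sketched as follows. The sketch samples hotels and measures the share that still have another highly similar hotel in all_hotel_in. It swaps in difflib.SequenceMatcher on name plus address for the ES relevance score. The threshold of 0.9, the fixed seed, and the field names are assumptions.

```python
import random
from difflib import SequenceMatcher
from typing import List


def similarity(a: dict, b: dict) -> float:
    """Name + address string similarity in [0, 1] (stand-in for an ES relevance score)."""
    return SequenceMatcher(None, f"{a['name']} {a['address']}", f"{b['name']} {b['address']}").ratio()


def suspicious_rate(hotel_in: List[dict], all_hotel_in: List[dict],
                    sample_size: int = 100, threshold: float = 0.9, seed: int = 0) -> float:
    """Fraction of sampled hotels that still have another highly similar hotel
    in all_hotel_in; a high value suggests missed matches (poor quality)."""
    rng = random.Random(seed)
    sample = rng.sample(hotel_in, min(sample_size, len(hotel_in)))
    flagged = 0
    for h in sample:
        if any(o["id"] != h["id"] and similarity(h, o) >= threshold for o in all_hotel_in):
            flagged += 1
    return flagged / len(sample) if sample else 0.0


hotels = [
    {"id": "1", "name": "Example Hotel", "address": "1 Example Rd"},
    {"id": "2", "name": "Example Hotel", "address": "1 Example Road"},  # missed duplicate
    {"id": "3", "name": "Sea View Inn", "address": "99 Harbor St"},
]
print(round(suspicious_rate(hotels, hotels), 3))  # 0.667
```

In this toy run, two of the three sampled hotels have a near-duplicate left in the index, so the rate points to unmerged duplicates.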
Fig. 3 is a schematic diagram of the overall implementation framework of the hotel data matching method of this embodiment. First, platform hotel data of multiple platforms are obtained by crawling the platforms or through the API interfaces of the different platforms. Data cleansing, per-platform data verification, and data deduplication then produce the original hotel data of the multiple platforms. During deduplication, some hotels may be placed in the duplicate hotel library or the standby hotel library. Hotel matching is then performed on the original hotel data to obtain the matching list and the basic information table. Finally, matching accuracy verification is performed.
This embodiment computes similarity through indexes built in ElasticSearch and confirms matches by reverse verification. This runs efficiently and filters out poorly matched hotel links. Combined with spot-check verification, it achieves fully automatic matching of hotels across multiple platforms with an automatic matching accuracy above 90%. Only a small amount of manual work is then needed to complete the matching process, which reduces the share of manual effort and improves hotel matching efficiency.
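Reverse verification can be read as a mutual-best-match test: a forward match is accepted only if querying with the matched record returns the original hotel. The sketch below implements that reading. It replaces the ES query with a linear scan scored by difflib.SequenceMatcher, and the threshold of 0.8 is a hypothetical stand-in for the first similarity threshold.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def score(a: dict, b: dict) -> float:
    """Name + address similarity in [0, 1]; stands in for an ES relevance score."""
    return SequenceMatcher(None, f"{a['name']} {a['address']}", f"{b['name']} {b['address']}").ratio()


def best_match(query: dict, candidates: List[dict]) -> Optional[Tuple[dict, float]]:
    """Return the highest-scoring candidate and its score (first one wins ties)."""
    best: Optional[Tuple[dict, float]] = None
    for c in candidates:
        s = score(query, c)
        if best is None or s > best[1]:
            best = (c, s)
    return best


def reverse_verified_match(query: dict, other: List[dict], own: List[dict],
                           threshold: float = 0.8) -> Optional[dict]:
    """Accept the other platform's best match only if, queried in reverse,
    it points back to `query` and the forward score exceeds `threshold`."""
    forward = best_match(query, other)
    if forward is None:
        return None
    candidate, s = forward
    backward = best_match(candidate, own)
    if backward is None or backward[0]["id"] != query["id"]:
        return None  # reverse check failed: candidate prefers another hotel
    return candidate if s > threshold else None


elong = [
    {"id": "e1", "name": "Example Hotel", "address": "1 Example Rd"},
    {"id": "e2", "name": "Example Hotel East", "address": "1 Example Rd"},
]
qunar = [{"id": "q1", "name": "Example Hotel East", "address": "1 Example Rd"}]
print(reverse_verified_match(elong[0], qunar, elong))        # None
print(reverse_verified_match(elong[1], qunar, elong)["id"])  # q1
```

In the example, q1 is the only candidate for e1, so the forward query alone would accept it. The reverse query shows that q1 actually belongs to e2, which is how this step filters out poorly matched links.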
Embodiment two:
Embodiment 2 of the present invention provides a hotel data matching device. Fig. 4 is a structural block diagram of the hotel data matching device of this embodiment. As the figure shows, the device includes the following components:
Platform hotel data crawling device 10, configured to obtain the platform hotel data of different platforms, perform data processing on the platform hotel data to obtain original hotel data, and write the original hotel data into the original hotel data index;
Hotel matching device 20, configured to select the original hotel data of the first platform according to a preset platform priority order, write it into the matched hotel index table, and match it one by one against the original hotel data of the remaining platforms according to the similarity of matching attributes, so as to obtain the matched hotel index table;
Matching list and basic information table generation device 30, configured to generate a matching list and a basic information table from the matched hotel index table, the basic information table including the hotel standard number and the original numbers on the different platforms;
Matching accuracy verification device 40, configured to verify matching accuracy by combining the matching list and the basic information table.
In addition, the present invention further provides hotel data matching equipment, comprising:
at least one processor, and a memory communicatively connected to the at least one processor;
wherein the processor is configured to execute the method described in Embodiment 1 by invoking a computer program stored in the memory.
In addition, the present invention further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to cause a computer to execute the method described in Embodiment 1.
The above embodiments are only intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some or all of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention, and shall all be covered by the scope of the claims and the description of the present invention.
Claims (10)
1. A hotel data matching method, characterized in that the matching process is performed using ElasticSearch, comprising:
obtaining platform hotel data of different platforms, performing data processing on the platform hotel data to obtain original hotel data, and writing the original hotel data into an original hotel data index;
selecting the original hotel data of a first platform according to a preset platform priority order, writing it into a matched hotel index table, and matching hotels one by one against the original hotel data of the remaining platforms according to the similarity of matching attributes, so as to obtain the matched hotel index table;
generating a matching list and a basic information table according to the matched hotel index table, the basic information table comprising a hotel standard number and the original numbers on different platforms;
performing matching accuracy verification by combining the matching list and the basic information table.
2. The hotel data matching method according to claim 1, characterized in that the hotel matching process is specifically:
querying the original hotel data of the first platform one by one in the original hotel data index until the original hotel data of another platform with the highest similarity is found; then reversely verifying the original hotel data of the other platform in the original hotel data index and, if the reverse query returns original hotel data of the first platform, comparing whether the hotel number of the returned record is consistent with that of the original first-platform record;
if the two hotel numbers are consistent and the similarity of the matching attribute is greater than a first similarity threshold, writing the hotel number of the original hotel data of the other platform into the matched hotel number field of the original hotel data of the first platform in the matched hotel index table; otherwise, writing the original hotel data of the other platform into the matched hotel index table as a new record.
3. The hotel data matching method according to claim 1, characterized in that the matching accuracy verification is specifically:
looking up, from the matching list, hotels with the same name and the same address, and taking offline those with incorrect latitude/longitude and those matched on few platforms;
and/or randomly selecting a preset number of hotels from the matching list as verification hotels, computing in turn the name and address similarity of each verification hotel in the matched hotel index table, and assessing the matching quality according to the results.
4. The hotel data matching method according to claim 1, characterized in that the data processing comprises: data cleansing and/or per-platform data verification and/or data deduplication.
5. The hotel data matching method according to claim 4, characterized in that the data cleansing comprises:
distinguishing domestic (Chinese) hotels from international hotels in the platform hotel data according to a coordinate reverse-lookup function;
and cross-verifying with the coordinate reverse-lookup functions of different providers so that hotel coordinates correspond to hotel addresses.
6. The hotel data matching method according to claim 5, characterized in that the per-platform data verification comprises:
verifying whether the latitude and longitude of each hotel in the platform hotel data lie within a second preset range, and taking offline the platform hotel data outside the second preset range;
verifying whether the name, address, and latitude/longitude fields of each hotel in the platform hotel data have missing content and, if so, taking the corresponding hotel data offline;
verifying whether the star rating and/or score of each hotel in the platform hotel data lie within a third preset range, and modifying the hotels outside the third preset range by applying data normalization to them.
7. The hotel data matching method according to claim 5 or 6, characterized in that the data deduplication comprises:
deduplicating hotels with the same name and the same address, and placing them into a duplicate hotel library by group;
and/or placing hotels with the same name whose addresses lie within a fourth preset range into the duplicate hotel library by group; for hotels of the same group in the duplicate hotel library, fetching prices at a preset time interval and taking offline the hotels for which no price is obtained;
if no hotel in a group has a price, storing the group in a standby hotel library and fetching prices periodically; once a price is obtained, activating the hotel that has a price and re-entering it into the platform hotel data.
8. A hotel data matching device, characterized by comprising:
a platform hotel data crawling device, configured to obtain the platform hotel data of different platforms, perform data processing on the platform hotel data to obtain original hotel data, and write the original hotel data into an original hotel data index;
a hotel matching device, configured to select the original hotel data of a first platform according to a preset platform priority order, write it into a matched hotel index table, and match hotels one by one against the original hotel data of the remaining platforms according to the similarity of matching attributes, so as to obtain the matched hotel index table;
a matching list and basic information table generation device, configured to generate a matching list and a basic information table according to the matched hotel index table, the basic information table comprising a hotel standard number and the original numbers on different platforms;
a matching accuracy verification device, configured to perform matching accuracy verification by combining the matching list and the basic information table.
9. Hotel data matching equipment, characterized by comprising:
at least one processor; and a memory communicatively connected to the at least one processor;
wherein the processor is configured to execute the hotel data matching method according to any one of claims 1 to 7 by invoking a computer program stored in the memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer-executable instructions for causing a computer to execute the hotel data matching method according to any one of claims 1 to 7.
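The per-platform verification of claim 6 can be sketched as a single record filter. The patent gives no values for the second and third preset ranges, so the ones below are assumptions: a rough bounding box of China (the sketch assumes domestic hotels, as separated in claim 5) and a 0 to 5 scale for stars and scores. Clamping into range is only one possible reading of "data normalization".

```python
from typing import Optional, Tuple

REQUIRED_FIELDS = ("name", "address", "lat", "lon")

# Assumed values for the "second" and "third" preset ranges; the patent gives none.
LAT_RANGE: Tuple[float, float] = (3.8, 53.6)    # rough bounding box of China
LON_RANGE: Tuple[float, float] = (73.5, 135.1)
STAR_RANGE: Tuple[float, float] = (0.0, 5.0)
SCORE_RANGE: Tuple[float, float] = (0.0, 5.0)


def validate_hotel(h: dict) -> Optional[dict]:
    """Return a cleaned copy of a platform hotel record, or None to take it offline."""
    # Missing name, address, or coordinates: take the record offline.
    if any(h.get(f) in (None, "") for f in REQUIRED_FIELDS):
        return None
    # Coordinates outside the assumed second preset range: take offline.
    if not (LAT_RANGE[0] <= h["lat"] <= LAT_RANGE[1]
            and LON_RANGE[0] <= h["lon"] <= LON_RANGE[1]):
        return None
    # Star rating / score outside the assumed third preset range:
    # normalize by clamping into the range (one possible normalization).
    cleaned = dict(h)
    for key, (lo, hi) in (("star", STAR_RANGE), ("score", SCORE_RANGE)):
        v = cleaned.get(key)
        if v is not None and not lo <= v <= hi:
            cleaned[key] = min(max(v, lo), hi)
    return cleaned


print(validate_hotel({"name": "Example Hotel", "address": "1 Example Rd",
                      "lat": 39.9, "lon": 116.4, "star": 7})["star"])  # 5.0
print(validate_hotel({"name": "Example Hotel", "address": "",
                      "lat": 39.9, "lon": 116.4}))  # None
```

The missing-field check runs first so that the range comparison never sees an absent coordinate.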
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910380145.1A CN110263022B (en) | 2019-05-08 | 2019-05-08 | Hotel data matching method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110263022A | 2019-09-20 |
CN110263022B | 2023-03-14 |
Family ID: 67914352
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910380145.1A Active CN110263022B (en) | 2019-05-08 | 2019-05-08 | Hotel data matching method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263022B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325638A (en) * | 2020-02-10 | 2020-06-23 | 北京蚂蜂窝网络科技有限公司 | Hotel identification processing method, device, equipment and storage medium |
CN111639253A (en) * | 2020-05-22 | 2020-09-08 | 北京百度网讯科技有限公司 | Data duplication judging method, device, equipment and storage medium |
CN113361920A (en) * | 2021-06-04 | 2021-09-07 | 上海华客信息科技有限公司 | Hotel service optimization index recommendation method, system, equipment and storage medium |
CN113628003A (en) * | 2021-07-22 | 2021-11-09 | 上海泛宥信息科技有限公司 | Hotel matching method, system, terminal and storage medium |
CN114358979A (en) * | 2022-01-12 | 2022-04-15 | 平安科技(深圳)有限公司 | Hotel matching method and device, electronic equipment and storage medium |
CN114860771A (en) * | 2022-04-28 | 2022-08-05 | 北京合思信息技术有限公司 | Hotel information data processing method and device |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050149507A1 (en) * | 2003-02-05 | 2005-07-07 | Nye Timothy G. | Systems and methods for identifying an internet resource address |
CN104158885A (en) * | 2014-08-21 | 2014-11-19 | 中南大学 | Method and system of streaming loading of application based on location information |
CN104751232A (en) * | 2015-04-27 | 2015-07-01 | 携程计算机技术(上海)有限公司 | Automatic matching method for hotels |
CN104778637A (en) * | 2014-01-10 | 2015-07-15 | 携程计算机技术(上海)有限公司 | Hotel data processing system and method |
CN104809141A (en) * | 2014-01-29 | 2015-07-29 | 携程计算机技术(上海)有限公司 | Matching system and method of hotel data |
CN105205699A (en) * | 2015-09-17 | 2015-12-30 | 北京众荟信息技术有限公司 | User label and hotel label matching method and device based on hotel comments |
CN105761173A (en) * | 2016-02-26 | 2016-07-13 | 姜恒 | Hotel self-service management method and system |
CN106447881A (en) * | 2016-12-27 | 2017-02-22 | 安恒世通(北京)网络科技有限公司 | Self-help registration management system for apartments |
CN107291939A (en) * | 2017-07-06 | 2017-10-24 | 携程计算机技术(上海)有限公司 | Clustering matching method and system for hotel information |
CN107316231A (en) * | 2017-06-27 | 2017-11-03 | 携程计算机技术(上海)有限公司 | Hotel pricing information pushing method, system and storage medium |
Similar Documents
Publication | Title |
---|---|
CN110263022A (en) | Hotel data matching method and device |
CN103473230B (en) | Service area determination method, logistics service provider recommendation method and related device |
CN105630823B (en) | Cached data monitoring method, device and system based on a distributed system |
US20080198995A1 (en) | System and method for providing a search portal with enhanced results |
CN106651392A (en) | Intelligent business location selection method, apparatus and system |
CN109084795B (en) | Method and device for searching service facilities based on map service |
US20080027995A1 (en) | Systems and methods for survey scheduling and implementation |
CN103914536A (en) | Point-of-interest recommendation method and system for electronic maps |
CN102289467A (en) | Method and device for determining target site |
CN105426375B (en) | Calculation method and device for a relational network |
CN106354719B (en) | POI service providing method, and POI data processing method and device |
CN110019542A (en) | Method for generating enterprise relationships, method and device for generating an organization member database, and method for identifying same-name members in an enterprise map |
CN109002492A (en) | Point prediction method based on LightGBM |
CN107577749A (en) | Supplier management method |
CN108776678A (en) | Index creation method and device based on a mobile-terminal NoSQL database |
CN108960562A (en) | Regional influence assessment method and device |
CN110428282A (en) | Gas-station-based information query method and device |
CN109725928A (en) | Gray release method, device, equipment and readable storage medium |
Fu et al. | An evaluation model for island tourism competitiveness: Empirical study on Penghu Islands |
CN104077392A (en) | Method and device for search suggestion prompting |
CN102053960A (en) | Method and system for constructing quick and accurate Internet of things and Internet search engine according to group requirement characteristics |
JP2003281326A (en) | Method and program of skill analysis |
CN101685445A (en) | Method for expressing distance priority of network geographic information subject matters |
US20190266163A1 (en) | System and method for behavior-on-read query processing |
CN116070875A (en) | User demand analysis method, device and medium based on household service |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |