CN115797109B - Hotel aggregation method, device and system for different suppliers - Google Patents

Hotel aggregation method, device and system for different suppliers

Info

Publication number
CN115797109B
Authority
CN
China
Prior art keywords
hotel
similarity
target
hotels
basic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310044353.0A
Other languages
Chinese (zh)
Other versions
CN115797109A (en)
Inventor
冷鹏
赵鹏
罗宁
居锴
秦小康
邓高明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sifang Qidian Technology Co ltd
Original Assignee
Beijing Sifang Qidian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sifang Qidian Technology Co ltd
Priority to CN202310044353.0A
Publication of CN115797109A
Application granted
Publication of CN115797109B
Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a hotel aggregation method, device and system for different suppliers, and belongs to the field of data aggregation. First, the name similarity between a hotel to be aggregated and each basic hotel is calculated from the hotel names, and the N basic hotels with the highest name similarity are selected as target basic hotels. Because similar names alone cannot confirm that two records are the same hotel, the name similarity of each target basic hotel is then adjusted according to the hotel address and contact telephone to obtain the final target similarity. Because at most one target basic hotel can be the same hotel as the hotel to be aggregated, the target basic hotel with the largest target similarity is taken as the target hotel. Finally, if the target similarity of the target hotel is greater than a preset similarity, the target hotel and the hotel to be aggregated are judged to be the same hotel and are aggregated, which makes it easier for users to select a hotel.

Description

Hotel aggregation method, device and system for different suppliers
Technical Field
The invention relates to the field of data aggregation, in particular to a hotel aggregation method, device and system for different suppliers.
Background
As more people travel, hotels, as places that provide rest, have become an important convenience for travelers. Hotel suppliers now provide hotel information, but the data from any single supplier may not be comprehensive, so choosing the most suitable hotel requires integrating data from multiple suppliers.
However, because suppliers follow different conventions, the same hotel may in practice be listed under different names by different suppliers. Conversely, two listings from different suppliers may have the same hotel name and similar addresses yet actually refer to two different hotels. Both situations greatly interfere with the user's selection of a hotel.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a hotel aggregation method, device and system for different suppliers, so as to solve the following problem: because suppliers follow different conventions, the same hotel may be listed under different names by different suppliers, and listings from different suppliers with the same name and similar addresses may actually be two different hotels, both of which greatly interfere with the user's selection of a hotel.
The technical scheme adopted for solving the technical problems is as follows:
in a first aspect, a hotel aggregation method for different suppliers is provided, comprising the steps of:
acquiring hotel data of all suppliers, wherein the hotel data comprises hotel names, hotel addresses and contact telephones; taking the hotels in the hotel data of any one supplier as basic hotels, and taking the hotels in the hotel data of the remaining suppliers as hotels to be aggregated;
calculating the name similarity between any hotel to be aggregated and all basic hotels according to the hotel names, sorting all the basic hotels in descending order of name similarity, and taking the first N basic hotels in the sorted order as target basic hotels, wherein N is a positive integer;
adjusting the name similarity corresponding to each target basic hotel according to the hotel address and the contact telephone to obtain a target similarity, and taking the target basic hotel with the largest target similarity as the target hotel;
and if the target similarity of the target hotel is greater than a preset similarity, aggregating the hotel to be aggregated with the target hotel.
Further, calculating the name similarity between any hotel to be aggregated and all basic hotels according to the hotel names includes:
performing a word segmentation operation on the hotel names of the hotel to be aggregated and of any one of the basic hotels;
acquiring, according to a pre-constructed dictionary, the attribute of each segmented word obtained from the word segmentation operation, wherein the attributes comprise city, name identification word, description word, industry word and landmark;
calculating the attribute similarity of the segmented words of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel;
and calculating the product of the attribute similarity of each attribute and the corresponding attribute weight coefficient, and adding the products of all the attributes to obtain the name similarity between the hotel to be aggregated and the basic hotel.
Further, when the attribute is a name identification word, calculating the attribute similarity of the segmented words of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel includes:
if the first name identification word of the hotel name of the basic hotel includes all words of the second name identification word of the hotel name of the hotel to be aggregated, acquiring the number of target words and calculating the attribute similarity according to the number of target words, wherein the target words are the words in the first name identification word that are identical to words of the second name identification word, and the second name identification word consists of all the target words;
and if the first name identification word of the hotel name of the basic hotel does not include all words of the second name identification word of the hotel name of the hotel to be aggregated, the attribute similarity is 0.
Further, calculating the attribute similarity according to the number of target words includes:
acquiring a first length of the first name identification word, a second length of the second name identification word, and a third length of the longest target word;
when the number of words is 1, calculating the attribute similarity according to the first length and the second length, wherein the calculation formula is as follows:
attribute similarity = second length / first length + preset adjustment coefficient × (1 - second length / first length) × (1 - 1/2^(second length - 1));
when the number of words is greater than 1, calculating the attribute similarity according to the first length, the second length, the third length and the number of words, wherein the calculation formula is as follows:
attribute similarity = (third length / 2 + second length + 1 / number of words) × [second length / first length + preset adjustment coefficient × (1 - second length / first length) × (1 - 1/2^(third length - 1))].
Further, when the attribute is not a name identification word, calculating the attribute similarity of the segmented words of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel includes:
if the word of any attribute in the hotel name of the hotel to be aggregated is the same as the word of the same attribute in the hotel name of the basic hotel, the attribute similarity of that attribute is 1;
if the word of any attribute in the hotel name of the hotel to be aggregated is different from the word of the same attribute in the hotel name of the basic hotel, the attribute similarity of that attribute is 0.
Further, the method also comprises the following steps:
if the hotel name of the hotel to be aggregated includes segmented words of all five attributes, the attribute weight coefficient corresponding to each attribute adopts its preset weight;
if the hotel name of the hotel to be aggregated does not include segmented words of all five attributes, acquiring a first preset weight of the attributes not included in the hotel name of the hotel to be aggregated and a second preset weight of the attributes included in that hotel name; and calculating the target weight of each attribute included in the hotel name of the hotel to be aggregated according to the first preset weight and the second preset weight, wherein the calculation formula of the target weight of any attribute is as follows:
target weight = second preset weight + second preset weight / (1 - first preset weight) × first preset weight.
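The weight redistribution above can be illustrated with a minimal Python sketch. The attribute keys and the preset weight values are hypothetical; the sketch also assumes that the preset weights sum to 1 and that, when several attributes are missing, the "first preset weight" is the sum of their preset weights, which the text does not state explicitly.

```python
def redistribute_weights(preset_weights, present_attributes):
    """Return target weights for the attributes present in a hotel name.

    preset_weights: attribute -> preset weight (assumed to sum to 1).
    present_attributes: attributes found in the name of the hotel to be aggregated.
    """
    # Assumption: the "first preset weight" is the total weight of all missing attributes.
    missing_total = sum(
        w for a, w in preset_weights.items() if a not in present_attributes
    )
    if missing_total >= 1:
        raise ValueError("at least one attribute must be present")
    # target = second + second / (1 - first) * first
    return {
        a: w + w / (1 - missing_total) * missing_total
        for a, w in preset_weights.items()
        if a in present_attributes
    }


# Hypothetical preset weights, chosen only for illustration.
PRESET = {
    "city": 0.1,
    "name_identifier": 0.6,
    "descriptor": 0.1,
    "industry": 0.1,
    "landmark": 0.1,
}

# A hotel name that lacks a descriptor and a landmark.
weights = redistribute_weights(PRESET, {"city", "name_identifier", "industry"})
```

Under these assumptions the formula simplifies to second preset weight / (1 - first preset weight), so the target weights of the present attributes again sum to 1.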
Further, the hotel address comprises a road name and a house number; adjusting the name similarity corresponding to the target basic hotel according to the hotel address and the contact telephone to obtain the target similarity comprises the following steps:
if the hotel addresses of the hotel to be aggregated and the target basic hotel are the same, setting the target similarity of the target basic hotel to the full value;
if the hotel addresses of the hotel to be aggregated and the target basic hotel are not identical, adjusting the name similarity of the target basic hotel to obtain the target similarity in the following way:
taking the value of the name similarity as the hotel similarity;
if the road names are the same but the house numbers are different, reducing the current hotel similarity by a first preset value; if the contact telephones of the hotel to be aggregated and the target basic hotel are the same, increasing the current hotel similarity by a second preset value; if the distance between the hotel to be aggregated and the target basic hotel, determined from the hotel addresses, is within a preset distance, increasing the current hotel similarity by a third preset value;
and taking the final hotel similarity as the target similarity.
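The adjustment steps above can be sketched as follows. This is only an illustration: the `Hotel` record, the default preset values and the preset distance are hypothetical, "same address" is assumed to mean identical road name and house number, and the distance between the two addresses is assumed to be computed elsewhere and passed in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hotel:
    road: str
    house_number: str
    phone: str


def target_similarity(
    name_similarity: float,
    candidate: Hotel,
    base: Hotel,
    distance_m: Optional[float],
    *,
    full_value: float = 1.0,       # hypothetical full value (100%)
    first_preset: float = 0.2,     # hypothetical penalty for a different house number
    second_preset: float = 0.1,    # hypothetical bonus for the same telephone
    third_preset: float = 0.1,     # hypothetical bonus for a small distance
    preset_distance_m: float = 200.0,
) -> float:
    # Identical address: the target similarity is set to the full value.
    if (candidate.road, candidate.house_number) == (base.road, base.house_number):
        return full_value
    similarity = name_similarity
    # Same road here implies a different house number, since identical addresses returned above.
    if candidate.road == base.road:
        similarity -= first_preset
    if candidate.phone and candidate.phone == base.phone:
        similarity += second_preset
    if distance_m is not None and distance_m <= preset_distance_m:
        similarity += third_preset
    return similarity


base_hotel = Hotel("Main Road", "12", "555-0100")
same_address = Hotel("Main Road", "12", "555-0199")
neighbour = Hotel("Main Road", "14", "555-0100")

full = target_similarity(0.7, same_address, base_hotel, distance_m=0.0)
adjusted = target_similarity(0.7, neighbour, base_hotel, distance_m=50.0)
```

In the second call the penalty and the two bonuses cancel out, so the target similarity stays close to the name similarity of 0.7.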
Further, calculating the name similarity between any hotel to be aggregated and all basic hotels according to the hotel names includes:
acquiring first basic hotels, namely the basic hotels whose country, city and county in the hotel address are the same as the country, city and county in the hotel address of the hotel to be aggregated;
calculating the name similarity between the hotel to be aggregated and the first basic hotels according to the hotel names; and sorting all the first basic hotels in descending order of name similarity, and taking the first N first basic hotels in the sorted order as target basic hotels, wherein N is a positive integer.
In a second aspect, there is provided a hotel aggregation apparatus of different suppliers, comprising:
the hotel data acquisition module is used for acquiring hotel data of all suppliers, wherein the hotel data comprises hotel names, hotel addresses and contact telephones; taking the hotels in the hotel data of any one supplier as basic hotels, and taking the hotels in the hotel data of the remaining suppliers as hotels to be aggregated;
the target basic hotel acquisition module is used for calculating the name similarity between any hotel to be aggregated and all basic hotels according to the hotel names, sorting all the basic hotels in descending order of name similarity, and taking the first N basic hotels in the sorted order as target basic hotels, wherein N is a positive integer;
the target hotel determination module is used for adjusting the name similarity corresponding to each target basic hotel according to the hotel address and the contact telephone to obtain a target similarity, and taking the target basic hotel with the largest target similarity as the target hotel;
and the hotel aggregation module is used for aggregating the hotel to be aggregated with the target hotel if the target similarity of the target hotel is greater than a preset similarity.
In a third aspect, a hotel aggregation system for different suppliers is provided, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured for performing the method of any one of the technical solutions provided in the first aspect.
The beneficial effects are that:
The technical scheme of the application provides a hotel aggregation method, device and system for different suppliers. Hotel data of all suppliers is acquired first; the hotels in the hotel data of one supplier are taken as basic hotels, and the hotels in the hotel data of the other suppliers are taken as hotels to be aggregated. The name similarity between a hotel to be aggregated and each basic hotel is calculated from the hotel names, and the N basic hotels with the highest name similarity are selected as target basic hotels. Because similar names alone cannot confirm that two records are the same hotel, the name similarity of each target basic hotel is then adjusted according to the hotel address and contact telephone to obtain the final target similarity. Because at most one target basic hotel can be the same hotel as the hotel to be aggregated, the target basic hotel with the largest target similarity is taken as the target hotel. Finally, if the target similarity of the target hotel is greater than the preset similarity, the target hotel and the hotel to be aggregated are judged to be the same hotel and are aggregated, which makes it easier for users to select a hotel.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a hotel aggregation method for different suppliers provided by an embodiment of the present invention;
FIG. 2 is a flow chart of a specific hotel aggregation method for different suppliers provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a hotel aggregation device for different suppliers according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present invention will be described in detail with reference to the accompanying drawings and examples. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments herein without creative effort fall within the scope of protection of the present application.
First embodiment: referring to FIG. 1, an embodiment of the present invention provides a hotel aggregation method for different suppliers, including the steps of:
S11: acquiring hotel data of all suppliers, wherein the hotel data comprises hotel names, hotel addresses and contact telephones; taking the hotels in the hotel data of any one supplier as basic hotels, and taking the hotels in the hotel data of the remaining suppliers as hotels to be aggregated;
S12: calculating the name similarity between any hotel to be aggregated and all basic hotels according to the hotel names, sorting all the basic hotels in descending order of name similarity, and taking the first N basic hotels in the sorted order as target basic hotels, wherein N is a positive integer;
S13: adjusting the name similarity corresponding to each target basic hotel according to the hotel address and the contact telephone to obtain a target similarity, and taking the target basic hotel with the largest target similarity as the target hotel;
S14: and if the target similarity of the target hotel is greater than a preset similarity, aggregating the hotel to be aggregated with the target hotel.
According to the hotel aggregation method for different suppliers provided by the embodiment of the invention, hotel data of all suppliers is acquired first; the hotels in the hotel data of one supplier are taken as basic hotels, and the hotels in the hotel data of the other suppliers are taken as hotels to be aggregated. The name similarity between a hotel to be aggregated and each basic hotel is calculated from the hotel names, and the N basic hotels with the highest name similarity are selected as target basic hotels. Because similar names alone cannot confirm that two records are the same hotel, the name similarity of each target basic hotel is then adjusted according to the hotel address and contact telephone to obtain the final target similarity. Because at most one target basic hotel can be the same hotel as the hotel to be aggregated, the target basic hotel with the largest target similarity is taken as the target hotel. Finally, if the target similarity of the target hotel is greater than the preset similarity, the target hotel and the hotel to be aggregated are judged to be the same hotel and are aggregated, which makes it easier for users to select a hotel.
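Steps S12 to S14 can be sketched in Python as below. The function names, the default values of N and of the preset similarity, and the toy scoring functions are all hypothetical; the name-similarity and adjustment steps are passed in as callables so that the sketch only shows the control flow.

```python
def find_target_hotel(candidate, base_hotels, name_similarity, adjust, n=5, preset_similarity=0.85):
    """Return the basic hotel to aggregate `candidate` with, or None."""
    # S12: rank the basic hotels by name similarity and keep the first N.
    ranked = sorted(
        ((name_similarity(candidate, base), base) for base in base_hotels),
        key=lambda pair: pair[0],
        reverse=True,
    )[:n]
    if not ranked:
        return None
    # S13: adjust each name similarity and keep the largest target similarity.
    best_score, best_hotel = max(
        ((adjust(score, candidate, base), base) for score, base in ranked),
        key=lambda pair: pair[0],
    )
    # S14: aggregate only if the target similarity exceeds the preset similarity.
    return best_hotel if best_score > preset_similarity else None


# Toy data and scoring functions, for illustration only.
base_hotels = [
    {"name": "Sunrise Hotel", "phone": "111"},
    {"name": "Moon Inn", "phone": "222"},
]


def toy_name_similarity(a, b):
    return 1.0 if a["name"] == b["name"] else 0.0


def toy_adjust(score, a, b):
    return score + (0.1 if a["phone"] == b["phone"] else 0.0)


match = find_target_hotel(
    {"name": "Sunrise Hotel", "phone": "111"},
    base_hotels, toy_name_similarity, toy_adjust, n=2,
)
no_match = find_target_hotel(
    {"name": "Harbour Lodge", "phone": "999"},
    base_hotels, toy_name_similarity, toy_adjust, n=2,
)
```

Returning `None` when the threshold is not exceeded leaves the hotel to be aggregated as a separate record.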
Second embodiment: as a supplementary explanation of the above embodiment, the present invention provides a specific hotel aggregation method for different suppliers, as shown in FIG. 2, comprising the steps of:
acquiring hotel data of all suppliers, wherein the hotel data comprises hotel names, hotel addresses and contact telephones; taking the hotels in the hotel data of any one supplier as basic hotels, and taking the hotels in the hotel data of the remaining suppliers as hotels to be aggregated. That is, the hotels of one supplier are selected as basic hotels, and each hotel to be aggregated from the other suppliers is then compared against the basic hotels one by one; if two hotels are the same, they are aggregated; if not, they are treated as different hotels. The judgment is made as follows:
acquiring first basic hotels, namely the basic hotels whose country, city and county in the hotel address are the same as those in the hotel address of the hotel to be aggregated. Pre-screening is performed first according to the country, city and county in the hotel address, because in practice only hotels in nearby locations are likely to be confused; calculating the name similarity only against the first basic hotels obtained by pre-screening greatly reduces the amount of calculation and improves the aggregation speed. It should be noted that the hotel data of the suppliers is generally normalized to a common data format, so hotel addresses generally contain a country, city and county; a basic hotel lacking this data is not used as a first basic hotel, to avoid errors.
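One possible way to implement this pre-screening is to index the basic hotels by region once and look up each hotel to be aggregated by the same key. The sketch below assumes hotels are plain dictionaries with hypothetical `country`, `city` and `county` fields; returning no candidates for a hotel to be aggregated that lacks region data is also an assumption, since the text only covers basic hotels with missing data.

```python
from collections import defaultdict


def build_region_index(base_hotels):
    """Group basic hotels by (country, city, county)."""
    index = defaultdict(list)
    for hotel in base_hotels:
        key = (hotel.get("country"), hotel.get("city"), hotel.get("county"))
        if all(key):  # basic hotels with missing region data are never candidates
            index[key].append(hotel)
    return index


def first_base_hotels(candidate, index):
    """Return the basic hotels located in the same country, city and county."""
    key = (candidate.get("country"), candidate.get("city"), candidate.get("county"))
    # Assumption: a candidate without full region data gets no matches.
    return index.get(key, []) if all(key) else []


base_hotels = [
    {"name": "A", "country": "CN", "city": "Beijing", "county": "Haidian"},
    {"name": "B", "country": "CN", "city": "Beijing", "county": "Chaoyang"},
    {"name": "C", "country": "CN", "city": "Beijing", "county": None},
]
index = build_region_index(base_hotels)
candidates = first_base_hotels(
    {"name": "A2", "country": "CN", "city": "Beijing", "county": "Haidian"}, index
)
```

With the index in place, each lookup touches only the basic hotels in one county instead of the full list.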
calculating the name similarity between the hotel to be aggregated and the first basic hotels according to the hotel names; sorting all first basic hotels in descending order of name similarity, and taking the first N first basic hotels in the sorted order as target basic hotels, wherein N is a positive integer. The name similarity is built from the attribute similarity of the segmented words of the same attribute in the hotel names of the hotel to be aggregated and the first basic hotel. Specifically, a word segmentation operation is performed on the hotel names of the hotel to be aggregated and of any first basic hotel; illustratively, an IK word segmenter is used for the word segmentation operation. The attribute of each segmented word is then obtained according to pre-constructed dictionaries, wherein the attributes comprise city, name identification word, description word, industry word and landmark. That is, dictionaries of the various vocabularies are built in advance, such as a city dictionary, a name identification word dictionary, a description word dictionary, an industry word dictionary and a landmark dictionary; the attribute of each segmented word is determined from these dictionaries, only words that exactly match a dictionary entry are split out, and the part of speech of each split word is recorded as its attribute. The city attribute covers the components of the hotel name that denote a city or administrative district. The name identification words are the strongly identifying words of a hotel name and have extremely strong distinguishing power; hotel names with different name identification words can generally be regarded directly as different hotels. Examples include brand names. Therefore, when preset weights are subsequently assigned, the name identification word generally receives the highest weight.
The description word describes the nature of the hotel, for example "business" or "express". The industry word indicates the type of hotel, for example "hotel", "inn" or "homestay". The landmark is used to distinguish different branches of a hotel, particularly two hotels of the same brand in nearby areas: each branch is located near a certain landmark, so the landmark is included in the hotel name to tell them apart.
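The dictionary lookup described above can be illustrated with a short sketch. The dictionary contents are hypothetical, and the input tokens are assumed to have been produced already by a word segmenter such as IK; the sketch only covers the step of assigning an attribute to each exactly matching token.

```python
# Hypothetical dictionary entries, chosen only for illustration.
DICTIONARIES = {
    "city": {"Beijing", "Shanghai"},
    "name_identifier": {"Sunrise", "Bluebird"},
    "descriptor": {"Business", "Express"},
    "industry": {"Hotel", "Inn", "Homestay"},
    "landmark": {"Railway Station", "Airport"},
}


def tag_tokens(tokens, dictionaries=DICTIONARIES):
    """Map each attribute to the tokens that exactly match its dictionary."""
    tagged = {}
    for token in tokens:
        for attribute, vocabulary in dictionaries.items():
            if token in vocabulary:  # exact match only
                tagged.setdefault(attribute, []).append(token)
                break  # a token receives at most one attribute
    return tagged


tagged = tag_tokens(["Beijing", "Sunrise", "Express", "Hotel", "North Gate"])
```

Tokens that match no dictionary, such as "North Gate" here, are simply left out of the result, which is how a hotel name can end up with fewer than five attributes.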
When the attribute is a name identification word, calculating the attribute similarity of the segmented words of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel includes: if the first name identification word of the hotel name of the basic hotel includes all words of the second name identification word of the hotel name of the hotel to be aggregated, the number of target words is acquired and the attribute similarity is calculated according to the number of target words, wherein the target words are the words in the first name identification word that are identical to words of the second name identification word, and the second name identification word consists of all the target words;
specifically, a first length of the first name identification word, a second length of the second name identification word, and a third length of the longest target word are acquired.
When the number of words is 1, the attribute similarity is calculated according to the first length and the second length, with the following formula: attribute similarity = second length / first length + preset adjustment coefficient × (1 - second length / first length) × (1 - 1/2^(second length - 1)). When the number of words is 1, the second name identification word appears in its entirety within the first name identification word; for example, the first name identification word is ABC and the second name identification word is AB. The degree of overlap is then high, so only the first length and the second length need to be compared.
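As a worked illustration of the single-word case, the sketch below evaluates the formula for the ABC / AB example. It reads the exponent in the formula as (second length - 1) and uses the preferred adjustment coefficient of 0.8; the function name is hypothetical.

```python
def single_word_similarity(first_length, second_length, adjustment=0.8):
    """Attribute similarity when the second identifier appears whole in the first."""
    ratio = second_length / first_length
    return ratio + adjustment * (1 - ratio) * (1 - 1 / 2 ** (second_length - 1))


# First name identification word "ABC" (length 3), second "AB" (length 2):
# 2/3 + 0.8 * (1/3) * (1 - 1/2) is approximately 0.8.
abc_ab = single_word_similarity(len("ABC"), len("AB"))

# Identical identifiers give exactly 1, because the (1 - ratio) factor vanishes.
identical = single_word_similarity(3, 3)
```

Under this reading, a one-character match contributes only the plain length ratio, while longer matches are pulled progressively closer to 1.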
When the number of words is greater than 1, the attribute similarity is calculated according to the first length, the second length, the third length and the number of words, with the following formula: attribute similarity = (third length / 2 + second length + 1 / number of words) × [second length / first length + preset adjustment coefficient × (1 - second length / first length) × (1 - 1/2^(third length - 1))]. The preset adjustment coefficient is in the range of 0.8 to 1, preferably 0.8. Illustratively, the first name identification word is ABCDE and the second name identification word is ABDE. That is, the second name identification word is split into several words that each appear in the first name identification word, so the attribute similarity is calculated from the first length and the second length and is further scaled according to the third length and the number of words.
When the attribute is not a name identification word, calculating the attribute similarity of the segmented words of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel includes: if the word of any attribute in the hotel name of the hotel to be aggregated is the same as the word of the same attribute in the hotel name of the basic hotel, the attribute similarity of that attribute is 1; if the word of any attribute in the hotel name of the hotel to be aggregated is different from the word of the same attribute in the hotel name of the basic hotel, the attribute similarity of that attribute is 0. Because the city, industry word, description word and landmark attributes are simple and direct and are assigned smaller weights, their similarity does not need elaborate calculation, which reduces the overall amount of computation.
If the first name identification word of the hotel name of the basic hotel does not include all words of the second name identification word of the hotel name of the hotel to be aggregated, the attribute similarity is 0. In this case the second name identification word contains a word that does not exist in the first name identification word; because name identification words are strongly identifying, the two name identification words are treated as completely different, and the attribute similarity is set directly to 0.
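The text does not say how the target words are found. One possible approach, shown here purely as an assumption, is a greedy longest-match split of the second name identification word into pieces that each occur in the first; if some part cannot be matched, the function returns `None`, which corresponds to an attribute similarity of 0.

```python
def target_words(first_identifier, second_identifier):
    """Split second_identifier into pieces found in first_identifier, or return None."""
    words = []
    start = 0
    while start < len(second_identifier):
        # Try the longest remaining piece first, then shorter ones.
        for end in range(len(second_identifier), start, -1):
            piece = second_identifier[start:end]
            if piece in first_identifier:
                words.append(piece)
                start = end
                break
        else:
            # Not even a single character matched: similarity is 0.
            return None
    return words


single = target_words("ABC", "AB")        # one target word
several = target_words("ABCDE", "ABDE")   # more than one target word
absent = target_words("ABC", "AX")        # "X" does not occur in "ABC"
```

The length of the returned list gives the number of words, and the longest element gives the third length used in the multi-word case.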
The product of the attribute similarity of each attribute and the corresponding attribute weight coefficient is calculated, and the products of all the attributes are added to obtain the name similarity between the hotel to be aggregated and the basic hotel. As an optional implementation of the embodiment of the invention, the attribute weight coefficient corresponding to each attribute adopts a preset weight. If the hotel name of the hotel to be aggregated includes segmented words of all five attributes, the attribute weight coefficient of each attribute is its preset weight. If it does not include segmented words of all five attributes, a first preset weight of the attributes not included in the hotel name of the hotel to be aggregated and a second preset weight of the attributes included in that hotel name are acquired, and the target weight of each attribute included in the hotel name of the hotel to be aggregated is calculated according to the first preset weight and the second preset weight, wherein the target weight calculation formula of any attribute is as follows: target weight = second preset weight + second preset weight / (1 - first preset weight) × first preset weight. That is, when one or more attributes are missing, the attribute weight coefficients of the missing attributes are redistributed to the attributes that are present, so that the name similarity can be obtained more accurately.
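The weighted sum can be sketched as follows. The weights and the name identification word similarity of 0.8 are hypothetical values; the other attributes use the exact-match rule (1 if the words are the same, 0 otherwise).

```python
def exact_match(word_a, word_b):
    """Attribute similarity for attributes other than the name identification word."""
    return 1.0 if word_a == word_b else 0.0


def name_similarity(attribute_similarity, weights):
    """Sum of attribute similarity times attribute weight coefficient."""
    return sum(
        weight * attribute_similarity.get(attribute, 0.0)
        for attribute, weight in weights.items()
    )


# Hypothetical attribute weight coefficients, chosen only for illustration.
WEIGHTS = {
    "city": 0.1,
    "name_identifier": 0.6,
    "descriptor": 0.1,
    "industry": 0.1,
    "landmark": 0.1,
}

similarities = {
    "city": exact_match("Beijing", "Beijing"),
    "name_identifier": 0.8,  # assumed result of the name identification word formula
    "descriptor": exact_match("Business", "Express"),
    "industry": exact_match("Hotel", "Hotel"),
    "landmark": exact_match("Airport", "Airport"),
}

# 0.1*1 + 0.6*0.8 + 0.1*0 + 0.1*1 + 0.1*1 is approximately 0.78.
score = name_similarity(similarities, WEIGHTS)
```

Because the name identification word carries most of the weight in this example, a mismatch in the descriptor lowers the score far less than a weak identifier match would.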
The name similarity corresponding to each target basic hotel is adjusted according to the hotel address and the contact telephone to obtain the target similarity, and the target basic hotel with the largest target similarity is taken as the target hotel. The hotel address comprises a road name and a house number. The adjustment is made as follows: if the hotel addresses of the hotel to be aggregated and the target basic hotel are the same, the target similarity of the target basic hotel is set to the full value, generally 100%; if the hotel addresses of the hotel to be aggregated and the target basic hotel are not identical, the name similarity of the target basic hotel is adjusted in the following way to obtain the target similarity: the value of the name similarity is taken as the hotel similarity; if the road names are the same but the house numbers are different, the current hotel similarity is reduced by a first preset value, because hotels with different house numbers are very likely not the same hotel; if the contact telephones of the hotel to be aggregated and the target basic hotel are the same, the current hotel similarity is increased by a second preset value, because two different hotels may share an owner or contact person, so an identical telephone number cannot by itself confirm that the hotels are the same but does justify increasing the hotel similarity; if the distance between the hotel to be aggregated and the target basic hotel, determined from the hotel addresses, is within a preset distance, the current hotel similarity is increased by a third preset value. Finally, the final hotel similarity is taken as the target similarity. The first preset value, the second preset value and the third preset value are set according to actual needs.
If the target similarity of the target hotel is greater than the preset similarity, the hotel to be aggregated is aggregated with the target hotel. The preset similarity is set according to actual requirements.
According to the specific hotel aggregation method for different suppliers provided by the embodiment of the invention, hotel data of all suppliers is acquired first; the hotels in the hotel data of one supplier are taken as basic hotels, and the hotels in the hotel data of the other suppliers are taken as hotels to be aggregated. Pre-screening according to the country, city and county in the hotel addresses yields the first basic hotels, which reduces the amount of calculation when the name similarity is subsequently calculated. The name similarity between the hotel to be aggregated and the first basic hotels is calculated from the hotel names, and the N first basic hotels with the highest name similarity are selected as target basic hotels. Because similar names alone cannot confirm that two records are the same hotel, the name similarity of each target basic hotel is then adjusted according to the hotel address and contact telephone to obtain the final target similarity. Because at most one target basic hotel can be the same hotel as the hotel to be aggregated, the target basic hotel with the largest target similarity is taken as the target hotel. Finally, if the target similarity of the target hotel is greater than the preset similarity, the target hotel and the hotel to be aggregated are judged to be the same hotel and are aggregated, which makes it easier for users to select a hotel.
In a third embodiment, the present invention provides a hotel aggregation device for different suppliers, as shown in fig. 3, comprising:
a hotel data acquisition module 31, configured to acquire hotel data of all suppliers, where the hotel data includes a hotel name, a hotel address and a contact phone; and to take the hotels in the hotel data of any one supplier as basic hotels and the hotels in the hotel data of the remaining suppliers as hotels to be aggregated;
the target basic hotel obtaining module 32 is configured to calculate the name similarity between any hotel to be aggregated and all basic hotels according to the hotel names, sort all basic hotels in descending order of name similarity, and take the top N basic hotels in the sorted order as target basic hotels, where N is a positive integer; specifically, the target basic hotel obtaining module 32 performs a word segmentation operation on the hotel names of the hotel to be aggregated and of any basic hotel; obtains, according to a pre-constructed dictionary, the attribute of each segmented word produced by the word segmentation operation, where the attributes comprise a city, a name identification word, a description word, an industry word and a landmark; calculates the attribute similarity of the segmented words of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel; and calculates the product of the attribute similarity of each attribute and the corresponding attribute weight coefficient, adding the products of all attributes to obtain the name similarity between the hotel to be aggregated and the basic hotel.
When the attribute is a name identification word, calculating the attribute similarity of the segmented words of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel comprises: if the first name identification word of the hotel name of the basic hotel includes all words of the second name identification word of the hotel name of the hotel to be aggregated, obtaining the number of target words and calculating the attribute similarity according to the number of target words, where a target word is a word in the first name identification word that is identical to a word of the second name identification word, and the second name identification word consists of all the target words; if the first name identification word of the hotel name of the basic hotel does not include all words of the second name identification word of the hotel name of the hotel to be aggregated, the attribute similarity is 0. Further, calculating the attribute similarity according to the number of target words comprises: acquiring a first length of the first name identification word, a second length of the second name identification word and a third length of the longest target word; when the number of words is 1, calculating the attribute similarity according to the first length and the second length, with the calculation formula:
attribute similarity = second length/first length + preset adjustment coefficient × (1 - second length/first length) × (1 - 1/2^(second length - 1));
When the number of words is greater than 1, the attribute similarity is calculated according to the first length, the second length, the third length and the number of words, with the calculation formula:
attribute similarity = (third length/2+second length+1/number of words) [ second length/first length+preset adjustment coefficient ] (third Length-1) )】。
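The single-word case (number of words equal to 1) can be illustrated with a short Python sketch. Several points are assumptions rather than statements from the text: the trailing "(second length - 1)" is read as an exponent of 2, "length" is taken to be the character count, the single target word is modelled as a contiguous substring, and the function name and the adjustment coefficient default of 0.5 are hypothetical. The multi-word formula is not sketched here.

```python
def single_word_identifier_similarity(first_identifier: str,
                                      second_identifier: str,
                                      k: float = 0.5) -> float:
    """Attribute similarity of name identification words, single target word case.

    first_identifier:  name identification word of the basic hotel
    second_identifier: name identification word of the hotel to be aggregated
    k:                 preset adjustment coefficient (assumed default)
    """
    # If the second identifier is not contained in the first one,
    # the attribute similarity is 0.
    if not second_identifier or second_identifier not in first_identifier:
        return 0.0
    first_length = len(first_identifier)
    second_length = len(second_identifier)
    ratio = second_length / first_length
    # ratio + k * (1 - ratio) * (1 - 1 / 2^(second_length - 1))
    return ratio + k * (1 - ratio) * (1 - 1 / 2 ** (second_length - 1))
```

Under this reading, an identical identifier yields 1, a one-character match yields only the plain length ratio, and longer matches receive a progressively larger share of the adjustment term.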
When the attribute is not a name identification word, calculating the attribute similarity of the segmented words of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel comprises:
if the segmented word of an attribute in the hotel name of the hotel to be aggregated is the same as the segmented word of the same attribute in the hotel name of the basic hotel, the attribute similarity of that attribute is 1;
if the segmented word of an attribute in the hotel name of the hotel to be aggregated is different from the segmented word of the same attribute in the hotel name of the basic hotel, the attribute similarity of that attribute is 0.
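The weighted combination of per-attribute similarities can be sketched as below. This is a minimal sketch, assuming each hotel name has already been segmented and tagged into a dictionary holding all five attributes; the attribute keys, the function names and the example weights are hypothetical, and the similarity function for name identification words is passed in as a parameter instead of being fixed here.

```python
ATTRIBUTES = ("city", "name_identifier", "descriptor", "industry", "landmark")


def exact_match_similarity(a: str, b: str) -> float:
    # Attributes other than the name identification word: 1 if identical, else 0.
    return 1.0 if a == b else 0.0


def name_similarity(pending: dict, base: dict, weights: dict,
                    identifier_similarity) -> float:
    """Sum over attributes of (attribute similarity x attribute weight coefficient).

    pending / base: segmented hotel names, one word per attribute (assumed layout).
    identifier_similarity(first, second): similarity of name identification words,
    called with the basic hotel's word first.
    """
    total = 0.0
    for attr in ATTRIBUTES:
        if attr == "name_identifier":
            sim = identifier_similarity(base[attr], pending[attr])
        else:
            sim = exact_match_similarity(pending[attr], base[attr])
        total += weights[attr] * sim
    return total
```

With weights that sum to 1, two names that agree on every attribute score 1, and each mismatching attribute removes exactly its own weight from the total.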
Further: if the hotel name of the hotel to be aggregated includes segmented words of all five attributes, the attribute weight coefficient of each attribute takes its preset weight;
if the hotel name of the hotel to be aggregated does not include segmented words of all five attributes, a first preset weight of the attributes not included in the hotel name of the hotel to be aggregated and a second preset weight of each attribute included in the hotel name of the hotel to be aggregated are acquired; the target weight of each attribute included in the hotel name of the hotel to be aggregated is calculated from the first preset weight and the second preset weight, the target weight of any attribute being calculated as:
target weight = second preset weight + second preset weight/(1 - first preset weight) × first preset weight.
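The target weight formula can be illustrated with a small Python sketch. It assumes that the first preset weight is the sum of the preset weights of all missing attributes and that the preset weights sum to 1; under those assumptions the formula simplifies to second preset weight/(1 - first preset weight), so the weights of the attributes that are present again sum to 1. The function name and the example weights are hypothetical.

```python
def target_weights(preset: dict, present: set) -> dict:
    """Redistribute the weight of missing attributes onto the attributes present.

    preset:  preset weight of every attribute (assumed to sum to 1)
    present: attributes found in the hotel name of the hotel to be aggregated
    """
    # First preset weight: total weight of the attributes that are missing (assumption).
    first = sum(w for attr, w in preset.items() if attr not in present)
    if first >= 1:
        raise ValueError("no attribute with non-zero weight is present")
    # target = second + second / (1 - first) * first, for each attribute present
    return {attr: w + w / (1 - first) * first
            for attr, w in preset.items() if attr in present}
```

For example, with preset weights of 0.2, 0.4, 0.1, 0.2 and 0.1 and two attributes of weight 0.1 missing, the remaining weights become 0.25, 0.5 and 0.25.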
As an alternative implementation manner of the embodiment of the present invention, the target basic hotel obtaining module 32 is configured to obtain a first basic hotel in which the country, city and county in the hotel address of the basic hotel are the same as the country, city and county in the hotel address of the hotel to be aggregated;
calculating the name similarity between the hotel to be aggregated and each first basic hotel according to the hotel names; and sorting all the first basic hotels in descending order of name similarity and taking the first N first basic hotels in the sorted order as target basic hotels, where N is a positive integer.
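The pre-screening and top-N selection can be sketched as follows. This is an illustrative sketch only: hotels are represented as plain dictionaries with hypothetical keys (`country`, `city`, `county`, `name`), the function name is an assumption, and the name similarity function is supplied by the caller.

```python
def top_n_candidates(pending: dict, base_hotels: list, name_similarity, n: int = 3) -> list:
    """Return up to n (name similarity, basic hotel) pairs, best first."""
    region = (pending["country"], pending["city"], pending["county"])
    # Preliminary screening: keep only basic hotels in the same country, city and county.
    first_base = [h for h in base_hotels
                  if (h["country"], h["city"], h["county"]) == region]
    # Name similarity is computed only for the screened hotels.
    scored = [(name_similarity(pending["name"], h["name"]), h) for h in first_base]
    # Sort in descending order of name similarity and keep the first n.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]
```

Screening first means the comparatively expensive name comparison runs only against hotels in the same county, not against the full list of basic hotels.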
The target hotel determination module 33 is configured to adjust the name similarity corresponding to the target basic hotel according to the hotel address and the contact phone to obtain a target similarity, and take the target basic hotel corresponding to the target similarity with the largest value as the target hotel; the hotel address comprises a road name and a house number; adjusting the name similarity corresponding to the target basic hotel according to the hotel address and the contact telephone to obtain the target similarity, wherein the method comprises the following steps:
if the hotel addresses of the hotels to be aggregated and the target basic hotels are the same, the target similarity of the target basic hotels is adjusted to be a full value;
If the hotel addresses of the hotel to be aggregated and the target basic hotel are not identical, the name similarity of the target basic hotel is adjusted according to the following mode to obtain the target similarity:
taking the value of the name similarity as hotel similarity;
if the road names are the same but the house numbers are different, reducing the hotel similarity at the current moment by a first preset value; if the contact phones of the hotel to be aggregated and the target basic hotel are the same, increasing the hotel similarity at the current moment by a second preset value; if the distance between the hotel to be aggregated and the target basic hotel is within the preset distance according to the hotel address, increasing the hotel similarity at the current moment by a third preset value;
and taking the final hotel similarity as the target similarity.
The hotel aggregation module 34 is configured to aggregate the hotel to be aggregated with the target hotel if the target similarity of the target hotel is greater than the preset similarity.
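The final selection and aggregation decision can be sketched in a few lines of Python. The sketch assumes the target similarities of the target basic hotels have already been computed and are keyed by a hypothetical hotel identifier; the function name and the preset similarity of 0.85 are illustrative assumptions.

```python
def choose_target(target_similarities: dict, preset_similarity: float = 0.85):
    """Return the id of the basic hotel to aggregate with, or None if there is no match.

    target_similarities maps a target basic hotel id to its target similarity.
    """
    if not target_similarities:
        return None
    # At most one basic hotel can be the same hotel: take the largest target similarity.
    best_id = max(target_similarities, key=target_similarities.get)
    # Aggregate only if the target similarity is strictly greater than the preset similarity.
    if target_similarities[best_id] > preset_similarity:
        return best_id
    return None
```

Returning `None` leaves the hotel to be aggregated unmatched, which the caller could treat as a new, separate hotel.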
According to the hotel aggregation device for different suppliers provided by this embodiment of the invention, hotel data of all suppliers are first obtained; the hotels in the hotel data of one supplier are taken as basic hotels, and the hotels in the hotel data of the other suppliers are taken as hotels to be aggregated. A preliminary screening by the country, city and county in the hotel address yields the first basic hotels, which reduces the amount of calculation when the name similarity is subsequently computed. The name similarity between the hotel to be aggregated and each first basic hotel is then calculated from the hotel names, and the N first basic hotels with the highest name similarity are selected as target basic hotels. Because similar names alone cannot fully confirm that two records are the same hotel, the name similarity of each target basic hotel is further adjusted according to the hotel address and the contact phone to obtain the final target similarity. Because at most one target basic hotel can be the same hotel as the hotel to be aggregated, the target basic hotel with the largest target similarity is taken as the target hotel. Finally, if the target similarity of the target hotel is greater than the preset similarity, the target hotel and the hotel to be aggregated are judged to be the same hotel and are aggregated, which makes it convenient for the user to select a hotel.
In a fourth embodiment, the present invention provides a hotel aggregation system for different suppliers, comprising:
a processor;
a memory for storing processor-executable instructions;
the processor is configured to perform the hotel aggregation methods of the different suppliers provided by the first embodiment or the second embodiment.
According to the hotel aggregation system for different suppliers provided by this embodiment of the invention, the memory stores instructions executable by the processor. When the instructions are executed, the processor acquires hotel data of all suppliers, takes the hotels in the hotel data of one supplier as basic hotels and takes the hotels in the hotel data of the other suppliers as hotels to be aggregated. The name similarity between the hotel to be aggregated and the basic hotels is first calculated from the hotel names, and the N basic hotels with the highest name similarity are selected as target basic hotels. Because similar names alone cannot fully confirm that two records are the same hotel, the name similarity of each target basic hotel is further adjusted according to the hotel address and the contact phone to obtain the final target similarity. Because at most one target basic hotel can be the same hotel as the hotel to be aggregated, the target basic hotel with the largest target similarity is taken as the target hotel. Finally, if the target similarity of the target hotel is greater than the preset similarity, the target hotel and the hotel to be aggregated are judged to be the same hotel and are aggregated, which makes it convenient for the user to select a hotel.
It is to be understood that the same or similar parts of the above embodiments may be cross-referenced, and that content not described in detail in one embodiment may be found in the same or similar parts of other embodiments.
It should be noted that in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, "plurality" means at least two.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims (7)

1. A hotel aggregation method for different suppliers, comprising the steps of:
acquiring hotel data of all suppliers, wherein the hotel data comprises hotel names, hotel addresses and contact phones; taking hotels of hotel data of any provider as basic hotels, and taking hotels in the hotel data of the rest providers as hotels to be aggregated;
calculating the name similarity of any hotel to be aggregated and all basic hotels according to the hotel names, sequencing all the basic hotels according to the sequence of the name similarity from big to small, and taking the first N basic hotels in the sequencing as target basic hotels, wherein N is a positive integer;
adjusting the name similarity corresponding to the target basic hotel according to the hotel address and the contact telephone to obtain target similarity, and taking the target basic hotel corresponding to the target similarity with the largest value as a target hotel;
if the target similarity of the target hotel is greater than the preset similarity, aggregating the hotel to be aggregated with the target hotel;
calculating the name similarity of any hotel to be aggregated and all basic hotels according to the hotel names, including:
performing word segmentation operation on hotel names of the hotels to be aggregated and any one of the basic hotels;
acquiring the attribute of the word segmentation obtained after the word segmentation operation according to a pre-constructed dictionary, wherein the attribute comprises a city, a name identification word, a description word, an industry word and a landmark;
calculating attribute similarity of the word segmentation of the hotel to be aggregated and the hotel name of the basic hotel;
calculating the product of the attribute similarity of each attribute and the corresponding attribute weight coefficient, and adding the products of all the attributes to obtain the name similarity of the hotel to be aggregated and the basic hotel;
when the attribute is a name identification word, the calculating the attribute similarity of the word segmentation of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel includes:
if all words of second name identification words of the hotel names of the hotels to be aggregated are included in the first name identification words of the hotel names of the basic hotels, the word number of target words is obtained, the attribute similarity is calculated according to the word number of the target words, the target words are words which are identical to the words of the second name identification words in the first name identification words, and the second name identification words consist of all the target words;
If the first name identification word of the hotel name of the basic hotel does not comprise all words of the second name identification word of the hotel name of the hotel to be aggregated, the attribute similarity is 0;
the calculating the attribute similarity according to the word number of the target word comprises the following steps:
acquiring a first length of the first name identification word, a second length of the second name identification word and a third length of the target word with the longest length;
when the number of words is 1, calculating the attribute similarity according to the first length and the second length, wherein the calculation formula is as follows:
attribute similarity = second length/first length + preset adjustment coefficient × (1 - second length/first length) × (1 - 1/2^(second length - 1));
When the number of words is greater than 1, calculating the attribute similarity according to the first length, the second length, the third length and the number of words, wherein the calculation formula is as follows:
attribute similarity = (third length/2+second length+1/number of words) [ second length/first length+preset adjustment coefficient ] (third Length-1) )】。
2. The method according to claim 1, characterized in that: when the attribute is not a name identification word, the calculating the attribute similarity of the word segmentation of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel includes:
If the word of any attribute in the hotel name of the hotel to be aggregated is the same as the word of the corresponding same attribute in the hotel name of the basic hotel, the attribute similarity of the attribute is 1;
if the word of any attribute in the hotel name of the hotel to be aggregated is different from the word of the same attribute corresponding to the hotel name of the basic hotel, the attribute similarity of the attribute is 0.
3. The method as recited in claim 1, further comprising:
if the hotel to be aggregated comprises the word segmentation of five attributes, the attribute weight coefficient corresponding to each attribute adopts a preset weight;
if the hotel to be aggregated does not all comprise the word segmentation of five attributes, acquiring a first preset weight of the attributes not included in the hotel name of the hotel to be aggregated and a second preset weight of the attributes included in the hotel name of the hotel to be aggregated; calculating the target weight of the attribute included in the hotel name of the hotel to be aggregated according to the first preset weight and the second preset weight, wherein the calculation formula of the target weight of any attribute is as follows:
target weight = second preset weight + second preset weight/(1-first preset weight) × first preset weight.
4. The method according to claim 1, characterized in that: the hotel address comprises a road name and a house number; the step of adjusting the name similarity corresponding to the target basic hotel according to the hotel address and the contact telephone to obtain target similarity comprises the following steps:
if the hotel addresses of the hotels to be aggregated and the target basic hotel are the same, adjusting the target similarity of the target basic hotel to be a full value;
if the hotel addresses of the hotels to be aggregated and the target basic hotels are not identical, adjusting the name similarity of the target basic hotels to obtain target similarity according to the following mode:
taking the value of the name similarity as hotel similarity;
if the road names are the same but the house numbers are different, reducing the hotel similarity at the current moment by a first preset value; if the contact phones of the hotels to be aggregated and the target basic hotels are the same, increasing the hotel similarity at the current moment by a second preset value; if the distance between the hotel to be aggregated and the target basic hotel is within the preset distance according to the hotel address, increasing the hotel similarity at the current moment by a third preset value;
And taking the final hotel similarity as the target similarity.
5. The method according to claim 1, characterized in that: calculating the name similarity of any hotel to be aggregated and all basic hotels according to the hotel names, including:
acquiring a first basic hotel of which the country, city and county in the hotel address of the basic hotel are the same as the country, city and county in the hotel address of the hotel to be aggregated;
calculating the name similarity of the hotels to be aggregated and the first basic hotels according to the hotel names; and sequencing all the first basic hotels according to the sequence of the name similarity from big to small, and taking the first N first basic hotels in the sequencing as target basic hotels, wherein N is a positive integer.
6. A hotel aggregation device for different suppliers, comprising:
the hotel data acquisition module is used for acquiring hotel data of all suppliers, wherein the hotel data comprises hotel names, hotel addresses and contact phones; taking hotels of hotel data of any provider as basic hotels, and taking hotels in the hotel data of the rest providers as hotels to be aggregated;
the target basic hotel acquisition module is used for calculating the name similarity between any hotel to be aggregated and all basic hotels according to the hotel names, sorting all the basic hotels in descending order of name similarity, and taking the first N basic hotels in the sorted order as target basic hotels, wherein N is a positive integer; calculating the name similarity between any hotel to be aggregated and all basic hotels according to the hotel names includes: performing a word segmentation operation on the hotel names of the hotel to be aggregated and of any one of the basic hotels; acquiring, according to a pre-constructed dictionary, the attribute of each segmented word obtained by the word segmentation operation, wherein the attributes comprise a city, a name identification word, a description word, an industry word and a landmark; calculating the attribute similarity of the segmented words in the hotel names of the hotel to be aggregated and the basic hotel; calculating the product of the attribute similarity of each attribute and the corresponding attribute weight coefficient, and adding the products of all the attributes to obtain the name similarity between the hotel to be aggregated and the basic hotel; when the attribute is a name identification word, the calculating the attribute similarity of the segmented words of the same attribute in the hotel names of the hotel to be aggregated and the basic hotel includes: if the first name identification word of the hotel name of the basic hotel includes all words of the second name identification word of the hotel name of the hotel to be aggregated, obtaining the number of target words and calculating the attribute similarity according to the number of target words, wherein a target word is a word in the first name identification word that is identical to a word of the second name identification word, and the second name identification word consists of all the target words; if the first name identification word of the hotel name of the basic hotel does not include all words of the second name identification word of the hotel name of the hotel to be aggregated, the attribute similarity is 0; the calculating the attribute similarity according to the number of target words includes: acquiring a first length of the first name identification word, a second length of the second name identification word and a third length of the longest target word; when the number of words is 1, calculating the attribute similarity according to the first length and the second length, wherein the calculation formula is as follows: attribute similarity = second length/first length + preset adjustment coefficient × (1 - second length/first length) × (1 - 1/2^(second length - 1)); when the number of words is greater than 1, calculating the attribute similarity according to the first length, the second length, the third length and the number of words, wherein the calculation formula is as follows: attribute similarity = (third length/2+second length+1/number of words) [ second length/first length+preset adjustment coefficient ] (third Length-1) )】;
The target hotel determination module is used for adjusting the name similarity corresponding to the target basic hotel according to the hotel address and the contact telephone to obtain target similarity, and taking the target basic hotel corresponding to the target similarity with the largest value as the target hotel;
and the hotel aggregation module is used for aggregating the hotels to be aggregated with the target hotels if the target similarity of the target hotels is greater than a preset similarity.
7. A hotel aggregation system of different suppliers, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to perform the method of any of claims 1-5.
CN202310044353.0A 2023-01-30 2023-01-30 Hotel aggregation method, device and system for different suppliers Active CN115797109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310044353.0A CN115797109B (en) 2023-01-30 2023-01-30 Hotel aggregation method, device and system for different suppliers

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310044353.0A CN115797109B (en) 2023-01-30 2023-01-30 Hotel aggregation method, device and system for different suppliers

Publications (2)

Publication Number Publication Date
CN115797109A CN115797109A (en) 2023-03-14
CN115797109B true CN115797109B (en) 2023-05-05

Family

ID=85429078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310044353.0A Active CN115797109B (en) 2023-01-30 2023-01-30 Hotel aggregation method, device and system for different suppliers

Country Status (1)

Country Link
CN (1) CN115797109B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874755B (en) * 2018-08-31 2024-04-12 阿里巴巴集团控股有限公司 Shop data processing method and device and electronic equipment
KR102375265B1 (en) * 2021-06-11 2022-03-15 안소현 A system of Hotel reservation that performs price comparison
CN114943462A (en) * 2022-06-02 2022-08-26 上海华客信息科技有限公司 Hotel group data processing method, system, equipment and storage medium
CN115392955B (en) * 2022-08-10 2024-03-01 中国银联股份有限公司 Store duplicate removal processing method, store duplicate removal processing device, store duplicate removal processing equipment and storage medium
CN115392961A (en) * 2022-08-18 2022-11-25 百威投资(中国)有限公司 Off-line merchant information matching method, device and storage medium

Also Published As

Publication number Publication date
CN115797109A (en) 2023-03-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant