CN110674367B

CN110674367B - Single Chinese character retrieval method and device based on travel industry products

Info

Publication number: CN110674367B
Application number: CN201910855488.9A
Authority: CN
Inventors: 王星杰; 洪晓; 李少辉; 朱少东; 赵文志; 王偕旭; 李乐天; 谢富成; 刘骏杰; 陈光站
Original assignee: Guangzhou Guangzhilv International Travel Service Co ltd; Guangzhou Yiqixing Information Technology Co ltd
Current assignee: Guangzhou Guangzhilv International Travel Service Co ltd; Guangzhou Yiqixing Information Technology Co ltd
Priority date: 2019-09-09
Filing date: 2019-09-09
Publication date: 2022-02-01
Anticipated expiration: 2039-09-09
Also published as: CN110674367A

Abstract

The invention discloses a single Chinese character retrieval method and device based on travel industry products. The method constructs, for each document to be compared, a coordinate system from the characters of that document and the query characters input by a user; matches the characters of the document to be compared against the query characters in each coordinate system and obtains a plurality of matching points according to a preset screening strategy; connects the matching points to obtain a plurality of similarity judgment lines; calculates a score for each similarity judgment line according to the degree of influence of the position, the position order degree of the characters, and the continuity; obtains a similarity score between the query characters and each document to be compared from the scores of the similarity judgment lines in that document; and sorts the documents to be compared by similarity score to obtain a query result. With this technical scheme, the difficulty of scoring and ranking travel products by similarity, which arises from the loss of word meaning in a character-table method full-text retrieval system, can be resolved.

Description

Single Chinese character retrieval method and device based on travel industry products

Technical Field

The invention relates to a search engine technology, in particular to a single Chinese character retrieval method and a single Chinese character retrieval device based on products in the tourism industry.

Background

Currently, in the field of online travel agencies (OTA), accurately locating the travel products a user is searching for in a massive travel product library, and recommending the most relevant products to the user, helps improve user retention and the order conversion rate. An efficient travel product search method is therefore an important precondition for rapid traffic conversion.

In the prior art, travel products are generally retrieved with either a vocabulary method (word-list) or a character-table method full-text retrieval system. A vocabulary method full-text retrieval system first segments the search content input by the user into words, then extracts keywords from the segmented content, and finally retrieves related travel products according to the keywords. However, such a system depends on a preset dictionary, which must be continuously maintained and updated. In the OTA field, product information for group tours, hotels, admission tickets and the like contains many proper nouns (e.g., "high-speed rail tour", "japanese tour", "ulan", "weijing international"), so retrieval with a vocabulary method system may segment phrases incorrectly, resulting in low precision. In addition, new words emerge endlessly in the OTA field, and it is difficult for any dictionary to record all of them.

A character-table method full-text retrieval system first splits the search content input by the user into single characters, and then retrieves related travel products according to those characters. It does not depend on a dictionary or on word segmentation, and its recall ratio can reach 100%. However, because the search content is split into single characters, the number of related travel products retrieved is large. Splitting also causes a loss of word meaning, which makes it difficult to score and rank the retrieved travel products by similarity. The user therefore has to spend a large amount of time finding the travel products relevant to his or her needs, which in turn reduces user retention and the order conversion rate.

Disclosure of Invention

The embodiment of the invention provides a single Chinese character retrieval method and device based on travel industry products, which can resolve the difficulty of scoring and ranking travel products by similarity that arises from the loss of word meaning in a character-table method full-text retrieval system.

The embodiment of the invention provides a single Chinese character retrieval method based on products in the tourism industry, which comprises the following steps:

acquiring N query characters input by a user, wherein N is a positive integer greater than 0;

acquiring all documents to be compared according to the N query characters, wherein each document to be compared comprises a plurality of characters, and the document to be compared comprises the N query characters;

respectively constructing a plane rectangular coordinate system corresponding to each document to be compared according to the characters in each document to be compared and the N query characters; one document to be compared corresponds to one rectangular plane coordinate system;

matching, in each plane rectangular coordinate system, the characters of the document to be compared that are identical to the N query characters, to obtain coordinates of a plurality of matching points forming first matching point coordinate data; one plane rectangular coordinate system corresponds to one set of first matching point coordinate data;

screening the conflict matching points in the coordinate data of each first matching point according to a preset screening strategy to obtain coordinate data of a second matching point;

connecting the matching points in the second matching point coordinate data according to a preset connection rule to obtain a plurality of similarity judgment lines;

respectively calculating the similarity score of each similarity judgment line according to the influence degree of the position, the position sequence degree of the character and the continuity degree;

respectively obtaining similarity scores between the N query characters and the documents to be compared according to the similarity scores of the similarity judgment lines in the documents to be compared;

and sorting the documents to be compared according to their similarity scores to obtain a query result.

As a preferred scheme, according to a preset screening strategy, screening the conflicting matching points in each first matching point coordinate data to obtain second matching point coordinate data, specifically:

judging whether the first matching point coordinate data has a conflict matching point; if the first matching point coordinate data does not have the conflict matching point, marking the first matching point data as second matching point coordinate data;

if the first matching point coordinate data contains the conflict matching points, extracting all the conflict matching points from the first matching point coordinate data;

calculating a cosine value between each straight line and the abscissa axis according to the straight line determined by each conflict matching point and a matching point adjacent to the right side of the conflict matching point, and sequentially obtaining a first cosine value corresponding to each conflict matching point;

respectively calculating the difference between the first cosine value and the optimal reference value, and acquiring a conflict matching point with the minimum difference;

and eliminating all the conflict matching points in the first matching point coordinate data except the conflict matching point with the minimum difference value to obtain second matching point coordinate data.

Preferably, the similarity score of each similarity judgment line is calculated by a formula (not reproduced in this text) in which f(X_i) is the degree of influence of the position, sim(L_i) is the position order degree of the characters, and (X_{i+1} - X_i) is the continuity.

Preferably, the degree of influence of the position is calculated by a function f(X_i) (formula not reproduced in this text), wherein the value range of f(X_i) is (0,1), the constant a is a hyperparameter that can be adjusted according to the actual situation, and X_i is the abscissa value of the i-th matching point.

Preferably, the position order degree of the characters is calculated by a formula for sim(L_i) (not reproduced in this text), wherein L_i is the straight line passing through point (X_i, Y_i) and point (X_{i+1}, Y_{i+1}), and cos θ_i is the cosine of the angle between the abscissa axis and the straight line through matching point M_i = (X_i, Y_i) and its right-adjacent matching point M_{i+1} = (X_{i+1}, Y_{i+1}).

Preferably, the continuity is obtained from a difference in abscissa values between two matching points passing through the similarity judgment line.

Correspondingly, an embodiment of the invention further provides a single Chinese character retrieval device based on travel industry products, comprising:

an acquisition module, configured to acquire N query characters input by a user, wherein N is a positive integer;

the query module is used for acquiring all documents to be compared according to the N query characters, wherein each document to be compared comprises a plurality of characters, and the document to be compared comprises the N query characters;

the rectangular coordinate system generating module is used for respectively constructing a planar rectangular coordinate system corresponding to each document to be compared according to the characters in each document to be compared and the N query characters; one document to be compared corresponds to one rectangular plane coordinate system;

the matching point data acquisition module is used for matching, in each plane rectangular coordinate system, the characters of the document to be compared that are identical to the N query characters, to obtain coordinates of a plurality of matching points forming first matching point coordinate data; one plane rectangular coordinate system corresponds to one set of first matching point coordinate data;

the matching point data screening module is used for screening the conflict matching points in the first matching point coordinate data according to a preset screening strategy to obtain second matching point coordinate data;

a similarity judgment line obtaining module, configured to connect matching points in each of the second matching point coordinate data according to a preset connection rule to obtain multiple similarity judgment lines;

the first calculation module is used for respectively calculating the similarity score of each similarity judgment line according to the influence degree of the position, the position sequence degree of the character and the continuity degree;

a second calculation module, configured to obtain similarity scores between the N query characters and the documents to be compared, respectively, according to similarity scores of the similarity judgment lines in the documents to be compared;

and the ranking module is used for ranking the documents to be compared according to the similarity scores of the documents to be compared to obtain a query result.

The embodiment of the invention has the following beneficial effects:

The embodiment of the invention provides a single Chinese character retrieval method based on travel industry products. The method constructs, for each document to be compared, a coordinate system from the characters of that document and the query characters input by a user; matches the characters of the document to be compared against the query characters in each coordinate system and obtains a plurality of matching points according to a preset screening strategy; connects the matching points to obtain a plurality of similarity judgment lines; calculates a score for each similarity judgment line according to the degree of influence of the position, the position order degree of the characters, and the continuity; obtains a similarity score between the query characters and each document to be compared from the scores of the similarity judgment lines in that document; and sorts the documents to be compared by similarity score to obtain a query result. Compared with the prior art, in which travel products are retrieved by character-table method full-text retrieval alone, the invention can resolve the difficulty of scoring and ranking travel products by similarity that arises from the loss of word meaning in a character-table method full-text retrieval system.

Drawings

Fig. 1 is a schematic flow chart of a first embodiment of the single Chinese character retrieval method based on travel industry products according to the present invention;

Fig. 2 is a schematic structural diagram of a second embodiment of the single Chinese character retrieval device based on travel industry products according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic flow chart of a single Chinese character retrieval method based on travel industry products according to a first embodiment of the present invention. As shown in Fig. 1, the method includes steps 101 to 109, as follows:

Step 101: Acquire N query characters input by a user, wherein N is a positive integer.

Step 102: Acquire all documents to be compared according to the N query characters, wherein each document to be compared comprises a plurality of characters and contains the N query characters.

In this embodiment, all documents to be compared are acquired with the character-table method full-text retrieval system according to the N query characters, which helps ensure that the recall ratio of the acquired documents is 100%. The character-table method full-text retrieval system also does not depend on a dictionary or on word segmentation, and handles new words well.

In this embodiment, the documents to be compared are travel products. For example, if the N query characters input by the user are "上海双飞" (Shanghai round-trip flight), an acquired document to be compared is "上海双飞5日游" (Shanghai round-trip flight 5-day tour), which contains all four query characters input by the user.
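The candidate retrieval of step 102 can be illustrated with a minimal sketch of a single-character inverted index. The function names `build_char_index` and `candidates` are hypothetical, and the second and third product titles are invented for illustration; the text does not describe how the character-table system is implemented, so this is only one plausible arrangement.

```python
from collections import defaultdict


def build_char_index(titles):
    """Map each character to the set of title ids that contain it."""
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for ch in title:
            index[ch].add(doc_id)
    return index


def candidates(index, query):
    """Return the ids of titles that contain every character of the query."""
    postings = [index.get(ch, set()) for ch in query]
    if not postings:
        return set()
    return set.intersection(*postings)


# The first title is the example from the text; the others are made up.
titles = ["上海双飞5日游", "北京双飞3日游", "上海迪士尼门票"]
index = build_char_index(titles)
hits = candidates(index, "上海双飞")
```

Because no dictionary or word segmentation is involved, every title containing all query characters is returned, regardless of their order or adjacency. Ordering those candidates is what the remaining steps address.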

Step 103: respectively constructing a plane rectangular coordinate system corresponding to each document to be compared according to the characters in each document to be compared and the N query characters; wherein, one document to be compared corresponds to one rectangular plane coordinate system.

For example, the N query characters are "广州宾馆" (Guangzhou Binguan, a hotel name) and the document to be compared is "广州宾馆(原广州酒店)" (Guangzhou Binguan, formerly Guangzhou Jiudian). Taking the N query characters as the ordinate and the document to be compared as the abscissa, with each character occupying one unit length, a plane rectangular coordinate system is established. The coordinates of the query characters are: P_{广}(0,0), P_{州}(0,1), P_{宾}(0,2), P_{馆}(0,3). The coordinates of the characters in the document to be compared are: P_{广}(0,0), P_{州}(1,0), P_{宾}(2,0), P_{馆}(3,0), P_{(}(4,0), P_{原}(5,0), P_{广}(6,0), P_{州}(7,0), P_{酒}(8,0), P_{店}(9,0), P_{)}(10,0).

As a variant of this embodiment, the plane rectangular coordinate system may instead be established with the query characters as the abscissa and the document to be compared as the ordinate. In that case, the single Chinese character retrieval method can still be realized by adjusting the formulas in the subsequent steps accordingly.

Step 104: In each plane rectangular coordinate system, match the characters of the document to be compared that are identical to the N query characters, and obtain the coordinates of a plurality of matching points to form first matching point coordinate data; one plane rectangular coordinate system corresponds to one set of first matching point coordinate data.

For example, the N query characters are "广州宾馆" and the document to be compared is "广州宾馆(原广州酒店)". In the plane rectangular coordinate system constructed from them, the characters of the document to be compared that are identical to query characters, namely "广", "州", "宾", "馆", "广", "州", are matched, and their coordinates P_{广1}(0,0), P_{州1}(1,1), P_{宾}(2,2), P_{馆}(3,3), P_{广2}(6,0), P_{州2}(7,1) form the first matching point coordinate data.
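Steps 103 and 104 can be sketched together as follows, assuming the axis convention of the example (document position as abscissa, query position as ordinate). The function name `match_points` is hypothetical.

```python
def match_points(query, document):
    """Return (x, y) pairs where document[x] == query[y], ordered left to right.

    x is the character position in the document (abscissa) and
    y is the character position in the query (ordinate).
    """
    points = []
    for x, doc_ch in enumerate(document):
        for y, query_ch in enumerate(query):
            if doc_ch == query_ch:
                points.append((x, y))
    return points


# The example from the text: no abscissa value carries more than one point.
points = match_points("广州宾馆", "广州宾馆(原广州酒店)")

# A query with a repeated character yields several points on one abscissa value.
repeated = match_points("广东广州宾馆", "广东广州宾馆(原广州酒店)")
```

For the first pair this gives the six matching points listed above. A query that repeats a character, as in the second call, produces more than one point on the same abscissa value.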

Step 105: Screen the conflict matching points in each set of first matching point coordinate data according to a preset screening strategy to obtain second matching point coordinate data.

In this embodiment, step 105 specifically includes: judging whether the first matching point coordinate data contains conflict matching points, where several matching points that share the same abscissa value are called conflict matching points. If the first matching point coordinate data contains no conflict matching point, it is marked as the second matching point coordinate data. If it does, all conflict matching points are extracted from the first matching point coordinate data. For each conflict matching point, the cosine value between the abscissa axis and the straight line determined by that conflict matching point and the matching point adjacent to it on the right is calculated, giving in turn a first cosine value for each conflict matching point. The difference between each first cosine value and the optimal reference value is then calculated, and the conflict matching point with the minimum difference is acquired. Finally, all conflict matching points other than the one with the minimum difference are eliminated from the first matching point coordinate data to obtain the second matching point coordinate data.

For example, if the N query characters input by the user are "广州宾馆" and the document to be compared is "广州宾馆(原广州酒店)", then in the plane rectangular coordinate system constructed from them, the characters of the document to be compared that are identical to query characters, namely "广", "州", "宾", "馆", "广", "州", are matched, and their coordinates P_{广1}(0,0), P_{州1}(1,1), P_{宾}(2,2), P_{馆}(3,3), P_{广2}(6,0), P_{州2}(7,1) form the first matching point coordinate data. This data contains no conflict matching point, so it is marked as the second matching point coordinate data.

If the N query characters input by the user are "广东广州宾馆" (Guangdong Guangzhou Binguan) and the document to be compared is "广东广州宾馆(原广州酒店)", then in the plane rectangular coordinate system constructed from them, the coordinates of the query characters are: P_{广}(0,0), P_{东}(0,1), P_{广}(0,2), P_{州}(0,3), P_{宾}(0,4), P_{馆}(0,5). The coordinates of the characters in the document to be compared are: P_{广}(0,0), P_{东}(1,0), P_{广}(2,0), P_{州}(3,0), P_{宾}(4,0), P_{馆}(5,0), P_{(}(6,0), P_{原}(7,0), P_{广}(8,0), P_{州}(9,0), P_{酒}(10,0), P_{店}(11,0), P_{)}(12,0). The characters of the document to be compared that are identical to query characters, namely "广", "东", "广", "州", "宾", "馆", "广", "州", are matched, and their coordinates P_{广1}(0,0), P_{广1}(0,2), P_{东}(1,1), P_{广2}(2,0), P_{广2}(2,2), P_{州1}(3,3), P_{宾}(4,4), P_{馆}(5,5), P_{广3}(8,0), P_{广3}(8,2), P_{州2}(9,3) form the first matching point coordinate data. Since this data contains conflict matching points for "广", the calculation proceeds as follows:

First, the conflict matching points are taken in ascending order of ordinate value and denoted M_{ij}, where index i denotes the i-th matching point and index j denotes the j-th conflict matching point: P_{广1}(0,0) is M_{1,1}, P_{广1}(0,2) is M_{2,1}, P_{广2}(2,0) is M_{4,2}, P_{广2}(2,2) is M_{5,2}, P_{广3}(8,0) is M_{9,3}, and P_{广3}(8,2) is M_{10,3}. Taking P_{广2}(2,0) and P_{广2}(2,2) as an example, the cosine value is calculated for each of them together with its right-adjacent matching point P_{州1}(3,3), from their abscissa and ordinate values. The formula for the cosine value is not reproduced in this text.

Substituting P_{广2}(2,0) and P_{州1}(3,3) into the formula gives cos θ_1.

In the same way, substituting P_{广2}(2,2) and its adjacent matching point P_{州1}(3,3) gives cos θ_2.

Next, the differences between cos θ_1, cos θ_2 and the optimal reference value for 45 degrees are calculated, and the conflict matching point with the smaller difference is selected. Here P_{广2}(2,0) is selected, and the conflict matching point P_{广2}(2,2) is therefore removed.

In this embodiment, if M_i is a conflict matching point and M_{i+1} is also a conflict matching point, the selection for M_{i+2} should be determined first, after which the calculation returns to M_{i+1}, and so on.

Step 106: Connect the matching points in each set of second matching point coordinate data according to a preset connection rule to obtain a plurality of similarity judgment lines.

In this embodiment, taking the plane rectangular coordinate system constructed from "广州宾馆" and "广州宾馆(原广州酒店)" as an example, the matching points in the second matching point coordinate data are P_{广1}(0,0), P_{州1}(1,1), P_{宾}(2,2), P_{馆}(3,3), P_{广2}(6,0), P_{州2}(7,1). Connecting the matching points from left to right gives L_1, L_2, L_3, L_4, L_5, where L_1 is the straight line through P_{广1}(0,0) and P_{州1}(1,1), L_2 is the straight line through P_{州1}(1,1) and P_{宾}(2,2), L_3 is the straight line through P_{宾}(2,2) and P_{馆}(3,3), L_4 is the straight line through P_{馆}(3,3) and P_{广2}(6,0), and L_5 is the straight line through P_{广2}(6,0) and P_{州2}(7,1).
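The left-to-right connection rule of step 106 can be sketched as below. The names `judgment_lines` and `slope` are hypothetical, and the sketch assumes that screening has already left at most one matching point per abscissa value, so no line is vertical.

```python
def judgment_lines(points):
    """Connect matching points left to right; each line is a (start, end) pair."""
    ordered = sorted(points)
    return list(zip(ordered, ordered[1:]))


def slope(line):
    """Slope k of a similarity judgment line (assumes distinct abscissa values)."""
    (x1, y1), (x2, y2) = line
    return (y2 - y1) / (x2 - x1)


# Second matching point coordinate data from the example in the text.
points = [(0, 0), (1, 1), (2, 2), (3, 3), (6, 0), (7, 1)]
lines = judgment_lines(points)
slopes = [slope(line) for line in lines]
```

Six matching points give five lines. Four of them have slope 1. The fourth, which joins the end of the first matched run to the start of the second, descends with slope -1.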

Step 107: Calculate the similarity score of each similarity judgment line according to the degree of influence of the position, the position order degree of the characters, and the continuity.

In this embodiment, the degree of influence of the position reflects the relationship between the position at which the N query characters occur and the starting position of the document to be compared. For example, if the N query characters are "Guangzhou", the document to be compared S1 is "Guangzhou Hotel", and the document to be compared S2 is "Guangdong Guangzhou Hotel", then the correlation between "Guangzhou" and "Guangzhou Hotel" is higher than the correlation between "Guangzhou" and "Guangdong Guangzhou Hotel", because the match starts closer to the beginning of S1.

In this embodiment, the position order degree of the characters reflects the relationship between the order of the characters in the N query characters input by the user and the order of the same characters in the document to be compared. For example, if the N query characters are "guangzhou", the document to be compared S1 is "guangzhou winston", and the document to be compared S2 is "winston hotel in guangxi congratu", then the correlation of S1 with the query is higher than that of S2.

In this embodiment, the continuity reflects the length of the run of characters in the document to be compared that consecutively match the N query characters input by the user. For example, if the N query characters are "guangdong zhou", the document to be compared S1 is "guangdong guangzhou shengliang", and the document to be compared S2 is "guangdong guangzhou", then the number of characters that S1 matches consecutively with the query is greater than the number that S2 matches consecutively, so S1 has a higher correlation than S2.

In this embodiment, the similarity score of each similarity judgment line is calculated by a formula (not reproduced in this text) in which f(X_i) is the degree of influence of the position, sim(L_i) is the position order degree of the characters, and (X_{i+1} - X_i) is the continuity; the result is the similarity score of the i-th similarity judgment line.

In this embodiment, when calculating the degree of influence of the position, the smaller the abscissa of the starting matching point of the line segment, the greater the degree of influence. The degree of influence of the position is therefore calculated by a function f(X_i) (formula not reproduced in this text), wherein the value range of f(X_i) is (0,1); the constant a is a hyperparameter that can be adjusted according to the actual situation, with value range (-∞, +∞); and X_i is the abscissa value of the i-th matching point.

In this embodiment, when calculating the position order degree of the characters, the inventors found through a large number of experiments that when the positions of the N query characters match and are continuous, θ ∈ [0°, 90°), the slope k of the connected similarity judgment line equals 1, and the line forms an angle θ of 45 degrees with the abscissa axis. The highest score is therefore obtained when the angle θ is 45 degrees. When θ ∈ [90°, 180°], the score is punitive, i.e., negative, and the penalty should grow as θ increases. This yields a piecewise formula for the similarity (not reproduced in this text).

The position order degree of the characters, sim(L_i), is calculated accordingly, wherein L_i is the straight line passing through matching point M_i(X_i, Y_i) and matching point M_{i+1}(X_{i+1}, Y_{i+1}), and cos θ_i is the cosine of the angle between the abscissa axis and the straight line through M_i and its right-adjacent matching point M_{i+1}. The sign of sim(L_i) controls whether a line adds to or subtracts from the score, which effectively improves the precision ratio.

In this embodiment, when calculating the continuity, the inventors found through a large number of experiments that the longer the similarity judgment line, the higher its degree of influence. The continuity of a similarity judgment line is therefore calculated with the formula len(L_i) = X_{i+1} - X_i, i.e., the difference between the abscissa values of the two matching points through which the line passes.
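The three factors of step 107 can be sketched as follows, under loudly stated assumptions. Only `continuity` follows a formula given in the text. The formulas for f(X_i) and sim(L_i) are not reproduced, so `position_influence` and `order_degree` are hypothetical stand-ins chosen only to satisfy the stated properties (f in (0,1) and larger for smaller abscissa; sim highest at 45 degrees and negative from 90 degrees). Combining the factors as a product, and reading θ as the inclination of the line in [0°, 180°), are also assumptions.

```python
import math


def continuity(line):
    """len(L_i) = X_{i+1} - X_i, as stated in the text."""
    (x1, _), (x2, _) = line
    return x2 - x1


def inclination_deg(line):
    """Inclination of the line in [0, 180) degrees (assumed reading of theta)."""
    (x1, y1), (x2, y2) = line
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def position_influence(x, a=0.0):
    """Hypothetical stand-in for f(X_i): a decreasing sigmoid with values in (0, 1)."""
    return 0.5 * (1.0 - math.tanh((x - a) / 2.0))


def order_degree(line):
    """Hypothetical stand-in for sim(L_i): peaks at 45 degrees, negative beyond 90."""
    theta = math.radians(inclination_deg(line))
    if theta < math.pi / 2:
        return 1.0 - abs(math.cos(theta) - math.cos(math.pi / 4))
    return math.cos(theta)


def line_score(line, a=0.0):
    """Assumed combination: the product of the three factors."""
    start_x = line[0][0]
    return position_influence(start_x, a) * order_degree(line) * continuity(line)


ascending = ((0, 0), (1, 1))   # slope 1 at the start of the title
descending = ((3, 3), (6, 0))  # jump back to the start of the query
scores = [line_score(ascending), line_score(descending)]
```

With these stand-ins, a slope-1 line near the start of the title receives a positive score and a descending line receives a negative one, which mirrors the adding and subtracting behaviour described above. The actual magnitudes depend entirely on the assumed functions.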

Step 108: Obtain the similarity score between the N query characters and each document to be compared according to the similarity scores of the similarity judgment lines in that document.

In this embodiment, the similarity scores of the similarity judgment lines in each document to be compared are summed:

$$S = \sum_{i} s(L_i)$$

wherein S is the similarity score of the document to be compared, L_i is the i-th similarity judgment line, and s(L_i) is the similarity score of the i-th similarity judgment line. The similarity score of each document to be compared can be obtained through this formula; the higher the similarity score, the higher the similarity. Generally, if none of the similarity judgment lines of a document has a slope k equal to 1, that document to be compared is eliminated.

Step 109: Sort the documents to be compared according to their similarity scores to obtain a query result.
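Steps 108 and 109 amount to a sum followed by a sort, which the following sketch illustrates. The function name `rank_documents`, the document labels, and the per-line scores are all hypothetical values chosen for illustration; the elimination rule for documents without a slope-1 line is not included.

```python
def rank_documents(line_scores_by_doc):
    """Sum the per-line scores of each document and sort by total, highest first."""
    totals = {doc: sum(scores) for doc, scores in line_scores_by_doc.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up per-line similarity scores for three candidate documents.
line_scores_by_doc = {
    "doc_a": [0.5, 0.3, -0.2],
    "doc_b": [0.9, 0.4],
    "doc_c": [-0.1],
}
ranking = rank_documents(line_scores_by_doc)
order = [doc for doc, _ in ranking]
```

Negative line scores lower a document's total, so a document whose matches are out of order can rank below one with fewer but well-ordered matches.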

In view of the above, the single Chinese character retrieval method based on travel industry products according to the embodiment of the present invention constructs, for each document to be compared, a coordinate system from the characters of that document and the query characters input by a user; matches the characters of the document to be compared against the query characters in each coordinate system and obtains a plurality of matching points according to a preset screening strategy; connects the matching points to obtain a plurality of similarity judgment lines; calculates a score for each similarity judgment line according to the degree of influence of the position, the position order degree of the characters, and the continuity; obtains a similarity score between the query characters and each document to be compared from the scores of the similarity judgment lines in that document; and sorts the documents to be compared by similarity score to obtain a query result. Compared with the prior art, in which travel products are retrieved by character-table method full-text retrieval alone, the invention can resolve the difficulty of scoring and ranking travel products by similarity that arises from the loss of word meaning in a character-table method full-text retrieval system.

Second embodiment of the invention:

Fig. 2 is a schematic structural diagram of a single Chinese character retrieval device based on travel industry products according to a second embodiment of the present invention. The device includes: an acquisition module 201, a query module 202, a rectangular coordinate system generation module 203, a matching point data acquisition module 204, a matching point data screening module 205, a similarity judgment line acquisition module 206, a first calculation module 207, a second calculation module 208 and a sorting module 209.

The acquisition module 201 is configured to obtain N query characters input by a user, where N is a positive integer;

the query module 202 is configured to obtain all documents to be compared according to the N query characters, where each document to be compared includes a plurality of characters, among them the N query characters;

the rectangular coordinate system generation module 203 is configured to construct a planar rectangular coordinate system for each document to be compared, according to the characters in that document and the N query characters; one document to be compared corresponds to one planar rectangular coordinate system;

the matching point data acquisition module 204 is configured to match, in each planar rectangular coordinate system, the characters of the document to be compared against the N query characters, take each pair of identical characters as a matching point, obtain the coordinates of the plurality of matching points, and form first matching point coordinate data; one planar rectangular coordinate system corresponds to one set of first matching point coordinate data;

the matching point data screening module 205 is configured to screen the conflicting matching points in each set of first matching point coordinate data according to a preset screening policy, so as to obtain second matching point coordinate data;

the similarity judgment line acquisition module 206 is configured to connect the matching points in each set of second matching point coordinate data according to a preset connection rule, so as to obtain a plurality of similarity judgment lines;

the first calculation module 207 is configured to calculate a similarity score for each similarity judgment line according to the influence degree of the position, the position sequence degree of the characters, and the continuity degree;

the second calculation module 208 is configured to obtain the similarity score between the N query characters and each document to be compared, according to the similarity scores of the similarity judgment lines in that document;

and the sorting module 209 is configured to sort the documents to be compared according to their similarity scores, so as to obtain a query result.
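The chain of modules 201 to 209 can be sketched as one function. This is only an illustration under stated assumptions: the screening policy, the connection rule and the line score are passed in as callables because this section does not fix them, and the stand-ins used in the example (`diagonal_runs`, the squared line length, summation over lines) are hypothetical and are not the formulas of the invention.

```python
from typing import Callable, Iterable

Point = tuple[int, int]   # (x, y): document position, query position (assumed, 1-based)
Line = list[Point]


def retrieve(
    query: str,
    corpus: list[str],
    screen: Callable[[list[Point]], list[Point]],
    connect: Callable[[list[Point]], list[Line]],
    score_line: Callable[[Line], float],
    combine: Callable[[Iterable[float]], float] = sum,
) -> list[str]:
    # 201/202: keep only documents that contain all N query characters.
    candidates = [doc for doc in corpus if all(ch in doc for ch in query)]
    scored = []
    for doc in candidates:
        # 203/204: one coordinate system per document, first matching point data.
        first = [(x, y) for x, dc in enumerate(doc, 1)
                 for y, qc in enumerate(query, 1) if dc == qc]
        second = screen(first)        # 205: preset screening policy
        lines = connect(second)       # 206: preset connection rule
        # 207/208: score each line, then combine into one document score.
        scored.append((combine(score_line(line) for line in lines), doc))
    # 209: sort documents by similarity score, highest first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


def diagonal_runs(points: list[Point]) -> list[Line]:
    """Hypothetical connection rule: join points that continue along a diagonal."""
    remaining, lines = set(points), []
    for p in sorted(points):
        if p not in remaining:
            continue
        line = [p]
        remaining.discard(p)
        nxt = (p[0] + 1, p[1] + 1)
        while nxt in remaining:
            line.append(nxt)
            remaining.discard(nxt)
            nxt = (nxt[0] + 1, nxt[1] + 1)
        lines.append(line)
    return lines


ranked = retrieve(
    "长隆",
    ["长沙到兴隆两日游", "广州长隆一日游", "北京三日游"],
    screen=lambda pts: pts,                   # placeholder: no points are screened
    connect=diagonal_runs,
    score_line=lambda line: len(line) ** 2,   # placeholder: rewards longer runs
)
print(ranked)
```

Passing the three preset rules as parameters keeps the module order of the device visible while leaving each rule replaceable.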

For the more detailed working principle and process of this embodiment, reference may be made to, but is not limited to, the single Chinese character retrieval method based on products in the travel industry described in the first embodiment.

As can be seen from the above, the single Chinese character retrieval device based on products in the travel industry according to this embodiment of the present invention calculates, for each similarity judgment line, the influence degree of the position, the position sequence degree of the characters, and the continuity degree; from these it obtains the similarity score between each document to be compared and the query characters input by the user, and then sorts the documents to be compared. This solves the problem that, under a word list full-text retrieval system, tourism products are difficult to score and sort by similarity because word meaning is lost.
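Of the three factors, the continuity degree is stated in the claims to be obtained from the difference between the abscissa values of two matching points on a similarity judgment line. The sketch below computes only these raw differences. The function names are hypothetical, the abscissa is assumed to be the position in the document, and treating a difference of 1 as "adjacent" is an assumption, not the scoring rule of the invention.

```python
Point = tuple[int, int]  # (x, y): x is the abscissa, assumed to be the document position


def abscissa_gaps(line: list[Point]) -> list[int]:
    """Differences between the abscissas of consecutive matching points on one line."""
    ordered = sorted(line)
    return [b[0] - a[0] for a, b in zip(ordered, ordered[1:])]


def is_contiguous(line: list[Point]) -> bool:
    """Assumed reading: every pair of neighbouring points is one character apart."""
    return all(gap == 1 for gap in abscissa_gaps(line))


print(abscissa_gaps([(3, 1), (4, 2)]))   # adjacent characters in the document
print(abscissa_gaps([(1, 1), (5, 2)]))   # three other characters in between
```

A smaller difference means the matched characters sit closer together in the document, which is the property the continuity degree is meant to reward.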

It will be understood by those skilled in the art that all or part of the processes of the method embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can carry out the processes of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A single Chinese character retrieval method based on products in the travel industry is characterized by comprising the following steps:

2. The single Chinese character retrieval method based on travel industry products as claimed in claim 1, wherein screening the conflicting matching points in each first matching point coordinate data according to a preset screening policy to obtain second matching point coordinate data specifically comprises:

3. The single Chinese character retrieval method based on travel industry products as claimed in claim 1, wherein the similarity score of each similarity judgment line is calculated by the following method:

wherein,

4. The single Chinese character retrieval method based on travel industry products as claimed in claim 3, wherein the influence degree of the position is calculated by the following method:

wherein f(X_i) is the position influence degree, with a value range of (0, 1); the constant a is a hyperparameter that can be adjusted according to the actual situation; and X_i is the abscissa value of the i-th matching point.

5. The single Chinese character retrieval method based on travel industry products as claimed in claim 3, wherein the position sequence degree of the characters is calculated by the following method:

6. The single Chinese character retrieval method based on travel industry products as claimed in claim 3, wherein the continuity degree is obtained from the difference between the abscissa values of two matching points through which the similarity judgment line passes.

7. A single Chinese character retrieval device based on products in the travel industry is characterized by comprising:

8. The single Chinese character retrieval device based on travel industry products as claimed in claim 7, wherein screening the conflicting matching points in each first matching point coordinate data according to a preset screening policy to obtain second matching point coordinate data specifically comprises: