CN110765262A - POI text retrieval method and device and electronic equipment - Google Patents

Info

Publication number
CN110765262A
Authority
CN
China
Prior art keywords
poi
pinyin
word segmentation
text
query information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910906460.3A
Other languages
Chinese (zh)
Inventor
沈奇
陈欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201910906460.3A priority Critical patent/CN110765262A/en
Publication of CN110765262A publication Critical patent/CN110765262A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/338Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a POI text retrieval method, a POI text retrieval device and electronic equipment, wherein the method comprises the following steps: obtaining pinyin query information of a user; carrying out sequential word segmentation processing on the pinyin query information to obtain a word segmentation segment group comprising at least one word segmentation segment; the sequential word segmentation processing is sliding window operation which is carried out according to the byte stream of the pinyin query information by taking a preset value as a window size; respectively taking the segmentation fragments in the segmentation fragment group as search words, and searching the POI texts in the current POI database by using the search words to obtain the matching degree of each POI text and the segmentation fragment group; each POI text in the POI database comprises POI pinyin fragments obtained through sequential word segmentation processing; and determining the POI text with the matching degree reaching a preset matching threshold value as a retrieval result corresponding to the pinyin query information. The retrieval accuracy of pinyin inquiry is improved by carrying out sequential word segmentation processing on pinyin inquiry information of a user and data in a POI database.

Description

POI text retrieval method and device and electronic equipment
Technical Field
The application relates to the technical field of data processing, in particular to a POI text retrieval method and device and electronic equipment.
Background
At present, a huge amount of POI (Point of Interest) data is stored in a POI database. When a user's query is entered in pinyin, or in a mix of pinyin and Chinese characters, the input often fails to match the prefixes of the words stored in the POI database. For example, the POI stored in the POI database is "中国人民大学" (Renmin University of China), while the user enters "人民daxue". To recall such POIs, the current solution is pinyin word segmentation query, for example segmenting "人民daxue" into "人民, da, xue"; however, this approach only performs simple word segmentation on the query input, and the recall result is still poor.
Disclosure of Invention
In view of the above, an object of the present application is to provide a method and an apparatus for searching a POI text, and an electronic device, which can improve accuracy of POI text search.
According to an aspect of the present application, there is provided a method for retrieving text of a POI, the method comprising: obtaining pinyin query information of a user; carrying out sequential word segmentation processing on the pinyin query information to obtain a word segmentation segment group comprising at least one word segmentation segment; the sequential word segmentation processing is sliding window operation which is carried out according to the byte stream of the pinyin query information by taking a preset value as a window size; respectively taking the segmentation fragments in the segmentation fragment group as search words, and searching the POI texts in the current POI database by using the search words to obtain the matching degree of each POI text and the segmentation fragment group; each POI text in the POI database comprises POI pinyin fragments obtained through sequential word segmentation processing; and determining the POI text with the matching degree reaching a preset matching threshold value as a retrieval result corresponding to the pinyin query information.
In some embodiments, the step of obtaining pinyin query information of the user includes: receiving query information of a user; if the query information comprises Chinese characters, converting the Chinese characters into pinyin to obtain pinyin query information of the user.
In some embodiments, the step of performing sequential word segmentation processing on the pinyin query information to obtain a word segmentation group including at least one word segmentation segment includes: taking a preset value as the window size, and performing sliding window operation according to the byte stream of the pinyin query information to obtain at least one word segmentation segment arranged according to the byte stream sequence; and taking at least one word segmentation segment arranged according to the byte stream sequence as a word segmentation segment group corresponding to the pinyin query information.
In some embodiments, before the step of obtaining the pinyin query information of the user, the method further includes: obtaining pinyin fields corresponding to all POI texts in a POI database; aiming at the pinyin field corresponding to each POI text, the following steps are executed: performing sequential word segmentation processing on pinyin fields corresponding to the POI text to obtain at least one word segmentation segment corresponding to the POI text; and taking at least one word segmentation segment corresponding to the POI text as a POI pinyin segment corresponding to the POI text, and storing the POI pinyin segment in a POI database.
In some embodiments, the step of using the segmentation segments in the segmentation segment group as search words respectively, using the search words to search the POI texts in the current POI database to obtain the matching degree between each POI text and the segmentation segment group includes: for each POI text in the current POI database, the following steps are performed: using each participle fragment in the participle fragment group as a search word, searching in POI pinyin fragments corresponding to the POI text one by one, and determining the number of the participle fragments searched in the POI text by the participle fragments in the participle fragment group; and dividing the number of the retrieved word segmentation segments by the total number of the word segmentation segments in the word segmentation segment group to obtain the matching degree of the POI text and the word segmentation segment group.
In some embodiments, the step of using the segmentation segments in the segmentation segment group as search words respectively, using the search words to search the POI texts in the current POI database to obtain the matching degree between each POI text and the segmentation segment group includes: for each participle segment in the participle segment group, the following steps are performed: taking the word segmentation as a search word, searching in the POI pinyin segmentation corresponding to each POI text one by one, and updating the matching degree of each POI text according to the current search result; and when each participle fragment in the participle fragment group completes retrieval, taking the last matching degree of each POI text as the matching degree of the POI text and the participle fragment group.
In some embodiments, after the step of determining, as the retrieval result corresponding to the pinyin query information, the POI text whose matching degree reaches the preset matching threshold value, the method further includes: sorting the POI texts in the retrieval result in descending order of popularity; and displaying the sorted retrieval results.
In some embodiments, after the step of determining, as the retrieval result corresponding to the pinyin query information, the POI text whose matching degree reaches the preset matching threshold value, the method further includes: counting, for each POI text in the retrieval result, the number of target POI pinyin fragments it includes, where a target POI pinyin fragment is a POI pinyin fragment matched by a participle fragment in the participle fragment group; sorting the POI texts in the retrieval result in descending order of the number of target POI pinyin fragments; and displaying the sorted retrieval results.
In some embodiments, the pinyin query information is geographic location query information; before the step of obtaining the pinyin query information of the user, the method further comprises the following steps: receiving a region name input by a user; and taking the POI database corresponding to the area name as the current POI database.
In some embodiments, the preset value is 2 pinyin bytes or 3 pinyin bytes.
According to another aspect of the present application, there is provided a POI text retrieval apparatus, including: the information acquisition module is used for acquiring pinyin inquiry information of a user; the sequential word segmentation module is used for performing sequential word segmentation processing on the pinyin query information to obtain a word segmentation segment group comprising at least one word segmentation segment; the sequential word segmentation processing is sliding window operation which is carried out according to the byte stream of the pinyin query information by taking a preset value as a window size; the retrieval module is used for retrieving the POI texts in the current POI database by using the retrieval words by taking the participle fragments in the participle fragment group as retrieval words respectively to obtain the matching degree of each POI text and the participle fragment group; each POI text in the POI database comprises POI pinyin fragments obtained through sequential word segmentation processing; and the retrieval result determining module is used for determining the POI text with the matching degree reaching the preset matching threshold value as the retrieval result corresponding to the pinyin query information.
In some embodiments, the information acquisition module is further configured to: receiving query information of a user; if the query information comprises Chinese characters, converting the Chinese characters into pinyin to obtain pinyin query information of the user.
In some embodiments, the sequential tokenization module is further to: taking a preset value as the window size, and performing sliding window operation according to the byte stream of the pinyin query information to obtain at least one word segmentation segment arranged according to the byte stream sequence; and taking at least one word segmentation segment arranged according to the byte stream sequence as a word segmentation segment group corresponding to the pinyin query information.
In some embodiments, the apparatus further comprises: the database word segmentation processing module is used for: obtaining pinyin fields corresponding to all POI texts in a POI database; aiming at the pinyin field corresponding to each POI text, the following steps are executed: performing sequential word segmentation processing on pinyin fields corresponding to the POI text to obtain at least one word segmentation segment corresponding to the POI text; and taking at least one word segmentation segment corresponding to the POI text as a POI pinyin segment corresponding to the POI text, and storing the POI pinyin segment in a POI database.
In some embodiments, the retrieval module is further to: for each POI text in the current POI database, the following steps are performed: using each participle fragment in the participle fragment group as a search word, searching in POI pinyin fragments corresponding to the POI text one by one, and determining the number of the participle fragments searched in the POI text by the participle fragments in the participle fragment group; and dividing the number of the retrieved word segmentation segments by the total number of the word segmentation segments in the word segmentation segment group to obtain the matching degree of the POI text and the word segmentation segment group.
In some embodiments, the retrieval module is further to: for each participle segment in the participle segment group, the following steps are performed: taking the word segmentation as a search word, searching in the POI pinyin segmentation corresponding to each POI text one by one, and updating the matching degree of each POI text according to the current search result; and when each participle fragment in the participle fragment group completes retrieval, taking the last matching degree of each POI text as the matching degree of the POI text and the participle fragment group.
In some embodiments, the apparatus further comprises a ranking presentation module configured to: sort the POI texts in the retrieval result in descending order of popularity; and display the sorted retrieval results.
In some embodiments, the ranking presentation module is further configured to: count, for each POI text in the retrieval result, the number of target POI pinyin fragments it includes, where a target POI pinyin fragment is a POI pinyin fragment matched by a participle fragment in the participle fragment group; sort the POI texts in the retrieval result in descending order of the number of target POI pinyin fragments; and display the sorted retrieval results.
In some embodiments, the pinyin query information is geographic location query information; the apparatus further includes a current database determination module configured to: receive a region name input by a user; and take the POI database corresponding to the region name as the current POI database.
In some embodiments, the preset value is 2 pinyin bytes or 3 pinyin bytes.
According to another aspect of the present application, there is provided an electronic device including: the electronic device comprises a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when the electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to execute the steps of any one of the methods.
According to another aspect of the application, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods described above.
Based on any one of the aspects, firstly, pinyin query information of a user is obtained, and sequential word segmentation processing is carried out on the pinyin query information to obtain a word segmentation group comprising at least one word segmentation segment; the sequential word segmentation processing is sliding window operation which is carried out according to the byte stream of the pinyin query information by taking a preset value as a window size; then, taking the participle fragments in the participle fragment group as search words respectively, and searching the POI texts in the current POI database by using the search words to obtain the matching degree of each POI text and the participle fragment group; each POI text in the POI database comprises POI pinyin fragments obtained through sequential word segmentation processing; and finally, determining the POI text with the matching degree reaching a preset matching threshold value as a retrieval result corresponding to the pinyin query information. According to the method and the system, the pinyin query information of the user and the POI texts in the POI database are subjected to the sequential word segmentation processing, so that the retrieval matching is more accurate, and the retrieval accuracy of the pinyin query is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic architecture diagram illustrating a POI text retrieval system according to an embodiment of the present application;
Fig. 2 is a flowchart illustrating a POI text retrieval method provided in an embodiment of the present application;
Fig. 3 is a flowchart illustrating another POI text retrieval method provided in an embodiment of the present application;
Fig. 4 is a schematic diagram illustrating a retrieval result in a POI text retrieval method provided in an embodiment of the present application;
Fig. 5 is a flowchart illustrating another POI text retrieval method provided in an embodiment of the present application;
Fig. 6 is a schematic structural diagram illustrating a POI text retrieval apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram illustrating another POI text retrieval apparatus according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram illustrating an electronic device provided in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it should be understood that the drawings in the present application are for illustrative and descriptive purposes only and are not used to limit the scope of protection of the present application. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this application illustrate operations implemented according to some embodiments of the present application. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. One skilled in the art, under the guidance of this application, may add one or more other operations to, or remove one or more operations from, the flowchart.
In addition, the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
To enable those skilled in the art to use the present disclosure, the following embodiments are presented in conjunction with a specific application scenario, "geographic location query". It will be apparent to those skilled in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the application. Although the present application is described primarily in the context of geographic location queries, it should be understood that this is merely one exemplary embodiment.
It should be noted that in the embodiments of the present application, the term "comprising" is used to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
Fig. 1 is a schematic architecture diagram of a POI text retrieval system according to an embodiment of the present disclosure. For example, the retrieval system of POI text may be an online transportation service platform for transportation services such as taxi, designated drive service, express, carpool, bus service, driver rental, or regular service, or any combination thereof. The retrieval system of POI text may include one or more of a server 110, a network 120, a service requester 130, and a POI database 140.
In some embodiments, the server 110 may include a processor. The processor may process information and/or data related to the service request to perform one or more of the functions described herein. For example, the processor may determine the retrieval result based on the service request, i.e., the geographic location query information, obtained from the service requester 130.
In some embodiments, the device corresponding to the service requester 130 may be a mobile device, such as a smart home device, a wearable device, a smart mobile device, a virtual reality device, or an augmented reality device, or may be a tablet computer, a laptop computer, a built-in device in a motor vehicle, or the like.
In some embodiments, the POI database 140 may be connected to the network 120 to communicate with one or more components in a retrieval system of POI text (e.g., the server 110, the service requester 130, etc.). One or more components in the POI text retrieval system may access data or instructions stored in the POI database 140 via the network 120. In some embodiments, the POI database 140 may be directly connected to one or more components in the retrieval system of POI text, or the POI database 140 may also be part of the server 110.
The following describes in detail a method for retrieving a POI text provided in an embodiment of the present application with reference to the content described in the POI text retrieval system shown in fig. 1.
Referring to fig. 2, a schematic flow chart of a method for retrieving a POI text provided in the embodiment of the present application is shown, where the method may be executed by a server in a system for retrieving a POI text, and the specific execution process includes:
Step S202, obtaining pinyin query information of the user.
The server first obtains the pinyin query information of the user from the POI text query information input by the user. If the query information input by the user consists purely of pinyin characters, the subsequent steps can be carried out directly; if the query information mixes pinyin and Chinese characters, or consists purely of Chinese characters, it must first be converted into pure pinyin query information. For example, whether the user inputs the full-pinyin query "renmindaxue" or a query mixing pinyin and Chinese characters, the pinyin query information finally obtained by the server is "renmindaxue" in both cases.
Step S204, carrying out sequential word segmentation processing on the pinyin query information to obtain a word segmentation group comprising at least one word segmentation segment; the sequential word segmentation processing is sliding window operation which is carried out according to the byte stream of the pinyin query information by taking a preset value as a window size.
After the server obtains the pinyin query information of the user, it performs sequential word segmentation processing on it. The sequential word segmentation processing is a sliding window operation carried out along the byte stream of the pinyin query information with a preset value as the window size. The preset value may be an integer greater than or equal to 2 and must not exceed the number of bytes of the pinyin query information, otherwise word segmentation is meaningless. Generally, the smaller the preset value, the higher the recall rate, i.e. the more POI texts are retrieved. As a preferred implementation, the window size adopted in the embodiment of the present application is 2 pinyin bytes or 3 pinyin bytes; as the examples show, one pinyin byte here corresponds to the pinyin of one Chinese character. For example, if the pinyin query information is "renmindaxue", the word segmentation segment group obtained by sequential word segmentation processing is "renmin", "minda", "daxue" (window size 2) or "renminda", "mindaxue" (window size 3).
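The sliding window operation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: it assumes the pinyin query has already been split into syllables (one per Chinese character), the function name `sequential_segments` is hypothetical, and returning a query shorter than the window as a single segment is a choice made here for the sketch.

```python
from typing import List


def sequential_segments(syllables: List[str], window: int = 2) -> List[str]:
    """Slide a window of `window` syllables over the query, one syllable per step."""
    if window < 1:
        raise ValueError("window must be at least 1")
    if not syllables:
        return []
    if len(syllables) <= window:
        # Assumption: a query no longer than the window is kept as one segment.
        return ["".join(syllables)]
    return [
        "".join(syllables[i:i + window])
        for i in range(len(syllables) - window + 1)
    ]


# Window of 2 syllables, then 3 syllables, over "renmindaxue".
print(sequential_segments(["ren", "min", "da", "xue"], window=2))
print(sequential_segments(["ren", "min", "da", "xue"], window=3))
```

With a window of 2 the sketch yields "renmin", "minda", "daxue"; with a window of 3 it yields "renminda", "mindaxue".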
Step S206, using the participle fragments in the participle fragment group as search words respectively, and searching the POI texts in the current POI database by using the search words to obtain the matching degree of each POI text and the participle fragment group; each POI text in the POI database comprises POI pinyin fragments obtained through sequential word segmentation processing.
After the participle fragment group corresponding to the pinyin query information is obtained, each participle fragment in the group is used as a search word to search the current POI database, in which each POI text includes POI pinyin fragments obtained through the same sequential word segmentation processing. Through retrieval, the matching degree of each POI text and the participle fragment group can be determined, namely the ratio of the number of participle fragments matched in the POI pinyin fragments of the POI text to the total number of participle fragments in the participle fragment group. The matching degree can be determined in several ways, which are described in detail later.
Step S208, determining the POI text with the matching degree reaching the preset matching threshold value as a retrieval result corresponding to the pinyin query information.
Specifically, for example, if the matching threshold is set to 100%, only POI texts that match every participle fragment in the participle fragment group are determined as the final retrieval result; because the matching degree must be 100%, the accuracy of the final retrieval result is relatively high. If the matching threshold is set to 80%, POI texts with a matching degree greater than or equal to 80% are determined as the final retrieval result. The matching threshold can be set according to the actual situation, balancing the number of retrieval results against their accuracy.
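The threshold step can be illustrated with a short sketch. It assumes the matching degrees have already been computed and are held in a dictionary keyed by POI name; the function name `select_results` and the sample POI names are hypothetical.

```python
from typing import Dict, List


def select_results(match_degrees: Dict[str, float], threshold: float) -> List[str]:
    """Return the POI names whose matching degree reaches the threshold."""
    return [name for name, degree in match_degrees.items() if degree >= threshold]


# Hypothetical matching degrees for three POI texts.
degrees = {"POI A": 1.0, "POI B": 2 / 3, "POI C": 1 / 3}
print(select_results(degrees, threshold=1.0))  # strict: full matches only
print(select_results(degrees, threshold=0.6))  # looser: more results, lower precision
```

Lowering the threshold trades accuracy for a larger result set, which is the trade-off the paragraph above describes.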
The POI text retrieval method provided by the embodiment of the application can be used for a scene of quickly prompting related results in real time when a user inputs the scenic spot query information. By carrying out the sequential word segmentation processing on the pinyin query information of the user and the POI texts in the POI database, the retrieval matching is more accurate, and the retrieval accuracy of the pinyin query is improved.
The data in the POI database is usually geographical location text information, including administrative areas such as provinces, cities and districts in China, tourist attractions, and specific place types such as railway stations and bus stops. In order to improve recall accuracy, query information that mixes pinyin and Chinese characters is generally converted to pinyin first, that is, the steps shown in Fig. 3:
Step S302, receiving the query information of the user.
Step S304, if the query information includes Chinese characters, converting the Chinese characters into pinyin to obtain pinyin query information of the user.
For example: if the user inputs the full-pinyin query "renmindaxue", the pinyin query information is "renmindaxue"; if the user inputs a query that mixes pinyin and Chinese characters, such as "renmin大学", the server first converts the Chinese characters into pinyin, and the pinyin query information finally obtained is again "renmindaxue".
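Steps S302 and S304 can be sketched as below. The character-to-pinyin table `CHAR_TO_PINYIN` and the function name `to_pinyin_query` are hypothetical and cover only the characters of the running example; a real system would need a complete dictionary and a way to handle characters with more than one reading.

```python
# Hypothetical lookup table, limited to the characters used in the example.
CHAR_TO_PINYIN = {"人": "ren", "民": "min", "大": "da", "学": "xue"}


def to_pinyin_query(query: str) -> str:
    """Replace known Chinese characters with pinyin; keep other characters as typed."""
    return "".join(CHAR_TO_PINYIN.get(ch, ch) for ch in query).lower()


# Pure pinyin, mixed, and pure Chinese-character queries all normalise
# to the same pinyin query information.
for raw in ("renmindaxue", "renmin大学", "人民大学"):
    print(raw, "->", to_pinyin_query(raw))
```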
In some embodiments, the step of performing sequential word segmentation processing on the pinyin query information to obtain a word segmentation segment group including at least one word segmentation segment specifically includes the following steps:
taking a preset value as the window size, and performing sliding window operation according to the byte stream of the pinyin query information to obtain at least one word segmentation segment arranged according to the byte stream sequence; and taking at least one word segmentation segment arranged according to the byte stream sequence as a word segmentation segment group corresponding to the pinyin query information.
After the pinyin query information of the user is acquired, sequential word segmentation processing can be performed on it through a bigram language model. For example, if the pinyin query information of the user is "renmindaxue", it is segmented into: "renmin", "minda", "daxue".
In addition, the pinyin query information can be subjected to sequential word segmentation processing with three pinyin bytes as the window size through a trigram language model. For example, "zhongguorenmindaxue", the pinyin of "中国人民大学" (Renmin University of China), can be split into: "zhongguoren", "guorenmin", "renminda", "mindaxue".
Meanwhile, in this embodiment, the above sequential word segmentation processing is also performed on the data in the POI database, and the specific process includes: obtaining pinyin fields corresponding to all POI texts in a POI database; aiming at the pinyin field corresponding to each POI text, the following steps are executed: performing sequential word segmentation processing on pinyin fields corresponding to the POI text to obtain at least one word segmentation segment corresponding to the POI text; and taking at least one word segmentation segment corresponding to the POI text as a POI pinyin segment corresponding to the POI text, and storing the POI pinyin segment in a POI database.
In actual operation, in order to balance the recall rate and the retrieval accuracy of the POI texts, the POI texts in the POI database are segmented with a preset window size of 2 pinyin bytes. The segmentation results are stored in a database, which can be the original POI database or a newly established POI database. To improve the retrieval speed, a new POI database can be established and used as the current POI database.
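As a rough sketch of this preprocessing step, the code below segments the pinyin field of every POI text once and stores the result in a new in-memory mapping that plays the role of the newly established POI database. The names segment and build_poi_fragments, the dictionary layout, and the sample POI entries are assumptions made for illustration.

```python
def segment(syllables, window=2):
    """Sequential word segmentation: sliding window over pinyin syllables."""
    if len(syllables) <= window:
        return ["".join(syllables)] if syllables else []
    return ["".join(syllables[i:i + window])
            for i in range(len(syllables) - window + 1)]


def build_poi_fragments(poi_pinyin_fields, window=2):
    """Map each POI text to its POI pinyin segments.

    poi_pinyin_fields: dict of POI text -> list of pinyin syllables
    (hypothetical layout of the pinyin field).
    """
    return {
        poi_text: segment(syllables, window)
        for poi_text, syllables in poi_pinyin_fields.items()
    }


# Illustrative entries only, not data from the application.
pinyin_fields = {
    "人民大学": ["ren", "min", "da", "xue"],
    "人民大会堂": ["ren", "min", "da", "hui", "tang"],
}
current_poi_db = build_poi_fragments(pinyin_fields, window=2)
print(current_poi_db)
```

Doing this work ahead of time means a query only needs segment lookups at retrieval time, which is consistent with the speed benefit the text describes.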
The process of using each word segmentation segment in the word segmentation segment group as a search word, and searching the POI texts in the current POI database with these search words to obtain the matching degree between each POI text and the word segmentation segment group, is described in detail below:
The first mode is as follows. For each POI text in the current POI database, the following steps are performed: each word segmentation segment in the word segmentation segment group is used as a search word and searched, one by one, in the POI pinyin segments corresponding to the POI text, and the number of word segmentation segments of the group that are found in the POI text is determined; this number is then divided by the total number of word segmentation segments in the group to obtain the matching degree between the POI text and the word segmentation segment group.
For example, suppose the word segmentation segments in the group are "renmin", "minda" and "daxue". Each is used as a search word and searched, one by one, in the POI pinyin segments corresponding to the POI text. If the POI text contains "renmin" and "minda", that is, two of the three search words are found in the POI text, the matching degree between the POI text and the word segmentation segment group is 2/3. Similarly, if a POI text contains only "renmin", the matching degree is 1/3.
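The first mode reduces to a ratio of found segments to total segments, which the following sketch computes for one POI text. The function name matching_degree and the sample POI segments are hypothetical; returning 0.0 for an empty segment group is an assumption.

```python
def matching_degree(query_segments, poi_fragments):
    """First mode: share of query segments found in one POI text."""
    if not query_segments:
        return 0.0  # assumption: an empty query matches nothing
    fragments = set(poi_fragments)  # set membership keeps lookups cheap
    found = sum(1 for seg in query_segments if seg in fragments)
    return found / len(query_segments)


query = ["renmin", "minda", "daxue"]
# A POI text containing "renmin" and "minda" but not "daxue".
print(matching_degree(query, ["renmin", "minda", "dahui", "huitang"]))
# A POI text containing only "renmin".
print(matching_degree(query, ["renmin", "minguang"]))
```

The two calls correspond to the 2/3 and 1/3 cases in the example above.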
The second mode is as follows. For each word segmentation segment in the word segmentation segment group, the following steps are performed: the word segmentation segment is used as a search word and searched, one by one, in the POI pinyin segments corresponding to each POI text, and the matching degree of each POI text is updated according to the current search result; after every word segmentation segment in the group has been searched, the last matching degree of each POI text is taken as the matching degree between that POI text and the word segmentation segment group.
Continuing the above example, the word segmentation segments in the group are "renmin", "minda" and "daxue". First, "renmin" is used as the search word and searched in each POI text one by one. For example, if the segment is not found in the first text, the current matching degree of the first text is 0; if it is found in the second text, the current matching degree of the second text is 100%. After the current matching degree of every POI text has been determined, the second segment "minda" is used as the search word. If it is found in the first text, the current matching degree of the first text is updated to 50% (one of the two segments searched so far has been found); if it is not found in the second text, the current matching degree of the second text is updated from 100% to 50%. These steps are repeated until the last word segmentation segment has been used as a search word, at which point the final matching degree between each POI text and the word segmentation segment group is determined.
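The second mode iterates over segments in the outer loop and updates a running matching degree for every POI text. The sketch below assumes, based on the 0, 100% and 50% figures in the example, that the running value is the number of segments found so far divided by the number of segments searched so far. The function name matching_trace and the two sample texts are hypothetical.

```python
def matching_trace(query_segments, poi_index):
    """Second mode: search one segment at a time across all POI texts.

    Returns one snapshot of the running matching degrees per segment;
    the last snapshot holds the final matching degrees.
    """
    found = {poi_text: 0 for poi_text in poi_index}
    trace = []
    for searched, seg in enumerate(query_segments, start=1):
        snapshot = {}
        for poi_text, fragments in poi_index.items():
            if seg in fragments:
                found[poi_text] += 1
            # Assumed update rule: found so far / searched so far.
            snapshot[poi_text] = found[poi_text] / searched
        trace.append(snapshot)
    return trace


poi_index = {"first": ["minda"], "second": ["renmin"]}
for step in matching_trace(["renmin", "minda"], poi_index):
    print(step)
```

After all segments have been searched, the running value equals the ratio computed in the first mode, so the two modes differ in loop order rather than in the final result.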
After the POI texts whose matching degree reaches the preset matching threshold are determined as the retrieval results corresponding to the pinyin query information, many retrieval results may satisfy the condition, so the results are sorted before being presented to the user. For example, the steps in this embodiment further include:
sorting the POI texts in the retrieval result in descending order of popularity, and displaying the sorted retrieval results, as shown in fig. 4. The user inputs "yiyuan crown", and the corresponding retrieval results are shown in the figure in descending order of popularity. Ranking by popularity can improve the click rate of the user.
Alternatively, counting, for each POI text in the retrieval result, the number of target POI pinyin segments it includes, where a target POI pinyin segment is a POI pinyin segment that matches a word segmentation segment in the word segmentation segment group; sorting the POI texts in the retrieval result in descending order of the number of target POI pinyin segments; and displaying the sorted retrieval results.
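The thresholding and the two sorting options can be sketched as below. The threshold value 0.5, the popularity scores, and the function names select_results, rank_by_popularity and rank_by_target_segments are all assumptions chosen for illustration; the text does not specify them.

```python
def select_results(degrees, threshold=0.5):
    """Keep POI texts whose matching degree reaches the threshold."""
    return [poi for poi, degree in degrees.items() if degree >= threshold]


def rank_by_popularity(results, popularity):
    """Sort retrieval results in descending order of popularity."""
    return sorted(results, key=lambda poi: popularity.get(poi, 0),
                  reverse=True)


def rank_by_target_segments(results, query_segments, poi_index):
    """Sort results by how many of their POI pinyin segments match
    a segment of the query, in descending order."""
    wanted = set(query_segments)

    def target_count(poi):
        return sum(1 for frag in poi_index[poi] if frag in wanted)

    return sorted(results, key=target_count, reverse=True)


degrees = {"A": 1.0, "B": 0.6, "C": 0.2}
popularity = {"A": 10, "B": 50, "C": 99}   # hypothetical scores
poi_index = {"A": ["renmin"], "B": ["renmin", "minda"], "C": []}

results = select_results(degrees, threshold=0.5)
print(rank_by_popularity(results, popularity))
print(rank_by_target_segments(results, ["renmin", "minda"], poi_index))
```

Note that text "C" is the most popular but is never shown, because ranking is applied only to texts that have already passed the matching threshold.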
As a preferred embodiment, the pinyin query information is geographic location query information; before the step of obtaining the pinyin query information of the user, a determination process of the database may also be included, which specifically includes the following steps, as shown in fig. 5:
Step S502: receiving the area name input by the user.
Step S504: taking the POI database corresponding to the area name as the current POI database.
As shown in fig. 4, if the region selected by the user is Beijing, the server uses the POI database corresponding to Beijing as the current POI database for the user's search, so the retrieval results are all position information beginning with "Beijing".
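Steps S502 and S504 amount to a lookup keyed by area name. The sketch below assumes the per-area POI databases are held in a dictionary; the function name select_current_database, the key format, and the sample entries are hypothetical, and raising an error for an unknown area is one possible choice that the text does not prescribe.

```python
def select_current_database(area_name, databases):
    """Return the POI database that corresponds to the given area name."""
    try:
        return databases[area_name]
    except KeyError:
        # Assumption: an unknown area is reported rather than ignored.
        raise KeyError(f"no POI database for area {area_name!r}") from None


# Hypothetical per-area databases: POI text -> POI pinyin segments.
databases = {
    "beijing": {"人民大学": ["renmin", "minda", "daxue"]},
    "shanghai": {"人民广场": ["renmin", "minguang", "guangchang"]},
}
current_poi_db = select_current_database("beijing", databases)
print(sorted(current_poi_db))
```

Restricting retrieval to one area's database keeps the candidate set small and prevents results from other cities from appearing.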
According to the POI text retrieval method provided by the embodiment of the application, the word segmentation segments are used as search words, and retrieval is carried out in a POI database whose POI texts have been segmented in advance, so the search speed can be improved. The method can be used in scenarios where relevant results are prompted quickly and in real time while the user inputs scenic spot or other geographic position query information. On the one hand, the position information the user is about to input can be completed and prompted; on the other hand, the user can be guided to search accurately, so that the accuracy of the scenic spot retrieval results and the click rate of the results are improved.
Based on the foregoing method embodiment, an embodiment of the present application further provides a POI text retrieval apparatus, as shown in fig. 6, the apparatus includes: an information acquisition module 62, a sequential word segmentation module 64, a retrieval module 66, and a retrieval result determination module 68.
The information acquisition module 62 is configured to acquire pinyin query information of a user; a sequential word segmentation module 64, configured to perform sequential word segmentation processing on the pinyin query information to obtain a word segmentation segment group including at least one word segmentation segment; the sequential word segmentation processing is sliding window operation which is carried out according to the byte stream of the pinyin query information by taking a preset value as a window size; the retrieval module 66 is configured to use the segmentation segments in the segmentation segment group as retrieval words respectively, and retrieve the POI texts in the current POI database by using the retrieval words to obtain a matching degree between each POI text and the segmentation segment group; each POI text in the POI database comprises POI pinyin fragments obtained through sequential word segmentation processing; and the retrieval result determining module 68 is configured to determine the POI text with the matching degree reaching the preset matching threshold as the retrieval result corresponding to the pinyin query information.
The POI text retrieval device provided by the embodiment of the application takes the segmentation fragments as the retrieval words, and retrieves in the POI database in which the POI texts subjected to word segmentation processing in advance are stored, so that the search speed can be improved. The method can be used for a scene of quickly prompting relevant results in real time when the user inputs the scenic spot or other geographic position query information. The method and the device can supplement and prompt the position information to be input by the user, and can guide the user to search accurately, so that the accuracy of the scenic spot retrieval result and the result click rate are improved.
In some embodiments, the apparatus further includes, in addition to the information obtaining module 702, the sequential word segmentation module 704, the retrieval module 706, and the retrieval result determination module 708, which are similar to those of the previous embodiment: a database participle processing module 710 and a ranking presentation module 712, as shown in fig. 7.
The database participle processing module 710 is configured to: obtaining pinyin fields corresponding to all POI texts in a POI database; aiming at the pinyin field corresponding to each POI text, the following steps are executed: performing sequential word segmentation processing on pinyin fields corresponding to the POI text to obtain at least one word segmentation segment corresponding to the POI text; and taking at least one word segmentation segment corresponding to the POI text as a POI pinyin segment corresponding to the POI text, and storing the POI pinyin segment in a POI database.
The ranking presentation module 712 is configured to: sort the POI texts in the retrieval result in descending order of popularity; and display the sorted retrieval results.
In some embodiments, the ranking presentation module 712 is further configured to: counting the times of target POI pinyin fragments included in each POI text in the retrieval result; the target POI pinyin fragment is a POI pinyin fragment matched with the participle fragments in the participle fragment group; sequencing POI texts in a retrieval result in a sequence of the number of target POI pinyin fragments in the POI texts from large to small; and displaying the sorted retrieval results.
In some embodiments, the information obtaining module 702 is further configured to: receiving query information of a user; if the query information comprises Chinese characters, converting the Chinese characters into pinyin to obtain pinyin query information of the user.
In some embodiments, the above-mentioned sequential participle module 704 is further configured to: taking a preset value as the window size, and performing sliding window operation according to the byte stream of the pinyin query information to obtain at least one word segmentation segment arranged according to the byte stream sequence; and taking at least one word segmentation segment arranged according to the byte stream sequence as a word segmentation segment group corresponding to the pinyin query information.
In some embodiments, the retrieving module 706 is further configured to: for each POI text in the current POI database, the following steps are performed: using each participle fragment in the participle fragment group as a search word, searching in POI pinyin fragments corresponding to the POI text one by one, and determining the number of the participle fragments searched in the POI text by the participle fragments in the participle fragment group; and dividing the number of the retrieved word segmentation segments by the total number of the word segmentation segments in the word segmentation segment group to obtain the matching degree of the POI text and the word segmentation segment group.
In some embodiments, the retrieving module 706 is further configured to: for each participle segment in the participle segment group, the following steps are performed: taking the word segmentation as a search word, searching in the POI pinyin segmentation corresponding to each POI text one by one, and updating the matching degree of each POI text according to the current search result; and when each participle fragment in the participle fragment group completes retrieval, taking the last matching degree of each POI text as the matching degree of the POI text and the participle fragment group.
In some embodiments, the pinyin query information is geographic location query information, and the above apparatus further includes a current database determination module 714 configured to: receive an area name input by a user; and take the POI database corresponding to the area name as the current POI database.
In some embodiments, the predetermined value is 2 pinyin bytes or 3 pinyin bytes.
The modules may be connected or in communication with each other via a wired or wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, etc., or any combination thereof. The wireless connection may comprise a connection over a LAN, WAN, bluetooth, ZigBee, NFC, or the like, or any combination thereof. Two or more modules may be combined into a single module, and any one module may be divided into two or more units.
For ease of understanding, fig. 8 illustrates a schematic diagram of exemplary hardware and software components of an electronic device 800 that may implement the concepts of the present application, according to some embodiments of the present application. For example, the processor 820 may be used on the electronic device 800 and to perform functions in the present application.
The electronic device 800 may be a general-purpose computer or a special-purpose computer, either of which may be used to implement the POI text retrieval method of the present application. Although only a single computer is shown, the functions described herein may, for convenience, be implemented in a distributed fashion across multiple similar platforms to balance processing loads.
For example, electronic device 800 may include a network port 810 connected to a network, one or more processors 820 for executing program instructions, a communication bus 830, and different forms of storage media 840, such as disks, ROM, or RAM, or any combination thereof. Illustratively, the computer platform may also include program instructions stored in ROM, RAM, or other types of non-transitory storage media, or any combination thereof. The method of the present application may be implemented in accordance with these program instructions. The electronic device 800 also includes an Input/Output (I/O) interface 850 between the computer and other Input/Output devices (e.g., keyboard, display screen).
For ease of illustration, only one processor is depicted in the electronic device 800. It should be noted, however, that the electronic device 800 in the present application may also include multiple processors, and thus steps described in the present application as performed by one processor may also be performed by multiple processors jointly or separately. For example, if the processor of the electronic device 800 performs step A and step B, it should be understood that step A and step B may also be performed by two different processors together or performed separately in one processor. For example, a first processor performs step A and a second processor performs step B, or the first processor and the second processor perform steps A and B together.
The embodiment of the application also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the POI text retrieval method are executed.
Specifically, the storage medium can be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the POI text retrieval method described above can be executed, which addresses the low accuracy of POI text retrieval in the prior art and improves retrieval accuracy for pinyin queries.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to corresponding processes in the method embodiments, and are not described in detail in this application. In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation, and for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for retrieving POI text is characterized by comprising the following steps:
obtaining pinyin query information of a user;
carrying out sequential word segmentation processing on the pinyin query information to obtain a word segmentation segment group comprising at least one word segmentation segment; the sequential word segmentation processing is sliding window operation which is carried out according to the byte stream of the pinyin query information by taking a preset value as a window size;
respectively taking the segmentation fragments in the segmentation fragment group as search words, and searching the POI texts in the current POI database by using the search words to obtain the matching degree of each POI text and the segmentation fragment group; each POI text in the POI database comprises a POI pinyin fragment obtained through the sequential word segmentation processing;
and determining the POI text with the matching degree reaching a preset matching threshold value as a retrieval result corresponding to the pinyin query information.
2. The method as claimed in claim 1, wherein the step of performing sequential word segmentation processing on the pinyin query information to obtain a word segmentation group including at least one word segmentation segment comprises:
taking a preset value as the window size, and performing sliding window operation according to the byte stream of the pinyin query information to obtain at least one word segmentation segment arranged according to the byte stream sequence;
and taking at least one word segmentation segment arranged according to the byte stream sequence as a word segmentation segment group corresponding to the pinyin query information.
3. The method of claim 1, further comprising, before the step of obtaining pinyin query information of the user:
obtaining pinyin fields corresponding to all POI texts in the POI database;
aiming at the pinyin field corresponding to each POI text, the following steps are executed:
performing the sequential word segmentation processing on the pinyin field corresponding to the POI text to obtain at least one word segmentation segment corresponding to the POI text; and taking at least one word segmentation segment corresponding to the POI text as a POI pinyin segment corresponding to the POI text, and storing the POI pinyin segment in the POI database.
4. The method according to claim 1, wherein the step of using the segmentation of the segmentation group as search words to search the POI texts in the current POI database by using the search words to obtain the matching degree between each POI text and the segmentation group comprises:
for each POI text in the current POI database, the following steps are performed:
using each participle fragment in the participle fragment group as a search word, searching in POI pinyin fragments corresponding to the POI text one by one, and determining the number of the participle fragments searched in the POI text by the participle fragments in the participle fragment group; and dividing the number of the retrieved word segmentation segments by the total number of the word segmentation segments in the word segmentation segment group to obtain the matching degree of the POI text and the word segmentation segment group.
5. The method according to claim 1, wherein the step of using the segmentation of the segmentation group as search words to search the POI texts in the current POI database by using the search words to obtain the matching degree between each POI text and the segmentation group comprises:
for each participle segment of the participle segment group, performing the following steps: searching in the POI pinyin fragment corresponding to each POI text one by taking the word segmentation fragments as search words, and updating the matching degree of each POI text according to the current search result;
and when each word segmentation in the word segmentation group is searched, taking the last matching degree of each POI text as the matching degree of the POI text and the word segmentation group.
6. The method according to claim 1, wherein after the step of determining the POI text having a matching degree reaching a preset matching threshold as the retrieval result corresponding to the pinyin query information, the method further comprises:
sorting the POI texts in the retrieval result in descending order of popularity of the POI texts;
and displaying the sorted retrieval results.
7. The method of claim 1, wherein the pinyin query information is geographic location query information;
before the step of obtaining the pinyin query information of the user, the method further comprises the following steps:
receiving a region name input by a user;
and taking the POI database corresponding to the area name as a current POI database.
8. A POI text retrieval apparatus, the apparatus comprising:
the information acquisition module is used for acquiring pinyin query information of a user;
the sequential word segmentation module is used for performing sequential word segmentation processing on the pinyin query information to obtain a word segmentation segment group comprising at least one word segmentation segment; the sequential word segmentation processing is sliding window operation which is carried out according to the byte stream of the pinyin query information by taking a preset value as a window size;
the retrieval module is used for taking the participle fragments in the participle fragment group as retrieval words respectively, and retrieving the POI texts in the current POI database by applying the retrieval words to obtain the matching degree of each POI text and the participle fragment group; each POI text in the POI database comprises a POI pinyin fragment obtained through the sequential word segmentation processing;
and the retrieval result determining module is used for determining the POI text with the matching degree reaching a preset matching threshold value as the retrieval result corresponding to the pinyin query information.
9. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating via the bus when the electronic device is operating, the processor executing the machine-readable instructions to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 7.
CN201910906460.3A 2019-09-24 2019-09-24 POI text retrieval method and device and electronic equipment Pending CN110765262A (en)

Publications (1)

Publication Number Publication Date
CN110765262A true CN110765262A (en) 2020-02-07

Family

ID=69330220

Country Status (1)

Country Link
CN (1) CN110765262A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254557A (en) * 2011-07-04 2011-11-23 深圳市子栋科技有限公司 Navigation method and system based on natural voice identification
CN103530380A (en) * 2013-10-17 2014-01-22 北京奇虎科技有限公司 Vertical search device and method
CN104462085A (en) * 2013-09-12 2015-03-25 腾讯科技(深圳)有限公司 Method and device for correcting search keywords
CN106326233A (en) * 2015-06-18 2017-01-11 阿里巴巴集团控股有限公司 Address prompting method and device
CN108287843A (en) * 2017-01-09 2018-07-17 北京四维图新科技股份有限公司 A kind of method and apparatus and navigation equipment of interest point information retrieval
CN109885641A (en) * 2019-01-21 2019-06-14 瀚高基础软件股份有限公司 A kind of method and system of database Chinese Full Text Retrieval
CN110019645A (en) * 2017-09-28 2019-07-16 北京搜狗科技发展有限公司 Index base construction method, searching method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767477A (en) * 2020-06-19 2020-10-13 北京百度网讯科技有限公司 Retrieval method, retrieval device, electronic equipment and storage medium
CN111767477B (en) * 2020-06-19 2023-07-28 北京百度网讯科技有限公司 Retrieval method, retrieval device, electronic equipment and storage medium

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200207)