CN108491387B - Method and apparatus for outputting information - Google Patents

Publication number
CN108491387B
Authority
CN
China
Prior art keywords
keyword
name
word vector
category
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810231488.7A
Other languages
Chinese (zh)
Other versions
CN108491387A (en)
Inventor
阎晓静
孙建丽
刘燕云
金鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Original Assignee
Baidu Online Network Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu Online Network Technology Beijing Co Ltd
Priority to CN201810231488.7A
Publication of CN108491387A
Application granted
Publication of CN108491387B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application disclose a method and an apparatus for outputting information. One embodiment of the method comprises: acquiring a name text of a target interest point; performing word segmentation on the name text to obtain at least one name keyword; determining a category keyword of the target interest point based on the similarity between the word vector corresponding to a name keyword in the at least one name keyword and the word vector corresponding to each category keyword in a preset category keyword set; and outputting the category keyword of the target interest point. In this embodiment, the category of an interest point is determined automatically from the name text of the interest point, which reduces the cost of determining interest point categories.

Description

Method and apparatus for outputting information
Technical Field
The embodiments of the application relate to the technical field of computers, specifically to the technical field of electronic maps, and in particular to a method and an apparatus for outputting information.
Background
A POI (Point of Interest) is an important piece of information in an electronic map. In an electronic map, a POI may represent a house, a shop, a mailbox, a bus station, a park, a school, and so on. POI data generally includes one or more of a name, an address, a longitude and latitude, and category information. When a user searches for POIs with the search function of an electronic map, the user often searches by category, for example for nearby restaurants. If the POI data does not include category information, POIs whose names do not contain "restaurant" cannot be presented to the user. Therefore, POI data needs to be labeled with categories.
At present, the category of a POI is mostly determined from its name by manual labeling, which is costly.
Disclosure of Invention
The embodiment of the application provides a method and a device for outputting information.
In a first aspect, an embodiment of the present application provides a method for outputting information, where the method includes: acquiring a name text of a target interest point; performing word segmentation on the name text to obtain at least one name keyword; determining category keywords of the target interest point based on similarity between word vectors corresponding to the name keywords in the at least one name keyword and word vectors corresponding to each category keyword in a preset category keyword set; and outputting the category key words of the target interest points.
In some embodiments, determining the category keyword of the target interest point based on a similarity between a word vector corresponding to the name keyword in the at least one name keyword and a word vector corresponding to each category keyword in a preset category keyword set includes: determining one name keyword in the at least one name keyword as a target keyword, and determining a word vector corresponding to the target keyword in a preset word vector table as a target word vector, wherein the preset word vector table is used for representing the corresponding relationship between words and the word vector; and determining the category key words with the highest similarity between the corresponding word vectors in the preset category key word set and the target word vectors as the category key words of the target interest points, wherein the word vectors corresponding to the category key words are inquired in a preset word vector table.
In some embodiments, performing word segmentation on the name text to obtain at least one name keyword comprises: performing word segmentation on the name text based on a preset corpus to obtain at least one name keyword and a keyword category corresponding to each name keyword, wherein the keyword categories comprise: suffix keywords, range keywords, and core keywords.
In some embodiments, determining one of the at least one name keyword as the target keyword comprises: determining the suffix keyword in the at least one name keyword as the target keyword in response to determining that the at least one name keyword includes a suffix keyword and that a word vector corresponding to the suffix keyword exists in the preset word vector table.
In some embodiments, determining one of the at least one name keyword as the target keyword further comprises: determining the range keyword in the at least one name keyword as the target keyword in response to determining that the at least one name keyword does not include a suffix keyword and includes a range keyword, and that a word vector corresponding to the range keyword exists in the preset word vector table.
In some embodiments, determining one of the at least one name keyword as the target keyword further comprises: determining the core keyword in the at least one name keyword as the target keyword in response to determining that the at least one name keyword includes a core keyword and includes neither a suffix keyword nor a range keyword, and that a word vector corresponding to the core keyword exists in the preset word vector table.
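The three selection rules above can be sketched as a single priority lookup. This is a minimal illustration, not the claimed implementation: the function name `pick_target_keyword`, the category labels, and the choice to take the first keyword when several share a category are assumptions. Following a strict reading of the text, the sketch does not fall back to a lower-priority category when the highest-priority keyword present has no word vector.

```python
from typing import Dict, List, Optional, Tuple

# Priority order taken from the three embodiments: suffix, then range, then core.
PRIORITY = ("suffix", "range", "core")

def pick_target_keyword(
    keywords: List[Tuple[str, str]],
    vector_table: Dict[str, List[float]],
) -> Optional[str]:
    """keywords holds (word, category) pairs; vector_table maps word -> vector."""
    for category in PRIORITY:
        candidates = [word for word, cat in keywords if cat == category]
        if candidates:
            # Assumption: use the first keyword of the highest-priority
            # category present, and only if the table has a vector for it.
            word = candidates[0]
            return word if word in vector_table else None
    return None

# Hypothetical word vector table for illustration only.
example_table = {"building": [0.8, 0.2], "science and technology": [1.0, 0.0]}
example_keywords = [
    ("quinke", "core"),
    ("science and technology", "range"),
    ("building", "suffix"),
]
target = pick_target_keyword(example_keywords, example_table)
```

With these hypothetical inputs the suffix keyword "building" is chosen, because it is present and has an entry in the table.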
In some embodiments, the preset word vector table is obtained by: obtaining a sample set, wherein each sample comprises a name text of an interest point and a labeling category keyword; performing word segmentation on the name text in each sample in the sample set; and generating a word vector table based on a preset word vector generation method by taking the word segmentation result of the name text in each sample in the sample set and the labeled category keywords as a corpus.
In a second aspect, an embodiment of the present application provides an apparatus for outputting information, including: an acquisition unit configured to acquire a name text of a target interest point; a word segmentation unit configured to perform word segmentation on the name text to obtain at least one name keyword; a determining unit configured to determine a category keyword of the target interest point based on the similarity between the word vector corresponding to a name keyword in the at least one name keyword and the word vector corresponding to each category keyword in a preset category keyword set; and an output unit configured to output the category keyword of the target interest point.
In some embodiments, the determining unit comprises: the first determining module is configured to determine one name keyword in the at least one name keyword as a target keyword, and determine a word vector corresponding to the target keyword in a preset word vector table as a target word vector, wherein the preset word vector table is used for representing a corresponding relationship between words and the word vector; and the second determining module is configured to determine the category keyword with the highest similarity between the corresponding word vector in the preset category keyword set and the target word vector as the category keyword of the target interest point, wherein the word vector corresponding to the category keyword is queried in a preset word vector table.
In some embodiments, the word segmentation unit is further configured to: perform word segmentation on the name text based on a preset corpus to obtain at least one name keyword and a keyword category corresponding to each name keyword, wherein the keyword categories comprise: suffix keywords, range keywords, and core keywords.
In some embodiments, the first determining module is further configured to: determine the suffix keyword in the at least one name keyword as the target keyword in response to determining that the at least one name keyword includes a suffix keyword and that a word vector corresponding to the suffix keyword exists in the preset word vector table.
In some embodiments, the first determining module is further configured to: determine the range keyword in the at least one name keyword as the target keyword in response to determining that the at least one name keyword does not include a suffix keyword and includes a range keyword, and that a word vector corresponding to the range keyword exists in the preset word vector table.
In some embodiments, the first determining module is further configured to: determine the core keyword in the at least one name keyword as the target keyword in response to determining that the at least one name keyword includes a core keyword and includes neither a suffix keyword nor a range keyword, and that a word vector corresponding to the core keyword exists in the preset word vector table.
In some embodiments, the preset word vector table is obtained by: obtaining a sample set, wherein each sample comprises a name text of an interest point and a labeling category keyword; performing word segmentation on the name text in each sample in the sample set; and generating a word vector table based on a preset word vector generation method by taking the word segmentation result of the name text in each sample in the sample set and the labeled category keywords as a corpus.
In a third aspect, an embodiment of the present application provides an electronic device, including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation manner of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method as described in any implementation manner of the first aspect.
According to the method and the device for outputting the information, at least one name keyword is obtained by segmenting the name text of the target interest point, the category keywords of the target interest point are determined based on the similarity between the word vector corresponding to the name keyword in the at least one name keyword and the word vector corresponding to each category keyword in the preset category keyword set, and the category keywords of the target interest point are output, so that the category of the interest point is automatically determined according to the name text of the interest point, and the cost for determining the category of the interest point is reduced.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram for one embodiment of a method for outputting information, in accordance with the present application;
FIG. 3 is a flow diagram of yet another embodiment of a method for outputting information according to the present application;
FIG. 4 is a schematic block diagram illustrating one embodiment of an apparatus for outputting information according to the present application;
FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing an electronic device according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the present method for outputting information or apparatus for outputting information may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as an electronic map application, a navigation application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as a background server that supports the electronic map applications displayed on the terminal devices 101, 102, 103. The background server may analyze and otherwise process received data, such as an address query request, and feed back a processing result (e.g., electronic map data) to the terminal device.
It should be noted that the method for outputting information provided in the embodiment of the present application is generally performed by the server 105, and accordingly, the apparatus for outputting information is generally disposed in the server 105.
It should be noted that the method for outputting information provided in the embodiment of the present application may be executed by the server 105, and may also be executed by another electronic device (not shown in fig. 1) communicatively connected to the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for outputting information in accordance with the present application is shown. The method for outputting information comprises the following steps:
Step 201, obtaining the name text of the target interest point.
In this embodiment, the electronic device on which the method for outputting information runs (for example, the server shown in fig. 1) may acquire the name text of the target interest point locally, or remotely from another electronic device (for example, a terminal device shown in fig. 1) that is communicatively connected to it.
As an example, when the electronic device locally obtains the name text of the target interest point, the target interest point may be any one of at least one interest point in an electronic map service provided by the electronic device, and this embodiment only takes the target interest point as an example for description, and in practice, the method for outputting information described in this embodiment may be applied to any one of the at least one interest point.
As an example, when the electronic device remotely obtains the name text of the target point of interest from another electronic device (e.g., the terminal device shown in fig. 1) communicatively connected to the electronic device, the other electronic device communicatively connected to the electronic device may be an electronic device (e.g., a terminal device used by a technician) for providing a point of interest name service for an electronic map service provided by the electronic device, and the name text of the target point of interest may be text input by a user received by the other electronic device communicatively connected to the electronic device.
Step 202, performing word segmentation on the name text to obtain at least one name keyword.
In this embodiment, the electronic device (for example, the server shown in fig. 1) may use any of various word segmentation methods to segment the name text obtained in step 201 into at least one name keyword. Here, a word segmentation method based on character string matching, a word segmentation method based on understanding, a word segmentation method based on statistics, or any other word segmentation method now known or developed in the future may be employed.
It should be noted that the above word segmentation methods are well-known technologies that are widely researched and applied at present, and are not described herein again.
For example, the name text "a certain technology building" is cut into three name keywords "a certain", "technology" and "building".
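As one illustration of a word segmentation method based on character string matching, the sketch below uses forward maximum matching against a dictionary. The text does not specify which matching strategy is used, so the strategy, the function name `forward_max_match`, the dictionary contents, and the unspaced input string are all assumptions made for this example.

```python
def forward_max_match(text, dictionary, max_len=10):
    """Greedy left-to-right segmentation: at each position, take the longest
    substring (up to max_len characters) that appears in the dictionary."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical dictionary and an unspaced name text, for illustration only.
dictionary = {"some", "technology", "building"}
tokens = forward_max_match("sometechnologybuilding", dictionary)
```

String matching of this kind is mainly useful for languages written without spaces between words; the English string here only stands in for such text.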
Step 203, determining the category keywords of the target interest point based on the similarity between the word vector corresponding to the name keyword in the at least one name keyword and the word vector corresponding to each category keyword in the preset category keyword set.
In this embodiment, the electronic device on which the method for outputting information is executed may determine the category keyword of the target point of interest based on a similarity between a word vector corresponding to a name keyword of the at least one name keyword and a word vector corresponding to each category keyword of a preset category keyword set.
In some optional implementations of this embodiment, step 203 may be performed as follows:
First, for each name keyword of the at least one name keyword, the word vector corresponding to the name keyword may be looked up in a preset word vector table.
Here, the word vector corresponding to a word is a real-valued vector that represents the word by means of a distributed representation. The preset word vector table maps words or phrases to real-valued vectors, and the mapped vectors are the word vectors. Using a word vector table reduces the features of natural language from a high-dimensional space whose size equals the vocabulary size to a relatively low-dimensional space. A word vector table is evaluated by the following principle: the similarity between the word vectors of two semantically similar words should be high, and the similarity between the word vectors of two semantically dissimilar words should be low. For example, one record in the word vector table may be the word "beijing" with the corresponding word vector "-0.1654, 0.8764, 0.5364, -0.6354, 0.1645". This word vector has 5 dimensions; in practice a word vector may have any number of dimensions, which is not specifically limited in this application.
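A word vector table of this kind can be pictured as a plain mapping from words to lists of real numbers. In the sketch below, only the "beijing" record comes from the text; the other two records and their values are invented for illustration, and cosine similarity is used as one possible similarity measure.

```python
import math

# Word vector table as a dictionary: word -> 5-dimensional real vector.
word_vectors = {
    "beijing": [-0.1654, 0.8764, 0.5364, -0.6354, 0.1645],   # record from the text
    "shanghai": [-0.1500, 0.8000, 0.6000, -0.6000, 0.2000],  # hypothetical values
    "noodles": [0.7000, -0.2000, 0.1000, 0.5000, -0.4000],   # hypothetical values
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# A well-formed table should score related words higher than unrelated ones.
related = cosine(word_vectors["beijing"], word_vectors["shanghai"])
unrelated = cosine(word_vectors["beijing"], word_vectors["noodles"])
```

With these made-up values, `related` is larger than `unrelated`, which is the property the evaluation principle above asks for.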
It should be noted that the preset word vector table may be a correspondence table which is pre-made by a technician based on statistics of a large amount of corpora and the association relations between the words in the corpora and stores the correspondence relations between a plurality of words and word vectors; the preset word vector table can also be obtained by training by a machine learning method based on a corpus, and how to train the word vector table is the prior art widely researched and applied at present, and is not described herein again.
As an example, a sentence library including a large number of sentences, together with the words included in each sentence, may first be acquired. Then, for each word included in the sentences of the sentence library, the sentences that include the word are obtained, and the context words adjacent to the word in those sentences are obtained. The word vector of each word is then calculated on the principle of maximizing the sum of the association degrees between the word and its context words.
As another example, for each word to be analyzed, the preset type of each sentence in the sentence library that contains the word may also be obtained, giving a type set corresponding to each word to be analyzed. The word vector of each word to be analyzed is set as a training variable, and a model that calculates the sum of the association degrees of the words to be analyzed is established as a training model according to the type set and the word vector corresponding to each word to be analyzed. The training variables are then trained according to the training model on the principle of maximizing the sum of the association degrees, which yields the word vector of each word to be analyzed.
Optionally, the preset word vector table may be obtained by:
First, a sample set is obtained, wherein each sample comprises a name text of an interest point and a labeled category keyword.
Second, word segmentation is performed on the name text in each sample in the sample set.
Third, the word segmentation result of the name text in each sample in the sample set, together with the labeled category keyword, is used as a corpus, and a word vector table is generated based on a preset word vector generation method.
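The three steps above can be sketched as follows. The text leaves the "preset word vector generation method" open, so the co-occurrence counting used here is only a simple stand-in chosen to keep the example self-contained; a trained embedding model would normally take its place. The function names, the whitespace segmenter, and the sample data are assumptions.

```python
from collections import Counter, defaultdict

def build_corpus(samples, segment):
    """Each sample is (name_text, category_keyword). Each corpus sentence is the
    segmented name text followed by its labeled category keyword."""
    return [segment(name) + [category] for name, category in samples]

def cooccurrence_vectors(corpus):
    """Stand-in vector generation: one dimension per vocabulary word, holding
    how often the two words appear in the same corpus sentence."""
    vocab = sorted({word for sentence in corpus for word in sentence})
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word in sentence:
            for other in sentence:
                if other != word:
                    counts[word][other] += 1
    table = {word: [counts[word][other] for other in vocab] for word in vocab}
    return table, vocab

# Hypothetical labeled samples; str.split stands in for a real segmenter.
samples = [
    ("quinke technology building", "office building"),
    ("sunny noodle house", "restaurant"),
]
corpus = build_corpus(samples, str.split)
table, vocab = cooccurrence_vectors(corpus)
```

Because the labeled category keyword is placed in the same sentence as the name keywords, name keywords and category keywords end up in one shared vector space, which is what makes the later similarity comparison possible.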
Secondly, for each category keyword in the preset category keyword set, a word vector corresponding to the category keyword can be searched in a preset word vector table.
Next, a center vector of the word vectors corresponding to the at least one name keyword may be determined. That is, for each dimension, the average of the values of that dimension across the word vectors of the name keywords is used as the value of the corresponding dimension of the center vector.
Then, the similarity between the central vector and the word vector corresponding to each category keyword in the preset category keyword set can be calculated.
Here, various methods of calculating the similarity between vectors may be employed, which may include, but are not limited to, a similarity based on the euclidean distance, a similarity based on the manhattan distance, a similarity based on the chebyshev distance, a similarity based on the minkowski distance, a similarity based on the standard euclidean distance, a similarity based on the mahalanobis distance, and a cosine similarity, and a similarity calculation method between vectors that is now known or will be developed in the future.
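Several of the measures listed above can be sketched in a few lines. The text does not say how a distance is turned into a similarity, so the mapping 1 / (1 + d) used below is an assumption, as are the function names; the standardized Euclidean and Mahalanobis variants are omitted because they need variance or covariance estimates from data.

```python
import math

def minkowski(u, v, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)

def euclidean(u, v):
    return minkowski(u, v, 2)

def manhattan(u, v):
    return minkowski(u, v, 1)

def chebyshev(u, v):
    """Largest absolute difference over all dimensions."""
    return max(abs(a - b) for a, b in zip(u, v))

def distance_to_similarity(distance):
    """Assumed mapping: distance 0 gives similarity 1, larger distances give less."""
    return 1.0 / (1.0 + distance)

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Whichever measure is chosen, it only has to rank the category keywords consistently, since the next step keeps the one with the highest similarity.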
Finally, the category keyword in the preset category keyword set whose word vector has the highest similarity to the center vector is determined as the category keyword of the target interest point.
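The center-vector implementation described above can be sketched end to end as follows. Cosine similarity is picked as one of the permitted measures. Skipping name keywords that have no entry in the word vector table is an assumption, since the text does not say how missing words are handled, and all names and vector values are hypothetical.

```python
import math

def center_vector(vectors):
    """Per-dimension average of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(dimension) / count for dimension in zip(*vectors)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify_by_center(name_keywords, category_keywords, table):
    """Return the category keyword whose vector is most similar to the center
    vector of the name keywords, or None if nothing can be compared."""
    # Assumption: name keywords missing from the table are ignored.
    vectors = [table[word] for word in name_keywords if word in table]
    if not vectors:
        return None
    center = center_vector(vectors)
    candidates = [c for c in category_keywords if c in table]
    return max(candidates, key=lambda c: cosine(center, table[c]), default=None)

# Hypothetical two-dimensional word vector table, for illustration only.
table = {
    "technology": [1.0, 0.0],
    "building": [0.8, 0.2],
    "office building": [0.9, 0.1],
    "restaurant": [0.0, 1.0],
}
category = classify_by_center(
    ["quinke", "technology", "building"], ["office building", "restaurant"], table
)
```

Here "quinke" has no vector and is skipped, the center of the remaining two vectors is close to the vector for "office building", and that category keyword is returned.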
In some optional implementations of this embodiment, step 203 may further be performed as follows:
First, one of the at least one name keyword may be determined as a target keyword.
For example, any one of the at least one name keyword may be determined as the target keyword.
Secondly, the word vector corresponding to the target keyword in the preset word vector table can be determined as the target word vector.
Here, the preset word vector table is used to represent the correspondence between words and word vectors.
And thirdly, for each category keyword in the preset category keyword set, searching a word vector corresponding to the category keyword in the preset word vector table.
Then, the similarity between the target word vector and the word vector corresponding to each category keyword in the preset category keyword set can be calculated.
And finally, determining the category keyword with the highest similarity between the corresponding word vector in the preset category keyword set and the target word vector as the category keyword of the target interest point.
And step 204, outputting the category key words of the target interest points.
In this embodiment, the electronic device may output the category keyword of the target interest point determined in step 203 to another electronic device local to the electronic device or in communication connection with the electronic device.
In some optional implementations of this embodiment, when the electronic device obtains the name text of the target interest point locally in step 201, and the target interest point is any one of at least one interest point in an electronic map service provided by the electronic device, the electronic device may output the category keyword of the target interest point determined in step 203 to another function module running locally on the electronic device, for example a function module that assigns a category to the target interest point.
In some optional implementation manners of this embodiment, the electronic device may also output the category keyword of the target interest point determined in step 203 to an electronic device (for example, an electronic map data storage server) in communication connection with the electronic device and configured to provide a category calibration service for an electronic map service provided by the electronic device, so that the electronic device providing the category calibration service for the electronic map service provided by the electronic device can calibrate the category of the target interest point.
The method provided by the embodiment of the application obtains at least one name keyword by segmenting the name text of the target interest point, determines the category keyword of the target interest point based on the similarity between the word vector corresponding to the name keyword in the at least one name keyword and the word vector corresponding to each category keyword in the preset category keyword set, and outputs the category keyword of the target interest point, thereby automatically determining the category of the interest point according to the name text of the interest point, and reducing the cost for determining the category of the interest point.
With further reference to fig. 3, a flow 300 of yet another embodiment of a method for outputting information is shown. The process 300 of the method for outputting information includes the steps of:
Step 301, obtaining a name text of the target interest point.
In this embodiment, the electronic device on which the method for outputting information runs (for example, the server shown in fig. 1) may acquire the name text of the target interest point locally, or remotely from another electronic device (for example, a terminal device shown in fig. 1) that is communicatively connected to it.
Step 302, performing word segmentation on the name text based on a preset corpus to obtain at least one name keyword and a keyword category corresponding to each name keyword.
In this embodiment, the electronic device may segment the name text based on a preset corpus to obtain at least one name keyword and a keyword category corresponding to each name keyword.
Here, the keyword categories may include: suffix keywords, range keywords, and core keywords. These three categories are ordered from largest to smallest by the range of interest points they cover. That is, the number of interest points a user can find in the electronic map service by searching for a suffix keyword is greater than the number found by searching for a range keyword, which in turn is greater than the number found by searching for a core keyword.
For example, segmenting the name text "quinke science and technology building" yields the name keywords "quinke", "science and technology" and "building", where "building" is a suffix keyword, "science and technology" is a range keyword, and "quinke" is a core keyword. The number of interest points a user can find in the electronic map service by searching for "building" is greater than the number found by searching for "science and technology", which in turn is greater than the number found by searching for "quinke".
Here, the preset corpus may be composed of a large number of names of points of interest in reality.
In some optional implementations of this embodiment, step 302 may proceed as follows:
First, word segmentation can be performed on the name text based on various preset word segmentation methods to obtain a word segmentation sequence, wherein the word segmentation sequence is composed of at least one name keyword arranged in sequence.
Then, for each name keyword in the word cutting sequence, the keyword category of the name keyword can be determined according to the position of the name keyword in the word cutting sequence.
As an example, for each name keyword in the word cutting sequence, according to a preset rule, the keyword category of the name keyword may be determined according to the position of the name keyword in the word cutting sequence. For example, the first name keyword from the right in the word segmentation sequence may be determined as a suffix keyword, the first name keyword from the left in the word segmentation sequence may be determined as a core keyword, and the other name keywords except the first name keyword from the left and the first name keyword from the right in the word segmentation sequence may be determined as range keywords.
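The positional rule above can be sketched in a few lines of Python. This is only an illustration: the function name `categorize_by_position` and the category labels are hypothetical, and the handling of a single-keyword sequence (treated here as a suffix keyword) is an assumption, since the text does not say which rule wins in that case.

```python
from typing import List, Tuple

# Category labels are illustrative names, not taken from the source.
SUFFIX, RANGE, CORE = "suffix", "range", "core"


def categorize_by_position(segments: List[str]) -> List[Tuple[str, str]]:
    """Label each name keyword by its position in the segmentation sequence."""
    labeled = []
    last = len(segments) - 1
    for i, word in enumerate(segments):
        if i == last:
            # First keyword from the right. Assumption: this check runs first,
            # so a one-keyword sequence is labeled as a suffix keyword.
            category = SUFFIX
        elif i == 0:
            # First keyword from the left.
            category = CORE
        else:
            # Every keyword between the two ends.
            category = RANGE
        labeled.append((word, category))
    return labeled


labeled = categorize_by_position(["quinke", "science and technology", "building"])
print(labeled)
```

With the example name from the description, "quinke" is labeled core, "science and technology" range, and "building" suffix.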
In some optional implementations of this embodiment, step 302 may also be performed as follows:
First, the name text may be segmented using any of various preset word segmentation methods to obtain a word segmentation sequence, which consists of at least one name keyword arranged in order.
Then, for each name keyword in the word segmentation sequence, a matching keyword category may be looked up in a preset keyword category table, and the category found is determined as the keyword category of that name keyword.
Here, the preset keyword category table represents the correspondence between keywords and keyword categories. In practice, it may be a correspondence table, compiled in advance by a technician from the suffix keywords, range keywords, and core keywords found in the names of a large number of points of interest, that stores a large number of keywords and their corresponding keyword categories. Practical experience shows that core keywords are often unique, so enumerating all of them would make the preset keyword category table occupy too much space. For this reason, the table may store only keywords of two categories, suffix keywords and range keywords, and any word not in the preset keyword category table is determined to be a core keyword.
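A minimal sketch of the table-based variant follows, assuming the table is held as an in-memory dictionary. The table entries and the name `categorize_by_table` are hypothetical. The table stores only suffix and range keywords, and any word missing from it falls back to the core category, as described above.

```python
from typing import Dict, List, Tuple

# Hypothetical table contents; a real table would be compiled by a technician.
KEYWORD_CATEGORY_TABLE: Dict[str, str] = {
    "building": "suffix",
    "hotel": "suffix",
    "science and technology": "range",
}


def categorize_by_table(
    segments: List[str],
    table: Dict[str, str] = KEYWORD_CATEGORY_TABLE,
) -> List[Tuple[str, str]]:
    """Look each name keyword up in the table; unknown words become core keywords."""
    return [(word, table.get(word, "core")) for word in segments]


print(categorize_by_table(["quinke", "science and technology", "building"]))
```

Defaulting to the core category keeps the table small, at the cost of labeling any unlisted suffix or range keyword as a core keyword.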
Step 303, determining whether the at least one name keyword includes a suffix keyword and a word vector corresponding to the suffix keyword in the at least one name keyword exists in a preset word vector table.
Practical experience shows that suffix keywords in the name texts of interest points are closer to the categories of the interest points than scope keywords are, and scope keywords are closer than core keywords. The target keyword can therefore be selected in the order suffix keyword, scope keyword, core keyword.
That is, in this embodiment, after segmenting the name text based on the preset corpus to obtain at least one name keyword and a keyword category corresponding to each name keyword, the electronic device may determine whether the at least one name keyword includes a suffix keyword and a word vector corresponding to the suffix keyword in the at least one name keyword exists in the preset word vector table, if so, go to step 304, and if not, go to step 305. The preset word vector table is used for representing the corresponding relation between words and word vectors. For the explanation of the preset word vector table, refer to the related description in the embodiment shown in fig. 2, which is not repeated herein.
Step 304, determining a suffix keyword in the at least one name keyword as a target keyword.
Here, in a case where it is determined in step 303 that the at least one name keyword includes a suffix keyword and a word vector corresponding to that suffix keyword exists in the preset word vector table, the electronic device may determine the suffix keyword of the at least one name keyword as the target keyword. After step 304 is executed, the flow goes to step 309.
Step 305, determining whether the at least one name keyword includes a range keyword and a word vector corresponding to the range keyword in the at least one name keyword exists in a preset word vector table.
Here, in a case where it is determined in step 303 that the at least one name keyword does not include a suffix keyword, or includes a suffix keyword whose corresponding word vector does not exist in the preset word vector table, the electronic device may determine whether the at least one name keyword includes a range keyword and whether a word vector corresponding to that range keyword exists in the preset word vector table. If so, go to step 306; if not, go to step 307.
Step 306, determining a scope keyword in the at least one name keyword as a target keyword.
Here, the electronic device may determine the scope keyword of the at least one name keyword as the target keyword in a case where it is determined in step 305 that the scope keyword is included in the at least one name keyword and a word vector corresponding to the scope keyword of the at least one name keyword exists in a preset word vector table. After step 306 is executed, the flow goes to step 309.
Step 307, determining whether the at least one name keyword includes a core keyword and a word vector corresponding to the core keyword in the at least one name keyword exists in a preset word vector table.
Here, in a case where it is determined in step 305 that the at least one name keyword does not include a range keyword, or includes a range keyword whose corresponding word vector does not exist in the preset word vector table, the electronic device may determine whether the at least one name keyword includes a core keyword and whether a word vector corresponding to that core keyword exists in the preset word vector table. If so, go to step 308; if not, the flow ends.
Step 308, determining a core keyword in the at least one name keyword as a target keyword.
Here, in a case where it is determined in step 307 that the at least one name keyword includes a core keyword and a word vector corresponding to that core keyword exists in the preset word vector table, the electronic device may determine the core keyword of the at least one name keyword as the target keyword. After step 308 is executed, the flow goes to step 309.
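Steps 303 to 308 amount to a priority cascade, which might be sketched as below. The function name `select_target_keyword`, the category labels, and the toy vectors are hypothetical. Taking the first matching keyword when a name contains several keywords of the same category is an assumption, as the text does not address that case.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Priority order described in steps 303-308: suffix, then range, then core.
PRIORITY: Tuple[str, ...] = ("suffix", "range", "core")


def select_target_keyword(
    labeled: Sequence[Tuple[str, str]],
    word_vectors: Dict[str, List[float]],
) -> Optional[str]:
    """Return the highest-priority keyword that has a word vector, or None."""
    for category in PRIORITY:
        for word, word_category in labeled:
            # A keyword qualifies only if it belongs to the category being
            # checked AND its vector exists in the word vector table.
            if word_category == category and word in word_vectors:
                return word
    # No keyword of any category has a word vector: the flow ends.
    return None


labeled = [("quinke", "core"), ("science and technology", "range"), ("building", "suffix")]
vectors = {"science and technology": [0.2, 0.8], "quinke": [0.5, 0.5]}
print(select_target_keyword(labeled, vectors))
```

In this toy input the suffix keyword "building" has no vector, so the cascade falls through to the range keyword "science and technology".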
Step 309, determining the word vector corresponding to the target keyword in the preset word vector table as the target word vector.
Step 310, determining, as the category keyword of the target interest point, the category keyword in the preset category keyword set whose corresponding word vector has the highest similarity to the target word vector.
Here, the word vectors corresponding to the category keywords are found in the preset word vector table.
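Steps 309 and 310 can be illustrated as follows. Cosine similarity is used here as an assumption, since this passage does not name the similarity measure. The function names and the two-dimensional toy vectors are likewise hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(
    target_keyword: str,
    category_keywords: Sequence[str],
    word_vectors: Dict[str, List[float]],
) -> Optional[str]:
    """Pick the category keyword whose vector is most similar to the target's."""
    # Step 309: look up the target word vector in the word vector table.
    target_vector = word_vectors[target_keyword]
    # Category keyword vectors come from the same table; skip any that are missing.
    candidates = [c for c in category_keywords if c in word_vectors]
    if not candidates:
        return None
    # Step 310: choose the category keyword with the highest similarity.
    return max(candidates, key=lambda c: cosine(word_vectors[c], target_vector))


toy_vectors = {
    "building": [0.9, 0.1],
    "real estate": [1.0, 0.0],
    "catering": [0.0, 1.0],
}
print(classify("building", ["real estate", "catering"], toy_vectors))
```

Because both the target keyword and the category keywords are looked up in one table, their vectors live in the same space and can be compared directly.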
Step 311, outputting the category keyword of the target interest point.
Here, the specific operation of step 311 is substantially the same as the specific operation of step 204 in the embodiment shown in fig. 2, and is not described again here.
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the method for outputting information in the present embodiment highlights the step of selecting the target keyword in the order of suffix keyword, scope keyword, and core keyword. The scheme described in this embodiment therefore better fits real-world situations, further improving the accuracy of determining the category of an interest point from its name.
With further reference to fig. 4, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for outputting information, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 4, the apparatus 400 for outputting information of the present embodiment includes: an acquisition unit 401, a word segmentation unit 402, a determining unit 403, and an output unit 404. The acquisition unit 401 is configured to acquire a name text of a target interest point; the word segmentation unit 402 is configured to segment the name text to obtain at least one name keyword; the determining unit 403 is configured to determine a category keyword of the target interest point based on a similarity between a word vector corresponding to a name keyword in the at least one name keyword and a word vector corresponding to each category keyword in a preset category keyword set; and the output unit 404 is configured to output the category keyword of the target interest point.
In this embodiment, for the specific processing of the acquisition unit 401, the word segmentation unit 402, the determining unit 403, and the output unit 404 of the apparatus 400 for outputting information and the technical effects thereof, reference may be made to the related descriptions of step 201, step 202, step 203, and step 204 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the determining unit 403 may include: a first determining module 4031 configured to determine one of the at least one name keyword as a target keyword, and determine a word vector corresponding to the target keyword in a preset word vector table as a target word vector, where the preset word vector table is used to represent a corresponding relationship between words and word vectors; a second determining module 4032, configured to determine a category keyword with the highest similarity between the corresponding word vector in the preset category keyword set and the target word vector as the category keyword of the target interest point, where the word vector corresponding to the category keyword is found in the preset word vector table.
In some optional implementations of this embodiment, the word segmentation unit 402 may be further configured to: word segmentation is carried out on the name text based on a preset corpus to obtain at least one name keyword and a keyword category corresponding to each name keyword, wherein the keyword category comprises: suffix keywords, scope keywords, and core keywords.
In some optional implementations of this embodiment, the first determining module 4031 may be further configured to: and determining a suffix keyword in the at least one name keyword as a target keyword in response to determining that the at least one name keyword includes the suffix keyword and determining that a word vector corresponding to the suffix keyword in the at least one name keyword exists in the preset word vector table.
In some optional implementations of this embodiment, the first determining module 4031 may be further configured to: and determining the range keyword in the at least one name keyword as a target keyword in response to determining that the suffix keyword is not included in the at least one name keyword and the range keyword is included in the at least one name keyword, and determining that a word vector corresponding to the range keyword in the at least one name keyword exists in the preset word vector table.
In some optional implementations of this embodiment, the first determining module 4031 may be further configured to: and determining the core keyword in the at least one name keyword as a target keyword in response to determining that the at least one name keyword includes the core keyword and does not include the suffix keyword and the range keyword, and determining that a word vector corresponding to the core keyword in the at least one name keyword exists in the preset word vector table.
In some optional implementation manners of this embodiment, the preset word vector table may be obtained by: obtaining a sample set, wherein each sample comprises a name text of an interest point and a labeling category keyword; performing word segmentation on the name text in each sample in the sample set; and generating a word vector table based on a preset word vector generation method by taking the word segmentation result of the name text in each sample in the sample set and the labeled category keywords as a corpus.
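The table-building procedure above might be sketched as follows. The text leaves the "preset word vector generation method" unspecified, so the co-occurrence count vectors below are only a stand-in chosen to keep the example self-contained; they are not the method of the application. The whitespace segmenter, the sample names, and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def build_corpus(
    samples: Sequence[Tuple[str, str]],
    segment: Callable[[str], List[str]],
) -> List[List[str]]:
    """One corpus sentence per sample: segmented name plus its labeled category keyword."""
    return [segment(name) + [label] for name, label in samples]


def cooccurrence_vectors(corpus: List[List[str]]) -> Dict[str, List[float]]:
    """Stand-in vector generator: count co-occurrences within each sentence."""
    vocab = sorted({word for sentence in corpus for word in sentence})
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        for word in sentence:
            for other in sentence:
                if other != word:
                    counts[word][other] += 1
    # Each word's vector has one dimension per vocabulary entry.
    return {w: [float(counts[w][o]) for o in vocab] for w in vocab}


# Hypothetical labeled samples: (name text, labeled category keyword).
samples = [
    ("quinke technology building", "real estate"),
    ("sunrise hotel", "lodging"),
]
corpus = build_corpus(samples, str.split)  # whitespace split as a toy segmenter
table = cooccurrence_vectors(corpus)
print(corpus)
print(sorted(table))
```

Placing the labeled category keyword in the same sentence as the name keywords is what lets name keywords and category keywords share one vector space. A learned embedding model trained on the same corpus could replace the counting step.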
It should be noted that, for details of implementation and technical effects of each unit in the apparatus for outputting information provided in the embodiment of the present application, reference may be made to descriptions of other embodiments in the present application, and details are not described herein again.
Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 506 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An Input/Output (I/O) interface 505 is also connected to bus 504.
The following components are connected to the I/O interface 505: a storage section 506 including a hard disk and the like; and a communication section 507 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 507 performs communication processing via a network such as the Internet. A drive 508 is also connected to the I/O interface 505 as necessary. A removable medium 509 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 508 as necessary, so that a computer program read out therefrom is installed into the storage section 506 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 507 and/or installed from the removable medium 509. The computer program performs the above-described functions defined in the method of the present application when executed by the Central Processing Unit (CPU) 501. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a word segmentation unit, a determining unit, and an output unit. The names of these units do not in some cases constitute a limitation on the units themselves; for example, the acquisition unit may also be described as a "unit that acquires the name text of the target point of interest".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring a name text of a target interest point; performing word segmentation on the name text to obtain at least one name keyword; determining category keywords of the target interest point based on similarity between word vectors corresponding to the name keywords in the at least one name keyword and word vectors corresponding to each category keyword in a preset category keyword set; and outputting the category key words of the target interest points.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (16)

1. A method for outputting information, comprising:
acquiring a name text of a target interest point;
performing word segmentation on the name text to obtain at least one name keyword, wherein the keyword category corresponding to the name keyword comprises: suffix keywords, scope keywords, and core keywords;
determining the category keyword of the target interest point based on the similarity between the word vector corresponding to a name keyword in the at least one name keyword and the word vector corresponding to each category keyword in a preset category keyword set, including: determining one name keyword in the at least one name keyword as a target keyword, the target keyword being selected in the order of suffix keyword, then range keyword, then core keyword, wherein the word vectors corresponding to the name keywords and the word vector corresponding to each category keyword are obtained by querying a preset word vector table;
and outputting the category key words of the target interest points.
2. The method of claim 1, wherein the determining the category keyword of the target point of interest based on similarity between a word vector corresponding to a name keyword of the at least one name keyword and a word vector corresponding to each category keyword of a preset category keyword set comprises:
determining a word vector corresponding to the target keyword in a preset word vector table as a target word vector, wherein the preset word vector table is used for representing a corresponding relation between words and the word vector;
and determining the category key word with the highest similarity between the corresponding word vector in the preset category key word set and the target word vector as the category key word of the target interest point, wherein the word vector corresponding to the category key word is inquired in the preset word vector table.
3. The method of claim 2, wherein said segmenting said name text to obtain at least one name keyword comprises:
and performing word segmentation on the name text based on a preset corpus to obtain at least one name keyword and a keyword category corresponding to each name keyword.
4. The method of claim 3, wherein the determining one of the at least one name keyword as a target keyword comprises:
and determining a suffix keyword in the at least one name keyword as a target keyword in response to determining that the at least one name keyword includes the suffix keyword and determining that a word vector corresponding to the suffix keyword in the at least one name keyword exists in the preset word vector table.
5. The method of claim 4, wherein the determining one of the at least one name keyword as a target keyword further comprises:
and in response to determining that the suffix keyword is not included in the at least one name keyword and the at least one name keyword includes a range keyword, and determining that a word vector corresponding to the range keyword in the at least one name keyword exists in the preset word vector table, determining the range keyword in the at least one name keyword as a target keyword.
6. The method of claim 5, wherein the determining one of the at least one name keyword as a target keyword further comprises:
and in response to determining that the at least one name keyword includes a core keyword and does not include a suffix keyword and a range keyword, and determining that a word vector corresponding to the core keyword of the at least one name keyword exists in the preset word vector table, determining the core keyword of the at least one name keyword as a target keyword.
7. The method according to any one of claims 1-6, wherein the preset word vector table is obtained by:
obtaining a sample set, wherein each sample comprises a name text of an interest point and a labeling category keyword;
segmenting the name text in each sample in the sample set;
and generating a word vector table based on a preset word vector generation method by taking the word segmentation result of the name text in each sample in the sample set and the labeled category keywords as a corpus.
8. An apparatus for outputting information, comprising:
the acquisition unit is configured to acquire a name text of the target interest point;
a word segmentation unit configured to segment the name text to obtain at least one name keyword, wherein the keyword category corresponding to the name keyword includes: suffix keywords, scope keywords, and core keywords;
a determining unit, configured to determine the category keyword of the target interest point based on a similarity between a word vector corresponding to a name keyword in the at least one name keyword and a word vector corresponding to each category keyword in a preset category keyword set, including: determining one name keyword in the at least one name keyword as a target keyword, the target keyword being selected in the order of suffix keyword, then range keyword, then core keyword, wherein the word vectors corresponding to the name keywords and the word vector corresponding to each category keyword are obtained by querying a preset word vector table;
and the output unit is configured to output the category key words of the target interest points.
9. The apparatus of claim 8, wherein the determining unit comprises:
the first determining module is configured to determine a word vector corresponding to the target keyword in a preset word vector table as a target word vector, wherein the preset word vector table is used for representing a corresponding relationship between words and the word vector;
and the second determining module is configured to determine the category keyword with the highest similarity between the corresponding word vector in the preset category keyword set and the target word vector as the category keyword of the target interest point, wherein the word vector corresponding to the category keyword is queried in the preset word vector table.
10. The apparatus of claim 9, wherein the word segmentation unit is further configured to:
and performing word segmentation on the name text based on a preset corpus to obtain at least one name keyword and a keyword category corresponding to each name keyword.
11. The apparatus of claim 10, wherein the first determining module is further configured to:
and determining a suffix keyword in the at least one name keyword as a target keyword in response to determining that the at least one name keyword includes the suffix keyword and determining that a word vector corresponding to the suffix keyword in the at least one name keyword exists in the preset word vector table.
12. The apparatus of claim 11, wherein the first determining module is further configured to:
and in response to determining that the suffix keyword is not included in the at least one name keyword and the at least one name keyword includes a range keyword, and determining that a word vector corresponding to the range keyword in the at least one name keyword exists in the preset word vector table, determining the range keyword in the at least one name keyword as a target keyword.
13. The apparatus of claim 12, wherein the first determining module is further configured to:
and in response to determining that the at least one name keyword includes a core keyword and does not include a suffix keyword and a range keyword, and determining that a word vector corresponding to the core keyword of the at least one name keyword exists in the preset word vector table, determining the core keyword of the at least one name keyword as a target keyword.
14. The apparatus according to any one of claims 8-13, wherein the preset word vector table is obtained by:
obtaining a sample set, wherein each sample comprises a name text of an interest point and a labeling category keyword;
segmenting the name text in each sample in the sample set;
and generating a word vector table based on a preset word vector generation method by taking the word segmentation result of the name text in each sample in the sample set and the labeled category keywords as a corpus.
15. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-7.
16. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.
CN201810231488.7A 2018-03-20 2018-03-20 Method and apparatus for outputting information Active CN108491387B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810231488.7A CN108491387B (en) 2018-03-20 2018-03-20 Method and apparatus for outputting information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810231488.7A CN108491387B (en) 2018-03-20 2018-03-20 Method and apparatus for outputting information

Publications (2)

Publication Number Publication Date
CN108491387A CN108491387A (en) 2018-09-04
CN108491387B true CN108491387B (en) 2022-04-22

Family

ID=63318928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810231488.7A Active CN108491387B (en) 2018-03-20 2018-03-20 Method and apparatus for outputting information

Country Status (1)

Country Link
CN (1) CN108491387B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213916A (en) * 2018-09-14 2019-01-15 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN115168711A (en) * 2019-11-18 2022-10-11 百度在线网络技术(北京)有限公司 Interest point selection method and device, electronic equipment and storage medium
CN113255398B (en) * 2020-02-10 2023-08-18 百度在线网络技术(北京)有限公司 Point of interest weight judging method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509233A (en) * 2011-11-29 2012-06-20 汕头大学 User online action information-based recommendation method
CN103577423A (en) * 2012-07-23 2014-02-12 阿里巴巴集团控股有限公司 Keyword classification method and system
CN104090890A (en) * 2013-12-12 2014-10-08 深圳市腾讯计算机系统有限公司 Method, device and server for obtaining similarity of key words
CN104572645A (en) * 2013-10-11 2015-04-29 高德软件有限公司 Method and device for POI (Point Of Interest) data association
CN104915453A (en) * 2015-07-01 2015-09-16 北京奇虎科技有限公司 Method, device and system for classifying POI information

Also Published As

Publication number Publication date
CN108491387A (en) 2018-09-04

Similar Documents

Publication Publication Date Title
CN107491547B (en) Search method and device based on artificial intelligence
US11698261B2 (en) Method, apparatus, computer device and storage medium for determining POI alias
CN107679039B (en) Method and device for determining statement intention
CN107273503B (en) Method and device for generating parallel text in same language
CN109299320B (en) Information interaction method and device, computer equipment and storage medium
CN111898643B (en) Semantic matching method and device
CN111709240A (en) Entity relationship extraction method, device, equipment and storage medium thereof
US9588941B2 (en) Context-based visualization generation
CN108121699B (en) Method and apparatus for outputting information
CN108491387B (en) Method and apparatus for outputting information
CN111522927A (en) Entity query method and device based on knowledge graph
CN110737824B (en) Content query method and device
CN111488742A (en) Method and device for translation
JP2020071839A (en) Search device, search method, search program, and recording medium
CN110634050B (en) Method, device, electronic equipment and storage medium for identifying house source type
CN110059172B (en) Method and device for recommending answers based on natural language understanding
CN110895587B (en) Method and device for determining target user
CN112182255A (en) Method and apparatus for storing media files and for retrieving media files
CN111026849B (en) Data processing method and device
CN111125550A (en) Interest point classification method, device, equipment and storage medium
CN109710634B (en) Method and device for generating information
CN111191107B (en) System and method for recalling points of interest using annotation model
CN108920707B (en) Method and device for labeling information
WO2016155384A1 (en) Search optimization method, apparatus, and system
CN116662495A (en) Question-answering processing method, and method and device for training question-answering processing model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant