CN106933797B - Target information generation method and device - Google Patents

Target information generation method and device

Publication number
CN106933797B
Authority
CN
China
Prior art keywords
information
text content
points
initial text
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201511017033.8A
Other languages
Chinese (zh)
Other versions
CN106933797A (en)
Inventor
郭勇刚
何伟平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunxing Software Technology Co.,Ltd.
Original Assignee
Beijing Qu Na Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qu Na Information Technology Co Ltd
Priority to CN201511017033.8A
Publication of CN106933797A
Application granted
Publication of CN106933797B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries

Abstract

The invention discloses a method and a device for generating target information. The method comprises the following steps: acquiring initial text content; performing information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points; and extracting the plurality of information points through a preset extraction algorithm to generate target information. The invention solves the technical problem that travel product information is generated inefficiently because it must be manually screened from a large amount of text content. It automatically extracts the important information of a product from existing product information, reduces manual entry time and the manual entry error rate, and improves the user's experience of obtaining travel product information.

Description

Target information generation method and device
Technical Field
The invention relates to the field of computers, in particular to a method and a device for generating target information.
Background
In the prior art, information on travel vacation products is entered manually by back-office staff: the travel vacation information is screened out of a large amount of text content and manually entered into the back end of the client, after which users can browse the travel vacation product information through the front end of the client or through a search engine.
It should be noted that manual entry is tedious and error-prone. Particularly when the number of products is large, the entry workload is very heavy and travel information is entered inefficiently.
No effective solution has yet been proposed for the problem that travel product information is generated inefficiently because it must be manually screened from a large amount of text content.
Disclosure of Invention
The embodiment of the invention provides a method and a device for generating target information, which are used for at least solving the technical problem that the efficiency of generating the information of a tourist product is low because the information of the tourist product needs to be manually screened and generated aiming at a large amount of text contents.
According to an aspect of the embodiments of the present invention, there is provided a method for generating target information, including: acquiring initial text content; performing information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points; and extracting the plurality of information points through a preset extraction algorithm to generate target information.
According to another aspect of the embodiments of the present invention, there is also provided a device for generating target information, including: an acquisition unit configured to acquire an initial text content; the processing unit is used for extracting information points from the initial text content according to a preset word segmentation dictionary to generate a plurality of information points; and the extraction unit is used for extracting the plurality of information points through a preset extraction algorithm to generate target information.
In the embodiment of the invention, initial text content is acquired; information point extraction processing is performed on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points; and the plurality of information points are extracted through a preset extraction algorithm to generate target information. This solves the technical problem that travel product information is generated inefficiently because it must be manually screened from a large amount of text content. The important information of a product is extracted automatically from existing product information, the manual entry time and the manual entry error rate are reduced, and the user's experience of obtaining travel product information is thereby improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flowchart of a method for generating target information according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a target information generation apparatus according to an embodiment of the present invention; and
FIG. 3 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
According to an embodiment of the present invention, an embodiment of a method for generating target information is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order.
FIG. 1 is a flowchart of a method for generating target information according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S12, acquiring the initial text content.
Specifically, in the present solution, initial text content may be obtained through an acquisition terminal. The initial text content may be basic description information of a travel vacation product, and the basic description information may include the title, the features, and the description of the trip. It should be noted that the basic description information contains a large amount of useless information.
Step S14, performing information point extraction processing on the initial text content according to a preset word segmentation dictionary, and generating a plurality of information points.
Specifically, in this solution, the processing terminal may perform extraction processing on the basic description information according to the preset word segmentation dictionary, where the extraction includes word segmentation and feature value extraction, to generate a plurality of information points. It should be noted that each information point consists of a segmented word and the feature value of that segmented word.
Step S16, extracting the plurality of information points through a preset extraction algorithm to generate target information.
Specifically, in this solution, the processing terminal may extract the plurality of information points through a preset algorithm to generate the target information, that is, the travel product information. It should be noted that the target information may be destination, hotel, shopping, traffic information, and the like.
This embodiment acquires the initial text content; performs information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points; and extracts the plurality of information points through a preset algorithm to generate target information. It is easy to see that in this embodiment only the basic description information needs to be obtained: the processing terminal can then automatically extract from it and generate the travel vacation product information, which greatly reduces entry time and avoids the entry errors caused by a heavy workload. This embodiment therefore solves the technical problem that travel product information is generated inefficiently because it must be manually screened from a large amount of text content. The important information of a product is extracted automatically from existing product information, the manual entry time and the manual entry error rate are reduced, and the user's experience of obtaining travel product information is thereby improved.
Optionally, before the step S12, acquiring the initial text content, the method provided in this embodiment may further include:
Step S10, creating a word segmentation dictionary according to the travel vocabulary database, wherein the word segmentation dictionary comprises a plurality of travel product vocabularies and the characteristics of the travel product vocabularies.
Specifically, in the present solution, the travel vocabulary database may be an information knowledge base constructed by information of the existing travel industry and corresponding product information, and the present solution may utilize a word segmentation tool to construct the word segmentation dictionary according to the information knowledge base.
Optionally, in step S14, the step of performing information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points may include:
Step S141, segmenting the initial text content to generate a plurality of sub-initial text contents.
Specifically, in this solution, the basic description information may be divided, for example into paragraphs or into sentences, so as to generate the plurality of sub-initial text contents (for example, a plurality of paragraphs or a plurality of clauses).
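The division in step S141 can be illustrated with a minimal Python sketch. The function name `split_into_sub_contents`, the blank-line paragraph rule, and the punctuation set are assumptions made for illustration; the patent does not specify how paragraphs or sentences are detected.

```python
import re
from typing import List


def split_into_sub_contents(text: str) -> List[str]:
    """Split text into paragraphs, then split each paragraph into sentences.

    Assumption: paragraphs are separated by blank lines and sentences end
    with ASCII or full-width terminal punctuation. Decimal numbers such as
    "3.5" would be split as well, so real data would need a stricter rule.
    """
    sub_contents: List[str] = []
    for paragraph in re.split(r"\n\s*\n", text):
        # Split right after sentence-ending punctuation, dropping any
        # whitespace that follows it.
        for sentence in re.split(r"(?<=[.!?。！？])\s*", paragraph):
            sentence = sentence.strip()
            if sentence:
                sub_contents.append(sentence)
    return sub_contents


description = "Day 1: fly to Sanya. Stay at the beach hotel.\n\nDay 2: free shopping!"
for sub_content in split_into_sub_contents(description):
    print(sub_content)
```

Each returned string plays the role of one sub-initial text content in the later steps.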
Step S142, using a plurality of tourism product vocabularies to sequentially perform word segmentation processing and feature extraction processing on each sub-initial text content to generate a plurality of information points, wherein each information point at least comprises: the segmentation and the feature value of the segmentation.
Specifically, in the scheme, word segmentation processing and feature extraction can be performed on the text data of the basic information of the product through a plurality of tourism product vocabularies in the word segmentation dictionary, so that a plurality of information points are obtained.
It should be noted that the present solution may perform word segmentation on each of the sub-initial text contents through the KMP algorithm to obtain all the mentioned information in the product and the features of the product.
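As a rough illustration of dictionary-driven matching with the KMP algorithm, the sketch below searches one sub-initial text content for every dictionary word. The dictionary layout (word mapped to a category label), the use of the occurrence count as part of the feature value, and all function names are assumptions for illustration only; the patent does not define the feature value concretely.

```python
from typing import Dict, List, Tuple


def build_failure_table(pattern: str) -> List[int]:
    """KMP failure table: length of the longest proper prefix that is also a suffix."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find_all(text: str, pattern: str) -> List[int]:
    """Return the start index of every (possibly overlapping) match of pattern in text."""
    if not pattern:
        return []
    table = build_failure_table(pattern)
    positions: List[int] = []
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            positions.append(i - k + 1)
            k = table[k - 1]
    return positions


def extract_information_points(
    sub_content: str, seg_dict: Dict[str, str]
) -> List[Tuple[str, str, int]]:
    """Return (word, feature label, occurrence count) for each dictionary word found."""
    points = []
    for word, feature in seg_dict.items():
        count = len(kmp_find_all(sub_content, word))
        if count:
            points.append((word, feature, count))
    return points


# Hypothetical dictionary entries: travel product word -> feature label.
seg_dict = {"Sanya": "destination", "hotel": "hotel", "ski": "activity"}
print(extract_information_points("Fly to Sanya, stay in Sanya Bay hotel", seg_dict))
```

Matching each word separately keeps the sketch short; with a large dictionary, a multi-pattern matcher would avoid rescanning the text once per word.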
Optionally, the preset extraction algorithm may be an area algorithm. When the area algorithm is used as the preset extraction algorithm, the step in step S16 of extracting the plurality of information points through a preset extraction algorithm to generate the target information may include the following steps:
Step S1601, respectively counting the occurrence frequency of a first information point of the plurality of information points in each sub-initial text content.
Specifically, the present solution may randomly select one information point, namely the first information point, and then count the frequency with which that information point occurs in each paragraph and sentence.
Step S1602, calculating a decreasing rate of the occurrence frequency of the first information point according to the occurrence frequency of the first information point in each sub-initial text content.
Step S1603, determining the first information point to be the target information when the decreasing rate does not exceed the first threshold.
Specifically, in the present solution, the decreasing rate of the first information point across the paragraphs and sentences may be calculated; when the decreasing rate does not exceed the first threshold, the first information point is determined to be main information of the travel product and is therefore determined as the target information.
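Steps S1601 to S1603 can be sketched in Python as follows. The patent does not say how a single decreasing rate is derived from several per-paragraph frequencies, so this sketch assumes it is the largest relative drop (a - b) / b between consecutive sub-initial text contents; that choice and the function names are illustrative assumptions.

```python
from typing import List


def occurrence_frequencies(point: str, sub_contents: List[str]) -> List[int]:
    """Step S1601: count how often the information point occurs in each sub-content."""
    return [sub.count(point) for sub in sub_contents]


def decreasing_rate(frequencies: List[int]) -> float:
    """Step S1602 (assumed form): largest relative drop between consecutive counts."""
    worst = 0.0
    for a, b in zip(frequencies, frequencies[1:]):
        if a <= b:
            continue  # the frequency did not fall between these two sub-contents
        if b == 0:
            return float("inf")  # the point disappears entirely
        worst = max(worst, (a - b) / b)
    return worst


def is_target(point: str, sub_contents: List[str], first_threshold: float) -> bool:
    """Step S1603: keep the point when its decreasing rate does not exceed the threshold."""
    rate = decreasing_rate(occurrence_frequencies(point, sub_contents))
    return rate <= first_threshold


subs = ["Sanya beach and Sanya bay", "Sanya hotel", "shopping near Sanya"]
print(is_target("Sanya", subs, first_threshold=1.0))
print(is_target("beach", subs, first_threshold=1.0))
```

Under this assumption a point mentioned once and then dropped has an unbounded decreasing rate and is never selected, which matches the intent of discarding information that fades out of the description.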
In a preferred embodiment, the solution may extract the information points related to the travel product through an area algorithm. The sentences and paragraphs in all the description information are treated as areas that measure the space occupied by the described product information. If the frequency with which an information point appears in a first area is a and the frequency with which it appears in a second area is b, the area reduction rate is q = (a - b)/b. The area reduction rate can be used to find the area boundary of the information points. The main area algorithm may sort the areas of all information points in descending order and then search for the boundary from the largest area to the smallest. During the search, the areas are accumulated according to the features of the information and the corresponding area reduction rate is calculated. When the reduction rate and the accumulated area are both larger than their set thresholds, the search stops, and the accumulated area corresponds to the main information of the product, namely the target information.
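The boundary search described above can be sketched as follows. This is one possible reading under stated assumptions: each information point has a single numeric area, the accumulated area is compared as a share of the total area, and the names `select_main_points`, `rate_threshold`, and `area_threshold` are hypothetical.

```python
from typing import Dict, List


def select_main_points(
    areas: Dict[str, int], rate_threshold: float, area_threshold: float
) -> List[str]:
    """Walk the points from largest to smallest area and stop at the boundary.

    The boundary is reached when the reduction rate q = (a - b) / b between
    the previous area a and the current area b exceeds rate_threshold while
    the accumulated share of the total area already exceeds area_threshold.
    """
    ranked = sorted(areas.items(), key=lambda item: item[1], reverse=True)
    total = sum(area for _, area in ranked)
    selected: List[str] = []
    accumulated = 0
    previous = None
    for point, area in ranked:
        if area <= 0:
            break  # points that occupy no area cannot be main information
        if previous is not None:
            q = (previous - area) / area
            if q > rate_threshold and accumulated / total > area_threshold:
                break
        selected.append(point)
        accumulated += area
        previous = area
    return selected


areas = {"Sanya": 8, "beach hotel": 6, "shopping": 5, "ski": 1}
print(select_main_points(areas, rate_threshold=1.0, area_threshold=0.6))
```

With these sample areas the reduction rate stays small between the first three points and jumps at the last one, so the search stops before the rarely mentioned point.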
Alternatively, in step S16, the extracting the plurality of information points by the preset extracting algorithm, and the generating the target information may include the following steps:
Step S1604, determining a first information point of the plurality of information points to be the target information in a case where the feature value of the first information point exceeds a second threshold and/or text content associated with the first information point is included in the initial text content.
Specifically, in the present solution, isolated and only occasionally mentioned information points may be filtered out through a plaintext rule algorithm. When the feature value of the first information point does not exceed the second threshold, the first information point has a small feature value and is isolated; if, in addition, the context of the initial text content contains no text describing information related to the first information point, the first information point is considered to have been mentioned only in passing and does not belong to the main information of the travel product, namely the target information.
Alternatively, in step S16, the extracting the plurality of information points by the preset extracting algorithm, and the generating the target information may include the following steps:
Step S1605, filtering the plurality of information points in the initial text content according to a preset standard information point database, and determining the information points contained in the standard information point database as the target information.
Specifically, in the present solution, unreliable information points may be filtered out by means of the preset standard information point database; that is, those information points that are contained in the standard information point database are determined as the target information.
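A minimal sketch of this filter, assuming the standard information point database can be represented as an in-memory collection of strings and that matching ignores letter case (both are illustrative assumptions, as is the function name):

```python
from typing import Iterable, List


def filter_by_standard_points(
    points: Iterable[str], standard_points: Iterable[str]
) -> List[str]:
    """Keep only the information points that also appear in the standard database."""
    standard = {p.casefold() for p in standard_points}
    return [p for p in points if p.casefold() in standard]


extracted = ["Sanya", "beach hotel", "free gift"]
standard_db = ["sanya", "Beach Hotel", "shopping"]
print(filter_by_standard_points(extracted, standard_db))
```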
It should be noted that, in the present solution, information points that merely resemble real ones or are not true may also be filtered out through a semantic annotation algorithm. The semantic annotation algorithm uses human knowledge to label a large number of existing product lines and records the final results, uses these data as training data to train a machine learning model, and then uses the trained model to process the current product data and filter out information points whose labels are similarly unreliable.
Alternatively, in step S16, the extracting the plurality of information points by the preset extracting algorithm, and the generating the target information may include the following steps:
Step S1606, obtaining the distances between a first information point and the other information points of the plurality of information points.
Step S1607, determining the first information point to be the target information when the distance does not exceed the second threshold.
Specifically, in this solution, unreliable information points among the plurality of information points may be filtered out through the distances between information points: if the distances between one information point and all the other information points exceed a certain threshold, namely the second threshold, that information point is considered not to belong to the product information.
It should be noted that, in the present solution, unreliable information points among the plurality of information points may also be filtered out through an area range calculation: if the features of almost all information points lie in the same area and only a few information points lie outside it, those few information points are excluded.
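The distance-based filter can be sketched as below. The patent does not define the distance, so this sketch assumes each information point carries a single numeric feature value and uses the absolute difference as the distance; a point is kept when at least one other point lies within the threshold. The function name and sample values are hypothetical.

```python
from typing import Dict, List


def filter_by_distance(positions: Dict[str, float], threshold: float) -> List[str]:
    """Drop points whose distance to every other point exceeds the threshold."""
    kept: List[str] = []
    for point, value in positions.items():
        distances = [
            abs(value - other_value)
            for other, other_value in positions.items()
            if other != point
        ]
        # A point with no neighbours to compare against is kept as-is.
        if not distances or min(distances) <= threshold:
            kept.append(point)
    return kept


features = {"Sanya": 0.0, "beach hotel": 1.0, "shopping": 1.5, "ski": 9.0}
print(filter_by_distance(features, threshold=2.0))
```

The same loop also approximates the area range idea: points that cluster together survive, and the few that sit far from the cluster are excluded.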
Alternatively, in step S16, the extracting the plurality of information points by the preset extracting algorithm, and the generating the target information may include the following steps:
Step S1608, calculating the probability that a first information point of the plurality of information points and the other information points appear together in preset text content.
Step S1609, determining the first information point to be the target information when the probability exceeds the third threshold.
Specifically, in this solution, information points that have a small probability of co-occurring in the product may be filtered out through a co-occurrence relation algorithm, and information points with a large co-occurrence probability (i.e., exceeding the third threshold) are determined as the target information. The co-occurrence relation algorithm uses existing product information and statistical methods to calculate the probability that different information points occur together in the same product, and uses these probabilities to decide whether the information points in a product are usable. For example, if the probability that information point A co-occurs with information points B and C is relatively high, it is considered reasonable for A, B, and C to occur together in a product. If the probability that A co-occurs with B and C is small and A, B, and C nevertheless occur together in a product, the result is considered unreasonable, and A needs to be filtered out so that the product information becomes reasonable.
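One way to sketch the co-occurrence filter is shown below. It assumes that existing products are available as sets of information points and that the co-occurrence probability of a point is the mean conditional probability of seeing it given each of the other points in the same product. That averaging rule and the function names are illustrative assumptions, not the patent's stated formula.

```python
from typing import List, Set


def co_occurrence_probability(point: str, other: str, history: List[Set[str]]) -> float:
    """Estimate P(point appears | other appears) from existing products."""
    with_other = [product for product in history if other in product]
    if not with_other:
        return 0.0
    return sum(1 for product in with_other if point in product) / len(with_other)


def filter_by_co_occurrence(
    points: List[str], history: List[Set[str]], third_threshold: float
) -> List[str]:
    """Keep points whose mean co-occurrence probability exceeds the threshold."""
    kept: List[str] = []
    for point in points:
        others = [other for other in points if other != point]
        if not others:
            kept.append(point)  # nothing to compare a single point against
            continue
        mean = sum(
            co_occurrence_probability(point, other, history) for other in others
        ) / len(others)
        if mean > third_threshold:
            kept.append(point)
    return kept


history = [
    {"Sanya", "beach", "seafood"},
    {"Sanya", "beach"},
    {"Sanya", "seafood"},
    {"Harbin", "ski"},
]
print(filter_by_co_occurrence(["Sanya", "beach", "ski"], history, third_threshold=0.3))
```

In this sample, "ski" never appears alongside "Sanya" or "beach" in the existing products, so it plays the role of information point A in the example above and is filtered out.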
Optionally, in step S16, after the multiple information points are extracted by the preset extraction algorithm to generate the target information, the method provided in this embodiment may further include:
Step S17, sending the target information to a search engine, wherein the target information at least includes: destination, hotel, shopping, and traffic information.
Specifically, the scheme can provide the extracted information point data (target information) to a search engine to provide a search basis for a user.
Preferably, the information point data of the product can also be directly displayed on the processing terminal to provide reference for the user.
In summary, this embodiment acquires the basic information of a product, performs word segmentation and feature extraction on that information by means of the accumulated information knowledge base to obtain all the information points and feature values of the product, and then analyzes the product and extracts the information points related to it using an extraction algorithm (the information area algorithm, plaintext rule algorithm, semantic annotation algorithm, distance calculation algorithm, area range algorithm, or co-occurrence relation algorithm). This makes the information convenient for users to consult and search, improves the user experience, and reduces the entry cost for suppliers.
Example two
The present application also provides an apparatus for generating target information, which may be configured to execute the method for generating target information, as shown in fig. 2, the apparatus may include: an acquisition unit 20 configured to acquire an initial text content; the processing unit 22 is configured to perform information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points; and the extracting unit 24 is used for extracting the plurality of information points through a preset extracting algorithm to generate target information.
This embodiment acquires the initial text content; performs information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points; and extracts the plurality of information points through a preset algorithm to generate target information. It is easy to see that in this embodiment only the basic description information needs to be obtained: the processing terminal can then automatically extract from it and generate the travel vacation product information, which greatly reduces entry time and avoids the entry errors caused by a heavy workload. This embodiment therefore solves the technical problem that travel product information is generated inefficiently because it must be manually screened from a large amount of text content.
Optionally, the apparatus may further include: a creating unit configured to create a word segmentation dictionary according to the travel vocabulary database, wherein the word segmentation dictionary comprises a plurality of travel product vocabularies.
Optionally, the processing unit may include: a first processing module configured to segment the initial text content to generate a plurality of sub-initial text contents; and a second processing module configured to sequentially perform word segmentation processing and feature extraction processing on each sub-initial text content by using the plurality of travel product vocabularies to generate a plurality of information points, wherein each information point at least comprises: a segmented word and the feature value of the segmented word.
Optionally, the extracting unit may include: a statistical module configured to respectively count the occurrence frequency of a first information point of the plurality of information points in each sub-initial text content; a first calculating module configured to calculate the decreasing rate of the occurrence frequency of the first information point according to the occurrence frequency of the first information point in each sub-initial text content; and a first determining module configured to determine the first information point as the target information when the decreasing rate does not exceed the first threshold.
Optionally, the extracting unit may include: a second determining module configured to determine a first information point of the plurality of information points as the target information when the feature value of the first information point exceeds a second threshold and/or text content associated with the first information point is contained in the initial text content.
Optionally, the extracting unit may include: a filtering module configured to filter the plurality of information points in the initial text content according to a preset standard information point database and to determine the information points contained in the standard information point database as the target information.
Optionally, the extracting unit may further include: an acquisition module configured to acquire the distances between a first information point of the plurality of information points and the other information points; and a third determining module configured to determine the first information point as the target information when the distance does not exceed the second threshold.
Optionally, the extracting unit may further include: a second calculating module configured to calculate the probability that a first information point of the plurality of information points and the other information points appear together in preset text content; and a fourth determining module configured to determine the first information point as the target information when the probability exceeds the third threshold.
Optionally, the apparatus may further include: a sending unit configured to send the target information to a search engine, wherein the target information at least includes: destination, hotel, shopping, and traffic information.
Example three
The present application also provides a server, as shown in fig. 3, the server may include:
a receiving end 30, configured to receive an initial text content; the processor 32 is configured to perform information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points, and extract the plurality of information points through a preset extraction algorithm to generate target information; and a transmitting end 34 for transmitting the target information to the user terminal.
This embodiment acquires the initial text content; performs information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points; and extracts the plurality of information points through a preset algorithm to generate target information. It is easy to see that in this embodiment only the basic description information needs to be obtained: the processing terminal can then automatically extract from it and generate the travel vacation product information, which greatly reduces entry time and avoids the entry errors caused by a heavy workload. This embodiment therefore solves the technical problem that travel product information is generated inefficiently because it must be manually screened from a large amount of text content.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and various other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (13)

1. A method for generating target information, comprising:
acquiring initial text content;
performing information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points;
extracting the plurality of information points through a preset extraction algorithm to generate target information;
wherein, before acquiring the initial text content, the method further comprises: creating the word segmentation dictionary according to a travel vocabulary database, wherein the word segmentation dictionary comprises a plurality of travel product words and features of the travel product words;
the step of performing information point extraction processing on the initial text content according to the preset word segmentation dictionary to generate a plurality of information points comprises: segmenting the initial text content to generate a plurality of sub-initial text contents; and sequentially performing word segmentation processing and feature extraction processing on each sub-initial text content using the plurality of travel product words to generate the plurality of information points, wherein each information point comprises at least: a segmented word and a feature value of the segmented word;
the step of extracting the plurality of information points through the preset extraction algorithm to generate target information comprises:
respectively counting the occurrence frequency of a first information point among the plurality of information points in each sub-initial text content; calculating a rate of decrease of the occurrence frequency of the first information point according to the occurrence frequency of the first information point in each sub-initial text content; and determining the first information point as the target information if the rate of decrease does not exceed a first threshold.
2. The method of claim 1, wherein the step of extracting the plurality of information points by a preset extraction algorithm to generate target information comprises:
determining a first information point among the plurality of information points as the target information in a case where the feature value of the first information point exceeds a second threshold, and/or the initial text content contains text content associated with the first information point.
3. The method of claim 1, wherein the step of extracting the plurality of information points by a preset extraction algorithm to generate target information comprises:
filtering the plurality of information points in the initial text content according to a preset standard information point database, and determining the information points contained in the standard information point database as the target information.
4. The method of claim 1, wherein the step of extracting the plurality of information points by a preset extraction algorithm to generate target information comprises:
obtaining the distance between a first information point among the plurality of information points and the other information points;
determining the first information point as the target information if the distance does not exceed a second threshold.
5. The method of claim 1, wherein the step of extracting the plurality of information points by a preset extraction algorithm to generate target information comprises:
calculating the probability that a first information point among the plurality of information points and the other information points appear together in preset text content;
determining the first information point as the target information if the probability exceeds a third threshold.
6. The method according to any one of claims 1 to 5, wherein after extracting the plurality of information points by a preset extraction algorithm to generate target information, the method further comprises:
sending the target information to a search engine, wherein the target information at least comprises: destination, hotel, shopping, and traffic information.
7. An apparatus for generating target information, comprising:
an acquisition unit configured to acquire an initial text content;
a processing unit configured to perform information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points;
an extraction unit configured to extract the plurality of information points through a preset extraction algorithm to generate target information;
a creating unit configured to create the word segmentation dictionary according to a travel vocabulary database, wherein the word segmentation dictionary comprises a plurality of travel product words and features of the travel product words;
wherein the processing unit comprises: a first processing module configured to segment the initial text content to generate a plurality of sub-initial text contents; and a second processing module configured to sequentially perform word segmentation processing and feature extraction processing on each sub-initial text content using the plurality of travel product words to generate the plurality of information points, wherein each information point comprises at least: a segmented word and a feature value of the segmented word;
the extraction unit includes:
a statistical module configured to respectively count the occurrence frequency of a first information point among the plurality of information points in each sub-initial text content;
a first calculating module configured to calculate a rate of decrease of the occurrence frequency of the first information point according to the occurrence frequency of the first information point in each sub-initial text content;
a first determining module configured to determine the first information point as the target information if the rate of decrease does not exceed a first threshold.
8. The apparatus of claim 7, wherein the extraction unit comprises:
a second determining module, configured to determine that a first information point in the plurality of information points is the target information if a feature value of the first information point exceeds a second threshold and/or a text content associated with the first information point is included in the initial text content.
9. The apparatus of claim 7, wherein the extraction unit comprises:
a filtering module configured to filter the plurality of information points in the initial text content according to a preset standard information point database, and to determine the information points contained in the standard information point database as the target information.
10. The apparatus of claim 7, wherein the extraction unit comprises:
an acquisition module configured to acquire the distance between a first information point among the plurality of information points and the other information points;
a third determining module configured to determine the first information point as the target information if the distance does not exceed a second threshold.
11. The apparatus of claim 7, wherein the extraction unit comprises:
a second calculating module configured to calculate the probability that a first information point among the plurality of information points and the other information points appear together in preset text content;
a fourth determining module configured to determine the first information point as the target information if the probability exceeds a third threshold.
12. The apparatus of any one of claims 7 to 11, further comprising:
a sending unit, configured to send the target information to a search engine, where the target information at least includes: destination, hotel, shopping, and traffic information.
13. A server, comprising:
a receiving end configured to receive initial text content;
a processor configured to perform information point extraction processing on the initial text content according to a preset word segmentation dictionary to generate a plurality of information points, and to extract the plurality of information points through a preset extraction algorithm to generate target information;
a sending end configured to send the target information to a user terminal;
wherein the server is further configured to: before receiving the initial text content, create the word segmentation dictionary according to a travel vocabulary database, wherein the word segmentation dictionary comprises a plurality of travel product words and features of the travel product words;
the processor performs information point extraction processing on the initial text content according to the preset word segmentation dictionary to generate the plurality of information points through the following steps: segmenting the initial text content to generate a plurality of sub-initial text contents; and sequentially performing word segmentation processing and feature extraction processing on each sub-initial text content using the plurality of travel product words to generate the plurality of information points, wherein each information point comprises at least: a segmented word and a feature value of the segmented word;
the processor extracts the plurality of information points through the preset extraction algorithm to generate the target information through the following steps:
respectively counting the occurrence frequency of a first information point among the plurality of information points in each sub-initial text content; calculating a rate of decrease of the occurrence frequency of the first information point according to the occurrence frequency of the first information point in each sub-initial text content; and determining the first information point as the target information if the rate of decrease does not exceed a first threshold.
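The frequency-based selection recited in claims 1, 7 and 13 can be sketched in Python as follows. The claims do not state how the rate of decrease is computed, so the formula used here (the relative drop from the count in the first sub-initial text content to the count in the last) is an assumption, as are the function names, the plain substring counting, and the default threshold of 0.5.

```python
def occurrence_counts(segments, word):
    """Count how often `word` occurs in each sub-initial text content."""
    return [segment.count(word) for segment in segments]


def rate_of_decrease(counts):
    """Assumed definition: relative drop from the first count to the last.

    Returns 0.0 when the count does not fall, or when the word is absent
    from the first segment (so no drop can be measured).
    """
    if not counts or counts[0] == 0:
        return 0.0
    return max(0.0, (counts[0] - counts[-1]) / counts[0])


def select_targets(segments, candidates, first_threshold=0.5):
    """Keep candidates whose rate of decrease does not exceed the threshold."""
    return [
        word
        for word in candidates
        if rate_of_decrease(occurrence_counts(segments, word)) <= first_threshold
    ]
```

Under this assumed definition, an information point that keeps appearing throughout the text is retained, while one that is mentioned early and then disappears is discarded. Other definitions, such as the largest drop between consecutive segments, would fit the claim wording equally well.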
CN201511017033.8A 2015-12-29 2015-12-29 Target information generation method and device Active CN106933797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511017033.8A CN106933797B (en) 2015-12-29 2015-12-29 Target information generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511017033.8A CN106933797B (en) 2015-12-29 2015-12-29 Target information generation method and device

Publications (2)

Publication Number Publication Date
CN106933797A (en) 2017-07-07
CN106933797B (en) 2021-01-26

Family

ID=59441551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511017033.8A Active CN106933797B (en) 2015-12-29 2015-12-29 Target information generation method and device

Country Status (1)

Country Link
CN (1) CN106933797B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846592A (en) * 2018-07-11 2018-11-20 北京神州泰岳软件股份有限公司 A kind of valuation of enterprise report-generating method and device based on big data
CN112446208A (en) * 2020-12-09 2021-03-05 北京有竹居网络技术有限公司 Method, device and equipment for generating advertisement title and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102411563A (en) * 2010-09-26 2012-04-11 阿里巴巴集团控股有限公司 Method, device and system for identifying target words
CN102946797A (en) * 2009-08-14 2013-02-27 D·伯顿 Anaesthesia and consciousness depth monitoring system
CN104239539A (en) * 2013-09-22 2014-12-24 中科嘉速(北京)并行软件有限公司 Microblog information filtering method based on multi-information fusion
CN105045812A (en) * 2015-06-18 2015-11-11 上海高欣计算机系统有限公司 Text topic classification method and system

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JP5245882B2 (en) * 2008-03-17 2013-07-24 アイシン・エィ・ダブリュ株式会社 Database creation system and database creation method
US9536361B2 (en) * 2012-03-14 2017-01-03 Autoconnect Holdings Llc Universal vehicle notification system
CN104965992B (en) * 2015-07-13 2018-01-09 南开大学 A kind of text mining method based on online medical question and answer information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
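Title
"Research on Dictionary for Personalized Chinese Word Segmentation"; Huan JunJiang et al.; Advanced Materials Research; 20141126; p. 3593 *
"Research on Dictionary-Based Chinese Word Segmentation Algorithms" (基于词典的中文分词算法研究); Zhou Chengyuan et al.; Computer & Digital Engineering (《计算机与数字工程》); 20090320; Vol. 37, No. 3; pp. 68-71 *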

Also Published As

Publication number Publication date
CN106933797A (en) 2017-07-07

Similar Documents

Publication Publication Date Title
CN105095211B (en) The acquisition methods and device of multi-medium data
CN107239440B (en) Junk text recognition method and device
US20140172415A1 (en) Apparatus, system, and method of providing sentiment analysis result based on text
CN109918485B (en) Method and device for identifying dishes by voice, storage medium and electronic device
CN110110577B (en) Method and device for identifying dish name, storage medium and electronic device
CN101673266B (en) Method for searching audio and video contents
CN111797210A (en) Information recommendation method, device and equipment based on user portrait and storage medium
US11907659B2 (en) Item recall method and system, electronic device and readable storage medium
CN107544988B (en) Method and device for acquiring public opinion data
CN108305180B (en) Friend recommendation method and device
CN103593371A (en) Method and device for recommending search keywords
EP3232336A1 (en) Method and device for recognizing stop word
CN105550253B (en) Method and device for acquiring type relationship
CN106844482B (en) Search engine-based retrieval information matching method and device
KR101638535B1 (en) Method of detecting issue patten associated with user search word, server performing the same and storage medium storing the same
CN104915359A (en) Theme label recommending method and device
CN105512300B (en) information filtering method and system
CN110413998B (en) Self-adaptive Chinese word segmentation method oriented to power industry, system and medium thereof
WO2015062377A1 (en) Device and method for detecting similar text, and application
CN114861677A (en) Information extraction method, information extraction device, electronic equipment and storage medium
CN111062211A (en) Information extraction method and device, electronic equipment and storage medium
CN106933797B (en) Target information generation method and device
CN101673263B (en) Method for searching video content
CN112464036B (en) Method and device for auditing violation data
CN109670153A (en) A kind of determination method, apparatus, storage medium and the terminal of similar model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230925

Address after: Room 309, 3rd Floor, Weiya Building, Building 18, No. 29 Suzhou Street, Haidian District, Beijing, 100000

Patentee after: Beijing Yunxing Software Technology Co.,Ltd.

Address before: 100080 room 1709, 17 / F, Weiya building, 29 Suzhou street, Haidian District, Beijing

Patentee before: BEIJING QUNAR INFORMATION TECHNOLOGY CO.,LTD.