CN106446087A - Method and device for acquiring thematic information - Google Patents

Publication number
CN106446087A
Authority
CN
China
Prior art keywords
information
target keyword
target
weights
thematic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610815957.0A
Other languages
Chinese (zh)
Inventor
沈文策
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Cnfol Information Technology Co Ltd
Original Assignee
Fujian Cnfol Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Cnfol Information Technology Co Ltd filed Critical Fujian Cnfol Information Technology Co Ltd
Priority to CN201610815957.0A priority Critical patent/CN106446087A/en
Publication of CN106446087A publication Critical patent/CN106446087A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3325Reformulation based on results of preceding query

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a method and a device for acquiring thematic information. The method comprises: acquiring first information; acquiring at least one target keyword from the first information; searching a database for second information that includes all of the target keywords; and determining the thematic information of the first information according to the second information. With the method and the device, at least one target keyword is obtained by processing the first information, second information that includes all of the target keywords is searched for in the database, and thematic information that is highly relevant to the first information can finally be acquired from the second information found.

Description

Thematic information acquisition method and device
Technical field
The present invention relates to the technical field of data searching, and in particular to a thematic information acquisition method and device.
Background
With the development of the Internet and the Internet of Things, networks have become more intelligent, and the volume of data on them is growing explosively.
When searching for thematic information about some original information (thematic information being information that is highly relevant to it), a user usually picks some keywords from the original information manually and searches the network with them to obtain search results. Because these search results are large in volume and highly diverse, the user cannot easily find in them the information that is highly relevant to the original information.
Summary of the invention
An object of the present invention is to provide a thematic information acquisition method and device that make it easy to obtain information highly relevant to original information.
To achieve the above object, an embodiment of the present invention provides a thematic information acquisition method, the method comprising:
Obtaining first information;
Obtaining at least one target keyword from the first information;
Searching a database for second information that contains all of the target keywords, and determining the thematic information of the first information according to the second information.
Preferably, after the obtaining of at least one target keyword from the first information, the method further comprises:
Calculating a first weight of each target keyword in the first information;
The determining of the thematic information of the first information according to the second information comprises:
Calculating a second weight of each target keyword in the second information;
Comparing the second weight of each target keyword with the first weight of that target keyword, and determining the thematic information of the first information from the second information according to the comparison result.
Preferably, the obtaining of at least one target keyword from the first information comprises:
Performing word segmentation and stop-word filtering on the first information to obtain a first quantity of first keywords;
Counting a first count for each first keyword, namely the number of times it occurs in the first information;
Sorting the first keywords in descending order of first count to form a queue;
Determining at least one first keyword at the front of the queue as the target keyword.
Preferably, the calculating of a first weight of each target keyword in the first information comprises:
Calculating the first weight of each target keyword in the first information according to a first total count and the first count of that target keyword, wherein the first total count is the sum of the first counts of all first keywords.
Preferably, the calculating of a second weight of each target keyword in the second information comprises:
Calculating the second weight of each target keyword in each item of second information as follows:
Performing word segmentation and stop-word filtering on target second information to obtain a second quantity of second keywords, wherein the target second information is any one item of the second information;
Counting a second count for each second keyword, namely the number of times it occurs in the target second information;
Calculating the second weight of each target keyword in the target second information according to the second count of that target keyword and a second total count, wherein the second total count is the sum of the second counts of all second keywords in the target second information.
Preferably, the comparing of the second weight of each target keyword with the first weight of that target keyword, and the determining of the thematic information of the first information from the second information according to the comparison result, comprise:
Judging, as follows, whether each item of second information is thematic information of the first information:
Calculating a first gap corresponding to each target keyword, based on the second weight of that target keyword in the target second information and the first weight of that target keyword;
Judging whether the first gap corresponding to each target keyword is less than the preset gap corresponding to that target keyword;
If so, marking the first gap corresponding to that target keyword as a reasonable gap;
If not, marking the first gap corresponding to that target keyword as an unreasonable gap;
If the first gaps corresponding to all of the target keywords are reasonable gaps, taking the target second information as thematic information of the first information; otherwise, the target second information cannot be taken as thematic information of the first information.
Preferably, the method further comprises:
Storing the thematic information;
Establishing links between all of the target keywords and the thematic information;
Sending all of the target keywords and the link corresponding to each target keyword to a client, so that the client displays all of the target keywords together with the first information.
An embodiment of the present invention further provides a thematic information acquisition device, the device comprising:
A first information acquisition module, configured to obtain first information;
A target keyword acquisition module, configured to obtain at least one target keyword from the first information;
A thematic information determining module, configured to search a database for second information containing all of the target keywords and to determine the thematic information of the first information according to the second information.
Preferably, the device further comprises:
A first weight calculation module, configured to calculate a first weight of each target keyword in the first information;
The thematic information determining module comprises:
A second weight calculation unit, configured to calculate a second weight of each target keyword in the second information;
A weight comparison unit, configured to compare the second weight of each target keyword with the first weight of that target keyword and, according to the comparison result, to determine the thematic information of the first information from the second information.
Preferably, the target keyword acquisition module comprises:
A first keyword acquisition unit, configured to perform word segmentation and stop-word filtering on the first information to obtain a first quantity of first keywords;
A first count statistics unit, configured to count a first count for each first keyword, namely the number of times it occurs in the first information;
A descending sorting unit, configured to sort the first keywords in descending order of first count to form a queue;
A target keyword determining unit, configured to determine at least one first keyword at the front of the queue as the target keyword.
The thematic information acquisition method and device provided by the embodiments of the present invention process the first information to obtain at least one target keyword, search a database for second information containing all of the target keywords, and finally obtain, from the second information found, thematic information that is highly relevant to the first information. As can be seen, the solution provided by the embodiments of the present invention combines a database with at least one target keyword to obtain thematic information highly relevant to the first information, without having to search through a large amount of network information; and because many keywords are considered when obtaining information highly relevant to the first information, such information can be obtained conveniently.
Description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the thematic information acquisition method provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the thematic information acquisition device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment one
Fig. 1 shows a flow chart of the thematic information acquisition method provided by an embodiment of the present invention. The method is applied to a server and comprises:
S110: obtain first information.
Specifically, the first information may be a phrase, a sentence, an article, or the like. In this embodiment, the first information can be obtained in either of the following ways: first, a client sends the first information to the server; second, a web crawler retrieves the first information from the network.
S120: obtain at least one target keyword from the first information.
Preferably, step S120 may include the following steps:
A1: perform word segmentation and stop-word filtering on the first information to obtain a first quantity of first keywords.
Specifically, the concrete implementations of word segmentation and stop-word filtering are prior art and are not described here.
A first keyword is defined as a keyword obtained after the first information is processed. For example, if the first information is "Xiao Ming was born in Beijing", the first keywords obtained may be "Xiao Ming", "born" and "Beijing".
The first quantity is defined as the number of all first keywords; for example, for the first keywords "Xiao Ming", "born" and "Beijing", the first quantity is 3.
A2: count a first count for each first keyword, namely the number of times it occurs in the first information.
Specifically, in the first information (for example, a long article), a given first keyword may occur multiple times; the first count is defined as the number of times a first keyword occurs in the first information. The first counts can be obtained with a traversal algorithm, which is prior art.
A3: sort the first keywords in descending order of first count to form a queue.
A4: determine at least one first keyword at the front of the queue as the target keyword.
Specifically, the first keywords are sorted in descending order of their first counts, so that first keywords that occur more often are placed nearer the front of the queue, which indicates that they are more important in the first information. The first several (for example, the first 3) first keywords are taken as target keywords, which means that the first information can be identified by these target keywords.
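Steps A1 to A4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the text is English, a regular-expression tokenizer stands in for word segmentation, the stop-word list is a tiny illustrative one, and ties in the queue are broken alphabetically. The names `STOP_WORDS`, `tokenize`, `first_counts` and `target_keywords` are hypothetical and do not come from the embodiment.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); a real system would load a full list.
STOP_WORDS = {"a", "and", "in", "is", "of", "the", "was"}


def tokenize(text):
    """Stand-in for word segmentation: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def first_counts(text):
    """A1 + A2: segment, filter stop words, and count each remaining keyword."""
    return Counter(tok for tok in tokenize(text) if tok not in STOP_WORDS)


def target_keywords(text, top_n=3):
    """A3 + A4: sort by count in descending order and keep the first top_n keywords."""
    counts = first_counts(text)
    # Ties are broken alphabetically so the result is deterministic (an assumption).
    queue = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in queue[:top_n]]


text = "Xiao Ming was born in Beijing. Beijing is the capital. Xiao Ming likes Beijing."
print(target_keywords(text))
```

Note that this simple tokenizer splits "Xiao Ming" into two tokens; a proper word segmenter for the source language would be expected to keep such a name as a single first keyword.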
S130: search a database for second information containing all of the target keywords, and determine the thematic information of the first information according to the second information.
Specifically, the database may be a local database, a cloud database, a network database, or the like. In this embodiment, second information is defined as information that contains all of the target keywords; that is, after several target keywords are obtained from the first information, second information is required to contain every one of them. The second information may be a phrase, a sentence, an article, or the like. In this embodiment, the target keywords characterize the relevance between the first information and the second information: the more target keywords there are, the higher the relevance between the two.
The second information can be obtained by successive screening. For example, a set of information containing one target keyword is first screened from the database, then the information containing another target keyword is screened from that set, and so on for each remaining target keyword.
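The successive screening described above can be sketched as follows. The in-memory list of records and the `"keywords"` field are assumptions made for illustration; the embodiment does not specify how the database stores or indexes its information.

```python
def screen_successively(records, targets):
    """Narrow the candidate set one target keyword at a time."""
    candidates = list(records)
    for keyword in targets:
        # Each pass keeps only the records that also contain this keyword.
        candidates = [rec for rec in candidates if keyword in rec["keywords"]]
    return candidates


# Hypothetical database contents, each record already reduced to its keyword set.
database = [
    {"id": 1, "keywords": {"beijing", "ming", "xiao"}},
    {"id": 2, "keywords": {"beijing", "capital"}},
    {"id": 3, "keywords": {"beijing", "ming", "xiao", "school"}},
]
second_information = screen_successively(database, ["beijing", "ming", "xiao"])
print([rec["id"] for rec in second_information])
```

Only records 1 and 3 survive every pass, so only they qualify as second information in this example.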
In this embodiment, the thematic information of the first information can be determined from the second information in either of two ways. First, because every item of second information obtained contains all of the target keywords of the first information, the relevance between the two is already strong, so all of the second information can be taken as the thematic information of the first information. Second, second information of higher relevance can be selected from all of the second information as the thematic information of the first information, according to the weights of the target keywords in the first information and in the second information.
The thematic information acquisition method provided by this embodiment of the present invention processes the first information to obtain at least one target keyword, searches a database for second information containing all of the target keywords, and finally obtains, from the second information found, thematic information that is highly relevant to the first information. As can be seen, the solution provided by this embodiment combines a database with at least one target keyword to obtain thematic information highly relevant to the first information, without having to search through a large amount of network information; and because many keywords are considered when obtaining information highly relevant to the first information, such information can be obtained conveniently.
Preferably, after step S120, the method further comprises: calculating a first weight of each target keyword in the first information.
Preferably, the first weight of each target keyword in the first information is calculated according to a first total count and the first count of that target keyword, wherein the first total count is the sum of the first counts of all first keywords.
Specifically, the first weight can be understood as the degree of influence of a first keyword in the first information. After the first count (i.e., the number of occurrences) of each first keyword is obtained as in the above embodiment, the first counts of all first keywords are summed, and the resulting total number of occurrences is the first total count. In this embodiment, the first count of a target keyword divided by the first total count can be taken as the first weight of that target keyword in the first information. For example, suppose a target keyword a has a first count of 10 and the first total count is 100; the first weight of a is then 10/100 = 1/10.
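The weight calculation reduces to one division per target keyword. The sketch below reproduces the worked example (a count of 10 out of a total of 100); the function name `keyword_weights` and the other two keyword counts are hypothetical, chosen only so that the total comes to 100.

```python
def keyword_weights(counts, targets):
    """Weight of each target keyword = its count / the sum of all keyword counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no keywords were counted")
    return {word: counts.get(word, 0) / total for word in targets}


# Keyword "a" occurs 10 times; all keywords together occur 100 times.
counts = {"a": 10, "b": 60, "c": 30}
print(keyword_weights(counts, ["a"]))
```

Because the second weight is defined by the same ratio over the keywords of an item of second information, the same function could be reused with that item's counts.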
Preferably, the determining of the thematic information of the first information according to the second information comprises:
B1: calculate a second weight of each target keyword in the second information.
Preferably, the second weight of each target keyword in each item of second information is calculated as follows:
C1: perform word segmentation and stop-word filtering on target second information to obtain a second quantity of second keywords, wherein the target second information is any one item of the second information.
Specifically, searching with the target keywords yields multiple items of second information, and the target second information is defined as any one of them. To calculate the second weights of all target keywords in the target second information, word segmentation and stop-word filtering are performed on the target second information to obtain multiple keywords (defined as second keywords); the number of second keywords obtained is the second quantity.
C2: count a second count for each second keyword, namely the number of times it occurs in the target second information.
Specifically, the second count is defined as the number of times a second keyword occurs in the target second information. It is worth noting that, because every item of target second information contains all of the target keywords, the second keywords obtained necessarily include all of the target keywords; counting the second count of each second keyword therefore also yields the second count of each target keyword.
C3: calculate the second weight of each target keyword in the target second information according to the second count of that target keyword and a second total count.
Specifically, the second total count is defined as the sum of the second counts of all second keywords in the target second information. For a given item of target second information, the second count of any target keyword divided by the second total count gives the second weight of that target keyword in the target second information.
B2: compare the second weight of each target keyword with the first weight of that target keyword and, according to the comparison result, determine the thematic information of the first information from the second information.
Preferably, whether each item of second information is thematic information of the first information is judged as follows:
D1: calculate a first gap corresponding to each target keyword, based on the second weight of that target keyword in the target second information and the first weight of that target keyword.
Specifically, the absolute value of the difference between the second weight and the first weight of a target keyword can be taken as the first gap corresponding to that target keyword; alternatively, the ratio of the second weight to the first weight of a target keyword can be taken as the first gap corresponding to that target keyword.
D2: judge whether the first gap corresponding to each target keyword is less than the preset gap corresponding to that target keyword.
D3: if so, mark the first gap corresponding to that target keyword as a reasonable gap;
D4: if not, mark the first gap corresponding to that target keyword as an unreasonable gap;
Specifically, the preset gap is the criterion for deciding whether the first gap of a target keyword is reasonable. When the first gap is less than the preset gap, the first gap is small and is marked as a reasonable gap; when the first gap is not less than the preset gap, the first gap is too large and is marked as an unreasonable gap.
Specifically, if the absolute value of the difference between the second weight and the first weight of a target keyword is taken as the first gap corresponding to that target keyword, then the larger the absolute value of the difference, the larger the first gap; if the ratio of the second weight to the first weight of a target keyword is taken as the first gap corresponding to that target keyword, then the closer the ratio is to 1, the smaller the first gap.
In this embodiment, the size of the preset gap can be set freely and may differ according to the size of the first weight of the target keyword in the first information. For example, because the target keyword with the largest first weight has the highest degree of influence in the first information, the preset gap corresponding to that target keyword can be set somewhat smaller.
D5: if the first gaps corresponding to all of the target keywords are reasonable gaps, take the target second information as thematic information of the first information; otherwise, the target second information cannot be taken as thematic information of the first information.
In this embodiment, if the first gaps corresponding to all target keywords are reasonable gaps, the degrees of influence of all target keywords in the first information and in the target second information are very close; the relevance between the first information and the target second information is then very high, and the target second information can be taken as thematic information of the first information. If the first gap corresponding to one or more target keywords is an unreasonable gap, the degree of influence of at least one target keyword differs noticeably between the first information and the target second information; the relevance between the first information and the target second information is then not high, and the target second information cannot be taken as thematic information of the first information.
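Steps D1 to D5 can be sketched as follows, using the absolute-difference form of the first gap only; the ratio form is not shown. The function names `label_gaps` and `is_thematic`, and all weights and preset gaps in the example, are hypothetical values chosen for illustration.

```python
def label_gaps(first_weights, second_weights, preset_gaps):
    """D1-D4: compute each keyword's first gap and label it against its preset gap."""
    labels = {}
    for keyword, first_weight in first_weights.items():
        gap = abs(second_weights[keyword] - first_weight)  # D1: absolute difference
        # D2-D4: strictly less than the preset gap counts as reasonable.
        labels[keyword] = "reasonable" if gap < preset_gaps[keyword] else "unreasonable"
    return labels


def is_thematic(first_weights, second_weights, preset_gaps):
    """D5: accept the target second information only if every gap is reasonable."""
    labels = label_gaps(first_weights, second_weights, preset_gaps)
    return all(label == "reasonable" for label in labels.values())


first_weights = {"a": 0.10, "b": 0.05}
# The keyword with the larger first weight gets its own preset gap (values assumed).
preset_gaps = {"a": 0.03, "b": 0.02}

close_candidate = {"a": 0.12, "b": 0.06}
far_candidate = {"a": 0.30, "b": 0.06}
print(is_thematic(first_weights, close_candidate, preset_gaps))
print(is_thematic(first_weights, far_candidate, preset_gaps))
```

The first candidate is accepted because both gaps stay under their preset gaps; the second is rejected because the gap for keyword "a" alone is too large, which illustrates that a single unreasonable gap is enough to disqualify an item of second information.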
The thematic information acquisition method provided by this embodiment calculates the first weights of all target keywords in the first information and their second weights in the target second information, calculates the first gaps from the first weights and the second weights, and, when the first gaps of all target keywords are reasonable gaps, takes the target second information as thematic information of the first information. Compared with the foregoing embodiment, this method can obtain thematic information of higher relevance.
Preferably, the method further comprises:
E1: store the thematic information;
E2: establish links between all of the target keywords and the thematic information;
E3: send all of the target keywords and the link corresponding to each target keyword to a client, so that the client displays all of the target keywords together with the first information.
Specifically, after obtaining the thematic information of the first information according to the method in the above embodiment, the server stores the thematic information and establishes links between the thematic information and all of the target keywords. It then sends all of the target keywords and the link corresponding to each target keyword to the client, and the client displays the first information together with all of the target keywords so that the user can view them conveniently. If the user is interested in the displayed first information and wants to see the related thematic information, the user can do so through a keyword: the client uses the corresponding link to request that the server retrieve the stored thematic information, the server pushes the thematic information to the client, and the client displays it.
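One possible shape for steps E1 to E3 is sketched below. Everything concrete here is an assumption: the in-memory `TOPIC_STORE`, the URL scheme under `example.invalid`, the JSON payload layout, and the function names are hypothetical, since the embodiment does not specify how the thematic information is stored or how a link is represented.

```python
import json
from urllib.parse import urlencode

# E1: in-memory stand-in for the server's storage (an assumption).
TOPIC_STORE = {}


def store_thematic(topic_id, content):
    """E1: store one item of thematic information under an identifier."""
    TOPIC_STORE[topic_id] = content


def build_client_payload(first_information, targets, topic_ids,
                         base_url="https://example.invalid/topics"):
    """E2 + E3: one link per target keyword, packaged for sending to the client."""
    ids = ",".join(str(topic_id) for topic_id in topic_ids)
    links = {
        keyword: base_url + "?" + urlencode({"keyword": keyword, "ids": ids})
        for keyword in targets
    }
    return json.dumps(
        {"first_information": first_information, "keyword_links": links},
        ensure_ascii=False,
    )


store_thematic(7, "An article about schools in Beijing")
payload = build_client_payload("Xiao Ming was born in Beijing", ["beijing", "ming"], [7])
print(payload)
```

A client receiving such a payload could render the first information with each keyword as a link, and follow the link to ask the server for the stored items.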
Embodiment two
Fig. 2 shows a schematic structural diagram of the thematic information acquisition device provided by an embodiment of the present invention. The device is used to execute the method shown in Fig. 1 and comprises:
A first information acquisition module 210, configured to obtain first information;
A target keyword acquisition module 220, configured to obtain at least one target keyword from the first information;
A thematic information determining module 230, configured to search a database for second information containing all of the target keywords and to determine the thematic information of the first information according to the second information.
The thematic information acquisition device provided by this embodiment of the present invention processes the first information to obtain at least one target keyword, searches a database for second information containing all of the target keywords, and finally obtains, from the second information found, thematic information that is highly relevant to the first information. As can be seen, the solution provided by this embodiment combines a database with at least one target keyword to obtain thematic information highly relevant to the first information, without having to search through a large amount of network information; and because many keywords are considered when obtaining information highly relevant to the first information, such information can be obtained conveniently.
Preferably, the device further comprises:
A first weight calculation module, configured to calculate a first weight of each target keyword in the first information;
The thematic information determining module comprises:
A second weight calculation unit, configured to calculate a second weight of each target keyword in the second information;
A weight comparison unit, configured to compare the second weight of each target keyword with the first weight of that target keyword and, according to the comparison result, to determine the thematic information of the first information from the second information.
Preferably, the target keyword acquisition module comprises:
A first keyword acquisition unit, configured to perform word segmentation and stop-word filtering on the first information to obtain a first quantity of first keywords;
A first count statistics unit, configured to count a first count for each first keyword, namely the number of times it occurs in the first information;
A descending sorting unit, configured to sort the first keywords in descending order of first count to form a queue;
A target keyword determining unit, configured to determine at least one first keyword at the front of the queue as the target keyword.
The first weight calculation module is specifically configured to calculate the first weight of each target keyword in the first information according to a first total count and the first count of that target keyword, wherein the first total count is the sum of the first counts of all first keywords.
Preferably, the second weight calculation unit comprises:
A second keyword acquisition subunit, configured to perform word segmentation and stop-word filtering on target second information to obtain a second quantity of second keywords, wherein the target second information is any one item of the second information;
A second count statistics subunit, configured to count a second count for each second keyword, namely the number of times it occurs in the target second information;
A second weight calculation subunit, configured to calculate the second weight of each target keyword in the target second information according to the second count of that target keyword and a second total count, wherein the second total count is the sum of the second counts of all second keywords in the target second information.
Preferably, the weight comparison unit comprises:
A first gap calculation subunit, configured to calculate a first gap corresponding to each target keyword, based on the second weight of that target keyword in the target second information and the first weight of that target keyword;
A first gap judgment subunit, configured to judge whether the first gap corresponding to each target keyword is less than the preset gap corresponding to that target keyword;
A reasonable gap marking subunit, configured to mark the first gap corresponding to that target keyword as a reasonable gap if so;
An unreasonable gap marking subunit, configured to mark the first gap corresponding to that target keyword as an unreasonable gap if not;
A thematic information obtaining subunit, configured to take the target second information as thematic information of the first information if the first gaps corresponding to all of the target keywords are reasonable gaps; otherwise, the target second information cannot be taken as thematic information of the first information.
Preferably, the device further includes:
A thematic information storage module, configured to store the thematic information;
A link establishment module, configured to establish links between all the target keywords and the thematic information;
A sending module, configured to receive all the target keywords and the link corresponding to each target keyword and send them to a client, so that the client displays all the target keywords and the first information.
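One way to picture the storage, link establishment and sending modules is the sketch below. Everything concrete in it is an assumption made for illustration: the in-memory dictionary `thematic_store`, the `/thematic/<n>` link scheme, the JSON payload layout, and the function names `store_and_link` and `build_client_payload` do not come from the specification.

```python
import json

# Hypothetical in-memory store mapping a link to one piece of thematic information.
thematic_store = {}


def store_and_link(target_keywords, thematic_items):
    """Store each thematic item and link every target keyword to all stored items."""
    links = []
    for item in thematic_items:
        link = f"/thematic/{len(thematic_store)}"  # hypothetical link scheme
        thematic_store[link] = item
        links.append(link)
    # Each thematic item contains all target keywords, so each keyword gets every link.
    return {keyword: list(links) for keyword in target_keywords}


def build_client_payload(first_information, keyword_links):
    """Serialize the data a client needs to display the keywords and the first information."""
    return json.dumps(
        {"first_information": first_information, "keyword_links": keyword_links},
        ensure_ascii=False,
    )


keyword_links = store_and_link(["gold", "market"], ["report A", "report B"])
payload = build_client_payload("Gold market news", keyword_links)
print(payload)
```

A client receiving this payload could render each target keyword inside the first information as a hyperlink to the stored thematic information.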
The thematic information acquisition device provided by this embodiment calculates the first weight of each target keyword in the first information and its second weight in the target second information, and calculates a first gap from the first weight and the second weight. When the first gaps of all the target keywords are reasonable gaps, the target second information is taken as thematic information of the first information. Compared with the foregoing embodiment, the device can obtain thematic information of higher relevance.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
The foregoing describes only the preferred embodiments of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A thematic information acquisition method, characterized in that the method includes:
Obtaining first information;
Obtaining at least one target keyword from the first information;
Searching a database for second information containing all the target keywords, and determining thematic information of the first information according to the second information.
2. The method according to claim 1, characterized in that, after the obtaining at least one target keyword from the first information, the method further includes:
Calculating a first weight of each target keyword in the first information;
The determining thematic information of the first information according to the second information includes:
Calculating a second weight of each target keyword in the second information;
Comparing the second weight of each target keyword with the first weight of that target keyword, and determining the thematic information of the first information from the second information according to the comparison result.
3. The method according to claim 2, characterized in that the obtaining at least one target keyword from the first information includes:
Performing word segmentation and stop-word filtering on the first information to obtain a first quantity of first keywords;
Counting the first number of occurrences of each first keyword in the first information;
Arranging the first keywords in descending order of the first number of occurrences to form a queue;
Determining at least one first keyword at the front of the queue as the target keyword.
4. The method according to claim 3, characterized in that the calculating a first weight of each target keyword in the first information includes:
Calculating the first weight of each target keyword in the first information according to a first total number of occurrences and the first number of occurrences of that target keyword, wherein the first total number of occurrences is the sum of the first numbers of occurrences of all the first keywords.
5. The method according to claim 4, characterized in that the calculating a second weight of each target keyword in the second information includes:
Calculating the second weight of each target keyword in each piece of second information in the following manner:
Performing word segmentation and stop-word filtering on a piece of target second information to obtain a second quantity of second keywords, wherein the target second information is any piece of the second information;
Counting the second number of occurrences of each second keyword in the target second information;
Calculating the second weight of each target keyword in the target second information according to the second number of occurrences of that target keyword and a second total number of occurrences, wherein the second total number of occurrences is the sum of the second numbers of occurrences of all the second keywords in the target second information.
6. The method according to claim 5, characterized in that the comparing the second weight of each target keyword with the first weight of that target keyword, and determining the thematic information of the first information from the second information according to the comparison result, includes:
Judging whether each piece of second information is thematic information of the first information in the following manner:
Calculating, for each target keyword, the first gap corresponding to that target keyword based on its second weight in the target second information and its first weight;
Judging whether the first gap corresponding to each target keyword is smaller than the preset gap corresponding to that target keyword;
If so, marking the first gap corresponding to the target keyword as a reasonable gap;
If not, marking the first gap corresponding to the target keyword as an unreasonable gap;
If the first gaps corresponding to all the target keywords are reasonable gaps, taking the target second information as thematic information of the first information; otherwise, not taking the target second information as thematic information of the first information.
7. The method according to any one of claims 1 to 6, characterized in that the method further includes:
Storing the thematic information;
Establishing links between all the target keywords and the thematic information;
Sending all the target keywords and the link corresponding to each target keyword to a client, so that the client displays all the target keywords and the first information.
8. A thematic information acquisition device, characterized in that the device includes:
A first information acquisition module, configured to obtain first information;
A target keyword acquisition module, configured to obtain at least one target keyword from the first information;
A thematic information determining module, configured to search a database for second information containing all the target keywords, and to determine thematic information of the first information according to the second information.
9. The device according to claim 8, characterized in that the device further includes:
A first weight calculation module, configured to calculate a first weight of each target keyword in the first information;
The thematic information determining module includes:
A second weight calculation unit, configured to calculate a second weight of each target keyword in the second information;
A weight comparison unit, configured to compare the second weight of each target keyword with the first weight of that target keyword, and to determine the thematic information of the first information from the second information according to the comparison result.
10. The device according to claim 9, characterized in that the target keyword acquisition module includes:
A first keyword acquiring unit, configured to perform word segmentation and stop-word filtering on the first information to obtain a first quantity of first keywords;
A first number counting unit, configured to count the first number of occurrences of each first keyword in the first information;
A descending arrangement unit, configured to arrange the first keywords in descending order of the first number of occurrences to form a queue;
A target keyword determining unit, configured to determine at least one first keyword at the front of the queue as the target keyword.
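The four units of the target keyword acquisition module can be illustrated with the short Python sketch below. It is only an approximation under stated assumptions: whitespace splitting replaces real word segmentation, the stop-word list is a toy one, and the number of keywords taken from the front of the queue (`top_n`) is a hypothetical parameter, since the claims only require "at least one". The names `STOP_WORDS` and `select_target_keywords` are likewise hypothetical.

```python
from collections import Counter

# Toy stop-word list; a real system would load a much larger one (assumption).
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}


def select_target_keywords(first_information, top_n=2):
    """Return the top_n most frequent non-stop-word tokens of the first information."""
    # Whitespace splitting is a stand-in for real word segmentation (assumption).
    tokens = [t.strip(".,;:!?").lower() for t in first_information.split()]
    counts = Counter(t for t in tokens if t and t not in STOP_WORDS)
    # most_common() yields keywords in descending order of count, forming the queue.
    queue = counts.most_common()
    return [keyword for keyword, _ in queue[:top_n]]


targets = select_target_keywords("Gold gold gold market market prices in the news")
print(targets)
```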
CN201610815957.0A 2016-09-12 2016-09-12 Method and device for acquiring thematic information Pending CN106446087A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610815957.0A CN106446087A (en) 2016-09-12 2016-09-12 Method and device for acquiring thematic information

Publications (1)

Publication Number Publication Date
CN106446087A true CN106446087A (en) 2017-02-22

Family

ID=58167659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610815957.0A Pending CN106446087A (en) 2016-09-12 2016-09-12 Method and device for acquiring thematic information

Country Status (1)

Country Link
CN (1) CN106446087A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101127046A (en) * 2007-09-25 2008-02-20 腾讯科技(深圳)有限公司 Method and system for sequencing to blog article
CN101146288A (en) * 2007-09-24 2008-03-19 中兴通讯股份有限公司 A SMS mobile search method and system
CN102819555A (en) * 2012-06-27 2012-12-12 北京奇虎科技有限公司 Method and device for loading recommended information in read mode of webpage
CN103617241A (en) * 2013-11-26 2014-03-05 北京奇虎科技有限公司 Search information processing method, browser terminal and server
US20140214786A1 (en) * 2013-01-31 2014-07-31 Alpine Electronics, Inc. Internet search apparatus
CN104050163A (en) * 2013-03-11 2014-09-17 捷达世软件(深圳)有限公司 Content recommendation system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170222