CN106446087A - Method and device for acquiring thematic information - Google Patents
Method and device for acquiring thematic information
- Publication number
- CN106446087A (application CN201610815957.0A)
- Authority
- CN
- China
- Prior art keywords
- information
- target keyword
- target
- weights
- thematic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3325—Reformulation based on results of preceding query
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the invention provides a method and a device for acquiring thematic information. The method comprises: acquiring first information; acquiring at least one target keyword from the first information; searching a database for second information that includes all the target keywords; and determining the thematic information of the first information according to the second information. With the method and device provided by the embodiment, at least one target keyword is obtained by processing the first information, second information that includes all the target keywords is retrieved from the database, and thematic information highly relevant to the first information is finally obtained from the retrieved second information.
Description
Technical field
The present invention relates to the field of data search technology, and in particular to a method and a device for acquiring thematic information.
Background art
With the development of the Internet and the Internet of Things, networks are becoming more intelligent and the volume of data on them is growing explosively.
When a user searches for the thematic information of some original information (thematic information being information highly relevant to it), the user usually picks out some keywords from the original information manually and searches the network with them. The search results are huge in volume and highly diverse, so it is not easy for the user to find among them the information that is highly relevant to the original information.
Summary of the invention
An object of the present invention is to provide a method and a device for acquiring thematic information, so that information highly relevant to original information can be obtained easily.
To achieve the above object, an embodiment of the present invention provides a method for acquiring thematic information, the method comprising:
obtaining first information;
obtaining at least one target keyword from the first information; and
searching a database for second information containing all the target keywords, and determining the thematic information of the first information according to the second information.
Preferably, after the at least one target keyword is obtained from the first information, the method further comprises:
calculating a first weight of each target keyword in the first information;
and determining the thematic information of the first information according to the second information comprises:
calculating a second weight of each target keyword in the second information; and
comparing the second weight of each target keyword with the first weight of that target keyword, and determining, according to the comparison result, the thematic information of the first information from the second information.
Preferably, obtaining at least one target keyword from the first information comprises:
performing word segmentation and stopword filtering on the first information to obtain a first quantity of first keywords;
counting the first count, i.e. the number of times each first keyword occurs in the first information;
sorting the first keywords in descending order of first count to form a queue; and
determining the top one or more first keywords in the queue as the target keywords.
Preferably, calculating the first weight of each target keyword in the first information comprises:
calculating the first weight of the target keyword in the first information according to a first total count and the first count of each target keyword, wherein the first total count is the sum of the first counts of all first keywords.
Preferably, calculating the second weight of each target keyword in the second information comprises:
calculating the second weight of each target keyword in each piece of second information as follows:
performing word segmentation and stopword filtering on target second information to obtain a second quantity of second keywords, wherein the target second information is any one piece of the second information;
counting the second count, i.e. the number of times each second keyword occurs in the target second information; and
calculating the second weight of the target keyword in the target second information according to the second count of each target keyword and a second total count, wherein the second total count is the sum of the second counts of all second keywords in the target second information.
Preferably, comparing the second weight of each target keyword with the first weight of that target keyword and determining, according to the comparison result, the thematic information of the first information from the second information comprises:
determining whether each piece of second information is thematic information of the first information as follows:
calculating a first gap for each target keyword based on its second weight in the target second information and its first weight;
judging whether the first gap of each target keyword is smaller than a preset gap corresponding to that target keyword;
if so, labelling the first gap of the target keyword as a reasonable gap;
if not, labelling the first gap of the target keyword as an unreasonable gap; and
if the first gaps of all target keywords are reasonable gaps, taking the target second information as thematic information of the first information; otherwise, not taking the target second information as thematic information of the first information.
Preferably, the method further comprises:
storing the thematic information;
establishing links between all the target keywords and the thematic information; and
sending all the target keywords and the link corresponding to each target keyword to a client, so that the client displays all the target keywords together with the first information.
An embodiment of the present invention further provides a device for acquiring thematic information, the device comprising:
a first information acquisition module, configured to obtain first information;
a target keyword acquisition module, configured to obtain at least one target keyword from the first information; and
a thematic information determining module, configured to search a database for second information containing all the target keywords and to determine the thematic information of the first information according to the second information.
Preferably, the device further comprises:
a first weight calculation module, configured to calculate a first weight of each target keyword in the first information;
and the thematic information determining module comprises:
a second weight calculation unit, configured to calculate a second weight of each target keyword in the second information; and
a weight comparison unit, configured to compare the second weight of each target keyword with the first weight of that target keyword and to determine, according to the comparison result, the thematic information of the first information from the second information.
Preferably, the target keyword acquisition module comprises:
a first keyword acquisition unit, configured to perform word segmentation and stopword filtering on the first information to obtain a first quantity of first keywords;
a first count statistics unit, configured to count the first count of each first keyword in the first information;
a descending sort unit, configured to sort the first keywords in descending order of first count to form a queue; and
a target keyword determining unit, configured to determine the top one or more first keywords in the queue as the target keywords.
With the method and device for acquiring thematic information provided by the embodiments of the present invention, at least one target keyword is obtained by processing the first information, second information containing all the target keywords is searched for in a database, and thematic information highly relevant to the first information is finally obtained from the second information found. In this solution, the thematic information is obtained by combining a database with at least one target keyword, so there is no need to search through a large amount of network information, and many keywords are taken into account when retrieving information related to the first information. Information highly relevant to the first information can therefore be obtained conveniently.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for acquiring thematic information provided by an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the device for acquiring thematic information provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment one
Fig. 1 shows a flow chart of the method for acquiring thematic information provided by an embodiment of the present invention. The method is applied to a server and comprises:
S110: obtain the first information.
Specifically, the first information may be a phrase, a sentence, an article, or the like. In this embodiment, the first information may be obtained in either of the following ways: (1) a client sends the first information to the server; or (2) the first information is crawled from the network by a web crawler.
S120: obtain at least one target keyword from the first information.
Preferably, step S120 may include the following steps:
A1: perform word segmentation and stopword filtering on the first information to obtain a first quantity of first keywords.
Specifically, the implementation of word segmentation and stopword filtering is prior art and is not described in detail here.
A first keyword is defined as a keyword obtained after the first information is processed. For example, if the first information is "Xiao Ming was born in Beijing", the first keywords obtained may be "Xiao Ming", "born" and "Beijing".
The first quantity is defined as the number of first keywords; for example, the first quantity of the first keywords "Xiao Ming", "born" and "Beijing" is 3.
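As a minimal sketch of step A1, the Python below filters stopwords out of text that has already been segmented. Word segmentation itself is prior art and is treated here as an upstream step; the segment list is a hypothetical English rendering of the "Xiao Ming" example, the stopword set is illustrative, and counting distinct keywords as the first quantity is an assumption.

```python
# Illustrative stopword set (assumption); a real system would use a curated
# list for the language being processed.
STOPWORDS = {"a", "an", "the", "is", "was", "in", "of", "and"}


def first_keywords(segments, stopwords=STOPWORDS):
    """Step A1 (sketch): drop stopwords from already segmented text.

    `segments` is assumed to be the output of some word segmenter; the
    segmenter itself is out of scope here.
    """
    return [s for s in segments if s.lower() not in stopwords]


segments = ["Xiao Ming", "was", "born", "in", "Beijing"]
keywords = first_keywords(segments)
first_quantity = len(set(keywords))  # number of distinct first keywords
print(keywords, first_quantity)  # ['Xiao Ming', 'born', 'Beijing'] 3
```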
A2: count the first count, i.e. the number of times each first keyword occurs in the first information.
Specifically, in longer first information (such as an article), a first keyword may occur several times. The first count is defined as the number of times a first keyword occurs in the first information. It can be obtained by a simple traversal of the keywords, which is prior art.
A3: sort the first keywords in descending order of first count to form a queue.
A4: determine the top one or more first keywords in the queue as the target keywords.
Specifically, the first keywords are sorted in descending order of their first counts, so that first keywords that occur more often are placed at the front of the queue, which indicates that they are more important in the first information. The first few (for example, the first three) first keywords are taken as the target keywords, meaning that the first information can be identified by these target keywords.
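Steps A2 to A4 can be sketched in Python with `collections.Counter`. This is an illustration under assumptions, not the patent's implementation: the keyword list is made up, `k=3` mirrors the "first three" example, and breaking ties by first appearance is a choice the text does not specify.

```python
from collections import Counter


def target_keywords(first_keywords, k=3):
    """Steps A2-A4 (sketch): count, sort in descending order, keep the top k."""
    first_counts = Counter(first_keywords)          # A2: first count per keyword
    # A3: most_common() returns keywords in descending order of count; ties
    # keep the order in which the keywords were first encountered.
    queue = [word for word, _ in first_counts.most_common()]
    return queue[:k], first_counts                  # A4: top k as target keywords


words = ["beijing", "olympics", "beijing", "stadium",
         "olympics", "beijing", "ticket"]
targets, counts = target_keywords(words, k=3)
print(targets)  # ['beijing', 'olympics', 'stadium']
```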
S130: search the database for second information containing all the target keywords, and determine the thematic information of the first information according to the second information.
Specifically, the database may be a local database, a cloud database, a network database, or the like. In this embodiment, second information is defined as information containing all the target keywords: several target keywords are obtained from the first information, and the second information is required to contain every one of them. The second information may be a phrase, a sentence, an article, or the like. In this embodiment, the target keywords characterize the relevance between the first information and the second information: the more target keywords they share, the higher the relevance.
The second information may be obtained by layer-by-layer screening. For example, the set of information containing one target keyword is first filtered from the database, then the information containing another target keyword is filtered from that set, and so on.
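A minimal sketch of layer-by-layer screening follows. It assumes the database is an in-memory list of strings and that "contains a keyword" means a case-insensitive substring match; a real local, cloud, or network database would answer each pass with its own query or index.

```python
def screen_layer_by_layer(database, targets):
    """Sketch of layer-by-layer screening for second information.

    Each pass keeps only the entries that contain the next target keyword,
    so whatever survives all passes contains every target keyword.
    """
    candidates = list(database)
    for keyword in targets:
        candidates = [doc for doc in candidates if keyword in doc.lower()]
        if not candidates:  # nothing left to screen
            break
    return candidates


database = [
    "Beijing hosted the Olympics in 2008",
    "Olympics ticket prices announced",
    "Weather in Beijing today",
]
result = screen_layer_by_layer(database, ["beijing", "olympics"])
print(result)  # ['Beijing hosted the Olympics in 2008']
```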
In this embodiment, the thematic information of the first information can be determined from the second information in one of two ways. (1) Since each piece of second information obtained contains all the target keywords of the first information, the two are strongly related, and all the second information can be taken as the thematic information of the first information. (2) Alternatively, based on the weights of the target keywords in the first information and in the second information, the more relevant second information can be selected from all the second information as the thematic information of the first information.
With the method for acquiring thematic information provided by this embodiment of the present invention, at least one target keyword is obtained by processing the first information, second information containing all the target keywords is searched for in a database, and thematic information highly relevant to the first information is finally obtained from the second information found. In this solution, the thematic information is obtained by combining a database with at least one target keyword, so there is no need to search through a large amount of network information, and many keywords are taken into account when retrieving information related to the first information. Information highly relevant to the first information can therefore be obtained conveniently.
Preferably, after step S120, the method further comprises: calculating the first weight of each target keyword in the first information.
Preferably, the first weight of a target keyword in the first information is calculated according to the first total count and the first count of that target keyword, where the first total count is the sum of the first counts of all first keywords.
Specifically, the first weight can be understood as the degree of influence of each first keyword in the first information. After the first count (i.e. the number of occurrences) of each first keyword is obtained as in the embodiment above, the first counts of all first keywords are summed; this total number of occurrences is the first total count. In this embodiment, the first count of a target keyword divided by the first total count can be taken as the first weight of that target keyword in the first information. For example, if a target keyword a has a first count of 10 and the first total count is 100, the first weight of a is 10/100 = 1/10.
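The first-weight formula (first count divided by first total count) can be checked with the worked example above. The sketch below uses `fractions.Fraction` so the result prints exactly as 1/10; the counts for the other keywords "b" and "c" are made up only so that the total reaches 100.

```python
from collections import Counter
from fractions import Fraction


def first_weights(first_counts, targets):
    """First weight = first count of a target keyword / first total count."""
    first_total = sum(first_counts.values())  # sum over all first keywords
    return {t: Fraction(first_counts[t], first_total) for t in targets}


# Reproduces the example: keyword "a" occurs 10 times out of 100 in total.
counts = Counter({"a": 10, "b": 40, "c": 50})
weights = first_weights(counts, ["a"])
print(weights["a"])  # 1/10
```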
Preferably, determining the thematic information of the first information according to the second information comprises:
B1: calculate the second weight of each target keyword in the second information.
Preferably, the second weight of each target keyword in each piece of second information is calculated as follows:
C1: perform word segmentation and stopword filtering on target second information to obtain a second quantity of second keywords, where the target second information is any one piece of the second information.
Specifically, searching with the target keywords yields multiple pieces of second information, and the target second information is defined as any one of them. To calculate the second weights of all target keywords in the target second information, word segmentation and stopword filtering are performed on the target second information to obtain multiple keywords (defined as second keywords); the number of second keywords obtained is the second quantity.
C2: count the second count, i.e. the number of times each second keyword occurs in the target second information.
Specifically, the second count is defined as the number of times each second keyword occurs in the target second information. Note that because every piece of target second information contains all the target keywords, the second keywords obtained necessarily include all the target keywords, so the second count of each target keyword is obtained at the same time as the second counts of the second keywords.
C3: calculate the second weight of the target keyword in the target second information according to the second count of each target keyword and the second total count.
Specifically, the second total count is defined as the sum of the second counts of all second keywords in the target second information. In a given piece of target second information, the second count of any target keyword divided by the second total count is the second weight of that target keyword in that target second information.
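Steps C1 to C3 repeat the first-weight calculation once per piece of second information. The sketch below assumes each piece has already been segmented and stopword-filtered (C1) and is stored under a hypothetical document id; the keyword lists are illustrative.

```python
from collections import Counter
from fractions import Fraction


def second_weights(second_infos, targets):
    """Steps C1-C3 (sketch) applied to every piece of second information.

    `second_infos` maps a hypothetical document id to that document's second
    keywords, i.e. its tokens after segmentation and stopword filtering (C1).
    """
    result = {}
    for doc_id, second_keywords in second_infos.items():
        second_counts = Counter(second_keywords)        # C2: second counts
        second_total = sum(second_counts.values())      # second total count
        result[doc_id] = {                              # C3: second weights
            t: Fraction(second_counts[t], second_total) for t in targets
        }
    return result


docs = {
    "doc1": ["beijing", "olympics", "beijing", "venue",
             "beijing", "olympics", "games", "beijing"],
    "doc2": ["beijing", "olympics", "ticket", "price"],
}
w2 = second_weights(docs, ["beijing", "olympics"])
print(w2["doc1"]["beijing"], w2["doc2"]["olympics"])  # 1/2 1/4
```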
B2: compare the second weight of each target keyword with the first weight of that target keyword, and determine, according to the comparison result, the thematic information of the first information from the second information.
Preferably, whether each piece of second information is thematic information of the first information is determined as follows:
D1: calculate the first gap of each target keyword based on its second weight in the target second information and its first weight.
Specifically, the absolute value of the difference between the second weight and the first weight of a target keyword may be taken as its first gap; alternatively, the ratio of the second weight to the first weight may be taken as its first gap.
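The two forms of the first gap can be sketched as below. The `ratio_deviation` helper is an assumption that is not in the text: because a ratio is best when it is close to 1 rather than close to 0, mapping it to |ratio - 1| is one way to make it comparable against a "smaller than the preset gap" test.

```python
from fractions import Fraction


def gap_absolute(first_weight, second_weight):
    """First gap as |second weight - first weight|: larger means further apart."""
    return abs(second_weight - first_weight)


def gap_ratio(first_weight, second_weight):
    """First gap as second weight / first weight: closer to 1 means closer."""
    return second_weight / first_weight


def ratio_deviation(first_weight, second_weight):
    """Assumed helper: turns the ratio into a 'smaller is closer' number
    (|ratio - 1|) so that it can be compared against a preset gap."""
    return abs(gap_ratio(first_weight, second_weight) - 1)


w1, w2 = Fraction(1, 10), Fraction(1, 8)
print(gap_absolute(w1, w2), gap_ratio(w1, w2), ratio_deviation(w1, w2))
# 1/40 5/4 1/4
```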
D2: judge whether the first gap of each target keyword is smaller than the preset gap corresponding to that target keyword.
D3: if so, label the first gap of the target keyword as a reasonable gap;
D4: if not, label the first gap of the target keyword as an unreasonable gap.
Specifically, the preset gap is the criterion for whether the first gap of a target keyword is reasonable. When the first gap is smaller than the preset gap, the first gap is small and is labelled a reasonable gap; when the first gap is not smaller than the preset gap, the first gap is too large and is labelled an unreasonable gap.
Specifically, if the absolute value of the difference between the second weight and the first weight of a target keyword is taken as its first gap, the larger the absolute difference, the larger the first gap. If the ratio of the second weight to the first weight is taken as the first gap, the closer the ratio is to 1, the smaller the first gap.
In this embodiment, the preset gap of each target keyword can be set freely according to its first weight in the first information. For example, because the target keyword with the largest first weight has the greatest influence in the first information, its preset gap can be set somewhat smaller.
D5: if the first gaps of all target keywords are reasonable gaps, take the target second information as thematic information of the first information; otherwise, do not take the target second information as thematic information of the first information.
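Steps D1 to D5 combine into a simple all-keywords-must-pass rule. The sketch below assumes the absolute-difference form of the first gap and hand-picked per-keyword preset gaps, with the tighter threshold on the keyword that has the larger first weight; the weights themselves are illustrative.

```python
def label_gaps(first_w, second_w, preset_gaps):
    """D1-D4 (sketch): label each target keyword's first gap.

    Uses the absolute-difference form of the first gap; `preset_gaps` maps
    each target keyword to its own threshold.
    """
    labels = {}
    for keyword, w1 in first_w.items():
        first_gap = abs(second_w[keyword] - w1)        # D1
        if first_gap < preset_gaps[keyword]:           # D2
            labels[keyword] = "reasonable"             # D3
        else:
            labels[keyword] = "unreasonable"           # D4
    return labels


def is_thematic(first_w, second_w, preset_gaps):
    """D5: accept only if every target keyword's gap is reasonable."""
    labels = label_gaps(first_w, second_w, preset_gaps)
    return all(label == "reasonable" for label in labels.values())


first_w = {"beijing": 0.30, "olympics": 0.20}
# The keyword with the larger first weight gets the tighter preset gap.
preset = {"beijing": 0.05, "olympics": 0.10}
print(is_thematic(first_w, {"beijing": 0.28, "olympics": 0.25}, preset))  # True
print(is_thematic(first_w, {"beijing": 0.10, "olympics": 0.20}, preset))  # False
```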
In this embodiment, if the first gaps of all target keywords are reasonable gaps, the influence of every target keyword in the first information is very close to its influence in the target second information; the relevance between the first information and the target second information is then very high, and the target second information can be taken as thematic information of the first information. If the first gap of one or more target keywords is an unreasonable gap, at least one target keyword has clearly different influence in the first information and in the target second information; the relevance between the two is then not high, and the target second information cannot be taken as thematic information of the first information.
In the method for acquiring thematic information provided by this embodiment, the first weight of every target keyword in the first information and its second weight in the target second information are calculated, and a first gap is computed from the two weights. The target second information is taken as thematic information of the first information only when the first gaps of all target keywords are reasonable gaps. Compared with the embodiment described earlier, this method obtains thematic information of higher relevance.
Preferably, the method further comprises:
E1: store the thematic information;
E2: establish links between all the target keywords and the thematic information;
E3: send all the target keywords and the link corresponding to each target keyword to a client, so that the client displays all the target keywords together with the first information.
Specifically, after the server obtains the thematic information of the first information by the method of the embodiments above, it stores the thematic information, establishes links between the thematic information and all the target keywords, and sends all the target keywords together with the link corresponding to each target keyword to the client. The client displays the first information together with all the target keywords for the user to view. If the user is interested in the displayed first information and wants to view the related thematic information, the user can select a keyword; the client then uses the corresponding link to request the stored thematic information from the server, the server pushes the thematic information to the client, and the client displays it.
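A minimal in-memory sketch of the server side of steps E1 to E3 follows. Everything here is hypothetical: the class name, the `/thematic/<id>?kw=<keyword>` link format, and the payload shape are illustrative choices, not something the patent specifies; a real server would persist the thematic information and serve the links over HTTP.

```python
from urllib.parse import quote


class ThematicStore:
    """In-memory stand-in for the server side of steps E1-E3 (hypothetical)."""

    def __init__(self):
        self._topics = {}  # first-information id -> stored thematic information

    def store(self, first_id, thematic_info):
        """E1: store the thematic information of one piece of first information."""
        self._topics[first_id] = list(thematic_info)

    def build_links(self, first_id, targets):
        """E2: one link per target keyword, all resolving to the stored topics."""
        return {kw: f"/thematic/{quote(first_id)}?kw={quote(kw)}" for kw in targets}

    def client_payload(self, first_id, first_info, targets):
        """E3: what is sent to the client for display."""
        return {"first_information": first_info,
                "keywords": self.build_links(first_id, targets)}

    def fetch(self, first_id):
        """Served when the user follows a keyword link."""
        return self._topics.get(first_id, [])


server = ThematicStore()
server.store("doc-1", ["Beijing hosted the Olympics in 2008"])
payload = server.client_payload("doc-1", "Beijing Olympics news",
                                ["beijing", "olympics"])
print(payload["keywords"]["olympics"])  # /thematic/doc-1?kw=olympics
```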
Embodiment two
As shown in Fig. 2 for the structural representation of thematic information acquisition device provided in an embodiment of the present invention, for executing such as
Method shown in Fig. 1, the device includes:
a first information acquisition module 210, configured to obtain the first information;
a target keyword acquisition module 220, configured to obtain at least one target keyword from the first information; and
a thematic information determining module 230, configured to search the database for second information containing all the target keywords and to determine the thematic information of the first information according to the second information.
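The three modules can be pictured as methods of one object. This is a hypothetical sketch, not the patented device: it assumes whitespace tokenization in place of real word segmentation, an illustrative stopword set, an in-memory list of strings as the database, and the first determination mode (every entry containing all target keywords is kept).

```python
from collections import Counter

# Illustrative stopword set (assumption).
STOPWORDS = {"a", "an", "the", "in", "of", "is", "was", "on"}


class ThematicInformationDevice:
    """Hypothetical sketch of modules 210, 220 and 230 as one class."""

    def __init__(self, database, k=2):
        self.database = database  # in-memory list of strings standing in for the database
        self.k = k                # number of target keywords to keep

    def acquire_first_information(self, text):
        """Module 210: obtain the first information (here: passed in directly)."""
        return text.strip()

    def acquire_target_keywords(self, first_info):
        """Module 220: whitespace split + stopword filter, then top k by count."""
        words = [w for w in first_info.lower().split() if w not in STOPWORDS]
        return [w for w, _ in Counter(words).most_common(self.k)]

    def determine_thematic_information(self, targets):
        """Module 230: keep every entry containing all target keywords (mode 1)."""
        return [doc for doc in self.database
                if all(t in doc.lower().split() for t in targets)]


device = ThematicInformationDevice([
    "Beijing hosted the Olympics",
    "Olympics tickets on sale",
    "Beijing weather today",
])
first = device.acquire_first_information(" Beijing Olympics Beijing Olympics venue ")
targets = device.acquire_target_keywords(first)
print(targets, device.determine_thematic_information(targets))
# ['beijing', 'olympics'] ['Beijing hosted the Olympics']
```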
With the device for acquiring thematic information provided by this embodiment of the present invention, at least one target keyword is obtained by processing the first information, second information containing all the target keywords is searched for in a database, and thematic information highly relevant to the first information is finally obtained from the second information found. In this solution, the thematic information is obtained by combining a database with at least one target keyword, so there is no need to search through a large amount of network information, and many keywords are taken into account when retrieving information related to the first information. Information highly relevant to the first information can therefore be obtained conveniently.
Preferably, the device further comprises:
a first weight calculation module, configured to calculate a first weight of each target keyword in the first information;
and the thematic information determining module comprises:
a second weight calculation unit, configured to calculate a second weight of each target keyword in the second information; and
a weight comparison unit, configured to compare the second weight of each target keyword with the first weight of that target keyword and to determine, according to the comparison result, the thematic information of the first information from the second information.
Preferably, the target keyword acquisition module comprises:
a first keyword acquisition unit, configured to perform word segmentation and stopword filtering on the first information to obtain a first quantity of first keywords;
a first count statistics unit, configured to count the first count of each first keyword in the first information;
a descending sort unit, configured to sort the first keywords in descending order of first count to form a queue; and
a target keyword determining unit, configured to determine the top one or more first keywords in the queue as the target keywords.
The first weight calculation module is specifically configured to calculate the first weight of a target keyword in the first information according to the first total count and the first count of that target keyword, where the first total count is the sum of the first counts of all first keywords.
Preferably, the second weight calculation unit comprises:
a second keyword acquisition subunit, configured to perform word segmentation and stopword filtering on target second information to obtain a second quantity of second keywords, where the target second information is any one piece of the second information;
a second count statistics subunit, configured to count the second count, i.e. the number of times each second keyword occurs in the target second information; and
a second weight calculation subunit, configured to calculate the second weight of the target keyword in the target second information according to the second count of each target keyword and the second total count, where the second total count is the sum of the second counts of all second keywords in the target second information.
Preferably, the weight comparison unit comprises:
a first gap calculation subunit, configured to calculate the first gap of each target keyword based on its second weight in the target second information and its first weight;
a first gap judgment subunit, configured to judge whether the first gap of each target keyword is smaller than the preset gap corresponding to that target keyword;
a reasonable gap labelling subunit, configured to label the first gap of the target keyword as a reasonable gap if so;
an unreasonable gap labelling subunit, configured to label the first gap of the target keyword as an unreasonable gap if not; and
a thematic information obtaining subunit, configured to take the target second information as thematic information of the first information if the first gaps of all target keywords are reasonable gaps, and otherwise not to take the target second information as thematic information of the first information.
Preferably, the device further comprises:
a thematic information storage module, configured to store the thematic information;
a link establishment module, configured to establish links between all the target keywords and the thematic information; and
a sending module, configured to send all the target keywords and the link corresponding to each target keyword to a client, so that the client displays all the target keywords together with the first information.
In the device for acquiring thematic information provided by this embodiment, the first weight of every target keyword in the first information and its second weight in the target second information are calculated, and a first gap is computed from the two weights. The target second information is taken as thematic information of the first information only when the first gaps of all target keywords are reasonable gaps. Compared with the embodiment described earlier, this device obtains thematic information of higher relevance.
It should be noted that, herein, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or further includes elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner; for identical or similar parts among the embodiments, reference may be made from one to another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiment is substantially similar to the method embodiment and is therefore described relatively briefly; for relevant parts, refer to the description of the method embodiment.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit its protection scope. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A thematic information acquisition method, characterized in that the method comprises:
obtaining first information;
obtaining at least one target keyword from the first information;
searching a database for second information containing all the target keywords, and determining thematic information of the first information according to the second information.
2. The method according to claim 1, characterized in that, after the obtaining at least one target keyword from the first information, the method further comprises:
calculating a first weight of each target keyword in the first information;
and the determining thematic information of the first information according to the second information comprises:
calculating a second weight of each target keyword in the second information;
comparing the second weight of each target keyword with the first weight of that target keyword, and determining, according to the comparison result, the thematic information of the first information from the second information.
3. The method according to claim 2, characterized in that the obtaining at least one target keyword from the first information comprises:
performing word segmentation and stop-word filtering on the first information to obtain a first quantity of first keywords;
counting a first count, namely the number of times each first keyword occurs in the first information;
sorting the first keywords in descending order of first count to form a queue;
determining at least one first keyword at the front of the queue as a target keyword.
4. The method according to claim 3, characterized in that the calculating a first weight of each target keyword in the first information comprises:
calculating the first weight of each target keyword in the first information according to a first total count and the first count of that target keyword, wherein the first total count is the sum of the first counts of all the first keywords.
5. The method according to claim 4, characterized in that the calculating a second weight of each target keyword in the second information comprises:
calculating the second weight of each target keyword in each piece of second information as follows:
performing word segmentation and stop-word filtering on target second information to obtain a second quantity of second keywords, wherein the target second information is any one piece of the second information;
counting a second count, namely the number of times each second keyword occurs in the target second information;
calculating the second weight of each target keyword in the target second information according to the second count of that target keyword and a second total count, wherein the second total count is the sum of the second counts of all the second keywords in the target second information.
6. The method according to claim 5, characterized in that the comparing the second weight of each target keyword with the first weight of that target keyword, and determining, according to the comparison result, the thematic information of the first information from the second information comprises:
judging, as follows, whether each piece of second information is thematic information of the first information:
calculating, based on the second weight of each target keyword in the target second information and the first weight of that target keyword, a first gap corresponding to the target keyword;
judging whether the first gap corresponding to each target keyword is less than a preset gap corresponding to that target keyword;
if so, labeling the first gap corresponding to the target keyword as a reasonable gap;
if not, labeling the first gap corresponding to the target keyword as an unreasonable gap;
if the first gaps corresponding to all the target keywords are reasonable gaps, taking the target second information as thematic information of the first information; otherwise, not taking the target second information as thematic information of the first information.
7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
storing the thematic information;
establishing links between all the target keywords and the thematic information;
sending all the target keywords and the link corresponding to each target keyword to a client, so that the client displays all the target keywords and the first information.
8. A thematic information acquisition device, characterized in that the device comprises:
a first information acquisition module, configured to obtain first information;
a target keyword acquisition module, configured to obtain at least one target keyword from the first information;
a thematic information determining module, configured to search a database for second information containing all the target keywords, and to determine thematic information of the first information according to the second information.
9. The device according to claim 8, characterized in that the device further comprises:
a first weight calculation module, configured to calculate a first weight of each target keyword in the first information;
and the thematic information determining module comprises:
a second weight calculation unit, configured to calculate a second weight of each target keyword in the second information;
a weight comparison unit, configured to compare the second weight of each target keyword with the first weight of that target keyword, and to determine, according to the comparison result, the thematic information of the first information from the second information.
10. The device according to claim 9, characterized in that the target keyword acquisition module comprises:
a first keyword acquisition unit, configured to perform word segmentation and stop-word filtering on the first information to obtain a first quantity of first keywords;
a first count statistics unit, configured to count a first count, namely the number of times each first keyword occurs in the first information;
a descending sort unit, configured to sort the first keywords in descending order of first count to form a queue;
a target keyword determining unit, configured to determine at least one first keyword at the front of the queue as a target keyword.
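The keyword extraction, weighting and search steps of claims 1 and 3 to 5 can be sketched in Python. The claims do not fix the weight formula, so the sketch assumes weight = count / total count. A lowercase regex split and a tiny hypothetical stop-word list stand in for real word segmentation (Chinese text would need a dedicated segmenter), and an in-memory list of strings stands in for the database; all function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a full list.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on"}


def keyword_counts(text: str) -> Counter:
    """Segment text and drop stop words (claims 3 and 5).

    A lowercase regex split stands in for a real word segmenter.
    """
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def target_keywords(counts: Counter, top_n: int) -> list[str]:
    """Sort keywords by count, descending, and keep the first top_n (claim 3)."""
    return [word for word, _ in counts.most_common(top_n)]


def weights(counts: Counter, keywords: list[str]) -> dict[str, float]:
    """Assumed weight: keyword count divided by the total count of all keywords (claims 4 and 5)."""
    total = sum(counts.values())
    return {k: counts[k] / total if total else 0.0 for k in keywords}


def search_second_information(database: list[str], keywords: list[str]) -> list[str]:
    """Return every document that contains all target keywords (claim 1)."""
    keyword_set = set(keywords)
    return [doc for doc in database if keyword_set <= set(keyword_counts(doc))]


if __name__ == "__main__":
    first = "Solar power and solar panels: solar energy storage for homes and power grids."
    counts = keyword_counts(first)
    keywords = target_keywords(counts, top_n=2)
    print(keywords)                   # ['solar', 'power']
    print(weights(counts, keywords))  # {'solar': 0.3, 'power': 0.2}
    database = [
        "Solar power plants convert solar radiation into power.",
        "Wind power is growing.",
        "Solar cooking ovens.",
    ]
    for doc in search_second_information(database, keywords):
        # Second weights of the target keywords in each candidate.
        print(doc, weights(keyword_counts(doc), keywords))
```

Note that the total count in the denominator covers all keywords left after stop-word filtering, not only the target keywords, matching the definition of the first and second total counts in claims 4 and 5.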
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610815957.0A (CN106446087A) | 2016-09-12 | 2016-09-12 | Method and device for acquiring thematic information |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610815957.0A (CN106446087A) | 2016-09-12 | 2016-09-12 | Method and device for acquiring thematic information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106446087A | 2017-02-22 |
Family ID: 58167659
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610815957.0A (CN106446087A, pending) | 2016-09-12 | 2016-09-12 | Method and device for acquiring thematic information |
Country Status (1)
Country | Link |
---|---|
CN | CN106446087A |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101127046A * | 2007-09-25 | 2008-02-20 | 腾讯科技(深圳)有限公司 | Method and system for sequencing to blog article |
CN101146288A * | 2007-09-24 | 2008-03-19 | 中兴通讯股份有限公司 | A SMS mobile search method and system |
CN102819555A * | 2012-06-27 | 2012-12-12 | 北京奇虎科技有限公司 | Method and device for loading recommended information in read mode of webpage |
CN103617241A * | 2013-11-26 | 2014-03-05 | 北京奇虎科技有限公司 | Search information processing method, browser terminal and server |
US20140214786A1 * | 2013-01-31 | 2014-07-31 | Alpine Electronics, Inc. | Internet search apparatus |
CN104050163A * | 2013-03-11 | 2014-09-17 | 捷达世软件(深圳)有限公司 | Content recommendation system and method |
Timeline: 2016-09-12, application CN201610815957.0A filed (CN106446087A); status: pending
Similar Documents
Publication | Title |
---|---|
WO2020140373A1 | Intention recognition method, recognition device and computer-readable storage medium |
CN106570144A | Method and apparatus for recommending information |
CA2578157C | Duplicate document detection and presentation functions |
US6708165B2 | Wide-spectrum information search engine |
KR20060048780A | Phrase-based indexing in an information retrieval system |
US20080077569A1 | Integrated Search Service System and Method |
KR20060048779A | Phrase identification in an information retrieval system |
KR20060048778A | Phrase-based searching in an information retrieval system |
CN105488023B | A kind of text similarity appraisal procedure and device |
CN104077407B | A kind of intelligent data search system and method |
CN109635084B | Real-time rapid duplicate removal method and system for multi-source data document |
WO2021068563A1 | Sample date processing method, device and computer equipment, and storage medium |
CN106033445A | Method and device for obtaining article association degree data |
CN106354871A | Similarity search method of enterprise names |
CN110209942B | Scientific and technological information intelligence push system based on big data |
CN107832444A | Event based on search daily record finds method and device |
CN109492118A | A kind of data detection method and detection device |
CN110765760A | Legal case distribution method and device, storage medium and server |
CN104346411B | The method and apparatus that multiple contributions are clustered |
CN115618014A | Standard document analysis management system and method applying big data technology |
CN103257961B | Bibliography disappear weight method, Apparatus and system |
US7519619B2 | Facilitating document classification using branch associations |
CN110738048B | Keyword extraction method and device and terminal equipment |
CN106446087A | Method and device for acquiring thematic information |
RU105758U1 | ANALYSIS AND FILTRATION SYSTEM FOR INTERNET TRAFFIC BASED ON THE CLASSIFICATION METHODS OF MULTI-DIMENSIONAL DOCUMENTS |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170222 |