CN106649334A - Conjunction word set processing method and device - Google Patents
- Publication number
- CN106649334A (application number CN201510726038.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- vocabulary
- index data
- related word
- coupling index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
Abstract
The invention discloses a conjunction word set processing method and device, wherein the processing method comprises the steps of crawling a web text from a target data source on the basis of conjunction words in a conjunction word set of an object to be analyzed; performing word segmentation on the web text to obtain a plurality of text vocabularies, and obtaining the vocabulary information of each text vocabulary, wherein the vocabulary information includes conjunction index data of each text vocabulary and/or information of part of speech of each text vocabulary, and the conjunction index data is used for indicating the conjunction degree of each text vocabulary and the conjunction words; screening the conjunction index data of a plurality of text vocabularies and/or information of part of speech of a plurality of text vocabularies, and obtaining the screened conjunction vocabularies; and updating the conjunction word set by using the screened conjunction vocabularies. The method and the device provided by the invention solve the technical problem of small vocabulary quantity of the existing word bag accumulating method.
Description
Technical field
The present application relates to the Internet field, and in particular to a method and a device for processing a related word set.
Background art
When an enterprise releases a product or a service, when a government department promulgates a policy, or when a breaking event draws public attention, related news reported by online media inevitably appears on the Internet, and this news draws attention and discussion from Internet users. When collecting online public-opinion content (that is, network text related to the object) for an analysis object (for example, a current event, product, person, or policy), the information is collected by having a web crawler crawl network text relevant to the analysis object. Because the crawler does not distinguish whether content is relevant to the analysis object while crawling, the crawled network text must afterwards be screened to select the content related to the object to be analyzed.
Typically, during screening, a set of judgment conditions is defined to decide whether a piece of network text is content related to the object to be analyzed. The set of content related to the object to be analyzed is treated as a "word bag", and the content of the "word bag" stands in for the analysis object when screening and filtering network text. This process may also be called word bag accumulation.
The existing basic approach to word bag accumulation relies mainly on manual input and mostly uses the following word combinations: using the name of the object to be analyzed as the word bag; using the combination of the name of the object to be analyzed and its synonyms as the word bag; and using the combination of the name of the object to be analyzed and the names of competing products as the word bag. The shortcomings of the existing word bag accumulation method are therefore: the vocabulary is small; how closely a word relates to the analysis object cannot be quantified; building the vocabulary manually is time-consuming and inefficient; and scalability is poor.
No effective solution has yet been proposed for the problem that the existing word bag accumulation method yields a small vocabulary.
Summary of the invention
The embodiments of the present application provide a method and a device for processing a related word set, so as to at least solve the technical problem that the existing word bag accumulation method yields a small vocabulary.
According to one aspect of the embodiments of the present application, a method for processing a related word set is provided. The method includes: crawling network text from a target data source based on the related words in the related word set of an object to be analyzed; segmenting the network text into multiple text vocabulary items and obtaining the vocabulary information of each text vocabulary item, where the vocabulary information includes the coupling index data and/or the part-of-speech information of each text vocabulary item, and the coupling index data indicates the degree of association between each text vocabulary item and the related words; screening the coupling index data and/or the part-of-speech information of the multiple text vocabulary items according to a preset screening condition to obtain the screened association vocabulary; and updating the related word set with the screened association vocabulary.
Further, segmenting the network text into multiple text vocabulary items and obtaining the vocabulary information of each text vocabulary item includes: after segmenting the network text into multiple text vocabulary items, creating a text dictionary of the multiple text vocabulary items; and determining the coupling index data of each text vocabulary item in the text dictionary according to a preset association condition, and/or extracting the part-of-speech information of each text vocabulary item in the text dictionary.
Further, determining the coupling index data of each text vocabulary item in the text dictionary according to the preset association condition includes: if there is one preset association condition, obtaining the relevance value of each text vocabulary item under that preset association condition as the coupling index data of that text vocabulary item; if there are multiple preset association conditions, obtaining the relevance value of each text vocabulary item under each preset association condition, applying a fusion operation to all relevance values of each text vocabulary item, and using the fused result as the coupling index data of that text vocabulary item, where the fusion operation includes at least one of weighted calculation, summation, and multiplication or division.
Further, determining the coupling index data of each text vocabulary item in the text dictionary according to the preset association condition includes: using the number of times each text vocabulary item satisfies the preset association condition as the coupling index data of that text vocabulary item, where the preset association condition includes: the text vocabulary item and the related word appear together in the same sentence of the network text; and/or the text vocabulary item and the related word appear with the same part of speech at the same position in sentences of the network text.
Further, screening the coupling index data and/or the part-of-speech information of the multiple text vocabulary items according to the preset screening condition to obtain the screened association vocabulary includes: using the text vocabulary items whose coupling index data falls within a preset range as the screened association vocabulary; or using the text vocabulary items ranked in the top N by coupling index data among the multiple text vocabulary items as the screened association vocabulary; or using the text vocabulary items whose vocabulary information indicates a preset part of speech as the screened association vocabulary.
Further, updating the related word set with the screened association vocabulary includes: replacing the related words with the screened association vocabulary to update the related word set; or adding the screened association vocabulary to the related word set to update the related word set.
According to another aspect of the embodiments of the present application, a device for processing a related word set is further provided. The device includes: a crawling unit, configured to crawl network text from a target data source based on the related words in the related word set of an object to be analyzed; a processing unit, configured to segment the network text into multiple text vocabulary items and obtain the vocabulary information of each text vocabulary item, where the vocabulary information includes the coupling index data and/or the part-of-speech information of each text vocabulary item, and the coupling index data indicates the degree of association between each text vocabulary item and the related words; a screening unit, configured to screen the coupling index data and/or the part-of-speech information of the multiple text vocabulary items according to a preset screening condition to obtain the screened association vocabulary; and an updating unit, configured to update the related word set with the screened association vocabulary.
Further, the processing unit includes: a creation module, configured to create a text dictionary of the multiple text vocabulary items after the network text is segmented into multiple text vocabulary items; and a determining module, configured to determine the coupling index data of each text vocabulary item in the text dictionary according to a preset association condition, and/or extract the part-of-speech information of each text vocabulary item in the text dictionary.
Further, the determining module includes: a first calculation submodule, configured to, if there is one preset association condition, obtain the relevance value of each text vocabulary item under that preset association condition as the coupling index data of that text vocabulary item; and a second calculation submodule, configured to, if there are multiple preset association conditions, obtain the relevance value of each text vocabulary item under each preset association condition, apply a fusion operation to all relevance values of each text vocabulary item, and use the fused result as the coupling index data of that text vocabulary item, where the fusion operation includes at least one of weighted calculation, summation, and multiplication or division.
Further, the determining module includes: a determination submodule, configured to use the number of times each text vocabulary item satisfies the preset association condition as the coupling index data of that text vocabulary item, where the preset association condition includes: the text vocabulary item and the related word appear together in the same sentence of the network text; and/or the text vocabulary item and the related word appear with the same part of speech at the same position in sentences of the network text.
In the embodiments of the present application, after the web crawler crawls network text from the target data source based on the related words in the related word set of the object to be analyzed, the network text is segmented into multiple text vocabulary items and the vocabulary information of each text vocabulary item is obtained. The coupling index data or the part-of-speech information of the multiple text vocabulary items is then screened according to the preset screening condition, and the screened association vocabulary is used to update the related word set. With the above embodiments, indiscriminately crawled network text can be segmented and screened to obtain screened association vocabulary for updating the related word set. Repeating the segmentation and screening continually expands and updates the related word set, which solves the problem that the existing word bag accumulation method yields a small vocabulary and improves the related word set of the object to be analyzed.
Brief description of the drawings
The drawings described here are provided for further understanding of the present application and form a part of it. The illustrative embodiments of the present application and their descriptions are used to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of a method for processing a related word set according to an embodiment of the present application;
Fig. 2 is a flowchart of another optional method for processing a related word set according to an embodiment of the present application; and
Fig. 3 is a schematic diagram of a device for processing a related word set according to an embodiment of the present application.
Detailed description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort shall fall within the scope of protection of the application.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects and not to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described here. In addition, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Explanation of terms:
Analysis object: an object whose public-opinion content is to be analyzed on the basis of network text content. It may be a current event, product, person, policy, and so on.
Corpus: the network text crawled by the crawler.
Dictionary vocabulary: the vocabulary store obtained after segmenting the text in the corpus, stored as individual words together with the relationships between words.
Relevance: the degree of closeness between multiple objects (words).
Screening logic: the conditional algorithm used to screen vocabulary.
Word bag: a set of words that stands in for the analysis object and is used to screen the network text in the corpus so as to select the content related to the analysis object.
Embodiment 1
According to an embodiment of the present application, an embodiment of a method for processing a related word set is provided. It should be noted that the steps shown in the flowchart of the drawings may be executed in a computer system, such as one running a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one here.
Fig. 1 is a flowchart of a method for processing a related word set according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step S102: crawl network text from a target data source based on the related words in the related word set of the object to be analyzed.
Step S104: segment the network text into multiple text vocabulary items and obtain the vocabulary information of each text vocabulary item, where the vocabulary information includes the coupling index data and/or the part-of-speech information of each text vocabulary item, and the coupling index data indicates the degree of association between each text vocabulary item and the related words.
Step S106: screen the coupling index data and/or the part-of-speech information of the multiple text vocabulary items according to a preset screening condition to obtain the screened association vocabulary.
Step S108: update the related word set with the screened association vocabulary.
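As a rough illustration of steps S102 to S108, the following Python sketch runs two update rounds on an in-memory list of sentences. The function names, the whitespace tokenizer, the sentence co-occurrence scoring, and the top-N screening are assumptions made for this sketch and are not taken from the application; a real system would use an actual crawler and a word segmenter.

```python
from collections import Counter


def crawl(related_words, source):
    # S102 stand-in: `source` is an in-memory list of sentences, not a real crawler.
    return list(source)


def segment(text):
    # S104 stand-in: whitespace tokenization in place of a real word segmenter.
    return text.lower().split()


def update_related_word_set(related_words, source, top_n=2):
    scores = Counter()
    for sentence in crawl(related_words, source):
        tokens = set(segment(sentence))
        if tokens & related_words:
            # Coupling index here: number of sentences shared with a related word.
            scores.update(tokens - related_words)
    # S106: keep the top-N words; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    screened = {word for word, _ in ranked[:top_n]}
    # S108: add the screened words to the related word set.
    return related_words | screened


corpus = ["cola sprite sugar", "cola sprite", "cola sugar sprite fizz", "tea milk"]
round_one = update_related_word_set({"cola"}, corpus)
round_two = update_related_word_set(round_one, corpus)
```

Feeding the result of one round into the next mirrors the repeated expansion of the related word set described in this embodiment.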
With this embodiment of the present application, after the web crawler crawls network text from the target data source based on the current related words in the related word set of the object to be analyzed, the network text is segmented into multiple text vocabulary items and the vocabulary information of each text vocabulary item is obtained. The coupling index data or the part-of-speech information of the multiple text vocabulary items is then screened according to the preset screening condition, and the screened association vocabulary is used to update the related word set.
With the above embodiment, indiscriminately crawled network text can be segmented and screened to obtain screened association vocabulary for updating the related word set. Repeating the segmentation and screening continually expands and updates the related word set, which solves the problem that the existing word bag accumulation method yields a small vocabulary and improves the related word set of the object to be analyzed.
In the above embodiment, an initial corpus can be built from a large amount of indiscriminately crawled network text. After the network text in the initial corpus is segmented, the relevance between each dictionary word obtained by segmentation (the text vocabulary above) and the name of the analysis object (the related word above) is calculated by some method, and a reasonable vocabulary screening logic is used to pick out the qualifying dictionary words (the text vocabulary that satisfies the preset screening condition, that is, the association vocabulary above) to form the word bag. Repeating these steps continually expands the word bag and improves the word bag content for the analysis object (the related word set above).
Specifically, indiscriminate crawling means that no particular keyword is set and all content updated on the network within a period of time is crawled. For example, with one crawl per day, all the articles, comments, and other content added to a website on the previous day are crawled, and content that has already been crawled is not crawled again.
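The "crawl only what is new" behavior described above can be sketched as a filter over item identifiers. This is a minimal illustration that assumes each item carries a unique `id` field; the function name `crawl_new_items` and the item layout are hypothetical and do not come from the application.

```python
def crawl_new_items(site_items, seen_ids):
    # Keep only items whose identifier has not been crawled before.
    new_items = [item for item in site_items if item["id"] not in seen_ids]
    # Remember the identifiers so the next crawl skips these items.
    seen_ids.update(item["id"] for item in new_items)
    return new_items


seen = set()
day_one = crawl_new_items(
    [{"id": 1, "text": "article A"}, {"id": 2, "text": "comment B"}], seen
)
day_two = crawl_new_items(
    [{"id": 2, "text": "comment B"}, {"id": 3, "text": "article C"}], seen
)
```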
Optionally, before network text is crawled from the target data source based on the related words in the related word set of the object to be analyzed, the name of the analysis object (the related word in the related word set of the object to be analyzed) may first be determined. Specifically, once the object to be analyzed is determined, its name can be used as the initial word bag content.
In an optional embodiment, an initial corpus can be built after the network text is crawled. For the determined object to be analyzed (the related word in the related word set of the object to be analyzed), a certain amount of text content (the network text above) is crawled indiscriminately from its target data source (for example, a website, a forum, and so on) and used as the initial corpus for the analysis object. The more text the initial corpus contains, the more accurate the relevance calculation described below tends to be.
Optionally, segmenting the network text into multiple text vocabulary items and obtaining the vocabulary information of each text vocabulary item includes: after segmenting the network text into multiple text vocabulary items, creating a text dictionary of the multiple text vocabulary items; and determining the coupling index data of each text vocabulary item in the text dictionary according to a preset association condition, and/or extracting the part-of-speech information of each text vocabulary item in the text dictionary.
In the above embodiment, after the network text crawled from the target data source is segmented into multiple text vocabulary items, a text dictionary of those items is created. Then the coupling index data between each text vocabulary item in the text dictionary and the current related word is determined according to the preset association condition, or the part-of-speech information of each text vocabulary item in the text dictionary is extracted, or both are done together. The coupling index data or the part-of-speech information of the multiple text vocabulary items is then screened according to the preset screening condition to obtain the screened association vocabulary, which is used to update the related word set.
With the above embodiment, a text dictionary can be created after segmentation to record the vocabulary information of the text vocabulary items, which makes that information easy to extract and allows information to be obtained quickly and accurately for word bag accumulation.
Specifically, the crawled network text can be used as the initial corpus; the text content (the network text) in the initial corpus is then segmented to build a dictionary (the text dictionary) containing all the words (the text vocabulary) in the text.
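One possible shape for such a text dictionary is a mapping from each word to a small record of its vocabulary information. The sketch below assumes the tokens have already been segmented and part-of-speech tagged by some external tool; the `DictionaryEntry` fields and the tag strings are illustrative assumptions, not a layout given in the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DictionaryEntry:
    part_of_speech: Optional[str] = None
    occurrences: int = 0
    coupling_index: float = 0.0  # filled in later by the relevance calculation


def build_text_dictionary(tagged_tokens):
    dictionary = {}
    for word, pos in tagged_tokens:
        # Create the entry on first sight, then count every occurrence.
        entry = dictionary.setdefault(word, DictionaryEntry(part_of_speech=pos))
        entry.occurrences += 1
    return dictionary


tagged = [("Sprite", "n"), ("is", "v"), ("sweet", "a"), ("Sprite", "n")]
text_dictionary = build_text_dictionary(tagged)
```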
Optionally, determining the coupling index data of each text vocabulary item in the text dictionary according to the preset association condition includes: if there is one preset association condition, obtaining the relevance value of each text vocabulary item under that preset association condition as the coupling index data of that text vocabulary item; if there are multiple preset association conditions, obtaining the relevance value of each text vocabulary item under each preset association condition, applying a fusion operation to all relevance values of each text vocabulary item, and using the fused result as the coupling index data of that text vocabulary item, where the fusion operation includes at least one of weighted calculation, summation, and multiplication or division.
In the above embodiment, after the network text crawled from the target data source is segmented into multiple text vocabulary items, a text dictionary of those items is created, and the coupling index data between each text vocabulary item in the text dictionary and the current related word can be determined according to the preset association condition. If there is one preset association condition, the relevance value of each text vocabulary item is calculated under that condition to obtain its coupling index data with the current related word. If there are multiple preset association conditions, the relevance value of each text vocabulary item under each condition is obtained, a fusion operation is applied to all relevance values of each item, and the fused result is used as its coupling index data. The coupling index data or the part-of-speech information of the multiple text vocabulary items is then screened according to the preset screening condition to obtain the screened association vocabulary, which is used to update the related word set.
With the above embodiment, preset association conditions with different weights can be used to obtain the coupling index data between each text vocabulary item and the current related word, so that the coupling index data can be obtained flexibly.
Specifically, the fusion operation in the above embodiment may include at least one of weighted calculation, summation, and multiplication or division. For example, when the fusion operation includes weighted calculation and there are multiple preset association conditions, the condition weight of each preset association condition can be obtained, the relevance value of each text vocabulary item can be calculated under each preset association condition, and a weighted calculation over the condition weights and the corresponding relevance values yields the coupling index data of each text vocabulary item.
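The fusion step can be illustrated with a small helper that combines the per-condition relevance values of one word. The function name `fuse_relevance`, the `mode` parameter, and the choice of a plain weighted sum are assumptions for this sketch; the application names the kinds of operations but does not fix a formula.

```python
import math


def fuse_relevance(values, weights=None, mode="weighted"):
    # `values` holds one relevance value per preset association condition.
    if mode == "weighted":
        if weights is None or len(weights) != len(values):
            raise ValueError("one weight is needed per relevance value")
        return sum(w * v for w, v in zip(weights, values))
    if mode == "sum":
        return sum(values)
    if mode == "product":
        return math.prod(values)
    raise ValueError(f"unknown fusion mode: {mode}")


# Two conditions: 5 co-occurrences and 2 same-position matches.
weighted = fuse_relevance([5, 2], weights=[0.75, 0.25])
summed = fuse_relevance([5, 2], mode="sum")
multiplied = fuse_relevance([5, 2], mode="product")
```

With weights 0.75 and 0.25, the weighted result is 0.75 × 5 + 0.25 × 2 = 4.25.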
Optionally, determining the coupling index data of each text vocabulary item in the text dictionary according to the preset association condition may include: using the number of times each text vocabulary item satisfies the preset association condition as the coupling index data of that text vocabulary item, where the preset association condition includes: the text vocabulary item and the related word appear together in the same sentence of the network text; and/or the text vocabulary item and the related word appear with the same part of speech at the same position in sentences of the network text.
In the above embodiment, the preset association condition used to determine the coupling index data between each text vocabulary item in the text dictionary and the current related word may be: the number of times the text vocabulary item and the current related word appear together in the same sentence of the network text; or the number of times the text vocabulary item and the current related word appear with the same part of speech at the same position in sentences of the network text; or a combination of the two, that is, both counts together. With the above embodiment, these preset association conditions allow the coupling index data between each text vocabulary item in the text dictionary and the current related word to be determined efficiently and accurately.
The same position in the above embodiment may specifically be a position at the same word distance in each sentence of the network text. For example, if a text vocabulary item (such as "tooth decay") lies within five words of the same current related word (such as "Coca-Cola") in different sentences, the positions of that text vocabulary item in those sentences can be regarded as the same position. Alternatively, the same position may be a position within the same word range in each sentence of the network text. For example, if the same text vocabulary item appears within the first five words of different sentences, it can be regarded as having the same position.
Specifically, when calculating the relevance (the coupling index data above) between a dictionary word (each text vocabulary item above) and the name of the analysis object (the related word above), the relevance between each text vocabulary item contained in the text dictionary and the name of the analysis object can be calculated through preset association conditions, which may include but are not limited to the following:
Preset association condition 1: the dictionary word (each text vocabulary item above) and the name of the analysis object (the related word above) appear together in one sentence (or one paragraph, one article, and so on) of the network text.
For example, if the related word is Coca-Cola and the text vocabulary in the dictionary includes Sprite, the preset association condition is that Sprite and Coca-Cola appear together in one sentence. The number of times Sprite and Coca-Cola appear together in the same sentence is counted and used as the coupling index data. If Sprite and Coca-Cola appear together in the same sentence 5 times in the network text, the coupling index data between Sprite and Coca-Cola is 5.
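The Sprite and Coca-Cola example can be reproduced with a short sentence-level co-occurrence counter. The sentence splitter, the whitespace tokenizer, and the function names are simplifications assumed for this sketch; Chinese text would need a proper word segmenter.

```python
import re
from collections import Counter


def split_sentences(text):
    # Split on common Western and Chinese sentence-ending punctuation.
    return [s.strip() for s in re.split(r"[.!?。！？]+", text) if s.strip()]


def cooccurrence_counts(text, related_word, tokenize=str.split):
    counts = Counter()
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        if related_word in tokens:
            # Count each other word once per sentence shared with the related word.
            counts.update(set(tokens) - {related_word})
    return counts


text = "Coca-Cola and Sprite. " * 5 + "Sprite alone."
counts = cooccurrence_counts(text, "Coca-Cola")
```

Here `counts["Sprite"]` is 5, because the last sentence mentions Sprite without Coca-Cola and is therefore not counted.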
Preset association condition 2: the dictionary word (each text vocabulary item above) and the name of the analysis object (the related word above) appear with the same part of speech at the same sentence position in the network text.
For example, suppose the related word is Coca-Cola and the text vocabulary in the dictionary includes Sprite. If "Coca-Cola is good" appears in the first sentence of the network text and "Sprite is bad" appears in the second sentence, then Sprite and Coca-Cola appear with the same part of speech (here, noun) at the same position (here, the start of the sentence). The number of such occurrences is then counted for every word (such as Sprite) that satisfies this condition.
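A minimal way to count matches for preset association condition 2 is to record the (position, part of speech) slots in which the related word occurs and then count other words found in those slots. The sketch assumes sentences are already tagged as `(word, tag)` pairs and takes "same position" to mean the same token index; the tag names and the function `same_slot_counts` are hypothetical.

```python
from collections import Counter


def same_slot_counts(tagged_sentences, related_word):
    # Slots (token index, part of speech) in which the related word appears.
    slots = {
        (i, pos)
        for sentence in tagged_sentences
        for i, (word, pos) in enumerate(sentence)
        if word == related_word
    }
    counts = Counter()
    for sentence in tagged_sentences:
        for i, (word, pos) in enumerate(sentence):
            if word != related_word and (i, pos) in slots:
                counts[word] += 1
    return counts


tagged_sentences = [
    [("Coca-Cola", "n"), ("is", "v"), ("good", "a")],
    [("Sprite", "n"), ("is", "v"), ("bad", "a")],
    [("Today", "t"), ("Sprite", "n"), ("sold", "v")],
]
slot_counts = same_slot_counts(tagged_sentences, "Coca-Cola")
```

Only the sentence-initial Sprite in the second sentence is counted; the Sprite in the third sentence is a noun but not at the start of the sentence.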
The coupling index data can be calculated with a single one of the above preset association conditions, or with a combination of several preset association conditions, in which case different weights are set to calculate the final relevance value (the coupling index data above). The relevance value relates to the relevance as follows: the higher the relevance value, the more strongly the text vocabulary item is associated with the related word.
Optionally, screening the coupling index data and/or the part-of-speech information of the multiple text vocabulary items according to the preset screening condition to obtain the screened association vocabulary includes: using the text vocabulary items whose coupling index data falls within a preset range as the screened association vocabulary; or using the text vocabulary items ranked in the top N by coupling index data among the multiple text vocabulary items as the screened association vocabulary; or using the text vocabulary items whose vocabulary information indicates a preset part of speech as the screened association vocabulary.
In the above embodiment, after the web crawler crawls the network text from the target data source based on the related word in the related word set of the object to be analyzed, the network text is segmented to obtain multiple text vocabularies, and the lexical information of each text vocabulary is obtained. According to the preset screening condition, the coupling index data of the multiple text vocabularies is screened, or the part-of-speech information of the multiple text vocabularies is screened, or both are screened. The screening may take the text vocabularies whose coupling index data falls within a preset range as the screened association vocabulary, or take the text vocabularies whose coupling index data ranks in the top N among the coupling index data of the multiple text vocabularies, or take the text vocabularies whose lexical information indicates a preset part of speech. The related word set is then updated using the screened association vocabulary. With the above embodiment, different preset screening conditions can be set for screening the association vocabulary, so that screening is flexible and effective and the different screening requirements of customers can be met.
Specifically, the preset screening conditions for determining the word bag vocabulary (i.e., the related word set above) may include, but are not limited to, the following conditions:
An optional first preset screening condition is: all text vocabularies whose relevance value (i.e., the coupling index data above) lies within a certain interval (for example, the value of the coupling index data exceeds a certain threshold, or lies between two preset values).
An optional second preset screening condition is: all text vocabularies whose relevance (i.e., the coupling index data above) ranks in the top N.
An optional third preset screening condition is: the text vocabularies of a specified part of speech.
The coupling index data of the multiple text vocabularies or the part-of-speech information of the multiple text vocabularies is screened according to the above preset screening conditions. The selected preset screening condition may be any one of the above, or several preset screening conditions may be used in combination, in which case the intersection of the association vocabularies they screen out is taken as the related word set.
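The three screening conditions and their combination by intersection can be sketched as below. The helper names, the inclusive bounds, the alphabetical tie-breaking in the top-N ranking, and the sample values are assumptions made for the example.

```python
def by_range(index, low=None, high=None):
    """Condition 1: keep words whose index value lies in [low, high]."""
    return {
        word
        for word, value in index.items()
        if (low is None or value >= low) and (high is None or value <= high)
    }


def top_n(index, n):
    """Condition 2: keep the n highest-ranked words (ties broken alphabetically)."""
    ranked = sorted(index, key=lambda word: (-index[word], word))
    return set(ranked[:n])


def by_pos(pos_tags, wanted):
    """Condition 3: keep words whose part of speech is in the wanted set."""
    return {word for word, pos in pos_tags.items() if pos in wanted}


index = {"Sprite": 5, "Fanta": 3, "drink": 4, "the": 9}
pos_tags = {"Sprite": "n", "Fanta": "n", "drink": "v", "the": "u"}

# Combining conditions: take the intersection of what each one screens out.
selected = by_range(index, low=3) & top_n(index, 3) & by_pos(pos_tags, {"n"})
print(selected)
```

In this sample, "the" has the highest count but is dropped by the part-of-speech condition, and "Fanta" is a noun but falls outside the top three, so only "Sprite" survives all three conditions.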
In an optional embodiment, before the coupling index data of the multiple text vocabularies or the part-of-speech information of the multiple text vocabularies is screened according to the preset screening condition, the dictionary vocabulary (i.e., each text vocabulary above) may be sorted by its calculated relevance to the analysis object name (i.e., the related word above), that is, by the coupling index data above. Specifically, the text vocabularies in the text dictionary are sorted from high to low by the relevance index (i.e., the coupling index data above) obtained under the preset correlation condition, and the sorted result serves as the content for the subsequent screening.
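A short sketch of the high-to-low sorting step follows. The alphabetical tie-break is an added assumption to make the order deterministic; the source only requires descending order.

```python
def rank_by_index(index):
    """Sort (word, value) pairs by descending value, then alphabetically."""
    return sorted(index.items(), key=lambda item: (-item[1], item[0]))


ranked = rank_by_index({"Fanta": 3, "Sprite": 5, "drink": 3})
print(ranked)
```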
Optionally, updating the related word set using the screened association vocabulary includes: replacing the related word with the screened association vocabulary to update the related word set; or adding the screened association vocabulary to the related word set to update the related word set.
Specifically, the screened association vocabulary is taken as the word bag vocabulary to build the word bag (i.e., the related word set above) for the object to be analyzed. In the next cycle of the above process, this word bag (i.e., the related word set above) can also replace the analysis object name (i.e., the related word above) when calculating the relevance of the dictionary vocabulary (i.e., the text vocabulary above), which greatly expands the word bag of the analysis object (i.e., the related word set above) and continuously improves the accuracy of the relevance (coupling index data) calculation.
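The two update modes can be sketched as a single function. The function name and the `mode` parameter are hypothetical; the source only states that the screened vocabulary either replaces the related word or is added to the set.

```python
def update_related_words(related_words, screened, mode="add"):
    """Return the updated related word set under the chosen update mode."""
    if mode == "replace":
        # The screened vocabulary takes the place of the current related words.
        return set(screened)
    if mode == "add":
        # The screened vocabulary is merged into the existing set.
        return set(related_words) | set(screened)
    raise ValueError(f"unknown mode: {mode!r}")


print(update_related_words({"Coca-Cola"}, {"Sprite"}, mode="add"))
print(update_related_words({"Coca-Cola"}, {"Sprite"}, mode="replace"))
```

Either result can then serve as the related words for the next crawl-and-screen cycle.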
In an optional embodiment, as shown in Fig. 2, the processing method of the related word set may specifically include the following steps:
Step S202: determine the related word in the related word set of the object to be analyzed.
Specifically, the object to be analyzed is determined, and the name of the object to be analyzed may be used as the initial word bag content (i.e., the current related word in the related word set).
Step S203: crawl the network text and build an initial corpus.
Specifically, the network text may be crawled from the target data source based on the current related word in the related word set of the object to be analyzed, where the target data source may include websites, forums, post bars (message boards), and the like.
Step S204: segment the network text and build a text dictionary.
Specifically, the network text may be segmented to obtain multiple text vocabularies, and the lexical information of each text vocabulary is obtained, where the lexical information includes the coupling index data between each text vocabulary and the current related word and/or the part-of-speech information of each text vocabulary. A text dictionary containing all text vocabularies in the network text is then built.
Step S205: calculate the coupling index data between each text vocabulary in the text dictionary and the related word.
Specifically, the coupling index data of the multiple text vocabularies or the part-of-speech information of the multiple text vocabularies may be screened according to the preset screening condition to obtain the screened association vocabulary.
Step S206: sort the coupling index data of each text vocabulary in the text dictionary.
Specifically, the calculated values of the coupling index data of the text vocabularies in the text dictionary may be sorted from high to low to facilitate the subsequent screening.
Optionally, when the relevance (i.e., the coupling index data above) between the dictionary vocabulary (i.e., each text vocabulary above) and the analysis object name (i.e., the related word above) is calculated, the relevance between the text vocabularies contained in the text dictionary and the analysis object name may be calculated under a preset correlation condition. The preset correlation conditions may include, but are not limited to:
The number of times the text vocabulary occurs together with the analysis object name (i.e., the related word above) within one sentence (or one paragraph, one article, etc.) of the network text.
The number of times the text vocabulary and the analysis object name (i.e., the related word above) occur with the same part of speech at the same position of a sentence in the network text.
The coupling index data may be calculated using any one of the above preset correlation conditions, or using several preset correlation conditions in combination, with a different weight set for each condition, to obtain the final relevance value (i.e., the coupling index data above). The relationship between the relevance value and relevance is as follows: the higher the relevance value, the stronger the relevance between the text vocabulary and the current related word.
Step S207: set the preset screening condition and screen the text vocabularies in the text dictionary.
Specifically, the preset screening conditions for determining the word bag vocabulary (i.e., the related word set above) may include, but are not limited to, the following conditions:
An optional first preset screening condition is: all text vocabularies whose relevance value (i.e., the coupling index data above) lies within a certain interval (for example, the value of the coupling index data exceeds a certain threshold, or lies between two preset values).
An optional second preset screening condition is: all text vocabularies whose relevance (i.e., the coupling index data above) ranks in the top N.
An optional third preset screening condition is: the text vocabularies of a specified part of speech.
The coupling index data of the multiple text vocabularies or the part-of-speech information of the multiple text vocabularies is screened according to the above preset screening conditions. The selected preset screening condition may be any one of the above, or several preset screening conditions may be used in combination, in which case the intersection of the association vocabularies they screen out is taken as the related word set.
Step S208: build the related word set.
Specifically, the related word set may be updated using the screened association vocabulary.
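Steps S202 to S208, repeated over several cycles, can be sketched end to end as follows. This is an illustrative toy only: an in-memory list of strings stands in for the crawler and the target data source, whitespace splitting stands in for word segmentation, sentence co-occurrence is the only correlation condition, top-N is the only screening condition, and the update adds rather than replaces. All names and sample words are assumptions.

```python
import re
from collections import Counter


def crawl(corpus, related_words):
    """Stand-in for S203: return the documents that mention any related word."""
    return [doc for doc in corpus if any(word in doc for word in related_words)]


def segment(doc):
    """Stand-in for S204: split into sentences, then into whitespace tokens."""
    sentences = [s for s in re.split(r"[.!?]+", doc) if s.strip()]
    return [s.split() for s in sentences]


def expand(corpus, seed, rounds=2, top_n=2):
    related = {seed}                                    # S202: initial word bag
    for _ in range(rounds):
        counts = Counter()
        for doc in crawl(corpus, related):              # S203
            for tokens in segment(doc):                 # S204
                unique = set(tokens)
                if related & unique:                    # S205: co-occurrence count
                    counts.update(t for t in unique if t not in related)
        ranked = sorted(counts, key=lambda w: (-counts[w], w))  # S206
        screened = ranked[:top_n]                       # S207: top-N screening
        related |= set(screened)                        # S208: add to the set
    return related


corpus = [
    "Cola Sprite. Cola Sprite Fanta. Cola Pepsi.",
    "Fanta Mirinda. Fanta Mirinda.",
]
print(expand(corpus, "Cola", rounds=3, top_n=1))
```

In this sample the second document is not reachable from the seed word alone; it is only picked up in the third cycle, after "Fanta" has entered the related word set, which illustrates how iterating the process widens coverage.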
Compared with existing word bag accumulation methods, the word bag accumulation method adopted in the above embodiments of the present application has the following advantages: the vocabulary in the related word set grows quickly, and word bag accumulation efficiency is significantly improved; whether a real association exists between the word bag vocabulary (i.e., the association vocabulary) and the analysis object (i.e., the related word) can be measured quantitatively; the preset correlation conditions used to calculate the relevance between the word bag vocabulary (i.e., the association vocabulary) and the analysis object (i.e., the related word) can be set flexibly and can be applied in combination; screening can be performed after sorting by the value of the coupling index data, so the preset screening conditions can be set flexibly and applied in combination; and the word bag accumulation process can be run cyclically, with the word bag (i.e., the related word set) output by the previous cycle replacing the analysis object name (i.e., the related word) in the current cycle, so that the word bag accumulation flow is iterated repeatedly. This continuously expands the word bag content (i.e., the content of the related word set), improves its accuracy, and widens its coverage.
Embodiment 2
According to an embodiment of the present application, an embodiment of a processing device for a related word set is further provided. As shown in Fig. 3, the processing device includes: a crawling unit 10, a processing unit 30, a screening unit 50, and an updating unit 70.
The crawling unit 10 is configured to crawl network text from a target data source based on the related word in the related word set of the object to be analyzed.
The processing unit 30 is configured to segment the network text to obtain multiple text vocabularies and to obtain the lexical information of each text vocabulary, where the lexical information includes the coupling index data of each text vocabulary and/or the part-of-speech information of each text vocabulary, and the coupling index data indicates the degree of association between each text vocabulary and the related word.
The screening unit 50 is configured to screen the coupling index data of the multiple text vocabularies and/or the part-of-speech information of the multiple text vocabularies according to the preset screening condition to obtain the screened association vocabulary.
The updating unit 70 is configured to update the related word set using the screened association vocabulary.
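One way to picture how the four units cooperate is the sketch below, in which each unit is modeled as a pluggable callable. The class name, field names, and the trivial lambdas are assumptions for illustration; the source describes the units functionally and does not prescribe this structure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class RelatedWordSetDevice:
    crawl: Callable[[Set[str]], List[str]]                      # crawling unit 10
    process: Callable[[List[str], Set[str]], Dict[str, float]]  # processing unit 30
    screen: Callable[[Dict[str, float]], Set[str]]              # screening unit 50
    update: Callable[[Set[str], Set[str]], Set[str]]            # updating unit 70

    def run(self, related: Set[str]) -> Set[str]:
        """Run one cycle: crawl, compute index data, screen, update."""
        texts = self.crawl(related)
        index = self.process(texts, related)
        screened = self.screen(index)
        return self.update(related, screened)


device = RelatedWordSetDevice(
    crawl=lambda related: ["Cola Sprite", "Cola Fanta Sprite"],
    process=lambda texts, related: {"Sprite": 2, "Fanta": 1},
    screen=lambda index: {w for w, v in index.items() if v >= 2},
    update=lambda related, screened: related | screened,
)
print(device.run({"Cola"}))
```

Keeping the units as separate callables mirrors the description's point that different correlation and screening conditions can be swapped in independently.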
Optionally, the processing unit includes: a creation module and a determining module.
The creation module is configured to create a text dictionary of the multiple text vocabularies after the network text is segmented to obtain the multiple text vocabularies. The determining module is configured to determine the coupling index data of each text vocabulary in the text dictionary according to the preset correlation condition, and/or to extract the part-of-speech information of each text vocabulary in the text dictionary.
With the embodiment of the present application, after the web crawler crawls the network text from the target data source based on the current related word in the related word set of the object to be analyzed, the network text is segmented to obtain multiple text vocabularies, and the lexical information of each text vocabulary is obtained. The coupling index data of the multiple text vocabularies or the part-of-speech information of the multiple text vocabularies is screened according to the preset screening condition, and after the screened association vocabulary is obtained, the related word set is updated using it. With the above embodiment, indiscriminately crawled network text can be segmented and screened to obtain the screened association vocabulary for updating the related word set. Repeating the segmentation and screening continuously expands and updates the related word set, which solves the problem that existing word bag accumulation methods yield too small a vocabulary and achieves the effect of improving the related word set of the object to be analyzed.
Optionally, the determining module includes: a first calculating sub-module and a second calculating sub-module.
The first calculating sub-module is configured to, if there is one preset correlation condition, obtain the relevance value of each text vocabulary under that preset correlation condition as the coupling index data of each text vocabulary. The second calculating sub-module is configured to, if there are multiple preset correlation conditions, obtain the relevance value of each text vocabulary under each preset correlation condition, perform a fusion operation on all relevance values of each text vocabulary, and take the fusion result as the coupling index data of each text vocabulary, where the fusion operation includes at least one of weighted calculation, summation, and multiplication/division.
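The split between one condition and several conditions, and the choice of fusion operation, might be sketched as follows. The function name `fuse`, the `mode` strings, and the sample values are hypothetical, and division is omitted for brevity.

```python
import math


def fuse(values, mode="sum", weights=None):
    """Fuse one word's relevance values from several conditions into one number."""
    if len(values) == 1:
        # Single condition: the relevance value is the index data itself.
        return values[0]
    if mode == "sum":
        return sum(values)
    if mode == "weighted":
        return sum(w * v for w, v in zip(weights, values))
    if mode == "product":
        return math.prod(values)
    raise ValueError(f"unknown mode: {mode!r}")


print(fuse([5]))                                  # one condition
print(fuse([5, 1], mode="sum"))                   # summation
print(fuse([5, 1], mode="weighted", weights=[0.5, 2.0]))
print(fuse([5, 2], mode="product"))               # multiplication
```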
In the above embodiment, after the network text crawled from the target data source is segmented to obtain multiple text vocabularies, a text dictionary of the multiple text vocabularies is created, and the coupling index data between each text vocabulary in the text dictionary and the current related word can be determined according to the preset correlation condition. If there is one preset correlation condition, the relevance value of each text vocabulary is calculated under that condition to obtain the coupling index data between each text vocabulary and the current related word. If there are multiple preset correlation conditions, the relevance value of each text vocabulary under each preset correlation condition is obtained, a fusion operation is performed on all relevance values of each text vocabulary, and the fusion result is taken as the coupling index data of each text vocabulary. The coupling index data of the multiple text vocabularies or the part-of-speech information of the multiple text vocabularies is then screened according to the preset screening condition to obtain the screened association vocabulary, which is used to update the related word set. With the above embodiment, the coupling index data between each text vocabulary and the current related word can be obtained using preset correlation conditions with different weights, so that the coupling index data can be obtained flexibly.
Optionally, the determining module may include: a determining sub-module configured to take the number of times each text vocabulary meets the preset correlation condition as the coupling index data of that text vocabulary, where the preset correlation condition includes: the text vocabulary and the related word occur together in the same sentence of the network text; and/or the text vocabulary and the related word occur with the same part of speech at the same position of a sentence in the network text.
In the above embodiment, the preset correlation condition referred to when determining the coupling index data between each text vocabulary in the text dictionary and the current related word may include: the number of times the text vocabulary and the current related word occur together in the same sentence of the network text; or the number of times the text vocabulary and the current related word occur with the same part of speech at the same position of a sentence in the network text; or a combination of these two preset correlation conditions, that is, both the number of times the text vocabulary and the current related word occur together in the same sentence of the network text and the number of times they occur with the same part of speech at the same position of a sentence in the network text. With the above embodiment, the coupling index data between each text vocabulary in the text dictionary and the current related word can be determined efficiently and accurately under these preset correlation conditions.
Optionally, the screening unit may include: a first screening module, a second screening module, and a third screening module.
The first screening module is configured to take the text vocabularies whose coupling index data falls within a preset range as the screened association vocabulary; or the second screening module is configured to take the text vocabularies whose coupling index data ranks in the top N among the coupling index data of the multiple text vocabularies as the screened association vocabulary; or the third screening module is configured to take the text vocabularies whose lexical information indicates a preset part of speech as the screened association vocabulary.
In the above embodiment, after the web crawler crawls the network text from the target data source based on the current related word in the related word set of the object to be analyzed, the network text is segmented to obtain multiple text vocabularies, and the lexical information of each text vocabulary is obtained. According to the preset screening condition, the coupling index data of the multiple text vocabularies is screened, or the part-of-speech information of the multiple text vocabularies is screened, or both are screened. The screening may take the text vocabularies whose coupling index data falls within a preset range as the screened association vocabulary, or take the text vocabularies whose coupling index data ranks in the top N among the coupling index data of the multiple text vocabularies, or take the text vocabularies whose lexical information indicates a preset part of speech. The related word set is then updated using the screened association vocabulary. With the above embodiment, different preset screening conditions can be set for screening the association vocabulary, so that screening is flexible and effective and the different screening requirements of customers can be met.
Optionally, the updating unit includes: a first update module and a second update module.
The first update module is configured to replace the related word with the screened association vocabulary to update the related word set; or the second update module is configured to add the screened association vocabulary to the related word set to update the related word set.
Specifically, the screened association vocabulary is taken as the word bag vocabulary to build the word bag (i.e., the related word set above) for the object to be analyzed. In the next cycle of the above process, this word bag (i.e., the related word set above) can also replace the analysis object name (i.e., the related word above) when calculating the relevance of the dictionary vocabulary (i.e., the text vocabulary above), which greatly expands the word bag of the analysis object (i.e., the related word set above) and continuously improves the accuracy of the relevance (coupling index data) calculation.
The processing device for the related word set includes a processor and a memory. The crawling unit 10, the processing unit 30, the screening unit 50, the updating unit 70, and so on are stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the indiscriminately crawled network text is segmented and screened to obtain the screened association vocabulary for updating the related word set. Repeating the segmentation and screening continuously expands and updates the related word set, which solves the problem that existing word bag accumulation methods yield too small a vocabulary and achieves the effect of improving the related word set of the object to be analyzed.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory, for example read-only memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute program code that initializes the following method steps: crawling network text from a target data source based on the related word in the related word set of the object to be analyzed; segmenting the network text to obtain multiple text vocabularies, and obtaining the lexical information of each text vocabulary, where the lexical information includes the coupling index data between each text vocabulary and the related word and/or the part-of-speech information of each text vocabulary; screening the coupling index data of the multiple text vocabularies or the part-of-speech information of the multiple text vocabularies according to the preset screening condition to obtain the screened association vocabulary; and updating the related word set using the screened association vocabulary.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present application, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units may be a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a portable hard drive, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present application. It should be noted that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.
Claims (10)
1. A processing method for a related word set, characterized by comprising:
crawling network text from a target data source based on a related word in the related word set of an object to be analyzed;
segmenting the network text to obtain multiple text vocabularies, and obtaining lexical information of each text vocabulary, wherein the lexical information includes coupling index data of each text vocabulary and/or part-of-speech information of each text vocabulary, and the coupling index data indicates the degree of association between each text vocabulary and the related word;
screening the coupling index data of the multiple text vocabularies and/or the part-of-speech information of the multiple text vocabularies according to a preset screening condition to obtain a screened association vocabulary;
updating the related word set using the screened association vocabulary.
2. The processing method according to claim 1, characterized in that segmenting the network text to obtain multiple text vocabularies and obtaining the lexical information of each text vocabulary comprises:
after the network text is segmented to obtain the multiple text vocabularies, creating a text dictionary of the multiple text vocabularies;
determining the coupling index data of each text vocabulary in the text dictionary according to a preset correlation condition, and/or extracting the part-of-speech information of each text vocabulary in the text dictionary.
3. The processing method according to claim 2, characterized in that determining the coupling index data of each text vocabulary in the text dictionary according to the preset correlation condition comprises:
if there is one preset correlation condition, obtaining the relevance value of each text vocabulary under the preset correlation condition as the coupling index data of each text vocabulary;
if there are multiple preset correlation conditions, obtaining the relevance value of each text vocabulary under each preset correlation condition, performing a fusion operation on all the relevance values of each text vocabulary, and taking the fusion result as the coupling index data of each text vocabulary, wherein the fusion operation includes at least one of weighted calculation, summation, and multiplication/division.
4. The processing method according to claim 2, characterized in that determining the coupling index data of each text vocabulary in the text dictionary according to the preset correlation condition comprises:
taking the number of times each text vocabulary meets the preset correlation condition as the coupling index data of that text vocabulary,
wherein the preset correlation condition includes: the text vocabulary and the related word occur together in the same sentence of the network text; and/or the text vocabulary and the related word occur with the same part of speech at the same position of a sentence in the network text.
5. The processing method according to claim 1, characterized in that screening the coupling index data of the multiple text vocabularies and/or the part-of-speech information of the multiple text vocabularies according to the preset screening condition to obtain the screened association vocabulary comprises:
taking the text vocabularies whose coupling index data falls within a preset range as the screened association vocabulary; or
taking the text vocabularies whose coupling index data ranks in the top N among the coupling index data of the multiple text vocabularies as the screened association vocabulary; or
taking the text vocabularies whose lexical information indicates a preset part of speech as the screened association vocabulary.
6. The processing method according to any one of claims 1 to 5, characterized in that updating the related word set using the screened association vocabulary comprises:
replacing the related word with the screened association vocabulary to update the related word set; or
adding the screened association vocabulary to the related word set to update the related word set.
7. A processing device for a related word set, characterized by comprising:
a crawling unit configured to crawl network text from a target data source based on a related word in the related word set of an object to be analyzed;
a processing unit configured to segment the network text to obtain multiple text vocabularies and to obtain lexical information of each text vocabulary, wherein the lexical information includes coupling index data of each text vocabulary and/or part-of-speech information of each text vocabulary, and the coupling index data indicates the degree of association between each text vocabulary and the related word;
a screening unit configured to screen the coupling index data of the multiple text vocabularies and/or the part-of-speech information of the multiple text vocabularies according to a preset screening condition to obtain a screened association vocabulary;
an updating unit configured to update the related word set using the screened association vocabulary.
8. The processing device according to claim 7, characterized in that the processing unit includes:
a creation module, configured to create a text dictionary of the plurality of text vocabulary after word segmentation has been performed on the network text and the plurality of text vocabulary has been obtained;
a determining module, configured to determine the coupling index data of each text vocabulary in the text dictionary according to a preset correlation criterion, and/or to extract the part-of-speech information of each text vocabulary in the text dictionary.
9. The processing device according to claim 8, characterized in that the determining module includes:
a first calculating submodule, configured to, if there is one preset correlation criterion, obtain the relevance value of each text vocabulary under that preset correlation criterion as the coupling index data of each text vocabulary;
a second calculating submodule, configured to, if there are multiple preset correlation criteria, obtain the relevance value of each text vocabulary under each preset correlation criterion, perform a fusion operation on all the relevance values of each text vocabulary, and use the fusion result as the coupling index data of each text vocabulary, wherein the fusion operation includes at least one of weighted calculation, summation, and multiplication or division.
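The single-criterion and multi-criterion cases can be sketched in Python as one function. The function name `fuse`, the `mode` strings, and the equal-weight fallback are assumptions of this sketch; the claim names only weighted calculation, summation, and multiplication or division as possible fusion operations (division is not shown here).

```python
import math
from typing import Optional, Sequence


def fuse(values: Sequence[float], mode: str = "weighted",
         weights: Optional[Sequence[float]] = None) -> float:
    """Combine the relevance values of one word into its coupling index."""
    if not values:
        raise ValueError("at least one relevance value is required")
    if len(values) == 1:
        # One criterion: the relevance value is used directly.
        return values[0]
    if mode == "weighted":
        if weights is None:
            # Assumed fallback: equal weights.
            weights = [1.0 / len(values)] * len(values)
        if len(weights) != len(values):
            raise ValueError("weights and values must have the same length")
        return sum(w * v for w, v in zip(weights, values))
    if mode == "sum":
        return sum(values)
    if mode == "product":
        return math.prod(values)
    raise ValueError(f"unknown fusion mode: {mode}")
```

For relevance values `[2.0, 4.0]`, the `"sum"` mode gives 6.0, the `"product"` mode gives 8.0, and the `"weighted"` mode with weights `[0.25, 0.75]` gives 3.5.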
10. The processing device according to claim 8, characterized in that the determining module includes:
a determination submodule, configured to use the number of times each text vocabulary satisfies the preset correlation criterion as the coupling index data of that text vocabulary,
wherein the preset correlation criterion includes: the text vocabulary and the related word appear together in the same sentence of the network text; and/or the text vocabulary appears in the network text with the same part of speech as the related word and at the same position in a sentence of the network text.
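The first criterion (co-occurrence in the same sentence) can be sketched in Python as follows. The sentence splitter, the whitespace tokenization, and the function name `cooccurrence_index` are simplifying assumptions; Chinese text would need a real word segmenter, and the second criterion (same part of speech at the same sentence position) is not covered by this sketch.

```python
import re
from collections import Counter
from typing import Dict


def cooccurrence_index(text: str, related_word: str) -> Dict[str, int]:
    """Count, for each word, the sentences it shares with the related word."""
    related = related_word.lower()
    counts: Counter = Counter()
    # Split on common Western and Chinese sentence-ending punctuation.
    for sentence in re.split(r"[.!?。!?]+", text):
        words = sentence.lower().split()
        if related in words:
            # Each word is counted at most once per sentence.
            counts.update(set(words) - {related})
    return dict(counts)


sample = "The car has a strong engine. The engine of the car is new. The road is long."
index = cooccurrence_index(sample, "car")
```

In this sample, "engine" shares two sentences with "car" and "strong" shares one, while "road" never co-occurs with it and is therefore absent from the result.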
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510726038.1A CN106649334B (en) | 2015-10-29 | 2015-10-29 | Processing method and device of associated word set |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106649334A (en) | 2017-05-10 |
CN106649334B (en) | 2020-09-15 |
Family
ID=58830513
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510726038.1A Active CN106649334B (en) | 2015-10-29 | 2015-10-29 | Processing method and device of associated word set |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649334B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090063959A1 (en) * | 2007-08-20 | 2009-03-05 | Zoran Stejic | Document creation support system |
US8782082B1 (en) * | 2011-11-07 | 2014-07-15 | Trend Micro Incorporated | Methods and apparatus for multiple-keyword matching |
CN103324641A (en) * | 2012-03-23 | 2013-09-25 | 日电(中国)有限公司 | Information record recommendation method and device |
US20150187107A1 (en) * | 2012-07-18 | 2015-07-02 | Google Inc. | Highlighting related points of interest in a geographical region |
CN103092966A (en) * | 2013-01-23 | 2013-05-08 | 盘古文化传播有限公司 | Vocabulary mining method and device |
CN104360993A (en) * | 2014-11-19 | 2015-02-18 | 广州极盛信息科技开发有限公司 | Method for extracting needed content from text |
CN104408191A (en) * | 2014-12-15 | 2015-03-11 | 北京国双科技有限公司 | Method and device for obtaining correlated keywords of keywords |
CN104462439A (en) * | 2014-12-15 | 2015-03-25 | 北京国双科技有限公司 | Event recognizing method and device |
CN104765830A (en) * | 2015-04-13 | 2015-07-08 | 天脉聚源(北京)传媒科技有限公司 | Information searching method and device |
Non-Patent Citations (1)
Title |
---|
何彦青 等: ""基于词与短语的多机器翻译系统融合方法研究"", 《情报学报》 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108984514A (en) * | 2017-06-05 | 2018-12-11 | 中兴通讯股份有限公司 | Acquisition methods and device, storage medium, the processor of word |
CN108984570A (en) * | 2017-06-05 | 2018-12-11 | 北京国双科技有限公司 | There are the merging method and device of intersection set |
CN108984573A (en) * | 2017-06-05 | 2018-12-11 | 北京国双科技有限公司 | There are the merging method and device of intersection set |
CN107908654A (en) * | 2017-10-12 | 2018-04-13 | 广州艾媒数聚信息咨询股份有限公司 | A kind of recommendation method, system and device in knowledge based storehouse |
CN107908654B (en) * | 2017-10-12 | 2021-12-07 | 广州艾媒数聚信息咨询股份有限公司 | Knowledge base-based recommendation method, system and device |
CN109087163A (en) * | 2018-07-06 | 2018-12-25 | 阿里巴巴集团控股有限公司 | The method and device of credit evaluation |
CN109087163B (en) * | 2018-07-06 | 2021-07-09 | 创新先进技术有限公司 | Credit assessment method and device |
CN111324705A (en) * | 2018-12-14 | 2020-06-23 | 财团法人工业技术研究院 | System and method for adaptively adjusting related search terms |
CN111324705B (en) * | 2018-12-14 | 2023-05-02 | 财团法人工业技术研究院 | System and method for adaptively adjusting associated search terms |
CN109885696A (en) * | 2019-02-01 | 2019-06-14 | 杭州晶一智能科技有限公司 | A kind of foreign language word library construction method based on self study |
CN109902295A (en) * | 2019-02-01 | 2019-06-18 | 杭州晶一智能科技有限公司 | A kind of foreign language word library self-training method based on the network information |
Also Published As
Publication number | Publication date |
---|---|
CN106649334B (en) | 2020-09-15 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing. Applicant after: Beijing Guoshuang Technology Co.,Ltd. Address before: 100086 Cuigong Hotel, 76 Zhichun Road, Shuangyushu District, Haidian District, Beijing. Applicant before: Beijing Guoshuang Technology Co.,Ltd. |
GR01 | Patent grant | |