CN105354265A - Method and apparatus for automatically constructing association structure of delivered keyword - Google Patents

Method and apparatus for automatically constructing association structure of delivered keyword

Publication number
CN105354265A
Authority
CN
China
Prior art keywords
service
type
seed words
industry category
industry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510697764.5A
Other languages
Chinese (zh)
Inventor
李强
廖耀华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201510697764.5A
Publication of CN105354265A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method and apparatus for automatically constructing an association structure of a delivered keyword. The method comprises: dividing industry categories related to seed words and service types under the industry categories; according to the industry categories and the service types, constructing the seed words under all the service types to form a seed word library; matching a delivered keyword with the seed word in the seed word library, and determining the industry category and the service type that the delivered keyword belongs to according to a matching result, wherein the industry category and the service type that the delivered keyword belongs to are the industry category and the service type that are stored in association with the seed word matched with the delivered keyword; and according to the industry category and the service type that the delivered keyword belongs to and that is obtained by matching, generating an association structure of the delivered keyword, wherein the association structure comprises an association plan and an association unit, the association plan is the industry category that the delivered keyword belongs to, and the association unit is the service type that the delivered keyword belongs to.

Description

Method and apparatus for automatically constructing an association structure of a delivered keyword
Technical field
The present invention belongs to the technical field of computer network search, and in particular relates to a method and apparatus for automatically constructing an association structure of a keyword.
Background
As networks become increasingly widespread, search engines exploit users' reliance on and habitual use of search to deliver relevant information to target groups while users retrieve information. A common approach is for a website to deliver certain specific keywords to the search engine system. When a user searches for a related service through the search engine and the user's search term matches a delivered keyword, the pushed information corresponding to that delivered keyword is displayed on the search results page.
In existing search engine systems, before a keyword is delivered, the association structure corresponding to the keyword to be delivered must first be configured. The association structure comprises an association plan and an association unit. The association plan is the highest level of the association structure and corresponds to a push project of the website; the association unit is the level beneath the association plan and corresponds to a specific service within the push project; keywords sit beneath the association unit level. To make the keywords in an association structure easier to manage, association units of related categories are usually placed in the same association plan, and keywords of related categories are placed in the same association unit; the names of the association plan and the association unit are related to the category attributes of the keywords.
The current practice when delivering a batch of keywords is to inspect the meaning of each keyword manually, set up the relevant association plans and association units based on personal experience, and then assign keywords of the same category to the relevant association unit.
With this existing approach of manually dividing the association structure to which keywords belong according to their meaning, the drawbacks are not obvious when few keywords are delivered. As the number of delivered keywords grows, however, the following drawbacks appear:
1. When the association structure is divided manually according to keyword meaning and several people maintain the same association structure, each person's classification criteria and experience differ, so the association structure may become disordered, which complicates subsequent management and monitoring of the association structure data;
2. When the number of keywords is large, continuing to divide the keyword association structure manually inevitably reduces delivery efficiency.
Summary of the invention
In view of this, the present invention provides a method and apparatus for automatically constructing an association structure of a delivered keyword. Seed words of different categories at different service levels are compiled in advance, and the delivered keyword is then matched against the seed words to determine the specific category and service level that the delivered keyword involves. The result is used to generate the association structure of the delivered keyword automatically, which removes manual involvement and improves delivery efficiency.
According to one aspect of the present invention, a method for automatically constructing an association structure of a delivered keyword is provided, comprising:
Dividing the industry categories related to seed words and the service types under the industry categories, wherein the industry categories comprise the industry classifications to which the seed words belong, and the service types comprise the service classifications under each industry category;
According to the divided industry categories and service types, constructing the seed words under all service types to form a seed word library, wherein each seed word in the seed word library is stored in association with its corresponding industry category and service type;
Matching a delivered keyword against the seed words in the seed word library, and determining the industry category and service type to which the delivered keyword belongs according to the matching result, wherein the industry category and service type to which the delivered keyword belongs are those stored in association with the seed word that matches it;
According to the industry category and service type to which the delivered keyword belongs, as obtained by matching, generating the association structure of the delivered keyword, wherein the association structure comprises an association plan and an association unit, the association plan is the industry category to which the delivered keyword belongs, and the association unit is the service type to which the delivered keyword belongs.
Dividing the industry categories related to seed words and the service types under the industry categories specifically comprises:
Determining the entry link URLs of the pages that contain the relevant industry categories and service types;
Using HttpClient technology to download the page source code corresponding to the entry link URLs;
Analyzing the downloaded page source code and parsing the web page elements to obtain the industry categories and the service type data under the relevant industry categories.
Constructing the seed words under all service types according to the divided industry categories and service types specifically comprises:
According to the divided industry categories and service types, determining the seed word link URLs that contain the related services;
Using HttpClient technology to download the page source code corresponding to the seed word link URLs;
Analyzing the downloaded page source code corresponding to the seed word link URLs and parsing the corresponding web page elements to obtain seed words;
Comparing the obtained seed words with the existing seed words by full matching, wherein an obtained seed word that matches an existing one is filtered out and is otherwise retained;
Storing the finally obtained seed words in association under the corresponding service type of the corresponding industry category.
Matching the delivered keyword against the seed words in the seed word library and determining the industry category and service type to which the delivered keyword belongs according to the matching result specifically comprises:
According to the divided industry categories and service types, reading the seed words stored under the service types of each industry category and placing them in a cache;
Taking each seed word out of the cache in turn and matching the delivered keyword against it;
When the delivered keyword matches a seed word, determining that the industry category and service type corresponding to that seed word are the industry category and service type of the delivered keyword.
After the industry categories and service types have been divided, corresponding folders are created, and the folder corresponding to each service type is placed under the folder corresponding to the industry category to which it belongs.
According to another aspect of the present invention, an apparatus for automatically constructing an association structure of a delivered keyword is provided, comprising:
An industry and service division module, configured to divide the industry categories related to seed words and the service types under the industry categories, wherein the industry categories comprise the industry classifications to which the seed words belong, and the service types comprise the service classifications under each industry category;
A seed word library construction module, configured to construct, according to the divided industry categories and service types, the seed words under all service types to form a seed word library, wherein each seed word in the seed word library is stored in association with its corresponding industry category and service type;
A matching module, configured to match a delivered keyword against the seed words in the seed word library and to determine the industry category and service type to which the delivered keyword belongs according to the matching result, wherein the industry category and service type to which the delivered keyword belongs are those stored in association with the seed word that matches it;
A delivered keyword association structure establishment module, configured to generate the association structure of the delivered keyword according to the industry category and service type obtained by matching, wherein the association structure comprises an association plan and an association unit, the association plan is the industry category to which the delivered keyword belongs, and the association unit is the service type to which the delivered keyword belongs.
The industry and service division module comprises:
An industry and service web page determination module, configured to determine the entry link URLs of the pages that contain the relevant industry categories and service types;
A first download module, configured to use HttpClient technology to download the page source code corresponding to the entry link URLs;
An industry and service analysis module, configured to analyze the downloaded page source code and parse the web page elements to obtain the industry categories and the service type data under the relevant industry categories.
The seed word library construction module comprises:
A seed word web page determination module, configured to determine, according to the divided industry categories and service types, the seed word link URLs that contain the related services;
A second download module, configured to use HttpClient technology to download the page source code corresponding to the seed word link URLs;
A seed word analysis module, configured to analyze the downloaded page source code corresponding to the seed word link URLs and parse the corresponding web page elements to obtain seed words;
A seed word screening module, configured to compare the obtained seed words with the existing seed words by full matching, filtering out an obtained seed word that matches an existing one and otherwise retaining it;
A seed word storage module, configured to store the finally obtained seed words in association under the corresponding service type of the corresponding industry category.
The matching module comprises:
A seed word storage module, configured to read, according to the divided industry categories and service types, the seed words stored under the service types of each industry category and place them in a cache;
A delivered keyword matching module, configured to take each seed word out of the cache in turn and match the delivered keyword against it;
A delivered keyword industry and service determination module, configured to determine, when the delivered keyword matches a seed word, that the industry category and service type corresponding to that seed word are the industry category and service type of the delivered keyword.
With the above scheme proposed by the present invention, association structures can be constructed automatically and uniformly for the keywords to be delivered by following the above steps. This avoids the disordered hierarchy that results when several people maintain the same association structure, and it avoids manual division of the keyword association structure, which improves keyword delivery efficiency.
Brief description of the drawings
Fig. 1 is a flowchart of the method for automatically constructing an association structure of a delivered keyword in the present invention;
Fig. 2 is an example diagram of the division of categories and service seed words for flat-panel TVs in an embodiment of the present invention;
Fig. 3 is a block diagram of the apparatus for automatically constructing an association structure of a delivered keyword in the present invention.
Detailed description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
As shown in Fig. 1, the present invention proposes a method for automatically constructing an association structure of a delivered keyword. The method comprises:
Step 1: divide the industry categories related to seed words and the service types under the industry categories. The industry categories comprise the industry classifications to which the seed words belong, for example flat-panel TV and refrigerator; the service types comprise the service classifications under each industry category, for example brand seed words and product seed words. Depending on the industry, this step can be carried out mainly by automatic construction, supplemented by manual additions.
Step 2: according to the divided industry categories and service types, construct the seed words under all service types to form a seed word library. This step builds the seed word library for the industry categories and service types divided in the previous step, again mainly by automatic construction supplemented by manual additions. In the seed word library built in this step, each seed word is stored in association with its corresponding industry category and service type.
Step 3: match the delivered keyword against the seed words to determine the industry category and service type to which the delivered keyword belongs.
Step 4: according to the industry category and service type to which the delivered keyword belongs, as obtained by matching, generate the association structure of the delivered keyword, which comprises an association plan and an association unit.
The above method proposed by the present invention compiles the seed words of different industry categories under different service types in advance and then matches the delivered keyword against the seed words to determine the specific industry category and service type that the delivered keyword involves. The result is used to generate the association plan and association unit of the delivered keyword automatically, which removes manual involvement and can improve the website's push efficiency. When a user searches through a search engine for information related to a delivered keyword, the delivering website can push the corresponding associated information to the user for reference according to the association structure of that delivered keyword, which gives the user multiple choices and can improve search efficiency and search experience.
Each step of the above method is described in further detail below.
In one embodiment of the present invention, the industry categories and service types in step 1 are built mainly by automatic construction, supplemented by manual construction. The two approaches are described below.
Automatic construction uses the existing industry classification and service data of the relevant industry and applies crawler technology to capture the data automatically. Manual construction divides industry categories and service types by hand and generally serves as a supplement to automatic construction.
In step 1, the specific method for automatically constructing the industry categories and service types is as follows:
Step 101: determine the entry link URLs of the pages that contain the relevant industry categories and service types;
Step 102: use HttpClient technology to download the page source code corresponding to the entry link URLs;
Step 103: analyze the downloaded page source code, parse the web page elements, obtain the industry categories and the service type data under the relevant industry categories, and store them. Optionally, the data can be stored as files: a folder named after each industry category is created, together with sub-folders named after the related service types, and the service files are placed under the corresponding industry folder. In other embodiments, the industry categories and service type data can also be stored in association in a database.
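As a rough illustration of steps 101 to 103, the following Python sketch parses page source that has already been downloaded into a mapping from industry category to service types. The source names HttpClient for the download; this sketch leaves the download out (in Python, `urllib.request.urlopen` could play that role) and works on a sample string instead. The class names `category` and `service-type`, the sample markup, and every function and variable name are assumptions made for illustration and do not come from the source.

```python
from html.parser import HTMLParser


class CategoryPageParser(HTMLParser):
    """Collects the text of elements marked as a category or a service type."""

    def __init__(self):
        super().__init__()
        self.taxonomy = {}      # industry category -> list of service types
        self._current = None    # most recently seen industry category
        self._pending = None    # class of the element whose text is expected next

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        # Hypothetical class names; a real page would need its own selectors.
        if css_class in ("category", "service-type"):
            self._pending = css_class

    def handle_data(self, data):
        text = data.strip()
        if not text or self._pending is None:
            return
        if self._pending == "category":
            self._current = text
            self.taxonomy.setdefault(text, [])
        elif self._current is not None:
            self.taxonomy[self._current].append(text)
        self._pending = None


def parse_taxonomy(page_source):
    """Return {industry category: [service types]} parsed from page source."""
    parser = CategoryPageParser()
    parser.feed(page_source)
    return parser.taxonomy


SAMPLE_PAGE = """
<ul>
  <li class="category">flat-panel TV</li>
  <li class="service-type">brand</li>
  <li class="service-type">product</li>
  <li class="category">refrigerator</li>
  <li class="service-type">brand</li>
</ul>
"""

taxonomy = parse_taxonomy(SAMPLE_PAGE)
print(taxonomy)
```

The resulting dictionary mirrors the folder layout described in step 103, with one top-level key per industry category and the service types nested beneath it.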
When the automatically captured data needs to be supplemented or adjusted, the industry categories and service types can be built manually, creating folders and files according to the storage structure described in step 103.
Constructing the seed words in step 2 likewise comprises automatic construction and manual construction. In one embodiment of the present invention, automatic creation uses crawler technology to capture related words from the Internet automatically, while manual creation adds some seed words by hand as a supplement to the automatically captured seed words.
In step 2, the specific method for automatically constructing seed words is as follows:
Step 201: according to the industry categories and service types divided in step 1, determine the seed word link URLs that contain the related services;
Step 202: use HttpClient technology to download the page source code corresponding to the seed word link URLs;
Step 203: analyze the downloaded page source code corresponding to the seed word link URLs and parse the corresponding web page elements to obtain seed words;
Step 204: deduplicate the seed words. Specifically, the parsed seed words are compared with the existing seed words by full matching; a parsed seed word that matches an existing one is filtered out, and otherwise it is retained;
Step 205: store the finally obtained seed words under the service type of the corresponding industry category. Optionally, the seed words are stored in the service type file under the corresponding industry category, or in a database, with one seed word per line;
When the automatically constructed seed words are incomplete, some seed words can be added manually under the service type of the corresponding industry category.
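Steps 204 and 205 can be sketched in Python as follows. The file layout `<root>/<industry category>/<service type>.txt` is only one possible reading of the file-based storage option, and trimming surrounding whitespace before the full-match comparison is an added assumption; the function names are hypothetical.

```python
import tempfile
from pathlib import Path


def merge_seed_words(existing, parsed):
    """Return existing seed words plus parsed ones that are not exact duplicates."""
    seen = set(existing)
    merged = list(existing)
    for word in parsed:
        word = word.strip()              # assumption: ignore surrounding whitespace
        if word and word not in seen:    # full-match comparison against known words
            seen.add(word)
            merged.append(word)
    return merged


def store_seed_words(root, category, service_type, words):
    """Write one seed word per line to <root>/<category>/<service_type>.txt."""
    folder = Path(root) / category
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{service_type}.txt"
    path.write_text("\n".join(words) + "\n", encoding="utf-8")
    return path


existing = ["Sharp", "TCL"]
parsed = ["TCL", "Hisense", " Sharp ", "Hisense"]
merged = merge_seed_words(existing, parsed)
print(merged)  # ['Sharp', 'TCL', 'Hisense']

with tempfile.TemporaryDirectory() as root:
    path = store_seed_words(root, "flat-panel TV", "brand", merged)
    print(path.read_text(encoding="utf-8").splitlines())
```

Manually added seed words could go through the same `merge_seed_words` call, so that hand-entered and crawled words share one deduplication rule.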
In step 3, the delivered keyword is matched against the seed words in the seed word library generated in step 2. If the match succeeds, the industry category and service data corresponding to the delivered keyword are determined. If the match fails, the delivered keyword is analyzed again, the categories and service seed words in the construction and division modules are further refined, and matching is then performed again.
In one embodiment of the present invention, the specific steps for matching the delivered keyword against the seed words in step 3 are as follows:
Step 301: according to the industry categories and service types divided in the above steps, read the seed word data contained in the service types under each industry category and place it in a cache. The cache structure can be in Key-Value form, where the Key is the seed word and the Value is the industry category name and service type name;
Step 302: take each seed word Key-Value entry out of the cache in turn and match the delivered keyword against the seed word;
Step 303: when the delivered keyword matches a seed word, the delivered keyword belongs to that service type and also to the corresponding industry category; that industry category and service type are then determined to be the industry category and service type of the delivered keyword.
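A minimal Python sketch of steps 301 to 303 follows. The source does not define what counts as a match; the sketch assumes case-sensitive containment of the seed word in the delivered keyword, which is one plausible reading. The nested dictionary shape of `seed_library` and the function names are likewise assumptions.

```python
def build_cache(seed_library):
    """Flatten {category: {service_type: [seed words]}} into a Key-Value cache."""
    cache = {}
    for category, service_types in seed_library.items():
        for service_type, words in service_types.items():
            for word in words:
                # Key is the seed word; Value is (industry category, service type).
                cache[word] = (category, service_type)
    return cache


def match_keyword(keyword, cache):
    """Return the Value of the first seed word contained in keyword, else None."""
    for seed_word, value in cache.items():   # take each cache entry in turn
        if seed_word in keyword:             # assumed rule: substring containment
            return value
    return None


seed_library = {
    "flat-panel TV": {
        "brand": ["Sharp", "TCL", "Hisense"],
        "type": ["4K", "curved"],
    },
}
cache = build_cache(seed_library)
print(match_keyword("Hisense 55-inch", cache))   # ('flat-panel TV', 'brand')
print(match_keyword("washing machine", cache))   # None
```

With a flat Key-Value cache, a seed word that appears under more than one service type keeps only the last Value written, so a real implementation might store a list of Values per Key. A `None` result corresponds to the failed-match branch of step 3, where the seed word library is refined before matching again.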
The specific method for generating the association structure of the delivered keyword in step 4 is as follows:
The association structure of the keyword is constructed from the industry category and service type of the delivered keyword obtained by matching. The association structure comprises an association industry and an association unit, where the association industry is the industry category obtained by matching and the association unit is the service type obtained by matching.
With the above method proposed by the present invention, once the association structure corresponding to a delivered keyword, namely the association industry and the association unit, has been obtained, other relevant information under the association industry and association unit can be pushed to the user as needed when the user searches for information using the delivered keyword. This makes the user's search results richer and gives the user multiple choices.
The above scheme of the present invention is described in detail below using flat-panel TVs as an example. Fig. 2 is an example diagram of the division of categories and service seed words for flat-panel TVs in an embodiment of the present invention. The seed words under the brand, product, and type service types are captured and constructed automatically from e-commerce websites, while the query-type seed words are constructed manually:
Large home appliances and flat-panel TV are the industry categories; brand, product, type, and query are the service types, obtained by analyzing a particular e-commerce website. The seed words under the brand service type correspond to the brands of flat-panel TVs, such as Sharp, TCL, and Hisense. The seed words under the product service type correspond to product types, such as flat-panel TV. The seed words under the type service type correspond to the types into which flat-panel TVs are divided by different attributes, such as high definition, 4K, and curved screen. The seed words under the query service type correspond to the question words a user may use when searching for flat-panel TVs, such as "how", "what", and "which".
For example, a user searches with the phrase "Sharp TV how" (that is, "how is the Sharp TV"). The delivered keywords in it, "Sharp", "TV", and "how", are matched against the seed words in the seed word library, and the matching result is seed words under the flat-panel TV category described above. Because the search phrase contains the delivered keywords "Sharp" and "how", the phrase fits both the brand service type and the query service type under flat-panel TV. The industry category of the delivered keyword fits the flat-panel TV category, so the association plan can be set to "large home appliances-flat-panel TV". The service type of the delivered keyword fits both the brand and query service types, so the association unit can be set to "Sharp-query". In this way the association plan and association unit for the delivered keywords in the search phrase "Sharp TV how" are generated automatically.
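The flat-panel TV example can be traced with the short Python sketch below, which uses an English rendering of the search phrase. It assumes containment matching and joins the matched service type names with a hyphen to name the association unit. The source example instead names the unit "Sharp-query" and does not spell out the naming rule, so the unit name produced here is an assumption, as are the function name and the shape of `seed_library`.

```python
def build_association_structure(search_phrase, seed_library):
    """Collect seed words contained in the phrase and derive plan and unit names."""
    plans, units, matched = [], [], []
    for category, service_types in seed_library.items():
        for service_type, words in service_types.items():
            hits = [w for w in words if w in search_phrase]  # assumed match rule
            if not hits:
                continue
            matched.extend(hits)
            if category not in plans:
                plans.append(category)
            if service_type not in units:
                units.append(service_type)
    if not matched:
        return None   # no seed word matched; the library would need refining
    return {
        "association_plan": "-".join(plans),   # industry category
        "association_unit": "-".join(units),   # service type(s), hypothetical naming
        "matched_seed_words": matched,
    }


seed_library = {
    "large home appliances-flat-panel TV": {
        "brand": ["Sharp", "TCL", "Hisense"],
        "product": ["flat-panel TV"],
        "type": ["high definition", "4K", "curved"],
        "query": ["how", "what", "which"],
    },
}
result = build_association_structure("how is the Sharp TV", seed_library)
print(result)
```

Plain substring matching would also find "how" inside a word such as "show", so a production matcher would probably tokenize the phrase first; the sketch keeps the simpler rule to stay close to the example.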
As shown in Fig. 3, the present invention proposes an apparatus for automatically constructing an association structure of a delivered keyword. The apparatus comprises:
An industry and service division module, configured to divide the industry categories related to seed words and the service types under the industry categories. The industry categories comprise the industry classifications to which the seed words belong, for example flat-panel TV and refrigerator; the service types comprise the service classifications under each industry category, for example brand seed words and product seed words. Depending on the industry, this division can be carried out mainly by automatic construction, supplemented by manual additions.
A seed word library construction module, configured to construct, according to the divided industry categories and service types, the seed words under all service types to form a seed word library. This module builds the seed word library for the industry categories and service types already divided, again mainly by automatic construction supplemented by manual additions. In the seed word library it builds, each seed word is stored in association with its corresponding industry category and service type.
A matching module, configured to match the delivered keyword against the seed words to determine the industry category and service type to which the delivered keyword belongs.
A delivered keyword association structure establishment module, configured to generate the association structure of the delivered keyword according to the industry category and service type obtained by matching, wherein the association structure comprises an association plan and an association unit.
The industry and service division module further comprises:
An industry and service web page determination module, configured to determine the entry link URLs of the pages that contain the relevant industry categories and service types;
A first download module, configured to use HttpClient technology to download the page source code corresponding to the entry link URLs;
An industry and service analysis module, configured to analyze the downloaded page source code, parse the web page elements, obtain the industry categories and the service type data under the relevant industry categories, and store them. Optionally, the data can be stored as files: a folder named after each industry category is created, together with sub-folders named after the related service types, and the service files are placed under the corresponding industry folder. In other embodiments, the industry categories and service type data can also be stored in association in a database.
The seed word library construction module further comprises:
A seed word web page determination module, configured to determine, according to the industry categories and service types divided by the industry and service division module, the seed word link URLs that contain the related services;
A second download module, configured to use HttpClient technology to download the page source code corresponding to the seed word link URLs;
A seed word analysis module, configured to analyze the downloaded page source code corresponding to the seed word link URLs and parse the corresponding web page elements to obtain seed words;
A seed word screening module, configured to deduplicate the seed words. Specifically, the parsed seed words are compared with the existing seed words by full matching; a parsed seed word that matches an existing one is filtered out, and otherwise it is retained;
A seed word storage module, configured to store the finally obtained seed words under the service type of the corresponding industry category. Optionally, the seed words are stored in the service type file under the corresponding industry category, or in a database, with one seed word per line;
Wherein, matching module comprises further:
Seed words read module, the industry category divided according to above-mentioned steps and type of service, read the seed words data that the type of service under each industry category comprises, put into buffer memory, buffer structure can be Key-Value form, Key is seed words, and Value is industry category title and type of service title;
Throw in Keywords matching module, take out each seed words Key-Value value in buffer memory successively, input keyword is mated with seed words;
Throw in keyword industry and business determination module, when throwing in keyword and mating with a certain seed words, then represent that this input keyword belongs to this type of service, also belong to selected industry category simultaneously, then determine that the sector category and type of service are the industry category and type of service that described input keyword is corresponding.
Wherein, throw in keyword relational structure and set up module specifically obtains throwing in industry category corresponding to keyword and type of service structure keyword relational structure according to coupling, described relational structure comprises association industry and associative cell, wherein associating industry is mate the industry category obtained, and promoting unit is mate the type of service obtained.
Since the above apparatus corresponds to the above method, its details may be understood with reference to the description of the method and are not repeated here.
The specific embodiments described above further explain the object, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for automatically constructing an association structure of a delivered keyword, comprising:
Dividing the industry categories related to seed words and the business types under each industry category; wherein the industry categories comprise the industry classifications to which the seed words belong, and the business types comprise the business classifications under each industry category;
Constructing, according to the divided industry categories and business types, the seed words under all business types to form a seed dictionary; each seed word in the seed dictionary is stored in association with its corresponding industry category and business type;
Matching a delivered keyword against the seed words in the seed dictionary, and determining, according to the matching result, the industry category and business type to which the delivered keyword belongs, wherein the industry category and business type to which the delivered keyword belongs are the industry category and business type stored in association with the seed word that matches it;
Generating, according to the industry category and business type to which the delivered keyword belongs as obtained by matching, the association structure of the delivered keyword, the association structure comprising an association plan and an association unit, the association plan being the industry category to which the delivered keyword belongs and the association unit being the business type to which the delivered keyword belongs.
2. The method of claim 1, wherein dividing the industry categories related to seed words and the business types under each industry category specifically comprises:
Determining the entry link URL of an address containing the relevant industry categories and business types;
Using HttpClient technology to download the URL page source code corresponding to the entry link URL;
Analyzing the downloaded URL page source code and parsing the web page elements to obtain the industry category data and the business type data under each relevant industry category.
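The download-and-parse steps of this claim can be sketched in Python. The claim names HttpClient for the download; the sketch substitutes the standard-library `urllib.request` and `html.parser` instead. The page markup (an `<h2>` heading per industry category followed by `<li>` items for its business types) and all function and class names are invented for illustration, since the source does not describe the page layout.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def download_page(url, timeout=10):
    """Fetch the page source for a URL (stands in for the HttpClient step)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class CategoryParser(HTMLParser):
    """Collect {industry category: [business types]} from <h2> and <li> elements."""

    def __init__(self):
        super().__init__()
        self.categories = {}
        self._tag = None        # tag whose text is currently being read
        self._industry = None   # most recent <h2> text

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "li"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return
        if self._tag == "h2":
            self._industry = text
            self.categories.setdefault(text, [])
        elif self._industry is not None:
            self.categories[self._industry].append(text)


def parse_categories(html):
    """Parse page source into {industry category: [business types]}."""
    parser = CategoryParser()
    parser.feed(html)
    parser.close()
    return parser.categories
```

In use, `parse_categories(download_page(entry_url))` would chain the two steps, with `entry_url` being whatever entry link URL was determined in the first step.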
3. the method for claim 1, wherein described according to divided industry category and type of service, the seed words built under all types of service specifically comprises:
According to ready-portioned industry category and type of service, determine the seed words link URL containing related service;
Use HttpClient technology, download the URL page source code that above-mentioned seed words link URL is corresponding;
Analyze the URL page source code that the seed words link URL downloaded is corresponding, and resolve corresponding web page element, obtain seed words;
The seed words of acquisition and existing seed words are carried out full matching ratio comparatively, if there is the seed words of coupling, then directly filters out, otherwise retain;
By under type of service under corresponding industry category of the seed words association store that finally obtains.
4. the method for claim 1, wherein described by input keyword mate with the seed words in described seed bank, according to matching result determine input keyword belonging to industry category and type of service, specifically comprise:
According to divided industry category and type of service, read the seed words stored in the type of service under each industry category, put into buffer memory;
Take out each seed words in buffer memory successively, input keyword is mated with seed words;
When throwing in keyword and mating with a certain seed words, determine that industry category corresponding to this seed words and type of service are the industry category and type of service that described input keyword is corresponding.
5. after the method for claim 1, wherein having divided industry category and type of service, set up corresponding file, under file corresponding to industry category belonging to it put into by file corresponding to described type of service.
6. An apparatus for automatically constructing an association structure of a delivered keyword, comprising:
Industry and business division module, for dividing the industry categories related to seed words and the business types under each industry category; wherein the industry categories comprise the industry classifications to which the seed words belong, and the business types comprise the business classifications under each industry category;
Seed dictionary building module, for constructing, according to the divided industry categories and business types, the seed words under all business types to form a seed dictionary; each seed word in the seed dictionary is stored in association with its corresponding industry category and business type;
Matching module, for matching a delivered keyword against the seed words in the seed dictionary and determining, according to the matching result, the industry category and business type to which the delivered keyword belongs, wherein the industry category and business type to which the delivered keyword belongs are the industry category and business type stored in association with the seed word that matches it;
Delivered keyword association structure establishing module, for generating, according to the industry category and business type to which the delivered keyword belongs as obtained by matching, the association structure of the delivered keyword, the association structure comprising an association plan and an association unit, the association plan being the industry category to which the delivered keyword belongs and the association unit being the business type to which the delivered keyword belongs.
7. The apparatus of claim 6, wherein the industry and business division module comprises:
Industry and business webpage determination module, for determining the entry link URL of an address containing the relevant industry categories and business types;
First download module, which uses HttpClient technology to download the URL page source code corresponding to the entry link URL;
Industry and business analysis module, which analyzes the downloaded URL page source code and parses the web page elements to obtain the industry category data and the business type data under each relevant industry category.
8. The apparatus of claim 6, wherein the seed dictionary building module comprises:
Seed word webpage determination module, for determining, according to the divided industry categories and business types, the seed word link URL containing the relevant business;
Second download module, which uses HttpClient technology to download the URL page source code corresponding to the seed word link URL;
Seed word analysis module, which analyzes the URL page source code corresponding to the downloaded seed word link URL and parses the corresponding web page elements to obtain seed words;
Seed word screening module, which compares the obtained seed words with the existing seed words by exact matching, filters out directly any obtained seed word for which a matching seed word exists, and retains the rest;
Seed word storage module, which stores the finally obtained seed words in association under the business type under the corresponding industry category.
9. The apparatus of claim 6, wherein the matching module comprises:
Seed word reading module, which, according to the divided industry categories and business types, reads the seed words stored in each business type under each industry category and puts them into a cache;
Delivered keyword matching module, which takes each seed word out of the cache in turn and matches the delivered keyword against the seed word;
Delivered keyword industry and business determination module, for determining, when the delivered keyword matches a certain seed word, the industry category and business type corresponding to that seed word to be the industry category and business type corresponding to the delivered keyword.
10. The apparatus of claim 6, wherein, after the industry and business division module has divided the industry categories and business types, corresponding files are created, and the file corresponding to each business type is placed under the file corresponding to the industry category to which it belongs.
CN201510697764.5A 2015-10-23 2015-10-23 Method and apparatus for automatically constructing association structure of delivered keyword Pending CN105354265A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510697764.5A CN105354265A (en) 2015-10-23 2015-10-23 Method and apparatus for automatically constructing association structure of delivered keyword

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510697764.5A CN105354265A (en) 2015-10-23 2015-10-23 Method and apparatus for automatically constructing association structure of delivered keyword

Publications (1)

Publication Number Publication Date
CN105354265A true CN105354265A (en) 2016-02-24

Family

ID=55330238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510697764.5A Pending CN105354265A (en) 2015-10-23 2015-10-23 Method and apparatus for automatically constructing association structure of delivered keyword

Country Status (1)

Country Link
CN (1) CN105354265A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160224
