CN105574162B

CN105574162B - Method for automatically hyperlinking keywords

Info

Publication number: CN105574162B
Application number: CN201510946128.1A
Authority: CN
Inventors: 吴阳; 杜宇
Original assignee: NANJING DINGYAN INFORMATION TECHNOLOGY Co Ltd
Current assignee: NANJING DINGYAN INFORMATION TECHNOLOGY Co Ltd
Priority date: 2015-12-16
Filing date: 2015-12-16
Publication date: 2019-05-03
Anticipated expiration: 2035-12-16
Also published as: CN105574162A

Abstract

A method for automatically hyperlinking keywords. When the user opens the application, the program determines from its current configuration whether keyword hyperlink data needs to be downloaded from a specified server; if so, the data is downloaded automatically from the specified server and saved locally. When the user opens a document through the application, the application automatically analyzes keywords while parsing the document and searches the keyword hyperlink database for matching keywords to obtain the corresponding hyperlink data. If a keyword is not matched in the database, the specified server is accessed to obtain the hyperlink data for that keyword, and the local data is updated. If the keyword is matched in the database, a hyperlink is displayed in the keyword's region when the document content is shown, according to the hyperlink data obtained for the matched keyword. This addresses the problem that hyperlinks cannot yet be formed automatically and remotely from keywords, which has greatly limited their function and effect.

Description

Method for automatically hyperlinking keywords

Technical field

The present invention relates to the technical field of hyperlinks, and in particular to a method for automatically hyperlinking keywords.

Background art

A hyperlink is itself part of a web page: it is an element that allows a page to be connected to other pages or websites. Only after the individual pages are linked together do they truly constitute a website. A hyperlink is a connection from a web page to a target. The target may be another web page, a different location within the same page, a picture, an e-mail address, a file, or even an application. The object that carries the hyperlink in a web page may be a piece of text or a picture. When the viewer clicks the linked text or picture, the link target is displayed in the browser and is opened or run according to its type.

At present, however, such hyperlinks cannot be formed automatically and remotely from keywords, which greatly limits their function and effect.

Summary of the invention

The purpose of the present invention is to provide a method for automatically hyperlinking keywords. When the user opens the application, the program determines from its current configuration whether keyword hyperlink data needs to be downloaded from a specified server; if so, the data is downloaded automatically from the specified server and saved locally. When the user opens a document through the application, the application automatically analyzes keywords while parsing the document and searches the keyword hyperlink database for matching keywords to obtain the corresponding hyperlink data. If a keyword is not matched in the database, the specified server is accessed to obtain the hyperlink data for that keyword, and the local data is updated. If the keyword is matched in the database, a hyperlink is displayed in the keyword's region when the document content is shown, according to the hyperlink data obtained for the matched keyword. This avoids the problem that hyperlinks cannot yet be formed automatically and remotely from keywords, which has greatly limited their function and effect.

To achieve the above purpose, the technical solution of the present invention is a method for automatically hyperlinking keywords in which, when the user opens the application, the program determines from its current configuration whether keyword hyperlink data needs to be downloaded from a specified server and, if so, downloads it automatically from the specified server and saves it locally. When the user opens a document through the application, the application automatically analyzes keywords while parsing the document and searches the keyword hyperlink database for matching keywords to obtain the corresponding hyperlink data. If a keyword is not matched in the database, the specified server is accessed to obtain the hyperlink data for that keyword, and the local data is updated. If the keyword is matched in the database, a hyperlink is displayed in the keyword's region when the document content is shown, according to the hyperlink data obtained for the matched keyword.

With the above method, the present invention can automatically download keyword hyperlink data from the specified server and save it locally, and can then display hyperlinks in the keyword regions when the document content is shown, according to the hyperlink data obtained from the analysis.

Brief description of the drawings

Fig. 1 is a flow chart of the invention.

Fig. 2 is a flow chart of the random content extraction algorithm of the invention.

Fig. 3 is a flow chart of the layered content extraction algorithm of the invention.

Fig. 4 is a flow chart of the keyword extraction of the invention.

Fig. 5 is a flow chart of the automatic keyword collection of the invention.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the embodiments shown in the drawings.

Referring to Fig. 1 to Fig. 5, in the method for automatically hyperlinking keywords, when the user opens the application, the program determines from its current configuration whether keyword hyperlink data needs to be downloaded from a specified server; if so, the data is downloaded automatically from the specified server and saved locally. When the user opens a document through the application, the application automatically analyzes keywords while parsing the document and searches the keyword hyperlink database for matching keywords to obtain the corresponding hyperlink data. If a keyword is not matched in the database, the specified server is accessed to obtain the hyperlink data for that keyword, and the local data is updated. If the keyword is matched in the database, a hyperlink is displayed in the keyword's region when the document content is shown, according to the hyperlink data obtained for the matched keyword.
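The lookup flow described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the local database is modelled as a plain dictionary, and `fetch_from_server` is a hypothetical callable standing in for the request to the specified server.

```python
from typing import Callable, Dict, List, Optional, Tuple


def resolve_hyperlink(keyword: str,
                      local_db: Dict[str, str],
                      fetch_from_server: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return hyperlink data for a keyword, asking the server on a local miss."""
    link = local_db.get(keyword)
    if link is not None:
        return link                      # matched in the local database
    link = fetch_from_server(keyword)    # hypothetical server request
    if link is not None:
        local_db[keyword] = link         # update the local copy
    return link


def annotate(words: List[str],
             local_db: Dict[str, str],
             fetch_from_server: Callable[[str], Optional[str]]) -> List[Tuple[str, Optional[str]]]:
    """Pair every analyzed keyword with its hyperlink data (None if unknown)."""
    return [(w, resolve_hyperlink(w, local_db, fetch_from_server)) for w in words]
```

A display layer would then render a hyperlink over each word whose second element is not `None`.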

The application parses the document as follows:

First, sample data is collected from a single document, that is, the document opened by the user is analyzed. The key information of a text document is generally found in the following places: the file name, the document title, the bookmarks, the first paragraph of the document, and the last paragraph of the document. The present invention therefore draws its sample data mainly from these parts. The sample data is first stored locally; once the user is online, it is uploaded to the server, which completes keyword extraction, classification, weight adjustment, and preparation of the hyperlink data. Because the upload time is uncertain, the time the document was last opened and the number of times it was opened before the upload must be recorded along with the sample data; both values are needed later when keyword weights are calculated. As shown in Table 1, the sample data is stored as the following sequence of data segments: data packet size, last opening time, open count, document language, file name length, file name data, document title length, document title data, first paragraph content length, first paragraph content data, last paragraph content length, last paragraph content data, number of bookmark entries, first bookmark entry length, first bookmark entry content, second bookmark entry length, second bookmark entry content, ..., n-th bookmark entry length, n-th bookmark entry content, number of random contents, first random content length, first random content data, second random content length, second random content data, ..., n-th random content length, n-th random content data, where n is a positive integer.

Table 1

The meaning of each field in the above structure is described in Table 2:

Table 2
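As an illustration of the field order listed for the sample data, the following Python sketch packs one sample into a byte string. The field widths and encodings are not given in the text, so everything concrete here is an assumption: little-endian 32-bit integers for sizes, counts, and the timestamp, a 16-bit language code, UTF-8 strings with a 4-byte length prefix, and a packet size that counts the whole packet including the size field itself.

```python
import struct
from typing import List


def _length_prefixed(text: str) -> bytes:
    """Encode a string as a 4-byte length followed by its UTF-8 data (assumed layout)."""
    raw = text.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw


def pack_sample(last_open: int, open_count: int, language: int,
                file_name: str, title: str, first_par: str, last_par: str,
                bookmarks: List[str], random_parts: List[str]) -> bytes:
    """Serialize one document sample in the field order listed in the text."""
    body = struct.pack("<IIH", last_open, open_count, language)
    body += _length_prefixed(file_name) + _length_prefixed(title)
    body += _length_prefixed(first_par) + _length_prefixed(last_par)
    body += struct.pack("<I", len(bookmarks))
    body += b"".join(_length_prefixed(b) for b in bookmarks)
    body += struct.pack("<I", len(random_parts))
    body += b"".join(_length_prefixed(r) for r in random_parts)
    # The packet size comes first; here it is assumed to include its own 4 bytes.
    return struct.pack("<I", len(body) + 4) + body
```

Length prefixes let a reader skip fields it does not need, which suits a format whose bookmark and random-content sections vary in length.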

Random content can be extracted with either the random extraction algorithm or the layered extraction algorithm.

For smaller documents, or documents without bookmark information and distribution information, the random extraction algorithm is used. This algorithm reads content at random from across the whole document. Its procedure is as follows: first determine the number of samples to extract; then, according to the size of the document content, generate a list of non-repeating random numbers with one entry per sample; finally, using each number in the list as an offset into the document, read a fixed-length piece of text at each offset in turn and save it.
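The random extraction procedure can be sketched in Python as follows. This is a minimal illustration under assumptions: the document is already available as one string, offsets are character offsets, and the function and parameter names are hypothetical.

```python
import random
from typing import List, Optional


def random_extract(text: str, count: int, length: int,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Read `count` fixed-length samples at distinct random offsets in `text`."""
    if rng is None:
        rng = random.Random()
    max_offset = max(len(text) - length, 0)
    # There cannot be more distinct offsets than valid start positions.
    count = min(count, max_offset + 1)
    # random.sample draws without replacement, so offsets never repeat.
    offsets = rng.sample(range(max_offset + 1), count)
    return [text[o:o + length] for o in offsets]
```

Distinct offsets do not guarantee non-overlapping samples; if overlap matters, the offsets could be drawn from a grid with a step of `length` instead.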

For larger documents, the layered extraction algorithm is used. Depending on whether the sampling strategy is comprehensive extraction or focused extraction, the algorithm defines a weight coefficient H for the first-layer samples.

The procedure of the layered extraction algorithm is as follows. First determine the number N of samples to extract; the number of samples to extract in the first layer is N/H. If the document has bookmark information, the bookmarks serve as the first-layer sample pool; otherwise the pages serve as the first-layer sample pool. Build a list of the bookmark IDs or page IDs, and, taking the number of bookmarks or pages as the base, construct the required first-layer sample list. Then extract H samples separately for each entry of the first-layer sample list. For example, when bookmarks are the sample pool, each entry of the generated first-layer sample list is a bookmark ID; when the second-layer samples are extracted, the bookmark ID is used to locate the position in the document, and H text samples are then extracted with the random extraction algorithm and saved.
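A minimal Python sketch of the two-layer procedure follows. It assumes that each bookmark ID or page ID has already been resolved to the text found at that position (the `sections` mapping is hypothetical), and it selects the first layer at random, which the text does not spell out.

```python
import random
from typing import Dict, List, Optional


def layered_extract(sections: Dict[str, str], n: int, h: int, length: int,
                    rng: Optional[random.Random] = None) -> Dict[str, List[str]]:
    """Pick about N/H sections, then draw up to H random samples from each."""
    if rng is None:
        rng = random.Random()
    # First layer: N/H entries chosen from the bookmark or page IDs.
    first_layer_size = min(max(n // h, 1), len(sections))
    chosen = rng.sample(sorted(sections), first_layer_size)
    result: Dict[str, List[str]] = {}
    for section_id in chosen:
        text = sections[section_id]
        max_offset = max(len(text) - length, 0)
        k = min(h, max_offset + 1)
        # Second layer: random extraction inside the located section.
        offsets = rng.sample(range(max_offset + 1), k)
        result[section_id] = [text[o:o + length] for o in offsets]
    return result
```

A larger H concentrates the N samples in fewer sections (focused extraction), while a smaller H spreads them over more sections (comprehensive extraction).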

Sample data upload and keyword information extraction work as follows. The sample data is updated as the user opens different documents and is stored on the local device. When the user's device goes online, the locally stored sample data is uploaded to the server according to the configured policy, so that keyword extraction and the subsequent work can be carried out.

Users work in a variety of environments, which can broadly be divided into offline, mobile network, and fixed network (LAN, Wi-Fi). A different sample data upload policy is applied depending on the network environment, as follows:

(1) Offline environment: no upload is performed.

(2) Mobile network environment: no upload is performed.

(3) Fixed network environment: upload when idle, with the upload speed limited.

(4) User-defined policy, for example: allow uploads on a mobile network, or allow uploads on a fixed network only during a restricted time period.
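The four policies above can be expressed as a small decision function. The Python sketch below is illustrative only: the names are hypothetical, rate limiting is not modelled, and it assumes that a user-defined time window on a fixed network replaces the default idle rule.

```python
from enum import Enum
from typing import Optional


class Network(Enum):
    OFFLINE = "offline"
    MOBILE = "mobile"
    FIXED = "fixed"   # LAN or Wi-Fi


def should_upload(network: Network, idle: bool,
                  allow_mobile: bool = False,
                  fixed_hours: Optional[range] = None,
                  hour: int = 0) -> bool:
    """Decide whether locally stored sample data may be uploaded now."""
    if network is Network.OFFLINE:
        return False                      # policy (1): never upload offline
    if network is Network.MOBILE:
        return allow_mobile               # policy (2), unless the user opts in (4)
    if fixed_hours is not None:
        return hour in fixed_hours        # policy (4): user-defined time window
    return idle                           # policy (3): upload only when idle
```

For example, `should_upload(Network.FIXED, idle=False, fixed_hours=range(1, 5), hour=2)` permits an upload at 02:00 under a user-defined night window.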

The following methods are commonly used to calculate and verify the client ID. The client ID is not limited to these methods, and several algorithms may also be used in combination:

1. Hardware-combination ID method

The collected sample information must identify the user it belongs to when it is uploaded, and the software must also supply a user ID when it downloads the user's keyword information and hyperlink data; that is, the data is associated with the user. This user ID must identify the user uniquely.

Calculation method: obtain the ID information of selected hardware components of the device in a fixed order, fill any ID that cannot be obtained with FF, and concatenate the results in that order into a character string.

Verification method: the hardware information that makes up the ID differs from device to device, and there is no guarantee that a device's hard disk, for example, will never be replaced, so verification cannot rely on a simple exact match. Instead, each item of hardware information is parsed out and verified separately, and verification passes when more than 50% of all hardware items match. In addition, some devices contain more than one unit of a given kind of hardware; when that item is verified, a single matching unit is enough for the item to count as matched.

2. Client unique information method

Information that uniquely identifies the user (for example an e-mail address or a mobile phone number) is used as the client ID. Verification succeeds only on an exact match.
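The hardware-combination ID method can be sketched in Python as below. The component list, the separators, and the treatment of components that are missing on both sides (counted as not matching) are assumptions made for the illustration; the text only fixes the FF filler, the fixed order, the rule of more than 50%, and the rule that any one of several units matches.

```python
from typing import Dict, List

ORDER = ["cpu", "board", "disk", "mac"]   # hypothetical fixed component order
MISSING = "FF"                            # filler for IDs that cannot be read


def build_id(hardware: Dict[str, List[str]]) -> str:
    """Join component IDs in a fixed order; assumes IDs contain no '-' or ','."""
    parts = []
    for name in ORDER:
        values = hardware.get(name) or [MISSING]
        parts.append(",".join(values))    # several units of one component
    return "-".join(parts)


def verify_id(stored: str, current: str) -> bool:
    """Pass when more than half of the hardware items match."""
    stored_parts = stored.split("-")
    current_parts = current.split("-")
    matched = 0
    for s, c in zip(stored_parts, current_parts):
        s_vals = set(s.split(",")) - {MISSING}
        c_vals = set(c.split(",")) - {MISSING}
        if s_vals & c_vals:               # one shared unit is enough
            matched += 1
    return matched * 2 > len(stored_parts)
```

Comparing `matched * 2` with the item count keeps the test in integers and makes exactly 50% fail, as the text requires more than 50%.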

Keywords are extracted with a keyword extraction algorithm: after the user's sample data has been uploaded to the server, the server extracts keywords from that data. The algorithm is outlined as follows:

1. Perform word segmentation on the text.

2. Weight the text according to where it comes from. As described above, the uploaded sample contains the file name, document title, bookmarks, and other information; these should carry different weights, so a different weight is applied to each. For example, the file name is given a weight of 5 and the document title a weight of 3.

3. Retain only nouns, verbs, and adjectives, and calculate how often each word occurs, giving a <frequency, weight> list for the words.

4. Calculate the word similarity of the top-ranked words in the list and merge similar words.

5. Obtain the final keyword list.
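The five steps above can be approximated with the following Python sketch. It simplifies heavily and should be read as an illustration only: whitespace splitting stands in for real word segmentation, a stop-word set stands in for part-of-speech filtering, a synonym map stands in for similarity-based merging, and frequency and weight are collapsed into a single score. The weights 5 and 3 come from the text; the other weights are assumptions.

```python
from collections import Counter
from typing import Dict, List, Set

# File name and title weights follow the text; the rest are assumed.
FIELD_WEIGHTS = {"file_name": 5, "title": 3, "bookmark": 2, "body": 1}


def extract_keywords(sample: Dict[str, str], synonyms: Dict[str, str],
                     stop_words: Set[str], top: int = 5) -> List[str]:
    """Return the highest-scoring words of a sample as its keyword list."""
    scores: Counter = Counter()
    for field, text in sample.items():
        weight = FIELD_WEIGHTS.get(field, 1)
        for word in text.lower().split():      # stand-in for word segmentation
            if word in stop_words:             # stand-in for POS filtering
                continue
            word = synonyms.get(word, word)    # merge similar words
            scores[word] += weight             # weighted frequency
    return [w for w, _ in scores.most_common(top)]
```

In a production system the segmentation and similarity steps would typically be delegated to a language-specific tokenizer and a word-similarity model.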

Each keyword belongs at least one scope, for example bicycle is included in multiple scopes such as " traffic ", " movement " Within.Extraction to keyword and then it is further concluded that affiliated scope former list of file names.

When carrying out key data accumulation, weight adjustment, keyword and scope are the label data that can be used as user, this A little label datas also have different weights, according to this come adjust hyperlink data push priority.

The time with keyword hit of label data weight, the parameters such as hit-count are related.Principle is to order in the recent period In keyword its label weight it is bigger；Hit-count more multiple weighing value is bigger.
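The text states only the principle (recent hits and more hits both raise the weight), not a formula. One possible scoring rule that satisfies both properties is an exponentially decayed hit count, sketched below in Python; the half-life of 30 days and the function name are assumptions.

```python
from typing import Iterable


def label_weight(hit_times: Iterable[float], now: float,
                 half_life_days: float = 30.0) -> float:
    """Sum of hits, each halved in value for every half-life of age.

    Times are Unix timestamps in seconds. The decay formula is an
    illustrative assumption, not taken from the source.
    """
    seconds_per_day = 86400.0
    return sum(
        0.5 ** (max(now - t, 0.0) / seconds_per_day / half_life_days)
        for t in hit_times
    )
```

Under this rule a hit made today contributes 1.0, a hit made 30 days ago contributes 0.5, and additional hits always increase the total.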

While keywords are being extracted from the sample data, a keyword may have no corresponding entry in the keyword database, in which case the keyword and its label information cannot be matched. An automatic keyword collection method is designed for this situation. It works broadly as follows:

1. Extract keywords from the sample data with the keyword extraction algorithm described above.

2. Add each keyword that matched no label and has a relatively high weight to the uncategorized keyword table associated with the client ID, and also to the global uncategorized keyword data table.

3. Start processing of the global uncategorized keyword data table, either automatically at regular intervals or manually, and handle the keywords in the table in descending order of frequency of occurrence:

a) Check whether a synonym or near-synonym of the keyword has already been categorized. If so, use its label, delete the keyword from the uncategorized keyword table, and finish processing this keyword. Otherwise continue with the next step.

b) Obtain the keyword's label information from a specified website or server. If this succeeds, use that information, delete the keyword from the uncategorized keyword table, and finish processing this keyword. Otherwise continue with the next step.

c) Prompt the administrator to process the keyword's label data manually. This completes processing of the keyword.

4. For the uncategorized keywords that have been processed, update the label information of the corresponding clients and the uncategorized keyword tables of those clients.
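Step 3 of the collection method, the processing of the uncategorized keyword table, can be sketched in Python as follows. The data structures are assumptions: the table is a mapping from keyword to frequency, `labels` maps categorized keywords to a label, and `fetch_label` is a hypothetical callable for the lookup on a specified website or server. Keywords that reach step c) are returned for the administrator and left in the table.

```python
from typing import Callable, Dict, List, Optional


def process_uncategorized(pending: Dict[str, int],
                          labels: Dict[str, str],
                          synonyms: Dict[str, List[str]],
                          fetch_label: Callable[[str], Optional[str]]) -> List[str]:
    """Label pending keywords in descending frequency; return those needing an admin."""
    needs_admin: List[str] = []
    for keyword in sorted(pending, key=pending.get, reverse=True):
        # a) reuse the label of an already categorized synonym or near-synonym
        label = next((labels[s] for s in synonyms.get(keyword, []) if s in labels), None)
        # b) otherwise ask the specified website or server
        if label is None:
            label = fetch_label(keyword)
        # c) otherwise hand the keyword to the administrator
        if label is None:
            needs_admin.append(keyword)
            continue
        labels[keyword] = label
        del pending[keyword]              # remove from the uncategorized table
    return needs_admin
```

The loop iterates over a sorted copy of the keys, so deleting processed entries from `pending` during the pass is safe.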

When keyword and hyperlink data is pushed, the software decides on opening, according to the network state, whether keyword hyperlink data needs to be downloaded from the server; the rules are the same as those for uploading sample data. The software must supply its client ID when it downloads hyperlink data from the server. The server uses the ID to look up the keyword hyperlink data and returns information such as the size of the data for the software to download.

The format of the downloaded data is shown in Table 3. Specifically, it consists of the following data segments in order: size, number of labels, weight, number of keywords, keyword list, hyperlink data, and label data:

Table 3

The downloaded data is stored on the local device. When the user opens a document, the software analyzes the document content and matches it against the keyword list of each label in the data, matching labels with higher weights first. After a match, the keyword in the document is associated with the hyperlink data, and the hyperlink content is displayed according to the policy when the document is shown.
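The weight-first matching described above can be sketched in Python as follows. The `Label` structure is a hypothetical in-memory form of the downloaded data, and the sketch assumes that once a keyword has been matched through a higher-weight label, lower-weight labels do not override it.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Label:
    name: str
    weight: float
    links: Dict[str, str]   # keyword -> hyperlink data


def match_document(words: List[str], labels: List[Label]) -> Dict[str, str]:
    """Associate document keywords with hyperlink data, heaviest labels first."""
    result: Dict[str, str] = {}
    for label in sorted(labels, key=lambda l: l.weight, reverse=True):
        for word in words:
            if word in label.links and word not in result:
                result[word] = label.links[word]
    return result
```

Words that appear in no label's keyword list are simply absent from the result and are displayed without a hyperlink.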

Two methods are commonly used to update the keyword hyperlink data. The update is not limited to these two methods, and several methods may be used in combination. The two methods are as follows:

1. Manual update method

The administrator updates the server database manually by entering <label, keyword, hyperlink data> combinations. Keywords and labels have a many-to-many relationship. When hyperlink data is pushed to a client, the label information is looked up and the matching <label, keyword, hyperlink data> information is sent to the client.

2. Automatic acquisition method

The administrator designates servers to be used for automatic lookup of label and keyword information. When hyperlink data is to be pushed to a client, the hyperlink data is requested automatically from those servers according to the client's label and keyword information. The data on the designated servers is generally provided by a third party or generated automatically from big data.

The keyword hyperlink data push policy is as follows:

Many items of hyperlink data may be associated with the same label and keyword, so a policy is needed to tell the server which data to push.

Only some policies are listed below; the method is not limited to them, and several policies may be combined.

1. Priority set by label. By default the link data is obtained automatically, and the administrator may also add hyperlink data for a label manually. Different link data for the same label is assigned different priorities, and data with higher priority is pushed first when link data is pushed to the client.

2. Priority set by keyword. By default the link data is obtained automatically, and the administrator may also add hyperlink data for a keyword manually. Different link data for the same keyword is assigned different priorities, and data with higher priority is pushed first when link data is pushed to the client. If the keyword priorities are equal, the priorities of the labels are compared and the data with the higher priority is pushed first.

3. Priority set by <label, keyword> combination. By default the link data is obtained automatically, and the administrator may also add hyperlink data for a combination manually. Different link data for the same combination is assigned different priorities, and data with higher priority is pushed first when link data is pushed to the client. If the combination priorities are equal, the keyword priorities are compared; if these are also equal, the label priorities are compared, and the data with the higher priority is pushed first.
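The tie-breaking order of the third policy (combination, then keyword, then label) maps naturally onto tuple comparison. The Python sketch below illustrates this; the `LinkItem` structure is hypothetical, and it assumes that a larger number means a higher priority.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LinkItem:
    url: str
    combo_priority: int     # priority of the <label, keyword> combination
    keyword_priority: int
    label_priority: int


def pick_link(items: List[LinkItem]) -> LinkItem:
    """Choose the item to push first; `items` must not be empty."""
    # Tuples compare element by element, which gives the required tie-breaks.
    return max(items, key=lambda i: (i.combo_priority,
                                     i.keyword_priority,
                                     i.label_priority))
```

The first two policies are special cases: dropping `combo_priority` from the key gives the keyword policy, and using `label_priority` alone gives the label policy.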

When the user opens the application, the mechanism by which the program decides from its current configuration whether keyword hyperlink data needs to be downloaded from the specified server includes at least the following:

1) If there is no hyperlink data locally, hyperlink data needs to be downloaded from the server.

2) The program connects to the server to check whether the client's keyword labels have been updated; if so, hyperlink data needs to be downloaded.

3) The program connects to the server to check whether the hyperlink data associated with the client's keyword labels has been updated; if so, hyperlink data needs to be downloaded.
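The three conditions can be combined into one check, sketched below in Python. How an update is detected is not specified in the text; the version counters used here are an assumption, as are all names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Versions:
    labels: int   # version of the client's keyword labels (assumed counter)
    links: int    # version of the associated hyperlink data (assumed counter)


def needs_download(local: Optional[Versions], server: Versions) -> bool:
    """Return True when hyperlink data should be downloaded from the server."""
    if local is None:
        return True                        # 1) no local hyperlink data
    if server.labels > local.labels:
        return True                        # 2) keyword labels were updated
    return server.links > local.links      # 3) hyperlink data was updated
```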

The above is only a preferred embodiment of the present invention. It should be noted that a person of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the present invention.

Claims

1. A method for automatically hyperlinking keywords, characterized in that: when the user opens the application, the program determines from its current configuration whether keyword hyperlink data needs to be downloaded from a specified server, and if so, downloads the data automatically from the specified server and saves it locally; when the user opens a document through the application, the application automatically analyzes keywords while parsing the document and searches the keyword hyperlink database for matching keywords to obtain the corresponding hyperlink data; if a keyword is not matched in the database, the specified server is accessed to obtain the hyperlink data for that keyword and the local data is updated; if the keyword is matched in the database, a hyperlink is displayed in the keyword's region when the document content is shown, according to the hyperlink data obtained for the matched keyword;

wherein the application parses the document as follows: sample data is first collected from a single document, that is, the document opened by the user is analyzed; the sample data is first stored locally and, once the user is online, uploaded to the server, which completes keyword extraction, classification, weight adjustment, and preparation of the hyperlink data; when the sample data is recorded, the time the document was last opened and the number of times it was opened before the upload are also recorded, both values being needed later when keyword weights are calculated; the sample data is stored as the following sequence of data segments: data packet size, last opening time, open count, document language, file name length, file name data, document title length, document title data, first paragraph content length, first paragraph content data, last paragraph content length, last paragraph content data, number of bookmark entries, first bookmark entry length, first bookmark entry content, second bookmark entry length, second bookmark entry content, ..., n-th bookmark entry length, n-th bookmark entry content, number of random contents, first random content length, first random content data, second random content length, second random content data, ..., n-th random content length, n-th random content data, where n is a positive integer; random content is extracted with either the random extraction algorithm or the layered extraction algorithm; the random extraction algorithm reads content at random from across the whole document, and its procedure is as follows: first determine the number of samples to extract, then, according to the size of the document content, generate a list of non-repeating random numbers with one entry per sample, and finally, using each number in the list as an offset into the document, read a fixed-length piece of text at each offset in turn and save it; the procedure of the layered extraction algorithm is as follows: first determine the number N of samples to extract, the number of samples to extract in the first layer being N/H; if the document has bookmark information, the bookmarks serve as the first-layer sample pool, and otherwise the pages serve as the first-layer sample pool; build a list of the bookmark IDs or page IDs and, taking the number of bookmarks or pages as the base, construct the required first-layer sample list; then extract H samples separately for each entry of the first-layer sample list; when bookmarks are the sample pool, each entry of the generated first-layer sample list is a bookmark ID, and when the second-layer samples are extracted, the bookmark ID is used to locate the position in the document, and H text samples are then extracted with the random extraction algorithm and saved; sample data upload and keyword information extraction work as follows: the sample data is updated as the user opens different documents and is stored on the local device; when the user's device goes online, the locally stored sample data is uploaded to the server according to the configured policy, so that keyword extraction and the subsequent work can be carried out; the client ID is calculated and verified by the hardware-combination ID method, the client unique information method, or a combination of methods; keywords are extracted with a keyword extraction algorithm, that is, after the user's sample data has been uploaded to the server, the server extracts keywords from that data; after the keywords have been extracted, the list of names of the categories they belong to is derived as well; during keyword data accumulation and weight adjustment, both keywords and categories can serve as label data for the user, the labels carry different weights, and these weights are used to adjust the priority with which hyperlink data is pushed; a label's weight depends on parameters such as the times at which its keywords were hit and the number of hits, the principle being that the more recently a keyword was hit, the greater its label's weight, and the more hits, the greater the weight; when keyword and hyperlink data is pushed, the software decides on opening, according to the network state, whether keyword hyperlink data needs to be downloaded from the server, the rules being the same as those for uploading sample data; the software must supply its client ID when it downloads hyperlink data from the server, and the server uses the ID to look up the keyword hyperlink data and returns information such as the size of the data for the software to download; the downloaded data consists of the following data segments in order: size, number of labels, weight, number of keywords, keyword list, hyperlink data, and label data; the downloaded data is stored on the local device; when the user opens a document, the software analyzes the document content and matches it against the keyword list of each label in the data, matching labels with higher weights first; after a match, the keyword in the document is associated with the hyperlink data, and the hyperlink content is displayed according to the policy when the document is shown.

2. the method for the automatic hyperlink of keyword according to claim 1, it is characterised in that: the plan that sample data uploads Slightly:

(1) environment that do not network: without upload operation；

(2) mobile network environment: without upload operation；

(3) fixed network environment: idle uploads, limiting uploading speed；

(4) user oneself definition strategy, comprising: allow to upload when mobile network, allow to upload in the time in fixed network time limit.

3. the method for the automatic hyperlink of keyword according to claim 2, it is characterised in that: and keyword hyperlink Data-updating method includes two kinds of update methods as follows:

(1) manual update method: i.e. administrator updates server database information, input < label, keyword, super chain manually Connect data > combined information；The corresponding relationship of keyword and label is the relationship of multi-to-multi；Hyperlink data is pushed to client When, above-mentioned label information is searched,<label, keyword, hyperlink data>information issue client matched；

(2) update method obtained automatically: i.e. administrator setting given server is used to automatic lookup label, keyword message, In push hyperlink data to client, request its super automatically according on client tag, keyword message to server Link data；Data on these specified servers are generally provided by third party or big data automatically generates；

In addition keyword hyperlink data push strategy, three kinds of push strategy specific as follows:

(1) priority is arranged according to label, default setting is that automatic obtain links data, and administrator is also super for label addition manually Grade link data, the different link data of same label are set as different priority, when push link data are to client The preferential push higher data of priority ratio；

(2) priority is arranged according to keyword, default setting is that automatic obtain links data, and administrator also adds manually for keyword Add hyperlink data, the different link data of same keyword are set as different priority, and push link data are to visitor Priority ratio higher data are preferentially pushed when the end of family, when keyword priority is identical, then compare the priority of its label, preferentially Push the higher data of priority ratio；

(3) according to<label, keyword>combination settings priority, default setting is that automatic obtain links data, and administrator strikes back It moves and adds hyperlink data for combination, the different link data of same combination are set as different priority, push link Priority ratio higher data are preferentially pushed when data are to client, if combined priority is identical, then comparison keyword is excellent First grade, it is preferential to push the higher number of priority ratio if the priority of keyword is still identical, then compares the priority of its label According to.

4. the method for the automatic hyperlink of keyword according to claim 3, it is characterised in that: wherein keyword extraction is calculated Method is that algorithm description is as follows:

(1) word segmentation processing is carried out to text；

(2) processing is weighted to the text of different positions, filename, Document Title, bookmark is contained in the sample of upload Such information adds different weights to these information；

(3) only retain noun, verb and adjective, and calculate the frequency that each word occurs, obtain<frequency, weight>column of each word Table；

(4) Words similarity for calculating the word in the top in above-mentioned list, merges similar word；

(5) Keyword List is finally obtained.

5. the method for the automatic hyperlink of keyword according to claim 1, it is characterised in that: the hardware combinations ID Calculation method are as follows: in a fixed order obtain environment division hardware id information, the ID that can not be obtained is filled with FF, and is pressed A string of characters are sequentially combined into according to this；The method of calibration of the hardware combinations ID are as follows: when specific verification, each hardware information Analyze come individually verify, when all hardware informations be more than 50% match when, with regard to verify pass through；Simultaneously for some hardware It may include more than one in the equipment having, at this moment verify single hardware information, as long as having a matching in same item of hardware, i.e., Indicate Hardware match success；

The method of the client unique information are as follows: according to the information of unique identification user come as client id；When verification, Successful match is just calculated in exact matching；The information of the unique identification user can be E-mail address or phone number.

6. the method for the automatic hyperlink of keyword according to claim 1, it is characterised in that: received automatically for keyword Collection, the concrete mode collected are as follows:

(1) sample data is extracted according to the keyword extraction algorithm；

(2) for being not matched to label and the higher keyword of weight ratio, it is added to that this client id is associated not to be sorted out In key table, while being also added to global do not sort out in key data table；

(3) automatic periodically or to manually boot that processing is above-mentioned not to sort out key data table process, according to the frequency of occurrences from height to Low sequence successively handles the keyword in table:

A) it searches whether the keyword has synonym or near synonym categorized, if so, then using its label, and never sorts out Key table deletes the keyword, completes the treatment process of the keyword, otherwise continues following step；

B) label information of the keyword is obtained from appointed website or server, if obtained successfully, uses the information, and Never sort out in key table and delete the keyword, complete the treatment process of the keyword, otherwise, continue following step；

C) the manual trasaction key label data of administrator is prompted, the keyword treatment process is completed；

(4) do not sort out keyword message for what is had been processed by, update corresponding client tag information and these clients Do not sort out key table.

7. the method for the automatic hyperlink of keyword according to claim 1, it is characterised in that: the user opens and answers Whether need to download the judgement of keyword hyperlink data from specified server according to current configuration determination with program when program Mechanism includes at least the following:

1) it when locally there is no hyperlink data, needs to download hyperlink data from server；

2) whether the Keyword Tag that connection server detects the client updates, and needs to download super chain if being updated Connect data；

3) connection server detects whether the associated hyperlink data of the corresponding Keyword Tag of the client has updated, if It has been updated and has then needed to download hyperlink data.