CN103631963B - Keyword optimization processing method and device based on big data - Google Patents
- Publication number
- CN103631963B CN201310696077.2A CN201310696077A
- Authority
- CN
- China
- Prior art keywords
- text message
- character
- keyword
- character string
- individual character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- General Engineering & Computer Science (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- Game Theory and Decision Science (AREA)
- Probability & Statistics with Applications (AREA)
- Marketing (AREA)
- Economics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Transfer Between Computers (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a keyword optimization processing method and device based on big data. The method includes: arranging the text information to be processed in order and splitting it into single characters; removing single characters of a set frequency according to the frequency of each single character, and merging the remaining single characters into character strings; and extracting core keywords from the merged character strings. The method and device address the low accuracy and high cost of determining keywords, improving the accuracy of keyword determination and reducing its cost.
Description
Technical field
The embodiments of the present invention relate to computer data processing, and in particular to a keyword optimization processing method and device based on big data.
Background
Paid search advertising is the most important form of advertising on the Internet today. If the total Internet advertising budget of all enterprises is taken as 100%, spending on paid search advertising accounts for more than 50% of it. In China, the main delivery platforms include, for example, the Baidu promotion platform.
In paid search advertising, the advertiser determines the keywords to be delivered, together with the creative recommendation information and the linked advertisement web page corresponding to each keyword. The advertiser buys the keywords to be delivered from a paid search advertising service provider. When a browsing user enters a search query, the query is matched against the keywords, and the corresponding creative recommendation information and linked advertisement web pages are retrieved for the user to browse and click. The search engine system records data such as impressions and clicks, which are used for billing according to set rules.
Under this mechanism, a successful paid search advertisement requires the advertiser to complete the following important steps:
First, choose the correct keywords. For example, an air-ticket agency should buy keywords that match its business, such as "air ticket" and "electronic ticket"; a keyword such as "baby milk", which is completely unrelated to its industry, is not suitable. Second, write concise and attractive creative recommendation information for the purchased keywords, to attract customers' attention, raise the ad click-through rate, and in turn raise the keyword quality score. Third, set a reasonable bid, matching mode, and so on for each keyword.
Of these, choosing the correct keywords is particularly important, and the keywords to be delivered are continually changed and added to. In the prior art, new promotion keywords are added and updated manually, based on experience. This relies mainly on personnel who know both the industry and paid advertising promotion well, or on experienced consultants, who extract the industry's core keywords and expand them into further keywords, manually filter and group the expansion results, and then put the keywords online and screen them further according to their effect. A typical optimization process can be summarized as follows. First, the consultant selects core keywords based on personal experience and business knowledge and expands them. Then, the consultant manually filters the expansion results using business knowledge, deleting keywords considered irrelevant. Finally, the keywords are grouped and put online; if a keyword incurs a large amount of wasted spend, it is deleted.
However, processing keywords manually in this way has the following drawbacks:
First, because the method relies mainly on human subjective judgment, different consultants can easily disagree about the industry core keywords and about the filtering and grouping of expansion results for the same set of keywords. The quality of the promotion is therefore severely limited by the consultant's professional skill and understanding of the industry; a consultant who does not understand the industry well enough can easily cause a large amount of wasted spend.
Second, selecting core keywords and filtering and grouping keywords on a semantic basis gives fairly accurate results, because they come from analysis of the true semantics. But it consumes a great deal of time:
(1) The consultant has to extract the industry core keywords from the account's existing keywords, based on experience and understanding of the industry, which takes a lot of the consultant's time.
(2) The core keywords are then expanded. The expansion results are usually numerous, and analyzing, filtering, and grouping them one by one takes a great deal of the consultant's time.
(3) The promotion account of a large enterprise may contain on the order of 100,000 to 1,000,000 keywords. Beyond a certain account size, selecting the account's core words exceeds what can be done by hand, and when the number of keywords to be added exceeds a certain amount, manual filtering and grouping also becomes impractical.
Summary of the invention
The embodiments of the present invention provide a keyword optimization processing method and device based on big data, to improve the accuracy of keyword determination and reduce its cost.
In one aspect, the embodiments of the present invention provide a keyword optimization processing method based on big data, including:
arranging the text information to be processed in order, and splitting it into single characters;
removing single characters of a set frequency according to the frequency of each single character, and merging the remaining single characters into character strings;
extracting core keywords from the merged character strings.
In another aspect, the embodiments of the present invention provide a keyword optimization processing device based on big data, including:
a single-character splitting module, for arranging the text information to be processed in order and splitting it into single characters;
a character string merging module, for removing single characters of a set frequency according to the frequency of each single character, and merging the remaining single characters into character strings;
a keyword extraction module, for extracting core keywords from the merged character strings.
The embodiments of the present invention arrange the text information to be processed in order and split it into single characters, remove single characters of a set frequency according to the frequency of each single character, merge the remaining single characters into character strings, and extract core keywords from the merged character strings. This addresses the low accuracy and high cost of determining keywords, improving the accuracy of keyword determination and reducing its cost.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a keyword optimization processing method based on big data provided in Embodiment One of the present invention;
Fig. 2 is a schematic flowchart of a keyword optimization processing method based on big data provided in Embodiment Two of the present invention;
Fig. 3 is a schematic flowchart of a keyword optimization processing method based on big data provided in Embodiment Three of the present invention;
Fig. 4 is a schematic flowchart of a keyword optimization processing method based on big data provided in Embodiment Four of the present invention;
Fig. 5 is a schematic structural diagram of a keyword optimization processing device based on big data provided in Embodiment Five of the present invention;
Fig. 6 is a schematic structural diagram of a keyword optimization processing device based on big data provided in Embodiment Six of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a schematic flowchart of a keyword optimization processing method based on big data provided in Embodiment One of the present invention. The method can be performed by a keyword optimization processing device based on big data and, as shown in Fig. 1, includes the following steps:
Step S101: arrange the text information to be processed in order, and split it into single characters.
The text information to be processed can be the multiple text items initially determined when keywords are first delivered, or, when keywords are added later, the keywords already delivered in the account.
In this step the text information is first split into single characters for subsequent processing. Arranging the text information in order and splitting it into single characters preferably includes: arranging the text items to be processed in order, with a separator character between adjacent text items; and splitting each text item into single characters according to the separator.
Note that each text item can be a character string made up of any combination of letters, digits, Chinese characters, and symbols. Specifically, a single character can be one letter, one digit, one Chinese character, or one symbol.
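Step S101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `split_into_characters` is hypothetical, and the separator "!" is borrowed from the example in Embodiment Two on the assumption that it never occurs inside a text item.

```python
SEPARATOR = "!"  # assumed separator; it must not occur inside any text item


def split_into_characters(texts, separator=SEPARATOR):
    """Arrange the text items in order with a separator between them,
    then split the result into single characters."""
    joined = separator.join(texts)
    # Iterating a Python str yields one character (letter, digit,
    # Chinese character, or symbol) at a time.
    return list(joined)


print(split_into_characters(["ab1", "ab2"]))
# ['a', 'b', '1', '!', 'a', 'b', '2']
```

Keeping the separator in the character sequence preserves the boundaries between text items, which the later merging step relies on.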
Step S102: remove single characters of a set frequency according to the frequency of each single character, and merge the remaining single characters into character strings.
This step removes characters according to their frequency, where the frequency of a single character is its share of all single characters. For example, if there are 100 single characters in total and 10 of them are the same character, the frequency of that character is 10%. This step can remove characters whose frequency is too high or too low; the specific set frequency value can be chosen as needed or from experience. The remaining single characters are merged into character strings according to a certain rule, which screens out overly rare or redundant characters. Specifically, step S102 can include the following:
First, remove the single characters of the set frequency according to the frequency of each single character, replacing each removed character with the separator. Then, among the remaining single characters, merge the consecutive characters between separators into one character string.
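A minimal sketch of step S102 follows. It assumes the "set frequency" rule used later in Embodiment Two, namely that characters occurring less often than the average are removed; the function name `merge_frequent_runs` and the use of raw counts instead of percentages are illustrative choices (comparing counts against the average count is equivalent to comparing shares against the average share).

```python
from collections import Counter


def merge_frequent_runs(chars, separator="!"):
    """Replace below-average-frequency characters with the separator,
    then merge consecutive remaining characters into strings."""
    counts = Counter(c for c in chars if c != separator)
    if not counts:
        return []
    average = sum(counts.values()) / len(counts)
    # Characters below the average frequency are replaced by the separator.
    masked = [c if c != separator and counts[c] >= average else separator
              for c in chars]
    # Consecutive characters between separators form one character string.
    return [run for run in "".join(masked).split(separator) if run]


print(merge_frequent_runs(list("ab1!ab2!ab3")))
# ['ab', 'ab', 'ab']
```

In the small example, "a" and "b" each occur 3 times while each digit occurs once, so the average is 9 / 5 = 1.8 and only the digits are removed.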
Step S103: extract core keywords from the merged character strings.
Core keywords can be extracted according to set rules. Because the remaining character strings have already passed the character-level filtering, they are themselves strings with a relatively high frequency of occurrence, and in special cases all of them can be extracted as core keywords.
A preferred extraction operation specifically includes:
From the merged character strings, delete the character strings whose number of characters does not exceed a set threshold, so that only strings with more characters than the threshold are retained. The threshold can be a positive integer such as 1, in which case strings consisting of a single character are deleted.
From the remaining character strings, extract the one with the highest frequency as a core keyword.
Replace the core keyword with the separator in the text information to be processed, and repeat the above operations of splitting into single characters, merging character strings, and extracting a core keyword.
Note that it is also possible to extract, in a single pass over the remaining character strings, several core keywords that satisfy a set highest-frequency condition. However, the loop described above extracts one core keyword at a time, so the remaining text information is no longer affected by that core keyword and further core keywords can be extracted from it, which gives higher accuracy.
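The extraction loop of step S103 can be sketched as below. This is an illustration under stated assumptions rather than a definitive implementation: the names `candidate_strings` and `extract_core_keywords`, the "!" separator, the below-average removal rule (taken from Embodiment Two), and the `max_keywords` stopping condition are all choices made for the sketch.

```python
from collections import Counter

SEP = "!"  # assumed separator, absent from the input texts


def candidate_strings(texts, min_len=2):
    """Split into characters, drop below-average characters, merge runs,
    and keep only runs of at least min_len characters."""
    chars = list(SEP.join(texts))
    counts = Counter(c for c in chars if c != SEP)
    if not counts:
        return []
    average = sum(counts.values()) / len(counts)
    masked = "".join(c if c != SEP and counts[c] >= average else SEP
                     for c in chars)
    return [run for run in masked.split(SEP) if len(run) >= min_len]


def extract_core_keywords(texts, max_keywords=3):
    """Extract one core keyword per round, then blank it out and repeat."""
    texts = list(texts)
    keywords = []
    while len(keywords) < max_keywords:
        ranked = Counter(candidate_strings(texts)).most_common(1)
        if not ranked:
            break  # no multi-character string is left
        keyword = ranked[0][0]
        keywords.append(keyword)
        # Replace the extracted keyword with the separator in every text.
        texts = [t.replace(keyword, SEP) for t in texts]
    return keywords


print(extract_core_keywords(["ab1", "ab2", "ab3", "cd4", "cd5"]))
# ['ab', 'cd']
```

Because each extracted keyword is blanked out before the next round, the character frequencies are recomputed on the remaining text, which is the reason the source gives for preferring the loop over a single pass.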
The keyword optimization processing method based on big data disclosed in Embodiment One filters text information automatically by screening single characters. It reduces or removes the need for manual intervention, runs automatically, and is also suitable for processing massive amounts of text information. For large and complex data it improves the accuracy of keyword determination and reduces its cost.
Embodiment two
Based on the keyword optimization processing method based on big data disclosed in Embodiment One, Embodiment Two provides a preferred example of the method. As shown in Fig. 2, it includes the following steps:
Step S201: assume the text information consists of the following phrases. The phrases are arranged in order, with the separator "!" placed between them:
Nokia mobile phone!Samsung mobile phone!Apple mobile phone!Nokia mobile phone how!IPHONE5s!Smart mobile phone!IPHONE4!IPHONE5!IPHONE4s!Samsung S4!Samsung S3!Samsung S2!Apple mobile phone how!Samsung mobile phone handy!Nokia mobile phone handy!Smart large-screen mobile phone
Step S202: split all phrases in the above text information into single characters. The character sequence is the same as above, now treated one character at a time:
Nokia mobile phone!Samsung mobile phone!Apple mobile phone!Nokia mobile phone how!IPHONE5s!Smart mobile phone!IPHONE4!IPHONE5!IPHONE4s!Samsung S4!Samsung S3!Samsung S2!Apple mobile phone how!Samsung mobile phone handy!Nokia mobile phone handy!Smart large-screen mobile phone
Step S203: replace each character whose frequency is below the average with the separator "!", where the average is the mean of all single-character frequencies. The result is:
!!!Mobile phone!Samsung mobile phone!!!Mobile phone!!!!Mobile phone!!!!IPHONE!!!Mobile phone!IPHONE!!IPHONE!!
IPHONE!!!Samsung!!!Samsung!!!Samsung!!!!!Mobile phone!!!!Samsung mobile phone!!!!!!!Mobile phone!!!!!!!!Mobile phone
Step S204: retain the strings with more than 1 character. The result is:
Mobile phone!Samsung mobile phone!Mobile phone!Mobile phone!IPHONE!Mobile phone!IPHONE!IPHONE!IPHONE!Samsung!Samsung!Samsung!Mobile phone!Samsung mobile phone!Mobile phone!Mobile phone
Step S205: extract the character string with the highest frequency of occurrence. Here it is "mobile phone", which occurs 7 times, so the core keyword "mobile phone" is extracted.
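The count in step S205 can be checked with a short script. The strings below are the English renderings of the step S204 result, used as stand-ins for the original text; the variable names are illustrative.

```python
from collections import Counter

# Strings retained after step S204 (English renderings of the example).
retained = (
    "Mobile phone!Samsung mobile phone!Mobile phone!Mobile phone!IPHONE!"
    "Mobile phone!IPHONE!IPHONE!IPHONE!Samsung!Samsung!Samsung!"
    "Mobile phone!Samsung mobile phone!Mobile phone!Mobile phone"
).split("!")

counts = Counter(retained)
top, freq = counts.most_common(1)[0]
print(top, freq)  # Mobile phone 7
```

The combined string "Samsung mobile phone" is counted separately from "Mobile phone" and "Samsung", which is why "Samsung" only reaches its full count in a later round, after "mobile phone" has been removed from the text.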
Step S206: remove "mobile phone" from the original text information, replacing it with the separator. The result is:
Nokia!Samsung!Apple!Nokia!how!IPHONE5s!Smart!IPHONE4!IPHONE5!IPHONE4s!Samsung S4!Samsung S3!Samsung S2!Apple!how!Samsung!handy!Nokia!handy!Smart large-screen!
Step S207: repeat steps S202 to S206 above. The character string with the highest frequency is "Samsung", which occurs 5 times, so the core keyword "Samsung" is extracted.
Step S208: remove "Samsung" from the original text information, replacing it with the separator. The result is:
Nokia!Apple!Nokia!how!IPHONE5s!Smart!IPHONE4!IPHONE5!IPHONE4s!S4!S3!S2!Apple!how!handy!Nokia!handy!Smart large-screen!
Step S209: the string with the highest frequency is now "IPHONE", which occurs 4 times, so the core keyword "IPHONE" is extracted.
These operations can be repeated until a set number of core keywords has been obtained, or until the highest frequency reaches a set threshold.
In this example, the extracted core keywords are: mobile phone, Samsung, IPHONE, Nokia.
The keyword optimization processing method based on big data provided by Embodiment Two can correctly extract keywords from phrases, improving the accuracy of keyword determination and reducing its cost.
Embodiment three
Fig. 3 is a flowchart of the keyword optimization processing method based on big data provided by Embodiment Three. Building on the previous embodiments, this embodiment provides an application scenario that follows core keyword extraction. In paid search advertising, the delivered keywords are updated according to advertising results: new text information is determined first, and keywords to deliver are then screened from it. This embodiment determines the new keywords on the basis of the core keywords already delivered in the account. As shown in Fig. 3, on the basis of the previous embodiments, the following steps are performed after the core keywords have been extracted from the merged character strings:
Step S301: delete, from the new text information, the text items that contain no core keyword;
Step S302: in each remaining text item, determine the occurrence ratio between core keywords and non-core-keyword content, and delete the text items whose ratio is below a set ratio value, to obtain the filtered text information.
As an example, suppose the new text information is:
Samsung, Nokia mobile phone is expensive, Samsung mobile phone OK, mobile phone number, Nokia mobile phone how, Samsung large-screen mobile phone.
Text items that contain no core keyword are deleted first. The core keywords determined in the previous example are mobile phone, Samsung, IPHONE, and Nokia, and every item above contains at least one of them. However, the proportion of core keywords in "mobile phone number" is relatively low; if it is below the set ratio value, the item is filtered out. The result after filtering is: Samsung, Nokia mobile phone is expensive, Samsung mobile phone OK, Nokia mobile phone how, Samsung large-screen mobile phone. The filtered result can serve as the basis for the new keywords to deliver, or be used directly as delivered keywords.
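Steps S301 and S302 can be sketched as follows. The source does not define how the ratio is computed, so this sketch assumes one plausible definition: the share of a text item's characters covered by core keywords. The function names, the 0.5 threshold, and the English sample data are all hypothetical.

```python
def core_ratio(text, core_keywords):
    """Share of the text's characters covered by core keywords
    (an assumed definition of the ratio in step S302)."""
    covered = [False] * len(text)
    for kw in core_keywords:
        start = text.find(kw)
        while start != -1:
            for i in range(start, start + len(kw)):
                covered[i] = True
            start = text.find(kw, start + 1)
    return sum(covered) / len(text) if text else 0.0


def filter_new_texts(texts, core_keywords, min_ratio=0.5):
    """Drop texts without any core keyword (S301) and texts whose
    core-keyword share is below min_ratio (S302)."""
    kept = []
    for text in texts:
        ratio = core_ratio(text, core_keywords)
        if ratio > 0 and ratio >= min_ratio:
            kept.append(text)
    return kept


core = ["phone", "samsung", "iphone", "nokia"]
new_texts = ["samsung", "nokia phone price", "phone number lookup", "tablet case"]
print(filter_new_texts(new_texts, core))
# ['samsung', 'nokia phone price']
```

In this sample, "tablet case" is dropped because it contains no core keyword, and "phone number lookup" is dropped because only 5 of its 19 characters belong to a core keyword.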
In the above scheme, the following steps are preferably performed after the filtered text information has been obtained:
Step S303: extract the core keywords in each filtered text item, and determine them as the label of that text item;
Step S304: group the filtered text items according to their labels.
Continuing the example above, the labels of the filtered text items are as follows:
Samsung --- Samsung
Nokia mobile phone is expensive --- Nokia+mobile phone
Samsung mobile phone OK --- Samsung+mobile phone
Nokia mobile phone how --- Nokia+mobile phone
Samsung large-screen mobile phone --- Samsung+mobile phone
There are three distinct labels: Samsung, Nokia+mobile phone, and Samsung+mobile phone, so the text items can be divided into three groups accordingly. Grouped keywords are easier to deliver group by group.
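The labelling and grouping of steps S303 and S304 can be sketched as below. The function names are hypothetical, the "+"-joined label format follows the example above, and the sample strings are English renderings of the example rather than the original text.

```python
from collections import defaultdict


def label_of(text, core_keywords):
    """Join the core keywords found in the text, in order of appearance."""
    found = [kw for kw in core_keywords if kw in text]
    return "+".join(sorted(found, key=text.find))


def group_by_label(texts, core_keywords):
    """Group text items that share the same label."""
    groups = defaultdict(list)
    for text in texts:
        groups[label_of(text, core_keywords)].append(text)
    return dict(groups)


core = ["mobile phone", "Samsung", "IPHONE", "Nokia"]
filtered = [
    "Samsung",
    "Nokia mobile phone is expensive",
    "Samsung mobile phone OK",
    "Nokia mobile phone how",
    "Samsung large-screen mobile phone",
]
for label, members in group_by_label(filtered, core).items():
    print(label, "->", members)
```

Run on the sample data, this produces the same three groups as the example: Samsung, Nokia+mobile phone, and Samsung+mobile phone.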
The process of adding keywords can be performed repeatedly. Once new keywords have been delivered to the account, the next time keywords are added, core keywords can be re-extracted from the keywords in the account, and the new keywords are then screened against those core keywords.
Embodiment four
Fig. 4 is a flowchart of the keyword optimization processing method based on big data provided by Embodiment Four. Building on the previous embodiments, this embodiment provides another application scenario that follows core keyword extraction: identifying how sensitive keywords are to an attribute. Before the text information to be processed is arranged in order and split into single characters, the method further includes:
Step S401: classify the text information according to an attribute of the text information to be processed, forming at least two groups of text information to be processed;
The attribute can be set as required; it can be the technical field, region, time period, person, or event to which the text information corresponds. Classification is preferably done by creative recommendation information. As one example, the priority ranking of each piece of creative recommendation information can be determined from data fed back by the advertising service provider, such as impressions and clicks, or the creative recommendation information can be classified as better or worse. The keywords corresponding to the classified creative recommendation information are then the text information to be processed for that attribute.
Step S402: arrange the text information to be processed in order, and split it into single characters;
Step S403: remove single characters of a set frequency according to the frequency of each single character, and merge the remaining single characters into character strings;
Step S404: extract core keywords from the merged character strings.
Steps S402 to S404 can be performed as described in the previous embodiments, and are performed separately for each group of text information to be processed.
Step S405: compare whether the core keywords of the groups of text information to be processed are the same, and determine the differing keywords as the core keywords of the attribute corresponding to the group in which they occur.
If the core keywords corresponding to the groups differ, the differing core keywords are the ones that best represent the difference between the groups' attributes. For example, they may be the keywords to which the difference in creative recommendation information is more sensitive; weight values can then be set for these keywords as a reference for delivery.
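The comparison in step S405 amounts to a set difference between the groups' core keywords, as in the following sketch. The function name, the group names "better" and "worse", and the keyword lists are hypothetical; each list is assumed to be the output of steps S402 to S404 for one group.

```python
def attribute_specific_keywords(core_by_group):
    """For each group, keep the core keywords that no other group shares."""
    result = {}
    for name, keywords in core_by_group.items():
        others = set()
        for other_name, other_keywords in core_by_group.items():
            if other_name != name:
                others |= set(other_keywords)
        result[name] = [kw for kw in keywords if kw not in others]
    return result


# Hypothetical core keywords extracted per group of creative information.
core_by_group = {
    "better": ["phone", "discount", "samsung"],
    "worse": ["phone", "samsung", "manual"],
}
print(attribute_specific_keywords(core_by_group))
# {'better': ['discount'], 'worse': ['manual']}
```

Keywords shared by every group ("phone", "samsung") say nothing about the attribute, so only the group-specific ones remain as candidates for weighting.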
With the keyword optimization processing method based on big data disclosed in this embodiment, the core keywords of an attribute can be extracted automatically, at low cost and with high reliability.
Embodiment five
Embodiment Five provides a keyword optimization processing device based on big data. As shown in Fig. 5, it includes a single-character splitting module 51, a character string merging module 52, and a keyword extraction module 53.
The single-character splitting module 51 arranges the text information to be processed in order and splits it into single characters. The character string merging module 52 removes single characters of a set frequency according to the frequency of each single character and merges the remaining single characters into character strings. The keyword extraction module 53 extracts core keywords from the merged character strings.
In the above scheme, the single-character splitting module 51 may specifically include a separator setting unit 511 and a splitting unit 512. The separator setting unit 511 arranges the text items to be processed in order and places a separator between adjacent text items. The splitting unit 512 splits each text item into single characters according to the separator.
The character string merging module 52 may specifically include a separator replacement unit 521 and a merging unit 522. The separator replacement unit 521 removes single characters of a set frequency according to the frequency of each single character and replaces each removed character with the separator. The merging unit 522 merges, among the remaining single characters, the consecutive characters between separators into one character string.
The keyword extraction module 53 may specifically include a character string deletion unit 531 and an extraction unit 532. The character string deletion unit 531 deletes, from the merged character strings, the strings whose number of characters does not exceed a set threshold. The extraction unit 532 extracts, from the remaining character strings, the one with the highest frequency as a core keyword.
The device may also include a repeat execution module 533 which, after the character string with the highest frequency has been extracted as a core keyword, replaces that core keyword with the separator in the text information to be processed and triggers the operations of splitting into single characters, merging character strings, and extracting a core keyword again.
The keyword optimization processing device based on big data disclosed in Embodiment Five improves the accuracy of keyword determination and reduces its cost.
Embodiment six
Embodiment Six provides a keyword optimization processing device based on big data. As shown in Fig. 6, it includes a single-character splitting module 61, a character string merging module 62, and a keyword extraction module 63, and further includes:
a text information deletion module 64, for deleting from the new text information, after the core keywords have been extracted from the merged character strings, the text items that contain no core keyword;
a text information filtering module 65, for determining, in each remaining text item, the occurrence ratio between core keywords and non-core-keyword content, and deleting the text items whose ratio is below a set ratio value, to obtain the filtered text information;
a label determining module 66, for extracting, after the filtered text information has been obtained, the core keywords in each filtered text item and determining them as the label of that text item;
a grouping module 67, for grouping the filtered text items according to their labels.
The above device can update the delivered keywords.
Alternatively, the device can also include a text information processing module which, before the text information to be processed is arranged in order and split into single characters, classifies the text information according to an attribute of the text information to be processed, forming at least two groups of text information to be processed;
and a core keyword determining module which, after the core keywords have been extracted from the merged character strings, compares whether the core keywords of the groups of text information to be processed are the same, and determines the differing core keywords as the core keywords of the attribute corresponding to the group in which they occur.
The keyword optimization processing device based on big data provided by Embodiment Six can correctly extract keywords from new text information and add them to the original keyword groups.
The above products can perform the method provided by any embodiment of the present invention, and have the functional modules and beneficial effects corresponding to the method.
The technical solution of the embodiments of the present invention combines statistical knowledge with text-mining knowledge. Single characters of a certain frequency are selected and restored into the original text information, which avoids having to consider word-formation mechanisms and provides the basis for the method. In the word discovery process, both the frequency and simple positional information of the text are taken into account, which underpins accurate word discovery. In the word selection process, only the most frequent word is taken each time and selection proceeds in a continuous loop, which minimizes interference from uncontrollable factors and improves the accuracy of word discovery.
Compared with the existing manual processing of keywords, the solution of the embodiments of the present invention has the following advantages and benefits:
First, the standards for extracting core keywords and for filtering and grouping keywords are unified, so they do not vary from person to person. The algorithm can analyze the text messages related to each promotion account, so the extracted core keywords are closely related to that account, which greatly reduces the deviation caused by unfamiliarity with the promoted industry and the like. The unified filtering and grouping approach also greatly facilitates subsequent optimization of the promotion account.
Second, when keywords are processed, the core-word extraction, filtering, and grouping that are time-consuming or even infeasible by hand are completed automatically by the algorithm, saving consultants valuable time.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.
Claims (8)
1. A keyword optimization processing method based on big data, characterized by comprising:
arranging the text messages to be processed in order and splitting them into single characters;
wherein arranging the text messages to be processed in order and splitting them into single characters specifically comprises:
arranging the text messages to be processed in order, with a separator set between adjacent text messages;
splitting each text message into single characters according to the separators;
removing the single characters of a set frequency according to the frequency of each single character, and merging the remaining single characters into character strings;
wherein removing the single characters of the set frequency according to the frequency of each single character and merging the remaining single characters into character strings specifically comprises:
removing the single characters of the set frequency according to the frequency of each single character, each removed single character being replaced with a separator;
among the remaining single characters, merging the consecutive single characters between separators into one character string;
extracting core keywords from the merged character strings.
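As an illustrative sketch only (not part of the claims), the splitting and merging steps of claim 1 might look as follows in Python. The function name `merge_strings`, the separator `SEP`, and the parameter `min_count` are hypothetical. The sketch also assumes that "single characters of a set frequency" means characters occurring fewer than `min_count` times, which is one possible reading of the claim.

```python
from collections import Counter

# Hypothetical separator; assumed never to occur in the input texts.
SEP = "\x00"

def merge_strings(texts, min_count):
    """Join texts with a separator, mask rare characters, return the merged runs."""
    # Arrange the text messages in order with a separator between them.
    joined = SEP.join(texts)
    # Count the frequency of every single character (the separator is not counted).
    counts = Counter(ch for ch in joined if ch != SEP)
    # Assumption: a character is "removed" when it occurs fewer than min_count
    # times; each removed character is replaced by the separator.
    masked = "".join(
        ch if ch == SEP or counts[ch] >= min_count else SEP for ch in joined
    )
    # Consecutive surviving characters between separators form one character string.
    return [run for run in masked.split(SEP) if run]

print(merge_strings(["abcx", "abcy", "zabc"], min_count=2))
```

Because a separator always sits between two text messages, a merged character string never spans two messages.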
2. The method according to claim 1, characterized in that extracting core keywords from the merged character strings specifically comprises:
deleting, from the merged character strings, the character strings whose number of characters is less than a set threshold;
among the remaining character strings, extracting the one character string with the highest frequency as a core keyword;
after the one character string with the highest frequency is extracted as a core keyword, the method further comprises:
replacing the core keyword with a separator in the text messages to be processed, and repeating the above operations of splitting into single characters, merging character strings, and extracting a core keyword.
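The repeated extraction described in claim 2 can be sketched as a loop, again purely as an illustration. The names `extract_core_keywords`, `min_len`, and `max_keywords` are hypothetical, the stopping rule (stop when no candidate remains or a cap is reached) is an assumption the claim does not state, and rare characters are assumed to be those below `min_count`.

```python
from collections import Counter

SEP = "\x00"  # hypothetical separator, assumed absent from the input texts

def merge_strings(texts, min_count):
    # Mask characters occurring fewer than min_count times, then collect the runs.
    joined = SEP.join(texts)
    counts = Counter(ch for ch in joined if ch != SEP)
    masked = "".join(
        ch if ch == SEP or counts[ch] >= min_count else SEP for ch in joined
    )
    return [run for run in masked.split(SEP) if run]

def extract_core_keywords(texts, min_count=2, min_len=2, max_keywords=5):
    """Repeatedly take the most frequent merged string as a core keyword."""
    texts = list(texts)
    keywords = []
    while len(keywords) < max_keywords:
        # Delete merged strings shorter than the set threshold.
        candidates = [s for s in merge_strings(texts, min_count) if len(s) >= min_len]
        if not candidates:
            break  # assumed stopping rule: nothing left to extract
        # Take only the single most frequent string in this round.
        keyword, _ = Counter(candidates).most_common(1)[0]
        keywords.append(keyword)
        # Replace the keyword with the separator in every text, then repeat.
        texts = [t.replace(keyword, SEP) for t in texts]
    return keywords

print(extract_core_keywords(["abcx", "abcy", "zabc", "dew", "deq"]))
```

Taking one keyword per round means every later round sees character frequencies recomputed without the keywords already extracted.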
3. The method according to any one of claims 1-2, characterized in that, after the core keywords are extracted from the merged character strings, the method further comprises:
deleting, from newly added text messages, the text messages that do not contain a core keyword;
in the remaining text messages, determining the occurrence ratio of non-core keywords to core keywords, and deleting the text messages whose ratio is less than a set ratio value, to obtain the filtered text messages.
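A minimal sketch of the filtering in claim 3 is given below for illustration. The claim does not spell out how the occurrence ratio is computed, so this sketch assumes it is the number of characters covered by core keywords divided by the number of remaining characters; that formula, the function name `filter_texts`, and the parameter `min_ratio` are all assumptions.

```python
def filter_texts(new_texts, core_keywords, min_ratio):
    """Keep new texts that contain a core keyword and meet the assumed ratio."""
    kept = []
    for text in new_texts:
        # Characters covered by core keywords (overlapping keywords are counted naively).
        core_chars = sum(text.count(k) * len(k) for k in core_keywords)
        if core_chars == 0:
            continue  # the text contains no core keyword: delete it
        other_chars = len(text) - core_chars
        # Assumed ratio: core characters over non-core characters.
        ratio = float("inf") if other_chars <= 0 else core_chars / other_chars
        if ratio >= min_ratio:
            kept.append(text)
    return kept

print(filter_texts(["abc", "abcxx", "abcxxxxxxx", "xyz"], ["abc"], min_ratio=1.0))
```

Under this reading, a text in which the core keyword is only a small fraction of the content is dropped, while a text consisting entirely of core keywords is always kept.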
4. The method according to claim 3, characterized in that, after the filtered text messages are obtained, the method further comprises:
extracting the core keyword in each filtered text message and determining it as the label of that text message;
grouping the filtered text messages according to their labels.
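The labelling and grouping of claim 4 might be sketched as follows. The function name `group_by_label` is hypothetical, and the tie-breaking rule (when a text contains several core keywords, the first one in the keyword list becomes its label) is an assumption that the claim leaves open.

```python
from collections import defaultdict

def group_by_label(filtered_texts, core_keywords):
    """Label each filtered text with a core keyword and group texts by label."""
    groups = defaultdict(list)
    for text in filtered_texts:
        # Assumption: the first core keyword found in the text is its label.
        label = next((k for k in core_keywords if k in text), None)
        if label is not None:
            groups[label].append(text)
    return dict(groups)

print(group_by_label(["abc1", "de2", "abc3"], ["abc", "de"]))
```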
5. The method according to claim 1, characterized in that, before arranging the text messages to be processed in order and splitting them into single characters, the method further comprises:
classifying the text messages according to an attribute of the text messages to be processed, to form at least two groups of text messages to be processed;
after the core keywords are extracted from the merged character strings, the method further comprises:
comparing whether the core keywords of the groups of text messages to be processed are the same, and determining the differing core keywords as the core keywords of the attribute corresponding to that group of text messages to be processed.
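The cross-group comparison of claim 5 can be illustrated with a short sketch. It assumes the core keywords of each attribute group have already been extracted and are passed in as a dictionary; the function name `attribute_keywords` and the example attribute values are hypothetical.

```python
def attribute_keywords(keywords_by_group):
    """For each attribute group, keep the core keywords that no other group has."""
    result = {}
    for attr, keywords in keywords_by_group.items():
        # Collect the core keywords of every other group.
        others = set()
        for other_attr, other_keywords in keywords_by_group.items():
            if other_attr != attr:
                others.update(other_keywords)
        # Keywords that differ from the other groups characterise this attribute.
        result[attr] = [k for k in keywords if k not in others]
    return result

print(attribute_keywords({"north": ["shoes", "boots"], "south": ["shoes", "sandals"]}))
```

Keywords shared by all groups describe the account as a whole, so only the differing ones are attributed to a specific group.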
6. A keyword optimization processing device based on big data, characterized by comprising:
a single-character splitting module, configured to arrange the text messages to be processed in order and split them into single characters;
the single-character splitting module comprising:
a separator setting unit, configured to arrange the text messages to be processed in order, with a separator set between adjacent text messages;
a splitting unit, configured to split each text message into single characters according to the separators;
a character string merging module, configured to remove the single characters of a set frequency according to the frequency of each single character, and to merge the remaining single characters into character strings;
the character string merging module comprising:
a separator replacement unit, configured to remove the single characters of the set frequency according to the frequency of each single character, each removed single character being replaced with a separator;
a merging unit, configured to merge, among the remaining single characters, the consecutive single characters between separators into one character string;
a keyword extraction module, configured to extract core keywords from the merged character strings;
the keyword extraction module comprising:
a character string deletion unit, configured to delete, from the merged character strings, the character strings whose number of characters is less than a set threshold;
an extraction unit, configured to extract, among the remaining character strings, the one character string with the highest frequency as a core keyword;
the device further comprising: a repetition module, configured to, after the one character string with the highest frequency is extracted as a core keyword, replace the core keyword with a separator in the text messages to be processed, and to trigger repetition of the above operations of splitting into single characters, merging character strings, and extracting a core keyword;
the device further comprising: a text message deletion module, configured to, after the core keywords are extracted from the merged character strings, delete from newly added text messages the text messages that do not contain a core keyword;
a text message filtering module, configured to determine, in the remaining text messages, the occurrence ratio of non-core keywords to core keywords, and to delete the text messages whose ratio is less than a set ratio value, to obtain the filtered text messages.
7. The device according to claim 6, characterized by further comprising:
a label determining module, configured to, after the filtered text messages are obtained, extract the core keyword in each filtered text message and determine it as the label of that text message;
a grouping module, configured to group the filtered text messages according to their labels.
8. The device according to claim 6, characterized by further comprising:
a text information processing module, configured to, before the text messages to be processed are arranged in order and split into single characters, classify the text messages according to an attribute of the text messages to be processed, to form at least two groups of text messages to be processed;
a core keyword determining module, configured to, after the core keywords are extracted from the merged character strings, compare whether the core keywords of the groups of text messages to be processed are the same, and to determine the differing core keywords as the core keywords of the attribute corresponding to that group of text messages to be processed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310696077.2A CN103631963B (en) | 2013-12-18 | 2013-12-18 | A kind of keyword optimized treatment method and device based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310696077.2A CN103631963B (en) | 2013-12-18 | 2013-12-18 | A kind of keyword optimized treatment method and device based on big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103631963A CN103631963A (en) | 2014-03-12 |
CN103631963B CN103631963B (en) | 2017-10-17 |
Family
ID=50213004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310696077.2A Expired - Fee Related CN103631963B (en) | 2013-12-18 | 2013-12-18 | A kind of keyword optimized treatment method and device based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103631963B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104063370B (en) * | 2014-07-01 | 2017-09-22 | 北京博雅立方科技有限公司 | A kind of intelligent packet method and device based on keyword |
CN106033403B (en) * | 2015-03-20 | 2019-05-31 | 广州金山移动科技有限公司 | A kind of text conversion method and device |
WO2017128438A1 (en) * | 2016-01-31 | 2017-08-03 | 深圳市博信诺达经贸咨询有限公司 | Method and system for application of big data |
CN110069676A (en) * | 2017-09-28 | 2019-07-30 | 北京国双科技有限公司 | Keyword recommendation method and device |
CN108538300B (en) * | 2018-02-27 | 2021-01-29 | 科大讯飞股份有限公司 | Voice control method and device, storage medium and electronic equipment |
CN109949806B (en) * | 2019-03-12 | 2021-07-27 | 百度国际科技(深圳)有限公司 | Information interaction method and device |
CN112000794B (en) * | 2020-07-30 | 2023-08-22 | 北京百度网讯科技有限公司 | Text corpus screening method and device, electronic equipment and storage medium |
CN113538062B (en) * | 2021-07-28 | 2024-05-07 | 福州果集信息科技有限公司 | Method for reversely pushing bid words purchased by commodity popularization notes |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101477566A (en) * | 2009-01-19 | 2009-07-08 | 腾讯科技(深圳)有限公司 | Method and apparatus used for putting candidate key words advertisement |
CN101625683A (en) * | 2008-07-09 | 2010-01-13 | 精实万维软件(北京)有限公司 | Method for selecting bidding advertisement keyword during release of search engine bidding advertisement |
CN102156721A (en) * | 2011-03-29 | 2011-08-17 | 张栋 | Method for accurately delivering Internet video advertisement based on label |
CN102169496A (en) * | 2011-04-12 | 2011-08-31 | 清华大学 | Anchor text analysis-based automatic domain term generating method |
CN103092956A (en) * | 2013-01-17 | 2013-05-08 | 上海交通大学 | Method and system for topic keyword self-adaptive expansion on social network platform |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100504851C (en) * | 2007-06-27 | 2009-06-24 | 腾讯科技(深圳)有限公司 | Chinese character word distinguishing method and system |
CN101122900A (en) * | 2007-09-25 | 2008-02-13 | 中兴通讯股份有限公司 | Words partition system and method |
JP2012256268A (en) * | 2011-06-10 | 2012-12-27 | Ad Space Co Ltd | Advertisement distribution device and advertisement distribution program |
Also Published As
Publication number | Publication date |
---|---|
CN103631963A (en) | 2014-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103631963B (en) | A kind of keyword optimized treatment method and device based on big data | |
CN109271512B (en) | Emotion analysis method, device and storage medium for public opinion comment information | |
CN106682169B (en) | Application label mining method and device, application searching method and server | |
CN108369709B (en) | System and method for network-based advertisement data traffic latency reduction | |
US20130304469A1 (en) | Information processing method and apparatus, computer program and recording medium | |
CN103023753B (en) | Method, client and the system of interaction content association output in instant messaging | |
US20110153595A1 (en) | System And Method For Identifying Topics For Short Text Communications | |
CN107766371A (en) | A kind of text message sorting technique and its device | |
US10546336B2 (en) | Search device, search method, program, and storage medium | |
CN103377249A (en) | Keyword putting method and system | |
CN106682170B (en) | Application search method and device | |
US20160085855A1 (en) | Perspective data analysis and management | |
CN108364199A (en) | A kind of data analysing method and system based on Internet user's comment | |
CN108256537A (en) | A kind of user gender prediction method and system | |
CN105786793A (en) | Method and device for analyzing semanteme of spoken language text information | |
CN103684969A (en) | Message handling method and message handling system | |
CN107679217A (en) | Association method for extracting content and device based on data mining | |
CN103123624A (en) | Method of confirming head word, device of confirming head word, searching method and device | |
CN107422941A (en) | Exchange method and system | |
CN106997339A (en) | Text feature, file classification method and device | |
CN103150331A (en) | Method and device for providing search engine tags | |
CN103246655A (en) | Text categorizing method, device and system | |
CN109978624A (en) | Information processing method, electronic equipment and computer readable storage medium | |
CN107729573A (en) | Information-pushing method and device | |
CN106294676A (en) | A kind of data retrieval method of ecommerce government system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20171017; Termination date: 20211218 |