CN103631963B - Keyword optimization processing method and device based on big data - Google Patents

Keyword optimization processing method and device based on big data

Publication number
CN103631963B
Authority
CN
China
Prior art keywords
text message
character
keyword
character string
individual character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310696077.2A
Other languages
Chinese (zh)
Other versions
CN103631963A (en)
Inventor
裴向宇
田传钊
王汉生
李红波
常莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Learned Cube Of Beijing Science And Technology Ltd
Original Assignee
Learned Cube Of Beijing Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Learned Cube Of Beijing Science And Technology Ltd
Priority to CN201310696077.2A
Publication of CN103631963A
Application granted
Publication of CN103631963B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a keyword optimization processing method and device based on big data. The method includes: arranging the text entries to be processed in order and splitting them into single characters; removing single characters of a set frequency according to the frequency of each single character, and merging the remaining single characters into character strings; and extracting core keywords from the merged character strings. The disclosed method and device address the low accuracy and high cost of determining keywords, improving the accuracy of keyword determination while reducing its cost.

Description

Keyword optimization processing method and device based on big data
Technical field
Embodiments of the present invention relate to computer data processing, and in particular to a keyword optimization processing method and device based on big data.
Background art
Paid search advertising is currently the most important form of advertising on the internet. If the total online advertising budget of all enterprises is taken as 100%, paid search advertising accounts for more than 50% of it. In China, the main advertising platforms include Baidu's promotion platform.
Paid search advertising works as follows. The advertiser determines the keywords to be placed, together with the creative recommendation information and the linked advertisement page that correspond to each keyword. The advertiser buys these keywords from a paid search advertising service provider. When a user enters a search query, the query is matched against the keywords, and the corresponding creative recommendation information and linked advertisement page are retrieved for the user to browse and click. The search engine system records data such as impressions and clicks, which are used for billing according to set rules.
Given this mechanism, a successful paid search advertisement requires the advertiser to complete the following important steps:
First, choose the right keywords. A business that sells air tickets, for example, should buy keywords that match its business, such as "airline tickets" and "e-tickets"; a keyword such as "baby milk", which is completely unrelated to its industry, is unsuitable. Second, write concise and attractive creative recommendation information for the purchased keywords, to attract customers' attention, raise the ad click-through rate, and in turn raise the keyword quality score. Third, set a reasonable bid, matching mode and so on for each keyword.
Choosing the right keywords is especially important, and the keywords to be placed are continually changed and extended. In the prior art, new promotion keywords are added manually, based on experience. This relies mainly on staff who know both the industry and paid advertising well, or on experienced consultants, who extract the industry's core keywords, expand them into further keywords, manually filter and group the expansion results, put them online, and then screen the keywords further according to their performance. Specifically, a typical optimization process can be summarized as follows. First, the consultant selects core keywords based on personal experience and business knowledge and expands them. Then, the consultant manually filters the expansion results using business knowledge, deleting keywords judged to be irrelevant. Next, the keywords are grouped and put online; if a keyword incurs a large amount of wasted cost, it is deleted.
However, processing keywords manually in this way has the following drawbacks:
First, because the approach relies mainly on subjective human judgment, different consultants can easily disagree, for the same set of keywords, about which are the industry core keywords and about how to filter and group the expansion results. The quality of the promotion is therefore heavily constrained by the consultant's skill level and understanding of the industry; a consultant who does not understand the industry well enough can easily cause a large amount of wasted cost.
Second, selecting core keywords and filtering and grouping keywords by meaning gives fairly accurate results, because they come from genuine semantic analysis, but it consumes a great deal of time:
(1) The consultant has to extract the industry core keywords from the account's existing keywords, drawing on experience and an understanding of the industry, which takes considerable time.
(2) Expanding the core keywords usually yields many results, and analyzing, filtering and grouping them one by one takes up a large amount of the consultant's time.
(3) The promotion account of a large enterprise may contain on the order of 100,000 or even a million keywords. Once the account exceeds a certain size, selecting its core keywords is beyond what can be done by hand; and once the number of keywords to be added exceeds a certain amount, filtering and grouping them manually also becomes impractical.
Summary of the invention
Embodiments of the present invention provide a keyword optimization processing method and device based on big data, to improve the accuracy of keyword determination and reduce its cost.
In one aspect, an embodiment of the present invention provides a keyword optimization processing method based on big data, including:
arranging the text entries to be processed in order and splitting them into single characters;
removing single characters of a set frequency according to the frequency of each single character, and merging the remaining single characters into character strings;
extracting core keywords from the merged character strings.
In another aspect, an embodiment of the present invention further provides a keyword optimization processing device based on big data, including:
a single-character splitting module, configured to arrange the text entries to be processed in order and split them into single characters;
a character string merging module, configured to remove single characters of a set frequency according to the frequency of each single character, and to merge the remaining single characters into character strings;
a keyword extraction module, configured to extract core keywords from the merged character strings.
By arranging the text entries to be processed in order and splitting them into single characters, removing single characters of a set frequency according to the frequency of each single character, merging the remaining single characters into character strings, and extracting core keywords from the merged character strings, embodiments of the present invention address the low accuracy and high cost of keyword determination, improving accuracy while reducing cost.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a keyword optimization processing method based on big data provided in Embodiment One of the present invention;
Fig. 2 is a schematic flowchart of a keyword optimization processing method based on big data provided in Embodiment Two of the present invention;
Fig. 3 is a schematic flowchart of a keyword optimization processing method based on big data provided in Embodiment Three of the present invention;
Fig. 4 is a schematic flowchart of a keyword optimization processing method based on big data provided in Embodiment Four of the present invention;
Fig. 5 is a schematic structural diagram of a keyword optimization processing device based on big data provided in Embodiment Five of the present invention;
Fig. 6 is a schematic structural diagram of a keyword optimization processing device based on big data provided in Embodiment Six of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a schematic flowchart of a keyword optimization processing method based on big data provided in Embodiment One of the present invention. The method can be performed by a keyword optimization processing device based on big data and, as shown in Fig. 1, includes the following steps:
Step S101: arrange the text entries to be processed in order and split them into single characters.
The text entries to be processed may be the text entries initially determined when keywords are first placed or, when keywords are added later, the keywords already placed in the account.
In this step the text entries are first split into single characters for subsequent processing. Arranging the text entries in order and splitting them into single characters preferably includes: arranging the text entries to be processed in order, with a separator set between adjacent text entries; and splitting each text entry into single characters according to the separators.
It should be noted that each text entry may be a character string made up of any combination of letters, digits, Chinese characters and symbols. Specifically, a single character may be one letter, one digit, one Chinese character or one symbol.
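Step S101 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the function name `split_into_characters` and the choice of "!" as the separator symbol are made up for the example, and the separator is assumed not to occur inside any text entry.

```python
SEPARATOR = "!"  # assumed separator; any symbol absent from the texts would do


def split_into_characters(texts, separator=SEPARATOR):
    """Arrange the texts in order, put a separator between them,
    and split the result into single characters."""
    joined = separator.join(texts)
    # A Python str iterates one character at a time, so letters, digits,
    # Chinese characters and symbols each become one list element.
    return list(joined)


chars = split_into_characters(["abc", "ab1"])
print(chars)  # ['a', 'b', 'c', '!', 'a', 'b', '1']
```

Keeping the separator in the character list preserves the boundary between text entries, so that later merging never joins characters from two different entries.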
Step S102: remove single characters of a set frequency according to the frequency of each single character, and merge the remaining single characters into character strings.
In this step, characters are removed according to their frequency. The frequency of a single character is its proportion among all single characters; for example, if there are 100 single characters in total and one character accounts for 10 of them, the frequency of that character is 10%. This step can remove single characters whose frequency is too high or too low; the specific frequency values can be set as needed or from experience. The remaining single characters are merged into character strings according to a set rule, which screens out single characters that are too rare or redundant. In step S102, removing the single characters of a set frequency and merging the remaining single characters into character strings may specifically include:
First, remove the single characters of the set frequency according to the frequency of each single character, replacing each removed character with a separator. Then, among the remaining single characters, merge the consecutive single characters between separators into one character string.
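The frequency filter and the merge can be sketched as follows. The names `character_frequencies` and `mask_and_merge`, the `min_freq` and `max_freq` parameters, and the "!" separator are assumptions for illustration; the text leaves the actual frequency band to be set as needed.

```python
from collections import Counter

SEPARATOR = "!"  # assumed separator symbol, absent from the texts themselves


def character_frequencies(chars, separator=SEPARATOR):
    """Share of each single character among all non-separator characters."""
    counts = Counter(c for c in chars if c != separator)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def mask_and_merge(chars, min_freq, max_freq=1.0, separator=SEPARATOR):
    """Replace characters outside [min_freq, max_freq] with the separator,
    then merge consecutive surviving characters into strings."""
    freqs = character_frequencies(chars, separator)
    masked = "".join(
        c if c != separator and min_freq <= freqs[c] <= max_freq else separator
        for c in chars
    )
    # Runs of characters between separators become one string each.
    return [s for s in masked.split(separator) if s]


chars = list("ab!abc!abd")
print(mask_and_merge(chars, min_freq=0.2))  # ['ab', 'ab', 'ab']
```

In the usage line, "a" and "b" each make up 3 of the 8 characters, while "c" and "d" make up 1 of 8 and fall below the assumed 0.2 cut-off, so only the runs "ab" survive.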
Step S103: extract core keywords from the merged character strings.
Core keywords can be extracted according to a set rule. Because the remaining character strings have already passed the single-character filtering, they are themselves strings that occur relatively often, and in special cases all of them can be extracted as core keywords.
In a preferred extraction operation, extracting core keywords from the merged character strings may specifically include:
From the merged character strings, delete those whose number of characters does not exceed a set threshold, so that only strings with more characters than the threshold are retained. The threshold can be a positive integer such as 1, in which case strings of length 1 are deleted; in effect, this removes strings that consist of a single character.
Among the remaining character strings, extract the one with the highest frequency as a core keyword.
Replace the core keyword with a separator in the text entries to be processed, and repeat the operations of splitting into single characters, merging character strings and extracting a core keyword.
It should be noted that several core keywords meeting a set highest-frequency condition may also be extracted from the remaining character strings at once. However, with the cyclic extraction described above, one core keyword is extracted per round, so the remaining text is no longer subject to interference from that core keyword and further core keywords can be extracted from it, which gives higher accuracy.
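The cyclic extraction can be sketched as a loop that takes one keyword per round and masks it out of the texts before the next round. Several choices here are assumptions rather than details given in the text: the cut-off for removing characters is the mean character count of the current round, the loop stops when the best string occurs fewer than `min_count` times, and the names `candidate_strings` and `extract_core_keywords` are invented for the example.

```python
from collections import Counter

SEPARATOR = "!"  # assumed separator symbol, absent from the texts themselves


def candidate_strings(texts, separator=SEPARATOR):
    """Split, drop below-average characters, merge, keep strings longer than 1."""
    chars = list(separator.join(texts))
    counts = Counter(c for c in chars if c != separator)
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    masked = "".join(
        c if c != separator and counts[c] >= mean else separator for c in chars
    )
    return [s for s in masked.split(separator) if len(s) > 1]


def extract_core_keywords(texts, max_keywords, min_count=2, separator=SEPARATOR):
    """Take the most frequent merged string each round, then mask it out."""
    texts = list(texts)
    keywords = []
    while len(keywords) < max_keywords:
        ranked = Counter(candidate_strings(texts, separator)).most_common(1)
        if not ranked or ranked[0][1] < min_count:
            break
        keyword = ranked[0][0]
        keywords.append(keyword)
        # Replace the extracted keyword so it cannot influence later rounds.
        texts = [t.replace(keyword, separator) for t in texts]
    return keywords


print(extract_core_keywords(["xyab", "abcd", "ab", "zab", "cdq", "cd"], 5))
# ['ab', 'cd']
```

In the first round "ab" is the most frequent merged string; once it is masked out, "cd" is no longer hidden inside "abcd" and becomes the clear winner of the second round.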
The keyword optimization processing method based on big data disclosed in Embodiment One filters the text entries automatically by screening single characters. It reduces or eliminates manual intervention, runs automatically, and is also suitable for processing massive amounts of text. For large and complex data it can improve the accuracy of keyword determination and reduce its cost.
Embodiment two
Building on the keyword optimization processing method disclosed in Embodiment One, Embodiment Two of the present invention provides a preferred implementation of the method which, as shown in Fig. 2, includes the following steps:
Step S201: suppose the text entries consist of the following phrases. Arrange the text entries in order and set the separator "!" between adjacent text entries, as follows:
Nokia mobile phone!Samsung mobile phone!Apple mobile phone!How is Nokia mobile phone!IPHONE5s!Smart mobile phone!IPHONE4!IPHONE5!IPHONE4s!Samsung S4!Samsung S3!Samsung S2!How is Apple mobile phone!Is Samsung mobile phone good to use!Is Nokia mobile phone good to use!Smart large-screen mobile phone
Step S202: split all phrases in the above text entries into single characters (one letter, digit, Chinese character or symbol each). The split result is:
Nokia mobile phone!Samsung mobile phone!Apple mobile phone!How is Nokia mobile phone!IPHONE5s!Smart mobile phone!IPHONE4!IPHONE5!IPHONE4s!Samsung S4!Samsung S3!Samsung S2!How is Apple mobile phone!Is Samsung mobile phone good to use!Is Nokia mobile phone good to use!Smart large-screen mobile phone
Step S203: replace every single character whose frequency is below the average with the separator "!", where the average is the mean of all single-character frequencies. The result after replacement is:
!!!mobile phone!Samsung mobile phone!!!mobile phone!!!!mobile phone!!!!IPHONE!!!mobile phone!IPHONE!!IPHONE!!IPHONE!!!Samsung!!!Samsung!!!Samsung!!!!!mobile phone!!!!Samsung mobile phone!!!!!!!mobile phone!!!!!!!!mobile phone
Step S204: retain the phrases above that contain more than one character. The result is:
mobile phone!Samsung mobile phone!mobile phone!mobile phone!IPHONE!mobile phone!IPHONE!IPHONE!IPHONE!Samsung!Samsung!Samsung!mobile phone!Samsung mobile phone!mobile phone!mobile phone
Step S205: extract the character string with the highest frequency of occurrence. Here the most frequent string is "mobile phone", which occurs 7 times, so the core keyword "mobile phone" is extracted.
Step S206: remove "mobile phone" from the original text entries, replacing it with the separator. The result is:
Nokia!Samsung!Apple!Nokia!how is!IPHONE5s!Smart!IPHONE4!IPHONE5!IPHONE4s!Samsung S4!Samsung S3!Samsung S2!Apple!how is!Samsung!good to use!Nokia!good to use!Smart large-screen!
Step S207: repeat steps S202 to S206 above. The most frequent character string is "Samsung", which occurs 5 times, so the core keyword "Samsung" is extracted.
Step S208: remove "Samsung" from the original text entries, replacing it with the separator. The result is:
Nokia!Apple!Nokia!how is!IPHONE5s!Smart!IPHONE4!IPHONE5!IPHONE4s!S4!S3!S2!Apple!how is!good to use!Nokia!good to use!Smart large-screen!
Step S209: extract the most frequent string, which is "IPHONE" with 4 occurrences, so the core keyword "IPHONE" is extracted.
The above operations can be repeated until a set number of core keywords has been obtained, or until the highest frequency falls below a set threshold. In this example, the extracted core keywords are: mobile phone, Samsung, IPHONE, Nokia.
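The first round of this walkthrough (steps S201 to S205) can be checked with a short sketch. The Chinese phrases below are an assumed back-translation of the English phrases (for example 手机 for "mobile phone", 三星 for "Samsung", 诺基亚 for "Nokia"), chosen because the method works character by character; the function name `first_round` and the use of the mean character count as the cut-off are likewise assumptions. Only the first round is checked here, since later rounds depend on how the frequency setting is chosen.

```python
from collections import Counter

SEPARATOR = "!"
# Assumed back-translation of the 16 example phrases.
PHRASES = [
    "诺基亚手机", "三星手机", "苹果手机", "诺基亚手机怎么样", "IPHONE5s",
    "智能手机", "IPHONE4", "IPHONE5", "IPHONE4s", "三星S4", "三星S3",
    "三星S2", "苹果手机怎么样", "三星手机好用吗", "诺基亚手机好用吗", "智能大屏手机",
]


def first_round(phrases, separator=SEPARATOR):
    """Steps S201-S204: join, split, mask below-average characters, merge."""
    chars = list(separator.join(phrases))
    counts = Counter(c for c in chars if c != separator)
    mean = sum(counts.values()) / len(counts)
    masked = "".join(
        c if c != separator and counts[c] >= mean else separator for c in chars
    )
    # Keep merged strings with more than one character and count them.
    return Counter(s for s in masked.split(separator) if len(s) > 1)


ranking = first_round(PHRASES)
print(ranking.most_common(1))  # [('手机', 7)]
```

Under these assumptions the characters of 诺基亚 each occur 3 times, just below the mean of roughly 3.06, which is why "Nokia" drops out in the first round while 手机 ("mobile phone") is extracted with 7 occurrences.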
The keyword optimization processing method based on big data provided by Embodiment Two can correctly extract keywords from phrases, improving the accuracy of keyword determination and reducing its cost.
Embodiment three
Fig. 3 is a flowchart of the keyword optimization processing method based on big data provided by Embodiment Three of the present invention. Building on the previous embodiments, this embodiment provides an application scenario that follows core keyword extraction. In paid search advertising, the keywords that have been placed can be updated according to advertising results; new text entries must first be determined, and keywords are then screened from them for placement. This embodiment can determine new keywords based on the core keywords already placed in the account. As shown in Fig. 3, on the basis of the previous embodiments, the following steps are further included after core keywords are extracted from the merged character strings:
Step S301: delete from the new text entries those that do not contain any core keyword.
Step S302: in each remaining text entry, determine the proportion of core keywords relative to non-core content, and delete the text entries in which this proportion is below a set ratio value, to obtain the filtered text entries.
As an example, suppose the new text entries are:
Samsung; is Nokia mobile phone expensive; is Samsung mobile phone OK; mobile phone number; how is Nokia mobile phone; Samsung large-screen mobile phone.
First, text entries that contain no core keyword are deleted from the new text entries. The core keywords determined in the previous example are mobile phone, Samsung, IPHONE and Nokia, and every new text entry here contains a core keyword. However, the proportion of core keywords in "mobile phone number" is relatively low; if it is below the set ratio value, the entry is filtered out. The result after filtering is: Samsung; is Nokia mobile phone expensive; is Samsung mobile phone OK; how is Nokia mobile phone; Samsung large-screen mobile phone. The filtered result can serve as the basis for placing new keywords, or be used directly as the keywords to place.
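Steps S301 and S302 can be sketched as a two-stage filter. The text does not define exactly how the proportion is computed, so this sketch assumes it is the share of a text entry's characters that belong to core keywords; the names `core_coverage` and `filter_new_texts`, the 0.6 cut-off, and the Chinese back-translations of the example entries are all assumptions for illustration.

```python
CORE_KEYWORDS = ["手机", "三星", "IPHONE", "诺基亚"]
# Assumed back-translation of the new text entries in the example.
NEW_TEXTS = ["三星", "诺基亚手机贵吗", "三星手机好吗", "手机号码",
             "诺基亚手机怎么样", "三星大屏手机"]


def core_coverage(text, core_keywords):
    """Share of the characters in `text` that belong to a core keyword."""
    remaining = text
    for keyword in core_keywords:
        remaining = remaining.replace(keyword, "")
    return (len(text) - len(remaining)) / len(text)


def filter_new_texts(new_texts, core_keywords, min_coverage):
    kept = []
    for text in new_texts:
        # S301: drop entries that contain no core keyword at all.
        if not any(keyword in text for keyword in core_keywords):
            continue
        # S302: drop entries in which core keywords make up too small a share.
        if core_coverage(text, core_keywords) < min_coverage:
            continue
        kept.append(text)
    return kept


filtered = filter_new_texts(NEW_TEXTS, CORE_KEYWORDS, min_coverage=0.6)
print(filtered)
```

With these assumed inputs, 手机号码 ("mobile phone number") has a coverage of 2 out of 4 characters and is the only entry removed, which mirrors the filtering result described above.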
In the above scheme, preferably, the following steps are further included after the filtered text entries are obtained:
Step S303: extract the core keywords in each filtered text entry and use them as the label of that text entry.
Step S304: group the filtered text entries according to their labels.
Continuing the example above, the labels of the filtered text entries are as follows:
Samsung --- Samsung
Is Nokia mobile phone expensive --- Nokia+mobile phone
Is Samsung mobile phone OK --- Samsung+mobile phone
How is Nokia mobile phone --- Nokia+mobile phone
Samsung large-screen mobile phone --- Samsung+mobile phone
There are three distinct labels: Samsung, Nokia+mobile phone, and Samsung+mobile phone. The text entries can accordingly be divided into three groups. Grouped keywords are easier to place group by group.
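Labeling and grouping (steps S303 and S304) can be sketched as follows. The function names `label_for` and `group_by_label`, the "+" joining convention, and the Chinese back-translations of the example entries are assumptions for illustration.

```python
from collections import defaultdict

CORE_KEYWORDS = ["手机", "三星", "IPHONE", "诺基亚"]
# Assumed back-translation of the filtered text entries in the example.
FILTERED = ["三星", "诺基亚手机贵吗", "三星手机好吗", "诺基亚手机怎么样", "三星大屏手机"]


def label_for(text, core_keywords):
    """S303: the core keywords found in the text, in order of appearance."""
    found = [keyword for keyword in core_keywords if keyword in text]
    found.sort(key=text.index)
    return "+".join(found)


def group_by_label(texts, core_keywords):
    """S304: collect the texts that share the same label."""
    groups = defaultdict(list)
    for text in texts:
        groups[label_for(text, core_keywords)].append(text)
    return dict(groups)


groups = group_by_label(FILTERED, CORE_KEYWORDS)
for label, members in groups.items():
    print(label, members)
```

For these inputs the sketch yields three groups, labeled 三星, 诺基亚+手机 and 三星+手机, matching the three labels listed in the example.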
The process of adding keywords can be performed many times. Once new keywords have been placed in the account, the next time keywords are added the core keywords can be extracted again from the keywords in the account, and the new keywords are then screened according to those core keywords.
Embodiment four
Fig. 4 is a flowchart of the keyword optimization processing method based on big data provided by Embodiment Four of the present invention. Building on the previous embodiments, this embodiment provides another application scenario for core keyword extraction: identifying how sensitive an attribute is to particular keywords. Before the text entries to be processed are arranged in order and split into single characters, the method further includes:
Step S401: classify the text entries according to an attribute of the text entries to be processed, forming at least two groups of text entries to be processed.
Attributes can be set as needed. The attribute of a text entry may be the technical field, region, time period, person or event that the text entry relates to. Preferably, the text entries are classified according to the creative recommendation information. As one example, the priority ranking of the creative recommendation information can be determined from data fed back by the advertising service provider, such as impressions and clicks, or the creative recommendation information can be classified as better or worse. The keywords corresponding to the classified creative recommendation information are then the text entries to be processed that have that attribute.
Step S402: arrange the text entries to be processed in order and split them into single characters.
Step S403: remove single characters of a set frequency according to the frequency of each single character, and merge the remaining single characters into character strings.
Step S404: extract core keywords from the merged character strings.
Steps S402 to S404 can be performed as described in the previous embodiments, and are performed separately for each group of text entries to be processed.
Step S405: compare whether the core keywords of the groups of text entries to be processed are the same, and determine the differing keywords as the core keywords of the attribute corresponding to that group of text entries.
If the core keywords corresponding to the attribute groups differ, the differing core keywords better represent the difference between the groups' attributes. For example, they may be the keywords to which the performance of the creative recommendation information is more sensitive; weights can then be set for these keywords as a reference for placement.
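The comparison in step S405 amounts to keeping, for each group, the core keywords that no other group shares. A minimal sketch follows; the function name `distinctive_keywords`, the group names and the keyword lists are hypothetical and serve only to illustrate the comparison.

```python
def distinctive_keywords(core_by_group):
    """For each attribute group, keep the core keywords no other group has."""
    result = {}
    for name, keywords in core_by_group.items():
        others = set()
        for other_name, other_keywords in core_by_group.items():
            if other_name != name:
                others.update(other_keywords)
        result[name] = [kw for kw in keywords if kw not in others]
    return result


# Hypothetical core keywords extracted per group in steps S402-S404.
core_by_group = {
    "better creatives": ["phone", "discount", "free shipping"],
    "worse creatives": ["phone", "official"],
}
print(distinctive_keywords(core_by_group))
# {'better creatives': ['discount', 'free shipping'], 'worse creatives': ['official']}
```

The shared keyword "phone" says nothing about why one group performs better, so only the keywords unique to a group are kept as candidates for weighting.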
With the keyword optimization processing method based on big data disclosed in this embodiment, the core keywords of an attribute can be extracted automatically, at low cost and with high reliability.
Embodiment five
Embodiment Five of the present invention provides a keyword optimization processing device based on big data which, as shown in Fig. 5, specifically includes a single-character splitting module 51, a character string merging module 52 and a keyword extraction module 53.
The single-character splitting module 51 is configured to arrange the text entries to be processed in order and split them into single characters. The character string merging module 52 is configured to remove single characters of a set frequency according to the frequency of each single character, and to merge the remaining single characters into character strings. The keyword extraction module 53 is configured to extract core keywords from the merged character strings.
In the above scheme, the single-character splitting module 51 may specifically include a separator setting unit 511 and a splitting unit 512. The separator setting unit 511 is configured to arrange the text entries to be processed in order and set a separator between adjacent text entries. The splitting unit 512 is configured to split each text entry into single characters according to the separators.
The character string merging module 52 may specifically include a separator replacement unit 521 and a merging unit 522. The separator replacement unit 521 is configured to remove single characters of a set frequency according to the frequency of each single character, replacing each removed character with a separator. The merging unit 522 is configured to merge, among the remaining single characters, the consecutive single characters between separators into one character string.
The keyword extraction module 53 may specifically include a character string deletion unit 531 and an extraction unit 532. The character string deletion unit 531 is configured to delete, from the merged character strings, those whose number of characters does not exceed a set threshold. The extraction unit 532 is configured to extract, from the remaining character strings, the one with the highest frequency as a core keyword.
The device may further include a repeated-execution module 533, configured to replace the core keyword with a separator in the text entries to be processed after the most frequent character string has been extracted as a core keyword, and to trigger repetition of the operations of splitting into single characters, merging character strings and extracting a core keyword.
The keyword optimization processing device based on big data disclosed in Embodiment Five can improve the accuracy of keyword determination and reduce its cost.
Embodiment six
The embodiment of the present invention six provides a kind of keyword optimization processing device based on big data, as shown in fig. 6, bag Include:Single-character splitting module 61, character string merging module 62 and keyword extracting module 63, in addition to:Text message removing module 64, after the extraction kernel keyword in each character string from merging, deleted from newly-increased text message and do not include core The text message of keyword;
Text message module 65 is filtered, in remaining each text message, determining non-core keyword and core The appearance ratio of keyword, and the text message that ratio is less than setting ratio value is deleted, with the text message after being filtered.
Label determining module 66, for after the text message after being filtered, extracting text message after each filtering In kernel keyword, be defined as the label of text message;
Grouping module 67, for being grouped the text message after each filtering according to label.
Said apparatus can realize the what's new for delivering keyword.
Or, in the device, text information processing module can also be included, for pending each text message is suitable Sequence is arranged, and is split as before individual character, and text message is classified according to the attribute of pending text message, forms at least two The pending text message of group;
Kernel keyword determining module, it is relatively more each after the extraction kernel keyword in each character string from merging Whether the kernel keyword of the pending text message of group is identical, and different kernel keywords is defined as into the pending text envelope of the group The kernel keyword of the corresponding attribute of breath.
The keyword optimization processing device based on big data provided by Embodiment 5 of the present invention can correctly extract keywords from newly added text messages and add them to the original keyword group.
The above product can perform the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the method performed.
The technical solution of the embodiments of the present invention combines statistical knowledge with text mining knowledge. Single characters of a selected frequency are restored into the original text information, which avoids the complexity of considering word-formation mechanisms and provides the basis for implementing the method. During word discovery, both the frequency and simple positional information of the text are taken into account, which safeguards the accuracy of word discovery. During word selection, only the word with the highest frequency of occurrence is taken each time, and selection continues through a loop mechanism, which minimizes the interference of uncontrollable factors and improves the accuracy of word discovery.
Compared with the existing manual handling of keywords, the solution of the embodiments of the present invention has the following advantages and benefits:
First, the standards for extracting core keywords and for filtering and grouping keywords are unified, so results do not vary from person to person. The algorithm analyzes the text messages related to each promotion account, so the extracted core keywords are closely related to that account, which greatly reduces the deviation caused by unfamiliarity with the promoted industry. The unified filtering and grouping approach also greatly facilitates subsequent optimization of the promotion account.
Second, during keyword processing, the core word extraction, filtering, and grouping steps, which are time-consuming or even impossible to complete manually, are completed automatically by the algorithm, saving consultants' valuable time.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; without departing from the inventive concept, it may include other equivalent embodiments, and the scope of the present invention is determined by the scope of the appended claims.

Claims (8)

1. A keyword optimization processing method based on big data, characterized by comprising:
arranging the pending text messages in order and splitting them into single characters;
wherein arranging the pending text messages in order and splitting them into single characters specifically comprises:
arranging the pending text messages in order, with a blank character set between the text messages;
splitting each text message into single characters according to the blank character;
removing the single characters of a set frequency according to the frequency of each single character, and merging the remaining single characters into character strings;
wherein removing the single characters of a set frequency according to the frequency of each single character, and merging the remaining single characters into character strings, specifically comprises:
removing the single characters of a set frequency according to the frequency of each single character, each removed single character being replaced with a blank character;
among the remaining single characters, merging the consecutive single characters between blank characters into one character string;
extracting core keywords from the merged character strings.
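The splitting and merging steps of claim 1 can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: the names `SEP`, `split_into_chars`, and `merge_into_strings` are hypothetical, and "single characters of a set frequency" is assumed here to mean characters that occur fewer than `min_freq` times.

```python
from collections import Counter

# Hypothetical blank character; assumed never to occur in the input texts.
SEP = "\x00"


def split_into_chars(texts, sep=SEP):
    """Arrange the texts in order with a blank character between them,
    then split the result into single characters."""
    return list(sep.join(texts))


def merge_into_strings(chars, min_freq, sep=SEP):
    """Replace low-frequency characters with the blank character, then merge
    each run of consecutive remaining characters into one string."""
    freq = Counter(c for c in chars if c != sep)
    # Assumption: the "set frequency" to remove is any frequency below min_freq.
    masked = [c if c != sep and freq[c] >= min_freq else sep for c in chars]
    return [s for s in "".join(masked).split(sep) if s]


# Sample data: "red dress", "dress new style", "summer dress".
texts = ["红色连衣裙", "连衣裙新款", "夏季连衣裙"]
chars = split_into_chars(texts)
strings = merge_into_strings(chars, min_freq=2)
```

With this sample data only the characters of 连衣裙 ("dress") occur at least twice, so every merged character string is that word.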
2. The method according to claim 1, characterized in that extracting core keywords from the merged character strings specifically comprises:
deleting, from the merged character strings, the character strings whose number of characters is less than a set threshold;
among the remaining character strings, extracting the one character string with the highest frequency as a core keyword;
after the one character string with the highest frequency is extracted as a core keyword, the method further comprises:
replacing the core keyword with a blank character in the pending text messages, and repeating the above operations of splitting into single characters, merging character strings, and extracting a core keyword.
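The repeated extraction described in claim 2 can be sketched as a loop. The sketch below is an assumption-laden illustration rather than the patented implementation: the helper names, the parameters `min_char_freq`, `min_len`, and `max_keywords`, and the stopping rule (stop when no candidate string remains or a cap is reached) are all hypothetical, since the claim does not state when the repetition ends.

```python
from collections import Counter

SEP = "\x00"  # hypothetical blank character, assumed absent from the input


def candidate_strings(texts, min_char_freq, sep=SEP):
    """Split into single characters, blank out rare ones, merge the rest."""
    chars = list(sep.join(texts))
    freq = Counter(c for c in chars if c != sep)
    masked = "".join(
        c if c != sep and freq[c] >= min_char_freq else sep for c in chars
    )
    return [s for s in masked.split(sep) if s]


def extract_core_keywords(texts, min_char_freq=2, min_len=2, max_keywords=10):
    """Repeatedly take the most frequent candidate string as a core keyword,
    blank it out of the texts, and run the split/merge steps again."""
    texts = list(texts)
    keywords = []
    while len(keywords) < max_keywords:
        # Delete character strings shorter than the set threshold.
        strings = [
            s for s in candidate_strings(texts, min_char_freq) if len(s) >= min_len
        ]
        if not strings:
            break  # assumed stopping rule: nothing left to extract
        keyword, _count = Counter(strings).most_common(1)[0]
        keywords.append(keyword)
        # Replace the extracted keyword with the blank character.
        texts = [t.replace(keyword, SEP) for t in texts]
    return keywords


# Sample data: dresses, new styles, and sports shoes.
texts = ["红色连衣裙", "连衣裙新款", "夏季连衣裙", "新款运动鞋", "运动鞋男", "新款上市"]
keywords = extract_core_keywords(texts)
```

Taking only one string per pass means that a longer candidate such as 新款运动鞋 is re-split once 新款 has been blanked out, which is how 运动鞋 surfaces as its own keyword in the sample data.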
3. The method according to any one of claims 1-2, characterized in that, after extracting core keywords from the merged character strings, the method further comprises:
deleting, from the newly added text messages, the text messages that do not contain a core keyword;
in each remaining text message, determining the occurrence ratio of non-core keywords and core keywords, and deleting the text messages whose ratio is less than a set ratio value, so as to obtain the filtered text messages.
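The two filtering steps of claim 3 can be sketched as below. The claim does not give a formula for the occurrence ratio, so this sketch assumes one: the share of core-keyword occurrences among all keyword occurrences in a text. The function name `filter_texts` and its parameters are hypothetical.

```python
def filter_texts(new_texts, core_keywords, non_core_keywords, min_ratio):
    """Keep the texts that contain a core keyword and whose assumed
    core-keyword share is at least min_ratio."""
    kept = []
    for text in new_texts:
        core_hits = sum(text.count(k) for k in core_keywords)
        if core_hits == 0:
            continue  # step 1: drop texts with no core keyword
        other_hits = sum(text.count(k) for k in non_core_keywords)
        # Assumed ratio definition; the source does not fix the formula.
        ratio = core_hits / (core_hits + other_hits)
        if ratio >= min_ratio:
            kept.append(text)  # step 2: keep texts at or above the set value
    return kept


# Core keyword: "dress"; non-core keywords: "phone case", "new style".
new_texts = ["连衣裙新款", "手机壳", "连衣裙手机壳手机壳手机壳"]
filtered = filter_texts(
    new_texts,
    core_keywords=["连衣裙"],
    non_core_keywords=["手机壳", "新款"],
    min_ratio=0.5,
)
```

If the intended ratio runs the other way (non-core relative to core), only the `ratio` line and the comparison would need to change.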
4. The method according to claim 3, characterized in that, after the filtered text messages are obtained, the method further comprises:
extracting the core keyword in each filtered text message and determining it as the label of that text message;
grouping the filtered text messages according to their labels.
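Labelling and grouping as in claim 4 can be sketched with a dictionary keyed by label. This is a sketch under an assumption the claim leaves open: when a text contains several core keywords, the first one in the keyword list is used as its label. The names `label_of` and `group_by_label` are hypothetical.

```python
from collections import defaultdict


def label_of(text, core_keywords):
    """Return the first core keyword found in the text, or None."""
    for keyword in core_keywords:
        if keyword in text:
            return keyword
    return None


def group_by_label(filtered_texts, core_keywords):
    """Group the filtered texts under the core keyword chosen as their label."""
    groups = defaultdict(list)
    for text in filtered_texts:
        label = label_of(text, core_keywords)
        if label is not None:
            groups[label].append(text)
    return dict(groups)


core_keywords = ["连衣裙", "新款", "运动鞋"]  # "dress", "new style", "sports shoes"
filtered_texts = ["连衣裙新款", "夏季连衣裙", "新款运动鞋"]
groups = group_by_label(filtered_texts, core_keywords)
```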
5. The method according to claim 1, characterized in that, before arranging the pending text messages in order and splitting them into single characters, the method further comprises:
classifying the text messages according to an attribute of the pending text messages, forming at least two groups of pending text messages;
and, after extracting core keywords from the merged character strings, the method further comprises:
comparing whether the core keywords of the groups of pending text messages are identical, and determining the differing core keywords as the core keywords of the attribute corresponding to that group of pending text messages.
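The comparison step of claim 5 can be sketched as a set-style difference across groups. The sketch assumes that the core keywords of each attribute group have already been extracted, and it interprets "differing" core keywords as those that appear in exactly one group; the function name `attribute_keywords` and the attribute names are hypothetical.

```python
from collections import Counter


def attribute_keywords(keywords_by_attribute):
    """For each attribute group, keep only the core keywords that no other
    group shares."""
    # Count in how many groups each keyword appears.
    group_count = Counter(
        k for kws in keywords_by_attribute.values() for k in set(kws)
    )
    return {
        attr: [k for k in kws if group_count[k] == 1]
        for attr, kws in keywords_by_attribute.items()
    }


# Hypothetical attribute groups with their extracted core keywords.
keywords_by_attribute = {
    "women": ["连衣裙", "新款"],  # "dress", "new style"
    "men": ["运动鞋", "新款"],    # "sports shoes", "new style"
}
specific = attribute_keywords(keywords_by_attribute)
```

Here 新款 ("new style") appears in both groups, so it is not treated as characteristic of either attribute.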
6. A keyword optimization processing device based on big data, characterized by comprising:
a single-character splitting module, configured to arrange the pending text messages in order and split them into single characters;
the single-character splitting module comprising:
a blank character setting unit, configured to arrange the pending text messages in order and set a blank character between the text messages;
a splitting unit, configured to split each text message into single characters according to the blank character;
a character string merging module, configured to remove the single characters of a set frequency according to the frequency of each single character, and to merge the remaining single characters into character strings;
the character string merging module comprising:
a blank character replacement unit, configured to remove the single characters of a set frequency according to the frequency of each single character, each removed single character being replaced with a blank character;
a merging unit, configured to merge, among the remaining single characters, the consecutive single characters between blank characters into one character string;
a keyword extracting module, configured to extract core keywords from the merged character strings;
the keyword extracting module comprising:
a character string deleting unit, configured to delete, from the merged character strings, the character strings whose number of characters is less than a set threshold;
an extracting unit, configured to extract, among the remaining character strings, the one character string with the highest frequency as a core keyword;
the device further comprising: a repeating module, configured to, after the one character string with the highest frequency is extracted as a core keyword, replace the core keyword with a blank character in the pending text messages and trigger repetition of the above operations of splitting into single characters, merging character strings, and extracting a core keyword;
the device further comprising: a text message removing module, configured to delete, from the newly added text messages, the text messages that do not contain a core keyword, after the core keywords have been extracted from the merged character strings;
a text message filtering module, configured to determine, in each remaining text message, the occurrence ratio of non-core keywords and core keywords, and to delete the text messages whose ratio is less than a set ratio value, so as to obtain the filtered text messages.
7. The device according to claim 6, characterized by further comprising:
a label determining module, configured to extract, after the filtered text messages are obtained, the core keyword in each filtered text message and determine it as the label of that text message;
a grouping module, configured to group the filtered text messages according to their labels.
8. The device according to claim 6, characterized by further comprising:
a text information processing module, configured to classify the text messages according to an attribute of the pending text messages, forming at least two groups of pending text messages, before the pending text messages are arranged in order and split into single characters;
a core keyword determining module, configured to compare, after the core keywords have been extracted from the merged character strings, whether the core keywords of the groups of pending text messages are identical, and to determine the differing core keywords as the core keywords of the attribute corresponding to that group of pending text messages.
CN201310696077.2A 2013-12-18 2013-12-18 A kind of keyword optimized treatment method and device based on big data Expired - Fee Related CN103631963B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310696077.2A CN103631963B (en) 2013-12-18 2013-12-18 A kind of keyword optimized treatment method and device based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310696077.2A CN103631963B (en) 2013-12-18 2013-12-18 A kind of keyword optimized treatment method and device based on big data

Publications (2)

Publication Number Publication Date
CN103631963A CN103631963A (en) 2014-03-12
CN103631963B true CN103631963B (en) 2017-10-17

Family

ID=50213004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310696077.2A Expired - Fee Related CN103631963B (en) 2013-12-18 2013-12-18 A kind of keyword optimized treatment method and device based on big data

Country Status (1)

Country Link
CN (1) CN103631963B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063370B (en) * 2014-07-01 2017-09-22 北京博雅立方科技有限公司 A kind of intelligent packet method and device based on keyword
CN106033403B (en) * 2015-03-20 2019-05-31 广州金山移动科技有限公司 A kind of text conversion method and device
WO2017128438A1 (en) * 2016-01-31 2017-08-03 深圳市博信诺达经贸咨询有限公司 Method and system for application of big data
CN110069676A (en) * 2017-09-28 2019-07-30 北京国双科技有限公司 Keyword recommendation method and device
CN108538300B (en) * 2018-02-27 2021-01-29 科大讯飞股份有限公司 Voice control method and device, storage medium and electronic equipment
CN109949806B (en) * 2019-03-12 2021-07-27 百度国际科技(深圳)有限公司 Information interaction method and device
CN112000794B (en) * 2020-07-30 2023-08-22 北京百度网讯科技有限公司 Text corpus screening method and device, electronic equipment and storage medium
CN113538062B (en) * 2021-07-28 2024-05-07 福州果集信息科技有限公司 Method for reversely pushing bid words purchased by commodity popularization notes

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101477566A (en) * 2009-01-19 2009-07-08 腾讯科技(深圳)有限公司 Method and apparatus used for putting candidate key words advertisement
CN101625683A (en) * 2008-07-09 2010-01-13 精实万维软件(北京)有限公司 Method for selecting bidding advertisement keyword during release of search engine bidding advertisement
CN102156721A (en) * 2011-03-29 2011-08-17 张栋 Method for accurately delivering Internet video advertisement based on label
CN102169496A (en) * 2011-04-12 2011-08-31 清华大学 Anchor text analysis-based automatic domain term generating method
CN103092956A (en) * 2013-01-17 2013-05-08 上海交通大学 Method and system for topic keyword self-adaptive expansion on social network platform

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100504851C (en) * 2007-06-27 2009-06-24 腾讯科技(深圳)有限公司 Chinese character word distinguishing method and system
CN101122900A (en) * 2007-09-25 2008-02-13 中兴通讯股份有限公司 Words partition system and method
JP2012256268A (en) * 2011-06-10 2012-12-27 Ad Space Co Ltd Advertisement distribution device and advertisement distribution program

Also Published As

Publication number Publication date
CN103631963A (en) 2014-03-12

Similar Documents

Publication Publication Date Title
CN103631963B (en) A kind of keyword optimized treatment method and device based on big data
CN109271512B (en) Emotion analysis method, device and storage medium for public opinion comment information
CN106682169B (en) Application label mining method and device, application searching method and server
CN108369709B (en) System and method for network-based advertisement data traffic latency reduction
US20130304469A1 (en) Information processing method and apparatus, computer program and recording medium
CN103023753B (en) Method, client and the system of interaction content association output in instant messaging
US20110153595A1 (en) System And Method For Identifying Topics For Short Text Communications
CN107766371A (en) A kind of text message sorting technique and its device
US10546336B2 (en) Search device, search method, program, and storage medium
CN103377249A (en) Keyword putting method and system
CN106682170B (en) Application search method and device
US20160085855A1 (en) Perspective data analysis and management
CN108364199A (en) A kind of data analysing method and system based on Internet user's comment
CN108256537A (en) A kind of user gender prediction method and system
CN105786793A (en) Method and device for analyzing semanteme of spoken language text information
CN103684969A (en) Message handling method and message handling system
CN107679217A (en) Association method for extracting content and device based on data mining
CN103123624A (en) Method of confirming head word, device of confirming head word, searching method and device
CN107422941A (en) Exchange method and system
CN106997339A (en) Text feature, file classification method and device
CN103150331A (en) Method and device for providing search engine tags
CN103246655A (en) Text categorizing method, device and system
CN109978624A (en) Information processing method, electronic equipment and computer readable storage medium
CN107729573A (en) Information-pushing method and device
CN106294676A (en) A kind of data retrieval method of ecommerce government system

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171017

Termination date: 20211218
