CN110399479A - Search data processing method, device, electronic equipment and computer-readable medium - Google Patents
- Publication number
- CN110399479A CN201810361882.2A
- Authority
- CN
- China
- Prior art keywords
- keyword
- data
- shop
- characteristic
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0623—Item investigation
- G06Q30/0625—Directed, with specific intent or strategy
Abstract
The present disclosure relates to a search data processing method, a device, an electronic device, and a computer-readable medium, and belongs to the field of computer information processing. The method comprises: determining a keyword set according to shop data; extracting predetermined features of each keyword in the keyword set to generate a keyword feature set; performing feature processing on the keyword feature set to generate feature data; and inputting the feature data into a prediction model to obtain click prediction data corresponding to each keyword, the prediction model being built by a classification algorithm. The disclosed method, device, electronic device, and computer-readable medium can accurately predict the click-through rate that a search keyword brings to a shop, improve the shop's conversion rate in the search system, and allow different intervention strategies to be formulated according to user gender and region.
Description
Technical field
The present disclosure relates to the field of computer information processing, and in particular to a search data processing method, a device, an electronic device, and a computer-readable medium.
Background
With the rapid growth of e-commerce, online shopping has become an important channel of everyday consumption. As the tool that lets users quickly find the goods they need in massive amounts of data, the search engine plays a very important role in distributing website traffic. Intervening in search results to direct traffic to specific shops has become one of the important means of traffic operation; however, the traditional manual screening of intervention keywords can no longer meet the needs of a fast-growing business.
In the prior art, shop search intervention relies mainly on manually picking keywords with a good overall conversion rate and then raising the ranking of the shop's related goods on the search results page for those keywords to increase the shop's exposure. The intervention effect is assessed by comparing the data before the intervention with the data for a period of time after it. The existing scheme cannot support precise operation: a keyword with a high overall conversion rate may convert poorly for the shop being intervened, which wastes a large amount of traffic. The existing scheme applies the same intervention strategy to users of different regions and genders, so it cannot give a shop personalized support. The existing scheme also cannot predict the intervention effect in advance, so the choice of intervention strategy is somewhat blind, and the effect can only be assessed some time after the intervention by comparing data before and after it, which introduces considerable lag and prevents the intervention strategy from being improved quickly.
Therefore, a new search data processing method, device, electronic device, and computer-readable medium are needed.
The above information disclosed in this Background section is only intended to enhance understanding of the background of the disclosure, and it may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
In view of this, the present disclosure provides a search data processing method, a device, an electronic device, and a computer-readable medium that can accurately predict the click-through rate a search keyword brings to a shop and improve the conversion rate of the shop's advertising.
Other features and advantages of the disclosure will become apparent from the following detailed description, or may be learned in part through practice of the disclosure.
According to one aspect of the disclosure, a search data processing method is proposed. The method comprises: determining a keyword set according to shop data; extracting predetermined features of each keyword in the keyword set to generate a keyword feature set; performing feature processing on the keyword feature set to generate feature data; and inputting the feature data into a prediction model to obtain click prediction data corresponding to each keyword, the prediction model being built by a classification algorithm.
In an exemplary embodiment of the disclosure, the method further comprises: generating the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm.
In an exemplary embodiment of the disclosure, generating the prediction model from historical shop data and a classification algorithm comprises: obtaining historical feature data from the historical shop data; and training the classification algorithm with the historical feature data as the independent variable and a predetermined user behavior as the output variable to generate the prediction model.
In an exemplary embodiment of the disclosure, determining the keyword set according to shop data comprises: performing data preprocessing on the shop data; extracting a plurality of first keywords from the preprocessed shop data; aggregating the plurality of first keywords respectively to obtain a plurality of first-keyword dimensional search volumes; and screening the plurality of first-keyword dimensional search volumes to obtain the keyword set.
In an exemplary embodiment of the disclosure, performing data preprocessing on the shop data comprises: preprocessing the user click logs and order logs in the shop data.
In an exemplary embodiment of the disclosure, aggregating the plurality of first keywords respectively to obtain a plurality of first-keyword dimensional search volumes comprises: aggregating the plurality of first keywords under predetermined dimensional features respectively to generate the plurality of first-keyword dimensional search volumes.
In an exemplary embodiment of the disclosure, screening the plurality of first-keyword dimensional search volumes to obtain the keyword set comprises: judging whether each first-keyword dimensional search volume satisfies a predetermined condition; and generating the keyword set from the first keywords corresponding to the dimensional search volumes that satisfy the predetermined condition.
In an exemplary embodiment of the disclosure, the predetermined condition involves: the traffic share of the first keyword, the click conversion rate, and the order conversion rate.
According to one aspect of the disclosure, a search data processing device is proposed. The device comprises: a set module for determining a keyword set according to shop data; an extraction module for extracting predetermined features of each keyword in the keyword set to generate a keyword feature set; a feature module for performing feature processing on the keyword feature set to generate feature data; and a prediction module for inputting the feature data into a prediction model to obtain click prediction data corresponding to each keyword, the prediction model being built by a classification algorithm.
In an exemplary embodiment of the disclosure, the device further comprises: a training module for generating the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm.
According to one aspect of the disclosure, an electronic device is proposed. The electronic device comprises: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to one aspect of the disclosure, a computer-readable medium is proposed on which a computer program is stored; the program implements the method described above when executed by a processor.
The search data processing method, device, electronic device, and computer-readable medium of the disclosure can accurately predict the click-through rate a search keyword brings to a shop and improve the conversion rate of the shop's advertising.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the disclosure.
Brief description of the drawings
The above and other objects, features, and advantages of the disclosure will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings. The drawings described below are only some embodiments of the disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a system block diagram of a search data processing method and device according to an exemplary embodiment.
Fig. 2 is a flowchart of a search data processing method according to an exemplary embodiment.
Fig. 3 is a flowchart of a search data processing method according to another exemplary embodiment.
Fig. 4 is a flowchart of a search data processing method according to another exemplary embodiment.
Fig. 5 is a block diagram of a search data processing device according to an exemplary embodiment.
Fig. 6 is a schematic diagram of a search data processing device according to another exemplary embodiment.
Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment.
Fig. 8 is a schematic diagram of a computer-readable storage medium according to an exemplary embodiment.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the figures denote the same or similar parts, so their repeated description is omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are given to provide a full understanding of the embodiments of the disclosure. Those skilled in the art will appreciate, however, that the technical solutions of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are merely illustrative; they need not include every item and operation/step, nor must the steps be executed in the order described. For example, some operations/steps may be decomposed and others may be merged or partially merged, so the actual order of execution may change according to the situation.
It should be understood that although the terms first, second, third, and so on may be used here to describe various components, these components should not be limited by these terms. The terms are used to distinguish one component from another. Thus, a first component discussed below could be termed a second component without departing from the teachings of the disclosed concept. As used here, the term "and/or" includes any one of, and all combinations of one or more of, the associated listed items.
Those skilled in the art will understand that the drawings are schematic diagrams of example embodiments, and that the modules or processes in the drawings are not necessarily required to implement the disclosure and therefore cannot be used to limit its scope of protection.
Fig. 1 is a system block diagram of a search data processing method and device according to an exemplary embodiment.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web browsers, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a back-end management server that supports the shopping website browsed by users on the terminal devices 101, 102, 103. The back-end management server may analyze and otherwise process received data such as product information query requests, and feed the processing result back to the terminal device.
The server 105 may, for example, determine a keyword set according to shop data; extract predetermined features of each keyword in the keyword set to generate a keyword feature set; perform feature processing on the keyword feature set to generate feature data; and input the feature data into a prediction model to obtain click prediction data corresponding to each keyword, the prediction model being built by a classification algorithm.
The server 105 may also, for example, generate the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm.
The server 105 may be a single physical server or may, for example, consist of multiple servers. It should be noted that the search data processing method provided by the embodiments of the disclosure may be executed by the server 105, and accordingly the search data processing device may be set in the server 105. The page through which users browse goods and the requesting end through which merchants make queries are generally located in the terminal devices 101, 102, 103.
Fig. 2 is a flowchart of a search data processing method according to an exemplary embodiment. The search data processing method 20 includes at least steps S202 to S208.
As shown in Fig. 2, in S202, a keyword set is determined according to shop data. For example, invalid data in the click and order logs may be rejected so that the system is not disturbed by junk data, and the data format may be further standardized to improve the efficiency of subsequent processing. For example, data preprocessing may be performed on the shop data; a plurality of first keywords may be extracted from the preprocessed shop data; the plurality of first keywords may be aggregated respectively to obtain a plurality of first-keyword dimensional search volumes; and the plurality of first-keyword dimensional search volumes may be screened to obtain the keyword set.
In one embodiment, performing data preprocessing on the shop data comprises: preprocessing the user click logs and order logs in the shop data. Because keyword data does not have strict timeliness requirements and the volume of user search click data is huge, an offline distributed big data processing framework may be used, for example. With the search keyword data of the most recent N days as the mining pool, hundreds of millions of records are batch-processed each day to provide a standardized, stable basic data service for systems such as online advertising, search intervention, and precise operation. The offline distributed big data processing framework may be, for example, a Hadoop-based Spark big data processing platform, and the real-time database exposed externally may be, for example, HBase or Redis.
In S204, the predetermined features of each keyword in the keyword set are extracted to generate a keyword feature set. The predetermined features may include, for example: search volume, number of searching users, click volume, number of clicking users, number of ordering users, order lines, transaction amount, average exposure depth, average click position, average order position, click conversion rate CTR (click volume / search volume), order conversion rate CVR (order lines / click volume), customer unit price (transaction amount / order lines), UV value (GMV / search UV), RPM (1000 × transaction amount / search volume), and per-capita search volume (search volume / number of searching users).
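The ratio features listed above can be sketched as follows. This is a minimal illustration only: the `KeywordStats` container, its field names, and the zero-denominator handling are assumptions and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass
class KeywordStats:
    """Hypothetical container for one keyword's base indicators."""
    searches: int       # search volume
    search_users: int   # number of searching users (search UV)
    clicks: int         # click volume
    order_lines: int    # order lines
    gmv: float          # transaction amount


def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when the denominator is zero (an assumption)."""
    return numerator / denominator if denominator else 0.0


def derived_features(s):
    """Compute the ratio features using the formulas given in the text."""
    return {
        "ctr": safe_div(s.clicks, s.searches),             # click volume / search volume
        "cvr": safe_div(s.order_lines, s.clicks),          # order lines / click volume
        "unit_price": safe_div(s.gmv, s.order_lines),      # transaction amount / order lines
        "uv_value": safe_div(s.gmv, s.search_users),       # GMV / search UV
        "rpm": safe_div(1000 * s.gmv, s.searches),         # 1000 * transaction amount / search volume
        "searches_per_user": safe_div(s.searches, s.search_users),
    }


stats = KeywordStats(searches=2000, search_users=1000, clicks=400, order_lines=40, gmv=8000.0)
features = derived_features(stats)
```

Returning 0.0 for an empty denominator keeps keywords with no clicks or orders in the feature table rather than dropping them; a real pipeline might prefer a missing-value marker instead.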
In S206, feature processing is performed on the keyword feature set to generate feature data. For example, indicators with a very large range (the difference between the maximum and minimum values), such as search volume and transaction amount, are first smoothed with a log function. Then, to eliminate the influence of differing magnitudes, all features are converted to the interval [0, 1] by min-max normalization:
$$Y = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
where $Y$ is the normalized value, $X$ is the value to be normalized, and $X_{\min}$ and $X_{\max}$ are the minimum and maximum of $X$, respectively.
The data in all keyword feature sets is normalized in this way to generate the feature data.
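The two feature-processing steps of S206 can be sketched in a few lines. The text only says "log function"; the use of `log1p` (so that zero counts stay defined) and the handling of a constant column are assumptions made for this illustration.

```python
import math


def log_smooth(values):
    """Smooth wide-range indicators; log1p is an assumed choice of log function."""
    return [math.log1p(v) for v in values]


def min_max_normalize(values):
    """Map values to [0, 1] with Y = (X - Xmin) / (Xmax - Xmin)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0 (an assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


search_volume = [10, 1_000, 100_000]
normalized = min_max_normalize(log_smooth(search_volume))
```

Smoothing before normalizing keeps a few very large keywords from compressing every other value toward zero.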
In S208, the feature data is input into the prediction model to obtain click prediction data corresponding to each keyword, the prediction model being built by a classification algorithm. The keywords determined through the above steps may, for example, be words awaiting intervention. The display position of a keyword's results on the search results page may also be set: for example, the top M positions of the search results page (such as the top 10) may be interventable positions, an intervention position is chosen, and the shop's goods are displayed at that position. The selected keyword and the position to be displayed are input into the preset prediction model to obtain the click prediction data corresponding to each keyword, and thereby to estimate the result of intervening on the keyword, that is, the traffic it brings to the shop.
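Scoring candidate (keyword, position) pairs with an already trained model could look like the sketch below. The feature layout, the weights, the bias, and the example keywords are all hypothetical values chosen for illustration; the patent does not specify them.

```python
import math


def predict_click_probability(features, weights, bias):
    """Logistic model: probability of a click given normalized features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical feature layout: [normalized CTR, normalized CVR, normalized position],
# where a normalized position of 0.0 is the top slot and 1.0 is the last interventable slot.
weights = [2.0, 1.0, -1.5]   # hypothetical trained weights
bias = -1.0                  # hypothetical trained intercept

candidates = {
    ("phone case", 1): [0.8, 0.5, 0.0],
    ("phone case", 10): [0.8, 0.5, 1.0],
    ("charger", 1): [0.3, 0.2, 0.0],
}

scores = {pair: predict_click_probability(x, weights, bias) for pair, x in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Ranking the pairs by predicted probability gives a quick way to compare intervention options before any of them is applied.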
According to the search data processing method of the disclosure, keywords are extracted by processing the shop's historical data and then input into a preset prediction model to obtain click prediction data. In this way the click-through rate a search keyword brings to a shop can be predicted accurately, and the conversion rate of the shop's advertising can be improved.
It should be clearly understood that the disclosure describes how to form and use particular examples, but the principles of the disclosure are not limited to any detail of these examples. Rather, based on the teachings of the disclosure, these principles can be applied to many other embodiments.
In an exemplary embodiment of the disclosure, the method further comprises: generating the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm. For example, historical feature data may be obtained from the historical shop data, and the classification algorithm may be trained with the historical feature data as the independent variable and a predetermined user behavior as the output variable to generate the prediction model.
Logistic regression is a typical classification algorithm: for a given set of independent variables, it outputs the probability of belonging to each class. In this application, binary logistic regression may be used, for example, where the class is either 0 or 1 and the conditional probability distribution is:
$$P(y = 1 \mid x; \theta) = \frac{1}{1 + e^{-\theta^{T} x}}, \qquad P(y = 0 \mid x; \theta) = 1 - P(y = 1 \mid x; \theta)$$
In this application, $x$ is the input independent variable (that is, the feature data) and $y \in \{0, 1\}$ is the output variable, which may describe whether the user clicks on the goods or whether the user places an order. $\theta$ is the weight coefficient corresponding to $x$. When training the model, $\theta$ may be estimated, for example, by maximum likelihood. The prediction model may also be built with other algorithms, for example a Bayesian algorithm or GBDT (gradient boosting decision tree); the building process is similar to the above and is not repeated here. The application is not limited in this respect.
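A minimal sketch of fitting binary logistic regression by maximum likelihood is shown below, using plain batch gradient ascent on the log-likelihood. The optimizer, learning rate, epoch count, and toy data are assumptions for illustration; the patent names only the estimation principle.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(theta, x):
    """P(y = 1 | x; theta), with theta[0] as the intercept."""
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Maximize the log-likelihood by batch gradient ascent."""
    theta = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = yi - predict(theta, xi)   # gradient of the log-likelihood per sample
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        theta = [t + lr * g / len(X) for t, g in zip(theta, grad)]
    return theta


# Toy data: one normalized feature, label 1 means the user clicked.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 1, 1, 1]
theta = train_logistic_regression(X, y)
```

In practice a library implementation with regularization would normally replace this loop; the sketch is only meant to show what the maximum likelihood estimate of $\theta$ optimizes.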
According to the search data processing method of the disclosure, the prediction model is obtained by training a classification algorithm on the shop's historical data, which yields a model that accurately predicts the relationship between keywords and click-through rate.
Fig. 3 is a flowchart of a search data processing method according to another exemplary embodiment. The process shown in Fig. 3 is a detailed description of S202, "determining the keyword set according to shop data", in the process shown in Fig. 2. Fig. 3 illustrates the steps for rejecting invalid data from the click and order logs, so that the system is not disturbed by junk data, and for further standardizing the data format to improve the efficiency of subsequent processing.
As shown in Fig. 3, in S302, click and order log data is obtained, for example users' search click and order log data for the most recent N days (such as 90 days).
In S304, it is judged whether the record is crawler data, for example by identifying whether the record was produced by a crawler fetching web page content. If so, the flow proceeds to step S310.
In S306, it is judged whether the reported data is complete. The completeness of the reported user information and keyword is checked, and records with no keyword or no user information are rejected; if the record is incomplete, the flow proceeds to step S310.
In S308, it is judged whether the record is cheating-user data, that is, whether the user (login account or unique browser identifier) is a cheating user (the list of cheating users can be obtained from risk control data). If so, the flow proceeds to step S310 and all of that user's records are discarded.
In S310, the record is discarded.
In S312, the keyword is standardized. For example, special characters (!@$ %^* ()=~` { } |:;" ') and prohibited words are filtered out, uppercase English letters are converted to lowercase, traditional Chinese characters are converted to simplified, leading and trailing spaces are removed and multiple internal spaces are merged into a single space, and overly long keywords (for example longer than 15 characters) are filtered out.
In S314, the valid, standardized click and order table is output.
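The filtering and standardization flow of S302 to S314 can be sketched as below. The record layout (`user_id`, `keyword`, `is_crawler`) is hypothetical, and two steps from the text are deliberately left out because they need external resources: prohibited-word filtering (needs a word list) and traditional-to-simplified conversion (needs a conversion table).

```python
import re

# Special characters named in the text; everything else is kept.
SPECIAL_CHARS = re.compile(r"[!@$%^*()=~`{}|:;\"']")
MAX_KEYWORD_LENGTH = 15  # example limit from the text


def normalize_keyword(keyword):
    """Return the standardized keyword, or None if it should be filtered out."""
    cleaned = SPECIAL_CHARS.sub("", keyword).lower()
    cleaned = re.sub(r"\s+", " ", cleaned).strip()   # trim and merge spaces
    if not cleaned or len(cleaned) > MAX_KEYWORD_LENGTH:
        return None
    return cleaned


def preprocess(records, cheating_users):
    """Drop crawler, incomplete, and cheating-user records, then standardize keywords."""
    valid = []
    for record in records:
        if record.get("is_crawler"):                               # S304
            continue
        if not record.get("keyword") or not record.get("user_id"):  # S306
            continue
        if record["user_id"] in cheating_users:                    # S308
            continue
        keyword = normalize_keyword(record["keyword"])             # S312
        if keyword is None:
            continue
        valid.append({**record, "keyword": keyword})
    return valid                                                   # S314


records = [
    {"user_id": "u1", "keyword": "  Phone   CASE!! ", "is_crawler": False},
    {"user_id": "u2", "keyword": "charger", "is_crawler": True},
    {"user_id": "", "keyword": "cable", "is_crawler": False},
    {"user_id": "u3", "keyword": "cable", "is_crawler": False},
    {"user_id": "u4", "keyword": "a very long keyword string", "is_crawler": False},
]
clean = preprocess(records, cheating_users={"u3"})
```

Only the first record survives here: the others are a crawler hit, a record with no user, a cheating user, and an over-length keyword.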
In one embodiment, aggregating the plurality of first keywords respectively to obtain a plurality of first-keyword dimensional search volumes comprises: aggregating the plurality of first keywords under predetermined dimensional features respectively to generate the plurality of first-keyword dimensional search volumes. For example, base indicator data such as search volume, number of searching users, click volume, number of clicking users, number of ordering users, order lines, and transaction amount may be computed under crossed dimensions of keyword with shop, user gender, region, and results-page position, and the indicators converted by a decay function.
In this step, the input data may be, for example, the valid, standardized click and order log table produced by preprocessing, the goods SKU-ID to shop dimension table, the user gender dimension table, and the IP address to region mapping table. The output data may be, for example, aggregated data in the following six dimensions: keyword-position, keyword-gender-position, keyword-region-position, keyword-shop-position, keyword-shop-gender-position, and keyword-shop-region-position.
First, the preprocessed, valid, standardized click and order table is joined with the goods, user gender, and region dimension tables to generate a shop click and order table that carries gender and region information.
Then the click and order table is grouped and aggregated by day along each of the six dimensions (keyword-position, keyword-gender-position, keyword-region-position, keyword-shop-position, keyword-shop-gender-position, and keyword-shop-region-position) to compute base indicators such as search volume, number of searching users, click volume, number of clicking users, number of ordering users, order lines, and transaction amount.
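The daily grouping along the six dimensions can be sketched with plain dictionaries. The row layout (one row per search event, already joined with shop, gender, and region) and all field names are assumptions for illustration; a production job would run the same grouping on a distributed engine.

```python
from collections import defaultdict

# The six aggregation dimensions named in the text, as tuples of row fields.
DIMENSIONS = {
    "keyword-position": ("keyword", "position"),
    "keyword-gender-position": ("keyword", "gender", "position"),
    "keyword-region-position": ("keyword", "region", "position"),
    "keyword-shop-position": ("keyword", "shop", "position"),
    "keyword-shop-gender-position": ("keyword", "shop", "gender", "position"),
    "keyword-shop-region-position": ("keyword", "shop", "region", "position"),
}


def aggregate(rows, dims):
    """Group rows by (date, *dims) and sum the base indicators."""
    totals = defaultdict(lambda: {"searches": 0, "clicks": 0, "order_lines": 0,
                                  "gmv": 0.0, "users": set()})
    for row in rows:
        key = (row["date"],) + tuple(row[d] for d in dims)
        t = totals[key]
        t["searches"] += 1                  # one row per search event (assumed layout)
        t["clicks"] += row["clicked"]
        t["order_lines"] += row["order_lines"]
        t["gmv"] += row["gmv"]
        t["users"].add(row["user_id"])
    # Replace the user set with its size to get the number of searching users.
    return {key: {"searches": t["searches"], "search_users": len(t["users"]),
                  "clicks": t["clicks"], "order_lines": t["order_lines"], "gmv": t["gmv"]}
            for key, t in totals.items()}


base = {"date": "2018-04-01", "keyword": "phone case", "shop": "s1", "region": "Beijing"}
rows = [
    {**base, "gender": "F", "position": 1, "user_id": "u1", "clicked": 1, "order_lines": 1, "gmv": 20.0},
    {**base, "gender": "M", "position": 1, "user_id": "u2", "clicked": 1, "order_lines": 0, "gmv": 0.0},
    {**base, "gender": "F", "position": 2, "user_id": "u1", "clicked": 0, "order_lines": 0, "gmv": 0.0},
]
tables = {name: aggregate(rows, dims) for name, dims in DIMENSIONS.items()}
```

Each entry of `tables` is one aggregated table keyed by date plus the dimension values, so the same function serves all six dimensions.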
Because keyword click data has a pronounced long-tail effect, data whose click count is below some threshold (such as 100) may be rejected during processing, reducing the data volume to improve the efficiency of subsequent processing.
The daily base indicators under each dimension may also be weighted, for example so that data within n days of the calculation date is not decayed while data between n and N days old is decayed. The application is not limited in this respect.
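The weighting formula itself is not available in this text, so the sketch below uses a stand-in: full weight up to n days, then exponential decay with a hypothetical half-life until N days, and zero weight beyond that. The decay shape, the half-life, and the default values of n and N are all assumptions, not the patent's formula.

```python
def decay_weight(age_days, n, N, half_life=30.0):
    """Weight for data that is age_days old relative to the calculation date.

    Assumed shape: 1.0 within n days, exponential decay between n and N days,
    0.0 outside the N-day window. Not the formula from the patent.
    """
    if age_days < 0 or age_days > N:
        return 0.0
    if age_days <= n:
        return 1.0
    return 0.5 ** ((age_days - n) / half_life)


def weighted_total(daily_values, n=7, N=90):
    """Sum a daily indicator, keyed by age in days, with decay weights applied."""
    return sum(value * decay_weight(age, n, N) for age, value in daily_values.items())


total = weighted_total({0: 10, 37: 10, 100: 10})
```

With these assumed parameters, a 37-day-old value counts for half and a 100-day-old value is ignored, so recent behavior dominates the aggregated indicator.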
In one embodiment, screening the plurality of first-keyword dimensional search volumes to obtain the keyword set comprises: judging whether each first-keyword dimensional search volume satisfies a predetermined condition; and generating the keyword set from the first keywords corresponding to the dimensional search volumes that satisfy the predetermined condition. The predetermined condition involves the traffic share of the first keyword, the click conversion rate, and the order conversion rate. For example, to keep the search ecosystem balanced and avoid harming the user experience, when the ratio of a keyword's traffic to the shop to the keyword's overall traffic has reached a given threshold, the keyword may not be intervened. Among the other candidate keywords that may be intervened, those whose click conversion rate (click-through rate, CTR) and order conversion rate (conversion rate, CVR) exceed the average level are selected to generate the keyword set.
Fig. 4 is a flowchart of a search data processing method according to another exemplary embodiment. The process shown in Fig. 4 is a detailed description of "screening the plurality of first-keyword dimensional search volumes to obtain the keyword set" within S202, "determining the keyword set according to shop data", of the process shown in Fig. 2. Fig. 4 illustrates the steps for selecting qualifying keywords as intervention candidates.
As shown in Fig. 4, in S402, the shop to be processed is selected.
In S404, a plurality of first keywords of the shop are determined from the shop data.
In S406, for each first keyword, it is judged whether the ratio of the keyword's traffic to the shop to the keyword's overall traffic has reached a given threshold. If the ratio has reached the threshold, the flow proceeds to S410; otherwise it proceeds to S408.
In S408, it is judged whether the CTR and CVR of the first keyword exceed the average level. If they do, the flow proceeds to S412; otherwise it proceeds to S410.
In S410, the first keyword is discarded.
In S412, the keyword set is generated. These keywords can serve as the shop's intervention keywords.
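The screening rule described in this embodiment (exclude keywords whose traffic share for the shop has reached a threshold, then keep those with above-average CTR and CVR) can be sketched as follows. The threshold value, the field names, and the choice to average over the remaining candidates are assumptions; the text does not say which population defines the "average level".

```python
def select_intervention_keywords(stats, share_threshold=0.3):
    """Return the keywords that pass the traffic-share and conversion checks.

    stats maps keyword -> dict with shop_traffic, total_traffic, ctr, cvr
    (hypothetical field names).
    """
    # Keep only keywords whose share of traffic going to the shop is below the threshold.
    candidates = {
        kw: s for kw, s in stats.items()
        if s["total_traffic"] > 0 and s["shop_traffic"] / s["total_traffic"] < share_threshold
    }
    if not candidates:
        return []
    # "Average level" is taken over the remaining candidates (an assumption).
    avg_ctr = sum(s["ctr"] for s in candidates.values()) / len(candidates)
    avg_cvr = sum(s["cvr"] for s in candidates.values()) / len(candidates)
    return sorted(kw for kw, s in candidates.items()
                  if s["ctr"] > avg_ctr and s["cvr"] > avg_cvr)


stats = {
    "phone case":  {"shop_traffic": 50,  "total_traffic": 1000, "ctr": 0.30, "cvr": 0.10},
    "charger":     {"shop_traffic": 600, "total_traffic": 1000, "ctr": 0.50, "cvr": 0.20},
    "cable":       {"shop_traffic": 20,  "total_traffic": 1000, "ctr": 0.10, "cvr": 0.02},
    "screen film": {"shop_traffic": 30,  "total_traffic": 1000, "ctr": 0.20, "cvr": 0.03},
}
keyword_set = select_intervention_keywords(stats)
```

Here "charger" is excluded because the shop already receives most of its traffic, and only "phone case" beats the average on both CTR and CVR.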
It is ad system, intervenes by polymerizeing to the various dimensions of keyword according to the search data processing method of the disclosure
System, operation personnel provide basic data support, value of each position to shop under quantized key word and word.
According to the search data processing method of the disclosure, by analysis of key word to the water conservancy diversion of different sexes and regional user
Effect precisely finds high potentiality user for shop, and orientation intervenes search result under keyword, improves conversion ratio, enhances shop pair
The approval of platform.
According to the search data processing method of the disclosure, clicking rate prediction is carried out by Logic Regression Models, is quickly calculated
The desired effect of various combination intervention stratege facilitates operation personnel according to the suitably intervention plan of different target makings.
The search data processing method of the disclosure searches for shopping path with user (search is clicked, adds shopping cart, place an order)
On data based on, keyword data is pre-processed and is standardized, and using keyword as core, from shop, Yong Huxing
Not, multiple cross-dimension polymerization analysis such as position under regional and keyword, precisely depict different sexes, position under area and word
It is the influence of shop diversion effect to keyword, and is ad system by importing data to Hbase or Redis, precisely runs
System, search interfering system provides real time data and supports.
It will be appreciated by those skilled in the art that realizing that all or part of the steps of above-described embodiment is implemented as being executed by CPU
Computer program.When the computer program is executed by CPU, above-mentioned function defined by the above method that the disclosure provides is executed
Energy.The program can store in a kind of computer readable storage medium, which can be read-only memory, magnetic
Disk or CD etc..
In addition, it should be noted that the above drawings are only schematic illustrations of the processing included in the methods according to exemplary embodiments of the present disclosure and are not intended to be limiting. It is easy to understand that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also easy to understand that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
The following are apparatus embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present disclosure.
Fig. 5 is a block diagram of a search data processing apparatus according to an exemplary embodiment. The search data processing apparatus 50 includes a set module 502, an extraction module 504, a feature module 506, and a prediction module 508. The search data processing apparatus 50 may also include, for example, a training module 510.
The set module 502 is configured to determine a keyword set according to shop data. For example, invalid data in the click and order logs can be removed so that the system is not disturbed by noisy data, and the data format can be further standardized to improve the efficiency of subsequent processing. For example, the shop data can be preprocessed; multiple first keywords can be extracted from the preprocessed shop data; each of the first keywords can be aggregated to obtain multiple first-keyword dimensional search volumes; and these dimensional search volumes can be screened to obtain the keyword set.
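The preprocessing, aggregation, and screening steps of the set module can be sketched as below. The log layout, the helper names `clean` and `keyword_set`, and the threshold values are all assumptions for illustration; the text only states that invalid data is removed and that keywords are screened against predetermined conditions.

```python
def clean(rows):
    """Drop rows with an empty keyword or negative counts and normalize the keyword text."""
    cleaned = []
    for row in rows:
        keyword = str(row.get("keyword") or "").strip().lower()
        counts = [row.get(field, 0) for field in ("searches", "clicks", "orders")]
        if not keyword or any(c < 0 for c in counts):
            continue
        cleaned.append({"keyword": keyword, "searches": counts[0],
                        "clicks": counts[1], "orders": counts[2]})
    return cleaned


def keyword_set(rows, min_share=0.1, min_ctr=0.05, min_cvr=0.01):
    """Aggregate per keyword, then keep keywords that pass every (hypothetical) threshold."""
    totals = {}
    for row in clean(rows):
        agg = totals.setdefault(row["keyword"], {"searches": 0, "clicks": 0, "orders": 0})
        for field in agg:
            agg[field] += row[field]
    all_searches = sum(agg["searches"] for agg in totals.values())
    selected = set()
    for keyword, agg in totals.items():
        if agg["searches"] == 0:
            continue  # nothing to measure for this keyword
        share = agg["searches"] / all_searches
        ctr = agg["clicks"] / agg["searches"]
        cvr = agg["orders"] / agg["clicks"] if agg["clicks"] else 0.0
        if share >= min_share and ctr >= min_ctr and cvr >= min_cvr:
            selected.add(keyword)
    return selected


LOG = [
    {"keyword": " Sneakers ", "searches": 600, "clicks": 90, "orders": 9},
    {"keyword": "sneakers", "searches": 200, "clicks": 30, "orders": 3},
    {"keyword": "boots", "searches": 150, "clicks": 3, "orders": 0},
    {"keyword": "", "searches": 50, "clicks": 5, "orders": 1},
    {"keyword": "sandals", "searches": -1, "clicks": 0, "orders": 0},
    {"keyword": "hats", "searches": 50, "clicks": 10, "orders": 2},
]
selected_keywords = keyword_set(LOG)
```

In this toy log, "boots" fails the click conversion threshold and "hats" fails the traffic share threshold, so only "sneakers" remains.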
The extraction module 504 is configured to extract the predetermined features of each keyword in the keyword set to generate a keyword feature set. The predetermined features may be, for example: search volume, number of searching users, click volume, number of clicking users, number of ordering users, number of order lines, transaction amount, average exposure depth, average click position, average order position, click conversion rate CTR (click volume / search volume), order conversion rate CVR (order lines / click volume), average order price (transaction amount / order lines), UV value (GMV / search UV), RPM (1000 * transaction amount / search volume), and per-capita search volume (search volume / searching users).
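The ratio features listed above can be computed directly from the raw counters. The sketch below follows the formulas in the text; the dictionary field names and the zero-denominator guard are assumptions added for the example.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_features(raw):
    """Compute the ratio features of one keyword from its raw counters."""
    return {
        "ctr": ratio(raw["clicks"], raw["searches"]),              # click volume / search volume
        "cvr": ratio(raw["order_lines"], raw["clicks"]),           # order lines / click volume
        "avg_order_price": ratio(raw["gmv"], raw["order_lines"]),  # transaction amount / order lines
        "uv_value": ratio(raw["gmv"], raw["search_users"]),        # GMV / search UV
        "rpm": ratio(1000 * raw["gmv"], raw["searches"]),          # 1000 * transaction amount / search volume
        "searches_per_user": ratio(raw["searches"], raw["search_users"]),
    }


raw_counters = {"searches": 2000, "search_users": 500, "clicks": 400,
                "order_lines": 40, "gmv": 8000.0}
features = derived_features(raw_counters)
```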
The feature module 506 is configured to perform feature processing on the keyword feature set to generate feature data. For example, indicators with a very large range (that is, the difference between the maximum and minimum values), such as search volume and transaction amount, are first smoothed with a log function. Then, to eliminate the influence of differing magnitudes, all features are transformed into the interval [0, 1] by min-max normalization.
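A minimal sketch of this feature processing is shown below. The text only says "log function"; using `log(1 + x)` so that zero counts stay defined is an assumption of this example, as is mapping a constant column to 0.0.

```python
import math


def log_smooth(values):
    """Compress wide-ranging counts; log1p (an assumed choice) keeps zero well defined."""
    return [math.log1p(v) for v in values]


def min_max(values):
    """Rescale values linearly into [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


search_volume = [0, 9, 99, 999999]
scaled = min_max(log_smooth(search_volume))
```

Without the log step the three smaller keywords would all be squeezed next to 0 by the largest one; after smoothing they are spread across the interval.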
The prediction module 508 is configured to input the feature data into the prediction model to obtain the click prediction data corresponding to each keyword; the prediction model is established by a classification algorithm. The keywords determined by the steps above may be, for example, the words on which intervention is to be performed. The display position of a keyword on the search results page can also be set. For example, the top M positions of the search results page (such as the top 10) may be designated as intervenable positions, an intervention position is set, and the shop's products are displayed at that position. The selected keyword and the position to be displayed are input into the preset prediction model to obtain the click prediction data corresponding to each keyword, which in turn gives an estimate of the traffic that intervening on the keyword would bring to the shop.
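The scoring step can be illustrated with a logistic model applied to a keyword's features plus a candidate position. The coefficients, the bias, the feature names, and the way the position is scaled are hypothetical; a real model would learn its parameters from historical shop data.

```python
import math

# Hypothetical coefficients for illustration only.
WEIGHTS = {"ctr": 2.0, "search_volume": 1.0, "position": -1.5}
BIAS = -2.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_click(features, position, top_m=10):
    """Predicted click probability for a keyword shown at one of the top M positions."""
    if not 1 <= position <= top_m:
        raise ValueError("position must lie within the intervenable top M positions")
    x = dict(features, position=(position - 1) / max(top_m - 1, 1))
    z = BIAS + sum(WEIGHTS[name] * x[name] for name in WEIGHTS)
    return sigmoid(z)


def expected_clicks(features, searches, position, top_m=10):
    """Rough traffic estimate: predicted click probability times search volume."""
    return predict_click(features, position, top_m) * searches


keyword_features = {"ctr": 0.5, "search_volume": 1.0}  # already scaled to [0, 1]
p_top = predict_click(keyword_features, position=1)
p_last = predict_click(keyword_features, position=10)
```

Evaluating `expected_clicks` for several keyword and position combinations gives a quick way to compare candidate intervention plans before applying one.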
The training module 510 is configured to generate the prediction model from historical shop data and a classification algorithm, where the classification algorithm includes a logistic regression algorithm. For example, historical feature data are obtained from the historical shop data, and the prediction model is generated by training the classification algorithm with the historical feature data as independent variables and a predetermined user behavior as the output variable.
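As a sketch of such training, the example below fits a logistic regression with plain batch gradient descent. The optimizer, learning rate, epoch count, and the toy data are assumptions; the text does not specify how the model is fitted, only that historical features are the inputs and a user behavior (here, a click) is the output.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predicted probability of the target behavior for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += error * xij
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy history: [scaled CTR, scaled position] -> clicked (1) or not (0).
X_hist = [[0.9, 0.0], [0.8, 0.1], [0.7, 0.2], [0.2, 0.9], [0.1, 1.0], [0.3, 0.8]]
y_hist = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(X_hist, y_hist)
```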
According to the search data processing apparatus of the present disclosure, historical shop data are processed to extract keywords, and the keywords are then input into the preset prediction model to obtain click prediction data. In this way, the click-through rate that a search keyword brings to a shop can be predicted accurately, which improves the conversion rate of the shop's advertising.
Fig. 6 is a schematic diagram of a search data processing apparatus according to another exemplary embodiment. Fig. 6 illustrates, by way of example, the processing framework of the search data processing method in the present application. The search data processing apparatus 60 may include, for example, a data warehouse module 602, a data processing and modeling module 604, a distributed data system 606, and a data application scenario module 608.
The data warehouse module 602 may be used to store raw data. The raw data may be, for example, shop data, specifically including search data, click data, add-to-cart data, order data, and so on.
The data processing and modeling module 604 may be used to preprocess the raw data and then aggregate and compute on the data. After computation it can output gender-personalized shop intervention words, region-personalized shop intervention words, and gender-and-region-personalized shop intervention words. The computed data can also, for example, undergo keyword feature extraction and then be input into the prediction model to obtain an estimate of the click conversion rate.
The distributed data system 606 may be used for personalized management of shop intervention words, and may also, for example, estimate the results of shop intervention words.
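One way to organize personalized intervention words in a key-value store is sketched below. A plain dictionary stands in for the distributed store; the key layout, the `*` wildcard, and the fallback order (gender and region, then gender, then region, then shop-wide) are assumptions of this example and are not described in the text.

```python
def make_key(shop, gender="*", region="*"):
    """Build a hypothetical key; '*' means the dimension is not restricted."""
    return f"intervene:{shop}:{gender}:{region}"


def put_words(store, shop, words, gender="*", region="*"):
    """Store the intervention words for one shop and personalization level."""
    store[make_key(shop, gender, region)] = list(words)


def get_words(store, shop, gender, region):
    """Return the most specific intervention words available for this user segment."""
    for g, r in ((gender, region), (gender, "*"), ("*", region), ("*", "*")):
        words = store.get(make_key(shop, g, r))
        if words is not None:
            return words
    return []


store = {}  # stand-in for a distributed key-value store
put_words(store, "shop_a", ["dress"], gender="F")
put_words(store, "shop_a", ["down jacket"], region="north")
put_words(store, "shop_a", ["wool coat"], gender="F", region="north")
```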
The data application scenario module 608 may be used to support the advertisement delivery system, the search intervention system, the precise operation system, and so on.
Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment.
The electronic device 200 according to this embodiment of the present disclosure is described below with reference to Fig. 7. The electronic device 200 shown in Fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in Fig. 7, the electronic device 200 takes the form of a general-purpose computing device. The components of the electronic device 200 may include, but are not limited to: at least one processing unit 210, at least one storage unit 220, a bus 230 connecting different system components (including the storage unit 220 and the processing unit 210), a display unit 240, and so on.
The storage unit stores program code that can be executed by the processing unit 210, so that the processing unit 210 performs the steps according to the various exemplary embodiments of the present disclosure described in the search data processing method section of this specification. For example, the processing unit 210 may perform the steps shown in Fig. 2, Fig. 3, and Fig. 4.
The storage unit 220 may include readable media in the form of volatile memory, such as a random access memory (RAM) 2201 and/or a cache memory 2202, and may further include a read-only memory (ROM) 2203.
The storage unit 220 may also include a program/utility 2204 having a set of (at least one) program modules 2205. Such program modules 2205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data. Each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 230 may represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 200 may also communicate with one or more external devices 300 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 200, and/or with any device (such as a router, a modem, etc.) that enables the electronic device 200 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 250. The electronic device 200 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 260. The network adapter 260 may communicate with the other modules of the electronic device 200 via the bus 230. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the electronic device 200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product. The software product may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and includes instructions that cause a computing device (such as a personal computer, a server, or a network device) to perform the above method according to the embodiments of the present disclosure.
Fig. 8 schematically shows a computer-readable storage medium according to an exemplary embodiment of the present disclosure.
Referring to Fig. 8, a program product 400 for implementing the above method according to an embodiment of the present disclosure is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by, or in connection with, an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
The computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for carrying out the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the C language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The above computer-readable medium carries one or more programs. When the one or more programs are executed by a device, they cause the computer-readable medium to implement the following functions: determining a keyword set according to shop data; extracting the predetermined features of each keyword in the keyword set to generate a keyword feature set; performing feature processing on the keyword feature set to generate feature data; and inputting the feature data into a prediction model to obtain the click prediction data corresponding to each keyword, where the prediction model is established by a classification algorithm.
Those skilled in the art will understand that the above modules may be distributed in the apparatus as described in the embodiments, or may be changed accordingly and located in one or more apparatuses different from this embodiment. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product. The software product may be stored on a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a portable hard disk) or on a network, and includes instructions that cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to perform the method according to the embodiments of the present disclosure.
The exemplary embodiments of the present disclosure have been specifically shown and described above. It should be understood that the present disclosure is not limited to the detailed structures, arrangements, or implementation methods described herein; rather, the present disclosure is intended to cover various modifications and equivalent arrangements within the spirit and scope of the appended claims.
In addition, the structures, proportions, sizes, and the like shown in the drawings of this specification are intended only to accompany the content disclosed in the specification, for the understanding and reading of those skilled in the art, and are not intended to limit the conditions under which the present disclosure can be implemented; they therefore have no substantive technical meaning. Any modification of structure, change of proportion, or adjustment of size that does not affect the technical effects the present disclosure can produce or the purposes it can achieve shall still fall within the scope covered by the technical content disclosed herein. Likewise, terms such as "upper", "first", "second", and "a" used in this specification are only for ease of description and are not intended to limit the implementable scope of the present disclosure. Changes or adjustments to their relative relationships, without substantive change to the technical content, shall also be regarded as within the implementable scope of the present disclosure.
Claims (12)
1. A search data processing method, characterized by comprising:
determining a keyword set according to shop data;
extracting predetermined features of each keyword in the keyword set to generate a keyword feature set;
performing feature processing on the keyword feature set to generate feature data; and
inputting the feature data into a prediction model to obtain click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm.
2. The method of claim 1, characterized by further comprising:
generating the prediction model from historical shop data and the classification algorithm, the classification algorithm including a logistic regression algorithm.
3. The method of claim 2, characterized in that generating the prediction model from historical shop data and the classification algorithm comprises:
obtaining historical feature data from the historical shop data; and
generating the prediction model by training the classification algorithm with the historical feature data as independent variables and a predetermined user behavior as the output variable.
4. The method of claim 1, characterized in that determining the keyword set according to the shop data comprises:
performing data preprocessing on the shop data;
extracting a plurality of first keywords from the preprocessed shop data;
aggregating each of the plurality of first keywords to obtain a plurality of first-keyword dimensional search volumes; and
screening the plurality of first-keyword dimensional search volumes to obtain the keyword set.
5. The method of claim 4, characterized in that performing data preprocessing on the shop data comprises:
performing data preprocessing on the user click logs and order logs in the shop data.
6. The method of claim 4, characterized in that aggregating each of the plurality of first keywords to obtain the plurality of first-keyword dimensional search volumes comprises:
aggregating each of the plurality of first keywords under predetermined dimensional features to generate the plurality of first-keyword dimensional search volumes.
7. The method of claim 6, characterized in that screening the plurality of first-keyword dimensional search volumes to obtain the keyword set comprises:
determining, for each first-keyword dimensional search volume, whether it satisfies a predetermined condition; and
generating the keyword set from the first keywords corresponding to the dimensional search volumes that satisfy the predetermined condition.
8. The method of claim 7, characterized in that the predetermined condition includes:
the traffic share, the click conversion rate, and the order conversion rate of the first keyword.
9. A search data processing apparatus, characterized by comprising:
a set module, configured to determine a keyword set according to shop data;
an extraction module, configured to extract predetermined features of each keyword in the keyword set to generate a keyword feature set;
a feature module, configured to perform feature processing on the keyword feature set to generate feature data; and
a prediction module, configured to input the feature data into a prediction model to obtain click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm.
10. The apparatus of claim 9, characterized by further comprising:
a training module, configured to generate the prediction model from historical shop data and the classification algorithm, the classification algorithm including a logistic regression algorithm.
11. An electronic device, characterized by comprising:
one or more processors; and
a storage device, configured to store one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1 to 8.
12. A computer-readable medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810361882.2A CN110399479A (en) | 2018-04-20 | 2018-04-20 | Search for data processing method, device, electronic equipment and computer-readable medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110399479A (en) | 2019-11-01 |
Family
ID=68319490
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810361882.2A Pending CN110399479A (en) | 2018-04-20 | 2018-04-20 | Search for data processing method, device, electronic equipment and computer-readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110399479A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||