CN110399479A - Search for data processing method, device, electronic equipment and computer-readable medium - Google Patents


Info

Publication number
CN110399479A
Authority
CN
China
Prior art keywords
keyword
data
shop
characteristic
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810361882.2A
Other languages
Chinese (zh)
Inventor
魏毅
邵荣防
郝晖
罗宝胜
邓旺文
刘爽爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201810361882.2A
Publication of CN110399479A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0623Item investigation
    • G06Q30/0625Directed, with specific intent or strategy

Abstract

The present disclosure relates to a search data processing method, apparatus, electronic device, and computer-readable medium, and belongs to the field of computer information processing. The method comprises: determining a keyword set according to shop data; extracting predetermined features of each keyword in the keyword set to generate a keyword feature set; performing feature processing on the keyword feature set to generate feature data; and inputting the feature data into a prediction model to obtain the click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm. The disclosed method, apparatus, electronic device, and computer-readable medium can accurately predict the click-through rate that a search keyword brings to a shop, improve the shop's conversion rate in the search system, and allow different intervention strategies to be formulated according to user gender and region.

Description

Search data processing method, apparatus, electronic device, and computer-readable medium
Technical field
The present disclosure relates to the field of computer information processing, and in particular to a search data processing method, apparatus, electronic device, and computer-readable medium.
Background
With the rapid growth of e-commerce on the internet, online shopping has become an important channel of everyday consumption. As an effective tool for users to quickly find the goods they need among massive amounts of data, the search engine plays a very important role in distributing website traffic. Steering traffic to specific shops by intervening in search results has become one of the important means of traffic operation; however, the traditional manual screening of intervention keywords can no longer meet the needs of a fast-growing business.
In the prior art, shop search intervention mainly relies on manually selecting keywords with a good overall conversion rate, and then raising the ranking of the shop's relevant goods on the search results page for those keywords in order to increase the shop's exposure. The intervention effect is assessed by comparing the data for a period of time before and after the intervention. This existing scheme has several shortcomings. It cannot achieve precise operation: a keyword with a high overall conversion rate may convert poorly for the shop being intervened on, which wastes a large amount of traffic. It applies the same intervention strategy to users of different regions and genders, so it cannot give a shop personalized support. And it cannot know the intervention effect in advance: the chosen intervention strategy is somewhat blind, and its effect can only be assessed by comparing data before and after the intervention once some time has passed, which introduces considerable lag and prevents the strategy from being improved quickly.
Therefore, a new search data processing method, apparatus, electronic device, and computer-readable medium is needed.
The above information is only intended to aid understanding of the background of the present disclosure, and the background section may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary of the invention
In view of this, the present disclosure provides a search data processing method, apparatus, electronic device, and computer-readable medium that can accurately predict the click-through rate a search keyword brings to a shop and improve the conversion rate of the shop's advertising placements.
Other features and advantages of the present disclosure will become apparent from the following detailed description, or will be learned in part through practice of the disclosure.
According to one aspect of the present disclosure, a search data processing method is proposed, the method comprising: determining a keyword set according to shop data; extracting predetermined features of each keyword in the keyword set to generate a keyword feature set; performing feature processing on the keyword feature set to generate feature data; and inputting the feature data into a prediction model to obtain the click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm.
In an exemplary embodiment of the present disclosure, the method further comprises: generating the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm.
In an exemplary embodiment of the present disclosure, generating the prediction model from historical shop data and a classification algorithm comprises: obtaining historical feature data from the historical shop data; and, with the historical feature data as the independent variables and a predetermined user behavior as the output variable, generating the prediction model by training the classification algorithm.
In an exemplary embodiment of the present disclosure, determining the keyword set according to shop data comprises: pre-processing the shop data; extracting a plurality of first keywords from the pre-processed shop data; aggregating the plurality of first keywords respectively to obtain a plurality of first-keyword dimensional search volumes; and screening the plurality of first-keyword dimensional search volumes to obtain the keyword set.
In an exemplary embodiment of the present disclosure, pre-processing the shop data comprises: pre-processing the user click logs and order logs in the shop data.
In an exemplary embodiment of the present disclosure, aggregating the plurality of first keywords respectively to obtain a plurality of first-keyword dimensional search volumes comprises: aggregating each of the plurality of first keywords under predetermined dimensional features to generate the plurality of first-keyword dimensional search volumes.
In an exemplary embodiment of the present disclosure, screening the plurality of first-keyword dimensional search volumes to obtain the keyword set comprises: judging, for each first-keyword dimensional search volume, whether it satisfies a predetermined condition; and generating the keyword set from the first keywords corresponding to the dimensional search volumes that satisfy the predetermined condition.
In an exemplary embodiment of the present disclosure, the predetermined condition involves: the traffic share of the first keyword, the click conversion rate, and the order conversion rate.
According to one aspect of the present disclosure, a search data processing apparatus is proposed, the apparatus comprising: a collection module for determining a keyword set according to shop data; an extraction module for extracting predetermined features of each keyword in the keyword set to generate a keyword feature set; a feature module for performing feature processing on the keyword feature set to generate feature data; and a prediction module for inputting the feature data into a prediction model to obtain the click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm.
In an exemplary embodiment of the present disclosure, the apparatus further comprises: a training module for generating the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm.
According to one aspect of the present disclosure, an electronic device is proposed, the electronic device comprising: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described above.
According to one aspect of the present disclosure, a computer-readable medium is proposed, on which a computer program is stored, the program implementing the method described above when executed by a processor.
The search data processing method, apparatus, electronic device, and computer-readable medium of the present disclosure can accurately predict the click-through rate a search keyword brings to a shop and improve the conversion rate of the shop's advertising placements.
It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present disclosure.
Brief description of the drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the detailed description of its example embodiments with reference to the accompanying drawings. The drawings described below are only some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a system block diagram of a search data processing method and apparatus according to an exemplary embodiment.
Fig. 2 is a flow chart of a search data processing method according to an exemplary embodiment.
Fig. 3 is a flow chart of a search data processing method according to another exemplary embodiment.
Fig. 4 is a flow chart of a search data processing method according to another exemplary embodiment.
Fig. 5 is a block diagram of a search data processing apparatus according to an exemplary embodiment.
Fig. 6 is a schematic diagram of a search data processing apparatus according to another exemplary embodiment.
Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment.
Fig. 8 is a schematic diagram of a computer-readable storage medium according to an exemplary embodiment.
Detailed description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, example embodiments can be implemented in many forms and should not be understood as limited to the embodiments set forth herein; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. Identical reference numerals in the drawings denote identical or similar parts, so their repeated description is omitted.
Furthermore, the described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will appreciate that the technical solution of the disclosure may be practiced without one or more of these specific details, or with other methods, components, apparatuses, steps, and so on. In other cases, well-known methods, apparatuses, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of the disclosure.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flow charts shown in the drawings are merely illustrative: they need not include all contents and operations/steps, nor must these be executed in the order described. For example, some operations/steps can be split, and some can be merged or partly merged, so the actual order of execution may change according to the circumstances.
It should be understood that although the terms first, second, third, and so on may be used herein to describe various components, these components should not be limited by these terms. The terms serve to distinguish one component from another. A first component discussed below could therefore be called a second component without departing from the teaching of the disclosed concept. As used herein, the term "and/or" includes any one of, and all combinations of one or more of, the associated listed items.
Those skilled in the art will understand that the drawings are schematic diagrams of example embodiments, and that the modules or flows in the drawings are not necessarily required to practice the disclosure and therefore cannot be used to limit its scope of protection.
Fig. 1 is a system block diagram of a search data processing method and apparatus according to an exemplary embodiment.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as shopping applications, web browsers, search applications, instant messaging tools, mailbox clients, and social platform software.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a back-end management server that supports the shopping websites users browse with the terminal devices 101, 102, 103. The back-end management server can analyze and otherwise process received data, such as product information query requests, and feed the processing results back to the terminal device.
The server 105 may, for example, determine a keyword set according to shop data; extract predetermined features of each keyword in the keyword set to generate a keyword feature set; perform feature processing on the keyword feature set to generate feature data; and input the feature data into a prediction model to obtain the click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm.
The server 105 may also, for example, generate the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm.
The server 105 may be a single physical server or may, for example, consist of multiple servers. It should be noted that the search data processing method provided by the embodiments of the present disclosure may be executed by the server 105, and accordingly the search data processing apparatus may be arranged in the server 105. The page front end provided to users for browsing goods and the request end used for merchant queries are generally located in the terminal devices 101, 102, 103.
Fig. 2 is a flow chart of a search data processing method according to an exemplary embodiment. The search data processing method 20 includes at least steps S202 to S208.
As shown in Fig. 2, in S202 a keyword set is determined according to shop data. For example, invalid data in the click and order logs may be rejected so that the system is not disturbed by useless data, and the data format may be further standardized to improve the efficiency of subsequent processing. For example, the shop data may be pre-processed; a plurality of first keywords may be extracted from the pre-processed shop data; the plurality of first keywords may be aggregated respectively to obtain a plurality of first-keyword dimensional search volumes; and the plurality of first-keyword dimensional search volumes may be screened to obtain the keyword set.
In one embodiment, pre-processing the shop data comprises pre-processing the user click logs and order logs in the shop data. Since keyword data does not have strong timeliness requirements while the volume of user search and click data is huge, an offline distributed big-data processing framework may be used, for example: with the search keyword data of the most recent N days as the mining pool, hundreds of millions of records are batch-processed daily to provide standardized, stable basic data services for systems such as online advertising, search intervention, and precise operation. The offline distributed big-data processing framework may be, for example, a Hadoop-based Spark big-data processing platform, and the real-time database exposed to other systems may be, for example, HBase or Redis.
In S204, the predetermined features of each keyword in the keyword set are extracted to generate a keyword feature set. The predetermined features may be, for example: search volume, number of searching users, click volume, number of clicking users, number of ordering users, number of order lines, transaction amount, average exposure depth, average click position, average order position, click conversion rate CTR (click volume / search volume), order conversion rate CVR (order lines / click volume), average order value (transaction amount / order lines), UV value (GMV / search UV), RPM (1000 * transaction amount / search volume), and per-capita search volume (search volume / searching users).
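The ratio features listed in S204 follow directly from the raw counters. The following is a minimal sketch of that derivation, not the patent's implementation: the `KeywordStats` class, its field names, and the zero-denominator handling in `safe_div` are assumptions introduced for illustration, and GMV is treated as the same quantity as the transaction amount.

```python
from dataclasses import dataclass


@dataclass
class KeywordStats:
    """Raw counters for one keyword (hypothetical field names)."""
    searches: int       # search volume
    search_users: int   # number of searching users (search UV)
    clicks: int         # click volume
    order_lines: int    # number of order lines
    gmv: float          # transaction amount


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero (an assumption)."""
    return num / den if den else 0.0


def derived_features(s: KeywordStats) -> dict:
    """Compute the ratio features named in S204 from the raw counters."""
    return {
        "ctr": safe_div(s.clicks, s.searches),             # click volume / search volume
        "cvr": safe_div(s.order_lines, s.clicks),          # order lines / click volume
        "avg_order_value": safe_div(s.gmv, s.order_lines), # transaction amount / order lines
        "uv_value": safe_div(s.gmv, s.search_users),       # GMV / search UV
        "rpm": safe_div(1000 * s.gmv, s.searches),         # 1000 * transaction amount / search volume
        "searches_per_user": safe_div(s.searches, s.search_users),
    }
```

Returning 0.0 for an empty denominator keeps long-tail keywords with no clicks or orders from raising errors; a real pipeline might prefer to drop such rows instead.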
In S206, feature processing is performed on the keyword feature set to generate feature data. For example, indicators whose range (the difference between the maximum and the minimum) is too large, such as search volume and transaction amount, are first smoothed with a log function. Then, to eliminate the effect of differing magnitudes, all features are converted to the interval [0, 1] by min-max normalization:
$$Y = \frac{X - X_{min}}{X_{max} - X_{min}}$$
where Y is the normalized value, X is the value to be normalized, and X_min and X_max are the minimum and maximum of X, respectively.
The data in all keyword feature sets is normalized in this way to generate the feature data.
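The two feature-processing steps of S206 can be sketched as follows. This is an illustrative sketch under assumptions: the text only says "log function", so the use of log(1 + x) to tolerate zero counts is a choice made here, as is mapping a constant column to 0.0.

```python
import math


def log_smooth(values):
    """Compress wide-ranging non-negative values with log(1 + x) (assumed form)."""
    return [math.log1p(v) for v in values]


def min_max_normalize(values):
    """Map values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: the formula is undefined, so return zeros (an assumption).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Example: smooth a heavy-tailed indicator such as search volume, then normalize.
search_volume = [0, 9, 99]
feature = min_max_normalize(log_smooth(search_volume))
```

Without the log step, the middle value would normalize to about 0.09; with it, the three values are spread evenly across the interval, which is the point of smoothing before normalizing.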
In S208, the feature data is input into the prediction model to obtain the click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm. The keywords determined by the above steps may be, for example, the words on which intervention is to be carried out. The display position on the search results page may also be set: for example, the top M positions of the search results page (such as the first 10) may be designated as intervenable positions, an intervention position is chosen among them, and the shop's goods are shown at that position. The selected keyword and the position to be displayed are input into the preset prediction model to obtain the click prediction data corresponding to each keyword, and thereby to estimate the outcome of intervening on the keyword, that is, the traffic it brings to the shop.
According to the search data processing method of the present disclosure, shop historical data is processed to extract keywords, and the keywords are then input into a preset prediction model to obtain click prediction data. In this way the click-through rate a search keyword brings to a shop can be accurately predicted, and the conversion rate of the shop's advertising placements can be improved.
It should be clearly understood that the present disclosure describes how to make and use particular examples, but the principles of the disclosure are not limited to any details of these examples. Rather, based on the teaching of the present disclosure, these principles can be applied to many other embodiments.
In an exemplary embodiment of the present disclosure, the method further comprises: generating the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm. For example, historical feature data may be obtained from the historical shop data; and, with the historical feature data as the independent variables and a predetermined user behavior as the output variable, the prediction model is generated by training the classification algorithm.
Logistic regression is a typical classification algorithm: for a given set of independent variables, it outputs the probability of belonging to each class. In this application, binary logistic regression may be used, for example, that is, with the two classes 0 and 1, whose conditional probability distribution is:
$$P(y = 1 \mid x; \theta) = \frac{1}{1 + e^{-\theta^{T} x}}, \qquad P(y = 0 \mid x; \theta) = 1 - P(y = 1 \mid x; \theta)$$
In this application, x is the input independent variable (that is, the feature data), and y = 0, 1 is the output variable, which can describe whether the user clicks the goods or whether the user places an order. θ is the weight coefficient corresponding to the independent variable x. During model training, θ may be estimated, for example, by maximum likelihood. The prediction model may also be established with, for example, a Bayesian algorithm or GBDT (gradient boosted decision trees); the establishment process is similar to the above and is not repeated here. The application is not limited in this respect.
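To make the maximum-likelihood training step concrete, here is a minimal pure-Python sketch of binary logistic regression fitted by batch gradient ascent on the log-likelihood. The optimizer, learning rate, epoch count, and the toy data are all assumptions for illustration; the text does not specify how the likelihood is maximized, and a production system would more likely use a library implementation.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(theta, x):
    """P(y = 1 | x; theta). x[0] is expected to be a constant 1.0 bias term."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Maximize the average log-likelihood by batch gradient ascent."""
    theta = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(theta, xi)  # gradient of the log-likelihood
            for j, xij in enumerate(xi):
                grad[j] += err * xij
        theta = [t + lr * g / n for t, g in zip(theta, grad)]
    return theta


# Toy data: [bias, normalized feature]; label 1 means the user clicked.
X = [[1.0, 0.1], [1.0, 0.2], [1.0, 0.8], [1.0, 0.9]]
y = [0, 0, 1, 1]
theta = fit_logistic(X, y)
```

After fitting, `predict_proba(theta, x)` plays the role of the click prediction data for a keyword whose normalized features are `x`.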
According to the search data processing method of the present disclosure, the prediction model is obtained by training a classification algorithm on shop historical data, which yields a model that accurately predicts the relationship between keywords and click-through rate.
Fig. 3 is a flow chart of a search data processing method according to another exemplary embodiment. The flow shown in Fig. 3 is a detailed description of S202, "determining a keyword set according to shop data", in the flow shown in Fig. 2. Fig. 3 illustrates the processing steps for rejecting invalid data in the click and order logs, so that the system is not disturbed by useless data, and for further standardizing the data format to improve the efficiency of subsequent processing.
As shown in Fig. 3, in S302 the click and order log data is obtained. This may be, for example, the user search, click, and order log data of the most recent N days (such as 90 days).
In S304, it is judged whether the data is crawler data. For example, it may be identified whether a record was produced by a crawler fetching web content; if so, the flow goes to step S310.
In S306, it is judged whether the reported data is complete. The completeness of the reported user information and keyword is checked, and data lacking a keyword or user information is rejected; if the data is incomplete, the flow goes to step S310.
In S308, it is judged whether the data belongs to a cheating user. It is judged whether the user (login account or unique browser ID) is a cheating user (the list of cheating users may be obtained from risk-control data); if so, the flow goes to step S310 and all records of that user are discarded.
In S310, the data is discarded.
In S312, the keywords are standardized. For example, special characters (! @ $ % ^ * ( ) = ~ ` { } | : ; " ') and prohibited words may be filtered out; uppercase English letters are converted to lowercase; traditional Chinese characters are converted to simplified; leading and trailing spaces are removed and multiple internal spaces are merged into a single space; and overly long keywords (for example, longer than 15 characters) are filtered out.
In S314, the valid, standardized click and order tables are output.
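The record filtering and keyword standardization of S304 to S312 can be sketched as below. This is a simplified sketch under stated assumptions: records are plain dicts with hypothetical keys (`is_crawler`, `user_id`, `keyword`), over-long keywords are dropped rather than truncated, and prohibited-word filtering and traditional-to-simplified conversion are omitted because they need external word lists or libraries.

```python
import re

# Special characters listed in S312.
_SPECIAL = re.compile(r"""[!@$%^*()=~`{}|:;"']""")
MAX_LEN = 15  # example length limit taken from the text


def normalize_keyword(raw: str):
    """Return the standardized keyword, or None if it should be dropped."""
    kw = _SPECIAL.sub("", raw).lower()      # strip special chars, lowercase
    kw = re.sub(r"\s+", " ", kw).strip()    # merge inner spaces, trim ends
    if not kw or len(kw) > MAX_LEN:
        return None
    return kw


def clean_records(records, cheating_users):
    """Apply the S304-S312 checks to an iterable of log records (dicts)."""
    for rec in records:
        if rec.get("is_crawler"):                              # S304
            continue
        if not rec.get("keyword") or not rec.get("user_id"):   # S306
            continue
        if rec["user_id"] in cheating_users:                   # S308
            continue
        kw = normalize_keyword(rec["keyword"])                 # S312
        if kw is None:
            continue
        yield {**rec, "keyword": kw}                           # S314
```

For example, `normalize_keyword("  Red   SHOES!! ")` yields `"red shoes"`, so differently typed variants of the same query are aggregated under one keyword downstream.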
In one embodiment, aggregating the plurality of first keywords respectively to obtain a plurality of first-keyword dimensional search volumes comprises: aggregating each of the plurality of first keywords under predetermined dimensional features to generate the plurality of first-keyword dimensional search volumes. For example, basic indicators such as search volume, number of searching users, click volume, number of clicking users, number of ordering users, order lines, and transaction amount may be computed for keywords under cross dimensions such as shop, user gender, region, and results-page position, and the indicators may then be transformed by a decay function.
In this step, the input data may be, for example, the valid, standardized click and order log tables produced by pre-processing, the shop/goods SKU ID dimension table, the user gender dimension table, and the IP-address-to-region mapping table. The output data may be, for example, aggregated data in the following six dimensions: keyword-position, keyword-gender-position, keyword-region-position, keyword-shop-position, keyword-shop-gender-position, and keyword-shop-region-position.
First, the pre-processed valid, standardized click and order tables are joined with the goods, user gender, and region dimension tables to generate shop click and order tables carrying gender and region information.
Then, the click and order tables are grouped and aggregated by day in each of the six dimensions (keyword-position, keyword-gender-position, keyword-region-position, keyword-shop-position, keyword-shop-gender-position, and keyword-shop-region-position) to calculate basic indicators such as search volume, number of searching users, click volume, number of clicking users, number of ordering users, order lines, and transaction amount.
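The daily grouped aggregation over the six dimension combinations can be illustrated with a small in-memory sketch. The row layout (`day`, `keyword`, `shop`, `gender`, `region`, `position`, `user_id`, and the counter fields) is hypothetical, only a subset of the indicators is computed, and at the data volumes described in the text this would run on a distributed framework rather than in a single process.

```python
from collections import defaultdict

# The six dimension combinations named in the text.
DIMENSIONS = [
    ("keyword", "position"),
    ("keyword", "gender", "position"),
    ("keyword", "region", "position"),
    ("keyword", "shop", "position"),
    ("keyword", "shop", "gender", "position"),
    ("keyword", "shop", "region", "position"),
]


def _empty():
    return {"searches": 0, "clicks": 0, "gmv": 0.0, "users": set()}


def aggregate(rows, dims):
    """Group rows by (day, *dims) and accumulate the base indicators."""
    acc = defaultdict(_empty)
    for r in rows:
        key = (r["day"],) + tuple(r[d] for d in dims)
        a = acc[key]
        a["searches"] += r["searches"]
        a["clicks"] += r["clicks"]
        a["gmv"] += r["gmv"]
        a["users"].add(r["user_id"])  # distinct users give the search UV
    return {
        k: {"searches": v["searches"], "clicks": v["clicks"],
            "gmv": v["gmv"], "search_users": len(v["users"])}
        for k, v in acc.items()
    }


def aggregate_all(rows):
    """Run the aggregation once per dimension combination."""
    return {dims: aggregate(rows, dims) for dims in DIMENSIONS}
```

Keeping the dimension combinations in one list means a new cross dimension can be added without touching the aggregation logic.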
Because keyword click data shows an obvious long-tail effect, data whose click count is below some threshold (such as 100) may, for example, be rejected during processing, reducing the data volume to improve the efficiency of subsequent processing.
The daily basic indicators under each dimension may also, for example, be weighted by a decay function: data within n days of the calculation date is not decayed, while data between n and N days old is decayed. However, the application is not limited in this respect.
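The text states only the shape of the weighting (no decay within n days, decay between n and N days) and does not give the decay function itself. The sketch below fills that gap with an exponential half-life decay purely as an assumption; the function form, the `half_life` parameter, and the zero weight beyond N days are all illustrative choices, not the patent's formula.

```python
def decay_weight(age_days: int, n: int, big_n: int, half_life: float = 30.0) -> float:
    """Weight for data that is `age_days` old relative to the calculation date.

    Assumed form: weight 1 within n days, exponential half-life decay
    between n and N days, and 0 outside the N-day mining window.
    """
    if age_days < 0 or age_days > big_n:
        return 0.0
    if age_days <= n:
        return 1.0
    return 0.5 ** ((age_days - n) / half_life)


def weighted_total(daily_values, n, big_n, half_life=30.0):
    """Sum a daily indicator with decay; daily_values maps age in days -> value."""
    return sum(v * decay_weight(age, n, big_n, half_life)
               for age, v in daily_values.items())
```

With n = 7, N = 90, and a 30-day half-life, a day that is 37 days old contributes half of its raw value, so recent behavior dominates the aggregated indicator without older data being discarded outright.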
In one embodiment, screening the plurality of first-keyword dimensional search volumes to obtain the keyword set comprises: judging, for each first-keyword dimensional search volume, whether it satisfies a predetermined condition; and generating the keyword set from the first keywords corresponding to the dimensional search volumes that satisfy the predetermined condition. The predetermined condition involves the traffic share of the first keyword, the click conversion rate, and the order conversion rate. For example, to keep the search ecosystem balanced and avoid harming user experience, a keyword cannot be intervened on once the ratio of the traffic it sends to the shop to the keyword's overall traffic has reached a given threshold. Among the remaining candidate keywords that can be intervened on, those whose click conversion rate (Click-Through Rate, CTR) and order conversion rate (Conversion Rate, CVR) exceed the average level are selected to generate the keyword set.
Fig. 4 is a flow chart of a search data processing method according to another exemplary embodiment. The flow shown in Fig. 4 is a detailed description of "screening the plurality of first-keyword dimensional search volumes to obtain the keyword set" within S202, "determining a keyword set according to shop data", in the flow shown in Fig. 2. Fig. 4 illustrates the processing steps for selecting qualifying keywords as intervention candidates.
As shown in Fig. 4, in S402 the shop to be processed is selected.
In S404, a plurality of first keywords of the shop are determined from the shop data.
In S406, it is judged for each keyword whether the ratio of the first keyword's traffic to the shop to the keyword's overall traffic has reached a given threshold. If the ratio has reached the threshold, the keyword cannot be intervened on and the flow goes to S410; otherwise the flow goes to S408.
In S408, it is judged whether the CTR and CVR of the first keyword exceed the average level. If they do, the flow goes to S412; otherwise it goes to S410.
In S410, the first keyword is discarded.
In S412, the keyword set is generated. These keywords can serve as the shop's intervention keywords.
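The screening rule described for the keyword set (exclude keywords whose shop traffic share has already reached the threshold, then keep those with above-average CTR and CVR) can be sketched as follows. The input layout, the 0.3 default threshold, and the choice to compute the "average level" as the mean over the remaining candidates are assumptions; the text does not define how the average is taken.

```python
def select_keywords(candidates, share_threshold=0.3):
    """Return the keywords that qualify as intervention candidates.

    Each candidate is a dict with hypothetical keys: keyword, shop_traffic,
    total_traffic, ctr, cvr.
    """
    # Keep only keywords whose shop traffic share is still below the threshold.
    eligible = [
        c for c in candidates
        if c["total_traffic"] > 0
        and c["shop_traffic"] / c["total_traffic"] < share_threshold
    ]
    if not eligible:
        return []
    # Assumed definition of "average level": mean over eligible candidates.
    avg_ctr = sum(c["ctr"] for c in eligible) / len(eligible)
    avg_cvr = sum(c["cvr"] for c in eligible) / len(eligible)
    return [c["keyword"] for c in eligible
            if c["ctr"] > avg_ctr and c["cvr"] > avg_cvr]
```

Applying the traffic-share check first ensures that a keyword already dominated by the shop is never selected, however strong its conversion rates are.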
According to the search data processing method of the present disclosure, aggregating keywords across multiple dimensions provides basic data support for the advertising system, the intervention system, and operations staff, and quantifies the value to a shop of each keyword and of each position under that keyword.
According to the search data processing method of the present disclosure, analyzing how well a keyword steers users of different genders and regions makes it possible to find high-potential users for a shop precisely and to intervene in the search results under the keyword in a targeted way, improving the conversion rate and strengthening the shop's recognition of the platform.
According to the search data processing method of the present disclosure, click-through rate prediction is carried out with a logistic regression model, so the expected effect of different combinations of intervention strategies can be calculated quickly, helping operations staff formulate suitable intervention plans for different targets.
The search data processing method of the present disclosure is based on the data along the user's search-and-shopping path (search, click, add to cart, place order). It pre-processes and standardizes the keyword data and, with the keyword as the core, performs aggregation analysis across multiple cross dimensions such as shop, user gender, region, and position under the keyword. This precisely characterizes how gender, region, and position under the keyword affect the keyword's traffic-steering effect for the shop. By importing the data into HBase or Redis, the method provides real-time data support for the advertising system, the precise operation system, and the search intervention system.
It will be appreciated by those skilled in the art that realizing that all or part of the steps of above-described embodiment is implemented as being executed by CPU Computer program.When the computer program is executed by CPU, above-mentioned function defined by the above method that the disclosure provides is executed Energy.The program can store in a kind of computer readable storage medium, which can be read-only memory, magnetic Disk or CD etc..
Further, it should be noted that above-mentioned attached drawing is only the place according to included by the method for disclosure exemplary embodiment Reason schematically illustrates, rather than limits purpose.It can be readily appreciated that above-mentioned processing shown in the drawings is not indicated or is limited at these The time sequencing of reason.In addition, be also easy to understand, these processing, which can be, for example either synchronously or asynchronously to be executed in multiple modules.
Following is embodiment of the present disclosure, can be used for executing embodiments of the present disclosure.It is real for disclosure device Undisclosed details in example is applied, embodiments of the present disclosure is please referred to.
Fig. 5 is a block diagram of a search data processing device according to an exemplary embodiment. The search data processing device 50 includes a collection module 502, an extraction module 504, a feature module 506, and a prediction module 508; it may also include, for example, a training module 510.
The collection module 502 is used to determine a keyword set according to shop data. For example, invalid data in the click and order logs can be removed so that the system is not disturbed by noisy data, and the data format can be further normalized to improve the efficiency of subsequent processing. For example, data preprocessing may be performed on the shop data; multiple first keywords are extracted from the preprocessed shop data; aggregation is performed on each of the first keywords to obtain multiple first-keyword dimensional search volumes; and the first-keyword dimensional search volumes are screened to obtain the keyword set.
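The preprocessing and aggregation performed by the collection module can be sketched as below. This is an illustrative sketch under stated assumptions: log records are taken to be dictionaries, the field names (`keyword`, `shop_id`, and so on) are hypothetical, and "invalid data" is interpreted as records with an empty keyword or a missing dimension value.

```python
from collections import Counter


def aggregate_search_volume(records, dimensions=("shop_id",)):
    """Count log records per (keyword, *dimension values).

    `records` is an iterable of dicts; all field names are assumptions.
    """
    counts = Counter()
    for record in records:
        # Preprocessing: normalize the keyword text.
        keyword = (record.get("keyword") or "").strip().lower()
        # Preprocessing: skip invalid records (empty keyword, missing field).
        if not keyword or any(record.get(d) is None for d in dimensions):
            continue
        key = (keyword,) + tuple(record[d] for d in dimensions)
        counts[key] += 1
    return counts
```

Passing `dimensions=("shop_id", "gender", "region")` would give the crossed aggregation by shop, gender, and region described in the method embodiments.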
The extraction module 504 is used to extract the predetermined features of each keyword in the keyword set to generate a keyword feature set. The predetermined features may be, for example: search volume, number of searching users, click volume, number of clicking users, number of ordering users, order lines, transaction amount, average exposure depth, average click position, average order position, click conversion rate CTR (click volume / search volume), order conversion rate CVR (order lines / click volume), average order price (transaction amount / order lines), UV value (GMV / search UV), RPM (1000 × transaction amount / search volume), and per-capita search volume (search volume / number of searching users).
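The ratio features listed above can be computed from the raw counts as in the following sketch. The dictionary keys are hypothetical names, GMV is assumed to equal the transaction amount, and a zero denominator is mapped to 0.0 as a convenience that the source does not specify.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (an assumption)."""
    return numerator / denominator if denominator else 0.0


def derive_features(raw):
    """Compute the ratio features from raw per-keyword counts.

    `raw` keys (searches, search_users, clicks, order_lines, amount)
    are illustrative names, not taken from the source.
    """
    return {
        "ctr": safe_div(raw["clicks"], raw["searches"]),
        "cvr": safe_div(raw["order_lines"], raw["clicks"]),
        "avg_order_price": safe_div(raw["amount"], raw["order_lines"]),
        # Assumption: GMV is treated as the transaction amount.
        "uv_value": safe_div(raw["amount"], raw["search_users"]),
        "rpm": safe_div(1000 * raw["amount"], raw["searches"]),
        "searches_per_user": safe_div(raw["searches"], raw["search_users"]),
    }
```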
The feature module 506 is used to perform feature processing on the keyword feature set to generate feature data. Feature processing may be, for example: indices whose range (the difference between the maximum and the minimum) is too large, such as search volume and transaction amount, are first smoothed with a log function; to eliminate the influence of index magnitude, all features are then transformed into the interval [0, 1] by min-max normalization.
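The two processing steps can be sketched as follows. The source only says "log function"; using `log(1 + x)` so that zero counts remain defined is an assumption, as is mapping a constant column to all zeros.

```python
import math


def log_smooth(values):
    """Compress wide-ranged counts; log(1 + x) is an assumed variant."""
    return [math.log1p(v) for v in values]


def min_max(values):
    """Scale values into [0, 1] by min-max normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no signal, map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For example, `min_max(log_smooth(search_volumes))` would be applied to search volume or transaction amount, while bounded features such as CTR would only need `min_max`.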
The prediction module 508 is used to input the feature data into a prediction model to obtain click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm. The keywords determined through the above steps may be, for example, the words on which intervention is to be performed. The display position of a keyword in the search results page may also be set: for example, the Top M positions of the search results page (such as the top 10) may be treated as intervenable positions, an intervention position is chosen among them, and the shop's goods are displayed at that position. The selected keyword and the position to be displayed are input into the preset prediction model to obtain the click prediction data corresponding to each keyword, which in turn gives an estimate of the traffic that intervening on the keyword would bring to the shop.
The training module 510 is used to generate the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm. For example, historical feature data is obtained from the historical shop data; and, with the historical feature data as independent variables and a predetermined user behavior as the output variable, the prediction model is generated by training the classification algorithm.
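A logistic regression model of this kind can be sketched with plain batch gradient descent, as below. This is only an illustration: the source does not specify the training procedure, and the learning rate, epoch count, and function names are assumptions. Each row of `X` is a normalized feature vector and each label in `y` is 1 if the predetermined user behavior (for example, a click) occurred and 0 otherwise.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average the gradient over the batch and take one step.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_click_probability(w, b, x):
    """Predicted probability of the target behavior for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Once trained, `predict_click_probability` could be evaluated for each candidate keyword and intervention position to compare intervention strategies.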
According to the search data processing device of the present disclosure, by processing historical shop data, extracting keywords, and then inputting the keywords into a preset prediction model to obtain click prediction data, the click-through rate that a search keyword brings to the shop can be accurately predicted, which improves the conversion rate of the shop's advertising.
Fig. 6 is a schematic diagram of a search data processing device according to another exemplary embodiment. Fig. 6 illustrates an exemplary processing framework for the search data processing method of the present application. The search data processing device 60 may include, for example, a data warehouse module 602, a data processing and modeling module 604, a distributed data system 606, and a data application scenario module 608.
The data warehouse module 602 can be used to store raw data. The raw data may be, for example, shop data, specifically including search data, click data, add-to-cart data, order data, and the like.
The data processing and modeling module 604 can be used to preprocess the raw data and then aggregate and compute on it. After computation, it can output gender-personalized shop intervention words, region-personalized shop intervention words, and gender-and-region-personalized shop intervention words. The computed data may also, for example, undergo keyword feature extraction and then be input into the prediction model to obtain an estimate of the click conversion rate.
The distributed data system 606 can be used to manage personalized shop intervention words, and may also, for example, estimate the results of shop intervention words.
The data application scenario module 608 can be used to support the advertisement delivery system, the search intervention system, the precision operation system, and the like.
Fig. 7 is a block diagram of an electronic device according to an exemplary embodiment.
The electronic device 200 according to this embodiment of the present disclosure is described below with reference to Fig. 7. The electronic device 200 shown in Fig. 7 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
As shown in Fig. 7, the electronic device 200 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 210, at least one storage unit 220, a bus 230 connecting different system components (including the storage unit 220 and the processing unit 210), a display unit 240, and the like.
The storage unit stores program code that can be executed by the processing unit 210, so that the processing unit 210 performs the steps according to various exemplary embodiments of the present disclosure described in the search data processing method section of this specification above. For example, the processing unit 210 may perform the steps shown in Fig. 2, Fig. 3, and Fig. 4.
The storage unit 220 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 2201 and/or a cache memory unit 2202, and may further include a read-only memory unit (ROM) 2203.
The storage unit 220 may also include a program/utility 2204 having a set of (at least one) program modules 2205. Such program modules 2205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The bus 230 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus structures.
The electronic device 200 may also communicate with one or more external devices 300 (such as a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 200, and/or with any device (such as a router, a modem, etc.) that enables the electronic device 200 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 250. The electronic device 200 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 260. The network adapter 260 may communicate with the other modules of the electronic device 200 through the bus 230. It should be understood that, although not shown in the drawings, other hardware and/or software modules may be used in conjunction with the electronic device 200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
Through the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, or a network device) to perform the above method according to the embodiments of the present disclosure.
Fig. 8 schematically shows a computer-readable storage medium in an exemplary embodiment of the present disclosure.
Referring to Fig. 8, a program product 400 for implementing the above method according to an embodiment of the present disclosure is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, optical cable, RF, or any suitable combination of the above.
Program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
The above computer-readable medium carries one or more programs which, when executed by a device, cause the computer-readable medium to implement the following functions: determining a keyword set according to shop data; extracting the predetermined features of each keyword in the keyword set to generate a keyword feature set; performing feature processing on the keyword feature set to generate feature data; and inputting the feature data into a prediction model to obtain click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm.
Those skilled in the art will understand that the above modules may be distributed in the device as described in the embodiments, or may be changed accordingly and located in one or more devices different from this embodiment. The modules of the above embodiments may be combined into one module or further split into multiple submodules.
Through the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to perform the method according to the embodiments of the present disclosure.
The exemplary embodiments of the present disclosure have been particularly shown and described above. It should be understood that the present disclosure is not limited to the detailed structures, arrangements, or implementation methods described here; on the contrary, the present disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
In addition, the structures, proportions, sizes, and the like shown in the drawings of this specification are only intended to accompany the content disclosed in the specification, for the understanding and reading of those skilled in the art, and are not intended to limit the conditions under which the present disclosure can be implemented; they therefore have no substantive technical significance. Any modification of structure, change of proportional relationship, or adjustment of size that does not affect the technical effects the present disclosure can produce or the purposes it can achieve shall still fall within the scope covered by the technical content disclosed here. Likewise, terms such as "upper", "first", "second", and "a" cited in this specification are only for convenience of description and are not intended to limit the implementable scope of the present disclosure; changes or adjustments of their relative relationships, without substantive change to the technical content, shall also be regarded as within the implementable scope of the present disclosure.

Claims (12)

1. A search data processing method, characterized by comprising:
determining a keyword set according to shop data;
extracting predetermined features of each keyword in the keyword set to generate a keyword feature set;
performing feature processing on the keyword feature set to generate feature data; and
inputting the feature data into a prediction model to obtain click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm.
2. The method of claim 1, characterized by further comprising:
generating the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm.
3. The method of claim 2, characterized in that generating the prediction model from historical shop data and a classification algorithm comprises:
obtaining historical feature data from the historical shop data; and
generating the prediction model by training the classification algorithm with the historical feature data as independent variables and a predetermined user behavior as the output variable.
4. The method of claim 1, characterized in that determining the keyword set according to shop data comprises:
performing data preprocessing on the shop data;
extracting multiple first keywords from the preprocessed shop data;
performing aggregation on each of the multiple first keywords to obtain multiple first-keyword dimensional search volumes; and
screening the multiple first-keyword dimensional search volumes to obtain the keyword set.
5. The method of claim 4, characterized in that performing data preprocessing on the shop data comprises:
performing data preprocessing on the user click logs and order logs in the shop data.
6. The method of claim 4, characterized in that performing aggregation on each of the multiple first keywords to obtain multiple first-keyword dimensional search volumes comprises:
aggregating each of the multiple first keywords under predetermined dimensional features to generate the multiple first-keyword dimensional search volumes.
7. The method of claim 6, characterized in that screening the multiple first-keyword dimensional search volumes to obtain the keyword set comprises:
judging whether each first-keyword dimensional search volume meets a predetermined condition; and
generating the keyword set from the first keywords corresponding to the keyword dimensional search volumes that meet the predetermined condition.
8. The method of claim 7, characterized in that the predetermined condition includes:
first-keyword traffic share, click conversion rate, and order conversion rate.
9. A search data processing device, characterized by comprising:
a collection module for determining a keyword set according to shop data;
an extraction module for extracting predetermined features of each keyword in the keyword set to generate a keyword feature set;
a feature module for performing feature processing on the keyword feature set to generate feature data; and
a prediction module for inputting the feature data into a prediction model to obtain click prediction data corresponding to each keyword, the prediction model being established by a classification algorithm.
10. The device of claim 9, characterized by further comprising:
a training module for generating the prediction model from historical shop data and a classification algorithm, the classification algorithm including a logistic regression algorithm.
11. An electronic device, characterized by comprising:
one or more processors;
a storage device for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
12. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method of any one of claims 1-8.
CN201810361882.2A 2018-04-20 2018-04-20 Search for data processing method, device, electronic equipment and computer-readable medium Pending CN110399479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810361882.2A CN110399479A (en) 2018-04-20 2018-04-20 Search for data processing method, device, electronic equipment and computer-readable medium

Publications (1)

Publication Number Publication Date
CN110399479A true CN110399479A (en) 2019-11-01

Family

ID=68319490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810361882.2A Pending CN110399479A (en) 2018-04-20 2018-04-20 Search for data processing method, device, electronic equipment and computer-readable medium

Country Status (1)

Country Link
CN (1) CN110399479A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060026147A1 (en) * 2004-07-30 2006-02-02 Cone Julian M Adaptive search engine
US20070233565A1 (en) * 2006-01-06 2007-10-04 Jeff Herzog Online Advertising System and Method
CN101980210A (en) * 2010-11-12 2011-02-23 百度在线网络技术(北京)有限公司 Marked word classifying and grading method and system
CN102479190A (en) * 2010-11-22 2012-05-30 阿里巴巴集团控股有限公司 Method and device for predicting estimation values of search keyword
CN102567398A (en) * 2010-12-30 2012-07-11 阿里巴巴集团控股有限公司 Method and system for feeding back keyword estimated value
CN103823803A (en) * 2012-11-16 2014-05-28 腾讯科技(深圳)有限公司 Keyword screening method, device and equipment
CN105095210A (en) * 2014-04-22 2015-11-25 阿里巴巴集团控股有限公司 Method and apparatus for screening promotional keywords

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259249A (en) * 2020-01-20 2020-06-09 北京百度网讯科技有限公司 Data screening method, device, equipment and storage medium
CN111259249B (en) * 2020-01-20 2023-08-22 北京百度网讯科技有限公司 Data screening method, device, equipment and storage medium
CN113761108A (en) * 2020-06-02 2021-12-07 深信服科技股份有限公司 Data searching method, device, equipment and computer readable storage medium
CN112364185A (en) * 2020-11-23 2021-02-12 北京达佳互联信息技术有限公司 Method and device for determining characteristics of multimedia resource, electronic equipment and storage medium
CN112364185B (en) * 2020-11-23 2024-02-06 北京达佳互联信息技术有限公司 Method and device for determining characteristics of multimedia resources, electronic equipment and storage medium
CN113743975A (en) * 2021-01-29 2021-12-03 北京沃东天骏信息技术有限公司 Advertisement effect processing method and device
CN113111182A (en) * 2021-04-15 2021-07-13 北京沃东天骏信息技术有限公司 Information recommendation method and device and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination