CN111753179B - Data extraction method and device - Google Patents


Info

Publication number
CN111753179B
Authority
CN
China
Prior art keywords
data
pieces
requested
intention
interval
Legal status
Active
Application number
CN201910231855.8A
Other languages
Chinese (zh)
Other versions
CN111753179A (en)
Inventor
叶晗
吴含
Current Assignee
Shanghai Youkun Information Technology Co ltd
Original Assignee
Shanghai Youkun Information Technology Co ltd
Application filed by Shanghai Youkun Information Technology Co ltd
Priority to CN201910231855.8A
Publication of CN111753179A
Application granted
Publication of CN111753179B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Abstract

The embodiments of the invention disclose a data extraction method and device. The method includes: after request information from a user is obtained, acquiring N pieces of data and the intention score of each piece of data from a preset database; dividing the N pieces of data into T data intervals according to the range to which they belong and the intention score of each piece of data, and obtaining the intention conversion rate of each data interval; and selecting, from the T data intervals, a target data interval whose intention conversion rate matches the intention conversion rate corresponding to the requested data, and extracting the requested data from the target data interval. Because the data is extracted from the target data interval whose intention conversion rate matches that of the requested data, the value of the extracted data matches the value of the requested data; and because the target data extracted for each user matches the value of the data that user requested, reasonable distribution of the data can be ensured.

Description

Data extraction method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data extraction method and apparatus.
Background
In an internet marketing system, a marketing platform can screen target data according to a marketer's requirements and recommend it to the marketer, who then decides, based on the actual value of the target data, whether to market goods through the platform to the users corresponding to that data. For example, if a marketer wants the contact information of 100 users in the Shanghai region who are interested in shoes, the marketing platform may first screen the data belonging to the Shanghai region (for example, 1000 pieces of contact information) from a footwear database stored on the platform, then analyze the 1000 pieces of data (for example, by scoring them with a machine learning model) and select 100 pieces of target data from them. The marketer then pays the marketing platform the value corresponding to the 100 pieces of target data to obtain them.
In one existing implementation, the marketing platform treats all data in the database as equivalent, that is, every piece of data carries the same value: if one piece of data is worth 10 incentive value, 100 pieces are worth 1000 incentive value. In practice, however, different pieces of data have different values. If the success rate of promoting goods to the user corresponding to data A is high and the success rate for data B is low, data A is worth more than data B. Consequently, if a first marketer pays more incentive value than a second marketer, the first marketer should in principle receive better-quality data. Under the equal-value approach, however, the target data the platform extracts for the first marketer may be of the same quality as that extracted for the second. The data obtained therefore does not match the incentive value each marketer paid, and the platform's allocation of data among marketers is unreasonable.
In summary, there is a need for a data extraction method for extracting reasonable data for marketers.
Disclosure of Invention
The embodiment of the invention provides a data extraction method and device, which are used for extracting reasonable data for marketers.
In a first aspect, a data extraction method provided in an embodiment of the present invention includes:
acquiring request information, where the request information includes the quantity of requested data, the range of the requested data, and the value corresponding to the requested data; acquiring, from a preset database according to the request information, N pieces of data and the intention score of each of the N pieces of data, where the range of the N pieces of data matches the range of the requested data; dividing the N pieces of data into T data intervals according to the range to which the N pieces of data belong and the intention score of each piece of data, where each of the T data intervals includes at least one of the N pieces of data and N ≥ T > 0; obtaining the intention conversion rate of each data interval according to the intention scores of the data it includes; determining the intention conversion rate corresponding to the requested data according to the value corresponding to the requested data, a reference value corresponding to the requested data, and a reference intention conversion rate corresponding to the requested data; and selecting, from the T data intervals, a target data interval whose intention conversion rate matches the intention conversion rate corresponding to the requested data, and extracting the requested data from the target data interval according to the quantity of the requested data.
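The last two steps of the first aspect (deriving the requested intention conversion rate and picking the matching interval) can be illustrated with a minimal Python sketch. It is not the patented implementation: the text does not say how the value, reference value and reference intention conversion rate are combined, so `requested_conversion_rate` assumes a hypothetical linear rule, and "matching" is assumed to mean the interval whose rate is closest to the target among those holding enough data. `DataInterval` and all function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class DataInterval:
    """One of the T data intervals: its intention conversion rate and the
    ids of the pieces of data it contains (illustrative structure)."""
    conversion_rate: float
    data_ids: list


def requested_conversion_rate(value, reference_value, reference_rate):
    """Hypothetical rule: scale the reference rate linearly with the value paid.
    The source only states that the rate depends on these three quantities."""
    return reference_rate * value / reference_value


def extract_requested_data(intervals, quantity, value, reference_value, reference_rate):
    """Pick the interval whose rate best matches the target and take `quantity` items."""
    target_rate = requested_conversion_rate(value, reference_value, reference_rate)
    # Assumed matching criterion: only intervals that can supply the requested
    # quantity are considered, and the one with the closest rate wins.
    candidates = [iv for iv in intervals if len(iv.data_ids) >= quantity]
    if not candidates:
        raise ValueError("no data interval holds enough data")
    target = min(candidates, key=lambda iv: abs(iv.conversion_rate - target_rate))
    return target.conversion_rate, target.data_ids[:quantity]
```

For example, with a hypothetical reference of 1000 incentive value for a 5% rate, a marketer paying 700 targets 3.5% and is served from the 3.5% interval. Other matching rules (for example, the highest rate not exceeding the target) would fit the text equally well.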
In this embodiment of the invention, the N pieces of data are divided into T data intervals with different intention conversion rates (that is, the quality of the data differs from one data interval to another), and the data is extracted from the target data interval whose intention conversion rate matches the intention conversion rate corresponding to the requested data. This ensures that the extracted data matches the value of the requested data, that is, the quality of the extracted data matches the incentive value paid by the marketer. Further, because the target data extracted for each marketer matches the value of the data that marketer requested, reasonable distribution of the data can be ensured.
Optionally, the request information further includes seed data, and the intention scores of the N pieces of data are determined as follows: negative sample data is determined from P pieces of data in the preset database, where the P pieces of data include the N pieces of data and P ≥ N; a prediction model is trained on the seed data and the negative sample data; and the prediction model is used to predict each of the P pieces of data to obtain their intention scores.
In this embodiment of the invention, because the negative sample data in the preset database is determined in advance, the model can be trained with a supervised machine learning method: the prediction model is trained on the features of the seed data provided by the marketer (that is, the positive sample data) and the features of the negative sample data, which makes the prediction model more accurate. Predicting the P pieces of data in the preset database with this model in turn yields more accurate intention scores.
Optionally, acquiring the N pieces of data from the preset database includes: acquiring, from the preset database according to the range of the requested data, N pieces of data whose cooling time is less than a preset threshold.
In this embodiment of the invention, by presetting a cooling time for each piece of data in the preset database, data whose cooling time is greater than or equal to the preset threshold can be excluded from selection. In application, the cooling time of a piece of data can reflect the number of times the data has been recommended to marketers, or used by marketers, through the marketing platform. Requiring the cooling time of the N pieces of data to be below the preset threshold therefore prevents the same data from being recommended to multiple marketers, avoids harassing the corresponding users when multiple marketers use the same data in the same period, and improves the experience of both marketers and users.
Optionally, the requested data includes first data, and after the requested data is extracted from the target data interval, the method further includes: updating the cooling time of the first data if it is determined that the first data has been used.
In this embodiment of the invention, the cooling time of each piece of data in the preset database can be updated according to how the data is actually used. If a marketer uses a piece of data through the marketing platform in a first time period (that is, the marketer has already marketed goods through the platform to the user corresponding to that data), updating the cooling time of that data prevents it from being recommended to other marketers during the cooling time. This avoids the same user being repeatedly marketed to because different marketers obtained the same data, protects the user's privacy and security, and improves the user experience.
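The cooling-time filter and update described in the preceding paragraphs can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the text does not give the unit of cooling time, how much it grows when data is used, or whether it decays, so the fixed `cooldown` increment and the one-unit-per-period decay in `tick` are hypothetical, as is the `CoolingTracker` name.

```python
class CoolingTracker:
    """Hypothetical tracker for the cooling time of each piece of data."""

    def __init__(self, threshold, cooldown):
        self.threshold = threshold  # preset threshold: data at or above it is not selected
        self.cooldown = cooldown    # assumed amount added each time the data is used
        self.cooling = {}           # data id -> current cooling time (0 if never used)

    def eligible(self, data_ids):
        """Keep only data whose cooling time is below the preset threshold."""
        return [d for d in data_ids if self.cooling.get(d, 0) < self.threshold]

    def mark_used(self, data_id):
        """Update the cooling time once a marketer has actually used the data."""
        self.cooling[data_id] = self.cooling.get(data_id, 0) + self.cooldown

    def tick(self):
        """Assumed decay: lower every cooling time by one unit per period, floored at 0."""
        for data_id in self.cooling:
            self.cooling[data_id] = max(0, self.cooling[data_id] - 1)
```

With `threshold=3` and `cooldown=5`, a piece of data that has just been used is withheld from selection for three periods and then becomes eligible again.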
In a second aspect, an embodiment of the present invention provides a data extraction apparatus, including:
an acquisition module, configured to acquire request information, where the request information includes the quantity of requested data, the range of the requested data, and the value corresponding to the requested data;
a processing module, configured to: acquire, from a preset database according to the request information, N pieces of data and the intention score of each of the N pieces of data, where the range of the N pieces of data matches the range of the requested data; divide the N pieces of data into T data intervals according to the range to which the N pieces of data belong and the intention score of each piece of data, where each of the T data intervals includes at least one of the N pieces of data; obtain the intention conversion rate of each data interval according to the intention scores of the data it includes; and determine the intention conversion rate corresponding to the requested data according to the value corresponding to the requested data, a reference value corresponding to the requested data, and a reference intention conversion rate corresponding to the requested data; where N ≥ T > 0;
and an extraction module, configured to select, from the T data intervals, a target data interval whose intention conversion rate matches the intention conversion rate corresponding to the requested data, and extract the requested data from the target data interval according to the quantity of the requested data.
Optionally, the request information further includes seed data, and the intention scores of the N pieces of data are determined as follows: negative sample data is determined from P pieces of data in the preset database, where the P pieces of data include the N pieces of data and P ≥ N; a prediction model is trained on the seed data and the negative sample data; and the prediction model is used to predict each of the P pieces of data to obtain their intention scores.
Optionally, the processing module is configured to acquire, from the preset database according to the range of the requested data, N pieces of data whose cooling time is less than a preset threshold.
Optionally, the requested data includes first data, and after the extraction module extracts the requested data from the target data interval, the processing module is further configured to update the cooling time of the first data if it is determined that the first data has been used.
In a third aspect, an embodiment of the present invention further provides a computer program product that, when run on a computer, causes the computer to execute the data extraction method according to the first aspect or any optional implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium including computer-readable instructions that, when read and executed by a computer, cause the computer to execute the data extraction method according to the first aspect or any optional implementation of the first aspect.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an internet marketing system according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of a data extraction method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of updating the cooling time according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data extraction device according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments derived by a person skilled in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic structural diagram of an internet marketing system according to an embodiment of the present invention; this architecture can be applied to crowd diffusion business scenarios in internet marketing. As shown in Fig. 1, the architecture may include a marketer 110 and a marketing platform 120. Through the marketing platform 120, the marketer 110 can obtain data (i.e., target data) of users interested in the goods the marketer sells and, based on target data with high actual value, market goods to the corresponding users through the marketing platform 120; for example, the marketer 110 may deliver marketing advertisements to those users through the marketing platform 120. The goods marketed may be physical products such as daily necessities, apparel, or travel goods; software applications such as financial, shopping, or reading applications; or services such as advertisement production, music recommendation, video playback, or news; no specific limitation is imposed. For example, if the marketer 110 sells brand-A children's garments, it can obtain data of users interested in children's garments as recommended by the marketing platform 120 and recommend brand-A children's garments to those users through the platform, thereby expanding its business channels, growing its user base, and increasing revenue.
In a specific implementation, the marketing platform 120 may be provided with a preset database storing at least one piece of data, each corresponding to one user; for example, a piece of data may include the user's contact information, usual place of residence, and identification information of the user's terminal device. After receiving request information from the marketer 110, the marketing platform 120 may screen one or more pieces of target data from the preset database according to that request information and provide them to the marketer 110. As shown in Fig. 1, in one example, data 131 to data 139 are stored in the preset database of the marketing platform 120; after receiving a request message from the marketer 110, the marketing platform 120 screens out data 131, data 134 and data 138 (i.e., the target data) from data 131 to data 139 and provides them to the marketer 110. The marketer 110 can then decide, based on the actual value of the recommended data, whether to promote goods through the marketing platform 120 to the users corresponding to data 131, data 134 and data 138.
Based on the system architecture illustrated in fig. 1, fig. 2 is a schematic flow chart corresponding to a data extraction method provided in an embodiment of the present invention. The execution subject of the data extraction method may be the marketing platform shown in fig. 1, and the method includes:
Step 201: Obtain request information.
Here, the request information may be obtained from the marketer's demand information and may describe the data the marketer wishes to acquire from the marketing platform, such as the quantity of requested data (i.e., how many pieces of data the marketer wants), the range of the requested data (i.e., which data the marketer wants), and the value corresponding to the requested data (e.g., the incentive value the marketer will pay the marketing platform). The range of the requested data may refer to the industry to which the requested data belongs, the region to which it belongs, or both, and is not specifically limited.
In a possible implementation, an order submission page may be provided on the marketing platform, and the marketer generates the request information by filling in its demand information on that page. For example, Table 1 shows the demand information filled in on the order submission page by marketer A, marketer B and marketer C: the second row of Table 1 is the demand information of marketer A, the third row that of marketer B, and the fourth row that of marketer C.
Table 1: Example demand information filled in on the order submission page
Marketer | Amount of data | Data range | Data value
Marketer A | 100 pieces | Footwear industry | 1000 incentive value
Marketer B | 100 pieces | Shanghai region, footwear industry | 500 incentive value
Marketer C | 100 pieces | Beijing region, children's garment industry | 700 incentive value
As shown in Table 1, marketer A wishes to acquire data of 100 users interested in shoes by paying 1000 incentive value; marketer B wishes to acquire data of 100 users interested in shoes in the Shanghai region by paying 500 incentive value; and marketer C wishes to acquire data of 100 users interested in children's garments in the Beijing region by paying 700 incentive value.
It should be noted that Table 1 is only a simple illustration; the ranges of the requested data shown there are chosen for convenience of description and do not limit the scheme. In a specific implementation, the range of the requested data may also be another range, such as the type of terminal or the salary range associated with the requested data; no specific limitation is imposed.
In one example, the marketing platform may review the orders submitted by marketers in two stages: an initial review and a re-review. The initial review checks the value corresponding to the requested data: if it is greater than or equal to a preset value, the demand information passes; otherwise it fails. For example, if the preset value is 600 incentive value, then according to Table 1 the demand information of marketer A and marketer C passes the initial review and that of marketer B fails. After the initial review is passed, the marketing platform performs the re-review, which checks the value in the marketer's account: if the account value is greater than or equal to the value corresponding to the requested data, the demand information passes; otherwise it fails. If the marketing platform determines that the marketer's demand information has passed both reviews, it generates the request information from that demand information.
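The two-stage review above can be sketched in Python as follows. The `Demand` structure mirrors one row of Table 1; the function name `review`, the shape of the returned request information, and the account values used in the usage note are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    """Demand information from the order submission page (one row of Table 1)."""
    marketer: str
    quantity: int
    data_range: tuple
    value: int  # incentive value offered for the requested data


def review(demand, preset_value, account_value):
    """Return request information if both reviews pass, otherwise None."""
    # Initial review: the offered value must reach the preset value.
    if demand.value < preset_value:
        return None
    # Re-review: the value in the marketer's account must cover the offered value.
    if account_value < demand.value:
        return None
    # Both reviews passed: generate the request information (illustrative shape).
    return {"quantity": demand.quantity, "range": demand.data_range, "value": demand.value}
```

With a preset value of 600, marketer B (500) fails the initial review regardless of its account value, while marketer C (700) passes only if its account holds at least 700.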
Step 202: Determine the intention conversion rate corresponding to the requested data and the intention conversion rates of the T data intervals.
In a possible implementation, multiple preset databases may be set up on the marketing platform. Each preset database may store multiple pieces of data (i.e., two or more), the region to which each piece of data belongs, and the intention score of each piece of data, and all data stored in one preset database may belong to the same industry. The intention score of a piece of data indicates how likely the corresponding user is to be interested in goods of the industry to which the data belongs; its range can be set by a person skilled in the art as needed, for example, to any value in [0, 100]. Table 2 is a schematic table of multiple preset databases stored on a marketing platform.
As shown in Table 2, the marketing platform can store a footwear preset database and a children's garment preset database. The footwear preset database can store 5 pieces of data (data a1, a2, a3, a4 and a5), the region to which each of them belongs, and the intention score of each; the children's garment preset database can store 5 pieces of data (data b1, b2, b3, b4 and b5), the region to which each of them belongs, and the intention score of each. It should be noted that the data in the footwear preset database and in the children's garment preset database may be the same or different, and is not specifically limited.
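Table 2: Schematic of multiple preset databases
(Table 2 is provided as an image in the original publication: a footwear preset database with data a1 to a5 and a children's garment preset database with data b1 to b5, each entry listing its region and intention score.)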
Taking the footwear preset database as an example, in one example the intention score of each piece of data stored in it may be determined as follows: negative sample data is determined from the data in the footwear preset database; a prediction model is trained using the seed data provided by the marketer together with the negative sample data; and the prediction model is used to predict each piece of data to obtain its intention score. For example, the intention score of data a1 indicates the probability that the user corresponding to data a1 is interested in shoes. The negative sample data may be data of users not interested in shoes, or data of users not interested in goods of multiple industries; it may be determined through surveys or by a person skilled in the art based on experience, and is not specifically limited.
In this embodiment of the invention, because the negative sample data in the preset database is determined in advance, the model can be trained with a supervised machine learning method: the prediction model is trained on the features of the seed data (i.e., positive sample data) provided by the marketer and the features of the negative sample data, which makes the prediction model more accurate. Predicting the data in the preset database with this model in turn yields more accurate intention scores.
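The patent does not name a model type or feature set. As one possible supervised learner, the sketch below swaps in plain logistic regression trained by stochastic gradient descent in pure Python: seed data is labelled 1, negative sample data 0, and the predicted probability for every piece of data is scaled to an integer intention score in [0, 100] to match the example range. The feature vectors, learning rate, epoch count and function names are all assumptions for illustration.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(positives, negatives, epochs=500, lr=0.1):
    """Fit logistic-regression weights by stochastic gradient descent.
    positives: feature vectors of the seed data (label 1);
    negatives: feature vectors of the negative sample data (label 0)."""
    samples = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in samples:
            # Gradient of the log loss with respect to the linear score is (p - y).
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def intention_scores(model, features_by_id):
    """Predict every piece of data and scale the probability to an integer in [0, 100]."""
    w, b = model
    return {
        data_id: round(100 * _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b))
        for data_id, x in features_by_id.items()
    }
```

In practice a library classifier and richer user features would likely be used; the point here is only the positive/negative training setup followed by scoring all data in the preset database.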
In one example, the footwear preset database on the marketing platform may be split into multiple preset databases (i.e., two or more), and the data extraction method of this embodiment may be executed in each first preset period using these preset databases in turn, so that the amount of data available in the preset database used during a first preset period meets requirements and is not depleted all at once. In this embodiment, the data in a preset database may also be updated in a second preset period, for example once every 10 days. Alternatively, the intention score of a piece of data may be updated according to how the corresponding user converts during subsequent use, for example in a third preset period: if, after data a1 is used, it is determined that the user corresponding to data a1 is interested in shoes, the intention score of data a1 can be increased; if it is determined that the user is not interested in shoes, the intention score of data a1 can be decreased. Updating intention scores according to actual execution results makes them better reflect reality.
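The feedback rule above (raise the score after a conversion, lower it otherwise) can be sketched as a small Python function. The fixed step of 5 points and the clamping to [0, 100] are assumptions; the text only states the direction of the update.

```python
def update_intention_score(score, converted, step=5, low=0, high=100):
    """Raise the score if the user turned out to be interested, lower it otherwise.
    step, low and high are illustrative assumptions, not values from the source."""
    new_score = score + step if converted else score - step
    # Keep the score inside the configured intention-score range.
    return max(low, min(high, new_score))
```

For example, a score of 60 becomes 65 after a conversion, while a score of 3 drops to 0 rather than going negative.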
In this embodiment of the invention, the N pieces of data can be obtained from the preset database in multiple ways. In one possible implementation, N pieces of data matching the range of the requested data are obtained from the preset database. For example, based on the preset databases in Table 2, if the range of the requested data in the request information is the footwear industry, the 5 pieces of data and the intention score of each of them may be obtained from the footwear preset database; if the range is the footwear industry and the Shanghai region, the 3 pieces of data belonging to the Shanghai region (i.e., data a1, a3 and a5) and the intention score of each may be obtained from the footwear preset database.
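The range matching in this example can be sketched as selecting the preset database for the requested industry and then filtering by region. The nested-dict layout and `fetch_candidates` are illustrative assumptions. In the toy data, only the Shanghai regions of a1, a3 and a5 come from the text; every intention score and the other regions are made-up placeholders.

```python
def fetch_candidates(databases, industry, region=None):
    """Return {data id: intention score} for data in the requested range.

    databases maps an industry to its preset database, itself a mapping
    data id -> (region, intention score). region=None means any region."""
    database = databases.get(industry, {})
    return {
        data_id: score
        for data_id, (data_region, score) in database.items()
        if region is None or data_region == region
    }


# Toy preset databases: regions of a1, a3 and a5 follow the text; all scores
# and the remaining regions are placeholders for illustration only.
DATABASES = {
    "footwear": {
        "a1": ("Shanghai", 72), "a2": ("Beijing", 40), "a3": ("Shanghai", 15),
        "a4": ("Beijing", 88), "a5": ("Shanghai", 55),
    },
}
```

Requesting the footwear industry alone returns all 5 pieces of data; adding the Shanghai region narrows the result to a1, a3 and a5.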
In another possible implementation, the preset database may also store the cooling time of each piece of data, and after the request information is obtained, N pieces of data whose cooling time is less than the preset threshold may be obtained from the preset database according to the range of the requested data. The cooling time indicates how often the data is used. By presetting the cooling time of each piece of data in the preset database, data whose cooling time is greater than or equal to the preset threshold is excluded from selection. In application, the cooling time of a piece of data can reflect the number of times it has been recommended to, or used by, marketers; requiring the cooling time of the N pieces of data to be below the preset threshold therefore prevents the same data from being recommended to multiple marketers, prevents the corresponding user from being harassed when multiple marketers use the same data in the same period, and improves the user experience.
Further, the N pieces of data can be divided into T data intervals according to their range and the intention score of each; each of the T data intervals includes at least one of the N pieces of data and corresponds to one intention conversion rate, which can be obtained from the intention scores of the data it includes. Taking the footwear preset database as an example, the N pieces of data may be sorted by intention score in ascending order (for example, with scores ranging from 1 to 100), and the score range 1 to 100 divided into T intention score intervals. The division may use a predetermined length, so that all intervals have the same width; or it may be based on the actual execution results of the N pieces of data, so that the width of each interval is set according to those results. A correspondence between each intention score interval and an intention conversion rate may then be set; this correspondence may be the same or differ across industries. In practice, the correspondence between the T intention score intervals and the intention conversion rates may also be updated according to the actual execution results of the N pieces of data.
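The fixed-length option above can be sketched as splitting an integer score range into T contiguous intervals of (nearly) equal width and grouping each piece of data by its score. This assumes integer intention scores; the function names are hypothetical, and the conversion rate attached to each interval is left to a separately configured correspondence, as the text describes.

```python
def equal_width_intervals(low, high, t):
    """Split the integer score range [low, high] into t contiguous intervals of
    (nearly) equal width; requires 0 < t <= high - low + 1."""
    span = high - low + 1
    if not 0 < t <= span:
        raise ValueError("need 0 < t <= number of possible scores")
    return [(low + span * i // t, low + span * (i + 1) // t - 1) for i in range(t)]


def group_by_interval(scores, intervals):
    """Assign each piece of data (id -> integer score) to the interval containing its score."""
    groups = [[] for _ in intervals]
    for data_id, score in scores.items():
        for index, (lo, hi) in enumerate(intervals):
            if lo <= score <= hi:
                groups[index].append(data_id)
                break
    return groups
```

For scores 1 to 100 and T = 4 this yields [1, 25], [26, 50], [51, 75] and [76, 100]. Division based on actual execution results would instead supply hand-chosen bounds, as in Table 3.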
Table 3 is a schematic table of the T data intervals provided in the embodiment of the present invention.
Table 3: Schematic of the T data intervals
Data interval           Intention conversion rate   Intention score interval   Number of pieces
First data interval     0.00%                       [0, 5]                     X1
Second data interval    0.50%                       [6, 8]                     X2
Third data interval     1.00%                       [9, 12]                    X3
Fourth data interval    1.50%                       [13, 25]                   X4
Fifth data interval     2.00%                       [26, 40]                   X5
Sixth data interval     2.50%                       [41, 50]                   X6
Seventh data interval   3.00%                       [51, 53]                   X7
Eighth data interval    3.50%                       [54, 60]                   X8
Ninth data interval     4.00%                       [61, 71]                   X9
Tenth data interval     4.50%                       [72, 79]                   X10
Eleventh data interval  5.00%                       [80, 100]                  X11
As shown in Table 3, the N pieces of data are divided into 11 data intervals:
- The first data interval covers intention scores [0, 5], contains X1 pieces of data, and corresponds to an intention conversion rate of 0.00%.
- The second covers [6, 8], contains X2 pieces, and corresponds to 0.50%.
- The third covers [9, 12], contains X3 pieces, and corresponds to 1.00%.
- The fourth covers [13, 25], contains X4 pieces, and corresponds to 1.50%.
- The fifth covers [26, 40], contains X5 pieces, and corresponds to 2.00%.
- The sixth covers [41, 50], contains X6 pieces, and corresponds to 2.50%.
- The seventh covers [51, 53], contains X7 pieces, and corresponds to 3.00%.
- The eighth covers [54, 60], contains X8 pieces, and corresponds to 3.50%.
- The ninth covers [61, 71], contains X9 pieces, and corresponds to 4.00%.
- The tenth covers [72, 79], contains X10 pieces, and corresponds to 4.50%.
- The eleventh covers [80, 100], contains X11 pieces, and corresponds to 5.00%.
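Looking up the data interval of a given intention score in Table 3 can be sketched with a binary search over the interval upper bounds. This sketch assumes integer scores in [0, 100], as the table's closed integer intervals suggest. The names `TABLE_3`, `interval_index` and `conversion_rate` are hypothetical.

```python
import bisect

# Table 3 encoded as (upper bound of the intention score interval,
# intention conversion rate in percent). Each lower bound is the previous
# row's upper bound + 1, starting at 0.
TABLE_3 = [
    (5, 0.00), (8, 0.50), (12, 1.00), (25, 1.50), (40, 2.00), (50, 2.50),
    (53, 3.00), (60, 3.50), (71, 4.00), (79, 4.50), (100, 5.00),
]
_UPPER_BOUNDS = [upper for upper, _ in TABLE_3]


def interval_index(score: int) -> int:
    """Return the 0-based index of the Table 3 data interval for an integer score."""
    if not 0 <= score <= 100:
        raise ValueError("intention score must lie in [0, 100]")
    # bisect_left finds the first interval whose upper bound is >= score.
    return bisect.bisect_left(_UPPER_BOUNDS, score)


def conversion_rate(score: int) -> float:
    """Intention conversion rate (percent) of the interval containing score."""
    return TABLE_3[interval_index(score)][1]


print(interval_index(53), conversion_rate(53))  # 6 3.0
print(interval_index(54), conversion_rate(54))  # 7 3.5
```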
Further, the intention conversion rate corresponding to the requested data may be determined from three quantities: the value corresponding to the requested data, the reference value corresponding to the requested data, and the reference intention conversion rate corresponding to the requested data. Specifically, this intention conversion rate may be composed of a first sub-conversion rate and a second sub-conversion rate.

The first sub-conversion rate is the conversion rate corresponding to the value offered for the requested data. It is determined from the correspondence between the reference value and the reference intention conversion rate. For example, suppose the reference value corresponding to the requested data is 500 incentive values and the reference intention conversion rate is W, so the correspondence between them is 500 : W. If the value corresponding to the requested data is 1000 incentive values, the first sub-conversion rate is 2W.

The second sub-conversion rate may be determined from the value of the requested data relative to its reference value.
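The worked example above (500 : W, so 1000 incentive values gives 2W) amounts to a linear proportion. The sketch below assumes that proportionality holds in general, as the example implies. It covers only the first sub-conversion rate. The formula for the second sub-conversion rate β, and how the two parts are combined, are given only as formula images and are not reproduced here. The function name is hypothetical.

```python
def first_sub_conversion_rate(value: float, reference_value: float,
                              reference_rate: float) -> float:
    """Scale the reference intention conversion rate W in proportion to the
    value offered, using the correspondence reference_value : W."""
    if reference_value <= 0:
        raise ValueError("reference value must be positive")
    return value / reference_value * reference_rate


W = 1.0  # express the result in units of W
print(first_sub_conversion_rate(1000, 500, W))  # 2.0, i.e. 2W
```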
In one example, the second sub-conversion rate may satisfy the following conditions:
[Formula images not reproduced in this text: two expressions defining β in terms of P, P0 and W]
where β is the second sub-conversion rate corresponding to the requested data, P is the value corresponding to the requested data, P0 is the reference value corresponding to the requested data, and W is the reference intention conversion rate corresponding to the requested data.
Step 203, extracting the requested data according to the intention conversion rate corresponding to the requested data and the intention conversion rate of each interval in the T data intervals.
In a specific implementation, one or more data intervals whose intention conversion rate matches the intention conversion rate corresponding to the requested data may be screened out from the T data intervals as target data intervals. The requested data may then be extracted from the target data intervals according to the quantity of the requested data.

Suppose y pieces of data are requested and the T data intervals are those shown in Table 3:
- If the intention conversion rate corresponding to the requested data is 2.00%, the fifth data interval is used as the target data interval. y pieces of data are selected as the target data from the X5 pieces of data it includes.
- If the intention conversion rate corresponding to the requested data is 3.21%, the seventh and eighth data intervals are used as target data intervals. M1 pieces of data are selected from the X7 pieces included in the seventh data interval, and M2 pieces from the X8 pieces included in the eighth data interval. The M1 + M2 pieces are used as the target data.

M1 and M2 may satisfy the following conditions:
[Formula image not reproduced in this text: first condition on M1 and M2]
M1 + M2 = y
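The target-interval selection and the split of y between two intervals can be sketched as follows. Only M1 + M2 = y survives in the text; the other condition on M1 and M2 is a formula image. This sketch therefore assumes a stand-in rule: M1 and M2 are chosen so that the weighted mean rate (M1·r1 + M2·r2) / y is as close as possible to the target rate. The function names and the first-come selection inside each interval are hypothetical.

```python
import bisect
from typing import Dict, List

# Intention conversion rates (percent) of the eleven intervals in Table 3,
# in ascending order.
TABLE_3_RATES = [0.00, 0.50, 1.00, 1.50, 2.00, 2.50, 3.00, 3.50, 4.00, 4.50, 5.00]


def plan_extraction(rates: List[float], target_rate: float,
                    y: int) -> Dict[int, int]:
    """Map interval index -> number of records to draw, for y requested records."""
    for idx, rate in enumerate(rates):
        if abs(rate - target_rate) < 1e-9:
            return {idx: y}  # exact match: a single target interval
    upper = bisect.bisect_left(rates, target_rate)
    if upper == 0 or upper == len(rates):
        raise ValueError("target rate lies outside the range of the intervals")
    lower = upper - 1
    r1, r2 = rates[lower], rates[upper]
    # ASSUMED condition: choose M2 so that (M1*r1 + M2*r2) / y is as close
    # as possible to target_rate, subject to M1 + M2 = y.
    m2 = round(y * (target_rate - r1) / (r2 - r1))
    return {lower: y - m2, upper: m2}


def draw(buckets: List[List[str]], plan: Dict[int, int]) -> List[str]:
    """Take the planned number of records from each target interval."""
    selected: List[str] = []
    for idx, count in plan.items():
        pool = buckets[idx]
        if len(pool) < count:
            # The description tops up a short interval with qualifying data
            # from other sources; this sketch only reports the shortfall.
            raise LookupError(f"interval {idx} has {len(pool)} records, needs {count}")
        selected.extend(pool[:count])  # a real system might sample instead
    return selected


print(plan_extraction(TABLE_3_RATES, 2.00, 10))   # {4: 10}
print(plan_extraction(TABLE_3_RATES, 3.21, 100))  # {6: 58, 7: 42}
```

Indices 6 and 7 are the seventh and eighth data intervals, matching the 3.21% example.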
In one example, the target data interval may contain fewer pieces of data than requested. In that case, data satisfying the condition may be acquired and added to the target data interval until it contains at least the requested quantity. Such data may be obtained in various ways, which are not specifically limited. For example, it may come from other marketing platforms, or from new users who satisfy the condition.
In the embodiment of the invention, the N pieces of data are divided into T data intervals with different intention conversion rates; that is, the data in different intervals differ in quality. Data are then extracted from the target data interval whose intention conversion rate matches that of the requested data. This ensures that the value of the extracted data matches the value of the requested data, so the quality of the extracted data matches the incentive value paid by the user. Further, because the target data extracted for each user match the value of the data that user requested, the data are distributed reasonably.
In a possible implementation, after the target data are extracted from the target data interval, their usage may be monitored, and the cooling time of the target data stored in the preset database may be updated accordingly. Take the first data in the target data as an example. In an outbound-call service, the data is a user's contact information. The marketing platform makes a return call based on the user's purchase record for a commodity, and updates the cooling time of the first data according to the outcome of the call.
Fig. 3 is a schematic flowchart of updating the cooling time according to an embodiment of the present invention. As shown in Fig. 3, when the marketing platform calls the user corresponding to the first data, the cooling time of the first data may be updated along the following four branches:
In the first branch, if the first data is a vacant (null) number, the cooling time of the first data may be updated to 180 days.
In the second branch, if the user corresponding to the first data does not answer the call, the touch count of the first data is increased by one, and its cooling time need not be updated.
In the third branch, if the user corresponding to the first data answers the call but is not interested in the marketed goods, the cooling time of the first data may be updated to 10 days.
In the fourth branch, if the user corresponding to the first data answers the call and is interested in the marketed goods, the cooling time of the first data may be updated to 30 days.
If the touch count of the first data reaches a preset number (for example, 10), the first data may be set to a locked state. That is, its cooling time may be updated to any value from 10 days to 30 days.
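The four branches and the touch-count lock can be sketched as a single update function. The durations come from the description. The following are assumptions of this sketch:
- The lock check sits in the no-answer branch, because that is the only branch that increments the touch count.
- The locked cooling time is exposed as a `lock_days` parameter restricted to the stated 10 to 30 day range.
- All type and field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class CallOutcome(Enum):
    VACANT_NUMBER = auto()            # first branch
    NO_ANSWER = auto()                # second branch
    ANSWERED_NOT_INTERESTED = auto()  # third branch
    ANSWERED_INTERESTED = auto()      # fourth branch


@dataclass
class ContactRecord:
    cooling_days: int = 0  # hypothetical field names
    touch_count: int = 0


def update_after_call(rec: ContactRecord, outcome: CallOutcome,
                      max_touches: int = 10, lock_days: int = 30) -> ContactRecord:
    """Apply the four Fig. 3 branches plus the touch-count lock to rec in place."""
    if not 10 <= lock_days <= 30:
        raise ValueError("lock_days should lie between 10 and 30 days")
    if outcome is CallOutcome.VACANT_NUMBER:
        rec.cooling_days = 180
    elif outcome is CallOutcome.NO_ANSWER:
        rec.touch_count += 1  # cooling time is left unchanged...
        if rec.touch_count >= max_touches:
            rec.cooling_days = lock_days  # ...unless the record becomes locked
    elif outcome is CallOutcome.ANSWERED_NOT_INTERESTED:
        rec.cooling_days = 10
    else:  # CallOutcome.ANSWERED_INTERESTED
        rec.cooling_days = 30
    return rec
```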
It should be noted that Fig. 3 is only a simple example. The listed ways of updating the cooling time (the four branches) are given for convenience of illustration and do not limit the scheme. In a specific implementation, the way the cooling time is updated may be set according to the actual scenario.
In the embodiment of the invention, the cooling time of each piece of data in the preset database can be updated according to its actual usage. If a piece of data is used by a user in a first time period, its cooling time is updated so that the data is not recommended to other users during the cooling time. This prevents different users from acquiring the same piece of data and improves user experience.
In view of the above method flow, an embodiment of the present invention further provides a message processing apparatus, and specific contents of the apparatus may be implemented with reference to the above method.
Fig. 4 is a schematic structural diagram of a message processing apparatus according to an embodiment of the present invention, where the apparatus includes:
an obtaining module 401, configured to obtain request information, where the request information includes a quantity of requested data, a range to which the requested data belongs, and a value corresponding to the requested data;
a processing module 402, configured to:
- obtain, according to the request information, N pieces of data and the intention score of each of the N pieces of data from a preset database, where the range to which the N pieces of data belong matches the range to which the requested data belongs;
- divide the N pieces of data into T data intervals according to the range to which they belong and the intention score of each of them, where each of the T data intervals includes at least one of the N pieces of data;
- obtain the intention conversion rate of each data interval according to the intention score of the at least one piece of data included in that data interval; and
- determine the intention conversion rate corresponding to the requested data according to the value corresponding to the requested data, the reference value corresponding to the requested data, and the reference intention conversion rate corresponding to the requested data;

where N ≥ T > 0;
an extracting module 403, configured to screen out, from the T data intervals, a target data interval in which an intention conversion rate matches an intention conversion rate corresponding to the requested data, and extract the requested data from the target data interval according to the quantity of the requested data.
Optionally, the request information further includes seed data, and the intention score of the N pieces of data is determined by:
determining negative sample data from P pieces of data in the preset database, where the P pieces of data include the N pieces of data and P ≥ N;
and training a prediction model on the seed data and the negative sample data, and using the prediction model to predict each of the P pieces of data to obtain the intention scores of the P pieces of data.
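The seed-based scoring above can be illustrated with a minimal lookalike-model sketch. The patent does not name the prediction model, so this sketch makes several substitutions, each an assumption:
- The model is plain logistic regression trained by batch gradient descent.
- Negatives are drawn uniformly at random from the non-seed records (a hypothetical rule).
- Records are represented as numeric feature vectors.
- Scores are scaled to 0 to 100 to match the intention score range used in Table 3.

All names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_negatives(pool: Sequence[Vector], seeds: Sequence[Vector],
                     k: int, rng: random.Random) -> List[Vector]:
    """Hypothetical negative-sampling rule: draw k non-seed records at random."""
    candidates = [x for x in pool if x not in seeds]
    return rng.sample(candidates, k)


def train_logistic(pos: Sequence[Vector], neg: Sequence[Vector],
                   epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Batch gradient descent on log-loss; seeds are label 1, negatives label 0."""
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, label in data:
            err = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - label
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def intention_scores(model: Tuple[List[float], float],
                     records: Sequence[Vector]) -> List[int]:
    """Score each record on a 0-100 scale (predicted probability x 100)."""
    w, b = model
    return [round(100 * _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b))
            for x in records]


seeds = [(1.0, 1.0), (0.9, 0.8), (1.0, 0.9)]
pool = seeds + [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.0, 0.0)]
negatives = sample_negatives(pool, seeds, 3, random.Random(0))
print(intention_scores(train_logistic(seeds, negatives), pool))
```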
Optionally, the processing module 402 is configured to:
acquire, from the preset database according to the range to which the requested data belongs, the N pieces of data whose cooling time is less than a preset threshold.
Optionally, the requested data includes first data, and after the extraction module 403 extracts the requested data from the target data interval, the processing module 402 is further configured to:
update the cooling time of the first data if it is determined that the first data has been used by the user.
From the above, the embodiment of the invention works as follows. After the request information is acquired, the N pieces of data and the intention score of each of them are acquired from the preset database according to the request information. The N pieces of data are divided into T data intervals according to the range to which they belong and their intention scores, and the intention conversion rate of each data interval is obtained from the intention scores of the data it contains. The intention conversion rate corresponding to the requested data is then determined from the value corresponding to the requested data, the reference value corresponding to the requested data, and the reference intention conversion rate corresponding to the requested data. Finally, a target data interval whose intention conversion rate matches that of the requested data is screened out from the T data intervals, and the requested data are extracted from it according to the requested quantity.

The data intervals have different intention conversion rates, meaning the data in different intervals differ in quality, and data are extracted from the target data interval matching the intention conversion rate of the requested data. The extracted data therefore match the value of the requested data; that is, the quality of the extracted data matches the incentive value paid by the marketer. Further, because the target data extracted for each marketer match the value of the data that marketer requested, the data are distributed reasonably.
Based on the same inventive concept, embodiments of the present invention also provide a computer program product, which when run on a computer, causes the computer to execute the data extraction method in the embodiment shown in fig. 2.
Based on the same inventive concept, embodiments of the present invention also provide a computer-readable storage medium, which includes computer-readable instructions, and when the computer-readable instructions are read and executed by a computer, the computer-readable instructions cause the computer to perform the data extraction method in the embodiment shown in fig. 2.
It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method of data extraction, the method comprising:
acquiring request information, wherein the request information comprises the quantity of requested data, the range of the requested data and the value corresponding to the requested data;
acquiring N pieces of data and an intention score of each of the N pieces of data from a preset database according to the request information, wherein the intention score represents the probability that the user corresponding to a piece of data gives positive feedback on all data in the range to which that piece of data belongs, and the range to which the N pieces of data belong matches the range to which the requested data belongs; dividing the N pieces of data into T data intervals according to the range to which they belong and the intention score of each of them, wherein each of the T data intervals comprises at least one of the N pieces of data; obtaining an intention conversion rate of each data interval according to the intention score of the at least one piece of data included in that data interval, wherein the intention conversion rate represents the quality of the data in each of the T data intervals; wherein N ≥ T > 0;
determining the intention conversion rate corresponding to the requested data according to the value corresponding to the requested data, the reference value corresponding to the requested data and the reference intention conversion rate corresponding to the requested data;
and screening out target data intervals with intention conversion rates matched with the intention conversion rates corresponding to the requested data from the T data intervals, and extracting the requested data from the target data intervals according to the quantity of the requested data.
2. The method according to claim 1, wherein the request message further includes seed data, and the intention scores of the N pieces of data are determined by:
determining negative sample data from P pieces of data in the preset database, wherein the P pieces of data comprise the N pieces of data and P ≥ N;
and training a prediction model on the seed data and the negative sample data, and using the prediction model to predict each of the P pieces of data to obtain the intention scores of the P pieces of data.
3. The method according to claim 1 or 2, wherein the obtaining N pieces of data from the preset database comprises:
acquiring, from the preset database according to the range to which the requested data belongs, the N pieces of data whose cooling time is less than a preset threshold, wherein the cooling time represents, for each of the N pieces of data, the time interval between two consecutive occasions on which that piece of data is used by a user or recommended to a user.
4. The method of claim 3, wherein the requested data comprises first data, and wherein, after the requested data is extracted from the target data interval, the method further comprises:
updating a cooling time of the first data if it is determined that the first data is used.
5. A data extraction apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire request information, wherein the request information comprises the quantity of requested data, the range to which the requested data belongs, and the value corresponding to the requested data;
the processing module, configured to: acquire N pieces of data and an intention score of each of the N pieces of data from a preset database according to the request information, wherein the intention score represents the probability that the user corresponding to each of the N pieces of data gives positive feedback on other data in the range to which that piece of data belongs, and the range to which the N pieces of data belong matches the range to which the requested data belongs; divide the N pieces of data into T data intervals according to the range to which they belong and the intention score of each of them, wherein each of the T data intervals comprises at least one of the N pieces of data; obtain the intention conversion rate of each data interval according to the intention score of the at least one piece of data included in that data interval; and determine the intention conversion rate corresponding to the requested data according to the value corresponding to the requested data, the reference value corresponding to the requested data, and the reference intention conversion rate corresponding to the requested data, wherein the intention conversion rate represents the quality of the data in each of the T data intervals; wherein N ≥ T > 0;
and the extraction module is used for screening out a target data interval of which the intention conversion rate is matched with the intention conversion rate corresponding to the requested data from the T data intervals, and extracting the requested data from the target data interval according to the quantity of the requested data.
6. The apparatus of claim 5, wherein the request message further includes seed data, and wherein the intention score of the N pieces of data is determined by:
determining negative sample data from P pieces of data in the preset database, wherein the P pieces of data comprise the N pieces of data and P ≥ N;
and training a prediction model on the seed data and the negative sample data, and using the prediction model to predict each of the P pieces of data to obtain the intention scores of the P pieces of data.
7. The apparatus of claim 5 or 6, wherein the processing module is configured to:
acquire, from the preset database according to the range to which the requested data belongs, the N pieces of data whose cooling time is less than a preset threshold, wherein the cooling time represents, for each of the N pieces of data, the time interval between two consecutive occasions on which that piece of data is used by a user or recommended to a user.
8. The apparatus of claim 7, wherein the requested data comprises first data, and after the extraction module extracts the requested data from the target data interval, the processing module is further configured to:
updating a cooling time of the first data if it is determined that the first data is used.
9. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any of claims 1 to 4.
10. A data extraction device characterized in that the data extraction device comprises: a processor, and a memory storing computer program instructions; the processor reads and executes the computer program instructions to implement the data extraction method of any one of claims 1 to 4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910231855.8A CN111753179B (en) 2019-03-26 2019-03-26 Data extraction method and device

Publications (2)

Publication Number Publication Date
CN111753179A CN111753179A (en) 2020-10-09
CN111753179B true CN111753179B (en) 2021-06-15

Family

ID=72671057

Country Status (1)

Country Link
CN (1) CN111753179B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant