CN113590692A - Three-stage crowd mining condition optimization method and system - Google Patents

Publication number
CN113590692A
Authority
CN
China
Prior art keywords
crowd
stage
model
data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110914921.9A
Other languages
Chinese (zh)
Inventor
陈子妍
袁亦韧
林炯佑
李炳辉
姬小庆
魏文俊
杨睿通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Fibonacci Information Technology Co ltd
Original Assignee
Suzhou Fibonacci Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Suzhou Fibonacci Information Technology Co ltd
Priority to CN202110914921.9A
Publication of CN113590692A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention relates to a three-stage crowd mining condition optimization method and system, and belongs to the field of data processing. Within a user operation platform, it provides users with target sample crowd access, data mining models, a three-stage model optimization process, backtesting on historical operation data, and periodic target crowd extraction. This automates model training and rule-based crowd extraction, allows flexible screening of the three-stage rules, supports chart-assisted detection of crowd tendencies, and greatly reduces the repetitive daily work of operations staff. The three-stage crowd mining condition optimization method has been approved by a large number of professional operators and applied to daily operation management of existing users. Through backtest verification and three-stage model optimization, the operational testing cost of the expanded crowd is greatly reduced, the effect of operations and new-user acquisition is remarkably improved, and overall operational risk is effectively controlled.

Description

Three-stage crowd mining condition optimization method and system
Technical Field
The invention belongs to the field of data processing, and relates to a three-stage crowd mining condition optimization method and system.
Background
In the internet environment, more and more small and medium-sized enterprises rely on mining their existing customers and exploring customer preference trends to achieve business goals such as cost reduction, innovation and risk control.
At present, the operation management platforms on the market for mining existing customers include data management platforms, advertising platforms, data mining model platforms and the like. All of them have some data access capability, let users maintain their own existing users, automatically select an expansion crowd package according to labels or algorithms, and then connect it to the expansion process.
The key step of selecting the crowd package is driven by coarse-grained labels or by clustering and regression algorithms and is opaque to the user. The quality of the crowd package is controlled by the operation platform, targeting accuracy is not controllable, customization capability is lacking, and the user is forced into a passive position.
Disclosure of Invention
In view of this, the present invention aims to provide a three-stage crowd mining condition optimization method and system that solve the following problems:
First, the platform's data mining model parameters are not sufficiently open, so users cannot adjust model training parameters themselves.
Second, the platform is incompatible with users' own data mining models, so such models cannot be merged into the platform's general workflow.
Third, fine-grained screening of the data mining crowd result set is insufficiently supported, so the target crowd cannot be located accurately.
Fourth, in the initial experimental stage, support for predicting crowd effect is insufficient, so operational testing cost cannot be controlled and the operational effect cannot be guaranteed.
The invention ultimately provides a visual display of the optimization results, supports flexible screening by fine-grained labels and correlation indexes, supports combining and stacking the crowd's natural attributes, business attributes, device labels and industry attributes, and locates the target crowd package to be expanded. To a certain extent, it also helps the user quantitatively control target crowd accuracy.
To achieve the above purpose, the invention provides the following technical solution:
A three-stage crowd mining condition optimization method comprises the following steps:
S1: acquiring raw data of the audience sample crowd; acquiring the five-level classification labels related to the interest attributes of the audience sample crowd, together with its natural attributes, business attributes, device attributes and industry attributes;
S2: in the first (training) stage, accessing a custom model or selecting one of the multi-class quantitative correlation data mining models provided by the platform, submitting a training task according to the custom model or the model parameters, and obtaining the first-stage target crowd result set;
S3: in the second (verification) stage, performing secondary, tendency-assisted interest label screening based on the five-level classification and TGI (Target Group Index) statistics of the coarse-grained crowd interest labels from the first stage, combined with a tag-population quantile chart of the TGI index at 0.1% precision, drawn after edge data smoothing;
S4: in the third (integration) stage, adding forward or reverse screening conditions from the attribute matching condition tree according to the coverage statistics of the natural attribute, business attribute, device attribute and industry attribute label dimensions; the conditions support intersection, union and difference combination logic, help the user pin down the characteristic parameters of the target crowd, and converge the crowd result set;
S5: after the three-stage optimization of S2-S4, finally selecting a crowd mining model and generating a relatively stable combination of target crowd characteristic parameters, namely the target crowd rule set;
S6: carrying out periodic target crowd operations according to the target crowd rule set;
S7: in the initial experimental stage, letting the user select historical operation data and run a backtest of the target crowd's operational effect to verify the target crowd rule set generated by the three stages.
Optionally, in S2, a locally uploaded target audience sample crowd package is accessed through the page;
a target crowd data mining model provided by the platform is selected by the user; the model data time interval, the model data domain, the model label granularity level (from level one to level five), the training positive sample crowd, the training negative sample crowd or training overall sample crowd, and the model's internal parameters are set; and a training mining task is submitted.
Optionally, in S2, the model data domain, the target expansion magnitude and the training positive sample crowd are set, and an intelligent crowd expansion task is submitted without selecting a specific model;
according to the label list covered by the training positive sample crowd, the model with the best historical operation backtest effect among the platform's own correlation models is determined, and a crowd of the fixed magnitude is mined for the specified positive sample.
Optionally, in S2, a custom model is selected, a standard model code package is provided, custom model parameters are entered, a training sample is selected, and a training mining task is submitted.
Optionally, a correlation index quantile chart and data samples of the target crowd information are displayed to the user according to the first-stage target crowd result set;
on the one hand, the user can screen interest categories through the displayed Chinese classification, which helps remove obvious noise and abnormal data and narrows the effective crowd range;
on the other hand, using the tag-population quantile chart of the TGI index at 0.1% precision, the user can check the TGI value and the overall-population coverage at a specified quantile, select a TGI index value, screen samples by combining it with the overall-population coverage and label keywords, and see in real time how many labels the current feature combination covers.
Optionally, the screening of the interest categories includes gender, age, price, mobile phone brand, mobile phone model, province, city tier, spending power, occupation, marital status, risk level and industry attributes.
Optionally, according to the target crowd rule set, in the initial experimental stage, historical operation data for a specified time, business type and data domain is selected, index statistics of the successful reach count, click count and download count of the target crowd's historical operations are computed, and the overall-population operational effect is provided as a reference for comparison;
in the middle and later stabilization stage, a relative time interval, business type, data domain and extraction period are selected, and routine extraction of the target crowd is carried out.
A computer system comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, the processor when executing the computer program implementing the method of any one of claims 2 or 3 or 4 or 6 or 7.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 2 or 3 or 4 or 6 or 7.
The invention has the beneficial effects that:
First, the invention meets, to a certain extent, the user's need to actively intervene in the result set of data mining model training; the three stages let the user keep refining the current target crowd screening logic so that the result set is accurate, effective and well-defined.
Second, within a user operation platform, the invention provides target sample crowd access, data mining models, a three-stage model optimization process, backtesting on historical operation data and periodic target crowd extraction. This automates model training and rule-based crowd extraction, allows flexible screening of the three-stage rules, supports chart-assisted detection of crowd tendencies, and greatly reduces repetitive daily operations work. The three-stage crowd mining condition optimization method has been approved by a large number of professional operators and applied to daily operation management of existing users.
Third, through backtest verification and three-stage model optimization, the operational testing cost of the expanded crowd is greatly reduced, the effect of operations and new-user acquisition is remarkably improved, and overall operational risk is effectively controlled.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Drawings
For the purposes of promoting a better understanding of the objects, aspects and advantages of the invention, reference will now be made to the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a schematic three-stage flow diagram of the present invention;
FIG. 2 is a schematic flow chart of a first training phase according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a second training phase according to an embodiment of the present invention;
FIG. 4 is a flow chart illustrating a third training phase according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of a verification phase according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating an example of a second verification phase according to an embodiment of the present invention;
FIG. 7 is a flow chart of an integration phase according to an embodiment of the present invention;
FIG. 8 is a flow chart of a third example integration phase according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a data retrieval and extraction application flow according to an embodiment of the present invention;
FIG. 10 is a flow chart of a first extraction application according to an embodiment of the present invention;
FIG. 11 is a flow chart of a second extraction application according to an embodiment of the present invention;
FIG. 12 is a flow chart of a third extraction application according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention in a schematic way, and the features in the following embodiments and examples may be combined with each other without conflict.
The drawings are for illustrative purposes only; they are schematic rather than pictorial and are not to be construed as limiting the invention. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product. It will be understood by those skilled in the art that certain well-known structures in the drawings and their descriptions may be omitted.
The same or similar reference numerals in the drawings of the embodiments correspond to the same or similar components. In the description of the present invention, terms indicating an orientation or positional relationship, such as "upper", "lower", "left", "right", "front" and "rear", are based on the orientation or positional relationship shown in the drawings. They are used only for convenience and simplicity of description and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation. Such terms are therefore illustrative only and are not to be construed as limiting the present invention; their specific meaning can be understood by those skilled in the art according to the specific situation.
In a first aspect, the present disclosure provides a three-stage crowd mining condition optimization method, including:
Acquiring raw data of the audience sample crowd.
Acquiring the five-level classification labels related to the interest attributes of the audience sample crowd, together with its natural attributes, business attributes, device attributes and industry attributes.
In the first (training) stage, the user accesses a custom model or selects one of the multi-class quantitative correlation data mining models provided by the platform, submits a training task according to the custom model or the model parameters, and obtains the first-stage target crowd result set.
In the second (verification) stage, secondary, tendency-assisted interest label screening is performed based on the five-level classification and TGI (Target Group Index) statistics of the coarse-grained crowd interest labels from the first stage, combined with the tag-population quantile chart of the TGI index at 0.1% precision, drawn after edge data smoothing.
In the third (integration) stage, forward or reverse screening conditions are added from the attribute matching condition tree according to the coverage statistics of the natural attribute, business attribute, device attribute and industry attribute label dimensions. These conditions support intersection, union and difference combination logic, help the user pin down the characteristic parameters of the target crowd, and converge the crowd result set.
After the three-stage optimization, a crowd mining model is finally selected and a relatively stable combination of target crowd characteristic parameter conditions is generated (hereinafter referred to as the target crowd rule set).
Periodic target crowd operations can then be carried out according to the target crowd rule set.
In the initial experimental stage, the user can select historical operation data and run a backtest of the target crowd's operational effect to verify the target crowd rule set generated by the three stages.
In a second aspect, with reference to the first training stage of the first aspect, a first possible implementation of the present invention includes:
A locally uploaded target audience sample crowd package is accessed through the page.
The user selects a target crowd data mining model provided by the platform, sets the model data time interval, the model data domain, the model label granularity level (from one to five, five levels in total), the training positive sample crowd, the training negative sample crowd or training overall sample crowd, and the model's internal parameters (which have default values), and submits a training mining task.
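The parameters listed in this first implementation can be pictured as a single task record. The following is a minimal sketch only: the class name `TrainingTask`, its field names, the dates and the validation rules are hypothetical and are not taken from the patent; the sample crowd name and the model code M3 reuse examples that appear later in the description.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TrainingTask:
    """Hypothetical record of the parameters a user submits for a training task."""
    model: str                      # platform model code, e.g. "M3"
    data_start: date                # start of the model data time interval
    data_end: date                  # end of the model data time interval
    data_domain: str                # e.g. an operator or a province
    label_granularity: int          # label level, 1 (coarse) to 5 (fine)
    positive_samples: list          # names of positive sample crowd packages
    comparison_samples: list = field(default_factory=list)  # negative or overall samples
    model_params: dict = field(default_factory=dict)        # empty dict = use defaults

    def validate(self):
        """Check the constraints assumed in this sketch and return the task."""
        if not 1 <= self.label_granularity <= 5:
            raise ValueError("label granularity must be between 1 and 5")
        if self.data_start > self.data_end:
            raise ValueError("data interval start is after its end")
        if not self.positive_samples:
            raise ValueError("at least one positive sample crowd is required")
        return self


task = TrainingTask(
    model="M3",
    data_start=date(2021, 7, 1),    # hypothetical interval
    data_end=date(2021, 7, 23),
    data_domain="Jiangsu",
    label_granularity=5,
    positive_samples=["pos_finish_0723"],
    comparison_samples=["neg_finish_0723"],
).validate()
print(task.model, task.label_granularity)
```

Validating before submission is a design choice of the sketch; the patent only states which parameters are set, not how they are checked.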
In the second aspect, with reference to the first training stage of the first aspect, a second possible implementation of the present invention includes:
The model data domain, the target expansion magnitude and the training positive sample crowd are set, and an intelligent crowd expansion task is submitted without selecting a specific model.
Based on the label list covered by the training positive sample crowd, the system determines which of the platform's own correlation models has the best historical operation backtest effect, and mines a crowd of the fixed magnitude for the specified positive sample.
This function lets the user get started quickly and, drawing on the historical operation data accumulated by the platform, helps the user reach the target crowd expansion magnitude efficiently in the first training stage.
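As a rough illustration of the intelligent expansion task, the sketch below picks the candidate model with the best historical backtest score and then cuts the scored crowd to the requested magnitude. The function names, the use of a single numeric score per model and all sample values are assumptions; the patent does not state how the backtest effect is measured.

```python
def pick_best_model(backtest_scores):
    """Return the model whose historical backtest score is highest."""
    if not backtest_scores:
        raise ValueError("no candidate models")
    return max(backtest_scores, key=backtest_scores.get)


def expand_to_magnitude(user_scores, magnitude):
    """Return the ids of the `magnitude` highest-scoring users."""
    ranked = sorted(user_scores, key=user_scores.get, reverse=True)
    return ranked[:magnitude]


# Hypothetical backtest scores per platform model (e.g. a historical conversion rate)
backtest_scores = {"M1": 0.012, "M2": 0.018, "M3": 0.025}
best = pick_best_model(backtest_scores)

# Hypothetical per-user scores produced by the chosen model
user_scores = {"u1": 0.9, "u2": 0.4, "u3": 0.7}
crowd = expand_to_magnitude(user_scores, magnitude=2)
print(best, crowd)
```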
In the second aspect, with reference to the first training stage of the first aspect, a third possible implementation of the present invention includes:
A custom model is selected, a standard model code package is provided, custom model parameters are entered, a training sample is selected, and a training mining task is submitted.
All three implementations support customizing the type, granularity and parameters of the training model and the corresponding business type.
In a third aspect, a correlation index quantile chart and data samples of the target crowd information are displayed to the user according to the training result set.
On the one hand, the user can screen interest categories through the displayed Chinese classification, which helps remove obvious noise and abnormal data and narrows the effective crowd range.
On the other hand, using the tag-population quantile chart of the TGI index at 0.1% precision, the user can easily check the TGI value and the overall-population coverage at a specified quantile, select a precise TGI index value, screen samples by combining it with conditions such as overall-population coverage and label keywords, and see in real time how many labels the current feature combination covers.
In the second verification stage, this function helps the user locate precise audience screening conditions according to the actual label distribution of the target crowd.
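To make the second-stage screening concrete, the sketch below computes a TGI value per label and keeps the labels that pass a TGI threshold and a minimum overall-population coverage. It assumes the common definition of TGI (share of the target crowd carrying the label divided by the share of the overall population carrying it, times 100); the patent does not give its exact formula, and the function names, sample counts and thresholds are hypothetical.

```python
def tgi(target_with_tag, target_total, overall_with_tag, overall_total):
    """Target Group Index under the common definition (an assumption here)."""
    target_share = target_with_tag / target_total
    overall_share = overall_with_tag / overall_total
    return 100.0 * target_share / overall_share


def screen_tags(tag_stats, target_total, overall_total, min_tgi, min_coverage):
    """Keep tags whose TGI and overall coverage both reach the thresholds."""
    kept = {}
    for tag, (target_n, overall_n) in tag_stats.items():
        value = tgi(target_n, target_total, overall_n, overall_total)
        if value >= min_tgi and overall_n >= min_coverage:
            kept[tag] = round(value, 1)
    return kept


# Hypothetical statistics: tag -> (target users with tag, overall users with tag)
tag_stats = {
    "insurance/auto": (300, 20_000),
    "games/casual": (100, 100_000),
    "finance/funds": (50, 500),
}
selected = screen_tags(tag_stats, target_total=1_000, overall_total=1_000_000,
                       min_tgi=150, min_coverage=1_000)
print(selected)
```

In this sample, the casual games label is dropped because its TGI is about 100 (no bias), and the funds label is dropped because its overall coverage is too small even though its TGI is high.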
In a fourth aspect, the distribution of audience labels other than interests is examined, including gender, age, price, mobile phone brand, mobile phone model, province, city tier, spending power, occupation, marital status, risk level, industry attributes and the like, and these attributes are stacked to achieve multi-layer filtering. These screening conditions support intersection, union and difference combinations.
In the third integration stage, this function lets the user converge the characteristics of the target crowd; manual judgment makes the result more accurate and effective, which keeps the crowd's operational effect stable.
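The multi-layer filtering with forward and reverse conditions might look like the sketch below. The `Condition` class, the attribute names and the sample users are hypothetical, and the attribute matching condition tree is flattened into a plain list for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    attribute: str         # e.g. "gender" or "city_level"
    values: frozenset      # attribute values the condition refers to
    include: bool = True   # True = forward (keep matches), False = reverse (drop matches)


def apply_conditions(users, conditions):
    """Return the ids of users that satisfy every forward and reverse condition."""
    result = set()
    for user_id, attrs in users.items():
        ok = True
        for cond in conditions:
            matches = attrs.get(cond.attribute) in cond.values
            if matches != cond.include:
                ok = False
                break
        if ok:
            result.add(user_id)
    return result


users = {
    "u1": {"gender": "F", "city_level": 1, "phone_brand": "A"},
    "u2": {"gender": "M", "city_level": 2, "phone_brand": "B"},
    "u3": {"gender": "F", "city_level": 3, "phone_brand": "B"},
}
conditions = [
    Condition("gender", frozenset({"F"})),                    # forward condition
    Condition("city_level", frozenset({3}), include=False),   # reverse condition
]
print(apply_conditions(users, conditions))  # {'u1'}
```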
In a fifth aspect, a target crowd rule set is generated through the three-stage model optimization process.
In the initial experimental stage, historical operation data for a specified time, business type and data domain can be selected; indexes such as the successful reach count, click count and download count of the target crowd's historical operations are computed, and the overall-population operational effect is provided as a reference for comparison.
In the middle and later stabilization stage, a relative time interval, business type, data domain and extraction period can be selected to carry out routine extraction of the target crowd.
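The initial-stage backtest comparison can be sketched as follows: the reach, click and download counts of the target crowd are turned into rates and compared with the same rates for the overall population. The function names and all counts are hypothetical, and expressing the comparison as a ratio is an assumption of this sketch.

```python
def rates(reached, clicks, downloads):
    """Click and download rates relative to the number of users reached."""
    if reached == 0:
        return {"click_rate": 0.0, "download_rate": 0.0}
    return {"click_rate": clicks / reached, "download_rate": downloads / reached}


def lift_over_baseline(target, baseline):
    """Ratio of each target-crowd rate to the overall-population rate."""
    t, b = rates(**target), rates(**baseline)
    return {name: (t[name] / b[name] if b[name] else None) for name in t}


# Hypothetical historical operation counts
target_crowd = {"reached": 10_000, "clicks": 500, "downloads": 100}
overall = {"reached": 1_000_000, "clicks": 20_000, "downloads": 5_000}
lift = lift_over_baseline(target_crowd, overall)
print(lift)
```

A ratio above 1 means the rule set's crowd outperformed the overall population on that index in the selected historical data.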
The present invention is described below with reference to the drawings of the embodiments of the present application, and addresses the technical problems in the prior art. It also provides a more flexible data mining model training workflow and optimization steps, supports three-stage visual verification of target crowd correlation, converges the target crowd magnitude with multiple condition sets, and locates the target crowd accurately.
As shown in FIG. 1, the present invention provides a three-stage crowd mining condition optimization method, which specifically includes the following steps.
Step S1: the user operation management platform obtains the audience sample.
The user submits the audience sample crowd details, and the static and dynamic label attributes of the sample crowd are analyzed through sample crowd label detection.
Step S2: in the first (training) stage, a data mining model is selected to mine the target crowd.
The user selects a data mining training model, either a platform correlation model or a self-submitted model.
The necessary parameters are filled in according to the model, and a training task is submitted.
Step S3: in the second (verification) stage, correlation verification and screening are performed using the statistical quantile chart and the label statistics samples.
Based on the training result set generated in step S2, an interest label statistics list of the specified granularity is generated, together with a chart relating the estimated overall-population coverage (not deduplicated) to the tag-population quantiles of the TGI index (0.1% precision) for the specified label dimension. Using these auxiliary statistics, the user verifies the tendency of the relevant labels in the result set and screens the target crowd conditions.
Step S4: in the third (integration) stage, the target crowd is constrained using static attributes.
Based on the condition result set generated in step S3 and the static crowd attributes generated in step S2, target crowd conditions are constrained in light of the operation objective.
The condition sets selected in steps S3 and S4 can be flexibly combined by intersection, union and difference.
Step S5: a historical data backtest or periodic data extraction is performed on the condition rule set.
A rule set is constructed from the training conditions and screening conditions of steps S2, S3 and S4.
Optionally, a small-batch backtest of historical crowd operational effect may be run to verify that the rule set generated in steps S2, S3 and S4 meets the expected operational effect.
If no test is needed, a target crowd extraction mode can be selected directly to extract the target crowd.
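The intersection, union and difference combination of the step S3 and step S4 condition sets maps directly onto set operations. In the minimal sketch below, each condition set is represented by the ids of the users it selects; the variable names and user ids are hypothetical.

```python
# Users selected by each condition set (hypothetical ids)
stage2_interest = {"u1", "u2", "u3", "u4"}   # interest-label screening from step S3
stage3_device = {"u2", "u3", "u5"}           # a forward device condition from step S4
stage3_risky = {"u3"}                        # a reverse (exclusion) condition from step S4

intersection = stage2_interest & stage3_device   # users matching both condition sets
union = stage2_interest | stage3_device          # users matching either condition set
final = intersection - stage3_risky              # difference removes excluded users

print(sorted(intersection), sorted(union), sorted(final))
```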
In one embodiment, as shown in FIG. 2, step S2 includes the following.
Step S211: the models provided by the platform comprise a target index bias model (M1), a time-frequency-consumption model (M2), a multi-positive-sample user scoring model rRDF (M3), a multi-sample time-frequency-consumption model (M4) and a frequent pattern growth model (M5).
The target index bias model reflects how strong or weak the target crowd is within a particular research scope (e.g., region, media audience, product consumption). The stronger the bias, the more homogeneous the target crowd is considered to be and the more likely its members are to share similar interest and behavior trends.
The time-frequency-consumption model reflects the combined performance of the target crowd along the three dimensions of time, frequency and consumption, and is used to distinguish edge users, hardworking users, existing users and cash-cow users. Head users are high-potential, high-value users with the specified interests and behaviors.
In the multi-positive-sample user scoring model rRDF, r represents the label click rate, R the label library collision rate, D the number of label access days, and F the label access frequency. Using a reverse PCA principle and relying on profile analysis of the target crowd, the model determines the main characteristics of the target crowd, then computes multi-sample crowd feature vectors, obtains attribute label weight coefficients, and finally aggregates a user tendency score. This quantifies user tendency, locates similar crowds quantitatively and precisely, and addresses the uneven label coverage and fluctuating marketing effect found in common algorithms.
The multi-sample time-frequency-consumption model applies multi-sample processing to the time-frequency-consumption model, so that after multi-sample training the three-dimensional performance is more uniform and the detection of high-potential, high-value users is more stable and effective.
The frequent pattern growth model reflects the ordering of the interest label cluster sets of the target crowd, identifies the tendency of crowd interest to a certain extent, and efficiently filters out similar crowds.
For example, the user selects the rRDF model for the target crowd package according to the expansion objective, and the form parameters change with the model.
Step S212: according to the nature of the operation business and the time interval, a data domain (e.g., operator, province or city, other data sources), a training business (e.g., insurance, finance, games) and a training start time are selected.
The method supports easy and accurate location of relevant target crowds in many business fields, particularly insurance, finance and games.
Step S213: the statistical label granularity is selected according to the desired training granularity (e.g., level five). Label levels run from one to five, from coarse to fine, and different label granularities affect how subsequent chart information is displayed.
Labels are annotated manually, which improves the reliability of the data, supports accurate location of behavioral interest data, and improves the accuracy of the crowd expansion process.
Step S214: according to the training purpose, several self-accessed positive sample crowds (e.g., pos_finish_0723, pos_finish_high_quality_0723 and pos_finish_balance_0723) and an overall or negative sample crowd (e.g., neg_finish_0723) are selected. This makes full use of the highly relevant, high-intent user data in the user's own existing customer base.
Step S215: selecting the above parameters determines the first-stage condition set. A training task is submitted, its status and progress can be checked in the task list, and the task result serves as the first-stage result set.
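For the rRDF model of step S211, only the last step (aggregating a user tendency score from attribute label weight coefficients) is simple enough to sketch without guessing. The example below assumes the weights are already available and sums them over the labels each user carries. How the weights are derived from r, R, D and F is not specified in the text and is not modeled here; the function name, weights and users are hypothetical.

```python
def tendency_score(user_tags, tag_weights):
    """Sum the weights of the labels a user carries; unknown labels count as 0."""
    return sum(tag_weights.get(tag, 0.0) for tag in user_tags)


# Hypothetical attribute label weight coefficients
tag_weights = {"insurance/auto": 0.5, "finance/funds": 0.25, "games/casual": 0.125}

# Hypothetical users and the labels they carry
users = {
    "u1": {"insurance/auto", "finance/funds"},
    "u2": {"games/casual"},
    "u3": {"insurance/auto", "news/local"},
}
scores = {uid: tendency_score(tags, tag_weights) for uid, tags in users.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

Ranking users by such a score is one way to pick a similar crowd of a given size, which matches the quantitative positioning the model is said to provide.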
In one embodiment, as shown in FIG. 3, step S2 includes.
Step S221: according to the training purpose, a positive sample group (for example: pos _ finish _0723) of the custom access is selected.
Step S222: the selected data field is Jiangsu and the expansion magnitude is 10W.
Step S223: and selecting the parameters to determine a first training stage condition set. And submitting a training task, checking the state and progress of the task in a list, and taking a task result as a first-stage result set.
In one embodiment, as shown in FIG. 4, step S2 includes the following steps.
Step S231: access a custom model implementation packaged as standardized code (the model code is verified through offline integration to ensure that the implementation is correct).
Step S232: select an accessed custom model (e.g., TModel0723) and enter standard custom JSON parameters in the input box. Up to 20 variable parameters are supported, so the custom model can be tuned flexibly.
Step S233: the parameters selected above determine the first (training) stage condition set. Submit the training task, check its state and progress in the task list, and take the task result as the first (training) stage result set.
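The JSON parameter input of step S232 can be checked before a task is submitted. The sketch below is illustrative only: it assumes the parameters arrive as a flat JSON object, and the function name and the parameter names in the example (`learning_rate`, `depth`) are hypothetical. Only the cap of 20 variable parameters comes from the text.

```python
import json

MAX_CUSTOM_PARAMS = 20  # the text allows up to 20 variable parameters


def parse_custom_params(raw_json: str) -> dict:
    """Parse the JSON parameter text and enforce the 20-parameter cap."""
    params = json.loads(raw_json)
    if not isinstance(params, dict):
        raise ValueError("custom parameters must be a JSON object")
    if len(params) > MAX_CUSTOM_PARAMS:
        raise ValueError(
            f"at most {MAX_CUSTOM_PARAMS} parameters are supported, got {len(params)}"
        )
    return params


# Hypothetical parameters for a custom model such as TModel0723.
params = parse_custom_params('{"learning_rate": 0.1, "depth": 6}')
```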
In one embodiment, as shown in FIG. 5, step S3 includes the following steps.
Step S241: based on the first-stage result set, click to view the extraction data. The page displays a chart that relates the TGI index of the labels in the specified dimension, plotted over label-population quantiles at 0.1% precision, to the estimated coverage count in the overall user base (the "large disk"; not deduplicated).
To keep extreme values from distorting the chart, the scatter points of labels covered only by the positive sample are smoothed by local regression, and noise points are filled in (labels that occur only in the comparison sample are assigned a TGI value of -200).
After this special handling of abnormal data, the chart is more robust.
The quantile chart takes the label-population quantile, at 0.1% precision, as the abscissa and the estimated overall coverage count (not deduplicated) as the ordinate. From left to right, the TGI curve generally trends downward: it drops sharply somewhere between the 5% and 50% quantiles and then settles at the noise value of -200. The cumulative coverage curve rises gradually and crosses the TGI curve at a balance point between the 10% and 30% quantiles; it then climbs steeply before flattening out as it approaches the saturation count.
The chart gives an intuitive view of how interest bias is distributed across the whole crowd, and the population-percentage interval can be adjusted flexibly at 0.1% precision to help analyze local changes in the chart.
In addition, the TOP100 sample data at the specified label granularity is listed below the quantile chart. The list fields are: level-five label (and the level-four, level-three, level-two and level-one labels), number of positive samples, number of comparison samples, overall user base count, and TGI value.
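The quantities behind the quantile chart can be sketched in a few lines of Python. The patent does not spell out its TGI formula, so the code assumes the common definition of the Target Group Index (a label's share in the positive sample divided by its share in the comparison sample, times 100). The -200 value for labels seen only in the comparison sample comes from the text; the function names and the choice to return `None` for positive-only labels (leaving them to the smoothing step) are assumptions.

```python
NOISE_TGI = -200.0  # value the text assigns to labels seen only in the comparison sample


def tgi(pos_count, pos_total, cmp_count, cmp_total):
    """Assumed TGI definition: share in the positive sample over share in
    the comparison sample, times 100."""
    if pos_count == 0:
        return NOISE_TGI   # no positive-sample users carry this label
    if cmp_count == 0:
        return None        # positive-only label: left for the smoothing step
    return (pos_count / pos_total) / (cmp_count / cmp_total) * 100.0


def cumulative_coverage(rows):
    """rows: list of (label, tgi_value, coverage) tuples.

    Returns the rows ordered by TGI descending, with the coverage replaced
    by a running total that is not deduplicated across labels."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    total, out = 0, []
    for label, value, coverage in ordered:
        total += coverage
        out.append((label, value, total))
    return out


curve = cumulative_coverage([("a", 150.0, 10), ("b", 300.0, 5), ("c", -200.0, 20)])
```

Ordering by descending TGI matches the described shape of the chart: the TGI curve falls from left to right while the cumulative coverage curve rises.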
Step S242: based on the quantile chart and the list fields, range filters or keyword filters can be applied to the fields.
As shown in FIG. 6, suppose the target crowd requirement is described as follows: the number of positive samples is greater than 100, the level-four label contains "legend", and only the crowd of the TOP50 labels biased toward the positive sample is wanted. The corresponding screening rules are as follows.
Step S2421: for the requirement that the number of positive samples be greater than 100, set the minimum number of positive samples in the filter box to 100.
Step S2422: for the requirement that the level-four label contain "legend", enter "legend" as the level-four label keyword in the filter box.
Step S2423: for the requirement of bias toward the positive sample only, set the minimum TGI value in the filter box to 100.
Step S2424: for the requirement of the TOP50 labels, limit the number of labels in the filter box to 50.
Step S2425: click the selection rule; the page shows [selection rule: 23 labels with TGI value greater than or equal to 100, level-four label containing "legend", TGI score in the TOP50, and positive sample count greater than or equal to 100]. Similarly, the above conditions may be chosen as exclusion conditions, in which case the crowd that has more than 100 positive samples, whose level-four label contains "legend", and that belongs to the TOP50 labels biased only toward the positive sample is excluded from the overall crowd.
The selected and excluded conditions together determine the second (verification) stage condition set.
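The screening rules of steps S2421 to S2425 amount to a filter over the label list followed by a TOP-N cut. A minimal Python sketch, assuming each row is a dictionary with the hypothetical keys `label`, `positive` and `tgi`, and assuming the TOP-N limit is applied after the other filters (the text does not state the order); the sample rows are invented for illustration:

```python
def select_labels(rows, min_positive=None, keyword=None, min_tgi=None, top_n=None):
    """Keep rows passing every given filter, ordered by TGI descending."""
    kept = [
        r for r in rows
        if (min_positive is None or r["positive"] >= min_positive)
        and (keyword is None or keyword in r["label"])
        and (min_tgi is None or r["tgi"] >= min_tgi)
    ]
    kept.sort(key=lambda r: r["tgi"], reverse=True)
    return kept if top_n is None else kept[:top_n]


def exclude_labels(rows, **rule):
    """Use the same rule as an exclusion condition: drop what it selects."""
    selected = {r["label"] for r in select_labels(rows, **rule)}
    return [r for r in rows if r["label"] not in selected]


rows = [
    {"label": "game/legend/mobile", "positive": 150, "tgi": 320.0},
    {"label": "game/legend/web", "positive": 80, "tgi": 400.0},
    {"label": "finance/loans", "positive": 500, "tgi": 250.0},
    {"label": "game/legend/classic", "positive": 120, "tgi": 90.0},
]
rule = dict(min_positive=100, keyword="legend", min_tgi=100, top_n=50)
selected = select_labels(rows, **rule)
remaining = exclude_labels(rows, **rule)
```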
In one embodiment, as shown in FIG. 7, step S4 includes the following steps.
Step S251: based on the second (verification) stage result set, switch to the next page, which displays distribution charts of the natural, business, device and industry attributes of the crowd in the result set.
Wherein, the natural attribute comprises: gender, age, province, city level.
Wherein the business attributes include: consumption ability, occupation, risk rating, marital status.
Wherein the device attributes include: equipment brand, equipment model, operator, equipment price, equipment system.
Wherein the industry attributes include: a bank outlet.
Step S252: in the third integration stage, the above attributes can be superimposed according to the business requirements, and the targeted crowd can be selected.
As shown in FIG. 8, when marketing a game product, suppose students in second- and third-tier cities who use mid-to-high-end Android phones need to be located. The following screening can be performed.
Step S2521: for second- and third-tier cities, select "second-tier city" and "third-tier city" under the city level classification; the two are in an OR relationship.
Step S2522: for mid-to-high-end phones, select "3000 yuan to 5000 yuan" and "more than 5000 yuan" under device price; the two are in an OR relationship with each other and in an AND relationship with step S2521.
Step S2523: for Android phones, select Android as the device system, in an AND relationship with step S2522.
Step S2524: for the student group, select student as the occupation, in an AND relationship with step S2523.
Step S2525: to constrain the student group further, select the age bands 16-18, 19-25 and 26-30; the three are in an OR relationship with each other and in an AND relationship with step S2524.
Step S2526: select the rules on the page and assemble them with the intended intersection logic to screen out the target crowd. In addition, if there is a sensitive region or an attribute that clearly must be excluded, it can be added as an exclusion condition after selection.
Step S253: the attributes support intersection and difference combinations. Once all conditions are selected, the number of users covered by the combined condition in the overall user base (the "large disk") is displayed as an auxiliary reference for the user.
The selected and excluded conditions together determine the third (integration) stage condition set.
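The OR-within-an-attribute, AND-across-attributes logic of the game marketing example can be expressed as a small tree of predicates. This is a sketch under assumptions, not the platform's implementation: the helper names, the attribute keys and the value strings are all hypothetical.

```python
def any_of(attr, *values):
    """OR within one attribute: match if the attribute takes any listed value."""
    return lambda user: user.get(attr) in values


def all_of(*conditions):
    """AND across conditions: match only if every condition matches."""
    return lambda user: all(c(user) for c in conditions)


def exclude(condition):
    """Exclusion condition: match only users the condition does not select."""
    return lambda user: not condition(user)


# Students in second- and third-tier cities on mid-to-high-end Android phones.
rule = all_of(
    any_of("city_tier", "second-tier", "third-tier"),
    any_of("device_price", "3000-5000", "5000+"),
    any_of("device_system", "android"),
    any_of("occupation", "student"),
    any_of("age_band", "16-18", "19-25", "26-30"),
)
# Optionally remove a sensitive region after selection (placeholder value).
rule_with_exclusion = all_of(rule, exclude(any_of("province", "sensitive-region")))

student = {
    "city_tier": "second-tier", "device_price": "3000-5000",
    "device_system": "android", "occupation": "student",
    "age_band": "19-25", "province": "Jiangsu",
}
matched = rule_with_exclusion(student)
```

Building the rule from composable predicates mirrors the condition tree described in the text: each step adds one node, and exclusion conditions wrap an existing node rather than changing it.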
After the three-stage condition set is determined, relevant condition association logic is stored and is solidified into a rule set.
In one embodiment, step S5 includes the following steps.
For a newly added rule set, as shown in FIG. 9, a backtest can be run on a small batch of historical operation data.
Step S261: select a rule set and click the backtest function.
Step S262: fill in the form data: data field, service type and historical time interval.
Step S263: confirm the form parameters and submit the backtest task.
Step S264: after the task is completed, select the backtest record under the rule set's operations to display the statistical results.
Step S265: if the backtest result meets expectations, apply the rule set for periodic extraction.
Step S266: if the backtest result does not meet expectations, return to the three stages and continue optimizing.
The backtest statistics include: the start time T1 and end time T2 of the training-stage historical data, the number of successfully reached users U1, the reach success rate U1R, the number of user clicks C1, the user click rate C1R, the number of user downloads D1, the user download rate D1R, the number of users covered by the rule set R1, the number of users in the intersection R1U1, the number of users in the intersection R1C1, and, for the overall user base (the "large disk") within the rule set's time interval, the reach success rate TUR, the click rate TCR and the download rate TDR.
These indexes compare the historical operation effect of the current rule set with the overall operation effect of the whole user base in the specified time period.
They give the user a view of the rule set's historical operation effect, from which the user can judge the potential and quality of the users selected by the rule set.
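The intersection counts in the backtest statistics can be illustrated with plain set operations. In the sketch below the function name and the sample user IDs are hypothetical, and the rate definitions are an assumption: the text names the rates but does not give their denominators, so clicks are divided by successfully reached users here purely for illustration.

```python
def backtest(rule_users, reached, clicked):
    """All arguments are sets of user IDs from the historical interval."""
    r1u1 = rule_users & reached   # rule-set users who were successfully reached
    r1c1 = rule_users & clicked   # rule-set users who clicked
    return {
        "R1": len(rule_users),
        "R1U1": len(r1u1),
        "R1C1": len(r1c1),
        # Assumed definition: clicks divided by successful reaches.
        "rule_click_rate": len(r1c1) / len(r1u1) if r1u1 else 0.0,
        "overall_click_rate": len(clicked) / len(reached) if reached else 0.0,
    }


stats = backtest(
    rule_users={1, 2, 3, 4},
    reached={1, 2, 5, 6, 7, 8, 9, 10},
    clicked={1, 5},
)
```

Comparing the rule-set rate with the overall rate is what lets a user judge whether the rule set selects better-than-average users.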
If the backtest data meets expectations, the rule set can be put into periodic operation.
As shown in FIG. 10, extraction type one is as follows.
Step S26511: select the time interval of the historical data to operate on.
Step S26512: select single immediate extraction.
Step S26513: submit the extraction task; the target crowd is output promptly according to the rule set and delivered for use.
As shown in FIG. 11, extraction type two is as follows.
Step S26521: select the time interval of the historical data to operate on.
Step S26522: select a designated date and a fixed time for a single scheduled extraction.
Step S26523: submit the extraction task; the target crowd is generated on schedule according to the rule set and delivered for use.
As shown in FIG. 12, extraction type three is as follows.
Step S26531: select the time interval of the historical data to operate on.
Step S26532: select a recurring period (Monday to Sunday) and a fixed time for periodic scheduled extraction.
Step S26533: to meet daily production needs, the target crowd is produced on schedule each day according to the rule set and delivered for use.
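The three extraction types differ only in when the extraction task runs. A minimal Python sketch of that timing decision, assuming hypothetical mode names ("immediate", "scheduled", "periodic") and a hypothetical function name; it uses only the standard `datetime` module and says nothing about how the platform actually schedules tasks.

```python
from datetime import datetime, time, timedelta


def next_run(now, mode, run_at=None, weekdays=(), time_of_day=None):
    """Return the next extraction time for one of the three extraction types.

    weekdays uses datetime.weekday() numbering (Monday is 0).
    """
    if mode == "immediate":                      # type one: single immediate extraction
        return now
    if mode == "scheduled" and run_at is not None:
        return run_at                            # type two: designated date and time
    if mode == "periodic" and time_of_day is not None:
        for offset in range(8):                  # today plus the following seven days
            day = (now + timedelta(days=offset)).date()
            candidate = datetime.combine(day, time_of_day)
            if candidate > now and candidate.weekday() in weekdays:
                return candidate                 # type three: recurring weekday and time
    raise ValueError("no run time could be determined for these settings")


now = datetime(2021, 8, 10, 9, 0)  # a Tuesday
monday_run = next_run(now, "periodic", weekdays={0}, time_of_day=time(8, 0))
```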
In production use on a user operation platform, the method provides the user with target sample crowd access, data mining models, a three-stage model optimization process, backtesting on historical operation data and periodic target crowd extraction. This automates model training and rule-based crowd extraction, allows flexible screening with three-stage rules, supports chart-assisted detection of crowd tendencies, and greatly reduces repetitive day-to-day operation work.
The three-stage crowd mining condition optimization method has been accepted by a large number of professional operators and applied to the day-to-day operation and management of existing (stock) users.
Through backtest verification and three-stage model optimization, the cost of operation testing is greatly reduced, the operation and innovation effect is markedly improved, and operation risks are effectively avoided.
In the first (training) stage, multiple models are made available for users to choose from, dynamic adjustment of model parameters is supported, and access of the user's own models is supported. This largely addresses the problems that many data mining platforms do not expose enough model parameters and cannot accommodate deployment of a user's own models.
In the second (verification) stage, the TGI index label-population distribution chart of the five-level classification labels is displayed at a fine granularity of 0.1% precision, Chinese-language classification screening is used, and precise screening is carried out by combining the cumulative coverage count, the TGI index in descending order, and the natural, business, device and industry attributes of the crowd. Compared with the many data management platforms on the market that extract crowds only through first- and second-level broad categories, this scheme relies on three-stage progressive fine-grained label screening. It largely addresses the insufficient support for fine-grained screening of data mining crowd result sets, achieves directional crowd mining, and guides users to learn and master the usual approach to model optimization.
In the third (integration) stage, many data management platforms on the market offer only blind selection of label combinations, so the effect of small-batch crowd tests in the initial experimental stage is unpredictable. By contrast, this scheme exposes to the user the static label distribution statistics of the crowd remaining after first- and second-stage filtering, which helps the user screen the result set further in a targeted way and largely avoids the uneven result-set distribution and unstable effects caused by incomplete label coverage and subjective blind selection.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, or a separate or integrated computer platform, or one in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, an optically readable and/or writable storage medium, RAM, ROM, or the like, such that, when the storage medium or device is read by a programmable computer, it configures and operates the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The present invention also includes the computer itself when programmed with the three-stage crowd mining condition optimization method and technique described in the present invention.
A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
Finally, the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the present invention, and although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions, and all of them should be covered by the claims of the present invention.

Claims (9)

1. A three-stage crowd mining condition optimization method, characterized by comprising the following steps:
S1: acquiring raw data of the audience sample crowd; acquiring the five-level classification labels related to the interest attributes of the audience sample crowd, together with the natural attributes, business attributes, device attributes and industry attributes;
S2: in the first (training) stage, self-accessing a model or selecting one of the multiple classes of quantitative correlation data mining models provided by the platform, submitting a training task according to the custom model or the model parameters, and obtaining the first-stage target crowd result set;
S3: in the second (verification) stage, according to the five-level classification and TGI index statistics of the coarse-grained crowd-related interest labels from the first stage, performing second-stage auxiliary screening of tendency interest labels in combination with the TGI index label-population quantile chart at 0.1% precision after edge data smoothing;
S4: in the third (integration) stage, supplementing forward or reverse screening conditions from the attribute matching condition tree according to the coverage statistics over the natural attribute, business attribute, device attribute and industry attribute label dimensions; the conditions support intersection and difference combination logic, assisting the user in specifying the characteristic parameters of the target crowd and converging the crowd result set;
S5: according to the three-stage optimization of S2 to S4, finally selecting a crowd mining model and generating a relatively stable combination set of target crowd characteristic parameters, namely a target crowd rule set;
S6: carrying out periodic target crowd operation according to the target crowd rule set;
S7: in the initial experimental stage, supporting the user in selecting historical operation data and backtesting the operation effect of the target crowd, so as to verify the target crowd rule set generated by the three stages.
2. The three-stage crowd mining condition optimization method of claim 1, wherein: in S2, a locally uploaded target audience sample crowd package is accessed through the page;
a target crowd data mining model provided by the platform is selected by the user; the model data time interval, the model data field, the model label granularity level (from level one to level five), the training positive sample crowd, the training negative sample crowd or training overall sample crowd, and the model's internal parameters are set; and a training mining task is submitted.
3. The three-stage crowd mining condition optimization method of claim 1, wherein: in S2, a model data field, a target expansion magnitude and a training positive sample crowd are set, and an intelligent crowd expansion task is submitted without selecting a designated model;
according to the list of labels covered by the training positive sample crowd, the model with the best historical-operation backtest effect among the related models is computed, and a crowd of the fixed magnitude is mined for the specified positive sample.
4. The three-stage crowd mining condition optimization method of claim 1, wherein: in S2, a custom model is selected, a standard model code package is provided, custom model parameters are input, a training sample is selected, and a training mining task is submitted.
5. The three-stage crowd mining condition optimization method of claim 2, 3 or 4, wherein: according to the first-stage target crowd result set, the relevance index quantile chart and data samples of the target crowd information are displayed to the user;
on the one hand, the user can screen interest categories through the displayed Chinese-language classification, which helps the user remove conspicuous noise data and abnormal data and narrow the effective crowd range;
on the other hand, through the TGI index label-population quantile chart at 0.1% precision, the user can view the TGI and the overall user base ("large disk") coverage count at a designated quantile, select a TGI index value, screen samples by combining conditions on the overall coverage count and label keywords, and view in real time the number of labels covered by the current feature combination.
6. The three-stage crowd mining condition optimization method of claim 5, wherein: the screening of interest categories covers gender, age, price, mobile phone brand, mobile phone model, province, city level, consumption capacity, occupation, marital status, risk level and industry attributes.
7. The three-stage crowd mining condition optimization method of claim 6, wherein: in the initial experimental stage, historical operation data for a designated time, business type and data field are selected according to the target crowd rule set, index statistics on the number of successful reaches, clicks and downloads in the historical operation of the target crowd are computed, and the overall ("large disk") operation effect is provided for reference and comparison;
and in the middle and later stable stage, a relative time interval, business type, data field and extraction period are selected, and routine extraction of the target crowd is performed.
8. A computer system comprising a memory, a processor, and a computer program stored on the memory and capable of running on the processor, wherein: the processor, when executing the computer program, implements the method of any of claims 2 or 3 or 4 or 6 or 7.
9. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the method of any one of claims 2 or 3 or 4 or 6 or 7.
CN202110914921.9A 2021-08-10 2021-08-10 Three-stage crowd mining condition optimization method and system Pending CN113590692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110914921.9A CN113590692A (en) 2021-08-10 2021-08-10 Three-stage crowd mining condition optimization method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110914921.9A CN113590692A (en) 2021-08-10 2021-08-10 Three-stage crowd mining condition optimization method and system

Publications (1)

Publication Number Publication Date
CN113590692A true CN113590692A (en) 2021-11-02

Family

ID=78256888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110914921.9A Pending CN113590692A (en) 2021-08-10 2021-08-10 Three-stage crowd mining condition optimization method and system

Country Status (1)

Country Link
CN (1) CN113590692A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591969A (en) * 2024-01-18 2024-02-23 知呱呱(天津)大数据技术有限公司 Rule checking method and system based on IPC label co-occurrence
CN117591969B (en) * 2024-01-18 2024-04-05 北京知呱呱科技有限公司 Rule checking method and system based on IPC label co-occurrence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination