CN107067144A - Method and device for mining operation base dimensions - Google Patents
- Publication number
- CN107067144A (application CN201611263960.2A)
- Authority
- CN
- China
- Prior art keywords
- dimension
- data
- base dimension
- base
- business
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Educational Administration (AREA)
- Economics (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method and device for mining operation base dimensions. The method includes: building a sample set P = {x1, x2, ..., xi} from the collected operation dimension data; computing the covariance matrix XX^T of the sample set P; performing eigenvalue decomposition on the covariance matrix XX^T to obtain the eigenvalues; and determining as a base dimension the dimension constructed from the eigenvector corresponding to one of the eigenvalues. By analyzing the relationships among the operation dimensions, the invention can accurately identify base dimensions and guide business decisions.
Description
Technical field
The present invention relates to the field of data mining technology, and more particularly to a method and device for mining operation base dimensions.
Background technology
Product operation manages a product's content and users through three activities: content building, user maintenance, and campaign planning. Operation is the key to a product's sustained, healthy growth. In an era where "traffic is king", operation channels and methods keep multiplying in the race for traffic, and fine-grained operation tailored to different scenarios and user attributes matters more and more. Specifically, mining the information hidden in massive data, using data to profile user attributes and scenario characteristics, and customizing marketing strategies for each user group can effectively address growth and retention problems in traffic operation, user operation, product operation, and content operation. In data-driven operation there are many observable statistical dimensions, such as a product's PV (page views), UV (unique visitors), and page click-through rate. The relationships among these dimensions are complex, with a great deal of redundant and overlapping information. In live streaming, for example, the dimension "cumulative recharge amount over the last 3 days" is positively associated with "cumulative recharge amount over the last 7 days": when the former is high, the latter is generally high as well. In other words, the information carried by the dimensions overlaps, and one dimension can to some extent be computed linearly from other related dimensions. Similarly, "cumulative revenue over the last 3 days" is associated with "cumulative streaming duration over the last 3 days": the longer the streaming time, the larger the revenue. Such numerous and complicated operation dimensions easily trap operations personnel in information overload, making it hard to gauge the state of the product accurately and make suitable decisions. How to pick out a small number of key dimensions, that is, base dimensions, from these operation dimensions is a technical difficulty. Faced with so many operation dimensions, manual screening is laborious. As far as the known literature shows, there is currently no research on, or method for, automatically identifying operation base dimensions.
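The claim that one dimension can partly be computed linearly from another can be checked with a correlation coefficient. The following is a minimal sketch, assuming made-up recharge figures for five anchors; the variable names and numbers are illustrative only and do not come from the patent.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical recharge amounts for five anchors (made-up numbers).
recharge_3d = [120.0, 40.0, 300.0, 15.0, 220.0]
recharge_7d = [260.0, 90.0, 610.0, 50.0, 480.0]

r = pearson(recharge_3d, recharge_7d)
print(f"correlation between 3-day and 7-day recharge: {r:.3f}")
```

A coefficient close to 1 means the two dimensions carry largely the same information, which is the redundancy the background describes.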
Conventional approaches generally screen key dimensions manually; in live streaming, for example, "cumulative revenue over the last 7 days" and "average PCU over the last 7 days" are used as key dimensions. However, manually screened dimensions cannot fully describe all states of the product. For example, "cumulative revenue over the last 7 days" and "cumulative revenue over the last 3 days" are not only partly redundant but also different: the former cannot fully cover the information contained in the latter, nor can it replace the statistical characteristics of the latter. On the one hand, simply hand-picking 10 key dimensions out of 100 operation dimensions loses information; in other words, conventional approaches cannot determine base dimensions accurately. On the other hand, manual methods are subjective and labor-intensive, and the selection rules are hard to codify and reuse.
Summary of the invention
In view of the above problems, the present invention proposes a method and device for mining operation base dimensions that can accurately identify base dimensions by analyzing the relationships among the operation dimensions, thereby guiding business decisions.
An embodiment of the present invention provides a method for mining operation base dimensions, including:
building a sample set P = {x1, x2, ..., xi} from the collected operation dimension data;
computing the covariance matrix XX^T of the sample set P;
performing eigenvalue decomposition on the covariance matrix XX^T to obtain the eigenvalues;
determining as a base dimension the dimension constructed from the eigenvector corresponding to one of the eigenvalues.
Preferably, after the step of determining as a base dimension the dimension constructed from the eigenvector corresponding to one of the eigenvalues, the method further includes:
identifying the top-ranked base dimensions according to the ordering of the eigenvalues.
Preferably, before the step of building the sample set from the collected operation dimension data, the method further includes:
collecting dimension data of an operation subject from the server side of the business platform, the dimension data including at least one of business playback data, business revenue data, and business interaction data.
Preferably, before the step of building the sample set from the collected operation dimension data, the method further includes:
collecting dimension data of an operation subject from the user's client, the dimension data including at least one of user viewing data, user activity data, and user retention data.
Preferably, after the step of identifying the top-ranked base dimensions, the method includes:
obtaining, according to a preset cumulative eigenvalue importance threshold, a base dimension set composed of at least one top-ranked base dimension.
Correspondingly, an embodiment of the present invention provides a device for mining operation base dimensions, including:
a sample construction unit, configured to build a sample set P = {x1, x2, ..., xi} from the collected operation dimension data;
a spatial transformation unit, configured to compute the covariance matrix XX^T of the sample set P;
a feature decomposition unit, configured to perform eigenvalue decomposition on the covariance matrix XX^T to obtain the eigenvalues;
a base dimension determination unit, configured to determine as a base dimension the dimension constructed from the eigenvector corresponding to one of the eigenvalues.
Preferably, the device further includes:
a base dimension ranking unit, configured to identify the top-ranked base dimensions according to the ordering of the eigenvalues.
Preferably, the device further includes:
a business dimension unit, configured to collect dimension data of an operation subject from the server side of the business platform, the dimension data including at least one of business playback data, business revenue data, and business interaction data.
Preferably, the device further includes:
a user dimension unit, configured to collect dimension data of an operation subject from the user's client, the dimension data including at least one of user viewing data, user activity data, and user retention data.
Preferably, the base dimension ranking unit includes:
a base dimension set unit, configured to obtain, according to a preset cumulative eigenvalue importance threshold, a base dimension set composed of at least one top-ranked base dimension.
The present invention proposes a scheme for automatically mining operation base dimensions. First, a sample set P = {x1, x2, ..., xi} is built from the collected operation dimension data. Unlike the prior art, the user need not consider information overlap or redundancy among the samples, and the samples need not be screened or classified manually or by machine. Instead, the covariance matrix XX^T of the sample set P is computed, and the relationships and information redundancy among the dimensions are analyzed through the spatial transformation that the covariance matrix defines. Next, eigenvalue decomposition is performed on the covariance matrix XX^T to obtain the eigenvalues, from which base dimensions are constructed automatically. The base dimensions carry no redundant information among themselves yet describe the state of the product comprehensively, so a small number of base dimensions can represent the information content of the full set of operation dimensions. Finally, the dimension constructed from the eigenvector corresponding to one of the eigenvalues is determined to be a base dimension. By analyzing the relationships among the operation dimensions, this scheme identifies base dimensions simply, quickly, and accurately, and guides business decisions. For live streaming in particular, it can help discover potential internet-celebrity anchors, evaluate high-quality anchors, and so on. Furthermore, it gives operators a deeper understanding of each operation indicator, including how to group the indicators and what essentially drives anchor rankings, which in turn guides business decisions.
Additional aspects and advantages of the present invention will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for mining operation base dimensions according to the present invention.
Fig. 2 is a flow chart of an embodiment of the method for mining operation base dimensions according to the present invention.
Fig. 3 is a schematic diagram of dimension data collection in an embodiment of the present invention.
Fig. 4 is a schematic diagram of a sample represented as a vector in space in an embodiment of the present invention.
Fig. 5 is a schematic diagram of a device for mining operation base dimensions according to the present invention.
Fig. 6 is a schematic diagram of an embodiment of the device for mining operation base dimensions according to the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings of the embodiments.
Some of the flows described in the specification, the claims, and the above drawings contain multiple operations that appear in a particular order. It should be clearly understood, however, that these operations may be executed out of the order in which they appear here, or in parallel. Operation numbers such as 101 and 102 serve only to distinguish the operations from one another and do not themselves imply any execution order. In addition, the flows may include more or fewer operations, and these operations may be executed sequentially or in parallel. Note that terms such as "first" and "second" herein are used to distinguish different messages, devices, modules, and so on; they neither imply an order nor require that the "first" and "second" items be of different types.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Operation dimension: an indicator used to measure how a product is operating, such as the product's PV (page views), UV (unique visitors), or page click-through rate. These indicators can be obtained by measurement or by explicit statistical computation, and they give the product's operations personnel a basis for summarizing, analyzing, and evaluating the product.
Base dimension: an essential factor that describes the operating state of a product; it can be regarded as a condensation of several operation dimensions. Different operation dimensions may be correlated and carry redundant information, whereas base dimensions carry no redundant information among themselves yet describe the state of the product comprehensively. Base dimensions are hidden inside the data and are mostly hard to observe directly; they are the underlying factors behind the operation dimensions. In university rankings, for example, the most essential factors influencing the ranking fall into two classes, natural-science factors and social-science factors, and these are the base dimensions. Such factors are not easy to observe directly; what can be observed are dimensions such as the average undergraduate admission score, the employment rate, and the number of science-and-engineering or humanities papers published by professors.
How to infer and mine base dimensions from the observable operation dimensions is the technical problem the present invention addresses. By analyzing the relationships among the operation dimensions, the present invention provides a new algorithm that finds the base dimensions accurately. Applied to live streaming, base dimensions can guide product operators' decisions, including discovering high-quality anchors and evaluating anchor performance.
Fig. 1 is a flow chart of a method for mining operation base dimensions according to the present invention, including:
S101: build a sample set P = {x1, x2, ..., xi} from the collected operation dimension data;
S102: compute the covariance matrix XX^T of the sample set P;
S103: perform eigenvalue decomposition on the covariance matrix XX^T to obtain the eigenvalues;
S104: determine as a base dimension the dimension constructed from the eigenvector corresponding to one of the eigenvalues.
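The four steps can be sketched end to end on a toy sample set. This is a minimal illustration under stated assumptions, not the patent's implementation: the data are made up, each row is one sample and each column one operation dimension, the samples are centered before the covariance is formed, and power iteration is used here as a simple stand-in for a full eigenvalue decomposition (the patent does not name a solver).

```python
import math

def center(samples):
    """Subtract each dimension's mean so the covariance is taken about zero."""
    p, q = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / p for j in range(q)]
    return [[row[j] - means[j] for j in range(q)] for row in samples]

def covariance(samples):
    """q x q matrix X X^T, where X holds the centered samples as columns."""
    q = len(samples[0])
    return [[sum(row[a] * row[b] for row in samples) for b in range(q)]
            for a in range(q)]

def dominant_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    q = len(matrix)
    v = [1.0 / math.sqrt(q)] * q
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(q)) for a in range(q)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(matrix[a][b] * v[b] for b in range(q))
                     for a in range(q))
    return eigenvalue, v

# S101: made-up sample set P (rows = anchors, columns = operation dimensions).
P = [
    [2.0, 4.1, 1.0],
    [1.0, 2.0, 3.0],
    [3.0, 6.2, 2.0],
    [4.0, 7.9, 2.5],
]
X = center(P)
C = covariance(X)                      # S102: covariance matrix
l1, m1 = dominant_eigenpair(C)         # S103: largest eigenvalue and eigenvector
# S104: the first base dimension of each sample is its projection onto m1.
base_dim = [sum(m * x for m, x in zip(m1, row)) for row in X]
```

In this toy data the first two columns move together, so most of the total variance ends up in the single constructed dimension `base_dim`.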
The present invention proposes a scheme for automatically mining operation base dimensions. First, a sample set P = {x1, x2, ..., xi} is built from the collected operation dimension data. Unlike the prior art, the user need not consider information overlap or redundancy among the samples, and the samples need not be screened or classified manually or by machine. Instead, the covariance matrix XX^T of the sample set P is computed, and the relationships and information redundancy among the dimensions are analyzed through the spatial transformation that the covariance matrix defines. Next, eigenvalue decomposition is performed on the covariance matrix XX^T to obtain the eigenvalues, from which base dimensions are constructed automatically. The base dimensions carry no redundant information among themselves yet describe the state of the product comprehensively, so a small number of base dimensions can represent the information content of the full set of operation dimensions. Finally, the dimension constructed from the eigenvector corresponding to one of the eigenvalues is determined to be a base dimension. By analyzing the relationships among the operation dimensions, this scheme identifies base dimensions simply, quickly, and accurately, and guides business decisions. For live streaming in particular, it can help discover potential internet-celebrity anchors and the like.
The construction of base dimensions is introduced below using live streaming as an example. Specifically, operation dimension data is collected first and base dimensions are then generated from that data; the whole process requires no labeled data.
Fig. 2 is a flow chart of an embodiment of the method for mining operation base dimensions according to the present invention.
S201: collect dimension data of an operation subject from the server side of the business platform, the dimension data including at least one of business playback data, business revenue data, and business interaction data.
S202: collect dimension data of an operation subject from the user's client, the dimension data including at least one of user viewing data, user activity data, and user retention data.
S203: build a sample set P = {x1, x2, ..., xi} from the collected operation dimension data;
S204: compute the covariance matrix XX^T of the sample set P;
S205: perform eigenvalue decomposition on the covariance matrix XX^T to obtain the eigenvalues;
S206: determine as a base dimension the dimension constructed from the eigenvector corresponding to one of the eigenvalues.
S207: identify the top-ranked base dimensions according to the ordering of the eigenvalues.
S208: obtain, according to a preset cumulative eigenvalue importance threshold, a base dimension set composed of at least one top-ranked base dimension.
Taking live streaming as an example, this embodiment typically has the following two classes of operation dimension data for the different operation subjects (anchors and viewers): the dimension data of the operation subject (anchor) is collected from the server side of the business platform, and the dimension data of the operation subject (viewer) is collected from the user's client. That is, the data consist of the anchors' operation dimension data and the viewers' viewing dimension data, as shown in Fig. 3.
Fig. 3 is a schematic diagram of dimension data collection in an embodiment of the present invention. The anchor's dimension data is obtained from the server side of the live streaming platform and records the anchor's overall behavior, including playback, revenue, and interaction. In Fig. 3, the playback information collection unit 101 represents the collector of playback behavior dimensions, the revenue information collection unit 102 represents the collector of revenue behavior dimensions, and the interaction information collection unit 103 represents the collector of interaction behavior dimensions. Examples of the operation dimension data are as follows: business playback data, such as the anchor's cumulative number of broadcast sessions over the last 3/7 days and cumulative broadcast duration over the last 3/7 days; business revenue data, such as the anchor's cumulative number of paying viewers over the last 3/7 days, growth in the number of paying viewers over the last 3/7 days, cumulative revenue over the last 3/7 days, and growth in cumulative revenue over the last 3/7 days; business interaction data, such as the cumulative number of viewers who spoke in the chat room over the last 3/7 days and the cumulative number of chat messages over the last 3/7 days.
The viewers' viewing dimension data is obtained from the user's client and records features such as viewing, activity, and retention. In Fig. 3, the viewing information collection unit 104 represents the collector of viewing behavior dimensions, the activity information collection unit 105 represents the collector of activity behavior dimensions, and the retention information collection unit 106 represents the collector of retention behavior dimensions. Examples of the operation dimension data are as follows: user viewing data, such as viewers' average viewing duration over the last 3/7 days; user activity data, such as the average number of concurrent online viewers over the last 3/7 days and the growth rate of that number; user retention data, such as the number of retained viewers over the last 3/7 days and the viewer retention rate over the last 3/7 days.
It should be added that this scheme can collect only the server-side dimension data and analyze the base dimensions on the anchor side, collect only the client-side dimension data and analyze the base dimensions on the viewer side, or collect the dimension data of both ends at once and analyze the dimensions through which the two influence each other. In addition, as the business expands, for example when advertisers, content providers, or third-party game developers join, the scheme can also incorporate the dimension data of these other parties to mine and update the base dimensions and guide business decisions.
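Combining the two sources can be pictured as a per-anchor join of server-side and client-side measurements. The sketch below is only an illustration: the anchor identifier, the field names, and the values are hypothetical, since the patent lists the dimensions in prose and does not define a data format.

```python
# Hypothetical field names and values; the patent does not define a schema.
server_side = {  # collected from the live platform's servers (anchor behaviour)
    "anchor_42": {"play_sessions_7d": 12, "play_hours_7d": 31.5,
                  "revenue_7d": 5200.0, "chat_messages_7d": 18400},
}
client_side = {  # collected from viewers' clients (audience behaviour)
    "anchor_42": {"avg_watch_minutes_7d": 23.4, "avg_concurrent_7d": 860,
                  "retention_rate_7d": 0.37},
}

def merge_dimensions(server, client):
    """Join the two sources per anchor into one flat record of dimensions."""
    merged = {}
    for anchor_id in server.keys() | client.keys():
        record = {}
        record.update(server.get(anchor_id, {}))
        record.update(client.get(anchor_id, {}))
        merged[anchor_id] = record
    return merged

records = merge_dimensions(server_side, client_side)
```

Passing an empty dictionary for either source corresponds to the server-only or client-only variants described above.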
Fig. 4 is a schematic diagram of a sample represented as a vector in space in an embodiment of the present invention. The scheme is described with reference to Fig. 4. Suppose there are 400,000 anchors and each anchor has 1000 operation dimensions. A sample set P = {x1, x2, ..., xi} is built from the collected operation dimension data, so that each anchor's operation dimension data can be expressed as a 1000-dimensional vector $h$. Each element of the vector is the anchor's measured value in the corresponding dimension; for example, if the 10th dimension (cumulative number of paying viewers over the last 3 days) is 120 people, the 10th element of the vector is 120.
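Turning per-anchor measurements into vectors only requires fixing an order for the dimensions. The following is a minimal sketch with three hypothetical dimension names instead of 1000; filling a missing measurement with 0.0 is an assumption made here for illustration, not something the patent specifies.

```python
# Hypothetical dimension names, fixed in one agreed order so that position j
# of every vector always refers to the same operation dimension.
DIMENSIONS = ["paying_viewers_3d", "revenue_3d", "play_hours_3d"]

def to_vector(record, dimensions=DIMENSIONS):
    """Turn one anchor's measurements into a vector; missing values become 0.0."""
    return [float(record.get(name, 0.0)) for name in dimensions]

# Made-up measurements for two anchors.
records = {
    "anchor_a": {"paying_viewers_3d": 120, "revenue_3d": 980.0, "play_hours_3d": 9.5},
    "anchor_b": {"paying_viewers_3d": 15, "revenue_3d": 60.0, "play_hours_3d": 2.0},
}
anchor_ids = sorted(records)
sample_set = [to_vector(records[a]) for a in anchor_ids]
```

Here `sample_set[0][0]` holds the 120 paying viewers of the first anchor, mirroring the example in the text.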
Through an equivalent spatial transformation M (including translation, rotation, and scaling; operations of this kind cause no loss of information), the anchor samples are mapped into a base dimension space of s dimensions (for example, 10 dimensions); that is, each anchor sample can be expressed as a 10-dimensional vector $w$. The information content of this vector is equivalent to that of the vector $h$ above, i.e., $M^{T}h \to w$.
In fact, each element of $w$ is a linear weighting of some elements of $h$, with the weights determined by the transformation M. For example, the 2nd element of $w$ = 0.2 × the 1st element of $h$ + 0.4 × the 2nd element of $h$ + ...; the weighting coefficients (such as 0.2 and 0.4 above) are determined by M. In other words, 20% of the information in the 1st dimension overlaps with 40% of the information in the 2nd dimension, and the overlap can be compressed into one new dimension. For each batch of sample data, the transformation M is unique. The key to discovering base dimensions is therefore to find the transformation M, which the covariance and eigenvalue-decomposition steps (S204 and S205 in Fig. 2) accomplish.
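The weighted-sum reading of $M^{T}h \to w$ can be made concrete with a small numeric sketch. The matrix and vector below are made up purely to show the arithmetic; they are not an actual solution for M, which would have to satisfy the orthogonality constraint discussed later.

```python
def transform(M, h):
    """Compute w = M^T h, where M is q x s (one column per new dimension)."""
    q, s = len(M), len(M[0])
    return [sum(M[i][k] * h[i] for i in range(q)) for k in range(s)]

# Made-up 3-dimensional measurement and a made-up 3 x 2 weight matrix.
h = [10.0, 20.0, 5.0]
M = [
    [0.9, 0.2],
    [0.1, 0.4],
    [0.0, 0.6],
]
w = transform(M, h)
# The 2nd new element is 0.2 * h[0] + 0.4 * h[1] + 0.6 * h[2].
```

Each column of `M` holds the weights of one new dimension, so reading down the second column reproduces the "0.2 × 1st element + 0.4 × 2nd element + ..." pattern from the text.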
A given sample point x is expressed as a vector $h$ in the q-dimensional coordinate space {h1, h2, ..., hq}. Applying to the sample an equivalent transformation that preserves its information content (including translation, rotation, and scaling), the transformed vector in the new coordinate space can be expressed as $w = M^{T}h$. For the sample set P = {x1, x2, ..., xi} of p samples, each sample vector can be transformed into a new vector in the new space.
Different transformations M map the samples into different new coordinate spaces. The optimal transformation M maps the samples into the coordinate space of s base dimensions {w1, w2, ..., ws}. In this coordinate system the dimensions are mutually orthogonal, so their information neither overlaps nor is redundant, i.e., $w_i^{T} w_j = 0$ for $i \neq j$. Moreover, in this space all p sample points are spread apart as much as possible and are maximally distinguishable from one another; that is, the p sample points are maximally separable, so a small number of dimensions s suffices to distinguish and describe them clearly.
The covariance matrix $XX^{T}$ of the sample set P is then computed. From mathematical statistics, maximal separability of the sample points is equivalent to maximal variance of the sample points. Recalling the analysis above, a given sample point $x_i$ becomes $M^{T}x_i$ after being transformed into the new space, so the variance over all p sample points is $\sum_{i=1}^{p} M^{T} x_i x_i^{T} M$. Maximizing the variance means solving the following optimization problem, formula 1:
$$\max_{M}\ \operatorname{tr}\left(M^{T} X X^{T} M\right) \quad \text{s.t.}\ M^{T}M = I \qquad \text{(formula 1)}$$
where X is the matrix whose columns are the p sample point vectors. This optimization problem can be solved with well-established mathematical methods; specifically, applying the method of Lagrange multipliers to formula 1 shows that it is equivalent to solving formula 2:
$$X X^{T} M = l M \qquad \text{(formula 2)}$$
Performing eigendecomposition on the covariance matrix $XX^{T}$ yields the eigenvalues. The dimension constructed from the eigenvector corresponding to one of the eigenvalues is determined to be a base dimension.
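The step from formula 1 to formula 2 can be spelled out as follows. This is the standard Lagrange-multiplier argument, written under the assumption that formula 1 is the usual trace-maximization objective $\max_M \operatorname{tr}(M^{T}XX^{T}M)$ subject to $M^{T}M = I$; the symmetric multiplier matrix $\Lambda$ is notation introduced here and does not appear in the patent.

```latex
\begin{aligned}
L(M, \Lambda) &= \operatorname{tr}\!\left(M^{T} X X^{T} M\right)
               - \operatorname{tr}\!\left(\Lambda \left(M^{T} M - I\right)\right), \\
\frac{\partial L}{\partial M} &= 2\, X X^{T} M - 2\, M \Lambda = 0
\;\Longrightarrow\; X X^{T} M = M \Lambda .
\end{aligned}
```

Taking $\Lambda$ to be diagonal with entries $l_1, \dots, l_q$, the condition reads column by column as $XX^{T} m_i = l_i m_i$, which is the eigenvalue problem of formula 2: each column $m_i$ of M is an eigenvector of $XX^{T}$ and $l_i$ is its eigenvalue.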
The top-ranked base dimensions are identified according to the ordering of the eigenvalues. The eigenvalues obtained by solving formula 2 give the ranking by information importance, $l_1 \ge l_2 \ge \dots \ge l_q$, of the axes of the new coordinate space {w1, w2, ..., wq} that results from applying the equivalent transformation to the q-dimensional coordinate space {h1, h2, ..., hq}.
From the ranking of the eigenvalues, the base dimension that carries the most information and is therefore the most important can be found. Specifically, the first base dimension is constructed from the eigenvector $m_1$ (of size 1 × q) corresponding to $\lambda_1$, i.e. $m_1 H^T$, where H is the matrix representation of {h1, h2, ..., hq}; similarly, the i-th base dimension is constructed from the eigenvector $m_i$ corresponding to $\lambda_i$.
For example, suppose q = 10 (i.e. there are 10 original operation dimensions). Solving for the transformation M yields the eigenvector $m_1$ corresponding to the top-ranked eigenvalue $\lambda_1$, for example [0.3, 0.15, 0.05, ..., 0.01]; the new base dimension is then w1 = 0.3*h1 + 0.15*h2 + ... + 0.01*h10.
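The weighted combination in this example can be reproduced with a small NumPy sketch. The data and the three weights below are invented for illustration (they are not a real eigenvector), and H is assumed to hold one column per original operation dimension and one row per sample, so that the product $m_1 H^T$ has one value per sample.

```python
import numpy as np

# Hypothetical data: p = 5 samples (rows) measured on q = 3 operation
# dimensions (columns h1, h2, h3).
H = np.array([
    [1.0, 2.0, 5.0],
    [2.0, 4.1, 3.0],
    [3.0, 5.9, 4.0],
    [4.0, 8.2, 1.0],
    [5.0, 9.8, 2.0],
])

# Illustrative weights standing in for the eigenvector m1 (1 x q).
m1 = np.array([0.3, 0.15, 0.05])

# First base dimension: m1 H^T, i.e. 0.3*h1 + 0.15*h2 + 0.05*h3 per sample.
w1 = m1 @ H.T
```

Every sample thus receives a single value on the new base dimension, computed from all of its original dimension values.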
In terms of information content, a new base dimension amounts to extracting the part of the existing operation dimensions that overlaps and is mutually covered; this is a process of information compression.
Further, according to a preset cumulative eigenvalue importance threshold, a base dimension set consisting of at least one top-ranked base dimension is obtained. The s base dimensions can be found from the cumulative importance of the eigenvalues; specifically, the cumulative importance is computed as in formula 3:
$\frac{\sum_{i=1}^{s} \lambda_i}{\sum_{i=1}^{q} \lambda_i} \ge t$ .... formula 3
where the threshold t is usually set to about 0.95 in this application, i.e. the top s base dimensions account for 95% of the total information content of the data.
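Choosing s from a cumulative importance threshold can be sketched as below. The sketch assumes that cumulative importance means the share of the eigenvalue sum covered by the s largest eigenvalues; the function name `select_base_dimension_count` and the sample eigenvalues are hypothetical.

```python
import numpy as np

def select_base_dimension_count(eigenvalues, t=0.95):
    """Smallest s whose top-s eigenvalues reach a share t of the eigenvalue sum."""
    values = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(values) / values.sum()
    # Index of the first cumulative share that is >= t, converted to a count.
    s = int(np.searchsorted(cumulative, t)) + 1
    # Guard against floating-point rounding when t is very close to 1.
    return min(s, len(values))

# Made-up eigenvalues; their cumulative shares are 0.50, 0.80, 0.97, 0.99, 1.00.
s = select_base_dimension_count([5.0, 3.0, 1.7, 0.2, 0.1], t=0.95)
```

With these numbers the first three eigenvalues are needed to pass the 0.95 threshold, so three base dimensions would be kept.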
The present invention can automatically discover base dimensions from a large number of operation dimensions. The base dimensions are few in number but high in value, and they comprehensively cover the information content of the existing dimensions, i.e. they fully characterize the state of the product. This result has gone live and is applied to live-streaming operations: at present, 15 high-value base dimensions can be found automatically from 220 operation dimensions of anchors. As long as operations staff keep track of these 15 base dimensions, they can accurately gauge the state of the live-streaming product and make suitable decisions, which markedly improves operational efficiency.
Further, the base dimensions output by the algorithm are applied to operation projects, such as the potential-anchor discovery project, where they replace the project's old features. Because the base dimensions are an information compression of a large number of operation dimensions, and because the number of dimensions is relatively small, the data sparsity problem of some project models can be avoided, which in theory can improve project performance. Actual online application showed that project performance improved markedly.
Specifically, for the potential-anchor discovery project, the offline accuracy of the old model was 83%; after replacing the features of the old model with base dimensions, offline accuracy rose to 90%, a relative increase of 8.4%. System performance was evaluated over several months with A/B testing, where group A is the potential-anchor list generated by the old model and group B is the list generated by the new method. The two lists contain the same number of anchors, and statistics on how many fans they attracted were collected; the evaluation metric is recognition accuracy (how many anchors go on to become popular big-name anchors). The anchors' active audience was tracked for two months (September and October 2016); on the concurrent online viewer metric, the old method (group A) increased by 6.4% and the new method (group B) increased by 10.5%.
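The reported figures can be checked with simple arithmetic. The calculation below assumes that the 8.4% is a relative gain over the old model's accuracy, which is the reading that reproduces the stated number; the gap between the two A/B groups is added for comparison.

```latex
\frac{90\% - 83\%}{83\%} = \frac{7}{83} \approx 0.084 = 8.4\%,
\qquad
10.5\% - 6.4\% = 4.1 \text{ percentage points}
```

In absolute terms the offline accuracy rose by 7 percentage points, and the new method's growth in concurrent online viewers exceeded the old method's by 4.1 percentage points.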
Fig. 5 is a schematic diagram of a device for mining operation base dimensions according to the present invention, comprising:
a sample construction unit, configured to build the sample set P = {x1, x2, ..., xi} from the collected operation dimension data;
a spatial transform unit, configured to calculate the covariance matrix $XX^T$ of the sample set P;
a feature decomposition unit, configured to perform eigendecomposition on the covariance matrix $XX^T$ and obtain the eigenvalues;
a base dimension determination unit, configured to determine the dimension constructed from the eigenvector corresponding to one of the eigenvalues to be a base dimension.
Fig. 5 corresponds to Fig. 1, and each unit operates in the same way as the corresponding step of the method in that figure.
Fig. 6 is a schematic diagram of an embodiment of the device for mining operation base dimensions according to the present invention.
As shown in Fig. 6, the device further comprises:
a base dimension ranking unit, configured to identify the top-ranked base dimensions according to the ranking of the eigenvalues.
As shown in Fig. 6, the device further comprises:
a business dimension unit, configured to collect dimension data of the operation subject from the server side of the business platform, the dimension data including at least one of business playback data, business revenue data and business interaction data.
As shown in Fig. 6, the device further comprises:
a user dimension unit, configured to collect dimension data of the operation subject from the user's client, the dimension data including at least one of user viewing data, user activity data and user retention data.
As shown in Fig. 6, the base dimension ranking unit comprises:
a base dimension collection unit, configured to obtain, according to a preset cumulative eigenvalue importance threshold, a base dimension set consisting of at least one top-ranked base dimension.
Fig. 6 corresponds to Fig. 2, and each unit operates in the same way as the corresponding step of the method in that figure.
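One way to picture how the units of Fig. 6 fit together is a small class with one method per unit. This is a hypothetical wiring for illustration only: the class name, method names, made-up data and the centering step are all assumptions, and the source does not prescribe any particular software structure.

```python
import numpy as np

class BaseDimensionMiner:
    """Hypothetical wiring of the units in Fig. 6; every name here is illustrative."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold  # preset cumulative eigenvalue importance threshold

    def build_samples(self, business_data, user_data):
        # Business and user dimension units feed the sample construction unit.
        # Inputs are (p, q1) and (p, q2) arrays; output is q x p, one column per subject.
        return np.hstack([np.asarray(business_data, dtype=float),
                          np.asarray(user_data, dtype=float)]).T

    def covariance(self, X):
        # Spatial transform unit. Centering is an assumption of this sketch.
        Xc = X - X.mean(axis=1, keepdims=True)
        return Xc @ Xc.T

    def decompose(self, C):
        # Feature decomposition unit: eigh suits the symmetric matrix XX^T.
        return np.linalg.eigh(C)

    def rank(self, eigenvalues, eigenvectors):
        # Base dimension ranking unit: largest eigenvalue first.
        order = np.argsort(eigenvalues)[::-1]
        return eigenvalues[order], eigenvectors[:, order]

    def collect(self, eigenvalues, eigenvectors):
        # Base dimension collection unit: keep leading columns up to the threshold.
        share = np.cumsum(eigenvalues) / eigenvalues.sum()
        s = min(int(np.searchsorted(share, self.threshold)) + 1, len(eigenvalues))
        return eigenvectors[:, :s]

    def run(self, business_data, user_data):
        X = self.build_samples(business_data, user_data)
        values, vectors = self.rank(*self.decompose(self.covariance(X)))
        return self.collect(values, vectors)


rng = np.random.default_rng(2)
business = rng.normal(size=(30, 2))                    # made-up server-side dimensions
user = business @ np.array([[1.0, 0.5], [0.2, 1.0]])   # made-up, fully redundant client-side dimensions
M = BaseDimensionMiner().run(business, user)
```

Because the made-up client-side data is a linear mix of the server-side data, the four original dimensions collapse to at most two base dimensions, which are the columns of `M`.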
The present device/module introduces the algorithm for generating base dimensions. The idea is as follows:
Given p samples, each sample is characterized and measured by q operation dimensions. Each sample x can be regarded as a vector in the q-dimensional coordinate space {h1, h2, ..., hq} (like a point vector in Fig. 1). Association and information redundancy exist among the original q dimensions, i.e. $\|h_i\|_2 = 1$, $h_i^T h_j \ne 0$, where $\|\cdot\|_2$ denotes the $l_2$ norm.
Assume there are s base dimensions, which are a high-fidelity compression of the original dimensions, i.e. $s \ll q$. For a sample x, in order to retain all of the original information (i.e. its various statistical properties), an equivalence transformation of the space can be applied (including translation, rotation and scaling; operations of this kind cause no information loss); the transformation is denoted M. After the transformation, the sample x can be regarded as a vector in the coordinate space {w1, w2, ..., ws} of the s base dimensions, and the information content of the transformed vector is equivalent to that of the original vector. There is no association or information redundancy among the s base dimensions, i.e. $\|w_i\|_2 = 1$, $w_i^T w_j = 0$ for $i \ne j$, where $\|\cdot\|_2$ denotes the $l_2$ norm.
From the above analysis, the key to constructing base dimensions is to find an equivalence transformation M that maps the sample vector x from the q-dimensional operation coordinate space {h1, h2, ..., hq} into the information-preserving s-dimensional coordinate space {w1, w2, ..., ws}, i.e. $M^T h \to w$. There is no association or information redundancy among the new dimensions, i.e. $\|w_i\|_2 = 1$, $w_i^T w_j = 0$ for $i \ne j$. The column vectors of the transformation M are exactly the construction of each base dimension.
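The claim that the new dimensions carry no mutual redundancy can be checked numerically. The sketch below reads "association" as non-zero cross terms in $XX^T$ (an interpretation, not something the source states) and uses made-up data in which two of three dimensions are strongly related.

```python
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=200)

# Made-up data: q = 3 dimensions, p = 200 samples; the first two overlap heavily.
X = np.vstack([
    base + 0.1 * rng.normal(size=200),
    2.0 * base + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])
X = X - X.mean(axis=1, keepdims=True)   # centering is an assumption of this sketch

C = X @ X.T                 # cross terms between the original dimensions
_, M = np.linalg.eigh(C)    # columns of M are orthonormal eigenvectors
W = M.T @ X                 # samples expressed in the new coordinate space
C_new = W @ W.T             # cross terms between the new dimensions

off_diag_before = C - np.diag(np.diag(C))
off_diag_after = C_new - np.diag(np.diag(C_new))
```

`off_diag_before` contains large entries because the first two dimensions overlap, while `off_diag_after` vanishes up to rounding error, which is the orthogonality condition on the new dimensions.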
In other words, constructing base dimensions can be regarded as a process of information compression. From multiple operation dimensions whose information overlaps or is redundant, the part with the greatest information overlap is extracted as a new dimension (called a base dimension); this can be regarded as one round of information compression applied to the redundant dimensions. Similarly, the part with the second greatest overlap is extracted as the second new dimension, and so on, until s base dimensions have been generated. To find the information overlap between dimensions, the present invention uses an equivalence transformation of the space.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, device and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
The embodiments described above express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be pointed out that a person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. The protection scope of this patent shall therefore be determined by the appended claims.
Claims (10)
1. A method for mining operation base dimensions, characterized by comprising:
building a sample set P = {x1, x2, ..., xi} from collected operation dimension data;
calculating the covariance matrix $XX^T$ of the sample set P;
performing eigendecomposition on the covariance matrix $XX^T$ to obtain eigenvalues;
determining the dimension constructed from the eigenvector corresponding to one of the eigenvalues to be a base dimension.
2. The method for mining operation base dimensions according to claim 1, characterized in that, after the step of determining the dimension constructed from the eigenvector corresponding to one of the eigenvalues to be a base dimension, the method further comprises:
identifying the top-ranked base dimensions according to the ranking of the eigenvalues.
3. The method for mining operation base dimensions according to claim 1, characterized in that, before the step of building the sample set from the collected operation dimension data, the method further comprises:
collecting dimension data of the operation subject from the server side of the business platform, the dimension data including at least one of business playback data, business revenue data and business interaction data.
4. The method for mining operation base dimensions according to claim 1, characterized in that, before the step of building the sample set from the collected operation dimension data, the method further comprises:
collecting dimension data of the operation subject from the user's client, the dimension data including at least one of user viewing data, user activity data and user retention data.
5. The method for mining operation base dimensions according to claim 2, characterized in that, after the step of identifying the top-ranked base dimensions, the method comprises:
obtaining, according to a preset cumulative eigenvalue importance threshold, a base dimension set consisting of at least one top-ranked base dimension.
6. A device for mining operation base dimensions, characterized by comprising:
a sample construction unit, configured to build a sample set P = {x1, x2, ..., xi} from collected operation dimension data;
a spatial transform unit, configured to calculate the covariance matrix $XX^T$ of the sample set P;
a feature decomposition unit, configured to perform eigendecomposition on the covariance matrix $XX^T$ to obtain eigenvalues;
a base dimension determination unit, configured to determine the dimension constructed from the eigenvector corresponding to one of the eigenvalues to be a base dimension.
7. The device for mining operation base dimensions according to claim 6, characterized by further comprising:
a base dimension ranking unit, configured to identify the top-ranked base dimensions according to the ranking of the eigenvalues.
8. The device for mining operation base dimensions according to claim 6, characterized by further comprising:
a business dimension unit, configured to collect dimension data of the operation subject from the server side of the business platform, the dimension data including at least one of business playback data, business revenue data and business interaction data.
9. The device for mining operation base dimensions according to claim 6, characterized by further comprising:
a user dimension unit, configured to collect dimension data of the operation subject from the user's client, the dimension data including at least one of user viewing data, user activity data and user retention data.
10. The device for mining operation base dimensions according to claim 7, characterized in that the base dimension ranking unit comprises:
a base dimension collection unit, configured to obtain, according to a preset cumulative eigenvalue importance threshold, a base dimension set consisting of at least one top-ranked base dimension.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611263960.2A CN107067144A (en) | 2016-12-30 | 2016-12-30 | A kind of method and device for excavating operation base dimension |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611263960.2A CN107067144A (en) | 2016-12-30 | 2016-12-30 | A kind of method and device for excavating operation base dimension |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107067144A true CN107067144A (en) | 2017-08-18 |
Family
ID=59623479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611263960.2A Pending CN107067144A (en) | 2016-12-30 | 2016-12-30 | A kind of method and device for excavating operation base dimension |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107067144A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107666615A (en) * | 2017-09-04 | 2018-02-06 | 广州虎牙信息科技有限公司 | Method for digging, device and the server of potentiality main broadcaster user |
TWI752546B (en) * | 2020-07-09 | 2022-01-11 | 多利曼股份有限公司 | Evaluation system and evaluation method |
2016-12-30: CN application CN201611263960.2A (publication CN107067144A), status: Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20170818; Assignee: GUANGZHOU HUYA INFORMATION TECHNOLOGY Co.,Ltd.; Assignor: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.; Contract record no.: 2018990000088; Denomination of invention: Method and device for mining operation base dimension; License type: Common License; Record date: 20180413 |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170818 |