CN107067144A - Method and device for mining operation base dimensions - Google Patents

Method and device for mining operation base dimensions

Info

Publication number
CN107067144A
Authority
CN
China
Prior art keywords
dimension
data
base dimension
base
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611263960.2A
Other languages
Chinese (zh)
Inventor
余建兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huaduo Network Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd filed Critical Guangzhou Huaduo Network Technology Co Ltd
Priority to CN201611263960.2A priority Critical patent/CN107067144A/en
Publication of CN107067144A publication Critical patent/CN107067144A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Educational Administration (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and device for mining operation base dimensions. The method includes: building a sample set P = {x1, x2, ..., xi} from collected operation dimension data; computing the covariance matrix $XX^{T}$ of the sample set P; performing eigenvalue decomposition on the covariance matrix $XX^{T}$ to obtain eigenvalues; and taking the dimension constructed from the eigenvector corresponding to one of the eigenvalues as a base dimension. By analyzing the correlations among the operation dimensions, the invention can identify base dimensions accurately and guide business decisions.

Description

Method and device for mining operation base dimensions
Technical field
The present invention relates to the field of data mining, and more particularly to a method and device for mining operation base dimensions.
Background
Product operations manage product content and users through three aspects: content building, user maintenance, and campaign planning. Operations are the key to a product's sustainable and healthy development. In an era where "traffic is king", operation channels and methods keep multiplying in the effort to win as much traffic as possible, and fine-grained operations tailored to different scenarios and user attributes are increasingly important. Specifically, mining the information hidden in massive data, using data to profile user attributes and scenario characteristics, and customizing marketing strategies for each user group can effectively address growth and retention problems in traffic operations, user operations, product operations, and content operations. In data-driven operations there are many observable statistical dimensions, such as a product's PV (page views), UV (unique visitors), and page click-through rate. The relationships among these dimensions are complex, with substantial information redundancy and overlap. For example, in live streaming, the dimension "cumulative top-up amount over the last 3 days" is positively correlated with "cumulative top-up amount over the last 7 days": when the former is high, the latter is generally high as well. In other words, the information carried by the dimensions overlaps, and one dimension can to some extent be computed linearly from other related dimensions. Similarly, "cumulative revenue over the last 3 days" is associated with "cumulative streaming duration over the last 3 days": the longer the streaming time, the larger the revenue. This large and complex set of operation dimensions easily leads operations staff into information overload, making it hard to gauge the state of the product accurately and make appropriate decisions. Finding a small number of key dimensions, i.e. base dimensions, among these operation dimensions is a technical difficulty. Given the variety of operation dimensions, manual screening is laborious. As far as the known literature shows, there is currently no research on or method for automatically identifying operation base dimensions.
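The overlap between the 3-day and 7-day dimensions described above can be illustrated with a minimal sketch. The data here are synthetic and purely hypothetical (random daily top-up amounts for 1,000 users); the variable names are illustrative and do not come from the patent.

```python
import numpy as np

# Synthetic daily top-up amounts: 1000 users x 7 days (hypothetical data).
rng = np.random.default_rng(42)
daily = rng.exponential(scale=50.0, size=(1000, 7))

# Two operation dimensions derived from the same raw data.
topup_3d = daily[:, -3:].sum(axis=1)   # cumulative top-up, last 3 days
topup_7d = daily.sum(axis=1)           # cumulative top-up, last 7 days

# Positive correlation: the two dimensions carry overlapping information.
corr = np.corrcoef(topup_3d, topup_7d)[0, 1]

# A least-squares line shows how far one dimension is linearly
# predictable from the other.
slope, intercept = np.polyfit(topup_3d, topup_7d, deg=1)
print(f"correlation={corr:.2f}, slope={slope:.2f}")
```

The correlation is clearly positive but below 1, which matches the point made in the text: the two dimensions are partly redundant, yet neither fully replaces the other.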
Conventional approaches usually screen key dimensions manually; for example, in live streaming, "cumulative revenue over the last 7 days" and "average PCU over the last 7 days" are used as key dimensions. However, manually screened dimensions cannot fully portray all states of the product. For example, "cumulative revenue over the last 7 days" and "cumulative revenue over the last 3 days" are not only partly redundant but also differ: the former cannot completely cover the information contained in the latter, nor can it replace the statistical characteristics of the latter. On the one hand, simply selecting 10 of 100 operation dimensions by hand as key dimensions loses information; in other words, conventional approaches cannot determine base dimensions accurately. On the other hand, manual screening is subjective and labor-intensive, and its selection rules are hard to codify and reuse.
Summary of the invention
In view of the above problems, the present invention proposes a method and device for mining operation base dimensions, which can identify base dimensions accurately by analyzing the correlations among the operation dimensions, and thereby guide business decisions.
An embodiment of the present invention provides a method for mining operation base dimensions, including:
building a sample set P = {x1, x2, ..., xi} from collected operation dimension data;
computing the covariance matrix $XX^{T}$ of the sample set P;
performing eigenvalue decomposition on the covariance matrix $XX^{T}$ to obtain eigenvalues;
taking the dimension constructed from the eigenvector corresponding to one of the eigenvalues as a base dimension.
Preferably, after the step of taking the dimension constructed from the eigenvector corresponding to one of the eigenvalues as a base dimension, the method further includes:
identifying the top-ranked base dimensions according to the ordering of the eigenvalues.
Preferably, before the step of building the sample set from the collected operation dimension data, the method further includes:
collecting dimension data of the operating subject from the server side of the business platform, the dimension data including at least one of business playback data, business revenue data, and business interaction data.
Preferably, before the step of building the sample set from the collected operation dimension data, the method further includes:
collecting dimension data of the operating subject from the user's client, the dimension data including at least one of user viewing data, user activity data, and user retention data.
Preferably, after the step of identifying the top-ranked base dimensions, the method includes:
obtaining, according to a preset cumulative eigenvalue importance threshold, a base dimension set composed of at least one top-ranked base dimension.
Correspondingly, an embodiment of the present invention provides a device for mining operation base dimensions, including:
a sample construction unit, configured to build a sample set P = {x1, x2, ..., xi} from collected operation dimension data;
a spatial transform unit, configured to compute the covariance matrix $XX^{T}$ of the sample set P;
a feature decomposition unit, configured to perform eigenvalue decomposition on the covariance matrix $XX^{T}$ to obtain eigenvalues;
a base dimension determination unit, configured to take the dimension constructed from the eigenvector corresponding to one of the eigenvalues as a base dimension.
Preferably, the device further includes:
a base dimension ranking unit, configured to identify the top-ranked base dimensions according to the ordering of the eigenvalues.
Preferably, the device further includes:
a business dimension unit, configured to collect dimension data of the operating subject from the server side of the business platform, the dimension data including at least one of business playback data, business revenue data, and business interaction data.
Preferably, the device further includes:
a user dimension unit, configured to collect dimension data of the operating subject from the user's client, the dimension data including at least one of user viewing data, user activity data, and user retention data.
Preferably, the base dimension ranking unit includes:
a base dimension set unit, configured to obtain, according to a preset cumulative eigenvalue importance threshold, a base dimension set composed of at least one top-ranked base dimension.
The present invention proposes a scheme for automatically mining operation base dimensions. First, a sample set P = {x1, x2, ..., xi} is built from collected operation dimension data. Unlike the prior art, the user does not need to consider information overlap or redundancy among the samples, and the samples do not need to be screened or classified manually or by machine. Instead, the covariance matrix $XX^{T}$ of the sample set P is computed, and the correlations and information redundancy among dimensions are analyzed through the covariance-based spatial transform. Then, eigenvalue decomposition is performed on the covariance matrix $XX^{T}$ to obtain eigenvalues, and base dimensions are constructed automatically. The base dimensions carry no redundant information among themselves yet portray the state of the product comprehensively, so that a small number of base dimensions can represent the information content of the full set of operation dimensions. Finally, the dimension constructed from the eigenvector corresponding to one of the eigenvalues is taken as a base dimension. By analyzing the correlations among the operation dimensions, this scheme identifies base dimensions simply, quickly, and accurately, and guides business decisions. Specifically, for a live-streaming business it can help discover potential internet-celebrity streamers, evaluate high-quality streamers, and so on. Further, it gives operators a deeper understanding of each operation indicator, including how to categorize indicators and what fundamentally drives streamer rankings, and thus guides business decisions.
Additional aspects and advantages of the present invention will be set forth in part in the description below; they will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for mining operation base dimensions according to the present invention.
Fig. 2 is a flowchart of an embodiment of the method for mining operation base dimensions according to the present invention.
Fig. 3 is a schematic diagram of dimension data collection in an embodiment of the present invention.
Fig. 4 is a schematic diagram of representing the sample space as vectors in an embodiment of the present invention.
Fig. 5 is a schematic diagram of a device for mining operation base dimensions according to the present invention.
Fig. 6 is a schematic diagram of an embodiment of the device for mining operation base dimensions according to the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments.
Some of the flows described in the specification, the claims, and the above drawings contain multiple operations that appear in a particular order. It should be clearly understood that these operations may be performed out of the order in which they appear herein, or in parallel. Operation sequence numbers such as 101 and 102 serve only to distinguish the different operations and do not in themselves represent any execution order. In addition, these flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. Note that terms such as "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they do not represent an order, nor do they require that "first" and "second" be of different types.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Operation dimension: an indicator used to measure how a product is being operated, such as the product's PV (page views), UV (unique visitors), and page click-through rate. These indicators can be obtained by measurement or explicit statistical computation, and give the product's operations staff a basis for summarizing, analyzing, and evaluating the state of the product.
Base dimension: an essential factor that describes the product's operating state; it can be regarded as a condensation of multiple operation dimensions. Different operation dimensions may be correlated and carry redundant information, whereas base dimensions carry no redundant information among themselves yet portray the state of the product comprehensively. Base dimensions are hidden inside the data and hard to observe directly; they are the underlying factors behind the operation dimensions. For example, in university rankings, the most essential factors that influence the ranking fall into two classes, natural science factors and social science factors; these are the base dimensions. Such factors are not easy to observe directly; only dimensions such as the average undergraduate admission score, the employment rate, and the number of science-and-engineering or humanities papers published by professors can be observed.
How to infer and mine base dimensions from the observable operation dimensions is the technical problem to be solved by the present invention. By analyzing the correlations among the operation dimensions, the present invention designs a new algorithm that finds base dimensions accurately. Applied to live streaming, base dimensions can guide product operators in making decisions, including discovering high-quality streamers and evaluating streamer performance.
Fig. 1 is a flowchart of a method for mining operation base dimensions according to the present invention, including:
S101: Build a sample set P = {x1, x2, ..., xi} from collected operation dimension data;
S102: Compute the covariance matrix $XX^{T}$ of the sample set P;
S103: Perform eigenvalue decomposition on the covariance matrix $XX^{T}$ to obtain eigenvalues;
S104: Take the dimension constructed from the eigenvector corresponding to one of the eigenvalues as a base dimension.
The present invention proposes a scheme for automatically mining operation base dimensions. First, a sample set P = {x1, x2, ..., xi} is built from collected operation dimension data. Unlike the prior art, the user does not need to consider information overlap or redundancy among the samples, and the samples do not need to be screened or classified manually or by machine. Instead, the covariance matrix $XX^{T}$ of the sample set P is computed, and the correlations and information redundancy among dimensions are analyzed through the covariance-based spatial transform. Then, eigenvalue decomposition is performed on the covariance matrix $XX^{T}$ to obtain eigenvalues, and base dimensions are constructed automatically. The base dimensions carry no redundant information among themselves yet portray the state of the product comprehensively, so that a small number of base dimensions can represent the information content of the full set of operation dimensions. Finally, the dimension constructed from the eigenvector corresponding to one of the eigenvalues is taken as a base dimension. By analyzing the correlations among the operation dimensions, this scheme identifies base dimensions simply, quickly, and accurately, and guides business decisions. Specifically, for a live-streaming business it can help discover potential internet-celebrity streamers and so on.
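Steps S101 to S104 can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: the function name `mine_base_dimensions` and the synthetic data are hypothetical, and centering each dimension before forming $XX^{T}$ is an assumption (the text only says that the transform includes translation).

```python
import numpy as np

def mine_base_dimensions(samples):
    """samples: array of shape (p, q), one row per operating subject and
    one column per operation dimension (hypothetical layout).
    Returns eigenvalues in descending order and the matching eigenvectors
    as the columns of M."""
    # S101: arrange the sample set as a q x p matrix X (one column per sample).
    X = np.asarray(samples, dtype=float).T
    # Assumption: centre each dimension so that X @ X.T is proportional
    # to the covariance matrix.
    X = X - X.mean(axis=1, keepdims=True)
    # S102: covariance matrix XX^T (q x q).
    cov = X @ X.T
    # S103: eigendecomposition; eigh suits symmetric matrices and returns
    # eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Synthetic example: 6 observed dimensions driven by 2 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
samples = latent @ mixing + 0.01 * rng.normal(size=(500, 6))

l, M = mine_base_dimensions(samples)
# S104: each base dimension is a weighted sum of the original dimensions,
# with weights taken from one eigenvector.
base_dims = (samples - samples.mean(axis=0)) @ M[:, :2]
```

In this synthetic setting the first two eigenvalues dominate, which reflects the claim that a few base dimensions can carry nearly all the information of many correlated operation dimensions.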
The construction of base dimensions is described below, taking a live-streaming business as an example. Specifically, operation dimension data are collected first, and base dimensions are then generated from the data; the whole process requires no labeled data.
Fig. 2 is a flowchart of an embodiment of the method for mining operation base dimensions according to the present invention.
S201: Collect dimension data of the operating subject from the server side of the business platform, the dimension data including at least one of business playback data, business revenue data, and business interaction data.
S202: Collect dimension data of the operating subject from the user's client, the dimension data including at least one of user viewing data, user activity data, and user retention data.
S203: Build a sample set P = {x1, x2, ..., xi} from the collected operation dimension data;
S204: Compute the covariance matrix $XX^{T}$ of the sample set P;
S205: Perform eigenvalue decomposition on the covariance matrix $XX^{T}$ to obtain eigenvalues;
S206: Take the dimension constructed from the eigenvector corresponding to one of the eigenvalues as a base dimension.
S207: Identify the top-ranked base dimensions according to the ordering of the eigenvalues.
S208: According to a preset cumulative eigenvalue importance threshold, obtain a base dimension set composed of at least one top-ranked base dimension.
This embodiment takes live streaming as an example. For the different operating subjects (streamers and viewers), there are typically two classes of operation dimension data: the dimension data of the operating subject (streamer) are collected from the server side of the business platform, and the dimension data of the operating subject (viewer) are collected from the user's client. That is, the data consist of the streamers' operation dimension data and the viewers' viewing dimension data, as shown in Fig. 3.
Fig. 3 is a schematic diagram of dimension data collection in an embodiment of the present invention. The streamers' dimension data are obtained from the server side of the live-streaming platform and record the streamers' overall behavior, including streaming, revenue, and interaction. In Fig. 3, the broadcast information collection unit 101 represents the collector for streaming behavior dimensions, the revenue information collection unit 102 represents the collector for revenue behavior dimensions, and the interaction information collection unit 103 represents the collector for interaction behavior dimensions. Examples of operation dimension data are as follows. Business playback data: the streamer's cumulative number of streaming sessions over the last 3/7 days, and the streamer's cumulative streaming duration over the last 3/7 days. Business revenue data: the streamer's cumulative number of paying viewers over the last 3/7 days, the growth in the streamer's paying viewers over the last 3/7 days, the streamer's cumulative revenue over the last 3/7 days, and the growth in the streamer's cumulative revenue over the last 3/7 days. Business interaction data: the cumulative number of viewers who spoke in the chat room over the last 3/7 days, the cumulative number of chat-room messages over the last 3/7 days, and so on.
The viewers' viewing dimension data are obtained from the user's client and record features such as viewing, activity, and retention. In Fig. 3, the viewing information collection unit 104 represents the collector for viewing behavior dimensions, the activity information collection unit 105 represents the collector for activity behavior dimensions, and the retention information collection unit 106 represents the collector for retention behavior dimensions. Examples of operation dimension data are as follows. User viewing data: viewers' average viewing duration over the last 3/7 days. User activity data: the average number of concurrent online viewers over the last 3/7 days, and the growth rate of the average number of concurrent online viewers over the last 3/7 days. User retention data: the number of retained viewers over the last 3/7 days, and the viewer retention rate over the last 3/7 days.
It should be added that this scheme can collect only server-side dimension data and analyze the base dimensions on the streamer side, or collect only client-side dimension data and analyze the base dimensions on the viewer side, or collect dimension data from both ends and analyze the dimensions through which the two sides influence each other. In addition, as the business expands, for example when advertisers, content providers, or third-party game developers join, this scheme can also incorporate the dimension data of those other parties, mine and update the base dimensions, and guide business decisions.
Fig. 4 is a schematic diagram of representing the sample space as vectors in an embodiment of the present invention. The scheme is described with reference to Fig. 4. Suppose there are 400,000 streamers and each streamer has 1,000 operation dimensions. A sample set P = {x1, x2, ..., xi} is built from the collected operation dimension data, so that the operation dimension data of each streamer can be expressed as a 1000-dimensional vector $\vec{h}$. Each element of the vector is the measured value of the corresponding streamer in that dimension; for example, if the 10th dimension (cumulative number of paying viewers over the last 3 days) is 120 people, then the 10th element of the vector is 120.
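As a small sketch of this vector representation, the following turns per-streamer records into fixed-order vectors. The dimension names and values are hypothetical, and only three dimensions are used instead of 1,000 to keep the example short; missing values default to 0.0, which is also an assumption.

```python
import numpy as np

# Hypothetical, fixed ordering of operation dimensions (3 instead of 1000).
DIMENSIONS = ["stream_sessions_3d", "stream_duration_3d", "paying_viewers_3d"]

def to_vector(record, dimensions=DIMENSIONS):
    """Map one streamer's measurements to a vector h; element k holds the
    measured value of dimension k (0.0 if the record lacks that dimension)."""
    return np.array([float(record.get(name, 0.0)) for name in dimensions])

records = [
    {"stream_sessions_3d": 4, "stream_duration_3d": 310.0, "paying_viewers_3d": 120},
    {"stream_sessions_3d": 2, "stream_duration_3d": 95.5, "paying_viewers_3d": 8},
]

# One row per streamer, one column per operation dimension.
H = np.stack([to_vector(r) for r in records])
```

Here the third element of the first vector is 120, mirroring the text's example of a streamer with 120 paying viewers over the last 3 days.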
Through an equivalence transform M of the space (including translation, rotation, and scaling; such operations cause no information loss), the streamer samples are mapped into an s-dimensional base dimension space (for example, 10-dimensional). That is, each streamer sample can be expressed as a 10-dimensional vector $\vec{w}$ whose information content is equivalent to that of the vector $\vec{h}$ above, i.e. $M^{T}\vec{h} \to \vec{w}$.
In fact, each element of $\vec{w}$ is a linear weighted sum of elements of $\vec{h}$, with the weights determined by the transform M. For example, the 2nd element of $\vec{w}$ = 0.2 × the 1st element of $\vec{h}$ + 0.4 × the 2nd element of $\vec{h}$ + ...; the weighting coefficients (such as 0.2 and 0.4 above) are determined by the transform M. In other words, 20% of the information in the 1st dimension overlaps with 40% of the information in the 2nd dimension, and the two can be compressed and merged into a new dimension. For each batch of sample data, the transform M is unique. The key to discovering base dimensions is to find the transform M by step S202.
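The weighted-sum reading of $M^{T}\vec{h} \to \vec{w}$ can be checked with a toy example. The matrix `M` below is hypothetical and chosen only so that the second output element uses the weights 0.2 and 0.4 mentioned in the text; it is not an actual solution of the optimization described later.

```python
import numpy as np

# Original sample vector h with q = 3 operation dimensions (made-up values).
h = np.array([10.0, 5.0, 2.0])

# Hypothetical transform M of shape (q, s) with s = 2; column j holds the
# weights used to build element j of w.
M = np.array([
    [0.7, 0.2],
    [0.1, 0.4],
    [0.2, 0.1],
])

# w = M^T h: every element of w is a linear weighted sum of elements of h.
w = M.T @ h

# The 2nd element of w written out term by term.
second = 0.2 * h[0] + 0.4 * h[1] + 0.1 * h[2]
```

The explicit sum `second` equals `w[1]`, which is all the matrix product does for each new dimension.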
A given sample point x is expressed in the q-dimensional coordinate space {h1, h2, ..., hq} as the vector $\vec{h}$. Applying an information-preserving equivalence transform (including translation, rotation, and scaling) to the sample, its vector in the new coordinate space can be expressed as $\vec{w} = M^{T}\vec{h}$. For the sample set of p samples P = {x1, x2, ..., xi}, each sample vector can be transformed into a new vector in the new space.
Different transforms M map the samples into different new coordinate spaces. The optimal transform M maps the samples into the coordinate space of s base dimensions {w1, w2, ..., ws}. In this coordinate system the dimensions are mutually orthogonal, so their information neither overlaps nor is redundant, i.e. $w_i^{T} w_j = 0$ for $i \neq j$. Moreover, in this space all p sample points are separated as far as possible and are maximally distinguishable from one another; that is, the p sample points are maximally separable, so that a small number of dimensions s suffices to distinguish and portray the p sample points clearly.
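The orthogonality and non-redundancy properties can be checked numerically. The sketch below uses synthetic data in which the second dimension is nearly a multiple of the first; taking M from the eigenvectors of the centred $XX^{T}$ is the standard principal-component construction and is assumed here to be the transform the text has in mind.

```python
import numpy as np

# Synthetic samples: dimension 2 is almost twice dimension 1 (redundant),
# dimension 3 is independent.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 1))
samples = np.hstack([
    base,
    2.0 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

X = (samples - samples.mean(axis=0)).T      # q x p, centred (assumption)
l, M = np.linalg.eigh(X @ X.T)              # columns of M: eigenvectors

# Orthogonality of the new axes: M^T M = I.
gram = M.T @ M

# Coordinates in the new space and their covariance.
W = M.T @ X
cov_new = W @ W.T
off_diag = cov_new - np.diag(np.diag(cov_new))
```

`gram` is the identity and `off_diag` is numerically zero, meaning the new dimensions are orthogonal and mutually uncorrelated, even though the original dimensions were strongly correlated.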
Compute the covariance matrix $XX^{T}$ of the sample set P. From mathematical statistics, maximum separability of the sample points is equivalent to maximum variance of the sample points. Recalling the analysis above, a given sample point $x_i$ becomes $M^{T}x_i$ after being transformed into the new space, so for all p sample points the variance is $\sum_{i=1}^{p} M^{T} x_i x_i^{T} M$.
Maximizing the variance amounts to solving the following optimization problem, Formula 1:
$$\max_{M}\ \operatorname{tr}\left(M^{T}XX^{T}M\right)\quad \text{s.t.}\quad M^{T}M = I \qquad \text{(Formula 1)}$$
Here X is the matrix representation of the p sample point vectors. The optimization problem can be solved with well-established mathematical methods; specifically, applying the method of Lagrange multipliers to Formula 1 shows that it is equivalent to solving Formula 2:
$$XX^{T}M = lM \qquad \text{(Formula 2)}$$
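The step from Formula 1 to Formula 2 is not spelled out in the text. For a single direction $m$ (one column of $M$) with unit norm, a standard sketch of the Lagrange-multiplier argument is given below; restricting to one column is a simplification made here for readability.

```latex
% Single direction m (one column of M) with m^T m = 1; l is the Lagrange multiplier.
\begin{aligned}
\mathcal{L}(m, l) &= m^{T} X X^{T} m - l\,\bigl(m^{T} m - 1\bigr) \\
\frac{\partial \mathcal{L}}{\partial m} &= 2\,X X^{T} m - 2\,l\,m = 0
\;\Longrightarrow\; X X^{T} m = l\,m \\
m^{T} X X^{T} m &= l\, m^{T} m = l
\end{aligned}
```

The last line shows that the variance captured along an eigenvector equals its eigenvalue, which is why ordering the eigenvalues orders the new dimensions by information content.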
By performing eigendecomposition on the covariance matrix $XX^{T}$, the eigenvalues can be obtained. The dimension constructed from the eigenvector corresponding to one of the eigenvalues is taken as a base dimension.
The top-ranked base dimensions are identified according to the ordering of the eigenvalues. After the q-dimensional coordinate space {h1, h2, ..., hq} is equivalently transformed into the new coordinate space {w1, w2, ..., wq}, the eigenvalues obtained by solving Formula 2 rank the new dimensions by the importance of their information content: $l_1 \ge l_2 \ge \dots \ge l_q$.
From the ordering of the eigenvalues, the base dimension with the largest information content, and hence the most important one, can be found. Specifically, the first base dimension is constructed from the eigenvector $m_1$ (of size 1 × q) corresponding to $l_1$, i.e. $m_1 H^{T}$, where H is the matrix representation of {h1, h2, ..., hq}. Similarly, the i-th base dimension is constructed from the eigenvector $m_i$ corresponding to $l_i$.
For example, suppose q = 10 (i.e. 10 original operation dimensions). By solving for the transform M, the eigenvector $m_1$ corresponding to the top-ranked eigenvalue $l_1$ is found, for example [0.3, 0.15, 0.05, ..., 0.01]; the new base dimension is then w1 = 0.3*h1 + 0.15*h2 + ... + 0.01*h10.
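To make the shape of $m_1 H^{T}$ concrete, here is a tiny worked sketch with q = 2 dimensions and p = 3 samples. The data are made up so that the second dimension is exactly twice the first; centering is again an assumption, and the sign of an eigenvector is arbitrary, so only magnitudes are meaningful.

```python
import numpy as np

# p = 3 samples, q = 2 operation dimensions; h2 is exactly 2 * h1.
H = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, 6.0],
])
Hc = H - H.mean(axis=0)            # centred copy (assumption)

# Covariance matrix XX^T with X = Hc^T, i.e. [[2, 4], [4, 8]].
cov = Hc.T @ Hc

eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
l1 = eigvals[-1]                          # largest eigenvalue
m1 = eigvecs[:, -1]                       # its eigenvector, length q

# First base dimension for every sample: m1 H^T has one value per sample.
w1 = m1 @ Hc.T
```

Here `l1` is 10 and `m1` is proportional to [1, 2], so the single base dimension `w1` captures all the variation of the two perfectly redundant original dimensions.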
From the point of view of information content, a new base dimension is equivalent to extracting the parts of the original operation dimensions that overlap and cover one another; this is a process of information compression.
Further, according to a preset cumulative eigenvalue importance threshold, a base dimension set composed of at least one top-ranked base dimension is obtained. The s base dimensions can be found from the cumulative importance of the eigenvalues; specifically, the cumulative importance is computed as in Formula 3:
$$\frac{\sum_{i=1}^{s} l_i}{\sum_{i=1}^{q} l_i} \ge t \qquad \text{(Formula 3)}$$
where the threshold t is usually set to about 0.95 in this application, i.e. the top s base dimensions account for 95% of the total information content of the data.
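Choosing s from a cumulative-importance threshold can be sketched as below. The function name `select_base_dimension_count` and the example eigenvalues are hypothetical; the rule implemented is the smallest s whose leading eigenvalues reach the fraction t of the eigenvalue total.

```python
import numpy as np

def select_base_dimension_count(eigenvalues, t=0.95):
    """Return the smallest s such that the s largest eigenvalues account
    for at least a fraction t of the sum of all eigenvalues."""
    l = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # descending
    cumulative = np.cumsum(l) / l.sum()
    # First index whose cumulative share reaches t, converted to a count.
    s = int(np.searchsorted(cumulative, t) + 1)
    # Guard against rounding when t is at or very near 1.0.
    return min(s, len(l))

eigenvalues = [6.0, 3.0, 0.6, 0.3, 0.1]     # made-up values summing to 10
s = select_base_dimension_count(eigenvalues, t=0.95)
```

With these values the cumulative shares are 0.6, 0.9, 0.96, 0.99, and 1.0, so three base dimensions are enough to pass the 0.95 threshold.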
The present invention can automatically discover base dimensions from a large number of operation dimensions. These base dimensions are few in number but high in value, and they fully cover the information content of the output dimensions, i.e., they comprehensively characterize the state of the product. The result has gone live and is applied to live-streaming operations; at present, 15 high-value base dimensions can be found automatically from 220 operation dimensions of streamers. As long as operations staff keep track of these 15 base dimensions, they can accurately gauge the state of the live-streaming product and make suitable decisions, which significantly improves operational efficiency.
Further, the base dimensions output by the algorithm are applied to operation projects, such as the potential-streamer discovery project, where they replace the project's old features. Because the base dimensions compress the information of a large number of operation dimensions, and because the number of dimensions is comparatively small, the data sparsity problem of some project models can be avoided, so in theory the project's performance can be improved. In actual online use, project performance was found to improve significantly.
Specifically, for the potential-streamer discovery project, the offline accuracy of the old model was 83%; after the old model's features were replaced with the base dimensions, the offline accuracy rose to 90%, a relative increase of 8.4%. System performance was also evaluated over several months with an A/B test, in which group A was the list of potential streamers generated by the old model and group B was the list generated by the new method. The two lists contained the same number of streamers, and their ability to attract fans was measured; the evaluation metric was recognition accuracy (how many streamers went on to become popular major streamers). Tracking the streamers' active audience over two months (September and October 2016), the concurrent online viewer count increased by 6.4% for the old method (group A) and by 10.5% for the new method (group B).
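As a quick check of the offline figures, the 8.4% gain is consistent with a relative (not absolute) improvement from 83% to 90%. The helper name `relative_gain` below is hypothetical and only illustrates the arithmetic.

```python
def relative_gain(old, new):
    """Relative improvement of `new` over `old`, as a fraction of `old`."""
    if old == 0:
        raise ValueError("old value must be non-zero")
    return (new - old) / old

# Offline accuracy of the old model vs. the model using base dimensions.
offline_gain = relative_gain(0.83, 0.90)
print(f"{offline_gain:.1%}")
```

The absolute difference is 7 percentage points; dividing by the old accuracy of 0.83 gives the relative figure quoted in the text.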
Fig. 5 is a schematic diagram of a device for mining operation base dimensions according to the present invention, comprising:
a sample construction unit, configured to build a sample set P = {x1, x2, ..., xi} from the collected operation dimension data;
a spatial transformation unit, configured to calculate the covariance matrix $XX^T$ of the sample set P;
a feature decomposition unit, configured to perform eigenvalue decomposition on the covariance matrix $XX^T$ to obtain eigenvalues;
a base dimension discrimination unit, configured to determine, as a base dimension, the dimension constructed from the eigenvector corresponding to one of the eigenvalues.
Fig. 5 corresponds to Fig. 1, and each unit operates in the same way as the method shown in that figure.
Fig. 6 is a schematic diagram of an embodiment of a device for mining operation base dimensions according to the present invention.
As shown in Fig. 6, the device further comprises:
a base dimension sorting unit, configured to identify the several top-ranked base dimensions according to the ordering of the eigenvalues.
As shown in Fig. 6, the device further comprises:
a business dimension unit, configured to collect dimension data of the operation subject from the server side of the business platform, the dimension data comprising at least one of business play data, business revenue data, and business interaction data.
As shown in Fig. 6, the device further comprises:
a user dimension unit, configured to collect dimension data of the operation subject from the user's client, the dimension data comprising at least one of user viewing data, user activity data, and user retention data.
As shown in Fig. 6, the base dimension sorting unit comprises:
a base dimension collection unit, configured to obtain, according to a preset cumulative eigenvalue importance threshold, a base dimension set consisting of at least one of the top-ranked base dimensions.
Fig. 6 corresponds to Fig. 2, and each unit operates in the same way as the method shown in that figure.
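How the business dimension unit and the user dimension unit might feed the sample construction unit can be sketched as a simple join. This is an illustration under assumptions: the patent does not specify data formats, so the function name `build_sample_matrix`, the dictionary layout, and the field names (`play_minutes`, `revenue`, `watch_minutes`, `retention_rate`) are all hypothetical.

```python
import numpy as np

def build_sample_matrix(business_rows, user_rows, business_fields, user_fields):
    """Join server-side and client-side dimension data into one sample matrix.

    business_rows / user_rows: dict mapping subject id -> dict of field values.
    Returns (subjects, X) with X of shape (n_subjects, q); subjects that are
    missing from either source are skipped (an assumption of this sketch).
    """
    subjects = sorted(set(business_rows) & set(user_rows))
    q = len(business_fields) + len(user_fields)
    rows = [
        [business_rows[s][f] for f in business_fields]
        + [user_rows[s][f] for f in user_fields]
        for s in subjects
    ]
    # reshape keeps the (0, q) shape well defined when no subject matches.
    X = np.array(rows, dtype=float).reshape(len(subjects), q)
    return subjects, X

# Hypothetical field names and values, for illustration only.
business = {"anchor_a": {"play_minutes": 120, "revenue": 35.0},
            "anchor_b": {"play_minutes": 60, "revenue": 10.0}}
users = {"anchor_a": {"watch_minutes": 900, "retention_rate": 0.4},
         "anchor_b": {"watch_minutes": 300, "retention_rate": 0.2}}
subjects, X = build_sample_matrix(
    business, users,
    ["play_minutes", "revenue"], ["watch_minutes", "retention_rate"])
```

Each row of `X` is then one sample of the set P, and each column is one original operation dimension.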
The generation algorithm for base dimensions used by this device/module is introduced below. The idea is as follows:
Given p samples, each sample is characterized by q operation dimensions. Each sample x can be regarded as a vector in the q-dimensional coordinate space {h1, h2, ..., hq} (for example, a point vector in Fig. 1). The original q dimensions are correlated and carry redundant information, i.e., $\|h_i\|_2 = 1$ and $h_i^T h_j \neq 0$, where $\|\cdot\|_2$ denotes the $\ell_2$ norm.
Suppose there are s base dimensions and that these base dimensions are a high-fidelity compression of the original dimensions, i.e., $s \ll q$. To retain all the original information content of a sample x (i.e., its various statistical properties), an equivalence transformation of the space can be applied (including translation, rotation, and scaling; such operations cause no information loss). This transformation is denoted M. After the transformation, sample x can be regarded as a vector in the coordinate space {w1, w2, ..., ws} of the s base dimensions, and the information content of the transformed vector is equivalent to that of the original vector. There is no correlation or information redundancy among the s base dimensions, i.e., $\|w_i\|_2 = 1$ and $w_i^T w_j = 0$ for $i \neq j$, where $\|\cdot\|_2$ denotes the $\ell_2$ norm.
From the above analysis, the key to constructing base dimensions is to find an equivalence transformation M that maps the sample vector x from the coordinate space {h1, h2, ..., hq} of the q operation dimensions into the information-preserving s-dimensional coordinate space {w1, w2, ..., ws}, i.e., $M^T h \to w$, such that there is no correlation or information redundancy among the new dimensions, i.e., $w_i^T w_j = 0$ for $i \neq j$. The column vectors of the transformation M are exactly the construction methods of the individual base dimensions.
In other words, constructing base dimensions can be viewed as a process of information compression. From multiple operation dimensions with overlapping or redundant information, the part with the greatest information overlap is extracted as a new dimension (called a base dimension); this can be viewed as one round of information compression applied to the redundant dimensions. Similarly, the part with the second-greatest overlap is extracted as the second new dimension, and so on, until s base dimensions are generated. To find the information overlap among dimensions, the present invention uses an equivalence transformation of the space.
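The effect of such a transformation can be illustrated with a small NumPy sketch. It uses the eigenvectors of the sample covariance matrix as the columns of M, which is one standard way to obtain uncorrelated new dimensions; the mean-centering, the 1/(n-1) scaling, the function name `decorrelate`, and the synthetic data are assumptions of this sketch rather than details stated in the patent.

```python
import numpy as np

def decorrelate(X):
    """Map samples onto uncorrelated dimensions via eigen-decomposition.

    X: array of shape (n_samples, q). Returns (W, M, eigenvalues), where the
    columns of M are eigenvectors sorted by descending eigenvalue and
    W = Xc @ M holds the samples in the new coordinate space.
    """
    Xc = X - X.mean(axis=0)                 # centering (assumed)
    C = Xc.T @ Xc / (X.shape[0] - 1)        # q x q covariance matrix
    vals, M = np.linalg.eigh(C)             # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    vals, M = vals[order], M[:, order]
    return Xc @ M, M, vals

# Synthetic data: the second dimension largely repeats the first (redundancy).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               2 * base + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])
W, M, vals = decorrelate(X)
cov_W = np.cov(W, rowvar=False)  # off-diagonal entries are numerically zero
```

In this toy case the first two original dimensions are strongly correlated, while the covariance of the transformed dimensions is diagonal, mirroring the condition $w_i^T w_j = 0$ for $i \neq j$.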
It will be apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, device, and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
The embodiments described above express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (10)

1. A method for mining operation base dimensions, characterized by comprising:
building a sample set P = {x1, x2, ..., xi} from collected operation dimension data;
calculating the covariance matrix $XX^T$ of the sample set P;
performing eigenvalue decomposition on the covariance matrix $XX^T$ to obtain eigenvalues;
determining, as a base dimension, the dimension constructed from the eigenvector corresponding to one of the eigenvalues.
2. The method for mining operation base dimensions according to claim 1, characterized in that, after the step of determining, as a base dimension, the dimension constructed from the eigenvector corresponding to one of the eigenvalues, the method further comprises:
identifying the several top-ranked base dimensions according to the ordering of the eigenvalues.
3. The method for mining operation base dimensions according to claim 1, characterized in that, before the step of building a sample set from the collected operation dimension state data, the method further comprises:
collecting dimension data of the operation subject from the server side of a business platform, the dimension data comprising at least one of business play data, business revenue data, and business interaction data.
4. The method for mining operation base dimensions according to claim 1, characterized in that, before the step of building a sample set from the collected operation dimension state data, the method further comprises:
collecting dimension data of the operation subject from the user's client, the dimension data comprising at least one of user viewing data, user activity data, and user retention data.
5. The method for mining operation base dimensions according to claim 2, characterized in that, after the step of identifying the several top-ranked base dimensions, the method comprises:
obtaining, according to a preset cumulative eigenvalue importance threshold, a base dimension set consisting of at least one of the top-ranked base dimensions.
6. A device for mining operation base dimensions, characterized by comprising:
a sample construction unit, configured to build a sample set P = {x1, x2, ..., xi} from collected operation dimension data;
a spatial transformation unit, configured to calculate the covariance matrix $XX^T$ of the sample set P;
a feature decomposition unit, configured to perform eigenvalue decomposition on the covariance matrix $XX^T$ to obtain eigenvalues;
a base dimension discrimination unit, configured to determine, as a base dimension, the dimension constructed from the eigenvector corresponding to one of the eigenvalues.
7. The device for mining operation base dimensions according to claim 6, characterized by further comprising:
a base dimension sorting unit, configured to identify the several top-ranked base dimensions according to the ordering of the eigenvalues.
8. The device for mining operation base dimensions according to claim 6, characterized by further comprising:
a business dimension unit, configured to collect dimension data of the operation subject from the server side of a business platform, the dimension data comprising at least one of business play data, business revenue data, and business interaction data.
9. The device for mining operation base dimensions according to claim 6, characterized by further comprising:
a user dimension unit, configured to collect dimension data of the operation subject from the user's client, the dimension data comprising at least one of user viewing data, user activity data, and user retention data.
10. The device for mining operation base dimensions according to claim 7, characterized in that the base dimension sorting unit comprises:
a base dimension collection unit, configured to obtain, according to a preset cumulative eigenvalue importance threshold, a base dimension set consisting of at least one of the top-ranked base dimensions.
CN201611263960.2A 2016-12-30 2016-12-30 A kind of method and device for excavating operation base dimension Pending CN107067144A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611263960.2A CN107067144A (en) 2016-12-30 2016-12-30 A kind of method and device for excavating operation base dimension

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611263960.2A CN107067144A (en) 2016-12-30 2016-12-30 A kind of method and device for excavating operation base dimension

Publications (1)

Publication Number Publication Date
CN107067144A true CN107067144A (en) 2017-08-18

Family

ID=59623479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611263960.2A Pending CN107067144A (en) 2016-12-30 2016-12-30 A kind of method and device for excavating operation base dimension

Country Status (1)

Country Link
CN (1) CN107067144A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107666615A (en) * 2017-09-04 2018-02-06 广州虎牙信息科技有限公司 Method for digging, device and the server of potentiality main broadcaster user
TWI752546B (en) * 2020-07-09 2022-01-11 多利曼股份有限公司 Evaluation system and evaluation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20170818

Assignee: GUANGZHOU HUYA INFORMATION TECHNOLOGY Co.,Ltd.

Assignor: GUANGZHOU HUADUO NETWORK TECHNOLOGY Co.,Ltd.

Contract record no.: 2018990000088

Denomination of invention: Method and device for mining operation base dimension

License type: Common License

Record date: 20180413

RJ01 Rejection of invention patent application after publication

Application publication date: 20170818