CN108776911A

CN108776911A - A machine-learning-based method for analyzing competitive relationships between commodities

Info

Publication number: CN108776911A
Application number: CN201810706947.2A
Authority: CN
Inventors: 张帅
Original assignee: Inspur Software Co Ltd
Current assignee: Inspur Software Co Ltd
Priority date: 2018-07-02
Filing date: 2018-07-02
Publication date: 2018-11-09

Abstract

The invention discloses a machine-learning-based method for analyzing competitive relationships between commodities. The method quantifies the similarity of commodities from their attributes, filters the original commodity data, and screens out anomalous business data, so as to meet the business requirements of network analysis. By using visualization tools to make a linear comparison of the sales volumes of the commodities in these experiments, the invention finds pairs of commodities that show a competitive relationship within a given period. This facilitates the work of professional staff, provides decision support for investigating and gathering evidence on commodity competition, ensures the reliability of the data, greatly reduces the data-survey workload of product marketing staff and decision-making managers, and provides a solid basis for the future planning of commodities.

Description

A machine-learning-based method for analyzing competitive relationships between commodities

Technical field

The present invention relates to the technical field of data analysis, and in particular to a machine-learning-based method for analyzing competitive relationships between commodities. It involves the application of the machine-learning KNN algorithm, Pearson correlation matrix analysis, data visualization analysis, mathematical statistics, and related techniques.

Background art

With the rapid development of e-commerce, commodity marketers are eager to be able to discover commodities that stand in a competitive relationship. Existing similarity-based methods for identifying competition can help customers find competing commodities to some extent, but the results of the data analysis are often not accurate enough. Combined with misjudgments stemming from prior knowledge and the defects of the analysis methods themselves, the final results are often unsuitable for practical scenarios of mining competitive relationships between commodities.

Since its inception, the commonly used competitiveness analysis method SWOT has been widely applied in strategic research and competitive analysis, and has become an important analytical tool for strategic management and competitive intelligence. Its main advantages are that the analysis is intuitive and simple to use: even without precise data support or more specialized analysis tools, it can yield persuasive conclusions. However, precisely because of this intuitiveness and simplicity, SWOT inevitably suffers from insufficient precision. For example, SWOT analysis is qualitative: by listing the various manifestations of Strengths, Weaknesses, Opportunities, and Threats, it forms only a vague description of an enterprise's competitive position. Judgments made on this basis inevitably carry a degree of subjective assumption. When using SWOT, one should therefore keep the method's limitations in mind, be as truthful, objective, and accurate as possible when listing the facts that serve as the basis for judgment, and supply some quantitative data to compensate for the shortcomings of SWOT's qualitative analysis and to build a foundation for higher-level qualitative analysis. Because the channels of information mining and the points of focus are often subjective, and data analysts tend to collect data that matches their subconscious expectations, the analysis results are often not accurate enough.

In the era of the commodity economy, fierce competition among similar commodities puts enormous pressure on enterprises. Accurately and quickly finding a pair of commodities with a competitive relationship will necessarily provide an important reference for expanding the market for industry products and reducing costs. In big-data scenarios, researchers often face massive data-processing demands; the analysis results are often not objective or accurate enough, and the validity of traditional analysis cannot be guaranteed.

At present, when investigating commodities in a competitive relationship, major companies generally rely on field surveys: the relevant marketing and sales departments go into the market to gather evidence and use some basic graphical statistical methods. However, this approach lacks a complete body of supporting theory, its precision is not high enough, and its results are often strongly subjective.

Summary of the invention

The technical problem to be solved by the present invention is as follows. In view of the above problems, the present invention provides a machine-learning-based method for analyzing competitive relationships between commodities. By applying data visualization analysis, Gaussian modeling, the NearestNeighbors machine-learning algorithm, the Pearson matrix, and related techniques, it can better identify commodities that are in a competitive relationship.

The technical solution adopted in the present invention is:

A machine-learning-based method for analyzing competitive relationships between commodities. The method quantifies the similarity of commodities from their attributes, filters the original commodity data, and screens out anomalous business data, so as to meet the business requirements of network analysis.

The commodity data are the business data of the commodities within a specified time period; this portion of the data is relatively complete and can be used for further linear analysis.
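The filtering step can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout `(commodity, day, units_sold)`, the function name `filter_sales`, and the rule that a missing or negative sales figure counts as "abnormal" are all hypothetical, since the source does not define them.

```python
from collections import defaultdict
from datetime import date


def filter_sales(records, start, end):
    """Keep complete per-commodity daily sales series inside [start, end].

    records: iterable of (commodity, day, units_sold) tuples (assumed layout).
    Assumption: a row is abnormal if units_sold is missing or negative.
    """
    series = defaultdict(dict)
    for commodity, day, units in records:
        if not (start <= day <= end):
            continue  # outside the specified time period
        if units is None or units < 0:
            continue  # abnormal business data (assumed definition)
        series[commodity][day] = units

    expected_days = (end - start).days + 1
    # Retain only commodities with one valid value for every day in the period.
    return {
        name: [by_day[d] for d in sorted(by_day)]
        for name, by_day in series.items()
        if len(by_day) == expected_days
    }


# Hypothetical records for three commodities.
records = [
    ("A", date(2018, 1, 1), 10), ("A", date(2018, 1, 2), 12), ("A", date(2018, 1, 3), 11),
    ("B", date(2018, 1, 1), 20), ("B", date(2018, 1, 2), -5), ("B", date(2018, 1, 3), 22),
    ("C", date(2018, 1, 1), 5), ("C", date(2018, 1, 2), 6), ("C", date(2018, 1, 9), 7),
]
kept = filter_sales(records, date(2018, 1, 1), date(2018, 1, 3))
print(kept)
```

In this toy data set, commodity B loses a day to a negative value and commodity C has a record outside the period, so only commodity A keeps a complete series.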

The commodity data are selected as follows:

From at least one candidate commodity data object, obtain the target data objects that are most similar to the commodity data object under analysis over at least one time period. This includes determining the target segment of a commodity data object and choosing the time-period threshold for the commodity sales-volume data.

The method for quantifying commodity similarity is: display pairs of commodity attribute values against each other on a two-dimensional plot.

The method for quantifying commodity similarity is: represent each commodity as a point and use a scatter plot for visual analysis, which is more intuitive and effective.

Commodity similarity is quantified by kernel-function fitting: Gaussian kernel functions are used to fit the distribution of the attribute values accurately. Commodities with higher similarity cluster in the same Gaussian distribution, while commodities of differing similarity fall into different Gaussian kernel functions. This yields visual measures such as the class boundaries between different groups and the density and color depth of the commodities within each group.
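One way to read the Gaussian-kernel step is as a kernel density estimate over an attribute. The sketch below is a hedged illustration, not the patent's implementation: the function name `gaussian_kde`, the bandwidth of 0.3, and the sample attribute values are assumptions.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels centred on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical values of one attribute (for example, unit price) for eight commodities.
prices = [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2]
density = gaussian_kde(prices, bandwidth=0.3)

# The density is high around each group and low in the gap between them.
for x in (1.05, 3.0, 5.05):
    print(f"density({x}) = {density(x):.4f}")
```

The two peaks of the estimated density correspond to two groups of similar commodities, and the low-density region between them marks a natural boundary between the groups.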

The commodity attributes are multiple attributes (beyond a certain number, the attributes become high-dimensional). In this case, commodity similarity is quantified by using Andrews curves to compare the differences between commodities across their attributes. Andrews curves convert high-dimensional data into a finite Fourier series, which is expressed as a trigonometric function; the similarity between commodities is judged by how close their curves are.
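For illustration, the standard Andrews curve of an attribute vector (x1, x2, x3, ...) is f(t) = x1/sqrt(2) + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + ..., for t between -pi and pi. The sketch below evaluates this function and measures how close two curves are. The helper names `andrews_curve` and `curve_distance` and the sample vectors are assumptions; the source does not say how curve closeness is scored.

```python
import math


def andrews_curve(x, t):
    """Evaluate the Andrews curve of attribute vector x at angle t."""
    value = x[0] / math.sqrt(2.0)
    for i, xi in enumerate(x[1:], start=1):
        k = (i + 1) // 2  # harmonic number: 1, 1, 2, 2, 3, 3, ...
        value += xi * (math.sin(k * t) if i % 2 == 1 else math.cos(k * t))
    return value


def curve_distance(x, y, n=200):
    """Root of the integrated squared gap between two curves, scaled by 1/pi.

    The integral over [-pi, pi] is approximated on n evenly spaced points.
    """
    step = 2.0 * math.pi / n
    total = 0.0
    for j in range(n):
        t = -math.pi + j * step
        total += (andrews_curve(x, t) - andrews_curve(y, t)) ** 2
    return math.sqrt(total * step / math.pi)


# Hypothetical four-attribute commodities.
item_a = [1.0, 2.0, 3.0, 4.0]
item_b = [2.0, 2.0, 1.0, 4.0]
print(curve_distance(item_a, item_b))
print(math.dist(item_a, item_b))
```

Because the terms of the series are orthogonal on this interval, this particular curve distance coincides with the Euclidean distance between the attribute vectors, which is why commodities whose curves run close together have close attribute values.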

After a set of highly similar commodities has been obtained, the NearestNeighbors machine-learning algorithm is used for a further similarity calculation.

The similarity calculation proceeds as follows:

A training sample set is prepared in which every item has a label; that is, the class to which each item in the training sample set belongs is known.

When new, unlabeled data are input, each feature of the new data is compared with the corresponding feature of every item in the training sample set, and the class labels of the items most similar to the new sample are extracted. Usually only the k most similar items in the sample data set are selected, and the class that occurs most often among them is taken as the class label of the new data.
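The k-nearest-neighbor voting procedure described above can be sketched as follows. The function name `knn_classify`, the use of Euclidean distance, and the toy data and labels are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(train_x, train_y, new_x, k=3):
    """Label new_x by majority vote among its k nearest training samples."""
    # Rank labelled samples by Euclidean distance to the new sample.
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], new_x),
    )
    top_labels = [label for _, label in ranked[:k]]
    # The most frequent label among the k nearest samples wins.
    return Counter(top_labels).most_common(1)[0][0]


# Hypothetical labelled commodities described by two attributes.
train_x = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
train_y = ["group_1", "group_1", "group_1", "group_2", "group_2", "group_2"]

print(knn_classify(train_x, train_y, [1.1, 0.9], k=3))
print(knn_classify(train_x, train_y, [5.1, 5.0], k=3))
```

Choosing an odd k for a two-class problem avoids tied votes; with more classes, a tie-breaking rule would be needed.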

Using the NearestNeighbors nearest-neighbor method, the method computes the minimum similarity distance between pairs of commodities and thereby accurately identifies, for each commodity, the other commodity most similar to it. The commodities found in this way provide an important basis for further analysis.

The smaller the similarity distance between two commodities, the higher their similarity. Practical testing confirmed that the corresponding attributes of the two commodities in each pair are very close.
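As a dependency-free sketch of this nearest-neighbor search, the function below finds, for each commodity, the other commodity at the smallest Euclidean distance. The name `nearest_partner`, the distance metric, and the sample commodities are assumptions; scikit-learn's `NearestNeighbors` class offers an equivalent query through its `kneighbors` method.

```python
import math


def nearest_partner(items):
    """Map each commodity name to (closest other commodity, distance).

    items: dict of name -> attribute vector (all vectors the same length).
    """
    result = {}
    for name, vec in items.items():
        candidates = (
            (math.dist(vec, other_vec), other)
            for other, other_vec in items.items()
            if other != name
        )
        distance, partner = min(candidates)
        result[name] = (partner, distance)
    return result


# Hypothetical commodities with two numeric attributes each.
items = {
    "cola_a": [3.0, 0.5],
    "cola_b": [3.2, 0.6],
    "juice_c": [6.0, 2.0],
}
partners = nearest_partner(items)
for name, (partner, distance) in partners.items():
    print(f"{name} -> {partner} (distance {distance:.3f})")
```

In practice the attributes would usually be scaled to comparable ranges first, since an attribute with a large numeric range otherwise dominates the distance.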

After a pair of competing commodities is launched, the market generally needs some time to accept them. As word of mouth spreads and the demonstration effect diffuses, sales volumes tend to rise gradually. Once the market is saturated, a pair of competing commodities often shows a distinctive pattern in sales volume. For example, rapid growth in the sales of one commodity may cause the sales of the other to decline; conversely, while one commodity is weak in the short term, with stagnant sales and feeble growth, the other begins to expand its market and its sales gradually rise. A pair of commodities that behaves in this way often has a competitive relationship. Finally, to improve the reliability of the data, the commodities identified as competing must also be filtered. When two commodities both hold a certain market share, their sales volumes may be relatively close for a period of time as a result of competition, so these data need further screening. In the end, multiple pairs of commodities with a competitive relationship are obtained.
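The "one rises while the other falls" pattern can be checked numerically with the Pearson correlation coefficient, which the technical field section names. The sketch below is an illustration under assumptions: the function names, the threshold of -0.8, and the sales figures are hypothetical, and the source does not state a cut-off value.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no trend to correlate
    return cov / (spread_x * spread_y)


def competing_pairs(sales, threshold=-0.8):
    """Return (name_a, name_b, r) for pairs whose sales move in opposite directions."""
    pairs = []
    for a, b in combinations(sorted(sales), 2):
        r = pearson(sales[a], sales[b])
        if r <= threshold:  # assumed cut-off for a competitive pattern
            pairs.append((a, b, r))
    return pairs


# Hypothetical sales volumes over six consecutive periods.
sales = {
    "product_a": [100, 120, 150, 190, 240, 300],
    "product_b": [300, 280, 250, 215, 170, 120],
    "product_c": [200, 210, 195, 205, 200, 208],
}
for a, b, r in competing_pairs(sales):
    print(f"{a} vs {b}: r = {r:.3f}")
```

A strongly negative coefficient only flags candidate pairs; as the text notes, the flagged pairs still need further screening before they are treated as competitors.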

The beneficial effects of the present invention are:

By using visualization tools to make a linear comparison of the sales volumes of the commodities in these experiments, the invention finds pairs of commodities that show a competitive relationship within a given period. This facilitates the work of professional staff, provides decision support for investigating and gathering evidence on commodity competition, ensures the reliability of the data, greatly reduces the data-survey workload of product marketing staff and decision-making managers, and provides a solid basis for the future planning of commodities.

Description of the drawings

Fig. 1 shows the distribution of different commodities over two attributes.

Detailed description of the embodiments

The present invention is further described below with reference to the accompanying drawing and specific embodiments:

Embodiment 1

Embodiment 2

Embodiment 3

As shown in Fig. 1, the method for quantifying commodity similarity is: represent each commodity as a point and use a scatter plot for visual analysis, which is more intuitive and effective.

Each point in the figure represents one commodity, and the two coordinate axes represent two commodity attributes. Commodity points that cluster together indicate that these commodities behave similarly on attribute 3 and attribute 4 and are therefore relatively similar commodities.

Embodiment 4

Commodity similarity is quantified by kernel-function fitting: Gaussian kernel functions are used to fit the distribution of the attribute values accurately. Commodities with higher similarity cluster in the same Gaussian distribution, while commodities of differing similarity fall into different Gaussian kernel functions. This yields visual measures such as the class boundaries between different groups and the density and color depth of the commodities within each group.

Embodiment 5

The commodity attributes are multiple attributes (beyond a certain number, the attributes become high-dimensional). In this case, commodity similarity is quantified by using Andrews curves to compare the differences between commodities across their attributes. Andrews curves convert high-dimensional data into a finite Fourier series, which is expressed as a trigonometric function; the similarity between commodities is judged by how close their curves are.

Embodiment 6

After a set of highly similar commodities has been obtained, the NearestNeighbors machine-learning algorithm is used for a further similarity calculation.

The similarity calculation proceeds as follows:

The embodiments are intended only to illustrate the present invention, not to limit it. Persons of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention. All equivalent technical solutions therefore also fall within the scope of the invention, and the scope of patent protection of the invention shall be defined by the claims.

Claims

1. A machine-learning-based method for analyzing competitive relationships between commodities, characterized in that the method quantifies the similarity of commodities from their attributes, filters the original commodity data, and screens out anomalous business data, so as to meet the business requirements of network analysis.

2. The machine-learning-based method for analyzing competitive relationships between commodities according to claim 1, characterized in that the commodity data are the business data of the commodities within a specified time period.

3. The machine-learning-based method for analyzing competitive relationships between commodities according to claim 2, characterized in that the commodity data are selected as follows:

From at least one candidate commodity data object, obtain the target data objects that are most similar to the commodity data object under analysis over at least one time period, including the target segment of the commodity data object and the time-period threshold of the commodity sales-volume data.

4. The machine-learning-based method for analyzing competitive relationships between commodities according to claim 3, characterized in that the method for quantifying commodity similarity is: display pairs of commodity attribute values against each other on a two-dimensional plot.

5. The machine-learning-based method for analyzing competitive relationships between commodities according to claim 3, characterized in that the method for quantifying commodity similarity is: represent each commodity as a point and use a scatter plot for visual analysis.

6. The machine-learning-based method for analyzing competitive relationships between commodities according to claim 3, characterized in that commodity similarity is quantified by kernel-function fitting, using Gaussian kernel functions to fit the distribution of the attribute values.

7. The machine-learning-based method for analyzing competitive relationships between commodities according to claim 3, characterized in that the commodity attributes are multiple attributes, and commodity similarity is quantified by using Andrews curves to compare the differences between commodities across their attributes.

8. The machine-learning-based method for analyzing competitive relationships between commodities according to any one of claims 4-7, characterized in that, after a set of highly similar commodities has been obtained, the NearestNeighbors machine-learning algorithm is used for a further similarity calculation.

9. The machine-learning-based method for analyzing competitive relationships between commodities according to claim 8, characterized in that the similarity calculation proceeds as follows:

A training sample set is prepared in which every item has a label;

When new, unlabeled data are input, each feature of the new data is compared with the corresponding feature of every item in the training sample set, and the class labels of the items most similar to the new sample are extracted.

10. The machine-learning-based method for analyzing competitive relationships between commodities according to claim 9, characterized in that the method uses the NearestNeighbors nearest-neighbor method to compute the minimum similarity distance between pairs of commodities and thereby identifies, for each commodity, the other commodity most similar to it; the commodities found in this way provide an important basis for further analysis.