CN109472660A

CN109472660A - Method and device for generating optimal item collocations, and computer-readable storage medium

Info

Publication number: CN109472660A
Application number: CN201811054898.5A
Authority: CN
Inventors: 胡旭迈
Original assignee: Beijing Unexpectedly Home Yundihui New Retail Chain Co Ltd
Current assignee: Every flat every house designer (Beijing) Technology Co.,Ltd.
Priority date: 2018-09-11
Filing date: 2018-09-11
Publication date: 2019-03-15
Anticipated expiration: 2038-09-11
Also published as: CN109472660B

Abstract

The invention discloses a method and a device for generating optimal item collocations, and a storage medium. The method includes: step S110, obtaining the design attributes of a user, where the design attributes include several design schemes and each design scheme includes several items and the addition time corresponding to each item; step S120, based on the addition time corresponding to each item, sorting the items in each design scheme in order of addition time to form several item sequence sets; step S130, integrating the item sequence sets corresponding to the design schemes to form an item stream set, and analyzing the item stream set with a preset natural language processing technique to obtain, for each item, a category collocation candidate set of categories that collocate with it; step S140, sorting the category collocation candidate set of each item by correlation. The method can effectively improve the working efficiency of designers.

Description

Method and device for generating optimal item collocations, and computer-readable storage medium

Technical field

The present invention relates to the technical field of data analysis, and in particular to a method for generating optimal item collocations, a device for generating optimal item collocations, and a computer-readable storage medium.

Background

Traditionally, intelligent recommendation in home decoration design is often based on user-based or item-based collaborative filtering, that is, on analyzing the similarity between users or between items. However, recommendation based on user or item similarity does not analyze the context of the user's usage scenario, which leads to a low recommendation hit rate. Intelligent recommendation can also be based on expert knowledge, for example by building an expert knowledge base from the experience of experts. However, such an expert knowledge base is slow to update and cannot learn in real time.

Summary of the invention

The present invention aims to solve at least one of the technical problems in the prior art, and proposes a method for generating optimal item collocations, a device for generating optimal item collocations, and a computer-readable storage medium.

To achieve the above object, a first aspect of the present invention provides a method for generating optimal item collocations, including:

Step S110: obtaining several design schemes, each design scheme including several items and the addition time corresponding to each item;

Step S120: based on the addition time corresponding to each item, sorting the items in each design scheme in order of addition time to form several item sequence sets;

Step S130: integrating the item sequence sets corresponding to the design schemes to form an item stream set, and analyzing the item stream set with a preset natural language processing technique to obtain a category collocation candidate set of categories that collocate with each item;

Step S140: sorting the category collocation candidate set of each item by correlation.

Optionally, step S130 specifically includes:

performing part-of-speech tagging on the item stream set to map each item to the category to which it belongs, so as to obtain a category stream set;

performing frequency spectrum analysis on the category stream set to obtain a high-frequency category subset, an intermediate-frequency category subset and a low-frequency category subset;

generating category collocation candidate sets for the high-frequency category subset, the intermediate-frequency category subset and the low-frequency category subset respectively, based on a Tri-Gram model.

Optionally, step S140 specifically includes:

analyzing, with a T-test algorithm, the correlation between the high-frequency category subset and its corresponding category collocation candidate set, to obtain high-frequency category collocation correlation results;

analyzing, with a PMI algorithm and the T-test algorithm, the correlation between the intermediate-frequency category subset and its corresponding category collocation candidate set, to obtain intermediate-frequency category collocation correlation results;

analyzing, with the PMI algorithm, the correlation between the low-frequency category subset and its corresponding category collocation candidate set, to obtain low-frequency category collocation correlation results;

sorting according to the high-frequency category collocation correlation results, the intermediate-frequency category collocation correlation results and the low-frequency category collocation correlation results.

Optionally, in step S140, the sorting is in descending or ascending order.

A second aspect of the present invention provides a device for generating optimal item collocations, including:

an obtaining module, configured to obtain several design schemes, each design scheme including several items and the addition time corresponding to each item;

a first sorting module, configured to sort, based on the addition time corresponding to each item, the items in each design scheme in order of addition time to form several item sequence sets;

a data analysis module, configured to integrate the item sequence sets corresponding to the design schemes to form an item stream set, and to analyze the item stream set with a preset natural language processing technique to obtain a category collocation candidate set of categories that collocate with each item;

a second sorting module, configured to sort the category collocation candidate set of each item by correlation.

Optionally, the data analysis module includes a part-of-speech tagging submodule, a frequency spectrum analysis submodule and a processing submodule;

the part-of-speech tagging submodule is configured to perform part-of-speech tagging on the item stream set to map each item to the category to which it belongs, so as to obtain a category stream set;

the frequency spectrum analysis submodule is configured to perform frequency spectrum analysis on the category stream set to obtain a high-frequency category subset, an intermediate-frequency category subset and a low-frequency category subset;

the processing submodule is configured to generate category collocation candidate sets for the high-frequency category subset, the intermediate-frequency category subset and the low-frequency category subset respectively, based on a Tri-Gram model.

Optionally, the second sorting module includes a correlation analysis submodule and a sorting submodule;

the correlation analysis submodule is configured to:

analyze, with a T-test algorithm, the correlation between the high-frequency category subset and its corresponding category collocation candidate set, to obtain high-frequency category collocation correlation results;

the sorting submodule is configured to sort according to the high-frequency category collocation correlation results, the intermediate-frequency category collocation correlation results and the low-frequency category collocation correlation results.

Optionally, the second sorting module sorts in descending or ascending order.

A third aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for generating optimal item collocations described above.

According to the method, device and computer-readable storage medium for generating optimal item collocations of the present invention, several design schemes are first obtained, which may come from one or more users. The items of each design scheme are then arranged in order of addition time. Next, data analysis is performed with a natural language processing technique to obtain category collocation candidate sets. Finally, the category collocation candidate sets are sorted by correlation, which yields the best collocation categories for each item. This can effectively improve both the working efficiency of designers and the recommendation hit rate. In addition, the method can keep learning from newly obtained design schemes, which further improves the recommendation hit rate.

Brief description of the drawings

The accompanying drawings provide a further understanding of the present invention and form part of the specification. Together with the specific embodiments below, they serve to explain the present invention but do not limit it. In the drawings:

Fig. 1 is a flowchart of the method for generating optimal item collocations in a first embodiment of the present invention;

Fig. 2 is a schematic diagram of the categories in the category stream set arranged in descending order of raw frequency, in a second embodiment of the present invention;

Fig. 3 is a schematic diagram of the calculated cumulative frequency in a third embodiment of the present invention;

Fig. 4 is a schematic diagram of all categories divided into high, intermediate and low frequency in a fourth embodiment of the present invention;

Fig. 5 is a schematic diagram of the frequency spectrum of the collocation candidate sets generated from the category stream set based on the Tri-Gram model, in a fifth embodiment of the present invention;

Fig. 6 is a structural schematic diagram of the device for generating optimal item collocations in a sixth embodiment of the present invention.

Detailed description of the embodiments

Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present invention and are not intended to limit it.

As shown in Fig. 1, a first aspect of the present invention relates to a method for generating optimal item collocations, including:

Step S110: obtaining several design schemes, each design scheme including several items and the addition time corresponding to each item.

Specifically, in this step, the design schemes may be obtained from a single user or, of course, from two or more users. In this way, the following information can be collected in this step:

{design scheme, user name, item, time the item was added}.

It should be noted that the specific number of design schemes is not limited and can be determined according to actual needs. In the following, the number of design schemes is denoted n, where n is a positive integer greater than or equal to 1.

Step S120: based on the addition time corresponding to each item, sorting the items in each design scheme in order of addition time to form several item sequence sets.

Specifically, in this step, the information collected in step S110 is sent to the back end for data processing, where the items in each design scheme are sorted in order of addition time to obtain the following item sequence sets:

Design scheme 1: [item 1, item 2, item 3, ...]

...

Design scheme n: [item n1, item n2, item n3, ...].
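Steps S110 and S120 can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the record type `AddEvent`, its field names, the function `build_item_sequences` and the sample data are all hypothetical, chosen only to mirror the collected tuple {design scheme, user name, item, addition time}.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AddEvent:
    """One collected record (hypothetical field names)."""
    scheme_id: str
    user: str
    item: str
    added_at: datetime


def build_item_sequences(events):
    """Group records by design scheme and order each group by addition time."""
    by_scheme = defaultdict(list)
    for event in events:
        by_scheme[event.scheme_id].append(event)
    return {
        scheme_id: [e.item for e in sorted(group, key=lambda e: e.added_at)]
        for scheme_id, group in by_scheme.items()
    }


# Hypothetical sample data: records arrive in arbitrary order.
events = [
    AddEvent("scheme-1", "alice", "bathroom cabinet", datetime(2018, 9, 1, 10, 5)),
    AddEvent("scheme-1", "alice", "toilet", datetime(2018, 9, 1, 10, 0)),
    AddEvent("scheme-2", "bob", "floor tile", datetime(2018, 9, 2, 9, 0)),
    AddEvent("scheme-1", "alice", "shower", datetime(2018, 9, 1, 10, 2)),
]
sequences = build_item_sequences(events)
```

Each value in `sequences` plays the role of one item sequence set, with the items in the order in which the designer added them.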

Step S130: integrating the item sequence sets corresponding to the design schemes to form an item stream set, and analyzing the item stream set with a preset natural language processing technique to obtain a category collocation candidate set of categories that collocate with each item.

Specifically, in this step, the item stream set formed may be, for example: [item 1, item 2, item 3, ..., item n1, item n2, item n3]. The Tri-Gram model from natural language processing can then be applied. The model rests on the assumption that the occurrence of the n-th word depends only on the preceding n-1 words (for a Tri-Gram model, the two preceding words) and is unrelated to any other word. This yields the category collocation candidate set, for example [(toilet, shower), (toilet, hardware and others), (toilet, bathroom cabinet), ...].
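The candidate generation described above can be sketched as follows. This is a window-based approximation in the spirit of the Tri-Gram assumption, not a full language model and not necessarily the procedure of the invention. It assumes that items have already been mapped to categories, that each design scheme is processed as its own sequence so that pairs never span two schemes, and that a category is paired with each of the two categories added right after it. The function name `collocation_candidates`, the `context` parameter and the sample data are hypothetical.

```python
from collections import Counter


def collocation_candidates(category_sequences, context=2):
    """Count (category, follower) pairs inside a short look-ahead window.

    With context=2 every window covers three consecutive categories,
    which is the Tri-Gram span. Pairs of identical categories are
    skipped (an assumption made for this sketch).
    """
    counts = Counter()
    for sequence in category_sequences:
        for i, head in enumerate(sequence):
            for follower in sequence[i + 1:i + 1 + context]:
                if follower != head:
                    counts[(head, follower)] += 1
    return counts


# Hypothetical category sequences, one per design scheme.
candidates = collocation_candidates([
    ["toilet", "shower", "hardware", "bathroom cabinet"],
    ["toilet", "shower"],
])
```

The resulting counter maps each candidate pair to the number of times it was observed, which is the raw material for the correlation analysis in step S140.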

Step S140: sorting the category collocation candidate set of each item by correlation.

Specifically, in this step, the sorting by correlation can be in descending or ascending order. For example, the following descending arrangement can be used:

Category 1: [collocation category 1, collocation category 2, ...]

...

Category N: [collocation category 1, collocation category 2, ...]

In the method S100 for generating optimal item collocations of this embodiment, several design schemes are first obtained, which may come from one or more users. The items of each design scheme are then arranged in order of addition time. Next, data analysis is performed with a natural language processing technique to obtain category collocation candidate sets. Finally, the category collocation candidate sets are sorted by correlation, which yields the best collocation categories for each item. The method of this embodiment can therefore effectively improve both the working efficiency of designers and the recommendation hit rate. In addition, it can keep learning from newly obtained design schemes, which further improves the recommendation hit rate.

Optionally, step S130 specifically includes:

performing part-of-speech tagging on the item stream set to map each item to the category to which it belongs, so as to obtain a category stream set.

Specifically, the resulting category stream set is [category 1, category 2, category 3, ..., category n1, category n2, category n3]. For example:

[floor tile, floor tile, customized product];

[hinged door, customized product];

[customized product, customized product, customized product, customized product];

[customized product, floor tile, customized product, toilet, wood veneer, wood veneer, wood veneer, wood veneer, wood veneer, metal, wallpaper];

[customized product, ornament, green plant, customized product, customized product];

[customized product, sliding door, wardrobe, shoe cabinet, TV cabinet, TV, floor tile, floor tile, floor tile, sliding door, bay window, bay window];

[customized product, double bed, baking varnish, baking varnish, baking varnish, baking varnish, baking varnish, baking varnish, baking varnish];

[customized product, kitchen appliance, kitchen appliance, ornament, ornament, ornament, ornament, kitchen appliance];

...
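The mapping from items to categories can be illustrated with a simple lookup. This is only a sketch: the table `ITEM_TO_CATEGORY`, the item names in it and the function `to_category_sequence` are hypothetical, and a real system would presumably read the mapping from a product catalogue rather than hard-code it.

```python
# Hypothetical lookup table from concrete items to their categories.
ITEM_TO_CATEGORY = {
    "white ceramic toilet": "toilet",
    "oak bathroom cabinet": "bathroom cabinet",
    "grey floor tile 600x600": "floor tile",
    "beige floor tile 300x300": "floor tile",
}


def to_category_sequence(item_sequence, item_to_category, unknown="unknown"):
    """Replace every item with its category, keeping the addition order."""
    return [item_to_category.get(item, unknown) for item in item_sequence]


category_sequence = to_category_sequence(
    ["grey floor tile 600x600", "beige floor tile 300x300", "white ceramic toilet"],
    ITEM_TO_CATEGORY,
)
```

Because different items collapse onto the same category, repeated categories such as consecutive floor tiles appear naturally in the category stream set.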

Frequency spectrum analysis is then performed on the category stream set to obtain a high-frequency category subset, an intermediate-frequency category subset and a low-frequency category subset.

The frequency spectrum analysis proceeds as follows:

(1) Count the distinct categories that occur in the category stream set (198 categories were obtained here) and arrange them in descending order of raw frequency, as shown in Fig. 2.

(2) Calculate the cumulative frequency, as shown in Fig. 3.

(3) According to the slope of the cumulative frequency curve (the smaller the slope, the flatter the curve and the lower the frequency of the corresponding category), divide all categories into high, intermediate and low frequency, as shown in Fig. 4, thereby generating the high-frequency category subset, the intermediate-frequency category subset and the low-frequency category subset.
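The three steps above can be sketched as follows. The text gives no numeric cut-off values, so the thresholds `high_slope` and `low_slope` are assumptions, as are the function name `split_frequency_bands` and the sample data. The slope at each rank is taken as the rise of the cumulative relative-frequency curve at that rank.

```python
from collections import Counter
from itertools import accumulate


def split_frequency_bands(category_sequences, high_slope, low_slope):
    """Split categories into high/mid/low bands by the slope of the cumulative curve."""
    counts = Counter(c for seq in category_sequences for c in seq)
    total = sum(counts.values())
    ranked = counts.most_common()  # step (1): descending raw frequency
    cumulative = list(accumulate(n / total for _, n in ranked))  # step (2)

    bands = {"high": [], "mid": [], "low": []}
    previous = 0.0
    for (category, _), running in zip(ranked, cumulative):
        slope = running - previous  # step (3): rise of the curve at this rank
        previous = running
        if slope >= high_slope:
            bands["high"].append(category)
        elif slope >= low_slope:
            bands["mid"].append(category)
        else:
            bands["low"].append(category)
    return bands


# Hypothetical category stream: 10, 6, 3 and 1 occurrences respectively.
sequences = [
    ["customized product"] * 10,
    ["floor tile"] * 6,
    ["wood veneer"] * 3 + ["toilet"],
]
bands = split_frequency_bands(sequences, high_slope=0.25, low_slope=0.10)
```

In practice the thresholds would be read off the actual curve, for example at the points where it visibly flattens.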

Category collocation candidate sets are then generated for the three subsets based on the Tri-Gram model. Specifically, as shown in Fig. 5, this relies on the assumption that the categories collocated with the high-frequency category subset are all high-frequency categories, those collocated with the intermediate-frequency category subset are all intermediate-frequency categories, and those collocated with the low-frequency category subset are all low-frequency categories.

Optionally, step S140 is specifically included:

Specifically, the PMI (pointwise mutual information) algorithm uses the PMI index to measure the correlation between two things x and y.

From probability theory, if x and y are uncorrelated, then p(x, y) = p(x)p(y). The stronger the correlation between the two, the larger p(x, y) is relative to p(x)p(y).

Equivalently, the conditional probability p(x | y) of x occurring given that y occurs, divided by the probability p(x) of x occurring on its own, naturally expresses the degree of correlation between x and y. For the scenario in this embodiment, the value range of PMI is [0, +∞) and it is monotonically increasing. This algorithm is very sensitive to low-frequency information.
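For reference, the standard definition of pointwise mutual information agrees with the explanation above and can be written as below. This is the textbook form and is an assumption about the exact expression intended here; the base of the logarithm in particular is not specified.

```latex
\mathrm{PMI}(x; y)
  = \log \frac{p(x, y)}{p(x)\,p(y)}
  = \log \frac{p(x \mid y)}{p(x)}
  = \log \frac{p(y \mid x)}{p(y)}
```

Under this definition PMI is 0 when x and y are independent and grows as they co-occur more often than chance, so the stated range [0, +∞) holds for pairs that co-occur at least as often as chance would predict.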

The T-test algorithm computes a T value from P(x, y) and P(x)P(y), which reflects the relative difference in collocation strength.

The larger the T value, the more confident we can be that the difference between the observed co-occurrence probability P(x, y) and the chance probability P(x)P(y) of random co-occurrence is real rather than coincidental. From a statistical point of view, a difference of 1.65 standard deviations means we can say with 95% confidence that a collocation is meaningful; the corresponding T value is 2.132. This algorithm is very sensitive to high-frequency information.
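For reference, a common approximation of the collocation t-score used in corpus linguistics is shown below. It is an assumption that this matches the expression intended here; N, the total number of observations from which the probabilities are estimated, is a symbol introduced for this sketch.

```latex
T(x, y) = \frac{P(x, y) - P(x)\,P(y)}{\sqrt{P(x, y) / N}}
```

The numerator is the gap between observed and chance co-occurrence, and the denominator approximates the standard error of the observed probability, so frequent pairs with stable estimates receive larger T values.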

Therefore, the strategy of the present invention for mining the best collocations from the collocation sets is:

A. High-frequency collocation items are filtered with the T-test algorithm and arranged in descending order of T value.

B. Intermediate-frequency collocation items are filtered with both the PMI algorithm and the T-test algorithm, and the results are merged.

C. Low-frequency collocation items are filtered with the PMI algorithm and arranged in descending order of PMI value.
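Strategy A to C can be sketched as follows. This is a minimal sketch under stated assumptions: `pmi` uses the standard base-2 definition, `t_score` uses the common collocation approximation (observed minus chance probability, divided by an estimated standard error), and the merge rule for the intermediate band (summing the two rank positions) is purely an assumption, since the text only says the results are merged. All function names and counts are hypothetical.

```python
import math


def pmi(n_xy, n_x, n_y, total):
    """Pointwise mutual information (base 2) from raw counts."""
    return math.log2((n_xy / total) / ((n_x / total) * (n_y / total)))


def t_score(n_xy, n_x, n_y, total):
    """Collocation t-score: observed minus chance probability, in standard errors."""
    p_xy = n_xy / total
    return (p_xy - (n_x / total) * (n_y / total)) / math.sqrt(p_xy / total)


def rank_collocations(band, pair_counts, category_counts, total):
    """Order the candidate pairs of one frequency band following strategy A, B or C."""
    scores = {
        (x, y): (
            pmi(n_xy, category_counts[x], category_counts[y], total),
            t_score(n_xy, category_counts[x], category_counts[y], total),
        )
        for (x, y), n_xy in pair_counts.items()
    }
    by_pmi = sorted(scores, key=lambda pair: scores[pair][0], reverse=True)
    by_t = sorted(scores, key=lambda pair: scores[pair][1], reverse=True)
    if band == "high":
        return by_t  # A: descending T value
    if band == "low":
        return by_pmi  # C: descending PMI value
    # B: merge both rankings; summing rank positions is an assumption
    return sorted(scores, key=lambda pair: by_pmi.index(pair) + by_t.index(pair))


# Hypothetical counts over 1000 observations.
category_counts = {"toilet": 100, "shower": 60, "bathroom cabinet": 20}
pair_counts = {("toilet", "shower"): 30, ("toilet", "bathroom cabinet"): 12}
high = rank_collocations("high", pair_counts, category_counts, 1000)
mid = rank_collocations("mid", pair_counts, category_counts, 1000)
low = rank_collocations("low", pair_counts, category_counts, 1000)
```

With these sample counts the two measures disagree: the T value favours the more frequent pair (toilet, shower), while PMI favours the rarer pair (toilet, bathroom cabinet), which illustrates why each measure is applied to the band it is most sensitive to.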

A second aspect of the present invention, as shown in Fig. 6, provides a device 100 for generating optimal item collocations, including:

an obtaining module 110, configured to obtain several design schemes, each design scheme including several items and the addition time corresponding to each item;

a first sorting module 120, configured to sort, based on the addition time corresponding to each item, the items in each design scheme in order of addition time to form several item sequence sets;

a data analysis module 130, configured to integrate the item sequence sets corresponding to the design schemes to form an item stream set, and to analyze the item stream set with a preset natural language processing technique to obtain a category collocation candidate set of categories that collocate with each item;

a second sorting module 140, configured to sort the category collocation candidate set of each item by correlation.

In the device 100 for generating optimal item collocations of this embodiment, several design schemes are first obtained, which may come from one or more users. The items of each design scheme are then arranged in order of addition time. Next, data analysis is performed with a natural language processing technique to obtain category collocation candidate sets. Finally, the category collocation candidate sets are sorted by correlation, which yields the best collocation categories for each item. The device of this embodiment can therefore effectively improve both the working efficiency of designers and the recommendation hit rate. In addition, it can keep learning from newly obtained design schemes, which further improves the recommendation hit rate.

Optionally, the data analysis module 130 includes a part-of-speech tagging submodule 131, a frequency spectrum analysis submodule 132 and a processing submodule 133;

the part-of-speech tagging submodule 131 is configured to perform part-of-speech tagging on the item stream set to map each item to the category to which it belongs, so as to obtain a category stream set;

the frequency spectrum analysis submodule 132 is configured to perform frequency spectrum analysis on the category stream set to obtain a high-frequency category subset, an intermediate-frequency category subset and a low-frequency category subset;

the processing submodule 133 is configured to generate category collocation candidate sets for the high-frequency category subset, the intermediate-frequency category subset and the low-frequency category subset respectively, based on a Tri-Gram model.

For the frequency spectrum analysis and the remaining details, reference may be made to the relevant description above, which is not repeated here.

Optionally, the second sorting module 140 includes a correlation analysis submodule 141 and a sorting submodule 142;

the correlation analysis submodule 141 is configured to analyze the correlation between each category subset and its corresponding category collocation candidate set;

the sorting submodule 142 is configured to sort according to the high-frequency category collocation correlation results, the intermediate-frequency category collocation correlation results and the low-frequency category collocation correlation results.

For details, reference may be made to the relevant description above, which is not repeated here.

Optionally, the second sorting module 140 sorts in descending or ascending order.

The computer-readable storage medium of this embodiment stores a computer program which, when executed by a processor, can implement the method S100 for generating optimal item collocations described above. Several design schemes are first obtained, which may come from one or more users. The items of each design scheme are then arranged in order of addition time. Next, data analysis is performed with a natural language processing technique to obtain category collocation candidate sets. Finally, the category collocation candidate sets are sorted by correlation, which yields the best collocation categories for each item. The computer-readable storage medium of this embodiment can therefore effectively improve both the working efficiency of designers and the recommendation hit rate. In addition, it can keep learning from newly obtained design schemes, which further improves the recommendation hit rate.

It should be understood that the above embodiments are merely exemplary embodiments adopted to illustrate the principle of the present invention, and the present invention is not limited thereto. Those of ordinary skill in the art can make various variations and improvements without departing from the spirit and essence of the present invention, and these variations and improvements are also regarded as falling within the protection scope of the present invention.

Claims

1. A method for generating optimal item collocations, characterized by comprising:

Step S110: obtaining several design schemes, each design scheme including several items and the addition time corresponding to each item;

Step S120: based on the addition time corresponding to each item, sorting the items in each design scheme in order of addition time to form several item sequence sets;

Step S130: integrating the item sequence sets corresponding to the design schemes to form an item stream set, and analyzing the item stream set with a preset natural language processing technique to obtain a category collocation candidate set of categories that collocate with each item;

2. The method for generating optimal item collocations according to claim 1, characterized in that step S130 specifically includes:

performing frequency spectrum analysis on the category stream set to obtain a high-frequency category subset, an intermediate-frequency category subset and a low-frequency category subset;

generating category collocation candidate sets for the high-frequency category subset, the intermediate-frequency category subset and the low-frequency category subset respectively, based on a Tri-Gram model.

3. The method for generating optimal item collocations according to claim 2, characterized in that step S140 specifically includes:

analyzing, with a T-test algorithm, the correlation between the high-frequency category subset and its corresponding category collocation candidate set, to obtain high-frequency category collocation correlation results;

analyzing, with a PMI algorithm and the T-test algorithm, the correlation between the intermediate-frequency category subset and its corresponding category collocation candidate set, to obtain intermediate-frequency category collocation correlation results;

analyzing, with the PMI algorithm, the correlation between the low-frequency category subset and its corresponding category collocation candidate set, to obtain low-frequency category collocation correlation results;

sorting according to the high-frequency category collocation correlation results, the intermediate-frequency category collocation correlation results and the low-frequency category collocation correlation results.

4. The method for generating optimal item collocations according to any one of claims 1 to 3, characterized in that, in step S140, the sorting is in descending or ascending order.

5. A device for generating optimal item collocations, characterized by comprising:

an obtaining module, configured to obtain several design schemes, each design scheme including several items and the addition time corresponding to each item;

a first sorting module, configured to sort, based on the addition time corresponding to each item, the items in each design scheme in order of addition time to form several item sequence sets;

a data analysis module, configured to integrate the item sequence sets corresponding to the design schemes to form an item stream set, and to analyze the item stream set with a preset natural language processing technique to obtain a category collocation candidate set of categories that collocate with each item;

6. The device for generating optimal item collocations according to claim 5, characterized in that the data analysis module includes a part-of-speech tagging submodule, a frequency spectrum analysis submodule and a processing submodule;

the part-of-speech tagging submodule is configured to perform part-of-speech tagging on the item stream set to map each item to the category to which it belongs, so as to obtain a category stream set;

the frequency spectrum analysis submodule is configured to perform frequency spectrum analysis on the category stream set to obtain a high-frequency category subset, an intermediate-frequency category subset and a low-frequency category subset;

the processing submodule is configured to generate category collocation candidate sets for the high-frequency category subset, the intermediate-frequency category subset and the low-frequency category subset respectively, based on a Tri-Gram model.

7. The device for generating optimal item collocations according to claim 6, characterized in that the second sorting module includes a correlation analysis submodule and a sorting submodule;

the correlation analysis submodule is configured to:

analyze, with a T-test algorithm, the correlation between the high-frequency category subset and its corresponding category collocation candidate subset, to obtain high-frequency category collocation correlation results;

analyze, with a PMI algorithm and the T-test algorithm, the correlation between the intermediate-frequency category subset and its corresponding category collocation candidate subset, to obtain intermediate-frequency category collocation correlation results;

analyze, with the PMI algorithm, the correlation between the low-frequency category subset and its corresponding category collocation candidate subset, to obtain low-frequency category collocation correlation results;

the sorting submodule is configured to sort according to the high-frequency category collocation correlation results, the intermediate-frequency category collocation correlation results and the low-frequency category collocation correlation results.

8. The method for generating optimal item collocations according to any one of claims 5 to 7, characterized in that the second sorting submodule sorts in descending or ascending order.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method for generating optimal item collocations according to any one of claims 1 to 4.