Summary of the Invention
The present invention aims to solve at least one of the technical problems existing in the prior art, and proposes a method for generating best article collocations, a device for generating best article collocations, and a computer-readable storage medium.
To achieve the above object, a first aspect of the present invention provides a method for generating best article collocations, comprising:
Step S110: obtaining several design schemes, each design scheme including several articles and the addition time corresponding to each article;
Step S120: based on the addition time corresponding to each article, sorting the several articles in each design scheme in order of addition time, to form several article sequence sets;
Step S130: integrating the article sequence sets corresponding to the design schemes to form an article stream set, and performing data analysis on the article stream set using a preset natural language processing technique, to obtain, for each article, a category collocation candidate set of the categories that collocate with it;
Step S140: sorting the category collocation candidate set of each article according to correlation.
Optionally, step S130 specifically includes:
performing part-of-speech tagging on the article stream set, mapping each article to the category to which it belongs, to obtain a category stream set;
performing spectrum analysis on the category stream set, to obtain a high-frequency category subset, a mid-frequency category subset and a low-frequency category subset;
generating, based on a Tri-Gram model, category collocation candidate sets for the high-frequency category subset, the mid-frequency category subset and the low-frequency category subset respectively.
Optionally, step S140 specifically includes:
analyzing, using a T-test algorithm, the correlation between the high-frequency category subset and its corresponding category collocation candidate set, to obtain a high-frequency category collocation correlation result;
analyzing, using a PMI algorithm and the T-test algorithm, the correlation between the mid-frequency category subset and its corresponding category collocation candidate set, to obtain a mid-frequency category collocation correlation result;
analyzing, using the PMI algorithm, the correlation between the low-frequency category subset and its corresponding category collocation candidate set, to obtain a low-frequency category collocation correlation result;
sorting according to the high-frequency category collocation correlation result, the mid-frequency category collocation correlation result and the low-frequency category collocation correlation result.
Optionally, in step S140, the sorting is in descending order or in ascending order.
A second aspect of the present invention provides a device for generating best article collocations, comprising:
an acquisition module, configured to obtain several design schemes, each design scheme including several articles and the addition time corresponding to each article;
a first sorting module, configured to sort, based on the addition time corresponding to each article, the several articles in each design scheme in order of addition time, to form several article sequence sets;
a data analysis module, configured to integrate the article sequence sets corresponding to the design schemes to form an article stream set, and to perform data analysis on the article stream set using a preset natural language processing technique, to obtain, for each article, a category collocation candidate set of the categories that collocate with it;
a second sorting module, configured to sort the category collocation candidate set of each article according to correlation.
Optionally, the data analysis module includes a part-of-speech tagging submodule, a spectrum analysis submodule and a processing submodule;
the part-of-speech tagging submodule is configured to perform part-of-speech tagging on the article stream set, mapping each article to the category to which it belongs, to obtain a category stream set;
the spectrum analysis submodule is configured to perform spectrum analysis on the category stream set, to obtain a high-frequency category subset, a mid-frequency category subset and a low-frequency category subset;
the processing submodule is configured to generate, based on a Tri-Gram model, category collocation candidate sets for the high-frequency category subset, the mid-frequency category subset and the low-frequency category subset respectively.
Optionally, the second sorting module includes a correlation analysis submodule and a sorting submodule;
the correlation analysis submodule is configured to:
analyze, using a T-test algorithm, the correlation between the high-frequency category subset and its corresponding category collocation candidate set, to obtain a high-frequency category collocation correlation result;
analyze, using a PMI algorithm and the T-test algorithm, the correlation between the mid-frequency category subset and its corresponding category collocation candidate set, to obtain a mid-frequency category collocation correlation result;
analyze, using the PMI algorithm, the correlation between the low-frequency category subset and its corresponding category collocation candidate set, to obtain a low-frequency category collocation correlation result;
the sorting submodule is configured to sort according to the high-frequency category collocation correlation result, the mid-frequency category collocation correlation result and the low-frequency category collocation correlation result.
Optionally, the second sorting module sorts in descending order or in ascending order.
A third aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for generating best article collocations described above.
With the method, device and computer-readable storage medium for generating best article collocations of the present invention, several design schemes are first obtained, which may come from one or more users. Next, the several articles of each design scheme are arranged in order of addition time. Data analysis is then performed using a natural language processing technique to obtain category collocation candidate sets. Finally, the category collocation candidate sets are sorted by correlation, so that the best collocation categories for an article can be obtained. The working efficiency of designers can therefore be effectively improved, as can the recommendation hit rate; in addition, by continuously obtaining new design schemes and learning from them, the recommendation hit rate can be improved further.
Detailed Description of the Embodiments
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
As shown in Figure 1, a first aspect of the present invention relates to a method S100 for generating best article collocations, comprising:
Step S110: obtaining several design schemes, each design scheme including several articles and the addition time corresponding to each article.
Specifically, in this step, the several design schemes may be obtained from one user, or of course from two or more users. In this way, the following information can be collected in this step:
{design scheme, user name, article, addition time of the article}.
It should be noted that the specific number of design schemes is not limited and can be determined according to actual needs. In the following description the number of design schemes is n, where n is a positive integer greater than or equal to 1.
Step S120: based on the addition time corresponding to each article, sorting the several articles in each design scheme in order of addition time, to form several article sequence sets.
Specifically, in this step, the information collected in step S110 is sent to the backend for data processing, and the several articles in each design scheme are sorted in order of addition time, giving the following article sequence sets:
Design scheme 1: [article 1, article 2, article 3, ...]
...
Design scheme n: [article n1, article n2, article n3, ...].
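By way of illustration only, step S120 can be sketched as follows. The record layout follows the information collected in step S110; the function name `build_article_sequences`, the sample records and the use of ISO-format timestamp strings are assumptions made for this sketch and are not specified by the embodiment.

```python
from collections import defaultdict
from datetime import datetime


def build_article_sequences(records):
    """Group records by design scheme and order each group's articles by addition time.

    records: iterable of (design_scheme, user_name, article, added_at) tuples,
    where added_at is an ISO-8601 timestamp string (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for scheme, _user, article, added_at in records:
        grouped[scheme].append((datetime.fromisoformat(added_at), article))
    # sorted() is stable, so articles added at the same instant keep their input order.
    return {
        scheme: [article for _time, article in sorted(items, key=lambda pair: pair[0])]
        for scheme, items in grouped.items()
    }


# Hypothetical sample data.
records = [
    ("design scheme 1", "user A", "article 2", "2019-01-01T10:05:00"),
    ("design scheme 1", "user A", "article 1", "2019-01-01T10:00:00"),
    ("design scheme 2", "user B", "article 3", "2019-01-02T09:00:00"),
]
sequences = build_article_sequences(records)
```

Sorting on the parsed timestamp rather than on the raw string keeps the ordering correct even if the timestamps carry different precisions.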
Step S130: integrating the article sequence sets corresponding to the design schemes to form an article stream set, and performing data analysis on the article stream set using a preset natural language processing technique, to obtain, for each article, a category collocation candidate set of the categories that collocate with it.
Specifically, in this step, the article stream set that is formed may be, for example: [article 1, article 2, article 3, ..., article n1, article n2, article n3]. The Tri-Gram model from natural language processing can then be used. This model is based on the assumption that the occurrence of the n-th word is related only to the preceding n-1 words and is unrelated to any other word. A category collocation candidate set is thereby obtained, for example [(toilet, shower), (toilet, hardware and others), (toilet, bathroom cabinet), ...].
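The embodiment does not spell out how the Tri-Gram model yields candidate pairs. The following is a minimal sketch under one possible reading: each element is paired with the up to two elements that follow it, that is, with the elements sharing a three-element window with it. The function name `trigram_pair_counts` is hypothetical, and keeping the sequences of different design schemes separate while windowing (so that no window spans two schemes) is also an assumption.

```python
from collections import Counter


def trigram_pair_counts(sequences):
    """Count ordered pairs (head, other) that fall inside a three-element window."""
    counts = Counter()
    for sequence in sequences:
        for i, head in enumerate(sequence):
            # The next two elements share a three-element window with `head`.
            for other in sequence[i + 1:i + 3]:
                if other != head:  # an element is not a collocation of itself
                    counts[(head, other)] += 1
    return counts


# Sample sequences built from the categories named in the example above.
sequences = [
    ["toilet", "shower", "bathroom cabinet"],
    ["toilet", "hardware and others"],
]
pair_counts = trigram_pair_counts(sequences)
candidates = sorted(pair_counts)
```

Returning counts rather than a bare set of pairs keeps the co-occurrence frequencies available for a later correlation analysis.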
Step S140: sorting the category collocation candidate set of each article according to correlation.
Specifically, in this step, the sorting by correlation may be in descending order or in ascending order. For example, the following descending arrangement is used:
Category 1: [collocation category 1, collocation category ...]
...
Category N: [collocation category 1, collocation category ...]
In the method S100 for generating best article collocations of this embodiment, several design schemes are first obtained, which may come from one or more users. Next, the several articles of each design scheme are arranged in order of addition time. Data analysis is then performed using a natural language processing technique to obtain category collocation candidate sets. Finally, the category collocation candidate sets are sorted by correlation, so that the best collocation categories for an article can be obtained. The method of this embodiment can therefore effectively improve the working efficiency of designers and the recommendation hit rate; in addition, by continuously obtaining new design schemes and learning from them, the recommendation hit rate can be improved further.
Optionally, step S130 specifically includes:
Performing part-of-speech tagging on the article stream set, mapping each article to the category to which it belongs, to obtain a category stream set.
Specifically, the resulting category stream set is [category 1, category 2, category 3, ..., category n1, category n2, category n3]. For example, it may be as follows:
[floor tile, floor tile, customized product];
[swing door, customized product];
[customized product, customized product, customized product, customized product];
[customized product, floor tile, customized product, toilet, wood veneer, wood veneer, wood veneer, wood veneer, wood veneer, metal, wallpaper];
[customized product, ornament, green plant, customized product, customized product];
[customized product, sliding door, wardrobe, shoe cabinet, TV cabinet, TV, floor tile, floor tile, floor tile, sliding door, bay window, bay window];
[customized product, double bed, baking varnish, baking varnish, baking varnish, baking varnish, baking varnish, baking varnish, baking varnish];
[customized product, kitchen appliance, kitchen appliance, ornament, ornament, ornament, ornament, kitchen appliance];
...
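As an illustration of the tagging step, each article can be looked up in an article-to-category table, much as a word is tagged with its part of speech. The table `category_of`, the article identifiers and the fallback label `"uncategorized"` below are hypothetical and serve only this sketch.

```python
def to_category_sequences(article_sequences, category_of, default="uncategorized"):
    """Replace every article by the category it belongs to, preserving order."""
    return [
        [category_of.get(article, default) for article in sequence]
        for sequence in article_sequences
    ]


# Hypothetical article identifiers and their categories.
category_of = {
    "tile-600x600": "floor tile",
    "tile-800x800": "floor tile",
    "cabinet-custom-01": "customized product",
}
article_sequences = [["tile-600x600", "tile-800x800", "cabinet-custom-01"]]
category_sequences = to_category_sequences(article_sequences, category_of)
```

With this sample table the first sequence becomes [floor tile, floor tile, customized product], the same shape as the first example list above.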
Performing spectrum analysis on the category stream set, to obtain a high-frequency category subset, a mid-frequency category subset and a low-frequency category subset.
The spectrum analysis proceeds as follows:
(1) Count the number of distinct categories appearing in the category stream set (198 categories were obtained) and arrange them in descending order of raw frequency, as shown in Figure 2.
(2) Calculate the cumulative frequency, as shown in Figure 3.
(3) According to the slope of the cumulative frequency curve (the smaller the slope, the flatter the curve and the lower the frequency of the corresponding category), divide all categories into three kinds, high-frequency, mid-frequency and low-frequency, as shown in Figure 4; that is, generate the high-frequency category subset, the mid-frequency category subset and the low-frequency category subset.
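The three steps above can be sketched as follows. The embodiment does not state where the slope boundaries lie, so the thresholds `high_slope` and `low_slope` are purely illustrative assumptions, as are the function name `spectrum_bands` and the sample data.

```python
from collections import Counter
from itertools import accumulate


def spectrum_bands(category_sequences, high_slope, low_slope):
    """Split categories into high/mid/low bands by the slope of the cumulative frequency curve."""
    counts = Counter(c for sequence in category_sequences for c in sequence)
    total = sum(counts.values())
    ranked = counts.most_common()  # (1) descending raw frequency
    cumulative = list(accumulate(n / total for _cat, n in ranked))  # (2) cumulative frequency
    # (3) slope between neighbouring points of the cumulative curve (rank step = 1)
    slopes = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
    bands = {"high": [], "mid": [], "low": []}
    for (category, _n), slope in zip(ranked, slopes):
        if slope >= high_slope:
            bands["high"].append(category)
        elif slope >= low_slope:
            bands["mid"].append(category)
        else:
            bands["low"].append(category)
    return bands, cumulative


# Hypothetical category stream: 6 floor tiles, 3 customized products, 1 toilet.
category_sequences = [
    ["floor tile"] * 6,
    ["customized product"] * 3 + ["toilet"],
]
bands, cumulative = spectrum_bands(category_sequences, high_slope=0.5, low_slope=0.2)
```

Because the rank axis advances one category at a time, the slope at each point equals that category's relative frequency, which is why a flatter curve corresponds to lower-frequency categories.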
Generating, based on the Tri-Gram model, category collocation candidate sets for the high-frequency category subset, the mid-frequency category subset and the low-frequency category subset respectively.
Specifically, as shown in Figure 5, this is based on the following assumption: the categories that collocate with the high-frequency category subset are all high-frequency categories, the categories that collocate with the mid-frequency category subset are all mid-frequency categories, and the categories that collocate with the low-frequency category subset are all low-frequency categories.
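Under the assumption just stated, a candidate pair is kept only when both of its categories fall in the same frequency band. A minimal sketch follows; the function name `same_band_candidates`, the mapping `band_of` and the sample pairs are hypothetical.

```python
def same_band_candidates(candidates, band_of):
    """Group candidate pairs by band, dropping pairs whose members are in different bands."""
    grouped = {"high": [], "mid": [], "low": []}
    for first, second in candidates:
        band = band_of.get(first)
        if band is not None and band == band_of.get(second):
            grouped[band].append((first, second))
    return grouped


# Hypothetical band assignment and candidate pairs.
band_of = {
    "customized product": "high",
    "floor tile": "high",
    "toilet": "mid",
    "shower": "mid",
    "bay window": "low",
}
candidates = [
    ("customized product", "floor tile"),
    ("toilet", "shower"),
    ("toilet", "bay window"),  # mixed bands, so it is dropped
]
grouped = same_band_candidates(candidates, band_of)
```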
Optionally, step S140 specifically includes:
analyzing, using a T-test algorithm, the correlation between the high-frequency category subset and its corresponding category collocation candidate set, to obtain a high-frequency category collocation correlation result;
analyzing, using a PMI algorithm and the T-test algorithm, the correlation between the mid-frequency category subset and its corresponding category collocation candidate set, to obtain a mid-frequency category collocation correlation result;
analyzing, using the PMI algorithm, the correlation between the low-frequency category subset and its corresponding category collocation candidate set, to obtain a low-frequency category collocation correlation result;
sorting according to the high-frequency category collocation correlation result, the mid-frequency category collocation correlation result and the low-frequency category collocation correlation result.
Specifically, the PMI (pointwise mutual information) algorithm uses the PMI index to measure the correlation between two things; its formula is as follows:
(formula not reproduced in the source text)
In probability theory, if x and y are uncorrelated, then p(x, y) = p(x)p(y). The stronger the correlation between the two, the larger p(x, y) is relative to p(x)p(y). This may be easier to understand from the latter form of the formula: the conditional probability p(x | y) of x occurring given that y occurs, divided by the probability p(x) of x occurring on its own, naturally expresses the degree of correlation between x and y. For the scenario of this embodiment, the value range of PMI is [0, +∞) and it is monotonically increasing. This algorithm is very sensitive to low-frequency information.
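The PMI formula itself does not survive in the text. For reference, the standard definition is PMI(x; y) = log[p(x, y) / (p(x)p(y))] = log[p(x | y) / p(x)]. The ratio inside the logarithm ranges over [0, +∞), whereas its logarithm can be negative; which of the two the embodiment uses cannot be determined from the text, so the sketch below offers both through a hypothetical `use_log` flag. The function name `pmi` and the count-based estimates of the probabilities are likewise assumptions.

```python
import math


def pmi(count_xy, count_x, count_y, total, use_log=True):
    """Pointwise mutual information estimated from raw counts.

    count_xy: number of co-occurrences of x and y
    count_x, count_y: number of occurrences of x and of y
    total: number of observations
    """
    # p(x, y) / (p(x) * p(y)) written with counts to avoid rounding in the intermediate steps
    ratio = (count_xy * total) / (count_x * count_y)
    if not use_log:
        return ratio
    return math.log2(ratio) if ratio > 0 else float("-inf")


independent = pmi(10, 20, 50, 100)                  # p(x, y) equals p(x)p(y)
associated = pmi(20, 20, 50, 100)                   # x always occurs together with y
associated_ratio = pmi(20, 20, 50, 100, use_log=False)
```

In the independent case the ratio is 1 and the logarithmic form gives 0; in the associated case the ratio is 2. Rare pairs with small counts can produce very large ratios, which matches the remark that the measure is sensitive to low-frequency information.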
Using the T-test algorithm, the T value of P(x, y) against P(x)P(y) is calculated; it reflects the relative difference in collocation strength. The formula is as follows:
(formula not reproduced in the source text)
The larger the T value, the more strongly it indicates that the difference between the observed co-occurrence probability P(x, y) and the chance probability P(x)P(y) of random co-occurrence exists objectively rather than being an accidental coincidence. From a statistical point of view, 1.65 standard deviations indicate that we can say with 95% confidence that a collocation is a meaningful collocation; the corresponding T value is 2.132. This algorithm is very sensitive to high-frequency information.
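The T-value formula does not survive in the text either. A form commonly used for collocation testing is t = (P(x, y) - P(x)P(y)) / sqrt(s² / N), with the sample variance s² taken as P(x, y)(1 - P(x, y)) and N the number of observations. The sketch below assumes this form; it is not confirmed to be the formula of the embodiment. The function name `t_value` is hypothetical; the threshold 2.132 is the one quoted above.

```python
import math


def t_value(count_xy, count_x, count_y, total):
    """t statistic comparing the observed co-occurrence probability with the chance one."""
    p_xy = count_xy / total                             # observed P(x, y)
    expected = (count_x / total) * (count_y / total)    # chance P(x)P(y)
    variance = p_xy * (1.0 - p_xy)                      # Bernoulli sample variance
    if variance == 0.0:
        return 0.0  # degenerate sample: no evidence either way
    return (p_xy - expected) / math.sqrt(variance / total)


T_THRESHOLD = 2.132  # value quoted in the description above

independent_t = t_value(10, 20, 50, 100)   # P(x, y) equals P(x)P(y)
associated_t = t_value(20, 20, 50, 100)    # observed 0.2 against a chance value of 0.1
is_meaningful = associated_t > T_THRESHOLD
```

Because the denominator shrinks as N grows, frequently observed pairs reach large T values easily, which is consistent with the remark that the test is sensitive to high-frequency information.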
Therefore, the strategy of the present invention for mining the best collocations from the collocation sets is:
A. High-frequency collocation items are filtered using the T-test algorithm and arranged in descending order of T value.
B. Mid-frequency collocation items are filtered using the PMI algorithm and the T-test algorithm, and the results are merged.
C. Low-frequency collocation items are filtered using the PMI algorithm and arranged in descending order of PMI value.
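Strategy A to C can be sketched as a single ranking function. How the two results are merged for the mid-frequency band is not specified; adding the rank positions obtained under each measure, as done below, is only one possible choice and is an assumption, as are the function name `rank_candidates` and the sample scores.

```python
def rank_candidates(scores, band):
    """Order candidate pairs for one frequency band.

    scores: dict mapping a candidate pair to (t_value, pmi_value)
    band: "high", "mid" or "low"
    """
    by_t = sorted(scores, key=lambda pair: scores[pair][0], reverse=True)
    by_pmi = sorted(scores, key=lambda pair: scores[pair][1], reverse=True)
    if band == "high":
        return by_t        # A: descending T value
    if band == "low":
        return by_pmi      # C: descending PMI value
    if band != "mid":
        raise ValueError(f"unknown band: {band!r}")
    # B: merge both orderings by summing rank positions (assumed merge rule)
    t_rank = {pair: i for i, pair in enumerate(by_t)}
    pmi_rank = {pair: i for i, pair in enumerate(by_pmi)}
    return sorted(scores, key=lambda pair: t_rank[pair] + pmi_rank[pair])


# Hypothetical scores: pair -> (T value, PMI value).
scores = {
    ("a", "b"): (3.0, 0.5),
    ("a", "c"): (1.0, 2.0),
    ("a", "d"): (3.5, 1.9),
    ("a", "e"): (0.5, 0.1),
}
high_order = rank_candidates(scores, "high")
mid_order = rank_candidates(scores, "mid")
low_order = rank_candidates(scores, "low")
```

Summing rank positions avoids having to put T values and PMI values on a common scale, at the cost of ignoring how far apart the scores are.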
A second aspect of the present invention, as shown in Figure 6, provides a device 100 for generating best article collocations, comprising:
an acquisition module 110, configured to obtain several design schemes, each design scheme including several articles and the addition time corresponding to each article;
a first sorting module 120, configured to sort, based on the addition time corresponding to each article, the several articles in each design scheme in order of addition time, to form several article sequence sets;
a data analysis module 130, configured to integrate the article sequence sets corresponding to the design schemes to form an article stream set, and to perform data analysis on the article stream set using a preset natural language processing technique, to obtain, for each article, a category collocation candidate set of the categories that collocate with it;
a second sorting module 140, configured to sort the category collocation candidate set of each article according to correlation.
In the device 100 for generating best article collocations of this embodiment, several design schemes are first obtained, which may come from one or more users. Next, the several articles of each design scheme are arranged in order of addition time. Data analysis is then performed using a natural language processing technique to obtain category collocation candidate sets. Finally, the category collocation candidate sets are sorted by correlation, so that the best collocation categories for an article can be obtained. The device of this embodiment can therefore effectively improve the working efficiency of designers and the recommendation hit rate; in addition, by continuously obtaining new design schemes and learning from them, the recommendation hit rate can be improved further.
Optionally, the data analysis module 130 includes a part-of-speech tagging submodule 131, a spectrum analysis submodule 132 and a processing submodule 133;
the part-of-speech tagging submodule 131 is configured to perform part-of-speech tagging on the article stream set, mapping each article to the category to which it belongs, to obtain a category stream set;
the spectrum analysis submodule 132 is configured to perform spectrum analysis on the category stream set, to obtain a high-frequency category subset, a mid-frequency category subset and a low-frequency category subset;
the processing submodule 133 is configured to generate, based on the Tri-Gram model, category collocation candidate sets for the high-frequency category subset, the mid-frequency category subset and the low-frequency category subset respectively.
For the spectrum analysis and the remaining content, reference may be made to the relevant description above, which is not repeated here.
Optionally, the second sorting module 140 includes a correlation analysis submodule 141 and a sorting submodule 142;
the correlation analysis submodule 141 is configured to:
analyze, using the T-test algorithm, the correlation between the high-frequency category subset and its corresponding category collocation candidate set, to obtain a high-frequency category collocation correlation result;
analyze, using the PMI algorithm and the T-test algorithm, the correlation between the mid-frequency category subset and its corresponding category collocation candidate set, to obtain a mid-frequency category collocation correlation result;
analyze, using the PMI algorithm, the correlation between the low-frequency category subset and its corresponding category collocation candidate set, to obtain a low-frequency category collocation correlation result;
the sorting submodule 142 is configured to sort according to the high-frequency category collocation correlation result, the mid-frequency category collocation correlation result and the low-frequency category collocation correlation result.
For details, reference may be made to the relevant description above, which is not repeated here.
Optionally, the second sorting module 140 sorts in descending order or in ascending order.
A third aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for generating best article collocations described above.
With the computer-readable storage medium of this embodiment, the stored computer program, when executed by a processor, can implement the method S100 for generating best article collocations described above. Several design schemes are first obtained, which may come from one or more users. Next, the several articles of each design scheme are arranged in order of addition time. Data analysis is then performed using a natural language processing technique to obtain category collocation candidate sets. Finally, the category collocation candidate sets are sorted by correlation, so that the best collocation categories for an article can be obtained. The computer-readable storage medium of this embodiment can therefore effectively improve the working efficiency of designers and the recommendation hit rate; in addition, by continuously obtaining new design schemes and learning from them, the recommendation hit rate can be improved further.
It should be understood that the above embodiments are merely exemplary implementations adopted to illustrate the principle of the present invention, and the present invention is not limited to them. Those of ordinary skill in the art can make various changes and modifications without departing from the spirit and essence of the present invention, and such changes and modifications are also regarded as falling within the protection scope of the present invention.