Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the prior art, and provides a method for generating an optimal collocation of articles, an apparatus for generating an optimal collocation of articles, and a computer-readable storage medium.
In order to achieve the above object, in a first aspect of the present invention, there is provided a method for generating an optimal collocation of articles, including:
step S110, obtaining a plurality of design schemes, wherein each design scheme comprises a plurality of articles and adding time corresponding to each article;
step S120, respectively sequencing a plurality of articles in each design scheme according to the adding time sequence based on the adding time corresponding to each article to form a plurality of article sequence sets;
step S130, integrating the article sequence set corresponding to each design scheme to form an article flow set, and performing data analysis on the article flow set by adopting a preset natural language processing technology to obtain a category collocation candidate set matched with each article;
and step S140, sorting the category collocation candidate set of each article according to the relevance.
Optionally, step S130 specifically includes:
performing part-of-speech tagging on the item flow set, and mapping each item to the category to which the item belongs to obtain a category flow set;
performing spectrum analysis on the category stream set to obtain a high-frequency category subset, a medium-frequency category subset and a low-frequency category subset;
and respectively generating a category collocation candidate set for the high-frequency category subset, the medium-frequency category subset and the low-frequency category subset based on a Tri-Gram model.
Optionally, step S140 specifically includes:
analyzing the correlation of the high-frequency category subset and the category collocation candidate set corresponding to the high-frequency category subset by adopting a T-test algorithm to obtain a high-frequency category collocation correlation result;
analyzing the correlation of the medium-frequency category subset and the category collocation candidate set corresponding to the medium-frequency category subset by adopting a PMI algorithm and a T-test algorithm to obtain a medium-frequency category collocation correlation result;
analyzing the correlation between the low-frequency category subset and the category collocation candidate set corresponding to the low-frequency category subset by adopting a PMI algorithm to obtain a low-frequency category collocation correlation result;
and sequencing according to the high-frequency category collocation correlation result, the medium-frequency category collocation correlation result and the low-frequency category collocation correlation result.
Optionally, in step S140, a descending order or an ascending order is adopted.
In a second aspect of the present invention, there is provided an apparatus for generating an optimal collocation of articles, comprising:
an acquisition module, configured to acquire a plurality of design schemes, wherein each design scheme comprises a plurality of articles and an adding time corresponding to each article;
the first sequencing module is used for sequencing a plurality of articles in each design scheme according to the adding time sequence based on the adding time corresponding to each article to form a plurality of article sequence sets;
the data analysis module is used for integrating the article sequence set corresponding to each design scheme to form an article flow set, and performing data analysis on the article flow set by adopting a preset natural language processing technology to obtain a category collocation candidate set collocated with each article;
and the second sorting module is used for sorting the category collocation candidate set of each article according to the relevance.
Optionally, the data analysis module includes a part-of-speech tagging submodule, a spectrum analysis submodule, and a processing submodule;
the part-of-speech tagging submodule is used for performing part-of-speech tagging on the item flow set and mapping each item to the category to which the item belongs so as to obtain a category flow set;
the spectrum analysis submodule is used for performing spectrum analysis on the category stream set to obtain a high-frequency category subset, a medium-frequency category subset and a low-frequency category subset;
and the processing submodule is used for respectively generating a category collocation candidate set for the high-frequency category subset, the medium-frequency category subset and the low-frequency category subset based on the Tri-Gram model.
Optionally, the second sorting module includes a correlation analysis sub-module and a sorting sub-module;
the correlation analysis submodule is used for:
analyzing the relevance of the high-frequency category subset and the category collocation candidate set corresponding to the high-frequency category subset by adopting a T test algorithm to obtain a high-frequency category collocation relevance result;
analyzing the correlation of the medium-frequency category subset and the category collocation candidate set corresponding to the medium-frequency category subset by adopting a PMI algorithm and a T-test algorithm to obtain a medium-frequency category collocation correlation result;
analyzing the correlation between the low-frequency category subset and the category collocation candidate set corresponding to the low-frequency category subset by adopting a PMI algorithm to obtain a low-frequency category collocation correlation result;
and the sequencing submodule is used for sequencing according to the high-frequency category collocation correlation result, the medium-frequency category collocation correlation result and the low-frequency category collocation correlation result.
Optionally, the second sorting module adopts descending order or ascending order.
In a third aspect of the present invention, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the method for generating an optimal collocation of articles as described above.
The invention discloses a method, a device and a computer-readable storage medium for generating optimal collocation of articles. Firstly, obtaining a plurality of design schemes which can come from one or a plurality of users, then arranging a plurality of articles of each design scheme according to the adding time sequence, then adopting the natural language processing technology to carry out data analysis so as to obtain a category collocation candidate set, and finally arranging the category collocation candidate set according to the correlation so as to obtain the optimal collocation categories of the articles. Therefore, the work efficiency of designers can be effectively improved, the recommendation hit rate can be effectively improved, and in addition, new design schemes can be continuously obtained for learning, so that the recommendation hit rate is further improved.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present invention, are given by way of illustration and explanation only, not limitation.
As shown in fig. 1, a first aspect of the present invention relates to a method for generating an optimal collocation of items, comprising:
step S110, obtaining a plurality of design schemes, wherein each design scheme comprises a plurality of articles and adding time corresponding to each article.
Specifically, in this step, several designs may be obtained from one user, or may be obtained from two or more users. Thus, in this step, the following information can be collected:
{ design scheme, user name, article, adding time of the article }.
It should be noted that the specific number of design schemes is not limited and can be determined according to actual needs. Hereinafter, the number of design schemes is denoted n, where n is a positive integer (n ≥ 1).
Step S120, respectively sequencing a plurality of articles in each design scheme according to the adding time sequence based on the adding time corresponding to each article to form a plurality of article sequence sets.
Specifically, in this step, the information collected in step S110 is sent to a background for data processing, and a plurality of articles in each design are sorted according to the order of adding time, so as to obtain the following article sequence set:
Design scheme 1: [article 1, article 2, article 3, ...]
...
Design scheme n: [article n1, article n2, article n3, ...]
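The grouping and ordering described in steps S110 and S120 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the record fields (`design`, `user`, `article`, `added_at`) and the function name `build_article_sequences` are hypothetical names chosen for the example, and the adding time is assumed to be any sortable value.

```python
from collections import defaultdict

# Hypothetical records in the shape {design scheme, user name, article, adding time}.
records = [
    {"design": "scheme-1", "user": "alice", "article": "toilet", "added_at": 3},
    {"design": "scheme-1", "user": "alice", "article": "floor tile", "added_at": 1},
    {"design": "scheme-2", "user": "bob", "article": "wardrobe", "added_at": 2},
    {"design": "scheme-1", "user": "alice", "article": "shower", "added_at": 2},
    {"design": "scheme-2", "user": "bob", "article": "sliding door", "added_at": 1},
]


def build_article_sequences(records):
    """Group records by design scheme and order each group by adding time."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["design"]].append(rec)
    return {
        design: [r["article"] for r in sorted(recs, key=lambda r: r["added_at"])]
        for design, recs in grouped.items()
    }


sequences = build_article_sequences(records)
```

Each value in `sequences` is one article sequence set, ordered from the earliest added article to the latest.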
Step S130, integrating the article sequence set corresponding to each design scheme to form an article flow set, and performing data analysis on the article flow set by adopting a preset natural language processing technology to obtain a category collocation candidate set collocated with each article.
Specifically, in this step, the article flow set formed may be as follows: [article 1, article 2, article 3, ..., article n1, article n2, article n3, ...]. A Tri-Gram model from natural language processing may then be applied. As an n-gram model with n = 3, it assumes that the occurrence of a word depends only on the n-1 words preceding it (here, the two preceding words) and on no other words. This yields a category collocation candidate set, e.g., [(toilet, shower), (toilet, hardware and others), (toilet, bathroom cabinet), ...], and so on.
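One possible reading of the Tri-Gram step is sketched below. The text does not spell out how pairs are derived from the three-entry windows, so the rule used here (pair the first entry of each window with the two entries that follow it) is an assumption, and the function names are hypothetical.

```python
def trigram_windows(stream):
    """Yield every run of three consecutive entries in the stream."""
    for i in range(len(stream) - 2):
        yield tuple(stream[i:i + 3])


def collocation_candidates(stream):
    """Pair the first entry of each window with the two entries that follow it.

    Assumption: a candidate is (head, follower); the pairing rule is
    illustrative and not taken from the source.
    """
    candidates = set()
    for head, second, third in trigram_windows(stream):
        candidates.add((head, second))
        candidates.add((head, third))
    return candidates


stream = ["toilet", "shower", "bathroom cabinet", "hardware"]
pairs = collocation_candidates(stream)
```

With this rule, each entry is only ever paired with entries at most two positions after it, which mirrors the limited context of a Tri-Gram model.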
Step S140, sorting the category collocation candidate set of each article according to the relevance.
Specifically, in this step, the candidate set may be sorted in descending or ascending order of relevance. For example, sorted in descending order:
{category 1, [category 1, category ...]}
...
{category N, [category 1, category ...]}
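The output structure above, one ranked list of collocated categories per category, could be built along the following lines. This is a sketch under stated assumptions: the function name `rank_collocations` and the numeric scores are hypothetical, and the relevance score is taken as already computed.

```python
from collections import defaultdict


def rank_collocations(scored_pairs, descending=True):
    """Group (category, partner, score) triples and sort each group by score."""
    grouped = defaultdict(list)
    for category, partner, score in scored_pairs:
        grouped[category].append((partner, score))
    return {
        category: [
            partner
            for partner, _ in sorted(partners, key=lambda ps: ps[1], reverse=descending)
        ]
        for category, partners in grouped.items()
    }


# Hypothetical relevance scores, for illustration only.
scored = [
    ("toilet", "shower", 2.4),
    ("toilet", "bathroom cabinet", 3.1),
    ("toilet", "hardware", 0.9),
    ("wardrobe", "sliding door", 1.7),
]
ranking = rank_collocations(scored)
```

Passing `descending=False` gives the ascending variant mentioned in the text.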
In the method S100 for generating the optimal collocation of the articles in this embodiment, a plurality of design solutions are first obtained, where the design solutions may be from one or more users, then the plurality of articles of each design solution are arranged according to the addition time sequence, then data analysis is performed by using a natural language processing technique, so as to obtain a category collocation candidate set, and finally the category collocation candidate set is arranged according to the relevance, so as to obtain the optimal collocation categories of the articles. Therefore, the method for generating the optimal matching of the articles in the embodiment can effectively improve the working efficiency of designers, can also effectively improve the recommendation hit rate, and can further improve the recommendation hit rate by continuously acquiring new design schemes for learning.
Optionally, step S130 specifically includes:
Performing part-of-speech tagging on the item flow set, and mapping each item to the category to which the item belongs to obtain a category flow set.
Specifically, the category flow set formed is [category 1, category 2, category 3, ..., category n1, category n2, category n3, ...]. For example, it may be as follows:
[ tiles, custom products ];
[ vertical hinged door, custom product ];
[ custom product, custom product ];
[ custom products, floor tiles, custom products, toilets, veneers, metals, wallpaper ];
[ custom products, ornaments, greenery plants, custom products ];
[ custom products, sliding doors, wardrobes, shoe cabinets, television cabinets, televisions, floor tiles, sliding doors, bay windows ];
[ custom product, double bed, baking finish ];
[ custom products, kitchen utensils, ornaments, kitchen utensils ];
...
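The mapping from articles to categories can be pictured as a simple lookup. The sketch below is illustrative only: the article identifiers, the `ARTICLE_CATEGORY` table, and the fallback label for unknown articles are all hypothetical and are not taken from the embodiment.

```python
# Hypothetical article-to-category lookup; a real system would presumably
# read this from its product catalogue.
ARTICLE_CATEGORY = {
    "sku-1001": "floor tile",
    "sku-1002": "toilet",
    "sku-2001": "wardrobe",
    "sku-2002": "sliding door",
}


def to_category_stream(article_stream, lookup, default="custom product"):
    """Replace each article by its category, falling back to a default label.

    The fallback label is an assumption made for this example.
    """
    return [lookup.get(article, default) for article in article_stream]


article_stream = ["sku-1001", "sku-9999", "sku-1002", "sku-2002"]
category_stream = to_category_stream(article_stream, ARTICLE_CATEGORY)
```

The category stream keeps the order of the article stream, so the adding-time ordering from step S120 is preserved.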
Performing spectrum analysis on the category stream set to obtain a high-frequency category subset, a medium-frequency category subset and a low-frequency category subset.
The procedure for the spectrum analysis is as follows:
(1) The categories appearing in the category stream set are counted (198 categories are obtained in this embodiment) and sorted in descending order of raw frequency, as shown in fig. 2.
(2) The cumulative frequency is calculated, as shown in fig. 3.
(3) All categories are divided into three types, namely high, medium and low frequency, according to the slope of the cumulative frequency curve (the smaller the slope, the gentler the curve and the lower the frequency of the corresponding categories), as shown in fig. 4. That is, a high-frequency category subset, a medium-frequency category subset and a low-frequency category subset are generated.
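The three steps above can be sketched as follows. The slope of the cumulative frequency curve at a given category equals that category's relative frequency, so the split reduces to two thresholds on relative frequency. The threshold values (`high_slope`, `low_slope`) and the function name are hypothetical; the text does not state where the boundaries are drawn.

```python
from collections import Counter
from itertools import accumulate


def split_by_frequency(category_stream, high_slope=0.3, low_slope=0.1):
    """Split categories into high/medium/low bands by cumulative-curve slope.

    Both thresholds are hypothetical values chosen for this example.
    """
    counts = Counter(category_stream)
    total = sum(counts.values())
    ranked = counts.most_common()  # step (1): descending raw frequency
    cumulative = list(accumulate(c / total for _, c in ranked))  # step (2)
    bands = {"high": [], "medium": [], "low": []}
    for name, count in ranked:  # step (3): slope at this category
        slope = count / total
        if slope >= high_slope:
            bands["high"].append(name)
        elif slope >= low_slope:
            bands["medium"].append(name)
        else:
            bands["low"].append(name)
    return bands, cumulative


stream = (
    ["custom product"] * 10
    + ["floor tile"] * 5
    + ["toilet"] * 3
    + ["wallpaper"]
    + ["bay window"]
)
bands, cumulative = split_by_frequency(stream)
```

In practice the boundaries would more likely be chosen by inspecting the curve, as in fig. 4, than fixed in advance.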
Respectively generating a category collocation candidate set for the high-frequency category subset, the medium-frequency category subset and the low-frequency category subset based on a Tri-Gram model.
Specifically, as shown in fig. 5, the category collocation candidate sets are generated based on the assumption that the categories collocated with the high-frequency category subset are all high-frequency categories, the categories collocated with the medium-frequency category subset are all medium-frequency categories, and the categories collocated with the low-frequency category subset are all low-frequency categories.
Optionally, step S140 specifically includes:
analyzing the relevance of the high-frequency category subset and the category collocation candidate set corresponding to the high-frequency category subset by adopting a T test algorithm to obtain a high-frequency category collocation relevance result;
analyzing the correlation of the medium-frequency category subset and the category collocation candidate set corresponding to the medium-frequency category subset by adopting a PMI algorithm and a T-test algorithm to obtain a medium-frequency category collocation correlation result;
analyzing the correlation between the low-frequency category subset and the category collocation candidate set corresponding to the low-frequency category subset by adopting a PMI algorithm to obtain a low-frequency category collocation correlation result;
and sequencing according to the high-frequency category collocation correlation result, the medium-frequency category collocation correlation result and the low-frequency category collocation correlation result.
Specifically, the PMI (pointwise mutual information) algorithm uses the PMI index to measure the correlation between two things. The formula is as follows:
$$\mathrm{PMI}(x; y) = \log \frac{p(x, y)}{p(x)\,p(y)} = \log \frac{p(x \mid y)}{p(x)}$$
From probability theory, if x and y are uncorrelated, then p(x, y) = p(x)p(y). The greater the correlation between the two, the larger p(x, y) is compared with p(x)p(y). The second form of the formula makes this easier to see: the conditional probability p(x | y) of x occurring given y, divided by the probability p(x) of x occurring on its own, naturally indicates the degree of correlation between x and y. For the scenario in this embodiment, the range of PMI is [0, +∞), it increases monotonically, and the algorithm is very sensitive to low-frequency information.
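A PMI computation over a category stream might look like the sketch below. Two choices here are assumptions rather than details from the text: co-occurrence is counted over adjacent entries only, and the logarithm is taken to base 2. The function name `pmi_scores` is hypothetical.

```python
import math
from collections import Counter


def pmi_scores(category_stream):
    """Return PMI for each adjacent (x, y) pair observed in the stream.

    Assumptions: co-occurrence means adjacency, and the log base is 2.
    """
    n_items = len(category_stream)
    pairs = list(zip(category_stream, category_stream[1:]))
    n_pairs = len(pairs)
    unigram = Counter(category_stream)
    bigram = Counter(pairs)
    scores = {}
    for (x, y), count in bigram.items():
        p_xy = count / n_pairs
        p_x = unigram[x] / n_items
        p_y = unigram[y] / n_items
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


stream = ["toilet", "shower", "tile", "toilet", "shower", "wallpaper"]
scores = pmi_scores(stream)
```

Because the ratio is taken against p(x)p(y), a pair of rare categories that co-occur even once can score highly, which is the low-frequency sensitivity noted above.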
A T-test algorithm is used to calculate the T value between P(x, y) and P(x)P(y); the T value reflects the relative difference in collocation strength.
The larger the T value, the more likely it is that the difference between the observed co-occurrence probability P(x, y) and the chance co-occurrence probability P(x)P(y) is genuine rather than an accidental coincidence. From a statistical perspective, 1.65 standard deviations indicate 95% confidence that a collocation is meaningful, corresponding to a T value of 2.132. This algorithm is very sensitive to high-frequency information.
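The exact T formula is not reproduced in this text. A form commonly used for collocation testing, given here as an assumption rather than as the formula of the embodiment, is T = (P(x, y) - P(x)P(y)) / sqrt(P(x, y) / N), where N is the number of observations and the sample variance is approximated by P(x, y). A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
import math


def t_score(count_xy, count_x, count_y, n):
    """T statistic comparing observed and chance co-occurrence probabilities.

    Assumption: the sample variance is approximated by P(x, y), as is
    common in collocation testing; this is not taken from the source.
    """
    if count_xy == 0:
        return 0.0
    p_xy = count_xy / n
    p_x = count_x / n
    p_y = count_y / n
    return (p_xy - p_x * p_y) / math.sqrt(p_xy / n)


# Hypothetical counts: the pair was seen 30 times in a stream of 1000 entries.
t = t_score(count_xy=30, count_x=100, count_y=120, n=1000)
is_meaningful = t > 2.132  # threshold quoted in the text
```

The denominator shrinks as N grows, so frequent pairs with many observations reach large T values more easily, which is consistent with the sensitivity to high-frequency information described above.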
Therefore, the strategy for mining the optimal collocation from the collocation set is as follows:
a. High-frequency collocation items are filtered using the T-test algorithm and arranged in descending order of T value.
b. Medium-frequency collocation items are filtered using both the PMI algorithm and the T-test algorithm, and the results are merged.
c. Low-frequency collocation items are filtered using the PMI algorithm and arranged in descending order of PMI value.
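Strategy a to c can be sketched as a single selection function. Several details are assumptions: the PMI cut-off `pmi_min` is hypothetical, "merging" for the medium band is read as a union that keeps the T-test order first, and the function name and score dictionary layout are invented for the example. Only the T threshold of 2.132 comes from the text.

```python
def select_best_collocations(pairs, band, t_min=2.132, pmi_min=1.0):
    """Filter and order collocation pairs according to their frequency band.

    `pairs` maps (category, partner) to a dict with "t" and "pmi" scores.
    `pmi_min` is a hypothetical cut-off; `t_min` is the value quoted in the text.
    """
    by_t = sorted(
        (p for p, s in pairs.items() if s["t"] >= t_min),
        key=lambda p: pairs[p]["t"],
        reverse=True,
    )
    by_pmi = sorted(
        (p for p, s in pairs.items() if s["pmi"] >= pmi_min),
        key=lambda p: pairs[p]["pmi"],
        reverse=True,
    )
    if band == "high":
        return by_t
    if band == "low":
        return by_pmi
    if band == "medium":
        # Assumed merge rule: T-test survivors first, then PMI-only survivors.
        return by_t + [p for p in by_pmi if p not in by_t]
    raise ValueError(f"unknown band: {band}")


# Hypothetical scores, for illustration only.
pairs = {
    ("toilet", "shower"): {"t": 3.0, "pmi": 1.5},
    ("toilet", "hardware"): {"t": 1.0, "pmi": 2.0},
    ("toilet", "wallpaper"): {"t": 2.5, "pmi": 0.2},
}
high = select_best_collocations(pairs, "high")
medium = select_best_collocations(pairs, "medium")
low = select_best_collocations(pairs, "low")
```

Other merge rules for the medium band, such as an intersection or a combined score, would fit the wording equally well.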
In a second aspect of the present invention, as shown in fig. 6, there is provided an apparatus 100 for generating an optimal collocation of articles, comprising:
an obtaining module 110, configured to obtain a plurality of design solutions, where each design solution includes a plurality of articles and an adding time corresponding to each article;
a first sorting module 120, configured to sort, based on the addition time corresponding to each article, the articles in each design solution according to the order of the addition time, so as to form a plurality of article sequence sets;
a data analysis module 130, configured to integrate the item sequence sets corresponding to each design scheme to form an item stream set, and perform data analysis on the item stream set by using a preset natural language processing technology to obtain a category collocation candidate set collocated with each item;
a second sorting module 140, configured to sort the category collocation candidate set of each of the items according to relevance.
The apparatus 100 for generating the optimal collocation of articles in this embodiment first obtains a plurality of design solutions, where the design solutions may be from one or more users, then arranges a plurality of articles of each design solution according to the adding time sequence, then performs data analysis by using a natural language processing technique, thereby obtaining a category collocation candidate set, and finally arranges the category collocation candidate set according to the relevance, thereby obtaining the optimal collocation categories of the articles. Therefore, the device for generating the best collocation of the articles in the embodiment can effectively improve the working efficiency of designers, can also effectively improve the recommendation hit rate, and can also learn by continuously acquiring new design schemes to further improve the recommendation hit rate.
Optionally, the data analysis module 130 includes a part-of-speech tagging sub-module 131, a spectrum analysis sub-module 132, and a processing sub-module 133;
the part-of-speech tagging submodule 131 is configured to perform part-of-speech tagging on the item stream set, and map each item to a category to which the item belongs, so as to obtain a category stream set;
the spectrum analysis sub-module 132 is configured to perform spectrum analysis on the category stream set to obtain a high-frequency category subset, a medium-frequency category subset, and a low-frequency category subset;
the processing sub-module 133 is configured to generate a category collocation candidate set for the high-frequency category subset, the medium-frequency category subset, and the low-frequency category subset, respectively, based on the Tri-Gram model.
For the spectrum analysis and the remaining details, reference may be made to the related descriptions above, which are not repeated herein.
Optionally, the second sorting module 140 includes a correlation analysis sub-module 141 and a sorting sub-module 142;
the correlation analysis submodule 141 is configured to:
analyzing the relevance of the high-frequency category subset and the category collocation candidate set corresponding to the high-frequency category subset by adopting a T test algorithm to obtain a high-frequency category collocation relevance result;
analyzing the correlation of the medium-frequency category subset and the category collocation candidate set corresponding to the medium-frequency category subset by adopting a PMI algorithm and a T-test algorithm to obtain a medium-frequency category collocation correlation result;
analyzing the correlation between the low-frequency category subset and the category collocation candidate set corresponding to the low-frequency category subset by adopting a PMI algorithm to obtain a low-frequency category collocation correlation result;
the sorting submodule 142 is configured to sort according to the high-frequency category collocation correlation result, the medium-frequency category collocation correlation result, and the low-frequency category collocation correlation result.
Reference may be made to the above related descriptions, which are not repeated herein.
Optionally, the second sorting module 140 adopts descending order or ascending order.
In a third aspect of the present invention, a computer-readable storage medium is provided, which stores a computer program which, when executed by a processor, implements the steps of the method for generating an optimal collocation of articles as described above.
In the computer-readable storage medium of this embodiment, when being executed by a processor, a stored computer program may implement the method S100 for generating optimal collocation of articles as described above, where a plurality of design solutions are obtained first, the design solutions may come from one or more users, then the plurality of articles of each design solution are arranged according to an addition time sequence, then data analysis is performed by using a natural language processing technique, so as to obtain a category collocation candidate set, and finally the category collocation candidate set is arranged according to a correlation, so as to obtain an optimal collocation category of the articles. Therefore, the computer-readable storage medium in this embodiment can effectively improve the work efficiency of designers, and can also effectively improve the recommendation hit rate, and in addition, can further improve the recommendation hit rate by continuously acquiring new design schemes for learning.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.