CA3166094A1

CA3166094A1 - Commodity short title generation method and apparatus

Info

Publication number: CA3166094A1
Application number: CA3166094A
Authority: CA
Inventors: Bin Zhu; Yi SHEN; Kang QI; Heqiang NI; Shu Chen
Original assignee: 10353744 Canada Ltd
Current assignee: 10353744 Canada Ltd
Priority date: 2019-12-27
Filing date: 2020-08-28
Publication date: 2021-07-01
Also published as: CA3217721A1; WO2021128914A1; CA3217669A1; CN111191022A; CN111191022B

Abstract

A commodity short title generation method and apparatus, relating to the technical field of text abstracts, and capable of improving the generation efficiency and precision of a commodity short title. The method comprises: crawling commodity title data and/or collecting search word data, and constructing a corpus data set; based on a commodity classification table, classifying a plurality of corpora in the corpus data set according to commodity categories, and then extracting keywords to construct a word bank; labelling each keyword in the word bank as a modifying word or a category word according to part-of-speech; obtaining original commodity title data, performing word segmentation to obtain a plurality of title words, separately matching each title word with the keyword in the word bank, and outputting the successfully matched keyword; and selecting at least two effective keywords from the plurality of keywords, and according to the part-of-speech, combining the effective keywords to form the commodity short title.

Description

COMMODITY SHORT TITLE GENERATION METHOD AND APPARATUS
BACKGROUND OF THE INVENTION
Technical Field [0001] The present invention relates to the technical field of text abstracting, and more particularly to a method and an apparatus for generating merchandise short-titles.
Description of Related Art

[0002] Merchandise short-titles are generally formed by compressing the standard-length titles of merchandise items. As implied in the name, short-titles are simple, concise, and short. The purpose of short-titles is to describe key information of merchandise items with the fewest possible words so that users can get such key information at a glance. An example of a short-title is "Korean-cutting all-over print dress." Generating short-titles can be regarded as a special text abstracting task in the sense of natural language processing.

[0003] Traditional text abstracting techniques, such as TextRank and Lead-3, extract sentences from articles and are not really suitable for generating merchandise titles. With the rapid development of deep learning, various deep-learning models, such as seq2seq and pointer-generator networks, can be used to generate compressed short-titles. However, without a sufficient corpus of short-titles for training, these models are not applicable in practice, particularly for generating merchandise titles.
SUMMARY OF THE INVENTION

[0004] The objective of the present invention is to provide a method and an apparatus for generating merchandise short-titles, which can generate merchandise short-titles with improved efficiency and precision.

[0005] For achieving the foregoing objective, in a first aspect, the present invention provides a method for generating a merchandise short-title, which comprises:

Date Regue/Date Received 2022-06-27

[0006] crawling merchandise title data and/or collecting search term data, so as to construct a corpus data set;

[0007] based on a merchandise category table, categorizing corpuses in the corpus data set by merchandise categories, and then extracting key words to construct a word library;

[0008] tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word;

[0009] performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches; and

[0010] sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech.

[0011] Preferably, the step of based on a merchandise category table, categorizing corpuses in the corpus data set by merchandise categories, and then extracting key words to construct a word library comprises:

[0012] based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;

[0013] performing word segmentation on the corpuses, respectively, so as to obtain the plural key words, and de-duplicating and then filtering the key words in every merchandise category so as to obtain key word sets each corresponding to a said merchandise category; and

[0014] uniting the plural key word sets to form the word library.

[0015] More preferably, the step of tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word comprises:

[0016] extracting the key words that are the modifier words or the category words from the word library by means of manual tagging and tagging the corresponding parts of speech; and/or

[0017] extracting the key words that are the modifier words or the category words from the word library and tagging the corresponding parts of speech using a machine tagging model.

[0018] Further, after the step of extracting the key words that are the modifier words or the category words from the word library by means of manual tagging and tagging the corresponding parts of speech, the method further comprises:

[0019] crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library;

[0020] if a number of the key words that have matches is smaller than a threshold, adding the key words in the new merchandise title data into the corresponding key word sets, and tagging the newly added key words for their parts of speech; or

[0021] if the number of the key words that have matches is greater than the threshold, crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library again.

[0022] Preferably, after the step of extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging the corresponding parts of speech using a machine tagging model, the method further comprises:

[0023] based on a semantic recognition technology in the machine model, extracting the key words that are the modifier words or the category words from the newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

[0024] Preferably, the step of performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches comprises:

[0025] recognizing the merchandise categories in the original merchandise title data, and matching them with the corresponding key word sets; and

[0026] segmenting the original merchandise title data into the plural title words, matching each of the title words with the key words in the corresponding key word set, and sieving out the key words that have matches.

[0027] Preferably, the step of sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:

[0028] recording location information of each of the key words in the original merchandise title data;

[0029] if in the key words tagged as the modifier words, there are plural said key words whose lexical scopes have intersection, only one said key word in the intersection is kept;

[0030] if in the key words tagged as the modifier words, there are plural said key words in which the lexical scope of one said key word contains the lexical scope of another said key word, only the key word that has the largest lexical scope is kept;

[0031] if the key words tagged as the category words have word sense containing word sense of any said key word tagged as the modifier word, the key word corresponding to the modifier word is removed; and

[0032] defining the remaining key words as the effective key words, and stitching them into the merchandise short-title according to locational sequence thereof.

[0033] Optionally, different original merchandise title data are matched with the word library, respectively, and processed in parallel, so as to output plural corresponding merchandise short-titles.

[0034] Exemplarily, the search term data represent a collection of search terms to be input by a user for searching for a merchandise item.

[0035] As compared to the prior art, the method for generating merchandise short-titles of the present invention provides the following beneficial effects:

[0036] In the method for generating merchandise short-titles according to the present invention, a corpus data set is first constructed. Then, based on the merchandise category table, corpuses in the corpus data set are categorized. From the categorized corpuses, key words are extracted to form a word library. Every key word in the word library is tagged as a modifier word or a category word according to its part of speech. The word library is thus established. Afterward, original merchandise title data to be compressed are acquired. The original merchandise title data are segmented to obtain plural title words. These title words are matched with the key words in the word library. From the key words that have matches, at least two effective key words are sieved out and stitched into a merchandise short-title according to the order of their parts of speech.

[0037] It is thus clear that the present invention categorizes corpuses before tagging them, thereby effectively reducing the difficulty of the tagging process and tagging key words more efficiently. By segmenting the original merchandise title data and directly matching the data with the key words in the word library, the sieved and stitched merchandise short-title is more precise.

[0038] In another aspect, the present invention provides an apparatus for generating merchandise short-titles, which is applied with the method for generating merchandise short-titles as described above. The apparatus comprises:

[0039] a data collecting unit, for crawling merchandise title data and/or collecting search term data, so as to construct a corpus data set;

[0040] a word library unit, for categorizing corpuses in the corpus data set by merchandise categories based on a merchandise category table, and then extracting key words to construct a word library;

[0041] a word tagging unit, for tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word;

[0042] a word matching unit, for performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches; and

[0043] a processing unit, for sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech.

[0044] As compared to the prior art, the disclosed apparatus for generating merchandise short-titles provides beneficial effects that are similar to those provided by the method for generating merchandise short-titles as enumerated above, and thus no repetitions are made herein.

[0045] In a third aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored. When run by a processor, the computer program executes the steps of the method for generating merchandise short-titles as described above.

[0046] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the method for generating merchandise short-titles as enumerated above, and thus no repetitions are made herein.
BRIEF DESCRIPTION OF THE DRAWINGS

[0047] The accompanying drawing is provided herein for better understanding of the present invention and forms a part of this disclosure. The illustrative embodiments and their descriptions are for explaining the present invention and by no means form any improper limitation to the present invention, wherein:

[0048] FIG. 1 is a flowchart of a method for generating merchandise short-titles according to a first embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION

[0049] To make the foregoing objectives, features, and advantages of the present invention clearer and more understandable, the following description will be directed to some embodiments as depicted in the accompanying drawings to detail the technical schemes disclosed in these embodiments. It is, however, to be understood that the embodiments referred herein are only a part of all possible embodiments and thus not exhaustive. Based on the embodiments of the present invention, all the other embodiments can be conceived without creative labor by people of ordinary skill in the art, and all these and other embodiments shall be encompassed in the scope of the present invention.

[0050] Embodiment 1

[0051] Referring to FIG. 1, the present embodiment provides a method for generating a merchandise short-title, comprising:

[0052] crawling merchandise title data and/or collecting search term data, so as to construct a corpus data set; based on a merchandise category table, categorizing corpuses in the corpus data set by merchandise categories, and then extracting key words to construct a word library; tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word; performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches; and sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech.

[0053] In the method for generating merchandise short-titles according to the present embodiment, a corpus data set is first constructed. Then, based on the merchandise category table, corpuses in the corpus data set are categorized. From the categorized corpuses, key words are extracted to form a word library. Every key word in the word library is tagged as a modifier word or a category word according to its part of speech. The word library is thus established. Afterward, original merchandise title data to be compressed are acquired. The original merchandise title data are segmented to obtain plural title words. These title words are matched with the key words in the word library. From the key words that have matches, at least two effective key words are sieved out and stitched into a merchandise short-title according to the order of their parts of speech.

[0054] It is thus clear that the present invention categorizes corpuses before tagging them, thereby effectively reducing the difficulty of the tagging process and tagging key words more efficiently. By segmenting the original merchandise title data and directly matching the data with the key words in the word library, the sieved and stitched merchandise short-title is more precise.

[0055] It is to be noted that the data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data. For crawling the merchandise title data, it is important to crawl merchandise short-titles from major e-commerce platforms. For collecting the search term data, search terms used for searching for various merchandise items, namely query data, are gathered.
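The corpus construction described in this paragraph can be sketched as follows. This is only a minimal illustration, assuming the title strings and search terms have already been fetched by some crawler or log export. The names `CorpusEntry` and `build_corpus`, the whitespace normalization, and the removal of exact duplicates are hypothetical choices and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusEntry:
    text: str
    source: str  # "title" (crawled title) or "query" (search term)


def build_corpus(titles, queries):
    """Merge crawled titles and search terms into one corpus data set.

    Either input may be empty, mirroring the "and/or" in the method.
    Normalization and de-duplication here are assumptions of this sketch.
    """
    seen = set()
    corpus = []
    for source, items in (("title", titles), ("query", queries)):
        for raw in items:
            text = " ".join(raw.split())  # collapse runs of whitespace
            if text and text not in seen:
                seen.add(text)
                corpus.append(CorpusEntry(text, source))
    return corpus
```

Keeping the source of each entry makes it possible to treat titles and search terms differently in later steps, should that be desired.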

[0056] In the embodiment, the step of based on a merchandise category table, categorizing corpuses in the corpus data set by merchandise categories, and then extracting key words to construct a word library comprises:

[0057] based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories; performing word segmentation on the corpuses, respectively, so as to obtain the plural key words, and de-duplicating and then filtering the key words in every merchandise category so as to obtain key word sets each corresponding to a said merchandise category; and uniting the plural key word sets to form the word library.

[0058] Since tagging corpuses directly represents a prodigious workload, for reducing difficulty and improving efficiency of the tagging task, it is desired to categorize corpuses in the corpus data set according to a merchandise category table (e.g., a quaternary merchandise group). For example, the categories may include a clothes corpus group, a pants corpus group, a mobile phone corpus group, etc. Then the categorized corpuses are segmented so that every category group is formed by plural key words. Irrelevant key words are filtered out (denoising), and the key words in every category group are de-duplicated, so as to ensure every key word is unique in its group. Eventually, key word sets are formed, each corresponding to a category group. By uniting all the key word sets, the word library is formed.
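A minimal sketch of this word-library construction is given below. It assumes that each corpus has already been assigned a category from the merchandise category table, and it uses whitespace splitting as a stand-in for a real word segmenter. The function name `build_word_library` and the `noise_words` parameter are hypothetical.

```python
from collections import defaultdict


def build_word_library(categorized_corpus, noise_words=frozenset(), segment=str.split):
    """Return {category: set of unique key words}.

    categorized_corpus: iterable of (category, text) pairs.
    noise_words: irrelevant words to filter out (assumed to be given).
    segment: word segmenter; whitespace splitting is only a placeholder.
    """
    library = defaultdict(set)
    for category, text in categorized_corpus:
        for word in segment(text):
            if word not in noise_words:
                # A set keeps each key word unique within its category.
                library[category].add(word)
    return dict(library)
```

The returned dictionary plays the role of the word library: each value is one key word set, and the dictionary as a whole is their union keyed by category.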

[0059] In the embodiment, the step of tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word comprises:

[0060] extracting the key words that are the modifier words or the category words from the word library by means of manual tagging and tagging the corresponding parts of speech; and/or extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging the corresponding parts of speech using a machine tagging model.

[0061] As implied in the name, manual tagging refers to manually determining whether a key word in the word library is a modifier word or a category word, and manually tagging the key word. In contrast, a machine tagging model recognizes and tags key words automatically. When the number of key words in the word library is huge, such a machine model is effective in improving tagging efficiency. However, as demonstrated in practice, while a machine model provides high efficiency, its tagging results are less precise than those from manual operation. Therefore, it is preferred to combine the two solutions for tagging key words in the word library. For example, a machine model is first used to pre-tag numerous key words, and then manual verification is performed, so as to balance efficiency and precision of key-word tagging.
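The combination of machine pre-tagging and manual verification might be organized as in the sketch below. It assumes that both the machine model and the human reviewer produce a mapping from key word to tag; the names `Tag`, `combine_tags`, and `review_queue` are hypothetical and do not come from the disclosure.

```python
from enum import Enum


class Tag(Enum):
    MODIFIER = "modifier"
    CATEGORY = "category"


def combine_tags(machine_tags, manual_tags):
    """Start from machine pre-tags and let manually verified tags override them."""
    final = dict(machine_tags)
    final.update(manual_tags)
    return final


def review_queue(machine_tags, manual_tags):
    """Key words that were pre-tagged by the machine but not yet verified manually."""
    return sorted(word for word in machine_tags if word not in manual_tags)
```

Giving precedence to the manual tags reflects the observation above that manual results are more precise, while the review queue shows how much verification work remains.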

[0062] After the step of extracting the key words that are the modifier words or the category words from the word library by means of manual tagging and tagging the corresponding parts of speech, the method further comprises:

[0063] crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library; if a number of the key words that have matches is smaller than a threshold, adding the key words in the new merchandise title data into the corresponding key word sets, and tagging the newly added key words for their parts of speech; or if the number of the key words that have matches is greater than the threshold, crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library again.

[0064] The objective of this step is to increase word sources for the word library. By continually acquiring new merchandise title data, the robustness of the key words in the word library can be evaluated. Specifically, word segmentation is performed on the new merchandise title data, and the results are filtered so that only those words whose parts of speech identify them as modifier words or category words are kept. When the number of the remaining words that match key words in the word library is smaller than a threshold, it indicates that the key words in the word library are not robust enough. In this case, the words in the merchandise title data that do not have matches are added to the corresponding key word sets, and the newly added key words are tagged with their parts of speech. Conversely, if the number of the remaining words that match key words in the word library is greater than the threshold, it indicates that the collection of key words in the word library is adequate for the current merchandise title data. A user can then continue to crawl new merchandise title data and repeat the foregoing process to continuously assess the word library.

Exemplarily, the threshold is 3.
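The threshold check can be sketched as below. The sketch assumes the new title has already been segmented and filtered to modifier and category candidates, and that the key word set of its category is available. The function name `words_to_add` is hypothetical, and treating a match count equal to the threshold as sufficient is an assumption, since the text only covers the "smaller" and "greater" cases.

```python
def words_to_add(title_words, keyword_set, threshold=3):
    """Return the words from a new title that should be added to the key word set.

    If fewer than `threshold` title words match the key word set, the set is
    considered not robust enough and the unmatched words are returned for
    addition and tagging. Otherwise nothing needs to be added.
    """
    matched = [w for w in title_words if w in keyword_set]
    if len(matched) < threshold:
        unmatched = (w for w in title_words if w not in keyword_set)
        return list(dict.fromkeys(unmatched))  # keep order, drop duplicates
    return []  # assumption: a count equal to the threshold counts as sufficient
```

For example, with the key word set {"floral", "dress", "korean"}, the title words ["vintage", "floral", "dress", "lace"] yield two matches, so "vintage" and "lace" are returned as candidates for the word library.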

[0065] After the step of extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging the corresponding parts of speech using a machine tagging model, the method further comprises:

[0066] based on a semantic recognition technology in the machine model, extracting the key words that are the modifier words or the category words from the newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

[0067] Optionally, the machine model may be a BiLSTM+CRF deep learning model. When such a deep learning model is used to extract the key words that are modifier words or category words from the newly crawled merchandise title data, tag them, and add them into the corresponding key word sets, it demonstrates great adaptivity and can automatically recognize category words and modifier words in a merchandise title according to contextual information.
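The disclosure names the model family but not its label scheme, so the following is only an illustrative sketch of how the output of a sequence tagger could be turned into tagged key words. It assumes a BIO labelling with the hypothetical label names `MOD` (modifier word) and `CAT` (category word); the model itself is not shown, and `decode_spans` is a hypothetical helper.

```python
def decode_spans(tokens, labels, sep=" "):
    """Group per-token BIO labels into (key word, kind) pairs.

    tokens: segmented title tokens.
    labels: one label per token, e.g. "B-MOD", "I-MOD", "B-CAT", or "O".
    sep: joiner for multi-token key words (use "" for unspaced scripts).
    """
    spans = []
    current, kind = [], None
    for token, label in zip(tokens, labels):
        prefix, _, label_kind = label.partition("-")
        starts_new = prefix == "B" or (prefix == "I" and label_kind != kind)
        if starts_new or prefix == "O":
            if current:
                spans.append((sep.join(current), kind))
            current, kind = ([token], label_kind) if starts_new else ([], None)
        else:  # "I-" label continuing the current span
            current.append(token)
    if current:
        spans.append((sep.join(current), kind))
    return spans
```

The decoded pairs can then be added to the key word set of the relevant category together with their tags.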

[0068] Further, in the embodiment, the step of performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches comprises:

[0069] recognizing the merchandise categories in the original merchandise title data, and matching them with the corresponding key word sets; and segmenting the original merchandise title data into the plural title words, matching each of the title words with the key words in the corresponding key word set, and sieving out the key words that have matches.

[0070] Preferably, multiple different original merchandise title data may be acquired at the same time and matched with the word library, respectively. Then parallel processing is performed to output plural merchandise short-titles.

[0071] In practical implementations, merchandise categories in different original merchandise title data can be recognized at the same time and matched with their respective key word sets. The original merchandise title data are segmented into plural title words. Then each of the title words is matched with the key words in the corresponding key word set, and the key words that have matches in the original merchandise title data are sieved out.
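The matching step can be sketched as follows. The text does not say how the merchandise category of a title is recognized, so this sketch assumes, purely for illustration, that the category whose key word set overlaps the title most is chosen. The word library is assumed to map each category to a dictionary of key word to tag, whitespace splitting stands in for a real segmenter, and `match_title` is a hypothetical name.

```python
def match_title(title, library, segment=str.split):
    """Return (category, matches) for one original title.

    library: {category: {key word: "modifier" or "category"}}.
    matches: list of (position, key word, tag) for title words found in the
    key word set of the recognized category, in title order.
    """
    words = segment(title)
    # Assumption: pick the category whose key word set covers most title words.
    category = max(library, key=lambda c: sum(w in library[c] for w in words))
    keyword_tags = library[category]
    matches = [(i, w, keyword_tags[w]) for i, w in enumerate(words) if w in keyword_tags]
    return category, matches
```

Recording the position of each match at this stage keeps the information needed later to stitch key words in their original order.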

[0072] Further, in the embodiment, the step of sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:

[0073] recording location information of each of the key words in the original merchandise title data; if in the key words tagged as the modifier words, there are plural said key words whose lexical scopes have intersection, only one said key word in the intersection is kept; if in the key words tagged as the modifier words, there are plural said key words in which the lexical scope of one said key word contains the lexical scope of another said key word, only the key word that has the largest lexical scope is kept; if the key words tagged as the category words have word sense containing word sense of any said key word tagged as the modifier word, the key word corresponding to the modifier word is removed; and defining the remaining key words as the effective key words, and stitching them into the merchandise short-title according to locational sequence thereof. In practical implementations, the key words tagged as the category words in the original merchandise title data are processed first.
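The sieving and stitching rules can be sketched as below, under two loudly stated assumptions: "lexical scope" is interpreted as the character span a key word occupies in the original title, and "word sense containing" is approximated by substring containment. Neither interpretation is confirmed by the text, and the names `Match`, `select_effective`, and `stitch` are hypothetical.

```python
from typing import NamedTuple


class Match(NamedTuple):
    text: str
    start: int  # character offset in the original title
    tag: str    # "modifier" or "category"

    @property
    def end(self):
        return self.start + len(self.text)


def select_effective(matches):
    """Apply the overlap, containment, and word-sense rules to matched key words."""
    # Category words are processed first and always kept.
    categories = [m for m in matches if m.tag == "category"]
    # Longest modifiers first, so the largest lexical scope wins.
    modifiers = sorted((m for m in matches if m.tag == "modifier"),
                       key=lambda m: (-len(m.text), m.start))
    kept = []
    for m in modifiers:
        if any(m.text in c.text for c in categories):
            continue  # sense already covered by a category word (substring proxy)
        if any(m.start < k.end and k.start < m.end for k in kept):
            continue  # span intersects or is contained in a kept modifier
        kept.append(m)
    return sorted(categories + kept, key=lambda m: m.start)


def stitch(matches, sep=" "):
    """Join effective key words in title order; None if fewer than two remain."""
    effective = select_effective(matches)
    if len(effective) < 2:
        return None
    return sep.join(m.text for m in effective)
```

Keeping the longer of two intersecting modifiers is a design choice of this sketch; the text only requires that one of them be kept.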

[0074] It is understandable that, according to the word count of the merchandise short-title, modifier key words and category key words satisfying preset criteria can be found and then they can be stitched together according to their locational sequence, so as to form a fluent merchandise short-title. The described embodiment is for explaining how to generate a merchandise short-title from original merchandise title data. If there are different original merchandise title data, the foregoing process may be repeated as many times as required, thereby facilitating batch generation of merchandise short-titles.
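Batch generation over several original titles, including the parallel processing mentioned earlier, might look like the sketch below. It uses the standard-library `concurrent.futures.ThreadPoolExecutor`; the function `generate_batch` and its `generate_one` parameter (any callable that turns one original title into a short-title) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_batch(titles, generate_one, max_workers=4):
    """Apply `generate_one` to every title concurrently.

    Results are returned in the same order as the input titles, because
    Executor.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate_one, titles))
```

Since matching against an in-memory word library is CPU-bound in Python, a process pool may be a better fit for large batches; the thread pool is used here only to keep the sketch simple.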

[0075] Embodiment 2

[0076] The present embodiment provides an apparatus for generating merchandise short-titles, comprising:

[0077] a data collecting unit, for crawling merchandise title data and/or collecting search term data, so as to construct a corpus data set;

[0078] a word library unit, for categorizing corpuses in the corpus data set by merchandise categories based on a merchandise category table, and then extracting key words to construct a word library;

[0079] a word tagging unit, for tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word;

[0080] a word matching unit, for performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches; and

[0081] a processing unit, for sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech.
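The five units listed above can be pictured as a composition of callables, as in the following sketch. The class name `ShortTitleApparatus`, its field names, and the callable signatures are hypothetical; the sketch only shows how the units hand data to one another, not how each unit is implemented.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ShortTitleApparatus:
    collect: Callable[[], list]               # data collecting unit
    build_library: Callable[[list], dict]     # word library unit
    tag: Callable[[dict], dict]               # word tagging unit
    match: Callable[[str, dict], list]        # word matching unit
    process: Callable[[list], Optional[str]]  # processing unit

    def prepare(self):
        """Collect corpora, build the word library, and tag its key words."""
        return self.tag(self.build_library(self.collect()))

    def generate(self, title, library):
        """Match one original title against the library and stitch a short-title."""
        return self.process(self.match(title, library))
```

Separating `prepare` from `generate` reflects that the word library is built once and then reused for many original titles.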

[0082] As compared to the prior art, the disclosed apparatus for generating merchandise short-titles provides beneficial effects that are similar to those provided by the disclosed smart method for generating merchandise short-titles as enumerated above, and thus no repetitions are made herein.

[0083] Embodiment 3

[0084] The present embodiment provides a computer-readable storage medium, in which a computer program is stored. When run by a processor, the computer program executes the steps of the method for generating merchandise short-titles as described previously.

[0085] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the disclosed smart method for generating merchandise short-titles as enumerated above, and thus no repetitions are made herein.

[0086] As will be appreciated by people of ordinary skill in the art, all or a part of the steps of the method of the present invention as described previously may be implemented by having a program instruct related hardware components. The program may be stored in a computer-readable storage medium and, when executed, performs the individual steps of the methods described in the foregoing embodiments. The storage medium may be a ROM/RAM, a hard drive, an optical disk, a memory card or the like.

[0087] The present invention has been described with reference to the preferred embodiments and it is understood that the embodiments are not intended to limit the scope of the present invention. Moreover, as the contents disclosed herein should be readily understood and can be implemented by a person skilled in the art, all equivalent changes or modifications which do not depart from the concept of the present invention should be encompassed by the appended claims. Hence, the scope of the present invention shall only be defined by the appended claims.


Claims


1. An apparatus comprising:
a data collecting unit, configured to crawl merchandise title data and/or collect search term data, to construct a corpus data set;
a word library unit, configured to:
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract key words to construct a word library;
a word tagging unit, configured to tag each key word in the word library as a modifier word or a category word according to a part of speech of the word;
a word matching unit, configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with the key words in the word library;
output the key words with matches;
a processing unit, configured to:
sieve out at least two effective key words from plural key words; and
stitch the effective key words into a merchandise short-title according to their parts of speech.

2. The apparatus of claim 1, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:

based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and
uniting the key word sets to form the word library.

3. The apparatus of claim 2, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

4. The apparatus of claim 2, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

5. The apparatus of any one of claims 3 to 4, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, if a number of the key words that have matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech;
wherein, if the number of the key words that have matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

6. The apparatus of any one of claims 3 to 5, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

7. The apparatus of any one of claims 2 to 5, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

8. The apparatus of any one of claims 1 to 5, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, where among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, where among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

9. The apparatus of claim 1, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

10. The apparatus of claim 1, wherein the search term data represent a collection of search terms that are input by a user for searching for a merchandise item.

11. The apparatus of any one of claims 1 to 10, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

12. The apparatus of any one of claims 1 to 11, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

13. The apparatus of any one of claims 1 to 12, wherein the collected search term data, being search terms used for searching for the merchandise items, include query data.

14. The apparatus of any one of claims 1 to 13, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure that every key word is unique in its group.

15. The apparatus of any one of claims 1 to 14, wherein key word sets are formed and each corresponds to the category group.

16. The apparatus of any one of claims 1 to 15, wherein the machine model is first used to pre-tag numerous key words, and manual tagging verification is then performed.

17. The apparatus of any one of claims 1 to 16, wherein the machine model is a BiLSTM+CRF deep learning model.

18. The apparatus of any one of claims 1 to 17, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

19. The apparatus of any one of claims 1 to 18, wherein the key words tagged as the category words in the original merchandise title data are processed first.

20. The apparatus of any one of claims 1 to 19, wherein, where there are different original merchandise title data, the processes are repeated as many times as required.
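
By way of non-limiting illustration only, the word library construction recited in claim 2 (categorizing corpuses, segmenting them, de-duplicating and filtering per category, and uniting the key word sets) might be sketched in Python as follows. The category table contents, the stop-word list, and the whitespace-based `segment` helper are assumptions made for this sketch and do not come from the disclosure; a real word segmentation tool would take the place of `segment`.

```python
from collections import defaultdict

# Hypothetical category table: a marker term mapped to a merchandise category.
CATEGORY_TABLE = {"phone": "electronics", "laptop": "electronics", "dress": "apparel"}
# Hypothetical noise words removed during filtering.
STOP_WORDS = {"new", "hot", "sale"}


def segment(text):
    """Stand-in word segmenter: lower-case, then split on whitespace."""
    return text.lower().split()


def categorize(corpus):
    """Return the category of the first marker term found, or None."""
    for word in segment(corpus):
        if word in CATEGORY_TABLE:
            return CATEGORY_TABLE[word]
    return None


def build_word_library(corpora):
    """Categorize each corpus, segment it, and collect one key word set per category."""
    key_word_sets = defaultdict(set)  # a set de-duplicates within each category
    for corpus in corpora:
        category = categorize(corpus)
        if category is None:
            continue  # corpus cannot be placed in any category
        for word in segment(corpus):
            if word not in STOP_WORDS:  # filtering step
                key_word_sets[category].add(word)
    # The union of all per-category key word sets forms the word library.
    return dict(key_word_sets)


library = build_word_library([
    "New slim phone 128GB",
    "Hot sale gaming laptop",
    "Summer floral dress",
    "slim laptop",
])
```

In this sketch "slim" appears in two corpuses of the same category but is stored once, which mirrors the de-duplication step.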

21. A system comprising:
a data collecting unit, configured to crawl merchandise title data and/or collect search term data, to construct a corpus data set;
a word library unit, configured to:
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract key words to construct a word library;
a word tagging unit, configured to tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word;
a word matching unit, configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with the key words in the word library;
output the key words with matches;
a processing unit, configured to:
sieve out at least two effective key words from plural key words; and stitch the effective key words into a merchandise short-title according to their parts of speech.

22. The system of claim 21, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key word sets to form the word library.

23. The system of claim 22, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

24. The system of claim 22, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

25. The system of any one of claims 23 to 24, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

26. The system of any one of claims 23 to 25, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

27. The system of any one of claims 22 to 25, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

28. The system of any one of claims 21 to 25, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, where among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, where among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

29. The system of claim 21, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

30. The system of claim 21, wherein the search term data represent a collection of search terms that are input by a user for searching for a merchandise item.

31. The system of any one of claims 21 to 30, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

32. The system of any one of claims 21 to 31, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

33. The system of any one of claims 21 to 32, wherein the collected search term data, being search terms used for searching for the merchandise items, include query data.

34. The system of any one of claims 21 to 33, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure that every key word is unique in its group.

35. The system of any one of claims 21 to 34, wherein key word sets are formed and each corresponds to the category group.

36. The system of any one of claims 21 to 35, wherein the machine model is first used to pre-tag numerous key words, and manual tagging verification is then performed.

37. The system of any one of claims 21 to 36, wherein the machine model is a BiLSTM+CRF deep learning model.

38. The system of any one of claims 21 to 37, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

39. The system of any one of claims 21 to 38, wherein the key words tagged as the category words in the original merchandise title data are processed first.

40. The system of any one of claims 21 to 39, wherein, where there are different original merchandise title data, the processes are repeated as many times as required.
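
By way of non-limiting illustration only, the threshold-based word library update recited in claims 5 and 25 might be sketched in Python as follows. The function name `update_key_word_set`, the whitespace segmentation, and the treatment of a match count exactly equal to the threshold (handled here like the "greater than" branch) are assumptions of this sketch; the claims do not specify them.

```python
def update_key_word_set(new_title, key_word_set, threshold):
    """Add unseen words from a newly crawled title when too few words match.

    Returns the newly added words, which would still need part-of-speech tagging.
    """
    words = new_title.lower().split()  # stand-in for real word segmentation
    matched = [w for w in words if w in key_word_set]
    if len(matched) < threshold:
        # Few matches: the library is missing vocabulary, so extend it.
        added = list(dict.fromkeys(w for w in words if w not in key_word_set))
        key_word_set.update(added)
        return added
    # Enough matches: nothing is added and the caller moves on to the next title.
    return []


key_words = {"slim", "phone"}
first = update_key_word_set("slim phone case", key_words, threshold=3)
second = update_key_word_set("slim phone case", key_words, threshold=3)
```

The first call finds two matches, which is below the assumed threshold of three, so "case" is added; the second call then finds three matches and adds nothing.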

41. A method comprising:
crawling merchandise title data and/or collecting search term data, to construct a corpus data set;
based on a merchandise category table, categorizing corpuses in the corpus data set by merchandise categories;
extracting key words to construct a word library;
tagging each key word in the word library as a modifier word or a category word according to a part of speech of the key word;
performing word segmentation on original merchandise title data to obtain plural title words;
matching each of the title words with the key words in the word library;
outputting the key words with matches;
sieving out at least two effective key words from plural key words; and stitching the effective key words into a merchandise short-title according to their parts of speech.

42. The method of claim 41, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key word sets to form the word library.

43. The method of claim 42, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

44. The method of claim 42, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

45. The method of any one of claims 43 to 44, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

46. The method of any one of claims 43 to 45, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

47. The method of any one of claims 42 to 45, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

48. The method of any one of claims 41 to 45, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, where among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, where among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

49. The method of claim 41, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

50. The method of claim 41, wherein the search term data represent a collection of search terms that are input by a user for searching for a merchandise item.

51. The method of any one of claims 41 to 50, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

52. The method of any one of claims 41 to 51, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

53. The method of any one of claims 41 to 52, wherein the collected search term data, being search terms used for searching for the merchandise items, include query data.

54. The method of any one of claims 41 to 53, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure that every key word is unique in its group.

55. The method of any one of claims 41 to 54, wherein key word sets are formed and each corresponds to the category group.

56. The method of any one of claims 41 to 54, wherein the machine model is first used to pre-tag numerous key words, and manual tagging verification is then performed.

57. The method of any one of claims 41 to 56, wherein the machine model is a BiLSTM+CRF deep learning model.

58. The method of any one of claims 41 to 57, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

59. The method of any one of claims 41 to 58, wherein the key words tagged as the category words in the original merchandise title data are processed first.

60. The method of any one of claims 41 to 59, wherein, where there are different original merchandise title data, the processes are repeated as many times as required.
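
By way of non-limiting illustration only, the sieving and stitching recited in claim 48 might be sketched in Python as follows. Several simplifications are assumptions of this sketch and are not taken from the disclosure: key words are located by substring search in the title, "lexical scope" is treated as the character span of the match, the longer modifier is the one kept when two modifier spans intersect, and a category word is taken to "contain the word sense" of a modifier when the modifier text is a substring of the category word.

```python
def find_spans(title, word_library):
    """Record (start, end, word, tag) for every key word found in the title."""
    spans = []
    for word, tag in word_library.items():
        start = title.find(word)
        if start != -1:
            spans.append((start, start + len(word), word, tag))
    return spans


def overlaps(a, b):
    """True when two character spans share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def short_title(title, word_library):
    spans = find_spans(title, word_library)
    categories = [s for s in spans if s[3] == "category"]
    modifiers = [s for s in spans if s[3] == "modifier"]

    # Visit modifiers longest first, so that among intersecting or nested
    # spans only the one with the largest scope survives.
    modifiers.sort(key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    for m in modifiers:
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)

    # Drop a modifier whose sense is already carried by a category word
    # (approximated here by a substring test).
    kept = [m for m in kept if not any(m[2] in c[2] for c in categories)]

    effective = sorted(categories + kept)  # order by location in the title
    if len(effective) < 2:
        return None  # fewer than two effective key words: no short title
    return " ".join(s[2] for s in effective)


headset_library = {
    "wireless": "modifier",
    "wireless bluetooth": "modifier",
    "bluetooth": "modifier",
    "noise cancelling": "modifier",
    "headset": "category",
}
shoe_library = {"sports": "modifier", "breathable": "modifier", "sports shoes": "category"}

headset_title = short_title("wireless bluetooth headset with noise cancelling mic", headset_library)
shoe_title = short_title("breathable sports shoes", shoe_library)
```

In the first example the nested modifiers "wireless" and "bluetooth" give way to "wireless bluetooth"; in the second, the modifier "sports" is removed because the category word "sports shoes" already covers it.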

61. Computer equipment comprising:
a processor configured to:
crawl merchandise title data and/or collect search term data, to construct a corpus data set;
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract key words to construct a word library;
tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word;
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with the key words in the word library;
output the key words with matches;
sieve out at least two effective key words from plural key words; and stitch the effective key words into a merchandise short-title according to their parts of speech.

62. The equipment of claim 61, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key word sets to form the word library.

63. The equipment of claim 62, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

64. The equipment of claim 62, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

65. The equipment of any one of claims 63 to 64, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

66. The equipment of any one of claims 63 to 65, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

67. The equipment of any one of claims 62 to 65, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

68. The equipment of any one of claims 61 to 65, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, where among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, where among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

69. The equipment of claim 61, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

70. The equipment of claim 61, wherein the search term data represent a collection of search terms that are input by a user for searching for a merchandise item.

71. The equipment of any one of claims 61 to 70, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

72. The equipment of any one of claims 61 to 71, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

73. The equipment of any one of claims 61 to 72, wherein the collected search term data, being search terms used for searching for the merchandise items, include query data.

74. The equipment of any one of claims 61 to 73, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure that every key word is unique in its group.

75. The equipment of any one of claims 61 to 74, wherein key word sets are formed and each corresponds to the category group.

76. The equipment of any one of claims 61 to 75, wherein the machine model is first used to pre-tag numerous key words, and manual tagging verification is then performed.

77. The equipment of any one of claims 61 to 76, wherein the machine model is a BiLSTM+CRF deep learning model.

78. The equipment of any one of claims 61 to 77, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

79. The equipment of any one of claims 61 to 78, wherein the key words tagged as the category words in the original merchandise title data are processed first.

80. The equipment of any one of claims 61 to 79, wherein, where there are different original merchandise title data, the processes are repeated as many times as required.
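
By way of non-limiting illustration only, the parallel processing of different original merchandise title data recited in claims 9, 29, 49 and 69 might be sketched in Python with the standard-library thread pool as follows. The word library contents and the simplified `match_title` helper (whitespace segmentation, matched words joined in title order) are assumptions of this sketch; the claims do not prescribe any particular concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical word library shared by all workers (read-only during matching).
WORD_LIBRARY = {"slim", "phone", "gaming", "laptop"}


def match_title(title):
    """Keep only the title words present in the word library, in title order."""
    return " ".join(w for w in title.lower().split() if w in WORD_LIBRARY)


def short_titles_in_parallel(titles, max_workers=4):
    """Match several titles against the word library concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(match_title, titles))


results = short_titles_in_parallel(["New slim phone 128GB", "Hot gaming laptop deal"])
```

Because the word library is only read during matching, the workers in this sketch need no locking.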

81. A computer readable physical memory having stored thereon a computer program executed by a computer configured to:
crawl merchandise title data and/or collect search term data, to construct a corpus data set;
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract key words to construct a word library;
tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word;
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with the key words in the word library;
output the key words with matches;
sieve out at least two effective key words from plural key words; and stitch the effective key words into a merchandise short-title according to their parts of speech.

82. The memory of claim 81, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key word sets to form the word library.

83. The memory of claim 82, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

84. The memory of claim 82, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

85. The memory of any one of claims 83 to 84, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

86. The memory of any one of claims 83 to 85, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

87. The memory of any one of claims 82 to 85, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

88. The memory of any one of claims 81 to 85, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, where among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, where among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

89. The memory of claim 81, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

90. The memory of claim 81, wherein the search term data represent a collection of search terms that are input by a user for searching for a merchandise item.

91. The memory of any one of claims 81 to 90, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

92. The memory of any one of claims 81 to 91, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

93. The memory of any one of claims 81 to 92, wherein the collected search term data, being search terms used for searching for the merchandise items, include query data.

94. The memory of any one of claims 81 to 93, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure that every key word is unique in its group.

95. The memory of any one of claims 81 to 94, wherein key word sets are formed and each corresponds to the category group.

96. The memory of any one of claims 81 to 96, wherein the machine model is first to pre-tag numerous key words and manual tagging verification is performed second.

97. The memory of any one of claims 81 to 96, wherein the machine model is a BiLSTM+CRF deep learning model.

98. The memory of any one of claims 81 to 97, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

99. The memory of any one of claims 81 to 98, wherein the key words tagged as the category words in the original merchandise title data are processed first.

100. The memory of any one of claims 81 to 99, wherein, when there are different original merchandise title data, the processes are repeated as many times as required.

101. An apparatus comprising:
a data collecting unit, configured to crawl merchandise title data and/or collect search term data, to construct a corpus data set;
a word library unit, configured to:
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract key words to construct a word library; and
a word tagging unit, configured to tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word.

102. The apparatus of claim 101, further comprises:
a word matching unit, configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with the key words in the word library;
output the key words with matches;
a processing unit, configured to:
sieve out at least two effective key words from plural key words; and stitch the effective key words into a merchandise short-title according to their parts of speech.

103. The apparatus of claim 102, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key word sets to form the word library.
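The word library construction recited in this claim can be sketched as follows. This is an illustrative sketch under stated assumptions: the category table is modelled as a cue-word lookup, filtering is modelled as a stop-word list, and whitespace splitting stands in for word segmentation. `CATEGORY_TABLE`, `STOP_WORDS`, `categorize`, and `build_word_library` are hypothetical names.

```python
from __future__ import annotations
from collections import defaultdict
from typing import Optional

# Hypothetical category table: a cue word mapped to its merchandise category.
CATEGORY_TABLE = {"headphones": "audio", "speaker": "audio", "shirt": "apparel"}
# Assumed noise list used for the filtering step.
STOP_WORDS = {"new", "sale", "hot", "2020"}

def categorize(corpus: str) -> Optional[str]:
    """Return the category of the first cue word found, or None if there is none."""
    for word in corpus.lower().split():
        if word in CATEGORY_TABLE:
            return CATEGORY_TABLE[word]
    return None

def build_word_library(corpora: list[str]) -> dict[str, set[str]]:
    """Categorize each corpus, segment it, filter noise, and de-duplicate per category."""
    keyword_sets: dict[str, set[str]] = defaultdict(set)
    for corpus in corpora:
        category = categorize(corpus)
        if category is None:
            continue  # corpus does not fit any known category
        for word in corpus.lower().split():
            if word not in STOP_WORDS:
                keyword_sets[category].add(word)  # a set de-duplicates automatically
    # The union of the per-category key word sets forms the word library.
    return dict(keyword_sets)
```

Storing each key word set as a `set` makes every key word unique within its category group without a separate de-duplication pass.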

104. The apparatus of claim 103, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

105. The apparatus of claim 103, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

106. The apparatus of any one of claims 104 to 105, further comprises:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech; and
when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.
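The threshold test recited in this claim can be sketched as follows. This is an illustrative sketch only: `update_library` and `toy_tagger` are hypothetical names, the tagger is a stand-in for manual or machine tagging, and treating a match count equal to the threshold as "no update" is an assumption, since the claim leaves that case open.

```python
from __future__ import annotations
from typing import Callable

def toy_tagger(word: str) -> str:
    """Hypothetical stand-in for manual or machine part-of-speech tagging."""
    return "category" if word in {"earbuds", "headphones"} else "modifier"

def update_library(library: dict[str, str],
                   title_words: list[str],
                   threshold: int,
                   tagger: Callable[[str], str] = toy_tagger) -> bool:
    """Add and tag unseen words when too few title words match the library.

    Returns True when the library was extended, and False when the caller
    should simply go on to the next crawled title.
    """
    matched = [w for w in title_words if w in library]
    if len(matched) < threshold:
        for word in title_words:
            if word not in library:
                library[word] = tagger(word)  # tag each newly added key word
        return True
    return False
```

Returning a flag rather than looping inside the function keeps the crawl loop in the caller, which matches the claim's description of crawling, segmenting, and matching again for the next title.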

107. The apparatus of any one of claims 104 to 106, further comprises:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

108. The apparatus of any one of claims 103 to 106, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;

matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.
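The category-aware matching recited in this claim can be sketched as follows. This is an illustrative sketch: recognizing the category by the largest overlap with each key word set is an assumption made for the example, and `KEYWORD_SETS`, `recognize_category`, and `match_title` are hypothetical names.

```python
from __future__ import annotations
from typing import Optional

# Hypothetical key word sets, one per merchandise category.
KEYWORD_SETS = {
    "audio": {"wireless", "bluetooth", "headphones"},
    "apparel": {"cotton", "shirt"},
}

def recognize_category(title_words: list[str]) -> Optional[str]:
    """Pick the category whose key word set overlaps the title the most."""
    words = set(title_words)
    best = max(KEYWORD_SETS, key=lambda c: len(KEYWORD_SETS[c] & words))
    return best if KEYWORD_SETS[best] & words else None

def match_title(title: str) -> tuple[Optional[str], list[str]]:
    """Segment the title, find its category, and return the matched key words."""
    words = title.lower().split()  # assumption: whitespace segmentation
    category = recognize_category(words)
    if category is None:
        return None, []
    return category, [w for w in words if w in KEYWORD_SETS[category]]
```

Matching against a single category's key word set, rather than the whole word library, keeps unrelated key words from other categories out of the result.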

109. The apparatus of any of claims 101 to 106, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, among the key words tagged as the modifier words, when there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, among the key words tagged as the modifier words, when there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, when a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.
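The sieving and stitching rules recited in this claim can be sketched as follows. This is an illustrative sketch under loudly stated assumptions: "lexical scope" is modelled as the character span a key word occupies in the original title, and "word sense containment" is modelled as substring containment. Neither interpretation is confirmed by the claim text, and `Match`, `select_effective`, and `stitch` are hypothetical names.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Match:
    text: str   # matched key word
    tag: str    # "modifier" or "category"
    start: int  # recorded location: character offset in the original title

    @property
    def end(self) -> int:
        return self.start + len(self.text)

def overlaps(a: Match, b: Match) -> bool:
    """True when the two character spans share at least one position."""
    return a.start < b.end and b.start < a.end

def select_effective(matches: list[Match]) -> list[Match]:
    categories = [m for m in matches if m.tag == "category"]
    # Longest modifiers first, so a containing span beats the span it contains.
    modifiers = sorted((m for m in matches if m.tag == "modifier"),
                       key=lambda m: (-len(m.text), m.start))
    kept: list[Match] = []
    for m in modifiers:
        if any(overlaps(m, k) for k in kept):
            continue  # intersecting or contained scope: only one is kept
        if any(m.text in c.text for c in categories):
            continue  # already covered by a category word
        kept.append(m)
    # Remaining key words are ordered by their location in the title.
    return sorted(kept + categories, key=lambda m: m.start)

def stitch(matches: list[Match]) -> str:
    return " ".join(m.text for m in select_effective(matches))
```

For a title such as "2020 new wireless bluetooth sports headphones free shipping", the modifier "bluetooth" is dropped in favour of "wireless bluetooth", and the modifier "sports" is dropped because the category word "sports headphones" already contains it.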

110. The apparatus of claim 102, further comprises:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

111. The apparatus of claim 102, wherein the search term data represent a collection of search terms that are input by a user for searching for a merchandise item.

112. The apparatus of any one of claims 101 to 111, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

113. The apparatus of any one of claims 101 to 112, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

114. The apparatus of any one of claims 101 to 113, wherein the search term data collected are search terms used for searching for the merchandise items and include query data.

115. The apparatus of any one of claims 101 to 114, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure every key word is unique in its group.

116. The apparatus of any one of claims 101 to 115, wherein key word sets are formed, each corresponding to a category group.

117. The apparatus of any one of claims 101 to 116, wherein the machine model first pre-tags numerous key words, and manual tagging verification is performed afterwards.

118. The apparatus of any one of claims 101 to 117, wherein the machine model is a BiLSTM+CRF deep learning model.

119. The apparatus of any one of claims 101 to 118, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

120. The apparatus of any one of claims 101 to 119, wherein the key words tagged as the category words in the original merchandise title data are processed first.

121. The apparatus of any one of claims 101 to 120, wherein, when there are different original merchandise title data, the processes are repeated as many times as required.

122. A system comprising:
a data collecting unit, configured to crawl merchandise title data and/or collect search term data, to construct a corpus data set;
a word library unit, configured to:
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract key words to construct a word library; and
a word tagging unit, configured to tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word.

123. The system of claim 122, further comprises:
a word matching unit, configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with the key words in the word library;
output the key words with matches;
a processing unit, configured to:
sieve out at least two effective key words from plural key words; and stitch the effective key words into a merchandise short-title according to their parts of speech.

124. The system of claim 123, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key word sets to form the word library.

125. The system of claim 124, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

126. The system of claim 124, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

127. The system of any one of claims 125 to 126, further comprises:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech; and
when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

128. The system of any one of claims 125 to 127, further comprises:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

129. The system of any one of claims 124 to 127, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;

matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

130. The system of any of claims 122 to 127, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, among the key words tagged as the modifier words, when there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, among the key words tagged as the modifier words, when there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, when a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

131. The system of claim 123, further comprises:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

132. The system of claim 123, wherein the search term data represent a collection of search terms that are input by a user for searching for a merchandise item.

133. The system of any one of claims 122 to 132, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

134. The system of any one of claims 122 to 133, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

135. The system of any one of claims 122 to 134, wherein the search term data collected are search terms used for searching for the merchandise items and include query data.

136. The system of any one of claims 122 to 135, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure every key word is unique in its group.

137. The system of any one of claims 122 to 136, wherein key word sets are formed, each corresponding to a category group.

138. The system of any one of claims 122 to 137, wherein the machine model first pre-tags numerous key words, and manual tagging verification is performed afterwards.

139. The system of any one of claims 122 to 138, wherein the machine model is a BiLSTM+CRF deep learning model.

140. The system of any one of claims 122 to 139, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

141. The system of any one of claims 122 to 140, wherein the key words tagged as the category words in the original merchandise title data are processed first.

142. The system of any one of claims 122 to 141, wherein, when there are different original merchandise title data, the processes are repeated as many times as required.

143. A method comprising:
crawling merchandise title data and/or collecting search term data, to construct a corpus data set;
based on a merchandise category table, categorizing corpuses in the corpus data set by merchandise categories;
extracting key words to construct a word library; and tagging each key word in the word library as a modifier word or a category word according to a part of speech of the key word.

144. The method of claim 143, further comprises:
performing word segmentation on original merchandise title data to obtain plural title words;
matching each of the title words with the key words in the word library;
outputting the key words with matches;
sieving out at least two effective key words from plural key words; and stitching the effective key words into a merchandise short-title according to their parts of speech.
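The steps of this claim can be sketched end to end as follows. This is an illustrative sketch only: whitespace splitting stands in for word segmentation, and placing modifier words before category words is one assumed reading of stitching "according to their parts of speech". `generate_short_title` and the sample library are hypothetical.

```python
from __future__ import annotations
from typing import Optional

def generate_short_title(title: str,
                         library: dict[str, str],
                         min_keywords: int = 2) -> Optional[str]:
    """Segment, match, sieve, and stitch; return None if too few key words remain."""
    words = title.lower().split()  # assumption: whitespace segmentation
    effective: list[tuple[str, str]] = []
    seen: set[str] = set()
    for word in words:
        if word in library and word not in seen:  # output key words with matches
            seen.add(word)
            effective.append((word, library[word]))
    if len(effective) < min_keywords:
        return None  # the claim requires at least two effective key words
    modifiers = [w for w, tag in effective if tag == "modifier"]
    categories = [w for w, tag in effective if tag == "category"]
    # Assumed ordering: modifier words first, category words last.
    return " ".join(modifiers + categories)
```

Returning `None` when fewer than two effective key words survive lets a caller fall back to the original title instead of emitting a degenerate short-title.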

145. The method of claim 144, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key word sets to form the word library.

146. The method of claim 145, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

147. The method of claim 145, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

148. The method of any one of claims 146 to 147, further comprises:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech; and
when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

149. The method of any one of claims 146 to 148, further comprises:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

150. The method of any one of claims 144 to 148, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;

matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

151. The method of any of claims 143 to 148, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, among the key words tagged as the modifier words, when there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, among the key words tagged as the modifier words, when there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, when a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

152. The method of claim 144, further comprises:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

153. The method of claim 144, wherein the search term data represent a collection of search terms that are input by a user for searching for a merchandise item.

154. The method of any one of claims 143 to 153, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

155. The method of any one of claims 143 to 154, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

156. The method of any one of claims 143 to 155, wherein the search term data collected are search terms used for searching for the merchandise items and include query data.

157. The method of any one of claims 143 to 156, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure every key word is unique in its group.

158. The method of any one of claims 143 to 157, wherein key word sets are formed, each corresponding to a category group.

159. The method of any one of claims 143 to 158, wherein the machine model first pre-tags numerous key words, and manual tagging verification is performed afterwards.

160. The method of any one of claims 143 to 159, wherein the machine model is a BiLSTM+CRF deep learning model.

161. The method of any one of claims 143 to 160, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

162. The method of any one of claims 143 to 161, wherein the key words tagged as the category words in the original merchandise title data are processed first.

163. The method of any one of claims 143 to 162, wherein, when there are different original merchandise title data, the processes are repeated as many times as required.

164. A computer equipment comprising:
a processor configured to:
crawl merchandise title data and/or collect search term data, to construct a corpus data set;
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract key words to construct a word library; and tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word.

165. The equipment of claim 164, further comprises:
the processor configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with the key words in the word library;
output the key words with matches;
sieve out at least two effective key words from plural key words; and stitch the effective key words into a merchandise short-title according to their parts of speech.

166. The equipment of claim 165, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key word sets to form the word library.

167. The equipment of claim 166, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

168. The equipment of claim 166, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

169. The equipment of any one of claims 167 to 168, further comprises:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech; and
when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

170. The equipment of any one of claims 167 to 169, further comprises:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

171. The equipment of any one of claims 166 to 169, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;

matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

172. The equipment of any of claims 164 to 169, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, among the key words tagged as the modifier words, when there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, among the key words tagged as the modifier words, when there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, when a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

173. The equipment of claim 165, further comprises:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

174. The equipment of claim 165, wherein the search term data represent a collection of search terms that are input by a user for searching for a merchandise item.

175. The equipment of any one of claims 164 to 174, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

176. The equipment of any one of claims 164 to 175, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

177. The equipment of any one of claims 164 to 176, wherein the search term data collected are search terms used for searching for the merchandise items and include query data.

178. The equipment of any one of claims 164 to 177, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure every key word is unique in its group.

179. The equipment of any one of claims 164 to 178, wherein key word sets are formed, each corresponding to a category group.

180. The equipment of any one of claims 164 to 179, wherein the machine model first pre-tags numerous key words, and manual tagging verification is performed afterwards.

181. The equipment of any one of claims 164 to 180, wherein the machine model is a BiLSTM+CRF deep learning model.

182. The equipment of any one of claims 164 to 181, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

183. The equipment of any one of claims 164 to 182, wherein the key words tagged as the category words in the original merchandise title data are processed first.

184. The equipment of any one of claims 164 to 183, wherein, when there are different original merchandise title data, the processes are repeated as many times as required.

185. A computer readable physical memory having stored thereon a computer program executed by a computer configured to:
crawl merchandise title data and/or collect search term data, to construct a corpus data set;
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract key words to construct a word library; and tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word.

186. The memory of claim 185, further comprises:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with the key words in the word library;
output the key words with matches;
sieve out at least two effective key words from plural key words; and stitch the effective key words into a merchandise short-title according to their parts of speech.

187. The memory of claim 186, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key word sets to form the word library.

188. The memory of claim 187, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

189. The memory of claim 187, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

190. The memory of any one of claims 188 to 189, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when a number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.
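
The threshold-driven update above could be sketched roughly as below. The function name, the `tagger` callable, and the whitespace segmentation are hypothetical. The text does not say what happens when the match count equals the threshold; this sketch assumes such a title is treated as already covered.

```python
def update_library(new_titles, key_word_set, threshold, tagger):
    """Add unseen words from poorly covered titles; return the words added.

    key_word_set: dict mapping key word -> tag, mutated in place.
    tagger: hypothetical callable returning "modifier" or "category".
    """
    added = []
    for title in new_titles:
        words = title.lower().split()
        matched = [w for w in words if w in key_word_set]
        if len(matched) < threshold:
            # Few matches: the library lacks coverage, so extend and tag it.
            for w in words:
                if w not in key_word_set:
                    key_word_set[w] = tagger(w)
                    added.append(w)
        # Otherwise the title is treated as covered and the loop moves on
        # to the next newly crawled title.
    return added
```

With a threshold of 1, a title whose words are all unknown extends the library, while a title with at least one known key word leaves it unchanged.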

191. The memory of any one of claims 188 to 190, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

192. The memory of any one of claims 186 to 190, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.
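
One way to picture the category-first matching above is the following sketch. The per-category `WORD_LIBRARY`, the scoring rule used to recognize a category (counting category words present in the title), and the function names are assumptions for illustration, not the recognition technique of the disclosure.

```python
# Hypothetical word library: one key word set (word -> tag) per category.
WORD_LIBRARY = {
    "audio": {"wireless": "modifier", "headphones": "category"},
    "apparel": {"cotton": "modifier", "shirt": "category"},
}


def recognize_category(title_words, library):
    """Pick the category whose category words appear most often in the title."""
    def score(key_words):
        return sum(1 for w in title_words if key_words.get(w) == "category")

    best = max(library, key=lambda c: score(library[c]))
    return best if score(library[best]) > 0 else None


def match_in_category(title, library=WORD_LIBRARY):
    """Return (category, matched key words) using only that category's set."""
    words = title.lower().split()
    category = recognize_category(words, library)
    if category is None:
        return None, []
    key_words = library[category]
    return category, [w for w in words if w in key_words]
```

Restricting the match to one key word set means a word such as `wireless` is ignored in an apparel title even though it exists elsewhere in the library.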

193. The memory of any of claims 185 to 190, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, where among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, where among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.
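
The sieving rules above can be sketched as follows, with several assumptions stated plainly: a "lexical scope" is modelled as a character span in the original title, the single survivor of an intersection is taken to be the longest span, and "word sense containment" is approximated by substring containment. The `Match` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    word: str
    tag: str    # "modifier" or "category"
    start: int  # inclusive character offset in the original title
    end: int    # exclusive character offset


def overlaps(a: Match, b: Match) -> bool:
    """True when the two character spans share at least one position."""
    return a.start < b.end and b.start < a.end


def sieve(matches):
    """Apply the three removal rules and return key words in title order."""
    modifiers = [m for m in matches if m.tag == "modifier"]
    categories = [m for m in matches if m.tag == "category"]

    # Intersection and containment rules: visit modifiers longest first, so a
    # containing span wins and any modifier overlapping a kept one is dropped.
    kept = []
    for m in sorted(modifiers, key=lambda m: (-(m.end - m.start), m.start)):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)

    # Category rule: drop a modifier whose text is contained in a category
    # word (substring containment is an assumed proxy for word sense).
    kept = [m for m in kept if not any(m.word in c.word for c in categories)]

    return sorted(kept + categories, key=lambda m: m.start)


def stitch(matches):
    """Join the effective key words by location; None if fewer than two."""
    effective = sieve(matches)
    return " ".join(m.word for m in effective) if len(effective) >= 2 else None
```

For the title "new stainless steel insulated water bottle" with modifiers `stainless steel`, `steel`, and `water` and category word `water bottle`, the sketch drops `steel` (contained in `stainless steel`) and `water` (contained in the category word), giving "stainless steel water bottle".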

194. The memory of claim 186, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.
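
As a rough sketch of the parallel processing above, several titles can be matched against the same word library concurrently. The flat `WORD_LIBRARY` set, the simplified `short_title` helper, and the choice of a thread pool are assumptions; for CPU-bound workloads in CPython a process pool may be the more suitable executor.

```python
from concurrent.futures import ThreadPoolExecutor

WORD_LIBRARY = {"wireless", "headphones", "cotton", "shirt"}  # hypothetical


def short_title(title: str):
    """Simplified per-title step: keep library words, require at least two."""
    words = [w for w in title.lower().split() if w in WORD_LIBRARY]
    return " ".join(words) if len(words) >= 2 else None


def short_titles_parallel(titles, max_workers=4):
    """Process titles concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(short_title, titles))
```

Because `Executor.map` returns results in input order, each output short-title lines up with the original title that produced it.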

195. The memory of claim 186, wherein the search term data represent a collection of search terms input by a user for searching for a merchandise item.

196. The memory of any one of claims 185 to 195, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

197. The memory of any one of claims 185 to 196, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

198. The memory of any one of claims 185 to 197, wherein, for collecting the search term data, the search terms used for searching for the merchandise items include query data.

199. The memory of any one of claims 185 to 198, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated, to ensure every key word is unique in its group.

200. The memory of any one of claims 185 to 199, wherein key word sets are formed and each correspond to the category group.

201. The memory of any one of claims 185 to 200, wherein the machine model first pre-tags numerous key words, and manual tagging verification is performed second.

202. The memory of any one of claims 185 to 201, wherein the machine model is a BiLSTM+CRF deep learning model.

203. The memory of any one of claims 185 to 202, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

204. The memory of any one of claims 185 to 203, wherein the key words tagged as the category words in the original merchandise title data are processed first.

205. The memory of any one of claims 185 to 204, wherein, where there are different original merchandise title data, the processes are repeated as many times as required.

206. An apparatus comprising:
a word matching unit, configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with key words in a word library;
output the key words with matches;
a processing unit, configured to:
sieve out at least two effective key words from plural key words; and stitch the effective key words into merchandise short-title according to their parts of speech.

207. The apparatus of claim 206, further comprising:
a data collecting unit, configured to crawl merchandise title data and/or collect search term data, to construct a corpus data set;
a word library unit, configured to:
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract the key words to construct the word library; and
a word tagging unit, configured to tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word.

208. The apparatus of claim 207, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key words sets to form the word library.

209. The apparatus of claim 208, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

210. The apparatus of claim 208, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

211. The apparatus of any one of claims 209 to 210, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when a number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

212. The apparatus of any one of claims 209 to 211, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

213. The apparatus of any one of claims 208 to 211, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

214. The apparatus of any of claims 206 to 211, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, where among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, where among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

215. The apparatus of claim 207, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

216. The apparatus of claim 207, wherein the search term data represent a collection of search terms input by a user for searching for a merchandise item.

217. The apparatus of any one of claims 206 to 216, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

218. The apparatus of any one of claims 206 to 217, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

219. The apparatus of any one of claims 206 to 218, wherein, for collecting the search term data, the search terms used for searching for the merchandise items include query data.

220. The apparatus of any one of claims 206 to 219, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated, to ensure every key word is unique in its group.

221. The apparatus of any one of claims 206 to 220, wherein key word sets are formed and each correspond to the category group.

222. The apparatus of any one of claims 206 to 221, wherein the machine model first pre-tags numerous key words, and manual tagging verification is performed second.

223. The apparatus of any one of claims 206 to 222, wherein the machine model is a BiLSTM+CRF deep learning model.

224. The apparatus of any one of claims 206 to 223, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

225. The apparatus of any one of claims 206 to 224, wherein the key words tagged as the category words in the original merchandise title data are processed first.

226. The apparatus of any one of claims 206 to 225, wherein, where there are different original merchandise title data, the processes are repeated as many times as required.

227. A system comprising:
a word matching unit, configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with key words in a word library;
output the key words with matches;
a processing unit, configured to:
sieve out at least two effective key words from plural key words; and stitch the effective key words into merchandise short-title according to their parts of speech.

228. The system of claim 227, further comprising:
a data collecting unit, configured to crawl merchandise title data and/or collect search term data, to construct a corpus data set;
a word library unit, configured to:
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract the key words to construct the word library; and
a word tagging unit, configured to tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word.

229. The system of claim 228, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key words sets to form the word library.

230. The system of claim 229, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

231. The system of claim 229, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

232. The system of any one of claims 230 to 231, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when a number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

233. The system of any one of claims 230 to 232, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

234. The system of any one of claims 229 to 232, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

235. The system of any of claims 227 to 232, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, where among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, where among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

236. The system of claim 228, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

237. The system of claim 228, wherein the search term data represent a collection of search terms input by a user for searching for a merchandise item.

238. The system of any one of claims 227 to 237, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

239. The system of any one of claims 227 to 238, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

240. The system of any one of claims 227 to 239, wherein, for collecting the search term data, the search terms used for searching for the merchandise items include query data.

241. The system of any one of claims 227 to 240, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated, to ensure every key word is unique in its group.

242. The system of any one of claims 227 to 241, wherein key word sets are formed and each correspond to the category group.

243. The system of any one of claims 227 to 242, wherein the machine model first pre-tags numerous key words, and manual tagging verification is performed second.

244. The system of any one of claims 227 to 243, wherein the machine model is a BiLSTM+CRF deep learning model.

245. The system of any one of claims 227 to 244, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

246. The system of any one of claims 227 to 245, wherein the key words tagged as the category words in the original merchandise title data are processed first.

247. The system of any one of claims 227 to 246, wherein, where there are different original merchandise title data, the processes are repeated as many times as required.

248. A method comprising:
performing word segmentation on original merchandise title data to obtain plural title words;
matching each of the title words with key words in a word library;
outputting the key words with matches;
sieving out at least two effective key words from plural key words; and stitching the effective key words into merchandise short-title according to their parts of speech.

249. The method of claim 248, further comprising:
crawling merchandise title data and/or collecting search term data, to construct a corpus data set;
based on a merchandise category table, categorizing corpuses in the corpus data set by merchandise categories;
extracting the key words to construct the word library; and tagging each key word in the word library as a modifier word or a category word according to a part of speech of the key word.

250. The method of claim 249, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key words sets to form the word library.

251. The method of claim 250, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

252. The method of claim 250, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

253. The method of any one of claims 251 to 252, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when a number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

254. The method of any one of claims 251 to 253, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

255. The method of any one of claims 250 to 253, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

256. The method of any of claims 248 to 253, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, where among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, where among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

257. The method of claim 249, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

258. The method of claim 249, wherein the search term data represent a collection of search terms input by a user for searching for a merchandise item.

259. The method of any one of claims 248 to 258, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

260. The method of any one of claims 248 to 259, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

261. The method of any one of claims 248 to 260, wherein, for collecting the search term data, the search terms used for searching for the merchandise items include query data.

262. The method of any one of claims 248 to 261, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated, to ensure every key word is unique in its group.

263. The method of any one of claims 248 to 262, wherein key word sets are formed and each correspond to the category group.

264. The method of any one of claims 248 to 263, wherein the machine model first pre-tags numerous key words, and manual tagging verification is performed second.

265. The method of any one of claims 248 to 264, wherein the machine model is a BiLSTM+CRF deep learning model.

266. The method of any one of claims 248 to 265, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

267. The method of any one of claims 248 to 266, wherein the key words tagged as the category words in the original merchandise title data are processed first.

268. The method of any one of claims 248 to 267, wherein, where there are different original merchandise title data, the processes are repeated as many times as required.

269. A computer equipment comprising:
a processor configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with key words in a word library;
output the key words with matches;
sieve out at least two effective key words from plural key words; and stitch the effective key words into merchandise short-title according to their parts of speech.

270. The equipment of claim 269, further comprising:
the processor configured to:
crawl merchandise title data and/or collect search term data, to construct a corpus data set;
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract the key words to construct the word library; and tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word.

271. The equipment of claim 270, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and uniting the key words sets to form the word library.

272. The equipment of claim 271, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

273. The equipment of claim 271, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

274. The equipment of any one of claims 272 to 273, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when a number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.

275. The equipment of any one of claims 272 to 274, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

276. The equipment of any one of claims 271 to 274, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and sieving out the key words with matches.

277. The equipment of any of claims 269 to 274, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, where among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, where among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words; and stitching the remaining key words into the merchandise short-title according to locational sequence.

278. The equipment of claim 270, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and outputting plural corresponding merchandise short-titles.

279. The equipment of claim 270, wherein the search term data represent a collection of search terms input by a user for searching for a merchandise item.

280. The equipment of any one of claims 269 to 279, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

281. The equipment of any one of claims 269 to 280, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

282. The equipment of any one of claims 269 to 281, wherein the collected search term data, being search terms used for searching for the merchandise items, include query data.

283. The equipment of any one of claims 269 to 282, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure every key word is unique in its group.

284. The equipment of any one of claims 269 to 283, wherein key word sets are formed, each corresponding to one of the category groups.

285. The equipment of any one of claims 269 to 284, wherein the machine model is first used to pre-tag numerous key words, and manual tagging verification is then performed.

286. The equipment of any one of claims 269 to 285, wherein the machine model is a BiLSTM+CRF deep learning model.

287. The equipment of any one of claims 269 to 286, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

288. The equipment of any one of claims 269 to 287, wherein the key words tagged as the category words in the original merchandise title data are processed first.

289. The equipment of any one of claims 269 to 288, wherein, where there are different original merchandise title data, the processes are repeated as many times as required.

290. A computer readable physical memory having stored thereon a computer program executed by a computer configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with key words in a word library;
output the key words with matches;
sieve out at least two effective key words from plural key words; and
stitch the effective key words into a merchandise short-title according to their parts of speech.

291. The memory of claim 290, wherein the computer is further configured to:
crawl merchandise title data and/or collect search term data, to construct a corpus data set;
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract the key words to construct the word library; and
tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word.

292. The memory of claim 291, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and
uniting the key word sets to form the word library.
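
A minimal sketch of the word library construction recited above is given below, purely by way of example. It assumes that the category lookup and the word segmenter are supplied by the caller (`category_of` and `segment` are hypothetical callables, not anything named in the claims), that filtering means dropping a caller-supplied set of irrelevant words, and that the word library is represented as a mapping from merchandise category to its key word set.

```python
from collections import defaultdict


def build_word_library(corpora, category_of, segment, stop_words=frozenset()):
    """Group key words by merchandise category.

    corpora     -- iterable of title or search-term strings
    category_of -- hypothetical callable: text -> category, or None if unknown
    segment     -- hypothetical callable: text -> list of words
    stop_words  -- words treated as irrelevant and filtered out (assumption)
    """
    keyword_sets = defaultdict(set)
    for text in corpora:                 # categorize the corpuses one by one
        category = category_of(text)
        if category is None:
            continue
        for word in segment(text):       # word segmentation
            if word and word not in stop_words:
                keyword_sets[category].add(word)  # a set de-duplicates
    # The union of the per-category key word sets forms the word library.
    return dict(keyword_sets)
```

Using a set per category is one straightforward way to make every key word unique within its category group; a production system would likely use a proper tokenizer in place of the simple segmenter assumed here.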

293. The memory of claim 292, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.

294. The memory of claim 292, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.

295. The memory of any one of claims 293 to 294, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
where the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging newly added key words for their parts of speech; and
where the number of the key words having matches is greater than the threshold, again crawling new merchandise title data, performing word segmentation on the new merchandise title data, and matching resulting words with the key words in the word library.
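
The threshold test recited above could, for illustration, take the following shape. This is a sketch under stated assumptions: a key word set is modelled as a dictionary from key word to its part-of-speech tag, `tag_word` is a hypothetical tagging callable, and a match count exactly equal to the threshold (a case the claim leaves open) is treated the same as exceeding it.

```python
def update_keyword_set(title_words, tagged_keywords, tag_word, threshold):
    """Add unseen words from a newly crawled title when few words match.

    title_words     -- words obtained by segmenting the new title
    tagged_keywords -- dict: key word -> "modifier" or "category" (mutated)
    tag_word        -- hypothetical callable: word -> part-of-speech tag
    Returns the list of newly added key words (empty if none were added).
    """
    matched = sum(1 for w in title_words if w in tagged_keywords)
    if matched >= threshold:
        # Enough matches: leave the set alone; the caller moves on to
        # the next crawled title.
        return []
    added = []
    for w in dict.fromkeys(title_words):  # preserves order, drops repeats
        if w not in tagged_keywords:
            tagged_keywords[w] = tag_word(w)
            added.append(w)
    return added
```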

296. The memory of any one of claims 293 to 295, further comprising:
based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.

297. The memory of any one of claims 292 to 295, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and
sieving out the key words with matches.
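
By way of a hedged example, the matching steps recited above might look like the sketch below. It assumes the word library is a mapping from merchandise category to a dictionary of tagged key words, and that category recognition and segmentation are provided by the caller; `recognize_category` and `segment` are hypothetical names.

```python
def match_title(title, library, recognize_category, segment):
    """Return (key word, tag) pairs for title words found in the library.

    library            -- dict: category -> dict of key word -> tag
    recognize_category -- hypothetical callable: title -> category
    segment            -- hypothetical callable: title -> list of words
    """
    category = recognize_category(title)      # recognize the category
    keyword_set = library.get(category, {})   # pick the matching key word set
    # Segment the title and keep only the words that match a key word.
    return [(w, keyword_set[w]) for w in segment(title) if w in keyword_set]
```

Restricting the lookup to the key word set of the recognized category keeps the match small and avoids picking up key words that belong to unrelated merchandise.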

298. The memory of any one of claims 290 to 295, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
where, among the key words tagged as the modifier words, there are plural key words whose lexical scopes intersect, keeping only one key word in the intersection;
where, among the key words tagged as the modifier words, there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, keeping only the key word having the largest lexical scope;
where a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, removing the key word corresponding to the modifier word;
defining remaining key words as the effective key words; and
stitching the remaining key words into the merchandise short-title according to locational sequence.

299. The memory of claim 291, further comprising:
matching different original merchandise title data with the word library;
performing parallel processing; and
outputting plural corresponding merchandise short-titles.
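
One possible way to realize the parallel processing recited above is a worker pool, sketched here with Python's standard `concurrent.futures` module. The claims do not specify any particular mechanism, so the thread pool, the function name `generate_short_titles`, and the per-title callable `generate_one` are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_short_titles(titles, generate_one, max_workers=4):
    """Apply a per-title short-title generator to many titles in parallel.

    generate_one -- hypothetical callable: original title -> short title
    Results are returned in the same order as the input titles.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order.
        return list(pool.map(generate_one, titles))
```

A process pool may be the better choice if the per-title work is CPU-bound; the thread pool is used here only to keep the sketch short.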

300. The memory of claim 291, wherein the search term data represent a collection of search terms input by a user for searching for a merchandise item.

301. The memory of any one of claims 290 to 300, wherein data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data.

302. The memory of any one of claims 290 to 301, wherein crawling the merchandise title data comprises crawling the merchandise short-titles from major e-commerce platforms.

303. The memory of any one of claims 290 to 302, wherein the collected search term data, being search terms used for searching for the merchandise items, include query data.

304. The memory of any one of claims 290 to 303, wherein irrelevant key words are filtered out to denoise the key words, and the key words in every category group are de-duplicated to ensure every key word is unique in its group.

305. The memory of any one of claims 290 to 304, wherein key word sets are formed, each corresponding to one of the category groups.

306. The memory of any one of claims 290 to 305, wherein the machine model is first used to pre-tag numerous key words, and manual tagging verification is then performed.

307. The memory of any one of claims 290 to 306, wherein the machine model is a BiLSTM+CRF deep learning model.

308. The memory of any one of claims 290 to 307, wherein merchandise categories in the different original merchandise title data are recognized and have respective matched key word sets.

309. The memory of any one of claims 290 to 308, wherein the key words tagged as the category words in the original merchandise title data are processed first.

310. The memory of any one of claims 290 to 309, wherein, where there are different original merchandise title data, the processes are repeated as many times as required.
