CA3217669A1 - Commodity short title generation method and apparatus - Google Patents

Commodity short title generation method and apparatus

Info

Publication number
CA3217669A1
CA3217669A1 CA3217669A CA3217669A CA3217669A1 CA 3217669 A1 CA3217669 A1 CA 3217669A1 CA 3217669 A CA3217669 A CA 3217669A CA 3217669 A CA3217669 A CA 3217669A CA 3217669 A1 CA3217669 A1 CA 3217669A1
Authority
CA
Canada
Prior art keywords
word
words
key
merchandise
key words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CA3217669A
Other languages
French (fr)
Inventor
Bin Zhu
Yi SHEN
Kang QI
Heqiang NI
Shu Chen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
10353744 Canada Ltd
Original Assignee
10353744 Canada Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 10353744 Canada Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Abstract

An apparatus for matching words in a merchandise title library is provided, including a word matching unit that performs word segmentation on original merchandise title data to obtain plural title words, matches each of the title words with keywords in a word library, and outputs the keywords that have matches; a processing unit that sieves out at least two effective keywords from the plural keywords and stitches the effective keywords into merchandise short-titles; a data collecting unit that crawls merchandise title data and collects search term data to construct a data set that includes the original merchandise title data; a word library unit that categorizes entries in the data set by merchandise categories and extracts the keywords to construct the word library; and a word tagging unit that tags each keyword in the word library as a modifier or category word, according to the keyword's part of speech.

Description

COMMODITY SHORT TITLE GENERATION METHOD AND APPARATUS
BACKGROUND OF THE INVENTION
Technical Field
[0001] The present invention relates to the technical field of text abstracting, and more particularly to a method and an apparatus for generating merchandise short-titles.
Description of Related Art
[0002] Merchandise short-titles are generally formed by compressing the standard-length titles of merchandise items. As implied in the name, short-titles are simple, concise, and short. The purpose of short-titles is to describe key information of merchandise items with the fewest possible words so that users can get such key information at a glance. An example of a short-title is "Korean-cutting all-over print dress." This can be regarded as a special text abstracting technology in the sense of natural language processing.
[0003] Traditional text abstracting techniques, such as TextRank and Lead-3, abstract sentences from articles and are not well suited to generating merchandise titles. With the rapid development of deep learning, various deep-learning models, such as seq2seq and pointer-generator networks, can be used to generate compressed short-titles. However, without a sufficient short-title training corpus, these models are not applicable in practice, particularly for generation of merchandise titles.
SUMMARY OF THE INVENTION
[0004] The objective of the present invention is to provide a method and an apparatus for generating merchandise short-titles, which can generate merchandise short-titles with improved efficiency and precision.
[0005] For achieving the foregoing objective, in a first aspect, the present invention provides a method for generating a merchandise short-title, which comprises:
[0006] crawling merchandise title data and/or collecting search term data, so as to construct a corpus data set;
Date Recue/Date Received 2023-10-25
[0007] based on a merchandise category table, categorizing corpuses in the corpus data set by merchandise categories, and then extracting key words to construct a word library;
[0008] tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word;
[0009] performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches; and
[0010] sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech.
[0011] Preferably, the step of categorizing corpuses in the corpus data set by merchandise categories based on a merchandise category table, and then extracting key words to construct a word library, comprises:
[0012] based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
[0013] performing word segmentation on the corpuses, respectively, so as to obtain the plural key words, and de-duplicating and then filtering the key words in every merchandise category so as to obtain key word sets each corresponding to one of the merchandise categories; and
[0014] uniting the plural key word sets to form the word library.
[0015] More preferably, the step of tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word comprises:
[0016] extracting the key words that are the modifier words or the category words from the word library by means of manual tagging and tagging the corresponding parts of speech; and/or
[0017] extracting the key words that are the modifier words or the category words from the word library and tagging the corresponding parts of speech using a machine tagging model.
[0018] Further, after the step of extracting the key words that are the modifier words or the category words from the word library by means of manual tagging and tagging the corresponding parts of speech, the method further comprises:
[0019] crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library;
[0020] if a number of the key words that have matches is smaller than a threshold, adding the key words in the new merchandise title data into the corresponding key word sets, and tagging the newly added key words for their parts of speech; or
[0021] if the number of the key words that have matches is greater than the threshold, crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library again.
[0022] Preferably, after the step of extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging the corresponding parts of speech using a machine tagging model, the method further comprises:
[0023] based on a semantic recognition technology in the machine model, extracting the key words that are the modifier words or the category words from the newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.
[0024] Preferably, the step of performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches comprises:
[0025] recognizing the merchandise categories in the original merchandise title data, and matching them with the corresponding key word sets; and
[0026] segmenting the original merchandise title data into the plural title words, matching each of the title words with the key words in the corresponding key word set, and sieving out the key words that have matches.
[0027] Preferably, the step of sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:

[0028] recording location information of each of the key words in the original merchandise title data;
[0029] if, among the key words tagged as the modifier words, plural said key words have intersecting lexical scopes, only one said key word in the intersection is kept;
[0030] if, among the key words tagged as the modifier words, the lexical scope of one said key word contains the lexical scope of another said key word, only the key word that has the largest lexical scope is kept;
[0031] if any key word tagged as the category word has a word sense that contains the word sense of a key word tagged as the modifier word, that modifier key word is removed; and
[0032] defining the remaining key words as the effective key words, and stitching them into the merchandise short-title according to their locational sequence.
[0033] Optionally, different original merchandise title data are matched with the word library, respectively, and processed in parallel to output plural corresponding merchandise short-titles.
[0034] Exemplarily, the search term data represent a collection of search terms to be input by a user for searching for a merchandise item.
[0035] As compared to the prior art, the method for generating merchandise short-titles of the present invention provides the following beneficial effects:
[0036] In the method for generating merchandise short-titles according to the present invention, a corpus data set is first constructed. Then, based on the merchandise category table, corpuses in the corpus data set are categorized. From the categorized corpuses, key words are extracted to form a word library. Every key word in the word library is tagged as a modifier word or a category word according to its part of speech. The word library is thus established. Afterward, the original merchandise title data to be compressed are acquired. The original merchandise title data are segmented to obtain plural title words. These title words are matched against the key words in the word library. From the key words that have matches, at least two effective key words are sieved out and stitched into a merchandise short-title according to the order of their parts of speech.
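The overall flow described above can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the tiny `WORD_LIBRARY`, the whitespace-based `segment` function (a real system would use a proper word segmenter), and the choice to stitch matches in the order they appear in the title.

```python
from typing import Dict, List

# Hypothetical word library: key word -> tag ("modifier" or "category").
WORD_LIBRARY: Dict[str, str] = {
    "korean-cut": "modifier",
    "all-over-print": "modifier",
    "dress": "category",
}


def segment(title: str) -> List[str]:
    """Stand-in segmenter (assumption): lowercase and split on whitespace."""
    return title.lower().split()


def generate_short_title(title: str, library: Dict[str, str]) -> str:
    """Match title words against the library and stitch the matches."""
    matched = [word for word in segment(title) if word in library]
    # The method calls for at least two effective key words.
    if len(matched) < 2:
        return ""
    return " ".join(matched)


short = generate_short_title(
    "2023 new summer Korean-cut all-over-print dress free shipping",
    WORD_LIBRARY,
)
print(short)
```

Titles that yield fewer than two matched key words return an empty string here; how such titles should be handled is not stated in the text, so this is only one possible choice.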

[0037] It is thus clear that the present invention categorizes corpuses before tagging them, thereby effectively reducing the difficulty of the tagging process and tagging key words more efficiently. By segmenting the original merchandise title data and directly matching the data with the key words in the word library, the sieved and stitched merchandise short-title is more precise.
[0038] In another aspect, the present invention provides an apparatus for generating merchandise short-titles, which applies the method for generating merchandise short-titles as described above. The apparatus comprises:
[0039] a data collecting unit, for crawling merchandise title data and/or collecting search term data, so as to construct a corpus data set;
[0040] a word library unit, for categorizing corpuses in the corpus data set by merchandise categories based on a merchandise category table, and then extracting key words to construct a word library;
[0041] a word tagging unit, for tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word;
[0042] a word matching unit, for performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches; and
[0043] a processing unit, for sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech.
[0044] As compared to the prior art, the disclosed apparatus for generating merchandise short-titles provides beneficial effects that are similar to those provided by the method for generating merchandise short-titles as enumerated above, and thus no repetitions are made herein.
[0045] In a third aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored. When run by a processor, the computer program executes the steps of the method for generating merchandise short-titles as described above.
[0046] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the method for generating merchandise short-titles as enumerated above, and thus no repetitions are made herein.
BRIEF DESCRIPTION OF THE DRAWINGS
[0047] The accompanying drawing is provided herein for better understanding of the present invention and forms a part of this disclosure. The illustrative embodiments and their descriptions are for explaining the present invention and by no means form any improper limitation to the present invention, wherein:
[0048] FIG. 1 is a flowchart of a method for generating merchandise short-titles according to a first embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
[0049] To make the foregoing objectives, features, and advantages of the present invention clearer and more understandable, the following description details the technical schemes of some embodiments with reference to the accompanying drawings. It is, however, to be understood that the embodiments referred to herein are only a part of all possible embodiments and thus not exhaustive. All other embodiments conceived on the basis of the embodiments of the present invention by people of ordinary skill in the art without creative labor shall be encompassed in the scope of the present invention.
[0050] Embodiment 1
[0051] Referring to FIG. 1, the present embodiment provides a method for generating a merchandise short-title, comprising:
[0052] crawling merchandise title data and/or collecting search term data, so as to construct a corpus data set; based on a merchandise category table, categorizing corpuses in the corpus data set by merchandise categories, and then extracting key words to construct a word library; tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word; performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches; and sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech.
[0053] In the method for generating merchandise short-titles according to the present embodiment, a corpus data set is first constructed. Then, based on the merchandise category table, corpuses in the corpus data set are categorized. From the categorized corpuses, key words are extracted to form a word library. Every key word in the word library is tagged as a modifier word or a category word according to its part of speech. The word library is thus established. Afterward, the original merchandise title data to be compressed are acquired. The original merchandise title data are segmented to obtain plural title words. These title words are matched against the key words in the word library. From the key words that have matches, at least two effective key words are sieved out and stitched into a merchandise short-title according to the order of their parts of speech.
[0054] It is thus clear that the present invention categorizes corpuses before tagging them, thereby effectively reducing the difficulty of the tagging process and tagging key words more efficiently. By segmenting the original merchandise title data and directly matching the data with the key words in the word library, the sieved and stitched merchandise short-title is more precise.
[0055] It is to be noted that the data of the corpus data sets are obtained by crawling the merchandise title data and collecting the search term data. For crawling the merchandise title data, it is important to crawl merchandise short-titles from major e-commerce platforms. For collecting the search term data, search terms used for searching for various merchandise items, namely query data, are gathered.
[0056] In the embodiment, the step of categorizing corpuses in the corpus data set by merchandise categories based on a merchandise category table, and then extracting key words to construct a word library, comprises:
[0057] based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories; performing word segmentation on the corpuses, respectively, so as to obtain the plural key words, and de-duplicating and then filtering the key words in every merchandise category so as to obtain key word sets each corresponding to one of the merchandise categories; and uniting the plural key word sets to form the word library.
[0058] Since tagging corpuses directly represents a prodigious workload, to reduce the difficulty and improve the efficiency of the tagging task, it is desirable to categorize corpuses in the corpus data set according to a merchandise category table (e.g., a quaternary merchandise group). For example, the categories may include a clothes corpus group, a pants corpus group, a mobile phone corpus group, etc. Then the categorized corpuses are segmented so that every category group is formed by plural key words. Irrelevant key words are filtered out (denoising), and the key words in every category group are de-duplicated, so as to ensure every key word is unique in its group. Eventually, key word sets are formed, each corresponding to a category group. By uniting all the key word sets, the word library is formed.
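The construction of the word library (categorize, segment, filter, de-duplicate, unite) might be sketched as follows. This is a minimal illustration under stated assumptions: each corpus entry is assumed to already carry a category label taken from the merchandise category table, `NOISE_WORDS` is a hypothetical denoising list, and whitespace splitting stands in for a real word segmenter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical list of irrelevant words removed during denoising.
NOISE_WORDS: Set[str] = {"new", "2023", "free", "shipping", "hot"}


def segment(text: str) -> List[str]:
    """Stand-in segmenter (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_word_library(corpus: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build one key word set per category from (category, text) pairs."""
    key_word_sets: Dict[str, Set[str]] = defaultdict(set)
    for category, text in corpus:
        for word in segment(text):
            if word not in NOISE_WORDS:
                # Using a set de-duplicates key words within the category.
                key_word_sets[category].add(word)
    # The union of all per-category sets is the word library.
    return dict(key_word_sets)


corpus = [
    ("clothes", "new floral dress"),
    ("clothes", "floral chiffon dress free shipping"),
    ("mobile phone", "5g smartphone 2023"),
]
library = build_word_library(corpus)
print(library)
```

Keeping the sets separate per category, rather than merging them into one flat vocabulary, lets later matching be restricted to the key word set of the recognized category.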
[0059] In the embodiment, the step of tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word comprises:
[0060] extracting the key words that are the modifier words or the category words from the word library by means of manual tagging and tagging the corresponding parts of speech; and/or extracting the key words that are the modifier words or the category words from the word library and tagging the corresponding parts of speech using a machine tagging model.
[0061] As implied in the name, manual tagging refers to manually determining whether a key word in the word library is a modifier word or a category word, and manually tagging the key word. In contrast, a machine tagging model implements automatic recognition and tagging techniques. When the number of key words in the word library is huge, such a machine model is effective in improving tagging efficiency. However, as demonstrated in practice, while a machine model provides high efficiency, its tagging results are less precise than those from manual operation. Therefore, it is preferred to combine the two solutions for tagging key words in the word library. For example, a machine model is first used to pre-tag numerous key words, and then manual verification is performed, so as to balance efficiency and precision of key-word tagging.
[0062] After the step of extracting the key words that are the modifier words or the category words from the word library by means of manual tagging and tagging the corresponding parts of speech, the method further comprises:
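The combination of machine pre-tagging followed by manual verification could look roughly like the sketch below. The `machine_pretag` function is a hypothetical stand-in for a real tagging model, and `manual_overrides` represents corrections entered by a human reviewer; neither name comes from the source.

```python
from typing import Dict, Iterable


def machine_pretag(word: str) -> str:
    """Hypothetical stand-in for a machine tagging model."""
    known_category_words = {"dress", "pants", "phone"}
    return "category" if word in known_category_words else "modifier"


def tag_library(words: Iterable[str],
                manual_overrides: Dict[str, str]) -> Dict[str, str]:
    """Pre-tag every word by machine, then apply manual corrections."""
    tags = {word: machine_pretag(word) for word in words}
    for word, tag in manual_overrides.items():
        if word in tags:  # only correct words that are in the library
            tags[word] = tag
    return tags


# The stand-in model mislabels "skirt"; the reviewer corrects it.
tags = tag_library(["floral", "dress", "skirt"], {"skirt": "category"})
print(tags)
```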
[0063] crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library; if a number of the key words that have matches is smaller than a threshold, adding the key words in the new merchandise title data into the corresponding key word sets, and tagging the newly added key words for their parts of speech; or if the number of the key words that have matches is greater than the threshold, crawling new merchandise title data, performing word segmentation thereon, and matching resulting words with the key words in the word library again.
[0064] The objective of the embodiment is to increase word sources for the word library. By continually acquiring new merchandise title data, the robustness of the key words in the word library can be evaluated. Specifically, word segmentation is performed on the merchandise title data, and the results are filtered so that only those key words whose parts of speech are identified as modifier words or category words are kept. When the number of the remaining key words that match the key words in the word library is smaller than a threshold, it indicates that the key words in the word library are not robust enough. In this case, the key words in the merchandise title data that do not have matches are added to the corresponding key word sets, and the newly added key words are tagged with their parts of speech. Conversely, if the number of the remaining key words that match the key words in the word library is greater than the threshold, it indicates that the collection of key words in the word library is adequate for the current merchandise title data. Thus, a user can continue to crawl new merchandise title data and repeat the foregoing process to continuously assess the word library. Exemplarily, the threshold is 3.
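The threshold check can be sketched as below. The text covers the "smaller than" and "greater than" cases only; treating a match count equal to the threshold as adequate is an assumption of this sketch, as are the function and variable names. The input words are assumed to be already filtered to modifier and category candidates, and newly added words would still need to be tagged.

```python
from typing import List, Set

THRESHOLD = 3  # the exemplary value given in the text


def update_key_word_set(title_words: List[str], key_words: Set[str]) -> List[str]:
    """Add unmatched title words when too few words match the key word set.

    Returns the newly added words (empty if the set is judged adequate).
    """
    matched = [word for word in title_words if word in key_words]
    if len(matched) >= THRESHOLD:  # assumption: equality counts as adequate
        return []
    # dict.fromkeys removes repeated words while preserving title order.
    added = [w for w in dict.fromkeys(title_words) if w not in key_words]
    key_words.update(added)
    return added


key_words = {"floral", "dress", "chiffon", "summer"}
first = update_key_word_set(["floral", "chiffon", "dress"], key_words)
second = update_key_word_set(["floral", "pleated", "midi", "dress"], key_words)
print(first, second)
```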
[0065] After the step of extracting the key words that are the modifier words or the category words from the word library and tagging the corresponding parts of speech using a machine tagging model, the method further comprises:
[0066] based on a semantic recognition technology in the machine model, extracting the key words that are the modifier words or the category words from the newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.
[0067] Optionally, the machine model may be a BiLSTM+CRF deep learning model. By using such a deep learning model to extract the key words that are modifier words or category words from the newly crawled merchandise title data, tagging the key words and adding them into the corresponding key word sets, the deep learning model demonstrates great adaptivity and can automatically recognize category words and modifiers in the merchandise title according to contextual information.
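Sequence-labeling models of this kind commonly emit one label per token. The source does not state the label scheme, so the sketch below assumes a BIO scheme with hypothetical labels `B-MOD`/`I-MOD` for modifier words and `B-CAT`/`I-CAT` for category words, and shows only how such labels could be decoded into tagged key words. The model itself is not implemented here.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-labeled tokens into (key word, tag) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    tag: Optional[str] = None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            if current:
                spans.append((" ".join(current), tag))
            current, tag = [token], label[2:]
        elif label.startswith("I-") and current and label[2:] == tag:
            current.append(token)  # continue the open span
        else:
            # "O" or an inconsistent "I-" label closes any open span.
            if current:
                spans.append((" ".join(current), tag))
            current, tag = [], None
    if current:
        spans.append((" ".join(current), tag))
    return spans


tokens = ["new", "all-over", "print", "dress"]
labels = ["O", "B-MOD", "I-MOD", "B-CAT"]  # hypothetical model output
key_words = decode_bio(tokens, labels)
print(key_words)
```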
[0068] Further, in the embodiment, the step of performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches comprises:
[0069] recognizing the merchandise categories in the original merchandise title data, and matching them with the corresponding key word sets; and segmenting the original merchandise title data into the plural title words, matching each of the title words with the key words in the corresponding key word set, and sieving out the key words that have matches.
[0070] Preferably, multiple different original merchandise title data may be acquired at the same time and matched with the word library, respectively. Then parallel processing is performed to output plural merchandise short-titles.
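Batch processing of several titles in parallel might be sketched with Python's standard `concurrent.futures` module. The `KEY_WORDS` set and the simplified `short_title` function are assumptions for illustration and omit the sieving rules.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Set

# Hypothetical key word set shared by all worker threads (read-only).
KEY_WORDS: Set[str] = {"floral", "dress", "slim", "jeans"}


def short_title(title: str) -> str:
    """Simplified short-title generation: keep matched words in title order."""
    return " ".join(w for w in title.lower().split() if w in KEY_WORDS)


def batch_short_titles(titles: List[str], workers: int = 4) -> List[str]:
    """Process several titles concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(short_title, titles))


results = batch_short_titles(["New floral dress sale", "Mens slim jeans 2023"])
print(results)
```

For CPU-bound matching in CPython, a `ProcessPoolExecutor` may scale better than threads; a thread pool is used here only to keep the sketch simple.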
[0071] In practical implementations, merchandise categories in different original merchandise title data can be recognized at the same time and have respective matched key word sets. The original merchandise title data are segmented into plural title words. Then each of the title words is matched with the key words in the corresponding key word set, and the key words that have matches in the original merchandise title data are sieved out.
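Category-specific matching could be sketched as follows. How the category is recognized is not specified in the text; this sketch assumes a simple lookup of category words (`CATEGORY_WORDS`, hypothetical) and whitespace segmentation.

```python
from typing import Dict, List, Optional, Set

# Hypothetical key word sets, one per merchandise category.
KEY_WORD_SETS: Dict[str, Set[str]] = {
    "clothes": {"floral", "chiffon", "dress"},
    "mobile phone": {"5g", "dual-sim", "smartphone"},
}
# Hypothetical lookup from a category word to its merchandise category.
CATEGORY_WORDS: Dict[str, str] = {"dress": "clothes", "smartphone": "mobile phone"}


def recognize_category(words: List[str]) -> Optional[str]:
    """Return the category of the first category word found, if any."""
    for word in words:
        if word in CATEGORY_WORDS:
            return CATEGORY_WORDS[word]
    return None


def match_title(title: str) -> List[str]:
    """Match title words only against the key word set of the title's category."""
    words = title.lower().split()
    category = recognize_category(words)
    if category is None:
        return []
    key_words = KEY_WORD_SETS[category]
    return [word for word in words if word in key_words]


print(match_title("Floral 5G dress"))
```

In this example "5g" is not output because the title is recognized as clothes, so only the clothes key word set is consulted.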
[0072] Further, in the embodiment, the step of sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
[0073] recording location information of each of the key words in the original merchandise title data; if, among the key words tagged as the modifier words, plural said key words have intersecting lexical scopes, only one said key word in the intersection is kept; if, among the key words tagged as the modifier words, the lexical scope of one said key word contains the lexical scope of another said key word, only the key word that has the largest lexical scope is kept; if any key word tagged as the category word has a word sense that contains the word sense of a key word tagged as the modifier word, that modifier key word is removed; and defining the remaining key words as the effective key words, and stitching them into the merchandise short-title according to their locational sequence. In practical implementations, the key words tagged as the category words in the original merchandise title data are processed first.
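The sieving rules might be sketched as follows. This is one interpretation, not the definitive method: "lexical scope" is assumed to mean the character span a key word occupies in the title, "word sense containing" is approximated by span containment, and when modifier spans overlap the longer one is kept (the earlier one on ties). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Match:
    word: str
    tag: str    # "modifier" or "category"
    start: int  # character offset of the key word in the original title

    @property
    def end(self) -> int:
        return self.start + len(self.word)


def find_matches(title: str, library: Dict[str, str]) -> List[Match]:
    """Record the location of each library key word found in the title."""
    found = []
    for word, tag in library.items():
        pos = title.find(word)  # first occurrence only, for brevity
        if pos != -1:
            found.append(Match(word, tag, pos))
    return found


def overlaps(a: Match, b: Match) -> bool:
    return a.start < b.end and b.start < a.end


def contains(outer: Match, inner: Match) -> bool:
    return outer.start <= inner.start and inner.end <= outer.end


def sieve(matches: List[Match]) -> List[Match]:
    """Apply the sieving rules and return effective key words in title order."""
    categories = [m for m in matches if m.tag == "category"]
    modifiers = [m for m in matches if m.tag == "modifier"]
    # Category words are processed first: drop modifiers they already cover.
    modifiers = [m for m in modifiers
                 if not any(contains(c, m) for c in categories)]
    # Visit longer modifiers first so the largest scope wins on overlap.
    kept: List[Match] = []
    for m in sorted(modifiers, key=lambda m: (-len(m.word), m.start)):
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(categories + kept, key=lambda m: m.start)


def stitch(title: str, library: Dict[str, str]) -> str:
    return " ".join(m.word for m in sieve(find_matches(title, library)))


library = {
    "all-over print": "modifier",
    "print": "modifier",
    "floral": "modifier",
    "floral dress": "category",
}
print(stitch("2023 new all-over print floral dress sale", library))
```

Here "print" is dropped because it lies inside "all-over print", and "floral" is dropped because the category word "floral dress" already covers it. Substring search can also match inside unrelated words, so a real system would match on segmented words instead.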
[0074] It is understandable that, according to the word count of the merchandise short-title, modifier key words and category key words satisfying preset criteria can be found and then they can be stitched together according to their locational sequence, so as to form a fluent merchandise short-title. The described embodiment is for explaining how to generate a merchandise short-title from original merchandise title data. If there are different original merchandise title data, the foregoing process may be repeated as many times as required, thereby facilitating batch generation of merchandise short-titles.
[0075] Embodiment 2
[0076] The present embodiment provides an apparatus for generating merchandise short-titles, comprising:
[0077] a data collecting unit, for crawling merchandise title data and/or collecting search term data, so as to construct a corpus data set;
[0078] a word library unit, for categorizing corpuses in the corpus data set by merchandise categories based on a merchandise category table, and then extracting key words to construct a word library;
[0079] a word tagging unit, for tagging each key word in the word library as either a modifier word or a category word according to a part of speech of the word;
[0080] a word matching unit, for performing word segmentation on the original merchandise title data so as to obtain plural title words, matching each of the title words with the key words in the word library, respectively, and outputting the key words that have matches; and
[0081] a processing unit, for sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech.
[0082] As compared to the prior art, the disclosed apparatus for generating merchandise short-titles provides beneficial effects that are similar to those provided by the disclosed smart method for generating merchandise short-titles as enumerated above, and thus no repetitions are made herein.
[0083] Embodiment 3
[0084] The present embodiment provides a computer-readable storage medium, in which a computer program is stored. When run by a processor, the computer program executes the steps of the method for generating merchandise short-titles as described previously.
[0085] As compared to the prior art, the disclosed computer-readable storage medium provides beneficial effects that are similar to those provided by the disclosed smart method for generating merchandise short-titles as enumerated above, and thus no repetitions are made herein.
[0086] As will be appreciated by people of ordinary skill in the art, implementation of all or a part of the steps of the method of the present invention as described previously may be realized by having a program instruct related hardware components. The program may be stored in a computer-readable storage medium, and the program performs the individual steps of the methods described in the foregoing embodiments. The storage medium may be a ROM/RAM, a hard drive, an optical disk, a memory card or the like.
[0087] The present invention has been described with reference to the preferred embodiments and it is understood that the embodiments are not intended to limit the scope of the present invention. Moreover, as the contents disclosed herein should be readily understood and can be implemented by a person skilled in the art, all equivalent changes or modifications which do not depart from the concept of the present invention should be encompassed by the appended claims. Hence, the scope of the present invention shall only be defined by the appended claims.


Claims (20)

1. An apparatus comprising:
a word matching unit, configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with key words in a word library;
output the key words with matches;
a processing unit, configured to:
sieve out at least two effective key words from plural key words;
stitch the effective key words into merchandise short-title according to their parts of speech;
a data collecting unit, configured to crawl merchandise title data and collect search term data, to construct a corpus data set, wherein the corpus data set includes the original merchandise title data;
a word library unit, configured to:
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract the key words to construct the word library; and
a word tagging unit, configured to tag each key word in the word library as a modifier word or a category word according to a part of speech of the word.
2. The apparatus of claim 1, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and
uniting the key word sets to form the word library.
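The library construction in claim 2 can be sketched as follows. This is only an illustration under stated assumptions: each corpus entry is taken to be a `(category_code, text)` pair, `category_table` is a hypothetical mapping from codes to category names, segmentation is a whitespace split, and "filtering" is approximated by dropping stop words and single-character tokens.

```python
def build_word_library(entries, category_table, stop_words=frozenset()):
    """Group corpus entries by category, then de-duplicate and filter their words.

    entries: iterable of (category_code, text) pairs (assumed input shape).
    category_table: dict mapping category_code -> category name (hypothetical).
    Returns (key_word_sets, library), where library is the union of all sets.
    """
    key_word_sets = {}
    for code, text in entries:
        category = category_table.get(code)
        if category is None:
            continue  # entry does not belong to any known merchandise category
        words = text.lower().split()  # stand-in for word segmentation
        # A set de-duplicates; the condition is the assumed filtering rule.
        kept = {w for w in words if w not in stop_words and len(w) > 1}
        key_word_sets.setdefault(category, set()).update(kept)
    library = set().union(*key_word_sets.values())
    return key_word_sets, library
```

Keeping one key word set per category, rather than only the united library, lets a later matching step restrict itself to the set for the recognized category.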
3. The apparatus of claim 2, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.
4. The apparatus of claim 2, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.
5. The apparatus of any one of claims 3 to 4, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging the newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching the resulting words with the key words in the word library; and
wherein, based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from the newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.
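The threshold test in claim 5 might look like the sketch below. The function name, the `tag_word` callback (a stand-in for manual tagging or a machine tagging model that returns `"modifier"`, `"category"`, or `None`), and the treatment of a match count exactly equal to the threshold are assumptions; the claim only addresses the "smaller than" and "greater than" cases.

```python
def update_key_word_set(new_title, key_word_set, tags, threshold, tag_word):
    """Add unseen words from a new title when the library covers it poorly.

    key_word_set: set of known key words for the category (mutated in place).
    tags: dict mapping key word -> "modifier" or "category" (mutated in place).
    tag_word: hypothetical callable returning a tag, or None to reject a word.
    Returns the list of newly added key words.
    """
    words = new_title.lower().split()  # stand-in for word segmentation
    matched = [w for w in words if w in key_word_set]
    if len(matched) >= threshold:
        # Assumption: coverage is good enough, so move on to the next title.
        return []
    added = []
    for w in words:
        if w in key_word_set:
            continue
        tag = tag_word(w)
        if tag is not None:
            key_word_set.add(w)
            tags[w] = tag
            added.append(w)
    return added
```

In this sketch a low match count is treated as a signal that the title contains vocabulary the library has not seen, which is when new key words are tagged and added.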
6. The apparatus of any one of claims 3 to 5, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and
sieving out the key words with matches.
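To make the category-scoped matching of claim 6 concrete, here is a small Python sketch. The recognizer shown, which picks the category whose key word set overlaps most with the title, is a hypothetical stand-in; the claim does not say how the merchandise category is recognized, and the whitespace split again stands in for word segmentation.

```python
def recognize_category(title_words, key_word_sets):
    """Hypothetical recognizer: category whose key word set overlaps most with the title."""
    best, best_overlap = None, 0
    for category, key_words in key_word_sets.items():
        overlap = len(key_words.intersection(title_words))
        if overlap > best_overlap:
            best, best_overlap = category, overlap
    return best


def match_in_category(title, key_word_sets):
    """Return (category, matched key words) using only that category's key word set."""
    title_words = title.lower().split()
    category = recognize_category(title_words, key_word_sets)
    key_word_set = key_word_sets.get(category, set())
    return category, [w for w in title_words if w in key_word_set]
```

Restricting the lookup to one key word set keeps words that are only meaningful in another category from leaking into the short title.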
7. The apparatus of any of claims 1 to 6, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, when among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, when among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, when a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words;
stitching the remaining key words into the merchandise short-title according to locational sequence;
matching different original merchandise title data with the word library;
performing parallel processing; and
outputting plural corresponding merchandise short-titles.
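The sieving rules of claim 7 can be sketched in Python as below. Several points are assumptions rather than anything the claim specifies: a "lexical scope" is modeled as a character span in the original title, the survivor among intersecting modifier words is chosen as the longest one (earliest on ties), and "word sense containing word sense" is approximated by a substring test on the key word text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyWord:
    text: str
    tag: str    # "modifier" or "category"
    start: int  # character offset in the original title (inclusive)
    end: int    # character offset in the original title (exclusive)


def overlaps(a, b):
    """True when the two character spans share at least one position."""
    return a.start < b.end and b.start < a.end


def effective_key_words(key_words):
    categories = [k for k in key_words if k.tag == "category"]
    # Visit longer modifiers first so a containing span beats a contained one.
    modifiers = sorted(
        (k for k in key_words if k.tag == "modifier"),
        key=lambda k: (-(k.end - k.start), k.start),
    )
    kept = []
    for m in modifiers:
        if any(overlaps(m, other) for other in kept):
            continue  # intersects or is contained in a modifier already kept
        if any(m.text in c.text for c in categories):
            continue  # assumed test for "category sense contains modifier sense"
        kept.append(m)
    # Stitching follows the locational sequence in the original title.
    return sorted(kept + categories, key=lambda k: k.start)


def stitch_by_location(key_words):
    return " ".join(k.text for k in effective_key_words(key_words))
```

With the title "stainless steel water bottle", the modifier "stainless" is dropped in favor of "stainless steel", and the modifier "water" is dropped because the category word "water bottle" already covers it.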
8. A method comprising:
performing word segmentation on original merchandise title data to obtain plural title words;
matching each of the title words with key words in a word library;
outputting the key words with matches;
sieving out at least two effective key words from plural key words;
stitching the effective key words into a merchandise short-title according to their parts of speech;
crawling merchandise title data and collecting search term data, to construct a corpus data set, wherein the corpus data set includes the original merchandise title data;
based on a merchandise category table, categorizing corpuses in the corpus data set by merchandise categories;
extracting the key words to construct the word library; and
tagging each key word in the word library as a modifier word or a category word according to a part of speech of the key word.
9. The method of claim 8, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and
uniting the key word sets to form the word library.
10. The method of claim 9, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.
11. The method of claim 9, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.
12. The method of any one of claims 10 to 11, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging the newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching the resulting words with the key words in the word library; and
wherein, based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from the newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.
13. The method of any one of claims 10 to 12, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and
sieving out the key words with matches.
14. The method of any of claims 8 to 13, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, when among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, when among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, when a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words;
stitching the remaining key words into the merchandise short-title according to locational sequence;
matching different original merchandise title data with the word library;
performing parallel processing; and
outputting plural corresponding merchandise short-titles.
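The parallel processing step at the end of claim 14 could be realized in many ways. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor` purely as one possible choice; the `shorten` helper is a deliberately simplified, hypothetical stand-in for the full matching and sieving flow, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def shorten(title, library):
    """Simplified stand-in: keep title words found in the library, in title order."""
    words = [w for w in title.lower().split() if w in library]
    return " ".join(words) if len(words) >= 2 else None


def shorten_many(titles, library, max_workers=4):
    """Process several original titles concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: shorten(t, library), titles))
```

`Executor.map` returns results in the order of the inputs, so each short title lines up with its original title. For CPU-bound matching in CPython, a `ProcessPoolExecutor` with a module-level worker function may be the better fit.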
15. A computer readable physical memory having stored thereon a computer program executed by a computer configured to:
perform word segmentation on original merchandise title data to obtain plural title words;
match each of the title words with key words in a word library;
output the key words with matches;
sieve out at least two effective key words from plural key words;
stitch the effective key words into a merchandise short-title according to their parts of speech;
crawl merchandise title data and collect search term data, to construct a corpus data set, wherein the corpus data set includes the original merchandise title data;
based on a merchandise category table, categorize corpuses in the corpus data set by merchandise categories;
extract the key words to construct the word library; and
tag each key word in the word library as a modifier word or a category word according to a part of speech of the key word.
16. The memory of claim 15, wherein based on the merchandise category table, categorizing corpuses in the corpus data set by the merchandise categories, and extracting the key words to construct the word library comprises:
based on the merchandise category table, categorizing the corpuses in the corpus data set one by one according to the merchandise categories;
performing word segmentation on the corpuses, to obtain the plural key words;
de-duplicating and filtering the key words in every merchandise category to obtain key word sets each corresponding to the merchandise category; and
uniting the key word sets to form the word library; and
wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the key word comprises:
extracting the key words which are the modifier words or the category words from the word library by means of manual tagging and tagging corresponding parts of speech.
17. The memory of claim 16, wherein tagging each key word in the word library as the modifier word or the category word according to the part of speech of the word comprises:
extracting the key words that are the modifier words or the category words from the word library using a machine tagging model and tagging corresponding parts of speech.
18. The memory of any one of claims 16 to 17, further comprising:
crawling new merchandise title data;
performing word segmentation on the new merchandise title data;
matching resulting words with the key words in the word library;
wherein, when the number of the key words having matches is smaller than a threshold, adding the key words in the new merchandise title data into corresponding key word sets, and tagging the newly added key words for their parts of speech;
wherein, when the number of the key words having matches is greater than the threshold, crawling the new merchandise title data, performing word segmentation on the new merchandise title data, and matching the resulting words with the key words in the word library; and
wherein, based on a semantic recognition technology in a machine model, extracting the key words that are the modifier words or the category words from the newly crawled merchandise title data, adding them into the corresponding key word sets, and tagging the newly added key words for their corresponding parts of speech.
19. The memory of any one of claims 16 to 18, wherein performing word segmentation on the original merchandise title data to obtain the plural title words, matching each of the title words with the key words in the word library, and outputting the key words with matches comprises:
recognizing the merchandise categories in the original merchandise title data;
matching the merchandise categories with the key word sets;
segmenting the original merchandise title data into the plural title words;
matching each of the title words with the key words in the key word set; and
sieving out the key words with matches.
20. The memory of any of claims 15 to 19, wherein sieving out at least two effective key words from the plural key words, and stitching the effective key words into the merchandise short-title according to their parts of speech comprises:
recording location information of each of the key words in the original merchandise title data;
wherein, when among the key words tagged as the modifier words there are plural key words whose lexical scopes intersect, only one key word in the intersection is kept;
wherein, when among the key words tagged as the modifier words there are plural key words in which the lexical scope of one key word contains the lexical scope of another key word, only the key word having the largest lexical scope is kept;
wherein, when a key word tagged as the category word has a word sense containing the word sense of any key word tagged as the modifier word, the key word corresponding to the modifier word is removed;
defining remaining key words as the effective key words;
stitching the remaining key words into the merchandise short-title according to locational sequence;
matching different original merchandise title data with the word library;
performing parallel processing; and
outputting plural corresponding merchandise short-titles.

CA3217669A 2019-12-27 2020-08-28 Commodity short title generation method and apparatus Pending CA3217669A1 (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201911373120.5A CN111191022B (en) 2019-12-27 2019-12-27 Commodity short header generation method and device
CN201911373120.5 2019-12-27
CA3166094A CA3166094A1 (en) 2019-12-27 2020-08-28 Commodity short title generation method and apparatus

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CA3166094A Division CA3166094A1 (en) 2019-12-27 2020-08-28 Commodity short title generation method and apparatus

Publications (1)

Publication Number Publication Date
CA3217669A1 (en) 2021-07-01

Family

ID=70707711

Family Applications (3)

Application Number Title Priority Date Filing Date
CA3217669A Pending CA3217669A1 (en) 2019-12-27 2020-08-28 Commodity short title generation method and apparatus
CA3166094A Pending CA3166094A1 (en) 2019-12-27 2020-08-28 Commodity short title generation method and apparatus
CA3217721A Pending CA3217721A1 (en) 2019-12-27 2020-08-28 Commodity short title generation method and apparatus

Family Applications After (2)

Application Number Title Priority Date Filing Date
CA3166094A Pending CA3166094A1 (en) 2019-12-27 2020-08-28 Commodity short title generation method and apparatus
CA3217721A Pending CA3217721A1 (en) 2019-12-27 2020-08-28 Commodity short title generation method and apparatus

Country Status (3)

Country Link
CN (1) CN111191022B (en)
CA (3) CA3217669A1 (en)
WO (1) WO2021128914A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191022B (en) * 2019-12-27 2023-07-25 苏宁云计算有限公司 Commodity short header generation method and device
CN112446208A (en) * 2020-12-09 2021-03-05 北京有竹居网络技术有限公司 Method, device and equipment for generating advertisement title and storage medium
CN113343687B (en) * 2021-05-25 2023-09-05 北京奇艺世纪科技有限公司 Event name determining method, device, equipment and storage medium
CN113283218A (en) * 2021-06-24 2021-08-20 中国平安人寿保险股份有限公司 Semantic text compression method and computer equipment
CN113553838A (en) * 2021-08-03 2021-10-26 稿定(厦门)科技有限公司 Commodity file generation method and device
CN115169337B (en) * 2022-08-24 2023-02-14 中教畅享(北京)科技有限公司 Method for calculating keyword score in commodity title optimization
CN115470322B (en) * 2022-10-21 2023-05-05 深圳市快云科技有限公司 Keyword generation system and method based on artificial intelligence

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8489609B1 (en) * 2006-08-08 2013-07-16 CastTV Inc. Indexing multimedia web content
CN102012915A (en) * 2010-11-22 2011-04-13 百度在线网络技术(北京)有限公司 Keyword recommendation method and system for document sharing platform
CN104424296B (en) * 2013-09-02 2018-07-31 阿里巴巴集团控股有限公司 Query word sorting technique and device
CN104636334A (en) * 2013-11-06 2015-05-20 阿里巴巴集团控股有限公司 Keyword recommending method and device
CN106708813A (en) * 2015-07-14 2017-05-24 阿里巴巴集团控股有限公司 Title processing method and equipment
KR20180069813A (en) * 2015-10-16 2018-06-25 알리바바 그룹 홀딩 리미티드 Title display method and apparatus
CN108804541B (en) * 2018-05-08 2020-09-18 苏州闻道网络科技股份有限公司 Electric trademark optimization system and optimization method
CN111191022B (en) * 2019-12-27 2023-07-25 苏宁云计算有限公司 Commodity short header generation method and device

Also Published As

Publication number Publication date
WO2021128914A1 (en) 2021-07-01
CA3217721A1 (en) 2021-07-01
CN111191022A (en) 2020-05-22
CN111191022B (en) 2023-07-25
CA3166094A1 (en) 2021-07-01

Similar Documents

Publication Publication Date Title
CA3217669A1 (en) Commodity short title generation method and apparatus
CN111177569B (en) Recommendation processing method, device and equipment based on artificial intelligence
CN108959431B (en) Automatic label generation method, system, computer readable storage medium and equipment
US9588990B1 (en) Performing image similarity operations using semantic classification
Manjari et al. Extractive Text Summarization from Web pages using Selenium and TF-IDF algorithm
CN106096609B (en) A kind of merchandise query keyword automatic generation method based on OCR
CN111797239B (en) Application program classification method and device and terminal equipment
CN102609458A (en) Method and device for picture recommendation
CN102495892A (en) Webpage information extraction method
CN107844533A (en) A kind of intelligent Answer System and analysis method
CN101819583A (en) Generate domain corpus and dictionary at the robotization body
CN111324797B (en) Method and device for precisely acquiring data at high speed
Fejer et al. Automatic Arabic text summarization using clustering and keyphrase extraction
CN108228612B (en) Method and device for extracting network event keywords and emotional tendency
CN110866102A (en) Search processing method
JP2008203933A (en) Category creation method and apparatus and document classification method and apparatus
CN103034709B (en) Retrieving result reordering system and method
Bollegala et al. Extracting key phrases to disambiguate personal name queries in web search
CN109255098B (en) Matrix decomposition hash method based on reconstruction constraint
CN106294689A (en) A kind of method and apparatus selecting based on text category feature to carry out dimensionality reduction
CN116069905A (en) Image text model processing method and image text retrieval system
CN107577667B (en) Entity word processing method and device
CN115526601A (en) File management method and device
CN112115362B (en) Programming information recommendation method and device based on similar code recognition
Thamviset et al. Bottom-up region extractor for semi-structured web pages

Legal Events

Date Code Title Description
EEER Examination request

Effective date: 20231025