CN113538062A - Method for reversely deducing bid words purchased by commodity promotion notes - Google Patents

Method for reversely deducing bid words purchased by commodity promotion notes

Info

Publication number
CN113538062A
Authority
CN
China
Prior art keywords
alternative
frequency
words
word
judging whether
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110855006.7A
Other languages
Chinese (zh)
Other versions
CN113538062B (en)
Inventor
李在灼
姜豪
胡长春
郑舒丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou Guoji Information Technology Co ltd
Original Assignee
Fuzhou Guoji Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou Guoji Information Technology Co ltd
Priority to CN202110855006.7A
Publication of CN113538062A
Application granted
Publication of CN113538062B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241: Advertisements
    • G06Q30/0251: Targeted advertisements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/40: Processing or translation of natural language
    • G06F40/42: Data-driven translation
    • G06F40/44: Statistical methods, e.g. probability models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241: Advertisements
    • G06Q30/0276: Advertisement creation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241: Advertisements
    • G06Q30/0277: Online advertisement

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for reversely deducing the bid words purchased for a commodity promotion note, comprising the following steps: 1) collecting a note, its brand information and its alternative word list; 2) judging whether the alternative word list is empty; 3) when the brand information is not abnormal, decomposing the alternative words into n-gram character strings, deleting the character strings that appear only once, and judging whether the resulting character string table is empty; 4) extracting effective high-frequency phrases from the character string table and judging whether the effective high-frequency phrase list is empty; 5) initializing the alternative word list, deleting alternative words that contain other brands' information, and judging whether the alternative word list is empty; 6) calculating the score and the difference degree of each alternative word from the effective high-frequency phrases, screening out the effective alternative words, and judging whether the effective alternative word list is empty; 7) deleting from the effective alternative word list any alternative word contained in another alternative word, and outputting the five highest-scoring alternative words as the reverse-deduced bid words; 8) classifying the alternative words according to the reverse-deduced bid word list, and outputting the note, its brand information, the alternative words and their classifications.

Description

Method for reversely deducing bid words purchased by commodity promotion notes
Technical Field
The invention relates to the technical field of data processing, in particular to a method for reversely deducing bid terms purchased by a commodity promotion note.
Background
With the rise of e-commerce platforms in China, online shopping has become increasingly important in daily life and is now a major consumption channel. Platforms such as Taobao, Douyin, Kuaishou, Xigua Video and Xiaohongshu (Little Red Book), with their high conversion rates and strong marketing effect, have gradually become a new growth driver for e-commerce and content platforms, accelerating consumer conversion and bringing merchants more traffic.
At present, the Xiaohongshu app is widely popular as a discussion community built on users' reviews of real products, which has attracted many brands to join the community and win public attention by placing advertisements in Xiaohongshu notes. As this business model has matured, the Xiaohongshu platform has come to offer a bid word service to brand parties: by bidding on popular keywords (purchasing bid words), a brand party can raise the ranking of its commodity in the corresponding user search interface and so obtain more traffic and revenue. Specifically, after a brand party purchases one or more bid words for a note, the platform provides a series of alternative words (also called related bid words) associated with the purchased bid words, forming an alternative word list; while the advertisement is running, when a user searches for any alternative word in the list, the note is associated with the search and displayed preferentially in the search results. For a brand party, however, finding bid words that precisely match consumer psychology and attract a large amount of traffic requires a large lexicon and a huge, costly search over Chinese character combinations. This patent works backwards from popular notes to deduce the one or more bid words most likely to have actually been purchased for each note, thereby offering suggestions for the brand party's bid word purchases, providing data reference for its bidding-based marketing on the platform, or helping it understand the trends of competing products or of the brands and categories the platform is promoting.
Disclosure of Invention
In view of the state of the prior art, the invention aims to provide a method for reversely deducing the bid words purchased for a commodity promotion note. The bid words obtained by reverse deduction have the highest correlation with the note, their wording is more accurate and natural, and they fit the product's selling points and consumers' needs and psychology.
To achieve this purpose, the invention adopts the following technical scheme:
A method for reversely deducing bid words purchased by commodity promotion notes comprises the following steps:
1) collecting a note carrying commodity promotion information, the brand information related to the note, and the alternative word list related to the note;
2) judging whether the alternative word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, judging whether the brand information is abnormal;
3) when the brand information is not abnormal, decomposing all alternative words in the alternative word list into n-gram character strings, where n is an integer greater than or equal to 2, and deleting the character strings that appear only once to obtain a character string table; judging whether the character string table is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 4);
4) extracting effective high-frequency phrases from the character string table to obtain an effective high-frequency phrase list; judging whether the effective high-frequency phrase list is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 5);
5) initializing the alternative word list and deleting alternative words that contain other brands' information; then judging whether the alternative word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 6);
6) calculating the score and the difference degree of every alternative word in the alternative word list according to the effective high-frequency phrases in the effective high-frequency phrase list, and screening out the effective alternative words to obtain an effective alternative word list; judging whether the effective alternative word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 7);
7) deleting from the effective alternative word list the alternative words contained in other alternative words, taking the five highest-scoring alternative words as the reverse-deduced bid words most likely to have actually been purchased, outputting them and storing them in the reverse-deduced bid word list;
8) classifying all alternative words in the initial alternative word list according to the reverse-deduced bid word list, outputting the alternative words and their corresponding classifications, and at the same time outputting the note and its related brand information.
Preferably, the method for judging whether the brand information is abnormal in step 2) comprises the following steps:
2.1) judging whether the brand information is empty; if so, executing step 2.2); if not, executing step 2.3);
2.2) setting the brand to number 0, indicating that all brands are excluded;
2.3) judging whether the brand information is in an unconventional code; if so, judging whether it can be converted into the conventional code; if it can, converting it into the conventional code; if it cannot, setting the brand information to null and executing step 2.2).
Preferably, the method for decomposing all alternative words in the alternative word list into n-gram character strings in step 3) is as follows: all alternative words in the alternative word list are converted to uppercase, and each alternative word is traversed with a fixed-length window of n characters to obtain a series of phrases.
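The n-gram decomposition of step 3) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `ngram_table`, the default window sizes in `n_values`, and the choice to count every occurrence (rather than once per alternative word) are assumptions, since the text only requires n ≥ 2.

```python
from collections import Counter

def ngram_table(words, n_values=(2, 3, 4)):
    """Count fixed-length substrings over all alternative words.

    n_values is an assumed parameter; the source only states n >= 2.
    Strings that appear only once are dropped, as in step 3).
    """
    counts = Counter()
    for word in words:
        text = word.upper()  # unify case before slicing
        for n in n_values:
            # Slide a window of n characters across the word.
            for start in range(len(text) - n + 1):
                counts[text[start:start + n]] += 1
    return {gram: c for gram, c in counts.items() if c > 1}
```

An empty return value corresponds to the "character string table is empty" branch, in which case the method outputs an empty reverse-deduced bid word list.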
Preferably, the method for extracting effective high-frequency phrases from the character string table in step 4) comprises the following steps:
4.1) setting a high-frequency proportion according to the initial number of alternative words;
4.2) sorting the character string table in descending order of word frequency, selecting the high-frequency phrases according to the high-frequency proportion, and storing them in a high-frequency phrase list;
4.3) deleting from the high-frequency phrase list any high-frequency phrase that is contained in another phrase;
4.4) deleting from the high-frequency phrase list any high-frequency phrase that contains other brands' information to obtain the effective high-frequency phrases, and storing them in the effective high-frequency phrase list.
Preferably, the method for setting the high-frequency proportion according to the initial number of alternative words in step 4.1) is as follows: judging whether the number of alternative words exceeds N; if so, setting the high-frequency proportion to 25%; if not, setting it to 75%; where N is a natural number. Preferably, N is equal to 100.
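Steps 4.1) and 4.2) can be sketched as below. The function names are hypothetical, and rounding the cut-off up with `math.ceil` and breaking frequency ties alphabetically are assumptions; the text does not say how a fractional count or a tie is handled.

```python
import math

def high_freq_ratio(num_words, threshold=100):
    """Step 4.1): 25% when the word count exceeds N (here 100), else 75%."""
    return 0.25 if num_words > threshold else 0.75

def top_by_frequency(table, ratio):
    """Step 4.2): sort by descending frequency and keep the leading share.

    Ties are broken alphabetically and the cut-off is rounded up;
    both are assumptions made for this sketch.
    """
    ranked = sorted(table.items(), key=lambda kv: (-kv[1], kv[0]))
    keep = math.ceil(len(ranked) * ratio)
    return ranked[:keep]
```

A smaller share for large lists keeps the phrase list focused, while a larger share for small lists avoids discarding most of the little evidence available.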
Preferably, the method for deleting from the high-frequency phrase list the high-frequency phrases contained in other phrases in step 4.3) comprises the following steps:
4.3.1) judging whether the high-frequency phrase list is empty; if so, returning an empty list; if not, executing step 4.3.2);
4.3.2) creating a new AC automaton, adding each high-frequency phrase in the high-frequency phrase list to it by traversal, and storing the phrases in its data structure;
4.3.3) judging whether the traversal of the high-frequency phrase list is complete; if not, executing step 4.3.4); if so, executing step 4.3.5);
4.3.4) calling the AC automaton to scan the current high-frequency phrase and judging whether it contains any high-frequency phrase in the list other than itself; if so, recording the contained high-frequency phrases in a to-be-deleted list;
4.3.5) de-duplicating the to-be-deleted list, deleting its high-frequency phrases from the high-frequency phrase list, and returning the high-frequency phrase list.
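Steps 4.3.1) to 4.3.5) can be illustrated with a small pure-Python Aho-Corasick (AC) automaton. The class `AhoCorasick` and the function `remove_contained` are hypothetical names written for this sketch; the patent does not specify which AC implementation is used.

```python
from collections import deque

class AhoCorasick:
    """Minimal Aho-Corasick automaton over a list of patterns."""

    def __init__(self, patterns):
        self.goto = [{}]   # per-node transitions
        self.fail = [0]    # failure links
        self.out = [[]]    # patterns recognised at each node
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        node = 0
        for ch in pattern:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(pattern)

    def _build_failure_links(self):
        # Breadth-first, so shallower nodes are final before deeper ones.
        queue = deque(self.goto[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                fallback = self.fail[node]
                while fallback and ch not in self.goto[fallback]:
                    fallback = self.fail[fallback]
                self.fail[child] = self.goto[fallback].get(ch, 0)
                self.out[child] = self.out[child] + self.out[self.fail[child]]
                queue.append(child)

    def find(self, text):
        """Yield (start_index, pattern) for every occurrence in text."""
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for pattern in self.out[node]:
                yield i - len(pattern) + 1, pattern

def remove_contained(phrases):
    """Drop every phrase that occurs inside another phrase of the list."""
    if not phrases:                            # step 4.3.1)
        return []
    unique = list(dict.fromkeys(phrases))      # keep order, drop duplicates
    automaton = AhoCorasick(unique)            # step 4.3.2)
    to_delete = set()                          # a set de-duplicates for 4.3.5)
    for phrase in unique:                      # steps 4.3.3) and 4.3.4)
        for _, found in automaton.find(phrase):
            if found != phrase:
                to_delete.add(found)
    return [p for p in unique if p not in to_delete]
```

For example, `remove_contained(["LIPSTICK", "LIPS", "STICK", "MATTE"])` keeps only `"LIPSTICK"` and `"MATTE"`, because the two shorter phrases are substrings of `"LIPSTICK"`. The same routine fits step 7), where the input is the effective alternative word list instead.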
Preferably, the method for deleting from the high-frequency phrase list the high-frequency phrases that contain other brands' information in step 4.4) comprises the following steps:
4.4.1) creating a new AC automaton, adding the brand information in a preset brand library to it by traversal, and storing the brand information in its data structure; the brand information comprises a brand name and a brand number;
4.4.2) judging whether the traversal of the high-frequency phrases is complete; if so, returning the effective high-frequency phrase list; if not, executing step 4.4.3);
4.4.3) calling the AC automaton to scan the current high-frequency phrase and judging whether brand information is detected in it; if not, executing step 4.4.4); if so, executing step 4.4.5);
4.4.4) retaining the high-frequency phrase;
4.4.5) judging whether the detected brand information is consistent with the brand information related to the note; if not, deleting the high-frequency phrase from the high-frequency phrase list; if so, executing step 4.4.4).
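The brand check of steps 4.4.1) to 4.4.5) can be sketched as below. For brevity the sketch uses Python's plain substring test in place of the AC automaton named in the text. The function name, the `brand_library` mapping from brand name to brand number, and the decision to drop a phrase that mentions both the note's brand and another brand are all assumptions.

```python
def filter_other_brands(phrases, brand_library, note_brand_id):
    """Keep phrases that mention no brand, or only the note's own brand.

    brand_library maps brand name -> brand number (hypothetical layout).
    Substring search stands in for the AC automaton of the source text.
    """
    kept = []
    for phrase in phrases:
        text = phrase.upper()
        detected = {brand_id for name, brand_id in brand_library.items()
                    if name.upper() in text}
        # No brand detected: keep (4.4.4). Only the note's brand: keep (4.4.5).
        if not detected or detected == {note_brand_id}:
            kept.append(phrase)
    return kept
```

With the brand number set to 0, as step 2.2) does for missing brand information, every phrase in which any library brand is detected gets removed, which matches the stated meaning "all brands are excluded". The same filter applies unchanged to the alternative words in step 5).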
Preferably, the method for initializing the alternative word list and deleting the alternative words that contain other brands' information in step 5) comprises the following steps:
5.1) judging whether the traversal of the alternative words is complete; if so, returning the alternative word list; if not, executing step 5.2);
5.2) calling the AC automaton built from the preset brand library to scan the current alternative word and judging whether brand information is detected in it; if not, executing step 5.3); if so, executing step 5.4);
5.3) retaining the alternative word;
5.4) judging whether the detected brand information is consistent with the brand information related to the note; if not, deleting the alternative word from the alternative word list; if so, executing step 5.3).
Preferably, the method for calculating the score of the alternative words according to the effective high-frequency phrases in step 6) comprises the following steps:
6.1.1) computing the score of each high-frequency phrase from its word frequency, $\mathrm{score}_i = \mathrm{count}_i / \sum_j \mathrm{count}_j$, where i is a natural number indexing the high-frequency phrases; judging whether the traversal of the alternative word list is complete; if so, recording the alternative words whose score is greater than 0 together with their scores; if not, executing step 6.1.2);
6.1.2) judging whether the traversal of the high-frequency phrases and their scores is complete; if so, recording the accumulated score of the alternative word; if not, executing step 6.1.3);
6.1.3) judging whether the alternative word contains the high-frequency phrase; if so, adding the score of that high-frequency phrase to the accumulated score.
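The scoring of steps 6.1.1) to 6.1.3) amounts to normalising the phrase counts and summing, for each alternative word, the scores of the phrases it contains. A minimal sketch, with hypothetical function names and with matching done on uppercased text (an assumption carried over from step 3):

```python
def phrase_scores(phrase_counts):
    """score_i = count_i / sum of all counts (step 6.1.1)."""
    total = sum(phrase_counts.values())
    if total == 0:
        return {}
    return {phrase: count / total for phrase, count in phrase_counts.items()}

def candidate_scores(candidates, scores):
    """Sum the scores of the phrases each alternative word contains.

    Only alternative words with a score greater than 0 are recorded.
    """
    result = {}
    for word in candidates:
        text = word.upper()
        total = sum(s for phrase, s in scores.items() if phrase in text)
        if total > 0:
            result[word] = total
    return result
```

Because the phrase scores sum to 1, an alternative word that contains every effective high-frequency phrase receives the maximum score of 1.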
Preferably, the method for calculating the difference degree of the alternative words according to the effective high-frequency phrases in step 6), $\mathrm{err} = \mathrm{diff}(w, \mathrm{list}) / |w|$, where diff(w, list) is the residual length of alternative word w after the high-frequency phrases in list that it contains are removed and |w| is its original length, comprises the following steps:
6.2.1) creating a new AC automaton and adding the high-frequency phrases and their lengths to it by traversal; judging whether the traversal of the alternative word list is complete; if so, the difference degree = residual length / original length of the alternative word; if not, executing step 6.2.2);
6.2.2) calling the AC automaton to scan the current alternative word and judging whether it contains more than one high-frequency phrase; if not, the residual length = the original length of the alternative word minus the length of the one high-frequency phrase it contains; if so, executing step 6.2.3);
6.2.3) judging whether the positions of the high-frequency phrases contained in the alternative word overlap; if not, executing step 6.2.4); if so, removing the shorter high-frequency phrase of each overlapping pair and then executing step 6.2.4);
6.2.4) the residual length = the original length of the alternative word minus the total length of the high-frequency phrases it contains.
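The difference degree can be sketched as follows. This is an illustration under assumptions: plain substring search stands in for the AC automaton, overlaps are resolved greedily by accepting longer matches first (one way to realise "remove the shorter phrase"), and the alternative word is assumed to be non-empty. The function name is hypothetical.

```python
def difference_degree(word, phrases):
    """Return residual length / original length for a non-empty word."""
    text = word.upper()
    # Collect every occurrence of every phrase as a (start, end) span.
    spans = []
    for phrase in phrases:
        start = text.find(phrase)
        while start != -1:
            spans.append((start, start + len(phrase)))
            start = text.find(phrase, start + 1)
    # Visit the longest spans first; accept a span only if it does not
    # overlap one already accepted, so the shorter of two overlapping
    # phrases is the one that gets dropped (step 6.2.3).
    accepted = []
    for start, end in sorted(spans, key=lambda s: (s[0] - s[1], s[0])):
        if all(end <= a or start >= b for a, b in accepted):
            accepted.append((start, end))
    covered = sum(end - start for start, end in accepted)
    return (len(text) - covered) / len(text)
```

A value of 0 means the alternative word is made up entirely of high-frequency phrases, and a value of 1 means it contains none of them; smaller is better.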
Preferably, the method for screening the effective alternative words according to the scores and the difference degrees in step 6) comprises the following steps:
6.3.1) setting a high-frequency proportion according to the current number of alternative words;
6.3.2) sorting the current alternative words in ascending order of difference degree and screening them according to the high-frequency proportion;
6.3.3) sorting the remaining alternative words in descending order of score, screening them according to the high-frequency proportion, and storing them in the effective alternative word list;
6.3.4) judging whether the effective alternative word list is empty; if so, returning an empty list; if not, executing step 6.3.5);
6.3.5) among the alternative words with the same score in the effective alternative word list, retaining only the one with the smallest difference degree, and returning the effective alternative word list.
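The two-stage screening of steps 6.3.2) to 6.3.5) can be sketched as below. The function name is hypothetical, the proportion is passed in as `ratio` because the text does not say how step 6.3.1) derives it or whether it is recomputed between the two stages, and rounding up with `math.ceil` is an assumption.

```python
import math

def screen_candidates(scores, errors, ratio):
    """scores and errors map each alternative word to its score and
    difference degree; ratio is the high-frequency proportion."""
    words = list(scores)
    # 6.3.2) ascending difference degree, keep the leading share.
    by_error = sorted(words, key=lambda w: (errors[w], w))
    kept = by_error[:math.ceil(len(by_error) * ratio)]
    # 6.3.3) descending score, keep the leading share of what remains.
    by_score = sorted(kept, key=lambda w: (-scores[w], errors[w], w))
    kept = by_score[:math.ceil(len(by_score) * ratio)]
    # 6.3.5) among equal scores keep only the smallest difference degree.
    best = {}
    for w in kept:
        s = scores[w]
        if s not in best or errors[w] < errors[best[s]]:
            best[s] = w
    return [w for w in kept if best[scores[w]] == w]
```

An empty result corresponds to the empty-list branch of step 6.3.4). Filtering on difference degree before score removes long alternative words that happen to contain many phrases but also a lot of unrelated text.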
Preferably, the method for deleting from the effective alternative word list the alternative words contained in other alternative words in step 7) comprises the following steps:
7.1) adding the alternative words in the effective alternative word list to an AC automaton by traversal; judging whether the traversal of the alternative words is complete; if so, executing step 7.2); if not, executing step 7.3);
7.2) de-duplicating the to-be-deleted list, deleting its words from the input word list, and returning the effective alternative word list;
7.3) calling the AC automaton to scan the current alternative word and judging whether it contains any alternative word in the list other than itself; if so, recording the contained alternative words in the to-be-deleted list.
Preferably, the method for classifying all alternative words in the initial alternative word list according to the reverse-deduced bid word list in step 8) comprises the following steps:
8.1) numbering the reverse-deduced bid words in the reverse-deduced bid word list and calculating, according to relevance, the score of each alternative word in the dimension of each reverse-deduced bid word; traversing all alternative words until the traversal is complete and, for each alternative word, judging whether its highest score is 0; if so, executing step 8.2); if not, executing step 8.5);
8.2) calling the AC automaton built from the preset brand library to scan the alternative word and judging whether a brand name is detected in it; if so, executing step 8.3); if not, executing step 8.4);
8.3) judging whether the brand name detected in the alternative word is consistent with the brand information related to the note; if so, setting the classification of the alternative word to "other related products of the brand"; if not, executing step 8.4);
8.4) setting the classification of the alternative word to "others";
8.5) calling the AC automaton built from the preset brand library to scan the alternative word and judging whether a brand name is detected in it; if so, executing step 8.6); if not, executing step 8.7);
8.6) judging whether the brand name detected in the alternative word is consistent with the brand information related to the note; if not, executing step 8.4); if so, executing step 8.7);
8.7) setting the classification of the alternative word to the bid word corresponding to its highest-scoring dimension.
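The branching of steps 8.1) to 8.7) reduces to a small decision function. In this sketch the function name and parameters are hypothetical: `dim_scores` maps each reverse-deduced bid word to the alternative word's score in that dimension and is assumed to be non-empty, and `detected_brand` is the brand found in the alternative word, or `None` when no brand is detected.

```python
def classify(dim_scores, detected_brand, note_brand):
    """Return the classification label for one alternative word."""
    best_bid, best = max(dim_scores.items(), key=lambda kv: kv[1])
    if best == 0:
        # Steps 8.2) to 8.4): unrelated to every bid word.
        if detected_brand is not None and detected_brand == note_brand:
            return "other related products of the brand"
        return "others"
    # Steps 8.5) to 8.7): related to at least one bid word.
    if detected_brand is not None and detected_brand != note_brand:
        return "others"
    return best_bid
```

When two dimensions tie for the highest score, `max` returns the first one in iteration order; the text does not define a tie-breaking rule, so this behaviour is an artefact of the sketch.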
Preferably, the method for calculating, according to relevance, the score of each alternative word in the dimension of each reverse-deduced bid word in step 8.1) comprises the following steps:
8.1.1) judging whether the traversal of the reverse-deduced bid words is complete; if so, returning the scores of the alternative words in the dimension of each bid word; if not, executing step 8.1.2);
8.1.2) decomposing the current reverse-deduced bid word into n-gram character strings, creating a new AC automaton and adding the decomposed character strings and their lengths to it by traversal, and traversing all alternative words until the traversal is complete;
8.1.3) calling the AC automaton to scan each alternative word and judging whether it contains a substring of the reverse-deduced bid word; if so, adding the length of that substring to the alternative word's score in the dimension of the current bid word.
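The relevance score of steps 8.1.1) to 8.1.3) can be sketched as below. The function names are hypothetical; the sketch assumes that the n-gram decomposition covers every length from 2 up to the bid word's full length, that each distinct substring is counted once per alternative word, and that substring search can stand in for the AC automaton.

```python
def substrings(bid_word, min_n=2):
    """All distinct substrings of the bid word with length >= min_n."""
    text = bid_word.upper()
    return {text[i:i + n]
            for n in range(min_n, len(text) + 1)
            for i in range(len(text) - n + 1)}

def dimension_scores(candidate, bid_words, min_n=2):
    """Score one alternative word in the dimension of each bid word by
    summing the lengths of the bid-word substrings it contains."""
    text = candidate.upper()
    return {bid: sum(len(s) for s in substrings(bid, min_n) if s in text)
            for bid in bid_words}
```

Because longer shared substrings contribute more and also bring their shorter substrings along, the score grows faster than linearly with the length of the shared run, which favours alternative words that reproduce a large part of a bid word.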
Further, in step 8), after all alternative words in the initial alternative word list are classified according to the reverse-deduced bid word list, the alternative words whose commodities differ from those contained in their classification are removed; the alternative words and their corresponding classifications are then output, together with the note and its related brand information.
Preferably, the method for removing the alternative words whose commodities differ from those contained in their classification comprises the following steps:
9.1) judging whether the traversal of the alternative words and their classification results is complete; if so, returning the alternative words and their corresponding classifications; if not, executing step 9.2);
9.2) judging whether the classification result of the alternative word is one of the reverse-deduced bid words; if so, executing step 9.3);
9.3) extracting the commodity information of the alternative word and of its classification; judging whether both are non-empty; if so, judging whether they intersect; if they do not, changing the classification of the alternative word to "others".
Preferably, the method for extracting the commodity information of the alternative word and of its classification in step 9.3) comprises the following steps:
9.3.1) adding the commodity names and commodity numbers in a preset commodity library to an AC automaton by traversal, and storing them in a commodity information list;
9.3.2) judging whether the traversal of the alternative words is complete; if so, returning the identified commodity numbers; if not, executing step 9.3.3);
9.3.3) calling the AC automaton to scan the alternative word and judging whether a commodity name is detected in it; if so, returning the identified commodity numbers; if not, returning an empty list.
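The commodity check of steps 9.1) to 9.3.3) can be sketched as follows. The function names and the `commodity_library` mapping from commodity name to commodity number are hypothetical, and substring search again stands in for the AC automaton.

```python
def commodity_ids(text, commodity_library):
    """Return the numbers of all library commodities named in the text."""
    upper = text.upper()
    return {cid for name, cid in commodity_library.items()
            if name.upper() in upper}

def check_classification(word, label, bid_words, commodity_library):
    """Re-label the alternative word as "others" when it and its bid-word
    class both name commodities but share none (step 9.3)."""
    if label not in bid_words:          # step 9.2): only bid-word classes
        return label
    word_ids = commodity_ids(word, commodity_library)
    label_ids = commodity_ids(label, commodity_library)
    if word_ids and label_ids and not (word_ids & label_ids):
        return "others"
    return label
```

The classification is left unchanged when either side names no commodity, since the steps only act when both extracted sets are non-empty.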
With this technical scheme, the invention collects popular notes on the Xiaohongshu platform, the brand information related to each note, and the alternative word list (also called the related bid word list) that the platform gives for the note; finds the core information that appears frequently in it; and, by calculating the degree of association (larger is better) and the difference degree (smaller is better) between each alternative word and that core information, deduces the one or more bid words most likely to have actually been purchased. It thereby offers suggestions for the brand party's bid word purchases, provides data reference for its bidding-based marketing on the platform, or helps it understand the trends of competing products or of the brands and categories the platform is promoting.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow block diagram of the method of the present invention for reversely deducing bid words purchased by a commodity promotion note;
FIG. 2 is a flow block diagram of the method for judging whether the brand information is abnormal in step 2) of the present invention;
FIG. 3 is a flow block diagram of the method for extracting effective high-frequency phrases from the character string table in step 4) of the present invention;
FIG. 4 is a flow block diagram of the method for deleting the high-frequency phrases contained in other phrases of the high-frequency phrase list in step 4.3) of the present invention;
FIG. 5 is a flow block diagram of the method for deleting the high-frequency phrases that contain other brands' information from the high-frequency phrase list in step 4.4) of the present invention;
FIG. 6 is a flow block diagram of the method for initializing the alternative word list and deleting the alternative words that contain other brands' information in step 5) of the present invention;
FIG. 7 is a flow block diagram of the method for calculating the score of the alternative words according to the effective high-frequency phrases in step 6) of the present invention;
FIG. 8 is a flow block diagram of the method for calculating the difference degree of the alternative words according to the effective high-frequency phrases in step 6) of the present invention;
FIG. 9 is a flow block diagram of the method for screening the effective alternative words according to the scores and the difference degrees in step 6) of the present invention;
FIG. 10 is a flow block diagram of the method for deleting the alternative words contained in other alternative words of the effective alternative word list in step 7) of the present invention;
FIG. 11 is a flow block diagram of the method for classifying all alternative words in the initial alternative word list according to the reverse-deduced bid word list in step 8) of the present invention;
FIG. 12 is a flow block diagram of the method for calculating, according to relevance, the score of each alternative word in the dimension of each reverse-deduced bid word in step 8.1) of the present invention;
FIG. 13 is a flow block diagram of the method of the present invention for removing the alternative words whose commodities differ from those contained in their classification.
Detailed Description
In order to make the objects, aspects and advantages of the present invention more concise and clear, exemplary embodiments of the present invention will be described in detail, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
As shown in fig. 1 to fig. 13, the method of the present invention for reversely deducing the bid words purchased by a commodity promotion note comprises the following steps:
1) collecting a note carrying commodity promotion information, the brand information related to the note, and the alternative word list related to the note;
2) judging whether the alternative word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, judging whether the brand information is abnormal;
3) when the brand information is not abnormal, decomposing every alternative word in the alternative word list into n-gram character strings, where n is an integer greater than or equal to 2, and deleting the character strings that appear only once to obtain a character string table; judging whether the character string table is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 4);
4) extracting effective high-frequency phrases from the character string table to obtain an effective high-frequency phrase word list; judging whether the effective high-frequency phrase word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 5);
5) initializing the alternative word list and deleting the alternative words that contain other brand information; then judging whether the alternative word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 6);
6) calculating the score and the difference degree of every alternative word in the alternative word list according to the effective high-frequency phrases in the effective high-frequency phrase word list, and screening out the effective alternative words to obtain an effective alternative word list; judging whether the effective alternative word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 7);
7) deleting from the effective alternative word list the alternative words that are contained in other alternative words, taking the five highest-scoring alternative words as the reverse-deduced bid words most likely to have actually been purchased, outputting the reverse-deduced bid words and storing them in the reverse-deduced bid word list;
8) classifying all the alternative words in the initial alternative word list according to the reverse-deduced bid word list, outputting the classified alternative words and their corresponding classifications, and at the same time outputting the note and the related brand information.
As shown in fig. 2, the method for determining whether the brand information is abnormal in step 2) preferably includes the steps of:
2.1) judging whether the brand information is empty, if so, executing the step 2.2), and if not, executing the step 2.3);
2.2) setting the brand as number 0, and indicating that all brands are excluded;
2.3) judging whether the brand information is in an unconventional encoding; if so, judging whether it can be converted into the conventional encoding; if it can, converting the brand information into the conventional encoding; if it cannot, setting the brand information to empty and executing step 2.2).
Preferably, the method for decomposing all the alternative words in the alternative word list into n-gram character strings in step 3) is as follows: converting all the alternative words in the alternative word list to uppercase, and traversing each alternative word with a window of fixed length n to obtain a series of phrases.
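The decomposition in step 3) can be sketched as follows. This is only a minimal illustration, assuming that every n from 2 up to the word length is used and that frequencies are counted across the whole alternative word list; the function names `ngrams` and `build_string_table` are hypothetical and do not come from the patent.

```python
from collections import Counter


def ngrams(word: str, n: int) -> list[str]:
    """Return all substrings of length n of the upper-cased word."""
    w = word.upper()
    return [w[i:i + n] for i in range(len(w) - n + 1)]


def build_string_table(words: list[str]) -> Counter:
    """Count every n-gram (n >= 2) over all words, then drop strings seen only once."""
    counts: Counter = Counter()
    for word in words:
        # Assumption: n ranges from 2 to the full length of the word.
        for n in range(2, len(word.upper()) + 1):
            counts.update(ngrams(word, n))
    return Counter({s: c for s, c in counts.items() if c > 1})
```

An empty result from `build_string_table` corresponds to the case in which step 3) outputs an empty reverse-deduced bid word list.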
As shown in fig. 3, the method for extracting effective high-frequency phrases from the character string table in step 4) preferably includes the following steps:
4.1) setting a high-frequency proportion according to the initial data volume of the alternative words;
4.2) sorting the character string table in a descending manner according to word frequency, screening out high-frequency phrases according to a high-frequency proportion, and storing the high-frequency phrases into a word table;
4.3) deleting from the high-frequency phrase word list the high-frequency phrases that are contained in other phrases;
4.4) deleting from the high-frequency phrase word list the high-frequency phrases that contain other brand information to obtain the effective high-frequency phrases, and storing them in the effective high-frequency phrase word list.
Preferably, the method for setting the high-frequency proportion according to the initial candidate word data size in the step 4.1) comprises the following steps: judging whether the number of the alternative words exceeds N, if so, setting the high-frequency proportion to be 25%, and if not, setting the high-frequency proportion to be 75%; wherein N is a natural number. Preferably, N is equal to 100.
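Steps 4.1) and 4.2) might be sketched as below. The text does not say how the proportion is turned into a cut-off, so this sketch assumes it means keeping the top fraction of the ranked strings, rounded down but never fewer than one; `high_freq_ratio` and `top_phrases` are hypothetical names.

```python
def high_freq_ratio(num_words: int, threshold: int = 100) -> float:
    """25% when the number of alternative words exceeds N (default 100), otherwise 75%."""
    return 0.25 if num_words > threshold else 0.75


def top_phrases(string_table: dict[str, int], ratio: float) -> list[str]:
    """Sort strings by descending frequency and keep the leading fraction."""
    ranked = sorted(string_table, key=lambda s: string_table[s], reverse=True)
    # Assumption: round down, but keep at least one phrase when any exist.
    keep = max(1, int(len(ranked) * ratio)) if ranked else 0
    return ranked[:keep]
```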
As shown in fig. 4, the method for deleting the high-frequency phrases contained in the remaining phrases in the high-frequency phrase vocabulary in step 4.3) preferably comprises the following steps:
4.3.1) judging whether the high-frequency phrase word list is empty; if so, returning an empty list; if not, executing step 4.3.2);
4.3.2) creating a new AC automaton, traversing the high-frequency phrase word list and adding each high-frequency phrase to the automaton, and storing the phrases in its data structure;
4.3.3) judging whether the traversal of the high-frequency phrase word list is complete; if not, executing step 4.3.4); if so, executing step 4.3.5);
4.3.4) calling the AC automaton on the current high-frequency phrase and judging whether it contains any high-frequency phrase in the list other than itself; if so, recording the contained high-frequency phrases in a to-be-deleted list;
4.3.5) de-duplicating the to-be-deleted list, deleting the high-frequency phrases in it from the high-frequency phrase word list, and returning the high-frequency phrase word list.
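The effect of steps 4.3.1) to 4.3.5) can be illustrated with the sketch below. Note that it substitutes a plain pairwise substring test for the AC automaton named in the text, which gives the same result on small lists but is slower on large ones; `remove_contained` is a hypothetical name.

```python
def remove_contained(phrases: list[str]) -> list[str]:
    """Drop every phrase that occurs inside another, different phrase of the list."""
    to_delete: set[str] = set()  # a set de-duplicates the to-be-deleted list
    for outer in phrases:
        for inner in phrases:
            if inner != outer and inner in outer:
                to_delete.add(inner)
    return [p for p in phrases if p not in to_delete]
```

For example, given "EYE CREAM" and "EYE", only the longer phrase is retained, so that a short fragment does not contribute a score twice.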
As shown in fig. 5, preferably, the method for deleting the high-frequency phrases containing other brand information in the high-frequency phrase vocabulary in step 4.4) includes the following steps:
4.4.1) creating a new AC automaton, traversing the preset brand library and adding its brand information to the automaton, and storing the brand information in its data structure; the brand information comprises a brand name and a brand number;
4.4.2) judging whether the traversal of the high-frequency phrases is complete; if so, returning the effective high-frequency phrase word list; if not, executing step 4.4.3);
4.4.3) calling the AC automaton on the current high-frequency phrase and judging whether brand information is detected in it; if not, executing step 4.4.4); if so, executing step 4.4.5);
4.4.4) retaining the high-frequency phrase;
4.4.5) judging whether the detected brand information is consistent with the brand information related to the note; if not, deleting the high-frequency phrase from the high-frequency phrase word list; if so, executing step 4.4.4).
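A sketch of the brand check in steps 4.4.1) to 4.4.5) follows. It again uses a plain substring test in place of the AC automaton, assumes the brand library is a mapping from upper-cased brand name to brand number, and assumes that "consistent" means every detected brand number equals the note's brand number; these choices and the name `filter_other_brands` are illustrative only.

```python
def filter_other_brands(
    items: list[str], brand_library: dict[str, int], note_brand: int
) -> list[str]:
    """Keep items that mention no brand, or only the brand related to the note."""
    kept = []
    for item in items:
        detected = {num for name, num in brand_library.items() if name in item}
        # With note_brand == 0 (all brands excluded), any item naming a brand is dropped,
        # assuming no real brand in the library is numbered 0.
        if not detected or detected == {note_brand}:
            kept.append(item)
    return kept
```

The same routine could in principle serve step 5), where the items are alternative words rather than high-frequency phrases.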
As shown in fig. 6, preferably, the method for initializing the alternative word list and deleting the alternative words containing other brand information in step 5) includes the following steps:
5.1) judging whether the traversal of the alternative words is complete; if so, returning the alternative word list; if not, executing step 5.2);
5.2) calling the AC automaton built from the preset brand library on the current alternative word and judging whether brand information is detected in it; if not, executing step 5.3); if so, executing step 5.4);
5.3) retaining the alternative word;
5.4) judging whether the detected brand information is consistent with the brand information related to the note; if not, deleting the alternative word from the alternative word list; if so, executing step 5.3).
As shown in fig. 7, the method for calculating the score of the alternative word according to the valid high-frequency phrases in step 6) preferably includes the following steps:
6.1.1) calculating the score of each high-frequency phrase from its word frequency as $score_i = count_i / \sum_j count_j$, where i is a natural number indexing the high-frequency phrases and $count_i$ is the word frequency of the i-th phrase; judging whether the traversal of the alternative word list is complete; if so, recording the alternative words whose score is greater than 0 together with their scores; if not, executing step 6.1.2);
6.1.2) judging whether the traversal of the high-frequency phrases and their scores is complete; if so, recording the accumulated score of the alternative word; if not, executing step 6.1.3);
6.1.3) judging whether the alternative word contains the high-frequency phrase; if so, adding the score of that high-frequency phrase to the accumulated score.
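The scoring in steps 6.1.1) to 6.1.3) can be sketched as follows, assuming the effective high-frequency phrases are held as a mapping from upper-cased phrase to word frequency; `phrase_scores` and `word_scores` are hypothetical names.

```python
def phrase_scores(freq: dict[str, int]) -> dict[str, float]:
    """Score of a phrase = its word frequency / sum of all phrase frequencies."""
    total = sum(freq.values())
    return {p: c / total for p, c in freq.items()}


def word_scores(words: list[str], freq: dict[str, int]) -> dict[str, float]:
    """Accumulate, per alternative word, the scores of the phrases it contains."""
    scores = phrase_scores(freq)
    result = {}
    for word in words:
        w = word.upper()
        total = sum(s for p, s in scores.items() if p in w)
        if total > 0:  # only words with a positive score are recorded
            result[word] = total
    return result
```

An alternative word that contains several high-frequency phrases therefore scores higher than one that contains a single phrase.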
As shown in fig. 8, preferably, the method for calculating the difference degree $err = \mathrm{diff}(w, list) / |w|$ of an alternative word w according to the effective high-frequency phrases in step 6), where diff(w, list) is the residual length of w once the high-frequency phrases in list are discounted and |w| is the original length of w, includes the following steps:
6.2.1) creating a new AC automaton, traversing the high-frequency phrases and adding each phrase and its length to the automaton; judging whether the traversal of the alternative word list is complete; if so, the difference degree = residual length / original length of the alternative word; if not, executing step 6.2.2);
6.2.2) calling the AC automaton on the current alternative word and judging whether it contains more than one high-frequency phrase; if not, residual length = original length of the alternative word - length of the one high-frequency phrase it contains; if so, executing step 6.2.3);
6.2.3) judging whether the positions of the high-frequency phrases contained in the alternative word overlap; if not, executing step 6.2.4); if so, removing the shorter of the overlapping high-frequency phrases and then executing step 6.2.4);
6.2.4) residual length = original length of the alternative word - total length of the contained high-frequency phrases.
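One way to read steps 6.2.1) to 6.2.4) is sketched below. It locates phrase occurrences with `str.find` rather than an AC automaton and resolves overlaps greedily by keeping longer matches first, which is an assumption about how "removing the shorter phrase" is applied; `difference_degree` is a hypothetical name.

```python
def difference_degree(word: str, phrases: list[str]) -> float:
    """Residual length / original length after discounting matched phrases."""
    w = word.upper()
    if not w:
        return 0.0  # assumption: an empty word has nothing left over
    spans = []
    for p in phrases:
        start = w.find(p)
        while start != -1:
            spans.append((start, start + len(p)))
            start = w.find(p, start + 1)
    # Longest matches first, so that on overlap the shorter phrase is dropped.
    spans.sort(key=lambda s: s[1] - s[0], reverse=True)
    kept: list[tuple[int, int]] = []
    for s, e in spans:
        if all(e <= ks or s >= ke for ks, ke in kept):
            kept.append((s, e))
    covered = sum(e - s for s, e in kept)
    return (len(w) - covered) / len(w)
```

A value of 0 means the alternative word consists entirely of high-frequency phrases, and a value of 1 means it contains none.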
As shown in fig. 9, the method for screening the effective candidate words according to the scores and the difference in step 6) preferably includes the following steps:
6.3.1) setting a high-frequency proportion according to the current number of alternative words;
6.3.2) sorting the alternative words in ascending order of difference degree and screening the current alternative words according to the high-frequency proportion;
6.3.3) sorting in descending order of score, screening the remaining alternative words according to the high-frequency proportion, and storing them in the effective alternative word list;
6.3.4) judging whether the effective alternative word list is empty; if so, returning an empty list; if not, executing step 6.3.5);
6.3.5) among alternative words with the same score in the effective alternative word list, retaining only the one with the smallest difference degree, and returning the effective alternative word list.
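The screening in steps 6.3.1) to 6.3.5) might look like the sketch below. Several details are assumptions, since the text leaves them open: the proportion follows the 25%/75% rule with N = 100, it is computed once and reused for both passes, and each pass keeps the leading fraction rounded down with a minimum of one. The name `screen` is hypothetical.

```python
def screen(cands: dict[str, tuple[float, float]], threshold: int = 100) -> list[str]:
    """cands maps each alternative word to (score, difference degree)."""
    if not cands:
        return []
    ratio = 0.25 if len(cands) > threshold else 0.75

    def top(words, key, reverse):
        ranked = sorted(words, key=key, reverse=reverse)
        return ranked[:max(1, int(len(ranked) * ratio))]

    # 6.3.2) smallest difference degree first
    by_diff = top(list(cands), key=lambda w: cands[w][1], reverse=False)
    # 6.3.3) highest score first among the survivors
    by_score = top(by_diff, key=lambda w: cands[w][0], reverse=True)
    # 6.3.5) for equal scores keep only the word with the smallest difference degree
    best: dict[float, str] = {}
    for w in by_score:
        score, diff = cands[w]
        if score not in best or diff < cands[best[score]][1]:
            best[score] = w
    return list(best.values())
```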
As shown in fig. 10, the method for deleting the candidate words included in the remaining candidate words in the valid candidate word list in step 7) preferably includes the following steps:
7.1) creating a new AC automaton, traversing the effective alternative word list and adding each alternative word to the automaton; judging whether the traversal of the alternative words is complete; if so, executing step 7.2); if not, executing step 7.3);
7.2) de-duplicating the to-be-deleted list, deleting the words in it from the input word list, and returning the effective alternative word list;
7.3) calling the AC automaton on the current alternative word and judging whether it contains any alternative word in the list other than itself; if so, recording the contained alternative words in the to-be-deleted list.
As shown in fig. 11, the method for classifying all the alternative words in the initial alternative word list according to the reverse-deduced bid word list in step 8) preferably includes the following steps:
8.1) numbering the reverse-deduced bid words in the reverse-deduced bid word list and calculating, according to relevance, the score of each alternative word in the dimension of each reverse-deduced bid word; traversing all the alternative words until the traversal is complete and, for each alternative word, judging whether its highest score is 0; if so, executing step 8.2); if not, executing step 8.5);
8.2) calling an AC automaton which traverses a preset brand library to traverse alternative words, judging whether brand names are detected in the alternative words, if so, executing a step 8.3), and if not, executing a step 8.4);
8.3) judging whether the brand name detected in the candidate word is consistent with the brand information related to the note, if so, setting the classification of the candidate word as 'other related products of the brand', and if not, executing a step 8.4);
8.4) set the classification of this alternative word to "others";
8.5) calling an AC automaton which traverses a preset brand library to traverse alternative words, judging whether brand names are detected in the alternative words, if so, executing a step 8.6), and if not, executing a step 8.7);
8.6) judging whether the brand name detected in the candidate word is consistent with the brand information related to the note, if not, executing a step 8.4), and if so, executing a step 8.7);
8.7) setting the classification of the candidate word as the bidding word corresponding to the highest scoring dimension.
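The decision flow of steps 8.2) to 8.7) for a single alternative word can be sketched as follows. The per-dimension scores are taken as given and assumed non-empty, the brand library is assumed to map upper-cased brand names to brand numbers, and a substring test stands in for the AC automaton; `classify` is a hypothetical name.

```python
def classify(
    word: str,
    dim_scores: dict[str, int],
    brand_library: dict[str, int],
    note_brand: int,
) -> str:
    """Return the classification of one alternative word."""
    w = word.upper()
    detected = {num for name, num in brand_library.items() if name in w}
    best_bid, best = max(dim_scores.items(), key=lambda kv: kv[1])
    if best == 0:
        # 8.2) to 8.4): no relevance to any reverse-deduced bid word
        if detected and detected == {note_brand}:
            return "other related products of the brand"
        return "others"
    # 8.5) to 8.7): relevant, but a different brand still means "others"
    if detected and detected != {note_brand}:
        return "others"
    return best_bid
```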
As shown in fig. 12, the method for calculating, according to relevance, the scores of the alternative words in the dimensions of the reverse-deduced bid words in step 8.1) preferably comprises the following steps:
8.1.1) judging whether the traversal of the reverse-deduced bid words is complete; if so, returning the scores of the alternative words in the dimension of each bid word; if not, executing step 8.1.2);
8.1.2) decomposing the current reverse-deduced bid word into n-gram character strings, creating a new AC automaton and adding each resulting character string and its length to it, and then traversing all the alternative words until the traversal is complete;
8.1.3) calling the AC automaton on the current alternative word and judging whether it contains a substring of the reverse-deduced bid word; if so, adding the length of that substring to the alternative word's score in the dimension of the current bid word.
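The relevance score of steps 8.1.1) to 8.1.3) can be illustrated as below. The sketch assumes each distinct substring of a bid word is counted once per alternative word and uses a substring test in place of the AC automaton; `dimension_scores` is a hypothetical name.

```python
def dimension_scores(word: str, bid_words: list[str]) -> dict[str, int]:
    """Score of one alternative word in the dimension of each bid word."""
    w = word.upper()
    scores = {}
    for bid in bid_words:
        b = bid.upper()
        # All n-grams of the bid word with n >= 2, de-duplicated.
        subs = {b[i:i + n] for n in range(2, len(b) + 1) for i in range(len(b) - n + 1)}
        scores[bid] = sum(len(s) for s in subs if s in w)
    return scores
```

Because longer shared substrings also contain shorter shared substrings, a long match contributes disproportionately, which favours alternative words that share long runs with a bid word.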
Further, in step 8), after all the alternative words in the initial alternative word list are classified according to the reverse-deduced bid word list, the alternative words whose commodities differ from those contained in their classification are removed from that classification; the classified alternative words and their corresponding classifications are then output, together with the note and the related brand information.
As shown in fig. 13, a method for removing candidate words different from the commodities included in the classification preferably includes the steps of:
9.1) judging whether the traversal of the alternative words and their classification results is complete; if so, returning the alternative words and their corresponding classifications; if not, executing step 9.2);
9.2) judging whether the classification result of the alternative word is one of the reverse-deduced bid words; if so, executing step 9.3);
9.3) extracting the commodity information of the alternative word and of its classification, and judging whether both are non-empty; if so, judging whether the two intersect, and if they do not, changing the classification of the alternative word to "others".
Preferably, the method for extracting the candidate words and the classified commodity information thereof in the step 9.3) comprises the following steps:
9.3.1) creating a new AC automaton, traversing the preset commodity library and adding its commodity names and commodity numbers to the automaton, and storing them in its data structure;
9.3.2) judging whether the traversal of the alternative words is complete; if so, returning the identified commodity numbers; if not, executing step 9.3.3);
9.3.3) calling the AC automaton on the current alternative word and judging whether a commodity name is detected in it; if so, returning the identified commodity number; if not, returning an empty list.
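The commodity check of steps 9.3) and 9.3.1) to 9.3.3) can be sketched as follows, assuming the commodity library maps upper-cased commodity names to commodity numbers and using a substring test instead of the AC automaton; `commodity_numbers` and `refine_classification` are hypothetical names.

```python
def commodity_numbers(text: str, commodity_library: dict[str, int]) -> set[int]:
    """Numbers of all commodities whose names occur in the text."""
    t = text.upper()
    return {num for name, num in commodity_library.items() if name in t}


def refine_classification(
    word: str, classification: str, commodity_library: dict[str, int]
) -> str:
    """Change the classification to "others" when both sides name commodities
    and the two sets of commodity numbers do not intersect."""
    in_word = commodity_numbers(word, commodity_library)
    in_class = commodity_numbers(classification, commodity_library)
    if in_word and in_class and not (in_word & in_class):
        return "others"
    return classification
```

When either side names no known commodity, the classification is left unchanged, matching the "both non-empty" condition in step 9.3).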
In the above technical scheme, the invention collects popular notes on the Xiaohongshu (Little Red Book) platform, the brand information related to each note, and the alternative word list (also called the related bid word list) that the platform gives for the note; finds the frequently appearing information; and deduces the one or more bid words most likely to have actually been purchased by calculating the degree of association (the larger the better) and the difference degree (the smaller the better) between each alternative word and this core information. The result gives a brand owner suggestions for purchasing bid words and a data reference for its bid marketing on the platform, or helps the brand owner understand the moves of competing products and the trends in the brands and categories promoted on the platform.
For convenience of understanding, the words such as notes, brands, bid words, alternative words and the like related to the present invention are explained in the following (the explanation cannot be understood as a limitation to the scope of the present invention):
Brand name: refers to the name of a brand (e.g., Lancôme) or an alias of the brand (e.g., Lancome).
Brand library: a database of brands and related information is collected in advance and continuously supplemented.
Brand number/brand code: the number of the brand in the brand library is referred to, and the brand library has one-to-one correspondence property.
Bid word: a keyword or phrase related to a commodity that buyers often search for on an e-commerce platform, such as "Dior 999", "Estée Lauder small brown bottle", "affordable moisturizing cream" and "loose plus-size". By bidding on popular keywords, an advertiser can raise the ranking of its products on the search results page shown to users for those keywords, thereby obtaining more traffic and revenue.
Category: refers to the category of a commodity, which is both general and distinguishing, such as "cream".
Commodity library: a database, collected in advance and continuously supplemented, of commodities and related information such as their categories and aliases.
Note: refers to a post on the Xiaohongshu platform, presented as text, pictures, video and the like. This patent is mainly directed at commercial notes, that is, targeted image-and-text or video posts that include commodity promotion. The notes are collected in real time by data acquisition technology; their brands and other information are then identified by brand recognition technology and the like, stored in a database, and given unique identification numbers.
Note number: refers to the number of a note in the database; the correspondence is one-to-one.
Related bid word list / alternative word list: a series of related words that the Xiaohongshu platform presents for a note, derived from the bid words the note has purchased; when these words are searched during the advertising campaign, the note can be associated with them and displayed. These words are collected into the database in real time and associated with the note number. For the purposes of this patent, these words are input to the algorithm as alternative words.
N-gram: a term from the field of natural language processing that refers to all substrings of length N of a character string. For example, the 3-gram set of "雅诗兰黛小棕瓶" (Estée Lauder small brown bottle) is {"雅诗兰", "诗兰黛", "兰黛小", "黛小棕", "小棕瓶"}. N-gram decomposition with n ≥ 2 finds all substrings of length greater than or equal to 2 in a character string and is a word segmentation method that retains more complete information.
According to the related bid word list that the Xiaohongshu platform gives for a note, the invention finds the frequently appearing information and deduces the one or more bid words most likely to have actually been purchased by calculating the relevance (the larger the better) and the difference degree (the smaller the better) between the alternative words and this core information.
For ease of understanding, a specific embodiment is briefly described below:
The Xiaohongshu platform presents a certain Lancôme note together with the following alternative word list (i.e., related bid word list): about three thousand words such as "eye cream recommendation", "Clarins double-extract essence", "Estée Lauder eye cream", "fine-line-removing eye cream", "eye cream dark circles fine lines", "anti-wrinkle firming essence", "eye cream eye essence order", "fine-line-fading eye cream", "Lancôme eye cream sensitive skin", "Lancôme pure eye cream" and so on.
The invention can deduce that the high-frequency phrases (i.e., the core information) of the note include "eye cream", "essence", "fine lines", "Lancôme" and so on. Alternative words that contain core information, such as "eye cream recommendation" (containing "eye cream"), receive a certain score, while words that contain more core information, such as "Lancôme eye cream essence" (containing "Lancôme", "eye cream" and "essence"), receive a higher score. On this basis, the less information an alternative word contains beyond the core information (for example, "Lancôme eye cream essence" contains nothing beyond the core information), the more likely the algorithm is to deduce it. It is worth mentioning that the invention strictly checks the consistency of brand, category and the like to ensure that the deduced result is logical. Thus, the bid words deduced for this note might be "fine-line eye cream recommendation" and "Lancôme pure eye cream essence". This means the algorithm considers these phrases to contain the most core information, that is, to be the most relevant to the note, and to be more precise and natural wording that fits the product's selling points and consumers' needs and mindset.
Based on the deduced bid words, the invention can also calculate the relevance between the other related bid words and the deduced bid words by comparing word similarity, and classify them accordingly. Continuing the example above:
"eye cream recommendation", "fine-line-removing eye cream", "eye cream dark circles fine lines" and "fine-line-fading eye cream" are classified under "fine-line eye cream recommendation";
"anti-wrinkle firming essence", "eye cream eye essence order", "Lancôme eye cream sensitive skin" and "Lancôme pure eye cream" are classified under "Lancôme pure eye cream".
This step makes full use of the information given by the Xiaohongshu platform and summarizes and organizes it, making the method more targeted.
While the foregoing describes the preferred embodiments of the present invention, it will be appreciated that those skilled in the art may devise numerous modifications and variations in light of the above teachings. Therefore, any technical solution that those skilled in the art can obtain on the basis of the concept of the invention through logical analysis, reasoning or limited experimentation, including equivalent changes, modifications, substitutions and variations, shall fall within the scope of protection defined by the claims.

Claims (10)

1. A method for reversely deducing the bid words purchased by a commodity promotion note, characterized by comprising the following steps:
1) collecting a note carrying commodity promotion information, the brand information related to the note, and the alternative word list related to the note;
2) judging whether the alternative word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, judging whether the brand information is abnormal;
3) when the brand information is not abnormal, decomposing every alternative word in the alternative word list into n-gram character strings, where n is an integer greater than or equal to 2, and deleting the character strings that appear only once to obtain a character string table; judging whether the character string table is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 4);
4) extracting effective high-frequency phrases from the character string table to obtain an effective high-frequency phrase word list; judging whether the effective high-frequency phrase word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 5);
5) initializing the alternative word list and deleting the alternative words that contain other brand information; then judging whether the alternative word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 6);
6) calculating the score of each alternative word according to the effective high-frequency phrases, calculating the difference degree of each alternative word according to the effective high-frequency phrases, and screening the effective alternative words according to the scores and the difference degrees to obtain an effective alternative word list; judging whether the effective alternative word list is empty; if so, outputting an empty reverse-deduced bid word list; if not, executing step 7);
7) deleting from the effective alternative word list the alternative words that are contained in other alternative words, taking the five highest-scoring alternative words as the reverse-deduced bid words most likely to have actually been purchased, outputting the reverse-deduced bid words and storing them in the reverse-deduced bid word list;
8) classifying all the alternative words in the initial alternative word list according to the reverse-deduced bid word list, outputting the classified alternative words and their corresponding classifications, and at the same time outputting the note and the related brand information.
2. The method of inferring bid terms purchased for a merchandise promotion note according to claim 1, wherein: the method for extracting effective high-frequency phrases from the character string table in the step 4) comprises the following steps:
4.1) setting a high-frequency proportion according to the initial data volume of the alternative words;
4.2) sorting the character string table in a descending manner according to word frequency, screening out high-frequency phrases according to a high-frequency proportion, and storing the high-frequency phrases into a word table;
4.3) deleting from the high-frequency phrase word list the high-frequency phrases that are contained in other phrases;
4.4) deleting from the high-frequency phrase word list the high-frequency phrases that contain other brand information to obtain the effective high-frequency phrases, and storing them in the effective high-frequency phrase word list.
3. The method of inferring bid terms purchased for a merchandise promotion note according to claim 2, wherein: the method for deleting the high-frequency phrases contained by the other phrases in the high-frequency phrase vocabulary in the step 4.3) comprises the following steps:
4.3.1) judging whether the high-frequency phrase word list is empty; if so, returning an empty list; if not, executing step 4.3.2);
4.3.2) creating a new AC automaton, traversing the high-frequency phrase word list and adding each high-frequency phrase to the automaton, and storing the phrases in its data structure;
4.3.3) judging whether the traversal of the high-frequency phrase word list is complete; if not, executing step 4.3.4); if so, executing step 4.3.5);
4.3.4) calling the AC automaton on the current high-frequency phrase and judging whether it contains any high-frequency phrase in the list other than itself; if so, recording the contained high-frequency phrases in a to-be-deleted list;
4.3.5) de-duplicating the to-be-deleted list, deleting the high-frequency phrases in it from the high-frequency phrase word list, and returning the high-frequency phrase word list.
4. The method of inferring bid terms purchased for a merchandise promotion note according to claim 2, wherein: the method for deleting the high-frequency phrases containing other brand information in the high-frequency phrase vocabulary in the step 4.4) comprises the following steps:
4.4.1) creating a new AC automaton, traversing the preset brand library and adding its brand information to the automaton, and storing the brand information in its data structure; the brand information comprises a brand name and a brand number;
4.4.2) judging whether the traversal of the high-frequency phrases is complete; if so, returning the effective high-frequency phrase word list; if not, executing step 4.4.3);
4.4.3) calling the AC automaton on the current high-frequency phrase and judging whether brand information is detected in it; if not, executing step 4.4.4); if so, executing step 4.4.5);
4.4.4) retaining the high-frequency phrase;
4.4.5) judging whether the detected brand information is consistent with the brand information related to the note; if not, deleting the high-frequency phrase from the high-frequency phrase word list; if so, executing step 4.4.4).
5. The method of inferring bid terms purchased for a merchandise promotion note according to claim 1, wherein: the method for initializing the alternative word list and deleting the alternative words containing other brand information in the step 5) comprises the following steps:
5.1) judging whether the traversal of the alternative words is complete; if so, returning the alternative word list; if not, executing step 5.2);
5.2) calling the AC automaton built from the preset brand library on the current alternative word and judging whether brand information is detected in it; if not, executing step 5.3); if so, executing step 5.4);
5.3) retaining the alternative word;
5.4) judging whether the detected brand information is consistent with the brand information related to the note; if not, deleting the alternative word from the alternative word list; if so, executing step 5.3).
6. The method of inferring bid terms purchased for a merchandise promotion note according to claim 1, wherein: the method for calculating the score of the alternative word according to the effective high-frequency phrase in the step 6) comprises the following steps:
6.1.1) calculating the score of each high-frequency phrase from its word frequency, the score being the word frequency of the high-frequency phrase divided by the sum of the word frequencies of all the high-frequency phrases; judging whether the traversal of the alternative word list is complete; if so, recording the alternative words whose score is greater than 0 together with their scores; if not, executing step 6.1.2);
6.1.2) judging whether the traversal of the high-frequency phrases and their scores is complete; if so, recording the accumulated score of the alternative word; if not, executing step 6.1.3);
6.1.3) judging whether the alternative word contains the high-frequency phrase; if so, adding the score of that high-frequency phrase to the accumulated score.
7. The method of inferring bid terms purchased for a merchandise promotion note according to claim 1, wherein: the method for calculating the difference degree of the alternative words according to the effective high-frequency phrases in the step 6) comprises the following steps:
6.2.1) creating a new AC (Aho-Corasick) automaton and adding each high-frequency phrase and its length to it; traversing the alternative word list and judging whether every alternative word has been taken out; if so, calculating difference degree = residual length / original length of the alternative word; if not, executing step 6.2.2);
6.2.2) calling the AC automaton to scan the current alternative word and judging whether it contains more than one high-frequency phrase; if not, residual length = original length of the alternative word - length of the one high-frequency phrase contained in the alternative word; if so, executing step 6.2.3);
6.2.3) judging whether the positions of the high-frequency phrases contained in the alternative word overlap; if not, executing step 6.2.4); if so, removing the shorter high-frequency phrases among the overlapping ones and then executing step 6.2.4);
6.2.4) residual length = original length of the alternative word - total length of the high-frequency phrases contained.
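The difference-degree calculation of steps 6.2.1) to 6.2.4) can be sketched with a small Aho-Corasick automaton. This is only an illustration under stated assumptions: the class and function names are hypothetical, and overlap handling is interpreted here as "keep the longest match first and drop any shorter match that overlaps it", which is one possible reading of step 6.2.3).

```python
from collections import deque
from typing import Dict, List, Tuple


class ACAutomaton:
    """Minimal Aho-Corasick automaton for multi-pattern substring search."""

    def __init__(self, patterns: List[str]):
        self.goto: List[Dict[str, int]] = [{}]
        self.fail: List[int] = [0]
        self.out: List[List[int]] = [[]]  # lengths of patterns ending at each node
        for pattern in set(patterns):
            self._add(pattern)
        self._build()

    def _add(self, pattern: str) -> None:
        node = 0
        for ch in pattern:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.out[node].append(len(pattern))

    def _build(self) -> None:
        queue = deque(self.goto[0].values())  # depth-1 nodes fail to the root
        while queue:
            node = queue.popleft()
            for ch, nxt in self.goto[node].items():
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] += self.out[self.fail[nxt]]
                queue.append(nxt)

    def find(self, text: str) -> List[Tuple[int, int]]:
        """Return the (start, end) span of every match, end exclusive."""
        node, spans = 0, []
        for i, ch in enumerate(text):
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            spans += [(i + 1 - n, i + 1) for n in self.out[node]]
        return spans


def difference_degree(word: str, automaton: ACAutomaton) -> float:
    """Difference degree = residual length / original length (word non-empty)."""
    if not word:
        raise ValueError("alternative word must not be empty")
    kept: List[Tuple[int, int]] = []
    # Assumed reading of 6.2.3: visit matches longest first and drop any
    # match that overlaps one already kept.
    for start, end in sorted(automaton.find(word), key=lambda s: s[0] - s[1]):
        if all(end <= ks or start >= ke for ks, ke in kept):
            kept.append((start, end))
    residual = len(word) - sum(end - start for start, end in kept)  # 6.2.2 / 6.2.4
    return residual / len(word)


# Hypothetical sample data, for illustration only.
ac = ACAutomaton(["face cream", "cream", "moist"])
print(difference_degree("moist face cream", ac))  # 0.0625
print(difference_degree("lipstick", ac))          # 1.0
```

In the first example the match "cream" overlaps the longer match "face cream" and is dropped, so only the single space remains as residual length (1 of 16 characters).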
8. The method of inferring bid terms purchased for a merchandise promotion note according to claim 1, wherein: the method for screening the effective alternative words according to the scores and the difference degrees in the step 6) comprises the following steps:
6.3.1) setting a high-frequency proportion according to the number of alternative words currently available;
6.3.2) sorting the alternative words in ascending order of difference degree and screening them according to the high-frequency proportion;
6.3.3) sorting the remaining alternative words in descending order of score, screening them according to the high-frequency proportion, and storing the result in an effective alternative word list;
6.3.4) judging whether the effective alternative word list is empty; if so, returning an empty list; if not, executing step 6.3.5);
6.3.5) among the alternative words with the same score in the effective alternative word list, retaining only the alternative word with the smallest difference degree, and returning the effective alternative word list.
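The two-pass screening of steps 6.3.1) to 6.3.5) can be sketched as follows. The claim does not state how the high-frequency proportion is derived from the data volume or how the cut-off is rounded, so `choose_ratio` and the ceiling-based cut in `top_share` are purely hypothetical placeholders, as are all names and sample values.

```python
import math
from typing import Dict, List, NamedTuple


class Scored(NamedTuple):
    word: str
    score: float
    diff: float  # difference degree


def choose_ratio(n: int) -> float:
    """Hypothetical rule for step 6.3.1: keep a smaller share of longer lists."""
    return 0.5 if n > 10 else 0.8


def top_share(items: List[Scored], ratio: float) -> List[Scored]:
    """Keep the leading share of an already sorted list (assumed rounding: ceil)."""
    return items[:math.ceil(len(items) * ratio)]


def screen(candidates: List[Scored]) -> List[Scored]:
    if not candidates:
        return []
    ratio = choose_ratio(len(candidates))                        # 6.3.1
    by_diff = top_share(sorted(candidates, key=lambda c: c.diff), ratio)  # 6.3.2
    effective = top_share(
        sorted(by_diff, key=lambda c: c.score, reverse=True), ratio)      # 6.3.3
    if not effective:                                            # 6.3.4
        return []
    best: Dict[float, Scored] = {}                               # 6.3.5
    for c in effective:
        if c.score not in best or c.diff < best[c.score].diff:
            best[c.score] = c
    return list(best.values())


# Hypothetical sample data, for illustration only.
sample = [
    Scored("a", 0.9, 0.1), Scored("b", 0.9, 0.3), Scored("c", 0.5, 0.2),
    Scored("d", 0.4, 0.6), Scored("e", 0.2, 0.9),
]
print([c.word for c in screen(sample)])  # ['a', 'c', 'd']
```

Here "e" is cut in the difference-degree pass, and "b" is dropped in the last step because "a" has the same score with a smaller difference degree.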
9. The method of inferring bid terms purchased for a merchandise promotion note according to claim 1, wherein: the method for classifying all the alternative words in the initial alternative word list according to the reverse-deducing bidding word list in the step 8) comprises the following steps:
8.1) numbering the reverse-deduced bid words in the reverse-deduced bid word list, calculating the score of each alternative word on the dimension of each reverse-deduced bid word according to relevance, and traversing all the alternative words until every one has been taken out; for each alternative word, judging whether its highest score is 0; if so, executing step 8.2); if not, executing step 8.5);
8.2) calling an AC automaton built from a preset brand library to scan the alternative word, and judging whether a brand name is detected in the alternative word; if so, executing step 8.3); if not, executing step 8.4);
8.3) judging whether the brand name detected in the alternative word is consistent with the brand information related to the note; if so, setting the classification of the alternative word to "other related products of the brand"; if not, executing step 8.4);
8.4) setting the classification of the alternative word to "others";
8.5) calling an AC automaton built from a preset brand library to scan the alternative word, and judging whether a brand name is detected in the alternative word; if so, executing step 8.6); if not, executing step 8.7);
8.6) judging whether the brand name detected in the alternative word is consistent with the brand information related to the note; if not, executing step 8.4); if so, executing step 8.7);
8.7) setting the classification of the alternative word to the reverse-deduced bid word corresponding to the highest-scoring dimension.
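The decision flow of steps 8.1) to 8.7) can be sketched as a single function. This is an illustration only: the names are hypothetical, the per-dimension scores are assumed to have been computed already, and a simple substring loop stands in for the AC automaton built from the brand library.

```python
from typing import Dict, List, Optional

OTHER_BRAND_PRODUCTS = "other related products of the brand"
OTHERS = "others"


def detect_brand(word: str, brand_library: List[str]) -> Optional[str]:
    """Stand-in for the AC automaton lookup: first brand name found in the word."""
    for brand in brand_library:
        if brand in word:
            return brand
    return None


def classify(word: str, dim_scores: Dict[str, float],
             note_brand: str, brand_library: List[str]) -> str:
    """Return the class of one alternative word, following steps 8.1-8.7."""
    brand = detect_brand(word, brand_library)
    best_bid = max(dim_scores, key=dim_scores.get) if dim_scores else None
    best_score = dim_scores[best_bid] if best_bid is not None else 0
    if best_score == 0:                                   # 8.1 -> 8.2
        if brand is not None and brand == note_brand:     # 8.3
            return OTHER_BRAND_PRODUCTS
        return OTHERS                                     # 8.4
    if brand is not None and brand != note_brand:         # 8.5 / 8.6 -> 8.4
        return OTHERS
    return best_bid                                       # 8.7


# Hypothetical brands and scores, for illustration only.
brands = ["AcmeBrand", "OtherCo"]
print(classify("AcmeBrand face cream", {"face cream": 10, "lipstick": 0},
               "AcmeBrand", brands))   # face cream
print(classify("OtherCo face cream", {"face cream": 10, "lipstick": 0},
               "AcmeBrand", brands))   # others
print(classify("AcmeBrand shampoo", {"face cream": 0, "lipstick": 0},
               "AcmeBrand", brands))   # other related products of the brand
```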
10. The method of inferring bid terms purchased for a merchandise promotion note according to claim 9, wherein: the method for calculating the scores of the alternative words on the dimensions of the reverse-deduced bid words according to relevance in the step 8.1) comprises the following steps:
8.1.1) traversing the reverse-deduced bid words and judging whether every reverse-deduced bid word has been taken out; if so, returning the score of each alternative word on the dimension of each reverse-deduced bid word; if not, executing step 8.1.2);
8.1.2) splitting the current reverse-deduced bid word into n-gram character strings, creating a new AC automaton and adding each resulting character string and its length to it, and traversing all the alternative words until every one has been taken out;
8.1.3) calling the AC automaton to scan each alternative word and judging whether the alternative word contains a substring of the reverse-deduced bid word; if so, adding the length of that substring to the score of the alternative word on the dimension of the current reverse-deduced bid word.
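The relevance scoring of steps 8.1.1) to 8.1.3) can be sketched as follows. Several details are assumptions because the claim leaves them open: the n-gram sizes (here every substring of at least `min_n` characters), counting each distinct substring once per alternative word, and the use of a plain substring test in place of the AC automaton. All names are hypothetical.

```python
from typing import Dict, List, Set


def ngrams(text: str, min_n: int = 2) -> Set[str]:
    """All distinct substrings of `text` with at least `min_n` characters."""
    return {
        text[i:i + n]
        for n in range(min_n, len(text) + 1)
        for i in range(len(text) - n + 1)
    }


def dimension_scores(words: List[str], bid_words: List[str],
                     min_n: int = 2) -> Dict[str, Dict[str, int]]:
    """Score every alternative word on the dimension of every bid word."""
    scores: Dict[str, Dict[str, int]] = {word: {} for word in words}
    for bid in bid_words:                 # 8.1.1: traverse the bid words
        grams = ngrams(bid, min_n)        # 8.1.2: n-gram disassembly
        for word in words:                # 8.1.2: traverse the alternative words
            # 8.1.3: add the length of every contained substring
            scores[word][bid] = sum(len(g) for g in grams if g in word)
    return scores


# Hypothetical sample data, for illustration only.
result = dimension_scores(["ice cream", "dream", "soap"], ["cream"])
print(result["ice cream"]["cream"])  # 30
print(result["dream"]["cream"])      # 16
print(result["soap"]["cream"])       # 0
```

An alternative word that contains the whole bid word receives the maximum score for that dimension, while a word that shares only part of it, such as "dream", receives a lower but non-zero score.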
CN202110855006.7A 2021-07-28 2021-07-28 Method for reversely pushing bid words purchased by commodity popularization notes Active CN113538062B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110855006.7A CN113538062B (en) 2021-07-28 2021-07-28 Method for reversely pushing bid words purchased by commodity popularization notes

Publications (2)

Publication Number Publication Date
CN113538062A (en) 2021-10-22
CN113538062B (en) 2024-05-07

Family

ID=78089352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110855006.7A Active CN113538062B (en) 2021-07-28 2021-07-28 Method for reversely pushing bid words purchased by commodity popularization notes

Country Status (1)

Country Link
CN (1) CN113538062B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20130048018A (en) * 2011-11-01 2013-05-09 주식회사 다음커뮤니케이션 System and method for advertisement
US20130132364A1 (en) * 2011-11-21 2013-05-23 Microsoft Corporation Context dependent keyword suggestion for advertising
CN103631963A (en) * 2013-12-18 2014-03-12 北京博雅立方科技有限公司 Keyword optimization processing method and device based on big data
CN103914492A (en) * 2013-01-09 2014-07-09 阿里巴巴集团控股有限公司 Method for query term fusion, method for commodity information publish and method and system for searching
CN104778602A (en) * 2015-03-25 2015-07-15 北京博雅立方科技有限公司 Dynamic adjustment method and device for promotional keywords
CN107463600A (en) * 2017-06-12 2017-12-12 百度在线网络技术(北京)有限公司 Advertisement putting keyword recommendation method and device, advertisement placement method and device
CN108664585A (en) * 2018-05-07 2018-10-16 多盟睿达科技(中国)有限公司 Word method is selected in a kind of advertisement based on big data
US20180307694A1 (en) * 2017-04-25 2018-10-25 Panasonic Intellectual Property Management Co., Ltd. Search method, search apparatus, and nonvolatile computer-readable recording medium
US20180349351A1 (en) * 2017-05-31 2018-12-06 Move, Inc. Systems And Apparatuses For Rich Phrase Extraction
CN110717104A (en) * 2019-10-11 2020-01-21 广州市丰申网络科技有限公司 Keyword advertisement putting automatic negative keyword method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
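Sha Chen (沙晨); Zhang Zhongneng (张忠能): "Research and Application of Keyword Promotion and Advertising on Online Trading Platforms" (网上交易平台中关键字推广和广告研究与应用), Microcomputer Applications (微型电脑应用), no. 11, 20 November 2013 (2013-11-20), pages 39-44 *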

Also Published As

Publication number Publication date
CN113538062B (en) 2024-05-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant