CN115080824A - Target word mining method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN115080824A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210507323.4A
Other languages
Chinese (zh)
Inventor
吕洪亚
谭云飞
刘晓庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210507323.4A
Publication of CN115080824A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Abstract

The disclosure provides a target word mining method and apparatus, an electronic device, and a storage medium, and relates to the field of artificial intelligence technologies such as machine learning and intelligent search. The specific implementation is as follows: mining a plurality of original words corresponding to each material based on information of each material in a material library; performing word expansion based on the plurality of original words of each material to obtain a plurality of expanded words corresponding to each material; and selecting, from the plurality of original words and the plurality of expanded words of each material, a preset number of best-quality target words that match a preset intention corresponding to the material library. The disclosed technology can effectively improve the accuracy of the mined target words, and can thereby bring optimal search traffic to the material library when search association is performed based on the mined target words.

Description

Target word mining method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, in particular to artificial intelligence technologies such as machine learning and intelligent search, and more specifically to a target word mining method and apparatus, an electronic device, and a storage medium.
Background
In a search application scenario, in order for each material in a material library to obtain the maximum search traffic, high-quality words, such as words with a high display count or a high jump click count, need to be provided for each material.
For example, in a practical application scenario, each material provider may supply some tags along with the material, and the tags are words. In order for the materials in the material library to obtain the maximum search traffic, the tags of the materials can be used directly as target words associated with search, so that the corresponding materials can be displayed and clicked when the target words are searched.
Disclosure of Invention
The disclosure provides a target word mining method and device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a method for mining a target word, including:
mining a plurality of original words corresponding to each material based on information of each material in a material library;
performing word expansion based on the plurality of original words of each material to obtain a plurality of expanded words corresponding to each material;
and selecting, from the plurality of original words and the plurality of expanded words of each material, a preset number of best-quality target words that match a preset intention corresponding to the material library.
According to another aspect of the present disclosure, there is provided a target word mining apparatus, including:
a mining module, configured to mine a plurality of original words corresponding to each material based on information of each material in a material library;
an expansion module, configured to perform word expansion based on the plurality of original words of each material to obtain a plurality of expanded words corresponding to each material;
and a screening module, configured to select, from the plurality of original words and the plurality of expanded words of each material, a preset number of best-quality target words that match a preset intention corresponding to the material library.
According to still another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed, cause the at least one processor to perform the method of the above aspect and any possible implementation thereof.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the above-described aspect and any possible implementation.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the aspect and any possible implementation as described above.
According to the technology of the present disclosure, the accuracy of the mined target words can be effectively improved.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device used to implement methods of embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It is to be understood that the described embodiments are only a few, and not all, of the disclosed embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It should be noted that the terminal device involved in the embodiments of the present disclosure may include, but is not limited to, a mobile phone, a Personal Digital Assistant (PDA), a wireless handheld device, a Tablet Computer (Tablet Computer), and other intelligent devices; the display device may include, but is not limited to, a personal computer, a television, and the like having a display function.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
In the conventional technology, target words associated with search are obtained in a simple way: the tags of the materials are taken directly as the corresponding target words. Because the tags are assigned by the material providers and are to some extent non-standard, the accuracy of the target words is poor and optimal search traffic cannot be brought to the material library.
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure. As shown in FIG. 1, this embodiment provides a method for mining a target word, which specifically includes the following steps:
S101, mining a plurality of original words corresponding to each material based on information of each material in a material library;
S102, performing word expansion based on the plurality of original words of each material to obtain a plurality of expanded words corresponding to each material;
S103, selecting, from the plurality of original words and the plurality of expanded words of each material, a preset number of best-quality target words that match a preset intention corresponding to the material library.
The mining method of this embodiment may be executed by a target word mining apparatus. The apparatus may be an electronic entity, or may be a software-integrated application that runs on a computer device when in use and mines, based on a material library, a preset number of best-quality target words that match the preset intention.
The material library of this embodiment may be a resource library or a commodity library, and the materials it contains may accordingly be resources or commodities. A resource may be a video resource, an audio resource, or a text resource. A commodity may be a physical commodity or a service commodity corresponding to a physical commodity. For example, the material library may be a commodity library of an e-commerce platform.
In this embodiment, all basic information of each material may be recorded in the material library, for example, the basic information may include a tag of the material, a topic title of the material, and the like. The tag and the subject title of the material can be provided by a material provider or mined by a management platform based on basic information of the material. The topic title of a material may be the name of the material when it is displayed. In this embodiment, a plurality of original words corresponding to the material may be mined based on the basic information of each material, such as the tag of the material and the topic title of the material.
In practical application, the tag of the material and the topic title of the material are composed of a plurality of words, and a plurality of original words corresponding to the material are obtained by mining all the words in the tag of the material and the topic title of the material.
In this embodiment, since the tag and the topic title may be provided by a material provider, they are to some extent non-standard and their accuracy is poor. To improve the accuracy of the mined target words, word expansion may be performed based on the original words of each material to obtain the expanded words corresponding to each material, which widens the screening range of the target words and improves the accuracy of the screening. Finally, the plurality of original words and the plurality of expanded words of all materials in the material library are taken as a word bank, from which a preset number of best-quality target words that match the preset intention are selected. The preset intention of this embodiment corresponds to the material library and can be determined from the audience the material library is displayed to. For example, if the material library is displayed to business users (to business, ToB for short), the preset intention corresponding to the material library may be a ToB intention. If the material library is displayed to consumers (to customer, ToC for short), the preset intention corresponding to the material library may be a ToC intention; and so on.
For example, if a word matches the preset intention, then when the word is displayed it attracts more attention from the users the material library is aimed at and is more likely to be clicked, so the word may become a best-quality target word for the preset intention. If a word does not match the preset intention, those users may ignore it and not click it when it is displayed, so such a word is not suitable as a best-quality target word for the preset intention. Because this embodiment takes the preset intention corresponding to the material library into account when screening the target words, the accuracy of target word mining can be improved. Therefore, when the target words are used as search association words for the material library, the probability that the materials corresponding to the target words are displayed and jump-clicked is increased, and more search traffic can be obtained.
According to the target word mining method of this embodiment, word expansion is performed based on the plurality of original words of each material in the material library to obtain the plurality of expanded words corresponding to each material, which widens the mining range of the target words and improves the mining accuracy. Moreover, a preset number of best-quality target words that match the preset intention corresponding to the material library are selected from the plurality of original words and the plurality of expanded words of each material, so that the intention of the target words is consistent with the preset intention corresponding to the material library, which further improves the accuracy of the selected target words. Therefore, when the target words mined by this embodiment are used for search association display, more search traffic can be obtained for the material library because the mined target words are highly accurate.
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure. This embodiment describes the technical solution of the present disclosure in more detail on the basis of the embodiment shown in FIG. 1. As shown in FIG. 2, the target word mining method of this embodiment may specifically include the following steps:
S201, mining a plurality of original words corresponding to each material based on the tag and topic title of each material in the material library;
Specifically, each word in the tag and the topic title of each material may be taken as an original word corresponding to the material.
S202, performing category labeling on each original word of each material;
For example, in this embodiment, the categories of the original words may be set according to the information of the materials in the material library. In practical applications, a plurality of categories may be set for the original words, one of which is the core word; the others, such as modifiers and parameter words, may be set according to actual requirements and are not limited herein. For example, for a material library with a ToB intention, the categories of the original words may include geographic location, core word, parameter, and brand, and may also include an "other" category for words that belong to none of these.
Optionally, in practical applications, the category corresponding to each original word may be labeled based on the similarity between each original word and a preset category.
In an embodiment of the present disclosure, a pre-trained category labeling model may be further used to perform category labeling on each original word of each material.
For example, the category of an original word may be labeled manually in advance, and the category labeling model may be trained using manually labeled data. And then, labeling the category of each original word by using the trained category labeling model. By adopting the category labeling model, the category labeling is carried out on each original word of each material, and the labeling efficiency and the category labeling accuracy can be effectively improved.
S203, performing word expansion based on the category of each original word of each material to obtain a plurality of expansion words corresponding to each material;
For example, for each material, the core word among the plurality of original words of the material may be taken as the basis and combined with the original words of the other categories, so as to obtain a plurality of expanded words corresponding to the material.
That is, word expansion may be performed by permutation and combination: based on the core word among the plurality of original words of each material, the core word is combined with the original words of the other categories to obtain a plurality of expanded words.
For example, for a material library with a ToB intention, the categories of the original words corresponding to a certain material may include geographic location, core word, parameter, and brand. Word expansion may then produce the following combinations: geographic location and core word; parameter and core word; brand and core word; geographic location, parameter, and core word; geographic location, brand, and core word; parameter, brand, and core word; and geographic location, brand, parameter, and core word. In total, these 7 combination modes can yield 7 times as many expanded words, providing a rich set of words for target word screening and improving the accuracy of the mined target words.
Steps S202 to S203 in this embodiment are an implementation manner of step S102 in the embodiment shown in fig. 1, and in this manner, a plurality of expansion words corresponding to each material can be expanded, the screening range of the target words is expanded, and the accuracy of the mined target words is improved.
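The combination scheme described above can be illustrated with a minimal Python sketch. It assumes each original word has already been labeled with one of the category codes used later in this description (loc, prd, prm, brd, O); the function name `expand_words`, the space-joined output, the fixed category order, and the sample words are hypothetical choices made only for illustration.

```python
from itertools import combinations, product


def expand_words(labeled_words, core_category="prd", order=("loc", "brd", "prm")):
    """Combine each core word with every non-empty subset of the other categories.

    labeled_words: iterable of (word, category) pairs.
    Words in categories outside `order` (for example "O") are ignored here,
    which is an assumption of this sketch.
    """
    by_category = {}
    for word, category in labeled_words:
        by_category.setdefault(category, []).append(word)

    cores = by_category.get(core_category, [])
    modifiers = [c for c in order if c in by_category]

    expanded = []
    for size in range(1, len(modifiers) + 1):
        for chosen in combinations(modifiers, size):
            # One word from each chosen category, followed by the core word.
            for parts in product(*(by_category[c] for c in chosen)):
                for core in cores:
                    expanded.append(" ".join(parts + (core,)))
    return expanded


# Hypothetical sample material with one word per category.
sample = [
    ("Beijing", "loc"),
    ("steel pipe", "prd"),
    ("20mm", "prm"),
    ("AcmeSteel", "brd"),
]
expanded_sample = expand_words(sample)
```

With one word in each of the three non-core categories, the sketch produces the 7 combinations listed above; with several words per category, each combination mode yields one expanded word per choice of words.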
S204, filtering high-risk words in a plurality of original words and a plurality of expansion words of each material;
Because the original words of each material in this embodiment are derived from the tag and the topic title of the material, and the tag and the topic title are mostly provided by the material provider, they are to some extent non-standard. Therefore, in this embodiment, high-risk words among the candidate words need to be filtered out, so as to improve the quality and accuracy of the target words.
The filtering in this embodiment may use a blacklist and/or a whitelist. For example, words among the plurality of original words and the plurality of expanded words of each material that appear in the blacklist are considered high-risk words and need to be filtered out, while words that appear in the whitelist are considered safe words that can be used.
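As a minimal sketch of list-based filtering, the following Python function keeps whitelisted words and drops blacklisted ones. The source does not say how the two lists interact or how words in neither list are treated, so the sketch assumes exact matching, lets the whitelist take precedence, and keeps words that are in neither list; the function name `filter_by_lists` is hypothetical.

```python
def filter_by_lists(words, blacklist=frozenset(), whitelist=frozenset()):
    """Return the words that survive blacklist/whitelist filtering.

    Assumptions of this sketch: exact string matching, whitelist wins over
    blacklist, and words in neither list are retained.
    """
    kept = []
    for word in words:
        if word in whitelist or word not in blacklist:
            kept.append(word)
    return kept
```

If a stricter policy is wanted, for example keeping only whitelisted words, the condition can be changed to `word in whitelist`.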
In addition, in an embodiment of the present disclosure, a pre-trained risk control model may also be used to filter high-risk words among the plurality of original words and the plurality of expanded words of each material.
For example, the training data of the risk control model can be labeled manually: positive samples come from standard words and negative samples come from high-risk words, where high-risk words may include sensitive words, trending search words with a high click rate, and the like. Feature information is then collected for each sample, such as its display count, category, click count, and jump click count within a preset time length before the current moment. The display count is the number of times the word was displayed to users within the preset time length. The category is the category in the material library to which the word belongs; for example, a commodity material library may preset categories such as home appliances, clothing, health, automobiles, and mobile phones based on commodity type, and material libraries for commodities in different fields may define different categories. The preset time length can be set according to actual requirements, for example one week, two weeks, one month, or another length, and is not limited herein. The click count is the number of times the word was clicked within the preset time length. The jump click count is the number of effective jumps that followed a click within the preset time length. Some clicks are user mistakes: no jump occurs after the click, or the page is closed quickly after the jump without being browsed. These are not effective jumps; only a jump that reaches the corresponding page and is followed by a certain browsing time counts as a successful jump.
During training, the feature information of each sample is input into the risk control model, which predicts whether the sample word is a high-risk word; the parameters of the risk control model are then adjusted based on the labels of the positive and negative samples and the prediction results. Training continues with the collected samples until a sufficiently accurate risk control model is obtained. The risk control model can then be used to filter high-risk words among the plurality of original words and the plurality of expanded words of each material.
During filtering, the corresponding feature information, such as display count, category, click count, and jump click count, is obtained for each original word and each expanded word and input into the risk control model, which predicts the probability that the word is a high-risk word based on this feature information. If the probability is greater than a preset probability threshold, the word is considered a high-risk word and is filtered out; otherwise, the word is retained.
It should be noted that the risk control model of this embodiment may be implemented with an XGBoost model.
By filtering out high-risk words, this embodiment cleans the plurality of original words and the plurality of expanded words of each material and improves the quality of those that remain, which in turn improves the quality and accuracy of the target words.
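The threshold rule used when filtering can be sketched as follows. This is only an illustration: the `WordFeatures` record, the `filter_high_risk` function, and the default threshold of 0.5 are hypothetical, and `toy_risk_probability` is a stand-in for a trained model's prediction, not the model of this embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class WordFeatures:
    """Per-word statistics over the preset time length (field names are assumed)."""
    word: str
    display_count: int
    category: str
    click_count: int
    jump_click_count: int


def filter_high_risk(
    candidates: Iterable[WordFeatures],
    risk_probability: Callable[[WordFeatures], float],
    threshold: float = 0.5,
) -> List[str]:
    """Drop words whose predicted risk probability exceeds the threshold."""
    kept = []
    for features in candidates:
        if risk_probability(features) > threshold:
            continue  # considered high-risk: filtered out
        kept.append(features.word)
    return kept


def toy_risk_probability(features: WordFeatures) -> float:
    # Hypothetical stand-in for a trained model: flags one made-up category.
    return 0.9 if features.category == "sensitive" else 0.1


candidates = [
    WordFeatures("steel pipe", 1000, "hardware", 50, 20),
    WordFeatures("flagged word", 5000, "sensitive", 900, 3),
]
safe_words = filter_high_risk(candidates, toy_risk_probability)
```

Passing the scorer in as a callable keeps the rule independent of the model library; a trained classifier's probability output could be wrapped in the same way.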
S205, screening a plurality of intention words which accord with preset intentions from a plurality of original words and a plurality of expanded words of each material;
In other words, in this embodiment, the plurality of original words and the plurality of expanded words of each material are taken as a word bank, and a plurality of intention words that match the preset intention are selected from the word bank.
For example, in this embodiment, it is possible to identify whether each of the plurality of original words and the plurality of expanded words is a word with a preset intention through part-of-speech analysis.
Or optionally, in an embodiment of the present disclosure, a pre-trained preset intention word recognition model may also be adopted to screen a plurality of intention words with preset intentions from a plurality of original words and a plurality of expanded words of each material. For example, when the method is implemented, the method can comprise the following steps:
(1) for each material, acquiring natural features and preset intention features of each word in a plurality of original words and a plurality of expanded words of the material;
For example, the natural features of each word in this embodiment may include at least one of: semantic features of the word, category features of the word in search engine results, and natural result features of the word in the search engine. The natural result feature indicates whether the word ranks within a preset top N of the natural search results of the search engine, where N can be set to 10, 20, or another value according to actual requirements. The categories of search engine results are preset in the search engine, and the category feature in search engine results refers to the category to which the word belongs in those results.
The preset intention features of each word in this embodiment may include at least one of: content features of the preset intention, and category features of the word in the preset intention field. Categories for the various materials are likewise preset in the preset intention field, and the category feature may refer to the category of the material to which the word belongs.
(2) Based on the natural features and preset intention features of each of the plurality of original words and the plurality of expanded words, using a pre-trained preset intention word recognition model to identify the probability that each word is a preset intention word;
In use, the natural features and preset intention features of a word obtained in step (1) are input into the preset intention word recognition model, and the model predicts and outputs the probability that the word is a preset intention word.
(3) Selecting, from the plurality of original words and the plurality of expanded words of each material, a plurality of intention words that match the preset intention, based on a preset probability threshold and the probability that each word is a preset intention word.
Specifically, if the probability predicted by the preset intention word recognition model that a word is a preset intention word is greater than the preset probability threshold, the word is considered an intention word that matches the preset intention; otherwise, it is not. In this way, all words that match the preset intention can be selected from the plurality of original words and the plurality of expanded words of each material, yielding the plurality of intention words. The intention words obtained this way all match the preset intention of the material library and provide a very accurate word source for target word screening.
It should be noted that the preset intention word recognition model of this embodiment needs to be trained before use. For training, the natural features and preset intention features of the words in the training samples are obtained and the samples are labeled; training then proceeds in a manner similar to that described above and is not repeated here.
Similarly, the preset intention word recognition model of this embodiment may also be implemented with an XGBoost model.
S206, obtaining cost parameters and benefit parameters of each intention word;
In practical applications, the intention words mined in the above manner may bring a huge amount of traffic together with an increase in cost, so that the benefit falls short of expectations. In order to reduce cost and maximize benefit, the cost parameters and benefit parameters of each intention word need to be collected. For example, the display count, click rate, jump click count, and jump click rate of each intention word within a preset time length before the current moment can be counted. The display count of each word is regarded as its cost, and the jump click count of each word is regarded as its benefit. The jump click count refers to the number of effective jump clicks: a jump does not count as successful if the page is closed quickly after the click, if the jump to the corresponding web page fails, or if the browsing time after the jump is less than a preset time threshold. The jump click rate equals the jump click count divided by the display count. For example, the cost parameters of this embodiment may include the display count, the click rate, and the like, and the benefit parameters may include the jump click count, the jump click rate, and the like.
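A minimal sketch of how these statistics might be computed from a click log follows. The `ClickEvent` record, the function names, and the 5-second default threshold are hypothetical; the jump click rate follows the definition given above, while the click rate is assumed to be the click count divided by the display count, which the source does not spell out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClickEvent:
    """One click on a word (hypothetical log record)."""
    jumped: bool           # whether the landing page was actually reached
    browse_seconds: float  # time spent on the landing page


def is_effective_jump(event: ClickEvent, min_browse_seconds: float = 5.0) -> bool:
    """A jump is effective only if it happened and browsing lasted long enough."""
    return event.jumped and event.browse_seconds >= min_browse_seconds


def word_stats(
    display_count: int,
    events: List[ClickEvent],
    min_browse_seconds: float = 5.0,
) -> Dict[str, float]:
    """Cost and benefit parameters of one word over the preset time length."""
    click_count = len(events)
    jump_click_count = sum(is_effective_jump(e, min_browse_seconds) for e in events)
    return {
        "display_count": display_count,  # treated as the cost
        "click_count": click_count,
        # Assumed definition: clicks per display.
        "click_rate": click_count / display_count if display_count else 0.0,
        "jump_click_count": jump_click_count,  # treated as the benefit
        "jump_click_rate": jump_click_count / display_count if display_count else 0.0,
    }


stats = word_stats(
    100,
    [ClickEvent(True, 12.0), ClickEvent(True, 1.0), ClickEvent(False, 0.0)],
)
```

In this example only the first click counts as an effective jump: the second was closed too quickly and the third never reached the page.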
And S207, obtaining a preset number of target words with the maximum profit from the multiple intention words based on the cost parameters and the profit parameters of the intention words by a method for solving the maximum profit based on dynamic planning.
In a practical application scenario, the presentation amount of the words is not infinite, but there is a certain limitation on the presentation amount. Therefore, the method for obtaining the preset number of target words with the maximum profit from the plurality of intention words can be converted into the method for obtaining the optimal scheme, namely the optimal preset number of target words, by maximizing the skip click amount under a certain display amount. In this embodiment, the method for solving the maximum profit by using dynamic programming may obtain a preset number of target words with the maximum profit from the plurality of intention words based on the presentation amount, click rate, skip click rate, and skip click rate of each intention word in a preset time length before the current time. The method for solving the maximum benefit can be referred to related dynamic programming in detail, and details are not repeated herein.
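The source leaves the dynamic-programming details open. One plausible reading, shown here purely as a sketch, is a 0/1 knapsack with a cap on the number of selected words: each word's display count is its cost, its jump click count is its benefit, and at most `k` words may be chosen within a total display budget. The function name, the "at most k" interpretation, and the requirement that costs be non-negative integers are assumptions of this sketch.

```python
from typing import List, Tuple


def select_target_words(
    words: List[Tuple[str, int, int]],
    k: int,
    budget: int,
) -> Tuple[int, Tuple[str, ...]]:
    """Pick at most k words with total cost <= budget, maximizing total benefit.

    words: (word, cost, benefit) triples, where cost is a non-negative integer
    (for example a display count, possibly bucketed) and benefit is a jump
    click count.
    Returns (best_benefit, chosen_words).
    """
    # best[j][b] = (benefit, chosen) using at most j words with total cost <= b.
    best = [[(0, ())] * (budget + 1) for _ in range(k + 1)]
    for word, cost, benefit in words:
        # Go from large j to small j so that row j - 1 still describes the
        # state before this word was considered (each word is used once).
        for j in range(k, 0, -1):
            for b in range(budget, cost - 1, -1):
                prev_benefit, prev_chosen = best[j - 1][b - cost]
                if prev_benefit + benefit > best[j][b][0]:
                    best[j][b] = (prev_benefit + benefit, prev_chosen + (word,))
    return best[k][budget]


# Hypothetical intention words: (word, display cost, jump click benefit).
intent_words = [("a", 3, 4), ("b", 4, 5), ("c", 2, 3), ("d", 5, 8)]
best_benefit, chosen = select_target_words(intent_words, k=2, budget=7)
```

The table has (k + 1) × (budget + 1) cells, so in practice display counts would likely be bucketed (for example into thousands) to keep the budget dimension small.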
According to the target word mining method of this embodiment, word expansion enriches the candidate words, which widens the mining range of the target words and improves their mining accuracy. Moreover, screening the plurality of intention words with the preset intention word recognition model narrows the selection range of the target words and improves the mining efficiency. Finally, obtaining the target words with the maximum profit by the dynamic-programming-based method for solving the maximum profit effectively improves the accuracy of the target words with the maximum profit.
The technical solution of the present disclosure is described below by taking as an example a scenario in which a commodity information base provided by merchants serves as the material library and the preset intention is a ToB (to-business) intention. For example, the method may specifically include the following steps:
1. Analyzing the tag and title of each commodity provided by each merchant to obtain a plurality of original words of each commodity, and performing word expansion based on the core words to obtain a word bank comprising the plurality of original words and a plurality of expansion words of each commodity;
For example, the original words can be manually labeled with five categories: geographic location (loc), core word (prd), parameter (prm), brand (brd), and other words (O). Since the tag and the title provided by a merchant each consist of many words, the original words may be regarded as the tag words and title words obtained by analyzing the information provided by the merchant.
Specifically, 30,000 (3w) tag words and title words provided by the merchants can be sampled for manual labeling to serve as training data. After the labeling is finished, a model is trained with an ERNIE + CRF framework; evaluation shows an accuracy above 95%, and the resulting model serves as the category labeling model. The trained category labeling model is then used to label the categories of all original words.
To further expand the word bank, the core words corresponding to all commodities are extracted, and expansion words are then formed by combining: the geographic location with the core word; the parameter with the core word; the brand with the core word; the geographic location and the parameter with the core word; the geographic location and the brand with the core word; the parameter and the brand with the core word; and the geographic location, the brand and the parameter with the core word. After this expansion, the word bank includes not only the original words of all materials but also the expansion words obtained in the above ways, so the size of the word bank is greatly increased. This avoids inaccurate target words caused by insufficiently accurate original words, and makes the mining range of the target words wider and the accuracy higher.
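The seven combination patterns described above can be sketched in Python as follows. The function name, the dictionary layout keyed by the category labels (loc, brd, prm, prd), the modifier order, and the `joiner` parameter are illustrative assumptions, and "Acme" is a made-up brand.

```python
from itertools import combinations, product

# Modifier categories placed before the core word; the order is an assumption.
MODIFIER_CATEGORIES = ("loc", "brd", "prm")


def expand_words(labeled_words, joiner=""):
    """labeled_words maps a category label to the original words of one
    commodity. Returns expansion words built from every non-empty subset
    of modifier categories followed by a core word (prd)."""
    cores = labeled_words.get("prd", [])
    expanded = []
    for size in range(1, len(MODIFIER_CATEGORIES) + 1):
        for cats in combinations(MODIFIER_CATEGORIES, size):
            pools = [labeled_words.get(cat, []) for cat in cats]
            # product() yields nothing when any required category is missing.
            for modifiers in product(*pools):
                for core in cores:
                    expanded.append(joiner.join(modifiers + (core,)))
    return expanded


example = expand_words(
    {"loc": ["Beijing"], "brd": ["Acme"], "prm": ["500W"], "prd": ["blender"]},
    joiner=" ",
)
```

With one word per category this produces seven expansion words, one for each combination pattern listed in the text.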
2. Filtering out high-risk words in the word bank by using a risk control model;
When filling in the tag and title, a merchant may add high-traffic but non-compliant words in order to increase traffic exposure, and such words often carry considerable risk. To solve this problem, this embodiment constructs a risk control model to filter out high-risk words. The risk control model may be trained as described in the above embodiments, which is not repeated here.
3. Filtering out words in the word bank that do not match the ToB intention by using a ToB intention word recognition model, to obtain a plurality of intention words of the ToB intention;
The ToB intention word recognition model of this embodiment can be constructed in the same way as the preset intention word recognition model of the above embodiment. In this embodiment, the ToB intention word recognition model may be used to identify a plurality of intention words of the ToB intention in the word bank.
For example, the ToB intention may include ToB commodities, ToB commodity services, ToB commodity addressing, and the like. For example, "steamed stuffed bun maker" is a ToB commodity service, and "where steamed stuffed buns are sold" is ToB commodity addressing. The ToB intention word recognition model of this embodiment recognizes ToB intention words based on the categories that the ToB intention includes.
4. Querying the cost parameter and the profit parameter of each intention word;
In this embodiment, the display amount, the click rate, the jump click amount and the jump click rate of each intention word can be queried by building an automatic query tool.
5. Obtaining a preset number of target words with the maximum profit from the plurality of intention words by the dynamic-programming-based method for solving the maximum profit, based on the cost parameter and the profit parameter of each intention word.
By adopting the method, the preset number of target words of the ToB intention in the commodity library can be accurately mined, and further, the search association display can be carried out based on the mined target words. Due to the fact that the accuracy of the mined target words is very high, more search flow can be effectively obtained for the commodity library when the target words are displayed.
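The five steps above can be tied together in a short Python sketch. Every callable passed in (`mine`, `expand`, `is_high_risk`, `is_intention_word`, `query_stats`, `select`) is a hypothetical stand-in for the corresponding model or tool, and the de-duplication of the word bank is an added assumption rather than something the disclosure states.

```python
def mine_target_words(materials, mine, expand, is_high_risk,
                      is_intention_word, query_stats, select):
    """Run the five steps in order; each stage is injected as a callable."""
    word_bank = []
    for material in materials:
        originals = mine(material)           # step 1: original words
        word_bank.extend(originals)
        word_bank.extend(expand(originals))  # step 1: expansion words
    # Assumption: drop duplicates while keeping first-seen order.
    word_bank = list(dict.fromkeys(word_bank))
    safe = [w for w in word_bank if not is_high_risk(w)]      # step 2
    intention = [w for w in safe if is_intention_word(w)]     # step 3
    stats = [(w, *query_stats(w)) for w in intention]         # step 4
    return select(stats)                                      # step 5


# Toy stand-ins, only to show how the stages connect.
targets = mine_target_words(
    ["red shoes", "blue shoes"],
    mine=str.split,
    expand=lambda ws: [" ".join(ws)],
    is_high_risk=lambda w: w == "red",
    is_intention_word=lambda w: "shoes" in w,
    query_stats=lambda w: (len(w), 1),  # (cost, profit) placeholders
    select=lambda stats: [w for w, cost, gain in stats][:2],
)
```

Passing the stages in as callables keeps the sketch independent of any particular labeling, risk control, or intention model.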
FIG. 3 is a schematic illustration according to a third embodiment of the present disclosure; as shown in fig. 3, the present embodiment provides a target word mining device 300, including:
the mining module 301 is configured to mine a plurality of original words corresponding to each material based on information of each material in the material library;
an expansion module 302, configured to perform word expansion based on a plurality of original words of each material to obtain a plurality of expansion words corresponding to each material;
the screening module 303 is configured to screen a preset number of target words with the best quality according with a preset intention corresponding to the material library from the plurality of original words and the plurality of expanded words of each material.
The target word mining apparatus 300 of this embodiment uses the above modules to mine target words. Its implementation principle and technical effect are the same as those of the related method embodiments; for details, reference may be made to the description of the related method embodiments, which is not repeated here.
FIG. 4 is a schematic diagram according to a fourth embodiment of the present disclosure. As shown in FIG. 4, this embodiment provides a target word mining apparatus 400, which describes the technical solution of the present disclosure in further detail on the basis of the embodiment shown in FIG. 3. As shown in FIG. 4, the target word mining apparatus 400 of this embodiment includes functional modules with the same names and functions as those shown in FIG. 3: a mining module 401, an expansion module 402, and a screening module 403.
As shown in FIG. 4, in this embodiment, the expansion module 402 includes:
the labeling unit 4021 is used for performing category labeling on each original word of each material;
the expansion unit 4022 is configured to perform word expansion based on the category of each original word of each material to obtain a plurality of expansion words corresponding to each material.
Further optionally, in an embodiment of the present disclosure, the labeling unit 4021 is configured to:
perform category labeling on each original word of each material by using a pre-trained category labeling model.
Further optionally, in an embodiment of the present disclosure, the expanding unit 4022 is configured to:
and for each material, combining the core words in the plurality of original words of the material with the original words of other categories in the plurality of original words of the material on the basis of the core words, and expanding to obtain a plurality of expanded words corresponding to the material.
Further optionally, as shown in fig. 4, in an embodiment of the present disclosure, the mining apparatus 400 for the target word further includes:
and a filtering module 404, configured to filter high-risk words in the plurality of original words and the plurality of expanded words of each material.
Further optionally, in an embodiment of the present disclosure, the filtering module 404 is configured to:
filter out high-risk words in the plurality of original words and the plurality of expanded words of each material by using a pre-trained risk control model.
Further optionally, in an embodiment of the present disclosure, the screening module 403 includes:
the screening unit 4031 is used for screening a plurality of intention words which accord with preset intentions from a plurality of original words and a plurality of expanded words of each material;
a parameter obtaining unit 4032, configured to obtain a cost parameter and a benefit parameter of each intention word;
the target word obtaining unit 4033 is configured to obtain, by a dynamic-programming-based method for solving the maximum profit, a preset number of target words with the maximum profit from the plurality of intention words based on the cost parameter and the profit parameter of each intention word.
Further optionally, in an embodiment of the present disclosure, the screening unit 4031 is configured to:
and screening a plurality of intention words with preset intentions from a plurality of original words and a plurality of extension words of each material by adopting a pre-trained preset intention word recognition model.
Further optionally, in an embodiment of the present disclosure, the screening unit 4031 is configured to:
for each material, acquiring natural features and preset intention features of each word in a plurality of original words and a plurality of expanded words of the material;
based on natural features and preset intention features of all the words in the original words and the expanded words, adopting a pre-trained preset intention word recognition model to recognize the probability that all the words in the original words and the expanded words are preset intention words;
and screening a plurality of intention words which accord with preset intentions from the plurality of original words and the plurality of extension words of each material based on the probability that each of the plurality of original words and the plurality of extension words is a preset intention word and a preset probability threshold.
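The three operations of the screening unit can be sketched as below. `featurize` and `predict_proba` are hypothetical callables standing in for the feature extraction and the pre-trained preset intention word recognition model; comparing with `>=` rather than `>` is also an assumption.

```python
def screen_intention_words(words, featurize, predict_proba, threshold):
    """Keep the words whose predicted probability of being a preset
    intention word reaches the preset probability threshold."""
    kept = []
    for word in words:
        # Natural features and preset intention features of the word.
        features = featurize(word)
        # Probability that the word is a preset intention word.
        if predict_proba(features) >= threshold:
            kept.append(word)
    return kept


# Toy stand-ins: the "feature" is the word length and the "model"
# maps longer words to higher probabilities.
demo = screen_intention_words(
    ["bun", "bun maker", "bun wholesale"],
    featurize=len,
    predict_proba=lambda n: min(n / 10.0, 1.0),
    threshold=0.8,
)
```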
Further optionally, in an embodiment of the present disclosure, the parameter obtaining unit 4032 is configured to:
acquire the display amount, the click rate, the jump click amount and the jump click rate of each intention word within a preset time length before the current time.
The target word mining apparatus 400 of this embodiment uses the above modules to mine target words. Its implementation principle and technical effect are the same as those of the related method embodiments; for details, reference may be made to the description of the related method embodiments, which is not repeated here.
In the technical solution of the present disclosure, the acquisition, storage, application and other handling of the personal information of the users involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the apparatus 500 comprises a computing unit 501 which may perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM)502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The calculation unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the various methods and processes described above, such as the above-described methods of the present disclosure. For example, in some embodiments, the above-described methods of the present disclosure may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the above-described methods of the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured by any other suitable means (e.g., by means of firmware) to perform the above-described methods of the present disclosure.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A method for mining a target word comprises the following steps:
mining a plurality of original words corresponding to each material based on the information of each material in a material library;
performing word expansion on the basis of the original words of the materials to obtain expanded words corresponding to the materials;
and screening a preset number of target words with the best quality according with a preset intention corresponding to the material library from the plurality of original words and the plurality of expanded words of each material.
2. The method of claim 1, wherein performing word expansion based on the plurality of original words of each of the materials to obtain a plurality of expanded words corresponding to each of the materials comprises:
performing category labeling on each original word of each material;
and performing word expansion on the basis of the category of each original word of each material to obtain a plurality of expansion words corresponding to each material.
3. The method of claim 2, wherein performing category labeling on each original word of each material comprises:
and carrying out category labeling on each original word of each material by adopting a pre-trained category labeling model.
4. The method of claim 2, wherein performing word expansion based on the category of each original word of each material to obtain the multiple expanded words corresponding to each material comprises:
and for each material, combining the core words in the plurality of original words of the material with original words of other categories in the plurality of original words of the material on the basis of the core words, and expanding to obtain the plurality of expanded words corresponding to the material.
5. The method according to claim 1, wherein before the step of selecting a preset number of target words with the best quality according with a preset intention corresponding to the material library from the plurality of original words and the plurality of expanded words of each material, the method further comprises:
filtering high-risk words in the plurality of original words and the plurality of expanded words of each of the materials.
6. The method of claim 5, wherein filtering high-risk words in the plurality of original words and the plurality of expanded words of each of the materials comprises:
and filtering high-risk words in the plurality of original words and the plurality of expansion words of each material by adopting a pre-trained risk control model.
7. The method according to any one of claims 1 to 6, wherein the step of screening a preset number of target words with the best quality according to a preset intention corresponding to the material library from the plurality of original words and the plurality of expanded words of each material comprises:
screening a plurality of intention words which accord with the preset intention from the plurality of original words and the plurality of expanded words of each material;
acquiring a cost parameter and a profit parameter of each intention word;
and obtaining the target words with the maximum profit from the plurality of intention words according to the cost parameter and the profit parameter of each intention word.
8. The method of claim 7, wherein the screening a plurality of intention words that meet the preset intention from the plurality of original words and the plurality of expanded words of each of the materials comprises:
and screening the plurality of intention words with the preset intention from the plurality of original words and the plurality of expanded words of each material by adopting a pre-trained preset intention word recognition model.
9. The method of claim 8, wherein the screening the plurality of intention words of the preset intention from the plurality of original words and the plurality of expanded words of each of the materials using a pre-trained preset intention word recognition model comprises:
for each material, acquiring natural features and preset intention features of each word in the plurality of original words and the plurality of expanded words of the material;
based on natural features and preset intention features of all the original words and the expanded words, adopting a pre-trained preset intention word recognition model to recognize the probability that all the original words and the expanded words are preset intention words;
and screening the plurality of intention words which accord with the preset intention from the plurality of original words and the plurality of extension words of each material based on the probability that each of the plurality of original words and the plurality of extension words is a preset intention word and a preset probability threshold.
10. The method of claim 7, wherein obtaining a cost parameter and a benefit parameter for each of the intent words comprises:
and obtaining the display amount, the click rate, the jump click amount and the jump click rate of each intention word within a preset time length before the current time.
11. An apparatus for mining a target word, comprising:
a mining module, configured to mine a plurality of original words corresponding to each material based on the information of each material in a material library;
the expansion module is used for performing word expansion on the basis of the original words of the materials to obtain a plurality of expansion words corresponding to the materials;
and the screening module is used for screening a preset number of target words with the best quality according with preset intentions corresponding to the material library from the plurality of original words and the plurality of expanded words of each material.
12. The apparatus of claim 11, wherein the expansion module comprises:
the labeling unit is used for labeling the category of each original word of each material;
and the expansion unit is used for performing word expansion on the basis of the category of each original word of each material to obtain the plurality of expansion words corresponding to each material.
13. The apparatus of claim 12, wherein the labeling unit is configured to:
and carrying out category labeling on each original word of each material by adopting a pre-trained category labeling model.
14. The apparatus of claim 12, wherein the expansion unit is configured to:
and for each material, combining the core words in the plurality of original words of the material with original words of other categories in the plurality of original words of the material on the basis of the core words, and expanding to obtain the plurality of expanded words corresponding to the material.
15. The apparatus of claim 11, wherein the apparatus further comprises:
a filtering module for filtering high-risk words in the plurality of original words and the plurality of expanded words of each of the materials.
16. The apparatus of claim 15, wherein the filtering module is configured to:
and filtering high-risk words in the plurality of original words and the plurality of expansion words of each material by adopting a pre-trained risk control model.
17. The apparatus of any one of claims 11-16, wherein the screening module comprises:
the screening unit is used for screening a plurality of intention words which accord with the preset intention from the plurality of original words and the plurality of expanded words of each material;
a parameter obtaining unit, configured to obtain a cost parameter and a profit parameter of each intention word;
and the target word acquisition unit is used for obtaining, by a dynamic-programming-based method for solving the maximum profit, the target words with the maximum profit from the plurality of intention words according to the cost parameter and the profit parameter of each intention word.
18. The apparatus of claim 17, wherein the screening unit is configured to:
and screening the plurality of intention words with the preset intention from the plurality of original words and the plurality of expanded words of each material by adopting a pre-trained preset intention word recognition model.
19. The apparatus of claim 18, wherein the screening unit is configured to:
for each material, acquiring natural features and preset intention features of each word in the plurality of original words and the plurality of expanded words of the material;
based on natural features and preset intention features of all the original words and the expanded words, adopting a pre-trained preset intention word recognition model to recognize the probability that all the original words and the expanded words are preset intention words;
and screening the plurality of intention words which accord with the preset intention from the plurality of original words and the plurality of extension words of each material based on the probability that each of the plurality of original words and the plurality of extension words is a preset intention word and a preset probability threshold.
20. The apparatus of claim 17, wherein the parameter obtaining unit is configured to:
and obtaining the display amount, the click rate, the jump click amount and the jump click rate of each intention word within a preset time length before the current time.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
CN202210507323.4A 2022-05-10 2022-05-10 Target word mining method and device, electronic equipment and storage medium Pending CN115080824A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210507323.4A CN115080824A (en) 2022-05-10 2022-05-10 Target word mining method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210507323.4A CN115080824A (en) 2022-05-10 2022-05-10 Target word mining method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115080824A true CN115080824A (en) 2022-09-20

Family

ID=83246821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210507323.4A Pending CN115080824A (en) 2022-05-10 2022-05-10 Target word mining method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115080824A (en)

Similar Documents

Publication Publication Date Title
CN108521439B (en) Message pushing method and device
CN110019616B (en) POI (Point of interest) situation acquisition method and equipment, storage medium and server thereof
CN105550173A (en) Text correction method and device
CN104850546B (en) Display method and system of mobile media information
CN108334568B (en) House resource pushing method, device, equipment and computer readable storage medium
CN107222526B (en) Method, device and equipment for pushing promotion information and computer storage medium
CN113079417B (en) Method, device and equipment for generating bullet screen and storage medium
CN113157947A (en) Knowledge graph construction method, tool, device and server
EP3961426A2 (en) Method and apparatus for recommending document, electronic device and medium
CN108288208A (en) Method, apparatus, medium and equipment for determining display objects based on image content
CN107609192A (en) Supplementary search method and device for a search engine
CN112765452A (en) Search recommendation method and device and electronic equipment
CN115080824A (en) Target word mining method and device, electronic equipment and storage medium
CN114428902A (en) Information searching method and device, electronic equipment and storage medium
CN116955817A (en) Content recommendation method, device, electronic equipment and storage medium
CN114265777B (en) Application program testing method and device, electronic equipment and storage medium
CN114491232B (en) Information query method and device, electronic equipment and storage medium
CN114139052B (en) Ranking model training method for intelligent recommendation, intelligent recommendation method and device
CN114090601B (en) Data screening method, device, equipment and storage medium
CN114461749B (en) Data processing method and device for conversation content, electronic equipment and medium
CN113190746B (en) Recommendation model evaluation method and device and electronic equipment
CN114862479A (en) Information pushing method and device, electronic equipment and medium
CN113722593A (en) Event data processing method and device, electronic equipment and medium
CN113220947A (en) Method and device for encoding event characteristics
CN113010782A (en) Demand amount acquisition method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination