CN117668343A - Keyword extraction method and device, electronic equipment and storage medium - Google Patents

Keyword extraction method and device, electronic equipment and storage medium

Info

Publication number
CN117668343A
Authority
CN
China
Prior art keywords
words
word
target
preset
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311616287.6A
Other languages
Chinese (zh)
Inventor
刘文才
邵明星
朱朴
唐宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202311616287.6A priority Critical patent/CN117668343A/en
Publication of CN117668343A publication Critical patent/CN117668343A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9532Query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/9032Query formulation
    • G06F16/90332Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a keyword extraction method, a keyword extraction device, an electronic device, and a storage medium. The keyword extraction method comprises: performing feature extraction on a target webpage to obtain feature words of the target webpage; inputting the feature words into a preset search engine and querying search suggestion words associated with the feature words, the search suggestion words being obtained by analyzing the feature words based on user search history data; performing word segmentation on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining target keywords from the segmented words according to the word frequency statistics results and the word lengths; and, in response to an information crawling request from the preset search engine for the target webpage, returning the target keywords to the preset search engine so that the preset search engine records the target webpage based on the target keywords. Because search suggestion words reflect users' search needs to a certain extent, the target keywords also match those needs, so the target webpage can be better optimized for search engines.

Description

Keyword extraction method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a keyword extraction method, a keyword extraction device, an electronic device, and a storage medium.
Background
Search engine optimization (SEO) is a technique that improves the ranking of web pages in a search engine by analyzing the search engine's ranking rules and optimizing web page content accordingly, thereby increasing web page traffic.
Search engines use complex algorithms to analyze and rank web page content. If the keywords of a web page closely match the search terms entered by a user, the web page is likely to rank higher in the search results. Therefore, in SEO, optimizing keywords can improve the ranking of web pages in the search engine and increase their exposure.
However, in the prior art, keyword extraction is generally based on the webpage content alone and does not take users' search needs and preferences into account. As a result, the extracted keywords often fail to match what users actually search for, and the SEO effect for the webpage is not ideal.
Disclosure of Invention
In order to solve the technical problems, the application discloses a keyword extraction method, a keyword extraction device, electronic equipment and a storage medium, so as to at least solve the problems that the extracted keywords in the related technology are not well matched with the actual search requirements of users, and the SEO optimization effect of the webpage is not ideal. The technical scheme of the present disclosure is as follows:
In a first aspect, the present application shows a keyword extraction method, the method including:
performing feature extraction on a target webpage to obtain feature words of the target webpage;
inputting the feature words into a preset search engine, and inquiring search suggestion words associated with the feature words; the search suggestion words are obtained by analyzing the feature words based on the user search history data;
performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining target keywords from the segmented words according to word frequency statistics results and word lengths;
and responding to the information crawling request of the preset search engine on the target webpage, and returning the target keyword to the preset search engine so that the preset search engine records the target webpage based on the target keyword.
Optionally, the word frequency statistics is performed on the word segmentation, and the determining the target keyword from the word segmentation according to the word frequency statistics result and the word length includes:
counting the word frequency of each segmented word, and determining, according to a preset word frequency weight and a preset word length weight, a weighted sum of the word frequency and the word length of the segmented word as the sorting score of the segmented word;
selecting a preset number of target segmented words from the segmented words in descending order of sorting score;
and combining the preset number of target segmented words to obtain the target keywords.
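The scoring and selection steps above can be illustrated with a minimal sketch. The weight values, the `top_n` count, the comma separator, and the function names `rank_segments` and `combine` are assumptions made for illustration; the application only states that the weights and the number are preset.

```python
from collections import Counter

def rank_segments(segments, freq_weight=0.7, len_weight=0.3, top_n=3):
    """Return the top_n distinct segments by weighted sum of frequency and length."""
    counts = Counter(segments)  # word frequency of each segmented word
    scores = {
        word: freq_weight * count + len_weight * len(word)
        for word, count in counts.items()
    }
    # Descending by score; ties are broken alphabetically so the result is deterministic.
    ranked = sorted(scores, key=lambda word: (-scores[word], word))
    return ranked[:top_n]

def combine(target_segments, sep=","):
    """Join the selected segments into one keyword string (separator is an assumption)."""
    return sep.join(target_segments)

segments = ["drama", "online", "drama", "free", "online", "drama", "novel"]
top = rank_segments(segments, top_n=2)
keywords = combine(top)
```

With these sample weights, a frequent segment outranks a merely long one, but a longer segment wins when frequencies are equal.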
Optionally, the word segmentation processing is performed on the search suggestion word to obtain a plurality of segmented words, word frequency statistics is performed on the segmented words, and a target keyword is determined from the segmented words according to a word frequency statistics result and a word length, including:
performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining a first keyword from the segmented words according to word frequency statistics results and word lengths;
performing natural language processing on the search suggestion words to generate second keywords corresponding to the search suggestion words;
and taking the first keyword and the second keyword as the target keyword.
Optionally, the probability that a segmented word is selected as a target keyword is positively correlated with the word frequency and the word length of the segmented word.
Optionally, before the target keyword is returned to the preset search engine in response to the information crawling request of the preset search engine for the target webpage, the method further includes:
generating a key value pair corresponding to the target keyword; the value of the key value pair is the target keyword, and the key of the key value pair is the content identifier of the target webpage;
storing the key value pairs into a preset database;
the responding to the information crawling request of the preset search engine to the target webpage returns the target keyword to the preset search engine, and the method comprises the following steps:
responding to the information crawling request of the preset search engine on the target webpage, and inquiring the key value pair from the preset database according to the content identification of the target webpage; the information crawling request carries the content identification of the target webpage;
and returning the target keywords contained in the key value pairs to the preset search engine.
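As a rough sketch of the storage and lookup flow described above, an in-memory dictionary can stand in for the preset database. The class name `KeywordStore`, the request field `content_id`, and the sample identifier are hypothetical; the application does not specify a database or a request format.

```python
class KeywordStore:
    """In-memory stand-in for the preset database (an assumption for illustration)."""

    def __init__(self):
        self._db = {}

    def save(self, content_id, target_keyword):
        # Key: content identifier of the target webpage; value: target keyword.
        self._db[content_id] = target_keyword

    def handle_crawl_request(self, request):
        # The crawl request is assumed to carry the content identifier.
        content_id = request["content_id"]
        return self._db.get(content_id)  # None when no keyword is stored

store = KeywordStore()
store.save("video-1", "drama,online")
response = store.handle_crawl_request({"content_id": "video-1"})
```

Storing the keyword ahead of time means the crawl request is answered with a single lookup instead of rerunning the extraction.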
Optionally, the word segmentation processing is performed on the search suggestion word to obtain a plurality of segmented words, word frequency statistics is performed on the segmented words, and a target keyword is determined from the segmented words according to a word frequency statistics result and a word length, including:
performing stop-word removal on the search suggestion word and filtering out punctuation marks in the search suggestion word to obtain a word to be analyzed;
performing word segmentation processing on the word to be analyzed to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining target keywords from the segmented words according to word frequency statistics results and word lengths.
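The preprocessing described above might look like the following sketch. The stop-word list is a small hypothetical sample, only ASCII punctuation is filtered, and whitespace splitting is used solely to locate stop words; a real deployment would use a stop-word list and punctuation set suited to the language of the suggestion words.

```python
import string

STOP_WORDS = {"the", "of", "a", "to"}  # hypothetical sample list

def clean_suggestion(text, stop_words=STOP_WORDS):
    """Strip ASCII punctuation and stop words, returning the text to be analyzed."""
    # Replace every ASCII punctuation character with a space.
    table = str.maketrans({ch: " " for ch in string.punctuation})
    tokens = text.translate(table).lower().split()
    return " ".join(t for t in tokens if t not in stop_words)

cleaned = clean_suggestion("The author of the novel, online!")
```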
Optionally, the method further comprises:
and returning, according to a preset period, to the step of performing feature extraction on the target webpage to obtain the feature words of the target webpage.
Optionally, the target webpage includes a streaming media webpage, and the extracting the features of the target webpage to obtain feature words of the target webpage includes:
and extracting a video title of the streaming media webpage as a feature word of the streaming media webpage.
Optionally, the inputting the feature word into a preset search engine, querying a search suggestion word associated with the feature word includes:
and calling a search suggestion word interface of the preset search engine, and taking the feature words as input parameters of the search suggestion word interface so that the preset search engine inquires the search suggestion words associated with the feature words.
Optionally, the preset search engine is configured to store the target keyword and the webpage identifier of the target webpage correspondingly, and return the webpage identifier of the target webpage after receiving the search request for the target keyword.
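On the search engine side, the correspondence between target keywords and webpage identifiers can be pictured as a simple inverted index. This is an illustrative sketch only; the class name `KeywordIndex` and its methods are hypothetical and do not describe how any particular search engine is implemented.

```python
from collections import defaultdict

class KeywordIndex:
    """Toy inverted index mapping a keyword to the webpages recorded under it."""

    def __init__(self):
        self._index = defaultdict(set)

    def record(self, page_id, target_keywords):
        # Store each target keyword together with the webpage identifier.
        for keyword in target_keywords:
            self._index[keyword].add(page_id)

    def search(self, keyword):
        # Return the identifiers of webpages recorded under the keyword.
        return sorted(self._index.get(keyword, ()))

index = KeywordIndex()
index.record("page-1", ["drama", "online"])
index.record("page-2", ["drama"])
hits = index.search("drama")
```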
In a second aspect, an embodiment of the present invention provides a keyword extraction apparatus, including:
the extraction module is used for performing feature extraction on the target webpage to obtain feature words of the target webpage;
the query module is used for inputting the feature words into a preset search engine and querying search suggestion words associated with the feature words; the search suggestion words are obtained by analyzing the feature words based on the user search history data;
the analysis module is used for carrying out word segmentation processing on the search suggestion words to obtain a plurality of segmented words, carrying out word frequency statistics on the segmented words, and determining target keywords from the segmented words according to word frequency statistics results and word lengths;
and the response module is used for responding to the information crawling request of the preset search engine on the target webpage and returning the target keyword to the preset search engine so that the preset search engine records the target webpage based on the target keyword.
Optionally, the analysis module is specifically configured to:
counting the word frequency of each segmented word, and determining, according to a preset word frequency weight and a preset word length weight, a weighted sum of the word frequency and the word length of the segmented word as the sorting score of the segmented word;
selecting a preset number of target segmented words from the segmented words in descending order of sorting score;
and combining the preset number of target segmented words to obtain the target keywords.
Optionally, the analysis module is specifically configured to:
performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining a first keyword from the segmented words according to word frequency statistics results and word lengths;
performing natural language processing on the search suggestion words to generate second keywords corresponding to the search suggestion words;
and taking the first keyword and the second keyword as the target keyword.
Optionally, the probability that a segmented word is selected as a target keyword is positively correlated with the word frequency and the word length of the segmented word.
Optionally, the apparatus further comprises a storage module for:
generating a key value pair corresponding to the target keyword; the value of the key value pair is the target keyword, and the key of the key value pair is the content identifier of the target webpage;
storing the key value pairs into a preset database;
the response module is specifically configured to:
responding to the information crawling request of the preset search engine on the target webpage, and inquiring the key value pair from the preset database according to the content identification of the target webpage;
and returning the target keywords contained in the key value pairs to the preset search engine.
Optionally, the analysis module is specifically configured to:
performing stop-word removal on the search suggestion word and filtering out punctuation marks in the search suggestion word to obtain a word to be analyzed;
performing word segmentation processing on the word to be analyzed to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining target keywords from the segmented words according to word frequency statistics results and word lengths.
Optionally, the apparatus further comprises an updating module for:
returning, according to a preset period, to the step of performing feature extraction on the target webpage to obtain the feature words of the target webpage.
Optionally, the target webpage includes a streaming media webpage, and the extracting module is specifically configured to:
and extracting a video title of the streaming media webpage as a feature word of the streaming media webpage.
Optionally, the query module is specifically configured to:
and calling a search suggestion word interface of the preset search engine, and taking the feature words as input parameters of the search suggestion word interface so that the preset search engine inquires the search suggestion words associated with the feature words.
Optionally, the preset search engine is configured to store the target keyword and the webpage identifier of the target webpage correspondingly, and return the webpage identifier of the target webpage after receiving the search request for the target keyword.
In a third aspect, an embodiment of the present invention provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the keyword extraction method of any one of the above.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of a keyword extraction electronic device, cause the keyword extraction electronic device to perform the keyword extraction method of any one of the above.
Compared with the prior art, the application has the following advantages:
extracting the characteristics of the target webpage to obtain characteristic words of the target webpage; inputting the feature words into a preset search engine, and inquiring search suggestion words associated with the feature words; the search suggestion words are obtained by analyzing the feature words based on the user search history data; performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining target keywords from the segmented words according to word frequency statistics results and word lengths; and responding to the information crawling request of the preset search engine on the target webpage, and returning the target keyword to the preset search engine so that the preset search engine records the target webpage based on the target keyword.
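After the feature words of the target webpage are extracted, search suggestion words associated with the feature words are queried, and the target keywords are then determined based on the search suggestion words, so that the preset search engine records the target webpage based on the target keywords. Because search suggestion words reflect users' search needs to a certain extent, the target keywords also match those needs, so the target webpage can be better optimized for search engines.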
Drawings
FIG. 1 is a flow chart of steps of a keyword extraction method of the present application;
FIG. 2 is a schematic diagram of a keyword extraction method of the present application;
FIG. 3 is a block diagram of a keyword extraction apparatus of the present application;
FIG. 4 is a schematic diagram of an electronic device of the present application;
FIG. 5 is a block diagram of an apparatus for keyword extraction of the present application.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
SEO can optimize the keywords of a target webpage by analyzing the ranking rules of a search engine, so that the keywords of the target webpage closely match the search terms entered by users, thereby improving the ranking of the target webpage in the search engine and increasing its traffic.
In the related art, keyword extraction is generally performed based on web page content alone, which may deviate from users' search needs and preferences, so the extracted keywords often fail to match what users actually search for.
Referring to fig. 1, a step flowchart of a keyword extraction method of the present application is shown, which may specifically include the following steps:
in step S11, feature extraction is performed on the target web page, so as to obtain feature words of the target web page.
Specifically, the method may be applied to a server, or to another server or terminal on the same local area network as the server. The executing device acquires the target webpage and performs the subsequent keyword extraction steps. After receiving an information crawling request from the preset search engine for the target webpage, it returns the target keywords corresponding to the webpage identifier of the target webpage, so that the preset search engine records the target webpage based on the target keywords and, when a user searches, ranks the target webpage in the search results based on the target keywords.
The terminal may be a mobile phone, a tablet computer, a desktop computer, a portable notebook computer, or other terminal devices in various forms, which is not limited in this embodiment of the present application.
The server may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms.
Cloud technology refers to a hosting technology that unifies a series of resources such as hardware, software, and networks within a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like that are applied under the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. Background services of technical network systems, such as video websites, picture websites, and other portal websites, require a large amount of computing and storage resources. With the rapid development and application of the internet industry, each item may in the future have its own identifier, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately; and all kinds of industry data require strong backend system support, which can only be realized through cloud computing.
In some embodiments, the servers described above may also be implemented as nodes in a blockchain system. Blockchain (Blockchain) is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, encryption algorithms, and the like. The blockchain is essentially a decentralised database, and is a series of data blocks which are generated by association by using a cryptography method, and each data block contains information of a batch of network transactions and is used for verifying the validity (anti-counterfeiting) of the information and generating a next block. The blockchain may include a blockchain underlying platform, a platform product services layer, and an application services layer.
In this step, first, a target web page is acquired, feature extraction is performed on the target web page, and representative words are extracted from information contained in the target web page according to characteristics of the target web page, and the representative words are used as feature words of the target web page, wherein the feature words have relevance with the target web page.
For example, text information included in the target webpage can be obtained, natural language processing is performed on the text information, and keywords in the text information are extracted and used as feature words of the target webpage; or, the target recognition may be performed on the picture included in the target webpage, the feature word of the target webpage is determined according to the detected target category in the picture, and the like, which is not limited in detail.
For example, if the target webpage is a streaming media webpage, performing feature extraction on the target webpage to obtain feature words of the target webpage, including:
and extracting the video title of the streaming media webpage as a feature word of the streaming media webpage.
The streaming media data is continuously sent to the client by the server, and the client can start playing without downloading all the streaming media data locally. In a streaming media web page, video data is usually included, and the video data is important content in the streaming media web page, so that a video title (title) of the streaming media web page can be extracted as a feature word of the streaming media web page, so as to quickly obtain the key content in the streaming media web page.
In addition, since a video identifier corresponds uniquely to its video data, the video title and the video identifier can together be used as the feature words of the streaming media webpage. For example, the feature words of a streaming media webpage can be parameterized in the following JSON (JavaScript Object Notation) format:
{ "id":1, "title": "return to the road" }
The feature words of the streaming media webpage comprise a video title ("title") and a video identifier ("id"); specifically, the video title is "return to the road" and the video identifier is 1.
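A record in this format could be produced as in the following sketch. The function name `build_feature_record` and the sample title are placeholders introduced for illustration; only the "id" and "title" field names come from the example above.

```python
import json

def build_feature_record(video_id, title):
    """Serialize the video identifier and title as a JSON feature-word record."""
    # ensure_ascii=False keeps non-ASCII titles (e.g. Chinese) readable in the output.
    return json.dumps({"id": video_id, "title": title}, ensure_ascii=False)

record = build_feature_record(1, "sample title")
parsed = json.loads(record)
```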
In step S12, inputting the feature words into a preset search engine, and querying search suggestion words associated with the feature words; the search suggestion words are obtained by analyzing the feature words based on the user search history data.
The search engine is a search system for collecting information from the Internet by using a specific computer program according to a certain strategy, organizing and processing the information, providing search service for users, and displaying the searched related information to the users.
In this step, the feature words are input into the preset search engine, and search suggestion words associated with the feature words are queried. The number of search suggestion words is limited. While staying close to the user's search intention, they help the user correct some input errors and ensure, as far as possible, that corresponding search results can be obtained. The search suggestion words are therefore likely to be used as search inputs in the preset search engine and can reflect the search needs of most of its users, which facilitates optimizing the SEO keywords of the target webpage.
Specifically, the feature words are input into a preset search engine, search suggestion words associated with the feature words are inquired, and the method comprises the following steps:
and calling a search suggestion word interface of the preset search engine, and taking the feature words as input parameters of the search suggestion word interface so that the search suggestion words associated with the feature words are inquired by the preset search engine.
When a user types certain words into the search box of the preset search engine as input parameters, the preset search engine automatically generates words that complete what the user may be searching for. This function is called the search suggestion function, the interface for calling it is called the search suggestion word interface, and the words generated by the preset search engine are called search suggestion words. Therefore, by calling the search suggestion word interface of the preset search engine, the associated search suggestion words can be quickly queried from the feature words, which improves keyword extraction efficiency.
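Calling such an interface typically amounts to sending the feature word as a query parameter and parsing the returned list. The sketch below only builds the request URL and parses a sample response, without any network call. The endpoint `https://search.example.com/suggest`, the parameter name `q`, and the `{"suggestions": [...]}` response shape are all hypothetical; a real search engine's suggestion interface will differ.

```python
import json
from urllib.parse import urlencode

SUGGEST_ENDPOINT = "https://search.example.com/suggest"  # hypothetical endpoint

def build_suggest_url(feature_word, endpoint=SUGGEST_ENDPOINT):
    """Build the request URL, passing the feature word as the input parameter."""
    return endpoint + "?" + urlencode({"q": feature_word})

def parse_suggestions(body):
    """Extract the suggestion list from an assumed JSON response body."""
    return json.loads(body).get("suggestions", [])

url = build_suggest_url("sample title")
sample_body = '{"suggestions": ["sample title drama", "sample title novel"]}'
suggestions = parse_suggestions(sample_body)
```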
For example, if "return to the road" is used as an input parameter in a preset search engine, the associated search suggestion words are as follows:
"return to the road TV show free edition online viewing corpus; the literacy authors read freely; a road returning novel; scenario introduction of return route; returning to the complete version of the free play of the television drama; returning to the road television play; returning to online watching of the television drama; returning to the road for free watching and online playing of the complete television episode; author of the "return to the road"; guilu encyclopedia".
Then, after the search suggestion word interface of the search engine is called and a plurality of search suggestion words are acquired, they can be parameterized in the following format:
{ "id":1, "title": "return to the road", "sug": "return to the road TV show free edition online viewing corpus; the literacy authors read freely; a road returning novel; scenario introduction of return route; returning to the complete version of the free play of the television drama; returning to the road television play; returning to online watching of the television drama; returning to the road for free watching and online playing of the complete television episode; author of the "return to the road"; guilu encyclopedia "}
Here, "sug" denotes the search suggestion words; multiple search suggestion words may be separated by semicolons, commas, or spaces, which is not specifically limited.
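As a rough sketch of this record format, the helpers below join suggestion words into a "sug" field and split them back out. The function names are hypothetical, and the splitter handles only semicolons and commas: splitting on spaces is suitable only when the suggestion words themselves contain no internal spaces.

```python
import json
import re
from typing import Dict, List


def to_record(record_id: int, title: str, suggestions: List[str], sep: str = ";") -> Dict[str, object]:
    """Pack the suggestion words into the {"id", "title", "sug"} record."""
    return {"id": record_id, "title": title, "sug": sep.join(suggestions)}


def split_sug(sug: str) -> List[str]:
    """Split a "sug" field on ASCII or full-width semicolons and commas."""
    parts = re.split(r"[;,；，]", sug)
    return [p.strip() for p in parts if p.strip()]


record = to_record(1, "return to the road", ["a road returning novel", "guilu encyclopedia"])
serialized = json.dumps(record, ensure_ascii=False)
```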
In step S13, word segmentation processing is performed on the search suggestion words to obtain a plurality of segmented words, word frequency statistics is performed on the segmented words, and target keywords are determined from the segmented words according to word frequency statistics results and word lengths.
In this step, the search suggestion words are taken as input and segmented by a word segmenter. Word segmentation is one of the basic operations of natural language processing: continuous text is split into individual tokens. Here the search suggestion words are the continuous text, and the resulting individual tokens are referred to as segmented words.
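The source does not name a specific word segmenter. Purely as an illustration of splitting continuous text into tokens, the sketch below uses forward maximum matching against a small vocabulary; the vocabulary and the sample string are assumptions, and a production system would more likely rely on a dedicated segmentation library.

```python
from typing import List, Set


def forward_max_match(text: str, vocabulary: Set[str]) -> List[str]:
    """Segment `text` by greedily taking the longest vocabulary word at each position.

    Characters that start no vocabulary word are emitted as single-character tokens.
    """
    max_len = max((len(word) for word in vocabulary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in vocabulary:
                tokens.append(candidate)
                i += size
                break
    return tokens


segments = forward_max_match("tvdramafreeonline", {"tv", "drama", "free", "online"})
```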
Wherein, the probability of the word segmentation as the target keyword is positively correlated with the word frequency and word length of the word segmentation.
Specifically, after the word segmentation result is obtained, word frequency statistics can be performed on the obtained word segments, namely, the occurrence frequency of each word segment is counted, and then, according to the word frequency statistics result of each word segment and the word length of the word segment, the word segment with larger word frequency and longer word length is selected as a target keyword.
It can be understood that the probability that the search suggestion word is searched in the preset search engine as an input parameter is higher, and accordingly, if the occurrence frequency of a certain word obtained by word segmentation processing of the search suggestion word is higher, the probability that the word is searched in the preset search engine as an input parameter is higher than that of other words. Therefore, the word with larger word frequency is selected as the target keyword, so that the target keyword is more likely to be searched in the preset search engine.
In addition, the segmented words are obtained by segmenting the search suggestion words, so their word length is limited and they tend to be words or word combinations of short length. In general, however, a user searching through the preset search engine inputs words or word combinations that fully express the search requirement, so as to obtain more accurate search results. Selecting segmented words with longer word length as target keywords therefore brings the target keywords closer to the input parameters that users actually enter when searching in the preset search engine.
In one implementation, word segmentation processing is performed on the search suggestion words to obtain a plurality of segmented words, word frequency statistics is performed on the segmented words, and target keywords are determined from the segmented words according to word frequency statistics results and word lengths, including:
performing stop-word removal on the search suggestion words and filtering out the punctuation marks in the search suggestion words to obtain words to be analyzed; performing word segmentation processing on the words to be analyzed to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining target keywords from the segmented words according to word frequency statistics results and word lengths.
That is, it is necessary to pre-process the search suggestion words, remove stop words and punctuation marks inside each search suggestion word, and then perform word segmentation.
Stop words are function words in the text that are extremely common and carry little actual meaning compared with other words, such as "that". Deleting stop words removes low-value information from the text so that important information receives more attention, which further improves the accuracy and efficiency of determining target keywords in subsequent steps.
In addition, the search suggestion words may contain punctuation marks such as title marks, quotation marks, and dashes. These punctuation marks carry no actual meaning, so deleting them likewise removes low-value information from the text, focuses attention on important information, and further improves the accuracy and efficiency of determining target keywords in subsequent steps.
Continuing the above example, stop-word removal is performed on the obtained search suggestion words and the punctuation marks in each search suggestion word are filtered out; the resulting words to be analyzed may be expressed as:
{ "id":1, "title": "return to the road", "sug": "return to the road TV show free edition online viewing corpus; the literacy authors read freely; a road returning novel; introducing a return scenario; returning to the complete version of the free play of the television drama; returning to the road television play; returning to online watching of the television drama; returning to the road to watch the online play of the television play album freely; returning to the road author; guilu encyclopedia "}
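The preprocessing described above can be sketched as follows, processing one suggestion word at a time. The stop-word list is an illustrative assumption, punctuation is detected through Unicode general categories beginning with "P", and whitespace tokenization is used only because the example text is English.

```python
import unicodedata
from typing import FrozenSet

# Illustrative stop-word list; a real deployment would use a fuller, language-specific list.
STOP_WORDS: FrozenSet[str] = frozenset({"the", "of", "a", "an", "that"})


def strip_punctuation(text: str) -> str:
    """Replace every Unicode punctuation character (category P*) with a space."""
    return "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )


def preprocess(suggestion: str, stop_words: FrozenSet[str] = STOP_WORDS) -> str:
    """Filter punctuation, then drop stop words from a single suggestion word."""
    words = strip_punctuation(suggestion).split()
    return " ".join(w for w in words if w.lower() not in stop_words)


cleaned = preprocess('author of the "return to the road"')
```

Because title marks such as 《 and 》 also fall in the Unicode punctuation categories, the same filter covers them without a separate rule.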
Specifically, performing word frequency statistics on the segmented words and determining target keywords from the segmented words according to word frequency statistics results and word lengths includes:
counting word frequency of the segmented words, and determining a weighted sum of word frequency and word length of the segmented words according to preset word frequency weights and preset word length weights to be used as a sorting score of the segmented words; selecting a preset number of target word segments from the word segments according to the sequence of the sorting scores from large to small; and combining the preset number of target segmentation words to obtain target keywords.
That is, each segmented word can be scored from its word frequency and word length according to the preset word frequency weight and the preset word length weight, which can be expressed as:
S=mA+nB
Wherein S represents the sorting score of the word, A represents the word frequency of the word, B represents the word length of the word, m represents the preset word frequency weight, and n represents the preset word length weight. The preset word frequency weight and the preset word length weight can be predetermined experience values, can be adjusted according to requirements and scenes, and are not particularly limited.
Then, the segmented words are sorted in descending order of sorting score, and the preset number of top-ranked segmented words are selected and concatenated to form the final target keyword. The preset number may be 5 or another value, which is not specifically limited.
In this way, the target keywords are quantitatively determined from the segmented words according to the word frequency statistical result and the word length, and the probability that each segmented word is used as the target keyword is guaranteed to be positively correlated with the word frequency and the word length of the segmented word, so that SEO optimization based on the target keywords is facilitated.
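The scoring and selection steps can be sketched as below, under the assumptions that word length is measured in characters and that ties keep first-seen order (the source does not specify tie-breaking). The function names and the sample tokens are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_segments(segments: Iterable[str], m: float = 1.0, n: float = 1.0) -> List[Tuple[str, float]]:
    """Score each distinct segmented word as S = m*A + n*B and sort in descending order.

    A is the word frequency, B is the word length in characters,
    m and n are the preset word frequency and word length weights.
    """
    frequency = Counter(segments)
    scored = [(word, m * count + n * len(word)) for word, count in frequency.items()]
    # sorted() is stable, so equal scores keep their first-seen order.
    return sorted(scored, key=lambda item: -item[1])


def target_keyword(segments: Iterable[str], top_k: int = 5,
                   m: float = 1.0, n: float = 1.0, joiner: str = "") -> str:
    """Concatenate the top_k highest-scoring segmented words into the target keyword."""
    ranked = rank_segments(segments, m, n)
    return joiner.join(word for word, _ in ranked[:top_k])


sample = ["ab", "ab", "ab", "cde", "cde", "f"]
ranking = rank_segments(sample)
keyword = target_keyword(sample, top_k=2, joiner=" ")
```

Raising `m` relative to `n` favors frequently occurring segmented words, while raising `n` favors longer ones, which matches the adjustable empirical weights described above.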
Continuing the above example, after word segmentation processing is performed on the search suggestion words, the obtained segmented words are as follows:
"return to the road", "drama", "free", "online", "watch", "album", "play", "novel", "author", "full version", "reading", "scenario", "introduction", "encyclopedia".
Then, word frequency statistics is carried out on the segmented words, and word frequency, word length and ranking score corresponding to each segmented word are shown in table 1:
Table 1. Word frequency, word length, and ranking score of each segmented word
In Table 1, the word frequencies of the segmented words are, in descending order, 10, 5, 4, 3, 2, and 1; when the preset word frequency weight and the preset word length weight are both 1, the corresponding sorting scores are 12, 8, 6, 5, 4, and 3.
Then, 5 target segmented words are selected in descending order of sorting score, as follows:
"return to the road", "drama", "free", "online", "watch".
Further, after the target segmented words are concatenated, the obtained target keyword is: "return to the road drama free online watch".
In this application, the target keyword may also be determined in other ways, in addition to being determined from the segmented words according to the word frequency statistics result and the word length. Specifically, performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining the target keyword from the segmented words according to the word frequency statistics result and the word length includes:
Performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining a first keyword from the segmented words according to word frequency statistics results and word lengths; performing natural language processing on the search suggestion words to generate second keywords corresponding to the search suggestion words; and taking the first keyword and the second keyword as target keywords.
That is, a natural language processing (Natural Language Processing, NLP) model may be used to analyze the search suggestion words and output the second keyword. In this way, the strong natural language understanding and generation capabilities of the natural language processing model are exploited, further improving SEO optimization efficiency.
For example, any model such as a bag-of-words model, a convolutional neural network model, an attention model, or a long short-term memory network may be used to perform natural language processing on the search suggestion words, which is not limited in this embodiment.
For example, ChatGPT (Chat Generative Pre-trained Transformer) can be used to perform natural language processing on the search suggestion words. ChatGPT is a chatbot based on natural language processing technology that analyzes the text a user enters in a dialog box and answers accordingly.
The text entered by the user in the dialog box may be as follows:
"You are an SEO expert. Analyze the following search engine search suggestion words, which are separated by semicolons, and mine the user's search needs. Finally, based on the user's search needs, generate 1 SEO title that is easy for a search engine to record and that attracts users to click. Do not digress, do not explain the reason, and output only the SEO title:
[fill in search suggestion words here]"
The placeholder "[fill in search suggestion words here]" in this text may be replaced with all or part of the search suggestion words obtained in step S12, which is not specifically limited. After ChatGPT receives the text, it performs natural language processing on all or part of the search suggestion words input by the user and outputs the obtained second keyword in the dialog box.
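Filling the prompt and obtaining the second keyword can be sketched as follows. The `complete` callable is a hypothetical stand-in for whatever chat model client is used; no real model API is assumed, and the template wording is a paraphrase of the prompt described above.

```python
from typing import Callable, List

PROMPT_TEMPLATE = (
    "You are an SEO expert. Analyze the following search engine search suggestion "
    "words, which are separated by semicolons, and mine the user's search needs. "
    "Finally, based on the user's search needs, generate 1 SEO title that is easy "
    "for a search engine to record and that attracts users to click. Do not digress, "
    "do not explain the reason, and output only the SEO title:\n{suggestions}"
)


def build_prompt(suggestions: List[str]) -> str:
    """Substitute the search suggestion words into the prompt placeholder."""
    return PROMPT_TEMPLATE.format(suggestions=";".join(suggestions))


def generate_second_keyword(suggestions: List[str], complete: Callable[[str], str]) -> str:
    """Send the filled prompt to the model via `complete` and return its trimmed reply."""
    return complete(build_prompt(suggestions)).strip()


# Stubbed model reply, so the sketch runs without any external service.
second_keyword = generate_second_keyword(
    ["a road returning novel", "guilu encyclopedia"],
    lambda prompt: "  Example SEO title\n",
)
```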
In step S14, in response to the information crawling request of the preset search engine for the target web page, the target keyword is returned to the preset search engine, so that the preset search engine records the target web page based on the target keyword.
The preset search engine uses a crawler program to access web pages on the Internet and acquires their related information for recording (indexing). After obtaining a user's input parameters, it queries the recorded web pages and returns search results related to those input parameters to the user.
In this step, if an information crawling request of the preset search engine for the target web page is received, the target keyword may be returned to the preset search engine in response to the information crawling request, so that the preset search engine records the target web page based on the target keyword.
Specifically, the preset search engine is configured to store the target keyword in correspondence with the web page identifier of the target web page and to return the web page identifier of the target web page after receiving a search request for the target keyword. The web page identifier may be the URL (Uniform Resource Locator) address of the target web page, the video identifier of a video in the target web page, or the like, which is not specifically limited.
When a user searches with the preset search engine and the input parameters are related to the target keyword, the preset search engine can treat the target web page as a search result for the keyword being searched and return the web page identifier of the target web page to the user, who can then access the target web page according to that identifier. Because the target keywords are determined based on the search suggestion words, both the probability that the target web page is found and its ranking in the search results are optimized.
Under the condition that the target webpage corresponds to a plurality of target keywords, the returned target keywords can be determined according to different application scenes, and the determined target keywords are used as the titles of the target webpage and returned to the preset search engine, so that the target webpage can be recorded by the preset search engine.
In one implementation manner, after determining the target keyword, the target keyword and the target webpage may be stored correspondingly, specifically, before returning the target keyword to the preset search engine in response to an information crawling request of the preset search engine to the target webpage, the method further includes:
generating a key value pair corresponding to the target keyword, where the value of the key value pair is the target keyword and the key of the key value pair is the content identifier of the target webpage; and storing the key value pair in a preset database.
That is, the target keyword and the target web page are stored in the form of a key value pair in a preset database, which is typically deployed at the server.
For example, if the target web page is a streaming media web page and the target keyword includes a first keyword (keyword1) and a second keyword (keyword2), the video identifier may be used as the key and the target keyword as the value, giving a key value pair that can be parameterized as follows:
{ "id": 1, "title": "return to the road", "keyword1": "return to the road free online watching of TV play", "keyword2": "return to the road free online watching of TV play" }
It can be understood that, in the present application, the first keyword and the second keyword are obtained in different ways. The first keyword is determined by performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and selecting according to the word frequency statistics result and the word length; the second keyword is generated by performing natural language processing on the search suggestion words. The first keyword and the second keyword may therefore be the same or different, which is not specifically limited.
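The key value storage described above can be sketched with an in-memory SQLite database standing in for the preset database. The table name, column names, and JSON encoding of the value are assumptions made for the example; the source does not specify a database product or schema.

```python
import json
import sqlite3
from typing import Dict, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the key value table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS keywords ("
        "content_id TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def save_keywords(conn: sqlite3.Connection, content_id: str, keywords: Dict[str, str]) -> None:
    """Store the target keywords (value) under the content identifier (key)."""
    conn.execute(
        "INSERT OR REPLACE INTO keywords (content_id, value) VALUES (?, ?)",
        (content_id, json.dumps(keywords, ensure_ascii=False)),
    )
    conn.commit()


def load_keywords(conn: sqlite3.Connection, content_id: str) -> Optional[Dict[str, str]]:
    """Return the stored target keywords for a content identifier, or None if absent."""
    row = conn.execute(
        "SELECT value FROM keywords WHERE content_id = ?", (content_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = open_store()
save_keywords(conn, "1", {"keyword1": "first keyword", "keyword2": "second keyword"})
stored = load_keywords(conn, "1")
```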
The information crawling request for the target webpage generally carries the content identifier of the target webpage. Accordingly, returning the target keyword to the preset search engine in response to the information crawling request of the preset search engine for the target webpage includes:
responding to an information crawling request of a preset search engine on a target webpage, and inquiring a key value pair from a preset database according to the content identification of the target webpage; and returning the target keywords contained in the key value pairs to a preset search engine.
That is, when the crawler program of the preset search engine accesses the target webpage, the webpage end can call the keyword query interface of the server end in real time according to the content identifier of the target webpage, query is performed in the preset database, and after the fact that the key of a certain key value pair is identical to the content identifier of the target webpage is queried, the webpage end can return the value of the key value pair to the preset search engine as the target keyword.
Therefore, the target keyword corresponding to the content identifier of the target webpage can be quickly queried and returned to the preset search engine, which improves SEO optimization efficiency.
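The query performed when a crawler accesses the target webpage can be sketched as follows. A plain mapping stands in for the keyword query interface of the server, returning the keyword inside an HTML title element reflects the use of the target keyword as the page title, and preferring `keyword1` over `keyword2` is an arbitrary assumption, since the source leaves that choice to the application scenario.

```python
from html import escape
from typing import Dict, Mapping, Optional


def lookup_target_keyword(content_id: str, store: Mapping[str, Dict[str, str]]) -> Optional[str]:
    """Find the key value pair whose key equals the content identifier and return a keyword."""
    entry = store.get(content_id)
    if not entry:
        return None
    # Assumption: prefer the first keyword and fall back to the second.
    return entry.get("keyword1") or entry.get("keyword2")


def render_title(content_id: str, store: Mapping[str, Dict[str, str]], fallback: str) -> str:
    """Build the title element returned to the crawler for the target webpage."""
    keyword = lookup_target_keyword(content_id, store)
    return f"<title>{escape(keyword or fallback)}</title>"


keyword_store = {"1": {"keyword1": "novel & drama", "keyword2": "second keyword"}}
title_html = render_title("1", keyword_store, fallback="default title")
```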
It should be noted that users' search requirements change constantly. Therefore, in the present application, the method may return, at a preset period, to the step of performing feature extraction on the target webpage to obtain the feature words of the target webpage.
That is, the keyword extraction steps provided by the present application can be executed repeatedly: performing feature extraction on the target webpage to obtain the feature words of the target webpage; inputting the feature words into a preset search engine and querying the search suggestion words associated with the feature words, where the search suggestion words are obtained by analyzing the feature words based on user search history data; performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining target keywords from the segmented words according to word frequency statistics results and word lengths; and, in response to the information crawling request of the preset search engine for the target webpage, returning the target keyword to the preset search engine so that the preset search engine records the target webpage based on the target keyword. In this way, the target keywords of the target webpage can be updated, so that keyword extraction is more flexible and meets user requirements in different periods, further improving the SEO optimization effect.
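The periodic update can be sketched as a check of which pages are due for re-extraction. The per-page timestamp bookkeeping and the injected `extract` callable are assumptions; the source states only that the steps are repeated at a preset period.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List


def pages_due(last_updated: Dict[str, datetime], now: datetime, period: timedelta) -> List[str]:
    """Return the content identifiers whose keywords are at least one period old."""
    return [cid for cid, stamp in last_updated.items() if now - stamp >= period]


def refresh_due_pages(last_updated: Dict[str, datetime], now: datetime,
                      period: timedelta, extract: Callable[[str], None]) -> List[str]:
    """Re-run keyword extraction for every page that is due and record the update time."""
    due = pages_due(last_updated, now, period)
    for content_id in due:
        extract(content_id)  # stands in for the full keyword extraction steps
        last_updated[content_id] = now
    return due


start = datetime(2024, 1, 1)
timestamps = {"a": start, "b": start + timedelta(days=6)}
refreshed: List[str] = []
due_now = refresh_due_pages(timestamps, start + timedelta(days=7), timedelta(days=7), refreshed.append)
```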
Fig. 2 is a logic schematic diagram of the keyword extraction method provided in the present application, which may include five steps: data acquisition, data preprocessing, keyword extraction, keyword storage, and search engine recording. Specifically:
Data acquisition: selecting a batch of feature words according to the characteristics of the target website, and using the feature words to acquire related search suggestion words from the suggestion word interface of a preset search engine;
Data preprocessing: preprocessing the obtained search suggestion words to obtain words to be analyzed;
Keyword extraction: performing word segmentation and word frequency statistics, as well as ChatGPT text analysis, on the words to be analyzed, and respectively generating a first keyword and a second keyword, which are the target keywords;
Keyword storage: writing the target keywords into a preset database of the server for storage;
Search engine recording: when the crawler program of the preset search engine crawls the webpage content of the target webpage, the webpage end calls the keyword query interface of the server end in real time to query the target keywords of the target webpage, and returns the target keywords to the preset search engine as the title of the target webpage to complete recording.
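The first four steps above can be sketched as one function that wires injected callables together; the fifth step is then served from the stored keywords whenever a crawler requests the page. Every callable and name here is a hypothetical placeholder for the corresponding component, not a prescribed implementation.

```python
from typing import Callable, Dict, List, MutableMapping


def run_pipeline(
    content_id: str,
    feature_word: str,
    fetch_suggestions: Callable[[str], List[str]],
    preprocess: Callable[[str], str],
    extract_first: Callable[[List[str]], str],
    extract_second: Callable[[List[str]], str],
    store: MutableMapping[str, Dict[str, str]],
) -> Dict[str, str]:
    """Run data acquisition, preprocessing, keyword extraction, and keyword storage."""
    suggestions = fetch_suggestions(feature_word)           # data acquisition
    to_analyze = [preprocess(s) for s in suggestions]       # data preprocessing
    keywords = {
        "keyword1": extract_first(to_analyze),              # word frequency statistics
        "keyword2": extract_second(to_analyze),             # natural language processing
    }
    store[content_id] = keywords                            # keyword storage
    return keywords


database: Dict[str, Dict[str, str]] = {}
result = run_pipeline(
    "1",
    "w",
    fetch_suggestions=lambda word: [word + " x", word + " y"],
    preprocess=str.upper,
    extract_first=lambda words: words[0],
    extract_second=lambda words: words[-1],
    store=database,
)
```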
From the above, it can be seen that, according to the technical scheme provided by the embodiment of the present disclosure, after the feature words of the target web page are extracted, the search suggestion words associated with the feature words are queried, and then the target keywords are determined based on the search suggestion words, so that the preset search engine records the target web page based on the target keywords, and because the search suggestion words can reflect the search requirements of the user to a certain extent, the target keywords can also be matched with the search requirements of the user, thereby better performing SEO optimization on the target web page, improving the ranking of the target web page in the search engine, and increasing the web page flow.
Referring to fig. 3, which is a schematic structural diagram of a keyword extraction apparatus of the present application, the apparatus may specifically include:
the extracting module 201 is configured to perform feature extraction on a target webpage, so as to obtain feature words of the target webpage;
a query module 202, configured to input the feature word into a preset search engine, and query a search suggestion word associated with the feature word; the search suggestion words are obtained by analyzing the feature words based on the user search history data;
the analysis module 203 is configured to perform word segmentation processing on the search suggestion word to obtain a plurality of segmented words, perform word frequency statistics on the segmented words, and determine a target keyword from the segmented words according to a word frequency statistics result and a word length;
and the response module 204 is configured to respond to an information crawling request of the preset search engine for the target webpage, and return the target keyword to the preset search engine, so that the preset search engine records the target webpage based on the target keyword.
Optionally, the analysis module 203 is specifically configured to:
counting word frequency of the word segmentation, and determining a weighted sum of word frequency and word length of the word segmentation as a sorting score of the word segmentation according to a preset word frequency weight and a preset word length weight;
Selecting a preset number of target word segments from the word segments according to the sequence of the sorting scores from large to small;
and combining the preset number of target segmentation words to obtain target keywords.
Optionally, the analysis module 203 is specifically configured to:
performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining a first keyword from the segmented words according to word frequency statistics results and word lengths;
performing natural language processing on the search suggestion words to generate second keywords corresponding to the search suggestion words;
and taking the first keyword and the second keyword as the target keyword.
Optionally, the probability of the word segment as the target keyword is positively correlated with the word frequency and the word length of the word segment.
Optionally, the apparatus further comprises a storage module for:
generating a key value pair corresponding to the target keyword; the value of the key value pair is the target keyword, and the key of the key value pair is the content identifier of the target webpage;
storing the key value pairs into a preset database;
the response module 204 is specifically configured to:
responding to the information crawling request of the preset search engine on the target webpage, and inquiring the key value pair from the preset database according to the content identification of the target webpage; the information crawling request carries the content identification of the target webpage;
And returning the target keywords contained in the key value pairs to the preset search engine.
Optionally, the analysis module 203 is specifically configured to:
performing stop-word removal on the search suggestion words, and filtering out the punctuation marks in the search suggestion words to obtain words to be analyzed;
performing word segmentation processing on the word to be analyzed to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining target keywords from the segmented words according to word frequency statistics results and word lengths.
Optionally, the apparatus further comprises an updating module for:
returning, at a preset period, to the step of performing feature extraction on the target webpage to obtain the feature words of the target webpage.
Optionally, the target web page includes a streaming media web page, and the extracting module 201 is specifically configured to:
and extracting a video title of the streaming media webpage as a characteristic word of the streaming media webpage.
Optionally, the query module 202 is specifically configured to:
and calling a search suggestion word interface of the preset search engine, and taking the feature words as input parameters of the search suggestion word interface so that the preset search engine inquires the search suggestion words associated with the feature words.
Optionally, the preset search engine is configured to store the target keyword and the webpage identifier of the target webpage correspondingly, and return the webpage identifier of the target webpage after receiving the search request for the target keyword.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
FIG. 4 is a block diagram of an electronic device for keyword extraction, including a processor and a memory for storing a computer program, according to an exemplary embodiment; the processor is used for executing programs stored on the memory.
The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In an exemplary embodiment, a computer-readable storage medium is also provided, such as a memory, comprising instructions executable by a processor of an electronic device to perform the above-described method. Alternatively, the computer-readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
In an exemplary embodiment, a computer program product is also provided which, when run on a computer, causes the computer to implement the keyword extraction method described above.
Fig. 5 is a block diagram illustrating an apparatus 800 for keyword extraction, according to an example embodiment.
For example, apparatus 800 may be a mobile phone, computer, digital broadcast electronic device, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, or the like.
Referring to fig. 5, apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interactions between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on the apparatus 800, contact data, phonebook data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or nonvolatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power component 806 provides power to the various components of the apparatus 800. The power component 806 can include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the apparatus 800 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the apparatus 800 is in an operational mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data to be processed. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor component 814 may detect an on/off state of the apparatus 800 and the relative positioning of components, such as the display and keypad of the apparatus 800. The sensor component 814 may also detect a change in position of the apparatus 800 or of one component of the apparatus 800, the presence or absence of user contact with the apparatus 800, an orientation or acceleration/deceleration of the apparatus 800, and a change in temperature of the apparatus 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communication between the apparatus 800 and other devices, in either a wired or wireless manner. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described in the first and second aspects.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided, such as the memory 804 including instructions executable by the processor 820 of the apparatus 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product containing instructions is also provided, which when run on a computer, causes the computer to perform the keyword extraction method of any of the above embodiments.
Other implementations of the embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The embodiments of the present invention are intended to cover any variations, uses, or adaptations that follow the general principles of the embodiments and include such departures from the present disclosure as come within known or customary practice in the art to which the embodiments of the invention pertain. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the embodiments being indicated by the following claims.
It is to be understood that the embodiments of the invention are not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be made without departing from the scope thereof. The scope of embodiments of the invention is limited only by the appended claims.

Claims (13)

1. A keyword extraction method, comprising:
extracting the characteristics of a target webpage to obtain characteristic words of the target webpage;
inputting the feature words into a preset search engine, and inquiring search suggestion words associated with the feature words; the search suggestion words are obtained by analyzing the feature words based on the user search history data;
performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining target keywords from the segmented words according to word frequency statistics results and word lengths;
and responding to the information crawling request of the preset search engine on the target webpage, and returning the target keyword to the preset search engine so that the preset search engine records the target webpage based on the target keyword.
2. The method of claim 1, wherein the performing word frequency statistics on the segmented words, and determining the target keyword from the segmented words according to the word frequency statistics and the word length, comprises:
counting the word frequency of each segmented word, and determining, according to a preset word frequency weight and a preset word length weight, a weighted sum of the word frequency and the word length of the segmented word as a sorting score of the segmented word;
selecting a preset number of target segmented words from the segmented words in descending order of sorting score;
and combining the preset number of target segmented words to obtain the target keywords.
3. The method of claim 1, wherein the performing word segmentation on the search suggestion word to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining a target keyword from the segmented words according to word frequency statistics results and word lengths comprises:
performing word segmentation processing on the search suggestion words to obtain a plurality of segmented words, performing word frequency statistics on the segmented words, and determining a first keyword from the segmented words according to word frequency statistics results and word lengths;
performing natural language processing on the search suggestion words to generate second keywords corresponding to the search suggestion words;
and taking the first keyword and the second keyword as the target keyword.
4. A method according to any one of claims 1 to 3, wherein the probability of a segmented word being determined as a target keyword is positively correlated with the word frequency and the word length of the segmented word.
5. The method of claim 1, wherein before the responding to the information crawling request of the preset search engine on the target webpage and returning the target keyword to the preset search engine, the method further comprises:
generating a key-value pair corresponding to the target keyword, wherein the value of the key-value pair is the target keyword and the key of the key-value pair is the content identifier of the target webpage;
storing the key-value pair in a preset database;
and wherein the responding to the information crawling request of the preset search engine on the target webpage and returning the target keyword to the preset search engine comprises:
responding to the information crawling request of the preset search engine on the target webpage, and querying the key-value pair from the preset database according to the content identifier of the target webpage, wherein the information crawling request carries the content identifier of the target webpage;
and returning the target keyword contained in the key-value pair to the preset search engine.
6. The method of claim 1, wherein the word segmentation of the search suggestion word to obtain a plurality of segmented words comprises:
performing stop-word removal on the search suggestion words, and filtering out punctuation marks in the search suggestion words, to obtain words to be analyzed;
and performing word segmentation processing on the words to be analyzed to obtain a plurality of segmented words.
7. The method according to claim 1, wherein the method further comprises:
returning, according to a preset period, to the step of extracting the characteristics of the target webpage to obtain the characteristic words of the target webpage.
8. The method according to claim 1, wherein the target webpage comprises a streaming media webpage, and the extracting the characteristics of the target webpage to obtain characteristic words of the target webpage comprises:
extracting a video title of the streaming media webpage as a characteristic word of the streaming media webpage.
9. The method of claim 1, wherein the inputting the feature word into a preset search engine queries search suggestion words associated with the feature word, comprising:
and calling a search suggestion word interface of the preset search engine, and taking the feature words as input parameters of the search suggestion word interface so that the preset search engine inquires the search suggestion words associated with the feature words.
10. The method of claim 1, wherein the preset search engine is configured to store the target keyword and the web page identifier of the target web page correspondingly, and return the web page identifier of the target web page after receiving the search request for the target keyword.
11. A keyword extraction apparatus, characterized by comprising:
the extraction module is used for extracting the characteristics of the target webpage to obtain characteristic words of the target webpage;
the query module is used for inputting the characteristic words into a preset search engine and querying search suggestion words associated with the characteristic words; the search suggestion words are obtained by analyzing the feature words based on the user search history data;
the analysis module is used for carrying out word segmentation processing on the search suggestion words to obtain a plurality of segmented words, carrying out word frequency statistics on the segmented words, and determining target keywords from the segmented words according to word frequency statistics results and word lengths;
and the response module is used for responding to the information crawling request of the preset search engine on the target webpage and returning the target keyword to the preset search engine so that the preset search engine records the target webpage based on the target keyword.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the keyword extraction method of any one of claims 1 to 10 when the program is executed.
13. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the keyword extraction method of any one of claims 1 to 10.
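The key-value storage and lookup recited in claim 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The `KeywordStore` class, the `handle_crawl_request` function, the request and response shapes, and the in-memory dictionary (standing in for the preset database) are all hypothetical.

```python
class KeywordStore:
    """In-memory stand-in for the preset database of key-value pairs."""

    def __init__(self):
        self._pairs = {}

    def save(self, content_id, target_keywords):
        # Key: content identifier of the target web page.
        # Value: the target keywords determined for that page.
        self._pairs[content_id] = list(target_keywords)

    def lookup(self, content_id):
        # Return an empty list when no keywords are stored for the page.
        return list(self._pairs.get(content_id, []))


def handle_crawl_request(store, request):
    """Answer a crawl request that carries the page's content identifier."""
    content_id = request["content_id"]
    return {"content_id": content_id, "keywords": store.lookup(content_id)}


store = KeywordStore()
store.save("video-123", ["detective", "drama", "episode"])
response = handle_crawl_request(store, {"content_id": "video-123"})
print(response)
```

Precomputing the keywords and keying them by content identifier means the crawl request can be answered with a single lookup rather than by rerunning extraction at request time.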
Application CN202311616287.6A: Keyword extraction method and device, electronic equipment and storage medium

Application Number: CN202311616287.6A
Publication Number: CN117668343A (en)
Priority Date: 2023-11-29
Filing Date: 2023-11-29
Publication Date: 2024-03-08
Status: Pending
Family ID: 90070670
Country: CN


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination