CN113032683B

CN113032683B - Method for quickly segmenting words in network popularization

Info

Publication number: CN113032683B
Application number: CN202110469657.2A
Authority: CN
Inventors: 李勤义
Original assignee: Maize Society Shenzhen Network Technology Co ltd
Current assignee: Maize Society Shenzhen Network Technology Co ltd
Priority date: 2021-04-28
Filing date: 2021-04-28
Publication date: 2021-12-24
Anticipated expiration: 2041-04-28
Also published as: CN113032683A

Abstract

The invention discloses a method for quickly segmenting words in network promotion. A user inputs a keyword; the word segmentation system automatically mines all long-tail words containing that keyword and stores them as a txt file. The system then reads all long-tail words from the txt file, segments them, breaks them up into roots, extracts the keywords that occur most frequently, summarizes the high-frequency roots, and returns them to the user. The user retains valid words according to the high-frequency roots, and valid roots are screened out from the retained valid words. The system then performs word segmentation according to the screened valid roots and exports an xls word segmentation table. When keywords of the same type are exported, the system automatically groups them by character length, which makes them easier to use in promotion, and the segmentation result is exported to a local xls file with one click. This addresses the problems of slow word segmentation and omitted keywords, and reduces the time an enterprise spends screening valid words from a large number of keywords and classifying and consolidating them.

Description

Method for quickly segmenting words in network popularization

Technical Field

The invention relates to the technical field of computers, and in particular to a method for quickly segmenting words in network promotion.

Background

As more enterprises move into online marketing and promotion, their promotion methods and the keywords used in paid promotion need to be more accurate and effective. How to screen valid keywords from tens of thousands, hundreds of thousands, or even millions of keywords is the first problem an enterprise must consider in network promotion, and how to classify and combine the keywords after screening is a further pain point. If valid words cannot be screened and segmented according to the attributes of different words, the enterprise wastes a great deal in the promotion process.

At present, word segmentation is mostly done manually. Traditional manual segmentation first requires finding the common words among all long-tail words, such as manufacturer words, price words, model words, and scenario words. This requires someone very familiar with each industry to know which roots occur in the long-tail words to be segmented. The approach is tedious, time-consuming, and labor-intensive, and keywords are easily omitted, so a more convenient method that improves word segmentation speed is needed.

Traditional manual segmentation classifies words one by one. If a core keyword has hundreds of thousands of long-tail words, searching and classifying them word by word consumes a great deal of time, and keywords are easily missed. For example, to segment out manufacturer words, every word containing "manufacturer" must be found one by one among the hundreds of thousands of long-tail words and grouped together, so each word class requires one manual pass over the full set, and segmenting all classes requires many manual passes. With the word segmentation system, by contrast, entering the root "manufacturer" automatically extracts the long-tail words containing it and classifies them by whether the root appears at the head, in the middle, or at the tail. Traditional manual word segmentation therefore has many disadvantages in operation.
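The head, middle, and tail classification mentioned above can be illustrated with a minimal sketch. The function names and the sample long-tail words are hypothetical and are not taken from the patent, which does not disclose how the system determines the position of a root.

```python
from collections import defaultdict
from typing import Optional


def root_position(word: str, root: str) -> Optional[str]:
    """Return 'head', 'tail' or 'middle' for a word containing root, else None."""
    if root not in word:
        return None
    if word.startswith(root):
        return "head"
    if word.endswith(root):
        return "tail"
    return "middle"


def classify_by_position(words, root):
    """Group the long-tail words that contain root by where the root occurs."""
    groups = defaultdict(list)
    for word in words:
        position = root_position(word, root)
        if position is not None:
            groups[position].append(word)
    return dict(groups)


# Hypothetical sample long-tail words
words = [
    "manufacturer ffu quote",
    "ffu manufacturer",
    "suzhou ffu manufacturer direct",
    "ffu price",
]
groups = classify_by_position(words, "manufacturer")
print(groups)
```

A word that does not contain the root is simply skipped, which mirrors the single automatic pass that replaces the repeated manual passes described above.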

Disclosure of Invention

In view of the above technical problems in the related art, the invention provides a method for quickly segmenting words in network promotion that overcomes these defects of the prior art.

To achieve this technical purpose, the technical solution of the invention is implemented as follows:

A method for quickly segmenting words in network promotion comprises the following steps:

S1, the user inputs a keyword, and the word segmentation system automatically mines all long-tail words containing the keyword and stores them as a txt file;

S2, the word segmentation system reads all long-tail words from the txt file, performs Chinese word segmentation, breaks up all long-tail words, extracts the keywords that occur with high frequency, extracts the high-frequency roots, and returns them to the user;

S3, the user retains valid words according to the high-frequency roots extracted by the word segmentation system;

S4, valid roots are screened out according to the remaining valid words;

S5, the word segmentation system performs word segmentation according to the screened valid roots and exports an xls word segmentation table.
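Steps S1 and S2 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the patent does not say where candidate words come from or which segmentation algorithm is used, so the candidate pool, the vocabulary, and the forward maximum matching segmenter below are all hypothetical stand-ins.

```python
import tempfile
from collections import Counter
from pathlib import Path


def mine_long_tail_words(core_word, candidate_pool, txt_path):
    """S1 (sketch): keep every candidate containing the core word, one per line."""
    long_tail = [w for w in candidate_pool if core_word in w and w != core_word]
    Path(txt_path).write_text("\n".join(long_tail), encoding="utf-8")
    return long_tail


def segment(word, vocabulary, max_len=4):
    """Forward maximum matching (an assumed algorithm).

    Characters not covered by the vocabulary become single-character tokens.
    """
    tokens, i = [], 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            piece = word[i:i + size]
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def high_frequency_roots(txt_path, vocabulary, core_word):
    """S2 (sketch): count in how many long-tail words each root occurs."""
    counts = Counter()
    for line in Path(txt_path).read_text(encoding="utf-8").splitlines():
        word = line.strip()
        if word:
            # Count each root once per word and ignore the core word itself.
            counts.update(set(segment(word, vocabulary)) - {core_word})
    return counts


# Hypothetical sample data:
# 厂家 = manufacturer, 价格 = price, 苏州 = Suzhou, 洁净室 = clean room, 风机 = fan
vocabulary = {"ffu", "厂家", "价格", "苏州", "洁净室"}
pool = ["ffu厂家", "苏州ffu厂家", "ffu价格", "ffu洁净室价格", "风机价格"]

with tempfile.TemporaryDirectory() as tmp:
    txt_path = Path(tmp) / "long_tail_words.txt"
    mined = mine_long_tail_words("ffu", pool, txt_path)
    root_counts = high_frequency_roots(txt_path, vocabulary, "ffu")

print(root_counts.most_common(2))
```

Counting each root once per long-tail word, rather than once per occurrence, is a design choice made here so that the frequency reflects how many keywords a root would cover.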

Further, valid words are retained as follows: the word segmentation system extracts high-frequency roots, the invalid words are removed, and the operation is repeated until all invalid words have been removed.
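The repeated extract-and-remove loop can be sketched as below. The sketch assumes the long-tail words are already segmented into roots, and it models the user's judgment as a hypothetical callback `pick_invalid`; the parameter `top_n` and all sample data are likewise assumptions made for illustration.

```python
from collections import Counter


def retain_valid_words(segmented, pick_invalid, top_n=3):
    """Repeat: show the top roots, drop words containing roots marked invalid.

    segmented maps each long-tail word to its list of roots.
    pick_invalid stands in for the user and returns the roots to reject.
    """
    remaining = dict(segmented)
    while remaining:
        counts = Counter(
            root for roots in remaining.values() for root in set(roots)
        )
        # Deterministic ranking: highest count first, ties broken alphabetically.
        top_roots = sorted(counts, key=lambda r: (-counts[r], r))[:top_n]
        invalid = set(pick_invalid(top_roots)) & set(top_roots)
        if not invalid:
            break  # nothing left to remove among the high-frequency roots
        remaining = {
            word: roots
            for word, roots in remaining.items()
            if invalid.isdisjoint(roots)
        }
    return list(remaining)


# Hypothetical pre-segmented long-tail words
segmented = {
    "ffu manufacturer": ["ffu", "manufacturer"],
    "ffu download": ["ffu", "download"],
    "ffu encyclopedia": ["ffu", "encyclopedia"],
    "ffu price": ["ffu", "price"],
    "ffu manufacturer download": ["ffu", "manufacturer", "download"],
}
blocklist = {"download", "encyclopedia"}  # stands in for the user's decisions
valid_words = retain_valid_words(
    segmented, lambda roots: [r for r in roots if r in blocklist]
)
print(valid_words)
```

Because only the top roots are shown in each round, a rejected root may surface only after more frequent invalid roots have been removed, which is why the operation is repeated.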

Further, valid roots are screened out from the high-frequency roots in the remaining valid words according to the principle of similar part of speech and identical structure, until no extractable valid word is left among the remaining valid words.

Further, in the word segmentation stage, after the valid roots have been selected, the word segmentation system takes all valid roots selected by the user and, following the order of the roots, extracts keywords of the same type from all long-tail words and classifies them; within each type, keywords of the same character length are placed in one column, and finally an xls word segmentation table is generated.

Further, the keywords are divided into columns as follows: words belonging to the same type of valid root are first placed in one column; within each column, words of the same character length are then subdivided into their own column; regional words are then extracted from each column into a separate column; and this operation is repeated for the content of every column.
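The column division can be sketched as a grouping keyed by root type and character length, with regional words routed to their own columns. The type labels, the region list, the sample keywords, and the use of Python's `len` as the character-length measure are all assumptions; the patent does not specify how length is counted.

```python
from collections import defaultdict


def build_columns(long_tail_words, root_types, region_words):
    """Group keywords into columns keyed by (type label, character length).

    root_types maps each valid root to a type label; its order is the root
    order, so the first matching root decides the type of a keyword.
    """
    columns = defaultdict(list)
    for word in long_tail_words:
        if any(region in word for region in region_words):
            columns[("region", len(word))].append(word)
            continue
        for root, label in root_types.items():
            if root in word:
                columns[(label, len(word))].append(word)
                break
    return dict(columns)


# Hypothetical sample data:
# 洁净室 = clean room, 食品厂 = food plant, 静音 = mute, 新风 = fresh air,
# 重庆 = Chongqing, 深圳 = Shenzhen
root_types = {
    "洁净室": "application",
    "食品厂": "application",
    "静音": "feature",
    "新风": "feature",
}
region_words = {"重庆", "深圳"}
keywords = ["ffu洁净室", "ffu静音", "ffu重庆", "ffu食品厂", "ffu新风", "ffu深圳"]

columns = build_columns(keywords, root_types, region_words)
for key, column in columns.items():
    print(key, column)
```

Keywords that match neither a valid root nor a regional word are left out of the table in this sketch.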

The beneficial effects of the invention are as follows: when keywords of the same type are exported by the system, they are automatically grouped by character length, which makes them easier to use in promotion; the word segmentation result is exported to a local xls file with one click; the problems of slow word segmentation and omitted keywords are addressed; and the time an enterprise spends screening valid words from a large number of keywords and classifying and consolidating them is reduced, improving working efficiency and results.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic flow chart of a keyword through the word segmentation system in the method for quickly segmenting words in network promotion according to an embodiment of the present invention.

Fig. 2 is a schematic flow chart of an implementation of the method for quickly segmenting words in network promotion according to an embodiment of the present invention.

Fig. 3 is a schematic flow chart of an implementation process of the method for quickly segmenting words in network promotion according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.

As shown in Figs. 1-2, in the method for quickly segmenting words in network promotion according to the embodiment of the present invention, a user inputs a keyword, and the word segmentation system automatically mines all long-tail words containing the keyword and stores them as a txt long-tail word file. The word segmentation system reads all long-tail words from the txt long-tail word file, breaks them up using Chinese word segmentation root technology, and aggregates and counts the roots that occur frequently, directly analyzing the common roots and their frequency of occurrence. That is, it extracts the keywords with high occurrence frequency, extracts and summarizes the high-frequency roots, and returns them to the user for analysis and use.

The user removes invalid words according to the high-frequency roots automatically extracted by the word segmentation system, repeats the operation until all invalid words have been removed, and keeps the remaining valid words for the next step.

Valid roots are then screened out from the high-frequency roots in the remaining valid words according to the principle of similar part of speech and identical structure, until no extractable keyword is left among the remaining keywords.

Taking all valid roots selected by the user, the word segmentation system follows the order of the roots to extract keywords of the same type from all long-tail words. Within each type, keywords of the same character length are placed in one column; within each column, words of the same character length are further subdivided into their own column; regional words are extracted from each column into a separate column; this operation is repeated for the content of every column; and finally an xls word segmentation table is generated.
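The final export step can be sketched as writing columns of unequal length side by side. The patent does not describe how the xls file is produced, and writing a genuine .xls workbook from Python needs a third-party library, so this sketch substitutes tab-separated text from the standard `csv` module, which spreadsheet programs can open. The function name and column contents are hypothetical.

```python
import csv
import io
from itertools import zip_longest


def write_table(columns, stream):
    """Write named columns side by side, padding shorter columns with blanks."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(list(columns))  # header row: one title per column
    for row in zip_longest(*columns.values(), fillvalue=""):
        writer.writerow(row)


# Hypothetical columns, loosely modeled on a word segmentation table
columns = {
    "ffu + application": ["ffu food plants", "ffu clean room", "ffu air shower"],
    "ffu + region": ["ffu Chongqing", "ffu Shenzhen"],
}

buffer = io.StringIO()
write_table(columns, buffer)
print(buffer.getvalue())
```

To write to disk instead, the same function can be given a file opened with `open(path, "w", newline="", encoding="utf-8")`.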

To facilitate understanding, the above technical solution of the present invention is described in detail below by way of a specific use case.

As shown in Fig. 3, a core word, FFU, is first input. The system automatically performs long-tail word mining on FFU and mines 13649 long-tail words for it. The system breaks up all the keywords according to the Chinese word segmentation principle and screens out 220 roots with high occurrence frequency: Shandong, energy conservation, format, leak detection, interlayer, after sale, cleanliness, resistance, achievement, application, encyclopedia, introduction, Germany, recommendation, Zhengzhou, materials, evaluation, problem, ceiling, Futai, ranking, typing, Kunshan, titer, air-out, comparison, laboratory, clean shop, spot inspection, distance, formaldehyde removal, schematic, professional, lifting, cause, pipeline, retrofit, positive pressure, efficiency, times, Wuhan, technology, switch, download, comparison, hundred thousand, DC motor, fiberglass, cleaning, instructions, recovery, Guangdong, treatment, air change … ….

Valid roots and invalid roots are identified from the roots listed above (only part of the list is shown). The invalid roots are then filtered out, and the system automatically groups and sorts all remaining roots, preferably by character length, and then automatically screens out the regional words and sorts them, for example: energy savings, formats, leak detection, interlayers, after sales, resistance, reach, application, encyclopedia, introduction, recommendations, materials, evaluation, questions, ceiling, Futai, ranking, typing, titer, air out, comparison, spot inspection, distance, specialty, hoisting, cause, piping, retrofit, positive pressure, efficiency, times, technology, switch, download, comparison, hundred thousand, cleaning, recovery, processing, ventilation, instructions, removal of formaldehyde, schematic, cleanliness, laboratories, clean shops, DC motors, fiberglass … ….
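The ordering described above, in which invalid roots are filtered out, the rest are sorted by character length, and regional words are handled separately, can be sketched as follows. The Chinese sample roots are hypothetical renderings chosen for illustration, and `len` is an assumed length measure.

```python
def order_roots(roots, invalid_roots, region_words):
    """Drop invalid roots, then sort general and regional roots by length.

    sorted() is stable, so roots of equal length keep their original order.
    """
    valid = [r for r in roots if r not in invalid_roots]
    general = sorted((r for r in valid if r not in region_words), key=len)
    regions = sorted((r for r in valid if r in region_words), key=len)
    return general, regions


# Hypothetical sample roots:
# 山东 = Shandong, 节能 = energy saving, 实验室 = laboratory, 检漏 = leak detection,
# 百科 = encyclopedia, 直流电机 = DC motor, 武汉 = Wuhan, 除甲醛 = formaldehyde removal
roots = ["山东", "节能", "实验室", "检漏", "百科", "直流电机", "武汉", "除甲醛"]
general, regions = order_roots(
    roots, invalid_roots={"百科"}, region_words={"山东", "武汉"}
)
print(general)
print(regions)
```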

Finally, the xls table is exported according to the grouped keywords, for example the word segmentation table shown in Table 1.

ffu + characteristic 9 characters	ffu + application 9 characters	ffu + application 9 characters	ffu + characteristic 8 characters	ffu + region 8 characters
ffu automation	ffu food plants	ffu laminar flow table	ffu mute	ffu Chongqing
ffu side air intake	ffu clean room	ffu mute cover	ffu new air	ffu Shenzhen
ffu technical grade	ffu air shower	ffu clean room		ffu Suzhou
ffu entrance/exit	ffu super clean bench	ffu clean room		

Table 1. Word segmentation table

In summary, with the technical solution of the invention, when keywords of the same type are exported by the system, they are automatically grouped by character length, which makes them easier to use in promotion; the word segmentation result is exported to a local xls file with one click; the problems of slow word segmentation and omitted keywords are addressed; and the time an enterprise spends screening valid words from a large number of keywords and classifying and consolidating them is reduced, improving working efficiency and results.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A method for quickly segmenting words in network promotion, characterized by comprising the following steps:

S1, a user first inputs a keyword, and the word segmentation system automatically mines all long-tail words containing the keyword and stores them as a txt file;

S2, the word segmentation system reads all long-tail words from the txt file, performs word segmentation according to the Chinese word segmentation root technology, breaks up all long-tail words, extracts the keywords that occur with high frequency, extracts the high-frequency roots, and returns them to the user;

S3, the user screens out invalid roots according to the high-frequency roots extracted by the word segmentation system and retains the remaining valid words;

S4, valid roots are screened out according to the remaining valid words;

2. The method of claim 1, wherein retaining the valid words comprises extracting high-frequency roots by the word segmentation system, removing invalid words, and repeating the operation until all invalid words are removed.

3. The method for quickly segmenting words in network promotion according to claim 1, wherein the valid roots are screened out from the high-frequency roots in the remaining valid words according to the principle of similar part of speech and identical structure, until no extractable valid word is left among the remaining valid words.

4. The method for quickly segmenting words in network promotion according to claim 1, wherein in the word segmentation stage, after the valid roots are selected, the word segmentation system extracts keywords of the same type from all long-tail words according to the order of the roots and classifies them, and within the keywords of the same type, keywords of the same length are placed in one column according to character length, finally generating an xls word segmentation table.

5. The method as claimed in claim 4, wherein the keywords are divided into columns by first placing the words of the same type of valid root in one column, then subdividing the words of the same character length in each column into their own column, then extracting the regional words from each column into a separate column, and repeating this operation for the content of each column.