CN113204723A - Page background matching method and device based on page theme - Google Patents


Info

Publication number
CN113204723A
Authority
CN
China
Legal status
Pending
Application number
CN202110391022.5A
Other languages
Chinese (zh)
Inventor
郭世仁
廖琳
吴东庆
黄灏然
连剑波
Current Assignee
Zhongkai University of Agriculture and Engineering
Original Assignee
Zhongkai University of Agriculture and Engineering
Application filed by Zhongkai University of Agriculture and Engineering
Priority to CN202110391022.5A
Publication of CN113204723A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses a page background matching method and device based on a page theme. The method comprises: acquiring a page to be matched that contains page content, and filtering the page content of the page to be matched to obtain a page text set; extracting text from the page text set to obtain a plurality of key texts; determining a page theme type based on the plurality of key texts; and matching a corresponding page background from a preset webpage background library according to the page theme type. The method improves matching accuracy, shortens matching time and improves matching efficiency. Because the whole process runs automatically without manual intervention, human error is reduced and matching accuracy is further improved, so that webpage backgrounds can be set accurately, simply and efficiently, achieving the best combination of content delivery and emotional expression.

Description

Page background matching method and device based on page theme
Technical Field
The invention relates to the technical field of computers, and in particular to a page background matching method and device based on a page theme.
Background
With the rise of the mobile internet, the application fields and popularity of the Web continue to expand. Organizations, institutions and individuals publish information through the Web: the text of the information is embedded in a web page as its main body and combined with various backgrounds and decorations to form a complete web page that is presented to users. The background is an important companion to the page content and directly determines the page's style and color tone; an ill-suited background harms the overall appearance of the page and also hinders the expression of its main content.
At present, backgrounds are commonly added manually: a web page editor reads the information content of the page, selects a page background and a matching theme emotion based on personal judgment, and edits the page accordingly to generate a new web page.
However, this approach has the following problems. First, before each judgment the editor must spend a great deal of time reading the information to be displayed, which prolongs matching and reduces matching efficiency. Second, each match relies on the editor's subjective judgment of the content, so the added background theme or its emotional tone can easily be inconsistent with the content, causing editing errors.
Disclosure of Invention
The invention provides a page background matching method and device based on a page theme.
The first aspect of the embodiments of the present invention provides a page background matching method based on a page theme, where the method includes:
acquiring a page to be matched containing page content, and screening the page content of the page to be matched to obtain a page text set;
extracting texts from the page text set to obtain a plurality of key texts;
determining a page theme type based on the plurality of key texts;
and matching the corresponding page background from a preset webpage background library according to the page theme type.
In a possible implementation of the first aspect, determining the page theme type based on the plurality of key texts includes:
dividing the plurality of key texts into a plurality of emotion text sets by means of a preset webpage emotion dictionary, wherein each emotion text set contains N key texts, N being an integer greater than or equal to 0;
and determining, from the plurality of emotion text sets, the emotion text set containing the most key texts as the target emotion text set, and taking the emotion corresponding to the target emotion text set as the page theme type.
In a possible implementation of the first aspect, extracting text from the page text set to obtain a plurality of key texts includes:
segmenting the page text set into words with a preset word segmenter to obtain a plurality of segmented texts;
and filtering out the function words among the segmented texts, keeping the remaining (content-word) segments as the plurality of key texts.
In a possible implementation of the first aspect, filtering the page content of the page to be matched to obtain a page text set includes:
acquiring the marked-up page content containing HTML tags in the page to be matched;
filtering the marked-up page content by markup style to obtain the displayed page content;
and extracting a page text set from the displayed page content.
In one possible implementation of the first aspect, the displayed page content includes M title sentences, M being a positive integer greater than or equal to 1;
extracting a page text set from the displayed page content includes:
when M is equal to 1, adding the title sentence content to a preset text set to obtain the page text set;
when M is greater than 1, traversing the content attribute value of each title sentence content to obtain M content attribute values;
comparing whether the ith content attribute value is greater than the (i+1)th content attribute value;
when the ith content attribute value is greater than the (i+1)th content attribute value, taking the ith content attribute value as the reference attribute value, the reference attribute value being initialized to zero;
when the ith content attribute value is less than the (i+1)th content attribute value, taking the (i+1)th content attribute value as the reference attribute value;
determining whether i+1 is equal to M;
if i+1 is not equal to M, assigning i+1 to i and repeating the step of comparing the ith content attribute value with the (i+1)th content attribute value;
and if i+1 is equal to M, adding the title sentence content corresponding to the reference attribute value to the preset text set to obtain the page text set.
In one possible implementation manner of the first aspect, the content attribute value includes: a font size attribute value and a title attribute value;
wherein the font size attribute values include a numeric font size attribute value and a percentage font size attribute value.
In one possible implementation of the first aspect, the displayed page content includes description content;
extracting a page text set from the displayed page content includes:
extracting content attribute values from the description content, wherein the content attribute values comprise keyword attribute values and summary attribute values;
and adding the content attribute value to a preset text set to obtain a page text set.
A second aspect of the embodiments of the present invention provides a page background matching apparatus based on a page theme, the apparatus including:
the screening module is used for acquiring a page to be matched containing page content and screening the page content of the page to be matched to obtain a page text set;
the extraction module is used for extracting texts from the page text set to obtain a plurality of key texts;
the determining module is used for determining the page theme type based on the plurality of key texts;
and the matching module is used for matching the corresponding page background from a preset webpage background library according to the page theme type.
Compared with the prior art, the page background matching method and device based on a page theme provided by the embodiments of the invention have the following beneficial effects: the key text of a target webpage is extracted quickly and simply and analyzed to obtain webpage keywords; the theme emotion type of the webpage is identified by computing theme emotion over those keywords; and the background of the target webpage is then matched and set automatically according to a preset webpage background emotion matching knowledge base. This improves matching accuracy, shortens matching time and improves matching efficiency. Because the whole process runs automatically without manual intervention, human error is reduced and matching accuracy is further improved, so that webpage backgrounds can be set accurately, simply and efficiently, achieving the best combination of content delivery and emotional expression.
Drawings
Fig. 1 is a schematic flowchart of a page background matching method based on a page theme according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the process of extracting title attribute values according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the process of extracting font size attribute values according to an embodiment of the present invention;
Fig. 4 is an operational flowchart of a page background matching method based on a page theme according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a page background matching apparatus based on a page theme according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
However, the adding and editing methods currently in use have the following problems: before each background match or judgment, the editor must spend a great deal of time reading the information to be displayed, which prolongs matching and reduces matching efficiency; and each match relies on the editor's subjective judgment of the content, so the added theme background can easily be inconsistent with the theme emotion expressed by the content, causing editing errors.
To solve the above problems, the page background matching method based on a page theme provided by the embodiments of the present application is described in detail through the following specific embodiments.
It should be noted that in this embodiment the page background matching method based on a page theme may be applied to a server or a server cluster.
Referring to Fig. 1, a flowchart of a page background matching method based on a page theme according to an embodiment of the present invention is shown.
As an example, the page background matching method based on the page theme may include:
S11: acquiring a page to be matched that contains page content, and filtering the page content of the page to be matched to obtain a page text set.
The page to be matched is a page that needs background matching; it may contain text content that the user wants to display.
After the page to be matched is acquired, its page content is filtered to obtain a page text set composed of text from the page to be matched.
A page to be matched contains many kinds of text, including titles, brief descriptions, main body, ending and time, and different text may be in different display states. To improve filtering efficiency, step S11 may, for example, include the following sub-steps:
Sub-step S111: acquiring the marked-up page content containing HTML tags in the page to be matched.
The marked-up page content is the descriptive text that the user composes with HTML (HyperText Markup Language) tags.
Sub-step S112: filtering the marked-up page content by markup style to obtain the displayed page content.
In this embodiment, markup styles include display styles and hidden styles. Content with a hidden style is content that the user does not display on the page to be matched; since the user intends to hide it, it is not used as a reference for matching the page's theme or background. Specifically, a hidden style is one that sets "display: none" or "visibility: hidden".
In practice, the markup style of each piece of marked-up page content is obtained, content with a display style is kept, and that content is used as the displayed page content.
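The filtering in sub-step S112 can be illustrated with a minimal sketch built on Python's standard html.parser module. It assumes well-formed markup and hidden styles written as inline style attributes; stylesheet rules, a descendant of a "visibility: hidden" element that sets "visibility: visible" again, and the contents of script or style elements are not handled. The names DisplayedTextParser and displayed_page_content are illustrative and do not come from the source.

```python
import re
from html.parser import HTMLParser
from typing import List

# Inline declarations treated as a hidden markup style in this sketch.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DisplayedTextParser(HTMLParser):
    """Collects text that is not inside an element with a hidden inline style."""

    def __init__(self) -> None:
        super().__init__()
        self.hidden_stack: List[bool] = []  # one entry per open element: hidden?
        self.displayed: List[str] = []      # displayed page content, in order

    def _inside_hidden(self) -> bool:
        return bool(self.hidden_stack) and self.hidden_stack[-1]

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        # A child of a hidden element is treated as hidden too.
        self.hidden_stack.append(self._inside_hidden() or bool(HIDDEN_STYLE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.hidden_stack:
            self.hidden_stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and not self._inside_hidden():
            self.displayed.append(text)


def displayed_page_content(html: str) -> List[str]:
    parser = DisplayedTextParser()
    parser.feed(html)
    parser.close()
    return parser.displayed


page = ('<body><h1>Victory</h1><p style="display: none">draft</p>'
        '<div style="visibility:hidden"><span>x</span></div><p>Celebrate</p></body>')
print(displayed_page_content(page))  # ['Victory', 'Celebrate']
```

A stack is used rather than a single flag so that hiding is inherited by nested elements and restored correctly when the hidden element closes.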
Sub-step S113: extracting a page text set from the displayed page content.
After the displayed page content is obtained, page texts are filtered and extracted from it, and the page text set is formed from these page texts.
In practice, the title sentences shown on a page are one of the main things that attract readers, so the page theme can be matched from them. Specifically, in this embodiment the displayed page content includes M title sentences, M being a positive integer greater than or equal to 1.
As an example, sub-step S113 may include the following steps:
When M is equal to 1, adding the title sentence content to a preset text set to obtain the page text set.
In this embodiment, the title sentence content is the text of a title sentence. When M is equal to 1, the displayed page content contains only one title sentence, and its content is added to the preset text set to obtain the page text set.
In practice, the preset text set is an empty set initialized in advance by the user.
When M is greater than 1, traversing the content attribute value of each title sentence content to obtain M content attribute values.
If M is greater than 1, the content the user wants to display may cover several different topics, each with its own title sentence, so the title sentences need to be filtered.
To improve filtering efficiency, the content attribute value of each title sentence content is obtained.
Comparing whether the ith content attribute value is greater than the (i+1)th content attribute value, where i is a positive integer with an initial value of 1;
when the ith content attribute value is greater than the (i+1)th content attribute value, taking the ith content attribute value as the reference attribute value, the reference attribute value being initialized to zero;
when the ith content attribute value is less than the (i+1)th content attribute value, taking the (i+1)th content attribute value as the reference attribute value;
determining whether i+1 is equal to M;
if i+1 is not equal to M, assigning i+1 to i and repeating the step of comparing the ith content attribute value with the (i+1)th content attribute value;
and if i+1 is equal to M, adding the title sentence content corresponding to the reference attribute value to the preset text set to obtain the page text set.
In this embodiment, once the content attribute value of each title sentence content has been obtained, the values are compared one by one, each against the current reference attribute value (the largest value found so far), and the title sentence content whose content attribute value is largest is added to the preset text set to obtain the page text set.
Note that i is a positive integer with an initial value of 1 and is incremented by one at each iteration.
For example, the first content attribute value obtained is used as the reference attribute value. The second content attribute value is then obtained by searching in the order of the text content, and the two are compared: if the first is larger, it remains the reference attribute value; if the second is larger, it replaces the first as the reference attribute value. It is then determined whether the title sentence content corresponding to the second content attribute value is the last title sentence content. If so, the comparison ends and the title sentence content corresponding to the reference attribute value is added to the preset text set to obtain the page text set. If not, the third content attribute value is obtained and compared with the reference attribute value (the largest value found in the preceding comparisons), the larger of the two becomes the reference attribute value, and the process continues until the content attribute values of all title sentence contents have been compared.
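The comparison loop amounts to a running-maximum search. The sketch below follows the worked example, in which each new content attribute value is compared with the current reference attribute value. The text does not say what happens when two values are equal, so this sketch assumes the earlier title sentence is kept; the function and parameter names are hypothetical.

```python
from typing import Sequence, Set


def select_title(titles: Sequence[str], values: Sequence[float]) -> Set[str]:
    """Return the page text set holding the title sentence with the largest value."""
    if not titles or len(titles) != len(values):
        raise ValueError("need one content attribute value per title sentence")
    text_set: Set[str] = set()       # preset text set, initially empty
    M = len(titles)
    if M == 1:                       # a single title sentence is taken directly
        text_set.add(titles[0])
        return text_set
    ref_index = 0
    reference = values[0]            # the first value becomes the reference
    for i in range(1, M):            # compare each later value with the reference
        if values[i] > reference:    # strictly larger: update the reference
            reference = values[i]
            ref_index = i
    text_set.add(titles[ref_index])
    return text_set
```

For example, select_title(["A", "B", "C"], [5, 1, 3]) keeps "A": comparing against the reference rather than only against the neighbouring value is what guarantees the overall maximum is found.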
When designing or entering page content, the user may set the size of a title sentence either through its heading tag or through its font size. Accordingly, in a specific implementation the content attribute values include a font size attribute value and a title attribute value, where the font size attribute value (font-size) includes a numeric font size attribute value and a percentage font size attribute value.
Referring to Fig. 2, a schematic diagram of the process of extracting title attribute values according to an embodiment of the present invention is shown. Here h denotes the title attribute value of a title sentence; title attribute values range from h1 to h6, with h1 the largest and h6 the smallest. In practice, a preset text set Ts is constructed and initialized to empty, i.e., Ts = { }. It is then determined whether the first title sentence content is displayed; if so, its title attribute value is obtained and set as the reference attribute value, and the first title sentence content is added to Ts. The page content is then searched forward from its body tag for the second title sentence content. If there is none, the process ends and Ts is taken as the page text set. If there is one, its title attribute value is compared with the reference attribute value. When it is larger, it becomes the new reference attribute value and the second title sentence content replaces the first title sentence content in Ts; when it is smaller, Ts is left unchanged. In either case the search continues forward from the body tag for the third title sentence content, and so on until all title sentence contents have been traversed.
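A sketch of the Fig. 2 procedure, assuming the displayed heading tags have already been collected as (tag, text) pairs in document order starting from the body tag. Mapping h1 to h6 onto the numbers 6 down to 1 is an illustrative choice that makes h1 the largest title attribute value, as the text requires; the function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

HEADING_TAG = re.compile(r"^h([1-6])$", re.IGNORECASE)


def title_attribute_value(tag: str) -> int:
    """Map h1..h6 to 6..1 so that h1 is the largest; non-heading tags map to 0."""
    match = HEADING_TAG.match(tag)
    return 7 - int(match.group(1)) if match else 0


def extract_by_title_attribute(headings: Iterable[Tuple[str, str]]) -> List[str]:
    """headings: displayed (tag, text) pairs in document order, starting from <body>."""
    ts: List[str] = []   # preset text set Ts = { }
    reference = 0        # reference attribute value
    for tag, text in headings:
        value = title_attribute_value(tag)
        if value > reference:
            reference = value
            ts = [text]  # the new title sentence replaces the previous one in Ts
    return ts


sample = [("h3", "Notice"), ("h2", "Sports day"), ("h1", "Championship won"), ("h2", "Schedule")]
print(extract_by_title_attribute(sample))  # ['Championship won']
```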
Referring to Fig. 3, a schematic diagram of the process of extracting font size attribute values according to an embodiment of the present invention is shown. Title sentence content can also be filtered and extracted according to font size attribute values.
Specifically, an initialization step is performed first: a preset text set is constructed as the largest-font-size title sentence set and initialized to empty, giving T3 ← {tp ← null, tv ← null}, where null denotes the empty string, tp denotes the text of the title sentence content in the tag with the largest font size expressed as a percentage, and tv denotes the text of the title sentence content in the tag with the largest font size expressed as a numeric value.
The current largest percentage font size is initialized as fsp ← 0%, and the current largest numeric font size as fsv ← 0. T3, fsp and fsv are updated continuously during extraction.
In practice, it is determined whether the body tag of the page content has been fully scanned; if so, the process ends. If not, the next font size attribute value (font-size) is looked up and denoted fsx; each font size attribute value belongs to a displayable tag tx.
The type of fsx is then determined. If fsx is a percentage value, it is compared with fsp; if it is a numeric value, it is compared with fsv. When fsx is larger, the corresponding maximum (fsp or fsv) is replaced by fsx and the corresponding entry of T3 (tp or tv) is replaced by the text of tag tx. Scanning continues until the whole body has been processed, at which point T3 holds the title sentence contents with the largest font sizes.
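The Fig. 3 procedure keeps two independent maxima, one for percentage sizes and one for numeric sizes. In this sketch each (fsx, text of tx) pair is assumed to come from a displayable tag, and numeric sizes are compared by their number alone. That comparison is only meaningful if they share one unit (for example, all px), an assumption the source does not state. Keyword sizes such as "large" are skipped. T3, tp, tv, fsp, fsv and fsx follow the text; the other names are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

PERCENT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*%\s*$")
NUMERIC = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*[a-z]*\s*$", re.IGNORECASE)


def extract_by_font_size(tags: Iterable[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """tags: (font-size value fsx, text of displayable tag tx) in scan order."""
    t3: Dict[str, Optional[str]] = {"tp": None, "tv": None}  # None plays the role of null
    fsp = 0.0   # current largest percentage size (fsp <- 0%)
    fsv = 0.0   # current largest numeric size (fsv <- 0)
    for fsx, text in tags:
        m = PERCENT.match(fsx)
        if m:
            if float(m.group(1)) > fsp:
                fsp, t3["tp"] = float(m.group(1)), text
            continue
        m = NUMERIC.match(fsx)
        if m and float(m.group(1)) > fsv:
            fsv, t3["tv"] = float(m.group(1)), text
        # keyword sizes such as "large" match neither pattern and are skipped
    return t3


sizes = [("120%", "A"), ("18px", "B"), ("200%", "C"), ("24px", "D"), ("large", "E")]
print(extract_by_font_size(sizes))  # {'tp': 'C', 'tv': 'D'}
```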
In practice, the page content edited by the user may also include description content, i.e., text that describes what the user wants to display. Specifically, the displayed page content includes description content, which is text.
As an example, sub-step S113 may then include the following steps:
Extracting content attribute values from the description content, the content attribute values including a keyword attribute value and a summary attribute value.
Adding the content attribute values to a preset text set to obtain the page text set.
In practice, a webpage description set Ds may also be constructed in advance and initialized to empty, i.e., Ds = { }.
It is then determined whether the style tag of the description content is a display style. If so, the value of the content attribute in <meta name="keywords" content="..."> and the value of the content attribute in <meta name="description" content="..."> are extracted. The keywords value is denoted k and put into Ds, i.e., Ds = {k}; the description value is denoted d and put into Ds, i.e., Ds = {k, d}. In practical applications, each of these content attribute values may be a single line of text.
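A minimal sketch of building Ds with Python's standard html.parser module. It returns k before d, matching the order used above, and keeps the first occurrence of each meta tag. The display-style check described above is omitted because meta elements are never rendered; treat that simplification, and the class and function names, as assumptions.

```python
from html.parser import HTMLParser
from typing import Dict, List


class MetaDescriptionParser(HTMLParser):
    """Records the content attribute of <meta name="keywords"> and <meta name="description">."""

    def __init__(self) -> None:
        super().__init__()
        self.found: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name in ("keywords", "description") and content is not None:
            self.found.setdefault(name, content)  # keep the first occurrence


def description_set(html: str) -> List[str]:
    """Build Ds in the order used in the text: keywords value k, then description value d."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    return [parser.found[key] for key in ("keywords", "description") if key in parser.found]


head = ('<head><meta charset="utf-8">'
        '<meta name="description" content="School team wins gold medal">'
        '<meta name="keywords" content="celebration, victory"></head>')
print(description_set(head))  # ['celebration, victory', 'School team wins gold medal']
```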
S12: extracting text from the page text set to obtain a plurality of key texts.
Because the page text set covers several pieces of page content, each of which may contain many different texts, text is extracted from the page text set once it is obtained, yielding a plurality of key texts; this improves matching efficiency.
Because different text contents may contain words of little importance, such as function words or modal particles, step S12 may, for example, include the following sub-steps to improve matching accuracy:
Sub-step S121: segmenting the page text set into words with a preset word segmenter to obtain a plurality of segmented texts.
Sub-step S122: filtering out the function words among the segmented texts, keeping the remaining segments as the plurality of key texts.
In practice, the page text set may be input into the preset word segmenter, which may use a matching-based (dictionary-based), understanding-based or statistics-based segmentation method preset by the user.
The part of speech of each segmented text is then determined, the segmented texts are classified as function-word texts or content-word texts, and the content-word texts are extracted as the plurality of key texts.
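As one concrete instance of the matching-based method mentioned above, the sketch below uses forward maximum matching, a greedy dictionary-based segmenter. It replaces part-of-speech tagging with a fixed list of function words, which is a simplification. The dictionary, the function-word list and the sample sentence ("warmly celebrate the basketball team's victory") are hypothetical.

```python
from typing import List, Set

# Hypothetical stand-ins: a real system would use the user's preset dictionary
# and a part-of-speech tagger rather than a fixed function-word list.
DICTIONARY: Set[str] = {"热烈", "庆祝", "篮球队", "胜利", "的", "了"}
FUNCTION_WORDS: Set[str] = {"的", "了", "和", "吗", "啊"}


def forward_maximum_matching(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def key_texts(text: str) -> List[str]:
    """Segment, then drop function words and whitespace-only tokens."""
    return [t for t in forward_maximum_matching(text, DICTIONARY)
            if t not in FUNCTION_WORDS and not t.isspace()]


print(key_texts("热烈庆祝篮球队的胜利"))  # ['热烈', '庆祝', '篮球队', '胜利']
```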
S13: determining the page theme type based on the plurality of key texts.
In optional embodiments, the page theme type may be determined from the parts of speech of the key texts, from the number of key texts, or from the order of the key texts.
Since there are multiple key texts, step S13 may, for example, include the following sub-steps to improve matching accuracy:
Sub-step S131: dividing the plurality of key texts into a plurality of emotion text sets with a preset webpage emotion dictionary, wherein each emotion text set contains N key texts, N being an integer greater than or equal to 0.
For example, three emotion text sets are configured, namely celebration, sadness and neutral, each corresponding to one theme. The emotion of each key text is then identified with the preset webpage emotion dictionary, and each key text is assigned to the emotion text set matching its emotion.
For example, suppose the key texts obtained are: celebration, enthusiasm, success, victory, birth, emerging, sacrifice, disaster, accident, deceased, disease, notice, message, letter, and so on.
After assignment with the preset webpage emotion dictionary, the result is:
the celebration theme emotion text set contains: celebration, enthusiasm, success, victory, birth, emerging;
the sadness theme emotion text set contains: sacrifice, disaster, accident, deceased, disease;
and the neutral theme emotion text set contains: notice, message, letter.
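The assignment in sub-step S131 is essentially a dictionary lookup. This sketch builds a hypothetical emotion dictionary from the example words above and skips key texts the dictionary does not contain, a case the text does not address. The emotions parameter lets the same function serve a configuration with fewer sets; all names are illustrative.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical webpage emotion dictionary built from the example words above.
EMOTION_DICTIONARY: Dict[str, str] = {
    **dict.fromkeys(["celebration", "enthusiasm", "success", "victory", "birth", "emerging"], "celebration"),
    **dict.fromkeys(["sacrifice", "disaster", "accident", "deceased", "disease"], "sadness"),
    **dict.fromkeys(["notice", "message", "letter"], "neutral"),
}


def divide_into_emotion_sets(key_texts: Iterable[str], dictionary: Dict[str, str],
                             emotions: Sequence[str] = ("celebration", "sadness", "neutral")
                             ) -> Dict[str, List[str]]:
    sets: Dict[str, List[str]] = {emotion: [] for emotion in emotions}  # a set may stay empty (N = 0)
    for word in key_texts:
        emotion = dictionary.get(word)
        if emotion in sets:          # unknown words, or emotions not configured, are skipped
            sets[emotion].append(word)
    return sets


words = ["celebration", "enthusiasm", "success", "victory", "birth", "emerging",
         "sacrifice", "disaster", "accident", "deceased", "disease",
         "notice", "message", "letter"]
result = divide_into_emotion_sets(words, EMOTION_DICTIONARY)
print({emotion: len(items) for emotion, items in result.items()})
# {'celebration': 6, 'sadness': 5, 'neutral': 3}
```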
During identification and assignment, if the user configures only two emotion text sets, the key texts are classified into two sets.
For example, if the user configures a celebration set and a mourning set, the key texts may be classified as follows:
Celebration: {success, enthusiasm, celebration, festival, gold medal, birth, Spring Festival, National Day, New Year's Day, Mid-Autumn Festival, wedding, victory, opening ceremony, closing ceremony, prosperity, happiness, revitalization, ...}
Mourning: {sacrifice, death, illness, disaster, car accident, earthquake, Qingming (Tomb-Sweeping Day), mourning, grief, sadness, ...}
Substep S132: from the plurality of emotion text sets, determine the emotion text set containing the most key texts as the target emotion text set, and take the emotion corresponding to it as the page theme type.
After the key texts have been assigned, the number of key texts in each emotion text set is counted. The emotion text set containing the most key texts is then selected as the target emotion text set, and its emotion is taken as the page theme type.
For example, let J, S and B be the numbers of key texts in the joy, sadness and neutral emotion text sets, respectively. If S > B and S > J, the sadness emotion text set is the target emotion text set, and sadness is the page theme type.
In an optional embodiment, to improve screening efficiency, when the two opposing emotion text sets contain the same number of key texts, the neutral emotion text set is taken as the target emotion text set and neutral is taken as the page theme type.
For example, when J = S, the neutral emotion text set is the target emotion text set, and neutral is the page theme type.
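Substep S132, including the optional J = S tie rule, might be sketched as below. The emotion labels and the function name `determine_theme_type` are hypothetical; the input is a mapping from each emotion to the key texts assigned to it. The source does not say how other ties (for example, celebration and neutral both largest) are resolved; here `max` simply keeps the first emotion in the mapping's order.

```python
from typing import Dict, List


def determine_theme_type(emotion_sets: Dict[str, List[str]]) -> str:
    """Substep S132: choose the emotion whose set holds the most key texts.

    Optional tie rule: when the two opposing sets, assumed here to be
    "celebration" (J) and "sadness" (S), are the same size, return "neutral".
    """
    counts = {emotion: len(texts) for emotion, texts in emotion_sets.items()}
    joy = counts.get("celebration", 0)
    sadness = counts.get("sadness", 0)
    if joy == sadness:
        return "neutral"
    # For any other tie, max() keeps the first emotion in insertion order.
    return max(counts, key=counts.get)


sets = {"celebration": ["success"], "sadness": ["sacrifice", "disaster"], "neutral": ["notice"]}
print(determine_theme_type(sets))
```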
Step S14: match the corresponding page background from a preset webpage background library according to the page theme type.
In practice, different page theme types may correspond to different page backgrounds. When setting a page theme type, the user can configure its page background and store it in the page background library; once the page theme type has been determined, the corresponding page background is selected according to it.
For example, the configuration rule for the celebration page theme type: a mainly red background color or background image.
The configuration rule for the sadness page theme type: mainly a gray or black background color or background image.
The configuration rule for the neutral page theme type: the webpage's original background color or background image is kept unchanged.
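A hypothetical page background library and lookup for step S14 can be sketched as a mapping from theme type to style rules. The colour values, the `BACKGROUND_LIBRARY` name, and representing a background as a dictionary of CSS properties are illustrative assumptions; the source only states that celebration uses mainly red, sadness uses gray or black, and neutral keeps the original background.

```python
from typing import Dict, Optional

# Hypothetical page background library. The colours are placeholders for the
# rules "mainly red" (celebration) and "gray or black" (sadness); None means
# "keep the page's original background" (neutral).
BACKGROUND_LIBRARY: Dict[str, Optional[Dict[str, str]]] = {
    "celebration": {"background-color": "#c8102e"},
    "sadness": {"background-color": "#5a5a5a"},
    "neutral": None,
}


def match_background(theme_type: str, original_style: Dict[str, str]) -> Dict[str, str]:
    """Step S14: return the page style after applying the theme's background rule."""
    rule = BACKGROUND_LIBRARY.get(theme_type)  # unknown types also fall back to None
    merged = dict(original_style)  # copy so the caller's style is not mutated
    if rule is not None:
        merged.update(rule)
    return merged


page_style = {"background-color": "#ffffff", "color": "#000000"}
print(match_background("sadness", page_style))
print(match_background("neutral", page_style))
```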
To enrich the page background, the user can also configure corresponding patterns, animations or other display effects.
Finally, once the page background has been determined, it is added to the page to be matched, and the corresponding display page is generated and shown to the user.
Referring to Fig. 4, an operation flowchart of the page background matching method based on a page theme according to an embodiment of the present invention is shown.
Specifically, the page to be matched is first preprocessed, which may include screening its page content and extracting text; theme emotion analysis is then performed on the webpage to determine its page theme type; finally, the corresponding page background is matched according to the page theme type.
This embodiment of the present invention provides a page background matching method based on a page theme with the following beneficial effects: the key texts of the target webpage can be extracted quickly and simply and analyzed to obtain webpage keywords; the theme emotion type of the webpage is identified by computing the theme emotion of those keywords; and the background of the target webpage is then matched and set automatically according to a preset webpage background emotion matching knowledge base. The method can improve matching accuracy, shorten matching time and raise matching efficiency. Because the whole process runs automatically without manual intervention, human error is reduced, which further improves matching accuracy, makes webpage background setting accurate, simple and efficient, and combines the delivery of webpage content with the expression of its emotion.
An embodiment of the present invention further provides a page background matching device based on a page theme. Referring to Fig. 5, a schematic structural diagram of the page background matching device based on a page theme according to an embodiment of the present invention is shown.
For example, the page background matching device based on a page theme may include:
the screening module 501 is configured to acquire a page to be matched containing page content, and screen the page content of the page to be matched to obtain a page text set;
an extraction module 502, configured to perform text extraction on the page text set to obtain a plurality of key texts;
a determining module 503, configured to determine a page theme type based on the plurality of key texts;
and a matching module 504, configured to match a corresponding page background from a preset webpage background library according to the page theme type.
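One way to picture how modules 501 to 504 fit together is a small class that chains four callables. The class name, field names, and the placeholder lambdas in the usage lines are hypothetical; they only stand in for the real screening, extraction, determining and matching logic.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PageBackgroundMatcher:
    """Hypothetical wiring of modules 501-504 as four injected callables."""

    screen: Callable[[str], List[str]]         # 501: page -> page text set
    extract: Callable[[List[str]], List[str]]  # 502: page text set -> key texts
    determine: Callable[[List[str]], str]      # 503: key texts -> page theme type
    match: Callable[[str], str]                # 504: theme type -> page background

    def run(self, page: str) -> str:
        text_set = self.screen(page)
        key_texts = self.extract(text_set)
        theme_type = self.determine(key_texts)
        return self.match(theme_type)


# Placeholder modules, only to show the data flow between the four stages.
matcher = PageBackgroundMatcher(
    screen=lambda page: [page],
    extract=lambda texts: [word for text in texts for word in text.split()],
    determine=lambda keys: "sadness" if "disaster" in keys else "neutral",
    match=lambda theme: {"sadness": "gray", "neutral": "original"}[theme],
)
print(matcher.run("earthquake disaster notice"))
```

Keeping each module as a separate callable mirrors the device's division into four modules and lets any one of them be replaced without touching the others.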
Optionally, the determining module is further configured to:
dividing the plurality of key texts into a plurality of emotion text sets through a preset webpage emotion dictionary, wherein each emotion text set comprises N pieces of key text information, and N is an integer greater than or equal to 0;
and determining an emotion text set containing the most key texts as a target emotion text from the plurality of emotion text sets, and taking an emotion corresponding to the target emotion text as a page theme type.
Optionally, the extracting module is further configured to:
performing word segmentation on the page text set through a preset word segmenter to obtain a plurality of segmented texts;
and screening a plurality of non-function-word (content-word) texts from the plurality of segmented texts to obtain a plurality of key texts.
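The extraction step can be sketched as word segmentation followed by removal of function words. The source does not name the word segmenter, so this sketch substitutes a simple regular-expression tokenizer and a small hypothetical English function-word list. For Chinese pages a dedicated segmenter (for example the third-party `jieba` package) would be needed, because `\w+` treats an unbroken run of Chinese characters as a single token.

```python
import re
from typing import Iterable, List

# Hypothetical function-word list; a real system would use a curated stop list
# or a part-of-speech tagger for the page language.
FUNCTION_WORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "to", "for", "is", "was", "with"}


def segment(text_set: Iterable[str]) -> List[str]:
    """Stand-in word segmenter: lower-case each text and split on non-word characters."""
    tokens: List[str] = []
    for text in text_set:
        tokens.extend(re.findall(r"\w+", text.lower()))
    return tokens


def extract_key_texts(text_set: Iterable[str]) -> List[str]:
    """Keep the content words (non-function words) in their original order."""
    return [token for token in segment(text_set) if token not in FUNCTION_WORDS]


print(extract_key_texts(["The victory of the team", "A letter for you"]))
```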
Optionally, the screening module is further configured to:
acquiring tagged page content containing HTML tags in the page to be matched;
performing tag-style screening on the tagged page content to obtain displayed page content;
and extracting a page text set from the displayed page content.
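A minimal sketch of this screening using Python's standard `html.parser` module follows. Interpreting the HTML tag-style screening as dropping non-rendered tags (`script`, `style`, `noscript`, `template`) and elements hidden with an inline `display:none` style is an assumption, as are the class and function names; the sketch also assumes reasonably well-formed markup so that start and end tags balance.

```python
from html.parser import HTMLParser
from typing import List


class DisplayedTextParser(HTMLParser):
    """Collect text that would be displayed, skipping non-rendered tags and
    elements hidden by an inline display:none style (assumed screening rules)."""

    NON_DISPLAY_TAGS = {"script", "style", "noscript", "template"}
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a screened-out element
        self.texts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return  # void elements have no content and no end tag
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        if self.skip_depth or tag in self.NON_DISPLAY_TAGS or "display:none" in style:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in self.VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.texts.append(data.strip())


def extract_page_text_set(html: str) -> List[str]:
    parser = DisplayedTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


sample = ('<html><head><style>p {color: red}</style></head><body>'
          '<h1>Victory</h1><div style="display: none">hidden <b>x</b></div>'
          '<script>var a = 1;</script><p>Spring <br> Festival</p></body></html>')
print(extract_page_text_set(sample))
```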
Optionally, the displayed page content includes M title sentences, where M is a positive integer greater than or equal to 1;
the screening module is further configured to:
when M is equal to 1, adding the title sentence to a preset text set to obtain a page text set;
when M is greater than 1, traversing the content attribute value of each title sentence to obtain M content attribute values;
comparing whether the i-th content attribute value is greater than the (i+1)-th content attribute value;
when the i-th content attribute value is greater than the (i+1)-th content attribute value, taking the i-th content attribute value as the reference attribute value, wherein the initial value of the reference attribute value is zero;
when the i-th content attribute value is less than the (i+1)-th content attribute value, taking the (i+1)-th content attribute value as the reference attribute value;
judging whether i + 1 is equal to M;
if i + 1 is not equal to M, assigning i + 1 to i, and repeating the step of comparing whether the i-th content attribute value is greater than the (i+1)-th content attribute value;
and if i + 1 is equal to M, adding the title sentence corresponding to the reference attribute value to a preset text set to obtain a page text set.
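The loop above is meant to select the title sentence with the largest content attribute value. The sketch below reads it as a running-maximum scan in which each value is compared with the current reference value (initialised to zero); that reading, the tie handling (the earlier title wins), and the function name `select_title` are assumptions rather than details given in the source.

```python
from typing import Sequence


def select_title(titles: Sequence[str], attribute_values: Sequence[float]) -> str:
    """Pick the title sentence with the largest content attribute value.

    Reads the described loop as a running-maximum scan: each value is compared
    with the current reference value, which starts at zero. Ties keep the
    earlier title; if no value exceeds zero, the first title is returned.
    """
    if not titles or len(titles) != len(attribute_values):
        raise ValueError("expected M >= 1 titles with one attribute value each")
    if len(titles) == 1:  # M == 1: the only title is used directly
        return titles[0]
    reference_value, reference_index = 0, 0
    for i, value in enumerate(attribute_values):
        if value > reference_value:
            reference_value, reference_index = value, i
    return titles[reference_index]


# The selected title is added to the (here initially empty) preset text set.
page_text_set = [select_title(["Notice", "Victory celebration", "News"], [16, 32, 24])]
print(page_text_set)
```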
Optionally, the content attribute value includes: a font size attribute value and a title attribute value;
wherein the font size attribute values include a numeric font size attribute value and a percentage font size attribute value.
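The two kinds of font-size value have to be put on a common scale before they can be compared. This sketch assumes a 16 px base size for resolving percentages and a hypothetical score table for heading tags as the title attribute value; neither the base size nor the scores come from the source.

```python
import re

BASE_FONT_SIZE_PX = 16.0  # assumed default size used to resolve percentages

# Hypothetical title attribute scores: more prominent headings score higher.
HEADING_SCORES = {"h1": 6, "h2": 5, "h3": 4, "h4": 3, "h5": 2, "h6": 1}


def font_size_px(value: str, base: float = BASE_FONT_SIZE_PX) -> float:
    """Convert a numeric ("24", "24px") or percentage ("150%") font size to pixels."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(px|%)?", value.strip().lower())
    if match is None:
        raise ValueError(f"unsupported font size: {value!r}")
    number, unit = float(match.group(1)), match.group(2)
    return number * base / 100 if unit == "%" else number


def heading_score(tag: str) -> int:
    """Title attribute value for a heading tag; non-heading tags score 0."""
    return HEADING_SCORES.get(tag.lower(), 0)


print(font_size_px("24px"), font_size_px("150%"), heading_score("H2"))
```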
Optionally, the displayed page content includes description content;
the screening module is further configured to:
extracting content attribute values from the description content, wherein the content attribute values comprise keyword attribute values and summary attribute values;
and adding the content attribute value to a preset text set to obtain a page text set.
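Reading the keyword and summary attribute values as the standard `<meta name="keywords">` and `<meta name="description">` tags is an assumption, but under that reading the extraction can be sketched with `html.parser`. Splitting the keywords on commas and the class and function names are likewise illustrative.

```python
from html.parser import HTMLParser
from typing import Dict, List


class MetaDescriptionParser(HTMLParser):
    """Collect the content of <meta name="keywords"> and <meta name="description">."""

    def __init__(self) -> None:
        super().__init__()
        self.values: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name in ("keywords", "description") and content:
            self.values[name] = content.strip()


def description_text_set(html: str) -> List[str]:
    """Build a page text set from the keyword and summary (description) values."""
    parser = MetaDescriptionParser()
    parser.feed(html)
    parser.close()
    text_set: List[str] = []
    keywords = parser.values.get("keywords", "")
    text_set.extend(word.strip() for word in keywords.split(",") if word.strip())
    if "description" in parser.values:
        text_set.append(parser.values["description"])
    return text_set


sample = ('<head><meta name="keywords" content="earthquake, mourning">'
          '<meta name="Description" content="Nation mourns victims"></head>')
print(description_text_set(sample))
```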
Further, an embodiment of the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the page background matching method based on a page theme described in the foregoing embodiment.
Further, an embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions, which are used to cause a computer to perform the page background matching method based on a page theme described in the foregoing embodiment.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A page background matching method based on a page theme, characterized by comprising the following steps:
acquiring a page to be matched containing page content, and screening the page content of the page to be matched to obtain a page text set;
extracting texts from the page text set to obtain a plurality of key texts;
determining a page theme type based on the plurality of key texts;
and matching the corresponding page background from a preset webpage background library according to the page theme type.
2. The page background matching method based on a page theme according to claim 1, wherein determining the page theme type based on the plurality of key texts comprises:
dividing the plurality of key texts into a plurality of emotion text sets through a preset webpage emotion dictionary, wherein each emotion text set comprises N pieces of key text information, and N is an integer greater than or equal to 0;
and determining an emotion text set containing the most key texts as a target emotion text from the plurality of emotion text sets, and taking an emotion corresponding to the target emotion text as a page theme type.
3. The page background matching method based on a page theme according to claim 1, wherein performing text extraction on the page text set to obtain a plurality of key texts comprises:
performing word segmentation on the page text set through a preset word segmenter to obtain a plurality of segmented texts;
and screening a plurality of non-function-word (content-word) texts from the plurality of segmented texts to obtain a plurality of key texts.
4. The page background matching method based on a page theme according to any one of claims 1-3, wherein screening the page content of the page to be matched to obtain a page text set comprises:
acquiring tagged page content containing HTML tags in the page to be matched;
performing tag-style screening on the tagged page content to obtain displayed page content;
and extracting a page text set from the displayed page content.
5. The page background matching method based on a page theme according to claim 4, wherein the displayed page content comprises M title sentences, M being a positive integer greater than or equal to 1;
extracting a page text set from the displayed page content comprises:
when M is equal to 1, adding the title sentence to a preset text set to obtain a page text set;
when M is greater than 1, traversing the content attribute value of each title sentence to obtain M content attribute values;
comparing whether the i-th content attribute value is greater than the (i+1)-th content attribute value;
when the i-th content attribute value is greater than the (i+1)-th content attribute value, taking the i-th content attribute value as the reference attribute value, wherein the initial value of the reference attribute value is zero;
when the i-th content attribute value is less than the (i+1)-th content attribute value, taking the (i+1)-th content attribute value as the reference attribute value;
judging whether i + 1 is equal to M;
if i + 1 is not equal to M, assigning i + 1 to i, and repeating the step of comparing whether the i-th content attribute value is greater than the (i+1)-th content attribute value;
and if i + 1 is equal to M, adding the title sentence corresponding to the reference attribute value to a preset text set to obtain a page text set.
6. The page background matching method based on page theme of claim 5, wherein the content attribute value comprises: a font size attribute value and a title attribute value;
wherein the font size attribute values include a numeric font size attribute value and a percentage font size attribute value.
7. The page background matching method based on a page theme according to claim 4, wherein the displayed page content comprises description content;
extracting a page text set from the displayed page content comprises:
extracting content attribute values from the description content, wherein the content attribute values comprise keyword attribute values and summary attribute values;
and adding the content attribute value to a preset text set to obtain a page text set.
8. An apparatus for matching page background based on page theme, the apparatus comprising:
the screening module is used for acquiring a page to be matched containing page content and screening the page content of the page to be matched to obtain a page text set;
the extraction module is used for extracting texts from the page text set to obtain a plurality of key texts;
the determining module is used for determining the page theme type based on the plurality of key texts;
and the matching module is used for matching the corresponding page background from a preset webpage background library according to the page theme type.
9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the page background matching method based on a page theme according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the page background matching method based on a page theme according to any one of claims 1 to 7.
CN202110391022.5A 2021-04-12 2021-04-12 Page background matching method and device based on page theme Pending CN113204723A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110391022.5A CN113204723A (en) 2021-04-12 2021-04-12 Page background matching method and device based on page theme

Publications (1)

Publication Number Publication Date
CN113204723A true CN113204723A (en) 2021-08-03

Family

ID=77026586

Country Status (1)

Country Link
CN (1) CN113204723A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114584807A (en) * 2022-01-24 2022-06-03 北京达佳互联信息技术有限公司 Skin setting method and device, and skin display method and device

Citations (7)

Publication number Priority date Publication date Assignee Title
CN101136089A (en) * 2007-09-24 2008-03-05 腾讯科技(深圳)有限公司 Method, system and device for updating e-mail web page background
CN103136188A (en) * 2011-11-22 2013-06-05 国际商业机器公司 Method and system used for sentiment estimation of web browsing user
CN106406882A (en) * 2016-09-14 2017-02-15 百度在线网络技术(北京)有限公司 Method and device for displaying post background in forum
CN106649603A (en) * 2016-11-25 2017-05-10 北京资采信息技术有限公司 Webpage text data sentiment classification designated information push method
CN108920434A * 2018-06-06 2018-11-30 武汉酷犬数据科技有限公司 General method and system for extracting web page subject content
CN109783182A * 2019-02-15 2019-05-21 百度在线网络技术(北京)有限公司 Method, apparatus, device and medium for adjusting the tone of a page theme
US20190155906A1 (en) * 2017-11-22 2019-05-23 International Business Machines Corporation Enhancing a computer to match emotion and tone in text with the emotion and tone depicted by the color in the theme of the page or its background

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210803
