CN117421487B - Multiple network information screening management system based on artificial intelligence - Google Patents

Multiple network information screening management system based on artificial intelligence

Info

Publication number
CN117421487B
Authority
CN
China
Prior art keywords
document
target
picture
title
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311749145.7A
Other languages
Chinese (zh)
Other versions
CN117421487A (en)
Inventor
郭齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Kangnai Network Technology Co ltd
Original Assignee
Xi'an Kangnai Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Kangnai Network Technology Co ltd filed Critical Xi'an Kangnai Network Technology Co ltd
Priority to CN202311749145.7A priority Critical patent/CN117421487B/en
Publication of CN117421487A publication Critical patent/CN117421487A/en
Application granted granted Critical
Publication of CN117421487B publication Critical patent/CN117421487B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of screening management of multiple kinds of network information, and particularly discloses an artificial intelligence-based multiple network information screening management system, comprising: a platform information extraction module, a screening mode confirmation module, a single-text title confirmation module, a single-image title confirmation module, an image-text title confirmation module and a cloud database. The invention confirms the screening mode of the current document by extracting the category title information in the platform and the information published by the current publisher, and confirms the primary category title and the secondary category title of the document in each screening mode. Category titles are thereby screened and confirmed automatically, which improves the accuracy and reliability of screening, shortens manual screening time and improves the working efficiency of the platform. Automatic screening also classifies and confirms the primary and secondary category titles in real time, so that information classification is processed and updated promptly and the timeliness and freshness of the information are maintained.

Description

Multiple network information screening management system based on artificial intelligence
Technical Field
The invention relates to the technical field of screening management of various network information, in particular to an artificial intelligence-based screening management system of various network information.
Background
With the rapid development of the internet, the volume of network information has grown explosively, and screening and processing this information efficiently has become an important problem. Existing methods for screening and confirming the category titles of information published by publishers on a platform mostly depend on manual operation, which is inefficient and error-prone. To ensure that category titles are added accurately to published information, so that users can browse it more precisely, the information published by publishers on the platform needs to be screened and managed.
The existing screening management of information published by publishers on a platform has the following problems: 1. It currently relies on manual operation, which is inefficient and error-prone. Because the published information is not screened automatically, screening precision and reliability are reduced, manual screening time increases and working efficiency falls. In addition, without automatic screening, category titles cannot be classified and confirmed in real time and information classification cannot be processed and updated promptly, so the timeliness and freshness of the information cannot be maintained.
2. At present, the primary category title and the secondary category title of information published by a publisher are confirmed by considering only the number of times the same keywords occur in the document. The word-count ratio and paragraph-number ratio of keywords in a single-text document, and the occupied area and position of the regions where the atlas elements of pictures in a single-image document are located, are not analyzed comprehensively. This reduces the rationality and accuracy of category title confirmation for single-text documents and single-image documents and, to some extent, prevents users from browsing the published information accurately.
Disclosure of Invention
In view of this, in order to solve the problems set forth in the background art, an artificial intelligence-based multiple network information screening management system is proposed.
The aim of the invention can be achieved by the following technical scheme: the invention provides a multiple network information screening management system based on artificial intelligence, comprising: the platform information extraction module is used for extracting each primary category title and each secondary category title corresponding to each primary category title in the target platform and extracting information published by the current publisher, wherein the information comprises characters and pictures.
And the screening mode confirming module is used for confirming the current document screening mode according to the information published by the current publisher, wherein the current document screening mode comprises a single-text screening mode, a single-image screening mode and an image-text screening mode.
And the single-text title confirming module is used for marking the corresponding document as a target document when the current document screening mode is the single-text screening mode, extracting key information of the target document, confirming the primary category title of the target document, analyzing the coincidence degree between the target document and each corresponding secondary category title, and taking the secondary category title with the largest coincidence degree as the secondary category title of the target document.
And the single-image title confirming module is used for marking the corresponding document as a target picture document when the current document screening mode is the single-image screening mode, extracting each atlas element of each picture in the target picture document, confirming the primary category title of the target picture document, collecting element information of each picture, analyzing the coincidence degree between the target picture document and each corresponding secondary category title, and taking the secondary category title with the largest coincidence degree as the secondary category title of the target picture document.
And the cloud database is used for storing keyword sets corresponding to the primary category titles and the secondary category titles respectively.
And the image-text title confirming module is used for, when the current document screening mode is the image-text screening mode, confirming the primary category title and the secondary category title of the corresponding document in the same way as the primary category title and the secondary category title are confirmed in the single-text screening mode and the single-image screening mode.
Specifically, the confirmation mode of the current document screening mode is as follows: if the information published by the current publisher contains only text, the current document screening mode is the single-text screening mode; if it contains only pictures, the current document screening mode is the single-image screening mode; and if it contains both text and pictures, the current document screening mode is the image-text screening mode.
Specifically, the key information includes each keyword, the number of paragraphs, and the total word count of the document.
Specifically, the confirming process of the primary category title of the target document is as follows: extracting each keyword from the key information of the target document, comparing each keyword of the target document with a keyword set corresponding to each primary category title stored in the cloud database, counting the number of the keywords of the target document in the keyword set corresponding to each primary category title, and taking the primary category title with the largest number as the primary category title of the target document.
Specifically, the analyzing the coincidence degree between the target document and each corresponding secondary category title comprises the following steps: a1, obtaining each secondary category title corresponding to the target document according to each secondary category title corresponding to each primary category title in the target platform.
And A2, extracting keyword sets corresponding to the secondary category titles from the cloud database, so as to obtain the keyword sets corresponding to the secondary category titles corresponding to the target documents.
A3, comparing each keyword of the target document with the keyword set corresponding to each secondary category title corresponding to the target document, counting the number of identical keywords between the target document and each secondary category title, and recording it as $M_i$, where $i$ denotes the number of the secondary category title, $i = 1, 2, \ldots, n$.
A4, extracting the number of paragraphs and the total word count of the document from the key information of the target document and recording them as $D$ and $Z$, respectively.
A5, counting the word count of each identical keyword corresponding to each secondary category title, together with the number of times and the number of paragraphs in which it appears in the target document, and recording them as $z_{ij}$, $c_{ij}$ and $d_{ij}$, respectively, where $j$ denotes the number of the identical keyword, $j = 1, 2, \ldots, u$.
A6, calculating the word-count ratio of each identical keyword corresponding to each secondary category title in the target document, $\alpha_{ij} = \frac{z_{ij} \, c_{ij}}{Z}$.
A7, calculating the paragraph-number ratio of each identical keyword corresponding to each secondary category title in the target document, $\beta_{ij} = \frac{d_{ij}}{D}$.
A8, calculating the coincidence degree $\varphi_i$ between the target document and each corresponding secondary category title from $M_i$, $\alpha_{ij}$ and $\beta_{ij}$, where $M'$, $\alpha'$ and $\beta'$ respectively denote the set reference number of identical keywords, reference word-count ratio and reference paragraph-number ratio, $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$ respectively denote the set coincidence-degree evaluation weights corresponding to the number of identical keywords, the word-count ratio and the paragraph-number ratio, and $u$ denotes the number of identical keywords.
Specifically, the confirmation mode of the primary category title of the target picture document is as follows: and taking each atlas element of each picture in the target picture document as each keyword of each picture, integrating each keyword of each picture to obtain each keyword of the target picture document, comparing each keyword of the target picture document with keyword sets corresponding to each primary category title stored in a cloud database, counting the number of keywords of the target picture document in the keyword sets corresponding to each primary category title, and taking the primary category title with the largest number as the primary category title of the target picture document.
Specifically, the element information includes the picture area and the occupied area and the position of the area where each atlas element is located.
Specifically, the analyzing the coincidence degree between the target picture document and each corresponding secondary category title includes the following steps: B1, calculating the coincidence degree $\psi_{ir}$ between each secondary category title corresponding to the target picture document and each picture, where $r$ denotes the number of the picture, $r = 1, 2, \ldots, m$.
B2, extracting the maximum value and the minimum value from the coincidence degrees between each secondary category title corresponding to the target picture document and the pictures, and recording them as $\psi_i^{\max}$ and $\psi_i^{\min}$, respectively.
B3, calculating the coincidence degree $\xi_i$ between the target picture document and each corresponding secondary category title from $\psi_{ir}$, $\psi_i^{\max}$ and $\psi_i^{\min}$, where $\Delta\psi'$ and $\psi'$ respectively denote the set reference extreme-value difference of picture coincidence degree and reference picture coincidence degree, $\gamma_1$ and $\gamma_2$ respectively denote the set coincidence-degree evaluation weights corresponding to the extreme-value difference of picture coincidence degree and the picture coincidence degree, $e$ denotes the natural constant, and $m$ denotes the number of pictures.
Specifically, the calculating the coincidence degree between each secondary category title corresponding to the target picture document and each picture includes the following specific calculating process: and C1, obtaining each secondary category title corresponding to the target picture document according to each secondary category title corresponding to each primary category title in the target platform.
And C2, extracting keyword sets corresponding to the secondary category titles from the cloud database, so as to obtain the keyword sets corresponding to the secondary category titles corresponding to the target picture document.
C3, comparing each keyword of each picture in the target picture document with the keyword set of each secondary category title corresponding to the target picture document, counting the number of identical keywords between each secondary category title and each picture, and recording it as the target keyword number $N_{ir}$.
And C4, extracting the picture area and the occupied area and position of the region where each atlas element is located from the element information of each picture.
C5, obtaining the occupied area of the region where each target keyword corresponding to each secondary category title is located in each picture according to the occupied area of the region where each atlas element of each picture is located, and recording it as $s_{ir}^{f}$, where $f$ denotes the number of the target keyword, $f = 1, 2, \ldots, g$.
C6, recording the picture area of each picture as $S_r$.
C7, calculating the area ratio of the region where each target keyword corresponding to each secondary category title is located in each picture, $\eta_{ir}^{f} = \frac{s_{ir}^{f}}{S_r}$.
C8, according to the positions of the regions where the atlas elements of each picture are located, obtaining the position of the region where each target keyword corresponding to each secondary category title is located in each picture, and analyzing the coincidence coefficient $\lambda_{ir}^{f}$ of that position.
C9, calculating the coincidence degree $\psi_{ir}$ between each secondary category title corresponding to the target picture document and each picture from $N_{ir}$, $\eta_{ir}^{f}$ and $\lambda_{ir}^{f}$, where $N'$, $\eta'$ and $\lambda'$ respectively denote the set reference target keyword number, reference area ratio of the region and reference coincidence coefficient of the position of the region, $\mu_1$, $\mu_2$ and $\mu_3$ respectively denote the set coincidence-degree evaluation weights corresponding to the target keyword number, the area ratio of the region and the coincidence coefficient of the position of the region, and $g$ denotes the number of target keywords.
Compared with the prior art, the embodiment of the invention has at least the following advantages or beneficial effects: (1) According to the invention, the screening mode of the current document is confirmed by extracting the category title information in the platform and the information issued by the current publisher, and the primary category title and the secondary category title of the document in each screening mode are confirmed, so that automatic screening and confirmation of the category titles are realized, the accuracy and reliability of screening are improved, the time of manual screening is shortened, the working efficiency is improved, and meanwhile, the automatic screening realizes real-time classification and confirmation of the category titles, timely processing and updating of information classification and the timeliness and freshness of the information are maintained.
(2) According to the invention, the primary category title of the target document is confirmed according to the keywords, the number of paragraphs and the total word count of the document, and the coincidence degree between the target document and each corresponding secondary category title is analyzed so as to confirm the secondary category title of the target document. This improves the rationality of confirming the primary and secondary category titles of the target document and allows users to browse the information published by a publisher more accurately.
(3) According to the invention, the primary category titles of the target picture document are confirmed by combining the picture set elements of each picture in the target picture document, the picture area of each picture and the occupied area and the occupied position of the area of each picture set element, and the coincidence degree between the target picture document and each corresponding secondary category title is analyzed, so that the secondary category titles of the target picture document are confirmed, the confirming reliability of the primary category titles and the secondary category titles of the target picture document is improved, the readability and the comprehensiveness of the pictures are improved, and for some pictures with specific subjects or scenes, the corresponding categories are added, so that readers can be helped to better understand the picture content, the reading experience sense is improved, and the propagation effect of the pictures is also improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram showing the connection of the system modules according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, the present invention provides an artificial intelligence-based multiple network information screening management system, comprising: a platform information extraction module, a screening mode confirmation module, a single-text title confirmation module, a single-image title confirmation module, an image-text title confirmation module and a cloud database.
The platform information extraction module is connected with the screening mode confirmation module; the single-text title confirmation module, the single-image title confirmation module and the image-text title confirmation module are each connected with the screening mode confirmation module; the single-text title confirmation module and the single-image title confirmation module are both connected with the cloud database; and the single-text title confirmation module and the single-image title confirmation module are both connected with the image-text title confirmation module.
The platform information extraction module is used for extracting each primary category title and each secondary category title corresponding to each primary category title in the target platform and extracting information published by the current publisher, wherein the information comprises characters and pictures.
It should be noted that, each primary category title in the target platform, each secondary category title corresponding to each primary category title, and the information published by the current publisher are all extracted from the background management system of the target platform.
And the screening mode confirming module is used for confirming the current document screening mode according to the information published by the current publisher, wherein the current document screening mode comprises a single-text screening mode, a single-image screening mode and an image-text screening mode.
In a specific embodiment of the present invention, the confirmation method of the current document screening mode is as follows: if the information published by the current publisher contains only text, the current document screening mode is the single-text screening mode; if it contains only pictures, the current document screening mode is the single-image screening mode; and if it contains both text and pictures, the current document screening mode is the image-text screening mode.
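The mode selection described above can be sketched in a few lines of Python. This is only an illustration: the names `ScreeningMode` and `confirm_screening_mode` are hypothetical, and the handling of a post with neither text nor pictures is an assumption, since the source does not cover that case.

```python
from enum import Enum


class ScreeningMode(Enum):
    SINGLE_TEXT = "single-text"
    SINGLE_IMAGE = "single-image"
    IMAGE_TEXT = "image-text"


def confirm_screening_mode(has_text: bool, has_pictures: bool) -> ScreeningMode:
    """Pick the screening mode from what the published information contains."""
    if has_text and has_pictures:
        return ScreeningMode.IMAGE_TEXT
    if has_text:
        return ScreeningMode.SINGLE_TEXT
    if has_pictures:
        return ScreeningMode.SINGLE_IMAGE
    # Assumption: the source does not say what happens to empty posts.
    raise ValueError("published information contains neither text nor pictures")
```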
And the single-text title confirming module is used for marking the corresponding document as a target document when the current document screening mode is the single-text screening mode, extracting key information of the target document, confirming the primary category title of the target document, analyzing the coincidence degree between the target document and each corresponding secondary category title, and taking the secondary category title with the largest coincidence degree as the secondary category title of the target document.
In a specific embodiment of the invention, the key information comprises each keyword, the number of paragraphs and the total word count of the document.
The obtaining process of each keyword of the target document is as follows: preprocessing a target document, including word segmentation, stop word removal, punctuation mark removal and the like, performing word frequency statistics on the preprocessed target document, counting the occurrence times of each word in the target document, comparing the occurrence times with the set reference occurrence times, and taking a word as a keyword if the occurrence times of the word in the target document are greater than or equal to the set reference occurrence times, so that each keyword of the target document is counted.
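A minimal sketch of this keyword extraction, assuming whitespace-delimited text, is shown below. The stop-word list, the threshold value and the function name are all hypothetical; the source only speaks of a "set reference" number of occurrences, and Chinese text would need a real word segmenter in place of the regular expression used here.

```python
import re
from collections import Counter

# Hypothetical stop-word list and threshold, chosen only for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}
REFERENCE_OCCURRENCES = 2


def extract_keywords(text, reference=REFERENCE_OCCURRENCES, stop_words=STOP_WORDS):
    """Return {word: count} for words occurring at least `reference` times."""
    # Lowercase and keep runs of word characters: a crude stand-in for word
    # segmentation that also drops punctuation.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return {word: n for word, n in counts.items() if n >= reference}
```

For example, `extract_keywords("The cat sat. The cat ran, and the dog ran.")` keeps only `cat` and `ran`, each of which occurs twice.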
In a specific embodiment of the present invention, the process of confirming the primary category header of the target document is: extracting each keyword from the key information of the target document, comparing each keyword of the target document with a keyword set corresponding to each primary category title stored in the cloud database, counting the number of the keywords of the target document in the keyword set corresponding to each primary category title, and taking the primary category title with the largest number as the primary category title of the target document.
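The primary category title selection amounts to counting set overlaps and taking the largest. The sketch below assumes the cloud database can be represented as a plain dictionary from title to keyword set; the function name and the tie-breaking behaviour are assumptions, as the source does not say how ties are resolved.

```python
def confirm_primary_title(doc_keywords, primary_keyword_sets):
    """Return the primary category title whose keyword set contains the most document keywords."""
    doc_keywords = set(doc_keywords)
    hits = {
        title: len(doc_keywords & set(keywords))
        for title, keywords in primary_keyword_sets.items()
    }
    # Assumption: on a tie, max() keeps the first title in dictionary order.
    return max(hits, key=hits.get)
```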
In a specific embodiment of the present invention, the analyzing the coincidence degree between the target document and each corresponding secondary category title includes: a1, obtaining each secondary category title corresponding to the target document according to each secondary category title corresponding to each primary category title in the target platform.
And A2, extracting keyword sets corresponding to the secondary category titles from the cloud database, so as to obtain the keyword sets corresponding to the secondary category titles corresponding to the target documents.
A3, comparing each keyword of the target document with the keyword set corresponding to each secondary category title corresponding to the target document, counting the number of identical keywords between the target document and each secondary category title, and recording it as $M_i$, where $i$ denotes the number of the secondary category title, $i = 1, 2, \ldots, n$.
A4, extracting the number of paragraphs and the total word count of the document from the key information of the target document and recording them as $D$ and $Z$, respectively.
A5, counting the word count of each identical keyword corresponding to each secondary category title, together with the number of times and the number of paragraphs in which it appears in the target document, and recording them as $z_{ij}$, $c_{ij}$ and $d_{ij}$, respectively, where $j$ denotes the number of the identical keyword, $j = 1, 2, \ldots, u$.
A6, calculating the word-count ratio of each identical keyword corresponding to each secondary category title in the target document, $\alpha_{ij} = \frac{z_{ij} \, c_{ij}}{Z}$.
A7, calculating the paragraph-number ratio of each identical keyword corresponding to each secondary category title in the target document, $\beta_{ij} = \frac{d_{ij}}{D}$.
A8, calculating the coincidence degree $\varphi_i$ between the target document and each corresponding secondary category title from $M_i$, $\alpha_{ij}$ and $\beta_{ij}$, where $M'$, $\alpha'$ and $\beta'$ respectively denote the set reference number of identical keywords, reference word-count ratio and reference paragraph-number ratio, $\epsilon_1$, $\epsilon_2$ and $\epsilon_3$ respectively denote the set coincidence-degree evaluation weights corresponding to the number of identical keywords, the word-count ratio and the paragraph-number ratio, and $u$ denotes the number of identical keywords.
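The two per-keyword ratios of steps A5 to A7 can be sketched as follows. This is an interpretation, not the patent's stated formula: it assumes the word-count ratio is the keyword's length times its number of occurrences divided by the total document length, that lengths are measured in characters (spaces included), and that the document is supplied as a list of paragraph strings. The function name is hypothetical, and the final coincidence degree of step A8 is deliberately not computed here.

```python
def keyword_ratios(keyword, paragraphs):
    """Return (word-count ratio, paragraph-number ratio) for one keyword.

    `paragraphs` is the target document as a list of paragraph strings.
    """
    if not keyword or not paragraphs:
        raise ValueError("need a non-empty keyword and at least one paragraph")
    # Assumption: total word count is measured in characters, spaces included.
    total_chars = sum(len(p) for p in paragraphs)
    if total_chars == 0:
        raise ValueError("document has no characters")
    occurrences = sum(p.count(keyword) for p in paragraphs)
    paragraphs_hit = sum(1 for p in paragraphs if keyword in p)
    word_ratio = len(keyword) * occurrences / total_chars
    paragraph_ratio = paragraphs_hit / len(paragraphs)
    return word_ratio, paragraph_ratio
```

With the two paragraphs `"solar panel cost"` and `"panel size"` (26 characters in total), the keyword `"panel"` occurs twice and in both paragraphs, giving a word-count ratio of 10/26 and a paragraph-number ratio of 1.0.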
According to the embodiment of the invention, the primary category title of the target document is confirmed according to the keywords, the number of paragraphs and the total word count of the document, and the coincidence degree between the target document and each corresponding secondary category title is analyzed so as to confirm the secondary category title of the target document. This improves the rationality of confirming the primary and secondary category titles of the target document and allows users to browse the information published by a publisher more accurately.
The single-image title confirmation module is used for marking the corresponding document as a target picture document when the current document screening mode is the single-image screening mode, extracting each atlas element of each picture in the target picture document, confirming the primary category title of the target picture document, collecting element information of each picture, analyzing the coincidence degree between the target picture document and each corresponding secondary category title, and taking the secondary category title with the largest coincidence degree as the secondary category title of the target picture document.
The extraction mode of each atlas element of each picture in the target picture document is as follows: the method comprises the steps of collecting images of pictures in a target picture document, dividing the images of the pictures into a plurality of areas, and extracting and identifying features of the areas so as to extract atlas elements in the images of the pictures.
In a specific embodiment of the present invention, the primary category title of the target picture document is confirmed as follows: the atlas elements of each picture in the target picture document are taken as the keywords of that picture; the keywords of all pictures are integrated to obtain the keywords of the target picture document; the keywords of the target picture document are compared with the keyword set corresponding to each primary category title stored in the cloud database; the number of keywords of the target picture document that fall in the keyword set of each primary category title is counted; and the primary category title with the largest count is taken as the primary category title of the target picture document.
Note that integrating the keywords of each picture means comparing the keywords of the pictures with one another, removing duplicates, and taking the remaining mutually distinct keywords as the keywords of the target picture document.
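The integration and counting steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and sample category names are hypothetical, the cloud database is stood in for by a plain dictionary, and ties are broken by dictionary order because the source does not say how ties are handled.

```python
def merge_picture_keywords(per_picture_keywords):
    """Integrate the keywords of all pictures, dropping duplicates and keeping first-seen order."""
    merged, seen = [], set()
    for keywords in per_picture_keywords:
        for kw in keywords:
            if kw not in seen:
                seen.add(kw)
                merged.append(kw)
    return merged


def pick_primary_category(document_keywords, category_keyword_sets):
    """Count shared keywords per primary category title and return the best title with all counts.

    category_keyword_sets: dict mapping a primary category title to its keyword set
    (a stand-in for the keyword sets stored in the cloud database).
    Raises ValueError if no categories are supplied.
    """
    doc = set(document_keywords)
    counts = {title: len(doc & set(kws)) for title, kws in category_keyword_sets.items()}
    best = max(counts, key=counts.get)  # on a tie, the first title in dict order wins
    return best, counts


pictures = [["dog", "grass"], ["dog", "ball"], ["tree", "river"]]
categories = {
    "Pets": {"dog", "cat", "ball"},
    "Nature": {"tree", "grass", "river"},
    "Food": {"cake"},
}
keywords = merge_picture_keywords(pictures)
best_title, overlap = pick_primary_category(keywords, categories)
```

The same counting routine applies to a text document once its keywords have been extracted, since the source confirms the primary category title of both document types by the largest keyword overlap.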
In a specific embodiment of the present invention, the element information includes the picture area, and the occupied area and position of the region where each atlas element is located.
It should be noted that the picture area is obtained by locating the picture. The region where each atlas element is located is identified from the picture; its occupied area is obtained by calculating the area of that region, and its position is the position of that region in the picture.
In a specific embodiment of the present invention, analyzing the coincidence degree between the target picture document and each corresponding secondary category title includes: B1, calculating the coincidence degree between each secondary category title corresponding to the target picture document and each picture, where the pictures are indexed by picture number.
In a specific embodiment of the present invention, calculating the coincidence degree between each secondary category title corresponding to the target picture document and each picture specifically includes: C1, obtaining each secondary category title corresponding to the target picture document according to the secondary category titles corresponding to each primary category title in the target platform.
C2, extracting the keyword set corresponding to each secondary category title from the cloud database, thereby obtaining the keyword set of each secondary category title corresponding to the target picture document.
C3, comparing the keywords of each picture in the target picture document with the keyword set of each secondary category title corresponding to the target picture document, counting the number of identical keywords between each secondary category title and each picture, and recording it as the number of target keywords.
C4, extracting the picture area, and the occupied area and position of the region where each atlas element is located, from the element information of each picture.
C5, obtaining the occupied area of the region where each target keyword corresponding to each secondary category title is located in each picture, according to the occupied area of the region where each atlas element of each picture is located, where the target keywords are indexed by number.
C6, recording the picture area of each picture.
C7, calculating the area ratio of the region where each target keyword corresponding to each secondary category title is located in each picture.
C8, obtaining the position of the region where each target keyword corresponding to each secondary category title is located in each picture, according to the position of the region where each atlas element of each picture is located, and analyzing the position coincidence coefficient of the region where each target keyword corresponding to each secondary category title is located in each picture.
It should be noted that the position coincidence coefficient of the region where each target keyword corresponding to each secondary category title is located in each picture is analyzed as follows. The center point of each picture is located; with the center point as the circle center and the set length as the radius, a set circle is obtained for each picture and taken as its target circle. If the region of a target keyword corresponding to a secondary category title in a picture lies completely inside the corresponding target circle, the position coincidence coefficient of that region is recorded as the value set for the fully-inside case. If the region lies completely outside the corresponding target circle, the coefficient is recorded as the value set for the fully-outside case. If the region lies only partly inside the corresponding target circle, the coefficient is recorded as the value set for the partly-inside case. In this way the position coincidence coefficient of the region where each target keyword corresponding to each secondary category title is located in each picture is obtained, and it takes one of these three set values.
C9, calculating the coincidence degree between each secondary category title corresponding to the target picture document and each picture. In the formula, the reference terms are the set reference values for the number of target keywords, the area ratio of the region where a target keyword is located, and the position coincidence coefficient of that region; the weight terms are the set coincidence-degree evaluation weights corresponding to the number of target keywords, the area ratio of the region, and the position coincidence coefficient of the region, respectively; and the remaining term is the number of target keywords.
B2, extracting the maximum value and the minimum value from the coincidence degrees between each secondary category title corresponding to the target picture document and the pictures, and recording them respectively.
B3, calculating the coincidence degree between the target picture document and each corresponding secondary category title. In the formula, the reference terms are the set reference values for the extreme-value difference of the picture coincidence degree and for the picture coincidence degree; the weight terms are the set coincidence-degree evaluation weights corresponding to the extreme-value difference of the picture coincidence degree and to the picture coincidence degree, respectively; and the remaining terms are the natural constant and the number of pictures.
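Steps C7 and C8 can be sketched as follows. This is a minimal illustration under stated assumptions: each atlas element region is modeled as an axis-aligned bounding box (the source does not fix the region shape), the names `Region`, `area_ratio`, and `position_case` are hypothetical, and the function returns a case label rather than a numeric coefficient because the set coefficient values are not given in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    keyword: str   # atlas element label, used as the picture keyword
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def area_ratio(region, picture_width, picture_height):
    """Occupied area of the region divided by the picture area (step C7)."""
    return (region.width * region.height) / (picture_width * picture_height)


def position_case(region, picture_width, picture_height, radius):
    """Classify a region against the target circle centred on the picture (step C8).

    Returns "inside", "outside", or "partial". A box that only touches the
    circle boundary from outside is treated as "partial" in this sketch.
    """
    cx, cy = picture_width / 2, picture_height / 2
    right, bottom = region.x + region.width, region.y + region.height
    corners = [(region.x, region.y), (right, region.y),
               (region.x, bottom), (right, bottom)]
    # A box is inside the circle exactly when its farthest corner is.
    farthest = max(math.hypot(px - cx, py - cy) for px, py in corners)
    # The nearest point of the box to the centre is the centre clamped to the box.
    nx = min(max(cx, region.x), right)
    ny = min(max(cy, region.y), bottom)
    nearest = math.hypot(nx - cx, ny - cy)
    if farthest <= radius:
        return "inside"
    if nearest > radius:
        return "outside"
    return "partial"


centre_box = Region("dog", 40, 40, 20, 20)
corner_box = Region("tree", 0, 0, 10, 10)
wide_box = Region("river", 45, 45, 50, 10)
cases = [position_case(r, 100, 100, 30) for r in (centre_box, corner_box, wide_box)]
```

Each case label would then be mapped to the corresponding set coefficient value before it enters the per-picture coincidence degree of step C9.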
According to the embodiment of the invention, the primary category title of the target picture document is confirmed by combining the atlas elements of each picture in the target picture document, the picture area of each picture, and the occupied area and position of the region where each atlas element is located, and the coincidence degree between the target picture document and each corresponding secondary category title is analyzed to confirm the secondary category title of the target picture document. This improves the reliability of confirming the primary and secondary category titles of the target picture document, as well as the readability and comprehensiveness of the pictures. For pictures with a specific subject or scene, adding the corresponding category helps readers better understand the picture content, which improves the reading experience and the propagation effect of the pictures.
The cloud database is used for storing keyword sets corresponding to the primary category titles and the secondary category titles respectively.
The image-text title confirming module is used to confirm the primary category title and the secondary category title of the corresponding document when the current document screening mode is the image-text screening mode, in the same way as the primary and secondary category titles are confirmed for documents in the single-text screening mode and the single-image screening mode.
According to the embodiment of the invention, the screening mode of the current document is confirmed by extracting the category title information in the platform and the information issued by the current publisher, and the primary and secondary category titles of the document are confirmed in each screening mode. Category titles are thus screened and confirmed automatically, which improves the accuracy and reliability of screening, shortens the time spent on manual screening, and improves working efficiency. Automatic screening also allows category titles to be classified and confirmed in real time, so that information classification is processed and updated promptly and the information stays timely and fresh.
The foregoing is merely illustrative and explanatory of the principles of this invention, as various modifications and additions may be made to the specific embodiments described, or similar arrangements may be substituted by those skilled in the art, without departing from the principles of this invention or beyond the scope of this invention as defined in the claims.

Claims (7)

1. A multiple network information screening management system based on artificial intelligence, comprising:
the platform information extraction module is used for extracting each primary category title and each secondary category title corresponding to each primary category title in the target platform and extracting information published by the current publisher, wherein the information comprises characters and pictures;
the screening mode confirming module is used for confirming a current document screening mode according to the information published by the current publisher, wherein the current document screening mode comprises a single-text screening mode, a single-image screening mode and an image-text screening mode;
the single-text title confirming module is used for marking the corresponding document as a target document when the current document screening mode is the single-text screening mode, extracting key information of the target document, confirming the primary category title of the target document, analyzing the coincidence degree between the target document and each corresponding secondary category title, and taking the secondary category title with the largest coincidence degree as the secondary category title of the target document;
the single-image title confirmation module is used for marking the corresponding document as a target picture document when the current document screening mode is the single-image screening mode, extracting each atlas element of each picture in the target picture document, confirming the primary category title of the target picture document, collecting element information of each picture, analyzing the coincidence degree between the target picture document and each corresponding secondary category title, and taking the secondary category title with the largest coincidence degree as the secondary category title of the target picture document;
the cloud database is used for storing keyword sets corresponding to the primary category titles and the secondary category titles respectively;
the image-text title confirming module is used for confirming the primary category title and the secondary category title of the corresponding document when the current document screening mode is the image-text screening mode, in the same way as the primary category title and the secondary category title are confirmed for documents in the single-text screening mode and the single-image screening mode, so as to obtain the primary category title and the secondary category title of the corresponding document;
the specific process of analyzing the coincidence degree between the target picture document and each corresponding secondary category title is as follows:
b1, calculating the coincidence degree between each secondary category title corresponding to the target picture document and each picture, where the pictures are indexed by picture number;
b2, extracting the maximum value and the minimum value from the coincidence degrees between each secondary category title corresponding to the target picture document and the pictures, and recording them respectively;
b3, calculating the coincidence degree between the target picture document and each corresponding secondary category title, where the reference terms of the formula are the set reference values for the extreme-value difference of the picture coincidence degree and for the picture coincidence degree, the weight terms are the set coincidence-degree evaluation weights corresponding to the extreme-value difference of the picture coincidence degree and to the picture coincidence degree, respectively, and the remaining terms are the natural constant and the number of pictures;
the specific process of calculating the coincidence degree between each secondary category title corresponding to the target picture document and each picture is as follows:
c1, obtaining each secondary category title corresponding to the target picture document according to the secondary category titles corresponding to each primary category title in the target platform;
c2, extracting the keyword set corresponding to each secondary category title from the cloud database, thereby obtaining the keyword set of each secondary category title corresponding to the target picture document;
c3, comparing the keywords of each picture in the target picture document with the keyword set of each secondary category title corresponding to the target picture document, counting the number of identical keywords between each secondary category title and each picture, and recording it as the number of target keywords;
c4, extracting the picture area and the occupied position of the area where each atlas element is located from the element information of each picture;
c5, obtaining the occupied area of the region where each target keyword corresponding to each secondary category title is located in each picture, according to the occupied area of the region where each atlas element of each picture is located, where the target keywords are indexed by number;
c6, recording the picture area of each picture;
c7, calculating the area ratio of the region where each target keyword corresponding to each secondary category title is located in each picture;
c8, obtaining the position of the region where each target keyword corresponding to each secondary category title is located in each picture, according to the position of the region where each atlas element of each picture is located, and analyzing the position coincidence coefficient of the region where each target keyword corresponding to each secondary category title is located in each picture;
c9, calculating the coincidence degree between each secondary category title corresponding to the target picture document and each picture, where the reference terms of the formula are the set reference values for the number of target keywords, the area ratio of the region where a target keyword is located, and the position coincidence coefficient of that region, the weight terms are the set coincidence-degree evaluation weights corresponding to the number of target keywords, the area ratio of the region, and the position coincidence coefficient of the region, respectively, and the remaining term is the number of target keywords.
2. The multiple network information screening management system based on artificial intelligence according to claim 1, wherein: the current document screening mode is confirmed as follows: if the information issued by the current publisher contains only text, the current document screening mode is the single-text screening mode; if the information issued by the current publisher contains only pictures, the current document screening mode is the single-image screening mode; and if the information issued by the current publisher contains both text and pictures, the current document screening mode is the image-text screening mode.
3. The multiple network information screening management system based on artificial intelligence according to claim 1, wherein: the key information includes the keywords, the paragraph count and the total character count of the document.
4. A multiple network information screening management system based on artificial intelligence according to claim 3, wherein: the confirming process of the primary category title of the target document comprises the following steps: extracting each keyword from the key information of the target document, comparing each keyword of the target document with a keyword set corresponding to each primary category title stored in the cloud database, counting the number of the keywords of the target document in the keyword set corresponding to each primary category title, and taking the primary category title with the largest number as the primary category title of the target document.
5. The multiple network information screening management system based on artificial intelligence according to claim 4, wherein: the specific process of analyzing the coincidence degree between the target document and each corresponding secondary category title is as follows:
a1, obtaining each secondary category title corresponding to the target document according to the secondary category titles corresponding to each primary category title in the target platform;
a2, extracting the keyword set corresponding to each secondary category title from the cloud database, thereby obtaining the keyword set of each secondary category title corresponding to the target document;
a3, comparing each keyword of the target document with the keyword set of each secondary category title corresponding to the target document, counting the number of identical keywords between the target document and each secondary category title, and recording it, where the secondary category titles are indexed by number;
a4, extracting the paragraph count and the total character count of the document from the key information of the target document and recording them respectively;
a5, counting the character count of each identical keyword corresponding to each secondary category title, together with the number of times and the number of paragraphs in which it appears in the target document, and recording them respectively, where the identical keywords are indexed by number;
a6, calculating the word-count ratio of each identical keyword corresponding to each secondary category title in the target document;
a7, calculating the paragraph-count ratio of each identical keyword corresponding to each secondary category title in the target document;
a8, calculating the coincidence degree between the target document and each corresponding secondary category title, where the reference terms of the formula are the set reference values for the number of identical keywords, the occurrence word-count ratio and the occurrence paragraph-count ratio, the weight terms are the set coincidence-degree evaluation weights corresponding to the number of identical keywords, the occurrence word-count ratio and the occurrence paragraph-count ratio, respectively, and the remaining term is the number of identical keywords.
6. The multiple network information screening management system based on artificial intelligence according to claim 5, wherein: the first-level category titles of the target picture documents are confirmed in the following manner: and taking each atlas element of each picture in the target picture document as each keyword of each picture, integrating each keyword of each picture to obtain each keyword of the target picture document, comparing each keyword of the target picture document with keyword sets corresponding to each primary category title stored in a cloud database, counting the number of keywords of the target picture document in the keyword sets corresponding to each primary category title, and taking the primary category title with the largest number as the primary category title of the target picture document.
7. The multiple network information screening management system based on artificial intelligence according to claim 6, wherein: the element information comprises the picture area and the occupied position of the area where each atlas element is located.
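The mode selection rule of claim 2 can be sketched as follows. This is a minimal illustration: the enum and function names are hypothetical, and raising an error for a publication with neither text nor pictures is an assumption, since the claims do not address that case.

```python
from enum import Enum


class ScreeningMode(Enum):
    SINGLE_TEXT = "single-text"
    SINGLE_IMAGE = "single-image"
    IMAGE_TEXT = "image-text"


def confirm_screening_mode(has_text, has_pictures):
    """Choose the document screening mode from what the publisher's information contains."""
    if has_text and has_pictures:
        return ScreeningMode.IMAGE_TEXT
    if has_text:
        return ScreeningMode.SINGLE_TEXT
    if has_pictures:
        return ScreeningMode.SINGLE_IMAGE
    # Not covered by the claims; treated as invalid input in this sketch.
    raise ValueError("publication contains neither text nor pictures")


mode = confirm_screening_mode(has_text=True, has_pictures=False)
```

The returned mode then selects which title confirmation module processes the document.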
CN202311749145.7A 2023-12-19 2023-12-19 Multiple network information screening management system based on artificial intelligence Active CN117421487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311749145.7A CN117421487B (en) 2023-12-19 2023-12-19 Multiple network information screening management system based on artificial intelligence


Publications (2)

Publication Number Publication Date
CN117421487A CN117421487A (en) 2024-01-19
CN117421487B true CN117421487B (en) 2024-03-08

Family

ID=89525214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311749145.7A Active CN117421487B (en) 2023-12-19 2023-12-19 Multiple network information screening management system based on artificial intelligence

Country Status (1)

Country Link
CN (1) CN117421487B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170016657A (en) * 2015-08-04 2017-02-14 서울시립대학교 산학협력단 An apparatus for managing document using table of contents, a method thereof, and a computer recordable medium storing the method
CN112464907A (en) * 2020-12-17 2021-03-09 广东电网有限责任公司 Document processing system and method
CN114064851A (en) * 2021-10-19 2022-02-18 中国人民解放军31511部队 Multi-machine retrieval method and system for government office documents
CN114218389A (en) * 2021-12-21 2022-03-22 一拓通信集团股份有限公司 Long text classification method in chemical preparation field based on graph neural network
CN114781997A (en) * 2022-04-06 2022-07-22 中国矿业大学 Intelligent examination system and implementation method for special construction scheme of critical engineering
CN115203614A (en) * 2022-07-28 2022-10-18 武汉小帆船电子商务有限公司 Page automatic generation, analysis and processing method based on webpage development
CN116186133A (en) * 2022-08-29 2023-05-30 苏州空天信息研究院 Electronic document management method integrating forward index and backward index
CN116932859A (en) * 2023-08-10 2023-10-24 苏州阿基米德网络科技有限公司 Medical equipment document searching and browsing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
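Research on a Question Bank System Based on Document Directory Storage; Zeng Xiaoli; Wang Longye; Journal of Tibet University (Natural Science Edition); 2010-06-15 (01); 55-59 *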

Also Published As

Publication number Publication date
CN117421487A (en) 2024-01-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant