CN117591578A - Data mining system and mining method based on big data - Google Patents


Info

Publication number
CN117591578A
Authority
CN
China
Prior art keywords
data
image
module
video
key
Prior art date
Legal status
Granted
Application number
CN202410073443.7A
Other languages
Chinese (zh)
Other versions
CN117591578B (en
Inventor
高兴毅
Current Assignee
Shandong University of Science and Technology
Original Assignee
Shandong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Shandong University of Science and Technology
Priority to CN202410073443.7A
Publication of CN117591578A
Application granted
Publication of CN117591578B
Current legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Abstract

The invention discloses a data mining system and a data mining method based on big data, belonging to the technical field of big data. To address the limited range of captured data types and the low data utilization rate, the data acquisition unit collects and captures basic data, image data and video data files, and the text reading module, the image processing module and the video processing module process data files in multiple formats, so that multi-format data can be classified and mined more comprehensively and the data mining effect is improved. The data mining unit extracts key bytes from the captured data set and verifies them to identify and reject abnormal data, so that the data in the captured data set are effectively checked and screened. This reduces the cost and difficulty of data mining, allows mining, storage and utilization to be carried out comprehensively and systematically within a unified, targeted system, and improves the data utilization rate.

Description

Data mining system and mining method based on big data
Technical Field
The invention relates to the technical field of big data, in particular to a data mining system and a mining method based on big data.
Background
Data mining is the process of analyzing a large volume of collected data with appropriate statistical analysis methods, extracting useful information and drawing conclusions so that the data can be studied and summarized in detail. It also serves as a supporting process for quality management systems.
A related patent, publication number CN106339451A, discloses a big data based data mining system comprising an information system, a data mining application server and an industry client. The information system collects and processes industry data meeting user-preset conditions and connects it to the system through a bus; the data mining application server extracts, transforms and loads the industry data preset by the user and imports the data mining results into the industry client; and the industry client provides the final analyzed and processed data to the user for extraction. Users can preset different industry data, such as bank data, gene sequences and financial control data, according to their own requirements, and the data mining application server performs targeted analysis according to those preset conditions, giving a system that is simple in structure, clear in purpose and efficient.
The above patent has the following problems in actual operation:
1. When data are captured before mining, the data format is single, so the captured data are limited in type and breadth, which affects the data mining effect;
2. Existing data analysis systems can only perform simple statistical processing on existing data and cannot carry out in-depth data mining analysis of an enterprise's operating state based on that data, so the data cannot be fully utilized.
Disclosure of Invention
The invention aims to provide a data mining system and a mining method based on big data that solve the problems described in the Background.
In order to achieve the above purpose, the present invention provides the following technical solutions: a big data based data mining system, comprising:
the data acquisition unit is used for:
collecting basic data, image data and video data files and generating a basic data file; reading the text information of the basic data to obtain a keyword set; extracting the text information and picture features of the image data files; and capturing the keyword set in the basic data and the text information and picture features in the image data files to generate a captured data set;
a data storage unit for:
interacting with the cloud platform to perform distributed storage and encryption of the data sets, namely the keyword set in the basic data, the text information and picture features in the image data files and the key regions in the video data files, and sharing data over the network via the cloud platform;
a data mining unit for:
extracting key bytes from the captured data set, verifying them and eliminating abnormal data to generate de-abnormal data, and cleaning the basic data file based on the de-abnormal data to generate a determined data set;
a data feedback unit for:
retrieving data on the cloud platform, displaying the retrieval results and issuing reminders;
cloud platform for:
storing and editing classified information in the cloud, transmitting the classified information to the data storage unit, and returning feedback information;
a user terminal for:
storing, running and implementing the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit. The user terminal comprises at least one login end and at least one control terminal; when the control terminal works, it runs the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit, thereby realizing the big data based data mining system. The login end is connected to the control terminal, the cloud platform and the server through the Internet.
Further, the data acquisition unit includes:
the file acquisition module is used for:
collecting basic data, image data and video data files, and generating a basic data file based on the basic data, the image data and the video data files;
a text reading module for:
reading the text information of the basic data, segmenting it into a plurality of extracted words, and filtering the extracted words according to their part-of-speech statistical characteristics to obtain a keyword set;
an image processing module for:
extracting the text information and picture features of the image data files and creating an association stamp for each image data file, the association stamp of each image data file being globally unique, and associating the extracted text information and picture features of each image data file with its association stamp;
the video processing module is used for:
cutting key segments out of a video data file, splitting each key segment into frames to obtain a plurality of video images, and determining a key region in each video image, so that each video image contains one key region;
the information grabbing module is used for:
capturing the keyword set in the basic data, the text information and picture features in the image data files and the key regions in the video data files, and generating a captured data set from the captured data.
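The keyword step of the text reading module can be illustrated with a minimal Python sketch. It assumes the text has already been segmented into words and tagged with part-of-speech labels by an external tagger, and that the hypothetical tag set KEEP_POS (nouns and verbs) marks content words. The patent does not say which parts of speech or which statistics are used, so ranking by frequency is an assumption here.

```python
from collections import Counter

# Hypothetical part-of-speech tags treated as content words (nouns, verbs).
KEEP_POS = frozenset({"n", "nz", "v"})


def extract_keywords(tagged_words, keep_pos=KEEP_POS, top_k=5):
    """Return up to top_k keywords from a list of (word, pos_tag) pairs.

    Words whose tag is not in keep_pos are discarded; the remaining words
    are ranked by frequency, with ties kept in order of first appearance.
    """
    counts = Counter(word for word, pos in tagged_words if pos in keep_pos)
    return [word for word, _ in counts.most_common(top_k)]


if __name__ == "__main__":
    tagged = [("sales", "n"), ("rose", "v"), ("the", "det"),
              ("sales", "n"), ("quickly", "adv"), ("region", "n")]
    print(extract_keywords(tagged, top_k=2))  # ['sales', 'rose']
```

Filtering by tag before counting keeps function words such as articles and adverbs out of the keyword set regardless of how often they occur.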
Further, the data storage unit includes:
a data storage module for:
interacting with the cloud platform to perform distributed storage of the data sets, namely the keyword set in the basic data, the text information and picture features in the image data files and the key regions in the video data files, and locating stored data through the cloud platform;
a data encryption module for:
interacting with the data storage module and encrypting the data sets held in distributed storage;
a data sharing module, configured to:
interacting with the cloud platform, processing data in the cloud platform and sharing the data over the network.
Further, the data mining unit includes:
the abnormality rejection module is used for:
extracting key bytes from the captured data set for verification, performing association analysis on the key bytes to determine the distinguishing key bytes that are abnormal, extracting the abnormal data corresponding to those key bytes from each item of data and eliminating them, and generating de-abnormal data from the captured data set after the abnormal data have been eliminated;
the data cleaning module is used for:
cleaning the basic data file based on the de-abnormal data; during cleaning, the basic data file is cleaned and screened based on the association stamps corresponding to the de-abnormal data, generating a determined data set.
Further, during verification the anomaly rejection module performs association analysis on the captured data set: it builds a data association analysis model, inputs the captured data set into the model for analysis, and outputs an analysis report based on the analysis result.
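The rejection and cleaning steps can be sketched in Python as below. The patent does not disclose its data association analysis model, so this sketch swaps in a different, well-known technique: the modified z-score based on the median absolute deviation (MAD), with the commonly used cut-off of 3.5. The record layout (dicts with a "stamp" field and a numeric field) and the function names are hypothetical.

```python
from statistics import median


def reject_anomalies(records, key, threshold=3.5):
    """Return the records whose value at `key` is not an outlier.

    Substitute technique (not the patent's model): modified z-score
    0.6745 * |x - median| / MAD, where MAD is the median absolute deviation.
    """
    values = [r[key] for r in records]
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(records)  # MAD of zero: this test cannot separate outliers
    return [r for r in records
            if 0.6745 * abs(r[key] - med) / mad <= threshold]


def clean_basic_file(basic_rows, kept_records):
    """Keep basic-file rows whose stamp survived rejection; drop repeats."""
    kept = {r["stamp"] for r in kept_records}
    seen, cleaned = set(), []
    for row in basic_rows:
        stamp = row["stamp"]
        if stamp in kept and stamp not in seen:
            seen.add(stamp)
            cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    captured = [{"stamp": s, "v": v} for s, v in
                [("a", 10), ("b", 11), ("c", 12), ("d", 10.5), ("e", 100)]]
    kept = reject_anomalies(captured, "v")
    print([r["stamp"] for r in kept])  # ['a', 'b', 'c', 'd']
    rows = [{"stamp": "a"}, {"stamp": "a"}, {"stamp": "e"}, {"stamp": "b"}]
    print([r["stamp"] for r in clean_basic_file(rows, kept)])  # ['a', 'b']
```

Using the median and MAD rather than the mean and standard deviation keeps a single extreme value from masking itself by inflating the spread estimate.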
Further, the data feedback unit includes:
the quick search module is used for:
interacting with the cloud platform and providing query service based on an index system;
the data feedback module is used for:
displaying the retrieval results of the quick search module on a display device and issuing reminders.
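A query service "based on an index system" could take many forms. The minimal Python sketch below assumes a keyword-to-record inverted index; the class name KeywordIndex and the rule that a search returns only records matching every keyword are illustrative assumptions, not details taken from the patent.

```python
from collections import defaultdict


class KeywordIndex:
    """Minimal inverted index mapping each keyword to a set of record ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, record_id, keywords):
        """Index a record under each of its keywords (case-insensitive)."""
        for kw in keywords:
            self._postings[kw.lower()].add(record_id)

    def search(self, *keywords):
        """Return the sorted ids of records that contain every keyword."""
        if not keywords:
            return []
        # .get() avoids inserting empty entries for unknown keywords.
        sets = [self._postings.get(kw.lower(), set()) for kw in keywords]
        return sorted(set.intersection(*sets))


if __name__ == "__main__":
    index = KeywordIndex()
    index.add("r1", ["sales", "region"])
    index.add("r2", ["Sales"])
    print(index.search("sales"))            # ['r1', 'r2']
    print(index.search("sales", "region"))  # ['r1']
```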
Further, the cloud platform includes:
cloud database for:
classifying and storing a keyword set in the received basic data, text information and picture characteristics in an image data file and a key area in a video data file according to a data stream label;
a data processing module for:
grouping the classified stored data according to stream attribute information and data content, and classifying and labeling the groups, which comprise a basic data set, an image data set and a video data set;
the data matching module is used for:
matching each of the data sets grouped by the data processing module against the attribute information of the data storage unit to generate the data requirements of the corresponding data streams, and connecting the data requirements of each data stream with the corresponding data set.
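The grouping and matching steps of the cloud platform can be sketched as follows. The sketch assumes each stored item carries a "stream" label of "basic", "image" or "video", and that the data requirements of the storage unit are given as a hypothetical mapping from requirement name to group; neither representation is specified in the patent.

```python
GROUPS = ("basic", "image", "video")


def group_by_stream(items):
    """Split items into basic/image/video data sets by their 'stream' label.

    An item with any other label raises KeyError.
    """
    grouped = {g: [] for g in GROUPS}
    for item in items:
        grouped[item["stream"]].append(item)
    return grouped


def match_requirements(grouped, requirements):
    """Connect each named data requirement to the data set it asks for.

    `requirements` is a hypothetical mapping: requirement name -> group.
    """
    return {name: grouped[group] for name, group in requirements.items()}


if __name__ == "__main__":
    stored = [{"stream": "image", "id": 1}, {"stream": "basic", "id": 2},
              {"stream": "image", "id": 3}]
    groups = group_by_stream(stored)
    print([i["id"] for i in groups["image"]])  # [1, 3]
    print(match_requirements(groups, {"report": "basic"}))
    # {'report': [{'stream': 'basic', 'id': 2}]}
```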
Further, the data encryption module includes:
an encryption management sub-module for:
dividing the data set into a plurality of parts according to how it is distributed in storage, registering an encryption method for each part, recording each use of the encryption method, managing the keys used by the encryption method, and forming a key index by combining the key management with the usage records;
an encryption processing sub-module for:
obtaining, through interaction with the data storage module, the distributed storage information of the data set, encrypting it with the encryption method of the corresponding part according to that information, and feeding the keys used during encryption back to the encryption management sub-module.
Further, when determining the key region in each of the video images obtained by framing, the video processing module performs the following steps:
performing image recognition on the video image to identify its imaging conditions and obtain an image recognition result;
dividing the video image into a plurality of areas according to the image recognition result;
the video image is analyzed in combination with the adjacent frame video image by the following formula:
[Formula not reproduced in the source text.]
In the above formula, the terms denote: the analysis data value of the i-th block region; a sign function; the image information of the j-th feature point of the i-th block region in each of the frames being compared (a video image and its adjacent frames); the number of feature points in the i-th block region of each of those frames; the total number of frames of the video image; and the analysis result of the i-th block region, whose value indicates whether or not the i-th block region is used to constitute the key region;
according to the analysis results, combining the block regions selected for the key region to form the key region of the video image.
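Because the analysis formula itself is not reproduced here, the Python sketch below substitutes a simple, hypothetical rule only to illustrate the selection flow. For each block region, the mean feature-point value (image information summed over the feature points and divided by their number) is compared with the same block in the previous and next frames; the block is flagged 1 (part of the key region) when the average change exceeds a threshold, otherwise 0. The data layout and the threshold are assumptions.

```python
def block_mean(points):
    """Mean image information over a block's feature points (0.0 if none)."""
    return sum(points) / len(points) if points else 0.0


def key_region(frames, t, threshold):
    """Return the indices of the blocks of frame t selected for its key region.

    frames[t][i] lists the feature-point values of block i in frame t.
    Hypothetical rule (not the patent's formula): flag = 1 when the block's
    mean changes on average by more than `threshold` against frames t-1 and
    t+1, otherwise flag = 0; flagged blocks are combined into the key region.
    """
    if not 1 <= t <= len(frames) - 2:
        raise ValueError("frame t needs a neighbour on both sides")
    prev, cur, nxt = frames[t - 1], frames[t], frames[t + 1]
    selected = []
    for i in range(len(cur)):
        m_prev, m_cur, m_next = (block_mean(f[i]) for f in (prev, cur, nxt))
        score = (abs(m_cur - m_prev) + abs(m_next - m_cur)) / 2
        flag = 1 if score > threshold else 0
        if flag == 1:
            selected.append(i)
    return selected


if __name__ == "__main__":
    frames = [
        [[10, 10], [50]],      # frame 0: block 0, block 1
        [[10, 12], [80, 90]],  # frame 1
        [[11, 11], [40]],      # frame 2
    ]
    print(key_region(frames, 1, threshold=5.0))  # [1]
```

Normalizing by the number of feature points lets blocks with different feature-point counts be compared on the same scale, which is one plausible reason the original definitions include those counts.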
The invention also provides a mining method for the big data based data mining system, comprising the following steps:
step one: the data acquisition unit collects basic data, image data and video data files, generates a basic data file, performs feature capture based on the basic data file and generates a captured data set;
step two: the data storage unit performs distributed storage and encryption of the captured data set;
step three: the data mining unit cleans and rejects the abnormal data and generates a determined data set;
step four: the data storage unit stores the determined data set based on the cloud platform;
step five: the data storage unit shares data through a network, and the data feedback unit retrieves and displays the data based on the cloud platform.
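The order of the five steps can be expressed as a small orchestration function. Each stage is passed in as a callable; the names acquire, store, mine and retrieve are hypothetical stand-ins for the units, not interfaces defined by the patent, and the demo stages are toys.

```python
def run_mining_method(acquire, store, mine, retrieve):
    """Run steps one to five in order, using caller-supplied stage callables."""
    captured = acquire()              # step one: collect files, build captured set
    store("captured", captured)       # step two: stage stands in for distributed, encrypted storage
    determined = mine(captured)       # step three: clean, reject abnormal data
    store("determined", determined)   # step four: store the determined set
    return retrieve(determined)       # step five: share, retrieve and display


if __name__ == "__main__":
    cloud = {}  # stand-in for cloud storage

    result = run_mining_method(
        acquire=lambda: [3, 5, 999],
        store=lambda name, data: cloud.__setitem__(name, list(data)),
        mine=lambda data: [x for x in data if x < 100],  # toy anomaly filter
        retrieve=sum,
    )
    print(result, cloud)  # 8 {'captured': [3, 5, 999], 'determined': [3, 5]}
```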
Compared with the prior art, the invention has the beneficial effects that:
1. In the prior art, data captured before mining have a single format, which limits the variety and breadth of the captured data and impairs the data mining effect.
2. In the prior art, existing data analysis systems can only perform simple statistical processing on existing data and cannot carry out in-depth data mining analysis of an enterprise's operating state, so the data are difficult to fully utilize.
Drawings
FIG. 1 is a schematic diagram of a system module according to the present invention.
Description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawing. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to fig. 1, a data mining system based on big data includes:
the data acquisition unit is used for:
collecting basic data, image data and video data files and generating a basic data file; reading the text information of the basic data to obtain a keyword set; extracting the text information and picture features of the image data files; and capturing the keyword set in the basic data and the text information and picture features in the image data files to generate a captured data set;
a data storage unit for:
interacting with the cloud platform to perform distributed storage and encryption of the data sets, namely the keyword set in the basic data, the text information and picture features in the image data files and the key regions in the video data files, and sharing data over the network via the cloud platform;
a data mining unit for:
extracting key bytes from the captured data set, verifying them and eliminating abnormal data to generate de-abnormal data, and cleaning the basic data file based on the de-abnormal data to generate a determined data set;
a data feedback unit for:
retrieving data on the cloud platform, displaying the retrieval results and issuing reminders;
cloud platform for:
storing and editing classified information in the cloud, transmitting the classified information to the data storage unit, and returning feedback information;
a user terminal for:
storing, running and implementing the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit. The user terminal comprises at least one login end and at least one control terminal; when the control terminal works, it runs the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit, thereby realizing the big data based data mining system. The login end is connected to the control terminal, the cloud platform and the server through the Internet.
Specifically, when the system works, the data acquisition unit first collects basic data, image data and video data files, generates a basic data file, performs feature capture based on the basic data file and generates a captured data set. Next, the data storage unit performs distributed storage and encryption of the captured data set. The data mining unit then cleans the data, rejects abnormal data and generates a determined data set, which the data storage unit stores on the cloud platform. When the data need to be used, the data storage unit shares them over the network, and the data feedback unit retrieves and displays them based on the cloud platform.
To solve the technical problem that data captured before mining have a single format, which limits the variety and breadth of the captured data and affects the data mining effect, the invention provides the following technical solution:
the data acquisition unit includes:
the file acquisition module is used for:
collecting basic data, image data and video data files, and generating a basic data file based on the basic data, the image data and the video data files;
a text reading module for:
reading the text information of the basic data, segmenting it into a plurality of extracted words, and filtering the extracted words according to their part-of-speech statistical characteristics to obtain a keyword set;
an image processing module for:
extracting the text information and picture features of the image data files and creating an association stamp for each image data file, the association stamp of each image data file being globally unique, and associating the extracted text information and picture features of each image data file with its association stamp;
the video processing module is used for:
cutting key segments out of a video data file, splitting each key segment into frames to obtain a plurality of video images, and determining a key region in each video image, so that each video image contains one key region;
the information grabbing module is used for:
capturing the keyword set in the basic data, the text information and picture features in the image data files and the key regions in the video data files, and generating a captured data set from the captured data.
Specifically, the text reading module, the image processing module and the video processing module can process data files in various formats, so that multi-format data can be classified and mined more comprehensively. This increases the richness of the processable data and improves the data mining effect, and widening the range of data types also improves the accuracy and breadth of data mining.
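One way to realize a globally unique association stamp is a random UUID, as in the Python sketch below. The ImageRecord structure and the associate function are hypothetical, the text and feature extraction themselves are not shown, and a version 4 UUID is unique with overwhelming probability rather than by construction.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Content extracted from one image data file, tied together by a stamp."""
    file_name: str
    text: str        # e.g. recognized text (extraction not shown)
    features: list   # e.g. a feature vector (extraction not shown)
    stamp: str = field(default_factory=lambda: uuid.uuid4().hex)


def associate(file_name, text, features, registry):
    """Create a record with a fresh random stamp and register it by stamp."""
    record = ImageRecord(file_name, text, features)
    registry[record.stamp] = record
    return record


if __name__ == "__main__":
    registry = {}
    a = associate("a.png", "invoice 2023", [0.1, 0.7], registry)
    b = associate("b.png", "receipt", [0.4, 0.2], registry)
    print(a.stamp != b.stamp, registry[a.stamp].text)  # True invoice 2023
```

Keying the registry by stamp is what later lets the data cleaning module look up, from a surviving stamp, the text and features that belong to the same image file.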
The data storage unit includes:
a data storage module for:
interacting with the cloud platform to perform distributed storage of the data sets, namely the keyword set in the basic data, the text information and picture features in the image data files and the key regions in the video data files, and locating stored data through the cloud platform;
a data encryption module for:
interacting with the data storage module and encrypting the data sets held in distributed storage;
a data sharing module, configured to:
interacting with the cloud platform, processing data in the cloud platform and sharing the data over the network.
Specifically, the data storage unit stores the data set in a distributed manner through the cloud platform, which reduces the storage space required on the server; encryption improves data security; and interaction between the data sharing module and the cloud platform makes it convenient to retrieve and apply the various data.
To solve the technical problem that existing data analysis systems can only perform simple statistical processing on existing data, cannot carry out in-depth data mining analysis of an enterprise's operating state, and therefore find it difficult to fully utilize the data, the invention provides the following technical solution:
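As an illustration of distributed storage with a location index, the sketch below assigns each record to a storage node by hashing its identifier with SHA-256 and records where it was placed. Hash-modulo placement, the node names and the function names are assumptions; the patent does not specify how the cloud platform distributes data, and a production system would more likely use consistent hashing so that adding a node does not remap most records.

```python
import hashlib


def assign_node(record_id, nodes):
    """Pick a node for a record by hashing its id (stable across runs)."""
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def distribute(records, nodes):
    """Place records on nodes; return the placement and a location index."""
    placement = {node: [] for node in nodes}
    location = {}  # record id -> node, used to locate stored data later
    for record_id, payload in records.items():
        node = assign_node(record_id, nodes)
        placement[node].append((record_id, payload))
        location[record_id] = node
    return placement, location


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    records = {f"rec-{i}": {"value": i} for i in range(10)}
    placement, location = distribute(records, nodes)
    print({node: len(items) for node, items in placement.items()})
    print(location["rec-3"])
```

SHA-256 is used instead of Python's built-in hash() because the latter is randomized per process for strings, which would scatter the same record to different nodes across runs.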
the data mining unit includes:
the abnormality rejection module is used for:
extracting key bytes from the captured data set for verification and performing association analysis on them. During verification, the anomaly rejection module performs association analysis on the captured data set: it builds a data association analysis model, inputs the captured data set into the model for analysis and outputs an analysis report based on the result. It then determines the distinguishing key bytes that are abnormal, extracts the abnormal data corresponding to those key bytes from each item of data and eliminates them, and generates de-abnormal data from the captured data set after the abnormal data have been eliminated;
the data cleaning module is used for:
cleaning the basic data file based on the de-abnormal data; during cleaning, the basic data file is cleaned and screened based on the association stamps corresponding to the de-abnormal data, generating a determined data set.
Specifically, verifying the key bytes allows the data in the captured data set to be effectively checked and screened, with repeated and abnormal data cleaned out before mining. This effectively reduces the cost and difficulty of data mining, allows mining, storage and utilization of the data to be carried out comprehensively and systematically, forms a unified system for targeted data utilization, and improves the data utilization rate.
The data feedback unit includes:
the quick search module is used for:
interacting with the cloud platform and providing query service based on an index system;
the data feedback module is used for:
displaying the retrieval results of the quick search module on a display device and issuing reminders.
The cloud platform includes:
cloud database for:
classifying and storing a keyword set in the received basic data, text information and picture characteristics in an image data file and a key area in a video data file according to a data stream label;
a data processing module for:
grouping the classified stored data according to stream attribute information and data content, and classifying and labeling the groups, which comprise a basic data set, an image data set and a video data set;
the data matching module is used for:
matching each of the data sets grouped by the data processing module against the attribute information of the data storage unit to generate the data requirements of the corresponding data streams, and connecting the data requirements of each data stream with the corresponding data set.
Specifically, the cloud database stores the data information set and the determined data set, both of which can be searched through the quick search module. When they need to be used, they can be retrieved and called by creating a call task, and the results are presented through the data feedback module.
The data encryption module includes:
an encryption management sub-module for:
dividing the data set into a plurality of parts according to how it is distributed in storage, registering an encryption method for each part, recording each use of the encryption method, managing the keys used by the encryption method, and forming a key index by combining the key management with the usage records;
an encryption processing sub-module for:
and the distributed storage information of the distributed storage of the data set is obtained through interaction with the data storage module, encryption processing is carried out on an encryption method in a corresponding part according to the distributed storage information, and meanwhile, a secret key in the encryption processing process is fed back to the encryption management sub-module.
The data encryption module in the technical scheme comprises an encryption management sub-module and an encryption processing sub-module, wherein the encryption management sub-module is divided into a plurality of parts according to the distributed storage condition of the data set in a distributed mode, each part is used for carrying out key management by registering an encryption method and a true-lost encryption method, and each part is used for registering the encryption method and corresponds to the distributed storage condition of the data set in a distributed mode; the encryption processing sub-module interacts with the data storage module to encrypt the distributed storage of the data set. When the data encryption module encrypts the distributed storage of the data set, the encryption processing sub-module interacts with the data storage unit to obtain distributed storage information of the distributed storage of the data set in the data storage unit, then partial matching is carried out according to the distributed storage information, the encryption method in the corresponding part is called according to the matching condition to carry out encryption processing on the distributed storage information, the secret key in the encryption process is fed back to the encryption management sub-module, meanwhile, the encryption management sub-module carries out use records on the calling condition of the encryption method in each part, when the secret key feedback is obtained, the fed-back secret key is managed, and meanwhile, a secret key index is formed by combining the management of the secret key in the use records, so that the original information of the distributed storage information can be obtained by decryption according to the secret key index when the encrypted distributed storage information is decrypted.
Through the encryption management sub-module and the encryption processing sub-module, the data encryption module encrypts the distributed storage information of the data sets held in the data storage module, which improves the security of that information. Because the encryption management sub-module is divided into parts according to how the data sets are distributed, the encryption method of each part encrypts the distributed storage information under its own distribution condition. The keyword set in the basic data, the text information and picture characteristics in the image data file, and the key region in the video data file can therefore each be encrypted with a suitable method, which improves the fit between the distributed storage information and the encryption method and raises the security of the stored information. In addition, because the encryption processing sub-module feeds the key used during encryption back to the encryption management sub-module, which manages it, the key is available whenever the distributed storage information is needed, so the stored information remains decryptable when the original data is used later.
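The part-wise registration, key feedback and key index described above can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the part names, `EncryptionManager`, `encrypt_record` and the record identifiers are hypothetical, and `toy_keystream_xor` is a toy SHA-256 counter-mode XOR keystream chosen only to keep the example free of third-party dependencies. It is not a secure cipher (no authentication, no key derivation); a real system would register a vetted algorithm such as AES-GCM in each part instead.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


def toy_keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy cipher (NOT secure): XOR data with SHA-256(key || nonce || counter) blocks.

    The same call encrypts and decrypts, because XOR is its own inverse.
    """
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[start:start + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


@dataclass
class EncryptionManager:
    """Hypothetical encryption management sub-module: one registered method per part."""
    methods: dict = field(default_factory=dict)    # part name -> cipher function
    usage_log: list = field(default_factory=list)  # (record_id, part) for every call
    key_index: dict = field(default_factory=dict)  # record_id -> (part, key, nonce)

    def register(self, part: str, method) -> None:
        self.methods[part] = method

    def record_key(self, record_id: str, part: str, key: bytes, nonce: bytes) -> None:
        # The key fed back by the processing sub-module is combined with the usage record.
        self.usage_log.append((record_id, part))
        self.key_index[record_id] = (part, key, nonce)

    def decrypt(self, record_id: str, ciphertext: bytes) -> bytes:
        part, key, nonce = self.key_index[record_id]
        return self.methods[part](key, nonce, ciphertext)


def encrypt_record(manager: EncryptionManager, record_id: str, part: str,
                   plaintext: bytes) -> bytes:
    """Hypothetical encryption processing sub-module: match part, encrypt, feed back key."""
    method = manager.methods[part]
    key, nonce = secrets.token_bytes(32), secrets.token_bytes(16)
    ciphertext = method(key, nonce, plaintext)
    manager.record_key(record_id, part, key, nonce)
    return ciphertext


manager = EncryptionManager()
for part in ("keywords", "image_features", "video_key_regions"):
    manager.register(part, toy_keystream_xor)

ciphertext = encrypt_record(manager, "rec-001", "keywords", b"big data, mining, cloud")
restored = manager.decrypt("rec-001", ciphertext)  # b"big data, mining, cloud"
```

Keeping the key index separate from the ciphertext mirrors the description: decryption needs only the record identifier, from which the manager recovers the part, its registered method and the fed-back key.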
When determining the key region in each video image, the video processing module works with the multiple video frames obtained by framing and performs the following steps:
performing image recognition on the video image to identify how people or objects are imaged in it, obtaining an image recognition result;
dividing the video image into a plurality of areas according to the image recognition result;
analyzing each area together with the adjacent video frames using the following formula:
in the above formula, the terms denote, respectively: the analytical data value of the i-th block area; the sign function; the image information of the j-th feature point in the i-th block area of the t-th frame, and of the corresponding block area in the two adjacent frames; the number of feature points in the i-th block area of the t-th frame and of the two adjacent frames; the total number of frames of the video image; and the analysis result of the i-th block area, which takes one of two values: one value indicates that the i-th block area is used to form the key region, and the other indicates that it is not;
combining, according to the analysis results, the areas selected for the key region into the key region of the video image.
Specifically, the video processing module divides each video image according to how the people or objects in it are imaged, so that no imaged person or object is split across different areas; this preserves the integrity of the key region. It then determines, by comparison with the adjacent video frames, whether each area changes, and uses the changed parts as the data set of the video image, which captures the characteristics of the video image while reducing information redundancy. Because each frame is compared with both its preceding and its following frame, small changes between frames are not overlooked, which improves the accuracy of the area analysis and hence of the key region. Finally, because the areas selected by the analysis are combined into a single key region for each video image, that key region covers the changed parts of the video image more completely and represents the video data file more accurately.
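The formula referenced above is not reproduced in this text, so the following Python sketch substitutes an assumed change measure rather than the patent's own formula. For each block area it averages the image information of the feature points in each frame (so differing feature-point counts per frame are handled), sums the absolute differences between each frame and its preceding and following frames, divides by the total frame count, and applies a sign function to the score minus a threshold. Areas with a result of +1 are merged into the key region, represented here as the bounding box of the selected areas. The box format, the `threshold` parameter and all function names are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed region format: (x0, y0, x1, y1)


def sign(x: float) -> int:
    """Sign function: +1, 0 or -1."""
    return (x > 0) - (x < 0)


def region_change_score(frames: Sequence[Sequence[float]]) -> float:
    """Assumed analytical value for one block area (not the patent's formula).

    frames[t] holds the image information of the feature points found in this
    block area in frame t; the number of points may differ between frames.
    """
    total_frames = len(frames)
    if total_frames < 3:
        return 0.0  # no frame has both a preceding and a following neighbour
    means = [fmean(points) if points else 0.0 for points in frames]
    total = 0.0
    for t in range(1, total_frames - 1):
        # Compare frame t with its left and right neighbours.
        total += abs(means[t] - means[t - 1]) + abs(means[t] - means[t + 1])
    return total / total_frames


def key_region(regions: Dict[str, Box], features: Dict[str, List[List[float]]],
               threshold: float) -> Tuple[List[str], Optional[Box]]:
    """Select areas with sign(score - threshold) == +1 and merge their boxes."""
    selected = [name for name in regions
                if sign(region_change_score(features[name]) - threshold) == 1]
    if not selected:
        return selected, None
    boxes = [regions[name] for name in selected]
    merged = (min(b[0] for b in boxes), min(b[1] for b in boxes),
              max(b[2] for b in boxes), max(b[3] for b in boxes))
    return selected, merged


regions = {"person": (10, 10, 50, 80), "sky": (0, 0, 100, 10), "car": (60, 40, 90, 70)}
features = {
    "person": [[0.1, 0.2], [0.5, 0.6, 0.7], [0.1], [0.6, 0.6]],
    "sky": [[0.9, 0.9], [0.9], [0.9, 0.9, 0.9], [0.9]],
    "car": [[0.3], [0.3], [0.8], [0.3]],
}
selected, box = key_region(regions, features, threshold=0.1)
```

With these inputs the static "sky" area is excluded, while "person" and "car" are selected and merged into one box.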
To further illustrate the big data based data mining system, this embodiment also provides a mining method using the system, comprising the following steps:
step one: the data acquisition unit collects basic data, image data and video data files, generates a basic data file, and captures features from the basic data file to generate a captured data set;
step two: the data storage unit stores the captured data set in a distributed manner and encrypts it;
step three: the data mining unit removes abnormal data and cleans the data to generate a determined data set;
step four: the data storage unit stores the determined data set on the cloud platform;
step five: the data storage unit shares data over the network, and the data feedback unit retrieves and displays data from the cloud platform.
The foregoing is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical solution and inventive concept, shall fall within the protection scope of the present invention.

Claims (10)

1. A big data based data mining system, comprising:
a data acquisition unit for:
collecting basic data, image data and video data files, generating a basic data file, reading text information of the basic data to obtain a keyword set, extracting text information and picture characteristics of the image data file, capturing the keyword set in the basic data and the text information and picture characteristics in the image data file, and generating a captured data set;
a data storage unit for:
interacting with the cloud platform, storing in a distributed manner and encrypting, as data sets, the keyword set in the basic data, the text information and picture characteristics in the image data file, and the key region in the video data file, and sharing data over a network via the cloud platform;
a data mining unit for:
extracting key bytes of the captured data set, verifying them and eliminating abnormal data to generate de-abnormal data, and cleaning the data of the basic data file based on the de-abnormal data to generate a determined data set;
a data feedback unit for:
retrieving data on the cloud platform, displaying the retrieval results and issuing reminders;
a cloud platform for:
storing and editing the classified information in the cloud, transmitting it to the data storage unit, and returning feedback information;
a user terminal for:
storing and running the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit, and comprising at least one login terminal and at least one control terminal, wherein the control terminal, when working, runs the data acquisition unit, the data storage unit, the data mining unit and the data feedback unit to implement the big data based data mining system, and the login terminal is connected to the control terminal, the cloud platform and the server via the Internet.
2. A big data based data mining system according to claim 1, wherein: the data acquisition unit includes:
a file acquisition module for:
collecting basic data, image data and video data files, and generating a basic data file based on the basic data, the image data and the video data files;
a text reading module for:
reading the text information of the basic data, segmenting it into a plurality of extracted words, and filtering the extracted words according to their part-of-speech statistical features to obtain a keyword set;
an image processing module for:
extracting text information and picture characteristics from the image data files, creating a globally unique association stamp for each image data file, and associating the text information and picture characteristics extracted from each image data file with that file's association stamp;
a video processing module for:
clipping a key video segment from the video data file, dividing the key segment into frames to obtain a plurality of video images, and determining a key region in each video image, wherein each video image contains one key region;
an information capture module for:
capturing the keyword set in the basic data, the text information and picture characteristics in the image data file, and the key region in the video data file, and generating a captured data set from the captured data.
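The text reading and image processing steps of claim 2 can be sketched in Python under stated assumptions: word segmentation and part-of-speech tagging are assumed to have been done upstream (the tag names `NOUN`, `PROPN`, `VERB` and the stopword list are hypothetical), the "part-of-speech statistical features" are approximated by a part-of-speech filter plus a minimum frequency, and the globally unique association stamp is modelled with a random UUID from Python's `uuid` module. The function names are illustrative only.

```python
import uuid
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical filter settings; a real system would tune these to its tagger and domain.
KEEP_TAGS = {"NOUN", "PROPN", "VERB"}
STOPWORDS = {"data", "system"}


def keyword_set(tagged: Iterable[Tuple[str, str]], min_count: int = 2) -> Set[str]:
    """Keep content words by part of speech, then keep those frequent enough."""
    counts = Counter(word.lower() for word, tag in tagged if tag in KEEP_TAGS)
    return {w for w, c in counts.items() if c >= min_count and w not in STOPWORDS}


def stamp_image_record(text: str, features: List[float]) -> Dict[str, object]:
    """Attach one association stamp (a random UUID) to an image file's extracted parts."""
    stamp = uuid.uuid4().hex
    return {
        "stamp": stamp,
        "text": {"stamp": stamp, "value": text},
        "features": {"stamp": stamp, "value": features},
    }


tagged = [("Mining", "VERB"), ("the", "DET"), ("cloud", "NOUN"), ("cloud", "NOUN"),
          ("data", "NOUN"), ("data", "NOUN"), ("quickly", "ADV"), ("mining", "NOUN")]
keywords = keyword_set(tagged)  # == {"mining", "cloud"}
record = stamp_image_record("invoice 42", [0.12, 0.87])
```

A random UUID is unique only with overwhelming probability; a deployment needing strict guarantees might instead issue stamps from a central sequence.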
3. A big data based data mining system according to claim 1, wherein: the data storage unit includes:
a data storage module for:
interacting with the cloud platform, storing the keyword set in the basic data, the text information and picture characteristics in the image data file, and the key region in the video data file as data sets in a distributed manner, and locating data storage information through the cloud platform;
a data encryption module for:
interacting with the data storage module and encrypting the data sets stored in a distributed manner;
a data sharing module for:
interacting with the cloud platform, processing the data in the cloud platform and sharing the data over a network.
4. A big data based data mining system according to claim 1, wherein: the data mining unit includes:
an abnormality rejection module for:
extracting key bytes of the captured data set for verification, performing association analysis on the key bytes to determine the abnormal distinguishing key bytes, extracting and eliminating from each item of data the abnormal data corresponding to those key bytes, and generating de-abnormal data from the captured data set after the abnormal data is eliminated;
a data cleaning module for:
cleaning the data of the basic data file based on the de-abnormal data, wherein during cleaning the basic data file is screened according to the association stamps corresponding to the de-abnormal data, to generate a determined data set.
5. The big data based data mining system of claim 4, wherein: during verification, the abnormality rejection module performs association analysis on the captured data set by building a data association analysis model, inputting the captured data set into the model for data analysis, and outputting an analysis report based on the analysis result.
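Claims 4 and 5 leave the association analysis model unspecified. The Python sketch below swaps in a simpler and plainly different technique, a per-field z-score outlier test, only to show the overall flow: key fields of the captured data set are checked, records with outlying key values are eliminated to give the de-abnormal data, and the basic data file is then screened by association stamp. The field names, the `z_limit` default and the function names are assumptions.

```python
from statistics import fmean, pstdev
from typing import Any, Dict, List

Record = Dict[str, Any]  # assumed: each record carries a "stamp" plus numeric key fields


def reject_anomalies(records: List[Record], key_fields: List[str],
                     z_limit: float = 2.0) -> List[Record]:
    """Drop records whose value in any key field is more than z_limit std devs from the mean."""
    flagged = set()
    for field_name in key_fields:
        values = [r[field_name] for r in records]
        mu, sigma = fmean(values), pstdev(values)
        if sigma == 0:
            continue  # constant field: no value stands out
        for i, v in enumerate(values):
            if abs(v - mu) / sigma > z_limit:
                flagged.add(i)
    return [r for i, r in enumerate(records) if i not in flagged]


def clean_basic_file(basic_rows: List[Record], de_abnormal: List[Record]) -> List[Record]:
    """Keep only basic-file rows whose association stamp survived anomaly rejection."""
    kept = {r["stamp"] for r in de_abnormal}
    return [row for row in basic_rows if row["stamp"] in kept]


records = [{"stamp": f"s{i}", "size": v}
           for i, v in enumerate([10, 11, 9, 10, 12, 11, 10, 95])]
de_abnormal = reject_anomalies(records, key_fields=["size"])
basic_rows = [{"stamp": "s0", "text": "a"}, {"stamp": "s7", "text": "b"}]
determined = clean_basic_file(basic_rows, de_abnormal)  # keeps only the "s0" row
```

The population standard deviation is used, so with n records a single outlier cannot exceed a z-score of sqrt(n - 1); the limit should be chosen with the data set size in mind.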
6. A big data based data mining system according to claim 1, wherein: the data feedback unit includes:
a quick search module for:
interacting with the cloud platform and providing a query service based on an index system;
a data feedback module for:
displaying the retrieval results of the quick search module on a display device and issuing reminders.
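An index system for the quick search module could be as simple as an inverted index from keywords to association stamps. The following Python sketch is hypothetical (the class and method names are not from the source) and uses AND semantics for multi-term queries.

```python
from collections import defaultdict
from typing import Dict, List, Set


class InvertedIndex:
    """Hypothetical index: maps each keyword to the stamps of records containing it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, stamp: str, keywords: List[str]) -> None:
        for kw in keywords:
            self._postings[kw.lower()].add(stamp)

    def query(self, *terms: str) -> List[str]:
        """Return stamps matching all terms, sorted for stable display."""
        if not terms:
            return []
        # .get() avoids creating empty postings for unknown terms.
        sets = [self._postings.get(t.lower(), set()) for t in terms]
        return sorted(set.intersection(*sets))


index = InvertedIndex()
index.add("s1", ["cloud", "mining"])
index.add("s2", ["Cloud", "video"])
hits = index.query("cloud", "video")  # ["s2"]
```

Sorting the result gives the data feedback module a deterministic order to display; a ranked display would need a scoring step that the source does not describe.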
7. A big data based data mining system according to claim 1, wherein: the cloud platform includes:
a cloud database for:
classifying and storing, according to data stream labels, the received keyword set in the basic data, text information and picture characteristics in the image data file, and key region in the video data file;
a data processing module for:
grouping the classified stored data according to stream attribute information and data content, and classifying and labeling the groups, wherein the groups comprise a basic data set, an image data set and a video data set;
a data matching module for:
matching each data set grouped by the data processing module against the attribute information of the data storage unit to generate the data requirements of the corresponding data stream, and docking the data requirements of each data stream with the corresponding data set.
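The grouping and matching of claim 7 can be sketched in Python as follows. The mapping from stream labels to groups, the `unclassified` fallback, and expressing the data storage unit's attribute information as a mapping from storage target to required group are all assumptions made for illustration, as are the function names.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical stream labels mapped to the three groups named in the claim.
GROUP_BY_STREAM = {"text": "basic", "image": "image", "video": "video"}


def group_records(records: List[dict]) -> Dict[str, List[dict]]:
    """Group records by stream label and mark each record with its group."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        group = GROUP_BY_STREAM.get(rec["stream"], "unclassified")
        groups[group].append({**rec, "group": group})
    return dict(groups)


def match_requirements(groups: Dict[str, List[dict]],
                       storage_attrs: Dict[str, str]) -> Dict[str, List[dict]]:
    """Dock each storage target's required group with the corresponding data set."""
    return {target: groups.get(group, []) for target, group in storage_attrs.items()}


records = [{"id": 1, "stream": "text"}, {"id": 2, "stream": "video"},
           {"id": 3, "stream": "image"}, {"id": 4, "stream": "audio"}]
groups = group_records(records)
docked = match_requirements(groups, {"keyword_shard": "basic", "frame_shard": "video"})
```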
8. A big data based data mining system according to claim 3, wherein: the data encryption module includes:
an encryption management sub-module for:
dividing the data sets into a plurality of parts according to their distributed storage conditions, registering an encryption method in each part, recording each use of the encryption methods, managing the keys produced when the encryption methods are used, and combining the key management with the usage records to form a key index;
an encryption processing sub-module for:
obtaining the distributed storage information of the data sets through interaction with the data storage module, encrypting that information with the encryption method registered in the corresponding part, and feeding the key generated during encryption back to the encryption management sub-module.
9. A big data based data mining system according to claim 2, wherein: when determining the key region in each video image, the video processing module works with the multiple video frames obtained by framing and performs the following steps:
performing image recognition on the video image to identify how people or objects are imaged in it, obtaining an image recognition result;
dividing the video image into a plurality of areas according to the image recognition result;
analyzing each area together with the adjacent video frames using the following formula:
in the above formula, the terms denote, respectively: the analytical data value of the i-th block area; the sign function; the image information of the j-th feature point in the i-th block area of the t-th frame, and of the corresponding block area in the two adjacent frames; the number of feature points in the i-th block area of the t-th frame and of the two adjacent frames; the total number of frames of the video image; and the analysis result of the i-th block area, which takes one of two values: one value indicates that the i-th block area is used to form the key region, and the other indicates that it is not;
combining, according to the analysis results, the areas selected for the key region into the key region of the video image.
10. A mining method using the big data based data mining system according to any one of claims 1-9, characterized in that the method comprises the following steps:
step one: the data acquisition unit collects basic data, image data and video data files, generates a basic data file, and captures features from the basic data file to generate a captured data set;
step two: the data storage unit stores the captured data set in a distributed manner and encrypts it;
step three: the data mining unit removes abnormal data and cleans the data to generate a determined data set;
step four: the data storage unit stores the determined data set on the cloud platform;
step five: the data storage unit shares data over the network, and the data feedback unit retrieves and displays data from the cloud platform.
CN202410073443.7A 2024-01-18 2024-01-18 Data mining system and mining method based on big data Active CN117591578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410073443.7A CN117591578B (en) 2024-01-18 2024-01-18 Data mining system and mining method based on big data

Publications (2)

Publication Number Publication Date
CN117591578A true CN117591578A (en) 2024-02-23
CN117591578B CN117591578B (en) 2024-04-09

Family

ID=89922331

Country Status (1)

Country Link
CN (1) CN117591578B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294849A (en) * 2016-08-23 2017-01-04 成都卡莱博尔信息技术股份有限公司 Mass data inquiry system based on data mining technology
CN106815253A (en) * 2015-12-01 2017-06-09 慧科讯业有限公司 A kind of method for digging based on mixed data type data
CN107577771A (en) * 2017-09-07 2018-01-12 北京海融兴通信息安全技术有限公司 A kind of big data digging system
CN110472075A (en) * 2018-05-09 2019-11-19 中国互联网络信息中心 A kind of isomeric data classification storage method and system based on machine learning
KR20210074734A (en) * 2019-12-12 2021-06-22 동의대학교 산학협력단 System and Method for Extracting Keyword and Ranking in Video Subtitle
US20220004429A1 (en) * 2020-07-01 2022-01-06 International Business Machines Corporation Multi-modal data explainer pipeline
CN114117174A (en) * 2021-09-02 2022-03-01 杨子晴 Multi-format data screening management system based on big data
CN114764464A (en) * 2020-12-30 2022-07-19 北京华录新媒信息技术有限公司 Video content pushing technology based on data mining
CN115994173A (en) * 2022-12-07 2023-04-21 呼和浩特市大旗网络有限公司 Data mining system based on big data
CN116501779A (en) * 2023-06-26 2023-07-28 图林科技(深圳)有限公司 Big data mining analysis system for real-time feedback
CN116501725A (en) * 2023-05-23 2023-07-28 厦门快快网络科技有限公司 Big data processing method based on cloud computing
CN116916049A (en) * 2023-09-12 2023-10-20 北京青水环境科技有限公司 Video data online acquisition and storage system based on cloud computing technology

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
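ZHANCHI DONG et al.: "Research of big data information mining and analysis", IEEE, 31 December 2022 (2022-12-31) *
张福铮; 黄文琦; 赵继光; 董召杰; 刘宸哲; 严叶舟; 曹立楠: "基于Hadoop的电网非结构化数据智能分析云平台" (Hadoop-based cloud platform for intelligent analysis of unstructured power grid data), 信息技术与信息化 (Information Technology and Informatization), no. 05, 28 May 2020 (2020-05-28) *
赵大海; 郭晶: "智能情报获取系统框架研究" (Research on the framework of an intelligent information acquisition system), 军民两用技术与产品 (Dual Use Technologies and Products), no. 08, 15 August 2020 (2020-08-15) *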

Also Published As

Publication number Publication date
CN117591578B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant