CN116451033A - Data noise filtering method and device and related products - Google Patents

Publication number
CN116451033A
Authority
CN
China
Prior art keywords
data
noise
continuity
discrete
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310265925.8A
Other languages
Chinese (zh)
Inventor
阮安邦
魏明
王佳帅
王铀之
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Octa Innovations Information Technology Co Ltd
Original Assignee
Beijing Octa Innovations Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Octa Innovations Information Technology Co Ltd
Priority to CN202310265925.8A
Publication of CN116451033A

Landscapes

  • Image Processing (AREA)

Abstract

The application discloses a data noise filtering method, a data noise filtering device, and related products. The method comprises: performing attribute labeling processing on target data to obtain a corresponding attribute feature description; evaluating the feature continuity and the feature discreteness of the target data according to the attribute feature description to obtain a continuity evaluation value and a discreteness evaluation value; dividing the target data into a continuous data set and a discrete data set based on the continuity evaluation value and the discreteness evaluation value; determining first noise in the continuous data set according to a set first information entropy; determining second noise in the discrete data set according to a set second information entropy; filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data; and generating a net data set from the continuous net data and the discrete net data. Denoising is thereby performed at a finer granularity, which improves the accuracy of noise processing.

Description

Data noise filtering method and device and related products
Technical Field
The present disclosure relates to the field of privacy computing technologies, and in particular, to a method and an apparatus for filtering data noise, and a related product.
Background
With the rapid development of big data, the value of data is becoming increasingly apparent. Data needs to undergo noise processing when it is stored or used. At present, however, noise processing is applied to the data as a whole, so the processing granularity is coarse and the accuracy of noise processing is low.
Disclosure of Invention
Based on the above problems, the embodiments of the present application provide a method, an apparatus, and a related product for filtering data noise.
The embodiment of the application discloses the following technical scheme:
a method of filtering data noise, comprising:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
Optionally, the method further comprises: performing blocking processing on the target data to obtain a plurality of data blocks;
the attribute labeling processing is performed on the target data to obtain corresponding attribute feature description, which comprises the following steps: carrying out attribute labeling processing by taking the data blocks as units to obtain attribute feature vectors corresponding to each data block; and performing splicing processing on the attribute feature vectors corresponding to all the data blocks to obtain the attribute feature description corresponding to the target data.
Optionally, the evaluating the feature continuity and the discreteness of the target data according to the attribute feature description to obtain a continuity evaluation value and a discreteness evaluation value respectively includes: calculating the attention value among different attribute feature vectors in the attribute feature description; and respectively evaluating the feature continuity and the discreteness of the target data according to the attention value to obtain a continuity evaluation value and a discreteness evaluation value.
Optionally, the dividing the target data into a continuous data set and a discrete data set based on the continuous evaluation value and the discrete evaluation value includes: screening data blocks with the step length between the continuity evaluation values smaller than a set continuity value threshold value from the data blocks to form the continuity data set; and screening data blocks with the step length between the continuity evaluation values larger than or equal to the set continuity value threshold value from the data blocks so as to form the discrete data set.
Optionally, the determining the first noise in the continuous dataset according to the set first information entropy includes: and calculating the information entropy of the continuous data set, and comparing the information entropy with the set first information entropy to determine the first noise in the continuous data set.
Optionally, the determining the second noise in the discrete dataset according to the set second information entropy includes: and calculating the information entropy of the discrete data set, and comparing the information entropy with the set second information entropy to determine second noise in the discrete data set.
Optionally, the generating a net data set according to the continuous net data and the discrete net data includes: the continuous net data and the discrete net data are fused based on an attention matrix between the continuous net data and the discrete net data.
Optionally, the method further comprises: sample data is extracted from a target data set to take the extracted sample data as the target data.
Optionally, the method further comprises: invoking a preset sampling-with-replacement mechanism to extract the sample data from the target data set.
Optionally, the performing attribute labeling processing on the target data to obtain a corresponding attribute feature description includes: acquiring a scheduling command issued by a control node in a distributed processing cluster; and invoking a labeling node, according to the scheduling command, to perform attribute labeling processing on the target data to obtain the corresponding attribute feature description.
Optionally, the performing blocking processing on the target data to obtain a plurality of data blocks includes: performing data blocking processing on the target data based on the number of labeling nodes to obtain a plurality of data blocks, so that the number of data blocks is equal to the number of labeling nodes.
Optionally, the performing blocking processing on the target data to obtain a plurality of data blocks includes: performing data blocking processing on the target data based on the number of labeling nodes and the data processing amount of a single labeling node to obtain a plurality of data blocks, so that the data amount of a single data block is equal to the data processing amount of a single labeling node.
Optionally, the performing attribute labeling processing on the target data to obtain a corresponding attribute feature description includes: and carrying out attribute labeling processing on the target data based on a preset data attribute feature set to obtain corresponding attribute feature description.
Optionally, the performing attribute labeling processing on the target data based on the preset data attribute feature set to obtain a corresponding attribute feature description includes: performing, by regular-expression matching, attribute labeling processing on the target data based on the preset data attribute feature set to obtain the corresponding attribute feature description.
Optionally, the performing attribute labeling processing on the target data based on the preset data attribute feature set to obtain a corresponding attribute feature description includes:
based on a preset data attribute feature set, carrying out parallel attribute labeling processing on the target data, and giving attribute labeling values;
and obtaining the attribute feature description corresponding to the target data according to the labeling value.
Optionally, the performing parallel attribute labeling processing on the target data and assigning attribute labeling values includes: and carrying out parallel attribute labeling processing on a plurality of data blocks included in the target data, and assigning attribute labeling values to each data block.
A data noise filtering apparatus, comprising:
the labeling unit is used for carrying out attribute labeling processing on the target data to obtain corresponding attribute feature description;
the evaluation unit is used for respectively evaluating the feature continuity and the discreteness of the target data according to the attribute feature description to obtain a continuity evaluation value and a discreteness evaluation value;
a dividing unit configured to divide the target data into a continuous data set and a discrete data set based on the continuous evaluation value and the discrete evaluation value;
a first noise determining unit configured to determine a first noise in the continuous data set according to a set first information entropy;
a second noise determining unit configured to determine a second noise in the discrete data set according to a set second information entropy;
the first noise filtering unit is used for filtering the first noise from the continuous data set to obtain continuous net data;
a second noise filtering unit, configured to filter the second noise from the discrete data set to obtain discrete net data;
and the net data generating unit is used for generating a net data set according to the continuous net data and the discrete net data.
Optionally, the device further comprises a blocking unit, configured to perform blocking processing on the target data to obtain a plurality of data blocks;
the labeling unit is specifically used for carrying out attribute labeling processing by taking the data blocks as units to obtain attribute feature vectors corresponding to each data block; and performing splicing processing on the attribute feature vectors corresponding to all the data blocks to obtain the attribute feature description corresponding to the target data.
Optionally, the evaluation unit is specifically configured to calculate an attention value between different attribute feature vectors in the attribute feature description; and respectively evaluating the feature continuity and the discreteness of the target data according to the attention value to obtain a continuity evaluation value and a discreteness evaluation value.
Optionally, the dividing unit is specifically configured to screen out data blocks, where a step length between continuity evaluation values is smaller than a set continuity value threshold, from the plurality of data blocks, so as to form the continuity data set; and screening data blocks with the step length between the continuity evaluation values larger than or equal to the set continuity value threshold value from the data blocks so as to form the discrete data set.
Optionally, the first noise determining unit is specifically configured to calculate an information entropy of the continuous data set, and compare the information entropy with a set first information entropy to determine a first noise in the continuous data set.
Optionally, the second noise determining unit is specifically configured to calculate an information entropy of the discrete dataset, and compare the information entropy with a set second information entropy to determine a second noise in the discrete dataset.
Optionally, the net data generating unit is specifically configured to fuse the continuous net data and the discrete net data based on an attention matrix between the continuous net data and the discrete net data.
Optionally, the apparatus further comprises an extracting unit, configured to extract sample data from a target data set, so as to use the extracted sample data as the target data.
Optionally, the extracting unit is further configured to invoke a preset sampling-with-replacement mechanism to extract the sample data from the target data set.
Optionally, the labeling unit is specifically configured to acquire a scheduling command issued by a control node in a distributed processing cluster; and to invoke a labeling node, according to the scheduling command, to perform attribute labeling processing on the target data to obtain the corresponding attribute feature description.
Optionally, the blocking unit is specifically configured to perform data blocking processing on the target data based on the number of labeling nodes to obtain a plurality of data blocks, so that the number of data blocks is equal to the number of labeling nodes.
Optionally, the blocking unit is specifically configured to perform data blocking processing on the target data based on the number of labeling nodes and the data processing amount of a single labeling node to obtain a plurality of data blocks, so that the data amount of a single data block is equal to the data processing amount of a single labeling node.
Optionally, the labeling unit is specifically configured to perform attribute labeling processing on the target data based on a preset data attribute feature set, so as to obtain a corresponding attribute feature description.
Optionally, the labeling unit is specifically configured to perform, by regular-expression matching, attribute labeling processing on the target data based on a preset data attribute feature set, so as to obtain a corresponding attribute feature description.
Optionally, the labeling unit is specifically configured to perform parallel attribute labeling processing on the target data based on a preset data attribute feature set, and assign an attribute labeling value; and obtaining the attribute feature description corresponding to the target data according to the labeling value.
Optionally, the labeling unit is specifically configured to perform parallel attribute labeling processing on a plurality of data blocks included in the target data, and assign an attribute labeling value to each data block.
An electronic device comprising a memory having an executable program stored thereon, and a processor that when executing the executable program performs the steps of:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
A computer storage medium storing a computer executable program that when executed performs the steps of:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
A computer program product storing computer executable instructions that, when executed, perform the steps of:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
In the scheme provided by the embodiment of the application, attribute labeling processing is carried out on the target data to obtain corresponding attribute feature description; according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value; dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value; determining first noise in the continuous data set according to the set first information entropy; determining second noise in the discrete dataset according to the set second information entropy; filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data; and generating a net data set according to the continuous net data and the discrete net data, thereby realizing the finer granularity of denoising and improving the accuracy of noise processing.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a flow chart of a method for filtering data noise according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a filtering device for data noise according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Fig. 4 is a schematic hardware structure of an electronic device in an embodiment of the present application.
Detailed Description
An embodiment of the present application need not achieve all of the advantages described above at the same time.
To help those skilled in the art better understand the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort are intended to fall within the scope of the invention.
Fig. 1 is a flow chart of a method for filtering data noise according to an embodiment of the present application. As shown in fig. 1, it includes:
S101, performing attribute labeling processing on target data to obtain corresponding attribute feature description;
S102, respectively evaluating the feature continuity and the discreteness of the target data according to the attribute feature description to obtain a continuity evaluation value and a discreteness evaluation value;
S103, dividing the target data into a continuous data set and a discrete data set based on the continuity evaluation value and the discreteness evaluation value;
S104, determining first noise in the continuous data set according to the set first information entropy;
S105, determining second noise in the discrete data set according to the set second information entropy;
S106, filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
S107, generating a net data set according to the continuous net data and the discrete net data.
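As a non-limiting illustration, the flow of steps S103 to S107 can be sketched in Python as follows. The function name `filter_noise` and the three predicate parameters are hypothetical and do not appear in the application. The predicates stand in for the evaluation of S101 and S102 and for the entropy comparisons of S104 and S105, and the plain list concatenation in the last line is only a placeholder for the generation of the net data set.

```python
from typing import Callable, List, Sequence

Block = Sequence[float]


def filter_noise(
    blocks: List[Block],
    is_continuous: Callable[[Block], bool],
    is_first_noise: Callable[[Block], bool],
    is_second_noise: Callable[[Block], bool],
) -> List[Block]:
    """Hypothetical skeleton of S103-S107; every predicate is an assumption."""
    # S103: divide the target data into a continuous and a discrete data set.
    continuous = [b for b in blocks if is_continuous(b)]
    discrete = [b for b in blocks if not is_continuous(b)]
    # S104 + S106: determine and filter the first noise.
    continuous_net = [b for b in continuous if not is_first_noise(b)]
    # S105 + S106: determine and filter the second noise.
    discrete_net = [b for b in discrete if not is_second_noise(b)]
    # S107: generate the net data set (placeholder: simple concatenation).
    return continuous_net + discrete_net
```

Each data set is filtered with its own criterion, which is the point of splitting the data before denoising.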
Optionally, the method further comprises: performing blocking processing on the target data to obtain a plurality of data blocks;
the attribute labeling processing is performed on the target data to obtain corresponding attribute feature description, which comprises the following steps: carrying out attribute labeling processing by taking the data blocks as units to obtain attribute feature vectors corresponding to each data block; and performing splicing processing on the attribute feature vectors corresponding to all the data blocks to obtain the attribute feature description corresponding to the target data.
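As a non-limiting illustration, blocking, per-block labeling, and splicing can be sketched as follows. The fixed `block_size`, the choice of mean, minimum, and maximum as attribute features, and all function names are assumptions made for this example only; the application does not specify them.

```python
from typing import List, Sequence


def split_into_blocks(data: Sequence[float], block_size: int) -> List[List[float]]:
    """Cut the target data into consecutive blocks of at most block_size items."""
    return [list(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def label_block(block: Sequence[float]) -> List[float]:
    """Hypothetical attribute feature vector of one block: mean, min, max."""
    return [sum(block) / len(block), min(block), max(block)]


def describe(data: Sequence[float], block_size: int) -> List[float]:
    """Label each block, then splice (concatenate) the per-block vectors."""
    vectors = [label_block(b) for b in split_into_blocks(data, block_size)]
    return [x for vector in vectors for x in vector]
```

For example, `describe([1, 2, 3, 4, 5, 6], 3)` labels two blocks and splices their two three-element vectors into a single six-element description.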
Optionally, the evaluating the feature continuity and the discreteness of the target data according to the attribute feature description to obtain a continuity evaluation value and a discreteness evaluation value respectively includes: calculating the attention value among different attribute feature vectors in the attribute feature description; and respectively evaluating the feature continuity and the discreteness of the target data according to the attention value to obtain a continuity evaluation value and a discreteness evaluation value.
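The application does not state how the attention value is calculated. As a non-limiting illustration, one common choice is scaled dot-product attention with a row-wise softmax, sketched below; the function name `attention_values` is hypothetical, and how the continuity and discreteness evaluation values are then derived from these weights is left open here.

```python
import math
from typing import List, Sequence


def attention_values(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scaled dot-product attention weights between every pair of vectors.

    Assumption: all vectors have the same, non-zero length.
    """
    dim = len(vectors[0])
    weights = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        # Numerically stable softmax over one row of scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Each row of the result sums to 1, so row i can be read as how strongly block i attends to every block, including itself.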
Optionally, the dividing the target data into a continuous data set and a discrete data set based on the continuous evaluation value and the discrete evaluation value includes: screening data blocks with the step length between the continuity evaluation values smaller than a set continuity value threshold value from the data blocks to form the continuity data set; and screening data blocks with the step length between the continuity evaluation values larger than or equal to the set continuity value threshold value from the data blocks so as to form the discrete data set.
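As a non-limiting illustration, the screening by step length can be sketched as follows. The example assumes that the step length of a block is the absolute difference between its continuity evaluation value and that of the preceding block, and that the first block has a step length of zero; both are assumptions, since the application does not define the step length further.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def partition_blocks(
    blocks: Sequence[T], scores: Sequence[float], threshold: float
) -> Tuple[List[T], List[T]]:
    """Split blocks into (continuous, discrete) by step between adjacent scores."""
    continuous: List[T] = []
    discrete: List[T] = []
    for i, block in enumerate(blocks):
        # Assumed step length: distance to the previous block's evaluation value.
        step = abs(scores[i] - scores[i - 1]) if i > 0 else 0.0
        if step < threshold:
            continuous.append(block)
        else:
            # Step greater than or equal to the threshold: discrete data set.
            discrete.append(block)
    return continuous, discrete
```

Because the comparison is `<` on one side and `>=` on the other, every block falls into exactly one of the two data sets.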
Optionally, the determining the first noise in the continuous dataset according to the set first information entropy includes: and calculating the information entropy of the continuous data set, and comparing the information entropy with the set first information entropy to determine the first noise in the continuous data set.
Optionally, the determining the second noise in the discrete dataset according to the set second information entropy includes: and calculating the information entropy of the discrete data set, and comparing the information entropy with the set second information entropy to determine second noise in the discrete data set.
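As a non-limiting illustration, the entropy comparison can be sketched with Shannon entropy, $H = -\sum_i p_i \log_2 p_i$, computed over the value frequencies of each data block. Treating a block whose entropy exceeds the set value as noise is an assumption of this example, as are the function names; continuous values would normally be binned before counting.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def find_noise(
    blocks: Sequence[Sequence[Hashable]], set_entropy: float
) -> List[Sequence[Hashable]]:
    """Return the blocks whose entropy exceeds the set information entropy.

    Assumption: higher entropy than the set value marks a block as noise.
    """
    return [b for b in blocks if shannon_entropy(b) > set_entropy]
```

The same function can be called once with the set first information entropy on the continuous data set and once with the set second information entropy on the discrete data set.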
Optionally, the generating a net data set according to the continuous net data and the discrete net data includes: the continuous net data and the discrete net data are fused based on an attention matrix between the continuous net data and the discrete net data.
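The application does not detail the fusion. As a non-limiting illustration, the sketch below computes an attention matrix between the continuous net data (as queries) and the discrete net data (as keys and values), and appends the attended discrete context to each continuous vector. The roles assigned to the two data sets, the concatenation, and the name `fuse` are all assumptions of this example.

```python
import math
from typing import List, Sequence


def fuse(
    continuous: Sequence[Sequence[float]], discrete: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Fuse two sets of equal-length vectors through a cross-attention matrix."""
    dim = len(continuous[0])
    fused = []
    for query in continuous:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in discrete
        ]
        # One row of the attention matrix (stable softmax).
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted average of the discrete vectors.
        context = [
            sum(w * key[j] for w, key in zip(weights, discrete))
            for j in range(dim)
        ]
        fused.append(list(query) + context)
    return fused
```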
Optionally, the method further comprises: sample data is extracted from a target data set to take the extracted sample data as the target data.
Optionally, the method further comprises: invoking a preset sampling-with-replacement mechanism to extract the sample data from the target data set.
Optionally, the performing attribute labeling processing on the target data to obtain a corresponding attribute feature description includes: acquiring a scheduling command issued by a control node in a distributed processing cluster; and invoking a labeling node, according to the scheduling command, to perform attribute labeling processing on the target data to obtain the corresponding attribute feature description.
Optionally, the performing blocking processing on the target data to obtain a plurality of data blocks includes: performing data blocking processing on the target data based on the number of labeling nodes to obtain a plurality of data blocks, so that the number of data blocks is equal to the number of labeling nodes.
Optionally, the performing blocking processing on the target data to obtain a plurality of data blocks includes: performing data blocking processing on the target data based on the number of labeling nodes and the data processing amount of a single labeling node to obtain a plurality of data blocks, so that the data amount of a single data block is equal to the data processing amount of a single labeling node.
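Reading the sampling mechanism here as sampling with replacement (an interpretation of the translated text), a non-limiting sketch using the Python standard library is given below. The function name and the optional `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_with_replacement(
    dataset: Sequence[T], k: int, seed: Optional[int] = None
) -> List[T]:
    """Draw k items from dataset; the same item may be drawn more than once."""
    rng = random.Random(seed)
    return rng.choices(dataset, k=k)
```

Because items are put back after each draw, `k` may exceed the size of the target data set.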
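As a non-limiting illustration, the two blocking strategies can be sketched as follows: one fixes the number of blocks to the number of labeling nodes, the other fixes the size of each block to what a single node can process. The function names and the handling of remainders (spreading leftover items over the first blocks, or leaving a shorter last block) are assumptions of this example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_by_node_count(data: Sequence[T], num_nodes: int) -> List[List[T]]:
    """Produce exactly num_nodes blocks of near-equal size."""
    size, extra = divmod(len(data), num_nodes)
    blocks, start = [], 0
    for i in range(num_nodes):
        # The first `extra` blocks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        blocks.append(list(data[start:end]))
        start = end
    return blocks


def split_by_node_capacity(data: Sequence[T], capacity: int) -> List[List[T]]:
    """Produce blocks holding at most `capacity` items (one node's workload)."""
    return [list(data[i:i + capacity]) for i in range(0, len(data), capacity)]
```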
Optionally, the performing attribute labeling processing on the target data to obtain a corresponding attribute feature description includes: and carrying out attribute labeling processing on the target data based on a preset data attribute feature set to obtain corresponding attribute feature description.
Optionally, the performing attribute labeling processing on the target data based on the preset data attribute feature set to obtain a corresponding attribute feature description includes: performing, by regular-expression matching, attribute labeling processing on the target data based on the preset data attribute feature set to obtain the corresponding attribute feature description.
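As a non-limiting illustration, labeling against a preset attribute feature set by pattern matching can be sketched with regular expressions. The attribute names and patterns in `ATTRIBUTE_PATTERNS`, and the fallback label `"text"`, are hypothetical examples and are not taken from the application.

```python
import re

# Hypothetical preset data attribute feature set: attribute name -> pattern.
ATTRIBUTE_PATTERNS = {
    "integer": re.compile(r"[+-]?\d+"),
    "decimal": re.compile(r"[+-]?\d+\.\d+"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def label_value(value: str) -> str:
    """Return the first attribute whose pattern matches the whole value."""
    for name, pattern in ATTRIBUTE_PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return "text"  # fallback label when no preset pattern matches
```

Using `fullmatch` rather than `search` avoids labeling a value such as `3.14` as an integer merely because it contains digits.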
Optionally, the performing attribute labeling processing on the target data based on the preset data attribute feature set to obtain a corresponding attribute feature description includes:
based on a preset data attribute feature set, carrying out parallel attribute labeling processing on the target data, and giving attribute labeling values;
and obtaining the attribute feature description corresponding to the target data according to the labeling value.
Optionally, the performing parallel attribute labeling processing on the target data and assigning attribute labeling values includes: and carrying out parallel attribute labeling processing on a plurality of data blocks included in the target data, and assigning attribute labeling values to each data block.
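As a non-limiting illustration, parallel labeling of data blocks can be sketched with a local thread pool standing in for the labeling nodes. The use of `ThreadPoolExecutor`, the function names, and the block mean as the attribute labeling value are assumptions of this example; a real deployment would dispatch blocks to cluster nodes instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def label_block(block: Sequence[float]) -> float:
    """Hypothetical attribute labeling value of one block: its mean."""
    return sum(block) / len(block)


def label_blocks_in_parallel(
    blocks: Sequence[Sequence[float]], max_workers: int = 4
) -> List[float]:
    """Label all blocks concurrently; results keep the order of the blocks."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(label_block, blocks))
```

`Executor.map` returns results in input order, so the labeling values can be spliced into the attribute feature description without re-sorting.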
Fig. 2 is a schematic structural diagram of a filtering device for data noise according to an embodiment of the present application. As shown in fig. 2, it includes:
the labeling unit 201 is configured to perform attribute labeling processing on the target data to obtain a corresponding attribute feature description;
an evaluation unit 202, configured to evaluate feature continuity and discreteness of the target data according to the attribute feature description, to obtain a continuity evaluation value and a discreteness evaluation value;
a dividing unit 203 for dividing the target data into a continuous data set and a discrete data set based on the continuous evaluation value and the discrete evaluation value;
a first noise determining unit 204, configured to determine a first noise in the continuous data set according to a set first information entropy;
a second noise determining unit 205 configured to determine a second noise in the discrete dataset according to a set second information entropy;
a first noise filtering unit 206, configured to filter the first noise from the continuous data set to obtain continuous net data;
a second noise filtering unit 207, configured to filter the second noise from the discrete data set to obtain discrete net data;
a net data generating unit 208, configured to generate a net data set according to the continuous net data and the discrete net data.
Optionally, the device further comprises a blocking unit, configured to perform blocking processing on the target data to obtain a plurality of data blocks;
the labeling unit is specifically used for carrying out attribute labeling processing by taking the data blocks as units to obtain attribute feature vectors corresponding to each data block; and performing splicing processing on the attribute feature vectors corresponding to all the data blocks to obtain the attribute feature description corresponding to the target data.
Optionally, the evaluation unit is specifically configured to calculate an attention value between different attribute feature vectors in the attribute feature description; and respectively evaluating the feature continuity and the discreteness of the target data according to the attention value to obtain a continuity evaluation value and a discreteness evaluation value.
Optionally, the dividing unit is specifically configured to screen out data blocks, where a step length between continuity evaluation values is smaller than a set continuity value threshold, from the plurality of data blocks, so as to form the continuity data set; and screening data blocks with the step length between the continuity evaluation values larger than or equal to the set continuity value threshold value from the data blocks so as to form the discrete data set.
Optionally, the first noise determining unit is specifically configured to calculate an information entropy of the continuous data set, and compare the information entropy with a set first information entropy to determine a first noise in the continuous data set.
Optionally, the second noise determining unit is specifically configured to calculate an information entropy of the discrete dataset, and compare the information entropy with a set second information entropy to determine a second noise in the discrete dataset.
Optionally, the net data generating unit is specifically configured to fuse the continuous net data and the discrete net data based on an attention matrix between the continuous net data and the discrete net data.
Optionally, the apparatus further comprises an extracting unit, configured to extract sample data from a target data set, so as to use the extracted sample data as the target data.
Optionally, the extracting unit is further configured to invoke a sampling-with-replacement mechanism to extract sample data from the target data set.
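If the sampling mechanism here is read as sampling with replacement (an interpretation of the translated text, not a confirmed detail), sample extraction might look like the sketch below. The function name `sample_with_replacement` and the `seed` parameter are hypothetical.

```python
import random


def sample_with_replacement(dataset, sample_size, seed=None):
    """Draw sample_size items from dataset; an item may be drawn more than once."""
    rng = random.Random(seed)  # a local generator keeps the draw reproducible
    return rng.choices(dataset, k=sample_size)
```

With replacement, the sample size may exceed the size of the target data set, and the same record can appear several times in the extracted target data.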
Optionally, the labeling unit is specifically configured to obtain a scheduling command issued by a control node in the distributed processing cluster; and calling the marking node to carry out attribute marking processing on the target data according to the scheduling command to obtain corresponding attribute feature description.
Optionally, the blocking unit is specifically configured to perform data blocking processing on the target data based on the number of the labeling nodes, so as to obtain a plurality of data blocks, so that the number of the data blocks is equal to the number of the labeling nodes.
Optionally, the blocking unit is specifically configured to perform data blocking processing on the target data based on the number of the labeling nodes and the data processing amount of a single labeling node, so as to obtain a plurality of data blocks, so that the data amount of a single data block is equal to the data processing amount of a single labeling node.
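The two blocking strategies can be sketched side by side. This is a simplified illustration that treats the target data as a list; in the capacity-based variant the final block may be smaller than the per-node capacity when the data does not divide evenly. The function names are hypothetical.

```python
def split_into_n_blocks(data, node_count):
    """Split data into exactly node_count blocks of near-equal size."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    base, extra = divmod(len(data), node_count)
    blocks, start = [], 0
    for index in range(node_count):
        # The first `extra` blocks take one additional item each.
        size = base + (1 if index < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def split_by_capacity(data, per_node_capacity):
    """Split data into blocks holding at most per_node_capacity items each."""
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    return [
        data[start:start + per_node_capacity]
        for start in range(0, len(data), per_node_capacity)
    ]
```

The first function fixes the number of blocks to the number of labeling nodes; the second fixes the block size to what a single labeling node can process.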
Optionally, the labeling unit is specifically configured to perform attribute labeling processing on the target data based on a preset data attribute feature set, so as to obtain a corresponding attribute feature description.
Optionally, the labeling unit is specifically configured to perform attribute labeling processing on the target data by regular expression matching against a preset data attribute feature set, so as to obtain a corresponding attribute feature description.
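Assuming the matching in this step is regular expression matching, attribute labeling against a preset feature set could be sketched as below. The attribute names and patterns in `ATTRIBUTE_PATTERNS` are purely illustrative and are not taken from the application.

```python
import re

# Hypothetical preset attribute feature set: attribute name -> pattern.
ATTRIBUTE_PATTERNS = {
    "numeric": re.compile(r"-?\d+(?:\.\d+)?"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}


def label_attributes(record):
    """Return a 0/1 attribute feature vector for one text record."""
    return [
        1 if pattern.fullmatch(record) else 0
        for pattern in ATTRIBUTE_PATTERNS.values()
    ]
```

Each position of the returned vector corresponds to one attribute in the preset set, so the vectors of different records are directly comparable.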
Optionally, the labeling unit is specifically configured to perform parallel attribute labeling processing on the target data based on a preset data attribute feature set, and assign an attribute labeling value; and obtaining the attribute feature description corresponding to the target data according to the labeling value.
Optionally, the labeling unit is specifically configured to perform parallel attribute labeling processing on a plurality of data blocks included in the target data, and assign an attribute labeling value to each data block.
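Parallel labeling of data blocks can be sketched with a thread pool standing in for the labeling nodes. The labeling rule in `label_block` (the fraction of numeric items in a block) is a placeholder chosen only for illustration, and both function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def label_block(block):
    """Placeholder labeling rule: fraction of numeric items in the block."""
    if not block:
        return 0.0
    numeric = sum(1 for item in block if isinstance(item, (int, float)))
    return numeric / len(block)


def label_blocks_in_parallel(blocks, max_workers=4):
    """Assign one attribute labeling value per block, preserving block order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(label_block, blocks))
```

`pool.map` returns results in the order of the input blocks, which keeps each labeling value aligned with its data block when the values are later combined into the attribute feature description.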
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 3, the electronic device comprises a memory and a processor, wherein the memory stores an executable program, and the processor performs the following steps when running the executable program:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
Fig. 4 is a schematic diagram of a hardware structure of an electronic device in an embodiment of the present application. As shown in fig. 4, the electronic device 400 may include a computing unit 401 that may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. The RAM 403 may also store various programs and data required for the operation of the electronic device 400. The computing unit 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Various components in electronic device 400 are connected to I/O interface 405, including: an input unit 406, an output unit 407, a storage unit 408, and a communication unit 409. The input unit 406 may be any type of device capable of inputting information to the electronic device 400; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device. The output unit 407 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, video/audio output terminals, vibrators, and/or printers. The storage unit 408 may include, but is not limited to, magnetic disks and optical disks. The communication unit 409 allows the electronic device 400 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth(TM) devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above. For example, in some embodiments, the steps described above may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 400 via the ROM 402 and/or the communication unit 409. In some embodiments, the computing unit 401 may be configured to perform the above steps by any other suitable means (e.g., by means of firmware).
The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication functionality and are aimed at providing voice and data communication. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, and low-end phones, among others.
(2) Ultra-mobile personal computer equipment, which belongs to the category of personal computers, has calculation and processing functions, and generally also has mobile Internet access. Such terminals include PDA, MID, and UMPC devices, such as iPad.
(3) Portable entertainment devices, which can display and play multimedia content. Such devices include audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) The server, which is a device for providing computing services, is composed of a processor 410, a hard disk, a memory, a system bus, etc., and is similar to a general computer architecture, but is required to provide highly reliable services, and thus has high requirements in terms of processing capacity, stability, reliability, security, scalability, manageability, etc.
(5) Other electronic devices with data interaction function.
The present application also provides a computer storage medium storing a computer executable program which, when executed, performs the steps of:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
The present application also provides a computer program product comprising a computer storage medium, the computer storage medium storing computer executable instructions that, when executed, perform the steps of:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
It should be noted that the embodiments in this specification are described in a progressive manner; identical and similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, the apparatus and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The above-described apparatus and system embodiments are merely illustrative: the modules illustrated as separate components may or may not be physically separate, and the components illustrated as modules may or may not be physical modules, i.e., they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The foregoing is merely one specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for filtering data noise, comprising:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
2. The method of claim 1, further comprising: performing blocking processing on the target data to obtain a plurality of data blocks;
the attribute labeling processing is performed on the target data to obtain corresponding attribute feature description, which comprises the following steps: carrying out attribute labeling processing by taking the data blocks as units to obtain attribute feature vectors corresponding to each data block; and performing splicing processing on the attribute feature vectors corresponding to all the data blocks to obtain the attribute feature description corresponding to the target data.
3. The method according to claim 2, wherein the evaluating feature continuity and discreteness of the target data according to the attribute feature description to obtain a continuity evaluation value and a discreteness evaluation value, respectively, includes: calculating the attention value among different attribute feature vectors in the attribute feature description; and respectively evaluating the feature continuity and the discreteness of the target data according to the attention value to obtain a continuity evaluation value and a discreteness evaluation value.
4. The method of claim 2, wherein the dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value comprises: screening data blocks with the step length between the continuity evaluation values smaller than a set continuity value threshold value from the data blocks to form the continuity data set; and screening data blocks with the step length between the continuity evaluation values larger than or equal to the set continuity value threshold value from the data blocks so as to form the discrete data set.
5. The method of claim 2, wherein determining the first noise in the continuity dataset based on the set first information entropy comprises: and calculating the information entropy of the continuous data set, and comparing the information entropy with the set first information entropy to determine the first noise in the continuous data set.
6. The method of claim 2, wherein determining the second noise in the discrete dataset based on the set second information entropy comprises: and calculating the information entropy of the discrete data set, and comparing the information entropy with the set second information entropy to determine second noise in the discrete data set.
7. A data noise filtering apparatus, comprising:
the labeling unit is used for carrying out attribute labeling processing on the target data to obtain corresponding attribute feature description;
the evaluation unit is used for respectively evaluating the feature continuity and the discreteness of the target data according to the attribute feature description to obtain a continuity evaluation value and a discreteness evaluation value;
a dividing unit configured to divide the target data into a continuous data set and a discrete data set based on the continuous evaluation value and the discrete evaluation value;
a first noise determining unit configured to determine a first noise in the continuous data set according to a set first information entropy;
a second noise determining unit configured to determine a second noise in the discrete data set according to a set second information entropy;
a third noise filtering unit, configured to filter the first noise from the continuous data set to obtain continuous net data;
a fourth noise filtering unit, configured to filter the second noise from the discrete data set to obtain discrete net data;
and the net data generating unit is used for generating a net data set according to the continuous net data and the discrete net data.
8. An electronic device comprising a memory and a processor, the memory having stored thereon an executable program, the processor executing the executable program to perform the steps of:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
9. A computer storage medium storing a computer executable program which, when executed, performs the steps of:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
10. A computer program product, wherein the computer storage medium stores computer executable instructions that, when executed, perform the steps of:
performing attribute labeling processing on the target data to obtain corresponding attribute feature description;
according to the attribute feature description, feature continuity and discreteness of the target data are respectively evaluated to obtain a continuity evaluation value and a discreteness evaluation value;
dividing the target data into a continuity data set and a discreteness data set based on the continuity evaluation value and the discreteness evaluation value;
determining first noise in the continuous data set according to the set first information entropy;
determining second noise in the discrete dataset according to the set second information entropy;
filtering the first noise from the continuous data set to obtain continuous net data, and filtering the second noise from the discrete data set to obtain discrete net data;
generating a net data set according to the continuous net data and the discrete net data.
CN202310265925.8A 2023-03-14 2023-03-14 Data noise filtering method and device and related products Pending CN116451033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310265925.8A CN116451033A (en) 2023-03-14 2023-03-14 Data noise filtering method and device and related products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310265925.8A CN116451033A (en) 2023-03-14 2023-03-14 Data noise filtering method and device and related products

Publications (1)

Publication Number Publication Date
CN116451033A true CN116451033A (en) 2023-07-18

Family

ID=87119270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310265925.8A Pending CN116451033A (en) 2023-03-14 2023-03-14 Data noise filtering method and device and related products

Country Status (1)

Country Link
CN (1) CN116451033A (en)

Similar Documents

Publication Publication Date Title
CN111813869B (en) Distributed data-based multi-task model training method and system
CN108932124A (en) neural network model compression method, device, terminal device and storage medium
CN111260220B (en) Group control equipment identification method and device, electronic equipment and storage medium
US20170150214A1 (en) Method and apparatus for data processing
CN111144584A (en) Parameter tuning method, device and computer storage medium
CN109688183A (en) Group control device recognition methods, device, equipment and computer readable storage medium
CN112084017B (en) Memory management method and device, electronic equipment and storage medium
CN110652728A (en) Game resource management method and device, electronic equipment and storage medium
CN111580851A (en) Data management method and related device
CN109815298B (en) Method and device for determining character relationship network and storage medium
CN110706691B (en) Voice verification method and device, electronic equipment and computer readable storage medium
CN112966756A (en) Visual access rule generation method and device, machine readable medium and equipment
CN116451033A (en) Data noise filtering method and device and related products
CN115270161A (en) Encryption method and device based on encryption plug-in and related product
CN113891323B (en) WiFi-based user tag acquisition system
CN116415133A (en) Method and device for calculating data purity
CN109429282B (en) Frequency point configuration method and device
CN107168648B (en) File storage method and device and terminal
CN111461328B (en) Training method of neural network
CN112579618B (en) Feature library upgrading method and device, storage medium and computer equipment
CN113051126B (en) Portrait construction method, apparatus, device and storage medium
CN112843729A (en) Operation parameter determination method and device, computer equipment and storage medium
CN112820302A (en) Voiceprint recognition method and device, electronic equipment and readable storage medium
CN116415295A (en) Data security processing method and device and related products
CN111399733A (en) Method, device, control equipment and system for solving addiction of electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination