CN107391724A - A screening method for big data - Google Patents

A screening method for big data

Info

Publication number
CN107391724A
Authority
CN
China
Prior art keywords
screening
dimension
data
big data
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201710646449.9A
Other languages
Chinese (zh)
Inventor
徐秋养
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Deep Research Information Technology Co Ltd
Original Assignee
Foshan Deep Research Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Deep Research Information Technology Co Ltd filed Critical Foshan Deep Research Information Technology Co Ltd
Priority to CN201710646449.9A priority Critical patent/CN107391724A/en
Publication of CN107391724A publication Critical patent/CN107391724A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73 Querying
    • G06F16/735 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

Embodiments of the invention provide a screening method for big data. The method includes: performing screening analysis on the big data in a data group to be screened according to a target screening dimension; saving the data that meet a preset condition and correspond to at least one dimension sub-item under the target screening dimension as the data group to be screened in the next round; determining, according to the number of preset screening dimensions and the target requirement, whether the number of completed screening rounds meets the preset screening quantity; and, if so, ending the screening process for the big data. By screening the data step by step over multiple rounds, the embodiments make the system less likely to be overloaded and crash because the data volume is too large. In addition, the target requirement in each round is set according to reference values of that round's data group to be screened, which improves the accuracy of the screening analysis.

Description

A screening method for big data
Technical field
The present invention relates to the field of data processing, and in particular to a screening method for big data.
Background
With the rapid development of information technology, big data has emerged. To make up for the inability of conventional methods to handle such large volumes of unstructured data, cloud computing and cloud-based means of storing, sharing, and mining information were developed. These can store large, fast-moving, and diverse terminal data cheaply and effectively. How to screen and analyze these data, and how to use the screening results to guide business decisions from different dimensions, has become a popular topic.
In the prior art, screening analysis either analyzes the data under a single dimension or screens the data under a combination of multiple dimensions. The drawback of single-dimension screening is that an information point hidden across several screening dimensions is hard to find. The drawback of combined screening is that the choice of dimension sub-items to analyze depends largely on the experience of the person making the judgment, so wrong estimates occur easily. With either approach, if a wrong screening dimension is chosen and no final screening result can be obtained, the screening must be redone, which seriously reduces screening efficiency.
For example, in the video field, an operations platform usually monitors the traffic or stuttering of target information by combining different screening dimensions, such as region, city, operating system, browser, sex, and age group. In the prior-art monitoring method, a sub-item is chosen in every screening dimension according to past experience, and the combination is used to screen the target information. If the target information turns out to be the problem information point, monitoring is complete; otherwise, other permutations and combinations of dimension sub-items are chosen and the screening analysis is repeated. Although this method can monitor information such as video traffic and video stuttering, the whole process handles a large amount of information, which places a heavy load on the processor, makes processing inefficient, and hinders wide adoption. Moreover, even when a suspected problem information point is found this way, the large number of other possible combinations makes it hard to confirm that this information point is the optimal one.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a screening method for big data that improves the accuracy of screening analysis.
To achieve this purpose, an embodiment of the invention discloses a screening method for big data, the method including:
Performing screening analysis on the big data in a data group to be screened according to a target screening dimension;
Saving the data that meet a preset condition and correspond to at least one dimension sub-item under the target screening dimension as the data group to be screened in the next round;
Determining, according to the number of preset screening dimensions and the target requirement, whether the number of completed screening rounds meets the preset screening quantity;
If so, ending the screening process for the big data.
Optionally, before determining, according to the number of preset screening dimensions and the target requirement, whether the number of completed screening rounds meets the preset screening quantity, the method further includes:
Establishing a query result table and putting the screening result of each round into the query result table;
The determining, according to the number of preset screening dimensions and the target requirement, whether the number of completed screening rounds meets the preset screening quantity includes:
Determining, according to the number of preset screening dimensions, the target requirement, and the query result table, whether the number of completed screening rounds meets the preset screening quantity.
Optionally, the query result table is indexed: an index is built on the screening conditions, and the page number stored in the index is used to find the corresponding record in the query result table.
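The indexed query result table can be pictured with a minimal Python sketch. The source states only that the index is built on screening conditions and stores page numbers; the class name `QueryResultTable`, the `page_size` parameter, and the tuple form of a screening condition are assumptions made for illustration, not details from the patent.

```python
from collections import defaultdict


class QueryResultTable:
    """Paged result store with an index from screening condition to page numbers."""

    def __init__(self, page_size=2):
        self.page_size = page_size        # hypothetical page capacity
        self.pages = []                   # each page is a list of (condition, record)
        self.index = defaultdict(list)    # condition -> page numbers holding its records

    def add(self, condition, record):
        # Open a new page when there is none yet or the last one is full.
        if not self.pages or len(self.pages[-1]) >= self.page_size:
            self.pages.append([])
        page_no = len(self.pages) - 1
        self.pages[page_no].append((condition, record))
        if page_no not in self.index[condition]:
            self.index[condition].append(page_no)

    def lookup(self, condition):
        # Visit only the pages named by the index, never the whole table.
        hits = []
        for page_no in self.index.get(condition, []):
            hits.extend(rec for cond, rec in self.pages[page_no] if cond == condition)
        return hits


# Illustrative records only; the values are placeholders.
table = QueryResultTable(page_size=2)
table.add(("region", "Guangdong"), 1546)
table.add(("region", "Tianjin"), 295)
table.add(("region", "Guangdong"), 658)
```

Calling `table.lookup(("region", "Guangdong"))` reads only the pages listed under that condition in the index, which is the point of storing page numbers rather than scanning every record.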
Optionally, after the data that meet the target requirement and correspond to at least one dimension sub-item under the screening dimension are saved as the data group to be screened in the next round, a corresponding screening path is generated and saved. Each round of screening analysis can be rolled back; after a rollback, the screening path generated and saved under the rolled-back round is deleted.
Optionally, the target requirement is that the value corresponding to the data in the data group to be screened is the maximum or the minimum among the dimension sub-items, and the absolute value of the difference between the maximum and minimum values is greater than a predetermined threshold; or that the value corresponding to the data under a dimension sub-item fluctuates relative to a reference value by more than a preset range.
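The two forms of the target requirement can be written as small predicates. This is a sketch under assumptions: the function names and the dictionary layout (sub-item name mapped to its value) are hypothetical, and "fluctuation" is taken here to mean the absolute deviation from the reference value.

```python
from typing import List, Mapping


def extreme_gap_met(values: Mapping[str, float], threshold: float) -> bool:
    """First form: |max - min| across the sub-items exceeds the threshold."""
    return abs(max(values.values()) - min(values.values())) > threshold


def fluctuating_subitems(values: Mapping[str, float],
                         reference: float,
                         allowed_range: float) -> List[str]:
    """Second form: sub-items whose value deviates from the reference
    value by more than the allowed range (assumed to be an absolute deviation)."""
    return [name for name, v in values.items()
            if abs(v - reference) > allowed_range]
```

Both comparisons are strict, matching the wording "greater than"; a gap exactly equal to the threshold does not satisfy the requirement.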
The number of rounds of the multi-round screening analysis is determined by the number of screening dimensions and the target requirement.
The screening analysis method and system provided by the invention screen the data to be processed step by step through multiple screening dimensions, forming a multi-round screening analysis. Each round uses the screening result of the previous round as its data group to be screened, so every round handles less data than the round before it. Compared with the prior art, which screens once under a combination of multiple screening conditions, the method is less likely to overload and crash the system because the data volume is too large. In addition, the target requirement in each round is set according to reference values of that round's data group under the round's screening sub-items, which improves the accuracy of the screening analysis.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed for the description are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the screening method for big data provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
Fig. 1 is a schematic flowchart of the screening method for big data provided by an embodiment of the present invention. The method includes the following steps:
S101: perform screening analysis on the big data in the data group to be screened according to a target screening dimension.
S102: save the data that meet the preset condition and correspond to at least one dimension sub-item under the target screening dimension as the data group to be screened in the next round.
S103: according to the number of preset screening dimensions and the target requirement, determine whether the number of completed screening rounds meets the preset screening quantity; if so, perform S104.
S104: end the screening process for the big data.
The number of rounds of multi-round screening in the screening analysis method is determined by the number of screening dimensions and the target requirement.
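Steps S101 to S104 can be sketched as a loop in which each round filters the output of the previous one. The record layout (a dict with one key per dimension plus a `"value"` field), the function name `multi_round_screen`, and the `select` callback that encodes the target requirement are all assumptions introduced for illustration.

```python
def multi_round_screen(records, dimensions, select):
    """Screen `records` one dimension per round.

    records:    list of dicts, e.g. {"region": "A", "os": "x", "value": 10}
    dimensions: preset screening dimensions, in the order they are applied
    select:     callback (totals, round_no) -> set of sub-items to keep;
                stands in for the target requirement of that round
    """
    current = list(records)
    path = []                                  # one entry per completed round
    for round_no, dim in enumerate(dimensions, start=1):
        # S101: aggregate the current data group under this dimension.
        totals = {}
        for rec in current:
            totals[rec[dim]] = totals.get(rec[dim], 0) + rec["value"]
        kept = select(totals, round_no)
        if not kept:                           # nothing meets the requirement
            break
        # S102: the kept sub-items form the next round's data group.
        current = [rec for rec in current if rec[dim] in kept]
        path.append((dim, sorted(kept)))
    # S103/S104: the loop ends once every preset dimension has had its round.
    return current, path


def keep_largest(totals, round_no):
    """Toy target requirement: keep only the sub-item with the largest total."""
    return {max(totals, key=totals.get)}


sample = [
    {"region": "A", "os": "x", "value": 10},
    {"region": "A", "os": "y", "value": 3},
    {"region": "B", "os": "x", "value": 1},
]
result, path = multi_round_screen(sample, ["region", "os"], keep_largest)
```

Because each round only sees the records that survived the previous round, the working set shrinks monotonically, which is the property the description relies on to limit system load.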
In the embodiments of the present invention, the attributes of the data are configured, and the attributes suitable for screening are set as screening attributes; these are the screening dimensions. The screening analysis method of the embodiment shown in Fig. 1 obtains the screening result by screening the data over multiple rounds through multiple screening dimensions. Each round uses the screening result of the previous round as its data group to be screened, so every round handles less data than the round before it. Compared with the prior art, which screens once under a combination of multiple screening conditions, the method is less likely to overload and crash the system because the data volume is too large. In addition, the target requirement in each round is set according to reference values of that round's data group under the round's screening sub-items, which improves the accuracy of the screening analysis.
When a round of screening analysis finds no data that meet the target requirement, and a different screening dimension is not selected to repeat the analysis, this indicates that the preceding screening path is wrong. In this case the method further includes step S204: roll back the erroneous screening analysis and delete the screening path generated and saved under the rolled-back analysis. During screening, if the dimension sub-item chosen in some round is found to be wrong and the screening path is therefore incorrect, that round is rolled back and its screening path is deleted, so that the data obtained in the round before it again become the data group to be screened for the next round. This avoids the trouble of starting over from the initial data and reselecting the screening dimension or sub-item.
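One way to picture the rollback is a stack that holds each round's data group alongside the saved screening path. The class `ScreeningSession` and its method names are hypothetical; the patent does not say how the path or the intermediate data groups are stored.

```python
class ScreeningSession:
    """Keeps every round's data group so that a round can be rolled back."""

    def __init__(self, initial_group):
        self._groups = [list(initial_group)]   # _groups[-1] is the current group
        self.path = []                          # saved screening path, one step per round

    @property
    def current_group(self):
        return self._groups[-1]

    def commit_round(self, dimension, kept_subitems, next_group):
        # Save the path step and the data group that feeds the next round.
        self.path.append((dimension, tuple(kept_subitems)))
        self._groups.append(list(next_group))

    def roll_back(self):
        # Delete the last path step and restore the previous round's data group.
        if not self.path:
            raise RuntimeError("no screening round to roll back")
        self._groups.pop()
        return self.path.pop()


session = ScreeningSession(["r1", "r2", "r3", "r4"])
session.commit_round("region", ["Guangdong"], ["r1", "r2"])
session.commit_round("operating system", ["Windows"], ["r1"])
undone = session.roll_back()
```

After the rollback, `session.current_group` is again the output of the region round, so screening can continue with a different dimension or sub-item without returning to the initial data.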
As a further refinement, the target requirement in the embodiments includes: the value corresponding to the data in the data group to be screened is the maximum, or is the minimum, and the absolute value of the difference between the maximum and minimum values is greater than a predetermined threshold; or the value corresponding to the data under a dimension sub-item fluctuates relative to a reference value by more than a preset range. The predetermined threshold, the reference value, and the preset range are determined from the historical data in a historical database. The embodiments can use the large body of historical result data that the system already holds as a reference for setting thresholds and ranges, and carry out screening analysis using the maximum, the minimum, and the predetermined threshold, or the reference value and the preset range, of the data group to be screened under the dimension sub-items. The screening result of each analysis is saved to the historical database to guide later screening analyses. Because the historical database is continually expanded and updated with increasingly accurate data, screening analysis is more accurate than the prior-art practice of choosing according to personal experience.
For example, suppose a business wants to examine the traffic consumed by users watching video on a service platform during a specific period in order to discover hidden information. Multiple screening dimensions are set first, such as region, operating system, and browser, each with its own dimension sub-items. For example, the region dimension includes provincial-level regions of China such as Beijing, Shanghai, Tianjin, and Guangdong; the operating system dimension includes Windows, Android, and IOS; and the browser dimension includes 360 Browser, Baidu Browser, and Google browser.
The first round of screening analysis proceeds as follows.
The data in the initial database, that is, the traffic consumed by users watching video, serve as the data group to be screened. A screening dimension, such as region, is selected at random, and screening is performed under it. The target requirement determining unit determines that the target requirement in this round is to find the maximum and minimum user traffic among the sub-items of the region dimension, with the difference between the maximum and the minimum greater than a predetermined threshold. The predetermined threshold determining unit sets the threshold to 1000T based on the historical database.
The screening analysis unit obtains the traffic consumed by users watching video in Beijing, Shanghai, Tianjin, and Guangdong: 568T in Beijing, 642T in Shanghai, 295T in Tianjin, and 1546T in Guangdong. The maximum is therefore Guangdong at 1546T and the minimum is Tianjin at 295T, and their difference, 1251T, exceeds the predetermined threshold of 1000T. The traffic under the dimension sub-items Guangdong and Tianjin thus meets the target requirement, so the to-be-screened data group generating unit saves the Guangdong and Tianjin traffic as the data group to be screened in the next round. As shown in step 203, once the next round's data group has been saved, the screening path processing unit generates and saves the corresponding screening path.
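The first-round arithmetic can be reproduced in a few lines of Python. The figures and the 1000T threshold come from the text; the variable names are illustrative only.

```python
# Traffic per region in the first round, in the unit "T" used by the text.
region_traffic = {"Beijing": 568, "Shanghai": 642, "Tianjin": 295, "Guangdong": 1546}
round1_threshold = 1000

top_region = max(region_traffic, key=region_traffic.get)      # sub-item with the maximum
bottom_region = min(region_traffic, key=region_traffic.get)   # sub-item with the minimum
round1_gap = region_traffic[top_region] - region_traffic[bottom_region]

# Both extremes are carried forward only when the gap exceeds the threshold.
round1_kept = [top_region, bottom_region] if round1_gap > round1_threshold else []
```

With these inputs the gap is 1251T, so Guangdong and Tianjin are kept and Beijing and Shanghai drop out of the next round's data group.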
The second round of screening analysis is then performed.
The data group to be screened is now the traffic consumed by users watching video in Tianjin and Guangdong. Operating system is selected as the screening dimension for this round. The target requirement determining unit determines that the target requirement in this round is to find the maximum user traffic among the sub-items of the operating system dimension while also computing the minimum, with the difference between the maximum and the minimum greater than a predetermined threshold. For this round, the predetermined threshold determining unit sets the threshold to 50T based on the historical database.
Steps 202 and 203 are repeated. The screening analysis unit finds that users in Guangdong consumed 658T, 423T, and 460T watching video on Windows, Android, and IOS respectively, and that users in Tianjin consumed 132T, 95T, and 60T respectively. For Guangdong the maximum is 658T and the minimum is 423T, a difference of 235T; for Tianjin the maximum is 132T and the minimum is 60T, a difference of 72T. Both differences exceed the predetermined threshold, so the traffic of Windows users in Guangdong and of Windows users in Tianjin meets the target requirement. The to-be-screened data group generating unit therefore saves the traffic consumed by Guangdong and Tianjin users watching video on Windows as the data group to be screened in the next round. As shown in step 203, once the next round's data group has been saved, the screening path processing unit generates and saves the corresponding screening path.
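The second round applies the same gap check within each region that survived the first round, but keeps only the sub-item with the maximum. The figures and the 50T threshold are taken from the text; the names and the nested-dict layout are assumptions.

```python
# Traffic per operating system inside each surviving region, in "T".
os_traffic = {
    "Guangdong": {"Windows": 658, "Android": 423, "IOS": 460},
    "Tianjin": {"Windows": 132, "Android": 95, "IOS": 60},
}
round2_threshold = 50

round2_kept = {}   # region -> (operating system with the maximum, max-min gap)
for region, by_os in os_traffic.items():
    top_os = max(by_os, key=by_os.get)
    gap = by_os[top_os] - min(by_os.values())
    if gap > round2_threshold:
        round2_kept[region] = (top_os, gap)
```

Windows carries the maximum in both regions, with gaps of 235T and 72T, so the Windows traffic of both regions moves on to the third round.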
The third round of screening analysis is then performed.
The screening dimension is browser, with the sub-items 360 Browser, Baidu Browser, and Google browser. The target requirement determining unit determines that the target requirement in this round is to find the maximum user traffic among the sub-items of the browser dimension while also computing the minimum, with the difference between the maximum and the minimum greater than a predetermined threshold. For this round, the predetermined threshold determining unit sets the threshold, based on the historical database, to three times the minimum value among the sub-items.
The screening analysis unit finds that Windows users in Guangdong consumed 75T, 31T, and 158T watching video in 360 Browser, Baidu Browser, and Google browser respectively, and that Windows users in Tianjin consumed 12T, 5T, and 23T respectively. For Guangdong Windows users the maximum is 158T and the minimum is 31T, a difference of 127T, which exceeds the predetermined threshold of 92T; for Tianjin Windows users the maximum is 23T and the minimum is 5T, a difference of 18T, which exceeds the predetermined threshold of 15T. In both regions the difference between the maximum and minimum traffic among the sub-items exceeds the predetermined threshold in this round, so the traffic consumed by Guangdong Windows users and by Tianjin Windows users watching video in Google browser meets the target requirement. The to-be-screened data group generating unit therefore saves the traffic consumed by Windows users in Guangdong and Tianjin watching video in Google browser as the data group to be screened in the next round. As shown in step 203, once the next round's data group has been saved, the screening path processing unit generates and saves the corresponding screening path.
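In the third round the threshold is relative: three times the smallest sub-item value in each group. The sketch below recomputes it from the figures in the text; the names are illustrative. Note that three times Guangdong's minimum of 31T is 93T, slightly different from the 92T quoted in the text; the gap of 127T exceeds either value, so the outcome is the same.

```python
# Traffic per browser for Windows users in each region, in "T".
browser_traffic = {
    "Guangdong": {"360": 75, "Baidu": 31, "Google": 158},
    "Tianjin": {"360": 12, "Baidu": 5, "Google": 23},
}

round3_thresholds = {}   # region -> 3 x minimum sub-item value
round3_kept = {}         # region -> browser carrying the maximum traffic
for region, by_browser in browser_traffic.items():
    lowest = min(by_browser.values())
    top_browser = max(by_browser, key=by_browser.get)
    threshold = 3 * lowest
    round3_thresholds[region] = threshold
    if by_browser[top_browser] - lowest > threshold:
        round3_kept[region] = top_browser
```

Google browser carries the maximum in both regions, and both gaps (127T and 18T) clear their relative thresholds, which matches the result described in the text.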
Once it is determined that screening analysis under all screening dimensions has been performed, the screening result is the data group to be screened obtained in the third round, namely the traffic consumed by Windows users in Guangdong and Tianjin watching video in Google browser. The screening result is stored in the historical database to update it. The screening path generated and saved by the screening path processing unit in the third round can serve as the entry point for a combined query the next time the traffic consumed by users watching video in the specific period is examined.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to be non-exclusive, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the invention.

Claims (5)

1. A screening method for big data, characterized in that the method includes:
Performing screening analysis on the big data in a data group to be screened according to a target screening dimension;
Saving the data that meet a preset condition and correspond to at least one dimension sub-item under the target screening dimension as the data group to be screened in the next round;
Determining, according to the number of preset screening dimensions and the target requirement, whether the number of completed screening rounds meets the preset screening quantity;
If so, ending the screening process for the big data.
2. The screening method for big data according to claim 1, characterized in that, before determining, according to the number of preset screening dimensions and the target requirement, whether the number of completed screening rounds meets the preset screening quantity, the method further includes:
Establishing a query result table and putting the screening result of each round into the query result table;
The determining, according to the number of preset screening dimensions and the target requirement, whether the number of completed screening rounds meets the preset screening quantity includes:
Determining, according to the number of preset screening dimensions, the target requirement, and the query result table, whether the number of completed screening rounds meets the preset screening quantity.
3. The screening method for big data according to claim 1, characterized in that the query result is indexed: an index is built on the screening conditions, and the page number stored in the index is used to find the corresponding record in the query result table.
4. The screening method for big data according to claim 1, characterized in that, after the data that meet the target requirement and correspond to at least one dimension sub-item under the screening dimension are saved as the data group to be screened in the next round, a corresponding screening path is generated and saved; each round of screening analysis can be rolled back, and after a rollback the screening path generated and saved under the rolled-back screening analysis is deleted.
5. The screening method for big data according to claim 1, characterized in that the target requirement is that the value corresponding to the data in the data group to be screened is the maximum or the minimum among the dimension sub-items, and the absolute value of the difference between the maximum and minimum values is greater than a predetermined threshold; or that the value corresponding to the data under a dimension sub-item fluctuates relative to a reference value by more than a preset range.
CN201710646449.9A 2017-08-01 2017-08-01 A kind of screening technique of big data Withdrawn CN107391724A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710646449.9A CN107391724A (en) 2017-08-01 2017-08-01 A kind of screening technique of big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710646449.9A CN107391724A (en) 2017-08-01 2017-08-01 A kind of screening technique of big data

Publications (1)

Publication Number Publication Date
CN107391724A true CN107391724A (en) 2017-11-24

Family

ID=60342962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710646449.9A Withdrawn CN107391724A (en) 2017-08-01 2017-08-01 A kind of screening technique of big data

Country Status (1)

Country Link
CN (1) CN107391724A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460097A (en) * 2018-02-01 2018-08-28 广东聚晨知识产权代理有限公司 A kind of intelligent screening system of big data
CN110993117A (en) * 2019-12-26 2020-04-10 北京亚信数据有限公司 Abnormal medical insurance identification method and device based on medical big data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150400A (en) * 2013-03-27 2013-06-12 领航动力信息系统有限公司 MapReduce-framework-based data screening method
CN104408084A (en) * 2014-11-06 2015-03-11 北京锐安科技有限公司 Method and device for screening big data
CN105893408A (en) * 2015-11-13 2016-08-24 乐视云计算有限公司 Screening analysis method and system for big data
CN106933904A (en) * 2015-12-31 2017-07-07 北京国双科技有限公司 The filter method and device of data

Similar Documents

Publication Publication Date Title
CN110941594B (en) Splitting method and device of video file, electronic equipment and storage medium
CN104717124B (en) A kind of friend recommendation method, apparatus and server
CN102104635B (en) Method and device for updating Internet protocol (IP) address base
CN105868679A (en) Fingerprint information dynamic update method and fingerprint identification device
CN111345011A (en) APP pushing method and device, electronic equipment and computer readable storage medium
CN109542289B (en) MES operation method, device, equipment and storage medium
CN107391724A (en) A kind of screening technique of big data
EP3349126A1 (en) Method, device, storage medium, and apparatus for automatically discovering fuel station poi
CN102262660B (en) Method and device implemented by computer and used for obtaining search result
CN111294819A (en) Network optimization method and device
CN102073684A (en) Method and device for excavating search log and page search method and device
CN105893408A (en) Screening analysis method and system for big data
CN106021556A (en) Address information processing method and device
CN105159884A (en) Method and device for establishing industry dictionary and industry identification method and device
CN108319672A (en) Mobile terminal malicious information filtering method and system based on cloud computing
CN106156111A (en) Patent document search method, device and system
CN104281688B (en) A kind of automatic cleaning method and device for browser
CN106126563B (en) Single-time-phase full-coverage retrieval method for remote sensing data based on spatial secondary filtering
CN110704773B (en) Abnormal behavior detection method and system based on frequent behavior sequence mode
CN102915313A (en) Error correction relation generation method and system in web search
CN106569734B (en) The restorative procedure and device that memory overflows when data are shuffled
CN106570058A (en) Searching method and search engine
CN114448775B (en) Equipment fault information processing method and device, electronic equipment and storage medium
CN108460097A (en) A kind of intelligent screening system of big data
CN105871650A (en) Data updating method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20171124