CN109947803A - A kind of data processing method, system and storage medium - Google Patents

A data processing method, system, and storage medium

Info

Publication number
CN109947803A
Authority
CN
China
Prior art keywords
characteristic information
apparatus characteristic
information
rule
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910186327.5A
Other languages
Chinese (zh)
Other versions
CN109947803B (en)
Inventor
贾思阳
韩孟龙
孟菲
王二飞
车文彬
闫柄任
刘克恒
高子惠
郭丽娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu panorama Intelligent Technology Co.,Ltd.
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201910186327.5A
Publication of CN109947803A
Application granted
Publication of CN109947803B
Active
Anticipated expiration


Landscapes

  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data processing method, system, and storage medium. The method includes: obtaining a device characteristic information set, where each element of the set is a piece of device characteristic information used to identify the device information of a terminal device; determining the number of occurrences of each piece of device characteristic information in the set; obtaining the top N pieces of device characteristic information in descending order of occurrence count; sending the top N pieces of device characteristic information and a processing request instruction to a data processing client interface; and receiving the processing result returned by the data processing client interface, the processing result being obtained by processing the top N pieces of device characteristic information as indicated by the processing request instruction. The data processing method provided by the embodiments of the invention achieves higher processing efficiency.

Description

A data processing method, system, and storage medium
Technical Field
The present invention relates to the technical field of data processing, and in particular to a data processing method, system, and storage medium.
Background
For purposes such as security protection and building user profiles, devices such as routers often need to identify the terminal devices that connect to them.
Current device identification methods mainly collect the host name (hostname) information of a terminal device in real time, match that host name information against a regular expression set obtained in advance, and identify the device from the matching result.
However, the regular expression set, and the other rule sets needed for device identification, are obtained by manually analyzing and processing massive amounts of data, which is inefficient.
Summary of the Invention
In view of the above problems, the present invention is proposed to provide a device identification method, system, and storage medium that overcome the above problems or at least partially solve them.
In a first aspect, an embodiment of the present invention provides a data processing method, comprising:
obtaining a device characteristic information set, where each element of the set is a piece of device characteristic information used to identify the device information of a terminal device;
determining the number of occurrences of each piece of device characteristic information in the set;
obtaining the top N pieces of device characteristic information in descending order of occurrence count;
sending the top N pieces of device characteristic information and a processing request instruction to a data processing client interface;
receiving the processing result returned by the data processing client interface, the processing result being a plurality of rules obtained by processing the top N pieces of device characteristic information as indicated by the processing request instruction.
In the course of implementing the present invention, the inventors found that a device characteristic information set used for extracting a regular expression set contains a large amount of repeated device characteristic information as well as a large amount of device characteristic information that occurs only once. The more often a piece of device characteristic information is repeated in the set, the more likely it is to occur again, whereas a piece that occurs only once is unlikely to recur. Repeated device characteristic information in the set can therefore be treated as a single piece of device characteristic information, the pieces can be sorted by occurrence count, and the top N pieces can be sent to the data processing client interface for rule generation. The method provided by the embodiments of the present invention thus uses an automated data processing procedure to screen out, from massive data, the portion of the data to be used for rule generation. Even if the screened data is still analyzed manually to generate rules, the workload is greatly reduced and processing efficiency improves. Screening data with this method also effectively preserves the coverage of rule matching in subsequent device identification.
With reference to the first aspect, in a first implementation of the first aspect of the embodiments of the present invention, before the top N pieces of device characteristic information are obtained in descending order of occurrence count, the method further comprises:
determining, from the functional relationship between the target statistic of the first i pieces of device characteristic information and the rank value i, the rank value i = N that corresponds to a predetermined inflection-point target statistic, where 1 ≤ i ≤ I and I is the total number of pieces of device characteristic information in the set. The target statistic is either the ratio of the number of elements in the set that belong to the first i pieces of device characteristic information to the total number of elements in the set, or the probability density of the first i pieces of device characteristic information in the set.
The method provided by the embodiments of the present invention uses the marginal effect to determine the value of N, so that repeated device characteristic information is covered as fully as possible while device characteristic information that occurs only once is excluded.
With reference to the first aspect or its first implementation, in a second implementation of the first aspect of the embodiments of the present invention, the processing request instruction is a characteristic information rule creation instruction, and the rule is a regular expression describing the matching relationship between device characteristic information and device information; the method further comprises:
adding the regular expression to the characteristic information rule set.
With reference to the first aspect or its first implementation, in a third implementation of the first aspect of the embodiments of the present invention, the processing request instruction is a mapping rule creation instruction, and the method further comprises:
obtaining the device information identified from the top N pieces of device characteristic information;
sending the device information to the data processing client interface, so that a mapping rule describing the mapping between the device information and standard device information is determined from the device information, the rule being the mapping rule.
With reference to the third implementation of the first aspect, in a fourth implementation of the first aspect of the embodiments of the present invention, before the device characteristic information set is obtained, the method further comprises:
obtaining device characteristic information, and identifying device information from it;
matching the device information against a pre-established mapping rule set;
if no standard device information is matched, adding the device characteristic information to the device characteristic information set.
With this method, device characteristic information is screened before the device characteristic information set is obtained: device characteristic information already covered by the existing mapping rule set is filtered out, which further reduces the data volume and improves processing efficiency.
With reference to the first aspect or its first implementation, in a fifth implementation of the first aspect of the embodiments of the present invention, the device characteristic information includes user agent information.
With reference to the first aspect or its first implementation, in a sixth implementation of the first aspect of the embodiments of the present invention, the device characteristic information includes host name information.
In a second aspect, an embodiment of the present invention provides a data processing system, comprising:
an information set obtaining unit, configured to obtain a device characteristic information set, where each element of the set is a piece of device characteristic information used to identify the device information of a terminal device;
a device characteristic information occurrence count statistics unit, configured to determine the number of occurrences of each piece of device characteristic information in the set;
an occurrence count sorting unit, configured to obtain the top N pieces of device characteristic information in descending order of occurrence count;
a request instruction sending unit, configured to send the top N pieces of device characteristic information and a processing request instruction to a data processing client interface;
a processing result receiving unit, configured to receive the processing result returned by the data processing client interface, the processing result being a plurality of rules obtained by processing the top N pieces of device characteristic information as indicated by the processing request instruction.
In the course of implementing the present invention, the inventors found that a device characteristic information set used for extracting a regular expression set contains a large amount of repeated device characteristic information as well as a large amount of device characteristic information that occurs only once. The more often a piece of device characteristic information is repeated in the set, the more likely it is to occur again, whereas a piece that occurs only once is unlikely to recur. Repeated device characteristic information in the set can therefore be treated as a single piece of device characteristic information, the pieces can be sorted by occurrence count, and the top N pieces can be sent to the data processing client interface for rule generation. The system provided by the embodiments of the present invention thus uses an automated data processing procedure to screen out, from massive data, the portion of the data to be used for rule generation. Even if the screened data is still analyzed manually to generate rules, the workload is greatly reduced and processing efficiency improves. Screening data with this system also effectively preserves the coverage of rule matching in subsequent device identification.
With reference to the second aspect, in a first implementation of the second aspect of the embodiments of the present invention, the system further comprises a threshold determination unit, configured to:
before the top N pieces of device characteristic information are obtained in descending order of occurrence count, determine, from the functional relationship between the target statistic of the first i pieces of device characteristic information and the rank value i, the rank value i = N that corresponds to a predetermined inflection-point target statistic, where 1 ≤ i ≤ I and I is the total number of pieces of device characteristic information in the set. The target statistic is either the ratio of the number of elements in the set that belong to the first i pieces of device characteristic information to the total number of elements in the set, or the probability density of the first i pieces of device characteristic information in the set.
The system provided by the embodiments of the present invention uses the marginal effect to determine the value of N, so that repeated device characteristic information is covered as fully as possible while device characteristic information that occurs only once is excluded.
With reference to the second aspect or its first implementation, in a second implementation of the second aspect of the embodiments of the present invention, the processing request instruction is a characteristic information rule creation instruction, and the processing result is a regular expression describing the matching relationship between device characteristic information and device information; the system further comprises an information adding unit, configured to:
add the regular expression to the characteristic information rule set.
With reference to the second aspect or its first implementation, in a third implementation of the second aspect of the embodiments of the present invention, the processing request instruction is a mapping rule creation instruction, and the system further comprises a device information sending unit, configured to:
obtain the device information identified from the top N pieces of device characteristic information;
send the device information to the data processing client interface, so that a mapping rule describing the mapping between the device information and standard device information is determined from the device information, the rule being the mapping rule.
With reference to the third implementation of the second aspect, in a fourth implementation of the second aspect of the embodiments of the present invention, the system further comprises a device characteristic information set updating unit, configured to:
obtain device characteristic information, and identify device information from it;
match the device information against a pre-established mapping rule set;
if no standard device information is matched, add the device characteristic information to the device characteristic information set.
With this system, device characteristic information is screened before the device characteristic information set is obtained: device characteristic information already covered by the existing mapping rule set is filtered out, which further reduces the data volume and improves processing efficiency.
With reference to the second aspect or its first implementation, in a fifth implementation of the second aspect of the embodiments of the present invention, the device characteristic information includes user agent information.
With reference to the second aspect or its first implementation, in a sixth implementation of the second aspect of the embodiments of the present invention, the device characteristic information includes host name information.
In a third aspect, an embodiment of the present invention provides a computer system, comprising:
one or more processors;
a memory;
one or more application programs, wherein the one or more application programs are stored in the memory and configured to be executed by the one or more processors to implement the method of any implementation of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing the instructions of the application programs used by the computer system of the third aspect.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a system architecture diagram according to an embodiment of the invention;
Fig. 2 shows a flowchart of a data processing method according to an embodiment of the invention;
Fig. 3a shows a hostname marginal-effect curve generated according to an embodiment of the invention;
Fig. 3b shows a hostname marginal-effect curve generated according to another embodiment of the invention;
Fig. 4a shows a UA marginal-effect curve generated according to an embodiment of the invention;
Fig. 4b shows a UA marginal-effect curve generated according to another embodiment of the invention;
Fig. 5 shows a block diagram of a data processing system according to an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
The data processing method provided by the embodiments of the present invention, and the subsequent device identification method, can be applied to, but are not limited to, the system shown in Fig. 1. In this system, during data processing, the cloud server 103 screens device characteristic information and sends the screened device characteristic information to the data processing client 105; the data processing client 105 generates corresponding rules from the screened information, and the cloud server 103 receives the rules returned by the data processing client 105. During device identification, the router 101 collects and reports information about the connected terminal devices 102; the cloud server 103 identifies each terminal device 102 from the information reported by the router 101 to obtain its device information, and the device information of the terminal devices 102 connected to the router 101 is output through a terminal device 104 on which a target application is installed.
The router 101 in Fig. 1 may also be replaced by another IoT (Internet of Things) device or by a smart mobile terminal (such as a smartphone or tablet computer).
The connected terminal device 102 is a terminal device that connects to the router 101 in order to reach a local area network or the Internet, such as a smart mobile terminal (smartphone, tablet computer), a smart home appliance, smart office equipment, or a wearable smart device.
The target application is an application that can communicate with the router 101 and control it.
The data processing client is a computer device equipped with a display screen.
It should be noted that in other application scenarios or implementations, the functions of the cloud server may instead be realized by a standalone server on the Internet or by a device in the local area network; the embodiments of the present invention place no limitation on this.
The method provided by the embodiments of the present invention is described in detail below with reference to Fig. 2.
As shown in Fig. 2, the data processing method provided by the embodiments of the present invention includes the following operations:
Step 201: obtain a device characteristic information set, where each element of the set is a piece of device characteristic information used to identify the device information of a terminal device.
The embodiments of the present invention do not restrict the data source or the storage format of the device characteristic information set.
As an example and not a limitation, device characteristic information may be crawled from the Internet with a crawler tool and added to the set; it may be obtained through the interface of a third-party data platform and added to the set; or device characteristic information that meets a set condition during device identification may be added to the set.
The set condition can be determined according to the needs of the actual scenario. As an example and not a limitation, the set condition may be: the recognized device information does not include target information (such as the device model), in which case the corresponding device characteristic information meets the set condition; and/or device characteristic information that the rule set used for device identification cannot match successfully meets the set condition.
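The example set condition above can be sketched as a small predicate. This is a minimal illustration only: the function name `meets_set_condition`, the dictionary layout of the device information, and the `model` field name are assumptions made for the sketch, and the "and/or" in the text is read here as a logical OR.

```python
from typing import Dict, Optional


def meets_set_condition(
    device_info: Optional[Dict[str, str]],
    rule_matched: bool,
    target_field: str = "model",  # hypothetical name for the "target information"
) -> bool:
    """Return True when the feature record should be added to the set."""
    # Condition 1: the recognized device information lacks the target information.
    missing_target = not device_info or not device_info.get(target_field)
    # Condition 2: the rule set used for identification failed to match.
    return missing_target or not rule_matched
```

A record identified with both brand and model by a matching rule is skipped; anything else is kept for later analysis.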
As an example and not a limitation, the device characteristic information set may be stored in a database in the form of a data table, among other options.
Step 202: determine the number of occurrences of each piece of device characteristic information in the set.
The set may contain repeated device characteristic information; the number of times a piece is repeated is its occurrence count in the set.
The embodiments of the present invention do not restrict how repeated device characteristic information is defined; it can be defined as needed in practice. For example, pieces of device characteristic information with identical content are duplicates; as another example, pieces whose target field has the same value are duplicates.
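The two example definitions of "duplicate" can be expressed as key functions: two records are duplicates when their keys are equal. The sketch below is only illustrative; the record layout (a dictionary of fields) and the field names `hostname` and `mac` are assumptions, not part of the source.

```python
from typing import Callable, Dict

# Hypothetical record layout: each element of the set is a dict of fields.
Record = Dict[str, str]


def key_by_content(record: Record) -> tuple:
    """Records are duplicates when all of their fields are identical."""
    return tuple(sorted(record.items()))


def key_by_field(field: str) -> Callable[[Record], str]:
    """Records are duplicates when the chosen target field has the same value."""
    return lambda record: record.get(field, "")


a = {"hostname": "iPhone-7", "mac": "aa:bb"}
b = {"hostname": "iPhone-7", "mac": "cc:dd"}
same_content = key_by_content(a) == key_by_content(b)
same_hostname = key_by_field("hostname")(a) == key_by_field("hostname")(b)
```

Under the first definition the two sample records differ; under the second, with `hostname` as the target field, they count as the same piece of device characteristic information.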
Step 202 can be implemented in many ways. As an example and not a limitation, in one implementation, target elements are determined one after another in a predetermined order; a counter is created for each target element, the elements after the target element are traversed, and whenever an element repeats the target element, that element is deleted and the target element's counter is incremented. In another implementation, the elements of the set are compared pairwise in a predetermined order and grouped according to the comparison results, with repeated elements placed in the same group; the occurrence count of each piece of device characteristic information is then given by the number of elements in its group.
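A minimal sketch of the counting step, assuming the elements are plain strings. `count_by_traversal` follows the first implementation described above. `count_by_grouping` reaches the same grouping result with a hash-based `collections.Counter` rather than literal pairwise comparison, which is a substitution made here for brevity. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_by_traversal(elements: List[str]) -> Dict[str, int]:
    """Pick each target in order, scan the later elements, delete repeats
    and increment the target's counter."""
    remaining = list(elements)
    counts: Dict[str, int] = {}
    while remaining:
        target = remaining.pop(0)
        counts[target] = 1
        kept = []
        for element in remaining:
            if element == target:
                counts[target] += 1  # a repeat: count it and drop it
            else:
                kept.append(element)
        remaining = kept
    return counts


def count_by_grouping(elements: List[str]) -> Dict[str, int]:
    """Group equal elements and report the size of each group."""
    return dict(Counter(elements))
```

The traversal version is quadratic in the worst case; for sets on the scale of millions of records, the hash-based version is likely the more practical choice.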
Step 203: obtain the top N pieces of device characteristic information in descending order of occurrence count.
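Step 203 amounts to a sort by occurrence count followed by truncation. A minimal sketch, assuming string elements and a hypothetical helper name `top_n`:

```python
from collections import Counter
from typing import List, Tuple


def top_n(elements: List[str], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent distinct elements with their counts,
    in descending order of occurrence count."""
    return Counter(elements).most_common(n)


ranked = top_n(["x", "y", "x", "z", "y", "x"], 2)
```

`Counter.most_common` keeps elements with equal counts in the order they were first seen, so the ranking is deterministic for a given input order.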
Step 204: send the top N pieces of device characteristic information and a processing request instruction to the data processing client interface.
The data processing client interface may, without limitation, display the top N pieces of device characteristic information through a human-computer interaction interface for a user to analyze, and obtain the user's control instructions through that interface in order to generate rules.
Step 205: receive the processing result returned by the data processing client interface, the processing result being a plurality of rules obtained by processing the top N pieces of device characteristic information as indicated by the processing request instruction.
In the course of implementing the present invention, the inventors found that a device characteristic information set used for extracting a regular expression set contains a large amount of repeated device characteristic information as well as a large amount of device characteristic information that occurs only once. The more often a piece of device characteristic information is repeated in the set, the more likely it is to occur again, whereas a piece that occurs only once is unlikely to recur. Repeated device characteristic information in the set can therefore be treated as a single piece of device characteristic information, the pieces can be sorted by occurrence count, and the top N pieces can be sent to the data processing client interface for rule generation. The method provided by the embodiments of the present invention thus uses an automated data processing procedure to screen out, from massive data, the portion of the data to be used for rule generation. Even if the screened data is still analyzed manually to generate rules, the workload is greatly reduced and processing efficiency improves. Screening data with this method also effectively preserves the coverage of rule matching in subsequent device identification.
In the embodiments of the present invention, N may be a predetermined fixed value or a dynamic value adjusted in a predetermined manner.
If N is dynamically adjusted, the marginal effect of a statistic is preferably used to determine its value. Specifically: from the functional relationship between the target statistic of the first i pieces of device characteristic information and the rank value i, the rank value i = N corresponding to the inflection-point target statistic is obtained, where 1 ≤ i ≤ I and I is the total number of pieces of device characteristic information in the set. The target statistic is either the ratio of the number of elements in the set that belong to the first i pieces of device characteristic information to the total number of elements in the set, or the probability density of the first i pieces of device characteristic information in the set.
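The ratio form of the target statistic can be sketched as a cumulative coverage curve over the ranked distinct values. In this illustration N is taken as the smallest rank whose coverage reaches a predetermined inflection-point value. The function names and the rule "first rank at or above the threshold" are assumptions for the sketch, since the source leaves the way the inflection point is chosen open.

```python
from collections import Counter
from itertools import accumulate
from typing import List


def coverage_curve(elements: List[str]) -> List[float]:
    """coverage[i-1] is the share of all elements covered by the i most
    frequent distinct values (the ratio form of the target statistic)."""
    counts = sorted(Counter(elements).values(), reverse=True)
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


def choose_n(elements: List[str], inflection_value: float) -> int:
    """Smallest rank i whose coverage reaches the given inflection-point value."""
    curve = coverage_curve(elements)
    for i, covered in enumerate(curve, start=1):
        if covered >= inflection_value:
            return i
    return len(curve)
```

For a set with six copies of one value, three of a second, and one of a third, the curve is roughly 0.6, 0.9, 1.0, so an inflection-point value of 0.85 yields N = 2 and leaves out the value that occurs only once.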
The inflection-point target statistic may be a predetermined value or may be determined in other ways. As an example and not a limitation, a coordinate system is established with the target statistic and the rank value as axes, the curve of the target statistic of the first i pieces of device characteristic information against the rank value i (the marginal-effect curve) is drawn in it, and the coordinate system and the curve are shown through a human-computer interaction interface. In one implementation, a display control that can move along the curve is generated and shown; the control displays the coordinates of its current position and reports detected control events (such as clicks and double-clicks). After a target control event and its coordinate point reported by the display control are received, the inflection-point target statistic and the value of N are determined from that coordinate point. In another implementation, after a target control event is detected, the cursor position corresponding to the event is obtained, the coordinate point corresponding to the cursor position is determined from a predetermined mapping, and the inflection-point target statistic and the value of N are determined from that coordinate point.
The method provided by the embodiments of the present invention uses the marginal effect to determine the value of N, so that repeated device characteristic information is covered as fully as possible while device characteristic information that occurs only once is excluded.
The method provided by the embodiments of the present invention can be applied in a variety of scenarios.
For example, the method can be used to update the characteristic information rule set. In that case, the processing request instruction is a characteristic information rule creation instruction, and the rule is a regular expression describing the matching relationship between device characteristic information and device information. After step 205, the regular expression is also added to the characteristic information rule set.
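A minimal sketch of a characteristic information rule set, assuming each rule pairs one regular expression with the device information it implies. The data layout, the helper names `add_rule` and `identify`, and the pattern shown in the usage note are hypothetical and not taken from the source.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rule layout: (compiled pattern, device information it maps to).
RuleSet = List[Tuple[re.Pattern, Dict[str, str]]]


def add_rule(rule_set: RuleSet, pattern: str, device_info: Dict[str, str]) -> None:
    """Append a returned regular expression rule to the rule set."""
    rule_set.append((re.compile(pattern, re.IGNORECASE), device_info))


def identify(rule_set: RuleSet, feature: str) -> Optional[Dict[str, str]]:
    """Return the device information of the first matching rule, or None."""
    for pattern, device_info in rule_set:
        if pattern.search(feature):
            return device_info
    return None
```

After `add_rule(rules, r"^iphone", {"brand": "Apple"})`, a hostname such as `iPhone-de-Ana` resolves to the Apple brand, while an unmatched hostname returns `None` and would remain a candidate for the next screening round.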
The device characteristic information may be UA (User Agent) information.
The device characteristic information may also be hostname (host name) information. Taking hostname as an example, a large amount of hostname information in the database cannot be resolved to a brand and model by the parser, so this hostname information has to be analyzed and the corresponding regular expressions added. The hostname data to be sorted out for rules is on the order of 20 million entries. The occurrence count of each hostname is computed and the hostnames are arranged in descending order, the marginal-effect curves shown in Fig. 3a or Fig. 3b are drawn in the manner described above, and the hostname information is screened according to those curves; only about 2,000 pieces of hostname information then need to be sorted out manually.
As another example, the method provided in the embodiments of the present invention can be used to update a mapping rule set. The mapping rule set is used to apply a standardizing mapping to the device information obtained by identification, yielding standard device information. Correspondingly, the processing request instruction is a mapping rule creation instruction, and the method further includes: obtaining the device information identified from the top N pieces of device feature information; and sending the device information to the data processing client interface, so that a mapping rule describing the relationship between the device information and standard device information is determined according to the device information, the rule being the mapping rule.
Further, before the device feature information set is obtained, device feature information is obtained and device information is identified from it; the device information is matched against a pre-established mapping rule set; and if no standard device information is matched, the device feature information is added to the device set (that is, the device feature information set).
Taking UA information as an example, a device information triple (MAC address, brand, model) is obtained by a UA parser. The brand and model in the triple serve as a composite key that is associated with the brand and model in the association library (i.e., the mapping rule set); results that cannot be associated enter an automatic evaluation mechanism (that is, they are processed with the data screening method described above). In a specific example, 7 million UA records to be sorted out are deduplicated and the occurrence count of each is tallied; the records are then arranged in descending order of count and the marginal effect curves shown in Fig. 4a and Fig. 4b are drawn. In Fig. 4a, the axes represent the number of distinct UAs and their share of the total UA volume. As the number of distinct UAs increases, the share of the total UA volume grows rapidly; once the UA total reaches a certain magnitude, the share levels off and approaches 1. Fig. 4b further shows that the probability density covered by the first 1,000 UAs is almost 100%. This indicates that sorting out the first 1,000 distinct UAs and extracting rule information from them covers almost all of the UA data; with this method, maintenance work on UA data at the scale of millions is reduced to about 1,000 entries.
In the method provided in the embodiments of the present invention, the device feature information is screened before the device feature information set is obtained: device feature information already covered by the existing mapping rule set is filtered out, which further reduces the data volume and improves processing efficiency.
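By way of illustration only, the pre-screening step described above may be sketched as follows. The sketch assumes that the mapping rule set is a dictionary keyed by a normalized (brand, model) pair and that the parser is supplied as a callable; treating a parser failure as "not covered" is also an assumption. The names `normalize` and `collect_unmapped` are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

DeviceInfo = Tuple[str, str]  # (brand, model), the composite key


def normalize(mapping_rules: Dict[DeviceInfo, DeviceInfo],
              brand: str, model: str) -> Optional[DeviceInfo]:
    """Look up standard device information for an identified (brand, model)."""
    key = (brand.strip().lower(), model.strip().lower())
    return mapping_rules.get(key)


def collect_unmapped(features: Iterable[str],
                     parse: Callable[[str], Optional[DeviceInfo]],
                     mapping_rules: Dict[DeviceInfo, DeviceInfo]) -> List[str]:
    """Keep only feature strings whose device information has no standard mapping.

    Duplicates are kept on purpose so that occurrence counts can be
    tallied afterwards.
    """
    pending = []
    for feature in features:
        info = parse(feature)
        # Assumption: a feature the parser cannot identify is also kept.
        if info is None or normalize(mapping_rules, *info) is None:
            pending.append(feature)
    return pending
```

The returned list plays the role of the device feature information set that is subsequently counted and ranked.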
An embodiment of the present invention provides a data processing system which, as shown in Fig. 5, comprises:
an information set acquiring unit 501, configured to obtain a device feature information set, where each element of the device feature information set is a piece of device feature information, and the device feature information is used to identify device information of a terminal device;
a device feature information occurrence counting unit 502, configured to determine the occurrence count of each piece of device feature information in the device feature information set;
an occurrence count sorting unit 503, configured to obtain the top N pieces of device feature information in descending order of occurrence count;
a request instruction sending unit 504, configured to send the top N pieces of device feature information and a processing request instruction to a data processing client interface; and
a processing result receiving unit 505, configured to receive the processing result returned by the data processing client interface, where the processing result is a plurality of rules obtained by processing the top N pieces of device feature information as indicated by the processing request instruction.
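By way of illustration only, the cooperation of units 501 to 505 may be sketched as a single function. The data processing client interface is modeled as a plain callable, which is an assumption; the disclosure does not specify its transport or signature. The names `ClientInterface` and `run_pipeline` are hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical stand-in for the data processing client interface: it receives
# the top-N items plus an instruction and returns a list of rules.
ClientInterface = Callable[[List[str], str], List[str]]


def run_pipeline(features: Iterable[str], n: int, instruction: str,
                 client: ClientInterface) -> List[str]:
    feature_set = list(features)                    # unit 501: obtain the set
    counts = Counter(feature_set)                   # unit 502: occurrence counts
    top_n = [f for f, _ in counts.most_common(n)]   # unit 503: top N, descending
    return client(top_n, instruction)               # units 504 and 505: send, receive
```

In practice the callable could wrap a review queue for manual rule authoring or an automated rule generator; either way only N items cross the interface.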
In the course of implementing the present invention, the inventors found that a device feature information set used for extracting feature rules contains a large amount of repeated device feature information as well as a large amount of device feature information that occurs only once. The more often a piece of device feature information is repeated in the set, the greater the probability that it will occur again, whereas device feature information that occurs only once is less likely to recur. Therefore, repeated device feature information in the set can be treated as a single piece of device feature information, the device feature information can then be sorted by occurrence count, and the top N pieces can be sent to the data processing client interface for rule generation. It can be seen that the system provided in the embodiments of the present invention uses an automated data processing procedure to screen out, from a massive amount of data, the portion used to generate rules; even if the screened data is analyzed manually to generate rules, the workload is greatly reduced and processing efficiency is improved. Screening data with the system provided in the embodiments of the present invention also effectively guarantees the coverage of rule matching in subsequent device identification.
Optionally, the system further comprises a threshold determining unit, configured to:
before the top N pieces of device feature information are obtained in descending order of occurrence count, obtain, from the functional relationship between the target statistic of the first i pieces of device feature information and the rank value i, the rank value i = N corresponding to a predetermined inflection-point target statistic, where 1 ≤ i ≤ I and I is the total number of pieces of device feature information in the device feature information set; the target statistic is either the ratio of the number of elements in the set accounted for by the first i pieces of device feature information to the total number of elements in the set, or the probability density of the first i pieces of device feature information in the set.
The method provided in the embodiments of the present invention uses the marginal effect to determine the value of N, so that repeated device feature information is covered as far as possible and device feature information that occurs only once is excluded.
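By way of illustration only, determining N from a predetermined inflection-point target statistic may be sketched as follows. The sketch covers only the ratio form of the target statistic and assumes that N is taken as the smallest rank value whose statistic reaches the predetermined value; the function name `choose_n` is hypothetical.

```python
from bisect import bisect_left
from typing import List


def choose_n(curve: List[float], inflection_value: float) -> int:
    """Return the smallest N such that curve[N - 1] >= inflection_value.

    curve[i - 1] is the share of all elements accounted for by the first i
    distinct pieces of device feature information, so it is non-decreasing
    and binary search applies. If the value is never reached, I is returned.
    """
    index = bisect_left(curve, inflection_value)
    return min(index + 1, len(curve))
```

With the curve `[0.6, 0.9, 1.0]`, a predetermined value of 0.9 gives N = 2, and a value of 0.95 gives N = 3.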
Optionally, the processing request instruction is a feature rule creation instruction, and the processing result is a feature rule describing the matching relationship between device feature information and device information; the system further comprises an information adding unit, configured to:
add the feature rule to a feature rule set.
Optionally, the processing request instruction is a mapping rule creation instruction, and the system further comprises a device information sending unit, configured to:
obtain the device information identified from the top N pieces of device feature information; and
send the device information to the data processing client interface, so that a mapping rule describing the relationship between the device information and standard device information is determined according to the device information, the rule being the mapping rule.
Optionally, the system further comprises a device feature information set updating unit, configured to:
obtain device feature information, and identify device information from the device feature information;
match the device information against a pre-established mapping rule set; and
if no standard device information is matched, add the device feature information to the device feature information set.
In the system provided in the embodiments of the present invention, the device feature information is screened before the device feature information set is obtained: device feature information already covered by the existing mapping rule set is filtered out, which further reduces the data volume and improves processing efficiency.
Optionally, the device feature information includes user agent information.
With reference to the second aspect or the first implementation of the second aspect, in a sixth implementation of the second aspect of the embodiments of the present invention, the device feature information includes host name information.
An embodiment of the present invention provides a computer system, comprising:
one or more processors;
a memory; and
one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors to implement the method described in any of the above implementations.
An embodiment of the present invention provides a computer-readable storage medium for storing the instructions of the application programs used by the above computer system.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the above description of a specific language is provided to disclose the best mode of the invention.
Numerous specific details are set forth in the description provided herein. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, in the above description of exemplary embodiments of the invention the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components of an embodiment may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all of the features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all of the processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the system according to the embodiments of the invention. The invention may also be implemented as a device or device program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
The invention discloses:
A1. A data processing method, comprising:
obtaining a device feature information set, where each element of the device feature information set is a piece of device feature information, and the device feature information is used to identify device information of a terminal device;
determining the occurrence count of each piece of device feature information in the device feature information set;
obtaining the top N pieces of device feature information in descending order of occurrence count;
sending the top N pieces of device feature information and a processing request instruction to a data processing client interface; and
receiving the processing result returned by the data processing client interface, where the processing result is a plurality of rules obtained by processing the top N pieces of device feature information as indicated by the processing request instruction.
A2. The method according to A1, where before the top N pieces of device feature information are obtained in descending order of occurrence count, the method further comprises:
obtaining, from the functional relationship between the target statistic of the first i pieces of device feature information and the rank value i, the rank value i = N corresponding to a predetermined inflection-point target statistic, where 1 ≤ i ≤ I and I is the total number of pieces of device feature information in the device feature information set; the target statistic is either the ratio of the number of elements in the set accounted for by the first i pieces of device feature information to the total number of elements in the set, or the probability density of the first i pieces of device feature information in the set.
A3. The method according to A1 or A2, where the processing request instruction is a feature rule creation instruction, and the rule is a feature rule describing the matching relationship between device feature information and device information; the method further comprises:
adding the feature rule to a feature rule set.
A4. The method according to A1 or A2, where the processing request instruction is a mapping rule creation instruction, and the method further comprises:
obtaining the device information identified from the top N pieces of device feature information; and
sending the device information to the data processing client interface, so that a mapping rule describing the relationship between the device information and standard device information is determined according to the device information, the rule being the mapping rule.
A5. The method according to A4, where before the device feature information set is obtained, the method further comprises:
obtaining device feature information, and identifying device information from the device feature information;
matching the device information against a pre-established mapping rule set; and
if no standard device information is matched, adding the device feature information to the device set.
A6. The method according to A1 or A2, where the device feature information includes user agent information.
A7. The method according to A1 or A2, where the device feature information includes host name information.
B8. A data processing system, comprising:
an information set acquiring unit, configured to obtain a device feature information set, where each element of the device feature information set is a piece of device feature information, and the device feature information is used to identify device information of a terminal device;
a device feature information occurrence counting unit, configured to determine the occurrence count of each piece of device feature information in the device feature information set;
an occurrence count sorting unit, configured to obtain the top N pieces of device feature information in descending order of occurrence count;
a request instruction sending unit, configured to send the top N pieces of device feature information and a processing request instruction to a data processing client interface; and
a processing result receiving unit, configured to receive the processing result returned by the data processing client interface, where the processing result is a plurality of rules obtained by processing the top N pieces of device feature information as indicated by the processing request instruction.
B9. The system according to B8, where the system further comprises a threshold determining unit, configured to:
before the top N pieces of device feature information are obtained in descending order of occurrence count, obtain, from the functional relationship between the target statistic of the first i pieces of device feature information and the rank value i, the rank value i = N corresponding to a predetermined inflection-point target statistic, where 1 ≤ i ≤ I and I is the total number of pieces of device feature information in the device feature information set; the target statistic is either the ratio of the number of elements in the set accounted for by the first i pieces of device feature information to the total number of elements in the set, or the probability density of the first i pieces of device feature information in the set.
B11. The system according to B8 or B9, where the processing request instruction is a feature rule creation instruction, and the rule is a feature rule describing the matching relationship between device feature information and device information; the system further comprises an information adding unit, configured to:
add the feature rule to a feature rule set.
B12. The system according to B8 or B9, where the processing request instruction is a mapping rule creation instruction, and the system further comprises a device information sending unit, configured to:
obtain the device information identified from the top N pieces of device feature information; and
send the device information to the data processing client interface, so that a mapping rule describing the relationship between the device information and standard device information is determined according to the device information, the rule being the mapping rule.
B13. The system according to B12, where the system further comprises a device feature information set updating unit, configured to:
obtain device feature information, and identify device information from the device feature information;
match the device information against a pre-established mapping rule set; and
if no standard device information is matched, add the device feature information to the device feature information set.
B14. The system according to B8 or B9, where the device feature information includes user agent information.
B15. The system according to B8 or B9, where the device feature information includes host name information.
C16. A computer system, comprising:
one or more processors;
a memory; and
one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors to implement the method according to any one of A1 to A7.
D17. A computer-readable storage medium for storing the instructions of the application programs used by the computer system according to C16.

Claims (10)

1. A data processing method, characterized by comprising:
obtaining a device feature information set, where each element of the device feature information set is a piece of device feature information, and the device feature information is used to identify device information of a terminal device;
determining the occurrence count of each piece of device feature information in the device feature information set;
obtaining the top N pieces of device feature information in descending order of occurrence count;
sending the top N pieces of device feature information and a processing request instruction to a data processing client interface; and
receiving the processing result returned by the data processing client interface, where the processing result is a plurality of rules obtained by processing the top N pieces of device feature information as indicated by the processing request instruction.
2. The method according to claim 1, characterized in that before the top N pieces of device feature information are obtained in descending order of occurrence count, the method further comprises:
obtaining, from the functional relationship between the target statistic of the first i pieces of device feature information and the rank value i, the rank value i = N corresponding to a predetermined inflection-point target statistic, where 1 ≤ i ≤ I and I is the total number of pieces of device feature information in the device feature information set; the target statistic is either the ratio of the number of elements in the set accounted for by the first i pieces of device feature information to the total number of elements in the set, or the probability density of the first i pieces of device feature information in the set.
3. The method according to claim 1 or 2, characterized in that the processing request instruction is a feature rule creation instruction, and the rule is a feature rule describing the matching relationship between device feature information and device information; the method further comprises:
adding the feature rule to a feature rule set.
4. The method according to claim 1 or 2, characterized in that the processing request instruction is a mapping rule creation instruction, and the method further comprises:
obtaining the device information identified from the top N pieces of device feature information; and
sending the device information to the data processing client interface, so that a mapping rule describing the relationship between the device information and standard device information is determined according to the device information, the rule being the mapping rule.
5. The method according to claim 4, characterized in that before the device feature information set is obtained, the method further comprises:
obtaining device feature information, and identifying device information from the device feature information;
matching the device information against a pre-established mapping rule set; and
if no standard device information is matched, adding the device feature information to the device set.
6. The method according to claim 1 or 2, characterized in that the device feature information includes user agent information.
7. The method according to claim 1 or 2, characterized in that the device feature information includes host name information.
8. A data processing system, characterized by comprising:
an information set acquiring unit, configured to obtain a device feature information set, where each element of the device feature information set is a piece of device feature information, and the device feature information is used to identify device information of a terminal device;
a device feature information occurrence counting unit, configured to determine the occurrence count of each piece of device feature information in the device feature information set;
an occurrence count sorting unit, configured to obtain the top N pieces of device feature information in descending order of occurrence count;
a request instruction sending unit, configured to send the top N pieces of device feature information and a processing request instruction to a data processing client interface; and
a processing result receiving unit, configured to receive the processing result returned by the data processing client interface, where the processing result is a plurality of rules obtained by processing the top N pieces of device feature information as indicated by the processing request instruction.
9. A computer system, characterized by comprising:
one or more processors;
a memory; and
one or more application programs, where the one or more application programs are stored in the memory and configured to be executed by the one or more processors to implement the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it is used to store the instructions of the application programs used by the computer system according to claim 9.
CN201910186327.5A 2019-03-12 2019-03-12 Data processing method, system and storage medium Active CN109947803B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910186327.5A CN109947803B (en) 2019-03-12 2019-03-12 Data processing method, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910186327.5A CN109947803B (en) 2019-03-12 2019-03-12 Data processing method, system and storage medium

Publications (2)

Publication Number Publication Date
CN109947803A true CN109947803A (en) 2019-06-28
CN109947803B CN109947803B (en) 2021-11-19

Family

ID=67009687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910186327.5A Active CN109947803B (en) 2019-03-12 2019-03-12 Data processing method, system and storage medium

Country Status (1)

Country Link
CN (1) CN109947803B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110345056A (en) * 2019-07-12 2019-10-18 四川虹美智能科技有限公司 SCM Based data processing method, driver, controller and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101145157A (en) * 2007-06-14 2008-03-19 中兴通讯股份有限公司 XML format embedded type apparatus characteristic information analysis method
CN105162888A (en) * 2015-09-30 2015-12-16 北京奇虎科技有限公司 Remote tracking method for intelligent wearable device, terminal and server
CN106407768A (en) * 2015-07-29 2017-02-15 阿里巴巴集团控股有限公司 Methods and devices for determining device fingerprint and identifying target device
CN106603510A (en) * 2016-11-28 2017-04-26 深圳市金立通信设备有限公司 Data processing method and terminal
US20180004815A1 (en) * 2015-12-01 2018-01-04 Huawei Technologies Co., Ltd. Stop word identification method and apparatus
US20180157712A1 (en) * 2015-05-06 2018-06-07 Örjan Vestgöte Technology AB Method, system and computer program product for performing numeric searches
CN108959585A (en) * 2018-07-10 2018-12-07 维沃移动通信有限公司 A kind of expression picture acquisition methods and terminal device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陆泽橼等: "基于雷达脉冲压缩信号的辐射源个体识别技术", 《电脑知识与技术》 *

Also Published As

Publication number Publication date
CN109947803B (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN108776934B (en) Distributed data calculation method and device, computer equipment and readable storage medium
US9973521B2 (en) System and method for field extraction of data contained within a log stream
US11915104B2 (en) Normalizing text attributes for machine learning models
CN106649831B (en) Data filtering method and device
CN109951354A (en) A kind of terminal device recognition methods, system and storage medium
CN104036004B (en) Search for error correction method and search error correction device
CN114861910B (en) Compression method, device, equipment and medium of neural network model
CN112463859B (en) User data processing method and server based on big data and business analysis
CN113051308A (en) Alarm information processing method, equipment, storage medium and device
CN113536770B (en) Text analysis method, device and equipment based on artificial intelligence and storage medium
CN104933096B (en) Abnormal key recognition method, device and data system for a database
CN109947803A (en) A kind of data processing method, system and storage medium
CN107330031B (en) Data storage method and device and electronic equipment
CN111368128B (en) Target picture identification method, device and computer readable storage medium
WO2019024238A1 (en) Range value data statistical method and system, electronic device, and computer readable storage medium
CN109040089B (en) Network policy auditing method, equipment and computer readable storage medium
US11663184B2 (en) Information processing method of grouping data, information processing system for grouping data, and non-transitory computer readable storage medium
CN110532267A (en) Field determination method, apparatus, storage medium and electronic device
US10438695B1 (en) Semi-automated clustered case resolution system
CN114356712A (en) Data processing method, device, equipment, readable storage medium and program product
CN110083357B (en) Interface construction method, device, server and storage medium
CN112508518A (en) RPA flow generation method combining RPA and AI, corresponding device and readable storage medium
US11132235B2 (en) Data processing method, distributed data processing system and storage medium
CN109559139A (en) Item object processing method, device, medium and electronic equipment
CN110119406B (en) Method and device for checking real-time task records

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210907

Address after: No. 1201, 12/F, Building 6, No. 599, Shijicheng South Road, Chengdu Hi-tech Zone, China (Sichuan) Pilot Free Trade Zone, Chengdu, Sichuan 610094

Applicant after: Chengdu panorama Intelligent Technology Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Applicant before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

GR01 Patent grant