CN105183836A - Symbol characteristic based algorithm for obtaining big data information of event - Google Patents

Symbol characteristic based algorithm for obtaining big data information of event

Info

Publication number
CN105183836A
Authority
CN
China
Prior art keywords
event
symbolic
decimal
large data
symbol
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510553189.1A
Other languages
Chinese (zh)
Other versions
CN105183836B (en)
Inventor
张雨
张弛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU RUNBANG INTELLIGENT PARKING EQUIPMENT CO., LTD.
Original Assignee
Nanjing Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Institute of Technology filed Critical Nanjing Institute of Technology
Priority to CN201510553189.1A priority Critical patent/CN105183836B/en
Publication of CN105183836A publication Critical patent/CN105183836A/en
Application granted granted Critical
Publication of CN105183836B publication Critical patent/CN105183836B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an algorithm for obtaining big data information of an event based on symbol characteristics. The algorithm comprises the following steps: step 1: obtain the decimal time series $\{x_n\}$ of the event and set the total sampling length; step 2: set the binary symbol length $L$ to be encoded and the sampling time delay $\tau$; step 3: calculate the mean $\mu$ of the decimal time series $\{x_n\}$; step 4: use $\mu$ as the dividing line $P_0$ between the two symbol domains 0 and 1, and set a threshold function; step 5: apply the threshold function to every element of $\{x_n\}$, transforming each element $x_n$ of the decimal time series into an element $s_n$ of the binary symbol sequence $\{s_n\}$ according to the binary symbol length $L$ and the sampling time delay $\tau$, thereby constructing the binary symbol sequence $\{s_n\}$; step 6: encode $\{s_n\}$ in decimal, converting it into the decimal symbol code sequence $\{S_n\}$; and step 7: count the frequency $P_n$ with which each symbol code $S_n$ occurs in $\{S_n\}$, forming a symbol code $S_n$ - frequency $P_n$ histogram. The algorithm makes big data characteristics explicit, so that it is easy to determine whether the decimal time series $\{x_n\}$ representing the event has big data characteristics.

Description

An algorithm for obtaining big data information of an event based on symbol characteristics
Technical field
The present invention relates to an algorithm for obtaining big data information of an event based on symbol characteristics.
Background
The research institution Gartner defines "big data" as follows: "big data" is high-volume, high-growth-rate and diversified information assets that require new processing modes in order to deliver stronger decision-making power, insight discovery and process optimization capability.
Consider a decimal time series describing a certain generalized event, as shown in Fig. 1. What are its big data characteristics? If it has big data characteristics, how can they be obtained? The prior art offers more than one method for obtaining big data; this patent proposes an algorithm for obtaining big data information of an event based on symbol characteristics.
Summary of the invention
In view of the above problems, the invention provides an algorithm for obtaining big data information of an event based on symbol characteristics. It makes big data characteristics explicit, so that it is easy to judge whether the particular event corresponding to a certain symbol code $S_n$ has big data characteristics. It also makes it easy to judge whether the generalized event corresponding to the symbol code sequence $\{S_n\}$ (that is, to the decimal time series $\{x_n\}$) is random or deterministic.
To achieve the above technical purpose and technical effect, the invention is realized through the following technical solution:
An algorithm for obtaining big data information of an event based on symbol characteristics, characterized in that it comprises the following steps:
Step 1: obtain the decimal time series $\{x_n\}$ of the event and set the total sampling length;
Step 2: set the binary symbol length $L$ to be encoded and the sampling time delay $\tau$;
Step 3: calculate the mean $\mu$ of the decimal time series $\{x_n\}$;
Step 4: use $\mu$ as the dividing line $P_0$ between the two symbol domains 0 and 1, and set the threshold function $s_n = \begin{cases} 1, & x_n \ge \mu \\ 0, & x_n < \mu \end{cases}$;
Step 5: apply the threshold function to every element of $\{x_n\}$; according to the binary symbol length $L$ and the sampling time delay $\tau$, transform each element $x_n$ of the decimal time series $\{x_n\}$ into an element $s_n$ of the binary symbol sequence $\{s_n\}$, thereby constructing the binary symbol sequence $\{s_n\}$;
Step 6: encode $\{s_n\}$ in decimal, converting it into the decimal symbol code sequence $\{S_n\}$;
Step 7: count the frequency $P_n$ with which each symbol code $S_n$ occurs in $\{S_n\}$, forming a symbol code $S_n$ - frequency $P_n$ histogram.
Preferably, the algorithm further comprises step 8: calculate the improved entropy $H_s(L)$ from the symbol code $S_n$ - frequency $P_n$ histogram.
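Steps 3 to 7 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `symbol_code_histogram` is hypothetical, and two details the text leaves open are assumed here, namely that each word is built from $L$ bits taken $\tau$ samples apart and that the window advances one sample at a time.

```python
from collections import Counter


def symbol_code_histogram(x, L=3, tau=1):
    """Binarize x about its mean, form L-bit words with delay tau,
    and count how often each decimal symbol code occurs."""
    if len(x) < (L - 1) * tau + 1:
        raise ValueError("series too short for the chosen L and tau")

    mu = sum(x) / len(x)                      # step 3: mean of the series
    s = [1 if v >= mu else 0 for v in x]      # steps 4-5: threshold function

    codes = []
    # Assumption: the window slides forward one sample at a time.
    for n in range(len(s) - (L - 1) * tau):
        # Assumption: the L bits of a word are taken tau samples apart.
        word = [s[n + k * tau] for k in range(L)]
        code = 0
        for bit in word:                      # step 6: binary word -> decimal
            code = code * 2 + bit
        codes.append(code)

    return s, codes, Counter(codes)           # step 7: code -> count
```

The returned `Counter` maps each decimal symbol code to its number of occurrences, which is the data behind the symbol code - frequency histogram. If the intended word construction differs from the assumed one, only the inner list comprehension needs to change.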
The beneficial effects of the invention are as follows:
The time series $\{x_n\}$ is "coarse-grained", that is, symbolized, so that the original time series, whose values vary widely, is converted into a symbol sequence with only a few distinct values. Coarse-graining yields the symbol code $S_n$ - frequency $P_n$ histogram, in which high-frequency symbol codes correspond to strong information and low-frequency symbol codes correspond to weak information, thereby making the big data characteristics explicit.
Furthermore, the improved entropy $H_s(L)$ can be calculated from the symbol code $S_n$ - frequency $P_n$ histogram. For a random event $H_s(L) \ge 0.9$, and for a deterministic event $H_s(L) \le 0.1$, so it can be judged whether the generalized event corresponding to the symbol code sequence $\{S_n\}$ (that is, to the decimal time series $\{x_n\}$) is random or deterministic.
Description of the drawings
Fig. 1 is the decimal time series $\{x_n\}$ of a certain generalized event;
Fig. 2 is a schematic diagram of converting the decimal time series $\{x_n\}$ into the binary symbol sequence $\{s_n\}$;
Fig. 3 is a weekly chart of the change $\{x_n\}$ of a certain stock index and a schematic diagram of its conversion into the binary symbol sequence $\{s_n\}$;
Fig. 4 is the symbol code $S_n$ - frequency $P_n$ histogram of the weekly chart of the stock index change $\{x_n\}$ for binary symbol length $L=3$;
Fig. 5 is a schematic diagram of the body vibration $\{x_n\}$ of a certain four-cylinder diesel engine;
Fig. 6 is the symbol code $S_n$ - frequency $P_n$ histogram of the body vibration $\{x_n\}$ of the four-cylinder diesel engine for binary symbol length $L=6$.
Detailed description of the embodiments
The technical solution of the invention is described in further detail below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the invention; the illustrated embodiments do not limit the invention.
An algorithm for obtaining big data information of an event based on symbol characteristics comprises the following steps:
Step 1: obtain the decimal time series $\{x_n\}$ of the event and set the total sampling length;
Step 2: set the binary symbol length $L$ to be encoded and the sampling time delay $\tau$;
Step 3: calculate the mean $\mu$ of the decimal time series $\{x_n\}$;
Step 4: use $\mu$ as the dividing line $P_0$ between the two symbol domains 0 and 1, and set the threshold function $s_n = \begin{cases} 1, & x_n \ge \mu \\ 0, & x_n < \mu \end{cases}$;
Step 5: apply the threshold function to every element of $\{x_n\}$; according to the binary symbol length $L$ and the sampling time delay $\tau$, transform each element $x_n$ of the decimal time series $\{x_n\}$ into an element $s_n$ of the binary symbol sequence $\{s_n\}$, thereby constructing the binary symbol sequence $\{s_n\}$;
Step 6: encode $\{s_n\}$ in decimal, converting it into the decimal symbol code sequence $\{S_n\}$;
Step 7: count the frequency $P_n$ with which each symbol code $S_n$ occurs in $\{S_n\}$, forming a symbol code $S_n$ - frequency $P_n$ histogram. In this histogram, a given symbol code $S_n$ characterizes a certain particular event, and its frequency $P_n$ is the intensity with which that particular event occurs. High-frequency symbol codes correspond to strong information and low-frequency symbol codes to weak information. If the frequency $P_n$ is large relative to those of the other symbol codes, the particular event can be judged to have big data characteristics, which makes the big data characteristics explicit. From the frequency $P_n$ of the symbol code $S_n$ corresponding to a particular event (an "individual event"), one can thus judge whether that particular event has regular big data characteristics.
A threshold can be set empirically: when the frequency $P_n$ of a symbol code $S_n$ exceeds the set threshold, the particular event is judged to have big data characteristics.
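The threshold test can be illustrated with a short Python sketch. The function name `codes_above_threshold` is hypothetical, the counts in the usage note are invented for illustration, and the threshold is assumed to apply to relative frequency (count divided by total); the text does not say whether raw counts or relative frequencies are meant.

```python
def codes_above_threshold(counts, threshold):
    """Return the symbol codes whose relative frequency exceeds
    `threshold`, ordered from most to least frequent."""
    total = sum(counts.values())
    if total == 0:
        return []
    # Assumption: the empirical threshold is compared with relative frequency.
    freq = {code: n / total for code, n in counts.items()}
    selected = [code for code, f in freq.items() if f > threshold]
    return sorted(selected, key=lambda code: -freq[code])
```

For example, with the made-up counts `{5: 6, 2: 4, 3: 2, 6: 2}` and a threshold of 0.25, the call returns `[5, 2]`: only those two codes account for more than a quarter of all words.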
Furthermore, the "improved entropy" $H_s(L)$ can be calculated from the symbol code $S_n$ - frequency $P_n$ histogram according to formula (1):
$$H_s(L) \equiv -\frac{1}{\log N_{seq}} \sum_i p_{i,L} \log p_{i,L} \qquad (1)$$
In formula (1), $N_{seq}$ is the total number of symbol codes with non-zero frequency; $i$ is the index of a symbol code; $p_{i,L}$ is the frequency of the $i$-th symbol code of length $L$.
Since $H_s(L) \ge 0.9$ for a random event and $H_s(L) \le 0.1$ for a deterministic event, it can be judged whether the generalized event (the "overall event") corresponding to the symbol code sequence $\{S_n\}$ (that is, to the decimal time series $\{x_n\}$) is random or deterministic.
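Formula (1) and the two decision limits can be sketched in Python as below. The function names are hypothetical. Two points are assumptions rather than statements of the patent: the counts are normalized to relative frequencies before use as $p_{i,L}$, and $H_s(L)$ is taken to be 0 when only one symbol code occurs, a case in which formula (1) would divide by $\log 1 = 0$. The intermediate band $0.1 < H_s(L) < 0.9$ follows claim 6.

```python
import math


def improved_entropy(counts):
    """Improved entropy H_s(L) of a symbol-code histogram, formula (1)."""
    total = sum(counts.values())
    # Assumption: p_{i,L} is the relative frequency of each observed code.
    probs = [n / total for n in counts.values() if n > 0]
    n_seq = len(probs)                 # number of codes with non-zero frequency
    if n_seq <= 1:
        return 0.0                     # assumption: a single code is fully ordered
    return -sum(p * math.log(p) for p in probs) / math.log(n_seq)


def classify_event(h):
    """Map H_s(L) onto the three cases named in the claims."""
    if h >= 0.9:
        return "random"
    if h <= 0.1:
        return "deterministic"
    return "mixed"                     # both deterministic and random factors
```

Because the sum is divided by $\log N_{seq}$, the result lies between 0 and 1 whatever logarithm base is used, and a histogram in which all observed codes are equally frequent gives a value of 1.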
By determining the length $L$ and time delay $\tau$ of the binary symbol sequence $\{s_n\}$, determining the mean $\mu$ of the decimal time series $\{x_n\}$ and setting the threshold function, the decimal time series $\{x_n\}$ can be transformed into the binary symbol sequence $\{s_n\}$, which is then encoded in decimal to give the decimal symbol code sequence $\{S_n\}$. The preferred parameters are: total sampling length $\ge 50$ points, $L$ in the range 3 to 6, and $\tau$ in the range 1 to 3. Note that a $\tau$ in the range 1 to 3 means that, in the symbol domain $s_n$, the next element is taken every 1 to 3 data positions. Fig. 2 shows the conversion of the decimal time series $\{x_n\}$ of the generalized event in Fig. 1 into the binary symbol sequence $\{s_n\}$; for simplicity and clarity, symbol length $L=3$ and time delay $\tau=1$ are used.
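A small worked example may help. The six values below are hypothetical and are not the data of Fig. 1 or Fig. 2; the example uses $L=3$, $\tau=1$ and assumes that each word consists of $L$ bits taken $\tau$ positions apart, with the window advancing one position at a time.

```latex
\begin{aligned}
\{x_n\} &= (2,\ 5,\ 1,\ 4,\ 6,\ 3), \qquad \mu = \tfrac{21}{6} = 3.5 \\
\{s_n\} &= (0,\ 1,\ 0,\ 1,\ 1,\ 0) \\
\text{words} &:\ 010,\ 101,\ 011,\ 110 \\
\{S_n\} &= (2,\ 5,\ 3,\ 6)
\end{aligned}
```

Each word is read as a binary number, for example $101_2 = 1\cdot 4 + 0\cdot 2 + 1\cdot 1 = 5$. A series of 6 samples yields $6 - (L-1)\tau = 4$ words.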
The pattern of stock index changes in the economic field is analyzed to find the relationship between the long and short sides. Fig. 3 shows the weekly (5-day) chart of the stock index change $\{x_n\}$ and its conversion into the binary symbol sequence $\{s_n\}$. To express the density of the stock index fluctuations in Fig. 3, symbol length $L=3$ and time delay $\tau=1$ are used; the corresponding symbol code $S_n$ - frequency $P_n$ histogram of the stock index change $\{x_n\}$ is shown in Fig. 4.
As can be seen from Fig. 4, symbol code "101" occurs most often (6 times), followed by symbol code "010" (4 times). In Fig. 3, "101" characterizes a deep "∨" rebound of the stock index, and "010" characterizes a large "∧" drop. The intensity of the contest between the long and short sides in the stock market during the week is expressed quantitatively by the symbol code $S_n$ - frequency $P_n$ histogram of the weekly stock index chart: the long side has the upper hand over the short side. The improved entropy of Fig. 4 is $H_s(L)=0.68$, which indicates that the weekly stock index change is subject to both deterministic and random factors.
The vibration of a diesel engine in the engineering field is analyzed to find the effect of the influencing factors. Fig. 5 shows the body vibration $\{x_n\}$ of a four-cylinder diesel engine. To express the characteristic of Fig. 5, namely brief large vibrations separated by long intervals of small vibration, symbol length $L=6$ and time delay $\tau=3$ are used; the corresponding symbol code $S_n$ - frequency $P_n$ histogram of the body vibration $\{x_n\}$ is shown in Fig. 6.
As can be seen from Fig. 5, the body vibration time history $\{x_n\}$ contains several brief large vibrations. These result from the excited piston striking the cylinder liner near the firing top dead center and bottom dead center and near the exhaust top dead center and bottom dead center, respectively, and are related to the diesel engine load, the piston-to-liner clearance, the piston ring sticking state and so on. In Fig. 6, these possible influencing factors are expressed quantitatively by the symbol code $S_n$ - frequency $P_n$ histogram of the body vibration, which contains several decimal symbol codes with relatively high frequency. These codes can be converted back into binary symbols of length $L=6$; by observing Fig. 5 and finding when they occur, one can judge which factor contributes most to the diesel engine vibration. The improved entropy of Fig. 6 is $H_s(L)=0.9754$, which indicates that the diesel engine vibration as a whole has the attributes of a random event.
The above are only preferred embodiments of the invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the invention.

Claims (6)

1. An algorithm for obtaining big data information of an event based on symbol characteristics, characterized in that it comprises the following steps:
Step 1: obtain the decimal time series $\{x_n\}$ of the event and set the total sampling length;
Step 2: set the binary symbol length $L$ to be encoded and the sampling time delay $\tau$;
Step 3: calculate the mean $\mu$ of the decimal time series $\{x_n\}$;
Step 4: use $\mu$ as the dividing line $P_0$ between the two symbol domains 0 and 1, and set the threshold function $s_n = \begin{cases} 1, & x_n \ge \mu \\ 0, & x_n < \mu \end{cases}$;
Step 5: apply the threshold function to every element of $\{x_n\}$; according to the binary symbol length $L$ and the sampling time delay $\tau$, transform each element $x_n$ of the decimal time series $\{x_n\}$ into an element $s_n$ of the binary symbol sequence $\{s_n\}$, thereby constructing the binary symbol sequence $\{s_n\}$;
Step 6: encode $\{s_n\}$ in decimal, converting it into the decimal symbol code sequence $\{S_n\}$;
Step 7: count the frequency $P_n$ with which each symbol code $S_n$ occurs in $\{S_n\}$, forming a symbol code $S_n$ - frequency $P_n$ histogram.
2. The algorithm for obtaining big data information of an event based on symbol characteristics according to claim 1, characterized in that the total sampling length is $\ge 50$ points.
3. The algorithm for obtaining big data information of an event based on symbol characteristics according to claim 1, characterized in that $L$ ranges from 3 to 6.
4. The algorithm for obtaining big data information of an event based on symbol characteristics according to claim 1, characterized in that $\tau$ ranges from 1 to 3.
5. The algorithm for obtaining big data information of an event based on symbol characteristics according to claim 1, characterized in that when the frequency $P_n$ of a symbol code $S_n$ exceeds the set threshold, the particular event corresponding to that symbol code is judged to have big data characteristics.
6. The algorithm for obtaining big data information of an event based on symbol characteristics according to claim 1, characterized in that the improved entropy $H_s(L)$ is calculated from the symbol code $S_n$ - frequency $P_n$ histogram; when $H_s(L) \ge 0.9$, the generalized event corresponding to the symbol code sequence $\{S_n\}$ is judged to be random; when $H_s(L) \le 0.1$, the generalized event corresponding to the symbol code sequence $\{S_n\}$ is judged to be deterministic; when $0.1 < H_s(L) < 0.9$, the generalized event corresponding to the symbol code sequence $\{S_n\}$ is judged to be subject to both deterministic and random factors.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510553189.1A CN105183836B (en) 2015-09-01 2015-09-01 An algorithm for obtaining event big data information based on symbol features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510553189.1A CN105183836B (en) 2015-09-01 2015-09-01 An algorithm for obtaining event big data information based on symbol features

Publications (2)

Publication Number Publication Date
CN105183836A true CN105183836A (en) 2015-12-23
CN105183836B CN105183836B (en) 2018-06-15

Family

ID=54905918

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510553189.1A Active CN105183836B (en) 2015-09-01 2015-09-01 An algorithm for obtaining event big data information based on symbol features

Country Status (1)

Country Link
CN (1) CN105183836B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103942300A (en) * 2014-04-15 2014-07-23 大连海事大学 Dynamic solution method of center time series
CN104679991A (en) * 2015-01-27 2015-06-03 吉林大学 Ordered proposition-oriented novel method of information fusion
CN104636325A (en) * 2015-02-06 2015-05-20 中南大学 Document similarity determining method based on maximum likelihood estimation
CN104866929A (en) * 2015-06-11 2015-08-26 陈虹 International investment index data processing and analysis method and international investment index data processing and analysis system

Also Published As

Publication number Publication date
CN105183836B (en) 2018-06-15

Similar Documents

Publication Publication Date Title
CN106202922A (en) A kind of transformer fault diagnosis system based on clustering algorithm
CN109118075B (en) Electric power industrial control terminal safety monitoring method based on business logic consistency
CN103379099A (en) Hostile attack identification method and system
CN102393922A (en) Fuzzy Petri inference method of intelligent alarm expert system of transformer substation
Xu et al. Intuitionistic fuzzy soft set
CN106971254A (en) A kind of service monitoring system and method
CN108399147A (en) A kind of transformer excitation flow recognition method based on MEEMD algorithms
CN109474281A (en) Data encoding, coding/decoding method and device
CN105824880A (en) Webpage grasping method and device
CN109598334A (en) A kind of sample generating method and device
CN103823869A (en) Data extracting and predicting model establishing method for environment monitoring
CN106021476A (en) Push system of personal information
CN105183836A (en) Symbol characteristic based algorithm for obtaining big data information of event
KR100916805B1 (en) Method of hash algorithms having 256 bit output
El Sibai et al. A performance study of the chain sampling algorithm
CN105678623A (en) Metaheuristic searching method for solving flexibility workshop job scheduling
CN104778202B (en) The analysis method and system of event evolutionary process based on keyword
CN103618601B (en) Preselected integer factorization-based RSA (Rivest, Shamir and Adleman) password cracking system and method
CN105678078A (en) Symbolized quality characteristic grey prediction method of complicated electromechanical system
Hanawal et al. Guessing and compression subject to distortion
CN108629475A (en) A kind of exchange method of the operation information analysis system based on macroeconomic data
CN101620549A (en) Performance analysis method and device
CN1332218C (en) Secondary radar response code extracting and confidence beaconing algorithm
CN113037553A (en) IEC102 protocol communication behavior abnormity detection method and system based on IA-SVM
CN104715197A (en) Quick file scanning method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Zhang Yu

Inventor after: Zhang Chi

Inventor after: Shi Huanran

Inventor after: Zou Jianping

Inventor before: Zhang Yu

Inventor before: Zhang Chi

TA01 Transfer of patent application right

Effective date of registration: 20170707

Address after: 211803 Star Industrial Park, Pukou District, Jiangsu, Nanjing

Applicant after: JIANGSU RUNBANG INTELLIGENT PARKING EQUIPMENT CO., LTD.

Address before: 211167 No. 1 Hongjing Road, Jiangning Science Park, Nanjing, Jiangsu

Applicant before: Nanjing Institute of Technology

CB02 Change of applicant information

Address after: 211803 Pukou District Star Industrial Park in Nanjing, Jiangsu

Applicant after: JIANGSU RUNBANG INTELLIGENT GARAGE CO.,LTD.

Address before: 211803 Pukou District Star Industrial Park in Nanjing, Jiangsu

Applicant before: JIANGSU RUNBANG INTELLIGENT PARKING EQUIPMENT CO., LTD.

GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: An algorithm for obtaining event big data information based on symbol features

Effective date of registration: 20211125

Granted publication date: 20180615

Pledgee: Bank of China Limited Nanjing Jiangbei New Area Branch

Pledgor: JIANGSU RUNBANG INTELLIGENT GARAGE CO.,LTD.

Registration number: Y2021980013224