CN107733694A - Automatic analysis method for Internet of Things (IoT)-oriented real-time data - Google Patents
- Publication number: CN107733694A
- Application number: CN201710874448.XA
- Authority: CN (China)
- Prior art keywords: data, internet, index, real, things
- Legal status: Pending (Google notes that the legal status is an assumption, not a legal conclusion; it has performed no legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H: Electricity > H04: Electric communication technique > H04L: Transmission of digital information, e.g. telegraphic communication
  - H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    - H04L41/14: Network analysis or design
      - H04L41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
      - H04L41/147: Network analysis or design for predicting network behaviour
  - H04L67/00: Network arrangements or protocols for supporting network services or applications
    - H04L67/01: Protocols
      - H04L67/02: Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
      - H04L67/12: Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
- G: Physics > G06: Computing; calculating or counting > G06F: Electric digital data processing
  - G06F16/00: Information retrieval; database structures therefor; file system structures therefor
    - G06F16/20: Information retrieval of structured data, e.g. relational data
      - G06F16/22: Indexing; data structures therefor; storage structures
      - G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; distributed database system architectures therefor
    - G06F16/90: Details of database functions independent of the retrieved data types
      - G06F16/95: Retrieval from the web
        - G06F16/958: Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
          - G06F16/986: Document structures and storage, e.g. HTML extensions
Abstract
The present invention relates to an automatic analysis method for Internet of Things (IoT)-oriented real-time data. The method comprises the following steps: collecting and acquiring entity information and processing it with a real-time information acquisition method based on Watir and Nokogiri; real-time data management; real-time data statistics; real-time entity information search; real-time prediction; and real-time exchange. Search results can be sent directly to the user, or passed to a prediction module that builds a prediction model and processes them. Because predicted results are of more practical significance than raw search results, the predicted results are then sent to the user to guide the user's decisions. The method can search for entities in a particular state, and it can also search for the real-time state of a specific entity and predict that entity's state after a certain time, from which information useful to the user is obtained.
Description
Technical field
The present invention relates to an automatic analysis method, and more particularly to an automatic analysis method for IoT-oriented real-time data.
Background art
With the continuous progress of science and technology, IoT technology has been widely applied in many fields. These include the smart home, intelligent traffic control, smart firefighting, environmental monitoring, logistics and express delivery, food traceability, industrial monitoring, hotel management and health care. More and more people want to obtain IoT data conveniently and efficiently over the network and to observe and process the data at any time, so that they can make the corresponding decisions.
An automatic analysis system for real-time IoT data uses radio frequency identification, sensors, network interconnection and similar technologies to identify, perceive, monitor, track, control and manage the various objects of the physical world. It achieves efficient, intelligent and harmonious information exchange between people and things, between people and the environment, and between things. Finally, intelligent decision-control technology fuses the physical world with the information world.
Traditional information networks take semantics as their core. The Internet of Things (IoT) builds on this foundation by embedding IoT sensing devices with short-range wireless communication capability into all kinds of appliances. The system can then acquire the data these appliances contribute automatically, which extends the scope of physical-world information. IoT technology has developed rapidly in recent years and has achieved many notable results in industrialization.

However, IoT sensing devices use different hardware platforms, operating systems, databases and middleware, and they rely on different network environments. As a result, devices cannot communicate freely with one another, and their application platforms are difficult to share and reuse. IoT application development is therefore difficult, systems are tightly coupled and poorly extensible, and third-party resources are hard to integrate. These drawbacks have become a bottleneck for IoT development and make large-scale IoT deployment very difficult. It is therefore necessary to build a more open and flexible system architecture, so that the information, data and control functions of sensors can be shared more easily.

With the rapid development and wide use of the Internet, all terminal users can share information and applications through the Web. Web technology provides the base platform for offering services to every terminal user, so it is the best technical choice for sharing heterogeneous IoT resources. On this basis, combining the IoT with Web technology forms the open system architecture known as the Web of Things (WoT). The WoT uses Web design concepts and technologies to bring the information, data and services contributed by all kinds of IoT sensing devices into the Web information space. This enables access to, and aggregation of, data and services across different platforms.
As the IoT attracts attention at home and abroad, a variety of IoT search technologies have been developed for different application environments. Examples include the Snoggle/Microsearch systems, which suit centralized management; the MAX system, which is unsuitable for large-scale environments; the OCH (Objects Calling Home) system, which has a large computational overhead; and the GSN (Global Sensor Networks) system, which supports data stream processing. Depending on the applicable scenario and other conditions, these systems can achieve IoT search to a certain degree. However, physical entities in the IoT number in the tens of billions. This mass of entities has several characteristics: heterogeneity, physical security, dynamic data and real-time states. These systems still cannot fully handle real-time search for IoT entities.
In view of the above defects, the designer carried out active research and innovation to create an automatic analysis method for IoT-oriented real-time data with greater industrial value.
Summary of the invention
To solve the above technical problems, an object of the present invention is to provide an automatic analysis method for IoT-oriented real-time data.
The automatic analysis method for IoT-oriented real-time data of the present invention comprises the following steps:
Step 1: collect and acquire entity information, and process it with the Watir- and Nokogiri-based real-time information acquisition method;
Step 2: real-time data management;
Step 3: real-time data statistics;
Step 4: real-time entity information search;
Step 5: real-time prediction;
Step 6: real-time exchange.
Further, in the above method, step 1 is carried out by four modules: a page-loading module, an HTML-acquisition module, an HTML-parsing module and a data storage module. The page-loading module and the data storage module provide the interfaces to the outside world.
The page-loading module loads an external page by its network link address and passes it to the HTML-acquisition module. The HTML-acquisition module obtains the HTML document of the dynamic page passed on by the page-loading module and passes that HTML document to the HTML-parsing module.
The HTML-parsing module uses locating techniques to parse the required text content from the HTML document and sends the parsed content to the data storage module for storage.
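The four-module flow of step 1 can be sketched as a chain of functions. This is a hedged Python illustration, not the patent's Ruby code. `fake_fetch` returns canned HTML instead of driving a browser through Watir, and a regular expression stands in for Nokogiri's locating. Every function name is hypothetical. The two externally supplied objects, `fetch` and `sink`, play the role of the outward-facing interfaces of the page-loading and storage modules.

```python
import re
from typing import Callable, List

# Hypothetical stand-ins for the four modules of step 1. In the patent, loading
# and HTML acquisition are done by Watir in a real browser and parsing by Nokogiri.

def load_page(url: str, fetch: Callable[[str], str]) -> str:
    """Page-loading module: load the external page by its link address."""
    return fetch(url)

def obtain_html(rendered_page: str) -> str:
    """HTML-acquisition module: return the HTML document of the rendered page."""
    return rendered_page

def parse_html(html: str, extract: Callable[[str], List[str]]) -> List[str]:
    """HTML-parsing module: locate and extract the required text content."""
    return extract(html)

def store(records: List[str], sink: List[str]) -> None:
    """Data storage module: persist parsed records (a list here, a file or DB in practice)."""
    sink.extend(records)

def run_pipeline(url: str, fetch: Callable[[str], str],
                 extract: Callable[[str], List[str]], sink: List[str]) -> None:
    """Chain the modules: load -> obtain HTML -> parse -> store."""
    store(parse_html(obtain_html(load_page(url, fetch)), extract), sink)

# Example wiring with a canned page instead of a live IoT site.
def fake_fetch(url: str) -> str:
    return '<div class="shidu">45%</div><div class="shidu">47%</div>'

def extract_shidu(html: str) -> List[str]:
    return re.findall(r'<div class="shidu">([^<]*)</div>', html)

sink: List[str] = []
run_pipeline("http://example.invalid/sh/index.jsp", fake_fetch, extract_shidu, sink)
```

Swapping `fake_fetch` for a real browser driver, or `sink` for a database writer, leaves the middle modules unchanged. That is the point of keeping the external interfaces at the two ends.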
Further, in the above method, step 2 manages the data with a distributed database. The distributed database supports three control modes: globally centralized control, globally decentralized control, and partially decentralized global control.
The distributed database consists of local site database management systems, a global database management system, a global data dictionary and communication management.
Further, in the above method, step 3 uploads the data in the real-time database to the application layer of the three-layer IoT architecture, where the statistics are computed.
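The source does not say which statistics step 3 computes. As an assumption for illustration only, the sketch below summarizes (entity id, value) readings that reach the application layer into a per-entity count, mean, minimum and maximum. The record layout and the choice of statistics are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def entity_stats(readings: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Group (entity_id, value) readings by entity and summarize each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for entity_id, value in readings:
        groups[entity_id].append(value)
    return {
        eid: {
            "count": len(vals),
            "mean": sum(vals) / len(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for eid, vals in groups.items()
    }
```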
Further, in the above method, step 4 builds a Lucene system on Lucene, an open-source toolkit written in Java. Its source code is divided into 7 modules:
org.apache.lucene.document module: wraps the source records supplied by the user as Document objects and manages Documents while the index is stored;
org.apache.lucene.util module: provides common utility classes and constant classes;
org.apache.lucene.store module: provides storage management for index files; each specific field can be chosen to be stored or not stored;
org.apache.lucene.index module: provides index management, i.e. creating, updating and deleting indexes;
org.apache.lucene.search module: implements query matching; when called, it retrieves the matching documents from the index files;
org.apache.lucene.analysis module: analyzes the files being indexed, filtering and tokenizing the data source as required;
org.apache.lucene.queryparser module: the query analyzer; it analyzes the query words entered by the user and produces a suitable Query.
Further, in the above method, in step 4 each data source file first passes through a parser, which extracts the text that Lucene can process. The text is then processed by an Analyzer.
Analyzer processing includes tokenization. The data source is split at spaces or punctuation into individual words or numbers, and stop words are then removed.
The Analyzer is Lucene's text analyzer. It splits a string into individual words according to set rules and filters out invalid words that carry no useful information, such as "of" and "the" in English and comparable function words in Chinese. Lucene index Document objects and the corresponding index Field objects are then created.
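The Analyzer stage, and the index it feeds, can be illustrated with a toy Python analogue; this is not Lucene's API. The sketch splits text at any run of non-alphanumeric characters, lowercases it, and strips a small hypothetical stop-word list. It then records each term in an inverted index that maps the term to the ids of the documents containing it. Chinese text would need a real word segmenter, because this tokenizer only handles ASCII words and numbers.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

STOP_WORDS = {"of", "the", "a", "an", "and"}  # hypothetical stop-word list

def analyze(text: str) -> List[str]:
    """Tokenize at non-alphanumeric runs, lowercase, and drop stop words."""
    tokens = re.split(r"[^0-9A-Za-z]+", text.lower())
    return [t for t in tokens if t and t not in STOP_WORDS]

def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Inverted index: term -> ids of the documents that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """AND query: ids of documents containing every analyzed query term."""
    terms = analyze(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

The query goes through the same `analyze` function as the documents. This mirrors the usual practice of analyzing queries and documents alike, so that "Temperature" and "temperature" match.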
Further, in the above method, step 5 determines the periodic pattern of an IoT entity, establishes a time window for a specific time period, and predicts the entity's state value at a certain future point in time. Depending on the periodic pattern of the IoT event, the prediction model is an aggregation prediction model, a single-period prediction model or a multi-period prediction model.
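The three prediction models are named but not defined in the source. One plausible reading is given here purely as an assumption. The series is sampled at a fixed interval and has a known period of `period` samples. Under that reading:
- the single-period model repeats the value at the same phase in the most recent cycle;
- the multi-period model averages that phase over up to `k` recent cycles;
- the aggregation model averages the last `window` samples, i.e. the time window.

All function names are hypothetical.

```python
from typing import Sequence

def _latest_same_phase(n: int, period: int, horizon: int) -> int:
    """Index of the most recent observed sample with the same phase as time n-1+horizon."""
    if period <= 0 or horizon <= 0:
        raise ValueError("period and horizon must be positive")
    target = n - 1 + horizon                 # index of the future time being predicted
    cycles_back = -(-horizon // period)      # ceil(horizon / period)
    return target - cycles_back * period

def single_period_predict(series: Sequence[float], period: int, horizon: int) -> float:
    """Single-period model: repeat the value seen at the same phase in the last cycle."""
    idx = _latest_same_phase(len(series), period, horizon)
    if idx < 0:
        raise ValueError("need at least one full period of history")
    return float(series[idx])

def multi_period_predict(series: Sequence[float], period: int, horizon: int, k: int) -> float:
    """Multi-period model: average the same phase over up to k previous cycles."""
    idx = _latest_same_phase(len(series), period, horizon)
    points = [series[idx - j * period] for j in range(k) if idx - j * period >= 0]
    if not points:
        raise ValueError("need at least one full period of history")
    return sum(points) / len(points)

def aggregation_predict(series: Sequence[float], window: int) -> float:
    """Aggregation model: mean of the most recent `window` samples."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window
```

For example, with hourly readings of an entity that follows a daily cycle, `period` would be 24 and `horizon=1` predicts the next hour.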
Still further, in the above method, step 6 scans the disk files of the whole computer, establishes a virtualized information mapping, and accesses file information directly. Data query and exchange require indexing and caching techniques.
The indexing techniques may include clustered and nonclustered indexes. A clustered index stores information such as databases and tables in one pass, in index order. A nonclustered index can expose newly added data.
The file caching strategy performs file buffering through temporary tables.
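The index and cache notions in step 6 are described only briefly. The sketch below illustrates the standard meaning of the terms with an in-memory table:
- Rows are kept physically sorted by key, which acts as the clustered index.
- A secondary map from entity name to keys is updated on every insert. This is a nonclustered index, and it is why newly added rows can be found immediately.
- A dictionary of looked-up rows stands in for the temporary-table cache.

The class and field names are hypothetical.

```python
import bisect
from typing import Dict, List, Optional

class ReadingTable:
    """Toy table: clustered order by key, a nonclustered entity index, and a lookup cache."""

    def __init__(self) -> None:
        self.keys: List[int] = []                  # sorted keys (clustered order)
        self.rows: List[dict] = []                 # rows stored in the same order as keys
        self.by_entity: Dict[str, List[int]] = {}  # nonclustered index: entity -> keys
        self.cache: Dict[int, dict] = {}           # recently looked-up rows

    def insert(self, key: int, row: dict) -> None:
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            raise KeyError(f"duplicate key {key}")
        self.keys.insert(pos, key)                 # keep rows physically in key order
        self.rows.insert(pos, row)
        self.by_entity.setdefault(row["entity"], []).append(key)

    def get(self, key: int) -> Optional[dict]:
        if key in self.cache:                      # cache hit: skip the search
            return self.cache[key]
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            self.cache[key] = self.rows[pos]
            return self.rows[pos]
        return None

    def range_scan(self, lo: int, hi: int) -> List[dict]:
        """Rows with lo <= key < hi, read as one contiguous slice thanks to clustering."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_left(self.keys, hi)
        return self.rows[i:j]

    def find_by_entity(self, entity: str) -> List[Optional[dict]]:
        """Use the nonclustered index to find an entity's rows without a full scan."""
        return [self.get(k) for k in self.by_entity.get(entity, [])]
```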
With the above scheme, the present invention has at least the following advantages:
1. In the automatic analysis technique for IoT-oriented real-time data, sensors attached to each physical entity perceive that entity's state. The perceived information is transmitted over a wireless network into the object database.
2. The information in the database is then published on web pages in real time, and the data on these pages sit at the application layer of the three-layer IoT architecture. The IoT information on the application layer is collected, a search framework is built over the collected IoT data, and IoT entity information is searched in real time.
3. Search results can be sent directly to the user, or passed to a prediction module that builds a prediction model and processes them. Predicted results are of more practical significance than raw search results, so the predicted results are then sent to the user to guide the user's decisions.
4. The method can search for entities in a particular state. It can also search for the real-time state of a specific entity and predict that entity's state after a certain time, from which information useful to the user is obtained.
The above is only an overview of the technical solution of the present invention. Preferred embodiments are described in detail below with reference to the accompanying drawings, so that the technical means of the invention can be understood more clearly and implemented according to this specification.
Brief description of the drawings
Fig. 1 is a schematic diagram of the framework for real-time IoT data acquisition.
Fig. 2 is a software flowchart of real-time information acquisition with Watir and Nokogiri.
Fig. 3 is a schematic diagram of the distributed real-time database architecture.
Fig. 4 is a schematic diagram of the process by which Lucene builds an index.
Fig. 5 is a schematic diagram of the Lucene retrieval process.
Detailed description of the embodiments
The specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples illustrate the present invention but do not limit its scope.
As shown in Figs. 1 to 5, the automatic analysis method for IoT-oriented real-time data is characterized by the following steps.
First, entity information is collected, acquired, and processed with the Watir- and Nokogiri-based real-time information acquisition method. This works because the real-time entity information of IoT dynamic pages is obtained on the client side. Any client connected to the Internet can acquire and store dynamic entity information in real time through Watir-based real-time acquisition of IoT web page information. This acquisition method requires neither direct access to the server nor a connection to the database on the server.
Watir is an open-source tool for automated web testing, implemented in Ruby; its full name is "Web Application Testing in Ruby". It is compact and flexible and provides many functions. Watir can simulate a browser performing various operations, such as clicking controls and buttons and loading web pages, and Watir scripts run directly in a browser. In this technical solution, Watir is used to load IoT dynamic pages and to locate the required page elements.
Among Ruby plug-ins for parsing XML and HTML, the most notable are Hpricot and Nokogiri. Nokogiri is much faster than the widely used Hpricot. According to benchmark tests, Nokogiri is 7 times as fast as Hpricot at loading XML documents, 5 times as fast at XPath searches, and 1.62 times as fast at CSS selector searches. Nokogiri can parse HTML/XML files and supports both XPath and CSS selectors. By using Nokogiri to parse HTML documents, this technical solution achieves high speed.
Compared with the traditional MetaSeeker method, this method performs periodic acquisition on a predefined cycle. It can capture, without omission, dynamic real-time data whose web page update interval is not shorter than 13 milliseconds. It also supports custom storage formats and scales well.
Next, the data are managed in real time. The massive volume of real-time data requires a real-time database to manage it.

First, a distributed in-memory database system contains multiple node databases, each of which keeps a certain degree of autonomy. Data distribution and database coordination are combined with the transparency of data distribution. Together these improve database balance and better meet the requirement of processing massive IoT data in real time.

Second, distributed storage technology and cloud computing technology have been combined to form distributed real-time database technology. The data storage components and data retrieval components of multiple data acquisition units and data servers form a distributed data storage and data detection service on a cloud service platform. This better meets the requirements of processing massive IoT data.
Then, real-time data statistics are computed. In this stage, the search engine plays the leading role in entity search: users query entity information and related entities through an entity-oriented search engine. This technical solution focuses on Lucene, the core search framework used in real-time IoT entity search, and builds a Lucene-based search framework. Through the IoT entity search function, users find the entity information they need among entity information that changes dynamically in real time. The attributes of a physical entity include static attributes, dynamic attributes and others, and these attributes are also closely related to context information. IoT entity data are massive, heterogeneous, dynamic, real-time, compatible and secure.
Then, real-time entity information search and real-time prediction are performed. The reason for this approach is that entity states in the IoT are strongly real-time and highly dynamic. Entity information obtained in real time and returned to the user by search is highly transient and offers the user little guidance. The user's decisions can be meaningfully guided only by analyzing the states of specific search results and predicting from them the information at some future point in time.
Finally, real-time exchange processing is carried out.
When IoT data are exchanged, file information has to be accessed through Windows API (WinAPI) functions, and the computation is concentrated on the computer's hard disk, where the data exchange takes place. In real data file processing, however, this file operation method is inefficient. As the big data era develops further, the processing of all kinds of file data demands ever greater efficiency.

Further analysis of data file access techniques therefore yields a new data file access method, the memory-mapped file method, which can process files holding large amounts of data quickly. For each data file, a memory pointer is set up for the file on the computer's hard disk and the corresponding file access permissions are set, which ensures the security of data file access. This saves time, expands the usable memory space of the computer system, and improves the processing efficiency of data files.
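The memory-mapped file approach can be sketched with Python's standard mmap module, which wraps the operating system's mapping facilities (file mappings on Windows, mmap on POSIX). This is an illustration, not the patent's implementation. The file is mapped read-only with ACCESS_READ, which is one way of enforcing the access permission the text mentions. Records are found by scanning the mapping rather than by explicit read calls. The file name and record format are hypothetical.

```python
import mmap
import os
import tempfile
from typing import List

def write_records(path: str, records: List[str]) -> None:
    """Write newline-terminated UTF-8 records to a file."""
    with open(path, "wb") as f:
        for r in records:
            f.write(r.encode("utf-8") + b"\n")

def count_records(path: str) -> int:
    """Count records by scanning a read-only memory map instead of calling read()."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        count, pos = 0, mm.find(b"\n")
        while pos != -1:
            count += 1
            pos = mm.find(b"\n", pos + 1)
        return count

def read_slice(path: str, offset: int, length: int) -> bytes:
    """Read bytes [offset, offset + length) through the mapping; writes are not allowed."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return mm[offset:offset + length]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "readings.txt")
        write_records(p, ["t=22.0", "h=45"])
        print(count_records(p), read_slice(p, 0, 6))
```

Mapping a zero-length file raises ValueError, so a real system would check the file size first.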
In a preferred embodiment of the invention, entity information collection and acquisition can be carried out by a page-loading module, an HTML-acquisition module, an HTML-parsing module and a data storage module. The page-loading module and the data storage module provide the interfaces to the outside world. The page-loading module loads an external page by its network link address and passes it to the HTML-acquisition module. The HTML-acquisition module obtains the HTML document of the dynamic page passed on by the page-loading module and passes that HTML document to the HTML-parsing module. The HTML-parsing module uses locating techniques to parse the required text content from the HTML document and sends the parsed content to the data storage module for storage. The various IoT applications are built as extensions of the data storage module.
For the actual implementation, refer to Fig. 1:
(1) Page and frame loading module.
The IoT dynamic page involved contains five blocks: smart home, temperature and humidity, data browsing, temperature strategy and lighting strategy. The corresponding HTML fragment is as follows:
<div class="nav">
  <ul>
    <li class="mainlevel" id="mainlevel_01" jQuery1355212543380="2">
      <a onclick="hrefControl('body.jsp');" href="javascript:void(0);">Smart home</a>
    </li>
    <li class="mainlevel" id="mainlevel_02" jQuery1355212543380="4">
      <a onclick="hrefControl('jsp/wsd.jsp');" href="javascript:void(0);">Temperature and humidity</a>
    </li>
    <li class="mainlevel" id="mainlevel_03" jQuery1355212543380="6">
      <a onclick="hrefControl('servlet/DataServlet?Page=1');" href="javascript:void(0);">Data browsing</a>
    </li>
    <li class="mainlevel" id="mainlevel_04" jQuery1355212543380="8">
      <a onclick="hrefControl('servlet/tacticServlet?Prefix=temperature');" href="javascript:void(0);">Temperature strategy</a>
    </li>
    <li class="mainlevel" id="mainlevel_05" jQuery1355212543380="10">
      <a onclick="hrefControl('servlet/tacticServlet?Prefix=beam');" href="javascript:void(0);">Lighting strategy</a>
    </li>
    <li class="mainlevel" id="mainlevel_06" jQuery1355212543380="12"></li>
    <li class="mainlevel" id="mainlevel_05" style="background-image:none; background-attachment:scroll; background-repeat:repeat; background-position-x:0%; background-position-y:0%; background-color:transparent;" jQuery1355212543380="14"></li>
  </ul>
</div>
<iframe id="body" src="jsp/wsd.jsp" width="643" height="568" frameBorder="0" marginWidth="5"></iframe>
Meanwhile the various operations of Watir energy simulation browsers are relied on, the present invention is using Watir loading Internet of Things dynamics
The page.Specifically, " Watir is passed through::Browser.new " creates a browser instances, so as to carry out the behaviour of simulation browser
Make.By " goto (' http:// 10.14.11.100/sh/index.jsp') " method is loaded into smart home dynamic page.
Wherein http:// 10.14.11.100/sh/index.jsp is the chained address of internet of things intelligent household dynamic page.For
Other Internet of Things dynamic pages, the link only need to be replaced with to other chained addresses.
In the actual implementation, the data to be collected lie under the temperature and humidity module. After the IoT dynamic page is opened, a page-turn operation must be performed before the real-time changing data under that module can be viewed. The page turn does not change the page's link address, but it does change the page content. In a real browser, the page turn is triggered by a mouse click. When Watir simulates the browser, the click is simulated through the "onclick" handler, which performs the page turn.

The five blocks of the IoT dynamic page are represented by five "li" nodes, and the temperature and humidity module is the second block, in the second "li" tag. In the "a" tag under this "li" tag, onclick="hrefControl('jsp/wsd.jsp');" means that clicking calls the JS module that dynamically displays the real-time changing data. In Watir's simulated browser, neither the "a" tag nor the "li" tag can be clicked directly. The "li" tag is therefore located by its id "mainlevel_02" and the first link under it is clicked; that is, the simulated page-turn click is performed with li(:id, 'mainlevel_02').links[0].click.
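Watir clicks in a live browser and runs the page's JavaScript, which a parser cannot reproduce. The hedged sketch below shows only the locating logic behind li(:id, 'mainlevel_02').links[0], using Python's standard html.parser. It finds the li with that id, takes its first a element, and reads the page that the element's onclick passes to hrefControl. NAV is a trimmed, cleaned-up copy of the navigation fragment, and the helper names are hypothetical.

```python
import re
from html.parser import HTMLParser
from typing import Optional

class FirstLinkInLi(HTMLParser):
    """Record the onclick attribute of the first <a> inside <li id=li_id>."""

    def __init__(self, li_id: str) -> None:
        super().__init__()
        self.li_id = li_id
        self.inside = False                  # currently inside the target <li>
        self.found = False                   # first <a> of the target <li> already seen
        self.onclick: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li":
            self.inside = attributes.get("id") == self.li_id
        elif tag == "a" and self.inside and not self.found:
            self.found = True
            self.onclick = attributes.get("onclick")

    def handle_endtag(self, tag):
        if tag == "li":
            self.inside = False

def click_target(html: str, li_id: str) -> Optional[str]:
    """Return the page passed to hrefControl(...) by the first link of li#li_id."""
    parser = FirstLinkInLi(li_id)
    parser.feed(html)
    parser.close()
    if parser.onclick is None:
        return None
    match = re.search(r"hrefControl\s*\(\s*'([^']*)'", parser.onclick)
    return match.group(1) if match else None

NAV = """<div class="nav"><ul>
<li class="mainlevel" id="mainlevel_01"><a onclick="hrefControl('body.jsp');" href="javascript:void(0);">Smart home</a></li>
<li class="mainlevel" id="mainlevel_02"><a onclick="hrefControl('jsp/wsd.jsp');" href="javascript:void(0);">Temperature and humidity</a></li>
</ul></div>"""
```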
There is a frame structure, an IFrame, under the temperature and humidity module, and the JS module that displays the real-time changing data is inside this IFrame. Watir must first be positioned inside the IFrame of the loaded dynamic page before the HTML document within the frame can be extracted for Nokogiri to parse. Watir locates elements through the normalize_specifiers and match_with_specifiers methods of its Locator class. The normalize_specifiers method constructs the specifiers, i.e. the "marks" used to locate an object; match_with_specifiers then judges whether an element matches the features defined in the specifiers. This technical solution positions Watir on the IFrame through its src attribute "jsp/wsd.jsp", using the method ".frame(:src, 'jsp/wsd.jsp')". Here "jsp/wsd.jsp" is the JS module that dynamically displays the real-time changing data.
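The source names Watir's Locator methods but not their internals. The hypothetical Python sketch below mirrors only the idea described:
- normalize_specifiers turns a (how, what) pair such as ("src", "jsp/wsd.jsp") into a dictionary of attribute constraints;
- match_with_specifiers tests one element's attributes against those constraints, by exact string or by regular expression;
- locate returns the first element that matches.

The names echo Watir's, but the code is not Watir's implementation, and elements are plain attribute dictionaries.

```python
import re
from typing import Dict, Iterable, Optional, Union

Spec = Union[str, re.Pattern]

def normalize_specifiers(how, what: Optional[Spec] = None) -> Dict[str, Spec]:
    """Build the locating 'marks' from (how, what), e.g. ("src", "jsp/wsd.jsp"), or copy a dict."""
    if isinstance(how, dict):
        return dict(how)
    return {str(how): what}

def match_with_specifiers(attrs: Dict[str, str], specifiers: Dict[str, Spec]) -> bool:
    """True if every specifier matches: exact string equality or regex search."""
    for name, expected in specifiers.items():
        actual = attrs.get(name)
        if actual is None:
            return False
        if isinstance(expected, re.Pattern):
            if expected.search(actual) is None:
                return False
        elif actual != expected:
            return False
    return True

def locate(elements: Iterable[Dict[str, str]], how,
           what: Optional[Spec] = None) -> Optional[Dict[str, str]]:
    """Return the first element whose attributes satisfy the normalized specifiers."""
    specifiers = normalize_specifiers(how, what)
    return next((el for el in elements if match_with_specifiers(el, specifiers)), None)
```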
(2) HTML acquisition module.
When JavaScript fetches IoT content from the server and displays it dynamically on the client, this takes a certain amount of time. Even after the IoT dynamic page has been loaded and located, the loading time of the dynamic page must therefore still be considered in practice. The present invention controls the time spent waiting for the dynamic page to load with Watir's waiting mechanism and Ruby's sleep mechanism, i.e. the ".wait" method and the "sleep()" method. After Watir has finished loading and locating, the HTML document within the frame is obtained with the ".frame.html" method.
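Watir's waits and Ruby's sleep both come down to polling until the page is ready or a time limit passes. Below is a minimal Python analogue. It assumes a caller-supplied readiness check, for example "the frame now contains a shidu node". The default timeout and polling interval are arbitrary.

```python
import time
from typing import Callable

def wait_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.05) -> None:
    """Poll `condition` until it returns True; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)          # fixed sleep between checks, like Ruby's sleep()
```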
(3) HTML parsing module.
This technical solution parses the obtained HTML document with Nokogiri: the document is loaded with the "Nokogiri::HTML.parse" method and then parsed. Nokogiri provides both XPath and CSS selectors for finding nodes in a document. The present invention finds the nodes holding the required content with CSS selectors and fetches the page text at that moment, instead of locating elements with XPath. CSS locators are faster than XPath locators, particularly under IE, which has no XPath parser of its own. CSS selectors can also locate the target elements very precisely. The data on the IoT dynamic page are captured in a periodic loop. Each acquisition is short, stable and timely, so dynamic information is collected well in real time.
Specifically, CSS selectors fall roughly into several basic types: ID selectors (#id), class selectors (.class), type selectors (p), attribute selectors, pseudo-class selectors and so on. These simple selectors can be combined, e.g. div#id or div:last-child. This solution uses class selectors. Because several nodes share the class value "shidu" (pinyin for "humidity"), it first locates the node under the IFrame whose class value is "shiduqu". It then selects the nodes at the next level whose class value is "shidu" and returns all of their text content. That is, the method ".css('div.shiduqu div.shidu').text" performs the CSS selector positioning and returns the text content.
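The selector div.shiduqu div.shidu matches div elements of class shidu nested at any depth inside a div of class shiduqu. The sketch below reproduces that descendant match with Python's standard html.parser instead of Nokogiri. It tracks only div nesting and returns one stripped string per matched element, whereas Nokogiri's .text on the node set concatenates them. The class name is hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set

class DescendantClassText(HTMLParser):
    """Collect text of <div class~=child> nested inside <div class~=ancestor>
    (the CSS selector 'div.ancestor div.child'). Only <div> nesting is tracked."""

    def __init__(self, ancestor: str, child: str) -> None:
        super().__init__()
        self.ancestor, self.child = ancestor, child
        self.stack: List[Set[str]] = []           # class sets of the currently open <div>s
        self.capture_depth: Optional[int] = None  # stack depth of the matched child <div>
        self.buf: List[str] = []
        self.results: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = set((dict(attrs).get("class") or "").split())
        inside_ancestor = any(self.ancestor in c for c in self.stack)
        self.stack.append(classes)
        if self.capture_depth is None and inside_ancestor and self.child in classes:
            self.capture_depth = len(self.stack)
            self.buf = []

    def handle_endtag(self, tag):
        if tag != "div" or not self.stack:
            return
        if self.capture_depth == len(self.stack):  # closing the matched child <div>
            self.results.append("".join(self.buf).strip())
            self.capture_depth = None
        self.stack.pop()

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.buf.append(data)

def css_descendant_text(html: str, ancestor: str = "shiduqu", child: str = "shidu") -> List[str]:
    parser = DescendantClassText(ancestor, child)
    parser.feed(html)
    parser.close()
    return parser.results
```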
(4) Data storage module.
This method stores the acquired IoT dynamic page information very flexibly and can store data in various formats. In Ruby, the IO class provides access to the file system for reading, writing, deleting and similar functions. The File class is a subclass of IO, and this technical solution creates documents with the File class to store data. The method File.new("#{i}.txt", "w") creates a txt document, and the content is written into it. Here "#{i}" is the document name, controlled by the variable i. This enables cyclic storage: a new document is created on each cycle.
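In Python, the counterpart of calling File.new("#{i}.txt", "w") on each cycle is an f-string file name inside a with open(...) block. The with block also closes each file, whereas a bare File.new in Ruby leaves the file open until it is closed explicitly. The sketch writes one numbered text file per acquisition cycle; the directory and content format are assumptions.

```python
import os
import tempfile
from typing import Iterable, List

def store_cycles(snapshots: Iterable[str], directory: str) -> List[str]:
    """Write snapshot i to <directory>/<i>.txt, creating one new file per cycle."""
    paths: List[str] = []
    for i, text in enumerate(snapshots):
        path = os.path.join(directory, f"{i}.txt")
        with open(path, "w", encoding="utf-8") as f:   # "w" creates or truncates the file
            f.write(text)
        paths.append(path)
    return paths

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(store_cycles(["45%", "47%"], d))
```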
The acquired data can also be stored in a database, which makes them easier to manage. Ruby uses DBI (Database Interface), a unified database interface layer, together with database-driver plug-ins to access the database and operate on its data. For the collected real-time information on Internet of Things entities, database storage therefore better meets the demands of massive Internet of Things data. The scheme's data storage is thus flexible, practical and highly extensible.
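Ruby's DBI plays a role similar to Python's DB-API (PEP 249), which the standard sqlite3 module implements. The sketch below uses it to store acquisition rounds in a database table. The table layout readings(round_no, entity, value) and the helper store_readings are hypothetical. SQLite stands in for whatever database the scheme actually uses.

```python
import sqlite3


def store_readings(conn, round_no, readings):
    """Hypothetical helper: insert one acquisition round of (entity, value) pairs."""
    with conn:  # one transaction: commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO readings (round_no, entity, value) VALUES (?, ?, ?)",
            [(round_no, entity, value) for entity, value in readings],
        )


conn = sqlite3.connect(":memory:")  # a file path would persist the data on disk
conn.execute("CREATE TABLE readings (round_no INTEGER, entity TEXT, value TEXT)")
store_readings(conn, 1, [("sensor-a", "45%"), ("sensor-b", "47%")])
store_readings(conn, 2, [("sensor-a", "46%")])
rows = conn.execute(
    "SELECT value FROM readings WHERE entity = ? ORDER BY round_no", ("sensor-a",)
).fetchall()
print(rows)  # [('45%',), ('46%',)]
```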
In an actual implementation, the software flowchart of the Watir and Nokogiri real-time information acquisition method is shown in Fig. 2. The technical scheme captures data by running a periodic loop. The period can be set freely, and the minimum period can be as short as 13 milliseconds. The loop condition can also be set freely: the duration of the capture can be controlled by the loop time, the number of iterations, and so on. Data capture continues until the loop condition is reached.
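Here is a minimal sketch of the periodic acquisition loop with both kinds of stop condition (iteration count and elapsed time). The argument capture stands for any function that performs one acquisition, such as loading the page and extracting the text. The function name capture_loop and the fixed-rate scheduling are assumptions. Whether a 13 ms period is actually achievable depends on the page, the network and the operating system's timer resolution.

```python
import time


def capture_loop(capture, period_s, max_rounds=None, max_duration_s=None):
    """Hypothetical helper: call capture() once per period until a round or time limit is hit."""
    if max_rounds is None and max_duration_s is None:
        raise ValueError("at least one stop condition is required")
    results = []
    start = time.monotonic()
    next_tick = start
    while True:
        if max_rounds is not None and len(results) >= max_rounds:
            break
        if max_duration_s is not None and time.monotonic() - start >= max_duration_s:
            break
        results.append(capture())
        # Assumed fixed-rate schedule: the period runs from the start of one
        # round to the start of the next, so capture time does not add drift.
        next_tick += period_s
        delay = next_tick - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return results


readings = capture_loop(lambda: "45%", period_s=0.013, max_rounds=3)
print(readings)  # ['45%', '45%', '45%']
```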
A distributed database is used for real-time data management. The distributed database supports three control modes: fully centralized global control, fully distributed global control, and partially distributed global control.
The distributed database consists of local-site database management systems, a global database management system, a global catalog and communication management. The local database management system (local DBMS) establishes and manages the local database, provides site autonomy and executes local applications. The global database management system provides distribution transparency and coordinates the execution of global transactions and the work of the local DBMSs. It also guarantees the global consistency of the database and performs functions such as update synchronization. Distributed databases interpenetrate and combine with artificial intelligence, network communication and parallel computing technologies. This combination has become a principal feature of current database technology development.
The distributed database is also the product of combining traditional database technology with network technology. A distributed database is physically spread over the nodes of a computer network, but logically its data belong to a single system, so a distributed database system architecture can be used. Distributed in-memory database technology has several features: local physical autonomy combined with global logical sharing, data redundancy, data independence, and system transparency. In this system the following requirements must be met:
(1) The in-memory database on each network node retains its autonomy.
(2) The in-memory databases form a cluster that handles massive data storage through read/write separation and vertical and horizontal partitioning strategies.
(3) Multiple data partitioning modes are supported. Horizontal partitioning is applied on top of an overall vertical partitioning scheme, to handle the different processing that different applications and data require.
(4) The in-memory databases on different nodes coordinate with one another, so that each can act as a server for the other nodes.
(5) The transparency of data distribution is maintained, reconciling the distributed nature of the data with coordination among the databases. Combined with improved load balancing among the in-memory databases, this meets the requirement of processing massive Internet of Things data in real time.
(6) The in-memory databases are persistent: data changes in an in-memory database must be copied to a disk database, and persistence is achieved through a two-level database and asynchronous writes.
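Requirements (2), (3) and (6) can be illustrated together with a small Python sketch. Keys are spread over several in-memory shards by hash (horizontal partitioning), and writes land in memory first. A background thread then persists them asynchronously to an SQLite file, which stands in for the second-level disk database. The class name, the CRC32 sharding rule and the SQLite schema are assumptions. Read/write separation, vertical partitioning and coordination between nodes are not shown.

```python
import os
import queue
import sqlite3
import tempfile
import threading
import zlib


class ShardedMemoryStore:
    """Hypothetical sketch: in-memory shards (horizontal partitioning by key hash)
    with asynchronous write-behind persistence to a second-level disk database."""

    def __init__(self, n_shards, disk_path):
        self.shards = [{} for _ in range(n_shards)]
        self._pending = queue.Queue()  # writes waiting to be persisted
        self._disk_path = disk_path
        threading.Thread(target=self._persist_forever, daemon=True).start()

    def _shard(self, key):
        # Assumption: crc32 sharding; it is stable across runs, unlike the salted str hash()
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, value):
        self._shard(key)[key] = value    # the caller only waits for the memory write
        self._pending.put((key, value))  # persisted later by the background thread

    def get(self, key):
        return self._shard(key).get(key)

    def _persist_forever(self):
        conn = sqlite3.connect(self._disk_path)  # connection owned by this thread
        conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
        conn.commit()
        while True:
            key, value = self._pending.get()
            with conn:
                conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))
            self._pending.task_done()

    def flush(self):
        """Block until every queued write has reached the disk database."""
        self._pending.join()


disk_path = os.path.join(tempfile.mkdtemp(), "second_level.db")
store = ShardedMemoryStore(n_shards=4, disk_path=disk_path)
for i in range(20):
    store.put(f"sensor-{i}", str(i))
store.flush()
print(store.get("sensor-7"))  # 7
```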
For ease of implementation, real-time database technology can be deeply integrated with cloud computing, as shown in Fig. 3. Server clusters in cloud computing centers distributed around the world are used to build a distributed real-time database system. The system is extensible, scales with database size, and has a highly reliable and maintainable database management system. It includes data processing and compression, data retrieval, storage virtualization, conflict handling, content delivery network technology, transaction scheduling, and fault monitoring and recovery. Building on real-time operation, distribution and virtualization, it also implements load balancing, massive data storage, high-concurrency transaction processing, storage encryption, distributed redundant backup, and dynamic system expansion.
In addition, within the architecture of the distributed real-time database, the service components of the data acquisition units and the database server nodes connect to the distributed communication service platform through its middleware interface. Through this interface they interact with the other service components. Each component is connected to the other functional units as a service and is invoked as such, which allows free and efficient data exchange. Communication with other nodes connected to the same service, including sending and receiving data, is likewise implemented through the platform's interface. The platform has an internal buffering queue and an asynchronous call mechanism. As a result, a sending node need not care about the state of the receiving node, and a receiving node obtains its data through a message callback.
The data storage and data retrieval service components of multiple data acquisition units and data servers are connected through the cloud service platform. Together they form unified data storage and data retrieval services that are offered externally. This replaces the isolated-island model of the traditional single real-time data processing server with a decentralized, peer-to-peer distributed data storage and retrieval system. The data acquisition units or data servers send the collected real-time data through the service platform to the unified data storage service module, which stores it. Clients connect to the communication service platform through the platform interface or a Web server and send query requests to the unified data query service. Data destined for other data server nodes are sent through the distributed communication service platform, and once data have been sent successfully, the write can be considered successful. When a node receives data, reception is completed through a callback interface.
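The platform's buffering queue and callback delivery can be sketched with Python's asyncio. The class CommPlatform, the node name "db-node-2" and the message format are hypothetical. The sketch runs in a single process, whereas the real platform would deliver messages across the network.

```python
import asyncio


class CommPlatform:
    """Hypothetical toy platform: an internal buffering queue decouples senders
    from receivers, and delivery happens through registered callbacks."""

    def __init__(self):
        self._queue = asyncio.Queue()  # internal buffering queue
        self._callbacks = {}           # node name -> receive callback

    def register(self, node, on_receive):
        self._callbacks[node] = on_receive

    def send(self, dest, payload):
        # Returns at once: the sender never inspects the receiver's state.
        self._queue.put_nowait((dest, payload))
        return True  # per the text, a successful send is treated as a successful write

    async def dispatch_forever(self):
        while True:
            dest, payload = await self._queue.get()
            callback = self._callbacks.get(dest)
            if callback is not None:
                callback(payload)  # the message callback on the receiving node
            self._queue.task_done()

    async def drain(self):
        await self._queue.join()  # wait until every buffered message is delivered


async def main():
    platform = CommPlatform()  # created inside the running event loop
    received = []
    platform.register("db-node-2", received.append)
    dispatcher = asyncio.create_task(platform.dispatch_forever())
    ok = platform.send("db-node-2", {"entity": "sensor-a", "value": "45%"})
    await platform.drain()
    dispatcher.cancel()
    try:
        await dispatcher
    except asyncio.CancelledError:
        pass
    return ok, received


ok, received = asyncio.run(main())
print(ok, received)  # True [{'entity': 'sensor-a', 'value': '45%'}]
```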
For real-time data statistics, the data in the real-time database are uploaded to the application layer of the three-layer Internet of Things architecture, where the statistics are computed.
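The text does not say which statistics are computed at the application layer. As one hedged example, the sketch below keeps a per-entity count, mean, minimum and maximum. These are updated incrementally as each real-time record arrives, so no history needs to be stored. The class RunningStats, its fields and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical per-entity summary, updated one record at a time."""
    count: int = 0
    mean: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def update(self, x):
        """Fold one new reading into the summary without storing past readings."""
        self.count += 1
        self.mean += (x - self.mean) / self.count  # incremental mean update
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)


stats = {}
for entity, value in [("sensor-a", 45.0), ("sensor-a", 47.0), ("sensor-b", 30.0)]:
    stats.setdefault(entity, RunningStats()).update(value)
print(stats["sensor-a"])  # RunningStats(count=2, mean=46.0, minimum=45.0, maximum=47.0)
```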
For real-time search of entity information, the system is built on Lucene, an open-source toolkit written in Java. Its source code is divided into 7 modules:
(1) org.apache.lucene.document: turns the source records supplied by the user into Documents, and manages Documents when the index is stored;
(2) org.apache.lucene.util: provides common utility classes and constant classes;
(3) org.apache.lucene.store: provides storage management for index files; specific fields can be chosen to be stored or not stored;
(4) org.apache.lucene.index: manages the index, i.e. creates, updates and deletes indexes;
(5) org.apache.lucene.search: implements query matching; when called, it retrieves the matching documents from the index files;
(6) org.apache.lucene.analysis: analyzes the documents being indexed, filtering and tokenizing the data source as required;
(7) org.apache.lucene.queryparser: analyzes the query terms entered by the user and builds a suitable Query; it is the query analyzer.
Java is an object-oriented language with strong platform independence, so Lucene's index file format is also strongly platform-independent. Data source files in formats such as Word, HTML, TXT and XML first pass through a parser, which extracts the text Lucene can process. The text is then processed by an Analyzer. Analyzer processing includes tokenization, i.e. splitting the data source at spaces or punctuation into individual words or numbers, followed by removal of stop words.
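The tokenize-then-filter steps described above can be sketched in a few lines of Python. This is not Lucene's Analyzer API. The stop list and the lower-casing step are assumptions added for illustration. Chinese text would also need a word segmenter, because it is not written with spaces between words.

```python
import re

# Hypothetical, illustrative English stop list only.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def analyze(text):
    """Split text into words and numbers at spaces and punctuation, then drop stop words."""
    tokens = re.findall(r"\w+", text.lower())  # assumption: lower-case before filtering
    return [t for t in tokens if t not in STOP_WORDS]


print(analyze("The humidity of room 3 is 45%"))  # ['humidity', 'room', '3', 'is', '45']
```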
Meanwhile Lucene system architecture mainly encapsulates three parts group by index core, external interface and foundation structure
Into this several major has all been packaged into abstract class, so that the Lucene degree of coupling is lowered, preferably embodies
The characteristics of Lucene object-oriented.
The indexing system is the core module of the Lucene search framework. Building an index converts a large amount of source data into a document form that can be searched quickly. When the data source is large, an index greatly improves search efficiency. The process by which Lucene builds an index is shown in Fig. 4.
The Analyzer is Lucene's text analyzer. It splits a character string into individual words according to set rules. It also filters out invalid words (stop words) that carry no useful information, such as "of" and "the" in English and analogous function words in Chinese. Filtering them shrinks the index files and improves retrieval efficiency. Next, Lucene Document objects and the corresponding index-field Field objects are created. A Document is a record of the data source being indexed, such as text, a character string, or a database table record. A Document can consist of several information fields, each stored in the Document as a Field. The information that needs to be searchable is added to the Document's Fields, and the Fields that need indexing are written to storage, which can be memory or disk. This completes the index-building process.
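The indexing steps described above can be mimicked with a toy in-memory inverted index in Python. These steps are analysis, Document and Field creation, and writing only the indexed fields into postings. This is not Lucene's API. The class InvertedIndex, the split between indexed and stored fields, and the sample documents are hypothetical, and the index lives only in memory.

```python
import re
from collections import defaultdict

STOP_WORDS = {"of", "the"}  # illustrative stop list


def analyze(text):
    """Tokenize at spaces/punctuation, lower-case, and drop stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


class InvertedIndex:
    """Hypothetical toy index in the spirit of the Document/Field model:
    a document is a dict of fields; `indexed` fields are analyzed into
    postings, and only `stored` fields are kept for display."""

    def __init__(self, indexed, stored):
        self.indexed, self.stored = set(indexed), set(stored)
        self.postings = defaultdict(set)  # (field, term) -> ids of matching docs
        self.docs = []                    # doc id -> stored fields

    def add_document(self, doc):
        doc_id = len(self.docs)
        self.docs.append({f: v for f, v in doc.items() if f in self.stored})
        for field in self.indexed.intersection(doc):
            for term in analyze(doc[field]):
                self.postings[(field, term)].add(doc_id)
        return doc_id

    def lookup(self, field, term):
        return sorted(self.postings.get((field, term), ()))


idx = InvertedIndex(indexed={"name", "location"}, stored={"name", "value"})
idx.add_document({"name": "humidity sensor", "location": "greenhouse of block A", "value": "45%"})
idx.add_document({"name": "temperature sensor", "location": "block A", "value": "21C"})
print(idx.lookup("location", "block"))  # [0, 1]
print(idx.lookup("location", "of"))     # [] because the stop word was never indexed
print(idx.docs[0])                      # {'name': 'humidity sensor', 'value': '45%'}
```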
In an actual implementation, once the index database has been built, search is performed against it. A Lucene index search takes the query request (Query) entered by the user, searches the existing index database, and returns the results. Lucene's retrieval process is shown in Fig. 5.
First, Lucene calls IndexSearcher, its most basic search tool, to open the index database. QueryParser then converts the query statement into an object that Lucene can query internally. When the search is complete, Lucene returns the results and displays them to the user. In Lucene releases before 3.0, which removed the class, an instance of the Hits class represents the set of search results. Lucene keeps only part of the results in the Hits set at a time. Once that part has been displayed, its space is released and the next part is shown. Not loading all results at once saves a great deal of memory.
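The search flow (parse the query, match it against the index, hand back results a page at a time) can be sketched in Python with lazy generators. These generators loosely mirror the Hits behaviour of loading results in parts. The query parser here is only a whitespace split with AND semantics, not Lucene's QueryParser syntax. The postings dictionary, keyed by term for a single field, is hypothetical sample data.

```python
from itertools import islice


def parse_query(text, stop_words=frozenset({"of", "the"})):
    """Turn free text into lower-case terms that must all match (AND)."""
    return [t for t in text.lower().split() if t not in stop_words]


def search(postings, terms):
    """Lazily yield ids of documents that contain every query term."""
    if not terms:
        return
    for doc_id in sorted(postings.get(terms[0], ())):
        if all(doc_id in postings.get(t, ()) for t in terms[1:]):
            yield doc_id


def pages(hits, size):
    """Pull at most `size` hits at a time instead of materialising them all."""
    while True:
        chunk = list(islice(hits, size))
        if not chunk:
            return
        yield chunk


postings = {"block": {0, 1, 2, 5}, "a": {0, 1, 5}, "humidity": {0, 5}}  # hypothetical
for chunk in pages(search(postings, parse_query("Block A")), size=2):
    print(chunk)
# [0, 1]
# [5]
```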
For real-time prediction, the periodic pattern of an Internet of Things entity is determined and a time window is established over a specific time period. The state value of the entity at some future time is then predicted. Depending on the periodic pattern of the Internet of Things events, three models are used: the aggregated prediction model, the single-period prediction model, and the multi-period prediction model.
Specifically, the aggregated prediction model (APM, Aggregated Prediction Model) is the simplest prediction model. It is not tailored to the periodic pattern of the selected time window, so it applies to any periodic pattern. Its principle is simple and its computational cost is small. Its predictions, however, are imprecise: it can only give the user rough guidance and is not suitable for decision analysis in the Internet of Things.
If an event repeats with an obvious period L, then different time points that share the same offset within period L are likely to have the same value. The single-period prediction model (SPM, Single-Period Prediction Model) predicts such cases well.
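The text does not give formulas for APM or SPM. The Python sketch below therefore makes two assumptions. APM is taken to be the mean of the time window. SPM is taken to predict the value at the same offset one or more whole periods earlier. The function names and sample data are hypothetical.

```python
def aggregated_prediction(window):
    """Assumed APM-style estimate: the window mean. It ignores any periodicity,
    which is why it can only give rough guidance."""
    return sum(window) / len(window)


def single_period_prediction(history, period, horizon):
    """Assumed SPM-style estimate: the observed value at the same offset within
    the period, stepping back whole periods until the index is in the history."""
    if len(history) < period:
        raise ValueError("need at least one full period of history")
    target = len(history) - 1 + horizon   # index of the time being predicted
    periods_back = -(-horizon // period)  # ceil(horizon / period)
    return history[target - periods_back * period]


history = [10, 20, 30, 40] * 3          # hypothetical series with obvious period L = 4
print(aggregated_prediction(history))   # 25.0
print([single_period_prediction(history, 4, h) for h in (1, 2, 3, 4, 5)])
# [10, 20, 30, 40, 10]
```

The contrast matches the text: the window mean (25.0) is the same for every future time, while the period-aware estimate follows the repeating pattern.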
The entity states sensed by sensors are often influenced by a mixture of several periods. For such entities, the multi-period prediction model (MPM, Multi-period Prediction Model) achieves better prediction, at the cost of more period parameters. MPM discovers periodic events with a convolution-based period computation and gives accurate predictions for them. However, its convolution-based period discovery involves several FFT and inverse FFT computations. This makes it computationally expensive and time-consuming, and therefore unsuitable for Internet of Things applications with strict real-time requirements.
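The patent does not give its exact period-discovery algorithm. One common convolution-based approach computes the autocorrelation of the series as ifft(|fft(x)|^2) and picks the lag where it peaks. The pure-Python sketch below does this with a small radix-2 FFT and shows where the FFT and inverse-FFT cost comes from. The function names, the normalisation and the 0.9 tolerance are assumptions. The sketch finds a single dominant period and does not show how MPM combines several periods.

```python
import cmath


def fft(xs):
    """Recursive radix-2 Cooley-Tukey FFT; len(xs) must be a power of two."""
    n = len(xs)
    if n == 1:
        return [complex(xs[0])]
    even, odd = fft(xs[0::2]), fft(xs[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(xs):
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / N."""
    n = len(xs)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in xs])]


def autocorrelation(series):
    """Linear autocorrelation of the mean-removed series, computed as
    ifft(|fft(x)|^2); zero-padding to >= 2n prevents circular wrap-around."""
    n = len(series)
    mean = sum(series) / n
    size = 1
    while size < 2 * n:
        size *= 2
    spectrum = fft([x - mean for x in series] + [0.0] * (size - n))
    power = [v * v.conjugate() for v in spectrum]
    return [v.real for v in ifft(power)[:n]]


def dominant_period(series, min_lag=2, tolerance=0.9):
    """Hypothetical helper: smallest lag whose normalised autocorrelation is near the best."""
    ac = autocorrelation(series)
    n = len(series)
    # Divide by the number of overlapping samples so long lags are not penalised.
    score = {lag: ac[lag] / (n - lag) for lag in range(min_lag, n // 2)}
    best = max(score.values())
    if best <= 0:
        return None  # no positive correlation at any lag: treat as aperiodic
    # Multiples of the true period score about as high, so prefer the smallest lag.
    return min(lag for lag, s in score.items() if s >= tolerance * best)


print(dominant_period([1, 3, 2, 0, 4] * 8))  # 5
print(dominant_period([5, 1, 1] * 10))       # 3
```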
For real-time exchange, the hard-disk files of the whole computer are scanned, a virtualized information mapping is established, and file information is accessed directly. Data query and exchange require indexing and caching techniques. The indexing techniques include clustered indexes and non-clustered indexes. A clustered index stores information such as databases and tables in one pass, in index order. This technique is used frequently and is highly practical, but it can neither retrieve nor display newly added data. A non-clustered index can display newly added data, finds data quickly, and does not interfere with data modification. Under the file buffering strategy, file buffering is done through temporary tables. Caching mainly allows data to be accessed at high frequency within a short time, which speeds up data queries. For example, when basic Internet of Things information is queried, the data can be cached on the server side. This provides users with the data they need, avoids repeated queries for the same data, and reduces the frequency of database access. Query response efficiency is thereby improved as much as possible.
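The server-side caching described above is commonly implemented as a cache-aside lookup with an expiry time. The Python sketch below assumes a time-to-live policy, which the text does not specify. The class TTLCache and the stand-in query_db function are hypothetical. An injectable clock keeps the example deterministic.

```python
import time


class TTLCache:
    """Hypothetical cache-aside wrapper: repeated look-ups within ttl_s seconds
    are served from server memory instead of querying the database again."""

    def __init__(self, loader, ttl_s, clock=time.monotonic):
        self.loader, self.ttl_s, self.clock = loader, ttl_s, clock
        self._entries = {}  # key -> (expiry time, cached value)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                 # cache hit: no database access
        value = self.loader(key)            # miss or expired: query the database
        self._entries[key] = (now + self.ttl_s, value)
        return value


db_calls = []


def query_db(entity_id):  # hypothetical stand-in for a real database query
    db_calls.append(entity_id)
    return {"id": entity_id, "type": "humidity sensor"}


fake_now = [0.0]
cache = TTLCache(query_db, ttl_s=30, clock=lambda: fake_now[0])
cache.get("sensor-a")
cache.get("sensor-a")   # served from the cache
fake_now[0] = 31.0      # past the time-to-live
cache.get("sensor-a")   # queried again
print(len(db_calls))    # 2
```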
Meanwhile file buffering strategy file buffering also contains both of which, existing row buffering pattern, there is table buffering mould again
Formula.Be expert in buffer mode, mainly data message handled in real time, read in data message, in transmitting procedure, it is necessary to
Waste more times.With the continuous development of technology of Internet of things, this row buffering pattern has not adapted to technology hair at this stage
The demand of exhibition, and table buffering pattern belongs to a kind of disposable data message mathematics method, processing data information speed ratio is very fast, by
In substantial amounts of computer memory space can be occupied when data message calculates, computer waste of storage space can be caused, if meter
Calculation machine memory space is little, and this mode is difficult to be widely adopted.Therefore, it is necessary to further study file buffering strategy, find
A kind of better information way to play for time, seeks to row buffering pattern and table buffering pattern being used in combination with, is using table
During buffer mode, data buffering information can be deleted in time, set some intelligent, automation data to delete program, so
The utilization ratio of file data can not only be improved, moreover it is possible to save the memory space of computer.
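The text does not specify how row and table buffering are combined. One possible reading is sketched below in Python. Rows are accepted one at a time, handed on as whole batches, and the buffer is cleared immediately after each hand-off. The class HybridBuffer and the batch size are hypothetical.

```python
class HybridBuffer:
    """Hypothetical combination of row and table buffering: rows arrive one at
    a time, are processed in whole batches, and the buffer is then cleared."""

    def __init__(self, sink, max_rows):
        self.sink, self.max_rows = sink, max_rows
        self._rows = []

    def append(self, row):
        self._rows.append(row)
        if len(self._rows) >= self.max_rows:
            self.flush()

    def flush(self):
        if self._rows:
            self.sink(list(self._rows))  # process the whole "table" at once
            self._rows.clear()           # automatic deletion frees the buffer immediately


batches = []
buf = HybridBuffer(batches.append, max_rows=3)
for reading in range(7):
    buf.append(reading)
buf.flush()       # push out the final partial batch
print(batches)    # [[0, 1, 2], [3, 4, 5], [6]]
```

Memory use is bounded by max_rows rather than by the full data set, which is the trade-off between the two modes that the text describes.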
The temporary-table technique is defined relative to traditional table processing. In traditional table processing, tables are operated on directly: they are connected and manipulated through table join methods to select the tables that best match the user's information needs. By comparison, traditional tables are used inefficiently and have defects in stability, integrity and other respects. The temporary table is a new way of processing tables. Data can be processed promptly and entered into a temporary table, after which operations can be performed quickly on the whole temporary table. This effectively improves data utilization, and the data in a temporary table are also more secure.
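The following Python sketch illustrates the temporary-table idea with SQLite's TEMP tables. The rows one request needs are staged in a table private to the connection and processed there as a whole, leaving the base table untouched. The table names and sample data are hypothetical, and SQLite stands in for whatever database the scheme uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (entity TEXT, value REAL)")  # hypothetical base table
conn.executemany(
    "INSERT INTO readings (entity, value) VALUES (?, ?)",
    [("sensor-a", 45.0), ("sensor-b", 30.0), ("sensor-a", 47.0)],
)

# Stage the rows one request needs in a TEMP table. It is private to this
# connection, and SQLite drops it automatically when the connection closes.
conn.execute("CREATE TEMP TABLE staged (value REAL)")
conn.execute("INSERT INTO staged (value) SELECT value FROM readings WHERE entity = ?",
             ("sensor-a",))
average = conn.execute("SELECT AVG(value) FROM staged").fetchone()[0]
print(average)  # 46.0
```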
The analysis of real-time information exchange strategies for large data volumes based on the Internet of Things shows that the Internet of Things is widely used in everyday life and brings people great convenience. Exchanging large volumes of real-time information makes it possible to quickly update the information exchange systems of traditional Internet of Things technology and to improve the efficiency of information exchange. It also gradually achieves the sharing and efficient use of data resources.
The above description and the accompanying drawings show that the present invention has the following advantages:
1. With automatic analysis of Internet of Things real-time data, the sensors attached to each entity perceive the entity's state and transmit the perceived information over a wireless network to the object database.
2. The information in the database is then published to web pages in real time, and the data on the web pages reside in the application layer of the three-layer Internet of Things architecture. The Internet of Things information in the application layer is collected and a search framework is built over it, so that Internet of Things entity information can be searched in real time.
3. Search results can be sent directly to the user or passed to the prediction module, which builds prediction models and processes the results. The predicted results are more meaningful in practice than the direct search results. They are then sent to the user and provide guidance for the user's decisions.
4. Entities that are in a particular state can be searched for. The real-time state of a specific entity can also be searched and its state after a certain time predicted, so that information useful to the user can be obtained.
The above is only a preferred embodiment of the present invention and is not intended to limit it. It should be noted that a person of ordinary skill in the art may make several improvements and modifications without departing from the technical principles of the present invention. Such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.
Claims (8)
1. An automatic analysis method for Internet of Things-oriented real-time data, characterized by comprising the following steps:
Step 1, entity information collection and acquisition, processed by a real-time information acquisition method based on Watir and Nokogiri;
Step 2, real-time data management;
Step 3, real-time data statistics;
Step 4, real-time entity information search;
Step 5, real-time prediction;
Step 6, real-time exchange.
2. The automatic analysis method for Internet of Things-oriented real-time data according to claim 1, characterized in that: in step 1, a page-loading module, an HTML-acquisition module, an HTML-parsing module and a data storage module are used, the page-loading module and the data storage module providing the interfaces to the outside;
the page-loading module loads an external page through its network link address and passes it to the HTML-acquisition module; the HTML-acquisition module obtains the HTML document of the dynamic page passed from the page-loading module and passes the obtained HTML document to the HTML-parsing module;
the HTML-parsing module parses the required text content out of the obtained HTML document by means of a positioning technique and passes the parsed content to the data storage module for storage.
3. The automatic analysis method for Internet of Things-oriented real-time data according to claim 1, characterized in that: in step 2, a distributed database is used for management, the distributed database supporting fully centralized global control, fully distributed global control and partially distributed global control;
the distributed database consists of local-site database management systems, a global database management system, a global catalog and communication management.
4. The automatic analysis method for Internet of Things-oriented real-time data according to claim 1, characterized in that: in step 3, the data in the real-time database are uploaded to the application layer of the three-layer Internet of Things architecture, where the statistics are computed.
5. The automatic analysis method for Internet of Things-oriented real-time data according to claim 1, characterized in that: in step 4, Lucene, an open-source toolkit written in Java, is used, its source code being divided into 7 modules, including:
the org.apache.lucene.document module: turns the source records supplied by the user into Documents, and manages Documents when the index is stored;
the org.apache.lucene.util module: provides common utility classes and constant classes;
the org.apache.lucene.store module: provides storage management for index files, specific fields being selectable as stored or not stored;
the org.apache.lucene.index module: manages the index, i.e. creates, updates or deletes indexes;
the org.apache.lucene.search module: implements query matching, retrieving the matching documents from the index files when called;
the org.apache.lucene.analysis module: analyzes the documents being indexed, filtering and tokenizing the data source as required;
the org.apache.lucene.queryparser module: analyzes the query terms entered by the user and provides a suitable Query, acting as the query analyzer.
6. The automatic analysis method for Internet of Things-oriented real-time data according to claim 1, characterized in that: in step 4, each data source file involved first passes through a parser that extracts the text Lucene can process, and the text is then processed by the Analyzer;
the Analyzer processing includes tokenization, i.e. splitting the data source at spaces or punctuation into individual words or numbers, followed by removal of stop words;
the Analyzer is Lucene's text analyzer, which splits a character string into individual words according to set rules and filters out the invalid words in the string, the invalid words including "of" and "the" in English and analogous function words in Chinese, which carry no useful information; afterwards, Lucene index Document objects and the corresponding index-field Field objects are created.
7. The automatic analysis method for Internet of Things-oriented real-time data according to claim 1, characterized in that: in step 5, the periodic pattern of the Internet of Things entity is determined, a time window is established over a specific time period, and the state value of the entity at a certain future time is predicted; depending on the periodic pattern of the Internet of Things events, the prediction models include an aggregated prediction model, a single-period prediction model and a multi-period prediction model.
8. The automatic analysis method for Internet of Things-oriented real-time data according to claim 1, characterized in that: in step 6, the hard-disk files of the whole computer are scanned, a virtualized information mapping is established, and file information is accessed directly; data query and exchange require indexing and caching techniques;
the indexing techniques include clustered indexes and non-clustered indexes, a clustered index storing information such as databases and tables in one pass in index order, and a non-clustered index being able to display newly added data;
under the file buffering strategy, file buffering is performed by means of temporary tables.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710874448.XA CN107733694A (en) | 2017-09-25 | 2017-09-25 | The automatic analysis method of internet of things oriented real time data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107733694A true CN107733694A (en) | 2018-02-23 |
Family
ID=61207921
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710874448.XA Pending CN107733694A (en) | 2017-09-25 | 2017-09-25 | The automatic analysis method of internet of things oriented real time data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107733694A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109635264A (en) * | 2018-11-29 | 2019-04-16 | Shanghai Bilibili Technology Co., Ltd. | Game service datamation statistical method, system and storage medium |
CN111309399A (en) * | 2020-02-26 | 2020-06-19 | Beijing Si-Tech Information Technology Co., Ltd. | Method, system, medium and device for starting easy-to-ask native client |
CN111490886A (en) * | 2019-01-25 | 2020-08-04 | Beijing Shuan Xinyun Information Technology Co., Ltd. | Network data processing method and system |
CN112688711A (en) * | 2021-02-02 | 2021-04-20 | Shenzhen Anpu Xinda Software Technology Service Co., Ltd. | Food detection management system based on cloud computing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080086627A1 (en) * | 2006-10-06 | 2008-04-10 | Steven John Splaine | Methods and apparatus to analyze computer software |
CN103092936A (en) * | 2013-01-08 | 2013-05-08 | North China Electric Power University (Baoding) | Real-time information acquisition method of dynamic page of Internet of Things |
CN104615748A (en) * | 2015-02-12 | 2015-05-13 | North China Electric Power University (Baoding) | Watir-based (web application testing in ruby based) internet-of-things web event processing method |
CN106844538A (en) * | 2016-12-30 | 2017-06-13 | The 54th Research Institute of China Electronics Technology Group Corporation | A kind of many attribute sort methods and device for being applied to Internet of Things search |
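Non-Patent Citations (3)
Title |
---|
Mao Jielei: "Analysis of real-time information exchange strategies for large data volumes based on the Internet of Things", Information and Computer * |
Shen Danfeng: "Research on real-time entity search technology for the Internet of Things", China Excellent Master's Theses Database * |
Weng Zuquan: "Analysis and research on database technology for massive data processing based on the Internet of Things", Internet of Things Technologies * |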
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103631596B (en) | Business object data typing and the configuration device and collocation method for updating rule | |
CN101997927B (en) | A kind of method and system of WEB platform data caching | |
CN103310012A (en) | Distributed web crawler system | |
Li et al. | An active crawler for discovering geospatial web services and their distribution pattern–A case study of OGC Web Map Service | |
CN105045932B (en) | A kind of data page querying method based on descending storage | |
CN107733694A (en) | The automatic analysis method of internet of things oriented real time data | |
CN101609460B (en) | Searching method of supporting isomeric geoscientific data resources and searching system | |
CN103092936B (en) | A kind of Internet of Things dynamic page real-time information collection method | |
Zhang et al. | Building information modeling–based cyber-physical platform for building performance monitoring | |
CN106021583A (en) | Statistical method and system for page flow data | |
CN108228743A (en) | A kind of real-time big data search engine system | |
Ding et al. | SeaCloudDM: a database cluster framework for managing and querying massive heterogeneous sensor sampling data | |
CN115827907B (en) | Cross-cloud multi-source data cube discovery and integration method based on distributed memory | |
Liu et al. | Sensor information retrieval from Internet of Things: Representation and indexing | |
CN109710767A (en) | Multilingual big data service platform | |
CN102868601B (en) | Routing system related to network topology based on graphic configuration database businesses | |
Tomasic et al. | Improving access to environmental data using context information | |
CN115168474B (en) | Internet of things central station system building method based on big data model | |
Spanos et al. | SensorStream: A semantic real–time stream management system | |
da Silva et al. | Providing geographic-multidimensional decision support over the Web | |
Shen et al. | A Catalogue Service for Internet GIS ervices Supporting Active Service Evaluation and Real‐Time Quality Monitoring | |
Brody et al. | Digitometric services for open archives environments | |
KR20140104544A (en) | System and method for building of semantic data | |
CN109542933A (en) | A kind of data base query method, device, equipment and medium | |
Ben-El-Kezadri et al. | XAV: a fast and flexible tracing framework for network simulation |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180223 |