CN107733694A - Automatic analysis method for Internet of Things real-time data - Google Patents

Info

Publication number
CN107733694A
Authority
CN
China
Prior art keywords
data
internet
index
real
things
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710874448.XA
Other languages
Chinese (zh)
Inventor
钮立明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Nothing Worry Technology Co Ltd
Original Assignee
Suzhou Nothing Worry Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Nothing Worry Technology Co Ltd
Priority to CN201710874448.XA
Publication of CN107733694A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G06F16/986 Document structures and storage, e.g. HTML extensions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Abstract

The present invention relates to an automatic analysis method for Internet of Things real-time data, comprising the following steps: entity information collection and acquisition, performed with a real-time information acquisition method based on Watir and Nokogiri; real-time data management; real-time data statistics; real-time entity information search; real-time prediction; and real-time exchange. Search results can thus be sent directly to the user, or passed to a prediction module that builds a prediction model. The predicted results are of more practical value than the raw search results; they are then sent to the user, where they help guide the user's decisions. The method can search for entities that meet a particular state, look up the real-time state of a specific entity, and predict that entity's state after a certain time, so that information useful to the user can be obtained.

Description

Automatic analysis method for Internet of Things real-time data
Technical field
The present invention relates to an automatic analysis method, and in particular to an automatic analysis method for Internet of Things real-time data.
Background art
With the continuous progress of science and technology, Internet of Things technology has been widely applied in many fields, including smart homes, intelligent traffic control, intelligent firefighting, environmental monitoring, logistics and express delivery, food traceability, industrial monitoring, hotel management, and health care. More and more people want to obtain Internet of Things data conveniently and quickly over the network, and to observe and process the data at any time in order to make corresponding decisions.
An Internet of Things real-time data automatic analysis system uses technologies such as radio frequency identification, sensors, and network interconnection to identify, perceive, monitor, track, control, and manage the various objects of the physical world. It enables efficient, intelligent, and harmonious information exchange between people and things, between people and the environment, and between things, and ultimately fuses the physical world with the information world through intelligent decision and control technology.
Traditional information networks take semantics as their core. The Internet of Things (IoT) builds on this by embedding sensing devices with short-range wireless communication capability into all kinds of appliances, so that the data these appliances contribute can be collected automatically by the system, extending the scope of physical-world information. IoT technology has developed rapidly in recent years and has achieved notable results in industrialization. However, IoT sensing devices use different hardware platforms, operating systems, databases, and middleware, and the network environments they rely on also differ. As a result, devices cannot communicate freely and application platforms are hard to share and reuse, which makes IoT application development difficult and leaves systems with high coupling, poor extensibility, and difficulty integrating third-party resources. These disadvantages have become a bottleneck for IoT development and make large-scale application very difficult. A more open and flexible system architecture is therefore urgently needed so that sensor data and control functions can be shared more easily. With the rapid development and widespread use of the Internet, all end users can share information and applications through the Web. Because Web technology provides the basic platform for delivering services to every end user, it is the best technical choice for sharing heterogeneous IoT resources. On this basis, IoT and Web technology are combined into an open system framework called the Web of Things (WoT). WoT uses Web design concepts and technology to bring the data and services contributed by all kinds of IoT sensing devices into the Web information space, so that data and services can be accessed and aggregated across different platforms.
As the Internet of Things receives growing attention at home and abroad, a variety of IoT search technologies have been implemented for different application environments: for example, the Snoggle/Microsearch systems, which suit centralized management; the MAX system, which is not suitable for large-scale environments; the OCH (Objects Calling Home) system, which has a high computational cost; and the GSN (Global Sensor Networks) system, which supports data stream processing. Depending on the application scenario, these systems can achieve IoT search to a certain degree. However, the Internet of Things contains tens of billions of physical entities, and this mass of entities is characterized by heterogeneity, entity security, dynamic data, and real-time state, so these systems still cannot fully handle real-time entity search in the Internet of Things.
In view of the above defects, the designer has actively pursued research and innovation to create an automatic analysis method for Internet of Things real-time data with greater industrial value.
Summary of the invention
To solve the above technical problems, the object of the present invention is to provide an automatic analysis method for Internet of Things real-time data.
The automatic analysis method for Internet of Things real-time data of the present invention comprises the following steps:
Step 1, entity information collection and acquisition, performed with a real-time information acquisition method based on Watir and Nokogiri; Step 2, real-time data management; Step 3, real-time data statistics; Step 4, real-time entity information search; Step 5, real-time prediction; Step 6, real-time exchange.
Further, in the above automatic analysis method for Internet of Things real-time data, Step 1 is carried out by a page loading module, an HTML acquisition module, an HTML parsing module, and a data storage module.
The page loading module and the data storage module provide the interfaces to the outside.
The page loading module loads the external page from its network link address and passes it to the HTML acquisition module. The HTML acquisition module obtains the HTML document of the dynamic page passed over by the page loading module and passes that document to the HTML parsing module.
The HTML parsing module uses element location to parse the required text content out of the obtained HTML document and sends the parsed content to the data storage module for storage.
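As an illustration only, the four-module flow described above could be sketched in Ruby as follows. The Pipeline struct, the stand-in lambdas, and the sample HTML string are assumptions made for this sketch and do not appear in the source; in the described method the loading and parsing stages would be backed by Watir and Nokogiri rather than by a regular expression.

```ruby
# Hypothetical four-stage pipeline: load -> fetch HTML -> parse -> store.
# Each stage is a plain lambda so the sketch runs without a browser.
Pipeline = Struct.new(:loader, :fetcher, :parser, :store) do
  def run(url)
    page = loader.call(url)    # page loading module
    html = fetcher.call(page)  # HTML acquisition module
    text = parser.call(html)   # HTML parsing module
    store.call(text)           # data storage module
    text
  end
end

saved = []
pipeline = Pipeline.new(
  ->(url)  { { url: url, body: '<div class="shidu">42%</div>' } }, # stand-in page
  ->(page) { page[:body] },
  ->(html) { html[/<div class="shidu">(.*?)<\/div>/, 1] },          # stand-in for element location
  ->(text) { saved << text }
)

result = pipeline.run('http://10.14.11.100/sh/index.jsp')
```

Keeping each module behind a single callable mirrors the statement that only the loading and storage modules face the outside: swapping the storage lambda for a file or database writer would not affect the other stages.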
Further, in the above automatic analysis method for Internet of Things real-time data, in Step 2 a distributed database takes part in management. The distributed database supports three control modes: globally centralized control, globally decentralized control, and globally partially decentralized control.
The distributed database consists of local database management systems, a global database management system, a global data dictionary, and communication management.
Further, in the above automatic analysis method for Internet of Things real-time data, in Step 3 the data in the real-time database is uploaded to the application layer of the three-layer Internet of Things structure, where the statistics are completed.
Further, in the above automatic analysis method for Internet of Things real-time data, in Step 4 the search system is built with Lucene, an open-source toolkit written in Java, whose source code is divided into 7 modules:
org.apache.lucene.document: encapsulates the source records provided by the user as Document objects and manages documents while the index is stored;
org.apache.lucene.util: provides supporting utility classes and constant classes;
org.apache.lucene.store: provides storage management for index files; specific fields can be selected to be stored or not stored;
org.apache.lucene.index: manages the index; it is used to create, update, or delete indexes;
org.apache.lucene.search: implements query matching; when called, it retrieves the matching documents according to the index conditions;
org.apache.lucene.analysis: analyzes the documents to be indexed, filtering and tokenizing the data source as required;
org.apache.lucene.queryparser: analyzes the user's input query terms and produces a suitable query; it is the query analyzer.
Further, in the above automatic analysis method for Internet of Things real-time data, in Step 4 each data source file involved first passes through a parser, which extracts the text that Lucene can process, and is then processed by an Analyzer.
Analyzer processing includes tokenization: the data source is cut at spaces or punctuation into individual words or numbers, and optional words are then removed.
The Analyzer is the text analyzer in Lucene. It splits a character string into individual words according to set rules and filters out the invalid words in the string. Invalid words include "of" and "the" in English and equivalent function words in Chinese; these words carry no useful information. Afterwards, the Lucene index document (Document) objects and the corresponding index field (Field) objects are created.
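The tokenize-then-filter behaviour attributed to the Analyzer can be illustrated with a small Ruby sketch. This is not Lucene code: the analyze function and the STOP_WORDS list are hypothetical names chosen for the example, and the stop-word list is illustrative rather than Lucene's actual set.

```ruby
# Minimal stand-in for the tokenize-then-filter behaviour described above.
# STOP_WORDS is an illustrative list, not Lucene's actual stop-word set.
STOP_WORDS = %w[of the a an and].freeze

def analyze(text)
  text.downcase
      .split(/[\s[:punct:]]+/)   # cut at spaces and punctuation
      .reject(&:empty?)          # drop the empty token left by leading punctuation
      .reject { |token| STOP_WORDS.include?(token) }
end

tokens = analyze('The temperature of the room: 25 degrees.')
```

In this sketch the remaining tokens are the ones that would be passed on to index construction; words such as "the" and "of" never reach the index.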
Further, in the above automatic analysis method for Internet of Things real-time data, in Step 5 the periodic pattern of an Internet of Things entity is determined, a time window is established for a specific time period, and the state value of the entity at a given position at a future time is predicted. Depending on the periodic pattern of the Internet of Things events, the prediction model is an aggregate prediction model, a single-period prediction model, or a multi-period prediction model.
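The source does not give formulas for the prediction models, so the following Ruby sketch is only one possible reading of a single-period model: the forecast for a future sample is taken as the mean of past samples that share its phase, limited to a window of recent periods. The function name predict_single_period, its parameters, and the sample data are all assumptions made for illustration.

```ruby
# Hypothetical single-period predictor: the forecast for a future sample is the
# mean of past samples that share its phase within the chosen time window.
def predict_single_period(history, period, steps_ahead, window_periods: 3)
  target = history.length - 1 + steps_ahead   # index of the future sample
  phase_matches = []
  index = target - period
  while index >= 0 && phase_matches.length < window_periods
    phase_matches << history[index] if index < history.length
    index -= period
  end
  return nil if phase_matches.empty?

  phase_matches.sum.to_f / phase_matches.length
end

# Two full cycles of readings with a period of 4 samples per cycle.
history = [20, 24, 28, 22, 21, 25, 29, 23]
forecast = predict_single_period(history, 4, 2)
```

Here the forecast two steps ahead averages the readings at the same phase in the two previous cycles (25 and 24). A multi-period model could, under the same assumptions, combine several such forecasts computed with different period lengths.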
Still further, in the above automatic analysis method for Internet of Things real-time data, in Step 6 the files on the computer's hard disk are scanned and a virtualized information mapping relationship is established, so that file information can be accessed directly. Data query and exchange require indexing and caching technologies.
The indexing technology may include clustered indexes and non-clustered indexes. A clustered index stores information such as databases and tables in one pass, in index order, while a non-clustered index can reflect newly added data.
The file caching strategy is implemented with temporary tables.
Through the above scheme, the present invention has at least the following advantages:
1. With the automatic analysis technology for Internet of Things real-time data, the sensors attached to each entity perceive the state of that entity, and the perceived information is transmitted over a wireless network to the object database.
2. The information in the database is published in real time on web pages, and the data on the web pages sits at the application layer of the three-layer Internet of Things structure. The Internet of Things information at the application layer is collected, a search framework is built over the collected data, and Internet of Things entity information is searched in real time.
3. Search results can be sent directly to the user, or passed to the prediction module, which builds a prediction model. The predicted results are of more practical value than the raw search results; they are then sent to the user, where they help guide the user's decisions.
4. Entities that meet a particular state can be searched for. In addition, the real-time state of a specific entity can be looked up and its state after a certain time predicted, so that information useful to the user can be obtained.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and practiced according to the specification, preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a schematic diagram of the Internet of Things real-time data acquisition framework.
Fig. 2 is a flow chart of the Watir and Nokogiri real-time information acquisition software.
Fig. 3 is a schematic diagram of the distributed real-time database architecture.
Fig. 4 is a schematic diagram of the process by which Lucene builds an index.
Fig. 5 is a schematic diagram of the Lucene retrieval process.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the present invention but do not limit its scope.
The automatic analysis method for Internet of Things real-time data shown in Figs. 1 to 5 is distinctive in that it comprises the following steps:
First, entity information is collected and acquired with the real-time information acquisition method based on Watir and Nokogiri. The reason is that the real-time entity information on an Internet of Things dynamic page is obtained from the client side: any client connected to the Internet can acquire and store dynamic entity information in real time through Watir-based acquisition of Internet of Things web page information. This acquisition method needs neither access to the server nor a connection to the database on the server.
Watir is an open-source tool for automated web page testing. It is implemented in Ruby, and its full name is "Web Application Testing in Ruby". It is compact and flexible and provides many functions. Watir can simulate a browser to carry out various operations, such as clicking controls and buttons and loading web pages. Watir scripts run directly in the browser. This technical solution uses Watir to load the Internet of Things dynamic page and to locate the required page elements.
Among the Ruby libraries for parsing XML and HTML, the best known are Hpricot and Nokogiri. Nokogiri is much faster than the widely used Hpricot. Benchmark tests show that Nokogiri is 7 times as fast as Hpricot at loading XML documents, 5 times as fast at XPath searches, and 1.62 times as fast at CSS selector searches. Nokogiri can parse HTML/XML files and supports both XPath and CSS selectors. By parsing HTML documents with Nokogiri, this technical solution can reach very high speed.
Compared with the traditional MetaSeeker method, this method can capture data periodically at a predefined cycle. Dynamic real-time data on web pages whose update interval is no shorter than 13 milliseconds can be captured without omission, and the data can be stored in user-defined formats, so the method is highly scalable.
Next comes real-time data management. Specifically, massive real-time data requires a real-time database to manage it. First, a distributed in-memory database system contains multiple node databases, each of which retains a degree of autonomy. Combining data distribution and database coordination with the transparency of the data distribution improves database balance and better meets the requirement of processing massive Internet of Things data in real time. Second, distributed storage technology is combined with cloud computing technology to form distributed real-time database technology: on the basis of a cloud service platform, the data storage components and data retrieval components of multiple data collectors and data servers provide data storage and data detection services, which better meets the requirements of massive Internet of Things data processing.
Then real-time data statistics are carried out. Here the search engine plays the leading role in entity search: users query entity information and related entities through an entity-oriented search engine. This technical solution mainly introduces Lucene, the core search framework used in real-time Internet of Things entity search, and builds the Lucene search framework. Through the entity search function of the Internet of Things, users query the entity information they need from entity information that changes dynamically in real time. The attributes of a physical entity include static attributes and dynamic attributes, and these attributes are also closely related to context information. Internet of Things entity data is characterized by massive volume, heterogeneity, dynamics, real-time behaviour, compatibility, and security.
Then real-time entity information search and real-time prediction are carried out. This approach is used because entity states in the Internet of Things are strongly real-time and highly dynamic. Entity information obtained in real time by search and returned to the user is highly transient and gives the user little guidance. Only by analyzing the states of the particular search results, and predicting from them the information at some future point in time, can the method help guide the user's decisions.
Finally, real-time exchange processing is carried out.
When Internet of Things data is exchanged, file information has to be accessed through the Winesap function, and the data exchange is carried out mainly on the computer's hard disk. In practical data file processing, however, this file operation method is inefficient, and as the big-data era develops further, the processing of all kinds of file data places more emphasis on efficiency. Further analysis of file access techniques therefore leads to a new file access method, the memory-mapped file method, which can quickly process files containing large amounts of data. The corresponding data file is obtained, a corresponding memory pointer is set on the computer's hard disk, and the corresponding file access permissions are set, which ensures the security of file access. This saves time, expands the memory space of the computer system, and improves the efficiency of data file processing.
In a preferred embodiment of the invention, entity information collection and acquisition is carried out by a page loading module, an HTML acquisition module, an HTML parsing module, and a data storage module. The page loading module and the data storage module provide the interfaces to the outside. The page loading module loads the external page from its network link address and passes it to the HTML acquisition module. The HTML acquisition module obtains the HTML document of the dynamic page passed over by the page loading module and passes that document to the HTML parsing module. The HTML parsing module uses element location to parse the required text content out of the obtained HTML document and sends the parsed content to the data storage module for storage. The various Internet of Things applications are extended from the data storage module.
In actual implementation, with reference to Fig. 1:
(1) Page and frame loading module.
The Internet of Things dynamic page involved contains five blocks: smart home, temperature and humidity, data browsing, temperature strategy, and illumination strategy. The corresponding HTML fragment is as follows:
<div class="nav">
  <ul>
    <li class="mainlevel" id="mainlevel_01" jQuery1355212543380="2">
      <a onclick="hrefControl('body.jsp');" href="javascript:void(0);">Smart home</a>
    </li>
    <li class="mainlevel" id="mainlevel_02" jQuery1355212543380="4">
      <a onclick="hrefControl('jsp/wsd.jsp');" href="javascript:void(0);">Temperature and humidity</a>
    </li>
    <li class="mainlevel" id="mainlevel_03" jQuery1355212543380="6">
      <a onclick="hrefControl('servlet/DataServletPage=1');" href="javascript:void(0);">Data browsing</a>
    </li>
    <li class="mainlevel" id="mainlevel_04" jQuery1355212543380="8">
      <a onclick="hrefControl('servlet/tacticServletPrefix=temperature');" href="javascript:void(0);">Temperature strategy</a>
    </li>
    <li class="mainlevel" id="mainlevel_05" jQuery1355212543380="10">
      <a onclick="hrefControl('servlet/tacticServletPrefix=beam');" href="javascript:void(0);">Illumination strategy</a>
    </li>
    <li class="mainlevel" id="mainlevel_06" jQuery1355212543380="12"></li>
    <li class="mainlevel" id="mainlevel_05" style="background-image: none; background-attachment: scroll; background-repeat: repeat; background-position-x: 0%; background-position-y: 0%; background-color: transparent;" jQuery1355212543380="14" />
  </ul>
</div>
<iframe id="body" src="jsp/wsd.jsp" width="643" height="568" frameBorder="0" marginWidth="5"></iframe>
Meanwhile the various operations of Watir energy simulation browsers are relied on, the present invention is using Watir loading Internet of Things dynamics The page.Specifically, " Watir is passed through::Browser.new " creates a browser instances, so as to carry out the behaviour of simulation browser Make.By " goto (' http:// 10.14.11.100/sh/index.jsp') " method is loaded into smart home dynamic page. Wherein http:// 10.14.11.100/sh/index.jsp is the chained address of internet of things intelligent household dynamic page.For Other Internet of Things dynamic pages, the link only need to be replaced with to other chained addresses.
The data for needing to gather when actually implementing are located at below humiture module, after Internet of Things dynamic page is entered, Real-time saltus step data under humiture module can just be browsed by having to carry out page turn over operation.Page turn over operation does not change page chain ground connection Location, and content of pages changes.In actual browser, performed by mouse and click on page turn over operation.But simulated in Watir clear Look at during device, clicking operation is only simulated by " onclick " method, so as to realize page-turning function.Internet of Things dynamic page Five bulks in face represent that humiture module is second piece, in second " li " label, at this by five " li " nodes " onclick=" hrefControl (' jsp/wsd.jsp ') in " a " label under " li " label;" " represent clicking operation To call the js modules of the real-time saltus step data of Dynamic Announce.Hit in Watir simulation browsers point, " a " label and " li " mark Label are not supported to click on, No. id " mainlevel_02 " therefore, navigated in " li " label, so as to the under " a " label One link carries out clicking operation, using " li (:Id, ' mainlevel_02') the realization simulation of .links [0] .click " methods Click on page turn over operation.
There are frame structure, i.e. IFrame frameworks under humiture module, the js modules of real-time saltus step data are located at IFrame frames In frame structure, must be navigated to the Watir dynamic web pages loaded with Watir under IFrame frameworks could take out under framework Html document parses for Nokogiri.Watir by the normalize_specifiers in Locator classes and Matchwithspecifiers methods realize positioning, and the effect of normalize_specifiers methods is construction Specifiers, specifiers are positioning " mark " to be used of object.Matchwithspecifiers is then to judge member Whether element meets feature defined in specifiers.The technical program positions Watir by the src attributes of IFrame frameworks To " jsp/wsd.jsp ", i.e., with method " .frame (:Src, ' jsp/wsd.jsp') " positioning is realized, wherein " jsp/ Wsd.jsp " is the js modules of the real-time saltus step data of Dynamic Announce.
(2) HTML acquisition module.
Because the Internet of Things dynamic content is called from the server by JavaScript, it takes some time to appear on the client. Therefore, even after the Internet of Things dynamic page has been loaded and located, the loading time of the dynamic page still has to be taken into account in practice. The present invention controls the time spent waiting for the dynamic page to load with Watir's waiting mechanism and Ruby's sleep mechanism, i.e. with the ".wait" method and the "sleep()" method. After Watir has completed loading and location, the HTML document inside the frame is obtained with the ".frame.html" method.
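The text relies on ".wait" and a fixed "sleep()" to give the dynamic content time to arrive. As a hedged alternative, and not the mechanism described in the source, the waiting step can also be written as a polling loop that returns as soon as the content is available. The helper name wait_until and its parameters are hypothetical, and the block below simulates a page whose content appears on the third check instead of querying a real browser.

```ruby
# Poll until the block returns a truthy value or the timeout expires.
# A swapped-in polling wait, not the fixed .wait/sleep() the text describes.
def wait_until(timeout_s: 5.0, interval_s: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout_s
  loop do
    value = yield
    return value if value
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep(interval_s)
  end
end

# Simulated dynamic page: the content only "loads" on the third check.
attempts = 0
content = wait_until(timeout_s: 1.0, interval_s: 0.01) do
  attempts += 1
  attempts >= 3 ? '<div class="shidu">42%</div>' : nil
end
```

Compared with a fixed sleep, polling avoids waiting longer than necessary when the page loads quickly, at the cost of a few extra checks.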
(3) HTML parsing module.
This technical solution uses Nokogiri to parse the obtained HTML document. Nokogiri receives the HTML document through the "Nokogiri::HTML.parse" method and parses it. Nokogiri provides both XPath and CSS selectors for finding nodes in the document. The present invention uses CSS selectors to find the node holding the required content on the page and to fetch the page text at that moment. XPath is not used to locate elements because CSS locators are faster than XPath locators, especially in IE (which has no XPath parser of its own). CSS selectors can locate the elements under test very precisely. The data on the Internet of Things dynamic page is captured in a periodic loop; each capture takes little time and is stable and timely, so dynamic information can be collected in real time.
Specifically, CSS selectors fall roughly into several basic types: ID selectors (#id), class selectors (.class), type selectors (p), attribute selectors, pseudo-class selectors, and so on. These individual selectors can be combined in use, for example div#id or div:last-child. This solution uses class selectors. Because the class value "shidu" occurs more than once, the node whose class value is "shiduqu" is located first under the IFrame, and the node whose class value is "shidu" is then selected one level below it. All text content of the node whose class value is "shidu" is returned. That is, the ".css('div.shiduqu div.shidu').text" method implements the CSS selector location and returns the text content.
(4) Data storage module.
This method stores the acquired dynamic Internet of Things page information very flexibly and can store data in various forms. In Ruby, the IO class provides access to the file system and functions such as reading, writing, and deleting. The File class is a subclass of the IO class, and this technical solution uses the File class to create documents for data storage. That is, the "File.new("#{i}.txt", "w")" method creates a txt document, and the content is then written into it. Here "#{i}" is the document name, controlled by the variable i, which realizes cyclic storage: a new document is created on every pass of the loop.
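The cyclic storage described above can be sketched as follows. The helper name `store_captures`, the temporary directory, and the sample humidity strings are assumptions made for illustration; only the "#{i}.txt" naming scheme comes from the text.

```ruby
require "tmpdir"

# Write each captured text to its own numbered file ("0.txt", "1.txt", ...)
# inside dir and return the list of paths that were created.
def store_captures(dir, captures)
  captures.each_with_index.map do |text, i|
    path = File.join(dir, "#{i}.txt")
    # The block form of File.open closes the file even if the write raises.
    File.open(path, "w") { |f| f.write(text) }
    path
  end
end

Dir.mktmpdir do |dir|
  paths = store_captures(dir, ["45%", "46%", "44%"])
  paths.each { |p| puts "#{File.basename(p)}: #{File.read(p)}" }
end
```

The block form of `File.open` is used instead of a bare `File.new` so that each file handle is closed as soon as its content has been written.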
In addition, the acquired data can also be stored in a database, which makes them easy to manage. Ruby connects to database driver plug-ins through the open, unified database interface DBI (Database Interface, the database interface layer) in order to access the database and operate on its data. For the real-time information collected from Internet of Things entities, database storage can therefore better meet the demands of massive Internet of Things data. In terms of data storage, the method is thus flexible, practical, and highly extensible.
In an actual implementation, the software flow chart of the Watir and Nokogiri real-time information acquisition method is shown in Fig. 2. This technical solution captures data in a periodic loop. The period can be set freely, and the minimum period is as short as 13 milliseconds. The loop condition can also be set freely: the duration of the capture loop can be controlled by loop time, loop count, and so on. As long as the loop condition has not been reached, data capture continues until it is.
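A minimal sketch of such a periodic capture loop, assuming the loop condition is either an iteration count or an elapsed time. The method name `capture_loop` and its parameters are hypothetical, and the block stands in for the real page capture.

```ruby
# Run the block once per period until either max_count iterations have run
# or max_seconds have elapsed, whichever comes first. Returns the results.
def capture_loop(period:, max_count: nil, max_seconds: nil)
  raise ArgumentError, "give max_count or max_seconds" unless max_count || max_seconds
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  started = clock.call
  results = []
  loop do
    break if max_count && results.size >= max_count
    break if max_seconds && clock.call - started >= max_seconds
    tick = clock.call
    results << yield(results.size)
    # Sleep only for what is left of the period after the capture itself.
    remaining = period - (clock.call - tick)
    sleep(remaining) if remaining > 0
  end
  results
end

# Five captures with the 13 ms period mentioned in the text.
samples = capture_loop(period: 0.013, max_count: 5) { |i| "sample #{i}" }
puts samples.inspect
```

Subtracting the capture time from the period keeps the loop close to the requested rate even when a single capture is slow.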
For real-time data management, a distributed database is used. The distributed database supports three control modes: centralized global control, decentralized global control, and partially decentralized global control.
The distributed database consists of the local-site database management systems, a global database management system, a global catalog, and communication management. These components are responsible for creating and managing the local databases, providing site autonomy, executing local applications, providing distribution transparency, coordinating the execution of global transactions, coordinating the local database management systems (local DBMSs), ensuring the global consistency of the database, and synchronizing updates. Distributed databases can interpenetrate and combine with artificial intelligence, network communication, and parallel computing technologies, which has become a principal feature of current database development.
Meanwhile, a distributed database is the product of combining traditional database technology with network technology. A distributed database is a collection of data that is physically distributed across the nodes of a computer network but logically belongs to one system, so the system can adopt a distributed database architecture. The distributed in-memory database technology is characterized by local autonomy in physical space combined with global logical sharing, by data redundancy, by data independence, and by system transparency. In this system, the following requirements are to be met:
(1) The in-memory database of each network node keeps its autonomy;
(2) In-memory database clustering: massive data storage is handled through read/write splitting and through vertical and horizontal partitioning strategies;
(3) Multiple data partitioning modes: horizontal partitioning is carried out on top of the overall vertical partitioning, to handle the different processing that different applications and data require;
(4) The in-memory databases of the nodes coordinate with one another, so that each in-memory database can act as a server for the other nodes;
(5) Data distribution is kept transparent, reconciling the distribution of the data with coordination between the databases; combined with improved balancing among the in-memory databases, this meets the requirement of processing massive Internet of Things data in real time;
(6) In-memory database persistence: data changes in the in-memory database need to be copied to the on-disk database, and persistence is completed through a two-level database and asynchronous writes.
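Requirements (2) and (6) can be illustrated with a small sketch of horizontal partitioning plus asynchronous persistence. Everything here is an assumption made for illustration: the class name `ShardedMemoryStore`, the CRC32-based shard choice, and the plain array that stands in for the on-disk database. It is not the patent's implementation.

```ruby
require "zlib"

# Hypothetical sketch: an in-memory store split horizontally into shards,
# with changes persisted asynchronously by a background writer thread.
class ShardedMemoryStore
  def initialize(shard_count, disk_log)
    @shards = Array.new(shard_count) { {} }
    @disk_log = disk_log   # stands in for the on-disk database
    @pending = Queue.new   # changes waiting to be persisted
    @writer = Thread.new do
      # Queue#pop returns nil once the queue is closed and empty.
      while (change = @pending.pop)
        @disk_log << change
      end
    end
  end

  # CRC32 gives a shard choice that is stable across runs (unlike String#hash).
  def shard_index(key)
    Zlib.crc32(key.to_s) % @shards.size
  end

  def write(key, value)
    @shards[shard_index(key)][key] = value # fast in-memory update
    @pending << [key, value]               # persisted later, off the hot path
  end

  def read(key)
    @shards[shard_index(key)][key]
  end

  # Flush outstanding changes and stop the writer thread.
  def close
    @pending.close
    @writer.join
  end
end

disk = []
store = ShardedMemoryStore.new(4, disk)
store.write("sensor-1", 45)
store.write("sensor-2", 47)
store.close
puts store.read("sensor-1")
puts disk.size
```

Because a single writer thread drains a FIFO queue, changes reach the stand-in disk log in the order in which they were written, while callers of `write` never wait for the slower store.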
To ease implementation, as shown in Fig. 3, real-time database technology and cloud computing technology can be deeply integrated. Cloud computing center server clusters distributed around the world are used to build a distributed real-time database system whose database size is expandable and scalable and whose database management system is highly reliable and maintainable. The system includes data processing and compression, data retrieval, data storage virtualization, conflict handling, content delivery network technology, transaction scheduling, fault monitoring and recovery, load balancing, and other functions. On the basis of real-time operation, distribution, and virtualization, it realizes massive data storage, high-concurrency transaction processing, storage encryption, distributed redundant backup, dynamic system expansion, and other functions.
In the architecture of the distributed real-time database, the service components of the data collectors and of the database server nodes access the platform through the middleware interface of the distributed communication service platform and thereby interact with other service components. Each component connects to and calls other functional components as services, which makes data interaction free and efficient. In addition, a node establishes communication links with the other nodes that access the same service, and data can also be sent and received through the interface of the distributed communication service platform. Through its internal buffer queues and asynchronous call mechanism, the distributed communication service platform lets a node send data without concern for the state of the receiving node; when data arrive, the receiving node obtains them through a message callback.
The data storage and retrieval service components required by multiple data collectors and data servers are connected through the cloud service platform to form unified data storage and data retrieval services, which are offered externally. This breaks with the isolated-island model of the conventional single real-time data processing server and forms a decentralized, peer-to-peer system for distributed data storage and data retrieval. A data collector or data server sends the collected real-time data through the service platform to the unified data storage service module, which stores them. Clients connect to the communication service platform through the platform interface or a Web server and send query requests to the unified data query service. For a server node that sends data to other nodes through the distributed communication service platform, a successful send can be regarded as a successful write. When a node receives data, reception is completed through the callback interface.
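The buffer queue, asynchronous call, and message callback described above can be sketched with a queue and a worker thread. The class `NodeMailbox` and its methods are hypothetical names chosen for this illustration and do not belong to any real communication platform.

```ruby
# Hypothetical sketch of a node mailbox: senders enqueue and return at once;
# a worker thread delivers each message to the receiver's callback.
class NodeMailbox
  def initialize(&on_message)
    @queue = Queue.new
    @worker = Thread.new do
      # Queue#pop returns nil once the queue is closed and empty.
      while (message = @queue.pop)
        on_message.call(message)
      end
    end
  end

  # Returns true as soon as the message is buffered; the sender does not
  # wait for, or inspect, the receiver.
  def send_data(message)
    @queue << message
    true
  end

  # Deliver everything still buffered, then stop the worker.
  def shutdown
    @queue.close
    @worker.join
  end
end

received = []
mailbox = NodeMailbox.new { |m| received << m }
mailbox.send_data({ sensor: "t1", value: 21.5 })
mailbox.shutdown
puts received.inspect
```

Treating a successful enqueue as a successful write, as the text does, trades a delivery guarantee for sender latency; a production system would typically add acknowledgements or a durable queue.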
For real-time data statistics, the data in the real-time database are uploaded to the application layer of the three-layer Internet of Things architecture, where the statistics are computed.
For real-time search of entity information, the system is built with Lucene, an open-source toolkit written in Java. Its source code is divided into 7 modules:
(1) org.apache.lucene.document module: represents the source records provided by the user as Document objects and manages the documents when the index is stored;
(2) org.apache.lucene.util module: provides common utility classes and constant classes;
(3) org.apache.lucene.store module: provides storage management for index files; specific fields can be selected to be stored or not stored;
(4) org.apache.lucene.index module: provides index management, i.e., creating, updating, and deleting indexes;
(5) org.apache.lucene.search module: implements query matching; when called, it retrieves the matching documents from the index files;
(6) org.apache.lucene.analysis module: analyzes the files to be indexed, filtering and tokenizing the data source as required;
(7) org.apache.lucene.queryparser module: analyzes the query terms entered by the user and produces a suitable Query; it is the query analyzer.
Java is an object-oriented language with strong platform independence, and Lucene's index file format is likewise strongly platform independent. The data source files involved, in formats such as Word, HTML, TXT, and XML, first pass through a parser that extracts the text information Lucene can process, and are then processed by an Analyzer.
Analyzer processing includes tokenization, i.e., splitting the data source at spaces or punctuation into individual words or numbers, and then removing the dispensable words.
Lucene's system architecture consists mainly of three parts: the index core, the external interfaces, and the infrastructure encapsulation. These parts are all packaged as abstract classes, which lowers Lucene's coupling and better reflects its object-oriented design.
The index system is the core module of the Lucene search framework. Building an index is the process of converting a large amount of source data into a file format that can be searched quickly. When the data source is very large, building an index greatly improves search efficiency. The process by which Lucene builds an index is shown in Fig. 4.
The Analyzer is Lucene's text analyzer. It splits a character string into individual words according to set rules and filters out the invalid words in the string. Invalid words include "of" and "the" in English and " ", " " in Chinese; these words carry no useful information, and filtering them out shrinks the index file and improves retrieval efficiency. Next, Lucene's index document (Document) object and the corresponding index field (Field) objects are created. A Document is a record of the data source being indexed, such as a text, a character string, or a database table record. A Document can consist of multiple information fields, which are stored in the Document as Field objects. The information to be stored and retrieved is added to the Fields of the Document, and the Fields that need to be indexed are written to storage, which can be memory or disk. This completes the process of building the index.
In an actual implementation, once the index database has been built, the search function is realized by searching the index database. A Lucene index search is the process of obtaining the query request (Query) entered by the user, searching the existing index database, and returning the results. Lucene's retrieval process is shown in Fig. 5.
First, Lucene calls IndexSearcher to open the index database; IndexSearcher is the most basic search tool in Lucene. QueryParser then converts the query statement into an object that Lucene can query internally. After the search is complete, Lucene returns the search results and displays them to the user. In Lucene, an instance of the Hits class represents the set of search results. Lucene holds only part of the retrieval results in the Hits set at a time; once that part has been displayed, its space is released and the next part is shown, rather than loading all results at once, which saves a great deal of memory.
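The indexing and retrieval steps described above (tokenize, remove stop words, build the index, then answer a query from the index) can be sketched in a few lines. This is a plain-Ruby illustration of the idea and not Lucene's API: the functions `analyze`, `build_index`, and `search`, the stop-word list, and the sample documents are all assumptions.

```ruby
require "set"

# Illustrative English stop words; a real analyzer would use a fuller list.
STOP_WORDS = Set.new(%w[of the a an and]).freeze

# Split text at whitespace and punctuation, lowercase it, drop stop words.
def analyze(text)
  text.downcase.scan(/[[:alnum:]]+/).reject { |t| STOP_WORDS.include?(t) }
end

# Build an inverted index: term => list of ids of documents containing it.
def build_index(documents)
  index = Hash.new { |h, k| h[k] = [] }
  documents.each do |doc_id, text|
    analyze(text).uniq.each { |term| index[term] << doc_id }
  end
  index
end

# Return the ids of the documents that contain every query term.
def search(index, query)
  terms = analyze(query)
  return [] if terms.empty?
  terms.map { |t| index.fetch(t, []) }.reduce(:&)
end

docs = {
  1 => "The humidity of room A",
  2 => "Temperature of room B",
  3 => "Humidity sensor B"
}
index = build_index(docs)
puts search(index, "humidity").inspect
puts search(index, "room b").inspect
```

The query passes through the same `analyze` step as the documents, which is why matching is insensitive to case and to stop words on both sides.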
For real-time prediction, the periodic pattern of the Internet of Things entity is determined, a time window is established for a specific time period, and the state value of the entity at a certain future time is predicted. Depending on the periodic pattern of the Internet of Things events, the models include the aggregated prediction model, the single-period prediction model, and the multi-period prediction model.
Specifically, the aggregated prediction model (APM, Aggregated Prediction Model) is the simplest prediction model. It is not targeted at the periodic pattern of the selected time window, so it applies to any periodic pattern; its principle is simple and its computational cost is small. Its predictions, however, are inaccurate: it can only give the user rough guidance and is not suitable for Internet of Things decision analysis.
If an event repeats with an obvious period L, its state is likely to recur at different time points that share the same offset within the period L. The single-period prediction model (SPM, Single-Period Prediction Model) has good predictive ability in this case.
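The difference between the two models can be made concrete with a small sketch. The text does not give formulas, so both definitions here are assumptions: the aggregated prediction is taken to be the mean of all samples in the window, and the single-period prediction the mean of the past samples that share the target's offset within the period L.

```ruby
# Assumed APM: one value for any future time, the mean of the whole window.
def aggregated_prediction(window)
  raise ArgumentError, "window must not be empty" if window.empty?
  window.sum.to_f / window.size
end

# Assumed SPM: predict the value at future index t as the mean of the past
# samples whose index has the same offset within the period
# (i % period == t % period).
def single_period_prediction(samples, period, t)
  raise ArgumentError, "period must be positive" unless period.positive?
  same_offset = samples.each_index
                       .select { |i| i % period == t % period }
                       .map { |i| samples[i] }
  raise ArgumentError, "no samples at this offset" if same_offset.empty?
  same_offset.sum.to_f / same_offset.size
end

# Two repetitions of a pattern with period 3.
samples = [10, 20, 30, 12, 22, 32]
puts aggregated_prediction(samples)
puts single_period_prediction(samples, 3, 8)
```

For index 8 (offset 2 within the period) the single-period sketch averages only the two samples at that offset, while the aggregated sketch returns the same overall mean for every future index, which is why the text calls it only a rough guide.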
The entity state sensed by a sensor is often affected by a mixture of several periods; for this kind of entity, the multi-period prediction model (MPM, Multi-Period Prediction Model) achieves better prediction. The multi-period prediction model has more period parameters. MPM discovers periodic events with a convolution-based period-detection method and yields high-precision predictions for periodic events. However, because convolution-based period discovery involves multiple FFT and inverse FFT computations, it is computationally heavy and time-consuming, and it is not suited to Internet of Things scenarios with strict real-time requirements.
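Period discovery can be illustrated with a direct autocorrelation, which computes the same kind of lagged self-similarity that the FFT-based convolution accelerates. Note the substitution: this sketch uses the O(n²) direct form for brevity, not the FFT method the text refers to, and the function names are assumptions.

```ruby
# Normalized autocorrelation of samples at the given lag (direct form).
def autocorrelation(samples, lag)
  mean = samples.sum.to_f / samples.size
  centered = samples.map { |x| x - mean }
  denom = centered.sum { |x| x * x }
  return 0.0 if denom.zero? # constant signal: no periodic structure
  num = (0...(samples.size - lag)).sum { |i| centered[i] * centered[i + lag] }
  num / denom
end

# The candidate period is the lag with the highest autocorrelation.
def dominant_period(samples, max_lag: samples.size / 2)
  (1..max_lag).max_by { |lag| autocorrelation(samples, lag) }
end

signal = [0, 1, 0, -1] * 6 # six repetitions of a period-4 pattern
puts dominant_period(signal)
```

A signal driven by several periods would show several strong peaks in the autocorrelation; picking more than the single best lag is where a multi-period model would start.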
For real-time exchange, the hard disk files of the whole computer are scanned, a virtualized information mapping is established, and file information is accessed directly; index technology and caching technology are needed when data are queried and exchanged. Specifically, index technology can include clustered indexes and nonclustered indexes. A clustered index stores information such as databases and tables in one pass, in index order. This index technology is used frequently and is highly operable, but it cannot retrieve or display newly added data. A nonclustered index can display newly added data, can query data quickly, and does not affect the modification of data. The file buffering strategy is implemented with temporary tables. Caching technology mainly allows high-frequency access to data within a short time and speeds up data queries. For example, when basic Internet of Things information is queried, the server side can be cached; this not only provides the user with the corresponding data but also effectively avoids repeated queries, reduces the frequency of database access, and maximizes query response efficiency.
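The effect of caching on database access frequency can be sketched with a small read-through cache. The class `QueryCache`, its time-to-live parameter, and the block that stands in for the database query are all assumptions made for illustration.

```ruby
# Hypothetical read-through cache: repeated queries for the same key within
# the time-to-live are answered from memory instead of the backend.
class QueryCache
  attr_reader :backend_calls

  def initialize(ttl:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }, &backend)
    @ttl = ttl
    @clock = clock       # injectable so that expiry can be tested
    @backend = backend   # stands in for the real database query
    @entries = {}
    @backend_calls = 0
  end

  def fetch(key)
    entry = @entries[key]
    now = @clock.call
    return entry[:value] if entry && now - entry[:stored_at] < @ttl
    @backend_calls += 1
    value = @backend.call(key)
    @entries[key] = { value: value, stored_at: now }
    value
  end
end

cache = QueryCache.new(ttl: 60) { |entity_id| "state of #{entity_id}" }
cache.fetch("sensor-1")
cache.fetch("sensor-1") # served from the cache
puts cache.backend_calls
```

The time-to-live bounds how stale a cached answer can be, which matters for real-time entity states; a shorter value trades more database accesses for fresher data.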
Meanwhile file buffering strategy file buffering also contains both of which, existing row buffering pattern, there is table buffering mould again Formula.Be expert in buffer mode, mainly data message handled in real time, read in data message, in transmitting procedure, it is necessary to Waste more times.With the continuous development of technology of Internet of things, this row buffering pattern has not adapted to technology hair at this stage The demand of exhibition, and table buffering pattern belongs to a kind of disposable data message mathematics method, processing data information speed ratio is very fast, by In substantial amounts of computer memory space can be occupied when data message calculates, computer waste of storage space can be caused, if meter Calculation machine memory space is little, and this mode is difficult to be widely adopted.Therefore, it is necessary to further study file buffering strategy, find A kind of better information way to play for time, seeks to row buffering pattern and table buffering pattern being used in combination with, is using table During buffer mode, data buffering information can be deleted in time, set some intelligent, automation data to delete program, so The utilization ratio of file data can not only be improved, moreover it is possible to save the memory space of computer.
Interim table technology primarily directed to traditional tables of data processing method for, in traditional tables of data processing procedure In, directly tables of data can be operated, by corresponding data form connection method, operational data table, so as to select The tables of data for being more conform with user data information demand is selected out, comparatively, utilization ratio is not high for traditional data table, and form Stability, integrality etc. all existing defects., can be timely and interim table belongs to a kind of brand-new tables of data processing method Processing data information, data message operation can also be carried out for whole interim table quickly by the interim table of data message typing, The utilization ratio of data message can be so effectively improved, and the data information security of interim table can be higher.
The analysis of Internet of Things strategies for the real-time exchange of large data volumes shows that the Internet of Things is widely used in people's daily lives and provides great convenience. Realizing real-time exchange of large data volumes makes it possible to quickly update the information exchange systems of traditional Internet of Things technology, improve the efficiency of information exchange, and gradually achieve the sharing and efficient use of data resources.
As can be seen from the above description and the accompanying drawings, the present invention has the following advantages:
1. The automatic analysis technique for Internet of Things real-time data uses the sensors attached to each physical entity to sense the entity's state and transmits the sensed information over a wireless network to the object database.
2. The information in the database is then published in real time on web pages, and the data on the web pages reside in the application layer of the three-layer Internet of Things architecture. The Internet of Things information on the application layer is collected, a search framework is built over the collected data, and Internet of Things entity information is searched in real time.
3. Search results can be sent directly to the user or passed to the prediction module, which builds a prediction model. The predicted results are more meaningful in practice than the raw search results; they are then sent to the user and provide guidance for the user's decisions.
4. Entities that meet a particular state can be searched for; in addition, the real-time state of a particular entity can be looked up and its state after a certain time can be predicted, so that information useful to the user is obtained.
The above is only a preferred embodiment of the present invention and is not intended to limit it. It should be noted that those of ordinary skill in the art can make improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.

Claims (8)

1. An automatic analysis method for Internet of Things real-time data, characterized in that it comprises the following steps:
Step 1, entity information collection and acquisition, carried out with the Watir and Nokogiri real-time information acquisition method;
Step 2, real-time data management;
Step 3, real-time data statistics;
Step 4, real-time search of entity information;
Step 5, real-time prediction;
Step 6, real-time exchange.
2. The automatic analysis method for Internet of Things real-time data according to claim 1, characterized in that: step 1 is carried out by a load-page module, an obtain-HTML module, a parse-HTML module, and a data storage module,
The load-page module and the data storage module provide the interfaces to the outside;
The load-page module loads the external page from its network link address and passes it to the obtain-HTML module; the obtain-HTML module obtains the HTML document of the dynamic page passed over by the load-page module and passes the obtained HTML document to the parse-HTML module;
The parse-HTML module uses a locating technique to parse the required text content out of the obtained HTML document and passes the parsed content to the data storage module for storage.
3. The automatic analysis method for Internet of Things real-time data according to claim 1, characterized in that: in step 2, a distributed database is used for management, and the distributed database supports the control modes of centralized global control, decentralized global control, and partially decentralized global control;
The distributed database consists of the local-site database management systems, a global database management system, a global catalog, and communication management.
4. The automatic analysis method for Internet of Things real-time data according to claim 1, characterized in that: in step 3, the data in the real-time database are uploaded to the application layer of the three-layer Internet of Things architecture, where the statistics are computed.
5. The automatic analysis method for Internet of Things real-time data according to claim 1, characterized in that: in step 4, the system is built with Lucene, an open-source toolkit written in Java, whose source code is divided into 7 modules, including,
org.apache.lucene.document module: represents the source records provided by the user as Document objects and manages the documents when the index is stored;
org.apache.lucene.util module: provides common utility classes and constant classes;
org.apache.lucene.store module: provides storage management for index files; specific fields can be selected to be stored or not stored;
org.apache.lucene.index module: provides index management, i.e., creating, updating, and deleting indexes;
org.apache.lucene.search module: implements query matching; when called, it retrieves the matching documents from the index files;
org.apache.lucene.analysis module: analyzes the files to be indexed, filtering and tokenizing the data source as required;
org.apache.lucene.queryparser module: analyzes the query terms entered by the user and produces a suitable Query; it is the query analyzer.
6. The automatic analysis method for Internet of Things real-time data according to claim 1, characterized in that: in step 4, the data source files involved first pass through a parser that extracts the text information Lucene can process, and are then processed by an Analyzer,
The Analyzer processing includes tokenization, i.e., splitting the data source at spaces or punctuation into individual words or numbers, and then removing the dispensable words;
The Analyzer is Lucene's text analyzer; it splits a character string into individual words according to set rules and filters out the invalid words in the string, the invalid words including "of" and "the" in English and " ", " " in Chinese, which carry no useful information; afterwards, Lucene's index document (Document) object and the corresponding index field (Field) objects are created.
7. The automatic analysis method for Internet of Things real-time data according to claim 1, characterized in that: in step 5, the periodic pattern of the Internet of Things entity is determined, a time window is established for a specific time period, and the state value of the entity at a certain future time is predicted; depending on the periodic pattern of the Internet of Things events, the models include the aggregated prediction model, the single-period prediction model, and the multi-period prediction model.
8. The automatic analysis method for Internet of Things real-time data according to claim 1, characterized in that: in step 6, the hard disk files of the whole computer are scanned, a virtualized information mapping is established, and file information is accessed directly; index technology and caching technology are needed when data are queried and exchanged,
The index technology can include clustered indexes and nonclustered indexes; a clustered index stores information such as databases and tables in one pass, in index order, and a nonclustered index can display newly added data;
The file buffering strategy is implemented with temporary tables.
CN201710874448.XA 2017-09-25 2017-09-25 The automatic analysis method of internet of things oriented real time data Pending CN107733694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710874448.XA CN107733694A (en) 2017-09-25 2017-09-25 The automatic analysis method of internet of things oriented real time data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710874448.XA CN107733694A (en) 2017-09-25 2017-09-25 The automatic analysis method of internet of things oriented real time data

Publications (1)

Publication Number Publication Date
CN107733694A true CN107733694A (en) 2018-02-23

Family

ID=61207921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710874448.XA Pending CN107733694A (en) 2017-09-25 2017-09-25 The automatic analysis method of internet of things oriented real time data

Country Status (1)

Country Link
CN (1) CN107733694A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635264A (en) * 2018-11-29 2019-04-16 上海哔哩哔哩科技有限公司 Game service datamation statistical method, system and storage medium
CN111309399A (en) * 2020-02-26 2020-06-19 北京思特奇信息技术股份有限公司 Method, system, medium and device for starting easy-to-ask native client
CN111490886A (en) * 2019-01-25 2020-08-04 北京数安鑫云信息技术有限公司 Network data processing method and system
CN112688711A (en) * 2021-02-02 2021-04-20 深圳市安普信达软件技术服务有限公司 Food detection management system based on cloud computing

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080086627A1 (en) * 2006-10-06 2008-04-10 Steven John Splaine Methods and apparatus to analyze computer software
CN103092936A (en) * 2013-01-08 2013-05-08 华北电力大学(保定) Real-time information acquisition method of dynamic page of Internet of Things
CN104615748A (en) * 2015-02-12 2015-05-13 华北电力大学(保定) Watir-based (web application testing in ruby based) internet-of-things web event processing method
CN106844538A (en) * 2016-12-30 2017-06-13 中国电子科技集团公司第五十四研究所 A kind of many attribute sort methods and device for being applied to Internet of Things search

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
毛捷磊: "基于物联网的大数据量实时信息交换策略分析", 《信息与电脑》 *
沈丹凤: "面向物联网的实体实时搜索技术研究", 《中国优秀硕士论文期刊网》 *
翁祖泉: "基于物联网海量数据处理的数据库技术分析与研究", 《物联网技术》 *

Similar Documents

Publication Publication Date Title
CN103631596B (en) Business object data typing and the configuration device and collocation method for updating rule
CN101997927B (en) A kind of method and system of WEB platform data caching
CN103310012A (en) Distributed web crawler system
Li et al. An active crawler for discovering geospatial web services and their distribution pattern–A case study of OGC Web Map Service
CN105045932B (en) A kind of data page querying method based on descending storage
CN107733694A (en) The automatic analysis method of internet of things oriented real time data
CN101609460B (en) Searching method of supporting isomeric geoscientific data resources and searching system
CN103092936B (en) A kind of Internet of Things dynamic page real-time information collection method
Zhang et al. Building information modeling–based cyber-physical platform for building performance monitoring
CN106021583A (en) Statistical method and system for page flow data
CN108228743A (en) A kind of real-time big data search engine system
Ding et al. SeaCloudDM: a database cluster framework for managing and querying massive heterogeneous sensor sampling data
CN115827907B (en) Cross-cloud multi-source data cube discovery and integration method based on distributed memory
Liu et al. Sensor information retrieval from Internet of Things: Representation and indexing
CN109710767A (en) Multilingual big data service platform
CN102868601B (en) Routing system related to network topology based on graphic configuration database businesses
Tomasic et al. Improving access to environmental data using context information
CN115168474B (en) Internet of things central station system building method based on big data model
Spanos et al. SensorStream: A semantic real–time stream management system
da Silva et al. Providing geographic-multidimensional decision support over the Web
Shen et al. A Catalogue Service for Internet GIS ervices Supporting Active Service Evaluation and Real‐Time Quality Monitoring
Brody et al. Digitometric services for open archives environments
KR20140104544A (en) System and method for building of semantic data
CN109542933A (en) A kind of data base query method, device, equipment and medium
Ben-El-Kezadri et al. XAV: a fast and flexible tracing framework for network simulation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180223