CN111047112A - Computer internet of things data processing system - Google Patents


Info

Publication number
CN111047112A
Authority
CN
China
Prior art keywords
data
data processing
module
logistics
web
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911377769.4A
Other languages
Chinese (zh)
Other versions
CN111047112B (en)
Inventor
刘巍巍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang Sport University
Original Assignee
Shenyang Sport University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang Sport University
Priority to CN201911377769.4A
Publication of CN111047112A
Application granted
Publication of CN111047112B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/047 Optimisation of routes or paths, e.g. travelling salesman problem
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping

Abstract

The invention provides a computer Internet of things data processing system comprising a data acquisition module, a data processing module, a data storage module, an information optimization module and a logistics distribution module. The system can acquire logistics data from a plurality of heterogeneous systems in real time and process it efficiently in either real-time or batch mode, thereby improving the sequential delivery capacity for goods, reducing the number of times goods are forwarded at intermediate nodes, improving transport efficiency, and overcoming difficulties such as the untimely handling of complex events.

Description

Computer internet of things data processing system
Technical Field
The invention belongs to the field of computer Internet of things, and particularly relates to a computer Internet of things data processing system.
Background
Computer networking is changing the way the logistics industry thinks. Logistics service providers use sensor technologies such as GPS and telemetry to track and manage their cargo processes, which helps to label and connect factories, ships, machines and other assets. By correlating data from different sensors and social media and analyzing it in real time, together with external data that carries critical information about events (for example, traffic accidents and natural disasters), they can also forecast events and prevent incidents that would delay delivery. The connectivity of "things" enables instant communication between devices over the Internet, and this highly connected ecosystem has a profound impact on the revenue of logistics operators, their business customers and end customers alike. One of the main advantages of the Internet of things ecosystem is that it can fuse information from logistics sensors with that from external sensors, such as weather sensors and traffic (GPS) sensors; it can also connect to social media, which supplies information about events such as major traffic incidents, accidents, weather and natural disasters.
However, because the data are diverse and arrive at different speeds, the accuracy and speed with which data from different sources can be collected and processed vary widely. At the same time, the real-time processing workload is very large, and traditional logistics information systems cannot cope with it. On the other hand, although predictive analysis (to predict shipment delays) or prescriptive analysis (to optimize routes) can increase delivery speed, and thus customer satisfaction, within a prescribed time, late delivery remains an unsolved problem, and on-time delivery is a significant challenge for logistics companies because some delays are caused by factors that no one can control. Late delivery can have various consequences, such as customer churn or order cancellation, which can cause significant losses; timely delivery is therefore critical to logistics companies. In recent years, logistics enterprises have begun to investigate how to use data to predict delays. With big data technology in particular, logistics providers are interested in using event streams from external sources, such as accidents and traffic congestion reported on social media, to predict delays through real-time analysis. Predicting delays in real time enables companies to take action, such as optimizing flight routes in real time. Existing solutions are built on classical data processing techniques: traditional logistics information systems cannot process sensor or social media data in real time because these data arrive at high velocity, and traditional data processing methods cannot handle schema-less data such as text. Existing data processing methods (techniques or algorithms) are not efficient enough to process data in real time.
As for data sources, most existing solutions are limited to a single data source. In addition, for the continuous improvement of real-time systems, the prior art tests against static historical data sets, and relying on historical data alone clearly cannot meet current logistics requirements. Accordingly, the invention provides a hybrid framework for batch and real-time processing of mass data. The framework is based on a classification algorithm and can collect stream data in real time from a plurality of heterogeneous systems and process it efficiently in either real-time or batch mode. The invention is directed to a hybrid solution in which real-time data can also be processed in bulk, making such logistics services possible; there is an urgent need for computer programs that perform this analysis in real time.
Disclosure of Invention
The invention provides a computer Internet of things data processing system which is based on a classification algorithm and can collect logistics data from a plurality of heterogeneous systems to process the data efficiently in real time.
A computer Internet of things data processing system comprises a data acquisition module, a data processing module, a data storage module, an information optimization module and a logistics distribution module. The data processing module comprises a batch data processing device and a real-time data processing module. The batch data processing device reads/extracts stored data and prepares it in two stages: a data preparation stage, comprising data extraction, data cleaning, data filtering, data integration and data storage; and a data processing stage, which classifies and processes the fully prepared data. The batch data processing device processes logistics data from the plurality of data sensors and logistics applications in batches and sends data directly to the real-time data processing module over a wireless/wired network. The information optimization module optimizes logistics routes and transmits the optimized route to the logistics distribution module over a wireless/wired data link.
Furthermore, a data extractor crawls web pages linked from a specific website on the cloud server, extracts the links from the crawled pages, and stores the extracted link information in the data storage module. A query module provides a user search interface: the user enters search terms, and query results are returned according to the user's query. Data filtering removes noise from web pages by filtering out script identifiers and useless information; it stores the useful text of each page, performs word segmentation, denoising and sorting, and extracts the page's keywords. Using the link relations between web pages extracted by the web page crawling module and the idea of the PageRank ranking algorithm, it first obtains a PR value for each page based on the link structure. It then uses a vector space model to calculate similarity weights between the logistics-related information and the keywords of related web pages, increasing the weight of the user's historical searches and search keywords. Finally, an algorithm recalculates the contribution values between linked web pages to obtain a ranking, which serves as an important reference basis for the logistics service.
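The denoising and filtering described above can be illustrated with a minimal sketch. It assumes HTML input, uses regular expressions to strip `<script>`/`<style>` blocks and tags, and replaces real (e.g. Chinese) word segmentation with simple lower-case tokenization; the vocabulary `LOGISTICS_TERMS` and the helper names are hypothetical, chosen only to show how a message such as "today's stock prices are very high" would be discarded.

```python
import re

# Hypothetical logistics vocabulary; the patent only gives examples such as
# logistics, traffic, weather and geographic position.
LOGISTICS_TERMS = {"logistics", "shipment", "delivery", "traffic", "weather",
                   "port", "customs", "truck", "accident", "congestion"}

def denoise(html: str) -> str:
    """Remove script/style blocks and remaining tags, then collapse whitespace."""
    text = re.sub(r"<(script|style)\b.*?</\1>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def keywords(text: str) -> list[str]:
    """Lower-case word tokens; a stand-in for real word segmentation."""
    return re.findall(r"[a-z]+", text.lower())

def is_logistics_relevant(text: str) -> bool:
    """Keep a message only if it mentions at least one logistics term."""
    return any(tok in LOGISTICS_TERMS for tok in keywords(text))
```

A keyword-set test is deliberately crude; it only shows where the filter sits between denoising and link analysis.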
Further, the data filtering comprises the following steps:
(1) analyzing the link directions within the set of web pages to be ranked, $Set_{web}$, and determining the out-links and in-links of each web page;
(2) extracting keywords from the page content of each web page in $Set_{web}$ to generate the page's keyword set $S_{web\_keywords} = \{V_1, V_2, V_3, \dots, V_i\}$;
(3) calculating the similarity between the keywords of each web page in $Set_{web}$ and $K$ to obtain the keyword correlation factor set $W(u)$;
(4) finding, according to the user's ID, the keyword list $S_{h\_web\_keywords}$ corresponding to the user (for example, logistics, traffic, weather and geographic position);
(5) calculating the similarity between the keywords of each web page in $Set_{web}$ and $S_{h\_web\_keywords}$ to obtain the influencing factor $H(u)$;
(6) each web page thus has three factors, and its comprehensive score is calculated according to
$$GR(u) = (1-d) + d\left[\sum_{v \in B(u)} PR(v)\left(\frac{\alpha}{N_v} + \beta\,W(u) + \gamma\,H(u)\right)\right]$$
to obtain the final web page ranking $GR$, where $B(u)$ is the set of pages linking to page $u$, $N_v$ is the number of out-links of page $v$, $d$ is the PageRank damping factor, and $\alpha$, $\beta$ and $\gamma$ respectively represent the weights of the link, the topic relevance factor and the user factor in the PR value distribution.
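Steps (3) and (5) rely on a vector space model. The following is a minimal sketch, assuming plain term-frequency vectors and cosine similarity (the patent does not specify the weighting, so TF-IDF or another scheme could equally be intended); $K$ is read here as the query keyword list, and `relevance_factors` is a hypothetical helper name.

```python
import math
from collections import Counter

def cosine(a: list[str], b: list[str]) -> float:
    """Cosine similarity of two keyword lists treated as term-frequency vectors."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def relevance_factors(page_keywords: dict[str, list[str]],
                      K: list[str], user_keywords: list[str]):
    """Return W(u) (similarity to K) and H(u) (similarity to the user's list) per page u."""
    W = {u: cosine(kw, K) for u, kw in page_keywords.items()}
    H = {u: cosine(kw, user_keywords) for u, kw in page_keywords.items()}
    return W, H
```

A page about stock prices shares no terms with a logistics query, so both of its factors are 0 and it gains nothing from the $\beta$ and $\gamma$ terms of step (6).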
Further, data extraction collects information sources of various structured and unstructured data to obtain a complete and accurate description of the regions of interest and to normalize the multi-source heterogeneous data.
Furthermore, web pages are crawled with the Heritrix open-source crawler; building on its existing open-source code, users can extend each of the crawler's components to implement their own crawling logic and fetch the required resources from the network.
Furthermore, the data acquisition module acquires multi-source heterogeneous data, comprising information from data sensors and information from logistics applications; the data sensors comprise vehicle sensors and weather sensors, and the logistics applications comprise microblogs and social media.
Further, data cleaning detects and corrects (or removes) corrupted or inaccurate records in record sets and tables.
Further, data integration composes the data set in two steps: the first step converts the data from the source format to a target serialized format, and the second step merges the converted data.
Further, the real-time data processing module groups or segments the data items and generates an aggregated data set from the objective function, which supports effective analysis when predicting delivery delays.
Further, the information optimization module builds a high-throughput collection system for persistent data and information with reliable delivery, and performs topic aggregation on logistics routes; each topic is divided into one or more linear, ordered message sequences, and each message is identified by its index in the sequence.
The original PageRank algorithm considers only the in-link and out-link relations of web pages and does not analyze whether a page's content matches or resembles the topic the user searched for. It can capture high-quality web pages, but it also captures pages that are irrelevant or only weakly related to the query topic; this is the topic drift problem.
The real-time data processing module clusters events in real time and obtains instant insight into the processed data; generating an aggregated data set from the objective function facilitates effective analysis when predicting delivery delays. Logistics transport is adjusted promptly according to real-time interaction data, realizing informatization and standardization of logistics distribution products.
By optimizing logistics routes, the computer Internet of things data processing system can save considerable manpower and material resources, deliver goods to customers on time and improve user satisfaction. It also improves the sequential delivery capacity for goods, reduces the number of times goods are forwarded at intermediate nodes, improves transport efficiency, and overcomes difficulties such as the untimely handling of complex events.
Drawings
FIG. 1 is a schematic diagram of a computer Internet of things data processing system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
A computer Internet of things data processing system comprises a data acquisition module, a data processing module, a data storage module and a logistics distribution module. The data processing module comprises batch data processing equipment and a real-time data processing module. The batch data processing equipment reads/extracts stored data and performs data preparation, in which the data are cleaned and filtered under real-time processing conditions, and it sends the data directly to the real-time data processing module over a wireless network.
The data acquisition module acquires multi-source heterogeneous data, comprising information from data sensors and information from logistics applications; the data sensors comprise vehicle sensors and weather sensors, and the logistics applications include microblogs, Twitter, Facebook and other social media.
The batch data processing equipment batch-processes logistics data from a plurality of data sensors and logistics applications in two stages: a data preparation stage and a data processing stage. The data preparation stage comprises data extraction, data cleaning, data filtering, data integration and data storage; the data processing stage classifies the fully prepared data. Specifically:
Data extraction: collects various information sources to obtain a complete and accurate description of the region of interest and normalizes the multi-source heterogeneous data. The data extractor uses both internal and external data. Internal data sources are typically the systems used by the customer, such as supply chain management (SCM) information systems, customer relationship management (CRM), logistics management systems and account management systems (AMS). These systems produce large amounts of data, which the data extractor collects. It also obtains data from external sources such as weather sensors and social media. Both structured and unstructured data may be collected: for example, unstructured text from microblogs, or structured business process data from a logistics information system. The data extractor crawls web pages linked from a specific website on the cloud server, extracts links from the crawled pages, and stores the extracted link information in the data storage module. The data extractor also comprises a web page preprocessing module and a query module: the web page preprocessing module parses the crawled web pages, builds indexes and calculates page rankings; the query module provides a user search interface, through which the user enters search terms and receives query results. Web pages are crawled with the Heritrix open-source crawler, a multi-threaded crawler for fetching web content; building on its existing open-source code, users can extend each of its components to implement their own crawling logic and fetch the required resources from the network.
Data filtering: a broad strategy or solution for optimizing a data set. It refines an overloaded data set down to what a group of users needs, excluding other data that may be repetitive, irrelevant or even sensitive, which would otherwise raise computational cost and lower the accuracy of data processing. During collection, the data blocks, and especially their labels, determine the direct and indirect connections between the transportation, delivery, logistics and shipping processes. For example, a received message "today's stock prices are very high" is deleted by the data filter because it carries no information related to the logistics flow. Data filtering consists of three parts: web page denoising, Chinese word segmentation and link analysis. Most web pages are semi-structured and carry a large amount of formatting information, so the first step in analyzing page content is to denoise the page by filtering out script identifiers and useless information. The useful text of each page is then stored and analyzed: it is segmented into words, denoised and sorted, and the page's keywords are extracted. Using the link relations between pages extracted by the web page crawling module and the idea of the PageRank ranking algorithm, a PR value based on the link structure is first obtained for each page. A vector space model is then used to calculate similarity weights between the logistics-related information and the keywords of related pages, increasing the weight of the user's historical searches and search keywords. Finally, an algorithm recalculates the contribution values between linked pages to obtain a ranking, which serves as an important reference basis for the logistics service. The steps are as follows:
(1) analyzing the link directions within the set of web pages to be ranked, $Set_{web}$, and determining the out-links and in-links of each web page;
(2) extracting keywords from the page content of each web page in $Set_{web}$ to generate the page's keyword set $S_{web\_keywords} = \{V_1, V_2, V_3, \dots, V_i\}$;
(3) calculating the similarity between the keywords of each web page in $Set_{web}$ and $K$ to obtain the keyword correlation factor set $W(u)$;
(4) finding, according to the user's ID, the keyword list $S_{h\_web\_keywords}$ corresponding to the user (for example, logistics, traffic, weather and geographic position);
(5) calculating the similarity between the keywords of each web page in $Set_{web}$ and $S_{h\_web\_keywords}$ to obtain the influencing factor $H(u)$;
(6) each web page thus has three factors, and its comprehensive score is calculated according to
$$GR(u) = (1-d) + d\left[\sum_{v \in B(u)} PR(v)\left(\frac{\alpha}{N_v} + \beta\,W(u) + \gamma\,H(u)\right)\right]$$
to obtain the final web page ranking $GR$, where $B(u)$ is the set of pages linking to page $u$, $N_v$ is the number of out-links of page $v$, $d$ is the PageRank damping factor, and $\alpha$, $\beta$ and $\gamma$ respectively represent the weights of the link, the topic relevance factor and the user factor in the PR value distribution. All three parameters are greater than 0 and, to ensure convergence of the algorithm, they sum to 1. Each weight expresses how important its factor is in the distribution process, and changing the three values affects the quality of the ranking result.
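The six steps above can be sketched as follows. One reading, assumed here, is that $PR(v)$ comes from a first, classic PageRank pass over the link graph, after which $GR$ is computed once from $PR$, $W(u)$ and $H(u)$; the function names, the default $d = 0.85$ and the iteration count are illustrative choices, not values from the patent.

```python
def _in_links(out_links: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the adjacency list: page u -> pages v linking to u, i.e. B(u)."""
    pages = set(out_links) | {t for ts in out_links.values() for t in ts}
    inl = {p: [] for p in pages}
    for v, targets in out_links.items():
        for u in targets:
            inl[u].append(v)
    return inl

def pagerank(out_links, d=0.85, iters=100):
    """Classic PageRank in the form PR(u) = (1 - d) + d * sum(PR(v) / N_v)."""
    inl = _in_links(out_links)
    pr = {p: 1.0 for p in inl}
    for _ in range(iters):
        pr = {u: (1 - d) + d * sum(pr[v] / len(out_links[v]) for v in inl[u])
              for u in inl}
    return pr

def gr_scores(out_links, W, H, alpha, beta, gamma, d=0.85):
    """GR(u) = (1-d) + d * sum_{v in B(u)} PR(v) * (alpha/N_v + beta*W(u) + gamma*H(u))."""
    if min(alpha, beta, gamma) <= 0 or abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("alpha, beta, gamma must be positive and sum to 1")
    pr = pagerank(out_links, d)
    inl = _in_links(out_links)
    return {
        u: (1 - d) + d * sum(
            pr[v] * (alpha / len(out_links[v])
                     + beta * W.get(u, 0.0) + gamma * H.get(u, 0.0))
            for v in inl[u])
        for u in inl
    }
```

Raising a page's topic relevance $W(u)$ raises its own score while leaving pages with unchanged factors untouched, which is how the extra terms counter topic drift.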
Data cleaning: detects and corrects (or removes) corrupted or inaccurate records in record sets and tables.
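As an illustration of the cleaning step, the sketch below removes records that are incomplete, unparsable, out of range or duplicated, and corrects fixable ones by normalizing types and whitespace. The record schema (`shipment_id`, `timestamp`, `lat`, `lon`) and the validity rules are hypothetical; the patent does not define them.

```python
def clean(records, required=("shipment_id", "timestamp", "lat", "lon")):
    """Drop corrupted/inaccurate records and normalize the rest (hypothetical schema)."""
    cleaned, seen = [], set()
    for rec in records:
        # Removal: a required field is missing or empty.
        if any(rec.get(f) in (None, "") for f in required):
            continue
        try:
            lat, lon = float(rec["lat"]), float(rec["lon"])
        except (TypeError, ValueError):
            continue  # unparsable coordinates
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # inaccurate (out-of-range) coordinates
        sid = str(rec["shipment_id"]).strip()
        key = (sid, rec["timestamp"])
        if key in seen:
            continue  # duplicate of an earlier record
        seen.add(key)
        # Correction: normalized identifier and numeric coordinates.
        cleaned.append({**rec, "shipment_id": sid, "lat": lat, "lon": lon})
    return cleaned
```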
Data integration: the data set is composed in two steps. First, the data are converted from the source format to a target serialized format; second, the converted data are merged.
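The two integration steps can be sketched as below, assuming CSV and JSON as example source formats, JSON as the target serialized format, and a shared `shipment_id` key for merging; all three are assumptions for illustration, and `to_target`/`merge` are hypothetical names.

```python
import csv
import io
import json

def to_target(source: str, payload: str) -> list[dict]:
    """Step 1: convert a source-specific payload into target records (JSON-ready dicts)."""
    if source == "csv":
        return [dict(row) for row in csv.DictReader(io.StringIO(payload))]
    if source == "json":
        data = json.loads(payload)
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported source format: {source}")

def merge(batches: list[list[dict]], key: str = "shipment_id") -> str:
    """Step 2: merge converted records that share a key, then serialize to JSON."""
    merged: dict[str, dict] = {}
    for batch in batches:
        for rec in batch:
            merged.setdefault(str(rec[key]), {}).update(rec)
    return json.dumps(list(merged.values()), sort_keys=True)
```

Later sources overwrite earlier ones on conflicting fields in this sketch; a production pipeline would need an explicit conflict policy.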
Data storage: this step is intended to process the integrated data set and store the data in memory.
The data query module mainly comprises two parts: a query agent and a user interface. After system preprocessing, the data passed to the query module consist of two parts: the indexed web page library and the inverted file. The query agent receives the query phrase entered by the user through the user interface, segments it into words, searches the indexed web page library and the inverted file, retrieves the documents containing the query terms, and returns them to the user as the result. During query processing, the segmented query is converted into a vector representation, taking into account both the weights of the query terms in the inverted index and the position information of the terms. The similarity between the query and each web page document is calculated with a traditional information retrieval model; this is combined with the page ranking obtained in the web page preprocessing stage to form the final ranking, and the corresponding web pages are returned to the user in ranked order.
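A minimal sketch of this query path follows, assuming TF-IDF cosine similarity as the "traditional information retrieval model" and a linear blend (`mix`) with the max-normalized page rank; both choices and all names are hypothetical. Term positions are stored in the inverted index, but this sketch does not use them for scoring.

```python
import math
from collections import Counter, defaultdict

def build_inverted_index(docs: dict[str, list[str]]):
    """Map each term to {doc_id: [positions]} over already-segmented documents."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in docs.items():
        for pos, term in enumerate(tokens):
            index[term][doc_id].append(pos)
    return index

def search(query_terms, docs, index, page_rank, mix=0.5):
    """Rank docs containing a query term by mix*cosine(tf-idf) + (1-mix)*normalized rank."""
    n = len(docs)
    idf = {t: math.log(n / len(index[t])) + 1.0 for t in query_terms if t in index}
    candidates = {d for t in idf for d in index[t]}
    max_pr = max(page_rank.values(), default=0.0) or 1.0
    results = []
    for d in candidates:
        tf = Counter(docs[d])
        doc_vec = {t: tf[t] * (math.log(n / len(index[t])) + 1.0) for t in tf}
        dot = sum(w * doc_vec.get(t, 0.0) for t, w in idf.items())
        norm = (math.sqrt(sum(w * w for w in idf.values()))
                * math.sqrt(sum(w * w for w in doc_vec.values())))
        sim = dot / norm if norm else 0.0
        results.append((mix * sim + (1 - mix) * page_rank.get(d, 0.0) / max_pr, d))
    return [d for _, d in sorted(results, reverse=True)]
```

Documents sharing no term with the query are never candidates, however high their page rank, so the link-based ranking only reorders relevant results.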
The real-time data processing module is the core component. Logistics services use different shipping modes, including air, sea and land, and a single transport mode cannot meet all transport requirements. This is especially true of overseas logistics, such as shipping products manufactured in China to customers in different cities abroad: the shipping process must be intermodal, involving trucks, trains, ships, aircraft and so on. Such an integrated multimodal logistics process is exposed to various disruptions that cause delivery delays. For example, if customs clearance at a port is delayed, the cargo may be late even if every other mode of transport keeps to its schedule. Uncertain events such as natural disasters, wars and strikes may affect one or more delivery modes or further steps of the integrated logistics process, and this uncertainty is the main challenge such events pose. The invention therefore analyzes data in real time to extract factors that may cause delivery delays; this information arrives as a continuous stream of data about events that may delay delivery. The real-time data processing module works on social media and sensor events; its access speed is one hundred thousand times that of a magnetic disk, and it is designed to supplement missing data so that events can be handled in time. These events first enter the data storage module via distributed messaging. For such uncertain events, the real-time data processing module handles them preferentially as a stream rather than in batches. It clusters events in real time and obtains instant insight into the processed data. Clustering is the process of grouping or segmenting data items so that items in the same cluster are similar to one another and differ from items in other clusters.
The invention builds on this classification concept: an aggregated data set is generated from the objective function, which facilitates effective analysis when predicting delivery delays.
Let $X = \{X_1, X_2, \dots, X_n\}$ denote a data set of $n$ logistics objects, where $X_i = \{x_{i1}, x_{i2}, \dots, x_{im}\}$ denotes the $m$ attributes of the $i$-th object, so the data set is represented as an $n \times m$ matrix. The data set is classified $T$ times; $R_i = \{R_{i1}, R_{i2}, \dots, R_{iT}\}$ denotes the results for the $i$-th object under the $T$ classifications, so the base classification results form an $n \times T$ matrix. Supervision information is given as pairwise constraints, which describe the relationship between two data objects and are of two kinds: must-link constraints, indicating that two objects belong to the same class, denoted $M$; and cannot-link constraints, indicating that two objects do not belong to the same class, denoted $C$.
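Because the description treats pairwise constraints as symmetric and transitive, a common way to expand the sets $M$ and $C$ before building the supervision matrix is constraint propagation: must-links are closed transitively (here with a union-find structure), and each cannot-link is extended to every pair drawn from the two linked groups. This is a standard technique shown as an assumption, not the patent's stated procedure; `propagate_constraints` is a hypothetical name.

```python
def propagate_constraints(n, must_link, cannot_link):
    """Expand must-link (M) and cannot-link (C) pairs under symmetry and transitivity."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in must_link:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Every ordered pair inside a must-link group is a must-link.
    M = {(i, j) for g in groups.values() for i in g for j in g if i != j}
    C = set()
    for i, j in cannot_link:
        gi, gj = find(i), find(j)
        if gi == gj:
            raise ValueError(f"constraints conflict on ({i}, {j})")
        # A cannot-link between two objects separates their whole groups.
        C |= {(a, b) for a in groups[gi] for b in groups[gj]}
        C |= {(b, a) for a in groups[gi] for b in groups[gj]}
    return M, C
```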
In the original data feature space, the data are represented by an $n \times n$ similarity matrix. The Gaussian similarity $W(i, j)$ between object $i$ and object $j$ is calculated as
[Equation image not recoverable: Gaussian similarity $W(i, j)$ between objects $i$ and $j$ with hyper-parameter $\delta$]
where $\delta$ is a hyper-parameter. A diagonal matrix $E$ is then computed whose diagonal elements are the sums of all elements in the corresponding row (column) of $W$, and normalization yields the final matrix $D = E^{-1/2} W E^{-1/2}$; the closer two points are, the greater their similarity. In the symbolic feature space formed by the base classifications, the base classification results are represented as an $n \times n$ matrix $B$, where $B(i, j)$ is the number of times objects $i$ and $j$ are assigned to the same class across the $T$ base classifications:
$$B(i,j) = \sum_{t=1}^{T} \delta(R_{it}, R_{jt}), \qquad \delta(R_{it}, R_{jt}) = \begin{cases} 1, & R_{it} = R_{jt} \\ 0, & R_{it} \neq R_{jt} \end{cases}$$
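The two matrices $D$ and $B$ can be sketched in plain Python. The exact Gaussian formula is lost from the source, so this sketch assumes the common form $W(i,j) = \exp(-\lVert x_i - x_j\rVert^2 / (2\delta^2))$; the normalization $D = E^{-1/2} W E^{-1/2}$ and the co-association count $B(i,j)$ follow the text above. Function names are hypothetical.

```python
import math

def gaussian_similarity(X, delta):
    """W(i, j) = exp(-||x_i - x_j||^2 / (2 * delta^2)); this kernel form is an assumption."""
    n = len(X)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(X[i], X[j])) / (2 * delta ** 2))
             for j in range(n)] for i in range(n)]

def normalize(W):
    """D = E^{-1/2} W E^{-1/2}, where E is diagonal with the row sums of W."""
    e = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / math.sqrt(e[i] * e[j]) for j in range(n)] for i in range(n)]

def co_association(R):
    """B(i, j) = number of base classifications t in which R[i][t] == R[j][t]."""
    n = len(R)
    return [[sum(1 for a, b in zip(R[i], R[j]) if a == b) for j in range(n)]
            for i in range(n)]
```

Here `R` is the $n \times T$ base classification matrix from the text, one row per object.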
In the supervised information feature space, the pairwise constraints are represented as an $n \times n$ matrix $S$. For a given data set, the pairwise constraints are symmetric and transitive. The similarity between object points is calculated according to the following formula, which keeps the similarity matrix $S$ non-negative:
[Equation image not recoverable: definition of $S(i, j)$ from the must-link set $M$ and the cannot-link set $C$]
In this way, after the $n \times n$ matrices $D$, $B$ and $S$ have been constructed in the three feature spaces of original data, base classification and supervision information, the three similarity matrices are combined linearly into a new matrix
$$L = w_1 D + w_2 B + w_3 S,$$
where $w_1$, $w_2$ and $w_3$ are the weights of the original data, the base classification and the supervision information, respectively. NMF (non-negative matrix factorization) clustering is then performed on $L$, and in the final result matrix the row with the maximum NMI (normalized mutual information) value is selected as the class label.
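The final clustering step can be sketched as follows, under several assumptions: NMF uses the standard Lee-Seung multiplicative updates for the Frobenius loss, each object is labeled by the largest entry in its row of the factor $W$, NMI is normalized by $\sqrt{H(a)H(b)}$, and "the row with the maximum NMI" is interpreted as choosing, among candidate labelings, the one with the highest mean NMI against the $T$ base classifications. None of these details is fixed by the patent, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

def combine(D, B, S, w1, w2, w3):
    """L = w1*D + w2*B + w3*S, computed element-wise on n x n lists of lists."""
    n = len(D)
    return [[w1 * D[i][j] + w2 * B[i][j] + w3 * S[i][j] for j in range(n)] for i in range(n)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def nmf(V, k, iters=500, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for V ~ W H with W, H >= 0 (Frobenius loss)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return W, H

def nmf_labels(L, k, **kw):
    """Label each object with the factor that has the largest weight in its row of W."""
    W, _ = nmf(L, k, **kw)
    return [max(range(k), key=row.__getitem__) for row in W]

def nmi(a, b):
    """Normalized mutual information I(a; b) / sqrt(H(a) * H(b)) of two labelings."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())
    ha = -sum(c / n * math.log(c / n) for c in pa.values())
    hb = -sum(c / n * math.log(c / n) for c in pb.values())
    return mi / math.sqrt(ha * hb) if ha > 0 and hb > 0 else 0.0

def best_by_nmi(candidates, R):
    """Pick the candidate labeling with the highest mean NMI against the base results R."""
    base = transpose(R)  # T labelings, each covering all n objects
    return max(candidates, key=lambda lab: sum(nmi(lab, b) for b in base) / len(base))
```

Pure-Python matrix products keep the sketch dependency-free; for realistic $n$ a numerical library would be the practical choice.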
The information optimization module optimizes logistics routes according to the NMI values, buyer information, seller information and transport information (such as flights and train numbers). It is a publish-subscribe messaging system: a fast, highly scalable distributed messaging module used to build a collection system for persistent data with high throughput and reliable delivery. It collects logistics routes into topics, each divided into one or more linear, ordered message sequences in which every message is identified by its index in the sequence. The information optimization module transmits the optimized route to the logistics distribution module over a wireless/wired data link, achieving data interaction.
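The publish-subscribe behavior described above (topics split into ordered message sequences, each message identified by its index) resembles a partitioned commit log with topics, partitions and offsets, as in Apache Kafka, although the patent names no product. The following is a minimal in-memory sketch under that assumption; the class `TopicLog`, its key-to-partition rule and the route topic names are hypothetical and do not correspond to any real client API.

```python
from collections import defaultdict

class TopicLog:
    """Hypothetical in-memory publish-subscribe log; not a real messaging client."""

    def __init__(self, partitions_per_topic: int = 2):
        self.partitions_per_topic = partitions_per_topic
        # topic -> list of partitions, each an append-only ordered message sequence
        self.topics: dict[str, list[list[object]]] = {}
        # (consumer, topic, partition) -> index of the next message to read
        self.offsets: dict[tuple[str, str, int], int] = defaultdict(int)

    def publish(self, topic: str, key: str, message: object) -> tuple[int, int]:
        """Append a message; return (partition, offset) identifying it."""
        parts = self.topics.setdefault(
            topic, [[] for _ in range(self.partitions_per_topic)])
        # Deterministic key -> partition mapping keeps one vehicle's updates in order.
        p = sum(key.encode("utf-8")) % len(parts)
        parts[p].append(message)
        return p, len(parts[p]) - 1

    def poll(self, consumer: str, topic: str, partition: int) -> list[object]:
        """Return unread messages for this consumer and advance its offset."""
        if topic not in self.topics:
            return []
        log = self.topics[topic][partition]
        start = self.offsets[(consumer, topic, partition)]
        self.offsets[(consumer, topic, partition)] = len(log)
        return log[start:]
```

Each consumer keeps its own offset, so the route optimizer and the distribution module can read the same topic independently.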
The logistics distribution module comprises a GPS module and a displacement sensor. Combining the two monitors the position of the goods in real time, and logistics transport is adjusted promptly according to real-time interaction data, realizing informatization and standardization of logistics distribution products.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications, equivalent changes and modifications made to the above embodiment according to the technical spirit of the present invention are within the scope of the technical solution of the present invention.

Claims (9)

1. A computer Internet of things data processing system is characterized by comprising a data acquisition module, a data processing module, a data storage module, an information optimization module and a logistics distribution module, the data processing module comprises batch data processing equipment and a real-time data processing module, the batch data processing equipment is used for reading/extracting stored data and preparing data, the batch data processing equipment comprises a data preparing stage and a data processing stage, the data preparation phase comprises data extraction, data cleaning, data filtering, data integration and data storage, the data processing stage classifies and processes the prepared sufficient data, the batch data processing equipment directly sends the data to the real-time data processing module through a wireless/wired network, the information optimization module optimizes logistics lines of logistics, and the optimized lines are transmitted to the logistics distribution module through wireless/wired data.
2. The computer Internet of things data processing system as claimed in claim 1, wherein the data extractor crawls linked web pages of a specific website from a cloud server and extracts the links from the crawled pages, and the extracted link data are stored in the data storage module; the data extractor further comprises a web page preprocessing module and a query module, the web page preprocessing module parsing the crawled pages, building indexes and calculating the rank of each page, and the query module providing a user search interface through which the user enters search terms and receives the query results; the data filtering removes noise from the web pages, filters out script tags and useless information, stores the useful text of each page, performs word segmentation, denoising and sorting, and extracts the page keywords; a PR value is obtained for each page from the link relations among the pages extracted by the web page crawling module, following the idea of the PageRank ranking algorithm; a vector space model is then used to calculate the similarity weight between logistics-related information and the keywords of the relevant pages, with increased weight given to the user's historical searches and search keywords; and finally the contribution values between pages with link relations are recalculated by the algorithm to obtain the rank ranking, these contribution values serving as an important reference basis for the logistics service.
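Claim 2's vector-space step compares logistics-related information with page keywords and gives extra weight to the user's search history. A minimal sketch, assuming term-frequency vectors, cosine similarity and a hypothetical boost factor of 2.0 for history terms (the claim does not give the weighting scheme):

```python
import math
from collections import Counter


def term_vector(keywords: list[str], boosted: frozenset[str] = frozenset(),
                boost: float = 2.0) -> dict[str, float]:
    """Term-frequency vector; terms from the user's search history get a hypothetical boost."""
    counts = Counter(keywords)
    return {t: c * (boost if t in boosted else 1.0) for t, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Boosting a term the user has searched for before pulls the query vector toward pages that emphasize that term, which is the effect the claim describes.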
3. The computer internet of things data processing system of claim 2, wherein the data filtering comprises the steps of:
(1) Analyze the link directions among the set of web pages to be ranked, $Set_{web}$, and determine the out-links and in-links of each page;
(2) Extract keywords from the content of each page in $Set_{web}$ to generate the page's keyword set $S_{web\_keywords} = \{V_1, V_2, V_3, \ldots, V_i\}$;
(3) Calculate the similarity between the keywords of each page in $Set_{web}$ and $K$ to obtain the keyword correlation factor set $W(u)$;
(4) According to the user ID, find the keyword list $S_{h\_web\_keywords}$ corresponding to the user, such as logistics, traffic, weather and geographic location;
(5) Calculate the similarity between the keywords of each page in $Set_{web}$ and $S_{h\_web\_keywords}$ to obtain the influence factor $H(u)$;
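(6) Each web page thus has three factors; calculate the comprehensive score of each page according to the formula

$$GR(u) = (1-d) + d \sum_{v \in B(u)} PR(v)\left(\frac{\alpha}{N_v} + \beta \cdot W(u) + \gamma \cdot H(u)\right)$$

to obtain the final web page ranking $GR$, where $B(u)$ is the set of pages linking to page $u$, $N_v$ is the number of out-links of page $v$, $d$ is the damping factor, and $\alpha$, $\beta$ and $\gamma$ respectively represent the weights of the link, the topic relevance factor and the user factor in the distribution of the PR value.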
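The ranking formula in step (6) can be evaluated iteratively in the PageRank style. This sketch assumes every page starts with PR = 1, that W(u) and H(u) have already been computed in steps (3) and (5), and uses hypothetical values d = 0.85, α = 0.6, β = 0.2 and γ = 0.2. It runs a fixed number of rounds because the claim gives no stopping rule and, since the β and γ terms are not divided by the out-link count of v, the scores are not guaranteed to converge for every choice of weights.

```python
def gr_scores(out_links: dict[str, list[str]], W: dict[str, float], H: dict[str, float],
              d: float = 0.85, alpha: float = 0.6, beta: float = 0.2, gamma: float = 0.2,
              iterations: int = 20) -> dict[str, float]:
    """Iterate GR(u) = (1-d) + d * sum_{v -> u} PR(v) * (alpha/N_v + beta*W(u) + gamma*H(u)).

    N_v is the out-link count of v; W and H default to 0 for pages not listed.
    """
    pages = set(out_links) | {u for targets in out_links.values() for u in targets}
    in_links: dict[str, list[str]] = {u: [] for u in pages}
    for v, targets in out_links.items():
        for u in targets:
            in_links[u].append(v)
    pr = {u: 1.0 for u in pages}  # assumed initial PR value for every page
    for _ in range(iterations):
        pr = {
            u: (1 - d) + d * sum(
                pr[v] * (alpha / len(out_links[v]) + beta * W.get(u, 0.0) + gamma * H.get(u, 0.0))
                for v in in_links[u]
            )
            for u in pages
        }
    return pr
```

With β = γ = 0 and α = 1 the function reduces to the classic (unnormalised) PageRank update.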
4. The computer Internet of things data processing system as claimed in any one of claims 1 to 3, wherein the data extraction uses a data processing system that collects structured and unstructured data from various sources to obtain a complete and accurate description of a region of interest, and normalizes the multi-source heterogeneous data.
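Normalizing multi-source heterogeneous data usually means mapping each source's fields onto one common schema. The sketch below assumes three hypothetical source formats (a vehicle sensor reporting epoch seconds, and a weather station and a social-media post reporting ISO-8601 timestamps with UTC offsets) and a hypothetical target schema of `source`, `entity`, `timestamp` (UTC) and `payload`.

```python
from datetime import datetime, timezone
from typing import Any


def normalize(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Map a source-specific record onto one common (hypothetical) schema."""
    if source == "vehicle":      # structured sensor data, epoch seconds
        ts = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
        entity, payload = record["vehicle_id"], {"speed_kmh": float(record["speed"])}
    elif source == "weather":    # structured sensor data, ISO-8601 string with offset
        ts = datetime.fromisoformat(record["time"]).astimezone(timezone.utc)
        entity, payload = record["station"], {"temp_c": float(record["temp"])}
    elif source == "social":     # unstructured free text, kept as-is apart from trimming
        ts = datetime.fromisoformat(record["created_at"]).astimezone(timezone.utc)
        entity, payload = record["user"], {"text": record["text"].strip()}
    else:
        raise ValueError(f"unknown source: {source}")
    return {"source": source, "entity": entity, "timestamp": ts, "payload": payload}
```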
5. The computer Internet of things data processing system as claimed in any one of claims 1 to 4, wherein web pages are crawled using the open-source Heritrix crawler; building on its existing open-source code, the user can extend its components to implement custom crawling logic and obtain the required resources from the network.
6. The computer Internet of things data processing system as claimed in any one of claims 1 to 4, wherein the data acquisition module acquires multi-source heterogeneous data comprising information from data sensors and information from logistics applications; the data sensors comprise vehicle sensors and weather sensors, and the logistics applications comprise microblogs and social media.
7. The computer Internet of things data processing system as claimed in any one of claims 1 to 4, wherein data cleansing is the detection and correction, or removal, of corrupt or inaccurate records from a record set or table.
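Claim 7's detect-then-correct-or-remove behaviour could look like the sketch below. The rules (a finite, positive numeric weight, a non-empty city, title-cased city names) and the field names are hypothetical examples, not rules given in the patent.

```python
import math
from typing import Any, Optional


def clean_record(rec: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return a corrected copy of rec, or None if it is corrupt beyond repair (hypothetical rules)."""
    fixed = dict(rec)
    try:
        weight = float(str(rec.get("weight_kg", "")).strip())   # correct: " 12.5" -> 12.5
    except ValueError:
        return None                                             # corrupt: not a number
    if not math.isfinite(weight) or weight <= 0:
        return None                                             # inaccurate: impossible weight
    city = str(rec.get("city", "")).strip()
    if not city:
        return None                                             # missing mandatory field
    fixed["weight_kg"] = weight
    fixed["city"] = city.title()                                # correct: normalise casing
    return fixed


def clean_table(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep corrected records and drop the ones clean_record rejects."""
    return [r for r in (clean_record(x) for x in records) if r is not None]
```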
8. The computer Internet of things data processing system as claimed in claim 1, wherein the real-time data processing module groups or segments data items, generates an aggregated data set according to an objective function, and performs efficient analysis for predicting delivery delays.
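Claim 8 leaves the grouping key and the objective function unspecified. As a minimal stand-in, the sketch groups hypothetical delay events by route segment, uses the mean delay as the aggregate, and predicts a route's delay as the sum of its segments' means; both choices are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def aggregate_delays(events: list[tuple[str, float]]) -> dict[str, float]:
    """Group (segment, delay_minutes) events by segment and aggregate each group by its mean."""
    groups: dict[str, list[float]] = defaultdict(list)
    for segment, delay in events:
        groups[segment].append(delay)
    return {seg: mean(delays) for seg, delays in groups.items()}


def predict_route_delay(route: list[str], per_segment: dict[str, float]) -> float:
    """Naive prediction: sum the mean delays of the route's segments (unknown segments count as 0)."""
    return sum(per_segment.get(seg, 0.0) for seg in route)
```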
9. The computer Internet of things data processing system as claimed in claim 1, wherein the information optimization module is used to build a collection system that persists data with high throughput and delivers it reliably, collecting logistics-route information by topic into one or more linearly ordered message sequences, each message being identified by its index.
CN201911377769.4A 2019-12-27 2019-12-27 Computer internet of things data processing system Active CN111047112B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911377769.4A CN111047112B (en) 2019-12-27 2019-12-27 Computer internet of things data processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911377769.4A CN111047112B (en) 2019-12-27 2019-12-27 Computer internet of things data processing system

Publications (2)

Publication Number Publication Date
CN111047112A 2020-04-21
CN111047112B 2020-11-06

Family

ID=70240850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911377769.4A Active CN111047112B (en) 2019-12-27 2019-12-27 Computer internet of things data processing system

Country Status (1)

Country Link
CN (1) CN111047112B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000040197A (en) * 1998-07-22 2000-02-08 Honda Motor Co Ltd Automatic optimization device and optimization method
US20020184329A1 (en) * 2001-06-04 2002-12-05 Maureen Chen System and method for dynamically managing and facilitating data real time via a shared computer network
CN105389639A (en) * 2015-12-15 2016-03-09 SAIC Motor Corporation Limited Logistics transportation route planning method, device and system based on machine learning
CN108335075A (en) * 2018-03-02 2018-07-27 South China University of Technology Processing system and method for logistics-oriented big data
CN109359151A (en) * 2018-10-29 2019-02-19 Shanghai Shipbuilding Technology Research Institute (10th Research Institute of China State Shipbuilding Corporation) Hull block logistics big data visualization platform

Non-Patent Citations (2)

Title
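Lin Tingwei (林婷薇) et al.: "Web page ranking algorithm based on topic relevance and user history", Computer Engineering and Design *
Shi Rongli (石荣丽): "Construction of a smart logistics park information platform based on big data", Enterprise Economy *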

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant