CN108037917A - International trade data management system - Google Patents
- Publication number
- CN108037917A
- Authority
- CN
- China
- Prior art keywords
- data
- management system
- international trade
- data management
- distributed
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Machine Translation (AREA)
- Information Transfer Between Computers (AREA)
Abstract
The invention discloses an international trade data management system that uses the HDFS distributed file system to process very large files with streaming data access. Once a data set is generated by a data source, it is replicated and distributed to different storage nodes, where it then serves various data analysis task requests. In the international trade data management system provided by the invention, the design of HDFS is built around a "write once, read many times" workload.
Description
Technical field
The present invention relates to a data processing system, and more particularly to an international trade data management system applied in international trade.
Background art
In the field of international trade, every trade transaction between countries produces a transaction record. Under modern international trade rules, trade between countries is increasingly frequent, and the thousands of trade records generated every day add up to massive volumes of data. These records are stored by different institutions in different countries, in different languages and formats, so the data is not only large in volume but also non-standardized in structure and diverse in language. Organizing and managing international trade transaction records effectively and systematically has long been a difficulty in this field, and mining the commercial value behind this data is a goal the market has long pursued.
For current international trade record data, which spans many countries, languages, and data structures, storage, integration, and retrieval all face many challenges, mainly in the following three respects:
A. Cross-language text retrieval. With traditional retrieval methods, record data in a given language can only be searched in that language; a query in one language cannot retrieve the data in all languages.
B. In existing trade record data, a transaction carries only a product description or the product's HS code (Harmonized System code). Specific product names or product keywords are not explicitly identified, so searches by product keyword return imprecise, noisy results.
C. Data from different countries is weakly linked and can be associated only by company name or by 6-digit HS code. Because writing conventions and format requirements differ between parties, a single company name may be written in many ways, which greatly reduces the effectiveness of association by company name. Because 6-digit HS codes are international, associations on the same HS code are accurate; however, a 6-digit HS code covers a very broad range of products, so even an accurate association on it has limited practical value. Finding suitable association fields and integrating each country's data across languages is therefore both the main difficulty and the focus of mining trade transaction data.
Summary of the invention
To solve the above technical problems, the present invention provides an international trade data management system that enables accurate information retrieval over massive multinational, multilingual trade records. It is implemented as follows:
The HDFS distributed file system is used to process very large files with streaming data access: once a data set is generated by a data source, it is replicated and distributed to different storage nodes, where it then serves various data analysis task requests.
In the above international trade data management system, the MapReduce-style computing frameworks of Hadoop and Spark are used. Applications can easily be written on these frameworks; they can run on large clusters of thousands of commodity machines and process terabyte-scale data sets in parallel in a reliable, fault-tolerant manner.
In the above international trade data management system, a distributed web crawler is built on Scrapy, a fast, high-level screen-scraping and web-crawling framework, to automatically extract publicly available information from web pages, including website addresses, contact information (phone and email), and company information.
In the above international trade data management system, statistics-based natural language processing techniques, including word segmentation and machine translation, are used to extract product keywords from multilingual text data.
In the above international trade data management system, intelligent mailing-list filtering and user-behavior data analysis are used to dynamically allocate channels for bulk email sending.
Compared with the prior art, the present invention has the following beneficial effects:
The information retrieval system of this project is based on the Elasticsearch (5.6.3) platform. Platform performance was evaluated on a test data set of 300,000 trade records, with the following results:
Task | CPU time (s) |
---|---|
Write data | |
Save data in the specified dictionary format | 4.6×10⁻¹² |
Build all indexes | 27.1024359029 |
Update data (add one new field) | |
Save data in the specified dictionary format | 4.3×10⁻¹² |
Build all indexes | 34.0859549809 |
Intelligent extraction of English product keywords: combining open corpora from the Internet, the text of 56 million trade records was parsed to automatically generate product-name tags. Results: coverage 89.8%, accuracy 88.2%.
Brief description of the drawings
Fig. 1 is an application framework diagram.
Fig. 2 is a system architecture diagram.
Fig. 3 is a flow chart of product keyword extraction.
Detailed description of the embodiments
The international trade data management system provided by the invention is applied in the field of international trade.
In the international trade data management system provided by the invention, Fig. 1 shows the application framework, which consists of five layers:
- Web layer: mainly provides the user-facing interface and the presentation of data.
- Service layer: mainly provides user services, payment services, search services, statistical analysis, and mail-sending services.
- Cache layer: a tier in front of data persistence that keeps hot data closest to the user or on faster storage media, speeding up data access and reducing response time.
- Database layer: provides formatted trade record data and Internet data to the application layer, and also stores user behavior data.
- Big Data layer: handles data transmission, collection, cleaning, and standardization, together with online and offline computation, statistical analysis, data mining, and similar functions.
Fig. 2 shows the system architecture. CDN servers perform load balancing, content distribution, and scheduling; requests then pass to the web application cluster and the DNS service, which implement the web application functions by calling the user center, the search engine service, the mail service, the payment service, and the data statistics service. Data transmitted in the process is handled first by the cache servers and then enters the persistent data storage cluster. Finally, the big data platform provides data collection, processing, cleaning, standardization, computation, statistics, and mining.
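The Cache layer's role (keeping hot data close to the user so that repeated reads skip the slower persistent store) can be illustrated with a cache-aside read path. The patent names neither a cache product nor an eviction policy, so the following is only a minimal sketch assuming an in-process least-recently-used (LRU) cache; `LRUCache`, `cached_read`, and the dictionary standing in for the Database layer (`load_from_database`) are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, dict]" = OrderedDict()

    def get(self, key: str) -> Optional[dict]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key: str, value: dict) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used entry


def cached_read(key: str, cache: LRUCache, load: Callable[[str], dict]) -> dict:
    """Cache-aside read: try the cache first, fall back to persistent storage."""
    value = cache.get(key)
    if value is None:
        value = load(key)                    # slow path: Database layer
        cache.put(key, value)                # keep hot data for later reads
    return value


# Hypothetical stand-in for the Database layer that counts how often it is hit.
DB: Dict[str, dict] = {"rec-1": {"hs_code": "850440"}, "rec-2": {"hs_code": "610910"}}
db_hits = {"count": 0}


def load_from_database(key: str) -> dict:
    db_hits["count"] += 1
    return DB[key]


cache = LRUCache(capacity=1)
cached_read("rec-1", cache, load_from_database)
cached_read("rec-1", cache, load_from_database)   # second read is served from the cache
print(db_hits["count"])  # 1
```

Since the architecture above routes data through dedicated cache servers, a deployed system would typically use a shared cache service rather than an in-process dictionary, but the read path (check the cache, fall back to storage, populate the cache) follows the same pattern.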
Big data storage: the HDFS distributed file system is used to process very large files (typically files of hundreds of MB, or even hundreds of TB) with streaming data access. HDFS is designed around a "write once, read many times" workload: once a data set is generated by a data source, it is replicated and distributed to different storage nodes, where it then serves various data analysis task requests. An analysis task usually touches most of the data in a data set, so HDFS is optimized for reading an entire data set efficiently rather than for fast access to a single record.
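To make the replication step concrete, the sketch below splits a file into fixed-size blocks and assigns each block to several distinct storage nodes. It assumes the HDFS defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the node names are hypothetical, and the round-robin placement is a simplification of the NameNode's actual rack-aware placement policy.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 128   # HDFS default block size (dfs.blocksize) in Hadoop 2.x and later
REPLICATION = 3       # HDFS default replication factor (dfs.replication)


def plan_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Split a file into fixed-size blocks; only the last block may be smaller."""
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    sizes = [block_size_mb] * n_blocks
    if n_blocks and file_size_mb % block_size_mb:
        sizes[-1] = file_size_mb % block_size_mb
    return sizes


def place_replicas(n_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes.

    Simple round-robin placement; real HDFS uses a rack-aware policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of storage nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


blocks = plan_blocks(300)                       # a 300 MB file
print(blocks)                                   # [128, 128, 44]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

Because data is written once and afterwards only read, any replica of a block can serve a read request, which is what allows analysis tasks to be spread across storage nodes.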
Big data processing: the system uses the MapReduce-style computing frameworks of Hadoop and Spark. Applications can easily be written on these frameworks; they can run on large clusters of thousands of commodity machines and process terabyte-scale data sets in parallel in a reliable, fault-tolerant manner.
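The map, shuffle, and reduce phases can be illustrated in a single process. The sketch below totals trade value per 6-digit HS code and assumes hypothetical record fields `hs_code` and `value_usd`; on Hadoop or Spark the same three phases would run distributed across the cluster, with the framework performing the shuffle and re-running failed tasks.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_record(record: dict) -> List[Tuple[str, float]]:
    """Map: emit (6-digit HS code, trade value) for one trade record."""
    return [(record["hs_code"][:6], record["value_usd"])]


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_values(key: str, values: List[float]) -> Tuple[str, float]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def run_job(records: Iterable[dict]) -> Dict[str, float]:
    intermediate = chain.from_iterable(map_record(r) for r in records)
    return dict(reduce_values(k, v) for k, v in shuffle(intermediate).items())


records = [  # hypothetical trade records
    {"hs_code": "8504401000", "value_usd": 1200.0},
    {"hs_code": "8504409000", "value_usd": 800.0},
    {"hs_code": "6109100000", "value_usd": 300.0},
]
print(run_job(records))  # {'850440': 2000.0, '610910': 300.0}
```

In PySpark, the same aggregation could be written as `rdd.map(lambda r: (r["hs_code"][:6], r["value_usd"])).reduceByKey(operator.add)`.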
The information retrieval system is based on the Elasticsearch (5.6.3) platform. Core software comprising an indexing engine, a query engine, a text analysis engine, and peripheral application systems processes queries quickly and returns result sets.
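The benchmark above indexes data in two steps: records are first saved in a dictionary format and then indexed. A minimal sketch of turning such dictionaries into a request body for the Elasticsearch `_bulk` API follows; the index name `trade_records`, the mapping type `record`, and the field names are hypothetical. Elasticsearch 5.x indices still use mapping types, so each action line carries `_type`, which later major versions no longer use.

```python
import json
from typing import Iterable


def to_bulk_ndjson(records: Iterable[dict], index: str = "trade_records",
                   doc_type: str = "record") -> str:
    """Build a newline-delimited JSON body for the Elasticsearch _bulk API.

    Each document is preceded by an action/metadata line. `_type` is included
    because Elasticsearch 5.x indices still use mapping types.
    """
    lines = []
    for rec in records:
        action = {"index": {"_index": index, "_type": doc_type, "_id": rec["id"]}}
        source = {k: v for k, v in rec.items() if k != "id"}   # document body
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"   # the bulk body must end with a newline


records = [  # hypothetical trade records
    {"id": "r1", "product": "cotton t-shirts", "hs_code": "610910", "origin": "CN"},
    {"id": "r2", "product": "LED lamps", "hs_code": "940540", "origin": "VN"},
]
print(to_bulk_ndjson(records), end="")
```

The resulting string could be sent as the body of a POST request to the `_bulk` endpoint with the `Content-Type: application/x-ndjson` header; batching many documents per request is what makes bulk indexing faster than indexing records one at a time.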
Open data crawling and matching: a distributed web crawler is built on Scrapy, a fast, high-level screen-scraping and web-crawling framework, to automatically extract publicly available information from web pages, including website addresses, contact information (phone and email), and company information.
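Scrapy handles request scheduling and crawling; the per-page extraction step can be sketched on its own. The example below does not use Scrapy: it relies only on the Python standard library (`html.parser` and `re`) to pull website links, email addresses, and phone numbers from one HTML page. The regular expressions are simplified assumptions and will miss or misread many real-world formats; in a Scrapy spider, comparable logic would typically sit in the spider's `parse` callback.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List

# Simplified patterns (assumptions): real email and phone formats vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


class LinkAndTextParser(HTMLParser):
    """Collect href targets and text content from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def extract_contacts(html: str) -> Dict[str, List[str]]:
    """Return website links, email addresses, and phone numbers found on one page."""
    parser = LinkAndTextParser()
    parser.feed(html)
    text = " ".join(parser.text)
    emails = set(EMAIL_RE.findall(text))
    # mailto: links may carry addresses absent from the text; drop any ?subject=... part
    emails |= {link[len("mailto:"):].split("?")[0]
               for link in parser.links if link.startswith("mailto:")}
    phones = {match.strip() for match in PHONE_RE.findall(text)}
    websites = {link for link in parser.links if link.startswith(("http://", "https://"))}
    return {"emails": sorted(emails), "phones": sorted(phones), "websites": sorted(websites)}


page = """<html><body><h1>Acme Trading Co.</h1>
<p>Tel: +86 21 1234 5678</p><p>Email: sales@acme-trading.example</p>
<a href="https://acme-trading.example">Home</a>
<a href="mailto:info@acme-trading.example">Contact</a></body></html>"""
print(extract_contacts(page))
```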
Natural language keyword extraction: statistics-based natural language processing techniques (including word segmentation and machine translation) are used to extract product keywords from multilingual text data; see Fig. 3 for details. First, the product description is extracted from each trade record, stray characters are removed, and words are reduced to their stems. The text is then segmented with an N-gram algorithm, and the result forms data set A. Separately, potential product vocabulary from other platforms such as foreign-trade search sites (including stop words) is processed with the same cleaning and stemming rules to form data set B. Data set B is mapped onto data set A; multiple consecutive mapped product words are joined and merged, then restored to their original word forms. The original words are mapped into an existing category tree, and after parent/child node de-duplication and singular/plural unification, the results are stored in HBase for use by the application layer.
Intelligent mail sending: intelligent mailing-list filtering and user-behavior data analysis are used to dynamically allocate bulk email channels, safeguarding the success rate of email delivery and receipt.
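A minimal sketch of the two ideas named here, list filtering and dynamic channel allocation, follows. The patent does not describe its filtering rules or allocation algorithm, so both are assumptions: recipients are dropped if malformed, duplicated, previously bounced, or unsubscribed, and each message goes to the channel with the highest Laplace-smoothed delivery rate that still has daily quota. `Channel`, `filter_recipients`, `pick_channel`, and `record_result` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

ADDRESS_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")   # simplified validity check


@dataclass
class Channel:
    name: str
    sent: int = 0            # messages sent today
    delivered: int = 0       # messages delivered today
    daily_quota: int = 1000

    def success_rate(self) -> float:
        # Laplace (add-one) smoothing so an unused channel is not ruled out
        return (self.delivered + 1) / (self.sent + 2)


def filter_recipients(addresses: Iterable[str], bounced: Set[str],
                      unsubscribed: Set[str]) -> List[str]:
    """Drop malformed, duplicate, previously bounced, and unsubscribed addresses."""
    result: List[str] = []
    seen: Set[str] = set()
    for addr in addresses:
        a = addr.strip().lower()
        if a in seen or not ADDRESS_RE.match(a) or a in bounced or a in unsubscribed:
            continue
        seen.add(a)
        result.append(a)
    return result


def pick_channel(channels: List[Channel]) -> Optional[Channel]:
    """Choose the channel with the best observed delivery rate that still has quota."""
    available = [c for c in channels if c.sent < c.daily_quota]
    return max(available, key=Channel.success_rate, default=None)


def record_result(channel: Channel, delivered: bool) -> None:
    """Feed the delivery outcome back so later allocations adapt."""
    channel.sent += 1
    channel.delivered += int(delivered)


channels = [Channel("smtp-a", sent=200, delivered=190), Channel("smtp-b", sent=200, delivered=150)]
recipients = filter_recipients(["Buyer@Example.com", "buyer@example.com", "not-an-email"],
                               bounced=set(), unsubscribed=set())
print(recipients, pick_channel(channels).name)   # ['buyer@example.com'] smtp-a
```

Always choosing the best observed rate can starve a recovering channel; a production system might instead sample channels in proportion to their rates.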
Claims (5)
- 1. An international trade data management system, characterized in that: the HDFS distributed file system is used to process very large files with streaming data access; once a data set is generated by a data source, it is replicated and distributed to different storage nodes and then serves various data analysis task requests.
- 2. The international trade data management system according to claim 1, characterized in that: the MapReduce-style computing frameworks of Hadoop and Spark are used, on which applications can easily be written; these applications can run on large clusters of thousands of commodity machines and process terabyte-scale data sets in parallel in a reliable, fault-tolerant manner.
- 3. The international trade data management system according to claim 2, characterized in that: a distributed web crawler is built on Scrapy, a fast, high-level screen-scraping and web-crawling framework, to automatically extract publicly available information from web pages, including website addresses, contact information (phone and email), and company information.
- 4. The international trade data management system according to claim 3, characterized in that: statistics-based natural language processing techniques, including word segmentation and machine translation, are used to extract product keywords from multilingual text data.
- 5. The international trade data management system according to claim 4, characterized in that: intelligent mailing-list filtering and user-behavior data analysis are used to dynamically allocate bulk email channels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810081263.8A CN108037917A (en) | 2018-01-29 | 2018-01-29 | International trade data management system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810081263.8A CN108037917A (en) | 2018-01-29 | 2018-01-29 | International trade data management system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108037917A true CN108037917A (en) | 2018-05-15 |
Family
ID=62097495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810081263.8A Pending CN108037917A (en) | 2018-01-29 | 2018-01-29 | International trade data management system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108037917A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113902331A (en) * | 2021-10-27 | 2022-01-07 | 上海腾道信息技术有限公司 | International trade data management system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8131782B1 (en) * | 2003-04-28 | 2012-03-06 | Hewlett-Packard Development Company, L.P. | Shadow directory structure in a distributed segmented file system |
CN102682082A (en) * | 2012-04-07 | 2012-09-19 | 山东师范大学 | Network Flash searching system and network Flash searching method based on content structure characteristics |
CN106611046A (en) * | 2016-12-16 | 2017-05-03 | 武汉中地数码科技有限公司 | Big data technology-based space data storage processing middleware framework |
CN106934014A (en) * | 2017-03-10 | 2017-07-07 | 山东省科学院情报研究所 | A kind of network data excavation based on Hadoop and analysis platform and its method |
CN107463559A (en) * | 2016-06-05 | 2017-12-12 | 贵州双龙数联科技有限公司 | A kind of business location information obtains analysis and storage system |
2018
- 2018-01-29 CN201810081263.8A (CN108037917A), Pending
Non-Patent Citations (2)
Title |
---|
樊重俊 et al.: "《大数据分析与应用》" (Big Data Analysis and Applications), 31 January 2016, 立信会计出版社 * |
饶文碧: "《Hadoop核心技术与实验》" (Hadoop Core Technologies and Experiments), 30 April 2017, 武汉大学出版社 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107122443B (en) | A kind of distributed full-text search system and method based on Spark SQL | |
CN105677844B (en) | A kind of orientation of moving advertising big data pushes and user is across screen recognition methodss | |
US10725981B1 (en) | Analyzing big data | |
JP2021108183A (en) | Method, apparatus, device and storage medium for intention recommendation | |
US9361320B1 (en) | Modeling big data | |
CN109034993A (en) | Account checking method, equipment, system and computer readable storage medium | |
US20160085742A1 (en) | Automated collective term and phrase index | |
CN104516949B (en) | Web data treating method and apparatus, inquiry processing method and question answering system | |
CN104424360A (en) | Method and system for accessing a set of data tables in a source database | |
CN112269816B (en) | Government affair appointment correlation retrieval method | |
CN110019616A (en) | A kind of POI trend of the times state acquiring method and its equipment, storage medium, server | |
CN109840254A (en) | A kind of data virtualization and querying method, device | |
CN109710767B (en) | Multilingual big data service platform | |
CN111708774B (en) | Industry analytic system based on big data | |
CN103455335A (en) | Multilevel classification Web implementation method | |
CN112100182A (en) | Data warehousing processing method and device and server | |
US20180089193A1 (en) | Category-based data analysis system for processing stored data-units and calculating their relevance to a subject domain with exemplary precision, and a computer-implemented method for identifying from a broad range of data sources, social entities that perform the function of Social Influencers | |
CN110688383A (en) | Data acquisition method and system | |
CN108037917A (en) | International trade data management system | |
CN116467291A (en) | Knowledge graph storage and search method and system | |
CN110062112A (en) | Data processing method, device, equipment and computer readable storage medium | |
CN114416848A (en) | Data blood relationship processing method and device based on data warehouse | |
CN110502529B (en) | Data processing method, device, server and storage medium | |
CN113779215A (en) | Data processing platform | |
KR20220061388A (en) | A recording medium in which the program providing the keyword-item mapping information service of news articles |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180515 |