CN108037917A - International trade data management system - Google Patents

Publication number
CN108037917A
Authority
CN
China
Prior art keywords
data
management system
international trade
data management
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810081263.8A
Other languages
Chinese (zh)
Inventor
庞振环
徐诚
崔智杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Tengdao Information Technology Co Ltd
Original Assignee
Shanghai Tengdao Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Tengdao Information Technology Co Ltd
Priority to CN201810081263.8A
Publication of CN108037917A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention discloses an international trade data management system. The system uses the HDFS distributed file storage system to handle very large files with streaming data access: once a data set has been generated by a data source, it is replicated and distributed to different storage nodes, and then serves various data analysis task requests. In the system provided by the invention, the design of HDFS is built around responding to "write once, read many times" workloads.

Description

International trade data management system
Technical field
The present invention relates to a data processing system, and more particularly to an international trade data management system applied in the field of international trade.
Background art
In the field of international trade, every trade exchange between countries leaves a transaction record. Under modern international trade rules, trade between countries is increasingly frequent, and the many thousands of trade records generated every day add up to a massive volume of data. These records are stored in different languages and different formats by different institutions in each country, so the data is not only large in volume but also non-standardized in structure and diverse in language. Organizing and managing international trade transaction records effectively and scientifically has always been a difficulty in this field, and mining the commercial value behind the data has long been a goal pursued by the market.
For today's international trade record data, which spans many countries, languages, and data structures, the storage, integration, and retrieval of data all face many challenges, mainly in the following three respects:
A. Cross-language text retrieval. Traditional retrieval can only search records written in different languages by using the language of the text itself. It is not possible to retrieve all of the data using a single language.
B. Existing trade record data contains only a product description or the HS code (Harmonized System code) of the product in the transaction. It does not explicitly state a specific product name or product keyword. When searching by product keyword, the results are not accurate enough and are noisy.
C. The relatedness of data across countries is very low: records can only be associated by company name or by 6-digit HS code. Because writing styles and format requirements differ between parties, one company's name may be written in many ways, which greatly reduces the effectiveness of association by company name. The 6-digit HS code is internationally standardized, so association on identical HS codes is accurate. However, a 6-digit HS code covers a very wide range of products, so even an accurate association by 6-digit HS code has limited practical value. Finding suitable association fields and integrating the data of each country and language is therefore both the difficulty and the focus of mining trade transaction data.
Summary of the invention
To solve the above technical problems, the present invention provides an international trade data management system that achieves accurate information retrieval over massive multinational, multilingual trade records. The specific implementation is as follows:
The HDFS distributed file storage system is used to handle very large files with streaming data access. Once a data set has been generated by a data source, it is replicated and distributed to different storage nodes, and then serves various data analysis task requests.
In the above international trade data management system, the MapReduce technical frameworks of Hadoop and Spark are used. Applications can easily be written on these frameworks and run on large clusters made up of thousands of commodity machines, processing TB-scale data sets in parallel in a reliable, fault-tolerant manner.
In the above international trade data management system, a distributed crawler system is built on Scrapy, a fast, high-level screen scraping and web crawling framework, to automatically extract open data from web pages, including website addresses, contact information, phone numbers and email addresses, and company information.
In the above international trade data management system, statistics-based natural language techniques, including word segmentation and automatic translation, are used to extract product keywords from multilingual text data.
In the above international trade data management system, intelligent mailing list filtering and user behavior data analysis techniques are used to allocate mass-mailing channels dynamically.
Compared with the prior art, the present invention has the following beneficial effects:
The information retrieval system of this project is based on the Elasticsearch (5.6.3) platform. Platform performance was evaluated using 300,000 trade records as the test data set, with the following results:
Task                                              CPU time (seconds)
Write data
  Save data in the specified dictionary format    4.6 × 10^-12
  Build all indexes                               27.1024359029
Update data (add one new field)
  Save data in the specified dictionary format    4.3 × 10^-12
  Build all indexes                               34.0859549809
Intelligent extraction of English product keywords: drawing on open corpora from the internet, the text of 56 million trade records was parsed and product name labels were generated automatically. Result: coverage 89.8%, accuracy 88.2%.
Brief description of the drawings
Fig. 1 is the application framework diagram.
Fig. 2 is the system architecture diagram.
Fig. 3 is the product keyword extraction flowchart.
Detailed description of the embodiments
The international trade data management system provided by the invention is applied in the field of international trade.
Fig. 1 is the application framework diagram of the international trade data management system provided by the invention. The framework has five parts: Web, Service, Cache, Database, and Big Data.
The Web layer mainly handles the user-facing interface and the presentation of data.
The Service layer mainly provides the user service, payment service, search service, statistics service, and mail sending service.
The Cache layer acts as a node in front of data persistence: hot data is placed closer to the user, or on media with faster access, which speeds up data access and reduces response time.
The Database layer provides the application layer with formatted trade record data and internet data, and also stores user behavior data.
The Big Data layer implements data transmission, collection, cleaning, and standardization, as well as online (offline) computation, statistical analysis, data mining, and other functions.
Fig. 2 is the system architecture diagram. CDN servers perform load balancing, content distribution, and scheduling. Requests then reach the web application cluster and DNS service part, which completes the web application functions by calling the user center, search engine service, mail service, payment service, and data statistics service. Data transmitted during this period is first handled by the cache servers and then enters the persistent data storage cluster. Finally, the big data platform provides data collection, processing, cleaning, standardization, computation, statistics, mining, and other functions.
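The role of the Cache layer described above can be illustrated with a minimal read-through cache sketch. Everything named here (`ReadThroughCache`, the `DATABASE` dictionary, the capacity of 2, the least-recently-used eviction rule) is a hypothetical stand-in chosen for illustration; the patent does not specify which caching technology or eviction policy is used.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Tiny LRU cache in front of a slower persistent store (hypothetical)."""

    def __init__(self, load_from_store, capacity=2):
        self._load = load_from_store        # called only on a cache miss
        self._capacity = capacity
        self._items = OrderedDict()         # key -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self.hits += 1
            self._items.move_to_end(key)    # mark as most recently used
            return self._items[key]
        self.misses += 1
        value = self._load(key)             # fall through to the persistent store
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value


# Stand-in for the persistent Database layer (made-up records).
DATABASE = {"rec-1": "LED lamp", "rec-2": "steel pipe", "rec-3": "cotton yarn"}

cache = ReadThroughCache(DATABASE.__getitem__, capacity=2)
first = cache.get("rec-1")    # miss: loaded from the store
second = cache.get("rec-1")   # hit: served from the cache
```

The second lookup never touches the store, which is the effect the Cache layer aims for with hot data.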
Big data storage technology is applied: the HDFS distributed file storage system is used to handle very large files (a very large file here typically means a file from hundreds of MB up to hundreds of TB in size) with streaming data access. The design of HDFS is built around responding to "write once, read many times" workloads. This means that once a data set has been generated by a data source, it is replicated and distributed to different storage nodes, and then serves various data analysis task requests. In most cases an analysis task touches most of the data in the data set, so for HDFS, a request that reads the whole data set is more efficient than one that reads a single record.
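The replicate-and-distribute behavior described above can be sketched as a toy model. This is not HDFS's actual placement policy (which is rack-aware); it is a simple round-robin illustration. The 128 MB block size and replication factor of 3 are assumed values, and the node names and function names are hypothetical.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # assumed block size in bytes (128 MB)
REPLICATION = 3                  # assumed number of copies kept per block


def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [node, ...]} using simple round-robin placement."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = max(1, math.ceil(file_size / block_size))
    placement = {}
    for b in range(n_blocks):
        # Take `replication` consecutive nodes from a rotating start offset,
        # so the replicas of one block always land on distinct nodes.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node-a", "node-b", "node-c", "node-d"]
layout = place_blocks(file_size=1024 ** 3, nodes=nodes)   # a 1 GB file
still_readable = readable_after_failure(layout, "node-a")
```

Because each block exists on several nodes, an analysis task can still read the whole data set after a single node fails.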
The international trade data management system provided by the invention applies big data processing technology, using the MapReduce technical frameworks of Hadoop and Spark. Applications can easily be written on these frameworks and run on large clusters made up of thousands of commodity machines, processing TB-scale data sets in parallel in a reliable, fault-tolerant manner.
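The MapReduce programming model mentioned above can be sketched in plain Python, without Hadoop or Spark, to show the three phases (map, shuffle, reduce). The sample records, field names, and the choice of counting records per 6-digit HS prefix are assumptions made for illustration; a real job would run these phases in parallel across a cluster.

```python
from collections import defaultdict


def map_record(record):
    """Map step: emit (6-digit HS prefix, 1) for one trade record."""
    yield record["hs_code"][:6], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: total the counts for one key."""
    return key, sum(values)


# Made-up sample records; the field names are hypothetical.
records = [
    {"hs_code": "85395000", "description": "LED lamps"},
    {"hs_code": "85395010", "description": "LED bulbs"},
    {"hs_code": "73063000", "description": "steel pipes"},
]

pairs = [pair for rec in records for pair in map_record(rec)]
counts = dict(reduce_counts(key, values) for key, values in shuffle(pairs).items())
```

Each map call depends on only one record and each reduce call on only one key, which is what lets a framework distribute the work and rerun failed pieces.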
In the international trade data management system provided by the invention, the information retrieval system is based on the Elasticsearch (5.6.3) platform. By building a core software system comprising an index engine, a query engine, a text analysis engine, and peripheral application systems, it provides fast query processing and returns result sets.
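As a rough sketch of how records might be prepared for such a platform, the following builds a request body in the newline-delimited JSON format accepted by the Elasticsearch `_bulk` endpoint, plus a simple `match` query body. It only constructs the payloads and does not contact a server. The index name `trade_records`, the type name `record`, and the record fields are hypothetical; the patent does not describe its index layout.

```python
import json


def build_bulk_body(records, index="trade_records", doc_type="record"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for rec in records:
        # Action line: which index, type, and document id to write.
        action = {"index": {"_index": index, "_type": doc_type, "_id": rec["id"]}}
        # Source line: the document itself, without the id field.
        source = {key: value for key, value in rec.items() if key != "id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"   # the bulk body must end with a newline


def build_match_query(field, text):
    """Build a simple full-text `match` query body."""
    return {"query": {"match": {field: text}}}


# Made-up sample records.
records = [
    {"id": "1", "description": "LED lamp 12V", "country": "DE"},
    {"id": "2", "description": "steel pipe", "country": "BR"},
]

body = build_bulk_body(records)
query = build_match_query("description", "led lamp")
```

Sending documents in bulk rather than one at a time is the usual way to load a large test set such as the 300,000 records mentioned in the performance evaluation.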
International trade data management system provided by the invention, it is fast based on one using open data grabber matching technique Fast, high-level screen scraping and web crawl frame (Scrapy), establish distributed reptile system, automatically extract webpage Open data message, including station address, associated person information (phone and Email) and company information etc..
The international trade data management system provided by the invention implements natural language keyword extraction: using statistics-based natural language techniques (including word segmentation, automatic translation, and so on), it extracts product keywords from multilingual text data; see Fig. 3 for details. First, the product description is extracted from the trade record; after invalid characters are removed, words are reduced to their stems. The result is then segmented with an N-gram algorithm to form data set A. Separately, potential product vocabulary from other platforms, such as foreign trade search sites (including stop words), is processed with the same cleaning and stemming rules to form data set B. Data set B is mapped onto data set A, and runs of consecutively mapped product words are spliced and merged and then restored to their original word forms. The original words are mapped into the existing category tree; after parent-child node de-duplication and singular/plural unification, they are stored in HBase for use by the application layer.
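The cleaning, stemming, N-gram, mapping, and merging steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `stem` function is a naive plural-stripping stand-in for a real stemmer, the vocabulary and description are made up, and the category tree mapping, singular/plural unification, stop-word handling, and HBase storage are not shown.

```python
import re


def stem(word):
    """Naive stemmer: strip one trailing plural 's' (stand-in for a real stemmer)."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def clean_and_stem(text):
    """Lower-case, keep only alphabetic tokens, and pair each token with its stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(token, stem(token)) for token in tokens]   # original kept for restoring


def ngrams(items, n):
    """All contiguous runs of n items."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def extract_keywords(description, vocabulary, max_n=3):
    """Return original-form phrases whose stemmed N-grams appear in the vocabulary."""
    pairs = clean_and_stem(description)                  # basis of data set A
    stems = [s for _, s in pairs]
    # Data set B: the vocabulary processed with the same stemming rule.
    vocab = {tuple(stem(w) for w in phrase.lower().split()) for phrase in vocabulary}
    matched = [False] * len(stems)
    for n in range(1, max_n + 1):
        for i, gram in enumerate(ngrams(stems, n)):
            if gram in vocab:
                for j in range(i, i + n):
                    matched[j] = True
    # Merge consecutive matched tokens and restore their original word forms.
    keywords, current = [], []
    for (original, _), hit in zip(pairs, matched):
        if hit:
            current.append(original)
        elif current:
            keywords.append(" ".join(current))
            current = []
    if current:
        keywords.append(" ".join(current))
    return keywords


keywords = extract_keywords(
    "10 CTNS LED LAMPS AND STEEL PIPES 2MM",
    vocabulary=["led lamp", "steel pipe"],
)
```

Matching on stems while returning the original tokens mirrors the description: the comparison is done on normalized forms, and the result is restored to the wording found in the record.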
The international trade data management system provided by the invention sends mail intelligently: by using intelligent mailing list filtering and user behavior data analysis techniques, it dynamically allocates mass-mailing channels to ensure a high success rate for sending and receiving mail.
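The patent does not say how channels are allocated, so the following is only one possible reading, offered as a sketch: filter the recipient list first, then split the remaining recipients across sending channels in proportion to each channel's observed delivery success rate. The channel names, statistics, function names, and the proportional rule are all hypothetical.

```python
def filter_recipients(recipients, bounced):
    """Drop addresses that look malformed or have bounced before."""
    return [r for r in recipients if "@" in r and r not in bounced]


def allocate_channels(recipients, channel_stats):
    """Split recipients across channels in proportion to observed success rate."""
    rates = {
        name: (stats["delivered"] / stats["sent"] if stats["sent"] else 0.0)
        for name, stats in channel_stats.items()
    }
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no channel has a recorded successful delivery")
    names = sorted(rates, key=rates.get, reverse=True)   # best channel first
    allocation, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(recipients)                        # last channel takes the rest
        else:
            share = round(len(recipients) * rates[name] / total)
            end = min(len(recipients), start + share)
        allocation[name] = recipients[start:end]
        start = end
    return allocation


# Made-up inputs for illustration.
raw_list = [f"buyer{i}@example.com" for i in range(10)] + ["not-an-address"]
clean_list = filter_recipients(raw_list, bounced={"buyer0@example.com"})
stats = {
    "channel-a": {"sent": 100, "delivered": 90},
    "channel-b": {"sent": 100, "delivered": 60},
}
allocation = allocate_channels(clean_list, stats)
```

Recomputing the rates from fresh delivery statistics before each campaign is what would make such an allocation dynamic.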

Claims (5)

  1. An international trade data management system, characterized in that: the HDFS distributed file storage system is used to handle very large files with streaming data access; once a data set has been generated by a data source, it is replicated and distributed to different storage nodes, and then serves various data analysis task requests.
  2. The international trade data management system as claimed in claim 1, characterized in that: the MapReduce technical frameworks of Hadoop and Spark are used; applications can easily be written on these frameworks and run on large clusters made up of thousands of commodity machines, processing TB-scale data sets in parallel in a reliable, fault-tolerant manner.
  3. The international trade data management system as claimed in claim 2, characterized in that: a distributed crawler system is built on Scrapy, a fast, high-level screen scraping and web crawling framework, to automatically extract open data from web pages, including website addresses, contact information, phone numbers and email addresses, and company information.
  4. The international trade data management system as claimed in claim 3, characterized in that: statistics-based natural language techniques, including word segmentation and automatic translation, are used to extract product keywords from multilingual text data.
  5. The international trade data management system as claimed in claim 4, characterized in that: intelligent mailing list filtering and user behavior data analysis techniques are used to allocate mass-mailing channels dynamically.
CN201810081263.8A 2018-01-29 2018-01-29 International trade data management system Pending CN108037917A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810081263.8A CN108037917A (en) 2018-01-29 2018-01-29 International trade data management system

Publications (1)

Publication Number Publication Date
CN108037917A (en) 2018-05-15

Family

ID=62097495

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810081263.8A Pending CN108037917A (en) 2018-01-29 2018-01-29 International trade data management system

Country Status (1)

Country Link
CN (1) CN108037917A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902331A (en) * 2021-10-27 2022-01-07 上海腾道信息技术有限公司 International trade data management system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8131782B1 (en) * 2003-04-28 2012-03-06 Hewlett-Packard Development Company, L.P. Shadow directory structure in a distributed segmented file system
CN102682082A (en) * 2012-04-07 2012-09-19 山东师范大学 Network Flash searching system and network Flash searching method based on content structure characteristics
CN106611046A (en) * 2016-12-16 2017-05-03 武汉中地数码科技有限公司 Big data technology-based space data storage processing middleware framework
CN106934014A (en) * 2017-03-10 2017-07-07 山东省科学院情报研究所 A kind of network data excavation based on Hadoop and analysis platform and its method
CN107463559A (en) * 2016-06-05 2017-12-12 贵州双龙数联科技有限公司 A kind of business location information obtains analysis and storage system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
樊重俊 等: "《大数据分析与应用》", 31 January 2016, 立信会计出版社 *
饶文碧: "《Hadoop核心技术与实验》", 30 April 2017, 武汉大学出版社 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180515
