CN111695000A - Multi-source big data loading method and system - Google Patents

Info

Publication number
CN111695000A
Authority
CN
China
Prior art keywords
loading
storage
rule
storage server
data set
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010551553.1A
Other languages
Chinese (zh)
Other versions
CN111695000B (en)
Inventor
董瑞朝
董新建
曹晓青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Lanhai Navigation Big Data Development Co ltd
Original Assignee
Shandong Lanhai Navigation Big Data Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-06-16
Filing date
2020-06-16
Publication date
2020-09-22
Application filed by Shandong Lanhai Navigation Big Data Development Co ltd
Priority to CN202010551553.1A
Publication of CN111695000A
Application granted
Publication of CN111695000B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multi-source big data loading method and system. The method uses a first recording unit, a classifier and an extractor: the first recording unit records the storage paths of all storage nodes in a storage server and the attribute information of the storage entities; the classifier links at least one relationship between the storage paths in the first recording unit and the attribute information of the storage entities; and the extractor extracts expressions based on the relationship, from which a first loading rule and a second loading rule are established. A cloud database determines the attribute information corresponding to the data to be loaded through at least one of the first loading rule and the second loading rule, so as to search for the corresponding entity object in the original data set of the storage server. The invention also provides a system and a matching method. Because the content loaded by the user is compared with the first loading rule and the second loading rule, which are obtained according to the storage rule, the corresponding entity object can be searched for accurately in the original data set of the storage server.

Description

Multi-source big data loading method and system
Technical Field
The invention relates to the technical field of computers, in particular to big data loading, and specifically to a multi-source big data loading method and system.
Background
With the fusion of big data, data types have become diverse, and stored objects must first be stored correctly and be convenient to retrieve. Storing customer data, loading production plans, financial statements, company plans and production orders, searching data, and so on all depend on big data loading, and the loaded data should directly reflect the search result. At present, most big data loading approaches load only the terms that the user searched for; when those terms are wrong or inaccurate, the result the user needs cannot be returned in time. In addition, for stored data, tree-based storage can show only a single tree of results under a traditional search and cannot comprehensively meet the user's search needs.
Disclosure of Invention
The invention aims to provide a multi-source big data loading method and a multi-source big data loading system to solve the problems in the background technology.
In order to achieve the purpose, the invention provides the following technical scheme:
The multi-source big data loading method comprises the following steps:
setting at least one storage server, wherein the storage server comprises at least one first recording unit, and the first recording unit records the storage paths of all storage nodes in the storage server and the attribute information of the storage entities,
providing a classifier for linking at least one relationship between the storage paths in the first recording unit and the attribute information of the storage entities,
providing an extractor, which extracts expressions based on the relationship,
selecting, according to the expressions, a screening rule to identify repeated items in the expressions, and deleting the repeated items to establish a first loading rule,
analyzing similar expression rules in the first loading rule, and deleting the similar expression rules to establish a second loading rule,
saving the first loading rule and the second loading rule in a memory,
saving the contents of the memory to a cloud database,
receiving, at a user data terminal, an operation command input by the user,
obtaining, by a loading server, the loading command from the user data terminal,
transmitting the loading command to the cloud database, and
determining, by the cloud database, the attribute information corresponding to the data to be loaded through at least one of the first loading rule and the second loading rule, so as to search for the corresponding entity object in the original data set of the storage server.
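The rule-building steps above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `attribute@path` expression format, the sample records, the 0.9 similarity threshold and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration, since the text does not specify them.

```python
from difflib import SequenceMatcher

# Hypothetical records kept by the "first recording unit":
# (storage path, attribute information of the stored entity).
RECORDS = [
    ("/finance/2020/q1", "financial statement"),
    ("/finance/2020/q1", "financial statement"),   # exact duplicate
    ("/finance/2020/q1", "financial statements"),  # near duplicate
    ("/production/orders", "production order"),
]


def extract_expressions(records):
    """Classifier + extractor: link each path to its attribute as 'attribute@path'."""
    return [f"{attribute}@{path}" for path, attribute in records]


def build_first_rule(expressions):
    """Delete repeated expressions while keeping first-seen order."""
    return list(dict.fromkeys(expressions))


def build_second_rule(first_rule, threshold=0.9):
    """Delete expressions that are similar to one that has already been kept."""
    kept = []
    for expression in first_rule:
        if all(SequenceMatcher(None, expression, k).ratio() < threshold for k in kept):
            kept.append(expression)
    return kept


first_rule = build_first_rule(extract_expressions(RECORDS))
second_rule = build_second_rule(first_rule)
```

In this sketch the second loading rule is always a subset of the first, which matches the order in which the two rules are derived in the steps above.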
Preferably, the information searched by the user, which is received by the user data terminal, is split into one or more of connection words, specific terms and pictures, and is stored in the cloud database.
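As a rough illustration of this splitting step, the sketch below sorts the tokens of a text query into the three categories. The stop list of connection words and the rule that a token ending in an image file extension denotes a picture are assumptions; the text does not say how each category is recognized.

```python
from dataclasses import dataclass, field

# Assumed stop list of "connection words"; the patent does not enumerate them.
CONNECTION_WORDS = {"and", "or", "of", "for", "with", "in"}
# Assumed convention: a token ending in an image extension refers to a picture.
PICTURE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif")


@dataclass
class SplitQuery:
    connection_words: list = field(default_factory=list)
    specific_terms: list = field(default_factory=list)
    pictures: list = field(default_factory=list)


def split_query(text):
    """Sort whitespace-separated tokens into the three categories."""
    result = SplitQuery()
    for token in text.split():
        lowered = token.lower()
        if lowered.endswith(PICTURE_SUFFIXES):
            result.pictures.append(token)
        elif lowered in CONNECTION_WORDS:
            result.connection_words.append(lowered)
        else:
            result.specific_terms.append(lowered)
    return result


query = split_query("Production orders and invoices for logo.png")
```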
Preferably, the method for searching the entity object in the original data set of the storage server is as follows:
the cloud database compares the split connection words, specific terms and pictures with the established second loading rule, and if an identical or similar loading item exists, the loading item is executed to obtain the corresponding entity object in the original data set of the storage server,
if no identical or similar loading item exists after the comparison with the second loading rule, the cloud database compares the split connection words, specific terms and pictures with the established first loading rule, and if an identical or similar loading item exists, the loading item is executed to obtain the corresponding entity object in the original data set of the storage server,
and if no identical or similar loading item exists after the comparison with the first loading rule, the corresponding entity object in the original data set of the storage server is loaded by fuzzy query according to the recording habits and loading-related history recorded by the cloud database.
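The three-stage fallback described above can be sketched as follows. The rule tables, entity identifiers, history entries and similarity cutoffs are hypothetical, and `difflib.get_close_matches` stands in for both the "same or similar" comparison and the fuzzy query, neither of which is specified in the text.

```python
from difflib import get_close_matches

# Hypothetical data: each loading rule maps a loading item to an entity object id.
SECOND_RULE = {"financial statement": "entity-17"}
FIRST_RULE = {"financial statement": "entity-17", "production order": "entity-42"}
# Hypothetical loading-related history recorded by the cloud database.
HISTORY = {"company plan 2020": "entity-08", "customer data": "entity-03"}


def match(term, rule, cutoff):
    """Return the entity for the same or the most similar loading item, else None."""
    hits = get_close_matches(term, list(rule), n=1, cutoff=cutoff)
    return rule[hits[0]] if hits else None


def load_entity(term):
    """Try the second rule, then the first rule, then a fuzzy query over history."""
    tiers = (
        ("first display result", SECOND_RULE, 0.8),
        ("second display result", FIRST_RULE, 0.8),
        ("third display result", HISTORY, 0.5),  # looser cutoff plays the fuzzy query
    )
    for label, rule, cutoff in tiers:
        entity = match(term, rule, cutoff)
        if entity is not None:
            return label, entity
    return None, None
```

Under these assumptions, a slightly misspelled query such as `"compny plan"` misses both rules and is resolved only by the history tier, which is the case the fuzzy query is meant to cover.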
Preferably, the display mode of the entity object is as follows:
acquiring a corresponding entity object in an original data set of a storage server as a first display result by using a second loading rule;
acquiring a corresponding entity object in an original data set of a storage server as a second display result by using a first loading rule;
and acquiring a corresponding entity object in the original data set of the storage server by using the recording habit and the loading related history recorded by the cloud database as a third display result.
Preferably, the expression further includes an internal node address of the storage data under the corresponding storage node determined based on the relationship.
Preferably, the expressions are sequentially extracted in a parent-level encoding and child-level data set mode.
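One possible reading of "parent-level encoding and child-level data set" is that each expression pairs a parent node's code with the set of its direct children, and that parents are visited before their children. The sketch below follows that reading; the node layout and the numeric codes are hypothetical.

```python
# Hypothetical storage tree: each node has a code and a list of children.
TREE = {
    "code": "01",
    "children": [
        {"code": "0101", "children": [
            {"code": "010101", "children": []},
        ]},
        {"code": "0102", "children": []},
    ],
}


def extract_in_order(node):
    """Yield (parent code, child codes) pairs, parents before their children."""
    children = node["children"]
    if children:
        yield node["code"], [child["code"] for child in children]
    for child in children:
        yield from extract_in_order(child)


expressions = list(extract_in_order(TREE))
```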
The invention also provides a multi-source big data loading system, which comprises:
a first recording unit for recording the storage paths of all storage nodes in the storage server and the attribute information of the storage entities,
a classifier for linking at least one relationship between the storage paths in the first recording unit and the attribute information of the storage entities,
an extractor for extracting expressions based on the relationship,
a filter for selecting, according to the expressions, a filtering rule to identify repeated items in the expressions,
a first loading tag, which deletes the repeated items to establish a first loading rule,
a second loading tag, which analyzes similar expression rules in the first loading rule and deletes the similar expression rules to establish a second loading rule,
a memory, which stores the first loading tag and the second loading tag,
a user data terminal for receiving the operation command input by the user,
a loading server, which acquires the loading command from the user data terminal and transmits the loading command to the cloud database, and
a cloud database, which receives the loading command and starts a loading actuator to determine the attribute information corresponding to the data to be loaded through at least one of the first loading tag and the second loading tag, so as to search for the corresponding entity object in the original data set of the storage server.
Preferably, the classifier links the storage paths in a tree network.
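A tree network of storage paths can be pictured as a nested structure in which every path segment is a node. The sketch below builds such a structure from path strings; the `/`-separated path format and the sample paths are assumptions, as the text does not describe how paths are encoded.

```python
def link_paths(paths):
    """Build a nested-dict tree in which every path segment is a node."""
    root = {}
    for path in paths:
        node = root
        for segment in path.strip("/").split("/"):
            # Reuse the existing child node or create it on first sight.
            node = node.setdefault(segment, {})
    return root


# Hypothetical storage paths recorded for the storage nodes.
tree = link_paths(["/finance/2020/q1", "/finance/2020/q2", "/production/orders"])
```

Paths that share a prefix share the corresponding nodes, so sibling data sets end up under a common parent.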
Compared with the prior art, the invention has the beneficial effects that:
In the invention, the content loaded by the user is split into one or more of connection words, specific terms and pictures and compared with the first loading rule and the second loading rule, which are obtained according to the storage rule, so that the corresponding entity object can be searched for accurately in the original data set of the storage server.
When the user fails to provide accurate loading elements, fuzzy query over the user's recording habits and loading-related history can still load the result the user wants.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a general block diagram of the system of the present invention.
Detailed Description
Preferred embodiments are described below with reference to FIGS. 1-2. The invention also provides a multi-source big data loading system, which comprises:
A first recording unit for recording attribute information of storage paths and storage entities of all storage nodes in the storage server,
a classifier for linking at least one relationship between the storage path within the first recording unit and the attribute information of the storage entity,
an extractor for extracting an expression based on the relationship,
a filter for selecting a filtering rule to identify a duplicate term in the expression according to the expression,
a first loading tag, which deletes the repeated items to establish a first loading rule,
a second loading tag, which analyzes similar expression rules in the first loading rule and deletes the similar expression rules to establish a second loading rule,
a memory storing a first load tag and a second load tag,
a user data terminal for inputting and receiving the operation command of the user,
the loading server acquires a loading command from the user data terminal and transmits the loading command to the cloud database;
and the cloud database receives the loading command, starts a loading actuator to determine the attribute information corresponding to the loaded data through at least one of the first loading tag and the second loading tag so as to search the corresponding entity object in the original data set of the storage server.
The classifier links the storage paths in a tree network.
Example 1
The invention provides a multi-source big data loading method which comprises the following steps
Setting at least one storage server, wherein the storage server comprises at least one first recording unit, the first recording unit is used for recording the storage paths of all storage nodes in the storage server and the attribute information of the storage entities,
providing a classifier for linking at least one relationship between the storage path within the first recording unit and the attribute information of the storage entity,
providing an extractor, the extractor extracting the expression based on the relationship,
selecting a screening rule to identify repeated items in the expression according to the expression, deleting the repeated items to establish a first loading rule,
analyzing the similar expression rules in the first loading rule, deleting the similar expression rules to establish a second loading rule,
the first loading rule and the second loading rule are saved in memory,
the memory is saved to a cloud database,
the user data end inputs and receives the operation command of the user,
the loading server obtains the loading command from the user data terminal,
the load command is transmitted to the cloud database,
the cloud database determines attribute information corresponding to the loaded data through at least one of the first loading rule and the second loading rule, so that a corresponding entity object is searched in an original data set of the storage server.
The method for searching the entity object in the original data set of the storage server is as follows: the cloud database compares the split connection words, specific terms and pictures with the established second loading rule, and if an identical or similar loading item exists, the loading item is executed to obtain the corresponding entity object in the original data set of the storage server. The entity object obtained by using the second loading rule is presented as the first display result.
Example 2
The invention provides a multi-source big data loading method which comprises the following steps
Setting at least one storage server, wherein the storage server comprises at least one first recording unit, the first recording unit is used for recording the storage paths of all storage nodes in the storage server and the attribute information of the storage entities,
providing a classifier for linking at least one relationship between the storage path within the first recording unit and the attribute information of the storage entity,
providing an extractor, the extractor extracting the expression based on the relationship,
selecting a screening rule to identify repeated items in the expression according to the expression, deleting the repeated items to establish a first loading rule,
analyzing the similar expression rules in the first loading rule, deleting the similar expression rules to establish a second loading rule,
the first loading rule and the second loading rule are saved in memory,
the memory is saved to a cloud database,
the user data end inputs and receives the operation command of the user,
the loading server obtains the loading command from the user data terminal,
the load command is transmitted to the cloud database,
the cloud database determines attribute information corresponding to the loaded data through at least one of the first loading rule and the second loading rule, so that a corresponding entity object is searched in an original data set of the storage server.
The method for searching the entity object in the original data set of the storage server comprises the following steps:
If no identical or similar loading item exists after the comparison with the second loading rule in Example 1, the cloud database compares the split connection words, specific terms and pictures with the established first loading rule; if an identical or similar loading item exists, the loading item is executed to obtain the corresponding entity object in the original data set of the storage server, and the entity object obtained by using the first loading rule is presented as the second display result.
Example 3
The invention provides a multi-source big data loading method which comprises the following steps
Setting at least one storage server, wherein the storage server comprises at least one first recording unit, the first recording unit is used for recording the storage paths of all storage nodes in the storage server and the attribute information of the storage entities,
providing a classifier for linking at least one relationship between the storage path within the first recording unit and the attribute information of the storage entity,
providing an extractor, the extractor extracting the expression based on the relationship,
selecting a screening rule to identify repeated items in the expression according to the expression, deleting the repeated items to establish a first loading rule,
analyzing the similar expression rules in the first loading rule, deleting the similar expression rules to establish a second loading rule,
the first loading rule and the second loading rule are saved in memory,
the memory is saved to a cloud database,
the user data end inputs and receives the operation command of the user,
the loading server obtains the loading command from the user data terminal,
the load command is transmitted to the cloud database,
the cloud database determines attribute information corresponding to the loaded data through at least one of the first loading rule and the second loading rule, so that a corresponding entity object is searched in an original data set of the storage server.
The method for searching the entity object in the original data set of the storage server comprises the following steps:
If no identical or similar loading item exists after the comparison with the first loading rule in Example 2, the corresponding entity object in the original data set of the storage server is loaded by fuzzy query according to the recording habits and loading-related history recorded by the cloud database. The entity object obtained in this way is presented as the third display result.
As can be seen from Examples 1 to 3, the content loaded by the user is split into one or more of connection words, specific terms and pictures and compared with the first loading rule and the second loading rule obtained according to the storage rule, so that the corresponding entity object can be searched for accurately in the original data set of the storage server; when the user fails to provide an accurate loading element, the result the user wants can still be loaded by fuzzy query over the user's recording habits and loading-related history.
The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and core concepts of the present invention. The foregoing is only a preferred embodiment of the present invention. Because written description is limited while the possible specific structures are objectively unlimited, it will be apparent to those skilled in the art that modifications, refinements or changes may be made without departing from the principle of the present invention, and that the technical features described above may be combined in any suitable manner; such modifications, variations and combinations, as well as direct application of the concept and technical solution of the invention to other uses, are regarded as falling within the scope of protection of the invention as defined by the claims.

Claims (8)

1. The multi-source big data loading method is characterized by comprising the following steps
Setting at least one storage server, wherein the storage server comprises at least one first recording unit, the first recording unit is used for recording the storage paths of all storage nodes in the storage server and the attribute information of the storage entities,
providing a classifier for linking at least one relationship between the storage path within the first recording unit and the attribute information of the storage entity,
providing an extractor, the extractor extracting the expression based on the relationship,
selecting a screening rule to identify repeated items in the expression according to the expression, deleting the repeated items to establish a first loading rule,
analyzing the similar expression rules in the first loading rule, deleting the similar expression rules to establish a second loading rule,
the first loading rule and the second loading rule are saved in memory,
the memory is saved to a cloud database,
the user data end inputs and receives the operation command of the user,
the loading server obtains the loading command from the user data terminal,
the load command is transmitted to the cloud database,
the cloud database determines attribute information corresponding to the loaded data through at least one of the first loading rule and the second loading rule, so that a corresponding entity object is searched in an original data set of the storage server.
2. The multi-source big data loading method according to claim 1, wherein the information searched by the user, as received by the user data terminal, is split into one or more of connection words, specific terms and pictures and is stored in the cloud database.
3. The multi-source big data loading method according to claim 1, wherein the method for searching the entity object in the original data set of the storage server is as follows:
the cloud database compares the split connection words, specific terms and pictures with the established second loading rule, and if an identical or similar loading item exists, the loading item is executed to obtain the corresponding entity object in the original data set of the storage server,
if no identical or similar loading item exists after the comparison with the second loading rule, the cloud database compares the split connection words, specific terms and pictures with the established first loading rule, and if an identical or similar loading item exists, the loading item is executed to obtain the corresponding entity object in the original data set of the storage server,
and if no identical or similar loading item exists after the comparison with the first loading rule, the corresponding entity object in the original data set of the storage server is loaded by fuzzy query according to the recording habits and loading-related history recorded by the cloud database.
4. The multi-source big data loading method according to claim 1 or 3, wherein the entity object is displayed in a manner that:
acquiring a corresponding entity object in an original data set of a storage server as a first display result by using a second loading rule;
acquiring a corresponding entity object in an original data set of a storage server as a second display result by using a first loading rule;
and acquiring a corresponding entity object in the original data set of the storage server by using the recording habit and the loading related history recorded by the cloud database as a third display result.
5. The multi-source big data loading method according to claim 1, wherein the expression further includes an internal node address of the storage data under the corresponding storage node determined based on the relationship.
6. The multi-source big data loading method according to claim 1, wherein the expressions are sequentially extracted in a parent-level encoding and child-level data set manner.
7. A multi-source big data loading system is characterized by comprising
A first recording unit for recording attribute information of storage paths and storage entities of all storage nodes in the storage server,
a classifier for linking at least one relationship between the storage path within the first recording unit and the attribute information of the storage entity,
an extractor for extracting an expression based on the relationship,
a filter for selecting a filtering rule to identify a duplicate term in the expression according to the expression,
a first loading tag, which deletes the repeated items to establish a first loading rule,
a second loading tag, which analyzes similar expression rules in the first loading rule and deletes the similar expression rules to establish a second loading rule,
a memory storing a first load tag and a second load tag,
a user data terminal for inputting and receiving the operation command of the user,
the loading server acquires a loading command from the user data terminal and transmits the loading command to the cloud database;
and the cloud database receives the loading command, starts a loading actuator to determine the attribute information corresponding to the loaded data through at least one of the first loading tag and the second loading tag so as to search the corresponding entity object in the original data set of the storage server.
8. The multi-source big data loading system according to claim 7, wherein the classifier links the storage paths in a tree network.
CN202010551553.1A 2020-06-16 2020-06-16 Multi-source big data loading method and system Active CN111695000B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010551553.1A CN111695000B (en) 2020-06-16 2020-06-16 Multi-source big data loading method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010551553.1A CN111695000B (en) 2020-06-16 2020-06-16 Multi-source big data loading method and system

Publications (2)

Publication Number Publication Date
CN111695000A (en) 2020-09-22
CN111695000B (en) 2021-04-27

Family

ID=72481408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010551553.1A Active CN111695000B (en) 2020-06-16 2020-06-16 Multi-source big data loading method and system

Country Status (1)

Country Link
CN (1) CN111695000B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101026627A (en) * 2007-03-15 2007-08-29 上海交通大学 Multi-source data fusion system based on rule and certainty factor
CN105893526A (en) * 2016-03-30 2016-08-24 上海坤士合生信息科技有限公司 Multi-source data fusion system and method
CN107193858A (en) * 2017-03-28 2017-09-22 福州金瑞迪软件技术有限公司 Towards the intelligent Service application platform and method of multi-source heterogeneous data fusion
CN107341215A (en) * 2017-06-07 2017-11-10 北京航空航天大学 A kind of vertical knowledge mapping classification ensemble querying method of multi-source based on Distributed Computing Platform
CN107609154A (en) * 2017-09-23 2018-01-19 浪潮软件集团有限公司 Method and device for processing multi-source heterogeneous data
CN108804613A (en) * 2018-05-30 2018-11-13 国网山东省电力公司经济技术研究院 A kind of Various database real time fusion system and its fusion method
US10146954B1 (en) * 2012-06-11 2018-12-04 Quest Software Inc. System and method for data aggregation and analysis
CN109710667A (en) * 2018-11-27 2019-05-03 中科曙光国际信息产业有限公司 A kind of shared realization method and system of the multisource data fusion based on big data platform
CN110334133A (en) * 2019-07-11 2019-10-15 京东城市(北京)数字科技有限公司 Rule digging method and device, electronic equipment and computer readable storage medium
CN110990351A (en) * 2019-12-05 2020-04-10 南方电网数字电网研究院有限公司 Unstructured data acquisition method, device and system and computer equipment
CN110990390A (en) * 2019-12-02 2020-04-10 东莞中国科学院云计算产业技术创新与育成中心 Data cooperative processing method and device, computer equipment and storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101026627A (en) * 2007-03-15 2007-08-29 上海交通大学 Multi-source data fusion system based on rule and certainty factor
US10146954B1 (en) * 2012-06-11 2018-12-04 Quest Software Inc. System and method for data aggregation and analysis
CN105893526A (en) * 2016-03-30 2016-08-24 上海坤士合生信息科技有限公司 Multi-source data fusion system and method
CN107193858A (en) * 2017-03-28 2017-09-22 福州金瑞迪软件技术有限公司 Towards the intelligent Service application platform and method of multi-source heterogeneous data fusion
CN107341215A (en) * 2017-06-07 2017-11-10 北京航空航天大学 A kind of vertical knowledge mapping classification ensemble querying method of multi-source based on Distributed Computing Platform
CN107609154A (en) * 2017-09-23 2018-01-19 浪潮软件集团有限公司 Method and device for processing multi-source heterogeneous data
CN108804613A (en) * 2018-05-30 2018-11-13 国网山东省电力公司经济技术研究院 A kind of Various database real time fusion system and its fusion method
CN109710667A (en) * 2018-11-27 2019-05-03 中科曙光国际信息产业有限公司 A kind of shared realization method and system of the multisource data fusion based on big data platform
CN110334133A (en) * 2019-07-11 2019-10-15 京东城市(北京)数字科技有限公司 Rule digging method and device, electronic equipment and computer readable storage medium
CN110990390A (en) * 2019-12-02 2020-04-10 东莞中国科学院云计算产业技术创新与育成中心 Data cooperative processing method and device, computer equipment and storage medium
CN110990351A (en) * 2019-12-05 2020-04-10 南方电网数字电网研究院有限公司 Unstructured data acquisition method, device and system and computer equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Tian Weidong et al.: "A concise association rule representation model", Application Research of Computers (《计算机应用研究》) *

Also Published As

Publication number Publication date
CN111695000B (en) 2021-04-27

Similar Documents

Publication Publication Date Title
US11899681B2 (en) Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
CN100530185C (en) Network behavior based personalized recommendation method and system
US11907659B2 (en) Item recall method and system, electronic device and readable storage medium
US11216516B2 (en) Method and system for scalable search using microservice and cloud based search with records indexes
CN110532309B (en) Generation method of college library user portrait system
CN102253936A (en) Method for recording access of user to merchandise information, search method and server
CN113254630B (en) Domain knowledge map recommendation method for global comprehensive observation results
CN112269816B (en) Government affair appointment correlation retrieval method
CN110795613A (en) Commodity searching method, device and system and electronic equipment
US8862609B2 (en) Expanding high level queries
JP2004030221A (en) Method for automatically detecting table to be modified
CN114265957A (en) Multiple data source combined query method and system based on graph database
CN113407678A (en) Knowledge graph construction method, device and equipment
CN105159898A (en) Searching method and searching device
CN114328947A (en) Knowledge graph-based question and answer method and device
CN117609468A (en) Method and device for generating search statement
CN113779110A (en) Family relation network extraction method and device, computer equipment and storage medium
CN111259223B (en) News recommendation and text classification method based on emotion analysis model
CN117667841A (en) Enterprise data management platform and method
CN111695000B (en) Multi-source big data loading method and system
CN109460467B (en) Method for constructing network information classification system
CN110062112A (en) Data processing method, device, equipment and computer readable storage medium
CN110399431A (en) A kind of incidence relation construction method, device and equipment
CN115935042A (en) Intelligent pledge asset duplicate checking method and system based on fusion model
CN114662002A (en) Object recommendation method, medium, device and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant