CN107103067A - Data synchronization method and system based on a search engine - Google Patents

Data synchronization method and system based on a search engine

Info

Publication number
CN107103067A
Authority
CN
China
Prior art keywords
data
index
data source
warehouse
synchronization data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710254007.XA
Other languages
Chinese (zh)
Inventor
赵艳飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Si Tech Information Technology Co Ltd
Original Assignee
Beijing Si Tech Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Si Tech Information Technology Co Ltd
Priority to CN201710254007.XA
Publication of CN107103067A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a data synchronization method based on a search engine, comprising: creating an index warehouse according to a business, and creating index fields in the index warehouse; parsing the data source corresponding to the index fields, and configuring, in the form of a first table, the full synchronization data information corresponding to the data source; and calling the synchronization interface corresponding to the full synchronization data information to import the data source into the index warehouse, thereby completing synchronization of the full data. The invention addresses the low efficiency of data synchronization, achieves real-time automatic data synchronization, and ensures the integrity and accuracy of the data.

Description

Data synchronization method and system based on a search engine
Technical field
The present invention relates to the field of communication technology, and in particular to a data synchronization method and system based on a search engine.
Background
In the information age, search engines play an extremely important role in all industries. For example, an online shop owner needs to provide the search engine with descriptions of its products so that buyers can search for and read them. In a traditional search engine, back-end management transmits data as character strings that represent the data information. Developers cannot read such strings directly and clearly, so data may be lost or tampered with during synchronization without being noticed, which makes the data information inaccurate, and administrators find it hard to detect this quickly. In addition, the data sources provided by users such as merchants come in many formats, and the data source parsing plug-in in back-end management cannot handle all of them, so parsing of the data source fails. This greatly reduces the efficiency and accuracy with which the search engine transfers data information.
Summary of the invention
The present invention provides a data synchronization method and system based on a search engine, which address the low efficiency of data synchronization, achieve real-time automatic data synchronization, and ensure the integrity and accuracy of the data.
The technical solution by which the present invention solves the above technical problem is as follows: a data synchronization method based on a search engine, comprising:
Step 1: create an index warehouse according to a business, and create index fields in the index warehouse;
Step 2: parse the data source corresponding to the index fields, and configure, in the form of a first table, the full synchronization data information corresponding to the data source;
Step 3: call the synchronization interface corresponding to the full synchronization data information, import the data source into the index warehouse, and complete synchronization of the full data.
The beneficial effects of the invention are as follows. By configuring index fields and configuring the full synchronization data information in the form of tables, the invention avoids the problems of the traditional approach, which uses character strings and the like as the information carrier, and makes the data information visible. In addition, after the full data has been imported into the index warehouse through the synchronization interface, administrators can enter the back-end management page and check whether the full data is present in the index warehouse. The method greatly improves the success rate and accuracy of importing full data into the index warehouse, achieves real-time automatic data synchronization, and ensures the integrity and accuracy of the data.
On the basis of the above technical solution, the present invention can also be improved as follows.
Further, the synchronization method also comprises:
Step 4: when the data source has corresponding incremental data, configure, in the form of a second table, the time interval at which the incremental data is synchronized;
Step 5: call the synchronization interface, and import the incremental data and the time interval into the index warehouse;
Step 6: record, in the second table, the moment at which the incremental data was imported into the index warehouse;
Step 7: taking that moment as the starting point, wait for the time interval and then synchronize the incremental data, completing synchronization of the incremental data.
The further beneficial effect is that, after the full data corresponding to a business has been synchronized, any incremental data corresponding to that full data can also be synchronized, which makes data synchronization more flexible.
Further, step 1 comprises:
Step 1.1: create an index warehouse according to the business;
Step 1.2: import a configuration file into the index warehouse;
Step 1.3: configure query information in the configuration file, the query information comprising a data source table and a data source unique code;
Step 1.4: create index fields according to the data source table, each index field comprising an index field name and the field type corresponding to that field name.
Further, when the first table is configured with the full synchronization data information of multiple data sources, step 3 comprises: calling the synchronization interface corresponding to the full synchronization data information for each data source in turn, in the order of the data source unique codes, and importing the multiple data sources into the index warehouse; or,
when the first table is configured with the full synchronization data information of multiple data sources and an instruction is received to synchronize only one data source, step 3 comprises: calling the synchronization interface corresponding to the full synchronization data information according to the unique code of that data source, and importing that data source into the index warehouse.
Further, when one or more data sources need to be resynchronized, step 3 also comprises: calling the synchronization interface corresponding to the full synchronization data information according to the unique codes of the data sources to be resynchronized, and importing those data sources into the index warehouse.
The present invention also provides a data synchronization system based on a search engine, comprising:
an index field creation module, configured to create an index warehouse according to a business and to create index fields in the index warehouse;
a synchronization data information configuration module, configured to parse the data source corresponding to the index fields created by the index field creation module, and to configure, in the form of a first table, the full synchronization data information corresponding to the data source;
a synchronization data import module, configured to call, according to the full synchronization data information configured by the synchronization data information configuration module, the synchronization interface corresponding to the full synchronization data information, and to import the data source into the index warehouse.
The beneficial effects of the invention are as follows. The system configures index fields through the index field creation module and configures the full synchronization data information in the form of tables through the synchronization data information configuration module. This avoids the problems of the traditional approach, which uses character strings and the like as the information carrier, and makes the data information visible. In addition, after the full data has been imported into the index warehouse through the synchronization data import module, administrators can enter the back-end management page and check whether the full data is present in the index warehouse. The system greatly improves the success rate and accuracy of importing full data into the index warehouse, achieves real-time automatic data synchronization, and ensures the integrity and accuracy of the data.
Further, the synchronization data information configuration module is also configured to: when the data source has corresponding incremental data, configure, in the form of a second table, the time interval at which the incremental data is synchronized;
the synchronization data import module is also configured to: call the synchronization interface, and import the incremental data and the time interval into the index warehouse;
the synchronization data information configuration module is also configured to: record, in the second table, the moment at which the incremental data was imported into the index warehouse;
the synchronization data import module is also configured to: taking that moment as the starting point, wait for the time interval and then synchronize the incremental data.
Further, the index field creation module is specifically configured to:
create an index warehouse according to the business; import a configuration file into the index warehouse; configure query information in the configuration file, the query information comprising a data source table and a data source unique code; and create index fields according to the data source table, each index field comprising an index field name and the field type corresponding to that field name.
Further, when the first table is configured with the full synchronization data information of multiple data sources, the synchronization data import module is configured to: call the synchronization interface corresponding to the full synchronization data information for each data source in turn, in the order of the data source unique codes, and import the multiple data sources into the index warehouse; or,
when the first table is configured with the full synchronization data information of multiple data sources and an instruction is received to synchronize only one data source, the synchronization data import module is configured to: call the synchronization interface corresponding to the full synchronization data information according to the unique code of that data source, and import that data source into the index warehouse.
Further, when one or more data sources need to be resynchronized, the synchronization data import module is also configured to: call the synchronization interface corresponding to the full synchronization data information according to the unique codes of the data sources to be resynchronized, and import those data sources into the index warehouse.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a data synchronization method based on a search engine according to embodiment one of the present invention;
Fig. 2 is a schematic flowchart of a data synchronization method based on a search engine according to embodiment two of the present invention;
Fig. 3 is a schematic flowchart of step 110 in Fig. 1 and/or Fig. 2;
Fig. 4 is a schematic structural diagram of a data synchronization system based on a search engine according to embodiment three of the present invention.
Detailed description
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
Embodiment one:
A data synchronization method 100 based on a search engine, as shown in Fig. 1, comprises:
Step 110: create an index warehouse according to a business, and create index fields in the index warehouse;
Step 120: parse the data source corresponding to the index fields, and configure, in the form of a first table, the full synchronization data information corresponding to the data source;
Step 130: call the synchronization interface corresponding to the full synchronization data information, import the data source into the index warehouse, and complete synchronization of the full data.
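The three steps of embodiment one can be illustrated with a minimal Python sketch. It is not the patented implementation: the index warehouse is replaced by a hypothetical in-memory object, and the names IndexWarehouse, create_index_warehouse, configure_full_sync and full_sync, as well as the layout of the configuration row, are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IndexWarehouse:
    """Hypothetical in-memory stand-in for a search engine index warehouse."""
    business: str
    fields: dict                                   # index field name -> field type
    documents: dict = field(default_factory=dict)  # unique code -> indexed document


def create_index_warehouse(business, field_types):
    # Step 110: create the warehouse for a business and declare its index fields.
    return IndexWarehouse(business=business, fields=dict(field_types))


def configure_full_sync(rows, key_column, field_map):
    # Step 120: one configuration row of the "first table" (assumed layout):
    # where the data is, which column is the unique code, and how source
    # columns map to index fields.
    return {"rows": rows, "key_column": key_column, "field_map": field_map}


def full_sync(warehouse, config):
    # Step 130: the synchronization interface imports every source row,
    # keeping only the columns mapped to declared index fields.
    for row in config["rows"]:
        document = {index_name: row[column]
                    for column, index_name in config["field_map"].items()
                    if index_name in warehouse.fields}
        warehouse.documents[row[config["key_column"]]] = document
    return len(warehouse.documents)
```

A caller would create the warehouse once, build one configuration row per data source, and pass each row to full_sync; the returned count gives a quick check that the import populated the warehouse.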
Embodiment two:
Optionally, as another embodiment of the present invention, as shown in Fig. 2, the method 100 comprises:
Step 110: create an index warehouse according to a business, and create index fields in the index warehouse;
Step 120: parse the data source corresponding to the index fields, and configure, in the form of a first table, the full synchronization data information corresponding to the data source;
Step 130: call the synchronization interface corresponding to the full synchronization data information, import the data source into the index warehouse, and complete synchronization of the full data;
Step 140: when the data source has corresponding incremental data, configure, in the form of a second table, the time interval at which the incremental data is synchronized;
Step 150: call the synchronization interface, and import the incremental data and the time interval into the index warehouse;
Step 160: record, in the second table, the moment at which the incremental data was imported into the index warehouse;
Step 170: taking that moment as the starting point, wait for the time interval and then synchronize the incremental data, completing synchronization of the incremental data.
Specifically, in the above embodiments, as shown in Fig. 3, step 110 in Fig. 1 and/or Fig. 2 comprises:
Step 111: create an index warehouse core1 according to the business;
Step 112: import a configuration file into the core1/conf folder of the index warehouse;
Step 113: configure query information in the configuration file, the query information comprising a query rule, a data source table and a data source unique code;
Step 114: create index fields according to the data source table, each index field comprising an index field name and the field type corresponding to that field name.
In step 113, the query information is configured as follows. The uniqueKey parameter is set to the unique code of the data source (i.e. its id), for example: <uniqueKey>docid</uniqueKey>. The defaultSearchField parameter is set to the field of the data source table that is searched by default, for example: <defaultSearchField>doctitle</defaultSearchField>. The defaultOperator attribute of the solrQueryParser parameter is set to the default query rule, for example: <solrQueryParser defaultOperator="OR"/>. Each business corresponds to one data source, and each data source corresponds to one unique code.
In step 114, the index fields are created as follows. For each field of the data source table that is to be synchronized to the search engine, a field name and a field type are set, for example: <field name="doctitle" type="text_ik" indexed="true" stored="true" omitNorms="true"/>, where the value of the name attribute is the field name and the value of the type attribute is the field type.
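To make the relationship between the field definition and the index field concrete, the following Python sketch reads the name and type attributes from a field element like the one quoted in step 114. It uses only the standard library xml.etree.ElementTree module; the function name parse_index_field is an assumption for illustration and is not part of the patent or of Solr.

```python
import xml.etree.ElementTree as ET

# Field definition as quoted in step 114.
FIELD_XML = ('<field name="doctitle" type="text_ik" indexed="true" '
             'stored="true" omitNorms="true"/>')


def parse_index_field(xml_text):
    # Return (index field name, field type) from one <field .../> element.
    element = ET.fromstring(xml_text)
    return element.get("name"), element.get("type")
```

Calling parse_index_field(FIELD_XML) yields the pair of values that step 114 calls the index field name and its field type.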
It should also be noted that, in embodiments one and two, the tool used to parse the data source corresponding to the index fields is the open-source Apache Tika plug-in, and the data parsing algorithm in this plug-in has been optimized so that the plug-in can parse data sources of any format (for example MySQL, Oracle, txt, Word, PPT, Excel and PDF). The stated parsing success rate is 100%, which greatly improves the success rate and accuracy of importing full data into the index warehouse and ensures the integrity and accuracy of the data.
It should further be noted that the full synchronization data information is configuration information. This configuration information uses three tables to record the connection information of the data source, the number of the data source table and the index fields corresponding to the data source, which are used to locate, connect to and search the data source. For synchronization of incremental data there is, in addition to these three tables, a fourth table, which records the time interval for synchronizing incremental data and the time at which incremental data was imported into the index warehouse. Multiple businesses correspond to multiple data sources, and the configuration information of every business is recorded in these tables. For example, for full synchronization of data sources, let the three tables be table 1, table 2 and table 3, and let the businesses be business A and business B. The configuration information of business A is then stored in the first row of table 1, the first row of table 2 and the first row of table 3, and the configuration information of business B is stored in the second row of table 1, the second row of table 2 and the third row of table 3.
When the full data (i.e. the data source) has corresponding incremental data that needs to be synchronized, steps 140 to 170 are performed. After the synchronization interface has been called and the time interval and incremental data have been successfully imported into the index warehouse, the import time is recorded. Once the time interval has elapsed, the incremental data is synchronized into the index warehouse corresponding to the full data, completing synchronization of the incremental data. For example, if the time interval is 5 minutes and the incremental data was imported into the index warehouse at 8:15, then after waiting 5 minutes the incremental data is synchronized at 8:20, without calling the synchronization interface corresponding to the full synchronization data information again. The time interval can be set according to the circumstances.
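The timing rule in this example (recorded import moment plus configured interval) can be written as a short Python sketch. The function names and the use of minutes as the unit are assumptions for illustration; the patent does not specify how the interval is encoded.

```python
from datetime import datetime, timedelta


def next_sync_time(last_import_time, interval_minutes):
    # Steps 160 and 170: start from the recorded import moment and add the interval.
    return last_import_time + timedelta(minutes=interval_minutes)


def is_sync_due(now, last_import_time, interval_minutes):
    # True once the configured interval has fully elapsed.
    return now >= next_sync_time(last_import_time, interval_minutes)
```

With an import recorded at 8:15 and an interval of 5 minutes, next_sync_time returns 8:20 on the same day, matching the example in the text.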
After the synchronization interface corresponding to the full synchronization data information has been called and the data source has been imported into the index warehouse, staff can log in to the back-end management page, enter the index warehouse and query whether the data source exists. If it does, the import succeeded. If it does not, the data source address provided by the user may be unreachable, or there may be a hardware problem; staff can then intervene manually and restart the data synchronization from step 110.
Specifically, in embodiment two, in steps 120, 140 and 160, configuring the full synchronization data information requires creating the first table, and configuring the time interval at which the corresponding incremental data is synchronized requires creating the second table. The first table comprises the search_db_tb, search_db and search_db_tb_field tables; the second table comprises the sys_task table.
These tables are as follows.
search_db_tb table:
id | db_id | tb_name | query | delta_query_id | delta_query
The search_db_tb table contains the columns id, db_id, tb_name, query, delta_query_id and delta_query. id is the primary key of the table; db_id is the primary key of the search_db table; tb_name is the name of the data table; pk_id is the primary key of the data table; query is the SQL query executed when synchronizing data; delta_query_id is the SQL query executed during incremental synchronization, whose result is the ids of the data to be synchronized incrementally; delta_query is the SQL query executed during incremental synchronization, which retrieves the data whose ids appear in the result of delta_query_id.
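The two-stage incremental query described for delta_query_id and delta_query can be sketched in Python with the standard library sqlite3 module. This is only an illustration under assumptions: the function name run_delta_sync is hypothetical, SQLite stands in for whatever database the data source uses, and delta_query is assumed to take the id as a single positional parameter.

```python
import sqlite3


def run_delta_sync(conn, delta_query_id, delta_query):
    # Stage 1: delta_query_id returns the ids of the rows to synchronize.
    ids = [row[0] for row in conn.execute(delta_query_id)]
    # Stage 2: delta_query fetches the data for each of those ids.
    return [conn.execute(delta_query, (row_id,)).fetchone() for row_id in ids]
```

Splitting the work this way keeps the first query cheap (ids only) and lets the second query load full rows only for data that actually changed.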
search_db table:
id | service_id | url | driver | username | password
The search_db table contains the columns id, service_id, url, driver, username and password. id is the primary key of the table; service_id is the primary key of the search_index table; url, driver, username and password are the database connection information.
search_db_tb_field table:
id | tb_id | field_name | index_name | is_filter_html | is_pinyin | index_pinyin_name | doc_obtain
The search_db_tb_field table contains the columns id, tb_id, field_name, index_name, is_filter_html, is_pinyin, index_pinyin_name and doc_obtain. id is the primary key of the table; tb_id is the primary key of the search_db_tb table; field_name is the name of the data table field to be synchronized to the search engine; index_name is the name of the corresponding field in the search engine; is_filter_html indicates whether HTML tags are filtered out of the field value (1 means filter, 2 means do not filter); is_pinyin indicates whether the field value is converted to pinyin (1 means convert, 2 means do not convert); index_pinyin_name is the field name under which a value converted to pinyin is stored after synchronization to the search engine; doc_obtain indicates whether file content is fetched according to a path (1 means yes, 2 means no).
sys_task table:
id | task_name | class_pash | expression | last_task_time
The sys_task table contains the columns id, task_name, class_pash, expression and last_task_time. id is the primary key of the table; task_name is the name of the incremental task; class_pash is the path of the incremental task; expression is the time interval of the incremental task; last_task_time is the time at which the incremental task was last executed.
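The four configuration tables can be laid out as a schema for experimentation. The sketch below uses Python's sqlite3 module; the column names follow the text (the last search_db column is read as password, and pk_id is included because it is described even though the header row omits it), while all column types, the foreign-key references and the function name create_config_tables are assumptions not stated in the patent.

```python
import sqlite3

# Column types and REFERENCES clauses are assumed for illustration only.
SCHEMA = """
CREATE TABLE search_db (
    id INTEGER PRIMARY KEY, service_id INTEGER,
    url TEXT, driver TEXT, username TEXT, password TEXT
);
CREATE TABLE search_db_tb (
    id INTEGER PRIMARY KEY, db_id INTEGER REFERENCES search_db(id),
    tb_name TEXT, pk_id TEXT, query TEXT,
    delta_query_id TEXT, delta_query TEXT
);
CREATE TABLE search_db_tb_field (
    id INTEGER PRIMARY KEY, tb_id INTEGER REFERENCES search_db_tb(id),
    field_name TEXT, index_name TEXT,
    is_filter_html INTEGER, is_pinyin INTEGER,
    index_pinyin_name TEXT, doc_obtain INTEGER
);
CREATE TABLE sys_task (
    id INTEGER PRIMARY KEY, task_name TEXT, class_pash TEXT,
    expression TEXT, last_task_time TEXT
);
"""


def create_config_tables(conn):
    # Create the three "first table" tables and the "second table" (sys_task),
    # then return the names of the tables that now exist.
    conn.executescript(SCHEMA)
    cursor = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(row[0] for row in cursor)
```

Running create_config_tables(sqlite3.connect(":memory:")) gives an empty configuration database in which one row per business can be inserted into each table.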
It should be noted that when one set of full synchronization data has multiple sets of incremental data to be synchronized, steps 140 to 160 are performed a corresponding number of times.
Specifically, in the above embodiments, when the first table is configured with the full synchronization data information of multiple data sources, step 130 comprises: calling the synchronization interface corresponding to the full synchronization data information for each data source in turn, in the order of the data source unique codes, and importing the multiple data sources into the index warehouse; or,
when the first table is configured with the full synchronization data information of multiple data sources and an instruction is received to synchronize only one data source, step 130 comprises: calling, according to the unique code of that data source, the synchronization interface containing that unique code, and importing that data source into the index warehouse.
In addition, when one or more data sources need to be resynchronized, step 130 also comprises: calling the synchronization interface corresponding to the full synchronization data information according to the unique codes of the data sources to be resynchronized, and importing those data sources into the index warehouse.
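The three cases of step 130 (all data sources in order of unique code, a single requested data source, or a chosen set to resynchronize) can be covered by one selection routine. The Python sketch below is an assumption-laden illustration: the key name source_code, the function names and the callable sync_interface are hypothetical and stand in for the real synchronization interface.

```python
def sources_to_sync(first_table, requested_codes=None):
    # All configured unique codes, in ascending order.
    configured = sorted(row["source_code"] for row in first_table)
    if requested_codes is None:
        return configured
    # Single-source sync or resynchronization: keep only the requested codes.
    requested = set(requested_codes)
    return [code for code in configured if code in requested]


def run_full_sync(first_table, sync_interface, requested_codes=None):
    # Call the (hypothetical) synchronization interface once per selected source.
    return [sync_interface(code)
            for code in sources_to_sync(first_table, requested_codes)]
```

Passing no requested codes synchronizes every configured data source; passing one code or several covers the single-source and resynchronization cases with the same ordering rule.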
By configuring index fields and configuring the full synchronization data information in the form of tables, the present invention avoids the problems of the traditional approach, which uses character strings and the like as the information carrier, and makes the data information visible. In addition, after the full data has been imported into the index warehouse through the synchronization interface, administrators can enter the back-end management page and check whether the full data is present in the index warehouse. Furthermore, the tool the invention uses to parse data sources is the open-source Apache Tika plug-in, whose data parsing algorithm has been optimized so that it can parse data sources of any format (for example MySQL, Oracle, txt, Word, PPT, Excel and PDF), with a stated parsing success rate of 100%. The method greatly improves the success rate and accuracy of importing full data into the index warehouse, achieves real-time automatic data synchronization, and ensures the integrity and accuracy of the data.
Embodiment three:
The present invention also provides a data synchronization system 200 based on a search engine which, as shown in Fig. 4, comprises:
an index field creation module, configured to create an index warehouse according to a business and to create index fields in the index warehouse;
a synchronization data information configuration module, configured to parse the data source corresponding to the index fields created by the index field creation module, and to configure, in the form of a first table, the full synchronization data information corresponding to the data source;
a synchronization data import module, configured to call, according to the full synchronization data information configured by the synchronization data information configuration module, the synchronization interface corresponding to the full synchronization data information, and to import the data source into the index warehouse.
It should also be noted that, after the full data corresponding to a business has been synchronized, any incremental data corresponding to that full data can also be synchronized. Accordingly,
the synchronization data information configuration module is also configured to: when the data source has corresponding incremental data, configure, in the form of a second table, the time interval at which the incremental data is synchronized;
the synchronization data import module is also configured to: call the synchronization interface, and import the incremental data and the time interval into the index warehouse;
the synchronization data information configuration module is also configured to: record, in the second table, the moment at which the incremental data was imported into the index warehouse;
the synchronization data import module is also configured to: taking that moment as the starting point, wait for the time interval and then synchronize the incremental data, completing synchronization of the incremental data.
The index field creation module is specifically configured to:
create an index warehouse according to the business, import a configuration file into the index warehouse, configure query information in the configuration file, and create index fields according to the query information, where the query information comprises a query rule, a data source table and a data source unique code, and each index field comprises an index field name and the field type corresponding to that field name.
When the first table is configured with the full synchronization data information of multiple data sources, the synchronization data import module is configured to: call the synchronization interface corresponding to the full synchronization data information for each data source in turn, in the order of the data source unique codes, and import the multiple data sources into the index warehouse; or,
when the first table is configured with the full synchronization data information of multiple data sources and an instruction is received to synchronize only one data source, the synchronization data import module is configured to: call the synchronization interface corresponding to the full synchronization data information according to the unique code of that data source, and import that data source into the index warehouse.
When one or more data sources need to be resynchronized, the synchronization data import module is also configured to: call the synchronization interface corresponding to the full synchronization data information according to the unique codes of the data sources to be resynchronized, and import those data sources into the index warehouse.
It should be noted that the system is developed in Java. It configures index fields through the index field creation module and configures the full synchronization data information in the form of tables through the synchronization data information configuration module, which avoids the problems of the traditional approach of using character strings and the like as the information carrier and makes the data information visible. In addition, after the full data has been imported into the index warehouse through the synchronization data import module, administrators can enter the back-end management page and check whether the full data is present in the index warehouse. Furthermore, the tool used in the synchronization data information configuration module to parse data sources is the open-source Apache Tika plug-in, whose data parsing algorithm has been optimized so that it can parse data sources of any format (for example MySQL, Oracle, txt, Word, PPT, Excel and PDF), with a stated parsing success rate of 100%. The system greatly improves the success rate and accuracy of importing full data into the index warehouse, achieves real-time automatic data synchronization, and ensures the integrity and accuracy of the data.
The 100% parsing success rate of the optimized Apache Tika plug-in was obtained in testing. If parsing nevertheless fails, the likely cause is that the other party's database cannot be connected to, or a hardware problem.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A data synchronization method based on a search engine, characterized in that it comprises:
Step 1, creating an index warehouse according to the service, and creating index fields in the index warehouse;
Step 2, parsing the data source corresponding to the index fields, and configuring, in the form of a first table, the full synchronization data information corresponding to the data source;
Step 3, calling the synchronization interface corresponding to the full synchronization data information, and importing the data source into the index warehouse, thereby completing the synchronization of the full data.
2. The data synchronization method based on a search engine according to claim 1, characterized in that the synchronization method further comprises:
Step 4, when the data source has corresponding incremental data, configuring, in the form of a second table, the time interval at which the incremental data is synchronized;
Step 5, calling the synchronization interface, and importing the incremental data and the time interval into the index warehouse;
Step 6, recording in the second table the moment at which the incremental data is imported into the index warehouse;
Step 7, taking that moment as the starting point and, after waiting for the time interval, synchronizing the incremental data, thereby completing the synchronization of the incremental data.
3. The data synchronization method based on a search engine according to claim 1 or 2, characterized in that step 1 comprises:
Step 1.1, creating an index warehouse according to the service;
Step 1.2, importing a configuration file into the index warehouse;
Step 1.3, configuring query information in the configuration file, the query information comprising a data source table and a data source unique code;
Step 1.4, creating index fields according to the data source table, each index field comprising an index field name and the field type corresponding to that field name.
4. The data synchronization method based on a search engine according to claim 3, characterized in that, when the first table is configured with the full synchronization data information corresponding to multiple data sources, step 3 comprises: calling, in the order of the data source unique codes, the synchronization interface corresponding to the full synchronization data information for each data source in turn, and importing the multiple data sources into the index warehouse; or,
when the first table is configured with the full synchronization data information corresponding to multiple data sources and an instruction to synchronize only one data source is received, step 3 comprises: calling, according to the data source unique code of that data source, the synchronization interface corresponding to the full synchronization data information, and importing that data source into the index warehouse.
5. The data synchronization method based on a search engine according to claim 3, characterized in that, when one or more data sources need to be re-synchronized, step 3 comprises: calling, according to the data source unique codes of the data sources to be re-synchronized, the synchronization interface corresponding to the full synchronization data information, and importing the data sources to be re-synchronized into the index warehouse.
6. A data synchronization system based on a search engine, characterized in that it comprises:
an index field creation module, configured to create an index warehouse according to the service and to create index fields in the index warehouse;
a synchronization data information configuration module, configured to parse, according to the index fields created by the index field creation module, the data source corresponding to the index fields, and to configure, in the form of a first table, the full synchronization data information corresponding to the data source;
a synchronization data import module, configured to call, according to the full synchronization data information configured by the synchronization data information configuration module, the synchronization interface corresponding to the full synchronization data information, and to import the data source into the index warehouse.
7. The data synchronization system based on a search engine according to claim 6, characterized in that the synchronization data information configuration module is further configured to: when the data source has corresponding incremental data, configure, in the form of a second table, the time interval at which the incremental data is synchronized;
the synchronization data import module is further configured to: call the synchronization interface, and import the incremental data and the time interval into the index warehouse;
the synchronization data information configuration module is further configured to: record in the second table the moment at which the incremental data is imported into the index warehouse;
the synchronization data import module is further configured to: take that moment as the starting point and, after waiting for the time interval, synchronize the incremental data.
8. The data synchronization system based on a search engine according to claim 7, characterized in that the index field creation module is specifically configured to:
create an index warehouse according to the service; import a configuration file into the index warehouse; configure query information in the configuration file, the query information comprising a data source table and a data source unique code; and create index fields according to the data source table, each index field comprising an index field name and the field type corresponding to that field name.
9. The data synchronization system based on a search engine according to claim 8, characterized in that, when the first table is configured with the full synchronization data information corresponding to multiple data sources, the synchronization data import module is configured to: call, in the order of the data source unique codes, the synchronization interface corresponding to the full synchronization data information for each data source in turn, and import the multiple data sources into the index warehouse; or,
when the first table is configured with the full synchronization data information corresponding to multiple data sources and an instruction to synchronize only one data source is received, the synchronization data import module is configured to: call, according to the data source unique code of that data source, the synchronization interface corresponding to the full synchronization data information, and import that data source into the index warehouse.
10. The data synchronization system based on a search engine according to claim 9, characterized in that, when one or more data sources need to be re-synchronized, the synchronization data import module is further configured to: call, according to the data source unique codes of the data sources to be re-synchronized, the synchronization interface corresponding to the full synchronization data information, and import the data sources to be re-synchronized into the index warehouse.
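For illustration only, and forming no part of the claims, the incremental timing of steps 4 to 7 (configure an interval, record the moment of import, wait one interval from that moment, then synchronize again) can be sketched in Java as follows. The class `IncrementalSyncRow` and its methods are hypothetical names, and one row of the second table is assumed to hold the interval and the last import moment.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical row of the "second table": sync interval plus last import moment. */
final class IncrementalSyncRow {
    private final Duration interval;
    private Instant lastImportedAt; // null until the first import is recorded

    IncrementalSyncRow(Duration interval) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("Interval must be positive");
        }
        this.interval = interval;
    }

    /** Step 6: record the moment the incremental data entered the index warehouse. */
    void recordImport(Instant importedAt) {
        this.lastImportedAt = importedAt;
    }

    /** Step 7: the next synchronization is one interval after the recorded moment. */
    Instant nextSyncAt() {
        if (lastImportedAt == null) {
            throw new IllegalStateException("No import recorded yet");
        }
        return lastImportedAt.plus(interval);
    }

    /** True once the interval has fully elapsed since the recorded moment. */
    boolean isDue(Instant now) {
        return lastImportedAt != null && !now.isBefore(nextSyncAt());
    }
}
```

A caller would check `isDue` with the current time, run the incremental import when it returns true, and then call `recordImport` again so that the next wait starts from the new moment. Passing the clock value in as a parameter is a design choice of this sketch that keeps the timing logic testable.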
CN201710254007.XA 2017-04-18 2017-04-18 A kind of method of data synchronization and system based on search engine Pending CN107103067A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710254007.XA CN107103067A (en) 2017-04-18 2017-04-18 A kind of method of data synchronization and system based on search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710254007.XA CN107103067A (en) 2017-04-18 2017-04-18 A kind of method of data synchronization and system based on search engine

Publications (1)

Publication Number Publication Date
CN107103067A true CN107103067A (en) 2017-08-29

Family

ID=59657051

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710254007.XA Pending CN107103067A (en) 2017-04-18 2017-04-18 A kind of method of data synchronization and system based on search engine

Country Status (1)

Country Link
CN (1) CN107103067A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108845995A (en) * 2018-03-23 2018-11-20 腾讯科技(深圳)有限公司 Data processing method, device, storage medium and electronic device
CN109739887A (en) * 2018-12-21 2019-05-10 平安科技(深圳)有限公司 Synchronous task searching method, system, device and readable storage medium storing program for executing
CN110147362A (en) * 2019-04-04 2019-08-20 中电科大数据研究院有限公司 One kind is based on the acquisition of event driven DOC DATA and processing system and its method
CN110245134A (en) * 2019-04-26 2019-09-17 石化盈科信息技术有限责任公司 A kind of increment synchronization method applied to search service
CN110263028A (en) * 2019-04-26 2019-09-20 石化盈科信息技术有限责任公司 A kind of full dose synchronous method applied to search service
CN111865576A (en) * 2020-07-03 2020-10-30 北京天空卫士网络安全技术有限公司 Method and device for synchronizing URL classification data
CN112507200A (en) * 2020-12-28 2021-03-16 浪潮云信息技术股份公司 Method and apparatus for synchronizing data into search engine
CN113378022A (en) * 2020-03-10 2021-09-10 北京搜狗科技发展有限公司 In-station search platform, search method and related device
CN115098648A (en) * 2022-08-25 2022-09-23 歌尔股份有限公司 Enterprise data searching method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140164325A1 (en) * 2012-12-07 2014-06-12 Institute For Information Industry Data synchronization system and method for synchronizing data
CN105930493A (en) * 2016-05-04 2016-09-07 北京思特奇信息技术股份有限公司 Method and system for data synchronization between different databases
CN106095911A (en) * 2016-06-07 2016-11-09 腾讯科技(深圳)有限公司 Search system and method for data synchronization
CN106469158A (en) * 2015-08-17 2017-03-01 杭州海康威视系统技术有限公司 Method of data synchronization and device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140164325A1 (en) * 2012-12-07 2014-06-12 Institute For Information Industry Data synchronization system and method for synchronizing data
CN106469158A (en) * 2015-08-17 2017-03-01 杭州海康威视系统技术有限公司 Method of data synchronization and device
CN105930493A (en) * 2016-05-04 2016-09-07 北京思特奇信息技术股份有限公司 Method and system for data synchronization between different databases
CN106095911A (en) * 2016-06-07 2016-11-09 腾讯科技(深圳)有限公司 Search system and method for data synchronization

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108845995B (en) * 2018-03-23 2020-08-18 腾讯科技(深圳)有限公司 Data processing method, data processing apparatus, storage medium, and electronic apparatus
CN108845995A (en) * 2018-03-23 2018-11-20 腾讯科技(深圳)有限公司 Data processing method, device, storage medium and electronic device
CN109739887A (en) * 2018-12-21 2019-05-10 平安科技(深圳)有限公司 Synchronous task searching method, system, device and readable storage medium storing program for executing
CN110147362A (en) * 2019-04-04 2019-08-20 中电科大数据研究院有限公司 One kind is based on the acquisition of event driven DOC DATA and processing system and its method
CN110245134B (en) * 2019-04-26 2021-07-06 石化盈科信息技术有限责任公司 Increment synchronization method applied to search service
CN110263028A (en) * 2019-04-26 2019-09-20 石化盈科信息技术有限责任公司 A kind of full dose synchronous method applied to search service
CN110263028B (en) * 2019-04-26 2021-06-15 石化盈科信息技术有限责任公司 Full-scale synchronization method applied to search service
CN110245134A (en) * 2019-04-26 2019-09-17 石化盈科信息技术有限责任公司 A kind of increment synchronization method applied to search service
CN113378022A (en) * 2020-03-10 2021-09-10 北京搜狗科技发展有限公司 In-station search platform, search method and related device
CN111865576A (en) * 2020-07-03 2020-10-30 北京天空卫士网络安全技术有限公司 Method and device for synchronizing URL classification data
CN111865576B (en) * 2020-07-03 2023-02-28 北京天空卫士网络安全技术有限公司 Method and device for synchronizing URL classification data
CN112507200A (en) * 2020-12-28 2021-03-16 浪潮云信息技术股份公司 Method and apparatus for synchronizing data into search engine
CN115098648A (en) * 2022-08-25 2022-09-23 歌尔股份有限公司 Enterprise data searching method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN107103067A (en) A kind of method of data synchronization and system based on search engine
CN103902653B (en) A kind of method and apparatus for building data warehouse table genetic connection figure
Bast et al. Open information extraction via contextual sentence decomposition
USRE48030E1 (en) Computer-implemented system and method for tagged and rectangular data processing
US10496624B2 (en) Index key generating device, index key generating method, and search method
CN107608949A (en) A kind of Text Information Extraction method and device based on semantic model
TW201901661A (en) Speech recognition method and system
EP1225516A1 (en) Storing data of an XML-document in a relational database
CN103186639B (en) Data creation method and system
CN100444591C (en) Method for acquiring front-page keyword and its application system
CN103177120B (en) A kind of XPath query pattern tree matching method based on index
US9110852B1 (en) Methods and systems for extracting information from text
CN110781183B (en) Processing method and device for incremental data in Hive database and computer equipment
CN102737049A (en) Method and system for database query
CN105550359B (en) Webpage sorting method and device based on vertical search and server
CN101520770A (en) Method and device for analyzing, converting and splitting structured data
CN110532358A (en) A kind of template automatic generation method towards knowledge base question and answer
CN110909168A (en) Knowledge graph updating method and device, storage medium and electronic device
CN108829651A (en) A kind of method, apparatus of document treatment, terminal device and storage medium
CN111198898A (en) Big data query method and big data query device
CN104021216B (en) Message proxy server and information publish subscription method and system
CN104572736A (en) Keyword extraction method and device based on social networking services
CN110119404B (en) Intelligent access system and method based on natural language understanding
WO2018226255A1 (en) Functional equivalence of tuples and edges in graph databases
CN106407288B (en) Method and system for synchronously updating information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170829