CN111563142A - SQL automatic benchmarking matching method and device - Google Patents


Info

Publication number
CN111563142A
Authority
CN
China
Prior art keywords
name
standard
matching
synonym
sql
Prior art date
Legal status
Pending
Application number
CN202010674260.2A
Other languages
Chinese (zh)
Inventor
张艳清
查文宇
周宇
刘俊良
金日海
王怡君
Current Assignee
Chengdu Sefon Software Co Ltd
Original Assignee
Chengdu Sefon Software Co Ltd
Priority date
2020-07-14
Filing date
2020-07-14
Publication date
2020-08-21
Application filed by Chengdu Sefon Software Co Ltd
Priority to CN202010674260.2A
Publication of CN111563142A
Pending legal status (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/33 Querying
                • G06F16/3331 Query processing
                  • G06F16/3332 Query translation
                    • G06F16/3334 Selection or weighting of terms from queries, including natural language queries
                  • G06F16/334 Query execution
                    • G06F16/3344 Query execution using natural language analysis
          • G06F40/00 Handling natural language data
            • G06F40/20 Natural language analysis
              • G06F40/237 Lexical tools
                • G06F40/247 Thesauruses; Synonyms
              • G06F40/279 Recognition of textual entities
                • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an SQL automatic benchmarking matching method and device. They relate mainly to data processing and data management, and are used to map the metadata items collected during data management onto the corresponding standards.

Description

SQL automatic benchmarking matching method and device
Technical Field
The invention relates to the field of data management, and in particular to a method and a device for automatic SQL benchmarking matching.
Background
Structured Query Language (SQL) is a special-purpose programming language for accessing data and for querying, updating and managing relational database systems. It is a high-level, non-procedural language that lets users work with high-level data structures. Users need not specify, or even know, how the data is physically stored, so database systems with completely different underlying structures can share SQL as a common interface for data input and management. SQL statements can be nested, which gives the language great flexibility and expressive power. SQL can carry out every activity in the database life cycle on its own, including defining relational schemas, entering data, creating, querying, updating, maintaining and restructuring the database, and controlling database security. This provides a good environment for developing database applications: after the database goes into operation, the schema can be modified step by step as needed without disrupting operation, which gives the system good extensibility.
The existing SQL benchmarking scheme works as follows. First, look up a standard whose Chinese and English names are both identical to those of the data item; if one exists, associate the data item with it. Second, if no standard matches on both names, match on the English name alone and associate the data item on a hit. Third, if both steps fail, the associated standard must be specified manually.
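To make the limitation concrete, here is a minimal Python sketch of the prior-art lookup as described above. The table layout (a hypothetical standard code mapped to a Chinese name and an English name) and the function name `legacy_match` are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical standards table: standard code -> (Chinese name, English name).
STANDARDS = {"STD001": ("客户名称", "customer_name")}


def legacy_match(cn_name: str, en_name: str, standards: dict) -> Optional[str]:
    """Prior-art style lookup: Chinese + English names first, then English only."""
    # Step 1: both names must equal a standard's names.
    for code, (std_cn, std_en) in standards.items():
        if std_cn == cn_name and std_en == en_name:
            return code
    # Step 2: fall back to the English name alone.
    for code, (_, std_en) in standards.items():
        if std_en == en_name:
            return code
    # Step 3: nothing matched, so a person must specify the standard.
    return None


print(legacy_match("客户名称", "customer_name", STANDARDS))  # STD001
print(legacy_match("客户名", "cust_name", STANDARDS))        # None
```

The second call shows the gap the invention targets: 客户名/cust_name means the same thing as the stored standard, but neither name is identical, so the item falls through to manual work.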
This approach has a low match rate: it matches only when names are identical, so synonyms are never matched, and every data item that cannot be matched by name requires manual work, which makes the process very tedious.
Disclosure of Invention
Purpose of the invention: to provide an SQL automatic benchmarking matching method and device that overcome the problems of the existing scheme, namely its low match rate, its reliance on identical names, its inability to match synonyms, and the large amount of tedious manual work required whenever a name-based match fails.
The technical scheme adopted by the invention is as follows:
An SQL automatic benchmarking matching method comprises the following steps:
S1, read the first name of the data item and query whether a standard with the same name exists; if so, associate the data item with that standard (benchmarking); if not, go to step S2;
S2, parse the first name into a second name, which is its Chinese part, and a third name, which is its non-Chinese part;
S3, query whether a standard has the same name as either the second name or the third name; if so, associate and benchmark; if not, go to step S4;
S4, query whether a standard has the same name as any synonym of the first name in the synonym library; if so, associate and benchmark; if not, go to step S5;
S5, segment the first name into words and query whether a standard has the same name as any of the words; if so, associate and benchmark; if not, go to step S6;
and S6, match manually according to the first name to complete the association benchmarking of the data item.
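The S1 to S6 chain can be sketched in Python as a cascade of lookups that stops at the first hit. This is an illustrative sketch, not the applicant's implementation: treating "Chinese" as the CJK Unified Ideographs range U+4E00 to U+9FFF, modelling the standards as a set of names, and passing S4/S5 in as pluggable candidate generators (`later_stages`) are all assumptions. A return value of `None` stands for hand-off to manual matching in S6.

```python
import re
from typing import Callable, Iterable, Optional, Sequence

# Assumption: "Chinese" means CJK Unified Ideographs (U+4E00 to U+9FFF).
_CJK = re.compile(r"[\u4e00-\u9fff]+")

# A later stage (S4, S5, ...) maps (second_name, third_name) to candidate names.
Stage = Callable[[str, str], Iterable[str]]


def split_name(first_name: str) -> tuple[str, str]:
    """S2: return (second name, third name) = (Chinese part, non-Chinese part)."""
    second = "".join(_CJK.findall(first_name))
    third = _CJK.sub("", first_name).strip(" _-")
    return second, third


def auto_benchmark(first_name: str, standards: set[str],
                   later_stages: Sequence[Stage] = ()) -> Optional[str]:
    """Return the matched standard name, or None to hand the item to S6."""
    # S1: the full first name equals a standard name.
    if first_name in standards:
        return first_name
    # S2 + S3: either part on its own equals a standard name.
    second, third = split_name(first_name)
    for name in (second, third):
        if name and name in standards:
            return name
    # S4, S5: try each later stage's candidates in order.
    for stage in later_stages:
        for name in stage(second, third):
            if name in standards:
                return name
    # S6: no automatic match; manual matching is required.
    return None


STANDARD_NAMES = {"客户名称", "customer_name"}
toy_synonyms = {"cust_name": ["customer_name"]}  # hypothetical S4 data

print(auto_benchmark("客户名称cust_name", STANDARD_NAMES))  # 客户名称
print(auto_benchmark("cust_name", STANDARD_NAMES))          # None
print(auto_benchmark("cust_name", STANDARD_NAMES,
                     [lambda s, t: toy_synonyms.get(t, [])]))  # customer_name
```

Keeping S4 and S5 as plug-ins mirrors the idea that individual algorithms in the chain can be upgraded later without touching the exact-match steps.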
With this scheme, upgrading the algorithms in the chain (synonyms, word segmentation, association and so on) can further raise the overall match rate over time, increase the share of data items benchmarked automatically and reduce manual workload. This solves the problems of the existing scheme: a low match rate, matches only on identical names, no matching of synonyms, and a large amount of tedious manual work whenever a name-based match fails.
Further, querying in step S4 whether a standard has the same name as a synonym of the first name comprises the following steps:
S401, look up in the synonym library the synonyms of the second name as fourth names, and the synonyms of the third name as fifth names;
S402, query whether a standard has the same name as any of the fourth or fifth names.
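A minimal sketch of S401 and S402, assuming the synonym library can be queried as a plain dict from a name to its list of synonyms; the function names `synonym_candidates` and `match_by_synonym` are hypothetical.

```python
from typing import Optional


def synonym_candidates(second: str, third: str,
                       synonyms: dict[str, list[str]]) -> list[str]:
    """S401: fourth names (synonyms of second) followed by fifth names (of third)."""
    fourth = list(synonyms.get(second, ()))
    fifth = list(synonyms.get(third, ()))
    return fourth + fifth


def match_by_synonym(second: str, third: str,
                     synonyms: dict[str, list[str]],
                     standards: set[str]) -> Optional[str]:
    """S402: first fourth/fifth name that is also a standard name, else None."""
    for name in synonym_candidates(second, third, synonyms):
        if name in standards:
            return name
    return None


SYNONYMS = {"顾客名称": ["客户名称"], "cust_nm": ["customer_name"]}
STANDARD_NAMES = {"客户名称", "customer_name"}

print(match_by_synonym("顾客名称", "cust_nm", SYNONYMS, STANDARD_NAMES))  # 客户名称
print(match_by_synonym("订单号", "order_no", SYNONYMS, STANDARD_NAMES))   # None
```

Fourth names are tried before fifth names here; the patent only requires that "any" of them match, so the order is a design choice.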
Further, the synonym library is built as follows:
S403, set up a key-value (KV) database and use the name of each standard as a Key;
S404, load an existing synonym database and write the synonyms of each Key into the KV engine as that Key's value;
S405, after a data item has been matched manually by its first name in step S6, add that first name to the value stored under the Key of the matched standard.
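The library-building steps can be sketched with an in-memory dict standing in for the KV engine. The class name `SynonymLibrary`, its methods, and the choice to skip synonyms whose Key is not a known standard during loading are assumptions; a real deployment would use whatever KV store the system already runs.

```python
class SynonymLibrary:
    """Dict stand-in for the KV store: Key = standard name, value = synonym set."""

    def __init__(self, standard_names):
        # S403: one Key per standard name, each starting with no synonyms.
        self._kv = {name: set() for name in standard_names}

    def load(self, existing):
        # S404: copy known synonyms under their standard's Key.
        # Assumption: entries whose Key is not a known standard are skipped.
        for key, words in existing.items():
            if key in self._kv:
                self._kv[key].update(words)

    def record_manual_match(self, first_name, standard_name):
        # S405: after a manual match in S6, remember the first name as a
        # synonym so the same name is matched automatically next time.
        self._kv.setdefault(standard_name, set()).add(first_name)

    def synonyms_of(self, key):
        """Return a copy of the synonym set stored under `key`."""
        return set(self._kv.get(key, ()))


lib = SynonymLibrary(["客户名称", "证件号码"])
lib.load({"客户名称": ["顾客名称"], "未知标准": ["whatever"]})
lib.record_manual_match("客户名cust", "客户名称")
print(sorted(lib.synonyms_of("客户名称")))
```

Because S405 writes every manual decision back into the store, the library grows incrementally instead of being rebuilt, which is what lets repeated manual work disappear.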
Further, querying in step S5 whether a standard has the same name as a word obtained by segmenting the first name comprises the following steps:
S501, segment the second name into words to obtain a second name group, and segment the third name to obtain a third name group;
S502, traverse every word in the second and third name groups and query whether a standard has the same name as that word.
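The patent does not name a word segmenter. As a stand-in, this sketch uses forward maximum matching against the standard names for the Chinese part, and splits the non-Chinese part on `_`, `-`, whitespace and camelCase boundaries, lower-casing the pieces (which assumes English standard names are lower-case). All function names are hypothetical. Following S502, it returns every token that names a standard rather than stopping at the first.

```python
import re


def fmm_segment(text: str, vocabulary: set[str], max_len: int = 6) -> list[str]:
    """Forward maximum matching: at each position take the longest known word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Fall back to a single character when no longer word is known.
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def split_identifier(text: str) -> list[str]:
    """Split on _, -, whitespace and camelCase boundaries; lower-case the parts."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [part.lower() for part in re.split(r"[\s_\-]+", spaced) if part]


def segment_hits(second: str, third: str, standards: set[str]) -> list[str]:
    """S501 + S502: segment both names, keep every token that is a standard name."""
    second_group = fmm_segment(second, standards)  # S501, Chinese part
    third_group = split_identifier(third)          # S501, non-Chinese part
    return [tok for tok in second_group + third_group if tok in standards]  # S502


STANDARD_NAMES = {"客户", "证件号码", "mobile"}
print(segment_hits("客户证件号码", "custMobile", STANDARD_NAMES))
# ['客户', '证件号码', 'mobile']
```

A production system would more likely use a trained Chinese segmenter with a custom dictionary; forward maximum matching is shown only because it is short and dependency-free.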
Further, step S5 also comprises:
S503, perform association on the second name to obtain a sixth name, and on the third name to obtain a seventh name;
S504, query whether a standard has the same name as either the sixth name or the seventh name; if so, associate and benchmark; if not, go to step S6.
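The patent does not define what "association" of a name involves. One plausible reading is fuzzy expansion to similar standard names; the sketch below uses `difflib.get_close_matches` from the Python standard library for that purpose. The similarity cutoff of 0.6 (difflib's default), the limit of three candidates per name, and the function names are assumptions.

```python
import difflib
from typing import Optional


def associate(name: str, standards: set[str], n: int = 3,
              cutoff: float = 0.6) -> list[str]:
    """Hypothetical 'association': up to n standard names similar to `name`."""
    if not name:
        return []
    # Results are ordered by similarity score, best first.
    return difflib.get_close_matches(name, list(standards), n=n, cutoff=cutoff)


def match_by_association(second: str, third: str,
                         standards: set[str]) -> Optional[str]:
    """S503 + S504: sixth names from the second name, seventh from the third."""
    sixth = associate(second, standards)   # S503
    seventh = associate(third, standards)  # S503
    # S504: the membership check is redundant here, since candidates come from
    # the standards themselves, but it is kept to mirror the step.
    for name in sixth + seventh:
        if name in standards:
            return name
    return None  # no hit: fall through to manual matching (S6)


STANDARD_NAMES = {"客户名称", "customer_name", "订单编号"}
print(match_by_association("客户名", "cust_nm", STANDARD_NAMES))  # 客户名称
```

A fuzzy step like this can produce false positives, so in practice its hits might be routed to a reviewer rather than accepted outright.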
An SQL automatic benchmarking matching device comprises:
a memory, for storing executable instructions;
a processor, for executing the executable instructions stored in the memory to implement the SQL automatic benchmarking matching method.
In summary, by adopting the above technical solution, the invention has the following beneficial effects:
1. The SQL automatic benchmarking matching method and device solve the problems of the existing scheme: a low match rate, matches only when names are identical, no matching of synonyms, and a large amount of tedious manual work whenever a name-based match fails;
2. The synonym library is built incrementally and updated in real time, which avoids repeated manual work, raises the automatic benchmarking match rate and reduces manual workload.
Drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort. In the drawings:
FIG. 1 is a schematic flow diagram of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to FIG. 1. The described embodiments should not be construed as limiting the present invention; all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before the embodiments of the present invention are described in further detail, the terms and expressions used in them are explained; they are to be interpreted as follows.
Standard alignment (benchmarking): mapping the data item information in the collected metadata onto existing standards to facilitate subsequent processing.
Example 1
As shown in FIG. 1, an SQL automatic benchmarking matching method comprises the following steps:
S1, read the first name of the data item and query whether a standard with the same name exists; if so, associate the data item with that standard (benchmarking); if not, go to step S2;
S2, parse the first name into a second name, which is its Chinese part, and a third name, which is its non-Chinese part;
S3, query whether a standard has the same name as either the second name or the third name; if so, associate and benchmark; if not, go to step S4;
S4, query whether a standard has the same name as any synonym of the first name in the synonym library; if so, associate and benchmark; if not, go to step S5;
S5, segment the first name into words and query whether a standard has the same name as any of the words; if so, associate and benchmark; if not, go to step S6;
and S6, match manually according to the first name to complete the association benchmarking of the data item.
With this scheme, upgrading the algorithms in the chain (synonyms, word segmentation, association and so on) can further raise the overall match rate over time, increase the share of data items benchmarked automatically and reduce manual workload. This solves the problems of the existing scheme: a low match rate, matches only on identical names, no matching of synonyms, and a large amount of tedious manual work whenever a name-based match fails.
Example 2
In this embodiment, based on Example 1, querying in step S4 whether a standard has the same name as a synonym of the first name comprises the following steps:
S401, look up in the synonym library the synonyms of the second name as fourth names, and the synonyms of the third name as fifth names;
S402, query whether a standard has the same name as any of the fourth or fifth names.
Example 3
In this embodiment, based on Example 1, the synonym library is built as follows:
S403, set up a key-value (KV) database and use the name of each standard as a Key;
S404, load an existing synonym database and write the synonyms of each Key into the KV engine as that Key's value;
S405, after a data item has been matched manually by its first name in step S6, add that first name to the value stored under the Key of the matched standard.
When a synonym query is performed on the second name, the KV database is searched. If a stored word identical to the second name is found, the Key for that word is read and used in place of the second name as its synonym, which completes the synonym query.
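The lookup just described scans the stored synonyms for the second name and returns the Key that owns it. A minimal sketch follows, assuming the KV data is a dict from standard name to synonym set. The inverted index is an optional optimisation not specified in the patent: it answers the same query with one dictionary access instead of a full scan.

```python
from typing import Optional


def lookup_by_scan(kv: dict[str, set[str]], second_name: str) -> Optional[str]:
    """Scan every Key's synonyms; return the first Key containing second_name."""
    for key, words in kv.items():
        if second_name in words:
            return key
    return None


def build_inverted_index(kv: dict[str, set[str]]) -> dict[str, str]:
    """Map each synonym directly to its Key (first Key wins, as in the scan)."""
    index: dict[str, str] = {}
    for key, words in kv.items():
        for word in words:
            index.setdefault(word, key)
    return index


KV = {"客户名称": {"顾客名称", "客户名"}, "证件号码": {"身份证号"}}
INDEX = build_inverted_index(KV)

for word in ["顾客名称", "身份证号", "订单号"]:
    print(word, lookup_by_scan(KV, word), INDEX.get(word))
```

If the index is used, it must be updated together with the KV store whenever a manual match adds a synonym, otherwise the two lookups drift apart.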
In addition, the scheme uses incremental modeling: after each manual match, the result is recorded and the manually matched first name is added to the corresponding KV entry. This avoids repeated manual work, raises the automatic benchmarking match rate and reduces manual workload.
Example 4
In this embodiment, based on Example 1, querying in step S5 whether a standard has the same name as a word obtained by segmenting the first name comprises the following steps:
S501, segment the second name into words to obtain a second name group, and segment the third name to obtain a third name group;
S502, traverse every word in the second and third name groups and query whether a standard has the same name as that word.
Example 5
This embodiment is further based on Example 1, and step S5 also comprises:
S503, perform association on the second name to obtain a sixth name, and on the third name to obtain a seventh name;
S504, query whether a standard has the same name as either the sixth name or the seventh name; if so, associate and benchmark; if not, go to step S6.
Example 6
An SQL automatic benchmarking matching device comprises:
a memory, for storing executable instructions;
a processor, for executing the executable instructions stored in the memory to implement the SQL automatic benchmarking matching method.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (6)

1. An SQL automatic benchmarking matching method, characterized in that it comprises the following steps:
S1, read the first name of the data item and query whether a standard with the same name exists; if so, associate the data item with that standard (benchmarking); if not, go to step S2;
S2, parse the first name into a second name, which is its Chinese part, and a third name, which is its non-Chinese part;
S3, query whether a standard has the same name as either the second name or the third name; if so, associate and benchmark; if not, go to step S4;
S4, query whether a standard has the same name as any synonym of the first name in the synonym library; if so, associate and benchmark; if not, go to step S5;
S5, segment the first name into words and query whether a standard has the same name as any of the words; if so, associate and benchmark; if not, go to step S6;
and S6, match manually according to the first name to complete the association benchmarking of the data item.
2. The SQL automatic benchmarking matching method according to claim 1, characterized in that querying in step S4 whether a standard has the same name as a synonym of the first name comprises the following steps:
S401, look up in the synonym library the synonyms of the second name as fourth names, and the synonyms of the third name as fifth names;
S402, query whether a standard has the same name as any of the fourth or fifth names.
3. The SQL automatic benchmarking matching method according to claim 1, characterized in that the synonym library is built by the following steps:
S403, set up a key-value (KV) database and use the name of each standard as a Key;
S404, load an existing synonym database and write the synonyms of each Key into the KV engine as that Key's value;
S405, after a data item has been matched manually by its first name in step S6, add that first name to the value stored under the Key of the matched standard.
4. The SQL automatic benchmarking matching method according to claim 1, characterized in that querying in step S5 whether a standard has the same name as a word obtained by segmenting the first name comprises the following steps:
S501, segment the second name into words to obtain a second name group, and segment the third name to obtain a third name group;
S502, traverse every word in the second and third name groups and query whether a standard has the same name as that word.
5. The SQL automatic benchmarking matching method according to claim 1, characterized in that step S5 further comprises:
S503, perform association on the second name to obtain a sixth name, and on the third name to obtain a seventh name;
S504, query whether a standard has the same name as either the sixth name or the seventh name; if so, associate and benchmark; if not, go to step S6.
6. An SQL automatic benchmarking matching device, characterized in that it comprises:
a memory, for storing executable instructions;
a processor, for executing the executable instructions stored in the memory to implement the SQL automatic benchmarking matching method according to claim 1.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010674260.2A CN111563142A (en) 2020-07-14 2020-07-14 SQL automatic benchmarking matching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010674260.2A CN111563142A (en) 2020-07-14 2020-07-14 SQL automatic benchmarking matching method and device

Publications (1)

Publication Number Publication Date
CN111563142A (en) 2020-08-21

Family

ID=72075443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010674260.2A Pending CN111563142A (en) 2020-07-14 2020-07-14 SQL automatic benchmarking matching method and device

Country Status (1)

Country Link
CN (1) CN111563142A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101788992A (en) * 2009-05-06 2010-07-28 厦门东南融通系统工程有限公司 Method and system for converting query sentence of database
CN108846101A (en) * 2018-06-19 2018-11-20 艾普阳科技(深圳)有限公司 A kind of method and apparatus automatically generating SQL statement
CN108922633A (en) * 2018-06-22 2018-11-30 北京海德康健信息科技有限公司 A kind of disease name standard convention method and canonical system
CN109408561A (en) * 2018-10-17 2019-03-01 杭州骑轻尘信息技术有限公司 Business Name matching process and device
CN109542453A (en) * 2018-11-20 2019-03-29 北京千丁互联科技有限公司 Database information recognition methods, device and terminal
CN109726217A (en) * 2019-01-10 2019-05-07 北京字节跳动网络技术有限公司 A kind of database operation method, device, equipment and storage medium
CN110188568A (en) * 2019-05-27 2019-08-30 深圳前海微众银行股份有限公司 Confidential information identification method, device, equipment and computer readable storage medium
US20190347271A1 (en) * 2016-11-15 2019-11-14 Spirent Communications, Inc. SQL Interceptor For Use With Third Party Data Analytics Packages
CN110837492A (en) * 2019-11-15 2020-02-25 中科院计算技术研究所大数据研究院 Method for providing data service by multi-source data unified SQL

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200821