WO2019024231A1 - Automatic data matching method, electronic device and computer-readable storage medium - Google Patents


Info

Publication number
WO2019024231A1
Authority
WO
WIPO (PCT)
Prior art keywords
field
type
dynamic list
classification
special
Prior art date
Application number
PCT/CN2017/104820
Other languages
French (fr)
Chinese (zh)
Inventor
陈娴娴
李菲菲
徐亮
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2019024231A1

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Definitions

  • The present invention relates to the field of computer information technology, and in particular to an automatic data matching method, an electronic device, and a computer-readable storage medium.
  • Feature extraction is an important step in many data-mining prediction models. In the data preprocessing stage, it is important to map the extracted classification features onto the existing target classifications.
  • Directly matching uncleaned classification features against the target classifications yields a low matching success rate and extremely low accuracy, which cannot meet model requirements.
  • Moreover, the data volume has far exceeded what can be matched manually. The data matching algorithms of the prior art are therefore not well designed and need to be improved.
  • The present invention provides an automatic data matching method, an electronic device, and a computer-readable storage medium.
  • Structured splitting of special fields, together with field logical inclusion relationships, effectively improves the success rate and accuracy of matching classification features to target classifications.
  • The present invention provides an electronic device comprising a memory, a processor, and an automatic data matching program stored in the memory and executable on the processor. When executed by the processor, the automatic data matching program implements the following steps:
  • fields that were not matched successfully are matched with the target classification by means of a preset field logical inclusion relationship.
  • Normalizing, according to the preset dynamic list, the classification features obtained by the feature extraction operation comprises:
  • if the preset dynamic list is the first type dynamic list, extracting the first type special characters stored in the first type dynamic list, and deleting or replacing, according to the extracted first type special characters, characters in the classification features obtained by the feature extraction operation, to obtain normalized first type classification features;
  • if the preset dynamic list is the second type dynamic list, extracting the second type special characters stored in the second type dynamic list, and deleting or replacing, according to the extracted second type special characters, characters in the classification features obtained by the feature extraction operation, to obtain normalized second type classification features;
  • if the preset dynamic list is the third type dynamic list, extracting the third type special characters stored in the third type dynamic list, and deleting or replacing, according to the extracted third type special characters, characters in the classification features obtained by the feature extraction operation, to obtain normalized third type classification features.
  • Splitting the special field into several field segments comprises:
  • recording the position of the detachable character in the special field as a split point, and separately extracting the field segment before the split point and the field segment after the split point.
  • Splitting the special field into several field segments comprises:
  • if the preset dynamic list is the first type dynamic list, extracting, from the normalized first type classification features, a first type special field containing a detachable character, and splitting the first type special field into several field segments according to the position of the detachable character in it;
  • if the preset dynamic list is the second type dynamic list, extracting, from the normalized second type classification features, a second type special field containing a detachable character, and splitting the second type special field into several field segments according to the position of the detachable character in it;
  • if the preset dynamic list is the third type dynamic list, extracting, from the normalized third type classification features, a third type special field containing a detachable character, and splitting the third type special field into several field segments according to the position of the detachable character in it.
  • Matching the unsuccessfully matched fields with the target classification by means of the preset field logical inclusion relationship comprises:
  • calculating the semantic similarity value between an unsuccessfully matched field and the target classification; if the semantic similarity value is greater than the preset threshold, determining that the unsuccessfully matched field has a logical inclusion relationship with the target classification, and marking it as matching the target classification.
  • The preset dynamic list is adjusted dynamically according to changes in the data of the data source.
  • The target classification is preset rule data in an internal data platform.
  • The present invention further provides an automatic data matching method, applied to an electronic device, the method comprising:
  • fields that were not matched successfully are matched with the target classification by means of a preset field logical inclusion relationship.
  • Normalizing, according to the preset dynamic list, the classification features obtained by the feature extraction operation comprises:
  • if the preset dynamic list is the first type dynamic list, extracting the first type special characters stored in the first type dynamic list, and deleting or replacing, according to the extracted first type special characters, characters in the classification features obtained by the feature extraction operation, to obtain normalized first type classification features;
  • if the preset dynamic list is the second type dynamic list, extracting the second type special characters stored in the second type dynamic list, and deleting or replacing, according to the extracted second type special characters, characters in the classification features obtained by the feature extraction operation, to obtain normalized second type classification features;
  • if the preset dynamic list is the third type dynamic list, extracting the third type special characters stored in the third type dynamic list, and deleting or replacing, according to the extracted third type special characters, characters in the classification features obtained by the feature extraction operation, to obtain normalized third type classification features.
  • Splitting the special field into several field segments comprises:
  • recording the position of the detachable character in the special field as a split point, and separately extracting the field segment before the split point and the field segment after the split point.
  • Matching the unsuccessfully matched fields with the target classification by means of the preset field logical inclusion relationship comprises:
  • calculating the semantic similarity value between an unsuccessfully matched field and the target classification; if the semantic similarity value is greater than the preset threshold, determining that the unsuccessfully matched field has a logical inclusion relationship with the target classification, and marking it as matching the target classification.
  • The preset dynamic list is adjusted dynamically according to changes in the data of the data source.
  • The target classification is preset rule data in an internal data platform.
  • The present invention also provides a computer-readable storage medium storing an automatic data matching program that is executable by at least one processor to cause the at least one processor to perform the steps of the automatic data matching method described above.
  • Through structured splitting of special fields, the electronic device, automatic data matching method, and computer-readable storage medium proposed by the invention effectively improve the success rate and accuracy of matching classification features to target classifications. Furthermore, field logical inclusion relationships (or field semantic inclusion relationships) solve the problem of matching irregular or incomplete fields, further improving that success rate and accuracy.
  • FIG. 1 is a schematic diagram of an optional hardware architecture of the electronic device of the present invention;
  • FIG. 2 is a block diagram of an embodiment of the automatic data matching program in the electronic device of the present invention;
  • FIG. 3 is a flowchart of an embodiment of the automatic data matching method of the present invention.
  • The present invention proposes an electronic device 2.
  • Referring to FIG. 1, a schematic diagram of an optional hardware architecture of the electronic device 2 of the present invention is shown.
  • The electronic device 2 may include, but is not limited to, a memory 21, a processor 22, and a network interface 23 that communicate with one another over a system bus.
  • FIG. 1 shows only the electronic device 2 with components 21-23; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
  • The electronic device 2 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server.
  • The electronic device 2 may be a standalone server or a server cluster composed of multiple servers.
  • The memory 21 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like.
  • The memory 21 may be an internal storage unit of the electronic device 2, such as its hard disk or memory.
  • The memory 21 may also be an external storage device of the electronic device 2, such as a plug-in hard disk, a SmartMedia Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the electronic device 2.
  • The memory 21 may also include both an internal storage unit of the electronic device 2 and an external storage device.
  • The memory 21 is generally used to store the operating system and the various application software installed on the electronic device 2, such as the automatic data matching program 20. The memory 21 may also be used to temporarily store various data that has been output or is to be output.
  • In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • The processor 22 is generally used to control the overall operation of the electronic device 2, for example performing control and processing related to data exchange or communication with the electronic device 2.
  • The processor 22 is configured to run program code stored in the memory 21 or to process data, for example to run the automatic data matching program 20.
  • The network interface 23 may include a wireless or wired network interface and is generally used to establish a communication connection between the electronic device 2 and other electronic devices.
  • The network interface 23 is configured to connect the electronic device 2 to an external data platform through a network, establishing a data transmission channel and a communication connection between them.
  • The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, or Wi-Fi.
  • FIG. 2 is a block diagram of an embodiment of the automatic data matching program 20 in the electronic device 2 of the present invention.
  • The automatic data matching program 20 may be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (in this embodiment, the processor 22) to implement the present invention.
  • The automatic data matching program 20 can be divided into an obtaining module 201, a processing module 202, a first matching module 203, and a second matching module 204.
  • A program module, as used herein, refers to a series of computer program instructions capable of performing a particular function. The functions of program modules 201-204 are described in detail below.
  • The obtaining module 201 is configured to acquire the classification features obtained by the feature extraction operation.
  • The feature extraction operation is a preprocessing step of various data-mining prediction models.
  • The classification features include, but are not limited to, text data such as drug names, diagnostic information, doctor's order information, medical equipment, surgery types, and family history.
  • The processing module 202 is configured to normalize, according to a preset dynamic list, the classification features obtained by the feature extraction operation, yielding normalized classification features.
  • The preset dynamic lists include dynamic lists corresponding to different types of data sources, for example a dynamic list corresponding to a first type of data source (such as the MS SQL Server data source; hereinafter the "first type dynamic list"), a dynamic list corresponding to a second type of data source (such as the Oracle data source; hereinafter the "second type dynamic list"), and a dynamic list corresponding to a third type of data source (such as the MySQL data source; hereinafter the "third type dynamic list").
  • The number of dynamic lists may also be increased or decreased according to the number of data source types.
  • The dynamic lists corresponding to different types of data sources store different special characters, which are used to normalize the classification features from those data sources.
  • The first type dynamic list stores first type special characters for normalizing classification features from the first type data source; the second type dynamic list stores second type special characters for the second type data source; and the third type dynamic list stores third type special characters for the third type data source.
  • The preset dynamic lists are adjusted dynamically as the data in the data sources change, for example by adding new special characters.
  • The first type dynamic list is adjusted according to data changes in the first type data source, the second type dynamic list according to data changes in the second type data source, and the third type dynamic list according to data changes in the third type data source.
  • Normalizing, according to the preset dynamic list, the classification features obtained by the feature extraction operation comprises: extracting the special characters stored in the preset dynamic list, and normalizing the classification features according to the extracted special characters, for example by deleting or replacing them.
  • If the preset dynamic list is the first type dynamic list, the first type special characters it stores (such as "/") are extracted, and the classification features obtained by the feature extraction operation are normalized by deleting or replacing those characters, yielding normalized first type classification features.
  • If the preset dynamic list is the second type dynamic list, the second type special characters it stores are extracted, and the classification features obtained by the feature extraction operation are normalized by deleting or replacing those characters, yielding normalized second type classification features.
  • If the preset dynamic list is the third type dynamic list, the third type special characters it stores are extracted, and the classification features obtained by the feature extraction operation are normalized by deleting or replacing those characters, yielding normalized third type classification features.
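The per-source normalization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source-type keys (`mssql`, `oracle`, `mysql`), the `DYNAMIC_LISTS` table, and the particular special characters and replacements are all hypothetical. Each dynamic list maps a special character to its replacement, with an empty string meaning deletion, and the hypothetical `normalize` function applies the list for the feature's source type using Python's built-in `str.maketrans` and `str.translate`.

```python
# Minimal sketch of dynamic-list normalization (processing module 202 / step S32).
# ASSUMPTION: the source-type keys and the characters below are hypothetical
# examples, not the lists used in the patent.

# Each dynamic list maps a special character to its replacement;
# an empty string means the character is deleted.
DYNAMIC_LISTS: dict[str, dict[str, str]] = {
    "mssql": {"*": "", "#": ""},               # first type (MS SQL Server)
    "oracle": {"\uff08": "(", "\uff09": ")"},  # second type (Oracle): full-width parentheses
    "mysql": {"\u3000": " ", "\uff1b": ";"},   # third type (MySQL): ideographic space, full-width semicolon
}


def normalize(feature: str, source_type: str) -> str:
    """Delete or replace every special character listed for this source type."""
    rules = DYNAMIC_LISTS[source_type]
    # str.maketrans maps a character to None to delete it, or to a string to replace it.
    table = str.maketrans({ch: (rep or None) for ch, rep in rules.items()})
    return feature.translate(table).strip()


if __name__ == "__main__":
    print(normalize("Aspirin*#", "mssql"))
    print(normalize("Aspirin\uff08100mg\uff09", "oracle"))
    print(normalize("Aspirin\u3000tablets", "mysql"))
```

Keeping each source's rules in a plain mapping makes a list easy to extend at run time, which fits the requirement that the lists change along with their data sources.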
  • The first matching module 203 is configured to extract, from the normalized classification features, special fields that contain a detachable character, split each special field into several field segments according to the position of the detachable character in the special field, and match the split field segments with the target classification.
  • The target classification may be preset rule data in an internal data platform (such as a Hadoop data platform).
  • Splitting the special field into several field segments comprises: recording the position of the detachable character in the special field as a split point, and separately extracting the field segment before the split point and the field segment after the split point.
  • For example, if the normalized classification features include the special field "a+b" or "a//b", where "+" and "//" are detachable characters, the special field is split into the field segments "a" and "b", and each of "a" and "b" is matched with the target classification.
  • The structured splitting of special fields performed by the first matching module 203 effectively improves the success rate and accuracy of matching classification features to target classifications.
  • If the preset dynamic list is the first type dynamic list, first type special fields containing a detachable character are extracted from the normalized first type classification features, each is split into several field segments according to the position of the detachable character in it, and the split field segments are matched with the target classification.
  • If the preset dynamic list is the second type dynamic list, second type special fields containing a detachable character are extracted from the normalized second type classification features, each is split into several field segments according to the position of the detachable character in it, and the split field segments are matched with the target classification.
  • If the preset dynamic list is the third type dynamic list, third type special fields containing a detachable character are extracted from the normalized third type classification features, each is split into several field segments according to the position of the detachable character in it, and the split field segments are matched with the target classification.
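The structured split and the first match described above might look like the following sketch, assuming the detachable characters are exactly the "+" and "//" from the example. The hypothetical `split_special_field` records each detachable-character position as a split point and keeps the segments on either side; a non-special field passes through as a single segment. The hypothetical `first_match` then looks each segment up in the target classification. Exact string equality for that lookup is an assumption, since the text does not say how the first-match comparison is made.

```python
import re

# ASSUMPTION: "+" and "//" are the detachable characters from the example;
# a real system would presumably load them from configuration.
DETACHABLE = ("//", "+")


def split_special_field(field: str, detachable: tuple[str, ...] = DETACHABLE) -> list[str]:
    """Split a field at each detachable character (split point).

    The segment before each split point and the segment after the last one are
    extracted. A field without a detachable character (a non-special field)
    comes back unchanged as a one-element list.
    """
    # Longest separators first, so "//" is never read as two separate "/".
    pattern = "|".join(re.escape(d) for d in sorted(detachable, key=len, reverse=True))
    segments, start = [], 0
    for split_point in re.finditer(pattern, field):
        segments.append(field[start:split_point.start()])  # segment before the split point
        start = split_point.end()
    segments.append(field[start:])  # segment after the last split point
    return [s.strip() for s in segments if s.strip()]


def first_match(features: list[str], targets: set[str]) -> tuple[list[str], list[str]]:
    """First match: look up each (possibly split) segment in the target classification.

    ASSUMPTION: exact string equality stands in for the unspecified comparison.
    """
    matched, unmatched = [], []
    for feature in features:
        for segment in split_special_field(feature):
            (matched if segment in targets else unmatched).append(segment)
    return matched, unmatched


if __name__ == "__main__":
    print(split_special_field("a+b"))
    print(split_special_field("a//b"))
    print(first_match(["a+b", "c"], {"a", "c"}))
```

Segments that fail here (in the last call, "b") are exactly the input that the second match has to deal with.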
  • The second matching module 204 is configured to match the fields that were not matched successfully with the target classification by means of a preset field logical inclusion relationship (or field semantic inclusion relationship).
  • The data matching performed by the first matching module 203 can be called the first match. It covers the matching of special fields (splitting each special field into field segments and matching the segments with the target classification) and the matching of non-special fields (matching the non-special fields in the normalized classification features with the target classification).
  • The data matching performed by the second matching module 204 can be called the second match: it matches the fields that failed the first match with the target classification.
  • Matching the unsuccessfully matched fields with the target classification by means of the preset field logical inclusion relationship comprises:
  • calculating, with a semantic logic similarity algorithm (such as an algorithm that computes semantic similarity based on a tree hierarchy), the semantic similarity value between each unsuccessfully matched field (that is, each field that failed the first match) and the target classification;
  • if the semantic similarity value is greater than a preset threshold (e.g., 80%), determining that the field has a logical inclusion relationship with the target classification and marking it as matching the target classification;
  • if the semantic similarity value is not greater than the preset threshold (e.g., 80%), the match is unsuccessful.
  • For example, the field "Aspirin Slices" is changed to a successfully matched field.
  • The field logical inclusion relationship (or field semantic inclusion relationship) used by the second matching module 204 solves the problem of matching irregular or incomplete fields, further improving the success rate and accuracy of matching classification features to target classifications. Its matching efficiency is also significantly higher than that of manual matching, which greatly reduces the manual matching workload.
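The text only says that the semantic similarity may be computed "based on a tree hierarchy". One well-known measure of that kind is the Wu-Palmer similarity, 2 * depth(LCS) / (depth(a) + depth(b)), where LCS is the lowest common ancestor of the two nodes. The sketch below uses it purely as a stand-in, over a small hypothetical taxonomy (`PARENT`) whose node names are illustrative, with the 0.8 (80%) threshold from the example. It is not necessarily the algorithm used in the patent, and the function names are hypothetical.

```python
# ASSUMPTION: a toy taxonomy (child -> parent) standing in for whatever tree
# hierarchy a real deployment would use; node names are hypothetical.
PARENT = {
    "drug": "root",
    "analgesic": "drug",
    "aspirin": "analgesic",
    "aspirin tablets": "aspirin",
    "ibuprofen": "analgesic",
}


def ancestors(node: str) -> list[str]:
    """Path from node up to its root, node first."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def wu_palmer(a: str, b: str) -> float:
    """Tree-hierarchy similarity 2 * depth(LCS) / (depth(a) + depth(b)); root depth is 1."""
    path_a, path_b = ancestors(a), ancestors(b)
    common = set(path_b)
    # Lowest common subsumer: the first ancestor of a that is also an ancestor of b.
    lcs = next((n for n in path_a if n in common), None)
    if lcs is None:
        return 0.0  # no shared ancestor, e.g. a term missing from the taxonomy
    return 2 * len(ancestors(lcs)) / (len(path_a) + len(path_b))


def second_match(unmatched: list[str], targets: list[str], threshold: float = 0.8) -> dict[str, str]:
    """Mark a field as matched when its best similarity to a target exceeds the threshold."""
    result = {}
    for field in unmatched:
        best = max(targets, key=lambda t: wu_palmer(field, t), default=None)
        if best is not None and wu_palmer(field, best) > threshold:
            result[field] = best  # logical inclusion: the field falls under this target
    return result
```

With this toy tree, "aspirin tablets" against "aspirin" scores 2 * 4 / (5 + 4) ≈ 0.89 and is accepted as logically included, while "ibuprofen" against "aspirin" scores 2 * 3 / (4 + 4) = 0.75 and stays unmatched.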
  • The second matching module 204 may also be omitted.
  • Through structured splitting of special fields, the automatic data matching program 20 proposed by the present invention effectively improves the success rate and accuracy of matching classification features to target classifications. Furthermore, through field logical inclusion relationships (or field semantic inclusion relationships), it solves the problem of matching irregular or incomplete fields, further improving that success rate and accuracy.
  • The present invention also proposes an automatic data matching method.
  • Referring to FIG. 3, a flowchart of an embodiment of the automatic data matching method of the present invention is shown.
  • The order of the steps in the flowchart shown in FIG. 3 may be changed according to different requirements, and some steps may be omitted.
  • Step S31: acquire the classification features obtained by the feature extraction operation.
  • The feature extraction operation is a preprocessing step of various data-mining prediction models.
  • The classification features include, but are not limited to, text data such as drug names, diagnostic information, doctor's order information, medical equipment, surgery types, and family history.
  • Step S32: normalize, according to a preset dynamic list, the classification features obtained by the feature extraction operation to obtain normalized classification features.
  • The preset dynamic lists include dynamic lists corresponding to different types of data sources, for example a dynamic list corresponding to a first type of data source (such as the MS SQL Server data source; hereinafter the "first type dynamic list"), a dynamic list corresponding to a second type of data source (such as the Oracle data source; hereinafter the "second type dynamic list"), and a dynamic list corresponding to a third type of data source (such as the MySQL data source; hereinafter the "third type dynamic list").
  • The number of dynamic lists may also be increased or decreased according to the number of data source types.
  • The dynamic lists corresponding to different types of data sources store different special characters, which are used to normalize the classification features from those data sources.
  • The first type dynamic list stores first type special characters for normalizing classification features from the first type data source; the second type dynamic list stores second type special characters for the second type data source; and the third type dynamic list stores third type special characters for the third type data source.
  • The preset dynamic lists are adjusted dynamically as the data in the data sources change, for example by adding new special characters.
  • The first type dynamic list is adjusted according to data changes in the first type data source, the second type dynamic list according to data changes in the second type data source, and the third type dynamic list according to data changes in the third type data source.
  • Normalizing, according to the preset dynamic list, the classification features obtained by the feature extraction operation comprises: extracting the special characters stored in the preset dynamic list, and normalizing the classification features according to the extracted special characters, for example by deleting or replacing them.
  • If the preset dynamic list is the first type dynamic list, the first type special characters it stores (such as "/") are extracted, and the classification features obtained by the feature extraction operation are normalized by deleting or replacing those characters, yielding normalized first type classification features.
  • If the preset dynamic list is the second type dynamic list, the second type special characters it stores are extracted, and the classification features obtained by the feature extraction operation are normalized by deleting or replacing those characters, yielding normalized second type classification features.
  • If the preset dynamic list is the third type dynamic list, the third type special characters it stores are extracted, and the classification features obtained by the feature extraction operation are normalized by deleting or replacing those characters, yielding normalized third type classification features.
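The text says the dynamic lists are adjusted as the data sources change, for example by adding new special characters, but it does not say how new characters are detected. One possible approach, sketched below purely as an assumption, scans newly arrived records, treats any Unicode punctuation or symbol character (as reported by `unicodedata.category`) as special, and appends it to the source's list with a delete rule. The function name `update_dynamic_list` and the `PROTECTED` set are hypothetical; the set keeps detachable characters such as "+" and "/" out of the list so that normalization in step S32 does not destroy the split points needed in step S33.

```python
import unicodedata

# ASSUMPTION: "special character" is approximated as any Unicode punctuation (P*)
# or symbol (S*) character. Detachable characters are protected so the later
# structured split can still see them.
PROTECTED = frozenset("+/")


def update_dynamic_list(dynamic_list: dict[str, str], new_records: list[str],
                        protected: frozenset[str] = PROTECTED) -> list[str]:
    """Add newly seen special characters to one source's dynamic list, in place.

    New entries get a delete rule (empty replacement). Returns the characters
    added, in the order they were first seen.
    """
    added = []
    for record in new_records:
        for ch in record:
            is_special = unicodedata.category(ch)[0] in ("P", "S")
            if is_special and ch not in protected and ch not in dynamic_list:
                dynamic_list[ch] = ""  # default rule: delete the character
                added.append(ch)
    return added


if __name__ == "__main__":
    first_type_list = {"*": ""}
    print(update_dynamic_list(first_type_list, ["Aspirin#100mg", "a+b", "x*y"]))
    print(first_type_list)
```

Defaulting new entries to deletion is a conservative choice for a sketch; a production list would more likely queue new characters for review before deciding between deletion and replacement.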
  • Step S33: extract, from the normalized classification features, special fields that contain a detachable character, split each special field into several field segments according to the position of the detachable character in the special field, and match the split field segments with the target classification.
  • The target classification may be preset rule data in an internal data platform (such as a Hadoop data platform).
  • Splitting the special field into several field segments comprises: recording the position of the detachable character in the special field as a split point, and separately extracting the field segment before the split point and the field segment after the split point.
  • For example, if the normalized classification features include the special field "a+b" or "a//b", where "+" and "//" are detachable characters, the special field is split into the field segments "a" and "b", and each of "a" and "b" is matched with the target classification.
  • The structured splitting of special fields described in step S33 effectively improves the success rate and accuracy of matching classification features to target classifications.
  • If the preset dynamic list is the first type dynamic list, first type special fields containing a detachable character are extracted from the normalized first type classification features, each is split into several field segments according to the position of the detachable character in it, and the split field segments are matched with the target classification.
  • If the preset dynamic list is the second type dynamic list, second type special fields containing a detachable character are extracted from the normalized second type classification features, each is split into several field segments according to the position of the detachable character in it, and the split field segments are matched with the target classification.
  • If the preset dynamic list is the third type dynamic list, third type special fields containing a detachable character are extracted from the normalized third type classification features, each is split into several field segments according to the position of the detachable character in it, and the split field segments are matched with the target classification.
  • Step S34: match the fields that were not matched successfully with the target classification by means of a preset field logical inclusion relationship (or field semantic inclusion relationship).
  • The data matching in step S33 can be called the first match. It covers the matching of special fields (splitting each special field into field segments and matching the segments with the target classification) and the matching of non-special fields (matching the non-special fields in the normalized classification features with the target classification).
  • The data matching in step S34 can be called the second match: it matches the fields that failed the first match with the target classification.
  • Matching the unsuccessfully matched fields with the target classification by means of the preset field logical inclusion relationship comprises:
  • calculating, with a semantic logic similarity algorithm (such as an algorithm that computes semantic similarity based on a tree hierarchy), the semantic similarity value between each unsuccessfully matched field (that is, each field that failed the first match) and the target classification;
  • if the semantic similarity value is greater than a preset threshold (e.g., 80%), determining that the field has a logical inclusion relationship with the target classification and marking it as matching the target classification;
  • if the semantic similarity value is not greater than the preset threshold (e.g., 80%), the match is unsuccessful.
  • For example, the field "Aspirin Slices" is changed to a successfully matched field.
  • The field logical inclusion relationship (or field semantic inclusion relationship) described in step S34 solves the problem of matching irregular or incomplete fields, further improving the success rate and accuracy of matching classification features to target classifications.
  • Its matching efficiency is also significantly higher than that of manual matching, which greatly reduces the manual matching workload.
  • Step S34 may also be omitted.
  • Through structured splitting of special fields, the automatic data matching method proposed by the present invention effectively improves the success rate and accuracy of matching classification features to target classifications. Furthermore, through field logical inclusion relationships (or field semantic inclusion relationships), it solves the problem of matching irregular or incomplete fields, further improving that success rate and accuracy.
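To show how steps S31 to S34 fit together, including the option of omitting S34, here is a control-flow sketch in which the stage functions are passed in as parameters. `run_pipeline` and its parameter names are hypothetical, and the lambdas in the usage example are simple stand-ins rather than the normalization, splitting, or semantic-matching logic of the method.

```python
from typing import Callable, Optional


def run_pipeline(
    features: list[str],
    normalize: Callable[[str], str],
    split: Callable[[str], list[str]],
    targets: set[str],
    second_match: Optional[Callable[[str], Optional[str]]] = None,
) -> tuple[dict[str, str], list[str]]:
    """Control flow of steps S31-S34; returns (field -> target, still-unmatched fields)."""
    matched: dict[str, str] = {}
    unmatched: list[str] = []
    for feature in features:                        # S31: features from the extraction step
        for segment in split(normalize(feature)):   # S32: normalize; S33: structured split
            if segment in targets:                  # S33: first match
                matched[segment] = segment
            else:
                unmatched.append(segment)
    if second_match is None:                        # S34 may be omitted
        return matched, unmatched
    remaining = []
    for field in unmatched:                         # S34: second match on the leftovers
        target = second_match(field)
        if target is None:
            remaining.append(field)
        else:
            matched[field] = target
    return matched, remaining


if __name__ == "__main__":
    # Hypothetical stand-ins for the real stage functions.
    print(run_pipeline(
        ["aspirin*+ibuprofen", "aspirin tablets"],
        normalize=lambda s: s.replace("*", ""),
        split=lambda s: [p for p in s.split("+") if p],
        targets={"aspirin", "ibuprofen"},
        second_match=lambda f: "aspirin" if f.startswith("aspirin") else None,
    ))
```

Passing the stages in as functions keeps the skeleton independent of any particular normalization rule set or similarity measure, so S34 can be switched off simply by leaving `second_match` unset.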
  • the present invention also provides a computer-readable storage medium (such as a ROM/RAM, magnetic disk, or optical disc) storing an automatic data matching program 20, which can be executed by at least one processor 22 to cause the at least one processor 22 to perform the steps of the automatic data matching method described above.
  • the methods of the foregoing embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to perform the methods described in the embodiments of the present invention.
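The bullets above name "an algorithm for calculating semantic similarity based on a tree hierarchy" without specifying it. As a hedged illustration only, the sketch below swaps in Wu-Palmer similarity, a well-known tree-based measure that is not necessarily the one the patent uses, over a small hypothetical drug taxonomy (`PARENT`, `wu_palmer`, and `logically_included` are illustrative names), and applies the example 80% threshold with a strict "greater than" test.

```python
from __future__ import annotations

# Hypothetical toy taxonomy (child -> parent); None marks the root.
PARENT = {
    "drug": None,
    "analgesic": "drug",
    "antibiotic": "drug",
    "aspirin": "analgesic",
    "aspirin tablets": "aspirin",
    "aspirin enteric-coated tablets": "aspirin",
    "amoxicillin": "antibiotic",
}

THRESHOLD = 0.8  # the "e.g., 80%" threshold from the text


def path_to_root(node: str) -> list[str]:
    """Return the chain [node, parent, ..., root]."""
    path = []
    current = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def depth(node: str) -> int:
    """Depth in the tree, with the root at depth 1."""
    return len(path_to_root(node))


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_of_a = set(path_to_root(a))
    # Lowest common subsumer: walking up from b, the first node that is also an ancestor of a.
    lcs = next(n for n in path_to_root(b) if n in ancestors_of_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def logically_included(field: str, target: str) -> bool:
    """Treat the field as matched when similarity strictly exceeds the threshold."""
    return wu_palmer(field, target) > THRESHOLD
```

With this toy tree, "aspirin tablets" vs "aspirin" scores 2*3/(4+3), about 0.857, and passes the threshold, while "aspirin tablets" vs "amoxicillin" scores 2*1/(4+3), about 0.286, and does not.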

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

An automatic data matching method, the method comprising the following steps: acquiring a classification feature obtained by a feature extraction operation (S31); performing, according to a preset dynamic list, normalization processing of the classification feature obtained by the feature extraction operation, to obtain a normalized classification feature (S32); extracting, from the normalized classification feature, a special field comprising a splittable character, splitting the special field into several field segments according to the position of the splittable character in the special field, and matching the resulting field segments against a target category (S33); and, on the basis of a preset field logical inclusion relationship, matching any field that failed to match against the target category (S34). The method can improve the success rate and accuracy of matching between classification features and target categories. Further disclosed are an electronic device and a computer-readable storage medium.

Description

Automatic data matching method, electronic device and computer-readable storage medium
This application claims priority under the Paris Convention to Chinese Patent Application No. CN201710660957.2, entitled "Automatic Data Matching Method, Electronic Apparatus and Computer-Readable Storage Medium" and filed on August 4, 2017, the entire content of which is incorporated herein by reference.
Technical field
The present invention relates to the field of computer information technology, and in particular, to an automatic data matching method, an electronic device, and a computer-readable storage medium.
Background
Feature extraction is an important step in many data mining prediction models, and normalizing classification features into existing target classifications plays a very important role in the data preprocessing stage. However, directly performing exact matching between uncleaned classification features and target classifications yields extremely low matching success rates and accuracy, which cannot meet model requirements. Moreover, because massive amounts of data keep flowing in, the data volume has far exceeded what manual matching can handle. Therefore, the data matching algorithms in the prior art are not well designed and urgently need improvement.
Summary of the invention
In view of this, the present invention provides an automatic data matching method, an electronic device, and a computer-readable storage medium, which effectively improve the matching success rate and accuracy between classification features and target classifications through structured splitting of special fields and field logical inclusion relationships.
First, to achieve the above object, the present invention provides an electronic device including a memory and a processor, the memory storing an automatic data matching program which, when executed by the processor, implements the following steps:
obtaining the classification features produced by a feature extraction operation;
normalizing the classification features obtained by the feature extraction operation according to a preset dynamic list to obtain normalized classification features;
extracting, from the normalized classification features, a special field containing a splittable character, splitting the special field into a plurality of field segments according to the position of the splittable character in the special field, and matching the resulting field segments against a target classification; and
matching fields that failed to match against the target classification by means of a preset field logical inclusion relationship.
Preferably, normalizing the classification features obtained by the feature extraction operation according to the preset dynamic list comprises:
if the preset dynamic list is a first-type dynamic list, extracting the first-type special characters stored in the first-type dynamic list, and deleting or replacing the extracted first-type special characters in the classification features obtained by the feature extraction operation, to obtain normalized first-type classification features;
if the preset dynamic list is a second-type dynamic list, extracting the second-type special characters stored in the second-type dynamic list, and deleting or replacing the extracted second-type special characters in the classification features obtained by the feature extraction operation, to obtain normalized second-type classification features; and
if the preset dynamic list is a third-type dynamic list, extracting the third-type special characters stored in the third-type dynamic list, and deleting or replacing the extracted third-type special characters in the classification features obtained by the feature extraction operation, to obtain normalized third-type classification features.
Preferably, splitting the special field into a plurality of field segments comprises:
recording the position of the splittable character in the special field as a split point; and
extracting the field segment before the split point and the field segment after the split point, respectively.
Preferably, splitting the special field into a plurality of field segments comprises:
if the preset dynamic list is a first-type dynamic list, extracting, from the normalized first-type classification features, a first-type special field containing a splittable character, and splitting the first-type special field into a plurality of field segments according to the position of the splittable character in the first-type special field;
if the preset dynamic list is a second-type dynamic list, extracting, from the normalized second-type classification features, a second-type special field containing a splittable character, and splitting the second-type special field into a plurality of field segments according to the position of the splittable character in the second-type special field; and
if the preset dynamic list is a third-type dynamic list, extracting, from the normalized third-type classification features, a third-type special field containing a splittable character, and splitting the third-type special field into a plurality of field segments according to the position of the splittable character in the third-type special field.
Preferably, matching fields that failed to match against the target classification by means of the preset field logical inclusion relationship comprises:
calculating, according to a semantic logic similarity calculation algorithm, the semantic similarity value between a field that failed to match and the target classification; and
if the semantic similarity value is greater than a preset threshold, determining that the unmatched field has a logical inclusion relationship with the target classification, and marking the unmatched field as matching the target classification.
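The threshold step above can be sketched as follows. This is a minimal illustration under loud assumptions: the patent does not specify its similarity algorithm, so `difflib.SequenceMatcher.ratio()` is swapped in as a character-level placeholder (it is not a semantic measure), and `mark_logical_inclusion` is a hypothetical helper name. The comparison is strictly "greater than", as the text states.

```python
from difflib import SequenceMatcher
from typing import Dict, Iterable, List, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    """Placeholder score in [0, 1]. This is character-level, NOT semantic."""
    return SequenceMatcher(None, a, b).ratio()


def mark_logical_inclusion(
    unmatched: Iterable[str],
    targets: Sequence[str],
    threshold: float = 0.8,
) -> Tuple[Dict[str, str], List[str]]:
    """Mark each unmatched field as matching its best target if the score exceeds threshold."""
    matched: Dict[str, str] = {}
    still_unmatched: List[str] = []
    for field in unmatched:
        # Best-scoring target; the default covers an empty target list.
        score, best = max(((similarity(field, t), t) for t in targets), default=(0.0, ""))
        if best and score > threshold:  # strictly greater than, as in the text
            matched[field] = best
        else:
            still_unmatched.append(field)
    return matched, still_unmatched
```

For example, `mark_logical_inclusion(["aspirin tablet", "xyz"], ["aspirin tablets", "amoxicillin capsules"])` marks "aspirin tablet" as matching "aspirin tablets" (score 28/29) and leaves "xyz" unmatched. Swapping in a real semantic measure only requires replacing `similarity`.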
Preferably, the preset dynamic list is dynamically adjusted according to changes in the data of the data source.
Preferably, the target classification is preset rule data in an internal data platform.
In addition, to achieve the above object, the present invention further provides an automatic data matching method applied to an electronic device, the method comprising:
obtaining the classification features produced by a feature extraction operation;
normalizing the classification features obtained by the feature extraction operation according to a preset dynamic list to obtain normalized classification features;
extracting, from the normalized classification features, a special field containing a splittable character, splitting the special field into a plurality of field segments according to the position of the splittable character in the special field, and matching the resulting field segments against a target classification; and
matching fields that failed to match against the target classification by means of a preset field logical inclusion relationship.
Preferably, normalizing the classification features obtained by the feature extraction operation according to the preset dynamic list comprises:
if the preset dynamic list is a first-type dynamic list, extracting the first-type special characters stored in the first-type dynamic list, and deleting or replacing the extracted first-type special characters in the classification features obtained by the feature extraction operation, to obtain normalized first-type classification features;
if the preset dynamic list is a second-type dynamic list, extracting the second-type special characters stored in the second-type dynamic list, and deleting or replacing the extracted second-type special characters in the classification features obtained by the feature extraction operation, to obtain normalized second-type classification features; and
if the preset dynamic list is a third-type dynamic list, extracting the third-type special characters stored in the third-type dynamic list, and deleting or replacing the extracted third-type special characters in the classification features obtained by the feature extraction operation, to obtain normalized third-type classification features.
Preferably, splitting the special field into a plurality of field segments comprises:
recording the position of the splittable character in the special field as a split point; and
extracting the field segment before the split point and the field segment after the split point, respectively.
Preferably, matching fields that failed to match against the target classification by means of the preset field logical inclusion relationship comprises:
calculating, according to a semantic logic similarity calculation algorithm, the semantic similarity value between a field that failed to match and the target classification; and
if the semantic similarity value is greater than a preset threshold, determining that the unmatched field has a logical inclusion relationship with the target classification, and marking the unmatched field as matching the target classification.
Preferably, the preset dynamic list is dynamically adjusted according to changes in the data of the data source.
Preferably, the target classification is preset rule data in an internal data platform.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium storing an automatic data matching program, the program being executable by at least one processor to cause the at least one processor to perform the steps of the automatic data matching method described above.
Compared with the prior art, the electronic device, automatic data matching method, and computer-readable storage medium proposed by the present invention effectively improve the matching success rate and accuracy between classification features and target classifications through structured splitting of special fields, and further solve the matching problem for fields with irregularly missing content through the field logical inclusion relationship (or field semantic inclusion relationship), thereby further improving the matching success rate and accuracy.
DRAWINGS
FIG. 1 is a schematic diagram of an optional hardware architecture of the electronic device of the present invention;
FIG. 2 is a schematic block diagram of an embodiment of the automatic data matching program in the electronic device of the present invention;
FIG. 3 is a schematic flowchart of an embodiment of the automatic data matching method of the present invention.
Reference numerals:
Electronic device 2
Memory 21
Processor 22
Network interface 23
Automatic data matching program 20
Acquisition module 201
Processing module 202
First matching module 203
Second matching module 204
Process steps S31-S34
The realization of the objects, the functional features, and the advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely intended to explain the present invention and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that descriptions involving "first", "second", and the like in the present invention are for descriptive purposes only and shall not be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, but only where a person of ordinary skill in the art can implement the combination; when a combination of technical solutions is contradictory or cannot be implemented, such a combination shall be deemed not to exist and does not fall within the protection scope claimed by the present invention.
It should further be noted that, herein, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element.
First, the present invention provides an electronic device 2.
Referring to FIG. 1, a schematic diagram of an optional hardware architecture of the electronic device 2 of the present invention is shown. In this embodiment, the electronic device 2 may include, but is not limited to, a memory 21, a processor 22, and a network interface 23 communicatively connected to each other through a system bus. It should be noted that FIG. 1 only shows the electronic device 2 with components 21-23, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The electronic device 2 may be a computing device such as a rack server, blade server, tower server, or cabinet server. The electronic device 2 may be an independent server or a server cluster composed of multiple servers.
The memory 21 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 21 may be an internal storage unit of the electronic device 2, such as a hard disk or memory of the electronic device 2. In other embodiments, the memory 21 may be an external storage device of the electronic device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the electronic device 2. Of course, the memory 21 may include both an internal storage unit of the electronic device 2 and an external storage device. In this embodiment, the memory 21 is generally used to store the operating system and various application software installed on the electronic device 2, such as the automatic data matching program 20. In addition, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 22 is generally used to control the overall operation of the electronic device 2, for example, performing control and processing related to data interaction or communication with the electronic device 2. In this embodiment, the processor 22 is used to run program code stored in the memory 21 or to process data, for example, to run the automatic data matching program 20.
The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the electronic device 2 and other electronic devices. For example, the network interface 23 is used to connect the electronic device 2 to an external data platform through a network and to establish a data transmission channel and communication connection between the electronic device 2 and the external data platform. The network may be a wireless or wired network such as an intranet, the Internet, Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
The application environment of the embodiments of the present invention and the hardware structure and functions of the related devices have now been described in detail. Various embodiments of the present invention are presented below based on this application environment and these devices.
Referring to FIG. 2, a schematic block diagram of an embodiment of the automatic data matching program 20 in the electronic device 2 of the present invention is shown. In this embodiment, the automatic data matching program 20 may be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to implement the present invention. For example, in FIG. 2, the automatic data matching program 20 may be divided into an acquisition module 201, a processing module 202, a first matching module 203, and a second matching module 204. A program module, as used in the present invention, refers to a series of computer program instruction segments capable of performing a specific function. The functions of the program modules 201-204 are described in detail below.
The acquisition module 201 is configured to obtain the classification features produced by a feature extraction operation. The feature extraction operation is a preprocessing step of various data mining prediction models. Preferably, in this embodiment, the classification features include, but are not limited to, text data such as drug names, diagnostic information, medical orders, medical devices, surgery types, and family history.
The processing module 202 is configured to normalize the classification features obtained by the feature extraction operation according to a preset dynamic list to obtain normalized classification features.
Preferably, in this embodiment, the preset dynamic list includes dynamic lists corresponding to different types of data sources, such as a dynamic list corresponding to a first type of data source (for example, an MS SQL Server data source; hereinafter the "first-type dynamic list"), a dynamic list corresponding to a second type of data source (for example, an Oracle data source; hereinafter the "second-type dynamic list"), and a dynamic list corresponding to a third type of data source (for example, a MySQL data source; hereinafter the "third-type dynamic list"). Those skilled in the art should understand that, in other embodiments, the number of dynamic lists may be increased or decreased according to the number of data source types.
Preferably, in this embodiment, the dynamic lists corresponding to different types of data sources store different special characters, which are used to normalize the classification features of the corresponding type of data source. For example, the first-type dynamic list stores first-type special characters used to normalize classification features from the first type of data source; the second-type dynamic list stores second-type special characters used to normalize classification features from the second type of data source; and the third-type dynamic list stores third-type special characters used to normalize classification features from the third type of data source.
Preferably, in this embodiment, the preset dynamic list is dynamically adjusted according to changes in the data of the data source, for example, by adding new special characters. For example, the first-type dynamic list is dynamically adjusted according to data changes in the first type of data source, the second-type dynamic list according to data changes in the second type of data source, and the third-type dynamic list according to data changes in the third type of data source.
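The per-source dynamic lists described above can be pictured as a small registry keyed by data-source type. The sketch below is hypothetical throughout: the class name, the source-type keys (such as "mssql"), and especially the adjustment policy (any character in new data that falls outside an allowed alphabet is added as a special character) are assumptions, since the text only says that lists are adjusted as source data changes, for example by adding new special characters.

```python
from collections import defaultdict
from typing import DefaultDict, FrozenSet, Iterable, Set


class DynamicListRegistry:
    """Hypothetical store: data-source type -> special characters for that source."""

    def __init__(self) -> None:
        self._lists: DefaultDict[str, Set[str]] = defaultdict(set)

    def special_chars(self, source_type: str) -> FrozenSet[str]:
        """Return a read-only snapshot of the list for one source type."""
        return frozenset(self._lists.get(source_type, set()))

    def add(self, source_type: str, *chars: str) -> None:
        """Manually register special characters for a source type."""
        self._lists[source_type].update(chars)

    def adjust_from_samples(self, source_type: str, samples: Iterable[str], allowed: Set[str]) -> None:
        """Dynamic adjustment (assumed policy): any character in new data that is
        not in the allowed alphabet is added to that source type's list."""
        for sample in samples:
            self._lists[source_type].update(ch for ch in sample if ch not in allowed)
```

For instance, after `reg.add("mssql", "/", "\\")`, feeding new MySQL rows such as "aspirin#tablets" through `adjust_from_samples` with a lowercase-letter alphabet adds "#" to the "mysql" list only. One design choice worth noting: splittable characters such as "+" should be kept in `allowed`, otherwise normalization could remove them before the fields are split.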
Preferably, in this embodiment, normalizing the classification features obtained by the feature extraction operation according to the preset dynamic list includes: extracting the special characters stored in the preset dynamic list, and applying normalization, such as deleting or replacing those special characters, to the classification features obtained by the feature extraction operation.
Specifically, if the preset dynamic list is the first-type dynamic list, the first-type special characters stored in it (such as "/" and "\") are extracted, and the extracted first-type special characters are deleted or replaced in the classification features obtained by the feature extraction operation to obtain normalized first-type classification features.
If the preset dynamic list is the second-type dynamic list, the second-type special characters stored in it are extracted, and the extracted second-type special characters are deleted or replaced in the classification features obtained by the feature extraction operation to obtain normalized second-type classification features.
If the preset dynamic list is the third-type dynamic list, the third-type special characters stored in it are extracted, and the extracted third-type special characters are deleted or replaced in the classification features obtained by the feature extraction operation to obtain normalized third-type classification features.
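The delete-or-replace normalization above can be sketched with Python's `str.maketrans` and `str.translate`. This is a minimal illustration under stated assumptions: only "/" and "\" for the first-type list come from the text; which characters are deleted versus replaced, the second- and third-type entries, and the final whitespace collapse are all hypothetical choices.

```python
from typing import Dict

# Hypothetical dynamic lists: list type -> {special character: replacement}.
# An empty replacement deletes the character. Only "/" and "\\" for the first
# type come from the text; whether each is deleted or replaced is assumed.
DYNAMIC_LISTS: Dict[str, Dict[str, str]] = {
    "first": {"/": " ", "\\": ""},  # e.g. MS SQL Server sources
    "second": {"*": ""},            # placeholder entry
    "third": {"#": ""},             # placeholder entry
}


def normalize(feature: str, list_type: str) -> str:
    """Delete or replace the list's special characters, then collapse whitespace."""
    table = str.maketrans(DYNAMIC_LISTS[list_type])
    cleaned = feature.translate(table)
    # Collapsing runs of whitespace is an extra, assumed clean-up step.
    return " ".join(cleaned.split())
```

For example, `normalize("aspirin/tablets\\", "first")` yields "aspirin tablets". Note that the text lists "/" as a first-type special character and, elsewhere, "//" as a splittable character, so the order of normalization and splitting matters for such inputs; the text does not say how that overlap is resolved.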
The first matching module 203 is configured to extract, from the normalized classification features, a special field containing a splittable character, split the special field into a plurality of field segments according to the position of the splittable character in the special field, and match the resulting field segments against the target classification. The target classification may be preset rule data in an internal data platform (such as a Hadoop data platform).
Preferably, in this embodiment, splitting the special field into a plurality of field segments includes: recording the position of the splittable character in the special field as a split point, and extracting the field segment before the split point and the field segment after the split point, respectively.
For example, if the normalized classification features include the special field "a+b" or "a//b", where "+" and "//" are splittable characters, the special field is split into the field segments "a" and "b", which are then each matched against the target classification.
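The split-point procedure and the "a+b" / "a//b" example above can be sketched as follows. The splittable tokens come from the example; the function names are hypothetical, and exact set membership stands in for the matching rule against the target classification, which the text does not spell out.

```python
import re
from typing import Dict, List, Set, Tuple

# Splittable characters from the example in the text. Longer tokens are tried
# first so that, if a list ever holds overlapping tokens (say "/" and "//"),
# the longer one wins.
SPLITTABLE = ["//", "+"]
_PATTERN = re.compile("|".join(re.escape(s) for s in sorted(SPLITTABLE, key=len, reverse=True)))


def split_points(field: str) -> List[Tuple[int, int]]:
    """Record the (start, end) position of each splittable character as a split point."""
    return [(m.start(), m.end()) for m in _PATTERN.finditer(field)]


def split_special_field(field: str) -> List[str]:
    """Take the segment before each split point and the segment after the last one."""
    segments: List[str] = []
    prev_end = 0
    for start, end in split_points(field):
        segments.append(field[prev_end:start])
        prev_end = end
    segments.append(field[prev_end:])
    # Drop empty segments, e.g. from a trailing "+".
    return [s.strip() for s in segments if s.strip()]


def match_segments(field: str, targets: Set[str]) -> Dict[str, bool]:
    """Match each segment on its own; exact membership stands in for the real rule."""
    return {seg: seg in targets for seg in split_special_field(field)}
```

Both `split_special_field("a+b")` and `split_special_field("a//b")` return `["a", "b"]`, and `split_points("a//b")` records the single split point `(1, 3)`. A field without a splittable character comes back as a one-element list, so the same code also handles non-special fields.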
Matching a special field such as "a+b" or "a//b" directly against the target classification is likely to fail. If the field is first split into the fragments "a" and "b" and each fragment is matched separately, the match success rate improves considerably. The structured splitting of special fields performed by the first matching module 203 therefore improves both the success rate and the accuracy of matching classification features to the target classification.
Preferably, in this embodiment, if the preset dynamic list is the first-type dynamic list, first-type special fields containing a splittable character are extracted from the normalized first-type classification features; each first-type special field is split into several field fragments according to the position of the splittable character within it; and the resulting fragments are matched against the target classification.
If the preset dynamic list is the second-type dynamic list, second-type special fields containing a splittable character are extracted from the normalized second-type classification features; each second-type special field is split into several field fragments according to the position of the splittable character within it; and the resulting fragments are matched against the target classification.
If the preset dynamic list is the third-type dynamic list, third-type special fields containing a splittable character are extracted from the normalized third-type classification features; each third-type special field is split into several field fragments according to the position of the splittable character within it; and the resulting fragments are matched against the target classification.
The second matching module 204 is configured to match the fields that failed to match against the target classification using a preset field logical inclusion relationship (or field semantic inclusion relationship).
Preferably, in this embodiment, the matching performed by the first matching module 203 may be called the first-pass match. It covers both special fields (each special field is split into field fragments that are matched against the target classification) and non-special fields (the non-special fields of the normalized classification features are matched directly against the target classification). The matching performed by the second matching module 204 may be called the second-pass match: it matches the fields that failed in the first pass against the target classification.
Preferably, in this embodiment, matching the unmatched fields against the target classification using the preset field logical inclusion relationship comprises:
calculating, with a semantic-logic similarity algorithm (for example, one that computes semantic similarity over a tree hierarchy), a semantic similarity value between each unmatched field (that is, each field that failed the first-pass match) and the target classification; and
if the semantic similarity value exceeds a preset threshold (for example, 80%), determining that the unmatched field and the target classification stand in a logical inclusion relationship, and marking the unmatched field as matching that target classification, that is, changing it from an unmatched field to a matched field.
For example, if a field that failed the first-pass match contains "阿司匹林片" (aspirin tablets) and the target classification contains the field "阿司匹林" (aspirin), the two stand in a semantic-logical inclusion relationship, so the unmatched field "aspirin tablets" is changed to a matched field.
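The text names only "an algorithm that computes semantic similarity over a tree hierarchy". One well-known measure of that kind is Wu-Palmer similarity, 2·depth(LCS) / (depth(a) + depth(b)), where LCS is the lowest common ancestor of the two nodes. The sketch below uses it as a stand-in together with a small hypothetical drug taxonomy; neither the measure nor the taxonomy comes from the source. With this four-level tree, "aspirin tablets" versus "aspirin" scores 6/7 ≈ 0.857, which clears the 80% example threshold.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical toy taxonomy (child -> parent), for illustration only. A real
# deployment would use a curated terminology tree; none is named in the source.
PARENT: Dict[str, Optional[str]] = {
    "drug": None,                  # root, depth 1
    "analgesic": "drug",           # depth 2
    "aspirin": "analgesic",        # depth 3
    "aspirin tablets": "aspirin",  # depth 4
    "ibuprofen": "analgesic",      # depth 3
}


def path_to_root(node: str) -> List[str]:
    """Return [node, parent, ..., root]."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def depth(node: str) -> int:
    return len(path_to_root(node))  # the root has depth 1


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    # Walking up from b, the first node shared with a is the lowest common ancestor.
    lcs = next(n for n in path_to_root(b) if n in ancestors_a)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def second_pass_match(field: str, targets: Iterable[str], threshold: float = 0.8) -> Optional[str]:
    """Return the most similar target scoring above the threshold, else None."""
    if field not in PARENT:
        return None  # term not in the taxonomy: leave it unmatched
    best, best_score = None, threshold
    for target in targets:
        if target in PARENT:
            score = wu_palmer(field, target)
            if score > best_score:  # strictly greater, as in "greater than the threshold"
                best, best_score = target, score
    return best
```

Siblings such as "aspirin tablets" and "ibuprofen" share only "analgesic" (depth 2), scoring 4/7 ≈ 0.571, so they stay unmatched.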
The second matching module 204 thus re-examines the fields that failed to match in the first matching module 203; whenever such a field is found to have a logical (or semantic) inclusion relationship with the target classification, it is changed to a matched field. By using the field logical inclusion relationship (or field semantic inclusion relationship), the second matching module 204 resolves the matching of irregular or incomplete fields, which further raises the success rate and accuracy of matching classification features to the target classification. Compared with manual matching, this is markedly more efficient and substantially reduces the manual matching workload.
It should be noted that in other embodiments the second matching module 204 may be omitted in some cases, for example when the first-pass match success rate is already high (such as above 90%).
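How the two passes fit together, including skipping the second pass when the first-pass success rate is already above 90%, can be sketched as below. The function names, the per-field success rate, and the two stand-in matchers (`exact` and the crude substring test `contains`) are assumptions for illustration; the `contains` matcher is not the semantic-similarity method described above.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Matcher = Callable[[str, Set[str]], Optional[str]]


def run_matching(
    fields: List[str],
    targets: Set[str],
    first_pass: Matcher,
    second_pass: Matcher,
    skip_second_pass_above: float = 0.9,
) -> Tuple[Dict[str, str], float]:
    """Two-pass matching; the second pass runs only if the first-pass rate is not above the cutoff."""
    results: Dict[str, str] = {}
    unmatched: List[str] = []
    for field in fields:
        hit = first_pass(field, targets)
        if hit is not None:
            results[field] = hit
        else:
            unmatched.append(field)
    first_rate = len(results) / len(fields) if fields else 1.0
    if first_rate <= skip_second_pass_above:
        # Second pass: retry only the fields the first pass could not match.
        for field in unmatched:
            hit = second_pass(field, targets)
            if hit is not None:
                results[field] = hit
    return results, first_rate


# Stand-in matchers, for illustration only.
def exact(field: str, targets: Set[str]) -> Optional[str]:
    return field if field in targets else None


def contains(field: str, targets: Set[str]) -> Optional[str]:
    # Crude substitute for semantic similarity: some target occurs inside the field.
    return next((t for t in sorted(targets) if t in field), None)
```

For example, `run_matching(["aspirin", "aspirin tablets"], {"aspirin"}, exact, contains)` has a first-pass rate of 0.5, so the second pass runs and both fields end up mapped to "aspirin".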
Through program modules 201-204, the automatic data matching program 20 proposed by the present invention uses structured splitting of special fields to improve the success rate and accuracy of matching classification features to the target classification, and further uses the field logical inclusion relationship (or field semantic inclusion relationship) to resolve the matching of irregular or incomplete fields, raising that success rate and accuracy further.
In addition, the present invention also provides an automatic data matching method.
FIG. 3 is a schematic flowchart of an embodiment of the automatic data matching method of the present invention. In this embodiment, the order of the steps in the flowchart of FIG. 3 may be changed, and some steps may be omitted, according to different requirements.
Step S31: acquire the classification features obtained by a feature extraction operation. The feature extraction operation is a preprocessing step for various data-mining prediction models. Preferably, in this embodiment, the classification features include, but are not limited to, text data such as drug names, diagnostic information, medical orders, medical devices, surgery types, and family history.
Step S32: normalize the classification features obtained by the feature extraction operation according to a preset dynamic list to obtain normalized classification features.
Preferably, in this embodiment, the preset dynamic lists include one dynamic list per type of data source: for example, a dynamic list for a first type of data source (such as an MS SQL Server data source; hereinafter the "first-type dynamic list"), a dynamic list for a second type of data source (such as an Oracle data source; hereinafter the "second-type dynamic list"), and a dynamic list for a third type of data source (such as a MySQL data source; hereinafter the "third-type dynamic list"). Those skilled in the art will understand that in other embodiments the number of dynamic lists may increase or decrease with the number of data-source types.
Preferably, in this embodiment, the dynamic lists for different types of data source store different special characters and are used to normalize the classification features coming from the corresponding type of data source. For example, the first-type dynamic list stores first-type special characters used to normalize classification features from first-type data sources; the second-type dynamic list stores second-type special characters used for second-type data sources; and the third-type dynamic list stores third-type special characters used for third-type data sources.
Preferably, in this embodiment, each preset dynamic list is adjusted dynamically as the data in its data source changes, for example by adding new special characters. That is, the first-, second-, and third-type dynamic lists are adjusted dynamically according to changes in the data of the first-, second-, and third-type data sources, respectively.
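One way to hold one special-character list per data-source type and adjust it as data changes is sketched below. The class and method names, the source-type keys (such as "mssql"), and the rule used to propose new characters (any Unicode punctuation or symbol character not yet listed) are all hypothetical; the source only says that lists are adjusted, for example by adding new special characters.

```python
import unicodedata
from typing import Dict, Iterable, Set


class DynamicListRegistry:
    """Hypothetical store of one special-character list per data-source type."""

    def __init__(self) -> None:
        self._lists: Dict[str, Set[str]] = {}

    def get(self, source_type: str) -> Set[str]:
        # Unknown source types start with an empty list.
        return self._lists.setdefault(source_type, set())

    def add(self, source_type: str, *chars: str) -> None:
        self.get(source_type).update(chars)

    def remove(self, source_type: str, *chars: str) -> None:
        self.get(source_type).difference_update(chars)

    def propose_new_chars(
        self, source_type: str, samples: Iterable[str], keep: Iterable[str] = ()
    ) -> Set[str]:
        """Suggest punctuation/symbol characters seen in new data but not yet listed.

        `keep` holds characters that must survive normalization, for example
        splittable characters needed by the later splitting step.
        """
        known = self.get(source_type) | set(keep)
        seen = {
            ch
            for text in samples
            for ch in text
            if unicodedata.category(ch)[0] in ("P", "S")  # punctuation or symbol
        }
        return seen - known
```

Proposals are returned rather than applied automatically, so a reviewer can decide whether a newly seen character should really be deleted or replaced during normalization.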
Preferably, in this embodiment, normalizing the classification features obtained by feature extraction according to the preset dynamic list comprises: extracting the special characters stored in the preset dynamic list, and normalizing the classification features obtained by the feature extraction operation by deleting or replacing those special characters.
Specifically, if the preset dynamic list is the first-type dynamic list, the first-type special characters stored in it (such as "/" and "\") are extracted and deleted or replaced in the classification features obtained by the feature extraction operation, yielding normalized first-type classification features.
If the preset dynamic list is the second-type dynamic list, the second-type special characters stored in it are extracted and deleted or replaced in the classification features obtained by the feature extraction operation, yielding normalized second-type classification features.
If the preset dynamic list is the third-type dynamic list, the third-type special characters stored in it are extracted and deleted or replaced in the classification features obtained by the feature extraction operation, yielding normalized third-type classification features.
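The delete-or-replace normalization of step S32 can be sketched as follows. Only the first-type characters "/" and "\" come from the text; the other list contents, the `replacements` mapping that decides when a character is replaced rather than deleted, and the whitespace collapsing are assumptions for the sketch.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical dynamic lists keyed by data-source type. Only the first-type
# entries ("/" and "\") come from the text; the others are placeholders.
DYNAMIC_LISTS: Dict[str, Set[str]] = {
    "first_type": {"/", "\\"},
    "second_type": set(),
    "third_type": set(),
}


def normalize_features(
    features: Iterable[str],
    source_type: str,
    replacements: Optional[Dict[str, str]] = None,
    lists: Dict[str, Set[str]] = DYNAMIC_LISTS,
) -> List[str]:
    """Delete (default) or replace each listed special character in every feature."""
    special_chars = lists[source_type]
    replacements = replacements or {}
    normalized: List[str] = []
    for feature in features:
        # Longer entries first, in case a list holds multi-character strings.
        for ch in sorted(special_chars, key=len, reverse=True):
            feature = feature.replace(ch, replacements.get(ch, ""))
        feature = " ".join(feature.split())  # collapse leftover whitespace
        if feature:  # drop features that consisted only of special characters
            normalized.append(feature)
    return normalized
```

For instance, `normalize_features(["aspirin/100mg"], "first_type")` deletes the slash, while passing `replacements={"/": " "}` replaces it with a space. If "/" is deleted at this stage, a field such as "a//b" would lose its split point before step S33; the sketch does not settle how the two lists should interact.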
Step S33: extract, from the normalized classification features, special fields that contain a splittable character; split each special field into several field fragments according to the position of the splittable character within it; and match the resulting field fragments against the target classification. The target classification may be preset rule data on an internal data platform (for example, a Hadoop data platform).
Preferably, in this embodiment, splitting a special field into several field fragments comprises: recording the position of the splittable character in the special field as a split point; and extracting the field fragment before the split point and the field fragment after the split point.
For example, if the normalized classification features include the special field "a+b" or "a//b", where "+" and "//" are splittable characters, the special field is split into the field fragments "a" and "b", and each of "a" and "b" is then matched against the target classification.
Matching a special field such as "a+b" or "a//b" directly against the target classification is likely to fail. If the field is first split into the fragments "a" and "b" and each fragment is matched separately, the match success rate improves considerably. The structured splitting of special fields in step S33 therefore improves both the success rate and the accuracy of matching classification features to the target classification.
Preferably, in this embodiment, if the preset dynamic list is the first-type dynamic list, first-type special fields containing a splittable character are extracted from the normalized first-type classification features; each first-type special field is split into several field fragments according to the position of the splittable character within it; and the resulting fragments are matched against the target classification.
If the preset dynamic list is the second-type dynamic list, second-type special fields containing a splittable character are extracted from the normalized second-type classification features; each second-type special field is split into several field fragments according to the position of the splittable character within it; and the resulting fragments are matched against the target classification.
If the preset dynamic list is the third-type dynamic list, third-type special fields containing a splittable character are extracted from the normalized third-type classification features; each third-type special field is split into several field fragments according to the position of the splittable character within it; and the resulting fragments are matched against the target classification.
Step S34: match the fields that failed to match against the target classification using a preset field logical inclusion relationship (or field semantic inclusion relationship).
Preferably, in this embodiment, the matching in step S33 may be called the first-pass match. It covers both special fields (each special field is split into field fragments that are matched against the target classification) and non-special fields (the non-special fields of the normalized classification features are matched directly against the target classification). The matching in step S34 may be called the second-pass match: it matches the fields that failed in the first pass against the target classification.
Preferably, in this embodiment, the step of matching the unmatched fields against the target classification using the preset field logical inclusion relationship comprises:
calculating, with a semantic-logic similarity algorithm (for example, one that computes semantic similarity over a tree hierarchy), a semantic similarity value between each unmatched field (that is, each field that failed the first-pass match) and the target classification; and
if the semantic similarity value exceeds a preset threshold (for example, 80%), determining that the unmatched field and the target classification stand in a logical inclusion relationship, and marking the unmatched field as matching that target classification, that is, changing it from an unmatched field to a matched field.
For example, if a field that failed the first-pass match contains "阿司匹林片" (aspirin tablets) and the target classification contains the field "阿司匹林" (aspirin), the two stand in a semantic-logical inclusion relationship, so the unmatched field "aspirin tablets" is changed to a matched field.
Step S34 thus re-examines the fields that failed to match in step S33; whenever such a field is found to have a logical (or semantic) inclusion relationship with the target classification, it is changed to a matched field. By using the field logical inclusion relationship (or field semantic inclusion relationship), step S34 resolves the matching of irregular or incomplete fields, which further raises the success rate and accuracy of matching classification features to the target classification. Compared with manual matching, this is markedly more efficient and substantially reduces the manual matching workload.
It should be noted that in other embodiments step S34 may be omitted in some cases, for example when the first-pass match success rate is already high (such as above 90%).
Through steps S31-S34, the automatic data matching method proposed by the present invention uses structured splitting of special fields to improve the success rate and accuracy of matching classification features to the target classification, and further uses the field logical inclusion relationship (or field semantic inclusion relationship) to resolve the matching of irregular or incomplete fields, raising that success rate and accuracy further.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) storing the automatic data matching program 20. The automatic data matching program 20 can be executed by at least one processor 22 so that the at least one processor 22 performs the steps of the automatic data matching method described above.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments may be implemented by software together with a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product. This computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to perform the methods described in the embodiments of the present invention.
Preferred embodiments of the present invention have been described above with reference to the drawings; they do not thereby limit the scope of the rights of the invention. The serial numbers of the embodiments are for description only and do not indicate that one embodiment is better than another. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
Those skilled in the art may implement the present invention in various variants without departing from its scope and spirit; for example, features of one embodiment may be used in another embodiment to obtain a further embodiment. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, and any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (20)

  1. An electronic device, characterized in that the electronic device comprises a memory and a processor, the memory storing an automatic data matching program which, when executed by the processor, implements the following steps:
    acquiring classification features obtained by a feature extraction operation;
    normalizing the classification features obtained by the feature extraction operation according to a preset dynamic list to obtain normalized classification features;
    extracting, from the normalized classification features, a special field containing a splittable character, splitting the special field into several field fragments according to the position of the splittable character in the special field, and matching the resulting field fragments against a target classification; and
    matching fields that failed to match against the target classification using a preset field logical inclusion relationship.
  2. The electronic device according to claim 1, wherein normalizing the classification features obtained by the feature extraction operation according to the preset dynamic list comprises:
    if the preset dynamic list is a first-type dynamic list, extracting first-type special characters stored in the first-type dynamic list, and deleting or replacing the extracted first-type special characters in the classification features obtained by the feature extraction operation to obtain normalized first-type classification features;
    if the preset dynamic list is a second-type dynamic list, extracting second-type special characters stored in the second-type dynamic list, and deleting or replacing the extracted second-type special characters in the classification features obtained by the feature extraction operation to obtain normalized second-type classification features; and
    if the preset dynamic list is a third-type dynamic list, extracting third-type special characters stored in the third-type dynamic list, and deleting or replacing the extracted third-type special characters in the classification features obtained by the feature extraction operation to obtain normalized third-type classification features.
  3. The electronic device according to claim 2, wherein splitting the special field into several field fragments comprises:
    recording the position of the splittable character in the special field as a split point; and
    extracting the field fragment before the split point and the field fragment after the split point, respectively.
  4. The electronic device according to claim 3, wherein splitting the special field into several field fragments comprises:
    if the preset dynamic list is the first-type dynamic list, extracting, from the normalized first-type classification features, a first-type special field containing a splittable character, and splitting the first-type special field into several field fragments according to the position of the splittable character in the first-type special field;
    if the preset dynamic list is the second-type dynamic list, extracting, from the normalized second-type classification features, a second-type special field containing a splittable character, and splitting the second-type special field into several field fragments according to the position of the splittable character in the second-type special field; and
    if the preset dynamic list is the third-type dynamic list, extracting, from the normalized third-type classification features, a third-type special field containing a splittable character, and splitting the third-type special field into several field fragments according to the position of the splittable character in the third-type special field.
  5. The electronic device according to claim 1, wherein matching the fields that failed to match against the target classification using the preset field logical inclusion relationship comprises:
    calculating, according to a semantic-logic similarity algorithm, a semantic similarity value between a field that failed to match and the target classification; and
    if the semantic similarity value is greater than a preset threshold, determining that the field that failed to match has a logical inclusion relationship with the target classification, and marking that field as matching the target classification.
  6. The electronic device according to claim 1, wherein the preset dynamic list is adjusted dynamically according to changes in the data of a data source.
  7. The electronic device according to claim 1, wherein the target classification is rule data preset in an internal data platform.
  8. An automatic data matching method applied to an electronic device, characterized in that the method comprises:
    acquiring classification features obtained by a feature extraction operation;
    normalizing the classification features obtained by the feature extraction operation according to a preset dynamic list to obtain normalized classification features;
    extracting, from the normalized classification features, a special field containing a splittable character, splitting the special field into several field fragments according to the position of the splittable character in the special field, and matching the resulting field fragments against a target classification; and
    matching fields that failed to match against the target classification using a preset field logical inclusion relationship.
  9. The automatic data matching method according to claim 8, wherein normalizing the classification features obtained by the feature extraction operation according to the preset dynamic list comprises:
    if the preset dynamic list is a first-type dynamic list, extracting first-type special characters stored in the first-type dynamic list, and deleting or replacing the extracted first-type special characters in the classification features obtained by the feature extraction operation to obtain normalized first-type classification features;
    if the preset dynamic list is a second-type dynamic list, extracting second-type special characters stored in the second-type dynamic list, and deleting or replacing the extracted second-type special characters in the classification features obtained by the feature extraction operation to obtain normalized second-type classification features; and
    if the preset dynamic list is a third-type dynamic list, extracting third-type special characters stored in the third-type dynamic list, and deleting or replacing the extracted third-type special characters in the classification features obtained by the feature extraction operation to obtain normalized third-type classification features.
  10. The automatic data matching method according to claim 8, wherein splitting the special field into several field segments comprises:
    recording the position of the splittable character in the special field as a split point; and
    separately extracting the field segment before the split point and the field segment after the split point.
  11. The automatic data matching method according to claim 10, wherein splitting the special field into several field segments comprises:
    if the preset dynamic list is a first-type dynamic list, extracting, from the normalized first-type classification features, first-type special fields that contain a splittable character, and splitting each first-type special field into several field segments according to the position of the splittable character in that field;
    if the preset dynamic list is a second-type dynamic list, extracting, from the normalized second-type classification features, second-type special fields that contain a splittable character, and splitting each second-type special field into several field segments according to the position of the splittable character in that field; and
    if the preset dynamic list is a third-type dynamic list, extracting, from the normalized third-type classification features, third-type special fields that contain a splittable character, and splitting each third-type special field into several field segments according to the position of the splittable character in that field.
  12. The automatic data matching method according to claim 8, wherein matching the fields that failed to match against the target classifications by means of the preset field logical inclusion relationship comprises:
    calculating, according to a semantic logic similarity calculation algorithm, a semantic similarity value between a field that failed to match and a target classification; and
    if the semantic similarity value is greater than a preset threshold, determining that a logical inclusion relationship exists between the field that failed to match and the target classification, and marking that field as matching the target classification.
  13. The automatic data matching method according to claim 8, wherein the preset dynamic list is dynamically adjusted according to changes in the data of the data source.
  14. The automatic data matching method according to claim 8, wherein the target classifications are preset rule data in an internal data platform.
  15. A computer-readable storage medium storing an automatic data matching program, the automatic data matching program being executable by at least one processor to cause the at least one processor to perform the following steps:
    obtaining the classification features produced by a feature extraction operation;
    normalizing, according to a preset dynamic list, the classification features obtained by the feature extraction operation, to obtain normalized classification features;
    extracting, from the normalized classification features, special fields that contain a splittable character, splitting each special field into several field segments according to the position of the splittable character in the special field, and matching the resulting field segments against target classifications; and
    matching the fields that failed to match against the target classifications by means of a preset field logical inclusion relationship.
  16. The computer-readable storage medium according to claim 15, wherein normalizing, according to the preset dynamic list, the classification features obtained by the feature extraction operation comprises:
    if the preset dynamic list is a first-type dynamic list, extracting the first-type special characters stored in the first-type dynamic list, and performing deletion or replacement on the classification features obtained by the feature extraction operation according to the extracted first-type special characters, to obtain normalized first-type classification features;
    if the preset dynamic list is a second-type dynamic list, extracting the second-type special characters stored in the second-type dynamic list, and performing deletion or replacement on the classification features obtained by the feature extraction operation according to the extracted second-type special characters, to obtain normalized second-type classification features; and
    if the preset dynamic list is a third-type dynamic list, extracting the third-type special characters stored in the third-type dynamic list, and performing deletion or replacement on the classification features obtained by the feature extraction operation according to the extracted third-type special characters, to obtain normalized third-type classification features.
  17. The computer-readable storage medium according to claim 15, wherein splitting the special field into several field segments comprises:
    recording the position of the splittable character in the special field as a split point; and
    separately extracting the field segment before the split point and the field segment after the split point.
  18. The computer-readable storage medium according to claim 17, wherein splitting the special field into several field segments comprises:
    if the preset dynamic list is a first-type dynamic list, extracting, from the normalized first-type classification features, first-type special fields that contain a splittable character, and splitting each first-type special field into several field segments according to the position of the splittable character in that field;
    if the preset dynamic list is a second-type dynamic list, extracting, from the normalized second-type classification features, second-type special fields that contain a splittable character, and splitting each second-type special field into several field segments according to the position of the splittable character in that field; and
    if the preset dynamic list is a third-type dynamic list, extracting, from the normalized third-type classification features, third-type special fields that contain a splittable character, and splitting each third-type special field into several field segments according to the position of the splittable character in that field.
  19. The computer-readable storage medium according to claim 15, wherein matching the fields that failed to match against the target classifications by means of the preset field logical inclusion relationship comprises:
    calculating, according to a semantic logic similarity calculation algorithm, a semantic similarity value between a field that failed to match and a target classification; and
    if the semantic similarity value is greater than a preset threshold, determining that a logical inclusion relationship exists between the field that failed to match and the target classification, and marking that field as matching the target classification.
  20. The computer-readable storage medium according to claim 15, wherein the preset dynamic list is dynamically adjusted according to changes in the data of the data source.
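The Python sketch below illustrates the normalization, splitting, and exact-matching steps of claims 8 to 11 (mirrored in claims 15 to 18). It is a minimal illustration, not the applicant's implementation, and it rests on several assumptions. The claims do not specify what each dynamic list contains or which characters count as splittable. As a result, the `DynamicList` class, its separate delete and replace sets, the `SPLITTABLE` character set, and the helper names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicList:
    """Hypothetical model of one dynamic list (first, second or third type).

    The claims only say each list stores special characters used to delete or
    replace text in the classification features; the split into a delete set
    and a replace map, and their contents, are assumptions.
    """
    list_type: int                                   # 1, 2 or 3
    delete_chars: set[str] = field(default_factory=set)
    replace_map: dict[str, str] = field(default_factory=dict)

    def normalize(self, feature: str) -> str:
        # Apply replacements first, then drop deletable characters, then trim.
        for old, new in self.replace_map.items():
            feature = feature.replace(old, new)
        return "".join(ch for ch in feature if ch not in self.delete_chars).strip()

    def adjust(self, extra_delete=(), extra_replace=None) -> None:
        # Claim 13 lets the list change with the data source; this update policy is assumed.
        self.delete_chars.update(extra_delete)
        self.replace_map.update(extra_replace or {})


# Assumed set of splittable characters; the claims do not enumerate them.
SPLITTABLE = "/、,;"


def split_special_field(text: str, splittable: str = SPLITTABLE) -> list[str]:
    """Record each splittable character's position as a split point and keep
    the segments before and after each point (claim 10)."""
    points = [i for i, ch in enumerate(text) if ch in splittable]
    if not points:
        return [text]                      # not a special field: match it whole
    segments, start = [], 0
    for p in points:
        segments.append(text[start:p])
        start = p + 1
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]


def match_features(features, dyn_list, targets):
    """Normalize, split, and exact-match against the target classifications.
    Returns (matched, unmatched); unmatched fields go to the fallback step."""
    target_set = set(targets)
    matched, unmatched = {}, []
    for raw in features:
        segments = split_special_field(dyn_list.normalize(raw))
        hits = [s for s in segments if s in target_set]
        if hits:
            matched[raw] = hits
        else:
            unmatched.append(raw)
    return matched, unmatched


# Usage with made-up data: full-width parentheses are replaced, '*' and '#' deleted.
first_type = DynamicList(1, delete_chars={"*", "#"}, replace_map={"（": "(", "）": ")"})
matched, unmatched = match_features(["laptop/tablet*", "#phone"], first_type, ["laptop", "tablet"])
# matched == {"laptop/tablet*": ["laptop", "tablet"]}; unmatched == ["#phone"]
```

Any fields returned as unmatched would then go to the logical-inclusion fallback of claim 12. The sketch treats a field with no splittable character as a single segment, so ordinary fields are still matched exactly. This is a design choice made here; the claims do not state it.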
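Claims 12 and 19 handle fields that failed to match. Each such field is scored against each target classification, and a similarity value greater than a preset threshold is treated as a logical inclusion relationship. The claims do not name the "semantic logic similarity calculation algorithm", so the sketch below substitutes Python's `difflib.SequenceMatcher.ratio()`. That function is a character-level measure, not a semantic one. The 0.6 threshold and the choice to keep only the best-scoring target are also assumptions, and `similarity` and `match_by_inclusion` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in for the unspecified semantic similarity algorithm:
    # a character-level ratio in [0, 1] from the standard library.
    return SequenceMatcher(None, a, b).ratio()


def match_by_inclusion(unmatched, targets, threshold=0.6):
    """Mark each previously unmatched field as matching the best-scoring target
    whose similarity is strictly greater than the (assumed) threshold."""
    result = {}
    for field_text in unmatched:
        best_target, best_score = None, threshold
        for target in targets:
            score = similarity(field_text, target)
            if score > best_score:         # strict '>' per the claim wording
                best_target, best_score = target, score
        if best_target is not None:
            result[field_text] = best_target
    return result


# Usage with made-up data: "laptops" is close to "laptop"; "xyz" matches nothing.
links = match_by_inclusion(["laptops", "xyz"], ["laptop", "tablet"])
```

A production version would likely replace `similarity` with a genuinely semantic measure, such as cosine similarity over word embeddings. The threshold and selection logic would stay the same.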
PCT/CN2017/104820 2017-08-04 2017-09-30 Automatic data matching method, electronic device and computer-readable storage medium WO2019024231A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710660957.2 2017-08-04
CN201710660957.2A CN107679544A (en) 2017-08-04 2017-08-04 Automatic data matching method, electronic equipment and computer-readable recording medium

Publications (1)

Publication Number Publication Date
WO2019024231A1 true WO2019024231A1 (en) 2019-02-07

Family

ID=61135325

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/104820 WO2019024231A1 (en) 2017-08-04 2017-09-30 Automatic data matching method, electronic device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN107679544A (en)
WO (1) WO2019024231A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111090641A (en) * 2019-11-25 2020-05-01 南京医渡云医学技术有限公司 Data processing method and device, electronic equipment and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679544A (en) * 2017-08-04 2018-02-09 平安科技(深圳)有限公司 Automatic data matching method, electronic equipment and computer-readable recording medium
CN111209924B (en) * 2018-11-19 2023-04-18 零氪科技(北京)有限公司 System for automatically extracting medical advice and application
CN110222103A (en) * 2019-04-19 2019-09-10 平安科技(深圳)有限公司 Extract method and device, the computer equipment, storage medium of excel data
CN111950974B (en) * 2020-07-02 2024-05-14 广州仓实信息科技有限公司 Progress information processing method, device, computer equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102436475A (en) * 2011-09-29 2012-05-02 用友软件股份有限公司 Data table summarizing device and data table summarizing method
CN104731976A (en) * 2015-04-14 2015-06-24 海量云图(北京)数据技术有限公司 Method for finding and sorting private data in data table
CN106649890A (en) * 2017-02-07 2017-05-10 税云网络科技服务有限公司 Data storage method and device
CN107679544A (en) * 2017-08-04 2018-02-09 平安科技(深圳)有限公司 Automatic data matching method, electronic equipment and computer-readable recording medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103731298A (en) * 2013-11-15 2014-04-16 中国航天科工集团第二研究院七〇六所 Large-scale distributed network safety data acquisition method and system
CN103761341B (en) * 2014-02-21 2017-02-22 北京嘉和美康信息技术有限公司 Information matching method and device
CN103914570A (en) * 2014-04-25 2014-07-09 北京中讯爱乐科技有限公司 Intelligent customer service searching method and system based on character string similarity algorithm
CN105138829B (en) * 2015-08-13 2018-01-12 易保互联医疗信息科技(北京)有限公司 A kind of natural language processing method and system of Chinese medical information
CN106934409B (en) * 2015-12-29 2021-04-20 优信拍(北京)信息科技有限公司 Data matching method and device
CN106326422B (en) * 2016-08-24 2019-09-17 北京大学 A kind of method and system of the food safety data information retrieval of knowledge based ontology
CN106934220B (en) * 2017-02-24 2019-07-19 黑龙江特士信息技术有限公司 Disease class entity recognition method and device towards multi-data source

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111090641A (en) * 2019-11-25 2020-05-01 南京医渡云医学技术有限公司 Data processing method and device, electronic equipment and storage medium
CN111090641B (en) * 2019-11-25 2024-04-02 医渡云(北京)技术有限公司 Data processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN107679544A (en) 2018-02-09

Similar Documents

Publication Publication Date Title
WO2019024231A1 (en) Automatic data matching method, electronic device and computer-readable storage medium
US11727305B2 (en) System and method for detecting anomalies in prediction generation systems
US10650192B2 (en) Method and device for recognizing domain named entity
WO2021003819A1 (en) Man-machine dialog method and man-machine dialog apparatus based on knowledge graph
US10650274B2 (en) Image clustering method, image clustering system, and image clustering server
WO2019085471A1 (en) Database synchronization method, application server, and computer readable storage medium
WO2019061991A1 (en) Multi-element universal model platform modeling method, electronic device, and computer readable storage medium
WO2019024162A1 (en) Intention obtaining method, electronic device, and computer-readable storage medium
EP3343411A1 (en) Sql auditing method and apparatus, server and storage device
US9116879B2 (en) Dynamic rule reordering for message classification
WO2022156066A1 (en) Character recognition method and apparatus, electronic device and storage medium
WO2020177384A1 (en) Method and apparatus for reporting and processing user message status of message pushing, and storage medium
CN109189888B (en) Electronic device, infringement analysis method, and storage medium
WO2016015621A1 (en) Human face picture name recognition method and system
CN107679084B (en) Clustering label generation method, electronic device and computer readable storage medium
WO2019085120A1 (en) Collaborative filtering recommendation method, electronic device, and computer readable storage medium
WO2019075967A1 (en) Enterprise name recognition method, electronic device, and computer-readable storage medium
WO2019075970A1 (en) Line wrap recognition method for table information, electronic device, and computer-readable storage medium
WO2021051869A1 (en) Text data layout arrangement method, device, computer apparatus, and storage medium
WO2021017290A1 (en) Knowledge graph-based entity identification data enhancement method and system
CN111859093A (en) Sensitive word processing method and device and readable storage medium
US20140052734A1 (en) Computing device and method for creating data indexes for big data
CN112416972A (en) Real-time data stream processing method, device, equipment and readable storage medium
WO2019041525A1 (en) Method, electronic apparatus, and computer readable storage medium for identifying entities having identical name
WO2019041528A1 (en) Method, electronic apparatus, and computer readable storage medium for determining polarity of news sentiment

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06/08/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17920545

Country of ref document: EP

Kind code of ref document: A1