CN114238572A

CN114238572A - Artificial intelligence-based multi-database data extraction method and device and electronic equipment

Info

Publication number: CN114238572A
Application number: CN202111536087.0A
Authority: CN
Inventors: 陈蔚然
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-12-15
Filing date: 2021-12-15
Publication date: 2022-03-25
Anticipated expiration: 2041-12-15
Also published as: CN114238572B

Abstract

The application discloses a method and a device for extracting data of multiple databases based on artificial intelligence and electronic equipment, wherein the method comprises the following steps: determining a characteristic field according to the extraction task, wherein the characteristic field is a general field of data required to be extracted by the extraction task; carrying out field expansion on the characteristic fields to obtain x expansion fields, and combining the characteristic fields and the x expansion fields to obtain a characteristic field group; determining a plurality of databases according to the extraction task, and clustering the plurality of databases to obtain z database groups; performing data extraction in each database group in the z database groups according to the characteristic field group to obtain z extraction results, wherein the z extraction results correspond to the z database groups one by one; generating an extraction result heteromorphic graph according to the z extraction results, and adjusting the extraction result heteromorphic graph to obtain an extraction result isomorphic graph; and sending the z extraction results and the extraction result isomorphic graph to display equipment for displaying.

Description

Artificial intelligence-based multi-database data extraction method and device and electronic equipment

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a method and a device for extracting multi-database data based on artificial intelligence and electronic equipment.

Background

At present, an indispensable step in data governance work is to sort out and consolidate the indicators of multiple departments, which requires gathering, one by one, the data whose indicators or fields contain the target indicator or field, or a synonym of it. However, because there are many departments and each department uses different systems and databases, conventional extraction methods cannot extract data from several different systems; that is, a data extraction method cannot be used universally across different databases, which results in low extraction efficiency.

Disclosure of Invention

In order to solve the problems in the prior art, the embodiment of the application provides a method and a device for extracting data of multiple databases based on artificial intelligence and an electronic device, so that the problem that the data extraction mode cannot be universal among different databases is solved, and the data extraction efficiency is improved.

In a first aspect, an embodiment of the present application provides an artificial intelligence-based multi-database data extraction method, including:

determining a characteristic field according to the extraction task, wherein the characteristic field is a general field of data required to be extracted by the extraction task;

carrying out field extension on the characteristic field to obtain x extension fields, and combining the characteristic field and the x extension fields to obtain a characteristic field group, wherein x is an integer greater than or equal to 1;

determining a plurality of databases according to the extraction task, and clustering the databases to obtain z database groups, wherein z is an integer greater than or equal to 1;

performing data extraction in each database group in the z database groups according to the characteristic field group to obtain z extraction results, wherein the z extraction results correspond to the z database groups one by one;

generating an extraction result heteromorphic graph according to the z extraction results, and adjusting the extraction result heteromorphic graph to obtain an extraction result isomorphic graph;

and sending the z extraction results and the extraction result isomorphic graph to display equipment for displaying.

In a second aspect, an embodiment of the present application provides an artificial intelligence-based multi-database data extraction apparatus, including:

the expansion module is used for determining a characteristic field according to the extraction task, wherein the characteristic field is a general field of data required to be extracted by the extraction task, field expansion is carried out on the characteristic field to obtain x expansion fields, and the characteristic field and the x expansion fields are combined to obtain a characteristic field group, wherein x is an integer greater than or equal to 1;

the clustering module is used for determining a plurality of databases according to the extraction task and clustering the databases to obtain z database groups, wherein z is an integer greater than or equal to 1;

the extraction module is used for extracting data in each database group in the z database groups according to the characteristic field group to obtain z extraction results, wherein the z extraction results correspond to the z database groups one by one;

the processing module is used for generating an extraction result heteromorphic graph according to the z extraction results and adjusting the extraction result heteromorphic graph to obtain an extraction result isomorphic graph;

and the display module is used for sending the z extraction results and the extraction result isomorphic graph to display equipment for displaying.

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor coupled to the memory, the memory for storing a computer program, the processor for executing the computer program stored in the memory to cause the electronic device to perform the method of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, the computer program causing a computer to perform the method according to the first aspect.

In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first aspect.

The implementation of the embodiment of the application has the following beneficial effects:

In the embodiments of the application, the general indicators or fields of the data to be extracted are determined from the extraction task and then expanded to find their synonyms, near-synonyms and the like, so that the data to be extracted is described more comprehensively. Next, the databases in which the data to be extracted may reside are determined from the extraction task and clustered, so that databases whose services are associated are grouped together; the stored data are thereby associated through the association among the services, which improves the efficiency of subsequent data extraction. After all the data meeting the requirements have been extracted, a heteromorphic graph and an isomorphic graph can be constructed from the extraction results to display the relevance between the extracted data visually. The isomorphic graph also makes the relationship center of the extracted data evident, which simplifies subsequent analysis and processing.

Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic hardware structure diagram of an artificial intelligence-based multi-database data extraction apparatus according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of an artificial intelligence-based multi-database data extraction method according to an embodiment of the present application;

Fig. 3 is a schematic flowchart of a method for performing field extension on a feature field to obtain x extension fields according to an embodiment of the present application;

Fig. 4 is a schematic flowchart of a method, according to an embodiment of the present application, for performing semantic extraction on each of the h first words according to its adjacent words in the feature field, to obtain h semantic vectors in one-to-one correspondence with the h first words;

Fig. 5 is a schematic flowchart of a method for determining a plurality of databases according to an extraction task and clustering the plurality of databases to obtain z database groups according to an embodiment of the present application;

Fig. 6 is a schematic diagram of a result heteromorphic graph according to an embodiment of the present application;

Fig. 7 is a schematic diagram of a result isomorphic graph according to an embodiment of the present application;

Fig. 8 is a block diagram of the functional modules of an artificial intelligence-based multi-database data extraction apparatus according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art without any inventive work based on the embodiments in the present application are within the scope of protection of the present application.

The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

First, referring to fig. 1, fig. 1 is a schematic diagram of a hardware structure of an artificial intelligence-based multi-database data extraction apparatus according to an embodiment of the present disclosure. The artificial intelligence based multi-database data extraction apparatus 100 includes at least one processor 101, a communication line 102, a memory 103, and at least one communication interface 104.

In this embodiment, the processor 101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the present disclosure.

The communication line 102 may include a path that carries information between the aforementioned components.

The communication interface 104 may be any transceiver or similar device (e.g., an antenna) for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).

The memory 103 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disc, laser disc, optical disc, digital versatile disc, blu-ray disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

In this embodiment, the memory 103 may be independent and connected to the processor 101 through the communication line 102. The memory 103 may also be integrated with the processor 101. The memory 103 provided in the embodiments of the present application may generally have a nonvolatile property. The memory 103 is used for storing computer-executable instructions for executing the scheme of the application, and is controlled by the processor 101 to execute. The processor 101 is configured to execute computer-executable instructions stored in the memory 103, thereby implementing the methods provided in the embodiments of the present application described below.

In alternative embodiments, computer-executable instructions may also be referred to as application code, which is not specifically limited in this application.

In alternative embodiments, processor 101 may include one or more CPUs, such as CPU0 and CPU1 of FIG. 1.

In an alternative embodiment, the artificial intelligence based multiple database data extraction apparatus 100 may include multiple processors, such as processor 101 and processor 107 in fig. 1. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).

In an alternative embodiment, if the artificial intelligence based multi-database data extraction apparatus 100 is a server, it may be, for example, an independent server, or a cloud server that provides basic cloud computing services such as cloud service, cloud database, cloud computing, cloud function, cloud storage, web service, cloud communication, middleware service, domain name service, security service, Content Delivery Network (CDN), and big data and artificial intelligence platforms. The artificial intelligence based multi-database data extraction apparatus 100 may further comprise an output device 105 and an input device 106. The output device 105 is in communication with the processor 101 and may display information in a variety of ways. For example, the output device 105 may be a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display device, a Cathode Ray Tube (CRT) display device, a projector, or the like. The input device 106 is in communication with the processor 101 and may receive user input in a variety of ways. For example, the input device 106 may be a mouse, a keyboard, a touch screen device, or a sensing device, among others.

The artificial intelligence based multi-database data extraction apparatus 100 may be a general purpose device or a special purpose device. The present embodiment does not limit the type of the artificial intelligence based multi-database data extraction apparatus 100.

Next, it should be noted that the embodiments disclosed in the present application may acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge, and use that knowledge to obtain the best result.

The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.

The artificial intelligence-based multi-database data extraction method disclosed in the present application will be explained below:

Referring to Fig. 2, Fig. 2 is a schematic flowchart of an artificial intelligence-based multi-database data extraction method according to an embodiment of the present disclosure. The artificial intelligence-based multi-database data extraction method comprises the following steps:

201: Determining the characteristic field according to the extraction task.

In this embodiment, the feature field may be a general field of the data that the extraction task requires to be extracted. Illustratively, the feature field may be the format of a data file; for a picture file, for example, the feature field may be "jpg", "png", "tif", etc. The feature field may also be a specific value of a feature; for example, if the extraction task is to extract the loan data of all male customers, the feature field may be "gender: male". In addition, the feature field may be a promotional phrase commonly used for a certain campaign; for example, if the current extraction task is to extract the campaign data of all Double Eleven discount campaigns, the feature field may be a phrase commonly used in those campaigns, such as "Double Eleven discount activities are in progress".

202: Carrying out field extension on the characteristic field to obtain x extension fields, and combining the characteristic field and the x extension fields to obtain a characteristic field group.

In this embodiment, x is an integer greater than or equal to 1, and each of the x extension fields is a synonymous or near-synonymous field of the feature field. In short, the feature field is expanded with its synonyms and near-synonyms so that the data to be extracted is described more comprehensively and the finally extracted data is more complete.

Specifically, the present embodiment provides a method for performing field extension on a feature field to obtain x extension fields, and as shown in fig. 3, the method includes:

301: Performing word segmentation processing on the characteristic field to obtain h first words.

In the present embodiment, h is an integer greater than or equal to 1. For example, the feature field may be segmented by N-gram segmentation with N set to 2, 3, and 4, respectively. Specifically, N-gram segmentation divides a sentence into a sequence of segments each composed of N characters, and each segment is called an N-gram. The segmentation is called uni-gram when N is 1, bi-gram when N is 2, and tri-gram when N is 3.

Specifically, continuing the above example in which the feature field is "Double Eleven discount activities are in progress", segmenting the feature field with bi-grams yields the following first words, each formed from two adjacent characters of the phrase: "double ten", "eleven", "dozen", "discount", "discount activity", "active", "moving positive", "on", "in", "going on" and "in line".

In addition, in an alternative embodiment, after the segmentation result is obtained, it may be filtered and cleaned to remove segments that carry no meaning. Continuing the above example of "Double Eleven discount activities are in progress", the segments "discount activity", "moving positive", "in" and "in line" have no specific meaning and can be filtered out, while the segments that carry some meaning are retained, for example "double ten", "eleven", "dozen", "discount", "active", "on", and "going on". These retained segments are taken as the final first words.

Furthermore, in an alternative embodiment, words may also be associated through the context of a segmented first word to obtain a word with a new meaning as a supplementary first word. For example, the first word "double ten" may be combined with the first word "eleven" that follows it to form the word "Double Eleven", which carries a new meaning and serves as a supplementary first word.
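The segmentation and filtering of step 301 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function names `char_ngrams` and `segment_feature_field` and the `stopgrams` set of meaningless fragments are hypothetical, and a short Latin-alphabet string stands in for the feature field.

```python
from typing import AbstractSet, Iterable, List


def char_ngrams(text: str, n: int) -> List[str]:
    """Split text into overlapping segments of n consecutive characters."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def segment_feature_field(text: str,
                          sizes: Iterable[int] = (2, 3, 4),
                          stopgrams: AbstractSet[str] = frozenset()) -> List[str]:
    """Collect the n-grams for every size, dropping duplicates and
    fragments listed in stopgrams (assumed to be meaningless)."""
    words: List[str] = []
    for n in sizes:
        for gram in char_ngrams(text, n):
            if gram not in stopgrams and gram not in words:
                words.append(gram)
    return words


# Bi-gram segmentation of a stand-in feature field, then filtering.
bigrams = char_ngrams("sale", 2)
first_words = segment_feature_field("sale", sizes=(2,), stopgrams={"al"})
print(bigrams)      # ['sa', 'al', 'le']
print(first_words)  # ['sa', 'le']
```

How the meaningless fragments are identified is not specified in the text; a fixed set is used here only to keep the sketch self-contained.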

302: Performing semantic extraction on each of the h first words according to its adjacent words in the characteristic field, to obtain h semantic vectors corresponding to the h first words one by one.

In this embodiment, extracting the semantics of each first word in isolation may separate the semantics of the first words from one another and lose their original relevance. Based on this, the present embodiment provides a method for performing semantic extraction on each of the h first words according to its adjacent words in the feature field, so as to obtain h semantic vectors in one-to-one correspondence with the h first words while retaining the relevance between the first words. As shown in Fig. 4, the method includes:

401: Carrying out word embedding processing on each first word and on the adjacent words of each first word, respectively, to obtain a first word vector corresponding to each first word and an adjacent word vector corresponding to the adjacent words.

In this embodiment, the adjacent words may be left adjacent words and/or right adjacent words. When the current first word is the first word in the feature field, it has no left adjacent word. Therefore, when the adjacent word is defined as the left adjacent word, the adjacent word may be treated as empty for word embedding, or the right adjacent word may be used in place of the left adjacent word for word embedding. Similarly, when the current first word is the last word in the feature field, it has no right adjacent word. Therefore, when the adjacent word is defined as the right adjacent word, the adjacent word may be treated as empty for word embedding, or the left adjacent word may be used in place of the right adjacent word for word embedding.

402: Determining the word length of the adjacent words, and multiplying the adjacent word vector by the word length of the adjacent words to obtain a second word vector.

Specifically, if the adjacent word vector corresponding to the adjacent word is [213423] and the word length is 2, the second word vector is: [213423] × 2 = [426846].

403: Splicing the first word vector and the second word vector to obtain the semantic vector of each first word.

In this embodiment, the first word vector and the second word vector may be spliced horizontally according to the positional relationship between the first word and the adjacent word, so as to obtain the semantic vector of each first word. For example, if the adjacent word is defined as the left adjacent word, the second word vector is [426846], and the first word vector corresponding to the first word is [11223344], then the semantic vector of the first word may be: [42684611223344]. In this way, the semantic vector of each first word is influenced by its adjacent words, so that it also contains the semantics of the adjacent words. The relevance between the first word and its adjacent words is thereby captured, and the problem of the h semantic vectors being separated from one another is avoided.
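Steps 402 and 403 can be sketched as below. The sketch assumes the word vectors have already been produced by some embedding step, and it reads the digit strings of the example as two-digit elements (21, 34, 23 and 11, 22, 33, 44), which is an assumption; the function name `semantic_vector` and its parameters are hypothetical.

```python
from typing import List, Sequence


def semantic_vector(first_vec: Sequence[float],
                    neighbor_vec: Sequence[float],
                    neighbor_len: int,
                    neighbor_side: str = "left") -> List[float]:
    """Scale the adjacent word vector by the adjacent word's length
    (step 402), then splice it with the first word vector in the
    order given by the words' positions (step 403)."""
    second_vec = [neighbor_len * v for v in neighbor_vec]
    if neighbor_side == "left":
        return second_vec + list(first_vec)
    if neighbor_side == "right":
        return list(first_vec) + second_vec
    raise ValueError("neighbor_side must be 'left' or 'right'")


# Assumed reading of the example: adjacent word vector [21, 34, 23],
# word length 2, first word vector [11, 22, 33, 44], left neighbor.
result = semantic_vector([11, 22, 33, 44], [21, 34, 23], neighbor_len=2)
print(result)  # [42, 68, 46, 11, 22, 33, 44]
```

An empty adjacent word, as allowed for the first or last word of the field, can be represented here by passing an empty `neighbor_vec`.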

303: Matching in a preset expansion library according to the semantic vector corresponding to each first word, to obtain h second word groups corresponding to the h first words one by one.

In this embodiment, the preset expansion library stores a set of common words determined in advance by experts, and the common words may be classified and stored by application field. Therefore, when matching is carried out, the application field of the data to be extracted can be determined from the extraction task, and the common words of that application field can then be retrieved for matching. This improves matching efficiency and makes the second words in the resulting second word groups more accurate.
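One way to picture the matching of step 303 is the sketch below. The text does not state the matching criterion, so cosine similarity with a threshold is used purely as an assumption; the library layout (application field mapped to words and their vectors), the names `cosine` and `match_second_words`, and the toy two-dimensional vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical layout: application field -> {common word: vector}.
ExpansionLibrary = Dict[str, Dict[str, Sequence[float]]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_second_words(semantic_vec: Sequence[float],
                       library: ExpansionLibrary,
                       field: str,
                       threshold: float = 0.9) -> List[str]:
    """Return the words of one application field whose vectors are close
    enough to the semantic vector, most similar first."""
    candidates = library.get(field, {})
    scored = [(cosine(semantic_vec, vec), word) for word, vec in candidates.items()]
    return [word for score, word in sorted(scored, reverse=True) if score >= threshold]


library: ExpansionLibrary = {
    "retail": {"discount": [1.0, 0.0], "markdown": [0.9, 0.1], "loan": [0.0, 1.0]},
}
second_group = match_second_words([1.0, 0.05], library, "retail")
print(second_group)  # ['discount', 'markdown']
```

Restricting the search to one application field mirrors the classification by field described above: only the words stored for that field are compared.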

304: Ordering the words in the h second word groups according to the position of each first word in the characteristic field, to obtain x extension fields.

In this embodiment, a second word in one of the second word groups may be used to replace the corresponding first word in the original feature field to obtain an extension field. Alternatively, every first word in the original feature field may be replaced by a second word from its corresponding second word group to obtain an extension field.

Meanwhile, in an optional implementation, semantic extraction may be performed on the original feature field to obtain its semantic vector, the similarity between this vector and the semantic vector of each generated extension field may be calculated, and any extension field whose semantics deviate too much from the original may be removed.
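The construction of extension fields in step 304, together with the optional similarity filter, might look like the following sketch. It assumes each position may either keep its first word or take a word from its second word group, which covers both replacement variants described above; the function name `build_extension_fields`, the `similarity` callback and the English sample words are hypothetical.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence


def build_extension_fields(first_words: Sequence[str],
                           second_groups: Sequence[Sequence[str]],
                           separator: str = " ",
                           similarity: Optional[Callable[[str, str], float]] = None,
                           min_similarity: float = 0.0) -> List[str]:
    """Combine one candidate per position, keeping the original word order."""
    if len(first_words) != len(second_groups):
        raise ValueError("need exactly one second word group per first word")
    original = separator.join(first_words)
    # Each position may keep its first word or take any word from its group.
    options = [[w] + [s for s in g if s != w]
               for w, g in zip(first_words, second_groups)]
    fields: List[str] = []
    for combo in product(*options):
        candidate = separator.join(combo)
        if candidate == original:
            continue  # the feature field itself is not an extension field
        if similarity is not None and similarity(original, candidate) < min_similarity:
            continue  # drop candidates that deviate too far from the original
        fields.append(candidate)
    return fields


fields = build_extension_fields(["discount", "activity"],
                                [["markdown"], ["event", "campaign"]])
print(len(fields))  # 5
```

Here x is 5: "discount event", "discount campaign", "markdown activity", "markdown event" and "markdown campaign". The feature field group is then the original field plus these x fields.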

203: Determining a plurality of databases according to the extraction task, and clustering the plurality of databases to obtain z database groups.

In the present embodiment, z is an integer greater than or equal to 1. The present embodiment also provides a method for determining a plurality of databases according to the extraction task and clustering the plurality of databases to obtain z database groups. As shown in Fig. 5, the method includes:

501: Determining the service type corresponding to the data to be extracted according to the extraction task.

502: a plurality of databases are determined in a database cluster according to the business type.

In this embodiment, the plurality of databases are databases associated with services of the service type determined in step 501. Put simply, they may be databases that store data generated by services of that service type or that provide data to such services. For example, the service table of each database may be checked to determine whether the services connected to the database include a service of the service type determined in step 501; if so, the database is taken as one of the plurality of databases.

503: Determining the relationship score between database A and database B among the plurality of databases, to obtain k relationship scores.

In this embodiment, the database a and the database B are any two different databases in the plurality of databases, each relationship score in the k relationship scores is used to identify a degree of association between data stored in the two databases corresponding to each relationship score, and k is an integer greater than or equal to 1.

Specifically, in this embodiment, the number of services common to database A and database B may be determined, and the total number of services of database A and the total number of services of database B may each be determined. The ratio of the number of common services to the total number of services of database A and the ratio of the number of common services to the total number of services of database B are then summed to obtain the relationship score between database A and database B.

That is, the relationship score can be expressed by formula (i):

$$g = \frac{j}{s_A} + \frac{j}{s_B} \qquad \text{(i)}$$

where $g$ represents the relationship score between database A and database B, $j$ represents the number of common services between database A and database B, $s_A$ represents the total number of services of database A, and $s_B$ represents the total number of services of database B.
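The relationship score, that is, the sum of the ratio of common services to the total services of database A and the ratio of common services to the total services of database B, can be computed as in this small sketch. Representing each database by the set of its service names is an assumption, as are the function name `relation_score` and the sample service names.

```python
from typing import Set


def relation_score(services_a: Set[str], services_b: Set[str]) -> float:
    """g = j / s_A + j / s_B, where j is the number of shared services."""
    if not services_a or not services_b:
        return 0.0  # assumption: a database with no services is unrelated
    j = len(services_a & services_b)
    return j / len(services_a) + j / len(services_b)


# Database A serves 4 services, database B serves 2, and they share 2.
g = relation_score({"loan", "fund", "insurance", "card"}, {"loan", "fund"})
print(g)  # 1.5
```

With this definition the score ranges from 0 (no common services) to 2 (both databases serve exactly the same services).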

504: Clustering the plurality of databases according to the k relationship scores to obtain z database groups.

In this embodiment, several databases whose relationship scores are greater than a preset threshold may be clustered into one database group. Therefore, databases with the same service can be clustered together, and the efficiency of subsequent data extraction is improved.
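The threshold-based grouping of step 504 can be sketched as follows. The text does not say how groups are formed when scores exceed the threshold only for some pairs, so this sketch assumes that databases linked directly or indirectly by an above-threshold score fall into the same group (connected components via union-find). The function name `cluster_databases` and the database names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Sequence


def cluster_databases(databases: Sequence[str],
                      scores: Dict[FrozenSet[str], float],
                      threshold: float) -> List[List[str]]:
    """Group databases connected by relationship scores above the threshold."""
    parent = {db: db for db in databases}

    def find(db: str) -> str:
        # Follow parent links to the group representative (path halving).
        while parent[db] != db:
            parent[db] = parent[parent[db]]
            db = parent[db]
        return db

    for pair, score in scores.items():
        if score > threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for db in databases:
        groups.setdefault(find(db), []).append(db)
    return list(groups.values())


scores = {
    frozenset({"db_loans", "db_credit"}): 1.5,
    frozenset({"db_loans", "db_hr"}): 0.1,
    frozenset({"db_credit", "db_hr"}): 0.0,
}
groups = cluster_databases(["db_loans", "db_credit", "db_hr"], scores, threshold=1.0)
print(groups)  # [['db_loans', 'db_credit'], ['db_hr']]
```

In this example z is 2. A database with no above-threshold score forms a group of its own, which also covers the case of a single database where clustering is skipped.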

In an alternative embodiment, the database may be further classified by a database relationship table. For example, the primary database and its corresponding backup database are classified into one class, or the databases serving the same business chain are classified into one class. In addition, when the determined number of the databases is 1, the clustering process can be directly skipped, and the subsequent data extraction can be directly performed.

204: Performing data extraction in each database group in the z database groups according to the characteristic field group, to obtain z extraction results corresponding to the z database groups one by one.

In this embodiment, an artificial intelligence-based indicator identifier is provided to extract data from each of the z database groups. Specifically, the indicator identifier may be a Robotic Process Automation (RPA) crawler that can quickly navigate between different databases and inspect the fields and indicators in their tables. The similarity between each inspected indicator and the fields in the characteristic field group is then calculated, and the data whose similarity meets a set condition is extracted to obtain the extraction result.
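The similarity check of step 204 can be illustrated with the sketch below. The RPA crawler itself is not modeled; tables are assumed to be already available as plain dictionaries. The similarity measure is not specified in the text, so the string ratio of Python's standard `difflib.SequenceMatcher` is used only as a stand-in, and the names `best_similarity` and `extract_from_group`, the threshold and the sample table are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence


def best_similarity(column: str, field_group: Sequence[str]) -> float:
    """Highest string similarity between a column name and any field in the group."""
    return max((SequenceMatcher(None, column.lower(), f.lower()).ratio()
                for f in field_group), default=0.0)


def extract_from_group(tables: Dict[str, Dict[str, List[object]]],
                       field_group: Sequence[str],
                       min_similarity: float = 0.8) -> Dict[str, List[object]]:
    """Return the data of every column whose name is close enough to the field group."""
    result: Dict[str, List[object]] = {}
    for table_name, columns in tables.items():
        for column, values in columns.items():
            if best_similarity(column, field_group) >= min_similarity:
                result[f"{table_name}.{column}"] = values
    return result


# One table of one database group; "gender" matches the field group, "balance" does not.
tables = {"customers": {"gender": ["male", "female"], "balance": [10, 20]}}
extracted = extract_from_group(tables, ["gender", "sex"])
print(extracted)  # {'customers.gender': ['male', 'female']}
```

Running the function once per database group yields the z extraction results. A semantic similarity over embeddings could replace the string ratio without changing the surrounding structure.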

205: Generating an extraction result heteromorphic graph according to the z extraction results, and adjusting the extraction result heteromorphic graph to obtain an extraction result isomorphic graph.

In the present embodiment, each of the z extraction results may include q extraction data, q being an integer greater than or equal to 1. Based on this, the z extraction results include z × q extraction data in total.

In this regard, in the present embodiment, keyword extraction may be performed on each of the z × q pieces of extracted data to obtain z × q first keywords. In short, the z × q first keywords correspond one to one with the z × q pieces of extracted data. Deduplication is then performed on the z × q first keywords to obtain p second keywords, where p is an integer greater than or equal to 1 and less than or equal to z × q. Next, the p second keywords are taken as p word nodes, the z × q pieces of extracted data are taken as z × q data nodes, and each data node is connected to its corresponding word node to obtain a result heteromorphic graph. Finally, in the result heteromorphic graph, all the data nodes connected to the same word node are connected to one another for each of the p word nodes, and the p word nodes are deleted to obtain a result isomorphic graph.

Specifically, assume that there are 5 pieces of extracted data: data 1, data 2, data 3, data 4, and data 5. The keywords are extracted, the keywords of the data 1 are "loan" and "interest", the keywords of the data 2 are "fund" and "interest", the keywords of the data 3 are "loan", the keywords of the data 4 are "interest", and the keywords of the data 5 are "fund". After the repeated processing, 3 keywords "loan", "fund" and "interest" are obtained, which are not repeated, and are respectively used as word nodes. Meanwhile, the data 1, the data 2, the data 3, the data 4 and the data 5 are respectively used as data nodes, and each data node is connected with the corresponding word node, so that a result heteromorphic graph as shown in fig. 6 can be obtained.

Based on the result heterogeneous graph of fig. 6, the data nodes connected to each word node are connected to one another, and each word node is deleted together with the lines between it and its data nodes, so that the result homogeneous graph shown in fig. 7 is obtained. As can be seen from the result homogeneous graph, data 1 and data 2 have the most connecting lines (three each), and therefore they are the most strongly associated with the other data extracted this time. The result homogeneous graph thus clearly shows the relationship center of the extracted data, which reduces the complexity of subsequent analysis and processing.
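The five-data example above can be reproduced with a short sketch. The function names and the dictionary/set representation of the two graphs are illustrative assumptions, not part of the specification.

```python
from itertools import combinations


def build_heterogeneous_graph(keywords_by_data):
    """Map each word node to the set of data nodes connected to it."""
    word_to_data = {}
    for data, keywords in keywords_by_data.items():
        for word in keywords:
            word_to_data.setdefault(word, set()).add(data)
    return word_to_data


def to_homogeneous_graph(word_to_data):
    """Connect data nodes that share a word node, then drop the word nodes."""
    edges = set()
    for data_nodes in word_to_data.values():
        for a, b in combinations(sorted(data_nodes), 2):
            edges.add((a, b))
    return edges


def degrees(edges):
    """Count the connecting lines attached to each data node."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


keywords_by_data = {
    "data 1": ["loan", "interest"],
    "data 2": ["fund", "interest"],
    "data 3": ["loan"],
    "data 4": ["interest"],
    "data 5": ["fund"],
}

hetero = build_heterogeneous_graph(keywords_by_data)  # 3 word nodes
edges = to_homogeneous_graph(hetero)                  # 5 data-to-data edges
deg = degrees(edges)                                  # data 1 and data 2 have degree 3
```

Ranking the data nodes by degree in `deg` is one simple way to read off the relationship center that the description refers to.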

206: and sending the z extraction results and the extraction result homogeneous graph to display equipment for displaying.

In summary, in the artificial intelligence-based multi-database data extraction method provided by the invention, the general indicators or fields of the data to be extracted are determined from the extraction task, and these general indicators or fields are then expanded to find their synonyms, near-synonyms and the like, so that the data to be extracted is described more comprehensively. Next, the databases in which the required data may exist are determined from the extraction task and are clustered, so that databases whose stored data belong to associated services are grouped together. The stored data are thereby associated through the association among the services, which improves the efficiency of subsequent data extraction. Meanwhile, the data in the different databases are crawled by combining the RPA crawler with AI field identification, which solves the problem that a data extraction mode cannot be shared between different databases; the RPA crawler replaces repetitive manual work and improves data extraction efficiency. Finally, after all the data meeting the requirements are extracted, a heterogeneous graph and a homogeneous graph can be constructed from the extraction results to display the relevance between the extracted data visually. The homogeneous graph also clearly shows the relationship center of the extracted data, which reduces the complexity of subsequent analysis and processing.

Referring to fig. 8, fig. 8 is a block diagram illustrating functional modules of an artificial intelligence-based multi-database data extraction apparatus according to an embodiment of the present disclosure. As shown in fig. 8, the artificial intelligence based multi-database data extraction apparatus 800 includes:

an extension module 801, configured to determine a feature field according to an extraction task, where the feature field is a general field of data that the extraction task requires to extract, perform field extension on the feature field to obtain x extension fields, and combine the feature field and the x extension fields to obtain a feature field group, where x is an integer greater than or equal to 1;

the clustering module 802 is configured to determine multiple databases according to the extraction task, and perform clustering on the multiple databases to obtain z database groups, where z is an integer greater than or equal to 1;

the extracting module 803 is configured to perform data extraction in each of the z database clusters according to the feature field group to obtain z extraction results, where the z extraction results are in one-to-one correspondence with the z database clusters;

the processing module 804 is configured to generate an extraction result heterogeneous graph according to the z extraction results, and adjust the extraction result heterogeneous graph to obtain an extraction result homogeneous graph;

and a display module 805, configured to send the z extraction results and the extraction result homogeneous graph to a display device for displaying.

In the embodiment of the present invention, in terms of performing field extension on the feature field to obtain x extended fields, the extension module 801 is specifically configured to:

performing word segmentation processing on the characteristic field to obtain h first words, wherein h is an integer greater than or equal to 1;

performing semantic extraction on each first word according to adjacent words of each first word in the h first words in the feature field to obtain h semantic vectors, wherein the h semantic vectors correspond to the h first words one by one;

matching in a preset expansion library according to the semantic vector corresponding to each first word to obtain h second word groups, wherein the h second word groups correspond to the h first words one by one;

and sequencing the terms in the h second term groups according to the position sequence of each first term in the characteristic field to obtain x expansion fields.
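One way to read the four steps above is sketched below. The specification does not state how the h second word groups are combined into x extension fields; taking every positional combination (a Cartesian product) is an assumption, as are the whitespace segmentation, the toy `EXPANSION_LIBRARY`, and the direct dictionary lookup that stands in for semantic-vector matching.

```python
from itertools import product

# Hypothetical preset expansion library: first word -> second word group.
# A real system would match by semantic vector rather than by exact key.
EXPANSION_LIBRARY = {
    "loan": ["loan", "credit"],
    "amount": ["amount", "sum"],
}


def expand_field(feature_field):
    """Return extension fields built from the second word groups, in positional order."""
    first_words = feature_field.split()  # stand-in for word segmentation
    second_word_groups = [EXPANSION_LIBRARY.get(w, [w]) for w in first_words]
    # Combine one word per position, preserving the order of the first words.
    candidates = [" ".join(combo) for combo in product(*second_word_groups)]
    return [c for c in candidates if c != feature_field]


extension_fields = expand_field("loan amount")
feature_field_group = ["loan amount"] + extension_fields
```

Here h is 2 and x is 3; the feature field group contains the original field plus the three extension fields.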

In an embodiment of the present invention, in terms of extracting semantics of each first term according to an adjacent term of each first term in the feature field in the h first terms to obtain h semantic vectors, the extension module 801 is specifically configured to:

performing word embedding processing on each first word and adjacent words of each first word respectively to obtain a first word vector corresponding to each first word and adjacent word vectors corresponding to adjacent words;

determining word lengths of adjacent words, and multiplying the word lengths of the adjacent words and adjacent word vectors to obtain second word vectors;

and splicing the first word vector and the second word vector to obtain the semantic vector of each first word.
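The weighting and splicing steps above can be illustrated with a small sketch. The two-dimensional toy embeddings, the use of character count as the word length, and the restriction to a single adjacent word are assumptions made only for illustration.

```python
# Hypothetical toy embedding table; a real system would use a trained model.
TOY_EMBEDDINGS = {
    "loan": [0.1, 0.2],
    "amount": [0.5, -0.5],
}


def semantic_vector(first_word, adjacent_word, embeddings=TOY_EMBEDDINGS):
    """Splice the first word vector with the length-weighted adjacent word vector."""
    first_vec = list(embeddings[first_word])
    adjacent_vec = embeddings[adjacent_word]
    length = len(adjacent_word)  # assumed: word length = number of characters
    second_vec = [length * v for v in adjacent_vec]
    return first_vec + second_vec  # splicing = concatenation


vec = semantic_vector("loan", "amount")
```

The resulting vector has twice the embedding dimension: the first half encodes the word itself and the second half its length-weighted context.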

In an embodiment of the present invention, in determining a plurality of databases according to the extraction task, and performing clustering processing on the plurality of databases to obtain z database clusters, the clustering module 802 is specifically configured to:

determining a service type corresponding to the data to be extracted according to the extraction task;

determining a plurality of databases in a database group according to the service types;

determining a relation score between a database A and a database B in the plurality of databases to obtain k relation scores, wherein the database A and the database B are any two different databases in the plurality of databases, each relation score in the k relation scores is used for identifying the association degree between data stored in the two databases corresponding to that relation score, and k is an integer greater than or equal to 1;

and clustering the plurality of databases according to the k relation scores to obtain z database groups.
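The clustering step can be sketched as follows. The passage above does not fix a clustering algorithm, so grouping databases whose pairwise relation score reaches a threshold (connected components via union-find) is an assumption, as are the threshold of 0.5, the database names, and the scores.

```python
def cluster_databases(databases, relation_scores, threshold=0.5):
    """Group databases linked by a relation score at or above the threshold."""
    parent = {db: db for db in databases}

    def find(db):
        # Follow parent links to the group representative, compressing the path.
        while parent[db] != db:
            parent[db] = parent[parent[db]]
            db = parent[db]
        return db

    for (a, b), score in relation_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for db in databases:
        groups.setdefault(find(db), []).append(db)
    return list(groups.values())


# Hypothetical databases and pairwise relation scores (k = 6 for 4 databases).
databases = ["loans", "credit", "hr", "payroll"]
relation_scores = {
    ("loans", "credit"): 0.9,
    ("loans", "hr"): 0.1,
    ("loans", "payroll"): 0.2,
    ("credit", "hr"): 0.1,
    ("credit", "payroll"): 0.1,
    ("hr", "payroll"): 0.8,
}

groups = cluster_databases(databases, relation_scores)  # z = 2 database groups
```

With n databases there are k = n(n-1)/2 pairwise scores; here the two strongly related pairs form the z = 2 groups.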

In an embodiment of the present invention, each of the z extraction results includes q extraction data, where q is an integer greater than or equal to 1. Based on this, in the aspect of generating the extraction result heterogeneous graph according to the z extraction results and adjusting the extraction result heterogeneous graph to obtain the extraction result homogeneous graph, the processing module 804 is specifically configured to:

extracting keywords from each of the z × q pieces of extracted data to obtain z × q first keywords, wherein the z × q first keywords correspond to the z × q pieces of extracted data one by one;

carrying out deduplication processing on the z × q first keywords to obtain p second keywords, wherein p is an integer which is greater than or equal to 1 and less than or equal to z × q;

taking the p second keywords as p word nodes, taking the z × q pieces of extracted data as z × q data nodes, and connecting each of the z × q data nodes with its corresponding word node to obtain a result heterogeneous graph;

in the result heterogeneous graph, connecting all data nodes connected with each word node in the p word nodes with each other, and deleting the p word nodes to obtain a result homogeneous graph.

In the embodiment of the present invention, in terms of performing data extraction in each database cluster of z database clusters according to the feature field group to obtain z extraction results, the extraction module 803 is specifically configured to:

and performing data extraction in each database group through the robotic process automation (RPA) crawler to obtain z extraction results.

Referring to fig. 9, fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 9, the electronic device 900 includes a transceiver 901, a processor 902, and a memory 903, which are connected to each other by a bus 904. The memory 903 is used to store computer programs and data, and may transfer the data stored in the memory 903 to the processor 902.

The processor 902 is configured to read the computer program in the memory 903 to perform the following operations:

In an embodiment of the present invention, in field expanding the feature field to obtain x expanded fields, the processor 902 is specifically configured to perform the following operations:

In an embodiment of the present invention, in terms of performing semantic extraction on each first term according to an adjacent term of each first term in the h first terms in the feature field to obtain h semantic vectors, the processor 902 is specifically configured to perform the following operations:

In an embodiment of the present invention, in determining a plurality of databases according to the extraction task, and performing clustering on the plurality of databases to obtain z database clusters, the processor 902 is specifically configured to perform the following operations:

In an embodiment of the present invention, each of the z extraction results includes q extraction data, where q is an integer greater than or equal to 1. Based on this, in terms of generating the extraction result heterogeneous graph according to the z extraction results and adjusting the extraction result heterogeneous graph to obtain the extraction result homogeneous graph, the processor 902 is specifically configured to perform the following operations:

In an embodiment of the present invention, in terms of performing data extraction in each of z database clusters according to the feature field set to obtain z extraction results, the processor 902 is specifically configured to perform the following operations:

It should be understood that the artificial intelligence based multi-database data extraction apparatus in the present application may include a smartphone (e.g., an Android phone, an iOS phone, a Windows Phone, etc.), a tablet computer, a palmtop computer, a notebook computer, a Mobile Internet Device (MID), a robot, a wearable device, or the like. The devices listed above are merely examples and not an exhaustive list; the artificial intelligence based multi-database data extraction apparatus includes but is not limited to them. In practical applications, the artificial intelligence based multi-database data extraction apparatus may further include an intelligent vehicle-mounted terminal, computer equipment, and the like.

Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by combining software and a hardware platform. With this understanding in mind, all or part of the technical solutions of the present invention that contribute over the background art can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments or some parts of the embodiments.

Accordingly, the present application also provides a computer readable storage medium, which stores a computer program, wherein the computer program is executed by a processor to implement part or all of the steps of any one of the artificial intelligence based multiple database data extraction methods as described in the above method embodiments. For example, the storage medium may include a hard disk, a floppy disk, an optical disk, a magnetic tape, a magnetic disk, a flash memory, and the like.

Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any one of the artificial intelligence based multiple database data extraction methods as described in the above method embodiments.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all alternative embodiments and that the acts and modules referred to are not necessarily required by the application.

In the above embodiments, the description of each embodiment has its own emphasis, and for parts not described in detail in a certain embodiment, reference may be made to the description of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. For example, the division of the units is merely a logical division, and other divisions are possible in practice: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or in another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.

The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer readable memory. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. The aforementioned memory includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by associated hardware instructed by a program, which may be stored in a computer-readable memory, and the memory may include: a flash memory disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic or optical disk, and the like.

The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the methods and their core ideas of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A method for extracting data from multiple databases based on artificial intelligence, the method comprising:

determining a characteristic field according to an extraction task, wherein the characteristic field is a general field of data required to be extracted by the extraction task;

performing field extension on the characteristic field to obtain x extension fields, and combining the characteristic field and the x extension fields to obtain a characteristic field group, wherein x is an integer greater than or equal to 1;

performing data extraction in each database group of the z database groups according to the characteristic field group to obtain z extraction results, wherein the z extraction results are in one-to-one correspondence with the z database groups;

generating an extraction result heterogeneous graph according to the z extraction results, and adjusting the extraction result heterogeneous graph to obtain an extraction result homogeneous graph;

2. The method of claim 1, wherein the field expanding the characteristic field to obtain x expanded fields comprises:

performing semantic extraction on each first word according to the adjacent word of each first word in the h first words in the feature field to obtain h semantic vectors, wherein the h semantic vectors correspond to the h first words one by one;

and sequencing the terms in the h second term groups according to the position sequence of each first term in the characteristic field to obtain the x extension fields.

3. The method of claim 2, wherein the semantic extracting each of the h first terms according to its neighboring terms in the feature field to obtain h semantic vectors comprises:

performing word embedding processing on each first word and adjacent words of each first word respectively to obtain a first word vector corresponding to each first word and an adjacent word vector corresponding to the adjacent words;

determining the word length of the adjacent words, and multiplying the word length of the adjacent words and the adjacent word vector to obtain a second word vector;

4. The method of claim 1, wherein determining a plurality of databases according to the extraction task and clustering the plurality of databases to obtain z database clusters comprises:

determining the databases in a database group according to the service types;

determining a relationship score between a database A and a database B in the plurality of databases to obtain k relationship scores, wherein the database A and the database B are any two different databases in the plurality of databases, each relationship score in the k relationship scores is used for identifying the association degree between data stored in the two databases corresponding to each relationship score, and k is an integer greater than or equal to 1;

and clustering the plurality of databases according to the k relation scores to obtain the z database groups.

5. The method of claim 1,

each of the z extraction results comprises q extraction data, wherein q is an integer greater than or equal to 1;

the generating an extraction result heterogeneous graph according to the z extraction results, and adjusting the extraction result heterogeneous graph to obtain an extraction result homogeneous graph comprises:

taking the p second keywords as p word nodes, taking the z × q pieces of extracted data as z × q data nodes, and connecting each data node in the z × q data nodes with the word node corresponding to each data node to obtain the result heterogeneous graph;

and in the result heterogeneous graph, connecting all data nodes connected with each word node in the p word nodes with each other, and deleting the p word nodes to obtain the result homogeneous graph.

6. The method of claim 1, wherein said extracting data from each of the z database clusters according to the set of characteristic fields to obtain z extraction results comprises:

and performing data extraction in each database group through a robotic process automation (RPA) crawler to obtain the z extraction results.

7. An artificial intelligence based multi-database data extraction apparatus, the apparatus comprising:

an extension module, configured to determine a characteristic field according to an extraction task, wherein the characteristic field is a general field of data required to be extracted by the extraction task, to perform field extension on the characteristic field to obtain x extension fields, and to combine the characteristic field and the x extension fields to obtain a characteristic field group, wherein x is an integer greater than or equal to 1;

the processing module is used for generating an extraction result heterogeneous graph according to the z extraction results and adjusting the extraction result heterogeneous graph to obtain an extraction result homogeneous graph;

and the display module is used for sending the z extraction results and the extraction result homogeneous graph to display equipment for displaying.

8. The apparatus according to claim 7, wherein, in the field expanding the feature field to obtain x expanded fields, the expanding module is specifically configured to:

9. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the one or more programs including instructions for performing the steps in the method of any of claims 1-6.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method according to any one of claims 1-6.