CN114238572B

CN114238572B - Multi-database data extraction method and device based on artificial intelligence and electronic equipment

Info

Publication number: CN114238572B
Application number: CN202111536087.0A
Authority: CN
Inventors: 陈蔚然
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-12-15
Filing date: 2021-12-15
Publication date: 2024-04-16
Anticipated expiration: 2041-12-15
Also published as: CN114238572A

Abstract

The application discloses an artificial intelligence based multi-database data extraction method and device, and an electronic device. The method comprises the following steps: determining a feature field according to an extraction task, wherein the feature field is a general field of the data that the extraction task requires to be extracted; performing field expansion on the feature field to obtain x extension fields, and combining the feature field and the x extension fields to obtain a feature field group; determining a plurality of databases according to the extraction task, and clustering the databases to obtain z database groups; performing data extraction in each of the z database groups according to the feature field group to obtain z extraction results in one-to-one correspondence with the z database groups; generating an extraction result heterogeneous graph according to the z extraction results, and adjusting the extraction result heterogeneous graph to obtain an extraction result isomorphic graph; and transmitting the z extraction results and the extraction result isomorphic graph to a display device for display.

Description

Multi-database data extraction method and device based on artificial intelligence and electronic equipment

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to an artificial intelligence based multi-database data extraction method and device, and an electronic device.

Background

Currently, an indispensable step in data management work is to group and aggregate the indexes of multiple departments, which requires collecting, one by one, the data containing the target indexes or fields as well as their synonyms and hyponyms. However, because there are many departments and each uses different systems and databases, traditional extraction methods cannot extract data across these different systems; that is, a data extraction method designed for one database cannot be reused for another, which makes extraction inefficient.

Disclosure of Invention

To solve the problems in the prior art, embodiments of the present application provide an artificial intelligence based multi-database data extraction method and device, and an electronic device, which solve the problem that data extraction methods cannot be shared across different databases and improve data extraction efficiency.

In a first aspect, an embodiment of the present application provides an artificial intelligence based multi-database data extraction method, including:

Determining a feature field according to an extraction task, wherein the feature field is a general field of the data that the extraction task requires to be extracted;

performing field expansion on the feature field to obtain x extension fields, and combining the feature field and the x extension fields to obtain a feature field group, wherein x is an integer greater than or equal to 1;

determining a plurality of databases according to the extraction task, and clustering the plurality of databases to obtain z database groups, wherein z is an integer greater than or equal to 1;

performing data extraction in each of the z database groups according to the feature field group to obtain z extraction results, wherein the z extraction results are in one-to-one correspondence with the z database groups;

generating an extraction result heterogeneous graph according to the z extraction results, and adjusting the extraction result heterogeneous graph to obtain an extraction result isomorphic graph;

and transmitting the z extraction results and the extraction result isomorphic graph to a display device for display.

In a second aspect, embodiments of the present application provide an artificial intelligence based multi-database data extraction apparatus, comprising:

an expansion module, used for determining a feature field according to an extraction task, wherein the feature field is a general field of the data that the extraction task requires to be extracted, performing field expansion on the feature field to obtain x extension fields, and combining the feature field and the x extension fields to obtain a feature field group, wherein x is an integer greater than or equal to 1;

a clustering module, used for determining a plurality of databases according to the extraction task, and clustering the databases to obtain z database groups, wherein z is an integer greater than or equal to 1;

an extraction module, used for performing data extraction in each of the z database groups according to the feature field group to obtain z extraction results, wherein the z extraction results are in one-to-one correspondence with the z database groups;

a processing module, used for generating an extraction result heterogeneous graph according to the z extraction results, and adjusting the extraction result heterogeneous graph to obtain an extraction result isomorphic graph;

and a display module, used for sending the z extraction results and the extraction result isomorphic graph to a display device for display.

In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, wherein the processor is coupled to the memory, the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory so that the electronic device performs the method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, the computer program causing a computer to perform the method as in the first aspect.

In a fifth aspect, embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method according to the first aspect.

The implementation of the embodiment of the application has the following beneficial effects:

In the embodiments of the application, the general index or field of the data to be extracted is determined from the extraction task and then expanded to find its hyponyms, synonyms and the like, so that the data to be extracted is described more comprehensively. Next, the databases in which the data to be extracted may reside are determined from the extraction task and clustered, so that databases whose stored data are associated through related services are grouped together, which improves the efficiency of the subsequent data extraction. After all the data meeting the requirements has been extracted, a heterogeneous graph and an isomorphic graph can be constructed from the extraction results to visually display the correlation among the extracted data; the isomorphic graph, in particular, clearly shows the relationship center of the extracted data, which reduces the complexity of subsequent analysis and processing.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of a hardware architecture of an artificial intelligence based multi-database data extraction device according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of a multi-database data extraction method based on artificial intelligence according to an embodiment of the present application;

FIG. 3 is a flow chart of a method for performing field expansion on a feature field to obtain x expansion fields according to an embodiment of the present application;

FIG. 4 is a schematic flow chart of a method for performing semantic extraction on each first word according to its adjacent words in the feature field to obtain h semantic vectors in one-to-one correspondence with the h first words, according to an embodiment of the present application;

FIG. 5 is a flow chart of a method for determining multiple databases according to an extraction task and clustering the multiple databases to obtain z database groups according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a result heterogeneous graph provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a result isomorphic graph provided by an embodiment of the present application;

FIG. 8 is a functional block diagram of an artificial intelligence based multi-database data extraction device according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the application without inventive effort fall within the scope of the application.

The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will explicitly and implicitly understand that the embodiments described herein may be combined with other embodiments.

First, referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of an artificial intelligence based multi-database data extraction device according to an embodiment of the present application. The artificial intelligence based multi-database data extraction apparatus 100 includes at least one processor 101, a communication line 102, a memory 103, and at least one communication interface 104.

In this embodiment, the processor 101 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the programs of the solution of the present application.

Communication line 102 may include a pathway to transfer information between the above-described components.

The communication interface 104 may be any transceiver-type device (e.g., an antenna) and is used to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).

The memory 103 may be, but is not limited to, a read-only memory (ROM) or another type of static storage device capable of storing static information and instructions, a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

In this embodiment, the memory 103 may exist independently and be connected to the processor 101 via the communication line 102, or it may be integrated with the processor 101. The memory 103 provided by embodiments of the present application is generally non-volatile. The memory 103 stores computer-executable instructions for executing the solution of the present application, and their execution is controlled by the processor 101. The processor 101 is configured to execute the computer-executable instructions stored in the memory 103 to implement the methods provided in the embodiments of the present application described below.

In an alternative embodiment, the computer-executable instructions may also be referred to as application program code, which is not particularly limited in the present application.

In alternative embodiments, processor 101 may include one or more CPUs, such as CPU0 and CPU1 in fig. 1.

In alternative embodiments, the artificial intelligence based multi-database data extraction apparatus 100 may include multiple processors, such as processor 101 and processor 107 in FIG. 1. Each of these processors may be a single-core (single-CPU) processor or may be a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).

In an alternative embodiment, if the artificial intelligence based multi-database data extraction apparatus 100 is a server, it may be, for example, a stand-alone server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a content delivery network (CDN), and big data and artificial intelligence platforms. The artificial intelligence based multi-database data extraction apparatus 100 may further include an output device 105 and an input device 106. The output device 105 communicates with the processor 101 and may display information in a variety of ways. For example, the output device 105 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like. The input device 106 communicates with the processor 101 and may receive user input in a variety of ways. For example, the input device 106 may be a mouse, a keyboard, a touch screen device, a sensing device, or the like.

The artificial intelligence based multi-database data extraction apparatus 100 described above may be a general-purpose device or a dedicated device. The embodiments of the present application do not limit the type of the artificial intelligence based multi-database data extraction apparatus 100.

Secondly, it should be noted that the embodiments of the present application may acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results.

Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.

The artificial intelligence based multi-database data extraction method disclosed in the present application is described below.

Referring to FIG. 2, FIG. 2 is a schematic flow chart of an artificial intelligence based multi-database data extraction method according to an embodiment of the present application. The method includes the following steps:

201: Determine the feature field according to the extraction task.

In this embodiment, the feature field may be a general field of the data that the extraction task requires to be extracted. For example, the feature field may be a data file format: for picture files, the feature field may be "jpg", "png", "tif", etc. The feature field may also be a specific value of a certain feature: if the extraction task is to extract the loan data of all male clients, the feature field may be "gender: male". In addition, the feature field may be a slogan commonly used for a certain campaign: if the extraction task is to extract the data of all Double Eleven (the 11 November shopping festival) discount campaigns, the feature field may be a slogan commonly used in these campaigns, such as "Double Eleven discount activity in progress".

202: Perform field expansion on the feature field to obtain x extension fields, and combine the feature field and the x extension fields to obtain a feature field group.

In this embodiment, x is an integer greater than or equal to 1, and each of the x extension fields is a synonym field or a near-synonym field of the feature field. In short, by performing synonym or near-synonym expansion on the feature field, its synonyms and near-synonyms can be found, so that the data to be extracted is described more comprehensively and the finally extracted data is more complete.

Specifically, this embodiment provides a method for performing field expansion on the feature field to obtain x extension fields. As shown in FIG. 3, the method includes:

301: Perform word segmentation on the feature field to obtain h first words.

In this embodiment, h is an integer greater than or equal to 1. For example, the feature field may be segmented with N-gram segmentation using N = 2, 3 and 4, respectively. The N-gram segmentation method splits a sentence into a sequence of overlapping segments, each consisting of N consecutive characters, and each segment is called an N-gram. When N = 1 it is called a uni-gram, when N = 2 a bi-gram, and when N = 3 a tri-gram.

Specifically, continuing the example of the feature field "Double Eleven discount activity in progress" (双十一打折活动进行中 in the original Chinese), segmenting it with bi-grams yields the following first words: 双十 ("double ten"), 十一 ("eleven"), 一打 ("a dozen"), 打折 ("discount"), 折活, 活动 ("activity"), 动进, 进行 ("proceed") and 行中.

In addition, in an alternative embodiment, the segmentation results may be filtered and cleaned after they are obtained, so as to remove segments that have no meaning. In this example, the meaningless segments 折活, 动进 and 行中 can be filtered out, and the segments that carry some meaning, namely 双十, 十一, 一打, 打折, 活动 and 进行, are retained as the final first words.

In addition, in an alternative embodiment, word association can be performed using the context of the segmented first words, so as to obtain words with a new meaning as supplementary first words. For example, the first word 双十 ("double ten") can be combined with the first word 十一 ("eleven") that follows it to form 双十一 ("Double Eleven"), a word with a new meaning, which is added as a supplementary first word.
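The segmentation, filtering and word association of step 301 can be sketched as below. This is a minimal illustration under stated assumptions: segmentation is character-level with a sliding window, as the bi-gram example implies; `vocabulary` is a hypothetical set of meaningful words standing in for whatever dictionary or model decides which segments carry meaning; and association merges two adjacent segments only when they overlap and the merged string is in that vocabulary. The patent does not specify these rules.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Split text into overlapping character N-grams (a sliding window of width n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def filter_meaningless(ngrams: list[str], vocabulary: set[str]) -> list[str]:
    """Keep only the N-grams found in a (hypothetical) vocabulary of meaningful words."""
    return [g for g in ngrams if g in vocabulary]


def associate_neighbors(words: list[str], vocabulary: set[str]) -> list[str]:
    """Merge each word with the word after it when the second one is the first
    shifted by one character (e.g. 双十 + 十一 -> 双十一) and the merged
    string is itself in the vocabulary; return the new supplementary words."""
    supplements = []
    for left, right in zip(words, words[1:]):
        if len(left) > 1 and left[1:] == right[:-1]:
            merged = left + right[-1]
            if merged in vocabulary and merged not in supplements:
                supplements.append(merged)
    return supplements
```

With the hypothetical vocabulary {双十, 十一, 一打, 打折, 活动, 进行, 双十一}, the bi-grams of 双十一打折活动进行中 filter down to 双十, 十一, 一打, 打折, 活动, 进行, and association adds 双十一. Running the same function with n = 3 and n = 4 covers the other N values mentioned above.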

302: Perform semantic extraction on each first word according to its adjacent words in the feature field to obtain h semantic vectors in one-to-one correspondence with the h first words.

In this embodiment, extracting the semantics of each first word in isolation would separate the semantics of the first words from one another and lose their original relevance. Therefore, this embodiment provides a method for performing semantic extraction on each first word according to its adjacent words in the feature field, which yields h semantic vectors in one-to-one correspondence with the h first words while preserving the relevance between the first words. As shown in FIG. 4, the method includes:

401: Perform word embedding on each first word and on its adjacent word, respectively, to obtain a first word vector corresponding to each first word and an adjacent word vector corresponding to the adjacent word.

In this embodiment, the adjacent word may be the left adjacent word and/or the right adjacent word. When the current first word is the first in the sequence, it has no left adjacent word; therefore, if the adjacent word is defined as the left adjacent word, the adjacent word may be treated as empty for word embedding, or the right adjacent word may be used in place of the left adjacent word. Similarly, when the current first word is the last in the sequence, it has no right adjacent word; if the adjacent word is defined as the right adjacent word, the adjacent word may be treated as empty for word embedding, or the left adjacent word may be used in its place.

402: Determine the word length of the adjacent word, and multiply the adjacent word vector by that word length to obtain a second word vector.

Specifically, if the adjacent word vector corresponding to the adjacent word is [2, 1, 3, 4, 2, 3] and the word length of the adjacent word is 2, the second word vector is [2, 1, 3, 4, 2, 3] × 2 = [4, 2, 6, 8, 4, 6].

403: Splice the first word vector and the second word vector to obtain the semantic vector of each first word.

In this embodiment, the first word vector and the second word vector may be concatenated horizontally according to the positional relationship between the first word and the adjacent word to obtain the semantic vector of each first word. For example, if the adjacent word is defined as the left adjacent word, the second word vector is [4, 2, 6, 8, 4, 6], and the first word vector corresponding to the first word is [1, 1, 2, 2, 3, 3, 4, 4], then, because the adjacent word is located to the left of the first word, the semantic vector of the first word may be [4, 2, 6, 8, 4, 6, 1, 1, 2, 2, 3, 3, 4, 4]. In this way, the semantic vector of each first word is perturbed by its adjacent word and therefore contains the adjacent word's semantics, which captures the relevance between the two and prevents the semantic vectors of the h first words from being split from one another.
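Steps 401 to 403, including the boundary handling described for step 401, can be sketched as follows. The `embed` callable is a hypothetical stand-in for any word embedding model, treating an empty neighbor as a zero vector with word length 0 is an assumption, and the function names are illustrative only.

```python
from typing import Callable, Sequence

Vector = list[float]


def semantic_vector(first_vec: Sequence[float], neighbor_vec: Sequence[float],
                    neighbor_len: int, neighbor_side: str) -> Vector:
    """Steps 402-403: scale the adjacent word vector by the adjacent word's length
    (giving the second word vector), then splice it on the neighbor's side."""
    second_vec = [v * neighbor_len for v in neighbor_vec]
    if neighbor_side == "left":
        return second_vec + list(first_vec)
    if neighbor_side == "right":
        return list(first_vec) + second_vec
    raise ValueError("neighbor_side must be 'left' or 'right'")


def build_semantic_vectors(words: Sequence[str],
                           embed: Callable[[str], Sequence[float]],
                           side: str = "left") -> list[Vector]:
    """Step 401 with boundary handling: prefer the neighbor on `side`; at the edge
    of the sequence use the opposite neighbor; if there is none at all, treat the
    neighbor as empty (assumed here: zero vector, word length 0)."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    other_side = "right" if side == "left" else "left"
    vectors = []
    for i, word in enumerate(words):
        first_vec = embed(word)
        left = words[i - 1] if i > 0 else None
        right = words[i + 1] if i + 1 < len(words) else None
        preferred, fallback = (left, right) if side == "left" else (right, left)
        if preferred is not None:
            neighbor, neighbor_side = preferred, side
        elif fallback is not None:
            neighbor, neighbor_side = fallback, other_side
        else:
            # Single-word field: an "empty" neighbor contributes only zeros.
            vectors.append(semantic_vector(first_vec, [0.0] * len(first_vec), 0, side))
            continue
        vectors.append(semantic_vector(first_vec, embed(neighbor), len(neighbor), neighbor_side))
    return vectors
```

Calling `semantic_vector([1, 1, 2, 2, 3, 3, 4, 4], [2, 1, 3, 4, 2, 3], 2, "left")` reproduces the worked example of step 403. When the fallback neighbor is used, the sketch splices it on the side where that neighbor actually sits, following the positional rule of step 403.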

303: Perform matching in a preset expansion library according to the semantic vector corresponding to each first word to obtain h second word groups in one-to-one correspondence with the h first words.

In this embodiment, the preset expansion library stores a set of common vocabulary compiled in advance by experts, and the vocabulary can be classified and stored by application field. Therefore, during matching, the application field of the data to be extracted can be determined from the extraction task, and only the common vocabulary of that application field is retrieved for matching. This improves matching efficiency and makes the second words in the resulting second word groups more accurate.
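A minimal sketch of the matching in step 303, assuming the preset expansion library is a mapping from application field to `{word: vector}` entries whose vectors live in the same space as the semantic vectors, and assuming cosine similarity with a threshold and a top-k cut-off as the matching rule. The library layout, `threshold` and `top_k` are hypothetical; the patent does not state how a match is scored.

```python
import math
from typing import Mapping, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_second_words(semantic_vec: Sequence[float],
                       library: Mapping[str, Mapping[str, Sequence[float]]],
                       domain: str, threshold: float = 0.8, top_k: int = 3) -> list[str]:
    """Search only the vocabulary of one application field (hypothetical layout)
    and return up to top_k words whose similarity reaches the threshold,
    most similar first (ties broken alphabetically)."""
    entries = library.get(domain, {})
    scored = [(cosine(semantic_vec, vec), word) for word, vec in entries.items()]
    hits = sorted((pair for pair in scored if pair[0] >= threshold),
                  key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in hits[:top_k]]
```

Restricting the search to `library[domain]` mirrors the per-field classification described above: only one field's vocabulary is scanned, and an unknown field simply yields an empty second word group.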

304: Arrange the words in the h second word groups according to the positional order of each first word in the feature field to obtain x extension fields.

In this embodiment, each first word in the original feature field may be replaced with a second word from its corresponding second word group, and the words, kept in their original order, form an extension field.

Meanwhile, in an alternative embodiment, semantic extraction may be performed on the original feature field to obtain a semantic vector; the similarity between this vector and the semantic vector of each generated extension field is then calculated, and extension fields whose semantics deviate too far from the original are removed.
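The combination in step 304 and the optional semantic-deviation filter can be sketched as follows. None of these assumptions come from the patent: each position may either keep its first word or take one of its second words, so the candidates form a Cartesian product; words are joined with `sep` in their original order; and `encode` is any text encoder. `char_count_encoder` is a toy, hypothetical encoder included only so the example runs.

```python
import math
from collections import Counter
from itertools import product
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0.0 or norm_b == 0.0 else dot / (norm_a * norm_b)


def char_count_encoder(text: str) -> list[float]:
    """Toy stand-in for a semantic encoder: counts of the letters a-z (hypothetical)."""
    counts = Counter(text.lower())
    return [float(counts[c]) for c in "abcdefghijklmnopqrstuvwxyz"]


def generate_extension_fields(first_words: Sequence[str],
                              second_groups: Sequence[Sequence[str]],
                              encode: Callable[[str], Sequence[float]],
                              min_similarity: float = 0.5,
                              sep: str = "") -> list[str]:
    """Build candidate fields position by position, keep the original word order,
    skip the unchanged original, and drop candidates whose encoded vector is
    less similar to the original field than min_similarity."""
    if len(first_words) != len(second_groups):
        raise ValueError("need exactly one second word group per first word")
    original = sep.join(first_words)
    original_vec = encode(original)
    options = [[word] + [s for s in group if s != word]
               for word, group in zip(first_words, second_groups)]
    fields: list[str] = []
    for combo in product(*options):
        candidate = sep.join(combo)
        if candidate == original or candidate in fields:
            continue
        if cosine(encode(candidate), original_vec) >= min_similarity:
            fields.append(candidate)
    return fields
```

For example, `generate_extension_fields(["big", "sale"], [["large"], ["discount"]], char_count_encoder, min_similarity=0.0, sep=" ")` returns the three variants "big discount", "large sale" and "large discount". Because the number of candidates grows multiplicatively with the group sizes, a real implementation would likely cap the group sizes before taking the product.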

203: Determine a plurality of databases according to the extraction task, and cluster the plurality of databases to obtain z database groups.

In this embodiment, z is an integer greater than or equal to 1. This embodiment further provides a method for determining a plurality of databases according to the extraction task and clustering them to obtain z database groups. As shown in FIG. 5, the method includes:

501: Determine the service type corresponding to the data to be extracted according to the extraction task.

502: Determine a plurality of databases in a database group according to the service type.

In this embodiment, the plurality of databases are the databases associated with the services of the service type determined in step 501. In brief, each of them may be a database that stores data generated by a service of that service type or that provides data to such a service. For example, the service table of each database may be checked to determine whether any of the services connected to it has the same service type as the one determined in step 501; if so, the database is included in the plurality of databases.
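The database selection in steps 501 and 502 amounts to a filter over each database's service table. In this sketch the service tables are modeled, hypothetically, as a mapping from database name to `{service name: service type}`; the database and service names used below are illustrative.

```python
from typing import Mapping


def select_databases(service_tables: Mapping[str, Mapping[str, str]],
                     target_type: str) -> list[str]:
    """Return, in sorted order, the databases whose service table lists at least
    one connected service of the target service type."""
    return sorted(
        db for db, services in service_tables.items()
        if target_type in services.values()
    )
```

For instance, with tables {"db_loan": {"personal_loan": "credit", "card_payment": "payment"}, "db_fund": {"fund_sales": "wealth"}, "db_risk": {"credit_scoring": "credit"}}, selecting the type "credit" returns ["db_loan", "db_risk"].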

503: Determine the relationship score between database A and database B among the plurality of databases to obtain k relationship scores.

In this embodiment, database A and database B are any two different databases among the plurality of databases, each of the k relationship scores indicates the degree of association between the data stored in the two databases it corresponds to, and k is an integer greater than or equal to 1.

Specifically, in this embodiment, the number of services shared by database A and database B may be determined, together with the total number of services of database A and the total number of services of database B. The ratio of the number of shared services to the total number of services of database A and the ratio of the number of shared services to the total number of services of database B are then summed to obtain the relationship score between database A and database B.

That is, the relationship score may be expressed by formula ①:

$$g = \frac{j}{s_A} + \frac{j}{s_B} \qquad ①$$

where g represents the relationship score between database A and database B, j represents the number of services shared by database A and database B, s_A represents the total number of services of database A, and s_B represents the total number of services of database B.
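Formula ① translates directly into code. This sketch assumes each database's services are available as a set of service identifiers, which is a hypothetical representation. Because j can be at most min(s_A, s_B), each term is at most 1, so g ranges from 0 (no shared services) to 2 (identical service sets).

```python
def relationship_score(services_a: set[str], services_b: set[str]) -> float:
    """Formula ①: g = j / s_A + j / s_B, where j is the number of services the
    two databases share and s_A, s_B are their total numbers of services."""
    if not services_a or not services_b:
        raise ValueError("each database must be connected to at least one service")
    j = len(services_a & services_b)
    return j / len(services_a) + j / len(services_b)
```

For example, if database A has 4 services and database B has 2, both of which A also has, then g = 2/4 + 2/2 = 1.5.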

504: Cluster the plurality of databases according to the k relationship scores to obtain z database groups.

In this embodiment, databases whose relationship scores are greater than a preset threshold may be clustered into one database group. In this way, databases serving the same services are clustered together, which improves the efficiency of subsequent data extraction.

In an alternative embodiment, the databases may also be classified using a database relationship table. For example, a primary database and its backup database are classified into one category, or databases serving the same business chain are classified into one category. In addition, when only one database is determined, the clustering process can be skipped and data extraction performed directly.

204: Perform data extraction in each of the z database groups according to the feature field group to obtain z extraction results in one-to-one correspondence with the z database groups.
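Step 504 leaves open whether "greater than a preset threshold" must hold for every pair within a group or only needs to link databases through a chain of pairs. This sketch takes the chained reading, which is single-linkage clustering computed as connected components with a union-find structure. `score` can be any pairwise function, such as formula ①, and the `threshold` value is hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence


def cluster_databases(databases: Sequence[str],
                      score: Callable[[str, str], float],
                      threshold: float) -> list[list[str]]:
    """Group databases so that any two linked by a chain of pairwise scores
    above `threshold` end up in the same group (connected components)."""
    parent = {db: db for db in databases}

    def find(db: str) -> str:
        # Walk up to the root, halving the path on the way for efficiency.
        while parent[db] != db:
            parent[db] = parent[parent[db]]
            db = parent[db]
        return db

    for a, b in combinations(databases, 2):
        if score(a, b) > threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for db in databases:
        groups.setdefault(find(db), []).append(db)
    return list(groups.values())
```

A single input database comes back as one group of its own, which matches the note that clustering can be skipped when only one database is found. A stricter "every pair above threshold" reading would instead require clique detection.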

In this embodiment, an artificial intelligence based index identifier is provided to perform data extraction in each of the z database groups. Specifically, the index identifier may be a robotic process automation (RPA) crawler that can move between different databases and inspect the fields and indexes in each table. The similarity between each inspected index and the fields in the feature field group is then calculated, and the data whose similarity meets a set condition is extracted to obtain the extraction result.

205: Generate an extraction result heterogeneous graph according to the z extraction results, and adjust the extraction result heterogeneous graph to obtain an extraction result isomorphic graph.

In this embodiment, each of the z extraction results may include q pieces of extracted data, q being an integer greater than or equal to 1. Accordingly, the z extraction results include z×q pieces of extracted data in total.

On this basis, in this embodiment, keyword extraction may be performed on each of the z×q pieces of extracted data to obtain z×q first keywords; in short, the z×q first keywords are in one-to-one correspondence with the z×q pieces of extracted data. Then, de-duplication is performed on the z×q first keywords to obtain p second keywords, where p is an integer greater than or equal to 1 and less than or equal to z×q. Next, the p second keywords are taken as p word nodes and the z×q pieces of extracted data as z×q data nodes, and each of the z×q data nodes is connected to its corresponding word node to obtain a result heterogeneous graph. Finally, in the result heterogeneous graph, the data nodes connected to each of the p word nodes are connected to one another, and the p word nodes are deleted to obtain a result isomorphic graph.
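The field-matching part of step 204 can be sketched as follows. The RPA navigation across systems is not shown: table schemas are assumed to be already collected into a hypothetical `{database: {table: [column names]}}` mapping, and `difflib.SequenceMatcher` string similarity is swapped in for the patent's unspecified AI-based index identifier. The threshold plays the role of the "set condition".

```python
from difflib import SequenceMatcher
from typing import Mapping, Sequence


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_columns(schemas: Mapping[str, Mapping[str, Sequence[str]]],
                  field_group: Sequence[str], threshold: float = 0.8):
    """Return (database, table, column, best matching field, score) for every
    column whose best similarity to the feature field group reaches threshold."""
    if not field_group:
        return []
    hits = []
    for db, tables in schemas.items():
        for table, columns in tables.items():
            for column in columns:
                best_field = max(field_group, key=lambda f: field_similarity(column, f))
                score = field_similarity(column, best_field)
                if score >= threshold:
                    hits.append((db, table, column, best_field, round(score, 3)))
    return hits
```

The matched (database, table, column) triples indicate where the data would then be read from. A production identifier would more likely compare semantic vectors, as in step 302, than raw strings.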

Specifically, assume that there are 5 extracted data: data 1, data 2, data 3, data 4, and data 5. The keywords of the data 1 are "loan" and "interest", the keywords of the data 2 are "foundation" and "interest", the keywords of the data 3 are "loan", the keywords of the data 4 are "interest", and the keywords of the data 5 are "foundation". Through the repeated processing, 3 keywords of loan, fund and interest, which are not repeated mutually, are obtained and are respectively used as word nodes. Meanwhile, data 1, data 2, data 3, data 4 and data 5 are respectively used as data nodes, and each data node is connected with the corresponding word node, so that a result heterograph shown in fig. 6 can be obtained.

Based on the result heterograph of fig. 6, the data nodes connected to each word node are connected to one another, and each word node, together with its connecting lines to the data nodes, is deleted, giving the result isomorphic graph shown in fig. 7. As the result isomorphic graph shows, data 1 and data 2 have the most edges (three each), so they are the most strongly correlated with the other data extracted in this run. The result isomorphic graph therefore makes the relationship center of the extracted data clearly visible, which reduces the complexity of subsequent analysis and processing.
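The heterograph-to-isomorphic-graph step can be sketched in Python as below, using the five-record example from the text. This is a minimal illustration under stated assumptions, not the patented implementation: keyword extraction is assumed to have been done already (the source does not name an extractor), each data node may carry several keywords as in the example, and the helper names `build_heterograph` and `to_isomorphic_graph` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_heterograph(keywords_by_data):
    """Bipartite result heterograph: edges between data nodes and word nodes.

    Returns the de-duplicated word nodes (the p second keywords) and the edge set.
    """
    word_nodes = sorted({kw for kws in keywords_by_data.values() for kw in kws})
    edges = {(data, kw) for data, kws in keywords_by_data.items() for kw in set(kws)}
    return word_nodes, edges


def to_isomorphic_graph(data_nodes, word_nodes, edges):
    """Connect all data nodes that share a word node, then drop the word nodes."""
    attached = defaultdict(set)  # word node -> data nodes linked to it
    for data, kw in edges:
        attached[kw].add(data)
    iso_edges = set()
    for kw in word_nodes:
        for a, b in combinations(sorted(attached[kw]), 2):
            iso_edges.add((a, b))  # sorted pairs, so shared pairs are not duplicated
    # Adjacency over data nodes only; word nodes no longer appear.
    adjacency = {d: set() for d in data_nodes}
    for a, b in iso_edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


keywords_by_data = {
    "data 1": ["loan", "interest"],
    "data 2": ["fund", "interest"],
    "data 3": ["loan"],
    "data 4": ["interest"],
    "data 5": ["fund"],
}
word_nodes, edges = build_heterograph(keywords_by_data)
iso = to_isomorphic_graph(list(keywords_by_data), word_nodes, edges)
degrees = {d: len(n) for d, n in iso.items()}
hubs = [d for d, deg in degrees.items() if deg == max(degrees.values())]
print(word_nodes)  # ['fund', 'interest', 'loan']
print(degrees)     # {'data 1': 3, 'data 2': 3, 'data 3': 1, 'data 4': 2, 'data 5': 1}
print(hubs)        # ['data 1', 'data 2']
```

Counting each data node's edges in the isomorphic graph reproduces the observation in the text: data 1 and data 2 are the relationship centers of this example.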

206: Transmitting the z extraction results and the extraction result isomorphic graph to a display device for display.

In summary, in the artificial intelligence based multi-database data extraction method provided by the invention, the general index or field of the data to be extracted is determined from the extraction task and then expanded to find its hyponyms, synonyms, and the like, so that the data to be extracted is described more comprehensively. Next, the databases that may contain the data to be extracted are determined from the extraction task and clustered, so that databases whose stored data are related at the service level are grouped together; linking the stored data through these service associations improves the efficiency of subsequent data extraction. In addition, data in different databases are crawled by combining an RPA (robotic process automation) crawler with AI field identification, which solves the problem that a data extraction method for one database cannot be reused for others, and the RPA crawler replaces repetitive manual work, further improving extraction efficiency. Finally, after all qualifying data have been extracted, a heterograph and an isomorphic graph are constructed from the extraction results to visualize the correlations among the extracted data; the isomorphic graph in particular makes the relationship center of the extracted data clearly visible, reducing the complexity of subsequent analysis and processing.

Referring to fig. 8, fig. 8 is a functional block diagram of a multi-database data extraction device based on artificial intelligence according to an embodiment of the present application. As shown in fig. 8, the artificial intelligence based multi-database data extraction apparatus 800 includes:

An expansion module 801, configured to determine a feature field according to an extraction task, where the feature field is a general field of the data that the extraction task requires to be extracted, perform field expansion on the feature field to obtain x expansion fields, and combine the feature field and the x expansion fields to obtain a feature field group, where x is an integer greater than or equal to 1;

A clustering module 802, configured to determine a plurality of databases according to the extraction task, and perform clustering processing on the plurality of databases to obtain z database groups, where z is an integer greater than or equal to 1;

An extraction module 803, configured to perform data extraction in each of the z database groups according to the feature field group to obtain z extraction results, where the z extraction results correspond one-to-one to the z database groups;

A processing module 804, configured to generate an extraction result heterograph according to the z extraction results, and adjust the extraction result heterograph to obtain an extraction result isomorphic graph; and

A display module 805, configured to send the z extraction results and the extraction result isomorphic graph to a display device for display.

In the embodiment of the present invention, in terms of performing field expansion on the feature field to obtain x expansion fields, the expansion module 801 is specifically configured to:

perform word segmentation on the feature field to obtain h first words, where h is an integer greater than or equal to 1;

perform semantic extraction on each of the h first words according to its adjacent words in the feature field to obtain h semantic vectors, where the h semantic vectors correspond one-to-one to the h first words;

match each first word in a preset expansion library according to its semantic vector to obtain h second word groups, where the h second word groups correspond one-to-one to the h first words; and

order the words in the h second word groups according to the position of each first word in the feature field to obtain the x expansion fields.
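The matching and ordering steps above might look like the following Python sketch. The source does not specify the similarity measure, the size of each second word group, or exactly how the groups are combined, so this sketch assumes: cosine similarity against a toy expansion library (hypothetical, storing vectors in the same space as the semantic vectors), a fixed threshold, each first word kept in its own group, and one expansion field per in-order combination of one word from each group (a Cartesian product), excluding the unchanged feature field. The helper names are hypothetical.

```python
import math
from itertools import product


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_word_group(semantic_vector, expansion_library, threshold=0.9):
    """Second word group: library words whose vectors are close enough (assumed rule)."""
    scored = sorted(
        ((cosine(semantic_vector, vec), word) for word, vec in expansion_library.items()),
        reverse=True,
    )
    return [word for score, word in scored if score >= threshold]


def expand_field(first_words, semantic_vectors, expansion_library, threshold=0.9, sep=" "):
    """Combine one word per second word group, keeping the first words' position order."""
    groups = []
    for word, vec in zip(first_words, semantic_vectors):
        matches = match_word_group(vec, expansion_library, threshold)
        # Assumption: the original word stays in its group, so a field may vary one position.
        groups.append([word] + [w for w in matches if w != word])
    original = sep.join(first_words)
    fields = [sep.join(combo) for combo in product(*groups)]
    return [f for f in fields if f != original]  # the x expansion fields


# Toy 2-D vectors; a real system would use the semantic vectors of the first words.
library = {
    "loan": [1.0, 0.0],
    "credit": [0.95, 0.05],
    "rate": [0.0, 1.0],
    "yield": [0.1, 0.9],
    "fund": [0.6, 0.6],
}
first_words = ["loan", "rate"]
semantic_vectors = [[1.0, 0.0], [0.0, 1.0]]
expansion_fields = expand_field(first_words, semantic_vectors, library)
feature_field_group = [" ".join(first_words)] + expansion_fields
print(expansion_fields)  # ['loan yield', 'credit rate', 'credit yield']
```

Because the number of combinations grows multiplicatively with the group sizes, a practical version would likely cap each second word group (for example, top-k matches) before taking the product.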

In the embodiment of the present invention, in terms of performing semantic extraction on each first word according to its adjacent words in the feature field to obtain h semantic vectors, the expansion module 801 is specifically configured to:

perform word embedding on each first word and on its adjacent words, respectively, to obtain a first word vector for the first word and an adjacent word vector for each adjacent word;

determine the word length of each adjacent word, and multiply the adjacent word vector by that word length to obtain a second word vector; and

concatenate the first word vector and the second word vector to obtain the semantic vector of the first word.
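A minimal Python sketch of the semantic-vector construction above follows. Assumptions, none of which come from the source: the embedding is a toy lookup table standing in for a trained model, adjacent words are the immediate left and right neighbors in the segmented field, word length is the character count, and when a word has two neighbors their length-weighted vectors are summed (the source does not say how several adjacent words are combined). `adjacent_words` and `semantic_vector` are hypothetical names.

```python
def adjacent_words(words, i):
    """Words immediately before and after position i in the segmented feature field."""
    return [words[j] for j in (i - 1, i + 1) if 0 <= j < len(words)]


def semantic_vector(word, neighbors, embed):
    """Concatenate the first word vector with the length-weighted adjacent word vector.

    `embed` is any word -> list[float] lookup (hypothetical here). Summing the
    weighted vectors of several neighbors is an assumption of this sketch.
    """
    first = embed(word)
    second = [0.0] * len(first)
    for nb in neighbors:
        length = len(nb)  # word length taken as the character count
        second = [s + length * x for s, x in zip(second, embed(nb))]
    return first + second  # "splicing" = concatenation


# Toy embedding table standing in for a trained word-embedding model.
table = {"home": [0.5, 0.5], "loan": [1.0, 0.0], "rate": [0.0, 1.0]}
words = ["home", "loan", "rate"]  # h = 3 first words from word segmentation
vectors = [semantic_vector(w, adjacent_words(words, i), table.__getitem__)
           for i, w in enumerate(words)]
print(vectors[1])  # [1.0, 0.0, 2.0, 6.0]
```

Each semantic vector is twice the embedding dimension, since it is the first word vector followed by the second word vector.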

In the embodiment of the present invention, in terms of determining a plurality of databases according to the extraction task and clustering the plurality of databases to obtain z database groups, the clustering module 802 is specifically configured to:

determine, according to the extraction task, the service type corresponding to the data to be extracted;

determine the plurality of databases in a database group according to the service type;

determine the relationship score between a database A and a database B among the plurality of databases to obtain k relationship scores, where database A and database B are any two different databases among the plurality of databases, each of the k relationship scores identifies the degree of association between the data stored in its corresponding two databases, and k is an integer greater than or equal to 1; and

cluster the plurality of databases according to the k relationship scores to obtain the z database groups.
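The source specifies neither how a relationship score is computed nor which clustering algorithm is used. The Python sketch below assumes a hypothetical score table, links every pair whose score reaches a threshold, and takes the connected components (found with union-find) as the z database groups. When every unordered pair of n databases is scored, k = n(n-1)/2; with the n = 5 databases here, k = 10.

```python
from itertools import combinations


def cluster_databases(databases, score, threshold=0.5):
    """Group databases linked by a relationship score >= threshold (connected components).

    `score(a, b)` is a hypothetical function giving the association degree
    between the data stored in databases a and b.
    """
    parent = {db: db for db in databases}

    def find(db):
        # Union-find root lookup with path halving.
        while parent[db] != db:
            parent[db] = parent[parent[db]]
            db = parent[db]
        return db

    pairs = list(combinations(databases, 2))  # every unordered pair A, B
    for a, b in pairs:
        if score(a, b) >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for db in databases:
        groups.setdefault(find(db), []).append(db)
    return list(groups.values()), len(pairs)  # z database groups, k scores


# Hypothetical scores; unlisted pairs are treated as unrelated (0.0).
SCORES = {
    frozenset(("loans", "repayments")): 0.8,
    frozenset(("funds", "fund_nav")): 0.9,
    frozenset(("loans", "funds")): 0.2,
}
dbs = ["loans", "repayments", "funds", "fund_nav", "hr"]
groups, k = cluster_databases(dbs, lambda a, b: SCORES.get(frozenset((a, b)), 0.0))
print(k, groups)  # 10 [['loans', 'repayments'], ['funds', 'fund_nav'], ['hr']]
```

Threshold linking is a form of single-linkage clustering, so long chains of moderately related databases can merge into one group; a hierarchical or graph-partitioning method could be substituted if that matters.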

In an embodiment of the present invention, each of the z extraction results includes q pieces of extracted data, where q is an integer greater than or equal to 1. On this basis, in terms of generating an extraction result heterograph according to the z extraction results and adjusting it to obtain an extraction result isomorphic graph, the processing module 804 is specifically configured to:

extract a keyword from each of the z×q pieces of extracted data to obtain z×q first keywords, where the z×q first keywords correspond one-to-one to the z×q pieces of extracted data;

perform de-duplication on the z×q first keywords to obtain p second keywords, where p is an integer greater than or equal to 1 and less than or equal to z×q;

take the p second keywords as p word nodes and the z×q pieces of extracted data as z×q data nodes, and connect each data node to its corresponding word node to obtain a result heterograph; and

in the result heterograph, connect to one another the data nodes connected to each of the p word nodes, and delete the p word nodes to obtain a result isomorphic graph.

In the embodiment of the present invention, in terms of performing data extraction in each of the z database groups according to the feature field group to obtain z extraction results, the extraction module 803 is specifically configured to:

perform data extraction in each database group using a robotic process automation (RPA) crawler to obtain the z extraction results.
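The RPA crawler and AI field identification are not detailed in the source, so the Python sketch below is only a stand-in showing the shape of "one extraction result per database group": it uses the standard `sqlite3` module and matches column names exactly against the feature field group. The database layout, table names, and helper names are hypothetical; a real system would replace exact name matching with learned field identification and would handle non-SQLite sources.

```python
import sqlite3


def extract_group(connections, field_group):
    """Extraction result for one database group: values of columns named in the field group."""
    wanted = {f.lower() for f in field_group}
    result = []
    for conn in connections:
        tables = [r[0] for r in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'")]
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
            columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
            hits = [c for c in columns if c.lower() in wanted]
            if not hits:
                continue
            column_sql = ", ".join(f'"{c}"' for c in hits)
            for row in conn.execute(f'SELECT {column_sql} FROM "{table}"'):
                result.append({"table": table, **dict(zip(hits, row))})
    return result


def extract_all(database_groups, field_group):
    """One extraction result per database group, so z groups give z results."""
    return [extract_group(group, field_group) for group in database_groups]


# Two toy in-memory databases, each forming its own database group.
loans_db = sqlite3.connect(":memory:")
loans_db.execute("CREATE TABLE loans (id INTEGER, interest_rate REAL, customer TEXT)")
loans_db.execute("INSERT INTO loans VALUES (1, 0.045, 'A')")
funds_db = sqlite3.connect(":memory:")
funds_db.execute("CREATE TABLE funds (id INTEGER, fund_name TEXT, annual_return REAL)")
funds_db.execute("INSERT INTO funds VALUES (1, 'F1', 0.03)")

results = extract_all([[loans_db], [funds_db]], ["interest_rate", "annual_return"])
print(results)
# [[{'table': 'loans', 'interest_rate': 0.045}], [{'table': 'funds', 'annual_return': 0.03}]]
```

Table and column names are interpolated into SQL here only because they are read back from the database's own catalog; names from an untrusted source would need proper escaping.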

Referring to fig. 9, fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 9, the electronic device 900 includes a transceiver 901, a processor 902, and a memory 903, which are connected by a bus 904. The memory 903 is used to store computer programs and data, and the data stored in the memory 903 can be transferred to the processor 902.

The processor 902 is configured to read a computer program in the memory 903 to perform the following operations:

In the embodiment of the present invention, in terms of performing field expansion on the feature field to obtain x expansion fields, the processor 902 is specifically configured to perform the following operations:

In the embodiment of the present invention, in terms of performing semantic extraction on each first word according to its adjacent words in the feature field to obtain h semantic vectors, the processor 902 is specifically configured to perform the following operations:

In the embodiment of the present invention, in terms of determining a plurality of databases according to the extraction task and clustering the plurality of databases to obtain z database groups, the processor 902 is specifically configured to perform the following operations:

In an embodiment of the present invention, each of the z extraction results includes q pieces of extracted data, where q is an integer greater than or equal to 1. On this basis, in terms of generating an extraction result heterograph according to the z extraction results and adjusting it to obtain an extraction result isomorphic graph, the processor 902 is specifically configured to perform the following operations:

In an embodiment of the present invention, in terms of performing data extraction in each of the z database groups according to the feature field group to obtain z extraction results, the processor 902 is specifically configured to perform the following operations:

It should be understood that the artificial intelligence based multi-database data extraction device in the present application may include a smartphone (such as an Android phone, an iOS phone, or a Windows Phone), a tablet computer, a palmtop computer, a notebook computer, a mobile Internet device (MID), a robot, a wearable device, and the like. These devices are merely examples rather than an exhaustive list, and the device is not limited to them. In practical applications, the artificial intelligence based multi-database data extraction device may further include an intelligent vehicle terminal, a computer device, and the like.

From the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software in combination with a hardware platform. Based on this understanding, all or the part of the technical solution of the present invention that contributes over the background art may be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments, or in parts of the embodiments, of the present invention.

Accordingly, embodiments of the present application also provide a computer readable storage medium storing a computer program for execution by a processor to perform part or all of the steps of any one of the artificial intelligence based multi-database data extraction methods described in the method embodiments above. For example, the storage medium may include a hard disk, a floppy disk, an optical disk, a magnetic tape, a magnetic disk, a flash memory, etc.

Embodiments of the present application also provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program operable to cause a computer to perform part or all of the steps of any one of the artificial intelligence-based multi-database data extraction methods described in the method embodiments above.

It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are alternative embodiments, and that the acts and modules involved are not necessarily required for the present application.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices, or units, and may be electrical or in other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented either in hardware or in software program modules.

The integrated units, if implemented as software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application in essence, or the part that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present application. The aforementioned memory includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium capable of storing program code.

Those of ordinary skill in the art will appreciate that all or some of the steps in the methods of the above embodiments may be performed by a program instructing related hardware. The program may be stored in a computer-readable memory, which may include a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.

The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the application; the above description of the embodiments is merely intended to help understand the method and core idea of the application. Meanwhile, those skilled in the art may make changes to the specific implementations and the application scope based on the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

1. An artificial intelligence based multi-database data extraction method, the method comprising:

Determining a characteristic field according to an extraction task, wherein the characteristic field is a general field of data required to be extracted by the extraction task;

Carrying out data extraction in each database group in the z database groups according to the characteristic field group to obtain z extraction results, wherein the z extraction results are in one-to-one correspondence with the z database groups, each extraction result in the z extraction results comprises q extraction data, and q is an integer greater than or equal to 1;

Extracting a keyword from each of the z×q pieces of extracted data to obtain z×q first keywords, wherein the z×q first keywords correspond one-to-one to the z×q pieces of extracted data;

Performing de-duplication on the z×q first keywords to obtain p second keywords, wherein p is an integer greater than or equal to 1 and less than or equal to z×q;

Taking the p second keywords as p word nodes and the z×q pieces of extracted data as z×q data nodes, and connecting each of the z×q data nodes to its corresponding word node to obtain an extraction result heterograph;

In the extraction result heterograph, connecting to one another the data nodes connected to each of the p word nodes, and deleting the p word nodes to obtain an extraction result isomorphic graph; and

Sending the z extraction results and the extraction result isomorphic graph to a display device for display.

2. The method of claim 1, wherein performing field expansion on the feature field to obtain x expansion fields includes:

Performing semantic extraction on each of the h first words according to its adjacent words in the feature field to obtain h semantic vectors, wherein the h semantic vectors correspond one-to-one to the h first words;

Ordering the words in the h second word groups according to the position of each first word in the feature field to obtain the x expansion fields.

3. The method of claim 2, wherein performing semantic extraction on each of the h first words according to its adjacent words in the feature field to obtain h semantic vectors includes:

Determining word lengths of the adjacent words, and multiplying the word lengths of the adjacent words by the adjacent word vectors to obtain second word vectors;

4. The method according to claim 1, wherein determining a plurality of databases according to the extraction task, and clustering the plurality of databases to obtain z database groups, includes:

Determining the databases in the database group according to the service type;

Determining the relationship scores between a database A and a database B in the plurality of databases to obtain k relationship scores, wherein the database A and the database B are any two different databases in the plurality of databases, each relationship score in the k relationship scores is used for identifying the association degree between the data stored in the two databases corresponding to each relationship score, and k is an integer greater than or equal to 1;

And clustering the databases according to the k relation scores to obtain the z database groups.

5. The method according to claim 1, wherein performing data extraction in each of the z database groups according to the feature field group to obtain z extraction results includes:

Performing data extraction in each database group using a robotic process automation (RPA) crawler to obtain the z extraction results.

6. An artificial intelligence based multi-database data extraction apparatus, the apparatus comprising:

the expansion module is used for determining a characteristic field according to an extraction task, wherein the characteristic field is a general field of data required to be extracted by the extraction task, performing field expansion on the characteristic field to obtain x expansion fields, and combining the characteristic field and the x expansion fields to obtain a characteristic field group, wherein x is an integer greater than or equal to 1;

The extraction module is used for carrying out data extraction in each database group in the z database groups according to the characteristic field group to obtain z extraction results, wherein the z extraction results are in one-to-one correspondence with the z database groups, each extraction result in the z extraction results comprises q extraction data, and q is an integer greater than or equal to 1;

The processing module is used for extracting a keyword from each of the z×q pieces of extracted data to obtain z×q first keywords, wherein the z×q first keywords correspond one-to-one to the z×q pieces of extracted data; performing de-duplication on the z×q first keywords to obtain p second keywords, wherein p is an integer greater than or equal to 1 and less than or equal to z×q; taking the p second keywords as p word nodes and the z×q pieces of extracted data as z×q data nodes, and connecting each of the z×q data nodes to its corresponding word node to obtain an extraction result heterograph; and, in the extraction result heterograph, connecting to one another the data nodes connected to each of the p word nodes, and deleting the p word nodes to obtain an extraction result isomorphic graph;

7. The apparatus of claim 6, wherein, in performing field expansion on the feature field to obtain x expansion fields, the expansion module is specifically configured to:

8. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by the processor, the one or more programs comprising instructions for performing the steps of the method of any of claims 1-5.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any of claims 1-5.