CN114579104A

CN114579104A - Data analysis scene generation method, device, equipment and storage medium

Info

Publication number: CN114579104A
Application number: CN202210207332.1A
Authority: CN
Inventors: 李田雨
Original assignee: Agricultural Bank of China
Current assignee: Agricultural Bank of China
Priority date: 2022-03-04
Filing date: 2022-03-04
Publication date: 2022-06-03

Abstract

The invention discloses a method, a device, equipment and a storage medium for generating a data analysis scene. The method comprises the following steps: receiving a scene demand file to be processed, and performing semantic analysis and data extraction processing on the scene demand file by using a pre-training language model to obtain data analysis scene information corresponding to the scene demand file; acquiring a candidate scene information set matched with the scene demand file from a data analysis scene pool based on the data analysis scene information; and assembling a data query statement and a front-end component code of the data analysis scene corresponding to the scene demand file according to the candidate scene information in the candidate scene information set. According to the technical scheme, the beneficial effects of automatically creating the adaptive data analysis scene and the front-end code frame according to the scene demand file are achieved.

Description

Data analysis scene generation method, device, equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for generating a data analysis scenario.

Background

In a transaction management system, business data analysis scenes of all service lines are designed and realized mainly in a mode of manually analyzing scenes, manually searching data sources and the like.

However, in the existing requirements analysis process for business data analysis functions, business personnel lack data analysis knowledge, so the business data analysis scenarios they can propose fall within a very narrow range. This limits the development of data analysis functions: because the proposed scenarios are limited, data coverage is not wide enough, and a large amount of data and many data tables remain unused.

Disclosure of Invention

The invention provides a method, a device, equipment and a storage medium for generating a data analysis scene, which are used for solving the problem that the commercial data analysis scene is limited when commercial data is manually analyzed.

According to an aspect of the present invention, a method for generating a data analysis scenario is provided, including:

receiving a scene demand file to be processed, and performing semantic analysis and data extraction processing on the scene demand file by using a pre-training language model to obtain data analysis scene information corresponding to the scene demand file;

acquiring a candidate scene information set matched with the scene demand file from a data analysis scene pool based on the data analysis scene information;

and assembling a data query statement and a front-end component code of the data analysis scene corresponding to the scene demand file according to the candidate scene information in the candidate scene information set.

According to another aspect of the present invention, there is provided a data analysis scenario generation apparatus, including:

the receiving module is used for receiving a scene demand file to be processed, performing semantic analysis and data extraction processing on the scene demand file by using a pre-training language model, and obtaining data analysis scene information corresponding to the scene demand file;

the acquisition module is used for acquiring a candidate scene information set matched with the scene demand file from a data analysis scene pool based on the data analysis scene information;

and the assembling module is used for assembling a data query statement and a front-end component code of the data analysis scene corresponding to the scene demand file according to the candidate scene information in the candidate scene information set.

According to another aspect of the present invention, there is provided an electronic apparatus including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the method of generating a data analysis scenario according to any of the embodiments of the present invention.

According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the method for generating a data analysis scenario according to any one of the embodiments of the present invention when the computer instructions are executed.

According to the technical scheme of the embodiment of the invention, the scene demand file to be processed is received, and the pre-training language model is used for performing semantic analysis and data extraction processing on the scene demand file to obtain data analysis scene information corresponding to the scene demand file; acquiring a candidate scene information set matched with the scene demand file from a data analysis scene pool based on the data analysis scene information; and assembling a data query statement and a front-end component code of a data analysis scene corresponding to the scene demand file according to the candidate scene information in the candidate scene information set, so that the problem that the commercial data analysis scene is limited when commercial data is manually analyzed is solved, and the beneficial effect of automatically creating an adaptive data analysis scene and a front-end code frame according to the scene demand file is achieved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a flowchart of a method for generating a data analysis scenario according to an embodiment of the present invention;

Fig. 2 is a flowchart of another method for generating a data analysis scenario according to a second embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a data analysis scenario generation apparatus according to a third embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device implementing the method for generating a data analysis scenario according to an embodiment of the present invention.

Detailed Description

In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "target," "candidate," and the like in the description and claims of the invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example one

Fig. 1 is a flowchart of a method for generating a data analysis scenario according to an embodiment of the present invention. This embodiment is applicable to the case of automatically generating a data analysis scenario and a front-end code framework adapted to a newly added scenario requirement file. The method can be executed by a data analysis scenario generation device, which can be implemented in hardware and/or software and can be configured in an electronic device. As shown in fig. 1, the method includes:

S110, receiving a scene demand file to be processed, and performing semantic analysis and data extraction processing on the scene demand file by using a pre-training language model to obtain data analysis scene information corresponding to the scene demand file.

The scene requirement file can be a newly added data table, a newly added requirement document, an unused historical design document, log data and the like. A pre-trained language model aims at learning semantics-based word-embedding vectors for natural language processing tasks. It can capture information such as word semantics, contextual semantics, syntactic structure, semantic roles and reference relationships, and is an important technology for extracting information from text data. Data analysis scene information may be understood as a semantic representation relating to the current scene.

Data analysis here mainly refers to using mass user data to extract, by means such as modeling and abstraction, information that has commercial value and can provide direct or indirect guidance for commercial operation. For example, typical outputs of business data analysis include building a user portrait, which is significant for understanding target users, and analyzing indexes such as liveness, retention rate, conversion rate and profit rate, which reflect the business value of the system.

In this embodiment, if the scene requirement file is a new data table or unused log data, in order to automatically generate a data analysis scene applicable to the new data table, a pre-training language model may be used to perform semantic analysis and data extraction processing on the new data table to obtain field information and data dictionary information of the data table, and the field information and the data dictionary information are used as data analysis scene information corresponding to the scene requirement file. For a newly added data analysis scenario requirement document or an unused historical design document, in order to perform data traceability analysis and automatically generate a data analysis scenario suitable for the new requirement document, a data analysis scenario requirement text is input into a pre-training language model for semantic parsing and data extraction, and an extracted semantic parsing result is used as newly added data analysis scenario information.

S120, acquiring a candidate scene information set matched with the scene demand file from a data analysis scene pool based on the data analysis scene information.

The data analysis scene pool stores description information of data analysis scenes which are initially designed manually and description information of data analysis scenes which are subsequently matched with unused historical demand files automatically according to an artificial intelligence algorithm.

Optionally, the obtaining a candidate scene information set matched with the scene requirement file from a data analysis scene pool based on the data analysis scene information may include: finding, among all the historical demand files, target historical demand files that have the same elements as the scene demand file; and, for each target historical demand file, querying the data analysis scene pool for the data analysis scenes that use the data in that target historical demand file for data analysis, and adding them to the candidate scene information set.

In this embodiment, because the data analysis scenes corresponding to similar scene demand files have much in common, after the data analysis scene information corresponding to the scene demand file currently to be processed is obtained, the existing historical demand files can be searched for target historical demand files that have the same fields as the current scene demand file, or that have fields whose similarity to those of the current scene demand file, as calculated by the pre-training language model, is high. The definition of high similarity is not fixed and may be set as the scheme requires; for example, two fields may be considered highly similar when their similarity is higher than 90%. For each target historical demand file, the data analysis scene pool is queried for the scenes that use its data for data analysis, and the description information of those scenes is added to the candidate scene information set.
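The candidate lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `DemandFile`, `field_similarity`, `find_target_files` and `collect_candidates` are hypothetical names, the scene pool is assumed to be a mapping from scene name to the demand files whose data the scene uses, and a character-level ratio from `difflib` stands in for the similarity that the pre-training language model would compute.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Set


@dataclass
class DemandFile:
    name: str
    fields: Set[str]


def field_similarity(a: str, b: str) -> float:
    # Assumption: stand-in for the model-based similarity, a ratio in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def find_target_files(current: DemandFile, history: List[DemandFile],
                      threshold: float = 0.9) -> List[DemandFile]:
    """Keep historical files sharing a field, or a highly similar field, with `current`."""
    targets = []
    for past in history:
        if any(
            f == g or field_similarity(f, g) > threshold
            for f in current.fields
            for g in past.fields
        ):
            targets.append(past)
    return targets


def collect_candidates(targets: List[DemandFile],
                       scene_pool: Dict[str, Set[str]]) -> List[str]:
    """Return the scenes in the pool that use data from any target file."""
    target_names = {t.name for t in targets}
    return [scene for scene, used in scene_pool.items() if used & target_names]
```

The 0.9 default mirrors the 90% example given in the text; in practice the threshold would be tuned to the similarity measure actually used.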

Optionally, after acquiring a candidate scene information set matching the scene requirement file from a data analysis scene pool based on the data analysis scene information, the method may further include: screening out target data query sentences of which the maximum frequent item sets are smaller than a preset threshold value from the candidate scene information sets; and deleting the candidate scene information corresponding to the target data query statement from the candidate scene information set.

A frequent item set is a set of items that frequently appear together in the data. Frequent item sets are often used in association rule learning, for example to find the combinations of goods that appear most often in customers' shopping baskets.

In this embodiment, in order to further improve the accuracy of matching data analysis scenes, after one or more candidate scenes corresponding to a scene requirement file are determined, the data query statements of each candidate scene are verified one by one, and candidate scene information in which fields in the data query statements cannot meet requirements is removed from a candidate scene information set, that is, candidate scene information in which the maximum frequent item set of the data query statements is smaller than a preset threshold is deleted, so that data analysis scenes with higher similarity to a currently processed scene requirement file are left in the set. The value of the preset threshold may be 2, 3 or other values.
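The text does not spell out how the maximum frequent item set of a data query statement is computed, so the following is only one possible reading, offered as a sketch. It assumes that the field set of each candidate query statement is treated as a transaction, that an item set is frequent when at least `min_support` candidates contain it, and that a candidate is dropped when the largest frequent item set inside its own field set has fewer than `size_threshold` fields. All function and parameter names are hypothetical, and the brute-force enumeration is only suitable for small inputs.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(transactions: List[FrozenSet[str]],
                      min_support: int) -> List[FrozenSet[str]]:
    """Brute force: every item set contained in at least `min_support` transactions."""
    items = sorted(set().union(*transactions))
    result = []
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if sum(itemset <= t for t in transactions) >= min_support:
                result.append(itemset)
                found = True
        if not found:
            break  # no frequent set of this size, so none of a larger size either
    return result


def filter_candidates(candidates: Dict[str, Set[str]], min_support: int = 2,
                      size_threshold: int = 2) -> Dict[str, Set[str]]:
    """Drop candidates whose largest contained frequent item set is too small."""
    transactions = [frozenset(fields) for fields in candidates.values()]
    frequent = frequent_itemsets(transactions, min_support)
    kept = {}
    for name, fields in candidates.items():
        field_set = frozenset(fields)
        largest = max((len(s) for s in frequent if s <= field_set), default=0)
        if largest >= size_threshold:
            kept[name] = fields
    return kept
```

With a threshold of 2, as in the example values mentioned above, a candidate survives only if it shares at least two fields with some other candidate query.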

S130, assembling a data query statement and a front-end component code of the data analysis scene corresponding to the scene demand file according to the candidate scene information in the candidate scene information set.

In this embodiment, according to the plurality of candidate scene information, a data query statement for the currently processed scene demand file can be assembled that integrates multidimensional data across a plurality of demand files, and front-end page code can be generated intelligently as slot-filling code, using data such as the fields, table names and description information in the candidate scene information, so as to implement page rendering.

Optionally, the assembling a data query statement and a front-end component code of a data analysis scenario corresponding to the scenario requirement file according to the candidate scenario information in the candidate scenario information set may include: assembling a data query statement corresponding to the scene demand file according to the fields, the description information and the table name in the candidate scene information set; sending fields and table names in the candidate scene information set to a front-end component code as parameters; and calculating historical data analysis scenes with similarity greater than a first threshold with the description information in the candidate scene information set, and sending the visualization mode with the most frequent use of the historical data analysis scenes to a front-end component code as a parameter to perform page rendering.
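The assembly step above can be sketched as follows, under stated assumptions. The function names, the `<Chart .../>` template and the 0.6 default for the first threshold are all hypothetical; a character-level ratio from `difflib` stands in for the description similarity, and the query builder handles only a plain field list and table list.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# Hypothetical front-end component template with three slots.
COMPONENT_TEMPLATE = '<Chart type="{vis}" tables="{tables}" fields="{fields}" />'


def assemble_query(fields: List[str], tables: List[str]) -> str:
    """Assemble a simple query statement from candidate fields and table names."""
    return "SELECT " + ", ".join(fields) + " FROM " + ", ".join(tables)


def pick_visualization(description: str, history: List[Tuple[str, str]],
                       first_threshold: float = 0.6) -> Optional[str]:
    """Most frequent visualization among sufficiently similar historical scenes."""
    votes = Counter(
        vis for past_description, vis in history
        if SequenceMatcher(None, description, past_description).ratio() > first_threshold
    )
    return votes.most_common(1)[0][0] if votes else None


def fill_component(fields: List[str], tables: List[str], vis: str) -> str:
    """Fill the slots of the component template with the chosen parameters."""
    return COMPONENT_TEMPLATE.format(
        vis=vis, tables=",".join(tables), fields=",".join(fields)
    )
```

Keeping the visualization choice separate from the template means the same slot-filling code can serve any chart type that the historical scenes suggest.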

In this embodiment, through intelligent design and backtracking trial calculation of the generated results, a usable data analysis scene is automatically generated for a newly added scene demand file or data table. The feasibility of the data analysis is fully assessed, and a data retrieval mode that spans multiple tables and integrates multidimensional data is designed, which effectively eliminates the "data gap" phenomenon in which data actually exists but cannot be retrieved by the demand side. Meanwhile, slot-filling code is generated intelligently, historical data is fully mined, the best-matching visualization chart is selected automatically, and the corresponding front-end page code is generated. This connects data analysis with page development and greatly reduces the selection and communication costs of data analysts and developers.

According to the technical scheme of the embodiment of the invention, the scene requirement file to be processed is received, and the pre-training language model is used for carrying out semantic analysis and data extraction processing on the scene requirement file to obtain data analysis scene information corresponding to the scene requirement file; acquiring a candidate scene information set matched with the scene demand file from a data analysis scene pool based on the data analysis scene information; and assembling a data query statement and a front-end component code of a data analysis scene corresponding to the scene demand file according to the candidate scene information in the candidate scene information set, so that the problem that the commercial data analysis scene is limited when commercial data is manually analyzed is solved, and the beneficial effect of automatically creating an adaptive data analysis scene and a front-end code frame according to the scene demand file is achieved.

Example two

Fig. 2 is a flowchart of another method for generating a data analysis scenario according to a second embodiment of the present invention. On the basis of the above embodiments, the present embodiment further provides a specific step of performing model training on the pre-trained language model. As shown in fig. 2, the method includes:

S210, extracting mapping relation data under different data subject domains from the historical demand file.

The historical requirement files are all existing requirement files, and may include scene requirement documents or data tables that have not been used for data analysis as well as those that have. A data subject domain is generally a collection of closely related data subjects, for example data table fields from the same source, business scenarios with similar logical relationships, similar page designs for data visualization, and data query scripts with similar structures but different elements.

Optionally, the mapping relationship data in the different data topic domains includes: mapping relation data of the data and the data dictionary description; mapping relation data of a data analysis scene and a business rule; mapping relation data of the data analysis scene and the data query statement; and mapping relation data of the data query statement and the visualization mode in the data analysis scene.

Illustratively, the mapping relationship data of the data and the data dictionary description takes the form "[field C1 of table 1] represents [bond basic information]". The mapping relation data of the data analysis scene and the business rule takes the form "the [business rule] of the [overseas branch bond index detail query] scene: query the foreign currency and bond investment index details of the overseas institution according to the date, the bond code, the currency and the transaction combination". The mapping relation data of the data analysis scene and the data query statement takes the form "the [query statement] of the [overseas branch bond index detail analysis] scene: SELECT A1, B2, C3, D4 FROM TABLE1 and TABLE 2". The mapping relation data of the data query statement and the visualization mode in the data analysis scene takes the form "the visualization mode of the [query statement] in the [overseas branch bond index detail analysis] scene is [scatter diagram]".
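As an illustration only, the four kinds of mapping relation data could be held as simple records like the ones below. The class and attribute names are hypothetical and not taken from the source; the example values reuse the illustrative forms given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDictionaryMapping:
    """Data <-> data dictionary description."""
    table: str
    field: str
    description: str


@dataclass(frozen=True)
class SceneRuleMapping:
    """Data analysis scene <-> business rule."""
    scene: str
    business_rule: str


@dataclass(frozen=True)
class SceneQueryMapping:
    """Data analysis scene <-> data query statement."""
    scene: str
    query: str


@dataclass(frozen=True)
class QueryVisualizationMapping:
    """Data query statement <-> visualization mode within a scene."""
    scene: str
    query: str
    visualization: str


SCENE = "overseas branch bond index detail analysis"
QUERY = "SELECT A1, B2, C3, D4 FROM TABLE1 and TABLE 2"  # quoted as in the text

examples = [
    FieldDictionaryMapping("table 1", "C1", "bond basic information"),
    SceneQueryMapping(SCENE, QUERY),
    QueryVisualizationMapping(SCENE, QUERY, "scatter diagram"),
]
```

Frozen dataclasses make the records hashable, so they can be deduplicated in a set during data cleaning.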

In the embodiment, through a highly abstract main body framework, the data analysis requirements of the whole database are compatible in a flexible configuration and flexible expansion mode, data dictionary information is introduced as important training data of an artificial intelligence algorithm, and newly added data table information is fully utilized for data analysis scene design.

If two data subject domains have similar logical structures and metadata designs, the data analysis logic can be reused after fine-tuning. In the prior art, however, requirement documents do not effectively link scenes, background logic and data sources together. Therefore, this embodiment integrates three dimensions of information, namely requirement documents, metadata mappings and the data dictionary, to turn historical design experience into structured data assets that are easy to analyze and use, which effectively supports the construction of the data analysis system in the scenario of the invention.

S220, performing data cleaning and data extraction on the mapping relation data, and performing ambiguity avoidance processing on the resulting mapping relation data.

Optionally, the performing data cleaning and data extraction on the mapping relationship data, and performing ambiguity avoidance processing on the resulting mapping relationship data may include: performing data cleaning and data extraction on the mapping relation data of the data analysis scene and the business rule and on the mapping relation data of the data analysis scene and the data query statement to obtain scene description information; and combining the field data dictionary information in the data query statement with the corresponding scene description information, and merging the second-order features generated by this combination with all the mapping relation data.

It should be noted that, among the mapping relationship data, the mapping relationship data of the data and the data dictionary description and the mapping relationship data of the data analysis scene and the data query statement serve as mapping logic information. The mapping relationship data of the data analysis scene and the business rule serves, after data cleaning and processing, as training data for corpus pre-training. The query statements in the mapping relationship data of the data analysis scene and the data query statement serve, after data cleaning and data extraction (mainly extracting the table names and the dictionary information corresponding to the fields), as training data for corpus pre-training. The mapping relationship data of the data query statement and the visualization mode in the data analysis scene serves as a key parameter for deciding which type of visualization chart a data analysis scene should use.

Corpus pre-training means modeling with the description information of the data analysis scenes in the existing requirement documents as the corpus, to obtain a pre-training language model suitable for the system. The training corpora for corpus pre-training are the data dictionary descriptions of fields and the field data dictionary descriptions after element replacement of keywords. The corpus for structured text mining is the data query statements after element replacement of keywords.

In this embodiment, what the data analysis scenes and the data tables have in common is field information, but field information with the same semantic representation (e.g. the same text description in the data dictionary) in different data analysis scene information may map to different data tables, which is similar to the ambiguity problem in natural language processing. To solve this problem, this embodiment combines the data dictionary information I_m of the fields in a data query statement with the text description information I_n of the corresponding data analysis scene. The second-order features obtained by this combination are merged with the text description information of the original data analysis scene and used as the training input of the pre-training model.
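The exact form of the combined feature is not recoverable from this text, so the sketch below only illustrates the idea under an assumption: each field's data dictionary information I_m is paired with the description I_n of the scene its query belongs to by plain string concatenation. The function names and the separator are hypothetical.

```python
from typing import List, Tuple


def second_order_features(scene_description: str,
                          field_dictionary_infos: List[str],
                          sep: str = " | ") -> List[str]:
    """Pair each field's dictionary text (I_m) with its scene description (I_n)."""
    return [f"{info}{sep}{scene_description}" for info in field_dictionary_infos]


def build_training_inputs(scenes: List[Tuple[str, List[str]]]) -> List[str]:
    """Original scene descriptions plus the combined features for each scene."""
    inputs = []
    for description, infos in scenes:
        inputs.append(description)
        inputs.extend(second_order_features(description, infos))
    return inputs
```

Under this assumption, the same dictionary text appearing in two different scenes yields two distinct features, which is the disambiguation effect the paragraph above describes.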

S230, taking the processed mapping relation data as training data, and performing model training on the pre-training language model based on the multi-head attention mechanism.

In this embodiment, the feature input data of each scene is processed based on the processed mapping relationship data, and pre-training language models are trained separately to obtain a pre-training language model suitable for each subject domain. With such a language model used as a natural language feature extractor, this embodiment can perform similarity calculation and support text classification tasks, and can thereby find, among the existing data analysis scenes, the one that is semantically most similar to the current scene requirement document.
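The similarity search described above can be sketched as a nearest-neighbour lookup over feature vectors. In this sketch a bag-of-words counter is an assumed stand-in for the features the trained language model would produce, and `embed`, `cosine` and `most_similar_scene` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    # Assumption: word counts stand in for language model features.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_scene(requirement: str, scenes: Dict[str, str]) -> str:
    """Name of the existing scene whose description is closest to the requirement."""
    query = embed(requirement)
    return max(scenes, key=lambda name: cosine(query, embed(scenes[name])))
```

Swapping `embed` for a real model's sentence vectors would leave `most_similar_scene` unchanged as long as `cosine` is adapted to dense vectors.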

The multi-head attention mechanism is a core processing mechanism in mainstream deep learning models for natural language processing, such as the Transformer model and the BERT model. It obtains a vector representation of the text that contains contextual semantics through the matrix operations of several attention heads followed by a linear matrix transformation. In this embodiment, the main advantage of the multi-head attention mechanism is in processing the contextual semantics of long texts, which improves the accuracy of tasks such as historical document mining.
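For readers unfamiliar with the mechanism, the following is a minimal pure-Python sketch of standard multi-head scaled dot-product attention as used in Transformer-style models. It is not code from the source: the projection matrices are passed in as plain nested lists, and masking, biases and training are omitted.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention for a single head."""
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def multi_head_attention(x: Matrix,
                         heads: List[Tuple[Matrix, Matrix, Matrix]],
                         w_o: Matrix) -> Matrix:
    """Each head projects x to queries, keys and values; outputs are concatenated
    along the feature axis and passed through the output projection w_o."""
    head_outputs = [
        attention(matmul(x, w_q), matmul(x, w_k), matmul(x, w_v))
        for w_q, w_k, w_v in heads
    ]
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o)
```

Each row of `x` is one token vector, so the output has one context-aware vector per token, which is the representation the paragraph above refers to.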

S240, receiving a scene demand file to be processed, and performing semantic analysis and data extraction processing on the scene demand file by using a pre-training language model to obtain data analysis scene information corresponding to the scene demand file.

S250, acquiring a candidate scene information set matched with the scene demand file from a data analysis scene pool based on the data analysis scene information.

S260, assembling a data query statement and a front-end component code of the data analysis scene corresponding to the scene demand file according to the candidate scene information in the candidate scene information set.

According to the technical scheme of this embodiment, unused historical scene demand files and log data are processed and mined, the field mapping relations under all data subject domains are integrated, and a pre-training language model is established with a deep learning algorithm based on the multi-head attention mechanism. The pre-training language model is then used to perform complex logical semantic analysis and abstract extraction on newly added scene demand files, historical demand files with high similarity are matched according to the results of the semantic analysis, and the key information of those historical demand files is extracted to automatically generate the scene information and the code.

Example three

Fig. 3 is a schematic structural diagram of a data analysis scenario generation apparatus according to a third embodiment of the present invention. As shown in fig. 3, the apparatus includes:

the receiving module 310 is configured to execute receiving of a scene requirement file to be processed, perform semantic analysis and data extraction processing on the scene requirement file by using a pre-training language model, and obtain data analysis scene information corresponding to the scene requirement file;

an obtaining module 320, configured to perform, based on the data analysis scenario information, obtaining a candidate scenario information set matching the scenario requirement file from a data analysis scenario pool;

an assembling module 330, configured to perform assembling, according to candidate scene information in the candidate scene information set, a data query statement and a front-end component code of a data analysis scene corresponding to the scene requirement file.

Optionally, the apparatus further comprises a model training module, which is used for performing the following before the scene demand file to be processed is received and the pre-training language model is used to perform semantic analysis and data extraction processing on it to obtain the corresponding data analysis scene information:

extracting mapping relation data under different data subject domains from the historical demand file;

carrying out data cleaning and data extraction on the mapping relation data, and carrying out ambiguity avoidance processing on the resulting mapping relation data;

and taking the processed mapping relation data as training data, and carrying out model training on a pre-training language model based on a multi-head attention mechanism.

Optionally, the mapping relationship data in the different data topic domains includes:

mapping relation data of the data and the data dictionary description;

mapping relation data of a data analysis scene and a business rule;

mapping relation data of the data analysis scene and the data query statement;

and mapping relation data of the data query statement and the visualization mode in the data analysis scene.

Optionally, the model training module is configured to:

carrying out data cleaning and data extraction operations on the mapping relation data of the data analysis scene and the business rule and the mapping relation data of the data analysis scene and the data query statement to obtain scene description information;

and combining field data dictionary information in the data query statement and corresponding scene description information, and combining the second-order features generated by combination with all mapping relation data.

Optionally, the obtaining module 320 is configured to:

finding out a target historical demand file with the same elements as the scene demand file from all the historical demand files;

and, for each target historical demand file, querying the data analysis scene pool for the data analysis scenes that use the data in that target historical demand file for data analysis, and adding them to the candidate scene information set.

Optionally, the apparatus further comprises:

a verification module configured to, after the candidate scene information set matching the scene demand file has been acquired from the data analysis scene pool based on the data analysis scene information, screen out from the candidate scene information set the target data query statements whose maximum frequent item sets are smaller than a preset threshold value, and delete the candidate scene information corresponding to the target data query statements from the candidate scene information set.
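One possible reading of this verification step is sketched below in Python. The source does not say how the maximum frequent item set of a data query statement is computed, so the sketch makes two labeled assumptions: frequent item sets are mined from the field sets of historical query statements, and a candidate is dropped when the largest frequent item set contained in its own fields has fewer items than the threshold. All function and parameter names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force mining of item sets whose support count >= min_support.

    Suitable only for small inputs; a production system would use Apriori
    or FP-Growth instead.
    """
    items = sorted(set().union(*transactions)) if transactions else []
    frequent = []
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if sum(1 for t in transactions if itemset <= t) >= min_support:
                frequent.append(itemset)
                found = True
        if not found:
            break  # downward closure: no larger frequent item set can exist
    return frequent


def largest_frequent_subset_size(fields, frequent):
    """Size of the largest frequent item set contained in the given fields."""
    fields = frozenset(fields)
    return max((len(s) for s in frequent if s <= fields), default=0)


def verify_candidates(candidates, history_fields, min_support, size_threshold):
    """Keep candidates whose largest frequent item set reaches the threshold.

    candidates: dict mapping a query statement to the set of fields it uses
    (assumed layout). history_fields: field sets of historical queries.
    """
    frequent = frequent_itemsets([frozenset(f) for f in history_fields], min_support)
    return {
        query: fields
        for query, fields in candidates.items()
        if largest_frequent_subset_size(fields, frequent) >= size_threshold
    }
```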

Optionally, the assembly module 330 is configured to:

assembling a data query statement corresponding to the scene demand file according to the fields, the description information and the table name in the candidate scene information set;

passing the fields and table names in the candidate scene information set to the front-end component code as parameters;

and identifying the historical data analysis scenes whose similarity to the description information in the candidate scene information set is greater than a first threshold value, and passing the visualization mode most frequently used by those historical data analysis scenes to the front-end component code as a parameter for page rendering.
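The assembly steps above might look like the following Python sketch. The source names neither the similarity measure nor the query template, so token-level Jaccard similarity and a plain SELECT statement are used here purely as stand-ins. The function names and the parameter dictionary layout are hypothetical.

```python
from collections import Counter


def assemble_query(fields, table):
    """Build a simple query statement from fields and a table name.

    Assumption: the identifiers were already validated against the data
    dictionary; untrusted input must never be interpolated like this.
    """
    return f"SELECT {', '.join(fields)} FROM {table}"


def jaccard(a, b):
    """Token-level Jaccard similarity (stand-in for the unspecified measure)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def pick_visualization(description, history, first_threshold):
    """Most frequent visualization mode among sufficiently similar scenes.

    history: list of (scene description, visualization mode) pairs.
    Returns None when no historical scene exceeds the threshold.
    """
    votes = Counter(
        mode for desc, mode in history if jaccard(description, desc) > first_threshold
    )
    return votes.most_common(1)[0][0] if votes else None


def front_end_params(fields, table, description, history, first_threshold=0.5):
    """Collect the parameters handed to the front-end component code."""
    return {
        "fields": list(fields),
        "table": table,
        "query": assemble_query(fields, table),
        "visualization": pick_visualization(description, history, first_threshold),
    }
```

Returning `None` when no historical scene is similar enough leaves the choice of a default visualization mode to the front-end component.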

The device for generating the data analysis scene provided by the embodiment of the invention can execute the method for generating the data analysis scene provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

Example four

FIG. 4 shows a schematic block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

As shown in FIG. 4, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic device 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The processor 11 performs the various methods and processes described above, such as the generation method of the data analysis scenario.

In some embodiments, the generation method of the data analysis scenario may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the generation method of the data analysis scenario described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured by any other suitable means (e.g., by means of firmware) to perform the generation method of the data analysis scenario.

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.

The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.

The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for generating a data analysis scenario, comprising:

2. The method according to claim 1, wherein before receiving the scene requirement file to be processed, performing semantic analysis and data extraction processing on the scene requirement file by using a pre-trained language model, and obtaining data analysis scene information corresponding to the scene requirement file, the method further comprises:

3. The method of claim 2, wherein the mapping relationship data under different data subject domains comprises:

mapping relation data of the data and the data dictionary description;

mapping relation data of a data analysis scene and a business rule;

mapping relation data of the data analysis scene and the data query statement;

4. The method according to claim 3, wherein performing data cleansing and data extraction operations on the mapping relationship data, and performing ambiguity avoidance processing on the operated mapping relationship data comprises:

5. The method of claim 1, wherein the obtaining a candidate scene information set matching the scene requirement file from a data analysis scene pool based on the data analysis scene information comprises:

6. The method of claim 5, further comprising, after the obtaining a set of candidate scenario information matching the scenario requirement file from a pool of data analysis scenarios based on the data analysis scenario information:

screening out, from the candidate scene information set, target data query statements whose maximum frequent item sets are smaller than a preset threshold value;

and deleting the candidate scene information corresponding to the target data query statement from the candidate scene information set.

7. The method of claim 1, wherein the assembling a data query statement and a front-end component code of a data analysis scenario corresponding to the scenario requirement file according to the candidate scenario information in the candidate scenario information set comprises:

and identifying the historical data analysis scenes whose similarity to the description information in the candidate scene information set is greater than a first threshold, and passing the visualization mode most frequently used by those historical data analysis scenes to the front-end component code as a parameter for page rendering.

8. An apparatus for generating a data analysis scenario, comprising:

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of generating a data analysis scenario of any one of claims 1-7.

10. A computer-readable storage medium storing computer instructions for causing a processor to implement the method of generating a data analysis scenario of any one of claims 1-7 when executed.