WO2022116430A1

WO2022116430A1 - Big data mining-based model deployment method, apparatus and device, and storage medium

Info

Publication number: WO2022116430A1
Application number: PCT/CN2021/083486
Authority: WO
Inventors: 黄丽媛
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-12-02
Filing date: 2021-03-29
Publication date: 2022-06-09
Also published as: CN112445845A

Abstract

A big data mining-based model deployment method, apparatus and device, and a storage medium. The method is applied to a big data mining platform, and comprises: crawling service data from a plurality of institution databases by means of a platform layer, and updating the service data to a base layer; obtaining a data mining request by means of a service layer, performing semantic analysis on the data mining request, and determining the mined content corresponding to the data mining request; matching, by means of the platform layer, a training algorithm corresponding to the mined content from a preset algorithm library, and selecting service data corresponding to the mined content from the base layer; and, taking the selected service data as a sample and using the training algorithm, establishing a corresponding service model by means of the platform layer, and deploying the service model into a functional layer. The present invention also relates to blockchain technology: service data is stored in a blockchain. The method implements intelligent deployment of a service model, and improves the mining efficiency of massive amounts of service data.

Description

Model deployment method, device, equipment and storage medium based on big data mining

This application claims priority to the Chinese patent application filed on December 02, 2020, with application number 202011386029.X and titled "Model Deployment Method, Device, Equipment and Storage Medium Based on Big Data Mining", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of artificial intelligence, and in particular, to a method, apparatus, device and storage medium for model deployment based on big data mining.

Background

Medical data in a regional medical information system is typical big data. Big data has the "4V" characteristics (Volume, Velocity, Variety, Value): (1) Large volume (Volume): regional medical data usually comes from an area with millions of people and hundreds of medical institutions, and the amount of data keeps growing. According to relevant regulations of the medical industry, a patient's data usually needs to be retained for more than 50 years. (2) Fast generation speed (Velocity): medical information services may involve a large amount of online or real-time data analysis and processing, for example diagnosis and medication recommendations in clinical decision support, generation of epidemiological analysis reports, and early warning on health indicators. (3) High diversity (Variety): medical data usually includes structured data tables, unstructured or semi-structured text documents (XML and narrative text), medical images and other forms of stored data. (4) High value (Value): the value of medical data is not only closely related to individual lives; it can also be used for national and even global disease prevention and control, new drug research and development, and the fight against chronic diseases.

The inventor realized that the collection, storage, mining and application of medical data in the industry are currently carried out independently of one another. In particular, when medical information is mined from medical data, it is not possible to complete in one pass the collection of the latest data, the screening of usable target medical data as samples, the building and deployment of analysis models, and the visualization of model output. All mining work still has to be done from scratch, resulting in low efficiency of medical data mining.

SUMMARY OF THE INVENTION

The main purpose of this application is to solve the technical problems of low medical data mining efficiency and inflexible deployment.

A first aspect of the present application provides a model deployment method based on big data mining, which is applied to a big data mining platform. From top to bottom, the big data mining platform includes a business layer, a functional layer, a platform layer and a base layer. The model deployment method based on big data mining includes:

crawling business data from each institutional database through the platform layer at every preset period, and updating the business data to the base layer;

acquiring the data mining request received by the business layer, and performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request;

acquiring a model training algorithm matching the mining content in the preset algorithm library of the platform layer, and selecting business data corresponding to the mining content from the base layer;

taking the selected business data as a training sample, performing training with the model training algorithm to generate a corresponding business model, deploying the business model to the functional layer, and providing an external interface for accessing the business model.

A second aspect of the present application provides a computer device, comprising a memory and at least one processor, wherein instructions are stored in the memory, and the memory and the at least one processor are interconnected by a line; the at least one processor invokes the instructions in the memory, so that the computer device executes the steps of the model deployment method based on big data mining described below, wherein the big data mining platform includes, from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the steps of the model deployment method based on big data mining include:

crawling business data from each institutional database through the platform layer at every preset period, and updating the business data in the base layer;

A third aspect of the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the steps of the model deployment method based on big data mining described below, wherein the big data mining platform includes, from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the steps of the model deployment method based on big data mining include:

A fourth aspect of the present application provides a model deployment device based on big data mining, which is applied to a big data mining platform. From top to bottom, the big data mining platform includes a business layer, a functional layer, a platform layer and a base layer. The model deployment device based on big data mining includes:

a crawling module, configured to crawl business data from each institutional database through the platform layer at every preset period, and update the business data to the base layer;

a semantic analysis module, configured to acquire the data mining request received by the business layer, perform semantic analysis on the data mining request, and determine the mining content corresponding to the data mining request;

a selection module, configured to obtain a model training algorithm matching the mining content in the preset algorithm library of the platform layer, and select business data corresponding to the mining content from the base layer;

a deployment module, configured to use the selected business data as a training sample, use the model training algorithm for training, generate a corresponding business model, deploy it to the functional layer, and provide an external interface for accessing the business model.

In the technical solution provided by this application, when business data mining is not being performed, business data is crawled from multiple institutional databases through the platform layer and updated to the base layer. When business data mining is performed, a data mining request is obtained through the business layer and semantically analyzed to determine the mining content of the current mining task. The platform layer then, on the one hand, matches the training algorithm corresponding to the mining content and builds a business training model and, on the other hand, selects the business data corresponding to the mining content from the base layer. The business data is input into the business training model as samples for training, so as to build a business model for data mining, and the business model is deployed to the functional layer for later use. The application thereby realizes intelligent deployment of the business model and improves the mining efficiency of massive business data.

Description of drawings

FIG. 1 is a schematic diagram of an embodiment of a model deployment method based on big data mining in the present application;

FIG. 2 is a schematic diagram of another embodiment of a model deployment method based on big data mining in the present application;

FIG. 3 is a schematic diagram of an embodiment of a model deployment device based on big data mining in the present application;

FIG. 4 is a schematic diagram of another embodiment of a model deployment device based on big data mining in the present application;

FIG. 5 is a schematic diagram of an embodiment of the computer device in this application.

Detailed description

The embodiments of the present application provide a model deployment method, device, equipment and storage medium based on big data mining. Business data is crawled from multiple institutional databases through the platform layer and updated to the base layer; a data mining request is obtained through the business layer and semantically analyzed to determine the mining content corresponding to the request; the platform layer matches the training algorithm corresponding to the mining content and selects the business data corresponding to the mining content from the base layer; with the selected business data as samples, the platform layer uses the training algorithm to build the corresponding business model and deploys it to the functional layer. This application also relates to blockchain technology: business data is stored in a blockchain. The application realizes intelligent deployment of the business model and improves the mining efficiency of massive business data.

The terms "first", "second", "third", "fourth", etc. (if any) in the description and claims of this application and the above-mentioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It is to be understood that data so used may be interchanged under appropriate circumstances so that the embodiments described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" or "having" and any variations thereof are intended to cover non-exclusive inclusion, for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to those expressly listed steps or units, but may include other steps or units not expressly listed or inherent to these processes, methods, products or devices.

Before the embodiments are described, the big data mining platform is introduced, taking the medical field as an example. The big data mining platform includes at least a base layer, a platform layer, a functional layer and a business layer, as follows:

Base layer: stores a large amount of medical data, such as the CT (Computed Tomography) and MRI (Magnetic Resonance Imaging) image data common in medical imaging, doctors' diagnosis report data, and so on. Each type of medical data is stored in its corresponding fixed data storage format. The database can use a file storage architecture that combines traditional centralized storage with HDFS (Hadoop Distributed File System), and can use the flexibility of row-key, column-key and column-family design in HBase to organize multi-dimensional medical data together, realizing the multi-dimensional data storage model of a traditional data warehouse.

Platform layer: the main area of data processing. It contains at least a data collection engine, an algorithm search engine and a data retrieval engine, and also serves as the place where business models are trained.

For the data collection engine, MapReduce can be used as the computing core, and Flume/Sqoop (data collection tools) can be used to extract data from the databases of multiple medical institutions, standardize it, convert its format and load it into the data storage area of the base layer;

For the algorithm search engine, Hadoop ML/Mahout can be used for construction; the associated algorithm library can provide training algorithms such as Bayesian discriminant analysis, clustering, decision trees, correlation algorithms and recommendation algorithms, providing algorithm support for medical data mining such as assisted clinical disease diagnosis and behavior analysis;

For the data retrieval engine, the SQL-like (Structured Query Language) query interface provided by Apache Hive (a data warehouse tool) can be used to give analysts a convenient way to obtain data.

Functional layer: the deployment site for business models of various types once their training is complete. Through these business models, functions such as real-time query, statistical analysis, deep mining and machine learning on medical data are provided, giving functional support to the business layer.

Business layer: directly connected to the client terminal. With the support of the functional layer, it provides the applications corresponding to the functional layer, such as real-time query, statistical analysis, deep mining and machine learning on medical data.

Next, for ease of understanding, the specific process of the embodiment of the present application will be described below. Please refer to FIG. 1 . The first embodiment of the model deployment method based on big data mining in the embodiment of the present application includes:

S101, every preset period, crawl business data from each institutional database through the platform layer, and update the business data to the base layer;

It can be understood that the execution subject of the present application may be a model deployment device based on big data mining, or a terminal or a server, which is not specifically limited here. The embodiments of the present application are described with a server as the execution subject. It should be emphasized that, to further ensure the privacy and security of the business data, the business data can also be stored in a node of a blockchain. In addition, the business data may include medical data, insurance data, traffic data, user data, online shopping data, etc. The steps and processes are essentially the same when the method is used to mine any of these types of business data, so they are not enumerated one by one; this embodiment takes medical data as an example.

In this embodiment, when the big data mining platform is not performing a medical data mining task, it keeps updating the medical data in the base layer, so that when a mining task is performed later, the latest medical data is available immediately, without having to check for and crawl the latest medical data again, which improves the mining efficiency of medical data. When medical data in the base layer is updated, each piece of medical data is automatically checked for an existing index. If the medical data already has an index, the current medical data replaces the field values of the corresponding original medical data; if there is no index, the medical data is inserted into the base layer as a new record. The preset period can differ between medical institutions, and the crawling of medical data is performed asynchronously. For example, for large medical institutions, medical data can be collected every day, that is, the preset period is 24 hours; for research institutes, medical data is updated less often, so a longer crawling period can be set, such as a week or a month.
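The update rule and the per-institution period described above can be sketched as follows. This is a minimal illustration under stated assumptions: the base layer is modeled as an in-memory dictionary, and the names `CRAWL_PERIODS`, `is_due` and `upsert` are hypothetical and do not come from the application.

```python
from datetime import datetime, timedelta

# Hypothetical preset periods per institution type (values follow the examples in the text).
CRAWL_PERIODS = {
    "large_hospital": timedelta(hours=24),
    "research_institute": timedelta(weeks=1),
}


def is_due(institution_type, last_crawl, now):
    """Return True when the preset period for this institution type has elapsed."""
    return now - last_crawl >= CRAWL_PERIODS[institution_type]


def upsert(base_layer, index, fields):
    """Replace field values if the index already exists; otherwise insert a new record."""
    if index in base_layer:
        base_layer[index].update(fields)
        return "updated"
    base_layer[index] = dict(fields)
    return "inserted"


base_layer = {}
upsert(base_layer, "patient-001", {"blood_pressure": 120})   # no index yet: inserted
upsert(base_layer, "patient-001", {"blood_pressure": 130})   # index exists: field replaced
```

Because each institution has its own period, a scheduler could call `is_due` independently per institution, which matches the asynchronous crawling described above.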

The platform layer has a dedicated data collection engine for obtaining medical data from the various medical institutions, and the base layer has a storage area in which medical data is stored in data storage models of fixed format. Each data storage model has a model number and contains multiple data tables, each identified by a table number; within each data table, positions are identified by row number and column number. Each piece of medical data can therefore be uniquely identified by model number + table number + row number + column number.
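The four-part identifier can be illustrated with a small sketch. The class name `DataKey` and the colon-separated string encoding are assumptions made for illustration; the application does not specify how the identifier is encoded.

```python
from typing import NamedTuple


class DataKey(NamedTuple):
    """Unique identifier: model number + table number + row number + column number."""
    model_no: int
    table_no: int
    row_no: int
    col_no: int

    def encode(self) -> str:
        # Hypothetical encoding: the four numbers joined by colons.
        return f"{self.model_no}:{self.table_no}:{self.row_no}:{self.col_no}"

    @classmethod
    def decode(cls, text: str) -> "DataKey":
        return cls(*(int(part) for part in text.split(":")))


key = DataKey(model_no=2, table_no=5, row_no=17, col_no=3)
encoded = key.encode()
```

Since the key is a tuple, it can be used directly as a dictionary key or sorted, which keeps all cells of one table adjacent.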

S102, acquiring the data mining request received by the business layer, and performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request;

In this embodiment, the business layer is directly connected to the client terminal. The data mining request received from the client terminal contains data mining information, which can be obtained from the content selected by the user or the content entered in a text box. Based on the data mining information, the semantic analysis model can analyze the specific mining content and represent it with data mining tags. The specific semantic analysis process is as follows:

(1) parsing the data mining request to obtain the corresponding data mining information, and performing word segmentation on the data mining information to obtain a plurality of mining key-point word segments;

(2) inputting each mining key-point word segment into a preset semantic analysis model for semantic analysis to obtain a plurality of data mining tags;

(3) determining the mining content corresponding to the data mining request based on the data mining tags.

In this embodiment, in addition to user identity authentication information, the data mining request includes data mining information, such as "patient behavior analysis of cardiovascular and cerebrovascular diseases", "aided clinical decision-making for coronary heart disease" or "pancreatitis disease control warning". After word segmentation, "patient behavior analysis of cardiovascular and cerebrovascular diseases", for example, can be divided into three mining key-point word segments: "cardiovascular and cerebrovascular", "patient behavior" and "analysis". The preset semantic analysis model then analyzes these word segments. The semantic analysis model has an expert database, and each mining key-point word segment is mapped to the data mining tags with the same meaning; for example, "cardiovascular and cerebrovascular" is mapped to the "cardiovascular" and "cerebrovascular" data mining tags.
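The segmentation and tag-mapping steps can be sketched as below. This is only an illustration under assumptions: the application does not disclose the semantic analysis model, so a greedy longest-match segmenter over a phrase dictionary stands in for it, and `EXPERT_DB`, `segment` and `to_tags` are hypothetical names.

```python
# Hypothetical expert database: mining key-point word segment -> data mining tags.
EXPERT_DB = {
    "cardiovascular and cerebrovascular": ["cardiovascular", "cerebrovascular"],
    "patient behavior": ["patient behavior"],
    "analysis": ["analysis"],
}


def segment(text, vocabulary):
    """Greedy longest-match segmentation over a vocabulary of known phrases."""
    words = text.lower().split()
    max_len = max(len(phrase.split()) for phrase in vocabulary)
    segments, i = [], 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in vocabulary:
                segments.append(phrase)
                i += n
                break
        else:
            i += 1  # skip words that are not mining key points, e.g. "of"
    return segments


def to_tags(segments, expert_db):
    """Map each word segment to its data mining tags, keeping first-seen order."""
    tags = []
    for seg in segments:
        for tag in expert_db[seg]:
            if tag not in tags:
                tags.append(tag)
    return tags


request = "patient behavior analysis of cardiovascular and cerebrovascular diseases"
tags = to_tags(segment(request, EXPERT_DB), EXPERT_DB)
```

For the example request, the three word segments named in the text are recovered, and the combined segment expands into two separate tags.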

S103, acquiring a model training algorithm matching the mining content in the preset algorithm library of the platform layer, and selecting business data corresponding to the mining content from the base layer;

In this embodiment, the amount of medical data is huge: for example, each image or report of a single user is measured in megabytes. With so much medical data stored in the base layer, machine learning and data mining algorithms are needed to analyze the medical data automatically, to extract effective, novel, potentially useful and understandable information from the large volume of medical data, and to discover the characteristic knowledge implicit in it.

First, the medical data related to the specified disease and data application is selected based on the mining content; then a model training algorithm suitable for the mining content is selected and a business training model is built, so as to further extract, from the initially selected medical data, the common features related to the disease and the application. For example, for "patient behavior analysis of cardiovascular and cerebrovascular diseases", the mining content obtained is "patient behavior analysis", so neighborhood-based algorithms, latent semantic models or graph-based random walk algorithms can be selected.

S104. Using the selected business data as a training sample, use the model training algorithm for training to generate a corresponding business model, deploy it to the functional layer, and provide an external interface for accessing the business model.

In this embodiment, the medical data is used as training samples, and the training samples are labeled according to the diagnostic content, medical image feature regions, etc. in the medical data. A corresponding business training model is then generated according to the model training algorithm and trained with the training samples to generate the corresponding business model. Finally, the business model is deployed in the functional layer for use: when a user request arrives, such as a "patient behavior prediction instruction", a "disease warning instruction" or a "data mining instruction", the functional layer calls the business model to perform data mining on the incoming information. The specific construction process of the business model is as follows:

(1) using the selected business data as a training sample through the platform layer, and labeling the training sample to obtain a corresponding annotation file;

(2) generating a business training model according to the model training algorithm, inputting the training sample and the annotation file into the business training model, and outputting mining results;

(3) calculating the loss value of the business training model based on the mining results, and training the business training model based on the loss value until the loss value is less than a preset loss value, at which point training stops and the corresponding business model is output.

In this embodiment, the training samples are obtained automatically from the base layer, the model training algorithm is obtained automatically from the preset algorithm library, and the model training algorithm is written into a pre-written model framework to obtain the corresponding business training model. The business training model is then trained with the training samples and annotation files. The loss function of the model is also determined by the mining content; for example, for patient behavior analysis, the model can be evaluated with the logistic regression loss function.
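The stopping rule in step (3) can be sketched with a small logistic regression trained by gradient descent. This is a minimal sketch, not the application's implementation: the optimizer, the learning rate, the `max_epochs` safety cap and all function names are assumptions, and the annotation file is reduced to a list of 0/1 labels.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def log_loss(weights, bias, samples, labels):
    """Mean logistic regression (cross-entropy) loss over the annotated samples."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(samples, labels):
        p = predict_proba(weights, bias, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(samples)


def train(samples, labels, preset_loss=0.1, learning_rate=0.5, max_epochs=10000):
    """Gradient descent that stops once the loss value is below preset_loss."""
    weights, bias = [0.0] * len(samples[0]), 0.0
    loss = log_loss(weights, bias, samples, labels)
    epochs = 0
    while loss >= preset_loss and epochs < max_epochs:
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(samples, labels):
            error = predict_proba(weights, bias, x) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        n = len(samples)
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
        loss = log_loss(weights, bias, samples, labels)
        epochs += 1
    return weights, bias, loss


# Toy, made-up training samples (one feature each) with their annotations.
samples = [[-2.0], [-1.0], [1.0], [2.0]]
labels = [0, 0, 1, 1]
weights, bias, final_loss = train(samples, labels)
```

The `max_epochs` cap is an added safeguard so that the loop terminates even when the preset loss value cannot be reached on a given data set.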

In the embodiment of the present application, when business data mining is not being performed, business data is crawled from multiple institutional databases through the platform layer and updated to the base layer. When business data mining is performed, the business layer obtains a data mining request, which is semantically analyzed to determine the mining content of the current mining task. The platform layer then, on the one hand, matches the preset algorithm corresponding to the mining content and builds a business training model and, on the other hand, selects the business data corresponding to the mining content from the base layer. The business data is input into the business training model as samples for training, so as to build a business model for data mining, and the business model is deployed to the functional layer for later use. The application thereby realizes intelligent deployment of the business model and improves the mining efficiency of massive business data.
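The flow of steps S101 to S104 across the layers can be summarized in a short orchestration sketch. It is an assumption-laden illustration: the class `MiningPlatform`, its method names, and the idea of passing the analysis, matching and selection steps in as callables are all hypothetical and only show how the layers hand data to one another.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MiningPlatform:
    base_layer: Dict[str, List[dict]] = field(default_factory=dict)       # stored business data
    functional_layer: Dict[str, Callable] = field(default_factory=dict)   # deployed business models

    def crawl(self, institution_databases):
        """S101: copy the latest records of every institution into the base layer."""
        for name, records in institution_databases.items():
            self.base_layer[name] = list(records)

    def deploy(self, request, analyze, match_algorithm, select_data):
        """S102 to S104: analyze the request, train a business model and deploy it."""
        content = analyze(request)                        # S102: semantic analysis
        train = match_algorithm(content)                  # S103: match a training algorithm
        samples = select_data(content, self.base_layer)   # S103: select business data
        self.functional_layer[content] = train(samples)   # S104: train and deploy
        return content

    def call(self, content, payload):
        """External interface: run the deployed business model on incoming data."""
        return self.functional_layer[content](payload)


def mean_age_trainer(samples):
    """Toy stand-in for a training algorithm: flag patients at or above the mean age."""
    threshold = sum(s["age"] for s in samples) / len(samples)
    return lambda patient: patient["age"] >= threshold


platform = MiningPlatform()
platform.crawl({"hospital_a": [{"age": 70}, {"age": 50}]})
content = platform.deploy(
    "risk warning",
    analyze=lambda request: request.upper(),
    match_algorithm=lambda content: mean_age_trainer,
    select_data=lambda content, base: base["hospital_a"],
)
flagged = platform.call(content, {"age": 65})
```

Keeping the deployed models in a dictionary keyed by mining content mirrors the idea that the functional layer holds trained models on standby until a request arrives.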

Referring to FIG. 2, the second embodiment of the model deployment method based on big data mining in the embodiment of the present application includes:

S201, every preset period, crawl business data from multiple institutional databases through the data collection engine, and standardize the business data;

In this embodiment, the platform layer includes a data collection engine, which is used to obtain the latest medical data from multiple medical institutions and to standardize it. Standardization includes data cleaning, preprocessing, error correction, missing value filling, discretization of continuous values, outlier removal and data normalization.
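Several of the standardization operations listed above can be sketched with plain Python. The concrete choices here (mean imputation, fixed valid ranges for outliers, min-max scaling, boundary-based bins) are assumptions for illustration; the application does not say which techniques are used, and the function names are hypothetical.

```python
from statistics import mean


def fill_missing(values):
    """Missing value filling: replace None with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def remove_outliers(values, low, high):
    """Outlier removal: keep only values inside a plausible range."""
    return [v for v in values if low <= v <= high]


def min_max_normalize(values):
    """Normalization: rescale values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(value, boundaries):
    """Discretization: index of the first bin whose upper boundary exceeds the value."""
    for i, boundary in enumerate(boundaries):
        if value < boundary:
            return i
    return len(boundaries)


# Made-up body temperature readings: one missing value and one implausible entry.
readings = [36.5, None, 37.5, 370.0]
cleaned = remove_outliers(fill_missing([36.5, None, 37.5]), 30.0, 45.0)
```

Each step is a pure function, so the steps can be chained in whatever order a particular institution's data requires.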

S202, converting the standardized business data into a preset semantic format, and determining the semantic feature of the converted business data based on the semantic format;

S203, obtaining the document semantic framework of the data storage model in the base layer, and associating corresponding semantic features according to the document semantic framework;

S204, based on the associated document semantic framework and semantic features, store the converted business data in the data storage model;

In this embodiment, after the medical data is standardized, it needs to be converted into a fixed semantic format, and different semantic formats have corresponding document semantic frameworks. The corresponding data storage model can be obtained by storing the document semantic framework, and the data storage model is extensible. The medical data of different medical institutions has data attributes, such as the institution name, patient information, examination information, diagnosis information and treatment information recorded for medical activities. The semantic data can be saved as an Extensible Markup Language (a semantic format) document in the clinical document framework (a document semantic framework) format, and the aforementioned data attributes, such as institution name, patient information, examination information, diagnosis information and treatment information, are the corresponding semantic features. In the document semantic framework, medical data with different semantic features is stored at the positions corresponding to different tables, table rows and table columns.

According to the data attributes, medical data can be converted into a fixed semantic format in the form of multi-level tags. The first-level tag is the document semantic framework, which can be determined by the type of medical data, such as image data or the text data of electronic medical records; the second-level tag is a data table, which can be determined by disease type, medical institution or patient; and the third-level tag is a table column or table row, which is determined by the various items of patient information, historical medical records, etc.

Medical data is stored in the corresponding fixed semantic format. Depending on the type of medical data, the stored data is looked up for specific data attributes, and the stored data corresponding to those attributes is used as the semantic features of the medical data. For example, an electronic medical record has a table storing user information, and the stored data corresponding to the user's age and blood type attributes can be used as semantic features of the medical data for user behavior analysis. The semantic features can be encoded with a fixed encoding rule and stored at the corresponding location in the corresponding data storage model.
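The multi-level tag structure can be illustrated by serializing a record to XML with Python's standard library. The element and attribute names (`document`, `table`, `row`, `column`, `frame`, `name`) are hypothetical; the application only states that an XML document organized by framework, table and row or column is used.

```python
import xml.etree.ElementTree as ET


def to_semantic_xml(frame, table, rows):
    """Serialize records into a three-level tag structure."""
    root = ET.Element("document", {"frame": frame})            # first-level tag: framework
    table_el = ET.SubElement(root, "table", {"name": table})   # second-level tag: data table
    for row in rows:
        row_el = ET.SubElement(table_el, "row")                # third-level tag: table row
        for column, value in row.items():
            ET.SubElement(row_el, "column", {"name": column}).text = str(value)
    return ET.tostring(root, encoding="unicode")


def read_feature(xml_text, column):
    """Look up the stored data for one data attribute, i.e. one semantic feature."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(f"table/row/column[@name='{column}']")]


# Made-up record from the user information table of an electronic medical record.
xml_text = to_semantic_xml(
    "electronic_medical_record",
    "user_information",
    [{"age": 63, "blood_type": "A"}],
)
ages = read_feature(xml_text, "age")
```

Reading a feature back by attribute name corresponds to looking up the stored data for a specific data attribute, as with the age and blood type example above.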

The platform layer also includes an algorithm search engine, and acquiring a model training algorithm matching the mining content in the preset algorithm library of the platform layer includes:

S205, determining the data mining attribute corresponding to the data mining content based on the data mining tag, and determining the corresponding multi-layer algorithm tag based on the data mining attribute;

S206, based on the multi-layer algorithm label, through the algorithm search engine, obtain a model training algorithm matching the mining content in a preset algorithm library;

In this embodiment, the platform layer also includes an algorithm search engine, which is associated with a preset algorithm library. It can be used to search the preset algorithm library for the model training algorithm required by the mining content; a business training model is built according to the model training algorithm, and medical data is then input for training to obtain the final business model. Depending on the data mining type (determined from the data mining tags), such as disease early warning, clinical diagnosis or patient behavior analysis, different data mining attributes are determined, and the data mining attributes are divided into multiple layers of algorithm labels to determine the model training algorithm finally used.

For example, for the data mining tag "disease early warning", the following data mining attributes can be obtained: "machine learning", "logistic regression", "multi-classification" and "semi-supervised learning". From these data mining attributes, the following four layers of algorithm labels (i.e., the multi-layer algorithm labels) can be determined:

The first layer is "semi-supervised learning";

The second layer is "machine learning";

The third layer is "logistic regression";

The fourth layer is "multi-classification";

With these four layers of algorithm labels, the "softmax" algorithm can be found in the preset algorithm library.
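The layered lookup can be sketched as a walk through a nested dictionary. This is an illustration only: the structure of the preset algorithm library is not disclosed, so the nested layout, the `LAYER_OF` ordering table and the function names are assumptions, and the library contains just the one path given in the example.

```python
# Hypothetical mapping from a data mining tag to its data mining attributes.
MINING_ATTRIBUTES = {
    "disease early warning": [
        "machine learning", "logistic regression",
        "multi-classification", "semi-supervised learning",
    ],
}

# Hypothetical layer assigned to each attribute (follows the four layers in the example).
LAYER_OF = {
    "semi-supervised learning": 1,
    "machine learning": 2,
    "logistic regression": 3,
    "multi-classification": 4,
}

# Hypothetical preset algorithm library, nested by layer.
ALGORITHM_LIBRARY = {
    "semi-supervised learning": {
        "machine learning": {
            "logistic regression": {
                "multi-classification": "softmax",
            },
        },
    },
}


def to_layer_labels(attributes):
    """Order the data mining attributes into multi-layer algorithm labels."""
    return sorted(attributes, key=LAYER_OF.__getitem__)


def search_algorithm(layer_labels, library):
    """Descend one level of the library per algorithm label."""
    node = library
    for label in layer_labels:
        node = node[label]
    return node


labels = to_layer_labels(MINING_ATTRIBUTES["disease early warning"])
algorithm = search_algorithm(labels, ALGORITHM_LIBRARY)
```

A nested layout means each additional label narrows the candidate set, so a lookup costs one dictionary access per layer.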

The platform layer also includes a data retrieval engine, and the selection of business data corresponding to the mining content from the base layer includes:

S207, based on the data mining label, determine the data mining index value corresponding to the data mining content;

S208, according to the data mining index value, through the data retrieval engine, determine and obtain the storage location of the business data corresponding to the mining content;

In this embodiment, the platform layer also includes a data retrieval engine, which can retrieve the corresponding medical data from the base layer according to data mining index values. Data mining tags can be mapped to the corresponding data mining index values; for example, "cardiovascular" and "cerebrovascular" can be mapped to five data mining index values for fields a, b, c, d and e. Through the index values of fields a, b, c, d and e, the data in the corresponding data storage model, data table, table row or table column can be located. The retrieved data can be the medical data of an entire data storage model, or all the data in a particular data table, table row or table column.
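The mapping from tags to index values to storage locations can be sketched as two dictionary lookups. All concrete values below are made up: the application names the fields a to e but does not say which tag maps to which field or where each field is stored, so `TAG_TO_INDEX_VALUES`, `INDEX_TO_LOCATION` and the function names are hypothetical.

```python
# Hypothetical mapping from data mining tags to data mining index values.
TAG_TO_INDEX_VALUES = {
    "cardiovascular": ["a", "b", "c"],
    "cerebrovascular": ["c", "d", "e"],
}

# Hypothetical storage locations: (model number, table number, row number, column number).
# None means "everything at that level", e.g. a whole table or a whole storage model.
INDEX_TO_LOCATION = {
    "a": (1, 1, None, None),
    "b": (1, 2, None, 3),
    "c": (1, 2, 7, None),
    "d": (2, 1, None, None),
    "e": (2, None, None, None),
}


def index_values_for(tags):
    """Collect the index values of all tags, without duplicates, in first-seen order."""
    values = []
    for tag in tags:
        for value in TAG_TO_INDEX_VALUES[tag]:
            if value not in values:
                values.append(value)
    return values


def locate(tags):
    """Resolve tags to the storage locations of the matching business data."""
    return [INDEX_TO_LOCATION[value] for value in index_values_for(tags)]


locations = locate(["cardiovascular", "cerebrovascular"])
```

Using None for unspecified levels lets one location stand for a single column, a single row, a table or an entire data storage model.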

After the model training algorithm and business data corresponding to the mining content have been selected and obtained, the business model can be trained as follows:

S209. Using the selected business data as a training sample, use the model training algorithm to perform training to generate a corresponding business model.

In the embodiment of the present application, the data collection engine in the platform layer is used to crawl business data from multiple business organizations for backup; then, the algorithm search engine is used to select a suitable algorithm library from multiple preset algorithm libraries for deployment Business training model; then select appropriate business data as a sample through the data retrieval engine, and input it into the business training model for training to build the business model required for data mining and realize the intelligent deployment of the business model.

The model deployment method based on big data mining in the embodiment of the present application has been described above. The following describes the model deployment device based on big data mining in the embodiment of the present application. Referring to FIG. 3, an embodiment of the model deployment device includes:

The crawling module 301 is used for crawling business data from each institutional database through the platform layer at every preset period, and updating the business data to the base layer;

A semantic analysis module 302, configured to acquire the data mining request received by the business layer, perform semantic analysis on the data mining request, and determine the mining content corresponding to the data mining request;

A selection module 303, configured to acquire a model training algorithm matching the mining content in the preset algorithm library of the platform layer, and select business data corresponding to the mining content from the base layer;

The deployment module 304 is configured to use the selected business data as a training sample, use the model training algorithm for training, generate a corresponding business model, deploy it to the functional layer, and provide an external interface for accessing the business model.

Referring to FIG. 4 , another embodiment of the apparatus for model deployment based on big data mining in the embodiment of the present application includes:

Specifically, the platform layer includes a data collection engine, and the crawling module 301 includes:

A data standardization processing unit 3011, configured to crawl business data from multiple institutional databases through the data collection engine, and perform standardization processing on the business data;

a format conversion unit 3012, configured to convert the standardized business data into a preset semantic format, and determine the semantic feature of the converted business data based on the semantic format;

Association unit 3013 is used to obtain the document semantic frame of the data storage model in the base layer, and according to the document semantic frame, associate the corresponding semantic feature;

The storage unit 3014 is configured to store the converted business data in the data storage model based on the associated document semantic framework and semantic features.
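The four units of the crawling module can be read as a small ingestion pipeline: standardize, convert to a semantic format, associate with a document semantic frame, and store. The sketch below is one possible reading under stated assumptions: the frame names, the feature sets, and the dictionary-based "semantic format" are hypothetical and chosen only to make the flow concrete.

```python
from typing import Dict, List, Set

# Hypothetical document semantic frames of the data storage models:
# frame name -> semantic features a record must carry to be stored there.
SEMANTIC_FRAMES: Dict[str, Set[str]] = {
    "patient_document": {"patient_id", "diagnosis"},
    "billing_document": {"patient_id", "amount"},
}


def standardize(record: dict) -> dict:
    """Unit 3011 (sketch): normalise field names and trim string values."""
    return {
        key.strip().lower(): (value.strip() if isinstance(value, str) else value)
        for key, value in record.items()
    }


def to_semantic_format(record: dict) -> dict:
    """Unit 3012 (sketch): wrap the record and expose its semantic features."""
    return {"features": sorted(record), "values": record}


def associate(semantic_record: dict) -> List[str]:
    """Unit 3013 (sketch): find frames whose required features are all present."""
    features = set(semantic_record["features"])
    return [name for name, required in SEMANTIC_FRAMES.items() if required <= features]


def store(raw_records: List[dict], base_layer: Dict[str, List[dict]]) -> Dict[str, List[dict]]:
    """Unit 3014 (sketch): store each converted record under every matching frame."""
    for raw in raw_records:
        semantic = to_semantic_format(standardize(raw))
        for frame in associate(semantic):
            base_layer.setdefault(frame, []).append(semantic["values"])
    return base_layer
```

A record such as `{" Patient_ID ": "p1", "Diagnosis": " flu "}` is normalised and stored under `patient_document` only, because it lacks the `amount` feature required by the billing frame.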

Specifically, the semantic analysis module 302 includes:

A word segmentation unit 3021, configured to parse the data mining request, obtain corresponding data mining information, and perform word segmentation processing on the data mining information to obtain a plurality of mining-point word segments;

The semantic analysis unit 3022 is configured to input each mining-point word segment into a preset semantic analysis model for semantic analysis to obtain a plurality of data mining tags, and to determine, based on the data mining tags, the mining content corresponding to the data mining request.
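As a rough illustration of the two units above, the sketch below segments a request into word segments and maps them to data mining tags. The lexicon lookup is a deliberately simple stand-in for the preset semantic analysis model, and the request shape (`{"query": ...}`) and tag names are assumptions; a production system would likely use a trained tokenizer and classifier instead.

```python
import re
from typing import Dict, List

# Hypothetical stand-in for the preset semantic analysis model:
# word segment -> data mining tag.
TAG_LEXICON: Dict[str, str] = {
    "heart": "cardiovascular",
    "cardiac": "cardiovascular",
    "stroke": "cerebrovascular",
    "risk": "risk_prediction",
}


def segment(text: str) -> List[str]:
    """Split the data mining information into lowercase word segments."""
    return re.findall(r"[a-z]+", text.lower())


def mining_tags(request: Dict[str, str]) -> List[str]:
    """Parse the request, segment it, and collect the distinct tags in order."""
    tags: List[str] = []
    for token in segment(request["query"]):
        tag = TAG_LEXICON.get(token)
        if tag is not None and tag not in tags:
            tags.append(tag)
    return tags
```

For the request `{"query": "Predict stroke risk for cardiac patients"}`, the segments `stroke`, `risk`, and `cardiac` match the lexicon and the remaining segments are ignored.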

Specifically, the platform layer also includes an algorithm search engine, the selection module 303 includes an algorithm search unit 3031, and the algorithm search unit 3031 is used for:

Based on the data mining tag, determine the data mining attribute corresponding to the data mining content, and determine the corresponding multi-layer algorithm tag based on the data mining attribute;

Based on the multi-layer algorithm tags, through the algorithm search engine, obtain a model training algorithm matching the mining content in the preset algorithm library.
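The lookup performed by the algorithm search unit can be sketched as a two-step resolution: data mining tags to attributes, then attributes to a multi-layer algorithm tag used as the key into the algorithm library. The two layers chosen here (task and data type), the tag names, and the algorithm names are all hypothetical; the application does not specify how many layers exist or what they contain.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from data mining tags to data mining attributes.
TAG_ATTRIBUTES: Dict[str, Dict[str, str]] = {
    "risk_prediction": {"task": "classification"},
    "cost_estimate": {"task": "regression"},
    "lab_results": {"data_type": "tabular"},
    "clinical_notes": {"data_type": "text"},
}

# Hypothetical preset algorithm library keyed by a two-layer algorithm tag.
ALGORITHM_LIBRARY: Dict[Tuple[Optional[str], Optional[str]], str] = {
    ("classification", "tabular"): "gradient_boosted_trees",
    ("classification", "text"): "text_cnn",
    ("regression", "tabular"): "linear_regression",
}


def algorithm_tags(tags: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Merge the attributes of all tags into a (task, data_type) algorithm tag."""
    attributes: Dict[str, str] = {}
    for tag in tags:
        attributes.update(TAG_ATTRIBUTES.get(tag, {}))
    return attributes.get("task"), attributes.get("data_type")


def search_algorithm(tags: List[str]) -> Optional[str]:
    """Return the matching training algorithm name, or None if nothing matches."""
    return ALGORITHM_LIBRARY.get(algorithm_tags(tags))
```

Keying the library on a tuple keeps the match exact; a real search engine might instead fall back to a coarser layer when the full tag has no entry.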

Specifically, the platform layer further includes a data retrieval engine, and the selection module 303 further includes a data retrieval unit 3032, which is used for:

determining, based on the data mining tag, a data mining index value corresponding to the data mining content;

According to the data mining index value, through the data retrieval engine, the storage location of the business data corresponding to the mining content is determined and acquired.

Specifically, the deployment module 304 includes:

An annotation unit 3041, configured to use the selected business data as a training sample through the platform layer, and annotate the training sample to obtain a corresponding annotation file;

The training unit 3042 is configured to generate a business training model according to the model training algorithm, input the training samples and the annotation files into the business training model, and output mining results; calculate the loss value of the business training model based on the mining results, train the business training model based on the loss value, stop training once the loss value is less than the preset loss value, and output the corresponding business model.
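The stopping rule described for the training unit (train until the loss value falls below a preset loss value) can be illustrated with a minimal gradient-descent loop. The model here is a one-variable linear regression with mean squared error, chosen only for brevity; the numeric labels stand in for the annotation file, and the function name, learning rate, and epoch cap are assumptions rather than details from the application.

```python
from typing import List, Tuple


def train_until(
    samples: List[float],
    labels: List[float],
    preset_loss: float = 1e-4,
    learning_rate: float = 0.05,
    max_epochs: int = 10_000,
) -> Tuple[Tuple[float, float], float]:
    """Fit y = w * x + b and stop once the mean squared error is below preset_loss."""
    w, b = 0.0, 0.0
    n = len(samples)
    loss = float("inf")
    for _ in range(max_epochs):
        # Forward pass: the "mining results" of the current model.
        errors = [(w * x + b) - y for x, y in zip(samples, labels)]
        loss = sum(e * e for e in errors) / n
        if loss < preset_loss:
            break  # stopping condition from the description
        # Gradient of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, samples)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return (w, b), loss
```

The `max_epochs` cap is an added safeguard so the loop terminates even if the preset loss value is never reached.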

In the embodiment of the present application, the data collection engine, algorithm search engine and data retrieval engine in the platform layer are used to first crawl business data from multiple business organizations for backup; then select a suitable algorithm library from the multiple preset algorithm libraries to deploy the business training model; and then select appropriate business data as samples and input them into the business training model for training, thereby building the business model required for data mining, realizing intelligent deployment of the business model, and improving the mining efficiency of business data.

FIGS. 3 and 4 above describe the model deployment apparatus based on big data mining in the embodiments of the present application in detail from the perspective of modular functional entities. The computer device in the embodiments of the present application is described in detail below from the perspective of hardware processing.

FIG. 5 is a schematic structural diagram of a computer device provided by an embodiment of the present application. The computer device 500 may vary greatly depending on configuration or performance, and may include one or more central processing units (CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) that store applications 533 or data 532. The memory 520 and the storage medium 530 may provide short-term storage or persistent storage. The program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instructions for operating on the computer device 500. Furthermore, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the computer device 500, the series of instruction operations in the storage medium 530.

Computer device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input and output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art can understand that the computer device structure shown in FIG. 5 does not constitute a limitation on the computer device, which may include more or fewer components than shown, combine some components, or arrange the components differently.

The present application also provides a computer device, which is any device that can perform the steps of the model deployment method based on big data mining in the above-mentioned embodiments. The computer device includes a memory and a processor, and the memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the big data mining-based model deployment method in the foregoing embodiments.

The present application also provides a computer-readable storage medium. The computer-readable storage medium may be a non-volatile computer-readable storage medium. The computer-readable storage medium may also be a volatile computer-readable storage medium. The computer-readable storage medium stores instructions that, when executed on a computer, cause the computer to execute the steps of the model deployment method based on big data mining.

Those skilled in the art can clearly understand that, for the convenience and brevity of description, the specific working process of the system, device and unit described above may refer to the corresponding process in the foregoing method embodiments, which will not be repeated here.

The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm. Blockchain, essentially a decentralized database, is a series of data blocks linked by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block. The blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
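The idea of data blocks linked by cryptographic methods can be illustrated with a minimal hash chain, in which each block stores the hash of the previous block so that tampering with stored business data becomes detectable. This is only a sketch of the linking principle: it omits consensus, networking, and signatures, and the block layout and function names are assumptions rather than the platform's actual storage format.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index: int, prev_hash: str, records: List[dict]) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "records": records},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Dict], records: List[dict]) -> List[Dict]:
    """Append a block that commits to the records and to the previous block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "records": records,
        "hash": block_hash(index, prev_hash, records),
    })
    return chain


def is_valid(chain: List[Dict]) -> bool:
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_HASH
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["prev_hash"], block["records"]):
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous block's hash, changing any stored record invalidates that block and every block after it.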

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, but not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions in the embodiments of the present application.

Claims

A model deployment method based on big data mining, applied to a big data mining platform, wherein the big data mining platform includes, in order from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the model deployment method based on big data mining comprises:

Every preset period, crawl business data from each institutional database through the platform layer, and update the business data to the base layer;

Acquiring the data mining request received by the business layer, and performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request;

Acquiring a model training algorithm matching the mining content in the preset algorithm library of the platform layer, and selecting business data corresponding to the mining content from the base layer;

Taking the selected business data as a training sample, the model training algorithm is used for training, a corresponding business model is generated and deployed to the functional layer, and an interface for accessing the business model is provided externally.
The model deployment method based on big data mining according to claim 1, wherein the platform layer includes a data collection engine, and the crawling business data from each institutional database through the platform layer and updating the business data to the base layer comprises:

Crawling business data from multiple institutional databases through the data collection engine, and standardizing the business data;

converting the standardized business data into a preset semantic format, and determining the semantic features of the converted business data based on the semantic format;

Obtain the document semantic framework of the data storage model in the base layer, and associate corresponding semantic features according to the document semantic framework;

Based on the associated document semantic framework and semantic features, the transformed business data is stored in the data storage model.
The model deployment method based on big data mining according to claim 1, wherein the performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request comprises:

Parsing the data mining request to obtain corresponding data mining information, and performing word segmentation processing on the data mining information to obtain a plurality of mining-point word segments;

Inputting each mining-point word segment into a preset semantic analysis model for semantic analysis to obtain a plurality of data mining tags;

Based on the data mining tag, the mining content corresponding to the data mining request is determined.
The model deployment method based on big data mining according to claim 3, wherein the platform layer further includes an algorithm search engine, and the acquiring a model training algorithm matching the mining content in the preset algorithm library of the platform layer comprises:

Based on the data mining tag, determine the data mining attribute corresponding to the data mining content, and determine the corresponding multi-layer algorithm tag based on the data mining attribute;

Based on the multi-layer algorithm tags, through the algorithm search engine, obtain a model training algorithm matching the mining content in the preset algorithm library.
The model deployment method based on big data mining according to claim 3, wherein the platform layer further includes a data retrieval engine, and the selecting business data corresponding to the mining content from the base layer comprises:

determining, based on the data mining tag, a data mining index value corresponding to the data mining content;

According to the data mining index value, through the data retrieval engine, the storage location of the business data corresponding to the mining content is determined and acquired.
The model deployment method based on big data mining according to any one of claims 1 to 5, wherein the using the selected business data as a training sample and training with the model training algorithm to generate a corresponding business model comprises:

Using the selected business data as a training sample through the platform layer, and annotating the training sample to obtain a corresponding annotation file;

Generating a business training model according to the model training algorithm, inputting the training sample and the annotation file into the business training model, and outputting a mining result;

Calculating the loss value of the business training model based on the mining result, training the business training model based on the loss value, stopping training once the loss value is less than the preset loss value, and outputting the corresponding business model.
A computer device, wherein the computer device comprises: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor are interconnected by wires;

The at least one processor invokes the instructions in the memory, so that the computer device executes the steps of the model deployment method based on big data mining as described below, wherein the big data mining platform includes, in order from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the steps of the model deployment method based on big data mining include:

Every preset period, crawl business data from each institutional database through the platform layer, and update the business data to the base layer;

Acquiring the data mining request received by the business layer, and performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request;

Acquiring a model training algorithm matching the mining content in the preset algorithm library of the platform layer, and selecting business data corresponding to the mining content from the base layer;

Taking the selected business data as a training sample, the model training algorithm is used for training, a corresponding business model is generated and deployed to the functional layer, and an interface for accessing the business model is provided externally.
The computer device according to claim 7, wherein the platform layer includes a data collection engine, and when the computer device executes the step of crawling business data from each institutional database through the platform layer and updating the business data to the base layer, the step includes:

Crawling business data from multiple institutional databases through the data collection engine, and standardizing the business data;

converting the standardized business data into a preset semantic format, and determining the semantic features of the converted business data based on the semantic format;

Obtain the document semantic framework of the data storage model in the base layer, and associate corresponding semantic features according to the document semantic framework;

Based on the associated document semantic framework and semantic features, the transformed business data is stored in the data storage model.
The computer device according to claim 7, wherein, when the computer device executes the step of performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request, the step includes:

Parsing the data mining request to obtain corresponding data mining information, and performing word segmentation processing on the data mining information to obtain a plurality of mining-point word segments;

Inputting each mining-point word segment into a preset semantic analysis model for semantic analysis to obtain a plurality of data mining tags;

Based on the data mining tag, the mining content corresponding to the data mining request is determined.
The computer device according to claim 9, wherein the platform layer further includes an algorithm search engine, and when the computer device executes the step of acquiring a model training algorithm matching the mining content in the preset algorithm library of the platform layer, the step includes:

Based on the data mining tag, determine the data mining attribute corresponding to the data mining content, and determine the corresponding multi-layer algorithm tag based on the data mining attribute;

Based on the multi-layer algorithm tags, through the algorithm search engine, obtain the model training algorithm matching the mining content in the preset algorithm library.
The computer device according to claim 9, wherein the platform layer further includes a data retrieval engine, and when the computer device executes the step of selecting business data corresponding to the mining content from the base layer, the step includes:

determining, based on the data mining tag, a data mining index value corresponding to the data mining content;

According to the data mining index value, through the data retrieval engine, the storage location of the business data corresponding to the mining content is determined and acquired.
The computer device according to any one of claims 7 to 11, wherein, when the computer device executes the step of using the selected business data as a training sample and training with the model training algorithm to generate a corresponding business model, the step includes:

Using the selected business data as a training sample through the platform layer, and annotating the training sample to obtain a corresponding annotation file;

Generating a business training model according to the model training algorithm, inputting the training sample and the annotation file into the business training model, and outputting a mining result;

Calculating the loss value of the business training model based on the mining result, training the business training model based on the loss value, stopping training once the loss value is less than the preset loss value, and outputting the corresponding business model.
A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the steps of the model deployment method based on big data mining as described below are implemented, wherein the big data mining platform includes, in order from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the steps of the model deployment method based on big data mining include:

Every preset period, crawl business data from each institutional database through the platform layer, and update the business data to the base layer;

Acquiring the data mining request received by the business layer, and performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request;

Acquiring a model training algorithm matching the mining content in the preset algorithm library of the platform layer, and selecting business data corresponding to the mining content from the base layer;

Taking the selected business data as a training sample, the model training algorithm is used for training, a corresponding business model is generated and deployed to the functional layer, and an interface for accessing the business model is provided externally.
The computer-readable storage medium according to claim 13, wherein the platform layer includes a data collection engine, and when the computer program is executed by the processor to implement the step of crawling business data from each institutional database through the platform layer and updating the business data to the base layer, the step includes:

Crawling business data from multiple institutional databases through the data collection engine, and standardizing the business data;

converting the standardized business data into a preset semantic format, and determining the semantic features of the converted business data based on the semantic format;

Obtain the document semantic framework of the data storage model in the base layer, and associate corresponding semantic features according to the document semantic framework;

Based on the associated document semantic framework and semantic features, the transformed business data is stored in the data storage model.
The computer-readable storage medium according to claim 13, wherein, when the computer program is executed by the processor to implement the step of performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request, the step includes:

Parsing the data mining request to obtain corresponding data mining information, and performing word segmentation processing on the data mining information to obtain a plurality of mining-point word segments;

Inputting each mining-point word segment into a preset semantic analysis model for semantic analysis to obtain a plurality of data mining tags;

Based on the data mining tag, the mining content corresponding to the data mining request is determined.
The computer-readable storage medium according to claim 15, wherein the platform layer further includes an algorithm search engine, and when the computer program is executed by the processor to implement the step of acquiring a model training algorithm matching the mining content in the preset algorithm library of the platform layer, the step includes:

Based on the data mining tag, determine the data mining attribute corresponding to the data mining content, and determine the corresponding multi-layer algorithm tag based on the data mining attribute;

Based on the multi-layer algorithm tags, through the algorithm search engine, obtain a model training algorithm matching the mining content in the preset algorithm library.
The computer-readable storage medium according to claim 15, wherein the platform layer further includes a data retrieval engine, and when the computer program is executed by the processor to implement the step of selecting business data corresponding to the mining content from the base layer, the step includes:

determining, based on the data mining tag, a data mining index value corresponding to the data mining content;

According to the data mining index value, through the data retrieval engine, the storage location of the business data corresponding to the mining content is determined and acquired.
The computer-readable storage medium according to any one of claims 13 to 17, wherein, when the computer program is executed by the processor to implement the step of using the selected business data as a training sample and training with the model training algorithm to generate a corresponding business model, the step includes:

Using the selected business data as a training sample through the platform layer, and annotating the training sample to obtain a corresponding annotation file;

Generating a business training model according to the model training algorithm, inputting the training sample and the annotation file into the business training model, and outputting a mining result;

Calculating the loss value of the business training model based on the mining result, training the business training model based on the loss value, stopping training once the loss value is less than the preset loss value, and outputting the corresponding business model.
A model deployment device based on big data mining, applied to a big data mining platform, wherein the big data mining platform includes, in order from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the model deployment device based on big data mining comprises:

A crawling module, used for crawling business data from each institutional database through the platform layer at every preset period, and updating the business data to the base layer;

a semantic analysis module, configured to acquire the data mining request received by the business layer, perform semantic analysis on the data mining request, and determine the mining content corresponding to the data mining request;

a selection module, configured to obtain a model training algorithm matching the mining content in the preset algorithm library of the platform layer, and select business data corresponding to the mining content from the base layer;

The deployment module is configured to use the selected business data as a training sample, use the model training algorithm for training, generate a corresponding business model, deploy it to the functional layer, and provide an external interface for accessing the business model.
The model deployment device based on big data mining according to claim 19, wherein the platform layer includes a data collection engine, and the crawling module comprises:

a data standardization processing unit, used for crawling business data from multiple institutional databases through the data collection engine, and performing standardization processing on the business data;

a format conversion unit, configured to convert the standardized business data into a preset semantic format, and determine the semantic feature of the converted business data based on the semantic format;

an association unit, configured to obtain the document semantic framework of the data storage model in the base layer, and associate corresponding semantic features according to the document semantic framework;

A storage unit, configured to store the converted business data in the data storage model based on the associated document semantic framework and semantic features.