WO2022116430A1 - Big data mining-based model deployment method, apparatus and device, and storage medium - Google Patents


Info

Publication number
WO2022116430A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
mining
business
model
layer
Application number
PCT/CN2021/083486
Other languages
French (fr)
Chinese (zh)
Inventor
黄丽媛
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2022116430A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/23 Updating

Definitions

  • The present application relates to the field of artificial intelligence, and in particular to a model deployment method, apparatus, device and storage medium based on big data mining.
  • Medical data in a regional medical information system is typical big data.
  • Big data has the 4V characteristics (Volume, Velocity, Variety, Value), including: (1) large volume (Volume): regional medical data usually comes from an area with millions of people and hundreds of medical institutions, and the amount of data keeps growing; under the relevant regulations of the medical industry, a patient's data usually needs to be retained for more than 50 years; (2) fast generation (Velocity): medical information services may involve a large number of online or real-time data analysis and processing needs.
  • The inventor realized that, at present, the collection, storage, mining and application of medical data in the industry are all carried out independently. In particular, when mining medical information from medical data, it is not possible in one step to collect the latest data, screen usable target medical data as samples, build and deploy analysis models, and visualize the model outputs; all mining work still has to start from scratch, which makes medical data mining inefficient.
  • The main purpose of this application is to solve the technical problems of low medical data mining efficiency and inflexible model deployment.
  • A first aspect of the present application provides a model deployment method based on big data mining, applied to a big data mining platform.
  • The big data mining platform comprises, from top to bottom, a business layer, a functional layer, a platform layer and a base layer.
  • The model deployment method based on big data mining includes:
  • training with the model training algorithm to generate a corresponding business model, deploying the business model to the functional layer, and providing an external interface for accessing the business model.
  • A second aspect of the present application provides a computer device comprising a memory and at least one processor, where the memory stores instructions and is interconnected with the at least one processor by a line; the at least one processor invokes the instructions in the memory so that the computer device executes the steps of the model deployment method based on big data mining described below, where the big data mining platform comprises, from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the steps of the method include:
  • training with the model training algorithm to generate a corresponding business model, deploying the business model to the functional layer, and providing an external interface for accessing the business model.
  • A third aspect of the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to execute the steps of the model deployment method based on big data mining described below, where the big data mining platform comprises, from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the steps of the method include:
  • training with the model training algorithm to generate a corresponding business model, deploying the business model to the functional layer, and providing an external interface for accessing the business model.
  • A fourth aspect of the present application provides a model deployment apparatus based on big data mining, applied to a big data mining platform.
  • The big data mining platform comprises, from top to bottom, a business layer, a functional layer, a platform layer and a base layer.
  • The model deployment apparatus based on big data mining includes:
  • a crawling module, configured to crawl business data from each institutional database through the platform layer at every preset period, and update the business data to the base layer;
  • a semantic analysis module, configured to acquire the data mining request received by the business layer, perform semantic analysis on it, and determine the mining content corresponding to the data mining request;
  • a selection module, configured to obtain, from the preset algorithm library of the platform layer, a model training algorithm matching the mining content, and select the business data corresponding to the mining content from the base layer;
  • a deployment module, configured to use the selected business data as training samples, train with the model training algorithm to generate a corresponding business model, deploy the business model to the functional layer, and provide an external interface for accessing the business model.
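As a rough illustration of how the four modules hand data to one another, the following Python sketch wires crawling, semantic analysis, selection and deployment into one toy class. Everything in it (the `MiningPlatform` class, the tag-keyed base layer, the string-normalizing stand-in for semantic analysis and the `mean_threshold_trainer` "algorithm") is a hypothetical simplification for illustration, not the implementation described in the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types: a trainer turns samples into a model; a model maps an input to a decision.
Model = Callable[[float], bool]
Trainer = Callable[[List[dict]], Model]


@dataclass
class MiningPlatform:
    """Toy four-layer platform: base layer (data), platform layer (algorithm library),
    functional layer (deployed models), business layer (request entry point)."""
    algorithm_library: Dict[str, Trainer]
    base_layer: Dict[str, List[dict]] = field(default_factory=dict)
    functional_layer: Dict[str, Model] = field(default_factory=dict)

    def crawl(self, institution_dbs: List[List[dict]]) -> None:
        # Crawling module: copy every record into the base layer, grouped by its tag.
        for db in institution_dbs:
            for record in db:
                self.base_layer.setdefault(record["tag"], []).append(record)

    def analyze(self, request: str) -> str:
        # Semantic analysis module (placeholder): normalize the request into a mining-content key.
        return " ".join(request.lower().split())

    def deploy(self, request: str) -> Model:
        content = self.analyze(request)
        trainer = self.algorithm_library[content]           # selection: matching algorithm
        samples = self.base_layer.get(content, [])          # selection: matching business data
        self.functional_layer[content] = trainer(samples)   # deployment to the functional layer
        # External interface: look the model up at call time so a redeployment takes effect.
        return lambda x: self.functional_layer[content](x)


def mean_threshold_trainer(samples: List[dict]) -> Model:
    # Hypothetical "training": flag inputs above the mean sample value.
    threshold = sum(s["value"] for s in samples) / len(samples)
    return lambda x: x > threshold
```

Usage: after `p = MiningPlatform(algorithm_library={"disease early warning": mean_threshold_trainer})` and a crawl of two records with values 1.0 and 3.0, `p.deploy("Disease Early Warning")` returns a callable that flags inputs above 2.0. Keeping the interface as a lookup into `functional_layer` is one way to let a retrained model replace the old one without changing callers.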
  • When business data mining is not being performed, business data can be crawled from multiple institutional databases through the platform layer and updated to the base layer. When business data mining is performed, a data mining request is obtained through the business layer and semantically analyzed to determine the mining content of the current mining task. Then, through the platform layer, on the one hand the training algorithm corresponding to the mining content is matched and a business training model is built; on the other hand, the business data corresponding to the mining content is selected from the base layer and input into the business training model as samples for training, so as to build a business model for data mining, which can be deployed to the functional layer on standby.
  • The application realizes intelligent deployment of business models and improves the efficiency of mining massive business data.
  • FIG. 1 is a schematic diagram of an embodiment of the model deployment method based on big data mining in the present application.
  • FIG. 2 is a schematic diagram of another embodiment of the model deployment method based on big data mining in the present application.
  • FIG. 3 is a schematic diagram of an embodiment of the model deployment apparatus based on big data mining in the present application.
  • FIG. 4 is a schematic diagram of another embodiment of the model deployment apparatus based on big data mining in the present application.
  • FIG. 5 is a schematic diagram of an embodiment of the computer device in the present application.
  • The embodiments of the present application provide a model deployment method, apparatus, device and storage medium based on big data mining, in which business data is crawled from multiple institutional databases through the platform layer and updated to the base layer;
  • the data mining request is obtained through the platform layer and semantically analyzed to determine the mining content corresponding to the request;
  • the training algorithm corresponding to the mining content is matched by the platform layer, and the business data corresponding to the mining content is selected from the base layer;
  • with the business data as samples, the platform layer uses the training algorithm to build the corresponding business model and deploys it to the functional layer.
  • This application also relates to blockchain technology: the business data is stored in a blockchain.
  • The big data mining platform is introduced below, taking the medical field as an example.
  • The big data mining platform includes at least a base layer, a platform layer, a functional layer and a business layer, as follows:
  • Base layer: stores a large amount of medical data, such as CT (Computed Tomography) and MRI (Magnetic Resonance Imaging) image data, which are common medical imaging data, and doctors' diagnostic report data.
  • Different types of medical data are stored in corresponding fixed data storage formats; the database can use a file storage architecture that combines traditional centralized storage with HDFS (Hadoop Distributed File System), and uses the row keys and columns of HBase.
  • Platform layer: the main area of data processing. It contains at least functional modules such as a data collection engine, an algorithm search engine and a data retrieval engine, and also serves as the place where business models are trained.
  • The platform layer can be built with MapReduce (a map-reduce programming model), Flume/Sqoop (data collection tools) and Hadoop ML/Mahout, and the associated algorithm library can provide training algorithms such as Bayesian discriminant analysis, clustering, decision trees, association algorithms and recommendation algorithms, giving algorithm support to medical data mining tasks such as assisted clinical disease diagnosis and behavior analysis;
  • the SQL-like (Structured Query Language) query interface provided by Apache Hive (a data warehouse tool) can be used to give analysts a convenient way to obtain data.
  • Functional layer: the deployment site for the various business models once their training is complete. Through these business models it provides functions such as real-time query, statistical analysis, deep mining and machine learning on medical data, giving functional support to the business layer.
  • Business layer: directly connected to the client terminal. With the support of the functional layer, it provides the corresponding applications of medical data, such as real-time query, statistical analysis, deep mining and machine learning.
  • The first embodiment of the model deployment method based on big data mining in the present application includes:
  • The execution body of the present application may be a model deployment apparatus based on big data mining, or a terminal or a server, which is not specifically limited here.
  • The embodiments of the present application are described taking a server as the execution body as an example.
  • The above-mentioned business data can also be stored in a node of a blockchain.
  • The business data may include medical data, insurance data, traffic data, user data, online shopping data, etc.
  • When the big data mining platform is not performing a medical data mining task, it continuously updates the medical data in the base layer, so that when a mining task is performed later, the latest medical data is available for mining immediately, without re-checking and re-crawling it, which increases the efficiency of medical data mining.
  • During the update of medical data in the base layer, each piece of medical data is automatically checked for an existing index value.
  • If an index value exists, the current medical data replaces the field values of the corresponding original medical data; if there is no index value, the medical data is newly inserted into the base layer, completing the update of the medical data.
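The index-or-insert rule above is an upsert. A minimal Python sketch, assuming the base layer is a plain dict keyed by index value and each crawled record carries a hypothetical `index` field (the application does not specify the record layout):

```python
def update_base_layer(base_layer: dict, records: list) -> tuple:
    """Upsert crawled records into the base layer, keyed by each record's index value.

    Records whose index already exists overwrite the matching field values of the
    original data; records with a new index are inserted. Returns (replaced, inserted).
    """
    replaced = inserted = 0
    for record in records:
        index = record["index"]  # hypothetical index field
        fields = {k: v for k, v in record.items() if k != "index"}
        if index in base_layer:
            base_layer[index].update(fields)  # replace field values of the original data
            replaced += 1
        else:
            base_layer[index] = fields  # no index yet: insert as new data
            inserted += 1
    return replaced, inserted
```

Fields that a new crawl does not mention are left untouched, which matches "replace the field values" rather than replacing the whole record.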
  • The preset period can vary, and medical data is crawled asynchronously.
  • For example, medical data can be collected once a day, i.e., the preset period is 24 hours.
  • For medical data that is updated infrequently, a longer crawling period can be set, such as a week or a month.
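Per-source periods and asynchronous crawling can be combined as sketched below in Python with `asyncio`. The institution names, their periods and the `crawl` coroutine are hypothetical placeholders; a real crawler would perform database I/O where this one only yields control.

```python
import asyncio
from datetime import datetime, timedelta
from typing import Dict, List


def due_sources(last_crawled: Dict[str, datetime],
                periods: Dict[str, timedelta], now: datetime) -> List[str]:
    """Institutions whose preset period has elapsed since their last crawl (never-crawled ones are due)."""
    return sorted(
        name for name, period in periods.items()
        if name not in last_crawled or now - last_crawled[name] >= period
    )


async def crawl(name: str) -> str:
    # Placeholder for an asynchronous database crawl.
    await asyncio.sleep(0)
    return f"crawled:{name}"


async def crawl_due(last_crawled: Dict[str, datetime],
                    periods: Dict[str, timedelta], now: datetime) -> List[str]:
    names = due_sources(last_crawled, periods, now)
    results = await asyncio.gather(*(crawl(n) for n in names))  # crawl due sources concurrently
    for n in names:
        last_crawled[n] = now  # record the crawl time so the period restarts
    return list(results)
```

With periods of 24 hours for a hospital, one week for a clinic and 30 days for a lab, a scheduler that calls `crawl_due` regularly only touches the sources whose own period has run out.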
  • Each data storage model contains multiple data tables identified by table numbers, and each data table has row numbers and column numbers, so each piece of medical data can be uniquely identified by model number + table number + row number + column number.
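One way to turn the four-part identifier into a single key, for example an HBase row key, is sketched below. The hyphen separator and the six-digit zero padding are assumptions; the padding matters because HBase sorts row keys lexicographically as bytes, so unpadded numbers would sort "10" before "9".

```python
from typing import NamedTuple


class DataKey(NamedTuple):
    model: int  # data storage model number
    table: int  # data table number
    row: int    # row number within the table
    col: int    # column number within the table


def encode_key(key: DataKey, width: int = 6) -> str:
    # Zero-pad every component so lexicographic order matches numeric order.
    return "-".join(str(part).zfill(width) for part in key)


def decode_key(text: str) -> DataKey:
    return DataKey(*(int(part) for part in text.split("-")))
```

For example, `encode_key(DataKey(1, 2, 30, 4))` gives `"000001-000002-000030-000004"`, and `decode_key` recovers the original tuple.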
  • The business layer is directly connected to the client terminal. The data mining request received by the client terminal contains data mining information, which can be obtained from the content the user selects or enters in a text box. From this data mining information, the semantic analysis model can parse out the specific mining content, which is represented by data mining tags.
  • The specific semantic analysis process is as follows:
  • The data mining request also includes data mining information, such as “patient behavior analysis of cardiovascular and cerebrovascular diseases”, “aided clinical decision-making for coronary heart disease” or “pancreatitis disease control warning”;
  • Word segmentation: for example, “patient behavior analysis of cardiovascular and cerebrovascular diseases” can be split into three mining key points, “cardiovascular and cerebrovascular”, “patient behavior” and “analysis”; the preset semantic analysis model is then used to analyze the segmented mining key points.
  • The semantic analysis model has an expert database that maps each segmented mining key point to data mining tags with the same meaning; for example, “cardiovascular and cerebrovascular” is mapped to a “cardiovascular” data mining tag and a “cerebrovascular” data mining tag.
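The segmentation and expert-database mapping can be sketched as a dictionary lookup. This Python version is only an illustration: it uses greedy longest-match segmentation over whitespace-separated English words (a swapped-in technique; the application works on Chinese text and does not name its segmenter), and the `EXPERT_DB` entries are hypothetical.

```python
from typing import Dict, List

# Hypothetical expert database: segmented mining key point -> data mining tags with the same meaning.
EXPERT_DB: Dict[str, List[str]] = {
    "cardiovascular and cerebrovascular": ["cardiovascular", "cerebrovascular"],
    "patient behavior": ["patient behavior"],
    "analysis": ["analysis"],
}


def segment(text: str, vocabulary) -> List[str]:
    """Greedy longest-match segmentation of text into known mining key points."""
    words = text.lower().split()
    points, i = [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest phrase starting at i first
            phrase = " ".join(words[i:j])
            if phrase in vocabulary:
                points.append(phrase)
                i = j
                break
        else:
            i += 1  # no known phrase starts here (e.g. "of", "diseases"): skip the word
    return points


def to_tags(points: List[str], expert_db: Dict[str, List[str]]) -> List[str]:
    tags: List[str] = []
    for point in points:
        for tag in expert_db.get(point, []):
            if tag not in tags:  # de-duplicate while keeping first-seen order
                tags.append(tag)
    return tags
```

On the example request, `segment` yields "patient behavior", "analysis" and "cardiovascular and cerebrovascular", and `to_tags` expands the last key point into the two tags "cardiovascular" and "cerebrovascular".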
  • Based on the mining content, medical data related to the specified disease and data application is initially selected; a suitable model training algorithm is then chosen for the mining content, and a business training model is built to further extract the common features related to the disease and the application from the initially selected medical data.
  • For example, when the mining content is “patient behavior analysis”, neighborhood-based algorithms, latent semantic models and graph-based random-walk algorithms can be selected.
  • The medical data is used as training samples, which are annotated according to the diagnostic content, medical image feature regions, etc. in the medical data; a corresponding business training model is then generated according to the model training algorithm and trained with the training samples to produce the corresponding business model; finally, the business model is deployed in the functional layer for use. When a user request arrives, such as a “patient behavior prediction instruction”, “disease warning instruction” or “data mining instruction”, the functional layer calls the business model to perform data mining on the corresponding incoming information.
  • The specific construction process of the business model is as follows:
  • the training samples are automatically obtained from the base layer;
  • the model training algorithm is automatically obtained from the preset algorithm library;
  • the model training algorithm is written into a pre-written model framework to obtain the corresponding business training model;
  • the business training model is trained with the training samples and annotation files, and the loss function of the model is determined according to the mining content.
  • For example, the model can be evaluated with the logistic regression loss function.
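A minimal sketch of training against the logistic regression (cross-entropy) loss, using the stopping rule the apparatus embodiment describes: keep training until the loss falls below a preset value. Plain full-batch gradient descent, the learning rate, the `preset_loss` of 0.1 and the `max_epochs` safety cap are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(w: List[float], b: float, X: List[List[float]], y: List[int]) -> float:
    """Mean logistic-regression (cross-entropy) loss over the training samples."""
    eps = 1e-12  # keep log() away from 0
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        p = min(max(p, eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(X)


def train(X, y, lr=0.5, preset_loss=0.1, max_epochs=10_000) -> Tuple[List[float], float, float, int]:
    """Full-batch gradient descent that stops once the loss drops below preset_loss."""
    w, b = [0.0] * len(X[0]), 0.0
    loss, epoch = log_loss(w, b, X, y), 0
    while loss >= preset_loss and epoch < max_epochs:
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # dLoss/dz for one sample
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
        b -= lr * gb / len(X)
        loss, epoch = log_loss(w, b, X, y), epoch + 1
    return w, b, loss, epoch
```

The `max_epochs` cap is a design choice the text does not mention: without it, data that can never reach the preset loss would keep the loop running forever.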
  • When business data mining is not being performed, business data can be crawled from multiple institutional databases through the platform layer and updated to the base layer. When business data mining is performed, the platform layer obtains the data mining request and performs semantic analysis to determine the mining content of the current mining task; then, through the platform layer, on the one hand the preset algorithm corresponding to the mining content is matched and a business training model is built, and on the other hand the business data corresponding to the mining content is selected from the base layer and input into the business training model as samples for training, so as to build a business model for data mining, which is deployed to the functional layer on standby.
  • The application realizes intelligent deployment of business models and improves the efficiency of mining massive business data.
  • The second embodiment of the model deployment method based on big data mining in the present application includes:
  • The platform layer includes a data collection engine, which obtains the latest medical data from multiple medical institutions and standardizes it; standardization includes data cleaning, preprocessing, error correction, missing-value filling, discretization of continuous values, outlier removal and data normalization.
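Several of the listed standardization steps can be chained for a single numeric field as in the Python sketch below. The order of the steps, median filling, the z-score outlier rule and the three equal-width bins are assumptions made for illustration; the application names the steps but not how each is performed.

```python
from statistics import mean, median, pstdev
from typing import List, Optional, Sequence, Tuple


def standardize(values: Sequence[Optional[float]], z_max: float = 3.0,
                edges: Tuple[float, ...] = (1 / 3, 2 / 3)) -> Tuple[List[float], List[int]]:
    # 1. Missing-value filling: replace None with the median of the observed values.
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    # 2. Outlier removal: drop values more than z_max population standard deviations from the mean.
    mu, sigma = mean(filled), pstdev(filled)
    kept = [v for v in filled if sigma == 0 or abs(v - mu) <= z_max * sigma]
    # 3. Normalization: min-max scale the remaining values into [0, 1].
    lo, hi = min(kept), max(kept)
    normalized = [0.0 if hi == lo else (v - lo) / (hi - lo) for v in kept]
    # 4. Discretization: map each normalized value to a bin index (0, 1 or 2 for the default edges).
    discrete = [sum(v > e for e in edges) for v in normalized]
    return normalized, discrete
```

For instance, `[1.0, None, 3.0, 5.0]` is filled to `[1.0, 3.0, 3.0, 5.0]`, normalized to `[0.0, 0.5, 0.5, 1.0]` and discretized to bins `[0, 1, 1, 2]`.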
  • The corresponding data storage model can be obtained from the stored semantic framework, and the data storage model is extensible.
  • Medical data from different medical institutions has data attributes, such as records of medical activities including the institution name, patient information, examination information, diagnosis information and treatment information.
  • Semantic data can be saved as an Extensible Markup Language (XML, a semantic format) document in the clinical document framework format (a document semantic framework), while the aforementioned data attributes, such as institution name, patient information, examination information, diagnosis information and treatment information, are the corresponding semantic features.
  • Medical data with different semantic features is stored at positions corresponding to different tables, table rows and table columns.
  • Medical data can be converted into a fixed semantic format in the form of multi-level tags.
  • The first-level tag is the document semantic framework, which can be determined by the type of medical data, such as image data or electronic medical record text data;
  • the second-level tag is a data table, which can be determined by disease type, medical institution or patient, and the third-level tag is a table column or table row, determined by items such as patient information and historical medical records.
  • Medical data is stored in the corresponding fixed semantic format. For each type of medical data, the stored data is looked up by specific data attributes, and the stored data corresponding to a data attribute is used as a semantic feature of the medical data, for example, the attributes recorded in an electronic medical record.
  • The semantic feature can be encoded by a fixed encoding rule and stored at the corresponding location in the corresponding data storage model.
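The three-level tag scheme (document semantic framework, data table, table column) can be pictured as nested dictionaries. In this Python sketch the `DataStorageModel` class is hypothetical, and `encode_feature` stands in for the unspecified "fixed encoding rule" with a simple normalization.

```python
from collections import defaultdict
from typing import Any, List


def encode_feature(name: str) -> str:
    # Hypothetical fixed encoding rule: lower-case, trim, and join words with underscores.
    return "_".join(name.strip().lower().split())


class DataStorageModel:
    """Stores values under three tag levels:
    document semantic framework -> data table -> table column (encoded semantic feature)."""

    def __init__(self) -> None:
        self._store = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def put(self, framework: str, table: str, feature: str, value: Any) -> None:
        self._store[framework][table][encode_feature(feature)].append(value)

    def get(self, framework: str, table: str, feature: str) -> List[Any]:
        # Read without creating empty entries for unknown frameworks or tables.
        tables = self._store.get(framework, {})
        columns = tables.get(table, {})
        return list(columns.get(encode_feature(feature), []))
```

Because the encoding rule is applied on both write and read, " Diagnosis Information " and "diagnosis information" address the same column.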
  • The platform layer further includes an algorithm search engine, and obtaining a model training algorithm matching the mining content from the preset algorithm library of the platform layer includes:
  • The algorithm search engine is associated with the preset algorithm library and can search it for the model training algorithm required by the mining content; a business training model is built from that algorithm, and medical data is then input for training to obtain the final business model. Different data mining attributes are determined by the data mining type (itself determined from the data mining tags), such as disease early warning, clinical diagnosis or patient behavior analysis, and these attributes are organized into multiple layers of algorithm labels that determine the model training algorithm finally used.
  • For example, the data mining attributes “machine learning”, “logistic regression”, “multi-classification” and “semi-supervised learning” can be derived from “disease early warning”, and from these attributes the following four layers of algorithm labels (i.e., multi-layer algorithm labels) can be determined:
  • the first layer is “semi-supervised learning”;
  • the second layer is “machine learning”;
  • the third layer is “logistic regression”;
  • the fourth layer is “multi-classification”.
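The multi-layer labels can serve as a lookup path into the algorithm library. In this Python sketch, the `LAYER_OF` assignment of attributes to layers, the library entries and the fallback to the longest matching label prefix are all assumptions; the application only states that the layered labels determine the final algorithm.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical assignment of data mining attributes to algorithm-label layers (0 = first layer).
LAYER_OF: Dict[str, int] = {
    "semi-supervised learning": 0, "supervised learning": 0,
    "machine learning": 1,
    "logistic regression": 2, "decision tree": 2,
    "multi-classification": 3, "binary classification": 3,
}


def label_path(attributes: Iterable[str]) -> Tuple[str, ...]:
    """Arrange data mining attributes into ordered multi-layer algorithm labels."""
    return tuple(sorted(attributes, key=LAYER_OF.__getitem__))


def search_algorithm(library: Dict[Tuple[str, ...], str], path: Tuple[str, ...]) -> Optional[str]:
    """Return the algorithm registered for the longest matching label prefix, if any."""
    for k in range(len(path), 0, -1):
        if path[:k] in library:
            return library[path[:k]]
    return None
```

The prefix fallback lets the search engine return a more general algorithm when no entry matches all four layers.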
  • The platform layer further includes a data retrieval engine, and selecting the business data corresponding to the mining content from the base layer includes:
  • The data retrieval engine can retrieve the corresponding medical data from the base layer according to data mining index values, and each data mining tag can be mapped to corresponding data mining index values; for example, “cardiovascular” and “cerebrovascular” can be mapped to five data mining index values, corresponding to fields a, b, c, d and e.
  • Through the index values of fields a, b, c, d and e, the corresponding data storage model, data table, table row or table column can be located; the retrieved data can be the medical data of an entire data storage model, or all the data in a particular data table, table row or table column.
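Retrieval by tag can be sketched as two lookups: tags to index values, then index values to storage locations. The Python mappings below are hypothetical; they split the five fields a to e between the two tags only to mirror the example.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, str, str]  # (data storage model, data table, table column)

# Hypothetical mappings: the two tags together cover the five index values a-e.
TAG_TO_INDEX: Dict[str, List[str]] = {
    "cardiovascular": ["a", "b", "c"],
    "cerebrovascular": ["c", "d", "e"],
}
INDEX_TO_LOCATION: Dict[str, Location] = {
    "a": ("model_1", "table_1", "col_1"),
    "b": ("model_1", "table_1", "col_2"),
    "c": ("model_1", "table_2", "col_1"),
    "d": ("model_2", "table_1", "col_1"),
    "e": ("model_2", "table_1", "col_2"),
}


def index_values(tags: List[str], tag_to_index: Dict[str, List[str]]) -> List[str]:
    values: List[str] = []
    for tag in tags:
        for v in tag_to_index.get(tag, []):
            if v not in values:  # shared index values are retrieved only once
                values.append(v)
    return values


def retrieve(tags, tag_to_index, index_to_location, base_layer) -> Dict[str, list]:
    """Map each index value to the records stored at its location in the base layer."""
    return {v: base_layer.get(index_to_location[v], []) for v in index_values(tags, tag_to_index)}
```

Here the base layer is modeled as a dict keyed by location tuples; a real deployment would query HBase or HDFS at those locations instead.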
  • In this embodiment, the data collection engine in the platform layer crawls business data from multiple business institutions for backup; the algorithm search engine then selects a suitable algorithm library from multiple preset algorithm libraries to deploy the business training model; finally, the data retrieval engine selects suitable business data as samples, which are input into the business training model for training to build the business model required for data mining, realizing intelligent deployment of the business model.
  • An embodiment of the model deployment apparatus based on big data mining includes:
  • a crawling module 301, configured to crawl business data from each institutional database through the platform layer at every preset period, and update the business data to the base layer;
  • a semantic analysis module 302, configured to acquire the data mining request received by the business layer, perform semantic analysis on it, and determine the mining content corresponding to the data mining request;
  • a selection module 303, configured to obtain, from the preset algorithm library of the platform layer, a model training algorithm matching the mining content, and select the business data corresponding to the mining content from the base layer;
  • a deployment module 304, configured to use the selected business data as training samples, train with the model training algorithm to generate a corresponding business model, deploy the business model to the functional layer, and provide an external interface for accessing the business model.
  • Another embodiment of the model deployment apparatus based on big data mining in the present application includes:
  • a crawling module 301, configured to crawl business data from each institutional database through the platform layer at every preset period, and update the business data to the base layer;
  • a semantic analysis module 302, configured to acquire the data mining request received by the business layer, perform semantic analysis on it, and determine the mining content corresponding to the data mining request;
  • a selection module 303, configured to obtain, from the preset algorithm library of the platform layer, a model training algorithm matching the mining content, and select the business data corresponding to the mining content from the base layer;
  • a deployment module 304, configured to use the selected business data as training samples, train with the model training algorithm to generate a corresponding business model, deploy the business model to the functional layer, and provide an external interface for accessing the business model.
  • The platform layer includes a data collection engine, and the crawling module 301 includes:
  • a data standardization processing unit 3011, configured to crawl business data from multiple institutional databases through the data collection engine and standardize the business data;
  • a format conversion unit 3012, configured to convert the standardized business data into a preset semantic format and determine the semantic features of the converted business data based on that semantic format;
  • an association unit 3013, configured to obtain the document semantic framework of the data storage model in the base layer and associate the corresponding semantic features according to that framework;
  • a storage unit 3014, configured to store the converted business data in the data storage model based on the associated document semantic framework and semantic features.
  • The semantic analysis module 302 includes:
  • a word segmentation unit 3021, configured to parse the data mining request to obtain the corresponding data mining information, and perform word segmentation on it to obtain multiple segmented mining key points;
  • a semantic analysis unit 3022, configured to input the segmented mining key points into a preset semantic analysis model for semantic analysis to obtain multiple data mining tags, and determine the mining content corresponding to the data mining request based on those tags.
  • The platform layer further includes an algorithm search engine, the selection module 303 includes an algorithm search unit 3031, and the algorithm search unit 3031 is used for:
  • The platform layer further includes a data retrieval engine, and the selection module 303 further includes a data retrieval unit 3032, which is used for:
  • determining and acquiring the storage location of the business data corresponding to the mining content.
  • The deployment module 304 includes:
  • an annotation unit 3041, configured to use the selected business data as training samples through the platform layer and annotate the training samples to obtain a corresponding annotation file;
  • a training unit 3042, configured to generate a business training model according to the model training algorithm, input the training samples and the annotation file into the business training model, and output mining results; calculate the loss value of the business training model based on the mining results, and train the business training model based on the loss value, stopping training once the loss value is less than a preset loss value; and output the corresponding business model.
  • Through the data collection engine, algorithm search engine and data retrieval engine in the platform layer, business data is first crawled from multiple business institutions for backup; a suitable algorithm library is then selected from multiple preset algorithm libraries to deploy the business training model; suitable business data is then selected as samples and input into the business training model for training to build the business model required for data mining. This realizes intelligent deployment of the business model and improves the efficiency of business data mining.
  • FIG. 5 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device 500 may vary greatly in configuration or performance, and may include one or more central processing units (CPUs) 510, a memory 520, and one or more storage media 530 (for example, one or more mass storage devices) that store applications 533 or data 532.
  • the memory 520 and the storage medium 530 may be short-term storage or persistent storage.
  • the program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the computer device 500.
  • the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the computer device 500, the series of instruction operations in the storage medium 530.
  • the computer device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux and FreeBSD.
  • the present application also provides a computer device, which may be any device capable of performing the steps of the model deployment method based on big data mining in the above embodiments. The computer device includes a memory and a processor; the memory stores computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the model deployment method based on big data mining in the foregoing embodiments.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • the computer-readable storage medium may also be a volatile computer-readable storage medium.
  • the computer-readable storage medium stores instructions that, when executed on a computer, cause the computer to execute the steps of the model deployment method based on big data mining.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solutions of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
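  • To make the hash-linking idea above concrete, the following is a minimal Python sketch, not the application's implementation. It assumes SHA-256 over a canonical JSON encoding of each block and deliberately omits consensus, peer-to-peer transmission and signatures; `Block`, `append_block` and `verify_chain` are hypothetical names chosen for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block (assumption)


@dataclass
class Block:
    index: int
    records: List[dict]   # a batch of business records, e.g. medical data
    prev_hash: str        # hash of the preceding block: this field forms the chain
    block_hash: str = ""

    def compute_hash(self) -> str:
        # Hash a canonical (sorted-key) JSON encoding of the block contents.
        payload = json.dumps(
            {"index": self.index, "records": self.records, "prev_hash": self.prev_hash},
            sort_keys=True,
            ensure_ascii=False,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: List[Block], records: List[dict]) -> Block:
    prev_hash = chain[-1].block_hash if chain else GENESIS_PREV_HASH
    block = Block(index=len(chain), records=records, prev_hash=prev_hash)
    block.block_hash = block.compute_hash()
    chain.append(block)
    return block


def verify_chain(chain: List[Block]) -> bool:
    # Editing any stored record changes that block's hash and breaks verification.
    for i, block in enumerate(chain):
        if block.block_hash != block.compute_hash():
            return False
        expected_prev = chain[i - 1].block_hash if i > 0 else GENESIS_PREV_HASH
        if block.prev_hash != expected_prev:
            return False
    return True
```

  • In this sketch, tampering with a record in an earlier block makes `verify_chain` return `False`, which is the anti-counterfeiting property the text describes.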

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

A big data mining-based model deployment method, apparatus and device, and a storage medium. The method is applied to a big data mining platform, and comprises: performing service data crawling from a plurality of institution databases by means of a platform layer, and updating the service data to a base layer; obtaining a data mining request by means of a service layer, performing semantic analysis on the data mining request, and determining mined content corresponding to the data mining request; obtaining, by means of matching, a training algorithm corresponding to the mined content from a preset algorithm library by means of the platform layer, and selecting service data corresponding to the mined content from the base layer; and by taking the selected service data as a sample and using the training algorithm, establishing a corresponding service model by means of the platform layer, and deploying the service model into a functional layer. The present invention also relates to blockchain technology: the service data is stored in a blockchain. The method implements intelligent deployment of service models and improves the efficiency of mining massive service data.

Description

Model deployment method, apparatus, device and storage medium based on big data mining
This application claims priority to Chinese Patent Application No. 202011386029.X, filed with the Chinese Patent Office on December 2, 2020 and entitled "Model Deployment Method, Apparatus, Device and Storage Medium Based on Big Data Mining", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a model deployment method, apparatus, device and storage medium based on big data mining.
Background
Medical data in a regional medical information system is typical big data. Big data has the "4V" characteristics (Volume, Velocity, Variety, Value): (1) Larger volume: regional medical data usually comes from regions with millions of residents and hundreds of medical institutions, and the amount of data keeps growing. Under the relevant regulations of the medical industry, a patient's data usually has to be retained for more than 50 years. (2) Faster generation (velocity): medical information services may involve many online or real-time data analysis and processing requirements, such as diagnosis and medication recommendations in clinical decision support, generation of epidemiological analysis reports, and early warning on health indicators. (3) Greater variety: medical data comes in many storage forms, including structured data tables, unstructured or semi-structured text documents (XML and narrative text), and medical images. (4) More value: medical data is closely tied to individuals' daily lives, and can also support national and even global disease prevention and control, new drug research and development, and the treatment of intractable diseases.
The inventor realized that, at present, the collection, storage, mining and application of medical data are all carried out independently in the industry. In particular, when mining medically relevant information from medical data, it is not possible to complete in one pass the collection of the latest data, the screening of usable target medical data as samples, the construction and deployment of analysis models, and the visualization of model outputs. All mining work still has to start from scratch, which makes medical data mining inefficient.
Summary
The main purpose of this application is to solve the technical problems that medical data mining is inefficient and that model deployment is not flexible enough.
A first aspect of the present application provides a model deployment method based on big data mining, applied to a big data mining platform. From top to bottom, the big data mining platform includes a business layer, a functional layer, a platform layer and a base layer. The model deployment method based on big data mining includes:
At every preset period, crawling business data from each institutional database through the platform layer, and updating the business data into the base layer;
Acquiring the data mining request received by the business layer, performing semantic analysis on the data mining request, and determining the mining content corresponding to the data mining request;
Acquiring, from the preset algorithm library of the platform layer, a model training algorithm matching the mining content, and selecting business data corresponding to the mining content from the base layer;
Using the selected business data as training samples, training with the model training algorithm to generate a corresponding business model, deploying the business model to the functional layer, and providing an external interface for accessing the business model.
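As a rough illustration of how the four steps hand work from layer to layer, here is a minimal in-memory Python skeleton. It is a sketch under heavy assumptions: each layer is a plain dictionary, the semantic analysis step is a trivial normalization placeholder, and `MiningPlatform`, `crawl` and `handle_request` are hypothetical names rather than anything defined in the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]
Model = Callable[[Record], object]
Trainer = Callable[[List[Record]], Model]


@dataclass
class MiningPlatform:
    base_layer: Dict[str, Record] = field(default_factory=dict)          # record id -> record
    algorithm_library: Dict[str, Trainer] = field(default_factory=dict)  # mining content -> trainer
    functional_layer: Dict[str, Model] = field(default_factory=dict)     # deployed business models

    def crawl(self, institution_dbs: List[Dict[str, Record]]) -> None:
        # Step 1: pull every institution's records into the base layer (upsert by id).
        for db in institution_dbs:
            self.base_layer.update(db)

    def analyze(self, request: str) -> str:
        # Step 2 placeholder: a real platform would run semantic analysis here.
        return " ".join(request.lower().split())

    def handle_request(self, request: str) -> Model:
        content = self.analyze(request)
        trainer = self.algorithm_library[content]  # Step 3a: match a training algorithm
        samples = [r for r in self.base_layer.values() if r.get("topic") == content]  # Step 3b
        model = trainer(samples)                   # Step 4: train ...
        self.functional_layer[content] = model     # ... and deploy to the functional layer
        return model
```

The returned model is also kept in `functional_layer`, standing in for the deployed model behind the external access interface.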
A second aspect of the present application provides a computer device, including a memory and at least one processor. Instructions are stored in the memory, and the memory and the at least one processor are interconnected by a line. The at least one processor invokes the instructions in the memory, so that the computer device performs the following steps of the model deployment method based on big data mining, where the big data mining platform includes, from top to bottom, a business layer, a functional layer, a platform layer and a base layer. The steps of the model deployment method based on big data mining include:
At every preset period, crawling business data from each institutional database through the platform layer, and updating the business data into the base layer;
Acquiring the data mining request received by the business layer, performing semantic analysis on the data mining request, and determining the mining content corresponding to the data mining request;
Acquiring, from the preset algorithm library of the platform layer, a model training algorithm matching the mining content, and selecting business data corresponding to the mining content from the base layer;
Using the selected business data as training samples, training with the model training algorithm to generate a corresponding business model, deploying the business model to the functional layer, and providing an external interface for accessing the business model.
A third aspect of the present application provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the following steps of the model deployment method based on big data mining, where the big data mining platform includes, from top to bottom, a business layer, a functional layer, a platform layer and a base layer. The steps of the model deployment method based on big data mining include:
At every preset period, crawling business data from each institutional database through the platform layer, and updating the business data into the base layer;
Acquiring the data mining request received by the business layer, performing semantic analysis on the data mining request, and determining the mining content corresponding to the data mining request;
Acquiring, from the preset algorithm library of the platform layer, a model training algorithm matching the mining content, and selecting business data corresponding to the mining content from the base layer;
Using the selected business data as training samples, training with the model training algorithm to generate a corresponding business model, deploying the business model to the functional layer, and providing an external interface for accessing the business model.
A fourth aspect of the present application provides a model deployment apparatus based on big data mining, applied to a big data mining platform. From top to bottom, the big data mining platform includes a business layer, a functional layer, a platform layer and a base layer. The model deployment apparatus based on big data mining includes:
A crawling module, configured to crawl business data from each institutional database through the platform layer at every preset period, and to update the business data into the base layer;
A semantic analysis module, configured to acquire the data mining request received by the business layer, perform semantic analysis on it, and determine the mining content corresponding to the data mining request;
A selection module, configured to acquire, from the preset algorithm library of the platform layer, a model training algorithm matching the mining content, and to select business data corresponding to the mining content from the base layer;
A deployment module, configured to use the selected business data as training samples, train with the model training algorithm to generate a corresponding business model, deploy it to the functional layer, and provide an external interface for accessing the business model.
In the technical solution provided by this application, when no business data mining is being performed, business data is crawled from multiple institutional databases through the platform layer and updated into the base layer. When business data mining is performed, a data mining request is first obtained through the business layer and semantically analyzed to determine the mining content of the current task. Then, through the platform layer, the training algorithm corresponding to the mining content is matched and a business training model is built; in addition, the business data corresponding to the mining content is selected from the base layer and fed into the business training model as samples for training. The resulting business model for data mining is deployed to the functional layer, where it stands by for requests. This application thus achieves intelligent deployment of business models and improves the efficiency of mining massive business data.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an embodiment of the model deployment method based on big data mining in this application;
FIG. 2 is a schematic diagram of another embodiment of the model deployment method based on big data mining in this application;
FIG. 3 is a schematic diagram of an embodiment of the model deployment apparatus based on big data mining in this application;
FIG. 4 is a schematic diagram of another embodiment of the model deployment apparatus based on big data mining in this application;
FIG. 5 is a schematic diagram of an embodiment of the computer device in this application.
Detailed Description
The embodiments of this application provide a model deployment method, apparatus, device and storage medium based on big data mining. Business data is crawled from multiple institutional databases through the platform layer and updated into the base layer; a data mining request is obtained through the business layer and semantically analyzed to determine the corresponding mining content; the training algorithm corresponding to the mining content is matched through the platform layer, and the business data corresponding to the mining content is selected from the base layer; with the selected business data as samples and using the training algorithm, the corresponding business model is built through the platform layer and deployed to the functional layer. This application also relates to blockchain technology: the business data is stored in a blockchain. This application achieves intelligent deployment of business models and improves the efficiency of mining massive business data.
The terms "first", "second", "third", "fourth", etc. (if any) in the description, claims and drawings of this application are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. Furthermore, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product or device.
Before the embodiments are described, the big data mining platform is introduced, taking the medical field as an example. The platform may include at least a base layer, a platform layer, a functional layer and a business layer, as follows:
Base layer: stores large amounts of medical data, such as the CT (Computed Tomography) and MRI (Magnetic Resonance Imaging) image data common in medical imaging, and physicians' diagnostic reports. Each type of medical data is stored in a corresponding fixed storage format. The database may use a file storage architecture that combines traditional centralized storage with HDFS (Hadoop Distributed File System), and may exploit the flexibility of HBase row-key, column-key and column-family design to organize multi-dimensional medical data effectively, implementing the multi-dimensional data storage model of a traditional data warehouse.
Platform layer: the main data-processing area, containing at least functional modules such as a data collection engine, an algorithm search engine and a data retrieval engine. It is also where business models are trained.
For the data collection engine, MapReduce can serve as the computing core, and Flume/Sqoop (Apache tools for collecting and transferring data into Hadoop) can be used to extract data from multiple medical institution databases, standardize it, convert its format, and load it into the data storage area of the base layer;
For the algorithm search engine, Hadoop ML/Mahout can be used to build it, together with an associated algorithm library that provides training algorithms such as Bayesian discriminant analysis, clustering, decision trees, association (correlation-degree) algorithms and recommendation algorithms, giving algorithmic support to medical data mining tasks such as auxiliary clinical diagnosis and behavior analysis;
For the data retrieval engine, the SQL-like (Structured Query Language) query interface provided by Apache Hive (a data warehouse tool) can be used, giving analysts a convenient way to obtain data.
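The data collection engine's extract, standardize and load flow can be pictured with the small single-process Python analogue below. It only mimics the map and reduce phases in memory; it does not use the Flume, Sqoop or MapReduce APIs, and the institution names and field mappings in `FIELD_MAPS` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical field mappings from each institution's schema to one standard schema.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "hospital_a": {"pid": "patient_id", "dx": "diagnosis", "dt": "visit_date"},
    "clinic_b": {"PatientNo": "patient_id", "Diagnosis": "diagnosis", "Date": "visit_date"},
}


def map_record(source: str, raw: dict) -> Tuple[str, dict]:
    """Map phase: standardize one raw record and key it by patient id."""
    mapping = FIELD_MAPS[source]
    std = {mapping[k]: v for k, v in raw.items() if k in mapping}
    # Unify date separators so every date reads YYYY-MM-DD.
    std["visit_date"] = str(std.get("visit_date", "")).replace("/", "-")
    std["source"] = source
    return std["patient_id"], std


def reduce_by_patient(pairs: Iterable[Tuple[str, dict]]) -> Dict[str, List[dict]]:
    """Reduce phase: group standardized records per patient, ordered by visit date."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for patient_id, record in pairs:
        grouped[patient_id].append(record)
    for records in grouped.values():
        # ISO-style date strings sort correctly as plain strings.
        records.sort(key=lambda r: r["visit_date"])
    return dict(grouped)
```

In a real deployment, the two functions would correspond roughly to mapper and reducer tasks running across the cluster, with the output loaded into the base layer.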
Functional layer: where the various types of business models are deployed after training. Through these business models, it provides functions such as real-time query, statistical analysis, deep mining and machine learning on medical data, supporting the business layer.
Business layer: connects directly to client terminals and, backed by the functional layer, provides the corresponding applications for real-time query, statistical analysis, deep mining and machine learning on medical data.
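One way to picture the relationship between the functional layer (where trained models are deployed) and the business layer (which calls them on behalf of clients) is a simple model registry. The Python sketch below is an assumption for illustration only: `FunctionalLayer`, `deploy` and `invoke` are hypothetical names, and a production system would sit behind a network service rather than an in-process dictionary.

```python
from typing import Any, Callable, Dict, List

BusinessModel = Callable[[Any], Any]


class FunctionalLayer:
    """Hypothetical registry where trained business models are deployed and served."""

    def __init__(self) -> None:
        self._models: Dict[str, BusinessModel] = {}

    def deploy(self, name: str, model: BusinessModel) -> None:
        # Redeploying under the same name replaces the previous model.
        self._models[name] = model

    def invoke(self, name: str, payload: Any) -> Any:
        # The interface the business layer calls for a client request.
        try:
            model = self._models[name]
        except KeyError:
            raise KeyError(f"no business model deployed under {name!r}") from None
        return model(payload)

    def deployed(self) -> List[str]:
        return sorted(self._models)
```

For example, after `deploy("disease_warning", model)`, a disease early warning request from a client would become `invoke("disease_warning", payload)`.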
For ease of understanding, the specific flow of the embodiments of this application is described below. Referring to FIG. 1, a first embodiment of the model deployment method based on big data mining in the embodiments of this application includes:
S101. At every preset period, crawl business data from each institutional database through the platform layer, and update the business data into the base layer.
It can be understood that the execution body of this application may be a model deployment apparatus based on big data mining, or a terminal or a server, which is not limited here. The embodiments take a server as the execution body. It should be emphasized that, to further ensure the privacy and security of the business data, the business data may also be stored in a node of a blockchain. In addition, business data may include medical data, insurance data, traffic data, user data, online shopping data, and so on. The mining steps are essentially the same for every type of business data, so they are not illustrated one by one; medical data is used as the example below.
In this embodiment, when the big data mining platform is not performing a medical data mining task, it continuously updates the medical data in the base layer, so that the latest medical data is immediately available when a mining task runs later, without re-checking and re-crawling, which improves mining efficiency. When updating medical data in the base layer, the platform automatically checks whether each medical record already has an index. If it does, the field values of the corresponding existing record are replaced with the current data; if not, the record is newly inserted into the base layer. The preset period may differ between medical institutions, and crawling is performed asynchronously. For example, large medical institutions may be crawled in turn every day, that is, with a preset period of 24 hours, while research institutes, whose medical data changes less often, can be crawled on a longer period, such as a week or a month.
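The update rule just described (overwrite field values if an index exists, otherwise insert) and the per-institution preset period can be sketched in a few lines of Python. This is a minimal illustration, assuming the base layer is a dictionary keyed by record index and that periods are configured per institution type; `CRAWL_PERIODS`, `is_due` and `upsert` are hypothetical names, and the asynchronous scheduling itself is not shown.

```python
from datetime import datetime, timedelta
from typing import Dict

# Hypothetical per-institution crawl periods (daily for large hospitals, weekly for institutes).
CRAWL_PERIODS: Dict[str, timedelta] = {
    "large_hospital": timedelta(hours=24),
    "research_institute": timedelta(weeks=1),
}


def is_due(institution_type: str, last_crawl: datetime, now: datetime) -> bool:
    """True once the institution's preset period has elapsed since its last crawl."""
    return now - last_crawl >= CRAWL_PERIODS[institution_type]


def upsert(base_layer: Dict[str, dict], index_key: str, record: dict) -> str:
    """Overwrite field values if the index exists, otherwise insert a new record."""
    if index_key in base_layer:
        base_layer[index_key].update(record)
        return "updated"
    base_layer[index_key] = dict(record)
    return "inserted"
```

A scheduler would call `is_due` for each institution independently, which is what lets the institutions be crawled on different periods.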
The platform layer has a dedicated data collection engine that obtains medical data from each medical institution, and the base layer has a dedicated storage area where medical data is stored in fixed-format data storage models. Each data storage model has a model number; each model contains multiple data tables identified by table numbers; and each data table has row numbers and column numbers. Each piece of medical data can therefore be uniquely identified by model number + table number + row number + column number.
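The composite identifier above maps naturally onto a fixed-width string key. The Python sketch below assumes zero-padded fields of 4, 4, 10 and 4 digits joined by hyphens; the widths and separator are assumptions, not values from the application. Zero-padding keeps lexicographic order equal to numeric order, which matters in sorted key stores such as HBase, whose row keys are compared byte by byte.

```python
from typing import NamedTuple


class RecordKey(NamedTuple):
    model_no: int
    table_no: int
    row_no: int
    col_no: int

    def encode(self) -> str:
        # Fixed-width, zero-padded fields so string order matches numeric order.
        return f"{self.model_no:04d}-{self.table_no:04d}-{self.row_no:010d}-{self.col_no:04d}"

    @classmethod
    def decode(cls, key: str) -> "RecordKey":
        # Inverse of encode(): split on the separator and parse each field.
        return cls(*(int(part) for part in key.split("-")))
```

For instance, `RecordKey(3, 12, 4567, 2).encode()` yields `"0003-0012-0000004567-0002"`, and decoding that string returns the same key.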
S102. Acquire the data mining request received by the business layer, perform semantic analysis on it, and determine the mining content corresponding to the data mining request.
In this embodiment, the business layer connects directly to client terminals. A data mining request received from a client terminal contains data mining information, which can be derived from the options the user selected (single or multiple choice) and the text entered in text boxes. A semantic analysis model analyzes the data mining information to identify the specific mining content, which is represented by data mining tags. The semantic analysis process is as follows:
(1) Parse the data mining request to obtain the corresponding data mining information, and perform word segmentation on it to obtain multiple mining key points;
(2) Input each mining key point into a preset semantic analysis model for semantic analysis to obtain multiple data mining tags;
(3) Determine the mining content corresponding to the data mining request based on the data mining tags.
In this embodiment, besides user identity authentication information, the data mining request contains data mining information such as "patient behavior analysis for cardio-cerebrovascular disease", "auxiliary clinical decision-making for coronary heart disease" or "early warning for pancreatitis disease control". After word segmentation, "patient behavior analysis for cardio-cerebrovascular disease", for example, is divided into three mining key points: "cardio-cerebrovascular", "patient behavior" and "analysis". The preset semantic analysis model, which includes an expert database, then maps each mining key point to data mining tags with the same substantive meaning; for example, "cardio-cerebrovascular" maps to the "cardiovascular" and "cerebrovascular" data mining tags.
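Steps (1) to (3) can be approximated with a dictionary-driven sketch. The Python code below is not the "preset semantic analysis model": it swaps in forward maximum matching (a common baseline for segmenting Chinese text, applied here to English tokens for readability) and a plain lookup table standing in for the expert database. `LEXICON`, `EXPERT_TAGS`, `segment` and `to_tags` are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical lexicon of mining key points (multi-word phrases allowed).
LEXICON: Set[str] = {"cardio-cerebrovascular", "patient behavior", "analysis", "coronary heart disease"}
STOPWORDS: Set[str] = {"of", "for", "the"}
# Hypothetical expert dictionary: mining key point -> data mining tags.
EXPERT_TAGS: Dict[str, List[str]] = {
    "cardio-cerebrovascular": ["cardiovascular", "cerebrovascular"],
    "patient behavior": ["patient behavior"],
    "analysis": ["analysis"],
}
MAX_PHRASE_WORDS = 3


def segment(text: str) -> List[str]:
    """Forward maximum matching over whitespace tokens; stopwords are dropped."""
    words = text.lower().split()
    points: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest phrase first, falling back to a single word.
        for n in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in LEXICON or n == 1:
                if phrase not in STOPWORDS:
                    points.append(phrase)
                i += n
                break
    return points


def to_tags(points: List[str]) -> List[str]:
    """Map each key point to its expert tags, keeping first-seen order without duplicates."""
    tags: List[str] = []
    for point in points:
        for tag in EXPERT_TAGS.get(point, []):
            if tag not in tags:
                tags.append(tag)
    return tags
```

With these tables, `segment("Patient behavior analysis of cardio-cerebrovascular")` gives `["patient behavior", "analysis", "cardio-cerebrovascular"]`, and `to_tags` expands the last key point into the "cardiovascular" and "cerebrovascular" tags, mirroring the example in the text.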
S103. Acquire, from the preset algorithm library of the platform layer, the model training algorithm matching the mining content, and select business data corresponding to the mining content from the base layer.
In this embodiment, the volume of medical data is huge; for example, each image or report of a single user is measured in megabytes. With so much medical data stored in the base layer, machine learning and data mining algorithms are needed to analyze it automatically and extract valid, novel, potentially useful and understandable information from it, revealing knowledge implicit in large-scale medical data.
First, medical data related to the specified disease and data application is preliminarily selected according to the mining content. Then a suitable model training algorithm is chosen for the mining content and a business training model is built, so that common features related to the disease and the application can be further extracted from the preliminarily selected medical data. For example, for "patient behavior analysis for cardio-cerebrovascular disease", the training content is "patient behavior analysis", for which neighborhood-based algorithms, latent semantic models, graph-based random walk algorithms and the like can be selected.
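The matching and preliminary selection in S103 can be sketched as two lookups. In the Python sketch below, the algorithm library is assumed to be a mapping from training content to candidate algorithm names (the entries echo the examples given in this description), and base-layer records are assumed to carry a `tags` list; `ALGORITHM_LIBRARY`, `match_algorithms` and `select_business_data` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical preset algorithm library keyed by training content.
ALGORITHM_LIBRARY: Dict[str, List[str]] = {
    "patient behavior analysis": ["neighborhood-based", "latent semantic model", "graph-based random walk"],
    "auxiliary clinical decision": ["bayesian discriminant analysis", "decision tree"],
    "disease early warning": ["clustering", "decision tree"],
}


def match_algorithms(training_content: str) -> List[str]:
    """Return candidate training algorithms for the content (empty if none match)."""
    return ALGORITHM_LIBRARY.get(training_content, [])


def select_business_data(base_layer: List[dict], tags: List[str]) -> List[dict]:
    """Preliminary selection: keep records sharing at least one data mining tag."""
    wanted = set(tags)
    return [r for r in base_layer if wanted & set(r.get("tags", []))]
```

Keeping the library as data rather than code means new algorithms can be registered without changing the matching logic, which is one plausible reading of a "preset" library.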
S104. Using the selected business data as training samples, train with the model training algorithm to generate a corresponding business model, deploy it to the functional layer, and provide an external interface for accessing the business model.
In this embodiment, the medical data are used as training samples and are annotated according to the diagnostic content, the feature regions of medical images and the like contained in the data. A corresponding business training model is then generated according to the model training algorithm and trained on the samples to produce the corresponding business model. Finally, the business model is deployed in the functional layer. When a user request arrives, such as a "patient behavior prediction instruction", a "disease early-warning instruction" or a "data mining instruction", the functional layer calls the business model to perform data mining on the corresponding incoming information. The business model is built as follows:
(1) Through the platform layer, use the selected business data as training samples, and annotate the training samples to obtain corresponding annotation files;
(2) Generate a business training model according to the model training algorithm, input the training samples and the annotation files into the business training model, and output mining results;
(3) Calculate a loss value of the business training model based on the mining results, and train the business training model based on the loss value until the loss value is less than a preset loss value; then stop training and output the corresponding business model.
In this embodiment, the training samples are obtained automatically from the base layer and the model training algorithm is obtained automatically from the preset algorithm library. Writing the model training algorithm into a pre-written model framework yields the corresponding business training model, which is then trained with the training samples and annotation files. The loss function of the model also depends on the mining content; for patient behavior analysis, for example, the model can be evaluated with the logistic regression loss function.
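The training loop in steps (1) to (3), combined with the logistic regression loss mentioned above, might look roughly like the following sketch. It assumes a binary logistic regression trained by plain batch gradient descent; the learning rate, the preset loss value of 0.1 and the epoch cap are illustrative choices, not values from the patent. The epoch cap guards against a loss that never falls below the preset value.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(probs, labels, eps=1e-12):
    """Mean binary cross-entropy, i.e. the logistic regression loss."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def train_logistic(samples, labels, loss_threshold=0.1, lr=0.5, max_epochs=10_000):
    """Batch gradient descent that stops once the loss drops below loss_threshold."""
    n_features = len(samples[0])
    w, b = [0.0] * n_features, 0.0
    loss = float("inf")
    for _ in range(max_epochs):
        # step (2): forward pass, the "mining results" expressed as probabilities
        probs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in samples]
        # step (3): compare the loss value with the preset loss value
        loss = log_loss(probs, labels)
        if loss < loss_threshold:
            break
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, p, y in zip(samples, probs, labels):
            err = p - y  # derivative of the log loss with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        m = len(samples)
        w = [wi - lr * gw / m for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b, loss
```

For example, `train_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])` returns weights whose loss is below 0.1 on that toy data.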
In this embodiment of the present application, when business data mining is not being performed, business data can be crawled from multiple institutional databases through the platform layer and updated into the base layer. When business data mining is performed, a data mining request is first obtained through the business layer and semantically analyzed to determine the mining content of the current mining task. The platform layer then, on the one hand, matches the preset algorithm corresponding to the mining content and builds a business training model, and on the other hand selects the business data corresponding to the mining content from the base layer and inputs it as samples into the business training model for training. This builds a business model for data mining, which is deployed to the functional layer on standby. The present application thus achieves intelligent deployment of business models and improves the efficiency of mining massive business data.
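Deployment to the functional layer and the external access interface could, under simple assumptions, be modeled as a registry that routes each request type to a deployed model. The class name `FunctionalLayer`, its methods and the request-type keys are hypothetical; a real deployment would typically sit behind an HTTP or RPC service.

```python
from typing import Any, Callable, Dict


class FunctionalLayer:
    """Hypothetical registry of deployed business models, keyed by request type."""

    def __init__(self) -> None:
        self._models: Dict[str, Callable[[Any], Any]] = {}

    def deploy(self, request_type: str, model: Callable[[Any], Any]) -> None:
        # re-deploying the same request type replaces the previous model version
        self._models[request_type] = model

    def handle(self, request_type: str, payload: Any) -> Any:
        """External interface: route an incoming request to its deployed model."""
        model = self._models.get(request_type)
        if model is None:
            raise KeyError(f"no business model deployed for {request_type!r}")
        return model(payload)


layer = FunctionalLayer()
layer.deploy("disease_warning", lambda age: "high" if age >= 60 else "low")  # toy model
print(layer.handle("disease_warning", 72))  # high
```

Keeping models behind a single `handle` entry point lets a retrained model be swapped in without changing the interface callers use.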
Referring to FIG. 2, a second embodiment of the model deployment method based on big data mining in the embodiments of the present application includes:
S201. At every preset period, crawl business data from multiple institutional databases through the data collection engine, and standardize the business data;
In this embodiment, the platform layer includes a data collection engine that obtains the latest medical data from multiple medical institutions and standardizes it. Standardization mainly includes data cleaning, preprocessing, correction of erroneous values, filling of missing values, discretization of continuous values, removal of outliers, and data normalization.
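Several of the standardization steps listed above (missing-value filling, outlier removal, discretization of continuous values and normalization) are sketched below for a single numeric attribute. The median fill, the 1.5 IQR outlier fence and min-max scaling are common choices assumed here for illustration; the patent does not specify which methods its data collection engine uses, and source-specific cleaning and error correction are left out.

```python
import bisect
import statistics
from typing import List, Optional, Sequence


def standardize(values: Sequence[Optional[float]]) -> List[float]:
    """Fill missing values, drop outliers, then min-max normalize to [0, 1]."""
    present = [v for v in values if v is not None]
    fill = float(statistics.median(present))           # missing-value filling
    filled = [fill if v is None else float(v) for v in values]
    q1, _, q3 = statistics.quantiles(filled, n=4)     # quartiles
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [v for v in filled if low <= v <= high]    # outlier removal
    lo, hi = min(kept), max(kept)
    if hi == lo:
        return [0.0 for _ in kept]
    return [(v - lo) / (hi - lo) for v in kept]       # min-max normalization


def discretize(value: float, edges: Sequence[float]) -> int:
    """Map a continuous value to a bin index given sorted bin edges."""
    return bisect.bisect_right(edges, value)


# ages with one missing entry and one obviously erroneous entry (400)
print(standardize([30, None, 40, 50, 45, 400]))  # [0.0, 0.75, 0.5, 1.0, 0.75]
print(discretize(30, [18, 40, 65]))              # 1
```

Note that outlier removal shortens the list; a real pipeline would drop or flag the whole record rather than a single value.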
S202. Convert the standardized business data into a preset semantic format, and determine semantic features of the converted business data based on the semantic format;
S203. Obtain the document semantic framework of the data storage model in the base layer, and associate the corresponding semantic features according to the document semantic framework;
S204. Store the converted business data in the data storage model based on the associated document semantic framework and semantic features;
In this embodiment, after standardization the medical data must also be converted into a fixed semantic format. Each semantic format has a corresponding document semantic framework, and medical data in the same semantic format are stored in the corresponding framework according to their semantic features, which yields the corresponding data storage model; the data storage model is extensible. Medical data from different institutions carry their own data attributes, namely records of the medical activity such as institution name, patient information, examination information, diagnosis information and treatment information. For example, medical data in electronic medical records are stored as Extensible Markup Language (XML, a semantic format) documents in the format of a Semantic Web-based clinical document framework (a document semantic framework), which preserves the semantic data. The data attributes mentioned above (institution name, patient information, examination information, diagnosis information, treatment information, etc.) are the corresponding semantic features; within the document semantic framework, medical data with different semantic features are stored at the positions corresponding to different tables, table rows and table columns.
Based on these data attributes, medical data can be converted into the fixed semantic format using multi-level tags. The first-level tag is the document semantic framework, determined by the type of medical data, such as image data or text data from electronic medical records. The second-level tag is the data table, determined by disease type, medical institution or patient. The third-level tag is the table column or table row, determined by individual patient information, historical medical record content and the like.
Medical data are stored in the corresponding fixed semantic format. For each type of medical data, the stored data for specific data attributes are looked up, and the stored data of those attributes are used as the semantic features of the medical data. For example, an electronic medical record contains a table storing user information, and the stored values of the user's age and blood type attributes can serve as semantic features of the medical data for user behavior analysis. The semantic features are encoded with a fixed encoding rule and stored at the corresponding position in the corresponding data storage model.
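The three-level tag layout (document semantic framework, data table, table column) and the fixed encoding of semantic features can be illustrated with nested dictionaries. In this sketch `DataStorageModel`, the frame and table names and the blood-type code table are all hypothetical stand-ins for the base layer's actual storage model, which the patent does not detail.

```python
from typing import Any, Dict, List

# Hypothetical fixed encoding rule for one categorical semantic feature.
BLOOD_TYPE_CODES = {"A": 1, "B": 2, "AB": 3, "O": 4}


def encode(attribute: str, value: Any) -> Any:
    """Apply the fixed encoding rule; attributes without a rule are stored as-is."""
    if attribute == "blood_type":
        return BLOOD_TYPE_CODES[value]
    return value


class DataStorageModel:
    """Level 1: document semantic framework; level 2: data table; level 3: column."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Dict[str, List[Any]]]] = {}

    def store_record(self, frame: str, table: str, record: Dict[str, Any]) -> None:
        columns = self._store.setdefault(frame, {}).setdefault(table, {})
        for attribute, value in record.items():
            columns.setdefault(attribute, []).append(encode(attribute, value))

    def column(self, frame: str, table: str, attribute: str) -> List[Any]:
        return self._store.get(frame, {}).get(table, {}).get(attribute, [])

    def semantic_features(self, frame: str, table: str,
                          attributes: List[str]) -> Dict[str, List[Any]]:
        """Look up the stored values of the given attributes as semantic features."""
        return {a: self.column(frame, table, a) for a in attributes}


store = DataStorageModel()
store.store_record("emr_text", "user_info", {"age": 54, "blood_type": "O"})
store.store_record("emr_text", "user_info", {"age": 61, "blood_type": "AB"})
print(store.semantic_features("emr_text", "user_info", ["age", "blood_type"]))
# {'age': [54, 61], 'blood_type': [4, 3]}
```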
The platform layer further includes an algorithm search engine, and acquiring, from the preset algorithm library of the platform layer, the model training algorithm that matches the mining content includes:
S205. Based on the data mining tags, determine the data mining attributes corresponding to the data mining content, and determine the corresponding multi-layer algorithm tags based on the data mining attributes;
S206. Based on the multi-layer algorithm tags, obtain, through the algorithm search engine, a model training algorithm in the preset algorithm library that matches the mining content;
In this embodiment, the platform layer further includes an algorithm search engine, associated with the preset algorithm library, which searches the library for the model training algorithm required by the mining content. A business training model is built from that algorithm and then trained on input medical data to obtain the final business model. Different data mining types (determined from the data mining tags), such as disease early warning, clinical diagnosis and patient behavior analysis, lead to different data mining attributes, and these attributes are arranged into multi-layer algorithm tags that determine the model training algorithm finally used.
For example, for the data mining tag "disease early warning", the data mining attributes "machine learning", "logistic regression", "multi-class classification" and "semi-supervised learning" can be derived. From these attributes, the following four-layer algorithm tags (i.e., multi-layer algorithm tags) are determined:
The first layer is "semi-supervised learning";
The second layer is "machine learning";
The third layer is "logistic regression";
The fourth layer is "multi-class classification";
With these four layers of algorithm tags, the "softmax" algorithm can be found.
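Steps S205 and S206 amount to ordering the data mining attributes into tag layers and then narrowing the algorithm library one layer at a time. The sketch below assumes the library is a flat list of entries, each carrying a four-layer tag path; apart from the "disease early warning" attributes and the softmax entry taken from the example above, the library entries and the layer assignments are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AlgorithmEntry:
    name: str
    tags: Tuple[str, ...]  # one tag per layer, first layer first


# Hypothetical preset algorithm library; only the softmax entry comes from the text.
ALGORITHM_LIBRARY: List[AlgorithmEntry] = [
    AlgorithmEntry("softmax", ("semi-supervised learning", "machine learning",
                               "logistic regression", "multi-class classification")),
    AlgorithmEntry("binary logistic regression", ("supervised learning", "machine learning",
                                                  "logistic regression", "binary classification")),
    AlgorithmEntry("decision tree", ("supervised learning", "machine learning",
                                     "tree model", "multi-class classification")),
]

# S205: data mining tag -> data mining attributes (example taken from the text)
MINING_ATTRIBUTES: Dict[str, List[str]] = {
    "disease early warning": ["machine learning", "logistic regression",
                              "multi-class classification", "semi-supervised learning"],
}

# Hypothetical assignment of each attribute to a tag layer (0 = first layer).
LAYER_OF: Dict[str, int] = {
    "semi-supervised learning": 0, "supervised learning": 0,
    "machine learning": 1,
    "logistic regression": 2, "tree model": 2,
    "multi-class classification": 3, "binary classification": 3,
}


def algorithm_tags(mining_tag: str) -> Tuple[str, ...]:
    """Order the attributes of a data mining tag into multi-layer algorithm tags."""
    return tuple(sorted(MINING_ATTRIBUTES[mining_tag], key=LAYER_OF.__getitem__))


def search_algorithms(tags: Tuple[str, ...]) -> List[str]:
    """S206: keep only library entries whose tag at each layer matches."""
    candidates = ALGORITHM_LIBRARY
    for depth, tag in enumerate(tags):
        candidates = [e for e in candidates if len(e.tags) > depth and e.tags[depth] == tag]
    return [e.name for e in candidates]


print(search_algorithms(algorithm_tags("disease early warning")))  # ['softmax']
```

Filtering layer by layer means an empty result pinpoints the first layer at which no library entry matches, which is a convenient place to fall back to a default algorithm.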
The platform layer further includes a data retrieval engine, and selecting the business data corresponding to the mining content from the base layer includes:
S207. Based on the data mining tags, determine the data mining index values corresponding to the data mining content;
S208. According to the data mining index values, determine, through the data retrieval engine, the storage location of the business data corresponding to the mining content, and obtain the data;
In this embodiment, the platform layer further includes a data retrieval engine that retrieves the corresponding medical data from the base layer according to data mining index values, and each data mining tag can be mapped to corresponding index values. For example, "cardiovascular" and "cerebrovascular" can be mapped to five data mining index values, fields a, b, c, d and e. Using the index values of fields a to e, the data in the corresponding data storage model, data table, table row or table column can be located; the result may be the medical data of one data storage model, or all the data in a particular data table, table row or table column.
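The tag-to-index-to-location lookup in steps S207 and S208 might be sketched as follows. The index fields a to e follow the example above, but which tag maps to which field, the storage locations and the sample records are hypothetical; a location whose column is `None` stands for all the data in that table.

```python
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, str, Optional[str]]  # (data storage model, data table, column or None)

# Hypothetical mappings; the text only says the two tags map to fields a to e.
TAG_TO_INDEX: Dict[str, List[str]] = {
    "cardiovascular": ["a", "b", "c"],
    "cerebrovascular": ["c", "d", "e"],
}
INDEX_TO_LOCATION: Dict[str, Location] = {
    "a": ("emr_model", "cardio_diagnosis", None),
    "b": ("emr_model", "cardio_diagnosis", "diagnosis"),
    "c": ("emr_model", "vascular_exam", "blood_pressure"),
    "d": ("imaging_model", "brain_ct", None),
    "e": ("emr_model", "cerebro_diagnosis", "diagnosis"),
}
# Hypothetical base-layer contents keyed by (storage model, table).
BASE_LAYER: Dict[Tuple[str, str], List[dict]] = {
    ("emr_model", "cardio_diagnosis"): [{"patient": "p1", "diagnosis": "coronary heart disease"}],
    ("emr_model", "vascular_exam"): [{"patient": "p1", "blood_pressure": "150/95"}],
    ("emr_model", "cerebro_diagnosis"): [{"patient": "p2", "diagnosis": "cerebral infarction"}],
    ("imaging_model", "brain_ct"): [{"patient": "p2", "image_id": "ct-001"}],
}


def index_values(tags: List[str]) -> List[str]:
    """S207: collect the index values of all tags, without duplicates, in order."""
    seen: List[str] = []
    for tag in tags:
        for idx in TAG_TO_INDEX.get(tag, []):
            if idx not in seen:
                seen.append(idx)
    return seen


def fetch(location: Location) -> list:
    """S208: read a whole table (column None) or a single column from the base layer."""
    model, table, column = location
    rows = BASE_LAYER.get((model, table), [])
    return list(rows) if column is None else [row[column] for row in rows]


def retrieve(tags: List[str]) -> Dict[str, list]:
    return {idx: fetch(INDEX_TO_LOCATION[idx]) for idx in index_values(tags)}


print(list(retrieve(["cardiovascular", "cerebrovascular"])))  # ['a', 'b', 'c', 'd', 'e']
```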
After the model training algorithm and business data corresponding to the mining content have been selected, the business model can be trained as follows:
S209. Using the selected business data as training samples, train with the model training algorithm to generate the corresponding business model.
In this embodiment of the present application, the data collection engine in the platform layer crawls business data from multiple business institutions and holds it in reserve; the algorithm search engine then selects a suitable algorithm library from multiple preset algorithm libraries to deploy the business training model; and the data retrieval engine selects suitable business data as samples and inputs them into the business training model for training, building the business model required for data mining and achieving intelligent deployment of the business model.
The model deployment method based on big data mining in the embodiments of the present application has been described above; the model deployment apparatus based on big data mining in the embodiments of the present application is described below. Referring to FIG. 3, one embodiment of the model deployment apparatus based on big data mining in the embodiments of the present application includes:
a crawling module 301, configured to crawl, at every preset period, business data from each institutional database through the platform layer, and update the business data into the base layer;
a semantic analysis module 302, configured to obtain the data mining request received by the business layer, perform semantic analysis on the data mining request, and determine the mining content corresponding to the data mining request;
a selection module 303, configured to acquire, from the preset algorithm library of the platform layer, a model training algorithm that matches the mining content, and select business data corresponding to the mining content from the base layer;
a deployment module 304, configured to use the selected business data as training samples, train with the model training algorithm, generate a corresponding business model, deploy it to the functional layer, and provide an external interface for accessing the business model.
In this embodiment of the present application, when business data mining is not being performed, business data can be crawled from multiple institutional databases through the platform layer and updated into the base layer. When business data mining is performed, a data mining request is first obtained through the business layer and semantically analyzed to determine the mining content of the current mining task. The platform layer then, on the one hand, matches the preset algorithm corresponding to the mining content and builds a business training model, and on the other hand selects the business data corresponding to the mining content from the base layer and inputs it as samples into the business training model for training. This builds a business model for data mining, which is deployed to the functional layer on standby. The present application thus achieves intelligent deployment of business models and improves the efficiency of mining massive business data.
Referring to FIG. 4, another embodiment of the model deployment apparatus based on big data mining in the embodiments of the present application includes:
a crawling module 301, configured to crawl, at every preset period, business data from each institutional database through the platform layer, and update the business data into the base layer;
a semantic analysis module 302, configured to obtain the data mining request received by the business layer, perform semantic analysis on the data mining request, and determine the mining content corresponding to the data mining request;
a selection module 303, configured to acquire, from the preset algorithm library of the platform layer, a model training algorithm that matches the mining content, and select business data corresponding to the mining content from the base layer;
a deployment module 304, configured to use the selected business data as training samples, train with the model training algorithm, generate a corresponding business model, deploy it to the functional layer, and provide an external interface for accessing the business model.
Specifically, the platform layer includes a data collection engine, and the crawling module 301 includes:
a data standardization unit 3011, configured to crawl business data from multiple institutional databases through the data collection engine and standardize the business data;
a format conversion unit 3012, configured to convert the standardized business data into a preset semantic format and determine semantic features of the converted business data based on the semantic format;
an association unit 3013, configured to obtain the document semantic framework of the data storage model in the base layer and associate the corresponding semantic features according to the document semantic framework;
a storage unit 3014, configured to store the converted business data in the data storage model based on the associated document semantic framework and semantic features.
Specifically, the semantic analysis module 302 includes:
a word segmentation unit 3021, configured to parse the data mining request to obtain the corresponding data mining information, and perform word segmentation on the data mining information to obtain multiple mining-point tokens;
a semantic analysis unit 3022, configured to input each mining-point token into a preset semantic analysis model for semantic analysis to obtain multiple data mining tags, and determine the mining content corresponding to the data mining request based on the data mining tags.
Specifically, the platform layer further includes an algorithm search engine, the selection module 303 includes an algorithm search unit 3031, and the algorithm search unit 3031 is configured to:
determine, based on the data mining tags, the data mining attributes corresponding to the data mining content, and determine the corresponding multi-layer algorithm tags based on the data mining attributes;
obtain, through the algorithm search engine and based on the multi-layer algorithm tags, a model training algorithm in the preset algorithm library that matches the mining content.
Specifically, the platform layer further includes a data retrieval engine, the selection module 303 further includes a data retrieval unit 3032, and the data retrieval unit 3032 is configured to:
determine, based on the data mining tags, the data mining index values corresponding to the data mining content;
determine, through the data retrieval engine and according to the data mining index values, the storage location of the business data corresponding to the mining content, and obtain the data.
Specifically, the deployment module 304 includes:
an annotation unit 3041, configured to use, through the platform layer, the selected business data as training samples and annotate the training samples to obtain corresponding annotation files;
a training unit 3042, configured to generate a business training model according to the model training algorithm, input the training samples and the annotation files into the business training model, and output mining results; calculate a loss value of the business training model based on the mining results; and train the business training model based on the loss value until the loss value is less than a preset loss value, then stop training and output the corresponding business model.
In this embodiment of the present application, the data collection engine, algorithm search engine and data retrieval engine in the platform layer first crawl business data from multiple business institutions and hold it in reserve; then select a suitable algorithm library from multiple preset algorithm libraries to deploy the business training model; and then select suitable business data as samples and input them into the business training model for training. This builds the business model required for data mining, achieves intelligent deployment of the business model, and improves the efficiency of mining business data.
FIG. 3 and FIG. 4 above describe the model deployment apparatus based on big data mining in the embodiments of the present application in detail from the perspective of modular functional entities; the computer device in the embodiments of the present application is described in detail below from the perspective of hardware processing.
FIG. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device 500 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 510 (for example, one or more processors), a memory 520, and one or more storage media 530 (for example, one or more mass storage devices) that store application programs 533 or data 532. The memory 520 and the storage medium 530 may provide transient or persistent storage. A program stored in the storage medium 530 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the computer device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and execute, on the computer device 500, the series of instruction operations in the storage medium 530.
The computer device 500 may further include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input/output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux and FreeBSD. Those skilled in the art will understand that the structure of the computer device shown in FIG. 5 does not limit the computer device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The present application further provides a computer device, which may be any device capable of executing the steps of the model deployment method based on big data mining in the embodiments above. The computer device includes a memory and a processor; the memory stores computer-readable instructions which, when executed by the processor, cause the processor to execute the steps of the model deployment method based on big data mining in the embodiments above.
The present application further provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium. The computer-readable storage medium stores instructions which, when run on a computer, cause the computer to execute the steps of the model deployment method based on big data mining.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments; details are not repeated here.
If the integrated unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied as a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs and other media capable of storing program code.
The blockchain referred to in the present application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each containing a batch of network transaction information that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer and the like.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features therein, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (20)

  1. A model deployment method based on big data mining, applied to a big data mining platform, wherein the big data mining platform includes, in order from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the model deployment method based on big data mining includes:
    at every preset period, crawling business data from each institutional database through the platform layer, and updating the business data into the base layer;
    acquiring the data mining request received by the business layer, and performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request;
    acquiring, from the preset algorithm library of the platform layer, a model training algorithm that matches the mining content, and selecting business data corresponding to the mining content from the base layer;
    using the selected business data as training samples, training with the model training algorithm to generate a corresponding business model, deploying the business model to the functional layer, and providing an external interface for accessing the business model.
  2. The model deployment method based on big data mining according to claim 1, wherein the platform layer includes a data collection engine, and crawling business data from each institutional database through the platform layer and updating the business data into the base layer includes:
    crawling business data from multiple institutional databases through the data collection engine, and standardizing the business data;
    converting the standardized business data into a preset semantic format, and determining semantic features of the converted business data based on the semantic format;
    obtaining the document semantic framework of the data storage model in the base layer, and associating the corresponding semantic features according to the document semantic framework;
    storing the converted business data in the data storage model based on the associated document semantic framework and semantic features.
  3. The model deployment method based on big data mining according to claim 1, wherein the performing of semantic analysis on the data mining request to determine the mining content corresponding to the data mining request comprises:
    parsing the data mining request to obtain corresponding data mining information, and performing word segmentation on the data mining information to obtain a plurality of mining key-point tokens;
    inputting each mining key-point token into a preset semantic analysis model for semantic analysis to obtain a plurality of data mining tags;
    determining, based on the data mining tags, the mining content corresponding to the data mining request.
  4. The model deployment method based on big data mining according to claim 3, wherein the platform layer further comprises an algorithm search engine, and the acquiring, from the preset algorithm library of the platform layer, of the model training algorithm matching the mining content comprises:
    determining, based on the data mining tags, data mining attributes corresponding to the mining content, and determining corresponding multi-layer algorithm tags based on the data mining attributes;
    obtaining, through the algorithm search engine and based on the multi-layer algorithm tags, a model training algorithm in the preset algorithm library that matches the mining content.
  5. The model deployment method based on big data mining according to claim 3, wherein the platform layer further comprises a data retrieval engine, and the selecting, from the base layer, of business data corresponding to the mining content comprises:
    determining, based on the data mining tags, a data mining index value corresponding to the mining content;
    determining, through the data retrieval engine and according to the data mining index value, the storage location of the business data corresponding to the mining content, and retrieving that business data.
  6. The model deployment method based on big data mining according to any one of claims 1 to 5, wherein the training with the model training algorithm, using the selected business data as training samples, to generate the corresponding business model comprises:
    taking, through the platform layer, the selected business data as training samples, and annotating the training samples to obtain corresponding annotation files;
    generating a business training model according to the model training algorithm, inputting the training samples and the annotation files into the business training model, and outputting mining results;
    calculating a loss value of the business training model based on the mining results, and training the business training model based on the loss value until the loss value is less than a preset loss value, at which point training stops and the corresponding business model is output.
  7. A computer device, comprising a memory and at least one processor, wherein the memory stores instructions, and the memory and the at least one processor are interconnected by wires;
    the at least one processor invokes the instructions in the memory to cause the computer device to perform the following steps of a model deployment method based on big data mining, wherein the big data mining platform comprises, in order from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the steps of the model deployment method based on big data mining comprise:
    crawling, at every preset period, business data from the databases of the respective institutions through the platform layer, and updating the business data into the base layer;
    acquiring a data mining request received by the business layer, and performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request;
    acquiring, from a preset algorithm library of the platform layer, a model training algorithm matching the mining content, and selecting, from the base layer, business data corresponding to the mining content;
    training with the model training algorithm, using the selected business data as training samples, to generate a corresponding business model, deploying the business model to the functional layer, and providing an external interface for accessing the business model.
  8. The computer device according to claim 7, wherein the platform layer comprises a data collection engine, and when the computer device performs the step of crawling business data from the databases of the respective institutions through the platform layer and updating the business data into the base layer, the step comprises:
    crawling business data from a plurality of institutional databases through the data collection engine, and standardizing the business data;
    converting the standardized business data into a preset semantic format, and determining semantic features of the converted business data based on the semantic format;
    obtaining a document semantic framework of a data storage model in the base layer, and associating the corresponding semantic features according to the document semantic framework;
    storing the converted business data in the data storage model based on the associated document semantic framework and semantic features.
  9. The computer device according to claim 7, wherein, when the computer device performs the step of performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request, the step comprises:
    parsing the data mining request to obtain corresponding data mining information, and performing word segmentation on the data mining information to obtain a plurality of mining key-point tokens;
    inputting each mining key-point token into a preset semantic analysis model for semantic analysis to obtain a plurality of data mining tags;
    determining, based on the data mining tags, the mining content corresponding to the data mining request.
  10. The computer device according to claim 9, wherein the platform layer further comprises an algorithm search engine, and when the computer device performs the step of acquiring, from the preset algorithm library of the platform layer, the model training algorithm matching the mining content, the step comprises:
    determining, based on the data mining tags, data mining attributes corresponding to the mining content, and determining corresponding multi-layer algorithm tags based on the data mining attributes;
    obtaining, through the algorithm search engine and based on the multi-layer algorithm tags, a model training algorithm in the preset algorithm library that matches the mining content.
  11. The computer device according to claim 9, wherein the platform layer further comprises a data retrieval engine, and when the computer device performs the step of selecting, from the base layer, the business data corresponding to the mining content, the step comprises:
    determining, based on the data mining tags, a data mining index value corresponding to the mining content;
    determining, through the data retrieval engine and according to the data mining index value, the storage location of the business data corresponding to the mining content, and retrieving that business data.
  12. The computer device according to claims 7 to 11, wherein, when the computer device performs the step of training with the model training algorithm, using the selected business data as training samples, to generate the corresponding business model, the step comprises:
    taking, through the platform layer, the selected business data as training samples, and annotating the training samples to obtain corresponding annotation files;
    generating a business training model according to the model training algorithm, inputting the training samples and the annotation files into the business training model, and outputting mining results;
    calculating a loss value of the business training model based on the mining results, and training the business training model based on the loss value until the loss value is less than a preset loss value, at which point training stops and the corresponding business model is output.
  13. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the following steps of a model deployment method based on big data mining are implemented, wherein the big data mining platform comprises, in order from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the steps of the model deployment method based on big data mining comprise:
    crawling, at every preset period, business data from the databases of the respective institutions through the platform layer, and updating the business data into the base layer;
    acquiring a data mining request received by the business layer, and performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request;
    acquiring, from a preset algorithm library of the platform layer, a model training algorithm matching the mining content, and selecting, from the base layer, business data corresponding to the mining content;
    training with the model training algorithm, using the selected business data as training samples, to generate a corresponding business model, deploying the business model to the functional layer, and providing an external interface for accessing the business model.
  14. The computer-readable storage medium according to claim 13, wherein the platform layer comprises a data collection engine, and when the computer program is executed by the processor to implement the step of crawling business data from the databases of the respective institutions through the platform layer and updating the business data into the base layer, the step comprises:
    crawling business data from a plurality of institutional databases through the data collection engine, and standardizing the business data;
    converting the standardized business data into a preset semantic format, and determining semantic features of the converted business data based on the semantic format;
    obtaining a document semantic framework of a data storage model in the base layer, and associating the corresponding semantic features according to the document semantic framework;
    storing the converted business data in the data storage model based on the associated document semantic framework and semantic features.
  15. The computer-readable storage medium according to claim 13, wherein, when the computer program is executed by the processor to implement the step of performing semantic analysis on the data mining request to determine the mining content corresponding to the data mining request, the step comprises:
    parsing the data mining request to obtain corresponding data mining information, and performing word segmentation on the data mining information to obtain a plurality of mining key-point tokens;
    inputting each mining key-point token into a preset semantic analysis model for semantic analysis to obtain a plurality of data mining tags;
    determining, based on the data mining tags, the mining content corresponding to the data mining request.
  16. The computer-readable storage medium according to claim 15, wherein the platform layer further comprises an algorithm search engine, and when the computer program is executed by the processor to implement the step of acquiring, from the preset algorithm library of the platform layer, the model training algorithm matching the mining content, the step comprises:
    determining, based on the data mining tags, data mining attributes corresponding to the mining content, and determining corresponding multi-layer algorithm tags based on the data mining attributes;
    obtaining, through the algorithm search engine and based on the multi-layer algorithm tags, a model training algorithm in the preset algorithm library that matches the mining content.
  17. The computer-readable storage medium according to claim 15, wherein the platform layer further comprises a data retrieval engine, and when the computer program is executed by the processor to implement the step of selecting, from the base layer, the business data corresponding to the mining content, the step comprises:
    determining, based on the data mining tags, a data mining index value corresponding to the mining content;
    determining, through the data retrieval engine and according to the data mining index value, the storage location of the business data corresponding to the mining content, and retrieving that business data.
  18. The computer-readable storage medium according to claims 13 to 17, wherein, when the computer program is executed by the processor to implement the step of training with the model training algorithm, using the selected business data as training samples, to generate the corresponding business model, the step comprises:
    taking, through the platform layer, the selected business data as training samples, and annotating the training samples to obtain corresponding annotation files;
    generating a business training model according to the model training algorithm, inputting the training samples and the annotation files into the business training model, and outputting mining results;
    calculating a loss value of the business training model based on the mining results, and training the business training model based on the loss value until the loss value is less than a preset loss value, at which point training stops and the corresponding business model is output.
  19. A model deployment apparatus based on big data mining, applied to a big data mining platform, wherein the big data mining platform comprises, in order from top to bottom, a business layer, a functional layer, a platform layer and a base layer, and the model deployment apparatus based on big data mining comprises:
    a crawling module, configured to crawl, at every preset period, business data from the databases of the respective institutions through the platform layer, and to update the business data into the base layer;
    a semantic analysis module, configured to acquire a data mining request received by the business layer, and to perform semantic analysis on the data mining request to determine the mining content corresponding to the data mining request;
    a selection module, configured to acquire, from a preset algorithm library of the platform layer, a model training algorithm matching the mining content, and to select, from the base layer, business data corresponding to the mining content;
    a deployment module, configured to train with the model training algorithm, using the selected business data as training samples, to generate a corresponding business model, to deploy the business model to the functional layer, and to provide an external interface for accessing the business model.
  20. The model deployment apparatus based on big data mining according to claim 19, wherein the platform layer comprises a data collection engine, and the crawling module comprises:
    a data standardization unit, configured to crawl business data from a plurality of institutional databases through the data collection engine, and to standardize the business data;
    a format conversion unit, configured to convert the standardized business data into a preset semantic format, and to determine semantic features of the converted business data based on the semantic format;
    an association unit, configured to obtain a document semantic framework of a data storage model in the base layer, and to associate the corresponding semantic features according to the document semantic framework;
    a storage unit, configured to store the converted business data in the data storage model based on the associated document semantic framework and semantic features.
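The overall flow of claim 1 (periodic crawl into the base layer, algorithm and data selection in the platform layer, deployment to the functional layer behind an external interface) can be illustrated with a minimal in-memory sketch. Everything here is hypothetical: the class `MiningPlatform`, its methods, the `topic` field used to select data, and the idea of keying the algorithm library directly by mining content are assumptions made for illustration, and the semantic-analysis step is assumed to have already produced the mining content. The crawl simply appends records; a real base layer would upsert.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Trainer = Callable[[List[Record]], Callable[[Any], Any]]


@dataclass
class MiningPlatform:
    """Hypothetical in-memory stand-in for the four layers of claim 1."""
    period_s: float  # the "preset period" between crawls
    algorithm_library: Dict[str, Trainer] = field(default_factory=dict)  # platform layer
    base_layer: List[Record] = field(default_factory=list)  # stored business data
    functional_layer: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)
    last_crawl: float = float("-inf")

    def maybe_crawl(self, now: float, sources: List[Callable[[], List[Record]]]) -> bool:
        """Pull every institutional source into the base layer once a period has elapsed."""
        if now - self.last_crawl < self.period_s:
            return False
        for fetch in sources:
            self.base_layer.extend(fetch())  # simplification: append, no de-duplication
        self.last_crawl = now
        return True

    def handle_request(self, mining_content: str) -> str:
        """Pick the matching algorithm and data, train, and deploy to the functional layer."""
        train = self.algorithm_library[mining_content]
        samples = [r for r in self.base_layer if r.get("topic") == mining_content]
        self.functional_layer[mining_content] = train(samples)
        return mining_content  # name under which the model is exposed

    def invoke(self, name: str, features: Any) -> Any:
        """The externally exposed interface to a deployed business model."""
        return self.functional_layer[name](features)
```

For instance, registering a trainer that returns the mean `value` of its samples, crawling one source at `now=0.0`, and calling `handle_request("claims")` deploys a model that `invoke` can then query; a second crawl attempt within the same period returns `False` and leaves the base layer unchanged.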
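The collection path of claims 2, 8, 14 and 20 (standardize, convert to a preset semantic format, derive semantic features, associate them with a document semantic framework, then store) can be sketched as follows. The claims do not define the semantic format or the framework, so this sketch assumes that the format is a list of (subject, predicate, object) triples, that the semantic features are the set of predicates, and that a framework is the set of fields that a document type requires. `FIELD_ALIASES`, `DATE_FORMATS` and `FRAMEWORKS` are hypothetical.

```python
import re
from datetime import datetime
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical aliases for the same field across different institutions' databases.
FIELD_ALIASES = {"cust_name": "name", "customer": "name", "dob": "birth_date", "birthday": "birth_date"}
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%d.%m.%Y")


def standardize(record: Dict[str, str]) -> Dict[str, str]:
    """Normalize field names, map aliases, trim values, and rewrite dates as ISO 8601."""
    out = {}
    for key, value in record.items():
        k = re.sub(r"\W+", "_", key.strip().lower())
        k = FIELD_ALIASES.get(k, k)
        v = value.strip()
        if k.endswith("date"):
            for fmt in DATE_FORMATS:
                try:
                    v = datetime.strptime(v, fmt).date().isoformat()
                    break
                except ValueError:
                    continue  # try the next known date layout
        out[k] = v
    return out


def to_semantic_format(record_id: str, record: Dict[str, str]) -> List[Triple]:
    """Assumed 'semantic format': (subject, predicate, object) triples."""
    return [(record_id, k, v) for k, v in sorted(record.items())]


# Hypothetical document semantic frameworks of the base-layer storage model.
FRAMEWORKS = {"person": {"name", "birth_date"}, "policy": {"policy_no", "start_date"}}


def store(storage: Dict[str, List[Triple]], triples: List[Triple]) -> str:
    """Associate the triples' semantic features (their predicates) with the first
    framework they satisfy, and store them under that document type."""
    features = {p for _, p, _ in triples}
    doc_type = next((t for t, req in FRAMEWORKS.items() if req <= features), "unclassified")
    storage.setdefault(doc_type, []).extend(triples)
    return doc_type
```

Routing unmatched records to an `unclassified` bucket, rather than dropping them, is a design choice of this sketch; the claims do not say what happens to data that fits no framework.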
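The semantic analysis of claims 3, 9 and 15 (segment the request into key-point tokens, tag them, derive the mining content) is sketched below. Because the original requests are Chinese, the sketch swaps in dictionary-based forward maximum matching as the word segmenter, and a small keyword lexicon stands in for the "preset semantic analysis model". The `facet:value` tag convention and the `TAG_LEXICON` entries are hypothetical; a production system would use a trained segmenter and classifier.

```python
from collections import defaultdict
from typing import Dict, List, Set


def fmm_segment(text: str, vocab: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest vocabulary word,
    falling back to a single character when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return [t for t in tokens if not t.isspace()]


# Hypothetical lexicon standing in for the "preset semantic analysis model".
TAG_LEXICON = {
    "分析": "task:analysis",     # analyze
    "客户": "entity:customer",   # customer
    "流失": "target:churn",      # churn / attrition
    "原因": "task:attribution",  # cause / reason
}


def mining_tags(tokens: List[str]) -> List[str]:
    """Map each key-point token to a data mining tag, skipping unknown tokens."""
    return [TAG_LEXICON[t] for t in tokens if t in TAG_LEXICON]


def mining_content(tags: List[str]) -> Dict[str, List[str]]:
    """Group 'facet:value' tags into a structured description of the request."""
    content = defaultdict(list)
    for tag in tags:
        facet, value = tag.split(":", 1)
        content[facet].append(value)
    return dict(content)
```

With this lexicon, the request "分析客户流失原因" ("analyze the causes of customer churn") segments into four tokens, and the resulting mining content groups them as a customer entity, a churn target, and analysis/attribution tasks.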
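Claims 4 and 5 (and their counterparts 10, 11, 16 and 17) use the same data mining tags twice: once to derive mining attributes and multi-layer algorithm tags for the algorithm search engine, and once to derive index values for the data retrieval engine. The sketch below assumes a three-layer tag path (learning paradigm, task type, desired property), a search that falls back to the longest shared prefix, and an inverted index from tags to storage partitions. All mappings, algorithm names and partition paths are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical: which attribute of the mining task each data mining tag implies.
TAG_TO_ATTRIBUTE: Dict[str, str] = {
    "target:churn": "supervised",
    "task:attribution": "interpretable",
    "task:segmentation": "unsupervised",
}

# Hypothetical preset algorithm library keyed by a multi-layer tag path:
# (learning paradigm, task type, desired property).
ALGORITHM_LIBRARY: Dict[Tuple[str, str, str], str] = {
    ("supervised", "classification", "interpretable"): "logistic_regression",
    ("supervised", "classification", "high_capacity"): "gradient_boosted_trees",
    ("unsupervised", "clustering", "any"): "k_means",
}


def algorithm_tags(tags: List[str]) -> Tuple[str, str, str]:
    """Derive mining attributes from tags, then a three-layer algorithm tag path."""
    attrs = {TAG_TO_ATTRIBUTE[t] for t in tags if t in TAG_TO_ATTRIBUTE}
    if "supervised" in attrs:
        prop = "interpretable" if "interpretable" in attrs else "high_capacity"
        return ("supervised", "classification", prop)
    return ("unsupervised", "clustering", "any")


def search_algorithm(path: Tuple[str, ...]) -> str:
    """Stand-in for the algorithm search engine: exact match on all layers first,
    then fall back to the entry sharing the longest prefix."""
    for depth in range(len(path), 0, -1):
        for key, name in ALGORITHM_LIBRARY.items():
            if key[:depth] == path[:depth]:
                return name
    raise LookupError(f"no algorithm for {path}")


# Hypothetical inverted index of the data retrieval engine: index value -> partitions.
DATA_INDEX: Dict[str, Set[str]] = {
    "entity:customer": {"base/customer_profile", "base/customer_events"},
    "target:churn": {"base/customer_events"},
}


def storage_locations(tags: List[str]) -> List[str]:
    """Use each indexed tag as an index value and collect its storage locations."""
    locations: Set[str] = set()
    for index_value in (t for t in tags if t in DATA_INDEX):
        locations |= DATA_INDEX[index_value]
    return sorted(locations)
```

Matching the tag path layer by layer lets a request that only pins down the paradigm (for example, supervised with an unknown task type) still resolve to a reasonable default instead of failing outright.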
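The training step of claims 6, 12 and 18 (annotate the samples, train, compute a loss, and stop once the loss drops below a preset value) is a threshold-based stopping rule. The sketch below assumes that the "annotation file" is a list of (sample index, label) pairs produced by a labeling rule, and it uses batch gradient-descent logistic regression as the model training algorithm. The claims name no specific algorithm, so this is only one possible instance. A `max_epochs` cap, which the claims do not mention, guards against a threshold that is never reached.

```python
import math
from typing import Callable, List, Sequence, Tuple


def annotate(samples: Sequence[dict], rule: Callable[[dict], int]) -> List[Tuple[int, int]]:
    """Produce an 'annotation file' as (sample index, label) pairs."""
    return [(i, rule(s)) for i, s in enumerate(samples)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_until_threshold(X: List[List[float]], y: List[int],
                          loss_threshold: float = 0.1, lr: float = 0.5,
                          max_epochs: int = 10_000):
    """Batch gradient descent on logistic regression. Stops as soon as the mean
    log-loss falls below loss_threshold (the 'preset loss value'), or after
    max_epochs as a safety cap; returns weights, bias and the last loss computed."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    loss = float("inf")
    for _ in range(max_epochs):
        preds = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X]
        loss = -sum(t * math.log(max(p, 1e-12)) + (1 - t) * math.log(max(1 - p, 1e-12))
                    for p, t in zip(preds, y)) / n
        if loss < loss_threshold:
            break  # preset loss value reached: output the current model
        errors = [p - t for p, t in zip(preds, y)]
        for j in range(d):
            w[j] -= lr * sum(e * x[j] for e, x in zip(errors, X)) / n
        b -= lr * sum(errors) / n
    return w, b, loss
```

On a small separable set such as `days_inactive` values 0, 1, 2, 3 labeled 0, 0, 1, 1, the loop stops below the default threshold of 0.1 well within the epoch cap. At that loss level every training sample is classified correctly, because a single misclassified sample alone would contribute more than 0.693 / 4 to the mean.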
PCT/CN2021/083486 2020-12-02 2021-03-29 Big data mining-based model deployment method, apparatus and device, and storage medium WO2022116430A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011386029.XA CN112445845A (en) 2020-12-02 2020-12-02 Model deployment method, device, equipment and storage medium based on big data mining
CN202011386029.X 2020-12-02

Publications (1)

Publication Number Publication Date
WO2022116430A1 (en) 2022-06-09

Family

ID=74740466

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083486 WO2022116430A1 (en) 2020-12-02 2021-03-29 Big data mining-based model deployment method, apparatus and device, and storage medium

Country Status (2)

Country Link
CN (1) CN112445845A (en)
WO (1) WO2022116430A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112445845A (en) * 2020-12-02 2021-03-05 平安科技(深圳)有限公司 Model deployment method, device, equipment and storage medium based on big data mining
CN113420017B (en) * 2021-06-21 2023-10-13 上海特高信息技术有限公司 Block chain application method for acquiring training data set of robot navigation algorithm
CN116151601A (en) * 2021-11-15 2023-05-23 中兴通讯股份有限公司 Stream service modeling method, device, platform, electronic equipment and storage medium
CN114880462A (en) * 2022-02-25 2022-08-09 北京百度网讯科技有限公司 Medical document analysis method, device, equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
US20050102292A1 (en) * 2000-09-28 2005-05-12 Pablo Tamayo Enterprise web mining system and method
CN104699985A (en) * 2015-03-26 2015-06-10 西安电子科技大学 Medical big-data acquisition and analysis system and method
CN111125061A (en) * 2019-12-18 2020-05-08 甘肃省卫生健康统计信息中心(西北人口信息中心) Method for standardizing and promoting health medical big data
CN111709941A (en) * 2020-06-24 2020-09-25 上海迪影科技有限公司 Lightweight automatic deep learning system and method for pathological image
CN112015962A (en) * 2020-07-24 2020-12-01 北京艾巴斯智能科技发展有限公司 Government affair intelligent big data center system architecture
CN112445845A (en) * 2020-12-02 2021-03-05 平安科技(深圳)有限公司 Model deployment method, device, equipment and storage medium based on big data mining

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN115098784A (en) * 2022-07-18 2022-09-23 李圣刚 Data mining method and data mining system
CN115766795A (en) * 2022-11-28 2023-03-07 福州大学 Intelligent service method of trusted electronic file platform based on block chain
CN116483872A (en) * 2023-06-20 2023-07-25 合肥青谷信息科技有限公司 Complex data processing method and device and electronic equipment
CN116483872B (en) * 2023-06-20 2023-09-12 合肥青谷信息科技有限公司 Complex data processing method and device and electronic equipment
CN116842238A (en) * 2023-07-24 2023-10-03 武汉赛思云科技有限公司 Method and system for realizing enterprise data visualization based on big data analysis
CN116842238B (en) * 2023-07-24 2024-03-22 右来了(北京)科技有限公司 Method and system for realizing enterprise data visualization based on big data analysis

Also Published As

Publication number Publication date
CN112445845A (en) 2021-03-05

Similar Documents

Publication Publication Date Title
WO2022116430A1 (en) Big data mining-based model deployment method, apparatus and device, and storage medium
CN110415831B (en) Medical big data cloud service analysis platform
Kumar et al. Big data analytics for healthcare industry: impact, applications, and tools
CN110459287B (en) Structured report data from medical text reports
Meystre et al. Automation of a problem list using natural language processing
Hempelmann et al. An entropy-based evaluation method for knowledge bases of medical information systems
US20200311610A1 (en) Rule-based feature engineering, model creation and hosting
Park et al. Graph databases for large-scale healthcare systems: A framework for efficient data management and data services
CN108962394B (en) Medical data decision support method and system
CN111243748A (en) Needle pushing health data standardization system
Khan et al. Towards development of health data warehouse: Bangladesh perspective
CN109074858A (en) There is no hospital's matching in the health care data library for going identification of obvious standard identifier
CN114003734A (en) Breast cancer risk factor knowledge system model, knowledge map system and construction method
CN110019410A (en) For the big data digging system of tcm clinical case information
Chu et al. Knowledge representation and retrieval using conceptual graphs and free text document self-organisation techniques
WO2021169203A1 (en) Monogenic disease name recommendation method and system based on multi-level structural similarity
Chauhan et al. A robust model for big healthcare data analytics
JP7437386B2 (en) How to categorize medical records
Ahmed et al. Diagnosis recommendation using machine learning scientific workflows
Liu et al. Controlled vocabularies in OODBs: Modeling issues and implementation
Safaei Text-based multi-dimensional medical images retrieval according to the features-usage correlation
CN116383413A (en) Knowledge graph updating method and system based on medical data extraction
Kokkinaki et al. Searching biosignal databases by content and context: Research Oriented Integration System for ECG Signals (ROISES)
Zamora et al. Characterizing chronic disease and polymedication prescription patterns from electronic health records
CN114188036A (en) Operation scheme evaluation method, device and system and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21899474; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21899474; Country of ref document: EP; Kind code of ref document: A1