WO2023029353A1 - Business data processing method and device based on a multimodal hybrid model

Info

Publication number
WO2023029353A1
Authority
WO
WIPO (PCT)
Prior art keywords
modal
underwriting
multimodal
policy
mixed
Prior art date
Application number
PCT/CN2022/071442
Other languages
English (en)
French (fr)
Inventor
谯轶轩
陈浩
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2023029353A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance

Definitions

  • the present application relates to the technical field of artificial intelligence, in particular to a business data processing method and device based on a multi-modal hybrid model.
  • artificial intelligence has been fully popularized in the big data field of medical insurance business.
  • Artificial intelligence can be used to process insurance policy business data to improve the accuracy and efficiency of underwriting.
  • However, an insurance policy contains multi-modal data. Processing each modality with a separate single-modal model leaves the processing of the different modalities unrelated to one another, which lowers the overall accuracy of policy data processing and, in turn, the efficiency of business data processing.
  • the present application provides a business data processing method and device based on a multimodal hybrid model.
  • a business data processing method based on a multimodal hybrid model including:
  • the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters, and its loss function is determined in a mixed-modal manner to complete model training;
  • a business data processing device based on a multimodal hybrid model including:
  • the analysis module is used to analyze the image data and text data in the policy business information
  • a processing module configured to perform multimodal mixed recognition processing on the image data and the text data based on the trained multimodal mixed model to obtain a multimodal mixed processing result.
  • the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters, and its loss function is determined in a mixed-modal manner to complete model training;
  • a classification module configured to perform label classification on the policy business information according to the multi-modal hybrid processing result, and analyze the policy business requirements and underwriting start trigger events of each label classification after the label classification;
  • the starting module is configured to start the underwriting operation on the policy business information when the underwriting start trigger event is detected.
  • a computer-readable storage medium on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor, business data processing based on a multi-modal hybrid model is implemented.
  • the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters, and its loss function is determined in a mixed-modal manner to complete model training;
  • a computer device including a memory, a processor, and computer-readable instructions stored in the memory and operable on the processor, wherein the processor, when executing the computer-readable instructions, implements the business data processing method based on the multi-modal hybrid model, including:
  • the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters, and its loss function is determined in a mixed-modal manner to complete model training;
  • the technical solution provided by the embodiment of the present application has at least the following advantages:
  • the application can make the processing among various modals relevant, improve the overall accuracy of policy data processing, and thus improve the processing efficiency of business data.
  • FIG. 1 shows a flow chart of a business data processing method based on a multimodal hybrid model provided by an embodiment of the present application
  • FIG. 2 shows a schematic diagram of a dual-modal input format provided by an embodiment of the present application
  • Fig. 3 shows the schematic diagram of the single mode data input format H1' provided by the embodiment of the present application
  • Fig. 4 shows the schematic diagram of the single mode data input format H2' provided by the embodiment of the present application
  • FIG. 5 shows a block diagram of a business data processing device based on a multimodal hybrid model provided by an embodiment of the present application
  • FIG. 6 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Artificial intelligence (AI) uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • An embodiment of the present application provides a business data processing method based on a multi-modal hybrid model, as shown in Figure 1. The method is illustrated as applied to a computer device such as a server. The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms, for example an intelligent medical system or a digital medical platform.
  • the above method comprises the following steps:
  • An insurance policy data processing module is embedded in the intelligent medical system to retrieve the policy business directly for processing after medical diagnosis, establishing a complete business chain between medical care and policy underwriting. The current execution terminal may therefore be a separate processing server or a single processing unit embedded in an intelligent medical system, which is not specifically limited in this embodiment of the present application.
  • the image data and text data in the business information of the policy are first analyzed.
  • the policy business information may include scanned image information and entered text data. Therefore, when processing the policy business information, it is necessary to analyze the image data and text data in the policy business information, and respectively obtain the image data and text data that need to be recognized by multimodal fusion.
  • the policy business information can be obtained from the policy information database of the insurance company, wherein the data in the database can be entered by business personnel when handling business for users, and can include image data and text data. For example, business personnel scan or take photos of the orders filled by users and store the pictures in the database. Business personnel can also enter the basic information of users and selected business information into the system to generate electronic insurance policies and store them in the database.
  • The embodiments of this application do not specifically limit this. Because the data in the database is diverse, the image data and text data that require multi-modal fusion recognition must be parsed out before AI-based recognition is performed.
  • The multi-modal hybrid model is constructed by configuring modality replacement objects according to the number of multi-modal input parameters, and its loss function is determined in a mixed-modal manner to complete model training.
  • The multi-modal hybrid model may be an artificial intelligence data processing model used for recognition. The trained model performs multi-modal hybrid recognition on the image data and text data to obtain the multi-modal hybrid processing result.
  • The processing result is the business classification content obtained by multi-modal hybrid recognition of the different image data and text data. It can be used to label and classify policy business information, for example by policy product, user object, signing time, or underwriting period, which is not specifically limited in this embodiment of the present application.
  • The trained multi-modal hybrid model is obtained by configuring modality replacement objects according to the number of multi-modal input parameters and determining the loss function in a mixed-modal manner; that is, the multi-modal hybrid network structure is trained with the modality replacement objects and the single-modal training sample groups to obtain the multi-modal hybrid model.
  • A single-modal model represents information as a numerical vector that a computer can process, or abstracts it further into a higher-level feature vector, whereas a multi-modal model eliminates the redundancy between modalities so as to learn a better feature representation.
  • The policy business information is labeled and classified according to the processing result, and the policy business requirements are analyzed and matched against the label-classified policy business information to identify the underwriting start trigger event.
  • the label classification is used to represent the policy product, time, subject object, etc. corresponding to different policy business information.
  • Policy business requirements may include: underwriting requirements, status tracking requirements, sampling evaluation requirements, etc.
  • the underwriting start trigger event is used to represent the processing operations that the insurance company needs to make based on the policy information, such as calling the underwriting business process, so that the underwriting business can be carried out in an orderly manner.
  • After the multi-modal hybrid processing result of the policy is matched against the policy business requirements in step 103, the next operation is performed if a matching next operation is found.
  • the underwriting status of each underwriting stage in the policy business information can be analyzed to make it execute the underwriting business according to the underwriting process.
  • The method of this embodiment further includes: obtaining a multi-modal training sample set that contains at least two single-modal training sample groups; constructing a multi-modal hybrid network structure and obtaining a modality replacement object that replaces at least one single-modal input parameter in that structure; and training the multi-modal hybrid network structure on the modality replacement object and the single-modal training sample groups to obtain the multi-modal hybrid model.
  • The sum of the numbers of single-modal input parameters is less than or equal to the number of multi-modal input parameters, and the loss function of the multi-modal hybrid model is determined in a mixed-modal manner.
  • the multi-modal training sample set is related data in the parsed policy business information, and includes at least two single-modal training sample groups, such as image data training sample group and text data training sample group.
  • at least one modality replacement object is obtained to replace the single-modal input parameters in the multi-modal hybrid network structure.
  • model training is performed on the multi-modal hybrid network structure based on the acquired modality replacement object and the single-modal training sample group to obtain a multi-modal hybrid model.
  • For example, modality 1 is text data and modality 2 is image data.
  • The dual-modal input format is shown in Figure 2: a [sep] token is appended after each modality to separate them, and the result is concatenated and fed to the model. Because the Transformer input length is fixed (generally 512), the concatenated sequence is truncated when it is longer than 512 and padded with [pad] tokens to 512 when it is shorter.
  • The single-modal input formats H1' and H2' are shown in Figure 3 and Figure 4. With only a single modality, [sep] is first appended after the single-modal features, and [pad] tokens are then added to reach a length of 512.
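  • The dual-modal and single-modal input formats described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, and plain strings stand in for the per-position feature vectors.

```python
SEP, PAD = "[sep]", "[pad]"
MAX_LEN = 512  # fixed Transformer input length stated in the text


def pad_or_truncate(tokens, max_len=MAX_LEN):
    """Truncate to max_len, or right-pad with [pad] up to max_len."""
    tokens = list(tokens)[:max_len]
    return tokens + [PAD] * (max_len - len(tokens))


def build_dual_input(text_feats, image_feats, max_len=MAX_LEN):
    """Dual-modal format: modality 1, [sep], modality 2, [sep], then padding."""
    joined = list(text_feats) + [SEP] + list(image_feats) + [SEP]
    return pad_or_truncate(joined, max_len)


def build_single_input(feats, max_len=MAX_LEN):
    """Single-modal format (H1' or H2'): features, [sep], then padding."""
    return pad_or_truncate(list(feats) + [SEP], max_len)
```

  • Note that plain truncation can cut off a trailing [sep] when the concatenated input exceeds 512 positions; how the original handles that case is not stated.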
  • The data of the two single-modal training sample groups are each fed into the Transformer architecture, and the last-layer vector at the [sep] position is extracted.
  • H1 denotes the feature vector of modality 1 under dual-modal input
  • H2 denotes the feature vector of modality 2 under dual-modal input
  • H1' denotes the feature vector of modality 1 under single-modal input
  • H2' denotes the feature vector of modality 2 under single-modal input.
  • The first loss, L1, is computed from these feature vectors, where \|\cdot\|_2 denotes the 2-norm of a vector.
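  • The text names the loss L1 and the 2-norm but does not give the formula here. The sketch below therefore ASSUMES one plausible form, L1 = ||H1 - H1'||_2 + ||H2 - H2'||_2, which pulls the dual-modal and single-modal representations of each modality together. Only the 2-norm itself comes from the text; the combination and the function names are hypothetical.

```python
import math


def l2_norm(vec):
    """2-norm (Euclidean norm) of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def alignment_loss(h1, h1_single, h2, h2_single):
    """ASSUMED form of L1: ||H1 - H1'||_2 + ||H2 - H2'||_2."""
    d1 = [a - b for a, b in zip(h1, h1_single)]
    d2 = [a - b for a, b in zip(h2, h2_single)]
    return l2_norm(d1) + l2_norm(d2)
```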
  • The model is also trained to classify accurately whether modality 1 and modality 2 come from the same sample. When the training samples are constructed, half of them pair modality 1 and modality 2 from the same sample, and the other half pair modality 1 and modality 2 from different samples; the model must predict category 1 for the former and category 0 for the latter.
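  • The half-matched, half-mismatched sample construction can be sketched as below. The function name, the alternating split, and the fixed seed are assumptions made for illustration; the text only specifies the 50/50 composition and the labels 1 and 0.

```python
import random


def build_matching_pairs(samples, seed=0):
    """samples: list of (modality1, modality2) tuples, one per policy.

    Returns (modality1, modality2, label) triples: label 1 when both parts
    come from the same sample, 0 when modality 2 is taken from another one.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to build mismatched pairs")
    rng = random.Random(seed)
    pairs = []
    for i, (m1, m2) in enumerate(samples):
        if i % 2 == 0:
            pairs.append((m1, m2, 1))  # matched pair, category 1
        else:
            # pick modality 2 from any other sample, category 0
            j = rng.choice([k for k in range(len(samples)) if k != i])
            pairs.append((m1, samples[j][1], 0))
    rng.shuffle(pairs)
    return pairs
```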
  • The specific steps are: b1) concatenate the H1 and H2 vectors output by the Transformer architecture; b2) project them to 2 dimensions through a fully connected layer and apply the softmax operation to obtain the predicted probability of each category, denoted S; b3) compute the cross-entropy loss between the predicted probability vector S and the one-hot encoding Y of the true category, which yields L2: $L_2 = -\sum_{l} Y^{(l)} \log S^{(l)}$
  • where $(l)$ denotes the l-th dimension of the vector
  • One-hot encoding sets the position of the true category in the vector to 1 and all remaining positions to 0.
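  • Steps b1) to b3) can be sketched in plain Python as follows. The helper names and the explicit weight matrix are hypothetical stand-ins for the fully connected layer; a real implementation would use a deep learning framework.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(label, num_classes=2):
    """1 at the true category, 0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def cross_entropy(s, y, eps=1e-12):
    """L2 = -sum_l Y(l) * log S(l); eps guards against log(0)."""
    return -sum(y_l * math.log(s_l + eps) for s_l, y_l in zip(s, y))


def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def matching_loss(h1, h2, weights, bias, label):
    fused = list(h1) + list(h2)                # b1) concatenate H1 and H2
    s = softmax(linear(fused, weights, bias))  # b2) project to 2 dims, softmax
    return cross_entropy(s, one_hot(label))    # b3) cross-entropy against Y
```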
  • Stochastic gradient descent (SGD) and the PyTorch framework are used to build the model, update its parameters, and complete model training.
  • The method of this embodiment further includes: dividing the single-modal training sample groups in the multi-modal training sample set by number of categories; and configuring the replacement weight value of each modality replacement object according to the ratio between the divided category numbers. Obtaining the modality replacement object that replaces at least one single-modal input parameter in the multi-modal hybrid network structure then includes: determining at least one modality replacement object that matches the single-modal input parameter, and selecting from those candidates, according to the replacement weight values, the one modality replacement object that uniquely matches the single-modal input parameter.
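  • One way to read the weighting step is sketched below. It ASSUMES that each replacement weight is the modality's share of the total category count and that the unique match is the candidate with the largest weight; neither rule is spelled out in the text, and both function names are hypothetical.

```python
def replacement_weights(category_counts):
    """category_counts: {modality_name: number_of_categories}.

    Returns weights proportional to the category counts, summing to 1.
    """
    total = sum(category_counts.values())
    if total <= 0:
        raise ValueError("category counts must sum to a positive number")
    return {name: count / total for name, count in category_counts.items()}


def pick_replacement(candidates, weights):
    """Return the single candidate with the largest replacement weight."""
    return max(candidates, key=lambda name: weights.get(name, 0.0))
```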
  • preprocessing is performed on the original training data of modality 1 and modality 2, and corresponding feature information is extracted.
  • For text data, jieba word segmentation is performed first, and pre-trained Word2vec or GloVe embeddings convert each segmented word into a word vector. For example, if segmentation yields m words, the final feature dimension is [m, 300], where 300 is the word vector dimension.
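  • The [m, 300] text feature described above can be sketched without external dependencies as follows. Here `tokenize` is a stand-in for a real segmenter, `word_vectors` is a stand-in for a pre-trained Word2vec or GloVe lookup table, and mapping unknown words to a zero vector is an assumption of this sketch.

```python
EMBED_DIM = 300  # word-vector dimension given in the text


def embed_text(text, word_vectors, tokenize=str.split):
    """Return an [m, 300] feature matrix for the m tokens of `text`.

    tokenize:     callable that splits text into words (whitespace split here)
    word_vectors: dict mapping a word to its 300-dimensional vector
    """
    zero = [0.0] * EMBED_DIM
    return [list(word_vectors.get(tok, zero)) for tok in tokenize(text)]
```

  • With jieba installed, passing `tokenize=jieba.lcut` would segment Chinese text into a list of words in place of the whitespace split.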
  • The method of this embodiment further includes: updating the single-modal training sample groups in the multi-modal training sample set according to a preset time interval and the update status of the policy business, so as to update the replacement weight value of each modality replacement object.
  • The sample data can be updated according to preset rules before the single-modal training sample groups are divided by number of categories, and the replacement weight value of each modality replacement object is then determined from the updated sample data. For example, the sample data is updated at a preset time interval, or whenever the policy business is updated.
  • For example, the sample data is updated every 30 days and the replacement weight value of each modality is updated accordingly, which yields a multi-modal hybrid model that better reflects current conditions and makes the recognition and processing of policy business information more accurate.
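  • The update rule can be sketched as a simple check, assuming a 30-day interval as in the example above. The function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta

UPDATE_INTERVAL = timedelta(days=30)  # interval used in the example above


def needs_update(last_update, now, policy_business_changed=False,
                 interval=UPDATE_INTERVAL):
    """True when the interval has elapsed or the policy business was updated."""
    return policy_business_changed or (now - last_update) >= interval
```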
  • Labeling and classifying the policy business information according to the multi-modal hybrid processing result, and analyzing the policy business requirements and the underwriting start trigger event of each label classification, includes: matching the policy product identifier, time identifier, and subject object identifier marked by the multi-modal hybrid processing result against the underwriting requirements, status tracking requirements, and sampling evaluation requirements in the policy business requirements; and, if they match, configuring the underwriting start trigger event in the policy business information, so that the trigger operation of the event is executed after the underwriting operation, status tracking operation, and sampling evaluation operation are completed.
  • the label classification is used to represent the policy product, time, subject object, etc. corresponding to different policy business information.
  • Policy product identifiers characterize product categories, such as auto insurance or life insurance.
  • The time identifier represents, for example, the signing time or the effective time of the policy.
  • Subject objects represent groups of users.
  • the underwriting requirements are used to represent the user's signing of the order and the insurance company's review stage.
  • the status tracking requirement is used to represent that the policy is within the validity period, and the insurance company can confirm the business status of the policy in real time during the process of underwriting.
  • Sampling evaluation requirements are used to characterize a sampling evaluation survey of policy fulfillment quality.
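  • The matching of labels against business requirements can be sketched as follows. The data class, the rule dictionary, and the event name are all hypothetical; the text only states that the three identifiers are matched against the three requirement types and that a trigger event is configured on a match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyLabels:
    product_id: str   # policy product identifier, e.g. "auto"
    time_id: str      # signing or effective time
    subject_id: str   # user group

REQUIREMENTS = ("underwriting", "status_tracking", "sampling_evaluation")


def match_requirements(labels, rules):
    """rules: {requirement_name: predicate(labels) -> bool}."""
    return [r for r in REQUIREMENTS if r in rules and rules[r](labels)]


def configure_trigger(labels, rules):
    """Return a trigger-event record when any requirement matches, else None."""
    matched = match_requirements(labels, rules)
    if not matched:
        return None
    return {"event": "underwriting_start", "requirements": matched}
```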
  • Starting the underwriting operation on the policy business information includes: calling the underwriting thread and parsing the underwriting parameters in the policy business information; and introducing the underwriting parameters into the underwriting operation execution instructions in the underwriting thread, so that the underwriting business is carried out according to those instructions.
  • the underwriting parameter is used to identify the underwriting status of different underwriting time periods and different underwriting business stages.
  • the insurance company will formulate corresponding underwriting threads to instruct the orderly conduct of insurance business. After the underwriting start operation is triggered, the underwriting thread is called first, and the underwriting parameters in the policy information are introduced into the operation execution instructions in the underwriting thread, so that the underwriting business can be carried out in an orderly manner according to the instructions.
  • The image data and text data in the policy business information can be obtained from the policy information database of the insurance company. The data in the database is entered by business personnel when handling business for users, and includes image data obtained by scanning or photographing the forms that users fill in, as well as typed text data. Such data may be unclear or contain input errors, which affects the accuracy of multi-modal hybrid recognition.
  • The method of this embodiment therefore further includes: screening the image data according to image clarity, image pixel ratio, and image size to determine the image data to be processed by multi-modal hybrid recognition; and screening the text data for abnormal words according to a natural language processing database to determine the text data to be processed by multi-modal hybrid recognition.
  • the data can be screened before performing multi-modal mixed identification processing on the policy business information data.
  • For image data, screening is preferably based on image clarity, image pixel ratio, and image size to determine the image data to be processed.
  • For text data, abnormal words are preferably screened out according to a natural language processing database, for example with an NLP model, to determine the text data to be processed; this is not specifically limited in this embodiment of the present application.
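  • A minimal sketch of the screening step follows. The thresholds, the sharpness score, and the reading of "image pixel ratio" as an aspect ratio are all assumptions; a vocabulary set stands in for the natural language processing database.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    name: str
    sharpness: float  # hypothetical clarity score in [0, 1]
    width: int
    height: int


def screen_images(images, min_sharpness=0.5, min_side=224, max_aspect=4.0):
    """Keep images that are sharp enough, large enough, and not too elongated."""
    kept = []
    for img in images:
        short, long_ = sorted((img.width, img.height))
        aspect = long_ / max(1, short)
        if img.sharpness >= min_sharpness and short >= min_side and aspect <= max_aspect:
            kept.append(img)
    return kept


def screen_text(tokens, vocabulary):
    """Split tokens into (known words, abnormal words not in the vocabulary)."""
    known = [t for t in tokens if t in vocabulary]
    abnormal = [t for t in tokens if t not in vocabulary]
    return known, abnormal
```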
  • the present application provides a business data processing method based on a multi-modal hybrid model.
  • The embodiment of the present application parses the image data and text data in the policy business information; performs multi-modal hybrid recognition on the image data and text data with the trained multi-modal hybrid model to obtain a multi-modal hybrid processing result, where the model is constructed by configuring modality replacement objects according to the number of multi-modal input parameters and its loss function is determined in a mixed-modal manner to complete model training; labels and classifies the policy business information according to that result, and analyzes the policy business requirements and underwriting start trigger event of each label classification; and starts the underwriting operation on the policy business information when the underwriting start trigger event is detected. The processing of the different modalities is thereby related, which improves the overall accuracy of policy data processing and the efficiency of business data processing.
  • an embodiment of the present application provides a business data processing device based on a multimodal hybrid model, as shown in Figure 5, the device includes:
  • Parsing module 21 for parsing image data and text data in policy business information
  • the processing module 22 is used to perform multimodal hybrid recognition on the image data and the text data based on the trained multimodal hybrid model to obtain a multimodal hybrid processing result. The multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multi-modal input parameters, and its loss function is determined in a mixed-modal manner to complete model training;
  • the classification module 23 is configured to perform label classification on the policy business information according to the multi-modal hybrid processing result, and analyze the policy business requirements and underwriting start trigger events of each label classification after the label classification;
  • the starting module 24 is configured to start the underwriting operation on the policy business information when the underwriting start trigger event is detected.
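  • The way the four modules chain together can be sketched as a single function. Every callable passed in is a hypothetical stand-in for the corresponding module; the sketch shows only the control flow, not any module's internals.

```python
def process_policy(policy_info, parse, recognize, classify,
                   trigger_detected, start_underwriting):
    """Chain the four modules; each callable is a hypothetical stand-in."""
    images, texts = parse(policy_info)        # parsing module 21
    result = recognize(images, texts)         # processing module 22
    labels = classify(policy_info, result)    # classification module 23
    if trigger_detected(labels):              # starting module 24
        return start_underwriting(policy_info, labels)
    return None
```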
  • the device before the processing module 22, the device further includes:
  • An acquisition module configured to acquire a multi-modal training sample set, the multi-modal training sample set includes at least two single-modal training sample groups;
  • the training module is used to perform model training on the multi-modal hybrid network structure based on the modal replacement object and the single-modal training sample group to obtain a multi-modal hybrid model, where the loss function of the multi-modal hybrid model is determined in a mixed-modal manner.
  • the device before the training module, the device also includes:
  • a division module configured to divide the number of categories of the single-mode training sample group into the multi-modal training sample set
  • the configuration module is used to configure the replacement weight value of each modal replacement object according to the ratio between the divided category numbers;
  • the construction module includes:
  • a determining unit configured to determine at least one modal replacement object that matches the unimodal input parameter, and select the modal that uniquely matches the unimodal input parameter from the modal replacement objects according to the replacement weight value Replacement object.
  • the device before the division module, the device further includes:
  • the update module is used to update the single-modal training sample group in the multi-modal training sample set according to the preset time interval and policy update status, so as to update the replacement weight value of each modality replacement object.
  • the label classification is used to characterize the policy product, time, and subject object corresponding to different policy business information, and the classification module 23 includes:
  • a matching unit configured to match the policy product identification, time identification, and subject object identification marked with the multimodal mixed processing result with the underwriting requirements, status tracking requirements, and sampling evaluation requirements in the policy business requirements;
  • the trigger unit is configured to configure an underwriting start trigger event after the underwriting operation, status tracking operation, and sampling evaluation operation of the policy business information are completed, so as to execute the trigger operation of the underwriting start trigger event.
  • the startup module 24 includes:
  • the calling unit is used to call the underwriting thread, and analyze the underwriting parameters in the policy business information, and the underwriting parameters are used to identify the underwriting status of different underwriting time periods and different underwriting business stages;
  • An execution unit configured to introduce the underwriting parameter into the underwriting operation execution instruction in the underwriting thread, so as to execute the underwriting business according to the underwriting operation execution instruction.
  • the device further includes:
  • a screening module configured to screen the image data according to image clarity, image pixel ratio, and image size, and determine the image data to be processed for multimodal hybrid recognition
  • the determining module is configured to screen the text data for abnormal words according to the natural language processing database, and determine the text data to be processed for multimodal mixed recognition.
  • the present application provides a business data processing device based on a multimodal hybrid model.
  • The embodiment of the present application parses the image data and text data in the policy business information; performs multi-modal hybrid recognition on the image data and text data with the trained multi-modal hybrid model to obtain a multi-modal hybrid processing result, where the model is constructed by configuring modality replacement objects according to the number of multi-modal input parameters and its loss function is determined in a mixed-modal manner to complete model training; labels and classifies the policy business information according to that result, and analyzes the policy business requirements and underwriting start trigger event of each label classification; and starts the underwriting operation on the policy business information when the underwriting start trigger event is detected. The processing of the different modalities is thereby related, which improves the overall accuracy of policy data processing and the efficiency of business data processing.
  • A computer-readable storage medium stores at least one executable instruction, and the executable instruction can execute the business data processing method based on the multimodal hybrid model in any of the above method embodiments.
  • the computer-readable storage medium may be non-volatile or volatile.
  • FIG. 6 shows a schematic structural diagram of a computer device provided according to an embodiment of the present application.
  • the specific embodiment of the present application does not limit the specific implementation of the computer device.
  • the computer device may include: a processor 302, a communication interface 304, a memory 306, and a communication bus 308.
  • the processor 302 , the communication interface 304 , and the memory 306 communicate with each other through the communication bus 308 .
  • the communication interface 304 is used to communicate with network elements of other devices such as clients or other servers.
  • the processor 302 is configured to execute the program 310, and specifically, may execute relevant steps in the above embodiments of the business data processing method based on the multimodal hybrid model.
  • the program 310 may include program codes including computer operation instructions.
  • the processor 302 may be a central processing unit CPU, or an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits configured to implement the embodiments of the present application.
  • the one or more processors included in the computer device may be of the same type, such as one or more CPUs, or may be different types of processors, such as one or more CPUs and one or more ASICs.
  • the memory 306 is used to store the program 310 .
  • the memory 306 may include a high-speed RAM memory, and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • the program 310 can specifically be used to make the processor 302 perform the following operations:
  • the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters, and the loss function is determined in a mixed-modality manner to complete model training;

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

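The present application provides a business data processing method and apparatus based on a multimodal hybrid model, and relates to the technical field of artificial intelligence. Its main purpose is to address the problem that processing multimodal data with separate models, each built for a single modality, leaves the processing of the individual modalities uncorrelated, which lowers the overall accuracy of policy data processing and the efficiency of business data processing. The method comprises: parsing image data and text data in policy business information; performing multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result; performing label classification on the policy business information according to the multimodal hybrid processing result, and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification; and, when the underwriting start trigger event is detected, starting an underwriting operation on the policy business information.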

Description

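Business data processing method and apparatus based on a multimodal hybrid model
This application claims priority to Chinese patent application No. 202111007560.6, entitled "Business data processing method and apparatus based on a multimodal hybrid model", filed with the China Patent Office on August 30, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a business data processing method and apparatus based on a multimodal hybrid model.
Background
With the rapid development of artificial intelligence, AI has become widespread in the big-data field of medical insurance business. To reduce human error in handling policies, AI can be used to process policy business data so that underwriting is accurate and efficient.
The inventors realized that current data processing for policies and underwriting merely applies a single model to single-modality data to obtain the policy processing result. However, because a policy contains multimodal data, processing it with separate models that each target a single modality leaves the processing of the individual modalities uncorrelated, which degrades the overall accuracy of policy data processing and, in turn, the efficiency of business data processing.
Summary
In view of this, the present application provides a business data processing method and apparatus based on a multimodal hybrid model.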
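According to one aspect of the present application, a business data processing method based on a multimodal hybrid model is provided, comprising:
parsing image data and text data in policy business information;
performing multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, wherein the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters and is trained with a loss function determined in a mixed-modality manner;
performing label classification on the policy business information according to the multimodal hybrid processing result, and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification;
when the underwriting start trigger event is detected, starting an underwriting operation on the policy business information.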
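According to another aspect of the present application, a business data processing apparatus based on a multimodal hybrid model is provided, comprising:
a parsing module, configured to parse image data and text data in policy business information;
a processing module, configured to perform multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, wherein the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters and is trained with a loss function determined in a mixed-modality manner;
a classification module, configured to perform label classification on the policy business information according to the multimodal hybrid processing result, and to parse the policy business requirements and the underwriting start trigger event of each label class after label classification;
a starting module, configured to start an underwriting operation on the policy business information when the underwriting start trigger event is detected.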
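According to yet another aspect of the present application, a computer-readable storage medium is provided, on which computer-readable instructions are stored, wherein the computer-readable instructions, when executed by a processor, implement a business data processing method based on a multimodal hybrid model, comprising:
parsing image data and text data in policy business information;
performing multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, wherein the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters and is trained with a loss function determined in a mixed-modality manner;
performing label classification on the policy business information according to the multimodal hybrid processing result, and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification;
when the underwriting start trigger event is detected, starting an underwriting operation on the policy business information.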
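According to still another aspect of the present application, a computer device is provided, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement a business data processing method based on a multimodal hybrid model, comprising:
parsing image data and text data in policy business information;
performing multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, wherein the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters and is trained with a loss function determined in a mixed-modality manner;
performing label classification on the policy business information according to the multimodal hybrid processing result, and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification;
when the underwriting start trigger event is detected, starting an underwriting operation on the policy business information.
By means of the above technical solutions, the technical solutions provided by the embodiments of the present application have at least the following advantages:
The present application correlates the processing of the individual modalities and improves the overall accuracy of policy data processing, thereby improving the efficiency of business data processing.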
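Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the present application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 shows a flowchart of a business data processing method based on a multimodal hybrid model provided by an embodiment of the present application;
FIG. 2 shows a schematic diagram of the dual-modality input format provided by an embodiment of the present application;
FIG. 3 shows a schematic diagram of the single-modality data input format H1' provided by an embodiment of the present application;
FIG. 4 shows a schematic diagram of the single-modality data input format H2' provided by an embodiment of the present application;
FIG. 5 shows a block diagram of a business data processing apparatus based on a multimodal hybrid model provided by an embodiment of the present application;
FIG. 6 shows a schematic structural diagram of a computer device provided by an embodiment of the present application.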
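Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope fully conveyed to those skilled in the art.
The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
At present, when insurers process policy or underwriting data, they generally use separate models that each handle a single modality. Because real policies mostly contain multimodal data, this tends to leave the processing of the individual modalities uncorrelated, lowering the overall accuracy of policy data processing and the efficiency of business data processing.
To address the above problem, an embodiment of the present application provides a business data processing method based on a multimodal hybrid model, as shown in FIG. 1. The method is described as applied to a computer device such as a server. The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), and big data and AI platforms, for example an intelligent medical system or a digital medical platform. The method comprises the following steps: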
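101. Parse the image data and the text data in the policy business information.
In this embodiment, a policy data processing module is embedded in the intelligent medical system so that the policy business can be retrieved and processed directly after a medical diagnosis, linking medical care and policy underwriting into a complete business chain. The terminal acting as the current execution end may therefore be a standalone processing server or a separate processing unit embedded in the intelligent medical system; this embodiment imposes no specific limitation. Before the business information in a policy is recognized and processed, the image data and text data in the policy business information are first parsed. The policy business information may include scanned image information and keyed-in text data. Processing it therefore requires parsing out, separately, the image data and the text data that are to undergo multimodal fusion recognition.
It should be noted that the policy business information can be obtained from the insurer's policy information database. The data in the database may be entered by business personnel when handling business for a user and may include image data, text data, and so on. For example, business personnel may scan or photograph the order form filled in by the user and store the picture in the database, or they may enter the user's basic information and selected business information into the system to generate an electronic policy and store it in the database. This embodiment imposes no specific limitation. Because the data in the database is diverse, the image data and text data that require multimodal fusion recognition must be parsed out before AI-based recognition is performed.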
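A minimal sketch of step 101, assuming each database entry carries a type tag that tells scanned images apart from keyed-in text. The `PolicyRecord` class, its field names, and the `"image"`/`"text"` tags are hypothetical illustrations, not part of the application.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple, Union


@dataclass
class PolicyRecord:
    """One entry from a (hypothetical) policy information database."""
    policy_id: str
    kind: str                    # assumed tag: "image" or "text"
    payload: Union[bytes, str]   # raw scan bytes or keyed-in text


def parse_policy_info(
    records: Iterable[PolicyRecord],
) -> Tuple[List[bytes], List[str]]:
    """Split policy business information into image data and text data."""
    images: List[bytes] = []
    texts: List[str] = []
    for record in records:
        if record.kind == "image":
            images.append(record.payload)
        elif record.kind == "text":
            texts.append(record.payload)
        # Entries of any other kind are skipped: only image and text
        # data go on to multimodal recognition.
    return images, texts
```

The two returned lists correspond to the image data and text data that the later steps feed to the multimodal hybrid model.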
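102. Perform multimodal hybrid recognition processing on the image data and the text data based on the trained multimodal hybrid model to obtain a multimodal hybrid processing result.
The multimodal hybrid model is obtained by configuring modality replacement objects according to the number of multimodal input parameters and completing model training with a loss function determined in a mixed-modality manner.
In this embodiment, the multimodal hybrid model may be an AI data processing model that performs recognition. The trained model performs multimodal hybrid recognition processing on the image data and text data to obtain the multimodal hybrid processing result. The result is the business classification content obtained from different image data and text data, and serves as the content by which the policy business information is label-classified, for example the policy product, the user, the signing time, and the underwriting period; this embodiment imposes no specific limitation. The trained multimodal hybrid model may be obtained by configuring modality replacement objects according to the number of multimodal input parameters and training with a loss function determined in a mixed-modality manner, that is, by training the multimodal hybrid network structure with the modality replacement objects and the single-modality training sample groups.
It should be noted that a single-modality model represents information as numeric vectors that a computer can process, or further abstracts it into higher-level feature vectors, whereas a multimodal model learns a better feature representation by exploiting the complementarity between modalities and eliminating the redundancy between them.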
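103. Perform label classification on the policy business information according to the multimodal hybrid processing result, and parse the policy business requirements and the underwriting start trigger event of each label class after label classification.
In this embodiment, to improve the efficiency of business data processing, after the multimodal hybrid processing result is obtained, the policy business information is label-classified according to the result, and the policy business requirements are parsed and matched against the label-classified policy business information to determine the underwriting start trigger event. The label classes mark the policy product, time, subject, and so on that correspond to different policy business information. Policy business requirements may include underwriting review requirements, status tracking requirements, and sampling evaluation requirements. The underwriting start trigger event represents the processing operation that the insurer needs to perform based on the policy information, such as invoking the underwriting business process so that underwriting proceeds in an orderly manner.
For example, multimodal hybrid recognition of a policy determines that it was signed in March 2019 with an insurance term of one year; after parsing, the policy is found to have completed underwriting and to require sampling evaluation.
104. When the underwriting start trigger event is detected, start the underwriting operation on the policy business information.
In this embodiment, after step 103 matches the multimodal hybrid processing result of the policy against the policy business requirements, if a next operation is matched, that operation is executed.
Preferably, when the underwriting start trigger event is detected, the underwriting status of each underwriting stage in the policy business information is parsed so that the underwriting business is executed according to the underwriting process.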
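In this embodiment, for further explanation and limitation, before the multimodal hybrid recognition processing is performed on the image data and the text data based on the trained multimodal hybrid model to obtain the multimodal hybrid processing result, the method of this embodiment further comprises: obtaining a multimodal training sample set containing at least two single-modality training sample groups; constructing a multimodal hybrid network structure, and obtaining a modality replacement object that replaces at least one single-modality input parameter in the multimodal hybrid network structure; and training the multimodal hybrid network structure based on the modality replacement object and the single-modality training sample groups to obtain the multimodal hybrid model.
The sum of the numbers of single-modality input parameters is less than or equal to the number of multimodal input parameters, and the loss function of the multimodal hybrid model is determined in a mixed-modality manner.
The multimodal training sample set consists of relevant data from parsed policy business information and contains at least two single-modality training sample groups, such as an image data training sample group and a text data training sample group. Preferably, when the multimodal hybrid network structure is constructed, at least one modality replacement object is obtained to replace a single-modality input parameter in the structure; several may be obtained according to the number of single-modality training sample groups in the set, provided that the sum of the numbers of single-modality input parameters is less than or equal to the number of multimodal input parameters. This embodiment imposes no specific limitation. The multimodal hybrid network structure is then trained based on the obtained modality replacement objects and the single-modality training sample groups to obtain the multimodal hybrid model.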
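As an example, take a dual-modality hybrid model in which modality 1 is text data and modality 2 is image data. First, the dual-modality input format is constructed as shown in FIG. 2: a [sep] token is appended after each modality to separate them, and the two are concatenated and fed to the model. Because the input length of the Transformer is fixed, typically 512, the concatenated sequence is truncated when it is longer than 512 and padded with [pad] tokens up to 512 when it is shorter. Next, the single-modality data input format H1' (FIG. 3) and the single-modality data input format H2' (FIG. 4) are constructed. Because the Transformer input length is fixed, typically 512, in the single-modality case a [sep] is first appended after the single-modality features and [pad] tokens are then added up to 512. The data of the two single-modality training sample groups are fed into the Transformer architecture separately, and the vector at the [sep] position of the last network layer is taken. H1 denotes the feature vector of modality 1 under dual-modality input, H2 the feature vector of modality 2 under dual-modality input, H1' the feature vector of modality 1 under single-modality input, and H2' the feature vector of modality 2 under single-modality input. To make H1 and H1', and H2 and H2', as close as possible, the loss function L1 is defined as $L_1 = \lVert H_1 - H_1' \rVert_2 + \lVert H_2 - H_2' \rVert_2$, where $\lVert \cdot \rVert_2$ denotes the 2-norm of a vector. Preferably, the model also classifies whether modality 1 and modality 2 come from the same sample: when the training samples are constructed, half consist of modality 1 and modality 2 from the same sample and half of modality 1 and modality 2 from different samples, and the model must predict class 1 for the former and class 0 for the latter. The specific steps are: b1) concatenate the H1 and H2 vectors output by the Transformer architecture; b2) convert the result to 2 dimensions through a fully connected layer and apply softmax to obtain the predicted probability of each class, denoted S; b3) compute the cross-entropy loss between the predicted probability vector S and the one-hot encoding Y of the true class to obtain L2:
Figure PCTCN2022071442-appb-000001
where there are K samples, (l) denotes the l-th dimension of a vector, and a one-hot encoding is a representation in which the position of the true class is 1 and all other positions are 0. The final loss function is $L = L_1 + L_2$.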
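The fixed-length input formats of FIGS. 2 to 4 can be sketched as follows. This is an illustration only: the function names are hypothetical, features are represented as plain list items, and truncation is done naively at the end of the sequence because the text does not say how the two modalities share the 512 positions.

```python
from typing import List, Sequence

MAX_LEN = 512            # typical fixed Transformer input length cited in the text
SEP, PAD = "[sep]", "[pad]"


def _fit(seq: List, max_len: int) -> List:
    """Truncate to max_len, then pad with [pad] up to max_len."""
    # Naive tail truncation (an assumption): a real implementation would
    # probably trim each modality so the [sep] markers always survive.
    seq = seq[:max_len]
    return seq + [PAD] * (max_len - len(seq))


def build_dual_input(mod1: Sequence, mod2: Sequence, max_len: int = MAX_LEN) -> List:
    """Dual-modality format: modality 1, [sep], modality 2, [sep], padding."""
    return _fit(list(mod1) + [SEP] + list(mod2) + [SEP], max_len)


def build_single_input(mod: Sequence, max_len: int = MAX_LEN) -> List:
    """Single-modality format (H1' or H2'): features, [sep], padding."""
    return _fit(list(mod) + [SEP], max_len)
```

For instance, `build_dual_input(["t1", "t2"], ["v1"], max_len=8)` yields the two text features, a `[sep]`, the image feature, a second `[sep]`, and three `[pad]` entries.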
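The model is built and its parameters are updated using the stochastic gradient descent (SGD) algorithm and the PyTorch framework to complete model training.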
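The combined loss L = L1 + L2 can be illustrated per sample with a small framework-free sketch. It assumes the standard cross-entropy form for L2, since the formula image is not reproduced in the text, and it leaves out how the K samples are aggregated. The function names are hypothetical, and in practice the same computation would be expressed with PyTorch tensors so that SGD can update the parameters.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """2-norm of the difference between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def softmax(logits: Sequence[float]) -> list:
    """Numerically stable softmax over the two class scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_loss(h1, h1_single, h2, h2_single, logits, label: int) -> float:
    """Per-sample L = L1 + L2.

    L1 pulls the dual-input vectors (H1, H2) towards their single-input
    counterparts (H1', H2'). L2 is the cross-entropy of the "same sample?"
    classifier, assumed here to be the usual -log S(label).
    """
    l1 = l2_distance(h1, h1_single) + l2_distance(h2, h2_single)
    probs = softmax(logits)      # S: predicted probability of each class
    l2 = -math.log(probs[label])  # cross-entropy against one-hot Y
    return l1 + l2
```

Here `logits` stands for the two outputs of the fully connected layer in step b2, and `label` is 1 when both modalities come from the same sample and 0 otherwise.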
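To describe more clearly how the modality replacement object is obtained, in this embodiment, preferably, before the multimodal hybrid network structure is trained based on the modality replacement object and the single-modality training sample groups to obtain the multimodal hybrid model, the method of this embodiment further comprises: dividing the multimodal training sample set by the number of single-modality training sample group categories; and configuring a replacement weight value for each modality replacement object according to the ratio between the divided category counts. Obtaining the modality replacement object that replaces at least one single-modality input parameter in the multimodal hybrid network structure comprises: determining at least one modality replacement object that matches the single-modality input parameter, and selecting from them, according to the replacement weight values, the modality replacement object that uniquely matches that single-modality input parameter.
Specifically, continuing the above example, the original training data of modality 1 and modality 2 is preprocessed to extract the corresponding feature information. For text data, jieba word segmentation is performed first, and pre-trained Word2vec or GloVe is used to convert each segmented word into a word vector; for example, if segmentation yields m words, the final feature dimension is [m, 300], where 300 is the word vector dimension. For image data, optionally, a pre-trained model such as VGG can be used to extract the features of a specified layer (for example fc7); the extracted feature has dimension [7, 7, 2048] and is converted by Python's reshape operation to [49, 2048], which can be understood as 49 extracted features of 2048 dimensions each. Alternatively, a pre-trained model such as Faster RCNN can be used to extract feature information for predefined categories in the picture, with final dimension [n, 2048], where n differs from picture to picture. This embodiment imposes no specific limitation.
Further preferably, before the multimodal training sample set is divided by the number of single-modality training sample group categories, the method of this embodiment further comprises: updating the single-modality training sample groups in the multimodal training sample set according to a preset time interval and the policy business update status, so as to update the replacement weight value of each modality replacement object.
Specifically, to make the training sample data more accurate, the sample data may also be updated according to preset rules before the multimodal training sample set is divided by the number of single-modality training sample group categories, and the replacement weight value of each modality replacement object is then determined from the updated sample data. For example, the sample data may be updated at a preset time interval or according to the policy business update status.
For example, the sample data is updated every 30 days and the replacement weight value of each modality is updated accordingly, yielding a multimodal hybrid model that better matches the current situation and thus more accurate recognition results for policy business information.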
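One possible reading of the replacement weight configuration is sketched below: the weight of each modality is its share of the training samples, and the candidate with the largest weight is chosen as the unique replacement object. Both rules are assumptions made for illustration, since the text only says that the weights follow the ratio between the category counts; the function names are hypothetical.

```python
from typing import Dict, Iterable


def replacement_weights(group_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight per modality, assumed proportional to its sample count."""
    total = sum(group_counts.values())
    if total <= 0:
        raise ValueError("the training sample set is empty")
    return {name: count / total for name, count in group_counts.items()}


def pick_replacement(candidates: Iterable[str], weights: Dict[str, float]) -> str:
    """Select the single candidate with the highest replacement weight.

    Choosing the maximum is an assumption; the source only states that the
    selection is made "according to the replacement weight values".
    """
    return max(candidates, key=lambda name: weights.get(name, 0.0))
```

Re-running `replacement_weights` after each scheduled refresh of the sample set (every 30 days in the example above) would keep the weights aligned with the current data.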
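In this embodiment, further, performing label classification on the policy business information according to the multimodal hybrid processing result, and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification, comprises: matching the policy product identifier, time identifier, and subject identifier marked on the multimodal hybrid processing result against the underwriting review requirement, status tracking requirement, and sampling evaluation requirement among the policy business requirements; and, if they match, configuring the underwriting start trigger event to follow completion of the underwriting review operation, status tracking operation, and sampling evaluation operation on the policy business information, so as to execute the trigger operation of the underwriting start trigger event.
Specifically, multimodal hybrid processing yields the content by which the policy business is label-classified, which may include a policy product identifier, a time identifier, a subject identifier, and so on. These are matched against the underwriting review, status tracking, and sampling evaluation requirements among the policy business requirements; if they match, the underwriting start trigger event is configured in the policy business information to indicate that the corresponding operation is to be executed. The label classes mark the policy product, time, subject, and so on corresponding to different policy business information. The product identifier represents the product category, for example auto insurance or personal insurance. The time identifier represents the policy signing time or the policy effective time. The subject represents the user group. The underwriting review requirement corresponds to the stage in which the user has signed the order and the insurer is reviewing it. The status tracking requirement corresponds to the insurer's real-time confirmation of the policy business status while the policy is within its validity period and being underwritten. The sampling evaluation requirement corresponds to a sampling survey that evaluates the quality of completed policies.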
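The matching of labels to business requirements might look like the sketch below. The decision rule (not yet reviewed, still within the term, term elapsed) is an assumption derived from the definitions above, and every class, field, and string value is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict


@dataclass
class PolicyLabels:
    """Labels produced by the multimodal hybrid recognition (hypothetical)."""
    product: str        # product identifier, e.g. "auto"
    signed_on: date     # time identifier
    term_days: int      # insurance term in days
    subject: str        # subject (user group) identifier
    reviewed: bool      # whether the insurer's review has been completed


def match_requirement(labels: PolicyLabels, today: date) -> str:
    """Map the labels to one of the three policy business requirements."""
    if not labels.reviewed:
        return "underwriting_review"      # signed, insurer still reviewing
    age_days = (today - labels.signed_on).days
    if age_days <= labels.term_days:
        return "status_tracking"          # policy within its validity period
    return "sampling_evaluation"          # term over: evaluate a sample


def configure_trigger(requirement: str) -> Dict[str, str]:
    """Attach the underwriting start trigger to fire after the matched operation."""
    return {"event": "underwriting_start", "fire_after": requirement}
```

With the example from step 103, a reviewed policy signed in March 2019 with a one-year term and inspected in 2021 maps to `"sampling_evaluation"`.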
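In this embodiment, further, starting the underwriting operation on the policy business information comprises: invoking an underwriting thread and parsing the underwriting parameters in the policy business information; and introducing the underwriting parameters into the underwriting operation execution instructions in the underwriting thread, so that the underwriting business is executed according to those instructions.
The underwriting parameters identify the underwriting status of different underwriting time periods and different underwriting business stages.
It should be noted that the insurer formulates a corresponding underwriting thread for each insurance product category to keep the insurance business proceeding in an orderly manner. After the underwriting start operation is triggered, the underwriting thread is first invoked, and the underwriting parameters in the policy business information are introduced into the operation execution instructions in the thread, so that the underwriting business proceeds in order according to the instructions.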
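As a rough illustration, an underwriting thread can be modelled as an ordered list of instruction templates per product category, into which the parsed underwriting parameters are substituted. The template strings, the `UNDERWRITING_THREADS` table, and the parameter names `stage` and `period` are all hypothetical; the application does not specify how the thread or its instructions are represented.

```python
from typing import Dict, List

# Hypothetical per-product underwriting threads: ordered instruction templates.
UNDERWRITING_THREADS: Dict[str, List[str]] = {
    "auto": ["check_status:{stage}", "issue_cover:{period}"],
    "personal": ["check_status:{stage}", "confirm_subject:{period}"],
}


def start_underwriting(product: str, params: Dict[str, str]) -> List[str]:
    """Invoke the thread for the product and inject the underwriting parameters."""
    templates = UNDERWRITING_THREADS[product]
    # Each template becomes a concrete, ordered execution instruction.
    return [template.format(**params) for template in templates]
```

Calling `start_underwriting("auto", {"stage": "review", "period": "2022Q1"})` returns the instructions in the order in which the thread would execute them.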
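In addition, the image data and text data in the policy business information can be obtained from the insurer's policy information database, and the data in the database may be entered by business personnel when handling business for users, including image data obtained by scanning or photographing the order forms users fill in as well as text data. Problems such as unclear images or entry errors may therefore exist and affect the accuracy of multimodal hybrid recognition.
To avoid this problem, in this embodiment, optionally, after the image data and text data in the policy business information are parsed, the method of this embodiment further comprises: screening the image data by image sharpness, image pixel ratio, and image size to determine the image data to undergo multimodal hybrid recognition processing; and screening the text data for abnormal words according to a natural language processing database to determine the text data to undergo multimodal hybrid recognition processing.
Specifically, the data may be screened before multimodal hybrid recognition is performed on the policy business information. Image data is preferably screened by image sharpness, image pixel ratio, and image size to determine the image data to be processed; text data is preferably screened for abnormal words according to a natural language processing database, for example with an NLP model, to determine the text data to be processed. This embodiment imposes no specific limitation.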
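The screening step could be sketched as below. All thresholds are placeholder assumptions, "pixel ratio" is interpreted here as the width-to-height ratio, the sharpness score is assumed to be precomputed by some image-quality measure, and the vocabulary set stands in for the natural language processing database.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class ImageMeta:
    """Precomputed properties of one scanned image (hypothetical fields)."""
    name: str
    sharpness: float   # any score where a higher value means a sharper image
    width: int
    height: int
    size_bytes: int


def keep_image(
    img: ImageMeta,
    min_sharpness: float = 100.0,
    min_ratio: float = 0.5,
    max_ratio: float = 2.0,
    max_bytes: int = 10 * 1024 * 1024,
) -> bool:
    """Return True if the image passes the sharpness, ratio and size checks."""
    if img.height <= 0 or img.width <= 0:
        return False
    ratio = img.width / img.height
    return (
        img.sharpness >= min_sharpness
        and min_ratio <= ratio <= max_ratio
        and img.size_bytes <= max_bytes
    )


def filter_text(tokens: Iterable[str], vocabulary: Set[str]) -> List[str]:
    """Drop abnormal words, i.e. tokens absent from the reference vocabulary."""
    return [token for token in tokens if token in vocabulary]
```

Only the images for which `keep_image` returns `True`, and the tokens that survive `filter_text`, would be passed on to the multimodal hybrid recognition.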
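The present application provides a business data processing method based on a multimodal hybrid model. Compared with the prior art, the embodiments of the present application parse the image data and the text data in the policy business information; perform multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, the multimodal hybrid model being constructed by configuring modality replacement objects according to the number of multimodal input parameters and trained with a loss function determined in a mixed-modality manner; perform label classification on the policy business information according to the multimodal hybrid processing result, and parse the policy business requirements and the underwriting start trigger event of each label class after label classification; and, when the underwriting start trigger event is detected, start the underwriting operation on the policy business information. The processing of the individual modalities is thereby correlated, which improves the overall accuracy of policy data processing and, in turn, the efficiency of business data processing.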
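Further, as an implementation of the method shown in FIG. 1, an embodiment of the present application provides a business data processing apparatus based on a multimodal hybrid model. As shown in FIG. 5, the apparatus comprises:
a parsing module 21, a processing module 22, a classification module 23, and a starting module 24.
The parsing module 21 is configured to parse image data and text data in policy business information;
the processing module 22 is configured to perform multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, wherein the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters and is trained with a loss function determined in a mixed-modality manner;
the classification module 23 is configured to perform label classification on the policy business information according to the multimodal hybrid processing result, and to parse the policy business requirements and the underwriting start trigger event of each label class after label classification;
the starting module 24 is configured to start an underwriting operation on the policy business information when the underwriting start trigger event is detected.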
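In a specific application scenario, before the processing module 22, the apparatus further comprises:
an obtaining module, configured to obtain a multimodal training sample set containing at least two single-modality training sample groups;
a construction module, configured to construct a multimodal hybrid network structure and obtain a modality replacement object that replaces at least one single-modality input parameter in the multimodal hybrid network structure, wherein the sum of the numbers of single-modality input parameters is less than or equal to the number of multimodal input parameters;
a training module, configured to train the multimodal hybrid network structure based on the modality replacement object and the single-modality training sample groups to obtain the multimodal hybrid model, wherein the loss function of the multimodal hybrid model is determined in a mixed-modality manner.
In a specific application scenario, before the training module, the apparatus further comprises:
a division module, configured to divide the multimodal training sample set by the number of single-modality training sample group categories;
a configuration module, configured to configure a replacement weight value for each modality replacement object according to the ratio between the divided category counts.
In a specific application scenario, the construction module comprises:
a determining unit, configured to determine at least one modality replacement object that matches the single-modality input parameter, and to select from them, according to the replacement weight values, the modality replacement object that uniquely matches the single-modality input parameter.
In a specific application scenario, before the division module, the apparatus further comprises:
an updating module, configured to update the single-modality training sample groups in the multimodal training sample set according to a preset time interval and the policy business update status, so as to update the replacement weight value of each modality replacement object.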
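In a specific application scenario, the label classes mark the policy product, time, and subject corresponding to different policy business information, and the classification module 23 comprises:
a matching unit, configured to match the policy product identifier, time identifier, and subject identifier marked on the multimodal hybrid processing result against the underwriting review requirement, status tracking requirement, and sampling evaluation requirement among the policy business requirements;
a triggering unit, configured to, if they match, configure the underwriting start trigger event to follow completion of the underwriting review operation, status tracking operation, and sampling evaluation operation on the policy business information, so as to execute the trigger operation of the underwriting start trigger event.
In a specific application scenario, the starting module 24 comprises:
an invoking unit, configured to invoke an underwriting thread and parse the underwriting parameters in the policy business information, wherein the underwriting parameters identify the underwriting status of different underwriting time periods and different underwriting business stages;
an execution unit, configured to introduce the underwriting parameters into the underwriting operation execution instructions in the underwriting thread, so that the underwriting business is executed according to the underwriting operation execution instructions.
In a specific application scenario, after the parsing module 21, the apparatus further comprises:
a screening module, configured to screen the image data by image sharpness, image pixel ratio, and image size to determine the image data to undergo multimodal hybrid recognition processing;
a determining module, configured to screen the text data for abnormal words according to a natural language processing database to determine the text data to undergo multimodal hybrid recognition processing.
The present application provides a business data processing apparatus based on a multimodal hybrid model. Compared with the prior art, the embodiments of the present application parse the image data and the text data in the policy business information; perform multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, the multimodal hybrid model being constructed by configuring modality replacement objects according to the number of multimodal input parameters and trained with a loss function determined in a mixed-modality manner; perform label classification on the policy business information according to the multimodal hybrid processing result, and parse the policy business requirements and the underwriting start trigger event of each label class after label classification; and, when the underwriting start trigger event is detected, start the underwriting operation on the policy business information. The processing of the individual modalities is thereby correlated, which improves the overall accuracy of policy data processing and, in turn, the efficiency of business data processing.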
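According to an embodiment of the present application, a computer-readable storage medium is provided. The storage medium stores at least one executable instruction, and the computer-executable instruction can execute the business data processing method based on the multimodal hybrid model in any of the above method embodiments. The computer-readable storage medium may be non-volatile or volatile.
FIG. 6 shows a schematic structural diagram of a computer device provided according to an embodiment of the present application; the specific embodiments of the present application do not limit the specific implementation of the computer device.
As shown in FIG. 6, the computer device may include a processor 302, a communications interface 304, a memory 306, and a communication bus 308.
The processor 302, the communications interface 304, and the memory 306 communicate with one another through the communication bus 308.
The communications interface 304 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 302 is used to execute the program 310, and may specifically execute the relevant steps in the above embodiments of the business data processing method based on the multimodal hybrid model.
Specifically, the program 310 may include program code, and the program code includes computer operation instructions.
The processor 302 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 306 is used to store the program 310. The memory 306 may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory.
The program 310 may specifically be used to cause the processor 302 to perform the following operations:
parsing image data and text data in policy business information;
performing multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, wherein the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters and is trained with a loss function determined in a mixed-modality manner;
performing label classification on the policy business information according to the multimodal hybrid processing result, and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification;
when the underwriting start trigger event is detected, starting an underwriting operation on the policy business information.
The above are only preferred embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.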

Claims (22)

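  1. A business data processing method based on a multimodal hybrid model, comprising:
    parsing image data and text data in policy business information;
    performing multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, wherein the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters and is trained with a loss function determined in a mixed-modality manner;
    performing label classification on the policy business information according to the multimodal hybrid processing result, and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification;
    when the underwriting start trigger event is detected, starting an underwriting operation on the policy business information.
  2. The method according to claim 1, wherein, before performing multimodal hybrid recognition processing on the image data and the text data based on the trained multimodal hybrid model to obtain the multimodal hybrid processing result, the method further comprises:
    obtaining a multimodal training sample set, the multimodal training sample set containing at least two single-modality training sample groups;
    constructing a multimodal hybrid network structure, and obtaining a modality replacement object that replaces at least one single-modality input parameter in the multimodal hybrid network structure, wherein the sum of the numbers of single-modality input parameters is less than or equal to the number of multimodal input parameters;
    training the multimodal hybrid network structure based on the modality replacement object and the single-modality training sample groups to obtain the multimodal hybrid model, wherein the loss function of the multimodal hybrid model is determined in a mixed-modality manner.
  3. The method according to claim 2, wherein, before training the multimodal hybrid network structure based on the modality replacement object and the single-modality training sample groups to obtain the multimodal hybrid model, the method further comprises:
    dividing the multimodal training sample set by the number of single-modality training sample group categories;
    configuring a replacement weight value for each modality replacement object according to the ratio between the divided category counts;
    wherein obtaining the modality replacement object that replaces at least one single-modality input parameter in the multimodal hybrid network structure comprises:
    determining at least one modality replacement object that matches the single-modality input parameter, and selecting from them, according to the replacement weight values, the modality replacement object that uniquely matches the single-modality input parameter.
  4. The method according to claim 3, wherein, before dividing the multimodal training sample set by the number of single-modality training sample group categories, the method further comprises:
    updating the single-modality training sample groups in the multimodal training sample set according to a preset time interval and the policy business update status, so as to update the replacement weight value of each modality replacement object.
  5. The method according to claim 1, wherein the label classes mark the policy product, time, and subject corresponding to different policy business information, and performing label classification on the policy business information according to the multimodal hybrid processing result and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification comprises:
    matching the policy product identifier, time identifier, and subject identifier marked on the multimodal hybrid processing result against the underwriting review requirement, status tracking requirement, and sampling evaluation requirement among the policy business requirements;
    if they match, configuring the underwriting start trigger event to follow completion of the underwriting review operation, status tracking operation, and sampling evaluation operation on the policy business information, so as to execute the trigger operation of the underwriting start trigger event.
  6. The method according to claim 5, wherein starting the underwriting operation on the policy business information comprises:
    invoking an underwriting thread and parsing the underwriting parameters in the policy business information, wherein the underwriting parameters identify the underwriting status of different underwriting time periods and different underwriting business stages;
    introducing the underwriting parameters into the underwriting operation execution instructions in the underwriting thread, so that the underwriting business is executed according to the underwriting operation execution instructions.
  7. The method according to any one of claims 1 to 6, wherein, after parsing the image data and the text data in the policy business information, the method further comprises:
    screening the image data by image sharpness, image pixel ratio, and image size to determine the image data to undergo multimodal hybrid recognition processing;
    screening the text data for abnormal words according to a natural language processing database to determine the text data to undergo multimodal hybrid recognition processing.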
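  8. A business data processing apparatus based on a multimodal hybrid model, comprising:
    a parsing module, configured to parse image data and text data in policy business information;
    a processing module, configured to perform multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, wherein the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters and is trained with a loss function determined in a mixed-modality manner;
    a classification module, configured to perform label classification on the policy business information according to the multimodal hybrid processing result, and to parse the policy business requirements and the underwriting start trigger event of each label class after label classification;
    a starting module, configured to start an underwriting operation on the policy business information when the underwriting start trigger event is detected.
  9. A computer-readable storage medium on which computer-readable instructions are stored, wherein the computer-readable instructions, when executed by a processor, implement a business data processing method based on a multimodal hybrid model, comprising:
    parsing image data and text data in policy business information;
    performing multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, wherein the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters and is trained with a loss function determined in a mixed-modality manner;
    performing label classification on the policy business information according to the multimodal hybrid processing result, and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification;
    when the underwriting start trigger event is detected, starting an underwriting operation on the policy business information.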
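  10. The computer-readable storage medium according to claim 9, wherein, when the computer-readable instructions are executed by the processor, before performing multimodal hybrid recognition processing on the image data and the text data based on the trained multimodal hybrid model to obtain the multimodal hybrid processing result, the method further comprises:
    obtaining a multimodal training sample set, the multimodal training sample set containing at least two single-modality training sample groups;
    constructing a multimodal hybrid network structure, and obtaining a modality replacement object that replaces at least one single-modality input parameter in the multimodal hybrid network structure, wherein the sum of the numbers of single-modality input parameters is less than or equal to the number of multimodal input parameters;
    training the multimodal hybrid network structure based on the modality replacement object and the single-modality training sample groups to obtain the multimodal hybrid model, wherein the loss function of the multimodal hybrid model is determined in a mixed-modality manner.
  11. The computer-readable storage medium according to claim 10, wherein, when the computer-readable instructions are executed by the processor, before training the multimodal hybrid network structure based on the modality replacement object and the single-modality training sample groups to obtain the multimodal hybrid model, the method further comprises:
    dividing the multimodal training sample set by the number of single-modality training sample group categories;
    configuring a replacement weight value for each modality replacement object according to the ratio between the divided category counts;
    wherein obtaining the modality replacement object that replaces at least one single-modality input parameter in the multimodal hybrid network structure comprises:
    determining at least one modality replacement object that matches the single-modality input parameter, and selecting from them, according to the replacement weight values, the modality replacement object that uniquely matches the single-modality input parameter.
  12. The computer-readable storage medium according to claim 11, wherein, when the computer-readable instructions are executed by the processor, before dividing the multimodal training sample set by the number of single-modality training sample group categories, the method further comprises:
    updating the single-modality training sample groups in the multimodal training sample set according to a preset time interval and the policy business update status, so as to update the replacement weight value of each modality replacement object.
  13. The computer-readable storage medium according to claim 9, wherein, when the computer-readable instructions are executed by the processor, the label classes mark the policy product, time, and subject corresponding to different policy business information, and performing label classification on the policy business information according to the multimodal hybrid processing result and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification comprises:
    matching the policy product identifier, time identifier, and subject identifier marked on the multimodal hybrid processing result against the underwriting review requirement, status tracking requirement, and sampling evaluation requirement among the policy business requirements;
    if they match, configuring the underwriting start trigger event to follow completion of the underwriting review operation, status tracking operation, and sampling evaluation operation on the policy business information, so as to execute the trigger operation of the underwriting start trigger event.
  14. The computer-readable storage medium according to claim 13, wherein, when the computer-readable instructions are executed by the processor, starting the underwriting operation on the policy business information comprises:
    invoking an underwriting thread and parsing the underwriting parameters in the policy business information, wherein the underwriting parameters identify the underwriting status of different underwriting time periods and different underwriting business stages;
    introducing the underwriting parameters into the underwriting operation execution instructions in the underwriting thread, so that the underwriting business is executed according to the underwriting operation execution instructions.
  15. The computer-readable storage medium according to any one of claims 9 to 14, wherein, when the computer-readable instructions are executed by the processor, after parsing the image data and the text data in the policy business information, the method further comprises:
    screening the image data by image sharpness, image pixel ratio, and image size to determine the image data to undergo multimodal hybrid recognition processing;
    screening the text data for abnormal words according to a natural language processing database to determine the text data to undergo multimodal hybrid recognition processing.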
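  16. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the computer-readable instructions, when executed by the processor, implement a business data processing method based on a multimodal hybrid model, comprising:
    parsing image data and text data in policy business information;
    performing multimodal hybrid recognition processing on the image data and the text data based on a trained multimodal hybrid model to obtain a multimodal hybrid processing result, wherein the multimodal hybrid model is constructed by configuring modality replacement objects according to the number of multimodal input parameters and is trained with a loss function determined in a mixed-modality manner;
    performing label classification on the policy business information according to the multimodal hybrid processing result, and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification;
    when the underwriting start trigger event is detected, starting an underwriting operation on the policy business information.
  17. The computer device according to claim 16, wherein, when the computer-readable instructions are executed by the processor, before performing multimodal hybrid recognition processing on the image data and the text data based on the trained multimodal hybrid model to obtain the multimodal hybrid processing result, the method further comprises:
    obtaining a multimodal training sample set, the multimodal training sample set containing at least two single-modality training sample groups;
    constructing a multimodal hybrid network structure, and obtaining a modality replacement object that replaces at least one single-modality input parameter in the multimodal hybrid network structure, wherein the sum of the numbers of single-modality input parameters is less than or equal to the number of multimodal input parameters;
    training the multimodal hybrid network structure based on the modality replacement object and the single-modality training sample groups to obtain the multimodal hybrid model, wherein the loss function of the multimodal hybrid model is determined in a mixed-modality manner.
  18. The computer device according to claim 17, wherein, when the computer-readable instructions are executed by the processor, before training the multimodal hybrid network structure based on the modality replacement object and the single-modality training sample groups to obtain the multimodal hybrid model, the method further comprises:
    dividing the multimodal training sample set by the number of single-modality training sample group categories;
    configuring a replacement weight value for each modality replacement object according to the ratio between the divided category counts;
    wherein obtaining the modality replacement object that replaces at least one single-modality input parameter in the multimodal hybrid network structure comprises:
    determining at least one modality replacement object that matches the single-modality input parameter, and selecting from them, according to the replacement weight values, the modality replacement object that uniquely matches the single-modality input parameter.
  19. The computer device according to claim 18, wherein, when the computer-readable instructions are executed by the processor, before dividing the multimodal training sample set by the number of single-modality training sample group categories, the method further comprises:
    updating the single-modality training sample groups in the multimodal training sample set according to a preset time interval and the policy business update status, so as to update the replacement weight value of each modality replacement object.
  20. The computer device according to claim 16, wherein, when the computer-readable instructions are executed by the processor, the label classes mark the policy product, time, and subject corresponding to different policy business information, and performing label classification on the policy business information according to the multimodal hybrid processing result and parsing the policy business requirements and the underwriting start trigger event of each label class after label classification comprises:
    matching the policy product identifier, time identifier, and subject identifier marked on the multimodal hybrid processing result against the underwriting review requirement, status tracking requirement, and sampling evaluation requirement among the policy business requirements;
    if they match, configuring the underwriting start trigger event to follow completion of the underwriting review operation, status tracking operation, and sampling evaluation operation on the policy business information, so as to execute the trigger operation of the underwriting start trigger event.
  21. The computer device according to claim 20, wherein, when the computer-readable instructions are executed by the processor, starting the underwriting operation on the policy business information comprises:
    invoking an underwriting thread and parsing the underwriting parameters in the policy business information, wherein the underwriting parameters identify the underwriting status of different underwriting time periods and different underwriting business stages;
    introducing the underwriting parameters into the underwriting operation execution instructions in the underwriting thread, so that the underwriting business is executed according to the underwriting operation execution instructions.
  22. The computer device according to any one of claims 16 to 21, wherein, when the computer-readable instructions are executed by the processor, after parsing the image data and the text data in the policy business information, the method further comprises:
    screening the image data by image sharpness, image pixel ratio, and image size to determine the image data to undergo multimodal hybrid recognition processing;
    screening the text data for abnormal words according to a natural language processing database to determine the text data to undergo multimodal hybrid recognition processing.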
PCT/CN2022/071442 2021-08-30 2022-01-11 基于多模态混合模型的业务数据处理方法及装置 WO2023029353A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111007560.6 2021-08-30
CN202111007560.6A CN113723288B (zh) 2021-08-30 2021-08-30 基于多模态混合模型的业务数据处理方法及装置

Publications (1)

Publication Number Publication Date
WO2023029353A1 true WO2023029353A1 (zh) 2023-03-09

Family

ID=78679329

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071442 WO2023029353A1 (zh) 2021-08-30 2022-01-11 基于多模态混合模型的业务数据处理方法及装置

Country Status (2)

Country Link
CN (1) CN113723288B (zh)
WO (1) WO2023029353A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308960A (zh) * 2023-03-27 2023-06-23 杭州绿城信息技术有限公司 基于数据分析的智慧园区物业防控管理系统及其实现方法
CN117079299A (zh) * 2023-10-12 2023-11-17 腾讯科技(深圳)有限公司 数据处理方法、装置、电子设备及存储介质
CN117097797A (zh) * 2023-10-19 2023-11-21 浪潮电子信息产业股份有限公司 云边端协同方法、装置、系统、电子设备及可读存储介质
CN117349027A (zh) * 2023-12-04 2024-01-05 环球数科集团有限公司 一种降低算力需求的多模态大模型构建系统和方法

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723288B (zh) * 2021-08-30 2023-10-27 平安科技(深圳)有限公司 基于多模态混合模型的业务数据处理方法及装置

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507850A (zh) * 2020-03-25 2020-08-07 上海商汤智能科技有限公司 核保方法及相关装置、设备
US20210201266A1 (en) * 2019-12-31 2021-07-01 DataInfoCom USA, Inc. Systems and methods for processing claims
CN113723288A (zh) * 2021-08-30 2021-11-30 平安科技(深圳)有限公司 基于多模态混合模型的业务数据处理方法及装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3754548A1 (en) * 2019-06-17 2020-12-23 Sap Se A method for recognizing an object in an image using features vectors of an encoding neural network
CN112418302A (zh) * 2020-11-20 2021-02-26 清华大学 一种任务预测方法及装置

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116308960A (zh) * 2023-03-27 2023-06-23 杭州绿城信息技术有限公司 基于数据分析的智慧园区物业防控管理系统及其实现方法
CN116308960B (zh) * 2023-03-27 2023-11-21 杭州绿城信息技术有限公司 基于数据分析的智慧园区物业防控管理系统及其实现方法
CN117079299A (zh) * 2023-10-12 2023-11-17 腾讯科技(深圳)有限公司 数据处理方法、装置、电子设备及存储介质
CN117079299B (zh) * 2023-10-12 2024-01-09 腾讯科技(深圳)有限公司 数据处理方法、装置、电子设备及存储介质
CN117097797A (zh) * 2023-10-19 2023-11-21 浪潮电子信息产业股份有限公司 云边端协同方法、装置、系统、电子设备及可读存储介质
CN117097797B (zh) * 2023-10-19 2024-02-09 浪潮电子信息产业股份有限公司 云边端协同方法、装置、系统、电子设备及可读存储介质
CN117349027A (zh) * 2023-12-04 2024-01-05 环球数科集团有限公司 一种降低算力需求的多模态大模型构建系统和方法
CN117349027B (zh) * 2023-12-04 2024-02-23 环球数科集团有限公司 一种降低算力需求的多模态大模型构建系统和方法

Also Published As

Publication number Publication date
CN113723288B (zh) 2023-10-27
CN113723288A (zh) 2021-11-30

Similar Documents

Publication Publication Date Title
WO2023029353A1 (zh) 基于多模态混合模型的业务数据处理方法及装置
JP6994588B2 (ja) 顔特徴抽出モデル訓練方法、顔特徴抽出方法、装置、機器および記憶媒体
CN110020424B (zh) 合同信息的提取方法、装置和文本信息的提取方法
CN110210542B (zh) 图片文字识别模型训练方法、装置及文字识别系统
CN109872162B (zh) 一种处理用户投诉信息的风控分类识别方法及系统
KR102002024B1 (ko) 객체 라벨링 처리 방법 및 객체 관리 서버
CN110580308B (zh) 信息审核方法及装置、电子设备、存储介质
CN112950170B (zh) 审核方法以及装置
CN111126514A (zh) 图像多标签分类方法、装置、设备及介质
CN114140673B (zh) 一种违规图像识别方法、系统及设备
CN112036295A (zh) 票据图像处理方法、装置、存储介质及电子设备
CN110941702A (zh) 一种法律法规和法条的检索方法及装置、可读存储介质
CN110377733A (zh) 一种基于文本的情绪识别方法、终端设备及介质
CN113989476A (zh) 对象识别方法及电子设备
CN111125177B (zh) 生成数据标签的方法、装置、电子设备及可读存储介质
CN113297379A (zh) 一种文本数据多标签分类方法及装置
CN113128522B (zh) 目标识别方法、装置、计算机设备和存储介质
CN114708595A (zh) 图像文献结构化解析方法、系统、电子设备、存储介质
US8918406B2 (en) Intelligent analysis queue construction
CN114528851B (zh) 回复语句确定方法、装置、电子设备和存储介质
CN110852082A (zh) 同义词的确定方法及装置
CN113408265B (zh) 基于人机交互的语义解析方法、装置、设备及存储介质
CN116263784A (zh) 面向图片文本的粗粒度情感分析方法及装置
CN114677526A (zh) 图像分类方法、装置、设备及介质
CN112232320B (zh) 印刷品文字的校对方法及相关设备

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE