WO2022227164A1 - 基于人工智能的数据处理方法、装置、设备和介质 - Google Patents


Info

Publication number
WO2022227164A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
bert model
target
information
bert
Prior art date
Application number
PCT/CN2021/096388
Other languages
English (en)
French (fr)
Inventor
徐卓扬
赵婷婷
廖希洋
孙行智
胡岗
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022227164A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a data processing method, apparatus, device and medium based on artificial intelligence.
  • the Deep Q-Network (DQN) model is a classic algorithm in Deep Reinforcement Learning (DRL), which combines deep learning and reinforcement learning to optimize sequential decision-making problems over both the long and short term.
  • the DQN model uses a neural network to fit the policy.
  • the network takes the state as input and outputs the Q value (i.e., the expected reward) corresponding to each action.
  • the action with the largest Q value is the action that the DQN model considers should be selected.
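The greedy selection described above can be sketched as follows. This is a minimal toy, not the application's actual network: the linear "Q network", its weights, and the state values are all assumed for illustration.

```python
import numpy as np

def select_action(q_network, state):
    """Greedy action selection: the Q network maps a state vector to one
    Q value per action, and the action with the largest Q value is chosen."""
    q_values = q_network(state)          # shape: (num_actions,)
    return int(np.argmax(q_values)), q_values

# Toy stand-in for a trained Q network: a fixed linear layer over 4 state
# features producing Q values for 3 actions (all values here are assumed).
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
toy_q_network = lambda s: W @ s

state = np.array([0.2, -1.0, 0.5, 0.0])
action, q_values = select_action(toy_q_network, state)
print(action)   # action 2 has the largest Q value (0.5)
```

In the application, the Q network would be the trained target DQN model fed with the BERT model's CLS encoding vector rather than raw features.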
  • the inventor realized that the state representation currently input to the DQN model is too simple; for example, text data is used directly as the state input, resulting in an inaccurate Q value output by the DQN model, that is, a less reliable determined strategy.
  • the embodiments of the present application provide a data processing method, apparatus, device, and medium based on artificial intelligence, which help to improve the reliability of the determined user processing strategy.
  • an embodiment of the present application provides a data processing method based on artificial intelligence, including:
  • the current state information is input into the target BERT model for encoding processing, and the encoding vector corresponding to the classification marker (CLS) symbol output by the target BERT model is obtained;
  • an embodiment of the present application provides a data processing apparatus, including:
  • the acquisition module is used to acquire sample data
  • a training module, used to obtain the state transition information corresponding to the sample data by using the BERT model, train the BERT model according to the state transition information, and train the DQN model connected to the BERT model according to the output result of the BERT model, so as to obtain the trained target BERT model and target DQN model
  • the obtaining module is also used to obtain the current state information of the target user
  • a processing module configured to input the current state information into the target BERT model for encoding processing, and obtain the encoding vector corresponding to the classification marker (CLS) symbol output by the target BERT model;
  • the processing module is further configured to input the encoding vector corresponding to the CLS symbol into the target DQN model, and obtain action information corresponding to the current state information, where the action information is used to indicate the processing strategy adopted for the target user in the current state.
  • an embodiment of the present application provides a data processing device.
  • the data processing device may include a processor and a memory, and the processor and the memory are connected to each other.
  • the memory is used to store a computer program that supports the data processing device to perform the above method or steps
  • the computer program includes program instructions
  • the processor is configured to invoke the program instructions to execute the above artificial intelligence-based data processing method
  • the artificial intelligence-based data processing method includes:
  • the current state information is input into the target BERT model for encoding processing, and the encoding vector corresponding to the classification marker (CLS) symbol output by the target BERT model is obtained;
  • an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program includes program instructions that, when executed by a processor, cause the processor to execute the above-mentioned artificial intelligence-based data processing method, where the method includes:
  • the current state information is input into the target BERT model for encoding processing, and the encoding vector corresponding to the classification marker (CLS) symbol output by the target BERT model is obtained;
  • the BERT model can learn the state transition information, so that the output vector of the BERT model can more accurately represent the meaning of the sentence; finally, when the output of the BERT model is input into the DQN model for training, a more accurate Q value can be obtained, so that when determining a policy for the user, the BERT model and the DQN model can be combined to obtain the policy corresponding to the current state, which improves the reliability of the determined user processing policy.
  • FIG. 1 is a schematic flowchart of a data processing method based on artificial intelligence provided by an embodiment of the present application
  • FIG. 2 is a schematic flowchart of another artificial intelligence-based data processing method provided by an embodiment of the present application.
  • FIG. 3a is a schematic diagram of the architecture of a BERT+DQN model provided by an embodiment of the present application.
  • FIG. 3b is a schematic diagram of the architecture of another BERT+DQN model provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the technical solution of the present application can be applied to the field of artificial intelligence and/or big data technology, such as in scenarios such as treatment plan determination and user grouping, so as to intelligently determine reliable strategies for users and promote the construction of smart cities.
  • the present application may be implemented through a data platform or other device.
  • the data involved in this application such as sample data, status information and/or action information, may be stored in a blockchain node, or may be stored in a database, which is not limited in this application.
  • the BERT model is an advanced neural language pre-training model, and the DQN model, as a classic algorithm in DRL, realizes policy determination by combining deep learning and reinforcement learning.
  • reinforcement learning is an artificial-intelligence approach in which an agent adopts a certain policy, takes an action for a given state, obtains a reward, and then optimizes the policy based on the obtained reward.
  • a policy refers to the action that should be taken in a specific state to maximize the expected reward.
  • This application determines the adopted strategy by combining the BERT (Bidirectional Encoder Representation from Transformers) model and the DRL model such as the DQN model, so that the determined strategy is more accurate and more reliable.
  • this application can obtain sample data, train the BERT model and a DRL model such as the DQN model based on the sample data, and then input user information such as user state information into the trained BERT model, input the output information of the BERT model into the trained DRL model such as the DQN model, and finally obtain the action information corresponding to the user state information, thereby obtaining the currently available strategies, such as a treatment plan, an exercise plan, and/or a grouping plan.
  • the technical solutions of the present application can be applied to a data processing device (data processing apparatus) for determining a processing strategy.
  • the data processing device may be a terminal, a server, or a data platform or other devices.
  • the terminal may include a mobile phone, a tablet computer, a computer, etc., which is not limited in this application. It can be understood that, in other embodiments, the terminal may also be called other names, such as terminal equipment, smart terminal, user equipment, user terminal, etc., which are not listed here one by one.
  • the embodiments of the present application provide an artificial intelligence-based data processing method, apparatus, device, and medium, etc., so as to help improve the reliability of the determined policy. Detailed descriptions are given below.
  • FIG. 1 is a schematic flowchart of an artificial intelligence-based data processing method provided by an embodiment of the present application.
  • the method can be performed by the above-mentioned data processing device, as shown in FIG. 1 , the method can include the following steps:
  • the sample data may be text data, table data, picture data, etc.
  • the obtained sample data may include multiple pieces (or groups) of data, which is not limited in this application.
  • the sample data may be long-term follow-up data of diabetic patients, by collecting long-term follow-up data of a large number of diabetic patients as sample data.
  • each follow-up data of each patient is regarded as a (group) of sample data
  • a sample data may include one or more characteristic data such as patient demographic information, inspection indicators, medication history, and prescription information, etc.
  • alternatively, it may be obtained by transformation from one or more types of characteristic data such as the patient's demographic information, test and inspection indicators, medication history, and prescription information.
  • the characteristic data of a plurality of users may be acquired, and the characteristic data includes continuous characteristic data; and then the continuous characteristic data may be discretized to obtain the discretized characteristic data, and
  • the discretized feature data is used as sample data. That is to say, for continuous feature data, the continuous feature data can be discretized as sample data. This helps to reduce model training complexity while ensuring training reliability.
  • the continuous feature data is discretized by dividing its range into n intervals, each interval corresponding to a discrete value; if a user's continuous feature value falls in a certain interval, the discrete value corresponding to that interval is taken as the discretized feature data. That is to say, the discretization process may divide the continuous feature data into n intervals, each corresponding to a discrete value, and the discretized feature data of a user (patient) is then the discrete value of the interval to which the user's continuous feature value belongs. For example, combined with clinical knowledge, all continuous patient features are discretized into 5 intervals, each corresponding to a discrete value. The 5 intervals are: very low, lower than normal, normal, higher than normal, and very high.
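The interval-based discretization above can be sketched as follows. The cut points and the fasting-glucose example are hypothetical illustrations, not values taken from the application; only the 5-level scheme comes from the text.

```python
def discretize(value, boundaries):
    """Map a continuous value to a discrete bin index: n-1 sorted boundaries
    define n intervals, each interval corresponding to one discrete value."""
    for i, b in enumerate(boundaries):
        if value < b:
            return i
    return len(boundaries)

# Illustrative 5-interval scheme (very low .. very high); the cut points
# below are assumed for the example, not taken from the application.
LEVELS = ["very low", "lower than normal", "normal", "higher than normal", "very high"]
glucose_boundaries = [3.0, 3.9, 6.1, 7.0]   # mmol/L, hypothetical

idx = discretize(5.2, glucose_boundaries)
print(LEVELS[idx])   # 5.2 falls in the third interval: "normal"
```

Each continuous feature would get its own boundary list (derived, per the text, from clinical knowledge), and the resulting discrete values form the sample data.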
  • the sample data may be obtained from the blockchain, that is, the sample data may be pre-stored in the blockchain.
  • the reliability of the obtained data can be improved, and the accuracy of the model trained based on the sample data can be improved, thereby improving the reliability of the user processing strategy determined subsequently.
  • a data processing device can send a data request carrying a project identifier to a blockchain node, so that after receiving the request and verifying the identity of the data processing device, the blockchain node can search for the sample data corresponding to the identifier and return the found sample data, and the data processing device can receive the sample data sent by the blockchain node.
  • the item identifier may be a disease identifier, a region identifier, a gender identifier, an age identifier, etc., which is not limited in this application.
  • the sample data can be obtained from a server, for example, by sending a data request carrying an item identifier to the server to request the sample data corresponding to the item identifier, which is similar to the above-mentioned method of requesting data from a blockchain node and is not described in detail here.
  • the BERT model and the DRL model such as the DQN model can be trained based on the sample data.
  • the sample data can be input into the BERT model, and the state transition information corresponding to the sample data can be obtained to train the BERT model + DQN model.
  • the sample data includes state information and action information.
  • the state transition information may be related information of the next state feature.
  • the state transition information corresponding to the sample data is obtained by using the BERT model, the BERT model is trained according to the state transition information, and the DQN model connected to the BERT model is trained according to the output result of the BERT model, so as to obtain the trained target BERT model and target DQN model. Specifically, the state information included in the sample data can be input into the BERT model for encoding processing to obtain an output result such as the encoding vector corresponding to the CLS symbol, and the one-hot encoding vector corresponding to the action information included in the sample data can be obtained; the encoding vector corresponding to the CLS symbol and the one-hot encoding vector corresponding to the action information can then be concatenated to obtain a spliced vector, the state transition information corresponding to the sample data can be determined according to the spliced vector, and each model parameter of the BERT model can be adjusted according to the state transition information. Further, the encoding vector corresponding to the CLS symbol can also be input into the DQN model to train the DQN model, and each model parameter of the BERT model can be adjusted through the iterative training of the DQN model, so as to obtain the trained target BERT model and target DQN model.
  • the state transition information corresponding to the sample data is determined according to the spliced vector, and the model parameters of the BERT model may be adjusted according to the state transition information by predicting from the spliced vector: for example, the spliced vector is input into a fully connected layer for prediction to obtain the predicted next-state features, and the model parameters of the BERT model are then adjusted according to the predicted next-state features and the actual next-state features.
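The splice-and-predict step can be sketched as follows. All dimensions, weights, and inputs are toy assumptions; a real implementation would use the BERT model's CLS vector and a trained fully connected layer.

```python
import numpy as np

def one_hot(action_id, num_actions):
    """One-hot encoding vector for the taken action."""
    v = np.zeros(num_actions)
    v[action_id] = 1.0
    return v

def predict_next_state(cls_vec, action_id, W, b, num_actions):
    """Concatenate the CLS encoding vector with the one-hot action vector,
    then apply a fully connected layer to predict next-state features."""
    spliced = np.concatenate([cls_vec, one_hot(action_id, num_actions)])
    return W @ spliced + b

# Assumed toy dimensions: 8-dim CLS vector, 3 actions, 5 next-state features.
d_cls, num_actions, d_state = 8, 3, 5
W = np.full((d_state, d_cls + num_actions), 0.1)   # stand-in FC weights
b = np.zeros(d_state)

cls_vec = np.ones(d_cls)
pred = predict_next_state(cls_vec, action_id=1, W=W, b=b, num_actions=num_actions)
print(pred.shape)   # (5,) predicted next-state feature vector
```

Training would compare `pred` with the actual next-state features and backpropagate the error through the fully connected layer into the BERT model's parameters.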
  • the word vector corresponding to one or more features in the sample data can also be replaced by a MASK symbol; the position of the MASK symbol among the features input to the BERT model can then be obtained, along with the encoded word vector corresponding to that position among the encoded word vectors output by the BERT model. Further, the feature masked by the MASK symbol can be predicted according to the encoded word vector corresponding to the position of the MASK symbol to obtain a predicted feature, and the model parameters of the BERT model can then be adjusted according to the predicted feature and the actual feature at the MASK symbol. This can further improve the reliability of the BERT model training.
  • BERT can be used to pre-train the patient's state first, and the state of the patient's current visit can be input.
  • the pre-training includes: masking out a part of the tokens with a MASK symbol and then using the output corresponding to the position of the MASK to predict the masked token, and using the output of CLS to predict the tokens that will appear in the next state.
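The masking step of this pre-training can be illustrated by its data preparation alone (no model involved): replace some token positions with a [MASK] symbol and record which positions and original tokens must be predicted. The follow-up "state tokens" below are hypothetical examples.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_ratio=0.15, seed=42):
    """Replace a fraction of tokens with [MASK]; return the masked sequence
    plus (position, original token) pairs the model must predict."""
    rng = random.Random(seed)
    masked, targets = list(tokens), []
    n = max(1, int(len(tokens) * mask_ratio))
    for pos in rng.sample(range(len(tokens)), n):
        targets.append((pos, masked[pos]))
        masked[pos] = MASK
    return masked, targets

# Hypothetical state tokens for one follow-up visit of a diabetic patient.
tokens = ["gender=F", "hba1c=high", "bmi=normal", "bp=high", "biguanide=yes", "age=60s"]
masked, targets = mask_tokens(tokens)
print(masked.count(MASK), len(targets))
```

The BERT model's output at each masked position would then be trained to reproduce the original token stored in `targets`.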
  • the output of the CLS of the BERT model can be connected to the input of the DQN model, and the DQN model can be used for iterative training of reinforcement learning to fine-tune the BERT model while fitting the optimal policy.
  • the feature masked by the MASK symbol may be predicted according to the encoded word vector corresponding to the position of the MASK symbol by inputting that encoded word vector into a fully connected layer for prediction, so as to obtain the feature of the MASK position input to the BERT model.
  • the state transition information may also be acquired in other ways, for example, acquired from a server based on sample data, or determined by other algorithms, which is not limited in this application.
  • the target user may refer to any one or more users, or may refer to one or more specific users, such as a user who initiates a decision request, or a user who needs to be followed up, which is not limited in this application.
  • the current state information may refer to any state information.
  • the current state information may include one or more of demographic information, test indicators, and medication history.
  • the BERT model is an advanced neural language pretraining model originally designed to learn representations of sentences consisting of words.
  • the input is each word and the output is the representation of the sentence.
  • the BERT model can be used in the DRL task of determining a strategy for a patient, for example, each feature (state) of a certain visit of the patient can be input, and the representation of the patient at the current visit can be output.
  • after training the BERT+DQN model, user information such as user state information can be input into the trained BERT model, the output information of the BERT model can be input into the trained DRL model such as the DQN model, and the action information corresponding to the user state information can finally be obtained, thereby predicting the strategies that can currently be taken, such as treatment plans, exercise plans, and/or grouping plans. That is to say, after the training is completed, when the model is used, the characteristic information of the patient's state can be input, the decision corresponding to the maximum Q value can be output, and the patients can be grouped accordingly.
  • messages may also be pushed for the target user based on the determined action information/policy, such as marketing messages, service messages, and medical messages, which is not limited in this application.
  • the BERT+DQN scheme proposed in this application combines deep reinforcement learning technology and neuro-linguistic pre-training technology to find the optimal user processing strategy while learning a better representation of the patient's state.
  • the learned patient state representation contains more potential information and state transition information.
  • since the state transition is also determined by the actual action taken, the representation may contain certain expert decision (action) information, so that the determined user processing strategy is more accurate, reasonable, and safe.
  • This combined framework integrates the pre-training step of DRL into the pre-training of the BERT model, reducing the useless exploration of the DRL model, and is suitable for various fields.
  • the BERT model can learn the state transition information, so that the output vector of the BERT model can more accurately represent the meaning of the sentence; finally, when the output of the BERT model is input into the DQN model for training, a more accurate Q value can be obtained, so that when determining a policy for the user, the BERT model and the DQN model can be combined to obtain the policy corresponding to the current state, which improves the reliability of the determined user processing policy.
  • FIG. 2 is a schematic flowchart of another data processing method based on artificial intelligence provided by an embodiment of the present application.
  • the method can be executed by the above-mentioned data processing device, and can be applied to a user grouping scenario. As shown in Figure 2, the method may include the following steps:
  • the sample data can be input into the BERT model for encoding processing, for example, by inputting all sample data or the state information in the sample data into the BERT model to obtain the output result, including the encoding vector corresponding to the classification marker (CLS) symbol;
  • the encoded vector can be used as the semantic representation of all sample data input to the BERT model.
  • the BERT model can be used to encode the one-dimensional word vector corresponding to each feature in the sample data, and output the encoded word vector corresponding to each feature.
  • the encoded word vector can be used as the textual semantic representation of the feature. It can be understood that the encoded word vectors corresponding to identical or similar features are also relatively close in the vector space; for example, the higher the feature similarity, the closer the corresponding encoded word vectors are in the vector space.
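The "closer in the vector space" property can be checked with cosine similarity. The three vectors below are invented stand-ins for encoded word vectors, chosen so that the two blood-pressure features point in similar directions.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated ones."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical encoded word vectors: two similar features and one dissimilar.
vec_bp_high   = np.array([0.9, 0.1, 0.0])
vec_bp_raised = np.array([0.8, 0.2, 0.1])
vec_gender_f  = np.array([0.0, 0.1, 0.9])

sim_close = cosine_similarity(vec_bp_high, vec_bp_raised)
sim_far   = cosine_similarity(vec_bp_high, vec_gender_f)
print(sim_close > sim_far)   # similar features lie closer in the vector space
```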
  • the one-dimensional word vector corresponding to one or more features in the sample data input to the BERT model can also be replaced by the MASK symbol.
  • for example, the one-dimensional word vector corresponding to a certain feature is replaced by the MASK symbol.
  • the encoded word vector corresponding to the position of the MASK symbol can be input into the fully connected layer for prediction to obtain the predicted feature of the MASK position input to the BERT model, and each model parameter of the BERT model is then adjusted according to the predicted feature and the actual feature at the MASK symbol.
  • the encoding vector corresponding to the CLS symbol and the one-hot encoding vector corresponding to the action can be concatenated to obtain the spliced vector, which facilitates training the BERT model based on the spliced vector.
  • the one-hot encoding vector corresponding to the action taken can be obtained for the state in the sample data.
  • the action can be the prescription plan adopted by the doctor, and the one-hot encoding vector corresponding to the prescription plan adopted by the doctor can be further concatenated with the encoding vector corresponding to the CLS symbol.
  • the spliced vector can be input into the fully connected layer for prediction to obtain the predicted state features of the next follow-up visit.
  • each model parameter of the BERT model is adjusted according to the predicted state features and the actual state features of the next follow-up visit, so that the degree of matching between them is maximized or exceeds a threshold, thereby realizing the training of the BERT model.
  • for example, the spliced vector can be input into the fully connected layer, and the state features of the next follow-up visit can be predicted according to the vector output by the fully connected layer.
  • the model parameters of the BERT model are adjusted according to the degree of matching between the predicted state features and the actual state features of the next follow-up visit, so that this degree of matching is maximized.
  • state can be a multi-dimensional vector consisting of demographic information, inspection indicators, and medication history
  • action can be one-hot encoding
  • the pre-training of the BERT model is completed through the above steps 201-204.
  • the DQN model can be trained through step 205, and the BERT model can be adjusted.
  • the coding vector corresponding to the CLS symbol can also be input into the DQN model, and the DQN model can be used for iterative training of reinforcement learning, so as to fine-tune the BERT model and fit the optimal strategy.
  • BERT can be used to pre-train the patient's state, with the patient's current follow-up state as input, such as gender, glycated hemoglobin information, BMI information, blood pressure information, whether biguanides have been used, age information, and other tokens.
  • part of the tokens can be masked out with the MASK symbol, the output corresponding to the position of the MASK is then used to predict the masked token, and the output of CLS is used to predict the tokens that will appear in the next state, so as to train BERT.
  • the output of the CLS of the BERT model can be connected to the input of the DQN model, and the DQN model can be used for iterative training of reinforcement learning to fine-tune the BERT model while fitting the optimal policy.
  • the DQN may include one or more fully connected layers.
  • BERT can be used to pre-train the patient's state: input the patient's current follow-up state, mask out a part of the tokens, then use the output corresponding to the position of the MASK to predict the masked token, and use the output of CLS to predict the tokens that will appear in the next state, so as to train BERT.
  • the BERT output result, such as the encoding vector corresponding to the CLS symbol, can be obtained, along with the one-hot encoding vector corresponding to the action
  • the encoding vector corresponding to the CLS symbol and the one-hot encoding vector corresponding to the action can be vector spliced.
  • the spliced vector is input into the fully connected layer for prediction to obtain the predicted feature token, and the BERT model is adjusted according to the predicted feature and the actual feature.
  • the output of the CLS of the BERT model can be connected to the input of the DQN model, and the DQN model can be used for iterative training of reinforcement learning to fine-tune the BERT model while fitting the optimal policy.
  • the action information can be used to indicate the processing strategy adopted for the target user in the current state.
  • the one-dimensional word vector corresponding to each feature of the current state can be obtained and input into the BERT model for encoding processing, the encoding vector corresponding to the CLS symbol output by the BERT model is obtained, and this encoding vector is used as the semantic representation of all the feature data input to the BERT model.
  • the encoding vector corresponding to the CLS symbol can be input into the DQN model to obtain the Q values corresponding to the various actions, and the action corresponding to the maximum Q value can be determined as the action that should be taken in the current state. Taking prescribing medicine for a diabetic patient as an example, the prescribing scheme corresponding to the maximum Q value is determined as the prescribing scheme for the patient.
  • user grouping groups users according to certain conditions (attributes); after grouping, various analyses and operations can be performed on the users of a user group, for example, analyzing the characteristics of the users in a group, or providing the same solution for users in the same group. If only deep reinforcement learning is applied to patient grouping, it often faces the problem that the patient state representation is too simple, and the grouping results may be inaccurate. To give a more accurate grouping result, a state representation that is more accurate and incorporates more information is required.
  • the BERT model and the reinforcement learning method are combined for the framework of patient grouping.
  • the BERT model and the deep reinforcement learning method each exert their respective advantages, and combining them for patient grouping can obtain more accurate and reasonable grouping results; in addition, the grouping habits of doctors are pre-trained. Because some reinforcement learning models ignore the modeling of state transitions, state transition information is lost to a certain extent; using the BERT model for pre-training can incorporate state transition information into patient grouping and improve accuracy.
  • patient grouping can be performed by inputting the feature information of the target user's state and outputting the decision corresponding to the maximum Q value. That is, the present application can train a BERT+DQN model for user grouping; for example, grouping can be performed according to the output action information, where the action information corresponding to users belonging to the same user group matches. Matching action information may refer to the same or similar actions/strategies being adopted, such as falling in the same interval. For example, in prescribing, each distinct prescription corresponds to a different grouping scheme, and an expert prescription corresponds to the expert's grouping scheme.
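Grouping by matched action information can be sketched as follows: users whose recommended action (the argmax over Q values) is the same land in the same group. The user IDs and Q values are invented for illustration.

```python
from collections import defaultdict
import numpy as np

def group_users(user_q_values):
    """Group users whose recommended action (argmax Q) matches: users that
    receive the same action/strategy are placed in the same user group."""
    groups = defaultdict(list)
    for user_id, q_values in user_q_values.items():
        groups[int(np.argmax(q_values))].append(user_id)
    return dict(groups)

# Hypothetical Q values output by a target DQN model for three users.
q_by_user = {
    "patient_a": np.array([0.1, 0.7, 0.2]),   # best action: 1
    "patient_b": np.array([0.6, 0.3, 0.1]),   # best action: 0
    "patient_c": np.array([0.2, 0.5, 0.3]),   # best action: 1
}
groups = group_users(q_by_user)
print(groups)   # patients a and c share action 1, so they share a group
```

A "same or similar interval" matching rule would replace the exact-argmax key with a bucketed one, but the grouping mechanics stay the same.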
  • after completing user grouping, for example, completing grouping for the target group of users, message pushing, user feature analysis, and the like may also be performed based on the grouping results.
  • the determined action information may be a treatment plan for diabetes.
  • the data processing device can also push targeted messages for each user group according to the divided user groups, send follow-up reminders, and so on, to facilitate management and maintenance, enhance user experience, and ensure treatment effects.
  • state can be a multi-dimensional vector composed of demographic information, inspection indicators, and medication history
  • action can be a one-hot encoding of a grouping scheme
  • reward can be preset.
  • the output vector of the BERT model can be spliced with the one-hot encoding vector corresponding to the action, so that the features of the next state can be predicted through the fully connected layer and the parameters of the BERT model can be adjusted by matching against the features of the actual next state. In this way, the BERT model can learn the state transition information, the output vector of the BERT model can more accurately represent the meaning of the sentence, and finally, when the output vector of the BERT model is input into the DQN model for training, a more accurate Q value can be obtained.
  • the BERT+DQN method proposed in this scheme combines deep reinforcement learning with neural language pre-training to find the optimal patient grouping decision while learning a better representation of patient follow-ups.
  • the learned patient follow-up representation includes more potential information and state transition information.
  • the state representation also includes certain expert decision (action) information, and the recommended patient grouping decision is more accurate, reasonable and safe.
  • This combined framework integrates the pre-training step of DRL for patient grouping into the pre-training of the BERT model, reducing the useless exploration of the DRL model. Understandably, this framework can be used not only in the field of patient clustering, but also in other fields such as offline reinforcement learning.
  • the embodiments of the present application also provide a data processing apparatus.
  • the apparatus may include means for performing the method described in FIG. 1 or FIG. 2 above.
  • FIG. 4 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application.
  • the data processing apparatus described in this embodiment may be configured in a data processing device.
  • the data processing apparatus 400 in this embodiment may include an acquisition module 401, a training module 402 and a processing module 403. Specifically:
  • an acquisition module 401 configured to acquire sample data
  • a training module 402, configured to use the BERT model to obtain state transition information corresponding to the sample data, train the BERT model according to the state transition information, and train the DQN model connected to the BERT model according to the output of the BERT model, to obtain the trained target BERT model and the target DQN model;
  • the obtaining module 401 is also used to obtain the current state information of the target user;
  • a processing module 403 configured to input the current state information into the target BERT model for encoding processing, and obtain a coding vector corresponding to the classifier marked CLS symbol output by the target BERT model;
  • the processing module 403 is further configured to input the coding vector corresponding to the CLS symbol into the target DQN model to obtain action information corresponding to the current state information, where the action information is used to indicate that the current state The processing strategy adopted by the target user.
  • the sample data includes state information and action information;
  • the training module 402 obtaining the state transition information corresponding to the sample data by using the BERT model, training the BERT model according to the state transition information, and training the DQN model connected to the BERT model according to the output of the BERT model to obtain the trained target BERT model and target DQN model includes:
  • inputting the state information included in the sample data into the BERT model for encoding, to obtain the encoding vector corresponding to the CLS symbol;
  • obtaining the one-hot encoding vector corresponding to the action information included in the sample data, and splicing the encoding vector corresponding to the CLS symbol with it to obtain a spliced vector;
  • determining the state transition information corresponding to the sample data according to the spliced vector, and adjusting each model parameter of the BERT model according to the state transition information;
  • inputting the encoding vector corresponding to the CLS symbol into the DQN model, training the DQN model, and adjusting each model parameter of the BERT model, to obtain the trained target BERT model and the target DQN model.
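A minimal sketch of what happens when the CLS encoding reaches the DQN head at inference time: one Q value per candidate action, and the maximum picks the action. The linear layer is a toy stand-in with random weights and assumed sizes (in the patent's scheme the DQN may contain one or more trained fully connected layers).

```python
import numpy as np

rng = np.random.default_rng(1)
HIDDEN, N_ACTIONS = 8, 3  # illustrative sizes, not from the patent

def q_values(cls_vec, W, b):
    """Toy DQN head: one linear layer mapping the CLS encoding vector to
    one Q value per candidate action."""
    return W @ cls_vec + b

W = rng.normal(size=(N_ACTIONS, HIDDEN))  # would be learned in practice
b = np.zeros(N_ACTIONS)
cls_vec = rng.normal(size=HIDDEN)         # stand-in for the BERT CLS output

q = q_values(cls_vec, W, b)
best_action = int(np.argmax(q))  # the maximum-Q action is the one selected
print(q.shape, best_action)
```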
  • the training module 402 determining the state transition information corresponding to the sample data according to the spliced vector, and adjusting each model parameter of the BERT model according to the state transition information, includes:
  • inputting the spliced vector into a fully connected layer for prediction, to obtain predicted next-state features;
  • adjusting each model parameter of the BERT model according to the predicted next-state features and the actual next-state features.
  • the sample data includes state information, and the state information includes a plurality of features
  • the training module 402 is also used to replace the one-dimensional word vector corresponding to one or more features in the state information by the MASK symbol; obtain the position of the MASK symbol in each feature of the input BERT model, and obtain the The encoded word vector corresponding to the position of the MASK symbol in each encoded word vector output by the BERT model; according to the encoded word vector corresponding to the location of the MASK symbol, predict the feature of the MASK symbol input into the BERT model, and obtain the The prediction feature of the MASK symbol; according to the prediction feature and the actual feature of the MASK symbol, each model parameter of the BERT model is adjusted.
  • the training module 402 predicts the characteristics of the MASK symbol input to the BERT model according to the encoded word vector corresponding to the position of the MASK symbol, including:
  • the encoded word vector corresponding to the position of the MASK symbol is input to the fully connected layer for prediction, and the feature of the MASK symbol input to the BERT model is obtained.
  • the obtaining module 401 obtains sample data, including:
  • the feature data includes continuous feature data
  • the continuous feature data is subjected to discretization processing to obtain discretized feature data, and the discretized feature data is used as sample data.
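The discretization step above can be sketched with interval binning. The elsewhere-described scheme uses five intervals (very low / below normal / normal / above normal / very high); the HbA1c-style cut points below are illustrative assumptions, not values from the patent.

```python
from bisect import bisect_right

# Cut a continuous feature into five intervals, each mapped to one discrete
# value. The boundary values are made up for illustration.
CUTS = [4.0, 5.7, 6.5, 8.0]   # four boundaries -> five intervals
LABELS = [0, 1, 2, 3, 4]      # one discrete value per interval

def discretize(value, cuts=CUTS):
    """Return the discrete value of the interval the continuous value falls in."""
    return LABELS[bisect_right(cuts, value)]

print([discretize(v) for v in [3.5, 5.0, 6.0, 7.0, 9.1]])
```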
  • the processing module 403 is further configured to determine the user group to which the target user belongs according to the action information corresponding to the current state information;
  • the action information corresponding to users belonging to the same user group is matched.
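The grouping rule above — users with matching recommended action information land in the same group — reduces to a simple partition. The user names and action indices are made up for illustration; in practice the action would come from the target DQN model's maximum-Q decision.

```python
from collections import defaultdict

# Hypothetical per-user recommended actions (action index from the DQN head).
recommended = {"u1": 2, "u2": 0, "u3": 2, "u4": 1}

# Users whose action information matches (here: identical index) form a group.
groups = defaultdict(list)
for user, action in recommended.items():
    groups[action].append(user)

print(dict(groups))
```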
  • each functional module of the data processing apparatus in this embodiment can be specifically implemented according to the method in FIG. 1 or FIG. 2 of the above method embodiment, and the specific implementation process can refer to the related method in FIG. 1 or FIG. 2 of the above method embodiment. description, which will not be repeated here.
  • FIG. 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the data processing device may include a processor 501 and a memory 502 .
  • the processor 501 and the memory 502 may be connected to each other.
  • the data processing device may further include a communication interface 503 .
  • the above-mentioned processor 501 , memory 502 and communication interface 503 may be connected through a bus or other means, which is not limited in this application.
  • the memory 502 can be used to store program instructions, and the processor 501 can be used to invoke the program instructions to execute part or all of the steps in the above embodiments, such as executing part or all of the steps executed by the data processing device in the above embodiments.
  • Communication interface 503 may be controlled by the processor for sending and receiving messages.
  • the memory 502 may be used to store a computer program comprising program instructions, and the processor 501 may be used to execute the program instructions stored by the memory 502 .
  • the processor 501 is configured to invoke the program instructions to perform the following steps:
  • the current state information is input into the target BERT model for encoding processing, and the encoding vector corresponding to the classifier marked CLS symbol output by the target BERT model is obtained;
  • the sample data includes state information and action information; performing the obtaining of state transition information corresponding to the sample data by using the BERT model, the training of the BERT model according to the state transition information, and the training of the DQN model connected to the BERT model according to the output of the BERT model to obtain the trained target BERT model and target DQN model includes:
  • inputting the state information included in the sample data into the BERT model for encoding, to obtain the encoding vector corresponding to the CLS symbol;
  • obtaining the one-hot encoding vector corresponding to the action information included in the sample data, and splicing the encoding vector corresponding to the CLS symbol with it to obtain a spliced vector;
  • determining the state transition information corresponding to the sample data according to the spliced vector, and adjusting each model parameter of the BERT model according to the state transition information;
  • inputting the encoding vector corresponding to the CLS symbol into the DQN model, training the DQN model, and adjusting each model parameter of the BERT model, to obtain the trained target BERT model and target DQN model.
  • performing the determination of the state transition information corresponding to the sample data according to the spliced vector, and the adjustment of each model parameter of the BERT model according to the state transition information, includes:
  • inputting the spliced vector into a fully connected layer for prediction, to obtain predicted next-state features;
  • adjusting each model parameter of the BERT model according to the predicted next-state features and the actual next-state features.
  • the sample data includes state information, and the state information includes multiple features; the processor 501 is further configured to execute:
  • replacing the one-dimensional word vector corresponding to one or more features in the state information with the MASK symbol;
  • obtaining the position of the MASK symbol among the features input into the BERT model, and obtaining the encoded word vector corresponding to that position among the encoded word vectors output by the BERT model;
  • predicting the feature of the MASK symbol input into the BERT model according to the encoded word vector corresponding to the position of the MASK symbol, to obtain the predicted feature of the MASK symbol;
  • adjusting each model parameter of the BERT model according to the predicted feature and the actual feature of the MASK symbol.
  • performing the prediction of the feature of the MASK symbol input to the BERT model according to the encoded word vector corresponding to the position of the MASK symbol including:
  • the encoded word vector corresponding to the position of the MASK symbol is input to the fully connected layer for prediction, and the feature of the MASK symbol input to the BERT model is obtained.
  • performing the acquiring sample data includes:
  • the feature data includes continuous feature data
  • the continuous feature data is subjected to discretization processing to obtain discretized feature data, and the discretized feature data is used as sample data.
  • the processor 501 is further configured to execute:
  • the action information corresponding to users belonging to the same user group is matched.
  • the processor 501 may be a central processing unit (Central Processing Unit, CPU); the processor 501 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the memory 502 may include read only memory and random access memory, and provides instructions and data to the processor 501 .
  • a portion of memory 502 may also include non-volatile random access memory.
  • the memory 502 may also store epidemic data of the target infectious disease.
  • the communication interface 503 may include an input device and/or an output device, for example, the input device may be a control panel, a microphone, a receiver, etc., and the output device may be a display screen, a transmitter, etc., which are not listed here.
  • the input device may be a control panel, a microphone, a receiver, etc.
  • the output device may be a display screen, a transmitter, etc., which are not listed here.
  • in specific implementations, the processor 501 and the memory 502 (and, where applicable, the communication interface 503) described in the embodiments of the present application may perform the implementations described in the method embodiments of FIG. 1 or FIG. 2 provided in the embodiments of the present application, and may also perform the implementation of the data processing apparatus described in the embodiments of the present application; details are not repeated here.
  • Embodiments of the present application further provide a computer-readable storage medium storing a computer program; the computer program includes program instructions which, when executed by a processor, can perform some or all of the steps performed in the above data processing method embodiments, such as some or all of the steps performed by the data processing device.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile, which is not limited in this application.
  • Embodiments of the present application further provide a computer program product, where the computer program product includes computer program code, and when the computer program code runs on a computer, causes the computer to execute the steps performed in the above data processing apparatus method embodiments.
  • the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain nodes, such as sample data, action information, strategies, etc.
  • A blockchain is essentially a decentralized database: a chain of data blocks linked using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (tamper-proofing) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM) or the like.


Abstract

An artificial-intelligence-based data processing method, apparatus, device and medium, applicable to the field of medical technology. The method includes: acquiring sample data (101); using a BERT model to obtain state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training a DQN model connected to the BERT model, to obtain a trained target BERT model and a trained target DQN model (102); acquiring current state information of a target user (103); inputting the current state information into the target BERT model for encoding, to obtain the encoding vector corresponding to the CLS symbol output by the target BERT model (104); and inputting the encoding vector corresponding to the CLS symbol into the target DQN model, to obtain action information corresponding to the current state information (105). This method helps improve the reliability of the determined user processing strategy.

Description

Artificial-Intelligence-Based Data Processing Method, Apparatus, Device and Medium
This application claims priority to Chinese patent application No. 202110477679.3, filed with the China National Intellectual Property Office on April 29, 2021 and entitled "Artificial-intelligence-based data processing method, apparatus, device and medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to an artificial-intelligence-based data processing method, apparatus, device and medium.
Background
The Deep Q-Network (DQN) model is a classic algorithm in deep reinforcement learning (DRL). By combining deep learning with reinforcement learning, it optimizes the combined long- and short-term objectives of sequential decision problems. The DQN model uses a neural network to fit the policy: the network takes a state as input and outputs the Q value (i.e., the expected reward) of each action, and the action with the maximum Q value is the action DQN considers should be selected. However, the inventors realized that the state representations currently input into DQN models are too simple — for example, text data is used directly as the state input — so the Q values output by the DQN model are inaccurate, i.e., the determined strategy has low reliability.
Summary
Embodiments of this application provide an artificial-intelligence-based data processing method, apparatus, device and medium, which help improve the reliability of the determined user processing strategy.
In a first aspect, an embodiment of this application provides an artificial-intelligence-based data processing method, including:
acquiring sample data;
using a BERT model to obtain state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training a DQN model connected to the BERT model according to the output of the BERT model, to obtain a trained target BERT model and a trained target DQN model;
acquiring current state information of a target user;
inputting the current state information into the target BERT model for encoding, to obtain the encoding vector corresponding to the classifier-token CLS symbol output by the target BERT model;
inputting the encoding vector corresponding to the CLS symbol into the target DQN model, to obtain action information corresponding to the current state information, the action information indicating the processing strategy to be adopted for the target user in the current state.
In a second aspect, an embodiment of this application provides a data processing apparatus, including:
an acquisition module, configured to acquire sample data;
a training module, configured to use a BERT model to obtain state transition information corresponding to the sample data, train the BERT model according to the state transition information, and train a DQN model connected to the BERT model according to the output of the BERT model, to obtain a trained target BERT model and a trained target DQN model;
the acquisition module being further configured to acquire current state information of a target user;
a processing module, configured to input the current state information into the target BERT model for encoding, to obtain the encoding vector corresponding to the classifier-token CLS symbol output by the target BERT model;
the processing module being further configured to input the encoding vector corresponding to the CLS symbol into the target DQN model, to obtain action information corresponding to the current state information, the action information indicating the processing strategy to be adopted for the target user in the current state.
In a third aspect, an embodiment of this application provides a data processing device, which may include a processor and a memory connected to each other. The memory is configured to store a computer program supporting the data processing device in performing the above method or steps, the computer program including program instructions, and the processor is configured to invoke the program instructions to perform the above artificial-intelligence-based data processing method, the method including:
acquiring sample data;
using a BERT model to obtain state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training a DQN model connected to the BERT model according to the output of the BERT model, to obtain a trained target BERT model and a trained target DQN model;
acquiring current state information of a target user;
inputting the current state information into the target BERT model for encoding, to obtain the encoding vector corresponding to the classifier-token CLS symbol output by the target BERT model;
inputting the encoding vector corresponding to the CLS symbol into the target DQN model, to obtain action information corresponding to the current state information, the action information indicating the processing strategy to be adopted for the target user in the current state.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, the computer program including program instructions which, when executed by a processor, cause the processor to perform the above artificial-intelligence-based data processing method, the method including:
acquiring sample data;
using a BERT model to obtain state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training a DQN model connected to the BERT model according to the output of the BERT model, to obtain a trained target BERT model and a trained target DQN model;
acquiring current state information of a target user;
inputting the current state information into the target BERT model for encoding, to obtain the encoding vector corresponding to the classifier-token CLS symbol output by the target BERT model;
inputting the encoding vector corresponding to the CLS symbol into the target DQN model, to obtain action information corresponding to the current state information, the action information indicating the processing strategy to be adopted for the target user in the current state.
In the embodiments of this application, the BERT model can learn state transition information, so that its output vector represents the meaning of the input more accurately. As a result, when the output of the BERT model is fed into the DQN model for training, a more accurate Q value can be obtained; when a strategy is subsequently determined for a user, the strategy corresponding to the current state can be obtained by combining the BERT model with the DQN model, which improves the reliability of the determined user processing strategy.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of an artificial-intelligence-based data processing method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of another artificial-intelligence-based data processing method provided by an embodiment of this application;
FIG. 3a is a schematic architecture diagram of a BERT+DQN model provided by an embodiment of this application;
FIG. 3b is a schematic architecture diagram of another BERT+DQN model provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of a data processing device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The technical solution of this application can be applied in the fields of artificial intelligence and/or big data, for example in scenarios such as determining treatment plans or user grouping, so as to intelligently determine reliable strategies for users and promote the construction of smart cities. For example, this application may be implemented through a data platform or other devices. Optionally, the data involved in this application, such as sample data, state information and/or action information, may be stored in blockchain nodes or in a database; this application does not limit this.
The BERT model is an advanced neural language pre-training model, while the DQN model, as a classic DRL algorithm, determines policies by combining deep learning with reinforcement learning. In reinforcement learning, an agent follows a certain policy to take an action for a state, obtains a reward, and then uses the obtained reward to optimize the policy; the policy specifies which action should be taken in a given state so as to maximize the expected reward. This application determines the strategy to adopt by combining a BERT (Bidirectional Encoder Representation from Transformers) model with a DRL model such as a DQN model, making the determined strategy more accurate and more reliable. For example, this application can acquire sample data and train the BERT model and the DRL model such as the DQN model on it; user information such as user state information can then be input into the trained BERT model, the output of the BERT model input into the trained DRL model such as the DQN model, and the action information corresponding to the user state information finally obtained, yielding a currently applicable strategy such as a treatment plan, an exercise plan and/or a grouping scheme.
It can be understood that the technical solution of this application can be applied in a data processing device (data processing apparatus) for determining processing strategies. Optionally, the data processing device may be a terminal, a server, a data platform or another device. The terminal may include a mobile phone, a tablet computer, a computer and so on; this application does not limit this. It can be understood that in other embodiments the terminal may also be called by other names, such as terminal device, intelligent terminal, user equipment or user terminal, which are not listed here one by one.
Embodiments of this application provide an artificial-intelligence-based data processing method, apparatus, device and medium, which help improve the reliability of the determined strategy. They are described in detail below.
Referring to FIG. 1, which is a schematic flowchart of an artificial-intelligence-based data processing method provided by an embodiment of this application. The method may be executed by the above-mentioned data processing device and, as shown in FIG. 1, may include the following steps:
101. Acquire sample data.
Optionally, the sample data may be text data, tabular data, image data and so on, and multiple (groups of) samples may be acquired; this application does not limit this. For example, the sample data may be long-term follow-up data of diabetic patients, collected from a large number of patients. Taking each visit of each patient as one (group of) sample(s), a sample may include one or more items of feature data such as the patient's demographic information, test and examination indicators, medication history and prescription information, or may be derived from one or more such items of feature data.
In some embodiments, sample data may be acquired by obtaining feature data of multiple users, the feature data including continuous feature data; the continuous feature data may then be discretized to obtain discretized feature data, and the discretized feature data used as the sample data. That is, continuous feature data can be discretized before being used as sample data, which helps reduce the complexity of model training while ensuring training reliability.
In some embodiments, the discretization may divide the continuous feature data into n intervals, each corresponding to one discrete value; if a user's continuous feature value falls into a certain interval, the discrete value of that interval is determined as the discretized feature. That is, discretization can divide continuous feature data into n intervals, each corresponding to one discrete value, and the discretized value of a user's (patient's) continuous feature is then the discrete value of the interval the feature falls into. For example, combining clinical knowledge, all continuous patient features can be discretized into five intervals, each corresponding to one discrete value: very low, below normal, normal, above normal, very high.
In some embodiments, the sample data may be obtained from a blockchain, i.e., the sample data may be pre-stored in the blockchain. Obtaining sample data from the blockchain improves the reliability of the obtained data, thereby improving the accuracy of the model trained on it and the reliability of the subsequently determined user processing strategy. For example, the data processing device may send a data request carrying a project identifier to a blockchain node; after receiving the request and verifying the identity of the data processing device, the node may look up the sample data corresponding to the identifier and return the found sample data, which the data processing device receives. Optionally, the project identifier may be a disease identifier, a region identifier, a gender identifier, an age identifier, etc.; this application does not limit this.
In some embodiments, the sample data may be obtained from a server, for example by sending a data request carrying a project identifier to request the corresponding sample data, similar to requesting data from a blockchain node as described above; details are not repeated here.
102. Use a BERT model to obtain state transition information corresponding to the sample data, train the BERT model according to the state transition information, and train a DQN model connected to the BERT model according to the output of the BERT model, to obtain a trained target BERT model and a trained target DQN model.
After the sample data is acquired, the BERT model and a DRL model such as a DQN model can be trained on it. For example, the sample data can be input into the BERT model and the corresponding state transition information obtained, so as to train the BERT+DQN model.
Optionally, the sample data includes state information and action information. Further optionally, the state transition information may be information about the next state's features.
In some embodiments, using the BERT model to obtain the state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training the DQN model connected to the BERT model according to the output of the BERT model to obtain the trained target BERT model and target DQN model may be done as follows: the state information included in the sample data is input into the BERT model for encoding to obtain an output such as the encoding vector corresponding to the CLS symbol, and the one-hot encoding vector corresponding to the action information included in the sample data is obtained; the encoding vector corresponding to the CLS symbol is then spliced with the one-hot encoding vector of the action information to obtain a spliced vector; the state transition information corresponding to the sample data is then determined from the spliced vector, and each model parameter of the BERT model adjusted according to the state transition information. Further, the encoding vector corresponding to the CLS symbol can be input into the DQN model to train the DQN model, and each model parameter of the BERT model adjusted through the iterative training of the DQN model, to obtain the trained target BERT model and target DQN model.
In some embodiments, determining the state transition information corresponding to the sample data from the spliced vector and adjusting the BERT model parameters accordingly may be done by predicting the next state information/features from the spliced vector, for example by inputting the spliced vector into a fully connected layer for prediction to obtain predicted next-state features, and then adjusting each model parameter of the BERT model according to the predicted and actual next-state features.
In some embodiments, the word vector (e.g., a one-dimensional word vector) corresponding to one or more features in the sample data can also be replaced by the MASK symbol; the position of the MASK symbol among the features input into the BERT model is obtained, and the encoded word vector corresponding to that position among the encoded word vectors output by the BERT model is obtained. Further, the feature of the MASK symbol input into the BERT model can be predicted from the encoded word vector at the MASK position to obtain the predicted feature of the MASK symbol; each model parameter of the BERT model is then adjusted according to the predicted feature and the actual feature of the MASK symbol. This further improves the reliability of training the BERT model.
That is, BERT can first be pre-trained on the patient's state: the state of the patient's current visit is input, and pre-training includes masking part of the tokens and predicting the masked tokens from the outputs at the MASK positions, and predicting the tokens that will appear in the next state from the CLS output. The CLS output of the BERT model can then be connected to the input of the DQN model, and the BERT model fine-tuned through iterative reinforcement-learning training of the DQN model while fitting the optimal policy.
In some embodiments, predicting the feature of the MASK symbol from the encoded word vector at the MASK position may be done by inputting that encoded word vector into a fully connected layer for prediction, obtaining the feature of the MASK symbol input into the BERT model.
In some embodiments, the state transition information may also be obtained in other ways, for example from a server based on the sample data, or determined through other algorithms; this application does not limit this.
103. Acquire current state information of a target user.
Optionally, the target user may be any one or more users, or one or more specific users, such as a user who initiates a decision request or a user who needs a follow-up visit; this application does not limit this. Further optionally, the current state information may be any state information; for example, it may include one or more of demographic information, test and examination indicators, and medication history.
104. Input the current state information into the target BERT model for encoding, and obtain the encoding vector corresponding to the CLS symbol output by the target BERT model.
105. Input the encoding vector corresponding to the CLS symbol into the target DQN model, to obtain action information corresponding to the current state information, the action information indicating the processing strategy to be adopted for the target user in the current state.
The BERT model is an advanced neural language pre-training model, originally used to learn representations of sentences composed of words: the input is each word, and the output is a representation of the sentence. This application applies the BERT model to the DRL task of determining strategies for patients: for example, the features (state) of one of a patient's visits can be input, and a representation of the patient at that visit output.
After the BERT+DQN model is trained, user information such as user state information can be input into the trained BERT model, the output of the BERT model input into the trained DRL model such as the DQN model, and the action information corresponding to the user state information finally obtained, predicting a currently applicable strategy such as a treatment plan, an exercise plan and/or a grouping scheme. In other words, after training, when the model is used, the feature information of the patient's state can be input and the decision corresponding to the maximum Q value output, for patient grouping and so on.
In some embodiments, after the action information corresponding to the target user is determined, messages such as marketing messages, service messages or medical messages can also be pushed to the target user based on the determined action information/strategy; this application does not limit this.
The BERT+DQN scheme proposed in this application combines deep reinforcement learning with neural language pre-training, searching for the optimal user processing strategy while learning a better representation of the patient's state. The learned patient state representation contains more latent information and state transition information; moreover, since the state transition is also determined by the action actually taken, it may contain certain expert decision (action) information, making the determined user processing strategy more accurate, reasonable and safe. This combined framework integrates the pre-training step of DRL into the pre-training of the BERT model, reducing useless exploration by the DRL model, and is applicable to various fields.
In the embodiments of this application, the BERT model can learn state transition information, so that its output vector represents the meaning of the input more accurately; when the output of the BERT model is input into the DQN model for training, a more accurate Q value can be obtained, so that when a strategy is subsequently determined for a user, combining the BERT model with the DQN model yields the strategy corresponding to the current state, which improves the reliability of the determined user processing strategy.
Referring to FIG. 2, which is a schematic flowchart of another artificial-intelligence-based data processing method provided by an embodiment of this application. The method may be executed by the above data processing device and applied to user grouping scenarios. As shown in FIG. 2, the method may include the following steps:
201. Acquire sample data.
Optionally, for the description of step 201, reference may be made to the related description in the above embodiment; details are not repeated here.
202. Input the state information included in the sample data into the BERT model for encoding, and obtain the encoding vector corresponding to the CLS symbol.
After the sample data is acquired, it can be input into the BERT model for encoding; for example, all the sample data, or the state information in the sample data, is input into the BERT model to obtain an output including the encoding vector corresponding to the classifier-token CLS symbol, which can serve as the semantic representation of all the sample data input into the BERT model.
For example, the BERT model can encode the one-dimensional word vector corresponding to each feature in the sample data and output an encoded word vector for each feature, which serves as the textual semantic representation of that feature. It can be understood that the encoded word vectors of close or similar features are also close in the vector space: the higher the feature similarity, the closer their encoded word vectors are in the vector space.
To facilitate training of the BERT model, the one-dimensional word vector corresponding to one or more features of the sample data input into the BERT model can also be replaced by the MASK symbol, as illustrated in the figure, where the one-dimensional word vector of a certain feature is replaced by the MASK symbol. The position of the MASK symbol among the features input into the BERT model is obtained, and the encoded word vector corresponding to that position among the encoded word vectors output by the BERT model is further obtained; the feature of the MASK symbol input into the BERT model is then predicted from that encoded word vector — for example, by inputting the encoded word vector at the MASK position into a fully connected layer for prediction — and each model parameter of the BERT model is adjusted according to the predicted feature and the actual feature of the MASK symbol.
203. Obtain the one-hot encoding vector corresponding to the action information included in the sample data, and splice the encoding vector corresponding to the CLS symbol with the one-hot encoding vector of the action information, to obtain a spliced vector.
After the one-hot encoding vector corresponding to the action is obtained, the encoding vector corresponding to the CLS symbol can be spliced with it to obtain a spliced vector, so that the BERT model can be trained on the basis of the spliced vector.
In this application, to enable the BERT model to learn the state transition information, the one-hot encoding vector of the action taken can be obtained for each state in the sample data. Taking follow-up data of diabetic patients as the state for example, the action may be the prescription scheme adopted by the doctor, and the one-hot encoding vector of the doctor's prescription scheme can then be spliced with the encoding vector corresponding to the CLS symbol.
204. Input the spliced vector into a fully connected layer for prediction to obtain predicted next-state features, and adjust each model parameter of the BERT model according to the predicted and actual next-state features.
Still taking follow-up data of diabetic patients as the state, the spliced vector can be input into a fully connected layer for prediction to obtain the predicted state features of the next visit, and each model parameter of the BERT model adjusted according to the predicted and actual next-visit state features, so that the match between the predicted and actual next-visit state features is maximized or exceeds a threshold, thereby training the BERT model.
For example, in this application, the spliced vector can be input into the fully connected layer, and the next visit's state features predicted from the fully connected layer's output; each model parameter of the BERT model is adjusted according to the match between the predicted and actual next-visit state features, so that this match is maximized.
In this scenario, the state can be a multi-dimensional vector composed of demographic information, test and examination indicators and medication history; the action can be a one-hot encoding; and the reward can be preset, for example set manually. For example, reward = -sign(whether a complication occurred at the last visit) * 5 + sign(whether HbA1c is on target at the next visit) * 1.
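The example reward rule above can be transcribed directly, treating the two yes/no outcomes as 0/1 indicators (the text applies sign(...) to a boolean, which is 0 or 1):

```python
# Direct transcription of the example preset reward: penalize a complication
# at the last visit by 5, reward on-target HbA1c at the next visit by 1.

def reward(complication_at_last_visit, hba1c_on_target_next_visit):
    return (-5 * int(complication_at_last_visit)
            + 1 * int(hba1c_on_target_next_visit))

print(reward(False, True), reward(True, False), reward(True, True))
```

Being preset, this rule is fixed before training and is not itself learned.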
Steps 201-204 above complete the pre-training of the BERT model; step 205 below trains the DQN model and adjusts the BERT model.
205. Input the encoding vector corresponding to the CLS symbol into the DQN model, train the DQN model, and adjust each model parameter of the BERT model, to obtain the trained target BERT model and target DQN model.
Further, the encoding vector corresponding to the CLS symbol can be input into the DQN model, and iterative reinforcement-learning training of the DQN model used to fine-tune the BERT model and fit the optimal policy.
For example, as shown in FIG. 3a, BERT can be pre-trained on the patient's state: the state of the patient's current visit is input as tokens such as gender, HbA1c information, BMI information, blood pressure information, whether biguanides have been used, age information, and so on. Part of the tokens can be masked, the masked tokens predicted from the outputs at the MASK positions, and the tokens that will appear in the next state predicted from the CLS output, to train BERT. The CLS output of the BERT model can then be connected to the input of the DQN model, and iterative reinforcement-learning training of the DQN model used to fine-tune the BERT model while fitting the optimal policy. Optionally, the DQN may include one or more fully connected layers.
As another example, as shown in FIG. 3b, BERT can be pre-trained on the patient's state: the current visit's state is input, part of the tokens are masked and predicted from the MASK-position outputs, and the next state's tokens predicted from the CLS output, to train BERT. After the BERT output, such as the encoding vector corresponding to the CLS symbol, and the one-hot encoding vector corresponding to the action are obtained, the two can be spliced; the spliced vector is input into a fully connected layer for prediction to obtain predicted feature tokens, and the BERT model is adjusted according to the predicted and actual features. In addition, the CLS output of the BERT model can be connected to the input of the DQN model, and iterative reinforcement-learning training of the DQN model used to fine-tune the BERT model while fitting the optimal policy.
206. Acquire current state information of the target user.
207. Input the current state information into the target BERT model for encoding, and obtain the encoding vector corresponding to the CLS symbol output by the target BERT model.
208. Input the encoding vector corresponding to the CLS symbol into the DQN model, to obtain the action information corresponding to the current state information. The action information can indicate the processing strategy to be adopted for the target user in the current state.
After the BERT model and the DQN model are trained, when they are used to determine the action to take under the current state features, the one-dimensional word vectors corresponding to the features of the current state can be obtained and input into the BERT model for encoding; the encoding vector corresponding to the CLS symbol output by the BERT model serves as the semantic representation of all the feature data input into it. The encoding vector corresponding to the CLS symbol can then be input into the DQN model to obtain the Q values of the various actions, and the action with the maximum Q value is determined as the action to take in the current state. For example, taking prescribing for diabetic patients as an example, the prescription scheme corresponding to the maximum Q value is determined as the patient's prescription scheme.
209. Determine the user group to which the target user belongs according to the action information corresponding to the current state information.
At present, there are many scenarios that require grouping users. User grouping forms groups of users according to certain conditions (attributes); after grouping, various analyses and operations can be performed on the users of a group, such as pushing messages to the same user group, analyzing the characteristics of the best-conditioned group, or providing the same solution for users of the same group. If deep reinforcement learning alone is applied to patient grouping, the patient state representation is often too simple and the grouping results may be inaccurate. To give more accurate grouping results, the state needs a more accurate representation incorporating more information. Meanwhile, since patient grouping results need to remain consistent with doctors' grouping results and should not deviate too far from them, reliability must be ensured. This application combines the BERT pre-trained model with reinforcement learning in a framework for patient grouping, letting the BERT model and the deep reinforcement learning method play to their respective strengths; combining them yields more accurate and more reasonable grouping results, with additional pre-training on doctors' grouping habits. Because some reinforcement learning models ignore the modeling of state transitions and thus lose state transition information to a certain extent, pre-training with the BERT model allows state transition information to be incorporated into patient grouping, improving accuracy.
In the embodiments of this application, after the BERT+DQN model is trained, the feature information of the target user's state can be input and the decision corresponding to the maximum Q value output, performing patient grouping. That is, this application can train a BERT+DQN model for user grouping; for example, grouping is performed according to the output action information, where the action information corresponding to users belonging to the same group matches. Matching action information may mean that the adopted actions/strategies are the same or similar, for example falling in the same interval. For example, if prescriptions correspond to grouping schemes, different prescriptions are different grouping schemes, and an expert's prescription is the expert's grouping scheme.
Optionally, after user grouping is completed, for example for a target user population, message pushing, user feature analysis and the like can also be performed based on the grouping result.
For example, in some embodiments, taking diabetes as an example, the determined action information may be a treatment plan for diabetes. After the user groups are divided based on the action information, the data processing device can also push targeted messages to each group and send follow-up reminders according to the division, to facilitate management and maintenance, enhance user experience, and ensure treatment effect.
In this scenario, the state can be a multi-dimensional vector composed of demographic information, test and examination indicators and medication history; the action can be the one-hot encoding of a grouping scheme; and the reward can be preset.
In this application, the output vector of the BERT model can be spliced with the one-hot encoding vector corresponding to the action vector, so that the next state's features can be predicted through a fully connected layer and matched against the actual next-state features to adjust the BERT model's parameters. This lets the BERT model learn the state transition information and lets its output vector represent the meaning of the input more accurately; finally, when the output vector of the BERT model is input into the DQN model for training, a more accurate Q value can be obtained. The BERT+DQN method proposed in this scheme combines deep reinforcement learning with neural language pre-training, searching for the optimal patient grouping decision while learning a better representation of patient follow-ups; the learned patient follow-up representation contains more latent information and state transition information. Moreover, since the state transition is also determined by the action actually taken, the state representation also contains certain expert decision (action) information, so the recommended patient grouping decision is more accurate, reasonable and safe.
This combined framework integrates the pre-training step of DRL for patient grouping into the pre-training of the BERT model, reducing useless exploration by the DRL model. It can be understood that this framework can be used not only in the field of patient grouping but also in other fields such as offline reinforcement learning.
It can be understood that the above method embodiments are all illustrations of the artificial-intelligence-based data processing method of this application; the description of each embodiment has its own emphasis, and for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
An embodiment of this application further provides a data processing apparatus. The apparatus may include modules for performing the method described in FIG. 1 or FIG. 2 above. Referring to FIG. 4, which is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application. The data processing apparatus described in this embodiment may be configured in a data processing device. As shown in FIG. 4, the data processing apparatus 400 of this embodiment may include an acquisition module 401, a training module 402 and a processing module 403. Specifically:
the acquisition module 401 is configured to acquire sample data;
the training module 402 is configured to use a BERT model to obtain state transition information corresponding to the sample data, train the BERT model according to the state transition information, and train a DQN model connected to the BERT model according to the output of the BERT model, to obtain a trained target BERT model and a trained target DQN model;
the acquisition module 401 is further configured to acquire current state information of a target user;
the processing module 403 is configured to input the current state information into the target BERT model for encoding, to obtain the encoding vector corresponding to the classifier-token CLS symbol output by the target BERT model;
the processing module 403 is further configured to input the encoding vector corresponding to the CLS symbol into the target DQN model, to obtain action information corresponding to the current state information, the action information indicating the processing strategy to be adopted for the target user in the current state.
In a possible implementation, the sample data includes state information and action information; the training module 402 using the BERT model to obtain the state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training the DQN model connected to the BERT model according to the output of the BERT model to obtain the trained target BERT model and target DQN model includes:
inputting the state information included in the sample data into the BERT model for encoding, to obtain the encoding vector corresponding to the CLS symbol;
obtaining the one-hot encoding vector corresponding to the action information included in the sample data;
splicing the encoding vector corresponding to the CLS symbol with the one-hot encoding vector corresponding to the action information, to obtain a spliced vector;
determining the state transition information corresponding to the sample data according to the spliced vector, and adjusting each model parameter of the BERT model according to the state transition information;
inputting the encoding vector corresponding to the CLS symbol into the DQN model, training the DQN model, and adjusting each model parameter of the BERT model, to obtain the trained target BERT model and target DQN model.
In a possible implementation, the training module 402 determining the state transition information corresponding to the sample data according to the spliced vector and adjusting each model parameter of the BERT model according to the state transition information includes:
inputting the spliced vector into a fully connected layer for prediction, to obtain predicted next-state features;
adjusting each model parameter of the BERT model according to the predicted next-state features and the actual next-state features.
In a possible implementation, the sample data includes state information, and the state information includes multiple features;
the training module 402 is further configured to: replace the one-dimensional word vector corresponding to one or more features in the state information with the MASK symbol; obtain the position of the MASK symbol among the features input into the BERT model, and obtain the encoded word vector corresponding to that position among the encoded word vectors output by the BERT model; predict the feature of the MASK symbol input into the BERT model according to the encoded word vector corresponding to the MASK position, to obtain the predicted feature of the MASK symbol; and adjust each model parameter of the BERT model according to the predicted feature and the actual feature of the MASK symbol.
In a possible implementation, the training module 402 predicting the feature of the MASK symbol input into the BERT model according to the encoded word vector corresponding to the position of the MASK symbol includes:
inputting the encoded word vector corresponding to the position of the MASK symbol into a fully connected layer for prediction, to obtain the feature of the MASK symbol input into the BERT model.
In a possible implementation, the acquisition module 401 acquiring sample data includes:
obtaining feature data of multiple users, the feature data including continuous feature data;
discretizing the continuous feature data to obtain discretized feature data, and using the discretized feature data as the sample data.
In a possible implementation, the processing module 403 is further configured to determine, according to the action information corresponding to the current state information, the user group to which the target user belongs;
where the action information corresponding to users belonging to the same user group matches.
It can be understood that the functional modules of the data processing apparatus in this embodiment can be specifically implemented according to the method of FIG. 1 or FIG. 2 of the above method embodiments; for the specific implementation process, reference may be made to the related descriptions of FIG. 1 or FIG. 2 of the above method embodiments, which are not repeated here.
Referring to FIG. 5, which is a schematic structural diagram of a data processing device provided by an embodiment of this application. As shown in FIG. 5, the data processing device may include a processor 501 and a memory 502, which may be connected to each other. Optionally, the data processing device may further include a communication interface 503. The processor 501, memory 502 and communication interface 503 may be connected through a bus or in other ways; this application does not limit this. The memory 502 may be used to store program instructions, and the processor 501 may invoke the program instructions to perform some or all of the steps of the above embodiments, such as some or all of the steps performed by the data processing device in the above embodiments. The communication interface 503 may be controlled by the processor to send and receive messages. For example, the memory 502 may be used to store a computer program including program instructions, and the processor 501 is used to execute the program instructions stored in the memory 502. The processor 501 is configured to invoke the program instructions to perform the following steps:
acquiring sample data;
using a BERT model to obtain state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training a DQN model connected to the BERT model according to the output of the BERT model, to obtain a trained target BERT model and a trained target DQN model;
acquiring current state information of a target user;
inputting the current state information into the target BERT model for encoding, to obtain the encoding vector corresponding to the classifier-token CLS symbol output by the target BERT model;
inputting the encoding vector corresponding to the CLS symbol into the target DQN model, to obtain action information corresponding to the current state information, the action information indicating the processing strategy to be adopted for the target user in the current state.
In a possible implementation, the sample data includes state information and action information; performing the obtaining of state transition information corresponding to the sample data by using the BERT model, the training of the BERT model according to the state transition information, and the training of the DQN model connected to the BERT model according to the output of the BERT model to obtain the trained target BERT model and target DQN model includes:
inputting the state information included in the sample data into the BERT model for encoding, to obtain the encoding vector corresponding to the CLS symbol;
obtaining the one-hot encoding vector corresponding to the action information included in the sample data;
splicing the encoding vector corresponding to the CLS symbol with the one-hot encoding vector corresponding to the action information, to obtain a spliced vector;
determining the state transition information corresponding to the sample data according to the spliced vector, and adjusting each model parameter of the BERT model according to the state transition information;
inputting the encoding vector corresponding to the CLS symbol into the DQN model, training the DQN model, and adjusting each model parameter of the BERT model, to obtain the trained target BERT model and target DQN model.
In a possible implementation, performing the determination of the state transition information corresponding to the sample data according to the spliced vector, and the adjustment of each model parameter of the BERT model according to the state transition information, includes:
inputting the spliced vector into a fully connected layer for prediction, to obtain predicted next-state features;
adjusting each model parameter of the BERT model according to the predicted next-state features and the actual next-state features.
In a possible implementation, the sample data includes state information, and the state information includes multiple features; the processor 501 is further configured to perform:
replacing the one-dimensional word vector corresponding to one or more features in the state information with the MASK symbol;
obtaining the position of the MASK symbol among the features input into the BERT model, and obtaining the encoded word vector corresponding to that position among the encoded word vectors output by the BERT model;
predicting the feature of the MASK symbol input into the BERT model according to the encoded word vector corresponding to the position of the MASK symbol, to obtain the predicted feature of the MASK symbol;
adjusting each model parameter of the BERT model according to the predicted feature and the actual feature of the MASK symbol.
In a possible implementation, performing the prediction of the feature of the MASK symbol input into the BERT model according to the encoded word vector corresponding to the position of the MASK symbol includes:
inputting the encoded word vector corresponding to the position of the MASK symbol into a fully connected layer for prediction, to obtain the feature of the MASK symbol input into the BERT model.
In a possible implementation, performing the acquiring of sample data includes:
obtaining feature data of multiple users, the feature data including continuous feature data;
discretizing the continuous feature data to obtain discretized feature data, and using the discretized feature data as the sample data.
In a possible implementation, after the action information corresponding to the current state information is obtained, the processor 501 is further configured to perform:
determining, according to the action information corresponding to the current state information, the user group to which the target user belongs;
where the action information corresponding to users belonging to the same user group matches.
It should be understood that in the embodiments of this application, the processor 501 may be a central processing unit (Central Processing Unit, CPU); the processor 501 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 502 may include read-only memory and random access memory, and provides instructions and data to the processor 501. A portion of the memory 502 may also include non-volatile random access memory. For example, the memory 502 may also store epidemic data of a target infectious disease.
The communication interface 503 may include an input device and/or an output device; for example, the input device may be a control panel, a microphone, a receiver, etc., and the output device may be a display screen, a transmitter, etc., which are not listed here one by one.
In specific implementations, the processor 501 and the memory 502 (which may also include the communication interface 503) described in the embodiments of this application may perform the implementations described in the method embodiments of FIG. 1 or FIG. 2 provided in the embodiments of this application, and may also perform the implementation of the data processing apparatus described in the embodiments of this application; details are not repeated here.
An embodiment of this application further provides a computer-readable storage medium storing a computer program; the computer program includes program instructions which, when executed by a processor, can perform some or all of the steps performed in the above data processing method embodiments, such as some or all of the steps performed by the data processing device.
Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile; this application does not limit this.
An embodiment of this application further provides a computer program product including computer program code which, when run on a computer, causes the computer to perform the steps performed in the above data processing apparatus method embodiments.
In some embodiments, the computer-readable storage medium may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required for at least one function, and the like, and the storage data area may store data created according to the use of the blockchain nodes, such as sample data, action information, strategies, etc.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of its information (tamper-proofing) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer and an application service layer.
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
What is disclosed above is only a preferred embodiment of this application and certainly cannot be used to limit its scope of rights. A person of ordinary skill in the art can understand all or part of the processes for implementing the above embodiments, and equivalent changes made according to the claims of this application still fall within the scope covered by the invention.

Claims (20)

  1. 一种基于人工智能的数据处理方法,其中,包括:
    获取样本数据;
    利用BERT模型获取所述样本数据对应的状态转移信息,并根据所述状态转移信息对所述BERT模型进行训练,以及根据所述BERT模型的输出结果对与所述BERT模型连接的DQN模型进行训练,得到训练后的目标BERT模型和目标DQN模型;
    获取目标用户的当前状态信息;
    将所述当前状态信息输入所述目标BERT模型进行编码处理,获得所述目标BERT模型输出的分类器标记CLS符号所对应的编码向量;
    将所述CLS符号所对应的编码向量输入所述目标DQN模型,获得所述当前状态信息对应的动作信息,所述动作信息用于指示所述当前状态下对所述目标用户所采取的处理策略。
  2. 根据权利要求1所述的方法,其中,所述样本数据包括状态信息和动作信息;所述利用BERT模型获取所述样本数据对应的状态转移信息,并根据所述状态转移信息对所述BERT模型进行训练,以及根据所述BERT模型的输出结果对与所述BERT模型连接的DQN模型进行训练,得到训练后的目标BERT模型和目标DQN模型,包括:
    将所述样本数据包括的状态信息输入BERT模型进行编码处理,获得CLS符号对应的编码向量;
    获取所述样本数据包括的动作信息对应的独热编码向量;
    将所述CLS符号对应的编码向量与所述动作信息对应的独热编码向量进行向量拼接,获得拼接后的向量;
    根据拼接后的向量确定所述样本数据对应的状态转移信息,并根据所述状态转移信息对所述BERT模型的各个模型参数进行调整;
    将所述CLS符号对应的编码向量输入DQN模型,对所述DQN模型进行训练,并对所述BERT模型的各个模型参数进行调整,得到训练后的目标BERT模型和目标DQN模型。
  3. 根据权利要求2所述的方法,其中,所述根据拼接后的向量确定所述样本数据对应的状态转移信息,并根据所述状态转移信息对所述BERT模型的各个模型参数进行调整,包括:
    将拼接后的向量输入全连接层进行预测,获得预测的下一次状态特征;
    根据所述预测的下一次状态特征与实际的下一次状态特征,对BERT模型的各个模型参数进行调整。
  4. 根据权利要求1所述的方法,其中,所述样本数据包括状态信息,所述状态信息包括多个特征;所述方法还包括:
    通过MASK符号代替所述状态信息中的一个或多个特征对应的一维词向量;
    获取所述MASK符号在输入所述BERT模型的各个特征中所在位置,并获取所述BERT模型输出的各个编码词向量中所述MASK符号所在位置对应的编码词向量;
    根据所述MASK符号所在位置对应的编码词向量预测输入所述BERT模型的所述MASK符号的特征,得到所述MASK符号的预测特征;
    根据所述预测特征和所述MASK符号的实际特征,对所述BERT模型的各个模型参数进行调整。
  5. The method according to claim 4, wherein the predicting, according to the encoded word vector corresponding to the position of the MASK symbol, the feature of the MASK symbol input into the BERT model comprises:
    inputting the encoded word vector corresponding to the position of the MASK symbol into a fully connected layer for prediction, to obtain the feature of the MASK symbol input into the BERT model.
  6. The method according to claim 1, wherein the acquiring sample data comprises:
    acquiring feature data of a plurality of users, the feature data comprising continuous feature data;
    discretizing the continuous feature data to obtain discretized feature data, and using the discretized feature data as the sample data.
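Discretizing a continuous feature into bins can be sketched as below. The bin edges are toy values assumed for illustration; the claim does not prescribe a particular binning scheme.

```python
def discretize(value, bin_edges):
    """Map a continuous feature value to a discrete bin index.
    bin_edges must be sorted ascending; values below the first edge
    fall in bin 0, values at or above the last edge in the last bin."""
    for i, edge in enumerate(bin_edges):
        if value < edge:
            return i
    return len(bin_edges)

# e.g. discretizing a continuous blood-pressure-like reading (toy edges)
edges = [90.0, 120.0, 140.0]
print([discretize(v, edges) for v in [85.0, 118.0, 139.9, 150.0]])
# [0, 1, 2, 3]
```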
  7. The method according to any one of claims 1-6, wherein after the obtaining action information corresponding to the current state information, the method further comprises:
    determining, according to the action information corresponding to the current state information, the user group to which the target user belongs;
    wherein the action information corresponding to users belonging to the same user group matches.
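The grouping recited above — users whose action information matches end up in the same user group — can be sketched as follows; user IDs and policy names are hypothetical.

```python
from collections import defaultdict

def group_users_by_action(user_actions):
    """Assign users to groups so that users with matching action
    information end up in the same group (one group per action)."""
    groups = defaultdict(list)
    for user, action in user_actions.items():
        groups[action].append(user)
    return dict(groups)

groups = group_users_by_action({"u1": "planA", "u2": "planB", "u3": "planA"})
print(groups)  # {'planA': ['u1', 'u3'], 'planB': ['u2']}
```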
  8. A data processing apparatus, comprising:
    an acquisition module, configured to acquire sample data;
    a training module, configured to acquire, by using a BERT model, state transition information corresponding to the sample data, train the BERT model according to the state transition information, and train a DQN model connected to the BERT model according to an output result of the BERT model, to obtain a trained target BERT model and a trained target DQN model;
    the acquisition module being further configured to acquire current state information of a target user;
    a processing module, configured to input the current state information into the target BERT model for encoding, and obtain an encoding vector, output by the target BERT model, corresponding to a classifier-token CLS symbol;
    the processing module being further configured to input the encoding vector corresponding to the CLS symbol into the target DQN model and obtain action information corresponding to the current state information, the action information indicating a processing policy to be taken for the target user in the current state.
  9. A data processing device, comprising a processor and a memory, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform an artificial intelligence-based data processing method, the method comprising:
    acquiring sample data;
    acquiring, by using a BERT model, state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training a DQN model connected to the BERT model according to an output result of the BERT model, to obtain a trained target BERT model and a trained target DQN model;
    acquiring current state information of a target user;
    inputting the current state information into the target BERT model for encoding, and obtaining an encoding vector, output by the target BERT model, corresponding to a classifier-token CLS symbol;
    inputting the encoding vector corresponding to the CLS symbol into the target DQN model, and obtaining action information corresponding to the current state information, the action information indicating a processing policy to be taken for the target user in the current state.
  10. The data processing device according to claim 9, wherein the sample data comprise state information and action information, and the performing of the acquiring, by using a BERT model, state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training a DQN model connected to the BERT model according to an output result of the BERT model, to obtain a trained target BERT model and a trained target DQN model comprises:
    inputting the state information comprised in the sample data into the BERT model for encoding, and obtaining an encoding vector corresponding to the CLS symbol;
    acquiring a one-hot encoding vector corresponding to the action information comprised in the sample data;
    concatenating the encoding vector corresponding to the CLS symbol with the one-hot encoding vector corresponding to the action information, to obtain a concatenated vector;
    determining the state transition information corresponding to the sample data according to the concatenated vector, and adjusting the model parameters of the BERT model according to the state transition information;
    inputting the encoding vector corresponding to the CLS symbol into the DQN model, training the DQN model, and adjusting the model parameters of the BERT model, to obtain the trained target BERT model and the trained target DQN model.
  11. The data processing device according to claim 10, wherein the performing of the determining the state transition information corresponding to the sample data according to the concatenated vector, and adjusting the model parameters of the BERT model according to the state transition information comprises:
    inputting the concatenated vector into a fully connected layer for prediction, to obtain predicted next-state features;
    adjusting the model parameters of the BERT model according to the predicted next-state features and the actual next-state features.
  12. The data processing device according to claim 9, wherein the sample data comprise state information, the state information comprises a plurality of features, and when the processor performs the artificial intelligence-based data processing method, the method further comprises:
    replacing a one-dimensional word vector corresponding to one or more features in the state information with a MASK symbol;
    acquiring the position of the MASK symbol among the features input into the BERT model, and acquiring, from the encoded word vectors output by the BERT model, the encoded word vector corresponding to the position of the MASK symbol;
    predicting, according to the encoded word vector corresponding to the position of the MASK symbol, the feature of the MASK symbol input into the BERT model, to obtain a predicted feature of the MASK symbol;
    adjusting the model parameters of the BERT model according to the predicted feature and the actual feature of the MASK symbol.
  13. The data processing device according to claim 12, wherein the performing of the predicting, according to the encoded word vector corresponding to the position of the MASK symbol, the feature of the MASK symbol input into the BERT model comprises:
    inputting the encoded word vector corresponding to the position of the MASK symbol into a fully connected layer for prediction, to obtain the feature of the MASK symbol input into the BERT model.
  14. The data processing device according to any one of claims 9-13, wherein after the obtaining action information corresponding to the current state information, when the processor performs the artificial intelligence-based data processing method, the method further comprises:
    determining, according to the action information corresponding to the current state information, the user group to which the target user belongs;
    wherein the action information corresponding to users belonging to the same user group matches.
  15. A computer-readable storage medium storing a computer program, the computer program comprising program instructions which, when executed by a processor, cause the processor to perform an artificial intelligence-based data processing method, the method comprising:
    acquiring sample data;
    acquiring, by using a BERT model, state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training a DQN model connected to the BERT model according to an output result of the BERT model, to obtain a trained target BERT model and a trained target DQN model;
    acquiring current state information of a target user;
    inputting the current state information into the target BERT model for encoding, and obtaining an encoding vector, output by the target BERT model, corresponding to a classifier-token CLS symbol;
    inputting the encoding vector corresponding to the CLS symbol into the target DQN model, and obtaining action information corresponding to the current state information, the action information indicating a processing policy to be taken for the target user in the current state.
  16. The computer-readable storage medium according to claim 15, wherein the sample data comprise state information and action information, and the performing of the acquiring, by using a BERT model, state transition information corresponding to the sample data, training the BERT model according to the state transition information, and training a DQN model connected to the BERT model according to an output result of the BERT model, to obtain a trained target BERT model and a trained target DQN model comprises:
    inputting the state information comprised in the sample data into the BERT model for encoding, and obtaining an encoding vector corresponding to the CLS symbol;
    acquiring a one-hot encoding vector corresponding to the action information comprised in the sample data;
    concatenating the encoding vector corresponding to the CLS symbol with the one-hot encoding vector corresponding to the action information, to obtain a concatenated vector;
    determining the state transition information corresponding to the sample data according to the concatenated vector, and adjusting the model parameters of the BERT model according to the state transition information;
    inputting the encoding vector corresponding to the CLS symbol into the DQN model, training the DQN model, and adjusting the model parameters of the BERT model, to obtain the trained target BERT model and the trained target DQN model.
  17. The computer-readable storage medium according to claim 16, wherein the performing of the determining the state transition information corresponding to the sample data according to the concatenated vector, and adjusting the model parameters of the BERT model according to the state transition information comprises:
    inputting the concatenated vector into a fully connected layer for prediction, to obtain predicted next-state features;
    adjusting the model parameters of the BERT model according to the predicted next-state features and the actual next-state features.
  18. The computer-readable storage medium according to claim 15, wherein the sample data comprise state information, the state information comprises a plurality of features, and when the processor performs the artificial intelligence-based data processing method, the method further comprises:
    replacing a one-dimensional word vector corresponding to one or more features in the state information with a MASK symbol;
    acquiring the position of the MASK symbol among the features input into the BERT model, and acquiring, from the encoded word vectors output by the BERT model, the encoded word vector corresponding to the position of the MASK symbol;
    predicting, according to the encoded word vector corresponding to the position of the MASK symbol, the feature of the MASK symbol input into the BERT model, to obtain a predicted feature of the MASK symbol;
    adjusting the model parameters of the BERT model according to the predicted feature and the actual feature of the MASK symbol.
  19. The computer-readable storage medium according to claim 18, wherein the performing of the predicting, according to the encoded word vector corresponding to the position of the MASK symbol, the feature of the MASK symbol input into the BERT model comprises:
    inputting the encoded word vector corresponding to the position of the MASK symbol into a fully connected layer for prediction, to obtain the feature of the MASK symbol input into the BERT model.
  20. The computer-readable storage medium according to any one of claims 15-19, wherein after the obtaining action information corresponding to the current state information, when the processor performs the artificial intelligence-based data processing method, the method further comprises:
    determining, according to the action information corresponding to the current state information, the user group to which the target user belongs;
    wherein the action information corresponding to users belonging to the same user group matches.
PCT/CN2021/096388 2021-04-29 2021-05-27 Artificial intelligence-based data processing method, apparatus, device and medium WO2022227164A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110477679.3 2021-04-29
CN202110477679.3A CN113076745A (zh) 2021-04-29 2021-04-29 Artificial intelligence-based data processing method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
WO2022227164A1 true WO2022227164A1 (zh) 2022-11-03

Family

ID=76616074

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096388 WO2022227164A1 (zh) 2021-04-29 2021-05-27 Artificial intelligence-based data processing method, apparatus, device and medium

Country Status (2)

Country Link
CN (1) CN113076745A (zh)
WO (1) WO2022227164A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200143206A1 (en) * 2018-11-05 2020-05-07 Royal Bank Of Canada System and method for deep reinforcement learning
CN111144119A (zh) * 2019-12-27 2020-05-12 北京联合大学 一种改进知识迁移的实体识别方法
CN111950296A (zh) * 2020-08-21 2020-11-17 桂林电子科技大学 一种基于bert微调模型的评论目标情感分析
CN112052320A (zh) * 2020-09-01 2020-12-08 腾讯科技(深圳)有限公司 一种信息处理方法、装置及计算机可读存储介质


Also Published As

Publication number Publication date
CN113076745A (zh) 2021-07-06

Similar Documents

Publication Publication Date Title
US11810671B2 (en) System and method for providing health information
JP7100087B2 (ja) 情報を出力する方法および装置
CN111753543B (zh) 药物推荐方法、装置、电子设备及存储介质
US20200050949A1 (en) Digital assistant platform
US20180218127A1 (en) Generating a Knowledge Graph for Determining Patient Symptoms and Medical Recommendations Based on Medical Information
CN109637669B (zh) 基于深度学习的治疗方案的生成方法、装置及存储介质
US10984024B2 (en) Automatic processing of ambiguously labeled data
US11120913B2 (en) Evaluating drug-adverse event causality based on an integration of heterogeneous drug safety causality models
US11847411B2 (en) Obtaining supported decision trees from text for medical health applications
WO2020144645A1 (en) Document improvement prioritization using automated generated codes
CN113707299A (zh) 基于问诊会话的辅助诊断方法、装置及计算机设备
CN115858886B (zh) 数据处理方法、装置、设备及可读存储介质
US11532387B2 (en) Identifying information in plain text narratives EMRs
US20220374709A1 (en) System and/or method for machine learning using binary poly loss function
CN113724830B (zh) 基于人工智能的用药风险检测方法及相关设备
WO2021139223A1 (zh) 分群模型的解释方法、装置、计算机设备和存储介质
CN113657086A (zh) 词语处理方法、装置、设备及存储介质
US10957432B2 (en) Human resource selection based on readability of unstructured text within an individual case safety report (ICSR) and confidence of the ICSR
CN116992879A (zh) 基于人工智能的实体识别方法、装置、设备及介质
WO2022227164A1 (zh) 基于人工智能的数据处理方法、装置、设备和介质
US20220374993A1 (en) System and/or method for machine learning using discriminator loss component-based loss function
WO2023084254A1 (en) Diagnosic method and system
CN114898184A (zh) 模型训练方法、数据处理方法、装置及电子设备
CN114822741A (zh) 患者分类模型的处理装置、计算机设备及存储介质
Fierro et al. Predicting unplanned readmissions with highly unstructured data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21938634

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21938634

Country of ref document: EP

Kind code of ref document: A1