CN116662904A - Method, device, computer equipment and medium for detecting variation of data type

Info

Publication number: CN116662904A
Application number: CN202310713654.8A
Authority: CN (China)
Legal status: Pending
Prior art keywords: data, detection data, information entropy, type, prediction
Other languages: Chinese (zh)
Inventors: 瞿晓阳 (Qu Xiaoyang), 王健宗 (Wang Jianzong), 刘承昊 (Liu Chenghao)
Current Assignee: Ping An Technology Shenzhen Co Ltd
Original Assignee: Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd; priority to CN202310713654.8A; publication of CN116662904A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The present invention relates to the field of digital medical technology, and in particular to a method, an apparatus, a computer device, and a medium for detecting variation of a data type. The method includes: sending N pieces of first detection data and their corresponding data types to a server; receiving an updated classification model sent by the server; inputting second detection data into the updated classification model and outputting the prediction type and its classification probability; calculating a first information entropy over all data types and a second information entropy over all prediction types; when the difference between the second information entropy and the first information entropy is greater than an information entropy threshold, calculating the variance of all classification probabilities; and generating early warning information if the variance is greater than a variance threshold. Because the server updates the model with data from multiple clients, the classification accuracy of the updated classification model is higher, and varied data types can be discovered and warned about in time. This improves the accuracy and efficiency of data type variation detection, allows variant virus-carrying information to be detected in time in medical scenarios, and improves the reliability of the digital medical platform.

Description

Method, device, computer equipment and medium for detecting variation of data type
Technical Field
The present invention relates to the field of digital medical technology, and in particular, to a method, an apparatus, a computer device, and a medium for detecting a variation of a data type.
Background
With the development of artificial intelligence technology, data type recognition tasks based on artificial intelligence models are widely applied in digital medical platforms. A digital medical platform can support functions such as disease-assisted diagnosis, health management and remote consultation, which improves the efficiency of medical institutions and makes it more convenient for residents to seek medical care.
In a digital medical platform, the data type recognition task can be applied to scenarios such as the classification of virus-carrying information, so that such information is classified accurately, for example into strains of the novel coronavirus. At present, however, an intelligent model for data type recognition can only predict the most probable data category for its input, and if the input data has varied, detection still has to be performed through manual observation and analysis.
However, manual observation and analysis consume considerable human resources, detection efficiency is low, and it is difficult to generate detection information in time to give an early warning. Moreover, in scenarios with high privacy requirements, the intelligent model cannot obtain sufficiently complete training data, so the accuracy of data type recognition based on the intelligent model is low, variation detection lacks enough credible reference data, and the efficiency and accuracy of data type variation detection are therefore low. How to improve the accuracy and efficiency of data type variation detection is thus a problem to be solved.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a method, an apparatus, a computer device, and a medium for detecting a variation of a data type, so as to solve the problem that the efficiency and accuracy of detecting the variation of the data type are low.
In a first aspect, an embodiment of the present invention provides a method for detecting a variation of a data type, where the method includes:
N pieces of first detection data and corresponding data types thereof acquired in a first target time period are sent to a server containing a basic classification model, wherein N is an integer greater than zero;
receiving an updated classification model sent by the server, wherein the updated classification model is obtained by updating the basic classification model by the server according to all first detection data and corresponding data types provided by each client;
respectively inputting at least two second detection data acquired in a second target time period into the updated classification model to perform class prediction, and outputting a prediction type and classification probability of the corresponding second detection data;
calculating information entropy of all data types to obtain a first information entropy, and calculating information entropy of all prediction types to obtain a second information entropy;
and when the difference between the second information entropy and the first information entropy is greater than a preset information entropy threshold, calculating the variance of all classification probabilities, and if the variance is greater than a preset variance threshold, generating early warning information for expressing that the type of the second detection data has varied.
In a second aspect, an embodiment of the present invention provides a variation detecting apparatus for a data type, including:
the data transmission module is used for transmitting N pieces of first detection data acquired in a first target time period and corresponding data types thereof to a server containing a basic classification model, wherein N is an integer greater than zero;
the model receiving module is used for receiving an updated classification model sent by the server, wherein the updated classification model is obtained by updating the basic classification model by the server according to all first detection data and corresponding data types provided by each client;
the class prediction module is used for respectively inputting at least two second detection data acquired in a second target time period into the updated classification model to perform class prediction and outputting a prediction type and classification probability of the corresponding second detection data;
The information entropy calculation module is used for calculating the information entropy of all data types to obtain a first information entropy, and calculating the information entropy of all prediction types to obtain a second information entropy;
the variation early warning module is used for calculating variances of all classification probabilities when the difference value between the second information entropy and the first information entropy is larger than a preset information entropy threshold value, and generating early warning information if the variances are larger than the preset variance threshold value, wherein the early warning information is used for expressing that the type of the second detection data is varied.
In a third aspect, an embodiment of the present invention provides a computer device, where the computer device includes a processor, a memory, and a computer program stored in the memory and executable on the processor, and the processor implements the variation detection method according to the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the variation detection method according to the first aspect.
Compared with the prior art, the embodiment of the invention has the beneficial effects that:
N pieces of first detection data acquired in a first target time period and their corresponding data types are sent to a server containing a basic classification model; an updated classification model sent by the server is received, the updated classification model being obtained by the server updating the basic classification model according to all first detection data and corresponding data types provided by each client; at least two pieces of second detection data acquired in a second target time period are respectively input into the updated classification model for class prediction, and the prediction type and classification probability of each piece of second detection data are output; the information entropy of all data types is calculated to obtain a first information entropy, and the information entropy of all prediction types is calculated to obtain a second information entropy; when the difference between the second information entropy and the first information entropy is greater than a preset information entropy threshold, the variance of all classification probabilities is calculated, and early warning information is generated when the variance is greater than a preset variance threshold. Because the server updates the model by combining the data of multiple clients, the classification accuracy of the updated classification model is higher; by comparing the information entropies and the classification probabilities, abnormal conditions in the client's local detection data can be determined, so that varied data types are discovered and warned about in time. This improves the accuracy and efficiency of data type variation detection, allows variant virus-carrying information to be detected in time in medical scenarios, and improves the reliability of the digital medical platform.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application environment of a method for detecting variation of data types according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for detecting variation of data types according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a variation detecting device for data types according to a second embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer device according to a third embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted as "when", "upon", "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be interpreted, depending on the context, to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
The embodiment of the invention can acquire and process the related data based on the artificial intelligence technology. Among these, artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
It should be understood that the sequence numbers of the steps in the following embodiments do not mean the order of execution, and the execution order of the processes should be determined by the functions and the internal logic, and should not be construed as limiting the implementation process of the embodiments of the present invention.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
The method for detecting the variation of the data type provided by the embodiment of the invention can be applied to an application environment as shown in fig. 1, wherein a client communicates with a server. The client includes, but is not limited to, a palm top computer, a desktop computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cloud terminal device, a personal digital assistant (personal digital assistant, PDA), and other computer devices. The server may be an independent server, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and basic cloud computing services such as big data and artificial intelligence platforms.
The client and the server can both be deployed in a digital medical platform. The digital medical platform can support functions such as disease-assisted diagnosis, health management and remote consultation, which improves the efficiency of medical institutions and makes it easier for residents to seek medical care. The client can run the data type variation detection task so that, in a medical scene, variation in data such as virus-carrying information or virus strain type is detected in time, improving the efficiency of data type detection.
Referring to fig. 2, a flow chart of a method for detecting variation of a data type according to an embodiment of the present invention is shown. The variation detection method can be applied to the client in fig. 1. The client belongs to a joint system, and the joint system may include a server and a plurality of clients. The computer device corresponding to the client communicates with the server so as to send the first detection data local to the client and the data types corresponding to the first detection data to the server, thereby providing training data for the server to train the basic classification model; the computer device corresponding to the client then receives the updated classification model sent by the server, the updated classification model being the basic classification model after being updated by the server. As shown in fig. 2, the variation detection method may include the following steps:
Step S201, the N first detection data and the corresponding data types acquired in the first target time period are sent to a server containing a basic classification model.
The first target time period may be used to determine the period during which data are collected, N is an integer greater than zero, the first detection data may be detection data to be used as training data, and the data type corresponding to the first detection data may be the data type to which the first detection data belongs. In this embodiment, a virus-carrying-information type is taken as an example; specifically, the strain type of the novel coronavirus is taken as the data type to which the first detection data belongs. The strain type may include strain types such as Alpha, Beta, Delta and Omicron, and accordingly the first detection data may be nucleic acid detection data of the novel coronavirus.
The server may refer to a central server in a joint system, where the central server communicates with a plurality of clients respectively, but the clients do not communicate with each other to ensure the privacy of local data, and the basic classification model may refer to a classification model to be updated, where the classification model to be updated may be a classification model already pre-trained.
Specifically, in this embodiment, the server should be a trusted server, that is, each client can trust that the server can protect the privacy of its local data.
Each client may use the same first target period of time when locally acquiring data, for example, may take days as a unit of data acquisition, and the first target period of time may be 0 to 24 hours per day.
In one embodiment, the first target time periods adopted by the respective clients may be different; for example, the first target time period of client A is 0 to 24 hours per day, while the first target time period of client B is Monday and Tuesday of each week. It should be noted that, whether or not the first target time periods are the same, each client should send the collected data to the server before the reception time limit set by the server; for example, the reception time limit may be set to Friday of each week.
Sending the N pieces of first detection data acquired in the first target time period and their corresponding data types to the server containing the basic classification model allows multiple clients to contribute their local data to the server, so that abundant data are provided for the server to train the basic classification model, the classification accuracy of the trained updated classification model is improved, and the privacy of the local data can be effectively protected.
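As an illustrative sketch of this step (not a prescribed implementation), a client might package its local batch and submit it to the server roughly as follows; the HTTP/JSON transport, the endpoint name and the payload field names are assumptions made only for illustration:

```python
import json
from urllib import request

def send_first_detection_data(server_url, detections, data_types):
    """detections: list of N feature representations; data_types: list of N labels."""
    assert len(detections) == len(data_types) and len(detections) > 0
    payload = json.dumps({
        "first_detection_data": detections,
        "data_types": data_types,
    }).encode("utf-8")
    req = request.Request(
        server_url + "/upload_training_data",  # hypothetical endpoint name
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.status == 200  # True if the server accepted the batch
```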
Step S202, receiving an updated classification model sent by a server.
The updating classification model is obtained by updating the basic classification model by the server according to all the first detection data and the corresponding data types provided by each client.
Specifically, the server takes each piece of received first detection data and its corresponding data type as a training sample and its label, combines all such training samples and labels, and trains the basic classification model deployed on the server; the loss function adopted for training may be a cross-entropy loss function.
In one embodiment, the server may use the classification model updated last time as the basic classification model, and at this time, a part of learned knowledge can be kept during each update, and knowledge of all sets of training samples and labels during the update is learned.
In one embodiment, a local classification model carried on the client may be trained by the client. The local classification model of the client may be a pre-trained classification model or the updated classification model sent by the server at the previous update. The client trains the local classification model using all of its local first detection data and corresponding data types as local training samples and labels, and the training loss function may still be a cross-entropy loss function. After the local classification model is trained, the client sends only the model parameters of the local classification model to the server; the server integrates the model parameters of the local classification models provided by the plurality of clients to obtain the model parameters of the updated classification model, and then distributes the model parameters of the updated classification model to each client. This further improves the privacy of the clients' local data: the specific content of the local data is known only to its own client and cannot be obtained by other clients or by the server, yet the updated classification model received by each client still learns from the local data of all the clients.
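A minimal sketch of this client-side option is given below, assuming a PyTorch-style model purely for illustration; the embodiment does not mandate any particular framework, and the function name and hyperparameters are assumptions:

```python
import torch
from torch import nn

def train_local_model(model, samples, labels, epochs=5, lr=1e-3):
    """samples: FloatTensor of shape [N, feature_dim]; labels: LongTensor of shape [N]."""
    criterion = nn.CrossEntropyLoss()                      # cross-entropy loss, as named above
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(samples), labels)           # forward pass on local first detection data
        loss.backward()
        optimizer.step()
    # Only model parameters leave the client; the raw detection data stays local.
    return {name: tensor.detach().cpu().clone() for name, tensor in model.state_dict().items()}
```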
Optionally, updating the basic classification model includes:
obtaining basic model parameters of a basic classification model;
updating the basic model parameters according to all the first detection data and the corresponding data types provided by each client to obtain updated model parameters;
and adding the basic model parameters and the updated model parameters, determining an addition result as a target model parameter, and configuring the basic classification model according to the target model parameter to obtain the updated classification model.
The basic model parameters may refer to model parameters of a basic classification model, updated model parameters may be used to characterize knowledge learned based on all first detection data provided by each client and corresponding data types thereof, and target model parameters may be used to configure the classification model to obtain an updated classification model.
Specifically, the calculation modes of the basic model parameters and the updated model parameters may further include mean calculation, weighted summation calculation, weighted mean calculation, and the like, so as to take the calculation result as the target model parameters.
After the server configures the basic classification model with the target model parameters, the target model parameters also need to be distributed to each client so that the client can configure its local classification model.
Optionally, updating the basic model parameters according to all the first detection data and the corresponding data types provided by each client, and obtaining updated model parameters includes:
counting the total quantity of all first detection data provided by all clients, and updating basic model parameters according to all the first detection data provided by the clients and the corresponding data types of the first detection data aiming at any client to obtain first sub-model parameters;
counting the first quantity of all the first detection data provided by the client, and comparing the first quantity with the total quantity to obtain the corresponding reference weight of the client;
multiplying the first sub-model parameter by the reference weight, and determining the multiplied result as a weighted sub-model parameter;
traversing all clients to obtain weighted sub-model parameters of the corresponding clients, adding all weighted sub-model parameters, and determining an addition result as updated model parameters.
The total number may be the sum of the numbers of all the first detection data received by the server in the update time period, and the update time period may be the time range between the last update time and the current update time.
The first sub-model parameters may refer to a result of updating the base model parameters based on data provided by the single client, the first quantity may refer to a sum of quantities of first detection data provided by the single client during an update period, the reference weights may be used to characterize a contribution degree of the single client to the data quantity used for updating, and the weighted sub-model parameters may be used to characterize a modified sub-model parameter after the first sub-model parameters are combined with the contribution degree of the corresponding client.
Specifically, the larger the reference weight is, the more data is provided by the corresponding client for the current update, and accordingly, the knowledge learned according to the data provided by the client should have a larger influence when the classification model is updated, so that the product of the reference weight and the first sub-model parameter of the corresponding client is taken as a weighting sub-model parameter, and it is required to be noted that the parameter calculation modes of all the weighting sub-models may also include mean value calculation and the like.
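The aggregation described above can be sketched as follows; this is an illustrative reading of the embodiment in which model parameters are treated as numeric arrays keyed by name, and the helper names are assumptions:

```python
def aggregate_updated_parameters(sub_model_params, client_data_counts):
    """sub_model_params: one dict {parameter_name: value} per client (the first sub-model
    parameters); client_data_counts: the first quantity of detection data per client."""
    total = sum(client_data_counts)                        # total quantity of first detection data
    updated = {}
    for params, count in zip(sub_model_params, client_data_counts):
        reference_weight = count / total                   # this client's contribution degree
        for name, value in params.items():
            # weighted sub-model parameters, summed over all clients
            updated[name] = updated.get(name, 0) + reference_weight * value
    return updated

def target_model_parameters(base_params, updated_params):
    # Addition of the basic model parameters and the updated model parameters, as described above.
    return {name: base_params[name] + updated_params[name] for name in base_params}
```

Replacing the weighted sum or the final addition with a mean or a weighted mean, as also mentioned above, would be an equally valid combination rule.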
The step of receiving the updated classification model sent by the server enables the client to acquire the updated classification model after the server performs federated learning, so that a classification model with stronger generalization capability and higher accuracy is deployed on each client, and the accuracy of subsequent type prediction is improved.
Step S203, inputting at least two second detection data acquired in the second target time period into the updated classification model respectively for classification prediction, and outputting the prediction type and the classification probability thereof corresponding to the second detection data.
The second target period may also be used to determine a period of time for collecting data, however, it should be noted that the starting time of the second target period should be later than the ending time of the first target period, that is, all second detection data collected in the second target period are subjected to class prediction by adopting an updated classification model, the second detection data may be local detection data collected by the client in the second target period, the prediction type may be a prediction result of the updated classification model corresponding to the second detection data, the classification probability may be a probability of the second detection data belonging to a prediction type, and the prediction type may belong to any data type.
Optionally, inputting at least two second detection data acquired in the second target time period into the updated classification model respectively for performing class prediction, and outputting the prediction type and the classification probability thereof corresponding to the second detection data includes:
for any one second detection data, inputting the second detection data into an updated classification model for class prediction, outputting prediction probabilities of M preset types of the second detection data respectively, wherein M is an integer greater than zero;
determining the maximum value in all the prediction probabilities as the classification probability of the second detection data, and taking the preset type corresponding to the classification probability as the prediction type of the second detection data;
and traversing all the second detection data to obtain the prediction type and the classification probability of the corresponding second detection data.
Wherein the preset types may correspond to data types, i.e. one preset type has a corresponding one of the data types, and M may be used to represent the number of all the data types.
Specifically, the updated classification model predicts, for each piece of input second detection data, prediction values for the different preset types; the M prediction values output for a single piece of second detection data are normalized by a normalized exponential function (softmax), and the normalization results are determined to be the prediction probabilities of the corresponding preset types, where the sum of the prediction probabilities of all preset types for one piece of second detection data is 1.
If one prediction probability is the maximum value in all prediction probabilities, the possibility that the input second detection data belongs to the preset type corresponding to the prediction probability is indicated to be the maximum, and the maximum value is taken as the classification probability of the second detection data.
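For illustration only, the prediction step can be sketched as below; the softmax follows the normalized exponential function mentioned above, while the function and variable names are assumptions:

```python
import math

def predict_type(raw_prediction_values, preset_types):
    """raw_prediction_values: the M prediction values output by the updated classification
    model for one piece of second detection data; preset_types: the M preset type names."""
    shift = max(raw_prediction_values)                     # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in raw_prediction_values]
    total = sum(exps)
    probabilities = [e / total for e in exps]              # normalized exponential (softmax), sums to 1
    classification_probability = max(probabilities)        # maximum prediction probability
    prediction_type = preset_types[probabilities.index(classification_probability)]
    return prediction_type, classification_probability
```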
Step S204, calculating information entropy of all data types to obtain a first information entropy, and calculating information entropy of all prediction types to obtain a second information entropy.
Wherein the first information entropy may be used to characterize the uncertainty of all the first detection data corresponding data types, and the second information entropy may be used to characterize the uncertainty of all the second detection data corresponding prediction types.
Optionally, calculating the information entropy of all data types, and obtaining the first information entropy includes:
counting the number of first detection data respectively belonging to each data type in N pieces of first detection data, and determining the ratio as a first reference probability of the corresponding data type by comparing the number of the corresponding data type with N;
and calculating to obtain a first information entropy according to the first reference probability and the information entropy function corresponding to all the data types.
The first reference probability may be used to characterize a probability that any one of the first detection data belongs to the corresponding data type, where the probability is a statistical probability, that is, it may be understood that one first detection data is selected from all the first detection data, and the first detection data belongs to the probability of the corresponding data type.
Specifically, the calculation formula of the first information entropy may be expressed as

$H(X) = -\sum_{m=1}^{M} p(x_m)\log p(x_m)$

wherein $H(X)$ may refer to the first information entropy of all the first detection data corresponding to the data types, $x_m$ may refer to the m-th data type, $M$ is the total number of data types, and $p(x_m)$ may refer to the first reference probability of the m-th data type.
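A minimal sketch of this computation, usable for both the first and the second information entropy, is shown below; the natural logarithm is an assumption, since the embodiment does not fix the logarithm base:

```python
import math
from collections import Counter

def information_entropy(type_labels):
    """type_labels: the data types of the N first detection data (or the prediction
    types of the second detection data); returns H = -sum(p * log p)."""
    counts = Counter(type_labels)
    n = len(type_labels)
    entropy = 0.0
    for count in counts.values():
        p = count / n                    # reference probability of this type
        entropy -= p * math.log(p)       # natural log assumed; base 2 would also be consistent
    return entropy
```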
Optionally, calculating information entropy of all prediction types, and obtaining the second information entropy includes:
counting the number of second detection data respectively belonging to each prediction type in all second detection data, comparing the number of the corresponding prediction types with the total number of all second detection data, and determining the ratio as a second reference probability of the corresponding prediction types;
and calculating to obtain a second information entropy according to the second reference probability and the information entropy function corresponding to all the prediction types.
The second reference probability may be used to characterize the probability that any one of the second detection data belongs to the corresponding prediction type, which is also a statistical probability, that is, it may be understood that one second detection data is selected from all the second detection data, which belongs to the probability of the corresponding prediction type.
Specifically, the calculation formula of the second information entropy may be expressed as

$H(Y) = -\sum_{m=1}^{M} p(y_m)\log p(y_m)$

wherein $H(Y)$ may refer to the second information entropy of all the second detection data corresponding to the prediction types, $y_m$ may refer to the m-th prediction type, the total number of prediction types is the same as the total number of data types and is also $M$, and $p(y_m)$ may refer to the second reference probability of the m-th prediction type.
By calculating the information entropy of all data types to obtain the first information entropy and calculating the information entropy of all prediction types to obtain the second information entropy, the degree of uncertainty of the data types of all first detection data and the degree of uncertainty of the prediction types of all second detection data are characterized through information entropy, so that the change in the degree of uncertainty can be quantified, subsequent analysis of the local data based on the first and second information entropy is facilitated, and the accuracy of data type variation detection is improved.
Step S205, when the difference value between the second information entropy and the first information entropy is larger than a preset information entropy threshold, calculating variances of all classification probabilities, and if the variances are larger than the preset variance threshold, generating early warning information.
The variance can be used to judge whether the classification probabilities deviate from one another. Such deviation is usually caused by part of the detection data having smaller classification probabilities, and a smaller classification probability indicates that the classification model may not have learned knowledge of that detection data during training and assigns it to the corresponding prediction type only because the classification task forces a choice.
Specifically, after calculating the difference between the second information entropy and the first information entropy, the absolute value of the difference may be taken and compared with the preset information entropy threshold. In this embodiment the information entropy threshold may be set to 0.2, and an implementer may adjust it according to the actual situation. If the absolute value is greater than the preset information entropy threshold, the amount of information contained in the detection data collected locally by the client has changed greatly. The reason for such a large change is that, in addition to detection data conforming to the original distribution, there exist other detection data whose data types are not covered by the classification model; when the classification model executes the classification task, these data are wrongly classified into known data types, and the information entropy therefore changes greatly.
Calculating the variance of all classification probabilities and comparing it with the variance threshold can be regarded as a verification step: the comparison of the difference between the second and first information entropy with the preset information entropy threshold determines that an abnormal data type may exist, and comparing the variance with the variance threshold further determines whether it actually exists.
Since the updated classification model is updated dynamically, it can by default learn the knowledge of the detection data in time; that is, for detection data of a known data type, the updated classification model can output a classification probability close to 1. If detection data of an abnormal data type appears, the updated classification model still outputs a classification probability, but that probability is smaller, for example 0.5, meaning only that this detection data is more likely to belong to the predicted type than to the other known data types. Whether such detection data exists can therefore be determined from the variance of the classification probabilities.
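The two checks of step S205 can be sketched together as follows; the threshold defaults reuse the values mentioned in this embodiment (0.2 and 5), the wording of the early-warning message is illustrative only, and the function name is an assumption:

```python
def detect_variation(first_entropy, second_entropy, classification_probabilities,
                     entropy_threshold=0.2, variance_threshold=5.0):
    """Returns early-warning information if a variation of the data type is suspected.
    Thresholds follow the values given in this embodiment; implementers may adjust them."""
    if abs(second_entropy - first_entropy) <= entropy_threshold:
        return None                                        # entropy change not large enough
    n = len(classification_probabilities)
    mean = sum(classification_probabilities) / n
    variance = sum((p - mean) ** 2 for p in classification_probabilities) / n
    if variance > variance_threshold:
        # Early warning information expressing that the type of the second detection data has varied.
        return "Early warning: the type of the second detection data may have varied."
    return None
```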
Optionally, after detecting that the difference between the second information entropy and the first information entropy is greater than the preset information entropy threshold, the method further includes:
for any data type, acquiring the duty ratio of all first detection data belonging to the data type in N first detection data to obtain a first ratio;
acquiring the duty ratio of each second detection data belonging to the same prediction type as the data type in all second detection data to obtain a second ratio;
traversing all data types by taking the difference value of the second ratio and the first ratio as the new proportion of the data types to obtain the new proportion of the corresponding data types, and determining the data type corresponding to the maximum value of the new proportion as the target type;
Correspondingly, calculating variances of all the classification probabilities, and if the variances are larger than a preset variance threshold, generating early warning information comprises:
and calculating variances of classification probabilities corresponding to all the second detection data belonging to the target type, and if the variances are larger than a preset variance threshold, generating early warning information.
Wherein the first ratio may be used to characterize the proportion of the first detection data belonging to the corresponding data type among all the first detection data, and the second ratio may be used to characterize the proportion of the second detection data belonging to the corresponding prediction type among all the second detection data.
The new increment scale may be used to characterize the degree of change in the amount of detected data for the corresponding data type, and the target type may refer to the data type that is most likely to contain abnormal detected data.
Specifically, the data type corresponding to the maximum value of the new proportion is the data type with the largest variation degree of the corresponding detected data amount, and the data type is more likely to contain the detected data of the abnormal data type.
In one embodiment, the first K data types with the largest new increment ratio may be determined as target types, and the variance corresponding to each target type may be calculated respectively and compared with a variance threshold, so as to avoid omission, and in this embodiment, the variance threshold may be set to 5.
Accordingly, after the target type is determined, whether detection data of the abnormal data type exist can be detected only through variance change of the classification probability corresponding to the target type, so that the calculated amount can be effectively reduced, and the calculation efficiency and the detection efficiency can be improved.
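A sketch of the target-type selection is given below; taking the top K new increment ratios corresponds to the variant mentioned above, and the function and parameter names are assumptions for illustration:

```python
from collections import Counter

def find_target_types(first_data_types, second_prediction_types, top_k=1):
    """Returns the K data types with the largest new increment ratio."""
    first_counts = Counter(first_data_types)
    second_counts = Counter(second_prediction_types)
    n_first = len(first_data_types)
    n_second = len(second_prediction_types)
    new_increment_ratio = {}
    for data_type in first_counts:
        first_ratio = first_counts[data_type] / n_first            # share among first detection data
        second_ratio = second_counts.get(data_type, 0) / n_second  # share among second detection data
        new_increment_ratio[data_type] = second_ratio - first_ratio
    return sorted(new_increment_ratio, key=new_increment_ratio.get, reverse=True)[:top_k]
```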
In this embodiment, the server combines the data of a plurality of clients to update the model, so the classification accuracy of the updated classification model is higher; the abnormal condition of the client's local detection data is determined according to the comparison of information entropies, so that varied data types can be found and warned about in time. This improves the accuracy and efficiency of data type variation detection, allows variant virus-carrying information to be detected in time in a medical scene, and improves the reliability of the digital medical platform.
Fig. 3 shows a block diagram of a variation detecting device for a data type according to a second embodiment of the present invention. The variation detecting device is applied to a client; the client belongs to a joint system, and the joint system may include a server and a plurality of clients. The computer device corresponding to the client communicates with the server to send the first detection data local to the client and the corresponding data types to the server, thereby providing training data for the server to train the basic classification model, and it receives the updated classification model sent by the server, the updated classification model being the basic classification model after being updated by the server. For convenience of explanation, only the portions relevant to the embodiments of the present invention are shown.
Referring to fig. 3, the variation detecting apparatus includes:
the data sending module 31 is configured to send N first detection data and corresponding data types thereof acquired in a first target period to a server that includes a basic classification model, where N is an integer greater than zero;
the model receiving module 32 is configured to receive an updated classification model sent by the server, where the updated classification model is obtained by updating, by the server, the basic classification model according to all the first detection data and the corresponding data types provided by each client;
the class prediction module 33 is configured to input at least two second detection data acquired in the second target time period into the updated classification model respectively to perform class prediction, and output a prediction type and a classification probability thereof corresponding to the second detection data;
the information entropy calculation module 34 is configured to calculate information entropy of all data types to obtain a first information entropy, and calculate information entropy of all prediction types to obtain a second information entropy;
the variation early warning module 35 is configured to calculate variances of all the classification probabilities when the difference between the second information entropy and the first information entropy is greater than a preset information entropy threshold, and generate early warning information for expressing that the type of the second detection data is varied if the variances are greater than the preset variance threshold.
Optionally, the variation detection device further includes:
the parameter acquisition module is used for acquiring basic model parameters of the basic classification model;
the model updating module is used for updating the basic model parameters according to all the first detection data provided by each client and the corresponding data types thereof to obtain updated model parameters;
and the parameter configuration module is used for adding the basic model parameters and the updated model parameters, determining the addition result as the target model parameters, and configuring the basic classification model according to the target model parameters to obtain the updated classification model.
Optionally, the model updating module includes:
the sub-parameter acquisition unit is used for counting the total quantity of all the first detection data provided by all the clients, and updating the basic model parameters according to all the first detection data provided by the clients and the corresponding data types of the first detection data aiming at any one of the clients to obtain first sub-model parameters;
the weight calculation unit is used for counting the first quantity of all the first detection data provided by the client, and comparing the first quantity with the total quantity to obtain the reference weight corresponding to the client;
the parameter weighting unit is used for multiplying the first sub-model parameter by the reference weight and determining the multiplication result as a weighted sub-model parameter;
And the parameter determining unit is used for traversing all the clients to obtain weighted sub-model parameters of the corresponding clients, adding all the weighted sub-model parameters, and determining the addition result as updated model parameters.
Optionally, the category prediction module 33 includes:
the probability prediction unit is used for inputting the second detection data into the updated classification model for class prediction aiming at any one of the second detection data, outputting the prediction probabilities of the second detection data corresponding to M preset types respectively, wherein M is an integer greater than zero;
the type determining unit is used for determining that the maximum value in all the prediction probabilities is the classification probability of the second detection data, and taking the preset type corresponding to the classification probability as the prediction type of the second detection data;
the data traversing unit is used for traversing all the second detection data to obtain the prediction type and the classification probability of the corresponding second detection data.
Optionally, the information entropy calculating module 34 includes:
the first probability calculation unit is used for counting the number of the first detection data respectively belonging to each data type in N first detection data, and determining the ratio as a first reference probability of the corresponding data type by comparing the number of the corresponding data type with N;
The first information entropy calculation unit is used for calculating the first information entropy according to the first reference probability and the information entropy function corresponding to all the data types.
Optionally, the information entropy calculating module 34 includes:
the second probability calculation unit is used for counting the number of the second detection data respectively belonging to each prediction type in all the second detection data, and determining the ratio as a second reference probability of the corresponding prediction type by comparing the number of the corresponding prediction type with the total number of all the second detection data;
and the second information entropy calculation unit is used for calculating the second information entropy according to the second reference probability and the information entropy function corresponding to all the prediction types.
Optionally, the variation detection device further includes:
the first ratio calculation module is used for obtaining the duty ratio of all the first detection data belonging to the data type in N first detection data aiming at any data type to obtain a first ratio;
the second ratio calculation module is used for obtaining the duty ratio of each second detection data belonging to the same prediction type as the data type in all second detection data to obtain a second ratio;
the target type determining module is used for traversing all the data types by taking the difference value of the second ratio and the first ratio as the new proportion of the data types to obtain the new proportion of the corresponding data types, and determining the data type corresponding to the maximum value of the new proportion as the target type;
Accordingly, the variation early warning module 35 includes:
the variance calculating unit is used for calculating variances of the classification probabilities corresponding to all the second detection data belonging to the target type, and if the variances are larger than a preset variance threshold, early warning information is generated.
It should be noted that, because the content of information interaction and execution process between the modules and units is based on the same concept as the method embodiment of the present invention, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
Fig. 4 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. As shown in fig. 4, the computer device of this embodiment includes: at least one processor (only one shown in fig. 4), a memory, and a computer program stored in the memory and executable on the at least one processor, the processor executing the computer program to perform the steps of any of the various variation detection method embodiments described above.
The computer device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that fig. 4 is merely an example of a computer device and is not intended to limit the computer device, and that a computer device may include more or fewer components than shown, or may combine certain components, or different components, such as may also include a network interface, a display screen, an input device, and the like.
The processor may be a CPU, but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory includes a readable storage medium, an internal memory, etc., where the internal memory may be the memory of the computer device, the internal memory providing an environment for the execution of an operating system and computer-readable instructions in the readable storage medium. The readable storage medium may be a hard disk of a computer device, and in other embodiments may be an external storage device of the computer device, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. that are provided on the computer device. Further, the memory may also include both internal storage units and external storage devices of the computer device. The memory is used to store an operating system, application programs, boot loader (BootLoader), data, and other programs such as program codes of computer programs, and the like. The memory may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated; in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing them from each other, and are not used for limiting the protection scope of the present invention. For the specific working process of the units and modules in the above device, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present invention may implement all or part of the flow of the method of the above-described embodiment by a computer program instructing related hardware, and the computer program may be stored in a computer readable storage medium; the computer program, when executed by a processor, may implement the steps of the method embodiment described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc.

The computer readable medium may include at least: any entity or device capable of carrying computer program code, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a U-disk, a removable hard disk, a magnetic disk or an optical disk. In some jurisdictions, computer readable media may not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The present invention may also be implemented as a computer program product which, when run on a computer device, causes the computer device to execute the steps of the method embodiments described above, thereby implementing all or part of those steps.
In the foregoing embodiments, each embodiment is described with its own emphasis; for a part that is not described or detailed in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included within the protection scope of the present invention.

Claims (10)

1. A method for detecting a variation in a data type, wherein the method is applied to a client in a joint system, the joint system further comprises a server, and the variation detection method comprises:
sending N pieces of first detection data acquired in a first target time period, and the data types corresponding thereto, to a server containing a basic classification model, wherein N is an integer greater than zero;
receiving an updated classification model sent by the server, wherein the updated classification model is obtained by the server updating the basic classification model according to all the first detection data and the corresponding data types provided by each client;
respectively inputting at least two second detection data acquired in a second target time period into the updated classification model to perform class prediction, and outputting a prediction type and classification probability of the corresponding second detection data;
calculating information entropy of all data types to obtain a first information entropy, and calculating information entropy of all prediction types to obtain a second information entropy;
and calculating a variance of all the classification probabilities when the difference between the second information entropy and the first information entropy is greater than a preset information entropy threshold, and generating early warning information indicating that the type of the second detection data has varied if the variance is greater than a preset variance threshold.
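The client-side flow of claim 1 can be illustrated with a short sketch. The following Python fragment is a minimal, non-authoritative illustration; the function names, the predict_proba-style model interface, the natural-logarithm entropy function, and the two threshold values are assumptions made for the example and are not prescribed by the claim.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence (natural logarithm)."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    probs = counts / counts.sum()
    return float(-(probs * np.log(probs)).sum())

def detect_type_variation(model, second_batch, first_types,
                          entropy_threshold=0.5, variance_threshold=0.05):
    """Client-side check: has the type distribution of new detection data varied?

    model        -- updated classification model returned by the server, assumed
                    to expose predict_proba(X) -> (n, M) class probabilities
    second_batch -- feature matrix of the second detection data (n >= 2 samples)
    first_types  -- data types of the N first detection data
    """
    proba = model.predict_proba(second_batch)      # per-type prediction probabilities
    class_prob = proba.max(axis=1)                 # classification probability
    pred_types = proba.argmax(axis=1)              # prediction type

    h1 = entropy(first_types)                      # first information entropy
    h2 = entropy(pred_types)                       # second information entropy

    if h2 - h1 > entropy_threshold:                # type distribution has broadened
        if class_prob.var() > variance_threshold:  # predictions are unstable
            return "early warning: type of second detection data has varied"
    return None
```

Read this way, the entropy difference signals that the label distribution of the second-period data has spread out relative to the first period, and the variance of the classification probabilities then acts as the trigger for the early warning, as recited in the last step of the claim.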
2. The variation detection method according to claim 1, wherein the updating the basic classification model comprises:
obtaining basic model parameters of the basic classification model;
updating the basic model parameters according to all the first detection data and the corresponding data types provided by each client to obtain updated model parameters;
and adding the basic model parameters to the updated model parameters, determining the addition result as a target model parameter, and configuring the basic classification model according to the target model parameter to obtain the updated classification model.
3. The variation detection method according to claim 2, wherein the updating the basic model parameters according to all the first detection data and the corresponding data types provided by the respective clients to obtain the updated model parameters comprises:
counting the total quantity of first detection data provided by all the clients, and, for any one client, updating the basic model parameters according to all the first detection data provided by that client and the data types corresponding thereto, to obtain a first sub-model parameter;
counting a first quantity of the first detection data provided by that client, and taking the ratio of the first quantity to the total quantity as the reference weight corresponding to that client;
multiplying the first sub-model parameter by the reference weight, and determining a multiplication result as a weighted sub-model parameter;
traversing all clients to obtain weighted sub-model parameters of the corresponding clients, adding all weighted sub-model parameters, and determining an addition result as the updated model parameters.
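Claims 2 and 3 together describe a server-side aggregation in which each client's sub-model parameters are weighted by that client's share of the total detection data. The sketch below is only an illustration under the assumption that model parameters are flat numeric arrays; the function aggregate_updates and its arguments are hypothetical names, not part of the claims.

```python
import numpy as np

def aggregate_updates(base_params, client_updates):
    """client_updates: list of (first_quantity, first_sub_model_params) per client,
    where first_sub_model_params is a numpy array of the same shape as base_params."""
    total_quantity = sum(quantity for quantity, _ in client_updates)
    updated_params = np.zeros_like(base_params, dtype=float)
    for quantity, sub_params in client_updates:
        reference_weight = quantity / total_quantity       # first quantity / total quantity
        updated_params += reference_weight * sub_params    # weighted sub-model parameter
    # Claim 2: target model parameter = basic model parameters + updated model parameters
    return base_params + updated_params
```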
4. The variation detection method according to claim 1, wherein the respectively inputting the at least two second detection data acquired in the second target time period into the updated classification model for class prediction, and outputting the prediction type and classification probability of the corresponding second detection data, comprises:
for any one second detection data, inputting the second detection data into the updated classification model for class prediction, and outputting prediction probabilities of M preset types corresponding to the second detection data respectively, wherein M is an integer greater than zero;
determining the maximum value in all the prediction probabilities as the classification probability of the second detection data, and taking the preset type corresponding to the classification probability as the prediction type of the second detection data;
and traversing all the second detection data to obtain the prediction type and the classification probability of the corresponding second detection data.
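As a concrete instance of claim 4, suppose one second detection datum yields the prediction probabilities over M = 3 preset types shown below (the numbers are purely illustrative); the classification probability is the maximum value and the prediction type is the preset type at which that maximum occurs.

```python
import numpy as np

prediction_probs = np.array([0.1, 0.7, 0.2])    # M = 3 preset types, illustrative values
classification_prob = prediction_probs.max()    # 0.7
prediction_type = prediction_probs.argmax()     # index 1, i.e. the second preset type
```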
5. The variation detection method according to claim 1, wherein the calculating the information entropy of all data types to obtain the first information entropy comprises:
counting, among the N first detection data, the number of first detection data belonging to each data type, comparing the number of the corresponding data type with N, and determining the ratio as the first reference probability of the corresponding data type;
and calculating the first information entropy according to the first reference probabilities corresponding to all the data types and the information entropy function.
6. The variation detection method according to claim 1, wherein the calculating the information entropy of all prediction types to obtain the second information entropy comprises:
counting the number of second detection data respectively belonging to each prediction type in all second detection data, comparing the number of the corresponding prediction types with the total number of all second detection data, and determining the ratio as a second reference probability of the corresponding prediction types;
and calculating the second information entropy according to the second reference probabilities corresponding to all the prediction types and the information entropy function.
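Claims 5 and 6 compute the two information entropies in the same way, from the reference probabilities of the data types and of the prediction types respectively. A small worked example, assuming a natural-logarithm entropy function and an illustrative 5 / 3 / 2 split:

```python
import numpy as np

# N = 10 first detection data split 5 / 3 / 2 over three data types
first_reference_probs = np.array([5, 3, 2]) / 10                  # 0.5, 0.3, 0.2
first_entropy = float(-(first_reference_probs
                        * np.log(first_reference_probs)).sum())   # ≈ 1.0297 nats
# The second information entropy is obtained identically from the shares
# of each prediction type among all the second detection data.
```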
7. The variation detection method according to any one of claims 1 to 6, further comprising, after detecting that a difference between the second information entropy and the first information entropy is greater than a preset information entropy threshold:
for any data type, acquiring the proportion, among the N first detection data, of the first detection data belonging to that data type, to obtain a first ratio;
acquiring the proportion, among all the second detection data, of the second detection data whose prediction type is the same as that data type, to obtain a second ratio;
taking the difference between the second ratio and the first ratio as the new ratio of that data type, traversing all the data types to obtain the new ratio of each corresponding data type, and determining the data type corresponding to the maximum new ratio as the target type;
correspondingly, the calculating the variance of all the classification probabilities, and generating the early warning information if the variance is greater than the preset variance threshold, comprises:
calculating the variance of the classification probabilities corresponding to all the second detection data belonging to the target type, and generating the early warning information if the variance is greater than the preset variance threshold.
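Claim 7 restricts the variance test to the single data type whose share grew the most between the two periods. A minimal sketch, reusing the hypothetical names from the earlier sketches (the variance threshold is likewise an assumed value):

```python
import numpy as np

def pick_target_type(first_types, pred_types, class_prob, variance_threshold=0.05):
    """Locate the data type whose share increased the most (the target type),
    then test prediction stability for that type only."""
    first_types = np.asarray(first_types)
    pred_types = np.asarray(pred_types)
    class_prob = np.asarray(class_prob)
    all_types = set(first_types.tolist()) | set(pred_types.tolist())
    new_ratio = {
        t: (pred_types == t).mean() - (first_types == t).mean()   # second ratio - first ratio
        for t in all_types
    }
    target_type = max(new_ratio, key=new_ratio.get)                # maximum new ratio
    target_probs = class_prob[pred_types == target_type]
    if target_probs.size and target_probs.var() > variance_threshold:
        return "early warning: type of second detection data has varied"
    return None
```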
8. A variation detection apparatus of a data type, characterized in that the variation detection apparatus comprises:
the data transmission module is used for transmitting N pieces of first detection data acquired in a first target time period and corresponding data types thereof to a server containing a basic classification model, wherein N is an integer greater than zero;
the model receiving module is used for receiving an updated classification model sent by the server, wherein the updated classification model is obtained by updating the basic classification model by the server according to all first detection data and corresponding data types provided by each client;
the class prediction module is used for respectively inputting at least two second detection data acquired in a second target time period into the updated classification model to perform class prediction, and outputting a prediction type and classification probability of the corresponding second detection data;
the information entropy calculation module is used for calculating the information entropy of all data types to obtain a first information entropy, and calculating the information entropy of all prediction types to obtain a second information entropy;
the variation early warning module is used for calculating a variance of all the classification probabilities when the difference between the second information entropy and the first information entropy is greater than a preset information entropy threshold, and generating early warning information if the variance is greater than a preset variance threshold, wherein the early warning information indicates that the type of the second detection data has varied.
9. A computer device comprising a processor, a memory and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the variation detection method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the variation detection method according to any one of claims 1 to 7.
CN202310713654.8A 2023-06-15 2023-06-15 Method, device, computer equipment and medium for detecting variation of data type Pending CN116662904A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310713654.8A CN116662904A (en) 2023-06-15 2023-06-15 Method, device, computer equipment and medium for detecting variation of data type

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310713654.8A CN116662904A (en) 2023-06-15 2023-06-15 Method, device, computer equipment and medium for detecting variation of data type

Publications (1)

Publication Number Publication Date
CN116662904A true CN116662904A (en) 2023-08-29

Family

ID=87713634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310713654.8A Pending CN116662904A (en) 2023-06-15 2023-06-15 Method, device, computer equipment and medium for detecting variation of data type

Country Status (1)

Country Link
CN (1) CN116662904A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117057786A (en) * 2023-10-11 2023-11-14 中电科大数据研究院有限公司 Intelligent operation and maintenance management method, system and storage medium for data center
CN117057786B (en) * 2023-10-11 2024-01-02 中电科大数据研究院有限公司 Intelligent operation and maintenance management method, system and storage medium for data center

Similar Documents

Publication Publication Date Title
CN111475804B (en) Alarm prediction method and system
CN112446025A (en) Federal learning defense method and device, electronic equipment and storage medium
CN112382407A (en) Risk management and control method and device, electronic equipment and storage medium
CN116662904A (en) Method, device, computer equipment and medium for detecting variation of data type
CN116311539B (en) Sleep motion capturing method, device, equipment and storage medium based on millimeter waves
CN112102959A (en) Server, data processing method, data processing device and readable storage medium
CN115034315A (en) Business processing method and device based on artificial intelligence, computer equipment and medium
CN113516275A (en) Power distribution network ultra-short term load prediction method and device and terminal equipment
CN113282920B (en) Log abnormality detection method, device, computer equipment and storage medium
CN116010228B (en) Time estimation method and device for network security scanning
CN116756522B (en) Probability forecasting method and device, storage medium and electronic equipment
CN116257885A (en) Private data communication method, system and computer equipment based on federal learning
CN112380126A (en) Web system health prediction device and method
CN116911824A (en) Intelligent decision method and system based on electric power big data
CN116842520A (en) Anomaly perception method, device, equipment and medium based on detection model
CN115037790B (en) Abnormal registration identification method, device, equipment and storage medium
CN116151369A (en) Bayesian-busy robust federal learning system and method for public audit
Lijun et al. An intuitionistic calculus to complex abnormal event recognition on data streams
CN114416417A (en) System abnormity monitoring method, device, equipment and storage medium
CN113868660B (en) Training method, device and equipment for malicious software detection model
CN117375855A (en) Abnormality detection method, model training method and related equipment
CN112259239B (en) Parameter processing method and device, electronic equipment and storage medium
CN113626461B (en) Information searching method, terminal device and computer readable storage medium
CN116486206A (en) Data processing method, device, computer equipment and medium based on model optimization
CN116452264A (en) Advertisement putting method and device based on artificial intelligence, terminal equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination