CN116913525B - Feature group normalization method, device, electronic equipment and storage medium - Google Patents

Feature group normalization method, device, electronic equipment and storage medium

Info

Publication number
CN116913525B
Authority
CN
China
Prior art keywords
feature
value
characteristic
task completion
completion data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311166869.9A
Other languages
Chinese (zh)
Other versions
CN116913525A (en)
Inventor
傅云凤
吴珊珊
郭芷含
胡锦辉
熊晓夙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Everything Chengli Technology Co ltd
Original Assignee
Beijing Everything Chengli Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Everything Chengli Technology Co ltd
Priority to CN202311166869.9A
Publication of CN116913525A
Application granted
Publication of CN116913525B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure provides a feature group normalization method and device, an electronic device, and a storage medium. In one specific implementation, the method comprises: obtaining a set of task completion data feature groups, wherein each task completion data feature group comprises feature values of at least one feature obtained by feature extraction from task completion data; and performing a normalization operation for each feature included in the task completion data feature groups. In the feature normalization process, this embodiment considers the feature feedback direction and the feature category in addition to the value range of the feature, thereby achieving targeted feature normalization.

Description

Feature group normalization method, device, electronic equipment and storage medium
Technical Field
Embodiments of the present disclosure relate to the technical field of digital intervention therapy, and in particular to a task completion data feature group normalization method and device, an electronic device, and a storage medium.
Background
To objectively evaluate and analyze the level of cognitive development of the human brain, or to enhance its cognitive ability, the following evaluation tools are conventionally used:
First, the scale method: evaluation through subjective answers to questions;
Second, physiological data acquisition instruments: an instrument acquires physiological data from the human body, and cognitive ability is evaluated after the data is analyzed. However, because an instrument is required, the barrier to measurement is high, large-scale adoption is difficult, and the acquired data is unstable (data can be lost due to signal problems);
Third, psychological experimental paradigms: these must be completed under the guidance of professionals, so labor costs are high; in addition, the task result of a paradigm is one-dimensional, which makes it difficult to explain the ability in terms of multiple elements.
After the evaluation data is obtained, it can be analyzed using an evaluation algorithm such as the following:
First, deep learning: because deep learning models are weakly interpretable, accurate analysis cannot be performed;
Second, a custom calculation formula, in which no weights are introduced.
The analysis indexes used in the analysis process may include, for example, the following:
First, a manually set mapping table: the mapping table is derived from population norms, but norms are difficult to revise promptly as they change over the years, and score divisions based on norms are not objective or accurate for populations in different regions.
Second, game answer results: the index has a single dimension and only indicates whether the student answered correctly; evaluating cognitive ability with this index alone is not accurate enough.
To address the single dimension and the low objectivity and accuracy of the index data used to evaluate cognitive ability, evaluation indexes of more dimensions can be provided by extracting multidimensional features from the data produced when a subject performs a cognitive ability evaluation task. However, different types of features influence the evaluation of cognitive ability to different degrees and in different directions, so the different types of features need to be normalized to facilitate the subsequent evaluation of cognitive ability.
Disclosure of Invention
The embodiment of the disclosure provides a task completion data feature set normalization method, a task completion data feature set normalization device, electronic equipment and a storage medium.
In a first aspect, embodiments of the present disclosure provide a task completion data feature set normalization method, the method comprising:
acquiring a set of task completion data feature groups, wherein each task completion data feature group comprises a feature value of at least one feature obtained by feature extraction from task completion data;
for each feature included in the task completion data feature set, performing the following normalization operation: acquiring a feature category, a feature feedback direction and a preset feature minimum value and maximum value of the feature, wherein the feature feedback direction is used for representing a correlation direction between the feature value of the feature and the degree of capability of completing the task, and the feature feedback direction is positive correlation or negative correlation; determining a normalization method corresponding to the feature according to the feature class of the feature; and normalizing the characteristic value of the characteristic in each task completion data characteristic group based on the characteristic feedback direction of the characteristic, the preset characteristic minimum value and the maximum value according to a normalization method corresponding to the characteristic, so as to obtain the normalized characteristic value of the characteristic in the corresponding task completion data characteristic group.
In some alternative embodiments, the feature categories include a ratio class feature, a time class feature, and other class features; and the determining a normalization method corresponding to the feature according to the feature category of the feature comprises:
in response to determining that the feature class of the feature is a ratio class feature or a time class feature, determining that a normalization method corresponding to the feature is post-preprocessing normalization;
in response to determining that the feature class of the feature is other class of features, determining that the normalization method corresponding to the feature is conventional normalization.
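The category-to-method mapping described above can be sketched as a small dispatch function. This is a minimal illustration only; the names `FeatureCategory`, `NormalizationMethod`, and `select_normalization_method` are hypothetical and do not come from the patent text.

```python
from enum import Enum


class FeatureCategory(Enum):
    # Hypothetical labels for the three categories named in the text.
    RATIO = "ratio"
    TIME = "time"
    OTHER = "other"


class NormalizationMethod(Enum):
    CONVENTIONAL = "conventional"
    POST_PREPROCESSING = "post_preprocessing"


def select_normalization_method(category: FeatureCategory) -> NormalizationMethod:
    """Ratio and time features are preprocessed before normalization;
    all other features are normalized conventionally."""
    if category in (FeatureCategory.RATIO, FeatureCategory.TIME):
        return NormalizationMethod.POST_PREPROCESSING
    return NormalizationMethod.CONVENTIONAL
```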
In some optional embodiments, the normalizing the feature value of the feature in each task completion data feature group according to the normalization method corresponding to the feature based on the feature feedback direction of the feature, the preset feature minimum value and the maximum value, to obtain a normalized feature value of the feature in the corresponding task completion data feature group, including:
in response to determining that the normalization method corresponding to the feature is conventional normalization, for each task completion data feature group, substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the preset feature minimum value and maximum value into the corresponding variables of the following conventional normalization formula, and determining the output conventional normalization result as the normalized feature value of the feature in the task completion data feature group: [formula and variable symbols not preserved in this text]
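The conventional normalization formula itself is not preserved in this text, so the sketch below is an assumption rather than the patented formula: it uses ordinary min-max scaling over the preset minimum and maximum and flips the result for negatively correlated features, so that a larger normalized value always indicates higher ability. Clipping out-of-range values to the preset bounds is a further assumption, and the function name is hypothetical.

```python
def conventional_normalize(value: float, positive: bool,
                           feat_min: float, feat_max: float) -> float:
    """Assumed min-max normalization that respects the feedback direction.

    positive=True  -> larger raw values map to larger normalized values.
    positive=False -> larger raw values map to smaller normalized values.
    """
    if feat_max <= feat_min:
        raise ValueError("feat_max must be greater than feat_min")
    # Assumption: values outside the preset range are clipped to it.
    clipped = min(max(value, feat_min), feat_max)
    scaled = (clipped - feat_min) / (feat_max - feat_min)
    return scaled if positive else 1.0 - scaled
```

With a preset range of 0 to 10, a raw value of 2.5 maps to 0.25 for a positively correlated feature and to 0.75 for a negatively correlated one.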
In some optional embodiments, the normalizing the feature value of the feature in each task completion data feature group according to the normalization method corresponding to the feature, based on the feature feedback direction of the feature and the preset feature minimum value and maximum value, to obtain a normalized feature value of the feature in the corresponding task completion data feature group comprises:
in response to determining that the normalization method corresponding to the feature is post-preprocessing normalization, determining a preprocessing method corresponding to the feature according to the feature category of the feature;
preprocessing, according to the determined preprocessing method, the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group;
for each task completion data feature group: substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the preset feature minimum value and maximum value into the corresponding variables of the conventional normalization formula, and determining the output conventional normalization result as the conventional normalized feature value of the feature in the task completion data feature group; and substituting the processed feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the processed preset feature minimum value and maximum value into the corresponding variables of the conventional normalization formula, and determining the output conventional normalization result as the processed normalized feature value of the feature in the task completion data feature group;
calculating distribution bias based on the distribution of the conventional normalized feature values of the feature in each task completion data feature group to obtain the conventional normalized distribution bias of the feature;
calculating distribution bias based on the distribution of the processed normalized feature values of the feature in each task completion data feature group to obtain the preprocessed normalized distribution bias of the feature;
determining whether the absolute value of the preprocessed normalized distribution bias of the feature is less than the conventional normalized distribution bias of the feature;
in response to determining that it is less, determining the processed normalized feature value of the feature in each task completion data feature group as the normalized feature value of the feature in the task completion data feature group;
and in response to determining that it is not less, determining the conventional normalized feature value of the feature in each task completion data feature group as the normalized feature value of the feature in the task completion data feature group.
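The selection between the two candidate normalizations can be sketched as follows. Two assumptions are made here: "distribution bias" is interpreted as the sample skewness (third standardized moment), and the comparison is made between the absolute values of both skewness figures. The function names are hypothetical.

```python
from typing import List, Sequence


def skewness(values: Sequence[float]) -> float:
    """Population skewness: third central moment over variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # a constant sample has no asymmetry
    return m3 / m2 ** 1.5


def choose_normalized_values(conventional: List[float],
                             processed: List[float]) -> List[float]:
    """Keep the preprocessed normalization only if it is less skewed."""
    # Assumption: absolute values are compared on both sides.
    if abs(skewness(processed)) < abs(skewness(conventional)):
        return processed
    return conventional
```

The design intent, as far as the text indicates, is that preprocessing is only worthwhile when it yields a more symmetric distribution of normalized values across the feature groups.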
In some optional embodiments, the determining a preprocessing method corresponding to the feature according to the feature category of the feature comprises:
in response to determining that the feature category of the feature is a ratio class feature, determining that the preprocessing method corresponding to the feature is an exponential operation; and
the preprocessing, according to the determined preprocessing method, the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, comprises:
in response to the determined preprocessing method being an exponential operation, performing an exponential operation with a first preset constant as the base and with the preset feature minimum value, the preset feature maximum value, and the feature value of the feature in each task completion data feature group respectively as the exponent, and determining the obtained results respectively as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group.
In some alternative embodiments, the first preset constant is a natural constant.
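A minimal sketch of the ratio-feature preprocessing, assuming the natural constant e as the first preset constant as in the optional embodiment above. The function name and return layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def preprocess_ratio_feature(values: Sequence[float], feat_min: float,
                             feat_max: float, base: float = math.e
                             ) -> Tuple[List[float], float, float]:
    """Raise `base` to each feature value and to the preset bounds.

    The same transform is applied to the bounds so that the processed values
    can be normalized against the processed minimum and maximum.
    """
    processed_values = [base ** v for v in values]
    return processed_values, base ** feat_min, base ** feat_max
```

Because the exponential is monotonically increasing for a base greater than 1, the ordering of the values and the feedback direction are unchanged by this step.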
In some optional embodiments, the determining a preprocessing method corresponding to the feature according to the feature category of the feature comprises:
in response to determining that the feature category of the feature is a time class feature and the feature feedback direction of the feature is positive correlation, determining that the preprocessing method corresponding to the feature is a power operation;
in response to determining that the feature category of the feature is a time class feature and the feature feedback direction of the feature is negative correlation, determining that the preprocessing method corresponding to the feature is a logarithmic operation; and
the preprocessing, according to the determined preprocessing method, the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, comprises:
in response to the determined preprocessing method being a power operation, performing a power operation with a second preset constant as the exponent and with the preset feature minimum value, the preset feature maximum value, and the feature value of the feature in each task completion data feature group respectively as the base, and determining the obtained results respectively as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group;
and in response to the determined preprocessing method being a logarithmic operation, taking the logarithm, with a third preset constant as the base, of the preset feature minimum value, the preset feature maximum value, and the feature value of the feature in each task completion data feature group respectively, and determining the obtained results respectively as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group.
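The two time-feature transforms can be sketched as one function. The values of the second and third preset constants are not given in this text, so the defaults below (exponent 2 and base e) are placeholders chosen only for illustration; the function name is hypothetical.

```python
import math


def preprocess_time_value(value: float, positive: bool,
                          exponent: float = 2.0,
                          log_base: float = math.e) -> float:
    """Transform one time-class value (or preset bound) before normalization.

    positive=True  -> power operation: value ** exponent
    positive=False -> logarithm of value with base log_base (value must be > 0)
    """
    if positive:
        return value ** exponent
    if value <= 0:
        raise ValueError("logarithmic preprocessing requires a positive value")
    return math.log(value, log_base)
```

The same function would be applied to the preset minimum, the preset maximum, and each feature value, so that the processed values stay comparable to the processed bounds.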
In some alternative embodiments, each of the task completion data feature groups includes K features; and
the method further comprises the steps of:
generating a task completion data normalization feature group corresponding to each task completion data feature group by using the normalization feature value of each feature in each task completion data feature group, and generating a task completion data normalization feature group set by using the task completion data normalization feature group corresponding to each task completion data feature group in the task completion data feature group set;
calculating a correlation coefficient and a P value between any two different characteristics in K characteristics included in each task completion data normalization characteristic group based on the task completion data normalization characteristic group set;
Determining the number M of the normalized feature groups of the task completion data to be generated according to the number N of the normalized feature groups of the task completion data in the normalized feature group set of the task completion data, and generating an empty normalized feature group set of the simulated task completion data;
for each of the K features, determining at least two value intervals corresponding to the feature and screening probability corresponding to each value interval;
performing a simulated task completion data normalized feature group generation operation until the number of simulated task completion data normalized feature groups in the simulated task completion data normalized feature group set is not less than M, wherein the generation operation comprises:
creating a new simulated task completion data normalized feature group;
determining, according to the screening probability corresponding to each value interval of the 1st feature of the K features, the screening value interval of the 1st feature among the value intervals of the 1st feature, and randomly determining a value in the screening value interval of the 1st feature as the feature value of the 1st feature in the new simulated task completion data normalized feature group;
setting the initial value of a positive integer j to 2;
for the j-th feature of the K features, performing a feature value generation operation until j is K, the feature value generation operation comprising:
determining, among the first j-1 features, the feature whose correlation coefficient with the j-th feature has the largest absolute value as the most relevant feature of the j-th feature;
determining whether the absolute value of the correlation coefficient between the j-th feature and the most relevant feature of the j-th feature is greater than a preset correlation coefficient threshold and the P value is less than a preset P value threshold;
in response to determining yes, determining the screening value interval of the j-th feature according to the screening value interval of the most relevant feature of the j-th feature;
in response to determining no, determining, according to the screening probability corresponding to each value interval of the j-th feature, the screening value interval of the j-th feature among the value intervals of the j-th feature;
randomly determining a value in the screening value interval of the j-th feature as the feature value of the j-th feature in the new simulated task completion data normalized feature group;
and after increasing the value of j by 1, continuing to perform the feature value generation operation.
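The generation operation can be sketched as the loop below. Several things are assumptions: values are drawn uniformly within the chosen interval, each generated group is appended to the simulated set (implied by the stopping condition), the correlation and P-value matrices are supplied by the caller, and the rule for deriving an interval from the most relevant feature is passed in as the hypothetical callable `derive_interval`. The text also does not state how M is derived from N, so M is taken as a plain argument.

```python
import random
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, float]


def generate_group(intervals: Sequence[Sequence[Interval]],
                   probabilities: Sequence[Sequence[float]],
                   corr: Sequence[Sequence[float]],
                   p_values: Sequence[Sequence[float]],
                   corr_threshold: float,
                   p_threshold: float,
                   derive_interval: Callable[[Interval, float], Interval],
                   rng: random.Random) -> List[float]:
    """Generate one simulated normalized feature group with K values."""
    num_features = len(intervals)
    # 1st feature: pick an interval by screening probability, then a value in it.
    chosen = [rng.choices(intervals[0], weights=probabilities[0], k=1)[0]]
    values = [rng.uniform(*chosen[0])]
    for j in range(1, num_features):
        # Most relevant earlier feature: largest |correlation| with feature j.
        best = max(range(j), key=lambda i: abs(corr[i][j]))
        strong = (abs(corr[best][j]) > corr_threshold
                  and p_values[best][j] < p_threshold)
        if strong:
            interval = derive_interval(chosen[best], corr[best][j])
        else:
            interval = rng.choices(intervals[j], weights=probabilities[j], k=1)[0]
        chosen.append(interval)
        values.append(rng.uniform(*interval))
    return values


def generate_simulated_set(m: int,
                           make_group: Callable[[], List[float]]) -> List[List[float]]:
    """Repeat the generation operation until the set holds at least m groups."""
    simulated: List[List[float]] = []
    while len(simulated) < m:
        simulated.append(make_group())
    return simulated
```

Generating features in order and tying each one to its most relevant predecessor is what lets the simulated groups preserve the significant pairwise correlations observed in the real normalized data.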
In some optional embodiments, the determining the screening value interval of the j-th feature according to the screening value interval of the most relevant feature of the j-th feature comprises:
in response to determining that the correlation coefficient between the j-th feature and the most relevant feature of the j-th feature is greater than the preset correlation coefficient threshold, the preset correlation coefficient threshold being a constant greater than zero, determining the screening value interval of the most relevant feature of the j-th feature as the screening value interval of the j-th feature;
and in response to determining that the correlation coefficient between the j-th feature and the most relevant feature of the j-th feature is less than the opposite number of the preset correlation coefficient threshold, determining, among the value intervals of the j-th feature, the inverse value interval of the screening value interval of the most relevant feature of the j-th feature as the screening value interval of the j-th feature, wherein the minimum value and the maximum value of the inverse value interval are, respectively, 1 minus the maximum value and 1 minus the minimum value of the screening value interval of the most relevant feature of the j-th feature.
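The interval derivation for strongly correlated features can be sketched as follows, assuming normalized values lie in [0, 1] so that the inverse of an interval (lo, hi) is (1 - hi, 1 - lo). The function name is hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]


def derive_screening_interval(most_relevant_interval: Interval,
                              coefficient: float,
                              threshold: float) -> Interval:
    """Derive the j-th feature's interval from its most relevant feature.

    Strong positive correlation -> reuse the same interval.
    Strong negative correlation -> mirror the interval around 0.5.
    """
    if threshold <= 0:
        raise ValueError("threshold must be a constant greater than zero")
    lo, hi = most_relevant_interval
    if coefficient > threshold:
        return (lo, hi)
    if coefficient < -threshold:
        return (1.0 - hi, 1.0 - lo)
    raise ValueError("correlation is not strong enough to derive an interval")
```

For example, if the most relevant feature was drawn from (0.25, 0.5) and the correlation coefficient is -0.8 with a threshold of 0.5, the j-th feature is drawn from (0.5, 0.75).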
In some optional embodiments, before the performing the operation of generating the normalized feature groups of the simulated task completion data until the number of normalized feature groups of the simulated task completion data in the set of normalized feature groups of the simulated task completion data is not less than M, the method further includes:
Determining whether an upper boundary task completion data normalization feature group and a lower boundary task completion data normalization feature group exist in the task completion data normalization feature group set, wherein each feature value in the upper boundary task completion data normalization feature group is 1, and each feature value in the lower boundary task completion data normalization feature group is 0;
generating an upper boundary task completion data normalization feature set and adding the upper boundary task completion data normalization feature set to the simulation task completion data normalization feature set in response to determining that the upper boundary task completion data normalization feature set does not exist;
and generating a lower boundary task completion data normalization feature set and adding the lower boundary task completion data normalization feature set to the simulation task completion data normalization feature set in response to determining that the lower boundary task completion data normalization feature set does not exist.
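The boundary check can be sketched as follows. The function name is hypothetical, and exact equality against 1.0 and 0.0 is an assumption about how "each feature value is 1" and "each feature value is 0" would be tested in practice.

```python
from typing import List, Sequence


def add_missing_boundary_groups(normalized_groups: Sequence[Sequence[float]],
                                simulated_groups: List[List[float]],
                                num_features: int) -> List[List[float]]:
    """Add the all-ones and all-zeros groups to the simulated set when the
    real normalized set does not already contain them."""
    upper = [1.0] * num_features
    lower = [0.0] * num_features
    existing = [list(group) for group in normalized_groups]
    if upper not in existing:
        simulated_groups.append(upper)
    if lower not in existing:
        simulated_groups.append(lower)
    return simulated_groups
```

Seeding the simulated set with both extremes ensures that the simulated data spans the full normalized range before the random generation begins.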
In some alternative embodiments, the at least one feature comprises at least one of: a feature for assessing attention, a feature for assessing self-control, and a feature for assessing conversion power.
In some alternative embodiments, the feature for assessing attention includes at least one of: the standard deviation of the number of questions answered continuously, the weighted average of the number of questions answered continuously, the longest concentration time, and the time required to enter the longest concentration.
In some alternative embodiments, the feature for assessing self-control comprises at least one of: the correct suppression ratio when no operation is permitted, and the error rate under interference.
In some alternative embodiments, the feature for assessing conversion power comprises at least one of: the accuracy on thread switching questions, the correct response time on thread switching questions, the difference in answer accuracy under different rules, and the difference in response under different rules.
In a second aspect, embodiments of the present disclosure provide a task completion data feature set normalization apparatus, the apparatus comprising:
an acquisition unit configured to acquire a set of task completion data feature groups, wherein each task completion data feature group comprises a feature value of at least one feature obtained by feature extraction from task completion data;
a normalization unit configured to perform, for each feature included in the task completion data feature group, the following normalization operation: acquiring a feature category, a feature feedback direction and a preset feature minimum value and maximum value of the feature, wherein the feature feedback direction is used for representing a correlation direction between the feature value of the feature and the degree of capability of completing the task, and the feature feedback direction is positive correlation or negative correlation; determining a normalization method corresponding to the feature according to the feature class of the feature; and normalizing the characteristic value of the characteristic in each task completion data characteristic group based on the characteristic feedback direction of the characteristic, the preset characteristic minimum value and the maximum value according to a normalization method corresponding to the characteristic, so as to obtain the normalized characteristic value of the characteristic in the corresponding task completion data characteristic group.
In some alternative embodiments, the feature categories include a ratio class feature, a time class feature, and other class features; and the determining a normalization method corresponding to the feature according to the feature category of the feature comprises:
in response to determining that the feature class of the feature is a ratio class feature or a time class feature, determining that a normalization method corresponding to the feature is post-preprocessing normalization;
in response to determining that the feature class of the feature is other class of features, determining that the normalization method corresponding to the feature is conventional normalization.
In some optional embodiments, the normalizing the feature value of the feature in each task completion data feature group according to the normalization method corresponding to the feature based on the feature feedback direction of the feature, the preset feature minimum value and the maximum value, to obtain a normalized feature value of the feature in the corresponding task completion data feature group, including:
in response to determining that the normalization method corresponding to the feature is conventional normalization, for each task completion data feature group, substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the preset feature minimum value and maximum value into the corresponding variables of the following conventional normalization formula, and determining the output conventional normalization result as the normalized feature value of the feature in the task completion data feature group: [formula and variable symbols not preserved in this text]
In some optional embodiments, the normalizing the feature value of the feature in each task completion data feature group according to the normalization method corresponding to the feature, based on the feature feedback direction of the feature and the preset feature minimum value and maximum value, to obtain a normalized feature value of the feature in the corresponding task completion data feature group comprises:
in response to determining that the normalization method corresponding to the feature is post-preprocessing normalization, determining a preprocessing method corresponding to the feature according to the feature category of the feature;
preprocessing, according to the determined preprocessing method, the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group;
for each task completion data feature group: substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the preset feature minimum value and maximum value into the corresponding variables of the conventional normalization formula, and determining the output conventional normalization result as the conventional normalized feature value of the feature in the task completion data feature group; and substituting the processed feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the processed preset feature minimum value and maximum value into the corresponding variables of the conventional normalization formula, and determining the output conventional normalization result as the processed normalized feature value of the feature in the task completion data feature group;
calculating distribution bias based on the distribution of the conventional normalized feature values of the feature in each task completion data feature group to obtain the conventional normalized distribution bias of the feature;
calculating distribution bias based on the distribution of the processed normalized feature values of the feature in each task completion data feature group to obtain the preprocessed normalized distribution bias of the feature;
determining whether the absolute value of the preprocessed normalized distribution bias of the feature is less than the conventional normalized distribution bias of the feature;
in response to determining less than, determining a processed normalized feature value for the feature in each of the task completion data feature groups as a normalized feature value for the feature in the task completion data feature group;
and in response to determining that it is not less, determining the conventional normalized feature value of the feature in each task completion data feature group as the normalized feature value of the feature in the task completion data feature group.
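The selection between the two candidate normalizations can be sketched as follows. This is only an illustrative sketch under stated assumptions: the conventional normalization formula is defined elsewhere in this disclosure, so a plain min-max formula that is flipped for negatively correlated features is assumed here; "distribution bias" is assumed to mean skewness; and the absolute values of both biases are compared. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev

def min_max_normalize(value, lo, hi, positive=True):
    # Assumed stand-in for the conventional normalization formula:
    # min-max scaling, flipped when the feedback direction is negative.
    scaled = (value - lo) / (hi - lo)
    return scaled if positive else 1.0 - scaled

def skewness(values):
    # Assumed meaning of "distribution bias": population skewness.
    m, s = mean(values), pstdev(values)
    if s == 0:
        return 0.0
    return mean([((v - m) / s) ** 3 for v in values])

def normalize_with_optional_preprocessing(values, lo, hi, positive, preprocess):
    # Candidate 1: conventional normalization of the raw values.
    conventional = [min_max_normalize(v, lo, hi, positive) for v in values]
    # Candidate 2: preprocess values AND preset bounds, then normalize.
    p_lo, p_hi = preprocess(lo), preprocess(hi)
    processed = [min_max_normalize(preprocess(v), p_lo, p_hi, positive)
                 for v in values]
    # Keep the preprocessed result only if it is less skewed.
    if abs(skewness(processed)) < abs(skewness(conventional)):
        return processed
    return conventional

print(normalize_with_optional_preprocessing([1, 2, 3, 4, 100], 1, 100, True, math.log))
```

In this example the raw values are strongly right-skewed, so the logarithmically preprocessed candidate is the one returned.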
In some optional embodiments, the determining a preprocessing method corresponding to the feature according to the feature category of the feature includes:
in response to determining that the feature category of the feature is a ratio-class feature, determining that the preprocessing method corresponding to the feature is exponentiation; and
the preprocessing, according to the determined preprocessing method, the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, respectively, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, includes:
in response to the determined preprocessing method being exponentiation, performing exponentiation with a first preset constant as the base and with the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group, respectively, as the exponent, and determining the resulting exponentiation results as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, respectively.
In some alternative embodiments, the first preset constant is a natural constant.
In some optional embodiments, the determining a preprocessing method corresponding to the feature according to the feature category of the feature includes:
in response to determining that the feature category of the feature is a time-class feature and the feature feedback direction of the feature is positive correlation, determining that the preprocessing method corresponding to the feature is base exponentiation;
in response to determining that the feature category of the feature is a time-class feature and the feature feedback direction of the feature is negative correlation, determining that the preprocessing method corresponding to the feature is a logarithmic operation; and
the preprocessing, according to the determined preprocessing method, the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, respectively, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, includes:
in response to the determined preprocessing method being base exponentiation, performing exponentiation with a second preset constant as the exponent and with the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group, respectively, as the base, and determining the resulting exponentiation results as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, respectively;
and in response to the determined preprocessing method being a logarithmic operation, taking the logarithm, with a third preset constant as the base, of the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group, respectively, and determining the resulting logarithm results as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, respectively.
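The three preprocessing choices described above can be illustrated with a short sketch. The category labels (`"ratio"`, `"time"`), the function names, and the default values of the second and third preset constants are hypothetical; the disclosure only states that the first preset constant may be the natural constant.

```python
import math

def choose_preprocessing(category, positive,
                         first_constant=math.e,   # base for ratio-class features
                         second_constant=2.0,     # exponent (assumed value)
                         third_constant=math.e):  # logarithm base (assumed value)
    """Return a one-argument preprocessing function, or None if no rule applies."""
    if category == "ratio":
        # Exponentiation: constant base, feature value as exponent.
        return lambda v: first_constant ** v
    if category == "time" and positive:
        # Base exponentiation: feature value as base, constant exponent.
        return lambda v: v ** second_constant
    if category == "time" and not positive:
        # Logarithmic operation with a constant base (requires v > 0).
        return lambda v: math.log(v, third_constant)
    return None

def preprocess_feature(values, lo, hi, fn):
    """Apply the same preprocessing to the preset bounds and to every value."""
    return [fn(v) for v in values], fn(lo), fn(hi)

fn = choose_preprocessing("time", positive=True)
print(preprocess_feature([1.0, 2.0, 3.0], 0.5, 4.0, fn))
```

Applying one function to the bounds and the values alike keeps the processed values inside the processed range, provided the function is monotonic on that range.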
In some alternative embodiments, each of the task completion data feature groups includes K features; and
the apparatus further comprises:
a generating unit configured to generate a task completion data normalized feature group corresponding to each of the task completion data feature groups by using a normalized feature value of each feature in the task completion data feature groups, and generate a task completion data normalized feature group set by using a task completion data normalized feature group corresponding to each of the task completion data feature groups in the task completion data feature group set;
a calculating unit configured to calculate a correlation coefficient and a P value between any two different features of K features included in each task completion data normalization feature group based on the task completion data normalization feature group set;
A first determining unit configured to determine the number M of task completion data normalization feature groups to be generated according to the number N of task completion data normalization feature groups in the task completion data normalization feature group set, and generate an empty simulation task completion data normalization feature group set;
a second determining unit configured to determine, for each of the K features, at least two value intervals corresponding to the feature and a screening probability corresponding to each value interval;
a simulation unit configured to perform a simulation task completion data normalization feature group generation operation until the number of simulation task completion data normalization feature groups in the simulation task completion data normalization feature group set is not less than M, the simulation task completion data normalization feature group generation operation including:
creating a new simulation task completion data normalization feature group;
determining, according to the screening probability corresponding to each value interval of the 1st feature of the K features, a screening value interval of the 1st feature among the value intervals of the 1st feature, and randomly determining a value within the screening value interval of the 1st feature as the feature value of the 1st feature in the new simulation task completion data normalization feature group;
setting the initial value of a positive integer j to 2;
for the j-th feature of the K features, performing a feature value generation operation until j is K, the feature value generation operation including: determining, among the first j-1 features, the feature whose correlation coefficient with the j-th feature has the largest absolute value as the most relevant feature of the j-th feature; determining whether the absolute value of the correlation coefficient between the j-th feature and the most relevant feature of the j-th feature is greater than a preset correlation coefficient threshold and the P value is less than a preset P value threshold; in response to determining yes, determining the screening value interval of the j-th feature according to the screening value interval of the most relevant feature of the j-th feature; in response to determining no, determining the screening value interval of the j-th feature among the value intervals of the j-th feature according to the screening probability corresponding to each value interval of the j-th feature; randomly determining a value within the screening value interval of the j-th feature as the feature value of the j-th feature in the new simulation task completion data normalization feature group; and, after increasing the value of j by 1, continuing to perform the feature value generation operation.
In some optional embodiments, the determining the screening value interval of the j-th feature according to the screening value interval of the most relevant feature of the j-th feature includes:
in response to determining that the correlation coefficient between the j-th feature and the most relevant feature of the j-th feature is greater than the preset correlation coefficient threshold, the preset correlation coefficient threshold being a constant greater than zero, determining the screening value interval of the most relevant feature of the j-th feature as the screening value interval of the j-th feature;
and in response to determining that the correlation coefficient between the j-th feature and the most relevant feature of the j-th feature is less than the negative of the preset correlation coefficient threshold, determining the inverse value interval, among the value intervals of the j-th feature, of the screening value interval of the most relevant feature of the j-th feature as the screening value interval of the j-th feature, wherein the minimum value and the maximum value of the inverse value interval are, respectively, 1 minus the maximum value and 1 minus the minimum value of the screening value interval of the most relevant feature of the j-th feature.
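The generation of simulated feature groups described above might be sketched as follows. This is a simplified illustration, not the claimed implementation: correlation coefficients and P values are assumed to be precomputed and passed in as K x K nested lists, the threshold defaults are placeholders, values are drawn uniformly within the chosen interval, and all names are hypothetical.

```python
import random

def pick_interval(intervals, probabilities, rng=random):
    """Draw one (low, high) interval according to its screening probability."""
    return rng.choices(intervals, weights=probabilities, k=1)[0]

def simulate_group(intervals, probabilities, corr, pvals,
                   corr_threshold=0.5, p_threshold=0.05, rng=random):
    """Generate one simulated normalized feature group of K values in [0, 1]."""
    chosen, values = [], []
    for j in range(len(intervals)):
        interval = None
        if j > 0:
            # Most relevant feature: largest |correlation| among earlier features.
            best = max(range(j), key=lambda i: abs(corr[j][i]))
            r, p = corr[j][best], pvals[j][best]
            if abs(r) > corr_threshold and p < p_threshold:
                low, high = chosen[best]
                # Positive correlation: reuse the interval.
                # Negative correlation: use the inverse interval (1 - high, 1 - low).
                interval = (low, high) if r > 0 else (1.0 - high, 1.0 - low)
        if interval is None:
            # No strong, significant correlation: fall back to the probabilities.
            interval = pick_interval(intervals[j], probabilities[j], rng)
        chosen.append(interval)
        values.append(rng.uniform(*interval))
    return values

def simulate_set(m, intervals, probabilities, corr, pvals, **kwargs):
    """Repeat the generation until the simulated set holds at least m groups."""
    groups = []
    while len(groups) < m:
        groups.append(simulate_group(intervals, probabilities, corr, pvals, **kwargs))
    return groups
```

Passing a seeded `random.Random` instance as `rng` makes the simulated set reproducible.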
In some optional embodiments, the apparatus further comprises a boundary sample generation unit configured to, before the performing of the simulation task completion data normalization feature group generation operation until the number of simulation task completion data normalization feature groups in the simulation task completion data normalization feature group set is not less than M:
Determining whether an upper boundary task completion data normalization feature group and a lower boundary task completion data normalization feature group exist in the task completion data normalization feature group set, wherein each feature value in the upper boundary task completion data normalization feature group is 1, and each feature value in the lower boundary task completion data normalization feature group is 0;
generating an upper boundary task completion data normalization feature set and adding the upper boundary task completion data normalization feature set to the simulation task completion data normalization feature set in response to determining that the upper boundary task completion data normalization feature set does not exist;
and generating a lower boundary task completion data normalization feature set and adding the lower boundary task completion data normalization feature set to the simulation task completion data normalization feature set in response to determining that the lower boundary task completion data normalization feature set does not exist.
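The boundary-sample check can be illustrated with a few lines of code. This is a minimal sketch; representing each feature group as a list of K numbers and the function name `add_boundary_groups` are assumptions made for illustration.

```python
def add_boundary_groups(real_groups, simulated_groups, k):
    """Append all-ones / all-zeros groups to the simulated set if the real set lacks them."""
    upper = [1.0] * k  # upper boundary group: every feature value is 1
    lower = [0.0] * k  # lower boundary group: every feature value is 0
    if not any(list(g) == upper for g in real_groups):
        simulated_groups.append(upper)
    if not any(list(g) == lower for g in real_groups):
        simulated_groups.append(lower)
    return simulated_groups

print(add_boundary_groups([[0.2, 0.7], [1.0, 1.0]], [], k=2))
```

Here the real set already contains the upper boundary group, so only the lower boundary group is added.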
In some alternative embodiments, the at least one feature comprises at least one of: a feature for assessing attention, a feature for assessing self-control, and a feature for assessing conversion power.
In some alternative embodiments, the feature for assessing attention includes at least one of: standard deviation of the number of consecutive correct answers, weighted average of the number of consecutive correct answers, longest concentration duration, and time required to enter the longest concentration.
In some alternative embodiments, the feature for assessing self-control comprises at least one of: correct suppression ratio for the inoperable period and error rate under interference.
In some alternative embodiments, the feature for assessing conversion power comprises at least one of: accuracy on thread-switching questions, correct response time on thread-switching questions, difference in answer accuracy under different rules, and difference in response under different rules.
In a third aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage device having one or more programs stored thereon, which when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the implementations of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by one or more processors, implements a method as described in any of the implementations of the first aspect.
The task completion data feature group normalization method, apparatus, electronic device, and storage medium provided by embodiments of the present disclosure normalize task completion data features of different types as follows: for each feature included in the task completion data feature group, a normalization method corresponding to the feature is determined according to the feature category of the feature; and the feature value of the feature in each task completion data feature group is normalized according to that normalization method, based on the feature feedback direction of the feature and the preset feature minimum value and maximum value, to obtain the normalized feature value of the feature in the corresponding task completion data feature group. The feature normalization process thus considers not only the value range of the feature but also the feature feedback direction and the feature category, achieving targeted feature normalization.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments made with reference to the following drawings. The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention. In the drawings:
FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present disclosure may be applied;
FIG. 2A is a flow chart of one embodiment of a task completion data feature set normalization method according to the present disclosure;
FIG. 2B is an exploded flow chart of one embodiment of step 2022 according to the present disclosure;
FIG. 2C is an exploded flow chart of one embodiment of step 2023 according to the present disclosure;
FIG. 2D is an exploded flow chart of one embodiment of step 20232 according to the present disclosure;
FIG. 2E is an exploded flow chart of one embodiment of step 20233 according to the present disclosure;
FIG. 3A is a flowchart of yet another embodiment of a task completion data feature set normalization method according to the present disclosure;
FIG. 3B is an exploded flow chart of one embodiment of step 307 according to the present disclosure;
FIG. 3C is an exploded flow chart of one embodiment of step 3074 according to the present disclosure;
FIG. 4 is a schematic structural view of one embodiment of a task completion data feature set normalization device according to the present disclosure;
FIG. 5 is a schematic structural diagram of a computer system suitable for implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the task completion data feature set normalization methods, apparatus, electronic devices, and storage media of the present disclosure may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, fiber optic cables, and the like.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a cognitive ability assessment class application, a cognitive ability training class application, a short video social class application, an audio-video conference class application, a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a sound collection device (e.g., a microphone), a video collection device (e.g., a camera), and a display screen, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 players, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide a task completion data feature set normalization service) or as a single piece of software or software module. No specific limitation is imposed herein.
In some cases, the task completion data feature group normalization method provided by the present disclosure may be performed by the terminal device 101, 102, 103, and accordingly, the task completion data feature group normalization apparatus may be provided in the terminal device 101, 102, 103. In this case, the system architecture 100 may not include the server 105.
In some cases, the task completion data feature group normalization method provided by the present disclosure may be performed jointly by the terminal device 101, 102, 103 and the server 105, for example, the step of "acquiring the task completion data feature group set" may be performed by the terminal device 101, 102, 103, the step of "performing normalization operation for each feature included in the task completion data feature group" may be performed by the server 105, and so on. The present disclosure is not limited in this regard. Accordingly, task completion data feature group normalization means may also be provided in the terminal devices 101, 102, 103 and the server 105, respectively.
In some cases, the task completion data feature group normalization method provided by the present disclosure may be executed by the server 105, and accordingly, the task completion data feature group normalization apparatus may also be disposed in the server 105, where the system architecture 100 may also not include the terminal devices 101, 102, 103.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster formed by a plurality of servers, or as a single server. When server 105 is software, it may be implemented as a plurality of software or software modules (e.g., to provide distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2A, there is shown a flow 200 of one embodiment of a task completion data feature set normalization method according to the present disclosure, the task completion data feature set normalization method comprising the steps of:
Step 201: acquire a task completion data feature group set.
In this embodiment, the execution body of the task completion data feature group normalization method (e.g., the server shown in fig. 1) may first acquire the task completion data feature group set.
Here, the task completion data feature group may include feature values of at least one feature obtained by feature extraction of task completion data corresponding to the task completion of the subject.
The task completion data may include at least one of: operational behavior data in the process of completing the task by the subject, performance result data of completing the task by the subject and task general data.
The operational behavior data of the subject in completing the task is a digital record of the entire task completion process, and may specifically include log data of the task completion process, which may, for example, record various information related to each operation of the subject during the task. As an example, the operational behavior data of the subject in completing the task may include: user identification, checkpoint, question number, operation category, operation object, state update data of the operation object, operation object attribute, timestamp, operation attribute, and operation result. Wherein:
The user identification indicates the subject performing the task.
The checkpoint indicates which level of the task the subject is currently performing.
The question number is used to indicate a specific question in the checkpoint currently being performed by the subject.
The operation category is used to indicate a specific operation of the subject. For example, the operation category may include single click, double click, drag, slide, and the like.
The operation object is used to indicate an object for which a specific operation of the subject is directed. For example, the operation object may be a picture, a button, a prompt area, an effective answer area, an ineffective answer area, a pause button, or the like.
The state update data of the operation object may include, for example, a state change type of the operation object, contents after the state change, a state attribute, and the like.
The operation object attribute may include, for example, a position coordinate where an operation object or a hint information or the like appears.
The timestamp indicates the time corresponding to the specific operation of the subject. For example, when the specific operation of the subject is a click, the timestamp may be the time at which the subject completed the click. When the specific operation of the subject is a drag, the timestamp may include the time at which the subject's finger pressed down and the time at which the finger lifted.
The operation attribute records the attribute value corresponding to the specific operation of the subject. For example, the operation attribute may include the position coordinates of the subject's click operation, or the number or index of the specific option the subject clicked in the prompt.
The result of the operation is used to indicate the result of a specific operation by the subject, for example, the result of the operation may be used to indicate whether the answer is correct or incorrect.
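For illustration, one log entry with the fields listed above could be modeled as a simple record type. This is a hypothetical sketch: the class name, field names, and field types are assumptions and are not specified by the disclosure.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class OperationRecord:
    """One logged operation of a subject during a task (hypothetical layout)."""
    user_id: str                      # user identification
    level: int                        # checkpoint (level) being performed
    question_number: int              # question within the level
    operation_category: str           # e.g. "click", "double_click", "drag", "slide"
    operation_object: str             # e.g. "button", "picture", "pause_button"
    state_update: Dict[str, Any] = field(default_factory=dict)       # state change data
    object_attributes: Dict[str, Any] = field(default_factory=dict)  # e.g. position
    timestamps: List[float] = field(default_factory=list)  # one for click, two for drag
    operation_attributes: Dict[str, Any] = field(default_factory=dict)
    result: Optional[bool] = None     # True if the answer was correct

record = OperationRecord(user_id="u1", level=1, question_number=3,
                         operation_category="drag", operation_object="picture",
                         timestamps=[12.5, 13.1], result=True)
print(record)
```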
The performance result data of the subject's task completion may include the difficulty level of each subtask (or question) included in the task completed by the subject, the subject's answer accuracy for each subtask, the total number of subtasks (or questions) included in the task, the number of subtasks (or questions) answered correctly, the time taken by the subject to complete the task, and the like.
The task general data of the subject to complete the task may include: the time stamp of the task starting and ending time, the completion degree of the task, the browser information of the browser used by the subject to complete the task, the initial input mode corresponding to the task completed by the subject, the task score, the grade of the task completed by the subject, the self-adaptive grade when the task is ended, the task configuration information when the task is completed, and the like. Here, the task configuration information refers to a parameter configuration that may affect task execution difficulty, and may include task difficulty level parameter configuration, for example.
In practice, according to the requirement of specific capability expected to be evaluated by a task, feature extraction is performed by adopting corresponding different feature extraction methods on task completion data to obtain feature values of different features, so that a task completion data feature set is formed.
The tasks corresponding to different task completion data feature sets in the task completion data feature set may be the same task. The subjects corresponding to different task completion data feature groups in the set of task completion data feature groups may be different.
In some alternative embodiments, the at least one feature may include at least one of: a feature for assessing attention, a feature for assessing self-control, and a feature for assessing conversion power.
In some alternative embodiments, the feature for assessing attention may include at least one of: standard deviation of the number of consecutive correct answers, weighted average of the number of consecutive correct answers, longest concentration duration, and time required to enter the longest concentration.
Regarding the standard deviation of the number of consecutive correct answers:
The subject may answer multiple questions correctly in succession while completing a task. The time during which the subject answers correctly in succession is concentration time. If, after each run of consecutive correct answers, the subject can quickly readjust and enter another run of similar length, this reflects good attention stability throughout the task. Therefore, a smaller standard deviation of the number of consecutive correct answers reflects better attention stability of the subject.
The standard deviation of the number of consecutive correct answers can be calculated as follows:
First, the sequence of numbers of consecutive correct answers of the subject during the task is obtained from the task completion data.
Here, answering 2 questions correctly in a row counts as 1 consecutive correct answer, answering 4 questions correctly in a row counts as 3, and answering 1 question correctly and then making an error counts as 0. That is, the number of consecutive correct answers is the number of questions answered correctly in a row minus one.
For example, the sequence of numbers of consecutive correct answers may be (3, 4, 5, 6). This sequence indicates that the subject answered 4 questions correctly in a row, then answered a1 questions incorrectly, then answered 5 questions correctly in a row, then answered a2 questions incorrectly, then answered 6 questions correctly in a row, then answered a3 questions incorrectly, and then answered 7 questions correctly in a row, where a1, a2, and a3 are positive integers greater than or equal to 1.
Then, the standard deviation of the obtained sequence of the continuous number of questions is calculated, and the characteristic value of the characteristic of the standard deviation of the continuous number of questions is obtained.
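The two steps above can be sketched in code. The representation of the answers as a list of booleans and the function name are assumptions; the disclosure also does not say whether the population or the sample standard deviation is intended, so the population form (`statistics.pstdev`) is used here as an assumption.

```python
from statistics import pstdev

def consecutive_correct_counts(results):
    """Turn per-question correctness into the sequence of run lengths minus one."""
    counts, run = [], 0
    for correct in results:
        if correct:
            run += 1
        elif run:
            counts.append(run - 1)  # a run of L correct answers counts as L - 1
            run = 0
    if run:
        counts.append(run - 1)      # close a run that lasts until the task ends
    return counts

# 4 correct, 1 wrong, 5 correct, 2 wrong, 6 correct, 1 wrong, 7 correct
answers = [True] * 4 + [False] + [True] * 5 + [False] * 2 + [True] * 6 + [False] + [True] * 7
counts = consecutive_correct_counts(answers)
print(counts, pstdev(counts))
```

For this input the run-length sequence is (3, 4, 5, 6), matching the example in the text.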
Regarding the weighted average of the number of consecutive correct answers:
During the task, the subject may sometimes answer correctly for a long run and sometimes only for a short run. When evaluating the subject's ability to stay concentrated on answering correctly, a plain average of the numbers of consecutive correct answers is inadequate because it ignores the strongest ability the subject can actually reach, whereas a weighted average better reflects the subject's concentration level over the whole task.
The weighted average of the number of consecutive correct answers is calculated as follows:
First, the sequence of numbers of consecutive correct answers is obtained: (c_1, c_2, …, c_i, …, c_n), where n is the number of elements in the sequence, i is a positive integer between 1 and n, and c_i is the i-th number of consecutive correct answers of the subject.
Then, the weighted average of the number of consecutive correct answers is calculated according to the following formula:
where w_i is the weight of the i-th number of consecutive correct answers c_i, and the result of the formula is the calculated weighted average of the number of consecutive correct answers.
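A generic weighted mean over the sequence of consecutive-correct counts can be sketched as below. The exact weighting used by the disclosure is not recoverable from this text, so the weights are left as a caller-supplied argument; the default of weighting each count by itself (so that longer runs count more) is purely an assumption for illustration, as is the function name.

```python
def weighted_average(counts, weights=None):
    """Weighted mean of consecutive-correct counts: sum(w*c) / sum(w)."""
    if weights is None:
        # Assumed default: each run is weighted by its own length,
        # so the subject's longest runs dominate the result.
        weights = counts
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * c for w, c in zip(weights, counts)) / total

print(weighted_average([3, 4, 5, 6]))                # self-weighted
print(weighted_average([3, 4, 5, 6], [1, 1, 1, 1]))  # equal weights: plain mean
```

With equal weights the function reduces to the plain average, which makes the effect of any chosen weighting easy to compare.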
Regarding the longest concentration duration and the time required to enter the longest concentration:
The duration of the subject's longest run of consecutive correct answers over the whole task is the longest concentration duration, which reflects the peak level of the subject's sustained attention. The longer the longest concentration duration, the higher that peak level.
From the start of the task, the subject keeps adjusting and becomes increasingly familiar with the task requirements, so the time required to enter the longest concentration reflects how quickly the subject adjusts.
The longest concentration duration is calculated as follows:
Taking the sequence of numbers of consecutive correct answers (c_1, c_2, …, c_i, …, c_n) as an example, the durations of the corresponding runs of consecutive correct answers, each obtained as the end time minus the start time of the run, are (t_1, t_2, …, t_i, …, t_n).
If the maximum number of consecutive correct answers during the task is c_m, then the duration t_m corresponding to c_m is the longest concentration duration.
The time required to enter the longest concentration is the start time of the m-th run of consecutive correct answers.
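A small sketch of this calculation follows. It assumes that each run of consecutive correct answers comes with a start time and an end time in seconds, that the time required to enter the longest concentration is measured from the task start, and that the first run wins when several runs tie for the maximum; the function name and argument layout are hypothetical.

```python
def longest_concentration(counts, starts, ends, task_start=0.0):
    """Return (longest concentration duration, time needed to enter it)."""
    # Index of the run with the most consecutive correct answers
    # (max() keeps the first such index when there are ties).
    m = max(range(len(counts)), key=lambda i: counts[i])
    duration = ends[m] - starts[m]       # end time minus start time of that run
    time_to_enter = starts[m] - task_start
    return duration, time_to_enter

print(longest_concentration([3, 6, 5], starts=[0.0, 10.0, 30.0], ends=[8.0, 25.0, 42.0]))
```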
In some alternative embodiments, the features for assessing self-control may include at least one of: the correct suppression ratio during the inoperable period and the interfered error rate.
Suppose that, during the inoperable period t, the stimulus is presented A times and the subject clicks the stimulus B times. The correct suppression ratio r for the inoperable period can then be calculated as follows:
$$r = \frac{A - B}{A}$$
where r is the calculated correct suppression ratio for the inoperable period. The larger r is, the stronger the self-control.
For ease of understanding, the following is illustrative:
For example, the task is divided into two kinds of period: operable periods and inoperable periods. In both, bombs attacking a castle (the stimuli) appear in the user interface, and clicking a bomb stops its attack. During an operable period, the subject should click the bombs to stop the attack. During an inoperable period, a signal prompt (e.g., a flashing red-light icon) may be shown in the user interface, and the rules require the subject to do nothing: the same stimuli (bombs) still appear, but the subject should suppress the impulse to click them. If the subject clicks a bomb during an inoperable period, the impulse was not suppressed, which indicates weaker self-control. Conversely, if the subject does not click during an inoperable period, the impulse was suppressed, which indicates stronger self-control.
Thus, in the formula, A − B is the number of times the subject suppressed the impulse and did not click the stimulus during the inoperable period. It can be seen that the larger r is, the stronger the self-control; the two are positively correlated.
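As a small worked example of the correct suppression ratio, the following Python sketch computes it from the two counts. The function name and the input validation are illustrative additions, not part of the described method.

```python
def correct_suppression_ratio(presented: int, clicked: int) -> float:
    """Share of stimuli in the inoperable period that the subject did NOT click.

    presented: number of stimuli shown during the inoperable period (A).
    clicked:   number of those stimuli the subject clicked (B).
    """
    if presented <= 0:
        raise ValueError("presented must be positive")
    if not 0 <= clicked <= presented:
        raise ValueError("clicked must be between 0 and presented")
    return (presented - clicked) / presented


# 20 bombs appeared while clicking was forbidden; the subject clicked 5 of them.
print(correct_suppression_ratio(20, 5))
```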
Regarding the interfered error rate: suppose that, in the task, the number of questions in which the direction of the surrounding interfering objects is inconsistent with the correct direction is C, and that among these questions the number in which the subject is affected by the surrounding interfering objects and selects the wrong direction is D. The interfered error rate may then be calculated according to the following formula:
$$e = \frac{D}{C}$$
where e is the calculated interfered error rate. The larger e is, the worse the self-control; that is, the interfered error rate is negatively correlated with self-control.
For ease of understanding, the following is illustrative: the task requires the subject to select one of up, down, left and right according to the direction of the middle small-fish icon presented in the user interface. In some questions the surrounding small-fish icons point in the same direction as the middle icon, and in others they do not. Under interference from the surrounding small-fish icons, if the direction selected by the subject is consistent with the direction of the middle small-fish icon, the question is answered correctly. Here, the surrounding small-fish icons are the interfering objects.
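The interfered error rate for a task like this one can be sketched in Python as below. The record layout is hypothetical, and counting every wrong choice on a question with inconsistent surrounding icons as an interfered error is an assumption; the text does not say whether only choices matching the surrounding direction should count.

```python
from typing import NamedTuple, Sequence


class FlankerTrial(NamedTuple):
    center: str    # direction of the middle icon, i.e. the correct answer
    flankers: str  # direction of the surrounding (interfering) icons
    chosen: str    # direction selected by the subject


def interfered_error_rate(trials: Sequence[FlankerTrial]) -> float:
    """D / C: wrong answers among questions whose surrounding icons disagree."""
    incongruent = [t for t in trials if t.flankers != t.center]
    if not incongruent:
        raise ValueError("no question with inconsistent surrounding icons")
    # ASSUMPTION: any wrong choice on such a question counts as interfered.
    errors = sum(1 for t in incongruent if t.chosen != t.center)
    return errors / len(incongruent)


trials = [
    FlankerTrial("left", "left", "left"),     # consistent, ignored
    FlankerTrial("left", "right", "right"),   # inconsistent, wrong
    FlankerTrial("up", "down", "up"),         # inconsistent, correct
    FlankerTrial("right", "left", "right"),   # inconsistent, correct
    FlankerTrial("down", "up", "left"),       # inconsistent, wrong
]
print(interfered_error_rate(trials))
```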
In some alternative embodiments, the features for evaluating switching ability may include at least one of: the accuracy of cue-switch questions, the correct response time of cue-switch questions, the difference in accuracy under different rules, and the difference in response time under different rules.
Regarding the accuracy of cue-switch questions:
In a continuous-selection task, the rule for judging whether a response meets the task requirement (i.e., whether the question is answered correctly) keeps switching; this rule switching is called cue switching. For example, the previous question requires selecting the option whose color matches the object described by the stem text, while the current question requires selecting the option whose meaning matches the stem text. That is, the stem is the same across questions, but the selection target changes with the rule that applies to each question. A rule change between two adjacent questions is called one cue switch. If the subject answers both the question before and the question after the switch correctly, the cue switch is considered to be answered correctly. Over the whole task, let the total number of cue switches be E and the number of correctly answered cue switches be F; the accuracy of cue-switch questions can then be calculated with the following formula:
$$a = \frac{F}{E}$$
That is, if the subject can still answer correctly after the rule changes (i.e., after a cue switch), the subject has better switching ability. The larger a is, the stronger the switching ability; that is, the accuracy of cue-switch questions is positively correlated with switching ability.
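The counting rule described above (a switch is any change of rule between adjacent questions, and it is answered correctly only if both questions around it are correct) can be sketched in Python as follows. The trial representation and rule labels are hypothetical.

```python
from typing import List, Tuple

# Each trial is (rule in force, whether the subject answered correctly).
Trial = Tuple[str, bool]


def cue_switch_accuracy(trials: List[Trial]) -> float:
    """F / E: correctly answered rule switches over all rule switches."""
    switches = 0
    correct_switches = 0
    for (prev_rule, prev_ok), (rule, ok) in zip(trials, trials[1:]):
        if rule != prev_rule:          # adjacent questions use different rules
            switches += 1
            if prev_ok and ok:         # both sides of the switch were correct
                correct_switches += 1
    if switches == 0:
        raise ValueError("the trial sequence contains no rule switch")
    return correct_switches / switches


trials = [("color", True), ("meaning", True), ("meaning", False),
          ("color", True), ("meaning", True)]
print(cue_switch_accuracy(trials))   # 3 switches, 2 answered correctly
```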
Regarding the correct response time of cue-switch questions:
The meaning of cue switching has been described above: when the subject answers both the question before and the question after a cue switch correctly, the cue switch is answered correctly. The correct response time of the post-switch question is the time from the appearance of the question after the cue switch until the subject responds. The correct response time of cue-switch questions is the average, over all correctly answered cue switches in the subject's whole task, of the correct response times of the post-switch questions. This feature reflects whether the subject can respond correctly and promptly when facing a cue switch, and thus reflects the subject's switching ability. It will be appreciated that the correct response time of cue-switch questions is negatively correlated with switching ability.
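A minimal Python sketch of this average, assuming each trial records the rule in force, whether the answer was correct, and the response time in seconds (a hypothetical layout chosen for illustration):

```python
from statistics import mean
from typing import List, Tuple

# Each trial is (rule in force, answered correctly, response time in seconds).
Trial = Tuple[str, bool, float]


def cue_switch_correct_rt(trials: List[Trial]) -> float:
    """Mean response time of the post-switch question over correct switches."""
    times = [
        cur[2]
        for prev, cur in zip(trials, trials[1:])
        if cur[0] != prev[0] and prev[1] and cur[1]
    ]
    if not times:
        raise ValueError("no correctly answered rule switch")
    return mean(times)


trials = [("color", True, 0.9), ("meaning", True, 1.2), ("meaning", False, 1.0),
          ("color", True, 1.5), ("meaning", True, 0.8)]
print(cue_switch_correct_rt(trials))
```

Only the second and fifth trials follow a correctly answered switch here, so only their response times enter the average.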
Regarding the difference in accuracy under different rules:
In a cue-switching task, suppose the subject must switch back and forth between two rules, R_1 and R_2, throughout the task, and the subject's response accuracy is C_1 under rule R_1 and C_2 under rule R_2. The difference in accuracy under different rules, ΔC, is the absolute value of the difference between C_1 and C_2, expressed as follows:
$$\Delta C = |C_1 - C_2|$$
If ΔC is relatively large, the subject has a response advantage under one of the two rules R_1 and R_2 and a disadvantage under the other, so the subject's switching ability is poor. Conversely, a smaller ΔC indicates stronger switching ability. That is, the difference in accuracy under different rules is negatively correlated with switching ability.
Regarding the difference in response time under different rules:
Based on the above, if during task completion the subject's average time for a correct response under rule R_1 is T_1 and the average time for a correct response under rule R_2 is T_2, the difference in response time under different rules, ΔT, can be calculated according to the following formula:
$$\Delta T = |T_1 - T_2|$$
A larger ΔT indicates that the subject responds faster under one of the two rules R_1 and R_2 and slower under the other, so the switching ability is relatively poor. A smaller ΔT indicates that the subject's response speed differs little between R_1 and R_2, so the switching ability is relatively strong.
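The two rule-difference features can be computed together, as in the Python sketch below. The trial layout and rule labels are hypothetical, and using the absolute difference for response time mirrors the accuracy feature; the response-time formula itself is not legible in the source, so that choice is an assumption.

```python
from statistics import mean
from typing import List, Tuple

# Each trial is (rule in force, answered correctly, response time in seconds).
Trial = Tuple[str, bool, float]


def rule_differences(trials: List[Trial], rule_1: str, rule_2: str) -> Tuple[float, float]:
    """Return (|C_1 - C_2|, |T_1 - T_2|) for the two rules."""

    def accuracy(rule: str) -> float:
        outcomes = [ok for r, ok, _ in trials if r == rule]
        return sum(outcomes) / len(outcomes)

    def mean_correct_rt(rule: str) -> float:
        # Only correct responses contribute to the average response time.
        return mean(rt for r, ok, rt in trials if r == rule and ok)

    delta_c = abs(accuracy(rule_1) - accuracy(rule_2))
    delta_t = abs(mean_correct_rt(rule_1) - mean_correct_rt(rule_2))
    return delta_c, delta_t


trials = [("R1", True, 1.0), ("R1", True, 2.0), ("R2", True, 3.0), ("R2", False, 9.0)]
print(rule_differences(trials, "R1", "R2"))
```

The sketch raises an error if a rule has no trials or no correct responses, which a production version would need to handle explicitly.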
Step 202, for each feature included in the task completion data feature set, performing a normalization operation.
Here, it is assumed that the task completion data feature group includes K features, K being a positive integer.
In this embodiment, the execution body may perform the normalization operation on each feature F_k of the K features, where k is a positive integer between 1 and K.
Here the normalization operation may include the following steps 2021 to 2023:
Step 2021, acquiring the feature class, the feature feedback direction, the preset feature minimum value and the preset feature maximum value of the feature.
That is, the feature class, the feature feedback direction, the preset feature minimum value and the preset feature maximum value of feature F_k are acquired here.
Here, different feature classes may be assigned to each feature in advance according to a specific situation of each feature in the K features, and each feature class may have a corresponding normalization method.
The feature feedback direction of feature F_k represents the direction of correlation between the feature value and the degree of ability to complete the task, and is either a positive correlation or a negative correlation. That is, if the feature feedback direction is a positive correlation, a larger feature value of F_k in a task completion data feature group indicates that the subject is stronger in the ability corresponding to the task (e.g., memory, if the task is used to evaluate memory); if it is a negative correlation, a larger feature value indicates that the subject is weaker in that ability.
The preset feature minimum value and maximum value of feature F_k may be the minimum and maximum of a value range preset according to actual needs. In practice, they may be reasonably specified according to the distribution of the feature values of F_k in the collected task completion data feature groups and according to the task settings. By way of example, assume feature F_k is the consecutive-correct count. The corresponding minimum value is 0, but the maximum value is not simply the largest value of F_k observed in the collected feature groups; it should be determined from the task settings as the largest consecutive-correct count that can be reached, so that every case from worst to best is covered. As for the reaction time of a click operation, experience shows that even professional e-sports players react in about 100 milliseconds at the fastest, while a reaction time of 6 seconds is slow even for an average person, so the minimum and maximum values of the click reaction time feature may be set to 100 milliseconds and 6 seconds, respectively.
Step 2022, determining a normalization method corresponding to the feature according to the feature class of the feature.
Since each feature is assigned in advance to a feature class, and each feature class can have a corresponding normalization method, the normalization method corresponding to feature F_k can be determined here according to the feature class of F_k.
Step 2023, normalizing the feature value of the feature in each task completion data feature set based on the feature feedback direction of the feature, the preset feature minimum value and the maximum value according to the normalization method corresponding to the feature, to obtain the normalized feature value of the feature in the corresponding task completion data feature set.
The normalization methods corresponding to the different feature classes consider not only the preset feature minimum value and maximum value of feature F_k but also its feature feedback direction, so that the normalization is more targeted.
In some alternative embodiments, the feature classes may include ratio class features, time class features, and other class features. Accordingly, step 2022 may include step 20221 and step 20222 as shown in fig. 2B:
Step 20221, in response to determining that the feature class of the feature is a ratio class feature or a time class feature, determining the normalization method corresponding to the feature to be post-preprocessing normalization.
Here, a ratio class feature may be, for example, a feature that is a ratio between two values, such as an accuracy rate. Through practical research, the applicant found that ratio class features are skewed toward the high end, i.e., the value of a ratio class feature is relatively high in most task completion data feature groups. To better distinguish between these values, the ratio class feature is first preprocessed to enlarge the differences between its higher feature values, and conventional normalization is then performed. The distribution skewness is then calculated for the data after preprocessing and conventional normalization. If its absolute value is smaller than that of the distribution skewness obtained with conventional normalization alone, the skew of the distribution has been improved and the preprocessed, conventionally normalized feature may be adopted; otherwise, the skew has not been improved and the conventionally normalized feature is adopted.
A time class feature is a time-related feature, such as reaction time. The applicant found that the value range of time class features is wide, so directly applying conventional normalization, which treats equal intervals equally, is not appropriate. For example, for a feature whose feedback direction is a negative correlation, such as reaction time, the difference between 100 milliseconds and 110 milliseconds and the difference between 5000 milliseconds and 5010 milliseconds are both 10 milliseconds. For the same task, however, shortening the reaction time from 110 milliseconds to 100 milliseconds and shortening it from 5010 milliseconds to 5000 milliseconds are completely different in what they demand of the subject: the former is clearly more difficult and requires greater ability. The gap between 100 milliseconds and 110 milliseconds should therefore be enlarged. For a feature whose feedback direction is a positive correlation, such as the duration for which concentration is maintained, extending the duration from 100 milliseconds to 110 milliseconds and extending it from 5000 milliseconds to 5010 milliseconds are likewise quite different: the latter is clearly more difficult and requires greater ability. The gap between 5000 milliseconds and 5010 milliseconds should therefore be enlarged. Accordingly, time class features may be preprocessed, i.e., according to the feature feedback direction, the differences within the part of the value range that demands more of the subject's ability are enlarged, and conventional normalization is then performed.
The distribution skewness is then calculated for the data after this preprocessing and conventional normalization. If its absolute value is smaller than that of the distribution skewness obtained with conventional normalization alone, the skew of the distribution has been improved and the preprocessed, conventionally normalized feature may be adopted; otherwise, the skew has not been improved and the conventionally normalized feature is adopted.
Step 20222, in response to determining that the feature class of the feature is an other class feature, determining the normalization method corresponding to the feature to be conventional normalization.
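Steps 2021 and 2022 amount to looking up a per-feature specification and mapping its class to a normalization method. A minimal Python sketch under that reading follows; the type names, field names and string labels are all hypothetical, and the 0.1 s and 6 s bounds are the click reaction time example given in the text.

```python
from dataclasses import dataclass
from enum import Enum


class FeatureClass(Enum):
    RATIO = "ratio"
    TIME = "time"
    OTHER = "other"


class Direction(Enum):
    POSITIVE = "positive"   # larger value means stronger ability
    NEGATIVE = "negative"   # larger value means weaker ability


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    feature_class: FeatureClass
    direction: Direction
    preset_min: float
    preset_max: float

    def __post_init__(self) -> None:
        if not self.preset_min < self.preset_max:
            raise ValueError("preset_min must be smaller than preset_max")


def normalization_method(spec: FeatureSpec) -> str:
    """Ratio and time features are preprocessed first; others are not."""
    if spec.feature_class in (FeatureClass.RATIO, FeatureClass.TIME):
        return "post_preprocessing"
    return "conventional"


reaction_time = FeatureSpec("click_reaction_time_s", FeatureClass.TIME,
                            Direction.NEGATIVE, preset_min=0.1, preset_max=6.0)
print(normalization_method(reaction_time))
```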
Based on the above-mentioned optional embodiment of classifying the feature class into the ratio class feature, the time class feature and the other class feature in step 2022, step 2023, according to the normalization method corresponding to the feature, normalizes the feature value of the feature in each task completion data feature group based on the feature feedback direction of the feature, the preset feature minimum value and the maximum value, to obtain the normalized feature value of the feature in the corresponding task completion data feature group, may include step 20231 shown in fig. 2C:
Step 20231, in response to determining that the normalization method corresponding to the feature is conventional normalization, for each task completion data feature group, substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, the preset feature minimum value and the preset feature maximum value into x, d, x_min and x_max of the conventional normalization formula, respectively, and determining the conventional normalization result y output by the formula as the normalized feature value of the feature in the task completion data feature group.
The conventional normalization formula is as follows:
$$y = \begin{cases} \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}, & \text{if } d \text{ is a positive correlation} \\[2ex] \dfrac{x_{\max} - x}{x_{\max} - x_{\min}}, & \text{if } d \text{ is a negative correlation} \end{cases}$$
where y is the conventional normalization result output by the conventional normalization formula.
It can be seen that the conventional normalization formula takes the feature feedback direction into account, so that the resulting conventional normalization result y is positively correlated with the degree of ability to complete the task and lies between 0 and 1. Thus, for every feature F_k of the K features, the conventional normalization result is positively correlated with the degree of ability to complete the task, and the conventional normalization result of each feature intuitively reflects that degree of ability.
Through step 20231, the normalized feature value of the feature in each corresponding task completion data feature group is obtained, at which point step 2023, and therefore step 202, is complete.
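Conventional normalization as described here behaves like direction-aware min-max scaling: the result is positively correlated with ability and lies between 0 and 1 for values inside the preset range. The Python sketch below is written under that reading; the boolean encoding of the feedback direction is an assumption, and the text does not say how values outside the preset range are treated, so no clipping is applied.

```python
def conventional_normalize(x: float, positive: bool,
                           x_min: float, x_max: float) -> float:
    """Direction-aware min-max scaling.

    positive=True  -> larger x maps to a larger result.
    positive=False -> larger x maps to a smaller result.
    """
    if x_max <= x_min:
        raise ValueError("x_max must be larger than x_min")
    span = x_max - x_min
    if positive:
        return (x - x_min) / span
    return (x_max - x) / span


# A consecutive-correct count of 7 on a 0..10 scale (positive direction).
print(conventional_normalize(7.0, True, 0.0, 10.0))
# A reaction time of 3.05 s on the 0.1 s..6 s scale (negative direction).
print(conventional_normalize(3.05, False, 0.1, 6.0))
```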
Optionally, step 2023 may also include the following steps 20232 to 20239 as shown in fig. 2C:
Step 20232, in response to determining that the normalization method corresponding to the feature is post-preprocessing normalization, determining the preprocessing method corresponding to the feature according to the feature class of the feature.
In practice, a corresponding preprocessing method may be preset according to the distribution characteristics of the feature data of each feature class, so that the feature data of that class are distributed more evenly and different data intervals reflect, roughly equally, the degree of ability shown in completing the task.
For example, as described for step 20221, ratio class features and time class features have different data distribution characteristics, so their preprocessing methods also differ. The preprocessing method corresponding to the feature can be determined from a preset correspondence between feature classes and preprocessing methods, together with the feature class of the feature.
After the execution of step 20232, the process goes to step 20233.
Step 20233, preprocessing the minimum and maximum values of the preset feature values of the feature and the feature values of the feature in the feature group of the task completion data respectively according to the determined preprocessing method, to obtain the minimum and maximum values of the processed preset feature values of the feature and the processed feature values of the feature in the feature group of the corresponding task completion data.
After step 20233, the processed preset feature minimum value and maximum value of feature F_k, and the processed feature value of F_k in each task completion data feature group, are obtained.
Optionally, step 20232 may include step 202321 as shown in fig. 2D:
Step 202321, in response to determining that the feature class of the feature is a ratio class feature, determining the preprocessing method corresponding to the feature to be an exponential operation.
Step 20233 may also include step 202331 as shown in fig. 2E, accordingly:
Step 202331, in response to the determined preprocessing method being an exponential operation, performing a power operation with a first preset constant as the base and with the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group respectively as the exponent, and determining the results respectively as the processed preset feature minimum value of the feature, the processed preset feature maximum value of the feature, and the processed feature value of the feature in the corresponding task completion data feature group.
Here, the first preset constant is a constant greater than 1. A power operation with the first preset constant as the base and the feature value of the feature in each task completion data feature group as the exponent yields processed feature values in which, compared with the values before the operation, the differences within the higher range of feature values are enlarged. This achieves the enlargement of differences between the higher feature values of ratio class features.
Optionally, the first preset constant may be the natural constant e.
Optionally, step 20232 may include the following steps 202322 and 202323 shown in fig. 2D:
Step 202322, in response to determining that the feature class of the feature is a time class feature and the feature feedback direction of the feature is a positive correlation, determining the preprocessing method corresponding to the feature to be a base power operation.
Step 202323, in response to determining that the feature class of the feature is a time class feature and the feature feedback direction of the feature is a negative correlation, determining the preprocessing method corresponding to the feature to be a logarithmic operation.
Step 20233 may also include steps 202332 and 202333 as follows:
Step 202332, in response to the determined preprocessing method being a base power operation, performing a power operation with a second preset constant as the exponent and with the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group respectively as the base, and determining the results respectively as the processed preset feature minimum value of the feature, the processed preset feature maximum value of the feature, and the processed feature value of the feature in the corresponding task completion data feature group.
Here, the second preset constant is a constant greater than 1. When the feature class of the feature is a time class feature and the feature feedback direction is a positive correlation, a larger feature value indicates a higher degree of the ability corresponding to the task. Take concentration duration as an example, where the corresponding ability is attention: the longer the concentration duration, the stronger the attention, so the differences within the higher range of feature values should be enlarged. As described above, extending the concentration duration from 5000 milliseconds to 5010 milliseconds clearly demands more of the subject's attention than extending it from 100 milliseconds to 110 milliseconds. A power operation with the second preset constant as the exponent and the feature value of the feature in each task completion data feature group as the base yields processed feature values in which, compared with the values before the operation, the differences within the part that demands more of the subject's ability, i.e., the higher range of feature values, are enlarged.
As an example, the second preset constant may be 2.
To facilitate the subsequent conventional normalization, the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group may be converted to values in seconds before the base power operation in step 202332.
Step 202333, in response to the determined preprocessing method being a logarithmic operation, taking the logarithm, with a third preset constant as the base, of the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group respectively, and determining the results respectively as the processed preset feature minimum value of the feature, the processed preset feature maximum value of the feature, and the processed feature value of the feature in the corresponding task completion data feature group.
Here, the third preset constant is a constant greater than 1. When the feature class of the feature is a time class feature and the feature feedback direction is a negative correlation, a larger feature value indicates a lower degree of the ability corresponding to the task. Take reaction duration as an example, where the corresponding ability may again be attention: a shorter reaction duration indicates stronger attention, so the differences within the lower range of feature values should be enlarged. As described above, shortening the reaction duration from 110 milliseconds to 100 milliseconds clearly demands more of the subject's attention than shortening it from 5010 milliseconds to 5000 milliseconds. Taking the logarithm, with the third preset constant as the base, of the feature value of the feature in each task completion data feature group yields processed feature values in which, compared with the values before the operation, the differences within the part that demands more of the subject's ability, i.e., the lower range of feature values, are enlarged.
As an example, the third preset constant may be 2 or 10.
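The three preprocessing operations can be gathered into one Python function, sketched below. The defaults follow the example constants mentioned in the text (e, 2, and 10), while the string labels, the boolean direction flag and the choice of seconds as the unit for the logarithm are illustrative assumptions. The same function would be applied to the preset minimum, the preset maximum and every feature value.

```python
import math


def preprocess(value: float, feature_class: str, positive: bool,
               c1: float = math.e, c2: float = 2.0, c3: float = 10.0) -> float:
    """Apply the class-specific preprocessing to a single value.

    ratio                  -> c1 ** value        (stretches high values)
    time, positive         -> value ** c2        (stretches high values)
    time, negative         -> log base c3 value  (stretches low values)
    """
    if feature_class == "ratio":
        return c1 ** value
    if feature_class == "time":
        if positive:
            return value ** c2
        if value <= 0:
            raise ValueError("the logarithm needs a positive time value")
        return math.log(value, c3)
    raise ValueError("no preprocessing is defined for this feature class")


# Reaction times in seconds: the same 10 ms gap is much larger after the
# logarithm at the fast end than at the slow end.
fast_gap = preprocess(0.110, "time", False) - preprocess(0.100, "time", False)
slow_gap = preprocess(5.010, "time", False) - preprocess(5.000, "time", False)
print(fast_gap, slow_gap)
```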
After step 20233 is performed, the process goes to step 20234.
Step 20234, for each task completion data feature group: substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, the preset feature minimum value and the preset feature maximum value into x, d, x_min and x_max of the conventional normalization formula, respectively, and determining the conventional normalization result y output by the formula as the conventional normalized feature value of the feature in the task completion data feature group; and substituting the processed feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, the processed preset feature minimum value and the processed preset feature maximum value into x, d, x_min and x_max of the conventional normalization formula, respectively, and determining the conventional normalization result y output by the formula as the processed normalized feature value of the feature in the task completion data feature group.
That is, in step 20234, the conventional normalized feature value and the processed normalized feature value of feature F_k are calculated separately for each task completion data feature group. The conventional normalized feature value of F_k is obtained by substituting the feature value of F_k in the task completion data feature group, the feature feedback direction of F_k, and the preset feature minimum value and maximum value of F_k into x, d, x_min and x_max of the conventional normalization formula. The processed normalized feature value of F_k is obtained by substituting the processed feature value of F_k in the task completion data feature group, the feature feedback direction of F_k, and the processed preset feature minimum value and maximum value of F_k into x, d, x_min and x_max of the same formula.
After the execution of step 20234, the process goes to step 20235 or step 20236.
Step 20235, calculating the distribution skewness based on the distribution of the conventional normalized feature values of the feature across the task completion data feature groups, to obtain the conventional normalized distribution skewness of the feature.
Step 20236, calculating the distribution skewness based on the distribution of the processed normalized feature values of the feature across the task completion data feature groups, to obtain the preprocessed normalized distribution skewness of the feature.
Here, step 20235 may be performed first and then step 20236 may be performed, or step 20236 may be performed first and then step 20235 may be performed.
Through steps 20235 and 20236, the conventional normalized distribution skewness and the preprocessed normalized distribution skewness of feature F_k are obtained.
After steps 20235 and 20236 are performed, it then goes to step 20237 to perform.
Step 20237 determines whether the absolute value of the pre-processed normalized distribution bias for the feature is less than the conventional normalized distribution bias for the feature.
If it is determined to be less, this indicates that the distribution bias of the feature (e.g., feature F_k) is improved by the preprocessing, and the process proceeds to step 20238.
If it is determined to be not less, this indicates that the distribution bias of the feature (e.g., feature F_k) is not improved by the preprocessing, and the process proceeds to step 20239.
Step 20238 determines the processed normalized feature values for the feature in each task completion data feature set as the normalized feature values for the feature in the task completion data feature set.
Step 20239, determining the conventional normalized feature value of the feature in each task completion data feature group as the normalized feature value of the feature in the task completion data feature group.
Through steps 20237 to 20239, when the distribution bias of the feature (e.g., feature F_k) is improved after the preprocessing operation of step 20233, the processed normalized feature value of the feature is used as the normalized feature value of the feature. Conversely, when the distribution bias of the feature (e.g., feature F_k) is not improved after the preprocessing operation of step 20233, the conventional normalized feature value of the feature is used as the normalized feature value of the feature. As a result, the finally obtained normalized feature values of the feature are more evenly distributed and can intuitively reflect the degree of the capability of completing the task.
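As a non-limiting illustration, the selection made in steps 20235 to 20239 may be sketched in Python as follows. The function names `skewness` and `choose_normalized` are hypothetical. The sketch assumes that the distribution bias is measured by the Fisher-Pearson coefficient of skewness and takes the absolute value on both sides of the comparison; both choices are assumptions of this sketch rather than requirements of the embodiment.

```python
from statistics import fmean


def skewness(values):
    """Fisher-Pearson coefficient of skewness, g1 = m3 / m2**1.5 (assumed bias measure)."""
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # assumption: a constant feature is treated as unskewed
    return m3 / m2 ** 1.5


def choose_normalized(conventional, processed):
    """Keep the processed normalized values only if they are less skewed."""
    # Assumption: absolute values are compared on both sides.
    if abs(skewness(processed)) < abs(skewness(conventional)):
        return processed      # corresponds to step 20238
    return conventional       # corresponds to step 20239


conventional = [0.0, 0.01, 0.02, 0.05, 1.0]   # strongly right-skewed
processed = [0.0, 0.25, 0.5, 0.75, 1.0]       # symmetric
selected = choose_normalized(conventional, processed)
```

In this usage example the processed values are symmetric, so they are selected; if the two arguments were swapped, the conventional values would be kept.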
The normalization method of the task completion data feature group provided by the embodiment of the present disclosure includes determining, for each feature included in the task completion data feature group, a normalization method corresponding to the feature according to a feature class of the feature; and normalizing the characteristic value of the characteristic in each task completion data characteristic group based on the characteristic feedback direction of the characteristic, the preset characteristic minimum value and the maximum value according to a normalization method corresponding to the characteristic, so as to obtain the normalized characteristic value of the characteristic in the corresponding task completion data characteristic group. In the feature normalization process, the value range of the feature is considered, the feature feedback direction and the feature type are also considered, and targeted feature normalization is realized.
With continued reference to fig. 3A, a flow 300 of yet another embodiment of a task completion data feature set normalization method according to the present disclosure is shown. The task completion data feature set normalization method comprises the following steps:
Step 301, a task completion data feature set is obtained.
Step 302, for each feature included in the task completion data feature set, a normalization operation is performed.
In this embodiment, the specific operations and the technical effects of steps 301 and 302 are substantially the same as those of steps 201 and 202 in the embodiment shown in fig. 2A, and are not described herein.
Here, it is assumed that the task completion data feature group set includes N task completion data feature groups, N being a positive integer, and that each task completion data feature group includes K different features, each feature being denoted by F_k, where k is an integer between 1 and K.
Through steps 301 and 302, the normalized feature value of each feature F_k in each of the N task completion data feature groups can be obtained.
Although steps 301 and 302 yield the normalized feature value of each feature F_k in each task completion data feature group, in practice the task completion data of a subject must first be acquired in order to obtain the task completion data feature groups of step 301, and multidimensional features are then extracted from that data to obtain the corresponding task completion data feature group. Acquiring the task completion data of a subject incurs various costs, such as providing suitable conditions, obtaining the cooperation of the subject, having the subject complete the corresponding task, and collecting the corresponding data. In particular, when the subject is an adolescent, the cooperation and support of the adolescent's parents may also be required. The amount of task completion data is therefore limited and may not cover all cases in which feature F_k lies between the feature minimum value and maximum value. Here, a task completion data normalized feature group formed by the normalized feature values of the features in a task completion data feature group that results from a subject actually completing the task is simply referred to as a real sample. Simulation samples can then be generated on the basis of the real samples to cover the various cases.
That is, on the basis of the task completion data normalized feature groups, a number of simulation task completion data normalized feature groups are generated, which are referred to herein as simulation samples. Adding the generated simulation samples amounts to up-sampling the real samples, so that the sample set formed by combining the real samples and the simulation samples is less affected by the distribution of the existing real samples, that is, by the norm derived from them, and the capability of completing the task can be evaluated more objectively and absolutely on the basis of the real samples and the simulation samples.
To generate the simulation samples, i.e., to up-sample the real samples, the above-described execution body may continue to perform the following steps 303 to 307:
Step 303, generating a task completion data normalized feature group corresponding to the task completion data feature group by using the normalized feature values of the features in each task completion data feature group, and generating a task completion data normalized feature group set by using the task completion data normalized feature group corresponding to each task completion data feature group in the task completion data feature group set.
A task completion data normalization feature set including N task completion data normalization feature sets may be obtained through step 303, where each task completion data normalization feature set includes K features, and a feature value of each feature is between 0 and 1.
Step 304, calculating a correlation coefficient and a P value between any two different features of the K features included in each task completion data normalization feature group based on the task completion data normalization feature group set.
Here, a correlation coefficient calculation method between various different features may be employed, which is not particularly limited in this disclosure, and for example, the correlation coefficient here may be a Spearman correlation coefficient.
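As a non-limiting illustration, the calculation of step 304 for one pair of features may be sketched in Python as follows. All function names are hypothetical. The Spearman coefficient is computed as the Pearson correlation of average ranks; the P value is estimated with a seeded Monte Carlo permutation test, which is a technique substituted here for simplicity and is not stated in the embodiment.

```python
import random
from statistics import fmean


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant (assumption)."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman_with_p(x, y, n_permutations=2000, seed=0):
    """Spearman coefficient of two feature columns and a permutation P value."""
    rx, ry = average_ranks(x), average_ranks(y)
    rho = pearson(rx, ry)
    rng = random.Random(seed)
    shuffled = ry[:]
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        # Count permutations at least as extreme as the observed coefficient.
        if abs(pearson(rx, shuffled)) >= abs(rho) - 1e-12:
            extreme += 1
    p_value = (extreme + 1) / (n_permutations + 1)
    return rho, p_value
```

Calling `spearman_with_p` on every pair of feature columns of the N real samples would fill a K by K table of coefficients and P values for use in the later steps.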
Step 305, determining the number M of the normalized feature groups of the task completion data to be generated according to the number N of the normalized feature groups of the task completion data in the normalized feature group set of the task completion data, and generating an empty normalized feature group set of the simulation task completion data.
That is, the purpose of step 305 is to determine the number of simulation samples from the number of real samples. In practice, various implementations may be employed such that the number M of simulation samples is less than the number N of real samples. Because the simulation sample size does not exceed the real sample size, the characteristics of the original real samples are preserved, and the sample distribution is enriched without the simulation samples crowding out the real samples.
For example, M may be equal to N multiplied by a first preset ratio, the first preset ratio being greater than 0 and less than 1. For example, the first preset ratio may be greater than or equal to 0.1 and less than or equal to 0.5, e.g., the first preset ratio is 0.1.
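As a non-limiting illustration, the determination of M in step 305 may be sketched in Python as follows. The function name `simulated_sample_count` is hypothetical, and rounding the product to the nearest integer is an assumption of this sketch, since the embodiment does not state how a non-integer product is handled.

```python
def simulated_sample_count(n_real, ratio=0.1):
    """Number M of simulation samples for N real samples and a preset ratio."""
    if not 0 < ratio < 1:
        raise ValueError("the first preset ratio must be greater than 0 and less than 1")
    # Assumption: a non-integer product is rounded to the nearest integer.
    return round(n_real * ratio)


m = simulated_sample_count(40, ratio=0.1)  # 40 real samples, ratio 0.1
```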
Note that the order of execution between step 304 and step 305 is not particularly limited herein.
Step 306, for each of the K features, determining at least two value intervals corresponding to the feature and a screening probability corresponding to each value interval.
Here, for each feature F_k of the K features included in the task completion data normalized feature group, at least two value intervals corresponding to feature F_k and the screening probability corresponding to each value interval may be determined, where k is a positive integer between 1 and K.
In some alternative embodiments, step 306 may include steps 3061 and 3062 as follows:
Step 3061, dividing the value range of the feature into at least two value intervals.
For ease of description, suppose that the value range of feature F_k is divided into V_k value intervals.
As an example, every feature F_k may be divided into the same number of value intervals, i.e., V_k may be the same for different k. For example, V_k may be 5.
As another example, different features F_k may be divided into different numbers of value intervals, i.e., V_k may differ for different k. For example, V_1 may be 5 and V_2 may be 3, which is not particularly limited by the present disclosure.
When the value range of feature F_k is divided into V_k value intervals, the range between 0 and 1 may be divided equally or substantially equally into V_k value intervals. For example, when V_1 is equal to 5, the range between 0 and 1 is divided into the following value intervals: [0, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8], and (0.8, 1]. Alternatively, the range between 0 and 1 may be divided into V_k value intervals according to a preset dividing mode, for example, according to the principle that the intervals in the middle are wide and the intervals on the two sides are narrow. The present disclosure does not specifically limit this. For example, when V_1 is equal to 5, the range between 0 and 1 is divided into the following value intervals: [0, 0.1], (0.1, 0.25], (0.25, 0.65], (0.65, 0.9], and (0.9, 1].
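As a non-limiting illustration, the equal division of step 3061 may be sketched in Python as follows. The function names `equal_intervals` and `interval_index` are hypothetical. Each value interval is represented as a (low, high) pair, and the sketch assumes that the first interval is closed on both sides while every other interval is open on the left and closed on the right.

```python
def equal_intervals(v_k):
    """Split the range between 0 and 1 into v_k equal-width (low, high) pairs."""
    if v_k < 2:
        raise ValueError("at least two value intervals are required")
    edges = [i / v_k for i in range(v_k + 1)]
    return list(zip(edges[:-1], edges[1:]))


def interval_index(value, intervals):
    """Index of the interval that contains value.

    The first interval is treated as [low, high], the others as (low, high].
    """
    for idx, (low, high) in enumerate(intervals):
        if (idx == 0 and low <= value <= high) or low < value <= high:
            return idx
    raise ValueError("value lies outside the range between 0 and 1")


intervals = equal_intervals(5)
position = interval_index(0.2, intervals)  # 0.2 is the closed right end of the first interval
```

A preset, non-uniform division can be represented in the same way by writing the list of (low, high) pairs out explicitly instead of calling `equal_intervals`.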
Step 3062, determining a screening probability corresponding to each value interval of the feature.
Here, the screening probability corresponding to each value interval of feature F_k may be determined in various implementations. For example, the screening probability may be set equally for each of the V_k value intervals of feature F_k, e.g., 1 divided by V_k may be taken as the screening probability of each value interval. Alternatively, the screening probability of each value interval may be set according to a preset screening probability setting mode, for example, according to the principle that the probability of the value intervals on the two sides is large and the probability of the value intervals in the middle is small. For example, when V_1 is equal to 5 and the value intervals are [0, 0.2], (0.2, 0.4], (0.4, 0.6], (0.6, 0.8], and (0.8, 1], the screening probabilities corresponding to the value intervals may be, in order, 0.35, 0.1, 0.1, 0.1, and 0.35. That is, simulation samples are more likely to be generated in the value intervals near the lower boundary, i.e., near 0, and near the upper boundary, i.e., near 1, so that more possible sample distributions are covered.
Alternatively, the screening probability corresponding to each value interval of the feature may be set randomly, and only the sum of the screening probabilities corresponding to each value interval of the feature is required to be 1. Thus, the effect of making the distribution of the simulation samples more uniform can also be achieved.
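As a non-limiting illustration, setting the screening probabilities in step 3062 and drawing a screening value interval from them may be sketched in Python as follows. The function names are hypothetical, and the edge-heavy vector 0.35, 0.1, 0.1, 0.1, 0.35 is used only as an illustrative set of probabilities that sums to 1.

```python
import random


def uniform_probabilities(v_k):
    """Equal screening probability 1 / v_k for each of the v_k value intervals."""
    return [1 / v_k] * v_k


def pick_interval(probabilities, rng):
    """Draw the index of a screening value interval according to its probability."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("screening probabilities must sum to 1")
    return rng.choices(range(len(probabilities)), weights=probabilities, k=1)[0]


rng = random.Random(0)                      # seeded only for reproducibility
edge_heavy = [0.35, 0.1, 0.1, 0.1, 0.35]    # illustrative preset probabilities
counts = [0] * len(edge_heavy)
for _ in range(10000):
    counts[pick_interval(edge_heavy, rng)] += 1
```

With the edge-heavy probabilities, the two outer intervals are drawn far more often than the middle ones, which is the behavior described for covering samples near 0 and near 1.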
Step 307, executing the simulation task completion data normalization feature group generation operation until the number of simulation task completion data normalization feature groups in the simulation task completion data normalization feature group set is not less than M.
It will be appreciated that the purpose of step 307 is here to include M simulated task completion data normalization feature sets in the simulated task completion data normalization feature set.
Here, the simulation task completion data normalization feature group generation operation may include steps 3071 to 3074 as shown in fig. 3B:
Step 3071, creating a new simulation task completion data normalization feature group.
Step 3072, determining a screening value interval of the 1st feature among the value intervals of the 1st feature of the K features according to the screening probabilities corresponding to those value intervals, and randomly selecting a value in the screening value interval of the 1st feature as the feature value of the 1st feature in the new simulation task completion data normalization feature group.
Step 3073, the initial value of positive integer j is set to 2.
Step 3074, for the j-th feature of the K features, the feature value generation operation is performed until j is K.
Here, the feature value generation operation may include steps 30741 to 30745 as shown in fig. 3C:
Step 30741, determining the feature with the largest absolute value of the correlation coefficient between the first j-1 features and the jth feature as the most relevant feature of the jth feature.
Since the correlation coefficient and the P value between any two different features of the K features have already been calculated in step 304 based on the task completion data normalized feature group set, those correlation coefficients may be used here: among the first j-1 features of the K features, the feature whose correlation coefficient with the j-th feature F_j has the largest absolute value is determined as the most relevant feature of the j-th feature F_j. Here, it is assumed that the most relevant feature of the j-th feature F_j is the i-th feature F_i, where i is different from j.
Step 30742, determining whether an absolute value of a correlation coefficient between the jth feature and a most relevant feature of the jth feature is greater than a preset correlation coefficient threshold and a P-value is less than a preset P-value threshold.
If the determination is yes, i.e., the absolute value of the correlation coefficient C_i,j between feature F_j and its most relevant feature F_i is greater than the preset correlation coefficient threshold T_C and the P value P_i,j is less than the preset P value threshold T_P, this indicates that feature F_j and feature F_i are strongly correlated. The feature value of feature F_j can then be generated with reference to feature F_i, whose feature value has already been generated, and the process may proceed to step 30743A.
Here, the preset correlation coefficient threshold T_C and the preset P value threshold T_P are both constants greater than zero. As an example, the preset correlation coefficient threshold T_C may be 0.3 and the preset P value threshold T_P may be 0.05.
If the determination is no, i.e., the absolute value of the correlation coefficient C_i,j between feature F_j and its most relevant feature F_i is not greater than the preset correlation coefficient threshold T_C, or the P value P_i,j is not less than the preset P value threshold T_P, this indicates that feature F_j and feature F_i are uncorrelated or only weakly correlated. The screening value interval of feature F_j can then be determined randomly among the value intervals of feature F_j itself, without reference to feature F_i, whose feature value has already been generated, and the process may proceed to step 30743B.
Step 30743A, determining the screening value interval of the j-th feature according to the screening value interval of the most relevant feature of the j-th feature.
When the execution body determines in step 30742 that feature F_j and feature F_i are strongly correlated, the screening value interval of feature F_j may be determined with reference to the already determined screening value interval of feature F_i, that is, according to the screening value interval of the most relevant feature F_i of feature F_j.
Alternatively, step 30743A may include steps 30743A1 and 30743A2 as follows:
Step 30743A1, in response to determining that the correlation coefficient between the j-th feature and the most relevant feature of the j-th feature is greater than the preset correlation coefficient threshold, determining the screening value interval of the most relevant feature of the j-th feature as the screening value interval of the j-th feature.
Here, if the correlation coefficient C_i,j between feature F_j and its most relevant feature F_i is greater than the preset correlation coefficient threshold T_C and the P value P_i,j is less than the preset P value threshold T_P, that is, if C_i,j > T_C and P_i,j < T_P, this indicates that feature F_j and feature F_i are strongly and positively correlated. The same screening value interval as that of feature F_i can therefore be selected for feature F_j, i.e., the screening value interval of the most relevant feature F_i is determined as the screening value interval of feature F_j.
Step 30743A2, in response to determining that the correlation coefficient between the jth feature and the most relevant feature of the jth feature is less than the opposite number of the preset correlation coefficient threshold, determining the opposite value interval of the screening value interval of the most relevant feature of the jth feature in the value interval of the jth feature as the screening value interval of the jth feature.
Here, if the correlation coefficient C_i,j between feature F_j and its most relevant feature F_i is less than the opposite number of the preset correlation coefficient threshold T_C and the P value P_i,j is less than the preset P value threshold T_P, that is, if C_i,j < -T_C and P_i,j < T_P, this indicates that feature F_j and feature F_i are strongly and negatively correlated. The opposite value interval of the screening value interval of feature F_i can therefore be selected as the screening value interval of feature F_j. Here, the minimum value and the maximum value of the opposite value interval are, respectively, 1 minus the maximum value and 1 minus the minimum value of the screening value interval of feature F_i. For ease of understanding, the four cases are set out below, where a and b denote the minimum value and the maximum value of the screening value interval of feature F_i:
Assume that the screening value interval of feature F_i is [a, b]; then the opposite value interval of the screening value interval [a, b] of feature F_i may be [1 - b, 1 - a].
Assume that the screening value interval of feature F_i is (a, b]; then the opposite value interval of the screening value interval (a, b] of feature F_i may be [1 - b, 1 - a).
Assume that the screening value interval of feature F_i is (a, b); then the opposite value interval of the screening value interval (a, b) of feature F_i may be (1 - b, 1 - a).
Assume that the screening value interval of feature F_i is [a, b); then the opposite value interval of the screening value interval [a, b) of feature F_i may be (1 - b, 1 - a].
Thus, the opposite value interval of the screening value interval of feature F_i can be selected as the screening value interval of feature F_j.
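As a non-limiting illustration, the opposite value interval used in step 30743A2 may be sketched in Python as follows. The class `Interval` and the function `opposite` are hypothetical names. The sketch follows the rule stated above, namely that the new minimum is 1 minus the old maximum and the new maximum is 1 minus the old minimum, and it swaps the open or closed state of the two ends accordingly. The numeric values in the usage lines are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    low: float
    high: float
    low_closed: bool   # True for "[", False for "("
    high_closed: bool  # True for "]", False for ")"


def opposite(interval):
    """Mirror an interval of the range between 0 and 1 around 0.5."""
    return Interval(
        low=1 - interval.high,
        high=1 - interval.low,
        low_closed=interval.high_closed,   # the old upper end becomes the lower end
        high_closed=interval.low_closed,   # the old lower end becomes the upper end
    )


closed = Interval(0.0, 0.25, True, True)          # [0, 0.25]
half_open = Interval(0.25, 0.5, False, True)      # (0.25, 0.5]
mirrored_closed = opposite(closed)                # [0.75, 1]
mirrored_half_open = opposite(half_open)          # [0.5, 0.75)
```

Applying `opposite` twice returns the original interval, which is a convenient sanity check on the end-point handling.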
After step 30743A is performed and the screening value interval of feature F_j is determined, the process may proceed to step 30744.
Step 30743B, determining a screening value interval of the jth feature in each value interval of the jth feature according to the screening probability corresponding to each value interval of the jth feature.
When the execution body determines in step 30742 that feature F_j and feature F_i are uncorrelated or only weakly correlated, the screening value interval of the j-th feature may be determined among the value intervals of the j-th feature according to the screening probability corresponding to each value interval of the j-th feature.
After step 30743B is performed and the screening value interval of feature F_j is determined, the process proceeds to step 30744.
In step 30744, a value is randomly determined in the screening value interval of the jth feature as the feature value of the jth feature in the normalized feature set of the new simulation task completion data.
After step 30744 is performed, execution proceeds to step 30745.
Step 30745, increment the value of j by 1.
After step 30745 is performed, the process returns to step 30741 and the feature value generation operation is repeated until the value of j is K and the last feature value generation operation has been performed, thereby generating the feature values of the K features in the new simulation task completion data normalized feature group.
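As a non-limiting illustration, one execution of the generation operation of steps 3071 to 30745 may be sketched in Python as follows. The function name `generate_sample` and its parameters are hypothetical. The sketch uses zero-based feature indices, represents each value interval as a (low, high) pair, ignores whether interval ends are open or closed, and assumes that `corr` and `p_values` are K by K tables computed beforehand in step 304.

```python
import random


def generate_sample(intervals, probabilities, corr, p_values, rng,
                    corr_threshold=0.3, p_threshold=0.05):
    """Generate the K feature values of one simulation sample.

    intervals[k]     -- list of (low, high) value intervals of feature k
    probabilities[k] -- screening probability of each interval of feature k
    corr, p_values   -- correlation coefficients and P values between features
    """
    chosen = []  # screening value interval already determined for each feature
    sample = []
    for j in range(len(intervals)):
        interval = None
        if j > 0:
            # Step 30741: most relevant feature among the features generated so far.
            i = max(range(j), key=lambda m: abs(corr[m][j]))
            # Step 30742: strong correlation and significant P value.
            if abs(corr[i][j]) > corr_threshold and p_values[i][j] < p_threshold:
                low, high = chosen[i]
                if corr[i][j] > 0:
                    interval = (low, high)          # step 30743A1: same interval
                else:
                    interval = (1 - high, 1 - low)  # step 30743A2: opposite interval
        if interval is None:
            # Step 3072 or step 30743B: draw an interval by screening probability.
            idx = rng.choices(range(len(intervals[j])), weights=probabilities[j], k=1)[0]
            interval = intervals[j][idx]
        chosen.append(interval)
        # Step 30744: random value inside the screening value interval.
        sample.append(rng.uniform(*interval))
    return sample
```

Repeating this function until M samples have been collected would correspond to step 307. Because a strongly correlated feature reuses or mirrors the interval of its most relevant feature, the generated values follow the sign of the correlation observed in the real samples.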
After the execution of step 307, the normalized feature set of the simulated task completion data includes M normalized feature sets of the simulated task completion data, thereby completing the generation of M simulated samples. Subsequently, capability assessment or various relevant statistical analyses can be performed based on the N real samples and the M simulated samples to reduce the cost requirements for real sample collection.
In some alternative embodiments, the foregoing execution body may further perform the following steps 308 to 310 before performing step 307:
Step 308, determining whether an upper boundary task completion data normalization feature group and a lower boundary task completion data normalization feature group exist in the task completion data normalization feature group set.
Here, each feature value in the upper boundary task completion data normalization feature group is 1, and each feature value in the lower boundary task completion data normalization feature group is 0.
That is, it is first determined whether an upper boundary sample (i.e., an upper boundary task completion data normalization feature group) and a lower boundary sample (i.e., a lower boundary task completion data normalization feature group) exist among the real samples. If it is determined that no upper boundary task completion data normalization feature group exists, the process proceeds to step 309. If it is determined that no lower boundary task completion data normalization feature group exists, the process proceeds to step 310.
Step 309, in response to determining that the upper boundary task completion data normalization feature group does not exist, generating the upper boundary task completion data normalization feature group and adding it to the simulation task completion data normalization feature group set.
Here, the execution body may determine that the upper boundary task completion data normalization feature group does not exist in step 308, that is, in the case that the upper boundary sample does not exist in the real sample, generate the upper boundary task completion data normalization feature group and add the upper boundary sample as a newly generated simulation sample to the simulation task completion data normalization feature group set.
Step 310, in response to determining that the lower boundary task completion data normalization feature group does not exist, generating the lower boundary task completion data normalization feature group and adding it to the simulation task completion data normalization feature group set.
Here, the execution body may determine that the lower boundary task completion data normalization feature group does not exist in step 308, that is, in the case that the lower boundary sample does not exist in the real sample, generate the lower boundary task completion data normalization feature group and add the lower boundary task completion data normalization feature group to the simulation task completion data normalization feature group set, that is, generate the lower boundary sample as a newly generated simulation sample.
Through steps 308 to 310, the task completion data normalization feature set (i.e., the real sample) and the simulation task completion data normalization feature set (i.e., the simulation sample) are combined to include the upper boundary task completion data normalization feature set (i.e., the upper boundary sample) and the lower boundary task completion data normalization feature set (i.e., the lower boundary sample), so that the possibility of the samples can be enriched, and the distribution uniformity of the combined samples can be improved.
It will be appreciated that, in the alternative embodiment of steps 308 to 310, the number of times the generation operation of step 307 is executed depends on whether the upper boundary sample and the lower boundary sample have been generated as simulation samples; it is only required that the simulation task completion data normalization feature group set finally includes M simulation task completion data normalization feature groups.
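As a non-limiting illustration, steps 308 to 310 may be sketched in Python as follows. The function name `add_boundary_samples` is hypothetical, each sample is represented as a list of K normalized feature values, and the presence check relies on exact equality with 1.0 or 0.0, which is an assumption of this sketch.

```python
def add_boundary_samples(real_samples, simulated_samples, n_features):
    """Add the all-ones and all-zeros samples if the real samples lack them."""
    upper = [1.0] * n_features  # upper boundary sample: every feature value is 1
    lower = [0.0] * n_features  # lower boundary sample: every feature value is 0
    if upper not in real_samples:       # step 308, then step 309
        simulated_samples.append(upper)
    if lower not in real_samples:       # step 308, then step 310
        simulated_samples.append(lower)
    return simulated_samples


real = [[1.0, 1.0], [0.3, 0.4]]   # the upper boundary sample is already present
simulated = add_boundary_samples(real, [], n_features=2)
```

Since any boundary samples added here already count toward M, the generation operation of step 307 then only needs to run until the simulated set holds M samples in total.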
As can be seen from fig. 3A, compared with the embodiment corresponding to fig. 2A, the process 300 of the task completion data feature set normalization method in this embodiment includes more steps of generating a simulation sample, and the process of generating the simulation sample is not randomly generated, but considers the correlation between different features, so that the generated simulation sample is closer to the actual situation, that is, closer to the actual sample.
With further reference to fig. 4, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of a task completion data feature set normalization apparatus, where an embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2A, and the apparatus may be specifically applied in various electronic devices.
As shown in fig. 4, the task completion data feature group normalization apparatus 400 of the present embodiment includes: an acquisition unit 401 and a normalization unit 402. The acquiring unit 401 is configured to acquire a task completion data feature set, where the task completion data feature set includes feature values of at least one feature obtained by extracting features of the task completion data; and a normalization unit 402 configured to perform, for each feature included in the task completion data feature group, the following normalization operations: acquiring a feature category, a feature feedback direction and a preset feature minimum value and maximum value of the feature, wherein the feature feedback direction is used for representing a correlation direction between the feature value of the feature and the degree of capability of completing the task, and the feature feedback direction is positive correlation or negative correlation; determining a normalization method corresponding to the feature according to the feature class of the feature; and normalizing the characteristic value of the characteristic in each task completion data characteristic group based on the characteristic feedback direction of the characteristic, the preset characteristic minimum value and the maximum value according to a normalization method corresponding to the characteristic, so as to obtain the normalized characteristic value of the characteristic in the corresponding task completion data characteristic group.
In this embodiment, the specific processing and the technical effects brought by the acquiring unit 401 and the normalizing unit 402 of the task completion data feature set normalizing device 400 may refer to the descriptions related to the step 201 and the step 202 in the corresponding embodiment of fig. 2A, and are not repeated here.
In some alternative embodiments, the feature classes may include ratio class features, time class features, and other class features; and determining a normalization method corresponding to the feature according to the feature class of the feature, which may include:
in response to determining that the feature class of the feature is a ratio class feature or a time class feature, determining that a normalization method corresponding to the feature is post-preprocessing normalization;
in response to determining that the feature class of the feature is other class of features, determining that the normalization method corresponding to the feature is conventional normalization.
In some optional embodiments, normalizing the feature value of the feature in each task completion data feature group according to the normalization method corresponding to the feature based on the feature feedback direction of the feature, the preset feature minimum value and the maximum value to obtain a normalized feature value of the feature in the corresponding task completion data feature group may include:
In response to determining that the normalization method corresponding to the feature is conventional normalization, for each task completion data feature group, substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the preset feature minimum value and maximum value of the feature into the corresponding variables of the conventional normalization formula, and determining the conventional normalization result output by the substituted formula as the normalized feature value of the feature in the task completion data feature group.
in some optional embodiments, normalizing the feature value of the feature in each task completion data feature group according to the normalization method corresponding to the feature based on the feature feedback direction of the feature, the preset feature minimum value and the maximum value to obtain a normalized feature value of the feature in the corresponding task completion data feature group may include:
determining a preprocessing method corresponding to the feature according to the feature category of the feature in response to determining that the normalization method corresponding to the feature is post-preprocessing normalization;
respectively preprocessing the minimum value and the maximum value of the preset characteristic value of the characteristic and the characteristic value of the characteristic in each task completion data characteristic group according to the determined preprocessing method to obtain the minimum value and the maximum value of the processed preset characteristic value of the characteristic and the processed characteristic value of the characteristic in the corresponding task completion data characteristic group;
For each task completion data feature group, substituting the feature value of the feature, the feature feedback direction of the feature, the preset feature minimum value and the preset maximum value in the task completion data feature group into the conventional normalization formula、/>、/>And->And outputting the conventional normalization result ++of the conventional normalization formula after substitution>Determining a conventional normalized feature value for the feature in the task completion data feature set; substituting the processed characteristic value of the characteristic, the characteristic feedback direction of the characteristic, the preset processed characteristic minimum value and the preset processed characteristic maximum value in the task completion data characteristic group into +.>、/>、/>And->And outputting the conventional normalization result ++of the conventional normalization formula after substitution>Determining the set of data characteristics for completion of the taskNormalizing the feature value after the feature is processed;
calculating distribution bias based on the distribution of the conventional normalized feature values of the feature in each task completion data feature group to obtain the conventional normalized distribution bias of the feature;
calculating distribution bias based on the distribution of the processed normalized feature values of the feature in each task completion data feature group to obtain the preprocessed normalized distribution bias of the feature;
determining whether the absolute value of the preprocessed normalized distribution bias of the feature is less than the conventional normalized distribution bias of the feature;
in response to determining that it is less, determining the processed normalized feature value of the feature in each task completion data feature group as the normalized feature value of the feature in that task completion data feature group;
and in response to determining that it is not less, determining the conventional normalized feature value of the feature in each task completion data feature group as the normalized feature value of the feature in that task completion data feature group.
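The selection between the two candidate normalizations can be sketched as below. This is a minimal illustration that assumes the "distribution bias" is a skewness statistic (here the population skewness); it also compares absolute values on both sides, whereas the text states the absolute value only for the preprocessed side. The names `skewness` and `choose_normalized` are hypothetical.

```python
def skewness(values):
    """Population skewness (third standardized moment) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant data: treat as unskewed
    return m3 / m2 ** 1.5


def choose_normalized(conventional, processed):
    """Pick the candidate whose distribution is less skewed.

    conventional / processed: normalized values of one feature across
    all task completion data feature groups.
    """
    if abs(skewness(processed)) < abs(skewness(conventional)):
        return processed
    return conventional
```

The comparison is made once per feature over all feature groups, so every group ends up using the same variant for a given feature.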
In some optional embodiments, the determining a preprocessing method corresponding to the feature according to the feature class of the feature may include:
in response to determining that the feature class of the feature is a ratio class feature, determining that a preprocessing method corresponding to the feature is an exponentiation; and
the preprocessing, according to the determined preprocessing method, of the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, may include:
in response to the determined preprocessing method being exponentiation, performing exponentiation with a first preset constant as the base and with the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group as the respective exponents, and determining the resulting values as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, respectively.
In some alternative embodiments, the first preset constant may be the natural constant e.
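A minimal sketch of the exponentiation preprocessing for ratio class features is given below, assuming the first preset constant defaults to e. The function name `preprocess_ratio` and its signature are hypothetical.

```python
import math


def preprocess_ratio(values, feat_min, feat_max, base=math.e):
    """Raise `base` to each feature value and to the preset bounds.

    Returns (processed_values, processed_min, processed_max).
    """
    processed_values = [base ** v for v in values]
    return processed_values, base ** feat_min, base ** feat_max
```

Because the same monotonic transform is applied to the values and to both bounds, the processed values stay inside the processed range.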
In some optional embodiments, the determining a preprocessing method corresponding to the feature according to the feature class of the feature may include:
determining a preprocessing method corresponding to the feature as a base exponentiation in response to determining that the feature class of the feature is a time class feature and the feature feedback direction of the feature is positive correlation;
in response to determining that the feature class of the feature is a time-class feature and the feature feedback direction of the feature is a negative correlation, determining that a preprocessing method corresponding to the feature is a logarithmic operation; and
the preprocessing, according to the determined preprocessing method, of the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, may include:
in response to the determined preprocessing method being base exponentiation, performing exponentiation with a second preset constant as the exponent and with the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group as the respective bases, and determining the resulting values as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, respectively;
and in response to the determined preprocessing method being a logarithmic operation, taking the logarithm, with a third preset constant as the base, of the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group, and determining the resulting values as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, respectively.
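The two time class branches can be sketched as follows. The default exponent of 2 and the default logarithm base e stand in for the second and third preset constants, whose values the text does not give, so both are assumptions; the name `preprocess_time` is hypothetical. The logarithm branch requires strictly positive inputs.

```python
import math


def preprocess_time(values, feat_min, feat_max, direction,
                    exponent=2.0, log_base=math.e):
    """Preprocess a time class feature and its preset bounds.

    direction == "positive": raise each value to `exponent` (value is the base).
    direction == "negative": take the logarithm of each value to `log_base`.
    Returns (processed_values, processed_min, processed_max).
    """
    if direction == "positive":
        def transform(v):
            return v ** exponent
    elif direction == "negative":
        def transform(v):
            return math.log(v, log_base)  # requires v > 0
    else:
        raise ValueError("direction must be 'positive' or 'negative'")
    return [transform(v) for v in values], transform(feat_min), transform(feat_max)
```

The processed values and bounds would then be passed to the conventional normalization formula in place of the raw ones.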
In some alternative embodiments, each of the task completion data feature groups may include K features therein; and
the apparatus 400 may further include:
a generating unit 403 configured to generate a task completion data normalized feature group corresponding to each of the task completion data feature groups using the normalized feature values of the features in the task completion data feature groups, and generate a task completion data normalized feature group set using the task completion data normalized feature groups corresponding to each of the task completion data feature groups in the task completion data feature group set;
A calculating unit 404 configured to calculate a correlation coefficient and a P value between any two different features of K features included in each task completion data normalization feature group based on the task completion data normalization feature group set;
a first determining unit 405 configured to determine a number M of task completion data normalization feature groups to be generated according to a number N of task completion data normalization feature groups in the task completion data normalization feature group set, and generate an empty simulation task completion data normalization feature group set;
a second determining unit 406, configured to determine, for each of the K features, at least two value intervals corresponding to the feature and a screening probability corresponding to each value interval;
a simulation unit 407 configured to perform a simulation task completion data normalization feature group generation operation until the number of simulation task completion data normalization feature groups in the simulation task completion data normalization feature group set is not less than M, the simulation task completion data normalization feature group generation operation including: creating a new simulation task completion data normalization feature group; determining, according to the screening probability corresponding to each value interval of the 1st feature of the K features, a screening value interval of the 1st feature among the value intervals of the 1st feature, and randomly determining a value in the screening value interval of the 1st feature as the feature value of the 1st feature in the new simulation task completion data normalization feature group; setting the initial value of a positive integer j to 2; for the jth feature of the K features, performing a feature value generation operation until j is K, the feature value generation operation including: determining, among the first j-1 features, the feature whose correlation coefficient with the jth feature has the largest absolute value as the most relevant feature of the jth feature; determining whether the absolute value of the correlation coefficient between the jth feature and the most relevant feature of the jth feature is greater than a preset correlation coefficient threshold and the corresponding P value is less than a preset P value threshold; in response to determining yes, determining the screening value interval of the jth feature according to the screening value interval of the most relevant feature of the jth feature; in response to determining no, determining the screening value interval of the jth feature among the value intervals of the jth feature according to the screening probability corresponding to each value interval of the jth feature; randomly determining a value in the screening value interval of the jth feature as the feature value of the jth feature in the new simulation task completion data normalization feature group; and after increasing the value of j by 1, continuing to perform the feature value generation operation.
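The generation operation performed by the simulation unit might be sketched as follows. This is a rough illustration under several assumptions: features are indexed from 0, the correlation coefficients and P values are supplied as precomputed K-by-K matrices, the default thresholds are placeholder values, and for a negative correlation the interval is mirrored as (1 - high, 1 - low) without checking that it coincides with one of the feature's own value intervals. All names are hypothetical.

```python
import random


def generate_group(intervals, probs, corr, pvals,
                   corr_threshold=0.5, p_threshold=0.05, rng=random):
    """Generate one simulated normalized feature group.

    intervals[k]: list of (low, high) value intervals of feature k
    probs[k]:     screening probability of each interval of feature k
    corr, pvals:  K x K matrices of correlation coefficients and P values
    """
    chosen = []  # screening value interval selected for each feature so far
    group = []   # generated feature values
    for j in range(len(intervals)):
        interval = None
        if j > 0:
            # Most relevant earlier feature: largest |correlation| with feature j.
            best = max(range(j), key=lambda i: abs(corr[j][i]))
            if abs(corr[j][best]) > corr_threshold and pvals[j][best] < p_threshold:
                low, high = chosen[best]
                # Same interval for positive correlation, mirrored for negative.
                interval = (low, high) if corr[j][best] > 0 else (1.0 - high, 1.0 - low)
        if interval is None:
            # Otherwise draw an interval by its screening probability.
            interval = rng.choices(intervals[j], weights=probs[j], k=1)[0]
        chosen.append(interval)
        group.append(rng.uniform(*interval))
    return group


def simulate(m, intervals, probs, corr, pvals, rng=random):
    """Repeat the generation operation until at least m groups exist."""
    groups = []
    while len(groups) < m:
        groups.append(generate_group(intervals, probs, corr, pvals, rng=rng))
    return groups
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when comparing simulated sets.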
In some optional embodiments, the determining the screening value interval of the jth feature according to the screening value interval of the most relevant feature of the jth feature may include:
in response to determining that a correlation coefficient between the jth feature and a most relevant feature of the jth feature is greater than a preset correlation coefficient threshold, the preset correlation coefficient threshold being a constant greater than zero, determining a screening value interval of the most relevant feature of the jth feature as a screening value interval of the jth feature;
and in response to determining that the correlation coefficient between the jth feature and the most relevant feature of the jth feature is less than the negative of the preset correlation coefficient threshold, determining, among the value intervals of the jth feature, the reverse value interval of the screening value interval of the most relevant feature of the jth feature as the screening value interval of the jth feature, wherein the minimum value and the maximum value of the reverse value interval are, respectively, 1 minus the maximum value and 1 minus the minimum value of the screening value interval of the most relevant feature of the jth feature.
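As a small worked illustration of the two cases above, the helper below returns the same interval for a sufficiently positive correlation and the mirrored interval for a sufficiently negative one. It reads the reverse interval of [a, b] as [1 - b, 1 - a], which relies on the normalized values lying in [0, 1]; the function name and the `None` return for the remaining case are assumptions of the sketch.

```python
def interval_from_most_relevant(corr, threshold, relevant_interval):
    """Derive a feature's screening interval from its most relevant feature.

    corr:              correlation coefficient between the two features
    threshold:         preset correlation coefficient threshold (> 0)
    relevant_interval: (low, high) screening interval of the most relevant feature
    """
    low, high = relevant_interval
    if corr > threshold:
        return (low, high)
    if corr < -threshold:
        return (1.0 - high, 1.0 - low)  # reverse value interval
    return None  # not strongly correlated: handled by screening probabilities


print(interval_from_most_relevant(-0.8, 0.5, (0.0, 0.25)))  # (0.75, 1.0)
```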
In some optional embodiments, the apparatus 400 further includes a boundary sample generation unit 408 configured to, before the performing of the simulation task completion data normalization feature group generation operation until the number of simulation task completion data normalization feature groups in the simulation task completion data normalization feature group set is not less than M:
Determining whether an upper boundary task completion data normalization feature group and a lower boundary task completion data normalization feature group exist in the task completion data normalization feature group set, wherein each feature value in the upper boundary task completion data normalization feature group is 1, and each feature value in the lower boundary task completion data normalization feature group is 0;
generating an upper boundary task completion data normalization feature set and adding the upper boundary task completion data normalization feature set to the simulation task completion data normalization feature set in response to determining that the upper boundary task completion data normalization feature set does not exist;
and generating a lower boundary task completion data normalization feature set and adding the lower boundary task completion data normalization feature set to the simulation task completion data normalization feature set in response to determining that the lower boundary task completion data normalization feature set does not exist.
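The boundary sample step can be sketched as follows, assuming each normalized feature group is represented as a list of K floats. The function name `add_missing_boundary_groups` is hypothetical.

```python
def add_missing_boundary_groups(normalized_groups, simulated_groups, k):
    """Add all-ones / all-zeros groups to the simulated set if the real set lacks them.

    normalized_groups: existing task completion data normalized feature groups
    simulated_groups:  simulated set being built (modified in place and returned)
    k:                 number of features per group
    """
    upper = [1.0] * k  # upper boundary group: every feature value is 1
    lower = [0.0] * k  # lower boundary group: every feature value is 0
    existing = [list(g) for g in normalized_groups]
    if upper not in existing:
        simulated_groups.append(upper)
    if lower not in existing:
        simulated_groups.append(lower)
    return simulated_groups
```

Running this before the generation loop means the boundary groups count toward the target number M of simulated groups.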
In some alternative embodiments, the at least one feature may include: a feature for assessing attention, a feature for assessing self-control, and a feature for assessing conversion power.
In some alternative embodiments, the feature for assessing attention may include at least one of: the standard deviation of the number of consecutively answered questions, the weighted average of the number of consecutively answered questions, the longest concentration duration, and the time required to enter the longest concentration period.
In some alternative embodiments, the feature for assessing self-control may include at least one of: the correct suppression rate when no operation is required, and the error rate under interference.
In some alternative embodiments, the feature for assessing conversion power may include at least one of: the accuracy on thread switching questions, the response time for correct answers on thread switching questions, the difference in answer accuracy under different rules, and the difference in response under different rules.
It should be noted that, the implementation details and the technical effects of each unit in the task completion data feature set normalization device provided in the embodiments of the present disclosure may refer to the descriptions of other embodiments in the present disclosure, and are not described herein again.
Referring now to FIG. 5, there is illustrated a schematic diagram of a computer system 500 suitable for use in implementing the electronic device of the present disclosure. The computer system 500 shown in fig. 5 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present disclosure.
As shown in fig. 5, a computer system 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501 that may perform various suitable actions and processes in accordance with programs stored in a Read Only Memory (ROM) 502 or loaded from a storage device 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the computer system 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, magnetic tape, hard disk, etc.; and communication means 509. The communication means 509 may allow the computer system 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates a computer system 500 having electronic devices with various means, it should be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 501.
It should be noted that the computer readable medium described in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement a task completion data feature set normalization method as shown in the embodiment and alternative embodiments thereof shown in fig. 2A, and/or a task completion data feature set normalization method as shown in the embodiment and alternative embodiments thereof shown in fig. 3A.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented by means of software, or may be implemented by means of hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the acquisition unit may also be described as a "unit that acquires a task completion data feature group set".
The foregoing description covers only the preferred embodiments of the present disclosure and the technical principles employed. It will be appreciated by persons skilled in the art that the scope of the disclosure is not limited to the specific combinations of the features described above, but also covers other embodiments formed by any combination of those features or their equivalents without departing from the spirit of the disclosure, for example, embodiments formed by mutually substituting the features described above with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims (14)

1. A method for normalizing a task completion data feature set, comprising:
acquiring a task completion data feature set, wherein the task completion data comprises at least one of the following: operating behavior data in the task completion process of the subject, performance result data of the task completed by the subject and task general data, wherein the task completion data feature group comprises feature values of at least one feature obtained by extracting features of the task completion data;
for each feature included in the task completion data feature set, performing the following normalization operation: acquiring a feature category of the feature, a feature feedback direction of the feature, and a preset feature minimum value and maximum value, wherein the feature category comprises ratio class features, time class features and other class features, the feature feedback direction is used for representing a correlation direction between the feature value of the feature and the degree of ability to complete the task, and the feature feedback direction is positive correlation or negative correlation; in response to determining that the feature category of the feature is a ratio class feature or a time class feature, determining that a normalization method corresponding to the feature is post-preprocessing normalization; in response to determining that the feature category of the feature is an other class feature, determining that a normalization method corresponding to the feature is conventional normalization; in response to determining that the normalization method corresponding to the feature is conventional normalization, for each task completion data feature group, substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the preset feature minimum value and maximum value into the corresponding variables of the following conventional normalization formula, and determining the conventional normalization result output by the formula after substitution as the normalized feature value of the feature in the task completion data feature group:
[Equation: conventional normalization formula, not recoverable from this text]
in response to determining that the normalization method corresponding to the feature is post-preprocessing normalization, determining a preprocessing method corresponding to the feature according to the feature category of the feature;
preprocessing, according to the determined preprocessing method, the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group;
for each task completion data feature group: substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the preset feature minimum value and maximum value into the corresponding variables of the conventional normalization formula, and determining the conventional normalization result output by the formula after substitution as the conventional normalized feature value of the feature in the task completion data feature group; and substituting the processed feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the processed preset feature minimum value and maximum value into the corresponding variables of the conventional normalization formula, and determining the conventional normalization result output by the formula after substitution as the processed normalized feature value of the feature in the task completion data feature group;
calculating distribution bias based on the distribution of the conventional normalized feature values of the feature in each task completion data feature group to obtain the conventional normalized distribution bias of the feature;
calculating distribution bias based on the distribution of the processed normalized feature values of the feature in each task completion data feature group to obtain the preprocessed normalized distribution bias of the feature;
determining whether the absolute value of the preprocessed normalized distribution bias of the feature is less than the conventional normalized distribution bias of the feature;
in response to determining that it is less, determining the processed normalized feature value of the feature in each task completion data feature group as the normalized feature value of the feature in that task completion data feature group;
and in response to determining that it is not less, determining the conventional normalized feature value of the feature in each task completion data feature group as the normalized feature value of the feature in that task completion data feature group.
2. The method of claim 1, wherein the determining a preprocessing method corresponding to the feature according to the feature class of the feature comprises:
in response to determining that the feature class of the feature is a ratio class feature, determining that a preprocessing method corresponding to the feature is exponentiation; and
the preprocessing, according to the determined preprocessing method, of the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, comprises:
in response to the determined preprocessing method being exponentiation, performing exponentiation with a first preset constant as the base and with the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group as the respective exponents, and determining the resulting values as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, respectively.
3. The method of claim 2, wherein the first preset constant is the natural constant e.
4. The method of claim 1, wherein the determining a preprocessing method corresponding to the feature according to the feature class of the feature comprises:
in response to determining that the feature class of the feature is a time class feature and the feature feedback direction of the feature is positive correlation, determining that a preprocessing method corresponding to the feature is base exponentiation;
in response to determining that the feature class of the feature is a time class feature and the feature feedback direction of the feature is negative correlation, determining that a preprocessing method corresponding to the feature is a logarithmic operation; and
the preprocessing, according to the determined preprocessing method, of the preset feature minimum value and maximum value of the feature and the feature value of the feature in each task completion data feature group, to obtain the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, comprises:
in response to the determined preprocessing method being base exponentiation, performing exponentiation with a second preset constant as the exponent and with the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group as the respective bases, and determining the resulting values as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, respectively;
and in response to the determined preprocessing method being a logarithmic operation, taking the logarithm, with a third preset constant as the base, of the preset feature minimum value of the feature, the preset feature maximum value of the feature, and the feature value of the feature in each task completion data feature group, and determining the resulting values as the processed preset feature minimum value and maximum value of the feature and the processed feature value of the feature in the corresponding task completion data feature group, respectively.
5. The method of any of claims 1-4, wherein each of the task completion data feature sets includes K features; and
the method further comprises the steps of:
generating a task completion data normalization feature group corresponding to each task completion data feature group by using the normalization feature value of each feature in each task completion data feature group, and generating a task completion data normalization feature group set by using the task completion data normalization feature group corresponding to each task completion data feature group in the task completion data feature group set;
calculating, based on the task completion data normalization feature group set, a correlation coefficient and a P value between any two different features of the K features included in each task completion data normalization feature group;
determining, according to the number N of task completion data normalization feature groups in the task completion data normalization feature group set, the number M of simulated task completion data normalization feature groups to be generated, and generating an empty simulated task completion data normalization feature group set;
for each of the K features, determining at least two value intervals corresponding to the feature and screening probability corresponding to each value interval;
executing a simulated task completion data normalization feature group generation operation until the number of simulated task completion data normalization feature groups in the simulated task completion data normalization feature group set is not less than M, wherein the generation operation comprises:
creating a new simulated task completion data normalization feature group;
determining, according to the screening probability corresponding to each value interval of the 1st feature of the K features, a screening value interval of the 1st feature among the value intervals of the 1st feature, and randomly determining a value within the screening value interval of the 1st feature as the feature value of the 1st feature in the new simulated task completion data normalization feature group;
setting the initial value of a positive integer j to 2; and
for the jth feature of the K features, performing a feature value generation operation until j is K, the feature value generation operation comprising:
determining, among the first j-1 features, the feature whose correlation coefficient with the jth feature has the largest absolute value as the most relevant feature of the jth feature;
determining whether the absolute value of the correlation coefficient between the jth feature and the most relevant feature of the jth feature is greater than a preset correlation coefficient threshold and the P value is less than a preset P value threshold;
in response to determining yes, determining the screening value interval of the jth feature according to the screening value interval of the most relevant feature of the jth feature;
in response to determining no, determining the screening value interval of the jth feature among the value intervals of the jth feature according to the screening probability corresponding to each value interval of the jth feature;
randomly determining a value within the screening value interval of the jth feature as the feature value of the jth feature in the new simulated task completion data normalization feature group; and
after increasing the value of j by 1, continuing to perform the feature value generation operation.
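The pairwise correlation step of claim 5 can be sketched as follows. The claim does not say which correlation coefficient or which test yields the P value; this sketch assumes a Pearson coefficient and, to stay within the Python standard library, estimates a two-sided P value by a permutation test. All function names and the permutation count are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Two-sided permutation P value (an assumed stand-in for whatever
    significance test the patent intends)."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


def pairwise_correlations(groups):
    """Map each feature index pair (a, b), a < b, to (coefficient, P value).
    `groups` is a list of normalized feature groups, each with K values."""
    k = len(groups[0])
    columns = [[g[i] for g in groups] for i in range(k)]
    return {
        (a, b): (pearson(columns[a], columns[b]),
                 permutation_p_value(columns[a], columns[b]))
        for a in range(k) for b in range(a + 1, k)
    }
```

For larger data sets a library routine that returns both the coefficient and an analytic P value would usually be preferable to the permutation loop.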
6. The method of claim 5, wherein the determining the screening value interval of the jth feature from the screening value interval of the most relevant feature of the jth feature comprises:
in response to determining that a correlation coefficient between the jth feature and a most relevant feature of the jth feature is greater than a preset correlation coefficient threshold, the preset correlation coefficient threshold being a constant greater than zero, determining a screening value interval of the most relevant feature of the jth feature as a screening value interval of the jth feature;
and in response to determining that the correlation coefficient between the jth feature and the most relevant feature of the jth feature is smaller than the negative of the preset correlation coefficient threshold, determining the inverse value interval, within the value range of the jth feature, of the screening value interval of the most relevant feature of the jth feature as the screening value interval of the jth feature, wherein the minimum value and the maximum value of the inverse value interval are, respectively, 1 minus the maximum value and 1 minus the minimum value of the screening value interval of the most relevant feature of the jth feature.
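The generation operation of claims 5 and 6 can be sketched in Python as below. This is an illustrative reading, not the patented implementation: it assumes the correlation coefficients and P values are supplied as symmetric K-by-K matrices, that the mirrored interval is taken from the most relevant feature's screening interval, and that M is passed in directly (the claims do not state how M is derived from N). The function names and default thresholds are hypothetical.

```python
import random


def pick_interval(intervals, probabilities, rng):
    """Choose one (low, high) value interval according to its screening probability."""
    return rng.choices(intervals, weights=probabilities, k=1)[0]


def generate_group(intervals, probabilities, corr, pvals,
                   corr_threshold=0.5, p_threshold=0.05, rng=None):
    """Generate one simulated normalized feature group with K values."""
    rng = rng or random.Random()
    k = len(intervals)
    chosen = [pick_interval(intervals[0], probabilities[0], rng)]
    values = [rng.uniform(*chosen[0])]
    for j in range(1, k):
        # Most relevant feature: the earlier feature with the largest |r|.
        best = max(range(j), key=lambda i: abs(corr[i][j]))
        r = corr[best][j]
        if abs(r) > corr_threshold and pvals[best][j] < p_threshold:
            low, high = chosen[best]
            # Positive correlation reuses the interval; negative mirrors it in [0, 1].
            interval = (low, high) if r > 0 else (1.0 - high, 1.0 - low)
        else:
            interval = pick_interval(intervals[j], probabilities[j], rng)
        chosen.append(interval)
        values.append(rng.uniform(*interval))
    return values


def generate_set(m, intervals, probabilities, corr, pvals,
                 corr_threshold=0.5, p_threshold=0.05, rng=None):
    """Repeat the generation operation until the set holds at least m groups."""
    rng = rng or random.Random()
    simulated = []
    while len(simulated) < m:
        simulated.append(generate_group(intervals, probabilities, corr, pvals,
                                        corr_threshold, p_threshold, rng))
    return simulated
```

Because a strongly correlated feature inherits (or mirrors) the interval of its most relevant predecessor, the simulated groups roughly preserve the direction of the pairwise relationships observed in the real data.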
7. The method of claim 5, wherein prior to the performing the simulated task completion data normalization feature group generation operation until the number of simulated task completion data normalization feature groups in the simulated task completion data normalization feature group set is not less than M, the method further comprises:
determining whether an upper boundary task completion data normalization feature group and a lower boundary task completion data normalization feature group exist in the task completion data normalization feature group set, wherein each feature value in the upper boundary task completion data normalization feature group is 1, and each feature value in the lower boundary task completion data normalization feature group is 0;
in response to determining that the upper boundary task completion data normalization feature group does not exist, generating an upper boundary task completion data normalization feature group and adding it to the simulated task completion data normalization feature group set;
and in response to determining that the lower boundary task completion data normalization feature group does not exist, generating a lower boundary task completion data normalization feature group and adding it to the simulated task completion data normalization feature group set.
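The boundary check of claim 7 is small enough to sketch directly. The function name and the list-of-lists representation are assumptions for illustration.

```python
def seed_boundary_groups(normalized_groups, k):
    """Return the boundary groups that must be added to the simulated set.

    The upper boundary group has every feature value equal to 1 and the lower
    boundary group has every feature value equal to 0; each is generated only
    if it is absent from the real normalized feature group set.
    """
    upper = [1.0] * k
    lower = [0.0] * k
    existing = [list(g) for g in normalized_groups]
    simulated = []
    if upper not in existing:
        simulated.append(upper)
    if lower not in existing:
        simulated.append(lower)
    return simulated
```

Seeding the simulated set this way ensures that both extremes of the normalized range are represented before the random generation operation starts.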
8. The method of claim 1, wherein the at least one feature comprises at least one of: a feature for assessing attention, a feature for assessing self-control, and a feature for assessing conversion power.
9. The method of claim 8, wherein the feature for assessing attention comprises at least one of: the standard deviation of the number of questions answered consecutively, the weighted average of the number of questions answered consecutively, the longest concentration time, and the time required to enter the longest concentration.
10. The method of claim 8, wherein the feature for assessing self-control comprises at least one of: the correct suppression rate when no operation is required, and the error rate under interference.
11. The method of claim 8, wherein the feature for assessing conversion power comprises at least one of: the accuracy on thread-switching questions, the correct response time on thread-switching questions, the difference in answer accuracy under different rules, and the difference in response under different rules.
12. A task completion data feature group normalization apparatus, comprising:
an acquisition unit configured to acquire a set of task completion data feature groups, the task completion data including at least one of: operating behavior data in the task completion process of the subject, performance result data of the task completed by the subject and task general data, wherein the task completion data feature group comprises feature values of at least one feature obtained by extracting features of the task completion data;
a normalization unit configured to perform, for each feature included in the task completion data feature group, the following normalization operation:
acquiring the feature class, the feature feedback direction, and the preset feature minimum value and maximum value of the feature, wherein the feature class comprises a ratio class feature, a time class feature and other class features, the feature feedback direction characterizes the direction of correlation between the feature value of the feature and the level of the ability to complete the task, and the feature feedback direction is a positive correlation or a negative correlation;
in response to determining that the feature class of the feature is a ratio class feature or a time class feature, determining that the normalization method corresponding to the feature is post-preprocessing normalization;
in response to determining that the feature class of the feature is other class features, determining that the normalization method corresponding to the feature is conventional normalization;
in response to determining that the normalization method corresponding to the feature is conventional normalization, for each task completion data feature group, substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the preset feature minimum value and maximum value into the corresponding variables of the conventional normalization formula, and determining the conventional normalization result output by the formula after substitution as the normalized feature value of the feature in the task completion data feature group [the formula and its variable symbols are not preserved in this text];
in response to determining that the normalization method corresponding to the feature is post-preprocessing normalization, determining a preprocessing method corresponding to the feature according to the feature class of the feature;
respectively preprocessing, according to the determined preprocessing method, the minimum value and the maximum value of the preset feature value of the feature and the feature value of the feature in each task completion data feature group, to obtain the processed minimum value and maximum value of the preset feature value of the feature and the processed feature value of the feature in the corresponding task completion data feature group;
for each task completion data feature group, substituting the feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the preset feature minimum value and maximum value into the corresponding variables of the conventional normalization formula, and determining the conventional normalization result output by the formula after substitution as the conventional normalized feature value of the feature in the task completion data feature group; and substituting the processed feature value of the feature in the task completion data feature group, the feature feedback direction of the feature, and the processed preset feature minimum value and maximum value into the corresponding variables of the conventional normalization formula, and determining the conventional normalization result output by the formula after substitution as the processed normalized feature value of the feature in the task completion data feature group;
calculating distribution bias based on the distribution of the conventional normalized feature values of the feature in each task completion data feature group to obtain the conventional normalized distribution bias of the feature;
calculating distribution bias based on the distribution of the processed normalized feature values of the feature in each task completion data feature group to obtain the preprocessed normalized distribution bias of the feature;
determining whether the absolute value of the preprocessed normalized distribution bias of the feature is less than the conventional normalized distribution bias of the feature;
in response to determining that it is less, determining the processed normalized feature value of the feature in each task completion data feature group as the normalized feature value of the feature in that task completion data feature group;
and in response to determining that it is not less, determining the conventional normalized feature value of the feature in each task completion data feature group as the normalized feature value of the feature in that task completion data feature group.
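The selection between conventional and post-preprocessing normalization described for the normalization unit can be sketched as follows. Two points are assumptions, because the source text does not preserve them: the conventional normalization formula is taken here to be min-max scaling, flipped when the feature feedback direction is a negative correlation, and "distribution bias" is interpreted as sample skewness. The function names and the `transform` parameter are hypothetical.

```python
import math


def min_max(value, low, high, positive):
    """Assumed conventional normalization: min-max scaling to [0, 1],
    reversed when the feature feedback direction is a negative correlation."""
    scaled = (value - low) / (high - low)
    return scaled if positive else 1.0 - scaled


def skewness(values):
    """Sample skewness m3 / m2**1.5 (assumed meaning of "distribution bias").
    Raises ZeroDivisionError if all values are identical."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def normalize_feature(values, low, high, positive, transform):
    """Normalize one feature both ways and keep the less skewed result.

    `transform` is the preprocessing operation (for example a logarithm or a
    power); it is applied to the values and to both preset bounds.
    """
    conventional = [min_max(v, low, high, positive) for v in values]
    t_low, t_high = transform(low), transform(high)
    processed = [min_max(transform(v), t_low, t_high, positive) for v in values]
    # Comparison as stated in the text: |processed bias| against conventional bias.
    if abs(skewness(processed)) < skewness(conventional):
        return processed
    return conventional
```

With values spread over several orders of magnitude, such as `[1, 10, 100, 1000, 10000]` and `math.log10` as the transform, the log-processed values are evenly spaced after scaling, so the processed result is the one kept.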
13. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-11.
14. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by one or more processors implements the method of any of claims 1-11.
CN202311166869.9A 2023-09-12 2023-09-12 Feature group normalization method, device, electronic equipment and storage medium Active CN116913525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311166869.9A CN116913525B (en) 2023-09-12 2023-09-12 Feature group normalization method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116913525A CN116913525A (en) 2023-10-20
CN116913525B true CN116913525B (en) 2024-02-06

Family

ID=88353453

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311166869.9A Active CN116913525B (en) 2023-09-12 2023-09-12 Feature group normalization method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116913525B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102833A (en) * 2014-07-10 2014-10-15 西安交通大学 Intensive interval discovery based tax index normalization and fusion calculation method
CN111104400A (en) * 2019-12-24 2020-05-05 天津新开心生活科技有限公司 Data normalization method and device, electronic equipment and storage medium
CN113240010A (en) * 2021-05-14 2021-08-10 烟台海颐软件股份有限公司 Abnormity detection method and system supporting non-independent distribution of mixed data
CN113989121A (en) * 2021-11-09 2022-01-28 Oppo广东移动通信有限公司 Normalization processing method and device, electronic equipment and storage medium
CN115994131A (en) * 2022-10-11 2023-04-21 国网湖南省电力有限公司 Residential community feature tag calculation method and system based on electricity utilization time sequence data
JP2023085353A (en) * 2022-09-29 2023-06-20 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Feature extraction model training method, image classifying method, and related apparatus
CN116402512A (en) * 2023-05-31 2023-07-07 无锡锡商银行股份有限公司 Account security check management method based on artificial intelligence

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114021799A (en) * 2021-10-29 2022-02-08 国网辽宁省电力有限公司经济技术研究院 Day-ahead wind power prediction method and system for wind power plant

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
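Research on the application of normalization of discrete data in a tax calculation and price verification system; Wu Bin (吴斌); Microcomputer Applications (10); full text *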

Also Published As

Publication number Publication date
CN116913525A (en) 2023-10-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant