CN113470823A

CN113470823A - User physiological period prediction method, device, equipment and storage medium

Info

Publication number: CN113470823A
Application number: CN202110720559.1A
Authority: CN
Inventors: 沈鹏
Original assignee: Kangjian Information Technology Shenzhen Co Ltd
Current assignee: Kangjian Information Technology Shenzhen Co Ltd
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2021-10-01

Abstract

The invention relates to the field of big data and discloses a method, a device, equipment and a storage medium for predicting a user physiological period. The method comprises the following steps: acquiring target user data to be matched, the target user data comprising physiological cycle characteristic data of a target user, and performing data cleaning on the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data of the target user; vectorizing the target physiological cycle characteristic data of the target user to obtain a physiological cycle characteristic word vector corresponding to the target user; calculating, based on the physiological cycle characteristic word vector, the Mahalanobis distance between the target user data and each piece of historical user data in a preset physiological cycle characteristic database; sorting according to the Mahalanobis distance, and determining, according to the sorting result, the physiological cycle data group matching the target user data; and accurately predicting the physiological period of the target user according to the physiological cycle characteristic data corresponding to the physiological cycle data group. The technical problem of low accuracy of menstrual period prediction is solved, and the service value is improved.

Description

User physiological period prediction method, device, equipment and storage medium

Technical Field

The invention relates to the field of big data, in particular to a method, a device, equipment and a storage medium for predicting a user physiological period.

Background

Menstrual period management apps are a common type of utility app that can record information such as menstrual periods, weight and body temperature. Menstrual period prediction is their most prominent function, and the menstrual calendar is their primary form of interaction.

Owing to algorithm performance, synchronization strategy, code implementation and other factors, the calendars in such apps have various problems in predicting the menstrual period. The technical problem to be solved by those skilled in the art is therefore how to improve the accuracy of user physiological period prediction by analyzing and processing users' physiological period data based on the similarity of user behaviors.

Disclosure of Invention

The invention analyzes and processes the physiological cycle data of users based on the similarity of user behaviors, and solves the technical problem that current user physiological cycle prediction has low accuracy.

The invention provides a user physiological period prediction method in a first aspect, which comprises the following steps: acquiring target user data to be matched, wherein the target user data comprises physiological cycle characteristic data of a target user; performing data cleaning on the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data of the target user; vectorizing the target physiological cycle characteristic data of the target user to obtain a physiological cycle characteristic word vector corresponding to the target user; calculating the Mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database based on the physiological cycle characteristic word vector; sorting according to the Mahalanobis distance, and determining a physiological cycle data group corresponding to the target user data according to a sorting result, wherein the physiological cycle data group respectively comprises different physiological cycle characteristic data; and accurately predicting the physiological period of the target user according to the physiological period characteristic data corresponding to the physiological period data group.

Optionally, in a first implementation manner of the first aspect of the present invention, before the obtaining target user data to be matched, the method includes: acquiring sample data containing historical user physiological cycle characteristic data; preprocessing the sample data based on the type of the sample data to obtain a discretization word vector; calculating physiological cycle feature similarity between the historical users by adopting a cosine similarity algorithm based on the discretization word vector; and clustering the historical users based on the similarity to obtain a physiological cycle data group with similar physiological cycle characteristics.

Optionally, in a second implementation manner of the first aspect of the present invention, the clustering the historical users based on the similarity to obtain a physiological cycle data group with similar physiological cycle characteristics includes: setting the clustering number as k, and randomly selecting physiological cycle data corresponding to k historical users as an initial clustering center; and selecting the maximum similarity value corresponding to each historical user based on the similarity value, and dividing each historical user into the cluster where the clustering center corresponding to the maximum similarity value is located until the historical users are divided, thereby obtaining a clustering result.
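The clustering step described above can be illustrated with a minimal sketch. It assumes that each historical user is already represented as a numeric vector, performs a single assignment pass as the text describes, and uses the hypothetical names `cluster_by_similarity` and `seed`; none of these details are specified by the invention.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cluster_by_similarity(vectors, k, seed=0):
    """Assign every user to the initial center it is most similar to.

    Returns a dict mapping the index of each center to the list of
    user indices placed in that center's cluster.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    center_ids = rng.sample(range(len(vectors)), k)  # k random users as centers
    clusters = {c: [] for c in center_ids}
    for i, vec in enumerate(vectors):
        # Pick the center with the maximum similarity value for this user.
        best = max(center_ids, key=lambda c: cosine_similarity(vec, vectors[c]))
        clusters[best].append(i)
    return clusters
```

A practical implementation would probably recompute the centers and repeat the assignment until it stabilizes; the text only describes the initial selection and one division of the users, so the sketch stops there.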

Optionally, in a third implementation manner of the first aspect of the present invention, the performing data cleaning on the physiological cycle characteristic data by using a preset data processing algorithm to obtain the target physiological cycle characteristic data of the target user includes: receiving a data cleaning request, wherein the data cleaning request comprises data cleaning of the user behavior data according to a data format of a preset feature extraction algorithm; determining a data cleansing rule according to the data cleansing request, wherein the data cleansing rule comprises: a cleaning characteristic factor and a cleaning condition satisfied by the cleaning characteristic factor; determining a characteristic factor value corresponding to the cleaning characteristic factor; and cleaning the physiological cycle characteristic data by adopting a preset data processing algorithm according to the data cleaning rule and the characteristic factor value to obtain target physiological cycle characteristic data.

Optionally, in a fourth implementation manner of the first aspect of the present invention, the sorting according to the mahalanobis distance, and determining, according to a sorting result, a physiological cycle data group to which the target user data is correspondingly matched includes: sequencing the Mahalanobis distances to obtain a sequencing result; and determining a user group corresponding to the target user data based on the sequencing result, wherein the user group respectively comprises physiological cycle prediction information corresponding to different user behavior data.

Optionally, in a fifth implementation manner of the first aspect of the present invention, after the accurately predicting the physiological cycle of the target user according to the physiological cycle characteristic data corresponding to the physiological cycle data group, the method further includes: when the user actively modifies the user data, the modified user data is synchronized to preset user data information, and the corresponding physiological cycle time of the user is corrected.

Optionally, in a sixth implementation manner of the first aspect of the present invention, the accurately predicting the physiological cycle of the target user according to the physiological cycle characteristic data corresponding to the physiological cycle data group includes:

acquiring physiological cycle data contained in the physiological cycle data group;

performing feature extraction on the physiological cycle data to obtain physiological cycle feature data corresponding to the physiological cycle data;

and according to the physiological cycle characteristic data, accurately predicting the physiological cycle of the target user.

A second aspect of the present invention provides a user physiological period prediction apparatus, comprising: a first acquisition module, used for acquiring target user data to be matched, wherein the target user data comprises physiological cycle characteristic data of a target user; a data cleaning module, used for cleaning the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data of the target user; a vectorization processing module, used for vectorizing the target physiological cycle characteristic data of the target user to obtain a physiological cycle characteristic word vector corresponding to the target user; a first calculation module, used for calculating the Mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database based on the physiological cycle characteristic word vector; a determining module, used for sorting according to the Mahalanobis distance and determining, according to the sorting result, a physiological cycle data group matching the target user data, wherein each physiological cycle data group comprises different physiological cycle characteristic data; and a prediction module, used for accurately predicting the physiological period of the target user according to the physiological cycle characteristic data corresponding to the physiological cycle data group.

Optionally, in a first implementation manner of the second aspect of the present invention, the user physiological period prediction apparatus further includes: the second acquisition module is used for acquiring sample data containing the characteristic data of the physiological cycle of the historical user; the preprocessing module is used for preprocessing the sample data based on the type of the sample data to obtain a discretization word vector; the second calculation module is used for calculating the physiological cycle feature similarity between the historical users by adopting a cosine similarity algorithm based on the discretization word vector; and the clustering module is used for clustering the historical users based on the similarity to obtain a physiological cycle data group with similar physiological cycle characteristics.

Optionally, in a second implementation manner of the second aspect of the present invention, the clustering module is specifically configured to: setting the clustering number as k, and randomly selecting physiological cycle data corresponding to k historical users as an initial clustering center; and selecting the maximum similarity value corresponding to each historical user based on the similarity value, and dividing each historical user into the cluster where the clustering center corresponding to the maximum similarity value is located until the historical users are divided, thereby obtaining a clustering result.

Optionally, in a third implementation manner of the second aspect of the present invention, the data cleansing module is specifically configured to: receiving a data cleaning request, wherein the data cleaning request comprises data cleaning of the user behavior data according to a data format of a preset feature extraction algorithm; determining a data cleansing rule according to the data cleansing request, wherein the data cleansing rule comprises: a cleaning characteristic factor and a cleaning condition satisfied by the cleaning characteristic factor; determining a characteristic factor value corresponding to the cleaning characteristic factor; and cleaning the physiological cycle characteristic data by adopting a preset data processing algorithm according to the data cleaning rule and the characteristic factor value to obtain target physiological cycle characteristic data.

Optionally, in a fourth implementation manner of the second aspect of the present invention, the determining module includes: the sorting unit is used for sorting the Mahalanobis distances to obtain a sorting result; and the determining unit is used for determining a user group corresponding to the target user data and matched with the target user data based on the sequencing result, wherein the user group respectively comprises physiological cycle prediction information corresponding to different user behavior data.

Optionally, in a fifth implementation manner of the second aspect of the present invention, the user physiological period prediction apparatus further includes: and the synchronization module is used for synchronizing the modified user data to preset user data information and correcting the corresponding physiological cycle time of the user when the user actively modifies the user data.

Optionally, in a sixth implementation manner of the second aspect of the present invention, the prediction module is specifically configured to: acquiring physiological cycle data contained in the physiological cycle data group; performing feature extraction on the physiological cycle data to obtain physiological cycle feature data corresponding to the physiological cycle data; and according to the physiological cycle characteristic data, accurately predicting the physiological cycle of the target user.

A third aspect of the present invention provides a user physiological period prediction apparatus comprising: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;

the at least one processor invokes the instructions in the memory to cause the user physiological period prediction device to perform the user physiological period prediction method described above.

A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to perform the method of predicting a physiological period of a user as described above.

According to the technical scheme provided by the invention, target user data to be matched is obtained, and a preset data processing algorithm is adopted to perform data cleaning on the physiological cycle characteristic data to obtain the target physiological cycle characteristic data of a target user; the target physiological cycle characteristic data of the target user is vectorized to obtain a physiological cycle characteristic word vector corresponding to the target user; the Mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database is calculated based on the physiological cycle characteristic word vector; and the Mahalanobis distances are sorted, the physiological cycle data group matching the target user data is determined according to the sorting result, and the physiological cycle of the target user is accurately predicted according to the physiological cycle characteristic data corresponding to the physiological cycle data group. The technical problem of low accuracy of menstrual period prediction is solved, and the service value is improved.

Drawings

FIG. 1 is a schematic diagram of a first embodiment of a method for predicting a physiological period of a user according to the present invention;

FIG. 2 is a diagram of a second embodiment of a method for predicting a physiological period of a user according to the present invention;

FIG. 3 is a diagram of a third embodiment of a method for predicting a physiological period of a user according to the present invention;

FIG. 4 is a diagram of a fourth embodiment of a method for predicting a physiological period of a user according to the present invention;

FIG. 5 is a diagram of a fifth embodiment of a method for predicting a physiological period of a user according to the present invention;

FIG. 6 is a schematic view of a first embodiment of a user physiological period prediction apparatus according to the present invention;

FIG. 7 is a schematic view of a second embodiment of a user physiological period prediction apparatus according to the present invention;

FIG. 8 is a schematic diagram of an embodiment of the user physiological period prediction device of the present invention.

Detailed Description

The embodiment of the invention provides a user physiological period prediction method, a device, equipment and a storage medium. In the technical scheme of the invention, target user data to be matched is first obtained, and a preset data processing algorithm is adopted to perform data cleaning on the physiological cycle characteristic data to obtain the target physiological cycle characteristic data of a target user; the target physiological cycle characteristic data of the target user is vectorized to obtain a physiological cycle characteristic word vector corresponding to the target user; the Mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database is calculated based on the physiological cycle characteristic word vector; and the Mahalanobis distances are sorted, the physiological cycle data group matching the target user data is determined according to the sorting result, and the physiological cycle of the target user is accurately predicted according to the physiological cycle characteristic data corresponding to the physiological cycle data group. The technical problem of low accuracy of menstrual period prediction is solved, and the service value is improved.

The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

For ease of understanding, the detailed flow of an embodiment of the present invention is described below. Referring to FIG. 1, a first embodiment of the method for predicting a physiological period of a user according to an embodiment of the present invention includes:

101. Acquiring target user data to be matched, wherein the target user data comprises physiological cycle characteristic data of a target user;

In this embodiment, the target user data to be matched refers to the data of a user whose physiological cycle is being predicted and for whom the prediction decision needs to refer to the information of past historical users. The target user data contains not only the personal information of the target user but also the characteristics of a plurality of the target user's physiological cycles. It mainly comprises gender, age, name and the time of each physiological cycle, in particular the start time of the user's last menstrual period, the current menstrual period C_Period, the menstrual period length Period_Day and the menstrual cycle Period_cycle.

The term "matching" in this embodiment refers to matching information such as a physiological cycle characteristic of a target user with a physiological cycle characteristic of a past user.

102. Data cleaning is carried out on the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data of a target user;

In this embodiment, the server performs data cleaning on the physiological cycle characteristic data by using a preset data processing algorithm to obtain the target physiological cycle characteristic data. For example, the server needs to ensure that no field in the obtained user behavior (complaint) data is empty.

In this embodiment, data preprocessing is performed once the preliminary data acquisition is complete. Physiological cycle characteristic data obtained through conventional channels comes from relatively direct sources, is relatively complete and authentic, and retains its integrity after being handled by business processing personnel, so it is a necessary data source for typical event mining. Data from the network, by contrast, cannot be guaranteed to be usable or complete. When unstructured or semi-structured data from these different channels is used as physiological cycle characteristic data for mining typical events, it must be preprocessed to ensure its accuracy and integrity: the quality of the physiological cycle characteristic data affects the quality of the data mining, so preprocessing is necessary to improve the quality of the data source.

The data preprocessing comprises correcting erroneous data, completing data, converting data and denoising data. Correcting erroneous data means removing irrelevant data that is inconsistent with the subject of the complaint from the physiological cycle characteristic data and correcting errors in that data; for example, a complaint initiated by a user may be neither a product complaint nor a service complaint, or the time and the event description may be inconsistent, and such problems affect the result of data mining. Completing data means filling in the incomplete parts of the physiological cycle characteristic data, so that the complaint data is consistent in its event descriptions and the integrity of the data is ensured.
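As an illustration of the cleaning step, the following minimal sketch drops records with empty fields and records that violate a cleaning rule, where each rule pairs a cleaning characteristic factor with the condition it must satisfy. The record layout, the `last_start` key, the function name `clean_records` and the numeric thresholds are all assumptions made for the example; they are placeholders and are not taken from the invention.

```python
# Fields that must be present and non-empty in every record (assumed layout).
REQUIRED_FIELDS = ("last_start", "Period_Day", "Period_cycle")

# Hypothetical cleaning rules: cleaning characteristic factor -> condition
# that the factor value must satisfy. The thresholds are placeholders.
CLEANING_RULES = {
    "Period_Day": lambda days: 1 <= days <= 15,
    "Period_cycle": lambda days: 15 <= days <= 200,
}


def clean_records(records):
    """Return only the records that are complete and satisfy every rule."""
    cleaned = []
    for rec in records:
        # Rule 1: no required field may be missing or empty.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Rule 2: every cleaning factor must meet its cleaning condition.
        if all(cond(rec[name]) for name, cond in CLEANING_RULES.items()):
            cleaned.append(rec)
    return cleaned
```

This sketch only filters records. Completing missing values, which the text also mentions, would need an imputation strategy that the source does not specify.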

103. Vectorizing the target physiological cycle characteristic data of the target user to obtain a physiological cycle characteristic word vector corresponding to the target user;

In this embodiment, the data types of the physiological cycle characteristic data include not only continuous data such as test indicators and age, but also discrete data or text data such as gender and test results, so the collected new user data needs to be vectorized according to the data type to which it belongs, yielding the corresponding vectorized new user data. For example, if the target user data is mixed data comprising text data, discrete data and continuous data, the text data and the discrete data are preprocessed by one-hot encoding, a word vector method from natural language processing, to obtain vectorized data.

The continuous data does not need any normalization or standardization preprocessing; feature data of this type can be used directly.
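The type-dependent vectorization can be sketched as follows, assuming each user record is a dictionary, that the set of categories for every discrete field is known in advance, and that continuous fields are passed through unchanged. The field name `cycle_regularity` and the function names are hypothetical.

```python
def one_hot(value, categories):
    """Encode a discrete or text value as a one-hot list over `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == category else 0.0 for category in categories]


def vectorize(record, discrete_fields, continuous_fields):
    """Build a feature vector from a mixed-type user record.

    discrete_fields maps a field name to its ordered list of categories;
    continuous_fields lists the field names that are used as they are.
    """
    vector = []
    for name, categories in discrete_fields.items():
        vector.extend(one_hot(record[name], categories))
    for name in continuous_fields:
        vector.append(float(record[name]))  # no normalization applied
    return vector
```

For example, a record with `cycle_regularity` set to `"regular"`, an age of 30, a `Period_Day` of 5 and a `Period_cycle` of 28 would become a five-element vector: two one-hot positions followed by the three continuous values.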

104. Calculating the Mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database based on the physiological cycle characteristic word vector;

In this embodiment, the Mahalanobis distance between the target user data and each piece of historical user data in the preset physiological cycle characteristic database is calculated from the physiological cycle characteristic word vector generated by the vectorization processing. For example, suppose the preset physiological cycle characteristic database has seven physiological cycle data groups in total, A, B, C, D, E, F and G, and each group has n samples (users): A (a1, a2, a3, ..., an), B (b1, b2, b3, ..., bn), C (c1, c2, c3, ..., cn), D (d1, d2, d3, ..., dn), E (e1, e2, e3, ..., en), F (f1, f2, f3, ..., fn) and G (g1, g2, g3, ..., gn). The Mahalanobis distances between the new user data and the data of each sample (user) in the seven physiological cycle data groups are then calculated respectively.

In this embodiment, the physiological cycle characteristic database can be understood as a database containing the data (physiological cycle characteristics) of a large number of users, divided into a plurality of different physiological cycle groups. For example, the start time of a user's last menstrual period may be 28-30, 1-3 or 10-12; the menstrual period length may be 2-3 days, 3-5 days or 5-7 days; and the menstrual cycle may be 20-25 days, 25-30 days, 30-35 days, quarterly or semiannual. Each physiological cycle data group contains the data of a certain number of users with that physiological cycle type. In this embodiment, the data of these users is also referred to as sample data.
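The distance calculation of step 104 can be sketched with the standard definition of the Mahalanobis distance, d(x, y) = sqrt((x - y)^T S^-1 (x - y)), where S is the covariance matrix of the data. The sketch below is one possible reading, not the invention's implementation: it estimates S from all historical vectors, adds a small `ridge` term to the diagonal because one-hot columns can make S singular, and uses only the standard library. The function names and the `ridge` parameter are assumptions.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of equal-length feature vectors (needs >= 2 rows)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mahalanobis(x, y, inv_cov):
    """Mahalanobis distance between x and y given the inverse covariance."""
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    squared = sum(diff[i] * inv_cov[i][j] * diff[j]
                  for i in range(d) for j in range(d))
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative round-off


def distances_to_history(target, history, ridge=1e-6):
    """Distance from the target vector to every historical user vector."""
    cov = covariance_matrix(history)
    for i in range(len(cov)):
        cov[i][i] += ridge  # assumption: regularize so the inverse exists
    inv_cov = invert(cov)
    return [mahalanobis(target, h, inv_cov) for h in history]
```

Because the covariance matrix rescales each feature, the continuous features can enter the calculation without separate normalization, which is consistent with the treatment of continuous data described for step 103.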

105. Sorting according to the Mahalanobis distance, and determining a physiological cycle data group corresponding to the target user data according to a sorting result;

In this embodiment, the Mahalanobis distances between the target user data and each piece of historical user data in the preset physiological cycle characteristic database are sorted by their calculated values to obtain a sorting result. The sorting may be in descending or ascending order; this embodiment does not limit this. The Mahalanobis distance between two users with similar outcomes is far smaller than that between two users with dissimilar outcomes.

In this embodiment, a physiological cycle data group refers to a group characterized by a specific physiological cycle characteristic; it contains a number of samples (users) with that type of physiological cycle characteristic. Taking users with a menstrual cycle of 30-35 days as an example, the group contains, for each sample (user), the personal information, the physiological cycle characteristics, and the development process and outcome of the physiological cycle as actually observed.
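How the sorting result is turned into a matched group is not spelled out in the text. One plausible reading, sketched below under that assumption, sorts the distances in ascending order and takes a majority vote over the groups of the `top_n` nearest samples. The function name `match_group` and the parameter `top_n` are hypothetical.

```python
from collections import Counter


def match_group(distances, group_labels, top_n=3):
    """Return the matched group label and the sample indices ranked by distance.

    distances[i] is the distance from the target user to sample i, and
    group_labels[i] is the physiological cycle data group of sample i.
    """
    if not distances or len(distances) != len(group_labels):
        raise ValueError("distances and group_labels must be non-empty and aligned")
    # Ascending sort: the most similar samples come first.
    ranked = sorted(range(len(distances)), key=lambda i: distances[i])
    # Assumption: the group is decided by a vote among the nearest samples.
    votes = Counter(group_labels[i] for i in ranked[:top_n])
    group, _ = votes.most_common(1)[0]
    return group, ranked
```

Setting `top_n=1` reduces this to choosing the group of the single nearest historical user.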

106. And accurately predicting the physiological period of the target user according to the physiological period characteristic data corresponding to the physiological period data group.

In this embodiment, a smaller Mahalanobis distance between the new user data and a sample (user) indicates that the outcomes of the two users are more similar and that they are more likely to belong to the same physiological cycle data group, so the physiological cycle data group corresponding to the new user data can be determined from the sorting result of the Mahalanobis distances. Statistics and analysis are then carried out on the physiological cycle development patterns of the samples (users) in that group to predict the user's physiological period accurately, so that the user is not caught unprepared by its sudden onset, can prepare adequately, and can arrange work and life reasonably with a balance of work and rest.
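The statistics used for the prediction are not specified in the text. As a minimal sketch under that caveat, the next period can be estimated by adding the median `Period_cycle` of the matched group to the start date of the target user's last period, with the median `Period_Day` giving the expected length. The function name `predict_next_period` and the choice of the median are assumptions.

```python
from datetime import date, timedelta
from statistics import median


def predict_next_period(last_start, group_records):
    """Estimate the start and end dates of the target user's next period.

    last_start is the start date of the user's last period; group_records
    are the records of the matched physiological cycle data group.
    """
    if not group_records:
        raise ValueError("the matched group contains no records")
    cycle_days = median(r["Period_cycle"] for r in group_records)
    period_days = median(r["Period_Day"] for r in group_records)
    start = last_start + timedelta(days=round(cycle_days))
    end = start + timedelta(days=round(period_days) - 1)  # inclusive end date
    return start, end
```

The median is used here only because it is less sensitive to a few unusual cycles than the mean; any other summary of the group's cycle pattern could be substituted.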

In this embodiment, the Mahalanobis distance is used to measure the similarity between two data samples. Unlike the Euclidean distance, it weights the difference between the two sample vectors by the inverse of the covariance matrix of the data, so that correlations between features and differences in feature scale are taken into account. The smaller the calculated Mahalanobis distance between two samples, the higher their similarity can be considered to be.

In the embodiment of the invention, target user data to be matched is acquired, and a preset data processing algorithm is adopted to perform data cleaning on the physiological cycle characteristic data to obtain the target physiological cycle characteristic data of a target user; the target physiological cycle characteristic data of the target user is vectorized to obtain a physiological cycle characteristic word vector corresponding to the target user; the Mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database is calculated based on the physiological cycle characteristic word vector; and the Mahalanobis distances are sorted, the physiological cycle data group matching the target user data is determined according to the sorting result, and the physiological cycle of the target user is accurately predicted according to the physiological cycle characteristic data corresponding to the physiological cycle data group. The technical problem of low accuracy of menstrual period prediction is solved, and the service value is improved.

Referring to FIG. 2, a second embodiment of a method for predicting a physiological period of a user according to an embodiment of the present invention includes:

201. Acquiring sample data containing historical user physiological cycle characteristic data;

In this embodiment, the historical user data includes the physiological cycle characteristic data of historical users. For the female physiological period, for example, this mainly comprises the start time of the user's last menstrual period, the current menstrual period C_Period, the menstrual period length Period_Day and the menstrual cycle Period_cycle. User behavior data (physiological period data) covering at least three menstrual cycles is acquired for a number of users. After the user behavior data of a user is acquired, the body temperature data of one menstrual cycle needs to be selected from it; preferably, the body temperature data of the most recent menstrual cycle is selected.

202. Preprocessing the sample data based on the type of the sample data to obtain a discretization word vector;

In this embodiment, the sample data is preprocessed according to its type; for example, discrete data or text data may be vectorized to obtain data in the form of discrete word vectors.

In this embodiment, the data types of the user data include not only continuous data such as physiological period examination indicators and age, but also discrete data or text data such as gender and examination results. Because discrete data and text data can only be used in discrete word vector form after discretization, the type of the user data needs to be determined first. The vectorization processing corresponding to the data is then determined from its type and executed: when the data is text data, it is vectorized (text data, also called character data, refers to any characters that cannot take part in arithmetic operations); when the data is discrete data, it is vectorized; and when the data is continuous data, no vectorization is performed.

In this embodiment, vectorization processing refers to converting words into a distributed representation, which is also called word vector, so that there is a concept of "distance" between words, including more information.

In this embodiment, continuous data corresponds to what statistics calls a continuous variable. Its value can be any value within a given interval; the values are continuous, and the interval between two adjacent values can be divided indefinitely (that is, infinitely many values are possible). For example, the dimensions of manufactured parts and the height, weight, and chest circumference of a human body are continuous data, whose values can be obtained only by measurement.

In this embodiment, because the data types are different, the processing performed on the data is also different, for example, the continuous data may be used without being processed, and the text data or the discrete data may be used after being subjected to the vectorization processing, so that the vectorization processing corresponding to the sample data is determined.
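As a non-limiting illustration, the type-dependent preprocessing can be sketched as follows. The sketch assumes one-hot encoding as the vectorization for discrete or text values, which the embodiment does not specify; the function names `one_hot` and `preprocess` and the field names are hypothetical.

```python
from typing import Dict, List, Sequence, Union

Value = Union[int, float, str]


def one_hot(value: str, vocabulary: Sequence[str]) -> List[float]:
    """Encode a discrete value as a one-hot vector over a fixed vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def preprocess(record: Dict[str, Value],
               vocabularies: Dict[str, Sequence[str]]) -> List[float]:
    """Build a numeric vector from one user record.

    Fields listed in `vocabularies` are treated as discrete or text data and
    vectorized; every other field is treated as continuous and used as-is.
    """
    vector: List[float] = []
    for field in sorted(record):  # fixed field order keeps vectors comparable
        value = record[field]
        if field in vocabularies:
            vector.extend(one_hot(str(value), vocabularies[field]))
        else:
            vector.append(float(value))
    return vector


# Hypothetical record; the field names are illustrative only.
vocabularies = {"exam_result": ["normal", "abnormal"]}
record = {"age": 29, "period_cycle": 30, "exam_result": "normal"}
vector = preprocess(record, vocabularies)
```

Sorting the field names is a design choice of the sketch: it guarantees that every user's vector has the same layout, which the later distance and similarity calculations require.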

203. Calculating physiological cycle feature similarity between historical users by adopting a cosine similarity algorithm based on the discretization word vector;

In this embodiment, cosine similarity evaluates the similarity of two vectors by calculating the cosine of the angle between them. The vectors are first placed in a vector space, such as the most common two-dimensional space, according to their coordinate values. The cosine of a 0-degree angle is 1, the cosine of any other angle is no more than 1, and the minimum value is -1. The cosine of the angle between two vectors therefore indicates whether they point in approximately the same direction: when the two vectors point in the same direction, the cosine similarity is 1; when the angle between them is 90 degrees, it is 0; and when they point in exactly opposite directions, it is -1. The result is independent of the length of the vectors and depends only on their direction. Cosine similarity is commonly used in the positive space, where its values lie between 0 and 1.

Note that these upper and lower bounds apply to a vector space of any dimension, and cosine similarity is most often used in high-dimensional spaces. For example, in information retrieval, each term is assigned a different dimension, and a document is represented by a vector whose value in each dimension corresponds to the frequency with which that term appears in the document. Cosine similarity can thus indicate how similar two documents are in terms of their subject matter.
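As a non-limiting illustration, the cosine of the angle between two feature vectors can be computed as the dot product divided by the product of the vector lengths. The function name `cosine_similarity` is hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])   # same direction
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])       # 90 degrees
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])        # opposite direction
```

The three calls correspond to the three cases named in the text: the same direction, a 90-degree angle, and opposite directions.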

The key point of this embodiment is to calculate the attention similarity between two users, i.e., the user behavior similarity. Here, the collaborative filtering algorithm uses the similarity of behaviors to measure the similarity of interests. Given a user u and a user v, let N(u) denote the set of indices on which user u has given positive feedback, and let N(v) denote the set of indices on which user v has given positive feedback. The attention similarity of u and v can then be calculated simply by the Jaccard formula or by the cosine formula, where N(i) is the set of users who have a behavior with respect to index i, and Wuv is the similarity between user u and user v.
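As a non-limiting illustration, and assuming the standard set-based forms of the two formulas, namely Wuv = |N(u) ∩ N(v)| / |N(u) ∪ N(v)| for Jaccard and Wuv = |N(u) ∩ N(v)| / sqrt(|N(u)| · |N(v)|) for cosine, the similarity can be sketched as follows. The function names and the example index labels are hypothetical.

```python
import math
from typing import Set


def jaccard_similarity(n_u: Set[str], n_v: Set[str]) -> float:
    """Shared indices divided by all indices either user has feedback on."""
    union = n_u | n_v
    if not union:
        return 0.0
    return len(n_u & n_v) / len(union)


def set_cosine_similarity(n_u: Set[str], n_v: Set[str]) -> float:
    """Shared indices divided by the geometric mean of the two set sizes."""
    if not n_u or not n_v:
        return 0.0
    return len(n_u & n_v) / math.sqrt(len(n_u) * len(n_v))


# Hypothetical index sets for two users.
n_u = {"cycle_30", "period_5", "temp_rise"}
n_v = {"period_5", "temp_rise", "cycle_28"}
w_jaccard = jaccard_similarity(n_u, n_v)
w_cosine = set_cosine_similarity(n_u, n_v)
```

Both measures depend only on which indices the users share, not on the order or magnitude of the feedback.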

204. Clustering historical users based on the similarity to obtain physiological cycle data groups with similar physiological cycle characteristics;

In this embodiment, the historical users are clustered based on the similarity to obtain physiological cycle data groups with similar physiological cycle characteristics. Clustering refers to the process of dividing a collection of physical or abstract objects into classes composed of similar objects. A cluster generated by clustering is a collection of data objects that are similar to the other objects in the same cluster and different from the objects in other clusters. As the saying goes, "things of a kind come together, and people fall into groups"; classification problems abound in both natural science and social science. Cluster analysis is a statistical analysis method for studying the classification of samples or indices. It originates from taxonomy, but clustering is not the same as classification: in clustering, the classes into which the objects are to be divided are unknown in advance. Cluster analysis covers a wide range of methods, including system clustering, ordered sample clustering, dynamic clustering, fuzzy clustering, graph theory clustering, and cluster forecasting.

In this embodiment, based on the behavior similarity of the users, each user is assigned to the user group corresponding to one of the initial clustering centers until all users have been assigned. The user behavior similarity refers to the degree of coincidence of the data related to the users' physiological cycles, for example, the start time of the user's last menstrual period, the current menstrual period C_Period, the menstrual period length Period_Day, the menstrual cycle Period_cycle, the body temperature data of a menstrual cycle, the menstruation date, and the like.

205. Acquiring target user data to be matched, wherein the target user data comprises physiological cycle characteristic data of a target user;

206. data cleaning is carried out on the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data of a target user;

207. vectorizing the target physiological cycle characteristic data of the target user to obtain a physiological cycle characteristic word vector corresponding to the target user;

208. calculating the mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database based on the physiological cycle characteristic word vector;

209. sorting according to the Mahalanobis distance, and determining a physiological cycle data group corresponding to the target user data according to a sorting result, wherein the physiological cycle data group respectively comprises different physiological cycle characteristic data;

210. and accurately predicting the physiological period of the target user according to the physiological period characteristic data corresponding to the physiological period data group.

The steps 205-210 in the present embodiment are similar to the steps 101-106 in the first embodiment, and are not described herein again.

Referring to fig. 3, a third embodiment of the method for predicting a physiological period of a user according to the embodiment of the present invention includes:

301. acquiring target user data to be matched, wherein the target user data comprises physiological cycle characteristic data of a target user;

302. receiving a data cleaning request, wherein the data cleaning request comprises data cleaning of user behavior data according to a data format of a preset feature extraction algorithm;

In this embodiment, the data cleaning request may be a request to clean the physiological cycle characteristic data according to the data format of a preset feature extraction algorithm, or an abnormal data query request triggered after a user inputs a query keyword, where the query keyword may be content included in a query field. For example, for an address information query field that includes several pieces of specific address information, the query keyword may be a specific name field within that query field. Data cleaning filters out data that does not meet the requirements and records the filtering result, so that it can be confirmed whether the data is to be filtered out, or extracted after being processed by the business unit. Data that does not meet the requirements mainly includes incomplete data, erroneous data, and repeated data.

When the amount of data to be cleaned is very large, the data may be divided according to a predetermined time period. Within each time period, a fixed segment of historical data is cleaned. This avoids the marked increase in the number of cleaning passes that real-time cleaning incurs as the data grows, and prevents the same data from being cleaned repeatedly.

303. Determining a data cleansing rule according to the data cleansing request, wherein the data cleansing rule comprises: cleaning characteristic factors and cleaning conditions satisfied by the cleaning characteristic factors;

In this embodiment, different data cleaning requests may correspond to different service scenarios, and each service scenario has rules or conditions that the data must satisfy. For example, in a rebate scenario, the rebate policy usually sets a threshold, and a rebate is given only when the transaction reaches it; likewise, a promotion on a shopping platform can be joined only when certain conditions are met. This embodiment can determine the service scenario of the data cleaning according to the data cleaning request, obtain the service requirement corresponding to that scenario, and define the corresponding data cleaning rule according to the service requirement. The data cleaning rule may include a cleaning characteristic factor and a cleaning condition satisfied by the cleaning characteristic factor, where the cleaning characteristic factor represents a key characteristic of the data cleaning.

In this embodiment, data preprocessing includes correcting erroneous data, completing data, data transformation, and data denoising. Correcting erroneous data means removing from the complaint history data any irrelevant data that is inconsistent with the subject of the complaint, and correcting errors in the complaint history data. For example, a complaint that is neither a product complaint nor a service complaint, or whose time and event description are inconsistent, may affect the result of data mining. Completing data means filling in the incomplete parts of the complaint history data, so that the event descriptions are consistent and the integrity of the data is ensured.

304. Determining a characteristic factor value corresponding to the cleaning characteristic factor;

In this embodiment, after the data cleaning rule is defined, the data to be cleaned, that is, the address information of all target merchants in the preset area, may be acquired. For example, the corresponding original data can be obtained from the database according to the data cleaning request, and the characteristic factor value corresponding to the cleaning characteristic factor is then determined from the data to be cleaned.

305. According to the data cleaning rule and the characteristic factor value, cleaning the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data;

In this embodiment, after the characteristic factor value corresponding to the cleaning characteristic factor is determined, data cleaning is performed according to the data cleaning rule by checking whether each piece of data to be cleaned satisfies the rule. Data that satisfies the rule is retained, data that does not may be deleted, and the retained data is used as the cleaning result data.
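As a non-limiting illustration, rule-based cleaning can be sketched by modelling the data cleaning rule as a mapping from each cleaning characteristic factor to the condition its value must satisfy. The names `satisfies` and `clean`, the field names, and the numeric bounds are all hypothetical and are not taken from the embodiment.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]
Rule = Dict[str, Callable[[float], bool]]  # cleaning factor -> cleaning condition


def satisfies(record: Record, rule: Rule) -> bool:
    """A record passes only if every factor is present and meets its condition."""
    for factor, condition in rule.items():
        value = record.get(factor)
        if value is None or not condition(value):
            return False
    return True


def clean(records: List[Record], rule: Rule) -> List[Record]:
    """Retain records that satisfy the rule; the rest are dropped."""
    return [record for record in records if satisfies(record, rule)]


# Hypothetical rule: the bounds below are illustrative assumptions only.
rule: Rule = {
    "period_day": lambda days: 1 <= days <= 15,
    "period_cycle": lambda days: 15 <= days <= 90,
}
records: List[Record] = [
    {"period_day": 5, "period_cycle": 28},
    {"period_day": 5, "period_cycle": None},  # incomplete data
    {"period_day": 40, "period_cycle": 28},   # erroneous data
]
cleaned = clean(records, rule)
```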

In this embodiment, data preprocessing follows the preliminary data acquisition. Physiological cycle characteristic data obtained through traditional channels has strong integrity and authenticity. Data produced by business process personnel is complete and is the source of the data to be mined for typical events. Data from the network, by contrast, cannot be guaranteed to be usable or complete. When unstructured or semi-structured data from these different channels is used as physiological cycle characteristic data for mining typical events, it must be preprocessed to ensure its accuracy and integrity. The quality of the physiological cycle characteristic data affects the quality of the data mining, so improving the quality of the data source through preprocessing is necessary.

306. Data cleaning is carried out on the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data of a target user;

307. vectorizing the target physiological cycle characteristic data of the target user to obtain a physiological cycle characteristic word vector corresponding to the target user;

308. calculating the mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database based on the physiological cycle characteristic word vector;

309. sorting according to the Mahalanobis distance, and determining a physiological cycle data group corresponding to the target user data according to a sorting result, wherein the physiological cycle data group respectively comprises different physiological cycle characteristic data;

310. and accurately predicting the physiological period of the target user according to the physiological period characteristic data corresponding to the physiological period data group.

Steps 301 and 306 and 309 in this embodiment are similar to steps 101 and 103 and 106 in the first embodiment, and are not described herein again.

Referring to fig. 4, a fourth embodiment of the method for predicting a physiological period of a user according to the embodiment of the present invention includes:

401. acquiring target user data to be matched, wherein the target user data comprises physiological cycle characteristic data of a target user;

402. data cleaning is carried out on the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data of a target user;

403. vectorizing the target physiological cycle characteristic data of the target user to obtain a physiological cycle characteristic word vector corresponding to the target user;

404. calculating the mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database based on the physiological cycle characteristic word vector;

405. sequencing all the Mahalanobis distances to obtain a sequencing result;

In this embodiment, the Mahalanobis distances between the target user behavior data and the user behavior data of each historical user in the physiological period prediction database are sorted by value to obtain a sorting result. The sorting may be in descending or ascending order. The Mahalanobis distance between two users with similar physiological periods is much smaller than that between two users with dissimilar physiological periods.

406. Determining a user group corresponding to and matched with the target user data based on the sequencing result, wherein the user group respectively comprises physiological cycle prediction information corresponding to different user behavior data;

In this embodiment, a user group is a group containing a certain number of users with similar physiological periods. Taking the behavior data (physiological period data) of users whose physiological period is 35 days as an example, the group includes, for each user in the group, personal information over the whole physiological period, the start time of the last menstrual period, the current menstrual period C_Period, the menstrual period length Period_Day, the menstrual cycle Period_cycle, and the like.

In this embodiment, if the mahalanobis distance between the user behavior data corresponding to the target user and the user behavior data corresponding to the sample (the historical user in the user group) is smaller, it indicates that the physiological cycles between the two users are similar, and the probability of belonging to the same user group is higher. Therefore, the user group to which the user behavior data corresponding to the target user belongs can be determined according to the sorting result of the mahalanobis distance.
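As a non-limiting illustration, determining the user group from the sorting result can be sketched as follows. The embodiment states only that the group is determined from the sorted distances; taking a majority vote among the k nearest historical users is an assumption of this sketch, and the name `match_group` and the group labels are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def match_group(distances: List[Tuple[str, float]], top_k: int = 3) -> str:
    """Pick the group that is most common among the nearest historical users.

    `distances` holds one (group_id, distance) pair per historical user.
    """
    if not distances:
        raise ValueError("no historical users to compare against")
    ranked = sorted(distances, key=lambda pair: pair[1])  # ascending distance
    nearest = ranked[:top_k]
    votes = Counter(group for group, _ in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical distances between the target user and five historical users.
distances = [("A", 0.4), ("B", 2.1), ("A", 0.7), ("B", 0.5), ("C", 3.0)]
group = match_group(distances)
```

With `top_k=1` the sketch reduces to assigning the target user to the group of the single closest historical user.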

In this embodiment, the Mahalanobis distance is used to measure the similarity between two data samples. For two sample vectors x and y, it is the distance between them after the difference in each feature is scaled by the inverse of the covariance matrix S of the data, i.e., the square root of (x - y)^T S^-1 (x - y), so that features with different scales and correlated features do not distort the comparison. The smaller the Mahalanobis distance between two samples, the higher their similarity can be considered to be.
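As a non-limiting illustration, the Mahalanobis distance d(x, y) = sqrt((x - y)^T S^-1 (x - y)) can be sketched in plain Python, where S is the sample covariance matrix estimated from the historical user vectors. The function names and the example vectors (cycle length, period length) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def covariance_matrix(samples: Sequence[Vector]) -> List[List[float]]:
    """Sample covariance matrix (divides by n - 1) of the row vectors."""
    n, dim = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(dim)]
    return [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def solve(matrix: Sequence[Vector], rhs: Vector) -> List[float]:
    """Solve matrix @ z = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    z = [0.0] * size
    for row in range(size - 1, -1, -1):  # back substitution
        tail = sum(aug[row][k] * z[k] for k in range(row + 1, size))
        z[row] = (aug[row][size] - tail) / aug[row][row]
    return z


def mahalanobis(x: Vector, y: Vector, cov: Sequence[Vector]) -> float:
    """Distance between x and y scaled by the inverse of the covariance."""
    diff = [a - b for a, b in zip(x, y)]
    z = solve(cov, diff)  # z = S^-1 (x - y), without forming the inverse
    return math.sqrt(max(0.0, sum(d * w for d, w in zip(diff, z))))


# Hypothetical historical vectors: (cycle length, period length) in days.
history = [(28.0, 5.0), (30.0, 6.0), (32.0, 4.0), (26.0, 5.0)]
cov = covariance_matrix(history)
target = (29.0, 5.0)
distances = [mahalanobis(target, h, cov) for h in history]
```

Solving the linear system instead of inverting S explicitly is a design choice of the sketch; with an identity covariance matrix the result reduces to the ordinary Euclidean distance.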

407. And accurately predicting the physiological period of the target user according to the physiological period characteristic data corresponding to the physiological period data group.

The steps 401, 404, 407 in the present embodiment are similar to the steps 101, 104, 106 in the first embodiment, and are not described herein again.

Referring to fig. 5, a fifth embodiment of the method for predicting a physiological period of a user according to the present invention includes:

501. acquiring target user data to be matched, wherein the target user data comprises physiological cycle characteristic data of a target user;

502. data cleaning is carried out on the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data of a target user;

503. vectorizing the target physiological cycle characteristic data of the target user to obtain a physiological cycle characteristic word vector corresponding to the target user;

504. calculating the mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database based on the physiological cycle characteristic word vector;

505. sorting according to the Mahalanobis distance, and determining a physiological cycle data group corresponding to the target user data according to a sorting result, wherein the physiological cycle data group respectively comprises different physiological cycle characteristic data;

506. accurately predicting the physiological period of the target user according to the physiological period characteristic data corresponding to the physiological period data group;

507. acquiring physiological cycle data contained in a physiological cycle data group;

In this embodiment, the physiological cycle data includes each time segment of the physiological cycle, mainly the start time of the user's last menstrual period, the current menstrual period C_Period, the menstrual period length Period_Day, and the menstrual cycle Period_cycle.

508. Performing feature extraction on the physiological cycle data to obtain physiological cycle feature data corresponding to the physiological cycle data;

In this embodiment, existing feature extraction methods fall roughly into three categories. Filter (filtering method): each feature is scored by its divergence or correlation, and features are selected by a score threshold or by a preset number of features to keep. Wrapper (wrapping method): several features are selected, or excluded, at a time according to an objective function, usually a prediction performance score. Embedded (embedding method): a machine learning algorithm or model is first trained to obtain a weight coefficient for each feature, and features are selected in descending order of coefficient; this is similar to the Filter method, except that training determines the quality of each feature. Feature extraction is performed on the physiological cycle data to obtain the corresponding physiological cycle characteristic data.
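As a non-limiting illustration of the Filter method, each feature can be scored by its variance, used here as the divergence score, and kept only if the score exceeds a threshold. The choice of variance, the name `filter_select`, the threshold, and the example columns are assumptions of this sketch.

```python
from statistics import pvariance
from typing import Dict, List


def filter_select(columns: Dict[str, List[float]], threshold: float) -> List[str]:
    """Keep the features whose variance exceeds the threshold."""
    return [name for name, values in columns.items()
            if pvariance(values) > threshold]


# Hypothetical feature columns over four users.
columns = {
    "period_day": [5.0, 5.0, 5.0, 5.0],        # constant, carries no information
    "period_cycle": [28.0, 30.0, 35.0, 26.0],  # varies between users
}
selected = filter_select(columns, threshold=0.5)
```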

509. Accurately predicting the physiological period of the target user according to the physiological period characteristic data;

In this embodiment, the physiological period of the target user is predicted according to the physiological cycle characteristic data. For example, when the data of user A is similar to the user data in physiological cycle data group A, the physiological cycle of user A can be predicted from the physiological cycle characteristics of all users in physiological cycle data group A.
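As a non-limiting illustration, one simple way to predict from the matched group is to average the cycle lengths of its users and add the result to the target user's last start date. The embodiment does not state how the group characteristics are combined; the averaging rule, the name `predict_next_start`, and the example values are assumptions of this sketch.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Sequence


def predict_next_start(last_start: date,
                       group_cycle_lengths: Sequence[int]) -> date:
    """Predict the next start date from the group's mean cycle length."""
    if not group_cycle_lengths:
        raise ValueError("the matched group contains no cycle data")
    predicted_cycle = round(mean(group_cycle_lengths))
    return last_start + timedelta(days=predicted_cycle)


# Hypothetical values: last start date and the cycle lengths in the group.
next_start = predict_next_start(date(2021, 6, 1), [28, 30, 29])
```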

510. When the user actively modifies the user data, the modified user data is synchronized to preset user data information, and the corresponding physiological cycle time of the user is corrected.

In this embodiment, if the user actively modifies the menstrual period data, the modified data is synchronized to the menstrual calendar and the menstrual period time is corrected. For example, if the menstrual period is prolonged or delayed while the menstrual period time is being calculated, the user may actively modify the menstrual period data, namely the current menstrual period C_Period, the menstrual period length Period_Day, and the menstrual cycle Period_cycle. The modification is made through the APP on the mobile terminal, which synchronizes the information to the menstrual calendar, and the menstrual calendar corrects the menstrual period time according to the data entered by the user.
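As a non-limiting illustration, the correction step can be sketched with an immutable calendar entry whose predicted next start date is derived from the stored fields, so that any user modification is reflected immediately. The class `CalendarEntry`, the function `apply_user_edit`, and the example values are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta


@dataclass(frozen=True)
class CalendarEntry:
    last_start: date   # start time of the last menstrual period
    period_day: int    # Period_Day, menstrual period length in days
    period_cycle: int  # Period_cycle, menstrual cycle length in days

    @property
    def next_start(self) -> date:
        """Predicted start of the next period, derived from the stored data."""
        return self.last_start + timedelta(days=self.period_cycle)


def apply_user_edit(entry: CalendarEntry, **changes: object) -> CalendarEntry:
    """Return a corrected entry with the user's modified fields applied."""
    return replace(entry, **changes)


# Hypothetical entry; the user then reports a delayed, 35-day cycle.
entry = CalendarEntry(date(2021, 6, 1), period_day=5, period_cycle=28)
corrected = apply_user_edit(entry, period_cycle=35)
```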

Steps 501-506 in this embodiment are similar to steps 101-106 in the first embodiment, and are not described herein again.

The method for predicting the physiological period of a user in the embodiment of the present invention is described above; the device for predicting the physiological period of a user is described below. Referring to fig. 6, a first embodiment of the device for predicting the physiological period of a user in the embodiment of the present invention includes:

a first obtaining module 601, configured to obtain target user data to be matched, where the target user data includes physiological cycle characteristic data of a target user;

a data cleaning module 602, configured to perform data cleaning on the physiological cycle characteristic data by using a preset data processing algorithm, so as to obtain target physiological cycle characteristic data of the target user;

the vectorization processing module 603 is configured to perform vectorization processing on the target physiological cycle feature data of the target user to obtain a physiological cycle feature word vector corresponding to the target user;

a first calculating module 604, configured to calculate mahalanobis distances between the target user data and each piece of historical user data in a preset physiological cycle feature database based on the physiological cycle feature word vector;

a determining module 605, configured to sort according to the mahalanobis distance, and determine a physiological cycle data group corresponding to the target user data according to a sorting result, where the physiological cycle data group includes different physiological cycle characteristic data;

and the predicting module 606 is configured to accurately predict the physiological period of the target user according to the physiological period characteristic data corresponding to the physiological period data group.

Referring to fig. 7, a second embodiment of the user physiological period prediction apparatus according to the embodiment of the present invention specifically includes:

In this embodiment, the device for predicting a physiological period of a user further includes:

a second obtaining module 607, configured to obtain sample data including characteristic data of a physiological cycle of a historical user; the preprocessing module is used for preprocessing the sample data based on the type of the sample data to obtain a discretization word vector;

a second calculating module 608, configured to calculate, based on the discretized word vector, a physiological cycle feature similarity between the historical users by using a cosine similarity algorithm;

and the clustering module 609 is configured to cluster the historical users based on the similarity to obtain a physiological cycle data group with similar physiological cycle characteristics.

In this embodiment, the clustering module 609 is specifically configured to:

setting the clustering number as k, and randomly selecting physiological cycle data corresponding to k historical users as an initial clustering center;

and selecting the maximum similarity value corresponding to each historical user based on the similarity value, and dividing each historical user into the cluster where the clustering center corresponding to the maximum similarity value is located until the historical users are divided, thereby obtaining a clustering result.
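As a non-limiting illustration, the two operations of the clustering module can be sketched as a single assignment pass: k historical users are drawn at random as initial clustering centers, and every historical user joins the center with which its cosine similarity is largest. The names `cosine` and `cluster`, the fixed random seed, and the example vectors are assumptions of this sketch.

```python
import math
import random
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(users: Dict[str, Sequence[float]], k: int,
            seed: int = 0) -> Dict[str, List[str]]:
    """Assign every user to the randomly chosen center it is most similar to."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    centers = rng.sample(sorted(users), k)
    clusters: Dict[str, List[str]] = {center: [] for center in centers}
    for user_id, vector in users.items():
        best = max(centers, key=lambda c: cosine(vector, users[c]))
        clusters[best].append(user_id)
    return clusters


# Hypothetical discretized word vectors for five historical users.
users = {
    "u1": (1.0, 0.0, 0.0), "u2": (0.9, 0.1, 0.0), "u3": (0.0, 1.0, 0.0),
    "u4": (0.1, 0.9, 0.0), "u5": (0.0, 0.0, 1.0),
}
clusters = cluster(users, k=2)
```

The sketch performs one pass, as described for the module; iterative refinement of the centers, as in k-means, would be a further assumption and is not shown.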

In this embodiment, the data cleaning module 602 is specifically configured to:

receiving a data cleaning request, wherein the data cleaning request comprises data cleaning of the user behavior data according to a data format of a preset feature extraction algorithm;

determining a data cleansing rule according to the data cleansing request, wherein the data cleansing rule comprises: a cleaning characteristic factor and a cleaning condition satisfied by the cleaning characteristic factor;

determining a characteristic factor value corresponding to the cleaning characteristic factor;

and cleaning the physiological cycle characteristic data by adopting a preset data processing algorithm according to the data cleaning rule and the characteristic factor value to obtain target physiological cycle characteristic data.

In this embodiment, the determining module 605 includes:

a sorting unit 6051, configured to sort the mahalanobis distances to obtain a sorting result;

a determining unit 6052, configured to determine, based on the sorting result, user groups to which the target user data corresponds, where the user groups respectively include physiological cycle prediction information corresponding to different user behavior data.

the synchronization module 610 is configured to synchronize the modified user data to preset user data information and correct the physiological cycle time corresponding to the user when the user actively modifies the user data.

Fig. 6 and 7 describe the user physiological period prediction apparatus in the embodiment of the present invention in detail from the perspective of modular functional entities; the user physiological period prediction device in the embodiment of the present invention is described in detail below from the perspective of hardware processing.

Fig. 8 is a schematic structural diagram of a user physiological period prediction device according to an embodiment of the present invention. The user physiological period prediction device 800 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPUs) 810, a memory 820, and one or more storage media 830 (e.g., one or more mass storage devices) storing an application 833 or data 832. The memory 820 and the storage medium 830 may be transient or persistent storage. The program stored in the storage medium 830 may include one or more modules (not shown), each of which may include a series of instruction operations for the user physiological period prediction device 800. Further, the processor 810 may be configured to communicate with the storage medium 830 and to execute, on the user physiological period prediction device 800, the series of instruction operations in the storage medium 830, so as to implement the steps of the user physiological period prediction method provided by the above method embodiments.

The user physiological period prediction device 800 may also include one or more power supplies 840, one or more wired or wireless network interfaces 850, one or more input-output interfaces 860, and/or one or more operating systems 831, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc. Those skilled in the art will appreciate that the structure illustrated in fig. 8 does not limit the user physiological period prediction device provided herein, which may include more or fewer components than illustrated, combine some components, or arrange the components differently.

The present invention also provides a computer-readable storage medium, which may be a non-volatile computer-readable storage medium, and which may also be a volatile computer-readable storage medium, having stored therein instructions, which, when run on a computer, cause the computer to perform the steps of the above-mentioned method for predicting a physiological period of a user.

The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a series of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, so as to verify the validity (anti-counterfeiting) of the information and generate a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A user physiological period prediction method, characterized by comprising:

acquiring target user data to be matched, wherein the target user data comprises physiological cycle characteristic data of a target user;

performing data cleaning on the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data of the target user;

vectorizing the target physiological cycle characteristic data of the target user to obtain a physiological cycle characteristic word vector corresponding to the target user;

calculating the Mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database based on the physiological cycle characteristic word vector;

sorting according to the Mahalanobis distance, and determining a physiological cycle data group corresponding to the target user data according to a sorting result, wherein the physiological cycle data group respectively comprises different physiological cycle characteristic data;

and accurately predicting the physiological period of the target user according to the physiological period characteristic data corresponding to the physiological period data group.

2. The user physiological period prediction method according to claim 1, wherein before the acquiring of target user data to be matched, the method further comprises:

acquiring sample data containing historical user physiological cycle characteristic data;

preprocessing the sample data based on the type of the sample data to obtain a discretization word vector;

calculating physiological cycle feature similarity between the historical users by adopting a cosine similarity algorithm based on the discretization word vector;

and clustering the historical users based on the similarity to obtain a physiological cycle data group with similar physiological cycle characteristics.

3. The method according to claim 2, wherein the clustering the historical users based on the similarity to obtain a physiological cycle data group with similar physiological cycle characteristics comprises:

4. The method of predicting the physiological period of the user according to claim 1, wherein the step of performing data cleaning on the physiological period characteristic data by using a preset data processing algorithm to obtain the target physiological period characteristic data of the target user comprises:

5. The method according to claim 1, wherein the sorting according to the Mahalanobis distance and determining the physiological cycle data group corresponding to the target user data according to the sorting result comprises:

sorting the Mahalanobis distances to obtain a sorting result;

and determining a user group corresponding to the target user data based on the sorting result, wherein the user group respectively comprises physiological cycle prediction information corresponding to different user behavior data.

6. The method according to claim 1, wherein the accurately predicting the physiological phase of the target user according to the physiological cycle characteristic data corresponding to the physiological cycle data group comprises:

7. The method according to claim 3, wherein after the physiological cycle characteristic data corresponding to the physiological cycle data group is used to accurately predict the physiological cycle of the target user, the method further comprises:

when the user actively modifies the user data, the modified user data is synchronized to preset user data information, and the corresponding physiological cycle time of the user is corrected.

8. A user physiological period prediction apparatus, characterized by comprising:

a first acquisition module, used for acquiring target user data to be matched, wherein the target user data comprises physiological cycle characteristic data of a target user;

the data cleaning module is used for cleaning the physiological cycle characteristic data by adopting a preset data processing algorithm to obtain target physiological cycle characteristic data of the target user;

the vectorization processing module is used for vectorizing the target physiological cycle characteristic data of the target user to obtain a physiological cycle characteristic word vector corresponding to the target user;

the first calculation module is used for calculating the Mahalanobis distance between the target user data and each historical user data in a preset physiological cycle characteristic database based on the physiological cycle characteristic word vector;

the determining module is used for sorting according to the Mahalanobis distance and determining a physiological cycle data group corresponding to the target user data according to a sorting result, wherein the physiological cycle data group respectively comprises different physiological cycle characteristic data;

and the prediction module is used for accurately predicting the physiological period of the target user according to the physiological period characteristic data corresponding to the physiological period data group.

9. A user physiological period prediction device, characterized in that the user physiological period prediction device comprises: a memory having instructions stored therein and at least one processor, the memory and the at least one processor interconnected by a line;

the at least one processor invoking the instructions in the memory to cause the user physiological period prediction device to perform the steps of the user physiological period prediction method of any one of claims 1-7.

10. A computer-readable storage medium, having a computer program stored thereon, which, when being executed by a processor, carries out the steps of the method for predicting a physiological period of a user according to any one of claims 1 to 7.
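By way of illustration only, the distance calculation, sorting, and group determination recited in claim 1 can be sketched as follows. The sketch is not the claimed implementation and rests on several assumptions: each user is represented by a numeric feature vector (here cycle length and period duration in days), the covariance matrix is estimated from the historical data, the group is chosen by majority among the `k` nearest historical users, and the prediction is the mean cycle length of that group. The function names and the parameter `k` are hypothetical.

```python
from collections import Counter

import numpy as np


def mahalanobis_distances(target, history):
    """Mahalanobis distance from `target` (d,) to each row of `history` (n, d)."""
    history = np.asarray(history, dtype=float)
    target = np.asarray(target, dtype=float)
    # Covariance of the historical features; the pseudo-inverse tolerates
    # a singular covariance matrix.
    cov = np.atleast_2d(np.cov(history, rowvar=False))
    inv_cov = np.linalg.pinv(cov)
    diff = history - target
    # Quadratic form diff_i^T * inv_cov * diff_i for every historical user i.
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(squared, 0.0))


def match_group(target, history, group_labels, k=3):
    """Sort by distance and pick the majority group among the k nearest users."""
    distances = mahalanobis_distances(target, history)
    order = np.argsort(distances)  # ascending: nearest historical user first
    nearest = [group_labels[i] for i in order[:k]]
    group = Counter(nearest).most_common(1)[0][0]
    return group, order


def predict_cycle_length(history, group_labels, group, cycle_col=0):
    """Assumed prediction rule: mean cycle length of the matched group."""
    history = np.asarray(history, dtype=float)
    mask = np.array([label == group for label in group_labels])
    return float(history[mask, cycle_col].mean())


# Hypothetical historical users: [cycle length in days, period duration in days].
history = [[25, 4], [26, 5], [24, 4], [25, 5],
           [33, 6], [34, 7], [35, 6], [34, 6]]
labels = ["short"] * 4 + ["long"] * 4

group, order = match_group([26, 4], history, labels, k=3)
predicted = predict_cycle_length(history, labels, group)
```

Unlike the Euclidean distance, the Mahalanobis distance scales each feature by its variance and accounts for correlation between features, so a feature measured on a larger numeric scale does not dominate the sorting result.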