CN114723516A - User similarity calculation method and system based on form data - Google Patents


Info

Publication number
CN114723516A
Authority
CN
China
Prior art keywords
order
user
data
similarity
discrete
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210254967.7A
Other languages
Chinese (zh)
Inventor
王安琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Elongnet Information Technology Beijing Co Ltd
Original Assignee
Elongnet Information Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Elongnet Information Technology Beijing Co Ltd filed Critical Elongnet Information Technology Beijing Co Ltd
Priority to CN202210254967.7A priority Critical patent/CN114723516A/en
Publication of CN114723516A publication Critical patent/CN114723516A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0633Lists, e.g. purchase orders, compilation or processing
    • G06Q30/0635Processing of requisition or of purchase orders
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the technical field of computer machine learning, and in particular to a user similarity calculation method and system based on form data, which aim to solve the low efficiency of clustering similar users in the prior art. The method comprises: acquiring historical order data within a preset time interval; discretizing the historical order data to obtain discrete order data; screening out repeated order types and constructing an order type dictionary with each remaining order type as a field; aggregating the discrete order data based on user information and splicing the discrete order data corresponding to the same user information into a user order sequence; obtaining a user vector based on the order type dictionary and the user order sequence; and obtaining the similarity between a target user and other users based on a preset similarity algorithm and the user vectors.

Description

User similarity calculation method and system based on form data
Technical Field
The application relates to the technical field of computer machine learning, in particular to a user similarity calculation method and system based on form data.
Background
With the rapid development of internet big data and the maturing of related technologies, industries of all kinds use big data technologies to create opportunities and room for growth. However, as information resources expand, the problem of information overload has also appeared. In an environment of information overload, applications of big data technology in many fields are disturbed by large amounts of invalid information, so that commodities cannot be accurately matched to target users and the practical effect of big data analysis is poor.
At present, in order to reduce the influence of excessive information on big data application, a user similarity analysis technology is widely applied as an effective tool for delineating a target user, and the user similarity analysis technology can achieve the effect of clustering similar users by performing tagging analysis on the attributes and behaviors of the target user in the big data, so that targeted commodity information push is provided for the target user by means of the similar users of the target user. In implementation, a plurality of different user attributes are usually constructed for a user in advance according to categories of commodities, behaviors of a target user in a historical order for different commodities can be described through the user attributes, preferences of the target user for different commodities are further expressed, and a plurality of similar users can be clustered through similar preferences.
In the process of implementing the present application, the inventors found that the above-mentioned technology has at least the following problems:
when different commodities are analyzed, technicians need to manually construct user attributes aiming at the commodity categories, and the manually constructed user attributes have certain subjectivity, so that the accuracy and efficiency of clustering similar users are low.
Disclosure of Invention
In order to remove the need for technicians to construct user attributes from their own understanding of user preferences, to make the basis of user similarity analysis more objective, and to improve the accuracy and efficiency of user similarity analysis, the application provides a user similarity calculation method and system based on form data.
In a first aspect, the user similarity calculation method based on form data provided by the application adopts the following technical scheme:
a user similarity calculation method based on form data comprises the following steps:
acquiring historical order data in a preset time interval, wherein the historical order data at least comprises user information and order information;
discretizing the historical order data based on a preset discretization rule to obtain discretization order data, wherein the discretization order data comprise user information and order types;
extracting order types in the discrete order data, screening and removing repeated order types, and constructing and acquiring an order type dictionary by taking each screened order type as a field;
aggregating discrete order data based on the user information, and splicing the discrete order data corresponding to the same user information into a user order sequence;
obtaining a user vector based on the order type dictionary and a user order sequence, wherein the dimension of the user vector is the number of order types in the order type dictionary, and the numerical value of each dimension of the user vector is the frequency of each order type in the order type dictionary appearing in the user order sequence;
and acquiring the similarity between the target user and other users based on a preset similarity algorithm and the user vector.
By adopting the technical scheme, the original user order data are subjected to discrete processing after the user order data are obtained, so that the original data with larger data volume can be subjected to dimension reduction, the complexity of the data to be processed is reduced, the calculation efficiency of user similarity is improved, the historical order data are subjected to aggregation processing after the discrete processing, all order data related to a target user are integrated, the integrated user order sequence comprises all orders of the user within a certain time interval, the commodity preference of the user is comprehensively and objectively described, and the calculation accuracy of the user similarity is improved; by constructing the common dictionary, the user order sequence of each user can be expressed in a vector form, and similarity calculation is performed in the vector form, so that the data volume and complexity of the user order sequence are reduced, and the calculation efficiency of the user similarity is further improved while the data storage space is saved.
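The vector construction and similarity step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, and cosine similarity is only one possible choice for the "preset similarity algorithm", which the text does not name.

```python
from collections import Counter
from math import sqrt

def build_user_vector(order_sequence, type_dictionary):
    """Count how often each dictionary entry occurs in one user's order sequence."""
    counts = Counter(order_sequence)
    # One dimension per order type, in dictionary order.
    return [counts[order_type] for order_type in type_dictionary]

def cosine_similarity(u, v):
    """Cosine similarity (an assumed choice of similarity algorithm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user with no orders is treated as dissimilar to everyone
    return dot / (norm_u * norm_v)

# Illustrative order types in the (item type, price range, hour range) form.
type_dictionary = [
    ("food", "10-15", "11-12"),
    ("food", "20-25", "15-16"),
    ("cinema ticket", "25-30", "13-14"),
]
user_a = [("food", "10-15", "11-12"),
          ("cinema ticket", "25-30", "13-14"),
          ("food", "10-15", "11-12")]
user_b = [("food", "20-25", "15-16")]

vec_a = build_user_vector(user_a, type_dictionary)
vec_b = build_user_vector(user_b, type_dictionary)
similarity_ab = cosine_similarity(vec_a, vec_b)
```

Because every user vector is indexed by the same shared dictionary, any vector-based measure (cosine, Euclidean distance, and so on) could be substituted without changing the earlier steps.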
In a specific implementation manner, the acquiring historical order data within a preset time interval, where the historical order data at least includes user information and order information, includes:
comparing the historical order number in the historical order data with a preset sample number interval, wherein the sample number interval comprises an upper sample number threshold value and a lower sample number threshold value;
when the historical order number is lower than a lower sample number threshold value, the time interval for obtaining the historical order data is gradually increased according to a preset step length;
and when the historical order number is higher than the upper sample number threshold value, the time interval for acquiring the historical order data is gradually reduced according to a preset step length.
By adopting the technical scheme, the volume of original data is kept appropriate, that is, sufficient for analysis and calculation without placing an excessive burden on data storage and computation. In addition, when the amount of original data acquired is too high or too low, it is adjusted by adjusting the time interval, which keeps the adjustment simple and improves the calculation efficiency of user similarity.
In a specific implementation manner, the preset discrete processing rule at least includes equal-frequency binning, equal-width binning, and clustering-based binning, and the discretizing of the historical order data is performed based on the preset discrete processing rule to obtain discrete order data, where the discrete order data includes user information and an order type, and the discretizing includes:
acquiring an order item set included in the historical order data, and selecting a discrete processing rule based on the order item set;
and carrying out discretization processing on the historical order data based on the selected discretization processing rule to obtain discrete order data.
By adopting the technical scheme, the set multiple discrete processing rules are beneficial to enabling technicians to select the applicable discrete processing rules according to actual use requirements, the flexibility of historical order data processing is improved, and the discrete processing rules which can be adaptively adjusted according to the historical order data types are beneficial to improving the operation efficiency of discrete processing.
In a specific implementation manner, after the extracting order types in the discrete order data, filtering and removing the repeated order types, and constructing and acquiring an order type dictionary with each filtered order type as a field, the method includes:
comparing the field number in the order type dictionary with a preset field number interval;
when the number of fields in the order type dictionary does not belong to the field number interval, adjusting the discrete processing rule to match the number of fields with the field number interval.
By adopting the technical scheme, data operation and storage burden is easily brought when the number of fields in the order type dictionary is too high, and inaccurate calculation results of user similarity are easily caused when the number of fields in the order type dictionary is too low, so that the balance of operation efficiency and accuracy can be acquired by stipulating the number of fields in the order type dictionary, and the convenience degree of adjustment of the order type dictionary can be improved by adjusting through discrete processing rules.
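One way to picture this adjustment is a loop that coarsens or refines the bin width until the dictionary size falls inside the field number interval. The sketch below is an assumption-laden simplification: it considers a single numeric item set (unit price), uses equal-width bins, and doubles or halves the width; none of these specifics come from the text, and the function names are hypothetical.

```python
def dictionary_size(prices, width):
    """Number of distinct equal-width price bins, i.e. fields in the dictionary."""
    return len({int(p) // width for p in prices})

def tune_bin_width(prices, width, min_fields, max_fields, max_rounds=20):
    """Adjust the bin width until the field count lies in [min_fields, max_fields]."""
    for _ in range(max_rounds):
        size = dictionary_size(prices, width)
        if size > max_fields:
            width *= 2                  # coarser bins, fewer fields
        elif size < min_fields and width > 1:
            width = max(1, width // 2)  # finer bins, more fields
        else:
            break
    return width, dictionary_size(prices, width)

prices = list(range(100))  # illustrative unit prices 0..99
too_many = tune_bin_width(prices, width=5, min_fields=4, max_fields=10)
too_few = tune_bin_width(prices, width=50, min_fields=4, max_fields=10)
```

The `max_rounds` guard is there because a doubling/halving scheme can oscillate when the interval is narrow; a real system would need its own stopping policy.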
In a specific implementation manner, the obtaining the similarity between the target user and the other users based on the preset similarity algorithm and the user vector includes:
respectively calculating the similarity between the target user vector of the target user and other user vectors of other users based on a preset similarity algorithm;
and based on the similarity, performing descending order arrangement on other users.
By adopting the technical scheme, the users except the target user are arranged in a descending order based on the similarity, so that the method is beneficial to assisting technicians to intuitively acquire other users similar to the target user, and is beneficial to improving the efficiency of performing similar analysis on the target user.
In a specific implementation manner, after the sorting the other users in descending order based on the similarity, the method includes:
selecting a preset number of other users from high to low according to the sequence, and marking the other users as similar users corresponding to the target user;
and acquiring the preference commodity information of the similar users and pushing the preference commodity information to the target user.
By adopting the technical scheme, the commodity preferences of the similar users are pushed to the target user, so that potential preferred commodities of the target user across different dimensions, times, and spaces can be obtained, and the commodity pushing effect can be improved.
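The ranking and pushing steps above can be sketched as follows, assuming the similarity scores have already been computed. The function names and sample data are hypothetical, and skipping commodities the target user already has is an added assumption rather than something the text requires.

```python
def top_similar_users(similarity_by_user, k):
    """Sort other users by similarity in descending order and keep the first k."""
    ranked = sorted(similarity_by_user.items(), key=lambda kv: kv[1], reverse=True)
    return [user for user, _score in ranked[:k]]

def items_to_push(target_items, preferred_items_by_user, similar_users):
    """Collect similar users' preferred commodities, most similar user first."""
    pushed = []
    for user in similar_users:
        for item in preferred_items_by_user.get(user, []):
            # Assumption: do not push what the target already buys, and avoid duplicates.
            if item not in target_items and item not in pushed:
                pushed.append(item)
    return pushed

similarity_by_user = {"B": 0.2, "C": 0.9, "D": 0.6}
preferred_items_by_user = {
    "B": ["books"],
    "C": ["cinema ticket", "food"],
    "D": ["cell phone"],
}
similar_users = top_similar_users(similarity_by_user, k=2)
push_list = items_to_push({"food"}, preferred_items_by_user, similar_users)
```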
In a specific embodiment, the method further comprises:
and when data missing occurs in the historical order data, replacing the missing data based on a preset replacement rule.
By adopting the technical scheme, missing data in historical order data can be replaced by the aid of the preset replacement rule, so that the fault tolerance rate in the similarity calculation process can be improved, and the stability in the data calculation process can be improved.
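The text does not specify the replacement rule. As one hedged illustration, a numeric item such as unit price could be filled with the median of the values that are present; the rule, the field name `unit_price`, and the function name are all assumptions.

```python
from statistics import median

def fill_missing(orders, field, default=None):
    """Replace missing values of `field` with `default`, or with the median if no default is given."""
    present = [o[field] for o in orders if o.get(field) is not None]
    if default is None:
        # Raises statistics.StatisticsError if every value is missing.
        default = median(present)
    return [
        {**o, field: o[field] if o.get(field) is not None else default}
        for o in orders
    ]

orders = [
    {"user": "A", "unit_price": 13},
    {"user": "B", "unit_price": None},   # missing value
    {"user": "A", "unit_price": 28},
]
filled = fill_missing(orders, "unit_price")
```

A categorical item could use the same function with an explicit `default`, for example a placeholder category.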
In a second aspect, the present application provides a user similarity calculation system based on form data, which adopts the following technical solutions:
a user similarity calculation system based on form data, the system comprising:
the data acquisition module is used for acquiring historical order data in a preset time interval, and the historical order data at least comprises user information and order information;
the discrete processing module is used for carrying out discrete processing on the historical order data based on a preset discrete processing rule so as to obtain discrete order data, and the discrete order data comprises user information and order types;
the dictionary construction module is used for extracting order types in the discrete order data, screening and removing repeated order types, constructing and acquiring an order type dictionary by taking each screened order type as a field;
the sequence aggregation module is used for aggregating the discrete order data based on the user information and splicing the discrete order data corresponding to the same user information into a user order sequence;
the vector generation module is used for acquiring a user vector based on the order type dictionary and the user order sequence, the dimension of the user vector is the number of order types in the order type dictionary, and the numerical value of each dimension of the user vector is the frequency of each order type in the order type dictionary appearing in the user order sequence;
and the similarity calculation module is used for acquiring the similarity between the target user and other users based on a preset similarity calculation method and the user vector.
By adopting the technical scheme, the original user order data are subjected to discrete processing after the user order data are obtained, so that the original data with larger data volume can be subjected to dimension reduction, the complexity of the data to be processed is reduced, the calculation efficiency of the user similarity is improved, the historical order data are subjected to aggregation processing after the discrete processing, all order data related to a target user are integrated, the integrated user order sequence comprises all orders of the user within a certain time interval, the commodity preference of the user is comprehensively and objectively described, and the calculation accuracy of the user similarity is improved; by constructing the common dictionary, the user order sequence of each user can be expressed in a vector form, and similarity calculation is performed in the vector form, so that the data volume and complexity of the user order sequence are reduced, and the calculation efficiency of the user similarity is further improved while the data storage space is saved.
In a third aspect, the present application provides an intelligent terminal, which adopts the following technical scheme:
an intelligent terminal comprising a processor and a memory, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the user similarity calculation method based on form data according to any one of the first aspect.
By adopting the technical scheme, the processor in the intelligent terminal can implement the user similarity calculation method based on form data according to the related computer program stored in the memory, so that technicians no longer need to construct user attributes from their understanding of user preferences, the basis of user similarity analysis becomes more objective, and the accuracy and efficiency of user similarity analysis are improved.
In a fourth aspect, the present application provides a computer-readable storage medium, which adopts the following technical solutions:
a computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the user similarity calculation method based on form data according to any one of the first aspect.
By adopting the technical scheme, the corresponding program can be stored, so that technicians no longer need to construct user attributes from their understanding of user preferences, the basis of user similarity analysis becomes more objective, and the accuracy and efficiency of user similarity analysis are improved.
In summary, the present application includes at least one of the following beneficial technical effects:
1. after user order data are obtained, original user order data are subjected to discrete processing, so that the original data with large data volume can be subjected to dimensionality reduction, the complexity of data to be processed is reduced, the calculation efficiency of user similarity is improved, historical order data are subjected to aggregation processing after the discrete processing, all order data related to a target user are integrated, all orders in a certain time interval of the user are included in an integrated user order sequence, the commodity preference of the user is comprehensively and objectively described, and the calculation accuracy of the user similarity is improved; by constructing a common dictionary, the user order sequence of each user can be expressed in a vector form, and similarity calculation is performed in the vector form, so that the data volume and complexity of the user order sequence are reduced, and the calculation efficiency of the user similarity is further improved while the data storage space is saved;
2. providing multiple discrete processing rules allows technicians to select an applicable rule according to actual requirements, which improves the flexibility of historical order data processing, and discrete processing rules that can be adaptively adjusted according to the type of historical order data help improve the operating efficiency of discrete processing;
3. according to the method and the device, the commodity preference of the similar user is pushed to the target user according to the similar user of the target user, the target user can obtain the potential preference commodities with different dimensions, different time and different spaces, and the commodity pushing effect can be improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flowchart of a user similarity calculation method based on form data shown in the embodiment of the present application;
FIG. 2 is a system flow diagram of a user similarity calculation system based on form data shown in the embodiment of the present application;
FIG. 3 is a schematic structural diagram of an intelligent terminal shown in an embodiment of the present application.
Detailed Description
The present embodiments are only illustrative and not restrictive. After reading this specification, those skilled in the art may modify the embodiments as needed without making an inventive contribution, and such modifications are protected by patent law as long as they fall within the scope of the claims of the present application. To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to FIGS. 1 to 3. Obviously, the described embodiments are some, but not all, of the embodiments of the present application.
The embodiment of the application provides a user similarity calculation method based on form data, an execution main body can be an intelligent terminal, a processing flow can be shown in fig. 1, and the method comprises the following processing steps:
step 101, obtaining historical order data in a preset time interval, wherein the historical order data at least comprises user information and order information;
step 102, discretizing the historical order data based on a preset discretization rule to obtain discretization order data, wherein the discretization order data comprises user information and order types;
step 103, extracting order types in the discrete order data, screening and removing repeated order types, and constructing and acquiring an order type dictionary by taking each screened order type as a field;
step 104, aggregating discrete order data based on the user information, and splicing the discrete order data corresponding to the same user information into a user order sequence;
step 105, obtaining a user vector based on the order type dictionary and the user order sequence, wherein the dimension of the user vector is the number of order types in the order type dictionary, and the numerical value of each dimension of the user vector is the number of times of each order type in the order type dictionary appearing in the user order sequence;
and step 106, acquiring the similarity between the target user and other users based on a preset similarity algorithm and the user vector.
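The aggregation in step 104 can be pictured as a group-by on user information. The following is a minimal sketch with hypothetical names, assuming each discrete order is a (user id, order type) pair:

```python
from collections import defaultdict

def aggregate_order_sequences(discrete_orders):
    """Group (user_id, order_type) records into one order sequence per user."""
    sequences = defaultdict(list)
    for user_id, order_type in discrete_orders:
        sequences[user_id].append(order_type)  # keeps the original order of records
    return dict(sequences)

discrete_orders = [
    ("A", ("food", "10-15", "11-12")),
    ("B", ("food", "20-25", "15-16")),
    ("A", ("cinema ticket", "25-30", "13-14")),
]
user_sequences = aggregate_order_sequences(discrete_orders)
```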
The process flow shown in fig. 1 will be described in detail below with reference to the specific embodiments, and the contents may be as follows:
step 101, obtaining historical order data in a preset time interval, wherein the historical order data at least comprises user information and order information.
In implementation, the historical order data may be original order data retained by the platform during daily operation, which the intelligent terminal may retrieve from the platform database. To meet the analysis requirement, the historical order data needs to include at least user information describing the characteristics of the user and order information describing a specific trade order; it should be emphasized that the user information and the order information may each include several item sets. After acquiring the historical order data, the intelligent terminal can sort it, for example as shown in Table 1 below:
Serial number | User information (id) | Order information (item type, item unit price, time of placing order, etc.)
1             | User A                | Food, 13, 11:30
2             | User B                | Food, 24, 15:10
3             | User A                | Cinema ticket, 28, 13:20
...           | ...                   | ...
N             | User N                | Cell phone, 5288, 17:43
Table 1: historical order data
In this way, the intelligent terminal can acquire all historical order data within a preset time interval, and the historical order data can include a plurality of pieces of order information corresponding to the same user information. The preset time interval is set by a technician, who can set it to the previous day, week, month, etc. It should be emphasized that the user information indicated by "id" and the order information indicated by "item type, item unit price, time of placing order, etc." in Table 1 above are only examples. In practical applications, the item sets used as user information and order information may be chosen according to the item sets of the order data retained in the daily operation of the platform, for example: the user device identification code is used as user information, and the commodity ordering date, commodity functions and the like are used as order information.
In one embodiment, after the intelligent terminal acquires historical order data, if the data size is too small, similarity calculation cannot be supported; if the data size is too large, the calculation load of the terminal increases, and accordingly, the following processing may be performed after step 101: comparing the historical order number in the historical order data with a preset sample number interval, wherein the sample number interval comprises an upper sample number threshold value and a lower sample number threshold value; when the historical order number is lower than a sample number lower threshold value, gradually increasing the time interval for acquiring the historical order data according to a preset step length; and when the historical order number is higher than the upper sample number threshold value, the time interval for acquiring the historical order data is gradually reduced according to a preset step length.
In implementation, when the intelligent terminal acquires the historical order data, the historical order data are acquired by dividing according to the preset time interval, so that the order sample number contained in the historical order data is in a strong correlation with the preset time interval, and the larger the span of the time interval is, the more the historical order data sample number is. Specifically, after the intelligent terminal acquires the historical order data, the historical order data may be compared with a preset sample number interval, where the sample number interval may include an upper sample number threshold and a lower sample number threshold, the upper sample number threshold may be set based on data storage and calculation capabilities of the intelligent terminal, and the lower sample number threshold may be specified by a technician to improve the accuracy of similarity calculation.
In addition, in order to improve the adjustment efficiency of the time interval, the intelligent terminal may set an adjustment step size, and since the absolute value difference of the data amount is large in different data systems, the adjustment step size may be set to be a fixed proportion of the length of the time interval, for example, to be 10%, 15%, and the like of the length of the time interval.
Therefore, when the amount of original historical order data acquired is too high or too low, the intelligent terminal can adjust it by adjusting the time interval so that the data volume is appropriate; this simple way of adjusting the acquisition amount also helps improve the calculation efficiency of user similarity.
Step 102, discretizing the historical order data based on a preset discretization rule to obtain discretization order data, wherein the discretization order data comprises user information and order types.
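The interval adjustment described above can be sketched as a loop that widens or narrows a look-back window by a fixed proportion until the order count falls within the sample number interval. The function names, the look-back-from-`end` convention, and the `max_rounds` guard are assumptions made for illustration; the 10% step follows the example proportion given in the text.

```python
from datetime import datetime, timedelta

def count_in_window(order_times, end, window):
    """Number of orders placed within `window` before `end` (inclusive)."""
    start = end - window
    return sum(1 for t in order_times if start <= t <= end)

def fit_time_window(order_times, end, window, lower, upper,
                    step_ratio=0.10, max_rounds=50):
    """Widen or narrow the window until the order count lies in [lower, upper]."""
    for _ in range(max_rounds):
        count = count_in_window(order_times, end, window)
        if count < lower:
            window = window * (1 + step_ratio)   # too few samples: extend interval
        elif count > upper:
            window = window * (1 - step_ratio)   # too many samples: shorten interval
        else:
            break
    return window, count_in_window(order_times, end, window)

# Illustrative data: one order per hour over the 100 hours before `end`.
end = datetime(2022, 3, 15)
order_times = [end - timedelta(hours=h) for h in range(100)]

grown_window, grown_count = fit_time_window(
    order_times, end, timedelta(hours=24), lower=40, upper=60)
shrunk_window, shrunk_count = fit_time_window(
    order_times, end, timedelta(hours=96), lower=40, upper=60)
```

The step here is proportional to the current window length, so successive adjustments grow or shrink with the window rather than by a fixed amount.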
In implementation, after the intelligent terminal obtains the historical order data, in order to facilitate analysis and processing, the complexity of the historical order data needs to be reduced, the historical order data can be subjected to discrete processing, and a specific algorithm of the discrete processing can be preset in the intelligent terminal. In the process of discrete processing of historical order data, the data under the same item set must follow the same discrete principle. Specifically, the historical order data after discrete processing can be as shown in the following table 2:
Serial number | User information (id) | Order type (item type, unit price, time of placing order, etc.)
1             | User A                | Food, 10-15, 11-12
2             | User B                | Food, 20-25, 15-16
3             | User A                | Cinema ticket, 25-30, 13-14
...           | ...                   | ...
N             | User N                | Cell phone, 5285-.
Table 2: discrete order data
In this way, the intelligent terminal can acquire discrete order data, and in the discrete order data, the user information still exists uniquely, but the difference is that a part of specific numerical values in the order information composed of more item sets are correspondingly replaced by a preset range value, so that the order type is formed. At this time, it can be found that, for example, "food, 13, 11: 30" and "food, 14, 11: 45" which exist independently in the historical order data are replaced with "food, 10-15, 11-12". Therefore, the intelligent terminal enables the historical order data with large data volume to be reduced into discrete order data with small data volume, and the efficiency of similarity calculation is improved.
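As a rough illustration of how an order row becomes an order type, the sketch below applies fixed-width ranges to the unit price and the order hour. The width of 5 for prices and 1 hour for times is inferred from the examples above; the function names are hypothetical.

```python
def equal_width_label(value, width):
    """Map a number to its fixed-width range label, e.g. 13 -> '10-15' for width 5."""
    low = (int(value) // width) * width
    return f"{low}-{low + width}"

def discretize_order(user_id, item_type, unit_price, order_time, price_width=5):
    """Turn one historical order into (user_id, order_type)."""
    hour = int(order_time.split(":")[0])
    order_type = (
        item_type,
        equal_width_label(unit_price, price_width),
        equal_width_label(hour, 1),
    )
    return user_id, order_type

rows = [
    ("A", "Food", 13, "11:30"),
    ("A", "Food", 14, "11:45"),
    ("A", "Cinema ticket", 28, "13:20"),
]
discrete = [discretize_order(*row) for row in rows]
```

The first two rows differ in price and time but collapse to the same order type, which is the reduction in data volume the paragraph describes.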
In one embodiment, different types of order data include different types of item sets, and different discrete processing rules may be applied to different item sets. Accordingly, the discrete processing rules in step 102 may include at least equal-frequency bucketing, equal-width (equal-distance) bucketing, cluster-based bucketing, and the like, and step 102 may further include the following processing: acquiring the order item set included in the historical order data, and selecting a discrete processing rule based on the order item set; and discretizing the historical order data based on the selected discrete processing rule to obtain discrete order data.
This contributes to an improvement in the effect of discrete processing, and thus contributes to an improvement in the accuracy of similarity calculation.
Step 103, extracting the order types in the discrete order data, screening and removing the repeated order types, and constructing and acquiring an order type dictionary by taking each screened order type as a field.
In implementation, after the intelligent terminal acquires the discrete order data, the discrete order data at this time includes a plurality of uniquely existing user information and a plurality of possibly non-uniquely existing order types. Therefore, the intelligent terminal can further screen the discrete order data, screen out repeated order types and remove redundant repeated data under the condition of keeping one of the repeated order types.
In this way, the intelligent terminal obtains a set of unique order types. With each order type used as a field, an order type dictionary covering all order types in the historical order data can be constructed. The format of the order type dictionary can be as follows: [(item type, item unit price, order time), No. 1; (item type, item unit price, order time), No. 2; (item type, item unit price, order time), No. 3; ...; (item type, item unit price, order time), No. N]. For example: [(food, 10-15, 11-12), No. 1; (food, 20-25, 15-16), No. 2; (cinema ticket, 25-30, 13-14), No. 3; ...; (cell phone, 5285-5290, 17-18), No. N].
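The deduplication and numbering described above can be sketched as below. The function name and the choice of a Python dict mapping each order type to its number are assumptions; the application only specifies that each unique order type becomes one numbered field.

```python
def build_order_type_dictionary(discrete_orders):
    """Assign numbers 1..N to the unique order types, in order of first appearance."""
    dictionary = {}
    for _user, order_type in discrete_orders:
        if order_type not in dictionary:          # repeated order types are skipped
            dictionary[order_type] = len(dictionary) + 1
    return dictionary

orders = [
    ("A", ("food", "10-15", "11-12")),
    ("B", ("food", "20-25", "15-16")),
    ("A", ("cinema ticket", "25-30", "13-14")),
    ("B", ("food", "10-15", "11-12")),            # duplicate of the first order type
]
print(build_order_type_dictionary(orders))
```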
In one embodiment, too few or too many fields in the order type dictionary may adversely affect the similarity calculation result, and accordingly, the following process may be further included after step 103: comparing the field number in the order type dictionary with a preset field number interval; when the number of fields in the order type dictionary does not belong to the field number interval, adjusting the discrete processing rule to match the number of fields with the field number interval.
In practice, the discrete processing rule defines the classification intervals of the historical order data, so it directly determines the number of fields in the order type dictionary. For example, when the rule is "unit price interval span of 5 and ordering time interval span of 1 hour", "food, 13, 11:30" and "food, 14, 11:45" are both replaced with "food, 10-15, 11-12". When the rule is adjusted to "unit price interval span of 5 and ordering time interval span of half an hour", they are replaced with "food, 10-15, 11:00-11:30" and "food, 10-15, 11:30-12:00" respectively, and the single field in the order type dictionary is accordingly split into two.
In this way, the intelligent terminal can adjust the data volume of the order type dictionary by adjusting the discrete processing rule: when the number of fields is too large, coarser intervals compress the dictionary; when it is too small, finer intervals expand it.
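One possible way to tune the rule against the preset field number interval is sketched below. It is a simplified illustration that adjusts only the unit price span; doubling or halving the span, the function names and the iteration cap are all assumptions, since the application does not state how the rule is adjusted.

```python
def count_fields(orders, price_width):
    """Number of distinct (item type, price interval) pairs under the given span."""
    return len({(item, int(price // price_width)) for item, price in orders})

def tune_price_width(orders, min_fields, max_fields, width=5, max_iter=20):
    """Coarsen or refine the price span until the field count fits the interval."""
    for _ in range(max_iter):
        n = count_fields(orders, width)
        if n > max_fields:
            width *= 2                    # too many fields: compress the dictionary
        elif n < min_fields and width > 1:
            width = max(1, width // 2)    # too few fields: expand the dictionary
        else:
            break
    return width

orders = [("food", price) for price in range(40)]   # unit prices 0..39
print(tune_price_width(orders, min_fields=2, max_fields=4))
```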
Step 104, aggregating the discrete order data based on the user information, and splicing the discrete order data corresponding to the same user information into a user order sequence.
In implementation, after the intelligent terminal obtains the discrete order data, the data belonging to the same user needs to be aggregated in order to analyze user similarity. The user information, which is unique to each user in the discrete order data, serves as the basis for this grouping. After all order types associated with the target user information are aggregated and arranged, the user order sequence of the target user is obtained. Specifically, the user order sequence may be as shown in Table 3 below. For convenience of presentation, each order type in the user order sequence is denoted in this embodiment by a lowercase letter, and each lowercase letter refers to one complete order type, for example "a" = "food, 10-15, 11-12":
| Serial number | User order sequence |
|---|---|
| 1 | User A: [a, b, c, d, e, ...] |
| 2 | User B: [f, g, h, i, j, ...] |
| 3 | User C: [k, l, m, n, o, ...] |
| ... | ... |
| N | User N: [p, q, r, s, t, ...] |

Table 3: user order sequence
In this way, the intelligent terminal can obtain a user order sequence describing the ordering behavior of each user.
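The aggregation in step 104 amounts to grouping order types by user. A minimal sketch, assuming the discrete order data is a list of (user, order type) pairs and using the single-letter shorthand of Table 3:

```python
from collections import defaultdict

def build_user_sequences(discrete_orders):
    """Group order types by user, preserving the order in which they occur."""
    sequences = defaultdict(list)
    for user, order_type in discrete_orders:
        sequences[user].append(order_type)
    return dict(sequences)

orders = [("A", "a"), ("B", "f"), ("A", "b"), ("A", "a"), ("B", "g")]
print(build_user_sequences(orders))   # {'A': ['a', 'b', 'a'], 'B': ['f', 'g']}
```

Repeated order types are kept in the sequence, because their frequency is what the user vector later records.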
Step 105, acquiring a user vector based on the order type dictionary and the user order sequence, wherein the dimension of the user vector is the number of order types in the order type dictionary, and the value of each dimension of the user vector is the number of times the corresponding order type in the order type dictionary appears in the user order sequence.
In implementation, to improve the efficiency of the similarity analysis, the intelligent terminal may vectorize the user order sequence using the order type dictionary. Specifically, the number of order types in the order type dictionary is used as the dimension of the vector, each position of the vector is assigned to one order type, and the number of times that order type appears in the user order sequence is used as the value of the corresponding dimension, as shown in Table 4:
| Serial number | User vector |
|---|---|
| 1 | User A: [1, 0, 0, 2, 1, ...] |
| 2 | User B: [0, 2, 1, 0, 1, ...] |
| 3 | User C: [0, 1, 2, 2, 1, ...] |
| ... | ... |
| N | User N: [1, 2, 1, 0, 1, ...] |

Table 4: user vector
Therefore, the intelligent terminal obtains a shared order type dictionary and per-user vectors that replace the user order sequences. This reduces the data volume of the user similarity analysis while still summarizing each user's ordering behavior, which ultimately improves the accuracy and efficiency of the user similarity analysis.
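The vectorization in step 105 is a frequency count over the dictionary fields. The sketch below assumes a dictionary that maps each order type to its number (1 to N) and again uses single letters as stand-ins for complete order types; both are illustrative choices.

```python
from collections import Counter

def user_vector(sequence, dictionary):
    """Return a vector whose i-th entry counts how often field No. i+1 occurs."""
    counts = Counter(sequence)
    vector = [0] * len(dictionary)
    for order_type, number in dictionary.items():
        vector[number - 1] = counts[order_type]   # Counter returns 0 for absent types
    return vector

dictionary = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
print(user_vector(["a", "d", "d", "e"], dictionary))   # [1, 0, 0, 2, 1]
```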
Step 106, acquiring the similarity between the target user and other users based on a preset similarity algorithm and the user vector.
In implementation, after the intelligent terminal obtains the user vectors, a similarity algorithm can be called to calculate the similarity between users. The similarity algorithm can be, for example, cosine similarity: the cosine of the angle between the target user vector of the target user and the user vector of each other user is calculated. The cosine of two vectors lies in [-1, 1], and because every component of a user vector is a non-negative count, the values obtained here lie in [0, 1]. The larger the cosine value, the higher the similarity.
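A minimal sketch of the cosine similarity named above, written with the standard library only. Returning 0.0 for a user with no orders (an all-zero vector) is an assumption made here to avoid division by zero; the application does not address that case.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0                      # assumed convention for empty order histories
    return dot / (norm_u * norm_v)

user_a = [1, 0, 0, 2, 1]
user_b = [0, 2, 1, 0, 1]
print(cosine_similarity(user_a, user_b))   # dot = 1, both norms are sqrt(6), so about 0.1667
```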
In one embodiment, step 106 may be followed by the following steps: respectively calculating the similarity between the target user vector of the target user and other user vectors of other users based on a preset similarity algorithm; based on the similarity, the other users are sorted in a descending order; selecting a preset number of other users from high to low according to the sequence, and marking the other users as similar users corresponding to the target user; and acquiring the preference commodity information of the similar users and pushing the preference commodity information to the target user.
Therefore, according to the similar users of the target user, the commodity preference of the similar users is pushed to the target user, the target user can obtain the potential preference commodities with different dimensions, different time and different space, and the commodity pushing effect can be improved.
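The ranking and selection of similar users can be sketched as follows, using the first five components of the Table 4 vectors as sample data. The function names and the dict-of-vectors layout are assumptions, and the final push of preferred commodity information is not modelled.

```python
import math

def cosine(u, v):
    """Cosine similarity; 0.0 is returned for an all-zero vector (assumed convention)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def top_similar_users(target, vectors, k):
    """Return the k users most similar to target as (user, similarity), best first."""
    scores = [
        (other, cosine(vectors[target], vec))
        for other, vec in vectors.items()
        if other != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)   # descending similarity
    return scores[:k]

vectors = {
    "A": [1, 0, 0, 2, 1],
    "B": [0, 2, 1, 0, 1],
    "C": [0, 1, 2, 2, 1],
    "N": [1, 2, 1, 0, 1],
}
print(top_similar_users("A", vectors, k=2))   # users C and N rank ahead of B
```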
In one embodiment, a data loss may occur during the process of acquiring data, and accordingly, the method may further include the following steps: and when data missing occurs in the historical order data, replacing the missing data based on a preset replacement rule.
In practice, the replacement rule may be set by a technician to perform unique replacement in the same type of data, for example, the missing "user id" may be replaced with "user phone number".
Therefore, missing data in historical order data can be replaced through the preset replacement rule, the fault tolerance rate in the similarity calculation process can be improved, and the stability in the data calculation process can be improved.
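A replacement rule of the kind mentioned above (a missing user id replaced by the user's phone number) might look like the sketch below. The record layout, the field names `user_id` and `phone`, and the `phone:` prefix are hypothetical; the application leaves the rule to the technician.

```python
def fill_missing_user_id(order):
    """Return a copy of the order whose missing user id is replaced by the phone number."""
    if order.get("user_id"):
        return order                               # nothing is missing
    fixed = dict(order)                            # leave the original record untouched
    phone = order.get("phone")
    if phone:
        fixed["user_id"] = f"phone:{phone}"        # prefix keeps the two id spaces distinct
    return fixed

print(fill_missing_user_id({"user_id": None, "phone": "13800000000", "item": "food"}))
```

The same phone number always yields the same substitute id, so orders from one user still aggregate into one user order sequence.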
Based on the same technical concept, an embodiment of the present application further provides a user similarity calculation system based on form data. With reference to fig. 2, the system includes:
the data acquisition module is used for acquiring historical order data in a preset time interval, and the historical order data at least comprises user information and order information;
the discrete processing module is used for carrying out discrete processing on the historical order data based on a preset discrete processing rule so as to obtain discrete order data, and the discrete order data comprises user information and order types;
the dictionary construction module is used for extracting order types in the discrete order data, screening and removing repeated order types, constructing and acquiring an order type dictionary by taking each screened order type as a field;
the sequence aggregation module is used for aggregating discrete order data based on the user information and splicing the discrete order data corresponding to the same user information into a user order sequence;
a vector generation module, configured to obtain a user vector based on the order type dictionary and the user order sequence, where a dimension of the user vector is the number of order types in the order type dictionary, and a numerical value of each dimension of the user vector is the number of times that each order type in the order type dictionary appears in the user order sequence;
and the similarity calculation module is used for acquiring the similarity between the target user and other users based on a preset similarity algorithm and the user vector.
An embodiment of the present application further discloses an intelligent terminal. Referring to fig. 3, the intelligent terminal includes a memory and a processor, and the memory stores a computer program that can be loaded by the processor to execute the above user similarity calculation method based on form data.
Based on the same technical concept, an embodiment of the present application further discloses a computer-readable storage medium storing a computer program which, when loaded and executed by a processor, implements the steps of the above user similarity calculation method based on form data.
The computer-readable storage medium includes, for example: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, only the division of the above functional modules is used for illustration, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the above described functions.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, an optical disk, or other media capable of storing program code.
The above embodiments are only used to describe the technical solutions of the present application in detail, but the above embodiments are only used to help understanding the method and the core idea of the present application, and should not be construed as limiting the present application. Those skilled in the art should also appreciate that various modifications and substitutions can be made without departing from the scope of the present disclosure.

Claims (10)

1. A user similarity calculation method based on form data is characterized in that: the method comprises the following steps:
acquiring historical order data in a preset time interval, wherein the historical order data at least comprises user information and order information;
discretizing the historical order data based on a preset discrete processing rule to obtain discrete order data, wherein the discrete order data comprises user information and order types;
extracting order types in the discrete order data, screening and removing repeated order types, and constructing and acquiring an order type dictionary by taking each screened order type as a field;
aggregating discrete order data based on the user information, and splicing the discrete order data corresponding to the same user information into a user order sequence;
obtaining a user vector based on the order type dictionary and a user order sequence, wherein the dimension of the user vector is the number of order types in the order type dictionary, and the numerical value of each dimension of the user vector is the frequency of each order type in the order type dictionary appearing in the user order sequence;
and acquiring the similarity between the target user and other users based on a preset similarity algorithm and the user vector.
2. The method for calculating user similarity based on form data according to claim 1, wherein: the acquiring historical order data in a preset time interval, wherein the historical order data at least comprises user information and order information, comprises:
comparing the historical order number in the historical order data with a preset sample number interval, wherein the sample number interval comprises an upper sample number threshold value and a lower sample number threshold value;
when the historical order number is lower than a sample number lower threshold value, gradually increasing the time interval for acquiring the historical order data according to a preset step length;
and when the historical order number is higher than the upper sample number threshold value, the time interval for acquiring the historical order data is gradually reduced according to a preset step length.
3. The method for calculating user similarity based on form data according to claim 1, wherein: the preset discrete processing rule at least comprises equal-frequency bucketing, equal-width (equal-distance) bucketing and cluster-based bucketing, and the discretizing the historical order data based on the preset discrete processing rule to obtain discrete order data, wherein the discrete order data comprises user information and order types, comprises:
acquiring an order item set included in the historical order data, and selecting a discrete processing rule based on the order item set;
and carrying out discretization processing on the historical order data based on the selected discretization processing rule to obtain discrete order data.
4. The method for calculating user similarity based on form data according to claim 1, wherein: after the extracting of the order types in the discrete order data, the screening and removing of the repeated order types, and the constructing and obtaining of the order type dictionary with each screened order type as a field, the method further comprises:
comparing the field number in the order type dictionary with a preset field number interval;
when the number of fields in the order type dictionary does not belong to the field number interval, adjusting the discrete processing rule to match the number of fields with the field number interval.
5. The method for calculating user similarity based on form data according to claim 1, wherein: the obtaining of the similarity between the target user and the other users based on the preset similarity algorithm and the user vector includes:
respectively calculating the similarity between the target user vector of the target user and other user vectors of other users based on a preset similarity algorithm;
and based on the similarity, performing descending order arrangement on other users.
6. The method for calculating user similarity based on form data according to claim 5, wherein: after the other users are sorted in descending order based on the similarity, the method includes:
selecting a preset number of other users from high to low according to the sequence, and marking the other users as similar users corresponding to the target user;
and acquiring the preference commodity information of the similar users and pushing the preference commodity information to the target user.
7. The method for calculating user similarity based on form data according to claim 1, further comprising:
and when data missing occurs in the historical order data, replacing the missing data based on a preset replacement rule.
8. A user similarity calculation system based on form data is characterized in that: the system comprises:
the data acquisition module is used for acquiring historical order data in a preset time interval, and the historical order data at least comprises user information and order information;
the discrete processing module is used for carrying out discrete processing on the historical order data based on a preset discrete processing rule so as to obtain discrete order data, and the discrete order data comprises user information and order types;
the dictionary construction module is used for extracting order types in the discrete order data, screening and removing repeated order types, constructing and acquiring an order type dictionary by taking each screened order type as a field;
the sequence aggregation module is used for aggregating discrete order data based on the user information and splicing the discrete order data corresponding to the same user information into a user order sequence;
the vector generation module is used for acquiring a user vector based on the order type dictionary and the user order sequence, the dimension of the user vector is the number of order types in the order type dictionary, and the numerical value of each dimension of the user vector is the frequency of each order type in the order type dictionary appearing in the user order sequence;
and the similarity calculation module is used for acquiring the similarity between the target user and other users based on a preset similarity algorithm and the user vector.
9. An intelligent terminal, comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the method for calculating user similarity based on form data according to any one of claims 1 to 7.
10. A computer-readable storage medium, having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method for calculating user similarity based on form data according to any one of claims 1 to 7.
CN202210254967.7A 2022-03-15 2022-03-15 User similarity calculation method and system based on form data Pending CN114723516A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210254967.7A CN114723516A (en) 2022-03-15 2022-03-15 User similarity calculation method and system based on form data

Publications (1)

Publication Number Publication Date
CN114723516A true CN114723516A (en) 2022-07-08

Family

ID=82238040

Country Status (1)

Country Link
CN (1) CN114723516A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402399A (en) * 2023-04-14 2023-07-07 北京智慧大王科技有限公司 Business data processing method and system based on artificial intelligence and electronic mall
CN116402399B (en) * 2023-04-14 2023-12-29 上海锦咏数据科技有限公司 Business data processing method and system based on artificial intelligence and electronic mall

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination