WO2021093472A1 - Data processing method, electronic device, and readable storage medium - Google Patents

Data processing method, electronic device, and readable storage medium

Info

Publication number
WO2021093472A1
WO2021093472A1 PCT/CN2020/117623 CN2020117623W WO2021093472A1 WO 2021093472 A1 WO2021093472 A1 WO 2021093472A1 CN 2020117623 W CN2020117623 W CN 2020117623W WO 2021093472 A1 WO2021093472 A1 WO 2021093472A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
bit vector
target
bit
vector
Prior art date
Application number
PCT/CN2020/117623
Other languages
English (en)
French (fr)
Inventor
李发明
李海翔
邹兆年
潘安群
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2021093472A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Definitions

  • This application relates to the field of computer technology, and specifically to a data processing method, electronic equipment, and readable storage medium.
  • the embodiments of the present application provide a data processing method, a data processing device, a computer-readable storage medium, and an electronic device, which can improve data processing efficiency at least to a certain extent.
  • A data processing method, executed by a computer device, includes:
  • In response to a query request of a first user, obtaining a bit vector table related to the operation data of a target user, wherein the query request includes identification information and time information, the bit vector table includes a user identification and the bit vector corresponding to the user identification, and the identification information corresponds to the target user; and obtaining a target bit vector from the bit vector table according to the identification information and the time information, and performing logical processing on the target bit vector to obtain target information.
  • A data processing device includes an obtaining module configured to obtain a bit vector table related to the operation data of a target user in response to a query request, wherein the query request includes identification information and time information, the bit vector table includes a user identification and the bit vector corresponding to the user identification, and the identification information corresponds to the target user. The device further obtains a target bit vector from the bit vector table according to the identification information and the time information, and performs logical processing on the target bit vector to obtain target information.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the data processing method as described in the above-mentioned embodiment is implemented.
  • An electronic device includes one or more processors and a storage device for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors are caused to execute the data processing method described in the foregoing embodiment.
  • In response to the query request of the first user, a bit vector table related to the operation data of the target user is obtained; the target bit vector is then obtained from the bit vector table according to the identification information and time information in the query request, and the target information is obtained by performing logical processing on the target bit vector.
  • the technical solution of the present application can improve data processing efficiency and reduce resource waste by converting user operation data into bit vectors.
  • FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
  • FIG. 2 schematically shows a flowchart of a data processing method according to an embodiment of the present application.
  • FIG. 3 schematically shows a flowchart of updating a bit vector table according to an embodiment of the present application.
  • FIG. 4 schematically shows a flowchart of counting the number of user operations based on a bit vector according to an embodiment of the present application.
  • FIG. 5 schematically shows a flowchart of judging user operation similarity based on a bit vector according to an embodiment of the present application.
  • FIG. 6 schematically shows a flowchart of determining the influence relationship between user operations based on a bit vector according to an embodiment of the present application.
  • FIG. 7 schematically shows a flowchart of judging the periodicity of user behavior based on a bit vector according to an embodiment of the present application.
  • FIG. 8 schematically shows a flowchart of abnormal operation judgment based on a bit vector according to an embodiment of the present application.
  • FIG. 9 schematically shows a flowchart of decompressing a compressed bit vector according to an embodiment of the present application.
  • FIG. 10 schematically shows a block diagram of a data processing device according to an embodiment of the present application.
  • FIG. 11 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • the embodiments of the present application provide a data processing method, a data processing device, a computer storage medium, and an electronic device that improve the current billing system.
  • Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
  • the system architecture 100 may include a terminal device 101, a network 102, and a server 103.
  • the network 102 is used to provide a medium of a communication link between the terminal device 101 and the server 103.
  • the network 102 may include various connection types, such as wired communication links, wireless communication links, and so on.
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to actual needs, there can be any number of terminal devices, networks and servers.
  • the server 103 may be a server cluster composed of multiple servers.
  • the terminal device 101 may be a terminal device with a display screen, such as a notebook, a desktop computer, and a smart phone.
  • the user performs various operations on the display screen of the terminal device 101, and the terminal device 101 can send instructions corresponding to the user operations to the server 103 via the network 102.
  • the server 103 can respond to the instruction after receiving the instruction, and can analyze the user operation, construct a user operation data table according to the user operation data, and generate a bit vector table associated with the user operation data table.
  • the server 103 monitors the user operation data table. When the user operation data in the user operation data table changes, the changed user operation data can be mapped to a bit vector to update the bit vector in the bit vector table.
  • When user operation behaviors are queried and mined, the target bit vector corresponding to the identification information can be obtained from the bit vector table according to the identification information and time information that serve as the query and mining conditions, and the target bit vector is logically processed to obtain the target information.
  • the identification information is identification information corresponding to the target user.
  • Querying and mining user operation behaviors includes, for example: querying whether a given target user operated on his or her account during the query time interval; querying whether other users had operation behaviors similar to those of the given target user during the query time interval; mining whether the operation behavior of a given target user is periodic in the query time interval; and determining, based on the periodic operation behavior of the given target user, whether the user performed abnormal operations in the query time interval.
  • The target information includes the number of times the target user performs one or more operations in the query time interval, other users whose operation behaviors are similar to those of the target user in the query time interval, and the user operations of the target user in the query time interval.
  • Each step can be executed by a computer device. The computer device can be any electronic device with processing and storage capabilities, such as a mobile phone, tablet computer, game device, multimedia playback device, electronic photo frame, wearable device, or PC (Personal Computer), and can also be a server.
  • the data processing method provided in the embodiments of the present application is generally executed by a server, and correspondingly, the data processing device is generally set in the server. However, in other embodiments of the present application, the data processing method provided in the embodiments of the present application may also be executed by a terminal device.
  • When the operation information of a user in a certain period of time is queried from the billing database, identification information such as the user's ID is usually used to retrieve records from the database by time, so as to obtain the user's operation information during that period.
  • When the operation information of one or more users in different time intervals is retrieved from the billing database and aggregated, the operation data of the same user at different times is likely to be distributed across different storage nodes, so retrieval still takes a long time even if an index has been built on the user data.
  • the embodiment of the present application first proposes a data processing method.
  • the implementation details of the technical solution of the embodiment of the present application are described in detail below:
  • FIG. 2 schematically shows a flowchart of a data processing method according to an embodiment of the present application.
  • the data processing method may be executed by a server, and the server may be the server 103 shown in FIG. 1.
  • the data processing method at least includes steps S210 to S220, which are described in detail as follows:
  • In step S210, in response to the query request, a bit vector table related to the operation data of the target user is obtained, wherein the query request includes identification information and time information, the bit vector table includes a user ID and the bit vector corresponding to the user ID, and the identification information corresponds to the target user.
  • the query request may be a request initiated by a query user for querying the operation data of the target user.
  • the target user is a user who performs a specific operation and generates operation data.
  • the target user may be one user or multiple users.
  • The server can obtain the operation data of the target user corresponding to the identification information according to the identification information and time information in the query request, and process it.
  • However, when the operation data is processed directly in this way, the data processing efficiency is very low, and the accuracy is also poor.
  • Instead, a user operation data table can be generated based on the target user's operation data, a bit vector table associated with the user operation data table can be generated from it, and the bit vectors in the bit vector table can then be processed to obtain target information.
  • the user operation data table is generated according to the operation data of the target user, and the bit vector table associated with the user operation data table is generated according to the user operation data table, which can be specifically performed in the following manner.
  • The user performs operations through the terminal device 101, for example by clicking the corresponding controls on an online shopping platform to browse products, place orders, pay, or share, or by clicking the corresponding controls on a chat interface to send, share, edit, or delete messages, and so on.
  • the terminal device 101 sends an instruction corresponding to the operation to the server 103.
  • the server 103 provides corresponding feedback after receiving the instruction, analyzes the user's operation behavior, and constructs a user operation data table according to the user's operation behavior.
  • The user operation data table can be a KV (key-value) data table. The key can be the user's ID, such as the user ID generated when the user registers, the user's identity number, or other information that is uniquely associated with the user or uniquely identifies the user; the value can be data generated after the user performs operations, such as the amount of expenditure, the number of purchases, the amount of recharge, the number of recharges, and so on.
  • A bit vector table can also be created.
  • The bit vector table is associated with the user operation data table; the data recorded in it is mapped from the user operation data, and it records the user identification and the bit vector corresponding to the user identification.
  • A bit vector is a binary sequence composed of consecutive 0s and 1s.
  • the length of the bit vector is the number of 0s and 1s in the bit vector. Each 0 or 1 in the bit vector is called a "bit". For example, 01011 is a bit vector of length 5.
  • Table 1 shows the structure of an example bit vector table:

    Table 1
    User ID    B[s, e)
    a          110011011100
    c          100101100011
    d          101110110101
    e          010001000111

  • B[s, e) represents the bit vector mapped from the user's operation data in the time interval [s, e), where s is the start time and e is the end time. a, c, d, and e are user IDs, and 110011011100, 100101100011, 101110110101, and 010001000111 are the bit vectors obtained by mapping the operation data of user a, user c, user d, and user e respectively; the length of each bit vector is 12.
  • the bit vector table can be created through SQL statements.
  • the name of the bit vector table can be determined according to the user operation data table, and is specified by bitsVector_table_name. For example, when the name of the user operation data table is buy_record_tab, the name of the bit vector table associated with it may be bitsVector_buy_record_tab. If not specified, the default name of the bit vector table associated with the user operation data table is "user operation data table name_bvt", for example, buy_record_tab_bvt.
  • The time granularity is the time attribute of each bit in the bit vector; it represents a period of time.
  • Each bit in the bit vector records the user's operation behavior in that period of time.
  • The time granularity can also be understood as the period of the bit vector.
  • the time granularity can be set to one day (d), one hour (h), one minute (m), and so on.
  • the time granularity in Table 1 is one hour.
  • the time granularity can also be set to other values, which are not specifically limited in the embodiment of the present application.
  • The purpose of setting the time granularity is to balance the storage space used by the bit vector against the query accuracy, according to the query requirements.
  • When precise query results are required, the time granularity can be set to a small value, such as one minute; when only coarse query results are required, the time granularity can be set to a larger value.
  • the user operation data can be mapped to a bit vector according to the user's operation time. For example, if the time granularity is set to one hour, the bit vector length of one day is 24. When a user performs an operation in the time interval of 2 o'clock-3 o'clock, then the bit corresponding to 2 o'clock-3 o'clock in the bit vector can be set to 1, indicating that the user has performed an operation within the time granularity of this hour.
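  • The mapping from an operation time to a bit position can be sketched as follows. This is a minimal illustration, not the implementation of the application: the function names `bit_index` and `map_operation`, the string representation of a bit vector, and the example date are all assumptions made here; only the one-hour granularity and the 24-bit day come from the text above.

```python
from datetime import datetime, timedelta


def bit_index(start: datetime, granularity: timedelta, op_time: datetime) -> int:
    """Return the 0-based position of the bit whose time slot contains op_time."""
    if op_time < start:
        raise ValueError("operation time precedes the start of the bit vector")
    # Dividing two timedelta values with // yields an integer slot number.
    return (op_time - start) // granularity


def map_operation(start: datetime, granularity: timedelta,
                  length: int, op_time: datetime) -> str:
    """Map one operation to a bit vector with a single 1 in its time slot."""
    idx = bit_index(start, granularity, op_time)
    if idx >= length:
        raise ValueError("operation time is beyond the range of the bit vector")
    bits = ["0"] * length
    bits[idx] = "1"
    return "".join(bits)


# Hypothetical example: one day at one-hour granularity, operation at 02:30.
day_start = datetime(2020, 1, 1)
vector = map_operation(day_start, timedelta(hours=1), 24,
                       datetime(2020, 1, 1, 2, 30))
print(vector)
```

  • In this sketch the leftmost character is the earliest time slot, which matches the way the example bit vectors in this document are read.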
  • the bit vector table can be modified through the ALTER TABLE statement.
  • the ALTER TABLE statement uses the same predicate to modify the bit vector table. If you only modify the period value, the previous data will be invalid; if you modify the name of the bit vector table, you can keep the old data and store the new data in the new bit vector table.
  • the length of the bit vector in the bit vector table can be set as required. When the time corresponding to the user operation exceeds the time supported by the bit vector in the bit vector table, a new bit vector table can be created to deal with it.
  • The bit vector table can also be used to record other operation information of the user by adding columns to the bit vector table, such as the type of operation, the transaction amount corresponding to the operation, the payment method corresponding to the operation, and so on.
  • For example, a bit vector can be added to record the user's operation type, where 1 represents consumption and 0 represents recharge; a bit vector can be added to record the user's transaction amount, where 1 means consumption exceeds 100 yuan and 0 means consumption does not exceed 100 yuan; and a bit vector can be added to record the user's payment method, where 1 represents non-cash payment and 0 represents cash payment.
  • The user operation data table can be monitored; when the user operation data in it changes, the bit vector in the bit vector table is triggered to update.
  • For example, a trigger can be set on the user operation data table. When a user performs an operation, the user operation data in the user operation data table changes, and the trigger maps the changed user operation data into a bit vector, according to which the bit vector table is updated.
  • Fig. 3 shows a schematic diagram of the process of updating the bit vector table.
  • the process of updating the bit vector table at least includes step S301-step S304, specifically:
  • In step S301, the target user ID corresponding to the changed user operation data is determined from the user operation data table.
  • Once the target user ID corresponding to the changed user operation data has been determined, the bit vector corresponding to the target user ID can be obtained from the bit vector table according to the target user ID and updated.
  • In step S302, the first bit vector corresponding to the target user ID is obtained from the bit vector table according to the target user ID, and the changed user operation data is mapped to obtain the second bit vector.
  • After the target user ID is obtained, it can be matched against the user IDs in the bit vector table. When a matching user ID is found, the corresponding bit vector is extracted; this bit vector is the first bit vector. At the same time, the changed user operation data can be mapped to obtain the second bit vector.
  • For example, suppose the target user ID is 12345 and the corresponding first bit vector is 010000000000. The length of the first bit vector is 12 and the time granularity is 1h, which means that, within the 12 hours covered, the target user performed an operation between 1h and 2h. If the target user performs an operation again during the hour from 4h to 5h, the second bit vector obtained by mapping is 000010000000.
  • In step S303, the first bit vector and the second bit vector are ORed to obtain the third bit vector.
  • The first bit vector and the second bit vector can be integrated to get the third bit vector.
  • The integration operation may specifically be an OR (|) operation on the first bit vector and the second bit vector. Taking the first bit vector and the second bit vector in step S302 as an example, the third bit vector is (010000000000) | (000010000000) = 010010000000, which indicates that the user corresponding to the target user ID performed operations at 1h-2h and at 4h-5h.
  • In step S304, the first bit vector is replaced with the third bit vector to update the bit vector in the bit vector table.
  • Replacing the first bit vector with the third bit vector completes the update of the bit vector table.
  • the NewBit function may be used to generate a new bit vector value.
  • the NewBit function can receive four parameters, namely: the time to start counting user operations (that is, the time when the bit vector table is generated), the time granularity specified by the user in advance, the time of the user update operation, and the user ID of the operation that occurred.
  • The NewBit function first reads the user's bit vector value from the bit vector table, then performs an OR operation with the bit vector mapped from the user's update operation to obtain a new bit vector, and finally writes the new bit vector back to the bit vector table.
  • the NewBit function can be a trigger function, a user-defined function, or a system function of the database engine, which is not specifically limited in the embodiment of the application, and a suitable function can be selected to update the bit vector table according to actual needs.
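  • The read, OR, and write-back sequence of steps S301 to S304 can be sketched as follows. This is an illustration under assumptions, not the NewBit function itself: the in-memory dictionary standing in for the bit vector table and the names `or_vectors` and `update_bit_vector_table` are hypothetical; the user ID and bit vectors are the ones from the example in step S302.

```python
def or_vectors(first: str, second: str) -> str:
    """Bitwise OR of two equal-length bit vectors given as '0'/'1' strings."""
    if len(first) != len(second):
        raise ValueError("bit vectors must have the same length")
    merged = int(first, 2) | int(second, 2)
    # Re-pad with leading zeros so the vector keeps its original length.
    return format(merged, f"0{len(first)}b")


def update_bit_vector_table(table: dict, user_id: str, second: str) -> None:
    """Read the stored (first) vector, OR it with the new one, write it back."""
    first = table.get(user_id, "0" * len(second))
    table[user_id] = or_vectors(first, second)


# Example values from step S302: operation at 1h-2h, then a new one at 4h-5h.
table = {"12345": "010000000000"}
update_bit_vector_table(table, "12345", "000010000000")
print(table["12345"])  # 010010000000
```

  • Because OR never clears a bit, repeating the same update leaves the stored vector unchanged, which is convenient if a trigger fires more than once for the same operation.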
  • In step S220, a target bit vector is obtained from the bit vector table according to the identification information and the time information, and logical processing is performed on the target bit vector to obtain target information.
  • the server may obtain the target bit vector from the bit vector table according to the identification information and the time information, and then obtain the target information according to the target bit vector.
  • The target information includes the number of times the user performs one or more operations in the query time interval, whether the operations of other users in the query time interval are similar to those of the target user, the mutual influence relationship between user operation behaviors in the query time interval, the periodicity of user operation behaviors in the query time interval, and whether users with periodic operation behaviors have abnormal operations in the query time interval.
  • The target information is obtained from the target bit vector by performing logical processing on it; the logical processing includes the basic operations and the shift operations of the bit vector.
  • The basic operations of bit vectors include AND, OR, NOT, and XOR, each represented by an operator symbol, such as & for AND and | for OR.
  • different operations may be used to process the bit vector to obtain different target information.
  • The AND, OR, and XOR operations are binary operations.
  • They return 1 in the following cases respectively: both bits are 1; at least one bit is 1; exactly one bit is 1. In all other cases they return 0. NOT is a unary operation: if the operand bit is 0, it returns 1; otherwise it returns 0.
  • Another basic operation of a bit vector is shifting.
  • >> and << are used to indicate shift operations. For example, applying >> 2 to the bit vector 110011011100 of user a in Table 1 moves it two bits to the right and fills the leftmost two bits with 0, and the resulting bit vector is 001100110111. It should be emphasized that the basic operations and shift operations of bit vectors are well supported at the lowest level of the computer and can be completed very quickly.
  • In addition, because the user operation data is mapped into bit vectors and the processing is performed on the bit vectors, user operation data is not directly exposed to the outside world, which improves the security of user data and avoids leaking user privacy.
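  • The basic operations and the right shift described above can be tried out on the Table 1 vectors of users a and c. The sketch below is only illustrative: it assumes Python integers as the machine-level representation and uses Python's own operator symbols (&, |, ^, ~, >>), which are not necessarily the symbols used in the application; the helper names `to_int` and `to_bits` are hypothetical.

```python
WIDTH = 12                      # length of the example bit vectors
MASK = (1 << WIDTH) - 1         # keeps results within WIDTH bits


def to_int(bits: str) -> int:
    """Parse a '0'/'1' string into an integer."""
    return int(bits, 2)


def to_bits(value: int) -> str:
    """Render an integer as a WIDTH-bit string, discarding higher bits."""
    return format(value & MASK, f"0{WIDTH}b")


a = to_int("110011011100")      # user a in Table 1
c = to_int("100101100011")      # user c in Table 1

and_bits = to_bits(a & c)       # 1 only where both bits are 1
or_bits = to_bits(a | c)        # 1 where at least one bit is 1
xor_bits = to_bits(a ^ c)       # 1 where exactly one bit is 1
not_bits = to_bits(~a)          # every bit of a flipped
shifted = to_bits(a >> 2)       # two bits right, zeros filled on the left

print(and_bits, or_bits, xor_bits, not_bits, shifted)
```

  • The mask is needed because Python integers are unbounded, so ~a would otherwise be a negative number rather than a 12-bit vector.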
  • A key operation on a bit vector is counting, which is represented by Count.
  • Applied to the bit vector B[s, e) of user a, the Count function returns the number of 1s in the bit vector corresponding to user a in the time interval [s, e).
  • The Count function can be completed quickly with shift operations.
  • The specific algorithm steps are: take the bit vector B[s, e) as the input and first obtain its length and its first bit; AND the first bit with 1 and use the result as the initial statistical value; then shift B[s, e) to the right to obtain its second bit, AND the second bit with 1, and add the result to the statistical value. Repeat these steps until the last bit of B[s, e) has been ANDed with 1 and added to the statistical value; the final statistical value is the return value of the Count() function.
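  • A shift-and-AND counting loop of the kind described above can be sketched as follows. The function name `count_ones` and the string input are assumptions; the sketch reads the lowest-order bit on each pass, which visits the bits in the opposite order from the description but produces the same total.

```python
def count_ones(bits: str) -> int:
    """Count the 1s in a bit vector by repeatedly ANDing with 1 and shifting."""
    value = int(bits, 2)
    total = 0
    for _ in range(len(bits)):   # one pass per bit, as given by the length
        total += value & 1       # AND the current lowest bit with 1
        value >>= 1              # shift right to expose the next bit
    return total


print(count_ones("110011011100"))  # user a in Table 1 over [0, 12)
```

  • In production code a built-in population count would normally be preferred, for example `bits.count("1")` on the string form used here.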
  • FIG. 4 shows a schematic diagram of the flow of counting the number of user operations based on a bit vector. As shown in FIG. 4, the flow at least includes steps S401-S402, specifically:
  • In step S401, a first user identification and a first time interval are obtained, and a first target bit vector corresponding to the first user identification is obtained from the bit vector table according to the first user identification and the first time interval.
  • Multiple user IDs and the bit vector corresponding to each user ID can be recorded in the bit vector table.
  • The first user ID can be matched against the user IDs in the bit vector table. Once the matching entry is found, the bit vector corresponding to the first user ID is determined according to the first time interval and the time granularity of the bit vector table, and it is marked as the first target bit vector.
  • the step S401 is executed in response to a query request of the query user, and the query request includes identification information and time information.
  • The first user identification is obtained according to the identification information; for example, it is included in the identification information, or the server may use the identification information to obtain it from a storage device.
  • the time information includes a first time interval.
  • the first user identifier corresponds to the target user.
  • In step S402, the first target bit vector is counted to obtain the number of operations performed by the user corresponding to the first user identifier in the first time interval.
  • The Count() function can be used to count the number of 1s in the first target bit vector, which gives the number of operations performed in the first time interval by the user corresponding to the first user ID. For example, taking user a in Table 1, to obtain the operation status of user a in the time interval [0, 5), first obtain the first target bit vector 11001 of user a in [0, 5), and then count it with the Count() function to get Count(11001) = 3, which shows that user a performed at least 3 operations in the time interval [0, 5); finally, the statistical result 3 is returned.
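  • Steps S401 and S402 can be sketched together on the Table 1 data. The dictionary standing in for the bit vector table and the function names `target_bit_vector` and `count_operations` are hypothetical, and the time interval is expressed directly as bit positions, which assumes that the interval has already been converted using the table's time granularity.

```python
# Table 1 from this document, held in memory for illustration only.
BIT_VECTOR_TABLE = {
    "a": "110011011100",
    "c": "100101100011",
    "d": "101110110101",
    "e": "010001000111",
}


def target_bit_vector(table: dict, user_id: str, start: int, end: int) -> str:
    """Step S401: select the bits of one user for the half-open interval [start, end)."""
    vector = table[user_id]
    if not 0 <= start <= end <= len(vector):
        raise ValueError("time interval is outside the bit vector")
    return vector[start:end]


def count_operations(table: dict, user_id: str, start: int, end: int) -> int:
    """Step S402: count the 1s in the target bit vector."""
    return target_bit_vector(table, user_id, start, end).count("1")


print(target_bit_vector(BIT_VECTOR_TABLE, "a", 0, 5))  # 11001
print(count_operations(BIT_VECTOR_TABLE, "a", 0, 5))   # 3
```

  • A 1 only records that at least one operation fell into a time slot, so the count is a lower bound on the number of operations, as the text notes.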
  • Fig. 5 shows a schematic diagram of a process for judging user operation similarity based on a bit vector. As shown in Fig. 5, the process at least includes steps S501-S504, specifically:
  • In step S501, a second user identification, a user identification to be compared, and a second time interval are acquired.
  • The server can first obtain the user identification of the target user, the user identifications of other users, and the time interval to be queried, and then determine, according to the operation bit vector of the target user and the operation bit vectors of the other users in that time interval, whether the operations of the target user are similar to the operations of the other users.
  • The user identification of the target user is marked as the second user identification, the user identifications of the other users are marked as the user identification to be compared, and the time interval to be queried is marked as the second time interval.
  • The user ID to be compared can be one user ID or multiple user IDs.
  • Step S501 is executed, for example, in response to a query request of the querying user, and the query request includes identification information and time information.
  • The second user identification and the user identification to be compared are obtained according to the identification information; for example, they are included in the identification information, or the server may use the identification information to obtain them from a storage device.
  • the time information includes a second time interval.
  • the second user ID and the user ID to be compared correspond to the target user.
  • In step S502, according to the second user ID, the user ID to be compared, and the second time interval, a second target bit vector corresponding to the second user ID and a target bit vector to be compared corresponding to the user ID to be compared are obtained from the bit vector table.
  • The second user ID and the user ID to be compared may each be matched against the user IDs in the bit vector table to obtain, for the second time interval, the second target bit vector corresponding to the second user ID and the target bit vector to be compared corresponding to the user ID to be compared.
  • step S503 an XOR operation is performed on the second target bit vector and the target bit vector to be compared, and the result of the XOR operation is not operated to obtain the comparison target bit vector.
  • the second target bit vector and each target bit vector to be compared may be operated on to obtain the similarity between the two Sex.
  • the second target bit vector can be XORed with the target bit vector to be compared, and the result of the XOR operation can be negated to obtain the comparison target bit vector; then the target bit vector can be compared to obtain statistics The similarity between the two.
  • For example, the server first obtains the bit vectors of users c, a, and d in the time interval [0, 12). The bit vector of user c is then XORed with the bit vectors of users a and d respectively, and a NOT operation is performed on each XOR result to obtain the two comparison target bit vectors.
  • In step S504, the comparison target bit vector is counted to obtain the operation similarity, in the second time interval, between the user corresponding to the second user ID and the user corresponding to the user ID to be compared.
  • After the comparison target bit vector is obtained, it can be counted with the Count() function to obtain this operation similarity. From the counts it can be determined that, in the time interval [0, 12), the operations of user d and user c are more similar than the operations of user a and user c.
  • The similarity measure can also be refined by calculating the proportion of similar operations among all operations in a given time interval. For example, in the second time interval, the ratio of similar operations between user c and user a is 3/12, and the ratio between user c and user d is 6/12, so the operating behaviors of user c and user d are more similar.
  • Operation similarity can be used to cluster users: similar users are placed in the same category, and dissimilar users in different categories.
  • the result of the clustering can be used as a data preprocessing process to speed up the processing speed of other analysis; it can also be provided to the user portrait as a type of behavior characteristic of the user to help better understand the user.
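  • The XOR, NOT, and count steps of S503-S504 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the function names `xnor_similarity` and `similarity_ratio` and the sample bit strings are hypothetical, and bit vectors are written as strings of '0' and '1' for readability.

```python
def xnor_similarity(a: str, b: str) -> int:
    """Count the time units in which two equal-length bit vectors agree.

    The vectors are XORed, the result is negated (NOT), and the 1 bits of
    the negated result are counted, mirroring the Count() step in the text.
    """
    if len(a) != len(b):
        raise ValueError("bit vectors must cover the same time interval")
    width = len(a)
    mask = (1 << width) - 1          # keep only `width` bits after the NOT
    xnor = ~(int(a, 2) ^ int(b, 2)) & mask
    return bin(xnor).count("1")


def similarity_ratio(a: str, b: str) -> float:
    """Proportion of time units in which the two users behaved alike."""
    return xnor_similarity(a, b) / len(a)


# Hypothetical 4-hour vectors (not taken from the source document).
print(xnor_similarity("1100", "1010"))   # agreement in the first and last hour
print(similarity_ratio("1100", "1010"))
```

    Masking after the NOT matters because Python integers have unbounded width; without the mask the negated value would be negative rather than a fixed-width bit vector.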
  • Fig. 6 shows a schematic diagram of the process of judging the influence relationship between user operations based on the bit vector. As shown in Fig. 6, the process at least includes steps S601-S604, specifically:
  • step S601 the third user identification, the fourth user identification, the similarity threshold, and the third time interval are acquired.
  • Step S601 is executed, for example, in response to a query request from the querying user; the query request includes identification information and time information.
  • The third user identification and the fourth user identification are obtained according to the identification information: for example, they may be included in the identification information, or a server may use the identification information to obtain them from a storage device.
  • the time information includes a third time interval.
  • the third user identification and the fourth user identification correspond to the target user.
  • In step S602, according to the third user ID, the fourth user ID, and the third time interval, a third target bit vector corresponding to the third user ID and a fourth target bit vector corresponding to the fourth user ID are obtained from the bit vector table.
  • The third user ID and the fourth user ID are each matched against the user IDs in the bit vector table to obtain, for the third time interval, the third target bit vector corresponding to the third user ID and the fourth target bit vector corresponding to the fourth user ID.
  • In step S603, a shift operation is performed on the fourth target bit vector to obtain the shift target bit vector, and a similarity judgment is performed on the shift target bit vector and the third target bit vector to obtain the similarity.
  • The influence of one user's operation on another user's operation may be synchronous or delayed. Therefore, when determining the influence relationship between user operations, a shift operation can be performed on the fourth target bit vector, and the similarity between the third target bit vector and the shifted fourth target bit vector can then be determined. Specifically, the fourth target bit vector is first shifted to the left by the shift unit to obtain the shift target bit vector; the third target bit vector and the shift target bit vector are then XORed, the XOR result is negated to obtain a similarity target bit vector, and the similarity target bit vector is counted to obtain the similarity.
  • the shift unit is the number of bits that change each time the shift operation is performed, for example, it can be 1, 2, etc., as long as it is any integer less than the length of the bit vector.
  • a shift threshold can be set. When the shift operation reaches the shift threshold, the shift operation is stopped, and it is determined that the operation of the user corresponding to the third user identifier has no effect on the operation of the user corresponding to the fourth user identifier.
  • step S604 the similarity is compared with the similarity threshold, and according to the comparison result, it is determined whether the operation of the user corresponding to the third user identifier affects the operation of the user corresponding to the fourth user identifier in the third time interval.
  • After the similarity is obtained, it can be compared with the similarity threshold, and whether the operation of the user corresponding to the third user identifier affects the operation of the user corresponding to the fourth user identifier can be determined according to the comparison result.
  • When the similarity is greater than or equal to the similarity threshold, it is determined that, in the third time interval, the operation of the user corresponding to the third user identifier affects the operation of the user corresponding to the fourth user identifier. When the similarity is less than the similarity threshold, the fourth target bit vector is shifted again, the similarity between the shifted bit vector and the third target bit vector is calculated, and the similarity is again compared with the similarity threshold. If the similarity is still less than the similarity threshold, these steps are repeated until the number of bits by which the fourth target bit vector has been shifted to the left reaches the shift threshold.
  • When an influence is found, the number of shift operations is returned, that is, the time delay with which one user's operation affects the other user; when there is no mutual influence between the users' operations, the shift threshold is returned.
  • steps S603 and S604 are implemented by shifting the fourth target bit vector, and may also be implemented by shifting the third target bit vector instead of the fourth target bit vector.
  • For example, the third target bit vector of user a's operations in the time interval [3, 8) is 001101.
  • The fourth target bit vector of user d's operations in the time interval [4, 9) is 110110.
  • Performing a shift operation on the fourth target bit vector and calculating the similarity between the shift target bit vector and the third target bit vector shows that the operation behavior of user a may affect user d, with a delay of about 1 h. As with the similarity judgment, an influence relationship found in this way is not strictly established.
  • The influence relationship may be caused by some external factor, such as the launch of a new product; it may also be entirely accidental, that is, the two users show similar consumption behavior without any external factor. If this kind of accidental influence between two users occurs often in the billing data, that is, after one user purchases certain goods the other user often buys the same goods even though there is no connection between the two users, the influence relationship can still be put to use: when one user is found to be operating on the account, it can be predicted that the other user is also likely to operate on the account, which improves the understanding of the corresponding user.
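  • The shift-and-compare loop of steps S603-S604 can be sketched as follows. This is an illustration under assumptions, not the method of the application verbatim: the names `xnor_count` and `influence_delay` are hypothetical, the sample vectors are invented, and the sketch returns `None` when no influence is found, whereas the text above returns the shift threshold in that case.

```python
from typing import Optional


def xnor_count(a: int, b: int, width: int) -> int:
    """Number of positions (out of `width`) where the two bit vectors agree."""
    mask = (1 << width) - 1
    return bin(~(a ^ b) & mask).count("1")


def influence_delay(third: str, fourth: str, sim_threshold: int,
                    shift_threshold: int, shift_unit: int = 1) -> Optional[int]:
    """Return the left shift (in bits) at which `fourth` matches `third`.

    The fourth vector is shifted left by `shift_unit` bits at a time; after
    each shift the XNOR count is compared with `sim_threshold`. The search
    stops once the total shift would exceed `shift_threshold`.
    """
    if len(third) != len(fourth):
        raise ValueError("bit vectors must have the same length")
    width = len(third)
    mask = (1 << width) - 1
    a, d = int(third, 2), int(fourth, 2)
    shifted = 0
    while shifted + shift_unit <= shift_threshold:
        shifted += shift_unit
        candidate = (d << shifted) & mask   # bits shifted out on the left are dropped
        if xnor_count(a, candidate, width) >= sim_threshold:
            return shifted                  # delay, in time units
    return None                             # no influence found within the threshold


# Hypothetical vectors: `fourth` repeats `third` one time unit later.
print(influence_delay("011010", "001101", sim_threshold=6, shift_threshold=3))
```

    A left shift discards the earliest bits and pads the end with 0s, so the padded positions also take part in the comparison; a stricter variant could compare only the overlapping bits.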
  • Fig. 7 shows a schematic flow chart of the periodic judgment of user behavior based on a bit vector. As shown in Fig. 7, the flow at least includes steps S701-S705, specifically:
  • step S701 the fifth user identifier, the first operation mode bit vector, the first operation mode period, and the fourth time interval are acquired.
  • Step S701 is executed, for example, in response to a query request from the querying user; the query request includes identification information and time information.
  • The fifth user identification is obtained according to the identification information: for example, it may be included in the identification information, or a server may use the identification information to obtain it from a storage device.
  • The time information includes a fourth time interval.
  • The fifth user ID corresponds to the target user.
  • The first operation mode bit vector is obtained so that it can be determined whether the user's operations in the fourth time interval are repetitions of that operation mode.
  • The first operation mode period is obtained so that it can be determined whether the periodicity of the user's operations matches the preset operation mode period.
  • step S702 a fifth target bit vector corresponding to the fifth user ID is obtained from the bit vector table according to the fifth user ID and the fourth time interval.
  • The fifth user ID can be matched against the user IDs in the bit vector table to obtain the fifth target bit vector of the corresponding user in the fourth time interval, and this fifth target bit vector is used as the basis for the periodicity analysis.
  • In step S703, according to the number of bits of the first operation mode bit vector, the fifth target bit vector is converted into a plurality of first sub-bit vectors arranged in sequence, and a similarity judgment is performed between the first operation mode bit vector and each first sub-bit vector to obtain the sub-similarities.
  • For a periodicity judgment to be possible, the fifth target bit vector must be able to contain multiple occurrences of the first operation mode bit vector; therefore, the length of the fifth target bit vector must be greater than the length of the first operation mode bit vector.
  • The fifth target bit vector can be converted into a plurality of first sub-bit vectors arranged in sequence according to the length of the first operation mode bit vector, and a similarity judgment is then performed between the first operation mode bit vector and each first sub-bit vector to obtain the sub-similarity corresponding to each first sub-bit vector.
  • For example, the bit vector 110011011100 corresponding to the operations of user a in the time interval [0, 12) is converted, by sliding a 3-bit window one bit at a time, into the first sub-bit vectors 110, 100, 001, 011, 110, 101, 011, 111, 110, and 100.
  • A similarity judgment between the first operation mode bit vector and each first sub-bit vector gives the sub-similarities, in order: 3, 2, 0, 1, 3, 1, 1, 2, 3, 2.
  • step S704 a sequence bit vector is determined according to the ordering and sub-similarity of each first sub-bit vector, and the repetition period of the sequence bit vector is obtained.
  • a sequence of bit vectors can be determined according to the sub-similarity corresponding to each first sub-bit vector.
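  • When the sub-similarity is 3, the first sub-bit vector is identical to the first operation mode bit vector, and the corresponding position in the sequence bit vector is 1; when the sub-similarity is 0, 1, or 2, the first sub-bit vector differs from the first operation mode bit vector, and the corresponding position in the sequence bit vector is 0.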
  • The sequence bit vector formed from the sub-similarities is therefore 1000100010.
  • The first eight bits of the sequence bit vector are 10001000, a repetition of 1000, indicating that the first operation mode repeats with a period of 4 hours.
  • In step S705, when the repetition period is the same as the first operation mode period, it is determined that the operation behavior of the user corresponding to the fifth user identifier is periodic in the fourth time interval.
  • The operations of user a in the time interval [0, 12) repeat with a period of 4 hours, and the given first operation mode period is also 4, indicating that the operations of user a in the time interval [0, 12) are periodic. Further, it can be determined that the periodic start time is the 0th hour, and the end time is the 10th hour.
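  • Steps S703-S705 can be sketched as follows. This is a minimal illustration under assumptions: the function names are hypothetical, and the pattern 110 is an assumption chosen because it reproduces the sub-similarities listed above for the bit vector 110011011100. The repetition period is taken here as the constant gap between consecutive 1s in the sequence bit vector, which is one simple way to read "repetition period".

```python
from typing import List, Optional


def sub_similarities(target: str, pattern: str) -> List[int]:
    """Slide a window of len(pattern) over `target`, one bit at a time,
    and count the matching bits between each window and the pattern."""
    k = len(pattern)
    return [sum(x == y for x, y in zip(target[i:i + k], pattern))
            for i in range(len(target) - k + 1)]


def sequence_bit_vector(target: str, pattern: str) -> str:
    """1 where a window equals the pattern exactly, 0 elsewhere."""
    k = len(pattern)
    return "".join("1" if s == k else "0"
                   for s in sub_similarities(target, pattern))


def repetition_period(sequence: str) -> Optional[int]:
    """Constant distance between consecutive 1s, or None if there is none."""
    ones = [i for i, bit in enumerate(sequence) if bit == "1"]
    if len(ones) < 2:
        return None
    gaps = {b - a for a, b in zip(ones, ones[1:])}
    return gaps.pop() if len(gaps) == 1 else None


target = "110011011100"      # user a, time interval [0, 12), from the text
pattern = "110"              # assumed first operation mode bit vector
seq = sequence_bit_vector(target, pattern)
print(sub_similarities(target, pattern))
print(seq, repetition_period(seq))
is_periodic = repetition_period(seq) == 4   # compare with the given period
```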
  • An abnormal operation is a user operation that differs from the user's usual operations. Detecting abnormalities quickly helps the system find the abnormal operation and determine whether it was performed by the user; if it was not, timely measures can be taken to reduce the user's losses.
  • the abnormality judgment in the embodiment of the present application is based on periodic judgment, that is, the user's previous operations are periodic, and when an operation that does not meet the periodic characteristics occurs, it is defined as an abnormal operation.
  • Fig. 8 shows a schematic diagram of the process of judging abnormal operations based on bit vectors. As shown in Fig. 8, the process at least includes steps S801-S805, specifically:
  • In step S801, the sixth user identifier, the second operation mode bit vector, the abnormality threshold, and the fifth time interval are acquired.
  • Step S801 is executed, for example, in response to a query request from the querying user; the query request includes identification information and time information.
  • The sixth user identification is obtained according to the identification information: for example, it may be included in the identification information, or a server may use the identification information to obtain it from a storage device.
  • the time information includes a fifth time interval, and the sixth user identifier corresponds to the target user.
  • the second operation mode bit vector and abnormal threshold may be preset.
  • In this step, the user identification is recorded as the sixth user identification, the operation mode bit vector as the second operation mode bit vector, and the time interval as the fifth time interval; the abnormality threshold used for judging abnormal operations is also obtained.
  • In step S802, a sixth target bit vector corresponding to the sixth user ID is obtained from the bit vector table according to the sixth user ID and the fifth time interval, wherein the operation of the user corresponding to the sixth user ID is periodic.
  • The sixth user ID is matched against the user IDs in the bit vector table to obtain the sixth target bit vector corresponding to the sixth user ID in the fifth time interval.
  • The sixth target bit vector can first be processed according to steps S703-S704 shown in FIG. 7 to determine whether the operations of the user corresponding to the sixth user identifier are periodic in the fifth time interval. Only when the user's operations are periodic can it be judged whether they include an abnormal operation; for non-periodic user operations, it is difficult to determine whether an abnormal operation exists.
  • In step S803, the sixth target bit vector is divided into a plurality of second sub-bit vectors according to the number of bits of the second operation mode bit vector.
  • To determine which user operations do not conform to the periodicity, and hence that the user operation is abnormal, the sixth target bit vector needs to be divided into a plurality of second sub-bit vectors according to the number of bits in the second operation mode bit vector. Take the bit vector 110011011100 corresponding to the operations of user a in the time interval [0, 12) as an example. Given that the second operation mode bit vector is 1100, the bit vector of user a can be divided into the second sub-bit vectors 1100, 1101, and 1100.
  • In step S804, the data of each bit in the second operation mode bit vector is compared with the data of the corresponding bit of each second sub-bit vector to obtain an abnormality count.
  • Take a bit vector corresponding to the operations of user e in the time interval [0, 12) as an example, with the abnormality threshold set to 2.
  • The data of each bit in the second operation mode bit vector is compared with the data of the corresponding bit of the second sub-bit vector. The first bit of the second sub-bit vector is 1 and the first bit of the second operation mode bit vector is 0; the two differ, so the abnormality count is set to 1. The second bit of the second sub-bit vector is 1 and the second bit of the second operation mode bit vector is 1; the two are the same, so the abnormality count remains 1. The third bit of the second sub-bit vector is 1 and the third bit of the second operation mode bit vector is 0; the two differ, so the abnormality count is set to 2. The fourth bit of the second sub-bit vector is 1 and the fourth bit of the second operation mode bit vector is 0; the two differ, so the abnormality count is set to 3.
  • In step S805, when the abnormality count is greater than or equal to the abnormality threshold, it is determined that the operation behavior of the user corresponding to the sixth user identifier is abnormal in the fifth time interval.
  • In this example, the abnormality count is 3 and the abnormality threshold is 2.
  • The abnormality count is greater than the abnormality threshold, indicating that the operation behavior of the user corresponding to the sixth user identifier is abnormal in the fifth time interval.
  • When an abnormality is found, a warning can be issued to the system so that it takes corresponding measures, such as freezing the user's account, to avoid damage to the user's property.
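  • The splitting and bit-by-bit comparison of steps S803-S805 can be sketched as follows. This is an illustration under assumptions: the names `mismatch_counts` and `abnormal_chunks` are hypothetical, and comparing the threshold against the count of each second sub-bit vector separately is an assumption, since the text does not state whether the abnormality count is kept per sub-bit vector or accumulated over all of them.

```python
from typing import List


def mismatch_counts(target: str, pattern: str) -> List[int]:
    """Split `target` into consecutive chunks of len(pattern) bits and count,
    for each chunk, the bits that differ from the operation mode pattern."""
    k = len(pattern)
    if k == 0:
        raise ValueError("pattern must not be empty")
    chunks = [target[i:i + k] for i in range(0, len(target), k)]
    # zip() stops at the shorter input, so a short final chunk is compared
    # only on the bits it actually has.
    return [sum(x != y for x, y in zip(chunk, pattern)) for chunk in chunks]


def abnormal_chunks(target: str, pattern: str, threshold: int) -> List[int]:
    """Indices of the chunks whose abnormality count reaches the threshold."""
    return [i for i, count in enumerate(mismatch_counts(target, pattern))
            if count >= threshold]


# User a from the text: chunks 1100, 1101, 1100 against the pattern 1100.
print(mismatch_counts("110011011100", "1100"))
# The bit-by-bit walk-through above: sub-bit vector 1111 against 0100.
print(mismatch_counts("1111", "0100"), abnormal_chunks("1111", "0100", 2))
```

    Returning chunk indices rather than a single flag keeps track of which period contained the abnormal operation, which is what the division into sub-bit vectors is meant to reveal.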
  • the data processing method disclosed in the embodiment of the present application can be used in multiple fields, such as the medical field, the financial field, the service field, and so on.
  • Take the use of an electronic wallet as an example.
  • A user pays with the electronic wallet when shopping online and recharges it when the money in it is used up. Each recharge of or payment from the electronic wallet is a user operation.
  • The system stores the user operation data in the user operation data table. For example, if user A made a transaction at 17:00 on October 1, 2019 and purchased a set of skin care products worth 800 yuan, the system records user A's consumption behavior, consumption amount, consumption time, and other information in the user operation data table.
  • A trigger then maps the new data to update the bit vector table associated with the user operation data table.
  • the bit vector table records the user identification and the bit vector corresponding to the user identification, and each bit in the bit vector records whether the user has performed an operation in the corresponding time interval. According to the bit vector table, the user analysis department can obtain the bit vector of the target user from it.
  • By analyzing the bit vector of the target user, the number of operations of the target user in a certain time interval can be obtained. The bit vectors can also be analyzed to determine whether there are users whose operating behaviors are similar to those of the target user, so as to cluster the users and further study the operating behaviors of each type of user, such as whether the operations of similar users influence one another. Data mining can also be performed on the bit vector table, for example by analyzing the bit vector of the target user to determine whether the target user's operating behavior in a certain time interval is periodic.
  • Whether the target user's operating behavior is abnormal can also be judged; when an abnormality is found, a warning can be issued in time and the use of the target user's electronic wallet can be controlled through the system to avoid unnecessary losses.
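  • The bit vector table and the operation-count query described above can be sketched as follows. This is a toy illustration under assumptions: the table is a plain dictionary rather than a database table, the names `record_operation`, `target_bit_vector`, and `operation_count` are hypothetical, and a time slot index stands in for whatever mapping from timestamps to time granularity the system uses.

```python
from typing import Dict


def record_operation(table: Dict[str, str], user_id: str,
                     slot: int, width: int) -> None:
    """Set the bit for `slot` in the user's bit vector (what the trigger
    would do when a new row is written to the operation data table)."""
    bits = list(table.get(user_id, "0" * width))
    bits[slot] = "1"
    table[user_id] = "".join(bits)


def target_bit_vector(table: Dict[str, str], user_id: str,
                      start: int, end: int) -> str:
    """Bits of the user's vector for the half-open interval [start, end)."""
    return table[user_id][start:end]


def operation_count(table: Dict[str, str], user_id: str,
                    start: int, end: int) -> int:
    """Number of time units in [start, end) with at least one operation."""
    return target_bit_vector(table, user_id, start, end).count("1")


table: Dict[str, str] = {"user_a": "110011011100"}   # vector from the text
record_operation(table, "user_b", slot=5, width=12)   # hypothetical user
print(operation_count(table, "user_a", 0, 12))
print(table["user_b"])
```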
  • Because the bit vector is a binary sequence of 0s and 1s in which 0s are relatively numerous, the bit vector can be compressed to improve the utilization of storage space.
  • The compressed bit vector is organized into groups of p bits.
  • For example, a group of the compressed vector may contain 8 bits, of which the first bit is a flag bit that indicates the meaning of the following 7 bits in the group.
  • If the first bit is 1, the following 7 bits are an uncompressed bit vector; if the first bit is 0, the following 7 bits are a counter that records the number of compressed consecutive 0s.
  • For example, in a group whose first bit is 1 and whose remaining bits are 0000100, those 7 bits are an uncompressed bit vector; in a group whose first bit is 0 and whose remaining bits are 0000100, those 7 bits record the number of compressed consecutive 0s, 4 in total.
  • B [0, 51) 0010010000000000000000000000000000001100011.
  • The first 7 bits, B[0, 7), contain 1s, so the first bit of the first group of the compressed vector is 1, indicating no compression, and the remaining 7 bits are the same as B[0, 7); that is, the first group is 10010010. The following bits, B[7, 44), are all 0s, so the first bit of the second group is 0 and the remaining 7 bits record the number of compressed 0s. The number of 0s in B[7, 44) is 37, which is 100101 in binary, or 0100101 as a 7-bit binary number, so the second group is 00100101. The last bits, the 44th to the 50th, contain 1s, so the first bit of the third group is 1 and the remaining 7 bits are the same as those bits of the original bit vector.
  • the compression ratio is related to two aspects, the number of consecutive 0s in the bit vector and the group size set in the compressed bit vector.
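  • The grouping scheme described above (one flag bit followed by 7 payload bits) can be sketched as follows. This is a minimal illustration under assumptions, not the encoder of the application: the names `compress` and `decompress` are hypothetical, the rule "emit a zero-run group whenever the next 7 bits are all 0" is one plausible reading of the example, a short final literal group is padded with 0s, and the original length is passed to `decompress` so the padding can be removed. The 51-bit test vector is built from the description in the text (0010010, then 37 zeros, then 1100011).

```python
GROUP = 8                       # bits per compressed group
PAYLOAD = GROUP - 1             # bits after the flag bit
MAX_RUN = (1 << PAYLOAD) - 1    # longest zero run one group can record (127)


def compress(bits: str) -> str:
    """Encode `bits` as groups of flag + payload.

    Flag 1: the payload holds 7 literal bits.
    Flag 0: the payload holds the length of a run of consecutive 0s.
    """
    out = []
    i, n = 0, len(bits)
    while i < n:
        chunk = bits[i:i + PAYLOAD]
        if "1" not in chunk:
            run = 0
            while i + run < n and bits[i + run] == "0" and run < MAX_RUN:
                run += 1
            out.append("0" + format(run, f"0{PAYLOAD}b"))
            i += run
        else:
            out.append("1" + chunk.ljust(PAYLOAD, "0"))
            i += PAYLOAD
    return "".join(out)


def decompress(compressed: str, length: int) -> str:
    """Inverse of compress(); `length` trims padding of the last group."""
    out = []
    for g in range(0, len(compressed), GROUP):
        group = compressed[g:g + GROUP]
        if group[0] == "1":
            out.append(group[1:])
        else:
            out.append("0" * int(group[1:], 2))
    return "".join(out)[:length]


original = "0010010" + "0" * 37 + "1100011"   # 51 bits, as described in the text
packed = compress(original)
print(packed, len(packed), "bits instead of", len(original))
```

    With this vector the three groups come out as 10010010, 00100101, and 11100011, matching the first two groups given in the text; the gain depends on how long the zero runs are relative to the group size.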
  • When operations need to be performed on a compressed bit vector, the compressed bit vector can first be read from the database and then decompressed to obtain the corresponding bit vector, on which data processing is performed.
  • FIG. 9 shows a schematic diagram of the process of decompressing the compressed bit vector. As shown in FIG. 9, in step S901, the compressed vector and the query interval corresponding to the bit vector to be processed are obtained; the query interval includes a start bit and a stop bit.
  • In step S902, the compressed vector is divided into multiple compressed bit vectors according to the number of bits of a compressed bit vector, and the compressed bit vectors are decompressed in sequence until a decompressed bit vector that extends beyond the start bit is obtained. In step S903, the vector values in the decompressed bit vector whose bit positions fall at or after the start bit are used as vector values in the bit vector to be processed. In step S904, if the number of vector values is less than the difference between the stop bit and the start bit, the compressed bit vector adjacent to the decompressed bit vector is decompressed to obtain the vector values of the remaining bits in the bit vector to be processed.
  • For example, suppose the query interval of the bit vector to be processed is from the 40th hour to the 50th hour; with a time granularity of 1 hour, the start bit is 40 and the stop bit is 50, and the compressed vector of B[0, 51) is obtained at the same time. The compressed vector is then scanned according to the preset group size and divided into multiple compressed bit vectors: if the preset group size is 8 bits, that is, the length of a compressed bit vector is 8, the compressed vector is divided into compressed bit vectors of length 8. Each compressed bit vector is then decompressed in turn. The first group is 10010010; its first bit is 1, indicating that the following seven bits are not compressed, so the first group stores the first 7 bits of the uncompressed bit vector.
  • Since 7 is less than the start bit 40, the first group contains no bits of B[40, 51). The second group is 00100101; its first bit is 0, indicating that the following seven bits record the number of compressed consecutive 0s, 37 in total. The seven bits in the first group and the 0s represented by the second group add up to 44 bits, which is greater than the start bit 40, so the second group contains the first four bits of B[40, 51), namely 0000. Because B[40, 51) contains eleven bits, the third group must also be decompressed.
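  • The partial decompression of steps S901-S904 can be sketched as follows. This is an illustration under assumptions: the name `decompress_range` is hypothetical, the query interval is treated as half-open [start, stop), and the compressed input consists of the three groups worked out in the example (10010010, 00100101, and a third literal group 11100011 built from the last seven bits 1100011).

```python
def decompress_range(compressed: str, start: int, stop: int,
                     group: int = 8) -> str:
    """Return bits [start, stop) of the original vector, decoding only the
    groups that are needed and stopping once `stop` has been reached."""
    out = []
    pos = 0   # position in the original vector of the current group's first bit
    for g in range(0, len(compressed), group):
        if pos >= stop:
            break                                  # interval fully recovered
        chunk = compressed[g:g + group]
        payload = chunk[1:]
        if chunk[0] == "1":
            bits = payload                         # literal bits
        else:
            bits = "0" * int(payload, 2)           # run of consecutive 0s
        lo = max(start, pos)
        hi = min(stop, pos + len(bits))
        if lo < hi:                                # group overlaps the interval
            out.append(bits[lo - pos:hi - pos])
        pos += len(bits)
    return "".join(out)


packed = "10010010" + "00100101" + "11100011"
print(decompress_range(packed, 40, 51))   # four 0s from group 2, then group 3
```

    Tracking only the running position means the first group is skipped without copying its bits, and decoding stops as soon as the stop bit is covered.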
  • The embodiments of this application map user operation data to bit vectors indexed by time and can answer queries about user operations, such as the tasks and target results mentioned in the above embodiments. Because the basic operations on bit vectors are well supported at the lowest level of the computer, data processing efficiency is improved and results can be returned quickly.
  • the data processing method in the embodiment of the present application can improve data processing efficiency and accuracy, provide better data support for user analysis departments, and can also be used as a preprocessing process for other data query or data mining work to improve processing efficiency.
  • the data processing method in the embodiment of the present application can well protect the user's privacy and avoid the leakage of the user's privacy. Furthermore, the bit vector can be compressed and stored during storage, which can save a lot of storage space and avoid the problems of insufficient storage space and reduced data processing efficiency caused by a large bit vector.
  • Fig. 10 schematically shows a block diagram of a data processing device according to an embodiment of the present application.
  • a data processing device 1000 includes: an acquisition module 1001 and an operation module 1002.
  • The acquisition module 1001 is configured to obtain, in response to a query request, a bit vector table related to the operation data of the target user, wherein the query request includes identification information and time information, the bit vector table includes user identifications and the bit vectors corresponding to the user identifications, and the identification information corresponds to a user identification. The operation module 1002 is configured to obtain a target bit vector from the bit vector table according to the identification information and the time information, and to perform logical processing on the target bit vector to obtain target information.
  • the bit vector includes user operation information in each time granularity.
  • The operation module 1002 is configured to: obtain a first user identification and a first time interval; obtain, from the bit vector table according to the first user identification and the first time interval, a first target bit vector corresponding to the first user identification; and count the first target bit vector to obtain the number of operations performed in the first time interval by the user corresponding to the first user identification.
  • The first user identification is obtained according to the identification information: for example, it may be included in the identification information, or the data processing apparatus may use the identification information to obtain it from a storage device.
  • the time information includes a first time interval.
  • the first user identifier corresponds to the target user.
  • The operation module 1002 is configured to: obtain a second user ID, a user ID to be compared, and a second time interval; obtain, from the bit vector table according to the second user ID, the user ID to be compared, and the second time interval, a second target bit vector corresponding to the second user ID and a target bit vector to be compared corresponding to the user ID to be compared; perform an XOR operation on the second target bit vector and the target bit vector to be compared, and negate the result of the XOR operation to obtain a comparison target bit vector; and count the comparison target bit vector to obtain the operation similarity, in the second time interval, between the user corresponding to the second user ID and the user corresponding to the user ID to be compared.
  • The second user ID and the user ID to be compared are obtained according to the identification information: for example, they may be included in the identification information, or the data processing apparatus may use the identification information to obtain them from a storage device.
  • the time information includes a second time interval.
  • the second user ID and the user ID to be compared correspond to the target user.
  • The operation module 1002 includes: an information acquisition unit, configured to acquire a third user identification, a fourth user identification, a similarity threshold, and a third time interval; a bit vector acquiring unit, configured to obtain, from the bit vector table according to the third user identification, the fourth user identification, and the third time interval, a third target bit vector corresponding to the third user identification and a fourth target bit vector corresponding to the fourth user identification; a similarity obtaining unit, configured to perform a shift operation on the fourth target bit vector to obtain a shift target bit vector, and to perform a similarity judgment on the shift target bit vector and the third target bit vector to obtain a similarity; and a comparison unit, configured to compare the similarity with the similarity threshold and to determine, according to the comparison result, whether the operation of the user corresponding to the third user identification affects the operation of the user corresponding to the fourth user identification in the third time interval.
  • The third user identification and the fourth user identification are obtained according to the identification information: for example, they may be included in the identification information, or the data processing apparatus may use the identification information to obtain them from a storage device.
  • the time information includes a third time interval.
  • the third user identification and the fourth user identification correspond to the target user.
  • The similarity obtaining unit is configured to: shift the fourth target bit vector to the left by the shift unit to obtain the shift target bit vector; XOR the third target bit vector with the shift target bit vector and negate the result of the XOR operation to obtain a similarity target bit vector; and count the similarity target bit vector to obtain the similarity.
  • The comparison unit is configured to: when the similarity is greater than or equal to the similarity threshold, determine that, within the third time interval, the operation of the user corresponding to the third user identifier affects the operation of the user corresponding to the fourth user identifier; and when the similarity is less than the similarity threshold, repeat the shift and similarity judgment described in the foregoing embodiment until the number of bits by which the fourth target bit vector has been shifted to the left reaches the shift threshold.
  • the arithmetic module 1002 is configured to: obtain a fifth user identifier, a first operation mode bit vector, a first operation mode period, and a fourth time interval; obtain, according to the fifth user identifier and the fourth time interval, a fifth target bit vector corresponding to the fifth user identifier from the bit vector table; convert the fifth target bit vector, according to the number of bits of the first operation mode bit vector, into a plurality of sequentially arranged first sub-bit vectors, and perform a similarity judgment on the first operation mode bit vector and each first sub-bit vector to obtain sub-similarities; determine a sequence bit vector according to the ordering of the first sub-bit vectors and the sub-similarities, and obtain the repetition period of the sequence bit vector; and when the repetition period is the same as the first operation mode period, determine that the operation behavior of the user corresponding to the fifth user identifier is periodic within the fourth time interval.
  • the fifth user identification is obtained according to the identification information, for example, is included in the identification information, or may be obtained from a storage device by the data processing apparatus through the identification information.
  • the time information includes a fourth time interval.
  • the fifth user identifier corresponds to the target user.
  • the arithmetic module 1002 is configured to: obtain a sixth user identifier, a second operation mode bit vector, an abnormality threshold, and a fifth time interval; obtain, according to the sixth user identifier and the fifth time interval, a sixth target bit vector corresponding to the sixth user identifier from the bit vector table, wherein the operation of the user corresponding to the sixth user identifier is periodic; divide the sixth target bit vector into a plurality of second sub-bit vectors according to the number of bits of the second operation mode bit vector; compare the data of each bit in the second operation mode bit vector with the data of the corresponding bit of each second sub-bit vector to obtain an abnormal count; and when the abnormal count is greater than or equal to the abnormality threshold, determine that the operation behavior of the user corresponding to the sixth user identifier is abnormal within the fifth time interval.
  • the sixth user identification is obtained according to the identification information, for example, is included in the identification information, or may be obtained from a storage device by the data processing apparatus through the identification information.
  • the time information includes a fifth time interval.
  • the sixth user identifier corresponds to the target user.
  • the second operation mode bit vector and abnormal threshold may be preset.
  • the data processing device 1000 further includes: a bit vector table generating module, configured to generate a user operation data table according to user operation data and to generate, according to the user operation data table, the bit vector table associated with the user operation data table, wherein the users include the target user; and a bit vector table update module, configured to monitor the user operation data table and, when the user operation data in the table changes, map the changed user operation data to update the bit vector in the bit vector table.
  • the user operation data table is provided with a trigger; the bit vector table update module is configured to: monitor the user operation data table; and when the data in the user operation data table changes, have the trigger initiate the mapping of the changed user operation data to update the bit vector in the bit vector table.
  • the bit vector table update module is configured to: determine, from the user operation data table, the target user identifier corresponding to the changed user operation data; obtain the first bit vector corresponding to the target user identifier from the bit vector table, and map the changed user operation data to obtain the second bit vector; perform an OR operation on the first bit vector and the second bit vector to obtain a third bit vector; and replace the first bit vector with the third bit vector to update the bit vector in the bit vector table.
  • the bit vector is a compressed bit vector, and the first bit of the compressed bit vector is a flag bit: when the flag bit is 1, the remaining bits after the first bit are an uncompressed bit vector; when the flag bit is 0, the remaining bits after the first bit record the number of consecutive 0s that have been compressed.
  • the data processing device 1000 further includes: an acquiring module, configured to acquire the compressed vector and the query interval corresponding to the bit vector to be processed, the query interval including a start bit number and a stop bit number;
  • a decompression module, configured to divide the compressed vector into a plurality of compressed bit vectors according to the number of bits of the compressed bit vector, and to decompress the compressed bit vectors in sequence to obtain a decompressed bit vector whose bit number is greater than the start bit number;
  • a value module, configured to take the vector values in the decompressed bit vector whose bit number is greater than the start bit number as vector values in the bit vector to be processed; and a bit-compensation module, configured to, when the number of vector values is less than the difference between the stop bit number and the start bit number, decompress the compressed bit vector adjacent to the decompressed bit vector to obtain the vector values of the remaining bits in the bit vector to be processed.
  • Fig. 11 shows a schematic structural diagram of a computer system suitable for implementing an electronic device according to an embodiment of the present application.
  • the computer system 1100 includes a central processing unit (CPU) 1101, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage part 1108 into a random access memory (RAM) 1103, so as to implement the data processing method described in the foregoing embodiment. The RAM 1103 also stores various programs and data required for system operation.
  • the CPU 1101, the ROM 1102, and the RAM 1103 are connected to each other through a bus 1104.
  • An input/output (Input/Output, I/O) interface 1105 is also connected to the bus 1104.
  • the following components are connected to the I/O interface 1105: an input part 1106 including a keyboard, a mouse, etc.; an output part 1107 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and speakers; a storage part 1108 including a hard disk, etc.; and a communication part 1109 including a network interface card such as a LAN (Local Area Network) card and a modem.
  • the communication section 1109 performs communication processing via a network such as the Internet.
  • a drive 1110 is also connected to the I/O interface 1105 as needed.
  • a removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that the computer program read from it is installed into the storage part 1108 as needed.
  • the process described below with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present application include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication part 1109, and/or installed from the removable medium 1111.
  • CPU central processing unit
  • the computer-readable medium shown in the embodiment of the present application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein.
  • This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device .
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wireless, wired, etc., or any suitable combination of the above.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of the code, and the module, program segment, or part of the code contains one or more executable instructions for realizing the specified logic function.
  • the functions marked in the blocks may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram or flowchart, and any combination of blocks in the block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified function or operation, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present application may be implemented in software or hardware, and the described units may also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.
  • this application also provides a computer-readable medium.
  • the computer-readable medium may be included in the data processing device described in the above-mentioned embodiment, or it may exist alone without being integrated into the electronic device.
  • the foregoing computer-readable medium carries one or more programs, and when the foregoing one or more programs are executed by an electronic device, the electronic device realizes the method described in the foregoing embodiment.
  • although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
  • the example embodiments described here can be implemented by software, or by combining software with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a mobile hard disk) or on a network, and which includes several instructions to make a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) execute the method according to the embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

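The present application provides a data processing method, an electronic device, and a readable storage medium, and relates to the field of computers. The method includes: in response to a query request, obtaining a bit vector table related to the operation data of a target user, wherein the query request includes identification information and time information, the bit vector table includes user identifiers and the bit vectors corresponding to those user identifiers, and the identification information corresponds to the target user; and obtaining a target bit vector from the bit vector table according to the identification information and the time information, and performing logical processing on the target bit vector to obtain target information.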

Description

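Data processing method, electronic device, and readable storage medium
This application claims priority to Chinese Patent Application No. 201911122281.7, filed on November 15, 2019 and entitled "Data processing method, apparatus, and electronic device".
Technical Field
This application relates to the field of computer technology, and in particular to a data processing method, an electronic device, and a readable storage medium.
Background
With the rapid development of computer technology, computer storage and data processing are widely used across industries. At the same time, as data grows explosively, obtaining a given user's operation information for a given time period from a database by manual statistics is extremely difficult.
It should be noted that the information disclosed in the Background section above is only intended to improve understanding of the background of this application, and may therefore include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary
Embodiments of this application provide a data processing method, a data processing apparatus, a computer-readable storage medium, and an electronic device, which can improve data processing efficiency at least to some extent.
Other features and advantages of this application will become apparent from the detailed description below, or will be learned in part through practice of this application.
According to one aspect of the embodiments of this application, a data processing method is provided, performed by a computer device, the method including:
in response to a query request of a first user, obtaining a bit vector table related to the operation data of a target user, wherein the query request includes identification information and time information, the bit vector table includes user identifiers and the bit vectors corresponding to those user identifiers, and the identification information corresponds to the target user; and obtaining a target bit vector from the bit vector table according to the identification information and the time information, and performing logical processing on the target bit vector to obtain target information.
According to one aspect of the embodiments of this application, a data processing apparatus is provided, including: an obtaining module, configured to obtain, in response to a query request, a bit vector table related to the operation data of a target user, wherein the query request includes identification information and time information, the bit vector table includes user identifiers and the bit vectors corresponding to those user identifiers, and the identification information corresponds to the target user; and an arithmetic module, configured to obtain a target bit vector from the bit vector table according to the identification information and the time information, and to perform logical processing on the target bit vector to obtain target information.
According to one aspect of the embodiments of this application, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the data processing method described in the above embodiments.
According to one aspect of the embodiments of this application, an electronic device is provided, including one or more processors and a storage device configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the data processing method described in the above embodiments.
In the technical solutions provided by some embodiments of this application, a bit vector table related to the operation data of a target user is obtained in response to a query request of a first user; a target bit vector is then obtained from the bit vector table according to the identification information and time information in the query request, and target information is obtained by performing logical processing on the target bit vector. By converting user operation data into bit vectors, the technical solution of this application can improve data processing efficiency and reduce wasted resources.
It should be understood that the general description above and the detailed description below are only exemplary and explanatory, and do not limit this application.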
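Brief Description of the Drawings
The accompanying drawings are incorporated into and form part of this specification, show embodiments consistent with this application, and together with the specification serve to explain the principles of this application. The drawings described below are evidently only some embodiments of this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort. In the drawings:
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied;
Fig. 2 schematically shows a flowchart of a data processing method according to an embodiment of this application;
Fig. 3 schematically shows a flowchart of updating the bit vector table according to an embodiment of this application;
Fig. 4 schematically shows a flowchart of counting the number of user operations based on bit vectors according to an embodiment of this application;
Fig. 5 schematically shows a flowchart of judging the similarity of user operations based on bit vectors according to an embodiment of this application;
Fig. 6 schematically shows a flowchart of judging the influence relationship between user operations based on bit vectors according to an embodiment of this application;
Fig. 7 schematically shows a flowchart of judging the periodicity of user behavior based on bit vectors according to an embodiment of this application;
Fig. 8 schematically shows a flowchart of judging abnormal operations based on bit vectors according to an embodiment of this application;
Fig. 9 schematically shows a flowchart of decompressing compressed bit vectors according to an embodiment of this application;
Fig. 10 schematically shows a block diagram of a data processing apparatus according to an embodiment of this application;
Fig. 11 shows a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of this application.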
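Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that this application will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
In addition, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of this application. Those skilled in the art will recognize, however, that the technical solutions of this application can be practiced without one or more of the specific details, or with other methods, components, apparatuses, steps, and so on. In other cases, well-known methods, apparatuses, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of this application.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically independent entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
The flowcharts shown in the drawings are only illustrative; they need not include all contents and operations/steps, nor be executed in the order described. For example, some operations/steps can be decomposed, and some can be merged or partially merged, so the actual order of execution may change according to the actual situation.
At present, to obtain a given user's operation information for a given period from billing data, the only option is to search the database on the time attribute column using the user's ID. To find users similar to a given user by operation similarity, the criterion for similar users must be defined in advance, after which the operation information of every user in the given time interval is retrieved and compared for similarity with that of the given user. Mining billing data likewise requires first retrieving part or even all of the user data and then performing the mining computation according to the mining requirements.
The embodiments of this application provide a data processing method, a data processing apparatus, a computer storage medium, and an electronic device that improve on current billing systems.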
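Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of this application can be applied.
As shown in Fig. 1, the system architecture 100 may include a terminal device 101, a network 102, and a server 103. The network 102 is the medium that provides a communication link between the terminal device 101 and the server 103. The network 102 may include various connection types, such as wired communication links and wireless communication links.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers according to actual needs. For example, the server 103 may be a server cluster composed of multiple servers. The terminal device 101 may be a terminal device with a display screen, such as a notebook, a desktop computer, or a smartphone.
In an embodiment of this application, a user performs various operations on the display screen of the terminal device 101, and the terminal device 101 sends the instructions corresponding to those operations to the server 103 through the network 102. After receiving an instruction, the server 103 can respond to it, analyze the user operation, construct a user operation data table from the user operation data, and generate a bit vector table associated with the user operation data table. After constructing the two tables, the server 103 monitors the user operation data table. When user operation data in that table changes, the changed data can be mapped to a bit vector to update the bit vector in the bit vector table. When the terminal device 101, the server 103, or another device connected to the server 103 needs to query or mine user operation behavior, it can obtain the target bit vector corresponding to the identification information from the bit vector table, according to the identification information and time information that serve as the query or mining conditions, and obtain target information by performing logical processing on the target bit vector. The identification information is the identification information corresponding to the target user. Querying and mining user operation behavior includes, for example: querying whether a given target user operated on their account within the query time interval; querying whether other users had operation behavior similar to that of the given target user within the query time interval; mining whether the operation behavior of the given target user is periodic within the query time interval; and, where the given target user has periodic operation behavior, judging whether the user performed abnormal operations within the query time interval. Correspondingly, the target information includes the number of times the target user performed one or more kinds of operation within the query time interval, the other users whose operation behavior was similar to that of the target user within the query time interval, the periodicity of the target user's operation behavior within the query time interval, whether a target user with periodic operation behavior performed abnormal operations within the query time interval, and so on. By converting user operation data into bit vectors, the technical solutions of the embodiments of this application can improve data processing efficiency and reduce resource consumption.
In this application, each step may be executed by a computer device, which may be any electronic device with processing and storage capabilities, such as a mobile phone, a tablet computer, a game device, a multimedia player, an electronic photo frame, a wearable device, or a PC (Personal Computer), or may be a server. For ease of description, the following method embodiments are described only with a computer device as the executing entity of each step, but this does not constitute a limitation.
It should be noted that the data processing method provided in the embodiments of this application is generally executed by a server, and correspondingly the data processing apparatus is generally provided in the server. In other embodiments of this application, however, the data processing method may also be executed by a terminal device.
In some technologies, querying a user's operation information for a given period from a billing database usually means searching the database by time using identification information such as the user's ID. However, when the operation information of one or more users in different time intervals is retrieved from the billing database and aggregated, the operation data of the same user at different times is very likely distributed across different storage nodes, so even if an index has been built on the user data, a large amount of retrieval time is still consumed.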
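The embodiments of this application first propose a data processing method. The implementation details of the technical solutions are described below.
Fig. 2 schematically shows a flowchart of a data processing method according to an embodiment of this application. The method may be executed by a server, which may be the server 103 shown in Fig. 1. Referring to Fig. 2, the data processing method includes at least steps S210 to S220, described in detail as follows.
In step S210, in response to a query request, a bit vector table related to the operation data of a target user is obtained, wherein the query request includes identification information and time information, the bit vector table includes user identifiers and the bit vectors corresponding to those user identifiers, and the identification information corresponds to the target user.
In an embodiment of this application, the query request may be a request initiated by a querying user to query the operation data of the target user. The target user is the user who performs the actual operations and produces operation data; there may be one target user or several. After obtaining the query request, the server could fetch the operation data of the target user corresponding to the identification information, according to the identification information and time information in the query request, and process it. However, because the volume of the target user's operation data is huge, processing it directly is inefficient and imprecise. In the embodiments of this application, therefore, a user operation data table can be generated from the target user's operation data, a bit vector table associated with the user operation data table can be generated from that table, and the target information can then be obtained by processing the bit vectors in the bit vector table.
In an embodiment of this application, generating the user operation data table from the target user's operation data and generating the associated bit vector table from the user operation data table may be carried out as follows.
The user operates through the terminal device 101, for example clicking the corresponding controls on an online shopping platform to browse products, place orders, pay, and share, or clicking the corresponding controls in a chat interface to send, share, edit, and delete messages. After receiving a user operation, the terminal device 101 sends the corresponding instruction to the server 103. The server 103 responds to the instruction, analyzes the user's operation behavior, and constructs the user operation data table from it. The user operation data table may be a K-V (key-value) data table. The key may be the user's user identifier, such as the user ID generated at registration, the user's identity card number, or other information uniquely associated with or uniquely identifying the user; the value may be data produced by the user's operations, such as the amount spent, the number of purchases, the recharge amount, or the number of recharges. When the user operation data table is constructed from user operation behavior, a bit vector table can also be created. The bit vector table is associated with the user operation data table; the data it records is obtained by mapping the user operation data, and it records user identifiers and the bit vectors corresponding to them. A bit vector is a binary sequence consisting of a run of consecutive 0s and 1s. The length of a bit vector is the number of 0s and 1s in it, and each 0 or 1 in it is called a "bit". For example, 01011 is a bit vector of length 5. Table 1 shows the structure of an example bit vector table: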
Table 1. User operation bit vector table
User    $B_{[0,12)}$
a       110011011100
c       100101100011
d       101110110101
e       010001000111
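Here $B_{[s,e)}$ denotes the bit vector obtained by mapping a user's operation data within the time interval [s,e), where s is the start time and e is the end time. a, c, d, and e are user identifiers, and 110011011100, 100101100011, 101110110101, and 010001000111 are the bit vectors obtained by mapping the operation data of users a, c, d, and e respectively. Each bit vector has length 12.
In an embodiment of this application, the bit vector table can be created with an SQL statement. When it is created, its name and its time granularity must both be set. The name of the bit vector table can be determined from the user operation data table and is specified by bitsVector_table_name. For example, when the user operation data table is named buy_record_tab, the associated bit vector table may be named bitsVector_buy_record_tab. If no name is specified, the default name of the associated bit vector table is the user operation data table name followed by "_bvt", for example buy_record_tab_bvt. The time granularity is the time attribute of each bit in the bit vector and represents a span of time; each bit records the user's operation behavior within that span. The time granularity can also be understood as the period of the bit vector. It can be set to one day (d), one hour (h), one minute (m), and so on; in Table 1 it is one hour. It can of course be set to other values, which the embodiments of this application do not specifically limit. The purpose of the time granularity is to balance the storage used by the bit vectors, the query accuracy, and the query requirements. If precise query results are required and storage is not a concern, the time granularity can be set to a small value such as one minute; when only coarse query results are needed, it can be set to a larger value. Once the time granularity is determined, user operation data can be mapped to a bit vector according to the time of each operation. For example, with a time granularity of one hour, the bit vector for one day has length 24. When a user operates between 2:00 and 3:00, the bit corresponding to 2:00-3:00 is set to 1, indicating that the user operated within that one-hour slot. Note that if the user operates several times within one slot, this cannot be shown in the corresponding bit; the remedy is to reduce the time granularity. Furthermore, the length of the bit vector, denoted length, can be determined as the ratio of the size of the time interval to the time granularity. Table 1 gives the bit vectors obtained by mapping the operations of four users a, c, d, and e over 12 hours: the time interval is 12 hours for each user and the time granularity is 1 hour, so length = 12/1 = 12.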
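The mapping from operation times to bits can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `map_operations_to_bits`, the use of plain numbers for times, and the representation of a bit vector as a string of '0'/'1' characters are all hypothetical and are not taken from the application.

```python
def map_operations_to_bits(op_times, start, end, granularity=1):
    """Map operation times (same unit as start/end) to a '0'/'1' string.

    Bit i covers the half-open slot
    [start + i*granularity, start + (i+1)*granularity).
    """
    # length = size of the time interval / time granularity
    length = int((end - start) // granularity)
    bits = ["0"] * length
    for t in op_times:
        if start <= t < end:
            # Several operations in the same slot still produce a single 1.
            bits[int((t - start) // granularity)] = "1"
    return "".join(bits)


# A user who operates once between 02:00 and 03:00, hourly granularity.
day = map_operations_to_bits([2.5], start=0, end=24)
print(day)
```

Choosing a smaller `granularity` lengthens the vector and separates operations that would otherwise collapse into the same bit, which is the trade-off the paragraph above describes.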
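In an embodiment of this application, the bit vector table can be modified with an ALTER TABLE statement, which uses the same predicates. If only the period value is modified, the previous data becomes invalid; if the bit vector table name is modified, the old data can be retained and new data is stored in the new bit vector table. The length of the bit vectors in the table can be set as needed. When the time of a user operation exceeds the time span supported by the bit vectors in the table, a new bit vector table can be created to handle it.
Furthermore, besides user identifiers and their bit vectors, columns can be added to the bit vector table to record other operation information, such as the operation type, the transaction amount of the operation, or the payment method of the operation. For example, one bit vector can be added to record the operation type, with 1 for consumption and 0 for recharge; another can record the transaction amount, with 1 for spending over 100 yuan and 0 for spending not over 100 yuan; another can record the payment method, with 1 for non-cash payment and 0 for cash payment. Adding columns to the bit vector table in this way means that no columns need to be added to the user operation data table to record the corresponding data. The operation information is still recorded, and the high maintenance cost of adding columns to the user operation data table is avoided.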
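In an embodiment of this application, once the user operation data table and its associated bit vector table have been constructed, the user operation data table can be monitored. When user operation data in the table changes, an update of the bit vector in the bit vector table is triggered. Specifically, a trigger can be set on the user operation data table. When a user performs a new operation, the user operation data in the table changes, and the trigger causes the changed data to be mapped to a bit vector, which is then used to update the bit vector table.
Fig. 3 shows a flowchart of updating the bit vector table. As shown in Fig. 3, the flow includes at least steps S301 to S304:
In step S301, the target user identifier corresponding to the changed user operation data is determined from the user operation data table.
In an embodiment of this application, when user operation data in the user operation data table changes, the target user identifier corresponding to the changed data can be determined; the bit vector corresponding to that identifier can then be obtained from the bit vector table and updated.
In step S302, the first bit vector corresponding to the target user identifier is obtained from the bit vector table according to the target user identifier, and the changed user operation data is mapped to obtain the second bit vector.
In an embodiment of this application, after the target user identifier is obtained, it can be matched against the user identifiers in the bit vector table. When a matching user identifier exists, the corresponding bit vector is extracted as the first bit vector, and the changed user operation data is mapped to obtain the second bit vector. For example, suppose the target user identifier is 12345 and its first bit vector is 010000000000, with length 12 and a time granularity of 1 h, indicating that the target user operated once between hour 1 and hour 2 of the 12 hours. If the target user operates again between hour 4 and hour 5, the mapped second bit vector is 000010000000.
In step S303, an OR operation is performed on the first bit vector and the second bit vector to obtain the third bit vector.
In an embodiment of this application, after the first bit vector (before the change) and the second bit vector (after the change) for the target user identifier are obtained, they can be merged to obtain the third bit vector. The merge can be an OR (|) operation on the two vectors. With the vectors from step S302, third bit vector = (010000000000) | (000010000000) = 010010000000, indicating that the user corresponding to the target user identifier operated between hour 1 and hour 2 and between hour 4 and hour 5.
In step S304, the first bit vector is replaced with the third bit vector to update the bit vector in the bit vector table.
In an embodiment of this application, after the third bit vector is obtained, it replaces the first bit vector, which completes the update of the bit vector table.
In an embodiment of this application, a NewBit function can be used to generate the new bit vector value. NewBit takes four parameters: the time at which counting of user operations started (that is, the time the bit vector table was generated), the time granularity specified in advance by the user, the time of the user's update operation, and the identifier of the user who performed the operation. NewBit first reads that user's bit vector value from the bit vector table, then ORs it with the bit vector obtained by mapping the update operation to get the new bit vector, and finally writes the new bit vector back to the bit vector table. NewBit may be a trigger function, a user-defined function, or a system function of the database engine; the embodiments of this application do not specifically limit this, and a suitable function can be chosen according to actual needs.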
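Steps S301 to S304 can be sketched as a small Python function modeled loosely on the NewBit description. This is an assumption-laden illustration, not the function from the application: the name `new_bit`, the in-memory dictionary standing in for the bit vector table, and the extra `table` argument are all hypothetical.

```python
def new_bit(table, start_time, granularity, op_time, user_id):
    """Read, OR, and write back one user's bit vector (steps S301-S304)."""
    first = table[user_id]                       # first bit vector (before the change)
    index = int((op_time - start_time) // granularity)
    if not 0 <= index < len(first):
        raise ValueError("operation time falls outside the bit vector")
    # Second bit vector: only the slot of the new operation is set.
    second = "0" * index + "1" + "0" * (len(first) - index - 1)
    # Third bit vector: bitwise OR of the first and second vectors.
    third = format(int(first, 2) | int(second, 2), f"0{len(first)}b")
    table[user_id] = third                       # replace the first vector
    return third


# Hypothetical in-memory stand-in for the bit vector table.
table = {"12345": "010000000000"}
print(new_bit(table, start_time=0, granularity=1, op_time=4, user_id="12345"))
# prints 010010000000
```

Because OR never clears a bit, replaying the same operation is harmless, which is convenient if a trigger fires more than once for the same row.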
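In step S220, a target bit vector is obtained from the bit vector table according to the identification information and the time information, and logical processing is performed on the target bit vector to obtain target information.
In an embodiment of this application, after obtaining the identification information and the time information, the server can obtain the target bit vector from the bit vector table according to them, and then obtain the target information from the target bit vector. In the embodiments of this application, the target information includes the number of times a user performed one or more kinds of operation within the query time interval, whether the operations of other users are similar to those of the target user within the query time interval, the mutual influence between user operation behaviors within the query time interval, the periodicity of user operation behavior within the query time interval, and whether a user with periodic operation behavior performed abnormal operations within the query time interval.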
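In an embodiment of this application, the target information is obtained from the target bit vector by logical processing, which includes the basic arithmetic and the basic operation on bit vectors. The basic arithmetic consists of AND, OR, NOT, and XOR, denoted &, |, ~, and ^ respectively. Different operations can be applied to bit vectors to obtain different target information. AND, OR, and XOR are binary operations: they return 1 when both bits are 1, when either bit is 1, and when exactly one bit is 1, respectively, and return 0 otherwise. NOT is a unary operation that returns 1 when the operand bit is 0 and 0 otherwise. The basic operation on bit vectors is shifting, denoted by >> and << in the embodiments of this application. For example, $B^{a}_{[0,12)} >> 2$ shifts the bit vector of user a two bits to the right and fills the two leftmost bits with 0, giving the bit vector 001100110111. It should be stressed that the basic arithmetic and the shift operation are well supported at the lowest level of the computer and complete very quickly. Data obtained by querying the user operation data table directly is exposed to the outside world, so user data is unprotected and the risk of leaking user privacy increases. In the embodiments of this application, user operation data is mapped to bit vectors and processing is based on those bit vectors, which avoids exposing user operation data directly; this improves the security of user data and helps prevent privacy leaks.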
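The basic operations map directly onto integer bitwise operators. The sketch below uses Python integers for the Table 1 vectors of users a and c; the `WIDTH`, `MASK`, and `show` names are illustrative assumptions. One detail the text does not mention but any fixed-width implementation needs: Python integers are unbounded, so the results of `~` and `<<` are masked back to the vector length.

```python
WIDTH = 12
MASK = (1 << WIDTH) - 1          # keeps results within 12 bits

a = int("110011011100", 2)       # user a, Table 1
c = int("100101100011", 2)       # user c, Table 1


def show(value):
    """Format an integer as a fixed-width bit string."""
    return format(value & MASK, f"0{WIDTH}b")


print("a & c  =", show(a & c))
print("a | c  =", show(a | c))
print("a ^ c  =", show(a ^ c))
print("~a     =", show(~a))      # mask removes the sign extension
print("a >> 2 =", show(a >> 2))  # two leftmost bits become 0
print("a << 2 =", show(a << 2))  # bits shifted past the left edge are dropped
```

`show(a >> 2)` gives 001100110111, the value quoted in the paragraph above.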
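In an embodiment of this application, a key operation on bit vectors is counting, denoted Count. For example, the function $\mathrm{Count}(B^{a}_{[s,e)})$ returns the number of 1s in the bit vector of user a within the time interval [s,e). In the embodiments of this application, Count can be computed quickly using shift operations. The algorithm is as follows: take the bit vector $B_{[s,e)}$ as input; first obtain its length and its first bit, AND that bit with 1, and use the result as the initial count; then shift $B_{[s,e)}$ to the right to obtain its second bit, AND that bit with 1, and update the count with the result; repeat until the last bit of $B_{[s,e)}$ has been ANDed with 1 and the count updated. The final count is the return value of Count().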
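The shift-and-mask loop described above can be written as follows. This is a minimal sketch assuming the bit vector arrives as a '0'/'1' string; the function name `count` is illustrative, and the loop reads bits from the right-hand end, which yields the same total as any other order.

```python
def count(bits: str) -> int:
    """Count the 1s in a bit vector by repeated AND-with-1 and right shift."""
    value = int(bits, 2)
    total = 0
    for _ in range(len(bits)):   # one iteration per bit of the vector
        total += value & 1       # AND the current lowest bit with 1
        value >>= 1              # shift right to expose the next bit
    return total


print(count("110011011100"))     # user a on [0,12): prints 7
```

In practice a built-in population count would normally be used instead; for example `bin(value).count("1")` produces the same number.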
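In an embodiment of this application, corresponding to the specific categories of target information, there are five data processing tasks: (1) operation statistics; (2) operation similarity; (3) operation influence relationship; (4) periodicity judgment; (5) abnormal operation judgment.
For task (1), operation statistics, Fig. 4 shows a flowchart of counting the number of user operations based on bit vectors. As shown in Fig. 4, the flow includes at least steps S401 and S402:
In step S401, a first user identifier and a first time interval are obtained, and the first target bit vector corresponding to the first user identifier is obtained from the bit vector table according to the first user identifier and the first time interval.
In an embodiment of this application, the bit vector table can record multiple user identifiers and the bit vector for each. After the first user identifier and the first time interval are obtained, the first user identifier is matched against the user identifiers in the bit vector table. When the first user identifier exists in the table, the bit vector corresponding to it is determined from the first time interval and the time granularity of the table, and is marked as the first target bit vector. Step S401 is executed, for example, in response to a query request from a querying user, the query request including identification information and time information. The first user identifier is obtained according to the identification information: for example, it is included in the identification information, or the server can retrieve it from a storage device using the identification information. The time information includes the first time interval. The first user identifier corresponds to the target user.
In step S402, the first target bit vector is counted to obtain the number of operations performed within the first time interval by the user corresponding to the first user identifier.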
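In an embodiment of this application, after the first target bit vector is obtained, the Count() function can be applied to it, that is, the number of 1s in the first target bit vector is counted, to obtain the number of operations performed within the first time interval by the user corresponding to the first user identifier. Take user a in Table 1 as an example. To obtain user a's operations within the time interval [0,5), first obtain user a's first target bit vector on [0,5), which is 11001, and then apply Count() to it, obtaining $\mathrm{Count}(B^{a}_{[0,5)}) = \mathrm{Count}(11001) = 3$. This shows that user a performed at least 3 operations within [0,5); finally the result 3 is returned.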
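Steps S401 and S402 amount to a table lookup, a slice by time interval, and a count. The sketch below is illustrative only: the dictionary `BIT_VECTOR_TABLE` stands in for the bit vector table using the Table 1 data, and the function name and its `table_start` and `granularity` parameters are assumptions.

```python
# Hypothetical in-memory copy of Table 1 (hourly granularity, starting at hour 0).
BIT_VECTOR_TABLE = {
    "a": "110011011100",
    "c": "100101100011",
    "d": "101110110101",
    "e": "010001000111",
}


def count_operations(table, user_id, start, end, table_start=0, granularity=1):
    """Return the number of occupied slots for user_id within [start, end)."""
    bits = table[user_id]                          # S401: match the user identifier
    lo = (start - table_start) // granularity      # first slot of the query interval
    hi = (end - table_start) // granularity        # one past the last slot
    target = bits[lo:hi]                           # first target bit vector
    return target.count("1")                       # S402: count the 1s


print(count_operations(BIT_VECTOR_TABLE, "a", 0, 5))   # prints 3
```

The result is a lower bound on the real number of operations, since several operations inside one slot are recorded as a single 1.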
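For task (2), operation similarity, Fig. 5 shows a flowchart of judging the similarity of user operations based on bit vectors. As shown in Fig. 5, the flow includes at least steps S501 to S504:
In step S501, a second user identifier, user identifiers to be compared, and a second time interval are obtained.
In an embodiment of this application, the server first obtains the user identifier of the target user, the user identifiers of other users, and the time interval to be queried, and then judges, from the operation bit vectors of the target user and of the other users within that interval, whether the operations of the target user and those of the other users are similar. In the embodiments of this application, the target user's identifier is marked as the second user identifier, the other users' identifiers are marked as the user identifiers to be compared, and the time interval to be queried is marked as the second time interval; there may be one user identifier to be compared or several. Step S501 is executed, for example, in response to a query request from a querying user, the query request including identification information and time information. The second user identifier and the user identifiers to be compared are obtained according to the identification information: for example, they are included in the identification information, or the server can retrieve them from a storage device using the identification information. The time information includes the second time interval. The second user identifier and the user identifiers to be compared correspond to the target user.
In step S502, according to the second user identifier, the user identifiers to be compared, and the second time interval, the second target bit vector corresponding to the second user identifier and the target bit vectors to be compared, corresponding to the user identifiers to be compared, are obtained from the bit vector table.
In an embodiment of this application, after the second user identifier, the user identifiers to be compared, and the second time interval are obtained, the second user identifier and the user identifiers to be compared are each matched against the user identifiers in the bit vector table, to obtain, within the second time interval, the second target bit vector corresponding to the second user identifier and the target bit vectors to be compared.
In step S503, the second target bit vector is XORed with the target bit vector to be compared, and a NOT operation is applied to the result of the XOR, to obtain the comparison target bit vector.
In an embodiment of this application, after the second target bit vector and the target bit vectors to be compared are obtained, the second target bit vector is combined with each target bit vector to be compared to obtain their similarity. First the second target bit vector is XORed with the target bit vector to be compared and the result is negated, giving the comparison target bit vector; then the comparison target bit vector is counted to obtain the similarity between the two.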
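Take users a, c, and d in Table 1 as an example, and judge the operation similarity between user c and users a and d within the time interval [0,12). In the formulas below, $\oplus$ denotes the XOR operation (^) and $\sim$ denotes the NOT operation (~). The server first obtains the bit vectors of users c, a, and d on [0,12):
$B^{c}_{[0,12)} = 100101100011$, $B^{a}_{[0,12)} = 110011011100$, $B^{d}_{[0,12)} = 101110110101$.
It then XORs the bit vector of user c with the bit vectors of users a and d:
$B^{c}_{[0,12)} \oplus B^{a}_{[0,12)} = 010110111111$
$B^{c}_{[0,12)} \oplus B^{d}_{[0,12)} = 001011010110$
It then applies NOT to the XOR results:
$\sim(B^{c}_{[0,12)} \oplus B^{a}_{[0,12)}) = 101001000000$
$\sim(B^{c}_{[0,12)} \oplus B^{d}_{[0,12)}) = 110100101001$
That is, the comparison target bit vectors are 101001000000 for users c and a, and 110100101001 for users c and d.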
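In step S504, the comparison target bit vector is counted to obtain the operation similarity, within the second time interval, between the user corresponding to the second user identifier and the user corresponding to the user identifier to be compared.
In an embodiment of this application, after the comparison target bit vector is obtained, it can be counted with the Count() function to obtain the operation similarity within the second time interval between the user corresponding to the second user identifier and the user corresponding to the user identifier to be compared. Counting gives
$\mathrm{Count}(\sim(B^{c}_{[0,12)} \oplus B^{a}_{[0,12)})) = 3$
$\mathrm{Count}(\sim(B^{c}_{[0,12)} \oplus B^{d}_{[0,12)})) = 6$
It follows that, within the time interval [0,12), the operation similarity between user d and user c is greater than that between user a and user c.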
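In an embodiment of this application, the similarity measure can be refined by computing the proportion of similar operations among all operations in the given time interval. In the second time interval, the proportion of similar operations between user c and user a is 3/12, while that between user c and user d is 6/12, so user c and user d are clearly more similar in their operation behavior.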
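The similarity count and the ratio refinement can be checked with a short Python sketch. The function names are illustrative assumptions, and bit vectors are again taken as '0'/'1' strings of equal length; the NOT result is masked to the vector length because Python integers are unbounded.

```python
def similarity(x: str, y: str) -> int:
    """Count(~(x ^ y)): the number of slots in which both users behave alike."""
    width = len(x)
    mask = (1 << width) - 1
    same = ~(int(x, 2) ^ int(y, 2)) & mask   # 1 where the two bits agree
    return bin(same).count("1")


def similarity_ratio(x: str, y: str) -> float:
    """Share of slots in which the two vectors agree."""
    return similarity(x, y) / len(x)


a, c, d = "110011011100", "100101100011", "101110110101"
print(similarity(c, a), similarity_ratio(c, a))   # prints 3 0.25
print(similarity(c, d), similarity_ratio(c, d))   # prints 6 0.5
```

Note that this measure counts agreement on 0s as well as on 1s, so two users who are both idle in a slot are treated as behaving alike in that slot.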
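With the definition of user operation similarity and the query operation in place, users can be clustered by operation similarity. Similar users are placed in the same cluster and dissimilar users in different clusters. The clustering result can serve as a data preprocessing step that speeds up other analyses; it can also be supplied to user profiling as one kind of behavioral feature, helping to understand users better.
For task (3), operation influence relationship, Fig. 6 shows a flowchart of judging the influence relationship between user operations based on bit vectors. As shown in Fig. 6, the flow includes at least steps S601 to S604:
In step S601, a third user identifier, a fourth user identifier, a similarity threshold, and a third time interval are obtained. Step S601 is executed, for example, in response to a query request from a querying user, the query request including identification information and time information. The third and fourth user identifiers are obtained according to the identification information: for example, they are included in the identification information, or the server can retrieve them from a storage device using the identification information. The time information includes the third time interval. The third and fourth user identifiers correspond to the target user.
In an embodiment of this application, to determine whether the operations of two or more users within a time interval influence each other, the user identifiers of those users and the time interval to be queried are determined first. A similarity threshold is then obtained and used to judge whether the operation behaviors of the users are similar, from which it is determined whether their operation behaviors influence each other.
In step S602, according to the third user identifier, the fourth user identifier, and the third time interval, the third target bit vector corresponding to the third user identifier and the fourth target bit vector corresponding to the fourth user identifier are obtained from the bit vector table.
In an embodiment of this application, after the third and fourth user identifiers are obtained, each is matched against the user identifiers in the bit vector table, to obtain, within the third time interval, the third target bit vector corresponding to the third user identifier and the fourth target bit vector corresponding to the fourth user identifier.
In step S603, a shift operation is performed on the fourth target bit vector to obtain the shifted target bit vector, and a similarity judgment is performed on the shifted target bit vector and the third target bit vector to obtain the similarity.
In an embodiment of this application, the influence of one user's operations on another's may be synchronous or delayed. When determining the influence relationship between user operations, therefore, the fourth target bit vector can be shifted, and the similarity between the third target bit vector and the shifted fourth target bit vector can then be judged and measured. Specifically, the fourth target bit vector is first shifted to the left by the shift unit to obtain the shifted target bit vector; the third target bit vector is then XORed with the shifted target bit vector and the result is negated to obtain the similarity target bit vector; finally the similarity target bit vector is counted with the Count() function to obtain the similarity between the third and fourth target bit vectors. The shift unit is the number of bits moved in each shift operation, for example 1 or 2; it can be any integer smaller than the length of the bit vector. In addition, a shift threshold can be set: when the shift operation reaches the shift threshold, shifting stops and it is determined that the operation of the user corresponding to the third user identifier has no influence on the operation of the user corresponding to the fourth user identifier.
In step S604, the similarity is compared with the similarity threshold, and it is judged from the comparison result whether, within the third time interval, the operation of the user corresponding to the third user identifier influences the operation of the user corresponding to the fourth user identifier.
In an embodiment of this application, after the similarity between the third target bit vector and the shifted target bit vector is obtained, it is compared with the similarity threshold, and the comparison result is used to judge whether the operation of the user corresponding to the third user identifier influences the operation of the user corresponding to the fourth user identifier. Specifically, when the similarity is greater than or equal to the similarity threshold, it is determined that such an influence exists within the third time interval. When the similarity is less than the similarity threshold, the fourth target bit vector is shifted again, the similarity between the newly shifted bit vector and the third target bit vector is computed and compared with the similarity threshold, and if it is still less than the threshold the steps are repeated until the number of bits by which the fourth target bit vector has been shifted to the left reaches the shift threshold.
In an embodiment of this application, when the operations of two or more users influence each other, the number of shift operations is returned, that is, the time delay with which one user's operation affects another user; when they do not influence each other, the shift threshold is returned.
In steps S603 and S604 above, the fourth target bit vector is shifted; the same result can also be achieved by leaving the fourth target bit vector unshifted and shifting the third target bit vector instead.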
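Take users a and d in Table 1 as an example. The given time intervals are [3,8) and [4,9), the similarity threshold is α = 4, and the shift threshold is γ = 3. According to the time intervals and user identifiers, the third target bit vector for user a's operations in [3,8) is 001101, and the fourth target bit vector for user d's operations in [4,9) is 110110. Shifting the fourth target bit vector and computing the similarity between the shifted target bit vector and the third target bit vector, with $\oplus$ denoting XOR and $\sim$ denoting NOT, gives
$\mathrm{Count}(\sim(001101 \oplus (110110 << 1))) = \mathrm{Count}(\sim(001101 \oplus 101100)) = \mathrm{Count}(011110) = 4 \geq \alpha$
This shows that user a's operation behavior may influence user d, with a delay of about 1 h. As with the similarity judgment, the influence relationship does not hold strictly. It may be caused by some external factor, such as the launch of a new product; it may also be entirely coincidental, that is, the two users produced similar consumption behavior without any external factor. If such coincidental influence between two users appears regularly in the billing data, meaning that after one user buys certain goods the other user often buys them too even though the two users have no connection, the relationship can still be exploited. When one user is observed operating on their account, it can be predicted that the other user is also very likely to operate on theirs, which improves the understanding of the corresponding users.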
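The shift-and-compare loop of steps S603 and S604 can be sketched as below. Several details are assumptions of this sketch rather than statements from the application: the function name `influence_delay`, the choice to try shifts of 1 up to γ−1 bits (so that a return value of γ unambiguously means "no influence"; the text leaves this boundary open), and the masking of shifted results to the vector width.

```python
def influence_delay(third: str, fourth: str, alpha: int, gamma: int, unit: int = 1) -> int:
    """Return the shift (delay) at which similarity reaches alpha, else gamma."""
    width = len(third)
    mask = (1 << width) - 1
    t = int(third, 2)
    f = int(fourth, 2)
    shifted = unit
    while shifted < gamma:
        moved = (f << shifted) & mask            # left shift, drop overflow bits
        same = ~(t ^ moved) & mask               # NOT of XOR: 1 where bits agree
        if bin(same).count("1") >= alpha:        # Count() against the threshold
            return shifted                       # delay, in slots
        shifted += unit
    return gamma                                 # shift threshold: no influence found


# Vectors and thresholds from the worked example (alpha = 4, gamma = 3).
print(influence_delay("001101", "110110", alpha=4, gamma=3))   # prints 1
```

Swapping the roles of the two arguments, or shifting the third vector instead of the fourth, gives the variant mentioned at the end of step S604.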
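For task (4), periodicity judgment: a user's operation behavior may show a certain periodicity, such as recharging once a week, or making two purchases right after every recharge. Analyzing the periodicity of a user's operations helps to understand the user's habits better and so to profile the user more accurately. Fig. 7 shows a flowchart of judging the periodicity of user behavior based on bit vectors. As shown in Fig. 7, the flow includes at least steps S701 to S705:
In step S701, a fifth user identifier, a first operation mode bit vector, a first operation mode period, and a fourth time interval are obtained. Step S701 is executed, for example, in response to a query request from a querying user, the query request including identification information and time information. The fifth user identifier is obtained according to the identification information: for example, it is included in the identification information, or the server can retrieve it from a storage device using the identification information. The time information includes the fourth time interval. The fifth user identifier corresponds to the target user.
In an embodiment of this application, to judge whether a user's operations are periodic within a time interval, the user's identifier and the time interval are needed, namely the fifth user identifier and the fourth time interval. An operation mode bit vector is also needed, against which it is judged whether the user's operations within the fourth time interval are repetitions of that operation mode. In addition, a first operation mode period can be obtained to judge whether the periodicity of the user's operations matches the preset operation mode period.
In step S702, the fifth target bit vector corresponding to the fifth user identifier is obtained from the bit vector table according to the fifth user identifier and the fourth time interval.
In an embodiment of this application, the fifth user identifier can be matched against the user identifiers in the bit vector table to obtain the fifth target bit vector of the corresponding user within the fourth time interval, and the periodicity analysis is based on this vector.
In step S703, according to the number of bits of the first operation mode bit vector, the fifth target bit vector is converted into a plurality of sequentially arranged first sub-bit vectors, and a similarity judgment is performed between the first operation mode bit vector and each first sub-bit vector to obtain the sub-similarities.
In an embodiment of this application, if the operations of the user corresponding to the fifth user identifier are periodic, the new bit vector obtained by processing the fifth target bit vector must contain several copies of the first operation mode bit vector, so the fifth target bit vector must be longer than the first operation mode bit vector. For the periodicity judgment, the fifth target bit vector is converted, according to the length of the first operation mode bit vector, into a plurality of sequentially arranged first sub-bit vectors, and the first operation mode bit vector is then compared for similarity with each first sub-bit vector to obtain the sub-similarity for each.
In an embodiment of this application, take the bit vector 110011011100 corresponding to user a's operations within [0,12), and a given first operation mode bit vector 110 of length (number of bits) 3. Converting user a's bit vector according to the length of the first operation mode bit vector gives the first sub-bit vectors 110, 100, 001, 011, 110, 101, 011, 111, 110, and 100. Comparing the first operation mode bit vector with each of them gives the sub-similarities, in order: 3, 2, 0, 1, 3, 1, 1, 2, 3, 2.
In step S704, a sequence bit vector is determined from the ordering of the first sub-bit vectors and the sub-similarities, and the repetition period of the sequence bit vector is obtained.
In an embodiment of this application, a sequence bit vector can be determined from the sub-similarities of the first sub-bit vectors. When forming the sequence bit vector, only a similarity of 3 means that a first sub-bit vector is identical to the first operation mode bit vector; similarities of 0, 1, or 2 mean that it differs. Where a first sub-bit vector is identical to the first operation mode bit vector, the corresponding position in the sequence bit vector is 1; where it differs, the position is 0. For the example in step S703, the sequence bit vector formed from the sub-similarities is 1000100010. Its first eight bits are 10001000, a repetition of 1000, which shows that the first operation mode recurs with a period of 4 hours.
In step S705, when the repetition period is the same as the first operation mode period, it is determined that the operation behavior of the user corresponding to the fifth user identifier is periodic within the fourth time interval.
In an embodiment of this application, user a's operations within [0,12) recur with a period of 4 hours, and the given first operation mode period is also 4, which shows that user a's operations are periodic within [0,12). Furthermore, it can be determined that the periodicity starts at hour 0 and ends at hour 10.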
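Steps S703 to S705 can be reproduced on the worked example with the sketch below. The function names are illustrative assumptions, and `repetition_period` uses one simple reading of "repetition period" (the common gap between consecutive 1s in the sequence bit vector); the application does not spell out how the period is extracted.

```python
def sub_similarities(bits: str, pattern: str) -> list:
    """Slide a window of len(pattern) over bits; count agreeing positions."""
    m = len(pattern)
    mask = (1 << m) - 1
    p = int(pattern, 2)
    result = []
    for i in range(len(bits) - m + 1):
        window = int(bits[i:i + m], 2)           # i-th first sub-bit vector
        result.append(bin(~(window ^ p) & mask).count("1"))
    return result


def sequence_vector(bits: str, pattern: str) -> str:
    """1 where the window equals the pattern (sub-similarity == len(pattern))."""
    m = len(pattern)
    return "".join("1" if s == m else "0" for s in sub_similarities(bits, pattern))


def repetition_period(seq: str):
    """Common gap between consecutive 1s, or None if there is no single gap."""
    ones = [i for i, b in enumerate(seq) if b == "1"]
    gaps = {later - earlier for earlier, later in zip(ones, ones[1:])}
    return gaps.pop() if len(gaps) == 1 else None


user_a = "110011011100"
seq = sequence_vector(user_a, "110")
print(sub_similarities(user_a, "110"))   # [3, 2, 0, 1, 3, 1, 1, 2, 3, 2]
print(seq, repetition_period(seq))       # 1000100010 4
```

Comparing `repetition_period(seq)` with the given first operation mode period (4 in the example) completes step S705.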
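For task (5): an abnormal operation is one in which a user's operation behavior departs from the usual. Detecting anomalies quickly helps the system find abnormal operations promptly and determine whether the operation was performed by the user in person; if it was not, measures can be taken in time to reduce the user's losses. The anomaly judgment in the embodiments of this application builds on the periodicity judgment: the user's previous operations are periodic, and an operation that does not fit that periodicity is defined as an abnormal operation.
Fig. 8 shows a flowchart of judging abnormal operations based on bit vectors. As shown in Fig. 8, the flow includes at least steps S801 to S805:
In step S801, a sixth user identifier, a second operation mode bit vector, an abnormality threshold, and a fifth time interval are obtained. Step S801 is executed, for example, in response to a query request from a querying user, the query request including identification information and time information. The sixth user identifier is obtained according to the identification information: for example, it is included in the identification information, or the server can retrieve it from a storage device using the identification information. The time information includes the fifth time interval, and the sixth user identifier corresponds to the target user. The second operation mode bit vector and the abnormality threshold may be preset.
In an embodiment of this application, to determine whether a user's operations are abnormal, the bit vector corresponding to the user identifier is first obtained; it is then judged from that bit vector whether the user's operations are periodic; on the basis that the operations are periodic, it is judged whether they are abnormal; and finally the result is returned. For ease of understanding, the user identifier obtained in this step is called the sixth user identifier, the operation mode bit vector is called the second operation mode bit vector, and the time interval is called the fifth time interval; the abnormality threshold for judging abnormal operations is obtained at the same time.
In step S802, the sixth target bit vector corresponding to the sixth user identifier is obtained from the bit vector table according to the sixth user identifier and the fifth time interval, wherein the operations of the user corresponding to the sixth user identifier are periodic.
In an embodiment of this application, the sixth user identifier is matched against the user identifiers in the bit vector table to obtain the sixth target bit vector for the sixth user identifier within the fifth time interval. The sixth target bit vector can then be processed according to steps S703 and S704 shown in Fig. 7, and the result is used to judge whether the operations of the user corresponding to the sixth user identifier are periodic within the fifth time interval. Only when the user's operations are periodic can it be judged whether they include abnormal operations; for operations without periodicity it is hard to determine whether any are abnormal.
In step S803, the sixth target bit vector is divided into a plurality of second sub-bit vectors according to the number of bits of the second operation mode bit vector.
In an embodiment of this application, to judge which of the user's operations does not fit the periodicity, and so determine that the operations are abnormal, the sixth target bit vector is divided into a plurality of second sub-bit vectors according to the number of bits of the second operation mode bit vector. Take the bit vector 110011011100 corresponding to user a's operations within [0,12) as an example. Given the second operation mode bit vector 1100, user a's bit vector can be divided into the second sub-bit vectors 1100, 1101, and 1100.
In step S804, the data of each bit in the second operation mode bit vector is compared with the data of the corresponding bit of each second sub-bit vector to obtain the abnormal count.
In an embodiment of this application, take the bit vector $B^{e}_{[0,12)}$ corresponding to user e's operations within [0,12) as an example. The given second operation mode bit vector is $B_{M} = 0100$ and the abnormality threshold is β = 2. First, whether user e's operations are periodic is judged. Computing according to steps S703 and S704 shows that user e's operations are periodic from hour 0 to hour 8, with the second operation mode repeating at a period of 4 hours. It can also be seen directly from $B^{e}_{[0,12)}$ that the operations are periodic from hour 0 to hour 8, which shows that the first two second sub-bit vectors, both 0100, contain no anomaly. The third second sub-bit vector, 1111, therefore needs to be analyzed to judge whether it contains an anomaly. For the judgment, the data of each bit in the second operation mode bit vector is compared with the data of the corresponding bit of the second sub-bit vector. For the third second sub-bit vector, its first bit is 1 while the first bit of the second operation mode bit vector is 0; they differ, so the abnormal count is set to 1. Its second bit is 1 and the second bit of the second operation mode bit vector is 1; they are the same, so the abnormal count remains 1. Its third bit is 1 and the third bit of the second operation mode bit vector is 0; they differ, so the abnormal count is set to 2. Its fourth bit is 1 and the fourth bit of the second operation mode bit vector is 0; they differ, so the abnormal count is set to 3.
In step S805, when the abnormal count is greater than or equal to the abnormality threshold, it is determined that the operation behavior of the user corresponding to the sixth user identifier is abnormal within the fifth time interval.
In an embodiment of this application, for the example in step S804, the abnormal count is 3 and the abnormality threshold is 2. The abnormal count is greater than the abnormality threshold, which shows that the operation behavior of the user corresponding to the sixth user identifier is abnormal within the fifth time interval. When a user's operation behavior is judged abnormal, a warning can be sent to the system so that it takes appropriate measures, such as freezing the user's account, to avoid loss of the user's property.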
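Steps S803 to S805 can be sketched as follows. The function names are illustrative, and accumulating the abnormal count across all second sub-bit vectors is an assumption of this sketch; in the worked example the first two sub-vectors match the pattern, so the accumulated count equals the count for the third sub-vector alone.

```python
def split(bits: str, size: int) -> list:
    """S803: cut the target bit vector into consecutive sub-vectors of `size` bits."""
    return [bits[i:i + size] for i in range(0, len(bits), size)]


def mismatches(sub: str, pattern: str) -> int:
    """S804: count the positions where a sub-vector differs from the pattern."""
    return sum(1 for s, p in zip(sub, pattern) if s != p)


def is_abnormal(sub_vectors, pattern: str, beta: int) -> bool:
    """S805: abnormal when the accumulated mismatch count reaches the threshold."""
    total = sum(mismatches(sub, pattern) for sub in sub_vectors)
    return total >= beta


print(split("110011011100", 4))                     # ['1100', '1101', '1100']
# Sub-vectors as analyzed in the worked example: two clean periods, then 1111.
example = ["0100", "0100", "1111"]
print(mismatches("1111", "0100"))                   # prints 3
print(is_abnormal(example, "0100", beta=2))         # prints True
```

Raising `beta` makes the check more tolerant of occasional off-pattern operations, at the cost of reacting later to a genuine change in behavior.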
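In an embodiment of this application, the data processing method disclosed here can be used in many fields, such as the medical, financial, and service fields. Take the use of an electronic wallet as an example. A user pays with an electronic wallet when shopping online and recharges the wallet when its balance runs out. Each recharge or purchase is a user operation. When the user operates, the system stores the user operation data in the user operation data table. For example, if user A made a transaction at 17:00 on October 1, 2019, buying a set of skin care products worth 800 yuan, the system records user A's consumption behavior, amount, time, and other information in the user operation data table. When data in the user operation data table changes, the trigger initiates the mapping of the new data to update the bit vector table associated with the user operation data table. The bit vector table records user identifiers and their bit vectors, and each bit of a bit vector records whether the user operated in the corresponding time slot. From the bit vector table, the user analysis department can obtain the target user's bit vector and analyze it to find the number of operations the target user performed in a given time interval. It can also analyze the bit vectors of the target user and other users to judge whether any users have operation behavior similar to the target user's, cluster the users, and further study the operation behavior of each cluster, for example whether the operations of users in the same cluster influence each other. In addition, data mining can be performed on the bit vector table: by analyzing the target user's bit vector, it can be judged whether the target user's operation behavior in a given time interval is periodic and, where it is, whether the behavior is abnormal. When an anomaly is found, a warning can be issued promptly and the system can restrict use of the target user's electronic wallet, avoiding unnecessary losses.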
在本申请的一个实施例中,由于位向量为0和1组成的二进制序列,其中0的占比较大,因此可以对位向量进行压缩,提高存储空间的利用率。在本申请的实施例中,采用σ表示压缩后的位向量,其中p个位为一组。p的具体大小可以由用户根据位向量中0的数量进行设置,来控制压缩效果。例如可以将p设 置为8、16、32等等。为了便于理解,下文将以p=8为例进行说明。当p=8时,压缩向量的一个组为包含8个位的位向量,其中第一位为标志位,即指示该组后面的7个位所表达的意义。若第一位为1,表示后面的7位表示是没有经过压缩的位向量;若第一位为0,表示后面的7位是用来进行计数,记录压缩的连续的0的数目。例如压缩位向量1000100中,第一位为1,表示后面的7位000100为没有经过压缩的位向量;压缩位向量0000100中,第一位为0,表示后面的7位000100为压缩的连续的0的数量,共计4个0。
举例说明本申请实施例中压缩位向量的过程。存在一位向量B [0,51)=001001000000000000000000000000000000000000001100011。在该位向量中,前7位中存在1,因此压缩向量σ的第一组的第一位为1,表示没有压缩,剩下的7位与B [0,7)相同,即σ的第一组为10010010;接下来的第8位至第43位全部为0,因此σ的第二组的第一位为0,剩下的7位用来记录压缩的0的个数,B [7,44)中0的数量为37,表示为二进制为100101,转换为7位二进制为0100101,那么σ的第二组为00100101;最后的第44位至第50位存在1,所以σ的第三组的第一位为1,剩下的7位与B [44,51)相同,即σ的第三组为11100011。即将B [0,51)压缩所形成的压缩向量为100100100010010111100011。
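The analysis shows that the bit vector has 51 bits before compression and 24 bits after, a compression ratio of 24/51 ≈ 0.47. The compression ratio depends on two factors: the number of consecutive 0s in the bit vector and the group size chosen for the compressed bit vector. The number of consecutive 0s depends on the user's operations and cannot be controlled, whereas the group size can be tuned to the data to achieve better compression. If the data contains long runs of consecutive 0s, p can be set to a larger value, such as p = 64, so that a single group represents more consecutive 0s and compression improves.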
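In one embodiment of the present application, when an operation needs to be performed on a compressed bit vector, the compressed bit vector can first be read from the database and decompressed to obtain the corresponding bit vector, on which data processing is then performed. FIG. 9 is a schematic flowchart of decompressing a compressed bit vector. As shown in FIG. 9, in step S901, the compressed vector and the query interval corresponding to the to-be-processed bit vector are obtained, the query interval including a start bit position and an end bit position; in step S902, the compressed vector is split into multiple compressed bit vectors according to the number of bits of a compressed bit vector, and the compressed bit vectors are decompressed in sequence to obtain a decompressed bit vector that extends beyond the start bit position; in step S903, the vector values of the decompressed bit vector that lie beyond the start bit position are taken as vector values of the to-be-processed bit vector; in step S904, if the number of vector values is less than the difference between the end bit position and the start bit position, the compressed bit vector adjacent to the decompressed bit vector is decompressed to obtain the vector values of the remaining bits of the to-be-processed bit vector.
The decompression process of FIG. 9 is illustrated by the following example. Suppose the query interval of the to-be-processed bit vector is hour 40 through hour 50; with a time granularity of 1 hour, the start bit position is 40 and the end bit position is 50, so the to-be-processed bit vector is B[40,51). The compressed vector of B[0,51) is obtained at the same time. The compressed vector is then scanned according to the preset group size and split into multiple compressed bit vectors; for example, with a preset group size of 8 bits, that is, a compressed bit vector length of 8, the compressed vector is split into compressed bit vectors of length 8. Each compressed bit vector is then decompressed in turn. The first compressed bit vector is 10010010; its first bit is 1, so the following seven bits are uncompressed and the first group stores the first 7 bits of the original bit vector. Because 7 is less than the start bit position 40, the first compressed bit vector contains no bits of B[40,51). The second compressed bit vector is 00100101; its first bit is 0, so the following seven bits give the number of compressed consecutive 0s, 37 in total. The seven bits of the first group plus the 37 zeros of the second group cover 44 bits, which exceeds the start bit position 40, so the second compressed bit vector contains the first four bits of B[40,51), namely 0000. Because B[40,51) contains eleven bits, the third compressed bit vector must also be decompressed. It is 11100011; its first bit is 1, indicating no compression, so the following 7 bits can be read directly as the last seven bits of B[40,51), finally giving B[40,51) = 00001100011. Note that any compressed bit vectors that follow in the compressed vector can simply be ignored, because the value of B[40,51) has already been obtained.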
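Steps S901-S904 can be sketched as a range decoder that walks the groups in order and keeps only the bits inside the query interval. The function name `decompress_range` is hypothetical, and the interval is assumed to be half-open, [start, end), so the query for hours 40 through 50 is expressed as start = 40 and end = 51.

```python
def decompress_range(sigma: str, start: int, end: int, p: int = 8) -> str:
    """Return bits [start, end) of the original vector from compressed vector sigma."""
    out = []
    pos = 0  # index, in the original vector, of the next bit to be decoded
    for g in range(0, len(sigma), p):
        if pos >= end:
            break  # later groups can be ignored: the interval is complete
        group = sigma[g:g + p]
        flag, payload = group[0], group[1:]
        # Flag 1: literal bits. Flag 0: a run of zeros whose length is the payload.
        decoded = payload if flag == "1" else "0" * int(payload, 2)
        lo = max(start - pos, 0)            # first wanted offset in this group
        hi = min(end - pos, len(decoded))   # one past the last wanted offset
        if lo < hi:
            out.append(decoded[lo:hi])
        pos += len(decoded)
    return "".join(out)
```

For the compressed vector 100100100010010111100011, the first group contributes nothing, the second contributes 0000, and the third contributes 1100011, giving 00001100011 as in the example.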
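By mapping user operation data to bit vectors indexed by time, the embodiments of the present application support queries related to user operations, such as the tasks and target results mentioned in the above embodiments. Because the basic operations on bit vectors are supported very efficiently at the lowest levels of a computer, data processing efficiency improves and results can be returned quickly. On top of fast queries, bit vectors can also be used for data mining, such as periodicity detection and anomalous operation detection. The data processing method in the embodiments of the present application can improve data processing efficiency and accuracy, provide good data support for user analysis departments, and serve as a preprocessing step for other data query or data mining work to improve processing efficiency. In addition, because the bit vector does not reveal the specific time, the location, or the amount involved in an operation, the method protects user privacy well and avoids privacy leaks. Furthermore, bit vectors can be compressed for storage, which saves a large amount of storage space and avoids the insufficient storage space and reduced processing efficiency that large bit vectors would otherwise cause.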
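The following describes apparatus embodiments of the present application, which can be used to perform the data processing method in the above embodiments. For details not disclosed in the apparatus embodiments, refer to the above embodiments of the data processing method.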
FIG. 10 schematically shows a block diagram of a data processing apparatus according to one embodiment of the present application.
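Referring to FIG. 10, a data processing apparatus 1000 according to one embodiment of the present application includes an acquisition module 1001 and an operation module 1002.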
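The acquisition module 1001 is configured to obtain, in response to a query request, a bit vector table related to the operation data of a target user, where the query request includes identification information and time information, the bit vector table includes user identifiers and the bit vectors corresponding to the user identifiers, and the identification information corresponds to the user identifier. The operation module 1002 is configured to obtain a target bit vector from the bit vector table according to the identification information and the time information, and to perform logical processing on the target bit vector to obtain target information.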
In one embodiment of the present application, the bit vector includes user operation information for each time granularity.
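In one embodiment of the present application, the operation module 1002 is configured to: obtain a first user identifier and a first time interval, and obtain from the bit vector table, according to the first user identifier and the first time interval, a first target bit vector corresponding to the first user identifier; and count the first target bit vector to obtain the number of operations performed within the first time interval by the user corresponding to the first user identifier. The first user identifier is obtained according to the identification information; for example, it is included in the identification information, or it can be obtained by the data processing apparatus from a storage device by means of the identification information. The time information includes the first time interval. The first user identifier corresponds to the target user.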
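The counting query can be sketched as a slice followed by a count of the 1 bits. The table layout (a dict from user identifier to a '0'/'1' string), the half-open interval, and the name `count_operations` are assumptions made for illustration.

```python
def count_operations(bit_vector_table: dict[str, str],
                     user_id: str, start: int, end: int) -> int:
    """Count the time slots in [start, end) in which the user operated."""
    target = bit_vector_table[user_id][start:end]  # first target bit vector
    return target.count("1")
```

For a hypothetical table `{"e": "010001001111"}`, the call `count_operations(table, "e", 0, 8)` returns 2, one operation in each of the first two 4-hour periods.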
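In one embodiment of the present application, the operation module 1002 is configured to: obtain a second user identifier, a to-be-compared user identifier, and a second time interval; obtain from the bit vector table, according to the second user identifier, the to-be-compared user identifier, and the second time interval, a second target bit vector corresponding to the second user identifier and a to-be-compared target bit vector corresponding to the to-be-compared user identifier; perform an XOR operation on the second target bit vector and the to-be-compared target bit vector, and perform a NOT operation on the result of the XOR operation, to obtain a comparison target bit vector; and count the comparison target bit vector to obtain the operation similarity, within the second time interval, between the user corresponding to the second user identifier and the user corresponding to the to-be-compared user identifier. The second user identifier and the to-be-compared user identifier are obtained according to the identification information; for example, they are included in the identification information, or they can be obtained by the data processing apparatus from a storage device by means of the identification information. The time information includes the second time interval. The second user identifier and the to-be-compared user identifier correspond to the target user.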
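The XOR-then-NOT comparison can be sketched with Python integers, which support bitwise operations directly. The function name is hypothetical, and the similarity is assumed to be the raw number of 1 bits in the comparison target bit vector, that is, the number of time slots in which both users behaved identically.

```python
def xnor_similarity(a: str, b: str) -> int:
    """Count positions where two equal-length bit vectors agree (NOT of XOR)."""
    if len(a) != len(b) or not a:
        raise ValueError("bit vectors must be non-empty and of equal length")
    mask = (1 << len(a)) - 1                       # keep only len(a) bits after NOT
    comparison = ~(int(a, 2) ^ int(b, 2)) & mask   # comparison target bit vector
    return bin(comparison).count("1")
```

The mask is needed because Python integers are unbounded, so a plain `~` would produce a negative number rather than a fixed-width complement.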
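In one embodiment of the present application, the operation module 1002 includes: an information acquisition unit, configured to obtain a third user identifier, a fourth user identifier, a similarity threshold, and a third time interval; a bit vector acquisition unit, configured to obtain from the bit vector table, according to the third user identifier, the fourth user identifier, and the third time interval, a third target bit vector corresponding to the third user identifier and a fourth target bit vector corresponding to the fourth user identifier; a similarity acquisition unit, configured to perform a shift operation on the fourth target bit vector to obtain a shifted target bit vector, and to perform a similarity judgment on the shifted target bit vector and the third target bit vector to obtain a similarity; and a comparison unit, configured to compare the similarity with the similarity threshold and to determine, according to the comparison result, whether the operations of the user corresponding to the third user identifier influence the operations of the user corresponding to the fourth user identifier within the third time interval. The third user identifier and the fourth user identifier are obtained according to the identification information; for example, they are included in the identification information, or they can be obtained by the data processing apparatus from a storage device by means of the identification information. The time information includes the third time interval. The third user identifier and the fourth user identifier correspond to the target user.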
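In one embodiment of the present application, the similarity acquisition unit is configured to: shift the fourth target bit vector to the left by the shift unit to obtain the shifted target bit vector; perform an XOR operation on the third target bit vector and the shifted target bit vector, and perform a NOT operation on the result of the XOR operation, to obtain a similarity target bit vector; and count the similarity target bit vector to obtain the similarity.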
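In one embodiment of the present application, the comparison unit is configured to: when the similarity is greater than or equal to the similarity threshold, determine that the operations of the user corresponding to the third user identifier influence the operations of the user corresponding to the fourth user identifier within the third time interval; and when the similarity is less than the similarity threshold, repeat the method described in the above embodiment until the number of bits by which the fourth target bit vector has been shifted to the left reaches the shift threshold.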
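The shift-and-compare loop of the similarity acquisition unit and the comparison unit can be sketched as below. Several points are assumptions, since the text does not fix them: bit vectors are '0'/'1' strings whose index 0 is the earliest time slot, a left shift drops the leading bits and fills with zeros on the right, the similarity is the raw count of agreeing positions, and all function and parameter names are hypothetical.

```python
def xnor_count(a: str, b: str) -> int:
    """Number of positions where two equal-length bit vectors agree."""
    if len(a) != len(b) or not a:
        raise ValueError("bit vectors must be non-empty and of equal length")
    mask = (1 << len(a)) - 1
    return bin(~(int(a, 2) ^ int(b, 2)) & mask).count("1")


def shift_left(bits: str, k: int) -> str:
    """Shift left by k slots, dropping leading bits and zero-filling on the right."""
    k = min(k, len(bits))
    return bits[k:] + "0" * k


def has_influence(third: str, fourth: str, similarity_threshold: int,
                  shift_unit: int = 1, shift_threshold: int = 4) -> bool:
    """Check whether the third user's operations appear to influence the fourth's."""
    if shift_unit <= 0:
        raise ValueError("shift_unit must be positive")
    shift = shift_unit
    while shift <= shift_threshold:
        shifted = shift_left(fourth, shift)  # shifted target bit vector
        if xnor_count(third, shifted) >= similarity_threshold:
            return True
        shift += shift_unit                  # try the next, larger shift
    return False
```

Under these assumptions, a fourth user who always acts one slot after the third user (for example 00100010 against 01000100) is detected at the first shift.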
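In one embodiment of the present application, the operation module 1002 is configured to: obtain a fifth user identifier, a first operation pattern bit vector, a first operation pattern period, and a fourth time interval; obtain from the bit vector table, according to the fifth user identifier and the fourth time interval, a fifth target bit vector corresponding to the fifth user identifier; convert the fifth target bit vector into multiple sequentially arranged first sub bit vectors according to the number of bits of the first operation pattern bit vector, and perform a similarity judgment between the first operation pattern bit vector and each first sub bit vector to obtain sub-similarities; determine a sequence bit vector according to the order of the first sub bit vectors and the sub-similarities, and obtain the repetition period of the sequence bit vector; and when the repetition period equals the first operation pattern period, determine that the operation behavior of the user corresponding to the fifth user identifier is periodic within the fourth time interval. The fifth user identifier is obtained according to the identification information; for example, it is included in the identification information, or it can be obtained by the data processing apparatus from a storage device by means of the identification information. The time information includes the fourth time interval. The fifth user identifier corresponds to the target user.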
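In one embodiment of the present application, the operation module 1002 is configured to: obtain a sixth user identifier, a second operation pattern bit vector, an anomaly threshold, and a fifth time interval; obtain from the bit vector table, according to the sixth user identifier and the fifth time interval, a sixth target bit vector corresponding to the sixth user identifier, where the operations of the user corresponding to the sixth user identifier are periodic; split the sixth target bit vector into multiple second sub bit vectors according to the number of bits of the second operation pattern bit vector; compare each bit of the second operation pattern bit vector with the corresponding bit of each second sub bit vector to obtain an anomaly count; and when the anomaly count is greater than or equal to the anomaly threshold, determine that the operation behavior of the user corresponding to the sixth user identifier is anomalous within the fifth time interval. The sixth user identifier is obtained according to the identification information; for example, it is included in the identification information, or it can be obtained by the data processing apparatus from a storage device by means of the identification information. The time information includes the fifth time interval. The sixth user identifier corresponds to the target user. The second operation pattern bit vector and the anomaly threshold may be preset.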
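In one embodiment of the present application, the data processing apparatus 1000 further includes: a bit vector table generation module, configured to generate a user operation data table according to the operation data of users and to generate, according to the user operation data table, a bit vector table associated with the user operation data table, the users including the target user; and a bit vector table update module, configured to map changed user operation data when a change in the user operation data in the user operation data table is detected, so as to update the bit vectors in the bit vector table.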
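In one embodiment of the present application, a trigger is set on the user operation data table, and the bit vector table update module is configured to: monitor the user operation data table; and when the data in the user operation data table changes, use the trigger to fire the mapping of the changed user operation data, so as to update the bit vectors in the bit vector table.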
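In one embodiment of the present application, the bit vector table update module is configured to: determine, from the user operation data table, the target user identifier corresponding to the changed user operation data; obtain from the bit vector table, according to the target user identifier, a first bit vector corresponding to the target user identifier, and map the changed user operation data to obtain a second bit vector; perform an OR operation on the first bit vector and the second bit vector to obtain a third bit vector; and replace the first bit vector with the third bit vector, so as to update the bit vector in the bit vector table.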
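The OR-based update can be sketched as follows. The mapping of a changed record to a second bit vector is assumed here to set a single bit at the time slot of the new operation; the slot index, the dict-based table, and all function names are hypothetical stand-ins for whatever the trigger actually supplies.

```python
def map_operation(slot: int, width: int) -> str:
    """Map one new operation at time slot `slot` to a bit vector with a single 1."""
    if not 0 <= slot < width:
        raise ValueError("slot outside the bit vector")
    return "0" * slot + "1" + "0" * (width - slot - 1)


def or_vectors(a: str, b: str) -> str:
    """Bitwise OR of two equal-length bit vectors."""
    return "".join("1" if x == "1" or y == "1" else "0" for x, y in zip(a, b))


def update_bit_vector(table: dict[str, str], user_id: str, slot: int) -> None:
    """Merge a new operation into the stored bit vector of user_id."""
    first = table[user_id]                     # first bit vector (stored)
    second = map_operation(slot, len(first))   # second bit vector (new data)
    table[user_id] = or_vectors(first, second) # third bit vector replaces the first
```

Because OR is idempotent, replaying the same change leaves the stored bit vector unchanged, which makes the update safe if the trigger fires more than once for one record.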
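In one embodiment of the present application, the bit vector is a compressed bit vector whose first bit is a flag bit. When the flag bit is 1, the bits after the first bit are an uncompressed bit vector; when the flag bit is 0, the bits after the first bit are the number of consecutive 0s that were compressed.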
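In one embodiment of the present application, the data processing apparatus 1000 further includes: an acquisition module, configured to obtain the compressed vector and the query interval corresponding to the to-be-processed bit vector, the query interval including a start bit position and an end bit position; a decompression module, configured to split the compressed vector into multiple compressed bit vectors according to the number of bits of the compressed bit vector and to decompress the compressed bit vectors in sequence to obtain a decompressed bit vector that extends beyond the start bit position; a truncation module, configured to take the vector values of the decompressed bit vector that lie beyond the start bit position as vector values of the to-be-processed bit vector; and a padding module, configured to decompress, when the number of vector values is less than the difference between the end bit position and the start bit position, the compressed bit vector adjacent to the decompressed bit vector, so as to obtain the vector values of the remaining bits of the to-be-processed bit vector.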
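FIG. 11 is a schematic structural diagram of a computer system of an electronic device suitable for implementing the embodiments of the present application.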
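It should be noted that the computer system 1100 of the electronic device shown in FIG. 11 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.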
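As shown in FIG. 11, the computer system 1100 includes a central processing unit (CPU) 1101, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1102 or a program loaded from a storage section 1108 into a random access memory (RAM) 1103, thereby implementing the data processing method described in the above embodiments. The RAM 1103 also stores various programs and data required for system operation. The CPU 1101, the ROM 1102, and the RAM 1103 are connected to one another through a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.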
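The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 1108 including a hard disk and the like; and a communication section 1109 including a network interface card such as a local area network (LAN) card or a modem. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read from it can be installed into the storage section 1108 as needed.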
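In particular, according to the embodiments of the present application, the processes described with reference to the flowcharts can be implemented as computer software programs. For example, the embodiments of the present application include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 1109 and/or installed from the removable medium 1111. When the computer program is executed by the central processing unit (CPU) 1101, the various functions defined in the system of the present application are performed.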
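It should be noted that the computer-readable medium shown in the embodiments of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of these. More specific examples of computer-readable storage media include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of these. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in combination with, an instruction execution system, apparatus, or device. A computer-readable signal medium, in the present application, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of these. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wired, or any suitable combination of these.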
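The flowcharts and block diagrams in the accompanying drawings illustrate the architectures, functions, and operations of possible implementations of the systems, methods, and computer program products according to the various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block in the block diagrams or flowcharts, and each combination of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.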
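The units described in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be provided in a processor. In some cases, the names of these units do not limit the units themselves.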
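In another aspect, the present application further provides a computer-readable medium. The computer-readable medium may be included in the data processing apparatus described in the above embodiments, or it may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the method described in the above embodiments.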
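It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.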
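From the above description of the embodiments, those skilled in the art will readily understand that the example embodiments described here can be implemented in software, or in software combined with the necessary hardware. The technical solutions according to the embodiments of the present application can therefore be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a touch terminal, or a network device) to perform the method according to the embodiments of the present application.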
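Other embodiments of the present application will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. The present application is intended to cover any variations, uses, or adaptations of the present application that follow its general principles and include common knowledge or customary technical means in the art not disclosed in the present application.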
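It should be understood that the present application is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present application is limited only by the appended claims.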

Claims (16)

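  1. A data processing method, performed by a computer device, the method comprising:
    in response to a query request, obtaining a bit vector table related to operation data of a target user, wherein the query request comprises identification information and time information, the bit vector table comprises user identifiers and bit vectors corresponding to the user identifiers, and the identification information corresponds to the target user; and
    obtaining a target bit vector from the bit vector table according to the identification information and the time information, and performing logical processing on the target bit vector to obtain target information.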
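  2. The data processing method according to claim 1, wherein obtaining the target bit vector from the bit vector table according to the identification information and the time information and performing logical processing on the target bit vector to obtain the target information comprises:
    obtaining a first user identifier and a first time interval, wherein the first user identifier is obtained according to the identification information, the time information comprises the first time interval, and the first user identifier corresponds to the target user;
    obtaining, from the bit vector table according to the first user identifier and the first time interval, a first target bit vector corresponding to the first user identifier; and
    counting the first target bit vector to obtain the number of operations performed within the first time interval by the target user corresponding to the first user identifier.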
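  3. The data processing method according to claim 1, wherein obtaining the target bit vector from the bit vector table according to the identification information and the time information and performing logical processing on the target bit vector to obtain the target information comprises:
    obtaining a second user identifier, a to-be-compared user identifier, and a second time interval, wherein the second user identifier and the to-be-compared user identifier are obtained according to the identification information, the time information comprises the second time interval, and the second user identifier and the to-be-compared user identifier correspond to the target user;
    obtaining, from the bit vector table according to the second user identifier, the to-be-compared user identifier, and the second time interval, a second target bit vector corresponding to the second user identifier and a to-be-compared target bit vector corresponding to the to-be-compared user identifier;
    performing an XOR operation on the second target bit vector and the to-be-compared target bit vector, and performing a NOT operation on the result of the XOR operation, to obtain a comparison target bit vector; and
    counting the comparison target bit vector to obtain the operation similarity, within the second time interval, between the target user corresponding to the second user identifier and the target user corresponding to the to-be-compared user identifier.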
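  4. The data processing method according to claim 1, wherein obtaining the target bit vector from the bit vector table according to the identification information and the time information and performing logical processing on the target bit vector to obtain the target information comprises:
    obtaining a third user identifier, a fourth user identifier, a similarity threshold, and a third time interval, wherein the third user identifier and the fourth user identifier are obtained according to the identification information, the time information comprises the third time interval, and the third user identifier and the fourth user identifier correspond to the target user;
    obtaining, from the bit vector table according to the third user identifier, the fourth user identifier, and the third time interval, a third target bit vector corresponding to the third user identifier and a fourth target bit vector corresponding to the fourth user identifier;
    performing a shift operation on the fourth target bit vector to obtain a shifted target bit vector, and performing a similarity judgment on the shifted target bit vector and the third target bit vector to obtain a similarity; and
    comparing the similarity with the similarity threshold, and determining, according to the comparison result, whether the operations of the target user corresponding to the third user identifier influence the operations of the target user corresponding to the fourth user identifier within the third time interval.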
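  5. The data processing method according to claim 4, wherein performing the shift operation on the fourth target bit vector to obtain the shifted target bit vector and performing the similarity judgment on the shifted target bit vector and the third target bit vector to obtain the similarity comprises:
    shifting the fourth target bit vector to the left by a shift unit to obtain the shifted target bit vector;
    performing an XOR operation on the third target bit vector and the shifted target bit vector, and performing a NOT operation on the result of the XOR operation, to obtain a similarity target bit vector; and
    counting the similarity target bit vector to obtain the similarity.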
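  6. The data processing method according to claim 5, wherein comparing the similarity with the similarity threshold and determining, according to the comparison result, whether the operations of the target user corresponding to the third user identifier influence the operations of the target user corresponding to the fourth user identifier within the third time interval comprises:
    when the similarity is greater than or equal to the similarity threshold, determining that the operations of the user corresponding to the third user identifier influence the operations of the target user corresponding to the fourth user identifier within the third time interval; and
    when the similarity is less than the similarity threshold, repeating the method according to claim 5 until the number of bits by which the fourth target bit vector has been shifted to the left reaches a shift threshold.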
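  7. The data processing method according to claim 1, wherein obtaining the target bit vector from the bit vector table according to the identification information and the time information and performing logical processing on the target bit vector to obtain the target information comprises:
    obtaining a fifth user identifier, a first operation pattern bit vector, a first operation pattern period, and a fourth time interval, wherein the fifth user identifier is obtained according to the identification information, the time information comprises the fourth time interval, and the fifth user identifier corresponds to the target user;
    obtaining, from the bit vector table according to the fifth user identifier and the fourth time interval, a fifth target bit vector corresponding to the fifth user identifier;
    converting the fifth target bit vector into multiple sequentially arranged first sub bit vectors according to the number of bits of the first operation pattern bit vector, and performing a similarity judgment between the first operation pattern bit vector and each first sub bit vector to obtain sub-similarities;
    determining a sequence bit vector according to the order of the first sub bit vectors and the sub-similarities, and obtaining the repetition period of the sequence bit vector; and
    when the repetition period equals the first operation pattern period, determining that the operation behavior of the target user corresponding to the fifth user identifier is periodic within the fourth time interval.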
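  8. The data processing method according to claim 1, wherein obtaining the target bit vector from the bit vector table according to the identification information and the time information and performing logical processing on the target bit vector to obtain the target information comprises:
    obtaining a sixth user identifier, a second operation pattern bit vector, an anomaly threshold, and a fifth time interval, wherein the sixth user identifier is obtained according to the identification information, the time information comprises the fifth time interval, and the sixth user identifier corresponds to the target user;
    obtaining, from the bit vector table according to the sixth user identifier and the fifth time interval, a sixth target bit vector corresponding to the sixth user identifier, wherein the operations of the target user corresponding to the sixth user identifier are periodic;
    splitting the sixth target bit vector into multiple second sub bit vectors according to the number of bits of the second operation pattern bit vector;
    comparing each bit of the second operation pattern bit vector with the corresponding bit of each second sub bit vector to obtain an anomaly count; and
    when the anomaly count is greater than or equal to the anomaly threshold, determining that the operation behavior of the target user corresponding to the sixth user identifier is anomalous within the fifth time interval.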
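  9. The data processing method according to any one of claims 1 to 8, wherein the method further comprises:
    generating a user operation data table according to operation data of users, and generating, according to the user operation data table, a bit vector table associated with the user operation data table, the users comprising the target user; and
    when a change in the user operation data in the user operation data table is detected, mapping the changed user operation data to update the bit vectors in the bit vector table.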
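  10. The data processing method according to claim 9, wherein a trigger is set on the user operation data table;
    and wherein mapping the changed user operation data to update the bit vectors in the bit vector table when a change in the user operation data in the user operation data table is detected comprises:
    monitoring the user operation data table; and
    when the data in the user operation data table changes, using the trigger to fire the mapping of the changed user operation data, so as to update the bit vectors in the bit vector table.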
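  11. The data processing method according to claim 9, wherein mapping the changed user operation data to update the bit vectors in the bit vector table when a change in the user operation data in the user operation data table is detected comprises:
    determining, from the user operation data table, the target user identifier corresponding to the changed user operation data;
    obtaining, from the bit vector table according to the target user identifier, a first bit vector corresponding to the target user identifier, and mapping the changed user operation data to obtain a second bit vector;
    performing an OR operation on the first bit vector and the second bit vector to obtain a third bit vector; and
    replacing the first bit vector with the third bit vector to update the bit vector in the bit vector table.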
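  12. The data processing method according to claim 1, wherein the bit vector is a compressed bit vector whose first bit is a flag bit; when the flag bit is 1, the bits after the first bit are an uncompressed bit vector; and when the flag bit is 0, the bits after the first bit are the number of consecutive 0s that were compressed.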
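  13. The data processing method according to claim 12, wherein the method further comprises:
    obtaining a compressed vector and a query interval corresponding to a to-be-processed bit vector, the query interval comprising a start bit position and an end bit position;
    splitting the compressed vector into multiple compressed bit vectors according to the number of bits of the compressed bit vector, and decompressing the compressed bit vectors in sequence to obtain a decompressed bit vector that extends beyond the start bit position;
    taking the vector values of the decompressed bit vector that lie beyond the start bit position as vector values of the to-be-processed bit vector; and
    if the number of vector values is less than the difference between the end bit position and the start bit position, decompressing the compressed bit vector adjacent to the decompressed bit vector to obtain the vector values of the remaining bits of the to-be-processed bit vector.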
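  14. A data processing apparatus, comprising:
    an acquisition module, configured to obtain, in response to a query request, a bit vector table related to operation data of a target user, wherein the query request comprises identification information and time information, the bit vector table comprises user identifiers and bit vectors corresponding to the user identifiers, and the identification information corresponds to the target user; and
    an operation module, configured to obtain a target bit vector from the bit vector table according to the identification information and the time information, and to perform logical processing on the target bit vector to obtain target information.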
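  15. An electronic device, comprising:
    one or more processors; and
    a storage apparatus, configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the data processing method according to any one of claims 1 to 13.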
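  16. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the data processing method according to any one of claims 1 to 13.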
PCT/CN2020/117623 2019-11-15 2020-09-25 Data processing method, electronic device, and readable storage medium WO2021093472A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911122281.7A CN111159515B (zh) 2019-11-15 2019-11-15 Data processing method and apparatus, and electronic device
CN201911122281.7 2019-11-15

Publications (1)

Publication Number Publication Date
WO2021093472A1 true WO2021093472A1 (zh) 2021-05-20

Family

ID=70555961

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117623 WO2021093472A1 (zh) 2019-11-15 2020-09-25 Data processing method, electronic device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN111159515B (zh)
WO (1) WO2021093472A1 (zh)

Also Published As

Publication number Publication date
CN111159515A (zh) 2020-05-15
CN111159515B (zh) 2024-05-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20886262

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20886262

Country of ref document: EP

Kind code of ref document: A1