CN112559514A - Information processing method and system - Google Patents

Info

Publication number
CN112559514A
Authority
CN
China
Prior art keywords
data
inverted index
label
database
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910910154.7A
Other languages
Chinese (zh)
Other versions
CN112559514B (en
Inventor
唐亚光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd filed Critical Shanghai Bilibili Technology Co Ltd
Priority to CN201910910154.7A priority Critical patent/CN112559514B/en
Publication of CN112559514A publication Critical patent/CN112559514A/en
Application granted granted Critical
Publication of CN112559514B publication Critical patent/CN112559514B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an information processing method and system. In the method, format conversion is carried out on data of a target object in a database, and an inverted index is generated according to the data after the format conversion; the converted data format is a data format supported by an index generator for generating an inverted index, and the inverted index takes the target data as a main key and takes an object identification list as data corresponding to the main key; and responding to a query request received by the search engine, and querying the inverted index according to the query request to obtain the number of objects meeting the query request.

Description

Information processing method and system
Technical Field
The present application relates to data processing technologies, and in particular, to an information processing method and system.
Background
For a database used to store user information, when a crowd grouping rule is created or edited, it is desirable to estimate the number of users that match the rule, i.e., to predict how many eligible users the rule may cover, in order to assess whether the rule is reasonable or for other purposes.
Estimating the number of users covered by a crowd grouping rule essentially means matching a series of input rules against the user information in the database and counting the matched users.
The database stores massive user information: each user has multiple (for example, dozens of) tags of different types, and each type of tag has multiple (for example, dozens to hundreds of) different values. A traditional relational database and its indexes cannot support indexing at this scale. Therefore, how to establish an index for a database storing massive user information and estimate the number of users matching a crowd grouping rule based on that index is a problem to be solved.
Disclosure of Invention
The embodiment of the application provides an information processing method and system, which are used for counting and inquiring the number of objects by establishing an inverted index and based on the inverted index.
In a first aspect, an information processing method is provided, including: carrying out format conversion on data of a target object in a database, and generating an inverted index according to the data after the format conversion; the converted data format is a data format supported by an index generator for generating an inverted index, and the inverted index takes the target data as a main key and takes an object identification list as data corresponding to the main key; and responding to a query request received by the search engine, and querying the inverted index according to the query request to obtain the number of objects meeting the query request.
Optionally, the data of the target object comprises a tag of the target object, and the tag is used for describing the characteristics and/or behaviors of the object; the converting the format of the data of the target object in the database, and generating the inverted index according to the data after format conversion includes: and carrying out format conversion on the label of the target object in the database, and generating a label inverted index according to the label after the format conversion.
Optionally, the performing format conversion on the tag of the target object in the database, and generating the tag inverted index according to the tag after format conversion includes: according to the specified range requested by the query request, determining, for each target object in the database, the tags of the target object that fall within the specified range; and obtaining the tag inverted index according to those tags, wherein the tag inverted index comprises the tags within the specified range and the corresponding object identification lists, each target tag is a primary key, and the object identification list is the data corresponding to the primary key.
Optionally, the querying the inverted index according to the query request to obtain the number of objects meeting the query request includes: querying the tag inverted index according to the target tag requested by the query request to obtain the object identification list corresponding to the target tag; and determining the number of objects according to the object identification list.
Optionally, the data of the target object includes grouping information of a group to which the target object belongs; the converting the format of the data of the target object in the database, and generating the inverted index according to the data after format conversion includes: performing format conversion on the grouping information of the group to which the target object in the database belongs, and generating a group inverted index according to the grouping information after format conversion.
Optionally, the querying the inverted index according to the query request to obtain the number of objects meeting the query request includes: querying the group inverted index according to at least two target groups corresponding to the grouping rule requested by the query request, to obtain the object identification list corresponding to each of the at least two groups; and performing an intersection or union operation on the object identification lists corresponding to the at least two groups according to the grouping rule, and obtaining the number of objects according to the operation result, wherein the grouping rule indicates the intersection or union operation to be performed on the at least two groups.
Optionally, the target objects are all objects in the database, or the objects in the database whose data has been updated; if the target objects are the objects whose data has been updated, after the inverted index is generated according to the data after format conversion, the original inverted index is updated according to the newly generated inverted index.
Optionally, the database is an Hbase database, the Hbase database is used for storing user information, and the target object is a target user.
In a second aspect, there is provided an information processing system comprising:
the format converter is used for carrying out format conversion on the data of the target object in the database; wherein the converted data format is a data format supported by the index generator;
the index generator is used for generating an inverted index according to the data after format conversion; the inverted index takes the target data as a primary key and takes an object identification list as the data corresponding to the primary key;
and the search engine is used for responding to the query request received by the search engine, querying the inverted index according to the query request and obtaining the number of the objects which accord with the query request.
Optionally, the data of the target object includes a tag of the target object and grouping information of a group to which the target object belongs, where the tag is used to describe features and/or behaviors of the object; the index generator comprises a label index generator and a grouping index generator, wherein the label index generator is used for carrying out format conversion on a label of a target object in the database and generating a label inverted index according to the label after the format conversion, and the grouping index generator is used for carrying out format conversion on grouping information of a group to which the target object in the database belongs and generating a grouping inverted index according to the grouping information after the format conversion; the search engine is specifically configured to: inquiring the inverted index of the target label according to the target label required to be inquired by the inquiry request to obtain an object identification list corresponding to the target label, and determining the number of objects according to the object identification list; or querying the reverse index of the group according to at least two target groups corresponding to the group rule requested to be queried by the query request to obtain object identifier lists corresponding to the at least two groups, performing intersection or union operation on the object identifier lists corresponding to the at least two groups according to the group rule, and obtaining the number of the objects according to an operation result, wherein the group rule is used for indicating intersection or union operation on the at least two groups.
In a third aspect, there is provided an information processing apparatus comprising: a processor, a memory; the processor is configured to read computer instructions in the memory and execute the method according to any one of the above first aspects.
In a fourth aspect, there is provided a computer storage medium having stored thereon computer-executable instructions for causing the computer to perform the method of any of the first aspects above.
In the embodiments of the application, format conversion is performed on the data of the target object in the database, and the inverted index is generated according to the data after format conversion. When the number of objects meeting a certain rule needs to be queried, that number can be obtained from the inverted index. Counting and querying the number of objects is thus achieved by establishing the inverted index and querying it; in particular, when the method is applied to a database storing massive information, counting and querying can be performed conveniently and efficiently, meeting the requirement of real-time query.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 schematically illustrates a flow chart of establishing an inverted index in an embodiment of the present application;
Fig. 2 schematically illustrates a query flow in an embodiment of the present application;
Fig. 3 schematically illustrates the architecture of an information processing system in an embodiment of the present application;
Fig. 4 schematically illustrates information processing in an embodiment of the present application;
Fig. 5 schematically illustrates the structure of an information processing apparatus provided in an embodiment of the present application.
Detailed Description
The concept to which the present application relates will be first explained below with reference to the drawings. It should be noted that the following descriptions of the concepts are only for the purpose of facilitating understanding of the contents of the present application, and do not represent limitations on the scope of the present application.
It is to be understood that the terms "first," "second," and the like in this application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used are interchangeable under appropriate circumstances and can be implemented in sequences other than those illustrated or otherwise described herein with respect to the embodiments of the application, for example.
Furthermore, the terms "comprises" and "comprising," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a product or device that comprises a list of elements is not necessarily limited to those elements explicitly listed, but may include other elements not expressly listed or inherent to such product or device.
As used in this application, the terms "module," "index generator," "search engine," "scheduler," and the like are intended to refer to any known or later developed hardware, software, firmware, artificial intelligence, fuzzy logic, or combination of hardware and/or software code that is capable of performing the functionality associated with that element.
A traditional relational database cannot meet the storage requirements of mass data and is not suited to mass data retrieval; at present, Hbase is used in the industry to store mass data. Hbase is a distributed, column-oriented open-source database. Labels (also called tags) are stored as mass data in Hbase, and different classes of tags are stored in different columns of the database, i.e., the columns in the database correspond to tags. The tags are used to describe characteristics and/or behaviors of an object, such as gender, age, interests, click records, videos watched, uploaders of interest, and the like. Hbase only supports indexing of the primary key and cannot index the values under each column.
At present, to meet the requirement of estimating in real time the number of objects covered by a crowd grouping rule, data can be forcibly converted into a format suitable for a traditional relational database, after which the relational database is sharded and indexed. However, this solution is costly and requires high-performance database services and hardware support. In addition, converting the data from the NoSQL format into the relational format and manually sharding the database make the technical implementation complex and the development and maintenance costs high.
In order to solve the above problems, embodiments of the present application provide an information processing method and system, by which mass data can be stored, an inverted index can be established for the data, and the objects that meet a crowd grouping rule, and their number, can be estimated in real time on the basis of the inverted index. The object may be a user, in which case the database is a database for storing user information.
In the embodiments of the present application, Hbase is adopted as the database for storing the massive object data. The object data stored in Hbase may include the labels (also referred to as tags) of an object and the grouping information of the groups to which the object belongs; the grouping information may specifically be a group ID. In the database, the identification (UID) of an object is the primary key, each row corresponds to one object, each row is composed of a plurality of columns, and each column stores one type of tag or the groups. The tags (columns) in the Hbase database may have different formats, and can be classified into two types: range matching and value matching.
Table 1 exemplarily shows a data structure of Hbase in which user information is stored.
Table 1: database structure
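[Table 1 is provided as an image in the original (Figure BDA0002214466200000051). Its columns, as described below, are: user identification (UID, the primary key), user tags such as the video watching record tag, and the group identifications (GIDs) of the groups to which the user belongs.]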
In the data structure, for a user, the information occupies one row, and the information of the user includes: user Identification (UID), user tags and Group Identifications (GIDs) of the belonging groups. The user id is used to uniquely identify a user, such as "100001" in the table, and is set as a primary key. The user tags may include one or more tags, and one tag occupies one column, such as the "video watching record tag" in table 1, for recording the behavior of the user watching the video. One user may belong to one group or may belong to a plurality of groups, and Group Identifications (GIDs) of the group to which one user belongs are recorded in the "belonging group" in table 1.
As shown in Table 1, the "video watching record tag" belongs to the range matching type, and its value [["20190501",[A,B,C]],["20190502",[B,C,D]],["20190503",[D,E,F]]] represents: videos A, B and C were watched on May 1, 2019; videos B, C and D were watched on May 2, 2019; videos D, E and F were watched on May 3, 2019. Here A, B, C, D, E and F represent video identifiers.
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 schematically illustrates a flow chart of establishing an inverted index in an embodiment of the present application.
As shown in fig. 1, the process may include:
S101: Acquiring data of a target object in the database, the data of the target object being the data for which an inverted index is to be generated.
The database may be a database for storing user information, that is, the target object is a target user. Alternatively, the database may be a Hbase database.
In the embodiments of the application, the inverted index can be generated for at least one of the tags and the grouping information of the target object in the database; accordingly, at least one of the tags and the grouping information is acquired for subsequently generating the inverted index.
Optionally, the object data related to the query condition may be obtained according to the query condition, so as to generate the inverted index for the object data meeting the query condition in the following.
S102: Converting the format of the data of the target object.
In this step, format conversion is performed to facilitate establishing the inverted index. The data format in the HBase cannot be directly indexed, so the index can be established only after corresponding format conversion. Specifically, the converted data format is a format supported by the index generator, so that the inverted index is generated by the index generator in the following.
Alternatively, the data structure in the database (Hbase) can be converted into JSON (JavaScript Object Notation) data suitable for building inverted indexes.
JSON is a lightweight data exchange format. Data is stored and represented in a text format that is completely independent of the programming language. The compact and clear hierarchy makes JSON an ideal data exchange language. The method is easy to read and write, and is easy to analyze and generate by a machine, and the network transmission efficiency is effectively improved.
S103: Generating an inverted index according to the data after format conversion.
The inverted index takes the target data as the primary key and takes the object identification list as the data corresponding to the primary key. For example, if the database stores the ID of each user and the groups to which each user belongs, the generated inverted index may include:
the group ID of user group A, and the corresponding user ID list (the users in the list all belong to group A);
the group ID of user group B, and the corresponding user ID list (the users in the list all belong to group B).
S104: Saving the inverted index.
Optionally, in this step, the generated inverted index may be stored in a search engine to improve search efficiency.
Optionally, in some embodiments, the data of the target object stored in the database may include a tag of the target object, where the tag is used to describe the characteristics and/or behavior of the object. In other embodiments, the data of the target object stored in the database may include grouping information, such as a group ID, of the group to which the target object belongs. In still other embodiments, the data of the target object stored in the database may include both the tag of the target object and the grouping information of the group to which the target object belongs.
Correspondingly, format conversion and inverted index generation can be performed for the tags only, that is, the tags of the target object in the database are format-converted and a tag inverted index is generated from the converted tags; for the grouping information only, that is, the grouping information of the group to which the target object belongs is format-converted and a group inverted index is generated from the converted grouping information; or for the tags and the grouping information separately.
Alternatively, if the query or statistic performed involves a specified range when performing the query or statistic of the number of objects, when generating the inverted index, the data that meets the specified range may be first screened from the database, and the inverted index may be generated based on the data. Specifically, according to the specified range requested by the query request, for each target object in the database, determining a tag in the target object, which meets the specified range; and then, obtaining the label inverted index according to the label which accords with the specified range in the target object. The label inverted index comprises labels meeting the specified range and a corresponding object identification list, wherein the target labels are primary keys, and the object identification list is data corresponding to the primary keys.
For example, if the query is to count, for each video, the number of users who watched that video during the Spring Festival, then for this time range the video watching record tags of the users in the database may first be screened to select the viewing records within the period, and the inverted index may then be generated from those records. The inverted index may have the following structure:
the ID of video A, and the corresponding user ID list (the users in the list watched video A during the Spring Festival);
the ID of video B, and the corresponding user ID list (the users in the list watched video B during the Spring Festival).
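The screening step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name build_range_index, the in-memory dictionary view_records, the second user "100002" and the date bounds are hypothetical, and the records use the [date, [video IDs]] layout of the video watching record tag.

```python
from collections import defaultdict

def build_range_index(view_records, start, end):
    """Index video ID -> user IDs, using only views dated within [start, end]."""
    index = defaultdict(set)
    for uid, records in view_records.items():
        for day, video_ids in records:
            # Dates are zero-padded YYYYMMDD strings, so string comparison
            # agrees with chronological order.
            if start <= day <= end:
                for vid in video_ids:
                    index[vid].add(uid)
    return {vid: sorted(uids) for vid, uids in index.items()}

# Hypothetical sample data in the stored tag layout.
view_records = {
    "100001": [["20190501", ["A", "B", "C"]],
               ["20190502", ["B", "C", "D"]],
               ["20190503", ["D", "E", "F"]]],
    "100002": [["20190502", ["A"]]],
}

# Keep only views from 2019-05-02 through 2019-05-03 (an assumed range).
range_index = build_range_index(view_records, "20190502", "20190503")
```

Because user 100001 watched video A only on 2019-05-01, which is outside the assumed range, the entry for A lists only user 100002.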
Optionally, in this embodiment of the present application, the inverted index is distributed, and may support horizontal expansion, and accordingly, this embodiment of the present application may support full-scale update and incremental update of the inverted index.
In the full-scale updating, in S101, the acquired target objects are all objects in the database, and then through S102-103, an inverted index can be generated for the data.
During incremental updating, in S101, the acquired target objects are the objects in the database whose data has been updated; inverted indexes are then generated for that data through S102-103, and in S104 the original inverted index is updated according to the newly generated inverted index, for example by merging the two. For example, if an object 3 is newly added to the database and belongs to group A, the following inverted index may be generated for the object:
group ID of group A, { ID of object 3 }.
The original inverted index includes the following content:
group ID of group A, { ID of object 1, ID of object 2 }
Merging then yields:
group ID of group A, { ID of object 1, ID of object 2, ID of object 3 }.
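The merge in this example can be sketched as below. The function name merge_index and the string identifiers are hypothetical; the application does not specify how the merge is implemented, so this is only one possible in-memory form.

```python
def merge_index(original, delta):
    """Return a new index holding every posting from both inputs."""
    # Copy the lists so the original index is left untouched.
    merged = {key: list(uids) for key, uids in original.items()}
    for key, uids in delta.items():
        existing = merged.setdefault(key, [])
        for uid in uids:
            if uid not in existing:  # avoid duplicate identifiers
                existing.append(uid)
    return merged

original = {"A": ["object1", "object2"]}   # original inverted index
delta = {"A": ["object3"]}                 # index generated for the new object
merged = merge_index(original, delta)
```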
The format of each label may be different, and may be specifically divided into two types: range matching and value matching. The embodiment of the application can perform format conversion and generate the inverted index aiming at the two types of user tags.
An example of range matching is the watched-video tag of a certain user (UID 100001), which is stored in HBase in the following format:
[["20190501",[A,B,C]],["20190502",[B,C,D]],["20190503",[D,E,F]]]
The data indicates that videos A, B and C were watched on 2019-05-01, videos B, C and D on 2019-05-02, and videos D, E and F on 2019-05-03.
Suppose the crowd grouping rule is "the crowd that watched videos A, B and C in the last N days". If a video was watched multiple times, only the latest viewing record needs to be retained. Thus, the above watched-video tag of the user can be converted into the following JSON format supported by the inverted index:
{"viewed_video":
[{"id":"A","ts":"20190501"},
{"id":"B","ts":"20190502"},
{"id":"C","ts":"20190502"},
{"id":"D","ts":"20190503"},
{"id":"E","ts":"20190503"},
{"id":"F","ts":"20190503"}]}
The viewing records represented by the above expression are:
video identification A, timestamp (ts) 20190501: video A was last viewed on May 1, 2019;
video identification B, timestamp (ts) 20190502: video B was last viewed on May 2, 2019;
video identification C, timestamp (ts) 20190502: video C was last viewed on May 2, 2019;
video identification D, timestamp (ts) 20190503: video D was last viewed on May 3, 2019;
video identification E, timestamp (ts) 20190503: video E was last viewed on May 3, 2019;
video identification F, timestamp (ts) 20190503: video F was last viewed on May 3, 2019.
In the JSON data, only the latest viewing time is retained for videos B, C and D, which were viewed more than once.
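A minimal sketch of this conversion, assuming the stored tag has already been read into a Python list, is given below. The function name to_viewed_video_doc is hypothetical; the field names "viewed_video", "id" and "ts" are taken from the JSON shown above.

```python
import json

def to_viewed_video_doc(records):
    """Convert [[date, [video IDs]], ...] into the JSON-ready document,
    keeping only the most recent viewing date for each video."""
    latest = {}
    for day, video_ids in records:
        for vid in video_ids:
            # YYYYMMDD strings compare in chronological order.
            if vid not in latest or day > latest[vid]:
                latest[vid] = day
    return {"viewed_video": [{"id": vid, "ts": latest[vid]}
                             for vid in sorted(latest)]}

records = [["20190501", ["A", "B", "C"]],
           ["20190502", ["B", "C", "D"]],
           ["20190503", ["D", "E", "F"]]]
doc = to_viewed_video_doc(records)
print(json.dumps(doc))
```

For the records of user 100001, the resulting document carries the same six id/ts pairs as the JSON listed above.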
For value matching, take as an example a crowd grouping rule based on combined crowds. A combined crowd is obtained by performing a series of intersection or union operations on groups according to the group identifications to which users belong, so as to further screen out the crowd that meets the rule.
In HBase, the data format of the groups to which a certain user (UID 100001) belongs is as follows:
[G1,G2,G3,G4,G5]
where G1, G2, G3, G4 and G5 respectively represent the group identifications of the respective groups.
During format conversion, this can be converted into the following JSON format supported by the inverted index:
{"groups":["G1","G2","G3","G4","G5"]}
In S103 shown in Fig. 1, the inverted index may be generated from the converted JSON data by calling an Application Programming Interface (API) for generating the inverted index.
Specifically, for the converted user tags, corresponding inverted indexes can be generated according to the video identifiers and timestamps in the JSON data of the same user tag under each user. The inverted index key (key) is a user tag value (here, a video identifier), and each key (key) contains a User Identifier (UID) list corresponding to the key. For the video tag watched by the user, table 2 shows the inverted index corresponding to the user tag.
Table 2: Inverted index corresponding to the watched-video tag of users
Video identification (key)    User identification list (UIDs)
A                             100001, 100002, 100003
B                             100001, 100004, 100006
C                             100001, 100003
D                             100001, 100009
E                             100001
F                             100001
In Table 2, A, B, C, D, E, and F denote video identifiers.
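Table 2 lists only user identifications, while the converted JSON also carries a timestamp per video. One way to keep that timestamp available for range matching is to store it alongside each user identification. This is an assumption made for illustration, not a structure stated in the application; the names build_tag_index and users_since, and the timestamp of user 100009, are hypothetical.

```python
from collections import defaultdict

def build_tag_index(docs):
    """Index video ID -> {user ID: last viewing date}."""
    index = defaultdict(dict)
    for uid, doc in docs.items():
        for item in doc["viewed_video"]:
            index[item["id"]][uid] = item["ts"]
    return dict(index)

def users_since(index, video_id, earliest_day):
    """User IDs whose last view of video_id is on or after earliest_day."""
    postings = index.get(video_id, {})
    return sorted(uid for uid, ts in postings.items() if ts >= earliest_day)

# Converted per-user documents; the second user's data is invented.
docs = {
    "100001": {"viewed_video": [{"id": "A", "ts": "20190501"},
                                {"id": "D", "ts": "20190503"}]},
    "100009": {"viewed_video": [{"id": "D", "ts": "20190502"}]},
}
tag_index = build_tag_index(docs)
recent_d = users_since(tag_index, "D", "20190503")
```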
For the converted grouping identification, a corresponding inverted index can be generated according to the grouping identification in JSON data of each user. The inverted index key (key) is a grouping identification, and a User Identification (UID) list containing the key is arranged under each key (key). Table 3 shows the inverted index corresponding to the grouping information.
Table 3: Inverted index corresponding to grouping information of users
Grouping identification (key)    User identification list (UIDs)
G1                               100001, 100002, 100003, 100005
G2                               100001, 100004, 100006
G3                               100001, 100003, 100005
G4                               100001, 100009
G5                               100001, 100005
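As a sketch of how Table 3 could be produced from the converted documents, the per-user JSON below is inferred by reading Table 3 backwards (only the document of user 100001 is given explicitly in the text). The function name build_group_index is hypothetical, and a real index generator would be called through its own API rather than built in memory like this.

```python
from collections import defaultdict

# Per-user documents in the {"groups": [...]} format, inferred from Table 3.
user_docs = {
    "100001": {"groups": ["G1", "G2", "G3", "G4", "G5"]},
    "100002": {"groups": ["G1"]},
    "100003": {"groups": ["G1", "G3"]},
    "100004": {"groups": ["G2"]},
    "100005": {"groups": ["G1", "G3", "G5"]},
    "100006": {"groups": ["G2"]},
    "100009": {"groups": ["G4"]},
}

def build_group_index(docs):
    """Invert user -> groups into group ID -> list of user IDs."""
    index = defaultdict(list)
    for uid in sorted(docs):  # sorted so each UID list comes out ordered
        for gid in docs[uid]["groups"]:
            index[gid].append(uid)
    return dict(index)

group_index = build_group_index(user_docs)
```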
Fig. 2 schematically shows a query flow in the embodiment of the present application.
As shown in fig. 2, the process may include:
S201: Receiving an input query request.
In this step, the user may enter the query request expression through a user interface provided by the system. The query request may ask for the number of objects based on a tag, for example the number of users who have viewed video A, or for the number of objects based on a grouping rule of a combined crowd, for example the number of objects that belong to group A but not to group B.
S202: Querying the inverted index according to the query request to obtain the number of objects that meet the query request.
In this step, the expression of the query request may first be converted into a query expression that is applied to the inverted index.
Optionally, if the query request asks for the number of tag-based objects, then in this step the tag inverted index may be queried according to the target tag requested by the query request to obtain the object identifier list corresponding to the target tag, and the number of objects may be determined from that list.
Optionally, if the query request asks for the number of objects based on a grouping rule over combined groups, then in this step the group inverted index may be queried according to the at least two target groups corresponding to the requested grouping rule, to obtain the object identifier list corresponding to each of the at least two groups. An intersection or union operation is then performed on these object identifier lists according to the grouping rule, and the number of objects is obtained from the result. The grouping rule indicates whether the at least two groups are to be intersected or united.
S203: Output the number of objects obtained by the query.
In this step, the inverted index may be queried according to an expression applied to the inverted index, to obtain the number of objects that meet the query request.
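The two query variants of S202 can be sketched with plain dictionaries and sets. This is only an illustration under stated assumptions: the inverted indexes are held in memory as dictionaries populated from Tables 2 and 3, and the helper names `count_for_tag` and `count_for_groups` as well as the rule strings "intersection" and "union" are hypothetical.

```python
# In-memory stand-ins for the inverted indexes of Table 2 and Table 3.
TAG_INDEX = {
    "A": [100001, 100002, 100003],
    "B": [100001, 100004, 100006],
    "C": [100001, 100003],
}
GROUP_INDEX = {
    "G1": [100001, 100002, 100003, 100005],
    "G2": [100001, 100004, 100006],
    "G3": [100001, 100003, 100005],
}


def count_for_tag(tag_index, target_tag):
    """Look up the target tag and count the object identifiers under it."""
    return len(tag_index.get(target_tag, []))


def count_for_groups(group_index, target_groups, rule):
    """Combine the identifier lists of at least two groups and count the result."""
    if len(target_groups) < 2:
        raise ValueError("a combined-group rule needs at least two groups")
    id_sets = [set(group_index.get(g, [])) for g in target_groups]
    if rule == "intersection":
        result = set.intersection(*id_sets)
    elif rule == "union":
        result = set.union(*id_sets)
    else:
        raise ValueError(f"unknown grouping rule: {rule}")
    return len(result)


users_who_watched_a = count_for_tag(TAG_INDEX, "A")
in_g1_and_g3 = count_for_groups(GROUP_INDEX, ["G1", "G3"], "intersection")
in_g1_or_g2 = count_for_groups(GROUP_INDEX, ["G1", "G2"], "union")
```

Because only the identifier lists stored under the requested keys are touched, the count is obtained without scanning every user record in the database.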
Following the process shown in Fig. 2 and taking the videos most recently watched by users as an example, suppose the crowd grouping rule is "the crowd that watched videos A, B, and C in the last 7 days". This rule can be expressed by the following JSON expression:
{"in":[A,B,C],"max_days":7}
Taking the current date as May 10, 2019, for example, the date 7 days earlier is May 3, 2019. After the expression is input into the search engine, the search engine converts it into the following query expression:
{"bool":{"must":[{"terms":{"viewed_video.id":[A,B,C]}},
{"range":{"viewed_video.ts":{"ge":"20190503"}}}]}}
The search engine then queries the user tag inverted index with this expression to obtain the number of users matched by the rule.
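The conversion from the rule expression to the query expression, including the date arithmetic, can be sketched as follows. The function name `to_query_expression` is hypothetical, and the field names and the `ge` operator are copied from the query expression in this description; a concrete search engine may spell the range operator differently.

```python
from datetime import date, timedelta


def to_query_expression(rule, today):
    """Translate {"in": [...], "max_days": N} into a bool/terms/range query."""
    # The earliest admissible timestamp is `max_days` days before today.
    cutoff = today - timedelta(days=rule["max_days"])
    return {
        "bool": {
            "must": [
                {"terms": {"viewed_video.id": rule["in"]}},
                {"range": {"viewed_video.ts": {"ge": cutoff.strftime("%Y%m%d")}}},
            ]
        }
    }


rule = {"in": ["A", "B", "C"], "max_days": 7}
query = to_query_expression(rule, today=date(2019, 5, 10))
```

For the current date of May 10, 2019, the computed cutoff is May 3, 2019, which is the value that appears in the range clause of the query expression.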
In another example, the combined crowd rule is "belongs to group A or group B, but not to group C", and the JSON expression for this rule is as follows:
{"and":[{"or":[A,B]},{"not":C}]}
After the expression is input into the search engine, the search engine converts it into the following query expression:
{"bool":{"filter":[{"bool":{"should":[{"terms":{"groups":[A,B]}}]}},
{"bool":{"must_not":[{"terms":{"groups":[C]}}]}}]}}
The search engine then queries the user group inverted index with this expression to obtain the number of users matched by the rule.
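What the combined rule computes can be illustrated by evaluating the and/or/not expression directly against a group inverted index with set operations. This sketch stands in for the search engine and is not its actual mechanism; the function `evaluate_rule` is hypothetical, the groups G1, G2, and G3 of Table 3 play the roles of A, B, and C, and "not" is interpreted relative to all users that appear anywhere in the index, which is an assumption.

```python
GROUP_INDEX = {
    "G1": [100001, 100002, 100003, 100005],
    "G2": [100001, 100004, 100006],
    "G3": [100001, 100003, 100005],
    "G4": [100001, 100009],
    "G5": [100001, 100005],
}


def evaluate_rule(rule, group_index, universe):
    """Recursively evaluate a nested and/or/not rule to a set of UIDs."""
    if isinstance(rule, str):
        # A bare group identifier: look up its identifier list.
        return set(group_index.get(rule, []))
    if "and" in rule:
        parts = [evaluate_rule(r, group_index, universe) for r in rule["and"]]
        return set.intersection(*parts)
    if "or" in rule:
        parts = [evaluate_rule(r, group_index, universe) for r in rule["or"]]
        return set().union(*parts)
    if "not" in rule:
        return universe - evaluate_rule(rule["not"], group_index, universe)
    raise ValueError(f"unsupported rule: {rule!r}")


# Assumed universe: every user that occurs in some group.
all_users = set().union(*GROUP_INDEX.values())
rule = {"and": [{"or": ["G1", "G2"]}, {"not": "G3"}]}
matched = evaluate_rule(rule, GROUP_INDEX, all_users)
matched_count = len(matched)
```

With the data of Table 3, the union of G1 and G2 contains six users, and removing the three members of G3 leaves users 100002, 100004, and 100006.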
In the embodiment of the application, the format conversion is performed on the user information of each user in the user information database, and the inverted index is generated according to the user information after the format conversion, so that when the number of users matched with the crowd grouping rule needs to be queried, the number of users conforming to the crowd grouping rule can be queried according to the inverted index, and the problem that the number of users matched with the crowd grouping rule cannot be directly queried for massive information is solved.
Fig. 3 schematically shows a configuration of an information processing system in the embodiment of the present application.
As shown in Fig. 3, the system 100 includes a format converter 10, an index generator 11, and a search engine 12, and may further include a scheduler 13. The format converter 10 may perform the format conversion operation of the foregoing embodiments, the index generator 11 may perform the inverted index generation operation of the foregoing embodiments, and the search engine 12 may perform the query operation of the foregoing embodiments.
The format converter 10 is configured to perform format conversion on the data of the target object in the database, where the converted data format is a data format supported by the index generator.
The index generator 11 is configured to generate an inverted index from the format-converted data, where the inverted index takes the target data as a primary key and takes an object identifier list as the data corresponding to the primary key.
The search engine 12 is configured to respond to a query request it receives by querying the inverted index according to the query request and obtaining the number of objects that satisfy the query request.
The scheduler 13 is configured to schedule the index generator 11.
Optionally, the data of the target object includes a tag of the target object and grouping information of a group to which the target object belongs, where the tag is used to describe features and/or behaviors of the object.
Accordingly, the index generator 11 includes a tag index generator 111 and a grouping index generator 112. The tag index generator 111 is configured to perform format conversion on a tag of a target object in the database, and generate a tag inverted index according to the tag after the format conversion; the grouping index generator 112 is configured to perform format conversion on the information of the grouping to which the target object in the database belongs, and generate a grouping inverted index according to the grouping information after format conversion.
The search engine 12 is specifically configured to: query the tag inverted index according to the target tag requested by the query request to obtain the object identifier list corresponding to the target tag, and determine the number of objects from the object identifier list; or query the group inverted index according to at least two target groups corresponding to the grouping rule requested by the query request to obtain the object identifier lists corresponding to the at least two groups, perform an intersection or union operation on those lists according to the grouping rule, and obtain the number of objects from the result, where the grouping rule indicates whether an intersection or union operation is to be performed on the at least two groups.
Optionally, the number of the tag index generators 111 is one or more. Each label index generator corresponds to one label, and each label index generator is used for generating the label inverted index for the corresponding label.
Optionally, the search engine 12 may include a tag search engine and a group search engine. The tag search engine queries the tag inverted index according to a tag-based query rule to obtain the number of objects that satisfy the query rule. The group search engine queries the group inverted index according to a grouping rule over combined groups to obtain the number of objects that satisfy the grouping rule.
Optionally, the scheduler 13 may schedule the index generator at a set time or with a set period, so that the index generator performs format conversion on the data of each object in the database and generates the inverted index from the format-converted data. Optionally, the scheduler 13 may schedule the index generator to update the inverted index incrementally or fully.
Based on the system architecture, in some embodiments, the scheduler 13 may schedule each index generator according to a set time or a set period, so that each index generator may perform format conversion on data in the database, and generate an inverted index according to the data after format conversion, so that the inverted index may be updated in time according to a data update condition of the database, so as to ensure accuracy of a query result obtained based on the inverted index.
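The difference between a full and an incremental update of the inverted index can be sketched as follows. This is a simplified, in-memory illustration under assumptions: the database is represented as a dictionary from user identifier to group identifiers, and the function names `full_rebuild` and `incremental_update` are hypothetical. How the scheduler triggers these functions (at a set time or with a set period) is left out.

```python
def full_rebuild(user_groups):
    """Regenerate the whole inverted index from every object in the database."""
    index = {}
    for uid, groups in user_groups.items():
        for group in groups:
            index.setdefault(group, set()).add(uid)
    return index


def incremental_update(index, changed_users):
    """Update the existing index only for objects whose data changed."""
    for uid, groups in changed_users.items():
        # Remove the stale postings of the changed user ...
        for members in index.values():
            members.discard(uid)
        # ... then add the postings derived from the new data.
        for group in groups:
            index.setdefault(group, set()).add(uid)
    # Drop keys whose identifier list became empty.
    for key in [k for k, members in index.items() if not members]:
        del index[key]
    return index


database = {100001: ["G1", "G2"], 100002: ["G1"]}
index = full_rebuild(database)

# User 100002 moves from G1 to G2; only that user is reprocessed.
database[100002] = ["G2"]
index = incremental_update(index, {100002: ["G2"]})
```

After the incremental update the index is identical to what a full rebuild of the updated database would produce, which is the property that keeps query results accurate between full refreshes.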
Taking a user information database for storing user information as an example, Fig. 4 shows an information processing diagram in the embodiment of the present application, based on one or a combination of the above embodiments. As shown in the figure, the user information database stores a massive amount of user information, and each user's information includes a user tag A, a user tag B, a user tag C, and the identifiers of the groups to which the user belongs. For this database, user tag index generator 1 performs format conversion on user tag A of each user and generates the user tag A inverted index, user tag index generator 2 performs format conversion on user tag B of each user and generates the user tag B inverted index, user tag index generator 3 performs format conversion on user tag C of each user and generates the user tag C inverted index, and the user group index generator performs format conversion on the group identifiers of each user and generates the user group inverted index.
After the user inputs a grouping rule based on user tags, the user tag search engine queries the user tag A inverted index, the user tag B inverted index, or the user tag C inverted index to obtain the number of users matched by the rule.
After the user inputs a grouping rule based on combined crowds, the user group search engine queries the user group inverted index to obtain the number of users matched by the rule.
Based on the same technical concept, the embodiment of the present application further provides an information processing apparatus, which can implement the flows executed in fig. 1 and fig. 2 in the foregoing implementation.
Fig. 5 schematically shows the structure of the apparatus 400 in the embodiment of the present application. Referring to Fig. 5, the apparatus 400 includes a processor 401, a memory 402, and a communication interface 403. The processor 401 may also be a controller. The processor 401 is configured to enable the apparatus to perform the functions involved in the aforementioned procedures. The memory 402 is coupled to the processor 401 and holds the program instructions and data necessary for the apparatus. The processor 401 is connected to the memory 402, the memory 402 stores instructions, and the processor 401 executes the instructions stored in the memory 402 to carry out the steps of the method that implement the corresponding functions.
In the embodiment of the present application, for concepts, explanations, detailed descriptions, and other steps related to the technical solutions provided in the embodiment of the present application, reference is made to the foregoing methods or descriptions related to these contents in other embodiments, which are not described herein again.
It should be noted that the processor referred to in the embodiments of the present application may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. A processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The memory may be integrated in the processor or may be provided separately from the processor.
Based on the same technical concept, the embodiment of the application also provides a computer readable storage medium. The computer-readable storage medium stores computer-executable instructions for causing a computer to perform the processes of the present application.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. An information processing method characterized by comprising:
performing format conversion on data of a target object in a database, and generating an inverted index from the format-converted data; wherein the converted data format is a data format supported by an index generator for generating the inverted index, and the inverted index takes the target data as a primary key and takes an object identifier list as the data corresponding to the primary key;
and, in response to a query request received by a search engine, querying the inverted index according to the query request to obtain the number of objects that satisfy the query request.
2. The method of claim 1, wherein the data of the target object comprises a tag of the target object, the tag being used to describe a feature and/or behavior of the object;
the performing format conversion on the data of the target object in the database and generating the inverted index from the format-converted data comprises:
performing format conversion on the tag of the target object in the database, and generating a tag inverted index from the format-converted tag.
3. The method of claim 2, wherein performing format conversion on the tag of the target object in the database and generating the tag inverted index from the format-converted tag comprises:
determining, for each target object in the database and according to the specified range requested by the query request, the tags of the target object that fall within the specified range;
and obtaining the tag inverted index from the tags of the target object, wherein the tag inverted index comprises the tags within the specified range and the corresponding object identifier lists, the target tag is the primary key, and the object identifier list is the data corresponding to the primary key.
4. The method of claim 2, wherein said querying the inverted index according to the query request for the number of objects that meet the query request comprises:
querying the tag inverted index according to the target tag requested by the query request to obtain the object identifier list corresponding to the target tag;
and determining the number of objects from the object identifier list.
5. The method of claim 1, wherein the data of the target object includes grouping information of a group to which the target object belongs;
the performing format conversion on the data of the target object in the database and generating the inverted index from the format-converted data comprises:
performing format conversion on the grouping information of the group to which the target object in the database belongs, and generating a group inverted index from the format-converted grouping information.
6. The method of claim 5, wherein said querying the inverted index according to the query request for the number of objects that meet the query request comprises:
querying the group inverted index according to at least two target groups corresponding to the grouping rule requested by the query request to obtain the object identifier list corresponding to each of the at least two groups;
and performing an intersection or union operation on the object identifier lists corresponding to the at least two groups according to the grouping rule, and obtaining the number of objects from the result, wherein the grouping rule indicates whether an intersection or union operation is to be performed on the at least two groups.
7. The method of claim 1, wherein the target objects are all objects in the database or objects in which data update occurs in the database;
and, if the target objects are the objects whose data has been updated in the database, an inverted index is generated from the format-converted data, and the original inverted index is updated according to the generated inverted index.
8. The method of any one of claims 1-7, wherein the database is an HBase database, the HBase database is used to store user information, and the target object is a target user.
9. An information processing system, comprising:
a format converter, configured to perform format conversion on data of a target object in a database; wherein the converted data format is a data format supported by an index generator;
the index generator, configured to generate an inverted index from the format-converted data; wherein the inverted index takes the target data as a primary key and takes an object identifier list as the data corresponding to the primary key;
and a search engine, configured to respond to a query request received by the search engine by querying the inverted index according to the query request and obtaining the number of objects that satisfy the query request.
10. The system of claim 9, wherein the data of the target object comprises a tag of the target object and grouping information of a group to which the target object belongs, the tag being used for describing features and/or behaviors of the object;
the index generator comprises a tag index generator and a group index generator, wherein the tag index generator is configured to perform format conversion on the tag of the target object in the database and generate a tag inverted index from the format-converted tag, and the group index generator is configured to perform format conversion on the grouping information of the group to which the target object in the database belongs and generate a group inverted index from the format-converted grouping information;
the search engine is specifically configured to:
query the tag inverted index according to the target tag requested by the query request to obtain the object identifier list corresponding to the target tag, and determine the number of objects from the object identifier list; or
query the group inverted index according to at least two target groups corresponding to the grouping rule requested by the query request to obtain the object identifier lists corresponding to the at least two groups, perform an intersection or union operation on those lists according to the grouping rule, and obtain the number of objects from the result, wherein the grouping rule indicates whether an intersection or union operation is to be performed on the at least two groups.
CN201910910154.7A 2019-09-25 2019-09-25 Information processing method and system Active CN112559514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910910154.7A CN112559514B (en) 2019-09-25 2019-09-25 Information processing method and system

Publications (2)

Publication Number Publication Date
CN112559514A true CN112559514A (en) 2021-03-26
CN112559514B CN112559514B (en) 2023-04-25

Family

ID=75029120

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910910154.7A Active CN112559514B (en) 2019-09-25 2019-09-25 Information processing method and system

Country Status (1)

Country Link
CN (1) CN112559514B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113282616A (en) * 2021-05-19 2021-08-20 华润电力技术研究院有限公司 Incremental time sequence data conflict detection method and device and storage medium
CN114116775A (en) * 2021-11-08 2022-03-01 北京达佳互联信息技术有限公司 Information processing method, device, equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030225779A1 (en) * 2002-05-09 2003-12-04 Yasuhiro Matsuda Inverted index system and method for numeric attributes
CN1892655A (en) * 2005-06-15 2007-01-10 阿尔卡特公司 Method and data structure for indexed storage of hierarchically interrelated information in a relational database
CN102402540A (en) * 2010-09-15 2012-04-04 浙江天宇信息技术有限公司 Numerical value and text mixed inverted index algorithm based on multilayer-optimization balanced tree
CN104794123A (en) * 2014-01-20 2015-07-22 阿里巴巴集团控股有限公司 Method and device for establishing NoSQL database index for semi-structured data
CN105404627A (en) * 2014-09-11 2016-03-16 阿里巴巴集团控股有限公司 Method and device for determining search result
CN105528367A (en) * 2014-09-30 2016-04-27 华东师范大学 A method for storage and near-real time query of time-sensitive data based on open source big data
CN105653628A (en) * 2015-12-28 2016-06-08 湖南蚁坊软件有限公司 Index inversion-based query method of column storage database

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
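Yang Xun et al.: "Research on the Management of a University Data Streamlining and Integration System", Information Research (《情报探索》) *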

Also Published As

Publication number Publication date
CN112559514B (en) 2023-04-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant