CN110968582B - Crowd generation method and device - Google Patents

Crowd generation method and device

Info

Publication number
CN110968582B
Authority
CN
China
Prior art keywords
crowd
condition
state
search engine
data
Legal status
Active
Application number
CN201911060192.4A
Other languages
Chinese (zh)
Other versions
CN110968582A (en)
Inventor
王志伟
谢俏
邰娟
李成
孙迁
Current Assignee
Jiangsu Suning Cloud Computing Co ltd
SuningCom Co ltd
Original Assignee
Suning Cloud Computing Co Ltd
Application filed by Suning Cloud Computing Co Ltd
Priority to CN201911060192.4A
Publication of CN110968582A
Application granted
Publication of CN110968582B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a crowd generation method and device in the technical field of big data. The method comprises: receiving a crowd ID and the corresponding crowd condition for generating a crowd, and storing the crowd ID and the crowd condition in association in a relational database; acquiring a plurality of crowd conditions to be calculated from the relational database, and parsing each crowd condition into a query statement executable by a distributed search engine; and starting multiple threads through a distributed computing engine to execute the query statements against the distributed search engine in parallel using indexes, and storing the queried data in a Hive table. Embodiments of the invention reduce the computing resources consumed during crowd generation and speed up crowd generation.

Description

Crowd generation method and device
Technical Field
The invention relates to the technical field of big data, in particular to a crowd generation method and device.
Background
In mobile internet applications, users' basic attributes and behavior data are usually tagged, and the tags are used as conditions for screening users so as to select the users who meet expectations, i.e., to generate a crowd package.
In current crowd generation schemes, computation is usually performed with an offline computing engine such as Hive, which processes the full table data every time. A separate crowd computation task is created for each crowd package, so as the number of crowd packages grows, the number of tasks and the demand for computing resources grow with it, wasting computing resources. The computation time is typically on the order of minutes, which slows down crowd generation.
Disclosure of Invention
To solve the technical problems described in the background, the present invention provides a crowd generation method and apparatus that save computing resources and speed up crowd generation.
The embodiment of the invention provides the following specific technical scheme:
In a first aspect, a crowd generation method is provided, the method comprising:
receiving a crowd ID and a corresponding crowd condition for generating a crowd, and storing the crowd ID and the crowd condition in association in a relational database;
acquiring a plurality of crowd conditions to be calculated from the relational database, and parsing each crowd condition into a query statement executable by a distributed search engine;
and starting multiple threads through a distributed computing engine to execute the plurality of query statements against the distributed search engine in parallel using indexes, and storing the queried data in a Hive table.
Further, receiving the crowd ID and the corresponding crowd condition for generating a crowd and storing them in association in the relational database includes:
receiving the crowd ID and the corresponding crowd condition from a distributed message queue through Spark Streaming;
parsing the received crowd condition into a query statement executable by the distributed search engine, and querying the distributed search engine to obtain the number of users covered by the crowd ID;
and storing the crowd ID, the crowd condition and the number of covered users in association in the relational database, and setting the calculation state of the crowd ID according to the number of covered users.
Further, setting the calculation state of the crowd ID according to the number of covered users includes:
judging whether the number of users covered by the crowd ID is zero;
if yes, setting the state of the crowd ID to the calculation-success state;
and if not, setting the state of the crowd ID to the waiting-for-calculation state.
Further, a user tag database and a corresponding tag index table are pre-stored in the distributed search engine, and starting multiple threads through the distributed computing engine to execute the plurality of query statements in parallel using indexes includes:
for the plurality of query statements, generating and executing a plurality of crowd computation tasks in a multithreaded manner through the distributed computing engine;
and each crowd computation task queries the user tag database according to the tag index values in the tag index table that are related to its query statement, to obtain the user data related to that query statement.
Further, the crowd condition is an SQL condition, the relational database is MySQL, the distributed computing engine is Spark, and the distributed search engine is Elasticsearch.
Further, the method further comprises:
and comparing the crowd IDs stored in the Hive table with the crowd IDs in the relational database whose state is the in-calculation state, judging whether any crowd ID is missing, and if so, updating the state of each missing crowd ID to the waiting-for-calculation state.
Further, the method further comprises:
and after receiving a data dump instruction, dumping the data stored in the Hive table to a server indicated by the data dump instruction.
In a second aspect, a crowd generation apparatus is provided, the apparatus comprising:
an analysis module, wait, see below
Further, the receiving module specifically includes:
a receiving submodule, configured to receive the crowd ID and the corresponding crowd condition from a distributed message queue through Spark Streaming;
a query submodule, configured to parse the received crowd condition into a query statement executable by the distributed search engine and query the distributed search engine to obtain the number of users covered by the crowd ID;
and a storage submodule, configured to store the crowd ID, the crowd condition and the number of covered users in association in the relational database, and set the calculation state of the crowd ID according to the number of covered users.
Further, the storage submodule is specifically configured to:
judge whether the number of users covered by the crowd ID is zero;
if yes, set the state of the crowd ID to the calculation-success state;
if not, set the state of the crowd ID to the waiting-for-calculation state.
Further, a user tag database and a corresponding tag index table are pre-stored in the distributed search engine, and the computing module is specifically configured to:
for the plurality of query statements, generate and execute a plurality of crowd computation tasks in a multithreaded manner through the distributed computing engine;
and each crowd computation task queries the user tag database according to the tag index values in the tag index table that are related to its query statement, to obtain the user data related to that query statement.
Further, the crowd condition is an SQL condition, the relational database is MySQL, the distributed computing engine is Spark, and the distributed search engine is Elasticsearch.
Further, the computing module is further specifically configured to:
compare the crowd IDs stored in the Hive table with the crowd IDs in the relational database whose state is the in-calculation state, judge whether any crowd ID is missing, and if so, update the state of each missing crowd ID to the waiting-for-calculation state.
Further, the apparatus further comprises:
and a service module, configured to dump the data stored in the Hive table to the server indicated by a data dump instruction after receiving the data dump instruction.
In a third aspect, a computer device is provided, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the crowd generation method according to any implementation of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the crowd generation method according to any implementation of the first aspect.
Embodiments of the invention provide a crowd generation method and apparatus that introduce a distributed search engine into the crowd computation process and generate crowds by combining a distributed computing engine with the distributed search engine. Full-table scans are replaced by index lookups, which reduces the amount of data processed, shortens crowd generation from minutes to seconds, and saves computing resources; in addition, multithreaded parallel computation started through the distributed computing engine makes it possible to compute multiple crowds quickly in parallel.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flow diagram of a crowd generation method in one embodiment;
FIG. 2 is a schematic flow chart diagram of a crowd generation method in another embodiment;
FIG. 3 is a block diagram of a crowd generation device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
It is to be understood that, unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising", and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, what is meant is "including but not limited to".
Furthermore, in the description of the present invention, the terms "first", "second", etc. are used only for descriptive purposes and are not to be construed as indicating or implying relative importance. In addition, unless otherwise specified, "a plurality" means two or more.
At present, crowd generation usually relies on an offline computing engine such as Hive, which processes the full table data every time, creates many tasks, wastes resources, and typically takes minutes. Accordingly, embodiments of the invention provide a crowd generation method that receives crowd IDs and the corresponding crowd conditions for generating crowds, parses the crowd conditions into query statements for a distributed search engine, and combines a distributed computing engine with the distributed search engine to query, quickly and in parallel, the users who meet the conditions using indexes.
Example one
An embodiment of the present invention provides a crowd generation method, as shown in fig. 1, the method may include the steps of:
S11: Receive the crowd ID and the corresponding crowd condition for generating the crowd, and store the crowd ID and the crowd condition in association in a relational database.
The user terminal receives tag selection operations that a user (such as a merchant or a platform operator) performs on a visual interface, and generates the crowd condition and a corresponding crowd ID; the crowd ID uniquely identifies the crowd condition, that is, a crowd package is generated. When selecting tags on the user terminal, the user also sets a validity period, so that the user terminal obtains the validity period of the crowd condition, consisting of an effective start date and an effective end date.
After generating the crowd ID and the crowd condition, the user terminal sends them to the distributed message queue Kafka, where they wait to be consumed by Spark Streaming.
The crowd condition may be expressed in Structured Query Language (SQL), and the relational database may be MySQL.
Specifically, the implementation process of step S11 may include:
a. Receive the crowd ID and the corresponding crowd condition from the distributed message queue through Spark Streaming.
b. Parse the received crowd condition into a query statement executable by the distributed search engine, and query the distributed search engine to obtain the number of users covered by the crowd ID.
In this embodiment, the distributed computing engine may adopt Spark, and the distributed search engine may adopt Elasticsearch.
Specifically, Spark Streaming consumes the crowd conditions in the distributed message queue Kafka, converts each crowd condition SQL statement into an Elasticsearch query DSL (Domain Specific Language) statement, and runs it to obtain the number of users covered by the crowd ID.
When converting the SQL statement into an Elasticsearch DSL query statement, the SQL statement may first be parsed into an SQL syntax tree, and the syntax tree may then be converted into the Elasticsearch DSL statement.
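The SQL-to-DSL conversion described in step b can be illustrated with a minimal Python sketch. It assumes the crowd condition is only the WHERE part of the SQL, limited to `=`, `>`, `>=`, `<`, `<=` and `IN` comparisons combined with `AND`, `OR` and parentheses; the grammar, the function names (`tokenize`, `Parser`, `to_dsl`, `sql_condition_to_dsl`) and the field names are hypothetical, and the patent does not say which SQL parser it uses. The output uses standard Elasticsearch query DSL clauses (`bool`, `term`, `terms`, `range`).

```python
import re

# Assumed grammar for the WHERE part of a crowd condition:
#   expr := term ("OR" term)*      term := factor ("AND" factor)*
#   factor := "(" expr ")" | IDENT op VALUE | IDENT "IN" "(" VALUE ("," VALUE)* ")"
TOKEN_RE = re.compile(r"\s*(?:(\d+)|'([^']*)'|(>=|<=|=|>|<|\(|\)|,)|([A-Za-z_][\w.]*))")
RANGE_OPS = {">": "gt", ">=": "gte", "<": "lt", "<=": "lte"}

def tokenize(sql: str) -> list:
    tokens, pos, sql = [], 0, sql.strip()
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if not m:
            raise ValueError(f"cannot tokenize at: {sql[pos:]!r}")
        num, text, sym, ident = m.groups()
        if num is not None:
            tokens.append(("VAL", int(num)))
        elif text is not None:
            tokens.append(("VAL", text))
        elif sym is not None:
            tokens.append(("SYM", sym))
        elif ident.upper() in ("AND", "OR", "IN"):
            tokens.append(("KW", ident.upper()))
        else:
            tokens.append(("IDENT", ident))
        pos = m.end()
    return tokens

class Parser:
    """Recursive-descent parser building the syntax tree:
    ("cond", field, op, value) leaves and ("AND" | "OR", [children]) inner nodes."""

    def __init__(self, tokens: list):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else (None, None)

    def take(self, kind=None, value=None):
        tok = self.peek()
        if tok[0] is None or (kind and tok[0] != kind) or (value and tok[1] != value):
            raise ValueError(f"expected {value or kind or 'a token'}, got {tok}")
        self.i += 1
        return tok

    def chain(self, keyword, sub):
        children = [sub()]
        while self.peek() == ("KW", keyword):
            self.take()
            children.append(sub())
        return children[0] if len(children) == 1 else (keyword, children)

    def expr(self):
        return self.chain("OR", self.term)

    def term(self):
        return self.chain("AND", self.factor)

    def factor(self):
        if self.peek() == ("SYM", "("):
            self.take()
            node = self.expr()
            self.take("SYM", ")")
            return node
        field = self.take("IDENT")[1]
        if self.peek() == ("KW", "IN"):
            self.take()
            self.take("SYM", "(")
            values = [self.take("VAL")[1]]
            while self.peek() == ("SYM", ","):
                self.take()
                values.append(self.take("VAL")[1])
            self.take("SYM", ")")
            return ("cond", field, "IN", values)
        op = self.take("SYM")[1]
        if op != "=" and op not in RANGE_OPS:
            raise ValueError(f"unsupported operator {op!r}")
        return ("cond", field, op, self.take("VAL")[1])

def to_dsl(node) -> dict:
    """Recursively map syntax-tree nodes onto Elasticsearch query DSL clauses."""
    if node[0] in ("AND", "OR"):
        clauses = [to_dsl(child) for child in node[1]]
        if node[0] == "AND":
            return {"bool": {"filter": clauses}}
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    _, field, op, value = node
    if op == "=":
        return {"term": {field: value}}
    if op == "IN":
        return {"terms": {field: value}}
    return {"range": {field: {RANGE_OPS[op]: value}}}

def sql_condition_to_dsl(sql: str) -> dict:
    parser = Parser(tokenize(sql))
    tree = parser.expr()
    if parser.i != len(parser.toks):
        raise ValueError(f"unexpected trailing tokens: {parser.toks[parser.i:]}")
    return {"query": to_dsl(tree)}
```

Using `filter` rather than `must` for `AND` is a choice made for this sketch: crowd selection only needs the matching documents, not relevance scores. A body of this shape could be sent to the Elasticsearch `_count` endpoint to obtain the number of covered users.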
c. Store the crowd ID, the crowd condition and the number of covered users in association in the relational database, and set the calculation state of the crowd ID according to the number of covered users.
Specifically, when the crowd ID, the crowd condition and the number of covered users are stored in the relational database, it is judged whether the number of users covered by the crowd ID is zero. If so, the state of the crowd ID is set to the calculation-success state in the relational database; if not, it is set to the waiting-for-calculation state.
In this embodiment, if a crowd covers zero users, its crowd ID can be marked as successfully calculated immediately and returned without further computation.
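Setting the initial calculation state from the number of covered users can be sketched as follows. The numeric state codes 1 and 3 follow those given for MySQL in Example Two; the in-memory `store` dict, its row layout and the function names are hypothetical stand-ins for the MySQL table and its insert logic.

```python
WAITING, SUCCESS = 1, 3  # MySQL state codes from Example Two

def initial_state(covered_users: int) -> int:
    # A crowd that covers nobody has nothing to compute, so it succeeds at once;
    # every other crowd waits for the next periodic batch.
    return SUCCESS if covered_users == 0 else WAITING

def register_crowd(store: dict, crowd_id: str, condition: str, covered_users: int) -> int:
    """Store crowd ID, condition and coverage together (stand-in for a MySQL insert)."""
    state = initial_state(covered_users)
    store[crowd_id] = {"condition": condition, "covered": covered_users, "state": state}
    return state
```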
S12: Acquire a plurality of crowd conditions to be calculated from the relational database, and parse each crowd condition into a query statement executable by the distributed search engine.
Specifically, acquiring a plurality of crowd conditions to be calculated from the relational database may include:
periodically acquiring, through the distributed computing engine, a plurality of crowd IDs in the waiting-for-calculation state from the relational database, and changing their states from the waiting-for-calculation state to the in-calculation state.
In this embodiment, a Spark crowd computation task may be started periodically to scan the data in MySQL and determine whether any crowd ID is in the waiting-for-calculation state (i.e., a crowd needs to be calculated). If so, the crowd IDs are sorted in descending order of receiving time, the first N crowd IDs (e.g., N = 700) are taken for calculation, and the state of these crowd IDs is changed from the waiting-for-calculation state to the in-calculation state.
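The periodic batch selection can be sketched with Python's built-in `sqlite3` module standing in for MySQL. The table name `crowd` and its columns `crowd_id`, `state` and `received_at` are hypothetical; the state codes 1 and 2 follow Example Two. Selecting and updating inside one transaction keeps the batch consistent for a single scheduler; with several concurrent schedulers on MySQL, row locking (for example `SELECT ... FOR UPDATE`) would be one option, which the patent does not discuss.

```python
import sqlite3

WAITING, IN_CALCULATION = 1, 2  # MySQL state codes from Example Two

def claim_batch(conn: sqlite3.Connection, n: int = 700) -> list:
    """Take the N most recently received waiting crowds and mark them as in calculation."""
    with conn:  # one transaction: commit on success, roll back on error
        rows = conn.execute(
            "SELECT crowd_id FROM crowd WHERE state = ? ORDER BY received_at DESC LIMIT ?",
            (WAITING, n),
        ).fetchall()
        ids = [row[0] for row in rows]
        conn.executemany(
            "UPDATE crowd SET state = ? WHERE crowd_id = ?",
            [(IN_CALCULATION, crowd_id) for crowd_id in ids],
        )
    return ids
```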
Specifically, parsing each crowd condition into a query statement executable by the distributed search engine includes:
parsing each crowd condition to generate a syntax tree, and converting each syntax tree into a query statement executable by the distributed search engine.
Specifically, after each crowd condition is parsed into a syntax tree, the syntax tree can be processed recursively to generate a query statement executable by the distributed search engine. During the recursion, when a node of the syntax tree is visited, it is judged whether the parent node and the child node belong to the same nested structure, and the query statement is generated according to the result of that judgment.
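The nested-structure check during recursion might look like the following sketch. It assumes that fields under a hypothetical `orders` object are mapped as an Elasticsearch `nested` type, and that the syntax tree uses plain tuples (`("term", field, value)` leaves and `("AND" | "OR", children)` inner nodes). A node whose leaves all share a nested path different from its parent's is wrapped in a `nested` query; its children then inherit that path and are not wrapped again. Conditions grouped in one parenthesised subtree therefore have to match the same nested object, which is one reading of the text rather than confirmed behaviour.

```python
from typing import Optional

# Hypothetical mapping: fields whose first path segment is listed here are
# indexed as Elasticsearch "nested" objects (e.g. "orders.brand").
NESTED_PATHS = {"orders"}

def nested_path(field: str) -> Optional[str]:
    head = field.split(".", 1)[0]
    return head if "." in field and head in NESTED_PATHS else None

def node_path(node) -> Optional[str]:
    """Nested path shared by every leaf under `node`, or None if top-level or mixed."""
    if node[0] == "term":
        return nested_path(node[1])
    paths = {node_path(child) for child in node[1]}
    return paths.pop() if len(paths) == 1 else None

def to_dsl(node, parent_path: Optional[str] = None) -> dict:
    path = node_path(node)
    if path is not None and path != parent_path:
        # Parent and child are in different nested structures: open a nested
        # query and translate the same node again inside it.
        return {"nested": {"path": path, "query": to_dsl(node, path)}}
    if node[0] == "term":
        return {"term": {node[1]: node[2]}}
    clause = "filter" if node[0] == "AND" else "should"
    # Same nested structure as the parent (or both top-level): no extra wrapping.
    return {"bool": {clause: [to_dsl(child, path) for child in node[1]]}}
```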
S13: Start multiple threads through the distributed computing engine to execute the plurality of query statements against the distributed search engine in parallel using indexes, and store the queried data in the Hive table.
A user tag database and a corresponding tag index table are pre-stored in the distributed search engine. The user tag database stores user data comprising user IDs and the corresponding user tags; this user data can be computed from the raw data in a data warehouse by the distributed computing engine and then dumped into the distributed search engine.
In practical applications, a user tag may include a user identifier, a business object, a behavior type and a timestamp. The business object includes at least one of a product brand, a product category and a store, and the behavior type includes at least one of browsing, searching, adding to cart, adding to favorites, submitting an order and paying for an order for the business object.
Specifically, for the plurality of query statements, the distributed computing engine generates and executes a plurality of crowd computation tasks in a multithreaded manner; each crowd computation task queries the user tag database according to the tag index values in the tag index table that are related to its query statement, to obtain the user data related to that query statement.
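Why index lookups avoid full-table scans can be illustrated with an in-memory inverted index, a simplified stand-in for the tag index that Elasticsearch maintains. The `UserTag` fields mirror the tag composition described above, but the names, the use of `(business object, behavior)` as the index key, and the AND-only query are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple

@dataclass(frozen=True)
class UserTag:
    user_id: str
    business_object: str   # brand, category or store, e.g. "brand:A"
    behavior: str          # e.g. "browse", "search", "pay"
    timestamp: int         # kept for completeness; not indexed in this sketch

TagKey = Tuple[str, str]   # (business_object, behavior)

def build_tag_index(tags: Iterable[UserTag]) -> Dict[TagKey, Set[str]]:
    """Inverted index: each tag key points to the set of users carrying it."""
    index: Dict[TagKey, Set[str]] = defaultdict(set)
    for tag in tags:
        index[(tag.business_object, tag.behavior)].add(tag.user_id)
    return index

def query_users(index: Dict[TagKey, Set[str]], required: List[TagKey]) -> Set[str]:
    """Users having every required tag: intersect posting sets, no full scan."""
    postings = [index.get(key, set()) for key in required]
    return set.intersection(*postings) if postings else set()
```

The cost of a query now depends on the size of the posting sets it touches, not on the total number of tag rows, which is the effect the text attributes to replacing full-table computation with index lookups.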
In this embodiment, when computing through Spark, M threads (e.g., M = 10) may be started as a thread pool on the Spark Driver, with each thread processing one crowd computation task, so that multiple crowd computation tasks are executed by distributing them across the threads. For each query statement, the tag index values related to its key fields are obtained from the tag index table, and the user tag database is queried with those index values to obtain the related user data. The queried data is stored in the Hive table with one partition per crowd package, where a crowd package comprises a crowd ID and the corresponding user data.
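The driver-side thread pool of step S13 can be sketched with Python's `concurrent.futures.ThreadPoolExecutor`. `search` is a hypothetical callable standing in for an Elasticsearch query that returns user IDs, and `hive_table` is a dict standing in for the Hive table, keyed by crowd ID as one partition per crowd package. In a real Spark driver each thread would submit its own Spark job; that part is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def run_crowd_batch(queries: Dict[str, dict],
                    search: Callable[[dict], List[str]],
                    hive_table: Dict[str, List[str]],
                    threads: int = 10) -> List[str]:
    """Run one crowd computation task per crowd ID on a pool of `threads` threads
    and return the crowd IDs whose task failed."""

    def task(crowd_id: str) -> None:
        # Query the search engine, then write one partition per crowd package.
        hive_table[crowd_id] = search(queries[crowd_id])

    failed = []
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {crowd_id: pool.submit(task, crowd_id) for crowd_id in queries}
        for crowd_id, future in futures.items():
            if future.exception() is not None:  # blocks until the task finishes
                failed.append(crowd_id)
    return failed
```

A failed task leaves no partition behind, which is the situation the missed-calculation check of Example Two is designed to catch.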
It should be noted that a crowd condition generally corresponds to one validity period, and on each day within that period a crowd is computed from the crowd condition for service or campaign delivery. This embodiment can therefore quickly update, in batches, the crowds delivered on the current day and those to be delivered on the next day, which keeps the next day's delivery data timely.
In summary, this embodiment receives a crowd ID and the corresponding crowd condition for generating a crowd and stores them in association in a relational database; acquires a plurality of crowd conditions to be calculated from the relational database and parses each into a query statement executable by a distributed search engine; and starts multiple threads through the distributed computing engine to execute the query statements against the distributed search engine in parallel using indexes, storing the queried data in the Hive table. By combining the distributed computing engine with the distributed search engine, full-table scans are replaced by index lookups, which reduces the amount of data processed, shortens crowd generation from minutes to seconds and saves computing resources, while multithreaded parallel computation in the distributed computing engine allows multiple crowds to be computed quickly in parallel.
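Selecting which crowds to refresh from their validity periods can be sketched as below. The `crowds` mapping from crowd ID to (effective start date, effective end date) is a hypothetical representation of the validity period set in step S11, and treating both dates as inclusive is an assumption.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

def crowds_to_refresh(crowds: Dict[str, Tuple[date, date]], today: date) -> Dict[str, List[str]]:
    """Split crowd IDs into those delivered today and those to prepare for tomorrow.
    Each crowd maps to (effective start date, effective end date), both inclusive."""
    tomorrow = today + timedelta(days=1)

    def active(day: date) -> List[str]:
        return sorted(cid for cid, (start, end) in crowds.items() if start <= day <= end)

    return {"today": active(today), "tomorrow": active(tomorrow)}
```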
Example two
On the basis of the first embodiment, the embodiment of the present invention further provides a crowd generating method, as shown in fig. 2, the method may include the steps of:
S21: Receive the crowd ID and the corresponding crowd condition for generating the crowd, and store the crowd ID and the crowd condition in association in the relational database.
Specifically, the implementation process of step S21 may refer to step S11 in the first embodiment, and is not described herein again.
S22: Acquire a plurality of crowd conditions to be calculated from the relational database, and parse each crowd condition into a query statement executable by the distributed search engine.
Specifically, the implementation process of step S22 may refer to step S12 in the first embodiment, and details are not repeated here.
S23: Start multiple threads through the distributed computing engine to execute the plurality of query statements against the distributed search engine in parallel using indexes, and store the queried data in the Hive table.
Specifically, the implementation process of step S23 may refer to step S13 of the first embodiment, and details are not repeated here.
S24: Compare the crowd IDs stored in the Hive table with the crowd IDs in the relational database that are in the in-calculation state, and judge whether any crowd ID is missing; if so, execute step S25, otherwise execute step S26.
Specifically, after all crowd conditions to be calculated in the batch have been computed, a missed-calculation check is started: all crowd IDs whose calculation state is the in-calculation state are read from the MySQL database and compared with the crowd IDs stored in the Hive table. If a crowd ID exists in the MySQL database but not in the Hive table, it is determined to be a missed crowd ID.
In step S25, the state of each missing crowd ID is updated to the waiting-for-calculation state.
Specifically, the number of missing crowd IDs is counted, and when it exceeds 50% of the total number of crowd IDs to be calculated in the batch, an alarm is raised. The states of the missing crowd IDs in the MySQL database are set to the waiting-for-calculation state so that they are calculated again in the next batch.
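The missed-calculation check of steps S24 and S25 reduces to a set difference. This sketch reads S25 as resetting every missing crowd ID and raises an alarm only above the 50% threshold; the `state_store` dict stands in for MySQL, and logging a warning is a placeholder for whatever alarm mechanism is actually used.

```python
import logging
from typing import Dict, Iterable, List, Tuple

WAITING = 1  # MySQL code for the waiting-for-calculation state

def check_missing(in_calculation_ids: Iterable[str], hive_ids: Iterable[str],
                  state_store: Dict[str, int],
                  alarm_ratio: float = 0.5) -> Tuple[List[str], bool]:
    """Return (missing crowd IDs, alarm flag) and reset missing crowds to WAITING."""
    batch = set(in_calculation_ids)
    missing = sorted(batch - set(hive_ids))  # "in calculation" in MySQL but no Hive partition
    alarm = bool(batch) and len(missing) > alarm_ratio * len(batch)
    if alarm:
        logging.warning("crowd batch: %d of %d crowd IDs missing", len(missing), len(batch))
    for crowd_id in missing:
        state_store[crowd_id] = WAITING  # recomputed by the next batch
    return missing, alarm
```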
In step S26, after receiving the data dump instruction, the data stored in the Hive table is dumped to the server indicated by the data dump instruction.
The data dump instruction may be input by a platform operator on a user terminal, and the server indicated in the data dump instruction may be a Redis server or an FTP server. Here, the platform may be an e-commerce platform.
In this embodiment, for different service requirements, a platform operator may input different data dump instructions to dump the data stored in the Hive table to different servers, thereby providing different data services.
Note that after the data stored in the Hive table has been dumped to the server indicated by the data dump instruction, the calculation state of the corresponding crowd ID is set to the calculation-success state.
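Step S26 together with the note above can be sketched as a dispatch on the target named in the dump instruction. The instruction format (`{"target": ...}`), the `dump_crowd` function and the writer callables in `sinks` are hypothetical; in practice the writers would wrap a Redis or FTP client, neither of which is shown here.

```python
from typing import Callable, Dict, List

SUCCESS = 3  # MySQL code for the calculation-success state

# A sink writes one crowd package (crowd ID + user IDs) to some server.
Sink = Callable[[str, List[str]], None]

def dump_crowd(crowd_id: str, hive_table: Dict[str, List[str]], instruction: dict,
               sinks: Dict[str, Sink], state_store: Dict[str, int]) -> None:
    """Copy one crowd package to the server named in the dump instruction,
    then mark the crowd as successfully calculated."""
    target = instruction["target"]  # hypothetical field, e.g. "redis" or "ftp"
    if target not in sinks:
        raise ValueError(f"unsupported dump target: {target!r}")
    sinks[target](crowd_id, hive_table[crowd_id])
    state_store[crowd_id] = SUCCESS
```

Keeping the writers behind a simple mapping matches the layered idea described below: new delivery targets can be added in the service layer without touching the computation.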
In addition, the calculation states above, namely the waiting-for-calculation state, the in-calculation state and the calculation-success state, may be represented in the MySQL database as "1", "2" and "3", respectively.
In this embodiment, crowds are generated by combining the distributed computing engine with the distributed search engine. Full-table scans are replaced by index lookups, which reduces the amount of data processed, shortens crowd generation from minutes to seconds and saves computing resources, and multithreaded parallel computation through the distributed computing engine allows multiple crowds to be computed quickly in parallel. In addition, following a layered design, the crowd generation logic is decomposed into decoupled receiving, analysis, computing and service layers, which makes the code more extensible: complex and changeable business logic can be encapsulated in the service layer, which provides the different services.
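The three calculation states and the transitions between them that the text describes can be collected into a small state table. The `CalcState` names and the `transition` helper are illustrative; crowds whose coverage is zero are created directly in the success state rather than reaching it through a transition.

```python
from enum import IntEnum

class CalcState(IntEnum):
    WAITING = 1          # waiting-for-calculation
    IN_CALCULATION = 2   # in-calculation
    SUCCESS = 3          # calculation-success

# Transitions named in the text (S12, S25 and S26).
ALLOWED = {
    (CalcState.WAITING, CalcState.IN_CALCULATION),  # picked up by a periodic batch
    (CalcState.IN_CALCULATION, CalcState.WAITING),  # missing from Hive, retried next batch
    (CalcState.IN_CALCULATION, CalcState.SUCCESS),  # dumped to the target server
}

def transition(current: int, new: int) -> CalcState:
    """Validate a state change between the MySQL codes 1, 2 and 3."""
    cur, nxt = CalcState(current), CalcState(new)
    if (cur, nxt) not in ALLOWED:
        raise ValueError(f"illegal state change {cur.name} -> {nxt.name}")
    return nxt
```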
EXAMPLE III
An embodiment of the present invention provides a crowd generation apparatus. As shown in FIG. 3, the apparatus may include:
a receiving module 31, configured to receive a crowd ID and a corresponding crowd condition for generating a crowd, and store the crowd ID and the crowd condition in association in a relational database;
an analysis module 32, configured to acquire a plurality of crowd conditions to be calculated from the relational database and parse each crowd condition into a query statement executable by the distributed search engine;
and a computing module 33, configured to start multiple threads through the distributed computing engine to execute the plurality of query statements against the distributed search engine in parallel using indexes, and store the queried data in the Hive table.
Further, the receiving module 31 specifically includes:
a receiving submodule, configured to receive the crowd ID and the corresponding crowd condition from a distributed message queue through Spark Streaming;
a query submodule, configured to parse the received crowd condition into a query statement executable by the distributed search engine and query the distributed search engine to obtain the number of users covered by the crowd ID;
and a storage submodule, configured to store the crowd ID, the crowd condition and the number of covered users in association in the relational database, and set the calculation state of the crowd ID according to the number of covered users.
Further, the storage submodule is specifically configured to:
judge whether the number of users covered by the crowd ID is zero;
if so, set the state of the crowd ID to the calculation-success state;
if not, set the state of the crowd ID to the waiting-for-calculation state.
Further, a user tag database and a corresponding tag index table are pre-stored in the distributed search engine, and the computing module 33 is specifically configured to:
for the plurality of query statements, generate a plurality of crowd calculation tasks through the distributed computing engine in a multithreaded manner and execute them;
wherein each crowd calculation task queries the user tag database, using the tag index values in the tag index table that relate to its query statement, to obtain the user data related to that query statement.
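One way to picture the index-based crowd calculation tasks is an inverted index from tag values to user IDs. In the sketch below, `USER_TAG_DB` and `TAG_INDEX` are hypothetical in-memory stand-ins for the user tag database and the tag index table held in the distributed search engine; each task intersects the index entries for its query's tag filters before fetching the matching users' data. Python threads stand in for the distributed computing engine.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical user tag database: user ID -> tags.
USER_TAG_DB = {
    "u1": {"city": "Nanjing", "level": "vip"},
    "u2": {"city": "Nanjing", "level": "normal"},
    "u3": {"city": "Suzhou", "level": "vip"},
}

# Hypothetical tag index table: (tag, value) -> IDs of users carrying that tag value.
TAG_INDEX = {}
for _uid, _tags in USER_TAG_DB.items():
    for _tag, _value in _tags.items():
        TAG_INDEX.setdefault((_tag, _value), set()).add(_uid)


def crowd_task(crowd_id, tag_filters):
    """One crowd calculation task: intersect the index entries of every
    (tag, value) filter, then fetch the matching users from the tag database."""
    postings = [TAG_INDEX.get(f, set()) for f in tag_filters]
    # With no filters this sketch selects nobody rather than everybody.
    matched = set.intersection(*postings) if postings else set()
    return crowd_id, [dict(USER_TAG_DB[uid], user_id=uid) for uid in sorted(matched)]


def run_crowd_tasks(queries, max_workers=4):
    """queries: {crowd_id: [(tag, value), ...]}; runs one task per crowd in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda item: crowd_task(*item), queries.items()))
```

Looking up index entries first means each task touches only the users that can possibly match, rather than scanning the whole tag database.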
Further, the crowd condition is an SQL condition, the relational database is MySQL, the distributed computing engine is Spark, and the distributed search engine is Elasticsearch.
Further, the computing module 33 is further configured to:
compare the crowd IDs stored in the Hive table with the crowd IDs in the relational database whose state is the calculating state, judge whether any of those crowd IDs is missing from the Hive table, and if so, update the state of each missing crowd ID to the waiting-for-calculation state.
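The compensation check is essentially a set difference. The sketch below uses Python's `sqlite3` in place of MySQL; the `crowd` table, the state strings, and the assumption that `hive_crowd_ids` comes from something like `SELECT DISTINCT crowd_id` over the Hive table are all hypothetical.

```python
import sqlite3
from typing import Iterable, List


def requeue_missing(conn: sqlite3.Connection, hive_crowd_ids: Iterable[str]) -> List[str]:
    """Reset to WAITING every crowd that the relational database marks as
    CALCULATING but that has no rows in the Hive table; return those IDs."""
    calculating = {row[0] for row in conn.execute(
        "SELECT crowd_id FROM crowd WHERE state = 'CALCULATING'")}
    # Crowd IDs marked as calculating but absent from the Hive table.
    missing = sorted(calculating - set(hive_crowd_ids))
    conn.executemany(
        "UPDATE crowd SET state = 'WAITING' WHERE crowd_id = ?",
        [(crowd_id,) for crowd_id in missing],
    )
    conn.commit()
    return missing
```

Presumably this check runs periodically, so that crowds whose calculation was interrupted are picked up again in a later round.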
Further, the device further comprises:
a service module 34, configured to dump, after receiving a data dump instruction, the data stored in the Hive table to the server indicated by the data dump instruction.
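The dump step is not specified in detail. Purely as an illustration, the sketch below assumes the instruction names a crowd and a target, and streams that crowd's rows as CSV to whatever writable text stream `open_target` returns (for a real server this might be a file on shared storage or an upload stream). The instruction format, the CSV layout and `open_target` are hypothetical.

```python
import csv
import io


def dump_crowd(hive_rows, instruction, open_target):
    """Write the Hive rows of one crowd to the target named in a dump instruction.

    hive_rows: iterable of (crowd_id, user_id) rows read from the Hive table.
    instruction: hypothetical dict with "crowd_id" and "target" keys.
    open_target: callable returning a writable text stream for the target;
    the caller owns (and closes) that stream.
    """
    writer = csv.writer(open_target(instruction["target"]))
    writer.writerow(["crowd_id", "user_id"])
    written = 0
    for crowd_id, user_id in hive_rows:
        if crowd_id == instruction["crowd_id"]:
            writer.writerow([crowd_id, user_id])
            written += 1
    return written


# Usage with in-memory stand-ins for the Hive table and the target server.
rows = [("c1", "u1"), ("c2", "u9"), ("c1", "u2")]
buffer = io.StringIO()
written = dump_crowd(rows, {"crowd_id": "c1", "target": "server-a"}, lambda target: buffer)
```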
The crowd generating device provided by this embodiment of the invention belongs to the same inventive concept as the crowd generation method provided by the embodiments of the invention; it can execute that method and has the corresponding functional modules and beneficial effects. For technical details not described in this embodiment, refer to the crowd generation method provided by the embodiments of the present invention; they are not repeated here.
In addition, an embodiment of the present invention further provides a computer device, which includes:
one or more processors; and
a storage means for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the steps of the crowd generation method described in the above embodiments.
Another embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the crowd generation method according to the above embodiments.
As will be appreciated by one of skill in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention have been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A method of crowd generation, the method comprising:
receiving a crowd ID and a corresponding crowd condition for generating a crowd, and storing the crowd ID and the crowd condition in association in a relational database, wherein one crowd condition is used to generate one corresponding crowd, and the crowd ID uniquely identifies the crowd condition;
acquiring, from the relational database, a plurality of crowd conditions that need to be calculated, and parsing and converting each crowd condition into a query statement executable by a distributed search engine;
starting multiple threads through a distributed computing engine to execute, in parallel, index-based data queries for the plurality of query statements in the distributed search engine, and storing the queried data in a Hive table;
wherein receiving the crowd ID and the corresponding crowd condition for generating a crowd and storing the crowd ID and the crowd condition in association in the relational database comprises:
receiving the crowd ID and the corresponding crowd condition from a distributed message queue through Spark Streaming;
parsing and converting the received crowd condition into a query statement executable by the distributed search engine, and querying the distributed search engine to obtain the number of covered people corresponding to the crowd ID;
and storing the crowd ID, the crowd condition and the number of covered people in association in the relational database, and setting the calculation state of the crowd ID according to the number of covered people, wherein if the number of covered people is zero, the state of the crowd ID is set to the calculation-success state in the relational database; otherwise, the state of the crowd ID is set to the waiting-for-calculation state.
2. The method of claim 1, wherein setting the calculation state of the crowd ID according to the number of covered people comprises:
judging whether the number of covered people corresponding to the crowd condition (the crowd definition data) is zero;
if so, setting the state of the crowd ID to the calculation-success state;
and if not, setting the state of the crowd ID to the waiting-for-calculation state.
3. The method according to claim 1, wherein a user tag database and a corresponding tag index table are pre-stored in the distributed search engine, and starting multiple threads through the distributed computing engine to execute, in parallel, index-based data queries for the plurality of query statements in the distributed search engine comprises:
for the plurality of query statements, generating a plurality of crowd calculation tasks through the distributed computing engine in a multithreaded manner and executing the crowd calculation tasks;
and each crowd calculation task is used to query and acquire, from the user tag database, the user data related to its query statement according to the tag index value related to that query statement in the tag index table.
4. The method of any one of claims 1 to 3, wherein the crowd condition is an SQL condition, the relational database is MySQL, the distributed computing engine is Spark, and the distributed search engine is Elasticsearch.
5. The method of claim 1, further comprising:
and comparing the crowd IDs stored in the Hive table with the crowd IDs in the relational database whose state is the calculating state, judging whether a missing crowd ID exists, and if so, updating the state of the missing crowd ID to the waiting-for-calculation state.
6. The method according to claim 1 or 5, characterized in that the method further comprises:
and after receiving a data dump instruction, dumping the data stored in the Hive table to a server indicated by the data dump instruction.
7. A crowd generating device, the device comprising:
a receiving module, configured to receive a crowd ID and a corresponding crowd condition for generating a crowd, and to store the crowd ID and the crowd condition in association in a relational database, wherein one crowd condition is used to generate one corresponding crowd, and the crowd ID uniquely identifies the crowd condition;
an analysis module, configured to acquire from the relational database a plurality of crowd conditions that need to be calculated, and to parse and convert each crowd condition into a query statement executable by a distributed search engine;
a computing module, configured to start multiple threads through a distributed computing engine to execute, in parallel, index-based data queries for the plurality of query statements in the distributed search engine, and to store the queried data in a Hive table;
wherein the receiving module specifically includes:
a receiving submodule, configured to receive the crowd ID and the corresponding crowd condition from a distributed message queue through Spark Streaming;
a query submodule, configured to parse and convert the received crowd condition into a query statement executable by the distributed search engine, and to query the distributed search engine to obtain the number of covered people corresponding to the crowd ID;
and a storage submodule, configured to store the crowd ID, the crowd condition and the number of covered people in association in the relational database, and to set the calculation state of the crowd ID according to the number of covered people, wherein if the number of covered people is zero, the state of the crowd ID is set to the calculation-success state in the relational database; otherwise, the state of the crowd ID is set to the waiting-for-calculation state.
8. The device of claim 7, wherein the crowd condition is an SQL condition, the relational database is MySQL, the distributed computing engine is Spark, and the distributed search engine is Elasticsearch.
9. The device of claim 7, further comprising:
a service module, configured to dump, after receiving a data dump instruction, the data stored in the Hive table to the server indicated by the data dump instruction.
CN201911060192.4A 2019-11-01 2019-11-01 Crowd generation method and device Active CN110968582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911060192.4A CN110968582B (en) 2019-11-01 2019-11-01 Crowd generation method and device

Publications (2)

Publication Number Publication Date
CN110968582A CN110968582A (en) 2020-04-07
CN110968582B true CN110968582B (en) 2022-12-30

Family

ID=70030065

Country Status (1)

Country Link
CN (1) CN110968582B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744008A (en) * 2020-05-29 2021-12-03 北京顺源开华科技有限公司 Resource bit delivery method and device, electronic equipment and storage medium
CN111815359B (en) * 2020-07-09 2024-05-24 北京火山引擎科技有限公司 Target crowd determination method and device, electronic equipment and storage medium
CN112052259A (en) * 2020-09-28 2020-12-08 深圳前海微众银行股份有限公司 Data processing method, device, equipment and computer storage medium
CN112396462B (en) * 2020-11-26 2022-11-22 苏宁云计算有限公司 Crowd circling method and device based on ClickHouse
CN112765201A (en) * 2021-02-01 2021-05-07 武汉思普崚技术有限公司 Method and device for analyzing SQL (structured query language) statement into specific field query statement
CN113268495A (en) * 2021-05-25 2021-08-17 深圳壹账通智能科技有限公司 Data searching method and device, electronic equipment and storage medium
CN113282610A (en) * 2021-06-17 2021-08-20 金蝶软件(中国)有限公司 Data query method and data query device
CN113722318A (en) * 2021-07-23 2021-11-30 恩亿科(北京)数据科技有限公司 Storage query method, system, device and medium for user-defined crowd package
CN113590923A (en) * 2021-07-28 2021-11-02 深圳市酷开网络科技股份有限公司 Crowd delineating task splitting method, device, equipment and storage medium
CN113806451A (en) * 2021-09-17 2021-12-17 平安普惠企业管理有限公司 Data division processing method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108241627A (en) * 2016-12-23 2018-07-03 北京神州泰岳软件股份有限公司 Heterogeneous data storage and query method and system

Similar Documents

Publication Publication Date Title
CN110968582B (en) Crowd generation method and device
US11182204B2 (en) System and method for batch evaluation programs
CN106104533B (en) Handle the data set in large data repository
CN110609852B (en) Streaming data processing method and device, computer equipment and storage medium
CN110532084B (en) Platform task scheduling method, device, equipment and storage medium
CN109408541A (en) Report decomposes statistical method, system, computer equipment and storage medium
US10592507B2 (en) Query processing engine recommendation method and system
CN111813803A (en) Statement block execution plan generation method, device, equipment and storage medium
CN106874080B (en) Data calculation method and system based on distributed server cluster
CN111814041A (en) NPM package recommendation method and device, storage medium and computer equipment
US11256748B2 (en) Complex modeling computational engine optimized to reduce redundant calculations
CN112749325A (en) Training method and device for search ranking model, electronic equipment and computer medium
CN113010539A (en) Data processing method and device
CN110929207B (en) Data processing method, device and computer readable storage medium
US20130232172A1 (en) Methods and systems for matching expressions
CN116263717A (en) Order service processing method and device based on event
Ma et al. Cbbcm: Clustering based automatic service composition
CN112799797A (en) Task management method and device
US20230023134A1 (en) Lookup and relationship caches for dynamic fetching
US11616744B2 (en) Context-dependent message extraction and transformation
CN110750563A (en) Multi-model data processing method, system, device, electronic equipment and storage medium
US11809390B2 (en) Context-dependent event cleaning and publication
CN113158031B (en) Method and device for determining user resource information, computer storage medium and terminal
CN114091769A (en) Federal learning modeling optimization method based on feature engineering
CN112882803B (en) Data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee after: Jiangsu Suning Cloud Computing Co., Ltd.

Country or region after: China

Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee before: Suning Cloud Computing Co., Ltd.

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20240515

Address after: Floors 1-5, Jinshan Building, No. 8 Shanxi Road, Nanjing, Jiangsu, 210000

Patentee after: SUNING.COM Co.,Ltd.

Country or region after: China

Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee before: Jiangsu Suning Cloud Computing Co., Ltd.

Country or region before: China
