CN111787084A - Method and device for selecting object - Google Patents
- Publication number
- CN111787084A
- Authority
- CN
- China
- Prior art keywords
- data source
- label
- tag
- computer
- setting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/535—Tracking the activity of the user
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1095—Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- Computer Networks & Wireless Communication (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Computer Hardware Design (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the invention sets tags and tag conditions related to objects, sets logical relationships among different tag conditions to form a tag condition package, and then uses the tag condition package to circle (i.e., select) the specific object from the data source. The embodiment of the invention can improve the efficiency of object selection.
Description
Technical Field
The invention relates to the field of data analysis, in particular to a method and a device for selecting objects.
Background
Users on the internet generate enormous amounts of behavior data. Taking an e-commerce website as an example, behavior data records which users browsed the website and which commodities they purchased.
People who meet preset conditions can be selected by means of the behavior data. The operation of selecting, from a large group of people, a group that satisfies a predetermined condition is also called "circling people". The prior art includes schemes that circle people using spreadsheets in Microsoft Excel. Their drawback is that the specified conditions must be entered manually to screen people, and the screened people must then be analyzed with a separate data analysis system. This requires a high degree of manual participation, is inefficient, and is not very accurate.
Disclosure of Invention
In view of the problems in the prior art, the present invention aims to provide a method and an apparatus for efficiently selecting a specific object.
One aspect of the invention relates to a method of circling a particular object from a data source by a computer, comprising: setting, by a computer, a tag associated with an object; setting, by a computer, a tag condition of the tag; setting, by a computer, logical relationships among different tag conditions to form a tag condition package; and using, by a computer, the tag condition package to circle the particular object from the data source.
Yet another aspect of the invention relates to an apparatus for circling a particular object from a data source by a computer, comprising: means for setting a tag associated with the object; means for setting a tag condition of the tag; means for setting a logical relationship between different tag conditions to form a tag condition package; and means for using the tag condition package to circle the particular object from the data source.
The embodiment of the invention can reduce the manual participation degree and greatly improve the efficiency of object selection.
Drawings
FIG. 1 is a schematic diagram of a system implementing an embodiment of the invention.
FIG. 2 is a flow chart of a method of implementing an embodiment of the present invention.
Detailed Description
The content of the invention will now be described with reference to a number of exemplary embodiments. It is to be understood that these examples are set forth merely to enable those of ordinary skill in the art to better understand and thereby implement the teachings of the present invention, and are not intended to suggest any limitation as to the scope of the invention.
As used herein, the term "include" and its variants should be read as open-ended terms meaning "including, but not limited to". The term "based on" should be read as "based, at least in part, on". The terms "one embodiment" and "an embodiment" should be read as "at least one embodiment". The term "another embodiment" should be read as "at least one other embodiment".
The following description takes the circling of people as an example. As those skilled in the art will recognize, however, the present invention may also be used for scope-narrowing selection operations (i.e., circle selection operations) on many other types of objects (e.g., animals, plants, merchandise, etc.).
Tags may be defined manually on the basis of underlying data (e.g., access records, order information, user information, etc.). A tag may be formed (or "defined") by combining irregular individual fields in the underlying data (each of which can be understood as a single user action, such as a purchase or a settlement) or by combining multiple fields. For example, the tag "active user" may be defined on the basis of the Login Time in the behavior data: an "active user" is a user whose Login Time falls within the last 5 days (i.e., Login Time < 5). Similarly, the tag "drink category count" may be defined as "drink category count = number of times drink A (strawberry) was consumed + number of times drink B (mango) was consumed".
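The two tag definitions above can be illustrated with a minimal Python sketch. The record layout and the field names `last_login`, `drink_a_count`, and `drink_b_count` are assumptions made for illustration only; the patent does not specify how the underlying data is stored.

```python
from datetime import date


def days_since_login(last_login: date, today: date) -> int:
    """Number of whole days between the last login and today."""
    return (today - last_login).days


def derive_tags(record: dict, today: date, window_days: int = 5) -> dict:
    """Derive tag values from one row of underlying data.

    The field names used here are hypothetical stand-ins for whatever
    fields the underlying access and order records actually contain.
    """
    return {
        # "active user": last login falls within the last `window_days` days
        "active_user": days_since_login(record["last_login"], today) < window_days,
        # "drink category count": sum of two per-drink counters
        "drink_category_count": record.get("drink_a_count", 0)
        + record.get("drink_b_count", 0),
    }


if __name__ == "__main__":
    row = {"last_login": date(2020, 6, 20), "drink_a_count": 2, "drink_b_count": 3}
    print(derive_tags(row, today=date(2020, 6, 23)))
```

Keeping each tag as a small function of the raw fields is one way to let new tags be added without touching the selection logic that consumes them.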
A condition may be set on a tag, such as "age = 18" or "age > 18". According to business needs, defined tag conditions may also be combined into a tag condition package, e.g., "age = 18 and gender = male". A tag condition package contains the conditions of multiple tags and the logical relationships between the different tag conditions.
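One possible in-memory representation of tag conditions and a tag condition package is sketched below in Python. The class names `TagCondition` and `TagConditionPackage`, the operator table, and the `circle` helper are hypothetical; the patent describes the concepts but not a concrete data structure.

```python
import operator
from dataclasses import dataclass
from typing import Any, List, Union

# Comparison operators a tag condition may use (an assumed, illustrative set).
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


@dataclass
class TagCondition:
    tag: str
    op: str
    value: Any

    def matches(self, obj: dict) -> bool:
        return OPS[self.op](obj[self.tag], self.value)


@dataclass
class TagConditionPackage:
    logic: str  # "and" or "or"
    children: List[Union[TagCondition, "TagConditionPackage"]]

    def matches(self, obj: dict) -> bool:
        if self.logic not in ("and", "or"):
            raise ValueError(f"unsupported logic: {self.logic}")
        results = (child.matches(obj) for child in self.children)
        return all(results) if self.logic == "and" else any(results)


def circle(objects: List[dict], package: TagConditionPackage) -> List[dict]:
    """Return the objects that satisfy the tag condition package."""
    return [obj for obj in objects if package.matches(obj)]


if __name__ == "__main__":
    users = [
        {"id": 1, "age": 18, "gender": "male"},
        {"id": 2, "age": 18, "gender": "female"},
        {"id": 3, "age": 30, "gender": "male"},
    ]
    package = TagConditionPackage(
        "and", [TagCondition("age", "=", 18), TagCondition("gender", "=", "male")]
    )
    print([u["id"] for u in circle(users, package)])
```

Because a package may contain other packages, nested logic such as "(age = 18 and gender = male) or active user" can be expressed with the same two classes.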
FIG. 1 shows a schematic diagram of an exemplary system for implementing the data acquisition method of the present invention. The system may generally include three levels: an application level, a people-circling engine level, and a computing cluster level. At the application level, a user can configure tags or combine tag conditions through a visual interface on the client, which gives the user a more intuitive view. Tags or combined tag conditions may also be configured automatically by a program to achieve greater efficiency. As described above, tags may be defined manually on the basis of underlying data. Setting tags in this way provides tags that can adapt to a wide variety of scenarios, and combining them into tag condition packages then enables accurate selection of the crowd that meets the conditions.
The combined tag condition package can be issued to the people-circling engine. The SparkSQL task module creates a SparkSQL query task based on the SQL statements parsed from the tag condition package and submits the created SparkSQL task to an execution agent module in the computing cluster for execution. Executing the SQL statements parsed from the tag condition package with SparkSQL provides higher efficiency.
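As a rough illustration of parsing a tag condition package into an SQL statement, the Python sketch below turns a nested dictionary into a `SELECT` string. The dictionary layout, the table name `user_tags`, and the column name `user_id` are assumptions; the patent does not describe the parser or the table schema, and a production system would normally prefer parameter binding over string building.

```python
ALLOWED_OPS = {"=", "!=", ">", "<", ">=", "<="}


def render_value(value) -> str:
    """Render a literal for SQL; strings are quoted with '' escaping."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_where(node: dict) -> str:
    """Recursively convert a package (or single condition) to a WHERE clause."""
    if "logic" in node:
        if node["logic"] not in ("and", "or"):
            raise ValueError(f"unsupported logic: {node['logic']}")
        joiner = " AND " if node["logic"] == "and" else " OR "
        return "(" + joiner.join(to_where(child) for child in node["children"]) + ")"
    if node["op"] not in ALLOWED_OPS or not node["tag"].isidentifier():
        raise ValueError(f"invalid condition: {node}")
    return f"{node['tag']} {node['op']} {render_value(node['value'])}"


def build_query(package: dict, table: str = "user_tags", id_column: str = "user_id") -> str:
    """Build the query text that a SparkSQL task could later execute."""
    return f"SELECT {id_column} FROM {table} WHERE {to_where(package)}"


if __name__ == "__main__":
    package = {
        "logic": "and",
        "children": [
            {"tag": "age", "op": "=", "value": 18},
            {"tag": "gender", "op": "=", "value": "male"},
        ],
    }
    print(build_query(package))
```

The sketch only produces the query text; creating and submitting the SparkSQL task itself is outside its scope.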
The resource scheduling module schedules the resource requests sent by the execution agent. Ideally, the people-circling system's requests for server resources would be satisfied immediately. In practice, however, server resources are often limited (for example, on a busy Hadoop cluster), and when machine resources are insufficient a request often has to wait for some time before it obtains the corresponding resources. The resource scheduling module arranges tasks into a first-in, first-out queue in the order in which they were submitted. When resources are allocated, they are first allocated to the task at the head of the queue; once that task's requirements are met, resources are allocated to the next task, and so on, so that the query tasks are scheduled effectively.
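The first-in, first-out allocation described above can be sketched in Python as follows. The class name `FifoResourceScheduler` and the notion of abstract resource "slots" are assumptions for illustration; in the described system the actual queueing is handled by the cluster's own resource manager.

```python
from collections import deque


class FifoResourceScheduler:
    """Strict FIFO: only the task at the head of the queue may be started."""

    def __init__(self, total_slots: int):
        self.free_slots = total_slots
        self.waiting = deque()  # (task_id, slots_needed) in submission order
        self.running = {}       # task_id -> slots held

    def submit(self, task_id: str, slots_needed: int) -> list:
        """Queue a task and return the ids of any tasks that start now."""
        self.waiting.append((task_id, slots_needed))
        return self._dispatch()

    def release(self, task_id: str) -> list:
        """Free a finished task's slots and start whichever tasks now fit."""
        self.free_slots += self.running.pop(task_id)
        return self._dispatch()

    def _dispatch(self) -> list:
        started = []
        # Serve the head of the queue first; later tasks never jump ahead.
        while self.waiting and self.waiting[0][1] <= self.free_slots:
            task_id, slots = self.waiting.popleft()
            self.free_slots -= slots
            self.running[task_id] = slots
            started.append(task_id)
        return started


if __name__ == "__main__":
    scheduler = FifoResourceScheduler(total_slots=4)
    print(scheduler.submit("a", 3))
    print(scheduler.submit("b", 2))
    print(scheduler.submit("c", 1))
    print(scheduler.release("a"))
```

Note that under strict FIFO a small task can wait behind a large one even when enough resources are free for it; this is the behavior the paragraph above describes, not an optimization of it.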
The people-circling module can be a distributed cluster whose bottom layer is built on Hadoop. It executes the SparkSQL task on a data source (not shown in FIG. 1) according to the tag condition package to obtain the corresponding crowd result data, and stores the generated crowd package in a Hive crowd table (i.e., a Hive data source).
The data synchronization module in the people-circling engine can create a data synchronization task (such as a DataX task) and synchronize the crowd result data in the Hive data source into a local database (the people-circling system database) based on the mapping relationship between the local database and the Hive data source. The crowd result data (i.e., target data) synchronized into the people-circling system database may be exported manually by the user.
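The idea of synchronizing through a mapping relationship can be shown with a small Python sketch. It does not use DataX, Hive, or MySQL: a list of dictionaries stands in for rows read from the Hive crowd table, an in-memory SQLite database stands in for the local database, and the column names in `COLUMN_MAPPING` are hypothetical.

```python
import sqlite3

# Hypothetical mapping: source (Hive-side) column -> target (local) column.
COLUMN_MAPPING = {"uid": "user_id", "pkg_id": "package_id"}


def sync_rows(source_rows, mapping, conn, target_table: str) -> int:
    """Copy rows into the target table, renaming columns via the mapping."""
    target_cols = list(mapping.values())
    placeholders = ", ".join("?" for _ in target_cols)
    sql = (
        f"INSERT INTO {target_table} ({', '.join(target_cols)}) "
        f"VALUES ({placeholders})"
    )
    # Dict order is preserved, so source keys line up with target columns.
    params = [tuple(row[src] for src in mapping) for row in source_rows]
    with conn:  # commit on success, roll back on error
        conn.executemany(sql, params)
    return len(params)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crowd_result (user_id INTEGER, package_id INTEGER)")
    hive_rows = [{"uid": 1, "pkg_id": 7}, {"uid": 4, "pkg_id": 7}]
    print(sync_rows(hive_rows, COLUMN_MAPPING, conn, "crowd_result"))
    print(conn.execute("SELECT user_id, package_id FROM crowd_result").fetchall())
```

Keeping the mapping as data rather than code means the same routine can serve different crowd tables, which is roughly the role the mapping relationship plays when a synchronization task is created.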
In one embodiment, a real-time log and the task status of the query task may be sent to the client so that the user can better monitor status and progress. In one embodiment, the query task may be executed at a preset period (for example, once per week) to update the crowd result data, thereby achieving better automation and improving efficiency.
FIG. 2 illustrates a workflow of a people-circling engine according to one embodiment of the invention, including the steps of:
step 1: the user combines tags at the application layer, freely configuring tag combinations that include equal, and, or, and unequal logic;
step 2: determining the tag logic and issuing a people-circling request;
step 3: parsing the tag logic to generate SQL statements;
step 4: calling a development platform interface with the SQL to create a Spark task;
step 5: dispatching the Spark task to an execution agent through a task dispatching algorithm;
step 6: the execution agent submits the task to the YARN queue;
step 7: waiting for allocation of computing resources;
step 8: once computing resources are allocated, starting to execute the task, computing the crowd data from the data source, and importing the crowd data into a Hive data source;
step 9: establishing a mapping relationship between the Hive data source and the MySQL database of the people-circling system, and creating a DataX task;
step 10: executing the DataX task to synchronize the crowd result data into MySQL, where it is available to the people-circling system for downloading the crowd package. Hive, MySQL, and DataX are simple to implement together and execute efficiently.
The steps, methods and apparatus of the embodiments of the present invention may be implemented as a pure software module (e.g., a software program written in Java language), as a pure hardware module (e.g., a special-purpose ASIC chip or FPGA chip) as required, or as a module combining software and hardware (e.g., a firmware system storing fixed code).
Another aspect of the invention is a computer-readable medium having computer-readable instructions stored thereon that, when executed, perform a method of embodiments of the invention.
It will be appreciated by persons skilled in the art that the foregoing description is only exemplary of the invention and is not intended to limit the invention. The present invention may include various modifications and variations. Any modifications and variations within the spirit and scope of the present invention should be included within the scope of the present invention.
Claims (11)
1. A method for computer-based selection of a particular object from a data source, comprising:
setting, by a computer, a tag associated with an object;
setting, by a computer, a tag condition of the tag;
setting, by a computer, logical relationships among different tag conditions to form a tag condition package; and
using, by a computer, the tag condition package to circle the particular object from the data source.
2. The method of claim 1, wherein said operation of using, by a computer, the tag condition package to circle the particular object from the data source comprises:
parsing the tag condition package to generate an SQL statement;
creating a query task based on the SQL statement;
scheduling the query task for execution against the data source.
3. The method of claim 2, wherein the query task is a SparkSQL task.
4. The method of claim 1, further comprising:
importing data representing the circled specific object into a second data source; and
establishing a mapping relation between the second data source and a local database;
synchronizing the data from the second data source to the local database based on the mapping relationship.
5. The method of claim 4, wherein the second data source is a Hive data source, the local database is a MySQL database, and the synchronization operation uses DataX.
6. The method of claim 2, wherein the scheduling operation comprises:
submitting the query task to an execution agent module;
the execution agent module sends out a resource request; and
if the resource request cannot be met immediately, the query task is put into a first-in first-out queue to wait until the resource request is met.
7. The method of claim 1, wherein the setting operation of the tag and the tag condition is performed through a visualized interface or program.
8. The method of claim 2, further comprising:
sending the real-time log and the state of the query task to a client.
9. The method of claim 2, wherein the query task is scheduled to be executed at a preset period.
10. An apparatus for computer-based selection of a particular object from a data source, comprising:
means for setting a tag associated with the object;
means for setting a tag condition of the tag;
means for setting a logical relationship between different tag conditions to form a tag condition package; and
means for using the tag condition package to circle the particular object from the data source.
11. A computer readable medium having stored thereon computer readable instructions capable, when executed, of performing the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010581611.5A CN111787084A (en) | 2020-06-23 | 2020-06-23 | Method and device for selecting object |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111787084A true CN111787084A (en) | 2020-10-16 |
Family
ID=72757190
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010581611.5A Pending CN111787084A (en) | 2020-06-23 | 2020-06-23 | Method and device for selecting object |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111787084A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140307098A1 (en) * | 2013-04-15 | 2014-10-16 | Microsoft Corporation | Extracting true color from a color and infrared sensor |
CN108304591A (en) * | 2018-03-16 | 2018-07-20 | 深圳市买买提信息科技有限公司 | A kind of method for customizing of label, system and terminal device |
CN109471904A (en) * | 2018-11-01 | 2019-03-15 | 杭州数澜科技有限公司 | A kind of method and system for tissue label |
CN110648185A (en) * | 2019-11-28 | 2020-01-03 | 苏宁云计算有限公司 | Target crowd circling method and device and computer equipment |
US20200067789A1 (en) * | 2016-06-24 | 2020-02-27 | QiO Technologies Ltd. | Systems and methods for distributed systemic anticipatory industrial asset intelligence |
CN110955690A (en) * | 2019-08-21 | 2020-04-03 | 广州云徙科技有限公司 | Self-service data labeling platform and self-service data labeling method based on big data technology |
-
2020
- 2020-06-23 CN CN202010581611.5A patent/CN111787084A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20201016 |