CN111787084A - Method and device for selecting object - Google Patents

Method and device for selecting object

Info

Publication number
CN111787084A
Authority
CN
China
Prior art keywords
data source
label
tag
computer
setting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010581611.5A
Other languages
Chinese (zh)
Inventor
梁爽
江敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dtwave Technology Co ltd
Original Assignee
Hangzhou Dtwave Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dtwave Technology Co ltd
Priority to CN202010581611.5A
Publication of CN111787084A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/535 Tracking the activity of the user
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention sets tags and tag conditions related to objects, sets logical relationships among the different tag conditions to form a tag condition package, and then uses the tag condition package to select ("circle") specific objects from a data source. The embodiment can improve the efficiency of object selection.

Description

Method and device for selecting object
Technical Field
The invention relates to the field of data analysis, in particular to a method and a device for selecting objects.
Background
Users on the internet generate enormous amounts of behavioral data. On an e-commerce website, for example, behavioral data records which users browsed the site and which commodities they purchased.
People who meet preset conditions can be selected on the basis of this behavioral data. The operation of selecting, from a large population, a group of people who satisfy a predetermined condition is also called "circling people". One prior-art scheme circles people using spreadsheets in Microsoft Excel. Its drawback is that the specified conditions must be entered manually to screen people, and the screened people must then be analyzed with a data analysis system, so the scheme requires heavy manual involvement and suffers from low efficiency and low accuracy.
Disclosure of Invention
In view of the problems in the prior art, the present invention aims to provide a method and an apparatus for efficiently selecting a specific object.
One aspect of the invention relates to a method of selecting ("circling") a specific object from a data source by computer, comprising: setting, by a computer, a tag associated with an object; setting, by a computer, a tag condition for the tag; setting, by a computer, logical relationships among different tag conditions to form a tag condition package; and using, by a computer, the tag condition package to circle the specific object from the data source.
Yet another aspect of the invention relates to an apparatus for circling a specific object from a data source by computer, comprising: means for setting a tag associated with the object; means for setting a tag condition for the tag; means for setting logical relationships among different tag conditions to form a tag condition package; and means for using the tag condition package to circle the specific object from the data source.
Embodiments of the invention can reduce manual involvement and greatly improve the efficiency of object selection.
Drawings
FIG. 1 is a schematic diagram of a system implementing an embodiment of the invention.
FIG. 2 is a flow chart of a method of implementing an embodiment of the present invention.
Detailed Description
The content of the invention will now be described with reference to a number of exemplary embodiments. It is to be understood that these examples are set forth merely to enable those of ordinary skill in the art to better understand and thereby implement the teachings of the present invention, and are not intended to suggest any limitation as to the scope of the invention.
As used herein, the term "include" and its variants should be read as open-ended terms meaning "including, but not limited to". The term "based on" should be read as "based, at least in part, on". The terms "one embodiment" and "an embodiment" should be read as "at least one embodiment". The term "another embodiment" should be read as "at least one other embodiment".
The following description uses the circling of people as an example. As those skilled in the art will recognize, however, the present invention may also be used for scope-narrowing selection operations (i.e., circle selection operations) on many other types of objects (e.g., animals, plants, merchandise).
Tags may be obtained by manual definition based on underlying data (e.g., access records, order information, user information). A tag may be formed (or "defined") from an individual field in the underlying data (which may be understood as a single user action, such as a purchase or a settlement) or from a combination of multiple fields. For example, the tag "active user" may be defined from the Login Time in the behavioral data: an "active user" is a user whose Login Time falls within the last 5 days (i.e., Login Time < 5 days). Similarly, the tag "drink category count" may be defined as the number of times drink A (strawberry) was consumed plus the number of times drink B (mango) was consumed.
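The two tag definitions in this paragraph can be illustrated with a minimal Python sketch. The field names (`last_login`, `drink_a_count`, `drink_b_count`), the sample records, and the reference date are hypothetical and are not taken from the patent; the sketch only shows how a tag might be computed from one field or from a combination of fields.

```python
from datetime import datetime, timedelta


def is_active_user(last_login, now, window_days=5):
    """Tag 'active user': the last login falls within the past `window_days` days."""
    return timedelta(0) <= now - last_login < timedelta(days=window_days)


def drink_category_count(record):
    """Tag built from two fields: sum of two per-drink counters (hypothetical names)."""
    return record.get("drink_a_count", 0) + record.get("drink_b_count", 0)


# Hypothetical underlying data, one record per user.
now = datetime(2020, 6, 23)
users = [
    {"user_id": 1, "last_login": datetime(2020, 6, 21), "drink_a_count": 2, "drink_b_count": 3},
    {"user_id": 2, "last_login": datetime(2020, 6, 1), "drink_a_count": 0, "drink_b_count": 1},
]

# Compute both tags for every user.
tags = {
    u["user_id"]: {
        "active_user": is_active_user(u["last_login"], now),
        "drink_category_count": drink_category_count(u),
    }
    for u in users
}
```

Here user 1 logged in two days before the reference date and therefore receives the "active user" tag, while user 2 does not.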
A condition may be set on a tag, such as "age = 18" or "age > 18". Defined tag conditions may also be combined, according to business needs, into a tag condition package, e.g., "age = 18 and gender = male". A tag condition package contains the conditions of multiple tags and the logical relationships between the different tag conditions.
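One possible way to represent a tag condition package is as a small tree in which leaves are tag conditions and inner nodes carry the logical relationship. The following Python sketch is an assumption made for illustration: the class names `TagCondition` and `TagConditionPackage`, the operator set, and the sample people are hypothetical, and the patent does not prescribe any particular data structure.

```python
import operator
from dataclasses import dataclass
from typing import Any, List, Union

# Comparison operators a tag condition may use (illustrative set).
OPS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}


@dataclass
class TagCondition:
    tag: str
    op: str
    value: Any

    def matches(self, obj):
        # An object without the tag never satisfies the condition.
        return self.tag in obj and OPS[self.op](obj[self.tag], self.value)


@dataclass
class TagConditionPackage:
    logic: str  # "and" or "or"
    children: List[Union["TagCondition", "TagConditionPackage"]]

    def __post_init__(self):
        if self.logic not in ("and", "or"):
            raise ValueError(f"unsupported logic: {self.logic!r}")

    def matches(self, obj):
        results = (child.matches(obj) for child in self.children)
        return all(results) if self.logic == "and" else any(results)


# "age = 18 and gender = male", as in the example in the text.
package = TagConditionPackage("and", [
    TagCondition("age", "=", 18),
    TagCondition("gender", "=", "male"),
])

people = [
    {"id": 1, "age": 18, "gender": "male"},
    {"id": 2, "age": 18, "gender": "female"},
    {"id": 3, "age": 30, "gender": "male"},
]
selected = [p["id"] for p in people if package.matches(p)]
```

Because packages can nest, a combination such as "(age > 18 and gender = male) or active user" can be expressed by placing one package inside another.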
FIG. 1 shows a schematic diagram of an exemplary system for implementing the data acquisition method of the present invention. The system may generally include three layers: an application layer, a people-circling engine layer, and a compute cluster layer. In the application layer, a user can configure tags or combine tag conditions through a visual interface at the client, which gives the user a more intuitive view. Tags or combined tag conditions may also be configured automatically by a program to achieve greater efficiency. As described above, tags may be obtained by manual definition based on underlying data. Setting tags in this way provides tags that adapt to a wide variety of scenarios, and combining them into tag condition packages then enables accurate selection of the people who meet the conditions.
The combined tag condition package can be issued to the people-circling engine. The SparkSQL task module creates a SparkSQL query task from the SQL statements obtained by parsing the tag condition package, and submits the created SparkSQL task to an execution agent module in the compute cluster for execution. Using SparkSQL to execute the SQL statements parsed from the tag condition package yields higher efficiency.
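The parsing step described above, in which a tag condition package is turned into an SQL statement, might look like the following Python sketch. The nested-dictionary layout of the package, the table name `user_tags`, and the column name `user_id` are assumptions for illustration only; the patent does not specify the package format or the generated SQL. To avoid making claims about any SQL dialect's escaping rules, the sketch simply rejects string values that contain quotes or backslashes.

```python
ALLOWED_OPS = {"=", "!=", ">", "<", ">=", "<="}


def sql_literal(value):
    """Render a Python value as a simple SQL literal."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    if "'" in text or "\\" in text:
        # Escaping rules differ between SQL dialects, so refuse instead of guessing.
        raise ValueError(f"unsupported characters in value: {text!r}")
    return f"'{text}'"


def to_where(node):
    """Recursively turn a package (or a single condition) into a WHERE expression."""
    if "logic" in node:
        if node["logic"] not in ("and", "or"):
            raise ValueError(f"unsupported logic: {node['logic']!r}")
        joiner = " AND " if node["logic"] == "and" else " OR "
        return "(" + joiner.join(to_where(child) for child in node["conditions"]) + ")"
    if node["op"] not in ALLOWED_OPS or not node["tag"].isidentifier():
        raise ValueError(f"invalid condition: {node!r}")
    return f"{node['tag']} {node['op']} {sql_literal(node['value'])}"


def to_sql(package, source_table, id_column="user_id"):
    """Build the query that selects the ids of all objects matching the package."""
    if not (source_table.isidentifier() and id_column.isidentifier()):
        raise ValueError("invalid table or column name")
    return f"SELECT {id_column} FROM {source_table} WHERE {to_where(package)}"


package = {
    "logic": "and",
    "conditions": [
        {"tag": "age", "op": ">", "value": 18},
        {"tag": "gender", "op": "=", "value": "male"},
    ],
}
sql = to_sql(package, "user_tags")
# sql == "SELECT user_id FROM user_tags WHERE (age > 18 AND gender = 'male')"
```

In a PySpark environment, a string like this could then be handed to `SparkSession.sql`; that step is omitted here so the sketch runs without a Spark installation.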
The resource scheduling module schedules the resource requests sent by the execution agent. Ideally, the people-circling system's requests for server resources would be satisfied immediately. In practice, however, server resources are often limited (for example, on a busy Hadoop cluster), and if machine resources are insufficient, a request often has to wait for some time before it obtains the corresponding resources. The resource scheduling module arranges the tasks in a first-in, first-out queue in order of submission. When resources are allocated, they go first to the task at the head of the queue; once that task's requirements are met, resources are allocated to the next task, and so on, which achieves effective scheduling of the query tasks.
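The first-in, first-out allocation described above can be sketched in Python as follows. The class `FifoScheduler`, the notion of abstract resource "slots", and the task names are hypothetical; a real deployment would rely on the cluster's own scheduler (the workflow below mentions a YARN queue) rather than on code like this.

```python
from collections import deque


class FifoScheduler:
    """Strict FIFO allocation: only the task at the head of the queue may start."""

    def __init__(self, total_slots):
        self.total_slots = total_slots
        self.free_slots = total_slots
        self.queue = deque()   # waiting tasks, in submission order
        self.running = []      # (task_id, slots) pairs currently holding resources

    def submit(self, task_id, slots_needed):
        if slots_needed > self.total_slots:
            raise ValueError("request can never be satisfied")
        self.queue.append((task_id, slots_needed))
        self._dispatch()

    def release(self, task_id):
        """Return a finished task's slots to the pool and start waiting tasks."""
        for i, (tid, slots) in enumerate(self.running):
            if tid == task_id:
                del self.running[i]
                self.free_slots += slots
                break
        self._dispatch()

    def _dispatch(self):
        # Serve the head of the queue first; later tasks wait even if they would fit.
        while self.queue and self.queue[0][1] <= self.free_slots:
            task_id, slots = self.queue.popleft()
            self.free_slots -= slots
            self.running.append((task_id, slots))


scheduler = FifoScheduler(total_slots=4)
scheduler.submit("query-1", 3)   # starts immediately
scheduler.submit("query-2", 2)   # must wait: only 1 slot is free
scheduler.submit("query-3", 1)   # would fit, but waits behind query-2
running_before = [tid for tid, _ in scheduler.running]
waiting_before = [tid for tid, _ in scheduler.queue]

scheduler.release("query-1")     # frees 3 slots; query-2, then query-3 start
running_after = [tid for tid, _ in scheduler.running]
```

The sketch deliberately keeps `query-3` waiting behind `query-2`, which reflects the strict ordering the paragraph describes at the cost of leaving a free slot idle for a while.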
The people-circling module may be a distributed cluster built on top of Hadoop. It executes the SparkSQL task against a data source (not shown in FIG. 1) according to the tag condition package to obtain the corresponding crowd result data, and stores the generated crowd package in a Hive crowd table (i.e., a Hive data source).
The data synchronization module in the people-circling engine can create a data synchronization task (such as a DataX task) and synchronize the crowd result data in the Hive data source into a local database (the people-circling system database) based on the mapping relationship between the local database and the Hive data source. The crowd result data (i.e., target data) synchronized into the people-circling system database may be exported manually by the user.
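The idea of synchronizing result data according to a mapping relationship can be illustrated with the Python sketch below. It is not DataX and does not use Hive or MySQL: an in-memory SQLite database stands in for the local database, a list of dictionaries stands in for rows read from the Hive crowd table, and all table and column names are hypothetical.

```python
import sqlite3


def sync_rows(source_rows, column_mapping, dest_conn, dest_table):
    """Copy rows into the destination table, renaming columns via the mapping.

    `column_mapping` maps source column names to destination column names.
    """
    dest_columns = list(column_mapping.values())
    placeholders = ", ".join("?" for _ in dest_columns)
    insert_sql = (
        f"INSERT INTO {dest_table} ({', '.join(dest_columns)}) "
        f"VALUES ({placeholders})"
    )
    # Pull the values out of each source row in mapping order.
    rows = [tuple(row[src] for src in column_mapping) for row in source_rows]
    dest_conn.executemany(insert_sql, rows)
    dest_conn.commit()
    return len(rows)


# Stand-in for crowd result data read from the Hive crowd table.
hive_rows = [
    {"uid": 1, "pkg": "young_male"},
    {"uid": 7, "pkg": "young_male"},
]
# Mapping relationship between source columns and local database columns.
mapping = {"uid": "user_id", "pkg": "crowd_package"}

# Stand-in for the local (people-circling system) database.
local_db = sqlite3.connect(":memory:")
local_db.execute("CREATE TABLE crowd_result (user_id INTEGER, crowd_package TEXT)")

synced = sync_rows(hive_rows, mapping, local_db, "crowd_result")
stored = local_db.execute(
    "SELECT user_id, crowd_package FROM crowd_result ORDER BY user_id"
).fetchall()
```

Keeping the mapping as plain data separates what is copied from how it is copied, which is the role the mapping relationship plays in the description.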
In one embodiment, a real-time log and the task status of the query task may be sent to the client so that the user can better monitor status and progress. In one embodiment, the query task may be executed at a preset period (for example, once per week) to update the crowd result data, which improves automation and efficiency.
FIG. 2 illustrates a workflow of a people-circling engine according to one embodiment of the invention, including the following steps:
Step 1: the user combines tags at the application layer, freely configuring tag combinations that use logic such as AND, OR, equal to, and not equal to;
Step 2: the tag logic is determined and a people-circling request is issued;
Step 3: the tag logic is parsed to generate SQL statements;
Step 4: a development platform interface is called with the SQL to create a Spark task;
Step 5: the Spark task is dispatched to an execution agent by a task dispatching algorithm;
Step 6: the execution agent submits the task to the YARN queue;
Step 7: the task waits for computing resources to be allocated;
Step 8: once computing resources are allocated, the task starts executing, computes the circled-people data from the data source, and imports that data into a Hive data source;
Step 9: a mapping relationship is established between the Hive data source and the MySQL database of the people-circling system, and a DataX task is created;
Step 10: the DataX task is executed, synchronizing the crowd result data into MySQL, where it is available to the people-circling system for downloading the crowd package.
Using Hive, MySQL and DataX together is simple to implement and efficient to execute.
The steps, methods and apparatus of the embodiments of the present invention may be implemented as a pure software module (e.g., a software program written in Java language), as a pure hardware module (e.g., a special-purpose ASIC chip or FPGA chip) as required, or as a module combining software and hardware (e.g., a firmware system storing fixed code).
Another aspect of the invention is a computer-readable medium having computer-readable instructions stored thereon that, when executed, perform a method of embodiments of the invention.
It will be appreciated by persons skilled in the art that the foregoing description is only exemplary of the invention and is not intended to limit the invention. The present invention may include various modifications and variations. Any modifications and variations within the spirit and scope of the present invention should be included within the scope of the present invention.

Claims (11)

1. A method for computer-based selection of a particular object from a data source, comprising:
setting, by a computer, a tag associated with an object;
setting, by a computer, a tag condition of the tag;
setting, by a computer, logical relationships among different tag conditions to form a tag condition package; and
using, by a computer, the tag condition package to circle the particular object from the data source.
2. The method of claim 1, wherein said using, by a computer, the tag condition package to circle the particular object from the data source comprises:
parsing the tag condition package to generate an SQL statement;
creating a query task based on the SQL statement; and
scheduling the query task for execution against the data source.
3. The method of claim 2, wherein the query task is a SparkSQL task.
4. The method of claim 1, further comprising:
importing data representing the circled specific object into a second data source;
establishing a mapping relationship between the second data source and a local database; and
synchronizing the data from the second data source to the local database based on the mapping relationship.
5. The method of claim 4, wherein the second data source is a Hive data source, the local database is a MySQL database, and the synchronization operation uses DataX.
6. The method of claim 2, wherein the scheduling operation comprises:
submitting the query task to an execution agent module;
sending, by the execution agent module, a resource request; and
if the resource request cannot be satisfied immediately, placing the query task in a first-in, first-out queue to wait until the resource request is satisfied.
7. The method of claim 1, wherein the setting of the tag and of the tag condition is performed through a visual interface or by a program.
8. The method of claim 2, further comprising:
sending a real-time log and the status of the query task to a client.
9. The method of claim 2, wherein the query task is scheduled to be executed at a preset period.
10. An apparatus for computer-based selection of a particular object from a data source, comprising:
means for setting a tag associated with the object;
means for setting a tag condition of the tag;
means for setting logical relationships among different tag conditions to form a tag condition package; and
means for using the tag condition package to circle the particular object from the data source.
11. A computer readable medium having stored thereon computer readable instructions capable, when executed, of performing the method of any one of claims 1 to 9.
CN202010581611.5A 2020-06-23 2020-06-23 Method and device for selecting object Pending CN111787084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010581611.5A CN111787084A (en) 2020-06-23 2020-06-23 Method and device for selecting object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010581611.5A CN111787084A (en) 2020-06-23 2020-06-23 Method and device for selecting object

Publications (1)

Publication Number Publication Date
CN111787084A (en) 2020-10-16

Family

ID=72757190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010581611.5A Pending CN111787084A (en) 2020-06-23 2020-06-23 Method and device for selecting object

Country Status (1)

Country Link
CN (1) CN111787084A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140307098A1 (en) * 2013-04-15 2014-10-16 Microsoft Corporation Extracting true color from a color and infrared sensor
CN108304591A (en) * 2018-03-16 2018-07-20 深圳市买买提信息科技有限公司 A kind of method for customizing of label, system and terminal device
CN109471904A (en) * 2018-11-01 2019-03-15 杭州数澜科技有限公司 A kind of method and system for tissue label
CN110648185A (en) * 2019-11-28 2020-01-03 苏宁云计算有限公司 Target crowd circling method and device and computer equipment
US20200067789A1 (en) * 2016-06-24 2020-02-27 QiO Technologies Ltd. Systems and methods for distributed systemic anticipatory industrial asset intelligence
CN110955690A (en) * 2019-08-21 2020-04-03 广州云徙科技有限公司 Self-service data labeling platform and self-service data labeling method based on big data technology

Similar Documents

Publication Publication Date Title
US9277003B2 (en) Automated cloud workload management in a map-reduce environment
CN110096344A (en) Task management method, system, server cluster and computer-readable medium
US20070005623A1 (en) Process oriented message driven workflow programming model
US20140172809A1 (en) Hadoop access via hadoop interface services based on function conversion
US20100318492A1 (en) Data analysis system and method
US20140059054A1 (en) Parallel generation of topics from documents
US20090282413A1 (en) Scalable Scheduling of Tasks in Heterogeneous Systems
CN103092683A (en) Scheduling used for analyzing data and based on elicitation method
US20200285508A1 (en) Method and Apparatus for Assigning Computing Task
Vinod et al. Simulation-based metamodels for scheduling a dynamic job shop with sequence-dependent setup times
CN110457333B (en) Data real-time updating method and device and computer readable storage medium
US20170046376A1 (en) Method and system for monitoring data quality and dependency
US9141936B2 (en) Systems and methods for simulating a resource constrained process
CN109978392A (en) Agile Software Development management method, device, electronic equipment, storage medium
CN110673959A (en) System, method and apparatus for processing tasks
CN114090608A (en) Data report generation method and device
CN115202847A (en) Task scheduling method and device
US20210390496A1 (en) Method for model-based project scoring classification and reporting
US20110276358A1 (en) Allocation of work items via queries of organizational structure and dynamic work item allocation
CN1783121A (en) Method and system for executing design automation
CN112363914A (en) Parallel test resource configuration optimization method, computing device and storage medium
CN111787084A (en) Method and device for selecting object
CN109829005A (en) A kind of big data processing method and processing device
US10250716B2 (en) Priority-driven boxcarring of action requests from component-driven cloud applications
CN114066507A (en) Promotion information analysis method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20201016)