CN106980865B - Method and device for optimizing extraction performance in multi-condition extraction - Google Patents



Publication number
CN106980865B
Authority
CN
China
Prior art keywords
conditions
condition
expression
execution
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610034045.XA
Other languages
Chinese (zh)
Other versions
CN106980865A (en
Inventor
徐磊石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201610034045.XA
Publication of CN106980865A
Application granted
Publication of CN106980865B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a method and a device for optimizing extraction performance in multi-condition extraction. The method comprises the following steps: judging whether the expression formed by the current conditions contains only one condition; if not, determining the optimal execution order of each consecutive all-AND part and/or consecutive all-OR part in the expression formed by the current conditions; treating each consecutive all-AND part and/or consecutive all-OR part, in its optimal execution order, as a single condition and replacing each with a variable to form a new expression containing the variables; returning to the judging step for the new expression; if so, replacing the variables back with the original expressions to obtain the expression giving the optimal execution order of the plurality of conditions; and performing the target extraction in the optimal execution order of the plurality of conditions. The method and the device automatically determine the multi-condition execution order and effectively improve target extraction performance.

Description

Method and device for optimizing extraction performance in multi-condition extraction
Technical Field
The present application relates to the field of computers, and in particular, to a method and an apparatus for optimizing extraction performance in multi-condition extraction.
Background
Multi-condition extraction is a target extraction process that extracts, from the full data, the targets that satisfy a plurality of conditions simultaneously. Multi-condition extraction occurs in many scenarios, such as database queries with multiple conditions, suspicious-data filtering with multiple conditions, and data calls or extractions with multiple conditions. Currently, when target extraction is performed using a plurality of conditions, the relationship between the conditions is fixed, and target extraction is performed using that fixed relationship and a fixed execution order. However, because each condition has a different execution time and is executed a different number of times, the execution order of the conditions directly affects target extraction performance. To improve target extraction performance, manual intervention is mostly used at present: the execution order of the conditions is adjusted by hand so that the conditions that are computationally heavy and time-consuming are executed last, which reduces the number of times they are executed and improves target extraction performance.
With manual intervention it is difficult to obtain the optimal adjustment quickly when there are many conditions, and the adjustment cannot be carried out automatically in batches. It is therefore necessary to provide a method that automatically determines the execution order of the conditions in multi-condition extraction so as to achieve optimal extraction performance.
Disclosure of Invention
One of the technical problems solved by the present application is to provide a method and a system for optimizing extraction performance in multi-condition extraction, so as to automatically determine the execution sequence of the multi-condition during the multi-condition extraction and automatically improve the target extraction performance.
According to an embodiment of an aspect of the present application, there is provided a method for optimizing extraction performance in multi-condition extraction, where the method is used in a target extraction scenario using multiple conditions, and automatically determines an execution order of the multiple conditions to improve extraction performance, and the method includes:
exhausting different execution sequences of the plurality of conditions for target extraction;
calculating a performance parameter value corresponding to each execution sequence;
selecting an execution order in which performance parameter values are maximum as an execution order of the plurality of conditions of the determined target extraction;
performing the target extraction in the determined execution order of the plurality of conditions.
According to an embodiment of another aspect of the present application, there is provided a method for optimizing extraction performance in multi-condition extraction, the method being used for a target extraction scenario using a plurality of conditions, and automatically determining an execution order of the plurality of conditions to improve extraction performance, the method including:
judging whether an expression formed by the current conditions only contains one condition;
if the expression composed of the current conditions does not contain only one condition, determining the optimal execution order of each consecutive all-AND part and/or consecutive all-OR part in the expression composed of the current conditions;
treating each consecutive all-AND part and/or consecutive all-OR part, in its optimal execution order, as a single condition and replacing each with a variable to form a new expression containing the variables;
returning and executing the step of judging whether the expression formed by the current conditions only contains one condition or not aiming at the new expression;
if the expression composed of the current conditions only contains one condition, replacing the variable by using the original expression to obtain the optimal execution sequence expression of the conditions;
the target extraction is performed in an optimal execution order of the plurality of conditions.
According to another aspect of the present application, there is provided an apparatus for optimizing extraction performance in multi-condition extraction, in a target extraction scenario using multiple conditions, automatically determining an execution order of the multiple conditions to improve extraction performance, the apparatus including:
the exhaustion unit is used for exhausting different execution sequences of the plurality of conditions during target extraction;
a performance parameter value calculation unit for calculating a performance parameter value corresponding to each execution sequence;
a selection unit configured to select an execution order in which performance parameter values are largest as an execution order of the plurality of conditions of the determined target extraction;
an object extraction unit configured to perform the object extraction in the execution order of the plurality of conditions determined by the selection unit.
According to another aspect of the present application, there is provided an apparatus for optimizing extraction performance in multi-condition extraction, the apparatus being configured to perform a target extraction scenario using a plurality of conditions, and automatically determining an execution order of the plurality of conditions to improve extraction performance, the apparatus including:
the judging unit is used for judging whether the expression formed by the current conditions only contains one condition;
the optimal execution order determining unit is used for determining the optimal execution order of each consecutive all-AND part and/or consecutive all-OR part in the expression composed of the current conditions when that expression contains more than one condition;
the replacing unit is used for treating each consecutive all-AND part and/or consecutive all-OR part, in its optimal execution order, as a single condition, replacing each with a variable to form a new expression containing the variables, and passing the new expression to the judging unit to execute the judging operation;
the reverse replacement unit is used for replacing the variable back by using the original expression under the condition that the expression formed by the current conditions only contains one condition to obtain the optimal execution sequence expression of the conditions;
an object extraction unit configured to perform the object extraction in an optimal execution order of the plurality of conditions.
By exhausting the different execution orders of the plurality of conditions and determining the performance parameter value of each execution order, the execution order with the maximum performance parameter value can be selected as the execution order of the plurality of conditions for target extraction; alternatively, the optimal execution order of the plurality of conditions is determined recursively. Either way, the execution order of the plurality of conditions in multi-condition extraction is determined automatically, and target extraction performance is improved automatically.
It will be appreciated by those of ordinary skill in the art that although the following detailed description will proceed with reference being made to illustrative embodiments, the present application is not intended to be limited to these embodiments. Rather, the scope of the application is broad and is intended to be defined only by the claims that follow.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram of a method for optimizing extraction performance in multi-conditional extraction according to one embodiment of the present application.
FIG. 2 is a schematic diagram of conditional extraction according to one embodiment of the present application.
FIG. 3 is a flow diagram of a method for optimizing extraction performance in multi-conditional extraction according to another embodiment of the present application.
Fig. 4 is a schematic structural diagram of an apparatus for optimizing extraction performance in multi-conditional extraction according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of an apparatus for optimizing extraction performance in multi-conditional extraction according to another embodiment of the present application.
Detailed Description
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The computer equipment comprises user equipment and network equipment. The user equipment includes but is not limited to computers, smart phones, PDAs, etc.; the network equipment includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud based on cloud computing and consisting of a large number of computers or network servers, where cloud computing is a form of distributed computing: a super virtual computer composed of a collection of loosely coupled computers. The computer equipment can operate independently to implement the application, or can access a network and implement the application by interacting with other computer equipment in the network. The network in which the computer equipment is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
It should be noted that the user equipment, the network device, the network, etc. are only examples, and other existing or future computer devices or networks may also be included in the scope of the present application, if applicable, and are included by reference.
The methods discussed below, some of which are illustrated by flow diagrams, may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine or computer readable medium such as a storage medium. The processor(s) may perform the necessary tasks.
Specific structural and functional details disclosed herein are merely representative and are provided for purposes of describing example embodiments of the present application. This application may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element may be termed a second element, and, similarly, a second element may be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected" or "directly coupled" to another element, there are no intervening elements present. Other words used to describe the relationship between elements (e.g., "between" versus "directly between", "adjacent" versus "directly adjacent to", etc.) should be interpreted in a similar manner.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
The technical solution of the present application is further described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a method for optimizing extraction performance in multi-condition extraction, according to an embodiment of the present application, in a target extraction scenario using multiple conditions, automatically determining an execution order of the multiple conditions to improve extraction performance, where the method includes:
S110, exhausting the different execution orders of the plurality of conditions for target extraction;
S120, calculating the performance parameter value corresponding to each execution order;
S130, selecting the execution order with the maximum performance parameter value as the determined execution order of the plurality of conditions for target extraction;
S140, performing the target extraction in the determined execution order of the plurality of conditions.
To further understand the present solution, the above steps will be described in further detail.
Since the relationship between the plurality of conditions is fixed when target extraction is performed using the plurality of conditions, exhausting the different execution orders in step S110 means varying the execution order of the conditions while keeping the fixed relationship between them unchanged. That is, in this embodiment, exhausting the different execution orders of the plurality of conditions for target extraction includes: acquiring the relationship between the conditions, and exhausting the different execution orders of the conditions for target extraction while ensuring that the relationship between the conditions is unchanged. Once it is decided that target extraction uses the plurality of conditions, the relationship between them is already determined, so this embodiment can acquire it. The relationship between the conditions is a logical relationship, including but not limited to AND, OR, and NOT. One embodiment of exhausting the different execution orders is as follows:
replacing each condition with a variable, and forming an expression according to the relationship between the conditions; exhausting the different execution orders of the plurality of conditions for target extraction, while ensuring that the relationship between the conditions is unchanged, then includes:
First, parentheses in the expression are removed using De Morgan's laws and the associative law. De Morgan's laws remove the parentheses attached to "!". For example, the expression !(A1 & A2) becomes !A1 | !A2 once the parentheses attached to "!" are removed. The associative law removes all redundant parentheses; for example, the expression A1 & (A2 & A3) & A4 becomes A1 & A2 & A3 & A4 after the parentheses are removed.
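The parenthesis-removal step can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the nested-tuple representation and the function names push_not and flatten are assumptions introduced here.

```python
# Hypothetical representation (not from the source):
#   "A1"                  a condition variable
#   ("not", e)            negation
#   ("and", e1, e2, ...)  conjunction
#   ("or",  e1, e2, ...)  disjunction

def push_not(expr, negate=False):
    """Apply De Morgan's laws so that "!" ends up directly on variables."""
    if isinstance(expr, str):
        return ("not", expr) if negate else expr
    op, *args = expr
    if op == "not":
        # A negation flips the pending negation state of its operand.
        return push_not(args[0], not negate)
    if negate:
        # !(a & b) == !a | !b  and  !(a | b) == !a & !b
        op = "or" if op == "and" else "and"
    return (op, *(push_not(a, negate) for a in args))

def flatten(expr):
    """Apply the associative law: merge nested operators of the same kind."""
    if isinstance(expr, str) or expr[0] == "not":
        return expr
    op, *args = expr
    merged = []
    for child in map(flatten, args):
        if not isinstance(child, str) and child[0] == op:
            merged.extend(child[1:])   # A1 & (A2 & A3) -> A1 & A2 & A3
        else:
            merged.append(child)
    return (op, *merged)

demorgan = push_not(("not", ("and", "A1", "A2")))
flat = flatten(("and", "A1", ("and", "A2", "A3"), "A4"))
```

Here demorgan corresponds to !A1 | !A2 and flat to A1 & A2 & A3 & A4, the two examples given in the text.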
Then the positions of the variables in the expression are exchanged using the commutative law, which exhausts the different execution orders of the plurality of conditions. For example, permuting the operands of the expression (A1 & A2) | A3 yields four results: (A1 & A2) | A3, (A2 & A1) | A3, A3 | (A1 & A2), and A3 | (A2 & A1).
Taking the scenario shown in Fig. 2 as an example, target extraction is performed under three conditions: the identification number is in list A123, the time spent online every day is less than 2 hours, and registration was done from a PC or a wireless device. Each condition is replaced with a variable: A1 replaces "the identification number is in list A123"; A2 replaces "the time spent online every day is less than 2 hours"; A3 replaces "registration was done from a PC or a wireless device". The conditions are then composed into an expression according to the relationship between the three conditions: (A1 & A2) | A3. Exhausting the different execution orders of the three conditions while keeping their relationship unchanged gives the four orders listed above.
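The enumeration for the Fig. 2 expression can be sketched as below. The nested-tuple representation and the helper names orderings and show are assumptions made for this illustration; the sketch simply permutes the operands of every AND/OR node, which is one way to apply the commutative law without changing the logical relationship.

```python
from itertools import permutations, product

def orderings(expr):
    """All execution orders obtained by permuting operands of each AND/OR node."""
    if isinstance(expr, str) or expr[0] == "not":
        return [expr]
    op, *args = expr
    result = []
    for perm in permutations(args):
        # Combine every ordering of each operand with every ordering of the others.
        for combo in product(*(orderings(a) for a in perm)):
            result.append((op, *combo))
    return result

def show(expr, nested=False):
    """Render a tuple expression using the document's "&", "|" and "!" notation."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    if op == "not":
        return "!" + show(args[0], True)
    text = (" & " if op == "and" else " | ").join(show(a, True) for a in args)
    return f"({text})" if nested else text

fig2 = ("or", ("and", "A1", "A2"), "A3")   # (A1 & A2) | A3
orders = [show(e) for e in orderings(fig2)]
```

For the Fig. 2 expression this produces the same four orders that the text lists.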
In multi-condition extraction there are various ways to evaluate extraction performance, but all of them rest on two basic elements: the number of times each condition is executed and the execution time of each condition. The embodiment of the application shortens the extraction time by changing the execution order of the plurality of conditions while leaving the target extraction result unchanged. Assuming that condition Ai is executed ci times and its execution time is ti, the performance parameter value corresponding to each execution order calculated in step S120 may be f(ci, ti), where f is a function that differs according to the application scenario. In one embodiment, the performance parameter value is the reciprocal of the total time required to perform the target extraction using the plurality of conditions, and calculating the performance parameter value corresponding to each execution order in step S120 includes the following substeps:
Substep 1201, determining the number of executions of each condition in the different execution orders according to the predetermined short-circuit rate of each condition.
The short-circuit rate of a condition is the probability that the condition is not met in the full data. That is, it is the probability that the extraction result is false when target extraction is performed on the full data using a single condition A. For example, if performing target extraction on the full data using condition A gives true with probability 30% and false with probability 70%, the short-circuit rate of condition A is 70%.
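As a small illustration of this definition, the short-circuit rate can be estimated as the fraction of records for which a condition is false. The function name short_circuit_rate and the sample data below are hypothetical and serve only to reproduce the 30%/70% split of the example.

```python
def short_circuit_rate(condition, records):
    """Fraction of records for which the condition evaluates to False."""
    records = list(records)
    if not records:
        raise ValueError("need at least one record")
    failures = sum(1 for record in records if not condition(record))
    return failures / len(records)

# Hypothetical sample: hours spent online per day for ten users.
hours_online = [0.5, 1.0, 3.0, 4.5, 2.5, 6.0, 1.5, 8.0, 2.0, 5.0]

# Condition "less than 2 hours online per day": true for 3 of 10 records.
rate = short_circuit_rate(lambda hours: hours < 2, hours_online)
```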
In this embodiment, the short-circuit rate of each condition may be determined in advance through statistics. For each enumerated execution order, the number of executions of each condition in that order may be determined from the short-circuit rates. Let the short-circuit rate of A1 be p1, the short-circuit rate of A2 be p2, and the total number of records in the full data be n. For the expression A1 & A2, A1 is executed n times, and from the short-circuit rate of A1 the number of executions of A2 is (1-p1) × n; for the expression A2 & A1, A2 is executed n times, and from the short-circuit rate of A2 the number of executions of A1 is (1-p2) × n. For the expression A1 | A2, A1 is executed n times, and from the short-circuit rate of A1 the number of executions of A2 is p1 × n; for the expression A2 | A1, A2 is executed n times, and from the short-circuit rate of A2 the number of executions of A1 is p2 × n. That is, the condition executed first is executed once for every record in the full data, and the number of executions of every later condition depends on the short-circuit rates of the conditions executed before it.
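The two-condition rule above can be written as a short function. The text only states the two-condition case; extending it to longer chains by multiplying probabilities, as the sketch does, assumes the conditions are statistically independent, which the source does not state. The function name execution_counts is hypothetical.

```python
def execution_counts(op, rates, n):
    """Expected executions of each condition in a left-to-right chain.

    op    -- "and" or "or", the operator joining every condition in the chain
    rates -- short-circuit rate of each condition, in execution order
    n     -- total number of records in the full data
    """
    if op not in ("and", "or"):
        raise ValueError("op must be 'and' or 'or'")
    counts, reaching = [], float(n)
    for p in rates:
        counts.append(reaching)
        # AND continues only when the condition is true (probability 1 - p);
        # OR continues only when the condition is false (probability p).
        reaching *= (1 - p) if op == "and" else p
    return counts

and_counts = execution_counts("and", [0.7, 0.5], 1000)  # A1 & A2: n, (1 - p1) * n
or_counts = execution_counts("or", [0.7, 0.5], 1000)    # A1 | A2: n, p1 * n
```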
Substep 1202, determining the total time required for target extraction under the different execution orders of the plurality of conditions according to the number of executions of each condition and the predetermined execution time of each condition.
In an execution order expression of a plurality of conditions, if condition Ai is executed ci times and its execution time is ti, the time the condition requires in the expression is f(ci, ti), and the total time of the plurality of conditions is T = f(c1, t1) + f(c2, t2) + … + f(cn, tn).
The execution time of each condition may be determined statistically in advance, and may be a constant, or may be a function of the execution times, or a modified function of time, such as the time μ t after multiplying by the empirical coefficient μ.
Suppose that the execution time of condition A1 is a constant 1 ms and its short-circuit rate is 70%; the execution time of A2 is a constant 10 ms and its short-circuit rate is 50%; and the total number of records in the full data is n. The total time required is calculated separately for the two execution orders of the same expression, A1 & A2 and A2 & A1.
The total time required for A1 & A2 is:
n × 1 ms + (1 - 70%) × n × 10 ms = 4n ms;
the total time required for A2 & A1 is:
n × 10 ms + (1 - 50%) × n × 1 ms = 10.5n ms.
As can be seen, the extraction times of different execution orders differ significantly. In the above example, executing condition A1 first takes 4n ms, against 10.5n ms when condition A2 is executed first, a reduction of roughly 62%.
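The arithmetic of this example can be checked with a few lines of Python. The execution times and short-circuit rates are the ones given above; the function name total_time_and and the choice n = 1000 are assumptions for illustration.

```python
# (execution time in ms, short-circuit rate), taken from the example above
CONDITIONS = {"A1": (1.0, 0.70), "A2": (10.0, 0.50)}

def total_time_and(order, n):
    """Total time in ms for an all-AND chain evaluated left to right."""
    total, remaining = 0.0, float(n)
    for name in order:
        time_ms, rate = CONDITIONS[name]
        total += remaining * time_ms   # this condition runs on the remaining records
        remaining *= 1 - rate          # only records that pass reach the next condition
    return total

n = 1000                                   # hypothetical record count
t_a1_first = total_time_and(["A1", "A2"], n)   # about 4n ms
t_a2_first = total_time_and(["A2", "A1"], n)   # about 10.5n ms
saving = 1 - t_a1_first / t_a2_first           # about 0.62
```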
Substep 1203, calculating the reciprocal of the total time as a performance parameter value for the plurality of conditions in different execution orders.
Since the total time can be calculated for each execution sequence, the reciprocal of the total time of each execution sequence can be correspondingly calculated as the performance parameter value of the execution sequence.
The shorter the overall time required, the larger the corresponding performance parameter value, i.e. the better the target extraction performance. Step S130 is to select the execution order with the largest performance parameter value as the determined execution order of the plurality of conditions for the target extraction.
Step S140 is to perform the target extraction in the execution order of the plurality of conditions determined.
As can be seen from the above description, in the embodiment of the present application, different execution sequences of a plurality of conditions are exhausted, and performance parameter values in different execution sequences are determined, so that the execution sequence with the largest performance parameter value can be selected as the execution sequence of the plurality of conditions for target extraction, thereby automatically determining the execution sequence of the plurality of conditions in the multi-condition extraction, and automatically improving the target extraction performance. The above embodiments of the present application are applicable to a scenario with relatively few conditions, so that each execution sequence can be enumerated quickly and a performance parameter value of each execution sequence can be determined.
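Steps S110 to S130 can be sketched end to end for a chain of conditions joined by a single operator. This is a minimal illustration under stated assumptions: conditions are treated as independent, the per-condition time is constant, f(ci, ti) is taken as ci × ti, and the values for A3 are invented for the example (A1 and A2 reuse the figures from the text). The function names are hypothetical.

```python
from itertools import permutations

def total_time(op, order, stats, n):
    """Total time in ms for a chain joined by `op`, evaluated left to right."""
    total, remaining = 0.0, float(n)
    for name in order:
        time_ms, rate = stats[name]
        total += remaining * time_ms
        remaining *= (1 - rate) if op == "and" else rate
    return total

def best_execution_order(op, stats, n):
    """S110: enumerate orders; S120: score each by 1/T; S130: pick the maximum."""
    scores = {
        order: 1.0 / total_time(op, order, stats, n)
        for order in permutations(stats)
    }
    best = max(scores, key=scores.get)
    return best, scores

# name -> (execution time in ms, short-circuit rate); A3 is a made-up condition
stats = {"A1": (1.0, 0.70), "A2": (10.0, 0.50), "A3": (5.0, 0.90)}
best, scores = best_execution_order("and", stats, n=1000)
```

With these inputs the cheap, highly selective condition A1 runs first and the expensive condition A2 runs last. Because the number of permutations grows factorially, this approach only suits small numbers of conditions, as the text notes.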
Fig. 3 is a flowchart of a method for optimizing extraction performance in multi-condition extraction according to another embodiment of the present application, where the method is used in a target extraction scenario using multiple conditions, and automatically determines an execution order of the multiple conditions to improve extraction performance, and the method mainly includes the following steps:
S310, judging whether the expression composed of the current conditions contains only one condition;
if not, go to step S320; if yes, go to step S340.
S320, determining the optimal execution order of each consecutive all-AND part and/or consecutive all-OR part in the expression composed of the current conditions;
S330, treating each consecutive all-AND part and/or consecutive all-OR part, in its optimal execution order, as a single condition and replacing each with a variable to form a new expression;
thereafter, step S310 is executed again for the new expression, and this is repeated until the expression composed of the current conditions contains only one condition, at which point step S340 is executed.
S340, replacing the variables back with the original expressions to obtain the expression giving the optimal execution order of the plurality of conditions;
S350, performing the target extraction in the optimal execution order of the plurality of conditions.
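The loop S310 to S340 can be sketched as follows. Several things here are assumptions, not taken from the source: the nested-tuple representation, the function names, the exhaustive search used inside S320, and in particular how a replacement variable's execution time and short-circuit rate are derived (the sketch uses the expected per-record time and the combined false probability under an independence assumption). Negated conditions are not handled; they would have to be renamed to plain variables first.

```python
from itertools import permutations

def chain_cost(op, order, stats):
    """Expected per-record time and short-circuit rate of an all-AND/all-OR chain."""
    reach, total = 1.0, 0.0            # reach: probability a condition is evaluated
    for name in order:
        time_ms, rate = stats[name]
        total += reach * time_ms
        reach *= (1 - rate) if op == "and" else rate
    # AND: reach ends as P(all true), so the chain is false with 1 - reach.
    # OR:  reach ends as P(all false), which is the chain's false probability.
    return total, (1 - reach if op == "and" else reach)

def optimize(expr, stats):
    """expr is a variable name or (op, child, ...) with op in {"and", "or"}."""
    stats = dict(stats)
    originals = {}                     # replacement variable -> optimized part

    def reduce_once(e):                # S320 and S330
        if isinstance(e, str):
            return e
        op, *children = e
        if all(isinstance(c, str) for c in children):
            order = min(permutations(children),
                        key=lambda o: chain_cost(op, o, stats)[0])
            var = f"V{len(originals) + 1}"
            originals[var] = (op, *order)
            stats[var] = chain_cost(op, order, stats)
            return var
        return (op, *(reduce_once(c) for c in children))

    while not isinstance(expr, str):   # S310: more than one condition left?
        expr = reduce_once(expr)

    def expand(e):                     # S340: substitute the originals back
        if isinstance(e, str):
            return expand(originals[e]) if e in originals else e
        return (e[0], *(expand(c) for c in e[1:]))

    return expand(expr)

# A3's figures are hypothetical; A1 and A2 reuse the earlier example.
stats = {"A1": (1.0, 0.70), "A2": (10.0, 0.50), "A3": (2.0, 0.40)}
result = optimize(("or", ("and", "A2", "A1"), "A3"), stats)
```

With these inputs the inner part is reordered to A1 & A2 and the result corresponds to A3 | (A1 & A2).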
The above steps are described in further detail below.
It is understood that before step S310 is executed, an expression composed of the plurality of conditions is obtained. That is, the plurality of conditions are extracted, each condition is replaced with a different variable, and an expression is composed according to the relationship between the conditions. Before judging whether the expression composed of the current conditions contains only one condition, the parentheses in the current expression can be removed using De Morgan's laws and the associative law. De Morgan's laws remove the parentheses attached to "!". For example, the expression !(A1 & A2) becomes !A1 | !A2 once the parentheses attached to "!" are removed. The associative law removes all redundant parentheses; for example, the expression A1 & (A2 & A3) & A4 becomes A1 & A2 & A3 & A4 after the parentheses are removed.
Since the embodiment of the present application relates to a multi-condition extraction scenario, when step S310 is first executed the expression composed of the current conditions is the expression composed of the plurality of conditions. That expression contains more than one condition, so the process proceeds to step S320.
Step S320 is to determine the continuous full and partial and/or the continuous full or partial optimal execution order in the expression composed of the current conditions.
The expression of the sequential full and, for example, a1& a2& A3; an expression of successive integers is for example a1| a2| A3. One or more continuous full and expressions may be included in the expression composed of the current condition, and one or more continuous full or expressions may exist. And determining the optimal execution sequence of each continuous full and expression and each continuous full or expression.
The embodiments of the present application determine the optimal execution order of a continuous all-AND part and/or a continuous all-OR part by methods that include, but are not limited to, either of the following:
First, the optimal execution order of the continuous all-AND part and/or continuous all-OR part is determined by the exhaustive method described in the above embodiment. Taking the continuous all-AND expression A1&A2&A3 as an example, the possible execution orders are: A1&A2&A3, A1&A3&A2, A2&A1&A3, A2&A3&A1, A3&A1&A2 and A3&A2&A1. The performance parameter value corresponding to each execution order is calculated, and the execution order with the largest performance parameter value is selected as the optimal execution order of the continuous all-AND expression. Because a continuous all-AND or all-OR part contains only a limited number of conditions, the exhaustive method can determine the optimal execution order automatically and quickly.
Second, a method comprising the following substeps:
substep 3201, for the conditions currently contained in the continuous all-AND part and/or continuous all-OR part, calculating the performance parameter value obtained when each condition is executed last;
substep 3202, taking the condition whose performance parameter value is largest when executed last as the condition executed last among the conditions currently contained in the continuous all-AND part and/or continuous all-OR part;
substep 3203, removing that last-executed condition from the conditions currently contained in the continuous all-AND part and/or continuous all-OR part;
returning to substep 3201 and repeating the above operations on the conditions that remain after the last-executed condition has been removed, so that the condition executed last among the remaining conditions is determined step by step. The execution order of all conditions contained in the continuous all-AND part and/or continuous all-OR part obtained in this way is taken as its optimal execution order.
The embodiments of the present application reduce the number of times each condition is executed by changing the execution order of the plurality of conditions, without changing the target extraction result. The performance parameter value comprises the reciprocal of the number of executions. Where the execution time of each condition is constant, the performance parameter value obtained when each condition currently contained in the continuous all-AND part and/or continuous all-OR part is executed last may be calculated as follows: calculate, from the short-circuit rates of the conditions, the number of times each condition is executed when it is executed last; then take the reciprocal of that number of executions as the performance parameter value for executing that condition last.
For example, for the continuous all-AND expression A1&A2&…&An, assume that the short-circuit rate of each condition Ai is pi. If Ai is executed last, the number of executions ci of condition Ai is ci = n(1-p1)(1-p2)…(1-pn)/(1-pi), where n is the number of data records, and the reciprocal of ci is taken as the performance parameter value for executing Ai last. In this way, the performance parameter value can be calculated for whichever condition is executed last. The condition Ax whose performance parameter value is largest is taken as the condition executed last in the expression, and Ax is then removed from the expression. If Ax is A2, removing A2 from the continuous all-AND expression gives the expression A1&A3&…&An; the condition executed last in A1&A3&…&An is determined by the same method, and so on until the execution order of all conditions in the all-AND expression has been determined. That order is taken as the optimal execution order.
The optimal execution order of a continuous all-OR expression A1|A2|…|An is determined by the same method, except that if Ai is executed last, the number of executions ci of Ai is ci = n·p1·p2·…·pn/pi, where n is the number of data records.
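Substeps 3201 to 3203, together with the execution-count formulas above, can be sketched in Python as follows. This is only a sketch: the function names count_if_last and greedy_order, the dictionary representation of short-circuit rates and the sample rates are assumptions made for illustration. The count is computed as a product over the other conditions rather than by dividing by (1-pi) or pi; the two forms agree whenever the divisor is nonzero.

```python
from math import prod


def count_if_last(rates, name, op, n=1.0):
    """Expected number of executions of `name` if it runs last in the chain.

    rates maps each condition name to its short-circuit rate, i.e. the
    probability that the condition is not satisfied. A record reaches the
    last condition only if no earlier condition short-circuited:
      AND chain: every other condition was satisfied (probability 1 - p);
      OR chain:  every other condition was not satisfied (probability p).
    """
    def pass_prob(p):
        return 1.0 - p if op == "&" else p

    return n * prod(pass_prob(p) for other, p in rates.items() if other != name)


def greedy_order(rates, op):
    """Repeatedly pick the condition to execute last, then remove it."""
    remaining = dict(rates)
    reversed_order = []
    while remaining:
        # The performance parameter is 1 / count, so the largest parameter
        # value corresponds to the smallest execution count.
        last = min(remaining, key=lambda name: count_if_last(remaining, name, op))
        reversed_order.append(last)
        del remaining[last]
    return reversed_order[::-1]


and_order = greedy_order({"A1": 0.2, "A2": 0.5, "A3": 0.8}, "&")
or_order = greedy_order({"A4": 0.5, "A5": 0.3, "A6": 0.9}, "|")
```

With these assumed rates, the all-AND chain is ordered A3, A2, A1 (the condition most likely to fail runs first) and the all-OR chain is ordered A5, A4, A6 (the condition most likely to succeed runs first).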
Step S330 treats each continuous all-AND part and/or continuous all-OR part, arranged in the determined optimal execution order, as an overall condition and replaces it with a new variable. The short-circuit rate of the overall condition can be determined from the short-circuit rates of the conditions in that continuous all-AND part and/or continuous all-OR part. For the continuous all-AND expression A1&A2&…&An, assuming that the short-circuit rate of each condition Ai is pi, the short-circuit rate of the expression taken as an overall condition is 1 - (1-p1)(1-p2)…(1-pn). For the continuous all-OR expression A1|A2|…|An, assuming that the short-circuit rate of each condition Ai is pi, the short-circuit rate of the expression taken as an overall condition is p1·p2·…·pn.
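As a small worked illustration of the two formulas above, the following Python sketch computes the short-circuit rate of an overall condition. The function name combined_rate and the sample rates are assumptions made for the example, not part of the application.

```python
from math import prod


def combined_rate(rates, op):
    """Short-circuit rate (probability of not being satisfied) of a chain
    treated as one overall condition."""
    if op == "&":
        # An all-AND chain is satisfied only if every condition is satisfied.
        return 1.0 - prod(1.0 - p for p in rates)
    # An all-OR chain is not satisfied only if every condition is not satisfied.
    return prod(rates)


and_rate = combined_rate([0.2, 0.5, 0.8], "&")  # 1 - 0.8 * 0.5 * 0.2
or_rate = combined_rate([0.5, 0.3, 0.9], "|")   # 0.5 * 0.3 * 0.9
```

The combined rate lets the overall condition take part in the next round of ordering exactly as an ordinary condition would.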
Suppose the expression composed of the plurality of conditions is A1&A2&A3|A4|A5|A6. After the optimal execution order of the continuous all-AND part A1&A2&A3 is determined to be A3&A2&A1, that part is treated as an overall condition and replaced by A1'. The optimal execution order of the continuous all-OR part A4|A5|A6 is determined to be A5|A4|A6, and that part is treated as an overall condition and replaced by A2'. The original expression A1&A2&A3|A4|A5|A6 is thereby converted into A1'|A2'.
The method then returns to step S310 and again judges whether the expression composed of the current conditions contains only one condition. The converted expression A1'|A2' contains two conditions, so the optimal execution order of its continuous all-AND and/or all-OR parts is determined next. Because A1'|A2' is itself a continuous all-OR expression, its optimal execution order can be determined by the method above. Assuming the optimal execution order of A1'|A2' is determined to be A2'|A1', the expression A2'|A1' is replaced by another new variable A1''. The method returns to step S310, and since the expression now contains only one condition, step S340 is executed.
Step S340 replaces the variables with the expressions they stand for: A1'' is replaced by A2'|A1', A1' by A3&A2&A1, and A2' by A5|A4|A6. The resulting expression of the optimal execution order of the plurality of conditions is A5|A4|A6|A3&A2&A1.
Step S350 is to perform the target extraction in the optimal execution order.
According to the above description, the optimal execution sequence of the expression composed of the conditions is determined in a recursive manner, so that the execution sequence of the conditions during multi-condition extraction is automatically determined, and the target extraction performance is automatically improved.
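The recursive procedure of steps S310 to S340 can be sketched in Python as follows. This is a minimal sketch under several assumptions that are not taken from the application: expressions are encoded as nested tuples, each ordered sub-chain is carried as its text together with its combined short-circuit rate (which plays the role of the replacement variable), the names pass_probability, order_chain and optimize are hypothetical, condition names are distinct, and the sample short-circuit rates were chosen so that the result matches the worked example. As in the method described above, an overall condition is ordered by its short-circuit rate alone.

```python
from math import prod


def pass_probability(rate, op):
    """Probability that evaluation continues past a condition in a chain."""
    return 1.0 - rate if op == "&" else rate


def order_chain(parts, op):
    """Greedy ordering: repeatedly choose the condition to execute last."""
    remaining, reversed_order = dict(parts), []
    while remaining:
        def count_if_last(name):
            return prod(pass_probability(p, op)
                        for other, p in remaining.items() if other != name)
        last = min(remaining, key=count_if_last)  # smallest count = largest 1/count
        reversed_order.append(last)
        del remaining[last]
    return reversed_order[::-1]


def optimize(expr, rates):
    """Return (ordered expression text, short-circuit rate) for expr.

    expr is a condition name or a tuple (op, child, ...) with op "&" or "|".
    """
    if isinstance(expr, str):
        return expr, rates[expr]
    op, *children = expr
    parts = {}
    for child in children:
        # Order the child first; it then acts as one overall condition.
        text, rate = optimize(child, rates)
        if op == "&" and not isinstance(child, str) and child[0] == "|":
            text = f"({text})"  # keep an OR group intact inside an AND chain
        parts[text] = rate
    ordered = order_chain(parts, op)
    passed = prod(pass_probability(p, op) for p in parts.values())
    return op.join(ordered), (1.0 - passed if op == "&" else passed)


rates = {"A1": 0.2, "A2": 0.5, "A3": 0.8, "A4": 0.5, "A5": 0.3, "A6": 0.9}
expr = ("|", ("&", "A1", "A2", "A3"), ("|", "A4", "A5", "A6"))
best, rate = optimize(expr, rates)
print(best)  # A5|A4|A6|A3&A2&A1
```

Returning the ordered text directly from each recursive call makes the back-substitution of step S340 implicit: no separate variable table is needed, because every overall condition already carries the expression it stands for.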
This embodiment is suitable for a variety of multi-condition extraction scenarios and can effectively improve target extraction performance when the rules are complex. For example, when massive offline data is processed and a plurality of conditions are applied in an offline SQL-like manner for analysis, batch processing and similar operations, the method can improve system execution performance and speed up task output. In online database applications, when the query conditions are complex and indexing can hardly cover all of them, the method of the present application can be used to optimize the execution order of the query conditions and thereby speed up SQL execution.
An embodiment of the present application further provides an apparatus for optimizing extraction performance in multi-condition extraction, corresponding to the first method for optimizing extraction performance in multi-condition extraction described above. A schematic structural diagram of the apparatus is shown in Fig. 4. The apparatus is used, in a scenario where target extraction is performed using a plurality of conditions, to automatically determine the execution order of the plurality of conditions so as to improve extraction performance, and comprises:
an exhaustion unit 410, configured to exhaust different execution orders of the plurality of conditions during target extraction;
a performance parameter value calculation unit 420, configured to calculate a performance parameter value corresponding to each execution order;
a selecting unit 430, configured to select the execution order with the largest performance parameter value as the determined execution order of the plurality of conditions for target extraction;
a target extraction unit 440, configured to perform the target extraction in the execution order of the plurality of conditions determined by the selecting unit.
In one embodiment, the performance parameter value comprises the reciprocal of the total time required for target extraction using the plurality of conditions, and the performance parameter value calculation unit 420 is configured to perform:
determining the number of executions of each condition under each execution order according to the predetermined short-circuit rate of each condition, wherein the short-circuit rate of a condition is the probability that the condition is not satisfied over the full data set;
determining the total time required for target extraction with the plurality of conditions under each execution order according to the number of executions of each condition and the predetermined execution time of each condition;
calculating the reciprocal of the total time as the performance parameter value of the plurality of conditions under each execution order.
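The calculation performed by units 410 to 430 can be sketched in Python for a single all-AND or all-OR chain. This is a hedged sketch only: the function names total_time and best_order_exhaustive, the dictionary inputs and the sample values are assumptions made for illustration, and a general expression would additionally require that the relationships among the conditions be preserved.

```python
from itertools import permutations


def total_time(order, rates, times, op="&", n=1.0):
    """Total time for a chain evaluated in the given order.

    rates: short-circuit rate per condition (probability it is not satisfied).
    times: execution time of a single evaluation of each condition.
    """
    reach = n  # expected number of records reaching the current condition
    total = 0.0
    for name in order:
        total += reach * times[name]  # number of executions * time per execution
        p = rates[name]
        # AND chains continue when the condition is satisfied, OR chains when not.
        reach *= (1.0 - p) if op == "&" else p
    return total


def best_order_exhaustive(rates, times, op="&"):
    """Enumerate every order and keep the one with the largest 1 / total time."""
    return max(permutations(rates),
               key=lambda order: 1.0 / total_time(order, rates, times, op))


rates = {"A1": 0.2, "A2": 0.5, "A3": 0.8}
unit_times = {"A1": 1.0, "A2": 1.0, "A3": 1.0}
slow_a3 = {"A1": 1.0, "A2": 1.0, "A3": 100.0}
fast = best_order_exhaustive(rates, unit_times)
cheap_first = best_order_exhaustive(rates, slow_a3)
```

With equal execution times the search returns the order A3, A2, A1. When A3 is assumed to be one hundred times slower, the search moves it to the end, which shows why the total time, and not the short-circuit rate alone, is used as the criterion in this embodiment.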
The exhaustion unit 410 is configured to perform:
obtaining the relationships among the plurality of conditions;
enumerating exhaustively the different execution orders of the plurality of conditions for target extraction while keeping the relationships among the plurality of conditions unchanged.
The apparatus further comprises:
an expression determining unit 450, configured to replace each condition with a variable and compose an expression according to the relationships among the conditions;
the exhaustion unit 410 is configured to perform:
removing the brackets in the expression using De Morgan's laws and the associative law;
exchanging the positions of the variables in the expression using the commutative law to enumerate the different execution orders of the plurality of conditions.
An embodiment of the present application further provides an apparatus for optimizing extraction performance in multi-condition extraction, corresponding to the second method for optimizing extraction performance in multi-condition extraction described above. A schematic structural diagram of the apparatus is shown in Fig. 5. The apparatus is used, in a scenario where target extraction is performed using a plurality of conditions, to automatically determine the execution order of the plurality of conditions so as to improve extraction performance, and comprises:
a judging unit 510, configured to judge whether the expression composed of the current conditions contains only one condition;
an optimal execution order determining unit 520, configured to determine the optimal execution order of each continuous all-AND part and/or continuous all-OR part in the expression composed of the current conditions when that expression contains more than one condition;
a replacing unit 530, configured to treat each continuous all-AND part and/or continuous all-OR part, arranged in its optimal execution order, as an overall condition, replace each with a variable to form a new expression containing the variables, and pass the new expression to the judging unit for the judging operation;
a reverse replacement unit 540, configured to replace the variables with the expressions they stand for when the expression composed of the current conditions contains only one condition, so as to obtain an expression of the optimal execution order of the plurality of conditions;
a target extraction unit 550, configured to perform the target extraction in the optimal execution order of the plurality of conditions.
The optimal execution order determining unit 520 is configured to perform:
for the conditions currently contained in the continuous all-AND part and/or continuous all-OR part, calculating the performance parameter value obtained when each condition is executed last;
taking the condition whose performance parameter value is largest when executed last as the condition executed last among the conditions currently contained in the continuous all-AND part and/or continuous all-OR part;
removing that last-executed condition from the conditions currently contained in the continuous all-AND part and/or continuous all-OR part;
repeating the above operations so that the condition executed last among the remaining conditions is determined step by step, and taking the resulting execution order of all conditions contained in the continuous all-AND part and/or continuous all-OR part as its optimal execution order.
The performance parameter value comprises the reciprocal of the number of executions. Where the execution time of each condition is constant, the optimal execution order determining unit 520 is configured to perform:
calculating, according to the short-circuit rate of each condition, the number of times each condition is executed when it is executed last;
calculating the reciprocal of the number of executions as the corresponding performance parameter value.
The apparatus further comprises:
an expression preprocessing unit 560, configured to remove the brackets in the expression according to De Morgan's laws and the associative law.
In summary, in the embodiment of the present application, different execution orders of a plurality of conditions are exhausted, and performance parameter values in different execution orders are determined, so that the execution order with the largest performance parameter value can be selected as the execution order of the plurality of conditions for target extraction; or the optimal execution sequence of a plurality of conditions is determined in a recursion mode, so that the execution sequence of the plurality of conditions during multi-condition extraction is automatically determined, and the target extraction performance is automatically improved.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Program instructions which invoke the methods of the present application may be stored on a fixed or removable recording medium and/or transmitted via a data stream on a broadcast or other signal-bearing medium and/or stored within a working memory of a computer device operating in accordance with the program instructions. An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (12)

1. A method for optimizing extraction performance in multi-condition extraction, characterized in that the method is used, in a scenario where target extraction is performed using a plurality of conditions, to automatically determine the execution order of the plurality of conditions so as to improve extraction performance, the method comprising:
enumerating exhaustively the different execution orders of the plurality of conditions for target extraction;
calculating a performance parameter value corresponding to each execution order, comprising:
determining the number of executions of each condition under each execution order according to a predetermined short-circuit rate of each condition, wherein the short-circuit rate of a condition is the probability that the condition is not satisfied over the full data set;
determining the total time required for target extraction with the plurality of conditions under each execution order according to the number of executions of each condition and a predetermined execution time of each condition;
calculating the reciprocal of the total time as the performance parameter value of the plurality of conditions under each execution order;
selecting the execution order with the largest performance parameter value as the determined execution order of the plurality of conditions for target extraction;
performing the target extraction in the determined execution order of the plurality of conditions.
2. The method of claim 1, wherein enumerating exhaustively the different execution orders of the plurality of conditions for target extraction comprises:
obtaining the relationships among the plurality of conditions;
enumerating exhaustively the different execution orders of the plurality of conditions for target extraction while keeping the relationships among the plurality of conditions unchanged.
3. The method of claim 2, wherein the method further comprises:
replacing each condition with a variable, and composing an expression of the plurality of conditions according to the relationships among the plurality of conditions;
wherein enumerating exhaustively the different execution orders of the plurality of conditions for target extraction while keeping the relationships among the plurality of conditions unchanged comprises:
removing the brackets in the expression using De Morgan's laws and the associative law;
exchanging the positions of the variables in the expression using the commutative law to enumerate the different execution orders of the plurality of conditions.
4. A method for optimizing extraction performance in multi-condition extraction, characterized in that the method is used, in a scenario where target extraction is performed using a plurality of conditions, to automatically determine the execution order of the plurality of conditions so as to improve extraction performance, the method comprising:
judging whether the expression composed of the current conditions contains only one condition;
if the expression composed of the current conditions contains more than one condition, determining the optimal execution order of each continuous all-AND part and/or continuous all-OR part in the expression composed of the current conditions, comprising:
for the conditions currently contained in the continuous all-AND part and/or continuous all-OR part, calculating the performance parameter value obtained when each condition is executed last;
taking the condition whose performance parameter value is largest when executed last as the condition executed last among the conditions currently contained in the continuous all-AND part and/or continuous all-OR part;
removing that last-executed condition from the conditions currently contained in the continuous all-AND part and/or continuous all-OR part;
repeating the above operations on the conditions that remain after the last-executed condition has been removed, so that the condition executed last among the remaining conditions is determined step by step, and taking the resulting execution order of all conditions contained in the continuous all-AND part and/or continuous all-OR part as its optimal execution order;
treating each continuous all-AND part and/or continuous all-OR part, arranged in its optimal execution order, as an overall condition and replacing each with a variable to form a new expression containing the variables;
returning to the step of judging whether the expression composed of the current conditions contains only one condition, and executing that step for the new expression;
if the expression composed of the current conditions contains only one condition, replacing the variables with the expressions they stand for to obtain an expression of the optimal execution order of the plurality of conditions;
performing the target extraction in the optimal execution order of the plurality of conditions.
5. The method of claim 4, wherein the performance parameter value comprises the reciprocal of the number of executions, and, where the execution time of each condition is constant, calculating the performance parameter value obtained when each condition currently contained in the continuous all-AND part and/or continuous all-OR part is executed last comprises:
calculating, according to the short-circuit rate of each condition, the number of times each condition is executed when it is executed last;
calculating the reciprocal of the number of executions as the corresponding performance parameter value.
6. The method of claim 4, wherein before judging whether the expression composed of the current conditions contains only one condition, the method further comprises:
removing the brackets in the expression according to De Morgan's laws and the associative law.
7. An apparatus for optimizing extraction performance in multi-condition extraction, characterized in that the apparatus is used, in a scenario where target extraction is performed using a plurality of conditions, to automatically determine the execution order of the plurality of conditions so as to improve extraction performance, the apparatus comprising:
an exhaustion unit, configured to enumerate exhaustively the different execution orders of the plurality of conditions for target extraction;
a performance parameter value calculation unit, configured to calculate a performance parameter value corresponding to each execution order, the calculation comprising:
determining the number of executions of each condition under each execution order according to a predetermined short-circuit rate of each condition, wherein the short-circuit rate of a condition is the probability that the condition is not satisfied over the full data set;
determining the total time required for target extraction with the plurality of conditions under each execution order according to the number of executions of each condition and a predetermined execution time of each condition;
calculating the reciprocal of the total time as the performance parameter value of the plurality of conditions under each execution order;
a selecting unit, configured to select the execution order with the largest performance parameter value as the determined execution order of the plurality of conditions for target extraction;
a target extraction unit, configured to perform the target extraction in the execution order of the plurality of conditions determined by the selecting unit.
8. The apparatus of claim 7, wherein the exhaustion unit is configured to perform:
obtaining the relationships among the plurality of conditions;
enumerating exhaustively the different execution orders of the plurality of conditions for target extraction while keeping the relationships among the plurality of conditions unchanged.
9. The apparatus of claim 8, wherein the apparatus further comprises:
an expression determining unit, configured to replace each condition with a variable and compose an expression of the plurality of conditions according to the relationships among the plurality of conditions;
the exhaustion unit is configured to perform:
removing the brackets in the expression using De Morgan's laws and the associative law;
exchanging the positions of the variables in the expression using the commutative law to enumerate the different execution orders of the plurality of conditions.
10. An apparatus for optimizing extraction performance in multi-condition extraction, characterized in that the apparatus is used, in a scenario where target extraction is performed using a plurality of conditions, to automatically determine the execution order of the plurality of conditions so as to improve extraction performance, the apparatus comprising:
a judging unit, configured to judge whether the expression composed of the current conditions contains only one condition;
an optimal execution order determining unit, configured to determine the optimal execution order of each continuous all-AND part and/or continuous all-OR part in the expression composed of the current conditions when that expression contains more than one condition, and configured to perform:
for the conditions currently contained in the continuous all-AND part and/or continuous all-OR part, calculating the performance parameter value obtained when each condition is executed last;
taking the condition whose performance parameter value is largest when executed last as the condition executed last among the conditions currently contained in the continuous all-AND part and/or continuous all-OR part;
removing that last-executed condition from the conditions currently contained in the continuous all-AND part and/or continuous all-OR part;
repeating the above operations on the conditions that remain after the last-executed condition has been removed, so that the condition executed last among the remaining conditions is determined step by step, and taking the resulting execution order of all conditions contained in the continuous all-AND part and/or continuous all-OR part as its optimal execution order;
a replacing unit, configured to treat each continuous all-AND part and/or continuous all-OR part, arranged in its optimal execution order, as an overall condition, replace each with a variable to form a new expression containing the variables, and pass the new expression to the judging unit for the judging operation;
a reverse replacement unit, configured to replace the variables with the expressions they stand for when the expression composed of the current conditions contains only one condition, so as to obtain an expression of the optimal execution order of the plurality of conditions;
a target extraction unit, configured to perform the target extraction in the optimal execution order of the plurality of conditions.
11. The apparatus of claim 10, wherein the performance parameter value comprises the reciprocal of the number of executions, and, where the execution time of each condition is constant, the optimal execution order determining unit is configured to perform:
calculating, according to the short-circuit rate of each condition, the number of times each condition is executed when it is executed last;
calculating the reciprocal of the number of executions as the corresponding performance parameter value.
12. The apparatus of claim 10, wherein the apparatus further comprises:
an expression preprocessing unit, configured to remove the brackets in the expression according to De Morgan's laws and the associative law.
CN201610034045.XA 2016-01-19 2016-01-19 Method and device for optimizing extraction performance in multi-condition extraction Active CN106980865B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610034045.XA CN106980865B (en) 2016-01-19 2016-01-19 Method and device for optimizing extraction performance in multi-condition extraction

Publications (2)

Publication Number Publication Date
CN106980865A (en) 2017-07-25
CN106980865B (en) 2020-06-02

Family

ID=59340767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610034045.XA Active CN106980865B (en) 2016-01-19 2016-01-19 Method and device for optimizing extraction performance in multi-condition extraction

Country Status (1)

Country Link
CN (1) CN106980865B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6876999B2 (en) * 2001-04-25 2005-04-05 International Business Machines Corporation Methods and apparatus for extraction and tracking of objects from multi-dimensional sequence data
CN103093037A (en) * 2012-12-27 2013-05-08 东北电网有限公司 Electric system splitting fracture surface searching method based on master-slave problem alternating optimization
CN103164495A (en) * 2011-12-19 2013-06-19 中国人民解放军63928部队 Half-connection inquiry optimizing method based on periphery searching and system thereof
US8571262B2 (en) * 2006-01-25 2013-10-29 Abbyy Development Llc Methods of object search and recognition
CN105159971A (en) * 2015-08-26 2015-12-16 成都布林特信息技术有限公司 Cloud platform data retrieval method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6493701B2 (en) * 2000-11-22 2002-12-10 Sybase, Inc. Database system with methodogy providing faster N-ary nested loop joins
CA2374271A1 (en) * 2002-03-01 2003-09-01 Ibm Canada Limited-Ibm Canada Limitee Redundant join elimination and sub-query elimination using subsumption
CN101419625B (en) * 2008-12-02 2012-11-28 西安交通大学 Deep web self-adapting crawling method based on minimum searchable mode
CN102163195B (en) * 2010-02-22 2013-04-24 北京东方通科技股份有限公司 Query optimization method based on unified view of distributed heterogeneous database
CN102467563A (en) * 2010-11-19 2012-05-23 金蝶软件(中国)有限公司 Data retrieval method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Logic-based query optimization for object databases; J. Grant et al.; IEEE Transactions on Knowledge and Data Engineering; 20000430; Vol. 12 (No. 4); pp. 529-547 *
A query optimization algorithm for embedded real-time database systems; Song Jingjing et al.; Computer Engineering; 20070630; Vol. 33 (No. 11); pp. 90-92 *

Also Published As

Publication number Publication date
CN106980865A (en) 2017-07-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant