CN113297306B - Data processing method and device - Google Patents

Publication number
CN113297306B
Authority
CN
China
Prior art keywords
algorithm
target object
processing result
processing
operators
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011061918.9A
Other languages
Chinese (zh)
Other versions
CN113297306A (en
Inventor
师超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Cloud Computing Ltd filed Critical Alibaba Cloud Computing Ltd
Priority to CN202011061918.9A priority Critical patent/CN113297306B/en
Publication of CN113297306A publication Critical patent/CN113297306A/en
Application granted granted Critical
Publication of CN113297306B publication Critical patent/CN113297306B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification provide a data processing method and a data processing device. The data processing method comprises: receiving a target object and a target object processing request generated by a user based on a structured query language; parsing the target object processing request to obtain an algorithm identifier for processing the target object; acquiring the algorithm corresponding to the algorithm identifier from a preset algorithm library; and processing the target object according to the algorithm to obtain a processing result of the target object. When the target object is unstructured, it is processed based on the operators in the algorithm determined from the preset algorithm library and the execution order among those operators, so that the processing result of the unstructured target object is obtained accurately and quickly and the user experience is improved.

Description

Data processing method and device
Technical Field
The embodiment of the specification relates to the technical field of computers, in particular to a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, and a computer-readable storage medium.
Background
Existing analytical databases can only analyze structured data, that is, highly organized, well-formatted data that can be represented in a uniform structure, such as numbers and symbols. They cannot analyze unstructured data such as videos, images, and text, and the proportion of unstructured data in analytical databases will only grow. A data processing method that can analyze unstructured data in a database is therefore urgently needed.
Disclosure of Invention
In view of this, the embodiments of the present disclosure provide a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, and a computer-readable storage medium to address technical deficiencies in the prior art.
According to a first aspect of embodiments herein, there is provided a data processing method comprising:
receiving a target object processing request generated by a user based on a structured query language and a target object;
analyzing the target object processing request to obtain an algorithm identifier for processing the target object;
acquiring an algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier;
and processing the target object according to the algorithm, and obtaining a processing result of the target object.
Optionally, before receiving the target object processing request generated by the user based on the structured query language and the target object, the method further includes:
receiving an algorithm creating request generated by the user based on a structured query language;
parsing the algorithm creation request to determine operators for creating the algorithm and an execution order among the operators;
and creating the algorithm based on the operators and the execution sequence among the operators, and creating an algorithm interface based on the algorithm.
Optionally, after the creating the algorithm based on the operators and the execution order among the operators, the method further includes:
setting a corresponding algorithm identifier for the algorithm, and storing the algorithm and its algorithm identifier in the preset algorithm library; and
generating, based on the algorithm, a set of processing result data tables corresponding to the algorithm.
Optionally, the algorithm includes at least two operators and an execution order between the at least two operators;
correspondingly, the processing the target object according to the algorithm and obtaining the processing result of the target object includes:
and processing the target object according to the at least two operators and the execution sequence between the at least two operators, and obtaining a processing result of at least one operator for the target object.
Optionally, after obtaining a processing result of the at least one operator for the target object, the method further includes:
judging whether operators obtaining the processing results of the target object have corresponding processing result data tables in a database or not;
and if so, storing the processing result of the target object into a corresponding processing result data table under the condition of receiving the storage instruction of the user.
Optionally, after determining whether the corresponding processing result data tables exist in the database for the operators that obtain the processing results of the target object, the method further includes:
and under the condition that the operator for obtaining the processing result of the target object does not have a corresponding processing result data table in the database, generating a corresponding processing result data table based on the processing result of the target object, and storing the processing result data table to the database.
Optionally, after determining whether the corresponding processing result data tables exist in the database for the operators obtaining the processing results of the target object, the method further includes:
when any one of the operators for obtaining the processing result of the target object does not have a corresponding processing result data table in the database, creating a corresponding processing result data table in the database for the operator without the corresponding processing result data table in the database;
and storing the processing result of the operator which does not have the corresponding processing result data table in the database aiming at the target object into the corresponding processing result data table.
Optionally, after obtaining the processing result of the target object, the method further includes:
and generating a corresponding processing result data table based on the processing result of the target object, and storing the processing result data table to a database.
Optionally, the obtaining, from a preset algorithm library based on the algorithm identifier, an algorithm corresponding to the algorithm identifier includes:
and calling the algorithm interface, and acquiring an algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier.
Optionally, the target object is unstructured data, including pictures, videos, voice and/or texts.
According to a second aspect of embodiments herein, there is provided a data processing apparatus comprising:
the request receiving module is configured to receive a target object processing request generated by a user based on a structured query language and a target object;
the analysis module is configured to analyze the target object processing request to obtain an algorithm identifier for processing the target object;
the algorithm obtaining module is configured to obtain an algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier;
and the processing module is configured to process the target object according to the algorithm and obtain a processing result of the target object.
Optionally, the apparatus further includes:
an algorithm creation module configured to:
receiving an algorithm creating request generated by the user based on a structured query language;
parsing the algorithm creation request to determine operators for creating the algorithm and an execution order among the operators;
and creating the algorithm based on the operators and the execution sequence among the operators, and creating an algorithm interface based on the algorithm.
Optionally, the apparatus further includes:
a storage module configured to:
setting a corresponding algorithm identifier for the algorithm, and storing the algorithm and its algorithm identifier in the preset algorithm library; and
generating, based on the algorithm, a set of processing result data tables corresponding to the algorithm.
Optionally, the algorithm includes at least two operators and an execution order between the at least two operators;
accordingly, the processing module is further configured to:
and processing the target object according to the at least two operators and the execution sequence between the at least two operators, and obtaining a processing result of at least one operator for the target object.
Optionally, the apparatus further includes:
a determination module configured to:
judging whether the operators obtaining the processing results of the target object all have corresponding processing result data tables in a database or not;
and if so, storing the processing result of the target object into a corresponding processing result data table under the condition of receiving the storage instruction of the user.
Optionally, the determining module is further configured to:
and under the condition that the operator for obtaining the processing result of the target object does not have a corresponding processing result data table in the database, generating a corresponding processing result data table based on the processing result of the target object, and storing the processing result data table to the database.
Optionally, the determining module is further configured to:
when any one of the operators for obtaining the processing result of the target object does not have a corresponding processing result data table in the database, creating a corresponding processing result data table in the database for the operator without the corresponding processing result data table in the database;
and storing the processing result of the operator which does not have the corresponding processing result data table in the database aiming at the target object into the corresponding processing result data table.
Optionally, the apparatus further includes:
the storage module is configured to generate a corresponding processing result data table based on the processing result of the target object and store the processing result data table to a database.
Optionally, the algorithm obtaining module is further configured to:
and calling the algorithm interface, and acquiring an algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier.
Optionally, the target object is unstructured data, including pictures, videos, voice and/or texts.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is for storing computer-executable instructions and the processor is for executing the computer-executable instructions, which when executed by the processor, implement the steps of the data processing method.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the data processing method.
One embodiment of this specification implements a data processing method and a data processing device. The data processing method comprises: receiving a target object and a target object processing request generated by a user based on a structured query language; parsing the target object processing request to obtain an algorithm identifier for processing the target object; acquiring the algorithm corresponding to the algorithm identifier from a preset algorithm library; and processing the target object according to the algorithm to obtain a processing result of the target object. When the target object is unstructured, it is processed based on the operators in the algorithm determined from the preset algorithm library and the execution order among those operators, so that the processing result of the unstructured target object is obtained accurately and quickly and the user experience is improved.
Drawings
FIG. 1 is a schematic diagram of a database structure of a data processing method in a specific application according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a data processing method provided by an embodiment of the present specification;
FIG. 3 is a flowchart illustrating a processing procedure of a data processing method according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present specification;
FIG. 5 is a block diagram of a computing device according to an embodiment of the present disclosure.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present specification. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can be termed a second and, similarly, a second can be termed a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at the time of ...", "when ...", or "in response to a determination", depending on the context.
First, the terms used in one or more embodiments of the present specification are explained.
Structured data: also referred to as row data, this is data that is logically represented and implemented by a two-dimensional table structure, strictly follows data format and length specifications, and is generally stored and managed in a relational database.
Unstructured data: data whose structure is irregular or incomplete and that has no predefined data model, that is, data that is inconvenient to represent in a two-dimensional logical database table. It includes office documents in all formats, text, pictures, XML (eXtensible Markup Language), HTML (HyperText Markup Language), various reports, images, and audio and video information.
Operator: simply put, an operator performs some kind of "operation". The object it operates on is called the operand. An operator is a mapping from a function space to a function space, O: X → X.
In the present specification, a data processing method is provided, and the present specification relates to a data processing apparatus, a computing device, and a computer-readable storage medium, which are described in detail one by one in the following embodiments.
A traditional database supports queries on structured data only. It cannot analyze the content of unstructured data such as video, images, and text. To analyze unstructured data with a database, the unstructured data must first be converted into structured data by an AI (artificial intelligence) algorithm, and the resulting data is then analyzed.
To analyze unstructured data, the data processing method provided in this specification can provide different UDFs (user-defined functions) in the database to convert unstructured data into structured data. A UDF is a high-level interface, and each UDF allows different operators to be assembled, for example a UDF for face recognition, a UDF for gender recognition, and a UDF for face age prediction. In practical applications, whichever UDF is needed is invoked directly through the corresponding UDF interface. Relying on separate UDFs has several drawbacks, however.
First, meeting every requirement may take a very large number of UDFs. For example, if some users need video face recognition and others need image face recognition, both a video face recognition UDF and an image face recognition UDF must be provided.
Second, using multiple UDFs wastes computing resources. For example, a user who needs to recognize both the face and the face gender must call two UDFs. Both depend on the face target detection module, so that module is invoked twice for the same input.
Third, using multiple UDFs means the unstructured data is read multiple times. Image and video files are usually large, so the repeated reads waste further resources.
Finally, after the unstructured data has been analyzed with UDFs, the user has to create different tables manually according to the UDFs used so that the analysis results can be stored in them, which makes for a poor user experience.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating a database structure of a data processing method in a specific application according to an embodiment of the present disclosure.
In FIG. 1, the database includes an SQL (Structured Query Language) layer, a database unstructured analysis engine, and other database modules. The database unstructured analysis engine includes a DAG (directed acyclic graph) parser, a DAG engine, and an unstructured operator library. The unstructured operator library provides a series of basic unstructured data operators, for example video decoding, image decoding, face detection, face feature extraction, face gender and age identification, image target detection, and image feature extraction. The DAG engine can combine these basic operators into one or more computation flows for unstructured data analysis and run those flows when a processing request for unstructured data is subsequently received. Through the DAG parser, a user can configure a computation flow, that is, an algorithm, using SQL.
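The division of labour between the operator library and the DAG engine can be illustrated with a minimal sketch. It assumes that a computation flow is a directed acyclic graph of operators. The operator functions, the `OPERATOR_LIBRARY` dictionary, the `DagEngine` class, and the file name are hypothetical stand-ins and are not taken from this specification.

```python
from graphlib import TopologicalSorter


# Hypothetical operators: each receives the source and the outputs of earlier operators.
def image_decode(source, upstream):
    return f"pixels<{source}>"


def face_detect(source, upstream):
    return f"faces<{upstream['image_decode']}>"


def face_gender(source, upstream):
    return f"gender<{upstream['face_detect']}>"


def face_age(source, upstream):
    return f"age<{upstream['face_detect']}>"


# Stand-in for the unstructured operator library.
OPERATOR_LIBRARY = {
    "image_decode": image_decode,
    "face_detect": face_detect,
    "face_gender": face_gender,
    "face_age": face_age,
}


class DagEngine:
    """Combines library operators into a computation flow and runs it."""

    def __init__(self, library):
        self.library = library

    def run(self, flow, source):
        # flow maps each operator name to the operators it depends on.
        order = list(TopologicalSorter(flow).static_order())
        outputs, calls = {}, []
        for name in order:
            outputs[name] = self.library[name](source, outputs)
            calls.append(name)
        return outputs, calls


flow = {
    "image_decode": [],
    "face_detect": ["image_decode"],
    "face_gender": ["face_detect"],
    "face_age": ["face_detect"],
}
outputs, calls = DagEngine(OPERATOR_LIBRARY).run(flow, "photo.jpg")
```

In this sketch the shared face detection step runs once and both downstream operators reuse its output, which is the saving the single-flow design aims for compared with calling two independent UDFs.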
The data processing method provided by the embodiment of the specification can be applied to the database to realize the analysis and processing of the unstructured data.
Referring to fig. 2, fig. 2 shows a flowchart of a data processing method provided in an embodiment of the present specification, which specifically includes the following steps.
Step 202: receiving a target object and a target object processing request generated by a user based on a structured query language.
The target object is unstructured data and includes pictures, videos, voice and/or texts. It can also include XML, HTML, various reports, and the like.
If the structured query language is SQL, receiving the target object processing request generated by the user based on the structured query language may be understood as receiving a target object processing request written by the user in SQL, that is, the target object processing request is an SQL statement. For example, the target object processing request is: select pipeline_run_insert(<pipeline_name>, <data/data_url>), which specifies the algorithm to use for processing the target object.
Taking a picture as the target object, receiving a target object and a target object processing request generated by a user based on a structured query language can be understood as receiving a picture and a picture processing request written by the user in SQL.
In another embodiment of this specification, to reduce the number of UDF invocations and save computing resources, before receiving a target object processing request generated by a user based on a structured query language and a target object, the method further includes:
receiving an algorithm creating request generated by the user based on a structured query language;
parsing the algorithm creation request to determine operators for creating the algorithm and an execution order among the operators;
and creating the algorithm based on the operators and the execution sequence among the operators, and creating an algorithm interface based on the algorithm.
Specifically, receiving an algorithm creation request generated by a user based on a structured query language may be understood as receiving an algorithm creation request written by the user in SQL, for example: select pipeline_create(<pipeline_name>, <pipeline_config_yaml>), where the algorithm flow is described in YAML.
In a specific implementation, after an algorithm creation request written by a user in SQL is received, the request is parsed to obtain the operators of the algorithm and the execution order among them. The algorithm is then created based on those operators and their execution order, and an algorithm interface is created based on the algorithm. The execution order among operators indicates which operator runs first in practice. For example, if the algorithm is a face detection algorithm that contains a face feature detection operator and a face age identification operator, the face feature detection operator can be executed first and the face age identification operator can then be executed on the resulting face features, which shortens the computation of the whole algorithm and improves computational efficiency.
In this embodiment, after an algorithm creation request written by a user in SQL is received, the operators for creating the algorithm and the execution order among them are determined from the request, and the algorithm is created based on those operators and their execution order. Subsequently, when the user issues a processing request for a target object, the target object can be processed by multiple operators through the UDF of a single algorithm, which greatly improves UDF utilization and saves computing resources.
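The creation step described above can be sketched as follows. The sketch assumes the YAML description has already been parsed into a Python dictionary. The schema (`steps`, `operator`, `after`), the `KNOWN_OPERATORS` set, and the `Algorithm` class are hypothetical, since the specification does not give the layout of the YAML configuration.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter

# Hypothetical names of operators available in the operator library.
KNOWN_OPERATORS = {"image_decode", "face_detect", "face_feature", "face_age"}


@dataclass(frozen=True)
class Algorithm:
    name: str
    execution_order: tuple


def create_algorithm(name, config):
    """Build an Algorithm from an already-parsed pipeline config (assumed schema)."""
    depends_on = {step["operator"]: step.get("after", []) for step in config["steps"]}
    mentioned = set(depends_on) | {d for deps in depends_on.values() for d in deps}
    unknown = mentioned - KNOWN_OPERATORS
    if unknown:
        raise ValueError(f"unknown operators: {sorted(unknown)}")
    try:
        # Operators with no unmet dependencies come first.
        order = tuple(TopologicalSorter(depends_on).static_order())
    except CycleError as exc:
        raise ValueError("operators must not depend on each other cyclically") from exc
    return Algorithm(name=name, execution_order=order)


# The steps are deliberately listed out of order; the dependencies decide the order.
config = {
    "steps": [
        {"operator": "face_age", "after": ["face_feature"]},
        {"operator": "face_feature", "after": ["face_detect"]},
        {"operator": "face_detect", "after": ["image_decode"]},
        {"operator": "image_decode"},
    ]
}
algorithm = create_algorithm("face_age_pipeline", config)
```

Deriving the execution order from declared dependencies is one possible design. A configuration could equally list the operators in the order they should run.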
In another embodiment of the present specification, after the creating the algorithm based on the operators and the execution order among the operators, the method further includes:
setting a corresponding algorithm identifier for the algorithm, and storing the algorithm and its algorithm identifier in the preset algorithm library; and
generating, based on the algorithm, a set of processing result data tables corresponding to the algorithm.
Specifically, so that each algorithm can be identified accurately, a unique algorithm identifier is set for the algorithm after it is created, and the algorithm and its identifier are stored in the preset algorithm library. When the algorithm is invoked later, it can be retrieved quickly and accurately by its identifier.
In addition, after the algorithm is created, a corresponding set of processing result data tables can be generated automatically. The set contains one processing result data table generated automatically for each operator in the created algorithm. When a target object is later processed with the algorithm, each processing result can be stored directly in the data table of the corresponding operator without a data table having to be created at that point, so processing results can be stored quickly.
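A minimal sketch of registering an algorithm under a unique identifier and generating its processing result data table set is shown below. SQLite stands in for the database. The `register_algorithm` function, the table naming scheme, and the two-column table layout are assumptions made for illustration only.

```python
import sqlite3


def register_algorithm(conn, library, algorithm_id, operators):
    """Store the algorithm under its identifier and pre-create one result table per operator."""
    if algorithm_id in library:
        raise ValueError(f"algorithm identifier already in use: {algorithm_id}")
    library[algorithm_id] = list(operators)
    tables = []
    for op in operators:
        # Hypothetical naming scheme; names are assumed to be trusted identifiers.
        table = f"{algorithm_id}__{op}"
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{table}" (object_url TEXT, result TEXT)'
        )
        tables.append(table)
    return tables


conn = sqlite3.connect(":memory:")
library = {}
tables = register_algorithm(
    conn, library, "face_pipeline", ["face_feature", "face_gender", "face_age"]
)

# Later, a processing result can be written straight into the operator's table.
conn.execute(
    'INSERT INTO "face_pipeline__face_age" VALUES (?, ?)', ("photo.jpg", "31")
)
rows = conn.execute(
    'SELECT object_url, result FROM "face_pipeline__face_age"'
).fetchall()
```

Because the tables already exist when processing starts, the write in the last step needs no table creation of its own.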
Step 204: parsing the target object processing request to obtain an algorithm identifier for processing the target object.
Specifically, after receiving a target object processing request and a target object generated by a user based on an SQL language, the target object processing request is parsed to obtain an algorithm identifier for processing the target object.
Still taking the target object as an example of the picture, after receiving the SQL processing request for the picture, the SQL processing request is parsed to obtain the algorithm identifier for processing the picture.
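The parsing in step 204 could look roughly like the sketch below. It assumes that the pipeline name in the request serves as the algorithm identifier and that both arguments are single-quoted string literals. The regular expression, the function name, and the example URL are hypothetical, and a real SQL layer would use its own parser rather than a regular expression.

```python
import re

# Matches: select pipeline_run_insert('<pipeline_name>', '<data_url>')
REQUEST_PATTERN = re.compile(
    r"^\s*select\s+pipeline_run_insert\s*\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)\s*;?\s*$",
    re.IGNORECASE,
)


def parse_processing_request(sql):
    """Return (algorithm identifier, target object location) from a request string."""
    match = REQUEST_PATTERN.match(sql)
    if match is None:
        raise ValueError(f"not a pipeline_run_insert request: {sql!r}")
    return match.group(1), match.group(2)


algorithm_id, data_url = parse_processing_request(
    "select pipeline_run_insert('face_pipeline', 'oss://bucket/photo.jpg');"
)
```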
Step 206: acquiring the algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier.
Wherein the algorithm comprises at least two operators and an execution order between the at least two operators.
Specifically, after the algorithm identifier in the target object processing request is determined, the algorithm corresponding to the algorithm identifier may be obtained from a preset algorithm library based on the algorithm identifier, so as to determine an operator of the algorithm and an execution sequence between the operators.
In specific implementation, the obtaining of the algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier includes:
and calling the algorithm interface, and acquiring an algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier.
In this embodiment, after an algorithm is created, an algorithm interface is created for it. When the algorithm is actually needed, the algorithm interface is called directly, and the algorithm corresponding to the algorithm identifier is obtained from the preset algorithm library according to that identifier.
Step 208: processing the target object according to the algorithm, and obtaining a processing result of the target object.
Specifically, the algorithm comprises at least two operators and an execution sequence between the at least two operators;
correspondingly, the processing the target object according to the algorithm and obtaining the processing result of the target object includes:
and processing the target object according to the at least two operators and the execution sequence between the at least two operators, and obtaining a processing result of at least one operator for the target object.
In a specific implementation, each algorithm includes at least two operators and an execution order between the at least two operators, and then after determining an algorithm for processing a target object, the target object may be processed based on the at least two operators in the algorithm and the execution order between the at least two operators, and a processing result of the at least one operator for the target object is obtained. In practical applications, since not every operator can output the processing result of the target object, for example, some operators may only be one intermediate step, when the target object is processed based on at least two operators, it is also possible to obtain the processing result of only one of the operators for the target object.
Still taking a picture as the target object, suppose the target object processing request is to identify the face features, face gender and face age in the picture. The target object processing request is parsed to obtain the algorithm identifier it carries for processing the picture, and the algorithm corresponding to the algorithm identifier is then obtained from the preset algorithm library. The picture is processed with the face feature extraction operator, the face gender identification operator and the face age identification operator in the algorithm, following the execution sequence among the three operators: the face feature extraction operator extracts the face features in the picture, the face gender identification operator identifies the face gender, and the face age identification operator identifies the face age. The face features, face gender and face age in the picture are finally obtained.
In the embodiment of the specification, each algorithm is composed of a plurality of operators and execution sequences among the operators, so that after the algorithm is determined, accurate and rapid processing of the target object can be achieved based on the specific operators in the algorithm and the execution sequences among the operators, and user experience is improved.
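As a rough sketch of how an ordered list of operators might be applied to a target object, consider the following. All names here (`run_algorithm`, `decode`, `face_feature`, `face_gender`) are hypothetical, and the operator bodies are placeholders rather than real recognition models. An operator that returns `None` models an intermediate step that produces no processing result of its own.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical operator signature: (target object, shared context) -> result or None.
Operator = Callable[[bytes, dict], Optional[dict]]


def decode(target: bytes, ctx: dict) -> Optional[dict]:
    # Intermediate step: prepares data for later operators, outputs no result.
    ctx["decoded"] = target.decode("utf-8")
    return None


def face_feature(target: bytes, ctx: dict) -> Optional[dict]:
    # Placeholder "feature": the length of the decoded data.
    return {"feature_length": len(ctx["decoded"])}


def face_gender(target: bytes, ctx: dict) -> Optional[dict]:
    # Placeholder result; a real operator would run a model here.
    return {"gender": "unknown"}


def run_algorithm(operators: List[Tuple[str, Operator]], target: bytes) -> Dict[str, dict]:
    """Run the operators in their execution order and collect per-operator results."""
    ctx: dict = {}
    results: Dict[str, dict] = {}
    for name, operator in operators:
        output = operator(target, ctx)
        if output is not None:  # skip operators that are only intermediate steps
            results[name] = output
    return results


PIPELINE = [("decode", decode), ("face_feature", face_feature), ("face_gender", face_gender)]
results = run_algorithm(PIPELINE, b"abc")
```

In this sketch three operators run but only two entries appear in `results`, which mirrors the statement that not every operator outputs a processing result.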
In another embodiment of this specification, after obtaining the processing result of the target object, the method further includes:
generating a corresponding processing result data table from the processing result of the target object in table form, and storing the processing result data table in a database.
In practical applications, all data in the database exists in table form. If the processing result were stored in the database as is, a table would first have to be built for it, the processing result would then be written into that table, and only then would the table be stored in the database, which complicates the storage flow. Generating the processing result directly in table form is intended to simplify this flow.
Specifically, each algorithm includes a plurality of operators, and each operator has different processing for a target object, and the obtained processing results are also different, for example, a face recognition operator is used to process a picture, and the obtained result is a face in the picture; and processing the picture by adopting a human face feature extraction operator, wherein the obtained result is the human face feature in the picture.
In practical applications, to ensure that the processing result of each operator for the target object can be stored in the database by category, so that a user can later query the processing results conveniently, the following implementations are provided.
In another embodiment of this specification, after obtaining the processing result of each operator for the target object, the method further includes:
judging whether each operator that obtains a processing result of the target object has a corresponding processing result data table in a database;
if so, storing the processing result of the target object in the corresponding processing result data table upon receiving a storage instruction from the user.
Specifically, to store the processing results by category, a corresponding processing result data table is established in the database for each operator, and the processing result of each operator for the target object is then stored in its corresponding table. In practical applications, operators can be flexibly added to the preset algorithm library according to actual algorithm requirements, so a newly added operator may not yet have a corresponding processing result data table in the database. Therefore, before the processing result of each operator is stored, it is first judged whether the operator has a corresponding processing result data table in the database. If it does, the processing result of the target object can be stored in that table upon receiving a storage instruction from the user. In this way, the user decides whether the processing result is stored in the processing result data table or simply returned. This interactive way of storing results improves the user's interactive experience, and storing results in tables makes later searches easier.
If there is no corresponding processing result data table in the database for the operator that obtains the processing result of the target object, then under the condition of obtaining the processing result of each operator for the target object, the processing results may be respectively generated in a table form into processing result data tables corresponding to the operators, and stored in the database, and the specific implementation manner is as follows:
after judging whether the operators obtaining the processing results of the target object all have corresponding processing result data tables in the database, the method further comprises the following steps:
and under the condition that the operator for obtaining the processing result of the target object does not have a corresponding processing result data table in the database, generating a corresponding processing result data table based on the processing result of the target object, and storing the processing result data table to the database.
In this embodiment of the present specification, when an operator that obtains a processing result of the target object has no corresponding processing result data table in the database, there is no need to first set up a table for that operator in the database. Instead, the processing result of each operator for the target object is generated directly in the form of a table corresponding to the operator and stored in the database, which shortens the processing flow and improves storage efficiency.
In another embodiment of this specification, some of the operators that obtain processing results of the target object may have corresponding processing result data tables in the database while others do not. To ensure that the processing result of every such operator can be stored in a corresponding processing result data table, a table is newly created in the database for each operator that lacks one. The specific implementation is as follows:
after judging whether the operators obtaining the processing results of the target object all have corresponding processing result data tables in the database, the method further comprises the following steps:
when any one of the operators obtaining the processing result of the target object does not have a corresponding processing result data table in the database, creating a corresponding processing result data table in the database for the operator without the corresponding processing result data table in the database;
and storing the processing result of the operator which does not have the corresponding processing result data table in the database aiming at the target object into the corresponding processing result data table.
In this embodiment of the present specification, when an operator that obtains a processing result of the target object has no corresponding processing result data table in the database, a table may be newly created in the database for that operator. The processing result of each operator for the target object can then be stored in its corresponding processing result data table, which keeps the results stored by category and avoids confusion of data storage in the database.
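The storage logic of these embodiments can be sketched with Python's built-in `sqlite3` module. This is only an illustration under several assumptions: SQLite stands in for the database, the `result_<operator>` table naming and the `(object_id, payload)` schema are invented for the example, results are serialized as JSON, and the `store` flag models the user's storage instruction.

```python
import json
import sqlite3


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
    """Judge whether a processing result data table already exists."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    return row is not None


def store_result(conn: sqlite3.Connection, operator_name: str, object_id: str,
                 result: dict, store: bool = True) -> bool:
    """Store one operator's result in its own table, creating the table if needed."""
    if not store:
        # No storage instruction: the caller simply returns the result to the user.
        return False
    if not operator_name.isidentifier():
        # Table names cannot be bound as SQL parameters, so validate them first.
        raise ValueError(f"invalid operator name: {operator_name!r}")
    table = f"result_{operator_name}"  # hypothetical naming scheme
    if not table_exists(conn, table):
        # New operator without a table: create one before storing.
        conn.execute(f'CREATE TABLE "{table}" (object_id TEXT, payload TEXT)')
    conn.execute(
        f'INSERT INTO "{table}" (object_id, payload) VALUES (?, ?)',
        (object_id, json.dumps(result)),
    )
    conn.commit()
    return True
```

A call such as `store_result(conn, "face_age", "img-1", {"age": 30})` would create `result_face_age` on first use and append to it afterwards.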
The data processing method of the embodiment of the specification comprises: receiving a target object processing request generated by a user based on a structured query language and a target object; analyzing the target object processing request to obtain an algorithm identifier for processing the target object; acquiring an algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier; and processing the target object according to the algorithm and obtaining a processing result of the target object. When processing an unstructured target object, the data processing method processes it based on the operators in the algorithm determined from the preset algorithm library and the execution sequence among those operators, so that the processing result of the unstructured target object is obtained accurately and quickly, and user experience is improved.
The following description will further explain the data processing method provided in this specification by taking an application of the data processing method in a database as an example, with reference to fig. 3. Fig. 3 is a flowchart of a processing procedure of a data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 302: an algorithm flow is created through SQL.
The algorithm flow can be regarded as the algorithm of the above embodiment.
Specifically, an SQL statement for creating an algorithm flow is received, and an algorithm is created based on the SQL statement and a plurality of operators in an algorithm library, for example, an image algorithm is created, where the image algorithm includes an image feature extraction operator, a face detection operator, and an image object detection operator.
Step 304: a table group is created for the specified algorithm flow through SQL.
Specifically, after the algorithm flows are created, a corresponding table group may be created for each algorithm flow, so that after target object processing is completed through an algorithm flow, the processing result is stored in the corresponding table of the table group.
Still taking the image algorithm as an example, after the algorithm is created, an SQL request for creating the table group of the algorithm is received, and a corresponding table group is created for the image algorithm: table 1 for the image feature extraction operator, table 2 for the image object detection operator, and table 3 for the face detection operator. Table 1, table 2 and table 3 form the table group of the image algorithm.
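A table group for an algorithm flow might be created as sketched below, one table per operator. SQLite is used as a stand-in database, and the function name `create_table_group`, the `<flow>_<operator>` table naming and the `(object_id, payload)` schema are all assumptions made for illustration; the specification does not give the SQL syntax or table layout.

```python
import sqlite3
from typing import Dict, List

# Operators of the example image algorithm, named after the description above.
IMAGE_FLOW_OPERATORS = [
    "image_feature_extraction",  # table 1
    "image_object_detection",    # table 2
    "face_detection",            # table 3
]


def create_table_group(conn: sqlite3.Connection, flow_name: str,
                       operators: List[str]) -> Dict[str, str]:
    """Create one result table per operator and return the operator-to-table mapping."""
    tables: Dict[str, str] = {}
    for operator in operators:
        table = f"{flow_name}_{operator}"  # hypothetical naming scheme
        if not table.isidentifier():
            # Table names cannot be bound as SQL parameters, so validate them first.
            raise ValueError(f"invalid table name: {table!r}")
        conn.execute(
            f'CREATE TABLE IF NOT EXISTS "{table}" (object_id TEXT, payload TEXT)'
        )
        tables[operator] = table
    conn.commit()
    return tables
```

Using `CREATE TABLE IF NOT EXISTS` makes the call safe to repeat, which is a convenience of this sketch rather than something the specification requires.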
Step 306: the algorithm flow is run through SQL.
Specifically, after the algorithm flows are created and the corresponding table groups are created for each algorithm flow, the target object processing may be performed using the algorithm flows.
In the embodiment of the present specification, taking an image file as the target object, after an SQL processing request for the image file is received, the image algorithm identifier in the SQL processing request is parsed, and the image algorithm, that is, the image feature extraction operator, the face detection operator and the image object detection operator, is obtained from the algorithm library. The image file is first decoded. Image features are then obtained from the decoded image based on the image feature extraction operator; face feature extraction and gender and age identification are performed on the decoded image based on the face detection operator to obtain the face features, gender, age and face position; and the object category and object position are obtained from the decoded image based on the image object detection operator.
Step 308: the output of the algorithm flow is written to the table group.
The output of the algorithm flow is the processing result of each operator for the image file.
Specifically, after the processing result of each operator for the image file is obtained based on the above algorithm flow, the processing results are written into the table corresponding to each operator. For example, the image features are written into table 1 corresponding to the image feature extraction operator; the object category and object position are written into table 2 corresponding to the image object detection operator; and the face features, gender, age and face position are written into table 3 corresponding to the face detection operator. Finally, the table group formed by table 1, table 2 and table 3 is stored in the database.
In the embodiment of the specification, end-to-end unstructured data analysis (namely, picture, video and text analysis) is implemented in a database using SQL. Three interfaces are provided in the SQL interface: an algorithm flow creation interface (which creates an algorithm flow from the basic operators in the algorithm library), a table creation interface (which creates tables based on the algorithm flow), and an algorithm flow running interface. Through these three interfaces, a user can quickly build unstructured data analysis applications through SQL, such as image search and face retrieval.
In a specific implementation, the data processing method of the embodiment of the specification covers the creation of an algorithm flow, the creation of the table group corresponding to the algorithm flow, and the analysis of unstructured data such as images. The user does not need to write additional pre-processing and post-processing code or chain multiple processing models together; different algorithm flows can be created directly from a set of basic operators according to the user's actual requirements. In subsequent use, the user can call different algorithm flows through the same UDF, which saves resources and greatly reduces the number of calls. Tables can be created automatically by parsing the algorithm flow, and the output of the algorithm flow is written directly into these tables, which shortens the storage flow, improves storage efficiency and improves user experience.
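The idea of calling different algorithm flows through the same UDF can be illustrated with SQLite's user-defined function support, exposed in Python as `sqlite3.Connection.create_function`. This is a sketch under stated assumptions: the UDF name `run_flow`, the flow identifiers and the toy text operators are hypothetical, and SQLite merely stands in for the database described in the specification.

```python
import json
import sqlite3

# Hypothetical algorithm flows: identifier -> operators in execution order.
# Toy text operators keep the example self-contained.
FLOWS = {
    "text_length": [lambda text: {"length": len(text)}],
    "text_stats": [
        lambda text: {"words": len(text.split())},
        lambda text: {"upper": text.isupper()},
    ],
}


def run_flow(flow_id: str, target: str) -> str:
    """Single UDF entry point: look up the flow by identifier and run its operators."""
    results = [operator(target) for operator in FLOWS[flow_id]]
    return json.dumps(results)  # a UDF must return a value SQLite can store


conn = sqlite3.connect(":memory:")
# Register one SQL function with two arguments; every flow is reached through it.
conn.create_function("run_flow", 2, run_flow)

row = conn.execute("SELECT run_flow('text_stats', 'hello world')").fetchone()
```

The same `run_flow` function serves both registered flows; only the identifier passed in the SQL statement changes.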
Corresponding to the above method embodiment, this specification further provides a data processing apparatus embodiment, and fig. 4 shows a schematic structural diagram of a data processing apparatus provided in an embodiment of this specification. As shown in fig. 4, the apparatus includes:
a request receiving module 402 configured to receive a target object processing request generated by a user based on a structured query language and a target object;
a parsing module 404 configured to parse the target object processing request to obtain an algorithm identifier for processing the target object;
an algorithm obtaining module 406 configured to obtain, from a preset algorithm library, an algorithm corresponding to the algorithm identifier based on the algorithm identifier;
a processing module 408 configured to process the target object according to the algorithm and obtain a processing result of the target object.
Optionally, the apparatus further includes:
an algorithm creation module configured to:
receiving an algorithm creating request generated by the user based on a structured query language;
analyzing the algorithm creating request to determine operators for creating the algorithm and an execution sequence among the operators;
and creating the algorithm based on the operators and the execution sequence among the operators, and creating an algorithm interface based on the algorithm.
Optionally, the apparatus further includes:
a storage module configured to:
setting corresponding algorithm identification for the algorithm, and storing the algorithm and the algorithm identification corresponding to the algorithm into the preset algorithm library; and
generating a processing result data table set corresponding to the algorithm based on the algorithm.
Optionally, the algorithm includes at least two operators and an execution order between the at least two operators;
accordingly, the processing module 408 is further configured to:
and processing the target object according to the at least two operators and the execution sequence between the at least two operators, and obtaining a processing result of at least one operator for the target object.
Optionally, the apparatus further includes:
a determination module configured to:
judging whether operators obtaining the processing results of the target object have corresponding processing result data tables in a database or not;
and if so, storing the processing result of the target object into a corresponding processing result data table under the condition of receiving the storage instruction of the user.
Optionally, the determining module is further configured to:
and under the condition that the operator for obtaining the processing result of the target object does not have a corresponding processing result data table in the database, generating a corresponding processing result data table based on the processing result of the target object, and storing the processing result data table to the database.
Optionally, the determining module is further configured to:
when any one of the operators obtaining the processing result of the target object does not have a corresponding processing result data table in the database, creating a corresponding processing result data table in the database for the operator without the corresponding processing result data table in the database;
and storing the processing result of the operator which does not have the corresponding processing result data table in the database aiming at the target object into the corresponding processing result data table.
Optionally, the apparatus further includes:
a storage module configured to generate a corresponding processing result data table based on the processing result of the target object and store the processing result data table to a database.
Optionally, the algorithm obtaining module 406 is further configured to:
and calling the algorithm interface, and acquiring the algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier.
Optionally, the target object is unstructured data, including pictures, videos, voice and/or texts.
The data processing device of the embodiment of the specification receives a target object processing request generated by a user based on a structured query language and a target object; analyzes the target object processing request to obtain an algorithm identifier for processing the target object; acquires an algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier; and processes the target object according to the algorithm to obtain a processing result of the target object. When processing an unstructured target object, the data processing device processes it based on the operators in the algorithm determined from the preset algorithm library and the execution sequence among those operators, so that the processing result of the unstructured target object is obtained accurately and quickly, and user experience is improved.
The above is a schematic configuration of a data processing apparatus of the present embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the data processing apparatus can be referred to the description of the technical solution of the data processing method.
FIG. 5 illustrates a block diagram of a computing device 500, provided in accordance with one embodiment of the present specification. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530, and database 550 is used to store data.
Computing device 500 also includes access device 540, which enables computing device 500 to communicate via one or more networks 560. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 540 may include one or more of any type of network interface, wired or wireless, e.g., a Network Interface Card (NIC), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present description, the above-described components of computing device 500, as well as other components not shown in FIG. 5, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device structure shown in FIG. 5 is for illustration purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 500 may also be a mobile or stationary server.
Wherein the processor 520 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the data processing method.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.
An embodiment of the present specification also provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the data processing method.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the data processing method.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media may not include electrical carrier signals or telecommunication signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, and to thereby enable others skilled in the art to best understand the specification and utilize the specification. The specification is limited only by the claims and their full scope and equivalents.

Claims (12)

1. A data processing method is applied to a database and comprises the following steps:
receiving a target object processing request generated by a user based on a structured query language and a target object, wherein the target object is unstructured data;
analyzing the target object processing request to obtain an algorithm identifier for processing the target object;
calling an algorithm interface, and acquiring an algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier, wherein the algorithm interface is created based on the algorithm corresponding to the algorithm identifier, and the algorithm comprises at least two operators and an execution sequence between the at least two operators;
and processing the target object according to the algorithm, and obtaining a processing result of the target object.
2. The data processing method of claim 1, before receiving the target object processing request generated by the user based on the structured query language and the target object, further comprising:
receiving an algorithm creating request generated by the user based on a structured query language;
parsing the algorithm creation request to determine operators for creating the algorithm and an execution order among the operators;
and creating the algorithm based on the operators and the execution sequence among the operators, and creating an algorithm interface based on the algorithm.
3. The data processing method of claim 2, further comprising, after creating the algorithm based on the operators and an execution order between the operators:
setting corresponding algorithm identification for the algorithm, and storing the algorithm and the algorithm identification corresponding to the algorithm into the preset algorithm library; and
generating a processing result data table set corresponding to the algorithm based on the algorithm.
4. The data processing method of claim 1, wherein the processing the target object according to the algorithm and obtaining the processing result of the target object comprises:
and processing the target object according to the at least two operators and the execution sequence between the at least two operators, and obtaining a processing result of at least one operator for the target object.
5. The data processing method according to claim 4, after obtaining the processing result of the at least one operator for the target object, further comprising:
judging whether operators obtaining the processing results of the target object have corresponding processing result data tables in a database or not;
and if so, storing the processing result of the target object into a corresponding processing result data table under the condition of receiving the storage instruction of the user.
6. The data processing method according to claim 5, wherein after determining whether the operators obtaining the processing results of the target object have corresponding processing result data tables in the database, the method further comprises:
and under the condition that the operator for obtaining the processing result of the target object does not have a corresponding processing result data table in the database, generating a corresponding processing result data table based on the processing result of the target object, and storing the processing result data table to the database.
7. The data processing method according to claim 5, wherein after determining whether the operators obtaining the processing results of the target object have corresponding processing result data tables in the database, the method further comprises:
when any one of the operators obtaining the processing result of the target object does not have a corresponding processing result data table in the database, creating a corresponding processing result data table in the database for the operator without the corresponding processing result data table in the database;
and storing the processing result of the operator which does not have the corresponding processing result data table in the database aiming at the target object into the corresponding processing result data table.
8. The data processing method according to claim 1, further comprising, after obtaining the processing result of the target object:
and generating a corresponding processing result data table based on the processing result of the target object, and storing the processing result data table to a database.
9. The data processing method according to any of claims 1 to 8, wherein the target object is unstructured data comprising pictures, video, speech and/or text.
10. A data processing device applied to a database, comprising:
the request receiving module is configured to receive a target object processing request generated by a user based on a structured query language and a target object, wherein the target object is unstructured data;
the analysis module is configured to analyze the target object processing request to obtain an algorithm identifier for processing the target object;
the algorithm obtaining module is configured to call an algorithm interface and obtain an algorithm corresponding to the algorithm identifier from a preset algorithm library based on the algorithm identifier, wherein the algorithm interface is created based on the algorithm corresponding to the algorithm identifier, and the algorithm comprises at least two operators and an execution sequence between the at least two operators;
and the processing module is configured to process the target object according to the algorithm and obtain a processing result of the target object.
11. A computing device, comprising:
a memory and a processor;
the memory is for storing computer-executable instructions, and the processor is for executing the computer-executable instructions, which when executed by the processor, implement the steps of the data processing method of any one of claims 1 to 9.
12. A computer readable storage medium storing computer instructions which, when executed by a processor, carry out the steps of the data processing method of any one of claims 1 to 9.
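As an illustration of the flow described in claims 5 to 7 and 10, the following minimal Python sketch parses an SQL-like request, looks up an algorithm by its identifier, runs its operators in order, and stores each operator's result in a per-operator table that is created if it does not yet exist. The request syntax, the `ALGORITHM_LIBRARY` dictionary, the operator names, the `result_<operator>` table naming, the `store` flag standing in for the user's storage instruction, and the use of SQLite are all assumptions made for illustration; the claims do not specify any of them.

```python
import re
import sqlite3
from typing import Callable, Dict, List, Tuple

# Hypothetical operator: a name plus a function applied to the previous output.
Operator = Tuple[str, Callable[[str], str]]

# Hypothetical preset algorithm library: algorithm identifier -> operators in
# execution order (at least two operators per algorithm, as in claim 10).
ALGORITHM_LIBRARY: Dict[str, List[Operator]] = {
    "normalize_text": [
        ("strip_ws", lambda s: s.strip()),
        ("lowercase", lambda s: s.lower()),
    ],
}


def parse_request(request: str) -> Tuple[str, str]:
    """Extract (algorithm identifier, target object) from a request of the
    assumed form: SELECT <algorithm>('<target object>')."""
    match = re.fullmatch(
        r"\s*SELECT\s+(\w+)\(\s*'(.*)'\s*\)\s*;?\s*",
        request,
        re.IGNORECASE | re.DOTALL,
    )
    if match is None:
        raise ValueError(f"unsupported request: {request!r}")
    return match.group(1), match.group(2)


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Check whether a result table with this name is already in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def process(conn: sqlite3.Connection, request: str, store: bool = True) -> str:
    """Run the requested algorithm on the target object and, if `store` is
    set, save each operator's result in that operator's result table."""
    algorithm_id, target = parse_request(request)
    operators = ALGORITHM_LIBRARY[algorithm_id]  # KeyError if unknown identifier
    value = target
    for op_name, op in operators:
        value = op(value)
        if not store:
            continue
        # Table names come from the trusted library above, never from the request.
        table = f"result_{op_name}"
        if not table_exists(conn, table):
            conn.execute(f'CREATE TABLE "{table}" (target TEXT, result TEXT)')
        conn.execute(
            f'INSERT INTO "{table}" (target, result) VALUES (?, ?)', (target, value)
        )
    conn.commit()
    return value


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(process(connection, "SELECT normalize_text('  Hello World  ')"))
```

Keeping one table per operator mirrors the per-operator existence check in claim 7; a real system could equally store all results in a single table keyed by operator.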

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011061918.9A CN113297306B (en) 2020-09-30 2020-09-30 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011061918.9A CN113297306B (en) 2020-09-30 2020-09-30 Data processing method and device

Publications (2)

Publication Number Publication Date
CN113297306A CN113297306A (en) 2021-08-24
CN113297306B CN113297306B (en) 2023-02-07

Family

ID=77318309

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011061918.9A Active CN113297306B (en) 2020-09-30 2020-09-30 Data processing method and device

Country Status (1)

Country Link
CN (1) CN113297306B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9507762B1 (en) * 2015-11-19 2016-11-29 International Business Machines Corporation Converting portions of documents between structured and unstructured data formats to improve computing efficiency and schema flexibility

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110046169B (en) * 2019-03-12 2021-09-07 创新先进技术有限公司 Computing service implementation scheme based on structured query language statements
CN110309196A (en) * 2019-05-22 2019-10-08 深圳壹账通智能科技有限公司 Block chain data storage and query method, apparatus, equipment and storage medium
CN110377881B (en) * 2019-06-11 2023-04-07 创新先进技术有限公司 Integration method, device and system of text processing service
CN110767264B (en) * 2019-10-15 2024-10-15 腾讯科技(深圳)有限公司 Data processing method, device and computer readable storage medium
CN111274019B (en) * 2019-12-31 2023-05-12 深圳云天励飞技术有限公司 Data processing method, device and computer readable storage medium

Also Published As

Publication number Publication date
CN113297306A (en) 2021-08-24

Similar Documents

Publication Publication Date Title
CN110147437B (en) Knowledge graph-based searching method and device
CN110704479A (en) Task processing method and device, electronic equipment and storage medium
CN110532280B (en) SQL sentence visualization method and device
US9830316B2 (en) Content availability for natural language processing tasks
CN114861889B (en) Deep learning model training method, target object detection method and device
CN110377881B (en) Integration method, device and system of text processing service
CN109189395B (en) Data analysis method and device
CN113535749A (en) Query statement generation method and device
CN115221191A (en) Virtual column construction method based on data lake and data query method
CN118210889A (en) Knowledge graph-based method and device for generating prompt words for vector similarity search
CN118035415A (en) Question answering method, device, equipment and storage medium
CN114647719A (en) Question-answering method and device based on knowledge graph
CN113297306B (en) Data processing method and device
CN117130990A (en) Data processing method and device
CN116594628A (en) Data tracing method and device and computer equipment
CN115525260A (en) Code generation method and device based on protobuf
CN108509059B (en) Information processing method, electronic equipment and computer storage medium
CN114510564A (en) Video knowledge graph generation method and device
CN113297199B (en) Method and device for using spatiotemporal data engine and Cassandra database system
CN114089960B (en) Object processing method and device
CN113296782A (en) Method and device for analyzing JSON data
CN113239001A (en) Data storage method and device
CN115904391A (en) Code analysis method and device
CN113239175A (en) Method for displaying candidate sentence list and terminal equipment
CN115422907A (en) Multi-dimensional science and technology project item establishment duplicate checking method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40057894; Country of ref document: HK)

GR01 Patent grant