CN117539981A - Method, equipment and medium for constructing theme data set - Google Patents

Method, equipment and medium for constructing theme data set

Info

Publication number
CN117539981A
Authority
CN
China
Prior art keywords
data
field
configuration table
content
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311560826.9A
Other languages
Chinese (zh)
Inventor
郑敏
单震
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Original Assignee
Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chaozhou Zhuoshu Big Data Industry Development Co Ltd
Priority to CN202311560826.9A
Publication of CN117539981A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, equipment and medium for constructing a theme data set. The method comprises the following steps: receiving a theme data set construction request from a user and obtaining a data screening requirement; parsing the data screening requirement to obtain parsed content for different field types; creating a parsed content configuration table based on the parsed content; generating a target database statement based on the parsed content configuration table and a preset database statement template; and constructing the theme data set by executing the target database statement. The screening conditions in the data requirement are parsed around dimension fields and quantization fields, the parsed content is stored in a configuration table according to rules, and the theme data set is constructed in combination with templated SQL. In this way, data requirements can be responded to quickly. Because configuration is separated from code, the method is reusable, reduces the amount of development in data development work, is easy to maintain, and has good value for popularization and application.

Description

Method, equipment and medium for constructing theme data set
Technical Field
The present invention relates to the field of data matching, and in particular, to a method, an apparatus, and a medium for constructing a theme data set.
Background
A theme data set is a data set built for a specific topic or task, and is widely used in fields such as machine learning, data mining and analysis, and decision support. Taking e-commerce data analysis as an example, a data team often needs to produce an analysis report quickly in response to a noteworthy social hotspot, so historical data on the corresponding topic must be obtained from the massive data in a data warehouse to support the views in the report.
Existing data processing methods generally convert the screening conditions in a requirement document into corresponding SQL statements one by one, and then execute those statements manually to obtain the required data. However, this approach tends to be inefficient when data requirements call for a fast response, and it makes later maintenance difficult.
Disclosure of Invention
In order to solve the above problems, the present application proposes a method, a device, and a medium for constructing a theme data set. The method includes:
receiving a theme data set construction request from a user and obtaining a data screening requirement; parsing the data screening requirement to obtain parsed content for different field types; creating a parsed content configuration table based on the parsed content; generating a target database statement based on the parsed content configuration table and a preset database statement template; and constructing the theme data set by executing the target database statement.
In one example, the field types include dimension fields and quantization fields. A dimension field describes attributes of the background and environment in which the quantized data is located and provides context for that data; a quantization field represents a performance index of a business process.
In one example, parsing the data screening requirement to obtain parsed content for different field types specifically includes: analyzing the data screening requirement to determine the fields to be parsed; performing semantic analysis on each such field to determine its field type; determining, based on the field type, a preset category classification dictionary corresponding to the field; and determining the parsed content corresponding to the field based on the preset category classification dictionary.
In one example, creating the parsed content configuration table based on the parsed content specifically includes: determining, based on the parsed content, the configuration table content corresponding to the data screening requirement, where the configuration table content comprises a requirement batch identifier, dimension screening fields, fuzzy matching feature words, fuzzy matching interference words, standardized fields and a de-duplication sequence; and creating the parsed content configuration table based on the configuration table content.
In one example, after the configuration table content corresponding to the data screening requirement is determined based on the parsed content, the method further includes: determining the field type of each item of configuration table content, where the field types include at least a character type and a numerical type. The requirement batch identifier, the dimension screening fields, the fuzzy matching feature words, the fuzzy matching interference words and the standardized fields are of the character type; the de-duplication sequence is of the numerical type.
In one example, generating the target database statement based on the parsed content configuration table and a preset database statement template specifically includes: determining a preset function template based on the parsed content configuration table and the data screening requirement, where the preset function template comprises a plurality of preset database statement templates; determining the blank fields in the plurality of preset database statement templates of the preset function template; and replacing the blank fields with the configuration table content from the parsed content configuration table to generate the target database statement.
In one example, constructing the theme data set by executing the target database statement specifically includes: determining the data volume of a source table and, if the data volume is higher than a preset data volume threshold, splitting the source table into a plurality of sub-source tables; and executing the target database statement on the sub-source tables in a loop until all of the source table data has been traversed.
In one example, the method further comprises: determining target original data that satisfies a plurality of matching rules in the data screening requirement; and de-duplicating the target original data in the matching result based on the de-duplication sequence so as to ensure the uniqueness of the target original data.
The application also provides a device for constructing a theme data set, which comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform: receiving a theme data set construction request from a user and obtaining a data screening requirement; parsing the data screening requirement to obtain parsed content for different field types; creating a parsed content configuration table based on the parsed content; generating a target database statement based on the parsed content configuration table and a preset database statement template; and constructing the theme data set by executing the target database statement.
The present application also provides a non-volatile computer storage medium storing computer-executable instructions configured to: receive a theme data set construction request from a user and obtain a data screening requirement; parse the data screening requirement to obtain parsed content for different field types; create a parsed content configuration table based on the parsed content; generate a target database statement based on the parsed content configuration table and a preset database statement template; and construct the theme data set by executing the target database statement.
The method provided by the application has the following beneficial effects: the screening conditions in the data requirement are parsed around dimension fields and quantization fields, the parsed content is stored in a configuration table according to rules, and the theme data set is constructed in combination with templated SQL. In this way, data requirements can be responded to quickly. Because configuration is separated from code, the method is reusable, reduces the amount of development in data development work, is easy to maintain, and has good value for popularization and application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
FIG. 1 is a flow chart of a method for constructing a theme data set according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a device for constructing a theme data set according to an embodiment of the present application.
Detailed Description
To make the purposes, technical solutions and advantages of the present application clearer, the technical solutions are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments that one of ordinary skill in the art can obtain from the present disclosure without creative effort fall within the scope of protection of the present application.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
FIG. 1 is a flow chart illustrating a method for constructing a theme data set according to one or more embodiments of the present disclosure. The method can be applied to the construction of different types of theme data sets, such as an internet finance theme data set, a business theme data set, an instant messaging theme data set, a game theme data set, a public service theme data set and the like. The process may be performed by a computing device in the corresponding domain (e.g., a risk control server or an intelligent mobile terminal corresponding to a payment service), and certain input parameters or intermediate results in the process may be adjusted manually to help improve accuracy.
The method according to the embodiment of the present application may be executed by a terminal device or a server, which is not particularly limited in this application. For ease of understanding and description, the following embodiments are described in detail with a server as the executing entity.
It should be noted that the server may be a single device, or may be a system composed of a plurality of devices, that is, a distributed server, which is not specifically limited in this application.
As shown in fig. 1, an embodiment of the present application provides a method for constructing a theme data set, including:
S101: Receive a theme data set construction request from a user and obtain a data screening requirement.
First, based on the theme data set construction request sent by the user, the data screening requirement and the source table data corresponding to the current construction task are determined. It should be noted that the construction task refers to matching original data in the source table data against the data screening requirement so as to obtain the theme data set.
S102: Parse the data screening requirement to obtain parsed content for different field types.
The data screening requirement provided by the user is parsed to obtain the parsed content, for different field types, that corresponds to the requirement. It should be noted that the data screening requirement may be provided as text or as a voice command; before parsing it is converted into text, and the user confirms the text content, which improves parsing accuracy.
In one embodiment, taking commodity data, the most common kind of data collected from the internet, as an example, the field types of the data can be divided mainly into two types: dimension fields and quantization fields. A dimension field describes attributes of the background and environment in which the quantized data is located and provides context for that data; common examples are month, category and region. A quantization field represents a performance index of a business process, such as commodity name, sales volume and price. When a requirement is parsed, the screening conditions in the requirement are parsed and classified around these two types of field.
Further, when the data screening requirement is parsed, it is first analyzed to determine the fields to be parsed; semantic analysis is then performed on each such field to determine its field type; a preset category classification dictionary corresponding to the field is determined based on the field type; and finally the parsed content corresponding to the field is determined based on the preset category classification dictionary.
The following description takes as the data screening requirement "acquire data related to hot pot food materials in Chongqing on the Tmall (Tianmao) mall platform, exclude self-heating food, and standardize such data into the category 'hot pot food materials'". Parsing this requirement yields the dimension fields e-commerce platform (limited to the Tmall mall) and region (limited to Chongqing). Because the categories in the "network retail monitoring index specification" do not include a hot pot category, "data related to hot pot food materials" is parsed into the dimension field primary category (limited to "grain, oil and food") together with a condition on the quantization field commodity name, which must contain "hot pot" and must not contain "self-heating".
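The parsing result for the hot pot example can be pictured as a small data structure. The following Python sketch is illustrative only: the dictionary contents, the column names (`platform`, `region`, `primary_category`), the English renderings of the values and the helper `parse_requirement` are assumptions made for this example, and the keyword conditions are passed in by hand rather than extracted by semantic analysis.

```python
from dataclasses import dataclass, field

# Hypothetical category classification dictionary: maps a phrase found in the
# requirement to a (dimension field, value) pair. Entries are illustrative.
CATEGORY_DICTIONARY = {
    "Tmall": ("platform", "Tmall"),
    "Chongqing": ("region", "Chongqing"),
    "hot pot food materials": ("primary_category", "grain, oil and food"),
}


@dataclass
class ParsedRequirement:
    dimension_filters: dict = field(default_factory=dict)  # dimension fields
    include_words: list = field(default_factory=list)      # must appear in commodity name
    exclude_words: list = field(default_factory=list)      # must not appear in commodity name
    standardized_category: str = ""


def parse_requirement(text, include_words, exclude_words, standardized_category):
    """Resolve dimension fields through the dictionary; the keyword conditions
    on the quantization field are supplied explicitly in this sketch."""
    parsed = ParsedRequirement(
        include_words=list(include_words),
        exclude_words=list(exclude_words),
        standardized_category=standardized_category,
    )
    lowered = text.lower()
    for phrase, (column, value) in CATEGORY_DICTIONARY.items():
        if phrase.lower() in lowered:
            parsed.dimension_filters[column] = value
    return parsed


requirement = (
    "Acquire data related to hot pot food materials in Chongqing "
    "on the Tmall platform, excluding self-heating food"
)
parsed = parse_requirement(
    requirement, ["hot pot"], ["self-heating"], "hot pot food materials"
)
```

Keeping the dictionary as data rather than code mirrors the separation of configuration from code that the application emphasizes.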
S103: Create a parsed content configuration table based on the parsed content.
After the parsed content is obtained, it can be stored in the configuration table according to preset rules.
In one embodiment, when the parsed content configuration table is created, the configuration table content corresponding to the data screening requirement is first determined based on the parsed content. The configuration table content includes a requirement batch identifier, dimension screening fields, fuzzy matching feature words, fuzzy matching interference words, standardized fields and a de-duplication sequence. The configuration table content is then filled into a parsed content configuration table template according to preset rules to obtain the parsed content configuration table.
In one embodiment, when the configuration table content is filled into the parsed content configuration table template according to preset rules, the field type of each item of configuration table content is first determined, where the field types include at least a character type and a numerical type. The requirement batch identifier, the dimension screening fields, the fuzzy matching feature words, the fuzzy matching interference words and the standardized fields are of the character type, and the de-duplication sequence is of the numerical type. The structure of the parsed content configuration table is abstracted as follows:
Field name        Field type       Field annotation
REQ_BATCH         Character type   Requirement batch identifier
DIM_COL_1         Character type   Dimension screening field 1
DIM_COL_N         Character type   Dimension screening field N
MATCH_WORDS1      Character type   Fuzzy matching feature words (first group)
MATCH_WORDS2      Character type   Fuzzy matching feature words (second group)
NOT_MATCH_WORDS   Character type   Fuzzy matching interference words
STD_COL_1         Character type   Standardized field 1
STD_COL_N         Character type   Standardized field N
STD_ORDER         Numerical type   De-duplication sequence
REQ_BATCH represents the batch identifier of the current requirement. DIM_COL_1 … DIM_COL_N represent the dimension screening fields; the specific number of fields depends on the structure of the corresponding source table in the data warehouse and on the requirement.
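As a rough illustration of the table structure described above, the sketch below creates such a configuration table in an in-memory SQLite database and stores one rule for the hot pot example. The table name `parse_config`, the choice of two dimension columns and one standardized column, and the sample values are assumptions for this example; the application does not prescribe a particular database for the configuration table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Character-type fields are stored as TEXT, the de-duplication sequence as INTEGER.
conn.execute(
    """
    CREATE TABLE parse_config (
        REQ_BATCH       TEXT,
        DIM_COL_1       TEXT,     -- e.g. e-commerce platform
        DIM_COL_2       TEXT,     -- e.g. region
        MATCH_WORDS1    TEXT,
        MATCH_WORDS2    TEXT,
        NOT_MATCH_WORDS TEXT,
        STD_COL_1       TEXT,     -- e.g. standardized category
        STD_ORDER       INTEGER
    )
    """
)

# One configuration row for the (hypothetical) requirement batch BATCH_001.
conn.execute(
    "INSERT INTO parse_config VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (
        "BATCH_001",
        "Tmall",
        "Chongqing",
        "(hot pot)",
        None,
        "(self-heating)",
        "hot pot food materials",
        1,
    ),
)

row = conn.execute(
    "SELECT REQ_BATCH, MATCH_WORDS1, NOT_MATCH_WORDS, STD_ORDER FROM parse_config"
).fetchone()
```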
MATCH_WORDS1, MATCH_WORDS2 and NOT_MATCH_WORDS each store a regular expression of the form (keyword 1|keyword 2|keyword N). Each quantization field that needs to be screened stores its screening configuration in this group of fields, and the three fields together implement AND, OR and NOT logic. For example, if the quantization field screening condition in the requirement is that keyword 1 must be present, that at least one of keyword 2 and keyword 3 must also be present, and that neither keyword 4 nor keyword 5 may be present, the group of fields is stored as: MATCH_WORDS1: (keyword 1); MATCH_WORDS2: (keyword 2|keyword 3); NOT_MATCH_WORDS: (keyword 4|keyword 5).
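The AND/OR/NOT logic carried by the three fields can be sketched in Python with the standard `re` module. This is a minimal illustration under the assumption that the two feature-word fields are combined with AND, that alternation inside a field expresses OR, and that a hit on the interference-word field rejects the record; the function name `matches_rule` is hypothetical.

```python
import re


def matches_rule(text, match_words1=None, match_words2=None, not_match_words=None):
    """Return True if text satisfies the rule stored in the three fields.

    An empty or missing field imposes no condition.
    """
    if match_words1 and not re.search(match_words1, text):
        return False  # first feature-word group is required (AND)
    if match_words2 and not re.search(match_words2, text):
        return False  # second group is also required; '|' inside it gives OR
    if not_match_words and re.search(not_match_words, text):
        return False  # any interference word rejects the record (NOT)
    return True


rule = {
    "match_words1": "(keyword1)",
    "match_words2": "(keyword2|keyword3)",
    "not_match_words": "(keyword4|keyword5)",
}

accepted = matches_rule("keyword1 with keyword3", **rule)
missing_second_group = matches_rule("keyword1 only", **rule)
has_interference = matches_rule("keyword1 keyword2 keyword5", **rule)
```

In a database the same test would typically be expressed with a regular-expression predicate in the WHERE clause rather than in application code.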
STD_COL_1 … STD_COL_N represent the standardized fields.
STD_ORDER is used by the de-duplication logic applied to the matched original data. For example, one piece of original data may match several configuration rules in the requirement; the matches must then be prioritized according to the configured STD_ORDER value and the business meaning of the requirement, so that each piece of basic data remains unique. During de-duplication, the target original data that satisfies multiple matching rules in the data screening requirement is first determined; the target original data in the matching result is then de-duplicated based on the de-duplication sequence to ensure its uniqueness.
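The de-duplication step can be sketched as keeping, for each original record, only the match with the highest priority. The sketch assumes that a smaller STD_ORDER value means higher priority, which the text does not state explicitly; the record layout and the function name `deduplicate` are likewise hypothetical.

```python
def deduplicate(matches):
    """Keep one match per record: the one with the smallest STD_ORDER.

    `matches` is a list of dicts with keys 'record_id', 'std_order' and
    'std_value'. Assumption: lower STD_ORDER means higher priority.
    """
    best = {}
    for match in matches:
        current = best.get(match["record_id"])
        if current is None or match["std_order"] < current["std_order"]:
            best[match["record_id"]] = match
    return list(best.values())


# Record 1 matched two configuration rules; record 2 matched only one.
matches = [
    {"record_id": 1, "std_order": 2, "std_value": "seasoning"},
    {"record_id": 1, "std_order": 1, "std_value": "hot pot food materials"},
    {"record_id": 2, "std_order": 2, "std_value": "seasoning"},
]
unique = deduplicate(matches)
```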
S104: Generate a target database statement based on the parsed content configuration table and a preset database statement template.
The target database statement is generated from the parsed content configuration table and a preset SQL statement template, so that executing the target database statement carries out the data screening requirement.
In one embodiment, when the target database statement is generated, a preset function template is first determined based on the parsed content configuration table and the data screening requirement, where the preset function template comprises a plurality of preset database statement templates. The blank fields in the preset database statement templates of the preset function template are then determined and replaced with the configuration table content from the parsed content configuration table to generate the target database statement.
Wherein the database statement template may be abstracted into the following form:
The above pseudocode is written in Oracle syntax as an example; most other mainstream database types can implement the same functions by substituting the corresponding functions.
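The statement template itself is not reproduced in this text. Purely as an illustration of the idea of replacing blank fields with configuration table content, the sketch below fills a hypothetical Oracle-style template using Python's `string.Template`. The table names (`THEME_DATASET`, `SOURCE_ITEMS`), the column names and the shape of the statement are assumptions, not the template from the application.

```python
from string import Template

# Hypothetical statement template; each $name is a blank field to be filled
# from one row of the parsed content configuration table.
SQL_TEMPLATE = Template(
    "INSERT INTO $target_table\n"
    "SELECT s.*, '$std_value' AS STD_COL_1, $std_order AS STD_ORDER\n"
    "FROM $source_table s\n"
    "WHERE s.$dim_column = '$dim_value'\n"
    "  AND REGEXP_LIKE(s.$text_column, '$match_words1')\n"
    "  AND NOT REGEXP_LIKE(s.$text_column, '$not_match_words')"
)

# One configuration row, using the field names of the configuration table.
config_row = {
    "DIM_COL_1": "Chongqing",
    "MATCH_WORDS1": "(hot pot)",
    "NOT_MATCH_WORDS": "(self-heating)",
    "STD_COL_1": "hot pot food materials",
    "STD_ORDER": 1,
}

# Values are interpolated literally for readability; a real system should
# validate them or use bind variables to avoid SQL injection.
sql = SQL_TEMPLATE.substitute(
    target_table="THEME_DATASET",
    source_table="SOURCE_ITEMS",
    dim_column="REGION",
    text_column="ITEM_NAME",
    dim_value=config_row["DIM_COL_1"],
    match_words1=config_row["MATCH_WORDS1"],
    not_match_words=config_row["NOT_MATCH_WORDS"],
    std_value=config_row["STD_COL_1"],
    std_order=config_row["STD_ORDER"],
)
```

Because the statement text lives in the template and the conditions live in the configuration row, a new requirement only needs a new row, not new SQL.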
S105: Construct the theme data set by executing the target database statement.
The target database statement is executed against the source table data to match original data against the data screening requirement; the matched original data is the data from which the theme data set is then constructed.
In one embodiment, if the data volume of the source table is large when the theme data set is constructed, the data may be split and the target database statement executed in a loop. In this case the data volume of the source table is determined first; if it is higher than a preset data volume threshold, the source table is split into a plurality of sub-source tables, and the target database statement is executed on the sub-source tables in a loop until all of the source table data has been traversed.
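The split-and-loop execution can be sketched as follows, using an in-memory SQLite database so that the example runs anywhere. The sketch splits the source table by contiguous id ranges instead of creating physical sub-source tables; that choice, the table names, the threshold and the chunk size are all assumptions made for illustration.

```python
import sqlite3


def build_in_chunks(conn, threshold, chunk_size):
    """Run the screening statement once, or once per id range when the
    source table is larger than `threshold`. Assumes ids are 1..N."""
    total = conn.execute("SELECT COUNT(*) FROM source_items").fetchone()[0]
    if total <= threshold:
        ranges = [(0, total)]
    else:
        ranges = [
            (start, min(start + chunk_size, total))
            for start in range(0, total, chunk_size)
        ]
    for start, end in ranges:
        # Stand-in for the generated target database statement.
        conn.execute(
            "INSERT INTO theme_dataset "
            "SELECT id, name FROM source_items "
            "WHERE id > ? AND id <= ? AND name LIKE '%hot pot%'",
            (start, end),
        )
    return len(ranges)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE theme_dataset (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO source_items VALUES (?, ?)",
    [(i, "hot pot base" if i % 2 == 0 else "instant noodles") for i in range(1, 11)],
)

chunks = build_in_chunks(conn, threshold=4, chunk_size=3)
built = conn.execute("SELECT COUNT(*) FROM theme_dataset").fetchone()[0]
```

With ten source rows, a threshold of four and a chunk size of three, the statement runs over four id ranges and the five matching rows end up in the theme data set.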
As shown in fig. 2, an embodiment of the present application further provides a device for constructing a theme data set, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform:
receiving a theme data set construction request from a user and obtaining a data screening requirement; parsing the data screening requirement to obtain parsed content for different field types; creating a parsed content configuration table based on the parsed content; generating a target database statement based on the parsed content configuration table and a preset database statement template; and constructing the theme data set by executing the target database statement.
The embodiments also provide a non-volatile computer storage medium storing computer-executable instructions configured to:
receive a theme data set construction request from a user and obtain a data screening requirement; parse the data screening requirement to obtain parsed content for different field types; create a parsed content configuration table based on the parsed content; generate a target database statement based on the parsed content configuration table and a preset database statement template; and construct the theme data set by executing the target database statement.
The embodiments in this application are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the device and medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, refer to the description of the method embodiments.
The devices and media provided in the embodiments of the present application are in one-to-one correspondence with the methods, so that the devices and media also have similar beneficial technical effects as the corresponding methods, and since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the devices and media are not described in detail herein.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A method for constructing a theme data set, comprising:
receiving a theme data set construction request from a user, and acquiring a data screening requirement;
parsing the data screening requirement to obtain parsed content of different field types;
creating a parsed content configuration table based on the parsed content;
generating a target database statement based on the parsed content configuration table and a preset database statement template; and
constructing the theme data set through the target database statement.
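The five steps of claim 1 can be illustrated with a minimal end-to-end sketch. It is not the applicant's implementation: the requirement format, the `TEMPLATE` string, the table and column names, and all function names are hypothetical, and an in-memory SQLite database stands in for whatever database the method targets.

```python
import sqlite3

# Hypothetical preset database statement template. The placeholder names are
# illustrative only. Values are spliced in as text, so this sketch assumes the
# configuration comes from a trusted source.
TEMPLATE = (
    'CREATE TABLE "{target}" AS '
    'SELECT * FROM "{source}" WHERE "{column}" LIKE \'%{keyword}%\''
)

def parse_requirement(requirement: dict) -> dict:
    """Step 2: turn a screening requirement into parsed content."""
    return {"dimension_field": requirement["filter_column"],
            "feature_word": requirement["keyword"]}

def build_config_row(batch_id: str, parsed: dict) -> dict:
    """Step 3: one row of the (hypothetical) parsed content configuration table."""
    return {"batch_id": batch_id, **parsed}

def generate_statement(config: dict, source: str, target: str) -> str:
    """Step 4: fill the preset template from the configuration row."""
    return TEMPLATE.format(target=target, source=source,
                           column=config["dimension_field"],
                           keyword=config["feature_word"])

def build_theme_dataset(conn, requirement: dict, source: str, target: str) -> str:
    """Steps 1 to 5: requirement in, theme table created, statement returned."""
    config = build_config_row("batch-001", parse_requirement(requirement))
    statement = generate_statement(config, source, target)
    conn.execute(statement)  # Step 5: construct the theme data set
    return statement

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enterprises (name TEXT, industry TEXT)")
conn.executemany("INSERT INTO enterprises VALUES (?, ?)",
                 [("A Co", "big data services"),
                  ("B Co", "catering"),
                  ("C Co", "data analytics")])

statement = build_theme_dataset(
    conn, {"filter_column": "industry", "keyword": "data"},
    source="enterprises", target="theme_data")
rows = conn.execute("SELECT name FROM theme_data ORDER BY name").fetchall()
print(rows)
```

In this sketch the user never writes SQL; only the requirement dictionary changes between requests, which is the point of driving statement generation from a configuration table.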
2. The method of claim 1, wherein the field types include a dimension field and a quantization field;
the dimension field describes attributes of the environment in which quantized data is located and provides context for the quantized data;
the quantization field represents a performance index of a business process.
3. The method of claim 2, wherein parsing the data screening requirement to obtain parsed content of different field types specifically comprises:
parsing the data screening requirement to determine a parsed field in the data screening requirement;
performing semantic analysis on the parsed field to determine a field type corresponding to the parsed field;
determining a preset category classification dictionary corresponding to the parsed field based on the field type; and
determining the parsed content corresponding to the parsed field based on the preset category classification dictionary.
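The classification flow of claims 2 and 3 can be sketched as follows. The claims do not say how semantic analysis is performed, so a plain keyword lookup stands in for it here; the vocabularies, dictionary contents, and all names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical vocabularies standing in for semantic analysis.
DIMENSION_HINTS = {"region", "industry", "year"}
QUANTIZATION_HINTS = {"revenue", "headcount", "profit"}

# Hypothetical preset category classification dictionaries, one per field type.
CATEGORY_DICTIONARIES = {
    "dimension": {"region": "region_code",
                  "industry": "industry_category",
                  "year": "report_year"},
    "quantization": {"revenue": "annual_revenue",
                     "headcount": "employee_count",
                     "profit": "net_profit"},
}

@dataclass(frozen=True)
class ParsedField:
    name: str        # field as it appears in the requirement
    field_type: str  # "dimension" or "quantization"
    content: str     # parsed content looked up in the dictionary

def classify_field(name: str) -> str:
    """Decide the field type (keyword lookup in place of semantic analysis)."""
    key = name.lower()
    if key in DIMENSION_HINTS:
        return "dimension"
    if key in QUANTIZATION_HINTS:
        return "quantization"
    raise ValueError(f"cannot classify field: {name!r}")

def parse_requirement(text: str) -> list[ParsedField]:
    """Split a comma-separated requirement and resolve each field."""
    parsed = []
    for raw in text.split(","):
        name = raw.strip()
        if not name:
            continue
        field_type = classify_field(name)
        dictionary = CATEGORY_DICTIONARIES[field_type]  # chosen by field type
        parsed.append(ParsedField(name, field_type, dictionary[name.lower()]))
    return parsed

fields = parse_requirement("region, revenue")
for field in fields:
    print(field)
```

The field type selects the dictionary and the dictionary supplies the parsed content, which mirrors the order of the last three steps of claim 3.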
4. The method according to claim 1, wherein creating the parsed content configuration table based on the parsed content specifically comprises:
determining the configuration table content corresponding to the data screening requirement based on the parsed content;
the configuration table content comprises a requirement batch identifier, a dimension screening field, a fuzzy matching characteristic word, a fuzzy matching interference word, a standardized field and a de-duplication sequence; and
creating the parsed content configuration table based on the configuration table content.
5. The method of claim 4, wherein after determining the configuration table content corresponding to the data screening requirement based on the parsed content, the method further comprises:
determining a field type for each item of the configuration table content, wherein the field types at least comprise a character type and a numerical type;
the field types corresponding to the requirement batch identifier, the dimension screening field, the fuzzy matching characteristic word, the fuzzy matching interference word and the standardized field are the character type;
the field type corresponding to the de-duplication sequence is the numerical type.
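The configuration table of claims 4 and 5 might look like the following in SQLite. This is a sketch under assumptions: the table name, column names, and sample row are hypothetical, and `TEXT` and `INTEGER` stand in for the character and numerical types.

```python
import sqlite3

# Hypothetical layout of the parsed content configuration table: five
# character-type columns and one numerical column for the de-duplication
# sequence, as listed in claims 4 and 5.
CREATE_CONFIG_TABLE = """
CREATE TABLE parse_config (
    batch_id            TEXT NOT NULL,  -- requirement batch identifier
    dimension_field     TEXT,           -- dimension screening field
    feature_word        TEXT,           -- fuzzy matching characteristic word
    interference_word   TEXT,           -- fuzzy matching interference word
    standardized_field  TEXT,           -- standardized field
    dedup_order         INTEGER         -- de-duplication sequence
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(CREATE_CONFIG_TABLE)
conn.execute(
    "INSERT INTO parse_config VALUES (?, ?, ?, ?, ?, ?)",
    ("batch-001", "industry", "data", "database", "industry_category", 1),
)

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(parse_config)")}
print(columns)
```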
6. The method according to claim 1, wherein generating the target database statement based on the parsed content configuration table and the preset database statement template specifically comprises:
determining a preset function template based on the parsed content configuration table and the data screening requirement, wherein the preset function template comprises a plurality of preset database statement templates;
determining blank fields in the plurality of preset database statement templates in the preset function template; and
replacing the blank fields based on the configuration table content corresponding to the parsed content configuration table to generate the target database statement.
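The template-filling step of claim 6 can be sketched with Python's `string.Template`. The claim does not specify a placeholder syntax, so `${name}` is an assumption, as are the function template name `fuzzy_filter`, the two statement templates, and the configuration keys.

```python
import re
from string import Template

# Hypothetical preset function template: a named group of statement templates
# whose blank fields are written as ${name}.
FUNCTION_TEMPLATES = {
    "fuzzy_filter": [
        "CREATE TABLE ${target} AS SELECT * FROM ${source} "
        "WHERE ${dimension_field} LIKE '%${feature_word}%'",
        "DELETE FROM ${target} "
        "WHERE ${dimension_field} LIKE '%${interference_word}%'",
    ],
}

BLANK = re.compile(r"\$\{(\w+)\}")

def blank_fields(statements: list[str]) -> list[str]:
    """Collect the distinct blank field names across all statement templates."""
    return sorted({name for s in statements for name in BLANK.findall(s)})

def generate_statements(function_name: str, config: dict) -> list[str]:
    """Replace every blank field with the matching configuration value."""
    statements = FUNCTION_TEMPLATES[function_name]
    missing = [name for name in blank_fields(statements) if name not in config]
    if missing:
        raise KeyError(f"configuration lacks values for: {missing}")
    return [Template(s).substitute(config) for s in statements]

config = {
    "target": "theme_data",
    "source": "enterprises",
    "dimension_field": "industry",
    "feature_word": "data",
    "interference_word": "database",
}
statements = generate_statements("fuzzy_filter", config)
for statement in statements:
    print(statement)
```

Checking for missing values before substitution gives a single error that names every unfilled blank, rather than failing on the first one. Because values are spliced in as text, a sketch like this is only safe when the configuration table is trusted.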
7. The method according to claim 1, wherein constructing the theme data set through the target database statement specifically comprises:
determining the data quantity of a source table, and splitting the source table to obtain a plurality of sub-source tables if the data quantity of the source table is higher than a preset data quantity threshold; and
cyclically executing the target database statement on the plurality of sub-source tables until all data in the source table has been traversed.
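The split-and-loop execution of claim 7 can be sketched against SQLite. The threshold, the chunk size, the `_partN` naming of sub-source tables, and the `{source}` placeholder in the target statement are all assumptions made for illustration; the claim does not say how the source table is split.

```python
import sqlite3

ROW_THRESHOLD = 4  # hypothetical preset data quantity threshold
CHUNK_SIZE = 4     # hypothetical number of rows per sub-source table

def split_source(conn, source: str) -> list[str]:
    """Return the tables to process: the source itself, or its sub-source tables."""
    total = conn.execute(f'SELECT COUNT(*) FROM "{source}"').fetchone()[0]
    if total <= ROW_THRESHOLD:
        return [source]
    sub_tables = []
    for index, offset in enumerate(range(0, total, CHUNK_SIZE)):
        sub = f"{source}_part{index}"
        # Copy one window of rows into its own sub-source table.
        conn.execute(
            f'CREATE TABLE "{sub}" AS SELECT * FROM "{source}" '
            f"ORDER BY rowid LIMIT {CHUNK_SIZE} OFFSET {offset}"
        )
        sub_tables.append(sub)
    return sub_tables

def run_on_sub_tables(conn, statement_template: str, sub_tables: list[str]) -> None:
    """Execute the target statement once per sub-source table."""
    for sub in sub_tables:
        conn.execute(statement_template.format(source=sub))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enterprises (name TEXT, industry TEXT)")
conn.executemany(
    "INSERT INTO enterprises VALUES (?, ?)",
    [(f"Co{i}", "data" if i % 2 == 0 else "catering") for i in range(10)],
)
conn.execute("CREATE TABLE theme_data (name TEXT, industry TEXT)")

TARGET_STATEMENT = (
    'INSERT INTO theme_data SELECT * FROM "{source}" WHERE industry = \'data\''
)
sub_tables = split_source(conn, "enterprises")
run_on_sub_tables(conn, TARGET_STATEMENT, sub_tables)
matched = conn.execute("SELECT COUNT(*) FROM theme_data").fetchone()[0]
print(sub_tables, matched)
```

Each pass touches at most `CHUNK_SIZE` rows, so the work per statement stays bounded no matter how large the source table grows.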
8. The method according to claim 4, wherein the method further comprises:
determining target original data that satisfies a plurality of matching rules in the data screening requirement; and
de-duplicating the target original data in the matching result based on the de-duplication sequence, so as to ensure the uniqueness of the target original data.
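One way to read claim 8 is that a record matched by several rules is kept only once, under the rule that comes first in the de-duplication sequence. The sketch below assumes that reading, with a lower sequence number winning; the tuple layout and the rule names are hypothetical.

```python
def deduplicate(matches: list[tuple[str, int, str]]) -> dict[str, str]:
    """Keep one rule per record: the one with the lowest de-duplication sequence.

    Each match is (record_id, dedup_order, rule_name). Treating a lower
    dedup_order as higher priority is an assumption of this sketch.
    """
    best: dict[str, tuple[int, str]] = {}
    for record_id, dedup_order, rule_name in matches:
        current = best.get(record_id)
        if current is None or dedup_order < current[0]:
            best[record_id] = (dedup_order, rule_name)
    return {record_id: rule for record_id, (_, rule) in best.items()}

matches = [
    ("rec-1", 2, "industry_rule"),
    ("rec-1", 1, "name_rule"),      # same record matched by a second rule
    ("rec-2", 3, "address_rule"),
]
unique = deduplicate(matches)
print(unique)
```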
9. A device for constructing a theme data set, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform:
receiving a theme data set construction request from a user, and acquiring a data screening requirement;
parsing the data screening requirement to obtain parsed content of different field types;
creating a parsed content configuration table based on the parsed content;
generating a target database statement based on the parsed content configuration table and a preset database statement template; and
constructing the theme data set through the target database statement.
10. A non-transitory computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to:
receive a theme data set construction request from a user, and acquire a data screening requirement;
parse the data screening requirement to obtain parsed content of different field types;
create a parsed content configuration table based on the parsed content;
generate a target database statement based on the parsed content configuration table and a preset database statement template; and
construct the theme data set through the target database statement.
CN202311560826.9A 2023-11-21 2023-11-21 Method, equipment and medium for constructing theme data set Pending CN117539981A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311560826.9A CN117539981A (en) 2023-11-21 2023-11-21 Method, equipment and medium for constructing theme data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311560826.9A CN117539981A (en) 2023-11-21 2023-11-21 Method, equipment and medium for constructing theme data set

Publications (1)

Publication Number Publication Date
CN117539981A (en) 2024-02-09

Family

ID=89791417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311560826.9A Pending CN117539981A (en) 2023-11-21 2023-11-21 Method, equipment and medium for constructing theme data set

Country Status (1)

Country Link
CN (1) CN117539981A (en)

Similar Documents

Publication Publication Date Title
CN107038207B (en) Data query method, data processing method and device
CN109063178B (en) Method and device for automatically expanding self-help analysis report
CN106951231B (en) Computer software development method and device
CN110019111B (en) Data processing method, data processing device, storage medium and processor
CN117150050B (en) Knowledge graph construction method and system based on large language model
CN114004694A (en) Public opinion information-based enterprise risk early warning method, equipment and medium
CN115881106A (en) Intelligent form filling method, equipment and medium based on multi-scene semantic analysis
CN110688383A (en) Data acquisition method and system
CN114297204A (en) Data storage and retrieval method and device for heterogeneous data source
CN110019169A (en) A kind of method and device of data processing
CN110673828A (en) Product configuration method
CN110019357B (en) Database query script generation method and device
CN117539981A (en) Method, equipment and medium for constructing theme data set
CN113239039B (en) Dynamic data storage method, query method, management method and management system
CN111078905A (en) Data processing method, device, medium and equipment
CN114138787A (en) Bar code identification method, equipment and medium
CN110941952A (en) Method and device for perfecting audit analysis model
CN110968555A (en) Dimension data processing method and device
CN113177059A (en) Method and device for generating matching platform SQL script
CN106951427B (en) Data extraction method and device for business object
CN115455035B (en) Data query model construction method and computer-readable storage medium
CN116701402A (en) Log recording method, system, equipment and medium based on cyclic archive table
CN117851691A (en) Configuration method, equipment and medium for adding document auxiliary attribute
CN110968634B (en) Method for realizing ETL conversion processing by utilizing programmable function based on XML description in big data scene
CN116383468A (en) Design method, equipment and medium of big data intelligent retrieval platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination