CN117033347A - Method, system, device and medium for data warehouse modeling based on patent data


Publication number: CN117033347A
Authority: CN (China)
Prior art keywords: data; patent data; source table; data source; topic
Legal status: Pending
Application number: CN202311005350.2A
Other languages: Chinese (zh)
Inventors: 卢春辉, 何娅娅, 臧智涛, 张敏, 李建雨
Current Assignee: Qizhi Technology Co ltd
Original Assignee: Qizhi Technology Co ltd
Application filed by Qizhi Technology Co ltd
Priority to CN202311005350.2A
Publication of CN117033347A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G06F16/284 Relational databases
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/11 Patent retrieval

Abstract

A method, system, device and medium for data warehouse modeling based on patent data, relating to the field of data warehouse modeling. The method comprises the following steps: acquiring business requirement information; acquiring patent data source tables, of which there are a plurality; acquiring granularity declaration information; processing the patent data source tables, converting the multi-value fields they contain into single-value fields to obtain a plurality of patent data detail tables corresponding to the patent data source tables; determining the data logic associations among the patent data detail tables according to the business requirement information; and constructing a data warehouse model according to all the patent data detail tables, the granularity declaration information and the data logic associations among the patent data detail tables. With the technical solution provided by the application, each patent data source table is converted into patent data detail tables containing only single-dimension data, which keeps the definitions of the patent data consistent and allows the data warehouse modeling of the patent data to be completed properly.

Description

Method, system, device and medium for data warehouse modeling based on patent data
Technical Field
The application relates to the field of data warehouse modeling, and in particular to a method, system, device and medium for data warehouse modeling based on patent data.
Background
With the rapid development of internet technology and the continuous expansion of artificial intelligence application scenarios, the number of patent resources is growing exponentially. How to fully mine and use this patent data is an important problem facing researchers today.
Conventional relational database storage structures can no longer meet the needs of the big data era, and solutions based on data warehouses have therefore been proposed. For patent data, however, the fields in a patent data source table may belong to different secondary data domains, all of which may be used in later analysis and which differ in granularity. Consequently, if an existing data warehouse modeling method is used to model a patent data warehouse, the resulting warehouse suffers from unclear data relationships, confused data lineage, inconsistent data definitions and similar problems; existing methods cannot model a patent data warehouse well.
Disclosure of Invention
In order to better model a patent data warehouse, the application provides a method, system, device and medium for data warehouse modeling based on patent data.
In a first aspect, the present application provides a data warehouse modeling method based on patent data, the method comprising the following steps:
acquiring business requirement information;
acquiring patent data source tables, wherein there are a plurality of patent data source tables;
acquiring granularity declaration information;
processing the patent data source tables, and converting the multi-value fields contained in the patent data source tables into single-value fields to obtain a plurality of patent data detail tables corresponding to the patent data source tables;
determining the data logic associations among the patent data detail tables according to the business requirement information;
and constructing a data warehouse model according to all the patent data detail tables, the acquired granularity declaration information and the data logic associations among the patent data detail tables.
With this technical solution, because the patent data source tables contain a large number of multi-value fields, each patent data source table is converted into patent data detail tables containing only single-dimension data before data warehouse modeling is performed. The patent data is thereby divided into secondary data domains, the definitions of the patent data are kept consistent, and the data warehouse modeling of the patent data is completed properly. In addition, combining the data logic associations among the patent data detail tables keeps the lineage of the patent data in the data warehouse model clear.
Optionally, processing the patent data source tables specifically includes:
identifying the multi-value fields in the patent data source tables;
splitting each patent data source table that contains a multi-value field to obtain a plurality of first patent data detail tables corresponding to the multi-value field;
and directly converting each patent data source table that does not contain a multi-value field into a second patent data detail table, thereby completing the processing of all the patent data source tables.
With this technical solution: if a patent data source table containing multi-value fields were used as is, the resulting data warehouse model would suffer from data redundancy, reduced query performance, poor data integrity, complex table design and similar problems. Splitting each patent data source table that contains multi-value fields into corresponding first patent data detail tables ensures that the final patent data detail tables contain only single-value fields, so the problems caused by multi-value fields are avoided in subsequent modeling.
Optionally, before processing the patent data source tables, the method further includes:
determining a plurality of data topic domains according to the business requirement information, and establishing business logic associations between the data topic domains;
and dividing each patent data source table into its corresponding data topic domain.
With this technical solution, the patent data is organized according to the division into data topic domains, which improves the quality and accuracy of the patent data, keeps the data in the data warehouse reliable and consistent, and improves the accuracy and reliability of data-driven decisions.
Optionally, the data topic domains include a bibliographic topic domain, a legal topic domain, a specification topic domain, a citation topic domain, and a review topic domain.
Optionally, dividing each patent data source table into its corresponding data topic domain specifically includes:
acquiring the source table features of each patent data source table, wherein the source table features comprise a first source table feature and a second source table feature;
acquiring the data topic features of each data topic domain;
calculating the attribution degree between each data topic domain and each patent data source table according to the data topic features and the source table features;
and dividing each patent data source table into the data topic domain with which it has the highest attribution degree.
With this technical solution, the source table features describe the patent data source table, and the data topic domain to which the patent data source table belongs is determined from those features, which ensures the accuracy of data topic domain allocation.
Optionally, acquiring the source table features of each patent data source table specifically includes:
acquiring the first source table feature according to the data fields contained in the patent data source table.
With this technical solution, the source table features describe the patent data source table, and extracting them from the data fields in the patent data source table ensures that they describe the patent data source table accurately.
Optionally, acquiring the source table features of each patent data source table further includes:
acquiring the data dictionary associated with the patent data source table;
and acquiring the second source table feature according to the data dictionary.
With this technical solution: the data dictionary is a document or database that comprehensively describes and defines the data in the data warehouse, including information such as data tables, fields, data types, data formats and data sources. Extracting source table features from the data dictionary associated with the patent data source table further ensures that the source table features describe the patent data source table accurately.
In a second aspect of the application, there is provided a data warehouse modeling system based on patent data, the system comprising the following modules:
a business requirement information acquisition module, used for acquiring business requirement information;
a patent data source table acquisition module, used for acquiring patent data source tables, wherein there are a plurality of patent data source tables;
a data warehouse granularity declaration module, used for acquiring granularity declaration information;
a patent data source table processing module, used for processing the patent data source tables and converting the multi-value fields contained in the patent data source tables into single-value fields to obtain a plurality of patent data detail tables corresponding to the patent data source tables;
a data logic association determination module, used for determining the data logic associations among the patent data detail tables according to the business requirement information;
and a data warehouse model construction module, used for constructing a data warehouse model according to all the patent data detail tables, the granularity declaration information and the data logic associations among the patent data detail tables.
In a third aspect of the application, an electronic device is provided.
In a fourth aspect of the application, a computer readable storage medium is provided.
In summary, one or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
1. When data warehouse modeling is performed on the patent data source tables, each patent data source table is converted into patent data detail tables containing only single-dimension data, so that the patent data is divided into secondary data domains, the definitions of the patent data are kept consistent, and the data warehouse modeling of the patent data is completed properly.
2. Combining the data logic associations among the patent data detail tables keeps the lineage of the patent data in the data warehouse model clear.
3. Source table features are extracted from the data fields in each patent data source table, data topic features are extracted for each data topic domain, and the attribution degree between each data topic domain and each patent data source table is calculated from the data topic features and the source table features to determine which data topic domain the patent data source table belongs to. The division into data topic domains is therefore quick and accurate.
Drawings
Fig. 1 is a schematic flow chart of a data warehouse modeling method based on patent data according to an embodiment of the present application.
Fig. 2 is a schematic structural diagram of a data warehouse modeling system based on patent data according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals: 201. business requirement information acquisition module; 202. patent data source table acquisition module; 203. data warehouse granularity declaration module; 204. patent data source table processing module; 205. data logic association determination module; 206. data warehouse model construction module; 300. electronic device; 301. processor; 302. communication bus; 303. user interface; 304. network interface; 305. memory.
Detailed Description
In order that those skilled in the art will better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments.
In describing embodiments of the present application, words such as "such as" or "for example" are used to mean serving as an example, illustration, or description. Any embodiment or design described as "such as" or "for example" in embodiments of the application should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present related concepts in a concrete fashion.
In the description of embodiments of the application, the term "plurality" means two or more. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Before describing embodiments of the present application, some terms involved in the embodiments of the present application will be first defined and described.
Data Warehouse (DW): a data warehouse is a system for storing and managing large amounts of enterprise data drawn from many different operational systems and data sources. It is a subject-oriented, integrated, non-volatile, time-variant collection of data used to support enterprise decision making and analysis. A data warehouse is designed and built to meet an enterprise's need for efficient querying and analysis and to support decision making. Data warehouses typically include extract, transform, and load (ETL) processes, as well as data storage and query tools. The data warehouse itself does not "produce" any data; its data originates from different external systems. Nor does it itself "consume" any data; its results are made available to various external applications.
Data warehouse model: a data warehouse model is the organization of the data in a data warehouse into a particular structure to support data analysis and querying. Accordingly, data warehouse modeling means planning the structure and organization of the data in the data warehouse.
Referring to fig. 1, the application provides a data warehouse modeling method based on patent data, which specifically comprises the following steps:
S1: acquiring business requirement information;
Specifically, data warehouse modeling is closely tied to the business, so modeling must be rooted in the business. The business processes to be modeled are then selected from the overall business flow, based on the requirements raised by operations, later extensibility and so on. The business requirement information is therefore determined by the purpose for which the data warehouse is built: it describes, on the one hand, the business processes the data warehouse targets and, on the other hand, the operational requirements of the data warehouse. It is generally determined by those skilled in the art based on the business processes the data warehouse targets.
In the embodiment provided by the application, the data warehouse model serves a patent data query system. Technicians analyze the patent data query system to establish a business requirement model of the data warehouse, and then convert the business requirement model into structured data to obtain the business requirement information.
Specifically, the business requirement model is determined from the functional planning of the patent data query system by the relevant technicians: each business domain involved in the patent data query system is determined, and logical associations between the business domains are established according to the actual business scenarios, which completes the business requirement model.
In one possible embodiment of the present application, the patent data query system provides the user, for each patent, with query services in five sections: "bibliographic," "legal," "specification," "citation," and "review." When the business requirement model is established, its business domains may therefore be set to "bibliographic," "legal," "specification," "citation," and "review." After the business domains are determined, logical associations between them are established according to the actual associations between the corresponding business scenarios. For example, the "specification" and "citation" business domains are associated in practice, because the specifications of some patents refer to other related specifications or reference materials; a logical association can therefore be established between these two business domains.
After the business requirement model is built, it is described as structured data to obtain business requirement information that can be recognized by the execution subject of the data warehouse modeling method based on patent data provided by the embodiment of the application. The business requirement information describes each business domain and the logical associations between the business domains, and can be stored in SOP, XML, JSON or CSV format.
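As a rough illustration of what such structured business requirement information might look like, the following sketch serializes the five business domains and one logical association as JSON. The key names (`business_domains`, `associations`) and the helper `build_requirement_info` are hypothetical; the application does not prescribe a schema.

```python
import json


def build_requirement_info(domains, associations):
    """Describe business domains and the logical associations between them.

    Hypothetical helper: the layout of the returned dict is an assumption.
    """
    known = set(domains)
    for left, right in associations:
        # An association may only link business domains that were declared.
        if left not in known or right not in known:
            raise ValueError(f"unknown business domain in {(left, right)}")
    return {
        "business_domains": list(domains),
        "associations": [{"from": l, "to": r} for l, r in associations],
    }


info = build_requirement_info(
    ["bibliographic", "legal", "specification", "citation", "review"],
    [("specification", "citation")],
)
# Serialize so that a later modeling step can read the information back.
requirement_json = json.dumps(info, indent=2)
```

Validating the associations against the declared domains at this stage keeps later steps from silently referring to a business domain that was never defined.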
S2: acquiring patent data source tables;
Specifically, the patent data source tables may be obtained from the business systems involved in the patent business process, or from an external data provider. A patent data source table is typically embodied in the form of a relational database. It is an important component of the data warehouse and records information such as the data tables and data fields in the different business systems.
A patent data source table generally comprises a data table name, a data table description, data field names, data field types, data field descriptions and an inter-table relationship description. The data table name records the name of the data table in its business system; the data table description records brief descriptive information about the data table, including its business domain, role and function; the data field names record the names of the data fields in the data table; the data field types record the data type of each data field, such as integer, character or date; the data field descriptions record brief descriptive information about each data field, including its meaning and data source; and the inter-table relationship description records the relationships between data tables, such as one-to-many or many-to-many.
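The metadata items listed above can be pictured as a small record type. The sketch below is only an illustration under assumed names: `FieldMeta`, `SourceTableMeta`, the table `patent_basic` and its fields are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class FieldMeta:
    """One data field: name, type and brief description."""
    name: str
    data_type: str
    description: str = ""


@dataclass
class SourceTableMeta:
    """Metadata recorded for one patent data source table (assumed layout)."""
    table_name: str
    description: str
    fields: list = field(default_factory=list)
    # Maps a related table name to the relationship kind, e.g. "one-to-many".
    relationships: dict = field(default_factory=dict)


patent_basic = SourceTableMeta(
    table_name="patent_basic",
    description="Bibliographic data of each patent",
    fields=[
        FieldMeta("publication_number", "text", "Publication number"),
        FieldMeta("inventor", "text", "Inventor names, possibly several"),
        FieldMeta("filing_date", "date", "Filing date"),
    ],
    relationships={"patent_legal_event": "one-to-many"},
)
field_names = [f.name for f in patent_basic.fields]
```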
Each patent data source table is also associated with a data dictionary, which is a document or database that comprehensively describes and defines the data in the patent data warehouse. It includes information such as data tables, fields, data types, data formats and data sources, and describes the patent data source table as a whole.
S3: acquiring granularity declaration information;
Specifically, defining the granularity is a critical step in data warehouse modeling, since it directly affects the query performance of the data warehouse and the results of data analysis. The definition of granularity must take into account the business requirements and the availability of data, as well as the efficiency of data queries and the storage capacity of the data warehouse.
In one possible embodiment of the present application, analysis of the data warehouse model determines that the data granularity of the data warehouse model based on patent data is the "publication number". Granularity declaration information is then generated from the determined data granularity, and the granularity of the data warehouse model is declared on the basis of that information.
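One practical consequence of declaring the publication number as the granularity is that a table at that granularity should hold at most one row per publication number. The following minimal sketch checks this property. It is an illustration only: the function name `grain_violations`, the column name `publication_number` and the publication numbers shown are made up.

```python
from collections import Counter


def grain_violations(rows, grain_key="publication_number"):
    """Return the grain values that appear in more than one row, sorted."""
    counts = Counter(row[grain_key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


rows = [
    {"publication_number": "CN000000001A", "title": "A device"},
    {"publication_number": "CN000000002A", "title": "A method"},
    {"publication_number": "CN000000001A", "title": "A device"},  # duplicate
]
violations = grain_violations(rows)
```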
Granularity declaration information for declaring data warehouse model granularity is typically expressed in the form of documents, including granularity level declaration information, granularity metric declaration information, granularity object declaration information, granularity dimension declaration information, and granularity particle declaration information.
S4: processing the patent data source tables to obtain a plurality of patent data detail tables;
Specifically, the patent data source tables contain many multi-value fields. A multi-value field is a data field that holds several values at once. For example, in the inventor data of patent data, the inventor field of one patent may hold several names, such as "inventor: Zhang San; Li Si; Wang Wu". If a patent data source table containing multi-value fields were converted directly into a detail table for data warehouse modeling, problems such as data redundancy, reduced query performance, poor data integrity and complex table design could result.
The multi-value fields in all the patent data source tables are identified. Each patent data source table containing a multi-value field is split to obtain a plurality of first patent data detail tables corresponding to the multi-value field, and each patent data source table not containing a multi-value field is converted directly into a second patent data detail table.
Specifically, multi-value fields in a patent data source table may be identified with an SQL query tool. Since a patent data source table is typically embodied in the form of a relational database, its data can be queried and analyzed in SQL, for example using string functions or regular expressions, to find the multi-value fields. In another possible embodiment of the present application, the data fields contained in the patent data source table may be analyzed as text with a natural language processing tool, such as NLTK or spaCy, to identify the multi-value fields.
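The regular-expression check described above can be sketched in Python rather than SQL. This is a minimal sketch under stated assumptions: the set of separators (`;`, the full-width `；`, and `|`), the function name `find_multi_value_columns` and the sample rows are all hypothetical, and real data would likely need a more careful rule (for example, titles that legitimately contain a semicolon).

```python
import re

# Assumed separators; the application does not specify which ones are used.
DELIMITERS = re.compile(r"[;；|]")


def find_multi_value_columns(rows):
    """Return the names of columns in which any value contains a separator.

    `rows` is a list of dicts that all share the same keys.
    """
    if not rows:
        return []
    columns = []
    for col in rows[0].keys():
        if any(isinstance(r[col], str) and DELIMITERS.search(r[col]) for r in rows):
            columns.append(col)
    return columns


rows = [
    {"publication_number": "CN000000001A",
     "inventor": "Zhang San; Li Si; Wang Wu",
     "title": "A device"},
    {"publication_number": "CN000000002A",
     "inventor": "Zhao Liu",
     "title": "A method"},
]
multi_value_columns = find_multi_value_columns(rows)
```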
A patent data source table containing multi-value fields is split by the following steps:
Step 1: identifying the multi-value fields;
First, the columns that hold multi-value fields are identified in the patent data source table. Typically, the data in these columns consists of several numerical values or texts separated by delimiters.
Step 2: splitting the multi-value fields;
A column holding a multi-value field can be split into multiple columns, each containing only one value or text. To preserve the integrity of the data, an identifier is added to each split column to indicate which multi-value field the column belongs to.
Step 3: creating a first patent data detail table;
The split columns form a new first patent data detail table, in which each row represents one numerical value or text of the multi-value field, and which includes a column indicating the record to which that value belongs.
Step 4: using foreign key association;
A foreign key association may be established between the patent data source table containing the multi-value field and the first patent data detail table, so that the multi-value field can be handled accurately in data analysis.
Step 5: importing the data;
The split data is imported into the first patent data detail table, the corresponding columns in the patent data source table containing the multi-value field are updated, and the foreign key association is used to keep the data consistent.
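The splitting steps can be sketched with Python's built-in `sqlite3` module. The sketch follows the one-row-per-value layout of Step 3 and assumes a semicolon separator. The table names `patent_source` and `patent_inventor_detail`, the column names and the publication numbers are hypothetical and serve only as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the foreign key of Step 4

# Source table with a multi-value "inventor" column (made-up sample data).
conn.execute(
    "CREATE TABLE patent_source (publication_number TEXT PRIMARY KEY, inventor TEXT)"
)
# Step 3 and Step 4: detail table, one row per value, linked by a foreign key.
conn.execute(
    """CREATE TABLE patent_inventor_detail (
           publication_number TEXT NOT NULL
               REFERENCES patent_source(publication_number),
           inventor_seq INTEGER NOT NULL,
           inventor_name TEXT NOT NULL,
           PRIMARY KEY (publication_number, inventor_seq))"""
)
conn.executemany(
    "INSERT INTO patent_source VALUES (?, ?)",
    [("CN000000001A", "Zhang San; Li Si; Wang Wu"), ("CN000000002A", "Zhao Liu")],
)

# Step 2 and Step 5: split each multi-value field and import the pieces.
source_rows = conn.execute(
    "SELECT publication_number, inventor FROM patent_source"
).fetchall()
for pub_no, inventors in source_rows:
    names = [n.strip() for n in inventors.split(";") if n.strip()]
    conn.executemany(
        "INSERT INTO patent_inventor_detail VALUES (?, ?, ?)",
        [(pub_no, seq, name) for seq, name in enumerate(names, start=1)],
    )
conn.commit()

detail_rows = conn.execute(
    "SELECT publication_number, inventor_seq, inventor_name "
    "FROM patent_inventor_detail ORDER BY publication_number, inventor_seq"
).fetchall()
```

Keeping a sequence number next to each value preserves the original order of the inventors, which a plain set of rows would otherwise lose.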
In addition, to ensure that the logic and the blood edges of the data structures in the established patent data warehouse are clear, the data subject domain division process of the patent data source table is further included before the processing of the patent data source table is performed.
Specifically, prior to processing the patent data source tables, each patent data source table needs to be partitioned into corresponding data subject domains. The data topic field is determined according to the construction purpose of the data warehouse described by the business requirement information, and in one possible embodiment of the application, the data topic field is provided with 5 books, laws, specifications, references and review.
Extracting source table features of each patent data source table, extracting data topic features of each data topic domain, respectively calculating attribution degrees between the patent data source table and the data topic domain according to the source table features of each patent data source table and the data topic features of each data topic domain, and dividing the patent data source table into the data topic domain with the highest attribution degrees.
The source table features of the patent data source table are used for describing the patent data source table, and the source table features of the patent data source table comprise first source table features and second source table features, wherein the first source table features are extracted according to data fields contained in the patent data source table, and the second source table features are extracted according to a data dictionary associated with the patent data source table.
Similarly, the data topic features of the data topic domain are descriptions of the data topic domain, and in one possible embodiment of the application, feature vectorization is performed on topic domain names of the data topic domain, and the vectorized topic domain names are used as the data topic features of the data topic domain.
For a given patent data source table, the feature similarity between its source table features and the data topic features of each data topic domain is calculated. The similarity between the source table features and the data topic features is taken as the attribution degree between the corresponding patent data source table and the corresponding data topic domain, and the patent data source table is assigned to the data topic domain with the highest attribution degree.
The source table features and the data topic features are both text-type features. For text-type features, methods such as edit distance, the Jaccard coefficient and TF-IDF can be used to calculate the similarity between features; these methods are prior art and are not described further here.
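The attribution calculation described above can be sketched with the Jaccard coefficient, one of the similarity measures the text names. This is a minimal illustration rather than the applicant's implementation: the helper names (`tokens`, `jaccard`, `assign_topic_domain`), the keyword strings used as data topic features and the sample source table description are all assumptions. The embodiment itself vectorizes the topic domain names, which this sketch does not do.

```python
import re

def tokens(text: str) -> set[str]:
    """Lower-case alphanumeric tokens of a text-type feature."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard coefficient |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def assign_topic_domain(source_features: str, topic_features: dict[str, str]):
    """Return the topic domain with the highest attribution degree, plus all degrees."""
    src = tokens(source_features)
    degrees = {name: jaccard(src, tokens(desc)) for name, desc in topic_features.items()}
    best = max(degrees, key=degrees.get)
    return best, degrees

# Hypothetical data topic features: keyword strings, one per topic domain.
TOPIC_FEATURES = {
    "bibliographic": "title applicant inventor application number filing date",
    "legal": "legal status event date code",
    "description": "description text paragraph",
    "citation": "citing patent cited patent citation",
    "review": "review decision examiner",
}

# Hypothetical source table features: field names (first feature) followed by
# a data-dictionary description (second feature).
source = "legal_status event_date event_code: legal status event recorded for a patent"
best, degrees = assign_topic_domain(source, TOPIC_FEATURES)
print(best, round(degrees[best], 3))
```

Edit distance or TF-IDF cosine similarity could be substituted for `jaccard` without changing the selection step, since only the ranking of attribution degrees matters.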
S5: determining the data logical associations among the patent data detail tables according to the business requirement information;
Specifically, the data logical associations among the patent data detail tables are derived from the business requirement information and then established. The business requirement information describes the business process that the data warehouse targets, and the described business process includes the business associations among the business links. Because each patent data detail table corresponds to a business link, the data logical associations among the patent data detail tables can be determined once the business associations are known.
Specifically, the data logical associations among the patent data detail tables include primary key-foreign key relationships, one-to-many relationships, many-to-many relationships and the like, which are used to describe the business connections among the patent data detail tables.
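The three kinds of association named above can be illustrated with a small relational schema. The sketch below uses Python's standard `sqlite3` module; the table and column names (`patent`, `legal_event`, `inventor`, `patent_inventor`) are hypothetical and are not taken from the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE patent (
    patent_id TEXT PRIMARY KEY,
    title     TEXT NOT NULL
);
-- One-to-many: one patent has many legal events (primary key-foreign key link).
CREATE TABLE legal_event (
    event_id  INTEGER PRIMARY KEY,
    patent_id TEXT NOT NULL REFERENCES patent(patent_id),
    code      TEXT NOT NULL
);
CREATE TABLE inventor (
    inventor_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Many-to-many: resolved through a junction table.
CREATE TABLE patent_inventor (
    patent_id   TEXT    NOT NULL REFERENCES patent(patent_id),
    inventor_id INTEGER NOT NULL REFERENCES inventor(inventor_id),
    PRIMARY KEY (patent_id, inventor_id)
);
""")

conn.execute("INSERT INTO patent VALUES ('P1', 'example title')")
conn.execute("INSERT INTO legal_event (patent_id, code) VALUES ('P1', 'PB01')")
conn.execute("INSERT INTO legal_event (patent_id, code) VALUES ('P1', 'SE01')")

# A legal event pointing at a patent that does not exist violates the association.
try:
    conn.execute("INSERT INTO legal_event (patent_id, code) VALUES ('UNKNOWN', 'PB01')")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

event_count = conn.execute(
    "SELECT COUNT(*) FROM legal_event WHERE patent_id = 'P1'"
).fetchone()[0]
print(fk_enforced, event_count)
```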
S6: constructing a data warehouse model according to all patent data detail tables, the data warehouse granularity and the data logical associations among the patent data detail tables.
Specifically, once the patent data detail tables, the data warehouse granularity and the data logical associations among the patent data detail tables have been obtained, all basic elements required for data warehouse modeling are available. The data warehouse architecture of the data warehouse model is determined based on the business requirement information, and may be any one of a star schema, a snowflake schema or a constellation schema. The dimension tables, the fact tables and the relationships between them are determined according to the business requirement information and the data logical associations; the dimension tables may cover time, place, patent, applicant and the like, the fact tables may cover the number of patents, patent value, the number of patent applications and the like, and their specific contents are determined according to the business requirement information. Finally, the physical model of the data warehouse is determined according to the chosen data warehouse architecture and the relationships between the dimension tables and the fact tables, which completes the construction of the data warehouse model.
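As a rough illustration of the star-schema case, the sketch below derives two dimension tables (time and applicant) and one fact table (number of patent applications) from detail rows. The rows, the field names, the one-row-per-application granularity and the `build_dimension` helper are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical detail rows, at an assumed granularity of one row per patent application.
detail_rows = [
    {"application_no": "A1", "applicant": "Company X", "filing_date": "2023-08-09"},
    {"application_no": "A2", "applicant": "Company X", "filing_date": "2023-11-10"},
    {"application_no": "A3", "applicant": "Company Y", "filing_date": "2022-01-15"},
]

def build_dimension(values):
    """Map each distinct value to a surrogate key (1, 2, ...) in sorted order."""
    return {value: key for key, value in enumerate(sorted(set(values)), start=1)}

# Dimension tables: applicant and filing year.
dim_applicant = build_dimension(r["applicant"] for r in detail_rows)
dim_year = build_dimension(r["filing_date"][:4] for r in detail_rows)

# Fact table: number of applications per (year_key, applicant_key).
fact_application_count = Counter(
    (dim_year[r["filing_date"][:4]], dim_applicant[r["applicant"]])
    for r in detail_rows
)

for (year_key, applicant_key), count in sorted(fact_application_count.items()):
    print(year_key, applicant_key, count)
```

In a snowflake schema the applicant dimension would itself be normalized into further tables, and in a constellation schema several fact tables would share these dimensions.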
Referring to fig. 2, the application further provides a data warehouse modeling system based on patent data, which specifically comprises the following modules:
a service requirement information acquisition module 201, configured to acquire service requirement information;
a patent data source table obtaining module 202, configured to obtain a patent data source table, where the patent data source table includes a plurality of patent data sources;
a data warehouse granularity declaration module 203, configured to obtain granularity declaration information;
a patent data source table processing module 204, configured to process the patent data source table and convert the multi-valued fields contained in the patent data source table into single-valued fields, so as to obtain a plurality of patent data detail tables corresponding to the patent data source table;
a data logic association determining module 205, configured to determine a data logic association between the patent data detail tables according to the service requirement information;
a data warehouse model building module 206, configured to build a data warehouse model based on all patent data detail tables, the granularity declaration information and the data logical associations among the patent data detail tables.
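The conversion performed by the patent data source table processing module 204 can be sketched as follows. The sketch assumes that a multi-valued field is stored as a delimiter-separated string; the function name `split_multivalued`, the `;` separator and the sample rows are hypothetical.

```python
def split_multivalued(rows, key, multi_fields, sep=";"):
    """Split multi-valued fields out of a source table.

    Returns (main_table, details), where main_table keeps only the
    single-valued fields and details[field] is a detail table holding one
    row per (key, value) pair of that multi-valued field.
    """
    main = []
    details = {field: [] for field in multi_fields}
    for row in rows:
        main.append({k: v for k, v in row.items() if k not in multi_fields})
        for field in multi_fields:
            for value in (row.get(field) or "").split(sep):
                value = value.strip()
                if value:  # skip empty fragments such as a trailing separator
                    details[field].append({key: row[key], field: value})
    return main, details

# Hypothetical source table with two multi-valued fields.
source_table = [
    {"patent_id": "P1", "title": "T1", "inventors": "Alice; Bob", "ipc": "G06F16/21"},
    {"patent_id": "P2", "title": "T2", "inventors": "Carol", "ipc": "G06F16/20; G06F16/21"},
]

main, details = split_multivalued(source_table, "patent_id", ["inventors", "ipc"])
print(main)
print(details["inventors"])
print(details["ipc"])
```

A source table with no multi-valued fields passes through unchanged as `main`, which corresponds to the case where a source table is converted directly into a detail table.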
It should be noted that, when the device provided in the above embodiment implements its functions, the division into the above functional modules is only an example. In practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the device embodiments and the method embodiments provided above belong to the same concept; their specific implementation processes are detailed in the method embodiments and are not repeated here.
The application also discloses an electronic device 300. Referring to fig. 3, fig. 3 is a schematic structural diagram of the electronic device 300 according to an embodiment of the present disclosure. The electronic device 300 may include: at least one processor 301, at least one network interface 304, a user interface 303, a memory 305 and at least one communication bus 302.
Wherein the communication bus 302 is used to enable connected communication between these components.
The user interface 303 may include a display screen (Display) and a camera (Camera); optionally, the user interface 303 may further include a standard wired interface and a wireless interface.
The network interface 304 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
The processor 301 may include one or more processing cores. The processor 301 uses various interfaces and lines to connect the various parts of the server, and performs the various functions of the server and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 305 and by invoking the data stored in the memory 305. Optionally, the processor 301 may be implemented in hardware in at least one of the following forms: digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA) and programmable logic array (Programmable Logic Array, PLA). The processor 301 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem and the like. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is used for rendering and drawing the content to be displayed on the display screen; the modem is used to handle wireless communications. It will be appreciated that the modem may also not be integrated into the processor 301 and may instead be implemented by a separate chip.
The memory 305 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 305 includes a non-transitory computer-readable storage medium. The memory 305 may be used to store instructions, programs, code, code sets or instruction sets. The memory 305 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function and the like), instructions for implementing the above method embodiments, and the like; the data storage area may store the data involved in the above method embodiments. Optionally, the memory 305 may also be at least one storage device located remotely from the aforementioned processor 301. Referring to fig. 3, the memory 305, as a computer storage medium, may include an operating system, a network communication module, a user interface module and an application program of the data warehouse modeling method based on patent data.
In the electronic device 300 shown in fig. 3, the user interface 303 is mainly used to provide an input interface for the user and to acquire the data input by the user. The processor 301 may be configured to invoke the application program, stored in the memory 305, of the data warehouse modeling method based on patent data; when the application program is executed by the one or more processors 301, the electronic device 300 performs the method described in one or more of the above embodiments. It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some service interface, device or unit, and may be electrical or take other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable memory 305. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or in whole or in part, may be embodied in the form of a software product that is stored in a memory 305 and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods of the embodiments of the present application. The aforementioned memory 305 includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk or an optical disk.
The foregoing is merely exemplary embodiments of the present disclosure and is not intended to limit the scope of the present disclosure. That is, equivalent changes and modifications are contemplated by the teachings of this disclosure, which fall within the scope of the present disclosure. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure.
This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a scope and spirit of the disclosure being indicated by the claims.

Claims (10)

1. A data warehouse modeling method based on patent data, the method comprising the steps of:
acquiring service demand information;
obtaining a patent data source table;
acquiring granularity declaration information;
processing the patent data source table, and converting the multi-valued fields contained in the patent data source table into single-valued fields to obtain a plurality of patent data detail tables corresponding to the patent data source table;
determining data logical associations among the patent data detail tables according to the service demand information;
and constructing a data warehouse model according to all the patent data detail tables, the granularity declaration information and the data logical associations among the patent data detail tables.
2. The data warehouse modeling method based on patent data according to claim 1, wherein processing the patent data source table specifically comprises:
identifying the multi-valued fields in the patent data source table;
splitting the patent data source table containing the multi-valued fields to obtain a plurality of first patent data detail tables corresponding to the multi-valued fields;
and directly converting the patent data source table which does not contain a multi-valued field into a second patent data detail table, thereby completing the processing of all the patent data source tables.
3. The data warehouse modeling method based on patent data according to claim 1, further comprising, prior to processing the patent data source table:
determining a plurality of data topic domains according to the service demand information, and establishing service logic associations between the data topic domains;
and dividing each patent data source table into the corresponding data topic domain.
4. The data warehouse modeling method based on patent data according to claim 3, wherein:
the data topic domains comprise a bibliographic topic domain, a legal topic domain, a description topic domain, a citation topic domain and a review topic domain.
5. The data warehouse modeling method based on patent data according to claim 3, wherein dividing each patent data source table into the corresponding data topic domain specifically comprises:
respectively obtaining source table features of each patent data source table, wherein the source table features comprise a first source table feature and a second source table feature;
acquiring data topic features of each data topic domain;
respectively calculating the attribution degree between each data topic domain and each patent data source table according to the data topic features and the source table features;
and dividing each patent data source table into the data topic domain having the highest attribution degree with that patent data source table.
6. The data warehouse modeling method based on patent data according to claim 3, wherein obtaining the source table features of each patent data source table comprises:
acquiring the first source table feature according to the data fields contained in the patent data source table.
7. The data warehouse modeling method based on patent data according to claim 3, wherein obtaining the source table features of each patent data source table further comprises:
acquiring a data dictionary associated with the patent data source table;
and acquiring the second source table feature according to the data dictionary.
8. A data warehouse modeling system based on patent data, the system comprising:
a service demand information acquisition module (201) for acquiring service demand information;
a patent data source table acquisition module (202) for acquiring a patent data source table, wherein the patent data source table comprises a plurality of patent data sources;
a data warehouse granularity declaration module (203) for acquiring granularity declaration information;
a patent data source table processing module (204) for processing the patent data source table and converting the multi-valued fields contained in the patent data source table into single-valued fields to obtain a plurality of patent data detail tables corresponding to the patent data source table;
a data logic association determining module (205) for determining data logical associations among the patent data detail tables according to the service demand information;
a data warehouse model construction module (206) for constructing a data warehouse model based on all of the patent data detail tables, the granularity declaration information and the data logical associations among the patent data detail tables.
9. An electronic device comprising a processor (301), a memory (305), a user interface (303) and a network interface (304), the memory (305) being adapted to store instructions, the user interface (303) and the network interface (304) being adapted to communicate with other devices, the processor (301) being adapted to execute the instructions stored in the memory (305) to cause the electronic device (300) to perform the method according to any of claims 1-7.
10. A computer readable storage medium storing instructions which, when executed, perform the method steps of any of claims 1-7.
CN202311005350.2A 2023-08-09 2023-08-09 Method, system, equipment and medium for modeling number bins based on patent data Pending CN117033347A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311005350.2A CN117033347A (en) 2023-08-09 2023-08-09 Method, system, equipment and medium for modeling number bins based on patent data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311005350.2A CN117033347A (en) 2023-08-09 2023-08-09 Method, system, equipment and medium for modeling number bins based on patent data

Publications (1)

Publication Number Publication Date
CN117033347A true CN117033347A (en) 2023-11-10

Family

ID=88640775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311005350.2A Pending CN117033347A (en) 2023-08-09 2023-08-09 Method, system, equipment and medium for modeling number bins based on patent data

Country Status (1)

Country Link
CN (1) CN117033347A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination