CN116186097A - Method, device, equipment and storage medium for searching data asset - Google Patents
Method, device, equipment and storage medium for searching data asset
- Publication number
- CN116186097A, CN202310140331.4A
- Authority
- CN
- China
- Prior art keywords
- data
- data asset
- asset
- assets
- data assets
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiment of the application discloses a method, a device, equipment and a storage medium for searching data assets, wherein the method for searching the data assets comprises the following steps: determining and sorting data assets in the data asset set based on the position of the keywords in each data asset and the priority of the positions of the keywords in the data assets, and obtaining a first data asset set with a first sorting, wherein the first data asset set contains the keywords; ranking the data assets of the first data asset set having the same keyword location in the data assets based on the value of each first data asset in the first data asset set to obtain a second data asset set having a second ranking, wherein the value of the first data asset is used to characterize at least one of: frequency of use of a data asset, click heat, dependency of other data assets on the data asset, topic relevance.
Description
Technical Field
The present application relates to, but not limited to, the field of computer technology, and in particular, to a method, apparatus, device, and storage medium for searching for data assets.
Background
A data asset management platform is built on big data processing and storage, takes data asset management as its core, and uses data insight to express value across the full life cycle, so that enterprise data changes from being hard to understand and hard to manage to being controllable and operable. Once a data asset platform is built, data can be processed, stored, authorized, and searched in a unified way and managed over its full life cycle. However, how to implement intelligent search and intelligent recommendation of data assets so as to meet people's needs more accurately remains a research topic.
Disclosure of Invention
In view of this, embodiments of the present application provide at least a method, an apparatus, a device, and a storage medium for searching data assets.
The technical scheme of the embodiment of the application is realized as follows:
In a first aspect, an embodiment of the present application provides a method for searching a data asset, the method including: determining the position of a keyword carried in a search request in each data asset of a data asset set; determining a priority of the position of the keyword in the data asset; screening and sorting the data assets in the data asset set based on the position of the keyword in each data asset and the priority of that position, so as to obtain a first data asset set that contains the keyword and has a first sorting; and sorting, based on the value of each first data asset in the first data asset set, the data assets in the first data asset set whose keyword positions are the same, so as to obtain a second data asset set with a second ranking, and outputting the second data asset set with the second ranking, wherein the value of the first data asset is used for representing at least one of the following: frequency of use of the data asset, click heat, dependency of other data assets on the data asset, and topic relevance.
In a second aspect, embodiments of the present application provide a search apparatus for data assets, the apparatus including: a first determining module, configured to determine a location of a keyword carried in the search request in each data asset of the data asset set; a second determining module for determining a priority of the location of the keyword in the data asset; the screening and sorting module is used for screening and sorting the data assets in the data asset set based on the position of the keyword in each data asset and the priority of the position of the keyword in the data asset, so as to obtain a first data asset set with a first sorting and containing the keyword; a first sorting module, configured to sort data assets with the same keyword position in the data assets in the first data asset set based on the value of each first data asset in the first data asset set, to obtain a second data asset set with a second sort, and output the second data asset set with the second sort, where the value of the first data asset is used to characterize at least one of the following: frequency of use of a data asset, click heat, dependency of other data assets on the data asset, topic relevance.
In a third aspect, embodiments of the present application provide a computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing some or all of the steps of the above method when the program is executed.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs some or all of the steps of the above-described method.
In the embodiments of the present application, the data assets are first ranked according to the priority of the position of the keyword in each data asset and then ranked according to the value of the data assets, so that a second data asset set with a second ranking is obtained. Because the value of a data asset takes into account at least one of its frequency of use, its click heat, the dependency of other data assets on it, and its topic relevance, the resulting second data asset set reflects factors such as the popularity of the data asset, the usage habits of the object, the importance of the data asset, and topic relevance. The requirements of different objects can therefore be met more accurately, personalized search is realized, and the object is helped to find the required data asset more quickly.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the aspects of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and, together with the description, serve to explain the technical aspects of the application.
FIG. 1 is a schematic implementation flow chart of a method for searching a data asset according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of another method for searching data assets according to an embodiment of the present application;
FIG. 3 is a schematic implementation flow chart for determining a value of a first data asset according to an embodiment of the present application;
FIG. 4 is a schematic implementation flow chart of a data asset recommendation method according to an embodiment of the present application;
FIG. 5A is a schematic diagram of a data asset inventory provided by an embodiment of the present application;
FIG. 5B is a schematic diagram of a framework for implementing intelligent recommendation of data assets according to an embodiment of the present application;
FIG. 5C is a schematic diagram of a knowledge graph according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a structure of a data asset searching device according to an embodiment of the present application;
FIG. 7 is a schematic hardware entity diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application are further elaborated below in conjunction with the accompanying drawings and examples, which should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without making inventive efforts are within the scope of protection of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
The term "first/second/third" is merely to distinguish similar objects and does not represent a specific ordering of objects, it being understood that the "first/second/third" may be interchanged with a specific order or sequence, as permitted, to enable embodiments of the present application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing the present application only and is not intended to be limiting of the present application.
Embodiments of the present application provide a method of searching for data assets, which may be performed by a processor of a computer device. The computer device may be a device with data processing capability, such as a server, a notebook computer, a tablet computer, a desktop computer, a smart television, a set-top box, or a mobile device (e.g., a mobile phone, a portable video player, a personal digital assistant, a dedicated messaging device, or a portable game device). FIG. 1 is a schematic implementation flow chart of a data asset searching method according to an embodiment of the present application. As shown in FIG. 1, the method includes the following steps S101 to S104:
Step S101: determining the position of a keyword carried in a search request in each data asset of a data asset set;
Here, a search request refers to a request for searching input by an object (e.g., a user). The data assets may include table assets, report assets, index assets, and asset priority index (Asset Priority Index, API) assets. The position in the data asset may include the name, the content, and the annotation of the data asset, where the name refers to the title of the data asset, the content refers to the body of the data asset, and the annotation refers to the explanation of the content of the data asset. For example, where the data asset is a table asset, the positions in the data asset include the table name, the fields, and the annotation.
The position of the keyword in each data asset may be one of the following: the keyword does not appear, the keyword is in the name, the keyword is in the content, or the keyword is in the annotation.
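The position check in step S101 can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the `DataAsset` class, the `keyword_location` function, case-insensitive substring matching, and the rule that the first matching position wins when a keyword appears in several positions are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataAsset:
    """Hypothetical data asset with the three searchable positions."""
    name: str
    content: str
    annotation: str


def keyword_location(asset: DataAsset, keyword: str) -> Optional[str]:
    """Return the first position that contains the keyword, or None.

    Positions are checked in the order name, content, annotation
    (an assumed tie-break when the keyword occurs in several positions).
    """
    needle = keyword.lower()
    for location in ("name", "content", "annotation"):
        if needle in getattr(asset, location).lower():
            return location
    return None
```

A return value of `None` corresponds to the case in which the keyword does not appear in the data asset.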
In some embodiments, the method for searching a data asset provided in the embodiments of the present application is applied to a data asset management platform, and the implementation of step S101 may include: after the object clicks the "data search" button in the data asset management platform, the above-described step S101 is performed.
Step S102: determining a priority of the location of the keyword in the data asset;
Here, the priority refers to the order in which data assets are returned according to the position of the keyword in the data asset. For example, data assets in which the keyword appears in the name are returned first, then data assets in which the keyword appears in the content, and finally data assets in which the keyword appears in the annotation. In implementation, the priority can be preset as required.
Step S103: screening and sorting the data assets in the data asset set based on the position of the keyword in each data asset and the priority of the position of the keyword in the data asset, so as to obtain a first data asset set with a first sorting, wherein the first data asset set contains the keyword;
Here, the implementation of step S103 may include: the method comprises the steps of screening data assets containing keywords from a data asset set, and sorting the screened data assets according to the priority of the positions of the keywords in the data assets to obtain a first data asset set containing the keywords and having a first sorting.
Correspondingly, in some embodiments, the location of the keywords in the data asset comprises: name, content, and notes. The implementation of step S103 may include the following steps S1031 and S1032:
Step S1031: screening a target data asset set containing the keyword from the data asset set based on the position of the keyword in each data asset;
Here, the target data asset set is the screened set of data assets that contain the keyword.
Step S1032: sorting the data assets in the target data asset set based on the priority of the position of the keyword in the data assets, which from high to low is name, content, and annotation, to obtain a first data asset set that contains the keyword and has a first sorting.
Here, the implementation of step S1032 may include: sorting the data assets in the target data asset set in order of priority from high to low, so that data assets with the keyword in the name are displayed first, data assets with the keyword in the content are displayed next, and data assets with the keyword in the annotation are displayed last, thereby obtaining the first data asset set.
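Steps S1031 and S1032 can be sketched as a filter followed by a stable sort. The dictionary layout of an asset, the names `LOCATION_PRIORITY`, `locate`, and `screen_and_sort`, and plain substring matching are illustrative assumptions.

```python
# Smaller number means higher priority: name > content > annotation.
LOCATION_PRIORITY = {"name": 0, "content": 1, "annotation": 2}


def locate(asset, keyword):
    """Return the highest-priority position containing the keyword, or None."""
    for location in ("name", "content", "annotation"):
        if keyword in asset[location]:
            return location
    return None


def screen_and_sort(assets, keyword):
    """S1031: keep assets containing the keyword. S1032: sort by position priority."""
    target_set = []
    for asset in assets:
        location = locate(asset, keyword)
        if location is not None:
            target_set.append({**asset, "location": location})
    # list.sort is stable, so assets with the same position keep their input order.
    target_set.sort(key=lambda a: LOCATION_PRIORITY[a["location"]])
    return target_set
```

Because the sort is stable, assets sharing a keyword position stay in their original relative order, which leaves room for the later value-based sort in step S104.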
Step S104: ranking the data assets with the same keyword position in the data assets in the first data asset set based on the value of each first data asset in the first data asset set to obtain a second data asset set with a second ranking, and outputting the second data asset set with the second ranking, wherein the value of the first data asset is used for representing at least one of the following: frequency of use of a data asset, click heat, dependency of other data assets on the data asset, topic relevance.
Here, the value of the first data asset is used to characterize at least one of: frequency of use of a data asset, click heat, dependency of other data assets on the data asset, topic relevance. I.e., the value of the first data asset is used to characterize any one, two, three, or four of the frequency of use of the data asset, the click heat, the dependency of other data assets on the data asset, and the subject matter relevance.
The frequency of use characterizes whether a data asset is used frequently. In implementation, data assets can be divided into common data assets and uncommon data assets according to their frequency of use. For example, the division can be obtained from the object's favorites: data assets that the object has added to favorites are common data assets, and data assets not in favorites are uncommon data assets.
Click heat is used to characterize whether a data asset is clicked on by multiple people. In practice, the data assets may be ordered according to the number of clicks of the object, thereby obtaining click heat data.
The dependency of other data assets on the data asset characterizes the lineage (blood) relationship between data assets. For example, if data asset A is a subordinate of data asset B and data asset B is a subordinate of data asset C, a lineage relationship exists among data asset A, data asset B, and data asset C. In practice, the dependency may be represented by the number of other data assets that depend on the data asset. For example, if data asset A is followed by two downstream data assets, B and C, the dependency of other data assets on data asset A is 2.
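Counting the dependency of other data assets on a data asset can be sketched from a list of lineage edges. The edge representation `(upstream, dependent)` and the function name `direct_dependent_counts` are assumptions made for illustration.

```python
from collections import defaultdict


def direct_dependent_counts(edges):
    """Count, for each asset, the distinct assets that directly depend on it.

    `edges` is an iterable of (upstream, dependent) pairs (assumed format).
    """
    dependents = defaultdict(set)
    for upstream, dependent in edges:
        dependents[upstream].add(dependent)
    return {asset: len(deps) for asset, deps in dependents.items()}
```

With the edges `[("A", "B"), ("A", "C")]`, data asset A has a count of 2, which matches the example above.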
Topic relevance refers to whether a data asset is related to the topic being searched and can, for example, be divided into two cases: topic-relevant and topic-irrelevant. In implementation, relevant topics can be set through favorites, in which case topics not in favorites are irrelevant topics. Relevant and irrelevant topics may also be preset.
In some embodiments, the implementation of step S104 may include: setting different levels for the frequency of use of the data asset, the click heat, the dependency of other data assets on the data asset, and the topic relevance, and setting a score for each level; then obtaining the value of the data asset by adding the scores of the different items; and finally sorting the data assets from high to low according to their value to obtain a second data asset set with a second ranking.
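The second sort in step S104 can be sketched as a sort on a composite key, so that value only reorders data assets whose keyword position is the same. The field names `location` and `value` and the function name `second_sort` are assumptions; the value is taken as already computed.

```python
# Smaller number means higher priority: name > content > annotation.
LOCATION_PRIORITY = {"name": 0, "content": 1, "annotation": 2}


def second_sort(first_set):
    """Sort by keyword position first, then by value from high to low.

    `sorted` is stable, so assets with equal position and equal value
    keep the order they had in the first data asset set.
    """
    return sorted(
        first_set,
        key=lambda a: (LOCATION_PRIORITY[a["location"]], -a["value"]),
    )
```

Using the position as the leading key preserves the first sorting, so a high-value asset with the keyword in the annotation never overtakes an asset with the keyword in the name.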
In the embodiment of the application, first, determining the position of a keyword carried in a search request in each data asset of a data asset set; then, determining a priority of the location of the keyword in the data asset; then, screening and sorting the data assets in the data asset set based on the position of the keyword in each data asset and the priority of the position of the keyword in the data asset, so as to obtain a first data asset set with a first sorting, wherein the first data asset set contains the keyword; finally, sorting the data assets of the first data asset set, which have the same keyword position in the data assets, based on the value of each first data asset in the first data asset set, to obtain a second data asset set with a second sorting, and outputting the second data asset set with the second sorting, wherein the value of the first data asset is used for representing at least one of the following: frequency of use of a data asset, click heat, dependency of other data assets on the data asset, topic relevance.
It can be seen that in the embodiment of the present application, the second data asset set with the second ranking is obtained by first ranking according to the priority of the positions of the keywords in the data assets, and then ranking according to the value of the data assets. The value of the data asset considers at least one of the use frequency, click heat, dependence of other data assets on the data asset and subject correlation of the data asset, so that the finally obtained second data asset set with the second order can be combined with factors such as heat of the data asset, use habit of the object, importance of the data asset and subject correlation, requirements of different objects can be met more accurately, personalized search is realized, and the object is helped to find the required data asset more quickly.
In some embodiments, as shown in FIG. 2, before step S104 "sorting the data assets in the first data asset set whose keyword positions in the data assets are the same, based on the value of each first data asset in the first data asset set", the method further includes the following steps S201 and S202:
Step S201: based on the priority of the database to which each first data asset in the first data asset set belongs, sorting, for the first time, the data assets in the first data asset set whose keyword positions are the same, to obtain a third data asset set with a third sorting;
Here, different data assets come from different databases, e.g., HIVE libraries, business libraries, and so on, and different databases have different emphases and focuses. Because the object may prefer the data of a certain database, after the first data asset set is obtained, the data assets in the first data asset set whose keyword positions are the same can be sorted again to obtain a third data asset set with a third sorting, so that data assets from high-priority databases are displayed first and data assets from low-priority databases are displayed later. For example, when the keyword is in the name of each of the first 100 data assets, step S201 sorts those 100 data assets once, so that among them the data assets from high-priority databases are displayed first and those from low-priority databases are displayed later.
Step S202: taking the third set of data assets as the first set of data assets;
here, the third data asset set is taken as the first data asset set in step S202, for executing step S104 described above.
Correspondingly, the implementation of step S104 may include: and based on the value of each third data asset in the third data asset set, sorting the data assets which are adjacent in the third data asset set and belong to the same database for the second time, and obtaining the second data asset set with the second sorting.
Here, since step S201 has already ordered once according to the priority of the database, the data assets belonging to the same database among the data assets whose positions of the keywords are the same are brought together. Thus, the implementation in step S104 is to sort the data assets in the third set of data assets that are adjacent and belong to the same database a second time, resulting in a second set of data assets having a second sort. For example, if the keywords in the first 100 data assets are all in the names of the data assets, the implementation of step S201 is to sort the first 100 data assets first, so that the first 30 data assets belong to the data assets of the database HIVE library with high priority, and the implementation of step S104 is to sort the first 30 data assets again according to the value of the data assets, so as to obtain the second data asset set with the second sort.
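Steps S201 and S104 in this embodiment can be sketched as two passes: a sort by database priority within each keyword position, then a value sort within each run of adjacent assets from the same database. The `DATABASE_PRIORITY` table, the field names, and both function names are hypothetical.

```python
from itertools import groupby

LOCATION_PRIORITY = {"name": 0, "content": 1, "annotation": 2}
# Hypothetical preset: the HIVE library ranks above the business library.
DATABASE_PRIORITY = {"HIVE": 0, "business": 1}


def sort_by_database(first_set):
    """S201: within each keyword position, order assets by database priority."""
    return sorted(
        first_set,
        key=lambda a: (LOCATION_PRIORITY[a["location"]],
                       DATABASE_PRIORITY[a["database"]]),
    )


def sort_by_value_within_database(third_set):
    """S104: reorder each run of adjacent same-position, same-database assets by value."""
    result = []
    for _, run in groupby(third_set, key=lambda a: (a["location"], a["database"])):
        result.extend(sorted(run, key=lambda a: a["value"], reverse=True))
    return result
```

`groupby` only merges adjacent items, which mirrors the text: after step S201, assets from the same database with the same keyword position already sit next to each other.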
In the embodiments of the present application, before sorting by the value of the data assets, the data assets are sorted by database priority, so that data assets from databases that the object commonly uses, or whose content better meets the object's needs, are displayed first. The search results therefore better match the object's needs and help the object find the required data asset faster.
In some embodiments, as shown in FIG. 3, the value of the first data asset is used to characterize the frequency of use of the data asset, the click heat, and the dependence of other data assets on the data asset, and correspondingly, the method further includes the following steps S301a through S303a:
Step S301a: acquiring a usage frequency level, a query volume ranking, and the number of direct downstream dependent data assets of each first data asset in the first data asset set;
Here, the usage frequency level may be obtained by ranking the usage frequencies of the data assets, or may be set according to the usage preference of the object. In some embodiments, the usage frequency level may include common data assets and uncommon data assets; in implementation, common data assets may be set by adding them to favorites or the like, and data assets not in favorites are uncommon data assets. In some embodiments, the usage frequency level may also be preset by the object, for example as frequently used, infrequently used, or occasionally used.
The query quantity ranking, namely the click heat, can be obtained by counting the query times of the data assets within a period of time and ranking the query times.
The number of direct downstream dependent data assets corresponds to the dependency of other data assets on the data asset described above. In some embodiments, it is the number of data assets that have a lineage (blood) relationship with the data asset and depend on it directly at the next level.
Step S302a: determining a frequency of use score, a heat score and a dependency score of the corresponding data asset based on the frequency of use level, the query volume ranking and the number of directly-subsequent dependent data assets of each first data asset;
Here, the implementation of step S302a may include: defining different scores for different usage frequency levels, query volume rankings, and numbers of direct downstream dependent data assets, thereby obtaining the usage frequency score, heat score, and dependency score of each data asset. For example, a common data asset scores 50 points and an uncommon data asset scores 0 points. For query volume ranking: top 10 scores 30 points, top 11 to top 30 scores 20 points, and top 31 to top 100 scores 10 points. For the number of direct downstream dependent data assets: (1) more than 20 scores 20 points; (2) more than 10 and at most 20 scores 10 points; (3) more than 5 and at most 10 scores 5 points. The embodiments of the present application do not limit how the scores for the different levels are set.
Step S303a: the value of the corresponding first data asset is determined based on the usage frequency score, the heat score, and the dependency score for each first data asset.
Here, the implementation of step S303a may include: the usage frequency score, the heat score, and the dependency score for each first data asset are summed to obtain a value for the corresponding first data asset.
In the embodiment of the application, the value of each first data asset is obtained by acquiring the use frequency grade, the query quantity ranking and the number of directly-subsequent-stage dependent data assets in the first data asset set and the corresponding score thereof, and then adding the use frequency score, the heat score and the dependent score of each first data asset to obtain the value of the corresponding first data asset, so that the quantification of the value of the data asset is realized.
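Steps S301a to S303a can be sketched with the example tiers from the text (50 points for a common asset; 30, 20, and 10 points for query rankings within the top 10, 30, and 100; 20, 10, and 5 points for more than 20, 10, and 5 direct downstream dependents). Scoring 0 outside the listed tiers is an assumption, as are all function names.

```python
def usage_frequency_score(is_common):
    """50 points for a common data asset, 0 for an uncommon one."""
    return 50 if is_common else 0


def heat_score(query_rank):
    """Score by query volume ranking; ranks beyond 100 score 0 (assumed)."""
    if query_rank <= 10:
        return 30
    if query_rank <= 30:
        return 20
    if query_rank <= 100:
        return 10
    return 0


def dependency_score(direct_dependents):
    """Score by number of direct downstream dependents; 5 or fewer score 0 (assumed)."""
    if direct_dependents > 20:
        return 20
    if direct_dependents > 10:
        return 10
    if direct_dependents > 5:
        return 5
    return 0


def asset_value(is_common, query_rank, direct_dependents):
    """S303a: the value is the sum of the three scores."""
    return (usage_frequency_score(is_common)
            + heat_score(query_rank)
            + dependency_score(direct_dependents))
```

For instance, a common data asset ranked 5th by query volume with 25 direct downstream dependents would have a value of 50 + 30 + 20 = 100 under these tiers.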
In some embodiments, the value of the first data asset is used to characterize the relevance of the topic, the method further comprising the following steps S301b to S303b:
Step S301b: obtaining the topic relevance of each first data asset in the first data asset set;
Here, the topic relevance may be either related to a topic preferred by the object or not related to a topic preferred by the object. In implementation, the topics the object prefers (i.e., topic-relevant) and the topics it does not prefer (i.e., topic-irrelevant) can be preset. Alternatively, the topics preferred by the object can be collected gradually through favorites, in which case topics not in favorites are topics the object does not prefer.
Step S302b: determining a topic relevance score for each data asset based on the topic relevance of the corresponding data asset;
here, the implementation of step S302b may also include: and setting different scores for the relevance of the topics, so as to obtain the topic relevance score of the data asset. For example, subject matter is relevant, 30 points, subject matter is irrelevant, 0 points.
Step S303b: the value of the corresponding first data asset is determined based on the topic relevance score for each first data asset.
Here, the implementation of step S303b is such that the value of the first data asset is equal to the topic relevance score of the first data asset. In some embodiments, where the value of the first data asset is also used to characterize at least one of the frequency of use of the data asset, the click heat, and the dependency of other data assets on the data asset, the value of the first data asset is equal to the topic relevance score of the first data asset plus the scores of those other terms.
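Steps S301b to S303b can be illustrated with a minimal Python sketch. The function names, the set of preferred topics, and the way the other score terms are passed in are assumptions made for illustration only; the 30-point and 0-point scores are the example values given above.

```python
# Example score for a relevant topic, taken from the example in the text.
TOPIC_RELEVANT_SCORE = 30


def topic_relevance_score(asset_topic, preferred_topics):
    """S302b: 30 points if the asset's topic is preferred by the object, else 0."""
    return TOPIC_RELEVANT_SCORE if asset_topic in preferred_topics else 0


def asset_value(asset_topic, preferred_topics, other_scores=()):
    """S303b: topic relevance score plus any other score terms (hypothetical input)."""
    return topic_relevance_score(asset_topic, preferred_topics) + sum(other_scores)


preferred = {"finance", "e-commerce"}             # hypothetical preferred topics
print(asset_value("finance", preferred))          # topic score only
print(asset_value("users", preferred, (50, 20)))  # irrelevant topic plus other terms
```

When the value characterizes topic relevance alone, `other_scores` stays empty and the value equals the topic relevance score.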
In some embodiments, the method is applied to a data asset management platform, and the platform comprises a data asset catalog covering theme domains, subtopics, tables and fields, and is used for realizing navigation searching of data assets and searching according to the theme catalog, so that an object can conveniently know the full view of the data assets and accurately find the assets to be searched.
In some embodiments, as shown in fig. 4, the method further includes the following steps S401 to S404:
step S401: determining a similar object subset with similarity meeting a preset condition with a target object sending out the search request in an object set based on a knowledge graph among an object group, objects, business metadata assets and business data assets in response to a data recommendation request, wherein the object set comprises at least two objects in the object group;
here, the object may be a user using the data asset management platform, and the object group is a group consisting of a plurality of objects, such as a sales group, a development group, and the like. The object set comprises objects of at least two object groups, i.e. the object set may comprise objects of a part of the object groups or may comprise objects of all the object groups.
Business metadata assets include index metadata assets and report metadata assets. In some embodiments, the index metadata asset is also referred to as an index asset, such as an asset associated with an index, sales amount, sales quantity, responsible person, and the like. Report metadata assets are also referred to as report assets, such as assets related to report attributes, report names, report descriptions, report identification numbers, and the like.
Business data assets refer to data assets stored in a database that are related to a business. In some embodiments, business data assets are also referred to as table assets, e.g., real data of a table, etc.
The preset condition may be set according to an actual use condition, for example, members of the same object group are referred to as a subset of similar objects whose similarity satisfies the preset condition; for another example, objects having the same preference are referred to as a subset of similar objects whose similarity satisfies a preset condition.
In some embodiments, the implementation of step S401 may include: after the target object clicks the "data recommendation" button in the data asset management platform, step S401 described above is performed.
Step S402: acquiring data assets viewed by each similar object in the similar object subset and data assets viewed by the target object;
here, the implementation of step S402 may obtain the data asset viewed by each similar object in the subset of similar objects and the data asset viewed by the target object in the preset period of time, for reducing the calculation amount.
Step S403: determining data assets which are not viewed by the target object in the data assets viewed by each similar object based on the data assets viewed by each similar object and the data assets viewed by the target object;
Here, the data assets that are not viewed by the target object in the data assets viewed by each similar object are obtained by subtracting the data assets viewed by the target object from the data assets viewed by that similar object. For example, if the similar object A has viewed data assets b and c, and the target object C has viewed data assets b and e, then the data asset that is not viewed by the target object in the data assets viewed by the similar object is c.
Step S404: and recommending the data assets which are not viewed by the target object in the data assets viewed by each similar object to the target object.
Here, continuing the example above in which c is the data asset viewed by the similar object but not viewed by the target object, c is recommended to the target object.
In the embodiment of the application, firstly, a similar object subset similar to a target object sending a search request is determined through a knowledge graph among an object group, an object, a business metadata asset and a business data asset; then, determining data assets which are not viewed by the target object in the data assets viewed by each similar object in the similar object subset; and finally, recommending the data assets which are not viewed by the target object in the data assets viewed by each similar object to the target object, so that the target object can view the data assets viewed by other similar objects, and further know more data assets related to the target object.
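The subtraction in steps S403 and S404 can be sketched in Python as a set difference. The function name and the dictionary layout are assumptions for illustration; the sample data reproduces the example with similar object A and target object C.

```python
def recommend_unviewed(viewed_by_similar, viewed_by_target):
    """S403: for each similar object, keep the assets the target has not viewed;
    S404: merge them into one set of assets to recommend."""
    target = set(viewed_by_target)
    recommended = set()
    for assets in viewed_by_similar.values():
        recommended |= set(assets) - target
    return recommended


# Similar object A viewed b and c; target object C viewed b and e.
print(recommend_unviewed({"A": {"b", "c"}}, {"b", "e"}))  # {'c'}
```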
In some embodiments, the data asset viewed by the target object includes a business metadata asset or a business data asset, the business metadata asset being associated with the business data asset, and correspondingly, after step S502 "acquire the data asset viewed by the target object", further includes the following steps S601 and S602:
step S601: determining a first associated data asset based on the object group, the objects, the business metadata assets, the knowledge graph between business data assets, and the data assets viewed by the target object, wherein the first associated data asset is associated with and different from the data assets viewed by the target object;
here, since the data asset viewed by the target object includes the business metadata asset or the business data asset, that is, the target object views only a part of the business metadata asset and the business data asset, the implementation of step S601 may derive the first associated data asset that is associated with and different from the data asset viewed by the target object according to the object group, the object, the business metadata asset, and the knowledge graph between the business data assets, for example, the data asset viewed by the target object is the business data asset, and the first associated data asset that is associated with and different from the data asset viewed by the target object is the business metadata asset. For another example, the data asset viewed by the target object is a business metadata asset, and the first associated data asset that is associated with and different from the data asset viewed by the target object is a business data asset. In the case where the business metadata asset includes an index metadata asset and a report metadata asset, if the data asset viewed by the target object is the index metadata asset, the first associated data asset that is associated with and different from the data asset viewed by the target object may be the business data asset and the report metadata asset.
Step S602: recommending the first associated data asset to the target object.
In the embodiment of the application, through the knowledge graph among the object group, the object, the business metadata asset and the business data asset, the first associated data asset which is associated with and different from the data asset checked by the target object is determined, and the first associated data asset is recommended to the target object, so that the target object can see more related data assets and know more data assets in the same direction.
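A minimal Python sketch of step S601 follows, assuming the asset-to-asset part of the knowledge graph is available as an adjacency mapping. The mapping layout and the asset names are hypothetical; the source does not specify how the knowledge graph is stored.

```python
def associated_assets(asset_links, viewed):
    """Return assets linked to any viewed asset, excluding the viewed assets themselves."""
    viewed = set(viewed)
    related = set()
    for asset in viewed:
        related |= set(asset_links.get(asset, ()))
    return related - viewed


# Hypothetical links: an index metadata asset is associated with a table and a report.
asset_links = {
    "sales_amount_index": ["orders_table", "sales_report"],
    "orders_table": ["sales_amount_index"],
}
print(associated_assets(asset_links, ["sales_amount_index"]))
```

The same lookup could be applied to the data assets not viewed by the target object in order to obtain the second associated data asset of step S5031.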
In some embodiments, after step S503 "determining the data assets that are not viewed by the target object in the data assets viewed by each similar object", the following steps S5031 and S5032 are further included:
step S5031: determining a second associated data asset based on the object group, the objects, the business metadata assets, the knowledge graph between business data assets, and the data assets not viewed by the target object in the data assets viewed by each similar object, wherein the second associated data asset is associated with and different from the data assets not viewed by the target object in the data assets viewed by each similar object;
Here, as above, if the data asset not viewed by the target object in the data assets viewed by the similar object is a business data asset, the second associated data asset which is associated with and different from the target data asset not viewed by the target object in the data assets viewed by the similar object is a business metadata asset. For another example, if the data asset not viewed by the target object in the data assets viewed by the similar object is a business metadata asset, then a second associated data asset associated with and different from the target data asset not viewed by the target object in the data assets viewed by the similar object is a business data asset. In the case where the business metadata asset includes an index metadata asset and a report metadata asset, if a data asset of the data asset viewed by the similar object that is not viewed by the target object is the index metadata asset, the second associated data asset that is associated with and different from the target data asset of the data asset viewed by the similar object may be the business data asset and the report metadata asset.
Step S5032: recommending the second associated data asset to the target object.
In the embodiment of the application, through the knowledge graph among the object group, the objects, the business metadata assets and the business data assets, the second associated data assets which are associated with and different from the target data assets which are not viewed by the target object in the data assets viewed by the similar objects are determined, and the second associated data assets are recommended to the target object, so that the target object can see more related data assets and know more data assets in the same direction.
In some embodiments, the implementation of step S501 "determining, in the object set, a subset of similar objects having similarity to the target object from which the search request is issued satisfying a preset condition based on the knowledge graph between the object group, the object, the business metadata asset, and the business data asset" may include the following steps S5011a to S5013a:
step S5011a: determining an object group in which the target object is located based on the object group, the object, the business metadata asset and the knowledge graph among the business data assets;
here, since the knowledge graph includes the relationship between the object group and the object, the object group in which the target object is located may be determined by the knowledge graph after the target object is obtained.
Step S5012a: determining other objects except the target object in the object group;
here, if the object group in which the target object is located includes A, B, C three objects, and the target object is object a, the other objects in the object group except for the target object include B and C.
Step S5013a: and determining the other objects as a similar object subset with the similarity with the target object meeting the preset condition.
That is, the objects B and C are a subset of similar objects whose similarity with the target object satisfies a preset condition.
In the embodiment of the present application, the objects belonging to the same object group are determined through the knowledge graph, which includes the relationship between object groups and objects. Because objects in the same object group generally need to view the same or similar data, recommending to the target object the data assets viewed by the other objects in its object group allows more relevant data assets to be recommended to the target object.
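Steps S5011a to S5013a can be sketched in Python as a lookup in the group-to-object part of the knowledge graph. The dictionary layout and the group name are assumptions for illustration.

```python
def same_group_peers(group_members, target):
    """S5011a: find the group containing the target;
    S5012a/S5013a: return the other members as the similar object subset."""
    for members in group_members.values():
        if target in members:
            return set(members) - {target}
    return set()  # target belongs to no known group


group_members = {"sales": ["A", "B", "C"]}   # hypothetical object group
print(same_group_peers(group_members, "A"))  # objects B and C
```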
In some embodiments, the implementation of step S501 "determining, in the object set, a subset of similar objects having similarity to the target object from which the search request is issued satisfying a preset condition based on the knowledge graph between the object group, the object, the business metadata asset, and the business data asset" may include the following steps S5011b to S5014b:
step S5011b: acquiring data assets and first quantity of the data assets, which are checked by each object in the object set in a preset time period;
here, the preset time period may be one month or three months, and may be set as required in implementation. The object set refers to all or some of the objects that use the data asset management platform. The data assets viewed by each object within the preset time period, and their first number, refer to all the data assets viewed by that object in the period and the count of those data assets. For example, if object A views data assets a, b, c, d, and e, then the first number of data assets viewed by object A is 5.
Step S5012b: determining, for each object in the set of objects other than the target object, a second number of identical data assets viewed by the target object and each of the other objects within a preset time period;
here, the same data assets are the data assets viewed by both the target object and another object. For example, if the target object C views data assets a, b, c, d, and e, and the object B views data assets a, b, and c, then the same data assets viewed by the target object C and the object B are a, b, and c, and the second number is 3.
Step S5013b: determining a maximum of a third number of data assets viewed by the target object and the first number of each of the other objects;
here, if the target object C views data assets a, b, c, d, and e, the third number is 5; if the object B views data assets a, b, and c, the first number of the object B is 3; correspondingly, the maximum value of the third number and the first number is 5.
Step S5014b: and for each object except the target object in the object set, determining a similar object subset with the similarity meeting a preset condition with the target object in the object set based on the second quantity and the maximum value.
Here, the preset condition may be defined on a ratio: the number of data assets viewed by both another object and the target object (i.e., the second number) is divided by the maximum of the numbers of data assets viewed by the other object and by the target object, and if the ratio is greater than a threshold, the other object is considered to be a similar object whose similarity satisfies the preset condition. For example, the second number for the target object C and the object B is 3, the maximum value for the target object C and the object B is 5, and the ratio of the second number to the maximum value is 3/5. If the threshold is 50%, the ratio is greater than the threshold, and the object B is considered to be a similar object whose similarity satisfies the preset condition. The ratio of the second number to the maximum value is determined in the same way for every other object in the object set to judge whether that object is a similar object whose similarity satisfies the preset condition, thereby obtaining the similar object subset.
In the embodiment of the application, the second number of the same data assets viewed by each other object and the target object is determined, together with the maximum of the numbers of data assets viewed by each other object and by the target object; the similar object subset whose similarity with the target object satisfies the preset condition is then determined in the object set according to the ratio of the second number to the maximum value for each other object. In this way, the repetition rate of the data assets viewed by two objects is obtained and a virtual similar object group is formed, and the data assets viewed by the other objects in the virtual similar object group are recommended to the target object, so that more, and more relevant, data assets are recommended to the target object.
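Steps S5011b to S5014b can be sketched in Python as follows. The function name, the dictionary of viewed assets, and the default threshold of 50% (the example value above) are assumptions for illustration; object D is added only to show a non-matching case.

```python
def similar_objects(views, target, threshold=0.5):
    """Return objects whose overlap ratio with the target exceeds the threshold.

    ratio = (assets viewed by both) / max(assets viewed by target, assets viewed by other)
    """
    target_assets = set(views[target])
    similar = set()
    for obj, assets in views.items():
        if obj == target:
            continue
        assets = set(assets)
        second_number = len(target_assets & assets)         # S5012b
        maximum = max(len(target_assets), len(assets))      # S5013b
        if maximum and second_number / maximum > threshold:  # S5014b
            similar.add(obj)
    return similar


views = {
    "C": {"a", "b", "c", "d", "e"},  # target object, third number 5
    "B": {"a", "b", "c"},            # first number 3, overlap 3, ratio 3/5
    "D": {"a", "x"},                 # overlap 1, ratio 1/5
}
print(similar_objects(views, "C"))  # {'B'}
```

Dividing by the larger of the two counts keeps the ratio low when one object has viewed far more assets than the other, which makes the measure stricter than dividing by the smaller count.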
In some embodiments, because lineage (blood) relationships exist between data assets, if the target object or a similar object views data asset A, the upstream and downstream data assets in the data asset set that have a lineage relationship with data asset A may also be recommended to the target object.
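A minimal Python sketch of looking up upstream and downstream assets follows, assuming the lineage is stored as a list of (upstream, downstream) pairs and that only directly connected assets are recommended. Both the storage layout and the asset names are assumptions; the source does not specify them.

```python
def lineage_neighbors(edges, asset):
    """Return (direct upstream assets, direct downstream assets) of the given asset."""
    upstream = {src for src, dst in edges if dst == asset}
    downstream = {dst for src, dst in edges if src == asset}
    return upstream, downstream


# Hypothetical lineage: ods_orders feeds A, and A feeds dws_sales.
edges = [("ods_orders", "A"), ("A", "dws_sales"), ("x", "y")]
up, down = lineage_neighbors(edges, "A")
print(up | down)  # candidates to recommend alongside data asset A
```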
The data asset management platform is built on big data processing and storage, takes data asset management as its core, and uses data insight as its value expression across the full life cycle, so that enterprise data changes from hard to understand and hard to manage to controllable and operable. Building the data asset platform allows data to be processed, stored, authorized, and searched in a unified way, and to be managed over its full life cycle. Data is searched accurately according to its usage heat, the value of the data asset, and the relevance of the data, so that searching data asset information becomes more intelligent. By matching users (namely the objects), user attributes, and data information, data can automatically find the users who need it, and the value of the data is maximized. Data recommendation and data search are thereby realized, helping users find, understand, and use the required data faster; the mode shifts from users passively querying data from the system to data being actively pushed to users, which increases the frequency of use.
The embodiment of the application performs full-text retrieval on data assets, treats data as an asset, and performs automatic maintenance of relationships, model generation, intelligent retrieval, and the like. Retrieval is efficient and has a high hit rate for the asset catalog generated from big data and for the field lineage, labels, indexes, and the like in the asset catalog; meanwhile, the Top N most frequently queried data can be retrieved preferentially.
In the aspect of data asset management, the embodiment of the application first realizes unified management of assets (namely, the data assets) such as table assets, index assets, and report assets, and provides standardized collection and access of asset data and visual presentation of a data map. Entity relationships (i.e., the knowledge graph) based on data assets such as teams, people, reports, index names, and tables are supported, as is full-text retrieval of asset data, and data recommendation is realized, which facilitates better understanding and use of the asset data. Data search is made more intelligent according to the usage heat and value, on platforms such as data query, offline scheduling, and business intelligence (Business Intelligence, BI) data analysis, of data assets such as the data dictionary, index assets, table assets, and API assets.
Part one: intelligent data search
Unified data asset management, supporting users to quickly retrieve their own data assets of interest and to learn data asset relationships.
Intelligent data search rules (illustrated with table assets as the example data assets):
Step 1, keyword search (the keyword matching rule is: table name (i.e., the above name) > field (i.e., the above content) > comment): after a search keyword is input, tables whose table name contains the keyword are matched first, then tables whose field names match the keyword, and then tables whose table comment contains the keyword.
Step 2, sorting by specified library (namely the database) name, that is, sorting the search results of step 1 by library name.
Step 3, value (i.e., the first data asset value) ordering rule: a score is calculated for each table, weighted by importance, from whether the table is a common table (namely the frequency of use of the data asset), the number of times the table is queried in the data query platform (namely the click heat), and the number of directly-later-stage dependent tables of the table (namely the dependency of other data assets on the data asset); a table with a higher score has a higher value and is returned first.
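The three ordering rules can be sketched in Python as a single sort on a composite key. This is one possible reading, under the assumption that library order breaks ties between tables matched at the same position and value breaks ties within the same library. The table records, field names, and library names are hypothetical.

```python
# Lower number = higher priority: table name > field > comment.
POSITION_PRIORITY = {"name": 0, "field": 1, "comment": 2}


def match_position(table, keyword):
    """Step 1: return the highest-priority position containing the keyword, or None."""
    if keyword in table["name"]:
        return "name"
    if any(keyword in field for field in table["fields"]):
        return "field"
    if keyword in table["comment"]:
        return "comment"
    return None


def search(tables, keyword, library_order):
    """Sort hits by match position, then library order (step 2), then value (step 3)."""
    lib_rank = {lib: i for i, lib in enumerate(library_order)}
    hits = []
    for t in tables:
        pos = match_position(t, keyword)
        if pos is not None:
            hits.append((POSITION_PRIORITY[pos],
                         lib_rank.get(t["library"], len(lib_rank)),
                         -t["value"],  # higher value first
                         t["name"]))
    return [name for *_, name in sorted(hits)]


tables = [
    {"name": "order_detail", "fields": ["id"], "comment": "", "library": "dw", "value": 60},
    {"name": "order_sum", "fields": ["id"], "comment": "", "library": "dw", "value": 80},
    {"name": "pay", "fields": ["order_id"], "comment": "", "library": "dw", "value": 100},
    {"name": "user", "fields": ["id"], "comment": "placed an order", "library": "ods", "value": 100},
    {"name": "order_raw", "fields": [], "comment": "", "library": "ods", "value": 100},
]
print(search(tables, "order", ["dw", "ods"]))
# ['order_sum', 'order_detail', 'order_raw', 'pay', 'user']
```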
In practice, different scores may be assigned to different situations depending on importance. For example, the scoring method is as follows:
(1) Common table score: 50 points
(2) Ranking score of the most frequently queried tables in the data query platform: up to 30 points (Top 10: 30 points; Top 11 to Top 30: 20 points; Top 31 to Top 100: 10 points)
(3) Score for the number of directly-later-stage dependent tables (the more such tables, the higher the ranking priority of the table): up to 20 points (more than 20 tables: 20 points; more than 10 and at most 20 tables: 10 points; more than 5 and at most 10 tables: 5 points)
The score (i.e., value) for each table is obtained by summing the scores for each table.
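The example scoring method can be sketched in Python as follows. The function names are hypothetical, and the 0-point results for tables outside the Top 100 and for tables with 5 or fewer dependent tables are assumptions, since the example does not state them.

```python
def heat_score(query_rank):
    """Score by query ranking; None means the table is not ranked (assumed 0 points)."""
    if query_rank is None:
        return 0
    if query_rank <= 10:
        return 30
    if query_rank <= 30:
        return 20
    if query_rank <= 100:
        return 10
    return 0  # assumption: no points outside the Top 100


def dependency_score(dependent_tables):
    """Score by the number of directly-later-stage dependent tables."""
    if dependent_tables > 20:
        return 20
    if dependent_tables > 10:
        return 10
    if dependent_tables > 5:
        return 5
    return 0  # assumption: no points for 5 or fewer dependent tables


def table_value(is_common, query_rank, dependent_tables):
    """Sum the common-table, heat, and dependency scores (maximum 100)."""
    common = 50 if is_common else 0
    return common + heat_score(query_rank) + dependency_score(dependent_tables)


print(table_value(True, 8, 25))   # 50 + 30 + 20 = 100
print(table_value(False, 45, 7))  # 0 + 10 + 5 = 15
```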
Part two: business perspective data recommendation and asset navigation
Related topic tables are recommended from a business perspective, and data asset navigation and search are realized according to the topic catalog. Related tables are aggregated by business topic, and data navigation over the topic and subtopic catalogs is provided to the user. Data assets are managed, identified, and shared from multiple angles such as category, topic, and application. Data classification is of great value in data asset management: guided by business value, it improves management efficiency and user experience.
The data asset catalog is constructed in the following steps:
(1) Taking inventory of data resources: sorting out the correspondence between tables and functions, and comprehensively sorting out the accessed business systems and the table data dictionary;
(2) Analyzing data links: sorting out data flow paths according to the data dictionary, and identifying data sources and link relationships;
(3) Constructing the data asset catalog and publishing it to each business department: formulating a directory structure specification starting from the company's data resource classification; constructing a data asset catalog covering the theme domain, the subtopic, the table, and the field, thereby forming the correspondence between business and data; and publishing the data asset catalog.
As shown in fig. 5A, the theme domains include: digital marketing, sales operations, service experience, finance, and labor. Where the theme domain is digital marketing, the subtopics include e-commerce, finance, users, and others, and there are multiple tables under each subtopic; for example, there are 10 tables under the e-commerce subtopic, and the content of each table includes fields. In other cases, there are 50 tables and 225 offline scheduling tasks at the lower level.
Part three: intelligent data recommendation
As shown in fig. 5B, the implementation of this part mainly includes collecting data (data from the HIVE library, business libraries, etc.), constructing asset models (including table assets, report assets, etc.), constructing a knowledge graph (representing relationships between users and users, users and reports, etc.), and data recommendation (table asset recommendation, report recommendation, etc.). A data asset map is constructed by establishing the mapping relationship between business and data and by tracing the business view and the data view; the data asset map comprises a business view, a data view, and the mapping relationship between the two, is used to sort out and identify data assets and their relationships from a global perspective, and is used to construct the knowledge graph. The knowledge graph may be used to represent relationships between entities, such as relationships between users and users, between users and data assets, and between data assets and data assets. The knowledge graph is constructed based on entity relationships of team, person, report, index name, table, and the like (where the team is the object group, the person is the object, the report and the index name are the business metadata assets, and the table is the business data asset); the similarity between two users is calculated from the users' preference for the same kind of data asset, and required data is recommended to the users, realizing "data finding people".
For example, users in the same group have common data viewing preferences (i.e., other members of the user's team have the same data viewing preferences as the user): for three users A, B, and C in user group 1, the data viewing preferences of user A are learned, recommendations are generated on that basis, and the data assets preferred by user A are recommended to users B and C. For another example, a virtual user group is constructed and data recommendation is performed based on it: the similarity between two users is calculated from the same data assets they have viewed, and a virtual user group is constructed (namely, a virtual team is built for users with a high viewing repetition rate, and data viewed by other members of the virtual team is pushed to the user).
The method comprises the following specific steps:
step 1: constructing a knowledge graph based on entity relations of team, person, report, index name, table and the like (as shown in fig. 5C, the knowledge graph shows the relations among team, person, report, index name and table), and constructing the relations among users, reports and table assets;
step 2: calculating the similarity between users by using the preference of the users for the same data asset;
step 3: recommendations are generated accordingly, for example: the data assets of user a preferences are recommended to B, C users having the same attributes.
Compared with the related art, the method has the following advantages:
1. Data asset recommendation and "data finding people" are realized based on entity relationships of team, person, report, index, table, and the like.
2. More intelligent searching of data assets is supported by using data usage heat and data asset value.
3. Data lineage and business-logic data relationships are applied to recommend lists, summary lists, and the upstream and downstream of lists.
4. Related topic tables are recommended from a business perspective, and data asset navigation and search are realized according to the topic catalog.
5. The whole process of data from generation to consumption is recorded, helping users gain insight into the data and release its value.
6. Data is managed in a unified way, so that users can conveniently view and use it in one place.
The main inventive points are as follows:
1. Data recommendation based on the entity relationships of user groups formed by business departments is a relatively common form of entity-relationship-based data recommendation. Calculating the similarity among users from their preference for the same kind of data asset is the part of the current system that needs continuous optimization, and the recommendation effect improves as historical data accumulates.
2. Based on the data use heat and the data asset value, the data searching is more intelligent, and the user is helped to find the required data more quickly.
Based on the foregoing embodiments, the embodiments of the present application provide a search apparatus for data assets. The units included in the apparatus, and the modules included in the units, may be implemented by a processor in a computer device; of course, they may also be implemented by a specific logic circuit. In implementation, the processor may be a central processing unit (Central Processing Unit, CPU), microprocessor (Microprocessor Unit, MPU), digital signal processor (Digital Signal Processor, DSP) or field programmable gate array (Field Programmable Gate Array, FPGA), etc.
Fig. 6 is a schematic structural diagram of a data asset searching device according to an embodiment of the present application, and as shown in fig. 6, a data asset searching device 600 includes: a first determination module 610, a second determination module 620, a screening ranking module 630, and a first ranking module 640, wherein:
a first determining module 610, configured to determine a location of a keyword carried in the search request in each data asset of the set of data assets;
a second determination module 620 for determining a priority of the location of the keyword in the data asset;
a screening and sorting module 630, configured to screen and sort the data assets in the data asset set based on the position of the keyword in each data asset and the priority of the position of the keyword in the data asset, to obtain a first data asset set with a first sort including the keyword;
A first ranking module 640, configured to rank, based on the value of each first data asset in the first data asset set, data assets with the same keyword position in the data assets in the first data asset set, to obtain a second data asset set with a second ranking, and output the second data asset set with the second ranking, where the value of the first data asset is used to characterize at least one of the following: frequency of use of a data asset, click heat, dependency of other data assets on the data asset, topic relevance.
In some embodiments, the apparatus further comprises: a second sorting module, configured to, before sorting, based on the value of each first data asset in the first data asset set, data assets with the same position of a keyword in the first data asset set in the data assets, perform a first sorting on data assets with the same position of the keyword in the first data asset set in the data assets based on the priority of a database to which each first data asset in the first data asset set belongs, to obtain a third data asset set with a third sorting; a third determination module for taking the third set of data assets as the first set of data assets; correspondingly, the first sorting module 640 is further configured to sort the data assets adjacent to each other and belonging to the same database in the third data asset set for a second time based on the value of each third data asset in the third data asset set, so as to obtain the second data asset set with the second sorting.
In some embodiments, the location of the keyword in the data asset comprises: name, content and annotation, the filter ordering module includes: a screening sub-module, configured to screen a target data asset set containing the keyword from the data asset set based on the position of the keyword in each data asset; and the sorting sub-module is used for sorting the data assets in the target data asset set based on the order of the priority of the position of the keyword in the data assets from high to low as name, content and annotation, so as to obtain a first data asset set with a first sorting and containing the keyword.
In some embodiments, the value of the first data asset is used to characterize the frequency of use of the data asset, the heat of click, and the dependence of other data assets on the data asset, and correspondingly, the apparatus further comprises: a first acquisition module for acquiring a frequency of use level, a query volume ranking, and a number of directly-later-stage dependent data assets for each first data asset in the first set of data assets; a fourth determining module, configured to determine a usage frequency score, a heat score, and a dependency score of the corresponding data asset, based on the usage frequency level, the query volume rank, and the number of directly-subsequent dependent data assets of each first data asset, respectively; and a fifth determining module for determining the value of the corresponding first data asset based on the usage frequency score, the heat score and the dependency score of each first data asset.
In some embodiments, the value of the first data asset is used to characterize topic relevance, and the apparatus further comprises: a second acquisition module, configured to acquire the topic relevance of each first data asset in the first data asset set; a sixth determining module, configured to determine a topic relevance score of the corresponding data asset based on the topic relevance of each first data asset; and a seventh determining module, configured to determine the value of the corresponding first data asset based on the topic relevance score of each first data asset.
In some embodiments, the apparatus further comprises: an eighth determining module, configured to determine, in response to a data recommendation request and based on a knowledge graph among an object group, objects, business metadata assets, and business data assets, a subset of similar objects in an object set whose similarity to the target object that issued the search request satisfies a preset condition, where the object set includes at least two objects in the object group; a third obtaining module, configured to obtain the data assets viewed by each similar object in the subset of similar objects and the data assets viewed by the target object; a ninth determining module, configured to determine, based on the data assets viewed by each similar object and the data assets viewed by the target object, the data assets that have not been viewed by the target object among the data assets viewed by each similar object; and a first recommending module, configured to recommend to the target object the data assets that have not been viewed by the target object among the data assets viewed by each similar object.
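The last two modules above amount to a set difference between what similar objects have viewed and what the target object has viewed. The sketch below assumes that view histories are available as Python sets keyed by object identifier; the function name `recommend_unviewed` and the deterministic ordering of the result are illustrative choices, not part of the application.

```python
def recommend_unviewed(target_viewed, similar_viewed):
    """Collect assets viewed by similar objects but not yet by the target object.

    `target_viewed` is a set of asset identifiers; `similar_viewed` maps each
    similar object to the set of assets it viewed (assumed data layout).
    """
    recommended = []
    already_covered = set(target_viewed)
    for obj in sorted(similar_viewed):  # sorted only to make the output deterministic
        for asset in sorted(similar_viewed[obj] - already_covered):
            recommended.append(asset)
            already_covered.add(asset)  # avoid recommending the same asset twice
    return recommended


suggestions = recommend_unviewed(
    {"t1", "t2"},
    {"u2": {"t2", "t3"}, "u3": {"t1", "t3", "t4"}},
)
```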
In some embodiments, the data asset viewed by the target object comprises a business metadata asset or a business data asset, the business metadata asset being associated with the business data asset, and the apparatus further comprises: a tenth determining module, configured to determine, after the data asset viewed by the target object is acquired, a first associated data asset based on the knowledge graph among the object group, the objects, the business metadata assets, and the business data assets, and on the data asset viewed by the target object, where the first associated data asset is associated with and different from the data asset viewed by the target object; and a second recommending module, configured to recommend the first associated data asset to the target object.
In some embodiments, the apparatus further comprises: an eleventh determining module, configured to determine a second associated data asset based on the knowledge graph among the object group, the objects, the business metadata assets, and the business data assets, and on the data assets that have not been viewed by the target object among the data assets viewed by each similar object, where the second associated data asset is associated with and different from those unviewed data assets; and a third recommending module, configured to recommend the second associated data asset to the target object.
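Both the first and the second associated data assets are found by following association edges of the knowledge graph from a set of starting assets. The sketch below models the knowledge graph as a plain adjacency dictionary, which is an assumption made for illustration; the node naming scheme (`meta:`, `data:`, `user:`) and the function name `associated_assets` are likewise hypothetical.

```python
def associated_assets(graph, asset_nodes, seed_assets):
    """Return assets linked to any seed asset, excluding the seeds themselves.

    `graph` maps a node to the set of its neighbours, `asset_nodes` is the set
    of nodes that represent business metadata assets or business data assets,
    and `seed_assets` are the starting assets (assumed data layout).
    """
    related = set()
    for asset in seed_assets:
        for neighbour in graph.get(asset, ()):
            if neighbour in asset_nodes:  # skip object and object-group nodes
                related.add(neighbour)
    return related - set(seed_assets)


graph = {
    "meta:orders": {"data:orders_2023", "user:alice"},
    "data:orders_2023": {"meta:orders"},
    "user:alice": {"meta:orders"},
}
asset_nodes = {"meta:orders", "data:orders_2023"}
first_associated = associated_assets(graph, asset_nodes, {"meta:orders"})
```

Passing the assets viewed by the target object as `seed_assets` corresponds to the first associated data asset; passing the unviewed assets of similar objects corresponds to the second.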
In some embodiments, the eighth determining module comprises: a first determining sub-module, configured to determine the object group in which the target object is located based on the knowledge graph among the object group, the objects, the business metadata assets, and the business data assets; a second determining sub-module, configured to determine the other objects in the object group except the target object; and a third determining sub-module, configured to determine the other objects as the subset of similar objects whose similarity to the target object satisfies the preset condition.
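Under this group-based variant, every other member of the target object's group is treated as a similar object. A minimal sketch follows, assuming that group membership has already been read out of the knowledge graph into a dictionary; the function name `same_group_peers` and the handling of an object that belongs to several groups (union of all peers) are assumptions.

```python
def same_group_peers(group_members, target):
    """Return the other objects that share at least one group with the target."""
    peers = set()
    for members in group_members.values():
        if target in members:
            peers |= set(members)
    peers.discard(target)  # the target object is not its own peer
    return sorted(peers)


groups = {"finance": {"alice", "bob", "carol"}, "ops": {"dave"}}
similar_to_alice = same_group_peers(groups, "alice")
```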
In some embodiments, the eighth determining module comprises: an acquisition sub-module, configured to acquire the data assets viewed by each object in the object set within a preset time period and a first number of those data assets; a fourth determining sub-module, configured to determine, for each other object in the object set except the target object, a second number of identical data assets viewed by both the target object and that other object within the preset time period; a fifth determining sub-module, configured to determine the maximum of a third number of data assets viewed by the target object and the first number of each of the other objects; and a sixth determining sub-module, configured to determine, for each other object in the object set except the target object and based on the second number and the maximum value, the subset of similar objects in the object set whose similarity to the target object satisfies the preset condition.
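The passage above states only that similarity is decided from the second number and the maximum value. One natural reading, assumed here and not confirmed by this section, is the ratio of the second number to the maximum, compared against a threshold. The function name `similar_subset` and the threshold of 0.5 are hypothetical.

```python
def similar_subset(views, target, threshold=0.5):
    """Select objects whose view overlap with the target is large enough.

    `views` maps each object to the set of assets it viewed within the preset
    time period. Similarity is taken as second_number / maximum, which is an
    assumed reading of the description.
    """
    target_assets = views[target]
    third_number = len(target_assets)  # assets viewed by the target object
    subset = []
    for obj, assets in views.items():
        if obj == target:
            continue
        first_number = len(assets)  # assets viewed by the other object
        second_number = len(target_assets & assets)  # assets viewed by both
        maximum = max(third_number, first_number)
        similarity = second_number / maximum if maximum else 0.0
        if similarity >= threshold:
            subset.append(obj)
    return subset


views = {
    "alice": {"a", "b", "c", "d"},
    "bob": {"a", "b", "c"},
    "carol": {"a", "x", "y", "z", "w"},
}
similar = similar_subset(views, "alice")
```

Dividing by the larger of the two view counts keeps the ratio in [0, 1] and prevents an object that has viewed very many assets from looking similar to everyone.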
The description of the apparatus embodiments above is similar to that of the method embodiments above, with advantageous effects similar to those of the method embodiments. In some embodiments, the functions or modules included in the apparatus provided by the embodiments of the present application may be used to perform the methods described in the method embodiments; for technical details not disclosed in the apparatus embodiments of the present application, please refer to the description of the method embodiments of the present application.
It should be noted that, in the embodiments of the present application, if the above searching method and recommendation method for data assets are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, an optical disk, or other various media capable of storing program code. Thus, embodiments of the present application are not limited to any specific hardware, software, or firmware, or to any combination of hardware, software, and firmware.
The embodiment of the application provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the program to realize part or all of the steps of the method.
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs some or all of the steps of the above-described method. The computer readable storage medium may be transitory or non-transitory.
Embodiments of the present application provide a computer program comprising computer readable code which, when run in a computer device, performs some or all of the steps for implementing the above method.
Embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program which, when read and executed by a computer, performs some or all of the steps of the above-described method. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In some embodiments, the computer program product is embodied as a computer storage medium, in other embodiments the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It should be noted here that: the above description of various embodiments is intended to emphasize the differences between the various embodiments, the same or similar features being referred to each other. The above description of apparatus, storage medium, computer program and computer program product embodiments is similar to that of method embodiments described above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus, storage medium, computer program and computer program product of the present application, please refer to the description of the method embodiments of the present application.
It should be noted that FIG. 7 is a schematic diagram of a hardware entity of a computer device in an embodiment of the present application. As shown in FIG. 7, the hardware entity of the computer device 700 includes: a processor 701, a communication interface 702, and a memory 703, wherein:
the processor 701 generally controls the overall operation of the computer device 700.
The memory 703 is configured to store instructions and applications executable by the processor 701, and may also cache data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 701 and the modules of the computer device 700; it may be implemented by a flash memory (FLASH) or a random access memory (Random Access Memory, RAM). Data may be transferred between the processor 701, the communication interface 702, and the memory 703 via the bus 704.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present application, the sequence numbers of the steps/processes described above do not imply an order of execution; the order of execution of the steps/processes should be determined by their functions and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present application. The foregoing embodiment numbers of the present application are for description only and do not represent the advantages or disadvantages of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps for implementing the above method embodiments may be completed by hardware under the direction of program instructions; the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes: various media capable of storing program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
Alternatively, the integrated units described above may be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the related art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The foregoing is merely an embodiment of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application are intended to be covered by the protection scope of the present application.
Claims (10)
1. A method of searching for data assets, comprising:
determining the position of a keyword carried in a search request in each data asset of a data asset set;
determining a priority of the location of the keyword in the data asset;
screening and sorting the data assets in the data asset set based on the position of the keyword in each data asset and the priority of the position of the keyword in the data asset, so as to obtain a first data asset set with a first sorting, wherein the first data asset set contains the keyword;
ranking the data assets with the same keyword position in the data assets in the first data asset set based on the value of each first data asset in the first data asset set to obtain a second data asset set with a second ranking, and outputting the second data asset set with the second ranking, wherein the value of the first data asset is used for representing at least one of the following: frequency of use of a data asset, click heat, dependency of other data assets on the data asset, topic relevance.
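Because the second ordering in claim 1 only reorders data assets that share the same keyword position, its combined effect with the first ordering can be illustrated as a sort on a composite key. The sketch below is one possible reading under stated assumptions: each screened asset is a dictionary with hypothetical fields `position` and `value`, and a higher value ranks earlier.

```python
# Priority of keyword positions from high to low (smaller rank sorts first).
POSITION_PRIORITY = {"name": 0, "content": 1, "annotation": 2}


def rank_hits(hits):
    """Order screened assets by position priority, then by value from high to low."""
    return sorted(hits, key=lambda h: (POSITION_PRIORITY[h["position"]], -h["value"]))


hits = [
    {"id": "a", "position": "content", "value": 9.0},
    {"id": "b", "position": "name", "value": 1.0},
    {"id": "c", "position": "name", "value": 4.0},
    {"id": "d", "position": "annotation", "value": 7.0},
]
second_set = rank_hits(hits)
```

An asset with a high value never overtakes an asset whose keyword match sits in a higher-priority position, which is the behaviour the claim describes.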
2. The method of claim 1, further comprising, prior to said ranking data assets of the first set of data assets that have the same location of keywords in the data assets based on the value of each first data asset in the first set of data assets:
based on the priority of the database to which each first data asset in the first data asset set belongs, sorting the data assets with the same keyword position in the data assets in the first data asset set for the first time to obtain a third data asset set with a third sorting;
taking the third set of data assets as the first set of data assets;
correspondingly, the sorting the data assets with the same positions of the keywords in the first data asset set in the data assets based on the value of each first data asset in the first data asset set, to obtain a second data asset set with a second sorting, including:
and based on the value of each third data asset in the third data asset set, sorting the data assets which are adjacent in the third data asset set and belong to the same database for the second time, and obtaining the second data asset set with the second sorting.
3. The method of claim 1 or 2, wherein the location of the keyword in the data asset comprises: a name, content, and annotation, the screening and sorting of data assets in the set of data assets based on the location of the keywords in each data asset and the priority of the locations of the keywords in the data assets, resulting in a first set of data assets having a first order including the keywords, comprising:
screening a target data asset set containing the keywords from the data asset sets based on the positions of the keywords in each data asset;
and sorting the data assets in the target data asset set based on the priority of the position of the keyword in the data assets from high to low as name, content and annotation, and obtaining a first data asset set with a first sort containing the keyword.
4. The method of claim 1 or 2, wherein the value of the first data asset is used to characterize the frequency of use of the data asset, the heat of click, and the dependence of other data assets on the data asset, and wherein the method further comprises:
acquiring a usage frequency level, a query volume ranking and the number of directly-subsequent-stage dependent data assets of each first data asset in the first data asset set;
determining a frequency of use score, a heat score and a dependency score of the corresponding data asset based on the frequency of use level, the query volume ranking and the number of directly-subsequent dependent data assets of each first data asset;
the value of the corresponding first data asset is determined based on the usage frequency score, the heat score, and the dependency score for each first data asset.
5. A method according to claim 3, wherein the value of the first data asset is used to characterize topic relevance, the method further comprising:
obtaining a topic relevance of each first data asset in the first data asset set;
determining a topic relevance score for each data asset based on the topic relevance of the corresponding data asset;
the value of the corresponding first data asset is determined based on the topic relevance score for each first data asset.
6. The method according to claim 1 or 2, further comprising:
determining a similar object subset with similarity meeting a preset condition with a target object sending out the search request in an object set based on a knowledge graph among an object group, objects, business metadata assets and business data assets in response to a data recommendation request, wherein the object set comprises at least two objects in the object group;
acquiring data assets viewed by each similar object in the similar object subset and data assets viewed by the target object;
determining data assets which are not viewed by the target object in the data assets viewed by each similar object based on the data assets viewed by each similar object and the data assets viewed by the target object;
and recommending the data assets which are not viewed by the target object in the data assets viewed by each similar object to the target object.
7. The method of claim 6, wherein the data asset viewed by the target object comprises a business metadata asset or a business data asset, the business metadata asset being associated with the business data asset, and correspondingly, after the obtaining of the data asset viewed by the target object, the method further comprises:
determining a first associated data asset based on the knowledge graph among the object group, the objects, the business metadata assets, and the business data assets, and on the data assets viewed by the target object, wherein the first associated data asset is associated with and different from the data assets viewed by the target object;
recommending the first associated data asset to the target object.
8. The method of claim 6, further comprising, after said determining of the data assets that are not viewed by the target object among the data assets viewed by each similar object:
determining a second associated data asset based on the knowledge graph among the object group, the objects, the business metadata assets, and the business data assets, and on the data assets not viewed by the target object among the data assets viewed by each similar object, wherein the second associated data asset is associated with and different from the data assets not viewed by the target object among the data assets viewed by each similar object;
recommending the second associated data asset to the target object.
9. The method of claim 6, wherein determining a subset of similar objects in the set of objects that satisfy a predetermined condition with respect to a similarity to a target object that issued the search request based on a knowledge graph between the group of objects, the object, the business metadata asset, and the business data asset, comprises:
determining the object group in which the target object is located based on the knowledge graph among the object group, the objects, the business metadata assets, and the business data assets;
determining other objects except the target object in the object group;
and determining the other objects as a similar object subset with the similarity with the target object meeting the preset condition.
10. The method of claim 6, wherein determining a subset of similar objects in the set of objects that satisfy a predetermined condition with respect to a similarity to a target object that issued the search request based on a knowledge graph between the group of objects, the object, the business metadata asset, and the business data asset, comprises:
acquiring the data assets viewed by each object in the object set within a preset time period, and a first number of those data assets;
determining, for each object in the set of objects other than the target object, a second number of identical data assets viewed by the target object and each of the other objects within a preset time period;
determining a maximum of a third number of data assets viewed by the target object and the first number of each of the other objects;
and for each object except the target object in the object set, determining a similar object subset with the similarity meeting a preset condition with the target object in the object set based on the second quantity and the maximum value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310140331.4A CN116186097A (en) | 2023-02-20 | 2023-02-20 | Method, device, equipment and storage medium for searching data asset |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310140331.4A CN116186097A (en) | 2023-02-20 | 2023-02-20 | Method, device, equipment and storage medium for searching data asset |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116186097A (en) | 2023-05-30 |
Family
ID=86437944
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310140331.4A Pending CN116186097A (en) | 2023-02-20 | 2023-02-20 | Method, device, equipment and storage medium for searching data asset |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116186097A (en) |
- 2023-02-20 CN CN202310140331.4A patent/CN116186097A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||