CN113901233B - Query data restoration method, system, computer equipment and storage medium - Google Patents

Query data restoration method, system, computer equipment and storage medium

Info

Publication number
CN113901233B
CN113901233B (application CN202111189624.9A)
Authority
CN
China
Prior art keywords
data
missing
content
query
attribute
Prior art date
Legal status
Active
Application number
CN202111189624.9A
Other languages
Chinese (zh)
Other versions
CN113901233A (en)
Inventor
沈玉军
李民权
徐小磊
刘建华
邢继风
Current Assignee
Zhilian Wangpin Information Technology Co ltd
Original Assignee
Zhilian Wangpin Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhilian Wangpin Information Technology Co ltd filed Critical Zhilian Wangpin Information Technology Co ltd
Priority to CN202111189624.9A priority Critical patent/CN113901233B/en
Publication of CN113901233A publication Critical patent/CN113901233A/en
Application granted granted Critical
Publication of CN113901233B publication Critical patent/CN113901233B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 16/367: Information retrieval; database structures therefor; unstructured textual data; creation of semantic tools, e.g. ontology or thesauri; ontology
    • G06F 16/335: Information retrieval; database structures therefor; unstructured textual data; querying; filtering based on additional data, e.g. user or group profiles
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention belongs to the technical field of data processing, and particularly relates to a query data restoration method, system, computer equipment and storage medium. The method comprises the following steps: acquiring the data content to be judged and judging whether the content is missing data; repairing the data content determined to be missing data to obtain repaired data; constructing knowledge graph data from the repaired data in real time and eliminating duplicate data stored in the database; and verifying online whether the repair result meets the service application standard, and using the verified repaired data to optimize the recall and matching degree of search queries. By judging, repairing, constructing and verifying missing main data, the invention forms a complete closed loop that repairs and completes data in real time, aligns the relevance of the user with the matched content, recalls the most relevant data, and fundamentally achieves a two-way improvement in both query recall and matching degree.

Description

Query data restoration method, system, computer equipment and storage medium
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a query data restoration method, a query data restoration system, computer equipment and a storage medium.
Background
With the rapid development of internet technology, talent recruitment has gradually moved online. In online recruitment, the user must be understood accurately by connecting to the database, building the user portrait, and automatically repairing missing data.
Take a recruitment database as an example. In the search/recommendation scenarios of an online recruitment service, missing data in resumes or positions often limits search/recommendation recall and matching degree and makes substantial improvement difficult. The root cause is that users habitually enter abbreviated content instead of the standard content. For example, when searching for positions at Zhilian, a user rarely types the full company name 'Beijing network recruitment consultation limited company'; the user typically enters 'Zhilian' or 'Zhilian recruitment'. Likewise, when filling in the education section of a resume, users usually enter abbreviated school names such as 'Bei Da' rather than the full name. Missing data is therefore common in scenarios such as resume, position and basic company-information filling, and it is precisely this input habit that poses a major challenge to search/recommendation recall and matching for online recruitment services.
To address this problem, a query data repair method, system, computer equipment and storage medium are needed that repair and complete data in real time, align the relevance of the user with the matched content, recall the most relevant data, and fundamentally achieve a two-way improvement in both query recall and matching degree.
Disclosure of Invention
In order to solve the prior-art problem that missing resume or position data limits search/recommendation recall and matching degree and makes substantial improvement difficult, the invention provides a query data repair method, system, computer equipment and storage medium.
The invention is realized by adopting the following technical scheme:
A query data repair method, comprising:
acquiring data content to be judged, and judging whether the content data is missing data or not;
Repairing the data content determined to be missing data to obtain repaired data;
constructing knowledge graph data in real time from the repaired data, and eliminating repeated data stored in a database;
and verifying whether the repair result of the repaired data reaches the service application standard on line, and optimizing recall and matching degree of the search query by using the verified repair data.
Optionally, the method for judging whether the content data is missing data includes:
reading data content input by a user side, wherein the data content comprises query data and resume data filled in a search box by the user side;
traversing the acquired data content based on the user behavior data and the domain knowledge graph data, and judging whether the data content is missing content or not;
judging whether the missing content is completed or not based on constructed domain knowledge graph data aiming at the missing content;
marking the data content that has not been completed with a missing mark to obtain the missing data.
Optionally, the method for repairing the data content determined to be missing data includes:
acquiring missing data to be repaired;
repairing the missing data in real time according to user behavior data and domain knowledge graph data;
The repair of the missing data is completed by searching the user behavior data related to the user behavior preference;
acquiring Internet open domain knowledge in real time by a crawler, acquiring tag data corresponding to the missing data, generating characteristic tag data after marking confirmation, establishing domain knowledge graph data of user input data and repaired data, and finishing the repair of the missing data.
Optionally, the method for constructing the knowledge-graph data from the repaired data in real time and eliminating the repeated data stored in the database comprises the following steps:
acquiring the repaired data, landing it as wide-table data, converting the wide-table data into triple data in a metadata table in the form of triple (SPO) data, and setting up the basic SPO layer of the hierarchical architecture for knowledge graph data storage;
generating a link according to the attribute of the triplet data in the metadata table, performing de-duplication normalization processing on entity data of a base layer for constructing the triplet, removing invalid data, and setting up a hierarchical entity data normalization layer for knowledge graph data storage;
and converting the triplet data of the entity data into wide-table data, realizing mapping of attribute names and data types of the triplet data to the wide-table data, and setting up a wide-table service application layer for storing the knowledge graph data.
Further, the metadata tables comprise a generated entity category table, an entity attribute table, a constructed automated warehousing task metadata table, a record traceability table and auxiliary tables. The entity attribute table constrains the attributes of entity data, where the attributes comprise basic attributes and relational attributes, and the table records the attribute name, the category to which the attribute belongs, and whether the attribute is multi-valued. The automated warehousing task metadata table describes which attributes of which entity are constructed automatically, and comprises the task number, attribute name, data source, field mapping, relational attribute constraints, and whether a reverse relation is constructed. The record traceability table records process information and detailed configuration information during data construction to facilitate data tracing, and comprises the trace id, entity type, construction time, type, data source and version number. The auxiliary tables include an attribute constraint table, a data source table, a custom wide-table conversion configuration table, and the like.
Optionally, the method for verifying whether the repair result of the repaired data reaches the service application standard on line includes: and pushing the repaired data to the line, and verifying whether the data repairing result reaches the service application standard or not through small-flow experimental analysis.
The invention also comprises a query data repairing system which adopts the query data repairing method to realize the repair of the missing data; the query data repair system comprises a data missing judging module, a data repair module, a data construction module and a data verification module.
The data missing judging module is used for acquiring data content to be judged, judging whether the content data is missing data or not and whether the missing data is completed or not according to user behavior data and domain knowledge graph data; the data restoration module is used for restoring the data content determined to be the missing data according to user behavior data and/or internet open domain knowledge acquired by a crawler in real time to obtain restored data; the data construction module is used for constructing knowledge graph data storage in real time in a progressive layering mode in a form of a triple structure of the repaired data, and automatically constructing a link according to the knowledge graph data so as to eliminate repeated data stored in a database; the data verification module is used for verifying whether the repair result of the repaired data reaches the service application standard on line, and the recall and the matching degree of the query are improved in a two-way mode by using the verified repair data.
The invention also includes a computer device comprising a memory storing a computer program and a processor that implements the steps of the query data repair method when executing the computer program.
The invention also includes a storage medium storing a computer program which, when executed by a processor, performs the steps of the query data repair method.
The technical scheme provided by the invention has the following beneficial effects:
according to the invention, the data deletion problem related to the search/recommendation scene in the online recruitment service is repaired and completed in real time, so that the correlation alignment of the user and the matched content is realized, the most relevant data is recalled, and the two-way improvement of the recall quantity and the matching degree of the query is fundamentally realized. Through judging, repairing, constructing and verifying the missing of the main data, a complete closed loop is formed, and when inquiring, all entity data of the entity data oil pipe can be checked, and the relationship network data of the entity data and the attribute and relationship of all entity data are explored.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
Fig. 1 is a flowchart of a query data repairing method in embodiment 1 of the present invention.
Fig. 2 is a schematic diagram of the query data restoration method of embodiment 1 before and after resume data restoration.
Fig. 3 is a flowchart of data missing judging in the query data repairing method of embodiment 1 of the present invention.
Fig. 4 is a flowchart of missing data repair in the query data repair method of embodiment 1 of the present invention.
Fig. 5 is a flowchart for constructing knowledge-graph data in the query data restoration method of embodiment 1 of the present invention.
Fig. 6 is a schematic diagram of three branches of a construction task in the query data repair method of embodiment 1 of the present invention.
Fig. 7 is a schematic diagram of an SPO-to-width table of triple structure data in the query data repair method of embodiment 1 of the present invention.
Fig. 8 is a system block diagram of a query data repair system according to embodiment 2 of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
A knowledge graph is essentially a semantic network: a graph-based data structure composed of nodes (points) and edges. In a knowledge graph, each node represents an "entity" that exists in the real world, and each edge is a "relationship" between entities. A knowledge graph is an efficient way of representing relationships; colloquially, it is a relationship network obtained by linking together all the different kinds of information, and it provides the ability to analyze problems from a "relational" perspective. Assume a knowledge graph is used to describe the fact that Zhang San is the father of Li Si. Here the entities are Zhang San and Li Si, and the relationship is "father". Of course, Zhang San and Li Si may have relationships with other people, which are not considered here. When a telephone number is added to the knowledge graph as a node, the number is also an entity, and a relationship called has_phone can be defined between a person and a telephone, i.e. a certain telephone number belongs to a certain person. A time can be added to the has_phone relationship as an attribute to indicate when the number was activated; such attributes can be added not only to relationships but also to entities themselves.
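The toy facts above map directly onto a graph of entities and attributed relations. Below is a minimal sketch of such a store in Python; the class, the relation names (father_of, has_phone) and the opened_at attribute are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    subject: str                                # head entity, e.g. "Zhang San"
    predicate: str                              # relation, e.g. "father_of" or "has_phone"
    obj: str                                    # tail entity, e.g. "Li Si" or a phone number
    attrs: dict = field(default_factory=dict)   # attributes carried by the relation itself

graph = [
    Edge("Zhang San", "father_of", "Li Si"),
    # the has_phone relation carries an attribute recording when the number was activated
    Edge("Zhang San", "has_phone", "138-0000-0000", {"opened_at": "2020-01-01"}),
]

# analyse from the "relational" perspective: everything connected to Zhang San
for e in graph:
    if e.subject == "Zhang San":
        print(e.predicate, e.obj, e.attrs)
```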
A resume knowledge graph is a knowledge graph constructed from resume-related information. It can be a complete framework for knowledge representation and reasoning, comprising knowledge graph entities, relationships, word forests (synonyms and hypernyms/hyponyms), vertical knowledge graphs (domain-specific graphs), knowledge maintenance modules, and machine-learning reasoning engines (hierarchical and sibling reasoning, inconsistency reasoning, knowledge-discovery reasoning, and ontology concept reasoning). On one hand, the reasoning mechanism of the knowledge graph assists recognition during resume parsing; on the other hand, during information evaluation it supports entity positioning, matching-degree recognition and the like, providing support for the final resume evaluation. In one embodiment, the resume knowledge graph may be generated from history resumes that have already been evaluated. Such resumes may include resumes of applicants who applied successfully as well as resumes of applicants who did not. An evaluated history resume may carry an overall score, or scores for one or more items of resume information. The resume knowledge graph contains at least the relevance information between the resume information of the history resumes and the requirements of the posts. Post requirements may be determined by the recruitment demand and the domain positioning, and may include, for example, skill requirements, degree requirements, years-of-experience requirements and industry-specific requirements. Resume information is the information recorded in a resume, including, for example, the personal description, the education experience and the work experience. The nodes in the resume knowledge graph, and the relationships among them, can be configured as required. For example, the nodes may include post nodes and resume nodes: a post node represents post requirements, and a resume node represents resume-related information. An edge between nodes indicates that the connected nodes are associated. The relevance information may be any measure used to evaluate relevance, such as a degree of relevance, a score or a matching degree. As an example, the attributes of the edge between a resume node and a post node may include the value of the resume node relative to the post node, embodied as a score or degree of association. In some examples, resume nodes themselves also carry value attributes; for instance, a node describing that a Nobel prize was obtained may carry a value attribute describing the value of that node. When determining the relevance between nodes, the relevance can therefore be derived from the value attribute of the connecting edge or from the value attribute of the node itself.
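To make the value-attribute idea concrete, here is a minimal sketch of a resume-post edge whose value attribute is a relevance score; the field names and the skill-coverage scoring rule are assumptions, not the evaluation model described above.

```python
# The edge between a resume node and a post node stores a value attribute used
# as relevance; field names and the scoring rule are assumed for illustration.
resume_node = {"id": "resume_001", "skills": {"java", "spark"}}
post_node = {"id": "post_123", "required_skills": {"java", "kafka", "spark"}}

def edge_value(resume: dict, post: dict) -> float:
    """Fraction of the post's required skills covered by the resume."""
    required = post["required_skills"]
    return round(len(required & resume["skills"]) / len(required), 2) if required else 0.0

edge = {
    "from": resume_node["id"],
    "to": post_node["id"],
    "value": edge_value(resume_node, post_node),   # 0.67 for this toy pair
}
print(edge)
```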
There are various ways to construct the resume knowledge graph, and the detailed description is omitted herein. The application aims to solve the problem of how to repair data in real time when query data are judged to be missing data and repair is needed.
When addressing the missing-data problems of search/recommendation scenarios in online recruitment services, the query data restoration method, system, computer equipment and storage medium of the present application judge, repair, construct and verify the main data of the recruitment service, forming a complete closed loop that repairs and completes data in real time. The relevance of the user is thereby aligned with the matched content, the most relevant data is recalled, and a two-way improvement in both query recall and matching degree is fundamentally achieved. Specific examples are described below.
Example 1
As shown in fig. 1, one embodiment of the present application provides a query data repairing method, which is used for performing deletion judgment and repairing on data content input by a user at a user terminal, and the method includes the following steps:
s1, acquiring data content to be judged, and judging whether the content data is missing data or not.
In this embodiment, missing data falls mainly into two types. The first is content that the user did fill in and that a person can recognize but a machine struggles to recognize. Referring to fig. 2, which shows resume data before and after repair: the school name in the original resume is filled in as 'Bei Da'; a person with background knowledge knows this refers to 'Peking University', but the machine cannot establish the relationship between 'Bei Da' and 'Peking University'. The second is knowledge the user did not fill in but that is implied by the filled-in content and can only be read out with background knowledge: staying with the 'Bei Da' example, a person knows the user graduated from a 985/211 school even though the user never said so, but the machine does not. Clearly, supplementing and repairing these two types of missing data is critical to the recall and matching degree of person-post matching.
To solve these two types of missing-data problems, and referring to fig. 3, the method for determining whether the content data is missing data includes:
s101, reading data content input by a user side, wherein the data content comprises query data and resume data filled in a search box by the user side.
S102, traversing the acquired data content based on the user behavior data and the domain knowledge graph data, and judging whether the data content is missing content or not.
S103, judging whether the missing content is completed or not based on the constructed domain knowledge graph data aiming at the missing content.
S104, marking the data content that has not been completed to obtain the missing data.
In this embodiment, whether the user-input content is missing data, and whether the data content has been completed, are computed in real time based on user behavior data and domain knowledge graph data. The specific behavior data include: the query content the user types in the search box, position clicks, position views and position delivery data; and the query content entered by HR staff responsible for recruitment, resume work experience, resume education experience, and the like. For example: if the user enters the query 'Zhilian' and then clicks, views and delivers to positions whose company is 'Beijing network recruitment consultation limited company', it can be judged from this preference data that the 'Zhilian' entered by the user is missing content. Whether the missing data has been completed can then be judged against the constructed domain knowledge graph data.
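A minimal sketch of this real-time judgment is given below; the hard-coded behavior log and completion map stand in for the user behavior data and the constructed domain knowledge graph, and all names are assumptions.

```python
# Judge whether user input is missing content (step S102) and whether it has
# already been completed (step S103). The dictionaries stand in for the user
# behavior data and the domain knowledge graph; all names are assumed.
behavior_log = {
    # query text -> companies of positions the user then clicked / viewed / delivered to
    "Zhilian": ["Beijing network recruitment consultation limited company"],
}
kg_completion = {
    # abbreviated content already completed in the domain knowledge graph
    "Bei Da": "Peking University",
}

def judge(content: str):
    """Return (is_missing, is_completed) for one piece of input content."""
    is_missing = content in behavior_log or content in kg_completion
    is_completed = content in kg_completion
    return is_missing, is_completed

for text in ("Zhilian", "Bei Da", "Peking University"):
    missing, completed = judge(text)
    # content that is missing and not yet completed gets the missing mark (step S104)
    print(text, {"missing": missing, "completed": completed, "mark": missing and not completed})
```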
S2, repairing the data content determined to be the missing data to obtain repaired data.
In this embodiment, referring to fig. 4, a method for repairing data content determined as missing data includes:
s201, acquiring missing data to be repaired;
s202, repairing the missing data in real time according to user behavior data and domain knowledge graph data;
s203, completing the repair of the missing data by searching the user behavior data related to the user behavior preference;
s204, acquiring Internet open domain knowledge in real time through a crawler, acquiring tag data corresponding to the missing data, generating characteristic tag data after marking confirmation, and establishing domain knowledge graph data of user input data and repaired data to finish the repair of the missing data.
In an embodiment, data content determined to be missing and in need of repair is repaired in real time. Data repair proceeds along two main paths. The first repairs based on user behavior data, which include: search queries, position clicks, position views, position delivery data, resume work experience, resume education experience, and the like. The repair of missing data is completed by retrieving and computing the correlation between behavior preferences. For example: a user who searches for 'Zhilian recruitment' goes on to view positions of 'Beijing network recruitment consultation limited company', so a relationship can be established between the 'Zhilian recruitment' and 'Beijing network recruitment consultation limited company' entities.
The second path uses a crawler to collect internet open-domain knowledge in real time, obtaining, for example, the relation data of 'Peking University' and its '985' and '211' tag data. After manual labeling and confirmation, feature-tag processing is performed to generate the tag data, and the relationship between the user-input data and the repaired data is established.
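The two repair paths can be sketched as follows; co-occurrence counting stands in for "retrieving and computing the correlation between behavior preferences", and the tag store stands in for the confirmed crawler output. Names and thresholds are assumptions.

```python
from collections import Counter

# path 1: user behavior data as (query, company of the position then viewed/delivered)
behavior_pairs = [
    ("Zhilian recruitment", "Beijing network recruitment consultation limited company"),
    ("Zhilian recruitment", "Beijing network recruitment consultation limited company"),
    ("Zhilian recruitment", "some other company"),
]

def repair_by_behavior(query: str, min_support: int = 2):
    """Pick the company most strongly correlated with the query in the behavior data."""
    counts = Counter(company for q, company in behavior_pairs if q == query)
    if not counts:
        return None
    company, support = counts.most_common(1)[0]
    return company if support >= min_support else None

# path 2: crawled open-domain knowledge, confirmed by manual labeling, kept as tag data
confirmed_tags = {"Peking University": ["985", "211"]}

def repair_by_tags(entity: str):
    return confirmed_tags.get(entity, [])

print(repair_by_behavior("Zhilian recruitment"))  # -> the canonical company entity
print(repair_by_tags("Peking University"))        # -> ['985', '211']
```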
And S3, constructing knowledge graph data from the repaired data in real time, and eliminating repeated data stored in a database.
In this embodiment, the repaired data is constructed and warehoused in real time by the data construction module. To improve the repair and service capability for missing data, the invention warehouses the repaired data in the form of a triple (SPO) structure. This structure preserves the rich semantics among the data, is friendlier to graph databases, is more flexible for data repair and processing, and better fits multi-source, heterogeneous missing-data repair and service application scenarios. To guarantee warehousing quality and the consistency of the data called by upper-layer services, the invention manages the warehousing process of attributes uniformly through metadata.
The adopted scheme comprises: 1) metadata configuration; 2) a knowledge graph metadata management platform; 3) progressive, layered construction of the knowledge graph data store; 4) an automated construction link for knowledge graph data; and 5) normalization and deduplication.
Referring to fig. 5, the method for constructing knowledge-graph data from the repaired data in real time includes:
S301, acquiring the repaired data, landing it as wide-table data, converting the wide-table data into triple data in a metadata table in the form of triple (SPO) data, and setting up the basic SPO layer of the hierarchical architecture for knowledge graph data storage;
s302, generating a link according to the attribute of the triplet data in the metadata table, performing de-duplication normalization processing on entity data of a base layer for constructing the triplet, clearing invalid data, and setting up a hierarchical entity data normalization layer for knowledge graph data storage;
s303, converting the triplet data of the entity data into wide-table data, realizing mapping of attribute names and data types of the triplet data to the wide-table data, and setting up a wide-table service application layer for storing the knowledge graph data.
In this embodiment, the metadata table includes a generated entity category table, an entity attribute table, a constructed metadata table for an automated database-entering task, a record tracing table, and an auxiliary table. The entity class table comprises an entity class number, a class name, a level and a parent class number, the entity attribute table is used for constraining the attribute of the entity data, the attribute comprises the basic attribute and the relation attribute of the entity data, and the entity attribute table comprises the attribute name, the class to which the attribute belongs and whether the attribute belongs to multiple values or not; the automatic warehousing task metadata table is used for describing the attribute corresponding to the entity data and carrying out automatic construction, and comprises a task number, an attribute name, a data source, field mapping, relation attribute constraint and whether a reverse relation is constructed or not; the record traceability table is used for recording process information and detailed configuration information in the data construction process, so that the traceability of the data is facilitated, and comprises traceability id, entity type, construction time, type, data source and version number; the auxiliary tables include an attribute constraint table, a data source table, a custom broad table conversion configuration table, and the like.
The invention builds a unified knowledge graph metadata management platform in which a data modeler presets all entity and attribute metadata in advance. The configurable content includes: attribute name, Chinese meaning, attribute description, edge type, single/multi-value, owning category, data type, source identification, rule constraints, etc. Before data is warehoused, the metadata must first be configured uniformly in the metadata management platform; during warehousing, the program reads the metadata for validation to ensure that the stored data conforms to the data standard. For example: the '985' and '211' tag fields of 'Peking University' are multi-valued; once configured as such, the system can recognize them correctly and automatically build the multi-value array format, so that they can be used in recall and matching.
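A minimal sketch of this metadata-driven validation at warehousing time follows; the metadata fields mirror the configurable content listed above, while the concrete records are assumptions.

```python
# The warehousing program reads the attribute metadata and validates an incoming
# value against it before storage. Field names and sample records are assumed.
attribute_metadata = {
    "tags": {"data_type": str, "multi_value": True},                # e.g. the 985/211 tag field
    "registered_address": {"data_type": str, "multi_value": False},
}

def validate(attribute: str, value):
    """Normalize the value to the configured shape, or raise if it violates the metadata."""
    meta = attribute_metadata[attribute]
    values = value if isinstance(value, list) else [value]
    if not meta["multi_value"] and len(values) > 1:
        raise ValueError(f"{attribute} is single-valued but got {len(values)} values")
    for v in values:
        if not isinstance(v, meta["data_type"]):
            raise TypeError(f"{attribute}: expected {meta['data_type'].__name__}")
    # multi-valued attributes are automatically built into the array format
    return values if meta["multi_value"] else values[0]

print(validate("tags", "985"))                    # -> ['985']
print(validate("tags", ["985", "211"]))           # -> ['985', '211']
print(validate("registered_address", "Beijing"))  # -> 'Beijing'
```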
In knowledge graph data construction, the entities, relationships and attributes come from multiple sources, so duplicate warehousing cannot be avoided; for example, the 'Peking University' attribute of an entity may be written several times. Such duplicate data not only wastes storage space but also degrades the practical effect of the business application. To avoid this problem while guaranteeing warehousing efficiency and high data availability, the invention makes full use of the layered nature of the warehousing link and performs reliability judgment and deduplication of duplicate data at the upper layer. Duplicate data is removed using the duplicate-value deduplication capability provided by the database itself.
In one embodiment of the invention, entity data is abstracted into metadata tables and the data is managed under a unified specification and constraints; the tables are as follows (a minimal sketch of them, under assumed field names, is given after the list):
1.1) An entity category table is generated, mainly comprising the entity category number, category name, level and parent category number.
1.2) An entity attribute table is generated, constraining which attributes (basic attributes and relational attributes) an entity has, mainly comprising the attribute name, the category to which the attribute belongs, and whether the attribute is multi-valued.
1.3) An automated warehousing task metadata table is constructed to describe which attributes of which entity are built automatically, mainly comprising the task number, attribute name, data source, field mapping, relational attribute constraints, and whether a reverse relation is constructed.
1.4) A data construction record traceability table records process information and detailed configuration information during data construction to facilitate data tracing, mainly comprising the trace id, entity type, construction time, type, data source, version number, and the like.
1.5) Other auxiliary tables improve data quality, for example: an attribute constraint table, the kg_source data source table, a custom wide-table conversion configuration table, and the like.
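The sketch below renders tables 1.1) to 1.4) as simple record types; the field names are assumptions derived from the descriptions above, not the actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EntityCategory:            # 1.1) entity category table
    category_no: str
    category_name: str
    level: int
    parent_category_no: Optional[str] = None

@dataclass
class EntityAttribute:           # 1.2) entity attribute table
    attribute_name: str
    owning_category: str
    multi_value: bool

@dataclass
class WarehousingTask:           # 1.3) automated warehousing task metadata table
    task_no: str
    attribute_name: str
    data_source: str
    field_mapping: dict
    relation_constraint: Optional[str]
    build_reverse_relation: bool

@dataclass
class TraceRecord:               # 1.4) data construction record traceability table
    trace_id: str
    entity_type: str
    build_time: str
    build_type: str
    data_source: str
    version: str
```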
In this embodiment, generating the link according to the metadata attributes specifically includes: entity category management, entity attribute management, and data source management. In constructing knowledge graph data from the repaired data in real time, as shown in fig. 6, the whole construction task is divided into three branches: triple (SPO) structure storage, data normalization and deduplication, and SPO-to-wide-table conversion. Wherein:
(1) The purpose of the triple structure store is to support dynamic, diverse changes of attribute types and to support graph-computation queries, including the dynamic addition of arbitrary attributes. For example: adding the alias attribute 'Zhilian Recruitment' or the former-name attribute 'Beijing Zhilian Sanke Talent Service Co., Ltd.' to the 'Beijing network recruitment consultation limited company' entity requires no change to the table structure; the attributes can be added dynamically.
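A minimal sketch of why an SPO store needs no schema change when a new attribute type appears; the in-memory list is an assumed stand-in for the real graph store.

```python
# An SPO (subject, predicate, object) store accepts new attribute types without
# any table-structure change; the in-memory list is an assumed stand-in.
spo_store = []

def add_triple(subject: str, predicate: str, obj: str) -> None:
    spo_store.append((subject, predicate, obj))

company = "Beijing network recruitment consultation limited company"
add_triple(company, "alias", "Zhilian Recruitment")
# a brand-new attribute type ("former_name") is just another predicate value
add_triple(company, "former_name", "Beijing Zhilian Sanke Talent Service Co., Ltd.")

# read every attribute of the entity without knowing the schema in advance
print([(p, o) for s, p, o in spo_store if s == company])
```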
(2) The purpose of the data normalization and deduplication task is to ensure consistency and reliability of the data and to avoid ambiguity when providing data services to the front-end business. For example: 'Beijing network recruitment consultation limited company' and 'Beijing Zhilian Sanke Talent Service Co., Ltd.' are in fact different names of the same company; external services must be provided on the basis of the unified, normalized data, which uniformly adopts 'Beijing network recruitment consultation limited company'. Data normalization and deduplication are realized by judging the similarity of two entities with a data normalization model; if the similarity exceeds a threshold, proving that the records refer to a single entity, the entities are normalized. In the company entity similarity model, the selected features include: company name, company registration address, company legal representative, company equity relationships, and the like.
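A minimal sketch of the similarity-then-normalize step using the listed features; the weights, threshold and exact-match rules are assumptions, not the patent's trained normalization model.

```python
# Score two company records on the selected features and merge them when the
# score passes a threshold; weights, threshold and exact-match rules are assumed.
FEATURE_WEIGHTS = {"name": 0.4, "registered_address": 0.2,
                   "legal_representative": 0.2, "equity_relationship": 0.2}

def company_similarity(a: dict, b: dict) -> float:
    return round(sum(w for f, w in FEATURE_WEIGHTS.items()
                     if a.get(f) and a.get(f) == b.get(f)), 2)

def normalize(canonical: dict, candidate: dict, threshold: float = 0.5):
    """Return the unified entity if the two records are judged to be the same company."""
    if company_similarity(canonical, candidate) < threshold:
        return None
    merged = dict(candidate)
    merged.update({k: v for k, v in canonical.items() if v})  # the canonical name wins
    return merged

canonical = {"name": "Beijing network recruitment consultation limited company",
             "registered_address": "Beijing", "legal_representative": "X",
             "equity_relationship": "group-A"}
candidate = {"name": "Beijing Zhilian Sanke Talent Service Co., Ltd.",
             "registered_address": "Beijing", "legal_representative": "X",
             "equity_relationship": "group-A"}
print(company_similarity(canonical, candidate))  # 0.6: address, legal rep and equity match
print(normalize(canonical, candidate)["name"])   # unified under the canonical name
```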
(3) The purpose of SPO-to-wide-table conversion is to support analysis and scenario mining of triple-structured data by providing automatic data-format conversion. Referring to fig. 7, the triple structure is convenient and flexible to construct, but it is poorly suited to data analysis and mining in non-graph stores such as hive and mysql because of the large number of join operations involved: development cost is high and execution efficiency is low. The invention therefore implements an SPO-to-wide-table service that, driven by the metadata, automatically converts a table in SPO structure into a wide-table structure and serves data analysis and mining.
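A minimal sketch of the metadata-driven SPO-to-wide-table conversion; the rows and the attribute metadata are assumptions, and multi-valued attributes become array columns as described above.

```python
# Convert SPO rows into one wide-table row per entity, driven by attribute
# metadata; multi-valued attributes become array columns. All names are assumed.
spo_rows = [
    ("Peking University", "tags", "985"),
    ("Peking University", "tags", "211"),
    ("Peking University", "city", "Beijing"),
]
attribute_metadata = {"tags": {"multi_value": True}, "city": {"multi_value": False}}

def spo_to_wide(rows):
    wide = {}
    for subject, predicate, obj in rows:
        row = wide.setdefault(subject, {"entity": subject})
        if attribute_metadata[predicate]["multi_value"]:
            row.setdefault(predicate, []).append(obj)   # array column
        else:
            row[predicate] = obj                        # scalar column
    return list(wide.values())

print(spo_to_wide(spo_rows))
# [{'entity': 'Peking University', 'tags': ['985', '211'], 'city': 'Beijing'}]
```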
In one embodiment of the present invention, the hierarchical architecture of the knowledge graph data store comprises three layers: the first is the basic SPO layer, the second is the entity data normalization layer, and the third is the wide-table service application layer.
The basic SPO layer holds the base KG data, including data whose sources have not yet been normalized to entities. It mainly realizes the conversion of wide-table data into triple data, the configuration of entity attribute relationships, the generation of the traceable trace id, the automatic construction of reverse relationships and the control of reliable data-source attributes, and stores the data in the dm_graph layer.
The entity data normalization layer deduplicates and normalizes the entity data of the basic layer. It mainly realizes data normalization and data ordering, cleans invalid data according to single-value/multi-value deduplication, controls the data source, and stores the data in the dmr_graph layer.
The wide-table service application layer provides a one-stop service for data consumers, so that users unfamiliar with the SPO structure can use the data easily. It mainly realizes the mapping of attribute names and data types into the wide table, is constructed by configuration, and stores the data in the dma_graph layer.
And S4, verifying whether the repair result of the repaired data meets the service application standard on line, and optimizing recall and matching degree of the search query by using the verified repair data.
In this embodiment, whether the repair result of the repaired data meets the service application standard is verified online as follows: the repaired data is pushed online, and small-traffic experiment analysis verifies whether the data repair result meets the service application standard.
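One way to run this small-traffic verification is a two-proportion significance test on a recall/click metric between the control bucket and the bucket served with repaired data; the z-test below and the sample counts are assumptions, not the patent's experiment design.

```python
# Compare a success metric (e.g. click-through on recalled results) between the
# control bucket and the small-traffic bucket served with repaired data, using a
# two-proportion z-test. Counts, metric and significance level are assumptions.
import math

def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Return the two-sided p-value for H0: the two buckets have equal rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value from the normal tail

# control bucket vs. small-traffic bucket with repaired data
p_value = two_proportion_z_test(success_a=480, n_a=10_000, success_b=560, n_b=10_000)
print(p_value, "repair meets the standard" if p_value < 0.05 else "not significant")
```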
Finally, searches performed on the platform built by this method are optimized, and both query recall and matching degree are improved. On the platform, a user can click to view a company's detail information; click to view the entities related to the company, enter an entity's detail page, and continue to view all entities related to that entity, thereby exploring the relationship network of any entity; and click into an entity's detail page to see the entity's attributes and relationships.
Example 2
As shown in fig. 8, in one embodiment of the present invention, a query data repair system includes a data missing judging module 11, a data repair module 12, a data constructing module 13, and a data verifying module 14.
The data missing judging module 11 is configured to obtain the data content to be judged and to judge, according to the user behavior data and the domain knowledge graph data, whether the content is missing data and whether the missing data has been completed. In this embodiment, the data missing judging module 11 handles the two types of missing data: content that the user did fill in and that a person can recognize but a machine struggles to recognize; and knowledge that the user did not fill in but that is implied by the filled-in content and requires background knowledge to read out.
The data missing judging module 11 may read the data content including the two types of missing data input by the user terminal, traverse the obtained data content based on the user behavior data and the domain knowledge graph data, and judge whether the data content is missing content. And judging whether the missing content is completed or not according to the constructed domain knowledge graph data of the missing content, and marking the missing and uncompleted data content to obtain missing data. The data missing judging module 11 can judge whether the missing data is completed or not based on the constructed domain knowledge graph data.
The data restoration module 12 is configured to restore the data content determined to be missing data according to user behavior data and/or internet open domain knowledge acquired by a crawler in real time, so as to obtain restored data. In this embodiment, when the data repairing module 12 repairs the missing data, the missing data is repaired in real time according to the user behavior data and the domain knowledge graph data by acquiring the missing data to be repaired. The repairing method is divided into two types, wherein one type is based on user behavior data, and the repairing of the missing data is completed by searching the user behavior data related to user behavior preference. And the other is based on domain knowledge graph data, internet open domain knowledge is acquired in real time through a crawler, tag data corresponding to the missing data are acquired, after the tag is confirmed, feature tag data are generated, domain knowledge graph data of user input data and repaired data are established, and the repair of the missing data is completed.
For example, the behavior data include: search queries, position clicks, position views, position delivery data, resume work experience, resume education experience, and the like. The repair of missing data is completed by retrieving and computing the correlation between behavior preferences. For example: a user who searches for 'Zhilian recruitment' goes on to view positions of 'Beijing network recruitment consultation limited company', so a relationship is established between the 'Zhilian recruitment' and 'Beijing network recruitment consultation limited company' entities.
When repairing with the domain knowledge graph data, for example, the relation data of 'Peking University' and its '985' and '211' tag data are acquired; after manual labeling and confirmation, feature-tag processing generates the tag data, and the relationship between the user-input data and the repaired data is established.
The data construction module 13 is configured to construct the knowledge graph data store in real time, in a progressive, layered manner, from the repaired data in the form of a triple structure, and to automatically construct a link from the knowledge graph data so as to reject duplicate data stored in the database. Because most of the entities, relationships and attributes of the knowledge graph come from multiple source channels and are therefore warehoused repeatedly, before rejecting duplicate data the data construction module 13 uses the layered nature of the warehousing link to perform reliability judgment and deduplication of duplicate data at the upper layer. Duplicate data is removed using the duplicate-value deduplication capability provided by the database itself.
The data construction module 13 is further configured to abstract entity data into metadata tables and to manage the data under a unified specification and constraints. The metadata tables comprise a generated entity category table, an entity attribute table, a constructed automated warehousing task metadata table, a record traceability table and auxiliary tables. A link is generated according to the metadata attributes to realize entity category management, entity attribute management and data source management. The whole construction task is carried out through triple (SPO) structure storage, data normalization and deduplication, and SPO-to-wide-table conversion.
The data verification module 14 is configured to verify online whether the repair result of the repaired data meets the service application standard, and to use the verified repaired data to improve both query recall and matching degree. In this embodiment, the repaired data content is pushed online by the data verification module 14, and a small-traffic A/B experiment significance analysis is run online to verify the data repair result. Finally, searches performed on the platform built by this system are optimized, and both recall and matching degree are improved.
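The four modules can be wired into the judge-repair-construct-verify closed loop as follows; the sketch shows only the composition, and every class and method name is assumed.

```python
# Compose the four modules of embodiment 2 into the judge -> repair -> construct
# -> verify closed loop. Every class, method and return value here is assumed.
class MissingJudgeModule:                     # data missing judging module 11
    def judge(self, content: str) -> dict:
        return {"content": content, "missing": True, "completed": False}

class RepairModule:                           # data repair module 12
    def repair(self, judged: dict) -> dict:
        judged["repaired"] = f"<canonical form of {judged['content']}>"
        return judged

class ConstructModule:                        # data construction module 13
    def __init__(self):
        self.store = []
    def build(self, repaired: dict) -> tuple:
        triple = (repaired["content"], "normalized_to", repaired["repaired"])
        if triple not in self.store:          # reject duplicate warehousing
            self.store.append(triple)
        return triple

class VerifyModule:                           # data verification module 14
    def verify(self, triple: tuple) -> bool:
        return True                           # stands in for the online small-traffic check

def repair_pipeline(content: str):
    judged = MissingJudgeModule().judge(content)
    repaired = RepairModule().repair(judged)
    triple = ConstructModule().build(repaired)
    return triple if VerifyModule().verify(triple) else None

print(repair_pipeline("Zhilian"))
```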
Example 3
In one embodiment of the present invention, a computer device is provided, where the computer device may be used to implement the query data repairing method provided in the foregoing embodiment, and the computer device may be a smart phone, a computer, a tablet computer, or other devices.
The computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to realize the steps in the embodiment of the method:
acquiring data content to be judged, and judging whether the content data is missing data or not;
repairing the data content determined to be missing data to obtain repaired data;
Constructing knowledge graph data in real time from the repaired data, and eliminating repeated data stored in a database;
and verifying whether the repair result of the repaired data reaches the service application standard on line, and optimizing recall and matching degree of the search query by using the verified repair data.
Example 4
In one embodiment of the present invention, there is also provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method embodiments described above:
acquiring data content to be judged, and judging whether the content data is missing data or not;
repairing the data content determined to be missing data to obtain repaired data;
constructing knowledge graph data in real time from the repaired data, and eliminating repeated data stored in a database;
and verifying whether the repair result of the repaired data reaches the service application standard on line, and optimizing recall and matching degree of the search query by using the verified repair data.
The query data restoration method based on the knowledge graph provided by the embodiment can be implemented by software, or can be implemented by combining software and hardware or by hardware, and the related hardware can be composed of two or more physical entities or one physical entity. The method of the embodiment can be applied to electronic equipment with processing capability. The electronic device may be a PC, a tablet computer, a notebook computer, a desktop computer, or the like.
It should be noted that, for the query data repairing method according to the present application, it will be understood by those skilled in the art that all or part of the flow of implementing the query data repairing method according to the embodiments of the present application may be implemented by controlling related hardware by a computer program, where the computer program may be stored in a computer readable storage medium, such as a memory of a computer device, and executed by at least one processor in the computer device, and the execution may include the flow of implementing the query data repairing method according to the embodiments of the present application.
Correspondingly, the embodiment of the specification also provides a computer storage medium, wherein the storage medium stores program instructions, and the program instructions realize the query data restoration method based on the knowledge graph when being executed by a processor.
Embodiments of the present description may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having program code embodied therein. Computer-usable storage media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to: phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, may be used to store information that may be accessed by the computing device.
For the query data repairing device of the embodiment of the application, each functional module can be integrated in one processing chip, each module can exist alone physically, and two or more modules can be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated module, if implemented as a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium such as read-only memory, magnetic or optical disk, etc.
In summary, the application solves the missing-data problems of search/recommendation scenarios in online recruitment services by repairing and completing data in real time, aligning the relevance of the user with the matched content, recalling the most relevant data, and fundamentally achieving a two-way improvement in both query recall and matching degree. Judging, repairing, constructing and verifying missing main data forms a complete closed loop; at query time, all entity data related to a given entity can be viewed, and the entity's relationship network as well as the attributes and relationships of all entities can be explored.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (9)

1. A query data remediation method, characterized in that the query data remediation method comprises:
s1, acquiring data content to be judged, and judging whether the content data is missing data or not;
s2, repairing the data content of the data determined to be missing to obtain repaired data;
s3, constructing knowledge graph data in real time from the repaired data, and eliminating repeated data stored in a database;
s4, verifying whether the repair result of the repaired data meets the service application standard on line, and optimizing recall and matching degree of the search query by using the verified repair data;
in step S1, determining whether the content data is missing data includes:
s101: reading data content input by a user side, wherein the data content comprises query data and resume data filled in a search box by the user side;
s102: traversing the acquired data content based on the user behavior data and the domain knowledge graph data, and judging whether the data content is missing content or not;
S103: judging whether the missing content is completed or not based on constructed domain knowledge graph data aiming at the missing content;
s104: marking the data content that has not been completed to obtain the missing data;
the behavior data include: the query content input by the user in the search box, position clicks, position views and position delivery data, and the query content input by the HR responsible for recruitment, resume work experience and resume education experience.
2. The query data restoration method as set forth in claim 1, wherein in step S2, the restoration of the data content determined to be missing data includes:
s201: acquiring missing data to be repaired;
s202: repairing the missing data in real time according to user behavior data and domain knowledge graph data;
s203: the repair of the missing data is completed by searching the user behavior data related to the user behavior preference;
s204: acquiring Internet open domain knowledge in real time by a crawler, acquiring tag data corresponding to the missing data, generating characteristic tag data after marking confirmation, establishing domain knowledge graph data of user input data and repaired data, and finishing the repair of the missing data.
3. The query data restoration method as set forth in claim 2, wherein the method of constructing knowledge-graph data from the restored data in real time and rejecting the repeated data stored in the database comprises:
s301: acquiring the repaired data, landing it as wide-table data, converting the wide-table data into triple data in a metadata table in the form of triple (SPO) data, and setting up the basic SPO layer of the hierarchical architecture for knowledge graph data storage;
s302: generating a link according to the attribute of the triplet data in the metadata table, performing de-duplication normalization processing on entity data of a base layer for constructing the triplet, removing invalid data, and setting up a hierarchical entity data normalization layer for knowledge graph data storage;
s303: and converting the triplet data of the entity data into wide-table data, realizing mapping of attribute names and data types of the triplet data to the wide-table data, and setting up a wide-table service application layer for storing the knowledge graph data.
4. The query data restoration method as recited in claim 3, wherein the metadata table includes a generated entity category table, an entity attribute table, a constructed automated database-entering task metadata table, a record traceability table and an auxiliary table.
5. The query data remediation method of claim 4 wherein the entity class table includes entity class numbers, class names, levels, parent class numbers;
the entity attribute table is used for constraining the attribute of the entity data, wherein the attribute comprises the basic attribute and the relation attribute of the entity data, and the entity attribute table comprises an attribute name, a category to which the attribute belongs and whether the attribute belongs to multiple values or not;
the automatic warehousing task metadata table is used for describing the attribute corresponding to the entity data and carrying out automatic construction, and comprises a task number, an attribute name, a data source, field mapping, relation attribute constraint and whether a reverse relation is constructed or not;
the record traceability table is used for recording process information and detailed configuration information in the data construction process, so that the traceability of the data is facilitated, and comprises traceability id, entity type, construction time, type, data source and version number;
the auxiliary table comprises an attribute constraint table, a data source table and a customized wide table conversion configuration table.
6. The query data restoration method as set forth in claim 1, wherein the method for verifying on line whether the restoration result of the restored data meets the service application standard comprises: and pushing the repaired data to the line, and verifying whether the data repairing result reaches the service application standard or not through small-flow experimental analysis.
7. A query data repair system, characterized in that the query data repair system implements missing data repair by using the query data repair method of any one of claims 1 to 6; the query data repair system includes:
a data missing judging module, configured to acquire data content to be judged, and to judge, according to the user behavior data and the domain knowledge graph data, whether the data content is missing data and whether the missing data has been completed;
a data repair module, configured to repair the data content determined to be missing data according to the user behavior data and/or the Internet open-domain knowledge acquired in real time by the crawler, so as to obtain the repaired data;
a data construction module, configured to construct knowledge graph data storage in real time in a progressive, layered manner from the repaired data in the form of a triple structure, and to automatically construct links according to the knowledge graph data so as to eliminate duplicate data stored in the database; and
a data verification module, configured to verify online whether the repair result of the repaired data meets the service application standard, and to use the verified repaired data to improve both the recall quantity and the matching degree of the query;
wherein the data missing judging module reads the data content input by a user terminal, traverses the acquired data content based on the user behavior data and the domain knowledge graph data, and judges whether the data content is missing content; for the missing content, the data missing judging module judges, based on the constructed domain knowledge graph data, whether the missing content has been completed, and marks the data content that is missing and not completed, thereby obtaining the missing data.
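The skeleton below sketches how the four modules of claim 7 might be wired together in Python; the class names, data shapes and the simplified logic inside each method are assumptions for illustration, not the claimed system.

    # Illustrative module skeleton for claim 7; assumed names and simplified logic.
    class DataMissingJudgingModule:
        def __init__(self, behavior_data, domain_graph):
            self.behavior_data = behavior_data
            self.domain_graph = domain_graph        # known fields/entities of the domain knowledge graph
        def find_missing(self, content):
            """Traverse the input content and return fields judged missing and not yet completed."""
            return [f for f, v in content.items() if not v and f in self.domain_graph]

    class DataRepairModule:
        def repair(self, content, missing_fields, open_domain_knowledge):
            """Fill missing fields from user behavior data and/or crawled open-domain knowledge."""
            repaired = dict(content)
            for f in missing_fields:
                repaired[f] = open_domain_knowledge.get(f, repaired[f])
            return repaired

    class DataConstructionModule:
        def build(self, repaired, subject_key="id"):
            """Store the repaired record as triples keyed by subject so duplicate links can be rejected."""
            subject = repaired.get(subject_key, "unknown")
            return {(subject, k, v) for k, v in repaired.items() if k != subject_key and v}

    class DataVerificationModule:
        def verify(self, before_metrics, after_metrics):
            """Check online that recall volume and matching degree do not regress after the repair."""
            return (after_metrics["recall"] >= before_metrics["recall"]
                    and after_metrics["match_degree"] >= before_metrics["match_degree"])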
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 6.
9. A storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 6.
CN202111189624.9A 2021-10-13 2021-10-13 Query data restoration method, system, computer equipment and storage medium Active CN113901233B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111189624.9A CN113901233B (en) 2021-10-13 2021-10-13 Query data restoration method, system, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113901233A CN113901233A (en) 2022-01-07
CN113901233B true CN113901233B (en) 2023-11-17

Family

ID=79191884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111189624.9A Active CN113901233B (en) 2021-10-13 2021-10-13 Query data restoration method, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113901233B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115278774A (en) * 2022-07-20 2022-11-01 云南电网有限责任公司电力科学研究院 Beidou short message missing data additional recording method and system
CN117290561B (en) * 2023-11-27 2024-03-29 北京衡石科技有限公司 Service state information feedback method, device, equipment and computer readable medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109271530A (en) * 2018-10-17 2019-01-25 长沙瀚云信息科技有限公司 A kind of disease knowledge map construction method and plateform system, equipment, storage medium
CN109657238A (en) * 2018-12-10 2019-04-19 宁波深擎信息科技有限公司 Context identification complementing method, system, terminal and the medium of knowledge based map
CN109766445A (en) * 2018-12-13 2019-05-17 平安科技(深圳)有限公司 A kind of knowledge mapping construction method and data processing equipment
CN110019150A (en) * 2019-04-11 2019-07-16 软通动力信息技术有限公司 A kind of data administering method, system and electronic equipment
CN111723215A (en) * 2020-06-19 2020-09-29 国家计算机网络与信息安全管理中心 Device and method for establishing biotechnological information knowledge graph based on text mining

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11036768B2 (en) * 2018-06-21 2021-06-15 LeapAnalysis Inc. Scalable capturing, modeling and reasoning over complex types of data for high level analysis applications


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 214000 room 706, 7 / F, building 8 (Wuxi talent financial port), east of Hongxing Duhui, economic development zone, Wuxi City, Jiangsu Province

Applicant after: Zhilian Wangpin Information Technology Co.,Ltd.

Address before: 214000 room 706, 7 / F, building 8 (Wuxi talent financial port), east of Hongxing Duhui, Wuxi Economic Development Zone, Wuxi City, Jiangsu Province

Applicant before: Zhilian (Wuxi) Information Technology Co.,Ltd.

GR01 Patent grant