CN117807252B - Knowledge graph-based data processing method, device and system and storage medium - Google Patents

Knowledge graph-based data processing method, device and system and storage medium

Publication number
CN117807252B
Authority
CN
China
Prior art keywords
sentence
content
vector
pointing vector
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410226896.9A
Other languages
Chinese (zh)
Other versions
CN117807252A (en)
Inventor
周正斌
花福军
覃进千
王敏
罗强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Creative Information Technology Co ltd
Original Assignee
Creative Information Technology Co ltd
Application filed by Creative Information Technology Co ltd
Priority to CN202410226896.9A
Publication of CN117807252A
Application granted
Publication of CN117807252B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a knowledge graph-based data processing method, device, system and storage medium. The method comprises: in response to acquired sentence data, parsing the sentence data to obtain a plurality of phrases; assigning a direction vector to each phrase; connecting the obtained direction vectors end to end in sequence to obtain a pointing vector; and searching in a knowledge graph by using the pointing vector to obtain a search result, wherein the areas surrounding the pointing vector are all included in a first search range, and the search result is obtained from the content in the first search range. The disclosed method, device, system and storage medium perform retrieval through operations in three-dimensional space, achieve a high degree of matching between the query and the retrieved content, and can obtain more accurate content.

Description

Knowledge graph-based data processing method, device and system and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, device, system and storage medium based on a knowledge graph.
Background
A knowledge graph is a structured semantic knowledge base that describes concepts in the physical world and their interrelationships in symbolic form. Its basic constituent units are entity-relation-entity triples and attribute-value pairs of entities; entities are connected to one another through relations to form a net-like knowledge structure.
Knowledge graphs have broad application prospects in the search field. For example, in the current field of artificial intelligence, a trained model can give accurate answers to the questions a questioner poses, and this process can rely on a knowledge graph, because obtaining an accurate answer requires an accurate search of existing content. Before such accurate searching, the commonly used method was association-based search, which ranks the retrieved content by degree of association and leaves the questioner to screen the content further.
With the widespread use of artificial intelligence, the accuracy and energy consumption of the retrieval process are attracting attention, since higher accuracy reduces the cost of using artificial intelligence. A knowledge graph can break down data islands and link content into a network, and can therefore provide rich content in vertical fields, but how to locate content quickly still requires further research.
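As a rough illustration of the two basic units named above, the following sketch stores entity-relation-entity triples and attribute-value pairs in plain Python structures and indexes the triples so that connected entities can be followed. The entity names, relation names and the `build_adjacency` helper are invented for illustration and do not come from the application.

```python
from collections import defaultdict

# Hypothetical example data; none of these names come from the application.
triples = [
    ("Chengdu", "located_in", "Sichuan"),
    ("Sichuan", "part_of", "China"),
]
attributes = {
    "Chengdu": {"type": "city"},  # entity -> attribute-value pairs
}


def build_adjacency(triples):
    """Index triples by head entity so relations can be followed like a net."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    return dict(adjacency)


graph = build_adjacency(triples)
```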
Disclosure of Invention
In order to improve how well results match questions during retrieval, the application provides a knowledge graph-based data processing method, device, system and storage medium, which use operations in three-dimensional space to achieve a high degree of matching between the retrieval and the content and can obtain more accurate content.
The above object of the present application is achieved by the following technical solutions:
In a first aspect, the present application provides a data processing method based on a knowledge graph, including:
in response to acquired sentence data, parsing the sentence data to obtain a plurality of phrases;
assigning a direction vector to each phrase;
connecting the obtained direction vectors end to end in sequence to obtain a pointing vector; and
searching in the knowledge graph by using the pointing vector to obtain a search result, wherein the areas surrounding the pointing vector are all included in a first search range, and the search result is obtained from the content in the first search range.
In a possible implementation manner of the first aspect, assigning a direction vector to each phrase includes:
dividing the sentence data multiple times to obtain a plurality of sentence units, wherein the sentence units obtained by each division have the same length;
parsing the sentence units obtained by each division to obtain the meanings of the sentence units, wherein each sentence unit corresponds to at least one meaning;
obtaining semantic groups of the sentence data according to the divisions and the meaning corresponding to each sentence unit;
screening out, from the semantic groups, the semantic groups that occur most often and express the same or similar meanings, and recording them as meaning semantic groups; and
assigning a direction vector to each phrase according to the meaning semantic groups.
In a possible implementation manner of the first aspect, when one sentence unit corresponds to a plurality of meanings, a plurality of direction vectors are assigned to the sentence unit, and each direction vector corresponds to one meaning of the sentence unit.
In a possible implementation manner of the first aspect, the plurality of direction vectors assigned to the sentence unit are synthesized and combined into one direction vector.
In a possible implementation manner of the first aspect, retrieving in the knowledge-graph using the pointing vector includes:
taking the connection point of two sequentially connected direction vectors as a node, and including the content at the node or within the range of the node in a first search range;
merging the obtained first search ranges to obtain a second search range, wherein at least one content item in each first search range has a direct connection relationship with content in another first search range;
obtaining a sub-knowledge graph by using the content in the second search range; and
searching in the sub-knowledge graph by using the pointing vector to obtain a search result.
In a possible implementation manner of the first aspect, retrieving in the sub-knowledge-graph using the pointing vector includes:
calculating the association degree between each content item in the sub-knowledge graph and the pointing vector, and grouping the content in the sub-knowledge graph according to the association degree to obtain a plurality of content groups;
sorting the content groups according to the association degree, wherein a content group with a higher association degree occupies an earlier position in the sequence; and
obtaining the search result from at least the first content group in the sequence.
In a possible implementation manner of the first aspect, obtaining the search result from at least the first content group in the sequence includes:
obtaining a plurality of content items in the first content group;
sorting the content items according to their relations and constructing a contrast pointing vector from the sorted content;
calculating the similarity between the pointing vector and the contrast pointing vector;
when the similarity between the pointing vector and the contrast pointing vector is greater than or equal to a set threshold, taking the sorted content as the search result;
when the similarity between the pointing vector and the contrast pointing vector is less than the set threshold, splitting and reconstructing the contrast pointing vector, wherein the similarity between the reconstructed contrast pointing vector and the pointing vector is greater than or equal to the set threshold; and
supplementing the search result with the newly added part of the reconstructed contrast pointing vector.
In a second aspect, the present application provides a knowledge-graph-based data processing apparatus, including:
a first parsing unit, configured to parse acquired sentence data, in response to the sentence data, to obtain a plurality of phrases;
a first processing unit, configured to assign a direction vector to each phrase;
a second processing unit, configured to connect the obtained direction vectors end to end in sequence to obtain a pointing vector; and
a first retrieval unit, configured to search in the knowledge graph by using the pointing vector to obtain a search result, wherein the areas surrounding the pointing vector are all included in a first search range, and the search result is obtained from the content in the first search range.
In a third aspect, the present application provides a knowledge-graph-based data processing system, the system comprising:
one or more memories for storing instructions; and
one or more processors configured to invoke and execute the instructions from the memories, to perform the method as described in the first aspect and any possible implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium comprising:
a program which, when executed by a processor, performs the method as described in the first aspect and any possible implementation of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising program instructions which, when executed by a computing device, perform a method as described in the first aspect and any possible implementation of the first aspect.
In a sixth aspect, the present application provides a chip system comprising a processor for implementing the functions involved in the above aspects, e.g. generating, receiving, transmitting, or processing data and/or information involved in the above methods.
The chip system can be composed of chips, and can also comprise chips and other discrete devices.
In one possible design, the system on a chip also includes memory to hold the necessary program instructions and data. The processor and the memory may be decoupled, provided on different devices, respectively, connected by wire or wirelessly, or the processor and the memory may be coupled on the same device.
The beneficial effects of the application are as follows:
According to the knowledge graph-based data processing method, device, system and storage medium disclosed by the application, sentence data is split accurately and matched accurately against content within a spatial range, so that questions are matched accurately to answer content. This provides the questioner with more accurate answers, so that the required answer can be obtained with fewer rounds of questioning.
Drawings
Fig. 1 is a schematic block diagram of the steps of the knowledge graph-based data processing method provided by the present application.
Fig. 2 is a schematic diagram of the process of obtaining a pointing vector provided by the present application.
Fig. 3 is a schematic diagram, provided by the present application, of the case where the similarity between the pointing vector and the contrast pointing vector is greater than or equal to the set threshold.
Fig. 4 is a schematic diagram, provided by the present application, of the case where the similarity between the pointing vector and the contrast pointing vector is greater than or equal to the set threshold after the contrast pointing vector has been adjusted.
Detailed Description
The technical scheme in the application is further described in detail below with reference to the accompanying drawings.
Referring to Fig. 1, in some examples, the knowledge graph-based data processing method disclosed by the application comprises the following steps:
S1, in response to acquired sentence data, parsing the sentence data to obtain a plurality of phrases;
S2, assigning a direction vector to each phrase;
S3, connecting the obtained direction vectors end to end in sequence to obtain a pointing vector; and
S4, searching in the knowledge graph by using the pointing vector to obtain a search result, wherein the areas surrounding the pointing vector are all included in a first search range, and the search result is obtained from the content in the first search range.
The knowledge graph-based data processing method disclosed by the application is applied to a server, which is deployed locally or in the cloud. A questioner sends a question to the server, for example through a dialog box, and the server interprets the question after receiving it and gives an answer.
In step S1, the server receives sentence data, that is, the question mentioned above. After receiving the question, the server first parses the sentence data to obtain a plurality of phrases. The phrases are the key parts of the sentence data, and the sentence data also carries the association relations between the phrases, for example inclusion relations and subordination relations.
In step S2, referring to Fig. 2, a direction vector is assigned to each phrase according to the association relations. This is explained here with the help of the knowledge graph: a knowledge graph reflects the relations between entities, where the entities can be regarded as individual islands and the relations as bridges connecting them, so that islands and bridges together form a network. Here the knowledge graph is mapped into a three-dimensional space, and each relation is represented by a vector that has a length and a direction, with different relations distinguished by direction.
In step S3, the obtained direction vectors are connected end to end in sequence to obtain a pointing vector. The pointing vector can be understood as a polyline in three-dimensional space in which every segment has a direction.
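Steps S2 and S3 can be pictured with a minimal sketch, assuming each relation type is mapped to a fixed unit direction in three-dimensional space and the pointing vector is stored as the list of polyline vertices. The `RELATION_DIRECTIONS` table, the relation names and the `build_pointing_vector` function are hypothetical; the application does not specify how directions are chosen.

```python
# Hypothetical mapping from relation type to a direction in 3D space.
# The application only says that different relations differ in direction.
RELATION_DIRECTIONS = {
    "includes": (1.0, 0.0, 0.0),
    "belongs_to": (0.0, 1.0, 0.0),
    "causes": (0.0, 0.0, 1.0),
}


def build_pointing_vector(relations, origin=(0.0, 0.0, 0.0)):
    """Chain direction vectors head to tail and return the polyline vertices."""
    vertices = [origin]
    for relation in relations:
        dx, dy, dz = RELATION_DIRECTIONS[relation]
        x, y, z = vertices[-1]
        # The next segment starts where the previous one ended.
        vertices.append((x + dx, y + dy, z + dz))
    return vertices


polyline = build_pointing_vector(["includes", "belongs_to", "includes"])
```

Each interior vertex of `polyline` is a connection point between two consecutive direction vectors.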
In step S4, the knowledge graph is searched by using the pointing vector to obtain a search result, wherein the areas surrounding the pointing vector are all included in a first search range, and the search result is obtained from the content in the first search range.
This yields an accurate first search range in the knowledge graph in a single pass, and the search result is then obtained from the content in that range. The advantage of this processing is that all phrases in the received sentence data are searched at the same time, and the pointing vector delimits an accurate search range during the search. In the field of artificial intelligence, the answer to a questioner's question can thus be retrieved accurately, because the association relations between phrases are taken into account during the search.
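One way to read "the areas surrounding the pointing vector" is as all content lying within some distance of the polyline. The sketch below makes that assumption explicit: content items are given hypothetical 3D positions, and a `radius` parameter (not named in the application) bounds the first search range.

```python
import math


def _distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in range(3)))


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b in 3D."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    ab_len_sq = sum(c * c for c in ab)
    if ab_len_sq == 0.0:
        return _distance(p, a)  # degenerate segment
    # Project p onto the line through a and b, then clamp to the segment.
    t = sum(ap[i] * ab[i] for i in range(3)) / ab_len_sq
    t = max(0.0, min(1.0, t))
    closest = tuple(a[i] + t * ab[i] for i in range(3))
    return _distance(p, closest)


def first_search_range(polyline, content_positions, radius):
    """Names of content items within `radius` of any polyline segment."""
    selected = set()
    for name, position in content_positions.items():
        for a, b in zip(polyline, polyline[1:]):
            if point_segment_distance(position, a, b) <= radius:
                selected.add(name)
                break
    return selected


# Hypothetical positions of three content items.
polyline = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
contents = {"A": (0.5, 0.1, 0.0), "B": (1.0, 0.5, 0.2), "C": (3.0, 3.0, 3.0)}
in_range = first_search_range(polyline, contents, radius=0.3)
```

Here `A` and `B` lie close to a segment of the polyline and are selected, while `C` is far from every segment and is left out.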
In some examples, assigning a direction vector to each phrase includes the steps of:
S21, dividing the sentence data multiple times to obtain a plurality of sentence units, wherein the sentence units obtained by each division have the same length;
S22, parsing the sentence units obtained by each division to obtain the meanings of the sentence units, wherein each sentence unit corresponds to at least one meaning;
S23, obtaining semantic groups of the sentence data according to the divisions and the meaning corresponding to each sentence unit;
S24, screening out, from the semantic groups, the semantic groups that occur most often and express the same or similar meanings, and recording them as meaning semantic groups; and
S25, assigning a direction vector to each phrase according to the meaning semantic groups.
Specifically, in step S21, the sentence data is first divided multiple times to obtain a plurality of sentence units, and the sentence units obtained by each division have the same length.
It should be understood that sentence data is composed of constituents such as subjects, predicates, objects, modifiers, complements and head words, which are difficult for a machine to distinguish directly. The sentence data can be understood only after it has been divided, so whether the division is appropriate directly determines the degree of understanding.
The degree of understanding has two reference indicators, the exact match rate and the execution accuracy. The multi-division approach addresses the exact match rate: dividing multiple times yields sentence units of different lengths, and each sentence unit obtained either can or cannot be understood. The units that cannot be understood are discarded.
The units that can be understood are counted and classified. That is, in step S22, the sentence units obtained by each division are parsed to obtain the meanings of the sentence units, and each sentence unit corresponds to at least one meaning.
Next, in step S23, semantic groups of the sentence data are obtained according to the divisions and the meaning corresponding to each sentence unit. Then, in step S24, the semantic groups that occur most often and express the same or similar meanings are screened out and recorded as meaning semantic groups. Finally, in step S25, a direction vector is assigned to each phrase according to the meaning semantic groups.
In this way, the meaning of each sentence unit can be obtained accurately.
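The multi-division idea in steps S21 to S24 can be sketched as follows. This is a simplified reading, not the applicant's implementation: it assumes the sentence is already tokenized into words, uses a hypothetical `LEXICON` to decide whether a unit can be understood, and counts individual meanings instead of building full semantic groups.

```python
from collections import Counter

# Hypothetical lexicon: sentence unit -> meanings it can express.
LEXICON = {
    "knowledge graph": ["KG"],
    "graph": ["KG", "chart"],
    "search": ["retrieve"],
    "search result": ["retrieve"],
}


def divide(words, unit_length):
    """One division: consecutive units that all have the same length.

    A trailing remainder shorter than unit_length is dropped.
    """
    return [
        " ".join(words[i:i + unit_length])
        for i in range(0, len(words) - unit_length + 1, unit_length)
    ]


def most_frequent_meanings(words, unit_lengths):
    """Divide several times and keep the meanings that occur most often."""
    counts = Counter()
    for n in unit_lengths:
        for unit in divide(words, n):
            # Units missing from the lexicon cannot be understood: discarded.
            for meaning in LEXICON.get(unit, []):
                counts[meaning] += 1
    if not counts:
        return []
    top = max(counts.values())
    return sorted(m for m, c in counts.items() if c == top)


words = ["knowledge", "graph", "search", "result"]
meanings = most_frequent_meanings(words, unit_lengths=[1, 2])
```

With these inputs the meaning `chart` is proposed only once (by the single word "graph"), whereas `KG` and `retrieve` are each supported by two divisions and are therefore the ones kept.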
When one sentence unit corresponds to a plurality of meanings, a plurality of direction vectors are assigned to the sentence unit, each direction vector corresponding to one meaning of the sentence unit.
In some possible implementations, the plurality of direction vectors assigned to the sentence unit are synthesized and combined into one direction vector, in a manner similar to fusion. It should be noted that when a plurality of direction vectors are assigned to a sentence unit, the direction vectors should point in one direction or toward one area. For example, one constraint rule is that the included angle between any two direction vectors must lie within a required range, and direction vectors that exceed the required range are discarded.
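The angle constraint and the combination step could look like the sketch below. The selection rule is an assumption: vectors are visited in order and a vector is kept only if its angle to every vector kept so far is within `max_angle`; the kept vectors are then summed and normalized. The application does not say which vector to discard when two disagree, nor how the combination is computed. Input vectors are assumed to be non-zero and `max_angle` is assumed to be below 90 degrees.

```python
import math


def angle_between(u, v):
    """Included angle in radians between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def combine_direction_vectors(vectors, max_angle):
    """Discard vectors outside the angle range, then fuse the rest into one."""
    kept = []
    for v in vectors:
        if all(angle_between(v, k) <= max_angle for k in kept):
            kept.append(v)
    total = [sum(component) for component in zip(*kept)]
    length = math.sqrt(sum(c * c for c in total))
    combined = tuple(c / length for c in total)
    return combined, kept


vectors = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
combined, kept = combine_direction_vectors(vectors, max_angle=math.radians(60))
```

In this example the third vector points opposite to the first and is discarded; the remaining two are fused into a single unit vector.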
In some examples, retrieving in the knowledge-graph using the pointing vector includes the steps of:
S41, taking the connection point of two sequentially connected direction vectors as a node, and including the content at the node or within the range of the node in a first search range;
S42, merging the obtained first search ranges to obtain a second search range, wherein at least one content item in each first search range has a direct connection relationship with content in another first search range;
S43, obtaining a sub-knowledge graph by using the content in the second search range; and
S44, searching in the sub-knowledge graph by using the pointing vector to obtain a search result.
In steps S41 to S44, the connection point of two sequentially connected direction vectors is taken as a node, and the content at the node or within the range of the node is included in a first search range. The obtained first search ranges are then merged to obtain a second search range, wherein at least one content item in each first search range has a direct connection relationship with content in another first search range.
In this way a smaller search range, namely the second search range, is obtained. At the same time, the merging of the first search ranges is constrained: at least one content item in a first search range must have a direct connection relationship with content in another first search range, which ensures that the first search ranges are connected to one another rather than being independent data islands.
A sub-knowledge graph, consisting of the content in the second search range, is then obtained, and the sub-knowledge graph is searched by using the pointing vector to obtain a search result.
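A minimal sketch of steps S42 and S43, assuming the knowledge graph is available as a set of undirected edges between content names and each first search range is a set of names. A first search range is merged only if it has a direct edge to some other first search range; isolated ranges are left out. The function names and the example data are hypothetical.

```python
def merge_search_ranges(first_ranges, edges):
    """Union of the first search ranges that link directly to another range."""

    def linked(range_a, range_b):
        return any(
            (a, b) in edges or (b, a) in edges
            for a in range_a
            for b in range_b
        )

    second_range = set()
    for i, current in enumerate(first_ranges):
        others = [r for j, r in enumerate(first_ranges) if j != i]
        if any(linked(current, other) for other in others):
            second_range |= current
    return second_range


def sub_knowledge_graph(second_range, edges):
    """Edges whose two endpoints both lie in the second search range."""
    return {(a, b) for a, b in edges if a in second_range and b in second_range}


# Hypothetical graph and first search ranges.
edges = {("A", "B"), ("B", "C"), ("D", "E")}
first_ranges = [{"A"}, {"B", "C"}, {"D"}]
second_range = merge_search_ranges(first_ranges, edges)
sub_graph = sub_knowledge_graph(second_range, edges)
```

The range containing only `D` has no edge into any other range, so it behaves like a data island and is excluded from the second search range.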
The steps of searching in the sub-knowledge graph by using the pointing vector are as follows:
S441, calculating the association degree between each content item in the sub-knowledge graph and the pointing vector, and grouping the content in the sub-knowledge graph according to the association degree to obtain a plurality of content groups;
S442, sorting the content groups according to the association degree, wherein a content group with a higher association degree occupies an earlier position in the sequence; and
S443, obtaining the search result from at least the first content group in the sequence.
Specifically, the association degree between each content item in the sub-knowledge graph and the pointing vector is calculated first. That is, for each content item in the sub-knowledge graph, the number of phrases in the sentence data with which it has a relation is counted, and the content in the sub-knowledge graph is then grouped according to that number.
At this point, content that has no relation with any phrase in the sentence data can be eliminated, and the content that is closely connected with the phrases in the sentence data is obtained. The search result is then obtained from at least the first content group in the sequence.
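The grouping described above can be sketched by counting, for each content item, how many phrases of the sentence data it is related to. The `content_relations` mapping (content name to the phrases it relates to) and the example names are hypothetical inputs chosen for illustration.

```python
from collections import defaultdict


def group_by_association(content_relations, phrases):
    """Group content by the number of related phrases, highest count first."""
    phrase_set = set(phrases)
    groups = defaultdict(list)
    for content, related in content_relations.items():
        degree = len(phrase_set & set(related))
        if degree > 0:  # content related to no phrase is eliminated
            groups[degree].append(content)
    return [(degree, sorted(groups[degree])) for degree in sorted(groups, reverse=True)]


phrases = ["engine", "fuel", "noise"]
content_relations = {
    "doc1": ["engine", "fuel"],
    "doc2": ["engine"],
    "doc3": ["paint"],
    "doc4": ["fuel", "noise"],
}
ranked_groups = group_by_association(content_relations, phrases)
first_group = ranked_groups[0][1]
```

`first_group` holds the content with the highest association degree, which is where the search result is looked for first.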
The steps of obtaining the search result from at least the first content group in the sequence are as follows:
S4431, obtaining a plurality of content items in the first content group;
S4432, sorting the content items according to their relations and constructing a contrast pointing vector from the sorted content;
S4433, calculating the similarity between the pointing vector and the contrast pointing vector;
S4434, when the similarity between the pointing vector and the contrast pointing vector is greater than or equal to a set threshold, taking the sorted content as the search result;
S4435, when the similarity between the pointing vector and the contrast pointing vector is less than the set threshold, splitting and reconstructing the contrast pointing vector, wherein the similarity between the reconstructed contrast pointing vector and the pointing vector is greater than or equal to the set threshold; and
S4436, supplementing the search result with the newly added part of the reconstructed contrast pointing vector.
In steps S4431 to S4436, a contrast pointing vector is first constructed from the content obtained in the first content group, and the similarity between the pointing vector and the contrast pointing vector is then calculated. There are two processing modes at this point:
First, when the similarity between the pointing vector and the contrast pointing vector is greater than or equal to the set threshold, the sorted content is taken as the search result;
Second, when the similarity between the pointing vector and the contrast pointing vector is less than the set threshold, the contrast pointing vector is split and reconstructed, and the search result is supplemented with the newly added part of the reconstructed contrast pointing vector, wherein the similarity between the reconstructed contrast pointing vector and the pointing vector is greater than or equal to the set threshold.
Referring to Fig. 3 and Fig. 4, splitting and reconstructing divides the contrast pointing vector into multiple segments (similar regions and dissimilar regions). Each retained segment is similar to a part of the pointing vector, and the gap between two adjacent retained segments is then filled with a new vector. The similarity comparison places no restriction on length.
When a new vector is used as a supplement, a content item is selected from the first content group and placed in the dissimilar region, a vector is then assigned to it, and the contrast pointing vector is reconstructed.
A new vector means that the relation has been changed, and supplementing with a new vector means that a new relation has been added. Of the two processing modes, the second is not used when at least one contrast pointing vector can be obtained with the first; when no contrast pointing vector can be obtained with the first mode, the contrast pointing vector is obtained with the second mode.
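The two processing modes can be sketched under several stated assumptions, none of which are fixed by the application: both vectors are lists with the same number of segment direction vectors, similarity is the mean cosine similarity of segments compared position by position, a segment counts as dissimilar when its own cosine similarity is below the threshold, and `candidates` is a hypothetical mapping from content in the first content group to the direction vector it would contribute.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def similarity(pointing, contrast):
    """Assumed measure: mean cosine similarity over aligned segments."""
    return sum(cosine(p, c) for p, c in zip(pointing, contrast)) / len(pointing)


def match_or_reconstruct(pointing, contrast, candidates, threshold):
    """Return (contrast pointing vector, names of newly added content)."""
    if similarity(pointing, contrast) >= threshold:
        return contrast, []  # first mode: accept the contrast vector as is
    rebuilt, added = [], []
    for p, c in zip(pointing, contrast):
        if cosine(p, c) >= threshold:
            rebuilt.append(c)  # similar region is kept
        else:
            # Dissimilar region: fill it with the best-fitting candidate.
            name = max(candidates, key=lambda n: cosine(p, candidates[n]))
            rebuilt.append(candidates[name])
            added.append(name)
    return rebuilt, added


pointing = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
contrast = [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
candidates = {"X": (0.0, 1.0, 0.0), "Y": (0.0, 0.0, 1.0)}
rebuilt, added = match_or_reconstruct(pointing, contrast, candidates, threshold=0.8)
```

The sketch does not by itself guarantee that the rebuilt vector reaches the threshold, so a caller would still need to check `similarity(pointing, rebuilt)` before supplementing the search result with the content named in `added`.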
The application also provides a knowledge graph-based data processing device, which comprises:
a first parsing unit, configured to parse acquired sentence data, in response to the sentence data, to obtain a plurality of phrases;
a first processing unit, configured to assign a direction vector to each phrase;
a second processing unit, configured to connect the obtained direction vectors end to end in sequence to obtain a pointing vector; and
a first retrieval unit, configured to search in the knowledge graph by using the pointing vector to obtain a search result, wherein the areas surrounding the pointing vector are all included in a first search range, and the search result is obtained from the content in the first search range.
Further, the device also comprises:
a dividing unit, configured to divide the sentence data multiple times to obtain a plurality of sentence units, wherein the sentence units obtained by each division have the same length;
a second parsing unit, configured to parse the sentence units obtained by each division to obtain the meanings of the sentence units, wherein each sentence unit corresponds to at least one meaning;
a third processing unit, configured to obtain semantic groups of the sentence data according to the divisions and the meaning corresponding to each sentence unit;
a screening unit, configured to screen out, from the semantic groups, the semantic groups that occur most often and express the same or similar meanings, and to record them as meaning semantic groups; and
an assigning unit, configured to assign a direction vector to each phrase according to the meaning semantic groups.
Further, when one sentence unit corresponds to a plurality of meanings, a plurality of direction vectors are assigned to the sentence unit, and each direction vector corresponds to one meaning of the sentence unit.
Further, the plurality of direction vectors assigned to the sentence unit are synthesized and combined into one direction vector.
Further, the apparatus further comprises:
a first retrieval range construction unit, configured to take the connection point of two direction vectors that are connected in the sequence as a node, and to include the content at the node or within the range of the node in the first retrieval range;
a second retrieval range construction unit, configured to combine the obtained first retrieval ranges to obtain a second retrieval range, wherein at least one content of a first retrieval range has a direct connection relationship with content in another first retrieval range;
a second retrieval unit, configured to obtain a sub-knowledge graph using the content in the second retrieval range; and
a third retrieval unit, configured to search the sub-knowledge graph using the pointing vector to obtain a retrieval result.
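The construction of the two retrieval ranges and the sub-knowledge graph can be sketched as below. The sketch assumes that contents have 2-D positions, that "within the node range" means within a fixed radius of the node, and that the knowledge graph is given as a set of undirected edges; all function and variable names are hypothetical.

```python
from math import dist

def first_ranges(joints, positions, radius):
    """One first retrieval range per joint: contents within `radius` of it."""
    return [{name for name, pos in positions.items() if dist(pos, joint) <= radius}
            for joint in joints]

def directly_connected(range_a, range_b, edges):
    """True if some content in range_a shares an edge with one in range_b."""
    return any((a, b) in edges or (b, a) in edges
               for a in range_a for b in range_b)

def second_range(ranges, edges):
    """Union of the first ranges that are directly linked to another range."""
    merged = set()
    for i, current in enumerate(ranges):
        if any(i != j and directly_connected(current, other, edges)
               for j, other in enumerate(ranges)):
            merged |= current
    return merged

def sub_graph(contents, edges):
    """Edges of the knowledge graph whose endpoints are both retained."""
    return {(a, b) for a, b in edges if a in contents and b in contents}

# Hypothetical data: two joints of a pointing vector and four contents.
joints = [(1.0, 0.0), (1.0, 2.0)]
positions = {"A": (1.0, 0.2), "B": (1.2, 2.0), "C": (5.0, 5.0), "D": (0.9, -0.1)}
edges = {("A", "B"), ("B", "C"), ("A", "D")}

ranges = first_ranges(joints, positions, radius=0.5)
second = second_range(ranges, edges)
sub = sub_graph(second, edges)
```

Restricting the later search to the sub-graph keeps content such as "C", which is far from every joint, out of consideration even though it is connected to retained content.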
Further, the apparatus further comprises:
a first calculation unit, configured to calculate the association degree between each content in the sub-knowledge graph and the direction vector, and to group the contents in the sub-knowledge graph according to the association degree to obtain a plurality of content groups;
a sorting unit, configured to sort the content groups according to the association degree, wherein the association degree of a content group is inversely related to its position in the sequence; and
a fourth retrieval unit, configured to obtain a retrieval result from at least the leading content group of the sequence.
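The grouping and ordering can be sketched as follows, assuming that an association degree between 0 and 1 has already been computed for each content and that contents are grouped by fixed-width bins. The scores, the bin width, and the name group_by_association are assumptions made for this illustration.

```python
def group_by_association(scores, bin_width=0.25):
    """Bucket contents by association degree and rank the buckets."""
    groups = {}
    for name, score in scores.items():
        groups.setdefault(int(score // bin_width), []).append(name)
    # Higher bins come first, so association degree falls as position rises.
    return [sorted(groups[key]) for key in sorted(groups, reverse=True)]

# Hypothetical association degrees for four contents of the sub-graph.
scores = {"A": 0.92, "B": 0.88, "C": 0.55, "D": 0.10}
ranked = group_by_association(scores)
leading_group = ranked[0]
```

The retrieval result would then be drawn from leading_group first, falling back to later groups only if needed.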
Further, the apparatus further comprises:
a fourth processing unit, configured to obtain a plurality of contents in the first content group;
a fifth processing unit, configured to sort the contents according to their relationships and to construct a contrast pointing vector from the sorted contents;
a second calculation unit, configured to calculate the similarity between the pointing vector and the contrast pointing vector;
a sixth processing unit, configured to take the sorted contents as the retrieval result when the similarity between the pointing vector and the contrast pointing vector is greater than or equal to a set threshold;
a seventh processing unit, configured to split and reconstruct the contrast pointing vector when the similarity between the pointing vector and the contrast pointing vector is less than the set threshold, such that the similarity between the reconstructed contrast pointing vector and the pointing vector is greater than or equal to the set threshold; and
a result supplementing unit, configured to supplement the retrieval result with the newly added part of the reconstructed contrast pointing vector.
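The threshold check and the split-and-reconstruct step can be sketched as below. The application does not define the similarity measure or the reconstruction strategy, so this sketch assumes cosine similarity over zero-padded segment lists and a reconstruction that splits the contrast path at each joint and tries inserting one candidate content. All names (segments, similarity, retrieve, candidates) are hypothetical.

```python
from math import sqrt

def segments(points):
    """Direction segments between consecutive points of a path."""
    return [(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(points, points[1:])]

def similarity(path_a, path_b):
    """Cosine similarity of flattened segment lists, zero-padded to equal size."""
    a = [c for seg in segments(path_a) for c in seg]
    b = [c for seg in segments(path_b) for c in seg]
    size = max(len(a), len(b))
    a += [0.0] * (size - len(a))
    b += [0.0] * (size - len(b))
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_path, ordered, positions, candidates, threshold):
    """Return (result, added): accepted ordering and supplemented contents."""
    path = [positions[name] for name in ordered]
    if similarity(query_path, path) >= threshold:
        return ordered, []
    # Split the contrast path at each joint and try inserting one candidate.
    for cut in range(1, len(ordered)):
        for extra in candidates:
            rebuilt = ordered[:cut] + [extra] + ordered[cut:]
            rebuilt_path = [positions[name] for name in rebuilt]
            if similarity(query_path, rebuilt_path) >= threshold:
                return rebuilt, [extra]
    return [], []  # no ordering reached the threshold

# Hypothetical data: the query pointing vector bends once; "C" sits at the bend.
query_path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
positions = {"A": (0.0, 0.0), "B": (1.0, 1.0), "C": (1.0, 0.0)}
result, added = retrieve(query_path, ["A", "B"], positions, ["C"], threshold=0.9)
```

In this example the direct path from "A" to "B" scores about 0.5 against the query, so the contrast path is split and rebuilt through "C"; the added content "C" is what supplements the retrieval result.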
In one example, a unit in any of the above apparatuses may be one or more integrated circuits configured to implement the above method, for example: one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), one or more field-programmable gate arrays (FPGAs), or a combination of at least two of these integrated circuit forms.
As another example, when a unit in the apparatus is implemented in the form of a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke the program. As yet another example, the units may be integrated together and implemented in the form of a system on a chip (SoC).
Various objects, such as messages, information, devices, network elements, systems, apparatuses, actions, operations, processes, and concepts, may be named in the present application. It should be understood that these specific names do not limit the related objects and may change with the scenario, context, or usage habit; the technical meaning of a technical term in the present application should be determined mainly from the function and technical effect it embodies or performs in the technical solution.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It should also be understood that, in the various embodiments of the present application, first, second, etc. are merely intended to indicate that multiple objects are different. For example, a first time window and a second time window merely denote different time windows and have no effect on the time windows themselves; the terms first, second, etc. should not impose any limitation on the embodiments of the present application.
It is also to be understood that in the various embodiments of the application, where no special description or logic conflict exists, the terms and/or descriptions between the various embodiments are consistent and may reference each other, and features of the various embodiments may be combined to form new embodiments in accordance with their inherent logic relationships.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a computer-readable storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned computer-readable storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The application also provides a data processing system based on the knowledge graph, which comprises:
one or more memories for storing instructions; and
One or more processors configured to invoke and execute the instructions from the memory to perform the method as set forth above.
The present application also provides a computer program product comprising instructions which, when executed, cause the data processing system to perform the operations of the data processing system corresponding to the above method.
The present application also provides a chip system comprising a processor for implementing the functions involved in the above, e.g. generating, receiving, transmitting, or processing data and/or information involved in the above method.
The chip system can be composed of chips, and can also comprise chips and other discrete devices.
The processor referred to in any of the foregoing may be a CPU, microprocessor, ASIC, or integrated circuit that performs one or more of the procedures for controlling the transmission of feedback information described above.
In one possible design, the system on a chip also includes memory to hold the necessary program instructions and data. The processor and the memory may be decoupled, and disposed on different devices, respectively, and connected by wired or wireless means, so as to support the chip system to implement the various functions in the foregoing embodiments. Or the processor and the memory may be coupled to the same device.
Optionally, the computer instructions are stored in a memory.
Alternatively, the memory may be a storage unit in the chip, such as a register, a cache, etc., and the memory may also be a storage unit in the terminal located outside the chip, such as a ROM or other type of static storage device, a RAM, etc., that may store static information and instructions.
It will be appreciated that the memory in the present application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
The non-volatile memory may be a ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a RAM, which serves as an external cache. There are many different types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The embodiments described above are all preferred embodiments of the present application and are not intended to limit its scope of protection; therefore, all equivalent changes made according to the structure, shape, and principle of the application shall fall within the scope of protection of the application.

Claims (9)

1. A knowledge-graph-based data processing method, characterized by comprising the following steps:
in response to acquired sentence data, analyzing the sentence data to obtain a plurality of phrases;
assigning a direction vector to each phrase;
connecting the obtained direction vectors end to end in sequence to obtain a pointing vector; and
searching the knowledge graph using the pointing vector to obtain a retrieval result, wherein the regions surrounding the pointing vector are all included in a first retrieval range, and the retrieval result is obtained from the content within the first retrieval range;
wherein assigning a direction vector to each phrase comprises:
dividing the sentence data multiple times to obtain a plurality of sentence units, wherein the sentence units obtained in each division all have the same length;
analyzing the sentence units obtained in each division to obtain the meanings of the sentence units, wherein each sentence unit corresponds to at least one meaning;
obtaining semantic groups of the sentence data according to the divisions and the meaning corresponding to each sentence unit;
screening, from the semantic groups, the semantic groups that occur most often and express the same or similar meanings, and marking them as meaning semantic groups; and
assigning a direction vector to each phrase according to the meaning semantic groups.
2. The knowledge-graph-based data processing method according to claim 1, wherein when one sentence unit corresponds to a plurality of meanings, a plurality of direction vectors are assigned to the sentence unit, each direction vector corresponding to one meaning of the sentence unit.
3. The knowledge-graph-based data processing method according to claim 2, wherein the plurality of direction vectors assigned to the sentence unit are synthesized into a single direction vector.
4. The knowledge-graph-based data processing method according to any one of claims 1 to 3, characterized in that searching the knowledge graph using the pointing vector comprises:
taking the connection point of two direction vectors that are connected in the sequence as a node, and including the content at the node or within the range of the node in a first retrieval range;
combining the obtained first retrieval ranges to obtain a second retrieval range, wherein at least one content of a first retrieval range has a direct connection relationship with content in another first retrieval range;
obtaining a sub-knowledge graph using the content in the second retrieval range; and
searching the sub-knowledge graph using the pointing vector to obtain a retrieval result.
5. The knowledge-graph-based data processing method according to claim 4, wherein searching the sub-knowledge graph using the pointing vector comprises:
calculating the association degree between each content in the sub-knowledge graph and the direction vector, and grouping the contents in the sub-knowledge graph according to the association degree to obtain a plurality of content groups;
sorting the content groups according to the association degree, wherein the association degree of a content group is inversely related to its position in the sequence; and
obtaining the retrieval result from at least the leading content group of the sequence.
6. The knowledge-graph-based data processing method according to claim 5, wherein obtaining the retrieval result from at least the leading content group of the sequence comprises:
obtaining a plurality of contents in the first content group;
sorting the contents according to their relationships and constructing a contrast pointing vector from the sorted contents;
calculating the similarity between the pointing vector and the contrast pointing vector;
taking the sorted contents as the retrieval result when the similarity between the pointing vector and the contrast pointing vector is greater than or equal to a set threshold;
splitting and reconstructing the contrast pointing vector when the similarity between the pointing vector and the contrast pointing vector is less than the set threshold, wherein the similarity between the reconstructed contrast pointing vector and the pointing vector is greater than or equal to the set threshold; and
supplementing the retrieval result with the newly added part of the reconstructed contrast pointing vector.
7. A knowledge-graph-based data processing apparatus, comprising:
a first analysis unit, configured to analyze sentence data, in response to acquiring the sentence data, to obtain a plurality of phrases;
a first processing unit, configured to assign a direction vector to each phrase;
a second processing unit, configured to connect the obtained direction vectors end to end in sequence to obtain a pointing vector;
a first retrieval unit, configured to search the knowledge graph using the pointing vector to obtain a retrieval result, wherein the regions surrounding the pointing vector are all included in a first retrieval range, and the retrieval result is obtained from the content within the first retrieval range;
a dividing unit, configured to divide the sentence data multiple times to obtain a plurality of sentence units, wherein the sentence units obtained in each division all have the same length;
a second analysis unit, configured to analyze the sentence units obtained in each division to obtain the meanings of the sentence units, wherein each sentence unit corresponds to at least one meaning;
a third processing unit, configured to obtain semantic groups of the sentence data according to the divisions and the meaning corresponding to each sentence unit;
a screening unit, configured to screen, from the semantic groups, the semantic groups that occur most often and express the same or similar meanings, and to mark them as meaning semantic groups; and
an assigning unit, configured to assign a direction vector to each phrase according to the meaning semantic groups.
8. A knowledge-graph-based data processing system, the system comprising:
one or more memories for storing instructions; and
One or more processors to invoke and execute the instructions from the memory to perform the method of any of claims 1 to 6.
9. A computer-readable storage medium, the computer-readable storage medium comprising:
a program which, when executed by a processor, performs the method according to any one of claims 1 to 6.
CN202410226896.9A 2024-02-29 2024-02-29 Knowledge graph-based data processing method, device and system and storage medium Active CN117807252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410226896.9A CN117807252B (en) 2024-02-29 2024-02-29 Knowledge graph-based data processing method, device and system and storage medium

Publications (2)

Publication Number Publication Date
CN117807252A CN117807252A (en) 2024-04-02
CN117807252B true CN117807252B (en) 2024-04-30

Family

ID=90430336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410226896.9A Active CN117807252B (en) 2024-02-29 2024-02-29 Knowledge graph-based data processing method, device and system and storage medium

Country Status (1)

Country Link
CN (1) CN117807252B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632224A (en) * 2020-12-29 2021-04-09 天津汇智星源信息技术有限公司 Case recommendation method and device based on case knowledge graph and electronic equipment
WO2021092099A1 (en) * 2019-11-05 2021-05-14 Epacca, Inc. Mechanistic causal reasoning for efficient analytics and natural language
CN112883172A (en) * 2021-02-03 2021-06-01 大连理工大学 Biomedical question-answering method based on dual knowledge selection
CN113761151A (en) * 2021-05-07 2021-12-07 腾讯科技(深圳)有限公司 Synonym mining method, synonym mining device, synonym question answering method, synonym question answering device, computer equipment and storage medium
KR20220066554A (en) * 2020-11-16 2022-05-24 주식회사 포티투마루 Method, apparatus and computer program for buildding knowledge graph using qa model
CN114564966A (en) * 2022-03-04 2022-05-31 中国科学院地理科学与资源研究所 Spatial relation semantic analysis method based on knowledge graph
CN115683129A (en) * 2023-01-04 2023-02-03 苏州尚同墨方智能科技有限公司 Long-term repositioning method and device based on high-definition map
CN115964528A (en) * 2022-12-19 2023-04-14 中科(厦门)数据智能研究院 Picture retrieval optimization algorithm based on street view retrieval
CN115982338A (en) * 2023-02-24 2023-04-18 中国测绘科学研究院 Query path ordering-based domain knowledge graph question-answering method and system
CN116152938A (en) * 2021-11-18 2023-05-23 腾讯科技(深圳)有限公司 Method, device and equipment for training identity recognition model and transferring electronic resources
CN116595971A (en) * 2022-03-03 2023-08-15 平安健康保险股份有限公司 Knowledge graph construction method and device, storage medium and computer equipment
CN116737915A (en) * 2023-08-16 2023-09-12 中移信息系统集成有限公司 Semantic retrieval method, device, equipment and storage medium based on knowledge graph
CN116860996A (en) * 2023-07-05 2023-10-10 安徽华云安科技有限公司 Method, device, equipment and storage medium for constructing three-dimensional knowledge graph
CN116910633A (en) * 2023-09-14 2023-10-20 北京科东电力控制系统有限责任公司 Power grid fault prediction method based on multi-modal knowledge mixed reasoning
CN116958342A (en) * 2023-05-15 2023-10-27 腾讯科技(深圳)有限公司 Method for generating actions of virtual image, method and device for constructing action library
CN117521804A (en) * 2023-10-31 2024-02-06 内蒙古电力(集团)有限责任公司包头供电分公司 Relay protection hidden fault identification and reasoning method based on knowledge graph

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11710049B2 (en) * 2020-12-16 2023-07-25 Ro5 Inc. System and method for the contextualization of molecules
US20230245654A1 (en) * 2022-01-31 2023-08-03 Meta Platforms, Inc. Systems and Methods for Implementing Smart Assistant Systems

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Domain-specific Knowledge Graphs: A survey;Bilal Abu-Salih;《arXiv:2011.00235v3》;20201031;1-38 *
基于记忆建模的深度学习模型及其在问答系统中的应用;潘永华;《中国优秀硕士学位论文全文数据库 信息科技辑》;20200115(第01期);I138-2515 *
基于预训练语言模型的中文知识图谱问答研究;张天杭;《中国优秀硕士学位论文全文数据库 信息科技辑》;20220115(第01期);I138-3340 *
面向新兴技术追踪预测的专利数据组织与知识发现研究;魏明珠;《 万方数据知识服务平台》;20230823;1-205 *

Also Published As

Publication number Publication date
CN117807252A (en) 2024-04-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant