CN117807252B - Knowledge graph-based data processing method, device and system and storage medium - Google Patents
- Publication number
- CN117807252B (application CN202410226896.9A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- content
- vector
- pointing vector
- unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/383—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
Abstract
The application relates to a knowledge graph-based data processing method, device, system and storage medium. The method comprises: in response to acquired sentence data, parsing the sentence data to obtain a plurality of phrases; assigning a direction vector to each phrase; connecting the obtained direction vectors end to end in sequence to obtain a pointing vector; and searching in a knowledge graph by using the pointing vector to obtain a search result, wherein the areas surrounding the pointing vector are all included in a first search range, and the search result is obtained according to the content in the first search range. The disclosed method, device, system and storage medium perform retrieval based on operations in three-dimensional space, so that the retrieval closely matches the content and more accurate content can be obtained.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data processing method, device, system and storage medium based on a knowledge graph.
Background
A knowledge graph is a structured semantic knowledge base that describes concepts in the physical world and their interrelationships in symbolic form. Its basic constituent units are entity-relation-entity triples and the attribute-value pairs of entities; entities are connected to one another through relations to form a net-shaped knowledge structure.
Knowledge graphs have broad application prospects in the search field. For example, in the current artificial intelligence field, a trained artificial intelligence can give accurate answers to the questions raised by a questioner, and this process can rely on a knowledge graph, because obtaining an accurate answer requires an accurate search of the existing content. Before such accurate searching, the commonly used search method was association-based: it sorts the retrieved content by degree of association, so the questioner still has to screen the content further.
With the wide use of artificial intelligence, the accuracy and energy consumption of the retrieval process are also attracting attention, and higher accuracy can reduce the cost of using artificial intelligence. A knowledge graph can break down data islands and connect content into a network, so that rich content in vertical fields can be provided, but how to locate content quickly still requires further research.
Disclosure of Invention
In order to improve the degree of matching between questions and results in the retrieval process, the application provides a knowledge graph-based data processing method, device, system and storage medium, which perform retrieval based on operations in three-dimensional space so that the retrieval closely matches the content and more accurate content can be obtained.
The above object of the present application is achieved by the following technical solutions:
In a first aspect, the present application provides a data processing method based on a knowledge graph, including:
in response to acquired sentence data, parsing the sentence data to obtain a plurality of phrases;
assigning a direction vector to each phrase;
connecting the obtained direction vectors end to end in sequence to obtain a pointing vector; and
searching in the knowledge graph by using the pointing vector to obtain a search result, wherein the areas surrounding the pointing vector are all included in a first search range, and the search result is obtained according to the content in the first search range.
In a possible implementation manner of the first aspect, assigning a direction vector to each phrase includes:
dividing the sentence data multiple times to obtain a plurality of sentence units, wherein the sentence units obtained by any one division all have the same length;
parsing the sentence units obtained by each division to obtain the meanings of the sentence units, wherein each sentence unit corresponds to at least one meaning;
obtaining semantic groups of the sentence data according to the divisions and the meaning corresponding to each sentence unit;
screening out, from the semantic groups, the semantic groups that occur most often and express the same or similar meanings, and recording them as meaning semantic groups; and
assigning a direction vector to each phrase according to the meaning semantic groups.
In a possible implementation manner of the first aspect, when one sentence unit corresponds to a plurality of meanings, a plurality of direction vectors are assigned to the sentence unit, and each direction vector corresponds to one meaning of the sentence unit.
In a possible implementation manner of the first aspect, the plurality of direction vectors assigned to the sentence unit are synthesized into one direction vector.
In a possible implementation manner of the first aspect, retrieving in the knowledge-graph using the pointing vector includes:
taking the connection point of two direction vectors that are connected in the sequence as a node, and including the content at the node or within the range of the node in a first search range;
merging the obtained first search ranges to obtain a second search range, wherein at least one content of each first search range has a direct connection relationship with content in another first search range;
obtaining a sub-knowledge graph by using the content in the second search range; and
searching in the sub-knowledge graph by using the pointing vector to obtain a search result.
In a possible implementation manner of the first aspect, retrieving in the sub-knowledge-graph using the pointing vector includes:
calculating the association degree between each content in the sub-knowledge graph and the pointing vector, and grouping the content in the sub-knowledge graph according to the association degree to obtain a plurality of content groups;
sorting the content groups according to the association degree, wherein a content group with a higher association degree is placed earlier in the sequence; and
obtaining the search result from at least the first content group in the sequence.
In a possible implementation manner of the first aspect, obtaining the search result from at least the first content group in the sequence includes:
obtaining a plurality of contents in the first content group;
sorting the contents according to their relations and constructing a contrast pointing vector from the sorted contents;
calculating the similarity between the pointing vector and the contrast pointing vector;
when the similarity between the pointing vector and the contrast pointing vector is greater than or equal to a set threshold, taking the sorted contents as the search result;
when the similarity between the pointing vector and the contrast pointing vector is smaller than the set threshold, splitting and reconstructing the contrast pointing vector, wherein the similarity between the reconstructed contrast pointing vector and the pointing vector is greater than or equal to the set threshold; and
supplementing the search result with the newly added part of the reconstructed contrast pointing vector.
In a second aspect, the present application provides a knowledge-graph-based data processing apparatus, including:
The first analysis unit is used for responding to the acquired sentence data and analyzing the sentence data to obtain a plurality of phrases;
A first processing unit for assigning a direction vector to each phrase;
The second processing unit is used for connecting the obtained multiple direction vectors end to end according to the sequence to obtain a pointing vector; and
The first retrieval unit is used for retrieving in the knowledge graph by using the pointing vector to obtain a retrieval result, the surrounding areas of the pointing vector are all included in the first retrieval range, and the retrieval result is obtained according to the content in the first retrieval range.
In a third aspect, the present application provides a knowledge-graph-based data processing system, the system comprising:
one or more memories for storing instructions; and
One or more processors configured to invoke and execute the instructions from the memory, to perform the method as described in the first aspect and any possible implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium comprising:
A program which, when executed by a processor, performs a method as described in the first aspect and any possible implementation of the first aspect.
In a fifth aspect, the present application provides a computer program product comprising program instructions which, when executed by a computing device, perform a method as described in the first aspect and any possible implementation of the first aspect.
In a sixth aspect, the present application provides a chip system comprising a processor for implementing the functions involved in the above aspects, e.g. generating, receiving, transmitting, or processing data and/or information involved in the above methods.
The chip system can be composed of chips, and can also comprise chips and other discrete devices.
In one possible design, the system on a chip also includes memory to hold the necessary program instructions and data. The processor and the memory may be decoupled, provided on different devices, respectively, connected by wire or wirelessly, or the processor and the memory may be coupled on the same device.
The beneficial effects of the application are as follows:
According to the knowledge graph-based data processing method, device, system and storage medium disclosed by the application, sentence data is split accurately and matched accurately against content within a spatial range, so that questions are matched accurately with answer content. In this way, more accurate answer content can be provided to the questioner, and the questioner can obtain the required answer content with fewer rounds of questioning.
Drawings
Fig. 1 is a schematic block diagram of a data processing method based on a knowledge graph.
Fig. 2 is a schematic diagram of a process for obtaining a pointing vector according to the present application.
Fig. 3 is a schematic diagram, provided by the present application, of the case in which the similarity between the pointing vector and the contrast pointing vector is greater than or equal to the set threshold.
Fig. 4 is a schematic diagram, provided by the present application, of the case in which the similarity between the pointing vector and the contrast pointing vector is greater than or equal to the set threshold after the contrast pointing vector has been adjusted.
Detailed Description
The technical scheme in the application is further described in detail below with reference to the accompanying drawings.
The application discloses a knowledge graph-based data processing method. Referring to fig. 1, in some examples, the method comprises the following steps:
S1, in response to acquired sentence data, parsing the sentence data to obtain a plurality of phrases;
S2, assigning a direction vector to each phrase;
S3, connecting the obtained direction vectors end to end in sequence to obtain a pointing vector; and
S4, searching in the knowledge graph by using the pointing vector to obtain a search result, wherein the areas surrounding the pointing vector are all included in a first search range, and the search result is obtained according to the content in the first search range.
The knowledge graph-based data processing method disclosed by the application is applied to a server, which is deployed locally or in the cloud. A questioner sends a question to the server, for example through a dialog box, and after receiving the question the server interprets it and gives an answer.
In step S1, the server receives sentence data, which is the question mentioned above. After receiving the question, the server first parses the sentence data to obtain a plurality of phrases. The phrases are the key parts of the sentence data, and the sentence data also contains the association relationships between the phrases, for example inclusion relationships and subordinate relationships.
In step S2, referring to fig. 2, a direction vector is assigned to each phrase according to the association relationships. This is explained here with the help of the knowledge graph, which reflects the relationships between entities: the entities can be regarded as individual islands and the relationships as bridges connecting the islands, so that the islands and bridges form a network. Here the knowledge graph is transferred into a three-dimensional space, and each relationship is represented by a vector that has a length and a direction, with different relationships distinguished by direction.
In step S3, the obtained plurality of direction vectors are connected end to end in order to obtain a pointing vector, where the pointing vector may be further interpreted as a polyline in three-dimensional space, where each segment of the polyline has a direction.
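The end-to-end connection in step S3 can be sketched as a running sum of the direction vectors. This is a minimal illustration only: the function name `pointing_polyline`, the choice of the origin as the starting point and the sample vectors are assumptions, not details given in the application.

```python
from itertools import accumulate

Vec3 = tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise sum of two 3D vectors."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def pointing_polyline(directions: list[Vec3],
                      origin: Vec3 = (0.0, 0.0, 0.0)) -> list[Vec3]:
    """Chain direction vectors head to tail and return the polyline vertices.

    The starting point (origin) is an assumption made for illustration.
    """
    return list(accumulate(directions, add, initial=origin))


# Hypothetical direction vectors, one per phrase, in sentence order.
directions = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 1.0)]
vertices = pointing_polyline(directions)
print(vertices)
```

Each consecutive pair of vertices is one directed segment of the polyline, so n direction vectors give n + 1 vertices.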
In step S4, the pointing vector is used to search in the knowledge graph to obtain a search result, wherein the areas surrounding the pointing vector are all included in the first search range, and the search result is obtained according to the content in the first search range.
In this way, an accurate first search range is obtained in the knowledge graph in a single pass, and the search result is then obtained from the content in the first search range. The advantage of this processing mode is that all phrases in the received sentence data are searched at the same time, and an accurate search range is obtained by relying on the pointing vector during the search. In the field of artificial intelligence, this allows the answer to a questioner's question to be found accurately, because the association relationships between phrases are taken into account in the search.
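One way to picture the "surrounding area" of the pointing vector is as a tube of fixed radius around the polyline. The sketch below makes that assumption explicit: the radius, the entity coordinates and the function names are hypothetical, and the application does not state how the surrounding area is measured.

```python
import math

Vec3 = tuple[float, float, float]


def distance_to_segment(p: Vec3, a: Vec3, b: Vec3) -> float:
    """Shortest distance from point p to the segment from a to b."""
    ab = (b[0] - a[0], b[1] - a[1], b[2] - a[2])
    ap = (p[0] - a[0], p[1] - a[1], p[2] - a[2])
    denom = sum(c * c for c in ab)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = (a[0] + t * ab[0], a[1] + t * ab[1], a[2] + t * ab[2])
    return math.dist(p, closest)


def first_search_range(entities: dict[str, Vec3],
                       vertices: list[Vec3],
                       radius: float) -> set[str]:
    """Entities lying within `radius` of any segment of the polyline (assumed rule)."""
    return {
        name
        for name, pos in entities.items()
        if any(distance_to_segment(pos, a, b) <= radius
               for a, b in zip(vertices, vertices[1:]))
    }


# Hypothetical polyline and entity positions.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
entities = {
    "near_first": (0.5, 0.2, 0.0),
    "near_corner": (1.1, 0.0, 0.0),
    "far": (5.0, 5.0, 5.0),
}
in_range = first_search_range(entities, vertices, radius=0.3)
print(sorted(in_range))
```

Because every segment is tested in the same pass, all phrases contribute to the range simultaneously, which is the property the paragraph above emphasizes.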
In some examples, assigning a direction vector to each phrase includes the steps of:
S21, dividing the sentence data multiple times to obtain a plurality of sentence units, wherein the sentence units obtained by any one division all have the same length;
S22, parsing the sentence units obtained by each division to obtain the meanings of the sentence units, wherein each sentence unit corresponds to at least one meaning;
S23, obtaining semantic groups of the sentence data according to the divisions and the meaning corresponding to each sentence unit;
S24, screening out, from the semantic groups, the semantic groups that occur most often and express the same or similar meanings, and recording them as meaning semantic groups; and
S25, assigning a direction vector to each phrase according to the meaning semantic groups.
Specifically, in step S21, the sentence data is divided multiple times to obtain a plurality of sentence units, and the sentence units obtained by any one division all have the same length.
It should be understood that sentence data is composed of grammatical constituents such as subjects, predicates, objects, modifiers, complements and head words, which are difficult for a machine to distinguish directly. The sentence data can be understood only after it has been divided, so whether the division is appropriate directly determines the degree of understanding.
The degree of understanding has two reference indexes, the exact match rate and the execution accuracy rate. The exact match rate is addressed by dividing multiple times: different divisions yield sentence units of different lengths, and each sentence unit obtained either can or cannot be understood. The units that cannot be understood are discarded.
The units that can be understood are counted and classified. That is, in step S22, the sentence units obtained by each division are parsed to obtain their meanings, and each sentence unit corresponds to at least one meaning.
Next, in step S23, the semantic groups of the sentence data are obtained according to the divisions and the meaning corresponding to each sentence unit. Then, in step S24, the semantic groups that occur most often and express the same or similar meanings are screened out and recorded as meaning semantic groups. Finally, in step S25, a direction vector is assigned to each phrase according to the meaning semantic groups.
In this way, the meaning of each sentence unit can be obtained accurately.
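Steps S21 to S24 can be sketched as repeated division followed by a frequency count. The sketch makes several assumptions that the application does not specify: each division is a sliding window of a fixed token length, a hypothetical `lexicon` dictionary stands in for the parser that decides whether a unit can be understood, and identical meaning labels stand in for "the same or similar meanings".

```python
from collections import Counter


def divide(tokens: list[str], unit_len: int) -> list[tuple[str, ...]]:
    """One division: every unit has the same length (sliding window, assumed)."""
    return [tuple(tokens[i:i + unit_len])
            for i in range(len(tokens) - unit_len + 1)]


def count_meanings(tokens: list[str],
                   lexicon: dict[tuple[str, ...], list[str]],
                   max_len: int) -> Counter:
    """Divide several times, discard units with no known meaning, count the rest."""
    counts: Counter = Counter()
    for unit_len in range(1, max_len + 1):
        for unit in divide(tokens, unit_len):
            # Units missing from the lexicon cannot be understood and are dropped.
            for meaning in lexicon.get(unit, []):
                counts[meaning] += 1
    return counts


# Hypothetical sentence and lexicon.
tokens = ["capital", "of", "France"]
lexicon = {
    ("capital",): ["city:capital", "finance:capital"],
    ("France",): ["country:France"],
    ("capital", "of"): ["city:capital"],
    ("capital", "of", "France"): ["city:capital"],
}
counts = count_meanings(tokens, lexicon, max_len=3)
best_meaning, best_count = counts.most_common(1)[0]
print(best_meaning, best_count)
```

Here the ambiguous unit "capital" has two candidate meanings, and the longer units that contain it tip the count toward the intended one.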
When one sentence unit corresponds to a plurality of meanings, a plurality of direction vectors are assigned to the sentence unit, each direction vector corresponding to one meaning of the sentence unit.
In some possible implementations, the plurality of direction vectors assigned to the sentence unit are synthesized into one direction vector. This is similar to a fusion process. It should be noted that, when a plurality of direction vectors are assigned to a sentence unit, the direction vectors should point in one direction or toward one area. For example, one constraint rule is that the included angle between any two direction vectors must be within a required range; direction vectors that exceed the required range are discarded.
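The angle constraint and the synthesis can be sketched as follows. The application does not say which vector is discarded when a pair violates the constraint, nor how the remaining vectors are combined; this sketch assumes a greedy rule (a vector is kept only if it is within the allowed angle of every vector already kept) and a normalized sum. All names are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def angle(u: Vec3, v: Vec3) -> float:
    """Included angle between two non-zero vectors, in radians."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def synthesize(vectors: list[Vec3], max_angle: float) -> Vec3:
    """Drop vectors that break the pairwise angle limit, then merge the rest."""
    kept: list[Vec3] = []
    for v in vectors:
        # Greedy filter (assumed): compare against every vector kept so far.
        if all(angle(v, k) <= max_angle for k in kept):
            kept.append(v)
    total = tuple(sum(c) for c in zip(*kept))
    length = math.hypot(*total)
    if length == 0:
        raise ValueError("kept vectors cancel out; no single direction exists")
    return tuple(c / length for c in total)


# Hypothetical direction vectors for one sentence unit with three meanings.
candidates = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
merged = synthesize(candidates, max_angle=math.radians(100))
print(merged)
```

In this example the third vector points opposite to the first, exceeds the 100 degree limit and is discarded, so the merged vector bisects the first two.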
In some examples, retrieving in the knowledge-graph using the pointing vector includes the steps of:
S41, taking the connection point of two direction vectors that are connected in the sequence as a node, and including the content at the node or within the range of the node in a first search range;
S42, merging the obtained first search ranges to obtain a second search range, wherein at least one content of each first search range has a direct connection relationship with content in another first search range;
S43, obtaining a sub-knowledge graph by using the content in the second search range; and
S44, searching in the sub-knowledge graph by using the pointing vector to obtain a search result.
In steps S41 to S44, the connection point of two direction vectors that are connected in the sequence is taken as a node, and the content at the node or within the range of the node is included in a first search range. The obtained first search ranges are then merged to obtain the second search range, wherein at least one content of each first search range has a direct connection relationship with content in another first search range.
This yields a smaller search range, namely the second search range. At the same time, the merging of the first search ranges is constrained: at least one content of a first search range must have a direct connection relationship with content in another first search range, which ensures that the first search ranges are connected to one another rather than being independent data islands.
A sub-knowledge graph, which consists of the content in the second search range, is then obtained, and the pointing vector is used to search in the sub-knowledge graph to obtain the search result.
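Steps S41 to S43 can be sketched with a small graph. The sketch assumes that the "node range" is a sphere of fixed radius around each joint of the polyline, that a "direct connection relationship" is a single edge of the knowledge graph, and that a first search range with no such edge to any other range is left out of the merge. These readings, and all names and data, are assumptions for illustration.

```python
import math

Vec3 = tuple[float, float, float]
Edge = tuple[str, str]


def first_ranges(entities: dict[str, Vec3], joints: list[Vec3],
                 radius: float) -> list[set[str]]:
    """S41: one first search range per joint (sphere of `radius`, assumed)."""
    return [{n for n, pos in entities.items() if math.dist(pos, j) <= radius}
            for j in joints]


def merge_ranges(ranges: list[set[str]], edges: list[Edge]) -> set[str]:
    """S42: merge only ranges that share a direct edge with another range."""
    linked = {frozenset(e) for e in edges}

    def connected(r1: set[str], r2: set[str]) -> bool:
        return any(frozenset((a, b)) in linked for a in r1 for b in r2)

    second: set[str] = set()
    for i, r in enumerate(ranges):
        if any(connected(r, other) for j, other in enumerate(ranges) if j != i):
            second |= r
    return second


def sub_graph(second: set[str], edges: list[Edge]) -> list[Edge]:
    """S43: keep the edges whose two endpoints are both in the second range."""
    return [(a, b) for a, b in edges if a in second and b in second]


# Hypothetical entities, joints and edges.
entities = {"A": (1.0, 0.0, 0.0), "B": (1.0, 2.0, 0.0), "C": (1.0, 2.1, 0.0),
            "D": (9.0, 9.0, 9.0), "E": (5.0, 5.0, 5.0)}
joints = [(1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
edges = [("A", "B"), ("C", "D")]

ranges = first_ranges(entities, joints, radius=0.5)
second = merge_ranges(ranges, edges)
print(sorted(second), sub_graph(second, edges))
```

The range around the third joint contains only "E", which has no edge into another range, so it is treated as an isolated island and excluded from the second search range.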
The steps of retrieving in the sub-knowledge graph using the pointing vector are as follows:
S441, calculating the association degree between each content in the sub-knowledge graph and the pointing vector, and grouping the content in the sub-knowledge graph according to the association degree to obtain a plurality of content groups;
S442, sorting the content groups according to the association degree, wherein a content group with a higher association degree is placed earlier in the sequence; and
S443, obtaining the search result from at least the first content group in the sequence.
Specifically, the association degree between each content in the sub-knowledge graph and the pointing vector is calculated first, that is, the number of phrases in the sentence data with which each content in the sub-knowledge graph has a relationship is counted. The contents in the sub-knowledge graph are then grouped according to the number of related phrases.
At this point, content that has no relationship with the phrases in the sentence data can be eliminated, and content that is closely connected with the phrases in the sentence data is obtained. The search result is then obtained from at least the first content group in the sequence.
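The grouping described above, in which the association degree is the number of phrases a content is related to, can be sketched as follows. The mapping `related_phrases` from each content to the phrases it is related to is a hypothetical input; how those relations are looked up in the sub-knowledge graph is not shown.

```python
from collections import defaultdict


def group_by_association(related_phrases: dict[str, list[str]],
                         phrases: list[str]) -> list[tuple[int, list[str]]]:
    """Group contents by how many query phrases they relate to, highest first."""
    phrase_set = set(phrases)
    groups: dict[int, list[str]] = defaultdict(list)
    for content, related in related_phrases.items():
        degree = len(phrase_set & set(related))
        if degree > 0:  # contents unrelated to every phrase are eliminated
            groups[degree].append(content)
    return [(d, sorted(groups[d])) for d in sorted(groups, reverse=True)]


# Hypothetical phrases from the sentence data and relations from the sub-graph.
phrases = ["Paris", "capital", "France"]
related_phrases = {
    "Paris entry": ["Paris", "capital", "France"],
    "France entry": ["France", "capital"],
    "Lyon entry": ["France"],
    "Berlin entry": ["Germany"],
}
content_groups = group_by_association(related_phrases, phrases)
print(content_groups)
```

The first element of `content_groups` is the group with the highest association degree, which is where the search result is looked for first.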
The steps of obtaining the search result from at least the first content group in the sequence are as follows:
S4431, obtaining a plurality of contents in the first content group;
S4432, sorting the contents according to their relations and constructing a contrast pointing vector from the sorted contents;
S4433, calculating the similarity between the pointing vector and the contrast pointing vector;
S4434, when the similarity between the pointing vector and the contrast pointing vector is greater than or equal to a set threshold, taking the sorted contents as the search result;
S4435, when the similarity between the pointing vector and the contrast pointing vector is smaller than the set threshold, splitting and reconstructing the contrast pointing vector, wherein the similarity between the reconstructed contrast pointing vector and the pointing vector is greater than or equal to the set threshold; and
S4436, supplementing the search result with the newly added part of the reconstructed contrast pointing vector.
In steps S4431 to S4436, a contrast pointing vector is first constructed from the contents obtained in the first content group, and then the similarity between the pointing vector and the contrast pointing vector is calculated. There are two processing modes at this point:
first, when the similarity between the pointing vector and the contrast pointing vector is greater than or equal to the set threshold, the sorted contents are taken as the search result;
second, when the similarity between the pointing vector and the contrast pointing vector is smaller than the set threshold, the contrast pointing vector is split and reconstructed, and the search result is supplemented with the newly added part of the reconstructed contrast pointing vector, wherein the similarity between the reconstructed contrast pointing vector and the pointing vector is greater than or equal to the set threshold.
Referring to fig. 3 and 4, splitting and reconstructing the contrast pointing vector means dividing it into multiple segments (similar regions and dissimilar regions). Each retained segment is similar to a part of the pointing vector, and the gap between two adjacent retained segments is then filled with a new vector. The similarity judgment here is not limited by length.
When supplementing with a new vector, a content is selected from the first content group and placed in the dissimilar region, a vector is then assigned to it, and the contrast pointing vector is reconstructed.
A new vector means that a relationship has been changed, and supplementing with a new vector means that a new relationship has been added. Of the two processing modes, the second is not used when at least one contrast pointing vector can be obtained with the first; when no contrast pointing vector can be obtained with the first processing mode, the second processing mode is used to obtain one.
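The threshold test and the identification of dissimilar regions can be sketched as below. The application does not define the similarity measure; this sketch assumes the mean cosine similarity of corresponding segments, which ignores segment length, and it assumes that both vectors have the same number of segments. The names and sample data are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def cosine(u: Vec3, v: Vec3) -> float:
    """Cosine similarity of two non-zero vectors (direction only, not length)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def compare(pointing: list[Vec3], contrast: list[Vec3],
            threshold: float) -> tuple[float, bool, list[int]]:
    """Return (similarity, accepted, indexes of dissimilar segments)."""
    scores = [cosine(p, c) for p, c in zip(pointing, contrast)]
    similarity = sum(scores) / len(scores)
    dissimilar = [i for i, s in enumerate(scores) if s < threshold]
    return similarity, similarity >= threshold, dissimilar


# Hypothetical segments of the pointing vector and the contrast pointing vector.
pointing = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
contrast = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]

similarity, accepted, dissimilar = compare(pointing, contrast, threshold=0.8)
print(similarity, accepted, dissimilar)

# Second processing mode: replace the dissimilar segment with a new vector.
rebuilt = contrast[:2] + [(0.0, 0.0, 1.0)]
rebuilt_similarity, rebuilt_accepted, _ = compare(pointing, rebuilt, threshold=0.8)
print(rebuilt_similarity, rebuilt_accepted)
```

In the first comparison the last segment falls in a dissimilar region and the overall similarity is below the threshold; after that segment is replaced, the reconstructed vector passes, and the content attached to the new segment is what would supplement the search result.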
The application also provides a data processing device based on the knowledge graph, which comprises:
The first analysis unit is used for responding to the acquired sentence data and analyzing the sentence data to obtain a plurality of phrases;
A first processing unit for assigning a direction vector to each phrase;
The second processing unit is used for connecting the obtained multiple direction vectors end to end according to the sequence to obtain a pointing vector; and
The first retrieval unit is used for retrieving in the knowledge graph by using the pointing vector to obtain a retrieval result, the surrounding areas of the pointing vector are all included in the first retrieval range, and the retrieval result is obtained according to the content in the first retrieval range.
Further, the device also comprises:
the dividing unit is used for dividing the sentence data into multiple units to obtain multiple sentence units, and the length of each sentence unit obtained by each division is the same;
the second analysis unit is used for analyzing the sentence units obtained by each division to obtain the corresponding meaning of the sentence units, and each sentence unit corresponds to at least one meaning;
The third processing unit is used for obtaining a semantic group of the sentence data according to the division and a meaning corresponding to each sentence unit;
The screening unit is used for screening semantic groups with the highest occurrence number and the same or similar expression meaning from the semantic groups, and marking the semantic groups as meaning semantic groups; and
And the giving unit is used for giving a direction vector to each phrase according to the meaning semantic group.
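The division and screening steps can be illustrated with a small sketch. Several simplifications are assumed here and are not stated in the application: sentence data is a list of tokens, meanings come from a hypothetical `lexicon` dictionary, "same or similar meaning" is reduced to exact equality of semantic groups, and the last unit of a division may be shorter than the others.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Sequence, Tuple

Unit = Tuple[str, ...]


def divide(tokens: Sequence[str], unit_len: int) -> List[Unit]:
    """One division: cut the tokens into units of unit_len tokens each."""
    return [tuple(tokens[i:i + unit_len]) for i in range(0, len(tokens), unit_len)]


def semantic_group(units: List[Unit], lexicon: Dict[Unit, str]) -> FrozenSet[str]:
    """Collect the meanings of all units of one division."""
    meanings = set()
    for unit in units:
        if unit in lexicon:
            meanings.add(lexicon[unit])
        else:
            # Assumption: unknown units fall back to word-level meanings.
            meanings.update(lexicon[(w,)] for w in unit if (w,) in lexicon)
    return frozenset(meanings)


def screen(tokens: Sequence[str], unit_lengths: Sequence[int],
           lexicon: Dict[Unit, str]) -> FrozenSet[str]:
    """Return the semantic group produced by the largest number of divisions."""
    counts = Counter(semantic_group(divide(tokens, n), lexicon) for n in unit_lengths)
    return counts.most_common(1)[0][0]


lexicon = {
    ("apple",): "brand", ("phone",): "device", ("price",): "cost",
    ("apple", "phone"): "iphone",
}
best = screen(["apple", "phone", "price"], [1, 2, 3], lexicon)
```

In this toy example two of the three divisions yield the same semantic group, so that group is the one marked as the meaning semantic group.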
Further, when one sentence unit corresponds to a plurality of meanings, a plurality of direction vectors are assigned to the sentence unit, each direction vector corresponding to one meaning of the sentence unit.
Further, the plurality of direction vectors assigned to the sentence unit are synthesized into a single direction vector.
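One plausible reading of "synthesized" is ordinary vector addition. The sketch below makes that assumption explicit; the application does not specify the synthesis rule, and the function name `synthesize` is hypothetical.

```python
from typing import Iterable, Tuple

Vec = Tuple[float, float]


def synthesize(vectors: Iterable[Vec]) -> Vec:
    """Combine several direction vectors into one by component-wise addition.

    Assumption: synthesis means vector addition; a weighted sum or a mean
    would be equally easy to substitute here.
    """
    xs, ys = zip(*vectors)
    return (sum(xs), sum(ys))


combined = synthesize([(1.0, 0.0), (0.0, 2.0)])
```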
Further, the device further comprises:
a first retrieval range construction unit, configured to take the connection point of two direction vectors that are connected in the sequence as a node, and to bring the content at the node or within the range of the node into a first retrieval range;
a second retrieval range construction unit, configured to combine the obtained first retrieval ranges to obtain a second retrieval range, wherein at least one content in one first retrieval range has a direct connection relationship with content in another first retrieval range;
a second retrieval unit, configured to obtain a sub-knowledge graph using the content in the second retrieval range; and
a third retrieval unit, configured to retrieve in the sub-knowledge graph using the pointing vector to obtain a retrieval result.
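The construction of the retrieval ranges and the sub-knowledge graph can be sketched as below. The sketch assumes that every content has a two-dimensional position, that "within the range of the node" means within a fixed radius, and that the knowledge graph is a set of undirected edges between content names; `first_ranges`, `second_range` and `sub_graph` are hypothetical names.

```python
from math import dist
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Edge = Tuple[str, str]


def first_ranges(nodes: List[Point], positions: Dict[str, Point],
                 radius: float) -> List[Set[str]]:
    """One first retrieval range per node: contents within radius of it."""
    return [{c for c, p in positions.items() if dist(p, n) <= radius} for n in nodes]


def second_range(ranges: List[Set[str]], edges: Set[Edge]) -> Set[str]:
    """Union of the first ranges that are directly linked to another range."""
    def linked(a: Set[str], b: Set[str]) -> bool:
        return any((x, y) in edges or (y, x) in edges for x in a for y in b)

    merged: Set[str] = set()
    for i, a in enumerate(ranges):
        if any(linked(a, b) for j, b in enumerate(ranges) if j != i):
            merged |= a
    return merged


def sub_graph(contents: Set[str], edges: Set[Edge]) -> Set[Edge]:
    """Edges of the knowledge graph whose endpoints both lie in contents."""
    return {(x, y) for x, y in edges if x in contents and y in contents}


positions = {"a": (0.0, 1.0), "b": (10.0, 1.0), "c": (50.0, 51.0), "d": (100.0, 100.0)}
edges = {("a", "b")}
ranges = first_ranges([(0.0, 0.0), (10.0, 0.0), (50.0, 50.0)], positions, radius=2.0)
merged = second_range(ranges, edges)
```

In this example the range around the third node is dropped because none of its contents is directly connected to a content in another range, so the sub-knowledge graph is built from `a` and `b` only.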
Further, the device further comprises:
a first calculation unit, configured to calculate the association degree between each content in the sub-knowledge graph and the direction vector, and to group the contents in the sub-knowledge graph according to the association degree to obtain a plurality of content groups;
an ordering unit, configured to order the content groups according to the association degree, wherein the higher the association degree of a content group, the earlier its position in the sequence; and
a fourth retrieval unit, configured to obtain a retrieval result from at least the leading content group of the sequence.
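Grouping and ordering by association degree can be sketched as follows. The association degrees are assumed to be precomputed scores, and grouping is assumed to be done with fixed-width bins; neither the scoring function nor the bin width `bin_width` comes from the application.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_association(scores: Dict[str, float],
                         bin_width: float = 0.25) -> List[List[str]]:
    """Group contents into score bins and order the groups best-first.

    A higher association degree gives an earlier position in the result.
    """
    bins: Dict[int, List[str]] = defaultdict(list)
    for content, score in scores.items():
        bins[int(score // bin_width)].append(content)
    return [sorted(bins[k]) for k in sorted(bins, reverse=True)]


groups = group_by_association({"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.1})
first_group = groups[0]
```

The retrieval result would then be looked for in `first_group` first, and in later groups only if needed.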
Further, the device further comprises:
a fourth processing unit, configured to obtain a plurality of contents in the first content group;
a fifth processing unit, configured to order the contents according to their relationships and to construct a contrast pointing vector from the ordered contents;
a second calculation unit, configured to calculate the similarity between the pointing vector and the contrast pointing vector;
a sixth processing unit, configured to take the ordered contents as the retrieval result when the similarity between the pointing vector and the contrast pointing vector is greater than or equal to a set threshold;
a seventh processing unit, configured to split and reconstruct the contrast pointing vector when the similarity between the pointing vector and the contrast pointing vector is smaller than the set threshold, wherein the similarity between the reconstructed contrast pointing vector and the pointing vector is greater than or equal to the set threshold; and
a result supplementing unit, configured to supplement the retrieval result using the newly added part of the reconstructed contrast pointing vector.
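The threshold comparison can be sketched as below, assuming cosine similarity as the similarity measure and vectors flattened to numeric sequences of equal length. The application does not name a similarity measure; `reconstruct` is a hypothetical callback that returns the rebuilt contrast pointing vector together with the contents it newly adds.

```python
from math import sqrt
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b; 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_from_group(
    pointing: Vector,
    contrast: Vector,
    ordered_contents: List[str],
    reconstruct: Callable[[Vector], Tuple[Vector, List[str]]],
    threshold: float = 0.8,
) -> List[str]:
    """Accept the ordered contents, or rebuild the contrast vector first."""
    if cosine_similarity(pointing, contrast) >= threshold:
        return list(ordered_contents)
    rebuilt, added = reconstruct(contrast)  # split-and-reconstruct step
    if cosine_similarity(pointing, rebuilt) < threshold:
        raise ValueError("reconstructed vector is still below the threshold")
    # Supplement the result with the newly added part.
    return list(ordered_contents) + list(added)
```

The default threshold of 0.8 is an arbitrary illustrative value; the application only refers to "a set threshold".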
In one example, the units in any of the above apparatuses may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), one or more field-programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.
For another example, when the units in the apparatus are implemented in the form of a processing element scheduling a program, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke a program. For another example, the units may be integrated together and implemented in the form of a system-on-a-chip (SOC).
Various objects, such as messages, information, devices, network elements, systems, actions, operations, processes and concepts, may be named in the present application. It should be understood that these specific names do not limit the related objects and may change with the scenario, context or usage habit; the technical meaning of the technical terms in the present application should be determined mainly from the functions and technical effects that they embody or perform in the technical solution.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It should also be understood that, in the various embodiments of the present application, "first", "second", etc. are merely intended to indicate that multiple objects are different. For example, the first time window and the second time window merely denote different time windows, and the terms have no effect on the time windows themselves; "first", "second", etc. should not impose any limitation on the embodiments of the present application.
It is also to be understood that in the various embodiments of the application, where no special description or logic conflict exists, the terms and/or descriptions between the various embodiments are consistent and may reference each other, and features of the various embodiments may be combined to form new embodiments in accordance with their inherent logic relationships.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a computer-readable storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned computer-readable storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The application also provides a data processing system based on the knowledge graph, which comprises:
one or more memories for storing instructions; and
One or more processors configured to invoke and execute the instructions from the memory to perform the method as set forth above.
The present application also provides a computer program product comprising instructions which, when executed, cause the data processing system to perform the operations of the above method that correspond to the data processing system.
The present application also provides a chip system comprising a processor for implementing the functions involved in the above, e.g. generating, receiving, transmitting, or processing data and/or information involved in the above method.
The chip system can be composed of chips, and can also comprise chips and other discrete devices.
The processor referred to in any of the foregoing may be a CPU, microprocessor, ASIC, or integrated circuit that performs one or more of the procedures for controlling the transmission of feedback information described above.
In one possible design, the system on a chip also includes memory to hold the necessary program instructions and data. The processor and the memory may be decoupled, and disposed on different devices, respectively, and connected by wired or wireless means, so as to support the chip system to implement the various functions in the foregoing embodiments. Or the processor and the memory may be coupled to the same device.
Optionally, the computer instructions are stored in a memory.
Alternatively, the memory may be a storage unit in the chip, such as a register, a cache, etc., and the memory may also be a storage unit in the terminal located outside the chip, such as a ROM or other type of static storage device, a RAM, etc., that may store static information and instructions.
It will be appreciated that the memory in the present application can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
The non-volatile memory may be a ROM, a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a RAM, which acts as an external cache. There are many different types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
The embodiments described above are all preferred embodiments of the present application and are not intended to limit its scope of protection; therefore, all equivalent changes made according to the structure, shape and principle of the application shall fall within the scope of protection of the application.
Claims (9)
1. The data processing method based on the knowledge graph is characterized by comprising the following steps of:
responding to the obtained sentence data, and analyzing the sentence data to obtain a plurality of phrases;
Assigning a direction vector to each phrase;
Connecting the obtained multiple direction vectors end to end according to the sequence to obtain a pointing vector;
Searching in the knowledge graph by using the pointing vector to obtain a search result, wherein the surrounding areas of the pointing vector are all included in a first search range, and the search result is obtained according to the content in the first search range;
Assigning a direction vector to each phrase includes:
Dividing statement data into multiple units to obtain multiple statement units, wherein the length of each statement unit obtained by each division is the same;
analyzing the sentence units obtained by each division to obtain corresponding meanings of the sentence units, wherein each sentence unit corresponds to at least one meaning;
Obtaining a semantic group of sentence data according to the division and a meaning corresponding to each sentence unit;
Screening semantic groups with the highest occurrence frequency and the same expression meaning or similar expression meaning from the semantic groups, and marking the semantic groups as meaning semantic groups; and
Each phrase is assigned a direction vector according to the meaning semantic group.
2. The knowledge-graph-based data processing method according to claim 1, wherein when one sentence unit corresponds to a plurality of meanings, a plurality of direction vectors are assigned to the sentence unit, each direction vector corresponding to one meaning of the sentence unit.
3. The knowledge-graph-based data processing method according to claim 2, wherein the plurality of direction vectors assigned to the sentence unit are synthesized, and the plurality of direction vectors assigned to the sentence unit are combined into one direction vector.
4. A knowledge-graph based data processing method according to any one of claims 1 to 3, characterized in that the retrieval in the knowledge-graph using the pointing vector comprises:
Taking the connection point of two direction vectors that are connected in the sequence as a node, and bringing the content at the node or within the range of the node into a first search range;
combining the obtained first search ranges to obtain a second search range, wherein at least one content of the first search range has a direct connection relationship with the content in the other first search range;
obtaining a sub-knowledge graph by using the content in the second retrieval range;
and searching in the sub-knowledge graph by using the pointing vector to obtain a search result.
5. The knowledge-graph based data processing method of claim 4, wherein retrieving in the sub-knowledge-graph using the pointing vector comprises:
Calculating the association degree of each content in the sub-knowledge graph and the direction vector, and grouping the content in the sub-knowledge graph according to the association degree to obtain a plurality of content groups;
Sorting the content packets according to the association degree, wherein the association degree of the content packets is inversely related to the position in the sequence;
The search result is obtained in at least the preceding content packet of the sequential sequence.
6. The knowledge-graph based data processing method of claim 5, wherein obtaining a search result in at least a previous content packet of the sequential sequence comprises:
Obtaining a plurality of content in a first content packet;
Sorting the contents according to the relation and constructing a comparison pointing vector according to the sorted contents;
Calculating the similarity of the pointing vector and the contrast pointing vector;
when the similarity between the pointing vector and the contrast pointing vector is larger than or equal to a set threshold value, the ordered content is used as a retrieval result;
Splitting and reconstructing the contrast pointing vector when the similarity of the pointing vector and the contrast pointing vector is smaller than a set threshold, wherein the similarity of the reconstructed contrast pointing vector and the pointing vector is larger than or equal to the set threshold;
and supplementing the search result by using the newly added part in the reconstructed contrast pointing vector.
7. A knowledge-graph-based data processing apparatus, comprising:
The first analysis unit is used for responding to the acquired sentence data and analyzing the sentence data to obtain a plurality of phrases;
A first processing unit for assigning a direction vector to each phrase;
the second processing unit is used for connecting the obtained multiple direction vectors end to end according to the sequence to obtain a pointing vector;
The first retrieval unit is used for retrieving in the knowledge graph by using the pointing vector to obtain a retrieval result, the surrounding areas of the pointing vector are all included in a first retrieval range, and the retrieval result is obtained according to the content in the first retrieval range;
the dividing unit is used for dividing the sentence data into multiple units to obtain multiple sentence units, and the length of each sentence unit obtained by each division is the same;
the second analysis unit is used for analyzing the sentence units obtained by each division to obtain the corresponding meaning of the sentence units, and each sentence unit corresponds to at least one meaning;
The third processing unit is used for obtaining a semantic group of the sentence data according to the division and a meaning corresponding to each sentence unit;
The screening unit is used for screening semantic groups with the highest occurrence number and the same or similar expression meaning from the semantic groups, and marking the semantic groups as meaning semantic groups; and
And the giving unit is used for giving a direction vector to each phrase according to the meaning semantic group.
8. A knowledge-graph-based data processing system, the system comprising:
one or more memories for storing instructions; and
One or more processors to invoke and execute the instructions from the memory to perform the method of any of claims 1 to 6.
9. A computer-readable storage medium, the computer-readable storage medium comprising:
a program which, when executed by a processor, performs the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410226896.9A CN117807252B (en) | 2024-02-29 | 2024-02-29 | Knowledge graph-based data processing method, device and system and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117807252A (en) | 2024-04-02 |
CN117807252B (en) | 2024-04-30 |
Family
ID=90430336
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410226896.9A Active CN117807252B (en) | 2024-02-29 | 2024-02-29 | Knowledge graph-based data processing method, device and system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117807252B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112632224A (en) * | 2020-12-29 | 2021-04-09 | 天津汇智星源信息技术有限公司 | Case recommendation method and device based on case knowledge graph and electronic equipment |
WO2021092099A1 (en) * | 2019-11-05 | 2021-05-14 | Epacca, Inc. | Mechanistic causal reasoning for efficient analytics and natural language |
CN112883172A (en) * | 2021-02-03 | 2021-06-01 | 大连理工大学 | Biomedical question-answering method based on dual knowledge selection |
CN113761151A (en) * | 2021-05-07 | 2021-12-07 | 腾讯科技(深圳)有限公司 | Synonym mining method, synonym mining device, synonym question answering method, synonym question answering device, computer equipment and storage medium |
KR20220066554A (en) * | 2020-11-16 | 2022-05-24 | 주식회사 포티투마루 | Method, apparatus and computer program for buildding knowledge graph using qa model |
CN114564966A (en) * | 2022-03-04 | 2022-05-31 | 中国科学院地理科学与资源研究所 | Spatial relation semantic analysis method based on knowledge graph |
CN115683129A (en) * | 2023-01-04 | 2023-02-03 | 苏州尚同墨方智能科技有限公司 | Long-term repositioning method and device based on high-definition map |
CN115964528A (en) * | 2022-12-19 | 2023-04-14 | 中科(厦门)数据智能研究院 | Picture retrieval optimization algorithm based on street view retrieval |
CN115982338A (en) * | 2023-02-24 | 2023-04-18 | 中国测绘科学研究院 | Query path ordering-based domain knowledge graph question-answering method and system |
CN116152938A (en) * | 2021-11-18 | 2023-05-23 | 腾讯科技(深圳)有限公司 | Method, device and equipment for training identity recognition model and transferring electronic resources |
CN116595971A (en) * | 2022-03-03 | 2023-08-15 | 平安健康保险股份有限公司 | Knowledge graph construction method and device, storage medium and computer equipment |
CN116737915A (en) * | 2023-08-16 | 2023-09-12 | 中移信息系统集成有限公司 | Semantic retrieval method, device, equipment and storage medium based on knowledge graph |
CN116860996A (en) * | 2023-07-05 | 2023-10-10 | 安徽华云安科技有限公司 | Method, device, equipment and storage medium for constructing three-dimensional knowledge graph |
CN116910633A (en) * | 2023-09-14 | 2023-10-20 | 北京科东电力控制系统有限责任公司 | Power grid fault prediction method based on multi-modal knowledge mixed reasoning |
CN116958342A (en) * | 2023-05-15 | 2023-10-27 | 腾讯科技(深圳)有限公司 | Method for generating actions of virtual image, method and device for constructing action library |
CN117521804A (en) * | 2023-10-31 | 2024-02-06 | 内蒙古电力(集团)有限责任公司包头供电分公司 | Relay protection hidden fault identification and reasoning method based on knowledge graph |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11710049B2 (en) * | 2020-12-16 | 2023-07-25 | Ro5 Inc. | System and method for the contextualization of molecules |
US20230245654A1 (en) * | 2022-01-31 | 2023-08-03 | Meta Platforms, Inc. | Systems and Methods for Implementing Smart Assistant Systems |
Non-Patent Citations (4)
Title |
---|
Domain-specific Knowledge Graphs: A survey;Bilal Abu-Salih;《arXiv:2011.00235v3》;20201031;1-38 * |
基于记忆建模的深度学习模型及其在问答系统中的应用;潘永华;《中国优秀硕士学位论文全文数据库 信息科技辑》;20200115(第01期);I138-2515 * |
基于预训练语言模型的中文知识图谱问答研究;张天杭;《中国优秀硕士学位论文全文数据库 信息科技辑》;20220115(第01期);I138-3340 * |
面向新兴技术追踪预测的专利数据组织与知识发现研究;魏明珠;《 万方数据知识服务平台》;20230823;1-205 * |
Also Published As
Publication number | Publication date |
---|---|
CN117807252A (en) | 2024-04-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8655805B2 (en) | Method for classification of objects in a graph data stream | |
US7933282B1 (en) | Packet classification device for storing groups of rules | |
US7308446B1 (en) | Methods and apparatus for regular expression matching | |
Delling et al. | Hub label compression | |
US20130339352A1 (en) | Shortest path computation in large networks | |
JP6613475B2 (en) | Route inquiry method, apparatus, device, and non-volatile computer storage medium | |
US20150019592A1 (en) | Systems, methods and software for computing reachability in large graphs | |
CN100442255C (en) | Associative memory with entry groups and skip operations | |
JP2018531379A6 (en) | Route inquiry method, apparatus, device, and non-volatile computer storage medium | |
US10747741B2 (en) | Mechanism for efficient storage of graph data | |
Wang et al. | A fast online spanner for roadmap construction | |
CN111274455B (en) | Graph data processing method and device, electronic equipment and computer readable medium | |
CN117807252B (en) | Knowledge graph-based data processing method, device and system and storage medium | |
Nagayama et al. | An efficient heuristic for linear decomposition of index generation functions | |
US10169469B2 (en) | System and method for searching using orthogonal codes | |
Brisaboa et al. | Using Compressed Suffix-Arrays for a compact representation of temporal-graphs | |
Djordjevic et al. | Detecting regular visit patterns | |
CN112100313B (en) | Data indexing method and system based on finest granularity segmentation | |
Abam et al. | Kinetic spanners in Rd | |
Ben-Ari et al. | On a local version of the bak–sneppen model | |
CN112465514A (en) | Block chain-based layered transaction parallel execution method and system | |
CN109726328A (en) | Information acquisition method, device, electronic equipment and computer readable storage medium | |
CN116541421B (en) | Address query information generation method and device, electronic equipment and computer medium | |
US6839799B2 (en) | Method for the prioritization of database entries | |
Jung et al. | Processing continuous range queries with non-spatial selections |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||