CN114328847A - Data processing method and system based on knowledge graph

Info

Publication number: CN114328847A
Application number: CN202210252888.2A
Authority: CN
Inventors: 周柳阳; 蒋林林
Original assignee: Shenzhen Yihao Hulian Technology Co ltd
Current assignee: Shenzhen Yihao Hulian Technology Co ltd
Priority date: 2022-03-15
Filing date: 2022-03-15
Publication date: 2022-04-12
Anticipated expiration: 2042-03-15
Also published as: CN114328847B

Abstract

The invention relates to the technical field of data processing and specifically discloses a data processing method and system based on a knowledge graph. The method comprises: receiving a query statement input by a user, performing content identification on the query statement, and determining a word table to be checked; converting the word table to be checked into a reference word table according to a trained mapping relation, traversing a preset query database based on the reference word table, and querying the query index of each element in the reference word table; acquiring account information of the user, determining related users based on the account information, and determining an expansion table of the reference word table according to the query records of the related users; and finally generating a query graph. By identifying the content of the query statement, acquiring the user's account information, determining the related users and the expansion table, and then generating the query graph, the invention provides push content that is better suited to the user.

Description

Data processing method and system based on knowledge graph

Technical Field

The invention relates to the technical field of data processing, in particular to a data processing method and system based on a knowledge graph.

Background

Internet advertising targets every computer terminal connected to the internet and delivers information about products and services worldwide; this global coverage is beyond the reach of other advertising media. In addition, internet advertising combines text, sound, images, color and music, shares many of the advantages of newspapers and television, and is more attractive to audiences. As a result, the volume of internet advertising keeps growing.

With the spread of computer equipment, everyone comes into contact with the internet, actively or passively, and receives a great many advertisements every day. One class of advertisements appears while the user is actively searching, particularly while shopping: when the user searches for potatoes, for example, the related products shown are advertisements. Such advertisements bring high returns, and their cost is correspondingly high; it is conceivable that where the advertising cost is high, the profit on the goods is not low either. Moreover, because pushing related products is paid for, the larger the budget, the higher a related product is ranked. As a result, the related products shown for a search may have little or nothing to do with the product searched for, which also weakens the advertising effect. How to optimize the pushing of related products is therefore the technical problem that the technical scheme of the invention sets out to solve.

Disclosure of Invention

The present invention is directed to a method and a system for processing data based on a knowledge graph, so as to solve the problems set forth in the background art.

In order to achieve the purpose, the invention provides the following technical scheme:

a method of data processing based on a knowledge-graph, the method comprising:

receiving a query request containing a query statement input by a user, performing content identification on the query statement, and determining a word table to be checked; wherein the tail of the query statement is a statement label;

converting the word table to be checked into a reference word table according to the trained mapping relation, traversing a preset query database based on the reference word table, and querying the query index of each element in the reference word table;

acquiring account information of the user, determining related users based on the account information, reading the query records of the related users, which contain time information, and determining an expansion table of the reference word table according to the query records;

and determining the weight of each element of the word table to be checked according to the query index, and generating and displaying a query graph according to the weight and the expansion table.

As a further scheme of the invention: the step of receiving a query request containing a query statement input by a user, performing content identification on the query statement, and determining the word table to be checked comprises:

receiving a query request containing a query statement input by a user, and determining a statement architecture according to the statement label of the query statement;

reading the query statement, and inputting the query statement and the statement architecture into a trained part-of-speech analysis model to obtain a query statement containing part-of-speech tags;

inputting the query statement containing part-of-speech tags into a trained sensitivity analysis model, and performing content conversion on the query statement according to the sensitivity analysis result;

and extracting keywords from the content-converted query statement according to the part-of-speech tags, and determining the word table to be checked.

As a further scheme of the invention: the step of converting the word table to be checked into a reference word table according to the trained mapping relation, traversing a preset query database based on the reference word table, and querying the query index of each element in the reference word table comprises:

sequentially extracting the elements of the word table to be checked, traversing a trained synonym library based on each element, and extracting its synonym group;

extracting one synonym from the synonym group according to a preset extraction rule to serve as the reference word of the element;

collecting the reference words to obtain the reference word table;

traversing a preset query database based on the reference word table, and querying the query frequency and the latest query time of each element in the reference word table;

wherein the element items of the query database are determined by the preset extraction rule.

As a further scheme of the invention: the step of acquiring account information of the user and determining the relevant user based on the account information comprises the following steps:

acquiring identity information in account information of a user, and determining a user detection range according to the identity information;

acquiring browsing information in account information of a user, and determining the type of the user according to the browsing information;

inquiring related users according to the user types within the user detection range to obtain a related user table;

and acquiring social information in the account information of the user, and correcting the related user table according to the social information.

As a further scheme of the invention: the step of obtaining social information in the account information of the user and correcting the relevant user table according to the social information comprises the following steps:

acquiring the reading authority of a user terminal, and acquiring the screen occupation time of each App in a preset time range based on the reading authority;

extracting a preset number of apps based on the screen occupation time, and acquiring a communication list in the extracted apps;

reading the related user table, performing logic operation on the related user table and the communication list, and determining repeated users and non-repeated users between the communication list and the related user table;

and marking repeated users in the related user table, and inserting non-repeated users into the related user table.

As a further scheme of the invention: the step of reading the query records of the related users, which contain time information, and determining the expansion table of the reference word table according to the query records comprises:

acquiring account information of a related user, and acquiring a query record of the related user according to the account information; the query records contain time information, and are converted through a trained mapping relation;

sequencing the query records of the same related user according to the time information, and positioning the query contents appearing in the reference word table in the sequenced query records to be used as target words;

extracting a preset number of query contents from the sorted query records by taking the target word as a center;

and expanding the reference word table according to the extracted query content.

As a further scheme of the invention: the step of expanding the reference word table according to the extracted query content comprises the following steps:

calculating the time difference between the extracted query content and the target word;

converting the time difference into a step length according to a preset conversion formula;

establishing an expansion table with the target word as an index; the expansion table comprises query content items and step length items;

wherein the target word is an element that appears in both the query record and the reference word table.

The technical scheme of the invention also provides a data processing system based on the knowledge graph, which comprises:

a to-be-checked word table determining module, configured to receive a query request containing a query statement input by a user, perform content identification on the query statement, and determine the word table to be checked; wherein the tail of the query statement is a statement label;

a query index determining module, configured to convert the word table to be checked into a reference word table according to the trained mapping relation, traverse a preset query database based on the reference word table, and query the query index of each element in the reference word table;

an expansion table determining module, configured to acquire account information of the user, determine related users based on the account information, read the query records of the related users, which contain time information, and determine an expansion table of the reference word table according to the query records;

and a query graph generation module, configured to determine the weight of each element of the word table to be checked according to the query index, and to generate and display the query graph according to the weight and the expansion table.

As a further scheme of the invention: the expansion table determining module includes:

the record query unit is used for acquiring account information of a related user and acquiring a query record of the related user according to the account information; the query records contain time information, and are converted through a trained mapping relation;

the sequencing unit is used for sequencing the query records of the same related user according to the time information, positioning the query contents appearing in the reference word table in the sequenced query records and taking the query contents as target words;

a content extraction unit, configured to extract a preset number of query contents from the sorted query records with a target word as a center;

and the processing execution unit is used for expanding the reference word table according to the extracted query content.

As a further scheme of the invention: the processing execution unit includes:

a calculating subunit, configured to calculate a time difference between the extracted query content and the target word;

the conversion subunit is used for converting the time difference into a step length according to a preset conversion formula;

the table building subunit is used for building an expansion table indexed by the target word; the expansion table comprises query content items and step length items.

Compared with the prior art, the invention has the following beneficial effects: it performs content identification on the query statement input by the user, acquires the user's account information, determines the related users and the expansion table, determines the weight of each element of the word table to be checked according to the content identification result, and generates and displays the query graph according to the weight and the expansion table, thereby providing push content that is better suited to the user.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention.

FIG. 1 shows a flow diagram of a method of data processing based on a knowledge-graph.

FIG. 2 illustrates a first sub-flow block diagram of a method of knowledge-graph based data processing.

FIG. 3 illustrates a second sub-flow block diagram of a method of knowledge-graph based data processing.

FIG. 4 illustrates a third sub-flow block diagram of a knowledge-graph based data processing method.

FIG. 5 illustrates a fourth sub-flow block diagram of a knowledge-graph based data processing method.

FIG. 6 illustrates a block diagram of a component architecture of a knowledge-graph based data processing system.

FIG. 7 is a block diagram illustrating the components of an expansion table determination module in a knowledge-graph based data processing system.

FIG. 8 is a block diagram illustrating the components of the processing execution unit in the expansion table determination module.

Detailed Description

In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Example 1

Fig. 1 shows a flow chart of a data processing method based on a knowledge graph, and in an embodiment of the present invention, the data processing method based on the knowledge graph includes steps S100 to S400:

Step S100: receiving a query request containing a query statement input by a user, performing content identification on the query statement, and determining a word table to be checked; wherein the tail of the query statement is a statement label;

The technical scheme of the invention receives a user's query request and outputs the data feedback in the form of a knowledge graph. Step S100 is the data receiving process: it receives the query request and the query statement input by the user. Performing content identification on the query statement extracts its key sentences and keywords, from which the word table to be checked is determined. It should be noted that the statement label may be the punctuation mark at the end of the sentence: if the label is a question mark, the statement is an interrogative sentence; if it is a period, the statement is a declarative sentence; if there is no mark, the statement is treated as a phrase by default.
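The label rule described for step S100 can be sketched as follows. This is a minimal illustration only: the function name `split_statement_label` and the inclusion of full-width punctuation are assumptions, and the patent names nothing beyond the question mark, the period, and the no-punctuation default.

```python
from typing import Dict, Tuple

# Hypothetical label-to-architecture mapping; full-width forms are included
# on the assumption that queries may be typed in Chinese.
_LABELS: Dict[str, str] = {
    "?": "interrogative",
    "？": "interrogative",
    ".": "declarative",
    "。": "declarative",
}


def split_statement_label(query: str) -> Tuple[str, str]:
    """Return (statement body, statement architecture) for a query statement."""
    text = query.strip()
    if text and text[-1] in _LABELS:
        # The last character is the statement label; the rest is the body.
        return text[:-1].rstrip(), _LABELS[text[-1]]
    # No trailing punctuation: treated as a phrase by default.
    return text, "phrase"
```

For example, `split_statement_label("where to buy potatoes?")` yields the body `"where to buy potatoes"` with the architecture `"interrogative"`, while a bare `"potatoes"` is classified as a phrase.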

Step S200: converting the word table to be checked into a reference word table according to the trained mapping relation, traversing a preset query database based on the reference word table, and querying the query index of each element in the reference word table;

The purpose of step S200 is to normalize the word table to be checked. Words to be checked come in many varieties, and recording every one of them would burden the server with redundant data; replacing words of similar meaning with a single unified word effectively improves the utilization of computing resources.

It is worth mentioning that the query database contains the query information of all users. Nearly all current servers provide data storage, and even where the capacity of the storage device is limited, storage can be realized by means of the cloud. The daily hot-topic push function found in many Apps is a practical application of such a query database.

Step S300: acquiring account information of a user, determining a related user based on the account information, reading a query record containing time information of the related user, and determining an expansion table of a reference word table according to the query record;

The purpose of step S300 is to expand the reference word table. The principle of the expansion is to find users who are highly similar to, or closely connected with, the user, and to build an index over their search content that relates to the query statement.

Step S400: determining the weight of each element of the word table to be checked according to the query index, and generating and displaying a query graph according to the weight and the expansion table.

A preferred way of displaying the query index is as a graph: if a circle represents an element of the word table to be checked, the weight determines the size of that circle, and the entries of the expansion table are correspondingly drawn as small circles connected to it.
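One way to picture the display described for step S400 is the small sketch below. The patent does not give a weight formula, so the normalization by the largest query frequency, the fixed size `SMALL_CIRCLE`, and both function names are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

SMALL_CIRCLE = 0.1  # assumed fixed size for expansion-table circles


def weights_from_frequency(frequency: Dict[str, int]) -> Dict[str, float]:
    """Assumed weighting: query frequency scaled by the largest frequency."""
    top = max(frequency.values(), default=0)
    return {word: (count / top if top else 0.0) for word, count in frequency.items()}


def build_query_graph(
    weights: Dict[str, float],
    expansion: Dict[str, List[str]],
) -> Tuple[Dict[str, float], List[Tuple[str, str]]]:
    """Return (node -> circle size, edges) for the query graph."""
    nodes: Dict[str, float] = dict(weights)  # large circles, sized by weight
    edges: List[Tuple[str, str]] = []
    for word in weights:
        for content in expansion.get(word, []):
            nodes.setdefault(content, SMALL_CIRCLE)  # small connected circle
            edges.append((word, content))
    return nodes, edges
```

A rendering layer could then draw each node with a radius proportional to its size and connect the pairs listed in `edges`.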

Fig. 2 shows a first sub-flow block diagram of the data processing method based on a knowledge graph. The step of receiving a query request containing a query statement input by a user, performing content identification on the query statement, and determining the word table to be checked includes steps S101 to S104:

Step S101: receiving a query request containing a query statement input by a user, and determining a statement architecture according to the statement label of the query statement;

Step S102: reading the query statement, and inputting the query statement and the statement architecture into a trained part-of-speech analysis model to obtain a query statement containing part-of-speech tags;

Step S103: inputting the query statement containing part-of-speech tags into a trained sensitivity analysis model, and performing content conversion on the query statement according to the sensitivity analysis result;

Step S104: extracting keywords from the content-converted query statement according to the part-of-speech tags, and determining the word table to be checked.

Steps S101 to S104 specify how the word table to be checked is generated. First, the statement architecture is determined from the statement label; if the query statement is interrogative, for example, the modal particles in it can be removed. Knowing the statement architecture makes the determination of the word table to be checked more regular.

Then, part-of-speech analysis is performed on the query statement, marking its content as nouns, verbs, adverbs, and so on. Finally, sensitivity analysis is performed on the words; words with inappropriate content are marked as invalid and excluded from subsequent operations. Collecting the words of the query statement after part-of-speech analysis and desensitization yields the word table to be checked.
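Steps S101 to S104 can be sketched end to end as below. The trained part-of-speech and sensitivity models are replaced here by tiny lookup tables, whitespace tokenization stands in for real word segmentation, and every name (`POS_LEXICON`, `SENSITIVE_WORDS`, `KEPT_TAGS`, and the functions) is hypothetical; it shows the data flow only.

```python
from typing import Dict, List, Set, Tuple

# Stand-ins for the trained models (assumptions, not the patent's models).
POS_LEXICON: Dict[str, str] = {
    "where": "adverb", "to": "particle", "buy": "verb",
    "cheap": "adjective", "potato": "noun",
}
SENSITIVE_WORDS: Set[str] = {"contraband"}
KEPT_TAGS: Set[str] = {"noun", "verb", "adjective"}  # assumed keyword classes


def tag_parts_of_speech(tokens: List[str]) -> List[Tuple[str, str]]:
    """S102: attach a part-of-speech tag to every token."""
    return [(token, POS_LEXICON.get(token, "unknown")) for token in tokens]


def desensitize(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """S103: mark inappropriate words as invalid so they are skipped later."""
    return [(t, "invalid" if t in SENSITIVE_WORDS else tag) for t, tag in tagged]


def word_table_to_be_checked(query: str) -> List[str]:
    """S101 to S104: strip the statement label, tag, desensitize, extract."""
    tokens = query.lower().rstrip("?.").split()
    tagged = desensitize(tag_parts_of_speech(tokens))
    return [token for token, tag in tagged if tag in KEPT_TAGS]
```

With these stand-in tables, the query `"Where to buy cheap potato?"` reduces to the keywords `buy`, `cheap` and `potato`.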

Fig. 3 shows a second sub-flow block diagram of the data processing method based on a knowledge graph. The step of converting the word table to be checked into a reference word table according to the trained mapping relation, traversing a preset query database based on the reference word table, and querying the query index of each element in the reference word table includes steps S201 to S204:

Step S201: sequentially extracting the elements of the word table to be checked, traversing a trained synonym library based on each element, and extracting its synonym group;

Step S202: extracting one synonym from the synonym group according to a preset extraction rule to serve as the reference word of the element;

Step S203: collecting the reference words to obtain the reference word table;

Step S204: traversing a preset query database based on the reference word table, and querying the query frequency and the latest query time of each element in the reference word table.

Steps S201 to S204 provide a scheme for determining the query index. The elements of the word table to be checked are converted into unified synonyms to generate the reference word table, and the query index of each element of the reference word table is then determined in the preset query database. The query index comprises the query frequency and the latest query time; when the number of users is large enough, the difference between the latest query time and the current time is almost zero, so the query frequency is the query index that really matters.

It should be noted that the preset extraction rule is chosen manually, and many kinds of rule are possible: for example, taking the first element in alphabetical order as the reference word, or taking the first element in stroke order. Which specific rule is used is therefore unimportant; what is essential is that the same rule is applied consistently.
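The normalization and lookup in steps S201 to S204 might look like the following sketch. The synonym groups, the in-memory `database` dictionary and all names are illustrative assumptions; the extraction rule used is the alphabetical-order rule mentioned above.

```python
from datetime import datetime
from typing import Dict, List, NamedTuple, Optional, Set

# Hypothetical synonym library; a real one would be trained or curated.
SYNONYM_GROUPS: List[Set[str]] = [
    {"potato", "spud", "tater"},
    {"cheap", "inexpensive", "budget"},
]


class QueryIndex(NamedTuple):
    frequency: int
    latest: Optional[datetime]


def reference_word(word: str) -> str:
    """S201/S202: pick the alphabetically first synonym as the reference word."""
    for group in SYNONYM_GROUPS:
        if word in group:
            return min(group)
    return word  # no synonym group known: the word is its own reference


def build_reference_table(words: List[str]) -> Dict[str, str]:
    """S203: map each word to be checked onto its reference word."""
    return {word: reference_word(word) for word in words}


def lookup_query_indexes(
    reference_table: Dict[str, str],
    database: Dict[str, QueryIndex],
) -> Dict[str, QueryIndex]:
    """S204: read query frequency and latest query time per reference word."""
    return {
        ref: database.get(ref, QueryIndex(0, None))
        for ref in reference_table.values()
    }
```

Because the same rule is applied to every group, `spud` and `tater` both resolve to `potato`, so the query database only needs one entry per synonym group.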

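As a preferred embodiment of the technical solution of the present invention, the step of acquiring account information of the user and determining related users based on the account information includes:

acquiring identity information in the account information of the user, and determining a user detection range according to the identity information;

acquiring browsing information in the account information of the user, and determining the user type according to the browsing information;

querying related users according to the user type within the user detection range to obtain a related user table;

acquiring social information in the account information of the user, and correcting the related user table according to the social information.

Further, the step of acquiring social information in the account information of the user and correcting the related user table according to the social information includes:

acquiring the reading authority of the user terminal, and acquiring the screen occupation time of each App within a preset time range based on the reading authority;

extracting a preset number of Apps based on the screen occupation time, and acquiring the communication lists in the extracted Apps;

reading the related user table, performing a logic operation on the related user table and the communication list, and determining the repeated users and non-repeated users between the communication list and the related user table;

marking the repeated users in the related user table, and inserting the non-repeated users into the related user table.

This example of the technical solution of the present invention specifies how the related users are determined. First, the user detection range is determined according to the identity information; the most important items of identity information are age and gender, although other factors such as education level may also be used. Next, the user type is determined according to the browsing information: if the information a user browses most often is sports information, for example, the user is classified as a sports fan. The related users are preliminarily determined from the user detection range and the user type, and are finally marked or supplemented according to the social information.

It is worth mentioning that users who both resemble the user and appear in the user's communication list, i.e. the marked repeated users mentioned above, have the highest priority.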
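The correction of the related user table by social information can be sketched with plain list and dictionary operations. The function names, the `contacts_by_app` structure and the use of a boolean flag to mark repeated users are assumptions made for illustration.

```python
from typing import Dict, List


def top_apps(screen_time: Dict[str, float], count: int) -> List[str]:
    """Pick the `count` Apps with the longest screen occupation time."""
    ranked = sorted(screen_time, key=lambda app: screen_time[app], reverse=True)
    return ranked[:count]


def communication_list(
    screen_time: Dict[str, float],
    contacts_by_app: Dict[str, List[str]],
    count: int,
) -> List[str]:
    """Merge the contact lists of the most-used Apps, without duplicates."""
    merged: List[str] = []
    for app in top_apps(screen_time, count):
        for user in contacts_by_app.get(app, []):
            if user not in merged:
                merged.append(user)
    return merged


def correct_related_users(related: List[str], contacts: List[str]) -> Dict[str, bool]:
    """Return user -> repeated flag; True marks a highest-priority repeated user."""
    contact_set = set(contacts)
    table = {user: user in contact_set for user in related}  # mark repeated users
    for user in contacts:
        table.setdefault(user, False)  # insert non-repeated users
    return table
```

Here the "logic operation" between the two lists is realized as a membership test: users found in both lists are flagged, and users found only in the communication list are appended.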

Fig. 4 shows a third sub-flow block diagram of the data processing method based on a knowledge graph. The step of reading the query records of the related users, which contain time information, and determining the expansion table of the reference word table according to the query records includes steps S301 to S304:

Step S301: acquiring account information of a related user, and acquiring the query records of the related user according to the account information; the query records contain time information and are converted through the trained mapping relation;

Step S302: sorting the query records of the same related user according to the time information, and locating, in the sorted query records, the query contents that appear in the reference word table as target words;

Step S303: extracting a preset number of query contents from the sorted query records, centered on the target word;

Step S304: expanding the reference word table according to the extracted query contents.

Steps S301 to S304 provide a specific technical scheme for generating the expansion table. Steps S301 and S302 determine whether a related user's query records contain an element of the reference word table and, if so, mark that element as a target word; it is worth mentioning that the query records of the related users are also processed through the trained mapping relation.

Steps S303 and S304 extract a preset number of query contents from the sorted query records, centered on the target word; both the sorting and the extraction are based on the time information.
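Steps S302 and S303 amount to a sort followed by a window around each target word, as in the sketch below. Interpreting the "preset number" as `radius` records on each side of the target is an assumption, as are the `Record` type and the function name.

```python
from datetime import datetime
from typing import List, Set, Tuple

# (query time, query content after the trained mapping has been applied)
Record = Tuple[datetime, str]


def extract_around_targets(
    records: List[Record],
    reference_words: Set[str],
    radius: int,
) -> List[Tuple[Record, List[Record]]]:
    """Return each target record with up to `radius` neighbours on each side."""
    ordered = sorted(records, key=lambda record: record[0])  # S302: sort by time
    result: List[Tuple[Record, List[Record]]] = []
    for i, (moment, content) in enumerate(ordered):
        if content not in reference_words:
            continue  # not a target word
        low = max(0, i - radius)
        window = ordered[low:i + radius + 1]
        # S303: keep the records around the target, excluding the target itself.
        neighbours = [rec for j, rec in enumerate(window, start=low) if j != i]
        result.append(((moment, content), neighbours))
    return result
```

Each returned pair holds the target record and the queries the related user made just before and after it, which are the candidates for expanding the reference word table.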

Fig. 5 shows a fourth sub-flow block diagram of the data processing method based on a knowledge graph. The step of expanding the reference word table according to the extracted query contents includes steps S3041 to S3043:

Step S3041: calculating the time difference between the extracted query content and the target word;

Step S3042: converting the time difference into a step length according to a preset conversion formula;

Step S3043: establishing an expansion table indexed by the target word; the expansion table comprises query content items and step length items.

The above content specifies the expansion process, whose purpose is to determine the step length. The final purpose of the technical scheme of the invention is to generate a query graph, and a query graph is determined by at least two parameters: the size of the nodes and the distance between the nodes. The weight gives the size of a node, and the step length gives the distance between nodes.
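Steps S3041 to S3043 can be illustrated as follows. The patent leaves the conversion formula unspecified, so the logarithmic formula in `time_to_step` is purely an assumed example, chosen so that queries made close in time to the target word land close to it in the graph; the function names are likewise hypothetical.

```python
import math
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Record = Tuple[datetime, str]  # (query time, query content)


def time_to_step(delta: timedelta) -> float:
    """Assumed conversion formula: 1 + ln(1 + |hours|)."""
    hours = abs(delta.total_seconds()) / 3600.0
    return 1.0 + math.log1p(hours)


def build_expansion_table(
    target: Record,
    neighbours: List[Record],
) -> Dict[str, List[Tuple[str, float]]]:
    """Expansion table indexed by the target word: (query content, step length)."""
    target_time, target_word = target
    rows = [
        (content, time_to_step(moment - target_time))  # S3041 + S3042
        for moment, content in neighbours
    ]
    return {target_word: sorted(rows, key=lambda row: row[1])}  # S3043
```

Any monotonically increasing formula would serve the same purpose; the only property relied on here is that a larger time difference gives a larger step length.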

It should be noted that the weights only need to distinguish the different elements of the word table to be checked from one another.

It should be noted that the reference word table and the word table to be checked are related by a mapping, and their element items correspond to each other. The expansion table is determined for the reference word table through the series of operations and comparisons described above, so it must then be converted back to the word table to be checked; given that the element items of the two tables correspond, this inverse conversion is not difficult.
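Given a stored correspondence between the two tables, the inverse conversion reduces to a lookup, as this minimal sketch suggests. The dictionary layout (word to be checked mapped to its reference word) and the function name are assumptions.

```python
from typing import Dict, List, Tuple

Expansion = Dict[str, List[Tuple[str, float]]]  # word -> (query content, step length)


def invert_expansion(reference_table: Dict[str, str], expansion: Expansion) -> Expansion:
    """Re-key an expansion table from reference words to the words to be checked."""
    return {
        word: list(expansion.get(ref, []))  # copy so the tables stay independent
        for word, ref in reference_table.items()
    }
```

Words whose reference word has no expansion entries simply receive an empty list.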

Example 2

Fig. 6 is a block diagram illustrating a composition structure of a data processing system based on a knowledge-graph, and in an embodiment of the present invention, the data processing system based on a knowledge-graph includes:

a to-be-checked word table determining module 11, configured to receive a query request containing a query statement input by a user, perform content identification on the query statement, and determine the word table to be checked; wherein the tail of the query statement is a statement label;

a query index determining module 12, configured to convert the word table to be checked into a reference word table according to the trained mapping relation, traverse a preset query database based on the reference word table, and query the query index of each element in the reference word table;

an expansion table determining module 13, configured to acquire account information of the user, determine related users based on the account information, read the query records of the related users, which contain time information, and determine an expansion table of the reference word table according to the query records;

and a query graph generation module 14, configured to determine the weight of each element of the word table to be checked according to the query index, and to generate and display a query graph according to the weight and the expansion table.

FIG. 7 is a block diagram illustrating the structure of the expansion table determining module in the data processing system based on a knowledge graph, wherein the expansion table determining module 13 comprises:

a record query unit 131, configured to acquire account information of a related user, and acquire the query records of the related user according to the account information; the query records contain time information and are converted through the trained mapping relation;

a sorting unit 132, configured to sort the query records of the same related user according to the time information, and to locate, in the sorted query records, the query contents that appear in the reference word table as target words;

a content extraction unit 133, configured to extract a preset number of query contents from the sorted query records, centered on the target word;

and a processing execution unit 134, configured to expand the reference word table according to the extracted query contents.

Fig. 8 is a block diagram showing the structure of the processing execution unit in the expansion table determining module, where the processing execution unit 134 includes:

a calculation subunit 1341, configured to calculate the time difference between the extracted query content and the target word;

a conversion subunit 1342, configured to convert the time difference into a step length according to a preset conversion formula;

a table establishing subunit 1343, configured to establish an expansion table indexed by the target word; the expansion table comprises query content items and step length items.

The functions that can be performed by the knowledge-graph based data processing method are performed by a computer device that includes one or more processors and one or more memories having at least one program code stored therein, the program code being loaded and executed by the one or more processors to perform the functions of the knowledge-graph based data processing method.

The processor fetches instructions from the memory one by one and decodes them, then completes the corresponding operations and generates a series of control commands, so that all parts of the computer act automatically, continuously and in coordination as an organic whole, realizing program input, data input, computation and result output; the arithmetic or logic operations arising in this process are completed by the arithmetic unit. The memory comprises a Read-Only Memory (ROM) for storing the computer program, and a protection device is arranged outside the memory.

Illustratively, a computer program can be partitioned into one or more modules, which are stored in memory and executed by a processor to implement the present invention. One or more of the modules may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program in the terminal device.

Those skilled in the art will appreciate that the above description of the device is merely exemplary and does not limit the terminal device, which may include more or fewer components than those described, combine certain components, or use different components; for example, it may include input/output devices, network access devices, buses, etc.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the terminal device and connects the various parts of the entire user terminal using various interfaces and lines.

The memory may be used to store computer programs and/or modules, and the processor implements the various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory mainly comprises a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as an information acquisition template display function, a product information publishing function, and the like). The data storage area may store data created through use of the system (e.g., product information acquisition templates corresponding to different product types, product information to be published by different product providers, and the like). In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.

If the integrated modules/units of the terminal device are implemented in the form of software functional units and sold or used as separate products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the modules/units in the system of the above embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the functions of the system embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The above description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. All equivalent structures or equivalent processes derived from the contents of the present specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present invention.

Claims

1. A method of data processing based on a knowledge graph, the method comprising:

2. The method of claim 1, wherein the step of receiving a query request containing a query statement from a user, performing content recognition on the query statement, and determining a table of words to be checked comprises:

3. The method of claim 1, wherein the step of converting the word table to be checked into a reference word table according to the trained mapping relationship, traversing a preset query database based on the reference word table, and querying the query index of each element in the reference word table comprises:

counting the reference words to obtain a reference word table;

4. The knowledge-graph-based data processing method of claim 1, wherein the step of obtaining account information of a user and determining a relevant user based on the account information comprises:

5. The knowledge-graph-based data processing method according to claim 4, wherein the step of obtaining social information in the account information of the user and modifying the relevant user table according to the social information comprises:

6. The method of claim 1, wherein the step of reading a query record containing time information of an associated user and determining an extended table of reference word tables from the query record comprises:

7. The method of knowledge-graph based data processing according to claim 6, wherein said step of augmenting said reference word list according to extracted query content comprises:

8. A data processing system based on a knowledge-graph, the system comprising:

9. The knowledge-graph based data processing system of claim 8, wherein the extended table determination module comprises:

10. The knowledge-graph based data processing system of claim 9, wherein the process execution unit comprises: