CN111427967A - Entity relationship query method and device - Google Patents



Publication number
CN111427967A
Authority
CN
China
Prior art keywords
entity
entities
neural network
query
expression matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811582250.5A
Other languages
Chinese (zh)
Other versions
CN111427967B (en)
Inventor
蒋笑通
汤芬斯蒂
路高飞
曾文烨
叶嘉韬
金晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SF Technology Co Ltd
Original Assignee
SF Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SF Technology Co Ltd
Priority to CN201811582250.5A
Publication of CN111427967A
Application granted
Publication of CN111427967B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/08 Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083 Shipping

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Machine Translation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses an entity relationship query method and device. The method comprises the following steps: constructing an entity expression matrix according to the affinity relationships between entities, wherein the entity expression matrix consists of the N-dimensional vectors of V entities; and determining and displaying a set number of entities in the entity expression matrix that are most closely related to the query entity. The technical scheme greatly reduces the space complexity and training time of the algorithm; no cluster needs to be built, and training can be completed on an ordinary PC. By mining the relationships among entities in multiple dimensions, the scheme provides a basis for fine-grained management and intelligent decision-making in the logistics industry.

Description

Entity relationship query method and device
Technical Field
The present disclosure relates generally to the field of logistics big data, more particularly to entity relationships in logistics, and specifically to a method and an apparatus for querying entity relationships.
Background
In recent years, with the development of e-commerce and the logistics industry, the number of express items has grown rapidly, customer demands and complaints have multiplied, and the service requirements placed on logistics companies have risen. Responsible persons at every level want to know the operating state of their jurisdiction so that they can take corresponding measures. For example: to which cities are the consignments of outlet A most often sent, and does outlet A often ship hairy crabs? If outlet A, each city, and hairy crab are treated as entities, both questions reduce to the affinity of the co-occurrence relationship between the outlet A entity and each city entity, and between the outlet A entity and the hairy crab entity.
If outlet A has some problem, the person responsible for another outlet wants to know whether outlet C, which he or she manages, is the same kind of outlet as outlet A, so as to take preventive measures. Such questions reduce to the affinity of the homogeneous (same-kind) relationship between the outlet A entity and the outlet C entity.
To answer such questions, most enterprises mine co-occurrence relationships by querying and counting historical waybill data and constructing a co-occurrence matrix, and mine homogeneous relationships by reducing the dimensionality of that matrix through matrix decomposition and then clustering. These approaches have several problems in the logistics industry.
First, constructing a co-occurrence matrix occupies a large amount of storage. The logistics industry has a great many entities: consignment items alone number from hundreds of thousands to millions, and the total becomes huge once other entities such as outlets, cities, and districts or counties are added. Assuming a logistics company distinguishes 500,000 entities, the co-occurrence matrix occupies 500,000 × 500,000 = 250 billion storage units, which consumes enormous resources and greatly increases the space complexity of the algorithm.
Second, the co-occurrence matrix has very high dimensionality and is largely sparse. Existing methods mine homogeneous relationships mainly through dimensionality reduction by matrix decomposition, using algorithms such as SVD (singular value decomposition); because the co-occurrence matrix is so large, this computation is time-consuming and resource-intensive.
Disclosure of Invention
In view of the above-mentioned drawbacks and deficiencies in the prior art, it is desirable to provide a method and an apparatus for querying entity relationships with low resource consumption and high computation efficiency.
In a first aspect, the present application provides an entity relationship query method, including the following steps:
constructing an entity expression matrix according to the affinity relationships between entities, wherein the entity expression matrix consists of the N-dimensional vectors of V entities;
and determining and displaying a set number of entities in the entity expression matrix that are most closely related to the query entity.
According to the technical scheme provided by the embodiment of the application, the building of the entity expression matrix according to the affinity relationship of the entities specifically comprises the following steps:
acquiring waybill data, wherein the waybill data comprises fields and entities corresponding to the fields;
converting each piece of waybill data into corpus form and storing it in a corpus; and constructing a neural network language model and training it on the corpus to obtain the entity expression matrix.
According to the technical scheme provided by the embodiment of the application, constructing the neural network language model and training it on the corpus to obtain the entity expression matrix comprises the following steps:
constructing a neural network language model, namely constructing one V × N matrix W and C N × V matrices W1', W2', ..., WC' derived through a hidden layer, where V is the total number of entities, N is the representation dimension of each entity, and C is the window size of the entity's context fields within the waybill data in which the entity appears;
constructing a loss function f_loss;
determining an objective function f_obj based on the loss function f_loss;
and iteratively optimizing the neural network language model based on the objective function to obtain the entity expression matrix.
According to the technical scheme provided by the embodiment of the application, the method for obtaining the entity expression matrix after constructing the neural network language model and training the model based on the corpus specifically comprises the following steps:
iteratively optimizing the neural network language model by stochastic gradient descent to obtain the final matrices W and W';
and summing the matrix W with the transposes of all the W' matrices and averaging to obtain the entity expression matrix.
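The averaging step can be illustrated with a minimal sketch on toy matrices. The function names `transpose` and `average_embeddings` and the toy sizes (V = 3, N = 2, C = 2) are assumptions made for illustration; the source only states that W and the transposes of all W' are summed and averaged.

```python
def transpose(matrix):
    """Return the transpose of a list-of-lists matrix."""
    return [list(column) for column in zip(*matrix)]


def average_embeddings(w, w_primes):
    """Average W (V x N) with the transpose of every W' (each N x V)."""
    matrices = [w] + [transpose(wp) for wp in w_primes]
    rows, cols = len(w), len(w[0])
    return [
        [sum(m[i][j] for m in matrices) / len(matrices) for j in range(cols)]
        for i in range(rows)
    ]


# Toy example: V = 3 entities, N = 2 dimensions, C = 2 context matrices.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
W_PRIMES = [
    [[3.0, 5.0, 7.0], [4.0, 6.0, 8.0]],
    [[5.0, 7.0, 9.0], [6.0, 8.0, 10.0]],
]
ENTITY_MATRIX = average_embeddings(W, W_PRIMES)
print(ENTITY_MATRIX)  # one N-dimensional row per entity
```

Each row of the result is the vector of one entity, which matches the V × N shape of the entity expression matrix.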
According to the technical scheme provided by the embodiment of the application, before each piece of waybill data is converted into corpus form and stored in the corpus, the method further comprises the following steps:
normalizing the entities in each waybill data through the following steps:
acquiring a preliminary entity from waybill data;
and matching the obtained preliminary entities with dictionaries corresponding to the preliminary entities respectively, and replacing the preliminary entities with the matched entities.
According to the technical scheme provided by the embodiment of the application, the step of respectively matching the obtained preliminary entities with the dictionaries corresponding to the preliminary entities and replacing the preliminary entities with the matched entities specifically comprises the following steps:
matching the shape code and the sound code of each Chinese character in the preliminary entity with the shape code and the sound code of the Chinese character in the corresponding dictionary respectively;
weighting and summing the shape-code match value and the sound-code match value to obtain the match value for each Chinese character in the preliminary entity, and replacing each Chinese character in the preliminary entity with the dictionary character that has the highest match value, thereby obtaining the entity matched with the preliminary entity.
According to the technical scheme provided by the embodiment of the application, the shape code of each Chinese character is obtained by the following method:
applying shape-sound coding to the Chinese characters;
and extracting high-level shape features of each Chinese character with a convolutional neural network to obtain an 8-dimensional vector.
According to the technical scheme provided by the embodiment of the application, the sound code of each Chinese character is a 3-dimensional vector obtained by encoding the initial, the final, and the tone of the character's pinyin respectively.
In a second aspect, the present application provides an entity relationship query apparatus, comprising:
an input module configured to input a query entity;
a storage module configured to store an entity expression matrix expressing the affinity relationships between entities;
a computing module configured to determine the position of the query entity in the entity expression matrix and to determine a set number of entities in the entity expression matrix that are most closely related to the query entity;
and a display module configured to display the set number of entities most closely related to the query entity.
According to the technical scheme provided by the embodiment of the application, the computing module is further configured to classify the entities most closely related to the query entity into co-occurrence entities and homogeneous entities; the display module is further configured to display the set number of most closely related entities by class.
According to the technical scheme provided by the embodiment of the application, the device further comprises:
an acquisition module configured to acquire waybill data, the waybill data comprising fields and the entities corresponding to the fields; and a neural network module configured to construct a neural network language model and train it with the waybill data to obtain the entity expression matrix.
According to the technical scheme provided by the embodiment of the present application:
the computing module is further configured to convert each piece of waybill data into corpus form;
the storage module is provided with a corpus and is configured to store the waybill data in corpus form;
and the neural network module is configured to train the neural network language model on the corpus to obtain the entity expression matrix.
According to the technical scheme, the relationships between logistics entities are mined with a neural network language model, which greatly reduces resource consumption. For example, the existing co-occurrence matrix method would need 600,000 × 600,000 = 360 billion storage units, whereas the present scheme needs only 600,000 × 200 = 120 million storage units, 1/3000 of the existing method. The space complexity and training time of the algorithm are also greatly reduced; no cluster needs to be built, and training can be completed on an ordinary PC.
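The storage comparison can be checked with a few lines of arithmetic. This is only a sketch of the figures quoted in the text; the function names are illustrative, and the count covers only the final entity expression matrix, as the text does.

```python
def cooccurrence_cells(num_entities):
    """Storage units of a dense V x V co-occurrence matrix."""
    return num_entities * num_entities


def expression_matrix_cells(num_entities, dim):
    """Storage units of a V x N entity expression matrix."""
    return num_entities * dim


V = 600_000  # total number of entities in the embodiment
N = 200      # vector dimension of each entity

DENSE = cooccurrence_cells(V)
COMPACT = expression_matrix_cells(V, N)
RATIO = DENSE // COMPACT

print(DENSE, COMPACT, RATIO)
```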
According to the technical scheme, the relationships among entities are mined in multiple dimensions, which provides a basis for fine-grained management and intelligent decision-making in the logistics industry.
According to the technical scheme provided by some embodiments of the application, the loss function of the neural network language model is redesigned, which greatly improves the model's ability to mine co-occurrence relationships between entities.
According to the technical scheme provided by some embodiments of the application, a convolutional neural network and pinyin are used to apply shape-sound coding to Chinese characters, which improves entity alignment; and a consignment dictionary with multi-dimensional fields is used, which reduces the probability that consignment matching fails.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of a first embodiment of the present application;
FIG. 2 is a flow chart of the construction of an entity expression matrix according to the first embodiment;
FIG. 3 is a flowchart of converting waybill data into corpus form in the first embodiment;
FIG. 4 is a flowchart of the construction of the neural network and the entity expression matrix in the first embodiment;
FIG. 5 is a schematic diagram of a neural network model in the first embodiment;
FIG. 6 is a schematic block diagram of a second embodiment of the present application;
FIG. 7 is a schematic block diagram of a third embodiment of the present application.
Reference numbers in the figures: 100, device; 110, input module; 120, storage module; 130, computing module; 140, display module; 150, acquisition module; 160, neural network module.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the present invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Referring to fig. 1, which is a flowchart illustrating a first embodiment of the present application, an entity relationship query method provided by the present application includes the following steps:
s100, constructing an entity expression matrix according to the affinity relationships between entities, wherein the entity expression matrix consists of the N-dimensional vectors of V entities;
This embodiment is applied to the logistics industry. An entity is the specific content of a field in a logistics waybill, that is, the content of fields such as consignment name, category, departure city, arrival city, departure district or county, arrival district or county, departure outlet, arrival outlet, fragile or not, overtime or not, time-limit type, value-added service type, and waybill feedback. In the waybill data shown in Table 1 below, the contents of rows 2 to 4 are entities: "Shenzhen" is one entity, "755B" is one entity, "express item lost" is one entity, and so on. The embodiment can also be applied to settings similar to the logistics industry, such as warehouse management and dispatching in a factory; in such settings, an entity takes on the meaning appropriate to the application.
The affinity relationship reflects the probability that two entities appear together in one system. For example, in a logistics system: to which cities are the consignments of outlet A most often sent, and does outlet A often ship hairy crabs? Treating outlet A, each city, and hairy crab as entities, both questions reduce to the affinity of the co-occurrence relationship between the outlet A entity and each city entity, and between the outlet A entity and the hairy crab entity.
If outlet A has some problem, the person responsible for another outlet wants to know whether outlet C, which he or she manages, is the same kind of outlet as outlet A, so as to take preventive measures. Such questions reduce to the affinity of the homogeneous (same-kind) relationship between the outlet A entity and the outlet C entity.
The number of entities is the number of entities within the coverage of the logistics system, for example the total of all departure outlets, arrival outlets, departure cities, and so on. In the logistics system to which this embodiment is applied, the total number of entities V is 600,000, and each entity is represented by a 200-dimensional vector (N = 200), so each row of the entity expression matrix in this embodiment is the vector of one entity.
[Table image not reproduced: sample waybill data]
TABLE 1
s200, determining and displaying a set number of entities in the entity expression matrix that are most closely related to the query entity.
For example, suppose the query entity is the outlet "755B" and the set number of most closely related entities is 50. "755B" is first converted into its unique 200-dimensional vector; the cosine distance between the vector of every other entity and the vector of "755B" is then computed in the 200-dimensional space, and the 50 entities with the smallest cosine distance are returned as the query result.
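The query step can be sketched as a nearest-neighbor lookup by cosine distance. The sketch below uses toy 3-dimensional vectors in place of the 200-dimensional ones, and the vector values, the `closest_entities` helper, and the assumption that no vector is all zeros are illustrative, not taken from the source.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def closest_entities(query, vectors, k):
    """Return the k entities whose vectors are nearest to the query's vector."""
    q = vectors[query]
    scored = [
        (cosine_distance(q, vec), name)
        for name, vec in vectors.items()
        if name != query
    ]
    scored.sort()
    return [name for _, name in scored[:k]]


# Hypothetical toy vectors standing in for rows of the entity expression matrix.
VECTORS = {
    "755B": [1.0, 0.0, 0.0],
    "Shenzhen": [0.9, 0.1, 0.0],
    "512V": [0.0, 1.0, 0.0],
    "hairy crab": [-1.0, 0.0, 0.0],
}
TOP2 = closest_entities("755B", VECTORS, 2)
print(TOP2)
```

A linear scan like this is enough for illustration; at 600,000 entities a vectorized or indexed search would likely be preferred, though the source does not say which is used.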
In this embodiment, as shown in fig. 2, the entity expression matrix is obtained by a neural network language model, specifically by the following method steps:
s110, acquiring waybill data, wherein the waybill data comprises fields and entities corresponding to the fields;
In the waybill data as first acquired, consignment names vary widely, and one commodity may be described in many ways. In addition, when filling in a waybill, customers often mistype Chinese characters as other characters of similar shape or sound. The preliminary entity names obtained from the waybill data are therefore often irregular and need to be normalized.
In this embodiment, entities are divided into consignment entities and non-consignment entities, which are processed separately.
For consignments, the real consignment needs to be mined from the appeal text. The customer appeal text is loaded from the customer appeal table of the system, whereas the waybill data contains only data entered when the waybill was filled in, such as the departure address and arrival address. The appeal text is the text in the appeal table that records customer complaints and requirements, and its information is more accurate. Keywords in the text are mined with the TF-IDF algorithm and then matched against the company's existing consignment dictionary. The consignment dictionary contains information in multiple dimensions, such as Chinese name, English name, aliases, product series, and category. Keywords are matched against every dimension of the consignment dictionary, which raises the matching success rate. The consignment dictionary is one that each logistics enterprise builds for its own system. For example, for the consignment "potato", the dictionary holds: Chinese name: potato; English name: potato; aliases: other common names for potato; product series: crop; category: fresh. If the preliminary consignment name that a customer filled in on the waybill is one of these aliases, it can be matched to the standardized entity name "potato" in the consignment dictionary, and the alias is replaced by "potato" in the waybill data.
For example, if the customer's appeal text reads "why have the potatoes I bought not arrived yet", the successfully matched keyword "potato" is the consignment, and the preliminary entity in the consignment field of the corresponding waybill data is replaced by this consignment name, that is, the standardized entity.
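A minimal sketch of keyword mining followed by dictionary matching might look as follows. It assumes the appeal texts are already tokenized, uses a smoothed IDF variant chosen here for convenience, and uses English stand-in aliases ("spud", "tater"); the dictionary layout and all names are hypothetical, since the source does not specify them.

```python
import math
from collections import Counter


def tf_idf(doc, corpus):
    """TF-IDF score of every term in `doc` (smoothed IDF, an assumption)."""
    counts = Counter(doc)
    scores = {}
    for term, count in counts.items():
        tf = count / len(doc)
        df = sum(1 for other in corpus if term in other)
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
        scores[term] = tf * idf
    return scores


def match_consignment(doc, corpus, dictionary):
    """Try keywords in descending TF-IDF order against names and aliases."""
    scores = tf_idf(doc, corpus)
    for term in sorted(scores, key=scores.get, reverse=True):
        for standard_name, dims in dictionary.items():
            if term == standard_name or term in dims["aliases"]:
                return standard_name
    return None


# Hypothetical tokenized appeal texts and a one-entry consignment dictionary.
APPEALS = [
    ["why", "has", "the", "spud", "i", "bought", "not", "arrived"],
    ["why", "has", "the", "parcel", "not", "arrived"],
    ["the", "parcel", "i", "bought", "was", "damaged"],
]
CONSIGNMENT_DICT = {
    "potato": {"aliases": {"spud", "tater"}, "series": "crop", "category": "fresh"},
}
STANDARD = match_consignment(APPEALS[0], APPEALS, CONSIGNMENT_DICT)
print(STANDARD)
```

Matching against several dictionary dimensions, rather than the standard name alone, is what lets an alias resolve to the standardized entity.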
All entity names other than consignments are matched against the company's existing data dictionary to obtain the corresponding normalized entity names. The data dictionary contains outlet names, city names, district names, and similar information, all of which have already been normalized.
When matching Chinese characters, shape-sound coding is applied to each character. High-level shape features of the character are extracted with a convolutional neural network to obtain an 8-dimensional vector, which serves as the shape code of the character. The initial, final, and tone of the character's pinyin are then encoded to obtain a 3-dimensional vector, which serves as the sound code of the character. For example, the character "mei" in the word for rose has the initial m, encoded as 3, the final ei, encoded as 7, and the second tone, encoded as 2, giving the 3-dimensional vector [3, 7, 2].
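The sound code can be sketched as a lookup of initial, final, and tone. The code tables below contain only the two entries given in the example (m → 3, ei → 7); the full tables are not given in the source, so anything beyond those entries would be an assumption.

```python
# Only the entries stated in the example; the full tables are not in the source.
INITIAL_CODES = {"m": 3}
FINAL_CODES = {"ei": 7}


def sound_code(initial, final, tone):
    """Encode a pinyin syllable as [initial code, final code, tone number]."""
    return [INITIAL_CODES[initial], FINAL_CODES[final], tone]


# "mei" with the second tone, as in the rose example.
MEI_CODE = sound_code("m", "ei", 2)
print(MEI_CODE)
```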
When matching, the shape code and the sound code of each Chinese character are matched separately, and the two match values are then weighted and summed to obtain the final match value. Every Chinese character of each preliminary entity is matched in this way; the word formed by the matched characters is the standard entity, and the preliminary entity is replaced by it.
For example, suppose a customer writes the arrival city "Suzhou" with the wrong second character, using the "zhou" that means "continent" in place of the "zhou" that means "state". The two characters "su" and "zhou" are each encoded and then matched in the dictionary to find the entity with the highest score. The character "su" is correct. The character "zhou" is miswritten, but its pronunciation is unchanged, so the sound-code match value between "continent" and "state" is 1; the two characters are also similar in shape, with a shape-code match value of 0.8. The two match values are combined with a weight of 0.7 for the sound code and 0.3 for the shape code, so the match value between "continent" and "state" is 0.7 × 1 + 0.3 × 0.8 = 0.94, which is higher than the match value with any other character. The correct entity is thus matched.
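The weighted sum in this example can be reproduced with a short sketch. The weights 0.7 and 0.3 and the match values 1 and 0.8 come from the text; the candidate labels and the second candidate's values are hypothetical, and how the individual shape and sound match values are computed is not specified in the source.

```python
SOUND_WEIGHT = 0.7  # weight of the sound-code match value (from the example)
SHAPE_WEIGHT = 0.3  # weight of the shape-code match value (from the example)


def match_value(sound_match, shape_match):
    """Weighted sum of the sound-code and shape-code match values."""
    return SOUND_WEIGHT * sound_match + SHAPE_WEIGHT * shape_match


def best_candidate(candidates):
    """Pick the dictionary character with the highest combined match value."""
    return max(candidates, key=lambda name: match_value(*candidates[name]))


ZHOU_SCORE = match_value(1.0, 0.8)

# (sound match, shape match) per candidate; the second entry is hypothetical.
CANDIDATES = {
    "zhou (state)": (1.0, 0.8),
    "other character": (0.2, 0.3),
}
BEST = best_candidate(CANDIDATES)
print(round(ZHOU_SCORE, 2), BEST)
```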
In this way, the entities matched from the data dictionary and the consignment dictionary are used as the final entity names.
This up-front normalization of the waybill data makes entity names in the logistics system concise and unique while preserving full coverage, and provides accurate training data for the neural network language model; the entity expression matrix finally obtained therefore expresses the affinity relationships between entities more accurately.
s120, converting each piece of waybill data into corpus form and storing it in a corpus;
As shown in fig. 3, converting the waybill data into corpus form specifically includes the following steps:
s121, splicing each non-empty entity of each piece of waybill data with its corresponding field into a new word;
For example, the new words corresponding to the entities in the second row of Table 1 are: Departure outlet:755B; Arrival outlet:512V; Departure city:Shenzhen; Departure district:Bao'an District; Arrival district:Canglang District; Consignment name:Huawei mobile phone; Primary category:Electronic products; Secondary category:Mobile phone; Waybill feedback:Express item lost.
s122, splicing all the new words of each piece of waybill data into a sentence;
When splicing, each new word consists of a field name, a colon, and a non-empty entity; new words are separated from one another by spaces;
In the remainder of this application, "entity" refers to an entity expressed in this new-word form;
In certain embodiments, when the entity is "yes", the field name alone is used as the new word, and when the entity is "no", the field is treated as having an empty entity. When an entity consists of two words separated by a symbol, two new words sharing the same field are generated for it. In some embodiments, the field name may be omitted for consignment entities.
For example, the sentence formed from the second row of Table 1 is:
Departure outlet:755B Arrival outlet:512V Departure city:Shenzhen Departure district:Bao'an District Arrival district:Canglang District Consignment name:Huawei mobile phone Primary category:Electronic products Secondary category:Mobile phone Waybill feedback:Express item lost
s123, storing the sentences of all the waybill data line by line to form the corpus.
For example, the corpus formed from the three pieces of waybill data in Table 1 is:
Departure outlet:755B Arrival outlet:512V Departure city:Shenzhen Departure district:Bao'an District Arrival city:Suzhou Arrival district:Canglang District Consignment name:Huawei mobile phone Primary category:Electronic products Secondary category:Mobile phone Insured:yes Waybill feedback:Express item lost
Departure outlet:633L Arrival outlet:010L G Departure city:Rizhao Departure district:Wulian County Arrival city:Beijing Arrival district:Haihu District Consignment name:Hairy crab Primary category:Fresh Secondary category:Crab Overtime Value-added service:Hairy crab special Time-limit type:Next-day arrival
Departure outlet:746B Arrival outlet:411U Departure city:Yongzhou Arrival city:Beijing Departure district:Lengshuitan Arrival district:Jinzhou District Consignment name:Yishe King wine Primary category:Food Secondary category:Wine Insured:yes Value-added service:Holiday dispatch Value-added service:Specified-time dispatch
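Steps s121 to s123 can be sketched as below. The sketch follows the "yes"/"no" embodiment described above; the "/" separator, the underscore field names, and the replacement of spaces inside a word by underscores are assumptions made so that English words can still be separated by spaces.

```python
def waybill_to_sentence(waybill, separator="/"):
    """Splice the non-empty entities of one waybill into a sentence of new words."""
    words = []
    for field, entity in waybill.items():
        if entity in ("", None, "no"):
            continue  # empty entity, or "no" treated as empty
        if entity == "yes":
            words.append(field)  # field name alone is the new word
            continue
        for part in entity.split(separator):
            # Assumption: spaces inside a word become underscores.
            words.append(f"{field}:{part.strip().replace(' ', '_')}")
    return " ".join(words)


def build_corpus(waybills):
    """Store one sentence per line."""
    return "\n".join(waybill_to_sentence(w) for w in waybills)


WAYBILL = {
    "departure_outlet": "755B",
    "arrival_outlet": "512V",
    "departure_city": "Shenzhen",
    "consignment_name": "Huawei mobile phone",
    "insured": "yes",
    "fragile": "no",
    "value_added_service": "holiday dispatch/specified-time dispatch",
    "waybill_feedback": "",
}
SENTENCE = waybill_to_sentence(WAYBILL)
CORPUS = build_corpus([WAYBILL, {"departure_outlet": "633L", "overtime": "yes"}])
print(CORPUS)
```

Prefixing each entity with its field name keeps, for example, a departure city and an arrival city with the same name as two distinct words in the vocabulary.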
As the above conversion shows, the order of the new words in a sentence need not match the field order in the waybill data: within the field window C (C is generally 5), the new words may be arranged in any order.
The corpus conversion prepares the training data for the neural network language model.
s130, constructing a neural network language model and training it on the corpus to obtain the entity expression matrix.
As shown in fig. 4, the method specifically includes the following steps:
s131, constructing the neural network language model shown in fig. 5, that is, constructing one V × N matrix W and C N × V matrices W1', W2', ..., WC', where V is the total number of entities, N is the representation dimension of each entity, and C is the window size of the entity's context fields within the waybill data in which the entity appears. W and W1' to WC' are the hidden-layer parameters of the neural network, that is, the entity representations to be learned. Here W is a matrix of 600,000 rows by 200 columns, and W1', W2', ..., WC' are each matrices of 200 rows by 600,000 columns.
In practice, each entity, that is, each word, is represented by a unique id with a value from 0 to 599,999. A word is selected at random from the corpus and converted, according to its id, into a 600,000-dimensional one-hot vector, which serves as the input x_k of the neural network language model. The C words closest to this word in the field context of its waybill data are then taken, and the ids of these context words are the labels of the model, that is, the values that the outputs y_1j, y_2j, ..., y_Cj are required to predict.
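The construction of one training sample can be sketched as follows, using a toy vocabulary of 5 entities in place of 600,000. The tie-breaking rule (the left neighbor first when two words are equally close) and the function names are assumptions; the source only says that the C closest context words supply the labels.

```python
def one_hot(entity_id, vocab_size):
    """One-hot input vector for the entity with the given id."""
    vector = [0] * vocab_size
    vector[entity_id] = 1
    return vector


def context_ids(sentence_ids, position, window):
    """Ids of the `window` words closest to `position` in the same sentence."""
    others = [
        (abs(i - position), i)
        for i in range(len(sentence_ids))
        if i != position
    ]
    others.sort()  # nearest first; ties resolved toward the left (assumption)
    return [sentence_ids[i] for _, i in others[:window]]


# Toy sentence of 5 entity ids; the real vocabulary has 600,000 ids.
SENTENCE_IDS = [0, 1, 2, 3, 4]
X = one_hot(SENTENCE_IDS[2], vocab_size=5)
LABELS = context_ids(SENTENCE_IDS, position=2, window=2)
print(X, LABELS)
```

In practice a one-hot input multiplied by W simply selects one row of W, so implementations usually index the row directly instead of materializing the 600,000-dimensional vector.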
s132, constructing the loss function f_loss;
[Equation image not reproduced: definition of the loss function f_loss]
V represents the set of all entities in the corpus;
N_s(v) represents the other entities in the single piece of waybill data in which the entity v is located;
f(v) is the N-dimensional vector representing the current entity v, and f(u) is the N-dimensional vector representing the predicted entity u;
γ is a hyper-parameter for which different values are tried in training. The larger γ is, the larger the proportion of the co-occurrence loss in the whole loss and the more co-occurrence relationships can be mined; the smaller γ is, the more homogeneous relationships can be mined. γ is constant during the training process. In this example, γ is 0.5.
μ_c is an intermediate vector in the training process; it is also N-dimensional and is updated automatically during training.
In FIG. 5, the hidden layer is provided with a conversion function that is built into the neural network model; the conversion function performs a conversion calculation on W to obtain W1', W2', ..., Wc'. The outputs y_1j, y_2j, ..., y_Cj obtained from W1', W2', ..., Wc' through the conversion function are each a 600,000-dimensional one-hot vector of a predicted entity.
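The data flow from the input through the hidden layer to the C outputs can be sketched as below. The source does not give the conversion function, so this sketch assumes a conventional skip-gram style forward pass with a softmax over each output; the function name `forward` and the toy matrices are hypothetical.

```python
import math

def forward(entity_id, W, W_out):
    """Return one probability distribution over entities per context position.

    Assumption: a standard skip-gram forward pass. Multiplying a one-hot
    input by W simply selects the row of W for that entity.
    """
    h = W[entity_id]
    outputs = []
    for Wc in W_out:                      # one N x V matrix per context position
        scores = [sum(h[n] * Wc[n][j] for n in range(len(h)))
                  for j in range(len(Wc[0]))]
        m = max(scores)                   # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        outputs.append([e / total for e in exps])
    return outputs

# Toy example: V = 3 entities, N = 2 dimensions, C = 2 context positions.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_out = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]],
]
probs = forward(0, W, W_out)
predicted = [p.index(max(p)) for p in probs]
print(predicted)  # [0, 2]
```

Taking the arg-max of each distribution gives the id whose one-hot vector corresponds to the predicted entity for that context position.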
The loss function in the neural network model of the present application is redesigned, wherein
Figure BDA0001918226960000111
is the co-occurrence loss that is taken into account, so that the neural network language model can better mine the co-occurrence relationships between the entities.
s133, determining an objective function f_obj based on the loss function f_loss;
Figure BDA0001918226960000112
s134, carrying out iterative optimization on the neural network language model based on the objective function to obtain the entity expression matrix.
Based on the entity information in the corpus, the model is iteratively optimized by stochastic gradient descent to obtain the final W and W'. W and the transposes of W' are added and averaged to obtain a new entity expression matrix of 600,000 rows by 200 columns, in which each row is the 200-dimensional vector of one entity.
When a query is performed, an entity is input, for example the origin outlet "512LV", and the set number of entities most closely related to it is, for example, 5. "512LV" is first converted into its unique 200-dimensional vector; the cosine distance between that vector and the vector of every other entity is then calculated in the 200-dimensional coordinate system, and the 5 entities with the smallest cosine distance are selected as the query result. Whether each result is a homogeneous entity or a co-occurrence entity is judged by whether the field to which it belongs is the same as the field of the input entity. The results are shown in the following table:
Figure BDA0001918226960000121
TABLE 2
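The two operations described before Table 2, averaging W with the transposes of W' and then ranking entities by cosine distance, can be sketched as follows. The function names and the toy 4 × 2 matrices are assumptions chosen for illustration, and every entity vector is assumed to be non-zero.

```python
import math

def average_embeddings(W, W_out):
    """Average W (V x N) with the transposes of each W' (N x V)."""
    V, N = len(W), len(W[0])
    E = [row[:] for row in W]
    for Wc in W_out:
        for v in range(V):
            for n in range(N):
                E[v][n] += Wc[n][v]       # element (v, n) of the transpose of W'
    count = len(W_out) + 1
    return [[value / count for value in row] for row in E]

def cosine_distance(a, b):
    """1 minus the cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def closest_entities(E, query_id, k):
    """Ids of the k entities with the smallest cosine distance to the query."""
    ranked = sorted(
        (cosine_distance(E[query_id], E[i]), i)
        for i in range(len(E)) if i != query_id
    )
    return [i for _, i in ranked[:k]]

# Toy example: 4 entities, 2 dimensions, one W' that is the transpose of W.
W = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
W_out = [[[1.0, 0.9, 0.0, -1.0], [0.0, 0.1, 1.0, 0.0]]]
E = average_embeddings(W, W_out)
nearest = closest_entities(E, 0, 2)
print(nearest)  # [1, 2]
```

A full scan over 600,000 rows per query is the simplest reading of the text; whether the actual system uses an index or batching is not stated in the source.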
Example two:
fig. 6 is a schematic block diagram of an entity relationship query apparatus 100 provided in the present application, including:
an input module 110 configured to input a query entity; for example, the "input entity" in the first column of table 2 in embodiment one is the entity to be queried, which is input through the input module 110.
A storage module 120 configured to store an entity expression matrix expressing the affinity relationships of the entities. As in the first embodiment, the entity expression matrix is obtained by iteratively optimizing the neural network language model by stochastic gradient descent, based on the entity information in the corpus, to obtain the final W and W', and then adding W and the transposes of W' and averaging.
The calculation module 130 is configured to determine the position of the query entity in the entity expression matrix, and determine a set number of entities having the closest relationship with the query entity in the entity expression matrix;
for example, as described in the first embodiment, the entity expression matrix is a matrix of 600,000 rows by 200 columns, and each row of the entity expression matrix is the 200-dimensional vector of one entity.
When a query is performed, an entity is input, for example the origin outlet "512LV", and the set number of entities most closely related to it is, for example, 5. "512LV" is converted into its unique 200-dimensional vector, the cosine distance between that vector and the vector of every other entity is calculated in the 200-dimensional coordinate system, and the 5 entities with the smallest cosine distance are selected as the entities most closely related to the query entity.
The display module 140 is configured to display a set number of entities closest to the query entity.
The display module 140 finally displays the queried entity.
Preferably, the calculation module 130 is further configured to classify the entities most closely related to the query entity into co-occurrence entities and homogeneous entities; the display module 140 is further configured to display the set number of entities most closely related to the query entity by category. For example, as shown in Table 2, the 5 entities with the highest affinity to the input query entity are displayed separately as homogeneous entities and co-occurrence entities.
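The classification step can be sketched as below, under the assumption that an entity counts as homogeneous when it belongs to the same waybill field as the query entity and as co-occurring otherwise. The function name `classify_results`, the field names, and all entity names other than "512LV" are hypothetical.

```python
def classify_results(query, results, field_of):
    """Split query results into homogeneous and co-occurrence entities.

    Assumption: homogeneous means "same field as the query entity".
    """
    grouped = {"homogeneous": [], "co_occurrence": []}
    for entity in results:
        same_field = field_of[entity] == field_of[query]
        grouped["homogeneous" if same_field else "co_occurrence"].append(entity)
    return grouped

# Hypothetical field assignments for illustration.
field_of = {
    "512LV": "origin outlet",
    "outlet_A": "origin outlet",
    "outlet_B": "origin outlet",
    "city_X": "destination city",
    "product_Y": "product type",
}
grouped = classify_results("512LV", ["outlet_A", "city_X", "outlet_B", "product_Y"], field_of)
print(grouped["homogeneous"])    # ['outlet_A', 'outlet_B']
print(grouped["co_occurrence"])  # ['city_X', 'product_Y']
```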
Preferably, the above apparatus further comprises:
an obtaining module 150 configured to obtain waybill data, where the waybill data includes fields and an entity corresponding to each field; the obtaining module 150 obtains and processes the waybill data through step s110 in the first embodiment.
The neural network module 160 is configured to construct a neural network language model and train the neural network language model with the waybill data to obtain the entity expression matrix. Specifically, the neural network module 160 constructs and trains the model through step s130 in the first embodiment.
Preferably, the calculation module 130 is further configured to convert each piece of waybill data into corpus form; for the detailed procedure, refer to the description of step s120 in the first embodiment;
the storage module 120 is provided with a corpus configured to store the waybill data in corpus form;
the neural network module 160 is configured to train the neural network language model based on the corpus to obtain the entity expression matrix, and the training of the neural network language model by the neural network module 160 may refer to the specific step of step s130 in the first embodiment.
For brevity, please refer to the implementation steps in the first embodiment for the specific working process of the neural network module 160 and the calculation module 130.
The obtaining module 150 in the above apparatus can periodically obtain and update the waybill data, for example every 2 days or every week. The calculation module 130 can update the corpus on the same schedule, and the neural network module 160 correspondingly retrains the model on the latest corpus and outputs the latest entity expression matrix, so that the information stays up to date and the entity expression matrix remains valid.
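The periodic refresh can be sketched as a single function that chains the three modules once the update interval has elapsed. The function `refresh_if_due` and the three callables passed to it are hypothetical stand-ins for the obtaining, calculation, and neural network modules; the stub bodies exist only to make the example runnable.

```python
from datetime import datetime, timedelta

def refresh_if_due(last_refresh, now, interval, acquire, build_corpus, train):
    """Re-run acquisition, corpus building and training when the interval has passed.

    Returns the new entity expression matrix, or None if no refresh is due.
    """
    if now - last_refresh < interval:
        return None
    waybills = acquire()
    corpus = build_corpus(waybills)
    return train(corpus)

# Stub callables for illustration only.
def acquire():
    return ["waybill 1", "waybill 2"]

def build_corpus(waybills):
    return [w.split() for w in waybills]

def train(corpus):
    return f"matrix trained on {len(corpus)} waybills"

interval = timedelta(days=2)
last = datetime(2018, 12, 24)
early = refresh_if_due(last, datetime(2018, 12, 25), interval, acquire, build_corpus, train)
due = refresh_if_due(last, datetime(2018, 12, 26), interval, acquire, build_corpus, train)
print(early)  # None
print(due)    # matrix trained on 2 waybills
```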
In summary, according to the technical solutions provided in the first and second embodiments, a 600,000 × 200 entity expression matrix is constructed, which effectively reduces the data storage space and improves data operation efficiency. With the technical solution of the present application, the affinity relationships of each entity can be quickly queried and displayed, providing a guarantee and a basis for refined management and intelligent decision-making in the logistics industry.
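A rough sense of the storage saving can be had from a short calculation. The source does not say what the matrix is being compared against, so this sketch assumes the alternative is a dense table holding one affinity value for every pair of entities, and assumes 4 bytes per stored value.

```python
NUM_ENTITIES = 600_000
DIM = 200
BYTES_PER_VALUE = 4  # assumption: 32-bit floating point values

# Entity expression matrix: one 200-dimensional vector per entity.
embedding_bytes = NUM_ENTITIES * DIM * BYTES_PER_VALUE

# Assumed baseline: a dense pairwise affinity table, one value per entity pair.
pairwise_bytes = NUM_ENTITIES * NUM_ENTITIES * BYTES_PER_VALUE

print(embedding_bytes / 1e6)              # 480.0 (megabytes)
print(pairwise_bytes / 1e12)              # 1.44 (terabytes)
print(pairwise_bytes // embedding_bytes)  # 3000
```

Under these assumptions the expression matrix is 3,000 times smaller, a factor of V / N, at the cost of computing distances at query time.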
It should be understood that the units or modules recited in the apparatus 100 correspond to the various steps in the method described with reference to fig. 5. Thus, the operations and features described above for the method are equally applicable to the apparatus 100 and the units included therein, and are not described in detail here. The apparatus 100 may be implemented in a browser or other security applications of the electronic device in advance, or may be loaded into the browser or other security applications of the electronic device by downloading or the like. Corresponding elements in the apparatus 100 may cooperate with elements in the electronic device to implement aspects of embodiments of the present application.
Example three: the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the entity relationship query method in the first embodiment.
Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use in implementing a terminal device or server of an embodiment of the present application.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, and the like. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as necessary. A removable medium 711, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as necessary, so that a computer program read from it can be installed into the storage section 708 as necessary.
In particular, the process described above with reference to fig. 2 may be implemented as a computer software program, according to an embodiment of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the method of fig. 2. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Example four:
as another aspect, the present application also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiments; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs, which are used by one or more processors to execute the steps of the entity relationship query method of the logistics industry in the first embodiment.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the invention as referred to in the present application is not limited to the embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (12)

1. An entity relationship query method, characterized in that the method comprises the following steps:
constructing an entity expression matrix according to the intimacy relationship of the entities, wherein the entity expression matrix consists of N-dimensional vectors of V entities;
and determining, in the entity expression matrix, a set number of entities most closely related to the query entity, and displaying them.
2. The method of claim 1, wherein the constructing the entity expression matrix according to the affinity relationship of the entities comprises the following steps:
acquiring waybill data, wherein the waybill data comprises fields and entities corresponding to the fields; converting each piece of waybill data into corpus form and storing it in a corpus;
and constructing a neural network language model and training the model based on the corpus to obtain the entity expression matrix.
3. The entity relationship query method of claim 2, wherein the step of constructing a neural network language model and training the model based on the corpus to obtain the entity expression matrix comprises the steps of:
constructing a neural network language model, namely constructing one V × N matrix W and, derived through a hidden layer, C N × V matrices W1', W2', ..., Wc'; V is the total number of entities, N is the expression dimension of the entities, and C is the set window size of the context fields of the entity in the waybill data where the entity is located;
constructing a loss function f_loss;
determining an objective function f_obj based on the loss function f_loss;
And carrying out iterative optimization on the neural network language model based on the objective function to obtain the entity expression matrix.
4. The entity relationship query method of claim 3, wherein the step of constructing a neural network language model and training the model based on the corpus to obtain the entity expression matrix comprises the following steps:
performing iterative optimization on the neural network language model by stochastic gradient descent to obtain the final matrices W and W1' to Wc';
adding the matrix W and the transposes of the matrices W1' to Wc' and averaging to obtain the entity expression matrix.
5. The entity relationship query method according to claim 2, wherein before the step of converting each piece of waybill data into corpus form, the method further comprises the following steps:
normalizing the entities in each waybill data through the following steps:
acquiring a preliminary entity from waybill data;
and matching the obtained preliminary entities with dictionaries corresponding to the preliminary entities respectively, and replacing the preliminary entities with the matched entities.
6. The entity relationship query method according to claim 5, wherein the step of matching the obtained preliminary entities with the dictionaries corresponding to the preliminary entities and replacing the preliminary entities with the matched entities specifically comprises the steps of:
matching the shape code and the sound code of each Chinese character in the preliminary entity with the shape code and the sound code of the Chinese character in the corresponding dictionary respectively;
weighting and summing the values matched with the shape codes and the values matched with the sound codes to obtain matching values matched with the Chinese characters in the preliminary entity, and replacing the Chinese characters in the preliminary entity with the Chinese characters corresponding to the matching values; and obtaining an entity matched with the preliminary entity.
7. The entity relationship query method of claim 6,
the shape code of each Chinese character is obtained by the following method:
carrying out shape and sound coding on the Chinese characters;
and performing high-level shape feature extraction on the shape and sound codes of each Chinese character by using a convolutional neural network to obtain an 8-dimensional vector.
8. The entity relationship query method of claim 7,
the sound code of each Chinese character is a 3-dimensional vector obtained by coding the initial consonant, the final sound and the tone of the pinyin of the Chinese character respectively.
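The character matching of claims 6 to 8 can be illustrated with a small sketch. Everything concrete here is an assumption: the lookup tables `INITIALS` and `FINALS`, the 8-dimensional shape vectors (which in the claims come from a convolutional neural network), the two similarity measures, and the equal weights. Only the overall structure, a weighted sum of a shape match and a sound match, is taken from the claims.

```python
import math

# Hypothetical integer codes for pinyin initials and finals.
INITIALS = {"sh": 1, "s": 2, "zh": 4}
FINALS = {"en": 1}

def sound_code(initial, final, tone):
    """3-dimensional sound code: (initial, final, tone)."""
    return (INITIALS[initial], FINALS[final], tone)

def sound_match(a, b):
    """Assumed measure: fraction of the three components that agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / 3

def shape_match(a, b):
    """Assumed measure: cosine similarity of two 8-dimensional shape codes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def match_value(shape_a, sound_a, shape_b, sound_b, w_shape=0.5, w_sound=0.5):
    """Weighted sum of the shape match and the sound match."""
    return w_shape * shape_match(shape_a, shape_b) + w_sound * sound_match(sound_a, sound_b)

# Hypothetical dictionary entries: character -> (shape code, sound code).
dictionary = {
    "深": ([1.0, 0, 0, 0, 0, 0, 0, 0], sound_code("sh", "en", 1)),
    "圳": ([0, 1.0, 0, 0, 0, 0, 0, 0], sound_code("zh", "en", 4)),
}

# A preliminary character whose shape resembles 深 and whose initial was mistyped.
query_shape = [0.9, 0.1, 0, 0, 0, 0, 0, 0]
query_sound = sound_code("s", "en", 1)

best = max(dictionary, key=lambda ch: match_value(query_shape, query_sound, *dictionary[ch]))
print(best)  # 深
```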
9. An entity relationship query device, comprising
An input module configured to input a query entity;
the storage module is used for storing an entity expression matrix for expressing the affinity relationship of the entities;
the computing module is configured to determine the position of the query entity in an entity expression matrix for expressing the intimacy relationship of the entities, and determine a set number of entities which are most closely related to the query entity in the entity expression matrix;
and the display module is configured for displaying the set number of entities which are closest to the query entity.
10. The entity relationship query device of claim 9,
the computing module is further configured to classify entities most closely related to the query entity into co-occurring entities and homogeneous entities;
the display module is also configured to display the set number of entities most closely related to the query entity in a classified manner.
11. The entity relationship query device according to any one of claims 9-10, further comprising:
the acquisition module is configured to acquire waybill data, and the waybill data comprises fields and entities corresponding to the fields;
and the neural network module is configured and used for constructing a neural network language model and training the neural network language model by using the waybill data to obtain the entity expression matrix.
12. An entity relationship query device according to any one of claims 9-10,
the computing module is also configured to convert each piece of waybill data into corpus form;
the storage module is provided with a corpus and is configured to store the waybill data in corpus form;
and the neural network module is configured to train the neural network language model based on the corpus to obtain the entity expression matrix.
CN201811582250.5A 2018-12-24 2018-12-24 Entity relationship query method and device Active CN111427967B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811582250.5A CN111427967B (en) 2018-12-24 2018-12-24 Entity relationship query method and device

Publications (2)

Publication Number Publication Date
CN111427967A true CN111427967A (en) 2020-07-17
CN111427967B CN111427967B (en) 2023-06-09

Family

ID=71545660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811582250.5A Active CN111427967B (en) 2018-12-24 2018-12-24 Entity relationship query method and device

Country Status (1)

Country Link
CN (1) CN111427967B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101079070A (en) * 2006-05-26 2007-11-28 国际商业机器公司 Computer and method for response of information query
CN101470754A (en) * 2007-12-27 2009-07-01 国际商业机器公司 Community server system and activity recording method therefor
CN106844426A (en) * 2016-12-09 2017-06-13 中电科华云信息技术有限公司 Computing system and method based on random walk personnel's cohesion

Also Published As

Publication number Publication date
CN111427967B (en) 2023-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant