CN112183630B - Embedding vector generation method, device, equipment and medium based on buried point hierarchy

Info

Publication number
CN112183630B
Authority
CN
China
Prior art keywords
embedded vector
tree
target
embedded
level
Legal status
Active
Application number
CN202011045397.8A
Other languages
Chinese (zh)
Other versions
CN112183630A (en
Inventor
萧梓健
杜宇衡
Current Assignee
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202011045397.8A priority Critical patent/CN112183630B/en
Publication of CN112183630A publication Critical patent/CN112183630A/en
Application granted granted Critical
Publication of CN112183630B publication Critical patent/CN112183630B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/214: Physics > Computing; calculating or counting > Electric digital data processing > Pattern recognition > Analysing > Design or setup of recognition systems or techniques > Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F40/14: Physics > Computing; calculating or counting > Electric digital data processing > Handling natural language data > Text processing > Use of codes for handling textual entities > Tree-structured documents
    • G06F40/30: Physics > Computing; calculating or counting > Electric digital data processing > Handling natural language data > Semantic analysis
    • G06N3/02: Physics > Computing arrangements based on specific computational models > Computing arrangements based on biological models > Neural networks


Abstract

The invention relates to the field of artificial intelligence and provides an embedding vector generation method, device, equipment and medium based on the buried point hierarchy. A dictionary tree is fused into a hierarchical tree structure, which retains the hierarchical information of the buried point data, ensures that each level has an overall meaning, and better describes the similarity of buried points that share a prefix under the same branch; automatic construction of the hierarchical tree is combined with network training. Buried point data to be processed is acquired and queried in a target hierarchical tree to obtain a target embedded vector, so that embedding vectors are generated automatically based on the hierarchical tree. The invention also relates to blockchain technology: the buried point data and the target embedded vectors can be stored in a blockchain.

Description

Embedding vector generation method, device, equipment and medium based on buried point hierarchy
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to an embedding vector generation method, device, equipment and medium based on a buried point hierarchy.
Background
A buried point is a record of the process and result of user behavior, collected to meet the demand for fast, efficient and rich data applications. In many companies, buried points are named with fixed prefix rules so that they can be generated and queried efficiently.
In the prior art, a common way to learn buried point information is to one-hot encode the buried point id to obtain an embedding representation of the buried point, and feed this representation into a model to learn information related to the buried point. One-hot encoding represents N states with an N-bit state register; each state has its own register bit, and only one bit is active at any time, i.e. only one state holds.
However, one-hot encoding assumes that different buried point ids are mutually independent and uncorrelated. The prefix rule information of the buried points is therefore lost, and the resulting embedding cannot describe the relationships between levels.
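As a toy illustration only (the ids, helper function and dimensions below are invented for this sketch, not taken from the patent), the one-hot codes of two ids sharing a prefix are orthogonal, so the shared prefix leaves no trace in the encoding:

```python
import numpy as np

# Hypothetical buried point ids; two of them share the prefix "app_click".
ids = ["app_click_buy", "app_click_share", "web_view_home"]
index = {bid: i for i, bid in enumerate(ids)}

def one_hot(buried_id: str) -> np.ndarray:
    """N states, N-bit register: exactly one bit is active per id."""
    vec = np.zeros(len(ids))
    vec[index[buried_id]] = 1.0
    return vec

a = one_hot("app_click_buy")
b = one_hot("app_click_share")
print(a @ b)  # 0.0 -- orthogonal, so the common prefix is invisible
```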
Disclosure of Invention
In view of the above, it is necessary to provide a method, apparatus, device and medium for generating embedding vectors based on the buried point hierarchy. The adopted hierarchical tree structure retains the hierarchical information of the buried point data and ensures that each level has an overall meaning; buried points with the same prefix are placed under the same branch, so that their similarity is better described, and embedding vectors are then generated automatically based on the hierarchical tree.
An embedded vector generation method based on a buried point level, the embedded vector generation method based on the buried point level comprises the following steps:
Acquiring historical buried point data;
constructing a dictionary tree according to the historical buried point data;
performing fusion processing on the dictionary tree to obtain a hierarchical tree structure;
randomly initializing an embedded vector of each level in the level tree structure to obtain an embedded vector level tree;
training a preset network by taking the embedded vector hierarchical tree as samples;
when the preset network training is completed, acquiring specified parameters from the preset network, and updating the embedded vector hierarchical tree according to the specified parameters to obtain a target hierarchical tree;
when receiving an embedded vector generation instruction, acquiring buried point data to be processed;
and querying the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector.
According to a preferred embodiment of the present invention, before the historical buried point data is obtained, the method further comprises:
monitoring a configuration system through a buried point technology;
when monitoring that the preset operation is generated in the configuration system, acquiring a current log;
and recording the current log as the buried point data.
According to a preferred embodiment of the present invention, the performing fusion processing on the dictionary tree to obtain a hierarchical tree structure includes:
acquiring the non-bifurcation links in the dictionary tree;
and merging the branches on each non-bifurcation link to obtain the hierarchical tree structure.
According to a preferred embodiment of the present invention, the training a preset network by taking the embedded vector hierarchical tree as samples includes:
performing unsupervised pre-training on the preset network according to the embedded vector hierarchical tree.
According to a preferred embodiment of the present invention, the training a preset network by taking the embedded vector hierarchical tree as samples further includes:
acquiring a current task scene;
determining a network model corresponding to the current task scene as the preset network;
determining a supervision target corresponding to the current task scene;
and training the preset network according to the supervision target by taking the embedded vector hierarchical tree as input until the preset network reaches the configuration accuracy, and stopping training.
According to a preferred embodiment of the present invention, the querying the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector includes:
splitting the buried point data to be processed according to the order of its characters to obtain each character;
traversing each character in the target hierarchical tree in sequence until the configuration condition is met, stopping traversing, and acquiring traversed tree nodes;
And constructing the target embedded vector according to the configuration condition by using the embedded vector corresponding to the traversed tree node.
According to a preferred embodiment of the present invention, the constructing the target embedding vector according to the configuration condition by using the embedding vector corresponding to the traversed tree node includes:
when the configuration condition is that each character traverses to a corresponding tree node, acquiring a tree node corresponding to a last character from the traversed tree node as a target tree node, and determining an embedded vector corresponding to the target tree node as the target embedded vector; or alternatively
When the configuration condition is that the target character in each character does not traverse to the corresponding tree node, determining the current level corresponding to the target character and the highest level of the target level tree, obtaining an embedded vector corresponding to the target character, calculating the level difference between the highest level and the current level, and performing zero padding processing on the obtained embedded vector based on the level difference to obtain the target embedded vector.
An embedded vector generation apparatus based on a buried point hierarchy, the embedded vector generation apparatus based on a buried point hierarchy comprising:
The acquisition unit is used for acquiring the historical buried point data;
the construction unit is used for constructing a dictionary tree according to the historical buried point data;
the fusion unit is used for carrying out fusion processing on the dictionary tree to obtain a hierarchical tree structure;
the initialization unit is used for randomly initializing the embedded vector of each level in the level tree structure to obtain an embedded vector level tree;
the training unit is used for training a preset network by taking the embedded vector hierarchical tree as samples;
the updating unit is used for acquiring specified parameters from the preset network when the preset network training is completed, and updating the embedded vector hierarchical tree according to the specified parameters to obtain a target hierarchical tree;
the acquisition unit is further used for acquiring buried point data to be processed when receiving an embedded vector generation instruction;
and the query unit is used for querying the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector.
An electronic device, the electronic device comprising:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the embedded vector generation method based on the buried point hierarchy.
A computer-readable storage medium having stored therein at least one instruction that is executed by a processor in an electronic device to implement the embedded vector generation method based on a buried point hierarchy.
According to the technical scheme above, the invention acquires historical buried point data and builds a dictionary tree from it, then fuses the dictionary tree into a hierarchical tree structure. This retains the hierarchical information of the buried point data and ensures that every level has an overall meaning; buried points with the same prefix are placed under the same branch, so their similarity is better described. The embedded vector of each level in the hierarchical tree structure is randomly initialized to obtain an embedded vector hierarchical tree, which is then used as samples to train a preset network. When the training is completed, specified parameters are acquired from the preset network and used to update the embedded vector hierarchical tree into a target hierarchical tree, combining the automatic construction of the hierarchical tree with network training. Finally, buried point data to be processed is acquired and queried in the target hierarchical tree to obtain a target embedded vector, realizing automatic generation of embedding vectors based on the hierarchical tree.
Drawings
FIG. 1 is a flow chart of a preferred embodiment of the embedded vector generation method based on the buried point hierarchy.
FIG. 2 is a functional block diagram of a preferred embodiment of the embedded vector generation apparatus based on buried point level according to the present invention.
Fig. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing an embedded vector generation method based on a buried point hierarchy.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
FIG. 1 is a flow chart of a method for generating embedded vectors based on buried point level according to a preferred embodiment of the present invention. The order of the steps in the flowchart may be changed and some steps may be omitted according to various needs.
The embedded vector generation method based on the buried point hierarchy is applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices and the like.
The electronic device may be any electronic product that can interact with a user in a human-computer manner, such as a personal computer, tablet computer, smart phone, personal digital assistant (Personal Digital Assistant, PDA), game console, interactive internet protocol television (Internet Protocol Television, IPTV), smart wearable device, etc.
The electronic device may also include a network device and/or a user device. The network device includes, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing (Cloud Computing).
The network in which the electronic device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (Virtual Private Network, VPN), and the like.
S10, acquiring historical buried point data.
The buried point data is data which can reflect user behaviors and the like and is acquired through a buried point technology, and can be used for various tasks such as behavior monitoring, data analysis and the like.
In at least one embodiment of the present invention, before the historical buried data is obtained, the method further comprises:
monitoring a configuration system through a buried point technology;
When monitoring that the preset operation is generated in the configuration system, acquiring a current log;
and recording the current log as the buried point data.
The configuration system is a system needing to be monitored, such as a designated operation platform.
The preset operation may include, but is not limited to: clicking operation and touching operation.
By the embodiment, the historical buried point data can be acquired based on the buried point technology, so that the historical buried point data can be used for subsequent training of the model.
It should be noted that, the buried point data may be stored in a designated database, or may be deployed in a blockchain, so as to prevent malicious tampering and improve data security.
S11, constructing a dictionary tree according to the historical buried point data.
In this embodiment, the construction of the dictionary tree satisfies the following conditions (a minimal code sketch follows the list):
(1) the root node contains no character, and every other node contains exactly one character;
(2) from the root node to any node, the characters along the path are concatenated to form the character string corresponding to that node;
(3) all child nodes of a node contain different characters.
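A minimal Python sketch of such a dictionary tree (trie); the class layout and names are illustrative assumptions rather than the patent's own implementation:

```python
class TrieNode:
    def __init__(self, char: str = ""):
        self.char = char    # (1) one character per node; the root holds none
        self.children = {}  # (3) dict keys keep child characters distinct

def build_trie(buried_ids):
    """Insert each id character by character; (2) any root-to-node path
    spells the character string corresponding to that node."""
    root = TrieNode()
    for bid in buried_ids:
        node = root
        for ch in bid:
            node = node.children.setdefault(ch, TrieNode(ch))
    return root

trie = build_trie(["application", "apps"])  # shares the branch a -> p -> p
```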
S12, performing fusion processing on the dictionary tree to obtain a hierarchical tree structure.
The hierarchical tree structure is the result of further fusing the dictionary tree.
Specifically, the performing fusion processing on the dictionary tree to obtain a hierarchical tree structure includes:
acquiring the non-bifurcation links in the dictionary tree;
and merging the branches on each non-bifurcation link to obtain the hierarchical tree structure.
It should be noted that a character string formed by several characters is often meaningful as a whole, and it is not necessary to layer the buried points strictly by character count.
For example, given buried point ids ['application', 'apps'], building the hierarchy directly as a dictionary tree splits it character by character: 'application' becomes 11 layers (a, p, p, l, i, c, a, t, i, o, n) and 'apps' becomes 4 layers (a, p, p, s). The problem with this construction is that every level is a single character with no overall meaning, whereas characters often only make sense when combined as a whole. For example, although their character counts differ, 'app' and 'china' should each be treated as a whole that needs no finer splitting, occupying one layer in the tree structure instead of 3 and 5 layers respectively; similarly, 'apps' and 'application' should each be split into two layers, 'app'+'s' and 'app'+'lication', so that every level has an overall meaning and is better suited for the model to learn hierarchical knowledge.
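A hedged sketch of this fusion step, reusing the `TrieNode` sketch above: every non-bifurcation chain is collapsed into a single node labeled with the concatenated substring, which turns {'application', 'apps'} into the two-level hierarchy 'app' under the root with children 'lication' and 's'. End-of-word markers are omitted here for brevity:

```python
def compress(node: TrieNode) -> TrieNode:
    """Merge branches along every non-bifurcation link of the trie."""
    for first_char, child in list(node.children.items()):
        chain, label = child, child.char
        # Follow the link while it has no bifurcation (exactly one child).
        while len(chain.children) == 1:
            (chain,) = chain.children.values()
            label += chain.char
        merged = TrieNode(label)          # one level with an overall meaning
        merged.children = chain.children
        node.children[first_char] = merged
        compress(merged)
    return node

hierarchy = compress(build_trie(["application", "apps"]))
# root -> 'app' -> {'lication', 's'}: each level is a meaningful unit
```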
In this embodiment, the dictionary tree is improved into a hierarchical tree structure from the perspective of optimizing the embedding vector: consecutive characters with no independent meaning are merged into hierarchical character strings with an overall meaning. The buried point data is thus layered reasonably, its hierarchical information is retained, each level is guaranteed an overall meaning, and the model can learn the hierarchical information better.
Furthermore, conventional one-hot encoding assumes that ids are mutually independent, so the prefix hierarchy information of the buried points is lost.
With the processing manner of this embodiment, buried points with the same prefix are placed under the same branch and therefore share the same embedding prefix; the similarity of same-prefix buried points is better characterized, and the hierarchical information of the buried points and the associations between them are represented accurately.
For example, given the buried points 'XX', 'XX encyclopedia' and 'data', if the embedding dimension of each layer is e=2, then 'XX' and 'data' lie on the first layer and 'encyclopedia' on the second. The first two embedding dimensions of 'XX' and 'XX encyclopedia' are then identical, so buried points with the same prefix have the same embedding prefix.
S13, randomly initializing the embedded vector of each level in the level tree structure to obtain an embedded vector level tree.
For example, the embedded vectors of the levels in the hierarchical tree structure can be initialized to vectors such as [1,0,0], [0,1,0], [0,0,1]; the values are random, and the embedded vector of each level is continuously updated and optimized in subsequent training.
In this embodiment, the embedded vector of each level in the hierarchical tree structure serves as a parameter of the model, so as the model's parameters are continuously optimized during learning, the corresponding embedded vectors of the levels are continuously updated.
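A small sketch of step S13 under the assumptions above: every node (level) of the compressed tree receives a randomly initialized embedding of some dimension E; the dimension and the initializer are arbitrary choices for illustration:

```python
import numpy as np

E = 3  # per-level embedding dimension (hypothetical)
rng = np.random.default_rng(0)

def init_embeddings(node: TrieNode) -> TrieNode:
    """Attach a random embedding to every non-root node of the tree."""
    for child in node.children.values():
        child.embedding = rng.normal(size=E)  # updated later by training
        init_embeddings(child)
    return node

embedding_tree = init_embeddings(hierarchy)
```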
S14, training a preset network by taking the embedded vector hierarchical tree as samples.
In this embodiment, the preset network may be trained in a supervised or an unsupervised manner.
Specifically, the training a preset network by taking the embedded vector hierarchical tree as samples includes:
performing unsupervised pre-training on the preset network according to the embedded vector hierarchical tree.
For example, word2vec and similar methods can learn the semantic information of buried point ids, or of words, from buried point behavior sequences or natural-language word sequences.
The advantage of such pre-training is that no supervision target is needed; training relies on properties of the data itself (for example, the ordering relationships within sequence data).
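A sketch of such unsupervised pre-training with gensim's word2vec; the behavior sequences below are invented, and in practice each "sentence" would be one user's ordered sequence of buried point ids:

```python
from gensim.models import Word2Vec

# Hypothetical buried point behavior sequences, one list per user session.
sequences = [
    ["app_click_buy", "app_click_share", "app_view_home"],
    ["web_view_home", "app_click_buy", "app_click_share"],
]
model = Word2Vec(sentences=sequences, vector_size=16, window=2, min_count=1)
vector = model.wv["app_click_buy"]  # learned semantic vector for one id
```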
In at least one embodiment of the present invention, the training a preset network by taking the embedded vector hierarchical tree as samples further includes:
acquiring a current task scene;
determining a network model corresponding to the current task scene as the preset network;
determining a supervision target corresponding to the current task scene;
and training the preset network according to the supervision target by taking the embedded vector hierarchical tree as input until the preset network reaches the configuration accuracy, and stopping training.
This embodiment directly trains a supervised neural network model: the embedding, rather than the raw buried point or word id, is the model input, and the prediction target is the model's supervision target. Which model the neural network adopts depends mainly on the specific task scene; for example, when the input is a behavior sequence or a word sequence, a sequence model such as an RNN (Recurrent Neural Network) may be employed.
With the supervised training mode, the embedding vectors are learned directly from a specific prediction task scene, which is more direct.
Both training modes essentially treat the embedding vectors as parameters of the neural network, learned through the training process: the network parameters are optimized or updated by gradient descent and similar methods against the supervision target, so that the gap between the network's predictions and the target keeps shrinking, and the embedding vectors are thereby learned. The supervision target is simply the prediction target; it may be, for example, an insurance sales value, whether an advertisement is clicked, or whether a policy is held. In an unsupervised task, the target can be constructed from the data itself, for example predicting the next word in a behavior sequence.
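A hedged PyTorch sketch of this idea: the per-node embeddings are ordinary network parameters, so gradient descent on a supervised target (here an invented click/no-click label) updates the embedding hierarchy as a side effect; the sizes, data, and linear head are all assumptions:

```python
import torch
import torch.nn as nn

n_nodes, E = 10, 3
node_embeddings = nn.Embedding(n_nodes, E)  # one trainable row per tree node
classifier = nn.Linear(E, 1)                # toy supervised head

optimizer = torch.optim.SGD(
    list(node_embeddings.parameters()) + list(classifier.parameters()), lr=0.1
)
loss_fn = nn.BCEWithLogitsLoss()

node_ids = torch.tensor([0, 3, 7])            # nodes hit by three samples
labels = torch.tensor([[1.0], [0.0], [1.0]])  # e.g. whether the ad was clicked

for _ in range(100):
    optimizer.zero_grad()
    logits = classifier(node_embeddings(node_ids))
    loss = loss_fn(logits, labels)
    loss.backward()   # gradients flow back into the node embeddings
    optimizer.step()  # the embedding tree is updated alongside the head
```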
The embedding of the hierarchical tree can be viewed as one embedding vector per tree node; learning the embedded vector hierarchical tree is the process of learning these node embeddings, so each node can also be learned and updated independently.
S15, when the preset network training is completed, acquiring specified parameters from the preset network, and updating the embedded vector hierarchical tree according to the specified parameters to obtain a target hierarchical tree.
In this embodiment, the specified parameters correspond to the embedded vectors, so after the preset network training is completed, the specified parameters can be acquired as the embedded vectors of the nodes in the embedded vector hierarchical tree, thereby updating the embedded vector hierarchical tree.
The target hierarchical tree is a structure describing the prefix relations of the buried points; through a visualized buried point hierarchical tree, the dependency relations among buried points can be grasped quickly, which lays a foundation for tasks such as modeling and analysis.
The target hierarchical tree may also be used for similarity analysis.
Traditional embedding-based similarity analysis generally computes similarity between the embeddings of two buried point ids as a whole; for example, after computing inner products one may find that similarity(a, b) = 0.8 and similarity(a, c) = 0.5, and conclude that buried points a and b are more similar, thereby judging the overall similarity of the two ids.
Similarity analysis based on the target hierarchical tree can compute not only the overall similarity of two buried point embeddings but also the similarity at each level, which is more interpretable. For example, besides obtaining similarity(a, b) = 0.8 and similarity(a, c) = 0.5, one can further find that a and b are similar in their first two levels while a and c are similar only in the first level, which explains why a and b are more alike.
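A sketch of per-level similarity under the assumption (consistent with the e=2 example above) that an id's target embedding is the concatenation of fixed-size level slices; the vectors below are invented:

```python
import numpy as np

E = 2  # per-level dimension, as in the example above

def level_similarities(u: np.ndarray, v: np.ndarray) -> list:
    """Cosine similarity computed separately for each E-sized level slice."""
    sims = []
    for k in range(0, len(u), E):
        a, b = u[k:k + E], v[k:k + E]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(float(a @ b / denom) if denom else 0.0)
    return sims

# Same first level, different second level -> high similarity at level 1 only.
point_a = np.array([0.9, 0.1, 0.3, 0.7])
point_c = np.array([0.9, 0.1, -0.6, 0.2])
print(level_similarities(point_a, point_c))
```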
The target hierarchical tree can also serve as a feature; its vector representation is more accurate, so the precision of downstream prediction tasks can be improved.
S16, when an embedded vector generation instruction is received, buried point data to be processed is obtained.
In this embodiment, the embedded vector generation instruction may be triggered by any user.
The buried point data to be processed may be uploaded by the user who triggers the embedded vector generation instruction; the invention does not limit this.
S17, querying the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector.
The target embedded vector refers to the embedding representation corresponding to the buried point data to be processed.
In at least one embodiment of the present invention, the querying the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector includes:
splitting the buried point data to be processed according to the order of its characters to obtain each character;
traversing each character in the target hierarchical tree in sequence until the configuration condition is met, stopping traversing, and acquiring traversed tree nodes;
and constructing the target embedded vector according to the configuration condition by using the embedded vector corresponding to the traversed tree node.
For example, to query the embedding vector of a buried point, each character of the buried point id string is traversed from front to back: the query starts from the root node of the target hierarchical tree, enters the corresponding branch as each character is matched, and continues downward until the complete buried point id has been traversed.
Specifically, the constructing the target embedded vector according to the configuration condition by using the embedded vector corresponding to the traversed tree node includes:
when the configuration condition is that each character traverses to a corresponding tree node, acquiring a tree node corresponding to a last character from the traversed tree node as a target tree node, and determining an embedded vector corresponding to the target tree node as the target embedded vector; or alternatively
When the configuration condition is that the target character in each character does not traverse to the corresponding tree node, determining the current level corresponding to the target character and the highest level of the target level tree, obtaining an embedded vector corresponding to the target character, calculating the level difference between the highest level and the current level, and performing zero padding processing on the obtained embedded vector based on the level difference to obtain the target embedded vector.
The zero-padding process fills with 0 the dimensions of the levels, up to the highest level, that were not matched.
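A sketch of the query step, reusing the tree and embedding sketches above: the id is matched level by level from the root, the embeddings of visited nodes are concatenated, and the dimensions of unmatched levels are zero-padded up to the tree's highest level; the interface is an illustrative assumption:

```python
import numpy as np

def lookup(root: TrieNode, buried_id: str, max_depth: int, E: int) -> np.ndarray:
    """Walk the hierarchy with the id's characters and build its embedding."""
    parts, node, i = [], root, 0
    while i < len(buried_id):
        nxt = next((c for c in node.children.values()
                    if buried_id.startswith(c.char, i)), None)
        if nxt is None:
            break  # a target character has no matching tree node
        parts.append(nxt.embedding)
        i += len(nxt.char)
        node = nxt
    vec = np.concatenate(parts) if parts else np.zeros(0)
    pad = max_depth * E - len(vec)  # the level difference, in dimensions
    return np.concatenate([vec, np.zeros(pad)])

target_vector = lookup(embedding_tree, "apps", max_depth=2, E=3)
```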
In this embodiment, in order to ensure data security of the obtained target embedded vector, the target embedded vector may be stored in the blockchain.
It should be noted that the embedded vector generated by the present embodiment may be applied to various task scenarios, and will be illustrated in connection with specific task scenarios.
For example, when similarity analysis is required, the data to be analyzed is queried in the target hierarchical tree constructed in this embodiment to generate target embedded vectors, and the similarity analysis is performed with these vectors; since every level has an overall meaning and the similarity of same-prefix buried points is well characterized, the accuracy of the similarity analysis can be improved.
For another example, when performing a prediction task such as text classification, the data to be predicted is queried in the target hierarchical tree constructed in this embodiment to generate target embedded vectors, from which training samples are constructed to train the corresponding network model.
In practical applications, the embedding vectors generated by this embodiment can be used in many task scenes; that is, any task that needs embedding vectors can generate them in the manner of this embodiment.
According to the technical scheme above, the invention acquires historical buried point data and builds a dictionary tree from it, then fuses the dictionary tree into a hierarchical tree structure. This retains the hierarchical information of the buried point data and ensures that every level has an overall meaning; buried points with the same prefix are placed under the same branch, so their similarity is better described. The embedded vector of each level in the hierarchical tree structure is randomly initialized to obtain an embedded vector hierarchical tree, which is then used as samples to train a preset network. When the training is completed, specified parameters are acquired from the preset network and used to update the embedded vector hierarchical tree into a target hierarchical tree, combining the automatic construction of the hierarchical tree with network training. Finally, buried point data to be processed is acquired and queried in the target hierarchical tree to obtain a target embedded vector, realizing automatic generation of embedding vectors based on the hierarchical tree.
Fig. 2 is a functional block diagram of a preferred embodiment of the embedded vector generation apparatus based on the buried point hierarchy according to the present invention. The embedded vector generation apparatus 11 based on the buried point hierarchy includes an acquiring unit 110, a constructing unit 111, a fusing unit 112, an initializing unit 113, a training unit 114, an updating unit 115 and a querying unit 116. A module/unit referred to in the present invention is a series of computer program segments that are stored in the memory 12, can be executed by the processor 13, and perform a fixed function. In this embodiment, the functions of the modules/units are described in detail in the following embodiments.
The acquisition unit 110 acquires historic buried point data.
The buried point data is data which can reflect user behaviors and the like and is acquired through a buried point technology, and can be used for various tasks such as behavior monitoring, data analysis and the like.
In at least one embodiment of the invention, the configuration system is monitored by a buried point technique before the historical buried point data is acquired;
when monitoring that the preset operation is generated in the configuration system, acquiring a current log;
and recording the current log as the buried point data.
The configuration system is a system needing to be monitored, such as a designated operation platform.
The preset operation may include, but is not limited to: clicking operation and touching operation.
By the embodiment, the historical buried point data can be acquired based on the buried point technology, so that the historical buried point data can be used for subsequent training of the model.
It should be noted that, the buried point data may be stored in a designated database, or may be deployed in a blockchain, so as to prevent malicious tampering and improve data security.
The construction unit 111 constructs a dictionary tree from the history buried data.
In this embodiment, the construction of the dictionary tree satisfies the following conditions:
(1) the root node contains no character, and every other node contains exactly one character;
(2) from the root node to any node, the characters along the path are concatenated to form the character string corresponding to that node;
(3) all child nodes of a node contain different characters.
The fusing unit 112 performs fusion processing on the dictionary tree to obtain a hierarchical tree structure.
The hierarchical tree structure is the result of further fusing the dictionary tree.
Specifically, the fusing unit 112 performing fusion processing on the dictionary tree to obtain a hierarchical tree structure includes:
acquiring the non-bifurcation links in the dictionary tree;
and merging the branches on each non-bifurcation link to obtain the hierarchical tree structure.
It should be noted that a character string formed by several characters is often meaningful as a whole, and it is not necessary to layer the buried points strictly by character count.
For example, given buried point ids ['application', 'apps'], building the hierarchy directly as a dictionary tree splits it character by character: 'application' becomes 11 layers (a, p, p, l, i, c, a, t, i, o, n) and 'apps' becomes 4 layers (a, p, p, s). The problem with this construction is that every level is a single character with no overall meaning, whereas characters often only make sense when combined as a whole. For example, although their character counts differ, 'app' and 'china' should each be treated as a whole that needs no finer splitting, occupying one layer in the tree structure instead of 3 and 5 layers respectively; similarly, 'apps' and 'application' should each be split into two layers, 'app'+'s' and 'app'+'lication', so that every level has an overall meaning and is better suited for the model to learn hierarchical knowledge.
In this embodiment, the dictionary tree is improved into a hierarchical tree structure from the perspective of optimizing the embedding vector: consecutive characters with no independent meaning are merged into hierarchical character strings with an overall meaning. The buried point data is thus layered reasonably, its hierarchical information is retained, each level is guaranteed an overall meaning, and the model can learn the hierarchical information better.
Furthermore, conventional one-hot encoding assumes that ids are mutually independent, so the prefix hierarchy information of the buried points is lost.
With the processing manner of this embodiment, buried points with the same prefix are placed under the same branch and therefore share the same embedding prefix; the similarity of same-prefix buried points is better characterized, and the hierarchical information of the buried points and the associations between them are represented accurately.
For example, given the buried points 'XX', 'XX encyclopedia' and 'data', if the embedding dimension of each layer is e=2, then 'XX' and 'data' lie on the first layer and 'encyclopedia' on the second. The first two embedding dimensions of 'XX' and 'XX encyclopedia' are then identical, so buried points with the same prefix have the same embedding prefix.
The initializing unit 113 randomly initializes the embedded vector of each level in the hierarchical tree structure to obtain an embedded vector hierarchical tree.
For example, the embedded vectors of the levels in the hierarchical tree structure can be initialized to vectors such as [1,0,0], [0,1,0], [0,0,1]; the values are random, and the embedded vector of each level is continuously updated and optimized in subsequent training.
In this embodiment, the embedded vector of each level in the hierarchical tree structure serves as a parameter of the model, so as the model's parameters are continuously optimized during learning, the corresponding embedded vectors of the levels are continuously updated.
Training unit 114 trains a preset network by taking the embedded vector hierarchical tree as samples.
In this embodiment, the preset network may be trained in a supervised or an unsupervised manner.
Specifically, the training unit 114 training a preset network by taking the embedded vector hierarchical tree as samples includes:
performing unsupervised pre-training on the preset network according to the embedded vector hierarchical tree.
For example, word2vec and similar methods can learn the semantic information of buried point ids, or of words, from buried point behavior sequences or natural-language word sequences.
The advantage of such pre-training is that no supervision target is needed; training relies on properties of the data itself (for example, the ordering relationships within sequence data).
In at least one embodiment of the present invention, the training unit 114 training a preset network by taking the embedded vector hierarchical tree as samples further includes:
Acquiring a current task scene;
determining a network model corresponding to the current task scene as the preset network;
determining a supervision target corresponding to the current task scene;
and training the preset network according to the supervision target by taking the embedded vector hierarchical tree as input until the preset network reaches the configuration accuracy, and stopping training.
This embodiment directly trains a supervised neural network model: the embedding, rather than the raw buried point or word id, is the model input, and the prediction target is the model's supervision target. Which model the neural network adopts depends mainly on the specific task scene; for example, when the input is a behavior sequence or a word sequence, a sequence model such as an RNN (Recurrent Neural Network) may be employed.
With the supervised training mode, the embedding vectors are learned directly from a specific prediction task scene, which is more direct.
Both training modes essentially treat the embedding vectors as parameters of the neural network, learned through the training process: the network parameters are optimized or updated by gradient descent and similar methods against the supervision target, so that the gap between the network's predictions and the target keeps shrinking, and the embedding vectors are thereby learned. The supervision target is simply the prediction target; it may be, for example, an insurance sales value, whether an advertisement is clicked, or whether a policy is held. In an unsupervised task, the target can be constructed from the data itself, for example predicting the next word in a behavior sequence.
The embedding of the hierarchical tree can be viewed as one embedding vector per tree node; learning the embedded vector hierarchical tree is the process of learning these node embeddings, so each node can also be learned and updated independently.
When the training of the preset network is completed, the updating unit 115 obtains a specified parameter from the preset network, and updates the embedded vector hierarchical tree according to the specified parameter to obtain a target hierarchical tree.
In this embodiment, the specified parameters correspond to the embedded vectors, so after the preset network training is completed, the specified parameters can be acquired as the embedded vectors of the nodes in the embedded vector hierarchical tree, thereby updating the embedded vector hierarchical tree.
The target hierarchical tree is a structure describing the prefix relations of the buried points; through a visualized buried point hierarchical tree, the dependency relations among buried points can be grasped quickly, which lays a foundation for tasks such as modeling and analysis.
The target hierarchical tree may also be used for similarity analysis.
Traditional embedding-based similarity analysis generally computes similarity between the embeddings of two buried point ids as a whole; for example, after computing inner products one may find that similarity(a, b) = 0.8 and similarity(a, c) = 0.5, and conclude that buried points a and b are more similar, thereby judging the overall similarity of the two ids.
Similarity analysis based on the target hierarchical tree can compute not only the overall similarity of two buried point embeddings but also the similarity at each level, which is more interpretable. For example, besides obtaining similarity(a, b) = 0.8 and similarity(a, c) = 0.5, one can further find that a and b are similar in their first two levels while a and c are similar only in the first level, which explains why a and b are more alike.
The target hierarchical tree can also serve as a feature; its vector representation is more accurate, so the precision of downstream prediction tasks can be improved.
When receiving the embedded vector generation instruction, the acquisition unit 110 acquires the buried point data to be processed.
In this embodiment, the embedded vector generation instruction may be triggered by any user.
The buried point data to be processed may be uploaded by the user who triggers the embedded vector generation instruction; the invention does not limit this.
The query unit 116 queries the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector.
The target embedded vector refers to the embedding representation corresponding to the buried point data to be processed.
In at least one embodiment of the present invention, the query unit 116 querying the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector includes:
splitting the buried point data to be processed according to the order of its characters to obtain each character;
traversing each character in the target hierarchical tree in sequence until the configuration condition is met, stopping traversing, and acquiring traversed tree nodes;
and constructing the target embedded vector according to the configuration condition by using the embedded vector corresponding to the traversed tree node.
For example, to query the embedding vector of a buried point, each character of the buried point id string is traversed from front to back: the query starts from the root node of the target hierarchical tree, enters the corresponding branch as each character is matched, and continues downward until the complete buried point id has been traversed.
Specifically, the constructing, by the query unit 116, the target embedding vector with the embedding vector corresponding to the traversed tree node according to the configuration condition includes:
when the configuration condition is that each character traverses to a corresponding tree node, acquiring a tree node corresponding to a last character from the traversed tree node as a target tree node, and determining an embedded vector corresponding to the target tree node as the target embedded vector; or alternatively
When the configuration condition is that the target character in each character does not traverse to the corresponding tree node, determining the current level corresponding to the target character and the highest level of the target level tree, obtaining an embedded vector corresponding to the target character, calculating the level difference between the highest level and the current level, and performing zero padding processing on the obtained embedded vector based on the level difference to obtain the target embedded vector.
The zero-padding process fills with 0 the dimensions of the levels, up to the highest level, that were not matched.
In this embodiment, in order to ensure data security of the obtained target embedded vector, the target embedded vector may be stored in the blockchain.
It should be noted that the embedded vector generated by the present embodiment may be applied to various task scenarios, and will be illustrated in connection with specific task scenarios.
For example, when similarity analysis is required, the data to be analyzed is queried in the target hierarchical tree constructed in this embodiment to generate target embedded vectors, and the similarity analysis is performed with these vectors; since every level has an overall meaning and the similarity of same-prefix buried points is well characterized, the accuracy of the similarity analysis can be improved.
For another example, when performing a prediction task such as text classification, the data to be predicted is queried in the target hierarchical tree constructed in this embodiment to generate target embedded vectors, from which training samples are constructed to train the corresponding network model.
In practical applications, the embedding vectors generated by this embodiment can be used in many task scenes; that is, any task that needs embedding vectors can generate them in the manner of this embodiment.
According to the technical scheme above, the invention acquires historical buried point data and builds a dictionary tree from it, then fuses the dictionary tree into a hierarchical tree structure. This retains the hierarchical information of the buried point data and ensures that every level has an overall meaning; buried points with the same prefix are placed under the same branch, so their similarity is better described. The embedded vector of each level in the hierarchical tree structure is randomly initialized to obtain an embedded vector hierarchical tree, which is then used as samples to train a preset network. When the training is completed, specified parameters are acquired from the preset network and used to update the embedded vector hierarchical tree into a target hierarchical tree, combining the automatic construction of the hierarchical tree with network training. Finally, buried point data to be processed is acquired and queried in the target hierarchical tree to obtain a target embedded vector, realizing automatic generation of embedding vectors based on the hierarchical tree.
Fig. 3 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing the embedded vector generation method based on the buried point level.
The electronic device 1 may comprise a memory 12, a processor 13 and a bus, and may further comprise a computer program stored in the memory 12 and executable on the processor 13, such as an embedded vector generation program based on a buried point level.
It will be appreciated by those skilled in the art that the schematic diagram is merely an example of the electronic device 1 and does not constitute a limitation of it. The electronic device 1 may have a bus-type or star-type structure, may comprise more or fewer hardware or software components than illustrated, or a different arrangement of components; for example, it may further comprise input/output devices, network access devices, etc.
It should be noted that the electronic device 1 is only an example; other existing or future electronic products that can be adapted to the present invention are also included in the scope of protection of the present invention and incorporated herein by reference.
The memory 12 includes at least one type of readable storage medium including flash memory, a removable hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 12 may in some embodiments be an internal storage unit of the electronic device 1, such as a mobile hard disk of the electronic device 1. The memory 12 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 12 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 12 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of embedded vector generation programs based on buried point levels, but also for temporarily storing data that has been output or is to be output.
The processor 13 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, a combination of various control chips, and the like. The processor 13 is a Control Unit (Control Unit) of the electronic device 1, connects the respective components of the entire electronic device 1 using various interfaces and lines, executes or executes programs or modules stored in the memory 12 (for example, executes embedded vector generation programs based on a buried point level, etc.), and invokes data stored in the memory 12 to perform various functions of the electronic device 1 and process data.
The processor 13 executes the operating system of the electronic device 1 and various types of applications installed. The processor 13 executes the application program to implement the steps of the various embodiments of the embedding vector generation method based on the buried point hierarchy described above, such as the steps shown in fig. 1.
Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program in the electronic device 1. For example, the computer program may be divided into the generation apparatus 11, comprising the acquisition unit 110, construction unit 111, fusion unit 112, initialization unit 113, training unit 114, updating unit 115 and query unit 116.
The integrated units implemented in the form of software functional modules described above may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute parts of the embedded vector generation method based on the buried point hierarchy according to the embodiments of the present invention.
The integrated modules/units of the electronic device 1 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on this understanding, the present invention may also be implemented by a computer program for instructing a relevant hardware device to implement all or part of the procedures of the above-mentioned embodiment method, where the computer program may be stored in a computer readable storage medium and the computer program may be executed by a processor to implement the steps of each of the above-mentioned method embodiments.
The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created from the use of blockchain nodes, and the like.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic means, each data block containing a batch of network transaction information used to verify the validity (anti-counterfeiting) of its information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one arrow is shown in fig. 3, but this does not mean that there is only one bus or only one type of bus. The bus is arranged to enable connection and communication between the memory 12, the at least one processor 13, and other components.
Although not shown, the electronic device 1 may further comprise a power source (such as a battery) for powering the various components. Preferably, the power source is logically connected to the at least one processor 13 via a power management means, so that charge management, discharge management, power consumption management, and similar functions are performed through the power management means. The power source may also include one or more DC or AC supplies, a recharging device, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like. The electronic device 1 may further include various sensors, a Bluetooth module, a Wi-Fi module, and so on, which are not described herein.
Further, the electronic device 1 may also comprise a network interface. Optionally, the network interface may comprise a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.), typically used to establish a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a display or an input unit such as a keyboard, and which may be a standard wired interface or a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also be referred to as a display screen or display unit and is used for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the described embodiments are for illustration only and that the scope of the patent application is not limited to this configuration.
Fig. 3 shows only an electronic device 1 with components 12-13. A person skilled in the art will understand that the structure shown in fig. 3 does not constitute a limitation of the electronic device 1: it may comprise fewer or more components than shown, combine certain components, or arrange the components differently.
In connection with fig. 1, the memory 12 in the electronic device 1 stores a plurality of instructions to implement the buried-point-level-based embedded vector generation method, and the processor 13 executes these instructions to implement:
acquiring historical buried point data;
constructing a dictionary tree according to the historical buried point data;
performing fusion processing on the dictionary tree to obtain a hierarchical tree structure;
randomly initializing an embedded vector of each level in the hierarchical tree structure to obtain an embedded vector hierarchical tree;
training a preset network on samples according to the embedded vector hierarchical tree;
when the preset network training is completed, acquiring specified parameters from the preset network, and updating the embedded vector hierarchical tree according to the specified parameters to obtain a target hierarchical tree;
when receiving an embedded vector generation instruction, acquiring buried point data to be processed;
and querying the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector.
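As an illustration of the dictionary-tree step above, the historical buried point data can be organized as a character-level trie. The following is a minimal Python sketch under assumed conditions (plain identifier strings as input; all names are hypothetical), not the claimed implementation itself:

    # Minimal sketch: build a character-level dictionary tree (trie) from
    # historical buried point data; the input format is an assumption.
    class TrieNode:
        def __init__(self):
            self.children = {}   # character -> child TrieNode
            self.emb = None      # embedded vector, filled in later

    def build_trie(historical_buried_points):
        root = TrieNode()
        for record in historical_buried_points:
            node = root
            for ch in record:    # characters in their arrangement order
                node = node.children.setdefault(ch, TrieNode())
        return root

    # hypothetical historical buried point identifiers
    trie = build_trie(["app.home.click", "app.home.close", "app.list.click"])

Each root-to-node path then corresponds to a prefix shared by one or more buried point records, which is what the subsequent fusion step exploits.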
For the specific implementation of the above instructions by the processor 13, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here.
In the several embodiments provided in the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be other manners of division when actually implemented.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means through software or hardware. Terms such as first and second are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments merely illustrate, and do not limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art will understand that modifications and equivalent substitutions may be made without departing from the spirit and scope of the technical solution of the present invention.

Claims (9)

1. An embedded vector generation method based on a buried point level, characterized by comprising the following steps:
acquiring historical buried point data;
constructing a dictionary tree according to the historical buried point data;
performing fusion processing on the dictionary tree to obtain a hierarchical tree structure, which comprises: acquiring non-bifurcation links in the dictionary tree, and merging the branches on each non-bifurcation link to obtain the hierarchical tree structure;
randomly initializing an embedded vector of each level in the hierarchical tree structure to obtain an embedded vector hierarchical tree;
training a preset network on samples according to the embedded vector hierarchical tree;
when the preset network training is completed, acquiring specified parameters from the preset network, and updating the embedded vector hierarchical tree according to the specified parameters to obtain a target hierarchical tree;
when receiving an embedded vector generation instruction, acquiring buried point data to be processed;
and querying the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector.
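For illustration, the fusion processing recited in claim 1 can be pictured as path compression on the trie sketched earlier: chains of single-child nodes (non-bifurcation links) are merged so that each remaining edge represents one decision point, after which every node of every level receives a randomly initialized embedded vector. A minimal Python sketch, assuming a fixed per-level embedding width of 16:

    import numpy as np

    def fuse(node):
        # Merge the branches on each non-bifurcation link: collapse chains
        # of single-child nodes into one edge with a multi-character label.
        new_children = {}
        for ch, child in node.children.items():
            label = ch
            while len(child.children) == 1:          # non-bifurcation link
                next_ch, next_node = next(iter(child.children.items()))
                label += next_ch
                child = next_node
            new_children[label] = fuse(child)
        node.children = new_children
        return node

    def init_embeddings(node, rng, dim=16):
        # Randomly initialize an embedded vector for each node of each level,
        # yielding the embedded vector hierarchical tree (dim is an assumption).
        node.emb = rng.normal(size=dim)
        for child in node.children.values():
            init_embeddings(child, rng, dim)

    hierarchy_tree = fuse(trie)                      # hierarchical tree structure
    init_embeddings(hierarchy_tree, np.random.default_rng(0))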
2. The embedded vector generation method based on a buried point level according to claim 1, further comprising, before acquiring the historical buried point data:
monitoring a configuration system through a buried point technology;
when it is monitored that a preset operation occurs in the configuration system, acquiring a current log;
and recording the current log as the buried point data.
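As a toy sketch of this collection flow (the operation names, log fields, and in-memory store are all hypothetical), a hook on the configuration system records a log entry as buried point data whenever a preset operation occurs:

    import json
    import time

    PRESET_OPERATIONS = {"save_config", "publish_config"}   # assumed operations
    buried_point_log = []                                   # assumed store

    def on_operation(op_name, detail):
        # buried point technology: monitor the configuration system and,
        # when a preset operation occurs, record the current log
        if op_name in PRESET_OPERATIONS:
            buried_point_log.append(json.dumps(
                {"timestamp": time.time(), "op": op_name, "detail": detail}))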
3. The embedded vector generation method based on a buried point level according to claim 1, wherein training a preset network on samples according to the embedded vector hierarchical tree comprises:
and performing unsupervised pre-training on the preset network according to the embedded vector hierarchical tree.
4. The embedded vector generation method based on a buried point level according to claim 1, wherein training a preset network on samples according to the embedded vector hierarchical tree further comprises:
acquiring a current task scene;
determining a network model corresponding to the current task scene as the preset network;
determining a supervision target corresponding to the current task scene;
and training the preset network according to the supervision target with the embedded vector hierarchical tree as input, and stopping training when the preset network reaches the configured accuracy.
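A hedged sketch of this supervised branch follows: a model is selected for the task scene, trained on vectors drawn from the embedded vector hierarchical tree, and training stops once the configured accuracy is reached. The scene name, network, and data pipeline are placeholders rather than the granted design, and PyTorch is used purely for illustration:

    import torch
    import torch.nn as nn

    SCENE_MODELS = {  # hypothetical mapping: task scene -> preset network
        "user_intent": lambda in_dim: nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 2)),
    }

    def train_until_accuracy(scene, loader, in_dim, target_acc=0.9):
        model = SCENE_MODELS[scene](in_dim)          # network for the scene
        optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
        loss_fn = nn.CrossEntropyLoss()              # supervision target (assumed)
        while True:
            correct = total = 0
            for x, y in loader:                      # x: embeddings from the tree
                optimizer.zero_grad()
                logits = model(x)
                loss_fn(logits, y).backward()
                optimizer.step()
                correct += (logits.argmax(dim=1) == y).sum().item()
                total += y.numel()
            if correct / total >= target_acc:        # configured accuracy reached
                return model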
5. The embedded vector generation method based on a buried point level according to claim 1, wherein querying the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector comprises:
splitting the buried point data to be processed according to the arrangement sequence of its characters to obtain each character;
sequentially traversing the target hierarchical tree with each character until a configuration condition is met, stopping the traversal, and acquiring the traversed tree nodes;
and constructing the target embedded vector with the embedded vector corresponding to the traversed tree node according to the configuration condition.
6. The embedded vector generation method based on a buried point level according to claim 5, wherein constructing the target embedded vector with the embedded vector corresponding to the traversed tree node according to the configuration condition comprises:
when the configuration condition is that every character traverses to a corresponding tree node, acquiring, from the traversed tree nodes, the tree node corresponding to the last character as a target tree node, and determining the embedded vector corresponding to the target tree node as the target embedded vector; or
when the configuration condition is that a target character among the characters does not traverse to a corresponding tree node, determining the current level corresponding to the target character and the highest level of the target hierarchical tree, obtaining the embedded vector corresponding to the target character, calculating the level difference between the highest level and the current level, and performing zero-padding processing on the obtained embedded vector based on the level difference to obtain the target embedded vector.
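One plausible reading of the query in claims 5 and 6, offered as a sketch rather than the granted method: walk the tree character by character; on a full match return the embedded vector of the last character's node, and on a miss zero-pad the embedded vector obtained so far by the level difference between the highest level and the current level. For simplicity the sketch traverses the uncompressed trie built earlier, and the per-level width and padding layout are assumptions:

    import numpy as np

    def query_embedding(root, record, highest_level, dim=16):
        node, level = root, 0
        for ch in record:                  # arrangement sequence of characters
            if ch not in node.children:
                # target character has no matching tree node: zero-pad by
                # the level difference (one dim-wide block per missing level)
                emb = node.emb if level > 0 else np.zeros(dim)
                level_diff = highest_level - level
                return np.concatenate([emb, np.zeros(level_diff * dim)])
            node = node.children[ch]
            level += 1
        return node.emb                    # embedded vector of the last node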
7. An embedded vector generation device based on a buried point level, characterized in that the device comprises:
the acquisition unit is used for acquiring the historical buried point data;
the construction unit is used for constructing a dictionary tree according to the historical buried point data;
the fusion unit is used for performing fusion processing on the dictionary tree to obtain a hierarchical tree structure, which comprises: acquiring non-bifurcation links in the dictionary tree, and merging the branches on each non-bifurcation link to obtain the hierarchical tree structure;
the initialization unit is used for randomly initializing an embedded vector of each level in the hierarchical tree structure to obtain an embedded vector hierarchical tree;
the training unit is used for training a preset network on samples according to the embedded vector hierarchical tree;
the updating unit is used for acquiring specified parameters from the preset network when the preset network training is completed, and updating the embedded vector hierarchical tree according to the specified parameters to obtain a target hierarchical tree;
the acquisition unit is further used for acquiring buried point data to be processed when receiving an embedded vector generation instruction;
and the query unit is used for querying the target hierarchical tree according to the buried point data to be processed to obtain a target embedded vector.
8. An electronic device, characterized in that the electronic device comprises:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement the embedded vector generation method based on a buried point level according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that at least one instruction is stored in the computer-readable storage medium and is executed by a processor in an electronic device to implement the embedded vector generation method based on a buried point level according to any one of claims 1 to 6.
CN202011045397.8A 2020-09-28 2020-09-28 Embedding vector generation method, device, equipment and medium based on embedded point level Active CN112183630B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011045397.8A CN112183630B (en) 2020-09-28 2020-09-28 Embedding vector generation method, device, equipment and medium based on embedded point level

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011045397.8A CN112183630B (en) 2020-09-28 2020-09-28 Embedding vector generation method, device, equipment and medium based on embedded point level

Publications (2)

Publication Number Publication Date
CN112183630A CN112183630A (en) 2021-01-05
CN112183630B true CN112183630B (en) 2023-09-26

Family

ID=73946500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011045397.8A Active CN112183630B (en) 2020-09-28 2020-09-28 Embedding vector generation method, device, equipment and medium based on embedded point level

Country Status (1)

Country Link
CN (1) CN112183630B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1097664A (en) * 1996-09-19 1998-04-14 Toshiba Corp Area extracting method
JP2013073256A (en) * 2011-09-26 2013-04-22 Osaka Prefecture Univ Approximate nearest neighbor search method, nearest neighbor search program, and nearest neighbor search device
CN108153641A (en) * 2016-12-05 2018-06-12 北京国双科技有限公司 A kind of nothing buries deployment monitoring method and a relevant apparatus
CN107562620A (en) * 2017-08-24 2018-01-09 阿里巴巴集团控股有限公司 One kind buries an automatic setting method and device
CN108536589A (en) * 2018-03-26 2018-09-14 广州小鹏汽车科技有限公司 A kind of application program buries point methods and system
WO2020020088A1 (en) * 2018-07-23 2020-01-30 第四范式(北京)技术有限公司 Neural network model training method and system, and prediction method and system
CN111191677A (en) * 2019-12-11 2020-05-22 北京淇瑀信息科技有限公司 User characteristic data generation method and device and electronic equipment
CN111210336A (en) * 2019-12-16 2020-05-29 北京淇瑀信息科技有限公司 User risk model generation method and device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Tree Generation Algorithm Based on Prefix Coding; Kuang Liqun; Xiong Fengguang; Han Xie; Journal of Chinese Computer Systems (05); pp. 73-76+100 *
Research and Application of a Preorder-Traversal Tree Generation Algorithm Based on Prefix Coding; Kuang Liqun; Xiong Fengguang; Han Xie; Computer Applications and Software (04); pp. 51-54 *

Also Published As

Publication number Publication date
CN112183630A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN111950621B (en) Target data detection method, device, equipment and medium based on artificial intelligence
CN112446025A (en) Federal learning defense method and device, electronic equipment and storage medium
CN111949708B (en) Multi-task prediction method, device, equipment and medium based on time sequence feature extraction
CN111666415A (en) Topic clustering method and device, electronic equipment and storage medium
CN112860848B (en) Information retrieval method, device, equipment and medium
CN111985545B (en) Target data detection method, device, equipment and medium based on artificial intelligence
CN111858834B (en) Case dispute focus determining method, device, equipment and medium based on AI
CN113158676A (en) Professional entity and relationship combined extraction method and system and electronic equipment
CN112396547A (en) Course recommendation method, device, equipment and medium based on unsupervised learning
CN112948275A (en) Test data generation method, device, equipment and storage medium
CN112800178A (en) Answer generation method and device, electronic equipment and readable storage medium
CN116341523A (en) Text error correction method, device, computer equipment and storage medium
CN111950707B (en) Behavior prediction method, device, equipment and medium based on behavior co-occurrence network
CN112651782B (en) Behavior prediction method, device, equipment and medium based on dot product attention scaling
WO2023040145A1 (en) Artificial intelligence-based text classification method and apparatus, electronic device, and medium
CN113204698B (en) News subject term generation method, device, equipment and medium
CN112052409B (en) Address resolution method, device, equipment and medium
CN112395432B (en) Course pushing method and device, computer equipment and storage medium
CN113256181A (en) Risk factor prediction method, device, equipment and medium
CN113128196A (en) Text information processing method and device, storage medium
US20110082828A1 (en) Large Scale Probabilistic Ontology Reasoning
CN112183630B (en) Embedding vector generation method, device, equipment and medium based on embedded point level
CN116823437A (en) Access method, device, equipment and medium based on configured wind control strategy
CN113449037B (en) AI-based SQL engine calling method, device, equipment and medium
CN113705692A (en) Emotion classification method and device based on artificial intelligence, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant