WO2020010996A1 - Hyperlink processing method and apparatus, and storage medium
- Publication number
- WO2020010996A1 (application PCT/CN2019/092279)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vector
- hyperlink
- output
- context
- target
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/134—Hyperlinking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Definitions
- the present application relates to the field of computers, and in particular, to a method and device for processing a hyperlink, and a storage medium.
- Hypertext document: text with a hyperlink structure is called a hypertext document, or hypertext.
- the hypertext document needs to be vectorized first and expressed as a fixed-length feature vector.
- the embodiments of the present application provide a method and a device for processing a hyperlink, and a storage medium, so as to at least solve a technical problem of information loss caused by converting a hyperlink object into a common object.
- An embodiment of the present application provides a method for processing a hyperlink, including: converting first context information of a first hyperlink in a first object into a first context vector; acquiring a first input vector when the first object is used as a link source, wherein the first object contains information of the first hyperlink pointing to a second object; obtaining a first average vector according to the first context vector and the first input vector; adjusting at least one of the first input vector, the first context vector, and a first output vector corresponding to the second object; and calculating a similarity between the first output vector and the first average vector according to an adjustment result, and, when the similarity between the first output vector and the first average vector is greater than or equal to a first target threshold, using the first output vector as an output vector of the second object and outputting it.
- An embodiment of the present application provides a method for processing a hyperlink, including: obtaining a first input vector when a first object is used as a link source, wherein the first input vector is used to represent at least the first object and the content in the first object that describes a second object, the first object including information of a first hyperlink pointing to the second object; obtaining a first output vector when the second object is used as a link target; and adjusting, according to at least the first input vector and the first output vector, to obtain an output vector of the second object.
- An embodiment of the present application further provides a hyperlink processing apparatus, including: a conversion unit configured to convert first context information of a first hyperlink in a first object into a first context vector; a first acquiring unit configured to acquire a first input vector when the first object is used as a link source, wherein the first object includes information of the first hyperlink pointing to a second object; a second acquiring unit configured to obtain a first average vector according to the first context vector and the first input vector; an adjusting unit configured to adjust at least one of the first average vector, the first input vector, and a first output vector corresponding to the second object; and an output unit configured to calculate the similarity between the first output vector and the first average vector according to the adjustment result and, when the similarity is greater than or equal to the first target threshold, to use the first output vector as an output vector of the second object and output it.
- An embodiment of the present application further provides a hyperlink processing apparatus, including: a first obtaining unit configured to obtain a first input vector when a first object is used as a link source, where the first input vector is used to represent at least the first object and the content in the first object that describes a second object, the first object including information of a first hyperlink pointing to the second object; a second obtaining unit configured to obtain a first output vector when the second object is used as a link target; and an adjusting unit configured to adjust, according to at least the first input vector and the first output vector, to obtain an output vector of the second object.
- An embodiment of the present application further provides an electronic device including a processor and a memory connected to the processor.
- the memory stores a computer program executable by the processor, and the processor executes the computer program.
- An embodiment of the present application further provides a storage medium, which stores a computer program, wherein the computer program can be executed by a processor to complete the operations of the foregoing methods.
- FIG. 1 is a schematic diagram of an application environment of a method for processing a hyperlink according to an embodiment of the present application
- FIG. 2 is a schematic flowchart of a method for processing a hyperlink according to an embodiment of the present application
- FIG. 3 is a schematic diagram of a hypertext document and a hyperlink according to an embodiment of the present application
- FIG. 4 is a schematic diagram of another method for processing a hyperlink according to an embodiment of the present application.
- FIG. 5 is a schematic diagram of another method for processing a hyperlink according to an embodiment of the present application.
- FIG. 6 is a schematic diagram of another method for processing a hyperlink according to an embodiment of the present application.
- FIG. 7 is a schematic diagram of a h-d2v network structure according to an embodiment of the present application.
- FIG. 8 is a schematic diagram of another method for processing a hyperlink according to an embodiment of the present application.
- FIG. 9 is a schematic diagram of another method for processing a hyperlink according to an embodiment of the present application.
- FIG. 10 is a schematic diagram of another method for processing a hyperlink according to an embodiment of the present application.
- FIG. 11A is a schematic structural diagram of a hyperlink processing apparatus according to an embodiment of the present application.
- FIG. 11B is a schematic structural diagram of a hyperlink processing apparatus according to an embodiment of the present application.
- FIG. 11C is a schematic structural diagram of a hyperlink processing apparatus according to an embodiment of the present application.
- FIG. 11D is a schematic structural diagram of a hyperlink processing apparatus according to an embodiment of the present application.
- FIG. 11E is a schematic structural diagram of a hyperlink processing apparatus according to an embodiment of the present application.
- FIG. 12A is a schematic structural diagram of another hyperlink processing apparatus according to an embodiment of the present application.
- FIG. 12B is a schematic structural diagram of another hyperlink processing apparatus according to an embodiment of the present application.
- FIG. 12C is a schematic structural diagram of another hyperlink processing apparatus according to an embodiment of the present application.
- FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- the vectorization processing method of a hypertext document usually converts a hyperdocument into an ordinary document and then performs vectorization processing.
- the above-mentioned processing method of the hypertext document ignores the relationship between the hyperlink content in the hypertext document and the source document and the related context, resulting in information loss.
- the context is regarded as an absolute description of the target document, and the background information provided by the source document is lost, so that it is impossible to judge, by comparing the two documents, whether the source document refers to the target document.
- the embodiment of the present application provides a method for processing a hyperlink.
- the method for processing a hyperlink can be applied to, but not limited to, the application environment shown in FIG. 1.
- the hyperlink object is transmitted to the server 106 through the network 104.
- the server 106 converts, for each hyperlink object, the first context information of the first hyperlink in the first object into a first context vector; obtains a first input vector when the first object is used as a link source, where the first object contains information of a first hyperlink pointing to a second object; obtains a first average vector according to the first context vector and the first input vector; adjusts at least one of the first input vector, the first context vector, and a first output vector corresponding to the second object; and calculates the similarity between the first output vector and the first average vector according to the adjustment result, and, when the similarity is greater than or equal to the first target threshold, uses the first output vector as the output vector of the second object and outputs it.
- a hyperlink refers to a tag that points from a sentence in one object to another object, such as a Uniform Resource Locator (URL) in a web page, a reference in an academic paper, and so on.
- Hyperlink objects refer to objects that contain hyperlinks, including but not limited to: hypertext documents.
- Hypertext documents (referred to as hyperdocuments for short) refer to documents containing hyperlinks, including but not limited to ordinary web pages and academic papers.
- the object containing the hyperlink is called the source object (or the link source), and the object pointed to by the hyperlink is called the target object (or the link target).
- a hyperlink object can contain one or more hyperlinks that refer to one or more objects, and it can also be referenced by one or more other objects.
- the servers that perform processing operations on the hyperlink objects and perform classification, recommendation, and retrieval operations on the hyperlink objects may be the same server or different servers. This embodiment is not limited thereto.
- the foregoing terminal may include, but is not limited to, at least one of the following: a mobile phone, a tablet computer, a PC, and the like.
- the above network may include, but is not limited to, a wireless network or a wired network.
- the wireless network includes Bluetooth, WIFI, and other networks that implement wireless communication.
- the wired network includes a local area network, a metropolitan area network, and a wide area network.
- the foregoing server may include, but is not limited to, at least one of the following: a PC and other devices for providing services. The above is only an example, and this embodiment does not limit this in any way.
- a method for processing a hyperlink executable by a computing device may include the following steps.
- Step S202 Convert the first context information of the first hyperlink in the first object into a first context vector
- Step S204 Obtain a first input vector when the first object is used as a link source, where the first object includes information of a first hyperlink pointing to the second object;
- Step S206 Obtain a first average vector according to the first context vector and the first input vector.
- Step S208 Adjust at least one of the first input vector, the first context vector, and the first output vector corresponding to the second object;
- Step S210 Calculate the similarity between the first output vector and the first average vector according to the adjustment result. When the similarity between the first output vector and the first average vector is greater than or equal to the first target threshold, use the first output vector as the output vector of the second object and output it.
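The averaging and threshold check in steps S206 and S210 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: cosine similarity is an assumed choice of similarity measure, and the function names, the example vectors, and the threshold of 0.9 are hypothetical.

```python
import math

def average_vectors(vectors):
    """Element-wise mean of equal-length vectors (step S206)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Assumed similarity measure; the text only says 'similarity'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def accept_output_vector(first_input, context_word_vectors, first_output, threshold):
    """Step S210: compare the output vector with the average vector."""
    first_average = average_vectors([first_input] + context_word_vectors)
    similarity = cosine_similarity(first_output, first_average)
    return similarity >= threshold, similarity

# Hypothetical vectors for a source object, two context words, and a target object.
first_input = [1.0, 0.0, 1.0]
context_word_vectors = [[0.0, 1.0, 1.0], [2.0, 2.0, 1.0]]
first_output = [1.0, 1.0, 1.0]
accepted, sim = accept_output_vector(first_input, context_word_vectors, first_output, 0.9)
print(accepted, round(sim, 6))
```

If the check fails, step S208 would adjust the vectors again before the similarity is recomputed.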
- the above-mentioned method for processing a hyperlink may be applied to, but is not limited to, a process of classifying, recommending, and searching for a specific object. For example, it is used in citation recommendation for academic papers, in the retrieval of similar documents and keywords, or in the classification of hyperdocuments (i.e., hypertext documents).
- the server obtains a paper collection of existing papers.
- the context information of a hyperlink in the first paper (the first object, the link source of the hyperlink) is converted into a context vector, and the input vector when the first paper is used as a link source is obtained, where the first paper contains information about a hyperlink pointing to the second paper (the second object, the link target of the hyperlink); a first average vector is obtained according to the context vector and the input vector; at least one of the first average vector, the first input vector, and the output vector corresponding to the second paper is adjusted; the similarity between the output vector and the average vector is calculated according to the adjustment result, and when the similarity between the output vector and the average vector is greater than or equal to the target threshold, the output vector is used as the output vector of the second paper and output.
- the output vector (and input vector) of each existing paper can be saved to the server.
- when writing an academic paper, a user can use a target APP for citation recommendation on a terminal, and the target APP can exchange data with a server that stores the above-mentioned output vectors and input vectors.
- a context is input at a designated input position of the target APP, and the target APP sends the context to the server.
- the server uses a target formula to score each existing paper based on the output vector of the existing paper and the context (the details are discussed below), obtains the score of each existing paper, determines from the scores one or more existing papers that the context can cite, and sends one or more hyperlinks pointing to the determined existing papers to the terminal so that the user can cite an existing paper. It can be understood that the one or more hyperlinks may be embedded in the context for sending, or may be sent directly for display together with the abstract, which is not limited in the embodiment of the present application.
- the servers that process the existing papers may be the same server or different servers, which is not limited in this embodiment.
- the source document (Zhao and Gildea, 2010, abbreviated as d_s) points, within a sentence ("We also evaluate our model by computing the machine translation BLEU score (Papineni et al., 2002) using the Moses system (Koehn et al., 2007)", abbreviated as C), to the target documents through hyperlinks ((Papineni et al., 2002) and (Koehn et al., 2007), abbreviated as d_t).
- the above hyperlink can be recorded as <d_s, C, d_t>.
- a vectorized representation of a hyperlink object may be performed.
- Vectorized representation is a way to express an abstract object (such as a word, document, user, etc.) as a fixed-length feature vector.
- Objects represented by vectorization can be used by specific applications in the subsequent classification, recommendation, and retrieval of the original object. Different from the traditional method of manually extracting each dimension of the feature vector, the vectorized representation method uses an automatic method to obtain the entire feature vector.
- word2vec (abbreviated as w2v): For each word in the document, w2v will learn an IN vector and an OUT vector.
- the technique of word vector learning includes two variants of cbow and skip-gram: the cbow method averages the IN vector of the context word, and uses this to predict the OUT vector of the current word.
- the skip-gram method uses the IN vector of the current word to predict the OUT vector of the context word. Because the relationship between words in ordinary documents is mutual (if word a is a context word of word b, then word b is also a context word of word a), the learned IN vector and OUT vector of a word are similar.
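The difference between the cbow and skip-gram variants can be seen in the training examples each one produces. The sketch below only enumerates those examples and trains nothing; the function names and the window size are hypothetical.

```python
def cbow_examples(tokens, window=2):
    """Yield (context_words, current_word): the IN vectors of the context
    words are averaged to predict the OUT vector of the current word."""
    for i, word in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            yield context, word

def skipgram_examples(tokens, window=2):
    """Yield (current_word, context_word): the IN vector of the current
    word predicts the OUT vector of each context word."""
    for context, word in cbow_examples(tokens, window):
        for context_word in context:
            yield word, context_word

tokens = ["we", "evaluate", "our", "model"]
cbow = list(cbow_examples(tokens, window=1))
skipgram = list(skipgram_examples(tokens, window=1))
print(cbow[1])       # context and target for "evaluate"
print(skipgram[:2])  # first two (current, context) pairs
```

The skip-gram pairs come in mirrored form, (a, b) and (b, a), which is the mutual relationship that makes a word's IN and OUT vectors similar in ordinary documents. A hyperlink has no such symmetry: the source points to the target, not the reverse.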
- doc2vec (abbreviated as d2v): d2v extends w2v and also has two variants: pv-dm and pv-dbow.
- the pv-dm method is similar to cbow, except that the IN vector of the current document is also counted into the average as a special context word vector.
- the pv-dbow method uses the IN vector of the current document and a skip-gram-like network structure to predict the OUT vector of words in the document.
- the following describes two ways of vectorizing a hyperdocument: citation as word, and context as content.
- the citation-as-word approach converts a hyperdocument into an ordinary document and then calls w2v to obtain a document vector.
- the context-as-content approach converts the hyperdocument into an ordinary document and then calls d2v to obtain the document vector.
- Citation as word (the citation information in the source document is processed as words of the source document): the document ID generated where the source document cites the target document (the citation information in the source document) is treated as a special word. w2v is used to learn vectors for all "words" of a hyperdocument set (a collection of multiple hyperdocuments) containing the special words, and the "word" vector of each special word is treated as the vector of the corresponding hyperdocument, as shown in Figure 4. In the source document d_s above, (Koehn et al., 2007) is citation information, that is, a special word.
- Context as content (the context of the hyperlink in the source document is processed as content of the target document): first delete all hyperlinks in the source document; then add the context C of each hyperlink to the target document as content of the target document; then input the converted ordinary documents into d2v to obtain the document vectors, as shown in Figure 5.
- This method is called d2v-cac.
- d2v-nc: a method that deletes the hyperlinks, does not add the context to the target document, and inputs the result into d2v.
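The three conversions of a hyperdocument into an ordinary document can be sketched as plain text transformations. This is an illustrative sketch only: the `[[id]]` hyperlink marker, the `DOC_` prefix, the toy documents, and the choice of the enclosing sentence as the context are all assumptions made for the example.

```python
import re

# Hypothetical marker: "[[some_id]]" stands for a hyperlink to document some_id.
LINK = re.compile(r"\[\[(\w+)\]\]")

def citation_as_word(text):
    """Keep each hyperlink as a special word (the target document ID)."""
    return LINK.sub(lambda m: "DOC_" + m.group(1), text).split()

def d2v_nc(text):
    """Delete hyperlinks; no context is copied into the target document."""
    return LINK.sub(" ", text).split()

def d2v_cac(docs):
    """Delete hyperlinks and append each hyperlink's context to its target.
    Here the context is the enclosing sentence, which is an assumption."""
    plain = {doc_id: d2v_nc(text) for doc_id, text in docs.items()}
    for text in docs.values():
        for sentence in text.split("."):
            for target in LINK.findall(sentence):
                if target in plain:
                    plain[target].extend(d2v_nc(sentence))
    return plain

docs = {
    "zhao2010": "we compute the BLEU score [[papineni2002]] . we train a model",
    "papineni2002": "BLEU is an evaluation metric",
}
print(citation_as_word(docs["zhao2010"]))
print(d2v_cac(docs)["papineni2002"])
```

The token lists would then be handed to w2v (citation as word) or d2v (d2v-cac, d2v-nc). In each case part of the <d_s, C, d_t> triple is discarded by the conversion itself.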
- the hyperdocument vectorization methods in the related art can be used for citation recommendation in academic papers, but the commonly used method is essentially a citation-as-word method and can only be used for this one task in this field.
- vectorization methods can also be used to find the node vectors in an undirected graph, but such methods deal only with the link structure between hyperdocuments, without considering the text content.
- the IN vector of the source document d_s corresponding to the hyperlink (it plays the same role as the aforementioned first input vector; the IN vector can be the initial input vector of the source document, or an intermediate input vector of the source document obtained during training);
- the IN vector w^I corresponding to each word in the context C of the hyperlink (C in this example is the context obtained by removing the hyperlink from the sentence containing the hyperlink);
- the OUT vector of the target document d_t corresponding to the hyperlink (it plays the same role as the aforementioned first output vector; the OUT vector can be the initial output vector of the target document, or an intermediate output vector of the target document obtained during training) and the OUT vectors of the other hyperdocuments.
- Input layer: the IN vector of d_s (written d_s^I) and the IN vector w^I of each word w in C are averaged using formula (1) to get the vector x:
- $$x = \frac{1}{1 + |C|}\left(d_s^I + \sum_{w \in C} w^I\right) \quad (1)$$
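- Output layer: the OUT vectors d^O of all hyperdocuments d in the set D form a softmax classifier that matches the appropriate hyperdocument to x.
- the basis for the recommendation is the softmax probability:
- $$P(d_t \mid d_s, C) = \frac{\exp(x^\top d_t^O)}{\sum_{d \in D} \exp(x^\top d^O)}$$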
- the above input layer structure is similar to the pv-dm model, except that in the output layer the softmax word classifier is replaced by a document classifier.
- D^I is a matrix composed of the IN vectors of all hyperdocuments in the hyperdocument set
- D^O is a matrix composed of the OUT vectors of all hyperdocuments in the hyperdocument set
- W^I is a matrix composed of the IN vectors of all words in the hyperdocument set.
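A minimal sketch of the output layer described above: the averaged input x is scored against the OUT vector of every candidate hyperdocument, and the scores are normalized with a softmax. Using the dot product as the score is an assumption (it is the usual choice in w2v-style models), and the document IDs and vectors are hypothetical.

```python
import math

def average(vectors):
    """Element-wise mean, as in the input layer."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def softmax_over_documents(x, out_vectors):
    """Return a probability for every candidate document given x.
    Scores are dot products between x and each OUT vector (assumed)."""
    scores = {d: sum(a * b for a, b in zip(x, v)) for d, v in out_vectors.items()}
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {d: math.exp(s - top) for d, s in scores.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}

source_in = [2.0, 0.0]           # IN vector of the source document (hypothetical)
context_in = [[0.0, 2.0]]        # IN vectors of the context words (hypothetical)
out_vectors = {"A": [1.0, 1.0], "B": [-1.0, -1.0], "C": [0.0, 0.0]}

x = average([source_in] + context_in)
probs = softmax_over_documents(x, out_vectors)
best = max(probs, key=probs.get)
print(best, probs)
```

A full softmax touches every document on each update, which is why the text goes on to mention negative sampling and hierarchical softmax as cheaper alternatives.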
- the words in the document are used to pre-train the document vector.
- a method similar to pv-dm is used to optimize the above objective function.
- the network shown in FIG. 6 is firstly subjected to several rounds (for example, 5 rounds) of pv-dm iteration, and then the above objective function is used to perform several rounds of iteration optimization .
- n is the number of negative samples
- P_N(d) is the uniform distribution over the set of all documents.
- a method of retraining / fine-tuning (also called retro-fitting) may be adopted, or the two objectives may be linearly combined, using joint optimization or multi-objective learning to learn the document and word vectors.
- a method of hierarchical softmax may also be used to simplify and accelerate the learning process.
- P_N(d) may also be chosen so that the probability of each document is proportional to the number of times the document is cited (a document cited 0 times obtains a non-zero probability through a smoothing technique).
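A citation-proportional noise distribution of this kind can be sketched as follows. The additive smoothing constant, the exclusion of the positive document from the negatives, and the document IDs and counts are all assumptions for illustration; the text does not specify the smoothing technique.

```python
import random

def noise_distribution(citation_counts, smoothing=1.0):
    """P_N(d) proportional to (citations + smoothing).
    Additive smoothing is an assumed technique; it keeps uncited
    documents at a non-zero probability."""
    weights = {d: count + smoothing for d, count in citation_counts.items()}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

def sample_negatives(p_n, positive, n, rng):
    """Draw n negative documents from P_N, skipping the true target (assumed)."""
    candidates = [d for d in p_n if d != positive]
    weights = [p_n[d] for d in candidates]
    return rng.choices(candidates, weights=weights, k=n)

counts = {"doc101": 0, "doc108": 7, "doc205": 2}   # hypothetical citation counts
p_n = noise_distribution(counts)
negatives = sample_negatives(p_n, positive="doc108", n=5, rng=random.Random(0))
print(p_n)
print(negatives)
```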
- a similar approach to w2v / d2v's sub-sampling can be used to prevent highly cited documents from being sampled too many times.
- Document No. 101 cites Document No. 108, where ds denotes the vector of Document No. 101 itself, which can be stored in advance.
- the vector of Document No. 101 itself is obtained first; then the part of Document No. 101 containing the hyperlink is located, and the context information of the hyperlink and the corresponding context vector (equivalent to the vector C) are obtained.
- the target document pointed to by this hyperlink, that is, Document No. 108, is identified, and its corresponding vector (equivalent to the output vector dt) is obtained.
- the vectors ds and C are averaged to obtain an average vector (denoted da).
- the average vector da, the output vector dt, and the objective function to be optimized are passed to an optimization algorithm such as gradient descent.
- the algorithm indicates how to adjust ds, C, and dt to increase the objective function. This process is repeated until the objective function hardly increases any further. When the objective function no longer increases, the optimization algorithm outputs the IN and OUT vectors of all documents.
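The loop described for Documents No. 101 and No. 108 can be sketched as plain gradient ascent on a single hyperlink. This is a hedged illustration, not the patented procedure: it assumes a negative-sampling objective of the form log sigmoid(da . dt) plus log sigmoid(-da . dn) for each negative document dn, and the learning rate, iteration count, and starting vectors are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def objective(d_s, words, d_t, negatives):
    """Assumed negative-sampling objective for one hyperlink; returns (value, da)."""
    d_a = [sum(column) / (1 + len(words)) for column in zip(d_s, *words)]
    value = math.log(sigmoid(dot(d_a, d_t)))
    value += sum(math.log(sigmoid(-dot(d_a, d_n))) for d_n in negatives)
    return value, d_a

def ascent_step(d_s, words, d_t, negatives, lr=0.1):
    """One in-place update of ds, the context word vectors, dt and the negatives."""
    _, d_a = objective(d_s, words, d_t, negatives)
    g_pos = 1.0 - sigmoid(dot(d_a, d_t))
    grad_a = [g_pos * t for t in d_t]              # gradient with respect to da
    new_d_t = [t + lr * g_pos * a for t, a in zip(d_t, d_a)]
    for d_n in negatives:
        g_neg = -sigmoid(dot(d_a, d_n))
        grad_a = [g + g_neg * v for g, v in zip(grad_a, d_n)]
        d_n[:] = [v + lr * g_neg * a for v, a in zip(d_n, d_a)]
    d_t[:] = new_d_t
    share = lr / (1 + len(words))                  # da is a mean, so each input gets 1/(1+|C|)
    for vec in [d_s] + words:
        vec[:] = [v + share * g for v, g in zip(vec, grad_a)]

d_s = [0.1, -0.2, 0.3]                             # Document No. 101 (hypothetical values)
words = [[0.2, 0.1, -0.1], [-0.3, 0.2, 0.1]]       # context word vectors C
d_t = [0.05, 0.1, -0.05]                           # Document No. 108
negatives = [[0.1, 0.1, 0.1]]                      # one sampled negative document

history = []
for _ in range(50):
    history.append(objective(d_s, words, d_t, negatives)[0])
    ascent_step(d_s, words, d_t, negatives)
history.append(objective(d_s, words, d_t, negatives)[0])
print(history[0], history[-1])
```

In practice the loop would run over all hyperlinks in the collection and stop once the objective plateaus, as the text describes.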
- multiple criteria can be used to evaluate the processing method of the hyperlink.
- the above criteria may include, but are not limited to, context sensitive, content sensitive, new document friendly, and context intent sensitive. These four standards are specified below.
- Context sensitive: the hyperdocument vector of a hyperdocument must be affected by the hyperlink contexts in other hyperdocuments that point to it (that is, by the contexts that the other hyperdocuments pointing to the current document use to describe it).
- Content sensitive: hyperdocument vectors must be affected by their own content.
- New document friendly: for newly generated hyperdocuments, such as new web pages and new papers, there is probably no other document pointing to them. The hyperdocument vectorization method should still be able to generate hyperdocument vectors for such new documents.
- Context intent sensitive: unlike the above three criteria, which concern hyperdocument vectors, this criterion concerns the word vectors produced in the process of hyperdocument vectorization.
- a good hyperdocument vectorization method should be able to express the intention of the hyperlink (for example, citing the target document in a general sense, or agreeing / disagreeing with the opinions and methods in the target document) in the vectors of the context words.
- the w2v method is not content sensitive. As shown in Figure 4, although "... machine translation BLEU score ..." appears in the source document (Zhao and Gildea, 2010), after the conversion it is no longer associated with that document: the word vectors are not tied to the document ID (Zhao and Gildea, 2010). In addition, for newly published and uncited papers, the w2v method does not generate a special word for the document ID, and therefore no "word" vector for it, so w2v is not new document friendly.
- the d2v-nc method is not context-sensitive. This is because the hyperlink is removed without adding the context to the target document, so the connection between the target document and the context is lost.
- the processing method of the hyperlink provided in this example can meet all four criteria mentioned above, namely, context sensitive, content sensitive, new document friendly, and context intent sensitive.
- the following uses the hyper document shown in FIG. 7 as an example for description:
- Context sensitive: when the aforementioned objective function is optimized, the OUT vector of (Papineni et al., 2002) is affected by the IN vectors of the context words (such as "BLEU").
- New document friendly: if a hyperdocument is not pointed to by any hyperlink, it can at worst rely on its own content to obtain an IN vector. At the same time, when the number of negative examples in negative sampling is large enough, an OUT vector is also generated for the current document.
- Context intent sensitive: the aforementioned objective function enables the vectors of the source document / target document pair and of the context words to improve one another.
- the context word vectors capture the intent "use a method / technique from the target document" implied by "evaluate ... by", which allows a better prediction that the target of the hyperlink (Papineni et al., 2002) is the document on the BLEU evaluation method in machine translation.
- when the network in FIG. 4 is trained on multiple pairs similar to (Zhao and Gildea, 2010) / (Papineni et al., 2002) (source document / target document), it can better capture the semantics of "using methods / techniques in the target document".
- Table 1 shows the analysis results of w2v, d2v-nc, d2v-cac, and h-d2v in combination with the above-mentioned four standards.
- the hyperlink processing method (h-d2v) provided in this example avoids the loss of key information by directly modeling the three elements of the hyperlink (source document, context, and target document). With this modeling approach, h-d2v can meet all four criteria.
- the input vector is a vector corresponding to the object as a link source.
- the information contained in the input vector can be used to represent the object and the target object referenced by the object.
- the output vector is the vector corresponding to the object as the link target. The information contained in the output vector can be used to indicate the source object that references the object, and the source object that references the object is used to describe the content of the object.
- the initial input vector and the initial output vector of each hyperlink object may be obtained first.
- the initial input vector of each hyperlink object may be a document vector of each hyperlink object obtained by inputting a set of hyperlink objects into the first target model.
- the first target model is used to vectorize each hyperlink object in the hyperlink object set to obtain the document vector of each hyperlink object, and use the obtained document vector of each hyperlink object as the initial input vector of each hyperlink object.
- An input vector of each word can also be obtained through the first target model. It can be understood that the initial output vector of the hyperlink object can be randomly generated by the target algorithm.
- the hyperlink object set may be processed to convert all hyperlink objects in the hyperlink object set into ordinary objects.
- the transformation may include, but is not limited to, directly deleting the hyperlink, deleting the hyperlink in the source object and adding the context of the hyperlink to the target object as the content of the target object, and using the reference information corresponding to the hyperlink as a special word.
- preprocessing operations such as word segmentation and part-of-speech annotation may be performed on the hyperlink objects.
- a specific pre-processing method may be performed as required, which is not limited in this application.
- the context containing the above hyperlink may be specified content that contains the hyperlink in the hyperlink object.
- the context containing the hyperlink can be obtained by setting the number of words in the context (for example, the context is the content of the hyperlink object from 50 words before the hyperlink to 50 words after the hyperlink), or by setting the number of sentences in the context (for example, the context is the sentence containing the hyperlink, or the content from the sentence before the sentence containing the hyperlink to the sentence after it).
- the first context may be a context obtained by removing a hyperlink from a context containing the hyperlink.
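Both ways of delimiting the context can be sketched with simple slicing. The function names, the tokenized input, and the `LINK` placeholder token are hypothetical; a real system would first run the word segmentation mentioned earlier.

```python
def context_by_words(tokens, link_index, window=50):
    """Words from `window` before the hyperlink to `window` after it,
    with the hyperlink token itself removed (the first context)."""
    before = tokens[max(0, link_index - window):link_index]
    after = tokens[link_index + 1:link_index + 1 + window]
    return before + after

def context_by_sentences(sentences, link_sentence, radius=0):
    """The sentence containing the hyperlink, plus `radius` sentences
    on each side (radius=1 gives previous, current and next sentence)."""
    start = max(0, link_sentence - radius)
    return sentences[start:link_sentence + radius + 1]

tokens = ["a", "b", "LINK", "c", "d"]
print(context_by_words(tokens, link_index=2, window=1))
print(context_by_sentences(["s0", "s1", "s2"], link_sentence=1, radius=1))
```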
- obtaining the first average vector of the first input vector and the first context vector may include: averaging the input vector corresponding to the first input vector and each word in the first context, where the first A context is a context obtained by removing the first hyperlink from the context containing the first hyperlink in the first object.
- the first average vector may be obtained according to the first input vector and the first context vector.
- the method for obtaining the first average vector may be: averaging the input vector corresponding to the first input vector and each word in the first context to obtain the first average vector.
- the vector length of the first input vector is the same as the length of the input vector of the word in the context. You can average the values of the elements at each position in the input vector corresponding to the first input vector and the word in the first context. Way to get the first average vector.
- the length of the vector of the first input vector is the same as the length of the input vector of each word, both of which contain 6 elements.
- the values of the elements in each position of the first input vector are respectively 5 and 5.
- the values of the elements at corresponding positions of the first input vector and of the input vectors of the words are averaged (for example, the value of the first element of the first input vector is averaged with the value of the first element of the input vector of each of the 5 words, and so on) to obtain the first average vector.
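The position-by-position averaging can be sketched as below, for one document input vector and five word input vectors of six elements each. The numeric values are illustrative assumptions chosen only to make the arithmetic easy to follow; they are not values from the application.

```python
def average_vector(first_input, context_word_vectors):
    """Average the first input vector with the input vector of every
    context word, element by element."""
    vectors = [first_input] + list(context_word_vectors)
    length = len(first_input)
    if any(len(v) != length for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(length)]


# One document vector and five word vectors, six elements each (made-up values).
doc_in = [6.0, 0.0, 6.0, 0.0, 6.0, 0.0]
word_in = [[0.0, 6.0, 0.0, 6.0, 0.0, 6.0] for _ in range(5)]
avg = average_vector(doc_in, word_in)
```

Each element of `avg` is the sum of six values (one from the document vector, five from the word vectors) divided by six.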
- the first input vector may be an initial input vector of the first object, or may be an intermediate input vector of the first object obtained while the input vector and output vector of each hyperlink object are being obtained iteratively.
- the first output vector may be an initial output vector of the second object, or may be an intermediate output vector of the second object obtained while the input vector and output vector of each hyperlink object are being obtained iteratively.
- when the similarity between the output vector of the target object and the average vector is adjusted to be greater than or equal to the first target threshold, the output vector of the target object can more accurately represent the source object that references the target object (the input vector of the source object) and the information in that source object that describes the target object (the hyperlink context vector), so that the input vector of the source object can more accurately represent its own content (the input vector of the source object) and the target object it references (the output vector of the target object).
- the foregoing adjustment process may be: inputting the first input vector and the word vector corresponding to each word in the first context to a second target model, obtaining the first average vector from the second target model, comparing the first average vector with the first output vector of the second object, and adjusting at least one of the first input vector, the first context vector, and the first output vector to increase the similarity between the first average vector and the first output vector.
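One possible form of a single adjustment step is sketched below. The application does not fix the similarity measure or the optimization algorithm, so this sketch assumes a dot-product similarity passed through a sigmoid and one step of gradient ascent on its logarithm; negative examples are omitted. The names `adjust_once`, `lr`, and the sample values are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mean(vectors):
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def adjust_once(doc_in, word_ins, target_out, lr=0.1):
    """One gradient-ascent step on log(sigmoid(avg . target_out)).
    All vectors are updated in place; the new similarity is returned."""
    members = [doc_in] + word_ins
    avg = mean(members)
    gain = 1.0 - 1.0 / (1.0 + math.exp(-dot(avg, target_out)))  # 1 - sigmoid(similarity)
    old_out = list(target_out)
    for i in range(len(target_out)):
        target_out[i] += lr * gain * avg[i]                  # move OUT toward the average
    for vec in members:
        for i in range(len(vec)):
            vec[i] += lr * gain * old_out[i] / len(members)  # move IN vectors toward OUT
    return dot(mean(members), target_out)


# Made-up starting vectors for one hyperlink with two context words.
doc_in = [0.1, 0.2]
word_ins = [[0.3, -0.1], [0.0, 0.4]]
target_out = [0.2, 0.1]
before = dot(mean([doc_in] + word_ins), target_out)
after = adjust_once(doc_in, word_ins, target_out)
```

Repeating such steps raises the similarity between the average vector and the output vector, which is the effect the adjustment process is meant to achieve.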
- the input of the second target model may include an initial input vector and an initial output vector of each hyperlink object, and an initial input vector of each word in the hyperlink object set.
- the second target model uses an optimization algorithm to optimize the objective function by adjusting the input vector and output vector of each hyperlink object and the input vector of each word in the set of hyperlink objects.
- the variables of the objective function are the input vector and output vector of the hyperlink object, and the input vector of each word in the set of hyperlink objects.
- the objective function is used to solve for the input vector and output vector of each hyperlink object, and the input vector of each word in the set of hyperlink objects, that satisfy the following condition: for each hyperlink among all hyperlinks included in the set of hyperlink objects, the average vector of the input vector of the source object of the hyperlink and the input vectors corresponding to the words in the context containing the hyperlink has the highest similarity with the output vector of the target object of the hyperlink.
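To make the condition above concrete, the sketch below evaluates one candidate objective over a set of hyperlinks. The form of the function is an assumption: it uses a softmax over dot products with the output vectors of all objects, so that the value rises as each average vector becomes most similar to the output vector of its own target. The application does not state this exact form, and the names `log_likelihood`, `doc_in`, `doc_out`, and `word_in` are hypothetical.

```python
import math


def log_likelihood(links, doc_in, doc_out, word_in):
    """Sum, over all hyperlinks, of the log-softmax score of the true target.
    `links` holds (source_id, context_words, target_id) triples."""
    total = 0.0
    for source, context, target in links:
        members = [doc_in[source]] + [word_in[w] for w in context]
        dim = len(doc_in[source])
        avg = [sum(v[i] for v in members) / len(members) for i in range(dim)]
        scores = {d: sum(a * b for a, b in zip(avg, out)) for d, out in doc_out.items()}
        log_norm = math.log(sum(math.exp(s) for s in scores.values()))
        total += scores[target] - log_norm
    return total


# Tiny made-up collection: object "a" links to object "b" with context word "x".
doc_in = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
word_in = {"x": [1.0, 0.0]}
links = [("a", ["x"], "b")]

untrained = log_likelihood(links, doc_in, {"a": [0.0, 0.0], "b": [0.0, 0.0]}, word_in)
aligned = log_likelihood(links, doc_in, {"a": [0.0, 0.0], "b": [1.0, 0.0]}, word_in)
```

In this toy case the objective is higher when the output vector of the target points in the same direction as the average vector.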
- the first input vector and the first output vector may be updated to the adjusted first input vector and the first output vector.
- the output vector that is output can be used to update the previously stored output vector.
- the adjusted first input vector may also be output as the input vector of the first object, and the output input vector may be used to update the first input vector.
- the adjusted input vector of each word that contributes to the average vector can also be output as the input vector of the corresponding word, and the output input vector of the word can be used to update the stored input vector of the word.
- after the first input vector and the first output vector are updated to the adjusted first input vector and the adjusted first output vector, other hyperlinks in all objects can be located; the located hyperlink is used as the first hyperlink, the source object of the hyperlink is taken as the first object, the target object of the hyperlink is taken as the second object, and the foregoing steps are repeated until the hyperlinks contained in all the objects have been processed.
- the steps of locating all the hyperlinks and processing the positioned hyperlinks can be repeatedly performed to obtain a more accurate vectorized representation of the hyperlink object.
- all the above objects may be all objects in a hyperlink object set.
- the set of hyperlink objects is a super document set
- the hyperlink object is a super document.
- the method for processing hyperlinks includes the following steps:
- Step 1 Convert each super document in the super document collection into a normal document
- Step 2 Use a first target model (e.g., a pv-dm model) to process the initial document vector of each super document (converted ordinary document) in the super document set and the initial IN vector and initial OUT vector of each word in the super document set, to obtain the document vector of each super document in the super document set and the IN vector and the OUT vector of each word in the super document set;
- Step 3 Use the second target model to process the initial IN vector of each super document in the super document set (the document vector of each super document obtained using the first target model), the initial OUT vector of each super document, and the IN vector of each word in the super document set (the IN vector of each word obtained using the first target model), to obtain the IN vector (serving the same role as the aforementioned target input vector) and the OUT vector (serving the same role as the aforementioned target output vector) of each super document in the super document set, and the IN vector of each word in the super document set.
- the processing method of the hyperlink may include two phases: a pre-training phase and a training phase.
- in the pre-training phase, the first target model is used to obtain the IN vector and the OUT vector of each word in the super document set, and the document vector of each super document; in the training phase, the second target model is used to obtain the IN vector and the OUT vector of each super document, and the IN vector of each word in the super document set.
- in the pre-training phase, the 100 super documents in the super document set are first converted into 100 ordinary documents.
- the conversion method can be as described in d2v-nc: delete the hyperlinks in the hyper document but do not add context to the target document.
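A d2v-nc style conversion (delete the hyperlinks, add nothing to the target document) can be sketched as follows. The `[[target_id]]` hyperlink markup and the document IDs are hypothetical assumptions for illustration; real hyper documents would need a parser suited to their actual format.

```python
import re

# Hypothetical inline hyperlink markup: [[target_id]]
LINK = re.compile(r"\[\[[^\]]+\]\]")


def to_ordinary_document(text):
    """Delete every hyperlink marker and normalize the remaining whitespace."""
    without_links = LINK.sub(" ", text)
    return " ".join(without_links.split())


hyper_doc = "We use the Moses system [[cohen2007]] and report the BLEU score [[papineni2002]] below"
ordinary_doc = to_ordinary_document(hyper_doc)
```

The resulting ordinary document keeps the surrounding words, so the first target model can be trained on plain text.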
- the initial IN vector and the initial OUT vector of each word in the super document set and the initial document vector of each super document are obtained through the target algorithm; the 100 converted ordinary documents, the initial IN vector and the initial OUT vector of each word, and the initial document vector of each super document are input to the pv-dm model to obtain the IN vector and the OUT vector of each word in the super document set, and the document vector of each super document.
- the document vector of each of the above super documents may be used as an initial IN vector (which may be one of the foregoing first input vectors) of the super document in the second target model.
- the method of processing the super document by the second target model can be called hyperdoc2vec (h-d2v for short).
- This method avoids the loss of key information by directly modeling the three elements of the hyperlink (source document, context, and target document).
- two vectors (IN vector and OUT vector) are used to represent each hyperdocument.
- the IN vector d^I of the super document d stores information about d when it is used as a source document, for example, the content of d itself, what kinds of documents d references, and the like.
- the OUT vector d^O of the super document d stores information about d when it is used as a target document, for example, what kinds of documents reference d, how the documents referencing d describe d, and the like.
- the input of the second target model is: the initial IN vector and the initial OUT vector of each super document in the super document set, and the IN vector of each word in the super document set.
- the initial IN vector of each super document may be the document vector of each super document obtained using the first target model, and the initial OUT vector of each super document may be the OUT vector of each super document randomly generated using the target algorithm.
- the IN vector of the word may be the IN vector of each word in the super document set obtained using the first target model.
- the initial OUT vector may be generated at any time before the initial IN vector and initial OUT vector of each super document and the IN vector of each word in the super document set are input to the second target model. For example, it may be generated before the first target model is used, together with the initial IN vector and initial OUT vector of each word in the super document set and the initial document vector of each super document. It may also be generated after the first target model has been used to obtain the IN vector and the OUT vector of each word in the super document set and the document vector of each super document. The timing of obtaining the initial OUT vector is not limited in this example.
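Random generation of the initial OUT vectors can be sketched as below. The application only says the vectors are randomly generated by a target algorithm, so the uniform range, the scaling by the dimension, and the fixed seed are assumptions; `init_out_vectors` is a hypothetical name.

```python
import random


def init_out_vectors(doc_ids, dim, seed=0, scale=0.5):
    """Create one small random OUT vector per document.
    A fixed seed makes the initialization reproducible."""
    rng = random.Random(seed)
    return {d: [rng.uniform(-scale, scale) / dim for _ in range(dim)] for d in doc_ids}


initial_out = init_out_vectors(["d1", "d2"], dim=4)
```

Because the function depends only on the document IDs and the dimension, it can be called before or after the first target model is run.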
- the operation of obtaining the hyperlinks can be performed before the second target model is used or while the second target model is being used.
- all hyperlinks can be obtained in advance by scanning all the super documents in the super document set, or the super documents can be scanned in order while the second target model is being used, so as to obtain all the hyperlinks.
- the specific acquisition method and timing are not limited in this example.
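Scanning the super documents in advance to collect every hyperlink, together with its source, context, and target, can be sketched as follows. The token format `[[target_id]]` and the window size are hypothetical assumptions used only for illustration.

```python
def collect_hyperlinks(docs, window=2):
    """docs maps doc_id -> list of tokens; a token of the form "[[target]]"
    marks a hyperlink. Returns (source, context_words, target) triples."""
    links = []
    for source, tokens in docs.items():
        for i, tok in enumerate(tokens):
            if tok.startswith("[[") and tok.endswith("]]"):
                target = tok[2:-2]
                nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                # Drop any other hyperlink markers that fall inside the window.
                context = [t for t in nearby if not t.startswith("[[")]
                links.append((source, context, target))
    return links


docs = {"p1": ["we", "use", "moses", "[[cohen2007]]", "for", "decoding"]}
links = collect_hyperlinks(docs)
```

The same function could instead be called on one document at a time if the hyperlinks are to be obtained during training.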
- the IN vector and the OUT vector of each super document in the super document set can be obtained.
- the obtained output vector (and input vector) can be used in multiple scenes, and the above scenes can include but are not limited to:
- Hyper-document classification: for a set of labeled hyper-documents {<d, l>} (that is, {<document, label>}), a classification algorithm (for example, SVM) is used to train a classifier on the hyper-document vectors and the labeled data {<d, l>}, and the classifier is applied to unknown hyper-documents to predict their type.
- the document vector used in the prediction process may be an IN vector or an OUT vector of a super document, or a concatenation of the two.
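The use of concatenated IN and OUT vectors as classification features can be sketched as below. The text names SVM as an example algorithm; to stay dependency-free this sketch swaps in a nearest-centroid classifier, which is a stand-in and not the method named above. All names and values are hypothetical.

```python
def features(doc_id, doc_in, doc_out):
    """Concatenate the IN vector and the OUT vector of a document."""
    return doc_in[doc_id] + doc_out[doc_id]


def train_centroids(labeled, doc_in, doc_out):
    """Compute one mean feature vector per label from (doc_id, label) pairs."""
    sums, counts = {}, {}
    for doc_id, label in labeled:
        x = features(doc_id, doc_in, doc_out)
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(x, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


doc_in = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
doc_out = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.8, 0.2]}
centroids = train_centroids([("a", "nlp"), ("b", "vision")], doc_in, doc_out)
predicted = predict(features("c", doc_in, doc_out), centroids)
```

Using only the IN vector or only the OUT vector would amount to changing `features` to return one of the two lists.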
- the obtained super document vectors (input vectors and output vectors) can be used to calculate the similarity between documents (commonly with the cosine of the angle between vectors, using the IN vector or the OUT vector of the super document, or the concatenation of the two) and the similarity between a document and keywords (using the OUT vector of the super document and the input vectors of the words in the super document set). Such similarity calculation tasks are very common in Internet-related products, for example, the precise placement of advertisements (the similarity between advertisement documents and user search terms), the construction of knowledge graphs (the similarity between entities and descriptions), and so on.
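The cosine measure mentioned above can be sketched in a few lines. The zero-vector handling is an added convention for the sketch, not something stated in the application.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors;
    returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 1.0], [2.0, 2.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

For document-to-keyword similarity, the first argument could be the average of the keyword input vectors and the second the OUT vector of the document.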
- Citation recommendation: when an academic paper is being written, suitable papers can be automatically recommended as references for a specified context (using the OUT vector of the super document and the input vectors of the words in the super document set). Assuming the context word set is C, each existing paper can be scored by formula (4):
- where w is a context word in C
- w^I is the input vector of the word w
- d is a document to be scored
- d^O is the OUT vector of the document d to be scored.
- multiple methods can be used to determine which existing papers to recommend. For example, all existing papers are scored, and the one or more existing papers with the highest scores are selected and recommended as references for the context. For another example, a target threshold can be set for recommending existing papers, or both a target threshold and a number of recommended citations can be set.
- in that case, each existing paper is scored in turn and it is judged whether its score is greater than (or equal to) the target threshold; if so, the existing paper is recommended as a reference for the context. If the number of recommendations is 1, the process then ends; if the number of recommendations is n (n being greater than or equal to 2), it is judged whether the number of papers recommended so far equals n. If it equals n, the process ends; if it is less than n, the scoring operation continues until the number of recommended existing papers equals n.
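The scoring and selection strategies can be sketched as below. The exact form of formula (4) is not reproduced in this text, so the sketch assumes the score is the dot product between the average of the context word input vectors w^I and the OUT vector d^O of the candidate paper. The function names and the document IDs are hypothetical.

```python
def score(context_words, word_in, out_vector):
    """Assumed score: average of the context word IN vectors, dotted with d^O."""
    vectors = [word_in[w] for w in context_words if w in word_in]
    if not vectors:
        return 0.0
    avg = [sum(v[i] for v in vectors) / len(vectors) for i in range(len(out_vector))]
    return sum(a * b for a, b in zip(avg, out_vector))


def recommend(context_words, word_in, doc_out, top_n=1, threshold=None):
    """Return up to `top_n` paper IDs, best first; if `threshold` is given,
    only papers scoring at least that value are kept."""
    scored = [(score(context_words, word_in, out), d) for d, out in doc_out.items()]
    if threshold is not None:
        scored = [(s, d) for s, d in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_n]]


word_in = {"bleu": [1.0, 0.0], "score": [1.0, 0.0]}
doc_out = {"papineni2002": [0.9, 0.1], "cohen2007": [0.1, 0.9]}
best = recommend(["bleu", "score"], word_in, doc_out)
filtered = recommend(["bleu", "score"], word_in, doc_out, top_n=2, threshold=0.5)
```

Here `top_n` plays the role of the number of recommendations n, and `threshold` the role of the target threshold.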
- an input vector corresponding to each word in the third object is obtained; a target parameter of the second object is determined according to the input vectors and the output vector of the second object; and whether the second object is allowed to be referenced by the third object is determined according to the target parameter.
- determining whether to allow the second object to be referenced by the third object according to the target parameter includes: determining that the second object is allowed to be referenced by the third object when the value of the target parameter is higher than the second target threshold; or determining that the second object is allowed to be referenced by the third object when the value of the target parameter of the second object is the largest in the candidate object set, where the candidate object set includes the second object.
- inserting a third hyperlink pointing to the second object at the target position in the third object includes: searching for the target word in the third object, where the similarity between the input vector corresponding to the target word and the output vector of the second object is higher than the third target threshold; and inserting the third hyperlink at a position after the target word in the third object.
- a hyperlink for pointing to the second object may be automatically inserted at a target position in the third object.
- the target position may be the start position or the end position of the third object, or any position in the middle of the third object. The target word may also be searched for in the third object, where the target word is a word whose input vector has a similarity with the output vector of the second object higher than a set threshold, or the word whose input vector has the highest similarity with the output vector of the second object among all words contained in the third object (that is, the word in the third object whose word vector is most similar to the output vector of the second object); a hyperlink pointing to the second object is then inserted at a position after the target word in the third object.
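The highest-similarity variant of the target word search can be sketched as follows. The sketch assumes a dot-product similarity and a tokenized third object, and it falls back to the end position when no word has a known input vector; `insert_hyperlink` and the link token format are hypothetical.

```python
def insert_hyperlink(tokens, word_in, target_out, link_token):
    """Insert `link_token` right after the word whose IN vector is most
    similar (by dot product) to the OUT vector of the target object."""
    best_index, best_score = None, float("-inf")
    for i, tok in enumerate(tokens):
        if tok in word_in:
            s = sum(a * b for a, b in zip(word_in[tok], target_out))
            if s > best_score:
                best_index, best_score = i, s
    if best_index is None:
        return tokens + [link_token]  # no known word: use the end position
    return tokens[:best_index + 1] + [link_token] + tokens[best_index + 1:]


tokens = ["we", "report", "bleu", "score", "here"]
word_in = {"bleu": [1.0, 0.0], "here": [0.0, 1.0]}
linked = insert_hyperlink(tokens, word_in, [0.9, 0.1], "[[papineni2002]]")
```

A threshold-based variant would insert the link after the first word whose similarity exceeds the set threshold instead of after the best-scoring word.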
- the insertion position of the third hyperlink can be determined by the terminal displaying the third object.
- instruction information indicating the insertion position of the third hyperlink in the third object may be received; and according to the instruction information, the third hyperlink is inserted at the insertion position in the third object.
- the display position of the prompt information may be a screen of a terminal device displaying the third object. After the prompt information is displayed, input information indicating the insertion position of the third hyperlink in the third object is detected; and according to the input information, the third hyperlink is inserted at the insertion position in the third object. In this way, the user can specify where to insert the hyperlink.
- instruction information indicating the insertion position of the third hyperlink in the third object may be received; and according to the instruction information, the third hyperlink is inserted at the insertion position in the third object.
- the insertion position of the third hyperlink can be determined by the remote device (for example, the server), and the terminal displaying the third object can insert the third hyperlink at the insertion position according to the instruction information.
- a target app is opened in a terminal device, and a target text (third object) is input in the target app.
- the terminal device obtains the input target text through the target app, and sends the target text to the server through the network.
- the server scores the existing papers and determines that the existing papers recommended for the target text are the two highest-scoring existing papers ((Cohen et al., 2007) and (Papineni et al., 2002)).
- the server sends hyperlinks to the two existing papers to the terminal device (it may also send the two existing papers, or their abstracts, to the terminal device so that the user can decide whether to insert the existing papers and where in the target text to insert them).
- the user determines the insertion positions of the existing papers according to the prompt information: (Cohen et al., 2007) is inserted after "Moses system", and (Papineni et al., 2002) is inserted after "BLEU score".
- after the target instruction is received, the hyperlinks to the two existing papers are inserted at the insertion positions in the target text.
- the description of the hyperlink can be the document ID of an existing paper.
- first context information of a first hyperlink in a first object is converted into a first context vector; a first input vector when the first object is used as a link source is obtained, where the first object contains information of the first hyperlink pointing to a second object; a first average vector is obtained according to the first context vector and the first input vector; at least one of the first input vector, the first context vector, and the first output vector corresponding to the second object is adjusted; the similarity between the first output vector and the first average vector is calculated according to the adjustment result; and when the similarity between the first output vector and the first average vector is greater than or equal to the first target threshold, the first output vector is output as the output vector of the second object.
- the above method further includes the following steps.
- Step S1 Update the first input vector and the first output vector to the adjusted first input vector and the adjusted first output vector.
- the above method further includes the following steps.
- Step S2 repeat the following steps:
- Step S21 locate other hyperlinks in all objects, and use the located other hyperlinks as the second hyperlink;
- Step S22 Convert the second context information of the second hyperlink into a second context vector
- Step S23 Obtain a second input vector when the source object of the second hyperlink is used as the link source, where the source object contains information of the second hyperlink pointing to the target object;
- Step S24 Obtain a second average vector according to the second context vector and the second input vector.
- Step S25 Adjust at least one of the second input vector, the second context vector, and the second output vector corresponding to the target object;
- Step S26 Calculate the similarity between the second output vector and the second average vector according to the adjustment result; when the similarity between the second output vector and the second average vector is greater than or equal to the first target threshold, use the second output vector as the output vector of the target object.
- updating the first input vector and the first output vector to the adjusted first input vector and the first output vector can ensure the validity of the first input vector and the second input vector. Further, by locating the hyperlinks in all objects and performing the adjustment steps on each hyperlink, the ability of the input vector and output vector of each object to represent the object can be improved.
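The loop over all hyperlinks (steps S21 to S26) can be sketched as below. To keep the example short it adjusts only the output vector, which the "at least one of" wording permits, and it uses a simple interpolation toward the average vector with cosine similarity as the stopping test; both choices are assumptions and not the optimization algorithm of the application. All names and values are hypothetical.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def process_all(links, doc_in, doc_out, word_in, threshold=0.9, lr=0.5, max_steps=100):
    """For every hyperlink, adjust the OUT vector of its target in place until
    its cosine similarity to the average vector reaches `threshold`."""
    for source, context, target in links:                 # S21: next hyperlink
        members = [doc_in[source]] + [word_in[w] for w in context]  # S22, S23
        dim = len(doc_in[source])
        avg = [sum(v[i] for v in members) / len(members) for i in range(dim)]  # S24
        out = doc_out[target]
        for _ in range(max_steps):
            if cosine(out, avg) >= threshold:              # S26: similarity check
                break
            for i in range(dim):                           # S25: adjust the OUT vector
                out[i] += lr * (avg[i] - out[i])
    return doc_out


doc_in = {"a": [1.0, 0.0]}
word_in = {"x": [0.0, 1.0]}
doc_out = {"b": [1.0, -1.0]}
process_all([("a", ["x"], "b")], doc_in, doc_out, word_in)
final_similarity = cosine(doc_out["b"], [0.5, 0.5])
```

Running the whole loop several times over the collection corresponds to repeating the locating and processing steps to refine the vectors further.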
- the above method further includes:
- a target parameter of the second object is determined according to an input vector of each word of the third object and an output vector of the second object; and whether the second object is allowed to be referenced by the third object is determined according to the target parameter.
- determining whether to allow the second object to be referenced by the third object according to the target parameter includes:
- when the value of the target parameter of the second object in the candidate object set is the largest, it is determined that the second object is allowed to be referenced by the third object, where the candidate object set includes the second object.
- the above method further includes:
- inserting a third hyperlink to point to the second object at a target position in the third object includes:
- a reference recommendation is performed for the third object by using the output vector obtained by converting the hyperlink object, and a hyperlink pointing to the recommended hyperlink object is inserted into the third object, so that once the output vector has been obtained, a reference recommendation can be performed for the third object, thereby improving the application value of the method for processing hyperlinks and the use value of the third object (for example, in scenarios such as thesis writing and web design).
- the server preprocesses the super document set through step S902.
- the super document in the super document set is converted into a target input vector and a target output vector.
- a reference recommendation is made for the target text.
- An embodiment of the present application further provides a method for processing a hyperlink that can be executed by a computing device. As shown in FIG. 10, the method includes the following steps.
- Step S1002 Obtain a first input vector when the first object is used as a link source.
- the first input vector is used to represent at least the first object and the content in the first object that describes the second object.
- the first object contains information of the first hyperlink pointing to the second object.
- Step S1004 obtaining a first output vector when the second object is used as a link target.
- Step S1006 Adjust according to at least the first input vector and the first output vector to obtain an output vector of the second object.
- the above-mentioned method for processing a hyperlink may be applied to, but is not limited to, processes of classifying, recommending, and searching for specific objects.
- by using the foregoing method for processing a hyperlink, the first input vector when the first object is used as the link source is obtained, where the first input vector is used to represent at least the first object and the content in the first object that describes the second object, and the first object contains information of the first hyperlink pointing to the second object; the first output vector when the second object is used as the link target is obtained; and the output vector of the second object is obtained by adjustment according to at least the first input vector and the first output vector. Because the output vector represents the information of the hyperlink object when it is used as a link target, the loss of key information is avoided and the technical effect of improving information integrity is achieved.
- one or more word vectors corresponding to the first context of the first hyperlink may be obtained, where the first context is the context obtained by removing the first hyperlink from the context containing the first hyperlink in the first object; a first average vector is obtained according to the first input vector and the one or more word vectors corresponding to the first context; and, according to the first average vector and the first output vector, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector is adjusted to obtain an output vector of the second object.
- the first average vector may be obtained according to the first input vector and the first context vector.
- the method for obtaining the first average vector may be: averaging the first input vector and the input vector corresponding to each word in the first context to obtain the first average vector.
- the vector length of the first input vector is the same as the length of the input vector of each word in the context. The first average vector can be obtained by averaging, position by position, the values of the elements of the first input vector and of the input vectors corresponding to the words in the first context.
- the first input vector may be an initial input vector of the first object, or may be an intermediate input vector of the first object obtained while the input vector and output vector of each hyperlink object are being obtained iteratively.
- the first output vector may be an initial output vector of the second object, or may be an intermediate output vector of the second object obtained while the input vector and output vector of each hyperlink object are being obtained iteratively.
- adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector includes: calculating the similarity between the first average vector and the first output vector; and, based on a similarity optimization algorithm, adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector, so that the similarity between the first output vector and the first average vector is greater than or equal to the target threshold.
- the foregoing adjustment process may be by inputting a first input vector and a word vector corresponding to each word in the first context to a target model, obtaining a first average vector from the target model, and combining the first average vector with The first output vector of the second object is compared, and at least one of the first average vector, the first input vector, and the first output vector is adjusted to increase the similarity between the first average vector and the first output vector.
- the input of the target model may include an initial input vector and an initial output vector of each hyperlink object, and an initial input vector of each word in the hyperlink object set.
- the target model uses an optimization algorithm to optimize the objective function by adjusting the input vector and output vector of each hyperlink object and the input vector of each word in the hyperlink object set.
- the variables of the objective function are the input vector and output vector of the hyperlink object, and the input vector of each word in the set of hyperlink objects.
- the objective function is used to solve the input vector and output vector of each hyperlink object and the input vector of each word in the set of hyperlink objects that satisfy the following conditions: the source of each hyperlink in all the hyperlinks contained in the set of hyperlink objects The average vector of the input vector of the object and the input vector corresponding to the word in the context containing the hyperlink has the highest similarity with the output vector of the target object of the hyperlink.
- the located hyperlink is used as the first hyperlink
- the source object of the hyperlink is used as the first object, and the foregoing steps are repeated until the hyperlinks contained in all objects have been processed.
- the steps of locating all the hyperlinks and processing the positioned hyperlinks can be repeatedly performed to obtain a more accurate vectorized representation of the hyperlink object.
- a first input vector when a first object is used as a link source is obtained, where the first input vector is used to represent at least the first object and the content in the first object that describes the second object, and the first object contains information of the first hyperlink pointing to the second object; the first output vector when the second object is used as the link target is obtained; and the output vector of the second object is obtained by adjustment according to at least the first input vector and the first output vector. Using the output vector to represent the hyperlink object avoids the loss of key information and improves the integrity of the information.
- the above method further includes the following steps.
- Step S3 obtaining one or more word vectors corresponding to the first context of the first hyperlink, where the first context is a context obtained by removing the first hyperlink from a context including the first hyperlink in the first object;
- Step S4 Obtain a first average vector according to the first input vector and one or more word vectors corresponding to the first context.
- adjusting the output vector of the second object based on at least the first input vector and the first output vector includes:
- Step S5 Adjust at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector according to the first average vector and the first output vector to obtain the output vector of the second object.
- adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector includes:
- Step S51 calculating the similarity between the first average vector and the first output vector
- Step S52 Adjust at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector based on the similarity optimization algorithm, so that the similarity between the first output vector and the first average vector is greater than or equal to the target threshold.
- adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector according to the first average vector and the first output vector to obtain the output vector of the second object can ensure the ability of the obtained output vector to represent the second object.
- adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector by using a similarity optimization algorithm can improve the ability of the output vector to represent the second object.
- the above method further includes the following steps.
- Step S6 repeat the following steps:
- Step S61 Locate other hyperlinks in all objects according to a predetermined rule, and use the located other hyperlinks as a second hyperlink;
- Step S62 Obtain a second input vector when the source object of the second hyperlink is used as the link source, where the second input vector is used to represent at least the source object and the content in the source object that describes the target object, and the source object contains information of the second hyperlink pointing to the target object;
- Step S63 Obtain a second output vector when the target object is used as the link target.
- Step S64 Adjust and obtain an output vector of the target object according to at least the second input vector and the second output vector.
- the method according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform. It can of course also be implemented by hardware, but in many cases the former is the better implementation.
- the technical solution of this application, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product, which is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods of the embodiments of the present application.
- a terminal device which may be a mobile phone, a computer, a server, or a network device, etc.
- An embodiment of the present application further provides a hyperlink processing device for implementing the foregoing method for processing a hyperlink.
- the device includes:
- a converting unit 1102 configured to convert the first context information of the first hyperlink in the first object into a first context vector
- a first obtaining unit 1104 configured to obtain a first input vector when a first object is used as a link source, where the first object includes information of a first hyperlink pointing to a second object;
- a second obtaining unit 1106, configured to obtain a first average vector according to the first context vector and the first input vector
- An adjusting unit 1108, configured to adjust at least one of a first input vector, a first context vector, and a first output vector corresponding to a second object;
- An output unit 1110 configured to calculate the similarity between the first output vector and the first average vector according to the adjustment result, and, when the similarity between the first output vector and the first average vector is greater than or equal to the first target threshold, to output the first output vector as the output vector of the second object.
- the above-mentioned hyperlink processing device may be used in, but not limited to, classification, recommendation, and retrieval of a specific object.
- through the above-mentioned hyperlink processing device, the first context information of the first hyperlink in the first object is converted into the first context vector; the first input vector when the first object is used as the link source is obtained, where the first object contains information of a first hyperlink pointing to a second object; a first average vector is obtained according to the first context vector and the first input vector; at least one of the first average vector, the first input vector, and the first output vector corresponding to the second object is adjusted; the similarity between the first output vector and the first average vector is calculated according to the adjustment result, and when that similarity is greater than or equal to the first target threshold, the first output vector is output as the output vector of the second object. The output vector is thus used to represent the information of the hyperlink object when it serves as the link target.
- the second obtaining unit 1106 may be specifically configured to average the first input vector and the input vector corresponding to each word in the first context, where the first context is the context obtained by removing the first hyperlink from the context in the first object that contains the first hyperlink.
- the first average vector may be obtained according to the first input vector and the first context vector.
- the method for obtaining the first average vector may be: averaging the first input vector and the input vector corresponding to each word in the first context to obtain the first average vector.
- the vector length of the first input vector is the same as the length of the input vector of each word in the context. The first average vector can be obtained by averaging, position by position, the element values of the first input vector and of the input vectors corresponding to the words in the first context.
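The position-by-position averaging described above can be sketched as follows. This is only an illustration: the function name `average_vector` and the sample values are hypothetical, and NumPy is assumed as the array library.

```python
import numpy as np

def average_vector(source_in, context_word_ins):
    """Element-wise mean of the source object's input vector and the
    input vectors of the words in the hyperlink context."""
    stacked = np.vstack([source_in, *context_word_ins])
    return stacked.mean(axis=0)

# Hypothetical input vector of the first object.
first_input = np.array([1.0, 2.0, 3.0])
# Hypothetical input vectors of the words in the first context.
context_words = [np.array([3.0, 2.0, 1.0]), np.array([2.0, 2.0, 2.0])]

first_average = average_vector(first_input, context_words)
```

All vectors must have the same length, as the text requires; otherwise `np.vstack` raises an error.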
- the first input vector may be an initial input vector of the first object, or may be an intermediate input vector of the first object obtained while iteratively computing the input vector and output vector of each hyperlink object.
- the first output vector may be an initial output vector of the second object, or may be an intermediate output vector of the second object obtained while iteratively computing the input vector and output vector of each hyperlink object.
- when the similarity between the output vector of the target object and the average vector is adjusted to be greater than or equal to the first target threshold, the output vector of the target object can more accurately represent the source object that references the target object (the input vector of the source object) and the information in that source object that describes the target object (the hyperlink context vector), so that the input vector of the source object can more accurately represent both its own content (the input vector of the source object) and the target object it references (the output vector of the target object).
- the foregoing adjustment process may be: inputting the first input vector and the word vector corresponding to each word in the first context into a second target model, obtaining a first average vector from the second target model, comparing the first average vector with the first output vector of the second object, and adjusting at least one of the first input vector, the first context vector, and the first output vector to increase the similarity between the first average vector and the first output vector.
- the input of the second target model may include an initial input vector and an initial output vector of each hyperlink object, and an initial input vector of each word in the hyperlink object set.
- the second target model uses an optimization algorithm to optimize the objective function by adjusting the input vector and output vector of each hyperlink object and the input vector of each word in the hyperlink object set.
- the variables of the objective function are the input vector and output vector of the hyperlink object, and the input vector of each word in the set of hyperlink objects.
- the objective function is used to solve for the input vector and output vector of each hyperlink object and the input vector of each word in the set of hyperlink objects that satisfy the following condition: for each hyperlink contained in the set of hyperlink objects, the average of the input vector of the hyperlink's source object and the input vectors corresponding to the words in the context containing the hyperlink has the highest similarity with the output vector of the hyperlink's target object.
- the first input vector and the first output vector may be updated to the adjusted first input vector and the first output vector.
- the output vector that is output can be used to update the first output vector.
- the first input vector may also be output as the input vector of the first object, and the input vector that is output can be used to update the first input vector.
- the adjusted input vector of each word that contributes to the average vector can also be output as the input vector of the corresponding word, and the word input vector that is output can be used to update the input vector of that word.
- after the first input vector and the first output vector are updated to the adjusted first input vector and first output vector, other hyperlinks in all objects can be located; each located hyperlink is used as the first hyperlink, its source object as the first object, and its target object as the second object, and the foregoing steps are repeated until the hyperlinks contained in all the objects have been processed.
- the steps of locating all the hyperlinks and processing the located hyperlinks can be performed repeatedly to obtain a more accurate vectorized representation of the hyperlink objects.
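A minimal sketch of this repeated pass over all hyperlinks is given below. It assumes each hyperlink is available as a `(source, context_words, target)` tuple and uses a gradient-ascent step on `log sigmoid(average . output)` as the adjustment; that update rule, the learning rate, and all names are illustrative assumptions, not the update prescribed by the application.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def link_score(link, doc_in, doc_out, word_in):
    """Dot product between the average vector of a link and the target's output vector."""
    source, context, target = link
    members = [doc_in[source]] + [word_in[w] for w in context]
    return float(np.mean(members, axis=0) @ doc_out[target])

def process_all_hyperlinks(links, doc_in, doc_out, word_in, passes=5, lr=0.5):
    """Visit every hyperlink in every pass and nudge the vectors involved."""
    for _ in range(passes):
        for source, context, target in links:
            members = [doc_in[source]] + [word_in[w] for w in context]
            avg = np.mean(members, axis=0)
            out = doc_out[target].copy()
            gain = 1.0 - sigmoid(avg @ out)        # shrinks as the score grows
            doc_out[target] = out + lr * gain * avg
            step = lr * gain * out / len(members)  # each member shares the average's gradient
            doc_in[source] = doc_in[source] + step
            for w in context:
                word_in[w] = word_in[w] + step

# Hypothetical toy data: one link from paper_a to paper_b.
rng = np.random.default_rng(0)
dim = 8
doc_in = {d: rng.normal(scale=0.1, size=dim) for d in ("paper_a", "paper_b")}
doc_out = {d: rng.normal(scale=0.1, size=dim) for d in ("paper_a", "paper_b")}
word_in = {w: rng.normal(scale=0.1, size=dim) for w in ("bleu", "score")}
links = [("paper_a", ["bleu", "score"], "paper_b")]

before = link_score(links[0], doc_in, doc_out, word_in)
process_all_hyperlinks(links, doc_in, doc_out, word_in)
after = link_score(links[0], doc_in, doc_out, word_in)
```

The updated vectors are written back into the dictionaries after each hyperlink, so later hyperlinks in the same pass see the adjusted values, which mirrors the update-then-continue order described above.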
- all the above objects may be all objects in a hyperlink object set.
- the obtained output vector (and input vector) can be used in multiple scenarios, which can include, but are not limited to, hyper-document classification, similar-document and keyword search, and citation recommendation.
- after the first output vector is output as the output vector of the second object, an input vector corresponding to each word in the third object is obtained; a target parameter of the second object is determined according to the input vector of each word of the third object and the output vector of the second object; and whether to allow the second object to be referenced by the third object is determined according to the target parameter.
- Determining whether to allow the second object to be referenced by the third object according to the target parameter includes: determining that the second object is allowed to be referenced by the third object when the value of the target parameter is higher than the second target threshold; or determining that the second object is allowed to be referenced by the third object when the value of the target parameter of the second object is the largest in a candidate object set, where the candidate object set includes the second object.
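The two decision rules (threshold and largest value in a candidate set) can be sketched as below. The way the target parameter is computed here, a dot product between the mean word input vector of the third object and the candidate's output vector, is an assumption made for illustration; the function names and sample values are hypothetical.

```python
import numpy as np

def target_parameter(word_inputs, candidate_out):
    """Assumed scoring rule: mean word input vector dotted with the candidate's output vector."""
    return float(np.mean(word_inputs, axis=0) @ candidate_out)

def allowed_by_threshold(word_inputs, candidate_out, threshold):
    """Rule 1: allow the reference when the target parameter exceeds the threshold."""
    return target_parameter(word_inputs, candidate_out) > threshold

def best_candidate(word_inputs, candidate_outs):
    """Rule 2: pick the candidate whose target parameter is the largest."""
    return max(candidate_outs,
               key=lambda name: target_parameter(word_inputs, candidate_outs[name]))

# Hypothetical input vectors of the words in the third object.
word_inputs = np.array([[1.0, 0.0], [0.0, 1.0]])
# Hypothetical output vectors of the candidate objects.
candidates = {
    "second_object": np.array([2.0, 2.0]),
    "other_object": np.array([1.0, -1.0]),
}

score = target_parameter(word_inputs, candidates["second_object"])
allowed = allowed_by_threshold(word_inputs, candidates["second_object"], threshold=1.0)
winner = best_candidate(word_inputs, candidates)
```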
- Inserting a third hyperlink pointing to the second object at a target position in the third object includes: finding a target word in the third object, where the similarity between the input vector corresponding to the target word and the output vector of the second object is higher than the third target threshold; and inserting the third hyperlink at a position after the target word in the third object.
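As a rough illustration of this insertion step, the sketch below scans the words of the third object and inserts a link marker after the first word whose input vector is similar enough to the second object's output vector. Cosine similarity, the choice of the first qualifying word, and every name and value in the example are assumptions.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def insert_hyperlink(words, word_in, target_out, link_text, threshold):
    """Insert link_text after the first word whose similarity to target_out
    exceeds threshold; return an unchanged copy if no word qualifies."""
    for i, w in enumerate(words):
        if w in word_in and cosine(word_in[w], target_out) > threshold:
            return words[:i + 1] + [link_text] + words[i + 1:]
    return list(words)

# Hypothetical word input vectors and target output vector.
word_in = {
    "we": np.array([1.0, 0.0]),
    "compute": np.array([0.6, 0.8]),
    "BLEU": np.array([0.0, 1.0]),
    "scores": np.array([1.0, 1.0]),
}
target_out = np.array([0.0, 1.0])
words = ["we", "compute", "BLEU", "scores"]

linked = insert_hyperlink(words, word_in, target_out, "(Papineni et al., 2002)", threshold=0.9)
```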
- first context information of a first hyperlink in a first object is converted into a first context vector; a first input vector when the first object is used as a link source is obtained, where the first object contains information of a first hyperlink pointing to a second object; a first average vector is obtained according to the first context vector and the first input vector; at least one of the first input vector, the first context vector, and the first output vector corresponding to the second object is adjusted; the similarity between the first output vector and the first average vector is calculated according to the adjustment result; and when the similarity between the first output vector and the first average vector is greater than or equal to the first target threshold, the first output vector is output as the output vector of the second object.
- the foregoing apparatus further includes:
- An update unit 1112 is configured to update the first input vector and the first output vector to the adjusted first input vector and the first output vector after the first output vector is used as the output vector of the second object and output.
- the foregoing apparatus further includes:
- the first execution unit 1114 is configured to repeatedly perform the following steps after the update unit 1112 updates the first input vector and the first output vector to the adjusted first input vector and the first output vector:
- the vector corresponds to at least one of the second output vector of the target object; the similarity between the second output vector and the second average vector is calculated according to the adjustment result, and when that similarity is greater than or equal to the first target threshold, the second output vector is output as the output vector of the target object.
- updating the first input vector and the first output vector to the adjusted first input vector and the first output vector can ensure the validity of the first input vector and the second input vector. Further, by locating the hyperlinks in all objects and performing the adjustment step on each hyperlink, the ability of the input vector and output vector of each object to represent the object can be improved.
- the foregoing apparatus further includes:
- a third obtaining unit 1116 configured to obtain an input vector corresponding to each word in the third object after using the first output vector as the output vector of the second object and outputting the output vector;
- a determining unit 1118 is configured to determine a target parameter of the second object according to an input vector of each word of the third object and an output vector of the second object; and determine whether to allow the second object to be referenced by the third object according to the target parameter.
- the determining unit 1118 includes:
- a first determining module configured to determine that a second object is allowed to be referenced by a third object when the value of the target parameter is higher than the second target threshold;
- the above device further includes:
- the second execution unit 1120 is configured to execute the following if it is determined, according to the target parameter, that the second object is allowed to be referenced by the third object:
- the second execution unit 1120 is specifically configured to:
- by using the output vectors obtained through hyperlink processing to make a citation recommendation for the third object, and by inserting into the third object a hyperlink pointing to the recommended hyperlink object, the application value of the hyperlink processing method and the usefulness of the third object are improved (for example, in scenarios such as thesis writing and web design).
- An embodiment of the present application further provides a hyperlink processing apparatus. As shown in FIG. 12A, the apparatus includes:
- a first obtaining unit 1202 configured to obtain a first input vector when the first object is used as a link source, where the first input vector is used to represent at least the first object and the content in the first object that describes the second object, and the first object contains information of a first hyperlink pointing to the second object;
- a second obtaining unit 1204 configured to obtain a first output vector when the second object serves as a link target
- the adjusting unit 1206 is configured to adjust to obtain an output vector of the second object according to at least the first input vector and the first output vector.
- the above-mentioned hyperlink processing device may be used in, but not limited to, classification, recommendation, and retrieval of a specific object.
- through the above-mentioned hyperlink processing device, the first input vector when the first object is used as the link source is obtained, where the first input vector is used to represent at least the first object and the content in the first object that describes the second object, and the first object contains information of the first hyperlink pointing to the second object; the first output vector when the second object is the link target is obtained; and the output vector of the second object is obtained by adjustment according to at least the first input vector and the first output vector. Because the output vector represents the information of the hyperlink object when it serves as the link target, loss of key information is avoided and information integrity is improved.
- one or more word vectors corresponding to the first context of the first hyperlink may be obtained, where the first context is the context obtained by removing the first hyperlink from the context in the first object that contains the first hyperlink; a first average vector is obtained according to the first input vector and the one or more word vectors corresponding to the first context; and, according to the first average vector and the first output vector, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector is adjusted to obtain the output vector of the second object.
- the first average vector may be obtained according to the first input vector and the first context vector.
- the method for obtaining the first average vector may be: averaging the first input vector and the input vector corresponding to each word in the first context to obtain the first average vector.
- the vector length of the first input vector is the same as the length of the input vector of each word in the hyperlink context. The first average vector can be obtained by averaging, position by position, the element values of the first input vector and of the input vectors corresponding to the words in the first context.
- the first input vector may be an initial input vector of the first object, or may be an intermediate input vector of the first object obtained while iteratively computing the input vector and output vector of each hyperlink object.
- the first output vector may be an initial output vector of the second object, or may be an intermediate output vector of the second object obtained while iteratively computing the input vector and output vector of each hyperlink object.
- adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector includes: calculating the similarity between the first average vector and the first output vector; and, based on a similarity optimization algorithm, adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector so that the similarity between the first output vector and the first average vector is greater than or equal to the target threshold.
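One way to picture "adjust until the similarity reaches the target threshold" is the sketch below. It adjusts only the output vector (the text allows adjusting any of the three kinds of vectors), uses cosine similarity, and takes plain gradient-ascent steps; these choices, the step size, and the names are assumptions for illustration.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def adjust_output_vector(avg, out, threshold=0.95, lr=0.1, max_steps=1000):
    """Gradient ascent on cosine(avg, out) with respect to out only,
    stopping as soon as the similarity reaches the threshold."""
    out = out.astype(float).copy()
    for _ in range(max_steps):
        sim = cosine(avg, out)
        if sim >= threshold:
            break
        norm_avg, norm_out = np.linalg.norm(avg), np.linalg.norm(out)
        # d cos / d out = avg / (|avg| |out|) - cos * out / |out|^2
        grad = avg / (norm_avg * norm_out) - sim * out / (norm_out ** 2)
        out += lr * grad
    return out

# Hypothetical first average vector and initial first output vector.
first_average = np.array([1.0, 0.0])
initial_output = np.array([0.0, 1.0])

adjusted_output = adjust_output_vector(first_average, initial_output)
```

The caller's initial vector is left untouched; the adjusted copy is what would be output as the second object's output vector once the threshold is met.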
- the foregoing adjustment process may be by inputting a first input vector and a word vector corresponding to each word in the first context to a target model, obtaining a first average vector from the target model, and combining the first average vector with The first output vector of the second object is compared, and at least one of the first average vector, the first input vector, and the first output vector is adjusted to increase the similarity between the first average vector and the first output vector.
- the input of the target model may include an initial input vector and an initial output vector of each hyperlink object, and an initial input vector of each word in the hyperlink object set.
- the target model uses an optimization algorithm to optimize the objective function by adjusting the input vector and output vector of each hyperlink object and the input vector of each word in the hyperlink object set.
- the variables of the objective function are the input vector and output vector of the hyperlink object, and the input vector of each word in the set of hyperlink objects.
- the objective function is used to solve for the input vector and output vector of each hyperlink object and the input vector of each word in the set of hyperlink objects that satisfy the following condition: for each hyperlink contained in the set of hyperlink objects, the average of the input vector of the hyperlink's source object and the input vectors corresponding to the words in the context containing the hyperlink has the highest similarity with the output vector of the hyperlink's target object.
- the located hyperlink is used as the first hyperlink
- the source object of the hyperlink is used as the first object
- the steps of locating all the hyperlinks and processing the positioned hyperlinks can be repeatedly performed to obtain a more accurate vectorized representation of the hyperlink object.
- a first input vector when a first object is used as a link source is obtained, where the first input vector is used to represent at least the first object and the content in the first object that describes the second object, and the first object contains information of the first hyperlink pointing to the second object; the first output vector when the second object is the link target is obtained; and the output vector of the second object is obtained by adjustment according to at least the first input vector and the first output vector. Using output vectors to represent hyperlink objects avoids the loss of key information and improves the integrity of the information.
- the foregoing apparatus further includes:
- the third obtaining unit 1208 is configured to obtain one or more word vectors corresponding to the first context of the first hyperlink after the first input vector when the first object is used as the link source is obtained, where the first context is the context obtained by removing the first hyperlink from the context in the first object that contains the first hyperlink;
- a fourth obtaining unit 1210 configured to obtain a first average vector according to the first input vector and one or more word vectors corresponding to the first context;
- the adjusting unit 1206 includes an adjusting module configured to adjust, according to the first average vector and the first output vector, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector to obtain the output vector of the second object.
- the adjustment module includes:
- An adjustment sub-module for adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector based on the similarity optimization algorithm, so that the similarity between the first output vector and the first average vector is greater than or equal to the target threshold.
- Adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector, according to the first average vector and the first output vector, to obtain the output vector of the second object ensures that the obtained output vector can represent the second object.
- Using a similarity-based optimization algorithm to adjust at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector improves the ability of the output vector to represent the second object.
- the foregoing apparatus further includes:
- the execution unit 1212 is configured to, after adjusting to obtain the output vector of the second object, repeatedly execute the following steps, and output the output vectors of all objects:
- a second input vector when a source object of a second hyperlink is used as a link source is obtained, where the second input vector is used to represent at least the source object and the content in the source object that describes the target object, and the source object contains information of the second hyperlink pointing to the target object
- An embodiment of the present application further provides a storage medium.
- the storage medium stores a computer program, and the computer program is configured to execute the steps in any one of the foregoing method embodiments when running.
- the storage medium may include: a flash disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
- An embodiment of the present application further provides an electronic device for implementing the foregoing method for processing a hyperlink.
- the electronic device includes a processor 1302, a memory 1304, a transmission device 1306, and the like.
- a computer program is stored in the memory, and the processor is configured to execute the steps in any one of the foregoing method embodiments by executing the computer program.
- the electronic device may be located in at least one network device among a plurality of network devices in a computer network.
- FIG. 13 is only schematic; the electronic device may also be a terminal device such as a smart phone (for example, an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), or a PAD, or it may be a server.
- FIG. 13 does not limit the structure of the electronic device.
- the electronic device may further include more or fewer components (such as a network interface, etc.) than those shown in FIG. 13, or have a different configuration from that shown in FIG. 13.
- the memory 1304 may be used to store software programs and modules, such as program instructions / modules corresponding to the method and device for processing hyperlinks in the embodiments of the present application.
- the processor 1302 runs the software programs and modules stored in the memory 1304 to execute various functional applications and data processing, that is, to implement the above method for processing a hyperlink.
- the memory 1304 may include a high-speed random access memory, and may further include a non-volatile memory, such as one or more magnetic storage devices, a flash memory, or other non-volatile solid-state memory.
- the memory 1304 may further include memory remotely set with respect to the processor 1302, and these remote memories may be connected to the terminal through a network. Examples of the above network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
- the transmission device 1306 is used to receive or send data via a network.
- Specific examples of the foregoing network may include a wired network and a wireless network.
- the transmission device 1306 includes a network interface controller (NIC), which can be connected to other network devices and routers through a network cable so as to communicate with the Internet or a local area network.
- the transmission device 1306 is a radio frequency (RF) module, which is used to communicate with the Internet in a wireless manner.
- when the integrated unit in the foregoing embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the computer-readable storage medium.
- the part of the technical solution of the present application that is essential or that contributes to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions to cause one or more computer devices (which may be personal computers, servers, or network devices, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
Abstract
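A method and device for processing a hyperlink, and a storage medium. The method includes: converting first context information of a first hyperlink in a first object into a first context vector (S202); obtaining a first input vector when the first object serves as a link source, where the first object contains information of the first hyperlink pointing to a second object (S204); obtaining a first average vector according to the first context vector and the first input vector (S206); adjusting at least one of the first input vector, the first context vector, and a first output vector corresponding to the second object (S208); and calculating, according to the adjustment result, the similarity between the first output vector and the first average vector, and when that similarity is greater than or equal to a first target threshold, outputting the first output vector as the output vector of the second object (S210).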
Description
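This application claims priority to Chinese Patent Application No. 201810771876.4, filed with the Chinese Patent Office on July 13, 2018 and entitled "Method and Device for Processing a Hyperlink, and Storage Medium", which is incorporated herein by reference in its entirety.
This application relates to the field of computers, and in particular, to a method and device for processing a hyperlink, and a storage medium.
Text with a hyperlink structure is called a hypertext document or hypertext. A hypertext document first needs to be vectorized, that is, expressed as a fixed-length feature vector.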
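Summary of the Invention
The embodiments of this application provide a method and device for processing a hyperlink, and a storage medium, so as to at least solve the technical problem of information loss caused by converting a hyperlink object into a plain object.
An embodiment of this application provides a method for processing a hyperlink, including: converting first context information of a first hyperlink in a first object into a first context vector; obtaining a first input vector when the first object serves as a link source, where the first object contains information of the first hyperlink pointing to a second object; obtaining a first average vector according to the first context vector and the first input vector; adjusting at least one of the first input vector, the first context vector, and a first output vector corresponding to the second object; and calculating, according to the adjustment result, the similarity between the first output vector and the first average vector, and when that similarity is greater than or equal to a first target threshold, outputting the first output vector as the output vector of the second object.
An embodiment of this application provides a method for processing a hyperlink, including: obtaining a first input vector when a first object serves as a link source, where the first input vector is used to represent at least the first object and the content in the first object that describes a second object, and the first object contains information of a first hyperlink pointing to the second object; obtaining a first output vector when the second object serves as a link target; and obtaining the output vector of the second object by adjustment according to at least the first input vector and the first output vector.
An embodiment of this application further provides a device for processing a hyperlink, including: a converting unit, configured to convert first context information of a first hyperlink in a first object into a first context vector; a first obtaining unit, configured to obtain a first input vector when the first object serves as a link source, where the first object contains information of the first hyperlink pointing to a second object; a second obtaining unit, configured to obtain a first average vector according to the first context vector and the first input vector; an adjusting unit, configured to adjust at least one of the first average vector, the first input vector, and a first output vector corresponding to the second object; and an output unit, configured to calculate, according to the adjustment result, the similarity between the first output vector and the first average vector, and when that similarity is greater than or equal to a first target threshold, to output the first output vector as the output vector of the second object.
An embodiment of this application further provides a device for processing a hyperlink, including: a first obtaining unit, configured to obtain a first input vector when a first object serves as a link source, where the first input vector is used to represent at least the first object and the content in the first object that describes a second object, and the first object contains information of a first hyperlink pointing to the second object; a second obtaining unit, configured to obtain a first output vector when the second object serves as a link target; and an adjusting unit, configured to obtain the output vector of the second object by adjustment according to at least the first input vector and the first output vector.
An embodiment of this application further provides an electronic device, including a processor and a memory connected to the processor, where the memory stores a computer program executable by the processor, and the processor executes the computer program to perform the above method.
An embodiment of this application further provides a storage medium storing a computer program, where the computer program can be executed by a processor to complete the operations of the above method.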
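The drawings described here are provided for further understanding of this application and form a part of it. The illustrative embodiments of this application and their descriptions are used to explain this application and do not constitute an improper limitation on it. In the drawings:
FIG. 1 is a schematic diagram of an application environment of a method for processing a hyperlink according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a method for processing a hyperlink according to an embodiment of this application;
FIG. 3 is a schematic diagram of a hypertext document and hyperlinks according to an embodiment of this application;
FIG. 4 is a schematic diagram of another method for processing a hyperlink according to an embodiment of this application;
FIG. 5 is a schematic diagram of another method for processing a hyperlink according to an embodiment of this application;
FIG. 6 is a schematic diagram of another method for processing a hyperlink according to an embodiment of this application;
FIG. 7 is a schematic diagram of an h-d2v network structure according to an embodiment of this application;
FIG. 8 is a schematic diagram of another method for processing a hyperlink according to an embodiment of this application;
FIG. 9 is a schematic diagram of another method for processing a hyperlink according to an embodiment of this application;
FIG. 10 is a schematic diagram of another method for processing a hyperlink according to an embodiment of this application;
FIG. 11A is a schematic structural diagram of a device for processing a hyperlink according to an embodiment of this application;
FIG. 11B is a schematic structural diagram of a device for processing a hyperlink according to an embodiment of this application;
FIG. 11C is a schematic structural diagram of a device for processing a hyperlink according to an embodiment of this application;
FIG. 11D is a schematic structural diagram of a device for processing a hyperlink according to an embodiment of this application;
FIG. 11E is a schematic structural diagram of a device for processing a hyperlink according to an embodiment of this application;
FIG. 12A is a schematic structural diagram of another device for processing a hyperlink according to an embodiment of this application;
FIG. 12B is a schematic structural diagram of another device for processing a hyperlink according to an embodiment of this application;
FIG. 12C is a schematic structural diagram of another device for processing a hyperlink according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of this application.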
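To help those skilled in the art better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.
It should be noted that the terms "first", "second", and the like in the specification, the claims, and the above drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of this application described here can be implemented in an order other than those illustrated or described here. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units that are explicitly listed, and may include other steps or units that are not explicitly listed or that are inherent to the process, method, product, or device.
At present, hypertext documents are usually vectorized by converting the hyper-document into a plain document and then vectorizing it. However, this approach ignores the relationship between the hyperlink content in a hypertext document and the source document and its related context, which causes information loss. For example, the context is treated as an absolute description of the target document, and the background information provided by the source document is lost, so that the source document's intent in citing the target document cannot be obtained by comparing the two documents.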
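An embodiment of this application provides a method for processing a hyperlink, which can be applied to, but is not limited to, the application environment shown in FIG. 1. The terminal 102 sends hyperlink objects to the server 106 through the network 104. After receiving the hyperlink objects sent by one or more terminals 102, the server 106 performs the following for each hyperlink object: converting first context information of a first hyperlink in a first object into a first context vector; obtaining a first input vector when the first object serves as a link source, where the first object contains information of the first hyperlink pointing to a second object; obtaining a first average vector according to the first context vector and the first input vector; adjusting at least one of the first input vector, the first context vector, and a first output vector corresponding to the second object; and calculating, according to the adjustment result, the similarity between the first output vector and the first average vector, and when that similarity is greater than or equal to a first target threshold, outputting the first output vector as the output vector of the second object. The resulting output vectors of the hyperlink objects can be used by specific applications to classify, recommend, and retrieve hypertext documents.
In this embodiment, a hyperlink is a marker in a sentence of one object that points to another object, such as a Uniform Resource Locator (URL) in a web page or a citation in an academic paper. A hyperlink object is an object that contains hyperlinks, including but not limited to a hypertext document. A hypertext document (hyper-document for short) is a document that contains hyperlinks, including but not limited to ordinary web pages and academic papers. For a given hyperlink, the object containing the hyperlink is called the source object (or link source), and the object the hyperlink points to is called the target object (or link target). An object may contain one or more hyperlinks that reference one or more objects, and may also be referenced by one or more other objects.
In other embodiments, the server that processes the hyperlink objects and the server that classifies, recommends, and retrieves hyperlink objects may be the same server or different servers. This is not limited in this embodiment.
In this embodiment, the terminal may include, but is not limited to, at least one of the following: a mobile phone, a tablet computer, a PC, and the like. The network may include, but is not limited to, a wireless network or a wired network, where the wireless network includes Bluetooth, WIFI, and other networks that implement wireless communication, and the wired network includes a local area network, a metropolitan area network, and a wide area network. The server may include, but is not limited to, at least one of the following: a PC and other devices used to provide services. The above is only an example and does not limit this embodiment in any way.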
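According to an embodiment of this application, as shown in FIG. 2, a method for processing a hyperlink that can be executed by a computing device may include the following steps.
Step S202: convert first context information of a first hyperlink in a first object into a first context vector;
Step S204: obtain a first input vector when the first object serves as a link source, where the first object contains information of the first hyperlink pointing to a second object;
Step S206: obtain a first average vector according to the first context vector and the first input vector;
Step S208: adjust at least one of the first input vector, the first context vector, and a first output vector corresponding to the second object;
Step S210: calculate, according to the adjustment result, the similarity between the first output vector and the first average vector, and when that similarity is greater than or equal to a first target threshold, output the first output vector as the output vector of the second object.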
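In an embodiment of this application, the above method for processing a hyperlink can be used in, but is not limited to, classification, recommendation, and retrieval of specific objects, for example citation recommendation for academic papers, retrieval of similar documents and keywords, or classification of hyper-documents (that is, hypertext documents).
Citation recommendation for academic papers is taken as an example below. First, the server obtains a collection of existing papers. For each hyperlink in the collection, the context information of a hyperlink in a first paper (the first object, the link source of the hyperlink) is converted into a context vector, and the input vector of the first paper when it serves as the link source is obtained, where the first paper contains information of a hyperlink pointing to a second paper (the second object, the link target of the hyperlink); a first average vector is obtained according to the context vector and the input vector; at least one of the first average vector, the first input vector, and the output vector corresponding to the second paper is adjusted; the similarity between the output vector and the average vector is calculated according to the adjustment result, and when that similarity is greater than or equal to the target threshold, the output vector is output as the output vector of the second paper. After the output vectors (and input vectors) of the existing papers are obtained, they can be stored on the server.
With this solution, a user writing an academic paper can use a target APP on a terminal for citation recommendation; the target APP can exchange data with the server that stores the output vectors and input vectors. For example, a piece of context is entered at a designated input position in the target APP, and the target APP sends the context to the server. Based on the output vectors of the existing papers and the context, the server scores each existing paper using a target formula (discussed in detail below), obtains a score for each existing paper, determines from the scores one or more existing papers that can be cited by the context, and sends one or more hyperlinks pointing to the determined paper or papers to the terminal so that the user can cite them. It can be understood that the one or more hyperlinks may be sent embedded in the context, or may be sent for display together with abstracts; this is not limited in the embodiments of this application.
In an embodiment of this application, the server that processes the existing papers, the server that stores the processing results, the server that receives the context, and the server that scores the existing papers may be the same or different; this is not limited in this embodiment.
It should be noted that, in this embodiment, the above method combines the input vector of a hyperlink object when it serves as the link source with the corresponding context vector, and compares the result for similarity with the output vector of the hyperlink object when it serves as the target object. The output vector can therefore fully contain the complete information of the hyperlink object, which avoids the loss of key information and improves information integrity.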
The following uses a hyper-document as an example to describe hyperlinks and hyperlink objects. As shown in FIG. 3, the source document (Zhao and Gildea, 2010, abbreviated as d_s), in the context formed by one sentence ("We also evaluate our model by computing the machine translation BLEU score (Papineni et al., 2002) using the Moses system (Koehn et al., 2007)", abbreviated as C), points through a hyperlink to a target document ((Papineni et al., 2002) and (Koehn et al., 2007), abbreviated as d_t). Such a hyperlink can be written as <d_s, C, d_t>.
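In this embodiment, hyperlink objects can be given a vectorized representation. A vectorized representation expresses an abstract object (such as a word, a document, or a user) as a fixed-length feature vector. The vectorized object can then be used by a specific application to classify, recommend, or retrieve the original object. Unlike traditional feature engineering, in which each dimension of the feature vector is extracted by hand, vectorized representation methods obtain the whole feature vector automatically.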
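Vectorized representation is described below using documents as an example. Ordinary documents (that is, documents without hyperlinks) can be vectorized in several ways, including but not limited to word2vec and doc2vec. The two are described in turn.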
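word2vec (w2v for short): for every word in a document, w2v learns an IN vector and an OUT vector. Word-vector learning has two variants, cbow and skip-gram. The cbow method averages the IN vectors of the context words and uses the result to predict the OUT vector of the current word. The skip-gram method instead uses the IN vector of the current word to predict the OUT vectors of the context words. Because the relationship between words in an ordinary document is mutual (if word a is a context word of word b, then b is also a context word of a), the learned IN vector and OUT vector of a given word are similar.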
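doc2vec (d2v for short): d2v extends w2v and also has two variants, pv-dm and pv-dbow. The pv-dm method resembles cbow, except that the IN vector of the current document is also included in the average as a special context word vector. Similarly, the pv-dbow method uses the IN vector of the current document and a skip-gram-like network structure to predict the OUT vectors of the words in the document.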
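Two ways of vectorizing hyperdocuments, citation-as-word and context-as-content, are described below. Citation-as-word converts the hyperdocuments into ordinary documents and calls w2v to obtain document vectors; context-as-content converts the hyperdocuments into ordinary documents and calls d2v to obtain document vectors.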
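Citation as word (the citation information in the source document is treated as a word of the source document): the document ID produced in the source document by citing a target document (the citation information) is treated as a special word. w2v is applied to the hyperdocument collection containing these special words to obtain vectors for all "words", and the "word" vector of each special word is taken as the vector of the corresponding hyperdocument, as shown in FIG. 4. In the source document d_s above, (Koehn et al., 2007) is citation information, that is, a special word.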
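Context as content (the context of a hyperlink in the source document is treated as content of the target document): first, all hyperlinks are deleted from the source documents; then the context C of each hyperlink is added to its target document as content of that document; finally, the converted ordinary documents are fed into d2v to obtain document vectors, as shown in FIG. 5. This method is called d2v-cac. Correspondingly, the method that deletes the hyperlinks and feeds the documents into d2v without adding the contexts to the target documents is called d2v-nc.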
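The two conversions described above (d2v-nc and d2v-cac) can be sketched as follows. This is only an illustration under stated assumptions: the `[[doc_id]]` citation marker, the function name `to_plain_docs`, and the toy documents are hypothetical and do not come from the application; the d2v training step itself is not shown.

```python
import re

# Hypothetical citation marker used only in this sketch: [[doc_id]]
CITE = re.compile(r"\[\[(\w+)\]\]")

def to_plain_docs(hyperdocs, add_context_to_target):
    """Convert hyperdocuments into ordinary documents.

    hyperdocs maps a document id to a list of sentences.
    add_context_to_target=False corresponds to d2v-nc,
    add_context_to_target=True corresponds to d2v-cac.
    """
    plain = {doc_id: [] for doc_id in hyperdocs}
    extra = {doc_id: [] for doc_id in hyperdocs}
    for doc_id, sentences in hyperdocs.items():
        for sentence in sentences:
            targets = CITE.findall(sentence)
            # Delete every hyperlink marker and normalize whitespace.
            context = " ".join(CITE.sub(" ", sentence).split())
            plain[doc_id].append(context)
            if add_context_to_target:
                # Copy the context into each known target document.
                for target in targets:
                    if target in extra:
                        extra[target].append(context)
    return {doc_id: plain[doc_id] + extra[doc_id] for doc_id in hyperdocs}

docs = {
    "zhao2010": ["We compute the BLEU score [[papineni2002]] using Moses [[koehn2007]]"],
    "papineni2002": ["BLEU is a method for automatic evaluation"],
    "koehn2007": ["Moses is an open source toolkit"],
}
nc = to_plain_docs(docs, add_context_to_target=False)
cac = to_plain_docs(docs, add_context_to_target=True)
```

In the d2v-nc output the target documents are unchanged, whereas in the d2v-cac output each target document gains the citing sentence as extra content.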
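Note that hyperdocument vectorization methods in the related art can be used for citation recommendation in academic papers, but the approach usually taken is essentially citation-as-word and is limited to that domain and task. Vectorization methods can also be used to compute node vectors in undirected graphs, but those approaches handle only the link structure between hyperdocuments and ignore the text content.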
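With reference to FIG. 6, in contrast to the two approaches above, for the hyperlink <d_s, C, d_t> the hyperlink processing method provided by the embodiments of the present application can obtain:
the IN vectors w^I of the words in the hyperlink's context C (in this example, C is the context obtained by removing the hyperlink from the sentence that contains it);
the OUT vector of the target document d_t of the hyperlink (which plays the same role as the first output vector described above; it may be the initial output vector of the target document, or an intermediate output vector of the target document obtained during training), as well as the OUT vectors of the other hyperdocuments.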
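Output layer: the OUT vectors d^O of all hyperdocuments D form a softmax classifier that matches the input-layer vector x to a suitable hyperdocument. The recommendation is based on: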
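The formula itself is not reproduced in this text. A form consistent with the surrounding description (an averaged input x and a softmax over the OUT vectors of all hyperdocuments) would be the following; it is a reconstruction under that assumption, not the formula as filed.

```latex
x = \frac{1}{1 + |C|} \Bigl( d_s^{I} + \sum_{w \in C} w^{I} \Bigr),
\qquad
P(d_t \mid d_s, C) = \frac{\exp\bigl(x^{\top} d_t^{O}\bigr)}{\sum_{d \in D} \exp\bigl(x^{\top} d^{O}\bigr)}
```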
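The input-layer structure above is similar to that of the pv-dm model, except that the softmax word classifier at the output layer is replaced by a document classifier.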
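For the set of all hyperlinks C = {<d_s, C, d_t>} (where the C inside each triple denotes that hyperlink's context), the following objective function can be optimized with an optimization algorithm such as gradient descent: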
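where D^I is the matrix formed by the IN vectors of the hyperdocuments in the hyperdocument set, D^O is the matrix formed by the OUT vectors of the hyperdocuments in the hyperdocument set, and W^I is the matrix formed by the IN vectors of all words in the hyperdocument set.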
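The objective function referred to above is not reproduced in this text. Given that its variables are D^I, D^O, and W^I and that it is built from log P(d_t | d_s, C), a plausible form is the log-likelihood summed over all hyperlinks; the block below is a sketch under that assumption, with \mathcal{C} standing for the set of all hyperlinks.

```latex
\max_{D^{I},\, D^{O},\, W^{I}} \;
\sum_{\langle d_s, C, d_t \rangle \in \mathcal{C}} \log P(d_t \mid d_s, C)
```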
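In the training phase of the network, to make the learned hyperdocument vectors content-sensitive, the document vectors are pre-trained with the words in the documents. This example optimizes the objective function in a pv-dm-like manner: the network shown in FIG. 6 first undergoes several rounds (for example, 5) of pv-dm iterations, and is then optimized iteratively for several rounds with the objective function above.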
As in w2v/d2v, to speed up training, this example uses the negative sampling formula (3) below to approximate log P(d_t | d_s, C):
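where n is the number of samples and P_N(d) is the uniform distribution over the set of all documents. The approximated log P(d_t | d_s, C) replaces log P(d_t | d_s, C) in the objective function above.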
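Formula (3) is not reproduced in this text. The standard negative sampling approximation used by w2v/d2v, written with the symbols of this section (x for the averaged input vector, n samples drawn from P_N(d), and \sigma for the logistic function), would read as follows; it is offered as an assumption about the missing formula rather than as the filed text.

```latex
\log P(d_t \mid d_s, C) \approx
\log \sigma\bigl(x^{\top} d_t^{O}\bigr)
+ \sum_{i=1}^{n} \mathbb{E}_{d_i \sim P_N(d)}
\Bigl[ \log \sigma\bigl(-x^{\top} d_i^{O}\bigr) \Bigr]
```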
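In an embodiment of the present application, in the training step that follows FIG. 6, a retraining/fine-tuning method (also called retro-fitting) may be used; alternatively, the two objectives may be combined linearly, and the document and word vectors learned with joint optimization or multi-objective learning.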
In an embodiment of the present application, at the negative sampling step, hierarchical softmax may be used as an alternative to simplify and accelerate learning.
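In an embodiment of the present application, P_N(d) may instead be chosen so that the probability of each document is proportional to the number of times it is cited (documents cited zero times receive a non-zero probability through smoothing). A method similar to the sub-sampling of w2v/d2v may be used to prevent highly cited documents from being sampled too often.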
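A citation-proportional noise distribution of this kind can be sketched as below. The text only says that "a smoothing technique" is used; additive smoothing with a constant of 1.0 is an assumption made for this example, and the function names are hypothetical.

```python
import random

def noise_distribution(citation_counts, smoothing=1.0):
    """Return P_N(d) proportional to (citation count + smoothing).

    Additive smoothing is an assumption of this sketch; it gives
    documents with zero citations a non-zero probability.
    """
    total = sum(count + smoothing for count in citation_counts.values())
    return {doc: (count + smoothing) / total
            for doc, count in citation_counts.items()}

def sample_negatives(p_n, n, exclude, rng):
    """Draw n negative documents from P_N, never returning the true target."""
    docs = [doc for doc in p_n if doc != exclude]
    weights = [p_n[doc] for doc in docs]
    return rng.choices(docs, weights=weights, k=n)

counts = {"a": 8, "b": 1, "c": 0}
p_n = noise_distribution(counts)
negatives = sample_negatives(p_n, n=5, exclude="a", rng=random.Random(0))
```

With these counts, document "a" receives probability 0.75 and the uncited document "c" still receives 1/12.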
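For example, suppose the 1000 documents in a database are to be processed; they are processed one by one. Suppose document No. 101 cites document No. 108, and d_s denotes the vector of document 101 itself, which may be stored in advance. When document 101 is processed, its own vector is obtained first; the part of document 101 that contains the hyperlink is then located, and the context information of the hyperlink and the corresponding context vector (vector C) are obtained. The vector of the document the hyperlink points to, document 108, is obtained as well (the output vector d_t). The vectors d_s and C are then averaged to give an average vector (say d_a). The average vector d_a, the output vector d_t, and the objective function to be optimized are passed to an optimization algorithm such as gradient descent, which indicates how d_s, C, and d_t should be adjusted to increase the objective function. This process is repeated until the objective function almost stops increasing, at which point the optimization algorithm outputs the IN and OUT vectors of all documents.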
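The adjustment loop in this example can be illustrated with a small pure-Python sketch. It assumes the negative-sampling form of the objective (log-sigmoid of the dot product for the true target, log-sigmoid of the negated dot product for sampled negatives); the function names, learning rate, and toy vectors are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def average(vectors):
    """Element-wise mean of equally long vectors (d_a in the example)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def objective(d_s, context, d_t, negatives):
    """Negative-sampling objective for one hyperlink."""
    x = average([d_s] + context)
    value = math.log(sigmoid(dot(x, d_t)))
    return value + sum(math.log(sigmoid(-dot(x, d))) for d in negatives)

def ascent_step(d_s, context, d_t, negatives, lr=0.1):
    """One gradient-ascent step; adjusts d_s, the context words, and d_t in place."""
    x = average([d_s] + context)
    grad_x = [0.0] * len(x)
    for out, label in [(d_t, 1.0)] + [(d, 0.0) for d in negatives]:
        g = label - sigmoid(dot(x, out))  # derivative w.r.t. the dot product
        for i in range(len(x)):
            grad_x[i] += g * out[i]       # uses the value before the update
            out[i] += lr * g * x[i]
    share = 1.0 / (1 + len(context))      # each averaged vector gets an equal share
    for vec in [d_s] + context:
        for i in range(len(vec)):
            vec[i] += lr * share * grad_x[i]

d_s = [0.1, -0.2, 0.3]                        # IN vector of document 101
context = [[0.0, 0.1, -0.1], [0.2, 0.0, 0.1]]  # IN vectors of two context words
d_t = [0.05, 0.1, -0.05]                      # OUT vector of document 108
negatives = [[0.1, 0.0, 0.2], [-0.1, 0.2, 0.0]]
before = objective(d_s, context, d_t, negatives)
for _ in range(20):
    ascent_step(d_s, context, d_t, negatives)
after = objective(d_s, context, d_t, negatives)
```

After the repeated steps the objective value is higher than before, which mirrors the "repeat until the objective almost stops increasing" behavior described above on a single hyperlink.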
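In this example, a hyperlink processing method can be evaluated against several criteria, including but not limited to: context-sensitive, content-sensitive, newcomer-friendly, and context-intent-sensitive. The four criteria are explained below.
Context-sensitive: the vector of a hyperdocument must be influenced by the hyperlink contexts in the other hyperdocuments that point to it (that is, by the contexts those hyperdocuments use to describe the current document).
Content-sensitive: the hyperdocument vector must be influenced by the document's own content.
Newcomer-friendly: a newly created hyperdocument, such as a new web page or a new paper, probably has no other document pointing to it. A hyperdocument vectorization method should still be able to produce a vector for such a new document.
Context-intent-sensitive: unlike the three criteria above, which concern hyperdocument vectors, this criterion concerns the word vectors produced during hyperdocument vectorization. A good method should express the intent of a hyperlink (for example, referring to the target document in a general sense, or agreeing or disagreeing with its views and methods) in the vectors of the context words.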
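The w2v, d2v-nc, and d2v-cac methods described above each fall short on one or more of these criteria, as explained below.
First, w2v is not content-sensitive. As shown in FIG. 4, although "...computing the machine translation BLEU score..." is content of the source document (Zhao and Gildea, 2010), after conversion it no longer has any connection with that document: the document ID (Zhao and Gildea, 2010) is not reflected in the word vectors. In addition, for newly published papers that have not yet been cited, w2v creates no special word for the document ID and therefore no "word" vector, so w2v is not newcomer-friendly either.
Second, d2v-nc is not context-sensitive: the hyperlinks are removed but the contexts are not added to the target documents, so the connection between a target document and its contexts is lost.
Finally, none of the three methods (w2v, d2v-cac, and d2v-nc) is context-intent-sensitive. When modeling a hyperlink, these methods reduce its three elements (source document, context, target document) to a relationship between context and target document. The context is thus treated as an absolute description of the target document, the background provided by the source document is lost, and the intent with which the source document cites the target document can no longer be inferred by comparing the two documents.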
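The hyperlink processing method provided in this example satisfies all four criteria: context-sensitive, content-sensitive, newcomer-friendly, and context-intent-sensitive. The hyperdocument shown in FIG. 7 is used as an example:
Context-sensitive: when the objective function is optimized, the OUT vector of (Papineni et al., 2002) is influenced by the IN vectors of context words such as "BLEU".
Content-sensitive: through pre-training with the pv-dm model, the IN vector of each hyperdocument is influenced by the words in the document.
Newcomer-friendly: if a hyperdocument is not pointed to by any hyperlink, it can at worst rely on its own content to obtain an IN vector. Moreover, when the number n of negative samples is large enough, an OUT vector is also generated for the document.
Context-intent-sensitive: the objective function lets the vectors of the source/target document pair and of the context words improve one another. Against the background of the machine translation source document (Zhao and Gildea, 2010), the context word vectors capture the intent "use the method/technique in the target document" implied by the words "evaluate by", which helps the hyperlink better predict that (Papineni et al., 2002) is a target document about the BLEU evaluation method in machine translation. Likewise, once the network in FIG. 4 has been trained on many (source document, target document) pairs similar to (Zhao and Gildea, 2010)/(Papineni et al., 2002), it captures the semantics "use the method/technique in the target document" better.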
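Table 1 shows the result of analyzing w2v, d2v-nc, d2v-cac, and h-d2v against the four criteria.
Table 1
As Table 1 shows, the hyperlink processing method provided in this example (h-d2v) models the three elements of a hyperlink (source document, context, target document) directly and thereby avoids losing key information. With this modeling, h-d2v satisfies all four criteria.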
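Note that each object may have one corresponding input vector and one corresponding output vector. The input vector is the vector of the object as a link source; the information it contains can represent the object itself and the target objects it cites. The output vector is the vector of the object as a link target; the information it contains can represent the source objects that cite the object and the content those source objects use to describe it.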
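In this embodiment, before the first context information of the first hyperlink in the first object is converted into the first context vector, the initial input vector and initial output vector of each hyperlink object may first be obtained.
In this embodiment, the initial input vector of each hyperlink object may be the document vector obtained by feeding the hyperlink object set into a first target model. The first target model vectorizes each hyperlink object in the set to obtain its document vector, which is then used as the initial input vector of that hyperlink object. The first target model also yields the input vector of each word. The initial output vector of a hyperlink object may be generated randomly by a target algorithm.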
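In this embodiment, before the hyperlink object set is fed into the first target model, the set may be processed so that all hyperlink objects in it are converted into ordinary objects. The conversion may include, but is not limited to: deleting the hyperlinks directly; deleting the hyperlinks in the source object and adding the context of each hyperlink to the target object as content of the target object; and treating the citation information of each hyperlink as a special word.
In this embodiment, before or after the hyperlink objects in the set are converted into ordinary objects, preprocessing such as word segmentation and part-of-speech tagging may be performed on them. The specific preprocessing can be chosen as needed, and the present application imposes no limitation on it.
In this embodiment, while the hyperlink object set is fed into the first target model to obtain the document vectors of the hyperlink objects, the IN vector of each word in the hyperlink object set (or the IN vector and OUT vector of each word) can also be obtained.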
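In this embodiment, the context containing the hyperlink may be designated content of the hyperlink object that contains the hyperlink, and it can be obtained in several ways. For example, it can be obtained by setting the number of words the context contains (for example, the content from 50 words before the hyperlink to 50 words after it), or by setting the number of sentences it contains (for example, the sentence containing the hyperlink, or the content from the sentence before that sentence to the sentence after it). The first context may be the context containing the hyperlink with the hyperlink removed.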
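The word-window option described above can be sketched in a few lines. The `<LINK>` placeholder token and the function name `window_context` are hypothetical; the example uses a window of 2 words instead of 50 so that the result is easy to read.

```python
def window_context(tokens, link_index, size=50):
    """Return up to `size` words before and after the hyperlink, without the hyperlink itself."""
    before = tokens[max(0, link_index - size):link_index]
    after = tokens[link_index + 1:link_index + 1 + size]
    return before + after

tokens = "we compute the BLEU score <LINK> using the Moses system".split()
link_at = tokens.index("<LINK>")
narrow = window_context(tokens, link_at, size=2)
wide = window_context(tokens, link_at)  # default window of 50 words on each side
```

Here `narrow` keeps the two words on each side of the hyperlink, while `wide` covers the whole sentence because it is shorter than the window.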
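In this embodiment, the first target model can be used to obtain the document vector of each hyperlink object and the input vector of each word.
In an embodiment of the present application, obtaining the first average vector of the first input vector and the first context vector may include: averaging the first input vector with the input vectors of the words in the first context, where the first context is the context in the first object that contains the first hyperlink, with the first hyperlink removed.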
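In this embodiment, the first average vector may be obtained from the first input vector and the first context vector, for example by averaging the first input vector with the input vectors of the words in the first context. The first input vector has the same length as the input vectors of the context words, so the first average vector can be obtained by averaging, position by position, the element values of the first input vector and of the input vectors of the words in the first context.
For example, suppose the first context has 5 words, and the first input vector and the input vector of each word have the same length, 6 elements. The value at each position of the first input vector is averaged with the values at the corresponding position of the 5 word input vectors (the first element of the first input vector is averaged with the first elements of the 5 words, and so on), giving the first average vector.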
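The 5-word, 6-element example above can be worked through in code. The concrete numbers are made up for illustration, and `average_vector` is a hypothetical helper name; the average is taken over 1 + 5 = 6 vectors.

```python
def average_vector(input_vector, word_vectors):
    """Element-wise average of the input vector and the context word vectors."""
    vectors = [input_vector] + word_vectors
    if any(len(v) != len(input_vector) for v in vectors):
        raise ValueError("all vectors must have the same length")
    return [sum(column) / len(vectors) for column in zip(*vectors)]

first_input = [6.0, 12.0, 18.0, 24.0, 30.0, 36.0]  # 6 elements
word_inputs = [[6.0] * 6 for _ in range(5)]        # 5 context words, 6 elements each
first_average = average_vector(first_input, word_inputs)
```

The first element of the result is (6 + 5 × 6) / 6 = 6, the second is (12 + 30) / 6 = 7, and so on.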
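In this embodiment, the first input vector may be the initial input vector of the first object, or an intermediate input vector of the first object obtained while the input and output vectors of the hyperlink objects are being computed iteratively. Likewise, the first output vector may be the initial output vector of the second object, or an intermediate output vector of the second object obtained during that iterative process.
In this embodiment, for a given hyperlink, the adjustment makes the similarity between the output vector of the target object and the average vector (the average of the input vector of the source object and the context vector with which the source object describes the target object) greater than or equal to the first target threshold. As a result, the output vector of the target object represents more accurately both the source object that cites it (the input vector of the source object) and the information the source object uses to describe it (the context vector of the hyperlink), and the input vector of the source object represents more accurately both its own content (the input vector of the source object) and the target object it cites (the output vector of the target object).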
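In this embodiment, the adjustment may proceed as follows: the first input vector and the word vectors of the words in the first context are fed into a second target model, which obtains the first average vector and compares it with the first output vector of the second object; at least one of the first input vector, the first context vector, and the first output vector is then adjusted to increase the similarity between the first average vector and the first output vector.
In this embodiment, the input of the second target model may include the initial input vector and initial output vector of each hyperlink object, and the initial input vector of each word in the hyperlink object set. The second target model optimizes an objective function with an optimization algorithm by adjusting the input and output vectors of the hyperdocuments and the input vectors of the words in the hyperlink object set. The variables of the objective function are the input and output vectors of the hyperlink objects and the input vectors of the words in the hyperlink object set. The objective function seeks the values of these variables that maximize, over all hyperlinks in the hyperlink object set, the sum of the similarities between (a) the average of the source object's input vector and the input vectors of the words in the context containing the hyperlink, and (b) the output vector of the hyperlink's target object.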
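After the first output vector is output as the output vector of the second object, the first input vector and the first output vector may be updated to the adjusted first input vector and first output vector.
In this embodiment, after the first output vector is output as the output vector of the second object, the output vector may be used to update the previously stored output vector. The first input vector may likewise be output as the input vector of the first object and used to update the first input vector. The input vectors of the words in the adjusted average vector may also be output as the input vectors of the corresponding words and used to update those words' input vectors.
After the first input vector and the first output vector are updated to their adjusted values, the other hyperlinks in all objects may be located. Each located hyperlink is taken as the first hyperlink, its source object as the first object, and its target object as the second object, and the foregoing steps are repeated until every hyperlink contained in the objects has been processed.
In this embodiment, after all located hyperlinks have been processed, the steps of locating all hyperlinks and processing them may be repeated to obtain a more accurate vectorized representation of the hyperlink objects.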
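In this embodiment, "all objects" may be all objects in the hyperlink object set. All hyperlinks in the set can be obtained by going through the objects in the set one by one and collecting the hyperlinks each contains (a hyperlink can be written as <d_s, C, d_t>).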
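The hyperlink processing method is described below with a concrete example in which the hyperlink object set is a hyperdocument set and each hyperlink object is a hyperdocument. As shown in FIG. 7, the method includes the following steps:
Step 1: convert each hyperdocument in the hyperdocument set into an ordinary document;
Step 2: use a first target model (for example, the pv-dm model) to process the initial document vectors of the hyperdocuments (the converted ordinary documents) and the initial IN and OUT vectors of the words in the hyperdocument set, obtaining the document vector of each hyperdocument and the IN and OUT vectors of each word;
Step 3: use a second target model to process the initial IN vectors of the hyperdocuments (the document vectors obtained with the first model), their initial OUT vectors, and the IN vectors of the words in the hyperdocument set (obtained with the first model), obtaining the IN vector (equivalent to the target input vector described above) and OUT vector (equivalent to the target output vector described above) of each hyperdocument, and the IN vector of each word.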
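The hyperlink processing method in this example is described in detail below.
For a hyperdocument set of 100 hyperdocuments, the method may include two phases: a pre-training phase and a training phase. In the pre-training phase, the first target model is used to obtain the IN and OUT vectors of the words in the hyperdocument set and the document vector of each hyperdocument. In the training phase, the second target model is used to obtain the IN and OUT vectors of each hyperdocument and the IN vectors of the words in the hyperdocument set.
In the pre-training phase, the 100 hyperdocuments are first converted into 100 ordinary documents, for example as in d2v-nc above: the hyperlinks are deleted and the contexts are not added to the target documents. A target algorithm then produces the initial IN and OUT vectors of the words in the hyperdocument set and the initial document vector of each hyperdocument. The 100 converted documents, the initial IN and OUT vectors of the words, and the initial document vectors are fed into the pv-dm model, which yields the IN and OUT vectors of the words and the document vector of each hyperdocument. These document vectors can serve as the initial IN vectors of the hyperdocuments in the second target model (one form of the first input vector described above).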
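In the training phase, the way the second target model processes hyperdocuments may be called hyperdoc2vec (h-d2v for short). It models the three elements of a hyperlink (source document, context, target document) directly to avoid losing key information. In h-d2v, each hyperdocument is represented by two vectors, an IN vector and an OUT vector. The IN vector d^I of a hyperdocument d stores information about d as a source document, for example d's own content and what kind of documents d cites. The OUT vector d^O of d stores information about d as a target document, for example what kind of documents cite d and how they describe d. Representing a hyperdocument d with the two vectors d^I and d^O allows hyperdocuments and hyperlinks to be modeled as vectors in a natural and direct way.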
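The input of the second target model is the initial IN vector and initial OUT vector of each hyperdocument in the hyperdocument set, and the IN vector of each word in the set. The initial IN vector of a hyperdocument may be its document vector obtained with the first target model; the initial OUT vector may be generated randomly by a target algorithm; and the IN vector of each word may be the one obtained with the first target model.
The initial OUT vectors may be generated at any time before the initial IN vectors, the initial OUT vectors, and the word IN vectors are fed into the second target model. For example, they may be generated before the first target model is used, together with the initial IN and OUT vectors of the words and the initial document vectors of the hyperdocuments; or after the first target model has produced the IN and OUT vectors of the words and the document vectors of the hyperdocuments. This example does not limit when the initial OUT vectors are obtained.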
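In the training phase, all hyperlinks C = {<d_s, C, d_t>} in the hyperdocument set can be obtained, either before the second target model is used or while it is being used. For example, all hyperdocuments in the set may be scanned before the second target model is used to collect all hyperlinks, or the hyperdocuments may be scanned one by one while the second target model is in use. This example does not limit how or when the hyperlinks are obtained.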
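In this example, the pre-training phase and training phase described above yield the IN vector and OUT vector of each hyperdocument in the hyperdocument set.
In this embodiment, after the first output vector is output as the output vector of the second object, the resulting output vectors (and input vectors) can be used in a number of scenarios, including but not limited to: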
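Hyperdocument classification: given a set of labeled hyperdocuments {<d, l>} (that is, {<document, label>}), a classifier is trained with a classification algorithm (for example, an SVM) on training data made up of the hyperdocument vectors and labels, and is then applied to unlabeled hyperdocuments to predict their types. The document vector used for prediction may be the IN vector or the OUT vector of the hyperdocument, or their concatenation.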
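Similar document and keyword retrieval: the resulting hyperdocument vectors (input and output vectors) can be used to compute the similarity between documents (typically with the cosine method, using the IN vector or the OUT vector of each hyperdocument, or their concatenation) and the similarity between documents and keywords (using the OUT vectors of the hyperdocuments and the input vectors of the words in the hyperdocument set). Such similarity computations are very common in Internet products, for example targeted advertising (similarity between ad documents and user search terms) and knowledge graph construction (similarity between entities and descriptions).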
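As a minimal sketch of the document-to-document similarity mentioned above, the following computes cosine similarity over the concatenation of IN and OUT vectors, which is one of the options the text lists. The helper names and the toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / norm

def doc_vector(in_vector, out_vector):
    """Concatenate the IN and OUT vectors of a hyperdocument."""
    return in_vector + out_vector

doc_a = doc_vector([1.0, 0.0], [0.0, 2.0])
doc_b = doc_vector([2.0, 0.0], [0.0, 4.0])  # same direction as doc_a
doc_c = doc_vector([0.0, 1.0], [3.0, 0.0])  # orthogonal to doc_a
sim_ab = cosine(doc_a, doc_b)
sim_ac = cosine(doc_a, doc_c)
```

For document-to-keyword similarity, the same `cosine` function would be applied to a document's OUT vector and a word's input vector.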
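Citation recommendation: when an academic paper is being written, suitable papers can be recommended automatically as citations for a given passage of context (using the OUT vectors of the hyperdocuments and the input vectors of the words in the hyperdocument set). Assuming the set of context words is C, existing papers can be scored with formula (4):
where w is a context word in C, w^I is the input vector of the word, d is the document to be scored, and d^O is the OUT vector of the document to be scored.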
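Formula (4) is not reproduced in this text. One scoring rule consistent with the symbols defined above is the dot product between the average of the context word input vectors w^I and the OUT vector d^O of the candidate; the sketch below assumes that form, and its function names and toy vectors are hypothetical.

```python
def score(context_words, word_in, doc_out):
    """Assumed score: dot product of the averaged w^I with d^O."""
    vectors = [word_in[w] for w in context_words if w in word_in]
    if not vectors:
        return 0.0
    x = [sum(column) / len(vectors) for column in zip(*vectors)]
    return sum(a * b for a, b in zip(x, doc_out))

def rank(context_words, word_in, docs_out):
    """Return candidate document ids ordered from highest to lowest score."""
    scores = {doc: score(context_words, word_in, out)
              for doc, out in docs_out.items()}
    return sorted(scores, key=scores.get, reverse=True)

word_in = {"BLEU": [1.0, 0.0], "score": [1.0, 0.0], "Moses": [0.0, 1.0]}
docs_out = {"papineni2002": [2.0, 0.0], "koehn2007": [0.0, 2.0]}
ranking = rank(["BLEU", "score"], word_in, docs_out)
best_score = score(["BLEU", "score"], word_in, docs_out["papineni2002"])
```

Words without a known input vector are skipped in this sketch, which is a design choice of the example rather than something the text specifies.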
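In this embodiment, the existing papers recommended for a given context can be determined in several ways. For example, all existing papers may be scored, and the one or more papers with the highest scores selected as citation recommendations for the context. Alternatively, a target threshold on the score may be set, optionally together with a number of recommendations. After an existing paper is scored, it is checked whether its score is greater than (or equal to) the target threshold; if so, the paper is recommended for the context. If the number of recommendations is 1, the process ends. If the number of recommendations is n (n greater than or equal to 2), it is then checked whether the number already recommended equals n: if it does, the process ends; if it is less than n, scoring and checking continue until the number of recommended papers equals n.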
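The two selection strategies described above (highest scores, or a threshold with early stopping) can be sketched as follows. The scores are made-up numbers, and the function names are hypothetical.

```python
def top_k(scores, k):
    """Score everything, then keep the k highest-scoring papers."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def first_n_above(scored_papers, threshold, n):
    """Check papers one at a time and stop once n of them reach the threshold."""
    picked = []
    for paper, value in scored_papers:
        if value >= threshold:
            picked.append(paper)
            if len(picked) == n:
                break
    return picked

scores = {"p1": 0.2, "p2": 0.9, "p3": 0.7, "p4": 0.8}
best_two = top_k(scores, 2)
early_two = first_n_above(scores.items(), threshold=0.6, n=2)
```

The threshold variant avoids scoring every paper but may return papers that are above the threshold without being the very best, as the difference between `best_two` and `early_two` shows.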
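In an embodiment of the present application, after the first output vector is output as the output vector of the second object, the input vectors of the words in a third object are obtained; a target parameter of the second object is determined from the input vectors of the words of the third object and the output vector of the second object; and whether the second object is allowed to be cited by the third object is determined from the target parameter.
In an embodiment of the present application, determining from the target parameter whether the second object is allowed to be cited by the third object includes: determining that the second object is allowed to be cited by the third object when the value of the target parameter is higher than a second target threshold; or determining that the second object is allowed to be cited by the third object when the target parameter of the second object has the largest value in a candidate object set, where the candidate object set contains the second object.
In an embodiment of the present application, after it is determined from the target parameter that the second object is allowed to be cited by the third object, the following is performed: inserting, at a target position in the third object, a third hyperlink pointing to the second object; or displaying, on the third object, prompt information for the third hyperlink; or receiving indication information that indicates the insertion position of the third hyperlink in the third object, and inserting the third hyperlink at that position according to the indication information.
In an embodiment of the present application, inserting the third hyperlink pointing to the second object at the target position in the third object includes: searching the third object for a target word, where the similarity between the input vector of the target word and the output vector of the second object is higher than a third target threshold; and inserting the third hyperlink at the position after the target word in the third object.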
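For example, a hyperlink pointing to the second object may be inserted automatically at a target position in the third object. The target position may be the beginning or end of the third object, or any position in between. Alternatively, a target word may be searched for in the third object, where the target word is a word whose input vector has a similarity to the output vector of the second object above a set threshold, or the word, among all words in the third object, whose input vector is most similar to the output vector of the second object; the hyperlink pointing to the second object is then inserted at the position after the target word. In this way, the terminal that displays the third object can determine the insertion position of the third hyperlink.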
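The target-word variant can be sketched as below: find the word whose input vector is most similar to the output vector of the second object and insert the link right after it, optionally only when the similarity exceeds a threshold. Cosine similarity is an assumption of this sketch (the text only says "similarity"), and the function names and toy vectors are hypothetical.

```python
import math

def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0

def insert_link(tokens, word_in, target_out, link_text, threshold=None):
    """Insert link_text after the word most similar to target_out.

    If a threshold is given, the link is inserted only when the best
    similarity is strictly higher than it; otherwise tokens are returned unchanged.
    """
    best_index, best_sim = None, -2.0
    for index, token in enumerate(tokens):
        if token not in word_in:
            continue
        sim = cosine(word_in[token], target_out)
        if sim > best_sim:
            best_index, best_sim = index, sim
    if best_index is None or (threshold is not None and best_sim <= threshold):
        return list(tokens)
    return tokens[:best_index + 1] + [link_text] + tokens[best_index + 1:]

tokens = ["we", "compute", "the", "BLEU", "score"]
word_in = {"compute": [0.5, 0.5], "BLEU": [1.0, 0.1], "score": [1.0, 0.0]}
target_out = [1.0, 0.0]  # OUT vector of the paper to be cited
linked = insert_link(tokens, word_in, target_out, "(Papineni et al., 2002)")
unchanged = insert_link(tokens, word_in, target_out, "(Papineni et al., 2002)", threshold=1.0)
```

In this toy input the word "score" is the closest match, so the citation lands after "BLEU score".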
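As another example, prompt information for the third hyperlink may be displayed on the third object. The prompt information may be displayed on the screen of the terminal device that displays the third object. After the prompt information is displayed, input information indicating the insertion position of the third hyperlink in the third object is detected, and the third hyperlink is inserted at that position according to the input information. In this way, the user can specify where the hyperlink is inserted.
As a further example, indication information that indicates the insertion position of the third hyperlink in the third object may be received, and the third hyperlink inserted at that position according to the indication information. In this way, a remote apparatus (for example, a server) can determine the insertion position of the third hyperlink, and the terminal that displays the third object inserts the third hyperlink at that position according to the indication information.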
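An example of inserting hyperlinks into the third object follows. As shown in FIG. 8, a target app is opened on the terminal device and a target text (the third object) is entered in it. The terminal device obtains the entered target text through the target app and sends it to the server over the network. The server scores the existing papers and determines that the papers to recommend for the target text are the two with the highest scores, (Koehn et al., 2007) and (Papineni et al., 2002). The server sends the hyperlinks of the two papers to the terminal device (it may also send the papers themselves or their abstracts, so that the user can decide whether to insert them and where in the target text). Prompt information is displayed in the target app on the terminal device. Based on the prompt information, the user decides where to insert the papers: (Koehn et al., 2007) after "Moses system", and (Papineni et al., 2002) after "BLEU score". After the target instruction is received, the hyperlinks of the two papers are inserted at those positions in the target text. The description of a hyperlink may be the document ID of the existing paper.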
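In this embodiment, the first context information of the first hyperlink in the first object is converted into a first context vector; a first input vector of the first object as a link source is obtained, where the first object contains information of the first hyperlink pointing to the second object; a first average vector is obtained from the first context vector and the first input vector; at least one of the first input vector, the first context vector, and the first output vector corresponding to the second object is adjusted; and the similarity between the first output vector and the first average vector is computed from the adjustment result. When that similarity is greater than or equal to the first target threshold, the first output vector is taken as the output vector of the second object and output. Representing the hyperlink object with the output vector avoids the loss of key information and improves the completeness of the information.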
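In another embodiment of the present application, after the first output vector is output as the output vector of the second object, the method further includes the following step.
Step S1: update the first input vector and the first output vector to the adjusted first input vector and first output vector.
In an embodiment of the present application, after the first input vector and the first output vector are updated to the adjusted first input vector and first output vector, the method further includes the following steps.
Step S2: repeat the following steps:
Step S21: locate the other hyperlinks in all objects, and take each located hyperlink as a second hyperlink;
Step S22: convert second context information of the second hyperlink into a second context vector;
Step S23: obtain a second input vector of the source object of the second hyperlink as a link source, where the source object contains information of the second hyperlink pointing to a target object;
Step S24: obtain a second average vector from the second context vector and the second input vector;
Step S25: adjust at least one of the second input vector, the second context vector, and a second output vector corresponding to the target object;
Step S26: compute, from the adjustment result, the similarity between the second output vector and the second average vector; when that similarity is greater than or equal to the first target threshold, take the second output vector as the output vector of the target object and output it.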
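In this embodiment, updating the first input vector and the first output vector to their adjusted values keeps the first input vector and the first output vector valid. Furthermore, locating the hyperlinks in all objects and performing the adjustment for each hyperlink improves the ability of each object's input vector and output vector to represent that object.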
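In another embodiment of the present application, after the first output vector is output as the output vector of the second object, the method further includes:
obtaining the input vectors of the words in a third object;
determining a target parameter of the second object from the input vectors of the words of the third object and the output vector of the second object; and determining from the target parameter whether the second object is allowed to be cited by the third object.
In an embodiment of the present application, determining from the target parameter whether the second object is allowed to be cited by the third object includes:
determining that the second object is allowed to be cited by the third object when the value of the target parameter is higher than a second target threshold; or
determining that the second object is allowed to be cited by the third object when the target parameter of the second object has the largest value in a candidate object set, where the candidate object set contains the second object.
In an embodiment of the present application, after determining from the target parameter whether the second object is allowed to be cited by the third object, the method further includes:
when it is determined from the target parameter that the second object is allowed to be cited by the third object,
inserting, at a target position in the third object, a third hyperlink pointing to the second object;
displaying, on the third object, prompt information for the third hyperlink; or
receiving indication information that indicates the insertion position of the third hyperlink in the third object, and inserting the third hyperlink at that position according to the indication information.
In an embodiment of the present application, inserting the third hyperlink pointing to the second object at the target position in the third object includes:
searching the third object for a target word, where the similarity between the input vector of the target word and the output vector of the second object is higher than a third target threshold; and inserting the third hyperlink at the position after the target word in the third object.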
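In this embodiment, the output vectors obtained by converting hyperlinks are used to make citation recommendations for the third object, and hyperlinks pointing to the recommended hyperlink objects are inserted into the third object. This increases the practical value of the hyperlink processing method and the usefulness of the third object (for example, in paper writing or web page design).
The hyperlink processing method is described below with reference to FIG. 9, where the hyperlink objects are hyperdocuments. As shown in FIG. 9, the server preprocesses the hyperdocument set in step S902, converts the hyperdocuments in the set into target input vectors and target output vectors in step S904, and makes citation recommendations for the target text in step S906.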
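An embodiment of the present application further provides a hyperlink processing method executable by a computing device. As shown in FIG. 10, the method includes the following steps.
Step S1002: obtain a first input vector of a first object as a link source, where the first input vector represents at least the first object and the content in the first object that describes a second object, and the first object contains information of a first hyperlink pointing to the second object;
Step S1004: obtain a first output vector of the second object as a link target; and
Step S1006: adjust, based on at least the first input vector and the first output vector, to obtain the output vector of the second object.
In an embodiment of the present application, the above hyperlink processing method may be applied to, but is not limited to, the classification, recommendation, and retrieval of specific objects.
Note that, in this embodiment, the above hyperlink processing method obtains the first input vector of the first object as a link source, where the first input vector represents at least the first object and the content in the first object that describes the second object, and the first object contains information of the first hyperlink pointing to the second object; obtains the first output vector of the second object as a link target; and adjusts, based on at least the first input vector and the first output vector, to obtain the output vector of the second object. The output vector represents the information of the hyperlink object as a link target, which avoids losing key information and improves the completeness of the information.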
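In an embodiment of the present application, after the first input vector of the first object as a link source is obtained, one or more word vectors corresponding to a first context of the first hyperlink may be obtained, where the first context is the context in the first object that contains the first hyperlink, with the first hyperlink removed. A first average vector is obtained from the first input vector and the one or more word vectors corresponding to the first context. Based on the first average vector and the first output vector, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector is adjusted to obtain the output vector of the second object.
In this embodiment, the first average vector may be obtained from the first input vector and the first context vector, for example by averaging the first input vector with the input vectors of the words in the first context. The first input vector has the same length as the input vectors of the context words, so the first average vector can be obtained by averaging, position by position, the element values of the first input vector and of the input vectors of the words in the first context.
In this embodiment, the first input vector may be the initial input vector of the first object, or an intermediate input vector of the first object obtained while the input and output vectors of the hyperlink objects are being computed iteratively. Likewise, the first output vector may be the initial output vector of the second object, or an intermediate output vector of the second object obtained during that iterative process.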
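In an embodiment of the present application, adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector includes: computing the similarity between the first average vector and the first output vector; and adjusting, with a similarity optimization algorithm, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector, so that the similarity between the first output vector and the first average vector is greater than or equal to a target threshold.
In this embodiment, the adjustment may proceed as follows: the first input vector and the word vectors of the words in the first context are fed into a target model, which obtains the first average vector and compares it with the first output vector of the second object; at least one of the first input vector, the word vectors corresponding to the first context, and the first output vector is then adjusted to increase the similarity between the first average vector and the first output vector.
In this embodiment, the input of the target model may include the initial input vector and initial output vector of each hyperlink object, and the initial input vector of each word in the hyperlink object set. The target model optimizes an objective function with an optimization algorithm by adjusting the input and output vectors of the hyperlink objects and the input vectors of the words in the hyperlink object set. The variables of the objective function are the input and output vectors of the hyperlink objects and the input vectors of the words in the hyperlink object set. The objective function seeks the values of these variables that maximize, over all hyperlinks in the hyperlink object set, the sum of the similarities between (a) the average of the source object's input vector and the input vectors of the words in the context containing the hyperlink, and (b) the output vector of the hyperlink's target object.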
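In an embodiment of the present application, after the output vector of the second object is obtained by adjustment, the other hyperlinks in all objects may be located. Each located hyperlink is taken as the first hyperlink, its source object as the first object, and its target object as the second object, and the foregoing steps are repeated until every hyperlink contained in the objects has been processed.
In this embodiment, after all located hyperlinks have been processed, the steps of locating all hyperlinks and processing them may be repeated to obtain a more accurate vectorized representation of the hyperlink objects.
In this embodiment, the first input vector of the first object as a link source is obtained, where the first input vector represents at least the first object and the content in the first object that describes the second object, and the first object contains information of the first hyperlink pointing to the second object; the first output vector of the second object as a link target is obtained; and the output vector of the second object is obtained by adjustment based on at least the first input vector and the first output vector. Representing the hyperlink object with the output vector avoids the loss of key information and improves the completeness of the information.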
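In another embodiment of the present application, after the first input vector of the first object as a link source is obtained, the method further includes the following steps.
Step S3: obtain one or more word vectors corresponding to a first context of the first hyperlink, where the first context is the context in the first object that contains the first hyperlink, with the first hyperlink removed;
Step S4: obtain a first average vector from the first input vector and the one or more word vectors corresponding to the first context;
where adjusting, based on at least the first input vector and the first output vector, to obtain the output vector of the second object includes:
Step S5: based on the first average vector and the first output vector, adjust at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector, to obtain the output vector of the second object.
In an embodiment of the present application, adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector includes:
Step S51: compute the similarity between the first average vector and the first output vector;
Step S52: adjust, with a similarity optimization algorithm, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector, so that the similarity between the first output vector and the first average vector is greater than or equal to a target threshold.
In this embodiment, adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector based on the first average vector and the first output vector, so as to obtain the output vector of the second object, ensures that the resulting output vector can represent the second object. Furthermore, performing that adjustment with a similarity optimization algorithm, based on the similarity between the first average vector and the first output vector, improves how well the output vector expresses the second object.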
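In an embodiment of the present application, after the output vector of the second object is obtained by adjustment, the method further includes the following steps.
Step S6: repeat the following steps:
Step S61: locate, according to a predetermined rule, the other hyperlinks in all objects, and take each located hyperlink as a second hyperlink;
Step S62: obtain a second input vector of the source object of the second hyperlink as a link source, where the second input vector represents at least the source object and the content in the source object that describes a target object, and the source object contains information of the second hyperlink pointing to the target object;
Step S63: obtain a second output vector of the target object as a link target; and
Step S64: adjust, based on at least the second input vector and the second output vector, to obtain the output vector of the target object.
In this embodiment, locating the hyperlinks in all objects and performing the adjustment for each hyperlink improves the ability of each object's input vector and output vector to represent that object.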
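Note that, for brevity, the foregoing method embodiments are each described as a series of combined actions, but those skilled in the art will appreciate that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in another order or simultaneously. Those skilled in the art will also appreciate that the embodiments described in the specification are exemplary, and the actions and modules involved are not necessarily required by the present application.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. On this understanding, the part of the technical solution of the present application that is essential, or that contributes to the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present application.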
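An embodiment of the present application further provides a hyperlink processing apparatus for implementing the above hyperlink processing method. As shown in FIG. 11A, the apparatus includes:
a conversion unit 1102, configured to convert first context information of a first hyperlink in a first object into a first context vector;
a first obtaining unit 1104, configured to obtain a first input vector of the first object as a link source, where the first object contains information of the first hyperlink pointing to a second object;
a second obtaining unit 1106, configured to obtain a first average vector from the first context vector and the first input vector;
an adjustment unit 1108, configured to adjust at least one of the first input vector, the first context vector, and a first output vector corresponding to the second object;
an output unit 1110, configured to compute, from the adjustment result, the similarity between the first output vector and the first average vector, and, when that similarity is greater than or equal to a first target threshold, to take the first output vector as the output vector of the second object and output it.
In an embodiment of the present application, the above hyperlink processing apparatus may be used for, but is not limited to, the classification, recommendation, and retrieval of specific objects.
Note that, in this embodiment, the above hyperlink processing apparatus converts the first context information of the first hyperlink in the first object into a first context vector; obtains the first input vector of the first object as a link source, where the first object contains information of the first hyperlink pointing to the second object; obtains a first average vector from the first context vector and the first input vector; adjusts at least one of the first input vector, the first context vector, and the first output vector corresponding to the second object; and computes, from the adjustment result, the similarity between the first output vector and the first average vector. When that similarity is greater than or equal to the first target threshold, the first output vector is taken as the output vector of the second object and output. The output vector represents the information of the hyperlink object as a link target, which avoids losing key information and improves the completeness of the information.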
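The second obtaining unit 1106 may specifically be configured to average the first input vector with the input vectors of the words in a first context, where the first context is the context in the first object that contains the first hyperlink, with the first hyperlink removed.
In this embodiment, the first average vector may be obtained from the first input vector and the first context vector, for example by averaging the first input vector with the input vectors of the words in the first context. The first input vector has the same length as the input vectors of the context words, so the first average vector can be obtained by averaging, position by position, the element values of the first input vector and of the input vectors of the words in the first context.
In this embodiment, the first input vector may be the initial input vector of the first object, or an intermediate input vector of the first object obtained while the input and output vectors of the hyperlink objects are being computed iteratively. Likewise, the first output vector may be the initial output vector of the second object, or an intermediate output vector of the second object obtained during that iterative process.
In this embodiment, for a given hyperlink, the adjustment makes the similarity between the output vector of the target object and the average vector (the average of the input vector of the source object and the context vector with which the source object describes the target object) greater than or equal to the first target threshold. As a result, the output vector of the target object represents more accurately both the source object that cites it (the input vector of the source object) and the information the source object uses to describe it (the context vector of the hyperlink), and the input vector of the source object represents more accurately both its own content (the input vector of the source object) and the target object it cites (the output vector of the target object).
In this embodiment, the adjustment may proceed as follows: the first input vector and the word vectors of the words in the first context are fed into a second target model, which obtains the first average vector and compares it with the first output vector of the second object; at least one of the first input vector, the first context vector, and the first output vector is then adjusted to increase the similarity between the first average vector and the first output vector.
In this embodiment, the input of the second target model may include the initial input vector and initial output vector of each hyperlink object, and the initial input vector of each word in the hyperlink object set. The second target model optimizes an objective function with an optimization algorithm by adjusting the input and output vectors of the hyperdocuments and the input vectors of the words in the hyperlink object set. The variables of the objective function are the input and output vectors of the hyperlink objects and the input vectors of the words in the hyperlink object set. The objective function seeks the values of these variables that maximize, over all hyperlinks in the hyperlink object set, the sum of the similarities between (a) the average of the source object's input vector and the input vectors of the words in the context containing the hyperlink, and (b) the output vector of the hyperlink's target object.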
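After the first output vector is output as the output vector of the second object, the first input vector and the first output vector may be updated to the adjusted first input vector and first output vector.
In this embodiment, after the first output vector is output as the output vector of the second object, the output vector may be used to update the first output vector. The first input vector may likewise be output as the input vector of the first object and used to update the first input vector. The input vectors of the words in the adjusted average vector may also be output as the input vectors of the corresponding words and used to update those words' input vectors.
After the first input vector and the first output vector are updated to their adjusted values, the other hyperlinks in all objects may be located. Each located hyperlink is taken as the first hyperlink, its source object as the first object, and its target object as the second object, and the foregoing steps are repeated until every hyperlink contained in the objects has been processed.
In this embodiment, after all located hyperlinks have been processed, the steps of locating all hyperlinks and processing them may be repeated to obtain a more accurate vectorized representation of the hyperlink objects.
In this embodiment, "all objects" may be all objects in the hyperlink object set. All hyperlinks in the set can be obtained by going through the objects in the set one by one and collecting the hyperlinks each contains (a hyperlink can be written as <d_s, C, d_t>).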
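In this embodiment, after the first output vector is output as the output vector of the second object, the resulting output vectors (and input vectors) can be used in a number of scenarios, including but not limited to hyperdocument classification, similar document and keyword retrieval, and citation recommendation.
After the first output vector is output as the output vector of the second object, the input vectors of the words in a third object are obtained; a target parameter of the second object is determined from the input vectors of the words of the third object and the output vector of the second object; and whether the second object is allowed to be cited by the third object is determined from the target parameter.
Determining from the target parameter whether the second object is allowed to be cited by the third object includes: determining that the second object is allowed to be cited by the third object when the value of the target parameter is higher than a second target threshold; or determining that the second object is allowed to be cited by the third object when the target parameter of the second object has the largest value in a candidate object set, where the candidate object set contains the second object.
After it is determined from the target parameter that the second object is allowed to be cited by the third object, the following is performed: inserting, at a target position in the third object, a third hyperlink pointing to the second object; or displaying, on the third object, prompt information for the third hyperlink; or receiving indication information that indicates the insertion position of the third hyperlink in the third object, and inserting the third hyperlink at that position according to the indication information.
Inserting the third hyperlink pointing to the second object at the target position in the third object includes: searching the third object for a target word, where the similarity between the input vector of the target word and the output vector of the second object is higher than a third target threshold; and inserting the third hyperlink at the position after the target word in the third object.
In this embodiment, the first context information of the first hyperlink in the first object is converted into a first context vector; a first input vector of the first object as a link source is obtained, where the first object contains information of the first hyperlink pointing to the second object; a first average vector is obtained from the first context vector and the first input vector; at least one of the first input vector, the first context vector, and the first output vector corresponding to the second object is adjusted; and the similarity between the first output vector and the first average vector is computed from the adjustment result. When that similarity is greater than or equal to the first target threshold, the first output vector is taken as the output vector of the second object and output. Representing the hyperlink object with the output vector avoids the loss of key information and improves the completeness of the information.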
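In an embodiment of the present application, as shown in FIG. 11B, the apparatus further includes:
an update unit 1112, configured to update the first input vector and the first output vector to the adjusted first input vector and first output vector after the first output vector is output as the output vector of the second object.
In some embodiments, as shown in FIG. 11C, the apparatus further includes:
a first execution unit 1114, configured to repeat the following steps after the update unit 1112 updates the first input vector and the first output vector to the adjusted first input vector and first output vector:
locating the other hyperlinks in all objects, and taking each located hyperlink as a second hyperlink; converting second context information of the second hyperlink into a second context vector; obtaining a second input vector of the source object of the second hyperlink as a link source, where the source object contains information of the second hyperlink pointing to a target object; obtaining a second average vector from the second context vector and the second input vector; adjusting at least one of the second input vector, the second context vector, and a second output vector corresponding to the target object; and computing, from the adjustment result, the similarity between the second output vector and the second average vector, and, when that similarity is greater than or equal to the first target threshold, taking the second output vector as the output vector of the target object and outputting it.
In this embodiment, updating the first input vector and the first output vector to their adjusted values keeps the first input vector and the first output vector valid. Furthermore, locating the hyperlinks in all objects and performing the adjustment for each hyperlink improves the ability of each object's input vector and output vector to represent that object.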
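In an embodiment of the present application, as shown in FIG. 11D, the apparatus further includes:
a third obtaining unit 1116, configured to obtain the input vectors of the words in a third object after the first output vector is output as the output vector of the second object;
a determining unit 1118, configured to determine a target parameter of the second object from the input vectors of the words of the third object and the output vector of the second object, and to determine from the target parameter whether the second object is allowed to be cited by the third object.
In other embodiments, the determining unit 1118 includes:
a first determining module, configured to determine that the second object is allowed to be cited by the third object when the value of the target parameter is higher than a second target threshold; or
configured to determine that the second object is allowed to be cited by the third object when the target parameter of the second object has the largest value in a candidate object set, where the candidate object set contains the second object.
In other embodiments, as shown in FIG. 11E, the apparatus further includes:
a second execution unit 1120, configured to perform the following when it is determined from the target parameter that the second object is allowed to be cited by the third object:
inserting, at a target position in the third object, a third hyperlink pointing to the second object;
displaying, on the third object, prompt information for the third hyperlink; or
receiving indication information that indicates the insertion position of the third hyperlink in the third object, and inserting the third hyperlink at that position according to the indication information.
In other embodiments, the second execution unit 1120 is specifically configured to:
search the third object for a target word, where the similarity between the input vector of the target word and the output vector of the second object is higher than a third target threshold; and insert the third hyperlink at the position after the target word in the third object.
In this embodiment, the output vectors obtained by converting hyperlinks are used to make citation recommendations for the third object, and hyperlinks pointing to the recommended hyperlink objects are inserted into the third object. This increases the practical value of the hyperlink processing method and the usefulness of the third object (for example, in paper writing or web page design).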
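An embodiment of the present application further provides a hyperlink processing apparatus. As shown in FIG. 12A, the apparatus includes:
a first obtaining unit 1202, configured to obtain a first input vector of a first object as a link source, where the first input vector represents at least the first object and the content in the first object that describes a second object, and the first object contains information of a first hyperlink pointing to the second object;
a second obtaining unit 1204, configured to obtain a first output vector of the second object as a link target;
an adjustment unit 1206, configured to adjust, based on at least the first input vector and the first output vector, to obtain the output vector of the second object.
In an embodiment of the present application, the above hyperlink processing apparatus may be used for, but is not limited to, the classification, recommendation, and retrieval of specific objects.
Note that, in this embodiment, the above hyperlink processing apparatus obtains the first input vector of the first object as a link source, where the first input vector represents at least the first object and the content in the first object that describes the second object, and the first object contains information of the first hyperlink pointing to the second object; obtains the first output vector of the second object as a link target; and adjusts, based on at least the first input vector and the first output vector, to obtain the output vector of the second object. The output vector represents the information of the hyperlink object as a link target, which avoids losing key information and improves the completeness of the information.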
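In other embodiments, after the first input vector of the first object as a link source is obtained, one or more word vectors corresponding to a first context of the first hyperlink may be obtained, where the first context is the context in the first object that contains the first hyperlink, with the first hyperlink removed. A first average vector is obtained from the first input vector and the one or more word vectors corresponding to the first context. Based on the first average vector and the first output vector, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector is adjusted to obtain the output vector of the second object.
In this embodiment, the first average vector may be obtained from the first input vector and the first context vector, for example by averaging the first input vector with the input vectors of the words in the first context. The first input vector has the same length as the input vectors of the context words, so the first average vector can be obtained by averaging, position by position, the element values of the first input vector and of the input vectors of the words in the first context.
In this embodiment, the first input vector may be the initial input vector of the first object, or an intermediate input vector of the first object obtained while the input and output vectors of the hyperlink objects are being computed iteratively. Likewise, the first output vector may be the initial output vector of the second object, or an intermediate output vector of the second object obtained during that iterative process.
In other embodiments, adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector includes: computing the similarity between the first average vector and the first output vector; and adjusting, with a similarity optimization algorithm, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector, so that the similarity between the first output vector and the first average vector is greater than or equal to a target threshold.
In this embodiment, the adjustment may proceed as follows: the first input vector and the word vectors of the words in the first context are fed into a target model, which obtains the first average vector and compares it with the first output vector of the second object; at least one of the first input vector, the word vectors corresponding to the first context, and the first output vector is then adjusted to increase the similarity between the first average vector and the first output vector.
In this embodiment, the input of the target model may include the initial input vector and initial output vector of each hyperlink object, and the initial input vector of each word in the hyperlink object set. The target model optimizes an objective function with an optimization algorithm by adjusting the input and output vectors of the hyperlink objects and the input vectors of the words in the hyperlink object set. The variables of the objective function are the input and output vectors of the hyperlink objects and the input vectors of the words in the hyperlink object set. The objective function seeks the values of these variables that maximize, over all hyperlinks in the hyperlink object set, the sum of the similarities between (a) the average of the source object's input vector and the input vectors of the words in the context containing the hyperlink, and (b) the output vector of the hyperlink's target object.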
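In other embodiments, after the output vector of the second object is obtained by adjustment, the other hyperlinks in all objects may be located. Each located hyperlink is taken as the first hyperlink, its source object as the first object, and its target object as the second object, and the foregoing steps are repeated until every hyperlink contained in the objects has been processed.
In this embodiment, after all located hyperlinks have been processed, the steps of locating all hyperlinks and processing them may be repeated to obtain a more accurate vectorized representation of the hyperlink objects.
In this embodiment, the first input vector of the first object as a link source is obtained, where the first input vector represents at least the first object and the content in the first object that describes the second object, and the first object contains information of the first hyperlink pointing to the second object; the first output vector of the second object as a link target is obtained; and the output vector of the second object is obtained by adjustment based on at least the first input vector and the first output vector. Representing the hyperlink object with the output vector avoids the loss of key information and improves the completeness of the information.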
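In an embodiment of the present application, as shown in FIG. 12B, the apparatus further includes:
a third obtaining unit 1208, configured to obtain, after the first input vector of the first object as a link source is obtained, one or more word vectors corresponding to a first context of the first hyperlink, where the first context is the context in the first object that contains the first hyperlink, with the first hyperlink removed;
a fourth obtaining unit 1210, configured to obtain a first average vector from the first input vector and the one or more word vectors corresponding to the first context;
where the adjustment unit 1206 includes an adjustment module, configured to adjust, based on the first average vector and the first output vector, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector, to obtain the output vector of the second object.
In an embodiment of the present application, the adjustment module includes:
a computing submodule, configured to compute the similarity between the first average vector and the first output vector;
an adjustment submodule, configured to adjust, with a similarity optimization algorithm, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector, so that the similarity between the first output vector and the first average vector is greater than or equal to a target threshold.
In this embodiment, adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector based on the first average vector and the first output vector, so as to obtain the output vector of the second object, ensures that the resulting output vector can represent the second object. Furthermore, performing that adjustment with a similarity optimization algorithm, based on the similarity between the first average vector and the first output vector, improves how well the output vector expresses the second object.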
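In an embodiment of the present application, as shown in FIG. 12C, the apparatus further includes:
an execution unit 1212, configured to repeat the following steps after the output vector of the second object is obtained by adjustment, and to output the output vectors of all objects:
locating, according to a predetermined rule, the other hyperlinks in all objects, and taking each located hyperlink as a second hyperlink;
obtaining a second input vector of the source object of the second hyperlink as a link source, where the second input vector represents at least the source object and the content in the source object that describes a target object, and the source object contains information of the second hyperlink pointing to the target object;
obtaining a second output vector of the target object as a link target; and
adjusting, based on at least the second input vector and the second output vector, to obtain the output vector of the target object.
In this embodiment, locating the hyperlinks of all objects and performing the adjustment for each hyperlink improves the ability of each object's input vector and output vector to represent that object.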
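An embodiment of the present application further provides a storage medium storing a computer program, where the computer program is configured to perform, when run, the steps of any of the above method embodiments.
In this embodiment, those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing hardware associated with a terminal device. The program may be stored in a computer-readable storage medium, which may include a flash drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.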
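An embodiment of the present application further provides an electronic apparatus for implementing the above hyperlink processing method. As shown in FIG. 13, the electronic apparatus includes a processor 1302, a memory 1304, a transmission apparatus 1306, and the like. The memory stores a computer program, and the processor is configured to perform the steps of any of the above method embodiments by executing the computer program.
In this embodiment, the electronic apparatus may be located in at least one of multiple network devices of a computer network.
Those of ordinary skill in the art will understand that the structure shown in FIG. 13 is only illustrative. The electronic apparatus may also be a terminal device such as a smartphone (for example, an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile Internet device (MID), or a PAD, or it may be a server. FIG. 13 does not limit the structure of the electronic apparatus. For example, the electronic apparatus may include more or fewer components than shown in FIG. 13 (such as a network interface), or have a configuration different from that shown in FIG. 13.
The memory 1304 can store software programs and modules, such as the program instructions/modules corresponding to the hyperlink processing method and apparatus in the embodiments of the present application. The processor 1302 runs the software programs and modules stored in the memory 1304 to execute various functional applications and data processing, that is, to implement the above hyperlink processing method. The memory 1304 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1304 may further include memory located remotely from the processor 1302, which can be connected to the terminal over a network. Examples of such a network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The transmission apparatus 1306 receives or sends data over a network. Specific examples of the network include wired and wireless networks. In one example, the transmission apparatus 1306 includes a network interface controller (NIC), which can be connected by a network cable to other network devices and a router so as to communicate with the Internet or a local area network. In another example, the transmission apparatus 1306 is a radio frequency (RF) module, which communicates with the Internet wirelessly.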
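The sequence numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
If an integrated unit in the above embodiments is implemented as a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. On this understanding, the part of the technical solution of the present application that is essential or that contributes to the prior art, or all or part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
The above are only exemplary embodiments of the present application. It should be noted that those of ordinary skill in the art may make improvements and refinements without departing from the principles of the present application, and such improvements and refinements shall also fall within the scope of protection of the present application.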
Claims (16)
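- 1. A hyperlink processing method, performed by a computing device, comprising: converting first context information of a first hyperlink in a first object into a first context vector; obtaining a first input vector of the first object as a link source, wherein the first object contains information of the first hyperlink pointing to a second object; obtaining a first average vector from the first context vector and the first input vector; adjusting at least one of the first input vector, the first context vector, and a first output vector corresponding to the second object; and computing, from the adjustment result, the similarity between the first output vector and the first average vector, and, when the similarity between the first output vector and the first average vector is greater than or equal to a first target threshold, taking the first output vector as the output vector of the second object and outputting it.
- 2. The method according to claim 1, wherein, after the first output vector is output as the output vector of the second object, the method further comprises: updating the first input vector and the first output vector to the adjusted first input vector and first output vector.
- 3. The method according to claim 2, wherein, after the first input vector and the first output vector are updated to the adjusted first input vector and first output vector, the method further comprises: repeating the following steps, and outputting the output vectors of all objects: locating the other hyperlinks in all objects, and taking each located hyperlink as a second hyperlink; converting second context information of the second hyperlink into a second context vector; obtaining a second input vector of the source object of the second hyperlink as a link source, wherein the source object contains information of the second hyperlink pointing to a target object; obtaining a second average vector from the second context vector and the second input vector; adjusting at least one of the second input vector, the second context vector, and a second output vector corresponding to the target object; and computing, from the adjustment result, the similarity between the second output vector and the second average vector, and, when the similarity between the second output vector and the second average vector is greater than or equal to the first target threshold, taking the second output vector as the output vector of the target object and outputting it.
- 4. The method according to claim 1, wherein obtaining the first average vector of the first input vector and the first context vector comprises: averaging the first input vector with the input vectors of the words in a first context, wherein the first context is the context in the first object that contains the first hyperlink, with the first hyperlink removed.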
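- 5. The method according to any one of claims 1 to 4, wherein, after the first output vector is output as the output vector of the second object, the method further comprises: obtaining the input vectors of the words in a third object; determining a target parameter of the second object from the input vectors of the words of the third object and the output vector of the second object; and determining from the target parameter whether the second object is allowed to be cited by the third object.
- 6. The method according to claim 5, wherein determining from the target parameter whether the second object is allowed to be cited by the third object comprises: determining that the second object is allowed to be cited by the third object when the value of the target parameter is higher than a second target threshold; or determining that the second object is allowed to be cited by the third object when the target parameter of the second object has the largest value in a candidate object set, wherein the candidate object set contains the second object.
- 7. The method according to claim 6, further comprising: when it is determined from the target parameter that the second object is allowed to be cited by the third object, inserting, at a target position in the third object, a third hyperlink pointing to the second object; displaying, on the third object, prompt information for the third hyperlink; or receiving indication information that indicates the insertion position of the third hyperlink in the third object, and inserting the third hyperlink at the insertion position in the third object according to the indication information.
- 8. The method according to claim 7, wherein inserting the third hyperlink pointing to the second object at the target position in the third object comprises: searching the third object for a target word, wherein the similarity between the input vector of the target word and the output vector of the second object is higher than a third target threshold; and inserting the third hyperlink at the position after the target word in the third object.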
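- 9. A hyperlink processing method, performed by a computing device, comprising: obtaining a first input vector of a first object as a link source, wherein the first input vector represents at least the first object and the content in the first object that describes a second object, and the first object contains information of a first hyperlink pointing to the second object; obtaining a first output vector of the second object as a link target; and adjusting, based on at least the first input vector and the first output vector, to obtain the output vector of the second object.
- 10. The method according to claim 9, wherein, after the first input vector of the first object as a link source is obtained, the method further comprises: obtaining one or more word vectors corresponding to a first context of the first hyperlink, wherein the first context is the context in the first object that contains the first hyperlink, with the first hyperlink removed; and obtaining a first average vector from the first input vector and the one or more word vectors corresponding to the first context; wherein adjusting, based on at least the first input vector and the first output vector, to obtain the output vector of the second object comprises: adjusting, based on the first average vector and the first output vector, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector, to obtain the output vector of the second object.
- 11. The method according to claim 10, wherein adjusting at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector comprises: computing the similarity between the first average vector and the first output vector; and adjusting, with a similarity optimization algorithm, at least one of the first input vector, the one or more word vectors corresponding to the first context, and the first output vector, so that the similarity between the first output vector and the first average vector is greater than or equal to a target threshold.
- 12. The method according to any one of claims 9 to 11, wherein, after the output vector of the second object is obtained by adjustment, the method further comprises: repeating the following steps, and outputting the output vectors of all objects: locating, according to a predetermined rule, the other hyperlinks in all objects, and taking each located hyperlink as a second hyperlink; obtaining a second input vector of the source object of the second hyperlink as a link source, wherein the second input vector represents at least the source object and the content in the source object that describes a target object, and the source object contains information of the second hyperlink pointing to the target object; obtaining a second output vector of the target object as a link target; and adjusting, based on at least the second input vector and the second output vector, to obtain the output vector of the target object.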
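- 13. A hyperlink processing apparatus, comprising: a conversion unit, configured to convert first context information of a first hyperlink in a first object into a first context vector; a first obtaining unit, configured to obtain a first input vector of the first object as a link source, wherein the first object contains information of the first hyperlink pointing to a second object; a second obtaining unit, configured to obtain a first average vector from the first context vector and the first input vector; an adjustment unit, configured to adjust at least one of the first input vector, the first context vector, and a first output vector corresponding to the second object; and an output unit, configured to compute, from the adjustment result, the similarity between the first output vector and the first average vector, and, when the similarity between the first output vector and the first average vector is greater than or equal to a first target threshold, to take the first output vector as the output vector of the second object and output it.
- 14. A hyperlink processing apparatus, comprising: a first obtaining unit, configured to obtain a first input vector of a first object as a link source, wherein the first input vector represents at least the first object and the content in the first object that describes a second object, and the first object contains information of a first hyperlink pointing to the second object; a second obtaining unit, configured to obtain a first output vector of the second object as a link target; and an adjustment unit, configured to adjust, based on at least the first input vector and the first output vector, to obtain the output vector of the second object.
- 15. An electronic apparatus, comprising a processor and a memory connected to the processor, wherein the memory stores a computer program executable by the processor, and the processor executes the computer program to perform the method according to any one of claims 1 to 12.
- 16. A storage medium storing a computer program, wherein the computer program is executable by a processor to carry out the operations of the method according to any one of claims 1 to 12.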
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/012,380 US11275888B2 (en) | 2018-07-13 | 2020-09-04 | Hyperlink processing method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810771876.4 | 2018-07-13 | ||
CN201810771876.4A CN109086348B (zh) | 2018-07-13 | 2018-07-13 | 超链接的处理方法和装置及存储介质 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/012,380 Continuation US11275888B2 (en) | 2018-07-13 | 2020-09-04 | Hyperlink processing method and apparatus |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020010996A1 true WO2020010996A1 (zh) | 2020-01-16 |
Family
ID=64837886
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/092279 WO2020010996A1 (zh) | 2018-07-13 | 2019-06-21 | 超链接的处理方法和装置及存储介质 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11275888B2 (zh) |
CN (1) | CN109086348B (zh) |
WO (1) | WO2020010996A1 (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109086348B (zh) * | 2018-07-13 | 2023-04-18 | 腾讯科技(深圳)有限公司 | 超链接的处理方法和装置及存储介质 |
US11875131B2 (en) * | 2020-09-16 | 2024-01-16 | International Business Machines Corporation | Zero-shot cross-lingual transfer learning |
US11443114B1 (en) * | 2021-06-21 | 2022-09-13 | Microsoft Technology Licensing, Llc | Computing system for entity disambiguation and not-in-list entity detection in a knowledge graph |
CN115544257B (zh) * | 2022-11-25 | 2023-04-11 | 天津联想协同科技有限公司 | 网盘文档快速分类方法、装置、网盘及存储介质 |
US20240211686A1 (en) * | 2022-12-23 | 2024-06-27 | Document Crunch, Inc. | Context-based natural language processing |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6256631B1 (en) * | 1997-09-30 | 2001-07-03 | International Business Machines Corporation | Automatic creation of hyperlinks |
CN102541946A (zh) * | 2010-12-31 | 2012-07-04 | 百度在线网络技术(北京)有限公司 | 基于超链接的推荐属性确定超链推荐度的方法与设备 |
CN105243091A (zh) * | 2015-09-11 | 2016-01-13 | 晶赞广告(上海)有限公司 | 基于超链分析的页面语义信息提取方法及系统 |
CN105930546A (zh) * | 2016-07-08 | 2016-09-07 | 北京北大英华科技有限公司 | 文件关联显示方法 |
CN109086348A (zh) * | 2018-07-13 | 2018-12-25 | 腾讯科技(深圳)有限公司 | 超链接的处理方法和装置及存储介质 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2411014A (en) * | 2004-02-11 | 2005-08-17 | Autonomy Corp Ltd | Automatic searching for relevant information |
US7716225B1 (en) * | 2004-06-17 | 2010-05-11 | Google Inc. | Ranking documents based on user behavior and/or feature data |
CN100470544C (zh) * | 2005-05-24 | 2009-03-18 | 国际商业机器公司 | 用于链接文档的方法、设备和系统 |
US9690786B2 (en) * | 2008-03-17 | 2017-06-27 | Tivo Solutions Inc. | Systems and methods for dynamically creating hyperlinks associated with relevant multimedia content |
US9715495B1 (en) * | 2016-12-15 | 2017-07-25 | Quid, Inc. | Topic-influenced document relationship graphs |
US10817650B2 (en) * | 2017-05-19 | 2020-10-27 | Salesforce.Com, Inc. | Natural language processing using context specific word vectors |
-
2018
- 2018-07-13 CN CN201810771876.4A patent/CN109086348B/zh active Active
-
2019
- 2019-06-21 WO PCT/CN2019/092279 patent/WO2020010996A1/zh active Application Filing
-
2020
- 2020-09-04 US US17/012,380 patent/US11275888B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US11275888B2 (en) | 2022-03-15 |
CN109086348B (zh) | 2023-04-18 |
US20210141993A1 (en) | 2021-05-13 |
CN109086348A (zh) | 2018-12-25 |
Similar Documents
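Publication | Publication Date | Title |
---|---|---|
WO2020010996A1 (zh) | | Method and device for processing a hyperlink, and storage medium |
US11232140B2 (en) | | Method and apparatus for processing information |
US11151177B2 (en) | | Search method and apparatus based on artificial intelligence |
WO2019153551A1 (zh) | | Article classification method and apparatus, computer device, and storage medium |
CN107346336B (zh) | | Artificial intelligence-based information processing method and apparatus |
CN107590174B (zh) | | Page access method and apparatus |
WO2020082560A1 (zh) | | Text keyword extraction method, apparatus, device, and computer-readable storage medium |
CN116775847B (zh) | | Question answering method and system based on a knowledge graph and a large language model |
CN110929038B (zh) | | Knowledge graph-based entity linking method, apparatus, device, and storage medium |
US20200293722A1 (en) | | Word vector retrofitting method and apparatus |
WO2014126657A1 (en) | | Latent semantic analysis for application in a question answer system |
CN111831911A (zh) | | Query information processing method and apparatus, storage medium, and electronic apparatus |
CN110516033B (zh) | | Method and apparatus for calculating user preferences |
WO2021051934A1 (zh) | | Artificial intelligence-based method and apparatus for extracting key contract clauses, and storage medium |
CN111767796A (zh) | | Video association method, apparatus, server, and readable storage medium |
CN110717038B (zh) | | Object classification method and apparatus |
US20180285742A1 (en) | | Learning method, learning apparatus, and storage medium |
CN116932730B (zh) | | Document question answering method based on a multi-way tree and a large-scale language model, and related device |
KR102315181B1 (ko) | | Named entity linking method, apparatus, system, and computer program |
JP7121819B2 (ja) | | Image processing method and apparatus, electronic device, computer-readable storage medium, and computer program |
CN117312535B (zh) | | Artificial intelligence-based question data processing method, apparatus, device, and medium |
CN115248890A (zh) | | Method and apparatus for generating a user interest profile, electronic device, and storage medium |
CN109582802B (zh) | | Entity embedding method, apparatus, medium, and device |
CN104765752A (zh) | | Recommendation apparatus and method based on user model evolution |
US9195940B2 (en) | | Jabba-type override for correcting or improving output of a model |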
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 19834259; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 19834259; Country of ref document: EP; Kind code of ref document: A1 |