CN111563159B - Text sorting method and device - Google Patents

Text sorting method and device

Info

Publication number
CN111563159B
Authority
CN
China
Prior art keywords
feature vector
text feature
text
vector set
pooling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010683552.2A
Other languages
Chinese (zh)
Other versions
CN111563159A (en)
Inventor
王瑞欣
方宽
范力文
申战
周日康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhizhe Sihai Beijing Technology Co ltd
Original Assignee
Zhizhe Sihai Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhizhe Sihai Beijing Technology Co ltd
Priority to CN202010683552.2A
Publication of CN111563159A
Application granted
Publication of CN111563159B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3347 Query execution using vector based model

Abstract

The invention relates to a text ranking method and apparatus, belongs to the technical field of natural language processing, and aims to improve the relevance between the keywords input by a user and the search results. The method comprises the following steps: obtaining at least two texts corresponding to a query word to form a text feature vector set, wherein the text feature vector set comprises the feature vectors of the at least two texts; weighting the text feature vector set with a weight vector to generate a weighted text feature vector set; and determining ranking scores for the at least two texts according to the weighted text feature vector set.

Description

Text sorting method and device
Technical Field
The present disclosure relates to the field of natural language processing technologies, and in particular, to a text ordering method and apparatus.
Background
In a search engine, the ranking module scores the recalled texts and returns them to the user in descending order of score. The more accurate the scoring model, the more easily the user finds the desired result and the better the experience. At present, most scoring models score each text independently during both training and prediction: they take the features of a single text as input and output the score of that text. This ignores the feature-interaction information between texts, so the relevance of the search results returned to the user is low.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a text ranking method and apparatus, aiming to improve the relevance between the keywords input by the user and the search results.
According to a first aspect of the present invention, there is provided a text ranking method, comprising: obtaining at least two texts corresponding to a query word to form a text feature vector set; pooling the text feature vector set to obtain a fused feature vector; exciting the fused feature vector to generate a weight vector; weighting the text feature vector set with the weight vector to generate a weighted text feature vector set; and determining ranking scores for the at least two texts according to the weighted text feature vector set.
In a possible embodiment, the pooling the text feature vector set to obtain a fused feature vector includes: compressing the text feature vector set by dimension-wise maximum pooling or dimension-wise average pooling to obtain the fused feature vector.
In a possible embodiment, the exciting the fused feature vector to generate a weight vector includes: converting the fused feature vector through a fully connected layer and an activation function to obtain the weight vector. The weighting the text feature vector set with the weight vector to generate a weighted text feature vector set includes: performing point multiplication of the weight vector and the text feature vector set to obtain the weighted text feature vector set.
In a possible embodiment, the method further comprises: repeating the steps of pooling, exciting and weighting.
According to a second aspect of the present invention, there is provided a text ranking apparatus, comprising: an obtaining module configured to obtain at least two texts corresponding to a query word to form a text feature vector set, wherein the text feature vector set comprises the feature vectors of the at least two texts; a pooling module configured to pool the text feature vector set to obtain a fused feature vector; an exciting module configured to excite the fused feature vector to generate a weight vector; a weighting module configured to weight the text feature vector set with the weight vector to generate a weighted text feature vector set; and a determining module configured to determine ranking scores for the at least two texts according to the weighted text feature vector set.
In a possible embodiment, the pooling module is specifically configured to: compress the text feature vector set by dimension-wise maximum pooling or dimension-wise average pooling to obtain the fused feature vector.
In a possible embodiment, the exciting module is specifically configured to: convert the fused feature vector through a fully connected layer and an activation function to obtain the weight vector; and the weighting module is specifically configured to: perform point multiplication of the weight vector and the text feature vector set to obtain the weighted text feature vector set.
In a possible embodiment, the apparatus further comprises: a loop module configured to repeat the steps of pooling, exciting and weighting.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to the first aspect when executing the program.
According to a fourth aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of the first aspect.
In the text ranking method and apparatus provided by the embodiments of the present disclosure, at least two texts corresponding to a query word are first obtained to form a text feature vector set, where the set comprises the feature vectors of the at least two texts; the text feature vector set is then weighted with a weight vector to generate a weighted text feature vector set; and finally, ranking scores for the at least two texts are determined from the weighted text feature vector set. Through the pooling, exciting and weighting operations, the features of the texts are associated with one another, so that the relevance of the search results obtained for the user's search terms is improved.
Additional features and advantages of the disclosure will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the disclosure. The objectives and other advantages of the disclosure may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. It is apparent that the drawings in the following description show only some embodiments of the present application, and that those skilled in the art can derive other drawings from them without creative effort. The foregoing and other objects, features and advantages of the application will be apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the drawings. The drawings are not necessarily drawn to scale; emphasis is instead placed upon illustrating the subject matter of the present application.
FIG. 1 is a schematic diagram of a conventional ranking model provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a Group-wise scoring model provided by an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a ranking scoring model provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a pooling-activation network provided by an embodiment of the present disclosure;
FIG. 5 is a specific structural diagram of a pooling-activation module provided by an embodiment of the present disclosure;
FIG. 6 is a specific structural diagram of a ranking scoring model provided by an embodiment of the present disclosure;
FIG. 7 is a flow chart of a text ranking method provided by an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of a text sorting apparatus provided by an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. The terms "a", "an" and "the" as used herein are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, the terms "comprises", "comprising" and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
In a search engine, the ranking module scores the recalled texts and returns them to the user in descending order of score. The more accurate the scoring model, the more easily the user finds the desired result and the better the experience. At present, most scoring models score each text independently during both training and prediction: they take the features of a single text as input and output the score of that text. This ignores the feature-interaction information between texts, so the relevance of the search results returned to the user is low. For example, existing learning-to-rank methods are mainly classified according to model structure and loss function. The model structure may be a decision tree, a support vector machine or a neural network; the loss functions are mainly Pointwise, Pairwise and Listwise. These methods treat each text as an independent sample during both training and scoring.
Fig. 1 is a schematic diagram of a conventional ranking model according to an embodiment of the present invention. The input of the conventional ranking model is the feature vector of a single text, and the output is the score corresponding to that text. When there are multiple input texts (x_1, x_2, …, x_L), a score is output for each of them (L scores in total). It should be noted that in the conventional ranking model each text to be ranked is independent of the others, i.e., each text is input into the ranking model separately.
In recent years, a new class of scoring functions (e.g., Group-wise functions) first groups the texts, then takes all the features of a group of texts as input and outputs the score of each text in that group. Because one text belongs to multiple groups, its final score is the average of its scores over those groups. Another way to exploit feature interaction between texts is to feed all texts into an RNN or Transformer model to obtain context-dependent Embedding vectors; the context-dependent Embedding vector is then concatenated back onto the features of each text and used as the input of the model.
Fig. 2 is a schematic diagram of a Group-wise scoring model according to an embodiment of the present invention. The case of group size = 2 is shown. During scoring, the texts to be ranked are randomly shuffled; each text then forms a group with the texts on its left and right. The features of all texts in each group are concatenated and input into the scoring model, which outputs a 2-dimensional vector, each dimension representing the score of one text in the group. Finally, the score of each text is its average score over all the groups it belongs to. For example, suppose the input texts are x_1, x_2, x_3, x_4, where x_1 and x_2 form group1, x_2 and x_3 form group2, and x_3 and x_4 form group3. The features of the texts in each of the 3 groups are concatenated and fed into the scoring model, which outputs a 2-dimensional vector for each group, each dimension representing the score of one text in that group. Correspondingly, the score of x_1 is its score in group1, the score of x_2 is the average of its scores in group1 and group2, the score of x_3 is the average of its scores in group2 and group3, and the score of x_4 is its score in group3.
Whether the above Group-wise scoring model or a method based on feature interaction between texts is adopted, there is usually a trade-off between effect and efficiency. For example, in the Group-wise scoring model the time complexity grows quadratically with the group size, so in a real ranking system the group size is usually set to 2; but with such a small group size the improvement is not obvious. When the context Embedding vector of each text is obtained with an RNN model, the RNN cannot process the texts in parallel, and performance is very poor when the list to be ranked is long. The Transformer structure likewise has a time complexity of O(n^2 · d), where n is the length of the ranked list and d is the feature dimension of each text.
Unlike the two ranking models of fig. 1 and fig. 2, the present invention provides a sequence-level ranking model that keeps the algorithmic complexity of feature interaction between texts at O(d). This time complexity gives very good performance, with no need to trade effect against efficiency.
Fig. 3 is a schematic diagram of a ranking scoring model according to an embodiment of the present invention. The model takes the features of all texts as input and directly outputs the scores of all texts. The features of all input texts interact with one another: no text exists in isolation, the features of all texts are fused with each other, and this feature interaction is used to improve the model and hence the accuracy of the scores. The present invention is described in detail below based on the ranking model corresponding to fig. 3.
Fig. 4 is a schematic diagram of a pooling-activation network according to an embodiment of the present invention. The main idea is to make the feature importance learned by the model dynamic by collecting statistical information across the different channels. The network mainly comprises two parts: a pooling operation and an activation operation.
In the pooling operation, a total of L texts are input (x_1, x_2, …, x_L), each represented by a D-dimensional vector, i.e., an L × D set of vectors. Pooling performs a statistical operation, such as averaging or taking the maximum, over the L features in each feature dimension, reducing the input from L × D to D. In the activation operation, once the D-dimensional vector has been obtained for all input texts, the weight information of each feature dimension is learned through a fully connected layer and an activation function.
After pooling and activation, a D-dimensional weight vector s (corresponding to vector C on the right in fig. 4) is obtained, representing the weight of each feature in this context. The weight vector s is then multiplied with the original set of input vectors (x_1, x_2, …, x_L) to obtain the feature-weighted input:
(x̃_1, x̃_2, …, x̃_L) = (s ⊙ x_1, s ⊙ x_2, …, s ⊙ x_L)
Fig. 5 is a schematic diagram of a specific structure of a pooling-activation module according to an embodiment of the present invention. The specific implementation process is as follows: L texts are input (x_1, x_2, …, x_L), each a C-dimensional vector. The L C-dimensional vectors are compressed into one C-dimensional vector through a fully connected layer, an activation layer and a pooling layer; this C-dimensional vector then passes in turn through a fully connected layer, an activation layer and another fully connected layer, and the weight vector s is output through the activation function. A simple gate mechanism operated with the sigmoid activation function is used here, computed as follows:
s = σ(W_2 · δ(W_1 · z)) (formula one)
In formula one, s is the output weight vector (corresponding to vector C on the right in fig. 4), σ is the sigmoid activation function, δ is the activation function between the two fully connected layers, W_1 and W_2 are the weights of the two fully connected layers, and z is the pooled feature vector (corresponding to C on the left in fig. 4). To keep the model from becoming too complex and for the sake of generalization, two fully connected layers are placed around the non-linearity, parameterizing the gate mechanism as a bottleneck. Finally, after the Scale operation, the pooling-activation module outputs an L × C feature vector set.
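To make formula one concrete, the following is a minimal sketch of one pooling-activation module in PyTorch. The choice of PyTorch, the reduction ratio of the bottleneck, and the use of ReLU for δ are illustrative assumptions rather than details fixed by the present disclosure; the extra fully connected and activation layers shown before the pooling layer in fig. 5 are omitted for brevity.

```python
import torch
import torch.nn as nn

class PoolingActivation(nn.Module):
    """Sketch of one pooling-activation block: pool the L text vectors into one
    fused vector z, learn a per-dimension weight vector s = sigmoid(W2 * delta(W1 * z))
    (formula one), and re-scale every text vector by s."""

    def __init__(self, channels: int, reduction: int = 4, use_max_pool: bool = True):
        super().__init__()
        self.use_max_pool = use_max_pool
        # Two fully connected layers around the non-linearity form the bottleneck.
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)
        self.delta = nn.ReLU()  # assumed non-linearity between the two FC layers

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (L, C) -- L texts, each a C-dimensional feature vector.
        if self.use_max_pool:
            z = x.max(dim=0).values   # dimension-wise maximum pooling -> (C,)
        else:
            z = x.mean(dim=0)         # dimension-wise average pooling -> (C,)
        # Excitation: learn one weight per feature dimension (formula one).
        s = torch.sigmoid(self.fc2(self.delta(self.fc1(z))))
        # Scale: point-multiply the weight vector with every text feature vector.
        return x * s                  # output keeps the same (L, C) shape
```

Because the output keeps the same shape as the input, such a block can be inserted after any fully connected layer without changing the rest of the network.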
Fig. 6 is a schematic structural diagram of a ranking scoring model provided in the embodiment of the present invention. Compared with a traditional Deep Neural Network (DNN) structure, the invention adds a pooling-activation module to each layer, and the multiple layers of pooling-activation modules allow the features of different texts to interact fully, thereby improving the ranking result and the search accuracy. The input in fig. 6 is (L, C), representing L texts each described by a C-dimensional vector, and the output is Logits, i.e., the scoring results corresponding to the L texts.
In figs. 5 and 6, BN stands for batch normalization, FC stands for the fully connected layer, Pooling is the pooling layer, and ReLU is the activation layer.
The sequence-level scoring model is realized with the pooling-activation network, and the model effect is improved by using feature-interaction information. At the same time, the network has low time complexity: the algorithmic complexity of feature interaction between texts can be kept at O(d), which gives very good performance, removes the need to trade effect against efficiency, and scales well in real scenarios.
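Continuing the sketch above, the structure of fig. 6 can be illustrated by stacking fully connected layers, each followed by batch normalization, ReLU and a pooling-activation block, and ending with one logit per text. The layer widths and depth below are assumptions chosen only for illustration, and the PoolingActivation class is the sketch given earlier.

```python
class SequenceRanker(nn.Module):
    """Sketch of the sequence-level scoring model of fig. 6 (dimensions assumed)."""

    def __init__(self, in_dim: int, hidden_dims=(128, 64)):
        super().__init__()
        blocks = []
        prev = in_dim
        for h in hidden_dims:
            blocks.append(nn.Sequential(
                nn.Linear(prev, h),
                nn.BatchNorm1d(h),
                nn.ReLU(),
                PoolingActivation(h),  # lets the L texts exchange feature information
            ))
            prev = h
        self.blocks = nn.ModuleList(blocks)
        self.out = nn.Linear(prev, 1)  # one logit (ranking score) per text

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (L, C) -- all texts recalled for one query, scored as one sequence.
        for block in self.blocks:
            x = block(x)
        return self.out(x).squeeze(-1)  # (L,) logits
```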
A specific application of the present invention will now be described in detail based on the structure of the ranking scoring model of fig. 6. The method mainly comprises the following steps:
Step one: preparing training data
The format of the training set is: query doc label. Here, query denotes a query term; doc denotes the page or text to be retrieved, typically represented by its title; and label is the degree of relevance of the corresponding title under the query term, generally divided into several grades expressing how relevant the page or text is to the query term. Specifically, the invention divides label into the two grades [0, 1], where 0 means that the query word is not relevant to the document and 1 means that it is relevant. The label data in the present disclosure are obtained from click logs of real user behavior, where a label of 1 represents a click and a label of 0 represents no click. The same query corresponds to multiple docs, which together form a sequence.
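For concreteness, a hypothetical fragment of such training data and its grouping into sequences might look as follows; the queries, titles and labels shown are invented examples, not data from the present disclosure.

```python
from collections import defaultdict

# Hypothetical rows in "query  doc(title)  label" form; labels follow the click-log
# convention above (1 = clicked, 0 = not clicked).
rows = [
    ("machine learning", "An introduction to machine learning", 1),
    ("machine learning", "Best hiking trails near Beijing", 0),
    ("machine learning", "Machine learning interview questions", 1),
]

# All docs under the same query form one sequence, i.e. one training example.
sequences = defaultdict(list)
for query, doc, label in rows:
    sequences[query].append((doc, label))
```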
Step two: model training
During training, the sequence formed by a query and all the docs corresponding to that query is used as one input. After each fully connected layer, the data enters a pooling-activation module (as shown in fig. 6), in which it undergoes one pooling operation and one activation operation, as follows:
(a) The pooling operation compresses the feature vectors of the different docs into one feature vector (by maximum pooling or average pooling). This vector fuses the information of the different docs, thereby realizing information interaction among them.
(b) The activation operation is performed after the pooling operation. It first converts the feature vector obtained in the previous step through a fully connected layer and a sigmoid function into a new feature vector, each dimension of which lies between 0 and 1 and represents the importance of the corresponding feature; the new feature vector is then multiplied with the input vector of the pooling-activation module, and the result serves as the output of the module.
The output vector of the pooling-activation module keeps the same feature dimension as the input vector, and through the module the output feature vectors are no longer independent but influence one another, which facilitates model learning.
As in a conventional LTR model, each doc corresponds to one score at the output layer, and various loss functions can be applied.
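As a hedged illustration of one training step, the sketch below scores the whole sequence of docs for a single query with the SequenceRanker sketch above and applies a simple pointwise binary cross-entropy loss to the click labels; the feature dimension, random features and optimizer settings are assumptions, and any other learning-to-rank loss could be substituted.

```python
import torch
import torch.nn.functional as F

model = SequenceRanker(in_dim=32)            # feature dimension assumed for illustration
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(5, 32)                # 5 docs recalled for one query (placeholder features)
labels = torch.tensor([1., 0., 1., 0., 0.])  # click labels of those docs

logits = model(features)                     # the whole sequence is scored jointly
loss = F.binary_cross_entropy_with_logits(logits, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```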
Step three: model prediction
During prediction, all the docs under the same query are taken together as one sequence and used as the input of the model; the model output is the score corresponding to each doc.
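A short usage sketch of the prediction step, continuing the assumptions of the training sketch above (the candidate titles are placeholders):

```python
docs = ["doc A", "doc B", "doc C", "doc D", "doc E"]  # placeholder titles for the 5 candidates

model.eval()
with torch.no_grad():
    scores = model(features)                # all docs under one query scored as one sequence

order = torch.argsort(scores, descending=True)
ranked = [(docs[i], scores[i].item()) for i in order.tolist()]  # highest score first
```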
Fig. 7 is a flowchart of a text sorting method according to an embodiment of the present invention. The method comprises the following steps:
701. Acquiring at least two texts corresponding to the query word to form a text feature vector set.
The text feature vector set comprises the feature vectors of at least two texts. Referring to fig. 4, the text feature vector set contains L texts, denoted x_1, x_2, …, x_L, and each text is a D-dimensional feature vector, so the text feature vector set comprises L D-dimensional text feature vectors.
702. Pooling the text feature vector set to obtain a fused feature vector.
In the pooling operation, a statistical operation is performed over the features of the L documents in each feature dimension, so that the input feature vectors are finally reduced from L × D dimensions to D dimensions.
As a preferred embodiment, step 702 may be implemented as follows: compressing the text feature vector set by dimension-wise maximum pooling or dimension-wise average pooling to obtain the fused feature vector.
For example, refer to the following formulas two and three, in which X is the text feature vector set, containing text feature vectors arranged in L rows and D columns. After the pooling operation, the fused feature vector z is a feature vector with 1 row and D columns, where the value of z_1 is the maximum value or the average value of the 1st column of the text feature vector set X, and so on, until z_D takes the maximum value or the average value of the D-th column of X; the fused feature vector z is thereby obtained:
z_j = max_{1 ≤ i ≤ L} X_{i,j}, j = 1, …, D (formula two)
z_j = (1/L) · Σ_{i=1}^{L} X_{i,j}, j = 1, …, D (formula three)
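A small numeric illustration of formulas two and three (the matrix values are arbitrary):

```python
import numpy as np

X = np.array([[0.2, 1.0, 0.5],   # text feature vector set: L = 2 rows, D = 3 columns
              [0.8, 0.4, 0.5]])

z_max = X.max(axis=0)            # formula two: dimension-wise maximum  -> [0.8, 1.0, 0.5]
z_avg = X.mean(axis=0)           # formula three: dimension-wise average -> [0.5, 0.7, 0.5]
```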
703. The fused feature vector is excited to produce a weight vector.
The excitation operation means that, after the D-dimensional vector has been obtained for all input texts, the weight information of each feature dimension is learned through a fully connected layer and an activation function, thereby obtaining the weight vector. For the activation function, refer in particular to formula one above. Referring to fig. 4, the weight vector here corresponds to vector C on the right side of fig. 4.
704. Weighting the text feature vector set with the weight vector to generate a weighted text feature vector set.
As a preferred embodiment, step 704 may be implemented as follows: performing point multiplication of the weight vector and the text feature vector set to obtain the weighted text feature vector set.
Optionally, after the pooling operation of step 702 and the excitation operation of step 703, a weight vector s is obtained, which represents the weight of each feature in the current context. The weight vector s is then directly multiplied with the input text feature vector set to obtain the weighted text feature vector set. Specifically, referring to fig. 4, C on the right side of fig. 4 is the weight vector s, and the weighted text feature vector set (the feature vector set at the lower right corner of fig. 4) is obtained by point multiplication of the weight vector s with the text feature vector set.
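Continuing the numeric example given after formulas two and three, the point multiplication of step 704 broadcasts the weight vector over every row of the text feature vector set (the weight values here are invented):

```python
s = np.array([0.9, 0.1, 0.5])    # weight vector produced by the excitation step (invented values)

X_weighted = X * s               # weighted text feature vector set, still L x D
# X_weighted == [[0.18, 0.10, 0.25],
#                [0.72, 0.04, 0.25]]
```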
705. Determining ranking scores for the at least two texts according to the weighted text feature vector set.
Optionally, after step 705, the method further includes: outputting the ranking result of the texts according to the ranking scores.
Optionally, the method further includes: repeating the pooling, exciting and weighting steps corresponding to steps 702 to 704. Repeatedly performing the pooling, exciting and weighting steps allows the features of the texts to interact fully, which improves the ranking result and the accuracy of the search.
In the text ranking method provided by the embodiment of the present disclosure, at least two texts corresponding to a query word are first obtained to form a text feature vector set, where the set comprises the feature vectors of the at least two texts; the text feature vector set is then weighted with a weight vector to generate a weighted text feature vector set; and finally, ranking scores for the at least two texts are determined from the weighted text feature vector set. Through the pooling, exciting and weighting operations, the features of the texts are associated with one another, so that the relevance of the search results obtained for the user's search terms is improved.
A text sorting apparatus provided in the embodiment of the present disclosure will be described below based on the related description in the embodiment of the text sorting method corresponding to fig. 7. Technical terms, concepts, and the like related to the above-described embodiments in the following embodiments may be described with reference to the above-described embodiments.
Fig. 8 is a schematic structural diagram of a text sorting apparatus according to an embodiment of the present disclosure. The device 8 comprises: an obtaining module 801, a pooling module 802, an exciting module 803, a weighting module 804, and a determining module 805, wherein:
the system comprises an acquisition module 801, a pooling module 802, an activation module 803 and a weight vector generation module, wherein the acquisition module is configured to acquire at least two texts corresponding to a query word to form a text feature vector set, and the text feature vector set comprises feature vectors of the at least two texts; the weighting module 804 is used for weighting the text feature vector set by using the weight vector to generate a weighted text feature vector set; and a determining module 805 configured to determine a ranking score for the at least two texts according to the weighted text feature vector set.
As a preferred embodiment, the pooling module 802 is specifically configured to: compress the text feature vector set by dimension-wise maximum pooling or dimension-wise average pooling to obtain the fused feature vector.
As a preferred embodiment, the exciting module 803 is specifically configured to: convert the fused feature vector through a fully connected layer and an activation function to obtain the weight vector; and the weighting module 804 is specifically configured to: perform point multiplication of the weight vector and the text feature vector set to obtain the weighted text feature vector set.
As a preferred embodiment, the apparatus further comprises: an output module 806 configured to output the ranking result of the texts according to the ranking scores described above.
As a preferred embodiment, the apparatus further comprises: a loop module 807 configured to repeat the steps of pooling, exciting and weighting. Repeatedly performing the pooling, exciting and weighting steps via the loop module 807 allows the features of the texts to interact fully, thereby improving the ranking result and the accuracy of the search.
In the text ranking apparatus provided by the embodiment of the present disclosure, at least two texts corresponding to a query word are first obtained to form a text feature vector set, where the set comprises the feature vectors of the at least two texts; the text feature vector set is then weighted with a weight vector to generate a weighted text feature vector set; and finally, ranking scores for the at least two texts are determined from the weighted text feature vector set. Through the pooling, exciting and weighting operations, the features of the texts are associated with one another, so that the relevance of the search results obtained for the user's search terms is improved.
Embodiments of the present invention provide a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform any one of the methods shown in fig. 7. By way of example, computer-readable storage media can be any available media that can be accessed by a computer or a data storage device, such as a server, data center, etc., that includes one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Fig. 9 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure. The electronic device 900 includes a Central Processing Unit (CPU) 901, which can execute the various appropriate actions and processes shown in fig. 7 according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic device 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, only the division of the functional modules is illustrated, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions. For the specific working processes of the system, the apparatus and the unit described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not described here again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method of text ranking, comprising:
obtaining at least two texts corresponding to the query word to form a text feature vector set, wherein the text feature vector set comprises feature vectors of the at least two texts;
pooling the text feature vector set to obtain a fused feature vector, comprising: compressing the text feature vector set by dimension to obtain the compressed fused feature vector;
exciting the fused feature vector to generate a weight vector, comprising: converting the fused feature vector through a fully connected layer and an activation function to obtain the weight vector;
weighting the text feature vector set by using the weight vector to generate a weighted text feature vector set; and
determining ranking scores of the at least two texts according to the weighted text feature vector set.
2. The method of claim 1, wherein said compressing the text feature vector set by dimension to obtain the compressed fused feature vector comprises:
compressing the text feature vector set by dimension-wise maximum pooling or dimension-wise average pooling to obtain the compressed fused feature vector.
3. The method of claim 1, wherein the weighting the set of text feature vectors using the weight vector, producing a set of weighted text feature vectors comprises:
and performing point multiplication on the weight vector and the text feature vector set to obtain a weighted text feature vector set.
4. The method of claim 1, further comprising: the steps of pooling, activating and weighting are repeated.
5. A text sorting apparatus comprising:
an obtaining module configured to obtain at least two texts corresponding to the query word to form a text feature vector set, wherein the text feature vector set comprises feature vectors of the at least two texts;
a pooling module configured to compress the text feature vector set by dimension to obtain a compressed fused feature vector;
an exciting module configured to convert the fused feature vector through a fully connected layer and an activation function to obtain a weight vector;
a weighting module configured to weight the text feature vector set by using the weight vector to generate a weighted text feature vector set; and
a determination module configured to determine the ranking scores of the at least two texts according to the weighted text feature vector set.
6. The apparatus of claim 5, wherein the pooling module is specifically configured to: compress the text feature vector set by dimension-wise maximum pooling or dimension-wise average pooling to obtain the fused feature vector.
7. The apparatus of claim 5, wherein the weighting module is specifically configured to: perform point multiplication of the weight vector and the text feature vector set to obtain the weighted text feature vector set.
8. The apparatus of claim 5, further comprising: a loop module configured to repeatedly invoke the pooling module, the exciting module, and the weighting module.
9. An electronic device, comprising: memory, processor and computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-4 when executing the program.
10. A computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the method of any one of claims 1-4.
CN202010683552.2A 2020-07-16 2020-07-16 Text sorting method and device Active CN111563159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010683552.2A CN111563159B (en) 2020-07-16 2020-07-16 Text sorting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010683552.2A CN111563159B (en) 2020-07-16 2020-07-16 Text sorting method and device

Publications (2)

Publication Number Publication Date
CN111563159A CN111563159A (en) 2020-08-21
CN111563159B true CN111563159B (en) 2021-05-07

Family

ID=72073939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010683552.2A Active CN111563159B (en) 2020-07-16 2020-07-16 Text sorting method and device

Country Status (1)

Country Link
CN (1) CN111563159B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291699A (en) * 2017-07-04 2017-10-24 湖南星汉数智科技有限公司 A kind of sentence semantic similarity computational methods
CN109086394A (en) * 2018-07-27 2018-12-25 天津字节跳动科技有限公司 Search ordering method, device, computer equipment and storage medium
CN109426664A (en) * 2017-08-30 2019-03-05 上海诺悦智能科技有限公司 A kind of sentence similarity calculation method based on convolutional neural networks
CN110795657A (en) * 2019-09-25 2020-02-14 腾讯科技(深圳)有限公司 Article pushing and model training method and device, storage medium and computer equipment
CN111144094A (en) * 2019-12-09 2020-05-12 中国电子科技集团公司第三十研究所 Text classification method based on CNN and Bi-GRU

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866496B (en) * 2014-02-22 2019-12-10 腾讯科技(深圳)有限公司 method and device for determining morpheme importance analysis model
CN110442689A (en) * 2019-06-25 2019-11-12 平安科技(深圳)有限公司 A kind of question and answer relationship sort method, device, computer equipment and storage medium


Also Published As

Publication number Publication date
CN111563159A (en) 2020-08-21

Similar Documents

Publication Publication Date Title
CN109885842B (en) Processing text neural networks
KR101721338B1 (en) Search engine and implementation method thereof
CN103329126B (en) Utilize the search of joint image-audio query
US11782998B2 (en) Embedding based retrieval for image search
WO2021143267A1 (en) Image detection-based fine-grained classification model processing method, and related devices
CN111652378B (en) Learning to select vocabulary for category features
CN102144231A (en) Adaptive visual similarity for text-based image search results re-ranking
CN111666416B (en) Method and device for generating semantic matching model
CN109960749B (en) Model obtaining method, keyword generation method, device, medium and computing equipment
CN110737756B (en) Method, apparatus, device and medium for determining answer to user input data
WO2021012691A1 (en) Method and device for image retrieval
WO2022003991A1 (en) Two-dimensional map generation device, two-dimensional map generation method, and program for generating two-dimensional map
CN113535912A (en) Text association method based on graph convolution network and attention mechanism and related equipment
CN111563159B (en) Text sorting method and device
CN114398883B (en) Presentation generation method and device, computer readable storage medium and server
CN113901278A (en) Data search method and device based on global multi-detection and adaptive termination
CN116522911B (en) Entity alignment method and device
CN117556067B (en) Data retrieval method, device, computer equipment and storage medium
US20240037939A1 (en) Contrastive captioning for image groups
CN117009534B (en) Text classification method, apparatus, computer device and storage medium
CN113283235B (en) User label prediction method and system
Dong et al. High-performance image retrieval based on bitrate allocation
CN113761307A (en) Feature selection method and device
CN116977105A (en) Method, apparatus, device, storage medium and program product for determining a pushmark
CN117312508A (en) Image-based question answering method, apparatus, device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant