CN112749238A

CN112749238A - Search ranking method and device, electronic equipment and computer-readable storage medium

Info

Publication number: CN112749238A
Application number: CN202011620480.3A
Authority: CN
Inventors: 范成
Original assignee: Beijing Jindi Credit Service Co ltd
Current assignee: Beijing Jindi Credit Service Co ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-05-04

Abstract

The disclosure relates to a search ranking method, a search ranking device, an electronic device and a storage medium. The method comprises the following steps: in response to an input data search request, recalling a plurality of search result data corresponding to the data search request, wherein each search result data comprises multidimensional data; determining data feature vector values corresponding to the multidimensional data of each search result data according to the data feature categories of the multidimensional data; calculating the search probability of the plurality of search result data based on the data feature vectors of the multidimensional data of each search result; and sorting and outputting the plurality of search result data according to their search probabilities. The method and the device can improve the ranking of search results and provide ranking results that better match the user's search requirements.

Description

Search ranking method and device, electronic equipment and computer-readable storage medium

Technical Field

The present disclosure relates to the field of data processing, and in particular, to a search ranking method, apparatus, electronic device, and computer-readable storage medium.

Background

In information search applications, the ranking of search results is the direct window through which a product exposes result information to the user, and it is also the part of the search experience that the user perceives most directly; ranking quality therefore directly affects user satisfaction with an information search product. Traditional search ranking algorithms generally score and sort results by text matching degree alone. Their structure is simple, but the resulting order tends to be rigid and often fails to return a ranking that truly meets the search requirement.

Accordingly, there is a need for one or more methods to address the above-mentioned problems.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

An object of the present disclosure is to provide a search ranking method, apparatus, electronic device, and computer-readable storage medium, thereby overcoming, at least to some extent, one or more problems due to limitations and disadvantages of the related art.

According to an aspect of the present disclosure, there is provided a search ranking method, including:

in response to an input data search request, recalling a plurality of search result data corresponding to the data search request, wherein each search result data comprises multidimensional data;

determining data characteristic vector values corresponding to the multi-dimensional data of each search result data according to the data characteristic categories of the multi-dimensional data;

calculating the search probability of the plurality of search result data based on the data feature vector of the multi-dimensional data of each search result;

and sequencing and outputting the plurality of search result data according to the search probability of the plurality of search result data.

In an exemplary embodiment of the present disclosure, the search ranking model includes multiple layers of perceptrons, each layer of perceptrons includes multiple processing nodes, and calculating the search probability of the plurality of search result data based on the data feature vector of the multidimensional data of each search result includes:

inputting the search result set into a pre-trained search ranking model;

processing each feature vector value in the input search result set, outputting a search probability value, and determining the search probability of the plurality of search result data according to the search probability value; wherein each layer of perceptrons performs processing according to the following formula:

f(x) = f(w*x + b)

wherein the data dimensionality of x and b is the input vector dimensionality of the perceptron layer in which they are located, the vector dimensionality of the first perceptron layer is the input vector dimensionality, the vector dimensionality of the last perceptron layer is a preset value, and f is a sigmoid function.

In an exemplary embodiment of the present disclosure, determining, according to the data feature category of the multidimensional data, a data feature vector value corresponding to the multidimensional data of each search result data includes any one or more of the following:

the data feature category comprises numerical features, and feature vector values of dimensional data with the data feature category as the numerical features are determined according to a preset feature value mapping relation;

the data feature category comprises structured text features, search words in the data search request are obtained, the first data matching degree of the search words and multi-dimensional data with the structured text features is calculated, and the feature vector value of the dimensional data with the structured text features is determined according to the first data matching degree;

the data feature category comprises unstructured text features, and the unstructured text features are subjected to structuring processing; and acquiring a search word in the data search request, calculating a second data matching degree of the search word and the multi-dimensional data subjected to the structured processing, and determining a feature vector value of the dimensional data with the data feature type being the unstructured text feature according to the second data matching degree.

In an exemplary embodiment of the present disclosure, the unstructured text features are subjected to a structuring process, which includes:

extracting entity data in the unstructured text;

filtering the entity data according to an emotion judgment algorithm;

and carrying out rule matching on the filtered entity data according to a preset expression rule, and generating structured data.

In an exemplary embodiment of the present disclosure, calculating the search probability of the plurality of search result data based on the data feature vector of the multidimensional data of each search result includes:

establishing an association relation between the data search request and the feature vector values of each search result data and multidimensional data, and generating a search result set having the association relation;

inputting the search result set into a pre-trained search ranking model, and calculating the ranking score value fitted from the feature vector values in the search result set;

and determining the search probability of each search result data according to the ranking score value.

In an exemplary embodiment of the present disclosure, the method further comprises:

acquiring a historical search record set of the data search request, wherein the historical search record set comprises a plurality of historical search result data and the click frequency of each historical search result data;

the calculating the search probability of the plurality of search result data based on the data feature vector of the multi-dimensional data of each search result comprises:

and determining the search probability of each search result data by combining the click frequency of each historical search result data.

In an exemplary embodiment of the present disclosure, determining a search probability of each search result data in combination with the click frequency of each historical search result includes:

carrying out negative sampling on the search result data of which the search probability determined according to the click frequency is smaller than a preset probability index to obtain negative sampling data;

calculating the probability deviation of the negative sampling data according to the original arrangement sequence of the negative sampling data;

and adjusting the search probability of the corresponding search result data according to the probability deviation.

In an exemplary embodiment of the present disclosure, the method further comprises:

acquiring a historical search record of a current user, and determining the behavior habit of the current user according to the search behavior in the historical search record;

and determining the search probability of each search result data by combining the behavior habits of the current user.

In an exemplary embodiment of the present disclosure, the method further comprises:

training the search ranking model according to the search result set having the association relation between the data search request and the feature vector values of each search result data and multidimensional data, and according to the search probability of each search result data.

In an exemplary embodiment of the present disclosure, the method further comprises:

acquiring a historical search record of a current user, and analyzing whether a search behavior in the historical search record matches an outlier sample behavior;

and if not, training the search ranking model according to the search result set of the current user having the association relation.

In an exemplary embodiment of the present disclosure, the determining a data feature vector value corresponding to multidimensional data of each search result data includes:

connecting the data feature vector values corresponding to each dimension's data end to end to construct a complete feature vector containing the search terms and the search result data.

In one aspect of the present disclosure, there is provided a search ranking apparatus including:

the search result data recalling module is used for recalling a plurality of search result data corresponding to the data search request in response to the input data search request, wherein each search result data comprises multidimensional data;

the data characteristic vector value generation module is used for determining a data characteristic vector value corresponding to the multi-dimensional data of each search result data according to the data characteristic category of the multi-dimensional data;

a search probability calculation module for calculating the search probability of the plurality of search result data based on the data feature vector of the multi-dimensional data of each search result;

and the search result data output module is used for sequencing and outputting the plurality of search result data according to the search probabilities of the plurality of search result data.

In one aspect of the present disclosure, there is provided an electronic device including:

a processor; and

a memory having computer readable instructions stored thereon which, when executed by the processor, implement a method according to any of the above.

In an aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, realizes the method according to any one of the above.

A search ranking method in an exemplary embodiment of the present disclosure, the method comprising: in response to an input data search request, recalling a plurality of search result data corresponding to the data search request, wherein each search result data comprises multidimensional data; determining data characteristic vector values corresponding to the multi-dimensional data of each search result data according to the data characteristic categories of the multi-dimensional data; calculating the search probability of the plurality of search result data based on the data feature vector of the multi-dimensional data of each search result; and sequencing and outputting the plurality of search result data according to the search probability of the plurality of search result data. The method and the device can improve the ranking effect of the search results and provide the search ranking results which are more matched with the search requirements of the user.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The above and other features and advantages of the present disclosure will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.

FIG. 1 shows a flow diagram of a search ranking method according to an example embodiment of the present disclosure;

FIG. 2 illustrates a search ranking model diagram in accordance with an exemplary embodiment of the present disclosure;

FIG. 3 shows a schematic block diagram of a search ranking apparatus according to an exemplary embodiment of the present disclosure;

FIG. 4 schematically illustrates a block diagram of an electronic device according to an exemplary embodiment of the present disclosure; and

FIG. 5 schematically illustrates a schematic diagram of a computer-readable storage medium according to an exemplary embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the embodiments of the disclosure can be practiced without one or more of the specific details, or with other methods, components, materials, devices, steps, and so forth. In other instances, well-known structures, methods, devices, implementations, materials, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more software-hardened modules, or in different networks and/or processor devices and/or microcontroller devices.

In the present exemplary embodiment, a search ranking method is first provided; referring to FIG. 1, the search ranking method may include the following steps:

Step S110: in response to an input data search request, recalling a plurality of search result data corresponding to the data search request, wherein each search result data comprises multidimensional data;

Step S120: determining data feature vector values corresponding to the multidimensional data of each search result data according to the data feature categories of the multidimensional data;

Step S130: calculating the search probability of the plurality of search result data based on the data feature vector of the multidimensional data of each search result;

Step S140: sorting and outputting the plurality of search result data according to the search probabilities of the plurality of search result data.
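
The four steps above can be sketched as a small pipeline. This is only an illustrative outline under stated assumptions: the names `rank_results`, `recall`, `featurize` and `score` are hypothetical and do not appear in the disclosure, and the three callables stand in for the recall, feature and model components described later.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: one search result is a dict of dimension name -> raw value.
Result = Dict[str, object]


def rank_results(
    query: str,
    recall: Callable[[str], List[Result]],
    featurize: Callable[[str, Result], Sequence[float]],
    score: Callable[[Sequence[float]], float],
) -> List[Tuple[Result, float]]:
    """Run steps S110 to S140 with caller-supplied components."""
    results = recall(query)                           # S110: recall candidates
    vectors = [featurize(query, r) for r in results]  # S120: feature vectors
    probs = [score(v) for v in vectors]               # S130: search probabilities
    # S140: sort by search probability, highest first
    return sorted(zip(results, probs), key=lambda pair: pair[1], reverse=True)
```

Passing the components in as functions keeps the sketch independent of any particular recall engine or ranking model.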

Next, the search ranking method in the present exemplary embodiment will be further explained.

In step S110, in response to an input data search request, a plurality of search result data corresponding to the data search request may be recalled, where each search result data includes multidimensional data.

When a user wants to look up information over the internet, the user typically searches in a search engine, or in a web portal providing vertical information search, using a summarized question or search terms. In this exemplary embodiment, the data search request may be a request, triggered by user input, that contains the search question or search keywords. After a data search request is received, the search keywords carried in it may be parsed and the corresponding search result data recalled according to those keywords. In general, a plurality of search result data are recalled, for example several exactly matched results together with several fuzzy-matched results of relatively low accuracy. In most application scenarios, the search result data obtained for a user search request have multiple dimensions; for example, a company information search service often involves a large amount of multidimensional data, such as dimensions for a company's industrial and commercial registration information, investment information, judicial information, trademark information, and the like.

In step S120, a data feature vector value corresponding to the multidimensional data of each search result data may be determined according to the data feature category of the multidimensional data.

Data of different dimensions can take various forms, such as numerical form and text form. In general, data in numerical form is structured information, such as a company's registered capital (10 million yuan), number of lawsuits (20), registration date (2020-1-1), and the like. Data in text form may be either structured or unstructured text information; for example, a company's profile is usually a long passage of text summarizing the company's business information, and users' discussion topics about a company or public opinion information about a company are likewise unstructured text information.

In this exemplary embodiment, determining, according to the data feature category of the multidimensional data, a data feature vector corresponding to the multidimensional data of each search result data includes any one or more of the following:

1. when the data characteristic category is a numerical characteristic, determining a characteristic vector value of dimensional data of which the data characteristic category is the numerical characteristic according to a preset characteristic value mapping relation;

2. when the data feature category is the structured text feature, acquiring a search word in the data search request, calculating a first data matching degree of the search word and multi-dimensional data with the structured text feature, and determining a feature vector value of the dimensional data with the data feature category as the structured text feature according to the first data matching degree;

3. when the data feature type is an unstructured text feature, carrying out structuring processing on the unstructured text feature; and acquiring a search word in the data search request, calculating a second data matching degree of the search word and the multi-dimensional data subjected to the structured processing, and determining a feature vector value of the dimensional data with the data feature type being the unstructured text feature according to the second data matching degree.

In the embodiment of the present example, the data feature vector value corresponding to the multidimensional data of each search result data may be determined through normalization processing according to the data feature category of the multidimensional data, and the data processing may be performed according to different data feature vector values, where the normalization processing may include any manner such as extremum normalization, exponential normalization, and logarithmic normalization.

When the data feature type is a numerical feature, determining a feature vector value of dimensional data of which the data feature type is the numerical feature according to a preset feature value mapping relation; specifically, the method comprises the following steps:

For numerical features, a feature vector value may be preset for each numerical range to generate the preset feature value mapping relation, and the preset feature vector values may be set within the numerical range of 0.1 to 0.9.

For example, for numerical features in timestamp form, such as the company establishment date, the time difference from the current time can be calculated from the timestamp and then processed by logarithmic normalization to obtain the company's age; feature vector values are then set according to that age, so that a company established longer ago receives a higher feature vector value and a more recently established company a lower one.

For numerical features with a bounded value range, such as the number of historical company names or the company base score, linear normalization may be applied first. For example, the number of historical company names may be processed as: number of historical company names / (maximum number of historical company names - minimum number of historical company names), after which the corresponding feature vector value is determined according to the preset numerical feature range.

For numerical features with no upper limit, such as registered capital or number of shareholders, considering that the registered capital of most companies is between 0 and 2 million, linear normalization can be applied to registered capital within 0 to 2 million, while registered capital above 2 million can be processed by exponential normalization using 1/(1 + e^(-k*x)), where k is an empirical normalization coefficient; the corresponding feature vector value is then determined according to the preset numerical feature range in the preset feature value mapping relation.
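
The three normalization cases can be illustrated with a short sketch. Several details are assumptions rather than part of the disclosure: the function names, the 100-year cap `max_days`, the value of `k`, the direct rescaling onto 0.1 to 0.9 in place of a preset range lookup table, and in particular the way the linear and sigmoid pieces are joined for registered capital (the text does not say how they connect, so this sketch maps the linear part onto the lower half of the range and applies the sigmoid to the excess over the cutoff so that the result stays continuous).

```python
import math
from datetime import date


def to_feature_range(x: float, lo: float = 0.1, hi: float = 0.9) -> float:
    """Clamp x to [0, 1] and rescale it onto the 0.1-0.9 range used in the text."""
    x = min(max(x, 0.0), 1.0)
    return lo + (hi - lo) * x


def age_feature(established: date, today: date, max_days: float = 36500.0) -> float:
    """Log-normalize company age in days; older companies score higher.
    max_days is an assumed cap of roughly 100 years."""
    days = max((today - established).days, 0)
    return to_feature_range(math.log1p(days) / math.log1p(max_days))


def linear_feature(value: float, min_value: float, max_value: float) -> float:
    """Linear normalization as written in the text: value / (max - min)."""
    span = max_value - min_value
    return to_feature_range(value / span if span > 0 else 0.0)


def capital_feature(capital: float, cutoff: float = 2_000_000.0, k: float = 1e-6) -> float:
    """Registered capital: linear up to the cutoff, sigmoid above it.
    Assumption: the linear piece covers [0, 0.5] and the sigmoid of the
    excess covers (0.5, 1), so the two pieces meet at the cutoff."""
    if capital <= cutoff:
        return to_feature_range(0.5 * capital / cutoff)
    excess = capital - cutoff
    return to_feature_range(1.0 / (1.0 + math.exp(-k * excess)))
```

In a deployment following the text more literally, the final step would look up the normalized value in the preset feature value mapping relation instead of rescaling it directly.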

When the data feature category is the structured text feature, obtaining a search word in the data search request, calculating a first data matching degree of the search word and multi-dimensional data with the structured text feature, and determining a feature vector value of the dimensional data with the data feature category as the structured text feature according to the first data matching degree. Specifically, the method comprises the following steps:

The structured text features may include part of the text information in the structured text of the current business scenario, such as company name, legal person name, company registration number, registered address, legal person information, senior management information, brand organization, and the like. For the structured text features, the text matching degree with the search word needs to be calculated; specific matching degree measures may include the sliding-window edit rate, the query hit rate, the overall edit rate, and the like. Feature vector values corresponding to matching degree intervals can also be preset for the structured text features to generate a preset feature value mapping relation, and the feature vector is formed according to the preset order of the matching features. To keep a unified calculation standard, the preset feature vector values can use the same range as the numerical features, namely 0.1 to 0.9; for example, the higher the matching degree, the larger the corresponding feature vector value. The feature vector value of the structured text feature is then determined from the text matching degree between the structured text feature and the search word according to the corresponding preset feature value mapping relation. Finally, the structured text features are stored in a persistent feature storage module.
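
The disclosure names the three matching measures without defining them, so the following sketch uses assumed definitions: the overall edit rate is taken as one minus the Levenshtein distance divided by the longer length, the query hit rate as the fraction of query characters found in the field, and the sliding-window edit rate as the best edit rate over all field windows of the query's length. All function names are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def overall_edit_rate(query: str, field: str) -> float:
    longest = max(len(query), len(field))
    return 1.0 if longest == 0 else 1.0 - edit_distance(query, field) / longest


def query_hit_rate(query: str, field: str) -> float:
    if not query:
        return 0.0
    return sum(ch in field for ch in query) / len(query)


def sliding_window_edit_rate(query: str, field: str) -> float:
    n = len(query)
    if n == 0 or not field:
        return 0.0
    if len(field) <= n:
        return overall_edit_rate(query, field)
    return max(overall_edit_rate(query, field[i:i + n])
               for i in range(len(field) - n + 1))


def match_features(query: str, field: str) -> List[float]:
    """Matching degrees in a fixed order, rescaled onto 0.1-0.9."""
    rates = (sliding_window_edit_rate(query, field),
             query_hit_rate(query, field),
             overall_edit_rate(query, field))
    return [0.1 + 0.8 * r for r in rates]
```

Keeping the three measures in a fixed order mirrors the text's requirement that the feature vector follow a preset arrangement of matching features.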

When the data feature type is an unstructured text feature, the unstructured text feature can be firstly subjected to structuring processing; and acquiring a search word in the data search request, calculating a second data matching degree of the search word and the multi-dimensional data subjected to the structured processing, and determining a feature vector value of the dimensional data with the data feature type being the unstructured text feature according to the second data matching degree. Specifically, the method comprises the following steps:

The unstructured text features may include the company business scope, company profile information, company registered address, and the like. Structuring the unstructured text features includes: extracting entity data from the unstructured text according to a preset entity data extraction rule; filtering the entity data according to an emotion judgment algorithm; and performing rule matching on the filtered entity data according to a preset expression rule to generate structured data. For example, the business scope of a certain technology company reads: "technology development, technology consultation, technology service, technology promotion and technology transfer; computer system services; basic software services; application software services; software development; software consultation; engaging in internet cultural activities; operating telecommunications services; internet information services. (The enterprise independently selects its business items and carries out business activities in accordance with the law; it shall not engage in business activities prohibited or restricted by industrial policy.)" The products and business operated by the company can then be extracted by NER (an entity extraction method) with customized entity data extraction rules, for example according to the business license information and a preset business license entity data extraction rule, yielding entity data such as "technology development" and "internet information service". Meanwhile, negative-oriented cue words such as "does not contain" and "excluding" in the business scope are identified according to the emotion judgment algorithm and stop words, and entities extracted within the scope of influence of those cue words are filtered out, giving the structured text after structuring processing.

Finally, the structured text features are stored in a persistent feature storage module.
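
As a rough illustration of the structuring step, the sketch below replaces NER and the emotion judgment algorithm with much simpler stand-ins: business-scope items are split on punctuation, and anything following a negation cue within the same clause is dropped. The cue list, the function name and the English example text are assumptions made for illustration only.

```python
import re
from typing import List

# Hypothetical negation cues; a real system would use its own lexicon.
NEGATION_CUES = ("excluding", "does not include", "except")


def extract_entities(scope_text: str) -> List[str]:
    """Split a business-scope string into items, dropping items that fall
    within the reach of a negation cue in the same clause."""
    entities: List[str] = []
    for clause in re.split(r"[;.]", scope_text):
        lowered = clause.lower()
        positions = [lowered.find(cue) for cue in NEGATION_CUES if cue in lowered]
        # Keep only the part of the clause before the first negation cue.
        kept = clause[: min(positions)] if positions else clause
        for item in kept.split(","):
            item = item.strip(" ()")
            if item:
                entities.append(item)
    return entities
```

For instance, the clause "software development (excluding game software)" would contribute only "software development" under these rules.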

Further, determining a data feature vector value corresponding to the multidimensional data of each search result data may further include: connecting the data feature vector values corresponding to each dimension's data end to end to construct a complete feature vector containing the search terms and the search result data.
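
End-to-end connection amounts to concatenating the per-dimension values in a fixed order. A minimal sketch, assuming a hypothetical `build_feature_vector` helper and an explicit `order` list so that every search result yields positions with the same meaning:

```python
from typing import Dict, List, Sequence


def build_feature_vector(
    per_dimension: Dict[str, Sequence[float]],
    order: Sequence[str],
) -> List[float]:
    """Concatenate per-dimension feature values in the given dimension order."""
    vector: List[float] = []
    for name in order:
        vector.extend(per_dimension[name])
    return vector
```

A fixed order matters because the ranking model sees only positions, not dimension names.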

In step S130, the search probability of the plurality of search result data may be calculated based on the data feature vector of the multidimensional data of each search result.

In this example embodiment, after the data feature vector values corresponding to the multidimensional data of each search result data are determined in step S120, the search ranking model may be trained according to the search result set having the association relation between the data search request and the feature vector values of each search result data and multidimensional data, together with the search probability of each search result data. That is, model training samples are constructed from the data feature vector values corresponding to the multidimensional data of each search result data and input into the search ranking model for training. The search ranking model can be built with TensorFlow, for example using a point-wise model architecture and an ANN-based regression model, with an overall structure comprising an input layer, two fully connected layers and an output layer.

Furthermore, a historical search record set of the data search request can be obtained, wherein the historical search record set comprises a plurality of historical search result data and the click frequency of each historical search result data. The click frequency of each historical search result data can also be used as a training sample for the search ranking model, providing a basis for subsequently calculating the search probability of the plurality of search result data.
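
One possible way to derive click frequencies from a search log is sketched below. The log format (query, result identifier, clicked flag) and the definition of click frequency as clicks divided by impressions are assumptions; the disclosure does not fix either.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def click_frequency(
    log: Iterable[Tuple[str, str, bool]],
) -> Dict[Tuple[str, str], float]:
    """Return clicks / impressions for each (query, result_id) pair."""
    shown: Dict[Tuple[str, str], int] = defaultdict(int)
    clicked: Dict[Tuple[str, str], int] = defaultdict(int)
    for query, result_id, was_clicked in log:
        shown[(query, result_id)] += 1
        clicked[(query, result_id)] += int(was_clicked)
    return {key: clicked[key] / shown[key] for key in shown}
```

The resulting values could then serve as point-wise training targets alongside the feature vectors.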

In an embodiment of the present example, the method further comprises: acquiring a historical search record of a current user, and analyzing whether a search behavior in the historical search record matches an outlier sample behavior; and if not, training the search ranking model according to the search result set of the current user with the incidence relation. Specifically, the method comprises the following steps:

In practical applications, the accuracy of the training samples for the search ranking model may be affected by certain specific user groups, whose samples may be outlier samples, for example telemarketing users. After the historical search record of the current user is obtained, whether the search behavior in that record matches outlier sample behavior is analyzed; for example, whether, after sending a data search request, the user clicks on every recalled search result data or on most of them, that is, whether the user's click behavior as a whole tends toward a Gaussian distribution. If so, the current user matches the outlier sample behavior, and the current outlier sample can be filtered out to reduce noise in the sample set. If the search behavior in the historical search record does not match outlier sample behavior, the search result set of the current user is a non-outlier sample and is used as a training sample to train the search ranking model.
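
The "clicks on every or most recalled results" criterion can be approximated with a simple ratio test. This sketch does not implement the distribution-based check mentioned in the text; the threshold of 0.8, the session format and the function names are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical session record: (results recalled, results clicked).
Session = Tuple[int, int]


def is_outlier_user(sessions: List[Session], threshold: float = 0.8) -> bool:
    """Flag a user whose average share of clicked results is at or above threshold."""
    ratios = [clicked / recalled for recalled, clicked in sessions if recalled > 0]
    if not ratios:
        return False
    return sum(ratios) / len(ratios) >= threshold


def filter_training_users(
    history: Dict[str, List[Session]],
    threshold: float = 0.8,
) -> List[str]:
    """Keep only users whose search behavior does not match the outlier pattern."""
    return [user for user, sessions in history.items()
            if not is_outlier_user(sessions, threshold)]
```

Only the search result sets of the retained users would then be used as training samples.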

In this exemplary embodiment, calculating the search probability of the plurality of search result data based on the data feature vector of the multidimensional data of each search result includes: establishing an association relation between the data search request and the feature vector values of each search result data and multidimensional data, and generating a search result set having the association relation; inputting the search result set into a pre-trained search ranking model, and calculating the ranking score value fitted from the feature vector values in the search result set; and determining the search probability of each search result data according to the ranking score value.

After the data feature vector values corresponding to the multidimensional data of each search result data are determined in step S120, an association relation is established between the data search request and the feature vector values of each search result data and multidimensional data, and a search result set having the association relation is generated. The search result set is input into the pre-trained search ranking model, and the loss value of each search result data is calculated with a preset loss function from the feature vector values of each dimension's data; the final loss value of each search result data can be used to fit a ranking score value from the feature vector values in the search result set. Finally, the search probability P of each search result data is determined according to the ranking score value.

As shown in fig. 2, the search ranking model includes multiple layers of perceptrons, each layer of perceptrons includes multiple processing nodes, and calculating the search probabilities of the plurality of search result data based on the data feature vectors of the multidimensional data of each search result includes:

inputting the search result set into a pre-trained search ranking model;

f(x) = f(w*x + b)

where the data dimensionality of x and b is the input vector dimensionality of the perceptron layer in which the processing node is located; the vector dimensionality of the first perceptron layer is the input vector dimensionality; the vector dimensionality of the last perceptron layer is a preset value, for example 1; and f is a sigmoid function.
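By way of a minimal sketch only, the per-node computation f(w*x + b) with a sigmoid f can be chained through several perceptron layers as shown below. The layer sizes (7, 8, 4, 1), the random untrained weights, the helper names, and the convention of one bias entry per node are all assumptions made for this illustration; a trained model would supply its own parameters.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(w: Matrix, b: Vector, x: Vector) -> Vector:
    """One perceptron layer: every node computes sigmoid(w_row . x + bias)."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bias)
            for row, bias in zip(w, b)]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Random placeholder weights (assumption: stands in for trained parameters)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def forward(layers: List[Tuple[Matrix, Vector]], x: Vector) -> float:
    """Pass x through every layer; the last layer has dimensionality 1."""
    for w, b in layers:
        x = layer_forward(w, b, x)
    return x[0]


rng = random.Random(0)
layers = [init_layer(7, 8, rng), init_layer(8, 4, rng), init_layer(4, 1, rng)]
# Illustrative 7-dimensional feature vector for one search result.
score = forward(layers, [0.2, 0.1, 0.1, 0.3, 0.5, 0.3, 0.6])
```

Because the last layer applies a sigmoid, `score` always lies strictly between 0 and 1 and can be read as a probability-like ranking score.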

For example, a search request containing the search term data "jinbang technology" recalls a plurality of search result data: search result data corresponding to Beijing Jinke Technology Limited, to Beijing Tianyan Search Technology Limited, to Yancheng Jinke Technology Limited, to Beijing Jinke Credit Investigation Service Limited, and so on. The dimension information corresponding to each search result data includes registered capital, registered address, business scope, enterprise score, number of trademarks, number of lawsuits, number of company name changes, and the like. The search result set, in which each search result data is associated with the feature vector values of its dimension data, includes:

Beijing Jinke Technology Limited [0.2, 0.1, 0.1, 0.3, 0.5, 0.3, 0.6];
Beijing Tianyan Search Technology Limited [0.2, 0.3, 0.1, 0.4, 0.5, 0.3, 0.6];
Yancheng Jinke Technology Limited [0.1, 0.7, 0.1, 0.3, 0.5, 0.3, 0.6];
Beijing Jinke Credit Investigation Service Limited [0.2, 0.2, 0.3, 0.3, 0.5, 0.3, 0.6]; and the like.

The search result set of the plurality of search result data is then input through the input layer of the search ranking model; two fully connected layers calculate the loss value of each search result data from the feature vector values of each dimension of data using a preset loss function; the search probability P of each search result data is determined according to its loss value; and the output layer finally outputs the search probability P, which may be represented by a numerical value in the range of 0.1 to 0.9.

Furthermore, in the stage of training the search ranking model, the click frequency of each historical search result data can be used as a training sample of the search ranking model. Accordingly, when the search probabilities of the plurality of search result data are subsequently calculated, a historical search record set of the data search request may further be acquired, the set including a plurality of historical search result data and the click frequency of each historical search result data, and the search probability of each search result data is determined in combination with the click frequency of each historical search result data.

Furthermore, the historical search records of the current user can be obtained, the behavior habits of the current user determined from the search behaviors in those records, and the search probability of each search result data determined in combination with those behavior habits. Different users may have different search habits depending on their industries, and analyzing the search behaviors in a user's historical search records abstracts the user's search preferences. For example, users in the building industry tend to click on companies in the building industry, and some users click on companies related to pipeline processing because of their work. Once the user's habits are abstracted in this way, the ranking is adjusted so that companies close to those habits are moved to higher positions, which makes the results more convenient for the user.
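One possible way to combine behavior habits with the model output is sketched below, purely as an illustration. The text does not specify how the adjustment is computed; the per-industry click share, the linear blend, the `weight` value, and all names here are assumptions.

```python
from collections import Counter
from typing import Dict, List


def industry_preferences(clicked_industries: List[str]) -> Dict[str, float]:
    """Share of the user's historical clicks that fall in each industry."""
    counts = Counter(clicked_industries)
    total = sum(counts.values())
    return {ind: n / total for ind, n in counts.items()} if total else {}


def adjust_for_habits(results: List[Dict], prefs: Dict[str, float],
                      weight: float = 0.3) -> List[Dict]:
    """Blend the model's probability with the user's industry preference.

    adjusted = (1 - weight) * probability + weight * preference
    The linear blend and `weight` are assumptions for illustration.
    """
    adjusted = []
    for r in results:
        pref = prefs.get(r["industry"], 0.0)
        adjusted.append(
            {**r, "probability": (1 - weight) * r["probability"] + weight * pref}
        )
    return sorted(adjusted, key=lambda r: r["probability"], reverse=True)


# Hypothetical user who mostly clicks construction companies.
prefs = industry_preferences(["construction", "construction", "construction", "pipeline"])
ranked = adjust_for_habits(
    [
        {"name": "A", "industry": "finance", "probability": 0.6},
        {"name": "B", "industry": "construction", "probability": 0.5},
    ],
    prefs,
)
```

In this toy input, result B overtakes result A after the adjustment because it matches the industry the user clicks most often.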

In this exemplary embodiment, a historical search record set of the data search request is obtained, the set including a plurality of historical search result data and the click frequency of each historical search result data, and the search probability of each search result data is determined in combination with the click frequency of each historical search result data. Determining the search probability of each search result data in combination with the click frequency of each historical search result includes: performing negative sampling on the search result data whose search probability, as determined from the click frequency, is smaller than a preset probability index, to obtain negative sampling data; calculating the probability deviation of the negative sampling data according to the original arrangement order of the negative sampling data; and adjusting the search probability of the corresponding search result data according to the probability deviation.

In this exemplary embodiment, a high-quality historical search result set is mined from the user's historical search result data logs. The logs are aggregated according to the historical search results, and the click frequency of each historical search result data under the same data search request is counted. The click probability P of each historical search result data under the data search request is then calculated from the click frequency; for example, P may be the click frequency of the search result data divided by the total click frequency of all historical search result data under the same data search request.

Depending on the search ranking model selected, the click frequency of the search result data may be very unevenly distributed. In theory, companies ranked near the top have a large click probability, so a sample that is ranked near the top but has a small click probability is more important than a sample with a small click probability ranked further back. Considering this influence of the ranking position, companies whose click probability is smaller than a preset probability index (such as 0.1) require different negative sampling strategies at different ranking positions; similarly, samples with a large click probability require different sampling strategies according to their ranking positions. For example, when a point-wise framework based on an ANN regression model is selected, the click probability of most companies is smaller than 0.1, while the number of companies with a click probability between 0.1 and 0.5 is about the same as the number with a click probability greater than 0.5. Therefore, after the click probability is calculated, negative sampling is performed on the companies whose click probability is smaller than 0.1 to obtain negative sampling data; the probability deviation of the negative sampling data is calculated according to the original arrangement order of the negative sampling data; and the search probability of the corresponding search result data is adjusted according to the probability deviation. After the above steps are completed, a search result set with a label for each search result data is obtained, the label being the search probability of that search result data.
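The click probability calculation and a position-dependent negative sampling strategy might, for instance, be sketched as follows. Only the ratio P = clicks on the result / clicks on all results and the 0.1 threshold come from the text; the linear `keep_rate` schedule, its parameters, and the function names are assumptions. The probability deviation adjustment is not sketched because its form is not specified.

```python
import random
from typing import Dict, List, Tuple


def click_probabilities(clicks: Dict[str, int]) -> Dict[str, float]:
    """P(result) = clicks on the result / clicks on all results of the request."""
    total = sum(clicks.values())
    if total == 0:
        return {name: 0.0 for name in clicks}
    return {name: n / total for name, n in clicks.items()}


def keep_rate(position: int, top_rate: float = 1.0, floor: float = 0.2,
              decay: float = 0.1) -> float:
    """Sampling rate for a low-click result at a 0-based ranking position.

    Assumed schedule: top-ranked low-click results are kept most often,
    because they are the more informative negative samples.
    """
    return max(floor, top_rate - decay * position)


def negative_sample(ranked_names: List[str], probs: Dict[str, float],
                    threshold: float = 0.1, seed: int = 0) -> List[Tuple[str, float]]:
    """Keep all results at or above the threshold; sample the rest by position."""
    rng = random.Random(seed)
    samples = []
    for position, name in enumerate(ranked_names):
        p = probs[name]
        if p >= threshold or rng.random() < keep_rate(position):
            samples.append((name, p))
    return samples


# Hypothetical click counts for one data search request.
probs = click_probabilities({"A": 60, "B": 30, "C": 5, "D": 5})
samples = negative_sample(["C", "A", "B", "D"], probs)
```

Each retained `(name, probability)` pair corresponds to one labeled entry of the search result set described above.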

In step S140, the plurality of search result data may be sorted and output according to the search probability of the plurality of search result data.

In this exemplary embodiment, the search probability of each search result data represents the likelihood that the corresponding search result data will be clicked by the user. The multiple pieces of search result data may therefore be arranged according to their search probabilities; for example, if the search probability output by the search ranking model is represented by values from 0.1 to 0.9, the multiple pieces of search result data may be sorted and output according to that representation.
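Step S140 can be illustrated with a short sketch that orders results from highest to lowest search probability. The function name and the sample values are hypothetical.

```python
from typing import List, Tuple


def rank_results(scored: List[Tuple[str, float]]) -> List[str]:
    """Return result names ordered from highest to lowest search probability."""
    return [name for name, _ in sorted(scored, key=lambda item: item[1], reverse=True)]


ordered = rank_results([("result_a", 0.4), ("result_b", 0.9), ("result_c", 0.1)])
```

Python's `sorted` is stable, so results with equal probabilities keep their original recall order.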

It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.

Further, in the present exemplary embodiment, a search ranking apparatus is also provided. Referring to fig. 3, the search ranking apparatus 200 may include: a search result data recall module 210, a data feature vector value generation module 220, a search probability calculation module 230, and a search result data output module 240. Wherein:

a search result data recalling module 210, configured to recall, in response to an input data search request, a plurality of search result data corresponding to the data search request, where each search result data includes multidimensional data;

a data feature vector value generation module 220, configured to determine, according to the data feature category of the multidimensional data, a data feature vector value corresponding to the multidimensional data of each search result data;

a search probability calculation module 230, configured to calculate search probabilities of the plurality of search result data based on data feature vectors of the multidimensional data of the search results;

and a search result data output module 240, configured to perform sorting output on the multiple pieces of search result data according to the search probabilities of the multiple pieces of search result data.
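As a rough sketch of how the four modules could be composed, the example below wires three caller-supplied functions into one class and performs the sorting step itself. The class name, field names, and the toy callables are assumptions for illustration and do not represent the actual apparatus.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Result = Dict[str, object]


@dataclass
class SearchRankingApparatus:
    recall: Callable[[str], List[Result]]       # stands in for module 210
    featurize: Callable[[Result], List[float]]  # stands in for module 220
    score: Callable[[List[float]], float]       # stands in for module 230

    def search(self, query: str) -> List[Result]:
        results = self.recall(query)
        scored = [(self.score(self.featurize(r)), r) for r in results]
        # Stands in for module 240: highest search probability first.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored]


# Toy callables; a real system would plug in recall, features, and a model.
apparatus = SearchRankingApparatus(
    recall=lambda q: [{"name": "x", "value": 0.2}, {"name": "y", "value": 0.8}],
    featurize=lambda r: [float(r["value"])],
    score=lambda v: v[0],
)
names = [r["name"] for r in apparatus.search("query")]
```

Keeping the modules as separate callables mirrors the point made in the description that the division into modules is not mandatory and may be merged or split.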

The specific details of each search ranking apparatus module are already described in detail in the corresponding search ranking method, and therefore are not described herein again.

It should be noted that although several modules or units of the search ranking apparatus 200 are mentioned in the above detailed description, such division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.

In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."

An electronic device 300 according to such an embodiment of the invention is described below with reference to fig. 4. The electronic device 300 shown in fig. 4 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.

As shown in fig. 4, the electronic device 300 is embodied in the form of a general purpose computing device. The components of the electronic device 300 may include, but are not limited to: at least one processing unit 310, at least one storage unit 320, a bus 330 connecting different system components (including the storage unit 320 and the processing unit 310), and a display unit 340.

Wherein the storage unit stores program code that is executable by the processing unit 310 to cause the processing unit 310 to perform steps according to various exemplary embodiments of the present invention as described in the above section "exemplary method" of the present specification. For example, the processing unit 310 may perform steps S110 to S140 as shown in fig. 1.

The storage unit 320 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 3201 and/or a cache memory unit 3202, and may further include a read only memory unit (ROM) 3203.

The storage unit 320 may also include a program/utility 3204 having a set (at least one) of program modules 3205, such program modules 3205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 330 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 300 may also communicate with one or more external devices 370 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 300, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 300 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 350. Also, the electronic device 300 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 360. As shown, network adapter 360 communicates with the other modules of electronic device 300 via bus 330. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 300, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above-mentioned "exemplary methods" section of the present description, when said program product is run on the terminal device.

Referring to fig. 5, a program product 400 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims

1. A method of search ranking, the method comprising:

2. The method according to claim 1, wherein the determining a data feature vector corresponding to the multidimensional data of each search result data according to the data feature category of the multidimensional data includes any one or more of:

3. The method of claim 2, wherein structuring the unstructured text features comprises:

extracting entity data in the unstructured text;

filtering the entity data according to an emotion judgment algorithm;

4. The method of claim 1, wherein the search ranking model comprises a plurality of layers of perceptrons, each layer of perceptrons comprising a plurality of processing nodes, and wherein computing search probabilities for the plurality of search result data based on data feature vectors for multidimensional data of the respective search results comprises:

inputting the search result set into a pre-trained search ranking model;

f(x) = f(w*x + b)

where the data dimensionality of x and b is the input vector dimensionality of the perceptron layer in which the processing node is located, the vector dimensionality of the first perceptron layer is the input vector dimensionality, the vector dimensionality of the last perceptron layer is a preset value, and f is a sigmoid function.

5. The method of claim 1, wherein computing search probabilities for the plurality of search result data based on data feature vectors for the multidimensional data for the respective search results comprises:

6. The method of claim 1 or 5, wherein the method further comprises:

7. The method of claim 6, wherein determining a search probability for each search result data in conjunction with the frequency of clicks for each historical search result comprises:

8. The method of claim 6, wherein the method further comprises:

9. The method of claim 5, wherein the method further comprises:

10. The method of claim 9, wherein the method further comprises:

11. The method of claim 1, wherein the determining data feature vector values corresponding to multidimensional data of each search result data comprises:

12. An apparatus for search ranking, the apparatus comprising:

the data recall module is used for recalling a plurality of search result data corresponding to the data search request in response to the input data search request, wherein each search result data comprises multidimensional data;

the characteristic vector value generation module is used for determining a data characteristic vector value corresponding to the multidimensional data of each search result data according to the data characteristic category of the multidimensional data;

and the data output module is used for sequencing and outputting the plurality of search result data according to the search probabilities of the plurality of search result data.

13. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

which, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-11.

14. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-11.