CN113268618B - Search information scoring method and device and electronic equipment - Google Patents

Info

Publication number
CN113268618B
Authority
CN
China
Prior art keywords
score
multimedia content
search information
keywords
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010095770.4A
Other languages
Chinese (zh)
Other versions
CN113268618A (en
Inventor
石斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010095770.4A priority Critical patent/CN113268618B/en
Publication of CN113268618A publication Critical patent/CN113268618A/en
Application granted granted Critical
Publication of CN113268618B publication Critical patent/CN113268618B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/48Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a search information scoring method, a search information scoring device and electronic equipment, and relates to the technical field of search. The specific implementation scheme is as follows: obtaining a similarity score of multimedia content corresponding to search information, wherein the similarity score comprises a distance score of the multimedia content relative to the keywords (segmented words) in the search information and/or a hit condition score of the multimedia content relative to the keywords in the search information; and determining the score of the multimedia content relative to the search information according to the similarity score. The method and the device improve the accuracy of scoring multimedia content.

Description

Search information scoring method and device and electronic equipment
Technical Field
The present disclosure relates to the field of search technologies in the field of computing technologies, and in particular, to a search information scoring method, apparatus, and electronic device.
Background
When searching multimedia content, the multimedia content needs to be scored. For example, when searching videos, the retrieved videos need to be scored so that videos with higher scores are recommended to users. However, at present only the frequency with which the words of the search information (query) appear in the multimedia content is considered when scoring, so the scoring accuracy of the multimedia content is poor.
Disclosure of Invention
The application provides a search information scoring method, a search information scoring device and electronic equipment, and aims to solve the problem that scoring accuracy of multimedia content is poor.
In a first aspect, the present application provides a search information scoring method, including:
obtaining a similarity score of the multimedia content corresponding to the search information, wherein the similarity score comprises a distance score of the multimedia content relative to the keywords (segmented words) in the search information and/or a hit condition score of the multimedia content relative to the keywords in the search information;
and determining the score of the multimedia content relative to the search information according to the similarity score.
The scoring accuracy can be improved because the scoring of the multimedia content relative to the search information is determined according to the distance score and/or the hit score.
Optionally, the distance score is determined by:
calculating a distance, within the multimedia content, for each of a plurality of keywords in the search information, wherein the distance of the i-th hit keyword is the distance in the multimedia content between the i-th hit keyword and the (i+1)-th hit keyword; if any of the plurality of keywords is missed, the distance of the missed keyword is a preset distance; the plurality of keywords are some or all of the keywords in the search information; and i is a positive integer;
and summing the distances of the plurality of keywords to obtain the distance score.
Because the distance score is obtained from the distances of a plurality of keywords, the score can be more accurate.
Optionally, the hit score is determined by:
calculating the scores of a plurality of keywords in the search information in the multimedia content, wherein if a target keyword hits in the multimedia content, the score of the target keyword is a first score, and if the target keyword does not hit in the multimedia content but hits the synonym of the target keyword, the score of the target keyword is a second score, wherein the first score is larger than the second score, and the target keyword is any one of the plurality of keywords;
and summing the scores of the plurality of keywords to obtain the hit condition score.
Because the hit condition score is obtained from the scores of a plurality of keywords, the score can be more accurate.
Optionally, the determining the score of the multimedia content relative to the search information according to the similarity score includes:
and determining the score of the multimedia content relative to the search information according to the quotient obtained by dividing the hit condition score by the distance score.
Since the score of the multimedia content relative to the search information is determined according to the quotient, the score of the multimedia content more similar to the search information can be higher, and the accuracy of the score can be further improved.
Optionally, before determining the score of the multimedia content relative to the search information according to the similarity score, the method further includes:
calculating a weight score of the multimedia content relative to the search information;
the determining the score of the multimedia content relative to the search information according to the similarity score comprises:
and determining the score of the multimedia content relative to the search information according to the similarity score and the weight score.
The score of the multimedia content can be more accurate due to the combination of the weight scores of the multimedia content relative to the search information.
Optionally, the weight score of the multimedia content relative to the search information is determined by:
calculating weight values of a plurality of keywords in the search information;
calculating importance scores of the plurality of keywords in the multimedia content, wherein the importance score of a target keyword in the multimedia content is determined according to the content domain of the multimedia content that is hit by the target keyword or by a synonym of the target keyword, the multimedia content comprises a plurality of content domains, different content domains have different importance weight values, and the target keyword is any one of the plurality of keywords;
multiplying the weight value of the target keyword by its importance score to obtain the weight score of the target keyword;
and summing the weight scores of the plurality of keywords to obtain the weight score of the multimedia content relative to the search information.
Because different content domains have different importance weight values, the weight score can be more accurate, which further improves the accuracy of scoring.
In a second aspect, the present application provides a search information scoring apparatus, including:
the acquisition module is used for acquiring similarity scores of the multimedia content corresponding to the search information, wherein the similarity scores comprise distance scores of the multimedia content relative to the words in the search information and/or hit scores of the multimedia content relative to the words in the search information;
and the determining module is used for determining the score of the multimedia content relative to the search information according to the similarity score.
Optionally, the distance score is determined by:
calculating a distance, within the multimedia content, for each of a plurality of keywords in the search information, wherein the distance of the i-th hit keyword is the distance in the multimedia content between the i-th hit keyword and the (i+1)-th hit keyword; if any of the plurality of keywords is missed, the distance of the missed keyword is a preset distance; the plurality of keywords are some or all of the keywords in the search information; and i is a positive integer;
and summing the distances of the plurality of keywords to obtain the distance score.
Optionally, the hit score is determined by:
calculating the scores of a plurality of keywords in the search information in the multimedia content, wherein if a target keyword hits in the multimedia content, the score of the target keyword is a first score, and if the target keyword does not hit in the multimedia content but hits the synonym of the target keyword, the score of the target keyword is a second score, wherein the first score is larger than the second score, and the target keyword is any one of the plurality of keywords;
and summing the scores of the plurality of keywords to obtain the hit condition score.
Optionally, the determining module is configured to determine a score of the multimedia content relative to the search information according to a quotient obtained by dividing the hit score by the distance score.
Optionally, the apparatus further includes:
a calculation module for calculating a weight score of the multimedia content relative to the search information;
the determining module is used for determining the score of the multimedia content relative to the search information according to the similarity score and the weight score.
Optionally, the weight score of the multimedia content relative to the search information is determined by:
calculating weight values of a plurality of keywords in the search information;
calculating importance scores of the plurality of keywords in the multimedia content, wherein the importance score of a target keyword in the multimedia content is determined according to the content domain of the multimedia content that is hit by the target keyword or by a synonym of the target keyword, the multimedia content comprises a plurality of content domains, different content domains have different importance weight values, and the target keyword is any one of the plurality of keywords;
multiplying the weight value of the target keyword by its importance score to obtain the weight score of the target keyword;
and summing the weight scores of the plurality of keywords to obtain the weight score of the multimedia content relative to the search information.
In a third aspect, the present application provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods provided herein.
In a fourth aspect, the present application provides a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method provided herein.
One embodiment of the above application has the following advantages or benefits:
a similarity score of the multimedia content corresponding to the search information is obtained, wherein the similarity score comprises a distance score of the multimedia content relative to the keywords in the search information and/or a hit condition score of the multimedia content relative to the keywords in the search information; and the score of the multimedia content relative to the search information is determined according to the similarity score. Because the score of the multimedia content relative to the search information is determined according to the similarity score, the technical problem of poor scoring accuracy for multimedia content is solved, and the technical effect of improving that accuracy is achieved.
Other effects of the above alternative will be described below in connection with specific embodiments.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flow chart of a search information scoring method provided herein;
FIG. 2 is an exemplary schematic diagram of the weight settings provided herein;
FIG. 3 is an exemplary schematic diagram of the scoring process provided herein;
FIG. 4 is a block diagram of a search information scoring apparatus provided herein;
FIG. 5 is a block diagram of a search information scoring apparatus provided herein;
FIG. 6 is a block diagram of an electronic device for implementing a search information scoring method according to an embodiment of the present application.
Description of the embodiments
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Referring to FIG. 1, FIG. 1 is a flowchart of the search information scoring method provided in the present application. As shown in FIG. 1, the method includes the following steps:
Step S101, obtaining similarity scores of multimedia contents corresponding to search information, wherein the similarity scores comprise distance scores of the multimedia contents relative to the words in the search information and/or hit scores of the multimedia contents relative to the words in the search information.
The multimedia content may be video, text, images, or the like, or a combination of these. The search information may be text, voice, or the like. The search information includes at least one segmented word; each segmented word may be a phrase or a single word, and is also referred to as a keyword.
The distance score may be used to represent the order of, and the distance between, the keywords of the search information as they appear in the multimedia content. The more similar that order is to the order in the search information, and the smaller the distance, the higher the score calculated from the distance score; otherwise the score is lower. For example, the search information is "China Beijing Great Wall", and there are three pieces of multimedia content: 1) "China * Beijing * Great Wall", 2) "China Beijing Great Wall", and 3) "China Great Wall Beijing", where "*" represents a character. The score calculated from the distance score is highest for 2), next for 1), and lowest for 3).
The hit condition score indicates how the multimedia content hits the search information. For example, the hit conditions include hitting a keyword or hitting a synonym of a keyword, where a direct hit on a keyword scores higher than a hit on its synonym.
In this application, a keyword of the search information may hit information of the multimedia content such as its title, author, or content.
And step S102, determining the score of the multimedia content relative to the search information according to the similarity score.
The determining the score of the multimedia content relative to the search information according to the similarity score may be determining the score according to a correspondence between a pre-configured similarity score and the score, or calculating the score according to a pre-configured formula for calculating the score according to the similarity score. In addition, the relationship between the similarity score and the score may be that the higher the similarity score is, the higher the score is, whereas the lower the similarity score is, the lower the score is.
In the method, the score of the multimedia content relative to the search information is determined according to the distance score and/or the hit condition score, so that the accuracy of the score can be improved.
Optionally, the distance score is determined by:
calculating a distance, within the multimedia content, for each of a plurality of keywords in the search information, wherein the distance of the i-th hit keyword is the distance in the multimedia content between the i-th hit keyword and the (i+1)-th hit keyword; if any of the plurality of keywords is missed, the distance of the missed keyword is a preset distance; the plurality of keywords are some or all of the keywords in the search information; and i is a positive integer;
and summing the distances of the plurality of keywords to obtain the distance score.
The distance between the i-th hit keyword and the (i+1)-th hit keyword in the multimedia content may be the minimum distance between the two in the multimedia content, although this is not limiting; for example, it may instead be the average distance. The preset distance may be a preset specific value, an attribute parameter of the multimedia content, or the like.
This is illustrated by the following formula:
[Formula not reproduced in this text.]
In the formula, the distance score is the sum of the keyword distances, where the distance of the keyword q(i) is the minimum distance between the keyword q(i) and the keyword q(i+1). That minimum may be obtained as follows: first, find the position indexes at which q(i) appears in the multimedia content, which may form an array; in the same way, find the position indexes of q(i+1) in the multimedia content, which may also form an array; then calculate the minimum distance between the two arrays, which is the minimum distance between the two keywords; finally, sum the minimum distances of all keywords. For a keyword that is not hit, the distance may be taken as half the average length of the multimedia content.
With this formula, the more the keyword order is scrambled and the smaller the proportion of keywords hit, the larger the distance, so the distance score is inversely related to the score. Further, the formula includes a constant distance term based on the average length of the multimedia content and on n, the number of keywords included in the search information. This constant produces a smooth transition and reduces the absolute influence of the distance score on relevance, further improving the accuracy of scoring.
It should be noted that the above formula is only an example, and the calculation of the distance score is not limited in this application. For example, the minimum distance between keywords need not be used, and in some scenarios the constant term may be omitted or other constants may be added.
Because the distance score is obtained from the distances of a plurality of keywords, the score can be more accurate. In addition, the score may be calculated so that it is inversely related to the distance score: the larger the distance score, the lower the score, and the smaller the distance score, the higher the score.
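The distance calculation described above can be sketched as follows. This is a minimal illustration rather than the patented formula: the function names, the `positions` mapping, and the choice to let the last hit keyword contribute no distance are assumptions, and the constant term is left out.

```python
from typing import Dict, List, Sequence


def min_pair_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Smallest absolute gap between any position in a and any position in b."""
    return min(abs(x - y) for x in a for y in b)


def distance_score(keywords: List[str],
                   positions: Dict[str, List[int]],
                   preset_distance: float) -> float:
    """Sum the per-keyword distances in query order.

    positions maps a keyword to the indexes where it occurs in the content
    (hypothetical input). A keyword with no positions counts as missed and
    contributes preset_distance. Each hit keyword contributes the minimum
    distance to the next hit keyword; the last hit keyword has no successor
    and contributes nothing (an assumption of this sketch).
    """
    hits = [k for k in keywords if positions.get(k)]
    total = preset_distance * (len(keywords) - len(hits))
    for current, following in zip(hits, hits[1:]):
        total += min_pair_distance(positions[current], positions[following])
    return total
```

With the query order `["china", "beijing", "wall"]`, content that places the three keywords at positions 0, 1, 2 in that order sums to 2, while content that places "wall" before "beijing" sums to 3, so the reordered content receives the larger distance score.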
Optionally, the hit score is determined by:
calculating the scores of a plurality of keywords in the search information in the multimedia content, wherein if a target keyword hits in the multimedia content, the score of the target keyword is a first score, and if the target keyword does not hit in the multimedia content but hits the synonym of the target keyword, the score of the target keyword is a second score, wherein the first score is larger than the second score, and the target keyword is any one of the plurality of keywords;
and summing the scores of the plurality of keywords to obtain the hit condition score.
A hit of the target keyword in the multimedia content may be a hit on information of the multimedia content such as its title, author, content, discussion, or introduction.
The hit condition score may also be calculated by the following formula:
[Formula not reproduced in this text.]
where Rate represents the hit condition score, target indicates that the keyword itself is hit, and simTarget indicates that a synonym of the keyword is hit. Adding simTarget addresses the following case: when several query words are hit but out of order, the distance between them becomes large, so after the distance score is applied a result that hits more query words could otherwise be rated less relevant than a result that hits fewer query words in better order. This further improves the accuracy of scoring.
The formula uses a weight value for each keyword, which may specifically be a comprehensive weight over the content library. For the i-th keyword of the search information, this weight value includes an IDF term, that is, an importance rating of the keyword within the whole content library, which may be merely the importance of the keyword itself within that library. The importance of a keyword is inversely related to the frequency with which it occurs across the whole collection of content.
The IDF term is computed from the number of multimedia contents containing the i-th keyword and from N, the total number of multimedia contents; 1 is added inside the logarithm to prevent the IDF value from becoming 0, so that the IDF is a non-zero positive number.
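A minimal sketch of an IDF term consistent with this description is shown below. The exact formula is not reproduced in this text, so the commonly used BM25-style form log(1 + (N - n + 0.5) / (n + 0.5)) is assumed here, and the function and parameter names are hypothetical.

```python
import math


def idf(total_contents: int, contents_with_keyword: int) -> float:
    """BM25-style inverse document frequency (assumed form).

    Adding 1 inside the logarithm keeps the result strictly positive,
    even when the keyword occurs in every piece of content.
    """
    n = contents_with_keyword
    return math.log(1.0 + (total_contents - n + 0.5) / (n + 0.5))
```

Rare keywords receive a larger value than common ones, which matches the statement that importance is inversely related to how often a keyword occurs in the whole collection.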
The weight value also includes an intervention weight for the keyword.
The intervention weight is mainly used to classify the parts of speech of the content library in some scenarios (for example, short video scenarios). For example, as shown in FIG. 2, keywords are divided into multiple levels: the first-level parts of speech are TV, MV, variety show, video author, and the like; the second-level parts of speech are mainly television-related collections; the third-level parts of speech are nouns, personal names, celebrities, important trending events, and the like; and the fourth-level parts of speech are ordinary events and the like. Grading makes it possible to identify the core words of the search information and to raise or lower keyword weights step by step, which is better suited to evaluating query word weights in some scenarios (for example, short video scenarios). It should be noted that FIG. 2 is only an example in which the multimedia content is video, and the present application is not limited thereto. For example, for some text multimedia, the keywords in the search information can likewise be divided into different levels to reflect the importance of different keywords in the search information.
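The stepwise raising and lowering of keyword weights by level could look like the sketch below. The multiplier values, the `LEVEL_MULTIPLIER` table, and the choice to multiply an IDF-style value by the level multiplier are all assumptions made for illustration; the text does not give the actual values or the way the terms are combined.

```python
# Hypothetical multipliers per part-of-speech level (level 1 = core words).
LEVEL_MULTIPLIER = {1: 2.0, 2: 1.5, 3: 1.0, 4: 0.8}


def keyword_weight(idf_value: float, level: int) -> float:
    """Scale an IDF-style importance value by the keyword's level multiplier.

    Unknown levels fall back to a neutral multiplier of 1.0.
    """
    return idf_value * LEVEL_MULTIPLIER.get(level, 1.0)
```

Under these assumed values, a first-level keyword such as a video author is weighted above a fourth-level keyword with the same IDF-style value.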
The weight value described above is only an example, and this application is not limited to it; for example, part of the weight value may be left out of consideration.
Likewise, the formula for the hit condition score is merely an example. For example, in some scenarios the hit condition score may be calculated without considering the weight value, which can still improve the scoring accuracy of the multimedia content, or the weight values 1 and 0.8 may be adjusted as required.
In this embodiment, because the hit condition score is obtained from the scores of a plurality of keywords, the score can be more accurate.
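The hit condition score can be sketched as follows. This is an illustration under stated assumptions: a direct hit is taken to score 1.0 and a synonym hit 0.8, the optional per-keyword weights multiply those scores, and the function name and input shapes are hypothetical.

```python
from typing import Dict, List, Optional, Set

FIRST_SCORE = 1.0   # assumed score for a direct keyword hit
SECOND_SCORE = 0.8  # assumed score for a synonym hit (must be lower)


def hit_score(keywords: List[str],
              content_terms: Set[str],
              synonyms: Dict[str, Set[str]],
              weights: Optional[Dict[str, float]] = None) -> float:
    """Sum per-keyword hit scores; missed keywords contribute nothing."""
    total = 0.0
    for keyword in keywords:
        weight = 1.0 if weights is None else weights.get(keyword, 1.0)
        if keyword in content_terms:
            total += weight * FIRST_SCORE
        elif synonyms.get(keyword, set()) & content_terms:
            total += weight * SECOND_SCORE
    return total
```

For example, if the content contains "movie" and "beijing" and the query is "film beijing wall" with "movie" registered as a synonym of "film", the synonym hit and the direct hit are both counted while "wall" adds nothing.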
Optionally, the determining the score of the multimedia content relative to the search information according to the similarity score includes:
and determining the score of the multimedia content relative to the search information according to the quotient obtained by dividing the hit condition score by the distance score.
The score of the multimedia content relative to the search information may be determined by using the quotient obtained by dividing the hit condition score by the distance score as the similarity score, for example as expressed by the following formula:
$$\text{similarity score} = \frac{\text{hit condition score}}{\text{distance score}}$$
wherein the numerator represents the hit condition score of the multimedia content, and the denominator represents the distance score of the multimedia content.
The similarity score rates the similarity of the multimedia content as a whole. It favors multimedia content in which the hit keywords appear in the same order as in the search information and, when the order matches, in which the distances between the keywords are as small as possible. In addition, the hit condition score makes the similarity score more accurate.
Since the score of the multimedia content relative to the search information is determined according to the quotient, the score of the multimedia content more similar to the search information can be higher, and the accuracy of the score can be further improved.
It should be noted that using the quotient obtained by dividing the hit condition score by the distance score is only a preferred embodiment, and the manner of determining the score of the multimedia content relative to the search information is not limited in this application. For example, in some scenarios different weights may be set for the hit condition score and the distance score, where the weight of the distance score is a negative real number and the weight of the hit condition score is a positive real number, which also reflects the overall similarity between the multimedia content and the search information.
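Both ways of combining the two scores can be sketched briefly. The function names and the default weights in the second function are assumptions chosen for illustration only.

```python
def similarity_quotient(hit: float, distance: float) -> float:
    """Similarity as the hit condition score divided by the distance score."""
    if distance <= 0:
        raise ValueError("distance score must be positive")
    return hit / distance


def similarity_weighted(hit: float, distance: float,
                        hit_weight: float = 1.0,
                        distance_weight: float = -0.1) -> float:
    """Alternative: weighted sum with a positive hit weight and a negative
    distance weight (both default values are hypothetical)."""
    return hit_weight * hit + distance_weight * distance
```

In both variants, a larger distance score lowers the result for the same hit condition score, which is the behavior the description requires.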
Optionally, before determining the score of the multimedia content relative to the search information according to the similarity score, the method further includes:
calculating a weight score of the multimedia content relative to the search information;
the determining the score of the multimedia content relative to the search information according to the similarity score comprises:
and determining the score of the multimedia content relative to the search information according to the similarity score and the weight score.
The weight score may be used to indicate importance of the multimedia content with respect to the search information, where the higher the weight score, the more important the multimedia content is with respect to the search information, and vice versa.
The score of the multimedia content can be more accurate due to the combination of the weight scores of the multimedia content relative to the search information.
Optionally, the weight score of the multimedia content relative to the search information is determined by:
calculating weight values of a plurality of keywords in the search information;
calculating importance scores of the plurality of keywords in the multimedia content, wherein the importance score of a target keyword in the multimedia content is determined according to the content domain of the multimedia content that is hit by the target keyword or by a synonym of the target keyword, the multimedia content comprises a plurality of content domains, different content domains have different importance weight values, and the target keyword is any one of the plurality of keywords;
multiplying the weight value of the target keyword by its importance score to obtain the weight score of the target keyword;
and summing the weight scores of the plurality of keywords to obtain the weight score of the multimedia content relative to the search information.
The weight value of each keyword may be preconfigured, or may be calculated as the keyword weight value described above, which is not repeated here.
The importance score of a keyword may be determined from the content domain that is hit. For example, different weights are configured for each content domain in advance, such as a weight for the title content domain that is greater than that of the tag content domain, and a weight for the author content domain that is greater than that of the title content domain, so that the importance score of a keyword can be determined from the content domain it hits.
For example: taking multimedia content as an example of video, the importance scores of the terms can be calculated by the following formula:
wherein,,、/>、/>and->Representing four different content fields, such as title, introduction, label and author, respectively, of course, is only an example here, as eitherSome content fields are deleted or different fields may be set for different types of multimedia content, which is not limited. In addition, the weights for each domain may be preconfigured, for example: and obtaining better parameters on the basis of multiple statistics.
In the present application, a piece of multimedia content can be split into a plurality of domains, and the weight given to an occurrence of the keyword q(i) differs from domain to domain. Furthermore, synonym handling can be combined with this: when a keyword is not found in the title, its synonyms can be looked up instead, which achieves a better semantic retrieval effect.
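The domain-weighted importance score and the resulting weight score can be sketched as follows. This is a minimal illustration, not the formula of the application: the domain names, the numeric weights in `DOMAIN_WEIGHTS`, the choice to sum the weights of all hit domains, and the function names are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical domain weights; the text only requires that they differ
# (in its example: author > title > tag). The "intro" value is an assumption.
DOMAIN_WEIGHTS: Dict[str, float] = {"author": 4.0, "title": 3.0, "intro": 2.0, "tag": 1.0}


def importance_score(keyword: str,
                     content: Dict[str, List[str]],
                     synonyms: Dict[str, List[str]]) -> float:
    """Sum the weights of every domain that contains the keyword or one of its synonyms."""
    candidates = {keyword, *synonyms.get(keyword, [])}
    return sum(
        DOMAIN_WEIGHTS.get(domain, 0.0)
        for domain, tokens in content.items()
        if candidates & set(tokens)
    )


def weight_score(term_weights: Dict[str, float],
                 content: Dict[str, List[str]],
                 synonyms: Dict[str, List[str]]) -> float:
    """Multiply each keyword's weight value by its importance score, then sum."""
    return sum(
        weight * importance_score(keyword, content, synonyms)
        for keyword, weight in term_weights.items()
    )


# Usage: "cat" hits the title and tag domains; "dog" is only found via its synonym "puppy".
video = {"title": ["funny", "cat", "compilation"], "tag": ["cat", "puppy"], "author": ["alice"]}
score = weight_score({"cat": 0.6, "dog": 0.4}, video, {"dog": ["puppy"]})
```

Here "cat" collects the title and tag weights and "dog" collects only the tag weight through its synonym, so a keyword with no literal match still contributes to the weight score.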
The quality term of the above formula represents the quality score of the multimedia content itself. Taking video as an example, it is a composite score that fuses video-quality dimensions such as release time, play count, like count, and comment count. Furthermore, to keep the quality score from having an excessive influence on the overall relevance, a maximum value of the quality score may be placed before the quality score in the formula, and this maximum is a preset score.
With the quality term, the same word frequency can carry a different meaning in different multimedia content: for the same word frequency, multimedia content of better quality receives a higher score.
k1 and b are constants, and D is the multimedia content. k1 controls the maximum influence of word frequency on the overall relevance. That maximum can be calculated from the second term of the formula, so that the influence of word frequency on the relevance of any one keyword does not exceed (1 + k1), which can further improve scoring accuracy. b reflects the fact that the probability of a given word frequency differs between titles of different lengths; it can be understood as a penalty for long titles and a reward for short titles, so that, with other conditions equal, a larger D has a negative effect on the overall relevance, which further improves scoring accuracy.
Here l represents the title length of the multimedia content, and the average-length term may represent the average length over a plurality of multimedia contents.
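The roles described for k1, b, l, and the average length match those of the term-frequency component of the standard Okapi BM25 function. The formula of the application is not reproduced in this text, so the sketch below uses the textbook BM25 form as a stand-in; the function name and the default values of `k1` and `b` are assumptions.

```python
def saturated_tf(tf: float, title_len: float, avg_len: float,
                 k1: float = 1.2, b: float = 0.75) -> float:
    """Textbook BM25 term-frequency component (a stand-in, not the patent's formula).

    The value grows with tf but stays strictly below (1 + k1).
    With b > 0, titles longer than average are penalized and shorter ones rewarded.
    """
    length_norm = 1.0 - b + b * (title_len / avg_len)
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


# Usage: the same word frequency scores lower in a long title than in a short one.
short_title = saturated_tf(tf=2, title_len=5, avg_len=10)
long_title = saturated_tf(tf=2, title_len=20, avg_len=10)
```

Setting `b` to 0 removes the length normalization entirely, while a larger `k1` lets repeated occurrences keep adding to the score for longer before it saturates.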
It should be noted that the above formula is only one preferred embodiment. For example, in some embodiments a term of the above formula may be omitted, and the importance of each keyword may also be represented by adding other constants to the formula or by removing constants from it.
In this embodiment, because different content domains are configured with different importance weight values, the weight score can be more accurate, which further improves the accuracy of the score.
Determining the score of the multimedia content relative to the search information according to the similarity score and the weight score may consist of multiplying the similarity score by the weight score to obtain the score of the multimedia content relative to the search information.
For example: the score may be calculated by the following formula:
the meaning of each part is described later.
Here, the result of the formula represents the score of the multimedia content relative to the search information. Taking video as the multimedia content, the score acquisition process may be as shown in Fig. 3.
It should be noted that, in the present application, determining the score of the multimedia content relative to the search information according to the similarity score and the weight score is not limited to multiplying the two. For example, the similarity score and the weight score can instead be added to obtain the score of the multimedia content relative to the search information, so that the score still reflects the overall relevance of the multimedia content to the search information.
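The two ways of combining the scores named above can be sketched as a single function. The function name and the `mode` parameter are hypothetical and serve only to show the two alternatives side by side.

```python
def combine_scores(similarity_score: float, weight_score: float,
                   mode: str = "multiply") -> float:
    """Combine the similarity score and the weight score into one final score."""
    if mode == "multiply":
        return similarity_score * weight_score
    if mode == "add":
        return similarity_score + weight_score
    raise ValueError(f"unknown mode: {mode!r}")


# Usage: the same inputs under both combination rules.
product = combine_scores(0.5, 4.0)
total = combine_scores(0.5, 4.0, mode="add")
```

With multiplication, a similarity score of zero forces the final score to zero whatever the weight score is; with addition, each score contributes independently.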
In the present application, a similarity score of the multimedia content corresponding to the search information is obtained, wherein the similarity score comprises a distance score of the multimedia content relative to the keywords in the search information and/or a hit condition score of the multimedia content relative to the keywords in the search information; and the score of the multimedia content relative to the search information is determined according to the similarity score. Because the score of the multimedia content relative to the search information is determined according to the similarity score, the technical problem of poor scoring accuracy for multimedia content is solved, and the technical effect of improving the scoring accuracy for multimedia content is achieved.
Referring to fig. 4, fig. 4 is a block diagram of a search information scoring apparatus provided in the present application, and as shown in fig. 4, a search information scoring apparatus 400 includes:
an obtaining module 401, configured to obtain a similarity score of the multimedia content corresponding to the search information, where the similarity score includes a distance score of the multimedia content relative to the keywords in the search information and/or a hit condition score of the multimedia content relative to the keywords in the search information;
a determining module 402, configured to determine a score of the multimedia content relative to the search information according to the similarity score.
Optionally, the distance score is determined by:
calculating the distances, in the multimedia content, of a plurality of keywords in the search information, wherein the distance of the i-th hit keyword among the plurality of keywords is the distance in the multimedia content between the i-th hit keyword and the (i+1)-th hit keyword; if a missed keyword exists among the plurality of keywords, the distance of the missed keyword is a preset distance; the plurality of keywords are some or all of the keywords in the search information; and i is a positive integer;
and summing the distances of the plurality of keywords to obtain the distance score.
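The distance score described above can be sketched as follows. Several details are assumptions made for the example: positions are taken as the first occurrence of each keyword in a tokenized title, the last hit keyword (which has no following hit) contributes no distance, and the value of `preset_distance` is arbitrary.

```python
from typing import List


def distance_score(keywords: List[str], tokens: List[str],
                   preset_distance: int = 10) -> int:
    """Sum the gaps between consecutive hit keywords, plus a preset distance per miss."""
    hit_positions: List[int] = []
    misses = 0
    for keyword in keywords:
        if keyword in tokens:
            hit_positions.append(tokens.index(keyword))  # first occurrence only
        else:
            misses += 1
    # Distance of the i-th hit keyword: gap to the (i+1)-th hit keyword.
    gaps = sum(abs(later - earlier)
               for earlier, later in zip(hit_positions, hit_positions[1:]))
    return gaps + misses * preset_distance


# Usage: "funny" and "cat" hit two positions apart; "video" is missed.
d = distance_score(["funny", "cat", "video"], ["funny", "black", "cat", "compilation"])
```

A smaller distance score means the hit keywords sit closer together in the content, and each missed keyword pushes the score up by the preset distance.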
Optionally, the hit score is determined by:
calculating the scores of a plurality of keywords in the search information in the multimedia content, wherein if a target keyword hits in the multimedia content, the score of the target keyword is a first score, and if the target keyword does not hit in the multimedia content but hits the synonym of the target keyword, the score of the target keyword is a second score, wherein the first score is larger than the second score, and the target keyword is any one of the plurality of keywords;
and summing the scores of the plurality of keywords to obtain the hit condition score.
Optionally, the determining module 402 is configured to determine a score of the multimedia content relative to the search information according to a quotient obtained by dividing the hit score by the distance score.
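The hit condition score and the quotient that uses it can be sketched as follows. The numeric values of `first_score` and `second_score` are assumptions (the text only requires the first to be larger than the second), and the guard against a zero distance score is an addition for the example that the text does not mention.

```python
from typing import Dict, List


def hit_score(keywords: List[str], tokens: List[str],
              synonyms: Dict[str, List[str]],
              first_score: float = 1.0, second_score: float = 0.5) -> float:
    """Score each keyword by a direct hit or a synonym hit, then sum."""
    token_set = set(tokens)
    total = 0.0
    for keyword in keywords:
        if keyword in token_set:
            total += first_score    # the keyword itself hits
        elif any(s in token_set for s in synonyms.get(keyword, [])):
            total += second_score   # only a synonym hits
    return total


def similarity_score(hit: float, distance: float) -> float:
    """Quotient of hit score and distance score (zero-distance guard is an assumption)."""
    return hit / max(distance, 1.0)


# Usage: "funny" hits directly, "cat" hits through "kitten", "video" misses.
h = hit_score(["funny", "cat", "video"], ["funny", "kitten", "clip"], {"cat": ["kitten"]})
s = similarity_score(h, 3.0)
```

Dividing by the distance score rewards content in which many keywords hit and the hits lie close together.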
Optionally, as shown in fig. 5, the apparatus further includes:
a calculation module 403, configured to calculate a weight score of the multimedia content relative to the search information;
the determining module 402 is configured to determine a score of the multimedia content relative to the search information based on the similarity score and the weight score.
Optionally, the weight score of the multimedia content relative to the search information is determined by:
calculating weight values of a plurality of keywords in the search information;
calculating importance scores of a plurality of keywords in the search information in the multimedia content, wherein the importance scores of target keywords in the multimedia content are determined according to content domains of the target keywords or synonyms of the target keywords in the multimedia content, the multimedia content comprises a plurality of content domains, importance weight values of different content domains are different, and the target keywords are any one of the plurality of keywords;
multiplying the weight value of the target keyword by its importance score to obtain the weight score of the target keyword;
and summing the weight scores of the plurality of keywords to obtain the weight score of the multimedia content relative to the search information.
The apparatus provided in this embodiment can implement each process of the method embodiment shown in Fig. 1 and can achieve the same beneficial effects. To avoid repetition, the details are not described here again.
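The division of labor between the modules of the apparatus can be sketched as a small class. The class name, the field names, and the use of plain callables for the modules are assumptions for illustration, and multiplication is used as the combination rule only because it is the first option the description names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

ScoreFn = Callable[[str, str], float]  # (search information, multimedia content) -> score


@dataclass
class SearchInfoScoringApparatus:
    obtain_similarity: ScoreFn                   # role of the obtaining module 401
    calculate_weight: Optional[ScoreFn] = None   # role of the optional calculation module 403

    def determine_score(self, query: str, content: str) -> float:
        """Role of the determining module 402."""
        similarity = self.obtain_similarity(query, content)
        if self.calculate_weight is None:
            return similarity
        return similarity * self.calculate_weight(query, content)


# Usage with stub scoring functions.
basic = SearchInfoScoringApparatus(obtain_similarity=lambda q, c: 0.5)
weighted = SearchInfoScoringApparatus(obtain_similarity=lambda q, c: 0.5,
                                      calculate_weight=lambda q, c: 4.0)
```

Leaving `calculate_weight` unset corresponds to the apparatus of Fig. 4, and supplying it corresponds to the apparatus of Fig. 5.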
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 6 is a block diagram of an electronic device for the search information scoring method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in Fig. 6, the electronic device includes one or more processors 601, a memory 602, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 601 is taken as an example in Fig. 6.
The memory 602 is the non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the search information scoring method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the search information scoring method provided by the present application.
As a non-transitory computer-readable storage medium, the memory 602 may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the search information scoring method in the embodiments of the present application (e.g., the obtaining module 401 and the determining module 402 shown in Fig. 4). The processor 601 executes the various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 602, that is, it implements the search information scoring method in the above-described method embodiments.
The memory 602 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, at least one application program required for a function; the storage data area may store data created according to the use of the electronic device of the search information scoring method, and the like. In addition, the memory 602 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 602 may optionally include memory remotely located with respect to processor 601, which may be connected to the electronic device of the search information scoring method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the search information scoring method may further include an input device 603 and an output device 604. The processor 601, memory 602, input device 603, and output device 604 may be connected by a bus or otherwise; connection by a bus is taken as an example in Fig. 6.
The input device 603 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device of the search information scoring method. Examples of input devices include a touch screen, keypad, mouse, trackpad, touch pad, pointer stick, one or more mouse buttons, trackball, and joystick. The output device 604 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical solution of the embodiments of the present application, a similarity score of the multimedia content corresponding to the search information is obtained, wherein the similarity score comprises a distance score of the multimedia content relative to the keywords in the search information and/or a hit condition score of the multimedia content relative to the keywords in the search information; and the score of the multimedia content relative to the search information is determined according to the similarity score. Because the score of the multimedia content relative to the search information is determined according to the similarity score, the technical problem of poor scoring accuracy for multimedia content is solved, and the technical effect of improving the scoring accuracy for multimedia content is achieved.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, which is not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (8)

1. A search information scoring method, comprising:
obtaining a similarity score of the multimedia content corresponding to the search information, wherein the similarity score comprises a distance score of the multimedia content relative to the keywords in the search information and a hit condition score of the multimedia content relative to the keywords in the search information;
determining a score of the multimedia content relative to the search information according to the similarity score;
wherein the hit condition score is determined by:
calculating the scores of a plurality of keywords in the search information in the multimedia content, wherein if a target keyword hits in the multimedia content, the score of the target keyword is a first score, and if the target keyword does not hit in the multimedia content but a synonym of the target keyword hits, the score of the target keyword is a second score, wherein the first score is larger than the second score, and the target keyword is any one of the plurality of keywords; summing the scores of the plurality of keywords to obtain the hit condition score;
wherein said determining a score for said multimedia content relative to said search information based on said similarity score comprises: determining the score of the multimedia content relative to the search information according to the quotient obtained by dividing the hit condition score by the distance score; alternatively, before the scoring of the multimedia content with respect to the search information based on the similarity score, the method further comprises: calculating a weight score of the multimedia content relative to the search information; the determining the score of the multimedia content relative to the search information according to the similarity score comprises: and determining the score of the multimedia content relative to the search information according to the similarity score and the weight score.
2. The method of claim 1, wherein the distance score is determined by:
calculating the distances, in the multimedia content, of a plurality of keywords in the search information, wherein the distance of the i-th hit keyword among the plurality of keywords is the distance in the multimedia content between the i-th hit keyword and the (i+1)-th hit keyword; if a missed keyword exists among the plurality of keywords, the distance of the missed keyword is a preset distance; the plurality of keywords are some or all of the keywords in the search information; and i is a positive integer;
and summing the distances of the plurality of keywords to obtain the distance score.
3. The method of claim 1, wherein the weight score of the multimedia content relative to the search information is determined by:
calculating weight values of a plurality of keywords in the search information;
calculating importance scores of a plurality of keywords in the search information in the multimedia content, wherein the importance scores of target keywords in the multimedia content are determined according to content domains of the target keywords or synonyms of the target keywords in the multimedia content, the multimedia content comprises a plurality of content domains, importance weight values of different content domains are different, and the target keywords are any one of the plurality of keywords;
multiplying the weight value of the target keyword by its importance score to obtain the weight score of the target keyword;
and summing the weight scores of the plurality of keywords to obtain the weight score of the multimedia content relative to the search information.
4. A search information scoring apparatus comprising:
an obtaining module, configured to obtain a similarity score of the multimedia content corresponding to the search information, wherein the similarity score comprises a distance score of the multimedia content relative to the keywords in the search information and a hit condition score of the multimedia content relative to the keywords in the search information;
a determining module, configured to determine a score of the multimedia content relative to the search information according to the similarity score;
wherein the hit condition score is determined by:
calculating the scores of a plurality of keywords in the search information in the multimedia content, wherein if a target keyword hits in the multimedia content, the score of the target keyword is a first score, and if the target keyword does not hit in the multimedia content but a synonym of the target keyword hits, the score of the target keyword is a second score, wherein the first score is larger than the second score, and the target keyword is any one of the plurality of keywords; summing the scores of the plurality of keywords to obtain the hit condition score;
the determining module is configured to determine the score of the multimedia content relative to the search information according to the quotient obtained by dividing the hit condition score by the distance score; alternatively, the apparatus further comprises: a calculation module, configured to calculate a weight score of the multimedia content relative to the search information; and the determining module is configured to determine the score of the multimedia content relative to the search information according to the similarity score and the weight score.
5. The apparatus of claim 4, wherein the distance score is determined by:
calculating the distances, in the multimedia content, of a plurality of keywords in the search information, wherein the distance of the i-th hit keyword among the plurality of keywords is the distance in the multimedia content between the i-th hit keyword and the (i+1)-th hit keyword; if a missed keyword exists among the plurality of keywords, the distance of the missed keyword is a preset distance; the plurality of keywords are some or all of the keywords in the search information; and i is a positive integer;
and summing the distances of the plurality of keywords to obtain the distance score.
6. The apparatus of claim 4, wherein the weight score of the multimedia content relative to the search information is determined by:
calculating weight values of a plurality of keywords in the search information;
calculating importance scores of a plurality of keywords in the search information in the multimedia content, wherein the importance scores of target keywords in the multimedia content are determined according to content domains of the target keywords or synonyms of the target keywords in the multimedia content, the multimedia content comprises a plurality of content domains, importance weight values of different content domains are different, and the target keywords are any one of the plurality of keywords;
multiplying the weight value of the target keyword by its importance score to obtain the weight score of the target keyword;
and summing the weight scores of the plurality of keywords to obtain the weight score of the multimedia content relative to the search information.
7. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
8. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-3.
CN202010095770.4A 2020-02-17 2020-02-17 Search information scoring method and device and electronic equipment Active CN113268618B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010095770.4A CN113268618B (en) 2020-02-17 2020-02-17 Search information scoring method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010095770.4A CN113268618B (en) 2020-02-17 2020-02-17 Search information scoring method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113268618A CN113268618A (en) 2021-08-17
CN113268618B true CN113268618B (en) 2023-07-25

Family

ID=77227612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010095770.4A Active CN113268618B (en) 2020-02-17 2020-02-17 Search information scoring method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113268618B (en)

Also Published As

Publication number Publication date
CN113268618A (en) 2021-08-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant