CN111597471A - Display position determining method and device, electronic equipment and storage medium - Google Patents

Display position determining method and device, electronic equipment and storage medium

Info

Publication number
CN111597471A
Authority
CN
China
Prior art keywords
search information
media content
determining
word
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010439248.3A
Other languages
Chinese (zh)
Inventor
王鑫宇
张永华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010439248.3A
Publication of CN111597471A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a method and a device for determining a display position, an electronic device and a storage medium, wherein the method for determining the display position comprises the following steps: acquiring search information, and performing word segmentation processing on the search information to obtain multiple types of word units related to the search information; for each media content in a plurality of media contents associated with the search information, determining similarity between the media content and the search information based on text information associated with the media content and a plurality of types of word units associated with the search information; and determining the display position of each media content in the search result display page of the client based on the similarity between each media content associated with the search information and the search information. The embodiment of the disclosure improves the accuracy of the information display position.

Description

Display position determining method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a display position, an electronic device, and a storage medium.
Background
With the development of internet technology, a media client pushes a large number of articles of different types every day for users to read, such as biographies, travel notes and food articles. A user may enter search information in an input box of the media client to find the media content the user wants to view.
After receiving the search information, the server may recall the media content associated with the search information, and when a large amount of media content is recalled, the presentation order of the media content at the client may be determined in advance.
In the related art, media content is sorted in a mechanical and simple way with low accuracy; the top-ranked media content provided to the user may not contain articles the user is interested in, so the user's real needs cannot be met.
Disclosure of Invention
The embodiments of the present disclosure provide at least a scheme for determining a display position, so as to improve the accuracy of the display position of information.
In a first aspect, an embodiment of the present disclosure provides a method for determining a display position, including:
acquiring search information, and performing word segmentation processing on the search information to obtain multiple types of word units related to the search information;
for each media content in a plurality of media contents associated with the search information, determining similarity between the media content and the search information based on text information associated with the media content and a plurality of types of word units associated with the search information;
and determining the display position of each media content in the search result display page of the client based on the similarity between each media content associated with the search information and the search information.
In a possible implementation manner, the performing word segmentation processing on the search information to obtain multiple types of word units associated with the search information includes:
performing word segmentation processing on the search information according to the language type of the search information to obtain a plurality of words contained in the search information;
extracting at least two continuous words capable of forming phrases under the language type according to the position of each word in the search information to obtain the phrases contained in the search information;
for each word with the length exceeding a set value, extracting the prefix of the word to obtain the prefix contained in the search information, and for each phrase formed by the words with the length exceeding the set value, extracting the prefix of each word in the phrase to obtain the prefix phrase contained in the search information;
and taking the words, phrases, prefixes and prefix phrases contained in the search information as various types of word units related to the search information.
In one possible implementation, the determining, for each media content associated with the search information, a similarity between the media content and the search information based on text information associated with the media content and multiple types of word units associated with the search information includes:
acquiring text information associated with each media content associated with the search information;
searching a target word unit matched with the word unit associated with the search information in the text information associated with the media content;
and acquiring the predetermined importance of each target word unit, and determining the similarity between the media content and the search information based on the importance of each target word unit.
In one possible implementation, the obtaining, for each media content associated with the search information, text information associated with the media content includes:
acquiring text information of the media content under different dimensions;
and splicing the text information under each dimension to obtain the text information associated with the media content.
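The splicing step can be illustrated with a minimal sketch. The dimension names (`title`, `summary`, `tags`) and the single-space separator are assumptions made for illustration only; the disclosure does not enumerate the dimensions or the separator.

```python
from typing import Dict


def splice_text(dimensions: Dict[str, str], separator: str = " ") -> str:
    """Join the non-empty text of every dimension into one string (assumed separator)."""
    parts = [text.strip() for text in dimensions.values() if text and text.strip()]
    return separator.join(parts)


# Hypothetical media content with three assumed dimensions; one of them is empty.
media = {"title": "Tomato fried eggs", "summary": "", "tags": "recipe home-cooking"}
spliced = splice_text(media)
```

Empty dimensions are skipped here so that they do not introduce stray separators into the spliced text.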
In one possible embodiment, the importance of each target word unit is predetermined as follows:
and determining the importance of each target word unit based on the corresponding media content quantity of each target word unit in the media content library and the total media content quantity in the media content library.
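The disclosure states only that the importance depends on the number of media contents containing the word unit and on the total number of media contents in the library. One common way to combine these two counts is an inverse-document-frequency style score; the smoothed logarithmic formula below is an assumption, not a formula given in the disclosure, and the function name is hypothetical.

```python
import math


def importance(doc_count: int, total_docs: int) -> float:
    """IDF-style importance (assumed formula): rarer word units score higher.

    doc_count  -- number of media contents in the library containing the word unit
    total_docs -- total number of media contents in the library
    """
    if total_docs <= 0 or doc_count < 0 or doc_count > total_docs:
        raise ValueError("counts must satisfy 0 <= doc_count <= total_docs and total_docs > 0")
    # Add-one smoothing keeps the score finite when doc_count is 0.
    return math.log((total_docs + 1) / (doc_count + 1))


rare = importance(10, 1000)     # word unit found in few media contents
common = importance(500, 1000)  # word unit found in half of the library
```

Under this assumption, a word unit that appears in every media content receives an importance of zero, so it contributes nothing to the similarity.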
In one possible implementation, the determining the similarity between the media content and the search information based on the importance of each target word unit includes:
determining the sum of the importance of each type of word unit related to the search information based on the importance of each target word unit and the type of the target word unit;
and determining the similarity between the media content and the search information based on the sum of the importance of each type of word unit associated with the search information.
In one possible implementation, the determining the similarity between the media content and the search information based on the sum of the importance of each type of word unit associated with the search information includes:
multiplying together the sums of the importance of each type of word unit associated with the search information to obtain the similarity between the media content and the search information; or,
performing a weighted summation of the sums of the importance of each type of word unit associated with the search information to obtain the similarity between the media content and the search information.
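The two alternatives above can be sketched as follows. The type labels (`word`, `phrase`, `prefix`, `prefix_phrase`), the importance values and the weights are all hypothetical; the disclosure does not specify any weights.

```python
import math
from typing import Dict, Iterable, Tuple


def sum_importance_by_type(matched: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum the importance of the matched (target) word units, grouped by unit type."""
    sums: Dict[str, float] = {}
    for unit_type, value in matched:
        sums[unit_type] = sums.get(unit_type, 0.0) + value
    return sums


def similarity_by_product(type_sums: Dict[str, float]) -> float:
    """First alternative: multiply the per-type importance sums together."""
    return math.prod(type_sums.values())


def similarity_by_weighted_sum(type_sums: Dict[str, float], weights: Dict[str, float]) -> float:
    """Second alternative: weighted summation of the per-type importance sums."""
    return sum(weights.get(unit_type, 0.0) * total for unit_type, total in type_sums.items())


# Hypothetical matches: (type of the target word unit, its predetermined importance).
matched = [("word", 1.2), ("word", 0.8), ("phrase", 1.5), ("prefix", 1.0), ("prefix_phrase", 0.5)]
type_sums = sum_importance_by_type(matched)
weights = {"word": 0.5, "phrase": 0.3, "prefix": 0.1, "prefix_phrase": 0.1}  # assumed weights
product_score = similarity_by_product(type_sums)
weighted_score = similarity_by_weighted_sum(type_sums, weights)
```

In this sketch a type with no matched word unit is simply absent from the sums. Whether such a type should instead force the product to zero is not stated in the disclosure.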
In one possible implementation manner, before determining a presentation position of each media content in a search result presentation page of a client based on a similarity between each media content associated with the search information and the search information, the determining method further includes:
acquiring user behavior data corresponding to each media content;
the determining, based on the similarity between each media content associated with the search information and the search information, a presentation position of each media content in a search result presentation page of the client includes:
and determining the display position of each media content in the search result display page of the client based on the similarity between each media content and the search information and the user behavior data corresponding to the media content.
In one possible implementation, after determining the presentation position of each of the media contents in the search result presentation page of the client, the determining method further includes:
and sending the plurality of media contents and the display positions corresponding to the plurality of media contents to the client.
In a second aspect, an embodiment of the present disclosure provides an apparatus for determining a display position, including:
the information processing module is used for acquiring search information and performing word segmentation processing on the search information to obtain multiple types of word units related to the search information;
a first determining module, configured to determine, for each media content in a plurality of media contents associated with the search information, a similarity between the media content and the search information based on text information associated with the media content and a plurality of types of word units associated with the search information;
and the second determination module is used for determining the display position of each media content in the search result display page of the client based on the similarity between each media content associated with the search information and the search information.
In a possible implementation manner, when the information processing module is configured to perform word segmentation processing on the search information to obtain multiple types of word units associated with the search information, the information processing module includes:
performing word segmentation processing on the search information according to the language type of the search information to obtain a plurality of words contained in the search information;
extracting at least two continuous words capable of forming phrases under the language type according to the position of each word in the search information to obtain the phrases contained in the search information;
for each word with the length exceeding a set value, extracting the prefix of the word to obtain the prefix contained in the search information, and for each phrase formed by the words with the length exceeding the set value, extracting the prefix of each word in the phrase to obtain the prefix phrase contained in the search information;
and taking the words, phrases, prefixes and prefix phrases contained in the search information as various types of word units related to the search information.
In one possible implementation, the first determining module, when configured to determine, for each media content associated with the search information, a similarity between the media content and the search information based on text information associated with the media content and multiple types of word units associated with the search information, includes:
acquiring text information associated with each media content associated with the search information;
searching a target word unit matched with the word unit associated with the search information in the text information associated with the media content;
and acquiring the predetermined importance of each target word unit, and determining the similarity between the media content and the search information based on the importance of each target word unit.
In one possible embodiment, the first determining module, when configured to obtain, for each media content associated with the search information, text information associated with the media content, includes:
acquiring text information of the media content under different dimensions;
and splicing the text information under each dimension to obtain the text information associated with the media content.
In one possible embodiment, the first determining module determines the importance of each target word unit in advance according to the following manner:
and determining the importance of each target word unit based on the corresponding media content quantity of each target word unit in the media content library and the total media content quantity in the media content library.
In one possible embodiment, the first determining module, when configured to determine the similarity between the media content and the search information based on the importance of each target word unit, includes:
determining the sum of the importance of each type of word unit related to the search information based on the importance of each target word unit and the type of the target word unit;
and determining the similarity between the media content and the search information based on the sum of the importance of each type of word unit associated with the search information.
In one possible implementation, the first determining module, when configured to determine the similarity between the media content and the search information based on the sum of the importance of each type of word unit associated with the search information, includes:
multiplying together the sums of the importance of each type of word unit associated with the search information to obtain the similarity between the media content and the search information; or,
performing a weighted summation of the sums of the importance of each type of word unit associated with the search information to obtain the similarity between the media content and the search information.
In one possible embodiment, before determining the presentation position of each media content in the search result presentation page of the client based on the similarity between each media content associated with the search information and the search information, the second determination module is further configured to:
acquiring user behavior data corresponding to each media content and historical click rate corresponding to the media content;
the second determination module, when configured to determine, based on the similarity between each media content associated with the search information and the search information, a presentation position of each media content in a search result presentation page of the client, includes:
and determining the display position of each media content in a search result display page of the client based on the similarity between each media content and the search information, the user behavior data corresponding to the media content and the historical click rate corresponding to the media content.
In a possible implementation manner, the determining apparatus further includes a sending module, after the second determining module determines the presentation position of each of the media contents in the search result presentation page of the client, the sending module is configured to:
and sending the plurality of media contents and the display positions corresponding to the plurality of media contents to the client.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the determination method according to the first aspect.
In a fourth aspect, the disclosed embodiments provide a computer-readable storage medium having stored thereon a computer program, which, when executed by a processor, performs the steps of the determination method according to the first aspect.
In the method for determining the display position provided by the embodiments of the present disclosure, word segmentation processing is performed on the search information to obtain multiple types of word units, and the similarity between each media content associated with the search information and the search information is determined using the different types of word units. Because the similarity is determined using multiple types of word units, the similarity between the search information and the media content is considered from multiple perspectives, yielding a more accurate similarity.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments are briefly described below. The drawings are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 is a flowchart illustrating a method for determining a display position according to an embodiment of the disclosure;
Fig. 2 is a flowchart illustrating a method for determining similarity between search information and media content according to an embodiment of the disclosure;
Fig. 3 is a schematic structural diagram illustrating a device for determining a display position according to an embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Some media clients provide a large amount of media content for users to view. When a user has a search need, the user can search within the media content based on entered search information; for example, the user may enter the search information "tomato-fried eggs" in the media client. When the server receives the search information, it can search the media library for media content associated with the search information, such as media content containing "tomato", "egg" or "tomato-fried eggs", and a large amount of media content may be obtained. When the recalled media content is returned to the media client, the media content that better meets the user's needs should be displayed in the front positions. How to find the media content suited to the front positions, that is, how to improve recommendation accuracy, is the problem addressed by the embodiments of the present disclosure.
Based on the above research, the present disclosure provides a method for determining a display position. Word segmentation processing is first performed on the search information to obtain multiple types of word units, and the similarity between each media content associated with the search information and the search information is determined using the different types of word units. Because the similarity is determined using multiple types of word units, the similarity between the search information and the media content can be considered from multiple perspectives, yielding a more accurate similarity. When the display position of each media content associated with the search information is then determined based on the similarity, a more accurate display position can be obtained, so that when the media content associated with the search information is pushed to the client, the accuracy of the display position of the pushed information is improved.
In order to facilitate understanding of the present embodiment, a method for determining a display position disclosed in the embodiments of the present disclosure is first described in detail, and an execution subject of the method for determining a display position provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, such as a server. In some possible implementations, the determination method may be implemented by way of a processor calling computer readable instructions stored in a memory.
Referring to Fig. 1, a flowchart of a method for determining a display position according to an embodiment of the present disclosure is shown, where the method for determining a display position includes the following specific steps S101 to S103:
S101, obtaining search information, and performing word segmentation processing on the search information to obtain multiple types of word units related to the search information.
The search information may be carried in a search request sent by a user, or may be currently popular search information obtained from statistics.
Here, the word units obtained by word segmentation may specifically include words, phrases, prefixes and prefix phrases contained in the search information, which will be specifically described below.
S102, for each media content in a plurality of media contents associated with the search information, determining similarity between the media content and the search information based on text information associated with the media content and a plurality of types of word units associated with the search information.
When searching for the media content associated with the search information based on the search information, the keyword included in the search information may be obtained first, and then the media content including the keyword is searched for in the media content library as the media content associated with the search information.
A keyword here is a word unit that carries actual meaning; it can be obtained by performing word segmentation processing on the search information and then removing the stop words from the resulting word units.
Specifically, the keywords of the search information may be determined from the frequency with which each word unit appears in the media content library. This frequency may be determined from the number of media contents in the library that contain the word unit and the total number of media contents in the library; specifically, the ratio of the former to the latter may be used as the frequency of the word unit in the media content library. Word units whose frequency is lower than a set threshold are then selected as the keywords.
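The frequency-based keyword selection described above can be sketched as follows. The representation of the media content library as a list of word-unit sets, the function names and the threshold value of 0.6 are assumptions made for illustration.

```python
from typing import List, Set


def library_frequency(unit: str, library: List[Set[str]]) -> float:
    """Ratio of media contents containing the word unit to all media contents."""
    if not library:
        return 0.0
    return sum(1 for content in library if unit in content) / len(library)


def select_keywords(units: List[str], library: List[Set[str]], threshold: float) -> List[str]:
    """Keep the word units whose library frequency is below the set threshold."""
    return [unit for unit in units if library_frequency(unit, library) < threshold]


# Hypothetical library: each media content is reduced to the set of word units it contains.
library = [
    {"the", "tomato", "egg"},
    {"the", "durian"},
    {"the", "egg"},
    {"the"},
]
keywords = select_keywords(["the", "tomato", "egg"], library, threshold=0.6)
```

Here "the" appears in every media content (frequency 1.0) and is filtered out, which has the same effect as stop-word removal, while "tomato" (0.25) and "egg" (0.5) are kept.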
After the plurality of media contents associated with the search information are obtained in the above manner, for each of them, the text information associated with the media content may be searched to check whether each type of word unit associated with the search information is present, and the similarity between the media content and the search information is then determined based on the search result, as described in detail later.
S103, determining the display position of each media content in the search result display page of the client based on the similarity between each media content associated with the search information and the search information.
After the similarity between each media content associated with the search information and the search information is determined, the display position of each media content at the client may be further determined. For example, the media contents may be sorted in descending order of similarity, and the sequence number of each media content in that order is used as its display position at the client.
In the method for determining the display position provided in steps S101 to S103, word segmentation processing is performed on the search information to obtain multiple types of word units, and the similarity between each media content associated with the search information and the search information is determined using the different types of word units. Because the similarity is determined using multiple types of word units, the similarity between the search information and the media content can be considered from multiple perspectives, yielding a more accurate similarity. When the display position of each media content associated with the search information is then determined based on the similarity, a more accurate display position can be obtained, so that when the media content associated with the search information is pushed to the client, the accuracy of the display position of the pushed information is improved.
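The descending-order example can be sketched as follows. The media content identifiers and similarity values are hypothetical, and numbering positions from 1 is an assumption.

```python
from typing import Dict


def assign_positions(similarities: Dict[str, float]) -> Dict[str, int]:
    """Sort media contents by similarity, highest first, and number them from 1."""
    ranked = sorted(similarities, key=lambda media_id: similarities[media_id], reverse=True)
    return {media_id: position for position, media_id in enumerate(ranked, start=1)}


# Hypothetical similarities between three media contents and the search information.
positions = assign_positions({"a": 0.2, "b": 0.9, "c": 0.5})
```

Python's sort is stable, so media contents with equal similarity keep their original relative order in this sketch; the disclosure does not say how ties should be broken.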
The above-described S101 to S103 will be described with reference to specific examples.
For the above S101, when performing word segmentation processing on the search information to obtain multiple types of word units associated with the search information, the following (1) to (4) may be included:
(1) Performing word segmentation processing on the search information according to the language type to which the search information belongs to obtain a plurality of words contained in the search information.
The search information may be segmented according to a preset segmentation dictionary matching the language type. The language type may include Chinese, English and so on. When the search information is in Chinese, it may be segmented according to a Chinese segmentation dictionary to obtain a plurality of Chinese words; when it is in English, it may be segmented according to an English segmentation dictionary to obtain a plurality of English words.
For example, taking English search information, if the search information is "famous relative check", word segmentation yields the words "famous", "relative" and "check".
(2) Extracting at least two consecutive words capable of forming a phrase under the language type, according to the position of each word in the search information, to obtain the phrases contained in the search information.
Here, the position of each word in the search information refers to the order of the word within the search information. Taking the search information "famous relative check" as an example, the positions of "famous", "relative" and "check" are 1, 2 and 3 respectively. These positions are used to judge whether words are consecutive when determining the phrases contained in the search information. For example, the phrases obtained here include "famous relative", "relative check" and "famous relative check", that is, the phrase formed by the words at positions 1 and 2, the phrase formed by the words at positions 2 and 3, and the phrase formed by the words at positions 1, 2 and 3.
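The phrase extraction in this example can be sketched as an enumeration of every run of at least two consecutive words. Treating every such run as a valid phrase is an assumption; the disclosure additionally requires that the words be capable of forming a phrase under the language type, which this sketch does not check. Splitting on whitespace stands in for dictionary-based segmentation.

```python
from typing import List


def contiguous_phrases(words: List[str]) -> List[str]:
    """Return every run of two or more consecutive words, shorter runs first."""
    phrases: List[str] = []
    for size in range(2, len(words) + 1):
        for start in range(0, len(words) - size + 1):
            phrases.append(" ".join(words[start:start + size]))
    return phrases


# Whitespace splitting is a stand-in for the dictionary-based segmentation of step (1).
words = "famous relative check".split()
phrases = contiguous_phrases(words)
```

For the three words at positions 1, 2 and 3 this produces the same three phrases as the example above.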
In particular, a phrase is introduced here, the space between words included in the phrase may not be limited, for example, "false relative" may also be expressed as "false relative", so that a problem of word failure caused by that some words included in the text information associated with the media content are not divided, for example, any media content associated with the search information, if the text information associated with the media content includes "false relative", the phrase is not divided into "false" and "relative", and if the text information associated with the media content still includes "false" or "relative", the phrase may be returned to be not found, that is, the word is failed.
(3) Extracting the prefix of each word to obtain the prefixes contained in the search information, and extracting the prefix of each word in each phrase to obtain the prefix phrases contained in the search information.
For Chinese, the prefix is a word-forming component placed before the root of a word; for example, the 'A' in 'argo' and 'Ali' is a prefix. English words can be divided into three parts: prefix, root (stem) and suffix, where the part of the word located in front of the root is the prefix.
The prefix of a word is introduced in the embodiment of the present disclosure mainly to overcome the word-match failure caused by changes in the singular or plural form and the tense of a word. For example, suppose the search information includes the word "durian". The singular English form is "durian" and the plural form is "durians". If the text information associated with the media content uses the plural form, searching that text information for "durian" may find nothing; that is, the word match fails. However, the prefixes of "durians" and "durian" can be the same, e.g., both may be represented by "dur", so matching on the prefix overcomes the failure caused by changes in number and tense.
In addition, when the prefix of a word is extracted, the length of the prefix must be sufficient. In general, the length of the extracted prefix may be half the length of the word, and the prefix should contain at least 3 letters, so that when the prefix is searched for in the text information associated with the media content, the word found based on the prefix matches the word to which the prefix belongs to a high degree. If the prefix is short, for example fewer than 3 letters, the word found in the text information based on the prefix may not match the word to which the prefix belongs. For example, if "re" is taken as the prefix of "relative" and "re" is searched for in the text information associated with the media content, the word found may be unrelated to "relative".
For this reason, prefixes are extracted only from words whose length exceeds a set value, for example, words longer than 3 letters; similarly, prefix phrases are extracted only from phrases composed of words whose length exceeds the set value.
(4) Taking the words, phrases, prefixes and prefix phrases contained in the search information as the multiple types of word units associated with the search information.
For the search information, the words, phrases, prefixes and prefix phrases extracted in (1) to (3) above are used as the multiple types of word units associated with the search information.
In this way, multiple types of word units representing the search information can be obtained. When the relevance between the search information and each media content associated with it is determined using the different types of word units, the word-match failures caused by singular or plural forms and tense, and those caused by unsegmented text information associated with the media content, are overcome. The relevance obtained is therefore more accurate, the media content with high relevance better matches the search intention of the user, and the display position determined based on the relevance is more accurate.
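As an illustration of steps (1) to (4), the following Python sketch builds the four types of word units for an English query. It is a minimal sketch under stated assumptions, not the implementation of the disclosure: whitespace splitting stands in for the dictionary-based segmentation, every run of two or more consecutive words is treated as a phrase, the prefix length is half the word length rounded down with a floor of 3 letters, and the names `build_word_units`, `prefix_of`, `MIN_PREFIX_LEN` and `MIN_WORD_LEN` are hypothetical.

```python
from typing import Dict, List

MIN_PREFIX_LEN = 3  # a prefix keeps at least 3 letters
MIN_WORD_LEN = 3    # only words longer than this get a prefix (assumed set value)


def prefix_of(word: str) -> str:
    """Half of the word (rounded down, an assumption), but never fewer than 3 letters."""
    return word[:max(MIN_PREFIX_LEN, len(word) // 2)]


def build_word_units(query: str) -> Dict[str, List[str]]:
    # Assumption: whitespace splitting replaces the segmentation dictionary.
    words = query.lower().split()
    # Every run of two or more consecutive words, shortest runs first.
    runs = [words[i:i + n]
            for n in range(2, len(words) + 1)
            for i in range(len(words) - n + 1)]
    return {
        "word": words,
        "phrase": [" ".join(run) for run in runs],
        "prefix": [prefix_of(w) for w in words if len(w) > MIN_WORD_LEN],
        # Prefix phrases only for phrases whose words all exceed the set length.
        "prefix_phrase": [" ".join(prefix_of(w) for w in run)
                          for run in runs
                          if all(len(w) > MIN_WORD_LEN for w in run)],
    }


units = build_word_units("famous relative check")
for unit_type, values in units.items():
    print(unit_type, values)
```

With these assumptions, "durian" and "durians" share the prefix "dur", which is the effect the prefix type is meant to achieve.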
Regarding the above step S102, when determining the similarity between each media content associated with the search information and the search information based on the text information associated with the media content and the plurality of types of word units associated with the search information, as shown in fig. 2, the following steps S201 to S203 may be included:
S201, for each media content associated with the search information, obtaining the text information associated with the media content.
The text information associated with the media content refers to text information included in the media content, and specifically, after each media content associated with the search information is obtained, the text information associated with the media content may be obtained as follows:
(1) Acquiring text information of the media content under different dimensions;
(2) Splicing the text information under each dimension to obtain the text information associated with the media content.
Each media content may include a media title, music content, the user name of the publisher of the media content, and the like, which may be referred to as different dimensions. When the media content associated with the search information is searched for in the media content library based on the search information, it may be retrieved according to different matching dimensions; for example, media content whose media title contains the search information, media content whose music content contains the search information, and media content whose user name contains the search information may each be retrieved, so that the media content matching the search information is obtained.
The text information of the media content under different dimensions is the text information corresponding to the media title, the music content, the user name of the publisher, and the like. The text information corresponding to these dimensions is spliced to obtain the text information associated with the media content; during splicing, the pieces of text information can be connected by a preset symbol, for example, "#".
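The splicing in S201 can be sketched as follows. This is only an illustrative sketch: the function name `splice_text`, the dimension keys and the sample values are hypothetical, and skipping empty dimensions is an assumption that the text does not state.

```python
from typing import Mapping

SEPARATOR = "#"  # the preset symbol used to connect the dimensions


def splice_text(dimensions: Mapping[str, str], separator: str = SEPARATOR) -> str:
    """Join the text of every non-empty dimension with the preset symbol."""
    parts = [text.strip() for text in dimensions.values() if text and text.strip()]
    return separator.join(parts)


# Hypothetical media content with three dimensions.
media = {
    "title": "makeme famous",
    "music": "original sound",
    "user_name": "some_user",
}
print(splice_text(media))
```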
S202, searching a target word unit matched with the word unit associated with the search information in the text information associated with the media content.
Word units that can be found in the text information associated with the media content and that are consistent with the word units associated with the search information are taken as target word units.
For example, when the search information is "famous relative check", the word units associated with the search information may include the words "famous", "relative" and "check"; the phrases "famous relative", "relative check" and "famous relative check", or "famousrelative", "relativecheck" and "famousrelativecheck"; the prefixes "fam", "rela" and "che"; and the prefix phrases "fam rela", "rela che" and "fam rela che". If the text information associated with the media content is "case, lou # casielou # adh check tiktok for you related reusable makethviral makemefacial facial", the target word units include: "famous", "check", "fam", "rela", and "che".
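One possible reading of the lookup in S202 is sketched below. The matching rules are assumptions, since the text does not fix them: the spliced text is split into tokens on whitespace, "#" and ","; a word must equal a token; a prefix must start a token; a phrase must appear as consecutive tokens or as one unsegmented token; and a prefix phrase must start consecutive tokens. The names `tokenize`, `has_run` and `find_target_units`, and the sample text, are hypothetical.

```python
import re
from typing import Callable, Dict, List


def tokenize(text: str) -> List[str]:
    # Assumption: whitespace, "#" and "," separate tokens in the spliced text.
    return [t for t in re.split(r"[\s#,]+", text.lower()) if t]


def has_run(tokens: List[str], parts: List[str],
            match: Callable[[str, str], bool]) -> bool:
    """True if some run of consecutive tokens matches parts position by position."""
    n = len(parts)
    return any(
        all(match(tokens[i + j], parts[j]) for j in range(n))
        for i in range(len(tokens) - n + 1)
    )


def find_target_units(text: str, units: Dict[str, List[str]]) -> Dict[str, List[str]]:
    tokens = tokenize(text)
    same = lambda token, part: token == part
    starts = lambda token, part: token.startswith(part)
    return {
        "word": [w for w in units.get("word", []) if w in tokens],
        # A phrase also matches when its words are written without spaces.
        "phrase": [p for p in units.get("phrase", [])
                   if "".join(p.split()) in tokens or has_run(tokens, p.split(), same)],
        "prefix": [p for p in units.get("prefix", []) if has_run(tokens, [p], starts)],
        "prefix_phrase": [p for p in units.get("prefix_phrase", [])
                          if has_run(tokens, p.split(), starts)],
    }


units = {
    "word": ["famous", "relative", "check"],
    "phrase": ["famous relative", "relative check", "famous relative check"],
    "prefix": ["fam", "rela", "che"],
    "prefix_phrase": ["fam rela", "rela che", "fam rela che"],
}
targets = find_target_units("makeme famous # adhd check # related videos", units)
print(targets)
```

In this sketch a run of consecutive tokens may span two spliced dimensions; a stricter variant could match within each dimension separately.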
S203, acquiring the predetermined importance of each target word unit, and determining the similarity between the media content and the search information based on the importance of each target word unit.
Here, the importance of each target word unit may be represented by a score, and the higher the corresponding score, the higher the importance of the target word unit.
Some word units, such as stop words, appear in the media content library with high frequency; since almost all the media content in the library contains such stop words, the importance of such word units is low. Specifically, the importance of each target word unit can be determined from the number of media contents corresponding to the target word unit in the media content library and the total number of media contents in the media content library.
For example, when determining the importance of any target word unit, the number of media contents corresponding to the target word unit in the media content library may be divided by the total number of media contents in the media content library, and the obtained quotient is then further processed to obtain a score representing the importance of the target word unit.
Specifically, the number of media contents corresponding to a word unit in the media content library refers to the number of media contents in the library that contain the word unit. Since this number is smaller, and usually much smaller, than the total number of media contents in the library, the quotient obtained above for any target word unit is generally smaller than 1. In order to make the score positively correlated with the importance, the negative logarithm of the quotient may be taken to obtain the score representing the importance of the target word unit.
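The score described above has the form of an inverse document frequency. A minimal sketch follows, assuming the natural logarithm (the text does not specify the base); the function name `importance` and the sample counts are hypothetical.

```python
import math


def importance(contents_with_unit: int, total_contents: int) -> float:
    """Negative logarithm of the share of media contents containing the word unit."""
    if not 0 < contents_with_unit <= total_contents:
        raise ValueError("expected 0 < contents_with_unit <= total_contents")
    # log(total / n) equals -log(n / total); this form avoids returning -0.0.
    return math.log(total_contents / contents_with_unit)


# Hypothetical library of 1000 media contents.
rare = importance(10, 1000)     # a word unit found in few media contents
common = importance(900, 1000)  # a stop-word-like unit found almost everywhere
print(rare > common)
```

A word unit contained in every media content gets a score of 0, which matches the intent that stop words carry little importance.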
After obtaining the importance of each target word unit, the similarity between the media content and the search information may be determined based on the importance of each target word unit, which specifically includes:
(1) Determining the sum of the importance of each type of word unit associated with the search information based on the importance of each target word unit and the type to which the target word unit belongs.
The obtained target word units may include one or more of a word type, a prefix type, a phrase type, and a prefix phrase type, where the importance of the target word units belonging to the same type may be summed based on the importance of each target word unit and the type to which the target word unit belongs, so as to obtain the sum of the importance of each type of word unit associated with the search information.
(2) Determining the similarity between the media content and the search information based on the sum of the importance of each type of word unit associated with the search information.
Specifically, when determining the similarity between the media content and the search information based on the sum of the importance of each type of word unit associated with the search information, the similarity may be determined in the following two ways:
The first manner: multiplying together the sums of the importance of each type of word unit associated with the search information to obtain the similarity between the media content and the search information.
For example, the sum of the importance of the word units of the word type is L1, the sum of the importance of the word units of the prefix type is L2, the sum of the importance of the word units of the phrase type is L3, and the sum of the importance of the prefix phrase type is L4, where L1, L2, L3, and L4 are multiplied to obtain a product as the similarity between the media content and the search information.
In particular, a certain media content associated with the search information may not include target word units of all types, so the sum of the importance of a certain type of word unit may be 0. In order to prevent the similarity from being 0, a set natural number (for example, 1) may be added to the sum of the importance of each type of word unit before the sums are multiplied.
The second manner: performing a weighted summation of the sums of the importance of each type of word unit associated with the search information to obtain the similarity between the media content and the search information.
Before weighted summation, a weight corresponding to each type of word unit needs to be obtained, where the weight may be set in advance, for example, a weight corresponding to each type of word unit when determining similarity may be determined according to a large amount of data statistics, then, based on the weight, weighted summation is performed on the sum of importance levels of each type of word unit, and a final summation result is used as the similarity between the media content and the search information.
How to determine the similarity between any media content associated with the search information and the search information is described below with respect to a specific embodiment:
For example, the search information is "famous relative check", and the text information associated with a media content is "case, # caseelou # adhd check tiktok for you relative changeable makethvironemefural viral # # makehighvirovascular makefamous viral". The target word units here include "famous", "check", "fam", "rela" and "che", which belong to two types of word units, namely word units of the word type and word units of the prefix type. If the predetermined importance of "famous" is 2.45, the importance of "check" is 2.24, the importance of "fam" is 2.45, the importance of "che" is 2.24, and the importance of "rela" is 2.83, then the sum of the importance of the word units of the word type is 4.69, the sum of the importance of the word units of the prefix type is 7.52, and the sum of the importance of the word units of the other types is 0.
If the similarity between the search information and the media content is determined in the first manner with the set natural number being 1, the similarity can be obtained as (1+4.69) × (1+7.52) × (1+0) = 48.4788; if the similarity is determined in the second manner, with the weight of the word units of the word type being 0.25 and the weight of the word units of the prefix type being 0.1, the similarity can be obtained as 0.25 × 4.69 + 0.1 × 7.52 = 1.9245.
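The two manners can be checked against the worked example with a short Python sketch. The importance values and the weights 0.25 and 0.1 come from the example above; the function names `type_sums`, `similarity_product` and `similarity_weighted` are hypothetical, and treating a missing weight as 0 is an assumption.

```python
from typing import Dict, Mapping


def type_sums(targets: Mapping[str, Mapping[str, float]]) -> Dict[str, float]:
    """Sum the importance of the target word units of each type."""
    return {unit_type: sum(units.values()) for unit_type, units in targets.items()}


def similarity_product(sums: Mapping[str, float], offset: float = 1.0) -> float:
    """First manner: multiply the per-type sums, each shifted by the set natural number."""
    result = 1.0
    for total in sums.values():
        result *= offset + total
    return result


def similarity_weighted(sums: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Second manner: weighted sum of the per-type sums (missing weight counts as 0)."""
    return sum(weights.get(unit_type, 0.0) * total for unit_type, total in sums.items())


targets = {
    "word": {"famous": 2.45, "check": 2.24},
    "prefix": {"fam": 2.45, "che": 2.24, "rela": 2.83},
    "phrase": {},
    "prefix_phrase": {},
}
sums = type_sums(targets)
print(round(similarity_product(sums), 4))
print(round(similarity_weighted(sums, {"word": 0.25, "prefix": 0.1}), 4))
```

Types with no target word unit contribute a factor of 1 in the first manner and a term of 0 in the second, so neither manner is affected by how many empty types are listed.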
In another implementation manner, before determining, in step S103, a presentation position of each media content in the search result presentation page of the client based on a similarity between each media content associated with the search information and the search information, the determination method provided by the embodiment of the present disclosure further includes:
Acquiring user behavior data corresponding to each media content.
The user behavior data corresponding to each media content refers to data on the operations performed by users on the media content when it is played at the client; for example, when the media content is a video, the user behavior data may include the number of likes, the number of comments, the playing duration, and the click rate of the video when it is displayed at the client.
The click rate corresponding to the media content when the media content is displayed at the client can be determined by the ratio of the click times of the media content to the displayed times within the latest set time, for example, the server counts the click rate corresponding to each media content every day at a fixed time.
Thus, when determining the presentation position of each media content in the search result presentation page of the client based on the similarity between each media content associated with the search information and the search information, the method includes:
Determining the display position of each media content in the search result display page of the client based on the similarity between each media content and the search information and the user behavior data corresponding to the media content.
The similarity and user behavior data may be weighted and summed to determine a score that characterizes the presentation location.
Specifically, the number of likes, the number of comments, the playing duration, and the click rate included in the user behavior data mentioned above may each be converted into a corresponding score, and these scores and the score representing the similarity are then subjected to weighted summation to obtain a score representing the display position, where the weights corresponding to the user behavior data and the similarity may be determined according to historical data statistics.
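A hedged sketch of this weighted summation, followed by the assignment of display positions, is given below. Everything numeric in it is an assumption for illustration: the weights in `WEIGHTS`, the use of `log1p` to convert counts and durations into scores, and the rule that a higher score yields an earlier position. The names `click_rate`, `ranking_score` and `display_positions` and the sample data are hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical weights; the text says they may come from historical data statistics.
WEIGHTS = {"similarity": 0.6, "likes": 0.1, "comments": 0.1, "play": 0.1, "ctr": 0.1}


def click_rate(clicks: int, impressions: int) -> float:
    """Clicks divided by the number of times shown within the latest set period."""
    return clicks / impressions if impressions else 0.0


def ranking_score(media: Dict) -> float:
    # Assumption: log1p dampens raw counts and durations before weighting.
    return (WEIGHTS["similarity"] * media["similarity"]
            + WEIGHTS["likes"] * math.log1p(media["likes"])
            + WEIGHTS["comments"] * math.log1p(media["comments"])
            + WEIGHTS["play"] * math.log1p(media["play_seconds"])
            + WEIGHTS["ctr"] * click_rate(media["clicks"], media["impressions"]))


def display_positions(candidates: List[Dict]) -> Dict[str, int]:
    """Position 1 goes to the highest score (assumed ordering)."""
    ordered = sorted(candidates, key=ranking_score, reverse=True)
    return {media["media_id"]: pos for pos, media in enumerate(ordered, start=1)}


a = {"media_id": "a", "similarity": 48.48, "likes": 10, "comments": 2,
     "play_seconds": 30.0, "clicks": 5, "impressions": 100}
b = dict(a, media_id="b", similarity=1.92)
print(display_positions([b, a]))
```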
Further, after determining the display position of each media content in the search result display page of the client, the determining method further includes:
Sending the plurality of media contents and the display positions corresponding to the plurality of media contents to the client.
The steps S101 to S103 may be performed after receiving a search request including search information sent by a client, so that after obtaining a display position of each media content in a search result display page of the client based on the steps S101 to S103, the obtained display positions corresponding to the plurality of media contents and the plurality of media contents associated with the search information may be sent to the client, and then after receiving the plurality of media contents, the client may display the media contents in the search result display page according to the display position corresponding to each media content.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same technical concept, a display position determining device corresponding to the display position determining method is further provided in the embodiment of the present disclosure, and as the principle of solving the problem of the device in the embodiment of the present disclosure is similar to the display position determining method in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 3, a schematic structural diagram of a display position determining apparatus 300 according to an embodiment of the present disclosure is provided, where the display position determining apparatus includes: an information processing module 301, a first determining module 302, and a second determining module 303.
The information processing module 301 is configured to obtain search information, and perform word segmentation processing on the search information to obtain multiple types of word units associated with the search information;
a first determining module 302, configured to determine, for each media content of a plurality of media contents associated with search information, a similarity between the media content and the search information based on text information associated with the media content and a plurality of types of word units associated with the search information;
a second determining module 303, configured to determine, based on a similarity between each media content associated with the search information and the search information, a display position of each media content in the search result display page of the client.
In a possible implementation, the information processing module 301, when configured to perform word segmentation processing on the search information to obtain multiple types of word units associated with the search information, includes:
performing word segmentation processing on the search information according to the language type to which the search information belongs to obtain a plurality of words contained in the search information;
extracting at least two continuous words capable of forming phrases under the language type according to the position of each word in the search information to obtain the phrases contained in the search information;
extracting prefixes of the words aiming at the words with the lengths exceeding the set value to obtain prefixes contained in the search information, and extracting prefixes of the words in the phrases aiming at each phrase formed by the words with the lengths exceeding the set value to obtain prefix phrases contained in the search information;
and taking words, phrases, prefixes and prefix phrases contained in the search information as various types of word units related to the search information.
In one possible implementation, the first determining module 302, when configured to determine, for each media content associated with search information, a similarity between the media content and the search information based on text information associated with the media content and a plurality of types of word units associated with the search information, includes:
acquiring text information associated with each media content associated with the search information;
searching a target word unit matched with the word unit associated with the search information in the text information associated with the media content;
and acquiring the predetermined importance of each target word unit, and determining the similarity between the media content and the search information based on the importance of each target word unit.
In one possible implementation, the first determining module 302, when configured to obtain, for each media content associated with the search information, text information associated with the media content, includes:
acquiring text information of the media content under different dimensions;
and splicing the text information under each dimension to obtain the text information associated with the media content.
In one possible implementation, the first determining module 302 pre-determines the importance of each target word unit as follows:
and determining the importance of each target word unit based on the corresponding media content quantity of each target word unit in the media content library and the total media content quantity in the media content library.
In one possible implementation, the first determining module 302, when configured to determine the similarity between the media content and the search information based on the importance of each target word unit, includes:
determining the sum of the importance of each type of word unit related to the search information based on the importance of each target word unit and the type of the target word unit;
and determining the similarity between the media content and the search information based on the sum of the importance of each type of word unit associated with the search information.
In one possible implementation, the first determining module 302, when configured to determine the similarity between the media content and the search information based on the sum of the importance of each type of word unit associated with the search information, includes:
multiplying together the sums of the importance of each type of word unit associated with the search information to obtain the similarity between the media content and the search information; or,
and carrying out weighted summation on the sum of the importance degrees of each type of word unit associated with the search information to obtain the similarity between the media content and the search information.
In a possible implementation manner, before determining a presentation position of each media content in the search result presentation page of the client based on a similarity between each media content associated with the search information and the search information, the second determining module 303 is further configured to:
acquiring user behavior data corresponding to each media content;
the second determining module 303, when configured to determine a presentation position of each media content in the search result presentation page of the client based on a similarity between each media content associated with the search information and the search information, includes:
and determining the display position of each media content in the search result display page of the client based on the similarity between each media content and the search information and the user behavior data corresponding to the media content.
In a possible implementation manner, the determining apparatus further includes a sending module 304, and after the second determining module 303 determines the presentation position of each media content in the search result presentation page of the client, the sending module 304 is configured to:
and sending the display positions corresponding to the plurality of media contents and the plurality of media contents to the client.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Corresponding to the method for determining the display position in fig. 1, an embodiment of the present disclosure further provides an electronic device 400, and as shown in fig. 4, a schematic structural diagram of the electronic device 400 provided in the embodiment of the present disclosure includes:
a processor 41, a memory 42, and a bus 43; the memory 42 is used for storing execution instructions and includes an internal memory 421 and an external memory 422; the internal memory 421 is used for temporarily storing the operation data in the processor 41 and the data exchanged with the external memory 422 such as a hard disk; the processor 41 exchanges data with the external memory 422 through the internal memory 421; and when the electronic device 400 operates, the processor 41 communicates with the memory 42 through the bus 43, so that the processor 41 executes the following instructions: acquiring search information, and performing word segmentation processing on the search information to obtain multiple types of word units associated with the search information; for each media content in a plurality of media contents associated with the search information, determining the similarity between the media content and the search information based on text information associated with the media content and the multiple types of word units associated with the search information; and determining the display position of each media content in the search result display page of the client based on the similarity between each media content associated with the search information and the search information.
The embodiment of the present disclosure also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for determining a display position in the above method embodiments are executed. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the method for determining the display position provided in the embodiments of the present disclosure includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute steps of the method for determining the display position in the above method embodiments, which may be referred to in the above method embodiments specifically, and are not described herein again.
The embodiments of the present disclosure also provide a computer program, which when executed by a processor implements any one of the methods of the foregoing embodiments. The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (12)

1. A method for determining a display position, comprising:
acquiring search information, and performing word segmentation processing on the search information to obtain multiple types of word units related to the search information;
for each media content in a plurality of media contents associated with the search information, determining similarity between the media content and the search information based on text information associated with the media content and a plurality of types of word units associated with the search information;
and determining the display position of each media content in the search result display page of the client based on the similarity between each media content associated with the search information and the search information.
2. The method for determining according to claim 1, wherein the performing word segmentation processing on the search information to obtain multiple types of word units associated with the search information includes:
performing word segmentation processing on the search information according to the language type of the search information to obtain a plurality of words contained in the search information;
extracting at least two continuous words capable of forming phrases under the language type according to the position of each word in the search information to obtain the phrases contained in the search information;
for each word with the length exceeding a set value, extracting the prefix of the word to obtain the prefix contained in the search information, and for each phrase formed by the words with the length exceeding the set value, extracting the prefix of each word in the phrase to obtain the prefix phrase contained in the search information;
and taking the words, phrases, prefixes and prefix phrases contained in the search information as various types of word units related to the search information.
3. The determination method according to claim 1 or 2, wherein the determining, for each media content associated with the search information, the similarity between the media content and the search information based on text information associated with the media content and a plurality of types of word units associated with the search information comprises:
acquiring text information associated with each media content associated with the search information;
searching a target word unit matched with the word unit associated with the search information in the text information associated with the media content;
and acquiring the predetermined importance of each target word unit, and determining the similarity between the media content and the search information based on the importance of each target word unit.
4. The method for determining according to claim 3, wherein the obtaining, for each media content associated with the search information, text information associated with the media content comprises:
acquiring text information of the media content under different dimensions;
and splicing the text information under each dimension to obtain the text information associated with the media content.
5. The determination method according to claim 3, wherein the importance of each target word unit is predetermined in the following manner:
determining the importance of each target word unit based on the number of media contents in the media content library that correspond to the target word unit and the total number of media contents in the media content library.
6. The method for determining according to claim 3, wherein said determining similarity between the media content and the search information based on the importance of each target word unit comprises:
determining the sum of the importance of each type of word unit related to the search information based on the importance of each target word unit and the type of the target word unit;
and determining the similarity between the media content and the search information based on the sum of the importance of each type of word unit associated with the search information.
7. The determination method according to claim 6, wherein determining the similarity between the media content and the search information based on the sum of the importance of each type of word unit associated with the search information comprises:
multiplying together the sums of the importance of each type of word unit associated with the search information to obtain the similarity between the media content and the search information; or,
and carrying out weighted summation on the sum of the importance degrees of each type of word unit associated with the search information to obtain the similarity between the media content and the search information.
8. The method for determining according to claim 1, wherein before the determining of the display position of each media content in the search result display page of the client based on the similarity between each media content associated with the search information and the search information, the method further comprises:
acquiring user behavior data corresponding to each media content;
and the determining, based on the similarity between each media content associated with the search information and the search information, of the display position of each media content in the search result display page of the client comprises:
determining the display position of each media content in the search result display page of the client based on the similarity between each media content and the search information and the user behavior data corresponding to the media content.
9. The method for determining according to claim 1, wherein after determining the display position of each media content in the search result display page of the client, the method further comprises:
sending the plurality of media contents and the display positions corresponding to the plurality of media contents to the client.
10. An apparatus for determining a display position, comprising:
the information processing module is used for acquiring search information and performing word segmentation processing on the search information to obtain multiple types of word units related to the search information;
a first determining module, configured to determine, for each media content in a plurality of media contents associated with the search information, a similarity between the media content and the search information based on text information associated with the media content and a plurality of types of word units associated with the search information;
and the second determination module is used for determining the display position of each media content in the search result display page of the client based on the similarity between each media content associated with the search information and the search information.
11. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the determination method according to any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, performs the steps of the determination method according to one of claims 1 to 9.
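
The flow recited in claims 1 to 7 can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure, and several details are assumptions made only for illustration: whitespace word segmentation, treating any two adjacent words as a phrase, the hypothetical constants SET_VALUE and PREFIX_LEN, an IDF-style formula log(1 + N/n) for the importance of claim 5 (the claims state only that importance depends on the two counts), and the function names word_units, build_importance, similarity and rank.

```python
import math

SET_VALUE = 4    # hypothetical length threshold (the "set value" of claim 2)
PREFIX_LEN = 3   # hypothetical prefix length

def word_units(text):
    """Split text into the four unit types of claim 2 (whitespace segmentation assumed)."""
    words = text.lower().split()
    units = {"word": set(words), "phrase": set(), "prefix": set(), "prefix_phrase": set()}
    units["prefix"] = {w[:PREFIX_LEN] for w in words if len(w) > SET_VALUE}
    for a, b in zip(words, words[1:]):  # assumption: any two adjacent words form a phrase
        units["phrase"].add(a + " " + b)
        if len(a) > SET_VALUE and len(b) > SET_VALUE:
            units["prefix_phrase"].add(a[:PREFIX_LEN] + " " + b[:PREFIX_LEN])
    return units

def build_importance(library_units):
    """IDF-style importance per (type, unit), computed from media counts (claim 5)."""
    total = len(library_units)
    counts = {}
    for units in library_units:
        for kind, values in units.items():
            for u in values:
                counts[(kind, u)] = counts.get((kind, u), 0) + 1
    return {key: math.log(1 + total / n) for key, n in counts.items()}

def similarity(query_units, media_units, importance, weights=None):
    """Combine per-type importance sums by product or weighted sum (claims 6 and 7)."""
    sums = {}
    for kind, q in query_units.items():
        matched = q & media_units[kind]  # target word units found in the media text (claim 3)
        sums[kind] = sum(importance[(kind, u)] for u in matched)
    if weights is None:
        return math.prod(sums.values())                       # product variant
    return sum(weights[kind] * s for kind, s in sums.items())  # weighted-sum variant

def rank(query, library, weights=None):
    """Return (media ids ordered by descending similarity, scores) as in claim 1."""
    # Claim 4: splice the text of every dimension (e.g. title, description) into one string.
    media_units = {mid: word_units(" ".join(fields.values())) for mid, fields in library.items()}
    importance = build_importance(list(media_units.values()))
    q = word_units(query)
    scores = {mid: similarity(q, mu, importance, weights) for mid, mu in media_units.items()}
    return sorted(scores, key=scores.get, reverse=True), scores
```

In this sketch the product variant returns zero whenever one unit type has no match, so the weighted-sum variant is the more forgiving choice for short queries; the user behavior data of claim 8 is not modeled.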
CN202010439248.3A 2020-05-22 2020-05-22 Display position determining method and device, electronic equipment and storage medium Pending CN111597471A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010439248.3A CN111597471A (en) 2020-05-22 2020-05-22 Display position determining method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111597471A true CN111597471A (en) 2020-08-28

Family

ID=72183136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010439248.3A Pending CN111597471A (en) 2020-05-22 2020-05-22 Display position determining method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111597471A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104850609A (en) * 2015-05-08 2015-08-19 湖北光谷天下传媒股份有限公司 Filtering method aiming at character-skipping keywords
US20150286718A1 (en) * 2014-04-04 2015-10-08 Fujitsu Limited Topic identification in lecture videos
CN104978314A (en) * 2014-04-01 2015-10-14 深圳市腾讯计算机系统有限公司 Media content recommendation method and device
US20160147878A1 (en) * 2014-11-21 2016-05-26 Inbenta Professional Services, L.C. Semantic search engine
CN108021566A (en) * 2016-10-31 2018-05-11 方正国际软件(北京)有限公司 A kind of search method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination