CN117390204A - Object proxy mining method, device, computer equipment and storage medium - Google Patents

Publication number
CN117390204A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210783692.6A
Other languages
Chinese (zh)
Inventor
陈小帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Beijing Co Ltd
Original Assignee
Tencent Technology Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Tencent Technology Beijing Co Ltd filed Critical Tencent Technology Beijing Co Ltd
Priority to CN202210783692.6A priority Critical patent/CN117390204A/en
Publication of CN117390204A publication Critical patent/CN117390204A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43 Querying
    • G06F16/435 Filtering based on additional data, e.g. user or group profiles
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis

Abstract

The present application relates to an object proxy mining method, apparatus, computer device, storage medium, and computer program product. The method comprises: acquiring media interaction data corresponding to media data related to a target object; screening, from the media interaction data, target interaction data that interacts with the target object; mining a plurality of alternative aliases corresponding to the target object from the target interaction data; verifying the validity of each alternative alias based on at least one of the formal name of the target object or media content related to the target object in the media data; and determining the target alias belonging to the target object based on the alternative aliases that pass the validity verification. With this method, incorrect aliases and aliases unrelated to the target object can be removed. The target alias belonging to the target object can be mined efficiently and accurately without manual review or screening at any stage, which greatly reduces labor cost.

Description

Object proxy mining method, device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer data processing technology, and in particular, to an object proxy mining method, an object proxy mining device, a computer device, a storage medium, and a computer program product.
Background
In daily life, people often give alternative names to certain objects, such as nicknames for film and television characters, special names for public figures, or nicknames for virtual idols. Such aliases usually carry information that can serve as a basis for subsequent processing; for example, an alias may reflect a person's emotion or attitude toward the object and can be used to recommend content related to the object.
In the related art, the aliases of an object are usually obtained by crawling web pages with a crawler and then manually screening the crawled information to determine the aliases of the target object. This approach depends heavily on operators and requires manual screening, so alias mining is inefficient.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an object proxy mining method, apparatus, computer device, storage medium, and computer program product.
In one aspect, the present application provides a method for object proxy mining, the method including:
acquiring media interaction data corresponding to media data related to a target object;
screening, from the media interaction data, target interaction data that interacts with the target object;
mining a plurality of alternative aliases corresponding to the target object from the target interaction data;
verifying the validity of each alternative alias based on at least one of the formal name of the target object or media content related to the target object in the media data;
and determining the target alias belonging to the target object based on the alternative aliases that pass the validity verification.
In another aspect, the present application further provides an object proxy mining apparatus, including:
the acquisition module is used for acquiring media interaction data corresponding to the media data related to the target object;
the screening module is used for screening, from the media interaction data, target interaction data that interacts with the target object;
the mining module is used for mining a plurality of alternative aliases corresponding to the target object from the target interaction data;
the verification module is used for verifying the validity of each alternative alias based on at least one of the formal name of the target object or media content related to the target object in the media data;
and the determining module is used for determining the target alias belonging to the target object based on the alternative aliases that pass the validity verification.
In one embodiment, the screening module is specifically configured to screen out, from the media interaction data, first interaction data that includes the formal name of the target object; and to screen the target interaction data from the media interaction data according to the degree of proximity between the interaction time of the first interaction data and the interaction time of the media interaction data.
In one embodiment, the screening module is specifically configured to determine the appearance time of the target object in the media data; and to screen the target interaction data from the media interaction data according to the degree of proximity between the appearance time of the target object and the interaction time of the media interaction data.
In one embodiment, the mining module is specifically configured to obtain a plurality of word segments based on the target interaction data; for each word segment, determine, from the target interaction data, the second interaction data that includes that word segment, and determine the interaction weight of each piece of second interaction data; determine the word frequency weight of each word segment based on the interaction weights of the second interaction data corresponding to that word segment; and screen, from the plurality of word segments, the word segments that meet a preset condition as alternative aliases according to their word frequency weights.
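The word-frequency weighting described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: segmentation is plain whitespace splitting, the word frequency weight is taken to be the sum of the interaction weights of the interactions containing the segment, and the preset condition is a minimum weight. The function names and the threshold parameter are hypothetical and do not come from the application.

```python
from collections import defaultdict


def word_frequency_weights(interactions):
    """Sum interaction weights per word segment.

    interactions: list of (text, interaction_weight) pairs (hypothetical shape).
    """
    weights = defaultdict(float)
    for text, weight in interactions:
        # Assumption: a segment counts once per interaction, however
        # many times it occurs inside that interaction.
        for segment in set(text.split()):
            weights[segment] += weight
    return dict(weights)


def select_alternative_aliases(interactions, min_weight):
    """Keep segments whose word frequency weight reaches min_weight.

    Assumption: the "preset condition" is a simple lower bound.
    """
    scores = word_frequency_weights(interactions)
    return sorted(seg for seg, w in scores.items() if w >= min_weight)
```

A real system would substitute a proper word segmenter for `str.split`, since the interaction text in this setting is typically Chinese.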
In one embodiment, the mining module is further configured to obtain the degree of preference that the publishing object of each piece of second interaction data has for the target object; determine the interaction popularity of each piece of second interaction data; and determine the interaction weight of each piece of second interaction data according to its corresponding preference degree and interaction popularity.
In one embodiment, the mining module is further configured to determine a plurality of pieces of media data related to the target object and the media popularity of each piece of media data; obtain the perception completion degree of the publishing object of the second interaction data for each piece of media data; and determine the publishing object's degree of preference for the target object according to its perception completion degree for each piece of media data and the media popularity of each piece of media data.
In one embodiment, the validity verification includes a first validity verification and a second validity verification; the verification module is specifically configured to determine, from the media interaction data, first interaction data that includes the formal name of the target object; perform the first validity verification on each alternative alias based on at least one of the first interaction data or third interaction data that includes the alternative alias; and take each alternative alias that passes the first validity verification as a candidate alias, and perform the second validity verification on the candidate alias according to the media content related to the target object in the media data.
In one embodiment, the verification module is further configured to, for a current alternative alias among the plurality of alternative aliases, replace the formal name in the first interaction data with the current alternative alias to obtain a first replacement text, where the current alternative alias is any one of the alternative aliases; determine a first semantic fluency of the first replacement text; replace the current alternative alias in the third interaction data with the formal name to obtain a second replacement text, and determine a second semantic fluency of the second replacement text; and determine the first validity verification result of each alternative alias according to its corresponding first semantic fluency and second semantic fluency.
In one embodiment, the verification module is further configured to, for a current candidate alias among the plurality of candidate aliases, determine fourth interaction data that includes the current candidate alias, where the current candidate alias is any one of the candidate aliases; determine the media content in the media data that matches the fourth interaction data; when the target object appears in the media content matching a piece of fourth interaction data, determine that piece of fourth interaction data to be co-occurrence interaction data; and determine the second validity verification result of each candidate alias based on the co-occurrence interaction data corresponding to it.
In one embodiment, the verification module is further configured to, for a current candidate alias among the plurality of candidate aliases, determine the co-occurrence interaction data and the fourth interaction data corresponding to the current candidate alias; calculate a first weight sum from the interaction weights of the co-occurrence interaction data, and a second weight sum from the interaction weights of the fourth interaction data; determine the co-occurrence probability score of the current candidate alias based on the ratio between the first weight sum and the second weight sum; and determine the second validity verification result of each candidate alias according to its co-occurrence probability score.
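The co-occurrence probability score can be illustrated with a short sketch. It assumes the score is the first weight sum divided by the second weight sum, and that each piece of fourth interaction data is already labelled with whether the target object appears in its matching media content. The function name and input shape are hypothetical.

```python
def co_occurrence_score(fourth_interactions):
    """Weighted share of alias-bearing interactions that co-occur with the target.

    fourth_interactions: list of (interaction_weight, target_appears) pairs,
    one per interaction that contains the candidate alias (hypothetical shape).
    """
    # Second weight sum: all interactions containing the candidate alias.
    second_sum = sum(weight for weight, _ in fourth_interactions)
    if second_sum == 0:
        return 0.0
    # First weight sum: only the co-occurrence interactions.
    first_sum = sum(weight for weight, appears in fourth_interactions if appears)
    # Assumption: the score is the plain ratio of the two sums.
    return first_sum / second_sum
```

A candidate alias would then pass the second validity verification when this score exceeds some preset threshold; the threshold value itself is not specified here.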
In one embodiment, in the case where there are a plurality of pieces of media data related to the target object, an alternative alias that passes the validity verification is referred to as a valid alias; the determining module is specifically configured to obtain the valid aliases mined from the media interaction data of the plurality of pieces of media data and obtain the validity score of each valid alias; determine the media weight of each piece of media data; and screen the valid aliases based on the media weights of the media data and the validity scores of the valid aliases to obtain the target alias belonging to the target object.
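One way to combine media weights with validity scores is sketched below. The combination rule is an assumption, since the text does not fix one: each alias receives the sum, over the media data in which it was found, of media weight times validity score, and aliases whose total reaches a minimum score are kept. All names and parameters are hypothetical.

```python
from collections import defaultdict


def select_target_aliases(valid_aliases_by_media, media_weights, min_score):
    """Aggregate per-media validity scores into one score per alias.

    valid_aliases_by_media: media_id -> {alias: validity_score} (hypothetical)
    media_weights: media_id -> media weight (hypothetical)
    """
    totals = defaultdict(float)
    for media_id, aliases in valid_aliases_by_media.items():
        weight = media_weights.get(media_id, 0.0)
        for alias, score in aliases.items():
            # Assumption: contributions from different media data add up.
            totals[alias] += weight * score
    return sorted(alias for alias, total in totals.items() if total >= min_score)
```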
In one embodiment, the determining module is further configured to obtain a first score that the valid alias received when it was mined from the target interaction data; obtain a second score that the valid alias received during the validity verification; and determine the validity score of the valid alias based on the first score and the second score.
In one embodiment, the apparatus further comprises:
an updating module, used for, when the increment of the media interaction data reaches a preset quantity, returning to the step of screening, from the media interaction data, the target interaction data that interacts with the target object, and continuing execution so as to update the target alias of the target object.
In one embodiment, the apparatus further comprises:
a pushing module, used for, when the attention tags of an object to be recommended include the target object, taking the target alias belonging to the target object as an associated attention tag of the object to be recommended; and pushing media interaction data related to the associated attention tag to the object to be recommended.
In one embodiment, the pushing module is further configured to, when the attention tags of the object to be recommended do not include the formal name of the target object, annotate the formal name of the target object in a preset area adjacent to the display area of the media interaction data when pushing media interaction data related to each target alias to the object to be recommended.
In another aspect, the present application further provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps in the object proxy mining method when executing the computer program.
In another aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the object proxy mining method described above.
In another aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the above-described object proxy mining method.
With the above object proxy mining method, apparatus, computer device, storage medium, and computer program product, media interaction data corresponding to media data related to the target object is acquired, and target interaction data that interacts with the target object is screened out of it. Because interaction data directed at the target object often contains the various names that different users use for the target object, a plurality of alternative aliases corresponding to the target object can be mined accurately from the target interaction data. Validity verification is then performed on each alternative alias based on at least one of the formal name of the target object or the media content related to the target object in the media data, so that incorrect aliases and aliases unrelated to the target object can be removed. The target alias belonging to the target object can thus be mined efficiently and accurately without manual review or screening at any stage, which greatly reduces labor cost.
In addition, this scheme mines aliases from the media interaction data corresponding to the media data. Because media interaction data is large in volume and involves many groups of interacting users, the coverage of alias mining can be ensured; that is, the target aliases belonging to the target object can be mined comprehensively. Moreover, because media interaction data is updated over time, the aliases of the target object can be updated dynamically, which improves their timeliness.
Drawings
FIG. 1 is a diagram of an application environment for an object proxy mining method in one embodiment;
FIG. 2 is a flow diagram of a method of object proxy mining in one embodiment;
FIG. 3 is a flow chart of an object proxy mining method in another embodiment;
FIG. 4 is a schematic diagram of a sentence fluency recognition model in one embodiment;
FIG. 5 is a flow chart of an object proxy mining method in yet another embodiment;
FIG. 6 is a diagram of relationships between multiple media data in one embodiment;
FIG. 7 is a flow chart of an object proxy mining method in yet another embodiment;
FIG. 8 is a schematic diagram of video data and text bullet screens in one embodiment;
FIG. 9 is a flow chart of an object proxy mining method in yet another embodiment;
FIG. 10 is a flow chart of an object proxy mining method in yet another embodiment;
FIG. 11 is a flow chart of an object proxy mining method in yet another embodiment;
FIG. 12 is a block diagram of an object proxy mining apparatus in one embodiment;
FIG. 13 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
First, before the object proxy mining method of the present application is described in detail, some terms used in the embodiments of the present application are briefly explained:
Object: refers to the thing being processed. In the embodiments of the present application, the processing performed on an object is mainly alias mining, so the object may be a real person, a virtual character, a film or television work, a literary work, or the like, which is not specifically limited in the embodiments of the present application.
Formal name: refers to the formal name of the target object, such as a person's name, a school name, or another name; it also covers the formal names of characters in works.
Alias (also rendered as "proxy" in this document): refers to an alternative name used in place of the formal name, such as a nickname that viewers give to a character in a film or television work.
Media data: refers to carrier data for conveying information, such as text data, audio data, or video data; the embodiments of the present application do not specifically limit the data type.
Media interaction data: refers to data that is published by an object perceiving the media data, and that interacts with the media data, while the media data is being presented. The format of the media interaction data may be voice, text, image, or video, and it may be presented as a bullet screen (barrage) comment or an ordinary comment, which is not limited in the embodiments of the present application.
The object proxy mining method provided in the embodiments of the present application is described in detail below:
In some embodiments, the object proxy mining method provided in the embodiments of the present application may be applied to an application environment as shown in FIG. 1. The terminal 102 may communicate with the server 104 directly or indirectly through a wired or wireless network, which is not specifically limited in the embodiments of the present application. In addition, the terminal 102 or the server 104 may each perform the object proxy mining method on its own, or the two may perform it cooperatively, which is not specifically limited in the embodiments of the present application.
Take as an example one implementation in which the server 104 alone executes the object proxy mining method. Specifically, the server 104 may obtain media interaction data corresponding to media data related to the target object, and screen, from the media interaction data, target interaction data that interacts with the target object. The server 104 mines a plurality of alternative aliases corresponding to the target object from the target interaction data. The server 104 verifies the validity of each alternative alias based on at least one of the formal name of the target object or the media content related to the target object in the media data, and determines the target alias belonging to the target object based on the alternative aliases that pass the validity verification. A data storage system may store the media data and the acquired media interaction data for the server 104. The data storage system may be provided separately, integrated on the server 104, or placed on the cloud or on other servers.
The terminal 102 may be, but is not limited to, a desktop computer, notebook computer, smartphone, tablet computer, Internet of Things device, or portable wearable device. The Internet of Things device may be a smart speaker, smart television, smart air conditioner, smart in-vehicle device, or the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. An application program for presenting media data and media interaction data, such as a video application or an audio application, may run on the terminal 102. The server 104 may be a background server corresponding to software, a web page, an applet, or the like, or a server dedicated to alias mining, which is not specifically limited in the embodiments of the present application. The server 104 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (CDN), big data, and artificial intelligence platforms.
In some embodiments, in conjunction with the above explanation of terms, explanation of techniques, and description of the implementation environment, as shown in fig. 2, an object proxy mining method is provided, which is described by taking application of the method to a computer device (the computer device may specifically be a terminal or a server in fig. 1) as an example, and includes the following steps:
Step 202, acquiring media interaction data corresponding to media data related to the target object.
Specifically, for explanations of objects, media data, and media interaction data, refer to the term explanations above. Media data being related to the target object means that the media data involves the target object. For example, taking video as the media data and a film or television actor as the target object, a video related to the actor may be a video containing a role played by the actor, and the video may contain images of the actor. Taking video as the media data and a virtual character as the target object, a video related to the virtual character may be a video in which the virtual character performs, such as a concert video of the virtual character.
As another example, taking audio as the media data and a voice actor as the target object, audio related to the voice actor may be audio containing a character voiced by that actor, such as an audio novel, and the audio may contain the actor's voice. As yet another example, taking text as the media data and a real person as the target object, text related to the real person may be text that mentions the person, such as a news report about the person in which the person's name appears. It should be noted that the above examples list several scenarios to which the embodiments of the present application are applicable; actual implementations are not limited to these scenarios.
Step 204, screening, from the media interaction data, target interaction data that interacts with the target object.
It will be appreciated that not all media interaction data corresponding to a given piece of media data interacts with the target object. For example, taking a video as the media data, the actors in the video may include actor A and actor B. If the target object is actor A, the media interaction data corresponding to the video may include interaction data directed at actor A as well as interaction data directed at actor B. Since it is the aliases of the target object that are to be mined, the computer device screens out the target interaction data that interacts with the target object in this step.
Step 206, mining a plurality of alternative aliases corresponding to the target object from the target interaction data.
Because an alias is generally text, the computer device may first obtain the target interaction text corresponding to the target interaction data before mining. Specifically, if the target interaction data is in text format, the computer device may use the target interaction data directly as the target interaction text. For example, the computer device may directly take the text bullet screens in a video as the target interaction text. If the target interaction data is in a non-text format, the computer device may first convert it into text format to obtain the target interaction text. For example, if the target interaction data is in audio format, the computer device may perform speech recognition on it to obtain the target interaction text. In other words, an audio bullet screen is converted into text data.
Further, after obtaining the target interaction text corresponding to the target interaction data, the computer device can mine the target interaction text. The mining process may be field interception. Taking the target interaction text "AAB finally coming out" as an example, consecutive partial fields can be intercepted as alternative aliases of the target object, such as "AA", "AAB", and "AAB finally".
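Field interception can be pictured as enumerating contiguous substrings of the interaction text. The sketch below is illustrative only: it works at the character level, and the length bounds `min_len` and `max_len` are hypothetical parameters that the text does not specify.

```python
def intercept_fields(text, min_len=2, max_len=4):
    """Return contiguous substrings of text as alternative aliases.

    Substrings are produced left to right, shortest first for each start
    position, and duplicates are dropped while keeping first-seen order.
    """
    fields = []
    for start in range(len(text)):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > len(text):
                break  # longer pieces from this start would overrun the text
            piece = text[start:end]
            if piece not in fields:
                fields.append(piece)
    return fields
```

For example, `intercept_fields("AAB", 2, 3)` yields "AA", "AAB", and "AB". Most such fields are not real aliases, which is why the later validity verification is needed.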
Step 208, verifying the validity of each alternative alias based on at least one of the formal name of the target object or the media content related to the target object in the media data.
It is understood that the format of the media content is related to the format of the media data. For example, if the media data is video, the media content may be a video frame. If the media data is audio, the media content may be an audio frame. If the media data is text, the media content may be a paragraph of text.
In some embodiments, verifying the validity of the alternative aliases based on the formal name of the target object can be understood as judging how close the meaning of each alternative alias is to the meaning of the formal name: the closer they are, the more likely the alternative alias is a target alias. The verification may proceed as follows. The computer device calculates the similarity between each alternative alias and the formal name, and takes the alternative aliases whose similarity exceeds a preset threshold as aliases that pass the validity verification. Alternatively, the computer device obtains online web page content, counts the web pages in which the formal name and an alternative alias co-occur, calculates the co-occurrence ratio between the formal name and the alternative alias, and takes the alternative aliases whose co-occurrence ratio exceeds a preset threshold as aliases that pass the validity verification. As a further alternative, the formal name and the alternative alias may be substituted for each other in the target interaction data in which each appears, and the validity verification may be performed according to the semantic fluency of the text after substitution.
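The co-occurrence-ratio check can be sketched as follows. The definition of the ratio is an assumption: among pages that contain the alternative alias, the share that also contain the formal name. Pages are modelled as plain strings, and the function names and default threshold are hypothetical.

```python
def co_occurrence_ratio(pages, formal_name, alias):
    """Share of alias-bearing pages that also mention the formal name."""
    pages_with_alias = [page for page in pages if alias in page]
    if not pages_with_alias:
        return 0.0
    both = sum(1 for page in pages_with_alias if formal_name in page)
    return both / len(pages_with_alias)


def passes_co_occurrence_check(pages, formal_name, alias, threshold=0.5):
    """Pass when the ratio is strictly greater than the preset threshold."""
    return co_occurrence_ratio(pages, formal_name, alias) > threshold
```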
Verifying the validity of the alternative aliases based on the media content related to the target object can be understood as checking whether the time at which each alternative alias is presented (that is, the interaction time of the target interaction data from which the alternative alias is derived) is close to the time at which the target object appears: the closer the two are, the more likely the alternative alias is an alias of the target object. The verification may proceed as follows. The computer device calculates the time interval between the appearance time of the target object in the media content related to the target object and the interaction time of the target interaction data that includes the alternative alias, and takes the alternative aliases whose time interval is smaller than a preset threshold as aliases that pass the validity verification. Alternatively, the computer device may obtain the appearance time of the target object in the media content related to the target object, and determine the media interaction data that includes the formal name and whose interaction time falls within a time period determined from the appearance time; it then calculates the time interval between the interaction time of that media interaction data and the interaction time of the target interaction data that includes the alternative alias, and takes the alternative aliases whose time interval is smaller than a preset threshold as aliases that pass the validity verification.
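The first time-interval check above can be sketched as follows. The sketch assumes times are seconds from the start of the media data and that an alias passes if at least one of its interactions falls within `max_gap` of some appearance of the target object; both the input shapes and that "at least one" rule are assumptions.

```python
def verify_by_appearance_time(appearance_times, alias_times, max_gap):
    """Return the aliases presented close to an appearance of the target.

    appearance_times: times at which the target object appears (hypothetical).
    alias_times: alias -> interaction times of data containing it (hypothetical).
    """
    if not appearance_times:
        return []
    valid = []
    for alias, times in alias_times.items():
        gaps = [abs(t - a) for t in times for a in appearance_times]
        # Assumption: one sufficiently close interaction is enough.
        if gaps and min(gaps) < max_gap:
            valid.append(alias)
    return valid
```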
Step 210, determining the target alias belonging to the target object based on the alternative aliases that pass the validity verification.
Specifically, the computer device may directly use the alternative aliases that pass the validity verification as the target aliases belonging to the target object, or may further screen them to obtain the target aliases. The further screening may be automatic screening based on field length (for example, selecting as target aliases those with fewer than 5 characters in total) or manual screening, which is not limited in this embodiment.
According to the object proxy mining method, the media interaction data corresponding to the media data related to the target object is obtained, and therefore the target interaction data for interaction of the target object is screened out from the obtained media interaction data. Because the interactive data for the target object often includes a plurality of names for different users to call the target object, a plurality of alternative names corresponding to the target object can be accurately mined according to the target interactive data. Based on the above, validity verification is performed on each alternative name based on the name of the target object or at least one data in the media data related to the target object, so that incorrect false names can be removed, or names irrelevant to the target object can be removed. The target proxy belonging to the target object can be efficiently and accurately mined without manual auditing and screening in the whole process, and the labor cost is greatly saved.
In addition, the scheme mines proxy names from the media interaction data corresponding to the media data. Because the media interaction data is large in volume and involves many user groups, the coverage of proxy mining can be ensured; that is, the target proxy names belonging to the target object can be mined comprehensively. Moreover, because the media interaction data is updated over time, the proxy names of the target object can be dynamically updated by this method, which improves their timeliness.
Further, the target proxy names can serve as a data processing basis for business scenarios of the media platform such as content understanding, content retrieval, content recommendation, and content interaction, so that the overall business effect of the media platform can be improved.
In some embodiments, screening, from the media interaction data, target interaction data that interacts with the target object includes: screening out, from the media interaction data, first interaction data that includes the native name of the target object; and screening the target interaction data from the media interaction data according to the proximity between the interaction time of the first interaction data and the interaction time of each media interaction data.
Specifically, what "including" means in "first interaction data that includes the native name of the target object" may depend on the format of the media interaction data. For example, if the media interaction data is a text bullet-screen comment, the first interaction data is media interaction data whose text includes the native name. As another example, if the media interaction data is an audio bullet-screen comment, the computer device may first convert the media interaction data into text, and the first interaction data is media interaction data whose converted text includes the native name.
The proximity may be quantified by a time interval: the smaller the time interval, the closer the two times. Because the first interaction data includes the native name, the computer device can determine with certainty that the first interaction data is related to the target object. For a given media interaction data, the closer its interaction time is to the interaction time of the first interaction data, the more likely it is to be related to the target object. Therefore, the time interval between the two interaction times can be used as the screening basis. It should be noted that there may be more than one first interaction data; in actual implementation, each media interaction data may be screened separately against each first interaction data.
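The screening described above can be sketched as follows for text bullet-screen comments. This is a hedged illustration: the `Interaction` class, the function name, and the rule that a comment is kept when it lies within `max_gap_seconds` of at least one first interaction data are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    text: str
    time: float  # interaction time, in seconds from the start of the media


def screen_target_interactions(interactions, native_name, max_gap_seconds):
    """Keep interactions posted close in time to one that contains the native name."""
    # First interaction data: comments whose text includes the native name.
    first = [i for i in interactions if native_name in i.text]
    # Screen every comment against every first interaction data.
    return [
        i for i in interactions
        if any(abs(i.time - f.time) <= max_gap_seconds for f in first)
    ]


comments = [
    Interaction("Zhang San is great", 10.0),
    Interaction("Sanny so cool", 12.0),
    Interaction("nice music", 200.0),
]
target = screen_target_interactions(comments, "Zhang San", max_gap_seconds=5.0)
```

In this sketch the first interaction data itself is also kept, since its gap to itself is zero.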
In the above embodiment, the media interaction data is screened according to the proximity between the interaction time of the first interaction data, which is explicitly related to the target object, and the interaction time of each media interaction data, so that the amount of data to be processed is reduced and processing efficiency is improved. In addition, the screened media interaction data is as relevant to the target object as possible, which improves the accuracy of the subsequent proxy mining.
In some embodiments, screening, from the media interaction data, target interaction data that interacts with the target object includes: determining the appearance time of the target object in the media data; and screening the target interaction data from the media interaction data according to the proximity between the appearance time of the target object and the interaction time of each media interaction data.
Specifically, what "appearance" means in "appearance time of the target object" may depend on the type of the media data. For example, if the media data is video data and the target object is a video actor, the appearance time of the target object may refer to the time at which the video actor appears in the video pictures shown by the video data. If the media data is audio data and the target object is a voice actor, the appearance time of the target object may refer to the time at which the voice actor is heard in the sound played by the audio data.
Here too the proximity may be quantified by a time interval: the smaller the time interval, the closer the two times. For a given media interaction data, the closer its interaction time is to the appearance time of the target object, the more likely it is to be related to the target object. Therefore, the time interval between the two can be used as the screening basis. It should be noted that the target object may have more than one appearance time; in actual implementation, the computer device may screen each media interaction data against each appearance time of the target object.
In the above embodiment, the media interaction data is screened according to the proximity between the appearance time of the target object in the media data and the interaction time of each media interaction data, so that the amount of data to be processed is reduced and processing efficiency is improved. In addition, the screened media interaction data is as relevant to the target object as possible, which improves the accuracy of the subsequent proxy mining.
In some embodiments, mining a plurality of alternative proxy names corresponding to the target object from the target interaction data includes: obtaining a plurality of word segments based on the target interaction data; for each word segment, determining, from the target interaction data, second interaction data that includes the word segment, and determining the interaction weight of each second interaction data; determining the word frequency weight of each word segment based on the interaction weights of the second interaction data corresponding to that word segment; and screening, from the plurality of word segments and according to their word frequency weights, the word segments that satisfy a preset condition as alternative proxy names.
Specifically, the computer device may obtain the plurality of word segments by performing word segmentation on the target interaction data. Alternatively, in some embodiments, the plurality of word segments may be obtained as follows: acquire the target interaction text corresponding to the target interaction data; perform field interception on the target interaction text according to a preset length; and take the intercepted results directly as the word segments. It should be noted that an intercepted word segment need not be a known word collocation or expression and may be a newly coined character combination; all interception results are collectively referred to as word segments in the embodiments of the present application. In addition, because a proxy name is a contiguous string, the interception results do not need to be recombined in actual implementation and can be used directly as word segments.
It should be further noted that the interception need not be performed only once; field interception may be performed on the target interaction text several times with different preset lengths. For example, the computer device may obtain a preset length range, determine a plurality of preset lengths from that range, and perform field interception on the target interaction text according to each of them. A preset length range is used because a proxy name is generally short and convenient to use, so its length will not be very long; a reasonable range covers the proxy name lengths that are likely to occur while avoiding the processing load of exhausting every possible length. In addition, there may be more than one target interaction data, and therefore more than one target interaction text to be intercepted.
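Field interception over a preset length range can be sketched as follows. The sliding window that advances one character at a time and the function name `intercept_segments` are assumptions for illustration; the application only states that interception is performed according to preset lengths.

```python
def intercept_segments(text, min_len, max_len):
    """Collect every contiguous substring whose length is in [min_len, max_len]."""
    segments = set()
    for length in range(min_len, max_len + 1):
        # Slide a window of the current preset length over the text.
        for start in range(len(text) - length + 1):
            segments.add(text[start:start + length])
    return segments


segments = intercept_segments("abcd", min_len=2, max_len=3)
```

With several target interaction texts, the same function could be applied to each text and the resulting sets merged.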
Since there may be more than one target interaction data, after obtaining the plurality of word segments, the computer device may, for each word segment, determine the target interaction data that includes that word segment as its second interaction data. The interaction weight reflects the heat of the second interaction data and the degree to which it is directed at the target object. How the interaction weight is calculated may depend on the format of the second interaction data and on the form of interaction it receives. For example, if the second interaction data is a text bullet-screen comment and the form of interaction is liking, the interaction weight of the second interaction data may be the ratio of its number of likes to the total number of likes of all second interaction data corresponding to the media data.
It will be appreciated that a word segment with sufficient heat is more likely to be a proxy name of the target object. Thus, for each word segment, the computer device may determine its word frequency weight based on the interaction weights of its corresponding second interaction data, where the second interaction data corresponding to a word segment is the second interaction data that includes that word segment. The word frequency weight of a word segment may be obtained by combining the interaction weights of its corresponding second interaction data. The combination may be a direct summation or a weighted summation, which is not limited in the embodiments of the present application. The weights used in the weighted summation may be set according to the proximity between the interaction time of the second interaction data and the appearance time of the target object in the media data, which is likewise not limited.
After obtaining the word frequency weight of each word segment through the above process, the computer device may screen the word segments to obtain the alternative proxy names. The larger the word frequency weight, the more likely the corresponding word segment is a proxy name. The preset condition may be that the word frequency weight is greater than a preset threshold; that is, the word segments whose word frequency weight is greater than the preset threshold are selected as alternative proxy names, which is not specifically limited in the embodiments of the present application. It should be noted that a proxy name is not expected to contain stop words such as "have", "do", "all", and "why". Thus, after selecting the word segments that satisfy the preset condition, the computer device may delete those that include a stop word and take the remaining word segments as the alternative proxy names.
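The word frequency weighting and screening above can be sketched as follows, using direct summation of interaction weights. The function name, the `(text, interaction_weight)` pair layout, and the sample values are assumptions for illustration.

```python
def select_alternative_names(segments, target_interactions, threshold, stop_words):
    """Select word segments by word frequency weight, then drop stop-word segments.

    target_interactions: list of (text, interaction_weight) pairs.
    """
    selected = []
    for segment in segments:
        # Second interaction data: target interaction data containing the segment.
        # Word frequency weight: direct sum of their interaction weights.
        weight = sum(w for text, w in target_interactions if segment in text)
        contains_stop_word = any(s in segment for s in stop_words)
        if weight > threshold and not contains_stop_word:
            selected.append(segment)
    return selected


target_interactions = [
    ("Little Dragon is cool", 0.5),
    ("love Little Dragon", 0.3),
    ("why so late", 0.4),
]
names = select_alternative_names(
    ["Little Dragon", "why so", "love"],
    target_interactions,
    threshold=0.35,
    stop_words={"why"},
)
```

Here "why so" exceeds the threshold but is removed because it contains a stop word, and "love" is removed because its weight of 0.3 is below the threshold.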
In the above embodiment, the word frequency weight of a word segment is determined based on the interaction weights of the second interaction data that include the word segment. Because the interaction weight reflects the heat of the second interaction data, the word frequency weight in turn reflects the heat of the word segment. Because the heat of a word segment reflects how likely it is to be a proxy name, screening the alternative proxy names from the word segments based on their word frequency weights improves the accuracy of the subsequent proxy mining.
In some embodiments, determining the interaction weight of each second interaction data includes: obtaining the degree of preference that the publishing object of each second interaction data has for the target object; determining the interaction heat of each second interaction data; and determining the interaction weight of each second interaction data according to its corresponding preference degree and interaction heat.
Specifically, the preference degree of the publishing object for the target object may be obtained as follows: the computer device obtains the preference degree according to the proximity between the publishing object's perception completion degree for the media data related to the target object and its perception completion degree for all media data. Here "all media data" covers both media data related to the target object and media data unrelated to it. How the perception completion degree is determined may depend on the type of the media data. For example, if the media data is video data, the perception completion degrees may be calculated from the duration for which the publishing object has watched video data related to the target object and the duration for which it has watched all video data. The proximity can be quantified by the ratio between the two perception completion degrees: the larger the ratio, the closer they are.
Of course, in actual implementation the computer device may also use other approaches. For example, the preference degree of the publishing object for the target object may be obtained according to the proximity between the number of times the publishing object has forwarded media data related to the target object and the number of times it has forwarded all media data, which is not limited in this embodiment of the present application. The proximity can be quantified by the ratio between the two forwarding counts: the larger the ratio, the closer they are, and the closer they are, the higher the preference degree.
In some embodiments, the interaction heat of the second interaction data may be determined as follows: the computer device determines the interaction heat according to the proximity between the number of times the second interaction data has been interacted with and the number of times all media interaction data corresponding to the media data have been interacted with. The interaction may be liking or forwarding, which is not limited in this embodiment of the present application. The proximity can be quantified by the ratio between the two counts: the larger the ratio, the closer they are, and the closer they are, the higher the interaction heat.
Further, after determining the preference degree and the interaction heat corresponding to the second interaction data, the computer device may determine the interaction weight of the second interaction data from them. Specifically, the computer device may use the product of the preference degree and the interaction heat as the interaction weight. It can be understood that the more strongly the publishing object prefers the target object, and the higher the interaction heat of the second interaction data it published, the more likely that second interaction data is to include a proxy name. The interaction weight can therefore represent the probability that the second interaction data includes a proxy name, so that the word frequency weight calculated from the interaction weights reflects the probability that a word segment is a proxy name.
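The ratio-based quantities and their product can be sketched as follows. The function names, the zero-denominator handling, and the sample numbers are assumptions for illustration; the preference degree here uses the watch-duration example for video data.

```python
def ratio(part, whole):
    """Quantify proximity as a ratio; return 0.0 when the denominator is zero."""
    return part / whole if whole else 0.0


def preference_degree(related_watch_seconds, total_watch_seconds):
    """Share of the publishing object's watch time spent on target-related video."""
    return ratio(related_watch_seconds, total_watch_seconds)


def interaction_heat(interacted_count, total_interacted_count):
    """Share of all interactions (e.g. likes) received by this interaction data."""
    return ratio(interacted_count, total_interacted_count)


def interaction_weight(preference, heat):
    """Interaction weight as the product of preference degree and interaction heat."""
    return preference * heat


preference = preference_degree(related_watch_seconds=1800, total_watch_seconds=3600)
heat = interaction_heat(interacted_count=20, total_interacted_count=100)
weight = interaction_weight(preference, heat)
```

With these sample numbers the preference degree is 0.5, the interaction heat is 0.2, and the interaction weight is their product.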
In the above embodiment, the interaction weight of the second interaction data is determined according to the preference degree of its publishing object for the target object and the interaction heat of the second interaction data. Because the preference degree and the interaction heat jointly reflect how likely the second interaction data is to include a proxy name, using them as the basis for screening proxy names improves the accuracy of the subsequent proxy mining.
In some embodiments, obtaining the preference degree of the publishing object of each second interaction data for the target object includes: determining a plurality of media data related to the target object, and determining the media heat of each media data; obtaining the perception completion degree of the publishing object of the second interaction data for each media data; and determining the preference degree of the publishing object for the target object according to its perception completion degree for each media data and the media heat of each media data.
Specifically, the media heat of a given media data may be determined as follows: the computer device obtains the media heat according to the proximity between the total perceived duration of the media data and the longest total perceived duration on record among media data. The proximity can be quantified by the ratio between the two durations: the larger the ratio, the closer they are, and the closer they are, the higher the media heat. How the perception completion degree of the publishing object of the second interaction data for each media data is determined may depend on the type of the media data. For example, if the media data is video data, the perception completion degree may be calculated from the duration for which the publishing object has watched the video data and the total duration of the video data. The proximity can be quantified by the ratio between these two durations: the larger the ratio, the closer they are.
After determining the perception completion degree of the publishing object for each media data and the media heat of each media data, the computer device may determine the preference degree of the publishing object for the target object. Specifically, the sum of the media heat of all the media data may be calculated first; then, taking the perception completion degree of the publishing object for each media data as a weight, the media heat of the media data is weighted and summed, and the weighted sum is taken as the preference degree of the publishing object for the target object.
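The weighted summation above can be sketched as follows. The application states that the sum of the media heat is calculated first but does not say how it is used; dividing the weighted sum by that total (the `normalize` option) is an assumption of this sketch, as are the function names and sample values.

```python
def media_heat(total_perceived_seconds, longest_recorded_seconds):
    """Media heat as the ratio to the longest total perceived duration on record."""
    if not longest_recorded_seconds:
        return 0.0
    return total_perceived_seconds / longest_recorded_seconds


def preference_from_media(completions, heats, normalize=False):
    """Weighted sum of media heat, weighted by perception completion degree."""
    weighted = sum(c * h for c, h in zip(completions, heats))
    if normalize:
        # Assumption: scale by the total media heat so the result lies in [0, 1].
        total = sum(heats)
        return weighted / total if total else 0.0
    return weighted


heats = [media_heat(500, 1000), media_heat(1000, 1000)]   # 0.5 and 1.0
completions = [1.0, 0.5]                                   # watched fraction per video
raw = preference_from_media(completions, heats)
scaled = preference_from_media(completions, heats, normalize=True)
```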
It can be understood that the perception completion degree of the publishing object for media data related to the target object reflects its preference for the target object. For example, the more completely a user watches the videos related to an actor, the more the user can be taken to like that actor. However, when the media heat of a media data is low, a low perception completion degree for that media data does not by itself mean that the publishing object has a low preference for the target object; it may mainly be a consequence of the low media heat. Accordingly, the media heat of the media data is taken into account when determining the preference degree of the publishing object for the target object.
In the above embodiment, a preference degree determined directly from the perception completion degree of the publishing object for the media data may be inaccurate when the media data itself has insufficient heat. Balancing the perception completion degree against the media heat of the media data avoids the data deviation caused by insufficient media heat. Using the resulting preference degree as a basis for screening proxy names therefore improves the accuracy of the subsequent proxy mining.
In some embodiments, the validity verification includes a first validity verification and a second validity verification, and performing validity verification on each alternative proxy name based on at least one of the native name of the target object or the media content related to the target object in the media data includes: determining, from the media interaction data, first interaction data that includes the native name of the target object; performing the first validity verification on each alternative proxy name based on at least one of the first interaction data or third interaction data that includes the alternative proxy name; and taking the alternative proxy names that pass the first validity verification as candidate proxy names, and performing the second validity verification on the candidate proxy names according to the media content related to the target object in the media data.
It can be understood that the first interaction data includes the native name and the third interaction data includes the alternative proxy name. If the alternative proxy name is indeed a target proxy name of the target object, there should be some textual connection, such as text similarity, between the first interaction data and the alternative proxy name, and between the third interaction data and the native name. The first validity verification can be performed on this principle.
Thus, the first validity verification based only on the first interaction data may proceed as follows: the computer device converts the alternative proxy name and the first interaction data into text vectors; calculates a first text similarity between the two text vectors; and takes the alternative proxy names whose first text similarity is greater than a preset threshold as those that pass the first validity verification. Similarly, the first validity verification based only on the third interaction data that includes the alternative proxy name may proceed as follows: the computer device converts the native name and the third interaction data into text vectors; calculates a second text similarity between the two text vectors; and takes the alternative proxy names whose second text similarity is greater than a preset threshold as those that pass the first validity verification.
Of course, in actual implementation, if the validity verification is performed based on both the first interaction data and the third interaction data that includes the alternative proxy name, the computer device may combine the first text similarity and the second text similarity, for example by weighted summation, and perform the first validity verification on the alternative proxy name based on the combined result. It should be noted that the first validity verification may also be performed in other ways in actual implementation, which is not specifically limited in the embodiments of the present application.
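Combining the two text similarities by weighted summation can be sketched as follows. The application does not specify how text vectors are built or which similarity measure is used; the word-count vectors, the cosine similarity, the equal default weights, and all function names below are assumptions chosen only to keep the example self-contained. A practical system would more likely use learned text embeddings.

```python
import math
from collections import Counter


def text_vector(text):
    """Toy text vector: lower-cased word counts (an assumption for this sketch)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_similarity(alt_name, native_name, first_text, third_text, w1=0.5, w2=0.5):
    """Weighted sum of the first and second text similarities."""
    sim1 = cosine_similarity(text_vector(alt_name), text_vector(first_text))
    sim2 = cosine_similarity(text_vector(native_name), text_vector(third_text))
    return w1 * sim1 + w2 * sim2


score = combined_similarity(
    alt_name="sanny",
    native_name="zhang san",
    first_text="zhang san aka sanny is great",
    third_text="sanny is zhang san",
)
```

The alternative proxy name would pass the first validity verification when `score` exceeds a preset threshold.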
Further, after obtaining the candidate proxy names through the first validity verification, the computer device may perform the second validity verification on the candidate proxy names according to the media content related to the target object in the media data. It will be appreciated that, since the media data relates to the target object, if a candidate proxy name is indeed a target proxy name of the target object, there should be some association between the candidate proxy name and the media content related to the target object in the media data. The second validity verification can be performed on this principle.
The second validity verification on a candidate proxy name according to the media content related to the target object in the media data may proceed as follows: the computer device counts the number of times that the appearance time of the media content related to the target object in the media data overlaps with the interaction time of the media interaction data that includes the candidate proxy name; if the number of overlaps is greater than a preset threshold, the candidate proxy name is determined to pass the second validity verification. In the above example, the first validity verification is performed first and the second validity verification afterwards. In actual implementation, the computer device may instead perform the second validity verification first and then the first validity verification, which is not limited in this embodiment of the present application.
In the above embodiment, the candidate proxy names are obtained by performing the first validity verification on the alternative proxy names based on at least one of the first interaction data that includes the native name or the third interaction data that includes the alternative proxy name, and the second validity verification is then performed on the candidate proxy names based on the media content related to the target object in the media data. Because the first validity verification checks the alternative proxy names against a textual relation, it improves the accuracy of the subsequent proxy mining. Because the second validity verification checks the candidate proxy names against the relation between the text and the media content, it improves that accuracy further. Performing both verifications therefore improves the accuracy of the proxy mining comprehensively.
In some embodiments, as shown in FIG. 3, performing the first validity verification on each alternative proxy name based on at least one of the first interaction data or the third interaction data that includes the alternative proxy name includes:
Step 302: for a current alternative proxy name among the plurality of alternative proxy names, replace the native name in the first interaction data with the current alternative proxy name to obtain a first replacement text.
Step 304: determine a first semantic smoothness of the first replacement text.
It will be appreciated that there is typically more than one first interaction data, so replacing the native name in the first interaction data with the current alternative proxy name yields more than one first replacement text. Thus, the computer device may obtain the first semantic smoothness by counting, among the plurality of first replacement texts, the number of first replacement texts that are fluent sentences. It should be noted that the first interaction data is derived from the media interaction data, and that there is also text data that is not media interaction data but likewise includes the native name of the target object. Therefore, in actual implementation, other text data that includes the native name of the target object may also be used as the replacement object in addition to the first interaction data, which is not specifically limited in the embodiments of the present application.
Step 306: replace the current alternative proxy name in the third interaction data with the native name to obtain a second replacement text, and determine a second semantic smoothness of the second replacement text.
Similarly, there is typically more than one third interaction data, and the computer device may obtain the second semantic smoothness by counting, among the plurality of second replacement texts, the number of second replacement texts that are fluent sentences. Whether a sentence is fluent may be judged as follows: the computer device acquires the text vector corresponding to the sentence to be judged, inputs the text vector into a sentence fluency recognition model, and outputs the fluency judgment. The sentence fluency recognition model may be built on a BERT (Bidirectional Encoder Representations from Transformers) model; its structure is shown in FIG. 4. In addition to the BERT model, FIG. 4 includes a fully connected network layer, which maps the hidden feature space output by the BERT model to the sample label space and thereby performs the fluency classification. The sentence fluency recognition model may be trained on sample media interaction data in which fluent samples are labeled 1 and non-fluent samples are labeled 0.
Step 308: determine the first validity verification result of each alternative proxy name according to the first semantic smoothness and the second semantic smoothness corresponding to that alternative proxy name.
After determining the first semantic smoothness and the second semantic smoothness of each alternative proxy name, the computer device may obtain a replacement fluency score for each alternative proxy name. Specifically, the first semantic smoothness and the second semantic smoothness may be averaged or weighted and summed to obtain the replacement fluency score. The alternative proxy names whose replacement fluency score is greater than a preset threshold are taken as those that pass the first validity verification, which yields the first validity verification result of each alternative proxy name.
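Steps 302 to 308 can be sketched as follows. The fluency judgment is passed in as a callable `is_fluent`, a hypothetical stand-in for the BERT-based sentence fluency recognition model; the toy judge in the usage lines, the use of a fluent proportion rather than a raw count, and the function names are assumptions for illustration.

```python
def semantic_smoothness(texts, is_fluent):
    """Proportion of texts judged fluent (a normalized form of the fluent count)."""
    if not texts:
        return 0.0
    return sum(1 for t in texts if is_fluent(t)) / len(texts)


def replacement_fluency_score(alt_name, native_name, first_texts, third_texts,
                              is_fluent, w_first=0.5):
    """Weighted sum of the first and second semantic smoothness."""
    # Step 302: put the alternative proxy name where the native name was.
    first_replaced = [t.replace(native_name, alt_name) for t in first_texts]
    # Step 306: put the native name where the alternative proxy name was.
    second_replaced = [t.replace(alt_name, native_name) for t in third_texts]
    s1 = semantic_smoothness(first_replaced, is_fluent)   # step 304
    s2 = semantic_smoothness(second_replaced, is_fluent)  # step 306
    return w_first * s1 + (1.0 - w_first) * s2            # step 308


def toy_is_fluent(text):
    """Toy judge for the demo only: at least three words counts as fluent."""
    return len(text.split()) >= 3


score = replacement_fluency_score(
    "Sanny", "Zhang San",
    first_texts=["Zhang San acts well", "Zhang San"],
    third_texts=["Sanny is so funny"],
    is_fluent=toy_is_fluent,
)
```

With the toy judge, one of the two first replacement texts and the single second replacement text are fluent, so the score is 0.5 × 0.5 + 0.5 × 1.0.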
In the above embodiment, the alternative proxy name is substituted into text data that includes the native name, such as the first interaction data. If the alternative proxy name is indeed a target proxy name of the target object, the replacement result should still be a fluent sentence, so the first semantic smoothness can be used to screen the alternative proxy names effectively when the alternative proxy name replaces the native name. Similarly, the native name is substituted into text data that includes the alternative proxy name, such as the third interaction data. If the alternative proxy name is indeed a target proxy name of the target object, this replacement result should also be a fluent sentence, so the second semantic smoothness can be used to screen the alternative proxy names effectively when the native name replaces the alternative proxy name. In addition, screening the alternative proxy names on the combination of the first and second semantic smoothness makes the screening result more accurate.
In some embodiments, performing the second validity verification on the candidate proxy names according to the media content related to the target object in the media data includes: for a current candidate proxy name among the plurality of candidate proxy names, determining fourth interaction data that includes the current candidate proxy name, where the current candidate proxy name is any one of the candidate proxy names; determining the media content in the media data that matches the fourth interaction data; when the target object appears in the media content that matches the fourth interaction data, determining the corresponding fourth interaction data as co-occurrence interaction data; and determining the second validity verification result of each candidate proxy name based on the co-occurrence interaction data corresponding to that candidate proxy name.
In one embodiment, the media content in the media data that matches the fourth interaction data may be determined as follows: the computer device determines the interaction time of the fourth interaction data, and takes the media content whose presentation time is within a preset duration of that interaction time as the media content matching the fourth interaction data. It will be appreciated that media interaction data is typically produced as the presentation of the media data progresses; for example, a user watching video data usually posts a bullet screen when the video reaches a particular picture. Thus, if the fourth interaction data including the current candidate alias is actually directed at the target object, the media content whose presentation time is close to the interaction time of the fourth interaction data should be related to the target object. Here, "close to" means that the time interval is within the preset duration.
Further, after determining the media content in the media data that matches the fourth interaction data, the computer device may determine whether the target object appears in that media content. Taking the case where the media data is video data as an example, according to the process described in the foregoing embodiment, the computer device may take the image frames in the video data whose presentation time is within the preset duration of the interaction time of the fourth interaction data as the media content in the video data that matches the fourth interaction data.
Thus, the computer device may determine whether the target object appears in the media content matching the fourth interaction data as follows: for the plurality of image frames in the video data that match the fourth interaction data, performing image recognition on each image frame and determining which of the image frames include the target object; obtaining the proportion of image frames that include the target object among the plurality of image frames; and, in a case where the proportion is greater than a preset threshold, determining that the target object appears in the media content matching the fourth interaction data. The computer device can then determine the corresponding fourth interaction data as co-occurrence interaction data.
After determining the co-occurrence interaction data corresponding to each candidate alias, the computer device may determine a second validity verification result for each candidate alias. Specifically, for a given candidate alias, the computer device may calculate the proportion of co-occurrence interaction data among all fourth interaction data that include the candidate alias; in a case where the proportion is greater than a preset threshold, the candidate alias is determined to pass the second validity verification, and the second validity verification result of the candidate alias is obtained.
In the above embodiment, if a candidate alias is a target alias of the target object, the media content whose presentation time is near the interaction time of the fourth interaction data should be related to the target object. Verifying the validity of the candidate aliases on this principle can improve the accuracy of the subsequent alias mining.
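The time-window matching and the two proportion tests described above can be sketched as follows. This is a minimal sketch under stated assumptions: the per-frame flags in `contains_target` stand in for the output of an image recognition step that is not shown, and all function and parameter names are hypothetical.

```python
from typing import List, Sequence


def matched_frames(frame_times: Sequence[float], interaction_time: float,
                   window: float) -> List[int]:
    """Indices of frames presented within `window` seconds of the interaction time."""
    return [i for i, t in enumerate(frame_times)
            if abs(t - interaction_time) <= window]


def is_co_occurrence(frame_times: Sequence[float],
                     contains_target: Sequence[bool],
                     interaction_time: float, window: float,
                     ratio_threshold: float) -> bool:
    """True if the share of matched frames showing the target exceeds the threshold."""
    idx = matched_frames(frame_times, interaction_time, window)
    if not idx:
        return False  # no matching media content, so no co-occurrence
    hits = sum(1 for i in idx if contains_target[i])
    return hits / len(idx) > ratio_threshold


def passes_second_check(co_occurrence_flags: Sequence[bool],
                        threshold: float) -> bool:
    """True if the share of co-occurrence items among all fourth interaction data exceeds the threshold."""
    if not co_occurrence_flags:
        return False
    return sum(co_occurrence_flags) / len(co_occurrence_flags) > threshold
```

Using a strict "greater than" comparison follows the wording of the embodiment; whether the boundary case passes is a design choice the text leaves open.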
In some embodiments, determining the second validity verification result of each candidate alias based on the co-occurrence interaction data corresponding to each candidate alias includes: for a current candidate alias among the plurality of candidate aliases, determining the co-occurrence interaction data and the fourth interaction data corresponding to the current candidate alias; calculating a first weight sum from the interaction weight of each co-occurrence interaction data, and calculating a second weight sum from the interaction weight of each fourth interaction data; determining a co-occurrence probability score corresponding to the current candidate alias based on a comparison value between the first weight sum and the second weight sum; and determining the second validity verification result of each candidate alias according to the co-occurrence probability score corresponding to each candidate alias.
For the process of calculating the interaction weight, reference may be made to the above embodiments, and details are not repeated here. For the current candidate alias, the corresponding co-occurrence interaction data refers to those fourth interaction data, among all fourth interaction data corresponding to the current candidate alias, for which the target object appears in the matched media content. The fourth interaction data corresponding to the current candidate alias refers to all fourth interaction data that include the current candidate alias.
It should be noted that, on the one hand, the comparison value between the first weight sum and the second weight sum reflects the proportion of co-occurrence interaction data among all fourth interaction data. Naturally, the larger this proportion, the more likely the fourth interaction data is related to the target object. On the other hand, the weight sums accumulate interaction weights rather than mere counts, and an interaction weight reflects the popularity of the media interaction data and how strongly it is directed at the target object, that is, whether the media interaction data is strongly correlated with the target alias to be determined. The comparison value between the first weight sum and the second weight sum can therefore reflect how likely a candidate alias is to be a target alias.
When determining the co-occurrence probability score corresponding to the current candidate alias, the computer device may use the comparison value directly as the co-occurrence probability score, or may apply additional processing, for example converting the ratio into a score; this is not specifically limited in the embodiments of the present application. After the co-occurrence probability score of each candidate alias is calculated, the computer device may judge whether each score is greater than a preset threshold and take the candidate aliases whose score is greater than the preset threshold as the candidate aliases that pass the second validity verification, thereby obtaining the second validity verification result of each candidate alias.
In the above embodiment, the second validity verification is performed on the candidate aliases using the comparison value between the first weight sum and the second weight sum. Because the comparison value reflects the proportion of co-occurrence interaction data among all fourth interaction data, and hence the degree of association between a candidate alias and the target object, performing the second validity verification based on it can improve the accuracy of the subsequent alias mining. In addition, because the interaction weight reflects whether the media interaction data is strongly correlated with the target alias to be determined, the comparison value also reflects how likely a candidate alias is to be a target alias, which further improves the accuracy of the subsequent alias mining.
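The weighted comparison value can be illustrated with the short sketch below, which takes the comparison value to be the ratio of the first weight sum to the second weight sum (the ratio is the form used in the worked example later in the description). The input layout, a list of (interaction weight, is co-occurrence) pairs for one candidate alias, and the function name are assumptions for illustration.

```python
from typing import Sequence, Tuple


def co_occurrence_score(items: Sequence[Tuple[float, bool]]) -> float:
    """Ratio of the first weight sum to the second weight sum.

    Each item is (interaction_weight, is_co_occurrence) for one piece of
    fourth interaction data that includes the candidate alias.
    """
    # Second weight sum: interaction weights of all fourth interaction data.
    second_sum = sum(weight for weight, _ in items)
    if second_sum == 0:
        return 0.0  # no usable interaction data for this candidate
    # First weight sum: interaction weights of the co-occurrence subset only.
    first_sum = sum(weight for weight, is_co in items if is_co)
    return first_sum / second_sum
```

With equal interaction weights this reduces to the plain proportion of co-occurrence interaction data; unequal weights let heavily weighted interaction data dominate the score.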
In some embodiments, there are a plurality of media data related to the target object. As shown in FIG. 5, a candidate alias that passes the validity verification is referred to as a valid alias, and determining the target alias belonging to the target object based on the candidate aliases that pass the validity verification includes:
Step 502, obtaining the valid aliases mined from the media interaction data of the plurality of media data, and obtaining the validity score corresponding to each valid alias.
Specifically, through the processes described in the above embodiments, the computer device may mine valid aliases from the media interaction data of the plurality of media data, each of which is related to the target object. Taking the case where the media data is video data as an example, the relationship among the plurality of media data may be as shown in FIG. 6. With reference to FIG. 6, in an actual implementation, the plurality of media data mentioned in the embodiments of the present application may refer to the video data of different episodes of a series on one subject in which actor R (i.e., the target object) participates. For example, the plurality of media data may be the "1st episode" to the "J-th episode" of "video 1". The plurality of media data may also refer to the series videos of a plurality of subjects in which actor R participates. For example, the plurality of media data may be "video 1" to "video V".
After obtaining the valid aliases mined from the media interaction data of the plurality of media data, the computer device may obtain the validity score corresponding to each valid alias. In combination with the foregoing embodiments, the computer device may use the word frequency weight of a valid alias directly as its validity score, or may use the co-occurrence probability score of the valid alias directly as its validity score; this is not specifically limited in the embodiments of the present application.
Step 504, determining a respective media weight of each media data.
In one embodiment, for a given media data related to the target object, the corresponding media weight may reflect the popularity of that media data among all media data related to the target object. In an actual implementation, the media weight corresponding to a given media data may be determined as follows: the computer device obtains the total perceived duration of the media data; obtains the accumulated total perceived duration of all media data related to the target object; and calculates the ratio of the total perceived duration of the media data to the accumulated total perceived duration, taking the ratio as the media weight corresponding to the media data.
Step 506, screening the valid aliases based on the media weight corresponding to each media data and the validity score corresponding to each valid alias, to obtain the target alias belonging to the target object.
Through the above process, the computer device can calculate, for each media data, the validity scores of the valid aliases mined from it, and can calculate the media weight corresponding to each media data. It will be appreciated that if the valid aliases mined from all media data are merged into one set, the set may contain repeated elements; that is, the same valid alias may be mined from different media data, so a repeated valid alias may have multiple validity scores. Because the media weight reflects the popularity of the corresponding media data among all media data related to the target object, it is a meaningful reference for screening the valid aliases.
Thus, the computer device may screen the valid aliases based on the media weight corresponding to each media data and the validity score corresponding to each valid alias. Specifically, in an actual implementation, for a repeated valid alias, its validity scores may be weighted by the media weights of the media data from which they were derived and summed, so as to recalculate the validity score of the repeated valid alias. Then, taking both the repeated and the non-repeated valid aliases as screening objects, the computer device judges whether the validity score of each valid alias is greater than a preset threshold and takes the valid aliases whose score is greater than the preset threshold as the target aliases belonging to the target object; alternatively, the valid aliases are sorted by validity score in descending order and the top N are taken as the target aliases.
Of course, in an actual implementation, instead of recalculating the validity score of a repeated valid alias by weighted summation, the computer device may take the maximum or the minimum of its validity scores as the validity score of the repeated valid alias, or may take their average; this is not specifically limited in the embodiments of the present application.
In the above embodiment, since the valid aliases mined from a plurality of media data related to the target object can all be screened, the coverage of the alias mining can be ensured. In addition, when a repeated valid alias has multiple validity scores, the media weight of each source media data is taken into account, and the media weight reflects the popularity of the corresponding media data among all media data related to the target object, so the accuracy of the alias mining can be improved.
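Steps 504 and 506 can be sketched as follows. The sketch is illustrative only: the dictionary layouts, the identifiers `media_weights`, `fuse_scores` and `top_n`, and the choice to leave the score of a non-repeated alias unchanged are assumptions based on the description above, not a definitive implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def media_weights(perceived: Dict[str, float]) -> Dict[str, float]:
    """Media weight = perceived duration of one media / accumulated duration of all."""
    total = sum(perceived.values())
    return {m: (d / total if total else 0.0) for m, d in perceived.items()}


def fuse_scores(per_media_scores: Dict[str, Dict[str, float]],
                weights: Dict[str, float]) -> Dict[str, float]:
    """Merge validity scores of aliases mined from several media data.

    A repeated alias gets the media-weighted sum of its scores; an alias
    mined from a single media data keeps its original score.
    """
    sources: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for media_id, scores in per_media_scores.items():
        for alias, score in scores.items():
            sources[alias].append((media_id, score))
    fused = {}
    for alias, entries in sources.items():
        if len(entries) == 1:
            fused[alias] = entries[0][1]
        else:
            fused[alias] = sum(weights[m] * s for m, s in entries)
    return fused


def top_n(fused: Dict[str, float], n: int) -> List[str]:
    """Aliases with the N largest validity scores, in descending order."""
    ranked = sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
    return [alias for alias, _ in ranked[:n]]
```

Replacing the weighted sum with `max`, `min` or a plain average of the entries would give the alternative fusion rules that the embodiment also permits.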
In some embodiments, obtaining the validity score corresponding to each valid alias includes: obtaining a first score corresponding to the valid alias when it was mined from the target interaction data; obtaining a second score corresponding to the valid alias when it underwent validity verification; and determining the validity score of the valid alias based on the first score and the second score.
Specifically, the first score may include the word frequency weight mentioned in the above embodiments. Since the validity verification may include the first validity verification and the second validity verification, the second score may be only the replacement fluency score determined in the first validity verification, only the co-occurrence probability score determined in the second validity verification, or both the replacement fluency score and the co-occurrence probability score.
The validity score of the valid alias may be determined based on the first score and the second score as follows: the computer device multiplies the first score by the second score and takes the product as the validity score of the valid alias. Taking as an example the case where the first score is the word frequency weight and the second score includes both the replacement fluency score and the co-occurrence probability score, the validity score of the valid alias may be the product of the three. Of course, in an actual implementation, the validity score may also be a weighted combination of the first score and the second score; this is not specifically limited in the embodiments of the present application.
In the above embodiment, the first score reflects how likely the valid alias was to be a target alias when it was mined from the target interaction data, and the second score reflects how valid the alias proved to be during validity verification. Determining the validity score of the valid alias by combining the two scores, and using that score as the screening basis, can therefore improve the accuracy of the alias mining.
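As a small illustration of the multiplicative combination, the sketch below multiplies the first score by whichever second scores are available. The function name and the list-based interface are hypothetical; the weighted variant mentioned in the embodiment is not shown.

```python
import math
from typing import Sequence


def validity_score(first_score: float, second_scores: Sequence[float]) -> float:
    """Product of the first score and every available second score.

    `second_scores` may hold the replacement fluency score, the
    co-occurrence probability score, or both.
    """
    return first_score * math.prod(second_scores)
```

A product means that a low value in any one factor pulls the whole validity score down, whereas a weighted sum would let a strong factor compensate for a weak one.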
In some embodiments, the method further includes: when the interaction increment of the media interaction data reaches a preset number, returning to the step of screening, from the media interaction data, the target interaction data that interacts with the target object, and continuing execution so as to update the target aliases of the target object.
It will be appreciated that media interaction data is typically generated in real time. For example, bullet screen data for video data may be generated in real time. Therefore, to keep the aliases up to date, the target aliases may be updated after new media interaction data is generated, so as to capture any new aliases that appear in it. However, if only a small amount of new media interaction data has been generated, it may not be sufficient to support the mining of new aliases. Therefore, in the embodiments of the present application, the target aliases are updated once the interaction increment of the media interaction data reaches the preset number.
In the above embodiment, since the target aliases can be updated when the media interaction data has an interaction increment, the timeliness of the aliases can be ensured. In addition, updating the target aliases only when the interaction increment reaches the preset number allows mining to be effective while avoiding the heavy data processing load that overly frequent mining would cause.
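The increment-triggered update can be sketched as a small counter around the mining pipeline. This is a sketch under assumptions: the `AliasUpdater` class is hypothetical, and `mine` is a placeholder callable standing in for the full screening, mining and verification process described in the earlier embodiments.

```python
from typing import Callable, List


class AliasUpdater:
    """Re-run alias mining once enough new interaction data has accumulated."""

    def __init__(self, preset_count: int,
                 mine: Callable[[List[str]], List[str]]) -> None:
        self.preset_count = preset_count
        self.mine = mine                  # placeholder for the mining pipeline
        self.all_items: List[str] = []    # all media interaction data seen so far
        self.pending = 0                  # interaction increment since last update
        self.aliases: List[str] = []      # current target aliases

    def add(self, new_items: List[str]) -> bool:
        """Record new interaction data; return True if the aliases were updated."""
        self.all_items.extend(new_items)
        self.pending += len(new_items)
        if self.pending >= self.preset_count:
            self.aliases = self.mine(self.all_items)
            self.pending = 0
            return True
        return False
```

A larger `preset_count` trades freshness of the aliases for fewer mining runs.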
In some embodiments, the method further includes: in a case where the attention tags of an object to be recommended include the native name of the target object, taking the target aliases belonging to the target object as associated attention tags of the object to be recommended; and pushing media interaction data related to the associated attention tags to the object to be recommended.
Specifically, when a media data perception request of the object to be recommended is obtained and the attention tags of the object to be recommended include the native name of the target object, the object to be recommended is likely to be interested in the media interaction data related to the target aliases among the media interaction data of the media data requested by the media data perception request. The computer device may determine the media data that the object to be recommended requests to perceive according to the user identification carried in the media data perception request, determine, from the media interaction data of that media data, the media interaction data that includes an associated attention tag, and push it to the object to be recommended.
In the above embodiment, since a target alias can serve as an associated attention tag, and thus as the index by which media interaction data related to the target alias is retrieved for the object to be recommended, the media interaction data can be pushed accurately based on the associated attention tags.
In some embodiments, the method further includes: in a case where the attention tags of the object to be recommended do not include the native name of the target object, when media interaction data related to each target alias is pushed to the object to be recommended, marking the native name of the target object in a preset area adjacent to the display area of the media interaction data.
It can be understood that, in a case where the attention tags of the object to be recommended include the native name of the target object, the object to be recommended is relatively familiar with the target object, and when media interaction data related to the associated attention tags is pushed, the object to be recommended will know that the media interaction data is related to the target object. If the attention tags of the object to be recommended do not include the native name of the target object and the media interaction data related to each target alias is pushed directly, the object to be recommended does not necessarily know that the pushed media interaction data is related to the target object. Marking the native name of the target object in a preset area adjacent to the display area of the media interaction data serves as a prompt and prevents the push from appearing obtrusive or being ineffective. The "adjacent" preset area may refer to an area whose center is within a preset distance range of a target center; this is not specifically limited in the embodiments of the present application.
In the above embodiment, since the native name of the target object can be marked in a preset area adjacent to the display area of the media interaction data, a prompt is provided in the case where the object to be recommended does not necessarily know that the pushed media interaction data is related to the target object, which improves the effectiveness of the information push.
For ease of understanding, as shown in FIG. 7, an object alias mining method is provided. The method is described taking as an example the case where the media data is video data, the target object is an actor, and the media interaction data is text bullet screens; nicknames for the actor can be mined by the method provided in the embodiments of the present application. The method is described as applied to a computer device (the computer device may be the terminal or the server in FIG. 1) and includes the following steps:
Step 702, obtaining video data v of the actor and the corresponding text bullet screens, and screening out, from the text bullet screens, the target text bullet screens that interact with the actor.
Specifically, the video data and the content presented by the corresponding text bullet screens may be as shown in FIG. 8, where the underlined, bold characters represent the aliases of actor R in the text bullet screens. The target text bullet screens that interact with actor R may be screened out from the text bullet screens corresponding to video data v as follows:
(1) The computer device screens out, from the text bullet screens, the first text bullet screens that include the native name of actor R, and screens out the target text bullet screens from the text bullet screens according to how close the interaction time of each text bullet screen is to the interaction time of a first text bullet screen.
The native name may be the name of the actor or the name of the role the actor plays in the series to which the video data belongs. Specifically, the text bullet screens whose interaction time is within T seconds before or after the interaction time of a first text bullet screen may be taken as the screened target text bullet screens, where T may be 7.
(2) The computer device determines the appearance time of actor R in the video data, and screens out the target text bullet screens from the text bullet screens according to how close the interaction time of each text bullet screen is to the appearance time of actor R.
Specifically, the text bullet screens whose interaction time is within T seconds before or after the appearance time of actor R may be taken as the screened target text bullet screens.
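Both screening modes reduce to keeping the text bullet screens that fall within T seconds of some anchor time. The sketch below illustrates this; the tuple layout (interaction time, text), the function names and the sample native name are assumptions, and for mode (2) the anchor times would be the appearance times of actor R obtained elsewhere.

```python
from typing import List, Sequence, Tuple

BulletScreen = Tuple[float, str]  # (interaction time in seconds, text)


def anchors_from_native_name(bullets: Sequence[BulletScreen],
                             native_names: Sequence[str]) -> List[float]:
    """Mode (1): interaction times of bullet screens containing a native name."""
    return [t for t, text in bullets
            if any(name in text for name in native_names)]


def screen_by_time(bullets: Sequence[BulletScreen],
                   anchor_times: Sequence[float],
                   t_seconds: float = 7.0) -> List[BulletScreen]:
    """Keep bullet screens within t_seconds before or after any anchor time."""
    return [(t, text) for t, text in bullets
            if any(abs(t - a) <= t_seconds for a in anchor_times)]
```

For instance, with a first text bullet screen at 10 s and T = 7, a bullet screen posted at 14 s is kept while one posted at 40 s is dropped.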
Step 704, mining a plurality of alternative aliases corresponding to actor R from the target text bullet screens.
This step may be implemented as follows: the computer device obtains a plurality of word segments based on the target text bullet screens; for each word segment, determines, from the target text bullet screens, the second text bullet screens that include the word segment, and determines the interaction weight corresponding to each second text bullet screen; determines the word frequency weight corresponding to each word segment based on the interaction weights of the second text bullet screens corresponding to that word segment; and, according to the word frequency weights, screens out from the plurality of word segments those that meet a preset condition as the alternative aliases.
Specifically, the computer device may extract fields of length in [1, MaxLen] from the target text bullet screens. The target text bullet screens are first cut into fields of length 1, then into fields of length 2, and so on, until finally they are cut into fields of length MaxLen. All target text bullet screens may be denoted as a set D, from which a plurality of word segments can be obtained; these word segments form a word segment set.
For a word segment w in the word segment set, the computer device may determine the second text bullet screens in D that include w, and determine the interaction weight corresponding to each second text bullet screen. The word frequency weight corresponding to the word segment w may be the sum of the interaction weights of the second text bullet screens that include w. The computer device may calculate the interaction weight of a second text bullet screen by the formula: interaction weight of the second text bullet screen = (preference degree of the publishing object of the second text bullet screen for actor R) × (interaction popularity of the second text bullet screen).
The computer device may calculate the preference degree of the publishing object of the second text bullet screen for actor R as follows: determining a plurality of video data related to actor R, and determining the video popularity corresponding to each video data; obtaining the viewing completion degree of the publishing object of the second text bullet screen for each video data; and determining the preference degree of the publishing object for actor R according to the viewing completion degree of the publishing object for each video data and the video popularity of each video data. This process may refer to the following formula: preference degree of the publishing object of the second text bullet screen for actor R = sum (the viewing completion degree of the publishing object for each video data)/(the video popularity of each video data).
The interaction popularity of the second text bullet screen may be calculated as follows: interaction popularity of the second text bullet screen = (number of likes of the second text bullet screen) / (number of likes of the most-liked video data on the video platform).
Through the above process, the computer device can calculate the word frequency weight G_sub corresponding to each word segment, select the word segments whose G_sub is greater than a preset threshold, and filter out the word segments that include stop words (words without semantic meaning, such as "has been", "is" and "why"). The word segments that meet these preset conditions are taken as the alternative aliases, and these alternative aliases can form an alternative alias list nickel_list.
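The mining in step 704 can be sketched as below. It is a minimal sketch under assumptions: the interaction weight of each target text bullet screen is taken as already computed and passed in, each bullet screen contributes its weight once per distinct word segment it contains, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def segments(text: str, max_len: int) -> Set[str]:
    """All distinct substrings of `text` with length 1 through max_len."""
    return {text[i:i + n]
            for n in range(1, max_len + 1)
            for i in range(len(text) - n + 1)}


def word_frequency_weights(weighted_bullets: Iterable[Tuple[str, float]],
                           max_len: int) -> Dict[str, float]:
    """Sum, per word segment, the interaction weights of bullet screens containing it."""
    weights: Dict[str, float] = defaultdict(float)
    for text, interaction_weight in weighted_bullets:
        for seg in segments(text, max_len):  # a set, so each bullet counts once
            weights[seg] += interaction_weight
    return dict(weights)


def alternative_aliases(weights: Dict[str, float], threshold: float,
                        stop_words: Iterable[str]) -> List[str]:
    """Segments above the threshold that do not include any stop word."""
    stops = list(stop_words)
    return sorted(seg for seg, g in weights.items()
                  if g > threshold and not any(s in seg for s in stops))
```

Enumerating every substring grows quickly with MaxLen and with the number of bullet screens, so a practical system would likely bound MaxLen to the longest plausible alias.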
Step 706, verifying the validity of each alternative alias based on at least one of the native name of actor R and the image frames related to actor R in the video data.
Specifically, the validity verification includes a first validity verification and a second validity verification, and this step may be implemented as follows: the computer device determines, from the text bullet screens, the first text bullet screens that include the native name of actor R; performs the first validity verification on each alternative alias based on at least one of the first text bullet screens and the third text bullet screens that include the alternative alias; and takes the alternative aliases that pass the first validity verification as candidate aliases and performs the second validity verification on the candidate aliases according to the image frames related to actor R in the video data.
For the i-th alternative alias, the first validity verification may be implemented as follows: the computer device replaces the native name of actor R in a first text bullet screen with the alternative alias to obtain a first replacement text, and determines the first semantic fluency G_L1[i] of the first replacement text, where i denotes the i-th alternative alias. The computer device replaces the alternative alias in a third text bullet screen with the native name of actor R to obtain a second replacement text, and determines the second semantic fluency G_L2[i] of the second replacement text. G_L1[i] and G_L2[i] are averaged to obtain the replacement fluency score G_L[i] of the i-th alternative alias. Through the above process, the replacement fluency score of each alternative alias can be calculated, and the alternative aliases whose replacement fluency score is greater than a preset threshold are taken as the alternative aliases that pass the first validity verification. It may be appreciated that there may be a plurality of first replacement texts and a plurality of second replacement texts; the first semantic fluency may refer to the number of first replacement texts that read fluently, and the second semantic fluency may refer to the number of second replacement texts that read fluently, where whether a replacement text reads fluently may be determined by a sentence fluency recognition model based on the BERT model architecture.
Through the above first validity verification, the alternative aliases that pass the first validity verification are obtained as candidate aliases. For the i-th candidate alias, the second validity verification may be implemented as follows: the computer device determines the fourth text bullet screens that include the i-th candidate alias; determines the image frames in the video data that match each fourth text bullet screen; in a case where the target object appears in the image frames matching a fourth text bullet screen, determines that fourth text bullet screen as a co-occurrence text bullet screen; and determines the second validity verification result of the i-th candidate alias based on the co-occurrence text bullet screens corresponding to the i-th candidate alias.
For a fourth text bullet screen, the image frames in the video data that match it may be determined as follows: the computer device determines the interaction time of the fourth text bullet screen, and selects the image frames in the video data whose play time is within T seconds before or after the interaction time as the image frames matching the fourth text bullet screen. Next, the computer device determines the number of selected image frames that include actor R and the total number of selected image frames, and takes the ratio of the former to the latter as the face occurrence probability corresponding to the fourth text bullet screen. In a case where the face occurrence probability is greater than a preset threshold, it is determined that actor R appears in the image frames matching the fourth text bullet screen, and the fourth text bullet screen is determined as a co-occurrence text bullet screen. Through the above process, it can be determined which of the fourth text bullet screens corresponding to the i-th candidate alias are co-occurrence text bullet screens.
The computer device may then determine the second validity verification result of each candidate proxy name based on the co-occurrence text bullet-screen comments corresponding to that candidate proxy name. For the i-th candidate proxy name, the process may be as follows: the computer device determines the co-occurrence text bullet-screen comments and the fourth text bullet-screen comments corresponding to the i-th candidate proxy name. A first weight sum sum1_di is calculated from the interaction weights of the co-occurrence text bullet-screen comments, and a second weight sum sum2_di is calculated from the interaction weights of the fourth text bullet-screen comments. The ratio of sum1_di to sum2_di is calculated and used as the co-occurrence probability score G_face[i] of the i-th candidate proxy name. Candidate proxy names whose co-occurrence probability score is greater than a preset threshold are taken as the candidate proxy names that pass the second validity verification. In this way, the alternative proxy names that pass both the first validity verification and the second validity verification can be determined, that is, the target proxy names belonging to the target object determined for the video data v.
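As a sketch of the ratio described above, the following Python function computes G_face[i] from the comments that contain the i-th candidate. The input format, a list of hypothetical `(interaction_weight, is_co_occurrence)` pairs, is an assumption made for illustration.

```python
def co_occurrence_score(comments):
    """Return sum1_di / sum2_di for one candidate.

    comments: (interaction_weight, is_co_occurrence) pairs, one per fourth
    text bullet-screen comment containing the candidate (assumed format).
    """
    sum2 = sum(weight for weight, _ in comments)          # all fourth comments
    if sum2 == 0:
        return 0.0                                        # no weighted evidence at all
    sum1 = sum(weight for weight, co in comments if co)   # co-occurrence comments only
    return sum1 / sum2
```

For example, weights 2.0 and 1.0 on co-occurring comments and 1.0 on a non-co-occurring one give a score of 3.0 / 4.0.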
Step 708, for the video data v, determine the target proxy name belonging to actor R based on the alternative proxy names that pass the validity verification.
In an actual implementation, for the video data v, an alternative proxy name that passes the validity verification may be used directly as a target proxy name belonging to actor R. Alternatively, a validity score may first be calculated for each alternative proxy name that passes the validity verification. For the i-th alternative proxy name, the process is as follows: the first score obtained when the alternative proxy name was mined from the target interaction data, such as G_sub[i] mentioned in the above embodiment, is obtained. The second scores obtained when the alternative proxy name was verified, such as G_face[i] and G_L[i], are obtained. In an actual implementation, the validity score of the i-th alternative proxy name may be obtained by multiplying G_sub[i], G_face[i] and G_L[i]. The alternative proxy names whose validity score is greater than a preset threshold are taken as the target proxy names determined for the video data v and belonging to actor R. Where the subsequent step of fusing the target proxy names across a plurality of video data is also considered, an alternative proxy name whose validity score is greater than the preset threshold may be regarded as a valid proxy name mined from the text bullet-screen comments of the video data vi.
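The score combination in step 708 can be sketched as follows. The dictionary layout and the function names are hypothetical; only the product of G_sub[i], G_face[i] and G_L[i] and the comparison with a preset threshold come from the text.

```python
def validity_score(g_sub, g_face, g_l):
    """Validity score as the product of the mining score and the two verification scores."""
    return g_sub * g_face * g_l


def names_above_threshold(scores, threshold):
    """Keep the names whose validity score is greater than the preset threshold.

    scores: {name: (g_sub, g_face, g_l)} (assumed layout).
    """
    return [
        name
        for name, (g_sub, g_face, g_l) in scores.items()
        if validity_score(g_sub, g_face, g_l) > threshold
    ]
```

Because the three factors are multiplied, a name that scores poorly on any single check is strongly penalized.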
Step 710, obtain the valid proxy names mined from the text bullet-screen comments of each of a plurality of video data, and obtain the validity score corresponding to each valid proxy name; determine the video weight corresponding to each video data; and screen the valid proxy names based on the video weights of the video data and the validity scores of the valid proxy names to obtain the final target proxy name belonging to the target object.
For the video data vi, the video weight corresponding to vi may be determined as follows: the computer device obtains the total playing duration of vi, obtains the total playing duration of all video data in which actor R participates, and takes the ratio of the total playing duration of vi to the total playing duration of all the video data as the video weight corresponding to vi.
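A minimal sketch of the video weight computation, assuming the total playing durations are available in a hypothetical mapping from video identifier to duration:

```python
def video_weights(total_play_durations):
    """Weight of each video = its total playing duration / sum over all of the actor's videos.

    total_play_durations: {video_id: duration} (assumed layout, any consistent time unit).
    """
    total = sum(total_play_durations.values())
    if total == 0:
        return {vid: 0.0 for vid in total_play_durations}
    return {vid: duration / total for vid, duration in total_play_durations.items()}
```

When the total is non-zero, the weights computed this way sum to 1.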
It will be appreciated that the same valid proxy name may be mined from different video data. For a valid proxy name that is not repeated, the validity score calculated for its source video data may be used directly as its final validity score. For a repeated valid proxy name c, the final validity score may be calculated as follows: for each video data from which c was mined, the video weight of that video data is multiplied by the validity score of c in that video data, the products are summed, and the weighted sum is taken as the final validity score of c. It is then determined whether the final validity score of each valid proxy name is greater than a preset threshold, and the valid proxy names whose final validity score is greater than the preset threshold are taken as the final target proxy names belonging to the target object.
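The fusion rule above can be sketched in Python as follows. The nested-dictionary input is an assumed layout; the rule itself (raw score for a name mined from one video, weighted sum for a name mined from several) follows the text.

```python
def final_validity_scores(per_video_scores, weights):
    """Fuse per-video validity scores into one final score per name.

    per_video_scores: {video_id: {name: validity_score}} (assumed layout).
    weights: {video_id: video_weight}.
    """
    sources = {}
    for video_id, scores in per_video_scores.items():
        for name, score in scores.items():
            sources.setdefault(name, []).append((video_id, score))

    final = {}
    for name, items in sources.items():
        if len(items) == 1:
            final[name] = items[0][1]  # not repeated: use the score as is
        else:
            final[name] = sum(weights[v] * s for v, s in items)  # repeated: weighted sum
    return final
```

Note that, as described, a non-repeated name keeps its unweighted score, so its final score is not on the same scale as a weighted sum; any threshold applied afterwards has to account for that.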
In the above embodiment, the media interaction data corresponding to the media data related to the target object is obtained, and the target interaction data directed at the target object is screened out from it. Because interaction data directed at the target object often includes the various proxy names by which different users refer to the target object, a plurality of alternative proxy names corresponding to the target object can be accurately mined from the target interaction data. On this basis, validity verification is performed on each alternative proxy name based on at least one of the real name of the target object and the media content related to the target object in the media data, so that incorrect proxy names, or proxy names unrelated to the target object, can be removed. Throughout the process, the target proxy names belonging to the target object can be mined efficiently and accurately without manual review and screening, which greatly reduces labor cost.
In addition, this scheme mines proxy names from the media interaction data corresponding to the media data. Because the media interaction data is large in volume and involves many groups of interacting users, the coverage of proxy name mining can be ensured, that is, the target proxy names belonging to the target object can be mined comprehensively. Moreover, because the media interaction data is updated over time, the proxy names of the target object can be updated dynamically by this method, which improves the timeliness of the proxy names.
An embodiment of the application also provides an application scenario to which the object proxy mining method is applied, taking as an example the case in which the media data is video data, the media interaction data is text bullet-screen comments, the target object is actor R, and the proxy names are nicknames. Specifically, as shown in fig. 9, the object proxy mining method is executed by a computer device and is applied in this scenario as follows:
Step 902, acquire a video data set of actor R.
For each video data vi, the video data vi in which the actor participates and the corresponding text bullet-screen comments are obtained, and the computer device screens out, from the text bullet-screen comments, the target text bullet-screen comments directed at the actor.
Step 904, perform nickname mining for actor R based on each single video in which actor R participates.
Specifically, the computer device may first mine a plurality of alternative proxy names of actor R from the text bullet-screen comments of the video data vi. The computer device may then filter them by validity determination: a first validity determination is made for the alternative proxy names based on the semantic consistency between the nickname and the real name, and a second validity determination is made based on the co-occurrence of the nickname and the face. Through this screening process, the valid proxy names mined from the text bullet-screen comments of the video data vi and their validity scores can be determined.
Step 906, merge the valid proxy names mined from the text bullet-screen comments of each video data in the video data set, and further screen them for nickname validity according to the validity scores.
Step 908, take the valid proxy names remaining after the final screening as the nicknames of actor R.
An embodiment of the application also provides an application scenario to which the object proxy mining method is applied, taking as an example the case in which the media data is video data, the media interaction data is text bullet-screen comments, the target object is actor R, and the proxy names are nicknames. Specifically, as shown in fig. 10, the object proxy mining method is executed by a computer device and is applied in this scenario as follows:
Step 1002, acquire a video data set of actor R; for each video data vi, obtain the video data vi in which actor R participates and the corresponding text bullet-screen comments, and screen out, from the text bullet-screen comments, the target text bullet-screen comments directed at the actor.
Step 1004, mine a plurality of alternative proxy names corresponding to actor R from the target text bullet-screen comments.
Step 1006, perform validity verification on each alternative proxy name based on at least one of the real name of actor R and the image frames related to actor R in the video data.
Step 1008, for the video data vi, determine the target proxy names belonging to actor R based on the alternative proxy names that pass the validity verification.
Step 1010, take the target proxy names belonging to actor R mined from the text bullet-screen comments of the video data vi as valid proxy names; for the valid proxy names mined from the text bullet-screen comments of a plurality of video data, obtain the validity score corresponding to each valid proxy name; determine the video weight corresponding to each video data; and screen the valid proxy names based on the video weights of the video data and the validity scores of the valid proxy names to obtain the final target proxy name belonging to actor R.
Step 1012, when the attention tags of an object to be recommended include the real name of actor R, push video data whose video tag is the final target proxy name to the object to be recommended.
Specifically, because the object to be recommended follows actor R, the object to be recommended is interested in actor R. For some video data, especially videos made by individual video producers, the real name of actor R may not be used as a video tag, while a final target proxy name of actor R may be. Pushing video data whose video tag is the final target proxy name to the object to be recommended therefore meets the potential viewing needs of the object to be recommended while making effective use of the stored video data. Compared with the object to be recommended searching manually, this improves the efficiency with which the object to be recommended obtains data.
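Step 1012 can be illustrated with a small Python sketch. The data layout (a set of followed tags per user and a tag list per video) and all names are hypothetical; the sketch only shows the matching rule, not a recommendation system.

```python
def videos_to_push(followed_tags, real_name, target_proxy_names, video_tags):
    """Return the videos tagged with a target proxy name, for a user who follows the real name.

    followed_tags: set of the user's attention tags (assumed layout).
    video_tags: {video_id: list of tags} (assumed layout).
    """
    if real_name not in followed_tags:
        return []  # the user does not follow this actor
    proxies = set(target_proxy_names)
    return [vid for vid, tags in video_tags.items() if proxies & set(tags)]
```

A video tagged only with a nickname is thereby reachable for a user whose attention tags contain only the real name.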
An embodiment of the application also provides an application scenario to which the object proxy mining method is applied, taking as an example the case in which the media data is audio data, the media interaction data is text bullet-screen comments, the target object is dubbing actor R, and the proxy names are nicknames. Specifically, as shown in fig. 11, the object proxy mining method is executed by a computer device and is applied in this scenario as follows:
Step 1102, acquire an audio data set dubbed by dubbing actor R; for each audio data ai, obtain the audio data ai dubbed by the dubbing actor and the corresponding text bullet-screen comments, and screen out, from the text bullet-screen comments, the target text bullet-screen comments directed at dubbing actor R.
Step 1104, mine a plurality of alternative proxy names corresponding to dubbing actor R from the target text bullet-screen comments.
Step 1106, perform validity verification on each alternative proxy name based on at least one of the real name of dubbing actor R and the audio frames related to dubbing actor R in the audio data.
Step 1108, for the audio data ai, determine the target proxy names belonging to dubbing actor R based on the alternative proxy names that pass the validity verification.
Step 1110, take the target proxy names belonging to dubbing actor R mined from the text bullet-screen comments of the audio data ai as valid proxy names; for the valid proxy names mined from the text bullet-screen comments of a plurality of audio data, obtain the validity score corresponding to each valid proxy name; determine the audio weight corresponding to each audio data; and screen the valid proxy names based on the audio weights of the audio data and the validity scores of the valid proxy names to obtain the final target proxy name belonging to dubbing actor R.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the application also provides an object proxy mining apparatus for implementing the object proxy mining method described above. The solution provided by the apparatus is implemented in a manner similar to that described for the method, so for the specific limitations in the apparatus embodiments provided below, reference may be made to the limitations of the object proxy mining method above, which are not repeated here.
In some embodiments, as shown in fig. 12, an object proxy mining apparatus is provided, which may employ software modules or hardware modules, or a combination of both, as part of a computer device, and specifically includes: an acquisition module 1202, a screening module 1204, a mining module 1206, a verification module 1208, and a determination module 1210, wherein:
an acquisition module 1202 is configured to acquire media interaction data corresponding to media data related to a target object.
The screening module 1204 is configured to screen out, from the media interaction data, target interaction data directed at the target object.
The mining module 1206 is configured to mine a plurality of alternative proxy names corresponding to the target object from the target interaction data.
The verification module 1208 is configured to perform validity verification on each alternative proxy name based on at least one of the real name of the target object and the media content related to the target object in the media data.
The determining module 1210 is configured to determine the target proxy name belonging to the target object based on the alternative proxy names that pass the validity verification.
In some embodiments, the screening module 1204 is configured to screen out, from the media interaction data, first interaction data that includes the real name of the target object; and to screen the target interaction data from the media interaction data according to how close the interaction time of the media interaction data is to the interaction time of the first interaction data.
In some embodiments, the screening module 1204 is configured to determine the appearance time of the target object in the media data; and to screen the target interaction data from the media interaction data according to how close the interaction time of the media interaction data is to the appearance time of the target object.
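Both screening variants compare interaction times against a set of anchor times: either the interaction times of the data that mentions the real name, or the appearance times of the target object. A minimal sketch, assuming interaction data is given as hypothetical `(interaction_time_seconds, text)` pairs and that closeness is decided by a fixed gap `max_gap_s`, which the text does not specify:

```python
def screen_target_interactions(interactions, anchor_times, max_gap_s):
    """Keep the interaction data posted within max_gap_s seconds of any anchor time.

    interactions: (interaction_time_seconds, text) pairs (assumed format).
    anchor_times: times of real-name mentions or of the target object's appearances.
    """
    return [
        item
        for item in interactions
        if any(abs(item[0] - anchor) <= max_gap_s for anchor in anchor_times)
    ]
```

The same function serves both embodiments; only the source of `anchor_times` differs.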
In some embodiments, the mining module 1206 is configured to obtain a plurality of word segments based on the target interaction data; for each word segment, determine, from the target interaction data, the second interaction data that includes the word segment, and determine the interaction weight corresponding to each second interaction data; determine the word frequency weight corresponding to each word segment based on the interaction weights of the second interaction data corresponding to that word segment; and screen out, from the plurality of word segments, the word segments that meet a preset condition as alternative proxy names according to the word frequency weight corresponding to each word segment.
In some embodiments, the mining module 1206 is further configured to obtain the degree of preference of the publishing object of each second interaction data for the target object; determine the interaction heat of each second interaction data; and determine the interaction weight corresponding to each second interaction data according to the degree of preference and the interaction heat corresponding to that second interaction data.
In some embodiments, the mining module 1206 is further configured to determine a plurality of media data related to the target object and determine the media heat of each media data; obtain the perception completion degree of the publishing object of the second interaction data for each media data; and determine the degree of preference of the publishing object for the target object according to the perception completion degree of the publishing object for each media data and the media heat of each media data.
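The three mining embodiments above name the inputs of each quantity but this passage does not give the formulas. The sketch below therefore fixes them by assumption: preference as a heat-weighted average of perception completion degrees, interaction weight as preference times interaction heat, and word frequency weight as the sum of the interaction weights of the data containing the segment. All function names and layouts are hypothetical.

```python
def preference_degree(completion, media_heat):
    """Assumed formula: heat-weighted average of the publisher's completion degree per media."""
    total_heat = sum(media_heat.values())
    if total_heat == 0:
        return 0.0
    return sum(completion.get(m, 0.0) * h for m, h in media_heat.items()) / total_heat


def interaction_weight(preference, interaction_heat):
    """Assumed formula: product of publisher preference and interaction heat."""
    return preference * interaction_heat


def word_frequency_weights(weighted_interactions):
    """Sum, per word segment, the weights of the interaction data that contains it.

    weighted_interactions: (word_segments, interaction_weight) pairs (assumed format).
    """
    weights = {}
    for segments, weight in weighted_interactions:
        for segment in set(segments):  # count each segment once per interaction
            weights[segment] = weights.get(segment, 0.0) + weight
    return weights
```

Under these assumptions, comments from users who watched more of the target object's popular media, and comments that attracted more interaction, contribute more to a segment's weight than a plain occurrence count would.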
In some embodiments, the validity verification includes a first validity verification and a second validity verification; the verification module 1208 is configured to determine, from the media interaction data, first interaction data that includes the real name of the target object; perform the first validity verification on each alternative proxy name based on at least one of the first interaction data and third interaction data that includes the alternative proxy name; and take the alternative proxy names that pass the first validity verification as candidate proxy names, and perform the second validity verification on the candidate proxy names according to the media content related to the target object in the media data.
In some embodiments, the verification module 1208 is further configured to replace the real name in the first interaction data with a current alternative proxy name to obtain a first replacement text, where the current alternative proxy name is any one of the alternative proxy names; determine a first semantic smoothness of the first replacement text; replace the current alternative proxy name in the third interaction data with the real name to obtain a second replacement text, and determine a second semantic smoothness of the second replacement text; and determine the first validity verification result of each alternative proxy name according to the first semantic smoothness and the second semantic smoothness corresponding to that alternative proxy name.
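The two-way replacement test can be sketched as below. The smoothness scorer is passed in as a hypothetical callable `fluency` (in practice it might be a language model score), and the decision rule, requiring both average smoothness values to reach a threshold, is an assumption; the text only says the result depends on both values.

```python
def first_validity_check(proxy_name, real_name, real_name_texts, proxy_texts,
                         fluency, threshold):
    """Swap names in both directions and check that the texts still read smoothly.

    real_name_texts: interaction texts containing the real name (first interaction data).
    proxy_texts: interaction texts containing the proxy name (third interaction data).
    fluency: callable mapping a text to a smoothness score (hypothetical scorer).
    """
    first_texts = [t.replace(real_name, proxy_name) for t in real_name_texts]
    second_texts = [t.replace(proxy_name, real_name) for t in proxy_texts]

    def mean(values):
        return sum(values) / len(values) if values else 0.0

    first_smoothness = mean([fluency(t) for t in first_texts])
    second_smoothness = mean([fluency(t) for t in second_texts])
    # Assumed rule: the proxy name must be interchangeable in both directions.
    return first_smoothness >= threshold and second_smoothness >= threshold
```

The intuition is that a genuine proxy name can stand in for the real name, and the real name for it, without making the sentence awkward, whereas an unrelated frequent word cannot.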
In some embodiments, the verification module 1208 is further configured to determine, for a current candidate proxy name among the plurality of candidate proxy names, fourth interaction data that includes the current candidate proxy name, where the current candidate proxy name is any one of the candidate proxy names; determine the media content in the media data that matches the fourth interaction data; when the target object appears in the media content matched with fourth interaction data, determine that fourth interaction data to be co-occurrence interaction data; and determine the second validity verification result of each candidate proxy name based on the co-occurrence interaction data corresponding to that candidate proxy name.
In some embodiments, the verification module 1208 is further configured to determine, for a current candidate proxy name among the plurality of candidate proxy names, the co-occurrence interaction data and the fourth interaction data corresponding to the current candidate proxy name; calculate a first weight sum from the interaction weights of the co-occurrence interaction data, and calculate a second weight sum from the interaction weights of the fourth interaction data; determine the co-occurrence probability score corresponding to the current candidate proxy name based on the ratio of the first weight sum to the second weight sum; and determine the second validity verification result of each candidate proxy name according to the co-occurrence probability score corresponding to that candidate proxy name.
In some embodiments, where there are a plurality of media data related to the target object, an alternative proxy name that passes the validity verification is referred to as a valid proxy name; the determining module 1210 is configured to obtain the valid proxy names mined from the media interaction data of each of the plurality of media data, and obtain the validity score corresponding to each valid proxy name; determine the media weight corresponding to each media data; and screen the valid proxy names based on the media weights of the media data and the validity scores of the valid proxy names to obtain the target proxy name belonging to the target object.
In some embodiments, the determining module 1210 is further configured to obtain a first score corresponding to the valid proxy name from when it was mined from the target interaction data; obtain a second score corresponding to the valid proxy name from when it was verified for validity; and determine the validity score of the valid proxy name based on the first score and the second score.
In some embodiments, the apparatus further comprises:
an updating module, configured to, when the interaction increment of the media interaction data reaches a preset number, return to the step of screening out, from the media interaction data, target interaction data directed at the target object and continue execution, so as to update the target proxy name of the target object.
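The update condition can be sketched as a small counter. The class name and the choice to reset the counter after each trigger are assumptions for illustration; the text only states that mining is repeated once the increment reaches a preset number.

```python
class RemineTrigger:
    """Signals when enough new interaction data has arrived to rerun the mining steps."""

    def __init__(self, preset_increment):
        self.preset_increment = preset_increment
        self.pending = 0  # interaction data received since the last mining run

    def on_new_interactions(self, count):
        """Record `count` new items; return True when mining should be rerun."""
        self.pending += count
        if self.pending >= self.preset_increment:
            self.pending = 0  # assumed: start counting afresh after each rerun
            return True
        return False
```

A caller would rerun screening, mining, verification and determination whenever `on_new_interactions` returns True.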
In some embodiments, the apparatus further comprises:
a pushing module, configured to, when the attention tags of an object to be recommended include the target object, take the target proxy names belonging to the target object as associated attention tags of the object to be recommended; and push media interaction data related to the associated attention tags to the object to be recommended.
In some embodiments, the pushing module is further configured to, when pushing media interaction data related to each target proxy name to the object to be recommended, annotate the real name of the target object in a preset area adjacent to the display area of the media interaction data if the attention tags of the object to be recommended do not include the real name of the target object.
The object proxy mining apparatus obtains the media interaction data corresponding to the media data related to the target object, and screens out from it the target interaction data directed at the target object. Because interaction data directed at the target object often includes the various proxy names by which different users refer to the target object, a plurality of alternative proxy names corresponding to the target object can be accurately mined from the target interaction data. On this basis, validity verification is performed on each alternative proxy name based on at least one of the real name of the target object and the media content related to the target object in the media data, so that incorrect proxy names, or proxy names unrelated to the target object, can be removed. Throughout the process, the target proxy names belonging to the target object can be mined efficiently and accurately without manual review and screening, which greatly reduces labor cost.
In addition, this scheme mines proxy names from the media interaction data corresponding to the media data. Because the media interaction data is large in volume and involves many groups of interacting users, the coverage of proxy name mining can be ensured, that is, the target proxy names belonging to the target object can be mined comprehensively. Moreover, because the media interaction data is updated over time, the proxy names of the target object can be updated dynamically by this method, which improves the timeliness of the proxy names. Further, the target proxy names can provide a data processing basis for business scenarios of a media platform such as content understanding, content retrieval, content recommendation and content interaction, which can improve the overall business effect of the media platform.
For specific limitations on the object proxy mining apparatus, reference may be made to the above limitations on the object proxy mining method, which are not repeated here. The modules in the above object proxy mining apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a terminal or a server, and the internal structure of which may be as shown in fig. 13. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing media data and media interaction data. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by a processor implements an object proxy mining method.
It will be appreciated by those skilled in the art that the structure shown in fig. 13 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the above described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.
The above embodiments represent only a few implementations of the present application, and although they are described in some detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these would fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (19)

1. A method of object proxy mining, the method comprising:
acquiring media interaction data corresponding to media data related to a target object;
screening out, from the media interaction data, target interaction data directed at the target object;
mining a plurality of alternative proxy names corresponding to the target object from the target interaction data;
performing validity verification on each alternative proxy name based on at least one of the real name of the target object and the media content related to the target object in the media data;
and determining a target proxy name belonging to the target object based on the alternative proxy names that pass the validity verification.
2. The method of claim 1, wherein screening the target interaction data from the media interaction data comprises:
screening out, from the media interaction data, first interaction data that comprises the original name of the target object;
and screening the target interaction data from the media interaction data according to the degree of proximity between the interaction time of the first interaction data and the interaction time of the media interaction data.
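As a rough illustration of the time-proximity screening in claim 2, the following Python sketch keeps every piece of interaction data posted within a fixed window of one that mentions the target object's original name. The `Interaction` type, the `window_s` parameter, and the use of a fixed window as the proximity measure are assumptions made for illustration; the claim does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    text: str
    time_s: float  # interaction time in seconds (assumed unit)


def screen_by_name_proximity(interactions, original_name, window_s=5.0):
    """Keep interactions whose time is close to one that mentions original_name."""
    # Times of the "first interaction data": items that contain the original name.
    anchor_times = [i.time_s for i in interactions if original_name in i.text]
    # Assumed proximity rule: within window_s seconds of any anchor time.
    return [
        i for i in interactions
        if any(abs(i.time_s - t) <= window_s for t in anchor_times)
    ]
```

A fixed window is only one possible proximity measure; a decaying score over the time gap would fit the claim wording equally well.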
3. The method of claim 1, wherein screening the target interaction data from the media interaction data comprises:
determining the appearance time of the target object in the media data;
and screening the target interaction data from the media interaction data according to the degree of proximity between the appearance time of the target object and the interaction time of the media interaction data.
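The appearance-time variant in claim 3 can be sketched in the same spirit. Here the appearance times are assumed to be available as `(start_s, end_s)` intervals (for example from some upstream detector, which is not shown), and `tolerance_s` is a hypothetical slack parameter.

```python
def screen_by_appearance(interactions, appearances, tolerance_s=3.0):
    """Keep (text, time_s) pairs posted while the target is on screen, or nearly so.

    interactions: list of (text, time_s) tuples.
    appearances:  list of (start_s, end_s) intervals in which the target appears.
    """
    def near_appearance(t):
        # Assumed rule: inside an appearance interval widened by tolerance_s.
        return any(start - tolerance_s <= t <= end + tolerance_s
                   for start, end in appearances)

    return [(text, t) for text, t in interactions if near_appearance(t)]
```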
4. The method of claim 1, wherein mining the plurality of alternative names corresponding to the target object according to the target interaction data comprises:
obtaining a plurality of word segments based on the target interaction data;
for each word segment, determining, from the target interaction data, second interaction data comprising that word segment, and determining the interaction weight corresponding to each piece of second interaction data;
determining the word frequency weight corresponding to each word segment based on the interaction weights of the second interaction data corresponding to that word segment;
and screening, from the plurality of word segments, word segments that meet a preset condition as alternative names according to the word frequency weight corresponding to each word segment.
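The weighted word-frequency step of claim 4 might look like the sketch below. It assumes the interaction data has already been segmented into tokens, sums the interaction weights of the items containing each segment, and treats "a preset condition" as a minimum weight plus a top-k cut; both the threshold and `top_k` are illustrative assumptions.

```python
from collections import defaultdict


def mine_alternative_names(segmented, weights, min_weight=2.0, top_k=3):
    """Rank word segments by the summed weight of the items that contain them.

    segmented: list of token lists, one per piece of target interaction data.
    weights:   interaction weight of each piece, in the same order.
    """
    freq_weight = defaultdict(float)
    for tokens, w in zip(segmented, weights):
        for tok in set(tokens):  # count each item at most once per segment
            freq_weight[tok] += w
    # Sort by descending weight, breaking ties alphabetically for determinism.
    ranked = sorted(freq_weight.items(), key=lambda kv: (-kv[1], kv[0]))
    # Assumed preset condition: weight >= min_weight, keep at most top_k.
    return [tok for tok, score in ranked if score >= min_weight][:top_k]
```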
5. The method of claim 4, wherein determining the interaction weight corresponding to each piece of second interaction data comprises:
obtaining the degree of preference of the publishing object of each piece of second interaction data for the target object;
determining the interaction heat of each piece of second interaction data;
and determining the interaction weight corresponding to each piece of second interaction data according to the corresponding degree of preference and interaction heat.
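Claim 5 does not say how the degree of preference and the interaction heat are combined. One plausible combination, shown purely as an assumption, measures heat as a log-damped count of likes and replies and multiplies it into the preference; the `likes` and `replies` inputs are hypothetical.

```python
import math


def interaction_weight(preference, likes, replies=0):
    """Combine publisher preference and interaction heat into one weight.

    Assumed formula: preference * (1 + log(1 + likes + replies)).
    The log damps very popular items so they do not dominate the weighting.
    """
    heat = math.log1p(likes + replies)
    return preference * (1.0 + heat)
```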
6. The method of claim 5, wherein obtaining the degree of preference of the publishing object of each piece of second interaction data for the target object comprises:
determining a plurality of pieces of media data related to the target object, and determining the media heat of each piece of media data;
obtaining the perception completion degree of the publishing object of the second interaction data for each piece of media data;
and determining the degree of preference of the publishing object for the target object according to the perception completion degree of the publishing object for each piece of media data and the media heat of each piece of media data.
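For claim 6, a simple reading is a heat-weighted average of how much of each related media item the publisher has consumed. The sketch below assumes completion is a fraction in [0, 1] and that hotter media count more; the claim fixes neither choice, and weighting in the opposite direction would be just as consistent with its wording.

```python
def preference_degree(completion, media_heat):
    """Heat-weighted average completion over the media related to the target.

    completion: {media_id: fraction consumed, 0.0 to 1.0}; missing ids count as 0.
    media_heat: {media_id: non-negative heat value}.
    """
    total_heat = sum(media_heat.values())
    if total_heat == 0:
        return 0.0  # no usable heat signal
    weighted = sum(heat * completion.get(media_id, 0.0)
                   for media_id, heat in media_heat.items())
    return weighted / total_heat
```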
7. The method of claim 1, wherein the validity verification comprises a first validity verification and a second validity verification, and wherein verifying the validity of each alternative name based on at least one of the original name of the target object or the media data related to the target object comprises:
determining, from the media interaction data, first interaction data comprising the original name of the target object;
performing the first validity verification on each alternative name based on at least one of the first interaction data or third interaction data comprising the alternative name;
and taking each alternative name that passes the first validity verification as a candidate alternative name, and performing the second validity verification on the candidate alternative name according to media content related to the target object in the media data.
8. The method of claim 7, wherein performing the first validity verification on each alternative name based on at least one of the first interaction data or the third interaction data comprising the alternative name comprises:
for a current alternative name among the plurality of alternative names, replacing the original name in the first interaction data with the current alternative name to obtain a first replacement text, wherein the current alternative name is any one of the alternative names;
determining a first semantic smoothness of the first replacement text;
replacing the current alternative name in the third interaction data with the original name to obtain a second replacement text, and determining a second semantic smoothness of the second replacement text;
and determining a first validity verification result of each alternative name according to the first semantic smoothness and the second semantic smoothness corresponding to that alternative name.
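The two-way substitution test of claim 8 can be sketched as follows. The semantic smoothness measure is left as a caller-supplied `fluency` function (in practice this might be a language-model score, which is not specified here), and averaging the two directions is an assumption.

```python
def _mean(values):
    values = list(values)
    return sum(values) / len(values) if values else None


def first_validity_score(original_name, alt_name, first_data, third_data, fluency):
    """Score an alternative name by swapping it with the original name.

    first_data: texts containing original_name.
    third_data: texts containing alt_name.
    fluency:    hypothetical callable mapping a text to a smoothness score.
    """
    # Direction 1: original name -> alternative name.
    s1 = _mean(fluency(t.replace(original_name, alt_name)) for t in first_data)
    # Direction 2: alternative name -> original name.
    s2 = _mean(fluency(t.replace(alt_name, original_name)) for t in third_data)
    # Either side may be absent ("at least one of"); average what is available.
    scores = [s for s in (s1, s2) if s is not None]
    return sum(scores) / len(scores) if scores else 0.0
```

A candidate would then pass the first verification when this score exceeds a chosen threshold.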
9. The method of claim 7, wherein performing the second validity verification on the candidate alternative name according to the media content related to the target object in the media data comprises:
for a current candidate alternative name among a plurality of candidate alternative names, determining fourth interaction data comprising the current candidate alternative name, wherein the current candidate alternative name is any one of the candidate alternative names;
determining media content in the media data that matches the fourth interaction data;
in a case where the target object appears in the media content matching the fourth interaction data, determining the corresponding fourth interaction data as co-occurrence interaction data;
and determining a second validity verification result of each candidate alternative name based on the co-occurrence interaction data corresponding to that candidate alternative name.
10. The method of claim 9, wherein determining the second validity verification result of each candidate alternative name based on the co-occurrence interaction data corresponding to that candidate alternative name comprises:
for a current candidate alternative name among the plurality of candidate alternative names, determining the co-occurrence interaction data and the fourth interaction data corresponding to the current candidate alternative name;
calculating a first weight sum according to the interaction weight of each piece of co-occurrence interaction data, and calculating a second weight sum according to the interaction weight of each piece of fourth interaction data;
determining a co-occurrence probability score corresponding to the current candidate alternative name based on a comparison value between the first weight sum and the second weight sum;
and determining the second validity verification result of each candidate alternative name according to the co-occurrence probability score corresponding to that candidate alternative name.
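The co-occurrence probability score of claim 10 can be illustrated by taking the comparison value to be a simple ratio of the two weight sums, which is an assumption. Each piece of fourth interaction data is represented as a `(weight, target_on_screen)` pair, where the boolean is assumed to come from matching the item against the media content.

```python
def cooccurrence_score(fourth_data):
    """Ratio of weight on co-occurring items to weight on all items.

    fourth_data: list of (interaction_weight, target_on_screen) pairs for
                 the items that contain the candidate alternative name.
    """
    second_sum = sum(w for w, _ in fourth_data)            # all fourth data
    if second_sum == 0:
        return 0.0
    first_sum = sum(w for w, on_screen in fourth_data if on_screen)  # co-occurrence subset
    return first_sum / second_sum
```

With weights 2.0 and 1.0 on co-occurring items and 1.0 on a non-co-occurring item, the score is 3.0 / 4.0 = 0.75.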
11. The method of claim 1, wherein, in a case where there are a plurality of pieces of media data related to the target object, an alternative name that passes the validity verification is a valid alternative name, and determining the target alternative name belonging to the target object based on the alternative names that pass the validity verification comprises:
acquiring the valid alternative names mined from the media interaction data of each of the plurality of pieces of media data, and acquiring the validity score corresponding to each valid alternative name;
determining the media weight corresponding to each piece of media data;
and screening the valid alternative names based on the media weight corresponding to each piece of media data and the validity score corresponding to each valid alternative name, to obtain the target alternative name belonging to the target object.
12. The method of claim 11, wherein acquiring the validity score corresponding to each valid alternative name comprises:
acquiring a first score assigned to the valid alternative name when it was mined from the target interaction data;
acquiring a second score assigned to the valid alternative name when its validity was verified;
and determining the validity score of the valid alternative name based on the first score and the second score.
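Claims 11 and 12 together describe scoring each valid alternative name per media item and then aggregating across media. The sketch below assumes the validity score is the mean of the first (mining) and second (verification) scores, and that names are kept when their media-weighted average clears a threshold; the combination rule and `threshold` are illustrative assumptions.

```python
from collections import defaultdict


def select_target_names(per_media, media_weight, threshold=0.5):
    """Aggregate per-media validity scores into a final list of names.

    per_media:    {media_id: {name: (first_score, second_score)}}.
    media_weight: {media_id: weight of that media item}.
    """
    weight_total = sum(media_weight.values())
    if weight_total == 0:
        return []
    totals = defaultdict(float)
    for media_id, names in per_media.items():
        for name, (first_score, second_score) in names.items():
            validity = (first_score + second_score) / 2  # assumed combination
            totals[name] += media_weight[media_id] * validity
    # A name absent from a media item contributes zero for that item.
    return sorted(n for n, t in totals.items() if t / weight_total >= threshold)
```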
13. The method according to any one of claims 1 to 12, further comprising:
when the increment of the media interaction data reaches a preset quantity, returning to the step of screening the target interaction data from the media interaction data and continuing execution, so as to update the target alternative names of the target object.
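The incremental update in claim 13 amounts to re-running the pipeline once enough new interaction data has accumulated. A minimal sketch, with a hypothetical `mine_fn` standing in for the whole screening, mining, and verification pipeline:

```python
class IncrementalMiner:
    """Re-run mining each time batch_size new interactions have arrived."""

    def __init__(self, mine_fn, batch_size):
        self.mine_fn = mine_fn        # hypothetical: list of interactions -> names
        self.batch_size = batch_size  # the "preset quantity" of the claim
        self.interactions = []
        self.pending = 0
        self.target_names = []

    def add(self, interaction):
        """Record one interaction; return True if mining was re-run."""
        self.interactions.append(interaction)
        self.pending += 1
        if self.pending >= self.batch_size:
            self.target_names = self.mine_fn(list(self.interactions))
            self.pending = 0
            return True
        return False
```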
14. The method according to any one of claims 1 to 12, further comprising:
in a case where an attention tag of an object to be recommended includes the target object, taking the target alternative name belonging to the target object as an associated attention tag of the object to be recommended;
and pushing media interaction data related to the associated attention tag to the object to be recommended.
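The tag expansion in claim 14 can be pictured as a lookup from each attention tag to its mined alternative names, followed by matching against interaction text. The `alias_index` mapping and the substring match are assumptions for illustration.

```python
def expand_attention_tags(attention_tags, alias_index):
    """Add the mined alternative names of each followed object as extra tags.

    alias_index: {original_name: iterable of target alternative names}.
    """
    expanded = set(attention_tags)
    for tag in attention_tags:
        expanded.update(alias_index.get(tag, ()))
    return expanded


def select_for_push(interaction_texts, tags):
    """Pick interaction texts that mention any tag (naive substring match)."""
    return [t for t in interaction_texts if any(tag in t for tag in tags)]
```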
15. The method of claim 14, further comprising:
in a case where the attention tag of the object to be recommended does not include the original name of the target object, when media interaction data related to a target alternative name is pushed to the object to be recommended, marking the original name of the target object in a preset area adjacent to the display area of the media interaction data.
16. An apparatus for mining alternative names (proxies) of an object, the apparatus comprising:
an acquisition module, configured to acquire media interaction data corresponding to media data related to a target object;
a screening module, configured to screen, from the media interaction data, target interaction data in which the interaction is directed at the target object;
a mining module, configured to mine a plurality of alternative names corresponding to the target object according to the target interaction data;
a verification module, configured to verify the validity of each alternative name based on at least one of the original name of the target object or media content related to the target object in the media data;
and a determining module, configured to determine, based on the alternative names that pass the validity verification, a target alternative name belonging to the target object.
17. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 15.
18. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 15.
19. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 15.
CN202210783692.6A 2022-07-05 2022-07-05 Object proxy mining method, device, computer equipment and storage medium Pending CN117390204A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210783692.6A CN117390204A (en) 2022-07-05 2022-07-05 Object proxy mining method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210783692.6A CN117390204A (en) 2022-07-05 2022-07-05 Object proxy mining method, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117390204A (en) 2024-01-12

Family

ID=89435080

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210783692.6A Pending CN117390204A (en) 2022-07-05 2022-07-05 Object proxy mining method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117390204A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination