CN111488491A - Method, system, medium and equipment for identifying target anchor - Google Patents
Method, system, medium and equipment for identifying target anchor
- Publication number
- CN111488491A (application number CN202010584332.4A)
- Authority
- CN
- China
- Prior art keywords
- anchor
- metric value
- obtaining
- announcements
- sets
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/7867—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25866—Management of end-user data
- H04N21/25875—Management of end-user data involving end-user authentication
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Computer Graphics (AREA)
- Library & Information Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for identifying target anchors. Because accounts belonging to the same target anchor are highly similar, the method first obtains a first metric value, a second metric value and a third metric value, and then weights them with a weight vector obtained from a positive sample set composed of anchors with the same live broadcast characteristics and a negative sample set composed of anchors with different live broadcast characteristics, so that the resulting target parameter value represents the similarity between anchors more accurately.
Description
Technical Field
The invention relates to the technical field of network live broadcast, in particular to a method, a system, a medium and equipment for identifying a target anchor.
Background
A live broadcast guild is a brokerage company that manages and operates anchors, and one guild generally includes many anchors. Under the rules, an anchor who has joined one guild may not use other accounts to broadcast on behalf of other guilds. However, to obtain the support and rewards of several guilds, some target anchors use alternate accounts ("small accounts") to broadcast under multiple guilds, which abnormally occupies the network resources of the live broadcast platform. To solve this problem, the target anchors must be identified accurately so that they can be restricted. The prior art generally identifies them in two ways. The first is real-name authentication: when an anchor starts broadcasting, the anchor is checked against the identity information authenticated at registration, but this check can be bypassed by means such as fraudulent authentication. The second is shared devices: if accounts use the same device over a period of time, they are identified as a main account and its alternate account; this method considers only the device dimension and can produce false positives or missed detections. The existing accuracy of identifying target anchors is therefore low.
Disclosure of Invention
In view of the above, the present invention has been developed to provide a method and system for identifying a target anchor that overcomes, or at least partially solves, the above-mentioned problems.
On one hand, the present application provides the following technical solutions through an embodiment of the present application:
a method of identifying a target anchor, the method comprising:
acquiring anchor announcements, broadcast IP sets and broadcast equipment sets of a plurality of anchors;
obtaining a first metric value representing the similarity of the anchor announcements between two anchors based on the anchor announcements of the plurality of anchors; obtaining a second metric value representing the similarity of the broadcast IP sets between the two anchors based on the broadcast IP sets of the plurality of anchors; obtaining a third metric value representing the similarity of the broadcast device sets between the two anchors based on the broadcast device sets of the plurality of anchors;
based on the broadcasting IP sets and the broadcasting equipment sets of the multiple anchor broadcasters, obtaining a positive sample set formed by anchor broadcasters with the same live broadcast characteristics and a negative sample set formed by anchor broadcasters with different live broadcast characteristics;
obtaining a weight vector for the first, second, and third metric values based on the first, second, third metric values, a set of positive samples, and a set of negative samples;
obtaining a target parameter value representing the weighted similarity between the two anchor players based on the first metric value, the second metric value, the third metric value and the weight vector;
and identifying two anchor broadcasters with the target parameter value larger than a preset threshold value as target anchor broadcasters.
Optionally, the obtaining a first metric value representing the similarity between the anchor announcements of the two anchor based on the anchor announcements of the multiple anchor specifically includes:
performing word segmentation on the anchor announcements of the two anchors to generate a word vector set;
obtaining announcement vectors of the two anchor announcements based on the word vector set;
and obtaining a first metric value representing the similarity of the anchor announcements between the two anchors based on the announcement vectors of the two anchor announcements.
Optionally, the obtaining announcement vectors of the two anchor announcements based on the word vector set specifically includes:
the announcement vector is obtained according to the following formula:
$$v_s = \frac{1}{|s|}\sum_{w \in s}\frac{a}{a + p(w)}\,v_w$$
where $v_s$ represents the announcement vector of anchor announcement $s$; $|s|$ represents the number of words after word segmentation of anchor announcement $s$; $v_w$ represents the vector generated for word $w$; $a$ is a constant; and $p(w)$ represents the word frequency with which word $w$ of anchor announcement $s$ occurs in all anchor announcements.
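Optionally, the obtaining a first metric value representing the similarity of the anchor announcements between two anchors based on the anchor announcements of the plurality of anchors specifically includes:
the first metric value is obtained according to the following formula:
$$\mathrm{sim}_1(i, j) = \frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}$$
where $v_i$ and $v_j$ are the announcement vectors of anchors $i$ and $j$;
the obtaining a second metric value representing the similarity of the broadcast IP sets between the two anchors based on the broadcast IP sets of the plurality of anchors specifically includes:
the second metric value is obtained according to the following formula:
$$\mathrm{sim}_2(i, j) = \frac{|\mathrm{IP}_i \cap \mathrm{IP}_j|}{|\mathrm{IP}_i \cup \mathrm{IP}_j|}$$
where $\mathrm{IP}_i$ and $\mathrm{IP}_j$ are the broadcast IP sets of anchors $i$ and $j$;
the obtaining a third metric value representing the similarity of the broadcast device sets between the two anchors based on the broadcast device sets of the plurality of anchors specifically includes:
the third metric value is obtained according to the following formula:
$$\mathrm{sim}_3(i, j) = \frac{|\mathrm{Dev}_i \cap \mathrm{Dev}_j|}{|\mathrm{Dev}_i \cup \mathrm{Dev}_j|}$$
where $\mathrm{Dev}_i$ and $\mathrm{Dev}_j$ are the broadcast device sets of anchors $i$ and $j$.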
Optionally, the obtaining a weight vector about the first metric value, the second metric value, and the third metric value based on the first metric value, the second metric value, the third metric value, the positive sample set, and the negative sample set specifically includes:
based on the first metric value, the second metric value, the third metric value, the positive sample set and the negative sample set, the following formula is constructed:
where the terms of the formula denote, among others: the weight vector; a weight adjustment coefficient; and a variable representing an anchor in the positive sample set S or the negative sample set D;
solving with the positive sample set and the negative sample set to obtain the weight vector that minimizes the formula.
Optionally, the solving with the positive sample set and the negative sample set to obtain the weight vector that minimizes the formula is performed iteratively and specifically includes:
where the terms of the iteration denote: the weight value of the t-th iteration; the learning rate; the subset of the positive sample set S selected in the t-th iteration; and the subset of the negative sample set D selected in the t-th iteration.
Optionally, after identifying two anchor whose target parameter values are greater than a preset threshold as target anchors, the method further includes:
and limiting the target anchor.
In another aspect, the present application provides, through another embodiment of the present application, a system for identifying a target anchor, the system including:
the data acquisition module is used for acquiring the anchor announcements, the broadcasting IP sets and the broadcasting equipment sets of a plurality of anchors;
a first obtaining module, configured to obtain, based on the anchor announcements of the multiple anchors, a first metric value representing similarity of the anchor announcements between two anchors; obtaining a second metric value representing the similarity of the broadcasting IP sets between the two anchor broadcasters based on the broadcasting IP sets of the anchor broadcasters; obtaining a third metric value representing the similarity of the broadcasting equipment sets between the two anchor broadcasters based on the broadcasting equipment sets of the anchor broadcasters;
a second obtaining module, configured to obtain, based on the broadcasting IP sets and the broadcasting device sets of the multiple anchor, a positive sample set composed of anchors with the same live broadcast characteristics and a negative sample set composed of anchors with different live broadcast characteristics;
a third obtaining module for obtaining a weight vector for the first, second and third metric values based on the first, second, third metric values, a set of positive samples and a set of negative samples;
a fourth obtaining module, configured to obtain a target parameter value representing the weighted similarity between the two anchor broadcasters, based on the first metric value, the second metric value, the third metric value, and the weight vector;
and the target identification module is used for identifying the two anchor broadcasters with the target parameter values larger than a preset threshold value as target anchor broadcasters.
The invention discloses a readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
The invention discloses an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor performing the steps of the method.
One or more technical solutions provided in the embodiments of the present application have at least the following technical effects or advantages:
the method comprises the steps of firstly, obtaining a first metric value representing the similarity of the anchor announcements between two anchor according to the anchor announcements of the anchor; according to the broadcasting IP set of the anchor, a second metric value representing the similarity of the broadcasting IP set between the two anchors is obtained; according to the broadcasting equipment set of the anchor, obtaining a third measurement value representing the similarity of the broadcasting equipment set between the two anchors; according to the broadcasting IP set and the broadcasting equipment set of the anchor, obtaining a positive sample set formed by the anchor with the same live broadcasting characteristics and a negative sample set formed by the anchors with different live broadcasting characteristics; secondly, obtaining a weight vector about the first metric value, the second metric value and the third metric value based on the first metric value, the second metric value, the third metric value, the positive sample set and the negative sample set; obtaining a target parameter value representing the weighted similarity between the two anchor players based on the first metric value, the second metric value, the third metric value and the weight vector; and identifying two anchor broadcasters with the target parameter value larger than a preset threshold value as target anchor broadcasters. According to the method, the first metric value, the second metric value and the third metric value are weighted by the weight vector obtained by the positive sample set formed by the anchor with the same live broadcast characteristics and the negative sample set formed by the anchor with different live broadcast characteristics on the basis of obtaining the first metric value, the second metric value and the third metric value, so that the obtained target parameter value can represent the similarity between the anchors more accurately.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow diagram of a method of identifying a target anchor in one embodiment of the invention;
FIG. 2 is a system architecture diagram for identifying a target anchor in one embodiment of the invention.
Detailed Description
The embodiment of the application provides a method, a system, a medium and equipment for identifying a target anchor, and provides a method capable of accurately identifying the target anchor in order to solve the problem of occupying network resources.
In order to solve the technical problems, the general idea of the embodiment of the application is as follows:
a method for identifying a target anchor includes the steps that first, according to anchor announcements of the anchors, a first metric value representing similarity of the anchor announcements between two anchors is obtained; according to the broadcasting IP set of the anchor, a second metric value representing the similarity of the broadcasting IP set between the two anchors is obtained; according to the broadcasting equipment set of the anchor, obtaining a third measurement value representing the similarity of the broadcasting equipment set between the two anchors; according to the broadcasting IP set and the broadcasting equipment set of the anchor, obtaining a positive sample set formed by the anchor with the same live broadcasting characteristics and a negative sample set formed by the anchors with different live broadcasting characteristics; secondly, obtaining a weight vector about the first metric value, the second metric value and the third metric value based on the first metric value, the second metric value, the third metric value, the positive sample set and the negative sample set; obtaining a target parameter value representing the weighted similarity between the two anchor players based on the first metric value, the second metric value, the third metric value and the weight vector; and identifying two anchor broadcasters with the target parameter value larger than a preset threshold value as target anchor broadcasters. 
Because the accounts of a target anchor are highly similar to one another, once the first metric value, the second metric value and the third metric value have been obtained, they are weighted by the weight vector obtained from the positive sample set formed by anchors with the same live broadcast characteristics and the negative sample set formed by anchors with different live broadcast characteristics, so that the resulting target parameter value represents the similarity between anchors more accurately.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
First, note that the term "and/or" used herein merely describes an association between associated objects and covers three possible relationships. For example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.
Example one
The present embodiment provides a method for identifying a target anchor, and referring to fig. 1, the method of the present embodiment includes the following steps:
S101, acquiring anchor announcements, broadcast IP sets and broadcast device sets of a plurality of anchors;
S102, obtaining a first metric value representing the similarity of the anchor announcements between two anchors based on the anchor announcements of the plurality of anchors; obtaining a second metric value representing the similarity of the broadcast IP sets between the two anchors based on the broadcast IP sets of the plurality of anchors; obtaining a third metric value representing the similarity of the broadcast device sets between the two anchors based on the broadcast device sets of the plurality of anchors;
S103, based on the broadcast IP sets and the broadcast device sets of the plurality of anchors, obtaining a positive sample set formed by anchors with the same live broadcast characteristics and a negative sample set formed by anchors with different live broadcast characteristics;
S104, obtaining a weight vector for the first metric value, the second metric value and the third metric value based on the first metric value, the second metric value, the third metric value, the positive sample set and the negative sample set;
S105, obtaining a target parameter value representing the weighted similarity between the two anchors based on the first metric value, the second metric value, the third metric value and the weight vector;
S106, identifying two anchors whose target parameter value is larger than a preset threshold value as target anchors.
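As a minimal sketch of steps S105 and S106, the snippet below combines the three metric values of each anchor pair with a weight vector and keeps the pairs above the threshold. It assumes the weighted similarity is a plain dot product of the weight vector and the metric values, which the text does not state explicitly; the function names, sample metric values, weights and threshold are all hypothetical.

```python
def weighted_similarity(metrics, weights):
    """Combine (first, second, third) metric values with a weight vector.

    Assumption: the combination is a dot product.
    """
    return sum(w * m for w, m in zip(weights, metrics))


def identify_target_pairs(pair_metrics, weights, threshold):
    """Return the anchor pairs whose weighted similarity exceeds the threshold."""
    return [
        pair
        for pair, metrics in pair_metrics.items()
        if weighted_similarity(metrics, weights) > threshold
    ]


# Hypothetical metric values for two anchor pairs:
# (announcement similarity, IP-set similarity, device-set similarity).
pair_metrics = {
    ("a1", "a2"): (0.9, 0.5, 1.0),
    ("a1", "a3"): (0.1, 0.0, 0.0),
}
weights = (0.2, 0.3, 0.5)  # hypothetical; in the method these come from S104
targets = identify_target_pairs(pair_metrics, weights, threshold=0.6)
```

In the method itself the weights are learned in S104 rather than chosen by hand, and the threshold is preset by the platform.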
It should be noted that the anchor in this embodiment is a network anchor, who may take part in planning, editing, recording, producing, audience interaction, hosting and similar work for an internet program or activity. The anchor may be a person participating in live broadcasting, or an electronic device participating in live interaction, such as an intelligent robot. For ease of management, an anchor can join a guild, which is a brokerage company that manages and operates anchors. Under the rules an anchor cannot join several guilds at the same time, but some anchors use an alternate account (that is, a live account of the anchor other than the main live account) to broadcast on behalf of other guilds.
It should also be noted that, in the embodiments of the present application, each distinct live account that goes live is treated as one anchor. Because alternate accounts exist, the anchors that broadcast from two different live accounts may therefore be the same person.
The method for identifying a target anchor provided by this embodiment can be applied to scenarios in which target anchors broadcasting under several guilds with alternate accounts need to be identified. The method may be performed by a device for identifying a target anchor, which may be implemented in software and/or hardware and is typically integrated in a terminal, such as a server of the live broadcast platform.
Referring to fig. 1, the method of the present embodiment is performed as follows:
First, S101 is executed to acquire the anchor announcements, broadcast IP sets and broadcast device sets of a plurality of anchors.
In a specific implementation, this step obtains the anchor announcement, the broadcast IP set and the broadcast device set of each of the anchors. The anchor announcement is the introduction to the live content that the anchor writes in the announcement column of the live room, which helps viewers quickly learn the anchor's live content or broadcast times; the IPs in the broadcast IP set are the network addresses the anchor has used when broadcasting; the devices in the broadcast device set are the devices the anchor has used when broadcasting. Specifically, the anchor announcement, the broadcast IP set and the broadcast device set can be obtained from historical live broadcast log data. Log data is the data generated by the interactive activities of the live room to be identified and recorded by the live broadcast platform, and which data is recorded can be configured as required. Historical log data is the trace left by actual use, so it is objective.
To obtain a first metric value, a second metric value and a third metric value that measure the similarity between anchors more accurately, this embodiment computes them from objective data generated during live broadcasting, namely the anchor announcement, the broadcast IP set and the broadcast device set. The reasoning is as follows:
The anchor announcement is selected as basic data because the live content of the same anchor is generally very similar, so the content shown in the announcements tends to be very close. If the same anchor uses different accounts to broadcast under different guilds, the anchor announcements of those accounts are therefore often very close, which makes the anchor announcement one of the core parameters for obtaining a more accurate metric value.
The broadcast IP is selected as basic data because the same anchor is likely to broadcast from the same network environment, so some IP addresses are shared. If the same anchor uses different accounts to broadcast under different guilds, the broadcast IPs of those accounts are likely to overlap at least partially, which makes the broadcast IP one of the core parameters for obtaining a more accurate metric value.
The broadcast device is selected as basic data because the same anchor is likely to broadcast from the same device. If different accounts broadcasting under different guilds use the same broadcast device, the similarity between those anchors is higher, which makes the broadcast device one of the core parameters for obtaining a more accurate metric value.
Next, executing S102, obtaining a first metric value representing the similarity of the anchor announcements between the two anchors based on the anchor announcements of the multiple anchors; obtaining a second metric value representing the similarity of the broadcasting IP sets between the two anchor broadcasters based on the broadcasting IP sets of the anchor broadcasters; and obtaining a third metric value representing the similarity of the broadcasting equipment sets between the two anchor broadcasters based on the broadcasting equipment sets of the anchor broadcasters.
In a specific implementation, after the anchor announcements, broadcast IP sets and broadcast device sets of the anchors have been acquired, the similarity between anchors must be measured before the weight vector can be obtained. This requires first obtaining a first metric value representing the similarity of the anchor announcements between two anchors, a second metric value representing the similarity of their broadcast IP sets, and a third metric value representing the similarity of their broadcast device sets.
For example, in this embodiment, the obtaining a first metric value representing the similarity of the anchor announcements between two anchors based on the anchor announcements of the plurality of anchors specifically includes:
the first metric value is obtained according to the following formula:
$$\mathrm{sim}_1(i, j) = \frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}$$
where $v_i$ and $v_j$ are the announcement vectors of anchors $i$ and $j$.
Specifically, the principle of the formula is as follows: an anchor's announcement is a feature in vector form, so the formula measures the two anchor announcements by cosine similarity. The numerator is the inner product of the two vectors and the denominator is the product of their moduli; the quotient is the cosine of the angle between the two vectors.
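The cosine similarity described above can be sketched in a few lines. This is an illustration only: the function name is hypothetical, and returning 0.0 for a zero-length vector is an assumption made here to avoid division by zero, not something the text specifies.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero announcement vector is treated as dissimilar.
        return 0.0
    return dot / (norm_u * norm_v)


# Parallel vectors give a value close to 1, orthogonal vectors give 0.
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```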
The obtaining a second metric value representing the similarity of the broadcast IP sets between the two anchors based on the broadcast IP sets of the plurality of anchors specifically includes:
the second metric value is obtained according to the following formula:
$$\mathrm{sim}_2(i, j) = \frac{|\mathrm{IP}_i \cap \mathrm{IP}_j|}{|\mathrm{IP}_i \cup \mathrm{IP}_j|}$$
where $\mathrm{IP}_i$ and $\mathrm{IP}_j$ are the broadcast IP sets of anchors $i$ and $j$.
Specifically, the principle of the formula is as follows: an anchor's broadcast IP set is a feature in set form, so the broadcast IP sets of the two anchors are measured by Jaccard similarity. The numerator is the size of the intersection of the two sets, that is, the number of IPs used by both anchors; the denominator is the size of the union of the two sets, that is, the number of IPs used by at least one of the anchors.
The obtaining a third metric value representing the similarity of the broadcast device sets between the two anchors based on the broadcast device sets of the plurality of anchors specifically includes:
the third metric value is obtained according to the following formula:
$$\mathrm{sim}_3(i, j) = \frac{|\mathrm{Dev}_i \cap \mathrm{Dev}_j|}{|\mathrm{Dev}_i \cup \mathrm{Dev}_j|}$$
where $\mathrm{Dev}_i$ and $\mathrm{Dev}_j$ are the broadcast device sets of anchors $i$ and $j$.
Specifically, the principle of the formula is as follows: an anchor's broadcast device set is a feature in set form, so the broadcast device sets of the two anchors are measured by Jaccard similarity. The numerator is the size of the intersection of the two sets, that is, the number of devices used by both anchors; the denominator is the size of the union of the two sets, that is, the number of devices used by at least one of the anchors.
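The Jaccard similarity used for both the broadcast IP sets and the broadcast device sets can be sketched as follows. The function name and the sample addresses are hypothetical, and returning 0.0 when both sets are empty is an assumption made for this illustration.

```python
def jaccard_similarity(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets carry no evidence of similarity.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical broadcast IP sets of two anchors.
ips_x = {"10.0.0.1", "10.0.0.2", "10.0.0.3"}
ips_y = {"10.0.0.2", "10.0.0.3", "10.0.0.4"}
second_metric = jaccard_similarity(ips_x, ips_y)  # 2 shared IPs out of 4 in total
```

The same function applies unchanged to device identifiers to obtain the third metric value.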
As noted above, computing the first metric value requires the announcement vector. As an optional embodiment, the obtaining a first metric value representing the similarity of the anchor announcements between two anchors based on the anchor announcements of the plurality of anchors specifically includes:
performing word segmentation on the anchor announcements of the two anchors to generate a word vector set;
obtaining announcement vectors of the two anchor announcements based on the word vector set;
and obtaining a first metric value representing the similarity of the anchor announcements between the two anchors based on the announcement vectors of the two anchor announcements.
In a specific implementation, the word vector set typically includes the word vectors generated from an anchor's announcement. Specifically, the word vectors can be generated with word-embedding techniques such as word2vec or GloVe.
Illustratively, the announcement vector may be obtained as follows:
$$v_s = \frac{1}{|s|}\sum_{w \in s}\frac{a}{a + p(w)}\,v_w$$
where $v_s$ represents the announcement vector of anchor announcement $s$; $|s|$ represents the number of words after word segmentation of anchor announcement $s$; $v_w$ represents the vector generated for word $w$; $a$ is a constant, taken as 0.0001; and $p(w)$ represents the word frequency with which word $w$ of anchor announcement $s$ occurs in all anchor announcements.
Specifically, the word frequency is calculated as $p(w) = n_w / \sum_{w' \in V} n_{w'}$, where $n_w$ is the number of occurrences of word $w$ in all anchor announcements, the denominator is the total number of times that all words of anchor announcement $s$ appear in all announcements, and $V$ is the set of words in anchor announcement $s$.
where:
where the normalization term is treated here as a constant: because the word vectors are distributed approximately uniformly over the whole vector space, the term is approximately the same for different s.
Taking the logarithm of the above formula:
If the vector representation of the announcement text s is reasonable, the value of the above formula should be as large as possible, so the representation sought is the one at which the formula takes its maximum value.
The optimal solution is solved as follows:
Carrying out a Taylor expansion:
The optimal solution is a weighted average of all word vectors in the sentence:
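As a non-limiting illustration of the weighted-average construction described above, the following Python sketch computes the word frequencies and an announcement vector. The weight a / (a + p(w)) is an assumption (the smooth-inverse-frequency form suggested by the constant a and the word frequency p(w)); the function names, the toy corpus and the two-dimensional word vectors are hypothetical, and word segmentation and word-vector training are taken as already done.

```python
from collections import Counter


def word_frequencies(announcement, all_announcements):
    """Return p(w) = n_w / N for each word w of `announcement`.

    n_w counts occurrences of w over all announcements; N sums n_w over the
    distinct words of `announcement`. The announcement is assumed to be part
    of `all_announcements`, so N is never zero.
    """
    counts = Counter(w for ann in all_announcements for w in ann)
    vocab = set(announcement)
    total = sum(counts[w] for w in vocab)
    return {w: counts[w] / total for w in vocab}


def announcement_vector(announcement, all_announcements, word_vectors, a=1e-4):
    """Weighted average of word vectors, with assumed weight a / (a + p(w))."""
    p = word_frequencies(announcement, all_announcements)
    dim = len(next(iter(word_vectors.values())))
    acc = [0.0] * dim
    for w in announcement:
        weight = a / (a + p[w])  # frequent words receive smaller weights
        for k, x in enumerate(word_vectors[w]):
            acc[k] += weight * x
    return [x / len(announcement) for x in acc]


# Hypothetical segmented announcements and toy word vectors.
corpus = [["rank", "solo"], ["rank", "climb"]]
vectors = {"rank": [1.0, 0.0], "solo": [0.0, 1.0], "climb": [1.0, 1.0]}
print(announcement_vector(corpus[0], corpus, vectors))
```

In this toy corpus the word "rank" occurs in both announcements, so it contributes less to the resulting vector than the rarer word "solo".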
Next, S103 is executed: based on the broadcasting IP sets and the broadcasting device sets of the plurality of anchors, a positive sample set composed of anchors having the same live broadcast characteristics and a negative sample set composed of anchors having different live broadcast characteristics are obtained.
In a specific implementation, if two anchor accounts are found to use the same broadcasting device, the probability that they belong to the same anchor is very high. On this basis, the set of anchor pairs that broadcast live from the same device within a period of time can be extracted as the same-anchor set and recorded as the positive sample set S; the set of anchor pairs that neither used the same device for live broadcasting nor shared a broadcasting IP within that period is extracted and recorded as the negative sample set D.
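As a non-limiting illustration, the extraction of the positive sample set S and the negative sample set D can be sketched in Python as follows. The dictionary layout, the function name and the sample identifiers are hypothetical; the sets are assumed to have been collected over the chosen time window.

```python
from itertools import combinations


def build_sample_sets(devices, ips):
    """Split anchor pairs into a positive set S and a negative set D.

    devices / ips map an anchor id to the set of device ids / IP addresses
    it broadcast from during the time window (hypothetical layout).
    """
    positive, negative = set(), set()
    for a, b in combinations(sorted(devices), 2):
        shared_device = bool(devices[a] & devices[b])
        shared_ip = bool(ips[a] & ips[b])
        if shared_device:
            positive.add((a, b))   # same device: very likely the same anchor
        elif not shared_ip:
            negative.add((a, b))   # neither a device nor an IP in common
        # pairs sharing only an IP are ambiguous and go into neither set
    return positive, negative


devices = {"A": {"d1"}, "B": {"d1", "d2"}, "C": {"d3"}, "D": {"d4"}}
ips = {"A": {"192.0.2.1"}, "B": {"192.0.2.2"}, "C": {"192.0.2.1"}, "D": {"192.0.2.9"}}
S, D = build_sample_sets(devices, ips)
print(sorted(S), sorted(D))
```

The sketch enumerates all pairs, which is quadratic in the number of anchors; a production system would more likely group anchors by device or IP first.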
Next, S104 is performed, and based on the first metric value, the second metric value, the third metric value, the positive sample set and the negative sample set, a weight vector for the first metric value, the second metric value and the third metric value is obtained.
In a specific implementation, the first metric value, the second metric value and the third metric value can each measure the similarity between two anchors, but without weighting the overall measurement accuracy cannot be improved. To identify the target anchor accurately in the subsequent steps, a weight vector for the first, second and third metric values must first be obtained.
For example, based on the first metric value, the second metric value, the third metric value, the positive sample set, and the negative sample set, the following formula may be constructed:
where the formula involves the weight vector, a weight adjustment coefficient, and the variables i and j, each of which represents an anchor in the positive sample set S or the negative sample set D;
The above formula defines an optimization problem whose goal is to solve for the optimal weight vector that minimizes F. The optimization principle is as follows: for i and j in set S, the value measured in this way should be as small as possible, whereas for i and j in set D, the value should be as large as possible. Thus, the positive and negative sample sets can be used to solve for the weight vector that minimizes F.
Specifically, using the positive sample set and the negative sample set to solve for the weight vector that minimizes F includes:
Randomly selecting subsets of a fixed size from S and D, and iteratively obtaining the weight vector that minimizes F according to the following formula:
where the formula involves the weight value at the t-th iteration; the learning rate, which can be neither too large nor too small (too small a learning rate makes the iteration slow, while too large a learning rate causes oscillation during optimization and makes the optimal solution hard to find) and generally takes the value 0.01; the subset of the positive sample set S selected in the t-th iteration; and the subset of the negative sample set D selected in the t-th iteration.
The principle of the above solution is as follows: to find the weight vector at which F attains its minimum, the gradient of F with respect to the weight vector is first calculated. The direction opposite to the gradient is the direction in which the value of the function F decreases most rapidly, so each round of iteration moves in that direction. The learning rate controls the speed of descent, and the extreme point is found after a number of iterations.
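As a non-limiting illustration of the iterative solution, the following Python sketch runs mini-batch gradient descent with a learning rate of 0.01. The objective used here is a hypothetical stand-in, not the formula of this embodiment: it assumes each anchor pair is described by three dissimilarity components (for example, one minus each metric value) and minimizes F(w) = mean over S of w·x, minus mean over D of w·x, plus reg·‖w‖². The names `learn_weights`, `reg` and the sample data are likewise assumptions.

```python
import random


def learn_weights(pos, neg, batch_size=2, lr=0.01, reg=0.5, steps=2000, seed=0):
    """Mini-batch gradient descent on an assumed objective.

    pos / neg: lists of 3-component vectors, one per pair in S / D.
    Assumed objective: F(w) = mean_S(w.x) - mean_D(w.x) + reg * ||w||^2
    """
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        # Randomly select fixed-size subsets of S and D for this iteration.
        s_t = rng.sample(pos, min(batch_size, len(pos)))
        d_t = rng.sample(neg, min(batch_size, len(neg)))
        grad = [
            sum(x[k] for x in s_t) / len(s_t)
            - sum(x[k] for x in d_t) / len(d_t)
            + 2.0 * reg * w[k]
            for k in range(3)
        ]
        # Step against the gradient; the learning rate sets the step size.
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


pos = [[0.1, 0.2, 0.0], [0.3, 0.0, 0.2]]   # pairs from S: small dissimilarity
neg = [[0.9, 0.8, 1.0], [0.7, 1.0, 0.8]]   # pairs from D: large dissimilarity
print(learn_weights(pos, neg))
```

With the subsets covering all of S and D, as in this toy run, the iteration approaches the closed-form minimizer of the assumed objective, (mean over D of x minus mean over S of x) / (2·reg).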
In addition, this embodiment obtains the weight vector by weight learning, which has the following advantage. If the weights were instead defined from business experience, the biggest problem would be the inability to respond quickly to change: when the importance of some factor changes, its weight cannot be adjusted immediately. For example, suppose alternate-account anchors suddenly change their behavior pattern and stop using the same device to broadcast in different guilds. Because the weight of the broadcasting device was fixed in advance, and the change is neither detected nor adjusted for in time, the similarity between two accounts in different guilds (which actually belong to the same anchor) decreases, causing missed identifications. Weight learning solves this problem well.
Next, S105 is executed, and a target parameter value representing the weighted similarity between the two anchors is obtained based on the first metric value, the second metric value, the third metric value and the weight vector.
In a specific implementation, the first metric value, the second metric value and the third metric value can be weighted by the weight values in the weight vector to obtain the target parameter value representing the weighted similarity between the two anchors.
Specifically, the target parameter value may be obtained by the following formula:
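As a non-limiting illustration, the weighting step can be sketched in Python under the assumption that the target parameter value is a plain weighted sum of the three metric values. The function names and the numeric values are hypothetical and do not reproduce the figures of this embodiment.

```python
def target_parameter(metrics, weights):
    """Weighted combination of the metric values (assumed: a weighted sum)."""
    if len(metrics) != len(weights):
        raise ValueError("metrics and weights must have the same length")
    return sum(m * w for m, w in zip(metrics, weights))


def is_target_pair(metrics, weights, threshold):
    """True when the weighted similarity exceeds the preset threshold."""
    return target_parameter(metrics, weights) > threshold


# Hypothetical metric values (announcement, IP set, device set) and weights.
metrics = (0.8, 0.5, 1.0)
weights = (0.5, 0.2, 0.3)
print(target_parameter(metrics, weights), is_target_pair(metrics, weights, 0.6))
```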
Next, S106 is executed to identify two anchors whose target parameter value is greater than a preset threshold as target anchors.
In a specific implementation, the preset threshold may be set as needed. The determination method provided in this embodiment is: calculate the average association probability of similar anchors in set S, sort the association probabilities from large to small, and take the 99% quantile point as the threshold. To improve coverage, the preset threshold can be lowered; to improve accuracy, the preset threshold can be raised.
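As a non-limiting illustration of the quantile-based threshold, the following Python sketch sorts the scores of known same-anchor pairs from large to small and returns the score at the chosen position. It reads the 99% quantile point as the value that 99% of the scores in S reach or exceed; that reading, the function name and the `coverage` parameter are assumptions.

```python
import math


def quantile_threshold(scores, coverage=0.99):
    """Return the score at the `coverage` position of the descending order,
    so that about that fraction of the scores is >= the returned value."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores, reverse=True)
    index = math.ceil(coverage * len(ordered)) - 1
    index = min(len(ordered) - 1, max(0, index))
    return ordered[index]


# Hypothetical weighted-similarity scores of pairs in the positive set S.
scores = [0.9, 0.8, 0.7, 0.6]
print(quantile_threshold(scores, coverage=0.75))
```

Under this reading, a lower coverage yields a higher threshold, which trades coverage for accuracy in the way the paragraph above describes.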
If the target parameter value between two anchors is greater than the preset threshold and the two belong to different guilds, they can be regarded as the same anchor and identified as target anchors. To avoid occupation of live broadcast network resources, a target anchor can be restricted; specifically, an alternate-account anchor can be required to pass real-person verification with a handheld identity card, and its live room can be banned if it fails the verification.
Therefore, after steps S101-S106 are performed, the method further includes:
restricting the target anchor, e.g., restricting its activity behavior. This reduces the occupation of network resources and ensures smooth live broadcasting in the live rooms of the live-streaming platform.
The following describes the implementation process of the method of this embodiment by using a practical example:
Assume that anchor A has the anchor announcement "ranking of the peak of the canyon", together with a broadcasting device set and a broadcasting IP set, and that anchor B has the anchor announcement "the queen ranking and single-row upper division", together with its own broadcasting device set and broadcasting IP set.
Segmenting the anchor announcement, and generating a word vector:
calculating the word frequency of each word:
and calculating the announcement text representation according to the word vector and the word frequency:
thus:
With the threshold set to 0.6, since 0.737 > 0.6, A and B are determined to be target anchors belonging to the same anchor, and A or B is an alternate account.
The advantages of the embodiments are illustrated below by a business application scenario example:
If an anchor no longer appeals, or fails the handheld identity card verification, the anchor is confirmed to be an alternate-account anchor, i.e., the identification is correct. The associated-device method correctly identifies 350 alternate-account anchors; the method of this embodiment covers all of those identifications and additionally identifies 200 anchors, increasing the number identified by 57%.
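The stated improvement follows directly from the two counts, as the short Python check below shows; the variable names are illustrative only.

```python
baseline = 350     # anchors identified by the associated-device method
additional = 200   # anchors identified only by the method of this embodiment
increase = additional / baseline
print(f"{increase:.0%}")  # prints 57%
```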
The technical scheme in the embodiment of the application at least has the following technical effects or advantages:
The method of this embodiment first obtains, according to the anchor announcements of the anchors, a first metric value representing the similarity of the anchor announcements between two anchors; obtains, according to the broadcasting IP sets of the anchors, a second metric value representing the similarity of the broadcasting IP sets between the two anchors; obtains, according to the broadcasting device sets of the anchors, a third metric value representing the similarity of the broadcasting device sets between the two anchors; and obtains, according to the broadcasting IP sets and the broadcasting device sets of the anchors, a positive sample set composed of anchors with the same live broadcast characteristics and a negative sample set composed of anchors with different live broadcast characteristics. Next, a weight vector for the first metric value, the second metric value and the third metric value is obtained based on the first metric value, the second metric value, the third metric value, the positive sample set and the negative sample set; a target parameter value representing the weighted similarity between the two anchors is obtained based on the first metric value, the second metric value, the third metric value and the weight vector; and two anchors whose target parameter value is greater than a preset threshold are identified as target anchors. Because the three metric values are weighted by a weight vector obtained from the positive sample set and the negative sample set, the resulting target parameter value represents the similarity between anchors more accurately.
Example two
Based on the same inventive concept as the first embodiment, this embodiment provides a system for identifying a target anchor. Referring to fig. 2, the system includes:
a data acquisition module, configured to acquire the anchor announcements, the broadcasting IP sets and the broadcasting device sets of a plurality of anchors;
a first obtaining module, configured to obtain, based on the anchor announcements of the plurality of anchors, a first metric value representing the similarity of the anchor announcements between two anchors; obtain, based on the broadcasting IP sets of the plurality of anchors, a second metric value representing the similarity of the broadcasting IP sets between the two anchors; and obtain, based on the broadcasting device sets of the plurality of anchors, a third metric value representing the similarity of the broadcasting device sets between the two anchors;
a second obtaining module, configured to obtain, based on the broadcasting IP sets and the broadcasting device sets of the plurality of anchors, a positive sample set composed of anchors with the same live broadcast characteristics and a negative sample set composed of anchors with different live broadcast characteristics;
a third obtaining module, configured to obtain a weight vector for the first, second and third metric values based on the first metric value, the second metric value, the third metric value, the positive sample set and the negative sample set;
a fourth obtaining module, configured to obtain a target parameter value representing the weighted similarity between the two anchors, based on the first metric value, the second metric value, the third metric value and the weight vector;
and a target identification module, configured to identify two anchors whose target parameter value is greater than a preset threshold as target anchors.
Since the system for identifying a target anchor described in this embodiment is the system used to implement the method for identifying a target anchor of the first embodiment of the present application, those skilled in the art can understand the specific implementation of the system and its variations from the method described in the first embodiment; how the system implements that method is therefore not described in detail here. Any system used by a person skilled in the art to implement the method for identifying a target anchor of the embodiments of the present application falls within the scope of protection of the present application.
Based on the same inventive concept as the foregoing embodiments, an embodiment of the present invention further provides a readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the method of the first embodiment.
Based on the same inventive concept as in the foregoing embodiments, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the method according to the first embodiment are implemented.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A method of identifying a target anchor, the method comprising:
acquiring anchor announcements, broadcasting IP sets and broadcasting device sets of a plurality of anchors;
obtaining, based on the anchor announcements of the plurality of anchors, a first metric value representing the similarity of the anchor announcements between two anchors; obtaining, based on the broadcasting IP sets of the plurality of anchors, a second metric value representing the similarity of the broadcasting IP sets between the two anchors; obtaining, based on the broadcasting device sets of the plurality of anchors, a third metric value representing the similarity of the broadcasting device sets between the two anchors;
obtaining, based on the broadcasting IP sets and the broadcasting device sets of the plurality of anchors, a positive sample set composed of anchors with the same live broadcast characteristics and a negative sample set composed of anchors with different live broadcast characteristics;
obtaining a weight vector for the first, second and third metric values based on the first metric value, the second metric value, the third metric value, the positive sample set and the negative sample set;
obtaining a target parameter value representing the weighted similarity between the two anchors based on the first metric value, the second metric value, the third metric value and the weight vector;
and identifying two anchors whose target parameter value is greater than a preset threshold as target anchors.
2. The method of claim 1, wherein said obtaining a first metric value representing the similarity of the anchor announcements between two anchors based on the anchor announcements of the plurality of anchors specifically comprises:
performing word segmentation on the anchor announcements of the two anchors to generate a word vector set;
obtaining announcement vectors of the two anchor announcements based on the word vector set;
and obtaining a first metric value representing the similarity of the anchor announcements between the two anchors based on the announcement vectors of the two anchor announcements.
3. The method of claim 2, wherein said obtaining announcement vectors of the two anchor announcements based on the word vector set comprises:
obtaining the announcement vector according to the following formula:
$$v_s = \frac{1}{|s|}\sum_{w \in s}\frac{a}{a + p(w)}\,v_w$$
where $v_s$ denotes the announcement vector of anchor announcement $s$; $|s|$ denotes the number of words in anchor announcement $s$ after word segmentation; $v_w$ denotes the vector generated for word $w$; $a$ is a constant; and $p(w)$ denotes the frequency with which word $w$ of anchor announcement $s$ occurs in all anchor announcements.
4. The method of claim 2, wherein said obtaining a first metric value representing the similarity of the anchor announcements between two anchors based on the anchor announcements of the plurality of anchors specifically comprises:
obtaining the first metric value according to the following formula:
said obtaining a second metric value representing the similarity of the broadcasting IP sets between the two anchors based on the broadcasting IP sets of the plurality of anchors specifically comprises:
obtaining the second metric value according to the following formula:
said obtaining a third metric value representing the similarity of the broadcasting device sets between the two anchors based on the broadcasting device sets of the plurality of anchors specifically comprises:
obtaining the third metric value according to the following formula:
5. The method of claim 4, wherein obtaining a weight vector for the first, second, and third metric values based on the first, second, third metric values, a set of positive samples, and a set of negative samples comprises:
based on the first metric value, the second metric value, the third metric value, the positive sample set and the negative sample set, the following formula is constructed:
where the formula involves the weight vector, a weight adjustment coefficient, and the variables i and j, each of which represents an anchor in the positive sample set S or the negative sample set D;
6. The method of claim 1, wherein using the positive sample set and the negative sample set to solve for the weight vector that minimizes the value of the formula specifically comprises:
7. The method of claim 1, wherein after identifying as target anchors the two anchors whose target parameter value is greater than a preset threshold, the method further comprises:
restricting the target anchors.
8. A system for identifying a target anchor, the system comprising:
a data acquisition module, configured to acquire the anchor announcements, the broadcasting IP sets and the broadcasting device sets of a plurality of anchors;
a first obtaining module, configured to obtain, based on the anchor announcements of the plurality of anchors, a first metric value representing the similarity of the anchor announcements between two anchors; obtain, based on the broadcasting IP sets of the plurality of anchors, a second metric value representing the similarity of the broadcasting IP sets between the two anchors; and obtain, based on the broadcasting device sets of the plurality of anchors, a third metric value representing the similarity of the broadcasting device sets between the two anchors;
a second obtaining module, configured to obtain, based on the broadcasting IP sets and the broadcasting device sets of the plurality of anchors, a positive sample set composed of anchors with the same live broadcast characteristics and a negative sample set composed of anchors with different live broadcast characteristics;
a third obtaining module, configured to obtain a weight vector for the first, second and third metric values based on the first metric value, the second metric value, the third metric value, the positive sample set and the negative sample set;
a fourth obtaining module, configured to obtain a target parameter value representing the weighted similarity between the two anchors, based on the first metric value, the second metric value, the third metric value and the weight vector;
and a target identification module, configured to identify two anchors whose target parameter value is greater than a preset threshold as target anchors.
9. A readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1-7 are implemented when the program is executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010584332.4A CN111488491B (en) | 2020-06-24 | 2020-06-24 | Method, system, medium and equipment for identifying target anchor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111488491A (en) | 2020-08-04 |
CN111488491B (en) | 2020-10-16 |
Family
ID=71813528
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010584332.4A Active CN111488491B (en) | 2020-06-24 | 2020-06-24 | Method, system, medium and equipment for identifying target anchor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111488491B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180144738A1 (en) * | 2016-11-23 | 2018-05-24 | IPsoft Incorporated | Selecting output from candidate utterances in conversational interfaces for a virtual agent based upon a priority factor |
CN108390883A (en) * | 2018-02-28 | 2018-08-10 | 武汉斗鱼网络科技有限公司 | Recognition methods, device and the terminal device of brush popularity user |
CN108419126A (en) * | 2018-01-23 | 2018-08-17 | 广州虎牙信息科技有限公司 | Abnormal main broadcaster's recognition methods, storage medium and the terminal of platform is broadcast live |
CN109086813A (en) * | 2018-07-23 | 2018-12-25 | 广州虎牙信息科技有限公司 | Determination method, apparatus, equipment and the storage medium of main broadcaster's similarity |
CN109151518A (en) * | 2018-08-06 | 2019-01-04 | 武汉斗鱼网络科技有限公司 | A kind of recognition methods, device and the electronic equipment of stolen account |
Non-Patent Citations (1)
Title |
---|
郭淑慧等: "网络直播平台数据挖掘与行为分析综述", 《物理学报》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113965772A (en) * | 2021-10-29 | 2022-01-21 | 北京百度网讯科技有限公司 | Live video processing method and device, electronic equipment and storage medium |
CN113965772B (en) * | 2021-10-29 | 2024-05-10 | 北京百度网讯科技有限公司 | Live video processing method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111488491B (en) | 2020-10-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20200804; Assignee: Yidu Lehuo Network Technology Co.,Ltd.; Assignor: WUHAN DOUYU YULE NETWORK TECHNOLOGY Co.,Ltd.; Contract record no.: X2023980041383; Denomination of invention: A method, system, medium, and device for identifying target anchors; Granted publication date: 20201016; License type: Common License; Record date: 20230908 |