CN115209188A - Detection method, device, server and storage medium for simultaneous live broadcast of multiple accounts - Google Patents

Info

Publication number
CN115209188A
CN115209188A (application number CN202211091768.5A)
Authority
CN
China
Prior art keywords
segment
live broadcast
voice text
text segment
matching information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211091768.5A
Other languages
Chinese (zh)
Other versions
CN115209188B (en)
Inventor
易澄
雷刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202211091768.5A priority Critical patent/CN115209188B/en
Publication of CN115209188A publication Critical patent/CN115209188A/en
Application granted granted Critical
Publication of CN115209188B publication Critical patent/CN115209188B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/24: Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
    • H04N 21/2407: Monitoring of transmitted content, e.g. distribution time, number of downloads
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/21: Server components or server architectures
    • H04N 21/218: Source of audio or video content, e.g. local disk arrays
    • H04N 21/2187: Live feed
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/233: Processing of audio elementary streams

Abstract

The disclosure relates to a method, a device, a server and a storage medium for detecting simultaneous live broadcast of multiple accounts. The method includes: performing speech recognition on a plurality of voice segments of a first live broadcast room and a plurality of voice segments of a second live broadcast room, respectively, to obtain a plurality of voice text segments for each room; obtaining a plurality of voice text segment pairs based on the voice text segments of the first live broadcast room and those of the second live broadcast room; obtaining a voice text segment pair sequence, and sequence matching information of that sequence, based on the segment matching information of each voice text segment pair and the plurality of voice text segment pairs; and, in a case that the sequence matching information meets a preset sequence matching condition, determining that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast of the same content. The method achieves accurate detection of simultaneous live broadcast by multiple accounts at relatively low resource consumption.

Description

Detection method, device, server and storage medium for simultaneous live broadcast of multiple accounts
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a method, an apparatus, a server, a storage medium, and a program product for detecting simultaneous live broadcast of multiple accounts.
Background
With the growth of the new-media live broadcasting industry, behaviors that disturb the ecology of the industry have also appeared, for example, broadcasting the same content live through multiple accounts simultaneously. This includes a single anchor streaming through several accounts at once, synchronously rebroadcasting another person's live stream by technical means, and any other way of broadcasting identical content at the same time. "The same live content" is not limited to cases where video and audio are completely identical; it also covers, for example, the same anchor (or multiple anchors) or the same scene being shot simultaneously from different angles and different sides. Because simultaneous live broadcast by multiple accounts encroaches on public-domain traffic, detecting this behavior is very important.
Currently, detection of the simultaneous live broadcast behavior of multiple accounts generally captures key frames from the live streams and uses image matching to judge whether two live broadcast rooms match visually. However, this approach consumes substantial hardware resources.
Disclosure of Invention
The disclosure provides a method, a device, a server, a storage medium and a program product for detecting simultaneous live broadcast of multiple accounts, so as to at least solve the problem in the related art that substantial hardware resources must be consumed. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, a method for detecting simultaneous live broadcast of multiple accounts is provided, including:
respectively carrying out voice recognition processing on a plurality of voice segments of a first live broadcast room and a plurality of voice segments of a second live broadcast room to obtain a plurality of voice text segments of the first live broadcast room and a plurality of voice text segments of the second live broadcast room;
obtaining a plurality of voice text segment pairs based on the plurality of voice text segments of the first live broadcast room and the plurality of voice text segments of the second live broadcast room; each voice text segment pair comprises a voice text segment of the first live broadcast room and a voice text segment of the second live broadcast room;
acquiring segment matching information of each voice text segment pair, and obtaining a voice text segment pair sequence and sequence matching information of the voice text segment pair sequence based on the segment matching information and the plurality of voice text segment pairs;
and under the condition that the sequence matching information meets a preset sequence matching condition, determining that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast aiming at the same content.
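As a high-level illustration only, the four steps above can be sketched as follows. The similarity measure (Python's `difflib`), the position-wise pairing strategy, and both threshold values are assumptions made for the sketch; the disclosure does not prescribe them:

```python
from difflib import SequenceMatcher

def segment_similarity(text_a: str, text_b: str) -> float:
    """Segment matching information: a text-similarity score in [0, 1]."""
    return SequenceMatcher(None, text_a, text_b).ratio()

def detect_simultaneous_live(texts_a, texts_b,
                             pair_threshold=0.8, count_threshold=3):
    """Pair the two rooms' transcripts position by position and flag the
    rooms when enough pairs match -- a deliberately simplified stand-in
    for the sequence-construction steps of the disclosed method."""
    pairs = list(zip(texts_a, texts_b))              # voice text segment pairs
    matched = [p for p in pairs if segment_similarity(*p) >= pair_threshold]
    return len(matched) > count_threshold            # sequence matching condition

room_a = ["welcome everyone", "today we cook noodles", "add some salt",
          "stir for two minutes", "taste and serve"]
room_b = ["welcome everyone", "today we cook noodles", "add some salt",
          "stir for two minutes", "totally different chat"]
print(detect_simultaneous_live(room_a, room_b))  # → True (4 matched pairs > 3)
```

The later embodiments replace the naive position-wise pairing with a timestamp-aware traversal and a dynamic-programming alignment.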
In an exemplary embodiment, each speech text segment has a corresponding timestamp; obtaining a speech text segment pair sequence based on the segment matching information and the plurality of speech text segment pairs, including:
determining a current voice text segment pair from the plurality of voice text segment pairs according to the time stamp;
acquiring segment matching information of the current voice text segment pair, and determining a next voice text segment pair of the current voice text segment pair from the plurality of voice text segment pairs by adopting a corresponding segment pair determination mode based on a comparison result between the segment matching information of the current voice text segment pair and a preset segment matching condition;
taking the next voice text segment pair as a new current voice text segment pair, and returning to the step of acquiring segment matching information of the current voice text segment pair until all voice text segments in the first live broadcast room and all voice text segments in the second live broadcast room are traversed;
and obtaining the voice text segment pair sequence based on each voice text segment pair of which the determined segment matching information meets the segment matching condition.
In an exemplary embodiment, the determining, based on a comparison result between the segment matching information of the current speech text segment pair and a preset segment matching condition, a next speech text segment pair of the current speech text segment pair from the plurality of speech text segment pairs in a corresponding segment pair determining manner includes:
determining current sequence matching information;
if the comparison result is that the segment matching information of the current voice text segment pair meets the preset segment matching condition, updating the current sequence matching information according to the segment matching information of the current voice text segment pair to obtain updated sequence matching information;
and in a case that the updated sequence matching information does not meet the sequence matching condition, determining the next voice text segment pair of the current voice text segment pair from the plurality of voice text segment pairs by updating both voice text segments.
In an exemplary embodiment, the determining, based on a comparison result between the segment matching information of the current speech text segment pair and a preset segment matching condition, a next speech text segment pair of the current speech text segment pair from the plurality of speech text segment pairs in a corresponding segment pair determining manner further includes:
if the comparison result is that the segment matching information of the current voice text segment pair does not meet the preset segment matching condition, comparing the timestamps of the two voice text segments included in the current voice text segment pair;
and determining the next voice text segment pair of the current voice text segment pair from the plurality of voice text segment pairs by updating the voice text segment with the earlier time stamp.
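The traversal described in the two embodiments above amounts to a two-pointer walk over the timestamp-ordered segments: when the current pair matches, both pointers advance; otherwise only the segment with the earlier timestamp is replaced. A minimal sketch, with an assumed `difflib`-based similarity and an assumed threshold:

```python
from difflib import SequenceMatcher

def align_segments(segs_a, segs_b, match_threshold=0.8, sim=None):
    """Greedy two-pointer walk over (timestamp, text) segments sorted by
    timestamp.  On a match both pointers advance (both segments are
    updated); on a mismatch only the earlier-timestamped side advances."""
    sim = sim or (lambda a, b: SequenceMatcher(None, a, b).ratio())
    i = j = 0
    matched_pairs = []
    while i < len(segs_a) and j < len(segs_b):
        (ts_a, text_a), (ts_b, text_b) = segs_a[i], segs_b[j]
        if sim(text_a, text_b) >= match_threshold:
            matched_pairs.append((i, j))   # pair joins the segment pair sequence
            i += 1
            j += 1
        elif ts_a <= ts_b:                 # advance the earlier-timestamped side
            i += 1
        else:
            j += 1
    return matched_pairs

segs_a = [(0, "hello"), (5, "filler chat"), (10, "cooking pasta")]
segs_b = [(1, "hello"), (11, "cooking pasta")]
print(align_segments(segs_a, segs_b))  # → [(0, 0), (2, 1)]
```

The walk tolerates insertions on either side (here, the unmatched "filler chat"), which is why the two rooms' segment counts need not be equal.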
In an exemplary embodiment, each speech text segment has a corresponding timestamp; before obtaining a speech text segment pair sequence based on the segment matching information and the plurality of speech text segment pairs, the method further includes:
sorting the plurality of voice text segments of the first live broadcast room and the plurality of voice text segments of the second live broadcast room respectively according to the timestamps to obtain a first voice text segment sequence of the first live broadcast room and a second voice text segment sequence of the second live broadcast room;
establishing a two-dimensional corresponding matrix between the first voice text segment sequence and the second voice text segment sequence; each element in the two-dimensional corresponding matrix corresponds to a voice text segment pair;
obtaining a speech text segment pair sequence based on the segment matching information and the plurality of speech text segment pairs, further comprising:
acquiring a state transition relational expression; the state transition relational expression is used for determining accumulated matching information corresponding to each element, the accumulated matching information of each element is determined based on the accumulated matching information corresponding to the associated element of each element and the segment matching information corresponding to each element, and the timestamp of the voice text segment pair corresponding to the associated element is earlier than the timestamp of the voice text segment pair corresponding to the element;
and determining a plurality of target elements from each element of the two-dimensional corresponding matrix based on the segment matching information and the state transition relational expression, and taking a sequence formed by voice text segment pairs corresponding to the target elements as the voice text segment pair sequence.
In an exemplary embodiment, the determining, based on the segment matching information and the state transition relation, a plurality of target elements from each element of the two-dimensional correspondence matrix includes:
obtaining accumulated matching information corresponding to each element in the two-dimensional corresponding matrix based on the segment matching information and the state transition relational expression;
determining a starting element of the voice text segment pair sequence, and determining a target element corresponding to the starting element from the associated elements of the starting element based on the accumulated matching information of the associated elements of the starting element and the segment matching information of the voice text segment pair corresponding to the starting element;
and taking the target element corresponding to the initial element as a new initial element, returning to the step of determining the target element corresponding to the initial element from the associated elements of the initial element based on the cumulative matching information of the associated elements of the initial element and the segment matching information of the voice text segment pair corresponding to the initial element until the determined new initial element is the element with the earliest time stamp in the two-dimensional corresponding matrix, and determining each obtained target element as the plurality of target elements.
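The state transition relation and the backtracking over the two-dimensional corresponding matrix described above resemble a classic dynamic program. The sketch below uses matched-pair count as the cumulative matching information and a `difflib` similarity as the segment matching information; both concrete choices are assumptions, since the disclosure leaves the metrics open:

```python
from difflib import SequenceMatcher

def best_pair_sequence(texts_a, texts_b, match_threshold=0.8):
    """Dynamic programming over the two-dimensional corresponding matrix.
    cum[i][j] holds cumulative matching information: the best matched-pair
    count using the first i segments of room A and the first j of room B.
    Backtracking then recovers the target elements (the pair sequence)."""
    n, m = len(texts_a), len(texts_b)
    score = [[SequenceMatcher(None, a, b).ratio() for b in texts_b]
             for a in texts_a]
    cum = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            hit = 1 if score[i - 1][j - 1] >= match_threshold else 0
            # state transition: extend from an earlier (associated) element
            cum[i][j] = max(cum[i - 1][j], cum[i][j - 1],
                            cum[i - 1][j - 1] + hit)
    # backtrack from the last element toward the earliest one
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        hit = 1 if score[i - 1][j - 1] >= match_threshold else 0
        if hit and cum[i][j] == cum[i - 1][j - 1] + hit:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif cum[i][j] == cum[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1], cum[n][m]
```

Compared with the greedy two-pointer walk, the dynamic program is guaranteed to find the best-scoring pair sequence, at the cost of filling the full matrix.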
In an exemplary embodiment, the sequence matching information includes average difference information of each pair of speech text segments in the sequence of pairs of speech text segments and the number of pairs of speech text segments matched in each pair of speech text segments; the method further comprises the following steps:
and under the condition that the average difference information of all the voice text segment pairs is smaller than a first threshold value and the number of the matched voice text segment pairs is larger than a second threshold value, determining that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast aiming at the same content.
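The final decision rule of this embodiment can be expressed directly; the threshold values below are illustrative placeholders, not values from the disclosure:

```python
def meets_sequence_condition(pair_diffs, matched_count,
                             diff_threshold=0.2, count_threshold=5):
    """Sequence matching condition: the pairs' average difference must be
    below a first threshold AND the matched-pair count must exceed a
    second threshold.  Both threshold values are assumed for the sketch."""
    if not pair_diffs:
        return False
    avg_diff = sum(pair_diffs) / len(pair_diffs)
    return avg_diff < diff_threshold and matched_count > count_threshold
```

Requiring both conditions guards against two failure modes: a few near-identical pairs in otherwise unrelated rooms (caught by the count threshold), and many weakly similar pairs (caught by the average-difference threshold).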
In an exemplary embodiment, the obtaining of the segment matching information of each speech text segment pair includes:
respectively acquiring segment features of multiple dimensions of the two voice text segments in each voice text segment pair;
determining matching information of the two voice text segments in each dimension based on the segment features of the plurality of dimensions;
and obtaining the segment matching information of the voice text segment pair based on the matching information under each dimension.
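The disclosure does not enumerate the feature dimensions, so the sketch below assumes three simple ones (character similarity, token overlap, relative length) and averages their per-dimension matching information into one segment-level score:

```python
from difflib import SequenceMatcher

def segment_matching_info(text_a: str, text_b: str) -> float:
    """Combine matching information from several illustrative feature
    dimensions into one segment-level score in [0, 1].  The dimensions
    and the equal-weight average are assumptions, not from the patent."""
    char_sim = SequenceMatcher(None, text_a, text_b).ratio()
    tok_a, tok_b = set(text_a.split()), set(text_b.split())
    token_sim = len(tok_a & tok_b) / len(tok_a | tok_b) if tok_a | tok_b else 1.0
    len_sim = min(len(text_a), len(text_b)) / max(len(text_a), len(text_b), 1)
    return (char_sim + token_sim + len_sim) / 3
```

Using several dimensions makes the score robust to ASR noise: a dropped word hurts token overlap but barely moves character similarity or length.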
In an exemplary embodiment, the method further comprises:
acquiring the number of the live broadcast rooms to be detected under the condition that a plurality of live broadcast rooms to be detected exist;
determining the number of the calculation nodes based on the number of the live broadcast rooms to be detected;
and evenly distributing the live broadcast rooms to be detected among the computing nodes for simultaneous live broadcast detection.
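One possible reading of this partitioning step, with an assumed per-node capacity and a round-robin assignment so every node receives an almost equal share:

```python
def partition_rooms(room_ids, rooms_per_node=100):
    """Derive the compute-node count from the number of rooms to be
    detected (rooms_per_node is an assumed capacity), then deal the
    rooms out round-robin so the buckets differ in size by at most 1."""
    n_nodes = max(1, -(-len(room_ids) // rooms_per_node))   # ceiling division
    buckets = [[] for _ in range(n_nodes)]
    for idx, room in enumerate(room_ids):
        buckets[idx % n_nodes].append(room)
    return buckets
```

Balanced buckets matter here because each node compares its rooms pairwise, so an oversized bucket would dominate the wall-clock time of the whole detection pass.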
According to a second aspect of the embodiments of the present disclosure, there is provided a detection apparatus for simultaneous live broadcast of multiple accounts, including:
the voice recognition unit is configured to perform voice recognition processing on a plurality of voice fragments of a first live broadcast room and a plurality of voice fragments of a second live broadcast room respectively to obtain a plurality of voice text fragments of the first live broadcast room and a plurality of voice text fragments of the second live broadcast room;
a segment pair obtaining unit configured to obtain a plurality of voice text segment pairs based on the plurality of voice text segments of the first live broadcast room and the plurality of voice text segments of the second live broadcast room; each voice text segment pair comprises a voice text segment of the first live broadcast room and a voice text segment of the second live broadcast room;
the sequence determining unit is configured to execute the steps of obtaining segment matching information of each voice text segment pair, and obtaining a voice text segment pair sequence and sequence matching information of the voice text segment pair sequence based on the segment matching information and the voice text segment pairs;
the detection unit is configured to execute the step of determining that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast aiming at the same content under the condition that the sequence matching information meets a preset sequence matching condition.
In an exemplary embodiment, each speech text segment has a corresponding timestamp; the sequence determination unit is further configured to determine a current speech text segment pair from the plurality of speech text segment pairs according to the time stamp; acquiring segment matching information of the current voice text segment pair, and determining a next voice text segment pair of the current voice text segment pair from the plurality of voice text segment pairs by adopting a corresponding segment pair determining mode based on a comparison result between the segment matching information of the current voice text segment pair and a preset segment matching condition; taking the next voice text segment pair as a new current voice text segment pair, and returning to the step of acquiring segment matching information of the current voice text segment pair until all voice text segments in the first live broadcast room and all voice text segments in the second live broadcast room are traversed; and obtaining the voice text segment pair sequence based on each voice text segment pair of which the determined segment matching information meets the segment matching conditions.
In an exemplary embodiment, the sequence determining unit further includes a segment pair determining subunit configured to determine current sequence matching information; if the comparison result is that the segment matching information of the current voice text segment pair meets the preset segment matching condition, update the current sequence matching information according to the segment matching information of the current voice text segment pair to obtain updated sequence matching information; and in a case that the updated sequence matching information does not meet the sequence matching condition, determine the next voice text segment pair of the current voice text segment pair from the plurality of voice text segment pairs by updating both voice text segments.
In an exemplary embodiment, the segment pair determining subunit is further configured to compare timestamps of two speech text segments included in the current speech text segment pair if the comparison result indicates that the segment matching information of the current speech text segment pair does not satisfy the preset segment matching condition; and determining the next voice text segment pair of the current voice text segment pair from the plurality of voice text segment pairs by updating the voice text segment with the earlier time stamp.
In an exemplary embodiment, the apparatus further includes a correspondence matrix establishing module configured to perform, according to the timestamps, a sorting process on a plurality of speech text segments in the first live broadcast room and a plurality of speech text segments in the second live broadcast room, respectively, so as to obtain a first speech text segment sequence in the first live broadcast room and a second speech text segment sequence in the second live broadcast room; establishing a two-dimensional corresponding matrix between the first voice text fragment sequence and the second voice text fragment sequence; each element in the two-dimensional corresponding matrix corresponds to a speech text segment pair;
the sequence determination unit is further configured to perform obtaining a state transition relation; the state transition relational expression is used for determining accumulated matching information corresponding to each element, the accumulated matching information of each element is determined based on the accumulated matching information corresponding to the associated element of each element and the segment matching information corresponding to each element, and the timestamp of the voice text segment pair corresponding to the associated element is earlier than the timestamp of the voice text segment pair corresponding to the element; and determining a plurality of target elements from each element of the two-dimensional corresponding matrix based on the segment matching information and the state transition relational expression, and taking a sequence formed by voice text segment pairs corresponding to the target elements as the voice text segment pair sequence.
In an exemplary embodiment, the sequence determining unit is further configured to perform obtaining cumulative matching information corresponding to each element in the two-dimensional corresponding matrix based on the segment matching information and the state transition relational expression; determining a starting element of the voice text segment pair sequence, and determining a target element corresponding to the starting element from the associated elements of the starting element based on the accumulated matching information of the associated elements of the starting element and the segment matching information of the voice text segment pair corresponding to the starting element; and taking the target element corresponding to the initial element as a new initial element, returning to the step of determining the target element corresponding to the initial element from the associated elements of the initial element based on the cumulative matching information of the associated elements of the initial element and the segment matching information of the voice text segment pair corresponding to the initial element until the determined new initial element is the element with the earliest time stamp in the two-dimensional corresponding matrix, and determining each obtained target element as the plurality of target elements.
In an exemplary embodiment, the sequence matching information includes average difference information of each pair of speech text segments in the sequence of pairs of speech text segments and the number of pairs of speech text segments matched in each pair of speech text segments;
the detection unit is further configured to determine that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast for the same content when the average difference information of the respective voice text segment pairs is smaller than a first threshold and the number of the matched voice text segment pairs is larger than a second threshold.
In an exemplary embodiment, the sequence determining unit further includes a segment matching subunit configured to perform, for each pair of speech text segments, obtaining segment features of multiple dimensions of two speech text segments in the pair of speech text segments respectively; determining matching information of the two voice text segments under each dimension based on the segment features of the plurality of dimensions; and obtaining the segment matching information of the voice text segment pair based on the matching information under each dimension.
In an exemplary embodiment, the apparatus further includes a dividing unit configured to acquire the number of live broadcast rooms to be detected in a case that there are a plurality of live broadcast rooms to be detected; determine the number of computing nodes based on the number of live broadcast rooms to be detected; and evenly distribute the live broadcast rooms to be detected among the computing nodes for simultaneous live broadcast detection.
According to a third aspect of the embodiments of the present disclosure, there is provided a server, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of any one of the above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of a server, enable the server to perform the method of any one of the above.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising instructions which, when executed by a processor of a server, enable the server to perform the method as defined in any one of the above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
the method has the advantages that the multi-account simultaneous live broadcast detection is carried out through the voice recognition result of the live broadcast room, the voice recognition result of the live broadcast room belongs to basic information, and a plurality of follow-up services need to be analyzed based on the voice recognition result, so that the method does not need to additionally increase the resource consumption of the voice recognition service, and accurate detection results can be obtained under the condition of relatively low resource consumption.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a diagram illustrating an application environment of a method for detecting simultaneous live broadcast of multiple accounts according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a method for detecting simultaneous live broadcast of multiple accounts according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating a method for determining a sequence of a speech-to-text segment pair according to an exemplary embodiment.
FIG. 4 is a schematic diagram illustrating a two-dimensional correspondence matrix in accordance with an exemplary embodiment.
Fig. 5 is a schematic diagram of a two-dimensional correspondence matrix shown in an application example.
Fig. 6 (a) is a schematic diagram illustrating the optimization of the calculation amount with respect to the two-dimensional correspondence matrix according to an exemplary embodiment.
Fig. 6 (b) is a schematic diagram illustrating calculation amount optimization for a two-dimensional correspondence matrix according to another exemplary embodiment.
Fig. 7 is a schematic diagram of live broadcast room allocation by way of traversal of bubble sort in the prior art.
Fig. 8 is a diagram illustrating live room allocation by way of uniform partitioning in accordance with an example embodiment.
Fig. 9 is a design diagram illustrating a detection system for multiple accounts live simultaneously in accordance with an example embodiment.
Fig. 10 is a block diagram illustrating an architecture of a detection apparatus for simultaneous live multi-account playing according to an exemplary embodiment.
FIG. 11 is a block diagram illustrating a server in accordance with an exemplary embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. It should also be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are both information and data that are authorized by the user or sufficiently authorized by various parties.
The method for detecting simultaneous live broadcast of multiple accounts provided by the embodiments of the disclosure can be applied to an application environment as shown in fig. 1. The server 104 communicates with a plurality of live terminals 102. The data storage system may store data that the server 104 needs to process, such as voice segments, voice text segments, segment matching information of voice text segment pairs, and sequence matching information of voice text segment pair sequences. The data storage system may be integrated on the server 104, or may be located on the cloud or another network server. In the application scenario of the disclosure, each live terminal 102 collects a plurality of voice segments of its corresponding live broadcast room and sends them to the server 104. The server 104 performs speech recognition on the voice segments of each live broadcast room to obtain the voice text segments of each terminal's room. For every two live broadcast rooms, the server 104 obtains a plurality of voice text segment pairs from the two rooms' voice text segments, obtains a voice text segment pair sequence and its sequence matching information based on the segment matching information of each pair, compares the sequence matching information with a preset sequence matching condition, and determines from the comparison result whether the two rooms belong to multi-account simultaneous live broadcast. The terminal 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet-of-things devices and portable wearable devices; the internet-of-things devices may be smart speakers, smart televisions, smart air conditioners, smart car-mounted devices, and the like.
The portable wearable device can be a smart watch, a smart bracelet, a head-mounted device, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.
Fig. 2 is a flowchart illustrating a method for detecting simultaneous live broadcast of multiple accounts according to an exemplary embodiment, and as shown in fig. 2, the method is described as applied to the server 104 in fig. 1, and includes the following steps:
in step S210, voice recognition processing is performed on the multiple voice segments of the first live broadcast room and the multiple voice segments of the second live broadcast room respectively to obtain multiple voice text segments of the first live broadcast room and multiple voice text segments of the second live broadcast room.
Wherein, the voice segment represents a section of voice intercepted in the live broadcast process of the live broadcast room. Each voice segment has a corresponding time stamp, which may be a time stamp of a start time of the voice segment.
The first live broadcast room and the second live broadcast room represent the two live broadcast rooms to be detected. It should be noted that taking the first and second live broadcast rooms as an example in this embodiment only serves to explain the multi-account simultaneous live broadcast detection method provided by the present application, and does not limit the method to two live broadcast rooms; it can be understood that the method can also detect multi-account simultaneous live broadcast across three or more live broadcast rooms.
In the specific implementation, in the live broadcasting process of the first live broadcasting room, the live broadcasting terminal corresponding to the first live broadcasting room can intercept the voice segments of the first live broadcasting room in real time to obtain a plurality of voice segments of the first live broadcasting room and send the voice segments to the server 104, and similarly, the live broadcasting terminal corresponding to the second live broadcasting room can obtain a plurality of voice segments of the second live broadcasting room and send the voice segments to the server 104. The server 104 respectively performs voice recognition on each voice segment of the first live broadcast room and each voice segment of the second live broadcast room to obtain a plurality of voice text segments of the first live broadcast room and a plurality of voice text segments of the second live broadcast room.
In step S220, a plurality of speech text segment pairs are obtained based on the plurality of speech text segments in the first live broadcast room and the plurality of speech text segments in the second live broadcast room; each speech-text segment pair includes a speech-text segment of the first live broadcast room and a speech-text segment of the second live broadcast room.
Each voice text segment pair is composed of a voice text segment of the first live broadcast room and a voice text segment of the second live broadcast room.
In specific implementation, each voice text segment in the first live broadcast room and each voice text segment in the second live broadcast room can form a voice text segment pair, and if there are M voice text segments in the first live broadcast room and N voice text segments in the second live broadcast room, M × N voice text segment pairs can be obtained.
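As a minimal illustration of forming the M × N pairs described above (the segment texts below are hypothetical placeholders, not taken from the disclosure), the pairing can be sketched as:

```python
from itertools import product

# Hypothetical recognized texts; in the disclosure these would be the
# voice recognition results of the two live broadcast rooms.
room_a_texts = ["hello everyone", "welcome back", "today we sell apples"]  # M = 3
room_b_texts = ["hello everyone", "today we sell apples"]                  # N = 2

# Every voice text segment of room a paired with every segment of room b.
pairs = list(product(room_a_texts, room_b_texts))

print(len(pairs))  # M * N = 6
```

Each resulting tuple is one candidate voice text segment pair to be scored by the segment matching information described in the following steps.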
In step S230, segment matching information of each speech text segment pair is obtained, and based on the segment matching information and the plurality of speech text segment pairs, a speech text segment pair sequence and sequence matching information of the speech text segment pair sequence are obtained.
The segment matching information is obtained by matching the features of the two voice text segments of a pair in multiple dimensions, and it includes the matching information of those dimensions; for example, the segment matching information may include matching information of segment length, segment content, segment time, segment sequence number, and the like. The sequence number of a segment denotes its position in the voice text segment sequence obtained after the voice text segments are sorted by timestamp.
The sequence matching information may include average difference information of each pair of speech text segments in the sequence of pairs of speech text segments and the number of pairs of speech text segments matched in each pair of speech text segments.
In this step, it is considered that the live content of the two live broadcast rooms may not be broadcast in real time, that is, there may be a recorded-broadcast situation. Under recorded broadcast, the playing times of the same live content deviate substantially, producing a consistent offset in the timestamps of matched voice segments. For example, the voice text segment chunk(a_i) may actually correspond in played content to the voice text segment chunk(b_j), where a_i denotes the i-th speech segment of the first live broadcast room a and b_j denotes the j-th speech segment of the second live broadcast room b, yet the timestamps of the two segments differ by 20 s. If the two voice text segments are matched purely by timestamp, the correctly matched segment pair cannot be found, so the time-shifted recorded-broadcast situation cannot be recalled. To solve this problem, the present disclosure provides a method for determining the voice text segment pair sequence through the segment matching information of each voice text segment pair, thereby realizing recall of the recorded-broadcast situation.
In a specific implementation, there are two methods for determining the voice text segment pair sequence through the segment matching information of each voice text segment pair. The first method is: determine an initial voice text segment pair; based on the comparison result between the segment matching information of the current pair and a preset segment matching condition, determine the next voice text segment pair from the plurality of voice text segment pairs using the corresponding segment pair updating mode; take the next pair as the new current pair and return to the step of comparing against the preset segment matching condition, until every voice text segment of the first live broadcast room and every voice text segment of the second live broadcast room has been traversed; finally, form the voice text segment pair sequence from each pair whose segment matching information, as determined during the loop, meets the segment matching condition.
This method can recall the recorded-broadcast situation, but the time offset of a voice text segment affects not only whether a voice text segment pair matches; it also affects which segment is advanced during the matching search. As shown in fig. 3, a flowchart of the method for determining the voice text segment pair sequence, when the current pair does not match it must be decided whether to next check a voice text segment of live broadcast room a or of live broadcast room b. When the segment timestamps are inaccurate, this logic becomes unreliable. For example, suppose chunk(a_i) and chunk(b_j) are currently judged not to match; if the timestamp of chunk(a_{i+1}) is earlier than that of chunk(b_j), the method proceeds to compare chunk(a_{i+1}) with chunk(b_j), thereby missing the correct match between chunk(a_i) and chunk(b_{j+1}).
Therefore, in order to solve the influence caused by inaccurate timestamp, the present disclosure further provides another method for determining a sequence of a speech text segment, that is, a dynamic programming method is used to determine the sequence of the speech text segment, which is equivalent to converting the problem of whether two live broadcast rooms are live broadcast with multiple accounts at the same time into: an alignment relationship is found in the sequence of ordered speech text segments between the two live broadcasts.
More specifically, the process of determining the voice text segment pair sequence using the dynamic programming method comprises the following steps: first, sort the plurality of voice text segments of the first live broadcast room by timestamp to obtain the first voice text segment sequence of the first live broadcast room, and likewise sort the plurality of voice text segments of the second live broadcast room to obtain the second voice text segment sequence of the second live broadcast room. Then establish a two-dimensional correspondence matrix between the first voice text segment sequence and the second voice text segment sequence (fig. 4 is a schematic diagram of such a matrix), in which each element corresponds to one voice text segment pair. Finally, determine a plurality of target elements from the elements of the two-dimensional correspondence matrix based on the segment matching information of each voice text segment pair and a preset state transition relational expression, and take the sequence formed by the voice text segment pairs corresponding to these target elements as the voice text segment pair sequence.
In step S240, in a case that the sequence matching information meets a preset sequence matching condition, it is determined that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast for the same content.
The sequence matching condition is that the average difference information of each voice text segment pair is smaller than a first threshold value, and the number of matched voice text segment pairs is larger than a second threshold value.
In a specific implementation, the sequence matching information includes average difference information of each pair of voice text segments in the sequence of the pair of voice text segments and the number of pairs of matched voice text segments in each pair of voice text segments, so that it can be determined whether the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast for the same content based on the average difference information of each pair of voice text segments and the number of pairs of matched voice text segments.
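As a hedged sketch of the sequence matching condition described above (the threshold values and function name are illustrative assumptions, not taken from the disclosure):

```python
def sequence_matches(pair_diffs, matched_count,
                     first_threshold=0.3, second_threshold=5):
    """Return True when the average difference information of the voice text
    segment pairs is below the first threshold AND the number of matched
    pairs exceeds the second threshold (both thresholds are illustrative)."""
    if not pair_diffs:
        return False
    average_diff = sum(pair_diffs) / len(pair_diffs)
    return average_diff < first_threshold and matched_count > second_threshold

print(sequence_matches([0.1, 0.2, 0.15], matched_count=8))  # True
```

Both conditions must hold simultaneously: a low average difference alone is not enough if too few pairs matched, and vice versa.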
In the above method for detecting multi-account simultaneous live broadcast, voice recognition is performed on each voice segment of the first live broadcast room and the second live broadcast room to obtain the voice text segments of each room. A plurality of voice text segment pairs are then obtained from the voice text segments of the two rooms; based on the segment matching information of each pair, a voice text segment pair sequence and the sequence matching information of that sequence are obtained; and finally, according to the sequence matching information, it is determined whether the first and second live broadcast rooms belong to multi-account simultaneous live broadcast of the same content. Because the method performs detection using the voice recognition results of the live broadcast rooms, which are basic information that many subsequent services already analyze, no additional voice recognition resource consumption is required, so an accurate detection result is obtained at relatively low resource consumption.
In an exemplary embodiment, each speech text segment has a corresponding timestamp; in step S230, obtaining a speech text segment pair sequence based on the segment matching information and the plurality of speech text segment pairs, includes:
step S2304, determining current voice text segment pairs from the plurality of voice text segment pairs according to the time stamps;
step S2305, acquiring segment matching information of the current voice text segment pair, and determining a next voice text segment pair of the current voice text segment pair from the plurality of voice text segment pairs by adopting a corresponding segment pair determining mode based on a comparison result between the segment matching information of the current voice text segment pair and a preset segment matching condition;
step S2306, using the next voice text segment pair as a new current voice text segment pair, and returning to the step of obtaining segment matching information of the current voice text segment pair until all voice text segments in the first live broadcast room and all voice text segments in the second live broadcast room are traversed;
step S2307, based on each voice text segment pair whose segment matching information meets the segment matching condition, a voice text segment pair sequence is obtained.
The segment matching condition may be that the matching information of the two voice text segments in the pair meets the corresponding matching requirement in every dimension, in certain predetermined dimensions, or in at least a certain proportion of the dimensions. The segment matching condition may be determined according to actual requirements, and is not specifically limited by the present application.
For example, matching information in the segment content dimension meeting the corresponding matching requirement may be: the difference information of the segment contents of the two voice text segments is smaller than the content threshold value. Matching information in the segment length dimension meeting corresponding matching requirements may be: the segment lengths of the two voice text segments are both larger than a first length threshold, and the difference value of the segment lengths of the two voice text segments is smaller than a second length threshold. Matching information in the segment time dimension meeting corresponding matching requirements may be: the time difference of the two voice text segments is within a preset time range. Matching information in the segment sequence number dimension meeting corresponding matching requirements can be as follows: the difference value of the segment sequence numbers of the two voice text segments is smaller than the sequence number threshold value.
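A minimal sketch of checking the four per-dimension requirements listed above; all threshold values, parameter names, and the choice of passing pre-computed difference values are illustrative assumptions:

```python
def segments_match(content_diff, len_a, len_b, ts_a, ts_b, seq_a, seq_b,
                   content_threshold=0.2,       # segment-content dimension
                   first_length_threshold=5,    # minimum segment length
                   second_length_threshold=10,  # max length difference
                   time_range=30.0,             # max timestamp difference (s)
                   seqno_threshold=3):          # max sequence-number difference
    """True only when the matching information of every dimension meets its
    corresponding requirement (all thresholds are illustrative)."""
    return (content_diff < content_threshold
            and len_a > first_length_threshold
            and len_b > first_length_threshold
            and abs(len_a - len_b) < second_length_threshold
            and abs(ts_a - ts_b) <= time_range
            and abs(seq_a - seq_b) < seqno_threshold)

# Two segments with similar content, comparable length, close timestamps
# and close sequence numbers are considered matched.
print(segments_match(0.05, 20, 22, 100.0, 112.0, 4, 5))  # True
```

This corresponds to the strictest variant of the condition, where every dimension must meet its requirement.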
In a specific implementation, referring to the flowchart of the method for determining the voice text segment pair sequence shown in fig. 3, after obtaining each voice text segment chunk(a) of the first live broadcast room and each voice text segment chunk(b) of the second live broadcast room, the segments of each room may be sorted by timestamp, and the following loop is executed in sequence:
(1) Determine the initial voice text segment pair chunk(a_0, b_0), where a_0 denotes the voice text segment with the earliest timestamp in the first live broadcast room a, and b_0 denotes the voice text segment with the earliest timestamp in the second live broadcast room b.
(2) Obtain the segment matching information of chunk(a_0, b_0) and compare it with the segment matching condition A to obtain a comparison result.
(3) If the comparison result is that the segment matching information of chunk(a_0, b_0) meets the segment matching condition A, determine the next voice text segment pair using the first segment pair updating mode.
(4) If the comparison result is that the segment matching information of chunk(a_0, b_0) does not meet the segment matching condition A, determine the next voice text segment pair using the second segment pair updating mode.
(5) Take the next voice text segment pair as the new current pair and return to step (2), until all voice text segments of the first live broadcast room and all voice text segments of the second live broadcast room have been traversed, obtaining the comparison results between the segment matching information of the plurality of voice text segment pairs and the segment matching condition A.
(6) Obtain the voice text segment pair sequence from each voice text segment pair whose segment matching information, as determined during the loop of steps (2) to (5), meets the segment matching condition A.
To further improve accuracy, the check may be performed twice: first, a check is run starting from (i = 0, j = 0); after it passes, the remaining unchecked chunks are checked once more; only after both checks pass is it determined that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast of the same content.
In this embodiment, the voice text segments chunk(a_i) of the first live broadcast room and chunk(b_j) of the second live broadcast room are compared pairwise in time order. When chunk(a_i) and chunk(b_j) meet the preset segment matching condition, the two chunks are considered matched and the next chunk pair is searched for, yielding a plurality of matched chunk pairs. The matched chunk pairs form the voice text segment pair sequence, which facilitates the subsequent multi-account simultaneous live broadcast detection based on the sequence matching information of that sequence.
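The pairwise time-ordered comparison described in this embodiment can be sketched roughly as the following two-pointer walk (the `ts`/`text` fields and the equality-based match function are illustrative assumptions, not the disclosure's actual matching condition):

```python
def greedy_align(chunks_a, chunks_b, is_match):
    """Walk both timestamp-ordered segment lists: on a match, advance both
    pointers (the first updating mode); otherwise advance the pointer whose
    segment has the earlier timestamp (the second updating mode)."""
    i = j = 0
    matched = []
    while i < len(chunks_a) and j < len(chunks_b):
        if is_match(chunks_a[i], chunks_b[j]):
            matched.append((i, j))
            i += 1
            j += 1
        elif chunks_a[i]["ts"] < chunks_b[j]["ts"]:
            i += 1
        else:
            j += 1
    return matched

chunks_a = [{"ts": 0, "text": "hello"}, {"ts": 10, "text": "buy now"}]
chunks_b = [{"ts": 21, "text": "hello"}, {"ts": 31, "text": "buy now"}]
print(greedy_align(chunks_a, chunks_b, lambda x, y: x["text"] == y["text"]))
# [(0, 0), (1, 1)]
```

Note how the matched pairs here have a consistent ~21 s timestamp offset, the recorded-broadcast situation this traversal is designed to recall.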
Further, in an exemplary embodiment, in the step S2305, based on a comparison result between segment matching information of the current speech text segment pair and a preset segment matching condition, determining, by using a corresponding segment pair determining manner, a next speech text segment pair of the current speech text segment pair from the multiple speech text segment pairs, where the determining includes:
step S2305A, determining current sequence matching information;
step S2305B, if the comparison result is that the segment matching information of the current voice text segment pair meets the preset segment matching condition, updating the current sequence matching information according to the segment matching information of the current voice text segment pair to obtain updated sequence matching information;
step S2305C, when the updated sequence matching information does not satisfy the sequence matching condition, determining a next speech text segment pair of the current speech text segment pair from the plurality of speech text segment pairs in a manner that both speech text segments are updated.
The sequence matching condition is that the average difference information of each voice text segment pair is smaller than a first threshold value, and the number of matched voice text segment pairs is larger than a second threshold value.
In a specific implementation, referring to fig. 3, if the comparison result is that the segment matching information of the initial voice text segment pair chunk(a_0, b_0) satisfies the segment matching condition A, the initial sequence matching information S is updated according to the segment matching information of chunk(a_0, b_0) to obtain updated sequence matching information S. It is then judged whether the updated sequence matching information S meets the sequence matching condition, that is, whether the average difference information of the voice text segment pairs in the updated S is smaller than the first threshold and whether the number of matched voice text segment pairs is larger than the second threshold. If the updated sequence matching information meets the sequence matching condition, it is determined that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast of the same content. Otherwise, the next voice text segment pair chunk(a_1, b_1) is determined by updating both voice text segments of the initial voice text segment pair.
In an exemplary embodiment, in the step S2305, determining, in a corresponding segment pair determining manner, a next speech text segment pair of the current speech text segment pair from the plurality of speech text segment pairs based on a comparison result between the segment matching information of the current speech text segment pair and a preset segment matching condition, further includes:
step S2305D, if the comparison result is that the segment matching information of the current voice text segment pair does not meet the preset segment matching condition, comparing the timestamps of the two voice text segments included in the current voice text segment pair;
step S2305E, determining a next speech text segment pair of the current speech text segment pair from the plurality of speech text segment pairs by updating the speech text segment with the earlier timestamp.
In a specific implementation, referring to fig. 3, if the comparison result is that the segment matching information of the initial voice text segment pair chunk(a_0, b_0) does not satisfy the segment matching condition A, the timestamps of the two voice text segments chunk(a_0) and chunk(b_0) included in the initial pair are compared.
If the timestamp of chunk(a_0) is earlier than that of chunk(b_0), chunk(a_0) is updated, that is, via the path i = i + 1, the updated chunk(a_1) is determined, and the next voice text segment pair is chunk(a_1, b_0).
If the timestamp of chunk(a_0) is later than that of chunk(b_0), chunk(b_0) is updated, that is, via the path j = j + 1, the updated chunk(b_1) is determined, and the next voice text segment pair is chunk(a_0, b_1).
In the embodiment, based on the comparison result between the segment matching information of the current speech text segment pair and the preset segment matching condition, the next speech text segment pair is determined in different segment pair updating modes, so that the ordered determination of the next speech text segment pair is ensured.
In an exemplary embodiment, each speech text segment has a corresponding timestamp; before step S230, the method further includes:
step S221, sequencing a plurality of voice text segments of the first live broadcast room and a plurality of voice text segments of the second live broadcast room respectively according to the time stamps to obtain a first voice text segment sequence of the first live broadcast room and a second voice text segment sequence of the second live broadcast room;
step S222, establishing a two-dimensional corresponding matrix between the first voice text segment sequence and the second voice text segment sequence; each element in the two-dimensional correspondence matrix corresponds to a speech-text segment pair.
The step S230, obtaining a speech text segment pair sequence based on the segment matching information and the plurality of speech text segment pairs, further includes:
step S2308, obtaining a state transition relational expression; the state transition relational expression is used for determining the accumulated matching information corresponding to each element, the accumulated matching information of each element is determined based on the accumulated matching information corresponding to the associated element of each element and the segment matching information corresponding to each element, and the timestamp of the voice text segment pair corresponding to the associated element is earlier than the timestamp of the voice text segment pair corresponding to the element;
step S2309, based on the segment matching information and the state transition relational expression, determines a plurality of target elements from each element of the two-dimensional correspondence matrix, and takes a sequence formed by the speech text segment pairs corresponding to the plurality of target elements as a speech text segment pair sequence.
An associated element of an element is one whose corresponding voice text segment pair has an earlier timestamp than the pair corresponding to that element; specifically, the associated elements may be the upper-left diagonal element, the upper element, and the left element of that element.
In a specific implementation, referring to the schematic diagram of the two-dimensional correspondence matrix shown in fig. 4, the vertical coordinate represents the plurality of voice text segments of the first live broadcast room a, where a_0, a_1, …, a_i form the first voice text segment sequence of room a, ordered from earliest to latest by the timestamp of each segment. The horizontal coordinate represents the plurality of voice text segments of the second live broadcast room b, where b_0, b_1, …, b_j form the second voice text segment sequence of room b, ordered in the same way. As shown in fig. 4, each element of the two-dimensional correspondence matrix corresponds to one voice text segment a_i of the first live broadcast room and one voice text segment b_j of the second live broadcast room, that is, each element corresponds to one voice text segment pair chunk(a_i, b_j). The segment matching information of the pair corresponding to each element is calculated; the accumulated matching information of each element is obtained based on the segment matching information and the state transition relational expression; a plurality of target elements are then determined from the elements of the two-dimensional correspondence matrix according to the accumulated matching information and segment matching information of each element; and the sequence formed by the voice text segment pairs corresponding to those target elements is taken as the voice text segment pair sequence.
More specifically, the state transition relational expression may be written as:
s(a_i, b_j) = min{ s(a_{i-1}, b_{j-1}) + dist(a_i, b_j), s(a_{i-1}, b_j) + dist_max, s(a_i, b_{j-1}) + dist_max }
where s(a_i, b_j) represents the accumulated matching information of element (a_i, b_j) in the two-dimensional correspondence matrix; min denotes taking the minimum value; (a_{i-1}, b_{j-1}), (a_{i-1}, b_j) and (a_i, b_{j-1}) are the three associated elements of (a_i, b_j); dist(a_i, b_j) represents the segment matching information of element (a_i, b_j), which may specifically be the difference information of segment contents; and dist_max represents the segment-content difference information assigned to an unmatched voice text segment pair, which may be set to 80% of the voice text segment length.
The above relational expression states that the accumulated matching information of element (a_i, b_j) is the minimum of the three values s(a_{i-1}, b_{j-1}) + dist(a_i, b_j), s(a_{i-1}, b_j) + dist_max and s(a_i, b_{j-1}) + dist_max.
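A minimal sketch of filling the accumulated-matching table according to the state transition relational expression; the distance matrix and dist_max values below are illustrative, not taken from the disclosure:

```python
def fill_state_table(dist, dist_max):
    """Fill s so that
       s[i][j] = min(s[i-1][j-1] + dist[i][j],
                     s[i-1][j]   + dist_max,
                     s[i][j-1]   + dist_max)
    where dist[i][j] is the content difference of pair (a_i, b_j)."""
    m, n = len(dist), len(dist[0])
    INF = float("inf")
    s = [[INF] * n for _ in range(m)]
    s[0][0] = dist[0][0]
    for i in range(m):
        for j in range(n):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0 and j > 0:
                candidates.append(s[i - 1][j - 1] + dist[i][j])
            if i > 0:
                candidates.append(s[i - 1][j] + dist_max)
            if j > 0:
                candidates.append(s[i][j - 1] + dist_max)
            s[i][j] = min(candidates)
    return s

# Identical content along the diagonal gives an accumulated cost of 0.
dist = [[0.0, 5.0], [5.0, 0.0]]
s = fill_state_table(dist, dist_max=10.0)
print(s[1][1])  # 0.0
```

A low accumulated value at the bottom-right corner indicates a good alignment between the two segment sequences.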
Further, in an exemplary embodiment, in the step S2309, based on the segment matching information and the state transition relational expression, a plurality of target elements are determined from each element of the two-dimensional corresponding matrix, and the method specifically includes the following steps:
step S2309A, determining a starting element of the sequence of the voice text segment pairs, and determining a target element corresponding to the starting element from the associated elements of the starting element based on the accumulated matching information of the associated elements of the starting element and the segment matching information of the voice text segment pair corresponding to the starting element;
step S2309B, taking the target element corresponding to the initial element as a new initial element, returning the cumulative matching information based on the associated elements of the initial element and the segment matching information of the voice text segment pair corresponding to the initial element, and determining the target element corresponding to the initial element from the associated elements of the initial element until the obtained new initial element is the earliest element in the two-dimensional corresponding matrix, and determining each obtained target element as a plurality of target elements.
In a specific implementation, referring to the two-dimensional correspondence matrix shown in fig. 4, since the accumulated matching information of each element is determined from the accumulated matching information of the associated elements whose timestamps precede it, the voice text segment pair (a_i, b_j) formed by the two segments with the latest timestamps can be taken as the starting element. The target element corresponding to the starting element is then determined from its associated elements based on the accumulated matching information of the three associated elements and the segment matching information of the pair corresponding to the starting element. Specifically, these values are substituted into the state transition relational expression to obtain s(a_{i-1}, b_{j-1}) + dist(a_i, b_j), s(a_{i-1}, b_j) + dist_max and s(a_i, b_{j-1}) + dist_max; the minimum of the three is determined, and the element corresponding to that minimum is taken as the target element. The target element corresponding to the starting element is then taken as the new starting element, the process returns to step S2309A, and the target element of the new starting element is computed, and so on, until the new starting element obtained is the earliest element (a_0, b_0) of the two-dimensional correspondence matrix, at which point the collected target elements are determined as the plurality of target elements.
For example, assuming i = 3 and j = 4, the resulting two-dimensional correspondence matrix is shown in fig. 5, where the elements connected by arrows represent the target elements determined at each iteration. Taking the latest element (a_3, b_4) as the starting element, the target element corresponding to (a_3, b_4) must be determined from (a_2, b_3), (a_2, b_4) and (a_3, b_3). Substituting into the state transition relational expression gives:
s(a_3, b_4) = min{ s(a_2, b_3) + dist(a_3, b_4), s(a_2, b_4) + dist_max, s(a_3, b_3) + dist_max }
Suppose s(a_3, b_3) + dist_max is the minimum; then (a_3, b_3) is taken as the new starting element, and its target element is further determined from (a_2, b_2), (a_2, b_3) and (a_3, b_2), and so on, until the determined new target element is (a_0, b_0).
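The backtracking of steps S2309A-S2309B can be sketched as follows: starting from the latest element, repeatedly pick whichever of the three associated elements produced the current accumulated value until (0, 0) is reached. The table values below are illustrative and consistent with the fill rule for the given dist and dist_max:

```python
def backtrack(s, dist, dist_max):
    """Trace the target elements from the latest element back to (0, 0)."""
    i, j = len(s) - 1, len(s[0]) - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + dist[i][j]:
            i, j = i - 1, j - 1            # upper-left diagonal associated element
        elif i > 0 and s[i][j] == s[i - 1][j] + dist_max:
            i -= 1                         # upper associated element
        else:
            j -= 1                         # left associated element
        path.append((i, j))
    return path[::-1]

# Illustrative accumulated table s for dist with dist_max = 10.0.
dist = [[0.0, 5.0], [5.0, 0.0]]
s = [[0.0, 10.0], [10.0, 0.0]]
print(backtrack(s, dist, dist_max=10.0))  # [(0, 0), (1, 1)]
```

Here the diagonal step (0, 0) → (1, 1) corresponds to matching pair (a_0, b_0) and then pair (a_1, b_1).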
Further, since computing the complete state transition table requires a large number of pairwise comparisons of speech text segments, the computation cost grows significantly; some pruning rules can therefore be added to accelerate the computation. Optimization can proceed from two aspects: on one hand, only the cumulative matching information within a diagonal band of the two-dimensional correspondence matrix is computed, i.e., the ranges of i and j are restricted, as shown in fig. 6 (a). On the other hand, as shown in fig. 6 (b), chunk(a) and chunk(b) may be divided into a plurality of parts, and the cumulative matching information may be computed over the diagonal band of each part.
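The first optimization — restricting i and j to a diagonal band — can be sketched as follows. Modeling it on the Sakoe-Chiba band used in dynamic time warping is an assumption, since fig. 6 (a) is not reproduced here.

```python
# Cells with |i - j| > band are never computed, shrinking the work on an
# n x m matrix from O(n*m) to roughly O(n*band).

def banded_cells(n, m, band):
    """Yield the (i, j) cells inside the diagonal band that the DP would visit."""
    for i in range(n):
        for j in range(max(0, i - band), min(m, i + band + 1)):
            yield i, j
```

For a 10 × 10 matrix with band = 2, only 44 of the 100 cells are visited.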
According to this embodiment, the problem of judging whether two live broadcast rooms belong to multi-account simultaneous live broadcast is converted into the problem of aligning the ordered speech text segments of the two live broadcast rooms. This removes the dependence on the timestamps of the speech text segments (the timestamps are used only to order the segments), and by adopting the idea of dynamic programming it solves the problem that correctly matched speech text segment pairs may be missed when the timestamps are inaccurate. Meanwhile, dynamic programming is also used when computing the matching information between the two speech text segments of a pair, so the scheme is essentially a dual dynamic programming, which further improves the accuracy of the determined speech text segment pair sequence.
In an exemplary embodiment, the sequence matching information includes the average difference information of the speech text segment pairs in the speech text segment pair sequence and the number of matched speech text segment pairs among them; the method further includes: when the average difference information of the speech text segment pairs is smaller than a first threshold and the number of matched speech text segment pairs is larger than a second threshold, determining that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast of the same content.
Here, the average difference information represents the average, in the segment content dimension, of the difference information of the speech text segment pairs in the speech text segment pair sequence.
In a specific implementation, when the speech text segment sequence is determined based on the flow chart shown in fig. 3, the number of the matched speech text segment pairs is equal to the number of the speech text segment pairs in the obtained speech text segment pair sequence, so that the average difference information can be obtained by calculating the average value of the difference information of each speech text segment pair in the speech text segment pair sequence in the segment content dimension.
When the sequence of speech text segment pairs is determined based on the two-dimensional correspondence matrix shown in fig. 4, the number of matched speech text segment pairs needs to be calculated, specifically: the segment matching information of each speech text segment pair in the sequence is compared with a preset segment matching condition to determine whether that pair counts as a matched pair, and the number of matched pairs is then counted; in this case, the number of matched speech text segment pairs is less than or equal to the number of pairs in the sequence. The average difference information of the speech text segment pairs is equal to the cumulative matching information of the element with the latest timestamp in the two-dimensional correspondence matrix; for example, in fig. 4, the element with the latest timestamp is the last element (a_i, b_j).
In this embodiment, whether the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast of the same content is determined from two dimensions — the average difference information of the speech text segment pairs in the sequence and the number of matched pairs among them — thereby ensuring the validity and accuracy of the detection result.
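The two-threshold decision rule can be sketched as follows; the threshold values are placeholders, since the disclosure does not specify the first and second thresholds.

```python
# Both conditions must hold: low average difference AND enough matched pairs.
# The threshold values 0.3 and 10 are illustrative assumptions.

def is_multi_account_live(avg_diff, matched_pairs,
                          diff_threshold=0.3, count_threshold=10):
    """True if the two rooms are judged to be multi-account simultaneous live."""
    return avg_diff < diff_threshold and matched_pairs > count_threshold
```

Requiring both dimensions prevents a short overlap of similar speech (few pairs) or a long but dissimilar alignment (high average difference) from triggering a false positive.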
In an exemplary embodiment, in the step S230, the obtaining of the segment matching information of each speech text segment pair may specifically be implemented by the following steps:
step S2301, respectively acquiring segment characteristics of multiple dimensions of two voice text segments in each voice text segment pair;
step S2302, determining matching information of two voice text segments in each dimension based on segment characteristics of multiple dimensions;
step S2303, based on the matching information in each dimension, obtaining segment matching information of the speech text segment pair.
The segment characteristics of multiple dimensions may include segment content, segment length, segment time, and segment sequence number.
In a specific implementation, for each speech text segment pair, the segment content, segment length, segment time, and segment sequence number of the two speech text segments may be obtained; the differences between the two segments in segment content, segment length, segment time, and segment sequence number are taken as the matching information in each dimension, and the matching information in these dimensions together forms the segment matching information of the speech text segment pair.
The segment sequence number is obtained as follows: the plurality of speech text segments of each live broadcast room are sorted in advance according to their timestamps to obtain a speech text segment sequence, from which the sequence number of each speech text segment is obtained.
The matching information in the segment content dimension can be understood as the degree of content difference between the two segments, which can be quantified by the number of unmatched characters.
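The per-dimension matching information of steps S2301 to S2303 might look like the following sketch. The four dimensions follow the text, and quantifying the content difference as an unmatched-character count follows the paragraph above; the concrete measures for the other dimensions are illustrative assumptions.

```python
# Sketch of segment features and per-dimension matching information.
# The Segment fields and difference measures other than "content" are
# assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Segment:
    content: str       # recognized text of the segment
    timestamp: float   # segment time (seconds)
    seq_no: int        # position after sorting the room's segments by timestamp

def segment_matching_info(x: Segment, y: Segment) -> dict:
    """Return the matching information of one speech text segment pair."""
    unmatched = sum(c1 != c2 for c1, c2 in zip(x.content, y.content))
    unmatched += abs(len(x.content) - len(y.content))
    return {
        "content": unmatched,                          # unmatched characters
        "length": abs(len(x.content) - len(y.content)),
        "time": abs(x.timestamp - y.timestamp),
        "seq_no": abs(x.seq_no - y.seq_no),
    }
```

A pair whose values are small in all four dimensions would then satisfy the preset segment matching condition.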
In this embodiment, the segment matching information of the voice text segment pair is determined by the segment characteristics of the two voice text segments in the voice text segment pair in multiple dimensions, so that the accuracy of the matching result of the voice text segment pair obtained based on the segment matching information can be ensured.
It can be understood that, in practical application, the matching algorithm for two live broadcast rooms described in the above embodiment needs to be packaged as a service deployed on a plurality of computing nodes; when implementing simultaneous live broadcast detection, these computing nodes need to be scheduled reasonably so that the service is called pairwise for all live broadcast rooms that are on air at the same time.
In the traditional service design, each batch of computation is executed by a 10-minute timed task on a single container-cloud computing node. To reduce the computation, the live broadcasts are divided into two parts: the broadcasts newly started within the last 10 minutes (denoted as M live broadcast rooms) are compared pairwise with each other (computation M × (M − 1)/2), and the broadcasts already on air before the batch (denoted as N live broadcast rooms) are compared against the newly started ones (computation M × N). However, limited by single-machine resources such as CPU and bandwidth, a batch at the evening peak takes 20-40 minutes, so some duplicate (main/alt) account live broadcasts are not recalled, and some live broadcasts in each batch distort the judgment result because their on-air time is short and the number of recognized sentences is small. Meanwhile, subsequent duplicate-account detection may further enlarge the live broadcast coverage, so the bottleneck of single-machine computation would always be reached. Therefore, to solve these problems, the present disclosure further provides, for the case of many live broadcast rooms, a divide-and-conquer scheme that trades space for time: the live broadcast rooms to be detected are distributed to a plurality of computing nodes, and on the premise of increasing the computation speed, all live broadcasts in each batch are compared with each other. Taking an evening-peak batch of 50,000 live broadcasts (10,000 of them newly started) as an example, the computation of the comparison link on a single computing node is compared between the two schemes:
1 node, computation: 10000 × 40000 + (10000 × 9999)/2 = 449,995,000
25 computing nodes, total computation: 50000 × (50000 − 1)/2 = 1,249,975,000
Single computing node computation: 1,249,975,000 / 25 = 49,999,000
As can be seen, the computation of a single computing node is about 1/10 of that of the original scheme, while the total computation is about 3 times that of the original scheme; therefore, if the distribution is uniform enough, the time consumption of the comparison link can be about 1/10 of the original. As for the increased total computation, since the pairwise-comparison logic for live broadcast rooms is encapsulated as a CPU-based RPC (Remote Procedure Call) service, scaling it out does not consume excessive resources.
In an exemplary embodiment, the method further includes: when there are a plurality of live broadcast rooms to be detected, acquiring the number of live broadcast rooms to be detected; determining the number of computing nodes based on the number of live broadcast rooms to be detected; and evenly dividing the live broadcast rooms to be detected among the computing nodes for simultaneous live broadcast detection.
In this embodiment, the total of n × (n − 1)/2 comparisons needs to be distributed to m computing nodes as evenly as possible, so as to reduce the overall task computation time. If, as shown in fig. 7, the computation were distributed by a bubble-sort-like traversal — computing node C_1 performs comparisons with each of C_1 to C_m, C_2 performs comparisons with each of C_2 to C_m, and so on, until the last computing node C_m only performs a comparison with C_m — then the computation would decrease gradually from C_1 to C_m with a large difference, so the computation of the first node would far exceed that of the last node, and the effect of reducing the overall task computation time could not be achieved.
In the specific implementation, taking m =25 as an example, the following method is adopted to allocate the live broadcast rooms to be detected:
The live broadcast rooms to be detected are divided into 25 partitions by live broadcast Id, one partition of data per computing node; each computing node performs the pairwise comparisons within its own 1/25 of the data and compares its data against the data of other nodes. Meanwhile, to distribute the computation evenly, the comparisons involving a given partition may be allocated across a plurality of computing nodes. Referring to fig. 8, computing node number 1 performs the comparisons between its own data and the data on computing nodes number 1 to 12, while computing node number 24 bears the comparisons between the data on computing node number 2 and the data on computing nodes number 13 to 24.
The computation of each node, for the batch of 50,000 live broadcasts, is then:
2000 × (2000 − 1)/2 + 2000 × (2000 × 12) = 49,999,000
which is consistent with the even distribution estimated previously. It is understood that if m is even (e.g., 26), the computation of the computing nodes cannot be absolutely equal and some nodes bear more computation than others, but the computation remains evenly distributed as a whole.
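The per-node workload claimed above can be checked arithmetically under the stated assumptions: 50,000 live rooms split into m = 25 equal partitions, each node handling its own partition's internal pairs plus 12 of the 25 × 24 / 2 = 300 cross-partition blocks.

```python
# Arithmetic check of the per-node workload. Assumes m is odd, so that
# m divides the m*(m-1)/2 cross-partition blocks exactly (12 per node
# when m = 25).

def per_node_workload(total_rooms, m):
    p = total_rooms // m                     # rooms per partition
    internal = p * (p - 1) // 2              # pairs within the node's own partition
    cross_blocks = (m * (m - 1) // 2) // m   # cross-partition blocks per node
    cross = p * p * cross_blocks             # each cross block compares p x p pairs
    return internal + cross
```

Summed over all 25 nodes this reproduces the total of 50000 × 49999 / 2 comparisons, confirming that no pair is dropped or duplicated.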
Further, in an exemplary embodiment, when a plurality of computing nodes are used to perform multi-account simultaneous live broadcast detection, a central service is required to determine when each 10-minute computing task should start. Meanwhile, since the computation for the same data is distributed over a plurality of computing nodes, the results need to be merged once all the computing nodes finish; therefore, the present disclosure also provides the design of the detection system for multi-account simultaneous live broadcast shown in fig. 9.
As shown in fig. 9, the scheduling-center service is responsible for task control, including deciding whether a computing task should be newly created, issuing the computation start instruction, monitoring the computation progress of each computing node, and finally processing the computation results. Each computing node starts computing upon receiving a computing task instruction; after the computation is completed, it writes the result into Redis (Remote Dictionary Server) and marks its computation as completed. Task issuing and task progress monitoring are also implemented via Redis. For example, the task ID is stored in a Redis String; the scheduling center is responsible for modifying this key, an empty key indicates that there is no current computing task, and each computing node starts executing the data computation corresponding to the ID when the key is non-empty. The computation state of each node for the task is stored in a Redis Hash, and each node modifies its own field. The scheduling center judges from this information whether all computing nodes have finished computing, and each computing node checks whether it has already executed the task before executing it, thereby preventing repeated execution.
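The Redis-based coordination protocol can be sketched as follows, using an in-memory stand-in for Redis so the example is self-contained. The key layout (a String for the current task ID, a Hash for per-node completion state) follows the text; all key names and function names are assumptions.

```python
# Runnable sketch of the scheduling protocol described above.

class FakeRedis:
    """Minimal in-memory stand-in for the Redis String and Hash commands used."""
    def __init__(self):
        self.strings, self.hashes = {}, {}
    def set(self, k, v): self.strings[k] = v
    def get(self, k): return self.strings.get(k)
    def hset(self, k, field, v): self.hashes.setdefault(k, {})[field] = v
    def hgetall(self, k): return dict(self.hashes.get(k, {}))

def start_task(r, task_id):
    r.set("task:current", task_id)            # non-empty key => a task is active

def node_run(r, node_id):
    task_id = r.get("task:current")
    if task_id is None:
        return False                          # empty key: no current task
    state = r.hgetall(f"task:{task_id}:state")
    if state.get(node_id) == "done":
        return False                          # already executed: prevent repeats
    # ... perform this node's share of the pairwise comparisons here ...
    r.hset(f"task:{task_id}:state", node_id, "done")
    return True

def all_done(r, task_id, nodes):
    """Scheduling-center check: have all nodes marked themselves done?"""
    state = r.hgetall(f"task:{task_id}:state")
    return all(state.get(n) == "done" for n in nodes)
```

Swapping `FakeRedis` for a real Redis client would keep the same flow, with the Hash doubling as the merge trigger for the scheduling center.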
Besides optimizing the multi-account simultaneous live broadcast detection algorithm from the chunk-comparison angle, the number of comparisons can also be optimized from the live-broadcast-room comparison angle. Live broadcast rooms carry other information besides the chunk information, which can be used to quickly judge whether two live broadcast rooms could be duplicate (main/alt) accounts, thereby saving computation. For example, using the commodity information of a live broadcast room, duplicate-account detection in the global comparison only needs to be performed between live broadcast rooms belonging to the same category. Flexible use of such live-broadcast-room information can significantly reduce the overall number of comparisons and improve accuracy.
According to the method provided by this embodiment, among nearly 200,000 live broadcast rooms each day, more than 17% duplicate live broadcast accounts are found through the duplicate-account checking service, and the business side can avoid repeatedly recommending them to users according to the duplicate live broadcasts found.
In an exemplary embodiment, for N live broadcast rooms that are on air simultaneously, pairwise comparisons are made between the live broadcast rooms based on the chunks generated for the same 10-minute period. Theoretically, about N × N/2 comparisons need to be completed within 10 minutes to meet real-time duplicate-account detection of the live broadcast rooms. Taking three live broadcasts A, B, C as an example, the specific detection steps of multi-account simultaneous live broadcast are as follows:
1) Real-time speech recognition is performed on each live broadcast's speech respectively to obtain the corresponding text information text_A, text_B, text_C, where the text information includes the sentences after sentence segmentation and the timestamp of each sentence's start time.
2) Then, the text segments that fall into the same time interval [t_start, t_end] are obtained: text_A_chunk, text_B_chunk, and text_C_chunk.
3) The three text segments text_A_chunk, text_B_chunk, and text_C_chunk are compared pairwise to calculate text similarity. When the text segment similarity is higher than a threshold, "multi-account simultaneous live broadcast" is judged. The text segment similarity calculation here includes two aspects:
a) One aspect is the similarity of the sentence timestamp sequences within the text segments: since the voices in the multiple live broadcast rooms of a "multi-account simultaneous live broadcast" are synchronized, further character-content similarity comparison is necessary only between live broadcast rooms whose sentence timestamp sequences are consistent;
b) The other aspect is the character-content similarity of the sentences at the same timestamp positions. Considering how differences in voice acquisition and data transmission among the multiple live broadcast terminals affect the speech recognition results, the recognition results of the two live broadcast rooms cannot be required to be completely identical; multi-account simultaneous live broadcast can be judged as long as the text-content similarity is higher than a certain threshold. There are many ways to calculate text-content similarity, including but not limited to calculating the word error rate between the two texts.
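One of the similarity measures named above — the word error rate between two recognized texts — can be sketched as follows; the 0.2 threshold is a placeholder, not a value from the source.

```python
# Word error rate: word-level Levenshtein distance divided by the
# reference length. A low WER between the two rooms' recognized texts
# indicates likely multi-account simultaneous live broadcast.

def word_error_rate(ref, hyp):
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                      # delete all remaining reference words
    for j in range(len(h) + 1):
        d[0][j] = j                      # insert all remaining hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(r)][len(h)] / max(len(r), 1)

def likely_same_broadcast(text_a, text_b, wer_threshold=0.2):
    return word_error_rate(text_a, text_b) < wer_threshold
```

Tolerating a nonzero WER is exactly what lets the detection absorb the small per-terminal recognition differences described above.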
The method performs multi-account simultaneous live broadcast detection based on the speech recognition results of the live broadcast rooms, can obtain relatively accurate results with relatively low resource consumption, and is meanwhile simple, reliable, and easy to maintain and update, while flexibly handling special cases in actual scenarios.
It should be understood that, although the steps in the flowcharts of the above embodiments are displayed sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; their execution order is likewise not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
It is understood that the same or similar parts of the method embodiments in this specification may be referred to with respect to each other; each embodiment focuses on its differences from the other embodiments, and for related parts reference may be made to the descriptions of the other method embodiments.
Based on the same inventive concept, the embodiment of the present disclosure further provides a multi-account simultaneous live broadcast detection apparatus for implementing the above-mentioned multi-account simultaneous live broadcast detection method.
Fig. 10 is a block diagram illustrating an architecture of a detection apparatus for simultaneous live multi-account according to an exemplary embodiment. Referring to fig. 10, the apparatus includes: a speech recognition unit 1001, a segment pair acquisition unit 1002, a sequence determination unit 1003, and a detection unit 1004, wherein,
a voice recognition unit 1001 configured to perform voice recognition processing on a plurality of voice segments of the first live broadcast room and a plurality of voice segments of the second live broadcast room, respectively, to obtain a plurality of voice text segments of the first live broadcast room and a plurality of voice text segments of the second live broadcast room;
a segment pair obtaining unit 1002 configured to obtain a plurality of speech text segment pairs based on the plurality of voice text segments of the first live broadcast room and the plurality of voice text segments of the second live broadcast room; each voice text segment pair includes one voice text segment of the first live broadcast room and one voice text segment of the second live broadcast room;
a sequence determining unit 1003 configured to acquire segment matching information of each voice text segment pair, and to obtain a voice text segment pair sequence and sequence matching information of the voice text segment pair sequence based on the segment matching information and the plurality of voice text segment pairs;
a detecting unit 1004 configured to perform, in a case that the sequence matching information meets a preset sequence matching condition, determining that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcasts for the same content.
In an exemplary embodiment, each speech text segment has a corresponding timestamp; a sequence determining unit 1003, further configured to determine a current speech text segment pair from the plurality of speech text segment pairs according to the time stamp; acquiring segment matching information of a current voice text segment pair, and determining a next voice text segment pair of the current voice text segment pair from a plurality of voice text segment pairs by adopting a corresponding segment pair determination mode based on a comparison result between the segment matching information of the current voice text segment pair and a preset segment matching condition; taking the next voice text segment pair as a new current voice text segment pair, and returning to the step of obtaining the segment matching information of the current voice text segment pair until all the voice text segments in the first live broadcast room and all the voice text segments in the second live broadcast room are traversed; and obtaining a voice text segment pair sequence based on each voice text segment pair of which the determined segment matching information meets the segment matching condition.
In an exemplary embodiment, the sequence determining unit 1003 further includes a segment pair determining subunit configured to determine current sequence matching information; if the comparison result is that the segment matching information of the current voice text segment pair satisfies the preset segment matching condition, update the current sequence matching information according to the segment matching information of the current voice text segment pair to obtain updated sequence matching information; and, when the updated sequence matching information does not satisfy the sequence matching condition, determine the next voice text segment pair from the plurality of voice text segment pairs by updating both voice text segments of the current pair.
In an exemplary embodiment, the segment pair determining subunit is further configured to compare timestamps of two speech text segments included in the current speech text segment pair if the comparison result indicates that the segment matching information of the current speech text segment pair does not satisfy the preset segment matching condition; and determining the next speech text segment pair of the current speech text segment pair from the plurality of speech text segment pairs by updating the speech text segment with the earlier time stamp.
In an exemplary embodiment, the apparatus further includes a correspondence matrix establishing module configured to perform, according to the timestamps, sequencing processing on the multiple voice text segments in the first live broadcast room and the multiple voice text segments in the second live broadcast room respectively to obtain a first voice text segment sequence in the first live broadcast room and a second voice text segment sequence in the second live broadcast room; establishing a two-dimensional corresponding matrix between the first voice text fragment sequence and the second voice text fragment sequence; each element in the two-dimensional corresponding matrix corresponds to a voice text segment pair;
a sequence determination unit 1003 further configured to perform acquiring a state transition relational expression; the state transition relational expression is used for determining the accumulated matching information corresponding to each element, the accumulated matching information of each element is determined based on the accumulated matching information corresponding to the associated element of each element and the segment matching information corresponding to each element, and the timestamp of the voice text segment pair corresponding to the associated element is earlier than the timestamp of the voice text segment pair corresponding to the element; and determining a plurality of target elements from each element of the two-dimensional corresponding matrix based on the segment matching information and the state transition relational expression, and taking a sequence formed by the voice text segment pairs corresponding to the target elements as a voice text segment pair sequence.
In an exemplary embodiment, the sequence determining unit 1003 is further configured to obtain, based on the segment matching information and the state transition relational expression, the cumulative matching information corresponding to each element in the two-dimensional correspondence matrix; determine a start element of the voice text segment pair sequence, and determine the target element corresponding to the start element from its associated elements based on the cumulative matching information of those associated elements and the segment matching information of the voice text segment pair corresponding to the start element; and take the target element corresponding to the start element as a new start element and return to the step of determining the target element from the associated elements, until the determined new start element is the element with the earliest timestamp in the two-dimensional correspondence matrix, the target elements thus obtained being the plurality of target elements.
In an exemplary embodiment, the sequence matching information includes average difference information of each pair of speech text segments in the sequence of pairs of speech text segments and the number of pairs of speech text segments matched in each pair of speech text segments;
the detecting unit 1004 is further configured to perform, in a case that the average difference information of each pair of voice text segments is smaller than a first threshold, and the number of the matched pairs of voice text segments is larger than a second threshold, determining that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast for the same content.
In an exemplary embodiment, the sequence determining unit 1003 further includes a segment matching subunit configured to perform, for each pair of speech text segments, obtaining segment features of multiple dimensions of two speech text segments in the pair of speech text segments respectively; determining matching information of the two voice text segments under each dimension based on the segment characteristics of the plurality of dimensions; and obtaining segment matching information of the voice text segment pair based on the matching information under each dimension.
In an exemplary embodiment, the apparatus further includes a dividing unit configured to acquire the number of the live broadcast rooms to be detected when there are a plurality of live broadcast rooms to be detected; determining the number of the calculation nodes based on the number of the live broadcasting rooms to be detected; and averagely dividing the live broadcast rooms to be detected to each computing node for simultaneous live broadcast detection.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 11 is a block diagram illustrating an electronic device 1100 for implementing a detection method for multi-account simultaneous live broadcast, according to an example embodiment. For example, the electronic device 1100 may be a server. Referring to fig. 11, electronic device 1100 includes a processing component 1120 that further includes one or more processors, and memory resources, represented by memory 1122, for storing instructions, such as application programs, that are executable by processing component 1120. The application programs stored in memory 1122 may include one or more modules that each correspond to a set of instructions. Further, the processing component 1120 is configured to execute instructions to perform the above-described methods.
The electronic device 1100 may further include: a power component 1124 configured to perform power management of the electronic device 1100, a wired or wireless network interface 1126 configured to connect the electronic device 1100 to a network, and an input/output (I/O) interface 1128. The electronic device 1100 may operate based on an operating system stored in the memory 1122, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as memory 1122 comprising instructions, executable by a processor of electronic device 1100 to perform the above-described method is also provided. The storage medium may be a computer-readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which includes instructions executable by a processor of the electronic device 1100 to perform the above-described method.
It should be noted that the descriptions of the above-mentioned apparatus, the electronic device, the computer-readable storage medium, the computer program product, and the like according to the method embodiments may also include other embodiments, and specific implementations may refer to the descriptions of the related method embodiments, which are not described in detail herein.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. A detection method for simultaneous live broadcast of multiple accounts is characterized by comprising the following steps:
respectively carrying out voice recognition processing on a plurality of voice segments of a first live broadcast room and a plurality of voice segments of a second live broadcast room to obtain a plurality of voice text segments of the first live broadcast room and a plurality of voice text segments of the second live broadcast room;
obtaining a plurality of voice text segment pairs based on the plurality of voice text segments of the first live broadcast room and the plurality of voice text segments of the second live broadcast room; each voice text segment pair comprises a voice text segment of the first live broadcast room and a voice text segment of the second live broadcast room;
acquiring segment matching information of each voice text segment pair, and acquiring a voice text segment pair sequence and sequence matching information of the voice text segment pair sequence based on the segment matching information and the plurality of voice text segment pairs;
and under the condition that the sequence matching information meets a preset sequence matching condition, determining that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast aiming at the same content.
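Outside the claim language, the overall flow of claim 1 can be sketched as follows. The similarity metric, threshold values, and function names are illustrative assumptions; the claim leaves the concrete form of the segment and sequence matching information open.

```python
from difflib import SequenceMatcher

def segment_match_info(text_a, text_b):
    # Segment matching information for one voice text segment pair;
    # a simple character-level similarity ratio is assumed here.
    return SequenceMatcher(None, text_a, text_b).ratio()

def detect_simultaneous_live(texts_a, texts_b,
                             pair_threshold=0.8, min_matched_pairs=3):
    # Pair every voice text segment of the first live broadcast room with
    # every segment of the second room, score each pair, and treat the rooms
    # as multi-account simultaneous live broadcast of the same content when
    # enough pairs satisfy the (simplified) matching condition.
    pairs = [(a, b) for a in texts_a for b in texts_b]
    matched = [p for p in pairs if segment_match_info(*p) >= pair_threshold]
    return len(matched) >= min_matched_pairs

same = ["hello everyone", "buy one get one", "only three left"]
print(detect_simultaneous_live(same, list(same)))      # rooms with same speech
print(detect_simultaneous_live(same, ["unrelated"]))   # rooms with different speech
```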
2. The method of claim 1, wherein each speech text segment has a corresponding timestamp; obtaining a sequence of speech text segment pairs based on the segment matching information and the plurality of speech text segment pairs, including:
determining a current voice text segment pair from the plurality of voice text segment pairs according to the time stamp;
acquiring segment matching information of the current voice text segment pair, and determining a next voice text segment pair of the current voice text segment pair from the plurality of voice text segment pairs by adopting a corresponding segment pair determining mode based on a comparison result between the segment matching information of the current voice text segment pair and a preset segment matching condition;
taking the next voice text segment pair as a new current voice text segment pair, and returning to the step of acquiring segment matching information of the current voice text segment pair until all voice text segments in the first live broadcast room and all voice text segments in the second live broadcast room are traversed;
and obtaining the voice text segment pair sequence based on each voice text segment pair of which the determined segment matching information meets the segment matching conditions.
3. The method according to claim 2, wherein the determining a next speech text segment pair of the current speech text segment pair from the plurality of speech text segment pairs in a corresponding segment pair determination manner based on a comparison result between the segment matching information of the current speech text segment pair and a preset segment matching condition comprises:
determining current sequence matching information;
if the comparison result is that the segment matching information of the current voice text segment pair meets the preset segment matching condition, updating the current sequence matching information according to the segment matching information of the current voice text segment pair to obtain updated sequence matching information;
and under the condition that the updated sequence matching information does not meet the sequence matching condition, determining the next voice text segment pair of the current voice text segment pair from the plurality of voice text segment pairs by updating both of the two voice text segments.
4. The method according to claim 2, wherein the determining a next speech text segment pair of the current speech text segment pair from the plurality of speech text segment pairs in a corresponding segment pair determination manner based on a comparison result between the segment matching information of the current speech text segment pair and a preset segment matching condition further comprises:
if the comparison result is that the segment matching information of the current voice text segment pair does not meet the preset segment matching condition, comparing the timestamps of the two voice text segments included in the current voice text segment pair;
and determining the next voice text segment pair of the current voice text segment pair from the plurality of voice text segment pairs by updating the voice text segment with the earlier time stamp.
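The traversal of claims 2 to 4 resembles a two-pointer walk over the two timestamp-ordered segment lists: when the current pair matches, both segments are updated; otherwise the segment with the earlier timestamp is updated. The sketch below is an illustrative reading of that procedure, with the similarity function and threshold supplied by the caller.

```python
def build_pair_sequence(segs_a, segs_b, match_fn, threshold):
    # segs_a / segs_b: timestamp-ordered lists of (timestamp, text).
    i = j = 0
    sequence = []
    while i < len(segs_a) and j < len(segs_b):
        (ta, xa), (tb, xb) = segs_a[i], segs_b[j]
        if match_fn(xa, xb) >= threshold:
            sequence.append((i, j))   # pair meets the segment matching condition
            i += 1                    # update both voice text segments
            j += 1
        elif ta <= tb:
            i += 1                    # update the earlier-timestamp segment
        else:
            j += 1
    return sequence

room_a = [(0, "welcome in"), (5, "chit chat"), (10, "flash sale now")]
room_b = [(1, "welcome in"), (11, "flash sale now")]
sim = lambda a, b: 1.0 if a == b else 0.0
print(build_pair_sequence(room_a, room_b, sim, 0.9))   # → [(0, 0), (2, 1)]
```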
5. The method of claim 1, wherein each speech text segment has a corresponding timestamp; before obtaining a speech text segment pair sequence based on the segment matching information and the plurality of speech text segment pairs, the method further includes:
sequencing the plurality of voice text segments of the first live broadcast room and the plurality of voice text segments of the second live broadcast room respectively according to the time stamps to obtain a first voice text segment sequence of the first live broadcast room and a second voice text segment sequence of the second live broadcast room;
establishing a two-dimensional corresponding matrix between the first voice text fragment sequence and the second voice text fragment sequence; each element in the two-dimensional corresponding matrix corresponds to a speech text segment pair;
obtaining a speech text segment pair sequence based on the segment matching information and the plurality of speech text segment pairs, further comprising:
acquiring a state transition relational expression; the state transition relational expression is used for determining accumulated matching information corresponding to each element, the accumulated matching information of each element is determined based on the accumulated matching information corresponding to the associated element of each element and the segment matching information corresponding to each element, and the timestamp of the voice text segment pair corresponding to the associated element is earlier than the timestamp of the voice text segment pair corresponding to the element;
determining a plurality of target elements from each element of the two-dimensional corresponding matrix based on the segment matching information and the state transition relational expression, and taking a sequence formed by voice text segment pairs corresponding to the target elements as the voice text segment pair sequence.
6. The method of claim 5, wherein determining a plurality of target elements from each element of the two-dimensional correspondence matrix based on the segment matching information and the state transition relationships comprises:
obtaining accumulated matching information corresponding to each element in the two-dimensional corresponding matrix based on the segment matching information and the state transition relational expression;
determining a starting element of the voice text segment pair sequence, and determining a target element corresponding to the starting element from the associated elements of the starting element based on the accumulated matching information of the associated elements of the starting element and the segment matching information of the voice text segment pair corresponding to the starting element;
and taking the target element corresponding to the initial element as a new initial element, returning to the step of determining the target element corresponding to the initial element from the associated elements of the initial element based on the cumulative matching information of the associated elements of the initial element and the segment matching information of the voice text segment pair corresponding to the initial element until the determined new initial element is the element with the earliest time stamp in the two-dimensional corresponding matrix, and determining each obtained target element as the plurality of target elements.
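Claims 5 and 6 read like a dynamic-programming alignment over the two-dimensional correspondence matrix: cumulative matching information is accumulated via the state transition relation, then the target elements are recovered by backtracking from a starting element. A minimal sketch under that reading follows; the naive O(n²m²) transition scan is kept for clarity and is not implied by the claims.

```python
def align_by_dp(score):
    # score[i][j]: segment matching information of the pair formed by the
    # i-th segment of the first room and the j-th segment of the second.
    n, m = len(score), len(score[0])
    acc = [[0.0] * m for _ in range(n)]   # cumulative matching information
    for i in range(n):
        for j in range(m):
            # Associated elements: pairs whose timestamps are both earlier.
            best_prev = max((acc[p][q] for p in range(i) for q in range(j)),
                            default=0.0)
            acc[i][j] = best_prev + score[i][j]   # state transition relation
    # Start from the element with the highest cumulative information and
    # backtrack through best-scoring associated elements.
    i, j = max(((a, b) for a in range(n) for b in range(m)),
               key=lambda ab: acc[ab[0]][ab[1]])
    path = [(i, j)]
    while i > 0 and j > 0:
        i, j = max(((p, q) for p in range(i) for q in range(j)),
                   key=lambda pq: acc[pq[0]][pq[1]])
        path.append((i, j))
    return path[::-1]   # target elements in timestamp order

print(align_by_dp([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]]))   # → [(0, 0), (1, 1), (2, 2)]
```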
7. The method according to claim 1, wherein the sequence matching information includes average difference information of the voice text segment pairs in the voice text segment pair sequence and the number of matched voice text segment pairs; the method further comprises:
and under the condition that the average difference information of all the voice text segment pairs is smaller than a first threshold value and the number of the matched voice text segment pairs is larger than a second threshold value, determining that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast aiming at the same content.
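As a small illustration of the two-threshold test in claim 7 (the threshold values are invented for the example and are not specified by the claim):

```python
def sequence_condition_met(avg_difference, matched_pair_count,
                           first_threshold=0.2, second_threshold=5):
    # The condition holds when the average difference information is below
    # the first threshold AND the number of matched voice text segment
    # pairs is above the second threshold.
    return avg_difference < first_threshold and matched_pair_count > second_threshold

print(sequence_condition_met(0.1, 10))  # → True
print(sequence_condition_met(0.1, 3))   # → False
```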
8. The method according to claim 1, wherein the obtaining segment matching information of each speech-text segment pair comprises:
respectively acquiring fragment characteristics of multiple dimensions of two voice text fragments in each voice text fragment pair;
determining matching information of the two voice text segments under each dimension based on the segment features of the plurality of dimensions;
and obtaining the segment matching information of the voice text segment pair based on the matching information under each dimension.
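Claim 8 scores each pair along several feature dimensions and fuses the per-dimension results into one value. The particular dimensions and the equal-weight fusion below are illustrative assumptions; the claim does not fix them.

```python
from difflib import SequenceMatcher

def multi_dim_match_info(text_a, text_b):
    # Per-dimension matching information for one voice text segment pair.
    dims = {
        "char": SequenceMatcher(None, text_a, text_b).ratio(),
        "word": SequenceMatcher(None, text_a.split(), text_b.split()).ratio(),
        "length": min(len(text_a), len(text_b)) / max(len(text_a), len(text_b), 1),
    }
    # Fuse the per-dimension matching information (equal weights assumed).
    return sum(dims.values()) / len(dims)

print(multi_dim_match_info("flash sale now", "flash sale now"))  # → 1.0
```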
9. The method of claim 1, further comprising:
acquiring the number of the live broadcast rooms to be detected under the condition that a plurality of live broadcast rooms to be detected exist;
determining the number of the calculation nodes based on the number of the live broadcast rooms to be detected;
and evenly dividing the live broadcast rooms to be detected among the computing nodes for simultaneous live broadcast detection.
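Claim 9's scaling step can be sketched as deriving the node count from the number of rooms awaiting detection and then splitting the rooms evenly across nodes; the per-node capacity parameter is an invented assumption for the example.

```python
import math

def assign_rooms_to_nodes(rooms, rooms_per_node=100):
    # Derive the number of computing nodes from the number of live rooms
    # to be detected, then divide the rooms evenly (round-robin) so each
    # node carries a near-equal share of the detection work.
    node_count = max(1, math.ceil(len(rooms) / rooms_per_node))
    buckets = [[] for _ in range(node_count)]
    for idx, room in enumerate(rooms):
        buckets[idx % node_count].append(room)
    return buckets

buckets = assign_rooms_to_nodes(list(range(250)))
print([len(b) for b in buckets])   # → [84, 83, 83]
```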
10. A detection apparatus for simultaneous live broadcast of multiple accounts, characterized by comprising:
the voice recognition unit is configured to perform voice recognition processing on a plurality of voice fragments of a first live broadcast room and a plurality of voice fragments of a second live broadcast room respectively to obtain a plurality of voice text fragments of the first live broadcast room and a plurality of voice text fragments of the second live broadcast room;
a segment pair obtaining unit configured to obtain a plurality of voice text segment pairs based on the plurality of voice text segments of the first live broadcast room and the plurality of voice text segments of the second live broadcast room; each voice text segment pair comprises a voice text segment of the first live broadcast room and a voice text segment of the second live broadcast room;
the sequence determining unit is configured to execute the steps of obtaining segment matching information of each voice text segment pair, and obtaining a voice text segment pair sequence and sequence matching information of the voice text segment pair sequence based on the segment matching information and the voice text segment pairs;
the detection unit is configured to execute the step of determining that the first live broadcast room and the second live broadcast room belong to multi-account simultaneous live broadcast aiming at the same content under the condition that the sequence matching information meets a preset sequence matching condition.
11. A server, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the detection method for simultaneous live broadcast of multiple accounts according to any one of claims 1 to 9.
12. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of a server, enable the server to perform a method of detecting multi-account simultaneous live broadcast according to any one of claims 1 to 9.
CN202211091768.5A 2022-09-07 2022-09-07 Detection method, device, server and storage medium for simultaneous live broadcast of multiple accounts Active CN115209188B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211091768.5A CN115209188B (en) 2022-09-07 2022-09-07 Detection method, device, server and storage medium for simultaneous live broadcast of multiple accounts

Publications (2)

Publication Number Publication Date
CN115209188A true CN115209188A (en) 2022-10-18
CN115209188B CN115209188B (en) 2023-01-20

Family

ID=83572706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211091768.5A Active CN115209188B (en) 2022-09-07 2022-09-07 Detection method, device, server and storage medium for simultaneous live broadcast of multiple accounts

Country Status (1)

Country Link
CN (1) CN115209188B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103176962A (en) * 2013-03-08 2013-06-26 深圳先进技术研究院 Statistical method and statistical system of text similarity
CN106202055A (en) * 2016-07-27 2016-12-07 湖南蚁坊软件有限公司 A kind of similarity determination method for long text
CN110347782A (en) * 2019-07-18 2019-10-18 知者信息技术服务成都有限公司 Article duplicate checking method, apparatus and electronic equipment
CN110365996A (en) * 2019-07-25 2019-10-22 深圳市元征科技股份有限公司 Management method, live streaming management platform, electronic equipment and storage medium is broadcast live
CN112530408A (en) * 2020-11-20 2021-03-19 北京有竹居网络技术有限公司 Method, apparatus, electronic device, and medium for recognizing speech
WO2021114836A1 (en) * 2020-06-28 2021-06-17 平安科技(深圳)有限公司 Text coherence determining method, apparatus, and device, and medium
CN112995696A (en) * 2021-04-20 2021-06-18 共道网络科技有限公司 Live broadcast room violation detection method and device
CN113255625A (en) * 2021-07-14 2021-08-13 腾讯科技(深圳)有限公司 Video detection method and device, electronic equipment and storage medium
CN113722681A (en) * 2021-08-31 2021-11-30 广州方硅信息技术有限公司 Multimedia file infringement detection method and device, electronic equipment and storage medium
CN114742029A (en) * 2022-04-20 2022-07-12 中国传媒大学 Chinese text comparison method, storage medium and device

Also Published As

Publication number Publication date
CN115209188B (en) 2023-01-20

Similar Documents

Publication Publication Date Title
US10332507B2 (en) Method and device for waking up via speech based on artificial intelligence
US9899030B2 (en) Systems and methods for recognizing sound and music signals in high noise and distortion
US11665288B2 (en) Methods and apparatus to identify media using hybrid hash keys
CN107533850B (en) Audio content identification method and device
CN110119477B (en) Information pushing method, device and storage medium
CN111754985A (en) Method and device for training voice recognition model and voice recognition
WO2021056737A1 (en) Data compression method and apparatus for high-frequency service data, device, and storage medium
WO2023273628A1 (en) Video loop recognition method and apparatus, computer device, and storage medium
CN113242361B (en) Video processing method and device and computer readable storage medium
CN111159546A (en) Event pushing method and device, computer readable storage medium and computer equipment
CN114627863A (en) Speech recognition method and device based on artificial intelligence
US20220027407A1 (en) Dynamic identification of unknown media
WO2023169258A1 (en) Audio detection method and apparatus, storage medium and electronic device
CN108711415B (en) Method, apparatus and storage medium for correcting time delay between accompaniment and dry sound
CN114647698A (en) Data synchronization method and device and computer storage medium
CN113570416B (en) Method and device for determining delivered content, electronic equipment and storage medium
CN115209188B (en) Detection method, device, server and storage medium for simultaneous live broadcast of multiple accounts
CN111464835B (en) Online video output detection method based on dynamic verification and server
WO2023169259A1 (en) Music popularity prediction method and apparatus, storage medium, and electronic device
CN114996509A (en) Method and device for training video feature extraction model and video recommendation
TW202331614A (en) Transaction processing method and device, equipment and storage medium
CN114072789B (en) Content modification system with geographic region based features
CN112712793A (en) ASR (error correction) method based on pre-training model under voice interaction and related equipment
CN115269914A (en) Content modification system with transport delay based features
WO2024020708A1 (en) Data processing method and apparatus for user profile, device, medium, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant