CN117750056A - Recording and playing detection method and device, electronic equipment and storage medium

Info

Publication number: CN117750056A
Application number: CN202311763241.7A
Authority: CN
Inventors: 张鹏
Original assignee: Baidu com Times Technology Beijing Co Ltd
Current assignee: Baidu com Times Technology Beijing Co Ltd
Priority date: 2023-12-20
Filing date: 2023-12-20
Publication date: 2024-03-22

Abstract

The disclosure provides a recording and broadcasting detection method and apparatus, an electronic device, and a storage medium, and relates to the technical field of artificial intelligence, in particular to natural language processing, speech technology, intelligent search, and live stream detection. The specific implementation is as follows: in response to a periodic task being triggered, a target live text included in a text library is sampled to obtain a plurality of target live text paragraphs, wherein the target live text is composed of a plurality of live text paragraphs related to the live content of a target live room in the current live session; based on a plurality of live texts included in the text library, recall matching is performed on each of the plurality of target live text paragraphs to obtain a plurality of matching results, wherein each live text is composed of a plurality of live text paragraphs related to the live content of a live room in a historical live session; and a recorded broadcast detection result of the target live room is obtained based on the plurality of matching results.

Description

Recording and playing detection method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence, and in particular, to the technical fields of natural language processing, speech technology, intelligent search, and live stream detection.

Background

With the rapid development of the internet and communication technology and the increase of the demand of people for real-time interaction, the live broadcast industry presents a situation of vigorous development in recent years, and the network live broadcast gradually becomes an important channel for the masses to acquire information, entertainment and social interaction.

Disclosure of Invention

The disclosure provides a recording and playing detection method, a recording and playing detection device, electronic equipment and a storage medium.

According to an aspect of the present disclosure, there is provided a recording and broadcasting detection method, including: in response to a periodic task being triggered, sampling a target live text included in a text library to obtain a plurality of target live text paragraphs, wherein the target live text is composed of a plurality of live text paragraphs related to the live content of a target live room in the current live session; performing recall matching on each of the plurality of target live text paragraphs based on a plurality of live texts included in the text library to obtain a plurality of matching results, wherein each live text is composed of a plurality of live text paragraphs related to the live content of a live room in a historical live session; and obtaining a recorded broadcast detection result of the target live room based on the plurality of matching results.

According to another aspect of the present disclosure, there is provided a recording and playing detection apparatus, including: a sampling module configured to, in response to a periodic task being triggered, sample a target live text included in a text library to obtain a plurality of target live text paragraphs, wherein the target live text is composed of a plurality of live text paragraphs related to the live content of a target live room in the current live session; a matching module configured to perform recall matching on each of the plurality of target live text paragraphs based on a plurality of live texts included in the text library to obtain a plurality of matching results, wherein each live text is composed of a plurality of live text paragraphs related to the live content of a live room in a historical live session; and a detection module configured to obtain a recorded broadcast detection result of the target live room based on the plurality of matching results.

According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.

According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method as described above.

According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described above.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

Fig. 1 schematically illustrates an exemplary system architecture to which the recording detection method and apparatus may be applied, according to embodiments of the present disclosure.

Fig. 2 schematically illustrates a flowchart of a recording detection method according to an embodiment of the present disclosure.

Fig. 3 schematically illustrates a schematic diagram of a live text generation flow according to an embodiment of the present disclosure.

Fig. 4 schematically illustrates a schematic diagram of a recording detection flow according to an embodiment of the disclosure.

Fig. 5 schematically illustrates a schematic diagram of a sampling process flow for target live text according to an embodiment of the present disclosure.

Fig. 6 schematically illustrates a block diagram of a recording and playback detecting apparatus according to an embodiment of the present disclosure.

Fig. 7 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Relying on a live broadcast platform, an anchor can push a video stream to a live broadcast source station of the platform to start a network live broadcast in the live broadcast room the anchor has applied for. Because the live content is actively pushed by the anchor, the platform cannot directly judge, from the video stream alone, whether the pushed stream is a live video stream or a recorded one. For a live broadcast platform, broadcasting a prerecorded video as live reduces the content quality of the platform and harms users' viewing and interaction experience.

To detect whether live broadcast content is a recorded broadcast, a commonly adopted scheme extracts key frames from the video and hashes each key frame to obtain its hash value. The hash value of each frame of historical live broadcast videos is recorded in a database, the hash values of the key frames are checked for collisions against the database, and a detection result indicating whether the live broadcast content is a recorded broadcast is obtained from the collision result. However, because a hash value acts as a fingerprint, small changes to the video, such as adding a mask, adjusting the hue, or adding a watermark, change the hash value and thereby bypass this detection method. By contrast, even if the audio is slightly adjusted, the text converted from it remains stable.

In view of this, an embodiment of the present disclosure provides a recording and broadcasting detection method, which obtains a live text by collecting the audio stream in a live stream and uses the live text to detect whether the live content of the current session is a recorded broadcast. Specifically, the recording and broadcasting detection method includes: in response to a periodic task being triggered, sampling a target live text included in a text library to obtain a plurality of target live text paragraphs, wherein the target live text is composed of a plurality of live text paragraphs related to the live content of a target live room in the current live session; performing recall matching on each of the plurality of target live text paragraphs based on a plurality of live texts included in the text library to obtain a plurality of matching results, wherein each live text is composed of a plurality of live text paragraphs related to the live content of a live room in a historical live session; and obtaining a recorded broadcast detection result of the target live room based on the plurality of matching results.

It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios. For example, in another embodiment, an exemplary system architecture to which the recording detection method and apparatus may be applied may include a terminal device, but the terminal device may implement the recording detection method and apparatus provided by the embodiments of the present disclosure without interaction with a server.

As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired and/or wireless communication links, etc., and the network 104 may be represented as a live source station.

The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as, for example only, live platform class applications, web browser applications, search class applications, instant messaging tools, mailbox clients and/or social platform software, and the like.

The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 105 may be a server that provides various services, such as a background server that provides background services for live platforms.

The recording detection method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the recording and playing detection apparatus provided in the embodiments of the present disclosure may be generally disposed in the server 105. The recording detection method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the recording and playing detection apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of users' personal information all comply with relevant laws and regulations, necessary security measures are adopted, and public order and good customs are not violated.

In the technical scheme of the disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or acquired.

As shown in fig. 2, the method includes operations S210 to S230.

In operation S210, in response to the periodic task being triggered, sampling the target live text included in the text library to obtain a plurality of target live text paragraphs.

In operation S220, recall matching is performed on the multiple target live text paragraphs based on the multiple live texts included in the text library, respectively, to obtain multiple matching results.

In operation S230, a recording and playing detection result of the target live broadcasting room is obtained based on the plurality of matching results.

According to an embodiment of the present disclosure, the periodic task may be a task that is executed periodically, and the task may be used to perform recorded-broadcast detection on the live content of live rooms on the live platform. That is, the periodic task may be triggered once every fixed duration, so that the electronic device executes it to perform recorded-broadcast detection on the live content of each live room. The trigger interval of the periodic task is not limited herein.

According to embodiments of the present disclosure, the text library may be used to record the live texts of all live sessions of all live rooms on a live platform over a period of time, and the live audio content of each live session of each live room may be recorded as one live text in the text library. That is, the text library may include a plurality of live texts, each related to the live content of one live room in one historical live session. The period of time may be determined by a developer or an operation and maintenance person of the live platform based on the application scenario, and may be set to, for example, 1 month or 1 week. In the text library, an identifier of each live text may be generated based on the identifier of the live room and the start time, end time, etc. of the live session, to facilitate retrieval of the live text from the text library.

According to embodiments of the present disclosure, the text library may be implemented using various databases, or the text library may be implemented based on various search engines, which are not limited herein.

According to embodiments of the present disclosure, each live text may be composed of a plurality of live text paragraphs related to the live content of one live room in one historical live session. Accordingly, the target live text may be composed of a plurality of live text paragraphs related to the live content of the target live room in the current live session. Each live text paragraph may be obtained by the live platform through text conversion of the audio stream received from a live room over a period of time. Text conversion of an audio stream may be accomplished, for example, by invoking an ASR (Automatic Speech Recognition) service, or by other speech-to-text methods, which is not limited herein.

According to an embodiment of the present disclosure, optionally, the plurality of target live text paragraphs may be sampled from the target live text in units of sentences. That is, each target live text paragraph may contain a fixed number of text sentences.

According to embodiments of the present disclosure, for each target live text paragraph, the matching result may indicate whether the text library contains a text sentence, or another live text paragraph, whose meaning is the same as or similar to that of the target live text paragraph. That is, each matching result may be expressed as either a match or a mismatch.

According to the embodiment of the disclosure, the recording and playing detection result of the target live broadcast room can be determined according to the number of the matching results which represent the matching in the plurality of matching results. The recorded broadcast detection result can be expressed as that the live broadcast content of the target live broadcast room is live broadcast or recorded broadcast.
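The decision step described above can be sketched as follows. This is a minimal illustration only: the function name `detect_recorded` and the ratio threshold `min_match_ratio` are assumptions, since the disclosure states only that the result depends on the number of matching results that indicate a match.

```python
from typing import Sequence


def detect_recorded(match_results: Sequence[bool], min_match_ratio: float = 0.5) -> bool:
    """Return True if the target live room is judged to be a recorded broadcast.

    match_results holds one boolean per sampled target live text paragraph
    (True = match, False = mismatch). min_match_ratio is a hypothetical
    threshold; the disclosure does not fix a specific value or rule.
    """
    if not match_results:
        # No sampled paragraphs: nothing suggests a recorded broadcast.
        return False
    matched = sum(1 for result in match_results if result)
    return matched / len(match_results) >= min_match_ratio
```

With four sampled paragraphs of which three match, the ratio is 0.75 and the room would be flagged under the assumed default threshold.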

According to the embodiment of the disclosure, in the running process of the target live broadcasting room, the record broadcasting detection can be periodically performed on the live broadcasting content of the target live broadcasting room according to the similarity between the target live broadcasting text of the target live broadcasting room and the recorded live broadcasting texts in the text library, so as to obtain a record broadcasting detection result indicating whether the live broadcasting content of the target live broadcasting room is the record broadcasting. By the text-based detection mode, the cost of recording and broadcasting detection can be effectively reduced, and the accuracy of recording and broadcasting detection can be improved. Based on the recorded broadcast detection result, the live broadcast platform can rapidly detect the live broadcast room with recorded broadcast conditions, so that the content quality of the platform can be indirectly improved, and the viewing experience of a user is improved.

The method shown in fig. 2 is further described below with reference to fig. 3-5 in conjunction with the exemplary embodiment.

According to embodiments of the present disclosure, the live texts in the text library may be generated by a real-time text generation task and recorded in the text library. While a live room is running, the real-time text generation task can obtain the live text related to that live room based on its live content.

As shown in fig. 3, the live platform may continuously run a real-time text generation task 301. When the live broadcast room 302 is opened and pushes the live broadcast stream to the live broadcast source station 303, the live broadcast platform runs the real-time text generation task 301, and can pull the audio stream in the live broadcast stream of the live broadcast room 302 from the live broadcast source station 303, call the ASR service 304, and perform text conversion on the audio stream to obtain a live broadcast text paragraph of the live broadcast room 302. The live platform may splice a live text paragraph of the live room 302 into live text associated with the live room 302.
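The flow of Fig. 3 might be sketched roughly as below. The class `LiveTextBuilder` and its methods are hypothetical names, and the ASR service is represented by an arbitrary callable passed in by the caller; no real ASR API is assumed.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class LiveTextBuilder:
    """Accumulates live text paragraphs per live room (illustrative sketch)."""

    def __init__(self, asr: Callable[[bytes], str]) -> None:
        # asr stands in for the ASR service: audio bytes in, text out.
        self._asr = asr
        self._paragraphs: DefaultDict[str, List[str]] = defaultdict(list)

    def on_audio_chunk(self, room_id: str, audio_chunk: bytes) -> str:
        """Convert one pulled audio chunk into a live text paragraph and store it."""
        paragraph = self._asr(audio_chunk)
        self._paragraphs[room_id].append(paragraph)
        return paragraph

    def paragraphs(self, room_id: str) -> List[str]:
        """Return a copy of the paragraphs recorded so far for a room."""
        return list(self._paragraphs.get(room_id, []))

    def live_text(self, room_id: str) -> str:
        """Splice the paragraphs of a room into one live text."""
        return "".join(self._paragraphs.get(room_id, []))
```

In a test, the ASR callable can be replaced by a stub such as `lambda audio: audio.decode("utf-8")`.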

According to the embodiment of the present disclosure, the live texts of each live room in the text library 305 at each historical live occasion may be generated by the above live text generation manner, and accordingly, the target live texts may also be generated by the above live text generation manner, which is not described herein.

According to embodiments of the present disclosure, alternatively, the real-time text generation task 301 may run each time a certain amount of audio stream data has been received, and that amount of data may be converted into one live text paragraph.

According to the embodiment of the disclosure, after each live broadcasting room starts a live broadcasting for a period of time, the live broadcasting platform can perform recording and broadcasting detection on the live broadcasting room based on the live broadcasting text of the live broadcasting room acquired during the period of time.

As shown in fig. 4, the live platform may be configured with a task trigger for each live room 401 that needs recording detection, and the task triggers may be triggered periodically. When a task trigger is triggered, the live platform may perform a recording identification task 402 for the corresponding live room 401.

The recording identification task 402 may include sub-tasks such as sampling point selection, similar text recall, text similarity calculation, recording determination, etc. After completing the recording identification task 402 by using the text library 403, the live broadcast platform may obtain a recording detection result 404 for the live broadcast room 401.

According to the embodiment of the disclosure, when recording and playing detection is performed on the target live room based on the target live text, similar text paragraphs or sentences may be recalled from the text library using all of the text paragraphs and text sentences included in the target live text, and the recalled similar text paragraphs or sentences are then used for recording and playing detection. Alternatively, the target live text may first be sampled, and the sampled text paragraphs or sentences are then used to recall similar text paragraphs or sentences and to perform recording and playing detection on the target live room.

As shown in fig. 5, the live duration of the target live room in the current live session may be T, and the target live text 501 may be obtained by performing text conversion on an audio stream of duration T.

According to the embodiment of the disclosure, a plurality of sampling points can be determined in the target live text 501 based on the live duration of the target live room in the current live session and a preset number of sampling points, and the target live text 501 is sampled based on the plurality of sampling points to obtain a plurality of target live text paragraphs 502.

According to the embodiment of the present disclosure, the preset number of sampling points may be set by a developer according to a specific application scenario, which is not limited herein.

According to the embodiment of the disclosure, the sampling positions of each of the plurality of sampling points can be randomly determined from the target live text in a random sampling manner. Alternatively, the sampling positions of the sampling points can be determined from the target live text by means of average sampling.

According to the embodiment of the disclosure, specifically, after determining the sampling manner, the positions of the plurality of sampling points on the time axis may be determined by taking time as an axis, so as to determine timestamp information respectively represented by the plurality of sampling points.

For example, the number of preset sampling points is X, and live content with a duration of T is equally divided into live content of each of X time periods, so that the start time and the end time of the live content of each time period can be determined. The time stamp of the start time of each of the X time periods may be used as the time stamp of each of the plurality of sampling points.
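The average-sampling example above can be written out as a short sketch. The function name and the use of seconds as the time unit are assumptions made for illustration.

```python
from typing import List


def average_sampling_timestamps(duration_s: float, num_points: int) -> List[float]:
    """Split a live duration T into X equal periods and return each period's start time.

    duration_s corresponds to T and num_points to X in the text; both names
    are illustrative.
    """
    if num_points <= 0:
        raise ValueError("num_points must be positive")
    period = duration_s / num_points
    # The start time of the i-th period serves as the i-th sampling timestamp.
    return [i * period for i in range(num_points)]
```

For a 600-second session and 4 sampling points, the sampling timestamps are 0, 150, 300, and 450 seconds.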

According to the embodiment of the disclosure, similarly, the target live text may be bound to a time axis, that is, each character in the target live text may correspond to a moment on the time axis, so that, based on the time stamp information respectively represented by the plurality of sampling points, the sampling positions of the plurality of sampling points in the target live text may be determined.
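One way to bind text to the time axis is to keep a timestamp per character and look up the sampling position with a binary search. This is a sketch under the assumption that per-character timestamps are available in ascending order; `sampling_position` is a hypothetical helper.

```python
from bisect import bisect_left
from typing import Sequence


def sampling_position(char_times: Sequence[float], timestamp: float) -> int:
    """Return the index of the first character whose time is >= timestamp.

    char_times[i] is the (assumed) time at which character i of the target
    live text was spoken, sorted in ascending order. If the timestamp lies
    past the last character, the last index is returned.
    """
    if not char_times:
        raise ValueError("char_times must not be empty")
    index = bisect_left(char_times, timestamp)
    return min(index, len(char_times) - 1)
```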

According to an embodiment of the present disclosure, after the sampling positions of the plurality of sampling points in the target live text are determined, live text paragraphs may be extracted based on those sampling positions. Optionally, for each sampling point, the text paragraph containing the sampling position may be used as the target live text paragraph sampled at that point. Alternatively, for each sampling point, a complete text paragraph that is located before or after the sampling point and does not include it may be used as the target live text paragraph sampled at that point. Each target live text paragraph may contain a fixed number of text sentences, and the specific number may be determined according to the application scenario, which is not limited herein. Text sentences may be divided using preset identifiers. For example, when the text sentences are Chinese, the preset identifiers may be sentence-ending punctuation marks such as "。", "？", and "！", which is not limited herein.

For example, as shown in fig. 5, taking the sampling position of each of the X sampling points in the target live text as a starting point, a plurality of text sentences related to each of the X sampling points, namely text sentences that are located after the sampling point and do not contain it, may be extracted from the target live text based on the preset identifiers "。", "？", and "！". X target live text paragraphs are then obtained from the text sentences related to the X sampling points.
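A minimal sketch of extracting a fixed number of complete sentences located after a sampling position follows. The function name, the default delimiter set, and the behaviour when fewer sentences remain are assumptions.

```python
def extract_paragraph(text: str, position: int, num_sentences: int,
                      delimiters: str = "。？！") -> str:
    """Return num_sentences complete sentences that follow the sampling position.

    The sentence containing the sampling position is skipped, so the result
    does not contain the sampling point. If fewer sentences remain, whatever
    text is left after the skipped sentence is returned.
    """
    # Advance to the end of the sentence that contains the sampling position.
    start = position
    while start < len(text) and text[start] not in delimiters:
        start += 1
    start += 1  # first character of the next complete sentence

    # Collect characters until num_sentences sentence delimiters have been seen.
    end = start
    found = 0
    while end < len(text) and found < num_sentences:
        if text[end] in delimiters:
            found += 1
        end += 1
    return text[start:end]
```

For example, with the text "大家好。今天讲三点！第一点是什么？第二点很重要。谢谢。", a sampling position of 5 (inside the second sentence), and two sentences per paragraph, the extracted paragraph is "第一点是什么？第二点很重要。".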

According to the embodiment of the disclosure, after sampling of a plurality of sampling points is completed and a plurality of target live text paragraphs are obtained, a plurality of live texts included in a text library can be used as a data set, and text matching can be performed on each target live text paragraph. The recall matching process of a single target live text paragraph is described below by taking the processing process of the target live text paragraph as an example.

According to the embodiment of the disclosure, the matching result can be obtained by matching the target live text paragraph against each text sentence of each of the plurality of live texts. For example, based on any of various text distance algorithms, the text distance between the target live text paragraph and each text sentence of each live text can be calculated, and the matching result can be obtained by comparing these text distances with a set distance threshold. That is, when all of the text distances are larger than the set distance threshold, a matching result indicating a mismatch is obtained; when at least one of the text distances is smaller than the set distance threshold, a matching result indicating a match is obtained.
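The threshold rule can be illustrated with any distance function. In the sketch below, `padded_hamming` is a simple stand-in distance chosen only for illustration (it is not named in the disclosure), and `match_paragraph` is a hypothetical helper.

```python
from typing import Callable, Iterable


def padded_hamming(a: str, b: str) -> int:
    """Count differing positions, plus the length difference (illustrative distance)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def match_paragraph(target: str, sentences: Iterable[str],
                    distance: Callable[[str, str], float],
                    threshold: float) -> bool:
    """Return True (match) if at least one distance is below the threshold."""
    return any(distance(target, sentence) < threshold for sentence in sentences)
```

Any other text distance with the same call shape could be passed in place of `padded_hamming`.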

According to the embodiment of the disclosure, since a target live text paragraph may include a plurality of text sentences, matching a whole live text paragraph against single text sentences can produce a large error in the similarity result, because the two differ greatly in the amount of information they carry. As an alternative embodiment, a single text sentence in the target live text paragraph may be used to recall similar text paragraphs from the text library.

For example, each target live text paragraph may include a target text sentence, and recall matching is performed on the multiple target live text paragraphs based on multiple live texts included in the text library, so as to obtain multiple matching results, which may include the following operations:

recall, from the plurality of live text, a plurality of similar text paragraphs associated with each target live text paragraph based on the target text sentence included in each target live text paragraph; and respectively carrying out text matching on each target live text paragraph and a plurality of similar text paragraphs related to each target live text paragraph to obtain a matching result of each target live text paragraph.

According to the embodiment of the disclosure, when a plurality of similar text paragraphs are recalled from a plurality of live texts by using a target text sentence, one text sentence in the target live text paragraph can be used for recalling the similar text in a text library to determine a plurality of similar text sentences, and then the similar text sentence is complemented into the similar text paragraph by using the text sentences adjacent to the similar text sentence.

According to an embodiment of the present disclosure, recall, from a plurality of live text, a plurality of similar text paragraphs associated with each target live text paragraph based on target text sentences included in each target live text paragraph may include the operations of:

performing keyword matching between the target text sentence included in each target live text paragraph and the plurality of text sentences included in each of the plurality of live texts, to obtain a matching attribute value for each of the text sentences included in each live text; determining a plurality of similar text sentences related to each target live text paragraph from the text sentences included in the plurality of live texts, based on the matching attribute values of those text sentences; and recalling a plurality of similar text paragraphs related to each target live text paragraph from the plurality of live texts, based on the respective positions, in the plurality of live texts, of the plurality of similar text sentences related to that target live text paragraph.

According to an embodiment of the present disclosure, each of a plurality of text sentences included in the text library may contain one or more keywords, and similarly, keyword extraction may be performed on a target text sentence to obtain one or more target keywords. The one or more target keywords may be matched with one or more keywords included in each of the plurality of text sentences, respectively. Based on the number of keywords that the one or more target keywords match with the one or more keywords included in each text sentence, a match attribute value for each text sentence may be determined. Based on the size of the matching attribute value of each of the plurality of text sentences, a plurality of text sentences with larger matching attribute values can be selected from the plurality of text sentences to serve as the plurality of similar text sentences.
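The keyword-based recall can be sketched as follows, taking the matching attribute value to be the number of shared keywords. The function names, the `top_k` cutoff, and the choice to drop sentences with no shared keyword are assumptions; keyword extraction itself is left outside the sketch.

```python
from typing import List, Set, Tuple


def match_attribute(target_keywords: Set[str], sentence_keywords: Set[str]) -> int:
    """Matching attribute value: number of keywords shared with the target sentence."""
    return len(target_keywords & sentence_keywords)


def top_similar_sentences(target_keywords: Set[str],
                          library: List[Tuple[str, Set[str]]],
                          top_k: int) -> List[str]:
    """Return the ids of the top_k sentences with the largest matching attribute values.

    library is a list of (sentence_id, keywords) pairs. Sentences sharing no
    keyword with the target are dropped (an assumption, not stated in the text).
    """
    scored = [(match_attribute(target_keywords, keywords), sentence_id)
              for sentence_id, keywords in library]
    scored = [pair for pair in scored if pair[0] > 0]
    # Python's sort is stable, so ties keep their library order.
    scored.sort(key=lambda pair: -pair[0])
    return [sentence_id for _, sentence_id in scored[:top_k]]
```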

According to an embodiment of the present disclosure, the first text sentence of the plurality of text sentences included in a target live text paragraph may be determined as the target text sentence. Accordingly, a similar text sentence determined from a live text can be used as the first text sentence of the similar text paragraph to be recalled, and the similar text paragraph can be extracted from the live text, starting from the similar text sentence, based on the number of text sentences contained in the target live text paragraph.

According to an embodiment of the present disclosure, optionally, a text sentence other than the first text sentence among the plurality of text sentences included in the target live text paragraph may be determined as the target text sentence. In that case, the similar text paragraph may likewise be extracted from the live text around the similar text sentence, based on the position of the target text sentence within the target live text paragraph, which is not described in detail herein.
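The position-based extraction described above can be sketched as follows. This is an illustrative sketch only; the function name `extract_similar_paragraph` and its parameters are hypothetical, and a live text is assumed to be already split into an ordered list of sentences.

```python
def extract_similar_paragraph(live_sentences: list[str],
                              similar_index: int,
                              target_offset: int,
                              target_length: int) -> list[str]:
    """Cut a window of sentences out of a historical live text.

    live_sentences: sentences of one historical live text, in order.
    similar_index:  position of the similar sentence in live_sentences.
    target_offset:  position of the target sentence inside the target
                    paragraph (0 means it is the first sentence).
    target_length:  number of sentences in the target paragraph.
    """
    # Align the window so the similar sentence sits at the same offset
    # as the target sentence does in the target paragraph.
    start = max(0, similar_index - target_offset)
    end = min(len(live_sentences), start + target_length)
    return live_sentences[start:end]
```

With `target_offset=0` the similar sentence becomes the first sentence of the recalled paragraph; a non-zero offset covers the optional case where another sentence of the target paragraph is used as the target text sentence.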

According to an embodiment of the present disclosure, text matching is performed on each target live text paragraph and a plurality of similar text paragraphs related to each target live text paragraph, so as to obtain a matching result of each target live text paragraph, which may include the following operations:

Calculating the text distance between each target live text paragraph and a plurality of similar text paragraphs related to each target live text paragraph respectively to obtain the similarity between each target live text paragraph and the plurality of similar text paragraphs related to each target live text paragraph; and obtaining a matching result of each target live text paragraph based on the similarity between each target live text paragraph and each of the plurality of similar text paragraphs related to each target live text paragraph.

According to embodiments of the present disclosure, the text distance may include Euclidean distance, pre-distance, Hamming distance, edit distance, etc., which is not limited herein. The text distance between two text paragraphs may be calculated using the calculation method or toolkit corresponding to that text distance, which will not be described in detail herein.

According to an embodiment of the present disclosure, the text distance between two text paragraphs may be converted into the similarity between the two text paragraphs. For example, if the text distance between the two text paragraphs is D, the text distance may be normalized, and the normalized value may be used to represent the similarity between the two text paragraphs. Alternatively, the text distance may be converted into a similarity using a rule corresponding to the text distance calculation method used. Taking the edit distance as an example, for a target live text paragraph A and a similar text paragraph B, let a be the word count of the longer of the two paragraphs and b be the edit distance between paragraph A and paragraph B; the similarity between paragraph A and paragraph B is then (a-b)/a.
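The edit-distance rule (a-b)/a can be worked through in a short Python sketch. The function names are hypothetical, and the sketch assumes the length a is counted in characters, so that it is in the same unit as a character-level Levenshtein distance; the disclosure does not fix this detail.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(paragraph_a: str, paragraph_b: str) -> float:
    """Similarity (a - b) / a, where a is the length of the longer
    paragraph and b is the edit distance between the two paragraphs."""
    longer = max(len(paragraph_a), len(paragraph_b))
    if longer == 0:
        return 1.0  # two empty paragraphs are treated as identical
    return (longer - edit_distance(paragraph_a, paragraph_b)) / longer
```

Because the edit distance never exceeds the length of the longer string, the result always lies between 0 and 1, with 1 meaning the two paragraphs are identical.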

According to an embodiment of the present disclosure, for each target live text paragraph, the matching result may be determined to indicate a matching failure in a case where the similarity between the target live text paragraph and each of the plurality of similar text paragraphs related to it is smaller than a first preset threshold. The matching result may be determined to indicate a successful match in a case where the similarity between the target live text paragraph and each of the plurality of similar text paragraphs related to it is greater than the first preset threshold.
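One possible reading of this threshold rule is sketched below. The sketch makes two assumptions that the text does not settle: a target paragraph is treated as matched as soon as at least one similar paragraph exceeds the first preset threshold, and a similarity exactly equal to the threshold counts as a failure. The function name `match_result` is hypothetical.

```python
def match_result(similarities: list[float], first_threshold: float) -> bool:
    """Return True (successful match) if any similarity exceeds the threshold.

    Assumption: one sufficiently similar historical paragraph is enough.
    If every similarity is at or below the threshold, or no similar
    paragraph was recalled at all, the result is a matching failure.
    """
    return any(similarity > first_threshold for similarity in similarities)
```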

According to an embodiment of the present disclosure, the matching result of each target live text paragraph represents the matching result of the corresponding sampling point. The recording and broadcasting detection result of the target live broadcasting room can be obtained according to the number of sampling points whose matching results indicate a successful match.

For example, a recording and broadcasting detection attribute value of the target live broadcasting room can be determined based on the number of matching results, among the plurality of matching results, that indicate a successful match; and the recording and broadcasting detection result of the target live broadcasting room can be obtained based on that attribute value.

According to an embodiment of the present disclosure, the calculated recording and broadcasting detection attribute value may be a value between 0 and 1. A developer can configure a plurality of different preset thresholds and make a decision on the recording and broadcasting detection result by comparing the recording and broadcasting detection attribute value with those preset thresholds.

For example, in a case where the recording and broadcasting detection attribute value of the target live broadcasting room is less than or equal to a second preset threshold, it may be determined that the recording and broadcasting detection result indicates that the target live broadcasting room passes the recording and broadcasting detection; and in a case where the recording and broadcasting detection attribute value is greater than the second preset threshold, it may be determined that the recording and broadcasting detection result indicates that the target live broadcasting room does not pass the recording and broadcasting detection.

According to an embodiment of the disclosure, the magnitude of the second preset threshold may be set by a developer based on a specific application scenario, which is not limited herein.

According to an embodiment of the present disclosure, optionally, a third preset threshold may further be set. The magnitude of the third preset threshold may be set by a developer based on a specific application scenario, provided that the third preset threshold is greater than the second preset threshold, which is not limited herein. In a case where the recording and broadcasting detection attribute value of the target live broadcasting room is greater than the second preset threshold and less than or equal to the third preset threshold, identification information related to the target live broadcasting room may be sent to the auditing background; and in a case where the recording and broadcasting detection attribute value of the target live broadcasting room is greater than the third preset threshold, the target live broadcasting room may be processed based on a recording and broadcasting processing strategy.
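The tiered decision on the detection attribute value can be summarized in a small Python sketch. The names `Decision`, `detection_attribute_value`, and `decide` are hypothetical. The sketch also assumes that the attribute value is the fraction of sampling points whose matching result is a success, which is one simple way to obtain a value between 0 and 1; the disclosure only states that the value is based on the number of successful matches.

```python
from enum import Enum


class Decision(Enum):
    PASS = "pass"                    # not treated as a recorded broadcast
    MANUAL_REVIEW = "manual_review"  # send room identification to the auditing background
    APPLY_POLICY = "apply_policy"    # handle the room under the processing strategy


def detection_attribute_value(match_results: list[bool]) -> float:
    """Assumed definition: share of sampling points that matched successfully."""
    if not match_results:
        return 0.0
    return sum(1 for matched in match_results if matched) / len(match_results)


def decide(value: float, second_threshold: float, third_threshold: float) -> Decision:
    """Map the attribute value onto the three ranges described in the text."""
    if not second_threshold < third_threshold:
        raise ValueError("third threshold must be greater than second threshold")
    if value <= second_threshold:
        return Decision.PASS
    if value <= third_threshold:
        return Decision.MANUAL_REVIEW
    return Decision.APPLY_POLICY
```

For instance, with thresholds of 0.3 and 0.8, a room in which three of four sampled paragraphs matched historical content (value 0.75) would be routed to manual review rather than handled automatically.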

According to embodiments of the present disclosure, the auditing background may be a manual processing background maintained by the operation and maintenance personnel of the live platform. That is, if the recording and broadcasting detection attribute value of the target live broadcasting room is greater than the second preset threshold and less than or equal to the third preset threshold, the operation and maintenance personnel can be notified and can manually judge whether the target live broadcasting room is a recorded broadcast.

According to embodiments of the present disclosure, the recording and broadcasting processing strategy may include a series of violation-handling actions that the registered user agreed to by checking them when registering the live broadcasting room, including but not limited to shutting down the live broadcasting room, restricting the host's behavior, etc. The recording and broadcasting processing strategy may be a processing strategy for which the authorization or consent of the user has been obtained, which complies with relevant laws and regulations, and which does not violate public order and good customs.

According to the embodiment of the disclosure, by using the recording and broadcasting detection method, recording and broadcasting can be detected at lower cost and faster speed, and the method is non-invasive to a live platform system and has lower implementation and updating cost.

According to an embodiment of the present disclosure, optionally, the recording and broadcasting detection method described above may also be used for originality detection of various audio and video content. In that case, the live texts in the text library may be replaced by texts converted from various audio and video content, and the target live text may be replaced by the text obtained by converting the audio or video to be detected, which is not described in detail herein.

As shown in FIG. 6, the recording and playing detection apparatus 600 may include a sampling module 610, a matching module 620, and a detection module 630.

The sampling module 610 is configured to sample, in response to the triggering of a periodic task, a target live text included in the text library to obtain a plurality of target live text paragraphs, where the target live text is composed of a plurality of live text paragraphs related to the live content of the target live broadcasting room in the current live session.

The matching module 620 is configured to perform recall matching on the plurality of target live text paragraphs respectively, based on a plurality of live texts included in the text library, to obtain a plurality of matching results, where each live text is composed of a plurality of live text paragraphs related to the live content of a live broadcasting room in a historical live session.

The detection module 630 is configured to obtain a recording and playing detection result of the target live broadcast room based on the multiple matching results.

According to an embodiment of the present disclosure, the sampling module 610 includes a first sampling sub-module and a second sampling sub-module.

The first sampling sub-module is used for determining a plurality of sampling points in the target live text based on the live duration of the target live broadcasting room in the current live session and a preset number of sampling points.

The second sampling sub-module is used for sampling the target live text based on the plurality of sampling points to obtain the plurality of target live text paragraphs.

According to an embodiment of the present disclosure, the second sampling sub-module includes a first sampling unit, a second sampling unit, and a third sampling unit.

The first sampling unit is used for determining the sampling position of each of the plurality of sampling points in the target live text based on the timestamp information respectively represented by the plurality of sampling points.

The second sampling unit is used for extracting, from the target live text, a plurality of text sentences related to each of the plurality of sampling points based on a preset identifier, taking the sampling position of each sampling point in the target live text as a starting point.

The third sampling unit is used for obtaining the plurality of target live text paragraphs based on the plurality of text sentences respectively related to the plurality of sampling points.
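The work of the sampling sub-modules and units described above can be sketched in Python under several stated assumptions that are not taken from the disclosure: sampling points are spaced evenly over the live duration, the target live text is held as a time-ordered list of `(start_time_seconds, text)` segments, the preset identifier is sentence-ending punctuation, and each paragraph keeps a fixed number of sentences. All function and parameter names are hypothetical.

```python
import re


def sampling_timestamps(duration_seconds: float, num_points: int) -> list[float]:
    """Evenly spaced sampling timestamps over the live duration (assumption)."""
    step = duration_seconds / num_points
    return [i * step for i in range(num_points)]


def sample_paragraph(segments: list[tuple[float, str]],
                     timestamp: float,
                     sentences_per_paragraph: int = 3,
                     identifier: str = r"[.!?]") -> list[str]:
    """Extract the sentences that follow a sampling point.

    segments: time-ordered (start_time_seconds, text) pieces of the live text.
    The sampling position is the first segment starting at or after the
    timestamp; sentences are then split on the preset identifier.
    """
    start = next((i for i, (start_time, _) in enumerate(segments)
                  if start_time >= timestamp), None)
    if start is None:
        return []  # the sampling point lies after the last segment
    text = " ".join(piece for _, piece in segments[start:])
    sentences = [s.strip() for s in re.split(identifier, text) if s.strip()]
    return sentences[:sentences_per_paragraph]
```

Calling `sample_paragraph` once per timestamp returned by `sampling_timestamps` yields one target live text paragraph per sampling point.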

According to an embodiment of the present disclosure, each target live text paragraph includes a target text sentence.

According to an embodiment of the present disclosure, the matching module 620 includes a first matching sub-module and a second matching sub-module.

The first matching sub-module is used for recalling a plurality of similar text paragraphs related to each target live text paragraph from the plurality of live texts based on the target text sentence included in each target live text paragraph.

The second matching sub-module is used for performing text matching between each target live text paragraph and the plurality of similar text paragraphs related to it, to obtain a matching result of each target live text paragraph.

According to an embodiment of the present disclosure, the first matching sub-module includes a first matching unit, a second matching unit, and a third matching unit.

The first matching unit is used for matching the target text sentence included in each target live text paragraph with the plurality of text sentences included in each of the plurality of live texts, to obtain matching attribute values of the plurality of text sentences included in each of the plurality of live texts.

The second matching unit is used for determining a plurality of similar text sentences related to each target live text paragraph from the plurality of text sentences included in each of the plurality of live texts, based on the matching attribute values of those text sentences.

The third matching unit is used for recalling a plurality of similar text paragraphs related to each target live text paragraph from the plurality of live texts, based on the respective positions, in the plurality of live texts, of the plurality of similar text sentences related to each target live text paragraph.

According to an embodiment of the present disclosure, the second matching sub-module comprises a fourth matching unit and a fifth matching unit.

The fourth matching unit is used for calculating the text distance between each target live text paragraph and each of the plurality of similar text paragraphs related to it, to obtain the similarity between each target live text paragraph and each of the plurality of similar text paragraphs related to it.

The fifth matching unit is used for obtaining a matching result of each target live text paragraph based on the similarity between each target live text paragraph and each of the plurality of similar text paragraphs related to it.

According to an embodiment of the present disclosure, the fifth matching unit includes a first matching subunit and a second matching subunit.

The first matching subunit is used for determining that the matching result of each target live text paragraph indicates a matching failure in a case where the similarity between the target live text paragraph and each of the plurality of similar text paragraphs related to it is smaller than a first preset threshold.

The second matching subunit is used for determining that the matching result of each target live text paragraph indicates a successful match in a case where the similarity between the target live text paragraph and each of the plurality of similar text paragraphs related to it is greater than the first preset threshold.

According to an embodiment of the present disclosure, the recording and playing detection apparatus 600 further includes a determining module.

The determining module is used for determining the first text sentence of the plurality of text sentences included in the target live text paragraph as the target text sentence.

According to an embodiment of the present disclosure, the detection module 630 includes a first detection sub-module and a second detection sub-module.

The first detection sub-module is used for determining the recording and broadcasting detection attribute value of the target live broadcasting room based on the number of matching results, among the plurality of matching results, that indicate a successful match.

The second detection sub-module is used for obtaining the recording and broadcasting detection result of the target live broadcasting room based on the recording and broadcasting detection attribute value of the target live broadcasting room.

According to an embodiment of the present disclosure, the second detection sub-module comprises a first detection unit and a second detection unit.

The first detection unit is used for determining that the recording and broadcasting detection result indicates that the target live broadcasting room passes the recording and broadcasting detection in a case where the recording and broadcasting detection attribute value of the target live broadcasting room is less than or equal to a second preset threshold.

The second detection unit is used for determining that the recording and broadcasting detection result indicates that the target live broadcasting room does not pass the recording and broadcasting detection in a case where the recording and broadcasting detection attribute value of the target live broadcasting room is greater than the second preset threshold.

According to an embodiment of the present disclosure, the recording and playing detection apparatus 600 further includes a third detection unit and a fourth detection unit.

The third detection unit is used for sending identification information related to the target live broadcasting room to the auditing background in a case where the recording and broadcasting detection attribute value of the target live broadcasting room is greater than the second preset threshold and less than or equal to the third preset threshold.

The fourth detection unit is used for processing the target live broadcasting room based on the recording and broadcasting processing strategy in a case where the recording and broadcasting detection attribute value of the target live broadcasting room is greater than the third preset threshold.

According to an embodiment of the present disclosure, the recording and playing detection apparatus 600 further includes a processing module.

The processing module is used for obtaining the live text related to a live broadcasting room based on the live content of the live broadcasting room in a case where the live broadcasting room is in an operating state.

According to an embodiment of the present disclosure, the processing module includes a first processing unit, a second processing unit, and a third processing unit.

The first processing unit is used for pulling the audio stream of the live broadcasting room from the live source station.

The second processing unit is used for performing text conversion on the audio stream to obtain live text paragraphs of the live broadcasting room.

The third processing unit is used for splicing the live text paragraphs of the live broadcasting room into the live text related to the live broadcasting room.

According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.

According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described above.

According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method as described above.

According to an embodiment of the present disclosure, a computer program product includes a computer program which, when executed by a processor, implements the method as described above.

FIG. 7 illustrates a schematic block diagram of an example electronic device that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in FIG. 7, the device 700 includes a computing unit 701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 702 or a computer program loaded from a storage unit 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data required for the operation of the device 700 may also be stored. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Various components in device 700 are connected to an input/output (I/O) interface 705, including: an input unit 706 such as a keyboard, a mouse, etc.; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, an optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit 701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 701 performs the respective methods and processes described above, such as the recording and broadcasting detection method. For example, in some embodiments, the recording and broadcasting detection method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 700 via the ROM 702 and/or the communication unit 709. When the computer program is loaded into the RAM 703 and executed by the computing unit 701, one or more steps of the recording and broadcasting detection method described above may be performed. Alternatively, in other embodiments, the computing unit 701 may be configured to perform the recording and broadcasting detection method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, which is not limited herein.

The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims

1. A recording and broadcasting detection method comprises the following steps:

in response to the triggering of a periodic task, sampling a target live text included in a text library to obtain a plurality of target live text paragraphs, wherein the target live text is composed of a plurality of live text paragraphs related to live content of a target live room in a current live session;

performing recall matching on the plurality of target live text paragraphs respectively, based on a plurality of live texts included in the text library, to obtain a plurality of matching results, wherein each live text is composed of a plurality of live text paragraphs related to live content of a live room in a historical live session; and

obtaining a recorded broadcast detection result of the target live room based on the plurality of matching results.

2. The method of claim 1, wherein the sampling the target live text included in the text library to obtain a plurality of target live text paragraphs includes:

determining a plurality of sampling points in the target live text based on the live duration of the target live room in the current live session and a preset number of sampling points; and

and sampling the target live text based on the plurality of sampling points to obtain a plurality of target live text paragraphs.

3. The method of claim 2, wherein the sampling the target live text based on the plurality of sampling points to obtain the plurality of target live text paragraphs, comprises:

determining sampling positions of the sampling points in the target live text based on the time stamp information respectively represented by the sampling points;

taking the sampling positions of the sampling points in the target live broadcast text as starting points, and extracting a plurality of text sentences related to the sampling points from the target live broadcast text based on a preset identifier; and

obtaining the plurality of target live text paragraphs based on the plurality of text sentences respectively related to the plurality of sampling points.

4. A method according to any one of claims 1 to 3, wherein each of the target live text paragraphs comprises a target text sentence;

the recall matching is performed on the target live text paragraphs based on the live texts included in the text library, so as to obtain a plurality of matching results, including:

recall, from the plurality of live text, a plurality of similar text paragraphs associated with each target live text paragraph based on target text sentences included in each target live text paragraph; and

and respectively carrying out text matching on each target live text paragraph and a plurality of similar text paragraphs related to each target live text paragraph to obtain a matching result of each target live text paragraph.

5. The method of claim 4, wherein the recalling a plurality of similar text paragraphs related to each target live text paragraph from the plurality of live texts based on the target text sentence included in each target live text paragraph comprises:

performing keyword matching between the target text sentence included in each target live text paragraph and the plurality of text sentences included in each of the plurality of live texts, to obtain matching attribute values of the plurality of text sentences included in each of the plurality of live texts;

determining a plurality of similar text sentences related to each target live text paragraph from the plurality of text sentences respectively included in the plurality of live texts based on the matching attribute values of the plurality of text sentences respectively included in the plurality of live texts; and

recalling the plurality of similar text paragraphs related to each target live text paragraph from the plurality of live texts, based on the respective positions, in the plurality of live texts, of the plurality of similar text sentences related to each target live text paragraph.

6. The method of claim 4, wherein performing text matching between each target live text paragraph and the plurality of similar text paragraphs related to that target live text paragraph, to obtain the matching result of each target live text paragraph, comprises:

calculating a text distance between each target live text paragraph and each of the plurality of similar text paragraphs related to that target live text paragraph, to obtain a similarity between each target live text paragraph and each of the plurality of similar text paragraphs related to it; and

obtaining the matching result of each target live text paragraph based on the similarity between that target live text paragraph and each of the plurality of similar text paragraphs related to it.

7. The method of claim 6, wherein obtaining the matching result of each target live text paragraph based on the similarity between that target live text paragraph and each of the plurality of similar text paragraphs related to it comprises:

determining that the matching result of each target live text paragraph represents a matching failure in a case where the similarity between that target live text paragraph and each of the plurality of similar text paragraphs related to it is less than a first preset threshold; and

determining that the matching result of each target live text paragraph represents a successful match in a case where the similarity between that target live text paragraph and each of the plurality of similar text paragraphs related to it is greater than the first preset threshold.
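
The matching step of claims 6 and 7 can be sketched as follows. The disclosure does not name a specific text distance, so this example assumes Levenshtein edit distance normalized by the longer string, and it assumes a first preset threshold of 0.8. It also adopts one reading of claim 7: the match fails only when every recalled paragraph falls below the threshold, and succeeds when at least one exceeds it. All function names and the threshold value are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic program)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Map edit distance to a similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def match_paragraph(target_paragraph, similar_paragraphs, first_threshold=0.8):
    """Return True (successful match) if any recalled paragraph exceeds the
    assumed first preset threshold, otherwise False (matching failure)."""
    return any(similarity(target_paragraph, candidate) > first_threshold
               for candidate in similar_paragraphs)
```

Because speech recognition rarely reproduces a replayed recording character for character, a threshold below 1.0 leaves room for transcription noise while still rejecting unrelated text.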

8. The method of claim 4, further comprising:

determining the first text sentence among a plurality of text sentences included in the target live text paragraph as the target text sentence.

9. The method according to claim 1 or 7, wherein obtaining the recorded broadcast detection result of the target live room based on the plurality of matching results comprises:

determining a recorded broadcast detection attribute value of the target live room based on the number of matching results, among the plurality of matching results, that represent a successful match; and

obtaining the recorded broadcast detection result of the target live room based on the recorded broadcast detection attribute value of the target live room.

10. The method of claim 9, wherein obtaining the recorded broadcast detection result of the target live room based on the recorded broadcast detection attribute value of the target live room comprises:

determining that the recorded broadcast detection result of the target live room indicates that the recorded broadcast detection is passed, in a case where the recorded broadcast detection attribute value of the target live room is less than or equal to a second preset threshold; and

determining that the recorded broadcast detection result of the target live room indicates that the recorded broadcast detection is not passed, in a case where the recorded broadcast detection attribute value of the target live room is greater than the second preset threshold.

11. The method of claim 10, further comprising:

transmitting identification information related to the target live room to an auditing background in a case where the recorded broadcast detection attribute value of the target live room is greater than the second preset threshold and less than or equal to a third preset threshold; and

processing the target live room based on a recorded broadcast processing strategy in a case where the recorded broadcast detection attribute value of the target live room is greater than the third preset threshold.
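
The tiered decision in claims 9 to 11 can be illustrated with a short Python sketch. It assumes the recorded broadcast detection attribute value is the fraction of sampled paragraphs that matched successfully; the claims only say the value is based on the number of successful matches. The threshold values 0.3 and 0.7, the function name, and the action labels are hypothetical.

```python
def detect_recorded_broadcast(matching_results,
                              second_threshold=0.3,
                              third_threshold=0.7):
    """Return (attribute_value, action) for one target live room.

    matching_results is a list of booleans, one per sampled target
    paragraph, where True represents a successful match.
    """
    if not matching_results:
        # Nothing was sampled, so there is no evidence of a replay.
        return 0.0, "pass"

    # Assumed attribute value: share of sampled paragraphs that matched.
    value = sum(1 for matched in matching_results if matched) / len(matching_results)

    if value <= second_threshold:
        return value, "pass"            # detection passed (claim 10)
    if value <= third_threshold:
        return value, "send_to_review"  # forward room id for audit (claim 11)
    return value, "apply_policy"        # apply the processing strategy (claim 11)
```

With these assumed thresholds, a room where 5 of 10 sampled paragraphs match is routed to manual review, while a room where 9 of 10 match is handled automatically.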

12. The method of claim 1, further comprising:

acquiring, in a case where a live room is in an operating state, a live text related to the live room based on live content of the live room.

13. The method of claim 12, wherein acquiring the live text related to the live room based on the live content of the live room comprises:

pulling an audio stream of the live room from a live source station;

performing text conversion on the audio stream to obtain live text paragraphs of the live room; and

splicing the live text paragraphs of the live room into the live text related to the live room.
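
The three steps of claim 13 can be sketched as a small pipeline. The disclosure does not specify a streaming protocol or a speech recognition engine, so this example assumes the audio stream arrives as an iterable of byte chunks and takes the transcriber as a caller-supplied function. The names `build_live_text`, `transcribe`, and `separator` are hypothetical.

```python
from typing import Callable, Iterable, List


def build_live_text(audio_chunks: Iterable[bytes],
                    transcribe: Callable[[bytes], str],
                    separator: str = "\n") -> str:
    """Convert pulled audio chunks to text paragraphs and splice them.

    audio_chunks stands in for the stream pulled from the live source
    station; transcribe stands in for any speech-to-text engine.
    """
    paragraphs: List[str] = []
    for chunk in audio_chunks:
        paragraph = transcribe(chunk).strip()
        if paragraph:  # skip chunks that yield no text, such as silence
            paragraphs.append(paragraph)
    # Splice the paragraphs, in arrival order, into one live text.
    return separator.join(paragraphs)
```

Passing the transcriber in as a parameter keeps the sketch runnable with a stub, for example `lambda chunk: chunk.decode("utf-8")`, and leaves the choice of recognition engine open.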

14. A recording and playing detection device, comprising:

a sampling module configured to, in response to a periodic task being triggered, sample a target live text included in a text library to obtain a plurality of target live text paragraphs, wherein the target live text is composed of a plurality of live text paragraphs related to live content of a target live room in a current live occasion;

a matching module configured to perform recall matching on the plurality of target live text paragraphs respectively, based on a plurality of live texts included in the text library, to obtain a plurality of matching results, wherein each live text is composed of a plurality of live text paragraphs related to live content of a respective live room in a respective historical live occasion; and

a detection module configured to obtain a recorded broadcast detection result of the target live room based on the plurality of matching results.

15. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.

16. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-13.

17. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-13.