RU2801541C1 - Method and device for content checking model learning, method and device for video content checking, computing device and storage device

Info

Publication number: RU2801541C1
Application number: RU2022114373A
Authority: RU
Inventors: Фэн ШИ; Чжэньцян ЛЮ
Original assignee: Биго Текнолоджи Пте. Лтд.
Priority date: 2019-10-31
Filing date: 2020-08-06
Publication date: 2023-08-10

Abstract

FIELD: video image recognition.

SUBSTANCE: the group of inventions can be used for moderating video material. The video content moderation method comprises: extracting part of the video data from the video file to be checked as target video data, that is, as the video data to be checked; locating the point in time of the video data to be checked in the video file to be checked when that video data contains inappropriate content; extracting a significant region of video data from the video data around that point in time; and moderating the content of the video file to be checked by inputting the significant region of video data and the video data to be checked into a preset content moderation model. Extracting the significant region of video data from the video data around the point in time involves: determining a time range that contains the point in time; looking up a significant region detection model designed to identify a significant region in video data; and identifying the significant region in the video data by inputting the video data within the time range into the significant region detection model.

EFFECT: increased efficiency of video material moderation.

17 cl, 9 dwg

Description

Cross-reference to related applications

[0001] The present application claims priority to People's Republic of China Patent Application No. 201911051711.0, filed on October 31, 2019, the contents of which are incorporated herein by reference in their entirety.

Technical field of the present invention

[0002] The present disclosure relates to the field of video moderation technologies and, in particular, to a method and apparatus for training a content moderation model, a method and apparatus for moderating video content, a computing device and a storage device.

Background of the Invention

[0003] In recent years, with the development of Internet technology, video traffic on the Internet has grown dramatically, and user-generated content (UGC), such as short videos and live broadcasts, is making the Internet ever more saturated with video.

[0004] At the same time, many videos with inappropriate content, for example videos involving terrorism, violence, pornography or gambling, are created, and attempts are made to distribute them on the Internet.

[0005] Therefore, before or after a video is published, its content must be moderated, and videos with inappropriate content must be filtered out.

[0006] In one approach, a video resource uploaded to the Internet is moderated manually to determine whether it contains inappropriate content. However, as the number of video resources on the Internet grows, manual moderation of video content is time-consuming and inefficient. Machine learning is therefore one possible approach to moderating video content. In this approach, if video data (an image frame) in a training video is inappropriate, that video data is first labeled with a violation category; the video data and the corresponding violation category are then fed into a machine learning model for training, and the trained model is used to identify other video content.

[0007] However, when the training video data is labeled in every video, the amount of video data is large and the labeling operation can be laborious. This results in low efficiency both in training the model and in moderating video with that model, and manual labeling also raises the cost of training the model.

Brief description of the present invention

[0008] Embodiments of the present invention provide a method and apparatus for training a content moderation model, a method and apparatus for moderating video content, a computing device and a storage device, in order to solve the problems of low efficiency in training the model and in moderating video with that model, and of the high model training costs associated with manually labeling video data.

[0009] A method for training a content moderation model is provided. The method comprises:

[0010] extracting part of the video data from a sample video file as sample video data;

[0011] locating the point in time of the sample video data in the sample video file when the sample video data contains inappropriate content;

[0012] extracting a significant region of video data from the video data around that point in time; and

[0013] training a content moderation model based on the significant region of video data and the sample video data.

[0014] A method for moderating video content is also provided. The method comprises:

[0015] extracting part of the video data from a video file to be checked as target video data, that is, as the video data to be checked;

[0016] locating the point in time of the video data to be checked in the video file to be checked when the video data to be checked contains inappropriate content;

[0017] extracting a significant region of video data from the video data around that point in time; and

[0018] moderating the content of the video file to be checked by inputting the significant region of video data and the video data to be checked into a preset content moderation model.

[0019] An apparatus for training a content moderation model is also provided. The apparatus comprises:

[0020] a sample video data extraction module configured to extract part of the video data from a sample video file as sample video data;

[0021] a time point locating module configured to locate the point in time of the sample video data in the sample video file when the sample video data contains inappropriate content;

[0022] a video data region extraction module configured to extract a significant region of video data from the video data around that point in time; and

[0023] a model training module configured to train a content moderation model based on the significant region of video data and the sample video data.

[0024] An apparatus for moderating video content is also provided. The apparatus comprises:

[0025] a checked video data extraction module configured to extract part of the video data from a video file to be checked as the video data to be checked;

[0026] a time point locating module configured to locate the point in time of the video data to be checked in the video file to be checked when the video data to be checked contains inappropriate content;

[0027] a video data region extraction module configured to extract a significant region of video data from the video data around that point in time; and

[0028] a video moderation module configured to moderate the content of the video file to be checked by inputting the significant region of video data and the video data to be checked into a preset content moderation model.

[0029] A computing device is also provided. The computing device comprises:

[0030] one or more processors;

[0031] a storage device configured to store one or more programs;

[0032] wherein the one or more programs, when run by the one or more processors, cause the one or more processors to perform the content moderation model training method described above or the video content moderation method described above.

[0033] A computer-readable storage device is also provided. A computer program is stored on the computer-readable storage device, and the computer program, when run by a processor of a computing device, causes the computing device to perform the content moderation model training method described above or the video content moderation method described above.

Brief description of the figures

[0034] FIG. 1 is a flowchart of a method for training a content moderation model according to the first embodiment of the present invention.

[0035] FIG. 2 is a schematic diagram of training a content moderation model according to the first embodiment of the present invention.

[0036] FIG. 3 is a flowchart of a method for training a content moderation model according to the second embodiment of the present invention.

[0037] FIG. 4 is a flowchart of a method for moderating video content according to the third embodiment of the present invention.

[0038] FIG. 5 is a schematic diagram of video content moderation according to the third embodiment of the present invention.

[0039] FIG. 6 is a flowchart of a method for moderating video content according to the fourth embodiment of the present invention.

[0040] FIG. 7 is a schematic structural diagram of an apparatus for training a content moderation model according to the fifth embodiment of the present invention.

[0041] FIG. 8 is a schematic structural diagram of an apparatus for moderating video content according to the sixth embodiment of the present invention.

[0042] FIG. 9 is a schematic structural diagram of a computing device according to the seventh embodiment of the present invention.

Detailed disclosure of the present invention

[0043] The present invention is described below with reference to the accompanying drawings and its embodiments. The accompanying drawings show only some, not all, of the structures associated with the present invention. The embodiments of the present invention and the features of the embodiments may be combined with one another. It should be understood that although the terms "first", "second", "third" and the like may be used in this document to describe various information, that information should not be limited by these terms. These terms are used only to distinguish one category of information from another. The singular forms "a", "an" and "the" cover both the singular and the plural of the objects they qualify, unless the context clearly indicates otherwise.

[0044] First embodiment

[0045] FIG. 1 is a flowchart of a method for training a content moderation model according to the first embodiment of the present invention. This embodiment is applicable to cases where video data is automatically labeled in time and space. The method may be performed by an apparatus for training a content moderation model; the apparatus may be implemented in software and/or hardware and may be configured in a computing device such as a server, a workstation or a personal computer. The method includes the following steps.

[0046] In step S101, a sample video file is obtained.

[0047] A sample video file, being a video, contains a number of frames of consecutive video data. When the consecutive video data changes more than 24 times per second, then according to the principle of persistence of vision the human eye cannot make out the individual static images, so the visual effect is smooth and continuous.

[0048] In this embodiment, video files may be obtained in advance as samples for training the content moderation model, for example by capturing files from the network, by collecting video files uploaded by users, or by downloading video files from a public database. A video file used as a sample for training the content moderation model is referred to as a sample video file.

[0049] The formats and forms of sample video files differ between business scenarios and are not limited in this embodiment.

[0050] In one example, the format of a sample video file may be MPEG (Moving Picture Experts Group), RMVB (RealMedia Variable Bitrate), AVI (Audio Video Interleave), FLV (Flash Video) or the like.

[0051] A sample video file may take the form of, among others, a short video, a live broadcast video, a film or a television series.

[0052] In step S102, part of the video data in the sample video file is extracted as sample video data.

[0053] In this embodiment, part of the video data may be selected from all the video data of the sample video file as the sample video data.
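
Step S102 can be illustrated with a minimal sketch. It assumes that frames are sampled at a fixed time interval; the text above does not say how the part of the video data is selected, so the helper name `sample_frame_indices`, its parameters and the fixed-interval strategy are all illustrative assumptions.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, interval_s: float) -> List[int]:
    """Return the indices of frames taken every `interval_s` seconds.

    Hypothetical helper: fixed-interval sampling is an assumed strategy,
    not one prescribed by the description.
    """
    if total_frames < 0 or fps <= 0 or interval_s <= 0:
        raise ValueError("total_frames must be >= 0; fps and interval_s must be > 0")
    # Number of frames between two consecutive samples (at least one frame).
    step = max(1, round(fps * interval_s))
    return list(range(0, total_frames, step))


# Example: a 4-second clip at 25 frames per second, sampled once per second.
indices = sample_frame_indices(total_frames=100, fps=25.0, interval_s=1.0)
```

Only the selected frames would then be examined for inappropriate content, which keeps the amount of data to inspect well below the full frame count.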

[0054] In step S103, the point in time of the sample video data in the sample video file is located when the sample video data contains inappropriate content.

[0055] In this embodiment, the content of the sample video data may be examined to determine whether it is inappropriate. The content of the sample video data may be classed as inappropriate when it relates to terrorism, violence, pornography, gambling or the like, and as acceptable when it relates to a natural landscape, a building or the like.

[0056] For sample video data with inappropriate content, the point in time at which that sample video data occurs in the sample video file may be located.
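
Step S103 can be sketched as follows, under stated assumptions. The point in time of a frame is taken to be its index divided by the frame rate, and the check for inappropriate content is passed in as a callable. The name `locate_violations` and the predicate `is_inappropriate` are hypothetical; the description does not fix how the content check is carried out.

```python
from typing import Callable, List, Sequence, Tuple


def locate_violations(
    frame_indices: Sequence[int],
    fps: float,
    is_inappropriate: Callable[[int], bool],
) -> List[Tuple[int, float]]:
    """Return (frame index, time in seconds) for each flagged sample frame.

    `is_inappropriate` is a placeholder for whatever check decides that a
    sample frame contains inappropriate content (an assumption of this sketch).
    """
    if fps <= 0:
        raise ValueError("fps must be > 0")
    return [(idx, idx / fps) for idx in frame_indices if is_inappropriate(idx)]


# Example: of four sampled frames, only frame 50 is flagged.
flagged = {50}
hits = locate_violations([0, 25, 50, 75], fps=25.0,
                         is_inappropriate=lambda idx: idx in flagged)
```

Sample frames with acceptable content produce no entry, so no point in time is located for them.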

[0057] In step S104, a significant region of video data is extracted from the video data around that point in time.

[0058] Significance (saliency), as a visual feature of an image, reflects the attention that the human eye pays to certain regions of the image.

[0059] In an image frame, the user is interested in part of the image, and that part of interest reflects the user's intent. Most of the remaining regions are unrelated to the user's intent. In other words, the significant region is the region of the image that is most likely to attract the user's interest and that represents the content of the image.

[0060] In practice, the choice of significant region is subjective: for the same image frame, different users may choose different regions as significant because their tasks and knowledge differ.

[0061] The human attention mechanism is used to compute the significance of a region. Research in cognitive psychology has shown that certain regions of an image attract a person's attention strongly and that these regions carry more information. The human attention mechanism can therefore be simulated with a mathematical model, and because general rules of image cognition are used, the extracted significant regions agree more closely with a person's subjective assessment.

[0062] On the timeline of the sample video file, a number of frames of video data lie around the point in time of the sample video data. In this embodiment, significant regions may be extracted from that video data as the significant region of video data.
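
The selection of the frames around a located point in time can be sketched as below. The sketch assumes a symmetric window of fixed half-width clamped to the duration of the file; the window shape, the helper names `time_window` and `frames_in_window`, and all parameter values are illustrative assumptions. The detection of the significant region itself is not shown.

```python
import math
from typing import List, Tuple


def time_window(t: float, half_width_s: float, duration_s: float) -> Tuple[float, float]:
    """Return a time range containing `t`, clamped to [0, duration_s]."""
    start = max(0.0, t - half_width_s)
    end = min(duration_s, t + half_width_s)
    return start, end


def frames_in_window(start: float, end: float, fps: float, total_frames: int) -> List[int]:
    """Return the indices of all frames whose timestamps fall in [start, end]."""
    first = math.ceil(start * fps)
    last = min(total_frames - 1, math.floor(end * fps))
    return list(range(first, last + 1))


# Example: a violation located at t = 2.0 s in a 4-second clip at 10 frames per second.
start, end = time_window(2.0, half_width_s=0.5, duration_s=4.0)
neighbours = frames_in_window(start, end, fps=10.0, total_frames=40)
```

Each frame in `neighbours` would then be passed to a significant region detector to obtain the significant region of video data.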

[0063] In a sample video file, the subject being filmed usually does not change within a short period of time. That is, the other video data around the sample video data is essentially the same in content as the sample video data. When the sample video data contains inappropriate content, the surrounding video data very probably does too, and its content is therefore also treated as inappropriate. Consequently, given users' sensitivity to inappropriate content relating to terrorism, violence, pornography, gambling and the like, the significant region of that video data is concentrated primarily on the terrorism, violence, pornography, gambling or similar content.

[0064] In step S105, a content moderation model is trained based on the significant region of video data and the sample video data.

[0065] In this embodiment, the sample video file may be labeled in advance with a violation category. A preset network is trained using, as training samples, the significant regions of video data and the sample video data of the various sample video files together with their violation category labels; the content moderation model is obtained when training is complete.

[0066] In one example, the network may be a machine learning model, for example an SVM (support vector machine), a random forest or XGBoost, or a neural network, for example a CNN (convolutional neural network), a DNN (deep neural network) or an RNN (recurrent neural network); this embodiment places no limitation on the choice.

[0067] In this embodiment, a DNN is used as an example of the content moderation model.

[0068] In this embodiment, the violation category (for example, terrorism, violence or pornography) with which the sample video file is labeled, and which indicates the inappropriate content, is determined.

[0069] A deep neural network and a pretrained model are obtained. A pretrained model is a deep learning architecture that has already been trained on a large amount of data to perform a specific task (for example, classifying photographs); examples include VGG, Inception, ResNet, MobileNet and NasNet.

[0070] The deep neural network is initialized using the pretrained model. That is, because the pretrained model has already been trained on a large data set, its structure and weights can be applied directly to the deep neural network, which implements transfer learning.
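
The initialization from a pretrained model can be illustrated with a framework-free sketch in which weights are plain dictionaries of lists. Copying every pretrained layer and re-initializing only the final classification layer for the violation categories is a common transfer learning practice and is an assumption here; the description only states that the structure and weights are applied directly. The function name `init_from_pretrained`, the layer names and the value range are hypothetical.

```python
import random
from typing import Dict, List

Weights = Dict[str, List[float]]


def init_from_pretrained(pretrained: Weights, head_name: str,
                         head_size: int, seed: int = 0) -> Weights:
    """Copy pretrained weights and give the classification head fresh values.

    Assumption of this sketch: only the layer called `head_name` is replaced,
    because its size depends on the number of violation categories.
    """
    rng = random.Random(seed)
    # Copy each list so that later training does not modify the pretrained model.
    model = {name: list(values) for name, values in pretrained.items()
             if name != head_name}
    # Small random values for the new head (an arbitrary illustrative choice).
    model[head_name] = [rng.uniform(-0.01, 0.01) for _ in range(head_size)]
    return model


# Example: reuse a toy "conv1" layer and create a 4-weight head.
pretrained = {"conv1": [0.1, 0.2], "fc": [1.0, 2.0, 3.0]}
model = init_from_pretrained(pretrained, head_name="fc", head_size=4)
```

In a real deep learning framework the same idea is expressed by loading a saved set of weights into the network before training starts.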

[0071] Through backpropagation, the deep neural network is trained as the content moderation model based on the significant portion of the video data, the sample video data, and the violation category.

[0072] According to one example, the significant portion of the video data and the sample video data are input into the deep neural network, the original pixel information is combined across neurons by non-linear mappings, and scores for the different violation categories are obtained through a Softmax regression layer; these serve as the invalidity scores. The classification loss of the entire deep neural network is obtained by computing the cross-entropy between the invalidity scores and the tag of the training sample.
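The Softmax scoring and cross-entropy loss described above can be illustrated with a short sketch. The category list and the raw network outputs (`logits`) below are made-up values for illustration; the actual network and its outputs are not specified here.

```python
import math


def softmax(logits):
    """Turn raw network outputs into scores that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, label_index):
    """Cross-entropy between the scores and a one-hot tag."""
    return -math.log(scores[label_index])


# Hypothetical categories and logits for one training sample tagged "violence".
CATEGORIES = ["terrorism", "violence", "pornography"]
logits = [0.5, 2.0, -1.0]
scores = softmax(logits)
loss = cross_entropy(scores, CATEGORIES.index("violence"))
print(scores, loss)
```

The loss shrinks as the score assigned to the tagged category approaches 1, which is what backpropagation drives the network toward.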

[0073] According to one example, when the sample video files belong to different violation categories, the content moderation model can be used to recognize those different violation categories. When the sample video files all belong to the same violation category, the content moderation model can be used to recognize that violation category.

[0074] The method for training the content moderation model according to this embodiment is illustrated by the following example.

[0075] For example, as shown in FIG. 2, for a sample video file 201 whose content is a boxing match, six frames of video data are extracted from the sample video file as sample video data 202; four frames of the sample video data 202 that contain invalid content are identified as sample video data 203, and the invalid content in the sample video data 203 is violence. The time point of the sample video data 203 containing invalid content is located on the timeline 204 of the sample video file 201, and the significant portion of the video data (the boxed part) is extracted from the video data 205 around this time point. The content moderation model 207 is trained with the sample video data 202 and the significant portion of the video data 205 as the training sample and with the violation category 206 of the sample video file 201 as the tag, so that the content moderation model 207 can be used to classify video data, with classification dimensions that correspond to the violation category 206.

[0076] In this embodiment, a sample video file is obtained. The sample video file contains multiple frames of video data; a part of the video data is extracted as sample video data, and the time point of the sample video data within the sample video file is located in the case where the sample video data contains invalid content. The significant portion of the video data is extracted from the video data around this time point, and the content moderation model is trained based on the significant portion of the video data and the sample video data. The sample video data containing invalid content is located in time, and the significant portion of the video data is located in space, so the sample video file is localized in both space and time; that is, the invalid content of the sample video file is self-localized in time and space. In this way, features of the invalid content can be extracted quickly from the sample video file to characterize it, the quality of the features for content moderation is improved in both the temporal and the spatial dimension, and the performance of the content moderation model trained in this way can be ensured. In addition, locating the sample video data and the significant portion of the video data that contain invalid content tags the sample video data automatically, without additional annotation work. This is simple to carry out, removes the need for manual tagging, improves the efficiency of training the content moderation model, and reduces its training cost.

[0077] Second Embodiment

[0078] FIG. 3 is a flowchart of a method for training a content moderation model according to the second embodiment of the present invention. Building on the embodiment described above, this embodiment details the operations of extracting the sample video data, locating the time point, and extracting the significant portion of the video data. The method includes the following steps.

[0079] In step S301, a sample video file is obtained.

[0080] The sample video file contains multiple frames of video data.

[0081] In step S302, the sample video file is divided into at least two sample video segments.

[0082] In step S303, a part of the video data is extracted from each sample video segment as the sample video data.

[0083] According to one example, the sample video file may be segmented at a time interval t, that is, the sample video file is divided into at least two sample video segments.

[0084] From each sample video segment, n frames of video data are randomly extracted as the sample video data, thereby forming a sequence of video frames to be processed.

[0085] The time interval t and the frame count n are adjustable parameters.
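Steps S302 and S303 can be sketched as follows, working on frame indices rather than decoded frames. The function names, the frame rate, and the fixed random seed are assumptions introduced for this illustration.

```python
import random


def split_into_segments(num_frames, fps, t):
    """Split frame indices 0..num_frames-1 into segments of t seconds each."""
    seg_len = max(1, int(round(fps * t)))
    return [range(start, min(start + seg_len, num_frames))
            for start in range(0, num_frames, seg_len)]


def sample_frames(segments, n, rng):
    """Randomly pick up to n frame indices from every segment, kept in time order."""
    picked = []
    for seg in segments:
        k = min(n, len(seg))  # a short final segment may hold fewer than n frames
        picked.extend(sorted(rng.sample(list(seg), k)))
    return picked


# Hypothetical 10-second clip at 10 fps, segmented every t = 3 s, n = 2 frames each.
segments = split_into_segments(num_frames=100, fps=10, t=3)
frames = sample_frames(segments, n=2, rng=random.Random(0))
print(len(segments), frames)
```

Sorting within each segment keeps the resulting frame sequence in time order.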

[0086] According to one example, in addition to evenly segmenting the sample video file and randomly extracting frames, the sample video data may be extracted in other ways according to actual needs. For example, for a sample video file with bullet comments (danmaku), the bullet comments may to some extent reflect users' interest in the content of the sample video file, so the sample video file is segmented with respect to the bullet comments such that the density of bullet comments in each sample video segment (the number per unit of time) falls within a set range. Alternatively, video data is extracted from each sample video segment such that the time interval between every two consecutive extracted frames is the same, and so on, which is not limited in this embodiment.

[0087] In addition, the sample video data may be scaled to a preset size and may further be ordered by time, which facilitates training of the content moderation model.

[0088] In step S304, a violation recognition model is looked up.

[0089] In this embodiment, video data containing invalid content may be used in advance as training samples, and a network (e.g., a CNN) is trained with the violation category as the tag. When training is complete, a violation recognition model is obtained. That is, the violation recognition model can be used to determine the image invalidity score of the content of video data.

[0090] If necessary, different violation recognition models may be trained for different violation categories. That is, one violation recognition model can be used to determine the image invalidity score of video data whose content belongs to a single violation category.

[0091] Generally, the violation category of the sample video file is consistent with that of the sample video data. Thus, the violation category with which the sample video file is tagged, and which represents its invalid content, may be determined, and the violation recognition model corresponding to that violation category is looked up; this model is used to determine the image invalidity score of video data whose content belongs to that violation category.

[0092] A general violation recognition model may also be trained across different violation categories; that is, one violation recognition model can be used to determine the image invalidity score of video data whose content belongs to different violation categories, which is not limited in this embodiment.

[0093] In step S305, the image invalidity score of the content of the sample video data is determined by inputting the sample video data into the violation recognition model.

[0094] Once the violation recognition model is determined, the sample video data of the sample video file may be input into it sequentially for processing, and the model sequentially outputs the image invalidity score of each piece of sample video data.

[0095] In step S306, the sample video data whose image invalidity score satisfies a preset invalidity condition is selected.

[0096] In this embodiment, an invalidity condition may be set in advance; it is used to identify sample video data containing invalid content.

[0097] Once the image invalidity scores are determined, the sample video data whose image invalidity score satisfies the invalidity condition is identified.

[0098] According to one example, the invalidity condition is that the image invalidity score exceeds an image score threshold, or that the image invalidity score is the largest.

[0099] In this embodiment, it may be determined whether the image invalidity score of the sample video data exceeds a preset image score threshold.

[00100] In the case where the image invalidity score of the sample video data exceeds the preset image score threshold, it is determined that the image invalidity score satisfies the preset invalidity condition.

[00101] In the case where the image invalidity score of the sample video data exceeds the preset image score threshold, it is determined that the image invalidity score satisfies the preset invalidity condition.

[00102] The above invalidity conditions are only examples. When this embodiment is implemented, other invalidity conditions may be set according to actual needs, for example, that the image invalidity score is among the top m scores, which is not limited in the embodiments of the present invention.
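The invalidity conditions mentioned above (score above a threshold, largest score, top m scores) can be sketched as simple selection functions over per-frame scores. The function names, the scores, and the threshold value are illustrative assumptions.

```python
def select_by_threshold(scores, threshold):
    """Indices of frames whose image invalidity score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


def select_max(scores):
    """Index of the frame with the largest image invalidity score."""
    return [max(range(len(scores)), key=scores.__getitem__)]


def select_top_m(scores, m):
    """Indices of the m highest-scoring frames, returned in time order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:m])


# Hypothetical scores produced by a violation recognition model for six frames.
scores = [0.05, 0.91, 0.40, 0.87, 0.10, 0.95]
print(select_by_threshold(scores, 0.8))  # [1, 3, 5]
print(select_max(scores))                # [5]
print(select_top_m(scores, 2))           # [1, 5]
```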

[00103] In step S307, the time point, within the sample video file, of the sample video data that satisfies the preset invalidity condition is determined.

[00104] Once a frame of sample video data has been determined to satisfy the invalidity condition, the time point of that sample video data within the sample video file is determined.

[00105] In step S308, a time range that contains this time point is determined.

[00106] Based on this time point, a time range containing it is generated on the timeline of the sample video file.

[00107] According to one example, assuming the time point is T, a time range of duration F, [T-F/2, T+F/2], may be generated, where F is an adjustable parameter.

[00108] The above way of generating the time range is only an example. When this embodiment is implemented, other ways of generating the time range may be used according to actual needs, for example, [T-F/3, T+2F/3], [T-3F/4, T+F/4], and the like, which is not limited in the embodiments of the present invention.
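The time ranges above differ only in how much of the duration F lies before the time point T, so they can be sketched with a single helper. The `before` fraction and the optional clamping to the file duration are assumptions added for this illustration; the text does not state how ranges near the start or end of the file are handled.

```python
def time_range(T, F, before=0.5, duration=None):
    """Return (start, end) of a range of length F around time point T.

    `before` is the fraction of F placed before T: 0.5 gives [T-F/2, T+F/2],
    0.75 gives [T-3F/4, T+F/4]. If `duration` is given, the range is clamped
    to [0, duration] (an assumption, not something the text specifies).
    """
    start = T - before * F
    end = T + (1.0 - before) * F
    if duration is not None:
        start = max(0.0, start)
        end = min(float(duration), end)
    return start, end


print(time_range(10, 4))               # (8.0, 12.0)
print(time_range(10, 4, before=0.75))  # (7.0, 11.0)
print(time_range(1, 4, duration=20))   # (0.0, 3.0)
```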

[00109] In step S309, a significant area detection model is looked up.

[00110] In this embodiment, the significant area detection model is also set in advance; it can be used to detect the significant (salient) area of an image in video data.

[00111] According to one example, the significant area detection model may apply one of the following three classes of algorithms.

[00112] The first class comprises saliency analysis algorithms based on low-level image processing, for example, the visual saliency algorithm (ITTI algorithm), a selective attention algorithm that simulates the visual attention mechanism of living organisms and is suited to processing natural images.

[00113] The second class comprises purely mathematical calculation methods that are not based on any biological vision principle, for example, the full-resolution algorithm (AC algorithm) and the spectral residual algorithm (SR algorithm), which is based on analysis in the spatial frequency domain.

[00114] The third class combines the two classes above, for example, the graph-based visual saliency algorithm (GBVS algorithm), which simulates visual principles in a manner similar to the ITTI algorithm during feature extraction, introduces Markov chains when generating the saliency map, and obtains the saliency values by purely mathematical calculation.

[00115] In step S310, the significant portion of the video data is detected by inputting the video data within the time range into the significant area detection model.

[00116] The video data within the time range is extracted from the sample video file and input sequentially into the significant area detection model, which outputs the significant portion of each frame of the video data.
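One way to picture the output of step S310 is as a crop of each frame around its most salient pixels. The sketch below assumes the detection model yields a per-pixel saliency map with values in [0, 1] and that the significant portion is the bounding box of pixels above a threshold; both are assumptions made for illustration, and the saliency computation itself (ITTI, SR, GBVS, etc.) is not implemented here.

```python
def salient_bounding_box(saliency_map, threshold):
    """Bounding box (top, left, bottom, right) of pixels with saliency >= threshold."""
    rows = [r for r, row in enumerate(saliency_map)
            if any(v >= threshold for v in row)]
    cols = [c for row in saliency_map
            for c, v in enumerate(row) if v >= threshold]
    if not rows:
        return None  # nothing in this frame is salient enough
    return min(rows), min(cols), max(rows), max(cols)


def crop(frame, box):
    """Cut the boxed area out of a frame stored as a list of pixel rows."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]


# Hypothetical 4x5 saliency map and a frame whose pixel value encodes its position.
saliency_map = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.9, 0.1, 0.0],
    [0.0, 0.6, 0.8, 0.2, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.0],
]
frame = [[r * 10 + c for c in range(5)] for r in range(4)]
box = salient_bounding_box(saliency_map, threshold=0.5)
region = crop(frame, box)
print(box, region)
```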

[00117] In step S311, the content moderation model is trained based on the significant portion of the video data and the sample video data.

[00118] In this embodiment, for sample video data containing invalid content, the time range is determined from its time point, and the content of the video data within this time range is very likely to be invalid as well. Extracting the significant portion of this video data therefore quickly improves the reliability of the training samples, which in turn improves the performance of the trained content moderation model.

[00119] Third Embodiment

[00120] FIG. 4 is a flowchart of a method for moderating video content according to the third embodiment of the present invention. This embodiment is applicable to cases where video data is moderated in time and space. The method may be performed by a device for moderating video content; this device may be implemented in software and/or hardware and may be configured in a computing device, such as a server, a workstation, a personal computer, or the like. The method includes the following steps.

[00121] In step S401, a video file to be checked is obtained.

[00122] The video file to be checked, being a video, contains multiple frames of consecutive video data. When consecutive video data changes at more than 24 frames per second, then according to the persistence of vision principle the human eye cannot distinguish individual static images, so the visual effect appears smooth and continuous.

[00123] The formats and forms of the video file to be checked differ across business scenarios and are not limited in this embodiment.

[00124] The format of the video file to be checked may be MPEG, RMVB, AVI, FLV, or the like.

[00125] The video file to be checked may take the form of, among others, a short video, a live-stream video, a movie, a television series, and the like.

[00126] The user uploads the video file to be checked to the computing device, intending to publish it for public viewing.

[00127] The computing device may define moderation criteria according to business, legal, and other factors. Before a video file to be checked is published, its content is moderated against these criteria: video files that do not meet the moderation criteria (for example, because their content involves terrorism, violence, pornography, gambling, etc.) are filtered out, and video files that meet the moderation criteria are published.

[00128] For video files to be checked with higher timeliness requirements, such as short videos and live-stream videos, a real-time streaming system may be provided. The user uploads the video file to be checked to the real-time streaming system through a client, and the real-time streaming system may forward the video file to the computing device for moderation.

[00129] For a video file to be checked with lower timeliness requirements, such as a movie or a TV series, a database, such as a distributed database, may be provided. The user uploads the video file to be checked to the database through a client, and the computing device may read the video file to be checked from the database for moderation.

[00130] In step S402, a part of the video data is extracted from the video file to be checked as the video data to be checked.

[00131] According to this embodiment, a part of the video data may be selected as the target video data from all video data of the target video file.

[00132] In step S403, the time point of the target video data in the target video file is located in the case where the target video data contains invalid content.

[00133] According to this embodiment, the content of the target video data may be recognized to determine whether it is invalid content. The content of the target video data may be determined to be invalid in the case where it involves terrorism, violence, pornography, gambling, and the like, and may be determined to be valid in the case where it relates to a natural landscape, a building, and the like.

[00134] For video data to be checked that contains invalid content, the time point of that video data in the video file to be checked may be located.

[00135] In step S404, a significant portion of video data is extracted from the video data around this time point.

[00136] Significance, as a visual feature of an image, reflects the attention that the human eye pays to certain regions of the image.

[00137] In a frame of an image, the user is interested in some part of the image, and that part reflects the user's intent. Most of the remaining regions are unrelated to the user's intent. That is, the significant region is the region of the image that is most likely to attract the user's interest and that best represents the content of the image.

[00138] In practice, the choice of significant region is subjective: for the same frame, different users may choose different regions as significant because their tasks and knowledge differ.

[00139] The human attention mechanism is used to calculate the significance of a region. Research in cognitive psychology has shown that some regions of an image strongly attract human attention and that these regions carry more information. The human attention mechanism may therefore be simulated with a mathematical model, and because the model applies general rules of image cognition, the extracted significant regions agree better with human subjective judgment.

[00140] On the timeline of the target video file, a number of frames of video data exist around the time point of the target video data. According to this embodiment, significant regions may be extracted from this video data as portions of video data.

[00141] In the target video file, the scene being filmed usually does not change within a short period of time. That is, the other video data around the target video data is substantially the same in content as the target video data. In the case where the target video data contains invalid content, it is very likely that this surrounding video data also contains invalid content, and its content is therefore also treated as invalid. Accordingly, given users' sensitivity to invalid content involving terrorism, violence, pornography, gambling, and the like, the significant portion of this video data is concentrated primarily on that invalid content.

[00142] In step S405, the content of the target video file is moderated by inputting the portion of video data and the target video data into a predetermined content moderation model.

[00143] According to this embodiment, the content moderation model may be trained in advance and may be configured to determine a file invalidity score for the content of the video file to be checked belonging to a predetermined violation category.

[00144] Because the method of training the content moderation model is substantially the same as in the first and second embodiments described above, it is described only briefly here. For details not described in this embodiment, reference may be made to the descriptions of the first and second embodiments.

[00145] For a video file to be checked, the portion of video data and the video data to be checked may be input into the content moderation model for processing, and the content of the video file may be moderated based on the output of the content moderation model to determine whether it is invalid content.

[00146] According to one example, the portion of video data and the video data to be checked are input into the predetermined content moderation model to determine the file invalidity score for the content of the video file to be checked belonging to a predetermined violation category.

[00147] A file score threshold is determined.

[00148] The file invalidity score is compared with the file score threshold.

[00149] In the case where the file invalidity score is lower than or equal to the file score threshold, the content of the video file to be checked is less likely to be invalid, and it is determined to be valid.

[00150] In the case where the file invalidity score is higher than the file score threshold, the content of the video file to be checked is more likely to be invalid, and the video file may be pushed to a designated client as a moderation task. The client is operated by a designated moderator.

[00151] In the case where the client receives the moderation task, the moderator may view the video file to be checked to determine manually whether its content is invalid.

[00152] The content of the video file to be checked is determined to be valid in the case where first moderation information is received from the client.

[00153] The content of the video file to be checked is determined to be invalid in the case where second moderation information is received from the client.
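
The decision flow of paragraphs [00147] to [00153] can be sketched as follows. This is a minimal illustration, not the claimed implementation: the names `Verdict`, `moderate_file`, and `ask_moderator` are hypothetical, and the moderator's client is reduced to a callback that returns `True` for the second moderation information (invalid) and `False` for the first (valid).

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    VALID = "valid"
    INVALID = "invalid"


def moderate_file(
    file_score: float,
    file_score_threshold: float,
    ask_moderator: Callable[[], bool],
) -> Verdict:
    """Return the verdict for one video file to be checked.

    ask_moderator is a hypothetical stand-in for the moderation task pushed
    to the moderator's client. It returns True when the moderator judges the
    content invalid and False when the moderator judges it valid.
    """
    # Scores at or below the threshold are accepted without manual review.
    if file_score <= file_score_threshold:
        return Verdict.VALID
    # Scores above the threshold are pushed to a human moderator,
    # whose answer decides the final verdict.
    return Verdict.INVALID if ask_moderator() else Verdict.VALID
```

Only files scoring strictly above the threshold cost moderator time, so the threshold directly controls the manual workload.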

[00154] When determining the file score threshold, the total number of video files to be checked in a previous period of time (for example, the previous day) is determined, together with the file invalidity scores of those video files.

[00155] The file score threshold is generated such that the ratio of the moderation quantity to the total number matches a predetermined push rate (suspected invalid push rate, SIPR), where the moderation quantity is the number of video files to be checked whose file invalidity scores are above the file score threshold.

[00156] As a rule, the proportion of video files with invalid content among all video files to be checked is relatively small, for example 1%. In this method, the push rate may be set higher than that proportion (for example, 10%) so that as many of the video files with invalid content as possible reach manual moderation.

[00157] Assuming that the total number of video files to be checked in the previous period of time is 100,000 and the push rate is 10%, the video files may be ranked by file invalidity score in descending order, and the file invalidity score of the 10,000th video file is set as the file score threshold.
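
The threshold selection described above can be illustrated with a short sketch. It assumes that the scores are ranked from highest to lowest, so that the share of files scoring above the threshold approximates the push rate defined in [00155]. The function name `file_score_threshold` and the rounding of the rank are assumptions made for illustration.

```python
def file_score_threshold(scores: list[float], push_rate: float) -> float:
    """Pick the file score threshold from a previous period's scores.

    Assumption: scores are ranked in descending order and the score at
    rank round(N * push_rate) becomes the threshold.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 < push_rate <= 1.0:
        raise ValueError("push_rate must be in (0, 1]")
    ranked = sorted(scores, reverse=True)
    # Rank of the threshold file, counted from 1; at least one file.
    rank = max(1, round(len(ranked) * push_rate))
    return ranked[rank - 1]


# With 100,000 distinct scores and a 10% push rate, the threshold is the
# score of the 10,000th file counted from the highest score.
previous_scores = [i / 100_000 for i in range(100_000)]
threshold = file_score_threshold(previous_scores, 0.10)
```

With distinct scores, 9,999 files score strictly above the threshold returned here, which is just under 10% of the total.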

[00158] The method of determining the file score threshold described above is only an example. When this embodiment is implemented, another method of determining the file score threshold may be set according to actual needs, for example using a default value as the file score threshold, which is not limited in the embodiments of the present invention. Methods other than the one described above may likewise be applied according to actual needs, which is also not limited in the embodiments of the present invention.

[00159] The video content moderation method according to this embodiment is illustrated by the following example.

[00160] For example, as shown in FIG. 5, for a video file 501 to be checked whose content is a boxing match, six frames of video data are extracted from the video file as the video data 502 to be checked, and four of those frames are determined to be the video data 503 containing invalid content, the invalid content being violence. The time points of the video data 503 containing invalid content are located on the timeline 504 of the video file 501, and significant portions of video data are extracted from the video data 505 around these time points (the boxed portions). The significant portions of the video data 505 and the video data 502 to be checked are input into the content moderation model 506, and the tag 507 of the video file 501, that is, the valid or invalid category, is determined based on the output of the content moderation model 506.

[00161] According to this embodiment, a video file to be checked is obtained. The video file contains a number of frames of video data. A part of the video data is extracted as the video data to be checked, and, in the case where the video data to be checked contains invalid content, its time point in the video file is located. A significant portion of video data is extracted from the video data around this time point, and the portion of video data and the video data to be checked are input into a predetermined content moderation model to moderate the content of the video file. The video data to be checked that contains invalid content is located in time, and the significant portion of video data is located in space, so that the invalid content is located spatio-temporally. That is, the invalid data in the video file to be checked is localized automatically in time and space. In this way, features of the invalid content are quickly extracted from the video file to be checked, and the quality of the features used for content moderation is improved in both time and space, thereby ensuring the quality of content moderation, reducing the rate of incorrect moderation, and improving the efficiency of video content moderation.

[00162] Fourth embodiment

[00163] FIG. 6 is a flowchart of a video content moderation method according to a fourth embodiment of the present invention. Building on the embodiments described above, this embodiment details the operations of extracting the video data to be checked, locating the time point, and extracting the portion of video data. The method includes the following steps.

[00164] In step S601, a video file to be checked is obtained.

[00165] The video file to be checked contains a number of frames of video data.

[00166] In step S602, the video file to be checked is divided into at least two video fragments to be checked.

[00167] In step S603, a part of the video data is extracted from each video fragment to be checked as the video data to be checked.

[00168] According to one example, the video file to be checked may be segmented at a time interval t, that is, divided into at least two video fragments to be checked.

[00169] From each video fragment to be checked, n frames of video data are randomly extracted as the video data to be checked, thereby forming a sequence of video frames to be processed.

[00170] The parameters t and n are adjustable.
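
The segmentation and random extraction of steps S602 and S603 can be sketched as follows, working on frame indices only. The function name `sample_frames`, the frame rate `fps`, and the `seed` argument are assumptions introduced for illustration. The source specifies only the adjustable parameters t and n.

```python
import random
from typing import Optional


def sample_frames(
    num_frames: int, fps: float, t: float, n: int, seed: Optional[int] = None
) -> list[list[int]]:
    """Split a video into fragments of t seconds and draw n frames from each.

    Returns one sorted list of frame indices per fragment. A fragment that
    holds fewer than n frames contributes all of its frames.
    """
    if num_frames <= 0 or fps <= 0 or t <= 0 or n <= 0:
        raise ValueError("num_frames, fps, t and n must be positive")
    rng = random.Random(seed)
    frames_per_fragment = max(1, round(t * fps))
    samples = []
    for start in range(0, num_frames, frames_per_fragment):
        end = min(start + frames_per_fragment, num_frames)
        count = min(n, end - start)
        # Draw without replacement, then restore temporal order.
        samples.append(sorted(rng.sample(range(start, end), count)))
    return samples


# A 10-second clip at 25 fps, fragments of 4 seconds, 3 frames per fragment.
samples = sample_frames(num_frames=250, fps=25.0, t=4.0, n=3, seed=0)
```

Sorting each fragment's indices keeps the extracted frames in temporal order, which matches the time ordering mentioned in [00172].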

[00171] In addition to dividing the video file to be checked evenly and extracting frames at random, the video data to be checked may be extracted in other ways according to actual needs. For example, the video file may be divided so that the video fragments at both ends are longer and the middle video fragment is shorter. Alternatively, video data may be extracted from each video fragment so that the time interval between every two adjacent frames of video data is the same, and so on, which is not limited in this embodiment.

[00172] In addition, the video data to be checked may be scaled to a predetermined size and may further be ordered by time, which facilitates recognition by the content moderation model.

[00173] In step S604, a violation recognition model is looked up.

[00174] According to this embodiment, video data containing invalid content may be used in advance as training samples, and a network (for example, a CNN) is trained with the violation categories as labels. When training is complete, a violation recognition model is obtained. That is, the violation recognition model may be configured to determine an image invalidity score for the content of video data.

[00175] If necessary, different violation recognition models may be trained for different violation categories. That is, each violation recognition model may be configured to determine an image invalidity score for video data whose content belongs to one and the same violation category.

[00176] A general violation recognition model may also be trained for the different violation categories, that is, a single violation recognition model may be configured to determine image invalidity scores for video data whose content belongs to different violation categories, which is not limited in this embodiment.

[00177] In step S605, an image invalidity score for the content of the video data to be checked is determined by inputting the video data to be checked into the violation recognition model.

[00178] Once the violation recognition model has been determined, the video data to be checked of the video file may be input into the model sequentially for processing, and the model sequentially outputs the image invalidity score of each piece of video data to be checked.

[00179] In step S606, the video data to be checked whose image invalidity score satisfies a predetermined invalidity condition is selected.

[00180] According to this embodiment, an invalidity condition may be set in advance and is used to identify the video data to be checked that contains invalid content.

[00181] Once the image invalidity scores of the video data to be checked have been determined, the video data to be checked whose image invalidity score satisfies the invalidity condition is identified.

[00182] According to one example, the invalidity condition is that the image invalidity score exceeds an image score threshold, or that the image invalidity score is the largest.

[00183] According to this embodiment, it may be determined whether the image invalidity score of the video data to be checked exceeds the predetermined image score threshold.

[00184] In the case where the image invalidity score of the video data to be checked is greater than the predetermined image score threshold, it is determined that the image invalidity score satisfies the predetermined invalidity condition.

[00186] The invalidity condition described above is only an example. When this embodiment is implemented, another invalidity condition may be set according to actual needs, for example that the image invalidity score is among the top m values, which is not limited in the embodiments of the present invention. Invalidity conditions other than the one described above may likewise be applied according to actual needs, which is also not limited in the embodiments of the present invention.
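
The three invalidity conditions mentioned above (score above a threshold, largest score, top m scores) can be sketched as simple selections over per-frame image invalidity scores. The function names and the sample scores are hypothetical. The scores stand in for the output of a violation recognition model, which is not modeled here.

```python
def select_above_threshold(scores: list[float], threshold: float) -> list[int]:
    """Indices of frames whose score exceeds the image score threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


def select_largest(scores: list[float]) -> list[int]:
    """Index of the frame with the largest score (first one on ties)."""
    if not scores:
        return []
    return [max(range(len(scores)), key=lambda i: scores[i])]


def select_top_m(scores: list[float], m: int) -> list[int]:
    """Indices of the m highest-scoring frames, returned in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:m])


# Hypothetical image invalidity scores for six sampled frames.
frame_scores = [0.10, 0.80, 0.40, 0.95, 0.70, 0.20]
above = select_above_threshold(frame_scores, 0.6)
largest = select_largest(frame_scores)
top_two = select_top_m(frame_scores, 2)
```

The threshold condition can select no frame at all, whereas the largest-score and top-m conditions always select at least one frame from a non-empty input.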

[00187] In step S607, the point in time, within the video file being checked, of the video data being checked that satisfies the predetermined invalidity condition is determined.

[00188] When a frame of the video data being checked is determined to satisfy the invalidity condition, the point in time of that frame within the video file being checked is determined.

[00189] In step S608, a time range that contains this point in time is determined.

[00190] Based on this point in time, a time range containing it is created on the timeline of the video file being checked.

[00191] According to one example, assuming that the point in time is T, a time range of duration F, namely [T-F/2, T+F/2], may be created, where F is an adjustable parameter.

[00192] The above way of creating the time range is only an example. When this embodiment is implemented, other ways of creating the time range may be set according to actual needs, for example [T-F/3, T+2F/3], [T-3F/4, T+F/4] and the like, which is not limited in the embodiments of the present invention. In addition, according to actual needs, a way of creating the time range other than those above may be applied, which is likewise not limited in the embodiments of the present invention.
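By way of non-limiting illustration, the time ranges given above can be produced by a single helper, sketched here in Python. The function name `time_range`, the parameter `before` (the fraction of F placed ahead of T), and the clipping of the range to the timeline of the file are assumptions of this sketch; the embodiments only define T and F.

```python
from typing import Optional, Tuple


def time_range(t: float, f: float, before: float = 0.5,
               duration: Optional[float] = None) -> Tuple[float, float]:
    """Return (start, end) of a range of length f that contains t.

    `before` is the fraction of f placed ahead of t: 0.5 gives
    [T-F/2, T+F/2], 1/3 gives [T-F/3, T+2F/3], 0.75 gives [T-3F/4, T+F/4].
    """
    if f < 0 or not 0.0 <= before <= 1.0:
        raise ValueError("f must be non-negative and before must lie in [0, 1]")
    start = t - before * f
    end = t + (1.0 - before) * f
    # Assumption: clip the range to the file timeline, which starts at 0.
    start = max(0.0, start)
    if duration is not None:
        end = min(duration, end)
    return start, end


print(time_range(10.0, 4.0))               # (8.0, 12.0)
print(time_range(10.0, 4.0, before=0.75))  # (7.0, 11.0)
```

With clipping, a point in time near the start or end of the file yields a range shorter than F; shifting the range instead of shortening it would be an equally plausible choice.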

[00193] In step S609, a significant area detection model is looked up.

[00194] In this embodiment, the significant area detection model is also predefined and may be configured to detect a significant area of an image in the video data.

[00195] According to one example, the significant area detection model may use one of the following three classes of algorithms.

[00196] The first class is saliency analysis based on low-level image processing, for example the visual saliency algorithm (the ITTI algorithm), a selective attention algorithm that simulates the visual attention mechanism of living organisms and is suited to processing natural images.

[00197] The second class is a purely mathematical calculation method that is not based on any biological principle of vision, for example the full-resolution algorithm (the AC algorithm) and the spectral residual algorithm (the SR algorithm), which operates in the spatial frequency domain.

[00198] The third class combines the two classes above, for example the graph-based visual saliency algorithm (the GBVS algorithm), which simulates the visual principle in the same way as the ITTI algorithm during feature extraction, introduces Markov chains when generating the saliency map, and obtains the saliency value by purely mathematical calculation.

[00199] In step S610, a significant portion of video data is detected in the video data by inputting the video data within the time range into the significant area detection model.

[00200] The video data within the time range is extracted from the video file being checked and input sequentially into the significant area detection model, which outputs the significant portion of video data found in that video data.
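By way of non-limiting illustration, steps S608 to S610 can be sketched in Python as a loop that selects the frames inside the time range and passes them one by one to a detector. The function `detect_significant_areas`, the representation of a frame as a timestamp paired with a 2D list of brightness values, and the toy detector `brightest_pixel` are assumptions of this sketch. The toy detector is only a stand-in and is not an implementation of the ITTI, AC, SR, or GBVS algorithms.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Frame = TypeVar("Frame")
Area = TypeVar("Area")


def detect_significant_areas(frames: Iterable[Tuple[float, Frame]],
                             start: float, end: float,
                             detector: Callable[[Frame], Area]
                             ) -> List[Tuple[float, Area]]:
    """Feed each frame with a timestamp in [start, end] to the detector, in order."""
    results = []
    for timestamp, frame in frames:
        if start <= timestamp <= end:
            results.append((timestamp, detector(frame)))
    return results


def brightest_pixel(frame: List[List[int]]) -> Tuple[int, int]:
    """Toy stand-in detector: return (row, column) of the brightest pixel."""
    value, row, col = max((v, r, c)
                          for r, line in enumerate(frame)
                          for c, v in enumerate(line))
    return row, col


# Hypothetical video: (timestamp in seconds, 2x2 brightness grid).
video = [
    (0.0, [[0, 1], [2, 3]]),
    (1.0, [[9, 1], [2, 3]]),
    (2.0, [[0, 1], [7, 3]]),
    (3.0, [[0, 1], [2, 3]]),
]
print(detect_significant_areas(video, 1.0, 2.0, brightest_pixel))
# [(1.0, (0, 0)), (2.0, (1, 0))]
```

Passing the detector as a callable keeps the loop independent of which class of algorithm the significant area detection model uses.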

[00201] In step S611, the content of the target video file is moderated by inputting the portion of video data and the target video data into a predefined content moderation model.

[00202] In this embodiment, for video data being checked that contains invalid content, the time range is determined from the point in time, and the content of the video data in this time range is highly likely to be invalid. Extracting the significant portion of video data can therefore quickly improve the reliability of the training sample, thereby improving the performance of the content moderation model through its training.

[00203] Fifth embodiment

[00204] FIG. 7 is a schematic structural diagram of an apparatus for training a content moderation model according to a fifth embodiment of the present invention. The apparatus comprises:

[00205] a sample video file obtaining module 701 configured to obtain a sample video file, wherein the sample video file contains a number of frames of video data;

[00206] a sample video data extraction module 702 configured to extract a portion of the video data from the sample video file as sample video data;

[00207] a time point establishing module 703 configured to establish the point in time of the sample video data in the sample video file when the sample video data contains invalid content;

[00208] a video data portion extraction module 704 configured to extract a significant portion of video data from the video data in the area of this point in time; and

[00209] a model training module 705 configured to train the content moderation model based on the portion of video data and the sample video data.

[00210] The apparatus for training a content moderation model according to the embodiments of the present invention can carry out the method for training a content moderation model according to any embodiment of the present invention, and has the functional modules and advantages corresponding to that method.
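By way of non-limiting illustration, the data flow between modules 702 to 705 can be sketched in Python as follows. The class `TrainingApparatus`, its field names, the behaviour when no invalid content is found, and the stub functions in the usage example are all assumptions of this sketch; the actual modules would wrap a real frame sampler, a violation recognition model, a significant area detection model, and a training procedure.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class TrainingApparatus:
    """Wires together hypothetical stand-ins for modules 702 to 705."""
    extract_samples: Callable[[Sequence[Any]], List[Any]]         # module 702
    establish_time_point: Callable[[List[Any]], Optional[float]]  # module 703
    extract_portion: Callable[[Sequence[Any], float], List[Any]]  # module 704
    train_model: Callable[[List[Any], List[Any]], Any]            # module 705

    def run(self, sample_video_file: Sequence[Any]) -> Any:
        # The argument plays the role of the output of module 701.
        samples = self.extract_samples(sample_video_file)
        t = self.establish_time_point(samples)
        # Assumption: when no invalid content is found, no portion is extracted.
        portion = [] if t is None else self.extract_portion(sample_video_file, t)
        return self.train_model(portion, samples)


# Usage with stubs: each frame is (timestamp, image invalidity score).
apparatus = TrainingApparatus(
    extract_samples=lambda video: list(video[::2]),
    establish_time_point=lambda s: next((ts for ts, score in s if score > 0.8), None),
    extract_portion=lambda video, t: [fr for fr in video if abs(fr[0] - t) <= 1.0],
    train_model=lambda portion, samples: {"portion": len(portion), "samples": len(samples)},
)
video = [(0.0, 0.1), (1.0, 0.2), (2.0, 0.9), (3.0, 0.3), (4.0, 0.1)]
print(apparatus.run(video))  # {'portion': 3, 'samples': 3}
```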

[00211] Sixth embodiment

[00212] FIG. 8 is a schematic structural diagram of an apparatus for moderating video content according to a sixth embodiment of the present invention. The apparatus comprises:

[00213] a checked video file obtaining module 801 configured to obtain a video file being checked, wherein the video file being checked contains a number of frames of video data;

[00214] a checked video data extraction module 802 configured to extract a portion of the video data from the video file being checked as video data being checked;

[00215] a time point establishing module 803 configured to establish the point in time of the video data being checked in the video file being checked when the video data being checked contains invalid content;

[00216] a video data portion extraction module 804 configured to extract a significant portion of video data from the video data in the area of this point in time; and

[00217] a video moderation module 805 configured to moderate the content of the video file being checked by inputting the portion of video data and the video data being checked into a predefined content moderation model.

[00218] The apparatus for moderating video content according to the embodiments of the present invention can carry out the video content moderation method according to any embodiment of the present invention, and has the functional modules and advantages corresponding to that method.

[00219] Seventh embodiment

[00220] FIG. 9 is a schematic structural diagram of a computing device according to a seventh embodiment of the present invention. As shown in FIG. 9, the computing device comprises a processor 900, a storage device 901, a communication module 902, an input device 903, and an output device 904. The computing device may contain one or more processors 900; FIG. 9 shows one processor 900 as an example. The processor 900, the storage device 901, the communication module 902, the input device 903, and the output device 904 in the computing device may be connected by a bus or by other means; FIG. 9 shows a bus connection as an example.

[00221] The computing device according to this embodiment can carry out the method for training a content moderation model or the video content moderation method according to any embodiment of the present invention, and has the functional modules and advantages corresponding to that method.

[00222] Eighth embodiment

[00223] This embodiment provides a computer-readable storage device on which a computer program is stored. When the computer program is run by a processor of a computing device, the computing device performs the method for training a content moderation model or the video content moderation method.

[00224] The method for training a content moderation model comprises:

[00225] obtaining a sample video file, wherein the sample video file contains a number of frames of video data;

[00226] extracting a portion of the video data as sample video data;

[00227] establishing the point in time of the sample video data in the sample video file when the sample video data contains invalid content;

[00228] extracting a significant portion of video data from the video data in the area of this point in time; and

[00229] training the content moderation model based on the portion of video data and the sample video data.

[00230] The video content moderation method comprises:

[00231] obtaining a video file being checked, wherein the video file being checked contains a number of frames of video data;

[00232] extracting a portion of the video data as video data being checked;

[00233] establishing the point in time of the video data being checked in the video file being checked when the video data being checked contains invalid content;

[00234] extracting a significant portion of video data from the video data in the area of this point in time; and

[00235] moderating the content of the video file being checked by inputting the portion of video data and the video data being checked into a predefined content moderation model.

[00236] In this embodiment of the present invention, the computer program on the computer-readable storage device is not limited to the method operations described above and may also perform related operations of the method for training a content moderation model or the video content moderation method according to any of the embodiments of the present invention.

[00237] Based on the above description of the embodiments, the present invention may be implemented by software together with the necessary general-purpose hardware, or by hardware alone. The technical solutions of the present invention may be embodied as a software product, which may be stored on a computer-readable storage device such as a floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), a hard disk, or an optical disc. The computer-readable storage device contains instructions that cause a computing device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention.

[00238] In the above embodiments of the apparatus for training a content moderation model and the apparatus for moderating video content, the units and modules are divided only according to functional logic, and the division is not limited to the one described above as long as the corresponding functions can be realized. In addition, the names of the functional units serve only to distinguish them from one another and are not intended to limit the scope of the present invention.

Claims

1. A method for training a video content moderation model, comprising:

extracting a portion of the video data from a sample video file as sample video data;

establishing a point in time of the sample video data in the sample video file when the sample video data contains invalid content;

extracting a significant portion of video data from the video data in the area of this point in time; and

training the video content moderation model based on the portion of video data and the sample video data;

wherein extracting the significant portion of video data from the video data in the area of this point in time comprises:

determining a time range that contains this point in time;

searching for a significant area detection model configured to detect a significant area in the video data; and

detecting the significant portion of video data in the video data by inputting the video data within the time range into the significant area detection model.

2. The method according to claim 1, wherein extracting a portion of the video data from the sample video file as sample video data comprises:

dividing the sample video file into at least two sample video fragments; and

extracting a portion of the video data from each sample video fragment as the sample video data.

3. The method of claim 2, wherein extracting a portion of the video data from the sample video file as the sample video data comprises at least one of the following:

ranking the sample video data in chronological order; and/or

scaling the sample video data to a predefined size.

4. The method according to claim 1, wherein establishing the point in time of the sample video data in the sample video file when the sample video data contains invalid content comprises:

searching for a violation recognition model configured to determine an image invalidity score of the content of video data;

determining the image invalidity score of the content of the sample video data by inputting the sample video data into the violation recognition model;

selecting the sample video data whose image invalidity score meets a predetermined invalidity condition; and

determining the point in time, in the sample video file, of the sample video data that meets the predetermined invalidity condition.

5. The method according to claim 4, wherein searching for the violation recognition model comprises:

determining the violation category with which the sample video file is labeled and which represents the invalid content; and

searching for the violation recognition model corresponding to the violation category, wherein the violation recognition model is configured to determine the image invalidity score of content of the video data that belongs to the violation category.

6. The method of claim 4, wherein selecting the sample video data whose image invalidity score meets the predetermined invalidity condition comprises:

determining whether the sample video data contains an image invalidity score that exceeds a predetermined image evaluation threshold;

determining that the image invalidity score meets the predetermined invalidity condition when the sample video data contains an image invalidity score that exceeds the predetermined image evaluation threshold; and

determining that the maximum image invalidity score meets the predetermined invalidity condition when the sample video data contains no image invalidity score that exceeds the predetermined image evaluation threshold.

7. The method according to any one of claims 1-6, wherein training the video content moderation model based on the portion of video data and the sample video data comprises:

determining the violation category with which the sample video file is labeled and which represents the invalid content;

obtaining a deep neural network and a pre-trained model;

initializing the deep neural network using the pre-trained model; and

training the deep neural network into the video content moderation model by error backpropagation, based on the portion of video data, the sample video data, and the violation category.

8. A video content moderation method, comprising:

extracting a portion of the video data from a video file being checked as target video data;

establishing a point in time of the video data being checked in the video file being checked when the video data being checked contains invalid content; and

moderating the content of the video file being checked by inputting the portion of video data and the video data being checked into a predefined video content moderation model;

wherein extracting a significant portion of video data from the video data in the area of this point in time comprises:

determining a time range that contains this point in time;

9. The method according to claim 8, wherein moderating the content of the video file being checked by inputting the portion of video data and the video data being checked into the predefined video content moderation model comprises:

determining a file invalidity score for the content of the video file being checked, in the case where that content belongs to a predefined violation category, by inputting the portion of video data and the video data being checked into the predefined video content moderation model;

determining a file evaluation threshold; and

determining that the video file being checked is valid when the file invalidity score is less than or equal to the file evaluation threshold.

10. The method according to claim 9, wherein moderating the content of the video file being checked by inputting the portion of video data and the video data being checked into the predefined video content moderation model further comprises:

transmitting the video file being checked to a client when the file invalidity score is greater than the file evaluation threshold;

determining that the content of the video file being checked is valid when first moderation information is received from the client; and

determining that the content of the video file being checked is invalid when second moderation information is received from the client.

11. The method of claim 9 or 10, wherein determining the file evaluation threshold comprises:

determining the total number of video files being checked for which a file invalidity score was determined during a previous period of time; and

setting the file evaluation threshold such that the ratio of the number of moderations to the total number corresponds to a predetermined push indicator, wherein the number of moderations is the number of video files being checked whose file invalidity scores are greater than the file evaluation threshold.

12. A device for training a video content moderation model, comprising:

a sample video data extraction module configured to extract a portion of the video data from a sample video file as sample video data;

a time point establishing module configured to establish a point in time of the sample video data in the sample video file when the sample video data contains invalid content;

a video data portion extraction module configured to extract a significant portion of video data from the video data in the area of this point in time; and

a model training module configured to train the video content moderation model based on the portion of video data and the sample video data;

wherein the video data portion extraction module comprises:

a time range determination submodule configured to determine a time range that contains this point in time;

a significant area detection model search submodule configured to search for a significant area detection model configured to detect a significant area of an image in the video data; and

a significant area detection model processing submodule configured to detect the significant portion of video data in the video data by inputting the video data within the time range into the significant area detection model.

13. A device for moderating video content, comprising:

a checked video data extraction module configured to extract a portion of the video data from a video file being checked as video data being checked;

a time point establishing module configured to establish a point in time of the video data being checked in the video file being checked when the video data being checked contains invalid content;

a video data portion extraction module configured to extract a significant portion of video data from the video data in the area of this point in time; and

a video moderation module configured to moderate the content of the video file being checked by inputting the portion of video data and the video data being checked into a predefined video content moderation model,

wherein the video data portion extraction module comprises:

14. A computing device for training a video content moderation model, comprising:

one or more processors;

a storage device configured to store one or more programs;

wherein running the one or more programs by the one or more processors causes the method for training a video content moderation model according to any one of claims 1-7 to be performed.

15. A non-volatile computer-readable storage device that stores a computer program, wherein running the computer program by a processor of a computing device causes the computing device to perform the method for training a video content moderation model according to any one of claims 1-7.

16. Computing device for training the video content moderation model, providing for:

one or more processors;

a storage device designed to store one or more programs;

wherein running the one or more programs by the one or more processors causes the video content moderation method according to any one of claims 8-11 to be performed.

17. A non-volatile computer-readable storage device that stores a computer program, wherein running the computer program by a processor of a computing device causes the computing device to perform the video content moderation method according to any one of claims 8-11.
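By way of non-limiting illustration, the file evaluation threshold of claim 11 and the decision of claims 9 and 10 can be sketched in Python as follows. The function names, the reading of the push indicator as a ratio `push_rate` between 0 and 1, the rounding down of the number of files handed over, and the sample scores are assumptions of this sketch and are not taken from the claims.

```python
from typing import Sequence


def file_evaluation_threshold(previous_scores: Sequence[float],
                              push_rate: float) -> float:
    """Pick a threshold so that about push_rate of previous files score above it."""
    if not previous_scores:
        raise ValueError("at least one previous file invalidity score is required")
    if not 0.0 <= push_rate <= 1.0:
        raise ValueError("push_rate must lie in [0, 1]")
    ranked = sorted(previous_scores, reverse=True)
    n_push = int(push_rate * len(ranked))  # files handed over for manual moderation
    if n_push >= len(ranked):
        return float("-inf")  # every file is handed over
    # Files scoring strictly above the (n_push + 1)-th highest score are handed over.
    return ranked[n_push]


def route(file_score: float, threshold: float) -> str:
    """Valid at or below the threshold, otherwise sent to the client for moderation."""
    return "valid" if file_score <= threshold else "manual_moderation"


# Hypothetical file invalidity scores from the previous period of time.
history = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
threshold = file_evaluation_threshold(history, push_rate=0.2)
print(threshold)               # 0.8
print(route(0.8, threshold))   # valid
print(route(0.95, threshold))  # manual_moderation
```

When several previous files share the score chosen as the threshold, fewer than the requested share of files lies strictly above it, so the ratio is met only approximately.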