CN111144360A

CN111144360A - Multimode information identification method and device, storage medium and electronic equipment

Info

Publication number: CN111144360A
Application number: CN201911410099.1A
Authority: CN
Inventors: 王猛; 敖乃翔; 王德勇; 师文喜; 张鑫; 吴少强; 王宇琪; 赵学义; 刘海强; 马磊
Original assignee: Xinjiang Lianhai Chuangzhi Information Technology Co ltd
Current assignee: Xinjiang Lianhai Chuangzhi Information Technology Co ltd
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2020-05-12

Abstract

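The invention provides a multimode information identification method, a device, a storage medium and electronic equipment. The identification method first acquires characteristic parameters of a video to be identified, then inputs the characteristic parameters into a trained target information early warning model, and the target information early warning model outputs a target video expression tendency. The invention thereby provides a technical scheme that automatically screens multimode information based on the early warning model and improves screening efficiency.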

Description

Multimode information identification method and device, storage medium and electronic equipment

Technical Field

The invention relates to the technical field of information identification, in particular to a multimode information identification method, a multimode information identification device, a storage medium and electronic equipment.

Background

Currently, some specific fields, such as the public security field, need to identify the content of videos. At present this content identification relies on manual screening, which is inefficient.

Therefore, how to provide a multimodal information identification method that can automatically screen multimodal information and improve screening efficiency is a major technical problem to be solved by those skilled in the art.

Disclosure of Invention

In view of this, the embodiment of the present invention provides a method for identifying multimodal information, which can automatically perform screening of multimodal information, and improve screening efficiency.

In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

a multimodal information identification method comprising:

acquiring characteristic parameters of a video to be recognized, wherein the characteristic parameters at least comprise one or more of face characteristic information, character behavior characteristic information, activity scene characteristic information, subtitle content characteristic information, video background characteristic information and audio characteristic information;

and inputting the characteristic parameters into a trained target information early warning model, and outputting a target video expression tendency by the target early warning model.

Optionally, the method further includes:

acquiring a user confirmation result triggered by the user based on the target video expression tendency;

generating a positive and negative sample set based on the user confirmation result;

and training the target information early warning model based on the positive and negative sample sets.

Optionally, the inputting the characteristic parameters into a trained target information early warning model, and outputting a target video expression tendency by the target early warning model includes:

acquiring the weight of each characteristic parameter in the target information early warning model;

and based on the weight, carrying out weighting processing on the characteristic parameters input into the target information early warning model, and outputting a target video expression tendency.

Optionally, the training the target information early warning model based on the positive and negative sample sets includes:

acquiring index information of each video in the positive and negative sample sets;

and training the target information early warning model based on the index information.

A multimode information identification device comprising:

the first acquisition module is used for acquiring characteristic parameters of a video to be identified, wherein the characteristic parameters at least comprise one or more of face characteristic information, character behavior characteristic information, activity scene characteristic information, subtitle content characteristic information, video background characteristic information and audio characteristic information;

and the processing module is used for inputting the characteristic parameters into the trained target information early warning model and outputting the target video expression tendency by the target early warning model.

Optionally, the device further includes:

the second acquisition module is used for acquiring a user confirmation result triggered by the user based on the target video expression tendency;

the generating module is used for generating a positive and negative sample set based on the user confirmation result;

and the training module is used for training the target information early warning model based on the positive and negative sample sets.

Optionally, the processing module includes:

the first acquisition unit is used for acquiring the weight of each characteristic parameter in the target information early warning model;

and the processing unit is used for performing weighting processing on the characteristic parameters input into the target information early warning model based on the weight and outputting a target video expression tendency.

Optionally, the training module includes:

the second acquisition unit is used for acquiring index information of each video in the positive and negative sample sets;

and the training unit is used for training the target information early warning model based on the index information.

A storage medium comprising a stored program, wherein a device on which the storage medium is located is controlled to execute any one of the above multimode information identification methods when the program runs.

An electronic device comprising at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to call program instructions in the memory to perform any one of the above multimodal information identification methods.

Based on the above technical scheme, the invention provides a multimode information identification method, a device, a storage medium and electronic equipment. The identification method first acquires characteristic parameters of a video to be identified, then inputs the characteristic parameters into a trained target information early warning model, and the target information early warning model outputs a target video expression tendency. The invention thereby provides a technical scheme that automatically screens multimode information based on the early warning model and improves screening efficiency.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description are only embodiments of the present invention, and those skilled in the art can obtain other drawings from the provided drawings without creative effort.

Fig. 1 is a schematic flowchart of a multimode information identification method according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a multimode information identification method according to an embodiment of the present invention;

Fig. 3 is a schematic flowchart of a multimode information identification method according to an embodiment of the present invention;

Fig. 4 is a schematic flowchart of a multimode information identification method according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a multimode information identification device according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a multimode information identification system according to an embodiment of the present invention;

Fig. 7 is a hardware schematic diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

Referring to Fig. 1, Fig. 1 is a schematic flowchart of a multimode information identification method according to an embodiment of the present invention. The method automatically screens multimode information based on an early warning model, which improves screening efficiency, and specifically includes the following steps:

S11, acquiring the characteristic parameters of the video to be identified.

In this embodiment, the characteristic parameters may include face characteristic information, character behavior characteristic information, activity scene characteristic information, subtitle content characteristic information, video background characteristic information, and audio characteristic information.

Specifically, the characteristic parameters of the video, covering pictures, subtitles, audio, titles and the like, can be extracted by constructing a multivariate data identification channel. For example, a specific-object identification model based on a feature pyramid and a video scene identification model based on a deep neural network are used to analyze and identify the vehicle features, face features and sensitive scene features in the video.

For another example, a semi-supervised algorithm and a data popularization method can be used to convert the Chinese and English speech in the video into text, and a neural network algorithm based on an input sequence pointer together with a line-distinguishing training method can be used to realize speech recognition, language identification and sensitive word retrieval for the speech data in the complex channel environment of the video.

For another example, based on a deep-learning OCR recognition framework, a multi-mode comprehensive algorithm is adopted to convert Uyghur-language images into text with high accuracy, taking into account the characteristics of Uyghur grammar, writing and the like. Based on a neural network machine translation model, a neural network algorithm with an input sequence pointer and an attention mechanism are used to semantically understand long passages of Uyghur text and translate them into Chinese.

Of course, the embodiment of the present invention may also adopt other identification manners of the characteristic parameters, and is not limited to the foregoing exemplary manner.
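The output of step S11 can be pictured as a record with one optional slot per characteristic parameter. The following is only a minimal sketch: the class name `FeatureParameters`, the field names and the value types (numeric vectors for visual features, text for subtitle and audio content) are assumptions made for illustration and are not specified in this description.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FeatureParameters:
    """Hypothetical container for the characteristic parameters of one video.

    Every field is optional because the method only requires that at least
    one of the six kinds of characteristic information is present.
    """
    face: Optional[List[float]] = None
    character_behavior: Optional[List[float]] = None
    activity_scene: Optional[List[float]] = None
    subtitle_content: Optional[str] = None
    video_background: Optional[List[float]] = None
    audio: Optional[str] = None

    def available(self) -> List[str]:
        """Return the names of the parameters that were actually extracted."""
        return [f.name for f in fields(self) if getattr(self, f.name) is not None]

    def validate(self) -> None:
        """Reject a record that carries no characteristic parameter at all."""
        if not self.available():
            raise ValueError("at least one characteristic parameter is required")


# Usage: a video for which only faces and subtitles could be extracted.
params = FeatureParameters(face=[0.1, 0.9], subtitle_content="example subtitle")
params.validate()
print(params.available())
```

Keeping every slot optional mirrors the "one or more of" wording of the method: a video with no audio track or no subtitles can still be passed to step S12.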

S12, inputting the characteristic parameters into the trained target information early warning model, and outputting the target video expression tendency by the target information early warning model.

It should be noted that, in the embodiment of the present invention, the target information early warning model needs to be trained in advance; the acquired characteristic parameters are then input into the trained model, and the model outputs the prediction result. The target video expression tendency is a label that characterizes what the video content expresses, such as mark identification, yellow identification (that is, pornographic content) and the like.

For example, the model may be established as follows: the data from the identification channels is screened, identified and classified to form effective metadata; an expert system is then built from front-line practical application experience; the multi-mode data is fused and analyzed; and a social security sensitive information early warning model suitable for social platforms on the mobile internet is established.

Therefore, based on artificial intelligence technology, the scheme comprehensively judges what a short video expresses through multi-angle recognition of the characters, behaviors, scenes, subtitles, backgrounds, voices and the like in the video, achieves automatic screening of the video, and improves screening efficiency compared with manual screening.

On the basis of the above embodiment, as shown in fig. 2, an embodiment of the present invention further provides a specific implementation manner in which the feature parameters are input into a trained target information early warning model, and the target early warning model outputs a target video expression tendency, including the steps of:

S21, acquiring the weight of each characteristic parameter in the target information early warning model;

S22, based on the weight, carrying out weighting processing on the characteristic parameters input into the target information early warning model, and outputting a target video expression tendency.

In the embodiment of the present invention, there are many identification channels, that is, many factors influence the determination result, so a weighted comprehensive determination method is needed as the standard for judging the video expression tendency. For example, for a content item, the person-and-vehicle factor may carry a weight of 60% (face, vehicle), the keyword factor a weight of 20% (speech, subtitles, title, comments), and the scene factor a weight of 20% (multi-action scene, logo, building).

And after the weight is determined, performing weighting processing on the characteristic parameters input into the target information early warning model, and further outputting the target video expression tendency.
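The weighting of steps S21 and S22 can be sketched with the example weights given above (60% person-and-vehicle, 20% keyword, 20% scene). This is an illustrative sketch only: the factor names, the assumption that each factor is summarized as a score in [0, 1], the decision threshold of 0.5 and the two output labels are hypothetical and are not taken from this description.

```python
from typing import Dict

# Example weights from the description; the dictionary keys are hypothetical names.
FACTOR_WEIGHTS: Dict[str, float] = {
    "person_vehicle": 0.6,  # face, vehicle
    "keyword": 0.2,         # speech, subtitles, title, comments
    "scene": 0.2,           # multi-action scene, logo, building
}


def weighted_score(factor_scores: Dict[str, float],
                   weights: Dict[str, float] = FACTOR_WEIGHTS) -> float:
    """Combine per-factor scores in [0, 1] into one weighted score.

    A factor that is missing from `factor_scores` contributes 0.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * factor_scores.get(name, 0.0) for name, w in weights.items())


def expression_tendency(factor_scores: Dict[str, float],
                        threshold: float = 0.5) -> str:
    """Map the weighted score to a label; threshold and labels are assumptions."""
    return "flagged" if weighted_score(factor_scores) >= threshold else "not flagged"


# Usage: strong face/vehicle evidence, moderate keyword evidence, no scene evidence.
scores = {"person_vehicle": 0.9, "keyword": 0.5, "scene": 0.0}
print(round(weighted_score(scores), 2), expression_tendency(scores))
```

With these numbers the weighted score is 0.6 × 0.9 + 0.2 × 0.5 + 0.2 × 0.0 = 0.64, which exceeds the assumed threshold, so the video would be presented to the user as flagged.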

Specifically, in order to further improve the output accuracy of the target information early warning model, as shown in Fig. 3, the embodiment of the present invention may further include, on the basis of the above multimode information identification method, the steps of:

S31, acquiring a user confirmation result triggered by the user based on the target video expression tendency;

S32, generating a positive and negative sample set based on the user confirmation result;

S33, training the target information early warning model based on the positive and negative sample sets.
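Steps S31 and S32 can be sketched as sorting user confirmations into two sets. The sketch below assumes that a video whose flagged tendency the user confirms becomes a positive sample and that a video the user rejects becomes a negative sample; this convention, the `Confirmation` record and its field names are hypothetical and are not fixed by this description.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Confirmation:
    """Hypothetical record of one user confirmation result (step S31)."""
    video_id: str
    predicted_tendency: str
    user_confirmed: bool  # True if the user agrees with the predicted tendency


def build_sample_sets(confirmations: Iterable[Confirmation]) -> Tuple[List[str], List[str]]:
    """Split confirmed and rejected videos into positive and negative sets (step S32)."""
    positives: List[str] = []
    negatives: List[str] = []
    for c in confirmations:
        (positives if c.user_confirmed else negatives).append(c.video_id)
    return positives, negatives


# Usage: two confirmed predictions and one rejected prediction.
results = [
    Confirmation("v1", "flagged", True),
    Confirmation("v2", "flagged", False),
    Confirmation("v3", "flagged", True),
]
positive_ids, negative_ids = build_sample_sets(results)
print(positive_ids, negative_ids)
```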

The training of the target information early warning model based on the positive and negative sample sets may be implemented in a manner as shown in fig. 4, including:

S41, acquiring index information of each video in the positive and negative sample sets;

S42, training the target information early warning model based on the index information.
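The description does not define the index information or the training algorithm. As one possible reading, the sketch below assumes that the index information is a video identifier used to look up stored per-factor scores, and it swaps in a plain logistic regression trained by stochastic gradient descent as a stand-in for the early warning model. The names `FACTORS`, `feature_store`, `train_by_index` and `predict` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical factor names; each stored score is assumed to lie in [0, 1].
FACTORS = ("person_vehicle", "keyword", "scene")


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, scores: Dict[str, float]) -> float:
    """Return the estimated probability that a video is a positive sample."""
    z = bias + sum(w * scores[name] for w, name in zip(weights, FACTORS))
    return _sigmoid(z)


def train_by_index(feature_store: Dict[str, Dict[str, float]],
                   positive_ids: Sequence[str],
                   negative_ids: Sequence[str],
                   epochs: int = 500,
                   learning_rate: float = 0.5) -> Tuple[List[float], float]:
    """Fit factor weights from videos looked up by their index (video id)."""
    # S41: resolve each index to its stored scores and attach a 1/0 label.
    samples = [(feature_store[v], 1.0) for v in positive_ids]
    samples += [(feature_store[v], 0.0) for v in negative_ids]

    # S42: logistic regression by stochastic gradient descent.
    weights = [0.0] * len(FACTORS)
    bias = 0.0
    for _ in range(epochs):
        for scores, label in samples:
            error = predict(weights, bias, scores) - label
            weights = [w - learning_rate * error * scores[name]
                       for w, name in zip(weights, FACTORS)]
            bias -= learning_rate * error
    return weights, bias


# Usage with made-up scores for four videos.
store = {
    "v1": {"person_vehicle": 0.9, "keyword": 0.8, "scene": 0.7},
    "v2": {"person_vehicle": 0.8, "keyword": 0.6, "scene": 0.9},
    "v3": {"person_vehicle": 0.1, "keyword": 0.2, "scene": 0.1},
    "v4": {"person_vehicle": 0.2, "keyword": 0.1, "scene": 0.3},
}
w, b = train_by_index(store, ["v1", "v2"], ["v3", "v4"])
print(predict(w, b, store["v1"]) > 0.5, predict(w, b, store["v3"]) < 0.5)
```

Looking samples up by index keeps the sample sets small (identifiers only) while the extracted features stay in a separate store; whether the patented method works this way is not stated.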

That is, the system identifies the video through a video content identification channel, for example by extracting and identifying the faces in a short video, identifying character behavior characteristics, identifying the characteristics of the activity scene, identifying the subtitle content, identifying the background characteristics of the video, and identifying the voice content in the video.

The identification results are then passed through the information early warning model, which weights each item and comprehensively judges the video expression tendency. The identified target video is displayed to the user, who confirms the identification result; at the same time, the system stores the video confirmation result in the positive and negative sample library. With this manual participation, the identification channel algorithm is optimized and the identification accuracy is improved.

On the basis of the above embodiment, as shown in fig. 5, an embodiment of the present invention further provides a multimode information identification apparatus, including:

the first obtaining module 51 is configured to obtain feature parameters of a video to be identified, where the feature parameters at least include one or more of face feature information, character behavior feature information, activity scene feature information, subtitle content feature information, video background feature information, and audio feature information;

and the processing module 52 is configured to input the characteristic parameters into the trained target information early warning model, and output a target video expression tendency by the target early warning model.

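In addition, the multimode information identification device may further include:

a second obtaining module, configured to obtain a user confirmation result triggered by the user based on the target video expression tendency;

a generating module, configured to generate a positive and negative sample set based on the user confirmation result;

and a training module, configured to train the target information early warning model based on the positive and negative sample sets.

Wherein the processing module may include:

a first obtaining unit, configured to obtain the weight of each characteristic parameter in the target information early warning model;

and a processing unit, configured to perform weighting processing on the characteristic parameters input into the target information early warning model based on the weight, and output a target video expression tendency.

In addition, the training module may include:

a second obtaining unit, configured to obtain index information of each video in the positive and negative sample sets;

and a training unit, configured to train the target information early warning model based on the index information.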

The working principle of the device is described in the above embodiments of the method, and will not be described repeatedly.

On the basis of the above embodiments, the embodiment of the present invention further provides a multimode information identification system, the structure of which is shown in fig. 6, and the multimode information identification system includes a database 61, a video content identification channel 62, an information early warning model 63, a positive and negative sample library 64, and an identification channel optimization module 65.
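How the five components of Fig. 6 might be wired together can be sketched as follows. This is a hypothetical arrangement: the class name, the use of plain callables for the identification channel, the early warning model and the optimization module, and the dictionary-based database and sample library are all assumptions; only the component names and reference numerals come from the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Scores = Dict[str, float]
SampleLibrary = Dict[str, List[str]]


@dataclass
class IdentificationSystem:
    """Hypothetical wiring of the components named in Fig. 6."""
    database: Dict[str, bytes]                    # database 61: video id -> raw video
    recognize: Callable[[bytes], Scores]          # video content identification channel 62
    warn: Callable[[Scores], str]                 # information early warning model 63
    optimize: Callable[[SampleLibrary], None]     # identification channel optimization module 65
    sample_library: SampleLibrary = field(        # positive and negative sample library 64
        default_factory=lambda: {"positive": [], "negative": []})

    def screen(self, video_id: str) -> str:
        """Run one video through the identification channel and the warning model."""
        return self.warn(self.recognize(self.database[video_id]))

    def record_confirmation(self, video_id: str, confirmed: bool) -> None:
        """Store the user's confirmation result in the sample library."""
        self.sample_library["positive" if confirmed else "negative"].append(video_id)

    def optimize_channel(self) -> None:
        """Hand the accumulated samples to the optimization module."""
        self.optimize(self.sample_library)


# Usage with stub components standing in for real models.
system = IdentificationSystem(
    database={"v1": b"raw video bytes"},
    recognize=lambda video: {"keyword": 1.0},
    warn=lambda scores: "flagged" if scores.get("keyword", 0.0) > 0.5 else "not flagged",
    optimize=lambda library: None,
)
print(system.screen("v1"))
system.record_confirmation("v1", confirmed=True)
system.optimize_channel()
```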

The multimode information identification device comprises a processor and a memory, wherein the first acquisition module, the processing module and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.

The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more kernels may be provided, and the multimode information is screened automatically by adjusting the kernel parameters, so that the screening efficiency is improved.

An embodiment of the present invention provides a storage medium having a program stored thereon, which when executed by a processor implements the multimode information identification method.

The embodiment of the invention provides a processor, which is used for running a program, wherein the multimode information identification method is executed when the program runs.

An embodiment of the present invention provides an electronic device, as shown in fig. 7, the device includes at least one processor 71, and at least one memory 72 and a bus 73 connected to the processor; the processor and the memory complete mutual communication through a bus; the processor is used for calling the program instructions in the memory to execute the multimode information identification method. The device herein may be a server, a PC, a PAD, a mobile phone, etc.

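The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:

acquiring characteristic parameters of a video to be identified, wherein the characteristic parameters at least comprise one or more of face characteristic information, character behavior characteristic information, activity scene characteristic information, subtitle content characteristic information, video background characteristic information and audio characteristic information;

and inputting the characteristic parameters into a trained target information early warning model, and outputting a target video expression tendency by the target information early warning model.

Optionally, the method further includes:

acquiring a user confirmation result triggered by the user based on the target video expression tendency;

generating a positive and negative sample set based on the user confirmation result;

and training the target information early warning model based on the positive and negative sample sets.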

In summary, the present invention provides a method, an apparatus, a storage medium and an electronic device for identifying multi-mode information, wherein the identifying method first obtains characteristic parameters of a video to be identified, then inputs the characteristic parameters into a trained target information early warning model, and outputs a target video expression tendency through the target early warning model. Therefore, the technical scheme of automatically screening the multimode information based on the early warning model is provided, and the screening efficiency is improved.

The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant points can be found in the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A multimode information identification method is characterized by comprising the following steps:

2. The multimodal information identification method as claimed in claim 1, further comprising:

3. The multimodal information identification method according to claim 1, wherein the inputting the feature parameters into a trained target information early warning model, and outputting target video expression tendency by the target early warning model comprises:

4. The multimodal information identification method according to claim 2, wherein the training the target information early warning model based on the positive and negative sample sets comprises:

5. A multimode information recognition device, comprising:

6. The multimode information identification device of claim 5, further comprising:

7. The multimodal information apparatus of claim 5 wherein the processing module comprises:

8. The multimodal information recognition apparatus of claim 6 wherein the training module comprises:

9. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device where the storage medium is located is controlled to execute the multimode information identification method according to any one of claims 1 to 4.

10. An electronic device comprising at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to call program instructions in the memory to perform the multimodal information identification method according to any one of claims 1 to 4.