CN116052142A - Information identification method and device

Info

Publication number: CN116052142A
Application number: CN202111262941.9A
Authority: CN (China)
Prior art keywords: information, active, text, target object, target
Legal status: Pending (an assumption, not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 陈小帅 (Chen Xiaoshuai)
Current assignee: Tencent Technology (Shenzhen) Co., Ltd.
Original assignee: Tencent Technology (Shenzhen) Co., Ltd.
Application filed by Tencent Technology (Shenzhen) Co., Ltd.
Priority to CN202111262941.9A
Publication of CN116052142A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application provides an information identification method and apparatus, applied to the field of computer technologies. The method includes: acquiring feature information of a target object; invoking an object recognition model to process the feature information of the target object to obtain active state information of the target object, where the active state information includes an active tendency probability and an active entity probability, and the object recognition model is trained based on training sample objects and their correspondingly labeled active labels and entity labels; and determining a recognition result of the target object based on the active state information, where the recognition result indicates whether the target object is an active object. The method and apparatus improve both the efficiency and the accuracy of active object identification.

Description

Information identification method and device
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an information identification method and apparatus.
Background
An object may be a post published about an intellectual property (Intellectual Property, IP) entity such as a person (e.g., an actor, singer, or director), a program, or a work. At present, active objects are identified mainly through rules and manual review, and the multimodal data of an object is not considered; as a result, active object identification is inefficient, its recall rate and accuracy rate are low, and the distribution requirements for active objects are difficult to meet.
Disclosure of Invention
The embodiment of the invention provides an information identification method and an information identification device, which can improve the identification efficiency and the identification accuracy of active objects.
In one aspect, an embodiment of the present invention provides an information identification method, including:
acquiring characteristic information of a target object;
invoking an object recognition model to process the characteristic information of the target object to obtain the active state information of the target object, wherein the active state information comprises an active tendency probability and an active entity probability, and the object recognition model is trained based on training sample objects and their correspondingly labeled active labels and entity labels;
and determining a recognition result of the target object based on the active state information, wherein the recognition result is used for indicating whether the target object is an active object or not.
In one aspect, an embodiment of the present application provides an information identifying apparatus, including:
the acquisition unit is used for acquiring the characteristic information of the target object;
the processing unit is used for calling an object recognition model to process the characteristic information of the target object to obtain the active state information of the target object, wherein the active state information comprises an active tendency probability and an active entity probability, and the object recognition model is trained based on training sample objects and their correspondingly labeled active labels and entity labels;
and a determining unit, used for determining, based on the active state information, a recognition result of the target object, the recognition result being used to indicate whether the target object is an active object.
In one aspect, an embodiment of the present application provides a computer device, where the computer device includes a memory and a processor, and the memory stores a computer program, where the computer program, when executed by the processor, causes the processor to execute the information identifying method described above.
In one aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program that, when read and executed by a processor of a computer device, causes the computer device to perform the above-described information identification method.
In one aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions so that the computer device performs the information identifying method described above.
According to the embodiments of the present application, feature information of a target object is acquired; an object recognition model is invoked to process the feature information to obtain active state information of the target object, where the active state information includes an active tendency probability and an active entity probability, and the object recognition model is trained based on training sample objects and their correspondingly labeled active labels and entity labels; and a recognition result indicating whether the target object is an active object is determined based on the active state information. In this way, the efficiency and accuracy of active object identification are improved, effective active objects can be quickly mined from massive data, and the distribution effect of active objects is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an architecture of an information recognition system according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of an information identification method according to an embodiment of the present invention;
FIG. 3a is a schematic diagram of an interactive interface according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of another interactive interface provided by an embodiment of the present invention;
fig. 4 is a flowchart of another information identifying method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a text object recognition network according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an image-text object recognition network according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a video object recognition network according to an embodiment of the present invention;
FIG. 8 is a flowchart of another information identification method according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an information identifying apparatus according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the protection scope of the present invention.
It should be noted that the descriptions of "first," "second," and the like in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a technical feature defining "first", "second" may include at least one such feature, either explicitly or implicitly.
First, some terms related to embodiments of the present application are explained for easy understanding by those skilled in the art.
BERT model: short for Bidirectional Encoder Representations from Transformers, a language model proposed by Google that pre-trains deep bidirectional representations (embeddings) by jointly conditioning on both left and right context in all Transformer layers.
Optical character recognition (Optical Character Recognition, OCR) technology: the process of analyzing, recognizing, and processing image files of text data to obtain the text and layout information, i.e., the text in an image is recognized and returned in text form.
Automatic speech recognition (Automatic Speech Recognition, ASR) technology: a technology that converts human speech into text. Through speech signal processing and pattern recognition, a machine automatically recognizes and understands the speech signal and converts it into corresponding text or commands. The main flow includes speech input, encoding (feature extraction), decoding, and text output.
Deep residual network (Deep Residual Network, ResNet): ResNet addresses the difficulty of training deep convolutional neural network (Convolutional Neural Network, CNN) models. A residual network is easy to optimize and can gain accuracy from considerably increased depth. Its internal residual blocks use skip connections, which alleviate the vanishing-gradient problem caused by increasing depth in a deep neural network.
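The skip connection described above can be made concrete with a minimal, illustrative sketch, assuming the PyTorch library; this is not the patent's actual network definition.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection adds the input to the convolved output,
        # which eases gradient flow as depth increases.
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))
```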
Video fingerprint detection: inherent feature information of the video content is extracted, and an algorithm derives from it a digital sequence that uniquely corresponds to and identifies the video, without embedding any information in the video, so the integrity of the video is preserved. Because a video fingerprint is extracted from the video content itself, it is uniquely associated with that content, and the corresponding video can be found by searching for its fingerprint. Video fingerprints also have good robustness and discriminability: a video remains distinguishable after the kinds of attacks Internet video commonly suffers, which facilitates applications such as authentication.
Self-Attention model: an attention model mimics the internal process of biological observation, a mechanism that aligns internal experience with external sensation to increase the precision with which a local region is observed. Attention models can quickly extract important features from sparse data and are therefore widely used in natural language processing tasks, particularly machine translation. The self-attention mechanism improves on the attention model by reducing dependence on external information; it is better at capturing the internal correlations of data or features.
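As a concrete illustration of the mechanism just described, the following is a minimal scaled dot-product self-attention sketch in NumPy; the projection matrices are random stand-ins for learned parameters, not the patent's implementation.

```python
import numpy as np

def self_attention(x: np.ndarray, w_q, w_k, w_v) -> np.ndarray:
    """x: (seq_len, d_model); returns (seq_len, d_v)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise relevance of tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                               # weighted sum of the values

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))                         # 5 tokens, d_model = 16
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)               # shape (5, 16)
```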
In order to improve the identification efficiency and the identification accuracy of active objects, an information identification method is provided in the embodiments of the present application.
The information identification method provided in the embodiments of the present application can be implemented based on artificial intelligence technology. Artificial intelligence (Artificial Intelligence, AI) is a comprehensive discipline spanning a wide range of fields, at both the hardware level and the software level. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, cloud storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Computer Vision (CV) is the science of how to make machines "see"; more specifically, it replaces human eyes with cameras and computers to recognize, track, and measure targets, and further processes the results into images better suited for human viewing or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and technologies for building artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision technologies typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods for effective communication between humans and computers in natural language, integrating linguistics, computer science, and mathematics. Research in this field involves natural language, i.e., the language people use daily, so it is closely related to linguistics. Natural language processing technologies typically include text processing, semantic understanding, machine translation, question answering, and knowledge graph techniques.
With the research and progress of artificial intelligence technology, it has been developed and applied in many fields. The embodiments of the present application may specifically involve the computer vision and natural language processing technologies described above when implementing the information identification method.
Referring to fig. 1, fig. 1 is a schematic architecture diagram of an information identifying system according to an embodiment of the present application, and as shown in fig. 1, the information identifying system 100 may include a plurality of terminal devices 101 and a server 102. Of course, the information identifying system 100 may also include one or more terminal devices 101 and a plurality of servers 102, which are not limited in this embodiment. The terminal device 101 is mainly configured to send one or more target objects to the server 102, and receive one or more target objects that are active objects and are sent by the server 102 as a result of recognition; the server 102 is mainly configured to perform relevant steps of the information recognition method, obtain a recognition result of the target object, and send the target object whose recognition result is an active object to the terminal device 101. The terminal device 101 and the server 102 may implement a communication connection, and a connection manner thereof may include a wired connection and a wireless connection, which is not limited herein.
In one possible implementation manner, the terminal device 101 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart car, etc.; the server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms, and the like.
In combination with the above information identification system, the information identification method of the embodiments of the present application may generally include: the server 102 obtains the feature information of a target object sent by the terminal device 101 and invokes an object recognition model to process it, obtaining active state information of the target object, where the active state information includes an active tendency probability and an active entity probability, and the object recognition model is trained based on training sample objects and their correspondingly labeled active labels and entity labels. The server 102 then determines, based on the active state information, a recognition result indicating whether the target object is an active object. Finally, the server 102 sends target objects whose recognition result is an active object to the terminal device 101. In short, to judge whether a target object is an active object, the method first obtains the object's feature information, then processes that information with the object recognition model to obtain the object's active state information, and finally judges from the active state information whether the object is active, which improves the efficiency and accuracy of active object identification.
In one embodiment, the object recognition model may include a plurality of recognition networks, for example a text object recognition network, an image-text object recognition network, and a video object recognition network. Each of these recognition networks may include processing modules at different levels; illustratively, the image-text object recognition network includes a BERT layer composed of a BERT network and a feature representation layer, and the video object recognition network includes a BERT layer composed of a BERT network, a feature representation layer, and a self-attention layer.
It may be understood that the schematic diagram of the system architecture described in the embodiments of the present application is for more clearly describing the technical solution of the embodiments of the present application, and does not constitute a limitation on the technical solution provided in the embodiments of the present application, and those skilled in the art can know that, with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided in the embodiments of the present application is equally applicable to similar technical problems.
Based on the above description of the architecture of the information recognition system, the embodiment of the present application discloses an information recognition method, please refer to fig. 2, which is a schematic flow chart of an information recognition method disclosed in the embodiment of the present application, where the information recognition method may be executed by a computer device, and the computer device may specifically be the server 102 in the information recognition system. The information identification method specifically includes steps S201 to S203:
S201, acquiring characteristic information of the target object.
In the embodiments of the present application, an object may refer to a user post, i.e., comments, thoughts, and the like that a user publishes in an online forum; a post may include text, images, sound, video, and so on. An active object may specifically refer to a user post published about a certain entity to increase that entity's popularity.
As shown in fig. 3a, fig. 3a is a schematic diagram of an interactive interface provided in an embodiment of the present application. The interface is the interactive interface for star A and includes star A's popularity ranking, number of fans, a posting option, a discussion area, a video area, a works area, and an album area. When the terminal device detects a user's click on the posting option, an object can be posted. Objects posted by users are displayed in the discussion area; fig. 3a shows a text post published by user "111111" that includes the text content "Star A will attend the award ceremony tomorrow!" and the associated topic "Star A". It follows that this text post has no active tendency, i.e., it is not an active object.
As shown in fig. 3b, fig. 3b is a schematic diagram of another interactive interface provided in an embodiment of the present application. The interface is also the interactive interface for star A and includes star A's popularity ranking, number of fans, a chart-boosting option, a join option, a posting option, a discussion area, a video area, a works area, and an album area. When the terminal device detects a user's click on the posting option, an object can be posted. Objects posted by users are displayed in the discussion area; fig. 3b shows an image-text post published by user "222222" that includes the text content "Wishing everyone a happy weekend! Like and cheer!", the associated topic "Star A", and pictures of star A. It follows that this image-text post has an active tendency, i.e., it is an active object.
It should be noted that the target object may be any type of user post; for example, a post may be a text post, an image-text post, a video post, or a mixed text-picture-video post. Different types of target objects correspond to different feature information. If the target object is a text object, its feature information includes the text information corresponding to the text content of the object. If the target object is an image-text object, its feature information includes the text information corresponding to the text content of the object, as well as the character information and text information corresponding to the pictures in the object. If the target object is a video object, its feature information includes the text information corresponding to the text content of the object; the character information and text information corresponding to the pictures in the object; and the character information, video fingerprint information, and text information corresponding to the video in the object.
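The per-type feature information described above can be pictured as a simple data structure. The sketch below is an assumption for illustration only; the field names are not the patent's data schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ObjectFeatures:
    object_type: str  # "text", "image_text", or "video"
    text: str         # text content of the post itself
    picture_persons: List[str] = field(default_factory=list)    # faces found in pictures
    picture_text: List[str] = field(default_factory=list)       # OCR text from pictures
    video_persons: List[str] = field(default_factory=list)      # faces found in the video
    video_fingerprints: List[str] = field(default_factory=list) # matched video works
    video_text: List[str] = field(default_factory=list)         # OCR/ASR text from the video
```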
In one possible implementation, the feature information of the target object includes association information of the target object, and the association information includes one or more of an associated topic and an associated entity. A target object may carry specific association information, which may be the entity or the topic that the object corresponds to; this is not limited here. For example, when a user post about star A is published, the post may indicate association information such as star A, a film or television work star A participates in, or an activity star A participates in. The association information serves as an auxiliary text feature of the object to be identified and is spliced in front of the text content of the target object, making full use of the object's relationship data and thereby improving the efficiency and accuracy of active object identification.
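As a sketch of the splicing step above: the association information is concatenated in front of the post's text before it is fed to the model. The separator token and function name here are assumptions for illustration, not the patent's specification.

```python
def build_model_input(topics, entities, text):
    # Splice associated topics/entities in front of the text content,
    # so the model sees the object's relationship data as auxiliary text.
    assoc = " ".join(list(topics) + list(entities))
    return f"{assoc} [SEP] {text}" if assoc else text

build_model_input(["Star A"], ["Star A"], "Star A will attend the award ceremony!")
# -> "Star A Star A [SEP] Star A will attend the award ceremony!"
```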
S202, invoking an object recognition model to process the characteristic information of the target object to obtain the active state information of the target object, where the active state information includes an active tendency probability and an active entity probability, and the object recognition model is trained based on training sample objects and their correspondingly labeled active labels and entity labels.
In the embodiments of the present application, active object identification has two main targets: first, the object must have a sufficiently strong active tendency; second, the specific active entity must be identified. The object recognition model performs multi-target joint training based on the training sample objects and their correspondingly labeled active labels and entity labels, linking the two recognition targets and thereby improving recognition accuracy. The feature information of the target object is processed by the object recognition model to obtain the object's active state information, namely its active tendency probability and active entity probability.
In one possible implementation, the method further includes: acquiring a training data set, where the training data set includes training sample objects and their correspondingly labeled active labels and entity labels, the active label indicating whether a training sample object is an active object and the entity label indicating each entity included in the training sample object; and training an initial neural network model with the training data set to obtain the object recognition model. The object recognition model is trained on massive numbers of objects, and the specific training process includes the following steps:
1. Acquisition of training data sets: and labeling the active labels and the entity labels on the plurality of training sample objects, and forming a training data set by the plurality of training sample objects, the corresponding labeled active labels and the entity labels. The active tag is used for indicating whether the training sample object is an active object, and the entity tag is used for indicating each entity included in the training sample object. For example, when the training sample object is an active object, a label of "1" is used; when the training sample object is not an active object, the label "0" is used to indicate. The training sample object M is known to be an active object, and thus the active label of the training sample object M is 1. The training sample object M includes entities including star a, star B, work C and program D, so that the entity labels of the training sample object M are star a, star B, work C and program D.
2. Training process: the training sample objects are input into the initial neural network model, the model is trained on them, its parameters are adjusted using the active label and entity label annotated for each training sample object, and training is completed once the preset number of training iterations is reached, yielding the object recognition model.
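The multi-target joint training can be sketched as one loss per target, summed. The following is a hedged illustration assuming PyTorch; the loss form and weighting are assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn

def joint_loss(active_logits, entity_logits, active_labels, entity_labels,
               entity_weight: float = 1.0) -> torch.Tensor:
    # active_logits: (batch, 2) active/not-active scores per object.
    # entity_logits: (batch, seq_len, num_entity_tags) per-token entity scores.
    active_loss = nn.functional.cross_entropy(active_logits, active_labels)
    entity_loss = nn.functional.cross_entropy(
        entity_logits.flatten(0, 1), entity_labels.flatten())
    # Joint training links the two recognition targets through shared parameters.
    return active_loss + entity_weight * entity_loss
```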
In one possible implementation, the initial neural network model includes a text object recognition network, an image-text object recognition network, and a video object recognition network, and training the initial neural network model with the training data set includes: training the text object recognition network with the text objects in the training sample objects and their correspondingly labeled active labels and entity labels; initializing the network parameters of the image-text object recognition network and of the video object recognition network with the trained network parameters of the text object recognition network; and training the initialized image-text object recognition network with the image-text objects in the training sample objects and their labels, and training the initialized video object recognition network with the video objects in the training sample objects and their labels. It should be noted that, because publishing a text object costs the least, text objects are the most numerous; in the embodiments of the present application the text object recognition network therefore serves as the base recognition network, and its trained network parameters are used to initialize the image-text object recognition network and the video object recognition network, so that these two networks start training from a better state, reducing the training cost.
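The parameter hand-off can be sketched as a partial state-dict copy, assuming PyTorch; this is an illustration, not the patent's procedure.

```python
import torch.nn as nn

def init_from_text_network(text_net: nn.Module, other_net: nn.Module) -> None:
    # Copy the trained text-network parameters into the image-text or video
    # network; strict=False leaves layers the text network lacks (e.g. the
    # feature representation and self-attention layers) at fresh initialization.
    other_net.load_state_dict(text_net.state_dict(), strict=False)
```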
S203, determining a recognition result of the target object based on the active state information, wherein the recognition result is used for indicating whether the target object is an active object or not.
In the embodiments of the present application, after the object recognition model is invoked to process the feature information of the target object and obtain its active state information, the recognition result of the target object can be determined from that information, i.e., the target object is determined to be, or not to be, an active object, completing the recognition of the target object.
In one possible implementation manner, the active state information of the target object specifically includes an active tendency probability and an active entity probability of each text segment corresponding to the feature information, and determining the recognition result of the target object based on the active state information includes: if a target text segment exists in each text segment, the probability of the activity tendency of the target text segment is larger than a first preset threshold, and the probability of the activity entity of the target text segment is larger than a second preset threshold, determining that the identification result of the target object is an active object, wherein the target text segment is any one of the text segments. That is, as long as the probability of the activity tendency of any one text segment in the text segments corresponding to the feature information is greater than the first preset threshold value, and the probability of the activity entity of the text segment is greater than the second preset threshold value, the target object is considered to be an active object.
For example, the first preset threshold is 60%, the second preset threshold is 50%, and the feature information of the target object N is divided into 3 text segments, namely, a text segment a, a text segment b, and a text segment c. The probability of the active tendency of the text segment a is 90%, and the probability of the active entity of the text segment a is 85%; the probability of the activity tendency of the text segment b is 45%, and the probability of the activity entity of the text segment b is 30%; the probability of the active tendency of the text segment c is 25%, and the probability of the active entity of the text segment c is 10%. Among the 3 text segments, the probability of the activity tendency of the text segment a is larger than a first preset threshold value, and the probability of the activity entity of the text segment a is larger than a second preset threshold value, so that the recognition result of the target object N is determined to be an active object.
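The decision rule in this example can be written directly; a minimal sketch, with the example's thresholds as defaults.

```python
def is_active_object(segments, t_tendency: float = 0.60, t_entity: float = 0.50) -> bool:
    # The object is active if any one text segment exceeds both thresholds.
    return any(p_tend > t_tendency and p_ent > t_entity
               for p_tend, p_ent in segments)

# Target object N: segments a, b, c as (tendency, entity) probabilities.
segments_n = [(0.90, 0.85), (0.45, 0.30), (0.25, 0.10)]
assert is_active_object(segments_n)  # segment a clears both thresholds
```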
In one possible implementation, the method further includes: if the target object is an active object, acquiring the target entity corresponding to the active object; and sending the target object to the client of each user associated with the target entity. After active objects are accurately mined from massive objects, the target entity corresponding to an active object can be further determined, and the active object is then sent to the client of each user associated with that entity, which improves the distribution value of active objects and enhances the interaction atmosphere among users. In addition, an identified active object can be placed in a global active object pool for storage and later distributed to users according to their individual requirements, or placed in the dedicated active object pool of its corresponding target entity and then distributed likewise.
In summary, in the embodiments of the present invention, feature information of a target object is acquired; an object recognition model is invoked to process the feature information to obtain active state information of the target object, where the active state information includes an active tendency probability and an active entity probability, and the object recognition model is trained based on training sample objects and their correspondingly labeled active labels and entity labels; and a recognition result indicating whether the target object is an active object is determined based on the active state information. Because the feature information of the target object is extracted and processed by the object recognition model to obtain the object's active state information, whether the target object is an active object can be judged from that information, improving the efficiency and accuracy of active object identification.
Referring to fig. 4, a flowchart of another information identifying method disclosed in an embodiment of the present application is shown, where the information identifying method may be performed by a computer device, and the computer device may specifically be the server 102 in the information identifying system. The information identification method may specifically include steps S401 to S405. Steps S402 to S404 are a specific implementation manner of step S202. Wherein:
S401, acquiring characteristic information of a target object.
The specific implementation manner of step S401 is the same as that of step S201, and will not be described herein.
S402, obtaining the object type of the target object.
In the embodiments of the present application, the object type of the target object may be a text object, an image-text object, or a video object. The corresponding recognition network is selected from the plurality of recognition networks included in the object recognition model according to the object type of the target object.
S403, determining a target recognition network from a plurality of recognition networks included in the object recognition model according to the object type, where the plurality of recognition networks include one or more of a text object recognition network, an image-text object recognition network, and a video object recognition network.
In the embodiments of the present application, different object types correspond to different recognition networks. For example, if the object type of the target object is a text object, the target recognition network determined from the plurality of recognition networks included in the object recognition model is the text object recognition network; if it is an image-text object, the target recognition network is the image-text object recognition network; and if it is a video object, the target recognition network is the video object recognition network. It should be noted that the plurality of recognition networks may also include other types of recognition networks, which are not limited here.
S404, invoking the target recognition network to process the characteristic information of the target object to obtain the active state information of the target object.
In one possible implementation, invoking the target recognition network to process the feature information of the target object to obtain the active state information of the target object includes: invoking the target recognition network to process each text segment corresponding to the feature information of the target object to obtain the active state information of each text segment; and determining the active state information of the target object from the active state information of the text segments. That is, the feature information of the target object may be divided into a plurality of text segments; the target recognition network determined in step S403 is invoked to process each text segment, obtaining the active state information of each segment, and the active state information of the target object is then determined from the per-segment information. This implementation improves the accuracy of active object identification. The length of each text segment may be preset, and the preset value is not limited here.
For example, suppose the text segment length is preset to 10 words, the target object M is a text object, the feature information of M is the text information corresponding to its text content and contains 30 words in total, and the target recognition network corresponding to M is the text object recognition network. The feature information can then be divided into 3 text segments; the text object recognition network is invoked to process these 3 segments, obtaining their active state information, and the active state information of the target object M is then determined from the active state information of the 3 segments.
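A minimal sketch of the segmentation and per-segment processing, assuming a whitespace-tokenized text and a callable per-segment network; the aggregation rule is one possibility, since the patent leaves it open.

```python
def recognize_by_segments(feature_text: str, segment_net, seg_len: int = 10):
    words = feature_text.split()
    segments = [" ".join(words[i:i + seg_len])
                for i in range(0, len(words), seg_len)]
    # segment_net returns (active_tendency_prob, active_entity_prob) per segment.
    per_segment = [segment_net(s) for s in segments]
    # One possible aggregation: take each probability's maximum over segments.
    return (max(p for p, _ in per_segment),
            max(p for _, p in per_segment),
            per_segment)
```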
In one possible implementation, the object type is a text object, and invoking the target recognition network to process each text segment corresponding to the feature information of the target object to obtain the active state information of each segment includes: invoking the text object recognition network in the object recognition model to process the feature information to obtain intermediate result information for each text segment corresponding to the feature information, where the intermediate result information includes one or more of active prediction information, local context feature information, global context feature information, and interval length feature information; and invoking the text object recognition network to fuse the pieces of information included in the intermediate result information, obtaining the active state information of each text segment.
It should be noted that fusing multiple kinds of feature information exploits them jointly, makes their strengths complementary, and reduces the limitations of any single feature, thereby improving the accuracy of information identification. The fusion method may be feature fusion based on Bayesian theory, on sparse representation theory, or on deep learning theory, which is not limited here. For example, the pieces of information included in the intermediate result information may be fused by concatenating (splicing) them, by element-wise summation, by taking an element-wise maximum, or by a learned network layer. The fusion of picture context feature information, or of video context feature information, with the pieces of information included in the intermediate result information may use any of the above methods and is not described again.
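The simple fusion options named above can be sketched as follows (NumPy assumed); a learned network layer would replace these with, for example, a linear projection over the concatenation.

```python
import numpy as np

def fuse(features, mode: str = "concat") -> np.ndarray:
    # features: a list of 1-D feature vectors for one text segment.
    if mode == "concat":
        return np.concatenate(features, axis=-1)
    if mode == "sum":
        return np.sum(features, axis=0)   # requires equal shapes
    if mode == "max":
        return np.max(features, axis=0)   # requires equal shapes
    raise ValueError(f"unknown fusion mode: {mode}")
```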
Referring to fig. 5, fig. 5 is a schematic structural diagram of a text object recognition network according to an embodiment of the present application. As shown in fig. 5, the text object recognition network includes a BERT layer composed of a BERT network. The text information corresponding to the text content of a text object, together with the association information of the object, is input into the BERT layer for processing, and intermediate result information is output for each text segment of the object, including active prediction information, local context feature information, global context feature information, and interval length feature information. The pieces of information included in the intermediate result information are then fused to obtain the active state information of each text segment. The active prediction information indicates the active tendency probability and the active entity probability of the text segment.
In one possible implementation, the object type is an image-text object, and the feature information of the target object includes the text information corresponding to the text content of the object as well as the character information and text information corresponding to the pictures in the object. Invoking the target recognition network to process each text segment corresponding to the feature information to obtain the active state information of each segment includes: acquiring the picture context feature information of the target object; invoking the image-text object recognition network in the object recognition model to process the feature information to obtain intermediate result information for each text segment, where the intermediate result information includes one or more of active prediction information, local context feature information, global context feature information, and interval length feature information; and invoking the image-text object recognition network to fuse the picture context feature information with the pieces of information included in the intermediate result information, obtaining the active state information of each text segment.
It should be noted that the character information corresponding to a picture in the target object may be obtained by face detection on the picture; an open-source face detection tool may be used to identify the persons in the picture. The text information corresponding to the picture may be recognized by performing OCR on the picture. The character information and OCR text recognized from the picture are appended to the feature information of the target object, enhancing the object's text content and thereby improving the recognition effect.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an image-text object recognition network according to an embodiment of the present application. As shown in fig. 6, the image-text object recognition network includes a BERT layer composed of a BERT network and a feature representation layer composed of a ResNet network. The text information corresponding to the text content of an image-text object, the association information of the object, and the character information and text information corresponding to its pictures are input into the BERT layer for processing, and intermediate result information is output for each text segment of the object, including active prediction information, local context feature information, global context feature information, and interval length feature information. The feature representation layer processes the pictures of the image-text object to obtain its picture context feature information. The picture context feature information and the pieces of information included in the intermediate result information are then fused to obtain the active state information of each text segment. The active prediction information indicates the active tendency probability and the active entity probability of the text segment.
In one possible implementation, the object type is a video object, and the feature information of the target object includes the text information corresponding to the text content of the object; the character information and text information corresponding to the pictures in the object; and the character information, video fingerprint information, and text information corresponding to the video in the object. Invoking the target recognition network to process each text segment corresponding to the feature information to obtain the active state information of each segment includes: acquiring the video context feature information of the target object; invoking the video object recognition network in the object recognition model to process the feature information to obtain intermediate result information for each text segment, where the intermediate result information includes one or more of active prediction information, local context feature information, global context feature information, and interval length feature information; and invoking the video object recognition network to fuse the video context feature information with the pieces of information included in the intermediate result information, obtaining the active state information of each text segment.
It should be noted that the character information corresponding to the video in the target object may be obtained by face detection on the video, and the video fingerprint information may be obtained by video fingerprint detection, which identifies whether the video is derived from a certain video work (such as a television series, a movie, a variety show, or an animation). In addition, the text information corresponding to the video may be recognized by performing OCR and ASR on the video, which recovers the subtitle text and the spoken dialogue text in the video. The character information, video fingerprint information, and OCR/ASR text recognized from the video are appended to the feature information of the target object, enhancing the object's text content and helping to improve the recognition effect for active objects.
Optionally, acquiring the video context feature information of the target object includes: acquiring at least one frame of the video in the target object; determining an image sequence based on the at least one frame and the pictures already in the target object; invoking the feature representation layer in the video object recognition network to process each image in the image sequence to obtain the feature representation information of each image; and inputting the feature representation information of each image into the self-attention layer in the video object recognition network for fusion processing to obtain the video context feature information of the target object.
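A hedged sketch of these steps, assuming PyTorch; `feature_layer` and `self_attn` stand for the feature representation layer and the self-attention layer and are placeholders, not the patent's modules.

```python
import torch

def video_context_features(video_frames, post_pictures, feature_layer, self_attn):
    # Form the image sequence from sampled video frames plus the post's pictures.
    image_seq = list(video_frames) + list(post_pictures)
    # The feature representation layer (e.g. a ResNet) encodes each image.
    reps = torch.stack([feature_layer(img) for img in image_seq])
    # Self-attention fuses the per-image representations into video context features.
    return self_attn(reps)
```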
Referring to fig. 7, fig. 7 is a schematic structural diagram of a video object recognition network according to an embodiment of the present application. As shown in fig. 7, the video object recognition network includes a BERT layer composed of a BERT network, a feature representation layer composed of a ResNet network, and a self-attention layer composed of a self-attention network. The text information corresponding to the text content of a video object, the association information of the object, the character information and text information corresponding to its pictures, and the character information, video fingerprint information, and text information corresponding to its video are input into the BERT layer for processing, and intermediate result information is output for each text segment of the object, including active prediction information, local context feature information, global context feature information, and interval length feature information. At least one frame is extracted from the video in the target object and forms an image sequence together with the pictures already in the object; each image in the sequence is processed by the feature representation layer to obtain its feature representation information, and the feature representation information of all images is fused by the self-attention layer to obtain the video context feature information of the target object. The video context feature information and the pieces of information included in the intermediate result information are then fused to obtain the active state information of each text segment. The active prediction information indicates the active tendency probability and the active entity probability of the text segment.
S405, determining a recognition result of the target object based on the active state information, wherein the recognition result is used for indicating whether the target object is an active object or not.
The specific implementation manner of step S405 is the same as that of step S203, and will not be described herein.
In summary, in the embodiments of the present invention, feature information of a target object is acquired; the object type of the target object is obtained; a target recognition network is determined, according to the object type, from the plurality of recognition networks included in the object recognition model, where the plurality of recognition networks include one or more of a text object recognition network, an image-text object recognition network, and a video object recognition network; the target recognition network is invoked to process the feature information to obtain active state information that includes an active tendency probability and an active entity probability, the object recognition model being trained based on training sample objects and their correspondingly labeled active labels and entity labels; and a recognition result indicating whether the target object is an active object is determined based on the active state information. Because the feature information of the target object is extracted, the object's type is used to select the corresponding recognition network in the object recognition model, and that network processes the feature information to obtain the active state information, whether the target object is an active object can be judged from its active state information.
Referring to fig. 8, a flowchart of another information identification method disclosed in an embodiment of the present application is shown. The information identification method may be executed jointly by the terminal device 101 and the server 102 in the information identification system, and specifically includes steps S801 to S804. Wherein:
S801, a user publishes a post, and active object recognition is performed on the post.
In this embodiment of the present application, the post corresponds to the above target object. After the user publishes the post through the client, the server performs active object recognition on the post, where an active object may specifically refer to a post published about a certain entity in order to improve the popularity of that entity.
S802, feature construction is performed on the post by the video platform.
In this embodiment of the present application, the video platform corresponds to the above server. The server performs feature construction on the post published by the user to obtain the feature information of the post. The specific implementation manner of step S802 is the same as that of step S201, and will not be described again here.
S803, text form active object recognition, image-text form active object recognition or video form active object recognition is performed on the post.
In this embodiment of the present application, a corresponding active object recognition model is selected according to the form of the post to recognize the post: the text form active object recognition model is used for recognizing posts in text form, the image-text form active object recognition model is used for recognizing posts in image-text form, and the video form active object recognition model is used for recognizing posts in video form. The active object recognition capability is constructed by training on object annotation data to obtain the three active object recognition models, namely the text form active object recognition model, the image-text form active object recognition model and the video form active object recognition model. The text form active object recognition model corresponds to the above text object recognition network, the image-text form active object recognition model corresponds to the above graphic object recognition network, the video form active object recognition model corresponds to the above video object recognition network, and the object annotation data correspond to the above training sample objects and the correspondingly labeled active tags and entity tags. The specific implementation manner of step S803 is the same as that of steps S402 to S405, and will not be described again here.
S804, the active objects enter a pool and are distributed.
In this embodiment of the present application, after active objects are accurately mined from a massive number of objects, they are placed in an active object data pool, and the active objects in the active object data pool can then be distributed according to the individual needs of users, which can improve the distribution value of the active objects and enhance the interaction atmosphere among users.
In summary, in this embodiment of the present application, after a user publishes a post, active object recognition needs to be performed on the post: feature construction is first performed on the post to obtain its feature information, a corresponding active object recognition model is then selected according to the form of the post to recognize it, and finally a post recognized as an active object is placed into the active object data pool so that the active objects in the pool can be distributed, where the active object recognition models include a text form active object recognition model, an image-text form active object recognition model and a video form active object recognition model. It should be understood that corresponding features are constructed for posts of different forms and the matching active object recognition model is selected to recognize active objects, so that the recognition efficiency and recognition accuracy for active objects can be improved.
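To make the flow concrete, the following is an illustrative sketch of steps S801 to S804; every helper and attribute name here (build_features, predict, post.form, active_pool) is a hypothetical placeholder rather than an interface defined by this application.

```python
# Illustrative end-to-end flow for steps S801-S804 (hypothetical API).
def handle_published_post(post, models, active_pool, t1=0.5, t2=0.5):
    # S802: construct feature information for the published post
    features = build_features(post)            # hypothetical feature builder
    # S803: select the active object recognition model matching the post form
    model = models[post.form]                  # "text" | "image_text" | "video"
    segment_states = model.predict(features)   # per-segment (tendency, entity) probs
    # S804: a post recognized as active enters the pool for later distribution
    if any(p1 > t1 and p2 > t2 for p1, p2 in segment_states):
        active_pool.append(post)
```

The per-segment thresholding mirrors the decision rule of step S405 described earlier.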
Based on the above information identification method, an embodiment of the present application provides an information identification apparatus. Referring to fig. 9, which is a schematic structural diagram of an information identification apparatus according to an embodiment of the present application, the information identification apparatus 900 may include the following units:
an acquiring unit 901, configured to acquire feature information of a target object;
the processing unit 902 is configured to invoke an object recognition model to process feature information of the target object to obtain active state information of the target object, where the active state information includes an active tendency probability and an active entity probability, and the object recognition model is obtained by training based on a training sample object and an active tag and an entity tag that are labeled correspondingly;
a determining unit 903, configured to determine, based on the activity status information, a recognition result of the target object, where the recognition result is used to indicate whether the target object is an active object.
In one embodiment, the processing unit 902 is specifically configured to, when invoking the object recognition model to process the feature information of the target object to obtain the active state information of the target object: obtaining an object type of the target object; determining a target recognition network from a plurality of recognition networks included in the object recognition model according to the object type, wherein the plurality of recognition networks include one or more of a text object recognition network, a graphic object recognition network and a video object recognition network; and calling the target identification network to process the characteristic information of the target object to obtain the active state information of the target object.
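As a minimal sketch of this type-based selection, the three recognition networks can be looked up from the object recognition model by object type; the attribute names and type keys below are illustrative assumptions, not identifiers defined by this application.

```python
# Select the target recognition network by object type (illustrative names).
def select_target_network(object_recognition_model, object_type):
    networks = {
        "text": object_recognition_model.text_object_recognition_network,
        "image_text": object_recognition_model.graphic_object_recognition_network,
        "video": object_recognition_model.video_object_recognition_network,
    }
    return networks[object_type]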
In one embodiment, the processing unit 902 is specifically configured to, when invoking the target identification network to process the feature information of the target object to obtain the active state information of the target object: calling the target recognition network to process each text segment corresponding to the characteristic information of the target object to obtain the active state information of each text segment; and determining the active state information of the target object according to the active state information of the text fragments.
In one embodiment, the object type is a text object, and the processing unit 902 is specifically configured to, when calling the target recognition network to process each text segment corresponding to the feature information of the target object to obtain the active state information of each text segment: invoking the text object recognition network in the object recognition model to process the feature information to obtain intermediate result information corresponding to each text segment corresponding to the feature information of the target object, wherein the intermediate result information comprises one or more of active prediction information, local context feature information, global context feature information and interval length feature information; and calling the text object recognition network to perform fusion processing on each piece of information included in the intermediate result information to obtain the active state information of each text segment.
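A minimal sketch of this fusion step follows, assuming each of the four kinds of intermediate result information has already been projected to a common dimension; the dimension of 768 and the concatenate-plus-linear fusion are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class IntermediateResultFusion(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.proj = nn.Linear(dim * 4, 2)  # 4 components -> 2 probabilities

    def forward(self, active_pred, local_ctx, global_ctx, interval_len):
        # each input: (num_segments, dim), projected to a common dimension
        fused = torch.cat([active_pred, local_ctx, global_ctx, interval_len],
                          dim=-1)
        # column 0: active tendency probability, column 1: active entity probability
        return torch.sigmoid(self.proj(fused))
```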
In one embodiment, the object type is a graphic object, the feature information of the target object includes text information corresponding to text content in the target object, and character information and text information corresponding to pictures in the target object, and the processing unit 902 is specifically configured to, when invoking the target recognition network to process each text segment corresponding to the feature information of the target object to obtain active state information of each text segment: acquiring the picture context feature information of the target object; invoking the image-text object recognition network in the object recognition model to process the feature information to obtain intermediate result information corresponding to each text segment corresponding to the feature information of the target object, wherein the intermediate result information comprises one or more of active prediction information, local context feature information, global context feature information and interval length feature information; and calling the image-text object recognition network to perform fusion processing on the picture context feature information and each piece of information included in the intermediate result information to obtain the active state information of each text segment.
In one embodiment, the object type is a video object, and the feature information of the target object includes text information corresponding to text content in the target object, character information and text information corresponding to pictures in the target object, character information corresponding to video in the target object, video fingerprint information and text information, and the processing unit 902 is specifically configured to, when invoking the target recognition network to process each text segment corresponding to the feature information of the target object, obtain active state information of each text segment: acquiring video context characteristic information of the target object; invoking the video object recognition network in the object recognition model to process the feature information to obtain intermediate result information corresponding to each text segment corresponding to the feature information of the target object, wherein the intermediate result information comprises one or more of active prediction information, local context feature information, global context feature information and interval length feature information; and calling the video object recognition network to perform fusion processing on the video context characteristic information and each piece of information included in the intermediate result information to obtain the active state information of each text segment.
In one embodiment, the processing unit 902, when acquiring the video context feature information of the target object, is specifically configured to: acquiring at least one frame of image of a video in the target object; determining an image sequence based on the at least one frame of images and the images in the target object; invoking a feature representation layer in the video object recognition network to process each image included in the image sequence to obtain feature representation information of each image; and inputting the characteristic representation information of each image into a self-attention layer in the video object recognition network to perform fusion processing to obtain the video context characteristic information of the target object.
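A sketch of the first two steps, sampling at least one frame from the video and forming the image sequence together with the pictures already in the target object, might look as follows; OpenCV is assumed, and the fixed sampling stride is an arbitrary illustrative choice.

```python
import cv2

def build_image_sequence(video_path, existing_images, stride=30):
    frames = []
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:   # keep one frame every `stride` frames
            frames.append(frame)
        index += 1
    cap.release()
    # the image sequence: sampled frames plus pictures already in the object
    return frames + list(existing_images)
```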
In one embodiment, the active state information of the target object specifically includes an active tendency probability and an active entity probability of each text segment corresponding to the feature information, and the processing unit 902 is specifically configured to, when determining the recognition result of the target object based on the active state information: if a target text segment exists among the text segments whose active tendency probability is larger than a first preset threshold and whose active entity probability is larger than a second preset threshold, determine that the recognition result of the target object is an active object, where the target text segment is any one of the text segments.
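This decision rule can be stated directly in code; the concrete threshold values below are placeholders for the first and second preset thresholds, which this application does not fix numerically.

```python
def is_active_object(segment_states, first_threshold=0.5, second_threshold=0.5):
    # segment_states: iterable of (active_tendency_prob, active_entity_prob)
    # active if any single segment exceeds both preset thresholds
    return any(p_tendency > first_threshold and p_entity > second_threshold
               for p_tendency, p_entity in segment_states)
```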
In one embodiment, the obtaining unit 901 is further configured to obtain, if the target object is an active object, a target entity corresponding to the active target object; the apparatus further includes a sending unit, where the sending unit is configured to send the target object to the clients corresponding to the users associated with the target entity.
In one embodiment, the obtaining unit 901 is further configured to obtain a training data set, where the training data set includes a training sample object, and an active tag and an entity tag that are labeled correspondingly, where the active tag is used to indicate whether the training sample object is an active object, and the entity tag is used to indicate each entity included in the training sample object; the device also comprises a training unit, wherein the training unit is used for training the initial neural network model by utilizing the training data set to obtain an object recognition model.
In one embodiment, the initial neural network model includes a text object recognition network, a graphics object recognition network, and a video object recognition network, and the training unit, when training the initial neural network model using the training data set, is specifically configured to: training the text object identification network by using the text object in the training sample object and the corresponding marked active label and entity label; initializing the network parameters of the image-text object recognition network and the network parameters of the video object recognition network by using the trained network parameters of the text object recognition network; and training the image-text object recognition network after the network parameter initialization by using the image-text object in the training sample object and the active label and the entity label which are correspondingly marked, and training the video object recognition network after the network parameter initialization by using the video object in the training sample object and the active label and the entity label which are correspondingly marked.
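The staged training can be sketched as follows, assuming all three networks share a text-side (e.g., BERT) submodule whose parameters can be copied; the attribute names, the bundled label fields and the train() helper are hypothetical placeholders.

```python
def train_object_recognition_model(text_net, graphic_net, video_net, data):
    # 1) Train the text object recognition network on the text objects
    #    with their labeled active tags and entity tags
    train(text_net, data.text_objects, data.text_labels)
    # 2) Initialize the graphic and video networks from the trained text network
    shared_params = text_net.bert.state_dict()
    graphic_net.bert.load_state_dict(shared_params)
    video_net.bert.load_state_dict(shared_params)
    # 3) Train each initialized network on its own object form
    train(graphic_net, data.graphic_objects, data.graphic_labels)
    train(video_net, data.video_objects, data.video_labels)
```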
In summary, in this embodiment of the present application, the feature information of the target object is acquired; the object type of the target object is obtained; a target recognition network is determined, according to the object type, from a plurality of recognition networks included in the object recognition model, where the plurality of recognition networks include one or more of a text object recognition network, a graphic object recognition network and a video object recognition network; the target recognition network is invoked to process the feature information of the target object to obtain the active state information of the target object, where the active state information includes an active tendency probability and an active entity probability, and the object recognition model is obtained by training based on training sample objects and the correspondingly labeled active tags and entity tags; and the recognition result of the target object is determined based on the active state information, where the recognition result is used for indicating whether the target object is an active object. It should be understood that by extracting the feature information of the target object, judging its object type to determine the recognition network of the corresponding type in the object recognition model, and processing the feature information with that recognition network to obtain the active state information, whether the target object is an active object can be judged according to its active state information, so that the recognition efficiency and recognition accuracy for active objects can be improved.
Based on the above embodiments of the information identification method and the information identification apparatus, an embodiment of the present application provides an electronic device, where the electronic device corresponds to the aforementioned server. Referring to fig. 10, which is a schematic structural diagram of an electronic device according to an embodiment of the present application, the electronic device 1000 may at least include: a processor 1001, a communication interface 1002, and a computer storage medium 1003, where the processor 1001, the communication interface 1002, and the computer storage medium 1003 may be connected by a bus or in other manners.
The computer storage medium 1003 may be located in a memory 1004 of the electronic device 1000 and is configured to store a computer program comprising program instructions, and the processor 1001 is configured to execute the program instructions stored in the computer storage medium 1003. The processor 1001 (or Central Processing Unit, CPU) is the computing core and control core of the electronic device 1000; it is adapted to implement one or more instructions, and in particular to load and execute the following:
acquiring characteristic information of a target object; invoking an object recognition model to process the characteristic information of the target object to obtain the active state information of the target object, wherein the active state information comprises an active tendency probability and an active entity probability, and the object recognition model is obtained by training based on a training sample object and an active label and an entity label which are correspondingly marked; and determining a recognition result of the target object based on the active state information, wherein the recognition result is used for indicating whether the target object is an active object or not.
In one embodiment, the processor 1001 is specifically configured to, when invoking an object recognition model to process feature information of the target object to obtain active state information of the target object: obtaining an object type of the target object; determining a target recognition network from a plurality of recognition networks included in the object recognition model according to the object type, wherein the plurality of recognition networks include one or more of a text object recognition network, a graphic object recognition network and a video object recognition network; and calling the target identification network to process the characteristic information of the target object to obtain the active state information of the target object.
In one embodiment, the processor 1001 is specifically configured to, when invoking the target identification network to process the feature information of the target object to obtain the active state information of the target object: calling the target recognition network to process each text segment corresponding to the characteristic information of the target object to obtain the active state information of each text segment; and determining the active state information of the target object according to the active state information of the text fragments.
In one embodiment, the object type is a text object, and the processor 1001 is specifically configured to, when calling the target recognition network to process each text segment corresponding to the feature information of the target object to obtain the active state information of each text segment: invoking the text object recognition network in the object recognition model to process the feature information to obtain intermediate result information corresponding to each text segment corresponding to the feature information of the target object, wherein the intermediate result information comprises one or more of active prediction information, local context feature information, global context feature information and interval length feature information; and calling the text object recognition network to perform fusion processing on each piece of information included in the intermediate result information to obtain the active state information of each text segment.
In one embodiment, the object type is a graphic object, the feature information of the target object includes text information corresponding to text content in the target object, and character information and text information corresponding to pictures in the target object, and the processor 1001 is specifically configured to, when invoking the target recognition network to process each text segment corresponding to the feature information of the target object to obtain active state information of each text segment: acquiring the picture context feature information of the target object; invoking the image-text object recognition network in the object recognition model to process the feature information to obtain intermediate result information corresponding to each text segment corresponding to the feature information of the target object, wherein the intermediate result information comprises one or more of active prediction information, local context feature information, global context feature information and interval length feature information; and calling the image-text object recognition network to perform fusion processing on the picture context feature information and each piece of information included in the intermediate result information to obtain the active state information of each text segment.
In one embodiment, the object type is a video object, the feature information of the target object includes text information corresponding to text content in the target object, character information and text information corresponding to pictures in the target object, character information corresponding to video in the target object, video fingerprint information and text information, and the processor 1001 is specifically configured to, when invoking the target recognition network to process each text segment corresponding to the feature information of the target object, obtain active state information of each text segment: acquiring video context characteristic information of the target object; invoking the video object recognition network in the object recognition model to process the feature information to obtain intermediate result information corresponding to each text segment corresponding to the feature information of the target object, wherein the intermediate result information comprises one or more of active prediction information, local context feature information, global context feature information and interval length feature information; and calling the video object recognition network to perform fusion processing on the video context characteristic information and each piece of information included in the intermediate result information to obtain the active state information of each text segment.
In one embodiment, the processor 1001, when acquiring the video context feature information of the target object, is specifically configured to: acquiring at least one frame of image of a video in the target object; determining an image sequence based on the at least one frame of images and the images in the target object; invoking a feature representation layer in the video object recognition network to process each image included in the image sequence to obtain feature representation information of each image; and inputting the characteristic representation information of each image into a self-attention layer in the video object recognition network to perform fusion processing to obtain the video context characteristic information of the target object.
In one embodiment, the active state information of the target object specifically includes an active tendency probability and an active entity probability of each text segment corresponding to the feature information, and the processor 1001 is specifically configured to, when determining the recognition result of the target object based on the active state information: if a target text segment exists among the text segments whose active tendency probability is larger than a first preset threshold and whose active entity probability is larger than a second preset threshold, determine that the recognition result of the target object is an active object, where the target text segment is any one of the text segments.
In one embodiment, the processor 1001 is further configured to: if the target object is an active object, acquire a target entity corresponding to the active target object; and send the target object to the clients corresponding to the users associated with the target entity.
In one embodiment, the processor 1001 is further configured to: acquiring a training data set, wherein the training data set comprises a training sample object, and an active label and an entity label which are correspondingly marked, the active label is used for indicating whether the training sample object is an active object, and the entity label is used for indicating each entity included in the training sample object; and training the initial neural network model by using the training data set to obtain an object recognition model.
In one embodiment, the initial neural network model includes a text object recognition network, a graphics object recognition network, and a video object recognition network, and the processor 1001 is specifically configured to, when training the initial neural network model using the training dataset: training the text object identification network by using the text object in the training sample object and the corresponding marked active label and entity label; initializing the network parameters of the image-text object recognition network and the network parameters of the video object recognition network by using the trained network parameters of the text object recognition network; and training the image-text object recognition network after the network parameter initialization by using the image-text object in the training sample object and the active label and the entity label which are correspondingly marked, and training the video object recognition network after the network parameter initialization by using the video object in the training sample object and the active label and the entity label which are correspondingly marked.
In summary, in this embodiment of the present application, the feature information of the target object is acquired; the object type of the target object is obtained; a target recognition network is determined, according to the object type, from a plurality of recognition networks included in the object recognition model, where the plurality of recognition networks include one or more of a text object recognition network, a graphic object recognition network and a video object recognition network; the target recognition network is invoked to process the feature information of the target object to obtain the active state information of the target object, where the active state information includes an active tendency probability and an active entity probability, and the object recognition model is obtained by training based on training sample objects and the correspondingly labeled active tags and entity tags; and the recognition result of the target object is determined based on the active state information, where the recognition result is used for indicating whether the target object is an active object. It should be understood that by extracting the feature information of the target object, judging its object type to determine the recognition network of the corresponding type in the object recognition model, and processing the feature information with that recognition network to obtain the active state information, whether the target object is an active object can be judged according to its active state information, so that the recognition efficiency and recognition accuracy for active objects can be improved.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. The technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like, and may specifically be a processor in the computer device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Those of ordinary skill in the art will appreciate that the elements and steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer storage medium, or transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer storage medium may be any available medium accessible to a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
The foregoing is merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed in the present application, and such changes and substitutions shall all fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. An information identification method, comprising:
acquiring characteristic information of a target object;
invoking an object recognition model to process the characteristic information of the target object to obtain the active state information of the target object, wherein the active state information comprises an active tendency probability and an active entity probability, and the object recognition model is obtained by training based on a training sample object and active tags and entity tags which are correspondingly marked;
and determining a recognition result of the target object based on the active state information, wherein the recognition result is used for indicating whether the target object is an active object or not.
2. The method according to claim 1, wherein the calling the object recognition model to process the feature information of the target object to obtain the active state information of the target object includes:
Obtaining an object type of the target object;
determining a target recognition network from a plurality of recognition networks included in the object recognition model according to the object type, wherein the plurality of recognition networks include one or more of a text object recognition network, a graphic object recognition network and a video object recognition network;
and calling the target identification network to process the characteristic information of the target object to obtain the active state information of the target object.
3. The method according to claim 2, wherein the calling the target recognition network to process the feature information of the target object to obtain the active state information of the target object includes:
invoking the target recognition network to process each text segment corresponding to the characteristic information of the target object to obtain the active state information of each text segment;
and determining the active state information of the target object according to the active state information of each text segment.
4. The method according to claim 3, wherein the object type is a text object, and the calling the target recognition network to process each text segment corresponding to the feature information of the target object to obtain the active state information of each text segment includes:
Invoking the text object recognition network in the object recognition model to process the feature information to obtain intermediate result information corresponding to each text segment corresponding to the feature information of the target object, wherein the intermediate result information comprises one or more of active prediction information, local context feature information, global context feature information and interval length feature information;
and calling the text object recognition network to perform fusion processing on each piece of information included in the intermediate result information to obtain the active state information of each text segment.
5. The method according to claim 3, wherein the object type is a graphic object, the feature information of the target object includes text information corresponding to text content in the target object, character information corresponding to pictures in the target object, and text information, and the calling the target recognition network to process each text segment corresponding to the feature information of the target object to obtain active state information of each text segment includes:
acquiring the picture context characteristic information of the target object;
invoking the image-text object recognition network in the object recognition model to process the characteristic information to obtain intermediate result information corresponding to each text segment corresponding to the characteristic information of the target object, wherein the intermediate result information comprises one or more of active prediction information, local context characteristic information, global context characteristic information and interval length characteristic information;
and calling the image-text object recognition network to perform fusion processing on the picture context characteristic information and each piece of information included in the intermediate result information to obtain the active state information of each text segment.
6. The method according to claim 3, wherein the object type is a video object, the feature information of the target object includes text information corresponding to text content in the target object, character information and text information corresponding to pictures in the target object, character information corresponding to video in the target object, video fingerprint information and text information, and the calling the target recognition network to process each text segment corresponding to the feature information of the target object to obtain active state information of each text segment includes:
acquiring video context characteristic information of the target object;
invoking the video object recognition network in the object recognition model to process the feature information to obtain intermediate result information corresponding to each text segment corresponding to the feature information of the target object, wherein the intermediate result information comprises one or more of active prediction information, local context feature information, global context feature information and interval length feature information;
And calling the video object recognition network to perform fusion processing on the video context characteristic information and each piece of information included in the intermediate result information to obtain the active state information of each text segment.
7. The method of claim 6, wherein the obtaining video context feature information of the target object comprises:
acquiring at least one frame of image of a video in the target object;
determining an image sequence based on the at least one frame of images and the images in the target object;
invoking a feature representation layer in the video object recognition network to process each image included in the image sequence to obtain feature representation information of each image;
and inputting the characteristic representation information of each image into a self-attention layer in the video object recognition network to perform fusion processing to obtain the video context characteristic information of the target object.
8. The method according to any one of claims 1 to 7, wherein the active state information of the target object specifically includes an active tendency probability and an active entity probability of each text segment corresponding to the feature information, and the determining the recognition result of the target object based on the active state information includes:
if a target text segment exists among the text segments, the active tendency probability of the target text segment is larger than a first preset threshold, and the active entity probability of the target text segment is larger than a second preset threshold, determining that the recognition result of the target object is an active object, wherein the target text segment is any one of the text segments.
9. The method according to any one of claims 1 to 7, further comprising:
if the target object is an active object, acquiring a target entity corresponding to the active target object;
and sending the target object to clients corresponding to the users associated with the target entity.
10. The method according to any one of claims 1 to 7, further comprising:
acquiring a training data set, wherein the training data set comprises a training sample object, and an active label and an entity label which are correspondingly marked, the active label is used for indicating whether the training sample object is an active object, and the entity label is used for indicating each entity included in the training sample object;
and training the initial neural network model by using the training data set to obtain an object recognition model.
11. The method of claim 10, wherein the initial neural network model comprises a text object recognition network, a graphics object recognition network, and a video object recognition network, wherein training the initial neural network model using the training dataset comprises:
training the text object recognition network by using the text objects in the training sample objects and the corresponding marked active tags and entity tags;
initializing the network parameters of the image-text object recognition network and the network parameters of the video object recognition network by using the trained network parameters of the text object recognition network;
and training the image-text object recognition network after the network parameter initialization by using the image-text object in the training sample object and the active label and the entity label which are correspondingly marked, and training the video object recognition network after the network parameter initialization by using the video object in the training sample object and the active label and the entity label which are correspondingly marked.
12. An information identifying apparatus, characterized in that the apparatus comprises:
the acquisition unit is used for acquiring the characteristic information of the target object;
the processing unit is used for calling an object recognition model to process the characteristic information of the target object to obtain the active state information of the target object, wherein the active state information comprises an active tendency probability and an active entity probability, and the object recognition model is obtained by training based on a training sample object and active tags and entity tags which are correspondingly marked;
and the determining unit is used for determining the identification result of the target object based on the activity state information, wherein the identification result is used for indicating whether the target object is an active object or not.
CN202111262941.9A | filed 2021-10-28 (priority 2021-10-28) | Information identification method and device | Status: Pending | Publication: CN116052142A (en)

Priority Applications (1)

CN202111262941.9A | priority/filing date 2021-10-28 | CN116052142A (en) | Information identification method and device


Publications (1)

CN116052142A | published 2023-05-02

Family

ID: 86125929

Family Applications (1)

CN202111262941.9A | CN116052142A (en) | Pending

Country Status (1)

CN | CN116052142A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination