CN115858816A - Construction method and system of intelligent agent cognitive map for public security field


Info

Publication number: CN115858816A
Application number: CN202211680967.XA
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 于笑博, 张广志, 成立立, 刘畔青
Current/Original Assignee: Beijing Rongxin DataInfo Science and Technology Ltd
Classification: Information Retrieval; DB Structures and FS Structures Therefor
Abstract

The invention discloses a construction method and a system of an intelligent agent cognitive map facing the public safety field, wherein the method comprises the following steps: constructing an initial agent cognitive map, wherein the initial agent cognitive map at least comprises a spatial coordinate system, a virtual reality scene and a multi-modal cognitive relationship map; performing multi-mode recognition and cognitive extraction on public safety field data based on the initial intelligent agent cognitive map, wherein the multi-mode recognition mode at least comprises image segmentation, video understanding, audio recognition and natural language processing, and the cognitive extraction mode at least comprises entity extraction, entity disambiguation, entity attribute extraction, entity relationship extraction and event extraction; fusing multi-modal cognitive data extracted from the public safety field, wherein the fusion mode at least comprises multi-modal entity linking and cognitive merging; and performing cognitive processing according to a logic reasoning rule preset in the public safety field to complete the construction of the intelligent agent cognitive map.

Description

Construction method and system of intelligent agent cognitive map for public security field
Technical Field
The invention relates to the technical field of big data and artificial intelligence, in particular to a construction method and a system of an intelligent agent cognitive map facing the public safety field.
Background
Artificial intelligence has progressed through the stages of computational intelligence and perceptual intelligence toward cognitive intelligence. The agent cognitive map is an important component of artificial intelligence technology; its strong multi-modal semantic processing, interconnected organization, information retrieval and cognitive reasoning capabilities provide a technical basis for artificial intelligence applications in many fields. In essence, an agent cognitive map is a large multi-modal semantic network graph that describes concrete and abstract entities of all kinds and the relationships between them.
With the application of technologies such as big data, cloud computing and artificial intelligence in the field of public security, capabilities such as information research and judgment, investigation and crackdown have improved. However, as the public safety situation and the available technical means change, public safety raises more urgent demands for deeper, more intelligent and more comprehensive early-warning prediction, analysis and judgment capabilities. Applying the agent cognitive map to the public safety field and using its strong cognitive reasoning capability for early-warning prediction, analysis and judgment remains a largely blank area.
Disclosure of Invention
The invention aims to provide a construction method and a system for an agent cognitive map oriented to the public safety field, which can effectively construct such a map.
The invention provides a method for constructing an intelligent agent cognitive map for the public safety field, which comprises the following steps:
constructing an initial agent cognitive map, wherein the initial agent cognitive map at least comprises a space coordinate system, a virtual reality scene and a multi-modal cognitive relationship map;
performing multi-mode recognition and cognitive extraction on public safety field data based on the initial intelligent agent cognitive map, wherein the multi-mode recognition mode at least comprises image segmentation, video understanding, audio recognition and natural language processing, and the cognitive extraction mode at least comprises entity extraction, entity disambiguation, entity attribute extraction, entity relationship extraction and event extraction;
fusing multi-modal cognitive data extracted from the public safety field, wherein the fusion mode at least comprises multi-modal entity linking and cognitive merging;
and performing cognitive processing according to a logic reasoning rule preset in the public safety field to complete the construction of the intelligent agent cognitive map.
In this scheme, constructing the initial agent cognitive map specifically includes:
constructing the space coordinate system and the scale based on OpenGL, wherein the space coordinate system at least comprises an object coordinate system, a world coordinate system and an observation coordinate system;
establishing a scene model based on OpenGL to construct the virtual reality scene, wherein the scene model is converted into pixels through rasterization;
and mapping the corresponding relation and the logical incidence relation of various things in time and space based on the space coordinate system and the time sequence so as to construct the multi-modal cognitive relationship map.
In this scheme, based on the initial agent cognitive map, multi-modal identification is performed on public safety field data, and the method specifically includes:
performing image segmentation based on DeepLabV3+ and a preset network, wherein the preset network comprises an Encoder part and a Decoder part;
performing video understanding based on a TSM network, wherein an offline video identification model is constructed based on a bidirectional TSM, and a feature map is transferred from a previous frame to a current frame by using a unidirectional TSM;
performing audio recognition by using an ASR algorithm;
and performing natural language processing by using the BERT network.
In this scheme, an LSTM+CRF model is adopted when entity extraction is performed on public safety field data based on the initial agent cognitive map.
In this scheme, fusing the multi-modal cognitive data extracted from the public safety field specifically includes:
carrying out entity link by using an image segmentation recognition network, a video understanding network and a BERT network;
and merging the external cognitive library and the relational database to complete cognitive merging, wherein merging the external cognitive library involves fusing a data layer and a schema layer, the fusion of the schema layer comprises the fusion of concepts, of concept hypernym-hyponym relations and of concept attribute definitions, and the fusion of the data layer comprises the fusion of entities and the fusion of entity attributes.
In this scheme, the cognitive processing is performed according to a logical inference rule preset in the public safety field to complete the construction of the cognitive map of the intelligent agent, and the method specifically includes:
identifying entity parallel relation and entity superior-subordinate relation, thereby clustering concepts of each layer to generate an ontology;
the method comprises the steps of completing logic-based reasoning and graph-based reasoning, wherein the logic-based reasoning at least comprises first-order predicate logic, description logic and rule-based reasoning, and the graph-based reasoning is completed based on a neural network model or a Path Ranking algorithm;
and evaluating the accuracy and the coverage rate of the cognitive map of the intelligent agent, and finishing the construction of the cognitive map of the intelligent agent after the evaluation is qualified, wherein the accuracy at least comprises syntax accuracy, semantic accuracy and timeliness.
The second aspect of the present invention further provides a system for constructing an agent cognitive map for the public safety domain, which includes a memory and a processor, wherein the memory includes a program of a method for constructing the agent cognitive map for the public safety domain, and when the program of the method for constructing the agent cognitive map for the public safety domain is executed by the processor, the following steps are implemented:
constructing an initial agent cognitive map, wherein the initial agent cognitive map at least comprises a spatial coordinate system, a virtual reality scene and a multi-modal cognitive relationship map;
performing multi-mode recognition and cognitive extraction on public security field data based on the initial intelligent agent cognitive map, wherein the multi-mode recognition mode at least comprises image segmentation, video understanding, audio recognition and natural language processing, and the cognitive extraction mode at least comprises entity extraction, entity disambiguation, entity attribute extraction, entity relationship extraction and event extraction;
fusing multi-modal cognitive data extracted from the public safety field, wherein the fusion mode at least comprises multi-modal entity linking and cognitive merging;
and performing cognitive processing according to a logic reasoning rule preset in the public safety field to complete the construction of the intelligent agent cognitive map.
In this scheme, constructing the initial agent cognitive map specifically includes:
constructing the space coordinate system and the scale based on OpenGL, wherein the space coordinate system at least comprises an object coordinate system, a world coordinate system and an observation coordinate system;
establishing a scene model based on OpenGL to construct the virtual reality scene, wherein the scene model is converted into pixels through rasterization;
and mapping the corresponding relation and the logical incidence relation of various things in time and space based on the space coordinate system and the time sequence so as to construct the multi-modal cognitive relationship map.
In this scheme, based on the initial agent cognitive map, multi-modal identification is performed on public safety field data, and the method specifically includes:
performing image segmentation based on DeepLabV3+ and a preset network, wherein the preset network comprises an Encoder part and a Decoder part;
performing video understanding based on a TSM network, wherein an offline video identification model is constructed based on a bidirectional TSM, and a feature map is transferred from a previous frame to a current frame by using a unidirectional TSM;
performing audio recognition by using an ASR algorithm;
and performing natural language processing by using the BERT network.
In this scheme, an LSTM+CRF model is adopted when entity extraction is carried out on public safety field data based on the initial agent cognitive map.
In this scheme, fusing the multi-modal cognitive data extracted from the public safety field specifically includes:
carrying out entity link by using an image segmentation recognition network, a video understanding network and a BERT network;
and merging the external cognitive library and the relational database to complete cognitive merging, wherein merging the external cognitive library involves fusing a data layer and a schema layer, the fusion of the schema layer comprises the fusion of concepts, of concept hypernym-hyponym relations and of concept attribute definitions, and the fusion of the data layer comprises the fusion of entities and the fusion of entity attributes.
In this scheme, the cognitive processing is performed according to a logical inference rule preset in the public safety field to complete the construction of the cognitive map of the intelligent agent, and the method specifically includes:
identifying entity parallel relation and entity superior-subordinate relation, thereby clustering concepts of each level to generate an ontology;
the method comprises the steps of completing logic-based reasoning and graph-based reasoning, wherein the logic-based reasoning at least comprises first-order predicate logic, description logic and rule-based reasoning, and the graph-based reasoning is completed based on a neural network model or a Path Ranking algorithm;
and evaluating the accuracy and the coverage rate of the cognitive map of the intelligent agent, and completing the construction of the cognitive map of the intelligent agent after the cognitive map is qualified by evaluation, wherein the accuracy at least comprises syntax accuracy, semantic accuracy and timeliness.
The third aspect of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a program of a method for constructing an agent cognitive map oriented to the public safety field, and when the program is executed by a processor, the steps of the method for constructing the agent cognitive map oriented to the public safety field described above are implemented.
The method and system for constructing an agent cognitive map oriented to the public safety field disclosed by the invention apply the agent cognitive map to the public safety field and use its strong cognitive reasoning capability for early-warning prediction, analysis and judgment, thereby adding controllability and convenience to applications in the public safety field.
Drawings
FIG. 1 is a flow chart of a construction method of an agent cognitive map facing the public safety field according to the invention;
FIG. 2 is a speech recognition flow chart of the construction method of the cognitive map of the intelligent agent facing the public safety field;
fig. 3 shows a block diagram of a system for constructing an agent cognitive map oriented to the public safety field according to the invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Fig. 1 shows a flowchart of a method for constructing an agent cognitive map for the public safety domain.
As shown in fig. 1, the application discloses a construction method of an agent cognitive map for the public safety field, comprising the following steps:
s102, constructing an initial agent cognitive map;
s104, performing multi-mode recognition and cognitive extraction on public safety field data based on the initial agent cognitive map;
s106, fusing the multi-modal cognitive data extracted from the public safety field;
and S108, performing cognitive processing according to a logic inference rule preset in the public safety field to complete the construction of the cognitive map of the intelligent agent.
It should be noted that, in this embodiment, the initial agent cognitive map is constructed from a spatial coordinate system, a virtual reality scene and a multi-modal cognitive relationship map, where the spatial coordinate system may be switched between a reference frame centered on the perceiver and a reference frame established by things other than the perceiver; the multi-modal cognitive relationship map is a multi-modal network map that, relying on the spatial coordinate system and the time sequence, maps the correspondences of various things in space and time and their various logical association relationships.
Further, when multi-modal recognition is carried out on public safety field data based on the initial agent cognitive map, the recognition at least comprises image segmentation, video understanding, audio recognition and natural language processing, and cognitive extraction at least comprises entity extraction, entity disambiguation, entity attribute extraction, entity relationship extraction and event extraction. The multi-modal cognitive data extracted from the public safety field are then fused, the fusion at least comprising multi-modal entity linking and cognitive merging; for example, entity linking is carried out by jointly using an image segmentation recognition network, a video understanding network and a BERT network, and for cognitive merging, a third-party cognitive library can be fused into the agent cognitive map to supplement its content when the map is constructed. Finally, cognitive processing is carried out according to the logical inference rules preset for the public safety field to complete the construction of the agent cognitive map, the cognitive processing comprising steps such as ontology construction, cognitive inference and quality evaluation.
According to the embodiment of the invention, constructing the initial agent cognitive map specifically comprises the following steps:
constructing the space coordinate system and the scale based on OpenGL, wherein the space coordinate system at least comprises an object coordinate system, a world coordinate system and an observation coordinate system;
establishing a scene model based on OpenGL to construct the virtual reality scene, wherein the scene model is converted into pixels through rasterization;
and mapping the corresponding relation and the logical incidence relation of various things in time and space based on the space coordinate system and the time sequence so as to construct the multi-modal cognitive relationship map.
It should be noted that, in this embodiment, the spatial coordinate system and the scale are constructed based on OpenGL. Object coordinate system: the object coordinate system (local coordinate system) is the coordinate system associated with a particular object. Each object has its own independent coordinate system, and this coordinate system moves and turns with the object; in some cases it is also called the model coordinate system, because the coordinates of the model vertices are described in it. World coordinate system: the world coordinate system establishes the frame of reference needed to describe the other coordinate systems; that is, the world coordinate system can describe the position of every other coordinate system, while the world coordinate system itself cannot be described by any larger, external coordinate system. Observation coordinate system: the observation coordinate system, also called eye coordinates (EC), is a three-dimensional rectangular auxiliary coordinate system that can be defined at any position and in any direction in the user coordinate system. In the observation coordinate system, a plane perpendicular to the Z-axis of the coordinate system, called the viewing plane, is generally defined. This coordinate system is mainly used to specify the clipping space and to determine which part of the three-dimensional geometry needs to be output to the screen; in addition, the coordinate values, in the world coordinate system, of the part of the three-dimensional geometry to be output can be converted through the viewing plane into coordinate values in the normalized coordinate system.
Further, using OpenGL to construct a virtual reality scene, including various real-world things, wherein the basic steps of the main graphics operations to construct the scene are: a) A scene model is built from basic graphic elements, and the built model is mathematically described (OpenGL: points, lines, polygons, images, and bitmaps as basic graphic units); b) Placing the scene model in an appropriate position in a three-dimensional space and setting a viewpoint (viewpoint) to observe a landscape of interest; c) Calculating the colors of all objects in the model, wherein the colors are determined according to application requirements, and meanwhile, determining illumination conditions, texture pasting modes and the like; d) The mathematical description of the scene model and its color information are converted to pixels on the computer screen, a process known as rasterization (rasterization).
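As an illustrative sketch (not part of the original disclosure), the following Python/numpy code mirrors the coordinate pipeline described above, transforming a vertex from the object coordinate system into the world coordinate system and then into the observation (eye) coordinate system using homogeneous 4x4 matrices; all positions and values are hypothetical.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix (a simple model matrix)."""
    m = np.eye(4)
    m[:3, 3] = [tx, ty, tz]
    return m

def look_at(eye, target, up):
    """Build a view (observation-coordinate) matrix, analogous to gluLookAt."""
    f = target - eye
    f = f / np.linalg.norm(f)                       # forward direction
    s = np.cross(f, up); s = s / np.linalg.norm(s)  # right direction
    u = np.cross(s, f)                              # corrected up direction
    view = np.eye(4)
    view[0, :3], view[1, :3], view[2, :3] = s, u, -f
    view[:3, 3] = -view[:3, :3] @ eye
    return view

# A vertex given in the object (local/model) coordinate system.
vertex_object = np.array([0.5, 0.0, 0.0, 1.0])

# Model matrix: place the object in the world coordinate system.
model = translation(2.0, 0.0, -5.0)

# View matrix: observation coordinate system defined by a hypothetical eye position.
view = look_at(eye=np.array([0.0, 1.0, 3.0]),
               target=np.array([2.0, 0.0, -5.0]),
               up=np.array([0.0, 1.0, 0.0]))

vertex_world = model @ vertex_object   # object -> world
vertex_eye = view @ vertex_world       # world -> observation (eye)
print(vertex_world, vertex_eye)
```

Rasterization would then project the eye-space vertex through a projection matrix onto the viewing plane and convert the result into pixels on the screen.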
The correspondences and logical association relationships of various things in space and time are then mapped on the basis of the spatial coordinate system and the time sequence to construct the multi-modal cognitive relationship map. The map is a directed graph G = {E, R, A, V, T_R, T_A}, where E, R, A and V respectively denote the sets of entities, relationships, attributes and attribute values, and T_R and T_A respectively denote the set of relation triples and the set of attribute triples. A relation triple (e_i, r_ij, e_j) expresses that entity e_i and entity e_j have the relationship r_ij; an attribute triple (e_i, a_i, v_i) expresses that entity e_i has the attribute a_i with the attribute value v_i.
The entities are divided into tangible entities and abstract entities. A tangible entity refers to a visible three-dimensional object or scene, while an abstract entity is a textual concept or label abstracted from tangible entities. Abstract entities are further divided into object entities and event entities. Object entities may correspond one-to-one to tangible objects. An event entity is a combination of a series of dynamically changing processes of things; for example, "sunrise" is the combination of a series of changes in the visual effect of the scene produced by the earth and the sun during their relative movement.
An entity may have multiple attributes; for example, a person has attributes such as height, weight, gender, age, action and expression. An attribute may take the form of a textual concept, such as a height of "170cm", or the form of a graphic or animation, such as an animation corresponding to the action "running". Entities can have various relationships: spatial and temporal relationships such as up-down, left-right, front-back and sequence, as well as logical relationships such as friends or teacher and student.
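To make the triple structure above concrete, a minimal in-memory representation of the cognitive relationship map might look as follows; the entity names, relations and attribute values are invented examples, not data from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class CognitiveMap:
    """Directed multi-modal cognitive relationship map: G = {E, R, A, V, T_R, T_A}."""
    relation_triples: set = field(default_factory=set)   # (entity_i, relation, entity_j)
    attribute_triples: set = field(default_factory=set)  # (entity, attribute, value)

    def add_relation(self, e_i, r, e_j):
        self.relation_triples.add((e_i, r, e_j))

    def add_attribute(self, e, a, v):
        self.attribute_triples.add((e, a, v))

g = CognitiveMap()
g.add_relation("person_A", "friend_of", "person_B")      # logical relationship
g.add_relation("person_A", "in_front_of", "vehicle_01")  # spatial relationship
g.add_attribute("person_A", "height", "170cm")           # textual attribute value
g.add_attribute("person_A", "action", "running.gif")     # animation-form attribute value
print(g.relation_triples, g.attribute_triples)
```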
According to the embodiment of the invention, the multi-mode recognition of the public safety field data based on the initial intelligent agent cognitive map specifically comprises the following steps:
performing image segmentation based on DeepLabV3+ and a preset network, wherein the preset network comprises an Encoder part and a Decoder part;
performing video understanding based on a TSM network, wherein an offline video identification model is constructed based on a bidirectional TSM, and a feature map is transferred from a previous frame to a current frame by using a unidirectional TSM;
performing audio recognition by using an ASR algorithm;
and performing natural language processing by using the BERT network.
It should be noted that, in this embodiment, the DeepLabV3+ network, which improves feature resolution, is used for image segmentation and recognition. The preset network is mainly divided into two parts, an Encoder and a Decoder, where Xception is used as the backbone network and an ASPP (atrous spatial pyramid pooling) structure is used to handle the multi-scale problem. The Decoder part is introduced in order to fuse low-level and high-level features and improve the accuracy of segmentation boundaries.
1. In the Encoder part, ASPP is applied to the preliminary effective feature layer that has been downsampled four times; the resulting features are concatenated and then compressed with a 1x1 convolution.
2. In the Decoder part, a 1x1 convolution adjusts the number of channels of the preliminary effective feature layer that has been downsampled twice; this is concatenated with the ASPP-processed features, and two further convolutions produce the final feature map.
3. The final feature map is used for prediction, which requires two steps:
(1) a 1x1 convolution adjusts the number of channels to the total number of classes;
(2) resize: upsampling restores the output prediction to the original picture size.
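A minimal segmentation sketch is given below; it assumes the torchvision library and uses its ResNet-50-backbone DeepLabV3 model as a stand-in for the Xception+ASPP encoder-decoder described above, so it illustrates the inference step rather than the exact network of this embodiment, and the input image path is hypothetical.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained DeepLabV3 (ResNet-50 backbone) as a stand-in for the
# Xception+ASPP encoder-decoder described above.
model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("scene.jpg").convert("RGB")   # hypothetical input image
x = preprocess(img).unsqueeze(0)               # shape: (1, 3, H, W)

with torch.no_grad():
    out = model(x)["out"]                      # (1, num_classes, H, W) logits
mask = out.argmax(dim=1)                       # per-pixel class prediction
print(mask.shape)
```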
Further, video understanding is performed based on a TSM (Temporal Shift Module) network; specifically, an offline video recognition model is constructed by using a bidirectional TSM. Given a video V, T frames F_1, ..., F_T are first sampled from the video; each sampled frame is processed independently by a 2D CNN and the outputs are averaged to give the final prediction. A TSM is inserted into each residual module, realizing temporal information fusion at no extra computational cost; the backbone is ResNet50 (other pre-trained models can be used, and MobileNetV2 can be used on mobile terminals). Embedding the TSM into each residual block doubles the temporal receptive field, achieving the effect of a pseudo-3D model through a shift operation alone. The unidirectional TSM transfers features from the previous frame to the current frame: for online recognition, during inference the first 1/8 of the feature map of each residual block is cached in memory for every frame; for the next frame, the first 1/8 of the current feature map is replaced by the cached feature map, and the next layer is generated by combining 7/8 of the current feature map with the cached 1/8 feature map of the previous frame, and so on.
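The core operation of the TSM is the zero-parameter temporal shift of part of the channels; the sketch below is a simplified, assumed rendering of that shift (bidirectional, for offline recognition), not code from the patent.

```python
import torch

def temporal_shift(x: torch.Tensor, shift_div: int = 8) -> torch.Tensor:
    """Bidirectional temporal shift for offline video recognition.

    x: features of shape (batch, time, channels, h, w).
    1/shift_div of the channels are shifted forward in time,
    1/shift_div backward in time, and the rest stay in place.
    """
    b, t, c, h, w = x.shape
    fold = c // shift_div
    out = torch.zeros_like(x)
    out[:, 1:, :fold] = x[:, :-1, :fold]                   # shift forward in time
    out[:, :-1, fold:2 * fold] = x[:, 1:, fold:2 * fold]   # shift backward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]              # untouched channels
    return out

# Example: batch of 2 clips, 8 sampled frames, 64-channel feature maps.
feats = torch.randn(2, 8, 64, 56, 56)
shifted = temporal_shift(feats)
print(shifted.shape)  # torch.Size([2, 8, 64, 56, 56])
```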
Further, audio recognition is performed by using an ASR algorithm; specifically, as shown in fig. 2, the ASR algorithm follows an "input-encode-decode-output" process. Encoding: the input of speech recognition is audio, which must be converted into digital information by an encoding process from which features are extracted for further processing. During encoding, the speech signal is cut into short segments called frames; for each frame, MFCC features are extracted from the signal and turned into a multi-dimensional vector, each dimension of which is a feature of that frame of the signal. Decoding: the decoding process converts the encoded vectors into text and requires two models, an acoustic model and a language model. The acoustic model processes the encoded vectors and combines adjacent frames into phonemes, such as the initials and finals of Chinese pinyin, which are then combined into individual words or Chinese characters. The language model adjusts illogical word sequences produced by the acoustic model so that the recognition result reads smoothly.
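The framing and MFCC feature extraction of the encoding stage can be sketched as follows, assuming the librosa library and a hypothetical recording; the decoding stage with acoustic and language models would be handled by a full ASR toolkit.

```python
import librosa

# Hypothetical recording; 16 kHz is a common ASR sampling rate.
y, sr = librosa.load("call_recording.wav", sr=16000)

# Cut the signal into short frames (25 ms window, 10 ms hop) and
# turn each frame into a 13-dimensional MFCC feature vector.
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13,
    n_fft=int(0.025 * sr), hop_length=int(0.010 * sr),
)
print(mfcc.shape)  # (13, number_of_frames): one multi-dimensional vector per frame
```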
Further, natural language processing is performed using a BERT (Bidirectional Encoder Representations from Transformers) network; specifically, two training strategies are used. The first is the Masked Language Model (MLM): before the word sequence is input into BERT, 15% of the words in each sequence are replaced with a [MASK] token, and the model then attempts to predict the original values of the masked words from the context provided by the other, non-masked words in the sequence. Predicting the output words requires: (1) adding a classification layer on top of the encoder output; (2) multiplying the output vectors by the embedding matrix to convert them into the vocabulary dimension; (3) calculating the probability of each word in the vocabulary with softmax. The second is Next Sentence Prediction (NSP): the model receives a pair of sentences as input and learns to predict whether the second sentence of the pair is the subsequent sentence in the original document. During training, in 50% of the inputs the second sentence is indeed the subsequent sentence in the original document, and in the other 50% a sentence is randomly selected from the corpus as the second sentence, on the assumption that a random sentence is not connected to the first. To help the model distinguish the two sentences during training, the input is processed as follows before entering the model: (1) a [CLS] token is inserted at the beginning of the first sentence and a [SEP] token at the end of each sentence; (2) a sentence (segment) embedding indicating sentence A or sentence B is added to each token; sentence embeddings are conceptually similar to token embeddings with a vocabulary of size 2; (3) a positional embedding is added to each token to indicate its position in the sequence.
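As a small, assumed illustration of the masked-language-model objective (using the public Hugging Face transformers package and the "bert-base-uncased" checkpoint, neither of which is specified in the patent), the following sketch predicts the original value of a [MASK] token from its context.

```python
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

text = "The suspect left the [MASK] through the back door."
inputs = tokenizer(text, return_tensors="pt")   # adds [CLS] ... [SEP] automatically

with torch.no_grad():
    logits = model(**inputs).logits             # (1, seq_len, vocab_size)

# Locate the [MASK] position and take the softmax over the vocabulary there.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
probs = logits[0, mask_pos].softmax(dim=-1)
print(tokenizer.decode([int(probs.argmax())]))  # most probable original word
```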
According to the embodiment of the invention, an LSTM+CRF model is adopted when entity extraction is performed on public safety field data based on the initial agent cognitive map.
It should be noted that, in this embodiment, correspondences and dependency relationships among the multi-modal data are established, and entity extraction, entity disambiguation, entity attribute extraction, entity relationship extraction, event extraction and the like are performed on this basis. Taking entity extraction as an example, an LSTM+CRF model is adopted: combining the LSTM and the CRF makes it possible to capture past input features as well as sentence-level label information. The CRF layer has a state-transition matrix as parameters, so past and future label information can be used effectively to predict the current label, similar to a bidirectional LSTM. Each sentence is fed word by word into the bidirectional LSTM, the forward and backward hidden-layer outputs are combined to obtain the probability that each word belongs to each entity-category label, and this is fed into the CRF, whose objective function is optimized to obtain the entity category of each word. Long short-term memory (LSTM) is a special RNN designed mainly to solve the problems of vanishing and exploding gradients when training on long sequences, while the conditional random field (CRF) is a probabilistic model for sequence labeling that, through its transition matrix, captures dependencies between adjacent labels.
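A minimal sketch of such a BiLSTM+CRF tagger is shown below; it assumes PyTorch together with the third-party pytorch-crf package for the CRF layer, and the vocabulary, tag set and tensor shapes are illustrative only.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party package: pytorch-crf

class BiLSTMCRF(nn.Module):
    """Bidirectional LSTM emissions + CRF transition layer for entity tagging."""
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, bidirectional=True,
                            batch_first=True)
        self.emit = nn.Linear(hidden_dim, num_tags)   # per-token tag scores
        self.crf = CRF(num_tags, batch_first=True)    # learned transition matrix

    def loss(self, tokens, tags, mask):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        return -self.crf(emissions, tags, mask=mask)  # negative log-likelihood

    def decode(self, tokens, mask):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        return self.crf.decode(emissions, mask=mask)  # best tag sequence per sentence

# Toy batch: 2 sentences of length 5, BIO tag set of size 5 (e.g. O, B-PER, I-PER, B-LOC, I-LOC).
model = BiLSTMCRF(vocab_size=1000, num_tags=5)
tokens = torch.randint(0, 1000, (2, 5))
tags = torch.randint(0, 5, (2, 5))
mask = torch.ones(2, 5, dtype=torch.bool)
print(model.loss(tokens, tags, mask).item(), model.decode(tokens, mask))
```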
According to the embodiment of the invention, the fusion of the multi-modal cognitive data extracted from the public security field specifically comprises the following steps:
carrying out entity link by using an image segmentation recognition network, a video understanding network and a BERT network;
and merging the external cognitive library and the relational database to complete cognitive merging, wherein merging the external cognitive library involves fusing a data layer and a schema layer, the fusion of the schema layer comprises the fusion of concepts, of concept hypernym-hyponym relations and of concept attribute definitions, and the fusion of the data layer comprises the fusion of entities and the fusion of entity attributes.
It should be noted that, in this embodiment, the multi-modal cognitive data extracted in the public security domain are fused by multi-modal entity linking and cognitive merging. Entity linking is the operation of mapping all identical and related entities that have been obtained onto the same correct entity in the cognitive library. First, techniques such as the image segmentation recognition network, the video understanding network and the BERT network are used to judge whether the existing knowledge base already contains the same or related entities, i.e. entities with the same meaning are merged into one correct entity; then an entity object is obtained through the entity extraction techniques described above; finally, the entity object is mapped onto the correct entity in the cognitive library.
Cognitive merging means that, when the agent cognitive map is constructed, third-party cognitive libraries can be integrated into it. An external cognitive library must be merged at the data layer and at the schema layer respectively. Fusion of the schema layer comprises fusion of concepts, of concept hypernym-hyponym relations and of concept attribute definitions; fusion of the data layer comprises fusion of entities and fusion of entity attributes. Cognitive merging thus covers merging external cognitive libraries and merging relational databases. Fusing an external cognitive library into the local cognitive library involves two levels of problems: at the data layer (entity names, attributes, relationships, categories and so on) the main issue is how to avoid conflicts between instances and relationships that would create unnecessary redundancy; at the schema layer, newly obtained ontologies are fused into the existing ontology base. Relational databases are then merged: in building an agent cognitive map, an important high-quality cognitive source is the enterprise's or organization's own relational database, and the Resource Description Framework (RDF) may be employed as the data model to incorporate such structured historical data into the cognitive map.
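To illustrate how rows from an organization's relational database could be expressed as RDF and merged into the cognitive map, here is a small sketch using the rdflib library; the namespace, table contents and property names are hypothetical.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/public-safety/")  # hypothetical namespace
g = Graph()

# Rows as they might come out of an organization's relational database.
person_rows = [
    {"id": "p001", "name": "Zhang San", "height": "170cm"},
    {"id": "p002", "name": "Li Si", "height": "182cm"},
]

for row in person_rows:
    subject = EX[row["id"]]
    g.add((subject, RDF.type, EX.Person))                # entity category
    g.add((subject, EX.name, Literal(row["name"])))      # entity attribute
    g.add((subject, EX.height, Literal(row["height"])))  # entity attribute

g.add((EX["p001"], EX.friend_of, EX["p002"]))            # entity relation

print(g.serialize(format="turtle"))
```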
According to the embodiment of the invention, the cognitive processing is carried out according to the logic inference rule preset in the public safety field so as to complete the construction of the cognitive map of the intelligent agent, and the cognitive processing method specifically comprises the following steps:
identifying entity parallel relation and entity superior-subordinate relation, thereby clustering concepts of each level to generate an ontology;
the method comprises the steps of completing logic-based reasoning and graph-based reasoning, wherein the logic-based reasoning at least comprises first-order predicate logic, description logic and rule-based reasoning, and completing the graph-based reasoning based on a neural network model or a Path Ranking algorithm;
and evaluating the accuracy and the coverage rate of the cognitive map of the intelligent agent, and finishing the construction of the cognitive map of the intelligent agent after the evaluation is qualified, wherein the accuracy at least comprises syntax accuracy, semantic accuracy and timeliness.
It should be noted that, in this embodiment, the ontology may be constructed manually with ontology-editing software, or automatically in a data-driven manner. The automatic ontology construction process comprises three stages. a) Entity parallel-relation similarity is a measure of the degree to which any two given entities belong to the same concept class; the higher the similarity, the more likely the two entities belong to the same semantic class. The so-called parallel relation is defined relative to the vertical conceptual membership relation. For example, "China" and "USA" are both country-name entities and have high parallel-relation similarity, whereas "USA" and "mobile phone" are unlikely to belong to the same semantic class, so their parallel-relation similarity is low. The two mainstream methods for computing entity parallel-relation similarity are pattern matching and distributional similarity. The pattern-matching method predefines patterns of entity pairs, obtains the co-occurrence frequency of given keyword combinations in the same corpus unit through pattern matching, and computes the similarity between entity pairs from this co-occurrence frequency. The premise of the distributional-similarity method is that entities which frequently appear in similar contexts are semantically similar. In the concrete calculation, each entity is represented as an N-dimensional vector in which each dimension corresponds to one predefined context environment and the element value is the probability of the entity appearing in that context; the parallel-relation similarity between entities is then obtained by computing the similarity between their vectors. b) Extraction of entity hypernym-hyponym relations determines the membership (IsA) relation between concepts; for example, the pair (missile, weapon) forms such a relation, where "missile" is the hyponym and "weapon" is the hypernym. Extraction of these relations is a research focus in the field; the main approach extracts IsA entity pairs using syntactic patterns (such as Hearst patterns). Mainstream information extraction systems such as KnowItAll, TextRunner and NELL can extract hypernym-hyponym relations at the syntactic level, while Probase adopts a semantics-based iterative extraction technique that refines them step by step. Semantics-based iterative extraction generally uses a probabilistic model to judge IsA relations and to distinguish hypernym from hyponym, usually drawing on the concept classification knowledge provided by encyclopedia websites to train the model and improve precision. For example, when processing a sentence such as "domestic animals other than dogs such as cats", two alternative facts can be extracted: (cat, IsA, dog) and (cat, IsA, domestic animal); if the concepts of these entities are already related in the knowledge base, the correct result can be obtained. c) Ontology generation: the main task is to cluster the concepts obtained at each layer, calibrate the semantic class of the concepts, and assign one or more common hypernyms to the entities in each class.
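The distributional-similarity calculation described in stage a) can be prototyped in a few lines of numpy: each entity is represented as a probability distribution over predefined context environments (the counts below are invented) and parallel-relation similarity is taken as the cosine between the two vectors.

```python
import numpy as np

# Rows: entities; columns: predefined context environments (invented counts).
context_counts = {
    "China":        np.array([40, 5, 30, 2], dtype=float),
    "USA":          np.array([35, 8, 28, 3], dtype=float),
    "mobile phone": np.array([1, 50, 2, 45], dtype=float),
}

def context_distribution(counts):
    return counts / counts.sum()   # probability of appearing in each context

def parallel_similarity(a, b):
    va = context_distribution(context_counts[a])
    vb = context_distribution(context_counts[b])
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))  # cosine

print(parallel_similarity("China", "USA"))           # high: same semantic class
print(parallel_similarity("China", "mobile phone"))  # low: different classes
```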
Furthermore, cognitive inference means establishing new associations between entities, by computer reasoning over the entity-relationship data already in the cognitive library, thereby expanding and enriching the cognitive network. Cognitive inference is an important means and a key link in constructing the cognitive map; through it, new cognition can be discovered from existing cognition. For example, given (Qianlong, father, Yongzheng) and (Yongzheng, father, Kangxi), one can obtain (Qianlong, grandfather, Kangxi) or (Kangxi, grandson, Qianlong). The objects of cognitive inference are not limited to relationships between entities; they may also be attribute values of entities or the conceptual hierarchy of the ontology. For example, given the birthday attribute of an entity, its age attribute can be inferred. Concept reasoning can also be done over the concept inheritance relations in the ontology library: given (tiger, family, Felidae) and (Felidae, order, Carnivora), one can infer (tiger, order, Carnivora). Cognitive inference methods fall into two broad categories: logic-based reasoning and graph-based reasoning. Logic-based reasoning mainly includes first-order predicate logic, description logic and rule-based reasoning. First-order predicate logic is built on propositions, where a proposition is decomposed into individuals and predicates. An individual is an object that can exist independently; it may be a concrete thing such as a desk or an abstract concept such as a student. A predicate describes the properties of individuals and the relationships between them; for example, "friend" in the triple (A, friend, B) is a predicate expressing the relationship between individuals A and B. Interpersonal relationships, for instance, can be inferred with first-order predicate logic by treating the relationship as a predicate and the persons as arguments, expressing the relationship with logical operators and then setting the logic and constraints of the relational inference, which enables logical inference of simple relationships. For complex entity relationships, description logic can be used. Description logic is a formal tool for object-based knowledge representation; it is a subset of first-order predicate logic and an important design basis for ontology-language reasoning. A knowledge base based on description logic generally contains a TBox (terminological box), a set of axioms describing the relationships between concepts and between relations, and an ABox (assertional box), a set of axioms describing concrete facts. With these two components, reasoning based on description logic can ultimately be reduced to the consistency-checking problem over the ABox, which simplifies and finally enables relational inference. When reasoning over the concept hierarchy of an ontology, the object is mainly concepts described in the Web Ontology Language (OWL), which provides rich statements and strong knowledge description capability.
However, the expressive power of the web ontology language is insufficient for describing attribute composition and attribute-value transfer; to enable such inference, a dedicated rule language (e.g. SWRL) can be used to add custom rules to the ontology model as a functional extension. Graph-based reasoning methods are mainly based on neural network models or on the Path Ranking algorithm; for example, the entities in the knowledge base are represented as word vectors and relational reasoning is performed with a neural tensor network model.
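As a toy illustration of rule-based reasoning over relation triples, the following sketch encodes the grandfather example above as a single hand-written rule; a real system would use an OWL/SWRL reasoner or the graph-based methods just mentioned.

```python
# Known relation triples: (head entity, relation, tail entity).
facts = {
    ("Qianlong", "father", "Yongzheng"),
    ("Yongzheng", "father", "Kangxi"),
}

def infer_grandfather(triples):
    """Rule: father(X, Y) and father(Y, Z) => grandfather(X, Z), grandson(Z, X)."""
    inferred = set()
    for (x, r1, y) in triples:
        for (y2, r2, z) in triples:
            if r1 == "father" and r2 == "father" and y == y2:
                inferred.add((x, "grandfather", z))
                inferred.add((z, "grandson", x))
    return inferred

print(infer_grandfather(facts))
# {('Qianlong', 'grandfather', 'Kangxi'), ('Kangxi', 'grandson', 'Qianlong')}
```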
Further, quality evaluation is the final "quality inspection" link of cognitive processing and ensures the soundness of the agent's multi-modal cognitive map. Specifically: a) accuracy refers to the degree to which entities and relations (encoded by the nodes and edges of the graph) correctly represent phenomena in real life; accuracy can be further subdivided into three dimensions, syntactic accuracy, semantic accuracy and timeliness; b) coverage aims to avoid missing elements associated with the domain, which might otherwise result in incomplete query or inference results, biased models, and so on.
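The accuracy and coverage checks can be prototyped as simple ratios; in the sketch below, syntactic accuracy is the share of well-formed (entity, relation, entity) triples and coverage is the share of domain-required entities present in the map, with all triples and the required-entity list invented for illustration.

```python
def syntax_accuracy(triples):
    """Share of triples that are well-formed (entity, relation, entity) strings."""
    well_formed = [t for t in triples
                   if isinstance(t, tuple) and len(t) == 3
                   and all(isinstance(x, str) and x for x in t)]
    return len(well_formed) / len(triples) if triples else 0.0

def coverage(graph_entities, required_entities):
    """Share of domain-required entities actually present in the cognitive map."""
    present = graph_entities & required_entities
    return len(present) / len(required_entities) if required_entities else 1.0

triples = [("person_A", "friend_of", "person_B"), ("person_A", "", "vehicle_01")]
required = {"person_A", "person_B", "vehicle_01", "location_X"}
entities = {e for t in triples for e in (t[0], t[2])}

print(syntax_accuracy(triples))      # 0.5: one malformed triple (empty relation)
print(coverage(entities, required))  # 0.75: location_X is missing
```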
Fig. 3 shows a block diagram of a system for constructing an agent cognitive map oriented to the public safety field according to the invention.
As shown in fig. 3, the present invention discloses a system for constructing an agent cognitive map for the public safety domain, which includes a memory and a processor, wherein the memory includes a program of a method for constructing an agent cognitive map for the public safety domain, and the program of the method for constructing an agent cognitive map for the public safety domain is executed by the processor to implement the following steps:
constructing an initial agent cognitive map, wherein the initial agent cognitive map at least comprises a space coordinate system, a virtual reality scene and a multi-modal cognitive relationship map;
performing multi-mode recognition and cognitive extraction on public security field data based on the initial intelligent agent cognitive map, wherein the multi-mode recognition mode at least comprises image segmentation, video understanding, audio recognition and natural language processing, and the cognitive extraction mode at least comprises entity extraction, entity disambiguation, entity attribute extraction, entity relationship extraction and event extraction;
fusing multi-modal cognitive data extracted from the public safety field, wherein the fusion mode at least comprises multi-modal entity linking and cognitive merging;
and performing cognitive processing according to a logic reasoning rule preset in the public safety field to complete the construction of the intelligent agent cognitive map.
It should be noted that, in this embodiment, the initial agent cognitive map is constructed from a spatial coordinate system, a virtual reality scene and a multi-modal cognitive relationship map, where the spatial coordinate system may be switched between a reference frame centered on the perceiver and a reference frame established by things other than the perceiver; the multi-modal cognitive relationship map is a multi-modal network map that, relying on the spatial coordinate system and the time sequence, maps the correspondences of various things in space and time and their various logical association relationships.
Further, when multi-modal recognition is carried out on public safety field data based on the initial agent cognitive map, the recognition at least comprises image segmentation, video understanding, audio recognition and natural language processing, and cognitive extraction at least comprises entity extraction, entity disambiguation, entity attribute extraction, entity relationship extraction and event extraction. The multi-modal cognitive data extracted from the public safety field are then fused, the fusion at least comprising multi-modal entity linking and cognitive merging; for example, entity linking is carried out by jointly using an image segmentation recognition network, a video understanding network and a BERT network, and for cognitive merging, a third-party cognitive library can be fused into the agent cognitive map to supplement its content when the map is constructed. Finally, cognitive processing is carried out according to the logical inference rules preset for the public safety field to complete the construction of the agent cognitive map, the cognitive processing comprising steps such as ontology construction, cognitive inference and quality evaluation.
According to the embodiment of the present invention, constructing the initial agent cognitive map specifically includes:
constructing the space coordinate system and the scale based on OpenGL, wherein the space coordinate system at least comprises an object coordinate system, a world coordinate system and an observation coordinate system;
establishing a scene model based on OpenGL to construct the virtual reality scene, wherein the scene model is converted into pixels through rasterization;
and mapping the corresponding relation and the logical association relation of various things in time and space based on the space coordinate system and the time sequence so as to construct the multi-modal cognitive relationship map.
It should be noted that, in this embodiment, the spatial coordinate system and the scale are constructed based on OpenGL. Object coordinate system: the object coordinate system (local coordinate system) is the coordinate system associated with a particular object. Each object has its own independent coordinate system, and this coordinate system moves and turns with the object; in some cases it is also called the model coordinate system, because the coordinates of the model vertices are described in it. World coordinate system: the world coordinate system establishes the frame of reference needed to describe the other coordinate systems; that is, the world coordinate system can describe the position of every other coordinate system, while the world coordinate system itself cannot be described by any larger, external coordinate system. Observation coordinate system: the observation coordinate system, also called eye coordinates (EC), is a three-dimensional rectangular auxiliary coordinate system that can be defined at any position and in any direction in the user coordinate system. In the observation coordinate system, a plane perpendicular to the Z-axis of the coordinate system, called the viewing plane, is generally defined. This coordinate system is mainly used to specify the clipping space and to determine which part of the three-dimensional geometry needs to be output to the screen; in addition, the coordinate values, in the world coordinate system, of the part of the three-dimensional geometry to be output can be converted through the viewing plane into coordinate values in the normalized coordinate system.
Further, using OpenGL to construct a virtual reality scene, including various real-world things, wherein the basic steps of the main graphics operations to construct the scene are: a) A scene model is built from basic graphic elements, and the built model is mathematically described (OpenGL: points, lines, polygons, images, and bitmaps as basic graphic units); b) Placing the scene model at an appropriate position in three-dimensional space and setting a viewpoint (viewpoint) to observe the landscape of interest; c) Calculating the colors of all objects in the model, wherein the colors are determined according to application requirements, and meanwhile, determining illumination conditions, texture pasting modes and the like; d) The mathematical description of the scene model and its color information are converted to pixels on the computer screen, a process known as rasterization (rasterization).
The correspondences and logical association relationships of various things in space and time are then mapped on the basis of the spatial coordinate system and the time sequence to construct the multi-modal cognitive relationship map. The map is a directed graph G = {E, R, A, V, T_R, T_A}, where E, R, A and V respectively denote the sets of entities, relationships, attributes and attribute values, and T_R and T_A respectively denote the set of relation triples and the set of attribute triples. A relation triple (e_i, r_ij, e_j) expresses that entity e_i and entity e_j have the relationship r_ij; an attribute triple (e_i, a_i, v_i) expresses that entity e_i has the attribute a_i with the attribute value v_i.
The entities are divided into tangible entities and abstract entities. A tangible entity refers to a visible three-dimensional object or scene, while an abstract entity is a textual concept or label abstracted from tangible entities. Abstract entities are further divided into object entities and event entities. Object entities may correspond one-to-one to tangible objects. An event entity is a combination of a series of dynamically changing processes of things; for example, "sunrise" is the combination of a series of changes in the visual effect of the scene produced by the earth and the sun during their relative movement.
An entity may have multiple attributes; for example, a person has attributes such as height, weight, gender, age, action and expression. An attribute may take the form of a textual concept, such as a height of "170cm", or the form of a graphic or animation, such as an animation corresponding to the action "running". Entities can have various relationships: spatial and temporal relationships such as up-down, left-right, front-back and sequence, as well as logical relationships such as friends or teacher and student.
According to the embodiment of the invention, the multi-mode recognition of the public safety field data based on the initial intelligent agent cognitive map specifically comprises the following steps:
performing image segmentation based on DeepLabV3+ and a preset network, wherein the preset network comprises an Encoder part and a Decoder part;
performing video understanding based on a TSM network, wherein an offline video identification model is constructed based on a bidirectional TSM, and a feature map is transferred from a previous frame to a current frame by using a unidirectional TSM;
performing audio recognition by using an ASR algorithm;
and performing natural language processing by using the BERT network.
It should be noted that, in this embodiment, the DeepLabV3+ network, which improves feature resolution, is used for image segmentation and recognition. The preset network is mainly divided into two parts, an Encoder and a Decoder, where Xception is used as the backbone network and an ASPP (atrous spatial pyramid pooling) structure is used to handle the multi-scale problem. The Decoder part is introduced in order to fuse low-level and high-level features and improve the accuracy of segmentation boundaries.
1. In the Encoder part, ASPP is applied to the preliminary effective feature layer that has been downsampled four times; the resulting features are concatenated and then compressed with a 1x1 convolution.
2. In the Decoder part, a 1x1 convolution adjusts the number of channels of the preliminary effective feature layer that has been downsampled twice; this is concatenated with the ASPP-processed features, and two further convolutions produce the final feature map.
3. The final feature map is used for prediction, which requires two steps:
(1) a 1x1 convolution adjusts the number of channels to the total number of classes;
(2) resize: upsampling restores the output prediction to the original picture size.
Further, video understanding is performed based on a TSM (Temporal Shift Module) network; specifically, an offline video recognition model is constructed by using a bidirectional TSM. Given a video V, T frames F_1, ..., F_T are first sampled from the video; each sampled frame is processed independently by a 2D CNN and the outputs are averaged to give the final prediction. A TSM is inserted into each residual module, realizing temporal information fusion at no extra computational cost; the backbone is ResNet50 (other pre-trained models can be used, and MobileNetV2 can be used on mobile terminals). Embedding the TSM into each residual block doubles the temporal receptive field, achieving the effect of a pseudo-3D model through a shift operation alone. The unidirectional TSM transfers features from the previous frame to the current frame: for online recognition, during inference the first 1/8 of the feature map of each residual block is cached in memory for every frame; for the next frame, the first 1/8 of the current feature map is replaced by the cached feature map, and the next layer is generated by combining 7/8 of the current feature map with the cached 1/8 feature map of the previous frame, and so on.
Further, audio recognition is performed by using an ASR (Automatic Speech Recognition) algorithm. Specifically, as shown in fig. 2, the ASR algorithm follows an "input-encode-decode-output" process. Encoding: the input of speech recognition is voice, which must be converted into digital information by an encoding process from which features are extracted for further processing. During encoding, the voice signal is cut into short segments, called frames, according to a short time interval; for each frame, MFCC features are extracted from the signal and turned into a multi-dimensional vector, each dimension of which is one feature of that frame. Decoding: the decoding process converts the vectors obtained by encoding into characters and requires two models, an acoustic model and a language model. The acoustic model processes the encoded vectors, combining adjacent frames into phonemes (such as the initials and finals of Chinese pinyin) and then combining phonemes into words or Chinese characters. The language model adjusts the illogical words produced by the acoustic model so that the recognition result becomes fluent.
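As an illustration of the framing and MFCC step of the encoding process, the following sketch uses the librosa library; the file name, sample rate and window/hop sizes are assumptions introduced for the example.

```python
import librosa

# Cut the waveform into ~25 ms frames every 10 ms and turn each frame into a
# 13-dimensional MFCC feature vector (all parameter values are illustrative).
signal, sr = librosa.load("speech.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                            n_fft=int(0.025 * sr), hop_length=int(0.010 * sr))
print(mfcc.shape)  # (13, number_of_frames): one feature vector per frame
```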
Further, natural language processing is performed using a BERT (Bidirectional Encoder Representations from Transformers) network; specifically, two training strategies are used. The first is the Masked Language Model (MLM): before a word sequence is fed into BERT, 15% of the words in each sequence are replaced with the [MASK] token, and BERT then attempts to predict the original value of each masked word from the context provided by the other, non-masked words in the sequence. Predicting the output word requires: (1) adding a classification layer on top of the encoder output; (2) multiplying the output vectors by the embedding matrix, converting them into the vocabulary dimension; (3) computing the probability of each word in the vocabulary with softmax. The second is Next Sentence Prediction (NSP): the model receives a pair of sentences as input and learns to predict whether the second sentence is the subsequent sentence in the original document. During training, in 50% of the inputs the second sentence is indeed the subsequent sentence in the original document; in the other 50%, a sentence is randomly selected from the corpus as the second sentence, on the assumption that the random sentence is not connected to the first. To help the model distinguish the two sentences during training, the input is processed as follows before entering the model: (1) a [CLS] token is inserted at the beginning of the first sentence and a [SEP] token at the end of each sentence; (2) a segment embedding indicating sentence A or sentence B is added to each token (the segment embedding is conceptually similar to a token embedding with a vocabulary of size 2); (3) a positional embedding is added to each token to indicate its position in the sequence.
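A toy sketch of the MLM masking strategy follows (plain Python with a made-up sentence; it is not the actual BERT pre-training code).

```python
import random

# Replace roughly 15% of the ordinary tokens with [MASK]; the model must then
# recover the original token from the surrounding context.
def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    masked, labels = [], []
    for tok in tokens:
        if tok not in ("[CLS]", "[SEP]") and random.random() < mask_rate:
            masked.append(mask_token)
            labels.append(tok)       # prediction target: the original token
        else:
            masked.append(tok)
            labels.append(None)      # no prediction target for unmasked tokens
    return masked, labels

sentence = "[CLS] the suspect entered the station [SEP]".split()
print(mask_tokens(sentence))
```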
According to the embodiment of the invention, an LSTM + CRF model is adopted when entity extraction is performed on the public safety field data based on the initial agent cognitive map.
It should be noted that, in this embodiment, the correspondence and dependency relationships of the multi-modal data are established, and entity extraction, entity disambiguation, entity attribute extraction, entity relationship extraction, event extraction and the like are performed on the basis of the multi-modal data. Taking entity extraction as an example, an LSTM + CRF model is adopted: combining the LSTM with the CRF makes it possible to capture both past input features and sentence-level label information. The CRF layer has a state transition matrix as its parameters, so that past and future label information can be used to predict the current label, similarly to a bidirectional LSTM. Each sentence is fed word by word into the bidirectional LSTM, the forward and backward hidden-layer outputs are combined to obtain the probability that each word belongs to each entity category label, and these scores are passed to the CRF, whose objective function is optimized to determine the entity category to which each word belongs. Here, Long Short-Term Memory (LSTM) is a special RNN designed mainly to solve the problems of vanishing and exploding gradients when training on long sequences, while the Conditional Random Field (CRF) layer imposes sequence-level constraints through its transition matrix so that the predicted label sequence is globally consistent.
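A minimal BiLSTM-CRF skeleton is sketched below, assuming PyTorch and the third-party pytorch-crf package; the embedding and hidden dimensions are illustrative assumptions, not values from the embodiment.

```python
import torch.nn as nn
from torchcrf import CRF   # third-party pytorch-crf package (an assumption)

# BiLSTM scores each token against every entity label; the CRF layer's
# transition matrix then selects a globally consistent label sequence.
class BiLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, bidirectional=True,
                            batch_first=True)
        self.emit = nn.Linear(hidden_dim, num_tags)   # per-token label scores
        self.crf = CRF(num_tags, batch_first=True)    # holds the transition matrix

    def loss(self, tokens, tags, mask):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        return -self.crf(emissions, tags, mask=mask)  # negative log-likelihood

    def predict(self, tokens, mask):
        emissions = self.emit(self.lstm(self.embed(tokens))[0])
        return self.crf.decode(emissions, mask=mask)  # best label sequence per sentence
```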
According to the embodiment of the invention, fusing the multi-modal cognitive data extracted from the public safety field specifically comprises the following steps:
performing entity linking by using the image segmentation recognition network, the video understanding network and the BERT network;
and merging an external cognitive library and a relational database to complete cognitive merging, wherein merging the external cognitive library comprises fusing a data layer and a mode layer (i.e., the schema layer), the fusion of the mode layer comprises fusion of concepts, fusion of the hypernym-hyponym relations between concepts and fusion of concept attribute definitions, and the fusion of the data layer comprises fusion of entities and fusion of entity attributes.
It should be noted that, in this embodiment, the multi-modal cognitive data extracted in the public safety domain are fused, including multi-modal entity linking and cognitive merging. Entity linking is the operation of mapping all identical and related entities that have been obtained to the same correct entity in the cognitive library. First, the image segmentation recognition network, the video understanding network, the BERT network and similar techniques are used to judge whether identical or related entities already exist in the knowledge base, that is, entities expressing the same meaning are merged into one correct entity; then an entity object is obtained through the entity extraction techniques; finally, the entity object is mapped to the correct entity in the cognitive library.
Cognitive merging means that, when the intelligent agent cognitive map is constructed, a third-party cognitive library may be merged into the map. The external cognitive library must be merged at both the data layer and the mode layer. Fusion of the mode layer includes: fusion of concepts, fusion of hypernym-hyponym relations between concepts, and fusion of concept attribute definitions. Fusion of the data layer includes: fusion of entities and fusion of entity attributes. Cognitive merging covers both merging external cognitive libraries and merging relational databases. Fusing an external cognitive library into the local cognitive library raises problems at two layers: at the data layer, the designations, attributes, relationships and categories of entities must be fused, the main difficulty being how to avoid conflicts between instances and relationships that would cause unnecessary redundancy; at the mode layer, a newly obtained ontology is fused into the existing ontology base. The relational database is then merged: in the process of constructing the intelligent agent cognitive map, an important high-quality source of cognition is the enterprise's or organization's own relational database. To incorporate these structured historical data into the cognitive map, the Resource Description Framework (RDF) may be employed as the data model.
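To illustrate how a relational-database row could be expressed as RDF triples for merging into the cognitive map, the following sketch uses the rdflib library; the namespace, table columns and values are assumptions introduced for the example.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Map one (made-up) relational row into RDF triples so that structured
# historical data can be merged into the cognitive map.
EX = Namespace("http://example.org/public-safety/")
g = Graph()

row = {"person_id": "p001", "name": "Zhang San", "case_id": "c042"}
person = EX[row["person_id"]]
g.add((person, RDF.type, EX.Person))
g.add((person, EX.name, Literal(row["name"])))
g.add((person, EX.involvedIn, EX[row["case_id"]]))

print(g.serialize(format="turtle"))
```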
According to the embodiment of the invention, performing cognitive processing according to the logical inference rules preset for the public safety field to complete the construction of the intelligent agent cognitive map specifically comprises the following steps:
identifying entity parallel relationships and entity superior-subordinate relationships, thereby clustering the concepts at each level to generate an ontology;
completing logic-based reasoning and graph-based reasoning, wherein the logic-based reasoning at least comprises first-order predicate logic, description logic and rule-based reasoning, and the graph-based reasoning is completed based on a neural network model or the Path Ranking algorithm;
and evaluating the accuracy and the coverage of the intelligent agent cognitive map, and completing the construction of the intelligent agent cognitive map after the evaluation is passed, wherein the accuracy at least comprises syntactic accuracy, semantic accuracy and timeliness.
In this embodiment, the ontology may be constructed manually by editing (using ontology editing software) or in a data-driven, automated manner. The automated ontology construction process comprises three stages:
a) Calculation of entity parallel-relation similarity. The parallel-relation similarity is a measure of the degree to which any two given entities belong to the same concept category; the higher the similarity, the more likely the two entities belong to the same semantic class. The parallel relationship is the horizontal counterpart of the vertical conceptual membership. For example, "China" and "USA" are both country-name entities and therefore have a high parallel-relation similarity, whereas "USA" and "mobile phone" are unlikely to belong to the same semantic category, so their parallel-relation similarity is low. The two mainstream calculation methods are pattern matching and distributional similarity. The pattern-matching method predefines entity-pair patterns, obtains by pattern matching the frequency with which a given keyword combination co-occurs in the same corpus unit, and computes the similarity between entity pairs from that co-occurrence frequency. The premise of the distributional similarity method is that entities which frequently occur in similar context environments are semantically similar. In the concrete calculation, each entity is represented as one N-dimensional vector, where each dimension represents one predefined context environment and the element value is the probability of the entity appearing in that context; the parallel-relation similarity between entities is then obtained from the similarity between the vectors (a numerical sketch of this calculation is given after this list).
b) Extraction of entity hypernym-hyponym (superior-subordinate) relations, which determines the membership (IsA) relation between concepts; for example, the pair (missile, weapon) forms such a relation, where "missile" is the hyponym and "weapon" is the hypernym. Extraction of this relation is a research focus of the field, and the main method is to extract IsA entity pairs based on syntactic patterns (such as Hearst patterns). Mainstream information extraction systems such as KnowItAll, TextRunner and NELL can extract hypernym-hyponym relations of entities at the syntactic level, while semantics-based iterative extraction techniques extract them in a gradually refined manner. Semantics-based iterative extraction generally uses a probability model to judge IsA relations and to distinguish hypernyms from hyponyms, and usually trains the model with the concept classification knowledge provided by encyclopedia websites so as to improve precision. For example, when processing a sentence such as "domestic animals other than dogs, such as cats", extracting the hypernym and hyponym of the IsA entity pair yields two candidate facts: (cat, IsA, dog) and (cat, IsA, domestic animal). If the concepts of these entities are already recorded in the knowledge base, the correct result can be obtained.
c) Ontology generation, whose main task is to cluster the concepts obtained at each level, calibrate their semantic classes, and assign one or more common hypernyms to the entities in each class.
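The following is a minimal numerical sketch, assuming NumPy, of the distributional-similarity idea from stage a); the context environments and probability values are invented for illustration.

```python
import numpy as np

# Each entity is a vector over predefined context environments; the cosine of
# two vectors approximates their parallel-relation similarity.
def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

contexts = ["visited ___", "___ government", "buy a ___", "made in ___"]
china = np.array([0.30, 0.45, 0.05, 0.20])   # P(entity appears in each context)
usa   = np.array([0.28, 0.50, 0.04, 0.18])
phone = np.array([0.02, 0.01, 0.60, 0.37])

print(cosine(china, usa))   # high: likely the same semantic class
print(cosine(usa, phone))   # low: unlikely to share a semantic class
```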
Furthermore, cognitive inference means establishing new associations between entities by computer reasoning over the existing entity-relationship data in the cognitive library, thereby expanding and enriching the cognitive network. Cognitive inference is an important means and a key link in constructing the cognitive map: new cognition can be discovered from existing cognition through inference. For example, from the known facts (Qianlong, father, Yongzheng) and (Yongzheng, father, Kangxi), one can obtain (Qianlong, grandfather, Kangxi) or (Kangxi, grandson, Qianlong). The objects of cognitive inference are not limited to relationships between entities; they may also be attribute values of entities or conceptual hierarchy relations of the ontology. For example, knowing the birthday attribute of an entity, its age attribute can be inferred. Concept reasoning can also be performed on the basis of concept inheritance relations in the ontology library: from the known facts (tiger, family, Felidae) and (Felidae, order, Carnivora), it can be inferred that (tiger, order, Carnivora).
Cognitive inference methods can be divided into two broad categories: logic-based reasoning and graph-based reasoning. Logic-based reasoning mainly includes first-order predicate logic, description logic and rule-based reasoning. First-order predicate logic is built on propositions, where a proposition is decomposed into two parts: individuals and predicates. An individual is an object that can exist independently; it may be a concrete thing such as a desk or an abstract concept such as a student. A predicate is a term that describes the properties of individuals and the relations between objects; for example, "friend" in the triple (A, friend, B) is the predicate expressing the relationship between individual A and individual B. Interpersonal relationships, for instance, can be inferred with first-order predicate logic: the relationship is treated as a predicate, persons are treated as arguments, the interpersonal relation is expressed with logical operators, and the logic and constraints of the relation inference are then set, which realizes logical inference over simple relations. For complex entity relationships, description logic can be used. Description logic is a formal tool for object-based knowledge representation; it is a subset of first-order predicate logic and an important design basis for ontology-language reasoning. A knowledge base based on description logic generally contains a TBox (terminological box), a set of axioms describing the relationships between concepts and between roles, and an ABox (assertional box), a set of axioms describing concrete facts. With these two components, reasoning based on description logic can ultimately be reduced to the consistency-checking problem of the ABox, which simplifies and finally enables relational inference. When reasoning is carried out over an ontology concept hierarchy, the objects are mainly concepts described in the Web Ontology Language (OWL); OWL provides rich constructs and has strong knowledge-description capability.
However, the expressive power of the web ontology language is insufficient for describing attribute composition and attribute-value transfer; to realize such inference, a dedicated rule language (e.g., SWRL) can be used to add custom rules to the ontology model as a functional extension. Graph-based reasoning methods are mainly based on a neural network model or the Path Ranking algorithm: the entities in the knowledge base are expressed as word vectors, and relational reasoning is performed with a neural tensor network model (neural tensor networks).
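As a toy illustration of rule-based reasoning over triples, the sketch below encodes the "father of one's father is one's grandfather" rule used in the example above; the full emperor names expand the abbreviations in the text, and the code is illustrative only.

```python
# Rule: (A, father, B) and (B, father, C) imply (A, grandfather, C).
facts = {("Qianlong", "father", "Yongzheng"),
         ("Yongzheng", "father", "Kangxi")}

def infer_grandfather(triples):
    inferred = set()
    for (a, r1, b) in triples:
        for (b2, r2, c) in triples:
            if r1 == "father" and r2 == "father" and b == b2:
                inferred.add((a, "grandfather", c))
    return inferred

print(infer_grandfather(facts))  # {('Qianlong', 'grandfather', 'Kangxi')}
```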
Further, quality assessment is the final "quality inspection" link of cognitive processing and guarantees the rationality of the intelligent agent multi-modal cognitive map. Specifically: a) Accuracy is the degree to which the entities and relations (encoded by the nodes and edges in the graph) correctly represent real-life phenomena. Accuracy can be further subdivided into three dimensions: syntactic accuracy, semantic accuracy and timeliness. b) Coverage aims to avoid missing elements related to the domain, which would otherwise lead to incomplete query or derivation results, biased models, and the like.
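Purely as an illustration of how the accuracy and coverage checks might be operationalized (the sampling scheme, field names and threshold values below are assumptions, not part of the embodiment), one could write:

```python
# Sample triples from the map, label each as syntactically/semantically
# correct and timely, then compute accuracy and coverage against thresholds.
def evaluate(sampled_triples, required_relations, acc_threshold=0.95, cov_threshold=0.90):
    correct = sum(1 for t in sampled_triples
                  if t["syntax_ok"] and t["semantics_ok"] and t["timely"])
    accuracy = correct / len(sampled_triples)
    present = {t["relation"] for t in sampled_triples}
    coverage = len(present & required_relations) / len(required_relations)
    return accuracy >= acc_threshold and coverage >= cov_threshold

sample = [{"relation": "friend_of", "syntax_ok": True, "semantics_ok": True, "timely": True},
          {"relation": "located_in", "syntax_ok": True, "semantics_ok": False, "timely": True}]
print(evaluate(sample, {"friend_of", "located_in", "involved_in"}))  # False: below thresholds
```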
The third aspect of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a public safety domain-oriented agent awareness graph building method program, and when the public safety domain-oriented agent awareness graph building method program is executed by a processor, the steps of the public safety domain-oriented agent awareness graph building method described in any one of the above are implemented.
According to the method and system for constructing an intelligent agent cognitive map oriented to the public safety field disclosed by the invention, the intelligent agent cognitive map is applied to the public safety field, and the strong cognitive reasoning capability of the intelligent agent is utilized for early-warning prediction, analysis and study-and-judgment, so that greater controllability and convenience are obtained when the map is applied in the public safety field.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; can be located in one place or distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.

Claims (10)

1. A construction method of an intelligent agent cognitive map facing the public safety field is characterized by comprising the following steps:
constructing an initial agent cognitive map, wherein the initial agent cognitive map at least comprises a space coordinate system, a virtual reality scene and a multi-modal cognitive relationship map;
performing multi-mode recognition and cognitive extraction on public safety field data based on the initial intelligent agent cognitive map, wherein the multi-mode recognition mode at least comprises image segmentation, video understanding, audio recognition and natural language processing, and the cognitive extraction mode at least comprises entity extraction, entity disambiguation, entity attribute extraction, entity relationship extraction and event extraction;
fusing multi-modal cognitive data extracted from the public security field, wherein the fusion mode at least comprises multi-modal entity linking and cognitive combination;
and performing cognitive processing according to a logic reasoning rule preset in the public safety field to complete the construction of the intelligent agent cognitive map.
2. The method for constructing an agent cognitive profile oriented to the public safety domain as claimed in claim 1, wherein the constructing of the initial agent cognitive profile specifically comprises:
constructing the space coordinate system and the scale based on OpenGL, wherein the space coordinate system at least comprises an object coordinate system, a world coordinate system and an observation coordinate system;
establishing a scene model based on OpenGL to construct the virtual reality scene, wherein the scene model is converted into pixels through rasterization;
and mapping the corresponding relation and the logical association relation of various things in time and space based on the space coordinate system and the time sequence so as to construct the multi-modal cognitive relationship map.
3. The method for constructing an agent cognitive map for the public safety domain according to claim 1, wherein performing multi-modal recognition on public safety domain data based on the initial agent cognitive map specifically comprises:
performing image segmentation based on DeepLabV3+ and a preset network, wherein the preset network comprises an Encoder part and a Decoder part;
performing video understanding based on a TSM network, wherein an offline video identification model is constructed based on a bidirectional TSM, and a feature map is transferred from a previous frame to a current frame by using a unidirectional TSM;
performing audio recognition by using an ASR algorithm;
and performing natural language processing by using the BERT network.
4. The method for constructing a public safety domain-oriented agent cognitive map as claimed in claim 1, wherein the entity extraction of the public safety domain data based on the initial agent cognitive map is performed by using an LSTM + CRF model.
5. The method for constructing an intelligent agent cognitive atlas facing public safety domain according to claim 1, wherein the fusing of the multi-modal cognitive data extracted from the public safety domain specifically comprises:
carrying out entity link by using an image segmentation recognition network, a video understanding network and a BERT network;
and combining the external cognitive library and the relational database to complete cognitive combination, wherein the combined external cognitive library comprises a fusion data layer and a mode layer, the fusion of the mode layer comprises the fusion of concepts, the fusion of the upper-lower relation of the concepts and the fusion of concept attribute definition, and the fusion of the data layer comprises the fusion of entities and the fusion of entity attributes.
6. The method for constructing an agent cognitive map facing the public safety domain according to claim 1, wherein the cognitive processing is performed according to a logical inference rule preset in the public safety domain to complete the construction of the agent cognitive map, and specifically comprises:
identifying entity parallel relation and entity superior-subordinate relation, thereby clustering concepts of each layer to generate an ontology;
the method comprises the steps of completing logic-based reasoning and graph-based reasoning, wherein the logic-based reasoning at least comprises first-order predicate logic, description logic and rule-based reasoning, and completing the graph-based reasoning based on a neural network model or a Path Ranking algorithm;
and evaluating the accuracy and the coverage rate of the cognitive map of the intelligent agent, and completing the construction of the cognitive map of the intelligent agent after the cognitive map is qualified by evaluation, wherein the accuracy at least comprises syntax accuracy, semantic accuracy and timeliness.
7. The system for constructing the cognitive map of the intelligent agent facing the public safety field is characterized by comprising a memory and a processor, wherein the memory comprises a program of a method for constructing the cognitive map of the intelligent agent facing the public safety field, and the program of the method for constructing the cognitive map of the intelligent agent facing the public safety field is executed by the processor to realize the following steps:
constructing an initial agent cognitive map, wherein the initial agent cognitive map at least comprises a spatial coordinate system, a virtual reality scene and a multi-modal cognitive relationship map;
performing multi-mode recognition and cognitive extraction on public security field data based on the initial intelligent agent cognitive map, wherein the multi-mode recognition mode at least comprises image segmentation, video understanding, audio recognition and natural language processing, and the cognitive extraction mode at least comprises entity extraction, entity disambiguation, entity attribute extraction, entity relationship extraction and event extraction;
fusing multi-modal cognitive data extracted from the public safety field, wherein the fusion mode at least comprises multi-modal entity linking and cognitive merging;
and performing cognitive processing according to a logic reasoning rule preset in the public safety field to complete the construction of the intelligent agent cognitive map.
8. The system for constructing an agent cognitive profile oriented to the public safety domain according to claim 7, wherein the constructing of the initial agent cognitive profile specifically comprises:
constructing the space coordinate system and the scale based on OpenGL, wherein the space coordinate system at least comprises an object coordinate system, a world coordinate system and an observation coordinate system;
establishing a scene model based on OpenGL to construct the virtual reality scene, wherein the scene model is converted into pixels through rasterization;
and mapping the corresponding relation and the logical incidence relation of various things in time and space based on the space coordinate system and the time sequence so as to construct the multi-modal cognitive relationship map.
9. The public safety domain-oriented agent-aware graph building system according to claim 8, wherein performing multi-modal recognition on public safety domain data based on the initial agent-aware graph specifically comprises:
performing image segmentation based on DeepLabV3+ and a preset network, wherein the preset network comprises an Encoder part and a Decoder part;
performing video understanding based on a TSM network, wherein an offline video identification model is constructed based on a bidirectional TSM, and a feature map is transferred from a previous frame to a current frame by using a unidirectional TSM;
performing audio recognition by using an ASR algorithm;
and performing natural language processing by using the BERT network.
10. A computer-readable storage medium, wherein the computer-readable storage medium includes a public safety domain-oriented agent awareness graph building method program, and when the public safety domain-oriented agent awareness graph building method program is executed by a processor, the steps of the public safety domain-oriented agent awareness graph building method according to any one of claims 1 to 6 are implemented.
CN202211680967.XA 2022-12-27 2022-12-27 Construction method and system of intelligent agent cognitive map for public security field Pending CN115858816A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211680967.XA CN115858816A (en) 2022-12-27 2022-12-27 Construction method and system of intelligent agent cognitive map for public security field

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211680967.XA CN115858816A (en) 2022-12-27 2022-12-27 Construction method and system of intelligent agent cognitive map for public security field

Publications (1)

Publication Number Publication Date
CN115858816A true CN115858816A (en) 2023-03-28

Family

ID=85653359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211680967.XA Pending CN115858816A (en) 2022-12-27 2022-12-27 Construction method and system of intelligent agent cognitive map for public security field

Country Status (1)

Country Link
CN (1) CN115858816A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446343A (en) * 2018-11-05 2019-03-08 上海德拓信息技术股份有限公司 A kind of method of public safety knowledge mapping building
CN109710701A (en) * 2018-12-14 2019-05-03 浪潮软件股份有限公司 A kind of automated construction method for public safety field big data knowledge mapping
CN112288091A (en) * 2020-10-30 2021-01-29 西南电子技术研究所(中国电子科技集团公司第十研究所) Knowledge inference method based on multi-mode knowledge graph
CN114186069A (en) * 2021-11-29 2022-03-15 江苏大学 Deep video understanding knowledge graph construction method based on multi-mode heteromorphic graph attention network
US20220215175A1 (en) * 2020-12-24 2022-07-07 Southeast University Place recognition method based on knowledge graph inference
CN114925176A (en) * 2022-07-22 2022-08-19 北京融信数联科技有限公司 Method, system and medium for constructing intelligent multi-modal cognitive map
CN115376045A (en) * 2022-08-16 2022-11-22 四川九洲视讯科技有限责任公司 Public safety command intelligent processing method based on multi-mode fusion deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO XIAOJUAN; JIA YAN; LI AIPING; CHANG CHUNXI: "A Survey of Multi-Source Knowledge Fusion Techniques", Journal of Yunnan University (Natural Sciences Edition), No. 03 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230328