CN116467607A - Information matching method and storage medium - Google Patents

Information matching method and storage medium

Info

Publication number
CN116467607A
Authority
CN
China
Prior art keywords
information
text
image
matched
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310344513.3A
Other languages
Chinese (zh)
Other versions
CN116467607B (en)
Inventor
洪海文
金炫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202310344513.3A priority Critical patent/CN116467607B/en
Publication of CN116467607A publication Critical patent/CN116467607A/en
Application granted granted Critical
Publication of CN116467607B publication Critical patent/CN116467607B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/19 Recognition using electronic means
    • G06V 30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V 30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/418 Document matching, e.g. of document images

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an information matching method and a storage medium. The method includes the following steps: monitoring information to be matched, where the information to be matched includes text information to be matched and/or image information to be matched; invoking a semantic recognition model, where the semantic recognition model is obtained by training with a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and the text sample is used to describe the image content of the image sample; extracting semantic features from the information to be matched by using the feature extraction model in the semantic recognition model that corresponds to the information to be matched; and determining, in a database, at least one target image that matches the semantic features, where the database is used to store images matched with different semantic features. The method and the device solve the technical problem of low information retrieval accuracy.

Description

Information matching method and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an information matching method and a storage medium.
Background
At present, in picture risk-control functions, a contrastive-learning dual-stream multi-modal model (Contrastive Language-Image Pre-training, CLIP for short) is generally adopted for information retrieval. However, the CLIP model has a poor ability to understand long texts and a weak ability to distinguish similar concepts in text information, which causes the technical problem of low information retrieval accuracy.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiments of the present application provide an information matching method and a storage medium, so as to at least solve the technical problem of low information retrieval accuracy.
According to one aspect of the embodiments of the present application, an information matching method is provided. The method may include: monitoring information to be matched, where the information to be matched includes text information to be matched and/or image information to be matched; invoking a semantic recognition model, where the semantic recognition model is obtained through contrastive-learning training that takes a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and the text sample is used to describe the image content of the image sample; extracting semantic features from the information to be matched by using the feature extraction model in the semantic recognition model that corresponds to the information to be matched; and determining, in a database, at least one target image that matches the semantic features, where the database is used to store images matched with different semantic features.
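The claimed retrieval steps (extract a semantic feature from the query, then match it against the features stored for database images) can be sketched as follows. The patent does not disclose a concrete implementation, so the cosine-similarity lookup, the function name, and the toy features below are all illustrative assumptions.

```python
import numpy as np

def match_top_k(query_feature, database_features, k=1):
    """Hypothetical sketch of the final matching step: return the indices
    of the k database images whose stored semantic features are most
    similar (by cosine similarity) to the query's semantic feature."""
    q = query_feature / np.linalg.norm(query_feature)
    db = database_features / np.linalg.norm(database_features, axis=1, keepdims=True)
    scores = db @ q                 # cosine similarity per stored image
    return np.argsort(-scores)[:k]  # best-matching image indices first

# Toy database of 4 images with 3-d placeholder "semantic features".
db = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.9, 0.1, 0.0],
               [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
print(match_top_k(query, db, k=2))  # the two most similar images
```

In a deployment, the database features would be produced once per image by the image-side feature extraction model, and the query feature by whichever feature extraction model corresponds to the input modality.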
According to an aspect of the embodiments of the present application, an information matching method is also provided. The method includes the following steps: displaying information to be matched on an operation interface, where the information to be matched includes text information to be matched and/or image information to be matched; and, in response to a matching operation instruction acting on the operation interface, displaying on the operation interface at least one target image that matches the semantic features of the information to be matched, where the at least one target image is determined from a database, the semantic features are extracted from the information to be matched by the feature extraction model in a semantic recognition model that corresponds to the information to be matched, the semantic recognition model is obtained through contrastive-learning training that takes a confusing text sample of a text sample as a negative sample and a disturbance image sample of the image sample as a positive sample, and the text sample is used to describe the image content of the image sample.
According to an aspect of the embodiments of the present application, an information matching method is also provided. The method includes the following steps: monitoring risk information to be matched from an information matching platform, where the risk information to be matched includes risk text information to be matched and/or risk image information to be matched; invoking a semantic recognition model, where the semantic recognition model is obtained by training with a confusing text sample of a risk text sample as a negative sample and a disturbance image sample of a risk image sample as a positive sample, and the risk text sample is used to describe the image content of the risk image sample; extracting risk semantic features from the risk information to be matched by using the feature extraction model in the semantic recognition model that corresponds to the risk information to be matched; determining, in a database, at least one target image that matches the risk semantic features, where the database is used to store images matched with different risk semantic features; and returning the at least one target image to the information matching platform for display, where the information matching platform is used to transmit the at least one target image to a terminal device, and the risk event corresponding to the target image is prevented and controlled by the terminal device.
According to an aspect of the embodiments of the present application, a method for generating a semantic recognition model is also provided. The method includes the following steps: acquiring a text sample and an image sample, where the text sample is used to describe the image content of the image sample; generating a confusing text sample of the text sample and a disturbance image sample of the image sample; and training with the confusing text sample as a negative sample and the disturbance image sample as a positive sample to obtain a semantic recognition model, where the semantic recognition model includes a feature extraction model for extracting semantic features of input text information and a feature extraction model for extracting semantic features of input image information.
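A minimal sketch of the kind of training objective this generation method implies: a batch of image features is contrasted against the features of their describing texts, with each image's confusing text appended as an extra hard negative. The InfoNCE-style loss shape, the temperature value, and every name below are assumptions; the patent does not disclose concrete formulas.

```python
import numpy as np

def contrastive_loss_with_hard_negatives(img_feats, txt_feats,
                                         confusing_txt_feats, temperature=0.07):
    """InfoNCE-style loss sketch: image i should score highest against its
    own describing text (txt_feats[i]) and be pushed away from both the
    other texts in the batch and its confusing text (an extra hard negative)."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    img, txt, neg = normalize(img_feats), normalize(txt_feats), normalize(confusing_txt_feats)
    # Similarity of each image to every batch text, plus its own hard negative.
    batch_logits = img @ txt.T
    hard_logits = np.sum(img * neg, axis=1, keepdims=True)
    logits = np.concatenate([batch_logits, hard_logits], axis=1) / temperature
    # Softmax cross-entropy; the matching text sits on the diagonal.
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    idx = np.arange(len(img))
    return float(-np.mean(np.log(probs[idx, idx])))

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 8))
loss_aligned = contrastive_loss_with_hard_negatives(img, img.copy(),
                                                    rng.standard_normal((4, 8)))
loss_random = contrastive_loss_with_hard_negatives(img, rng.standard_normal((4, 8)),
                                                   rng.standard_normal((4, 8)))
print(loss_aligned, loss_random)
```

Because the confusing text is semantically close to the real description, it contributes a high-similarity logit that the softmax must suppress, which is what forces the encoders to learn finer-grained distinctions than in-batch negatives alone would.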
According to an aspect of the embodiments of the present application, an information matching apparatus is also provided. The apparatus includes: a monitoring unit, configured to monitor information to be matched, where the information to be matched includes text information to be matched and/or image information to be matched; an invoking unit, configured to invoke a semantic recognition model, where the semantic recognition model is obtained through contrastive-learning training that takes a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and the text sample is used to describe the image content of the image sample; an extraction unit, configured to extract semantic features from the information to be matched by using the feature extraction model in the semantic recognition model that corresponds to the information to be matched; and a searching unit, configured to determine, in a database, at least one target image that matches the semantic features, where the database is used to store images matched with different semantic features.
According to an aspect of the embodiments of the present application, an information matching apparatus is also provided. The apparatus includes: a first display unit, configured to display information to be matched on an operation interface, where the information to be matched includes text information to be matched and/or image information to be matched; and a second display unit, configured to display, on the operation interface in response to a matching operation instruction acting on the operation interface, at least one target image that matches the semantic features of the information to be matched, where the at least one target image is determined from a database, the semantic features are extracted from the information to be matched by the feature extraction model in a semantic recognition model that corresponds to the information to be matched, the semantic recognition model is obtained through contrastive-learning training that takes a confusing text sample of a text sample as a negative sample and a disturbance image sample of the image sample as a positive sample, and the text sample is used to describe the image content of the image sample.
According to an aspect of the embodiments of the present application, an information matching apparatus is also provided. The apparatus includes: a monitoring unit, configured to monitor risk information to be matched from an information matching platform, where the risk information to be matched includes risk text information to be matched and/or risk image information to be matched; an invoking unit, configured to invoke a semantic recognition model, where the semantic recognition model is obtained by training with a confusing text sample of a risk text sample as a negative sample and a disturbance image sample of a risk image sample as a positive sample, and the risk text sample is used to describe the image content of the risk image sample; an extraction unit, configured to extract risk semantic features from the risk information to be matched by using the feature extraction model in the semantic recognition model that corresponds to the risk information to be matched; a searching unit, configured to determine, in a database, at least one target image that matches the risk semantic features, where the database is used to store images matched with different risk semantic features; and a display unit, configured to return the at least one target image to the information matching platform for display, where the information matching platform is used to transmit the at least one target image to a terminal device, and the risk event corresponding to the target image is prevented and controlled by the terminal device.
According to an aspect of the embodiments of the present application, an apparatus for generating a semantic recognition model is also provided. The apparatus includes: an acquisition unit, configured to acquire a text sample and an image sample, where the text sample is used to describe the image content of the image sample; a generation unit, configured to generate a confusing text sample of the text sample and a disturbance image sample of the image sample; and a training unit, configured to train with the confusing text sample as a negative sample and the disturbance image sample as a positive sample to obtain a semantic recognition model, where the semantic recognition model includes a feature extraction model for extracting semantic features of input text information and a feature extraction model for extracting semantic features of input image information.
According to another aspect of the embodiments of the present application, a computer-readable storage medium is also provided, including a stored program, where, when the program is executed by a processor, it controls the device in which the computer-readable storage medium is located to perform the information matching method.
According to another aspect of the embodiments of the present application, an electronic device is also provided, including a memory and a processor. The memory is used to store computer-executable instructions, and the processor is used to execute the computer-executable instructions which, when executed by the processor, implement the method steps for generating a semantic recognition model.
In the embodiments of the present application, after the information to be matched is monitored, a semantic recognition model is invoked, and the feature extraction model in the semantic recognition model that corresponds to the information to be matched is used to extract semantic features from the information to be matched, where the information to be matched may include text information to be matched and/or image information to be matched. After the semantic features are extracted, at least one target image that matches them can be searched for in a database. The semantic recognition model is trained with a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, where the text sample is used to describe the image content of the image sample, and the confusing text forces the model to learn finer-grained semantics. As a result, the semantic recognition model has a better ability to understand long texts and to distinguish highly similar text information, so the semantic features it extracts from the information to be matched are more accurate. Searching the database based on these extracted semantic features yields a higher degree of matching between the retrieved target image and the information to be matched, thereby achieving the technical effect of improving the accuracy of information retrieval and further solving the technical problem of low information retrieval accuracy.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:
fig. 1 is a hardware configuration block diagram of a computer terminal (or mobile device) for implementing an information matching method according to an embodiment of the present application;
FIG. 2 is a block diagram of a computing environment according to an embodiment of the present application;
FIG. 3 is a block diagram of a service grid according to an embodiment of the present application;
FIG. 4 is a flow chart of an information matching method according to an embodiment of the present application;
FIG. 5 is a flow chart of a method of information matching according to an embodiment of the present application;
FIG. 6 is a flow chart of a method of information matching according to an embodiment of the present application;
FIG. 7 is a flow chart of a method of generating a semantic recognition model according to an embodiment of the present application;
FIG. 8 is a flow chart of a method of information matching according to an embodiment of the present application;
FIG. 9 is a schematic diagram of information matching according to an embodiment of the present application;
fig. 10 is an information matching apparatus according to an embodiment of the present application;
fig. 11 is an information matching apparatus according to an embodiment of the present application;
fig. 12 is an information search apparatus according to an embodiment of the present application;
FIG. 13 is a generation apparatus of a semantic recognition model according to an embodiment of the present application;
fig. 14 is a block diagram of a computer terminal according to an embodiment of the present application.
Detailed Description
In order to make the solution of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without making any inventive effort shall fall within the scope of protection of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms or terminology appearing in the description of the embodiments of the present application are explained as follows:
the information to be matched indicates information input by a user, and may include text information to be matched and/or image information to be matched;
the semantic recognition model is a model trained in advance, with a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and is used to extract semantic features from the information to be matched;
the text information quantity measures the amount of information and the degree of semantic richness contained in a text; for example, compare "notebook" with "notebook computer, a light, thin and portable type of computer": the former contains a smaller amount of information and has weaker semantic richness, while the latter contains a larger amount of information and has stronger semantic richness;
the negative sample is obtained by confusing the text sample, so that the resulting confusing text sample has low semantic-feature similarity with the text sample;
and the positive sample is obtained by performing operations such as cropping and mask covering on the image sample, so that the image content of the disturbance image sample has high semantic-feature similarity with the image content of the image sample.
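A sketch of how a disturbance image sample might be produced from an image sample by the cropping and mask-covering operations the glossary mentions. The crop ratio, mask size, and function names are illustrative assumptions, not details disclosed by the patent.

```python
import numpy as np

def make_disturbance_sample(image, rng, crop_ratio=0.8, mask_size=8):
    """Illustrative positive-sample generation: random-crop the image
    sample, then cover a random square region with a zero mask.  The
    semantic content is largely preserved, so the result can serve as
    a positive sample for the original image."""
    h, w = image.shape[:2]
    ch, cw = int(h * crop_ratio), int(w * crop_ratio)
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    cropped = image[top:top + ch, left:left + cw].copy()
    # Mask coverage: zero out a random mask_size x mask_size square.
    mt = int(rng.integers(0, ch - mask_size + 1))
    ml = int(rng.integers(0, cw - mask_size + 1))
    cropped[mt:mt + mask_size, ml:ml + mask_size] = 0
    return cropped

rng = np.random.default_rng(42)
sample = make_disturbance_sample(np.ones((32, 32)), rng)
print(sample.shape)  # cropped to 80% of each side: (25, 25)
```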
Example 1
According to an embodiment of the present application, an information matching method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order different from that herein.
The method embodiment provided in the first embodiment of the present application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 is a block diagram of the hardware structure of a computer terminal (or mobile device) for implementing an information matching method according to an embodiment of the present application. As shown in fig. 1, the computer terminal 10 (or mobile device) may include one or more processors 102 (shown as 102a, 102b, … , 102n), which may include, but are not limited to, a microcontroller (Micro Controller Unit, MCU), a programmable logic device (Field Programmable Gate Array, FPGA), or the like, a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, the computer terminal may further include: a display, an input/output (I/O) interface, a universal serial bus (Universal Serial Bus, USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 1 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration from that shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuits described above may be referred to generally herein as "data processing circuits". The data processing circuit may be embodied, in whole or in part, in software, hardware, firmware, or any other combination. Furthermore, the data processing circuit may be a single stand-alone processing module, or incorporated, in whole or in part, into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the present application, the data processing circuit acts as a kind of processor control (e.g., the selection of a variable-resistance termination path connected to an interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the information matching method in the embodiments of the present application, and the processor 102 executes the software programs and modules stored in the memory 104, thereby performing various functional applications and data processing, that is, implementing the information matching method described above. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. The specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
The display may be, for example, a touch screen type liquid crystal display (Liquid Crystal Display, LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
The hardware block diagram shown in fig. 1 may be used not only as an exemplary block diagram of the computer terminal 10 (or mobile device) described above, but also as an exemplary block diagram of the server described above. In an alternative embodiment, fig. 2 shows, in block diagram form, one embodiment in which the computer terminal 10 (or mobile device) shown in fig. 1 is used as a computing node in a computing environment 201. As shown in fig. 2, the computing environment 201 includes a plurality of computing nodes (e.g., servers) running on a distributed network (shown as 210-1, 210-2, …). The computing nodes each contain local processing and memory resources, and an end user 202 may run applications or store data remotely in the computing environment 201. An application may be provided as a plurality of services 220-1, 220-2, 220-3, and 220-4 in the computing environment 201, representing services "A", "D", "E", and "H", respectively.
The end user 202 may provide and access services through a web browser or other software application on a client. In some embodiments, the provisioning and/or requests of the end user 202 may be provided to an ingress gateway 230. The ingress gateway 230 may include a corresponding agent to handle the provisioning and/or requests for services (one or more services provided in the computing environment 201).
Services are provided or deployed in accordance with various virtualization techniques supported by the computing environment 201. In some embodiments, services may be provided according to virtual machine (Virtual Machine, VM) based virtualization, container-based virtualization, and/or the like. With virtual machine-based virtualization, a real computer is emulated by initializing a virtual machine, which executes programs and applications without directly touching any real hardware resources. While the virtual machine virtualizes the machine, with container-based virtualization a container may be started to virtualize the entire operating system (OS), so that multiple workloads may run on a single operating-system instance.
In one embodiment based on container virtualization, several containers of a service may be assembled into one workload container group (Pod) (e.g., a Kubernetes Pod). For example, as shown in fig. 2, the service 220-2 may be equipped with one or more Pods 240-1, 240-2, …, 240-N (collectively referred to as Pods). Each Pod may include an agent 245 and one or more containers 242-1, 242-2, …, 242-M (collectively referred to as containers). One or more containers in the Pod handle requests related to one or more corresponding functions of the service, and the agent 245 generally controls network functions related to the service, such as routing and load balancing. Other services may also be equipped with Pods similar to these.
In operation, executing a user request from end user 202 may require invoking one or more services in computing environment 201, and executing one or more functions of one service may require invoking one or more functions of another service. As shown in FIG. 2, service "A"220-1 receives a user request of end user 202 from ingress gateway 230, service "A"220-1 may invoke service "D"220-2, and service "D"220-2 may request service "E"220-3 to perform one or more functions.
The computing environment may be a cloud computing environment, in which the allocation of resources is managed by a cloud service provider, allowing functions to be developed without considering the implementation, adjustment, or expansion of servers. The computing environment allows developers to execute code that responds to events without building or maintaining a complex infrastructure. Instead of expanding a single hardware device to handle the potential load, the service may be partitioned into a set of functions that can be scaled automatically and independently.
In another alternative embodiment, fig. 3 shows, in block diagram form, one embodiment in which the computer terminal 10 (or mobile device) shown in fig. 1 is used as a service grid. As shown in fig. 3, the service grid 300 is primarily used to facilitate secure and reliable communication between a plurality of micro-services, i.e., applications that are broken down into a plurality of smaller services or instances that run on different clusters/machines.
As shown in fig. 3, the micro-services may include an application service instance a and an application service instance B, which form a functional application layer of the service grid 300. In one embodiment, application service instance A runs in the form of container/process 308 on machine/workload container set 314 (Pod) and application service instance B runs in the form of container/process 310 on machine/workload container set 316 (Pod).
In one embodiment, the application service instance a may be an information search service based on the information to be matched, and the application service instance B may be a risk prevention service based on the searched target image.
As shown in fig. 3, application service instance A and grid agent (sidecar) 303 coexist in machine/workload container set 314, and application service instance B and grid agent 305 coexist in machine/workload container set 316. Grid agent 303 and grid agent 305 form the data plane layer (data plane) of service grid 300. Grid agent 303 and grid agent 305 run in the form of containers/processes 304 and 306, respectively, and may receive requests 312 for goods inquiry services. Bi-directional communication is possible between grid agent 303 and application service instance A, and between grid agent 305 and application service instance B. In addition, bi-directional communication is also possible between grid agent 303 and grid agent 305.
In one embodiment, the network traffic of application service instance A is routed through grid agent 303 to the appropriate destination, and the network traffic of application service instance B is routed through grid agent 305 to the appropriate destination. It should be noted that the network traffic mentioned herein includes, but is not limited to, Hypertext Transfer Protocol (HTTP) traffic, Representational State Transfer (REST) traffic, google Remote Procedure Call (gRPC, a high-performance, general-purpose open-source framework) traffic, traffic of open-source in-memory data structure stores such as the Remote Dictionary Server (Redis), and the like.
In one embodiment, the functionality of the data plane layer may be extended by writing custom filters for the agents (envoys) in the service grid 300; these filters may be configured so that the service grid correctly proxies service traffic for service interworking and service governance. Grid agent 303 and grid agent 305 may be configured to perform at least one of the following functions: service discovery, health checking, routing, load balancing, authentication and authorization, and observability.
As shown in fig. 3, the service grid 300 also includes a control plane layer. The control plane layer may be a set of services running in a dedicated namespace, hosted by hosting control plane component 301 in machine/workload container set (machine/Pod) 302. As shown in fig. 3, managed control plane component 301 is in bi-directional communication with grid agent 303 and grid agent 305. Managed control plane component 301 is configured to perform control management functions. For example, managed control plane component 301 receives telemetry data transmitted by grid agent 303 and grid agent 305, and may further aggregate that data. Managed control plane component 301 may also provide user-facing application programming interfaces (APIs) to make it easier to manipulate network behavior, provide configuration data to grid agents 303 and 305, and the like.
In the above-described operating environment, the present application provides an information matching method as shown in fig. 4. Fig. 4 is a flowchart of an information matching method according to embodiment 1 of the present application, and as shown in fig. 4, the method includes the steps of:
in step S401, information to be matched is monitored, where the information to be matched includes text information to be matched and/or image information to be matched.
In the technical solution provided in the above step S401 of the present application, information to be matched that is triggered and input by a user may be received. The information to be matched may include text information to be matched and/or image information to be matched, and is used to search a database for a target image whose image semantic features have a high matching degree with the semantic features of the information to be matched.
For example, the user may input information to be matched into a search engine. The information to be matched may be text information to be matched, for example, the text "a lovely dog", or image information to be matched, for example, a picture of a dog; that is, the text information to be matched and the image information to be matched may be related to each other and describe the same object. Based on the information to be matched input by the user, the target image matching the information to be matched is queried in the database. It should be noted that the user may input the text information to be matched and the image information to be matched into the search engine at the same time, or may input only the text information to be matched, or only the image information to be matched.
For another example, the user may search a web page for a commodity picture; that is, the information to be matched may be a commodity picture, and the target image corresponding to the commodity picture may be searched for in the database based on the commodity picture input by the user.
In step S402, a semantic recognition model is invoked, where the semantic recognition model is obtained by training with a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and the text sample is used to describe the image content of the image sample.
In the technical solution provided in step S402, after the information to be matched is detected, a semantic recognition model may be invoked to extract semantic features from the information to be matched. The semantic recognition model is obtained through contrastive learning, taking a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, where the text sample used for training corresponds to the image sample and is used to describe the image content of the image sample.
In this embodiment, multiple sets of image-text pairs may be used when training the semantic recognition model, where each set includes a text sample and an image sample, and the text sample describes the image content of the image sample; that is, the text sample and the image sample express the same semantic features. During training, the text sample and the image sample in each image-text pair may be input into the semantic recognition model simultaneously, so as to train the model's ability to recognize the similarity between the text sample and the image sample in each pair. To strengthen this similarity recognition ability, the text sample in each image-text pair may be shuffled at multiple granularities (Multi-Level) to obtain a confusing text sample, and the image sample may be slightly perturbed to obtain a disturbance image sample; the confusing text sample is then used as a negative sample for model training, the disturbance image sample as a positive sample, and the semantic recognition model is trained in combination with contrastive learning.
For example, the text sample in each image-text pair may be shuffled to different degrees at the character level, the word level, and the knowledge level to obtain a confusing text sample. Because the confusing text sample is obtained by shuffling the text sample, the semantics it expresses may differ from those of the text sample. Based on this, when the confusing text sample is used as a negative sample for model training, the similarity distance between the text sample and the image sample in the pair can be shortened, while the similarity distance between the confusing text sample and the image sample is pulled apart via a cross-entropy loss and a Softmax activation function (which normalizes a numerical vector into a probability distribution vector whose entries sum to 1), thereby forming fine-grained supervision on the image sample. In addition, the image sample in each pair may be slightly perturbed to strengthen its anti-interference capability. For example, a visual self-supervised language-image pre-training method (Self-supervision meets Language-Image Pre-training, abbreviated as SLIP) may apply small perturbations such as random cropping to the image sample to obtain a disturbance image sample; alternatively, a fast language-image pre-training method (Fast Language-Image Pre-training, abbreviated as FLIP) may add partial masks to the image sample so as to mask part of the image content, obtaining a masked image. In that case the full image need not pass through the encoder; only the unmasked parts do, which accelerates training. That is, masking image samples with FLIP makes disturbance image samples fast to obtain, thereby accelerating the model training process.
After the disturbance image sample is obtained, the semantic recognition model may be trained with the disturbance image sample as a positive sample. When recognizing the image sample and the disturbance image sample, the semantic recognition model can perform similarity supervision by comparing their features. Since the substantial content presented by the disturbance image sample is unchanged relative to the image sample, their features should have a high similarity; that is, the image semantic features expressed by the disturbance image sample are substantially the same as those expressed by the image sample, so the visual anti-interference capability of the semantic recognition model can be strengthened.
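The training objective described above can be sketched as a contrastive loss in which confusing text samples act as extra negatives and the disturbance image sample acts as a positive. The following is a minimal numpy sketch under assumed embedding shapes; the function names and the simple cosine-based positive term are illustrative, not the patented implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_loss(img_emb, txt_emb, confusing_txt_emb, perturbed_img_emb,
                     temperature=0.07):
    """InfoNCE-style loss: pull each image toward its paired text and its
    perturbed copy, push it away from confusing-text negatives."""
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    neg = l2_normalize(confusing_txt_emb)
    pos_img = l2_normalize(perturbed_img_emb)

    # candidates = paired texts plus shuffled (confusing) texts as negatives
    candidates = np.concatenate([txt, neg], axis=0)           # (2N, d)
    logits = img @ candidates.T / temperature                 # (N, 2N)
    labels = np.arange(len(img))                              # paired text is positive
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    text_loss = -log_probs[labels, labels].mean()

    # perturbed image should stay close to the original image (positive pair)
    image_loss = (1.0 - (img * pos_img).sum(axis=1)).mean()
    return text_loss + image_loss
```

In use, a training step would compute this loss over a batch of image-text pairs and back-propagate; here numpy only illustrates the shape of the objective.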
Step S403, extracting semantic features from the information to be matched by using a feature extraction model corresponding to the information to be matched in the semantic recognition model.
In the technical solution provided in the above step S403 of the present application, the semantic recognition model includes a feature extraction model, which is used to extract semantic features from the information to be matched. Based on this, after the semantic recognition model is invoked, the feature extraction model corresponding to the information to be matched in the semantic recognition model may be used to extract the semantic features from the information to be matched.
In this embodiment, the feature extraction model may perform semantic understanding on the information to be matched and extract its semantic features accordingly. The feature extraction model may be a picture encoder or a text encoder: if the information to be matched is text information to be matched, the text encoder may be used to extract semantic features from the text information; if it is image information to be matched, the picture encoder may be used to extract semantic features from the image information.
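The encoder dispatch described in this paragraph can be sketched as follows; `text_encoder` and `image_encoder` are hypothetical stand-ins for the text encoder and picture encoder of the semantic recognition model, and fusing both modalities by averaging is an assumption of this sketch.

```python
import numpy as np

def extract_semantic_features(to_match, text_encoder, image_encoder):
    """Dispatch the information to be matched to the encoder of its modality."""
    feats = []
    if "text" in to_match:
        feats.append(np.asarray(text_encoder(to_match["text"])))
    if "image" in to_match:
        feats.append(np.asarray(image_encoder(to_match["image"])))
    if not feats:
        raise ValueError("information to be matched must contain text and/or image")
    # when both modalities are supplied, fuse by simple averaging (an assumption)
    return np.mean(feats, axis=0)
```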
Step S404, determining at least one target image matched with the semantic features in a database, wherein the database is used for storing images matched with different semantic features.
In the technical solution provided in step S404 of the present application, after extracting the semantic features from the information to be matched, at least one target image matched with the semantic features may be determined from a database based on the semantic features, where the database stores the images matched with different semantic features in advance.
For example, a plurality of images are stored in the database, each corresponding to a semantic feature. If the semantic feature extracted from the information to be matched is "a dog", a semantic feature matching "a dog" can be searched for in the database based on that feature, and the image corresponding to the found semantic feature is determined as the target image matching the semantic feature. Since the database may contain one or more images related to a dog, more than one image may be found based on the semantic feature, and the found images may serve as the at least one target image matching the semantic feature.
Based on the schemes disclosed in steps S401 to S404 of the foregoing embodiments, after the information to be matched is monitored, a semantic recognition model is invoked, and the feature extraction model corresponding to the information to be matched in the semantic recognition model is used to extract semantic features from it, where the information to be matched may include text information to be matched and/or image information to be matched. After the semantic features are extracted, at least one target image matching them may be determined in a database. The semantic recognition model is trained with a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, the text sample describing the image content of the image sample. The confusing text forces the model to learn finer-grained semantics, so the semantic recognition model understands long text better and distinguishes highly similar text information more reliably, and the semantic features it extracts from the information to be matched are more accurate. Searching the database based on these features yields target images with a higher matching degree to the information to be matched, thereby achieving the technical effect of improving the accuracy of information searching and solving the technical problem of low information searching accuracy.
The above-described method of this embodiment is further described below.
As an alternative embodiment, the method further comprises: and carrying out disorder processing on the text samples according to different text information amounts to obtain confused text samples, wherein the text information amounts are used for at least determining the semantics of the text samples, and the semantics of the confused text samples are different from the semantics of the text samples.
In this embodiment, multiple sets of image-text pairs may be used when training the semantic recognition model, each set including a text sample and an image sample. To enhance the model's understanding of the semantics of the text sample, the text sample in each image-text pair may be shuffled according to different text information amounts to obtain a confusing text sample, where the text information amount is used at least to determine the semantics of the text sample, and the semantics of the confusing text sample differ from those of the text sample.
For example, assuming the text sample is "an astronaut riding a horse", the confusing text sample obtained after shuffling may be "a horse riding an astronaut", so that the shuffled confusing text sample expresses semantics different from those of the text sample. In actual operation, one text sample may correspond to a plurality of shuffled confusing text samples.
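Word-level shuffling as in the example above can be sketched as follows; the function name and the choice to shuffle only at word granularity are illustrative (character-level and knowledge-level shuffles would follow the same pattern).

```python
import random

def make_confusing_samples(text, k=3, seed=None):
    """Build k word-level confusing (shuffled) negatives for one caption."""
    rng = random.Random(seed)
    words = text.split()
    out = set()
    while len(out) < k:
        shuffled = words[:]
        rng.shuffle(shuffled)
        candidate = " ".join(shuffled)
        if candidate != text:  # the shuffle must change the expressed semantics
            out.add(candidate)
    return sorted(out)
```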
As an alternative embodiment, the method further comprises: acquiring a first semantic similarity between the semantics of the confusing text sample as the negative sample and the semantics of the image sample, wherein the first semantic similarity is smaller than a first semantic similarity threshold; acquiring a second semantic similarity between the semantics of the disturbance image sample as the positive sample and the semantics of the image sample, wherein the second semantic similarity is greater than a second semantic similarity threshold; and training to obtain the semantic recognition model based on the first semantic similarity and the second semantic similarity.
In this embodiment, taking an arbitrary set of image-text pair samples as an example, after the confusing text sample corresponding to the text sample in the pair is obtained, a first semantic similarity between the semantics of the confusing text sample as the negative sample and the semantics of the image sample may be acquired, where the first semantic similarity represents the degree of similarity between the semantics of the confusing text sample and the semantics of the image sample and is smaller than a first semantic similarity threshold. In addition, after the disturbance image sample corresponding to the image sample in the pair is obtained, a second semantic similarity between the semantics of the disturbance image sample as the positive sample and the semantics of the image sample is acquired, where the second semantic similarity represents the degree of similarity between them and is greater than a second semantic similarity threshold. The semantic recognition model is then trained based on the first semantic similarity and the second semantic similarity. The first semantic similarity threshold and the second semantic similarity threshold may be preset, which is not limited here.
For example, as described above, the text sample in a set of image-text pairs describes the image content of the image sample; that is, the semantics of the text sample are the same as those of the corresponding image sample. After the text sample is shuffled, the semantics of the resulting confusing text sample differ from those of the text sample. Based on this, after the confusing text sample is obtained, the first semantic similarity between its semantics and those of the image sample can be acquired; this similarity represents both how close the confusing text sample is to the image sample and how close it is to the text sample. Therefore, when the confusing text sample is used as a negative sample for training, it is ensured that the first semantic similarity between the confusing text sample and the image sample is smaller than the first semantic similarity threshold, so that fine-grained supervision is formed on the image sample.
In addition, as described above, the disturbance image sample is obtained by slightly perturbing (e.g., cropping) the image sample, so its semantics are substantially the same as those of the image sample. Based on this, the second semantic similarity between the semantics of the disturbance image sample as the positive sample and the semantics of the image sample can be acquired, which represents the degree of similarity between the two. Since their semantics are substantially the same, when the disturbance image sample is used as a positive sample to train the semantic recognition model, it is ensured that the second semantic similarity between the disturbance image sample and the image sample is greater than the second semantic similarity threshold, so as to improve the visual anti-interference capability of the semantic recognition model.
Optionally, if the first semantic similarity is not less than the first semantic similarity threshold, or the second semantic similarity is not greater than the second semantic similarity threshold, training the semantic recognition model is continued until the first semantic similarity is less than the first semantic similarity threshold and the second semantic similarity is greater than the second semantic similarity threshold.
Optionally, when the confusing text sample is used as a negative sample to train the semantic recognition model, it acts as a negative example in the contrastive learning process, so that while the model's ability to distinguish similar texts is trained, the similarity recognized by the model between the confusing text sample and the text sample is continuously pushed toward 0%, ensuring a strong ability to distinguish highly similar texts. In addition, when the disturbance image sample is used as a positive sample, it acts as a positive example in the contrastive learning process; that is, supervised training is formed between the disturbance image sample and the image sample, so that the similarity recognized by the model between the two approaches 100%, ensuring the model's recognition accuracy for the image sample.
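The stopping rule above — continue training until the first similarity falls below its threshold and the second rises above its threshold — can be sketched as follows, with `model_step` and `similarities` as hypothetical callbacks supplied by the training loop.

```python
def train_until_thresholds(model_step, similarities, t1=0.25, t2=0.75,
                           max_steps=1000):
    """Train until confusing-text similarity < t1 AND perturbed-image
    similarity > t2, or until max_steps is exhausted."""
    for step in range(max_steps):
        s_neg, s_pos = similarities()
        if s_neg < t1 and s_pos > t2:
            return step  # thresholds satisfied after `step` update steps
        model_step()
    return max_steps
```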
As an optional implementation manner, the information to be matched includes at least text information to be matched, and the method further includes: acquiring at least one piece of expanded text information of the text information; and selecting target text information from the text information and the at least one piece of expanded text information. Step S403, extracting semantic features from the information to be matched using the feature extraction model corresponding to the information to be matched in the semantic recognition model, includes: extracting text semantic features from the target text information using a text feature extraction model corresponding to the text information.
In this embodiment, the information to be matched includes at least text information to be matched. Based on this, the text information to be matched can be expanded using an administrative knowledge graph module, which includes an administrative knowledge base storing a huge amount of knowledge; the module can thus expand the text information to be matched to obtain at least one piece of expanded text information. The text information to be matched and the at least one piece of expanded text information can then be displayed to the user, and when the user selects a certain piece of text information, it can be used as the target text information. Further, the text feature extraction model corresponding to the text information in the semantic recognition model can be used to extract text semantic features from the target text information.
For example, assuming the text information to be matched is "notebook computer", the at least one piece of expanded text information obtained by expanding it through the administrative knowledge graph module may include text content strongly related to "notebook computer", such as "notebook computer, a light and thin portable type of computer" and "host computer"; the user may then select one of "notebook computer", "notebook computer, a light and thin portable type of computer", "host computer", and the like as the target text information.
As an alternative embodiment, obtaining at least one piece of expanded text information of the text information includes: acquiring a plurality of keywords of the text information; determining a risk index of each keyword, wherein the risk index is used to represent the risk degree of the keyword; and converting the keywords whose risk index is higher than a risk index threshold into expanded text information.
In this embodiment, when at least one piece of expanded text information of the text information is acquired, the text information may be split into a plurality of keywords using named entity recognition (Named Entity Recognition, abbreviated as NER) in the administrative knowledge graph recommendation module, and a risk keyword list is constructed. Each keyword in the risk keyword list is then risk-scored using the real-time risk prediction function of the administrative knowledge graph recommendation module, where the risk score indicates the risk index of the corresponding keyword: the higher the score, the higher the risk index, which represents the risk degree of the keyword. After the risk indexes of the keywords in the risk keyword list are determined, the keywords whose risk index is higher than the risk index threshold can be expanded to obtain the expanded text information.
For example, after each keyword in the text information is risk-scored, the risk score may be compared with the risk index threshold, so that keywords with scores above the threshold, i.e., the higher-risk keywords, are screened out and expanded to obtain the expanded text information. When the screened keywords are expanded, a double knowledge expansion can be performed using the wiki knowledge base and the administrative knowledge graph: the wiki knowledge base focuses more on knowledge expansion, while the administrative knowledge graph focuses more on administrative risk-control expansion. The expanded text information produced by the two methods can be analyzed and spliced so as to display more accurate expanded text information to the user.
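The screening-and-expansion step above can be sketched as follows; the risk scorer and the two expansion tables are hypothetical stand-ins for the real-time risk prediction function, the administrative knowledge graph, and the wiki knowledge base.

```python
def expand_risky_keywords(keywords, risk_score, kg_expansions, kb_expansions,
                          threshold=0.5):
    """Expand only keywords whose risk score clears the threshold, merging
    knowledge-graph and knowledge-base expansions."""
    expanded = []
    for kw in keywords:
        if risk_score(kw) > threshold:
            expanded.extend(kg_expansions.get(kw, []))  # administrative risk-control side
            expanded.extend(kb_expansions.get(kw, []))  # general-knowledge side
    seen = set()  # splice the two sources, de-duplicated, order preserved
    return [e for e in expanded if not (e in seen or seen.add(e))]
```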
As an alternative embodiment, obtaining a plurality of keywords of the text information includes: in the knowledge graph, segmenting the text information into a plurality of keywords based on the weights of the entities of the text information, wherein the knowledge graph is used to represent the association relationships among entities.
In this embodiment, the knowledge graph is composed of a plurality of entities and the association relationships between them, where the entities denote things in the real world, such as names of persons, names of places, companies, telephones, animals, and the like. Each entity corresponds to a weight representing its risk degree: an entity with a high risk degree has a larger weight, and one with a low risk degree a smaller weight. Based on this, the entities contained in the text information can be determined, the weight of each such entity can be looked up in the knowledge graph, and the text information can then be segmented into a plurality of keywords, each representing one entity; the association relationships between the entities represented by the keywords can further be searched for in the knowledge graph.
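One way to sketch entity-based segmentation is a greedy longest-match over a weighted entity vocabulary; this is a toy stand-in for the knowledge-graph-driven segmentation described above, and here the weights only tag each entity's risk degree while the longest match drives the split.

```python
def segment_by_entities(text, entity_weights):
    """Greedy longest-match segmentation against a weighted entity vocabulary."""
    keywords, i = [], 0
    while i < len(text):
        match = None
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in entity_weights:
                match = text[i:j]
                break
        if match:
            keywords.append(match)
            i += len(match)
        else:
            i += 1  # character not covered by any entity
    return keywords
```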
As an alternative embodiment, the method further comprises: and adjusting the weight of the entity of the text information in the knowledge graph based on the selection times of the target text information.
In this embodiment, since the text information input by the user can be expanded to obtain at least one piece of expanded text information, the user can select the target text information from the text information and the at least one piece of expanded text information. Based on this, the weight of the entity of the text information in the knowledge graph can be adjusted according to the number of times the target text information is selected.
For example, assume the text information input by the user is "Crayon Shin-chan". The expanded text information obtained after expansion includes "Crayon Shin-chan, a Japanese animation" and "Crayon Shin-chan, a 4-year-old child", and the expanded text information obtained by association includes "Ultraman" and "Action Superman". The expanded text information includes one or more entities, and each entity corresponds to attribute information; for example, the entity "Ultraman" corresponds to the attribute information "film and television works". Based on this, the weight of the attribute information can be adjusted according to the number of times the user selects the attribute information corresponding to the entity in each piece of text information, and the weight of the entity is then adjusted based on the weight of the attribute information. For example, if the user selects "film and television works" many times, the weight of "film and television works" may be increased, and since the entity corresponding to "film and television works" is "Ultraman", the weight of the entity "Ultraman" in the knowledge graph may also be increased.
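The weight-adjustment rule above can be sketched as a simple linear update; the entity-to-attribute mapping, the selection counts, and the learning rate `lr` are all hypothetical.

```python
def update_entity_weights(entity_weights, entity_attribute, selection_counts,
                          lr=0.1):
    """Raise an entity's weight in proportion to how often users selected
    its attribute information (a hypothetical linear update rule)."""
    for entity, attribute in entity_attribute.items():
        count = selection_counts.get(attribute, 0)
        if count:
            entity_weights[entity] = entity_weights.get(entity, 0.0) + lr * count
    return entity_weights
```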
As an alternative embodiment, converting a keyword whose risk index is higher than the risk index threshold among the plurality of keywords into expanded text information includes: converting the keyword into expanded text information based on a knowledge graph, wherein the knowledge graph is used to represent the association relationships among a plurality of entities; and/or converting the keyword into expanded text information based on a knowledge base, wherein the knowledge base is used to store knowledge information associated with the keyword.
In this embodiment, when the text information is expanded to obtain the expanded text information, the risk indexes of the plurality of keywords in the text information may first be determined; keywords whose risk index is higher than the risk index threshold are then screened out, and these higher-risk keywords are expanded to obtain the expanded text information. During expansion, the keywords may be converted into expanded text information based on a knowledge graph and/or a knowledge base, where the knowledge graph represents the association relationships among a plurality of entities and the knowledge base stores knowledge information associated with the keywords.
For example, the knowledge graph may be an administrative knowledge graph, and the knowledge base may be a wiki knowledge base. When keywords whose risk index is higher than the risk index threshold are expanded based on the knowledge graph, the expansion mainly concerns administrative governance and risk control; when they are expanded based on the knowledge base, the expansion mainly concerns general knowledge, yielding the expanded text information.
As an alternative embodiment, determining, in the database, a target image that matches the semantic feature includes: in the database, a target image with similarity to the text semantic feature greater than a third similarity threshold is determined.
In this embodiment, as can be seen from the foregoing description, the text semantic features can be extracted from the target text information using the feature extraction model corresponding to the text information in the semantic recognition model. After extracting the text semantic features, searching for an image with similarity to the semantic features greater than a third similarity threshold in the database, and determining the searched image as a target image matched with the semantic features, wherein the third similarity threshold can be preset.
For example, the database may be a picture feature library in which a large number of images are stored, each image corresponding to a semantic feature. On this basis, after text semantic features are extracted from the target text information, the picture feature library may be searched: the similarity between the text semantic features and the semantic features of each image in the picture feature library is calculated, and the image corresponding to the highest similarity is determined as the target image retrieved based on the text semantic features, where the highest similarity is greater than the third similarity threshold.
Optionally, when searching the picture feature library for the target image that best matches the text semantic features, the Euclidean distance between the text semantic features and the semantic features of each picture in the picture feature library may be calculated, and the target image is determined based on the Euclidean distance. The shorter the Euclidean distance, the higher the similarity between the text semantic features and the semantic features of the picture; therefore, the image whose semantic features have the shortest Euclidean distance to the text semantic features may be determined as the target image.
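The threshold-gated nearest-neighbor lookup described above can be sketched as follows. This is an assumed illustration using NumPy; in particular, the conversion from Euclidean distance to a similarity score is one possible choice, not a formula given in the text:

```python
import numpy as np

def nearest_image(query_feat, library_feats, threshold):
    """Return the index of the library image whose semantic feature has
    the smallest Euclidean distance to the query feature, or None if
    even the best match does not exceed the similarity threshold."""
    dists = np.linalg.norm(library_feats - query_feat, axis=1)
    best = int(np.argmin(dists))
    # One possible distance-to-similarity mapping:
    # smaller distance -> similarity closer to 1
    similarity = 1.0 / (1.0 + dists[best])
    return best if similarity > threshold else None
```

A query close to a stored feature returns that image's index; a query far from everything returns `None`, mirroring the third-similarity-threshold check.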
As an optional implementation manner, the information to be matched includes text information to be matched and image information to be matched, and extracting semantic features from the information to be matched by using the feature extraction model corresponding to the information to be matched in the semantic recognition model includes: extracting text semantic features from the text information by using a text feature extraction model corresponding to the text information, and extracting image semantic features from the image information by using an image feature extraction model corresponding to the image information. Determining, in the database, a target image that matches the semantic features includes: fusing the text semantic features and the image semantic features to obtain a fusion result; and determining, in the database, a target image whose similarity to the fusion result is greater than the third similarity threshold.
In this embodiment, the information to be matched may include both text information to be matched and image information to be matched. When the feature extraction model corresponding to the information to be matched in the semantic recognition model is used to extract semantic features, text semantic features may first be extracted from the text information by using the text feature extraction model corresponding to the text information, and image semantic features may be extracted from the image information by using the image feature extraction model corresponding to the image information. The extracted text semantic features and image semantic features are then fused to obtain a fusion result. Based on the fusion result, an image whose similarity to the fusion result is greater than the third similarity threshold may be searched for in the database, and the retrieved image is determined as the target image.
For example, the text feature extraction model corresponding to the text information to be matched in the semantic recognition model may be a text encoder, which recognizes the text semantic features that the text information intends to express; the image feature extraction model corresponding to the image information to be matched may be a picture encoder, which recognizes the image semantic features that the image information intends to express. After the text semantic features and the image semantic features are obtained, they may be fused by average pooling to obtain a fusion result, which itself corresponds to a semantic feature. Based on the fusion result, a target image whose similarity to the fusion result is greater than the third similarity threshold may be searched for in the picture feature library according to the method described above.
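The average-pooling fusion of text and image semantic features can be sketched as below. This is a minimal assumed implementation; the additional L2 normalization is an illustrative choice and is not specified in the text:

```python
import numpy as np

def fuse_features(text_feat, image_feat):
    """Average-pool the text and image semantic features into a single
    query vector, then L2-normalise it (assumed choice) so the fusion
    result lives on the same scale as the library features."""
    fused = (np.asarray(text_feat, dtype=float) +
             np.asarray(image_feat, dtype=float)) / 2.0
    return fused / np.linalg.norm(fused)
```

The fusion result can then be used as the query vector for the Euclidean-distance search over the picture feature library.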
As an optional implementation manner, the information to be matched includes image information to be matched, and the extracting of semantic features from the information to be matched by using a feature extraction model corresponding to the information to be matched in the semantic recognition model includes: extracting image semantic features from the image information by using an image feature extraction model corresponding to the image information; determining, in a database, a target image that matches the semantic feature, comprising: in the database, a target image with similarity to the image semantic features greater than a third similarity threshold is determined.
In this embodiment, when the information to be matched includes image information to be matched and a target image matching the image information is queried from the database, the image feature extraction model corresponding to the image information in the semantic recognition model may first be used to extract image semantic features from the image information to be matched. An image whose similarity to the image semantic features is greater than the third similarity threshold is then searched for in the database, and the retrieved image is determined as the target image.
For example, the similarity between the semantic features of the image and the semantic features of each image in the database may be calculated, and then the calculated similarity is compared with a third similarity threshold, and an image with the similarity greater than the third similarity threshold is determined as the target image.
Alternatively, the target image may be determined based on the Euclidean distance. For example, the Euclidean distance between the image semantic features and the semantic features of each image in the database may be calculated, and the image in the database whose semantic features have the minimum Euclidean distance to the image semantic features may be determined as the target image.
It should be noted that, the above steps may be performed by an information search platform or an information search device that deploys a semantic recognition model and a political knowledge graph.
In the above steps, after the information to be matched is monitored, the semantic recognition model is called, and the feature extraction model corresponding to the information to be matched in the semantic recognition model is used to extract semantic features from the information to be matched, where the information to be matched may include text information to be matched and/or image information to be matched. After the semantic features are extracted, at least one target image matching the semantic features may be determined in the database. The semantic recognition model is trained with the confused text sample of a text sample as a negative sample and the disturbance image sample of an image sample as a positive sample, where the text sample is used to describe the image content of the image sample. The confused text forces the model to learn finer-grained semantics, so the semantic recognition model has a better understanding of long text and a better ability to distinguish highly similar text information. The semantic features extracted from the information to be matched by the semantic recognition model are therefore more accurate, and searching the database based on these features yields a target image with a higher degree of matching to the information to be matched, achieving the technical effect of improving the accuracy of information searching and thus solving the technical problem of low information search accuracy.
The application also provides an information matching method as shown in fig. 5. Fig. 5 is a flowchart of an information matching method according to an embodiment of the present application. As shown in fig. 5, the method may include the steps of:
in step S501, the information to be matched is displayed on the operation interface, where the information to be matched includes text information to be matched and/or image information to be matched.
In the technical solution provided in step S501 of the present application, an operation interface is provided on the terminal device, and the operation interface includes a search information input box in which a user may input information to be matched. When the device responds to the user's input operation, the information to be matched may be displayed on the operation interface, where the information to be matched includes text information to be matched and/or image information to be matched, and a target image matching the information to be matched may be searched for in the database based on the information to be matched.
It should be noted that, the device performing this step may be an information search device that deploys a semantic recognition model and a political knowledge graph.
Step S502, in response to the matching operation instruction acting on the operation interface, displaying at least one target image matched with the semantic features of the information to be matched on the operation interface.
In the technical solution provided in step S502, the operation interface includes a search control. After the user inputs the information to be matched on the operation interface, the user may click the search control, and the device then responds to the selection operation on the search control, that is, to a matching operation instruction acting on the operation interface, where the matching operation instruction is used to instruct searching the database for a target image matching the information to be matched. After responding to the matching operation instruction, at least one target image matching the information to be matched may be queried in the database based on the instruction. The at least one target image is retrieved from the database based on semantic features extracted from the information to be matched by the feature extraction model corresponding to the information to be matched in the semantic recognition model, where the semantic recognition model is trained with the confused text sample of a text sample as a negative sample and the disturbance image sample of an image sample as a positive sample, and the text sample is used to describe the image content of the image sample.
In this embodiment, when a target image matched with information to be matched is queried in a database based on a matching operation instruction, a semantic feature may be extracted from the information to be matched based on a feature extraction model corresponding to the information to be matched in a semantic recognition model, and then at least one target image matched with the semantic feature may be searched in the database based on the semantic feature.
For example, the database may be a picture feature library, a large number of images are stored in the picture feature library, each image corresponds to a semantic feature, based on this, after the semantic features are extracted, the similarity between the semantic features and the semantic features of each image in the picture feature library may be calculated, and then the image with the highest similarity is determined as the target image matched with the semantic features.
Optionally, when the information to be matched at least includes text information to be matched, the method further includes: responding to an information expansion operation instruction acted on an operation interface, and displaying at least one expansion text message of the text message on the operation interface; and responding to a selection operation instruction acted on the operation interface, and displaying target text information selected from the text information and at least one piece of expanded text information on the operation interface, wherein semantic features are extracted based on a feature extraction model corresponding to the target text information in the semantic recognition model.
In this embodiment, an information expansion control may further be provided in the operation interface. On this basis, when the information to be matched includes at least text information to be matched, the user may click the information expansion control after inputting the text information, and the device then responds to the selection operation on the information expansion control, that is, to an information expansion operation instruction acting on the operation interface. After responding to the information expansion operation instruction, the text information to be matched may be expanded based on the instruction to obtain at least one piece of expanded text information, and the text information together with its at least one piece of expanded text information may then be displayed on the operation interface. The user may select satisfactory information from the text information and the expanded text information; after the device responds to the user's selection of a certain piece of text information, that is, to a selection operation instruction acting on the operation interface, the information selected by the user may be used as the target text information, and a target image may be queried in the database based on the target text information.
Based on the schemes disclosed in steps S501 and S502 in the above embodiments, when an input operation instruction acting on the operation interface is responded to, information to be matched can be input on the operation interface; a target image matching the information to be matched can then be queried in the database and displayed on the operation interface, so that the user can view the search result in time. The operation is simple and convenient, which improves the user experience and the efficiency of information searching, thereby solving the technical problem of low information search efficiency.
The application also provides an information matching method as shown in fig. 6. Fig. 6 is a flow chart of a method of information matching according to an embodiment of the present application. As shown in fig. 6, the method may include the steps of:
in step S601, risk information to be matched from the information matching platform is monitored, where the risk information to be matched includes risk text information to be matched and/or risk image information to be matched.
In the technical scheme provided in the above step S601, the device may continuously monitor the information input on the information matching platform. On this basis, when a user inputs risk information to be matched on the information matching platform, the device may monitor the risk information to be matched from the platform and then search the database for a target image matching the risk information, where the risk information to be matched includes risk text information to be matched and/or risk image information to be matched.
Step S602, invoking a semantic recognition model.
In the technical scheme provided in the above step S602 of the present application, after risk information to be matched is monitored from the information matching platform, the semantic recognition model may be invoked to extract risk semantic features from the risk information to be matched. The semantic recognition model is trained with the confused text sample of a risk text sample as a negative sample and the disturbance image sample of a risk image sample as a positive sample, where the risk text sample is used to describe the image content of the risk image sample.
Step S603, extracting risk semantic features from the risk information to be matched by using a feature extraction model corresponding to the risk information to be matched in the semantic recognition model.
In the technical solution provided in the above step S603 of the present application, when the risk semantic features in the risk information to be matched are extracted by using the semantic recognition model, the feature extraction model corresponding to the risk information to be matched in the semantic recognition model may be used to extract the risk semantic features from the risk information to be matched.
Step S604, determining at least one target image matched with the risk semantic features in a database, wherein the database is used for storing images matched with different risk semantic features.
In the technical solution provided in step S604 of the present application, the database stores images matching different risk semantic features, and each image corresponds to a risk semantic feature. On this basis, after risk semantic features are extracted from the risk information to be matched, at least one target image matching the risk semantic features can be searched for in the database, where the similarity between the risk semantic features of the target image and the risk semantic features corresponding to the risk information to be matched is relatively high.
Step S605, returning at least one target image to the information matching platform for display, where the information matching platform is configured to transmit the at least one target image to the terminal device, and the risk event corresponding to the target image is controlled by the terminal device.
In the technical scheme provided in the above step S605, after at least one target image is queried from the database, the queried at least one target image may be returned to the information matching platform for display, where the information matching platform is configured to transmit the at least one target image to the terminal device, and the risk event corresponding to the target image is prevented and controlled by the terminal device.
Based on the schemes disclosed in steps S601 to S605 in the foregoing embodiments, after risk information to be matched from the information matching platform is monitored, a semantic recognition model may be called, and a feature extraction model corresponding to the risk information to be matched in the semantic recognition model is used to extract risk semantic features from the risk information to be matched, so that at least one target image matched with the risk semantic features is searched in a database, and the target image is returned to the information matching platform for display. That is, in the embodiment of the present application, a target image may be searched in a database based on risk semantic features corresponding to risk information to be matched, so as to prevent and control a risk event corresponding to the target image, thereby implementing rapid positioning and accurate prevention and control of the risk event.
The application also provides a generation method of the semantic recognition model shown in fig. 7. Fig. 7 is a flowchart of a method of generating a semantic recognition model according to an embodiment of the present application. As shown in fig. 7, the method may include the steps of:
in step S701, a text sample and an image sample are acquired, wherein the text sample is used to describe the image content of the image sample.
In the technical solution provided in the above step S701 of the present application, the text sample and the image sample are samples for training a semantic recognition model, and multiple sets of text samples and image samples may be obtained through the internet, where the text samples are used for describing the image content of the image sample, that is, the semantic content expressed by the text samples is consistent with the semantic content expressed by the image sample.
In step S702, a confusing text sample of the text sample and a disturbed image sample of the image sample are generated.
In the technical solution provided in step S702 of the present application, after the text sample and the image sample are obtained, multi-granularity shuffling may be performed on the text sample to obtain the confused text sample of the text sample, and a small perturbation may be applied to the image sample to obtain the disturbance image sample of the image sample. For a specific implementation of generating the confused text sample and the disturbance image sample, reference may be made to the method described in step S402, which is not repeated here.
In step S703, training is performed with the confusing text sample as a negative sample and the disturbance image sample as a positive sample to obtain a semantic recognition model.
In the technical solution provided in step S703 of the present application, after the confused text sample and the disturbance image sample are generated, the confused text sample may be used as a negative sample and the disturbance image sample as a positive sample, and the model may be trained in combination with contrastive learning to obtain the semantic recognition model, where the semantic recognition model includes a feature extraction model for extracting semantic features of input text information and a feature extraction model for extracting semantic features of input image information.
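A toy version of such a training objective — the confused text as a hard negative in a softmax contrastive term, plus an invariance term pulling the disturbance image sample towards the original image — might look like the following. The exact loss used in the patent is not specified, so everything here (including the temperature value) is only an assumed sketch over pre-computed feature vectors:

```python
import numpy as np

def mlsp_loss(text, confused_text, image, perturbed_image, temperature=0.07):
    """Toy contrastive objective: the matching text/image pair is the
    positive, the confused text is a negative, and the perturbed image
    view should stay maximally similar to the original image."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Softmax cross-entropy over {true caption, confused caption},
    # with the true caption (index 0) as the target
    logits = np.array([cos(text, image), cos(confused_text, image)]) / temperature
    text_loss = -logits[0] + np.log(np.exp(logits).sum())
    # Invariance term: similarity of the perturbed view should approach 1
    invariance_loss = 1.0 - cos(image, perturbed_image)
    return text_loss + invariance_loss
```

When the true caption's feature aligns with the image and the confused caption's does not, the loss is small; swapping the two captions makes it large, which is the ordering the training signal rewards.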
Based on the schemes disclosed in steps S701 to S703 in the foregoing embodiments, after the text sample and the image sample are acquired, a confused text sample may be generated based on the text sample and a disturbance image sample may be generated based on the image sample; the confused text sample is then used as a negative sample and the disturbance image sample as a positive sample to train the semantic recognition model. The confused text forces the model to learn finer-grained semantics, so the semantic recognition model has a better understanding of long text and a better ability to distinguish highly similar text information, while the disturbance image sample enhances the model's robustness to visual perturbations. The semantic features of text or image information extracted by the semantic recognition model are therefore more accurate.
The technical solutions of the examples of the present application are further exemplified below in conjunction with the preferred embodiments.
Currently, risk events are controlled mainly through an online master control link. However, the master control link only monitors and processes risk events in real-time traffic and does not process risk events in the existing stock, so a great number of risks may be missed. In addition, the structure of the master control link is complex and costly to change, so its preventive capability is insufficient for some newly emerging risks. Moreover, the flow in which the master control link executes risk prevention and control is abstract, its understanding cost for users is high, and it is difficult to provide a what-you-see-is-what-you-get user experience. Therefore, an intelligent risk investigation system is needed to make up for the deficiencies of the master control link.
In the related art, risk information retrieval is mainly performed through a CLIP model. However, the CLIP model has a poor understanding of long text and a weak ability to distinguish similar concepts in text information, resulting in low information retrieval accuracy and poor risk prevention and control capability.
In contrast, the embodiment of the present application provides an information matching method in which the information search platform is monitored in real time for input of information to be matched. After the information to be matched is monitored, the semantic recognition model may be called, and the feature extraction model corresponding to the information to be matched in the semantic recognition model is used to extract semantic features from the information to be matched, where the information to be matched may include text information to be matched and/or image information to be matched. After the semantic features are extracted, at least one target image matching the semantic features may be searched for in the database, and after the target image is found, the risk event corresponding to the target image may be prevented and controlled. The semantic recognition model is obtained by taking the confused text sample of a text sample as a negative sample and the disturbance image sample of an image sample as a positive sample and training in combination with contrastive learning, where the text sample is used to describe the image content of the image sample. The confused text forces the model to learn finer-grained semantics, so the semantic recognition model has a better understanding of long text and a better ability to distinguish highly similar text information; the semantic features extracted from the information to be matched are therefore more accurate, and searching the database based on these features yields a target image with a higher degree of matching to the information to be matched, achieving the technical effect of improving the accuracy of information searching and thus solving the technical problem of low information search accuracy.
The semantic recognition model provided by the embodiment of the application is further described below.
In the embodiment of the present application, a semantic recognition model is provided, which may be a model trained based on a multi-scale out-of-order pre-training (MLSP) scheme; the semantic recognition model may therefore be called an MLSP model. In the embodiment of the present application, the semantic recognition model may be trained with multiple image-text pairs, where each pair includes a text sample and an image sample, and the semantic features expressed by the text sample are consistent with the image semantic features expressed by the image sample. For example, when the text information in an image-text pair is "a lovely dog", the image sample in the pair may be a picture of a lovely dog. When training the semantic recognition model based on multiple image-text pairs, multi-granularity shuffling may be performed on the text sample in each pair, that is, semantic and grammatical confusion processing, to obtain the confused text sample. For example, when the text sample is "astronaut riding a horse", the confused text sample after shuffling may be "horse riding astronaut". The confused text sample may then be used as a negative sample to train the semantic recognition model, forcing the model to learn sentences with abnormal grammar and semantics, so that the model draws the text sample and the image sample closer in similarity while pushing the confused text sample away from the image sample, thereby forming fine-grained supervision on the image sample.
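The word-level part of this shuffling can be sketched as follows. This is an assumed minimal version: it keeps the same words but scrambles their order, and it assumes the caption contains at least two distinct words so that a changed ordering exists:

```python
import random

def confuse_text(sample, seed=0):
    """Word-level shuffling: scramble the word order of a caption such as
    'astronaut riding a horse' to produce a grammatically/semantically
    confused version used as a hard negative sample."""
    words = sample.split()
    rng = random.Random(seed)
    shuffled = words[:]
    # Re-shuffle until the order actually differs from the original
    # (assumes at least two distinct words in the caption)
    while shuffled == words:
        rng.shuffle(shuffled)
    return " ".join(shuffled)
```

The confused caption contains exactly the original words in a different order, so the model can only separate it from the true caption by attending to fine-grained word order, not bag-of-words content.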
In addition, the image sample in each image-text pair may be cropped to different degrees, or masks may be added to the image sample by means of SLIP-style pre-training, so that a small perturbation is applied to the image sample to obtain the disturbance image sample. The disturbance image sample is used as a positive sample, and the semantic recognition model is trained in combination with contrastive learning: training supervision is performed by comparing the feature similarity between the image sample and the disturbance image sample, ensuring that the similarity determined by the model approaches 100%, thereby improving the model's robustness to visual perturbations.
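The small perturbation — a light border crop plus random pixel masking — can be sketched as follows. This is an assumed illustration; the actual crop sizes and mask ratios are not specified in the text:

```python
import numpy as np

def perturb_image(img, crop=2, mask_frac=0.1, seed=0):
    """Apply a small perturbation to an image array: crop a `crop`-pixel
    border (crop >= 1 assumed), then zero-mask a random fraction of the
    remaining pixels, in the spirit of SLIP/FLIP-style masking."""
    rng = np.random.default_rng(seed)
    cropped = img[crop:-crop, crop:-crop].copy()
    n = cropped.shape[0] * cropped.shape[1]
    # Pick distinct pixel positions to mask out
    idx = rng.choice(n, size=int(n * mask_frac), replace=False)
    flat = cropped.reshape(n, -1)  # one row per pixel (works for HxW or HxWxC)
    flat[idx] = 0
    return cropped
```

During training the features of `img` and `perturb_image(img)` would be compared, and the model is supervised to keep their similarity close to 100%.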
The multi-granularity shuffling of the text sample may include character-level, word-level and knowledge-level shuffling; by confusing the meaning of the text sample with text of the same semantics, the semantics to be expressed by the text sample are blurred, forcing the model to learn finer-grained semantics. In addition, to accelerate training, the FLIP pre-training method may be used to appropriately add partial masks to the image samples so as to mask part of their content, which improves generalization and shortens training time. The finally trained semantic recognition model shows a significantly improved understanding of long sentences and, compared with the CLIP model, is especially good at distinguishing sentences with similar meanings and sentences with complex logical relationships.
The information matching process in the embodiment of the present application is further described below.
The user can input information to be matched on the information search platform, where the information to be matched includes at least text information to be matched. When the text information to be matched is monitored, it may be expanded by the political knowledge graph recommendation module, which includes a political knowledge base; based on the extensive knowledge in the knowledge base, the text information to be matched can be accurately expanded and associated to obtain at least one piece of expanded text information. For example, if the information to be matched input by the user is "notebook computer", the expanded text information obtained after expansion may be precise content strongly related to "notebook computer", such as "notebook computer: a thin and light type of computer that can be carried around"; such content can help the model better understand the concept of "notebook computer", thereby achieving a more precise search. In addition, the associated result may include concepts related to "notebook computer" such as "host computer" and "workstation", helping the user to think divergently and achieve all-round prevention and control of a certain political concept.
Besides text information to be matched, the information to be matched can also include image information to be matched. In that case, after the text information to be matched is expanded, text semantic features can be extracted from the expanded text information by the text encoder in the semantic recognition model, and image semantic features can be extracted from the image information to be matched by the picture encoder in the semantic recognition model. The extracted text semantic features and image semantic features are then fused by average pooling to obtain a fusion result. A Euclidean-distance query is performed between the features of the fusion result and the features of each image in the picture base, the image in the picture base with the smallest Euclidean distance is determined as the target image corresponding to the fusion result, and the risk event corresponding to the target image is given risk treatment.
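A minimal sketch of the average-pooling fusion step, assuming both encoders emit feature vectors of the same dimensionality (the function name and re-normalisation choice are assumptions, not from the source):

```python
import numpy as np

def fuse_features(text_feat: np.ndarray, image_feat: np.ndarray) -> np.ndarray:
    # Average pooling of one text feature and one image feature of equal
    # dimensionality, followed by re-normalisation so that distances
    # against the picture base stay on a comparable scale.
    fused = (text_feat + image_feat) / 2.0
    return fused / np.linalg.norm(fused)
```

The fused vector then plays the role of the query feature in the Euclidean-distance lookup against the picture base.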
The device performing intelligent investigation in the embodiment of the present application can be a device on which the semantic recognition model and the administrative knowledge graph recommendation module are deployed, and it can provide a user experience in which results are obtained as soon as the user searches. For example, the user may input text information to be matched and/or image information to be matched in the operation interface of the device. In response to the information to be matched, the device can perform an investigation in the picture base based on the text information to be matched and/or the image information to be matched and return the search result within a millisecond-level response time, and the user can perform a corresponding risk prevention operation based on the returned search result.
When querying the picture base for the image whose features are nearest, in Euclidean distance, to the features of the fusion result, the picture base contains images and image features extracted by the image encoder from offline historical stock data, online real-time data, and treatment link data. On this basis, the Euclidean distance between the features of each image in the picture base and the features corresponding to the fusion result can be computed, and a query result is returned in which images are sorted by Euclidean distance from low to high; the Euclidean distance represents similarity, and the smaller the Euclidean distance, the higher the similarity.
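The ascending-distance query described above can be sketched as follows; the array layout (one feature row per image in the base) is an assumption for illustration:

```python
import numpy as np

def query_picture_base(query_feat, base_feats, top_k=5):
    # Rank images in the picture base by Euclidean distance to the query
    # feature; smaller distance means higher similarity, so results are
    # sorted ascending and the nearest image comes first.
    dists = np.linalg.norm(base_feats - query_feat, axis=1)
    order = np.argsort(dists)[:top_k]
    return [(int(i), float(dists[i])) for i in order]
```

The first entry of the returned list corresponds to the target image; the rest give the rest of the ranked query result.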
Fig. 8 is a flowchart of a method for matching information according to an embodiment of the present application, as shown in fig. 8, the method may include the following steps:
in step S801, a user inputs text information and image information in an operation interface.
In the technical solution provided in step S801, the search platform is provided with an input interface, and the user may input the text information to be matched and the image information to be matched in the input interface of the search platform.
Step S802, administrative knowledge graph recommendation is performed.
In the technical scheme provided in step S802, after the text information to be matched and the image information to be matched input by the user are monitored, the text information can be expanded based on the administrative knowledge graph to obtain expanded text information.
Step S803, image features are extracted from the image information based on the large model picture encoder, and semantic features are extracted from the expanded text information based on the large model text encoder.
In the technical solution provided in the above step S803, the semantic recognition model includes a large model picture encoder and a large model text encoder, based on which the large model picture encoder may be used to extract image features in the image information, or the large model text encoder may be used to extract semantic features in the extended text information.
Step S804, feature fusion is carried out on the image features and the semantic features.
In the technical scheme provided in step S804, after the image features and the semantic features are extracted, feature fusion can be performed on the image features and the semantic features according to the principle of average pooling, so as to obtain a fusion result.
Step S805, the picture base is checked based on the fusion result.
In the technical scheme provided in the above step S805, after the fusion result is obtained, the picture base can be checked based on the fusion result, and the target image whose features match the features of the fusion result can be queried in the picture base.
Step S806, the search result is returned to the user.
In the technical solution provided in step S806, after the target image is obtained, the target image may be displayed on the operation interface, so that the user may perform risk treatment on the risk event corresponding to the target image in time.
Fig. 9 is a schematic diagram of information matching according to an embodiment of the present application. Fig. 9 shows the operation flow executed to implement information matching, which includes large model and knowledge retrieval, the intelligent investigation picture base, the administrative knowledge base, the administrative knowledge graph recommendation module, and the real-time intelligent investigation product. The large model and knowledge retrieval process may refer to the steps described in fig. 8 and is not repeated here.
The intelligent investigation picture base mainly concerns how the picture base is generated. As shown in fig. 9, the picture base contains images and image features extracted by the large model picture encoder from treatment link data, offline historical stock data, and online real-time data. On this basis, the Euclidean distance between each image feature in the picture base and the feature fusion result of the text information and image information input by the user can be determined, and the image nearest to the fusion result in Euclidean distance is returned to the user as the search result, where the Euclidean distance represents similarity and a smaller Euclidean distance means higher similarity.
The administrative knowledge base and the administrative knowledge graph mainly provide structured data, real-time risk prediction, and high timeliness. The structured data is mainly used to store a large number of risk points as entities and to structurally associate the attributes of those risk points. Real-time risk prediction efficiently judges, by combining model algorithms with manual review, the real-time risk score of each sentence so as to decide whether to perform knowledge enablement. High timeliness mainly means that when a newly added risk appears on the Internet, it can be quickly iterated into the knowledge graph to help the large model respond rapidly to brand-new risks.
The administrative knowledge graph recommendation module is mainly used to segment and expand risk keywords from text information input by the user, based on the administrative knowledge base and a wiki knowledge base. For example, the module may segment the user's text information into risk keywords, predict a risk score for each segmented keyword, screen the keywords using an early-return mechanism with high timeliness to retain those with higher risk scores, and then analyze the retained high-risk keywords using the wiki knowledge base and the administrative knowledge base respectively to form a knowledge recommendation result.
The real-time intelligent investigation product mainly performs risk processing using the search result. For example, the search result can be displayed to the user, the user can click to issue a treatment, and in response to the user's treatment-issuing operation, the risk event corresponding to the search result can be processed accordingly.
The embodiments of the present application will be further described below with reference to the administrative knowledge base and the administrative knowledge graph recommendation module.
The administrative knowledge base refers to an administrative knowledge graph that is constructed and maintained in advance. It takes different administrative knowledge points as entities (Entity) and builds structured associations of various attributes around each entity, so that query logic is organized around entities. The administrative knowledge base has keyword risk prediction capability: a risk classification prediction can be made for any piece of text information based on the knowledge base, a capability built from a Mention list accumulated through model algorithms and manual review results. Because daily operations use algorithms and manual review to quickly store newly added Internet risk points, the administrative knowledge base has high timeliness. Compared with the pre-training cost of the large model, the warehousing cost of the administrative knowledge base is negligible, so the knowledge base's high timeliness can make up for the large model's lack of timeliness, allowing the large model to achieve accurate understanding and querying through knowledge enablement when facing brand-new risk queries.
The administrative knowledge graph recommendation module uses NER technology to split risk keywords out of the text information input by the user to form a risk keyword list, performs real-time risk prediction classification on each keyword in the list, and sorts the risk keywords by their risk prediction scores. A risk threshold can be preset, risk keywords whose risk prediction scores are below the threshold are filtered out, and the remaining high-risk keywords are expanded, which significantly improves the timeliness of the overall recommendation.
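A toy sketch of the filter-then-expand step described above; the lexicon, scores, and expansion table are invented stand-ins for the real NER model and administrative knowledge base:

```python
# Hypothetical risk lexicon and expansion table; in the real system these
# come from the risk-prediction model and the administrative knowledge base.
RISK_SCORES = {"keywordA": 0.9, "keywordB": 0.2, "keywordC": 0.7}
EXPANSIONS = {"keywordA": ["related concept 1"], "keywordC": ["related concept 2"]}

def expand_high_risk(keywords, threshold=0.5):
    # Keep keywords whose predicted risk score clears the threshold, sort
    # the survivors by score (highest first), then expand each of them.
    scored = [(k, RISK_SCORES.get(k, 0.0)) for k in keywords]
    kept = sorted((kv for kv in scored if kv[1] >= threshold),
                  key=lambda kv: kv[1], reverse=True)
    return {k: EXPANSIONS.get(k, []) for k, _ in kept}
```

Filtering before expansion is what keeps the recommendation timely: only the few keywords that clear the risk threshold incur the cost of a knowledge-base lookup.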
When the high-risk keywords are expanded, double expansion can be performed through the wiki knowledge base and the administrative knowledge graph: the former focuses more on general-knowledge expansion, while the latter focuses more on risk-control expansion. The structured results expanded by the two can be disassembled item by item, and long sentences with extremely high knowledge density can be parsed and spliced together. The recommendation order is determined according to behavior data from user queries and the large model's knowledge-enablement capability, so that the knowledge fields most relevant to the user are displayed first.
For example, the user may input a keyword such as "Crayon Shin-chan" in the operation interface. Based on the expansion function of knowledge recommendation, several alternatives can first be recommended, such as "Crayon Shin-chan, a Japanese cartoon" and "Crayon Shin-chan, a 4-year-old child", together with related concepts such as "Ultraman" and "Chibi Maruko-chan" recommended by the association function. The user may click one of them to query, and the resulting pictures will be displayed.
In the embodiment of the present application, an MLSP pre-training scheme is innovatively proposed: the model is trained through disorder of text information at the character, word, and knowledge levels together with self-supervised constraints on pictures, yielding a final MLSP large model with deeper understanding of long texts, better ability to distinguish highly similar texts, and better understanding of logical relations within pictures. In addition, the administrative knowledge graph is newly applied to the risk-control intelligent investigation scenario. Based on NER technology, entity linking technology, risk score prediction technology, structured knowledge expansion technology, an early-return mechanism, and the like, a recommendation capability that surfaces risk-control knowledge within the intelligent investigation system is designed, which helps users understand obscure administrative-related concepts and enables the model to accurately understand user demands. By combining these technologies, a risk-control treatment product that returns results as soon as the user searches is provided, which greatly improves the response speed to newly added risks, provides a detection tool for stock risks, reduces prevention and control costs, and remedies weaknesses in the main control link. Because the system is general at the risk-control level, it can be rapidly replicated and iterated across various risk-control domains and therefore has high application value.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus a necessary general hardware platform, but that it may also be implemented by means of hardware. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method described in the embodiments of the present application.
Example 2
According to an embodiment of the present application, there is also provided an information matching apparatus for implementing the above information matching method, and fig. 10 is an information matching apparatus according to an embodiment of the present application, as shown in fig. 10, the information matching apparatus 1000 includes: a monitoring unit 1001, a calling unit 1002, an extracting unit 1003, and a searching unit 1004.
The monitoring unit 1001 is configured to monitor information to be matched, where the information to be matched includes text information to be matched and/or image information to be matched.
The invoking unit 1002 is configured to invoke a semantic recognition model, where the semantic recognition model is obtained by training a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and the text sample is used to describe image content of the image sample.
The extracting unit 1003 is configured to extract semantic features from the information to be matched using a feature extraction model corresponding to the information to be matched in the semantic recognition model.
The searching unit 1004 is configured to determine at least one target image that matches the semantic feature in a database, where the database is configured to store images that match different semantic features.
Here, it should be noted that the monitoring unit 1001, the calling unit 1002, the extracting unit 1003, and the searching unit 1004 correspond to steps S401 to S404 in embodiment 1, and the four modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the first embodiment. It should be noted that the above-mentioned modules or units may be hardware components or software components stored in a memory (for example, the memory 104) and processed by one or more processors (for example, the processors 102a, 102b, … …,102 n), or the above-mentioned modules may be part of the apparatus and may be executed in the computer terminal 10 provided in the first embodiment.
According to an embodiment of the present application, there is also provided an information matching apparatus for implementing the above information matching method, and fig. 11 is an information matching apparatus according to an embodiment of the present application, and as shown in fig. 11, the information matching apparatus 1100 includes: a first display unit 1101, a second display unit 1102.
The first display unit 1101 is configured to display information to be matched on an operation interface, where the information to be matched includes text information to be matched and/or image information to be matched.
The second display unit 1102 is configured to display, on the operation interface, at least one target image that matches a semantic feature of the information to be matched in response to a matching operation instruction acting on the operation interface, where the at least one target image is determined from a database, the semantic feature is extracted from the information to be matched based on a feature extraction model corresponding to the information to be matched in a semantic recognition model, the semantic recognition model is obtained by training a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and the text sample is used to describe image content of the image sample.
Here, it should be noted that the first display unit 1101 and the second display unit 1102 correspond to steps S501 to S502 in embodiment 1, and the two modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the first embodiment. It should be noted that the above-mentioned modules or units may be hardware components or software components stored in a memory (for example, the memory 104) and processed by one or more processors (for example, the processors 102a, 102b, … …,102 n), or the above-mentioned modules may be part of the apparatus and may be executed in the computer terminal 10 provided in the first embodiment.
According to an embodiment of the present application, there is also provided an information matching apparatus for implementing the above information matching method, and fig. 12 is an information matching apparatus according to an embodiment of the present application, as shown in fig. 12, the information matching apparatus 1200 includes: a monitoring unit 1201, a calling unit 1202, an extracting unit 1203, a searching unit 1204 and a presentation unit 1205.
The monitoring unit 1201 is configured to monitor risk information to be matched from the information matching platform, where the risk information to be matched includes risk text information to be matched and/or risk image information to be matched.
A calling unit 1202, configured to call a semantic recognition model, where the semantic recognition model is trained based on a confusing text sample of a risk text sample as a negative sample and a disturbance image sample of a risk image sample as a positive sample, and the risk text sample is used to describe image content of the risk image sample.
The extracting unit 1203 is configured to extract risk semantic features from risk information to be matched using a feature extraction model corresponding to the risk information to be matched in the semantic recognition model.
A search unit 1204, configured to determine at least one target image matching the risk semantic features in a database, where the database is configured to store images matching different risk semantic features.
The display unit 1205 is configured to return at least one target image to the information matching platform for display, where the information matching platform is configured to transmit the at least one target image to the terminal device, and the risk event corresponding to the target image is controlled by the terminal device.
It should be noted that, the monitoring unit 1201, the calling unit 1202, the extracting unit 1203, the searching unit 1204 and the displaying unit 1205 correspond to steps S601 to S605 in embodiment 1, and the five modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the first embodiment. It should be noted that the above-mentioned modules or units may be hardware components or software components stored in a memory (for example, the memory 104) and processed by one or more processors (for example, the processors 102a, 102b, … …,102 n), or the above-mentioned modules may be part of the apparatus and may be executed in the computer terminal 10 provided in the first embodiment.
According to an embodiment of the present application, there is further provided an information matching apparatus for implementing the above information matching method, and fig. 13 is a generating apparatus of a semantic recognition model according to an embodiment of the present application, as shown in fig. 13, the generating apparatus 1300 of the semantic recognition model includes: acquisition unit 1301, generation unit 1302, training unit 1303.
An obtaining unit 1301 configured to obtain a text sample and an image sample, where the text sample is used to describe image content of the image sample.
A generating unit 1302 is configured to generate a confusing text sample of the text sample and a disturbed image sample of the image sample.
The training unit 1303 is configured to train to obtain a semantic recognition model by taking the confusing text sample as a negative sample and taking the disturbance image sample as a positive sample, where the semantic recognition model includes a feature extraction model for extracting semantic features of the input text information and a feature extraction model for extracting semantic features of the input image information.
Here, it should be noted that the acquiring unit 1301, the generating unit 1302, and the training unit 1303 correspond to steps S701 to S703 in embodiment 1, and the three modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to those disclosed in the first embodiment. It should be noted that the above-mentioned modules or units may be hardware components or software components stored in a memory (for example, the memory 104) and processed by one or more processors (for example, the processors 102a, 102b, … …,102 n), or the above-mentioned modules may be part of the apparatus and may be executed in the computer terminal 10 provided in the first embodiment.
It should be noted that, the preferred embodiments in the foregoing examples of the present application are the same as the embodiments provided in example 1, the application scenario and the implementation process, but are not limited to the embodiments provided in example 1.
Example 3
Embodiments of the present application may provide a computer terminal, which may be any one of a group of computer terminals. Alternatively, in the present embodiment, the above-described computer terminal may be replaced with a terminal device such as a mobile terminal.
Alternatively, in this embodiment, the above-mentioned computer terminal may be located in at least one network device among a plurality of network devices of the computer network.
In this embodiment, the above-mentioned computer terminal may execute the program code of the following steps in the information matching method: monitoring information to be matched, wherein the information to be matched comprises text information to be matched and/or image information to be matched; invoking a semantic recognition model, wherein the semantic recognition model is obtained by training a disturbance image sample of an image sample as a positive sample based on a confusing text sample of a text sample as a negative sample, and the text sample is used for describing the image content of the image sample; extracting semantic features from the information to be matched by using a feature extraction model corresponding to the information to be matched in the semantic identification model; at least one target image matched with the semantic features is determined in a database, wherein the database is used for storing images matched with different semantic features.
Alternatively, fig. 14 is a block diagram of a computer terminal according to an embodiment of the present application. As shown in fig. 14, the computer terminal a may include: one or more (only one shown) processors 1402, memory 1404, memory controller, and peripheral interfaces, wherein the peripheral interfaces are coupled to the radio frequency module, the audio module, and the display.
The memory may be used to store software programs and modules, such as program instructions/modules corresponding to the information matching method and apparatus in the embodiments of the present application, and the processor executes the software programs and modules stored in the memory, thereby executing various functional applications and data processing, that is, implementing the information matching method described above. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located with respect to the processor, which may be connected to terminal a through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may call the information and the application program stored in the memory through the transmission device to perform the following steps: monitoring information to be matched, wherein the information to be matched comprises text information to be matched and/or image information to be matched; invoking a semantic recognition model, wherein the semantic recognition model is obtained by training a disturbance image sample of an image sample as a positive sample based on a confusing text sample of a text sample as a negative sample, and the text sample is used for describing the image content of the image sample; extracting semantic features from the information to be matched by using a feature extraction model corresponding to the information to be matched in the semantic identification model; at least one target image matched with the semantic features is determined in a database, wherein the database is used for storing images matched with different semantic features.
Optionally, the above processor may further execute program code for: and carrying out disorder processing on the text samples according to different text information amounts to obtain confused text samples, wherein the granularity of the text information amounts is used for at least determining the semantics of the text samples, and the semantics of the confused text samples are different from the semantics of the text samples.
Optionally, the above processor may further execute program code for: acquiring a first semantic similarity between the semantics of the confusing text sample as the negative sample and the semantics of the image sample, wherein the first semantic similarity is smaller than a first semantic similarity threshold; acquiring a second semantic similarity between the semantics of the image sample serving as the positive sample and the semantics of the disturbance image sample, wherein the second semantic similarity is larger than a second semantic similarity threshold; based on the first semantic similarity and the second semantic similarity, training to obtain a semantic recognition model.
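The two similarity constraints in the paragraph above can be sketched as follows, using cosine similarity as an assumed similarity measure (the thresholds, function names, and feature shapes are illustrative; the actual model would fold these constraints into its training loss rather than a boolean check):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def constraints_satisfied(img_feat, confusing_text_feat, disturbed_img_feat,
                          neg_thresh=0.3, pos_thresh=0.8):
    # First constraint: the confusing text (negative sample) must stay
    # dissimilar to the image, i.e. below the first threshold.
    neg_ok = cosine(img_feat, confusing_text_feat) < neg_thresh
    # Second constraint: the disturbed image (positive sample) must stay
    # similar to the image, i.e. above the second threshold.
    pos_ok = cosine(img_feat, disturbed_img_feat) > pos_thresh
    return neg_ok and pos_ok
```

Training would then adjust the encoders until both bounds hold across the sample set, pushing negatives apart and pulling positives together.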
Optionally, the above processor may further execute program code for: acquiring at least one extended text message of the text message; selecting target text information from the text information and the at least one augmented text information.
Optionally, the above processor may further execute program code for: text semantic features are extracted from the target text information using a text feature extraction model corresponding to the text information.
Optionally, the above processor may further execute program code for: acquiring a plurality of keywords of text information; determining a risk index of the keyword, wherein the risk index is used for representing the degree of risk of the keyword; and converting the keywords with risk indexes higher than the risk index threshold value in the keywords into expanded text information.
Optionally, the above processor may further execute program code for: in the knowledge graph, based on the weight of the entity of the text information, the text information is segmented into a plurality of keywords, wherein the knowledge graph is used for representing the association relationship among the entities.
Optionally, the above processor may further execute program code for: and adjusting the weight of the entity of the text information in the knowledge graph based on the selection times of the target text information.
Optionally, the above processor may further execute program code for: converting keywords with risk indexes higher than risk index thresholds in the keywords into expanded text information based on a knowledge graph, wherein the knowledge graph is used for representing association relations among the entities; and/or converting keywords having risk indicators higher than the risk indicator threshold among the plurality of keywords into expanded text information based on a knowledge base, wherein the knowledge base is used for storing knowledge information associated with the keywords.
Optionally, the above processor may further execute program code for: in the database, a target image with similarity to the text semantic feature greater than a third similarity threshold is determined.
Optionally, the above processor may further execute program code for: extracting image semantic features from the image information by using an image feature extraction model corresponding to the image information; determining, in a database, a target image that matches the semantic feature, comprising: fusing the text semantic features and the image semantic features to obtain a fusion result; and determining target images with similarity with the fusion result being greater than a third similarity threshold in the database.
Optionally, the above processor may further execute program code for: extracting image semantic features from the image information by using an image feature extraction model corresponding to the image information; determining, in a database, a target image that matches the semantic feature, comprising: in the database, a target image with similarity to the image semantic features greater than a third similarity threshold is determined.
By adopting the embodiments of the present application, a scheme of an information matching method is provided. After information to be matched is monitored, a semantic recognition model is invoked, and a feature extraction model in the semantic recognition model corresponding to the information to be matched is used to extract semantic features from the information to be matched, where the information to be matched may comprise text information to be matched and/or image information to be matched. After the semantic features are extracted, at least one target image matching the semantic features can be searched for in a database. The semantic recognition model is trained with a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, where the text sample is used for describing the image content of the image sample. Because the confusing text forces the model to learn finer-grained semantics, the semantic recognition model understands long texts better and distinguishes highly similar text information better, so that more accurate semantic features are extracted from the information to be matched. Searching the database based on these semantic features yields a target image with a higher degree of matching to the information to be matched, thereby achieving the technical effect of improving the accuracy of information searching and solving the technical problem of low information searching accuracy.
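The overall matching flow above can be sketched in a few lines; the encoder stand-ins, feature vectors, database entries, and threshold below are all invented for illustration and are not the embodiment's actual models:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def match(info, encoders, database, threshold=0.8):
    """Pick the feature extraction model for the input modality, extract
    semantic features, and return the database images whose stored
    features exceed the similarity threshold."""
    modality = "text" if isinstance(info, str) else "image"
    feature = encoders[modality](info)
    return [name for name, img_feature in database.items()
            if cosine(feature, img_feature) > threshold]

# Toy stand-ins for the trained text/image feature extraction models.
encoders = {
    "text": lambda s: [float(len(s)), float(s.count("a")), 1.0],
    "image": lambda pixels: [float(p) for p in pixels],
}
database = {"img_1": [6.0, 3.0, 1.0], "img_2": [-1.0, 0.0, 3.0]}
print(match("banana", encoders, database, threshold=0.9))
```

In a real system the encoders would be the jointly trained text and image towers of the semantic recognition model, and the database lookup would use an approximate nearest-neighbour index rather than a linear scan.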
It will be appreciated by those skilled in the art that the structure shown in the figure is only illustrative, and the computer terminal may be a smart phone (such as an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, etc. The structure shown in Fig. 14 does not limit the structure of the above electronic device. For example, the computer terminal A may further include more or fewer components (such as a network interface, a display device, etc.) than those shown in Fig. 14, or have a configuration different from that shown in Fig. 14.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program instructing hardware related to a terminal device. The program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
Example 4
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may be used to store program code for executing the information matching method provided in Embodiment 1.
Alternatively, in this embodiment, the storage medium may be located in any one of the computer terminals in the computer terminal group in the computer network, or in any one of the mobile terminals in the mobile terminal group.
Optionally, in the present embodiment, the storage medium is configured to store program code for performing the following steps: monitoring information to be matched, wherein the information to be matched comprises text information to be matched and/or image information to be matched; invoking a semantic recognition model, wherein the semantic recognition model is obtained by training with a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and the text sample is used for describing the image content of the image sample; extracting semantic features from the information to be matched by using a feature extraction model corresponding to the information to be matched in the semantic recognition model; and determining, in a database, at least one target image matched with the semantic features, wherein the database is used for storing images matched with different semantic features.
Optionally, the above-mentioned storage medium is further configured to store program code for performing the steps of: and carrying out disorder processing on the text samples according to different text information amounts to obtain confused text samples, wherein the text information amounts are used for at least determining the semantics of the text samples, and the semantics of the confused text samples are different from the semantics of the text samples.
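For illustration only, one plausible reading of the disorder processing is token shuffling at different granularities, so the vocabulary is preserved while the described scene changes; the `confuse` helper, seed, and example caption below are invented for this sketch:

```python
import random

def confuse(text, span=1, seed=0):
    """Shuffle a caption in chunks of `span` tokens so that the vocabulary
    is kept but the word order -- and hence the described scene -- changes,
    yielding a hard negative for contrastive training."""
    rng = random.Random(seed)  # seeded so the shuffle is reproducible
    tokens = text.split()
    chunks = [tokens[i:i + span] for i in range(0, len(tokens), span)]
    rng.shuffle(chunks)
    return " ".join(token for chunk in chunks for token in chunk)

caption = "a red bird sitting on a green branch"
print(confuse(caption, span=1))  # word-level confusion
print(confuse(caption, span=3))  # phrase-level confusion keeps local order
```

Varying `span` corresponds to the "different text information amounts": larger chunks preserve more local semantics, producing harder or easier negatives.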
Optionally, the above-mentioned storage medium is further configured to store program code for performing the following steps: acquiring a first semantic similarity between the semantics of the confusing text sample serving as the negative sample and the semantics of the image sample, wherein the first semantic similarity is smaller than a first semantic similarity threshold; acquiring a second semantic similarity between the semantics of the disturbance image sample serving as the positive sample and the semantics of the image sample, wherein the second semantic similarity is greater than a second semantic similarity threshold; and training, based on the first semantic similarity and the second semantic similarity, to obtain the semantic recognition model.
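The two similarity constraints can be sketched, for illustration only, as a hinge-style penalty; the thresholds and the hinge form are assumptions for the sketch, not the embodiment's actual training objective:

```python
def constraint_loss(sim_negative, sim_positive, neg_threshold=0.3, pos_threshold=0.8):
    """Hinge-style penalty encoding the two constraints above: the
    confusing-text/image similarity should stay below the first threshold,
    and the disturbance-image/image similarity should stay above the
    second threshold. Thresholds and loss form are illustrative choices."""
    loss_neg = max(0.0, sim_negative - neg_threshold)  # push the hard negative down
    loss_pos = max(0.0, pos_threshold - sim_positive)  # pull the perturbed positive up
    return loss_neg + loss_pos

print(constraint_loss(sim_negative=0.5, sim_positive=0.6))
print(constraint_loss(sim_negative=0.1, sim_positive=0.9))
```

When both constraints are satisfied the penalty is zero, so gradient updates only flow from samples that violate a threshold.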
Optionally, the above-mentioned storage medium is further configured to store program code for performing the steps of: acquiring at least one extended text message of the text message; selecting target text information from the text information and the at least one augmented text information.
Optionally, the above-mentioned storage medium is further configured to store program code for performing the steps of: text semantic features are extracted from the target text information using a text feature extraction model corresponding to the text information.
Optionally, the above-mentioned storage medium is further configured to store program code for performing the steps of: acquiring a plurality of keywords of text information; determining a risk index of the keyword, wherein the risk index is used for representing the degree of risk of the keyword; and converting the keywords with risk indexes higher than the risk index threshold value in the keywords into expanded text information.
Optionally, the above-mentioned storage medium is further configured to store program code for performing the steps of: in the knowledge graph, based on the weight of the entity of the text information, the text information is segmented into a plurality of keywords, wherein the knowledge graph is used for representing the association relationship among the entities.
Optionally, the above-mentioned storage medium is further configured to store program code for performing the steps of: and adjusting the weight of the entity of the text information in the knowledge graph based on the selection times of the target text information.
Optionally, the above-mentioned storage medium is further configured to store program code for performing the steps of: converting keywords with risk indexes higher than risk index thresholds in the keywords into expanded text information based on a knowledge graph, wherein the knowledge graph is used for representing association relations among the entities; and/or converting keywords having risk indicators higher than the risk indicator threshold among the plurality of keywords into expanded text information based on a knowledge base, wherein the knowledge base is used for storing knowledge information associated with the keywords.
Optionally, the above-mentioned storage medium is further configured to store program code for performing the steps of: in the database, a target image with similarity to the text semantic feature greater than a third similarity threshold is determined.
Optionally, the above-mentioned storage medium is further configured to store program code for performing the steps of: extracting image semantic features from the image information by using an image feature extraction model corresponding to the image information; determining, in a database, a target image that matches the semantic feature, comprising: fusing the text semantic features and the image semantic features to obtain a fusion result; and determining target images with similarity with the fusion result being greater than a third similarity threshold in the database.
Optionally, the above-mentioned storage medium is further configured to store program code for performing the steps of: extracting image semantic features from the image information by using an image feature extraction model corresponding to the image information; determining, in a database, a target image that matches the semantic feature, comprising: in the database, a target image with similarity to the image semantic features greater than a third similarity threshold is determined.
Example 5
The embodiment of the present application further provides an electronic device, which comprises a memory and a processor. The memory is configured to store computer executable instructions, and the processor is configured to execute the computer executable instructions, which, when executed by the processor, implement the steps of the method for generating a semantic recognition model. In further alternative embodiments, the memory may store an executable program, and the processor may be configured to run the program to perform the information matching method provided in Embodiment 1.
Optionally, in this embodiment, the above processor is further configured to execute the following program code: monitoring information to be matched, wherein the information to be matched comprises text information to be matched and/or image information to be matched; invoking a semantic recognition model, wherein the semantic recognition model is obtained by training with a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and the text sample is used for describing the image content of the image sample; extracting semantic features from the information to be matched by using a feature extraction model corresponding to the information to be matched in the semantic recognition model; and determining, in a database, at least one target image matched with the semantic features, wherein the database is used for storing images matched with different semantic features.
Optionally, the above processor is further configured to execute the following program code: and carrying out disorder processing on the text samples according to different text information amounts to obtain confused text samples, wherein the text information amounts are used for at least determining the semantics of the text samples, and the semantics of the confused text samples are different from the semantics of the text samples.
Optionally, the above processor is further configured to execute the following program code: acquiring a first semantic similarity between the semantics of the confusing text sample serving as the negative sample and the semantics of the image sample, wherein the first semantic similarity is smaller than a first semantic similarity threshold; acquiring a second semantic similarity between the semantics of the disturbance image sample serving as the positive sample and the semantics of the image sample, wherein the second semantic similarity is greater than a second semantic similarity threshold; and training, based on the first semantic similarity and the second semantic similarity, to obtain the semantic recognition model.
Optionally, the above processor is further configured to execute the following program code: acquiring at least one extended text message of the text message; selecting target text information from the text information and the at least one augmented text information.
Optionally, the above processor is further configured to execute the following program code: extracting text semantic features from the target text information by using a text feature extraction model corresponding to the text information.
Optionally, the above processor is further configured to execute the following program code: acquiring a plurality of keywords of text information; determining a risk index of the keyword, wherein the risk index is used for representing the degree of risk of the keyword; and converting the keywords with risk indexes higher than the risk index threshold value in the keywords into expanded text information.
Optionally, the above processor is further configured to execute the following program code: in the knowledge graph, based on the weight of the entity of the text information, the text information is segmented into a plurality of keywords, wherein the knowledge graph is used for representing the association relationship among the entities.
Optionally, the above processor is further configured to execute the following program code: and adjusting the weight of the entity of the text information in the knowledge graph based on the selection times of the target text information.
Optionally, the above processor is further configured to execute the following program code: converting keywords with risk indexes higher than risk index thresholds in the keywords into expanded text information based on a knowledge graph, wherein the knowledge graph is used for representing association relations among the entities; and/or converting keywords having risk indicators higher than the risk indicator threshold among the plurality of keywords into expanded text information based on a knowledge base, wherein the knowledge base is used for storing knowledge information associated with the keywords.
Optionally, the above processor is further configured to execute the following program code: in the database, a target image with similarity to the text semantic feature greater than a third similarity threshold is determined.
Optionally, the above processor is further configured to execute the following program code: extracting image semantic features from the image information by using an image feature extraction model corresponding to the image information; determining, in a database, a target image that matches the semantic feature, comprising: fusing the text semantic features and the image semantic features to obtain a fusion result; and determining target images with similarity with the fusion result being greater than a third similarity threshold in the database.
Optionally, the above processor is further configured to execute the following program code: extracting image semantic features from the image information by using an image feature extraction model corresponding to the image information; determining, in a database, a target image that matches the semantic feature, comprising: in the database, a target image with similarity to the image semantic features greater than a third similarity threshold is determined.
The serial numbers of the foregoing embodiments of the present application are merely for description and do not represent the superiority or inferiority of the embodiments.
In the foregoing embodiments of the present application, the description of each embodiment has its own emphasis; for a portion that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The above-described apparatus embodiments are merely illustrative; for example, the division of the units is merely a logical function division, and there may be other division manners in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present application, and it should be noted that those of ordinary skill in the art may make several modifications and refinements without departing from the principles of the present application, and such modifications and refinements shall also be regarded as falling within the protection scope of the present application.

Claims (16)

1. An information matching method, comprising:
monitoring information to be matched, wherein the information to be matched comprises text information to be matched and/or image information to be matched;
invoking a semantic recognition model, wherein the semantic recognition model is obtained by training a confused text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and the text sample is used for describing the image content of the image sample;
extracting semantic features from the information to be matched by using a feature extraction model corresponding to the information to be matched in the semantic recognition model;
at least one target image matched with the semantic features is determined in a database, wherein the database is used for storing images matched with different semantic features.
2. The method according to claim 1, wherein the method further comprises:
And carrying out disorder processing on the text samples according to different text information amounts to obtain the confusing text samples, wherein the text information amounts are used for at least determining the semantics of the text samples, and the semantics of the confusing text samples are different from the semantics of the text samples.
3. The method according to claim 1, wherein the method further comprises:
acquiring a first semantic similarity between the semantics of the confusing text sample serving as the negative sample and the semantics of the image sample, wherein the first semantic similarity is smaller than a first semantic similarity threshold;
acquiring a second semantic similarity between the semantics of the disturbance image sample serving as the positive sample and the semantics of the image sample, wherein the second semantic similarity is greater than a second semantic similarity threshold;
and training to obtain the semantic recognition model based on the first semantic similarity and the second semantic similarity.
4. The method of claim 1, wherein the information to be matched comprises at least text information to be matched,
the method further comprises the steps of: acquiring at least one extended text message of the text message;
Selecting target text information from the text information and the at least one extended text information;
extracting semantic features from the information to be matched by using a feature extraction model corresponding to the information to be matched in the semantic recognition model, wherein the feature extraction model comprises the following steps: and extracting text semantic features from the target text information by using a text feature extraction model corresponding to the text information.
5. The method of claim 4, wherein obtaining at least one augmented text message of the text message comprises:
acquiring a plurality of keywords of the text information;
determining a risk index of the keyword, wherein the risk index is used for representing the degree of risk of the keyword;
and converting the keywords, of which the risk indexes are higher than a risk index threshold, into the expanded text information.
6. The method of claim 5, wherein obtaining the plurality of keywords of the text information comprises:
and in a knowledge graph, based on the weight of the entity of the text information, segmenting the text information into the plurality of keywords, wherein the knowledge graph is used for representing the association relationship among the plurality of entities.
7. The method of claim 6, wherein the method further comprises:
and adjusting the weight of the entity of the text information in the knowledge graph based on the selection times of the target text information.
8. The method of claim 6, wherein converting the keywords of the plurality of keywords having the risk indicator above a risk indicator threshold to the augmented text information comprises:
converting the keywords, of which the risk indexes are higher than a risk index threshold, in the plurality of keywords into the extended text information based on the knowledge graph, wherein the knowledge graph is used for representing the association relationship among a plurality of entities; and/or,
and converting the keywords, of which the risk indexes are higher than the risk index threshold, into the extended text information based on a knowledge base, wherein the knowledge base is used for storing knowledge information associated with the keywords.
9. The method of claim 4, wherein determining, in a database, a target image that matches the semantic feature comprises:
and determining the target image with the similarity with the text semantic features being greater than a third similarity threshold in the database.
10. The method according to claim 4, wherein the information to be matched includes text information to be matched and image information to be matched, and extracting semantic features from the information to be matched using a feature extraction model corresponding to the information to be matched in the semantic recognition model includes:
extracting image semantic features from the image information by using an image feature extraction model corresponding to the image information;
determining, in a database, a target image that matches the semantic feature, comprising: fusing the text semantic features and the image semantic features to obtain a fusion result; and determining the target image with the similarity with the fusion result being larger than a third similarity threshold value in the database.
11. The method according to claim 1, wherein the information to be matched includes image information to be matched, and extracting semantic features from the information to be matched using a feature extraction model corresponding to the information to be matched in the semantic recognition model includes:
extracting image semantic features from the image information by using an image feature extraction model corresponding to the image information;
Determining, in the database, a target image that matches the semantic feature, comprising: and determining the target image with the similarity with the image semantic features being greater than a third similarity threshold in the database.
12. An information matching method, comprising:
displaying information to be matched on an operation interface, wherein the information to be matched comprises text information to be matched and/or image information to be matched;
and responding to a matching operation instruction acted on the operation interface, and displaying, on the operation interface, at least one target image matched with the semantic features of the information to be matched, wherein the at least one target image is determined from a database, the semantic features are extracted from the information to be matched based on a feature extraction model corresponding to the information to be matched in a semantic recognition model, the semantic recognition model is trained based on a confusing text sample of a text sample as a negative sample and a disturbance image sample of an image sample as a positive sample, and the text sample is used for describing the image content of the image sample.
13. The method of claim 12, wherein the information to be matched comprises at least text information to be matched, the method further comprising:
Responding to an information expansion operation instruction acted on the operation interface, and displaying at least one expansion text message of the text message on the operation interface;
and responding to a selection operation instruction acted on the operation interface, and displaying target text information selected from the text information and the at least one piece of extended text information on the operation interface, wherein the semantic features are extracted based on a feature extraction model corresponding to the target text information in a semantic recognition model.
14. An information matching method, comprising:
monitoring risk information to be matched from an information matching platform, wherein the risk information to be matched comprises risk text information to be matched and/or risk image information to be matched;
invoking a semantic recognition model, wherein the semantic recognition model is trained based on a confusing text sample of a risk text sample serving as a negative sample and a disturbance image sample of a risk image sample serving as a positive sample, and the risk text sample is used for describing image content of the risk image sample;
extracting risk semantic features from the risk information to be matched by using a feature extraction model corresponding to the risk information to be matched in the semantic recognition model;
Determining at least one target image matched with the risk semantic features in a database, wherein the database is used for storing images matched with different risk semantic features;
and returning the at least one target image to the information matching platform for display, wherein the information matching platform is used for transmitting the at least one target image to a terminal device, so that the terminal device performs prevention and control on a risk event corresponding to the target image.
15. A method for generating a semantic recognition model, comprising:
obtaining a text sample and an image sample, wherein the text sample is used for describing the image content of the image sample;
generating a confusing text sample of the text sample and a perturbed image sample of the image sample;
training the confusing text sample as a negative sample and the disturbance image sample as a positive sample to obtain a semantic recognition model, wherein the semantic recognition model comprises a feature extraction model for extracting semantic features of input text information and a feature extraction model for extracting semantic features of the input image information.
16. An electronic device, comprising: a memory and a processor; the memory is configured to store computer executable instructions, the processor being configured to execute the computer executable instructions, which when executed by the processor, implement the steps of the method of any one of claims 1 to 15.
CN202310344513.3A 2023-03-28 2023-03-28 Information matching method and storage medium Active CN116467607B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310344513.3A CN116467607B (en) 2023-03-28 2023-03-28 Information matching method and storage medium

Publications (2)

Publication Number Publication Date
CN116467607A true CN116467607A (en) 2023-07-21
CN116467607B CN116467607B (en) 2024-03-01

Family

ID=87172730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310344513.3A Active CN116467607B (en) 2023-03-28 2023-03-28 Information matching method and storage medium

Country Status (1)

Country Link
CN (1) CN116467607B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117216308A (en) * 2023-11-09 2023-12-12 天津华来科技股份有限公司 Searching method, system, equipment and medium based on large model

Citations (17)

* Cited by examiner, † Cited by third party
Patent Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102591868A (en) * 2011-01-10 2012-07-18 Ricoh Co Ltd System and method for automatic generation of photograph guide
US20140294272A1 (en) * 2013-03-29 2014-10-02 Case Western Reserve University Discriminatively Weighted Multi-Scale Local Binary Patterns
CN107657175A (en) * 2017-09-15 2018-02-02 Beijing Institute of Technology Malicious-sample homology detection method based on image feature descriptors
WO2019213820A1 (en) * 2018-05-07 2019-11-14 合刃科技(武汉)有限公司 Photographing control method and electronic device
CN110750666A (en) * 2018-07-04 2020-02-04 Beijing Jingdong Shangke Information Technology Co Ltd Picture generation method and system, electronic device and storage medium
CN110717366A (en) * 2018-07-13 2020-01-21 Hangzhou Hikvision Digital Technology Co Ltd Text information identification method, device, equipment and storage medium
CN110457701A (en) * 2019-08-08 2019-11-15 Nanjing University of Posts and Telecommunications Adversarial training method based on interpretable adversarial text
CN110569361A (en) * 2019-09-06 2019-12-13 Tencent Technology (Shenzhen) Co Ltd Text recognition method and equipment
CN110765740A (en) * 2019-10-11 2020-02-07 深圳市比一比网络科技有限公司 DOM tree-based full-type text replacement method, system, device and storage medium
CN111368818A (en) * 2020-03-01 2020-07-03 Qingdao Binhai University Food bag detection system and method based on machine vision and food packaging system
CN111797790A (en) * 2020-07-10 2020-10-20 Beijing ByteDance Network Technology Co Ltd Image processing method and apparatus, storage medium, and electronic device
CN113076961A (en) * 2021-05-12 2021-07-06 Beijing QIYI Century Science and Technology Co Ltd Image feature library updating method, image detection method and device
CN113836333A (en) * 2021-09-18 2021-12-24 Beijing Baidu Netcom Science and Technology Co Ltd Training method of image-text matching model, method and device for realizing image-text retrieval
CN114676255A (en) * 2022-03-29 2022-06-28 Tencent Technology (Shenzhen) Co Ltd Text processing method, device, equipment, storage medium and computer program product
CN114860874A (en) * 2022-04-24 2022-08-05 Alibaba (China) Co Ltd Object matching method, model training method, product matching method, and storage medium
CN115129976A (en) * 2022-05-25 2022-09-30 Tencent Technology (Shenzhen) Co Ltd Resource recall method, device, equipment and storage medium
CN115331150A (en) * 2022-08-29 2022-11-11 Beijing Dajia Internet Information Technology Co Ltd Image recognition method, image recognition device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
包青平; 孙志锋: "Clothing image classification and retrieval based on metric learning", Computer Applications and Software, no. 04, pages 261-265 *
李鹏; 崔刚: "Research progress on image-based spam filtering techniques", Intelligent Computer and Applications, no. 03, pages 32-36 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117216308A (en) * 2023-11-09 2023-12-12 Tianjin Hualai Technology Co Ltd Searching method, system, equipment and medium based on large model
CN117216308B (en) * 2023-11-09 2024-04-26 Tianjin Hualai Technology Co Ltd Searching method, system, equipment and medium based on large model

Also Published As

Publication number Publication date
CN116467607B (en) 2024-03-01

Similar Documents

Publication Publication Date Title
US11899681B2 (en) Knowledge graph building method, electronic apparatus and non-transitory computer readable storage medium
KR102288249B1 (en) Information processing method, terminal, and computer storage medium
US11263208B2 (en) Context-sensitive cross-lingual searches
EP3405946B1 (en) Configurable generic language understanding models
CN113159091A (en) Data processing method and device, electronic equipment and storage medium
KR20190019465A (en) Recommendation System for Corresponding Message
Hsu et al. Integrating machine learning and open data into social Chatbot for filtering information rumor
CN116467607B (en) Information matching method and storage medium
CN110362663A (en) Adaptive more perception similarity detections and parsing
CN110990057A (en) Extraction method, device, equipment and medium of small program sub-chain information
CN114741581A (en) Image classification method and device, computer equipment and medium
CN113742496B (en) Electric power knowledge learning system and method based on heterogeneous resource fusion
US11361031B2 (en) Dynamic linguistic assessment and measurement
CN110059172A (en) The method and apparatus of recommendation answer based on natural language understanding
CN117251538A (en) Document processing method, computer terminal and computer readable storage medium
CN117349515A (en) Search processing method, electronic device and storage medium
Aranda-Corral et al. Reconciling knowledge in social tagging web services
US20220172310A1 (en) House-renting recommendation method, electronic device and storage medium
CN115168609A (en) Text matching method and device, computer equipment and storage medium
JP2022527671A (en) Mute content across platforms
Dokuz et al. Cloud computing-based socially important locations discovery on social media big datasets
CN111859145B (en) Information searching method and device, electronic equipment and computer storage medium
CN114579806B (en) Video detection method, storage medium and processor
CN116561288B (en) Event query method, device, computer equipment, storage medium and program product
US20220207238A1 (en) Methods and system for the extraction of properties of variables using automatically detected variable semantics and other resources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant