CN113688658A - Object identification method, device, equipment and medium


Info

Publication number
CN113688658A
Authority
CN
China
Prior art keywords
image
target image
sub-images
recognition system
Legal status
Pending
Application number
CN202010588784.XA
Other languages
Chinese (zh)
Inventor
朱启源
朱声高
于欣
叶奕斌
涂丹丹
鲍江宏
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to PCT/CN2021/076701 (published as WO2021232865A1)
Publication of CN113688658A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image

Abstract

The application provides an object identification method, device, equipment, and medium. A first recognition system acquires a target image that includes a plurality of objects and determines a plurality of sub-images corresponding to the target image, where each sub-image may include at least one object. The first recognition system then sends the plurality of sub-images to a second recognition system through remote communication so that the second recognition system can recognize the objects in the sub-images. Because the first recognition system sends the second recognition system the plurality of sub-images corresponding to the target image rather than the whole target image, even if the target image contains sensitive information, the parts of that information carried by individual sub-images are no longer sensitive on their own, the complete sensitive information is difficult to reconstruct, and the risk of leaking the sensitive information is reduced.

Description

Object identification method, device, equipment and medium
The present application claims priority to the Chinese patent application filed on May 18, 2020 under application number 202010420378.2 and entitled "Method, apparatus and device for identifying characters by combining edge and cloud", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of computer vision, and in particular, to a method, an apparatus, a device, and a medium for object recognition.
Background
With the development of deep learning technology, the accuracy of object recognition has steadily improved, and the technology is now widely applied across industries. Recognizing objects (such as characters or physical objects) in an image generally places high demands on a device's data processing capability; for example, in scenarios that require high recognition speed and/or accuracy, the device's processing capability cannot be too low. For this reason, some enterprises use multiple recognition systems in coordination for object recognition, for example a recognition system in their edge data center working together with a recognition system in a public cloud provider's cloud data center.
However, when the image to be recognized contains sensitive information, application scenarios in which multiple recognition systems cooperate to perform object recognition carry a high risk of leaking that information.
Disclosure of Invention
In view of this, the present application provides an object identification method that reduces the risk of sensitive information leakage when identifying objects in an image. Corresponding apparatuses, devices, computer-readable storage media, and computer program products are also provided.
In a first aspect, an embodiment of the present application provides an object recognition method, which may be performed by a first recognition system. The first recognition system may acquire a target image including a plurality of objects, where the objects may be, for example, characters or physical objects. The first recognition system may determine a plurality of sub-images corresponding to the target image, each of which may include at least one object, and may then send the plurality of sub-images to a second recognition system through remote communication so that the second recognition system can recognize the objects in the sub-images.
Because the first recognition system sends the second recognition system the plurality of sub-images corresponding to the target image rather than the whole target image, even if the target image contains sensitive information, the information the second recognition system can recognize from the sub-images is usually only the individual parts of that sensitive information, without the relationship that combines them. If those parts leak from the second recognition system, the many possible ways of combining them make it difficult to determine the actual sensitive information in the target image; in effect, the sensitive information has not leaked. The risk of leaking the sensitive information in the target image during object recognition is thereby reduced.
In addition, when the first recognition system is deployed in an edge data center and the second recognition system is deployed in a cloud data center, the objects in the sub-images are recognized by the cloud data center, which has higher data processing capability, rather than by the edge data center. This not only yields higher recognition accuracy and efficiency but also spares the user from deploying a high-performance server on the edge side, reducing the user's hardware cost.
In one possible embodiment, the first recognition system may obtain the plurality of sub-images corresponding to the target image by cropping the target image. Specifically, the first recognition system may acquire the pixel positions of the plurality of objects on the target image and crop the target image according to those pixel positions to obtain the plurality of sub-images. The first recognition system may detect the objects in the target image with an object detection algorithm configured on it to obtain their pixel positions; alternatively, the first recognition system may have the second recognition system detect the objects in the target image and receive the pixel positions that the second recognition system returns.
In one possible implementation, when the first recognition system obtains the pixel positions through the second recognition system, the first recognition system may first apply a transformation process to the target image, specifically transforming the image content of each object into other content; for example, when the objects are characters, the characters in the target image may be replaced with other characters. The first recognition system may then send the transformed image to the second recognition system, and the second recognition system may detect the pixel positions of the transformed objects on the transformed image using a high-precision deep learning algorithm or object detection model and return those positions to the first recognition system. In this way, the first recognition system can determine the pixel positions of the plurality of objects on the target image from the received pixel positions of the transformed objects. Because the positions are detected by the second recognition system with a high-precision algorithm or model, the first recognition system obtains higher-precision pixel positions than it could detect itself. Moreover, what the first recognition system sends to the second recognition system in this exchange is the transformed image rather than the target image, so even if the target image contains sensitive information, the transformed image does not, and the risk of leaking that information is low both while the transformed image is transmitted and while object detection is performed on the second recognition system.
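As a rough illustration of this round trip, the sketch below shows the first recognition system's side of the exchange in Python; the box format, the `transform` and `detect_remote` callables, and the layout-preserving assumption are all illustrative, not interfaces defined by the application.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # assumed box format: (x1, y1, x2, y2)

def locate_objects_via_second_system(
    target_image: bytes,
    transform: Callable[[bytes], bytes],
    detect_remote: Callable[[bytes], List[Box]],
) -> List[Box]:
    """First-system side of the exchange: only the transformed image leaves
    the first recognition system. Assuming the transformation preserves
    layout, the boxes detected remotely on the transformed image are valid
    pixel positions on the original target image as well."""
    transformed = transform(target_image)   # desensitize the content first
    return detect_remote(transformed)       # high-precision remote detection
```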
In a possible implementation manner, the first recognition system may further receive a first recognition result for the plurality of sub-images returned by the second recognition system, where the first recognition result may be a set of recognition results corresponding to the plurality of objects. The first recognition system may then determine a second recognition result for the target image according to the first recognition result and the positional relationships, on the target image, of the objects in the sub-images; the second recognition result may include the combination relationship of the objects in the target image, determined from those positional relationships. For example, assume the first recognition result includes the recognition result "Zhang San" for object 1, the recognition result "XX cell, XX street, XX city" for object 2, and the recognition result "master" for object 3. Then, from the first recognition result and the positional relationships of object 1 with objects 2 and 3 in the target image, the second recognition result is determined to be, e.g., "Zhang San's home address is XX cell, XX street, XX city" and "Zhang San's degree is master".
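A minimal sketch of this recombination step, assuming a simple top-to-bottom, left-to-right reading order (the application only requires that some positional relationship be used; the function name and box format are illustrative):

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def combine_results(first_result: List[str], boxes: List[Box]) -> str:
    """Order the per-object recognition results by their position on the
    target image and join them back into a full-image (second) result."""
    paired = sorted(zip(boxes, first_result), key=lambda p: (p[0][1], p[0][0]))
    return " ".join(text for _, text in paired)

# e.g. combine_results(["Zhang San", "XX cell, XX street, XX city"],
#                      [(30, 160, 160, 180), (30, 200, 400, 220)])
```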
In a possible implementation manner, the first recognition system may send the plurality of sub-images to the second recognition system in a preset order. The preset order may be the order in which the first recognition system cropped the sub-images, or an order obtained by shuffling that cropping order; with a shuffled order, it is difficult for an illegal user to restore the target image from the sending order of the sub-images, so sending the sub-images out of order can reduce the risk of leaking the sensitive information in the target image. Correspondingly, when the second recognition system returns the first recognition result for the plurality of sub-images, it may return the recognition results of the objects in the individual sub-images sequentially in the same preset order, so that the first recognition system can determine which recognition result corresponds to which sub-image.
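The shuffled preset order, and the mapping of returned results back to sub-images, might look like the following sketch; the permutation bookkeeping is an assumed detail, not the application's specified mechanism:

```python
import random
from typing import List, Optional, Sequence

def shuffled_send_order(num_sub_images: int, seed: Optional[int] = None) -> List[int]:
    """Pick the preset order as a random permutation of crop indices; the
    first recognition system keeps it private so only it can undo it."""
    order = list(range(num_sub_images))
    random.Random(seed).shuffle(order)
    return order

def restore_results(order: Sequence[int], results_in_send_order: Sequence[str]) -> List[str]:
    """Map results returned in the preset (shuffled) order back to crops."""
    restored: List[str] = [""] * len(order)
    for sent_pos, crop_idx in enumerate(order):
        restored[crop_idx] = results_in_send_order[sent_pos]
    return restored
```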
In a possible embodiment, the second recognition system may return, together with the first recognition result for the plurality of sub-images, the correspondence between each sub-image and its recognition result. The second recognition system then does not need to return the recognition results in any specific order, and the first recognition system can determine which recognition result each sub-image corresponds to from the received correspondence.
In one possible embodiment, the target image may include one image or a plurality of images. When the target image includes a plurality of images, taking as an example a target image that includes a first image and a second image, the plurality of sub-images corresponding to the target image may include at least one sub-image of the first image and at least one sub-image of the second image. The first image and the second image may both contain sensitive information, only one of them may contain sensitive information, or neither may. When both contain sensitive information, the sub-images that the first recognition system sends to the second recognition system are a mixture of the sub-images of several images, which increases the combinatorial complexity among the sub-images and the difficulty of recombining them into the sensitive information, further reducing the risk of leakage.
In a possible implementation manner, the first recognition system may be deployed in an edge data center and the second recognition system in a cloud data center. A high-precision object recognition result can then be obtained from the cloud data center's high data processing performance while the risk of sensitive information leakage is reduced, and the user is not required to deploy a costly high-performance server in the edge data center.
In one possible embodiment, the plurality of objects included in the target image may include a plurality of characters, such as Chinese, English, numbers, and symbols. Of course, the objects may also be of other types, such as trademarks or components.
In a possible implementation manner, the first recognition system receives a target image uploaded by a user together with sensitive indication information for that image, from which the first recognition system can determine that the target image contains sensitive information. The first recognition system may then perform object recognition on the target image according to any of the possible embodiments of the first aspect, whereas an image that does not contain sensitive information may simply be sent to the second recognition system for object recognition. Illustratively, the sensitive indication information may be a sensitive label added to the target image.
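In code, this routing decision might reduce to something like the following sketch, where the flag and pipeline callables are assumptions for illustration:

```python
def handle_upload(image: bytes, is_sensitive: bool, split_pipeline, passthrough_pipeline):
    """Route the uploaded image based on the user's sensitivity indication."""
    if is_sensitive:
        # crop into sub-images and send those (embodiments above)
        return split_pipeline(image)
    # no sensitive content indicated: the whole image may be sent directly
    return passthrough_pipeline(image)
```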
In a second aspect, embodiments of the present application provide an object recognition method, which may be performed by a second recognition system. The second recognition system receives from the first recognition system a plurality of sub-images corresponding to a target image, where the target image may include a plurality of objects and each sub-image may include at least one of them. The second recognition system may then recognize the object in each sub-image with a preset object recognition algorithm to obtain a first recognition result, which may be a set of recognition results corresponding to the plurality of objects. Since the second recognition system performs object recognition on the individual sub-images rather than directly on the whole target image, it is difficult for the second recognition system to learn the combination relationship among the objects in the target image. Even if the target image contains sensitive information and the recognized parts of it leak, the many possible combinations of those parts make it difficult to determine the actual sensitive information; in effect, the sensitive information has not leaked, and the risk of leaking it during object recognition is reduced.
In one possible embodiment, the second recognition system sends the first recognition result for the plurality of sub-images to the first recognition system, so that the first recognition system can further determine the second recognition result for the target image based on the first recognition result.
In one possible embodiment, the second recognition system further receives from the first recognition system a transformed image, obtained by transforming the target image, where an object in the target image and the corresponding transformed object may differ in image content. The second recognition system may then detect the transformed objects in the transformed image, for example with a high-precision object detection algorithm or model, obtain their pixel positions on the transformed image, and return those positions to the first recognition system. Because the image content of each object was changed by the transformation, the information presented by a transformed object is no longer part of the sensitive information; the second recognition system therefore cannot obtain the sensitive information in the target image from the transformed image, which reduces the risk of leakage.
In a possible implementation manner, the second recognition system may return the recognition results of the objects in the individual sub-images to the first recognition system sequentially, in the order in which the sub-images of the target image were received, thereby completing the feedback of the first recognition result. The first recognition system can then determine which recognition result corresponds to which sub-image from the order in which the second recognition system returns the results.
In one possible embodiment, the second recognition system may use a plurality of processes to recognize the objects in the plurality of sub-images in parallel, which can significantly improve recognition efficiency compared with recognizing them serially. The processes may be located on the same device or on different devices; for example, the sub-images of the target image may be divided into several groups whose objects are recognized on different devices, which further reduces the risk of sensitive information leakage while improving recognition efficiency.
In one possible embodiment, the target image may include one image or a plurality of images. When the target image includes a plurality of images, taking as an example a target image that includes a first image and a second image, the plurality of sub-images corresponding to the target image may include at least one sub-image of the first image and at least one sub-image of the second image. The first image and the second image may both contain sensitive information, only one of them may contain sensitive information, or neither may. When both contain sensitive information, the plurality of sub-images received by the second recognition system is likewise a mixture of the sub-images of several images, which increases the combinatorial complexity among the sub-images and the difficulty of recombining them into the sensitive information, further reducing the risk of leakage.
In a possible implementation manner, the first recognition system may be deployed in an edge data center and the second recognition system in a cloud data center. A high-precision object recognition result can then be obtained from the cloud data center's high data processing performance while the risk of sensitive information leakage is reduced, and the user is not required to deploy a costly high-performance server in the edge data center.
In one possible embodiment, the plurality of objects included in the target image may include a plurality of characters, such as Chinese, English, numbers, and symbols. Of course, the objects may also be of other types, such as trademarks or components.
In a third aspect, the present application provides an object recognition apparatus, which may be applied to a first recognition system. The apparatus comprises: an acquisition module for acquiring a target image that includes a plurality of objects; a determining module for determining a plurality of sub-images corresponding to the target image, each sub-image including at least one object; and a transmission module for sending the plurality of sub-images to the second recognition system so that the second recognition system recognizes the objects in the sub-images.
In a possible implementation manner, the determining module is specifically configured to acquire pixel positions of the multiple objects on the target image, and crop the target image according to the pixel positions of the objects on the target image to obtain multiple sub-images corresponding to the target image.
In a possible implementation manner, the determining module is specifically configured to perform transformation processing on the target image to obtain a transformed image; the transmission module is further configured to send the transformed image to the second recognition system and receive the pixel positions, returned by the second recognition system, of a plurality of transformed objects on the transformed image; the determining module is specifically configured to determine the pixel positions of the plurality of objects on the target image according to the pixel positions of the plurality of transformed objects.
In a possible implementation manner, the transmission module is further configured to receive a first recognition result for the plurality of sub-images returned by the second recognition system; the determining module is further configured to determine a second recognition result for the target image according to the first recognition result and a position relationship of the object in the plurality of sub-images on the target image.
In a possible implementation manner, the transmission module is specifically configured to send the plurality of sub-images to the second recognition system based on a preset order, and to receive the first recognition result for the plurality of sub-images returned by the second recognition system based on the preset order.
In a possible implementation manner, the target image includes at least a first image and a second image, and the plurality of sub-images corresponding to the target image includes at least a sub-image corresponding to the first image and a sub-image corresponding to the second image.
In one possible embodiment, the first identification system is deployed in an edge data center and the second identification system is deployed in a cloud data center.
In one possible embodiment, the target image includes a plurality of objects including a plurality of characters.
In a possible implementation manner, the transmission module is further configured to receive the target image and the sensitive indication information of the target image uploaded by the user; the determining module is further configured to determine, according to the sensitive indication information, that the target image is an image containing sensitive information.
In a fourth aspect, the present application provides another object recognition apparatus, which may be applied to a second recognition system and may include: a transmission module for receiving, through remote communication, a plurality of sub-images corresponding to a target image sent by the first recognition system, where the target image includes a plurality of objects and each sub-image includes at least one object; and an identification module for performing object recognition on the plurality of sub-images to obtain a first recognition result for them, the first recognition result including a recognition result of the object in each sub-image.
In a possible embodiment, the transmission module is further configured to send the first recognition result for the plurality of sub-images to the first recognition system through remote communication.
In a possible implementation, the transmission module is further configured to receive a transformed image from the first recognition system, where the transformed image is an image obtained by transforming the target image; the apparatus further comprises a detection module configured to detect a plurality of transformed objects in the transformed image and obtain their pixel positions in the transformed image; the transmission module is further configured to return those pixel positions to the first recognition system.
In a possible implementation manner, the transmission module is specifically configured to sequentially return, to the first recognition system, recognition results of objects in each sub-image according to a receiving order of each sub-image of the target image.
In a possible embodiment, the identification module is specifically configured to recognize the objects in the plurality of sub-images in parallel using a plurality of processes.
In a possible implementation manner, the target image includes at least a first image and a second image, and the plurality of sub-images corresponding to the target image includes at least a sub-image corresponding to the first image and a sub-image corresponding to the second image.
In one possible embodiment, the first identification system is deployed in an edge data center and the second identification system is deployed in a cloud data center.
In one possible embodiment, the target image includes a plurality of objects including a plurality of characters.
In a fifth aspect, the present application provides a computing device comprising a processor, a memory, and a display. The processor and the memory are in communication with each other. The processor is configured to execute instructions stored in the memory to cause the computing device to perform the object recognition method as in the first aspect or any implementation manner of the first aspect.
In a sixth aspect, the present application provides a computing device comprising a processor, a memory, and a display. The processor and the memory are in communication with each other. The processor is configured to execute instructions stored in the memory to cause the computing device to perform the object recognition method as in the second aspect or any implementation manner of the second aspect.
In a seventh aspect, the present application provides a computer-readable storage medium, having stored therein instructions, which, when run on a computing device, cause the computing device to execute the object recognition method according to the first aspect or any implementation manner of the first aspect.
In an eighth aspect, the present application provides a computer-readable storage medium having stored therein instructions, which, when run on a computing device, cause the computing device to perform the object recognition method of the second aspect or any implementation of the second aspect.
In a ninth aspect, the present application provides a computer program product comprising instructions which, when run on a computing device, cause the computing device to perform the object recognition method of the first aspect or any of the implementations of the first aspect.
In a tenth aspect, the present application provides a computer program product comprising instructions which, when run on a computing device, cause the computing device to perform the object recognition method of the second aspect or any implementation of the second aspect.
On the basis of the implementations provided by the above aspects, the present application can further combine them to provide more implementations.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can derive other drawings from them.
FIG. 1 is an architectural diagram of an exemplary application scenario of the present application;
FIG. 2 is a schematic flowchart of an object recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of coordinates representing pixel positions in an embodiment of the present application;
FIG. 4 is a schematic diagram of determining a second recognition result in an embodiment of the present application;
FIG. 5 is an interaction diagram of a user, an edge data center, and a cloud data center in an embodiment of the present application;
FIG. 6 is another interaction diagram of a user, an edge data center, and a cloud data center in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an object recognition apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of another object recognition apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computing device according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of another computing device according to an embodiment of the present application.
Detailed Description
The scheme in the embodiments provided in the present application will be described below with reference to the drawings in the present application.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished.
Object recognition technology generally refers to technology that uses a computer or similar device to process, analyze, and understand an image in order to recognize the various objects in it. An object in an image may be, for example, a character or a physical object. Taking the recognition of text in an image as an example, the text in an image provided by a user (through a user terminal) can be recognized in the application scenario shown in FIG. 1. Of course, the present application is not limited to the scenario of FIG. 1 and may also be applied to other possible application scenarios.
As shown in FIG. 1, a user (for example through a device such as a user terminal) may provide an image containing text to be recognized to an edge data center. The edge data center may be a collection of devices, such as servers, deployed on the edge side, and may provide corresponding services for the user there, such as a text recognition service; it represents a physical environment close to the user's devices. Correspondingly, the cloud data center may be a collection of devices such as servers deployed on the cloud side; it is usually farther from the user's devices than the edge data center and is dedicated to providing cloud services for users in many different regions. The edge data center and the cloud data center may also cooperate to jointly provide business services for the user.
Since the edge data center's capability for recognizing characters in images generally struggles to meet the user's requirements on recognition accuracy, speed, and so on, the edge data center can forward the image provided by the user to the cloud data center through remote communication; the cloud data center recognizes the characters in the image with a corresponding deep learning algorithm and then transmits the recognition result back to the user through the edge data center. For example, in a text translation scenario, a user may send an image containing a piece of English text to the cloud data center through the edge data center; the cloud data center recognizes the English content in the image and transmits it to the user through the edge data center, and the user translates the received English content into Chinese with a corresponding translation tool.
However, when the image provided by the user includes sensitive information, such as private information about the user's work experience or home address, that information may be leaked at the cloud data center, for example if the result recognized by the cloud data center is stolen on the cloud side. Sensitive information here refers to information the user does not want disclosed, for example information that, if improperly used or accessed or modified without authorization, would harm the privacy rights a person is legally entitled to.
To reduce the risk of leaking sensitive information in a target image, some users choose to purchase and deploy high-performance servers in the edge data center so that it has high data processing capability. Objects in the image can then be identified in the edge data center with accuracy and efficiency that generally meet the user's requirements, while avoiding the leakage risk of object recognition in a cloud data center. However, purchasing high-performance servers dramatically increases the user's cost, whereas identifying objects with an edge data center of low data processing capability is slow and inaccurate and generally fails to meet the user's requirements.
The object identification method provided by the embodiments of the present application can therefore reduce the risk of leaking sensitive information when identifying objects in an image. Specifically, the edge data center may obtain a target image including a plurality of objects and determine a plurality of corresponding sub-images, for example by cropping the target image into sub-images, each of which may include at least one object. The edge data center may then send the sub-images to the cloud data center through remote communication; the cloud data center identifies the object in each sub-image, obtains the recognition result corresponding to each sub-image, and sends those results to the edge data center, which derives the recognition result for the target image from them.
Because the edge data center sends the cloud data center the plurality of sub-images corresponding to the target image rather than the whole target image, even if the target image contains sensitive information, the information the cloud data center can identify from the sub-images is usually only the individual parts of that information, without the relationship that combines them. If those parts leak from the cloud data center, the many possible ways of combining them make it difficult to determine the actual sensitive information in the target image; in effect, the sensitive information has not leaked. The risk of leaking the sensitive information in the target image during object identification is thereby reduced.
Meanwhile, the device that identifies the objects in the sub-images is the cloud data center, which has higher data processing capability, rather than the edge data center, so recognition accuracy and efficiency are higher, the user does not need to deploy a high-performance server on the edge side, and the user's hardware cost is reduced.
Various non-limiting embodiments of object recognition are described in detail below.
FIG. 2 is a schematic flowchart of an object identification method according to an embodiment of the present application. The method may be applied to the application scenario shown in FIG. 1, and of course to other application scenarios as well, which this embodiment does not limit. For example, the first recognition system in the embodiment shown in FIG. 2 may be deployed in the edge data center of FIG. 1 (or across the edge data center and the user terminal); it may be a server system of the edge data center, or a software system deployed, in the form of software, on devices in the edge data center. The second recognition system in the embodiment shown in FIG. 2 may be deployed in the cloud data center of FIG. 1; it may be a server system of the cloud data center, or a software system deployed, in the form of software, on devices in the cloud data center. The object identification method shown in FIG. 2 may specifically include:
S201: A first recognition system acquires a target image, the target image including a plurality of objects.
Object recognition in this embodiment may refer to the process of recognizing the objects included in the target image based on object recognition technology. The objects to be recognized may be characters in the target image, such as Chinese, English, numbers, symbols, or formulas; correspondingly, when the target image includes a plurality of objects, it may include multiple pieces, segments, or types of character content. Optionally, the objects to be recognized may also be physical objects in the target image, such as trademarks or components, or objects of other types.
In some scenarios, the information characterized by the combination of multiple objects in the target image may be sensitive. For example, the target image may include a person's name, home address, mobile phone number, and educational background, all of which belong to private information and are also sensitive information in this embodiment.
In this embodiment, it may be difficult to complete object recognition of a target image containing sensitive information within the first recognition system. For example, the load of the first recognition system may be high, making it difficult to provide the user with an object recognition service for the target image; or the data processing performance of the first recognition system may be lower than that of the second recognition system, so performing object recognition on the first recognition system would yield low accuracy and efficiency. Object recognition may therefore be performed on the target image by the second recognition system. However, since the target image may contain sensitive information, if the first recognition system sent the target image directly to the second recognition system, the sensitive information the second recognition system recognizes from it, such as "Zhang San" together with his home address, mobile phone number, and educational background, might leak at the second recognition system, threatening Zhang San's privacy.
For this reason, after the first recognition system acquires the target image, step S202 and the subsequent steps may be performed to complete object recognition while reducing the risk of leaking sensitive information. When the target image does not contain sensitive information, the technical solution of this embodiment may also be adopted for object recognition.
As an example, the first recognition system may determine whether the target image contains sensitive information based on sensitive indication information. Specifically, the user may upload a target image to the first recognition system together with the corresponding sensitive indication information, which may be, for example, a sensitive label added to the target image and may indicate whether the target image includes sensitive information. The first recognition system can then determine from the received indication that the target image contains sensitive information and apply the subsequent processing to it.
Optionally, the first recognition system may provide the user with two kinds of object recognition service, object recognition service 1 and object recognition service 2, where service 1 recognizes objects in images containing sensitive information and service 2 recognizes objects in images that do not. The user can then choose which service to use on the first recognition system according to whether the image contains sensitive information. When the user selects service 1, the first recognition system completes the recognition with the object recognition process described in this embodiment; when the user selects service 2, the first recognition system may transmit the image directly to the second recognition system for object recognition.
S202: the first recognition system may determine a plurality of sub-images corresponding to the target image, wherein each sub-image may include at least one object.
After the first recognition system obtains the target image, it can further determine the plurality of corresponding sub-images. Each sub-image may be a part of the target image and may include at least one object; correspondingly, when the target image includes sensitive information composed of a plurality of objects, the components of that information may be located in different sub-images.
In a possible embodiment, the first recognition system may first acquire the pixel positions of the plurality of objects on the target image and then crop the target image according to each object's pixel position to obtain the plurality of sub-images. The pixel position of an object on the target image may be, for example, the coordinate of the object's central pixel point; the first recognition system can then determine that pixel points within a certain range of the central pixel point all belong to the object, obtaining the pixel region of the object's image on the target image. Alternatively, the pixel position may be the vertex positions of the object's image on the target image: as shown in FIG. 3, each row may represent the position of a rectangular box around the object's image, giving in turn the abscissa (e.g., 32, 203 in FIG. 3) and ordinate (e.g., 162, 124 in FIG. 3) of the box's upper-left corner, followed by the abscissa (e.g., 165, 379 in FIG. 3) and ordinate (e.g., 182, 168 in FIG. 3) of its lower-right corner. Of course, the pixel position may also be expressed in other forms, which this embodiment does not limit. The first recognition system then crops the target image according to each object's pixel position to obtain the plurality of sub-images corresponding to the target image.
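Following the rectangular-box convention of FIG. 3 (upper-left and lower-right corners), the cropping step might look like this sketch using the Pillow library; the example box values echo FIG. 3 but are otherwise illustrative:

```python
from typing import List, Tuple
from PIL import Image

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2): corners as in FIG. 3

def crop_sub_images(target: Image.Image, boxes: List[Box]) -> List[Image.Image]:
    """Cut one sub-image per detected object out of the target image."""
    return [target.crop(box) for box in boxes]

# Using boxes in the style of FIG. 3:
# sub_images = crop_sub_images(Image.open("target.png"),
#                              [(32, 162, 165, 182), (203, 124, 379, 168)])
```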
As one example of determining the pixel positions, the first recognition system may detect the pixel position of each object on the target image with an object detection algorithm configured on it. In some scenarios, however, the accuracy of the positions detected by the first recognition system may be low; for example, when the first recognition system is deployed in the edge data center, the limited data processing performance of the edge data center makes it difficult to detect the objects' pixel positions accurately and quickly, so detection may be both imprecise and slow.
Based on this, in another example of determining the pixel positions, the positions corresponding to the objects may be detected by the second recognition system, whose data processing performance is higher, to improve detection accuracy. For example, the first recognition system may send the target image to the second recognition system; the second recognition system, with its high data processing performance, can run a high-precision object detection algorithm, quickly detect the pixel positions of the objects on the target image with high accuracy, and feed those positions back to the first recognition system, which thereby obtains high-precision pixel positions.
However, when the target image includes sensitive information, sending it to the second recognition system may leak that information there; for example, if an illegal user steals the target image on the second recognition system, the sensitive information on it becomes known to that user. Sensitive information may also leak while the first recognition system is sending the target image to the second recognition system (for example, on the communication link).
Thus, in yet another example of determining pixel positions, the first recognition system may first transform the target image to obtain a transformed image. Specifically, the first recognition system may detect the objects on the target image using a preset object detection algorithm or object detection model and transform the detected objects, applying any one or more of operations such as replacement, encryption, and masking. An object so processed may be called a transformed object, and the target image so processed a transformed image. Because the objects are transformed, sensitive information composed of several objects becomes non-sensitive, desensitizing the image: even if an illegal user steals the transformed image, it no longer contains the sensitive information, so the original sensitive information on the target image cannot become known to that user.
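One way such a replacement/masking transformation might be sketched with Pillow is shown below; the white fill and stand-in characters are assumptions, and any content-changing, layout-preserving edit would serve:

```python
from PIL import Image, ImageDraw

def transform_image(target: Image.Image, boxes) -> Image.Image:
    """Return a transformed image: each detected object region is masked and
    overwritten with stand-in characters, so the regions stay detectable but
    no longer carry the original (possibly sensitive) content."""
    transformed = target.copy()
    draw = ImageDraw.Draw(transformed)
    for (x1, y1, x2, y2) in boxes:
        draw.rectangle((x1, y1, x2, y2), fill="white")       # mask the content
        draw.text((x1 + 2, y1 + 2), "x" * 6, fill="black")   # replacement text
    return transformed
```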
The first recognition system may then send the transformed image to the second recognition system. The second recognition system may perform object detection on the received transformed image with a preset deep learning algorithm, determine the pixel position of each transformed object on the transformed image, and return those positions to the first recognition system. Note that after the transformation, an object on the target image and the corresponding transformed object generally differ in content, but the transformed object's pixel position on the transformed image may be the same as the object's pixel position on the target image, or the two positions may have a definite correspondence. In this way, the pixel positions of the objects on the target image can be determined from the pixel positions of the transformed objects on the transformed image.
After the first recognition system determines the pixel positions of the objects on the target image and crops the target image based on those positions to obtain the plurality of sub-images, step S203 below may be performed to send the sub-images to the second recognition system for object recognition there.
Note that when the data processing performance of the second recognition system is higher than that of the first, the objects the second recognition system detects with its high-precision deep learning algorithm can be more comprehensive. That is, in addition to the pixel positions of the transformed objects, the second recognition system may detect the pixel positions of objects missed on the transformed image, i.e., objects the first recognition system could not detect given the limited precision of its own detection algorithm. The first recognition system can thus correct the set of objects included in the target image based on the positions returned by the second recognition system and obtain relatively high-precision pixel positions for each object.
S203: the first recognition system sends a plurality of sub-images corresponding to the target image to the second recognition system through remote communication.
In this embodiment, the first recognition system may communicate remotely with the second recognition system, for example based on the Hypertext Transfer Protocol (HTTP), and send the plurality of sub-images corresponding to the target image in a preset order, as shown in FIG. 2. Because the first recognition system sends the second recognition system the sub-images of the target image rather than the whole image, even if the target image contains sensitive information and the transmission of the sub-images is intercepted by an illegal user, that user cannot learn the combination relationship among the sub-images and therefore can hardly combine the information on them into the sensitive information included in the target image. The risk of leaking the sensitive information during transmission to the second recognition system is thereby reduced.
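A minimal sketch of this transmission using HTTP via the `requests` library follows; the endpoint URL, payload layout, and response field are assumptions rather than the application's defined protocol:

```python
import io
from typing import List

import requests
from PIL import Image

def send_sub_images(sub_images: List[Image.Image], url: str) -> List[str]:
    """POST each sub-image to the second recognition system in the preset
    order and collect the recognition results in that same order."""
    results = []
    for idx, crop in enumerate(sub_images):
        buf = io.BytesIO()
        crop.save(buf, format="PNG")
        resp = requests.post(url, files={"image": (f"{idx}.png", buf.getvalue())})
        resp.raise_for_status()
        results.append(resp.json()["text"])  # assumed response field
    return results
```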
It should be noted that the target image in this embodiment may be one image, and in some scenarios may also include a plurality of images. Taking the example where the target image includes at least a first image and a second image (it may also include three or more images), the plurality of sub-images corresponding to the target image may include at least one sub-image of the first image and at least one sub-image of the second image. Both images may contain sensitive information, or one may contain it while the other does not. Mixing the sub-images of the first image and the second image when sending them to the second recognition system further increases the difficulty of recombining sub-images of different images into each image's sensitive information, reducing the risk of leakage. Of course, neither image need contain sensitive information, which this embodiment does not limit.
S204: the second recognition system recognizes the object in each sub-image and obtains a first recognition result for the plurality of sub-images, where the first recognition result includes the recognition results of the plurality of objects.
After receiving the plurality of sub-images, the second recognition system may recognize the object in each sub-image by using a high-precision object recognition algorithm or object recognition model. For example, when the objects are text, the second recognition system may perform text recognition with a high-precision Long Short-Term Memory (LSTM) algorithm, thereby obtaining a recognition result for the object in each sub-image and, from these, the first recognition result for the plurality of sub-images, which is the set of the recognition results of the objects in the individual sub-images. In addition, because the second recognition system can support a high-precision object recognition algorithm or model (for example, the second recognition system may be deployed in a cloud data center with high data processing performance), the first recognition result obtained by performing object recognition on the sub-images can generally achieve high precision.
For example, the second recognition system may use a plurality of processes to recognize the objects in the sub-images in parallel, e.g., process 1 recognizes the object in sub-image 1 while process 2 recognizes the object in sub-image 2, which can effectively improve recognition efficiency compared with recognizing the objects in the sub-images serially.
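A minimal sketch of this process-level parallelism follows, assuming a hypothetical recognize_object() function that wraps the high-precision recognition model:

```python
# Recognize the objects of several sub-images in parallel worker processes.
from multiprocessing import Pool

def recognize_object(sub_image):
    # Placeholder for the actual model inference on one sub-image.
    return f"result for {len(sub_image)} bytes"

def recognize_all(sub_images, workers=4):
    # Pool.map preserves the input order, so result i belongs to sub_images[i].
    with Pool(processes=workers) as pool:
        return pool.map(recognize_object, sub_images)
```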
As shown in fig. 5 and fig. 6, the second recognition system may be deployed in one device, or may be deployed in a plurality of devices (e.g., device 1 and device 2 shown in fig. 6). When the second recognition system is deployed in one device, that device may both detect the pixel positions of the transformation objects in the transformed image and recognize the objects in the sub-images of the target image; when the second recognition system is deployed in a plurality of devices, the detection of the pixel positions of the transformation objects in the transformed image may be completed by device 1, and the recognition of the objects may be completed by device 2, which is not limited in this embodiment.
S205: the second recognition system returns the first recognition result to the first recognition system through remote communication.
As shown in fig. 2, the second recognition system may return the first recognition result obtained by recognition to the first recognition system.
In a further possible embodiment, in order for the first recognition system to determine which recognition result corresponds to each sub-image of the target image, the first recognition system may send the plurality of sub-images to the second recognition system in a preset order, and the second recognition system may record the order in which it receives the sub-images and then return the recognition results in that same order. The first recognition system can thereby determine that the first received recognition result corresponds to the first sent sub-image, the second received recognition result corresponds to the second sent sub-image, and so on. Of course, the two systems may negotiate other order-correspondence rules; for example, the recognition result received first by the first recognition system may correspond to the sub-image it sent last, the recognition result received second may correspond to the sub-image it sent second to last, and the like.
In addition to determining the correspondence according to the sending and receiving order of the sub-images and the recognition results, in other possible embodiments the first recognition system may assign an image identifier to each sub-image and send the image identifier together with the sub-image to the second recognition system. After determining the recognition result of the object in a sub-image, the second recognition system can establish a correspondence between that image identifier and the recognition result, obtain the correspondence between the image identifier of each sub-image and its recognition result, and return this correspondence together with the first recognition result to the first recognition system. The first recognition system then determines which recognition result each sub-image corresponds to according to the correspondence, and no requirement needs to be placed on the order in which the second recognition system sends the recognition results.
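A minimal sketch of the identifier scheme follows, assuming the second recognition system simply echoes each identifier back with its recognition result; the use of UUIDs is an illustrative choice, not a requirement of this embodiment:

```python
# Tag every sub-image with a random identifier before sending, then join the
# returned {image_id: result} mapping back to the local {image_id: sub_image} table.
import uuid

def tag_sub_images(sub_images):
    return {str(uuid.uuid4()): img for img in sub_images}

def join_results(tagged, results):
    # No ordering agreement is needed: the identifier alone links each
    # recognition result to its sub-image.
    return {image_id: (tagged[image_id], results[image_id]) for image_id in tagged}
```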
S206: the first recognition system determines a second recognition result for the target image according to the received first recognition result and the positional relationship, on the target image, of the objects in the plurality of sub-images.
After receiving the first recognition result, the first recognition system may acquire the positional relationship, on the target image, of the objects in the plurality of sub-images. This relationship reflects the combination relationship between different objects, and the first recognition system may record it locally when cropping the target image. The first recognition system may then combine the recognition results in the first recognition result according to the positional relationship to obtain a second recognition result for the target image, as shown in fig. 2, where the second recognition result reflects the information included in the target image. For example, assuming the first recognition result received by the first recognition system includes "Zhang San", "Li Si", "master", "doctor", "street A", "street C", "cell B", "cell D", "123 XXXX 8901", and "123 XXXX 1098", the first recognition system may determine, based on the positional relationship of the objects on the target image shown on the left side of fig. 4, that the second recognition result is "Zhang San, educational background master, home address street A cell B, telephone number 123 XXXX 8901" and "Li Si, educational background doctor, home address street C cell D, telephone number 123 XXXX 1098", as shown on the right side of fig. 4.
As an example, the first recognition system may use a preset Software Development Kit (SDK) to integrate the positions of the objects of the respective sub-images in the target image with the corresponding recognition results, and thereby extract structured data from the target image, such as the structured data shown on the right side of fig. 4.
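As an illustration of this combination step, the sketch below groups recognition results into text lines by their recorded pixel positions. The (x, y, text) tuple format, the row tolerance, and the grouping rule are assumptions of the sketch, not behavior prescribed by the SDK mentioned above.

```python
# Reassemble sub-image results into lines, top-to-bottom then left-to-right.
def combine_results(items, row_tolerance=20):
    """items: list of (x, y, text) where (x, y) is the sub-image's top-left corner."""
    rows = []
    for x, y, text in sorted(items, key=lambda item: item[1]):
        if rows and abs(rows[-1][0] - y) <= row_tolerance:
            rows[-1][1].append((x, text))   # same row: vertically close enough
        else:
            rows.append((y, [(x, text)]))   # start a new row
    return [" ".join(t for _, t in sorted(cells)) for _, cells in rows]
```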
Since the second recognition result for the target image is obtained on the first recognition system, even if the target image contains sensitive information, that information remains on the first recognition system and is difficult to leak from the second recognition system. For example, when the first recognition system is deployed in an edge data center on the client edge side and the second recognition system is deployed in a cloud data center, the complete sensitive information may exist only on the client edge side and never on the cloud side, so the risk of the sensitive information being leaked is relatively low. Further, the client edge side may return the extracted information (e.g., structured data) to the user, as shown in fig. 3 and fig. 4. The user only perceives interaction with the client edge side and does not need to interact with the cloud side; for the user, the recognition of the objects in the target image is effectively completed on the client edge side, with higher recognition accuracy and a lower risk of the recognized sensitive information being leaked.
In addition, the first recognition system may use a plurality of its processes to perform object recognition on multiple images in parallel. For example, the first recognition system may use process 1 to process a target image (including the transformation processing, cropping, and structured-data extraction described above) while process 2 processes another image in a similar manner, thereby improving the efficiency of recognizing objects in multiple images on the first recognition system.
For ease of understanding, the technical solutions of the embodiments of the present application are described below with reference to a scenario in which the objects are specifically text. In this scenario, the first recognition system may be deployed in an edge data center and the second recognition system in a cloud data center. Fig. 5 shows a schematic flowchart of an object recognition method combined with this specific scenario, and the method specifically includes:
S501: a user uploads a target image to the edge data center, where the target image includes multiple segments of text.
The information obtained by combining the multiple text segments in the target image may constitute the sensitive information described in the foregoing embodiments. For example, when the text segments include "Zhang San" (a name), "XX city XX street XX cell" (a home address), "135XXXXXXXX" (a mobile phone number), and "master" (an educational background), combining "Zhang San" with "XX city XX street XX cell" reveals the specific home address of the person named Zhang San; combining "Zhang San" with "135XXXXXXXX" reveals Zhang San's mobile phone number; and combining "Zhang San" with "master" reveals Zhang San's educational background. All of this is personal privacy information of "Zhang San" and belongs to the sensitive information in this embodiment. Of course, the objects in the target image may be divided at a finer granularity; for example, the home address may be divided into several objects, so that the objects in the target image include "Zhang San", "XX city", "XX street", "XX cell", "135XXXXXXXX" (the mobile phone number may likewise be divided into several objects), "master", and the like. The way the objects in the target image are divided is not limited in this embodiment.
For example, the user may upload the target image to the edge data center through a user terminal or a client. Furthermore, when uploading the target image, the user may also upload sensitivity indication information corresponding to it, so that the edge data center can determine from this indication that the target image contains sensitive information.
S502: the edge data center replaces the multiple text segments on the received target image to obtain a transformed image and the pixel coordinates of the replaced text on the transformed image.
In a specific implementation, the edge data center may detect the multiple text segments on the target image by using a preset text detection algorithm, such as the MSER (Maximally Stable Extremal Regions) algorithm. Of course, the edge data center may also use other algorithms for text detection, which is not limited in this embodiment. The edge data center may then replace the detected text, that is, replace content on the target image with other content, such as changing the mobile phone number shown in fig. 4 from "123 XXXX 8901" to "00000000000". The new image obtained after replacing the text on the target image is the transformed image of step S502, and the pixel positions of the replaced text on the transformed image remain consistent with the pixel positions of the original text on the target image. In this embodiment, the pixel position of the replaced text on the transformed image may be expressed as its pixel coordinates on the transformed image.
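A minimal sketch of such MSER-based detection using OpenCV follows; the grayscale input and the conversion of each region to an axis-aligned bounding box are assumptions of the sketch, not the patent's exact configuration:

```python
# Detect candidate text regions on the target image with OpenCV's MSER.
import cv2

def detect_text_regions(image_path):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    mser = cv2.MSER_create()
    regions, _ = mser.detectRegions(gray)
    # Each stable region becomes an (x, y, w, h) bounding box.
    return [cv2.boundingRect(points) for points in regions]
```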
Note that when the edge data center performs the transformation processing on the target image, it may replace the text, or it may encrypt or mask the text, among other options. Encryption means encrypting content on the target image with a corresponding encryption algorithm; for example, the mobile phone number shown in fig. 4 may be encrypted from "123 XXXX 8901" to "234 XXXX 90123". Masking means hiding part of the information of an object on the target image while retaining the rest and keeping the length of the information unchanged; for example, masking the mobile phone number "123 XXXX 8901" in fig. 4 may yield "123******01". Of course, the transformation processing may take other forms, such as rearrangement (scrambling the information according to a certain order), which is not limited in this embodiment.
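The replacement and masking styles named above can be sketched as simple string transforms; the all-zeros replacement and the 3-head/2-tail mask shape are illustrative assumptions:

```python
# Replacement: substitute the whole value; masking: hide the middle while
# keeping the head, tail, and overall length.
def replace_text(value: str) -> str:
    return "0" * len(value)                      # e.g. "12345678901" -> "00000000000"

def mask_text(value: str, keep_head: int = 3, keep_tail: int = 2) -> str:
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + "*" * hidden + value[-keep_tail:]   # length preserved
```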
Further, the edge data center may also replace the text on the target image randomly through a random algorithm. For example, a random probability may be computed for each text segment; when the probability for a segment exceeds a preset value, that segment is replaced, and otherwise it is left unchanged, which increases the difficulty of restoring the desensitized image. Of course, the text on the target image may instead be replaced according to a fixed rule, which is not limited in this embodiment. It should be noted that the "replaced text" described in this embodiment may include both the text on the transformed image that was actually replaced and the text that was not (for example, segments whose random probability is below the preset value).
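A minimal sketch of this probabilistic variant follows, where the threshold value and the all-zeros replacement are assumptions:

```python
# Replace each detected text segment only when its drawn probability exceeds
# a preset value, so an attacker cannot tell which segments were altered.
import random

def randomly_replace(segments, threshold=0.5):
    out = []
    for text in segments:
        if random.random() > threshold:
            out.append("0" * len(text))   # replaced
        else:
            out.append(text)              # left unchanged
    return out
```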
S503: the edge data center sends the transformed image to the cloud data center through remote communication.
When the target image contains sensitive information, directly sending it from the edge data center to the cloud data center carries a high risk of that information being leaked in transit or at the cloud data center. Therefore, in this embodiment the edge data center performs transformation processing on the target image, specifically by replacing the text on it, and sends the resulting transformed image to the cloud data center so that the cloud data center can determine the pixel position of each text segment on the transformed image.
S504: the cloud data center may preprocess the transformed image and perform text detection on the preprocessed transformed image with a high-precision deep learning algorithm to obtain the pixel positions of the multiple text segments on the preprocessed transformed image.
After receiving the transformed image, the cloud data center may preprocess it. Illustratively, the preprocessing may be cropping or perspective correction of the transformed image.
For example, in some scenes the transformed image may contain large blank areas (such as its background) that hold no transformation objects, and the cloud data center does not need to perform object detection there. After receiving the transformed image, the cloud data center may therefore crop it to remove the blank areas and obtain a cropped image. The cloud data center may then perform object detection on the cropped image with a preset deep learning algorithm and determine the pixel positions of the transformation objects on the cropped image; these positions may serve directly as the pixel positions on the transformed image, or the positions on the transformed image may be calculated from the positions on the cropped image. For text detection on the transformed image, the cloud data center may adopt high-precision algorithms such as PSENet or RetinaNet to determine the pixel position of each text segment. Of course, the algorithms used by the cloud data center are not limited to these examples, and other applicable high-precision algorithms may be adopted.
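A minimal sketch of the blank-area cropping follows, assuming a light background and a fixed binarization threshold; both are assumptions of the sketch:

```python
# Crop away blank borders before detection; return the crop plus its offset so
# that detected positions can later be mapped back to the full transformed image.
import cv2
import numpy as np

def crop_blank_borders(image, threshold=240):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    ys, xs = np.where(gray < threshold)          # pixels darker than the background
    if xs.size == 0:
        return image, (0, 0)                     # nothing but background: no crop
    x0, y0 = xs.min(), ys.min()
    return image[y0:ys.max() + 1, x0:xs.max() + 1], (int(x0), int(y0))
```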
For another example, in some other scenarios the replaced text on the transformed image is not imaged front-on (e.g., the image was not shot facing the object directly). In that case, before detecting the pixel positions of the transformation objects with the high-precision deep learning algorithm, the cloud data center may perform perspective correction on the transformed image, for example rotating it by a certain angle and detecting the text in the corrected front view. Correspondingly, the pixel positions of the replaced text detected by the deep learning algorithm are then the positions on the transformed image after perspective correction.
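A minimal sketch of the perspective correction with OpenCV follows, assuming the four corners of the document region have already been located by some earlier step:

```python
# Warp the transformed image so the document region becomes a front view.
import cv2
import numpy as np

def correct_perspective(image, corners, width=800, height=1000):
    """corners: four (x, y) points ordered top-left, top-right, bottom-right, bottom-left."""
    src = np.float32(corners)
    dst = np.float32([[0, 0], [width, 0], [width, height], [0, height]])
    matrix = cv2.getPerspectiveTransform(src, dst)
    # Return the matrix too: it can form part of the preprocessing information
    # that the cloud data center sends back in step S505.
    return cv2.warpPerspective(image, matrix, (width, height)), matrix
```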
S505: and the cloud data center returns the pixel position and the preprocessing information to the edge data center.
Because the cloud data center preprocesses the received transformed image, the pixel positions of the replaced text on the preprocessed transformed image may differ from those on the transformed image before preprocessing. The cloud data center therefore returns the preprocessing information along with the pixel positions, so that the edge data center can use it to convert the returned positions into pixel positions on the transformed image before preprocessing. Of course, in other possible embodiments the cloud data center may itself calculate the pixel positions of the replaced text on the un-preprocessed transformed image from the preprocessing information and return those positions to the edge data center; the embodiment shown in fig. 5 is merely an exemplary illustration and is not limiting.
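For the simplest case, where the only preprocessing was a border crop with a recorded offset (as in the cropping sketch above), the back-conversion reduces to shifting each box; this single-offset assumption is for illustration only:

```python
# Map (x, y, w, h) boxes from the preprocessed image back onto the transformed
# image before preprocessing, given the crop offset returned as preprocessing info.
def to_original_coords(boxes, offset):
    x0, y0 = offset
    return [(x + x0, y + y0, w, h) for (x, y, w, h) in boxes]
```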
S506: the edge data center corrects the pixel positions of the multiple segments of replaced text on the transformed image according to the received pixel positions and preprocessing information, and crops the target image according to the corrected pixel positions to obtain a plurality of sub-images corresponding to the target image.
After receiving the pixel positions returned by the cloud data center, the edge data center can further determine the pixel positions of the multiple segments of replaced text on the transformed image. For example, the edge data center may take the pixel position of a transformation object on the transformed image as the pixel position of the corresponding object on the target image; or it may calculate the pixel positions of the text segments on the transformed image from the received positions and the preprocessing information, and use the calculated positions to correct the pixel positions previously detected by its own text detection algorithm, the corrected positions then serving as the pixel positions of the text on the target image.
Once the pixel positions of the multiple segments of replaced text on the transformed image have been determined from the received positions and the preprocessing information, and because these positions remain consistent with the pixel positions of the original text on the target image, the edge data center can crop the target image according to them to obtain a plurality of sub-images, each containing at least one text segment of the target image.
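A minimal sketch of this cropping with Pillow follows, assuming each pixel position is an (x, y, w, h) box valid on the target image:

```python
# Cut the target image into sub-images, one per corrected text box.
from PIL import Image

def crop_sub_images(target_image_path, boxes):
    image = Image.open(target_image_path)
    return [image.crop((x, y, x + w, y + h)) for (x, y, w, h) in boxes]
```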
For example, when the target image includes the text "Zhang San", "XX city XX street XX cell", "135XXXXXXXX", and "master", cropping the target image can yield at least four sub-images: one containing "Zhang San", one containing "XX city XX street XX cell", one containing "135XXXXXXXX", and one containing "master". When the objects in the target image are divided more finely, at least six sub-images may be obtained, containing "Zhang San", "XX city", "XX street", "XX cell", "135XXXXXXXX", and "master" respectively. Of course, the cropped sub-images may also be of other numbers and contain text in other forms, which is not limited in this embodiment.
S507: and the edge data center sends the plurality of sub-images obtained by cutting to the cloud data center.
In some possible implementations, the edge data center may adjust the order in which the sub-images are sent to the cloud data center. Specifically, the edge data center crops the sub-images from the target image in a certain order determined by the pixel positions of the text, for example from top to bottom (or from left to right) on the target image. If the sub-images were sent in that cropping order or its reverse, an unauthorized user who intercepted them could infer the combination relationship between the sub-images from their order, increasing the risk of leaking the sensitive information on the target image. The edge data center may therefore adjust the sending order of the sub-images, making it difficult for an unauthorized user to determine the combination relationship from the adjusted order, and hence difficult to obtain the sensitive information in the target image, which reduces the risk of leakage to a certain extent. Of course, the edge data center may also send the sub-images to the second recognition system in the cropping order, which is not limited in this embodiment.
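A minimal sketch of the order scrambling follows, where keeping the permutation private on the edge side is the point of the exercise:

```python
# Shuffle the sending order and remember the permutation locally, so that only
# the edge data center can restore the combination relationship of the sub-images.
import random

def shuffled_send_order(sub_images):
    order = list(range(len(sub_images)))
    random.shuffle(order)
    return [sub_images[i] for i in order], order   # keep `order` on the edge side
```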
It should be noted that the target image in this embodiment may be one image or may include a plurality of images. When the target image includes a plurality of images, the sub-images corresponding to the target image are the union of the sub-images corresponding to those images. Mixing the sub-images of the multiple images before sending them to the cloud data center further increases the difficulty of recovering the sensitive information on any one image by combining sub-images of different images, and so reduces the risk of leakage.
S508: the cloud data center performs text recognition on each received sub-image with a high-precision deep learning algorithm to obtain a recognition result for the text in each sub-image.
For example, the cloud data center may use a high-precision LSTM algorithm to recognize the text in each sub-image. Because the cloud data center can have high data processing capability and can support high-precision deep learning algorithms, recognizing the text in the sub-images at the cloud data center generally yields recognition results with high accuracy and efficiency.
Meanwhile, although the target image may contain sensitive information, after it is cropped into a plurality of sub-images the information carried by each sub-image is usually only a fragment of that sensitive information. Even if the sub-images are intercepted in transit to the cloud data center or leak from it, the combination relationship between them is unknown, so it is difficult to reassemble the sensitive information in the target image from the fragments, which reduces the risk of the sensitive information being leaked during recognition.
S509: the cloud data center returns the recognition results for the text in each sub-image to the edge data center.
For example, the cloud data center may record the order in which it receives the sub-images and send the recognition results of the text in the sub-images back to the edge data center in that recorded order, so that the edge data center can determine from the order of the received results which sub-image each result corresponds to.
Or, in other examples, the cloud data center may return the correspondence between the recognition results and the sub-images along with the results, so that the edge data center determines which recognition result corresponds to each sub-image from the received correspondence; the cloud data center then need not send the results in any particular order.
Further, while returning the recognition results, the cloud data center may also return to the edge data center a confidence level for the recognition result of the text in each sub-image, which indicates how trustworthy that result is. In this way, when the edge data center determines that the confidence of a sub-image's recognition result is below a preset value, it may discard that result and have the text in that sub-image re-recognized; when the confidence is above the preset value, it may proceed to determine the text recognition result of the target image based on the recognition results of the sub-images. Of course, the edge data center may also derive the text recognition result for the target image from the sub-image results regardless of whether their confidence exceeds the preset value, which is not limited in this embodiment.
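A minimal sketch of this confidence check follows, where the result tuple format and the threshold are assumptions of the sketch:

```python
# Split recognition results into accepted ones and ones to re-recognize.
def split_by_confidence(results, threshold=0.9):
    """results: list of (sub_image_id, text, confidence)."""
    accepted, retry = [], []
    for item in results:
        (accepted if item[2] >= threshold else retry).append(item)
    return accepted, retry
```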
S510: and the edge data center extracts the structured data from the target image according to the received recognition result of the object in each sub-image.
For example, the edge data center may integrate the positions of the characters in the sub-images in the target image and the corresponding recognition results by using a preset SDK, and extract the structured data from the target image.
S511: and the edge data center returns the extracted structured data to the user.
In the embodiment shown in fig. 5, both the detection of the replaced text in the transformed image and the recognition of the text in the sub-images are performed by the same device in the cloud data center. In other possible implementations, the two processes may be performed by different devices in the cloud data center. As shown in fig. 6, the cloud data center may include a device 1 and a device 2, where device 1 performs the detection of the pixel positions of the replaced text in the transformed image and device 2 performs the recognition of the text in the sub-images. Of course, the cloud data center may also include three or more devices that all participate in the text detection and text recognition processes; for example, for the sub-images received by the cloud data center, multiple devices in the cloud data center may perform text recognition on different sub-images simultaneously.
The object recognition method provided by the embodiment of the present application is described above with reference to fig. 1 to 6, and the object recognition apparatus provided by the embodiment of the present application and the computing device for implementing the function of the object recognition apparatus are described next with reference to the accompanying drawings.
As shown in fig. 7, an object recognition apparatus 700 is further provided in the embodiment of the present application. The apparatus 700 can be applied to the aforementioned first recognition system and executes the object recognition method executed by the first recognition system. The embodiment of the present application does not limit the division of the functional modules in the apparatus 700; the following exemplarily provides one division of the functional modules:
an obtaining module 701, configured to obtain a target image, where the target image includes a plurality of objects;
a determining module 702, configured to determine a plurality of sub-images corresponding to the target image, each sub-image including at least one object;
a transmission module 703, configured to send the plurality of sub-images to the second recognition system, so that the second recognition system identifies the objects in the plurality of sub-images.
In a possible implementation manner, the determining module 702 is specifically configured to acquire pixel positions of the multiple objects on the target image, and crop the target image according to the pixel positions of the objects on the target image, so as to obtain multiple sub-images corresponding to the target image.
In a possible implementation manner, the determining module 702 is specifically configured to perform transformation processing on the target image to obtain a transformed image;
the transmission module 703 is further configured to send the transformed image to the second recognition system, and receive pixel positions of a plurality of transformation objects on the transformed image returned by the second recognition system;
the determining module 702 is specifically configured to determine pixel positions of the plurality of objects on the target image according to the pixel positions of the plurality of transformed objects.
In a possible implementation manner, the transmission module 703 is further configured to receive a first recognition result for the plurality of sub-images returned by the second recognition system;
the determining module 702 is further configured to determine a second recognition result for the target image according to the first recognition result and the positional relationship, on the target image, of the objects in the plurality of sub-images.
In a possible implementation manner, the transmission module 703 is specifically configured to send the plurality of sub-images to the second recognition system based on a preset order, and receive the first recognition result for the plurality of sub-images returned by the second recognition system based on the preset order.
In a possible implementation, the target image includes at least a first image and a second image, and the plurality of sub-images corresponding to the target image includes at least a sub-image corresponding to the first image and a sub-image corresponding to the second image.
In one possible embodiment, the first recognition system is deployed in an edge data center and the second recognition system is deployed in a cloud data center.
In one possible embodiment, the target image includes a plurality of objects including a plurality of characters.
In a possible implementation manner, the transmission module 703 is further configured to receive the target image and the sensitive indication information of the target image uploaded by the user;
the determining module 702 is further configured to determine that the target image is an image containing sensitive information.
The object recognition apparatus 700 according to the embodiment of the present application may correspond to performing the object recognition method described in the embodiment of the present application, and the above and other operations and/or functions of each module of the object recognition apparatus 700 are respectively for implementing corresponding processes of each method performed by the first recognition system in fig. 2, and are not described herein again for brevity.
In addition, as shown in fig. 8, an object recognition apparatus 800 is further provided in the embodiment of the present application. The apparatus 800 may be applied to the aforementioned second recognition system and executes the object recognition method executed by the second recognition system. The embodiment of the present application does not limit the division of the functional modules in the apparatus 800; the following exemplarily provides one division of the functional modules:
a transmission module 801, configured to receive, through remote communication, a plurality of sub-images corresponding to a target image sent by a first recognition system, where the target image includes a plurality of objects and each sub-image includes at least one object;
an identifying module 802, configured to perform object identification on the multiple sub-images to obtain a first identification result for the multiple sub-images, where the first identification result includes an identification result of an object in each sub-image.
In a possible implementation, the transmission module 801 is further configured to send the first recognition result for the plurality of sub-images to the first recognition system through remote communication.
In a possible implementation, the transmission module 801 is further configured to receive a transformed image from the first recognition system, where the transformed image is an image obtained by performing a transformation process on the target image;
the device further comprises: a detection module 803;
the detection module 803 is configured to detect a plurality of transformation objects in the transformed image and obtain the pixel positions of the plurality of transformation objects in the transformed image;
the transmission module 801 is further configured to return the pixel positions of the plurality of transformation objects in the transformed image to the first recognition system.
In a possible implementation manner, the transmission module 801 is specifically configured to sequentially return the recognition results of the objects in the sub-images to the first recognition system according to the receiving order of the sub-images of the target image.
In a possible implementation, the identifying module 802 is specifically configured to recognize the objects in the plurality of sub-images in parallel by using a plurality of processes.
In a possible implementation, the target image includes at least a first image and a second image, and the plurality of sub-images corresponding to the target image includes at least a sub-image corresponding to the first image and a sub-image corresponding to the second image.
In one possible embodiment, the first recognition system is deployed in an edge data center and the second recognition system is deployed in a cloud data center.
In one possible embodiment, the target image includes a plurality of objects including a plurality of characters.
The object recognition apparatus 800 according to the embodiment of the present application may correspondingly execute the object recognition method described in the embodiment of the present application, and the above and other operations and/or functions of each module of the object recognition apparatus 800 are respectively for implementing corresponding processes of each method executed by the second recognition system in fig. 2, and are not repeated herein for brevity.
The object recognition apparatus 700 and the object recognition apparatus 800 may be implemented by computing devices, respectively. Fig. 9 and 10 each provide a computing device.
As shown in fig. 9, the computing device 900 may be specifically configured to implement the functions of the object recognition apparatus 700 in the embodiment shown in fig. 7.
Computing device 900 includes a bus 901, a processor 902, and a memory 903. The processor 902 and the memory 903 communicate with each other via a bus 901.
The bus 901 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 9, but this does not indicate only one bus or one type of bus.
The processor 902 may be any one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Microprocessor (MP), a Digital Signal Processor (DSP), and the like.
The memory 903 may include a volatile memory, such as a Random Access Memory (RAM). The memory 903 may also include a non-volatile memory, such as a Read-Only Memory (ROM), a flash memory, a Hard Disk Drive (HDD), or a Solid State Drive (SSD).
The memory 903 stores executable program code, and the processor 902 executes this code to perform the object recognition method performed by the first recognition system.
As shown in fig. 10, the computing device 1000 may be specifically configured to implement the functions of the object recognition apparatus 800 in the embodiment shown in fig. 8.
Computing device 1000 includes a bus 1001, a processor 1002, and memory 1003. The processor 1002 and the memory 1003 communicate with each other via a bus 1001.
The bus 1001 may be a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
The processor 1002 may be any one or more of a CPU, GPU, MP or DSP.
The memory 1003 may include a volatile memory (volatile memory), such as RAM. The memory 1003 may also include a non-volatile memory (non-volatile memory) such as a ROM, a flash memory, an HDD, or an SSD.
The memory 1003 stores executable program code, and the processor 1002 executes this code to perform the object recognition method performed by the second recognition system.
The embodiment of the application also provides a computer-readable storage medium. The computer-readable storage medium may be any available medium accessible by a computing device, or a data storage device, such as a data center, that contains one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state drive), among others. The computer-readable storage medium includes instructions that direct a computing device to perform the object recognition method performed by the first recognition system described above.
The embodiment of the application also provides another computer-readable storage medium. The computer-readable storage medium may be any available medium accessible by a computing device, or a data storage device, such as a data center, that contains one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state drive), among others. The computer-readable storage medium includes instructions that direct a computing device to perform the object recognition method performed by the second recognition system described above.
The embodiment of the application also provides a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computing device, the processes or functions described in the embodiments of the application are produced in whole or in part.
The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website site, computer, or data center to another website site, computer, or data center by wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, wireless, microwave, etc.).
The computer program product may be a software installation package, which may be downloaded and executed on a computing device whenever any of the aforementioned object recognition methods needs to be used.
The description of the flow or structure corresponding to each of the above drawings has emphasis, and a part not described in detail in a certain flow or structure may refer to the related description of other flows or structures.

Claims (20)

1. An object recognition method, characterized in that the method comprises:
a first recognition system acquires a target image, wherein the target image comprises a plurality of objects;
the first recognition system determines a plurality of sub-images corresponding to the target image, wherein each sub-image comprises at least one object;
the first recognition system sends the plurality of sub-images to a second recognition system via remote communication to cause the second recognition system to recognize objects in the plurality of sub-images.
2. The method of claim 1, wherein the first recognition system determines a plurality of sub-images corresponding to the target image, comprising:
the first recognition system acquires pixel positions of the plurality of objects on the target image;
and the first recognition system cuts the target image according to the pixel position of each object on the target image to obtain a plurality of sub-images corresponding to the target image.
3. The method of claim 2, wherein the first recognition system obtaining pixel locations of the plurality of objects on the target image comprises:
the first recognition system carries out transformation processing on the target image to obtain a transformed image;
the first recognition system sending the transformed image to the second recognition system;
the first recognition system receives pixel positions of a plurality of transformation objects on the transformation image returned by the second recognition system;
the first recognition system determines pixel locations of the plurality of objects on the target image based on pixel locations of the plurality of transformed objects.
4. The method according to any one of claims 1 to 3, further comprising:
the first recognition system receives a first recognition result for the plurality of sub-images returned by the second recognition system;
and the first recognition system determines a second recognition result aiming at the target image according to the first recognition result and the position relation of the objects in the plurality of sub-images on the target image.
5. The method according to any one of claims 1 to 4,
the first recognition system sending the plurality of sub-images to the second recognition system, including:
the first recognition system sends the plurality of sub-images to the second recognition system based on a preset sequence;
the first recognition system receives a first recognition result for the plurality of sub-images returned by the second recognition system, and comprises:
the first recognition system receives a first recognition result for the plurality of sub-images returned by the second recognition system based on the preset sequence.
6. The method according to any one of claims 1 to 5, wherein the target image comprises at least a first image and a second image, and the plurality of sub-images corresponding to the target image comprises at least a sub-image corresponding to the first image and a sub-image corresponding to the second image.
7. The method of any one of claims 1 to 6, wherein the first recognition system is deployed in an edge data center and the second recognition system is deployed in a cloud data center.
8. The method of any one of claims 1 to 7, wherein the target image comprises a plurality of objects comprising a plurality of words.
9. The method of any of claims 1-8, wherein prior to the first recognition system acquiring the target image, the method further comprises:
the first recognition system receives the target image uploaded by a user and sensitive indication information of the target image;
the first recognition system determines the target image to be an image containing sensitive information.
10. An object recognition apparatus, applied to a first recognition system, the apparatus comprising:
an acquisition module for acquiring a target image, the target image comprising a plurality of objects;
a determining module for determining a plurality of sub-images corresponding to the target image, each sub-image comprising at least one object;
a transmission module, configured to send the plurality of sub-images to a second recognition system, so that the second recognition system identifies objects in the plurality of sub-images.
11. The apparatus according to claim 10, wherein the determining module is specifically configured to acquire pixel positions of the objects on the target image, and crop the target image according to the pixel positions of the objects on the target image to obtain a plurality of sub-images corresponding to the target image.
12. The apparatus of claim 11,
the determining module is specifically configured to perform transformation processing on the target image to obtain a transformed image;
the transmission module is further used for sending the transformed image to the second recognition system and receiving pixel positions of a plurality of transformation objects on the transformed image returned by the second recognition system;
the determining module is specifically configured to determine pixel positions of the plurality of objects on the target image according to the pixel positions of the plurality of transformed objects.
13. The apparatus according to any one of claims 10 to 12,
the transmission module is further used for receiving a first recognition result for the plurality of sub-images returned by the second recognition system;
the determining module is further configured to determine a second recognition result for the target image according to the first recognition result and a position relationship of the object in the plurality of sub-images on the target image.
14. The apparatus according to any one of claims 10 to 13,
the transmission module is specifically configured to send, by the first recognition system, the plurality of sub-images to the second recognition system based on a preset order, and receive a first recognition result for the plurality of sub-images, which is returned by the second recognition system based on the preset order.
15. The apparatus according to any one of claims 10 to 14, wherein the target image comprises at least a first image and a second image, and the plurality of sub-images corresponding to the target image comprises at least a sub-image corresponding to the first image and a sub-image corresponding to the second image.
16. The apparatus of any one of claims 10 to 15, wherein the first recognition system is deployed in an edge data center and the second recognition system is deployed in a cloud data center.
17. The apparatus according to any one of claims 10 to 16, wherein the target image comprises a plurality of objects comprising a plurality of words.
18. The apparatus according to any one of claims 10-17, wherein the transmission module is further configured to receive the target image and sensitive indication information of the target image uploaded by a user;
the determining module is further configured to determine that the target image is an image containing sensitive information.
19. A computing device comprising a processor, a memory;
the processor is to execute instructions stored in the memory to cause the computing device to perform the method of any of claims 1 to 9.
20. A computer-readable storage medium comprising instructions that, when executed on a computing device, cause the computing device to perform the method of any of claims 1 to 9.
CN202010588784.XA 2020-05-18 2020-06-24 Object identification method, device, equipment and medium Pending CN113688658A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/076701 WO2021232865A1 (en) 2020-05-18 2021-02-18 Object recognition method and device, apparatus, and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2020104203782 2020-05-18
CN202010420378 2020-05-18

Publications (1)

Publication Number Publication Date
CN113688658A true CN113688658A (en) 2021-11-23

Family

ID=78576040

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010588784.XA Pending CN113688658A (en) 2020-05-18 2020-06-24 Object identification method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN113688658A (en)
WO (1) WO2021232865A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114826734A (en) * 2022-04-25 2022-07-29 维沃移动通信有限公司 Character recognition method and device and electronic equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114553499B (en) * 2022-01-28 2024-02-13 中国银联股份有限公司 Image encryption and image processing method, device, equipment and medium
CN115563655B (en) * 2022-11-25 2023-03-21 承德石油高等专科学校 User dangerous behavior identification method and system for network security

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103235946A (en) * 2013-04-08 2013-08-07 上海合合信息科技发展有限公司 Divulgence-preventive processing method for artificially identifying information of business cards
CN107977933A (en) * 2016-10-25 2018-05-01 松下电器(美国)知识产权公司 Image processing method, image system of processing and the recording medium having program recorded thereon
CN108701234A (en) * 2018-03-05 2018-10-23 深圳前海达闼云端智能科技有限公司 Licence plate recognition method and cloud system
CN109493285A (en) * 2018-09-18 2019-03-19 阿里巴巴集团控股有限公司 Image processing method, device, server and storage medium based on crowdsourcing
CN109872284A (en) * 2019-01-18 2019-06-11 平安普惠企业管理有限公司 Image information desensitization method, device, computer equipment and storage medium
CN109981755A (en) * 2019-03-12 2019-07-05 深圳灵图慧视科技有限公司 Image-recognizing method, device and electronic equipment
CN110930410A (en) * 2019-10-28 2020-03-27 维沃移动通信有限公司 Image processing method, server and terminal equipment
CN111062389A (en) * 2019-12-10 2020-04-24 腾讯科技(深圳)有限公司 Character recognition method and device, computer readable medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10846806B2 (en) * 2018-02-17 2020-11-24 Constru Ltd System and method for annotation of construction site images
US10546389B2 (en) * 2018-04-06 2020-01-28 Elekta Ab (Publ) Devices and methods for identifying an object in an image
CN109859183B (en) * 2019-01-29 2021-06-04 江河瑞通(北京)技术有限公司 Edge calculation-based multi-element integrated water body intelligent identification method and ecological station

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhixun Technology Planning Department: "Company News | Zhixun Technology 'Intelligent Edge' Monitoring and Analysis Platform Goes Live", WeChat official account Zhixuntong (HTTPS://MP.WEIXIN.QQ.COM/S/M19LQJWMRBFE8PUKNN3GZG), pages 1-15 *
Ma Yunqi: "Research on Prevention and Control Technology for Sensitive Content Based on Network Platforms", China Master's Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
WO2021232865A1 (en) 2021-11-25

Similar Documents

Publication Publication Date Title
US20210233319A1 (en) Context-aware tagging for augmented reality environments
CN113688658A (en) Object identification method, device, equipment and medium
US10438086B2 (en) Image information recognition processing method and device, and computer storage medium
KR101773885B1 (en) A method and server for providing augmented reality objects using image authentication
US20170243097A1 (en) Method and apparatus for decoding or generating multi-layer color or code, method for recommending setting parameters in generation of multi-layer or code, and product comprising multi-layer color or code
KR101800890B1 (en) Location-based communication method and system
JP2016184412A (en) Method and system for automatic selection of one or more image processing algorithm
CN110136198B (en) Image processing method, apparatus, device and storage medium thereof
WO2021012382A1 (en) Method and apparatus for configuring chat robot, computer device and storage medium
KR101619979B1 (en) Methods and apparatus for progressive pattern matching in a mobile environment
CN110111241B (en) Method and apparatus for generating dynamic image
CN111402120A (en) Method and device for processing annotated image
CN105790948A (en) Identity authentication method and identity authentication device
US9665574B1 (en) Automatically scraping and adding contact information
CN111967449B (en) Text detection method, electronic device and computer readable medium
CN113076533B (en) Service processing method and device
CN112837202A (en) Watermark image generation and attack tracing method and device based on privacy protection
CN113255629B (en) Document processing method and device, electronic equipment and computer readable storage medium
CN112188283B (en) Method, device and equipment for cutting video and storage medium
CN109858339B (en) Information verification method, information verification device, computer equipment and computer readable storage medium
CN108875748B (en) Method, device and computer readable medium for generating wireless access point information
CN113011254A (en) Video data processing method, computer equipment and readable storage medium
CN112767348B (en) Method and device for determining detection information
CN111353133A (en) Image processing method, device and readable storage medium
CN113326815B (en) Document processing method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220214

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.