CN113378836A - Image recognition method, apparatus, device, medium, and program product - Google Patents


Publication number
CN113378836A
Authority
CN
China
Prior art keywords: image, character, digital display screen, recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110721750.8A
Other languages
Chinese (zh)
Inventor
张旭东
辛颖
冯原
李超
张滨
王云浩
王晓迪
谷祎
彭岩
龙翔
郑弘晖
贾壮
韩树民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110721750.8A
Publication of CN113378836A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/2413 Classification based on distances to training or reference patterns
    • G06F18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image recognition method, apparatus, device, medium, and program product, which relate to the field of artificial intelligence, in particular to computer vision and deep learning technologies, and are particularly applicable to intelligent transportation and smart city scenarios. One embodiment of the method comprises: acquiring an image to be recognized; determining the position of a digital display screen in the image to be recognized by using a pre-trained image recognition model; segmenting the area image of the digital display screen from the image to be recognized based on the position of the digital display screen; recognizing the area image of the digital display screen to obtain each character and the position of each character in the area image; and obtaining a character recognition result of the image to be recognized according to each character and the position of each character.

Description

Image recognition method, apparatus, device, medium, and program product
Technical Field
The present disclosure relates to the field of computers, in particular to computer vision and deep learning technologies, and more particularly to an image recognition method, apparatus, device, medium, and program product, which can be used in intelligent transportation and smart city scenarios.
Background
The digital display screen is a common digital display device with a wide range of usage scenarios. For example, the nixie tube is a common instrument display device whose reading is generally used as the basis for important parameters such as quality, temperature, and humidity, so reading it accurately is particularly important. A nixie tube is generally composed of seven segments of light-emitting diodes (LEDs).
At present, methods for recognizing characters on a digital display screen include manual reading and recognition of nixie-tube characters using an intelligent device.
Disclosure of Invention
The embodiments of the disclosure provide an image recognition method, apparatus, device, medium, and program product.
In a first aspect, an embodiment of the present disclosure provides an image recognition method, including: acquiring an image to be recognized; determining the position of a digital display screen in the image to be recognized by using a pre-trained image recognition model; segmenting the area image of the digital display screen from the image to be recognized based on the position of the digital display screen; recognizing the area image of the digital display screen to obtain each character and the position of each character in the area image; and obtaining a character recognition result of the image to be recognized according to each character and the position of each character.
In a second aspect, an embodiment of the present disclosure provides an image recognition apparatus, including: an image acquisition module configured to acquire an image to be recognized; a position determining module configured to determine the position of the digital display screen in the image to be recognized by using a pre-trained image recognition model; an image segmentation module configured to segment the area image of the digital display screen from the image to be recognized based on the position of the digital display screen; a position obtaining module configured to recognize the area image of the digital display screen to obtain each character and the position of each character in the area image; and a result obtaining module configured to obtain a character recognition result of the image to be recognized according to each character and the position of each character.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the first aspect.
In a fourth aspect, the disclosed embodiments propose a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in the first aspect.
In a fifth aspect, the disclosed embodiments propose a computer program product comprising a computer program that, when executed by a processor, implements the method as described in the first aspect.
In a sixth aspect, an embodiment of the present disclosure provides an image recognition system, which includes a camera of a terminal device, a gateway, and the electronic device described in the third aspect.
In a seventh aspect, an embodiment of the present disclosure provides a cloud control platform, including the electronic device described in the third aspect.
The image recognition method, apparatus, device, medium, and program product provided by the embodiments of the disclosure first acquire an image to be recognized; then determine the position of a digital display screen in the image to be recognized by using a pre-trained image recognition model; then segment the area image of the digital display screen from the image to be recognized based on the position of the digital display screen; then recognize the area image of the digital display screen to obtain each character and the position of each character in the area image; and finally obtain a character recognition result of the image to be recognized according to each character and the position of each character. By segmenting away the background outside the digital display screen before recognizing the characters in the screen's area image, and by deriving the recognition result from each character together with its position, the method improves the accuracy of recognizing characters on a digital display screen.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects, and advantages of the disclosure will become apparent from a reading of the following detailed description of non-limiting embodiments which proceeds with reference to the accompanying drawings. The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of an image recognition method according to the present disclosure;
FIG. 3 is a flow diagram for one embodiment of an image recognition method according to the present disclosure;
FIG. 4 is a schematic diagram of one embodiment of an application scenario for an image recognition method according to the present disclosure;
FIG. 5 is a schematic block diagram of one embodiment of an image recognition device according to the present disclosure;
FIG. 6 is a block diagram of an electronic device used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the image recognition method or image recognition apparatus of the present disclosure may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices 101 and 102, a network 103, and a server 104. The network 103 serves as a medium for providing communication links between the terminal devices 101, 102 and the server 104. The network 103 may include various connection types, such as wired or wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 101, 102 to interact with the server 104 via the network 103, for example, to send an image to be recognized. Various client applications and intelligent interactive applications may be installed on the terminal devices 101, 102, such as image processing applications, image recognition software, and so on.
The terminal devices 101 and 102 may be hardware or software. When they are hardware, they may be electronic products that perform human-computer interaction with a user through one or more of a keyboard, touch pad, display screen, touch screen, remote controller, voice interaction, or handwriting device, such as a PC (Personal Computer), mobile phone, smartphone, PDA (Personal Digital Assistant), wearable device, PPC (Pocket PC), tablet computer, smart in-car device, smart television, smart speaker, laptop computer, desktop computer, and the like. When the terminal devices 101 and 102 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is imposed here.
The server 104 may provide various services. For example, the server 104 may acquire an image to be recognized on the terminal devices 101, 102; then, the server 104 determines the position of the digital display screen in the image to be recognized by using a pre-trained image recognition model; segmenting the area image of the digital display screen from the image to be identified based on the position of the digital display screen; identifying the area image of the digital display screen to obtain each character and the position of each character in the area image of the digital display screen; and obtaining a character recognition result of the image to be recognized according to each character and the position of each character.
The server 104 may be hardware or software. When the server 104 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or may be implemented as a single server. When the server 104 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be noted that the image recognition method provided by the embodiment of the present disclosure is generally executed by the server 104, and accordingly, the image recognition apparatus is generally disposed in the server 104.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of an image recognition method according to the present disclosure is shown. The image recognition method may include the steps of:
step 201, acquiring an image to be identified.
In the present embodiment, the execution subject of the image recognition method (e.g., the server 104 shown in FIG. 1) may receive an image to be recognized captured by a camera of a terminal device (e.g., the terminal devices 101 and 102 shown in FIG. 1). The image to be recognized may include an object to be recognized, which may be a specific object such as a digital display screen and the characters displayed on it. The characters displayed on the digital display screen may include numbers, a decimal point, and/or units, for example degrees Celsius. The numbers represent different contents and units in different scenarios: for example, temperature measurement and timing in smart city scenarios, or speed limits and speed measurement in intelligent traffic scenarios.
It should be noted that the image to be recognized may be a single image or at least one frame of a video. Optionally, when at least one frame of a video needs to be recognized, the video may be split into frames in advance, and then at least one frame is extracted.
Here, when powered on, the digital display screen may be used to display characters corresponding to a scenario, typically characters that change over time, such as time-series data. The digital display screen may be a display screen that displays characters through light-emitting diodes (LEDs), such as a graphic-and-text LED display screen (asynchronous screen), a video LED display screen (synchronous screen), or a nixie tube. The nixie tube may be a figure-eight-shaped nixie tube composed of seven LED segments.
It should be noted that the characters displayed on the digital display screen within a preset time period may be analyzed to determine whether the device in the above scenario operates normally. Alternatively, based on the analysis of characters displayed within a preset time period, the working state of the device at the next moment may be predicted, and an early warning issued when abnormal operation is predicted, so as to enable early intervention.
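As a hedged illustration of such an analysis — not part of the patent's claimed method — the sketch below flags a sequence of recognized readings as abnormal when the newest reading jumps away from the recent baseline. The function name, window size, and threshold are all assumptions chosen for the example:

```python
def is_abnormal(readings, window=5, max_jump=10.0):
    """Flag a reading sequence as abnormal when the newest reading deviates
    from the mean of the preceding `window` readings by more than max_jump."""
    if len(readings) <= window:
        return False  # not enough history to judge
    recent = readings[-window - 1:-1]          # the window before the newest reading
    baseline = sum(recent) / len(recent)
    return abs(readings[-1] - baseline) > max_jump

# A slowly drifting temperature series is normal; a sudden spike is not.
normal = [20.0, 20.5, 21.0, 20.8, 21.2, 21.5]
spiked = [20.0, 20.5, 21.0, 20.8, 21.2, 85.0]
```

A real deployment would likely use a scenario-specific model rather than a fixed threshold; this only shows where the per-period analysis plugs in.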
In the technical scheme of the disclosure, the related acquisition, storage, application and the like of the image to be identified all accord with the regulations of related laws and regulations, and do not violate the good customs of the public order.
Step 202, determining the position of the digital display screen in the image to be recognized by using the pre-trained image recognition model.
In this embodiment, the executing body may input the image to be recognized into a pre-trained image recognition model, so as to obtain a position of the digital display screen in the image to be recognized. The position may be a position of the digital display screen in the image to be recognized, for example, a position of a rectangular detection frame of the digital display screen on the image to be recognized.
It should be noted that the image recognition model may be determined based on the following steps: take the image to be recognized as the input of the image recognition model, take the category label and position label of the digital display screen in the image to be recognized as the expected output of the image recognition model, and train an initial model to obtain the image recognition model.
In this embodiment, after obtaining the image to be recognized together with the category label and position label of the digital display screen in it, the execution subject may train the initial model using the image and those labels to obtain the image recognition model. During training, the execution subject may take the image to be recognized as the input of the image recognition model, and take the corresponding category label and position label of the digital display screen as the expected output, to obtain the image recognition model. The initial model may be a neural network model in the prior art or in future development; for example, it may include a classification model such as Faster R-CNN, YOLO, KNN, or SVM, to identify the category and location of the digital display screen.
It should be noted that the image recognition model may be trained using the PaddleDetection framework.
In one example, the image to be recognized may be recognized by a target detection method to obtain the position of the digital display screen in the image, and the target detection method may include: (1) two-stage target detection methods (Two-Stage), represented by Faster R-CNN; (2) single-stage target detection methods (One-Stage), represented by YOLO.
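Whichever detector is used, its output must be reduced to the screen's position. A minimal sketch of that step: keep only boxes classified as a digital display screen above a score threshold and take the highest-scoring one. The detection dictionary layout (`label`, `score`, `box` keys) is an assumption for illustration, not the patent's or any particular framework's API:

```python
def best_screen_box(detections, label="digital_display_screen", min_score=0.5):
    """Return the (x1, y1, x2, y2) box of the highest-scoring detection whose
    class label matches, or None if nothing passes the threshold."""
    candidates = [d for d in detections
                  if d["label"] == label and d["score"] >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d["score"])["box"]

detections = [
    {"label": "digital_display_screen", "score": 0.92, "box": (40, 30, 200, 90)},
    {"label": "digital_display_screen", "score": 0.61, "box": (45, 35, 198, 88)},
    {"label": "background_sign", "score": 0.88, "box": (0, 0, 50, 50)},
]
```

Keeping only the best box implements the patent's assumption that one screen is located per image; multi-screen images would keep every candidate instead.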
And step 203, segmenting the area image of the digital display screen from the image to be identified based on the position of the digital display screen.
In this embodiment, the execution subject may use an image segmentation model to segment the digital display screen from the image to be recognized, so as to obtain the area image of the digital display screen.
Here, the image segmentation model may be a Convolutional Neural Network (CNN) of image segmentation, and the model structure mainly includes an input layer, a feature extraction layer, and an output layer. The image segmentation model can be used for segmenting the area image of the digital display screen from the image to be identified.
Optionally, a Faster R-CNN model with a ResNet-50 backbone may be adopted to detect the region of the digital display screen in the image to be recognized, and the region may be segmented based on the detection result and used as the input of a subsequent reading model for the digital display screen.
In one example, a Faster R-CNN model of ResNet-50 can be adopted to perform region detection on the nixie tube in the image to be recognized, and the region image of the nixie tube is segmented from the image to be recognized based on the position of the nixie tube; then, the area image of the nixie tube is used as the input of the nixie tube number reading model.
In this embodiment, the area of the digital display screen is located and recognized based on the image recognition model, and the screen area is separated from the rest of the image to be recognized (i.e., the background is separated from the screen area). This removes the interference of the background, so that only the area image of the digital display screen is retained for the subsequent character recognition.
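Given the detected position, separating the screen from the background can be as simple as cropping the detection box out of the image. A sketch on a nested-list image — the (x1, y1, x2, y2) coordinate convention with exclusive right/bottom edges is an assumption:

```python
def crop_region(image, box):
    """Crop a rectangular region from an image stored as rows of pixels.
    box is (x1, y1, x2, y2); x2 and y2 are exclusive."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

# 4x6 toy "image"; pixel value = 10*row + col, so crops are easy to check.
image = [[10 * r + c for c in range(6)] for r in range(4)]
screen = crop_region(image, (1, 1, 4, 3))  # rows 1..2, columns 1..3
```

With array libraries the same crop is a single slice (`img[y1:y2, x1:x2]`); the point is only that everything outside the box is discarded before character recognition.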
And 204, identifying the area image of the digital display screen to obtain each character and the position of each character in the area image of the digital display screen.
In this embodiment, the execution subject may recognize the area image of the digital display screen to obtain each character and the position of each character. The area image of the digital display screen may be an image containing a digital display screen, which may be used to display at least one character (i.e., the reading); the characters may include numbers, a decimal point, and/or units of the numbers, such as degrees Celsius (°C). The above-mentioned position may be the position of a character on the digital display screen (and likewise in the area image of the digital display screen), for example, the position of the character's rectangular detection frame on the area image.
In one example, the execution subject can recognize the area image of the nixie tube, and obtain each character and the position of each character in the area image of the nixie tube.
And step 205, obtaining a character recognition result of the image to be recognized according to each character and the position of each character.
In this embodiment, the execution subject may obtain a character recognition result of the image to be recognized according to each character and the position of each character. The character recognition result may be the recognition result of the characters displayed on the digital display screen in the image to be recognized.
The image recognition method provided by the embodiment of the disclosure first acquires an image to be recognized; then determines the position of a digital display screen in the image to be recognized by using a pre-trained image recognition model; then segments the area image of the digital display screen from the image to be recognized based on the position of the digital display screen; then recognizes the area image of the digital display screen to obtain each character and the position of each character in the area image; and finally obtains a character recognition result of the image to be recognized according to each character and the position of each character. By segmenting away the background outside the digital display screen before recognizing the characters in the screen's area image, and by deriving the recognition result from each character together with its position, the method improves the accuracy of recognizing characters on a digital display screen.
With further reference to fig. 3, fig. 3 illustrates a flow 300 of one embodiment of an image recognition method according to the present disclosure. The image recognition method may include the steps of:
step 301, acquiring an image to be identified.
And step 302, determining the position of a digital display screen in the image to be recognized by using a pre-trained image recognition model.
And 303, segmenting the area image of the digital display screen from the image to be identified based on the position of the digital display screen.
And 304, identifying the area image of the digital display screen to obtain each character and a corresponding position in the area image of the digital display screen.
Step 305, determining the relative position between the characters based on the position of each character.
In the present embodiment, the execution subject of the image recognition method (e.g., the server 104 shown in fig. 1) may determine the relative position between characters according to the position of each character. The relative position between the characters may be a relative position between every two characters.
In one example, the characters displayed on the nixie tube are "5", "3", "0", "6", "0", and the positions of "5", "3", "0", "6", and "0" on the area image of the nixie tube can be determined first. Then, according to the position of the 5 and the position of the 3, determining the relative position between the 5 and the 3; and determining a relative position between "3" and "0" based on the position of "3" and the position of "0"; and a position of "0" and a position of "6", determining a relative position between "0" and "6"; and the position of "6" and the position of "0", the relative position between "6" and "0" is determined.
After the area image of the digital display screen is recognized, it is projected in the horizontal direction: a projection histogram is generated with the image column index on the horizontal axis and the number of foreground pixels in each column on the vertical axis. The projection histogram is scanned sequentially to extract the left and right boundaries of each character in turn; combined with each character's upper and lower boundaries, the coordinates of the character (i.e., its position) are obtained, and the character is then segmented.
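The projection step above can be sketched in a few lines: count the foreground pixels in each column, then scan the resulting histogram for runs of non-empty columns, each run giving one character's left and right boundaries. The binary nested-list image and function names are illustrative:

```python
def column_projection(binary_image):
    """Number of foreground (1) pixels in each column of a binary image."""
    return [sum(col) for col in zip(*binary_image)]

def character_spans(projection):
    """Scan the projection left to right and return (left, right) column-index
    pairs (right exclusive) for each run of non-empty columns."""
    spans, start = [], None
    for x, count in enumerate(projection):
        if count > 0 and start is None:
            start = x                      # a character's left boundary begins
        elif count == 0 and start is not None:
            spans.append((start, x))       # empty column closes the character
            start = None
    if start is not None:
        spans.append((start, len(projection)))
    return spans

# Two "characters" separated by an empty column in a tiny binary image.
img = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
]
```

The same run-scan applied to a vertical (row-wise) projection yields the upper and lower boundaries mentioned in the text.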
And step 306, combining each character according to the relative position between the characters to obtain a character recognition result of the image to be recognized.
In this embodiment, the execution subject may merge the characters according to the relative positions between them to obtain a character recognition result of the image to be recognized. Merging may consist of displaying the characters together according to their relative positions.
In one example, after determining the relative positions between "5" and "3", "3" and "0", "0" and "6", and "6" and "0", the characters "5", "3", "0", "6", and "0" are combined according to those relative positions to obtain "53060", the character recognition result of the image to be recognized.
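For a single-line reading like this example, merging by relative position reduces to ordering the characters by their horizontal coordinates and concatenating them. A sketch, assuming each character carries an (x, y) position for its box:

```python
def merge_characters(chars_with_positions):
    """Sort recognized characters left to right by x coordinate and join them
    into the final reading string."""
    ordered = sorted(chars_with_positions, key=lambda cp: cp[1][0])
    return "".join(ch for ch, _ in ordered)

# The nixie-tube example from the text, with assumed (x, y) positions,
# deliberately given out of order as a detector might emit them.
detected = [("3", (30, 5)), ("5", (10, 5)), ("0", (90, 5)),
            ("0", (50, 5)), ("6", (70, 5))]
```

Sorting by position rather than by detection order is what makes the result stable: the two "0" characters are distinguished purely by where they sit.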
In the present embodiment, the specific operations of steps 301-304 have been described in detail in steps 201-204 in the embodiment shown in fig. 2, and are not described herein again.
As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 2, the image recognition method in the present embodiment highlights the step of obtaining the character recognition result of the image to be recognized. The scheme described in this embodiment therefore determines the relative position between the characters according to the position of each character, and then combines the characters according to those relative positions to obtain the character recognition result, further improving the accuracy of the character recognition result.
In some optional implementation manners of this embodiment, recognizing the area image of the digital display screen to obtain each character and a position of each character in the area image of the digital display screen may include: and inputting the area image of the digital display screen into the image recognition model to obtain each character and the position of each character in the area image of the digital display screen.
In this implementation, the execution subject may input the area image of the digital display screen into the image recognition model, and obtain each character and a position of each character in the area image of the digital display screen. The image recognition model may be the image recognition model described in step 202 (e.g., the reading model described above).
In one example, the characters displayed on the digital display screen may be recognized using a Faster R-CNN with a ResNet-50 backbone, and the position of each character (e.g., each number and decimal point) is read out one by one, from top to bottom and from left to right, according to the coordinates of each character's rectangular frame, so as to obtain the character recognition result of the characters displayed on the digital display screen.
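When the screen has several lines, the top-to-bottom, left-to-right ordering described here can be sketched by grouping character boxes into rows by vertical proximity and then sorting each row by x. The row-grouping tolerance is a simplification, and a decimal point is ordered like any other character:

```python
def reading_order(chars, row_tol=10):
    """Order (char, (x, y)) items top-to-bottom into rows (y within row_tol
    of the row's first character), then left-to-right within each row."""
    rows = []
    for item in sorted(chars, key=lambda cp: cp[1][1]):   # top to bottom
        for row in rows:
            if abs(item[1][1] - row[0][1][1]) <= row_tol:
                row.append(item)                           # same line
                break
        else:
            rows.append([item])                            # start a new line
    out = []
    for row in rows:
        out.extend(sorted(row, key=lambda cp: cp[1][0]))   # left to right
    return "".join(ch for ch, _ in out)

# Two-line reading: "36.5" above "88"; positions are assumed box corners.
chars = [("6", (20, 4)), ("3", (5, 5)), (".", (35, 6)), ("5", (50, 5)),
         ("8", (10, 40)), ("8", (25, 41))]
```

A production system would separate the two lines in the output as well; here they are simply concatenated to keep the ordering logic visible.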
In this implementation, each character and the location of each character in the area image of the digital display screen may be determined by an image recognition model.
In some optional implementations of this embodiment, the image recognition model includes: Faster R-CNN (Region-based Convolutional Neural Network), YOLO (You Only Look Once), KNN (k-Nearest Neighbour), or Support Vector Machine (SVM).
In this implementation, the region of the digital display screen in the image to be recognized can be determined through the image recognition model, so as to improve the recognition accuracy.
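Once the model has located the display screen, the segmentation step described earlier amounts to cropping the detected rectangle out of the image to be recognized. The sketch below assumes a plain row-major pixel grid and an (x1, y1, x2, y2) box in pixel coordinates; both are illustrative choices, since the disclosure does not fix a data layout.

```python
def crop_region(image, box):
    """Crop the detected display-screen region from the full image.

    `image` is a row-major grid of pixels (a list of rows) and `box` is an
    (x1, y1, x2, y2) rectangle in pixel coordinates -- hypothetical formats
    chosen for illustration only.
    """
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]
```

The cropped grid can then be passed on to the character recognition step.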
In some optional implementations of this embodiment, acquiring the image to be recognized includes: capturing the image to be recognized through a camera of a terminal device disposed on the digital display screen side.
In this implementation, the image to be recognized may be captured by a camera of the terminal device disposed on the digital display screen side. The camera may be connected to the terminal device (e.g., the terminal devices 101 and 102 shown in fig. 1) via a network (e.g., the network 103 shown in fig. 1).
Here, the camera of the terminal device is disposed on the digital display screen side, for example, within a preset distance in front of the digital display screen.
In one example, the digital display screen may be disposed beside a road, for example, a digital display screen that displays driving speed, where the displayed speed reminds traffic participants to adjust their driving.
After the image to be recognized is captured by the camera, the terminal device transmits it to the gateway through a network (e.g., the network 103 shown in fig. 1); the gateway transmits the image to be recognized to a server (e.g., the server 104 shown in fig. 1); the server then recognizes the characters displayed on the digital display screen in the image to be recognized to obtain a character recognition result; finally, the server sends the character recognition result back to the terminal device through the gateway, and the display screen of the terminal device displays the character recognition result.
It should be noted that, in the process of sending the image to be recognized to the server, the identifier of the terminal device may also be sent to the server, so that after the server obtains the character recognition result, it can send the character recognition result back to that terminal device. The identifier of the terminal device may include a device number of the terminal device, a number of the area where the camera is located (e.g., a road number or a cell number), and the like.
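The upload from the terminal device to the server described above can be sketched as a small payload that carries the image together with the device identifier, so the server can route the recognition result back. The field names below are illustrative assumptions, not part of the disclosure.

```python
import base64


def build_upload_payload(image_bytes, device_number, area_number):
    """Bundle the image to be recognized with the terminal identifier.

    The identifier lets the server send the character recognition result
    back to the originating terminal device. All field names here are
    hypothetical.
    """
    return {
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "identifier": {
            "device_number": device_number,
            "area_number": area_number,  # e.g. a road number or cell number
        },
    }
```

A gateway forwarding such a payload needs no knowledge of its contents beyond the identifier used for routing.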
Optionally, the image to be recognized may be transmitted to a cloud control platform for recognition and storage, so as to improve the recognition speed.
Optionally, after the identifier of the terminal device is sent to the server, the server may store the character recognition result corresponding to each terminal device according to that identifier. The characters displayed on the digital display screen within a preset time period can then be analyzed subsequently to determine whether the device in the above scenario operates normally; alternatively, based on this analysis, the working state of the device at the next moment can be predicted, and an early warning can be issued when abnormal operation is predicted, so as to enable early intervention.
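The optional analysis over a preset time period can be sketched as a simple windowed check: the server keeps recent readings per terminal identifier and flags a device whose readings leave an expected range. The window size and range threshold below are illustrative stand-ins for the prediction/early-warning logic, which the disclosure does not specify.

```python
from collections import defaultdict, deque


class ReadingMonitor:
    """Store recent numeric readings per terminal identifier and flag
    devices whose stored readings fall outside an expected range
    (an illustrative stand-in for the early-warning step)."""

    def __init__(self, window=5, low=0.0, high=120.0):
        self.low, self.high = low, high
        # keep only the last `window` readings per device
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, device_id, reading):
        self.history[device_id].append(reading)

    def is_abnormal(self, device_id):
        # warn once any reading in the window leaves the expected range
        return any(not (self.low <= r <= self.high)
                   for r in self.history[device_id])
```

For a roadside speed display, for instance, a reading far outside the plausible speed range would trigger the warning.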
It should be noted that the number of cameras may be set according to a specific scene.
In this implementation, the execution subject may capture the image to be recognized through a camera of the terminal device disposed on the digital display screen side.
With further reference to fig. 4, fig. 4 shows a schematic diagram of an application scenario of the image recognition method according to the present disclosure. In this application scenario, a camera of a terminal device 401 (e.g., the terminal devices 101 and 102 shown in fig. 1) is used to capture an image to be recognized. The image to be recognized is then transmitted by the terminal device 401 to the gateway 402 through a network (e.g., the network 103 shown in fig. 1), and then transmitted by the gateway 402 to the server 403 (e.g., the server 104 shown in fig. 1). The server 403 determines the position of the digital display screen in the image to be recognized by using a pre-trained image recognition model; segments the area image of the digital display screen from the image to be recognized based on the position of the digital display screen; recognizes the area image of the digital display screen to obtain each character and the position of each character in the area image of the digital display screen; and obtains a character recognition result of the image to be recognized according to each character and the position of each character. The server 403 then transmits the character recognition result to the gateway 402, the gateway 402 transmits it to the terminal device 401, and the display screen of the terminal device 401 displays the character recognition result.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an image recognition apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the image recognition apparatus 500 of the present embodiment may include: an image acquisition module 501, a position determination module 502, an image segmentation module 503, a position obtaining module 504, and a result obtaining module 505. The image acquisition module 501 is configured to acquire an image to be recognized; the position determination module 502 is configured to determine the position of a digital display screen in the image to be recognized using a pre-trained image recognition model; the image segmentation module 503 is configured to segment an area image of the digital display screen from the image to be recognized based on the position of the digital display screen; the position obtaining module 504 is configured to recognize the area image of the digital display screen to obtain each character and the position of each character in the area image of the digital display screen; and the result obtaining module 505 is configured to obtain a character recognition result of the image to be recognized according to each character and the position of each character.
In the present embodiment, in the image recognition apparatus 500, the specific processing and technical effects of the image acquisition module 501, the position determination module 502, the image segmentation module 503, the position obtaining module 504, and the result obtaining module 505 may refer to the related descriptions of steps 201 to 205 in the embodiment corresponding to fig. 2, and are not described here again. The position determination module 502 and the position obtaining module 504 may be the same module.
In some optional implementations of this embodiment, the result obtaining module 505 is further configured to: determining the relative position between the characters according to the position of each character; and combining each character according to the relative position between the characters to obtain a character recognition result of the image to be recognized.
In some optional implementations of this embodiment, the location obtaining module 504 is further configured to: and inputting the area image of the digital display screen into the image recognition model to obtain each character and the position of each character in the area image of the digital display screen.
In some optional implementations of this embodiment, the image recognition model includes: faster R-CNN, YOLO, KNN, or support vector machine SVM.
In some optional implementations of this embodiment, the image acquisition module 501 is further configured to: and shooting an image to be identified through a camera of the terminal equipment arranged on the side of the digital display screen.
According to an embodiment of the present disclosure, the present disclosure also provides an electronic device, an image recognition system, a readable storage medium, a computer program product, and a cloud control platform.
The image recognition system can comprise the terminal equipment, the camera of the terminal equipment and the electronic equipment.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the image recognition method. For example, in some embodiments, the image recognition method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the image recognition method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the image recognition method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
In the context of the present disclosure, the cloud control platform performs processing in the cloud. The cloud control platform includes the electronic device described above and can acquire data, such as pictures and videos, from a sensing device (such as a roadside camera), so as to perform image and video processing and data computation. The cloud control platform may also be called a vehicle-road cooperative management platform, an edge computing platform, a cloud computing platform, a central system, a cloud server, and the like.
Artificial intelligence is the discipline of using computers to simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), covering both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning, big data processing technology, knowledge graph technology, and the like.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions mentioned in this disclosure can be achieved, and are not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. An image recognition method, comprising:
acquiring an image to be identified;
determining the position of a digital display screen in the image to be recognized by utilizing a pre-trained image recognition model;
segmenting an area image of the digital display screen from the image to be identified based on the position of the digital display screen;
identifying the area image of the digital display screen to obtain each character and the position of each character in the area image of the digital display screen;
and obtaining a character recognition result of the image to be recognized according to each character and the position of each character.
2. The method according to claim 1, wherein the obtaining of the character recognition result of the image to be recognized according to each character and the position of each character comprises:
determining the relative position between the characters according to the position of each character;
and combining the characters according to the relative positions of the characters to obtain a character recognition result of the image to be recognized.
3. The method of claim 1 or 2, wherein identifying the area image of the digital display screen to obtain each character and a position of each character in the area image of the digital display screen comprises:
and inputting the area image of the digital display screen into the image recognition model to obtain each character and the position of each character in the area image of the digital display screen.
4. The method of any of claims 1-3, wherein the image recognition model includes at least one of: faster R-CNN, YOLO, KNN, or support vector machine SVM.
5. The method according to any one of claims 1-4, wherein the acquiring an image to be identified comprises:
and shooting the image to be identified through a camera of the terminal equipment arranged at the side of the digital display screen.
6. An image recognition apparatus comprising:
an image acquisition module configured to acquire an image to be recognized;
the position determining module is configured to determine the position of a digital display screen in the image to be recognized by utilizing a pre-trained image recognition model;
the image segmentation module is configured to segment an area image of the digital display screen from the image to be identified based on the position of the digital display screen;
the position obtaining module is configured to identify the area image of the digital display screen to obtain each character and the position of each character in the area image of the digital display screen;
and the result obtaining module is configured to obtain a character recognition result of the image to be recognized according to each character and the position of each character.
7. The apparatus of claim 6, wherein the result obtaining module is further configured to:
determining the relative position between the characters according to the position of each character;
and combining the characters according to the relative positions of the characters to obtain a character recognition result of the image to be recognized.
8. The apparatus of claim 6 or 7, wherein the location derivation module is further configured to:
and inputting the area image of the digital display screen into the image recognition model to obtain each character and the position of each character in the area image of the digital display screen.
9. The apparatus of any of claims 6-8, wherein the image recognition model comprises: faster R-CNN, YOLO, KNN, or support vector machine SVM.
10. The apparatus of any of claims 6-9, wherein the image acquisition module is further configured to:
and shooting the image to be identified through a camera of the terminal equipment arranged at the side of the digital display screen.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-5.
14. An image recognition system comprising a camera of a terminal device, a gateway, and the electronic device of claim 11.
15. A cloud controlled platform comprising the electronic device of claim 11.
CN202110721750.8A 2021-06-28 2021-06-28 Image recognition method, apparatus, device, medium, and program product Pending CN113378836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110721750.8A CN113378836A (en) 2021-06-28 2021-06-28 Image recognition method, apparatus, device, medium, and program product

Publications (1)

Publication Number Publication Date
CN113378836A true 2021-09-10

Family

ID=77579527

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110721750.8A Pending CN113378836A (en) 2021-06-28 2021-06-28 Image recognition method, apparatus, device, medium, and program product

Country Status (1)

Country Link
CN (1) CN113378836A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023092437A1 (en) * 2021-11-26 2023-06-01 京东方科技集团股份有限公司 Intelligent calculation method and intelligent interactive tablet

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180204562A1 (en) * 2015-09-08 2018-07-19 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and device for image recognition
CN108875722A (en) * 2017-12-27 2018-11-23 北京旷视科技有限公司 Character recognition and identification model training method, device and system and storage medium
CN112132139A (en) * 2020-09-22 2020-12-25 深兰科技(上海)有限公司 Character recognition method and device
CN112507758A (en) * 2019-09-16 2021-03-16 深圳中兴网信科技有限公司 Answer sheet character string identification method, answer sheet character string identification device, terminal and computer storage medium
CN112598001A (en) * 2021-03-08 2021-04-02 中航金城无人系统有限公司 Automatic ship water gauge reading identification method based on multi-model fusion

Similar Documents

Publication Publication Date Title
US20220270382A1 (en) Method and apparatus of training image recognition model, method and apparatus of recognizing image, and electronic device
CN112560862B (en) Text recognition method and device and electronic equipment
CN113920307A (en) Model training method, device, equipment, storage medium and image detection method
US11810319B2 (en) Image detection method, device, storage medium and computer program product
CN113378712B (en) Training method of object detection model, image detection method and device thereof
US20220172376A1 (en) Target Tracking Method and Device, and Electronic Apparatus
CN113326773A (en) Recognition model training method, recognition method, device, equipment and storage medium
US20220027661A1 (en) Method and apparatus of processing image, electronic device, and storage medium
CN115719436A (en) Model training method, target detection method, device, equipment and storage medium
CN113378832A (en) Text detection model training method, text prediction box method and device
CN113361468A (en) Business quality inspection method, device, equipment and storage medium
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN112989987A (en) Method, apparatus, device and storage medium for identifying crowd behavior
US20230245429A1 (en) Method and apparatus for training lane line detection model, electronic device and storage medium
CN115861400A (en) Target object detection method, training method and device and electronic equipment
CN114186007A (en) High-precision map generation method and device, electronic equipment and storage medium
CN113255501A (en) Method, apparatus, medium, and program product for generating form recognition model
CN113378836A (en) Image recognition method, apparatus, device, medium, and program product
CN114429631B (en) Three-dimensional object detection method, device, equipment and storage medium
CN113610809B (en) Fracture detection method, fracture detection device, electronic equipment and storage medium
CN113139542B (en) Object detection method, device, equipment and computer readable storage medium
CN114220163A (en) Human body posture estimation method and device, electronic equipment and storage medium
CN113936158A (en) Label matching method and device
CN113591569A (en) Obstacle detection method, obstacle detection device, electronic apparatus, and storage medium
CN112987707A (en) Automatic driving control method and device for vehicle

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination