CN111476275A

CN111476275A - Target detection method based on picture recognition, server and storage medium

Info

Publication number: CN111476275A
Application number: CN202010185440.4A
Authority: CN
Inventors: 付美蓉
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Smart Technology Co Ltd; OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2020-03-17
Filing date: 2020-03-17
Publication date: 2020-07-31

Abstract

The invention discloses a target detection method based on picture recognition, which is applied to a server and comprises the steps of receiving an image to be detected uploaded by a client, extracting a corresponding image characteristic diagram, inputting the image characteristic diagram into a target extraction model to output first image data, judging whether the first image data contains at least two first preset target types, and the number of the first target frames is a first preset number, if the condition is met, the position relation between the first target frames is judged, the corresponding judgment result is found out from the database, if the judgment result is the first judgment result, whether the first target frames contain the second target frames is respectively judged, if the judgment result contains the second target frames, whether the first preset data corresponding to the second target frames correspond to the second preset data in the database is respectively identified and judged, and feedback information is generated according to the analysis result and fed back to the client. The method can replace manual work to judge whether the rescue condition of the photo is real, and the judgment efficiency is improved.

Description

Target detection method based on picture recognition, server and storage medium

Technical Field

The invention relates to the technical field of data processing, in particular to a target detection method based on picture recognition, a server and a storage medium.

Background

The insurance company gives the corresponding rescue service to the user who buys the car insurance, and entrusts the rescue provider to provide the rescue service for the user. After the rescue businessman provides rescue service for the user, in order to prove the authenticity of rescue to the insurance company instead of counterfeit behaviors, the rescue businessman needs to take pictures of the scene to obtain evidence after finishing rescue, the shot pictures need to show that the fault car is placed on a trailer and uploaded to the businessman, and the businessman manually judges whether the rescue is real or not.

However, the mode of judging whether the rescue in the photo is true or not by naked eyes not only has low efficiency, but also has the condition of large error in manual judgment, so that the judgment result is inaccurate. Therefore, how to improve the efficiency of identifying the authenticity of the photo becomes a technical problem which needs to be solved urgently.

Disclosure of Invention

The invention mainly aims to provide an object detection method based on picture recognition, a server and a storage medium, aiming at improving the efficiency of recognizing the authenticity of a picture.

In order to achieve the above object, the present invention provides a target detection method based on picture recognition, which is applied to a server, and the method includes:

a receiving step: receiving an image to be detected uploaded by a client, inputting the image to be detected into a pre-trained image data feature extraction model, and obtaining an image feature map corresponding to the image to be detected;

a first judgment step: inputting the obtained image feature map into a pre-trained target extraction model, outputting first image data, and judging whether the first image data contains first target frames of at least two first preset target types, wherein the number of the first target frames of each preset type is a first preset number;

a second judgment step: if the first image data comprises first target frames of at least two first preset target types and the number of the first target frames of each preset type is a first preset number, respectively acquiring the position coordinates of the first target frames, and judging the position relation between the first target frames based on a preset calculation rule;

a third judging step: according to the position relation, finding out a corresponding judgment result from a mapping relation table between the position relation and the judgment result which are pre-established in a database, wherein the judgment result comprises a first judgment result and a second judgment result, the first judgment result shows that the authenticity of the information corresponding to the first image data is to be determined, and the second judgment result shows that the authenticity of the information corresponding to the first image data is not true; and

a feedback step: if the judgment result is a first judgment result, respectively judging whether the first target frame comprises a second target frame, if so, respectively identifying first preset data corresponding to the second target frame, analyzing whether the first preset data correspond to second preset data stored in a database in advance, and generating feedback information according to the analysis result to feed back the feedback information to the client, wherein the analysis result comprises an actual fact and a non-actual fact.

Preferably, the target extraction model is an SSD model, and the determining whether the first image data includes a first target frame of at least two first preset target types includes:

respectively generating corresponding default frames for each pixel point in the image feature map based on the SSD model, acquiring position coordinates of each default frame in the image feature map and probability scores corresponding to different first preset target types, and setting the maximum value in the probability scores of each default frame as a primary confidence coefficient;

sorting the default frames corresponding to the primary confidence degrees from large to small according to probability scores, sequentially obtaining a preset number of default frames as target candidate frames by taking the default frame corresponding to the maximum value of the probability scores as a starting point, and performing bounding box regression analysis based on the position coordinates of each target candidate frame to obtain the area size corresponding to each target candidate frame;

performing softxmax classification on the probability score of each target candidate frame to obtain target confidence coefficients of each target candidate frame corresponding to different preset target type classifications; and

and acquiring a third preset number of target candidate frames with iou (M, b) higher than a preset threshold as the first target frame based on a non-maximum suppression algorithm, wherein M represents a default frame corresponding to the maximum probability score, b represents other default frames except the default frame M in the image feature map, and iou (M, b) represents the overlapping degree between the default frame M and the default frame b.

Preferably, the training process of the target extraction model includes:

acquiring an image feature map sample, respectively generating corresponding default frame samples for each pixel point in the image feature map sample based on the target extraction model, and acquiring the coordinate position of each default frame sample in the image feature map sample and probability scores corresponding to different first preset target types;

respectively calculating the sum of softmax classification loss and bounding box regression loss of each default frame sample based on the position coordinate and probability score of each default frame sample; and

sequencing the sum of the softmax classification loss and the bounding box regression loss from large to small, sequentially acquiring a preset number of default frame samples by taking the default frame sample corresponding to the minimum softmax classification loss and the bounding box regression loss as a starting point, calculating the loss functions of the default frame samples of the preset number, performing back propagation on the calculated loss functions of the default frame samples of the preset number in the target extraction model, updating the weight values of each layer of network of the target extraction model, and training to obtain the target extraction model.

Preferably, the loss function is calculated by the following formula:

wherein, L_conf(x, c) Softmax classification loss, L_loc(x, l, g) is a bounding boxRegression loss, K ═ f_k|*|f_k|*α，|f_kThe method comprises the steps of |, weighting value α, default frame c, default frame position information l and calibration area result g.

Preferably, the feedback step further comprises:

and if the analysis result is not true, performing histogram equalization processing on the image to be detected to obtain second image data, adjusting the second image data to a preset angle, and inputting the second image data into the image feature extraction model in the receiving step again.

In order to achieve the above object, the present invention further provides a server, which includes a memory and a processor, wherein the memory stores an object detection program based on picture recognition, and the object detection program based on picture recognition, when executed by the processor, implements the following steps:

sorting the default frames corresponding to the primary confidence degrees from large to small according to probability scores, sequentially obtaining a second preset number of default frames as target candidate frames by taking the default frame corresponding to the maximum value of the probability scores as a starting point, and performing bounding box regression analysis based on the position coordinates of each target candidate frame to obtain the area size corresponding to each target candidate frame;

Preferably, the training process of the target extraction model includes:

Preferably, the loss function is calculated by the following formula:

wherein, L_conf(x, c) Softmax classification loss, L_loc(x, l, g) is bounding box regression loss, K ═ f_k|*|f_k|*α，|f_kThe method comprises the steps of |, weighting value α, default frame c, default frame position information l and calibration area result g.

To achieve the above object, the present invention further provides a computer readable storage medium, on which an object detection program based on picture recognition is stored, the object detection program based on picture recognition being executable by one or more processors to implement the steps of the object detection method based on picture recognition as described above.

The invention provides a target detection method based on picture recognition, a server and a storage medium, which are characterized in that an image to be detected uploaded by a client is received, an image characteristic diagram is obtained by inputting the image to be detected into an image data characteristic extraction model, first image data is output by inputting the image characteristic diagram into a target extraction model, whether the first image data comprises first target frames of at least two first preset target types is judged, the number of each first target frame is a first preset number, if the first target frames meet the condition, the position coordinates of each first target frame are respectively obtained, the position relation between the first target frames is judged, the corresponding judgment result is found out from a mapping relation table in a database according to the position relation, if the judgment result is the first judgment result, whether the first target frames comprise second target frames is respectively judged, if the first target frames comprise the second target frames, the first preset data corresponding to the second target frames are respectively identified, and judging whether the first preset data corresponds to second preset data in the database or not, and generating feedback information according to the analysis result and feeding the feedback information back to the client. The method can replace manual work to judge whether the rescue situation is real by judging the photo, improves the judgment efficiency, and reduces the error caused by manual judgment.

Drawings

FIG. 1 is a diagram of an application environment of a server according to a preferred embodiment of the present invention;

FIG. 2 is a block diagram of a preferred embodiment of the image recognition-based object detection process of FIG. 1;

fig. 3 is a flowchart illustrating a target detection method based on image recognition according to a preferred embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

In order to make the objects, technical embodiments and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the description relating to "first", "second", etc. in the present invention is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical embodiments of the present invention may be combined with each other, but it must be based on the realization of those skilled in the art, and when the combination of the technical embodiments contradicts each other or cannot be realized, such combination of the technical embodiments should be considered to be absent and not within the protection scope of the present invention.

The invention provides a server 1.

The server 1 includes, but is not limited to, a memory 11, a processor 12, and a network interface 13.

The memory 11 includes at least one type of readable storage medium, which includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 11 may in some embodiments be an internal storage unit of the server 1, for example a hard disk of the server 1. The memory 11 may also be an external storage device of the server 1 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the server 1.

Further, the memory 11 may also include both an internal storage unit of the server 1 and an external storage device. The memory 11 may be used not only to store application software installed in the server 1 and various types of data such as codes of the object detection program 10 based on picture recognition, but also to temporarily store data that has been output or is to be output.

The processor 12 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or other data Processing chip in some embodiments, and is used for executing program codes or Processing data stored in the memory 11, such as executing the target detection program 10 based on picture recognition.

The network interface 13 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), typically used to establish a communication link between the server and other electronic devices.

The client 14 may be a desktop computer, a notebook, a tablet, a cell phone, etc.

The network 15 may be the Internet, a cloud network, a wireless fidelity (Wi-Fi) network, a Personal Area Network (PAN), a local area network (L AN), and/or a Metropolitan Area Network (MAN). various devices in a network environment may be configured to connect to communication networks according to various wired and wireless communication protocols.

Optionally, the server 1 may further comprise a user interface, which may comprise a Display (Display), an input unit such as a Keyboard (Keyboard), and an optional user interface may also comprise a standard wired interface, a wireless interface, optionally, in some embodiments, the Display may be an L ED Display, a liquid crystal Display, a touch-sensitive liquid crystal Display, an O L ED (Organic light-Emitting Diode) touch-sensitive device, and the like, wherein the Display may also be referred to as a Display screen or a Display unit for displaying information processed in the server 1 and for displaying a visualized user interface.

Fig. 1 shows only the server 1 with the components 11-15 and the picture recognition based object detection program 10, and it will be understood by those skilled in the art that the structure shown in fig. 1 does not constitute a limitation of the server 1, and may comprise fewer or more components than shown, or may combine certain components, or a different arrangement of components.

In the present embodiment, when the object detection program 10 based on picture recognition in fig. 1 is executed by the processor 12, the following steps are implemented:

a receiving step: receiving an image to be detected uploaded by a client 14, inputting the image to be detected into a pre-trained image data feature extraction model, and obtaining an image feature map corresponding to the image to be detected;

a feedback step: if the judgment result is a first judgment result, respectively judging whether the first target frame comprises a second target frame, if so, respectively identifying first preset data corresponding to the second target frame, analyzing whether the first preset data correspond to second preset data stored in a database in advance, and generating feedback information according to the analysis result to feed back the feedback information to the client 14, wherein the analysis result comprises an actual fact and a non-actual fact.

In another embodiment, the feedback step further comprises:

For detailed description of the above steps, please refer to the following description of fig. 2 for a schematic diagram of program modules of an embodiment of the object detection program 10 based on image recognition and fig. 3 for a schematic diagram of a method flow of an embodiment of an object detection method based on image recognition.

Fig. 2 is a schematic diagram illustrating program modules of the image recognition-based object detection program 10 in fig. 1 according to an embodiment. The object detection program 10 based on picture recognition is divided into a plurality of modules, which are stored in the memory 11 and executed by the processor 12, to complete the present invention. The modules referred to herein are referred to as a series of computer program instruction segments capable of performing specified functions.

In this embodiment, the object detection program 10 based on image recognition includes a receiving module 110, a first determining module 120, a second determining module 130, a third determining module 140, and a feedback module 150.

The receiving module 110 is configured to receive an image to be detected uploaded by the client 14, and input the image to be detected into a pre-trained image data feature extraction model to obtain an image feature map corresponding to the image to be detected.

In this embodiment, the server 1 receives an image to be detected uploaded by a client 14 (for example, a camera or other shooting terminal with a shooting function, or an apparatus with a shooting function and an image transmission function), and extracts an image feature map from the image to be detected by using a pre-trained image feature extraction model, in this embodiment, the image feature extraction model is obtained by training a MobileNetV2 network model, the MobileNetV2 network model is a lightweight convolutional neural network structure model, and the MobileNetV2 network model can efficiently and quickly identify an image with low resolution, has the characteristic of small operation occupied bandwidth, and can be loaded on a mobile device for use.

In other embodiments, when training the MobileNetV2 network model, a loss function may be set for the MobileNetV2 network model in advance, a training sample is input into the MobileNetV2 network model, forward propagation is performed on the input training sample to obtain an actual output, a preset target output and the actual output are substituted into the loss function, a loss value of the loss function is calculated, backward propagation is performed, parameters of the MobileNetV2 network model are optimized by using the loss value, and the optimized MobileNetV2 network model is obtained. And then selecting a training sample to be input into the optimized MobileNet V2 network model, and referring to the operation, training the optimized MobileNet V2 network model again until the condition of stopping training is reached.

The first determining module 120 is configured to input the obtained image feature map into a pre-trained target extraction model, output first image data, and determine whether the first image data includes first target frames of at least two first preset target types, where the number of the first target frames of each of the preset types is a first preset number.

In this embodiment, when it is determined that the first image data includes first target frames of at least two first preset target types and the number of the first target frames of each preset type is a first preset number (in this embodiment, the first preset number is 1), it indicates that the image to be detected uploaded by the client 14 meets the requirement, and then the subsequent steps are executed; otherwise, it indicates that the image to be detected uploaded by the client 14 does not meet the requirements, and generates feedback information to be fed back to the client 14.

Wherein the target frames are drawn based on a third-party marking tool (e.g. Rect L abel), and each target frame corresponds to a first preset target type.

The following is further illustrated by specific examples:

for example, the insurance company gives the corresponding rescue service to the user who buys the car insurance, and entrusts the rescue provider to provide the rescue service for the user. After the rescue businessman provides rescue services for users, in order to prove the authenticity of rescue to the insurance company instead of counterfeit behaviors, the rescue businessman needs to take pictures of the scene to obtain evidence and feed the pictures back to the service businessman after completing the rescue.

Therefore, according to the scheme, after a picture shot by a rescue dealer on site is input into the image feature extraction model, an image feature graph of the picture is obtained, then the image feature graph is input into the target extraction model to obtain first image data of the picture, the first image data is analyzed and judged, when the first image data is judged to contain first target frames (at least containing a picture frame of a fault car and a picture frame of a trailer) of at least two first preset target types (such as the fault car and the trailer), and the number of the first target frames of each preset type is a first preset number, the situation that the image to be detected uploaded by the client 14 meets the requirement is indicated, and then the subsequent steps are executed; otherwise, it indicates that the image to be detected uploaded by the client 14 does not meet the requirement, there may be a fake behavior, or the shot picture does not meet the requirement, and generates feedback information to be fed back to the client 14.

And then, inputting the image characteristic diagram into a pre-trained target extraction model to obtain first image data corresponding to the image to be monitored. The target extraction model is an SSD model. In the above step, determining whether the first image data includes the first target frame of at least two first preset target types includes:

The training process of the target extraction model comprises the following steps:

the method comprises the steps that an acquired image feature map sample respectively generates corresponding default frame samples for each pixel point in the image feature map sample based on a target extraction model, and obtains the coordinate position of each default frame sample in the image feature map sample and probability scores corresponding to different first preset target types;

sequencing the sum of the softmax classification loss and the bounding box regression loss from large to small, sequentially obtaining a preset number of default frame samples by taking a default frame sample corresponding to the minimum softmax classification loss and bounding box regression loss as a starting point, calculating loss functions of the preset number of default frame samples, performing back propagation on the calculated loss functions of the preset number of default frame samples in the target extraction model, updating the weight values of each layer of network of the target extraction model, and training to obtain the target extraction model;

the loss function is calculated by the following formula:

The second determining module 130 is configured to, if the first image data includes first target frames of at least two first preset target types and the number of the first target frames of each of the preset types is a first preset number (in this embodiment, the first preset number is 1), respectively obtain position coordinates of each of the first target frames, and determine a position relationship between the first target frames based on a preset calculation rule.

In this embodiment, when the first image data is identified to include the first target frames of at least two first preset target types, the position coordinates of the four vertices of the first target frame output by the target extraction model are respectively obtained, and the position relationship between the first target frames is determined based on the preset calculation rule, taking a faulty vehicle and a trailer as an example, the position relationship may be that the faulty vehicle is above the trailer, or the faulty vehicle is below the trailer, or the faulty vehicle is on the left of the trailer, or the faulty vehicle is on the right of the trailer.

Wherein the calculation rule is as follows: coordinates of each vertex of the first target frame represented by the fault vehicle and the first target frame represented by the trailer are respectively taken, i.e. the left upper coordinate of the faulted car (x1, y1), the right upper coordinate of the faulted car (x2, y2), the left lower coordinate of the faulted car (x4, y4), the right lower coordinate of the faulted car (x3, y3), the left upper coordinate of the trailer (a1, b1), the right upper coordinate of the trailer (a2, b2), the left lower coordinate of the trailer (a4, b4), the right lower coordinate of the trailer (a3, b3), subtracting a1 with x1, subtracting a2 with x2, if the result is positive, negative, it is said that the faulted car is in the middle position of the trailer, meanwhile, if the result is positive by subtracting b1 from y1 and subtracting b2 from y2, the fault vehicle is positioned above the trailer, when meeting above-mentioned two conditions simultaneously, then explain that the trouble parking stall is directly over the trailer, the photo that can tentatively judge that the rescue side uploaded satisfies the requirement that the underwriter collected evidence to the scene.

The third determining module 140 is configured to find a corresponding determining result from a mapping relationship table between a position relationship and a determining result pre-established in a database according to the position relationship, where the determining result includes a first determining result and a second determining result, the first determining result indicates that the authenticity of the information corresponding to the first image data is to be determined, and the second determining result indicates that the authenticity of the information corresponding to the first image data is not to be determined.

In order to further explain the specific scheme of this step, in this embodiment, the above example is continued. And finding out a corresponding judgment result from a mapping relation table between the position relation and the judgment result which are pre-established in the database, wherein the judgment result comprises a first judgment result and a second judgment result. When the faulty vehicle is shown above the trailer in the first image data, it is indicated that the picture uploaded by the rescuer meets the requirement, but since the authenticity of the faulty vehicle or the trailer may have uncertainty, i.e., is pending, feedback information of the first determination result (pending, i.e., the true identity of the faulty vehicle and/or the faulty vehicle is pending) is sent to the client 14, and the feedback information shows that the authenticity of the information corresponding to the first image data is pending. When the first image data does not show that the fault car is placed above the trailer, it indicates that the photos uploaded by the rescuer are not satisfactory, and sends feedback information of a second judgment result (not true) to the client 14.

A feedback module 150, configured to respectively determine whether the first target frame includes a second target frame if the determination result is a first determination result, respectively identify first preset data corresponding to the second target frame if the first target frame includes the second target frame, analyze whether the first preset data corresponds to second preset data pre-stored in a database, and generate feedback information according to an analysis result and feed the feedback information back to the client 14, where the analysis result includes an actual fact and a non-actual fact.

In order to avoid the situation that there may be uncertainty about the authenticity of the faulty vehicle or trailer, for example, a false photo may be uploaded by the rescuer, and the faulty vehicle or trailer in the photo is not taken by the rescuer on site, therefore, in this embodiment, after the position relationship between different first target frames is determined, the first target frames further need to be analyzed, and the specific scheme of this step is continued with the above example. If the judgment result is the first judgment result, namely the picture uploaded by the rescue dealer shows that the fault car is above the trailer, whether the first target frame comprises a second target frame, such as a license plate of the fault car or the trailer, is judged respectively. If the first target frame comprises a second target frame, first preset data (such as a license plate number) corresponding to the second target frame is respectively identified, whether the first preset data correspond to second preset data (such as a car owner name) stored in a database in advance is analyzed, and feedback information is generated according to an analysis result and fed back to the client 14.

In another embodiment, the feedback module 150 is further configured to, if the analysis result is not true, perform histogram equalization on the image to be detected to obtain second image data, adjust the second image data to a preset angle, and then input the second image data into the image feature extraction model in the receiving step again.

In this embodiment, if the analysis result is not true, the photo uploaded by the rescuer may be a counterfeit photo, or the photo uploaded by the rescuer may not meet the identification requirement of the server, for example, the photo is dark in light or not well placed at an angle. Therefore, when the analysis result is not real-time, histogram equalization processing can be performed on the image to be detected to obtain second image data, the second image data is adjusted by a preset angle (for example, 270 degrees, namely, symmetrical inversion), and then the second image data is input into the image feature extraction model in the receiving step again, and the above steps are repeated.

In addition, the invention also provides a target detection method based on the picture identification. Fig. 3 is a schematic method flow diagram of an embodiment of the target detection method based on image recognition according to the present invention. When the processor 12 of the server 1 executes the object detection program 10 based on picture recognition stored in the memory 11, the following steps of the object detection method based on picture recognition are implemented:

s110, receiving the image to be detected uploaded by the client 14, inputting the image to be detected into a pre-trained image data feature extraction model, and obtaining an image feature map corresponding to the image to be detected.

And S120, inputting the obtained image feature map into a pre-trained target extraction model, outputting first image data, and judging whether the first image data contains first target frames of at least two first preset target types, wherein the number of the first target frames of each preset type is a first preset number.

The following is further illustrated by specific examples:

the loss function is calculated by the following formula:

S130, if the first image data includes at least two first target frames of first preset target types, and the number of the first target frames of each of the preset types is a first preset number (in this embodiment, the first preset number is 1), respectively obtaining the position coordinates of each of the first target frames, and determining the position relationship between the first target frames based on a preset calculation rule.

And S140, finding out a corresponding judgment result from a mapping relation table between the position relation and the judgment result which is pre-established in the database according to the position relation, wherein the judgment result comprises a first judgment result and a second judgment result, the first judgment result indicates that the authenticity of the information corresponding to the first image data is to be determined, and the second judgment result indicates that the authenticity of the information corresponding to the first image data is not true.

S150, if the determination result is the first determination result, respectively determining whether the first target frame includes a second target frame, if the first target frame includes the second target frame, respectively identifying first preset data corresponding to the second target frame, analyzing whether the first preset data corresponds to second preset data pre-stored in a database, and generating feedback information according to the analysis result to feed back the feedback information to the client 14, where the analysis result includes an attribute and a non-attribute.

In another embodiment, the method further comprises the steps of:

In addition, the embodiment of the present invention further provides a computer-readable storage medium, which may be any one of or any combination of a hard disk, a multimedia card, an SD card, a flash memory card, an SMC, a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a portable compact disc read only memory (CD-ROM), a USB memory, and the like. The computer-readable storage medium includes an object detection program 10 based on image recognition, and the specific implementation of the computer-readable storage medium of the present invention is substantially the same as the above-mentioned object detection method based on image recognition and the specific implementation of the server 1, and will not be described herein again.

It should be noted that the sequence of the above embodiments of the present invention is only for description, and does not represent the merits of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.

The above description of the embodiments of the present invention is for illustrative purposes only and does not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A target detection method based on picture recognition is applied to a server, and is characterized by comprising the following steps:

2. The method as claimed in claim 1, wherein the object extraction model is an SSD model, and the determining whether the first image data includes a first object frame of at least two first preset object types comprises:

3. The image recognition-based target detection method of claim 2, wherein the training process of the target extraction model comprises:

4. The image recognition-based target detection method of claim 3, wherein the loss function is calculated by the following formula:

5. The target detection method based on picture recognition according to any one of claims 1-4, wherein the feedback step further comprises:

6. A server, comprising a memory and a processor, wherein the memory stores a picture recognition-based object detection program, and the picture recognition-based object detection program when executed by the processor implements the steps of:

7. The server according to claim 6, wherein the object extraction model is an SSD model, and the determining whether the first image data includes a first object box of at least two first preset object types includes:

8. The server of claim 7, wherein the training process of the target extraction model comprises:

9. The server of claim 8, wherein the loss function is calculated by the formula:

10. A computer-readable storage medium, wherein a picture recognition-based object detection program is stored on the computer-readable storage medium, and the picture recognition-based object detection program is executable by one or more processors to implement the steps of the picture recognition-based object detection method according to any one of claims 1 to 5.