CN111091022A - Machine vision efficiency evaluation method and system - Google Patents

Machine vision efficiency evaluation method and system

Info

Publication number
CN111091022A
CN111091022A (application CN201811235451.8A)
Authority
CN
China
Prior art keywords
standard
box
prediction
image
machine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811235451.8A
Other languages
Chinese (zh)
Inventor
林昀廷
萧妟如
刘书承
林承毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Acer Inc
Original Assignee
Acer Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Acer Inc filed Critical Acer Inc
Priority to CN201811235451.8A
Publication of CN111091022A
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/20: Scenes; Scene-specific elements in augmented reality scenes

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a machine vision performance evaluation method and system. The method includes the following steps: obtaining an image, wherein the image presents a plurality of objects including a first object and a second object; performing image recognition on the image via machine vision to obtain a prediction box corresponding to at least one of the first object and the second object; merging a first standard box corresponding to the first object with a second standard box corresponding to the second object to obtain a third standard box; and obtaining evaluation information according to the third standard box and the prediction box, wherein the evaluation information reflects the prediction efficiency of the machine vision for the objects in the image.

Description

Machine vision efficiency evaluation method and system
Technical Field
The present invention relates to image recognition technologies, and in particular, to a method and a system for evaluating performance of machine vision.
Background
Image recognition technology continues to advance, but mechanisms for evaluating the performance of machine vision remain insufficient. For example, when multiple objects in an image are clustered together, machine vision may select those objects with a single large bounding box. Under the earliest evaluation mechanisms, the machine vision performance may then be judged poor simply because the objects are not marked one by one. Furthermore, under the OpenImage specification proposed by Google, although multiple objects grouped together may be regarded as successfully recognized at once, it is still impossible to distinguish and score the case in which only some of those objects are actually recognized.
Disclosure of Invention
The invention provides a machine vision performance evaluation method and system, which can mitigate the above problems.
An embodiment of the invention provides a machine vision performance evaluation method, which includes the following steps: obtaining an image, wherein the image presents a plurality of objects including a first object and a second object; performing image recognition on the image via machine vision to obtain a prediction box corresponding to at least one of the first object and the second object; merging a first standard box corresponding to the first object with a second standard box corresponding to the second object to obtain a third standard box; and obtaining evaluation information according to the third standard box and the prediction box, wherein the evaluation information reflects the prediction efficiency of the machine vision for the objects in the image.
An embodiment of the invention also provides a machine vision performance evaluation system, which includes a storage device, an image recognition module, and a processor. The storage device stores an image, wherein the image presents a plurality of objects including a first object and a second object. The processor is connected to the storage device and the image recognition module. The image recognition module performs image recognition on the image via machine vision to obtain a prediction box corresponding to at least one of the first object and the second object. The processor merges a first standard box corresponding to the first object with a second standard box corresponding to the second object to obtain a third standard box, and obtains evaluation information according to the third standard box and the prediction box, wherein the evaluation information reflects the prediction efficiency of the machine vision for the objects in the image.
Based on the above, after an image including a first object and a second object is obtained, image recognition may be performed on the image via machine vision to obtain a prediction box corresponding to at least one of the first object and the second object. Then, the first standard box corresponding to the first object and the second standard box corresponding to the second object may be merged to obtain a third standard box. Evaluation information may be obtained according to the third standard box and the prediction box, and the evaluation information reflects the prediction efficiency of the machine vision for the objects in the image. Therefore, the inability of existing machine vision evaluation mechanisms to effectively score the recognition of clustered objects can be effectively remedied.
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
Fig. 1 is a schematic diagram of a machine vision performance evaluation system according to an embodiment of the invention.
FIG. 2 is a schematic diagram of a target image according to an embodiment of the invention.
FIG. 3 is a diagram illustrating standard boxes and prediction boxes according to an embodiment of the invention.
FIG. 4 is a diagram illustrating merging of standard boxes according to an embodiment of the invention.
FIG. 5 is a diagram illustrating merging of standard boxes according to an embodiment of the invention.
FIG. 6 is a diagram illustrating standard boxes and prediction boxes according to an embodiment of the invention.
Fig. 7 is a schematic diagram of evaluation information according to an embodiment of the invention.
Fig. 8 is a flowchart illustrating a machine vision performance evaluation method according to an embodiment of the invention.
[Description of reference numerals]
10: Machine vision performance evaluation system
11: Storage device
12: Image recognition module
13: Processor
21: Image
201-206: Objects
301-304: Prediction boxes
311-316, 401, 402, 501, 502: Standard boxes
510-540: Regions
71: Evaluation information
S801-S804: Steps
Detailed Description
Fig. 1 is a schematic diagram of a performance evaluation system of machine vision according to an embodiment of the invention. Referring to fig. 1, a system (also referred to as a performance evaluation system for machine vision) 10 includes a storage device 11, an image recognition module 12, and a processor 13. In one embodiment, the system 10 can be implemented in an electronic device with image processing and computing functions, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, or an industrial computer. In one embodiment, the system 10 may include a plurality of independent electronic devices that may be connected to each other, either by wire or wirelessly. For example, in one embodiment, the storage device 11 and the image recognition module 12 may be implemented in a local device, and the processor 13 may be located in a remote server. The local device and the remote server may be connected via the Internet or a local area network.
The storage device 11 is used for storing one or more images and other data. For example, the storage device 11 may include a volatile storage medium and a non-volatile storage medium. The volatile storage medium may be a Random Access Memory (RAM), and the non-volatile storage medium may be a Read Only Memory (ROM), a Solid State Disk (SSD), or a conventional hard disk (HDD), etc.
The image recognition module 12 is configured to perform image recognition on the image stored in the storage device 11 through machine vision to recognize the target object in the image. The image recognition module 12 may be implemented as a software module, a firmware module, or a hardware circuit. For example, in one embodiment, the image recognition module 12 may include at least one Graphics Processing Unit (GPU) or similar processing chip to perform machine vision image recognition. Alternatively, in one embodiment, the image recognition module 12 is a program code that can be loaded into the storage device 11 and executed by the processor 13. Furthermore, the image recognition module 12 may have an artificial intelligence architecture such as machine learning and/or deep learning and may continuously improve its image recognition performance through training.
The processor 13 is connected to the storage device 11 and the image recognition module 12. The processor 13 may be a central processing unit (CPU), or another programmable general-purpose or special-purpose microprocessor, digital signal processor (DSP), programmable controller, application-specific integrated circuit (ASIC), programmable logic device (PLD), or the like, or a combination thereof.
A certain image (also referred to as a target image) stored in the storage device 11 represents a plurality of objects (also referred to as target objects). The target objects at least comprise a first object and a second object. It should be noted that the shortest distance interval between the first object and the second object in the target image is smaller than a distance threshold. In one embodiment, the processor 13 may determine that the first object and the second object belong to a clustered object in response to the shortest distance interval between the first object and the second object in the target image being smaller than a distance threshold value. In addition, if the shortest distance interval between the first object and the second object in the target image is not less than the distance threshold, the processor 13 may determine that the first object and the second object do not belong to the clustered object.
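To illustrate how such a clustering check might be implemented, the following is a minimal sketch that compares the shortest gap between two axis-aligned bounding boxes against a distance threshold. It is not taken from the patent; the box representation, function names, and threshold value are assumptions made for illustration.

```python
# Illustrative sketch (not from the patent): deciding whether two objects are
# "clustered" by comparing the shortest pixel gap between their bounding boxes
# with a distance threshold. Boxes are assumed axis-aligned (x1, y1, x2, y2).

def shortest_gap(box_a, box_b):
    """Smallest pixel distance between two boxes (0 if they touch or overlap)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    dx = max(bx1 - ax2, ax1 - bx2, 0)  # horizontal gap, 0 when overlapping in x
    dy = max(by1 - ay2, ay1 - by2, 0)  # vertical gap, 0 when overlapping in y
    return (dx ** 2 + dy ** 2) ** 0.5

def is_clustered(box_a, box_b, distance_threshold=20.0):
    """True if the two objects are close enough to be treated as a clustered object."""
    return shortest_gap(box_a, box_b) < distance_threshold
```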
The image recognition module 12 may perform image recognition on the target image via machine vision to obtain at least one prediction box corresponding to at least one of the first object and the second object. For example, the first object, the second object, and the prediction box may each cover an image range (also referred to as a pixel range) in the target image. The position of this prediction box in the target image and the image range it covers reflect that, through automatic image recognition by machine vision, the image recognition module 12 considers that one or more of the target objects being sought lie within this image range. The processor 13 may analyze the target image and the recognition result of the image recognition module 12 and generate evaluation information. This evaluation information may reflect the prediction efficiency of the machine vision of the image recognition module 12 for the multiple target objects in the target image.
Specifically, the processor 13 may determine a standard frame (also referred to as a first standard frame) corresponding to the first object and a standard frame (also referred to as a second standard frame) corresponding to the second object in the target image. The first standard frame reflects the real position of the first object in the target image and the covered image range. The second standard frame reflects the real position of the second object in the target image and the covered image range. For example, the processor 13 may automatically determine the position and coverage of a certain standard frame according to the setting information corresponding to the target image. Alternatively, the processor 13 may determine the position and coverage of a certain standard frame according to the operation of the user.
The processor 13 may merge a first standard frame corresponding to the first object with a second standard frame corresponding to the second object to obtain another standard frame (also referred to as a third standard frame). The third standard frame covers at least a part of the image range of the first standard frame and at least a part of the image range of the second standard frame. The processor 13 may then obtain the assessment information based on a third criteria box and the prediction box.
FIG. 2 is a schematic diagram of a target image according to an embodiment of the invention. Referring to fig. 2, an image 21 is an example of a target image. The image 21 includes objects 201-206. Each of the objects 201-206 is a target object to be searched for. In the present embodiment, the target objects are baseball patterns as an example. However, in another embodiment, the target objects may be other types of object patterns, and the invention is not limited thereto. For example, in one embodiment, the image 21 is an image of the interior of a human body captured by an endoscope, and the objects 201-206 may be patterns of stones (e.g., gallstones or kidney stones).
It is noted that the shortest distance interval between the objects 201 and 202 in the image 21, the shortest distance interval between the objects 204 and 205 in the image 21, and the shortest distance interval between the objects 205 and 206 in the image 21 are all smaller than the distance threshold. Thus, the objects 201 and 202 belong to a clustered object, and the objects 204-206 also belong to a clustered object. In addition, the shortest distance intervals between the object 203 and other objects are all greater than the distance threshold, so the object 203 does not belong to a clustered object. In another embodiment, the image 21 may not include an object (e.g., the object 203) that does not belong to the clustered object, and the invention is not limited thereto.
FIG. 3 is a diagram illustrating a standard block and a prediction block according to an embodiment of the invention. Referring to fig. 1 to 3, the standard frames 311 to 316 are determined and generated corresponding to the objects 201 to 206, respectively. In addition, after image recognition by machine vision, the prediction frames 301-304 can be determined and generated in sequence. For example, the prediction box 301 may be generated corresponding to at least one of the objects 201 and 202, the prediction box 302 may be generated corresponding to the object 203, the prediction box 303 may be generated corresponding to a noise pattern in the image 21 (indicating a prediction error), and the prediction box 304 may be generated corresponding to at least one of the objects 204 to 206. That is, the image recognition module 12 may consider that the image ranges covered by the prediction frames 301-304 respectively include at least one target object.
In one embodiment, the prediction blocks 301-304 are generated sequentially, and the generation sequence of the prediction blocks 301-304 reflects the confidence of the image recognition result of the image recognition module 12. For example, the prediction box 301 is generated first, which means that the image recognition module 12 considers that the prediction box 301 contains at least one target object most likely; the prediction block 304 is generated last, which means that the image recognition module 12 considers that the prediction block 304 contains at least one target object with a lower probability than the other prediction blocks 301-303.
In one embodiment, the processor 13 may determine whether the standard boxes 311 and 312 corresponding to the clustered object belong to a target group (also referred to as a first target group) corresponding to the prediction box 301. For example, the processor 13 may determine whether the standard box 311 belongs to the first target group according to an overlap state between the standard box 311 and the prediction box 301. Further, the processor 13 may determine whether the standard box 312 belongs to the first target group according to an overlap state between the standard box 312 and the prediction box 301.
In one embodiment, the processor 13 may obtain the image range covered by the standard block 311 and the image range covered by the prediction block 301. The processor 13 may obtain the overlapping state between the standard frame 311 and the prediction frame 301 according to the image range covered by the standard frame 311 and the image range covered by the prediction frame 301. This overlap state reflects the degree of overlap of the image range covered by the standard frame 311 and the image range covered by the prediction frame 301.
In an embodiment, the processor 13 may obtain the overlapping area (also referred to as a first area) between the image range covered by the standard box 311 and the image range covered by the prediction box 301. In an embodiment, the overlapping region between a certain image range and another image range is also referred to as an intersection region. In addition, the processor 13 may obtain the area (also referred to as a second area) of the image range covered by the standard box 311. The processor 13 may then divide the first area by the second area to obtain a value (also referred to as a first value). The processor 13 can determine whether the first value is greater than a predetermined value (also referred to as a first predetermined value). If the first value is greater than the first predetermined value, the processor 13 may determine that the standard box 311 belongs to the first target group. However, if the first value is not greater than the first predetermined value, the processor 13 may determine that the standard box 311 does not belong to the first target group. In this embodiment, the first value is greater than the first predetermined value, so the processor 13 can determine that the standard box 311 belongs to the first target group corresponding to the prediction box 301. In a similar manner, the processor 13 may determine that the standard box 312 also belongs to the first target group corresponding to the prediction box 301 and determine that the standard boxes 314 and 315 belong to the target group (also referred to as a second target group) corresponding to the prediction box 304.
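The group-membership test described above (dividing the overlap area by the area of the standard box and comparing the quotient with the first predetermined value) can be sketched as follows. This is an illustrative sketch, not the patented implementation; boxes are assumed to be axis-aligned (x1, y1, x2, y2) tuples and the threshold value is an assumption.

```python
# Illustrative sketch (not from the patent): a standard box belongs to the target
# group of a prediction box when (intersection area / standard-box area) exceeds
# a first predetermined value.

def area(box):
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)

def intersection_area(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    return area((ix1, iy1, ix2, iy2)) if ix1 < ix2 and iy1 < iy2 else 0

def belongs_to_target_group(standard_box, prediction_box, first_threshold=0.5):
    """First value = overlap area / standard-box area, compared with the threshold."""
    first_value = intersection_area(standard_box, prediction_box) / area(standard_box)
    return first_value > first_threshold
```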
It is noted that, in one embodiment, the processor 13 may determine that the standard box 316 does not belong to the second target group according to the overlapping status between the standard box 316 and the prediction box 304. For example, according to the overlapping state between the standard box 316 and the prediction box 304, the processor 13 may obtain the overlapping area (also referred to as a third area) between the image range covered by the standard box 316 and the image range covered by the prediction box 304. Further, the processor 13 may obtain the area of the image range covered by the standard box 316 (also referred to as a fourth area). The processor 13 may then divide the third area by the fourth area to obtain a value (also referred to as a second value). In this embodiment, the second value is not greater than the first predetermined value, so the processor 13 may determine that the standard box 316 does not belong to the second target group corresponding to the prediction box 304.
FIG. 4 is a diagram illustrating a merge criteria box, according to an embodiment of the invention. Referring to fig. 1 to 4, in response to the standard boxes 311 and 312 both belonging to the first target group corresponding to the prediction box 301, the processor 13 may merge the standard boxes 311 and 312 into one standard box 401. In addition, in response to both standard boxes 314 and 315 belonging to the second target group corresponding to the prediction box 304, the processor 13 may merge the standard boxes 314 and 315 into one standard box 402. It should be noted that in the present embodiment, the standard box 316 does not belong to the second target group corresponding to the prediction box 304, so the merge operation for generating the standard box 402 does not include merging the standard box 316.
It should be noted that, in the embodiment of fig. 4, the image range covered by the standard frame 401 (only) includes the original image range covered by the standard frames 311 and 312. In addition, the image range covered by the standard box 402 (only) includes the original image range covered by the standard boxes 314 and 315. However, in another embodiment, the operation of merging the first standard box and the second standard box further includes merging a partial image region in the target image that does not belong to the first standard box and/or the second standard box.
FIG. 5 is a diagram illustrating a merge criteria box, according to an embodiment of the invention. Referring to fig. 3 and 5, in the present embodiment, in response to the standard boxes 311 and 312 belonging to the first target group corresponding to the prediction box 301, the standard boxes 311 and 312 may be merged into one standard box 501, and the areas 510 and 520 may also be merged into a part of the standard box 501. The regions 510 and 520 are adjacent to at least one of the standard boxes 311 and 312. The regions 510 and 520 do not fall within the coverage of the standard boxes 311 and 312. In addition, in response to both standard boxes 314 and 315 belonging to the second target group corresponding to the prediction box 304, the standard boxes 314 and 315 may be merged into one standard box 502, and the regions 530 and 540 may also be merged as part of the standard box 502. Regions 530 and 540 are adjacent to at least one of standard boxes 314 and 315. The regions 530 and 540 do not fall within the coverage of standard boxes 314 and 315.
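One simple way to realize a merge in the style of FIG. 5, in which the merged standard box also absorbs small regions adjacent to the original boxes, is to take the minimal axis-aligned rectangle that encloses every standard box in the target group. The sketch below assumes that behavior; it is illustrative only and the coordinates in the example are invented.

```python
# Illustrative sketch (not from the patent): merging the standard boxes of a target
# group into the minimal axis-aligned rectangle that encloses all of them (so gaps
# between or around the boxes are absorbed, as in the FIG. 5 style of merging).

def merge_standard_boxes(boxes):
    """Return the minimal enclosing box of a list of (x1, y1, x2, y2) boxes."""
    xs1, ys1, xs2, ys2 = zip(*boxes)
    return (min(xs1), min(ys1), max(xs2), max(ys2))

# Example with invented coordinates, merging two standard boxes into one:
merged = merge_standard_boxes([(10, 10, 40, 50), (35, 20, 70, 60)])
# merged == (10, 10, 70, 60)
```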
FIG. 6 is a diagram illustrating standard boxes and prediction boxes according to an embodiment of the invention. It should be noted that in the embodiment of FIG. 6, the merged standard boxes 501 and 502 of FIG. 5 are used as an example. However, in another embodiment, the merged standard boxes 401 and 402 of FIG. 4 may be used instead, and the invention is not limited thereto.
Referring to fig. 1, 2 and 6, the processor 13 may obtain the evaluation information according to the standard boxes 501, 313, 316 and 502 and the prediction boxes 301 to 304. The evaluation information may reflect the prediction efficiency of the machine vision for the objects 201-206 in the image 21. Taking the prediction box 301 as an example, the processor 13 may obtain the prediction state of the prediction box 301 from the overlap state between the standard box 501 and the prediction box 301. For example, this overlap state reflects the degree of overlap between the image range covered by the standard box 501 and the image range covered by the prediction box 301.
In an embodiment, the processor 13 may obtain the area of the intersection region (also referred to as a fifth area) between the image range covered by the standard box 501 and the image range covered by the prediction box 301. Further, the processor 13 may obtain the area (also referred to as a sixth area) of the union region between the image range covered by the standard box 501 and the image range covered by the prediction box 301. In the present embodiment, the area of this union region is equal to the area of the prediction box 301. The processor 13 may then divide the fifth area by the sixth area to obtain a value (also referred to as a third value). The processor 13 can determine whether the third value is greater than a predetermined value (also referred to as a second predetermined value). If the third value is greater than the second predetermined value, the processor 13 may determine that the objects 201 and 202 have been found by the machine vision. However, if the third value is not greater than the second predetermined value, the processor 13 may determine that the objects 201 and 202 have not been found by the machine vision. In the present embodiment, the third value is greater than the second predetermined value, so the processor 13 can obtain a prediction state corresponding to the prediction box 301 reflecting that the objects 201 and 202 have been found by the machine vision.
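The prediction-state test just described (intersection area divided by union area, compared with the second predetermined value) can be sketched as follows, reusing the area() and intersection_area() helpers from the earlier sketch. Again, this is illustrative only and the threshold value is an assumption.

```python
# Illustrative sketch (not from the patent): scoring a prediction box against a
# (possibly merged) standard box with intersection-over-union. Reuses area() and
# intersection_area() from the earlier sketch.

def objects_found(standard_box, prediction_box, second_threshold=0.5):
    """True if the objects covered by the standard box are considered found."""
    inter = intersection_area(standard_box, prediction_box)
    union = area(standard_box) + area(prediction_box) - inter
    third_value = inter / union if union > 0 else 0.0
    return third_value > second_threshold
```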
Taking the prediction box 302 as an example, the processor 13 may obtain the area of the intersection region (also referred to as a seventh area) between the image range covered by the standard box 313 and the image range covered by the prediction box 302. Further, the processor 13 may obtain the area (also referred to as an eighth area) of the union region between the image range covered by the standard box 313 and the image range covered by the prediction box 302. The processor 13 may then divide the seventh area by the eighth area to obtain a value (also referred to as a fourth value). The processor 13 may determine whether the fourth value is greater than the second predetermined value. In the present embodiment, the fourth value is greater than the second predetermined value, so the processor 13 can obtain a prediction state corresponding to the prediction box 302 reflecting that the object 203 has been found by the machine vision.
Taking the prediction box 303 as an example, the prediction box 303 does not cover any standard box, so the processor 13 may obtain a prediction state corresponding to the prediction box 303 reflecting that the prediction box 303 does not find any target object. Further, taking the prediction box 304 as an example, the processor 13 may obtain the prediction state of the prediction box 304 according to the overlap state between the standard box 502 and the prediction box 304. In the present embodiment, the prediction state of the prediction box 304 reflects that the objects 204 and 205 have been found by the machine vision. In addition, the processor 13 may determine that the object 206 is not found by the machine vision based on the overlap state between the standard box 316 and the prediction box 304. Based on the prediction states corresponding to the prediction boxes 301-304, the processor 13 can obtain the evaluation information reflecting the prediction efficiency of the machine vision for the objects 201-206 in the image 21.
Fig. 7 is a schematic diagram of evaluation information according to an embodiment of the present invention. Referring to fig. 1 to 3, 6 and 7, the processor 13 may update the first type parameters and the second type parameters according to the prediction states corresponding to the prediction blocks 301 to 304 and the generation order (i.e., the prediction order) of the prediction blocks 301 to 304. The processor 13 may then obtain evaluation information 71 based on the first type of parameter and the second type of parameter. In an embodiment, the first type of parameter is also referred to as a precision parameter and/or the second type of parameter is also referred to as a recall parameter.
In the present embodiment, the prediction order 0 represents that no prediction frame has been generated yet, and the prediction orders 1 to 4 represent that the prediction frames 301 to 304 are sequentially generated by image recognition through machine vision, respectively. Corresponding to the prediction order 0, no prediction frame is generated, so the first type parameter and the second type parameter are both initial values (e.g., 0).
The prediction box 301 is generated at prediction order 1. According to the prediction state of the prediction box 301, the objects 201 and 202 are found. Therefore, the processor 13 may update the first type parameter to 1/1 according to the total number (e.g., 1) of standard boxes corresponding to the found objects 201 and 202 (i.e., the standard box 501) and the total number (e.g., 1) of generated prediction boxes (i.e., the prediction box 301). In addition, the processor 13 may update the second type parameter to 2/6 according to the total number (e.g., 2) of original standard boxes corresponding to the found objects 201 and 202 (i.e., the standard boxes 311 and 312) and the total number (e.g., 6) of original standard boxes 311-316. That is, from prediction order 0 to 1, the variation of the second type parameter is 2/6.
The prediction box 302 is generated at prediction order 2. According to the prediction state of the prediction box 302, the object 203 is found. Therefore, the processor 13 may update the first type parameter to 2/2 according to the total number (e.g., 2) of standard boxes 501 and 313 corresponding to the found objects 201-203 and the total number (e.g., 2) of generated prediction boxes 301 and 302. In addition, the processor 13 may update the second type parameter to 3/6 according to the total number (e.g., 3) of original standard boxes 311-313 corresponding to the found objects 201-203 and the total number (e.g., 6) of original standard boxes 311-316. That is, from prediction order 1 to 2, the variation of the second type parameter is 1/6.
The prediction box 303 is generated at prediction order 3. According to the prediction state of the prediction box 303, no target object is found. Therefore, the processor 13 may update the first type parameter to 2/3 according to the total number (e.g., 2) of standard boxes 501 and 313 corresponding to the found objects 201-203 and the total number (e.g., 3) of generated prediction boxes 301-303. Further, the processor 13 may maintain the second type parameter at 3/6. That is, from prediction order 2 to 3, the variation of the second type parameter is 0.
The prediction box 304 is generated at prediction order 4. According to the prediction state of the prediction box 304, the objects 204 and 205 are found. Therefore, the processor 13 may update the first type parameter to 3/4 according to the total number (e.g., 3) of standard boxes 501, 313 and 502 corresponding to the found objects 201-205 and the total number (e.g., 4) of generated prediction boxes 301-304. In addition, the processor 13 may update the second type parameter to 5/6 according to the total number (e.g., 5) of original standard boxes 311-315 corresponding to the found objects 201-205 and the total number (e.g., 6) of original standard boxes 311-316. That is, from prediction order 3 to 4, the variation of the second type parameter is 2/6.
The processor 13 may multiply each variation of the second type parameter by the corresponding first type parameter and obtain the evaluation information 71 according to the sum of the multiplication results. For example, the processor 13 may obtain accuracy information AP of 0.75 according to the following equation (1). The accuracy information AP may reflect that the recognition accuracy (or prediction efficiency) of the image recognition module 12 for the objects 201-206, which include both clustered and non-clustered objects, is about 75%.
AP = (2/6)×(1/1) + (1/6)×(2/2) + (0)×(2/3) + (2/6)×(3/4) = 0.75    (1)
where 2/6, 1/6, 0 and 2/6 are the variations of the second type parameter at prediction orders 1 to 4, and 1/1, 2/2, 2/3 and 3/4 are the corresponding first type parameters.
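As a rough sketch of how the first type parameter (precision), the second type parameter (recall), and the sum of (variation of recall) multiplied by precision could be computed, the following reproduces the FIG. 7 example in code. It is illustrative only; the per-prediction bookkeeping and function names are assumptions, not the patented implementation.

```python
# Illustrative sketch (not from the patent): accumulating precision (first type
# parameter) and recall (second type parameter) over the prediction order and
# summing (change in recall) * precision to obtain the accuracy information AP.

def accuracy_information(per_prediction, total_original_boxes):
    """per_prediction: for each prediction box, in generation order, a tuple
    (new_standard_boxes_hit, new_original_boxes_found).
    total_original_boxes: number of original (unmerged) standard boxes (> 0)."""
    hits = 0           # standard boxes (merged or not) found so far -> precision numerator
    found = 0          # original standard boxes found so far        -> recall numerator
    prev_recall = 0.0
    ap = 0.0
    for order, (new_hits, new_found) in enumerate(per_prediction, start=1):
        hits += new_hits
        found += new_found
        precision = hits / order                    # first type parameter
        recall = found / total_original_boxes       # second type parameter
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# FIG. 7 example: prediction boxes 301-304 contribute (1, 2), (1, 1), (0, 0), (1, 2)
# with 6 original standard boxes, giving AP of approximately 0.75.
print(accuracy_information([(1, 2), (1, 1), (0, 0), (1, 2)], 6))
```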
Compared with a conventional accuracy calculation that does not consider the clustering effect and with the OpenImage algorithm proposed by Google, the accuracy information AP (e.g., 0.75) in the evaluation information 71 more accurately reflects the recognition accuracy of the image recognition module 12 for both clustered and non-clustered objects. In one embodiment, the evaluation information 71 may be used to maintain or modify the image recognition algorithms and/or artificial intelligence modules employed by the image recognition module 12.
Fig. 8 is a flowchart illustrating a performance evaluation method of machine vision according to an embodiment of the invention. Referring to fig. 8, in step S801, an image is obtained. The image presents a plurality of objects (i.e., target objects) and the objects include a first object and a second object. In step S802, image recognition is performed on the image via machine vision to obtain a prediction box corresponding to at least one of the first object and the second object. In step S803, the first standard frame corresponding to the first object and the second standard frame corresponding to the second object are merged to obtain a third standard frame. In step S804, evaluation information is obtained according to the third standard box and the prediction box. The assessment information reflects a prediction efficiency of the machine vision for a target object in the image.
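Tying the above sketches together, the following illustrates one possible flow for steps S802 to S804 on a single image whose standard boxes are already known (step S801, obtaining the image and its standard boxes, is assumed to happen upstream). It reuses the helper functions from the earlier sketches; the flow and all names are assumptions made for illustration, not the patented implementation.

```python
# Illustrative end-to-end sketch (not from the patent) of steps S802-S804, reusing
# belongs_to_target_group(), merge_standard_boxes(), objects_found() and
# accuracy_information() from the earlier sketches.

def evaluate_machine_vision(standard_boxes, prediction_boxes,
                            first_threshold=0.5, second_threshold=0.5):
    """standard_boxes / prediction_boxes: lists of (x1, y1, x2, y2) tuples;
    prediction_boxes are given in generation order (most confident first)."""
    per_prediction = []
    for pred in prediction_boxes:                    # S802: boxes from image recognition
        group = [sb for sb in standard_boxes
                 if belongs_to_target_group(sb, pred, first_threshold)]
        if not group:                                # prediction box hit no standard box
            per_prediction.append((0, 0))
            continue
        merged = merge_standard_boxes(group)         # S803: merge the group's standard boxes
        if objects_found(merged, pred, second_threshold):
            per_prediction.append((1, len(group)))   # one merged box hit, len(group) originals found
        else:
            per_prediction.append((0, 0))
    return accuracy_information(per_prediction, len(standard_boxes))  # S804: evaluation information
```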
In summary, after an image presenting a plurality of target objects is obtained, image recognition may be performed on the image via machine vision to obtain a prediction box corresponding to at least one of a first object and a second object. Then, a first standard box corresponding to the first object and a second standard box corresponding to the second object may be merged to obtain a third standard box. Evaluation information may be obtained according to the third standard box and the prediction box, and the evaluation information reflects the prediction efficiency of the machine vision for the objects in the image. Therefore, the inability of existing machine vision evaluation mechanisms to effectively score the recognition of clustered objects can be effectively remedied. Furthermore, the evaluation information may be used to maintain or modify the image recognition algorithms and/or artificial intelligence modules employed by the image recognition module, thereby improving image recognition techniques and/or image recognition devices.
Although the present invention has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the invention.

Claims (16)

1. A method for machine vision performance assessment, comprising:
obtaining an image, wherein the image presents a plurality of objects and the plurality of objects comprises a first object and a second object;
performing image recognition on the image via machine vision to obtain a prediction box corresponding to at least one of the first object and the second object;
merging a first standard frame corresponding to the first object and a second standard frame corresponding to the second object to obtain a third standard frame; and
obtaining assessment information from the third criteria box and the prediction box, wherein the assessment information reflects a prediction efficiency of the machine vision for the plurality of objects in the image.
2. The method of claim 1, wherein a shortest distance between the first object and the second object in the image is less than a distance threshold.
3. The machine-vision performance evaluation method of claim 1, wherein merging the first standard box corresponding to the first object with the second standard box corresponding to the second object to obtain the third standard box comprises:
judging whether the first standard frame belongs to a target group or not;
judging whether the second standard frame belongs to the target group; and
merging the first standard box and the second standard box to obtain the third standard box in response to the first standard box and the second standard box both belonging to the target group.
4. The method of claim 3, wherein the step of determining whether the first criteria box belongs to the target group comprises:
and judging that the first standard box belongs to the target group according to the overlapping state between the first standard box and the prediction box.
5. The machine-vision performance evaluation method of claim 1, wherein merging the first standard box corresponding to the first object with the second standard box corresponding to the second object to obtain the third standard box comprises:
merging partial areas of the image not belonging to the first standard frame and the second standard frame into a part of the third standard frame.
6. The machine-vision performance evaluation method of claim 1, wherein the step of obtaining the evaluation information according to the third criterion block and the prediction block comprises:
obtaining a predicted state of the predicted box according to an overlap state between the third standard box and the predicted box, wherein the predicted state reflects that both the first object and the second object are found by the machine vision; and
and obtaining the evaluation information according to the prediction state.
7. The machine-vision performance evaluation method of claim 6, wherein the step of obtaining the evaluation information based on the predicted status comprises:
updating the first type of parameters and the second type of parameters according to the prediction state and the generation sequence of the prediction frame; and
and obtaining the evaluation information according to the first type of parameters and the second type of parameters.
8. The machine-vision performance assessment method of claim 7, further comprising:
updating the first type parameter according to the total number of the prediction frames; and
and updating the second type of parameters according to the total number of the first standard boxes and the second standard boxes.
9. A machine-vision performance assessment system, comprising:
the storage device stores an image, wherein the image presents a plurality of objects and the plurality of objects comprise a first object and a second object;
an image recognition module; and
a processor connected with the storage device and the image recognition module,
wherein the image recognition module performs image recognition on the image via machine vision to obtain a prediction box corresponding to at least one of the first object and the second object,
the processor merges a first standard frame corresponding to the first object and a second standard frame corresponding to the second object to obtain a third standard frame, and
the processor obtains evaluation information according to the third criterion box and the prediction box, wherein the evaluation information reflects prediction efficiency of the machine vision on the plurality of objects in the image.
10. The machine-vision performance evaluation system of claim 9, wherein a shortest distance between the first object and the second object in the image is less than a distance threshold.
11. The machine-vision performance evaluation system of claim 9, wherein the operation of the processor merging the first standard box corresponding to the first object with the second standard box corresponding to the second object to obtain the third standard box comprises:
judging whether the first standard frame belongs to a target group or not;
judging whether the second standard frame belongs to the target group; and
merging the first standard box and the second standard box to obtain the third standard box in response to the first standard box and the second standard box both belonging to the target group.
12. The machine-vision performance evaluation system of claim 11, wherein the processor determining whether the first criteria box belongs to the target group comprises:
and judging that the first standard box belongs to the target group according to the overlapping state between the first standard box and the prediction box.
13. The machine-vision performance evaluation system of claim 9, wherein the operation of the processor merging the first standard box corresponding to the first object with the second standard box corresponding to the second object to obtain the third standard box comprises:
merging partial areas of the image not belonging to the first standard frame and the second standard frame into a part of the third standard frame.
14. The machine-vision performance evaluation system of claim 9, wherein the processor obtaining the evaluation information according to the third criteria block and the prediction block comprises:
obtaining a predicted state of the predicted box according to an overlap state between the third standard box and the predicted box, wherein the predicted state reflects that both the first object and the second object are found by the machine vision; and
and obtaining the evaluation information according to the prediction state.
15. The machine-vision performance evaluation system of claim 14, wherein the processor obtaining the evaluation information based on the predicted status comprises:
updating the first type of parameters and the second type of parameters according to the prediction state and the generation sequence of the prediction frame; and
and obtaining the evaluation information according to the first type of parameters and the second type of parameters.
16. The machine-vision performance evaluation system of claim 15, wherein the processor further updates the first class of parameters based on a total number of the prediction boxes and updates the second class of parameters based on a total number of the first standard boxes and the second standard boxes.
CN201811235451.8A 2018-10-23 2018-10-23 Machine vision efficiency evaluation method and system Pending CN111091022A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811235451.8A CN111091022A (en) 2018-10-23 2018-10-23 Machine vision efficiency evaluation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811235451.8A CN111091022A (en) 2018-10-23 2018-10-23 Machine vision efficiency evaluation method and system

Publications (1)

Publication Number Publication Date
CN111091022A 2020-05-01

Family

ID=70392580

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811235451.8A Pending CN111091022A (en) 2018-10-23 2018-10-23 Machine vision efficiency evaluation method and system

Country Status (1)

Country Link
CN (1) CN111091022A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110317906A1 (en) * 2005-11-12 2011-12-29 Aaron Wallack Automatically determining machine vision tool parameters
CN102982559A (en) * 2012-11-28 2013-03-20 大唐移动通信设备有限公司 Vehicle tracking method and system
CN107657626A (en) * 2016-07-25 2018-02-02 浙江宇视科技有限公司 The detection method and device of a kind of moving target
WO2018121841A1 (en) * 2016-12-27 2018-07-05 Telecom Italia S.P.A. Method and system for identifying targets in scenes shot by a camera
CN107871119A (en) * 2017-11-01 2018-04-03 西安电子科技大学 A kind of object detection method learnt based on object space knowledge and two-stage forecasting

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Ce et al.: "A visual perception object detection algorithm for high-resolution remote sensing images" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160149A (en) * 2021-03-31 2021-07-23 杭州海康威视数字技术股份有限公司 Target display method and device, electronic equipment and endoscope system
CN113160149B (en) * 2021-03-31 2024-03-01 杭州海康威视数字技术股份有限公司 Target display method and device, electronic equipment and endoscope system
CN114598811A (en) * 2022-01-18 2022-06-07 影石创新科技股份有限公司 Panoramic video view quality evaluation method, electronic device and computer-readable storage medium

Similar Documents

Publication Publication Date Title
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN108805898B (en) Video image processing method and device
US8108324B2 (en) Forward feature selection for support vector machines
US20180025249A1 (en) Object Detection System and Object Detection Method
CN110807385A (en) Target detection method and device, electronic equipment and storage medium
CN112966696A (en) Method, device and equipment for processing three-dimensional point cloud and storage medium
CN110765860A (en) Tumble determination method, tumble determination device, computer apparatus, and storage medium
CN107886082B (en) Method and device for detecting mathematical formulas in images, computer equipment and storage medium
CN110969200B (en) Image target detection model training method and device based on consistency negative sample
US9208374B2 (en) Information processing apparatus, control method therefor, and electronic device
CN112115805B (en) Pedestrian re-recognition method and system with bimodal difficult-to-excavate ternary-center loss
CN112132265A (en) Model training method, cup-to-tray ratio determining method, device, equipment and storage medium
CN110490058B (en) Training method, device and system of pedestrian detection model and computer readable medium
Joo et al. Real‐Time Depth‐Based Hand Detection and Tracking
CN114220076A (en) Multi-target detection method, device and application thereof
CN110414562B (en) X-ray film classification method, device, terminal and storage medium
CN111091022A (en) Machine vision efficiency evaluation method and system
CN112215271A (en) Anti-occlusion target detection method and device based on multi-head attention mechanism
CN113807407B (en) Target detection model training method, model performance detection method and device
CN108475339B (en) Method and system for classifying objects in an image
US10983892B2 (en) Method and system for estimating efficiency of machine vision
CN113674152A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113536859A (en) Behavior recognition model training method, recognition method, device and storage medium
CN110807452A (en) Prediction model construction method, device and system and bank card number identification method
JP7270127B2 (en) Classification system, classification method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200501