WO2024009661A1

WO2024009661A1 - Estimation system, estimation method, and estimation program

Info

Publication number: WO2024009661A1
Application number: PCT/JP2023/020468
Authority: WO
Inventors: 博史峰野; 健太郎平原
Original assignee: 国立大学法人静岡大学
Priority date: 2022-07-04
Filing date: 2023-06-01
Publication date: 2024-01-11

Abstract

An estimation system includes at least one processor. The at least one processor acquires a video showing a plant such that the position of the plant changes, performs object detection on each of a plurality of frames of the video to detect an instance corresponding to the plant, identifies, as an instance history, the transition of the instance including two or more feature points corresponding to two or more nodes of the plant, estimates the position of each of the two or more nodes on the basis of the transition of each of the two or more feature points included in the instance history, and estimates an internodal distance, which is a distance between two nodes, on the basis of the estimated two or more positions.

Description

Estimation system, estimation method, and estimation program

One aspect of the present disclosure relates to an estimation system, an estimation method, and an estimation program.

Patent Document 1 and Non-Patent Document 1 describe environmental control systems for plant cultivation. Such a system includes a distance image acquisition means that outputs a distance image associated with distance information representing the distance to a subject including a plant photographed from above, an environment adjustment means that adjusts the environment in which the plant is cultivated, and an environment control means that controls the environment adjustment means based on the height of the plant detected from the distance image.

Unexamined Japanese Patent Publication No. 2016-52293

The internodal distance, which is the distance between nodes of a plant, is closely related to the growth state of the plant. However, the above-mentioned environmental control system cannot obtain the internodal distance. Therefore, it is desired to accurately understand the internodal distance of plants.

An estimation system according to one aspect of the present disclosure includes at least one processor. The at least one processor acquires a video showing a plant such that the position of the plant changes, performs object detection on each of a plurality of frames of the video to detect an instance corresponding to the plant, identifies, as an instance history, the transition of the instance including two or more feature points corresponding to two or more nodes of the plant, estimates the position of each of the two or more nodes based on the transition of each of the two or more feature points included in the instance history, and estimates an internodal distance, which is the distance between two nodes, based on the estimated two or more positions.

In this aspect, the transition of two or more feature points corresponding to two or more nodes of the plant is identified over two or more frames of a video showing the plant, the positions of those nodes are estimated based on the transition, and the internodal distance is estimated based on those positions. Because the position of the plant changes in the video, each feature point also changes over two or more frames, which provides multiple pieces of information about each feature point. By referring to these multiple pieces of information, the positions of the plant nodes can be accurately determined. As a result, it is possible to accurately grasp the internodal distance, which is closely related to the growth state of the plant.

According to one aspect of the present disclosure, the internodal distance of a plant can be accurately determined.

FIG. 1 is a diagram showing an example of an estimation system.
FIG. 2 is a diagram showing an example of a video processed by the estimation system.
FIG. 3 is a diagram showing an example of a hardware configuration of a computer that constitutes the estimation system.
FIG. 4 is a flowchart showing an example of processing by the estimation system.
FIG. 5 is a flowchart showing an example of setting an instance.
FIG. 6 is a diagram showing an example of an instance.
FIG. 7 is a diagram showing an example of a method for tracking instances.
FIG. 8 is a diagram showing an example of an instance history.
FIG. 9 is a flowchart showing an example of a method for estimating the position of a node.
FIG. 10 is a diagram showing an example of estimating the position of a node.

Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the description of the drawings, the same or equivalent elements are denoted by the same reference numerals, and redundant description will be omitted.

[System configuration]
The estimation system 1 according to the present disclosure is a computer system that estimates the growth state of plants. Users of the estimation system 1 can refer to the estimation results to understand the degree or quality of plant growth, to appropriately control the cultivation environment, for example through irrigation control and air conditioning control, and to predict yields. The estimation system 1 can estimate the growth state of various types of plants. The plants may be cultivated or native.

The estimation system 1 estimates the positions of the nodes of the plant as at least part of the growth state. In the present disclosure, a node refers to a portion where a leaf attaches to a stem, and a leaf refers to an entire lateral branch that branches from a stem. Therefore, individual nodes are present on the stem. The estimation system 1 represents the position of a node using position coordinates, for example, represents the position of a node using a coordinate system in which the horizontal direction is the X axis and the height direction is the Y axis. The estimation system 1 may represent the position coordinates indicating the position of the node by the Y coordinate without using the X coordinate, or by the X coordinate and the Y coordinate. The estimation system 1 may further estimate the internodal distance, which is the distance between two nodes, as at least part of the growth state. For example, the estimation system 1 may estimate the distance between two consecutive nodes as the internodal distance. Alternatively, the estimation system 1 may estimate the distance between the first node and the second node, which are located so as to sandwich one or more other nodes, as the internodal distance.

FIG. 1 is a diagram showing an example of the estimation system 1. In this example, the estimation system 1 is connected via a predetermined communication network to a database 2, which is a device that stores videos showing one or more plants. The database 2 may be a component of the estimation system 1 or may be located outside the estimation system 1. In one example, each video stored in the database 2 shows one or more plants 9 such that each plant 9 flows laterally, that is, such that the position of each plant 9 changes along the lateral direction. "A video showing a plant such that the position of the plant changes" means that the position of the plant changes within the video.

The video may be photographed manually, or may be photographed by the self-propelled camera 3 in order to reduce or eliminate the labor involved in photographing. The self-propelled camera 3 is a device that takes images of one or more plants 9 such that the position of the one or more plants 9 changes. In one example, the self-propelled camera 3 executes photography while moving along a row of a plurality of plants 9 so that both ends of the stems of the plants 9 are captured. For example, the self-propelled camera 3 takes pictures while moving along the length of the cultivation bed. The self-propelled camera 3 transmits the captured video to the database 2 via a predetermined communication network. The self-propelled camera 3 may be a component of the estimation system 1 or may be located outside the estimation system 1.

FIG. 2 is a diagram showing an example of a video processed by the estimation system 1. Video 20 in this example is composed of a plurality of frames including frames 21, 22, and 23. In FIG. 2, the plurality of frames are arranged sequentially along the traveling direction of the self-propelled camera 3 so that the video 20 is represented like a panoramic image.

Returning to FIG. 1, the estimation system 1 includes an acquisition unit 11, a detection unit 12, a tracking unit 13, and an estimation unit 14 as functional components.

The acquisition unit 11 is a functional module that acquires the video to be processed from the database 2.

The detection unit 12 is a functional module that executes object detection, which is a process of detecting an object in an image, on each of a plurality of frames constituting the video, and sets an instance corresponding to a plant in each frame. The instance is information indicating the plant 9 shown in the frame. In one example, an instance includes a bounding box corresponding to a stem and one or more feature points corresponding to one or more nodes. For example, an instance includes the bounding box and two or more feature points corresponding to two or more nodes. Feature points are also called key points. The instance may further include a feature point corresponding to the apex of the stem.
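The instance described above can be pictured as a small record. The following is only a sketch for illustration: the class name `Instance` and the fields `box`, `nodes`, `apex`, and `score` are hypothetical and do not appear in the disclosure, and the coordinate layout is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

Point = tuple[float, float]  # (x, y) in image coordinates (assumed layout)


@dataclass
class Instance:
    """One plant as seen in one frame (hypothetical structure)."""
    box: tuple[float, float, float, float]  # stem bounding box: x_min, y_min, x_max, y_max
    nodes: list[Optional[Point]] = field(default_factory=list)  # one feature point per node
    apex: Optional[Point] = None  # feature point for the stem apex, if any
    score: float = 0.0  # detection score, if the detector provides one

    def has_feature_points(self) -> bool:
        # True when at least one node or the apex carries a position.
        return self.apex is not None or any(p is not None for p in self.nodes)


detected = Instance(box=(10, 20, 40, 200), nodes=[(25, 180), (26, 120)],
                    apex=(27, 30), score=0.93)
box_only = Instance(box=(10, 20, 40, 200))
```

An instance that carries only a bounding box, such as `box_only`, has no feature points and therefore contributes no node positions.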

In one example, the detection unit 12 may detect instances from individual frames by object detection using a trained model generated by machine learning. The trained model receives an input image and estimates an instance that includes one or more feature points and a bounding box. The detection unit 12 inputs frames to this trained model and detects instances. Machine learning is a method of autonomously discovering laws or rules by iteratively learning based on given information. An example of an object detection algorithm based on machine learning is Mask R-CNN, which uses deep learning.

In one example, a trained model is generated by having a machine learning model repeat learning using training data in which annotations regarding bounding boxes and feature points (i.e., correct answers) are added to each image. In order to reduce the time and effort required to prepare a dataset, the final trained model may be generated by fine-tuning an existing trained model, that is, by performing machine learning again using a small dataset. Generation of a trained model corresponds to the learning phase. It should be noted that the trained model is a machine learning model that is estimated to be optimal for detecting an object in an image, and is not necessarily an "actually optimal machine learning model." The process in which the detection unit 12 obtains a bounding box and feature points from a frame using a trained model corresponds to the estimation phase or the operation phase. Trained models are portable between computer systems. Therefore, the detection unit 12 may use a trained model generated by another computer system. Alternatively, the estimation system 1 may further include a function of generating a trained model.

The tracking unit 13 is a functional module that tracks the transition of instances of each plant 9 over a plurality of frames and specifies the transition as an instance history. The transition of an instance refers to a change in the position of an instance that occurs over two or more consecutive frames. The transition of the instance includes the transition of the bounding box (change in position) and the transition of each feature point (change in position). The instance history indicates the history of the position of the bounding box and the history of the position of each feature point. The position history can be rephrased as the history of position coordinates.

The estimation unit 14 is a functional module that estimates the position of each of one or more nodes based on the transition of each of one or more feature points included in the instance history. For example, the estimation unit 14 estimates the positions of two or more nodes based on the transitions of two or more feature points included in the instance history.

FIG. 3 is a diagram showing an example of the hardware configuration of the computer 100 that constitutes the estimation system 1. For example, the computer 100 includes a processor 101, a main storage unit 102, an auxiliary storage unit 103, a communication control unit 104, an input device 105, and an output device 106. The processor 101 executes an operating system and application programs. The main storage unit 102 is composed of, for example, a ROM and a RAM. The auxiliary storage unit 103 is composed of, for example, a hard disk or a flash memory, and generally stores a larger amount of data than the main storage unit 102. The communication control unit 104 is composed of, for example, a network card or a wireless communication module. The input device 105 includes, for example, a keyboard, a mouse, a touch panel, and the like. The output device 106 includes, for example, a monitor and a speaker.

The computer 100 may be a stationary or portable personal computer (PC), a workstation, or a mobile terminal such as a high-performance mobile phone (smartphone), a mobile phone, or a personal digital assistant (PDA).

Each functional element of the estimation system 1 is realized by an estimation program 110 stored in advance in the auxiliary storage unit 103. Specifically, each functional element is realized by loading an estimation program 110 onto the processor 101 or the main storage unit 102 and causing the processor 101 to execute the estimation program 110. The processor 101 operates the communication control unit 104, the input device 105, or the output device 106 according to the estimation program 110, and reads and writes data in the main storage unit 102 or the auxiliary storage unit 103. Data or databases required for processing may be stored in the main storage unit 102 or the auxiliary storage unit 103.

The estimation program 110 may be provided after being fixedly recorded on a tangible recording medium such as a CD-ROM, DVD-ROM, or semiconductor memory, for example. Alternatively, the estimation program 110 may be provided via a communication network as a data signal superimposed on a carrier wave.

The estimation system 1 may be composed of one computer 100 or may be composed of multiple computers 100. When a plurality of computers 100 are used, one estimation system 1 is logically constructed by connecting these computers 100 via a communication network such as the Internet or an intranet. The estimation system 1 may be constructed by combining multiple types of computers.

[System operation]
FIG. 4 is a flowchart showing an example of processing by the estimation system 1 as a processing flow S1.

In step S11, the acquisition unit 11 accesses the database 2 and acquires the video to be processed. The acquisition unit 11 may read a video specified by a user operation from the database 2, or may read a video specified based on a predetermined algorithm. The acquisition unit 11 may acquire the entirety of one video, or may acquire a partial section of one video.

In step S12, the detection unit 12 and the tracking unit 13 cooperate to set an instance corresponding to a plant in each of a plurality of frames of the video. An example of this processing will be explained with reference to FIG. 5. FIG. 5 is a flowchart showing an example of setting an instance. In FIG. 5, a variable i is used to identify each frame.

In steps S121 and S122, the detection unit 12 performs object detection on the first frame and detects an instance from that frame. The detection unit 12 can detect two or more instances from one frame. In one example, the detection unit 12 inputs the frame to a trained model based on Mask R-CNN and obtains one or more instances output from the trained model. FIG. 6 is a diagram showing an example of an instance. In this example, for the plant 9 shown in the frame, the detection unit 12 detects an instance 200 that includes a bounding box 201 corresponding to the stem, a plurality of feature points 202 corresponding to a plurality of nodes existing on the stem, and a feature point 203 corresponding to the apex of the stem.

In step S123, the detection unit 12 eliminates instance duplication. Instance duplication refers to a situation where another instance is located within the outer edge of a certain instance, for example, a situation where two bounding boxes intersect or overlap. Three or more instances may be involved in one duplication. In one example, the detection unit 12 calculates the IoU (Intersection over Union) between instances and determines whether there is any duplication of instances. IoU is the ratio of the intersection of two instances (i.e., the AND area) to the union of the two instances (i.e., the OR area). The detection unit 12 may identify instance duplication when the IoU is greater than a given threshold. The detection unit 12 may identify two or more duplications within one frame.

If one or more overlaps exist within one frame, the detection unit 12 eliminates each overlap. In one example, the detection unit 12 obtains a detection score indicating the degree of certainty of detection for each of two or more instances related to one duplication. The detection unit 12 may use a detection score output from a trained model based on Mask R-CNN or the like. The detection unit 12 maintains the instance with the highest detection score and deletes the remaining instances to eliminate duplication. If there is no overlap within the frame, step S123 is omitted.
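The duplicate elimination of step S123 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes axis-aligned bounding boxes given as `(x_min, y_min, x_max, y_max)`, represents each detection as a dictionary with hypothetical keys `box` and `score`, and uses a greedy keep-the-highest-score pass (commonly called non-maximum suppression), which the source does not name. The threshold value 0.5 is likewise an assumption.

```python
def iou(a, b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h                      # AND area
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)  # OR area
    return inter / union if union > 0 else 0.0


def remove_duplicates(detections, threshold=0.5):
    """Keep the highest-scoring instance of each group of overlapping instances."""
    kept = []
    # Visit detections from most to least certain.
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        # Keep a detection only if it does not duplicate an already kept one.
        if all(iou(det["box"], k["box"]) <= threshold for k in kept):
            kept.append(det)
    return kept


detections = [
    {"box": (0, 0, 10, 10), "score": 0.9},
    {"box": (1, 0, 11, 10), "score": 0.6},   # overlaps the first box heavily
    {"box": (50, 0, 60, 10), "score": 0.7},  # separate plant
]
kept = remove_duplicates(detections)
```

Here the second detection is dropped because its IoU with the first exceeds the threshold and its score is lower.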

As shown in step S124, the next process changes depending on whether the first frame is processed or the second and subsequent frames are processed. If the instance in the first frame is set (NO in step S124), the process advances to step S125, and then to step S122. That is, the detection unit 12 performs object detection on the next frame, that is, the second frame, and detects an instance from that frame. In step S123, the detection unit 12 eliminates instance duplication for the second frame as necessary. On the other hand, if the second or subsequent frames have been processed (YES in step S124), the process proceeds to step S126.

In step S126, the tracking unit 13 complements undetected instances. An "undetected instance" is an instance that should have been detected in the i-th frame (second frame), considering the change from the (i-1)th frame (first frame) to the i-th frame, but was not detected by object detection. In other words, an "undetected instance" is a subsequent instance (second instance) that corresponds to a preceding instance (first instance) set in the (i-1)th frame and that was not detected in the i-th frame. In one example, the detection unit 12 applies a method called DaSiamRPN to the (i-1)th frame and sets the undetected instance by complementation. In this disclosure, an instance obtained by complementation is also referred to as a complementary instance. The complementary instance is treated as a second instance. The tracking unit 13 sets a bounding box for each complementary instance without setting feature points. Therefore, a complementary instance contains a bounding box that indicates the stem, but no feature points that indicate the nodes or the apex.

In step S127, the tracking unit 13 tracks the movement of the instance between the (i-1)th frame and the i-th frame. FIG. 7 is a diagram showing an example of the tracking method. In this example, it is assumed that three first instances 311 to 313 are set in the (i-1)th frame (first frame) 301 and five second instances 321 to 325 are set in the i-th frame (second frame) 302. At least one of the second instances 321 to 325 may be a complementary instance. In one example, the tracking unit 13 compares the first instances 311 to 313 with the second instances 321 to 325 and associates the instances located closest to each other between these two frames. For example, the tracking unit 13 virtually superimposes the second frame 302 on the first frame 301 and calculates the degree of overlap between each first instance and each second instance. Subsequently, for each first instance, the tracking unit 13 selects a second instance corresponding to the first instance based on the degree of overlap. For example, the tracking unit 13 calculates IoU as the degree of overlap and selects the second instance that has the highest IoU with respect to the first instance. The tracking unit 13 then associates the selected second instance with the first instance as at least part of the instance history. Through this association, the tracking unit 13 recognizes that the first instance and the second instance are the same instance and identifies the transition of that one instance from the first frame to the second frame. As described above, the transition of the instance indicates a change in the position of the bounding box and a change in the position of each of the one or more feature points. In the example of FIG. 7, the tracking unit 13 associates the second instance 321 with the first instance 311 and identifies these two instances as a transition 341 of the instance 331. Using a similar method, the tracking unit 13 identifies the transition 342 of the instance 332 from the first instance 312 and the second instance 322, and identifies the transition 343 of the instance 333 from the first instance 313 and the second instance 323.
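The association by highest IoU described for step S127 can be sketched as follows. The sketch assumes boxes in the form `(x_min, y_min, x_max, y_max)`; the function name `associate` and the `min_iou` parameter are hypothetical. The source does not say how to resolve the case where two first instances select the same second instance, so this sketch leaves that case unresolved.

```python
def iou(a, b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(first_boxes, second_boxes, min_iou=0.0):
    """For each first instance, pick the second instance with the highest IoU.

    Returns (links, isolated): links maps a first index to a second index,
    and isolated lists second indices that were not selected by any first instance.
    """
    links = {}
    for i, first in enumerate(first_boxes):
        best_j, best_iou = None, min_iou
        for j, second in enumerate(second_boxes):
            value = iou(first, second)
            if value > best_iou:  # strictly better overlap than seen so far
                best_j, best_iou = j, value
        if best_j is not None:
            links[i] = best_j
    linked = set(links.values())
    isolated = [j for j in range(len(second_boxes)) if j not in linked]
    return links, isolated


# Three first instances and five second instances, loosely mirroring FIG. 7
# (the coordinates are made up for illustration).
first = [(0, 0, 10, 100), (30, 0, 40, 100), (60, 0, 70, 100)]
second = [(2, 0, 12, 100), (32, 0, 42, 100), (62, 0, 72, 100),
          (45, 0, 50, 100), (90, 0, 100, 100)]
links, isolated = associate(first, second)
```

With these made-up coordinates, each first instance is linked to the second instance shifted slightly to the right, and the last two second instances remain isolated.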

The tracking unit 13 may treat an isolated instance in the i-th frame (second frame), that is, an instance that is not associated with any instance in the (i-1)th frame (first frame), as a new instance, and record it as a plant that newly appeared in the video. Alternatively, the tracking unit 13 may delete the isolated instance as a false detection. The tracking unit 13 may newly record an isolated instance located in or near a region that is not shown in the first frame but is shown in the second frame. The tracking unit 13 may delete, as a false detection, an isolated instance located in a region shown in both the first frame and the second frame. In the example of FIG. 7, the second instances 324 and 325 are isolated instances. In that example, the tracking unit 13 deletes the second instance 324 and records the second instance 325 as a new instance 334.

The tracking unit 13 excludes instances that are about to leave the video from the tracking targets. In one example, the tracking unit 13 sets, as a threshold, a position a predetermined number of pixels (for example, several pixels) inside the edge of the video (frame), and may exclude instances that have moved beyond this threshold toward the edge of the video (frame). This exclusion prevents instances that are no longer visible in the video from continuing to be erroneously detected and tracked.
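The edge-based exclusion can be sketched as a simple margin test. This sketch assumes that plants move laterally, so only the left and right edges are checked; the function name `near_edge` and the default margin of 5 pixels are hypothetical choices, since the source only says "several pixels".

```python
def near_edge(box, frame_width, margin=5):
    """Return True when a box (x_min, y_min, x_max, y_max) crosses the margin
    set just inside the left or right edge of the frame."""
    x_min, _, x_max, _ = box
    return x_min < margin or x_max > frame_width - margin


# Keep only the instances that are still safely inside a 640-pixel-wide frame.
boxes = [(2, 0, 20, 100), (100, 0, 120, 100), (620, 0, 638, 100)]
tracked = [b for b in boxes if not near_edge(b, frame_width=640)]
```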

As another example regarding steps S126 and S127, the tracking unit 13 may change the execution order of these two steps. That is, the tracking unit 13 may first track the movement of instances between the (i-1)th frame and the i-th frame, and then use a method such as DaSiamRPN to complement instances in the i-th frame that were not found by this tracking. The tracking unit 13 associates the second instance complemented in the i-th frame with the first instance in the (i-1)th frame used for the complementation. Through this association, the tracking unit 13 identifies the first instance and the second instance as one instance and identifies the transition of that one instance from the first frame to the second frame.

As shown in step S128, if there is an unprocessed frame (NO in step S128), the process proceeds to step S125, and the processes from step S122 onward are executed for the next frame. On the other hand, if all frames have been processed (YES in step S128), step S12 ends, and for each instance, an instance history including a position history of the bounding box and a position history of each of the one or more feature points is determined. Assuming that the position of an instance obtained in one frame is one record for that instance, the instance history of an instance that appears in n consecutive frames is represented by n records.

FIG. 8 is a diagram showing an example of an instance history. This example shows the instance history of one instance over five consecutive frames 401 to 405. This instance history includes six feature points 411 to 416 corresponding to six nodes and a feature point 417 corresponding to the apex of the stem. In frames 401 and 403 to 405, instances including feature points are set, which indicates that instances were detected by object detection in those four frames. On the other hand, in frame 402, the instance does not include feature points, which indicates that the instance is a complementary instance. Even if an instance is detected by object detection for a certain frame, not all feature points are necessarily detected in the instance. Focusing on the feature point 411 in FIG. 8, the feature point 411 is detected in frames 401 and 404, but not in frames 403 and 405.
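An instance history with missing feature points can be sketched as one record per frame. This is only an illustration: the dictionary keys `frame` and `nodes`, the use of `None` for missing data, and all coordinate values are assumptions and are not taken from FIG. 8.

```python
# One record per frame. "nodes" is None for a complementary instance
# (bounding box only), and an individual entry is None when that feature
# point was not detected in the frame. All coordinates are made up.
records = [
    {"frame": 401, "nodes": [(120.0, 410.0), (121.0, 330.0)]},
    {"frame": 402, "nodes": None},                      # complementary instance
    {"frame": 403, "nodes": [None, (101.0, 331.0)]},    # first node missed
    {"frame": 404, "nodes": [(90.0, 412.0), (91.0, 329.0)]},
    {"frame": 405, "nodes": [None, (81.0, 330.0)]},     # first node missed
]


def position_history(records, node_index):
    """Collect the detected positions of one feature point across all frames."""
    history = []
    for record in records:
        nodes = record["nodes"]
        if nodes is None:
            continue  # complementary instance: no feature points at all
        point = nodes[node_index]
        if point is not None:
            history.append(point)
    return history


first_node = position_history(records, 0)
second_node = position_history(records, 1)
```

In this made-up history the first feature point has positions only from frames 401 and 404, which matches the situation described for the feature point 411.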

Returning to FIG. 4, in step S13, the estimation unit 14 estimates the position of each node for each instance, that is, for each plant. In one example, the estimation unit 14 estimates the position of each plant node while selecting instances one by one from the first frame to the last frame. An example of this processing will be explained with reference to FIG. 9. FIG. 9 is a flowchart showing an example of a method for estimating the position of a node.

In step S131, the estimation unit 14 selects one instance, that is, one plant, from within the video. The estimating unit 14 may identify, for each of one or more instances, the number of frames in which the instance is identified, that is, the number of records in the instance history, and select instances for which the number is equal to or greater than a threshold.

In step S132, the estimation unit 14 refers to the instance history of the instance and identifies the position history of one feature point.

In step S133, the estimation unit 14 removes outliers from the identified position history and calculates statistical values of the set of valid positions, that is, the remaining portion of the position history. In one example, the estimation unit 14 calculates a confidence interval by performing interval estimation using a plurality of position coordinates indicated in the position history as a population, and excludes position coordinates that deviate from this confidence interval as outliers. In this process, the estimation unit 14 may calculate an 80% confidence interval, or may calculate a confidence interval using another numerical range. The estimation unit 14 identifies a set of position coordinates within the confidence interval as a set of valid positions, and calculates statistical values for the set. The estimation unit 14 may use the median value as the statistical value, or may use other indicators such as the average value.
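Step S133 can be sketched for a one-dimensional position history, for example the Y coordinates of one feature point. The source does not specify how the confidence interval is computed, so this sketch makes a loud assumption: it treats the positions as roughly normally distributed and keeps the values inside the central 80% range, mean ± z·σ. The function name `robust_position` and the fallback for very short histories are also hypothetical.

```python
from statistics import NormalDist, mean, median, pstdev


def robust_position(history, level=0.80):
    """Drop outliers from a 1-D position history and return the median of the rest.

    Assumption: the interval is mean +/- z * standard deviation, where z is the
    normal quantile for the requested level. The source only says "confidence
    interval", so other interval definitions are equally possible.
    """
    values = [v for v in history if v is not None]  # skip frames without detection
    if not values:
        return None
    if len(values) < 3:
        return median(values)  # too few samples to estimate a spread
    z = NormalDist().inv_cdf(0.5 + level / 2)       # about 1.28 for 80%
    center, spread = mean(values), pstdev(values)
    low, high = center - z * spread, center + z * spread
    valid = [v for v in values if low <= v <= high]  # the set of valid positions
    return median(valid or values)


# Y coordinates of one feature point; 250 is a tracking glitch, None a missed frame.
y_history = [100, 101, 99, 100, 102, 250, None]
estimate = robust_position(y_history)
```

Using the median as the statistical value follows the example given in the text; the mean could be substituted in the last line of the function.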

Even for feature points that are not detected in some frames, such as the feature point 411 shown in FIG. 8, the estimation unit 14 executes steps S132 and S133 to calculate statistical values for the feature points. In step S132, the estimation unit 14 identifies the position history of the feature point based on the detections in the remaining frames. In step S133, the estimation unit 14 calculates statistical values based on the position history. Regarding the feature point 411 shown in FIG. 8, the estimation unit 14 identifies the position history of the feature point 411 based on the positions detected in the frames 401 and 404, and calculates statistical values based on the position history.

In step S134, the estimation unit 14 estimates the position of the node corresponding to the feature point based on the statistical value. The estimation unit 14 may directly estimate the statistical value as the position of the node, or may convert the statistical value into the position of the node by converting the coordinate system of the image to the coordinate system of real space. In this step, the estimation unit 14 can estimate the position of the apex of the stem, as in the case of nodes.

As shown in step S135, if there is an unprocessed feature point in the selected instance (NO in step S135), the process returns to step S132, and the estimation unit 14 executes the processes from step S132 onward for the next feature point to estimate the position of the node (or stem apex) corresponding to that feature point. On the other hand, if all feature points of the selected instance have been processed (YES in step S135), the process proceeds to step S136.

FIG. 10 is a diagram showing an example of estimating the position of a node. This example shows position histories 501 to 506 of six feature points for one plant 9. Each of the position histories 501 to 505 corresponds to a node, and the position history 506 corresponds to the apex of a stem. The estimation unit 14 removes outliers from the position history 501 and estimates the node position 511 based on the statistical value (for example, median value) of the remaining part of the position history 501. Similarly, the estimation unit 14 estimates the positions 512 to 515 of the nodes from the position histories 502 to 505, and estimates the position 516 of the apex of the stem from the position history 506.

In step S136, the estimation unit 14 estimates the internodal distance. When estimating the distance between two consecutive nodes as an internodal distance, the estimation unit 14 calculates (n-1) internodal distances based on the position coordinates of n feature points. This step S136 may be omitted.
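The calculation of (n-1) internodal distances from n node positions can be sketched as follows. The sketch assumes that each node position is an `(x, y)` pair, that consecutive nodes are adjacent when sorted by the Y coordinate (the height direction), and that the distance is Euclidean; the function name is hypothetical. If only Y coordinates are used, the same idea reduces to differences of consecutive Y values.

```python
import math


def internodal_distances(node_positions):
    """Return the distances between consecutive nodes along the stem.

    node_positions: list of (x, y) pairs, one per node, in any order.
    """
    ordered = sorted(node_positions, key=lambda p: p[1])  # order along the height axis
    # Pair each node with the next one: n positions give n - 1 distances.
    return [math.dist(a, b) for a, b in zip(ordered, ordered[1:])]


distances = internodal_distances([(4, 6), (0, 0), (0, 3)])
```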

As shown in step S137, if there is an unprocessed instance in the video (NO in step S137), the process returns to step S131, and the estimation unit 14 executes the processes from step S132 onward for each feature point of the next instance to estimate the position of each node (and the stem apex) and at least one internodal distance. On the other hand, if all instances have been processed (YES in step S137), step S13 ends.

Returning to FIG. 4, in step S14, the estimation unit 14 outputs estimation results indicating the positions of each node of each plant. The estimation unit 14 may output estimation results that further indicate at least one internodal distance of each plant. The estimation unit 14 may display the estimation result on a monitor, store it in a predetermined database, or transmit it to another computer system. Alternatively, the estimation system 1 may control the cultivation environment, such as irrigation control and air conditioning control, based on the estimation result.

[Modified example]
Various examples in the present disclosure have been described above in detail. However, the present disclosure is not limited to the above examples. Various modifications can be made to the present disclosure without departing from the gist thereof.

In the above example, the tracking unit 13 tracks the movement of instances based on the degree of duplication of instances. However, the tracking unit 13 may perform the tracking using another method, for example, using an existing tracking algorithm.

In this disclosure, the expression "at least one processor executes a first process, executes a second process, ... executes an n-th process," or an expression corresponding to it, indicates a concept that includes a case where the executing entity (that is, the processor) of the n processes from the first process to the n-th process changes midway. That is, this expression indicates a concept that includes both a case in which all of the n processes are executed by the same processor and a case in which the processor changes among the n processes according to an arbitrary policy.

The processing procedure of the method executed by at least one processor is not limited to the example in the above embodiment. For example, some of the steps described above may be omitted, or each step may be performed in a different order. Furthermore, any two or more of the steps described above may be combined, or some of the steps may be modified or deleted. Alternatively, other steps may be performed in addition to each of the above steps.

When comparing the magnitudes of two numerical values, either of the two criteria "greater than or equal to" and "greater than" may be used, and either of the two criteria "less than or equal to" and "less than" may be used.

[Appendices]
As understood from the various examples above, the present disclosure includes the following aspects.
(Appendix 1)
An estimation system comprising at least one processor,
wherein the at least one processor:
acquires a video showing a plant such that the position of the plant changes;
performs, on each of a plurality of frames of the video, object detection for detecting an instance corresponding to the plant, and identifies, as an instance history, the transition of the instance including two or more feature points corresponding to two or more nodes of the plant;
estimates the position of each of the two or more nodes based on the transition of each of the two or more feature points included in the instance history; and
estimates an internodal distance, which is a distance between two nodes, based on the estimated two or more positions.
(Appendix 2)
The estimation system according to Appendix 1, wherein
the plurality of frames include a first frame and a second frame next to the first frame, and
the at least one processor:
sets a first instance of the plant in the first frame;
detects one or more second instances corresponding to the first instance in the second frame;
selects one second instance from the one or more second instances based on the degree of overlap with the first instance; and
associates the selected second instance with the first instance as at least part of the instance history.
(Appendix 3)
The estimation system according to Appendix 2, wherein
the instance further includes a bounding box corresponding to a stem of the plant, and
the at least one processor, if the second instance is not detected in the second frame:
sets, as the second instance, a complementary instance that includes the bounding box and does not include the feature points; and
associates the set second instance with the first instance as at least part of the instance history.
(Appendix 4)
The estimation system according to any one of Appendices 1 to 3, wherein
the transition of each feature point is expressed by a position history of the feature point, and
the at least one processor, for each of the two or more feature points:
calculates a statistical value of the position history; and
estimates the position of the corresponding node based on the statistical value.
(Appendix 5)
The estimation system according to Appendix 4, wherein
the at least one processor, for each of the two or more feature points:
excludes outliers from the position history of the feature point to identify a set of valid positions; and
calculates the statistical value based on the set of valid positions.
(Appendix 6)
The estimation system according to any one of Appendices 1 to 5, wherein
the at least one processor detects the instance by inputting the frame to a trained model that estimates, from an input image, the instance including the two or more feature points.
(Appendix 7)
An estimation method performed by an estimation system comprising at least one processor, the method comprising:
acquiring a video showing a plant such that the position of the plant changes;
performing, on each of a plurality of frames of the video, object detection for detecting an instance corresponding to the plant, and identifying, as an instance history, the transition of the instance including two or more feature points corresponding to two or more nodes of the plant;
estimating the position of each of the two or more nodes based on the transition of each of the two or more feature points included in the instance history; and
estimating an internodal distance, which is a distance between two nodes, based on the estimated two or more positions.
(Appendix 8)
An estimation program that causes a computer to execute:
acquiring a video showing a plant such that the position of the plant changes;
performing, on each of a plurality of frames of the video, object detection for detecting an instance corresponding to the plant, and identifying, as an instance history, the transition of the instance including two or more feature points corresponding to two or more nodes of the plant;
estimating the position of each of the two or more nodes based on the transition of each of the two or more feature points included in the instance history; and
estimating an internodal distance, which is a distance between two nodes, based on the estimated two or more positions.

According to Appendices 1, 7, and 8, the transitions of two or more feature points corresponding to two or more nodes of a plant are identified over two or more frames of a video showing the plant, the positions of those nodes are estimated based on the transitions, and the internodal distance is estimated based on those positions. Because the position of the plant changes in the video, each feature point also changes over the two or more frames, so multiple pieces of information are obtained for each feature point. By referring to these multiple pieces of information, the positions of the nodes of the plant can be estimated accurately. As a result, the internodal distance, which is closely related to the growth state of the plant, can be grasped accurately. As the plant grows, its foliage becomes denser, and depending on the shooting angle, nodes may be hidden by leaves. However, processing multiple frames increases the probability that such nodes are detected, making it possible to determine their positions. In addition, since the positions of the nodes are estimated automatically from the video, the internodal distance can be grasped efficiently, and the growth state can be managed efficiently and easily.
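
The last step of this aspect, deriving internodal distances from the estimated node positions, can be illustrated with a minimal Python sketch. It assumes that each node position is a two-dimensional image coordinate, that the nodes are ordered along the stem, and that the internodal distance is the Euclidean distance between adjacent nodes. The function name `internodal_distances` and the sample coordinates are hypothetical and are not taken from the disclosure.

```python
import math


def internodal_distances(node_positions):
    """Return the distance between each pair of adjacent nodes.

    node_positions: list of (x, y) tuples, assumed to be ordered
    along the stem (an assumption of this sketch).
    """
    return [
        math.dist(lower, upper)
        for lower, upper in zip(node_positions, node_positions[1:])
    ]


# Hypothetical estimated positions of three nodes (pixel units).
nodes = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
distances = internodal_distances(nodes)
print(distances)
```

Converting such pixel distances to physical lengths would require a scale that this sketch does not model.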

According to Appendix 2, the transition of an instance over two adjacent frames is identified based on the degree of overlap between instances in the two frames. By considering the degree of overlap, each instance can be tracked accurately across multiple frames, and an accurate instance history can be obtained.
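
The selection by degree of overlap can be sketched as follows. This sketch assumes that the degree of overlap is measured as the intersection over union (IoU) of axis-aligned bounding boxes given as `(x1, y1, x2, y2)`. The IoU measure, the `threshold` value, and the function names are assumptions made for illustration, not details stated in this section.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_second_instance(first_box, second_boxes, threshold=0.3):
    """Return the index of the second box that overlaps the first box most.

    Returns None when no candidate reaches the (hypothetical) threshold.
    """
    best_index, best_score = None, threshold
    for index, box in enumerate(second_boxes):
        score = iou(first_box, box)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index


first = (0, 0, 10, 10)
candidates = [(20, 20, 30, 30), (1, 0, 11, 10), (5, 0, 15, 10)]
selected = select_second_instance(first, candidates)
print(selected)
```

A `None` result corresponds to the case where no matching second instance is found in the second frame.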

According to appendix 3, if an instance to be detected is not obtained by object detection, the instance is complemented, so the instance can be reliably tracked and an accurate instance history can be obtained.
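
A minimal sketch of this complementing step is shown below. It assumes that the complementary instance reuses the bounding box of the most recent instance in the history and carries an empty list of feature points. The `Instance` class, the `complemented` flag, and the choice of which bounding box to reuse are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    box: tuple  # (x1, y1, x2, y2) bounding box corresponding to the stem
    feature_points: list = field(default_factory=list)
    complemented: bool = False  # True when set in place of a missed detection


def extend_history(history, detected):
    """Append the detected instance, or a complementary one if none was found."""
    if detected is not None:
        history.append(detected)
    else:
        previous = history[-1]
        # Assumption: keep the previous bounding box and omit feature points.
        history.append(Instance(box=previous.box, complemented=True))
    return history


history = [Instance(box=(0, 0, 10, 40), feature_points=[(5, 10), (5, 30)])]
extend_history(history, None)  # detection failed in the second frame
print(history[-1])
```

Because the complementary instance has no feature points, it keeps the track alive without contributing to the position history of any node.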

According to appendix 4, since the position history of feature points is statistically processed, the position of the node can be estimated more accurately.

According to Appendix 5, statistical processing is performed after outliers, which may result from false detections, are excluded from the position history, so the position of the node can be estimated more accurately.
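
The combination of outlier exclusion and statistical processing can be sketched as follows. This section does not specify the outlier criterion or the statistic, so the sketch assumes an interquartile-range (IQR) rule applied to each coordinate and the median as the statistical value. The function names and the factor `k` are hypothetical.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) bounds outside which a value counts as an outlier."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def estimate_node_position(position_history, k=1.5):
    """Estimate a node position from the position history of a feature point.

    position_history: non-empty list of (x, y) tuples, one per frame in which
    the feature point was detected.
    """
    if len(position_history) < 4:
        # Too few samples for quartiles; keep every position.
        valid = list(position_history)
    else:
        x_lo, x_hi = iqr_bounds([x for x, _ in position_history], k)
        y_lo, y_hi = iqr_bounds([y for _, y in position_history], k)
        valid = [
            (x, y)
            for x, y in position_history
            if x_lo <= x <= x_hi and y_lo <= y <= y_hi
        ]
    estimate = (
        statistics.median([x for x, _ in valid]),
        statistics.median([y for _, y in valid]),
    )
    return estimate, valid


# Hypothetical position history with one false detection at (50, 90).
history = [(10, 20), (11, 21), (9, 19), (10, 20), (10, 21), (50, 90)]
estimate, valid = estimate_node_position(history)
print(estimate, len(valid))
```

Other statistics, such as the mean of the valid positions, could be substituted for the median without changing the structure of the sketch.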

According to Appendix 6, since a trained model is employed for object detection, instances of plants of various types and shapes can be accurately detected.

[Description of symbols]
1...Estimation system, 2...Database, 3...Self-propelled camera, 9...Plant, 11...Acquisition unit, 12...Detection unit, 13...Tracking unit, 14...Estimation unit, 20...Video, 21-23...Frame, 110...Estimation program, 200...Instance, 201...Bounding box, 202, 203...Feature point, 301...First frame, 302...Second frame, 311-313...First instance, 321-325...Second instance, 411-417...Feature point, 501-506...Position history.

Claims

1. An estimation system comprising at least one processor,
wherein the at least one processor:
acquires a video showing a plant such that the position of the plant changes;
performs, on each of a plurality of frames of the video, object detection for detecting an instance corresponding to the plant, and identifies, as an instance history, the transition of the instance including two or more feature points corresponding to two or more nodes of the plant;
estimates the position of each of the two or more nodes based on the transition of each of the two or more feature points included in the instance history; and
estimates an internodal distance, which is a distance between two nodes, based on the estimated two or more positions.
2. The estimation system according to claim 1, wherein
the plurality of frames include a first frame and a second frame next to the first frame, and
the at least one processor:
sets a first instance of the plant in the first frame;
detects one or more second instances corresponding to the first instance in the second frame;
selects one second instance from the one or more second instances based on the degree of overlap with the first instance; and
associates the selected second instance with the first instance as at least part of the instance history.
3. The estimation system according to claim 2, wherein
the instance further includes a bounding box corresponding to a stem of the plant, and
the at least one processor, if the second instance is not detected in the second frame:
sets, as the second instance, a complementary instance that includes the bounding box and does not include the feature points; and
associates the set second instance with the first instance as at least part of the instance history.
4. The estimation system according to any one of claims 1 to 3, wherein
the transition of each feature point is expressed by a position history of the feature point, and
the at least one processor, for each of the two or more feature points:
calculates a statistical value of the position history; and
estimates the position of the corresponding node based on the statistical value.
5. The estimation system according to claim 4, wherein
the at least one processor, for each of the two or more feature points:
excludes outliers from the position history of the feature point to identify a set of valid positions; and
calculates the statistical value based on the set of valid positions.
6. The estimation system according to any one of claims 1 to 3, wherein
the at least one processor detects the instance by inputting the frame to a trained model that estimates, from an input image, the instance including the two or more feature points.
7. An estimation method performed by an estimation system comprising at least one processor, the method comprising:
acquiring a video showing a plant such that the position of the plant changes;
performing, on each of a plurality of frames of the video, object detection for detecting an instance corresponding to the plant, and identifying, as an instance history, the transition of the instance including two or more feature points corresponding to two or more nodes of the plant;
estimating the position of each of the two or more nodes based on the transition of each of the two or more feature points included in the instance history; and
estimating an internodal distance, which is a distance between two nodes, based on the estimated two or more positions.
8. An estimation program that causes a computer to execute:
acquiring a video showing a plant such that the position of the plant changes;
performing, on each of a plurality of frames of the video, object detection for detecting an instance corresponding to the plant, and identifying, as an instance history, the transition of the instance including two or more feature points corresponding to two or more nodes of the plant;
estimating the position of each of the two or more nodes based on the transition of each of the two or more feature points included in the instance history; and
estimating an internodal distance, which is a distance between two nodes, based on the estimated two or more positions.