WO2022004370A1 - Data collection device and data collection method - Google Patents

Data collection device and data collection method

Info

Publication number
WO2022004370A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
recognition
unit
recognition unit
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/022779
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
文平 田路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Konica Minolta Inc
Original Assignee
Konica Minolta Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Konica Minolta Inc
Priority to JP2022533823A (patent JP7690957B2)
Priority to US18/002,534 (patent US12394184B2)
Publication of WO2022004370A1
Anticipated expiration
Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/091 Active learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778 Active pattern-learning, e.g. online learning of image or video features
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

Definitions

  • This disclosure relates to a technique for collecting learning data used for training a data recognition model.
  • An image recognition system that uses machine learning to recognize the position and state of an object, such as a person or a car, from an image is known.
  • In Patent Document 1, in order to generate learning data, the user must supply the correct answer for each image taken at the installation site.
  • A large amount of learning data is required, and there is a problem that the man-hours the user needs to prepare a sufficient amount of learning data for each of multiple installation sites become enormous.
  • The present disclosure has been made in view of the above problems, and an object of the present disclosure is to provide a data collection device and a data collection method capable of reducing the burden on the user in generating learning data for a data recognition model.
  • The data collection device of one aspect of the present disclosure is a data collection device that collects learning data for a data recognition model, and includes a first recognition unit, a second recognition unit different from the first recognition unit, a comparison unit that compares the recognition result of the first recognition unit with the recognition result of the second recognition unit for input data, and a collection unit that collects the input data as learning data according to the comparison result of the comparison unit.
  • The calculation scale of the first recognition unit may be smaller than the calculation scale of the second recognition unit.
  • The collection unit may collect the recognition result of the second recognition unit as learning data indicating correct answer data for the input data.
  • A learning unit that performs additional learning of the first recognition unit using the learning data collected by the collection unit may be provided.
  • The learning unit may modify the correct answer data according to an external input.
  • The comparison unit may determine whether or not the recognition result of the first recognition unit and the recognition result of the second recognition unit are different, and the collection unit may collect the input data as learning data when the two recognition results are different.
  • The comparison unit may determine whether or not the difference between the recognition result of the first recognition unit and the recognition result of the second recognition unit is equal to or greater than a predetermined threshold value, and the collection unit may collect the input data as learning data when the difference is equal to or greater than the predetermined threshold value.
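The two collection criteria described above can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed implementation: it assumes each recognition result is reduced to a single confidence score in [0, 1], and the function names and the 0.5 decision cut-off are hypothetical.

```python
def results_differ(first_score: float, second_score: float, cutoff: float = 0.5) -> bool:
    """Criterion 1: the two recognizers reach different decisions.

    A score at or above `cutoff` (a hypothetical value) is read as "detected".
    """
    return (first_score >= cutoff) != (second_score >= cutoff)


def difference_exceeds(first_score: float, second_score: float, threshold: float) -> bool:
    """Criterion 2: the results differ by at least a predetermined threshold."""
    return abs(first_score - second_score) >= threshold
```

Under either criterion, the input would be collected as learning data only when the function returns `True`.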
  • A time determination unit that determines the timing for operating the second recognition unit may be further provided; the first recognition unit may perform data recognition constantly, and the second recognition unit may perform data recognition at the timing determined by the time determination unit.
  • The time determination unit may determine the timing at fixed intervals.
  • The time determination unit may determine the timing according to the learning proficiency level of the first recognition unit.
  • The time determination unit may determine the timing according to an external input.
  • The data collection device may be composed of an edge terminal including the first recognition unit and a server terminal including the second recognition unit.
  • The data collection device may further include one or more second edge terminals, and each second edge terminal may include a recognition unit having the same configuration as the first recognition unit.
  • the first recognition unit and the second recognition unit may perform image recognition, voice recognition, or natural language recognition, respectively.
  • Another aspect of the present disclosure is a data collection method for collecting learning data for a data recognition model, including: a first recognition step of obtaining a recognition result by a first recognition unit for input data; a second recognition step of obtaining a recognition result by a second recognition unit, different from the first recognition unit, for the input data; a comparison step of comparing the recognition result of the first recognition unit with the recognition result of the second recognition unit; and a collection step of collecting the input data as learning data according to the comparison result of the comparison step.
  • The calculation scale of the first recognition unit may be smaller than the calculation scale of the second recognition unit.
  • In the collection step, the recognition result of the second recognition unit may be collected as learning data indicating correct answer data for the input data.
  • Additional learning of the first recognition unit may be performed using the learning data collected in the collection step.
  • The comparison step may determine whether or not the recognition result of the first recognition unit and the recognition result of the second recognition unit are different, and the collection step may collect the input data as learning data when the two recognition results are different.
  • The comparison step may determine whether or not the difference between the recognition result of the first recognition unit and the recognition result of the second recognition unit is equal to or greater than a predetermined threshold value, and the collection step may collect the input data as learning data when the difference is equal to or greater than the predetermined threshold value.
  • The first recognition unit may perform data recognition at all times, and the second recognition unit may perform data recognition at a predetermined timing.
  • FP: False Positive
  • FN: False Negative
  • When the recognition results of the two recognition units differ, the input data is classified as FP (False Positive) or FN (False Negative) for the recognition unit that made the mistake.
  • FP means identifying the input data as containing the detection target even though it does not; FN, conversely, means identifying the input data as not containing the detection target even though it does.
  • The purpose of learning in data recognition is to reduce such FPs and FNs.
  • An effective method for reducing FPs and FNs is to attach correct answers to the data classified as FP or FN, generate learning data from it, and perform additional learning so that similar data can be recognized correctly.
  • Such data classified as FP or FN can be easily collected as learning data. Further, because the recognition result of the recognition unit that recognized the data correctly is used as the correct answer, the user does not need to assign the correct answer manually. Therefore, it is possible to reduce the burden on the user related to the generation of learning data.
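The FP/FN distinction above can be illustrated with a short sketch. It assumes a binary "target present / absent" result and treats the other recognizer's output as the reference answer; the function name `classify_error` is hypothetical and does not appear in the disclosure.

```python
from typing import Optional


def classify_error(prediction: bool, reference: bool) -> Optional[str]:
    """Label a recognizer's mistake relative to a reference answer.

    Returns "FP" when the target was reported but is absent in the reference,
    "FN" when the target was missed, and None when both agree.
    """
    if prediction == reference:
        return None
    return "FP" if prediction else "FN"
```

Only inputs for which this returns `"FP"` or `"FN"` would be candidates for collection.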
  • FIG. 3A is a schematic diagram showing one neuron U of the CNN.
  • FIG. 3B is a diagram showing the data structure of the trained parameters of the CNN.
  • FIG. 4A is a diagram schematically showing data propagation at the time of learning.
  • FIG. 4B is a diagram schematically showing data propagation at the time of estimation.
  • the image recognition system 1 includes an image recognition device 100 and a camera 190.
  • The image recognition device 100 includes a control unit 110, a non-volatile storage unit 120, a first CNN 130 (first recognition unit), a second CNN 140 (second recognition unit), a recognition result comparison unit 150 (comparison unit), a data collection unit 160 (collection unit), a time adjustment unit 170 (time determination unit), and an additional learning unit 180 (learning unit).
  • The first CNN 130, the second CNN 140, the recognition result comparison unit 150, the data collection unit 160, the time adjustment unit 170, and the additional learning unit 180 constitute a data collection device.
  • The camera 190 includes an image pickup element such as a CMOS (Complementary Metal-Oxide-Semiconductor field-effect transistor) image sensor or a CCD (Charge-Coupled Device) image sensor, and outputs an image of a predetermined size by converting the light imaged on the image pickup element into an electric signal by photoelectric conversion.
  • CMOS: Complementary Metal-Oxide-Semiconductor field-effect transistor
  • CCD: Charge-Coupled Device
  • The camera 190 outputs an image at a predetermined rate, for example 30 FPS.
  • The control unit 110 is composed of a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like. Computer programs and data stored in the ROM and the storage unit 120 are loaded into the RAM, and the CPU operates according to the computer programs and data on the RAM, whereby each processing unit (the first CNN 130, the second CNN 140, the recognition result comparison unit 150, the data collection unit 160, the time adjustment unit 170, and the additional learning unit 180) is realized.
  • the storage unit 120 is composed of a hard disk as an example.
  • the storage unit 120 may be composed of a non-volatile semiconductor memory.
  • the storage unit 120 stores the first learning parameter 121, the second learning parameter 122, and the additional learning data 123.
  • the additional learning data 123 includes the learning image 123a and the correct answer data 123b.
  • 1.2 CNN: As an example of a convolutional neural network (CNN), the neural network 200 shown in FIG. 2 will be described.
  • the neural network 200 is a hierarchical neural network having an input layer 200a, a feature extraction layer 200b, and an identification layer 200c.
  • the neural network is an information processing system that imitates a human neural network.
  • an engineering neuron model corresponding to a nerve cell is referred to here as a neuron U.
  • the input layer 200a, the feature extraction layer 200b, and the identification layer 200c each have a plurality of neurons U.
  • the input layer 200a usually consists of one layer.
  • Each neuron U of the input layer 200a receives, for example, the pixel value of each pixel constituting one image.
  • The received pixel value is output unchanged from each neuron U of the input layer 200a to the feature extraction layer 200b.
  • the feature extraction layer 200b extracts features from the data received from the input layer 200a and outputs the features to the identification layer 200c.
  • the feature extraction layer 200b is sometimes called a backbone network.
  • The identification layer 200c performs identification using the features extracted by the feature extraction layer 200b.
  • As the neuron U, an element with multiple inputs and one output is usually used, as shown in FIG. 3A.
  • The neuron weighted value can be changed by learning.
  • The sum of the input values $x_i$, each multiplied by its neuron weighted value $SUw_i$, is transformed by the activation function $f(X)$ and then output. That is, the output value $y$ of the neuron U is expressed by the following mathematical formula: $y = f(X)$, where $X = \sum_i (SUw_i \times x_i)$.
  • As the activation function, for example, ReLU or a sigmoid function can be used.
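The neuron U described above (a weighted sum of the inputs passed through an activation function) can be sketched in a few lines. This is an illustrative sketch only; the function names are hypothetical, and no bias term is included because the text does not mention one.

```python
import math
from typing import Callable, Sequence


def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, clamps negatives to 0."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic sigmoid, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(
    inputs: Sequence[float],
    weights: Sequence[float],
    activation: Callable[[float], float] = relu,
) -> float:
    """Multiple inputs, one output: activation of the weighted input sum."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)
```

For example, `neuron_output([1.0, 2.0], [0.5, 0.25])` forms the sum 0.5 + 0.5 and applies ReLU to it.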
  • As a learning method for the neural network 200, for example, an error is calculated from a value indicating a correct answer (teacher data) and an output value of the neural network 200 using a predetermined error function, and the error backpropagation method (backpropagation) is used, in which the neuron weighted values of the feature extraction layer 200b and the neuron weighted values of the identification layer 200c are sequentially changed using the steepest descent method or the like so that the error is minimized.
  • The learning process is a process of training the neural network 200.
  • FIG. 4A schematically shows a data propagation model of the learning process.
  • the learning image 123a is input to the input layer 200a of the neural network 200 for each image, and is output from the input layer 200a to the feature extraction layer 200b.
  • In the feature extraction layer 200b, an operation with the neuron weighted values is performed on the input data, and data indicating the extracted features is output to the identification layer 200c.
  • In the identification layer 200c, an operation with the neuron weighted values is performed on the input data (step S11).
  • As a result, object estimation based on the above features is performed.
  • Data showing the result of the object estimation is output from the identification layer 200c.
  • The output value of the identification layer 200c is compared with the teacher data (correct answer data) 123b, and an error (loss) is calculated using a predetermined error function (step S12).
  • The neuron weighted values of the identification layer 200c and the neuron weighted values of the feature extraction layer 200b are sequentially changed so that this error becomes small (backpropagation) (step S13). As a result, the neural network 200 is trained.
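The forward pass, error calculation, and weight update (steps S11 to S13) can be illustrated on the smallest possible case. The sketch below is an assumption-laden simplification, not the disclosed CNN: it trains a single sigmoid neuron, uses a squared-error function as the "predetermined error function", and applies plain gradient descent with a hypothetical learning rate.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_step(
    weights: Sequence[float],
    inputs: Sequence[float],
    target: float,
    learning_rate: float = 0.5,
) -> Tuple[List[float], float]:
    """One learning iteration for a single sigmoid neuron.

    Returns the updated weights and the loss measured before the update.
    """
    # Step S11 (forward): weighted sum followed by the activation function.
    y = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    # Step S12 (error): squared error against the teacher data (assumed form).
    loss = 0.5 * (y - target) ** 2
    # Step S13 (backpropagation): dL/dw_i = (y - t) * y * (1 - y) * x_i.
    delta = (y - target) * y * (1.0 - y)
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, loss
```

Calling `train_step` repeatedly on the same sample should drive the returned loss downward, which is the behavior the learning process aims for.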
  • FIG. 3B shows a data structure of learning parameters stored in the storage unit 120.
  • The learning parameter 210 is composed of a plurality of pieces of neuron information 211.
  • Each piece of neuron information 211 corresponds to one neuron U in the feature extraction layer 200b or the identification layer 200c.
  • Each piece of neuron information 211 includes a neuron number 212 and a neuron weighted value 213.
  • The neuron number 212 is a number that identifies each neuron U in the feature extraction layer 200b and the identification layer 200c.
  • The neuron weighted value 213 is the neuron weighted value of the corresponding neuron U in the feature extraction layer 200b or the identification layer 200c.
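The learning-parameter layout described above (neuron information keyed by neuron number, each carrying weighted values) might be represented as follows. The class and field names are hypothetical, and storing the weighted value 213 as one float per input is an assumption; the text does not specify its shape.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NeuronInfo:
    """One piece of neuron information 211."""
    neuron_number: int    # corresponds to neuron number 212
    weights: List[float]  # corresponds to neuron weighted value 213 (assumed shape)


@dataclass
class LearningParameter:
    """Learning parameter 210: a collection of neuron information."""
    neurons: Dict[int, NeuronInfo] = field(default_factory=dict)

    def put(self, info: NeuronInfo) -> None:
        """Store or overwrite the entry for this neuron number."""
        self.neurons[info.neuron_number] = info

    def weights_of(self, neuron_number: int) -> List[float]:
        """Look up the weights of one neuron by its number."""
        return self.neurons[neuron_number].weights
```

Keying by neuron number lets additional learning overwrite individual entries without rebuilding the whole parameter set.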
  • the model learned in this way is called a data recognition model.
  • the data recognition model is used to identify the objects contained in the data.
  • FIG. 4B shows a data propagation model for performing object estimation on image data obtained by the camera 190, using the neural network 200 trained by the above learning process.
  • Feature extraction and object estimation are performed using the trained feature extraction layer 200b and the trained identification layer 200c (step S14).
  • the image recognition system 1 includes two image recognizers (first CNN 130 and second CNN 140).
  • The first CNN 130 and the second CNN 140 are image recognizers that perform image recognition, for example person detection. If a person is detected in the image input from the camera 190, a recognition result indicating that a person is included is output; if no person is detected, a recognition result indicating that no person is included is output.
  • the first CNN 130 and the second CNN 140 have the same configuration as the neural network 200.
  • CNNs differ in recognition speed (time required to recognize one image) and recognition accuracy (accuracy that can correctly recognize an input image) even if learning is performed with the same learning data, depending on the calculation scale.
  • the calculation scale differs depending on the CNN algorithm and the number of backbone network stages. Therefore, the recognition speed and recognition accuracy differ depending on the CNN algorithm. Even with the same algorithm, if the number of stages of the backbone network is different, the recognition speed and recognition accuracy will be different. In general, the larger the calculation scale, the higher the recognition accuracy but the slower the recognition speed. On the contrary, the smaller the calculation scale, the faster the recognition speed, but the lower the recognition accuracy tends to be.
  • The calculation scale differs between the first CNN 130 and the second CNN 140.
  • The second CNN 140 has a larger calculation scale than the first CNN 130. That is, the second CNN 140 has a higher recognition accuracy than the first CNN 130, and the first CNN 130 has a higher recognition speed than the second CNN 140.
  • The first CNN 130 is an image recognizer that performs image recognition in real time, and has a recognition speed sufficient to complete image recognition within the interval between images output by the camera 190.
  • The second CNN 140 is an image recognizer that performs image recognition only when instructed by the time adjustment unit 170.
  • The first CNN 130 and the second CNN 140 are each pre-trained using the same learning data, and the first learning parameter 121, which is the learning result of the first CNN 130, and the second learning parameter 122, which is the learning result of the second CNN 140, are stored in the storage unit 120.
  • The additional learning unit 180 trains the first CNN 130 using the additional learning data 123 stored in the storage unit 120, and updates the first learning parameter 121 using the learning result.
  • the recognition result comparison unit 150 acquires the recognition result of the first CNN 130 and the recognition result of the second CNN 140, compares them, and outputs whether or not the recognition results match.
  • When the comparison in the recognition result comparison unit 150 shows that the two recognition results differ, the data collection unit 160 acquires the input image that was input to the first CNN 130 and the second CNN 140 together with the recognition result of the second CNN 140, generates the additional learning data 123 with the input image as the learning image 123a and the recognition result of the second CNN 140 as the correct answer data 123b for that learning image, and stores it in the storage unit 120.
  • The time adjustment unit 170 controls (determines) the timing at which the second CNN 140 and the additional learning unit 180 are operated.
  • FIG. 5 is a flowchart showing the operation of the image recognition system 1.
  • As an initial setting, the control unit 110 assigns 0 to the control variable n, which indicates the frame number of the one-frame image acquired from the camera (step S101).
  • The control unit 110 determines whether or not an interrupt for ending processing has occurred (step S102), and if it has occurred (step S102: Yes), terminates processing.
  • If no such interrupt has occurred (step S102: No), the control unit 110 acquires an image (camera image) for one frame from the camera 190 (step S103).
  • The frame number of the camera image matches the control variable n. For example, when the control variable n is 1, the frame number of the camera image is 1.
  • The control unit 110 inputs the camera image of frame number n to the first CNN 130 and executes image recognition (step S104), and the first CNN 130 outputs the recognition result for the camera image of frame number n (step S105).
  • The time adjustment unit 170 determines whether or not the remainder obtained by dividing the control variable n by the threshold value T1 is 0 (step S106).
  • If the determination result is true (step S106: Yes), the second CNN 140 is operated.
  • The threshold value T1 is a variable that specifies the interval at which the second CNN 140 is operated.
  • For example, when the output speed of the camera 190 is 30 FPS and the threshold value T1 is 1800, the second CNN 140 is operated once every 1800 frames, that is, once a minute.
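The arithmetic behind the T1 example can be checked with a short sketch. The function names are hypothetical; only the remainder test of step S106 and the 30 FPS / 1800-frame figures come from the text.

```python
def should_run_second_cnn(n: int, t1: int) -> bool:
    """Step S106: run the second recognizer when n is a multiple of T1."""
    return n % t1 == 0


def interval_seconds(t1: int, fps: float) -> float:
    """Wall-clock spacing between runs of the second recognizer."""
    return t1 / fps
```

With `t1 = 1800` and `fps = 30`, `interval_seconds` gives 60 seconds, matching the "once a minute" figure. Because n starts at 0, the remainder test is also true for the very first frame.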
  • When operating the second CNN 140, the control unit 110 inputs the camera image of frame number n to the second CNN 140 and executes image recognition (step S107), and the second CNN 140 outputs the recognition result for the camera image of frame number n.
  • The recognition result comparison unit 150 acquires the recognition results of the first CNN 130 and the second CNN 140 for the camera image of frame number n, compares them, and outputs the comparison result (step S108).
  • The data collection unit 160 acquires the comparison result from the recognition result comparison unit 150, and when the two recognition results differ (step S109: Yes), generates the additional learning data 123 by combining the learning image 123a and the correct answer data 123b, with the camera image of frame number n as the learning image 123a and the recognition result of the second CNN 140 for that camera image as the correct answer data 123b, and stores it in the storage unit 120 (step S110).
  • The time adjustment unit 170 determines whether or not the remainder obtained by dividing the control variable n by the threshold value T2 is 0 (step S111).
  • If the determination result is true (step S111: Yes), additional learning of the first CNN 130 is performed; if the determination result is false, it is not performed.
  • The threshold value T2 is a variable that specifies the interval at which additional learning of the first CNN 130 is performed.
  • When performing additional learning of the first CNN 130, the additional learning unit 180 performs additional learning of the first CNN 130 using the additional learning data 123 stored in the storage unit 120 (step S112).
  • The control unit 110 assigns n + 1 to the control variable n and repeats the process from step S102.
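The flow of FIG. 5 as described above can be summarized in a compact sketch. Everything named here is hypothetical: the recognizers and the retraining routine are passed in as plain callables returning or consuming booleans, and the end-of-processing interrupt of step S102 is modelled simply by the frame source running out.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Image = TypeVar("Image")


def run_collection_loop(
    frames: Iterable[Image],
    first_recognizer: Callable[[Image], bool],
    second_recognizer: Callable[[Image], bool],
    retrain_first: Callable[[List[Tuple[Image, bool]]], None],
    t1: int,
    t2: int,
) -> List[Tuple[Image, bool]]:
    """Collect (image, correct answer) pairs where the two recognizers disagree."""
    collected: List[Tuple[Image, bool]] = []
    for n, image in enumerate(frames):         # S101: n starts at 0; S103: next frame
        first_result = first_recognizer(image)  # S104-S105: always runs
        if n % t1 == 0:                         # S106: time to run the second recognizer?
            second_result = second_recognizer(image)       # S107
            if first_result != second_result:              # S108-S109
                collected.append((image, second_result))   # S110: second result as answer
        if n % t2 == 0:                         # S111: time for additional learning?
            retrain_first(collected)            # S112
    return collected
```

In this sketch the larger recognizer's output is stored as the correct answer without checking which recognizer was actually right, which mirrors the description; a real system might add the external correction mentioned earlier.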
  • The image recognition system 1 includes two image recognizers having different calculation scales, and has both perform image recognition on the same camera image.
  • When the two recognition results differ, the input image is classified as FP or FN for the image recognizer that made the mistake.
  • FP means identifying the input image as containing the detection target even though it does not; FN, conversely, means identifying the input image as not containing the detection target even though it does.
  • The purpose of learning in image recognition is to reduce such FPs and FNs.
  • An effective method for reducing FPs and FNs is to attach correct answers to the images classified as FP or FN, generate learning data from them, and perform additional learning so that similar images can be recognized correctly.
  • Such images classified as FP or FN can be easily collected as learning data. Further, because the recognition result of the image recognizer that recognized the image correctly is used as the correct answer, the user does not need to assign the correct answer manually. Therefore, it is possible to reduce the burden on the user related to the generation of learning data.
  • The image recognition system 1 in the above-described first embodiment includes two image recognizers (first CNN 130 and second CNN 140) in the single housing of the image recognition device 100.
  • the two image recognizers may be mounted on different terminal devices.
  • FIG. 6 is a block diagram showing a configuration of an image recognition system 2 having a configuration in which two image recognizers are mounted on different terminal devices.
  • the image recognition system 2 includes an edge terminal 300 and a server terminal 400.
  • The edge terminal 300 includes a control unit 310, a non-volatile storage unit 320, a sensor 330, a first CNN 340 (first recognition unit), an additional learning unit 350 (learning unit), a time adjustment unit 360 (time determination unit), and a communication unit 370.
  • The control unit 310 is composed of a CPU, ROM, RAM, and the like. Computer programs and data stored in the ROM and the storage unit 320 are loaded into the RAM, and the CPU operates according to the computer programs and data on the RAM, whereby each processing unit (the first CNN 340, the additional learning unit 350, and the time adjustment unit 360) is realized and the sensor 330 and the communication unit 370 are controlled.
  • the storage unit 320 is composed of a hard disk as an example.
  • the storage unit 320 may be composed of a non-volatile semiconductor memory.
  • the storage unit 320 stores the first learning parameter 321.
  • the sensor 330 is an image pickup element such as a CMOS image sensor or a CCD image sensor, and outputs an image of a predetermined size by converting the light formed on the image pickup element into an electric signal by photoelectric conversion.
  • the sensor 330 outputs an image at a predetermined rate. For example, an image is output at 30 FPS.
  • the first CNN340 has the same configuration as the first CNN130 of the first embodiment.
  • the learning result of the first CNN340 is stored in the storage unit 320 as the first learning parameter 321.
  • The additional learning unit 350 trains the first CNN 340 using the additional learning data 422 received from the server terminal 400, and updates the first learning parameter 321 using the learning result.
  • the time adjustment unit 360 controls the timing of operating the additional learning unit 350 and the second CNN 430 of the server terminal 400.
  • the communication unit 370 is a network interface that communicates with the server terminal 400.
  • the edge terminal 300 transmits data such as an image captured by the sensor 330 and a recognition result of the first CNN 340 to the server terminal 400 via the communication unit 370. Further, the edge terminal 300 receives, for example, additional learning data 422 or the like from the server terminal 400 via the communication unit 370.
  • The server terminal 400 includes a control unit 410, a non-volatile storage unit 420, a second CNN 430 (second recognition unit), a recognition result comparison unit 440 (comparison unit), a data collection unit 450 (collection unit), and a communication unit 460.
  • The control unit 410 is composed of a CPU, ROM, RAM, and the like. Computer programs and data stored in the ROM and the storage unit 420 are loaded into the RAM, and the CPU operates according to the computer programs and data on the RAM, whereby each processing unit (the second CNN 430, the recognition result comparison unit 440, and the data collection unit 450) is realized and the communication unit 460 is controlled.
  • the storage unit 420 is composed of a hard disk as an example.
  • the storage unit 420 may be composed of a non-volatile semiconductor memory.
  • the storage unit 420 stores the second learning parameter 421 and the additional learning data 422.
  • the additional learning data 422 includes a learning image 422a and correct answer data 422b.
  • the second CNN430 has the same configuration as the second CNN140 of the first embodiment.
  • the learning result of the second CNN 430 is stored in the storage unit 420 as the second learning parameter 421.
  • The recognition result comparison unit 440 has the same configuration as the recognition result comparison unit 150 of the first embodiment. It acquires the recognition result of the first CNN 340 and the recognition result of the second CNN 430, compares them, and outputs whether or not the recognition results match.
  • The data collection unit 450 has the same configuration as the data collection unit 160 of the first embodiment. When the comparison by the recognition result comparison unit 440 shows that the results differ, the data collection unit 450 acquires the input image that was input to the first CNN 340 and the second CNN 430, together with the recognition result of the second CNN 430. It uses the input image as the learning image 422a and the recognition result of the second CNN 430 as the correct answer data 422b for that image, generates the additional learning data 422, and stores it in the storage unit 420.
  • The communication unit 460 is a network interface that communicates with the edge terminal 300.
  • The server terminal 400 receives data such as an image captured by the sensor 330 and the recognition result of the first CNN 340 from the edge terminal 300 via the communication unit 460. The server terminal 400 also transmits data such as the additional learning data 422 to the edge terminal 300 via the communication unit 460.
  • FIG. 7 is a flowchart showing the operation of the edge terminal 300.
  • As an initial setting, the control unit 310 assigns 0 to the control variable n, which indicates the frame number of the one-frame image acquired from the sensor 330 (step S201).
  • The control unit 310 determines whether or not an interrupt for ending the processing has occurred (step S202). If it has (step S202: Yes), the processing ends.
  • If no such interrupt has occurred (step S202: No), the control unit 310 acquires an image for one frame (a sensor image) from the sensor 330 (step S203).
  • The frame number of the sensor image matches the control variable n. For example, when the control variable n is 1, the frame number of the sensor image is 1.
  • The control unit 310 inputs the sensor image of frame number n to the first CNN 340 and causes it to execute image recognition (step S204), and the first CNN 340 outputs the recognition result for the sensor image of frame number n (step S205).
  • The time adjustment unit 360 determines whether or not the remainder obtained by dividing the control variable n by the threshold value T1 is 0 (step S206).
  • If the determination result is true (step S206: Yes), it is determined that the second CNN 430 is to be operated; if the determination result is false (step S206: No), it is determined that the second CNN 430 is not to be operated.
  • When the second CNN 430 is to be operated, the control unit 310 transmits the sensor image of frame number n and the recognition result of the first CNN 340 for that image to the server terminal 400 via the communication unit 370 (step S207).
  • The time adjustment unit 360 determines whether or not the remainder obtained by dividing the control variable n by the threshold value T2 is 0 (step S208). If the determination result is true (step S208: Yes), it is determined that additional learning of the first CNN 340 is to be performed; if the determination result is false (step S208: No), it is determined that additional learning of the first CNN 340 is not to be performed.
  • When additional learning is to be performed, the control unit 310 transmits an acquisition request for the additional learning data 422 to the server terminal 400 via the communication unit 370, and receives the additional learning data 422 from the server terminal 400 as the response (step S209).
  • The additional learning unit 350 performs additional learning of the first CNN 340 using the additional learning data 422 (step S210).
  • The control unit 310 then assigns n + 1 to the control variable n and repeats the processing from step S202.
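
The edge-side flow of FIG. 7 can be sketched roughly as follows. This is a minimal illustration, not the implementation of the embodiment: the function `run_edge_loop`, its callback parameters, the string placeholders standing in for sensor images, and the fixed frame count used in place of the end-processing interrupt are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def run_edge_loop(
    num_frames: int,
    t1: int,
    t2: int,
    recognize: Callable[[str], str],
    send_to_server: Callable[[int, str, str], None],
    fetch_and_train: Callable[[], None],
) -> List[Tuple[int, str]]:
    """Hypothetical sketch of steps S201-S210; returns a log of actions taken."""
    log: List[Tuple[int, str]] = []
    n = 0                                      # S201: initialise the frame counter
    while n < num_frames:                      # stand-in for the interrupt check (S202)
        image = f"frame-{n}"                   # S203: placeholder for the sensor image
        result = recognize(image)              # S204-S205: first recognizer
        if n % t1 == 0:                        # S206: operate the second recognizer?
            send_to_server(n, image, result)   # S207: send image and first result
            log.append((n, "send"))
        if n % t2 == 0:                        # S208: perform additional learning?
            fetch_and_train()                  # S209-S210: fetch data, then train
            log.append((n, "train"))
        n += 1                                 # next frame, back to S202
    return log


# Example with stub callbacks: T1 = 2, T2 = 4, five frames.
log = run_edge_loop(
    5, 2, 4,
    recognize=lambda image: "person",
    send_to_server=lambda n, image, result: None,
    fetch_and_train=lambda: None,
)
print(log)
```

Because n starts at 0, both remainder tests are satisfied on the first frame in this sketch; whether that is intended for frame 0 is not stated in the description.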
  • FIG. 8 is a flowchart showing the operation of the server terminal 400.
  • The control unit 410 waits until data is received from the edge terminal 300 (step S301).
  • The control unit 410 determines whether or not the sensor image of frame number n and the recognition result of the first CNN 340 for that image have been received from the edge terminal 300 via the communication unit 460 (step S302).
  • When they have been received (step S302: Yes), the control unit 410 inputs the sensor image of frame number n to the second CNN 430 and causes it to execute image recognition (step S303), and the second CNN 430 outputs the recognition result for the sensor image of frame number n.
  • The recognition result comparison unit 440 acquires the recognition results of the first CNN 340 and the second CNN 430 for the sensor image of frame number n, compares them, and outputs the comparison result (step S304).
  • The data collection unit 450 acquires the comparison result from the recognition result comparison unit 440. When the two recognition results differ (step S305: Yes), it sets the sensor image of frame number n as the learning image 422a and the recognition result of the second CNN 430 for that sensor image as the correct answer data 422b, generates the additional learning data 422 combining the learning image 422a and the correct answer data 422b, and stores it in the storage unit 420 (step S306).
  • The control unit 410 determines whether or not an acquisition request for the additional learning data 422 has been received from the edge terminal 300 via the communication unit 460 (step S307).
  • When the acquisition request has been received, the control unit 410 transmits the additional learning data 422 to the edge terminal 300 via the communication unit 460 as the response.
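
The server-side handling of FIG. 8 might be sketched as below. The class `ServerSketch`, the dictionary message format with its `type`, `image`, and `first_result` keys, and the use of plain equality to compare recognition results are assumptions introduced only for this illustration; the description does not specify a message format.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class ServerSketch:
    """Hypothetical stand-in for the server terminal's processing units."""

    def __init__(self, second_recognizer: Callable[[Any], Any]) -> None:
        self.second_recognizer = second_recognizer
        # Plays the role of the stored additional learning data
        # (learning image, correct answer data) pairs.
        self.additional_learning_data: List[Tuple[Any, Any]] = []

    def handle(self, message: Dict[str, Any]) -> Optional[List[Tuple[Any, Any]]]:
        if message["type"] == "recognition":               # S302: Yes
            image = message["image"]
            second_result = self.second_recognizer(image)  # S303
            if second_result != message["first_result"]:   # S304-S305: results differ
                # S306: the second recognizer's output becomes the correct answer.
                self.additional_learning_data.append((image, second_result))
            return None
        if message["type"] == "request":                   # S307: acquisition request
            return list(self.additional_learning_data)     # response to the edge
        return None


server = ServerSketch(second_recognizer=lambda image: "person")
server.handle({"type": "recognition", "image": "frame-0", "first_result": "person"})
server.handle({"type": "recognition", "image": "frame-2", "first_result": "no person"})
data = server.handle({"type": "request"})
print(data)
```

The description does not say whether stored data is cleared after it is transmitted; this sketch simply keeps it.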
  • The configuration may include a plurality of edge terminals 300.
  • The recognition result of the second CNN 140 is used as the correct answer data 123b, but user input may be accepted and the correct answer data 123b may be modified based on the user input.
  • The first CNN 130 and the second CNN 140 are image recognizers that detect a person, and output whether or not a person is included in the image input from the camera 190.
  • Alternatively, the recognition result may be output as a numerical value such as a likelihood.
  • In that case, the data collection unit 160 may determine that the recognition results are different when the difference between the two numerical values exceeds a predetermined threshold value, and collect the data as additional learning data.
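
As a small illustration of the likelihood-based comparison, the check could look like the following. The function name `results_differ` and the use of the absolute difference are assumptions; the description only says that the difference between the two numerical values is compared with a predetermined threshold.

```python
def results_differ(first_likelihood: float,
                   second_likelihood: float,
                   threshold: float) -> bool:
    """Return True when the two likelihoods differ by more than the threshold.

    The absolute difference is an assumption made for this sketch.
    """
    return abs(first_likelihood - second_likelihood) > threshold


# The first recognizer is unsure, the second is confident: treated as different.
print(results_differ(0.55, 0.95, threshold=0.3))
# Both recognizers are confident: treated as matching, so nothing is collected.
print(results_differ(0.90, 0.95, threshold=0.3))
```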
  • The time adjustment unit 170 operates the second CNN 140 at a predetermined interval T1, but the timing at which the second CNN 140 is operated is not limited to this.
  • The timing of operating the second CNN 140 may be changed according to the learning proficiency level of the first CNN 130. For example, in the early stage of learning the interval for operating the second CNN 140 may be shortened, and in the later stage of learning the interval may be lengthened.
  • The learning proficiency level may be based, for example, on the number of times image recognition has been executed in the first CNN 130.
  • Alternatively, it may be based on a degree of agreement: when the degree of agreement is smaller than a predetermined threshold value, learning may be regarded as being in the early stage, and when the degree of agreement is larger than the predetermined threshold value, learning may be regarded as being in the later stage.
  • User input may be accepted and the interval T1 may be set based on the user input.
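
One way to picture the proficiency-dependent timing is the sketch below. It assumes that the degree of agreement is the fraction of recent comparisons in which the two recognizers matched, and the names `agreement_rate` and `choose_interval`, the example threshold, and the example intervals are all hypothetical.

```python
from typing import Sequence


def agreement_rate(history: Sequence[bool]) -> float:
    """Fraction of past comparisons in which the two results matched."""
    if not history:
        return 0.0  # assumption: with no comparisons yet, treat as early stage
    return sum(history) / len(history)


def choose_interval(rate: float,
                    agreement_threshold: float,
                    early_interval: int,
                    late_interval: int) -> int:
    """Pick a short interval in the early stage and a long one later.

    A rate exactly equal to the threshold is treated as the later stage here;
    the description does not specify that case.
    """
    if rate < agreement_threshold:
        return early_interval
    return late_interval


early = choose_interval(agreement_rate([True, False, False, True]), 0.8, 5, 50)
late = choose_interval(agreement_rate([True, True, True, True]), 0.8, 5, 50)
print(early, late)
```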
  • The target of learning and recognition may be voice data.
  • Examples of voice data are music, human voices, natural sounds, and the like. Examples of music are classical music, folk music, pop music, Latin music, and the like. Examples of human voices are news audio, lecture audio, conversation audio, and the like. Examples of natural sounds are birdsong, the sound of the wind, the sound of a flowing river, and the like.
  • Voice recognition may be executed on such voice data. For example, when the voice data is a human voice, the voice of a specific person may be recognized.
  • The target of learning and recognition may be character data in natural language recognition processing.
  • Examples of character data are conversational sentences, literary works, newspaper articles, dissertations, and the like.
  • Examples of conversational sentences are conversational sentences in Japanese, English, Italian, and the like.
  • Examples of literary works are poetry, novels, stories, plays, critiques, essays, and the like.
  • Examples of newspaper articles are political news, economic news, scientific news, and the like.
  • A natural language processing system that performs natural language recognition using machine learning may be provided with two natural language recognizers (a first recognition unit and a second recognition unit) of different calculation scales, and natural language recognition may be executed on the same character data.
  • For example, when the character data is a newspaper article, the subject and the predicate may be recognized and extracted from one sentence in the newspaper article.
  • The data collection device can reduce the burden on the user involved in generating learning data, and is useful as a data collection device for collecting learning data.
  • 100 Image recognition device
  • 110 Control unit
  • 120 Storage unit
  • 130 First CNN
  • 140 Second CNN
  • 150 Recognition result comparison unit
  • 160 Data collection unit
  • 170 Time adjustment unit
  • 180 Additional learning unit
  • 190 Camera

PCT/JP2021/022779 2020-07-03 2021-06-16 Data collection device and data collection method Ceased WO2022004370A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022533823A JP7690957B2 (ja) 2020-07-03 2021-06-16 Data collection device and data collection method
US18/002,534 US12394184B2 (en) 2020-07-03 2021-06-16 Data collection device and data collection method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020115540 2020-07-03
JP2020-115540 2020-07-03

Publications (1)

Publication Number Publication Date
WO2022004370A1 (ja) 2022-01-06

Family

ID=79316092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/022779 Ceased WO2022004370A1 (ja) 2020-07-03 2021-06-16 Data collection device and data collection method

Country Status (3)

Country Link
US (1) US12394184B2
JP (1) JP7690957B2
WO (1) WO2022004370A1

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024253004A1 (ja) * 2023-06-05 2024-12-12 日本電気株式会社 Image selection device, image selection method, and program

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0561843A (ja) * 1991-05-15 1993-03-12 Wacom Co Ltd Neural network device
JP2018169752A (ja) * 2017-03-29 2018-11-01 パナソニックIpマネジメント株式会社 Product recognition system, trained model, and product recognition method
JP2020052484A (ja) * 2018-09-25 2020-04-02 Awl株式会社 Object recognition camera system, re-learning system, and object recognition program

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
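JP5923723B2 (ja) 2011-06-02 2016-05-25 パナソニックIpマネジメント株式会社 Person attribute estimation system, person attribute estimation device, and person attribute estimation method
JP2019003554A (ja) 2017-06-19 2019-01-10 コニカミノルタ株式会社 Image recognition device, image recognition method, and program for image recognition device
JP6985856B2 (ja) * 2017-08-31 2021-12-22 キヤノン株式会社 Information processing device, control method for information processing device, and program
JP7153477B2 (ja) 2018-06-13 2022-10-14 日本放送協会 Information determination model learning device and program therefor
JP6935368B2 (ja) 2018-07-06 2021-09-15 株式会社 日立産業制御ソリューションズ Machine learning device and method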
US10311335B1 (en) * 2018-09-05 2019-06-04 StradVision, Inc. Method and device for generating image data set to be used for learning CNN capable of detecting obstruction in autonomous driving circumstance, and testing method, and testing device using the same
US11042799B2 (en) * 2019-08-20 2021-06-22 International Business Machines Corporation Cohort based adversarial attack detection

Also Published As

Publication number Publication date
JP7690957B2 (ja) 2025-06-11
JPWO2022004370A1 2022-01-06
US20230245428A1 (en) 2023-08-03
US12394184B2 (en) 2025-08-19

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21833874; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2022533823; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21833874; Country of ref document: EP; Kind code of ref document: A1)
WWG WIPO information: grant in national office (Ref document number: 18002534; Country of ref document: US)