WO2019074316A1 - Convolutional artificial neural network based recognition system in which registration, search, and reproduction of image and video are divided between and performed by mobile device and server - Google Patents
Convolutional artificial neural network based recognition system in which registration, search, and reproduction of image and video are divided between and performed by mobile device and server
- Publication number
- WO2019074316A1 WO2019074316A1 PCT/KR2018/012022 KR2018012022W WO2019074316A1 WO 2019074316 A1 WO2019074316 A1 WO 2019074316A1 KR 2018012022 W KR2018012022 W KR 2018012022W WO 2019074316 A1 WO2019074316 A1 WO 2019074316A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- layer
- image
- neural network
- artificial neural
- data
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
- G06F16/43—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5011—Pool
Definitions
- the present invention relates to a convolutional artificial neural network based recognition system, and more particularly to a technology that derives search information from video input from a camera mounted on a mobile device, generates a digital fingerprint of previously registered data, searches for a target through a convolutional artificial neural network or an in-memory tree search technology, transmits the combined target and content information to the mobile device, and reproduces the content linked to the recognized object.
- an artificial neural network refers to an algorithm that simulates the way the human brain recognizes patterns.
- the artificial neural network interprets visual and auditory input data using perceptrons, classification, and clustering. With these results, it can recognize specific patterns in image, sound, character, and time-series data.
- classification and clustering can be performed, and the desired labels can be attached to the desired data for classification or clustering.
- unlabeled data can be compared with each other to obtain similarity, or data can be automatically classified by training on labeled data.
- artificial neural networks include deep neural networks; a deep neural network means a neural network composed of several layers.
- One layer consists of several nodes, and the actual computation happens in each node; this computation process is designed to simulate the process that takes place in the neurons that make up the human neural network.
- a node reacts when it receives a stimulus of a certain magnitude or more.
- the magnitude of the response is approximately proportional to the input value multiplied by the node's coefficient (or weight).
- a node receives multiple inputs, each with its own coefficient; therefore, different weights can be given to different inputs by adjusting these coefficients.
- the final multiplied values are all added and the sum is input to the activation function.
- the result of the activation function corresponds to the output of the node, which is ultimately used for classification or regression analysis.
- Each layer consists of several nodes, and each node is activated / deactivated according to the input value. At this time, the input data becomes the input of the first layer, and thereafter, the output of each layer becomes the input of the next layer again.
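- As a minimal sketch of this node computation (the input values, weights, and choice of sigmoid activation below are illustrative assumptions, not values from the patent):

```python
import numpy as np

# One node: a weighted sum of inputs followed by an activation function.
x = np.array([0.5, -1.2, 3.0])    # inputs received from the previous layer
w = np.array([0.8, 0.1, -0.4])    # coefficients (weights), one per input
z = np.dot(w, x)                  # the multiplied values are all added
y = 1.0 / (1.0 + np.exp(-z))      # sigmoid activation -> the node's output
```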
- low-level layers learn simple and specific features (e.g., horizontal, vertical, and diagonal lines of an image), while higher-level layers learn more complex and abstract features.
- deep neural networks can use data to determine the latent structure of that data.
- they can identify the latent structure of photos, text, video, audio, and music (what objects are in a photo, what the content and sentiment of a text are, what the content and sentiment of a voice recording are).
- the similarity between data can be effectively grasped even if the data is not labeled.
- for this reason, deep neural networks are effective for data clustering.
- deep neural networks are algorithms that allow computers to do this, and they can learn to perform such tasks much faster and more effectively than humans.
- the neural network can automatically extract the characteristics of the data. This automatic extraction can be done in a variety of ways, usually by training the output of the neural network to be equal to its input.
- the neural network finds the correlation between input and output.
- the neural network may first be trained to some extent with labeled data, and unlabeled data may then be added to continue the training. This method can maximize the performance of the neural network.
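- A minimal sketch of this idea, assuming a tiny PyTorch autoencoder (the layer sizes are arbitrary): the network is trained so that its output equals its input, which forces the hidden layer to learn compact features of the data.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 64), nn.ReLU(),   # encoder: compress the input to 64 features
    nn.Linear(64, 784),              # decoder: reconstruct the original input
)
x = torch.rand(32, 784)                     # a batch of unlabeled inputs
loss = nn.functional.mse_loss(model(x), x)  # train the output to equal the input
loss.backward()                             # gradients for the coefficient update
```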
- the last layer of the deep neural network is the output layer.
- the activation function of the output layer is usually logistic or softmax, and the output layer finally yields the probability of a specific label. For example, given a photo as input, the probability of each label, such as person or cat, can be obtained.
- the objective of this artificial neural network is to minimize errors in output.
- the principle of the coefficient update is to first estimate the coefficient, measure the error that occurs when the coefficient is used, and then update the coefficient slightly based on the error.
- before learning, the model is said to be initialized; when learning finishes, the model is said to be trained.
- an initialized model cannot do meaningful work, but as learning progresses, the model outputs realistic results rather than random values.
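- A minimal sketch of this coefficient-update principle on a single weight (the data and learning rate are illustrative assumptions):

```python
# Fit y = w * x by repeatedly measuring the error and nudging the coefficient.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # (input, target) pairs
w = 0.0                                       # initialized model: useless at first
lr = 0.05                                     # size of each slight update
for _ in range(100):
    for x, target in data:
        error = w * x - target                # error with the current coefficient
        w -= lr * error * x                   # update slightly, based on the error
print(w)                                      # approaches 2.0 as learning progresses
```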
- the conventional feature point descriptor method has a low recognition rate.
- conventional artificial neural network structures have the drawback that it is difficult to recognize specific objects.
- the present invention has been made keeping in mind the above problems occurring in the prior art, and it is an object of the present invention to provide an artificial neural network system that searches for a specific object and recognizes general objects in a single process, while minimizing the amount of computation on the mobile device and the load on the server.
- the present invention also provides an artificial neural network system capable of increasing the recognition rate by utilizing artificial neural networks having rotation invariant characteristics in specific object recognition.
- a second step (S20) in which the user of the mobile device 3 photographs an object with the camera 10 in order to receive the contents registered in the first step S10, and the mobile device 3 processes the obtained image and video data through the lower layers L1 to L6 of the convolutional artificial neural network;
- a third step (S30) in which the server 5 receives the data processed in the second step S20 and processes it through the middle and upper layers L7 to L12 of the convolutional artificial neural network, determining the main direction of the obtained image and video data and regenerating rotation-invariant features;
- a fourth step (S40) in which the FC layer value of the fully connected layer of the convolutional artificial neural network, among the data processed in the second step S20, is transmitted to the KD tree searching unit 7 of the server 5 to search for the nearest target registered by the content provider;
- a server 5 that performs data processing in each layer of the neural network corresponding to the middle and upper layers of the convolutional artificial neural network and registers, analyzes, searches, and classifies specific objects (images) and videos; and
- the convolution artificial neural network based recognition system has the following advantages.
- one complete convolutional artificial neural network is separated into two connected convolutional artificial neural networks: the part corresponding to the lower layers is processed by the mobile device, whose output is transmitted to the server, and the part corresponding to the middle and upper layers is processed by the server. Since the size of the transmitted data is reduced, traffic decreases and communication speed improves; and since the mobile device is responsible for part of the recognition processing, the load on the server is reduced while the source data remains on the server.
- the lower layers tend to be learned as layers serving as the feature extractor of conventional feature-based image recognition, extracting local-region feature information from the image,
- while the higher layers are learned as layers that play the matching role of conventional learning-based and feature-based image recognition.
- the output values of the lower layers and the upper layers are obtained using one complete convolutional artificial neural network, and the output values of the upper layers are used for understanding the category of objects.
- a content provider is a person who provides a variety of contents, such as advertisements, learning materials, and information, to the user, and registers a target for recognizing a specific object together with the content linked to it. In the registration process, the restrictions on the actions required of the user are reduced, so the content can be provided to more users.
- contents can also be provided for similar products that are not identical to the registered product, so contents and advertisements can be provided to many users.
- FIG. 1 is a diagram illustrating an overall structure of a recognition system based on convolutional artificial neural networks according to an embodiment of the present invention.
- FIG. 2 is a schematic view showing the structure of the mobile device shown in FIG. 1.
- FIG. 3 is a view schematically showing the structure of the server shown in FIG. 1.
- FIG. 4 is a block diagram schematically illustrating a process in which a convolution artificial neural network is divided and processed in the mobile device and the server shown in FIG. 1.
- FIG. 5 is a block diagram schematically illustrating a process of processing data by a second module and a third module of the convolution artificial neural network in the mobile device and server shown in FIG. 1, and registering the data in a searcher.
- FIG. 6 is a block diagram schematically illustrating a target search and query response process by the convolutional artificial neural networks of the fourth through sixth modules in the mobile device and the server shown in FIG. 1.
- FIG. 7 is a flowchart illustrating a recognition method based on a convolutional artificial neural network according to another embodiment of the present invention.
- FIG. 8 is a flowchart showing a result of a search query of a mobile device and a content reproduction process.
- a mobile device 3 that performs the analysis of the lower layers and transmits the user's query items to the server 5, receives the results, and manages them;
- a server 5 connected to the mobile device 3 through a network, performing data processing in each layer of the neural network corresponding to the middle and upper layers of the convolutional artificial neural network and registering, analyzing, searching, and classifying specific object images and videos;
- a searcher 7 for comparing the FC layer value of the artificial neural network processing of the image and the image registered by the contents provider with the FC layer value of the artificial neural network processing of the image and the image transmitted from the mobile device 3.
- the mobile device 3 refers to a device capable of obtaining image or moving image information and transmitting the image or moving image information through a network, for example, a smart phone, a tablet PC, or the like.
- This mobile device 3 includes a camera 10 for acquiring images and videos; a lower layer processing module 12 that executes layers L1 to L6, the lower layers of the convolutional artificial neural network, to process the image and video data acquired by the camera 10; a memory 14 for storing the data processed by the module; and a display window 16 for displaying images.
- the lower layer processing module 12 derives the local area and size invariant features by a conventional convolutional artificial neural network.
- the convolutional neural network consists of an input layer, a hidden layer, and an output layer.
- the hidden layer may be composed of one layer or may be composed of multiple layers, and in the present invention, it is composed of six layers.
- Each layer is composed of a plurality of nodes, and each node has a value.
- Each node is connected to the nodes before and after it by weights, which represent connection strengths.
- Each layer has an activation function, and a node's value is obtained from the values of the previous layer's nodes, the connection weights, and the activation function.
- image data is input. For example, three-dimensional image data in which channels are combined into 410 * 410 pixels is prepared.
- a feature map is generated by reducing a dimension to 204 * 204 * 128 by applying a 4 * 4 filter.
- padding is mainly used to adjust the size of the output data. For example, if a 3 * 3 filter is applied to 4 * 4 input data, the output becomes 2 * 2, which is reduced by 2 from the input data.
- the periphery of the input data is filled with a specific value, for example, zero.
- when padding of width 1 is applied to input data of 4 * 4 size, the one-pixel border surrounding the input data is filled with a specific value.
- padding is added to the input data of initial size 4 * 4, resulting in 6 * 6.
- with the 4 * 4 input data padded to 6 * 6, a 3 * 3 filter produces 4 * 4 output data.
- the padding is set to 1, but the present invention is not limited to this, and the padding can be set to 2 or 3.
- padding is set to zero.
- stride means the interval of the position where the filter is applied, and the stride may be 1 or 2.
- the window to which the filter is applied moves by one space.
- the window is moved by two spaces.
- the stride is set to 2.
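- The sizes above follow the standard convolution output formula; a small sketch (the helper name is ours, not the patent's):

```python
def conv_output_size(n, f, pad, stride):
    """Spatial size after applying an f*f filter: (n + 2*pad - f) // stride + 1."""
    return (n + 2 * pad - f) // stride + 1

print(conv_output_size(4, 3, 0, 1))    # 2: a 3*3 filter on 4*4 input gives 2*2
print(conv_output_size(4, 3, 1, 1))    # 4: padding of 1 preserves the 4*4 size
print(conv_output_size(410, 4, 0, 2))  # 204: the first layer L1 described above
```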
- the second layer L2 is performed.
- a feature map of 204 * 204 * 128 is generated once again by applying a 3 * 3 filter. At this time, the padding is set to 1 and the stride is set to 1.
- Pooling, like padding and stride, is used to reduce the spatial size of data. Pooling adjusts the data size only in the pooling layer, and it reduces the sensitivity to position changes in the image, so that fine parallel movements in the previous layer do not affect the next layer.
- For example, when the maximum or average value is taken within a 4 * 4 cell, the result does not change even if some of the numbers inside the cell change; as a result, minute changes in the previous layer do not affect the next layer.
- the pooling layer merges the localized portions of the previous layer nodes.
- the processing that takes the maximum value of the local part is called Max pooling.
- the window size and the stride value of the pooling are set to the same value.
- in the third layer L3, image data of 102 * 102 * 128 size is obtained by performing max pooling with a 2 * 2 size.
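- A minimal sketch of 2 * 2 max pooling with stride 2 (illustrative input; a 204 * 204 map would shrink to 102 * 102 the same way):

```python
import numpy as np

def max_pool_2x2(a):
    """Keep the maximum of each non-overlapping 2*2 block of a (H, W) array."""
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.arange(16.0).reshape(4, 4)
print(max_pool_2x2(a))   # 4*4 -> 2*2; small changes inside a block are absorbed
```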
- the fourth layer L4 is performed.
- in the fourth layer, a 4 * 4 filter is applied to the 102 * 102 * 128 feature map, with the padding set to 0 and the stride set to 2; therefore, a feature map of 50 * 50 * 256 size is generated.
- in the fifth layer L5, a 3 * 3 filter is applied to the 50 * 50 * 256 feature map to generate feature maps once more.
- the padding is set to 1 and the stride is set to 1, so the 50 * 50 * 256 size is maintained.
- the sixth layer L6 is performed by max pooling.
- the 50 * 50 * 256 feature map is max pooled with a 2 * 2 size to obtain image data of 25 * 25 * 256 size.
- the image data of 25 * 25 * 256 size is compressed and derived as an output value.
- the output value is about 336 KB in size, but it is not limited thereto and can vary according to the kind of data.
- the output value derived by performing the first to sixth layers L1 to L6 on the mobile device 3 is transmitted to the server 5, which performs the seventh to twelfth layers L7 to L12.
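- A minimal sketch of this device/server split, assuming a PyTorch model whose layer shapes follow the sizes in the text (the channel counts and module boundaries are our reading of the description, not a definitive implementation):

```python
import torch
import torch.nn as nn

lower = nn.Sequential(                                        # mobile device: L1-L6
    nn.Conv2d(3, 128, 4, stride=2), nn.ReLU(),                # L1: 410 -> 204
    nn.Conv2d(128, 128, 3, stride=1, padding=1), nn.ReLU(),   # L2: 204 -> 204
    nn.MaxPool2d(2),                                          # L3: 204 -> 102
    nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(),              # L4: 102 -> 50
    nn.Conv2d(256, 256, 3, stride=1, padding=1), nn.ReLU(),   # L5: 50 -> 50
    nn.MaxPool2d(2),                                          # L6: 50 -> 25
)
upper = nn.Sequential(                                        # server: L7-L12
    nn.Conv2d(256, 512, 3, stride=2), nn.ReLU(),              # L7: 25 -> 12
    nn.Conv2d(512, 512, 3, stride=1, padding=1), nn.ReLU(),   # L8: 12 -> 12
    nn.Conv2d(512, 512, 3, stride=1, padding=1), nn.ReLU(),   # L9: 12 -> 12
    nn.MaxPool2d(2),                                          # L10: 12 -> 6
    nn.Flatten(),
    nn.Linear(6 * 6 * 512, 4096), nn.ReLU(),                  # L11: first FC layer
    nn.Linear(4096, 1000),                                    # L12: second FC layer
)
x = torch.rand(1, 3, 410, 410)   # image captured by the camera
features = lower(x)              # 25*25*256 output sent over the network
logits = upper(features)         # server-side values fed to softmax
```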
- the server 5 includes a first module M1 that determines the output value of the convolutional artificial neural network lower layers transmitted from the mobile device 3 as the main direction of the input image to regenerate rotation-invariant features; a second module M2 that processes the image target transmitted from the content provider by the artificial neural network; a third module M3 that registers in the searcher 7 the information obtained from the feature descriptor layer by fully connecting the lower layer information transmitted from the second module M2; a fourth module M4 that performs a search by fully connecting the lower layer information transmitted from the first module M1 of the mobile device 3 and transmitting the information obtained from the feature descriptor layer to the searcher 7; a fifth module M5 that determines whether the target search succeeded by comparing the similarity of the nearest target with a threshold value; and a sixth module M6 that obtains the input value of the searcher 7 layer for recognizing the shape of general images and videos.
- the first module M1 performs the seventh to twelfth layers L7 to L12.
- the seventh layer L7 is performed: a 3 * 3 filter is applied to the 25 * 25 * 256 feature map to generate feature maps once more.
- the padding is set to 0 and the stride is set to 2, producing a 12 * 12 * 512 feature map.
- in the eighth layer L8, a 3 * 3 filter is applied to the 12 * 12 * 512 feature map to generate feature maps once more.
- the padding is set to 1 and the stride is set to 1.
- the same process as that in the 8th layer L8 is repeated one more time.
- a feature map is generated by applying a 3 * 3 filter to a 12 * 12 * 512 feature map, with padding of 1 and stride of 1.
- the tenth layer L10 that performs max pooling is performed.
- the 12 * 12 * 512 feature map is max pooled with a 2 * 2 size to obtain image data of 6 * 6 * 512 size.
- after the first to tenth layers L1 to L10 are performed, the second module M2 generates a fully connected layer at the eleventh layer L11.
- a fully connected layer means a layer in which the nodes of the preceding layer and the nodes of the following layer are all connected by weights.
- the node value can be calculated by the connection weight and the activation function.
- since a fully connected layer connects and combines every pair of nodes with weights, the amount of computation needed to process this layer is large, so it takes a long time to operate in a layer having many nodes.
- Such fully connected layers are mainly used in the latter half of the hidden layers or in the output layer.
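- A quick computation of why this is expensive, using the sizes that appear later in the text (6 * 6 * 512 nodes into a 4096-node fully connected layer):

```python
inputs = 6 * 6 * 512       # nodes flattened from the tenth layer L10
outputs = 4096             # nodes of the first fully connected layer
print(inputs * outputs)    # 75,497,472 weights for this single layer
```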
- a fully connected layer can be additionally generated in the twelfth layer L12.
- the output value is classified through the softmax.
- Softmax is mainly used for classification at the output nodes. It normalizes the raw output values into a probability distribution, extending logistic regression to multiple (k) categorical variables; that is, it computes a probability value indicating which of the k categories an input belongs to.
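- A minimal numpy sketch of softmax (the max subtraction is a standard numerical-stability step, not from the patent):

```python
import numpy as np

def softmax(z):
    """Normalize raw output values into a probability distribution over labels."""
    e = np.exp(z - z.max())   # subtract the max so exp() cannot overflow
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # sums to 1; the largest value dominates
```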
- the twelfth layer L12 performs general object recognition by finally classifying, through softmax, the output values of the layers.
- in this way, general object recognition is performed using the output values of the lower layers (layers 1-6) on the mobile device 3 and the upper layers (layers 7-12) on the server 5.
- This specific object recognition process is shown in FIG. 5, where a content provider registers an image in the web server 5.
- the second module M2 is installed in the web server 5 to process both the lower layer processed by the mobile device 3 and the upper layer processed by the server 5.
- the second module M2 processes all of the first to twelfth layers L1 to L12 in the same manner as described above, and thus a detailed description thereof will be omitted.
- the third module M3 registers the nearest target in the KD-Tree searcher 7.
- the resultant value of the FC layer is derived immediately before taking the softmax in the twelfth layer L12, and is registered in the KD-Tree searcher 7.
- the KD tree expands a binary search tree into a multidimensional space.
- the basic structure and algorithms are similar to a binary search tree, but the dimension used for comparison alternates with the level of the tree.
- two methods are mainly used to search for a node in the k-d tree. The first is the range search method, in which a range of key values to be searched is determined and the nodes included in that range are found.
- the second is the nearest neighbor search method, which finds the node closest to a given key value, using a hypercube to examine the space on both sides of each node.
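- A minimal nearest-neighbor sketch using SciPy's k-d tree (the random vectors stand in for the 4096-dimensional FC layer descriptors, and the threshold value is an illustrative assumption):

```python
import numpy as np
from scipy.spatial import cKDTree

registered = np.random.rand(1000, 4096)  # descriptors of registered targets
tree = cKDTree(registered)               # k-d tree built once on the server
query = np.random.rand(4096)             # descriptor from the mobile device
dist, idx = tree.query(query, k=1)       # nearest registered target
THRESHOLD = 10.0                         # assumed similarity threshold
matched = dist <= THRESHOLD              # the search succeeds only if close enough
```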
- the fourth module M4 fully connects the lower layer information transmitted from the first module M1 of the mobile device 3, transmits the FC layer value of the twelfth layer L12 to the KD-Tree retriever 7, and searches for the nearest target.
- the feature information of 16 KB (4 bytes × 4096), the feature descriptor obtained by fully connecting the lower layer information in the second of the two fully connected layers, is transmitted to the searcher 7 of the K-D tree structure.
- the fifth module M5 compares, in the searcher 7 of the KD tree structure, the closest target registered by the content provider with the image transmitted by the user of the mobile device 3, and judges whether the search succeeded by checking whether the similarity is equal to or greater than a threshold value.
- in the KD-Tree searcher 7, if the similarity of the closest target is equal to or greater than the threshold value, it is judged that the search for a target matching the specific object recognition has succeeded.
- in that case, the action execution information linked to the image is transmitted to the mobile device 3 so that the action can be performed.
- otherwise, the most similar category is selected by taking the softmax of the output value of the twelfth layer L12.
- the output of the searcher 7 layer for recognizing the shape of a general object is used to determine the corresponding general object (e.g., person, Dalmatian, rose); if the estimated value is greater than or equal to the threshold value, the corresponding action information is transmitted to the mobile device.
- otherwise, a no-operation state is transmitted to the mobile device 3.
- if the search result obtained from the server 5 has no associated operation, the standard deviation of the values obtained in the layer deriving local-region features is computed and temporarily stored. A new image is then acquired from the camera 10 of the mobile device 3, and if the difference from the processing result of the previous frame is below the threshold value, the newly acquired frame data is ignored.
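- A minimal sketch of this frame-skipping rule (the feature shapes, threshold, and helper function are illustrative assumptions):

```python
import numpy as np

def should_process(features, prev_std, threshold=0.01):
    """Skip a frame when its local-feature statistics barely changed."""
    std = float(np.std(features))         # std of the local-region feature layer
    changed = prev_std is None or abs(std - prev_std) >= threshold
    return changed, std

prev = None
for _ in range(3):                        # stand-in for frames from the camera
    feats = np.random.rand(25, 25, 256)   # lower-layer output for one frame
    process, prev = should_process(feats, prev)   # ignore the frame if unchanged
```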
- the sixth module M6 reproduces the content linked to the target: the content previously registered by the content provider is linked with the related target and reproduced on the mobile device 3.
- using general object recognition, the content can be provided even for similar products belonging to the same category, not necessarily identical products.
- contents can be provided to a specific object.
- conventional general object recognition technology cannot implement this. When a QR code or a coupon number is to be provided to the user, this can instead be solved through specific object recognition technology, by a page accessible only to the product purchaser, or by a method of delivering the printed code during the sales process.
- the mobile device 3 can, with the user's authorization, receive GPS satellite signals or receive information from which location can be inferred, such as Beacon or Wifi signals,
- and can send the location information to the server 5 as additional information of the search request. The server 5 can then determine whether the operation is allowed by checking whether the operation execution information of the search result is location-restricted.
- if restricted, operation restriction information is sent together with the recognition result to the mobile device 3, so that the mobile device 3 indicates to the user that the operation is restricted.
- the content provider registers the image target and contents in the search device 7 of the K-D tree structure of the web server 5 in the first step (S10).
- In the web server 5, at least one layer is performed that prepares image data, applies a filter to the data, and performs padding and striding to generate feature maps; max pooling is performed at least once; a filter is then applied to the max pooled data, and padding and striding are performed to generate feature maps.
- image data is input, and three-dimensional image data in which channels are combined to, for example, 410 * 410 pixels is prepared.
- a feature map is generated by reducing a dimension to 204 * 204 * 128 by applying a 4 * 4 filter.
- padding is mainly used to adjust the size of the output data. For example, if a 3 * 3 filter is applied to 4 * 4 input data, the output becomes 2 * 2, which is reduced by 2 from the input data.
- the periphery of the input data is filled with a specific value, for example, zero.
- padding is added to the input data with the initial size of 4 * 4, resulting in 6 * 6.
- padding is set to zero.
- a stride may be performed.
- the stride means the interval of the position where the filter is applied, and the stride may be 1 or 2.
- the stride is set to 2.
- the second layer L2 is performed.
- a feature map of 204 * 204 * 128 is generated once again by applying a 3 * 3 filter. At this time, the padding is set to 1 and the stride is set to 1.
- Pooling, like padding and stride, is used to reduce the spatial size of data. Pooling adjusts the data size only in the pooling layer, and it reduces the sensitivity to position changes in the image, so that fine parallel movements in the previous layer do not affect the next layer.
- the pooling layer merges the localized portions of the previous layer nodes.
- the processing that takes the maximum value of the local part is called Max pooling.
- the window size and the stride value of the pooling are set to the same value.
- in the third layer L3, image data of 102 * 102 * 128 size is obtained by performing max pooling with a 2 * 2 size.
- the fourth layer L4 is performed.
- in the fourth layer, a 4 * 4 filter is applied to the 102 * 102 * 128 feature map, with the padding set to 0 and the stride set to 2; therefore, a feature map of 50 * 50 * 256 size is generated.
- in the fifth layer L5, a 3 * 3 filter is applied to the 50 * 50 * 256 feature map to generate feature maps once more.
- the padding is set to 1 and the stride is set to 1; therefore, a feature map of 50 * 50 * 256 size is generated.
- the sixth layer L6 is performed by max pooling.
- the 50 * 50 * 256 feature map is max pooled with a 2 * 2 size to obtain image data of 25 * 25 * 256 size.
- a feature map is generated once more by applying a 3 * 3 filter to the feature map of 25 * 25 * 256 size. At this time, the padding is set to 0, and the stride is set to 2.
- in the eighth layer L8, a 3 * 3 filter is applied to the 12 * 12 * 512 feature map to generate feature maps once more.
- the padding is set to 1 and the stride is set to 1.
- the same process as the 8th layer is repeated one more time.
- a feature map is generated by applying a 3 * 3 filter to a 12 * 12 * 512 feature map, with padding of 1 and stride of 1.
- the tenth layer L10 that performs max pooling is performed.
- the 12 * 12 * 512 feature map is max pooled with a 2 * 2 size to obtain image data of 6 * 6 * 512 size.
- after the tenth layer L10 is performed, the eleventh layer L11 generates a fully connected layer.
- a second fully connected layer can be additionally created in the twelfth layer L12.
- the output value is classified through the softmax.
- the twelfth layer L12 performs general object recognition by finally classifying, through softmax, the output values of the layers.
- the FC layer value of the second fully connected layer is registered by storing it in the K-D tree structured searcher 7 stored on the memory 14 of the web server 5.
- the second to sixth steps (S20-S60) of recognizing the corresponding object and receiving the corresponding content may be performed in order.
- the user of the mobile device 3 photographs an object with the camera 10 (S21); the obtained image and video data is processed by the lower layers L1-L6 of the convolutional artificial neural network (S22) and transmitted to the server (S23).
- the lower layer process of the second step S20 is the same as the lower layer process of the first step S10 except that the mobile device 3 proceeds.
- image data is acquired by photographing an object with the camera 10 mounted on the mobile device 3 (S21).
- a feature map is generated by applying a filter in the first layer L1 to the image data, with padding and striding; a filter is then applied in the second layer L2 to generate feature maps once more, again with padding and striding.
- max pooling is performed in the third layer L3, and feature maps are generated once more in the fourth layer L4 with padding and striding.
- in the fifth layer L5, feature maps are generated in the same way, with padding and striding.
- in the sixth layer L6, max pooling is performed.
- the third step S30 is performed.
- the data processed in the mobile device 3 in the second step S20 is transmitted to the server 5 and processed by the upper layer.
- the process of the upper layer being processed by the server 5 is the same as the process of registering the image to the server 5 by the content provider in the first step (S10).
- in the seventh layer L7, a filter is applied to the feature map to generate a feature map once again, with padding and striding.
- in the eighth layer L8, a filter is applied to the feature map to generate feature maps once more, again with padding and striding.
- in the ninth layer L9, the same process as the eighth layer L8 is repeated to generate the feature map.
- in the tenth layer L10, max pooling is performed; a first fully connected layer is created in the eleventh layer L11, and a second fully connected layer is formed in the twelfth layer L12.
- the general object recognition is performed by classifying the output values through the softmax.
- the fourth step S40 is then performed: the FC layer value of the twelfth layer L12 is transferred to the KD tree searching unit 7 mounted on the memory 14 to retrieve the nearest target registered by the content provider (S41).
- the fifth step S50 is performed.
- the FC layer value of the image input to the K-D tree search unit 7 is compared with the similarity of the closest target previously registered by the content provider (S51).
- the most similar category is selected using the value obtained by taking the softmax of the output value of the twelfth layer L12 of the third step S30 (S52).
- if the similarity is equal to or greater than the threshold value, the content metadata is searched and reproduced (S61); if the similarity is equal to or less than the threshold value, the camera 10 acquires a new image (S54).
- the output of the searcher 7 layer for recognizing the shape of a general object is used to determine the corresponding general object (e.g., person, Dalmatian, rose); if the estimated value is greater than or equal to the threshold value, the corresponding action information is transmitted to the mobile device.
- otherwise, a no-operation state is transmitted to the mobile device 3.
- if the search result obtained from the server 5 has no associated operation, the standard deviation of the values obtained in the layer deriving local-region features is computed and temporarily stored. A new image is then acquired from the camera 10 of the mobile device 3, and if the difference from the processing result of the previous frame is below the threshold value, the newly acquired frame data is ignored.
- in the sixth step S60, the process of reproducing the content linked to the target is performed: the content previously registered by the content provider is linked with the related target and played back on the mobile device 3.
- the present invention relates to a convolutional artificial neural network based recognition system, and more particularly to a system that extracts search information from video input from a camera mounted on a mobile device, identifies an object through data learning based on a convolutional artificial neural network, generates a digital fingerprint of previously registered data, searches for a target through a convolutional artificial neural network or an in-memory tree search technology, and transmits the combined target and content information to the mobile device.
- the present invention is applicable to the technical field of reproducing contents linked to an object.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Neurology (AREA)
- Image Analysis (AREA)
- Processing Or Creating Images (AREA)
Abstract
Description
Claims (14)
- An artificial neural network based recognition method comprising: a first step (S10) in which a content provider registers an image target and content in the K-D tree structured searcher (7) of the server (5); a second step (S20) in which the user of the mobile device (3) photographs an object with the camera (10) in order to receive the content registered in the first step (S10), and the mobile device (3) processes the obtained image and video data through layers L1 to L6, the lower layers of the convolutional artificial neural network; a third step (S30) in which the server (5) receives the data processed in the second step (S20) and processes it through layers L7 to L12, the middle and upper layers of the convolutional artificial neural network, thereby determining the main direction of the obtained image and video data and regenerating rotation-invariant features; a fourth step (S40) of searching for the nearest target registered by the content provider by transmitting, among the data processed in the third step (S30), the FC layer value of the fully connected layer of the convolutional artificial neural network to the K-D tree searcher (7) of the server (5); a fifth step (S50) of determining whether the target search succeeded by comparing the similarity of the nearest target with a threshold value; and a sixth step (S60) of transmitting target-content combination information to the mobile device (3) and reproducing the content linked to the searched object.
- The method of claim 1, wherein in the first step (S10), the image target and content are registered in the searcher (7) by sequentially performing a plurality of layers through the convolutional artificial neural network included in the server (5), the plurality of layers comprising: at least one layer in which the server (5) prepares image data, applies a filter to the data, and performs padding and striding to generate feature maps; a layer performing max pooling a first time; at least one layer applying a filter to the max pooled data and performing padding and striding to generate feature maps; a layer performing max pooling a second time; at least one layer applying a filter to the max pooled data and performing padding and striding to generate feature maps; a layer performing max pooling a third time; at least one fully connected layer; a layer registering the FC layer value of the fully connected layer by storing it in the K-D tree structured searcher (7) stored on the memory (14) of the web server (5); and a layer classifying the output values of the fully connected layer through softmax.
- The method of claim 1, wherein in the second step (S20), the mobile device (3) processes the image and video data by the convolutional artificial neural network, the second step (S20) comprising: acquiring images and video by the camera (10) of the mobile device (3); performing at least one layer that applies a filter to the obtained image and video data and performs padding and striding to generate feature maps; performing max pooling a first time; performing at least one layer that applies a filter to the max pooled data and performs padding and striding to generate feature maps; performing max pooling a second time; and performing at least one layer that applies a filter to the secondly max pooled data and performs padding and striding to generate feature maps, whereby the data is transmitted to the server (5).
- The method of claim 3, wherein in the third step (S30), rotation-invariant regeneration is processed by the convolutional artificial neural network, the third step (S30) comprising: applying a filter to the data transmitted from the mobile device (3) to the server (5) and performing padding and striding to generate a feature map; performing a layer of max pooling a third time; performing at least one fully connected layer; and classifying the output values of the fully connected layer through softmax.
- The method of claim 1, wherein in the fifth step (S50), when determining whether the target search succeeded, if the similarity of the nearest target matches at or above the threshold value, it is determined that a target matching the specific object recognition has been found successfully, and if the similarity of the nearest target is below the threshold value, a similar category is selected using the value obtained by taking the softmax of the output value.
- The method of claim 5, wherein in the fifth step (S50), when determining whether the target search succeeded, the similarity of the selected category is compared with the threshold value; if the similarity is equal to or greater than the threshold value, the content metadata is searched and reproduced, and if the similarity is equal to or less than the threshold value, a new image is acquired by the camera (10).
- An artificial neural network based recognition system comprising: a mobile device (3) that performs the analysis of the lower layers and transmits the user's query items to the server (5), receives the results, and manages them; a server (5) connected to the mobile device (3) through a network, performing data processing in each layer of the neural network corresponding to the middle and upper layers of the convolutional artificial neural network and registering, analyzing, searching, and classifying specific objects (images) and videos; and a searcher (7) that compares the FC layer value of the artificial neural network processing of the images and videos registered by the content provider with the FC layer value of the artificial neural network processing of the images and videos transmitted from the mobile device (3).
- The system of claim 7, wherein the mobile device (3) comprises: a camera (10) for acquiring videos and images; a lower layer processing module (12) executing the lower layers of the convolutional artificial neural network that processes the video and image data acquired by the camera (10); a memory (14) for storing the data processed by the lower layer processing module (12); and a display window (16) for displaying images.
- The system of claim 8, wherein in the lower layer processing module (12), at least one layer proceeds that applies a filter to the data acquired by the camera (10) and performs padding and striding to generate feature maps, max pooling is performed at least once, at least one layer proceeds that applies a filter to the max pooled data and performs padding and striding to generate feature maps, and the data is transmitted to the server (5).
- The system of claim 7, wherein the server (5) comprises: a first module (M1) that determines the output value of the convolutional artificial neural network lower layers transmitted from the mobile device (3) as the main direction of the input image to regenerate rotation-invariant features; a second module (M2) that processes the image target transmitted from the content provider by the artificial neural network; a third module (M3) that registers in the searcher (7) the information obtained from the feature descriptor layer by fully connecting the lower layer information transmitted from the second module (M2); a fourth module (M4) that performs a search by fully connecting the lower layer information transmitted from the first module (M1) of the mobile device (3) and transmitting the information obtained from the feature descriptor layer to the searcher (7); a fifth module (M5) that determines whether the target search succeeded by comparing the similarity of the nearest target with a threshold value; and a sixth module (M6) that obtains the input value of the searcher (7) layer for recognizing the shape of general images and videos.
- The system of claim 10, wherein the first module (M1) applies a filter to the data transmitted from the mobile device (3) to the server (5) and performs padding and striding to generate a feature map, performs max pooling, proceeds through at least one fully connected layer, and classifies the output values of the fully connected layer through softmax.
- The system of claim 10, wherein the third module (M3) registers the nearest target registered by the content provider by transmitting the FC layer value of the fully connected layer to the K-D tree structured searcher (7) stored on the memory (14) of the web server (5).
- The system of claim 11, wherein the fifth module (M5) determines, when the similarity of the nearest target matches at or above the threshold value, that a target matching the specific object recognition has been found successfully; selects, when the similarity of the nearest target is below the threshold value, a similar category using the value obtained by taking the softmax of the output value of the fully connected layer; and, comparing the similarity of the selected category with the threshold value, searches and reproduces the content metadata when the similarity is equal to or greater than the threshold value, and acquires a new image by the camera (10) when the similarity is equal to or less than the threshold value.
- The system of claim 10, wherein the sixth module (M6) reproduces the content linked to the target, playing back on the mobile device (3) the content previously registered by the content provider by linking it with the related target.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020541631A JP2020537274A (ja) | 2017-10-12 | 2018-10-12 | Convolutional artificial neural network based recognition system in which registration, search, and reproduction of image and video are divided between and performed by mobile device and server |
US16/755,521 US11966829B2 (en) | 2017-10-12 | 2018-10-12 | Convolutional artificial neural network based recognition system in which registration, search, and reproduction of image and video are divided between and performed by mobile device and server |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020170132692A KR101949881B1 (ko) | 2017-10-12 | 2017-10-12 | Convolutional artificial neural network based recognition system in which registration, search, and reproduction of image and video are divided between and performed by mobile device and server
KR10-2017-0132692 | 2017-10-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019074316A1 true WO2019074316A1 (ko) | 2019-04-18 |
Family
ID=66100927
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2018/012022 WO2019074316A1 (ko) | 2017-10-12 | 2018-10-12 | Convolutional artificial neural network based recognition system in which registration, search, and reproduction of image and video are divided between and performed by mobile device and server |
Country Status (4)
Country | Link |
---|---|
US (1) | US11966829B2 (ko) |
JP (1) | JP2020537274A (ko) |
KR (1) | KR101949881B1 (ko) |
WO (1) | WO2019074316A1 (ko) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7363107B2 (ja) * | 2019-06-04 | 2023-10-18 | Konica Minolta, Inc. | Idea support apparatus, idea support system, and program |
KR102289817B1 (ko) * | 2021-01-05 | 2021-08-17 | Artifacts Inc. | System and method for providing authenticity verification and management of artworks |
US11867639B2 (en) * | 2021-09-15 | 2024-01-09 | Te Connectivity Solutions Gmbh | Method and apparatus for flattening and imaging a printed thin film product |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101169955B1 (ko) | 2010-10-06 | 2012-09-19 | KTX Co., Ltd. | Clip for fixing a shield can for blocking electromagnetic waves |
KR101877914B1 (ko) | 2012-07-06 | 2018-07-12 | LG Display Co., Ltd. | Viewing angle control panel, method of manufacturing the same, and liquid crystal display device using the same |
US20160225053A1 (en) | 2015-01-29 | 2016-08-04 | Clear Research Corporation | Mobile visual commerce system |
-
2017
- 2017-10-12 KR KR1020170132692A patent/KR101949881B1/ko active
-
2018
- 2018-10-12 JP JP2020541631A patent/JP2020537274A/ja active Pending
- 2018-10-12 US US16/755,521 patent/US11966829B2/en active Active
- 2018-10-12 WO PCT/KR2018/012022 patent/WO2019074316A1/ko active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20160118028A (ko) * | 2015-04-01 | 2016-10-11 | CK&B Co., Ltd. | Server computing device and content recognition based image search system using the same |
US20160342623A1 (en) * | 2015-05-18 | 2016-11-24 | Yahoo! Inc. | Mobile visual search using deep variant coding |
KR20170078516A (ko) * | 2015-12-29 | 2017-07-07 | Samsung Electronics Co., Ltd. | Method and apparatus for performing neural network based image signal processing |
Non-Patent Citations (2)
Title |
---|
VARGA, DOMONKOS ET AL.: "Fast Content-based Image Retrieval Using Convolutional Neural Network and Hash Function", 2016 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 9 October 2016 (2016-10-09), pages 2636 - 2640, XP033060541 *
WHANG, MINCHEOL ET AL.: "Satellite Image Matching Using K-D Tree with SURF", PROCEEDINGS OF KOREAN INSTITUTE OF INFORMATION SCIENTISTS AND ENGINEERS CONFERENCE, December 2014 (2014-12-01), pages 550 - 552 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347851A (zh) * | 2019-05-30 | 2019-10-18 | China University of Geosciences (Wuhan) | Image retrieval method and system based on convolutional neural network |
CN110532866A (zh) * | 2019-07-22 | 2019-12-03 | Ping An Technology (Shenzhen) Co., Ltd. | Video data detection method, apparatus, computer device, and storage medium |
CN113094547A (zh) * | 2021-04-06 | 2021-07-09 | Dalian University of Technology | Method for retrieving video clips of specific actions from Japanese online video corpora |
CN113094547B (zh) * | 2021-04-06 | 2022-01-18 | Dalian University of Technology | Method for retrieving video clips of specific actions from Japanese online video corpora |
Also Published As
Publication number | Publication date |
---|---|
US20220292328A1 (en) | 2022-09-15 |
JP2020537274A (ja) | 2020-12-17 |
US11966829B2 (en) | 2024-04-23 |
KR101949881B1 (ko) | 2019-05-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019074316A1 (ko) | Convolutional artificial neural network based recognition system in which registration, search, and reproduction of image and video are divided between and performed by mobile device and server | |
WO2020159232A1 (en) | Method, apparatus, electronic device and computer readable storage medium for image searching | |
WO2021132927A1 (en) | Computing device and method of classifying category of data | |
WO2020138928A1 (en) | Information processing method, apparatus, electrical device and readable storage medium | |
WO2019098449A1 (ko) | 메트릭 학습 기반의 데이터 분류와 관련된 장치 및 그 방법 | |
WO2019031714A1 (ko) | 객체를 인식하는 방법 및 장치 | |
WO2010119996A1 (ko) | Method and apparatus for providing advertisements related to video | |
WO2019059505A1 (ko) | 객체를 인식하는 방법 및 장치 | |
WO2021006482A1 (en) | Apparatus and method for generating image | |
WO2014007586A1 (en) | Apparatus and method for performing visual search | |
WO2023224430A1 (en) | Method and apparatus for on-device personalised analysis using a machine learning model | |
WO2020085653A1 (ko) | Method and system for tracking multiple pedestrians using teacher-student random ferns | |
WO2020091207A1 (ko) | Method, apparatus, and computer program for completing colorization of an image, and method, apparatus, and computer program for training an artificial neural network | |
WO2020168606A1 (zh) | Advertisement video optimization method, apparatus, and device, and computer-readable storage medium | |
EP3821378A1 (en) | Apparatus for deep representation learning and method thereof | |
WO2024025220A1 (ko) | System for providing an online advertising content platform | |
WO2022244997A1 (en) | Method and apparatus for processing data | |
WO2020071618A1 (ko) | Entropy-based neural network partial learning method and system | |
WO2021071240A1 (ko) | Fashion item recommendation method, apparatus, and computer program | |
WO2024172245A1 (ko) | System and method for processing traffic accident information based on black box video using an artificial intelligence model | |
WO2023018084A1 (en) | Method and system for automatically capturing and processing an image of a user | |
WO2020251236A1 (ko) | Image data retrieval method, apparatus, and program using a deep learning algorithm | |
WO2021150016A1 (en) | Methods and systems for performing tasks on media using attribute specific joint learning | |
WO2019107624A1 (ko) | Sequence-to-sequence translation method and apparatus therefor | |
WO2022019389A1 (ko) | Apparatus and method for training a spatial analysis model based on data augmentation | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18866508 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2020541631 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18866508 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 220121) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 18866508 Country of ref document: EP Kind code of ref document: A1 |