WO2022002242A1 - Scene recognition method and system, electronic device, and medium - Google Patents

Scene recognition method and system, electronic device, and medium

Info

Publication number
WO2022002242A1
Authority
WO
WIPO (PCT)
Prior art keywords
scene
network
training
recognition
trained
Prior art date
Application number
PCT/CN2021/104224
Other languages
English (en)
Chinese (zh)
Inventor
吴臻志
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010633894.3A (published as CN111797762A)
Priority claimed from CN202010633911.3A (published as CN111797763A)
Application filed by 北京灵汐科技有限公司
Publication of WO2022002242A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology

Definitions

  • the present application relates to the field of identification technologies, and in particular, to a scene identification method and system, an electronic device, and a computer-readable medium.
  • Neural network refers to a mathematical model that applies a structure similar to the synaptic connections of the brain for information processing. Neural networks can be used to recognize scenes.
  • the present application provides a scene identification method and system, an electronic device, and a computer-readable medium, which can realize accurate identification of various scenes.
  • an embodiment of the present application provides a scene recognition method, including: extracting features of scene data to be recognized; inputting the extracted features into a scene recognition network for recognition, and obtaining multiple scene recognition results corresponding to different scenes.
  • an embodiment of the present application provides a scene recognition system, including: a backbone network configured to extract features of scene data to be recognized; and a scene recognition network device that includes a scene recognition network and is configured to input the extracted features into the scene recognition network to obtain multiple scene recognition results corresponding to different scenes.
  • an electronic device, which includes:
  • one or more processors;
  • a memory on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the scene recognition methods in the embodiments of the present application;
  • one or more I/O interfaces, connected between the processor and the memory, configured to realize information interaction between the processor and the memory.
  • embodiments of the present application provide a computer-readable medium on which a computer program is stored, and when the program is executed by a processor, implements any one of the scene recognition methods in the embodiments of the present application.
  • the scene recognition method and system, electronic device, and computer-readable medium proposed in the present application input the features extracted from scene data into a scene recognition network that can identify, for a variety of scenes, whether the scene data belongs to the corresponding scene, so as to obtain multiple scene recognition results corresponding to different scenes. Compared with the related art, which can only obtain the similarity between scene data and each scene, the identification result of the solution of the present application is more accurate.
  • FIG. 1 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • FIG. 11 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a computer-readable medium provided by an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • the system includes, but is not limited to, a backbone network 110 and a scene recognition network device 120 .
  • the backbone network 110 is configured to extract features of the scene data to be identified.
  • the backbone network is responsible for feature extraction of scene data.
  • the scene data includes at least one of scene video data, scene picture data and scene text data.
  • in some embodiments, the backbone network is a deep neural network pre-trained on text, and the scene data passes through the backbone network to obtain a vector representing text features.
  • in other embodiments, the backbone network is a deep neural network pre-trained on ImageNet, and the scene data passes through the backbone network to obtain a vector representing image features.
  • the backbone network may be the front part of a multi-layer deep neural network, obtained by removing the last few fully connected layers.
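  • As an illustration of such a backbone (a minimal sketch assuming PyTorch and torchvision, which the patent does not prescribe), an ImageNet-pretrained model can have its final fully connected layer removed so that it outputs a feature vector:

```python
import torch
import torchvision.models as models

# Sketch: an ImageNet-pretrained ResNet-18 with its final fully connected
# layer dropped, so the remaining front part outputs features, not classes.
resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone = torch.nn.Sequential(*list(resnet.children())[:-1])
backbone.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 64, 64)      # a 64*64*3 scene image, as in the text
    features = backbone(x).flatten(1)  # shape (1, 512): vector of image features
print(features.shape)
```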
  • scene data is collected by a collection device such as a camera or a microphone, and the collected scene data is stored in the memory.
  • the scene identification network device 120 includes a scene identification network, and the scene identification network device 120 is configured to input the extracted features into the scene identification network to obtain multiple scene identification results corresponding to different scenes.
  • the scene recognition network can identify whether the scene data is a corresponding scene for a variety of scenes.
  • the features extracted from the scene data are input into the scene recognition network for identification, and multiple scene identification results corresponding to different scenes are obtained; each scene identification result indicates whether the scene data belongs to the corresponding scene.
  • compared with the related art, the identification result of this embodiment has higher accuracy.
  • FIG. 2 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • the system includes, but is not limited to, a backbone network 210 and a scene recognition network device 220 .
  • the scene identification network device 220 includes a scene identification network, and the scene identification network includes a plurality of scene networks corresponding to different scenes, for example, scene network 1, scene network 2, and scene network 3 in FIG. 2 .
  • the scene recognition network device 220 is configured to pass the extracted features through each scene network in parallel, obtaining the scene recognition result corresponding to each scene network.
  • Each scene network can be a single-layer fully connected network or a multi-layer perceptron (MLP), and each scene network is called a head. Multiple heads exist in parallel without affecting each other, and new heads can be added. Each head outputs a binary classification, that is, whether the scene data corresponds to that head's scene.
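  • A minimal sketch of such a multi-head arrangement (assuming PyTorch; the class name, the feature size of 512 and the decision threshold of 0.5 are illustrative assumptions, not taken from the patent):

```python
import torch
import torch.nn as nn

class MultiHeadSceneNet(nn.Module):
    """Sketch of a multi-head scene recognition network: one head per scene."""

    def __init__(self, feature_dim: int):
        super().__init__()
        self.feature_dim = feature_dim
        self.heads = nn.ModuleList()  # parallel heads that do not affect each other

    def add_head(self) -> None:
        # A new scene is supported by appending a head; existing heads are untouched.
        self.heads.append(nn.Linear(self.feature_dim, 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Each head makes its own binary decision: does the scene data
        # correspond to this head's scene?
        logits = torch.cat([head(features) for head in self.heads], dim=-1)
        return (torch.sigmoid(logits) > 0.5).long()

net = MultiHeadSceneNet(feature_dim=512)
for _ in range(3):               # scene networks 1, 2 and 3
    net.add_head()
print(net(torch.randn(1, 512)))  # e.g. tensor([[1, 0, 0]])
```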
  • a scene recognition network composed of multiple scene networks can also be called a multi-head network.
  • in the related art, the scene recognition result output by the neural network is a similarity to each scene, rather than a definite determination of which scene the data belongs to. For example, the similarity between scene data N and scene A is 40%, the similarity between scene data N and scene B is 30%, and the similarity between scene data N and scene C is 30%, so the recognition accuracy is poor.
  • the extracted features are passed through different scene networks in parallel, and the scene recognition results corresponding to each scene network are obtained respectively.
  • for example, scene network 1 outputs recognition result 1, indicating that scene data N matches the scene corresponding to scene network 1; scene network 2 outputs recognition result 0, indicating that scene data N does not match the scene corresponding to scene network 2; and scene network 3 outputs recognition result 0, indicating that scene data N does not match the scene corresponding to scene network 3. It is therefore clear that scene data N is the scene data corresponding to scene network 1, and the recognition result is more accurate.
  • FIG. 3 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • the system includes, but is not limited to, a backbone network 310 and a scene recognition network device 320 .
  • the scene identification network device 320 includes a scene identification network, the scene identification network includes an attention network, the attention network includes subnets corresponding to different scene identifiers, and multiple scene identifiers respectively correspond to different scenes.
  • the scene recognition network device 320 is configured to pass the extracted features through the subnet corresponding to each scene identifier, obtaining the scene recognition result corresponding to that scene identifier.
  • the attention network is a kind of gated network.
  • some neural network nodes are connected, and the connected neural network nodes form a sub-network.
  • the attention input can take the form of one-hot encoding or of activity values.
  • when the attention input is one-hot encoded, the scene identifier of scene A is, for example, [1,0]: the corresponding gated branch A is turned on (subnet A works) and gated branch B is turned off, so the neurons controlled by gated branch A are in the working state while the neurons controlled by gated branch B are inhibited (they produce no output regardless of the input).
  • likewise, the scene identifier of scene B is [0,1]: the corresponding gated branch B is turned on (subnet B works) and gated branch A is turned off, so the neurons controlled by gated branch B are in the working state while the neurons controlled by gated branch A are inhibited (they produce no output regardless of the input).
  • when the gated input is a set of activity values, each value drives the activation of one gated branch. For example, if the activity of gated branch A is 0.2 and the activity of gated branch B is 0.8, i.e. the gated input is [0.2, 0.8], then gated branch B is turned on (subnet B works) and gated branch A is turned off.
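  • A minimal sketch of such a gated attention network (assuming PyTorch; the hard argmax gating and the linear branches are illustrative assumptions):

```python
import torch
import torch.nn as nn

class GatedAttentionNet(nn.Module):
    """Sketch of an attention (gated) network: one gated branch per scene."""

    def __init__(self, feature_dim: int, num_scenes: int):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Linear(feature_dim, 1) for _ in range(num_scenes)
        )

    def forward(self, features: torch.Tensor, gate: torch.Tensor) -> torch.Tensor:
        # The gate may be one-hot (e.g. [1, 0] for scene A) or activity
        # values (e.g. [0.2, 0.8]); the branch with the largest gate value
        # works, the others are inhibited and produce no output.
        active = int(gate.argmax())
        outputs = []
        for i, branch in enumerate(self.branches):
            if i == active:
                outputs.append(torch.sigmoid(branch(features)))    # working state
            else:
                outputs.append(torch.zeros(features.shape[0], 1))  # inhibited
        return torch.cat(outputs, dim=-1)

net = GatedAttentionNet(feature_dim=512, num_scenes=2)
feats = torch.randn(1, 512)
print(net(feats, torch.tensor([1.0, 0.0])))  # one-hot identifier: subnet A works
print(net(feats, torch.tensor([0.2, 0.8])))  # activity values: subnet B works
```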
  • in the related art, the scene recognition result output by the neural network is a similarity to each scene rather than a definite determination of which scene the data belongs to; for example, the similarity with scene A is 40%, the similarity with scene B is 30%, and the similarity with scene C is 30%, so the recognition accuracy is poor.
  • in this embodiment, the extracted features pass through the subnets corresponding to different scene identifiers, and the scene identification result corresponding to each scene identifier is obtained.
  • for example, subnet A outputs recognition result 1, indicating that scene data N matches scene A; subnet B outputs recognition result 0, indicating that scene data N does not match scene B; and subnet C outputs recognition result 0, indicating that scene data N does not match scene C. It is therefore clear that scene data N is the scene data corresponding to subnet A, and the accuracy of the identification result is higher.
  • FIG. 4 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • the system includes, but is not limited to, a positive sample device 410, a backbone network 420 and a scene recognition network device 430.
  • the positive sample device 410 is configured to output scene data to be identified to the backbone network.
  • the positive sample device collects data of the current scene, obtaining text data, image data or video data as the scene data to be identified.
  • the backbone network 420 is configured to extract features of the scene data to be identified.
  • the scene identification network device 430 includes a scene identification network, and the scene identification network device 430 is configured to input the extracted features into the scene identification network to obtain multiple scene identification results corresponding to different scenes.
  • the features extracted from the scene data are input into the scene recognition network for identification, and multiple scene identification results corresponding to different scenes are obtained; each scene identification result indicates whether the scene data belongs to the corresponding scene.
  • compared with the related art, the identification result of this embodiment has higher accuracy.
  • FIG. 5 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • the system includes, but is not limited to, a positive sample device 510 , a negative sample generator 520 , a scene identification device 530 , a backbone network 540 and a scene identification network device 550 .
  • the positive sample device 510 is configured to output training positive samples to the backbone network.
  • the negative sample generator 520 is configured to output training negative samples to the backbone network.
  • the training positive samples are files of the selected scene, and the training negative samples are files of scenes other than the selected scene.
  • scene data refers to collected scene data stored directly in a storage space (e.g., a memory).
  • a scene file is an ordered collection of scene data; for example, the data of the 128 sectors numbered 0 to 127 read from the memory, or the first 128 bytes of the tellme.txt file in the X directory of the memory.
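  • A small sketch of reading such an ordered collection (plain Python; the 512-byte sector size and the path "X/tellme.txt" are illustrative assumptions):

```python
# Two ways of reading a "scene file" as an ordered collection of scene
# data, mirroring the examples in the text.
SECTOR_SIZE = 512  # assumed sector size in bytes

def read_sectors(device_path: str, first: int = 0, count: int = 128) -> bytes:
    """Read the data of sectors first .. first + count - 1."""
    with open(device_path, "rb") as f:
        f.seek(first * SECTOR_SIZE)
        return f.read(count * SECTOR_SIZE)

def read_file_head(path: str = "X/tellme.txt", num_bytes: int = 128) -> bytes:
    """Read the first num_bytes bytes of a file."""
    with open(path, "rb") as f:
        return f.read(num_bytes)
```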
  • the scene identification device 530 is configured to obtain the target scene identification of the target scene, and output the target scene identification to the backbone network.
  • the target scene identification is set to identify the selected scene, and the selected scene is the target scene.
  • the backbone network 540 is configured to extract training features according to the target scene identifier, and the training features include training features for training positive samples and training features for training negative samples.
  • the scene recognition network device 550 is configured to train the scene recognition network to be trained for the target scene according to the training feature to obtain the scene recognition network.
  • the target scene may be a newly-added scene, in which case the scene recognition network to be trained is trained for the newly-added scene, yielding a scene recognition network that can identify both whether scene data belongs to the newly-added scene and whether it belongs to the original scenes.
  • the target scene may also be an existing scene, in which case the scene recognition network to be trained is trained for the existing scene, so that the recognition function for the existing scene is updated while the recognition functions for other scenes remain unchanged. Training the scene recognition network with the solution of this embodiment is therefore more convenient and quicker.
  • FIG. 6 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • the system includes, but is not limited to, a positive sample device 610 , a negative sample generator 620 , a scene identification device 630 , a backbone network 640 and a scene identification network device 650 .
  • the positive sample device 610 is configured to output training positive samples to the backbone network.
  • the negative sample generator 620 is configured to output training negative samples to the backbone network.
  • the training positive samples are files of the selected scene, and the training negative samples are files of scenes other than the selected scene.
  • scene data refers to collected scene data stored directly in a storage space (e.g., a memory).
  • a scene file is an ordered collection of scene data; for example, the data of the 128 sectors numbered 0 to 127 read from the memory, or the first 128 bytes of the tellme.txt file in the X directory of the memory.
  • the scene identification device 630 is configured to obtain the target scene identification of the target scene, and output the target scene identification to the backbone network.
  • the target scene identification is set to identify the selected scene, and the selected scene is the target scene.
  • the backbone network 640 is configured to extract training features according to the target scene identifier, and the training features include training features for training positive samples and training features for training negative samples.
  • in some embodiments, the scene recognition network device 650 includes a plurality of scene networks corresponding to different scenes and a new scene network, and the target scene identifier of the target scene corresponds to the new scene network. The scene recognition network device 650 is configured to pass the training features of the training positive samples and the training negative samples through the new scene network to obtain the training recognition result corresponding to the new scene network, and then determine the weights of the new scene network according to that training recognition result, the labels of the training positive samples and the labels of the training negative samples, obtaining the trained scene network.
  • in other embodiments, the scene recognition network device 650 includes a plurality of scene networks corresponding to different scenes, and the target scene identifier of the target scene corresponds to an existing scene network in the device. The scene recognition network device 650 is configured to pass the training features through the existing scene network to obtain the training identification result corresponding to the existing scene network, and then update the weights of the existing scene network according to that training identification result, the labels of the training positive samples and the labels of the training negative samples, obtaining the updated scene network.
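  • A minimal training sketch under these assumptions (PyTorch; the binary cross-entropy loss, SGD and the hyperparameters are my choices, not stated in the text). Only the head tied to the target scene identifier is optimized, so the recognition functions of the other heads stay unchanged:

```python
import torch
import torch.nn as nn

def train_head(head: nn.Module,
               pos_feats: torch.Tensor,  # training features of positive samples
               neg_feats: torch.Tensor,  # training features of negative samples
               epochs: int = 100, lr: float = 1e-2) -> nn.Module:
    """Determine the weights of one scene network (head) from its training
    recognition results and the labels of the positive/negative samples."""
    x = torch.cat([pos_feats, neg_feats])
    y = torch.cat([torch.ones(len(pos_feats), 1),    # labels of positive samples
                   torch.zeros(len(neg_feats), 1)])  # labels of negative samples
    optimizer = torch.optim.SGD(head.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(head(x), y)  # training recognition result vs. labels
        loss.backward()
        optimizer.step()
    return head

# New scene: train a freshly appended head, e.g. train_head(nn.Linear(512, 1), p, n).
# Existing scene: pass that scene's head to train_head to update only its weights.
```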
  • the multi-head network device may be instructed to identify scene data, train a new scene network, or update an existing scene network by means of a key trigger, a button trigger, or a sent instruction.
  • in the related art, when a new scene recognition function needs to be added, the neural network is retrained using both the samples corresponding to the original scene recognition functions and the samples corresponding to the new scene recognition function. For example, if the original neural network can recognize scene A but cannot recognize scene B, it is retrained on the samples of scene A and scene B so that it can output the similarity between scene data and scenes A and B, e.g. a similarity of 30% with scene A and 60% with scene B.
  • likewise, in the related art, when a scene recognition function needs to be updated, the neural network is retrained using the samples corresponding to the function to be updated together with the samples corresponding to the other functions that do not need updating. For example, if the original neural network can recognize scene A and scene B and the ability to recognize scene B needs to be updated, the network is retrained on the samples of scene A and the updated scene B.
  • in contrast, when the scene recognition network device of this embodiment needs to update a scene recognition function, there is no need to retrain the entire scene recognition network; only the scene network that needs updating is retrained, which makes updating convenient and quick.
  • FIG. 7 is a schematic structural diagram of a scene recognition system provided by an embodiment of the present application.
  • the system includes, but is not limited to, a positive sample device 710 , a negative sample generator 720 , a scene identification device 730 , a backbone network 740 and a scene identification network device 750 .
  • the scene identification network device 750 includes a scene identification network, the scene identification network includes an attention network, the attention network includes subnetworks corresponding to different scene identifiers, and multiple scene identifiers correspond to different scenes respectively.
  • the positive sample device 710 is configured to output training positive samples to the backbone network.
  • the negative sample generator 720 is configured to output training negative samples to the backbone network.
  • the training positive samples are files of the selected scene, and the training negative samples are files of scenes other than the selected scene.
  • scene data refers to collected scene data stored directly in a storage space (e.g., a memory).
  • a scene file is an ordered collection of scene data; for example, the data of the 128 sectors numbered 0 to 127 read from the memory, or the first 128 bytes of the tellme.txt file in the X directory of the memory.
  • the scene identification device 730 is configured to obtain the target scene identification of the target scene, and output the target scene identification to the backbone network.
  • the target scene identification is set to identify the selected scene, and the selected scene is the target scene.
  • the backbone network 740 is configured to extract training features according to the target scene identifier, and the training features include training features for training positive samples and training features for training negative samples.
  • the scene recognition network device 750 is configured to input the training features and the target scene identifier into the attention network to be trained, obtaining the training recognition result of the attention network to be trained corresponding to the target scene identifier; it then determines, according to the training recognition result, the labels of the training positive samples and the labels of the training negative samples, the weights of the attention network to be trained corresponding to the target scene identifier, obtaining the trained attention network corresponding to the target scene identifier.
  • the subnet corresponding to the target scene identifier acquired by the scene identification device may be a new subnet, in which case the training process trains a new subnet (new scene); it may also be an existing subnet, in which case the training process updates the existing subnet (existing scene).
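  • A minimal sketch of this per-subnet training (assuming PyTorch and the GatedAttentionNet sketched earlier; the BCE loss, SGD and hyperparameters are my assumptions). Only the gated branch selected by the target scene identifier is updated, so the recognition functions of the other subnets stay constant:

```python
import torch
import torch.nn as nn

def train_branch(net, scene_id: int,
                 feats: torch.Tensor, labels: torch.Tensor,
                 epochs: int = 100, lr: float = 1e-2) -> None:
    """Update only the subnet (gated branch) selected by the target scene
    identifier; the branches of the other scenes keep their weights."""
    branch = net.branches[scene_id]  # an existing subnet, or one just appended
    optimizer = torch.optim.SGD(branch.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(branch(feats), labels)  # labels: 1 positive, 0 negative
        loss.backward()
        optimizer.step()
```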
  • the attention network may be instructed to recognize scene data, train a new scene network, or update an existing scene network by means of a key trigger, a button trigger, or a sent instruction.
  • in the related art, when a new scene recognition function needs to be added, the neural network is retrained using both the samples corresponding to the original scene recognition functions and the samples corresponding to the new scene recognition function. For example, if the original neural network can recognize scene A but cannot recognize scene B, it is retrained on the samples of scene A and scene B so that it can output the similarity between scene data and scenes A and B, e.g. a similarity of 30% with scene A and 60% with scene B.
  • likewise, in the related art, when a scene recognition function needs to be updated, the neural network is retrained using the samples corresponding to the function to be updated together with the samples corresponding to the other functions that do not need updating. For example, if the original neural network can recognize scene A and scene B and the ability to recognize scene B needs to be updated, the network is retrained on the samples of scene A and the updated scene B.
  • FIG. 8 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application. The method includes but is not limited to step S110 and step S120.
  • Step S110 Extract features of the scene data to be identified.
  • the scene data includes at least one of scene video data, scene picture data and scene text data.
  • the size of the scene data to be recognized can be, for example, 64*64*3.
  • scene data of size 64*64*3 keeps a relatively high resolution while reducing the dimensionality, so it is clearer after processing.
  • Step S120 Input the extracted features into a scene recognition network for recognition, and obtain multiple scene recognition results corresponding to different scenes.
  • the features extracted from the scene data are input into the scene recognition network for identification, and multiple scene identification results corresponding to different scenes are obtained; each scene identification result indicates whether the scene data belongs to the corresponding scene.
  • compared with the related art, the identification result of this embodiment has higher accuracy.
  • FIG. 9 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • the scene recognition network includes multiple scene networks respectively corresponding to different scenes.
  • the method includes but is not limited to step S210 and step S220.
  • Step S210 Extract features of the scene data to be identified.
  • the scene data includes at least one of scene video data, scene picture data and scene text data.
  • the size of the scene data to be recognized can be, for example, 64*64*3.
  • scene data of size 64*64*3 keeps a relatively high resolution while reducing the dimensionality, so it is clearer after processing.
  • Step S220 Pass the extracted features through each of the scene networks in parallel to obtain scene recognition results corresponding to each of the scene networks.
  • in the related art, the scene recognition result output by the neural network is a similarity to each scene rather than a definite determination of which scene the data belongs to; for example, the similarity with scene A is 40%, the similarity with scene B is 30%, and the similarity with scene C is 30%, so the recognition accuracy is poor.
  • the extracted features are passed through different scene networks in parallel, and the scene recognition results corresponding to each scene network are obtained respectively.
  • for example, scene network 1 outputs recognition result 1, indicating that scene data N matches the scene corresponding to scene network 1; scene network 2 outputs recognition result 0, indicating that scene data N does not match the scene corresponding to scene network 2; and scene network 3 outputs recognition result 0, indicating that scene data N does not match the scene corresponding to scene network 3. It is therefore clear that scene data N is the scene data corresponding to scene network 1, and the recognition result is more accurate.
  • FIG. 10 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • the scene recognition network includes an attention network, and the attention network includes a plurality of scene identifiers respectively corresponding to different scenes.
  • the method includes but is not limited to step S310 and step S320.
  • Step S310 Extract features of the scene data to be identified.
  • the scene data includes at least one of scene video data, scene picture data and scene text data.
  • the size of the scene data to be recognized can be, for example, 64*64*3.
  • scene data of size 64*64*3 keeps a relatively high resolution while reducing the dimensionality, so it is clearer after processing.
  • Step S320 Traverse a plurality of the scene identifiers of the attention network according to the extracted features, and obtain a scene recognition result corresponding to each of the scene identifiers, as in the sketch below.
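  • A minimal sketch of this traversal (assuming PyTorch and the GatedAttentionNet sketched earlier; the 0.5 threshold is an assumption):

```python
import torch

def recognize(net, features: torch.Tensor, num_scenes: int) -> list:
    """Traverse all scene identifiers; for each one, the features pass
    through the subnet gated by that identifier, yielding one result per scene."""
    results = []
    for scene in range(num_scenes):
        one_hot = torch.eye(num_scenes)[scene]     # this scene's identifier
        output = net(features, one_hot)[0, scene]  # output of the gated subnet
        results.append(int(output > 0.5))          # 1: is this scene; 0: is not
    return results                                 # e.g. [1, 0, 0]
```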
  • in the related art, the scene recognition result output by the neural network is a similarity to each scene rather than a definite determination of which scene the data belongs to; for example, the similarity with scene A is 40%, the similarity with scene B is 30%, and the similarity with scene C is 30%, so the recognition accuracy is poor.
  • in this embodiment, the extracted features pass through the subnets corresponding to different scene identifiers, and the scene identification result corresponding to each scene identifier is obtained.
  • for example, subnet A outputs recognition result 1, indicating that scene data N matches scene A; subnet B outputs recognition result 0, indicating that scene data N does not match scene B; and subnet C outputs recognition result 0, indicating that scene data N does not match scene C. It is therefore clear that scene data N is the scene data corresponding to subnet A, and the accuracy of the identification result is higher.
  • FIG. 11 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • the method includes but is not limited to step S410, step S420, step S430, and step S440.
  • Step S410 Extract training features according to the target scene identifier of the target scene, where the training features include training features for training positive samples and training features for training negative samples.
  • the training positive samples are selected scene files, and the training negative samples are other scene files except the selected scene.
  • Step S420 According to the training feature, train the scene recognition network to be trained for the target scene to obtain the scene recognition network.
  • Step S430 Extract features of the scene data to be identified.
  • the scene data includes at least one of scene video data, scene picture data and scene text data.
  • the size of the scene data to be recognized can be, for example, 64*64*3.
  • scene data of size 64*64*3 keeps a relatively high resolution while reducing the dimensionality, so it is clearer after processing.
  • Step S440 Input the extracted features into a scene recognition network for recognition, and obtain multiple scene recognition results corresponding to different scenes.
  • the target scene may be a newly-added scene, in which case the scene recognition network to be trained is trained for the newly-added scene, yielding a scene recognition network that can identify both whether scene data belongs to the newly-added scene and whether it belongs to the original scenes.
  • the target scene may also be an existing scene, in which case the scene recognition network to be trained is trained for the existing scene, so that the recognition function for the existing scene is updated while the recognition functions for other scenes remain unchanged. Training the scene recognition network with the solution of this embodiment is therefore more convenient and quicker.
  • FIG. 12 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • the scene recognition network includes multiple scene networks respectively corresponding to different scenes.
  • the method includes but is not limited to step S510, step S520, step S530, step S540 and step S550.
  • Step S510 Extract training features according to the target scene identifier of the target scene, where the training features include training features for training positive samples and training features for training negative samples.
  • the training positive samples are files of the selected scene, and the training negative samples are files of scenes other than the selected scene.
  • Step S520 Pass the training features through the network to be trained corresponding to the target scene to obtain a training identification result corresponding to the network to be trained.
  • the network to be trained is an existing scene network or a new scene network.
  • Step S530 Determine the weights of the network to be trained according to the training recognition result, the labels of the training positive samples and the labels of the training negative samples, and obtain the trained scene network.
  • the training mechanism of the network to be trained can be expressed as Y_pr = σ(W·X), with the weight update W ← W + α·(Y_gt - Y_pr)·X, among which:
  • Y_pr is the obtained output;
  • Y_gt is the correct output;
  • W is the weight;
  • X is the input;
  • σ is the activation function (sigmoid);
  • α is a constant.
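  • A numeric sketch of this update rule (the delta-rule form is inferred from the symbols listed above; the learning rate of 0.1 and the toy input are assumptions):

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def train_step(W: np.ndarray, X: np.ndarray, Y_gt: float,
               alpha: float = 0.1) -> np.ndarray:
    """One update of W <- W + alpha * (Y_gt - Y_pr) * X."""
    Y_pr = sigmoid(W @ X)  # obtained output
    return W + alpha * (Y_gt - Y_pr) * X

W = np.zeros(4)                       # weight
X = np.array([1.0, 0.5, -0.2, 0.3])  # input (features of one training sample)
for _ in range(200):
    W = train_step(W, X, Y_gt=1.0)    # correct output: positive-sample label 1
print(sigmoid(W @ X))                 # approaches 1 as training proceeds
```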
  • Step S540 Extract features of the scene data to be identified.
  • Step S550 Pass the extracted features through each of the scene networks in parallel to obtain scene recognition results corresponding to each of the scene networks.
  • in the related art, when a new scene recognition function needs to be added, the neural network is retrained using both the samples corresponding to the original scene recognition functions and the samples corresponding to the new scene recognition function. For example, if the original neural network can recognize scene A but cannot recognize scene B, it is retrained on the samples of scene A and scene B so that it can output the similarity between scene data and scenes A and B, e.g. a similarity of 30% with scene A and 60% with scene B.
  • likewise, in the related art, when a scene recognition function needs to be updated, the neural network is retrained using the samples corresponding to the function to be updated together with the samples corresponding to the other functions that do not need updating. For example, if the original neural network can recognize scene A and scene B and the ability to recognize scene B needs to be updated, the network is retrained on the samples of scene A and the updated scene B.
  • FIG. 13 is a schematic flowchart of a scene recognition method provided by an embodiment of the present application.
  • the scene recognition network includes an attention network, and the attention network includes a plurality of scene identifiers respectively corresponding to different scenes.
  • the method includes, but is not limited to, steps S610, S620, S630, S640, and S650.
  • Step S610 Extract training features according to the target scene identifier of the target scene, where the training features include training features for training positive samples and training features for training negative samples.
  • Step S620 Input the training feature and the target scene identifier into the attention network to be trained, and obtain the training recognition result of the attention network to be trained corresponding to the target scene identifier.
  • the target scene identifier corresponds to an existing subnet or a new subnet in the network to be trained.
  • Step S630 Determine, according to the training recognition result, the labels of the training positive samples and the labels of the training negative samples, the weights of the attention network to be trained corresponding to the target scene identifier, and obtain the trained attention network corresponding to the target scene identifier.
  • Step S640 Extract features of the scene data to be identified.
  • Step S650 Traverse a plurality of the scene identifiers of the attention network according to the extracted features, and obtain a scene recognition result corresponding to each of the scene identifiers.
  • in the related art, when a new scene recognition function needs to be added, the neural network is retrained using both the samples corresponding to the original scene recognition functions and the samples corresponding to the new scene recognition function. For example, if the original neural network can recognize scene A but cannot recognize scene B, it is retrained on the samples of scene A and scene B so that it can output the similarity between scene data and scenes A and B, e.g. a similarity of 30% with scene A and 60% with scene B.
  • likewise, in the related art, when a scene recognition function needs to be updated, the neural network is retrained using the samples corresponding to the function to be updated together with the samples corresponding to the other functions that do not need updating. For example, if the original neural network can recognize scene A and scene B and the ability to recognize scene B needs to be updated, the network is retrained on the samples of scene A and the updated scene B.
  • FIG. 14 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic equipment includes:
  • one or more processors 810;
  • a memory 820 on which one or more programs are stored, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any one of the scene recognition methods in the embodiments of the present application;
  • one or more I/O interfaces 830, connected between the processor and the memory, configured to realize information exchange between the processor and the memory.
  • the processor 810 is a device with data processing capability, including but not limited to a central processing unit (CPU);
  • the memory 820 is a device with data storage capability, including but not limited to random access memory (RAM, e.g. SDRAM or DDR), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory (FLASH);
  • the I/O interface (read/write interface) 830 is connected between the processor 810 and the memory 820, and can realize information interaction between the processor 810 and the memory 820 via, for example, a data bus (Bus).
  • processor 810, memory 820, and I/O interface 830 are interconnected by bus 840, which in turn is connected to other components of the computing device.
  • FIG. 15 is a schematic structural diagram of a computer-readable medium provided by an embodiment of the present application.
  • the computer-readable medium has a computer program stored thereon, and when the program is executed by the processor, any one of the scene recognition methods in the embodiments of the present application is implemented.
  • the various embodiments of the present application may be implemented in hardware or special purpose circuits, software, logic, or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor or other computing device, although the application is not limited thereto.
  • Embodiments of the present application may be implemented by the execution of computer program instructions by a data processor of a mobile device, e.g. in a processor entity, or by hardware, or by a combination of software and hardware.
  • the computer program instructions may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages.
  • the block diagrams of any logic flow in the figures of the present application may represent program steps, or may represent interconnected logic circuits, modules and functions, or may represent a combination of program steps and logic circuits, modules and functions.
  • Computer programs can be stored on memory.
  • the memory may be of any type suitable for the local technical environment and may be implemented using any suitable data storage technology, such as but not limited to read-only memory (ROM), random access memory (RAM), and optical memory devices and systems (e.g. DVD or CD).
  • Computer-readable media may include non-transitory storage media.
  • the data processor may be of any type suitable for the local technical environment, such as but not limited to general purpose computers, special purpose computers, microprocessors, digital signal processors (DSP), application specific integrated circuits (ASIC), field programmable gate arrays (FPGA), and processors based on multi-core processor architectures.

Abstract

The present application relates to a scene recognition method and system, an electronic device, and a computer-readable medium. The method includes: extracting a feature of scene data to be recognized; and inputting the extracted feature into a scene recognition network for recognition, so as to obtain a plurality of scene recognition results respectively corresponding to different scenes. By means of the scene recognition method and system of the present application, a feature extracted from scene data is input into a scene recognition network capable of recognizing, for various scenes, whether the scene data corresponds to each scene, so as to obtain a plurality of scene recognition results respectively corresponding to different scenes. Compared with the related art, in which only the similarity between scene data and each scene can be obtained, the solution of the present application provides a recognition result with higher accuracy.
PCT/CN2021/104224 2020-07-02 2021-07-02 Scene recognition method and system, electronic device, and medium WO2022002242A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202010633894.3A CN111797762A (zh) 2020-07-02 2020-07-02 Scene recognition method and system
CN202010633894.3 2020-07-02
CN202010633911.3 2020-07-02
CN202010633911.3A CN111797763A (zh) 2020-07-02 2020-07-02 Scene recognition method and system

Publications (1)

Publication Number Publication Date
WO2022002242A1 (fr)

Family

ID=79317469

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/104224 WO2022002242A1 (fr) 2020-07-02 2021-07-02 Scene recognition method and system, electronic device, and medium

Country Status (1)

Country Link
WO (1) WO2022002242A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663448A (zh) * 2012-03-07 2012-09-12 北京理工大学 Network-based augmented reality object recognition and analysis method
CN105930794A (zh) * 2016-04-20 2016-09-07 东北大学 Cloud-computing-based indoor scene recognition method
CN111104898A (zh) * 2019-12-18 2020-05-05 武汉大学 Image scene classification method and device based on object semantics and attention mechanism
CN111797762A (zh) * 2020-07-02 2020-10-20 北京灵汐科技有限公司 Scene recognition method and system
CN111797763A (zh) * 2020-07-02 2020-10-20 北京灵汐科技有限公司 Scene recognition method and system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114740751A (zh) * 2022-06-15 2022-07-12 Artificial-intelligence-based music scene recognition method and system
CN114740751B (zh) * 2022-06-15 2022-09-02 Artificial-intelligence-based music scene recognition method and system
CN116170829A (zh) * 2023-04-26 2023-05-26 Operation and maintenance scene identification method and device for independent private network services
CN116528282A (zh) * 2023-07-04 2023-08-01 Coverage scene recognition method and device, electronic device, and readable storage medium
CN116528282B (zh) * 2023-07-04 2023-09-22 Coverage scene recognition method and device, electronic device, and readable storage medium

Similar Documents

Publication Publication Date Title
WO2022002242A1 Scene recognition method and system, electronic device, and medium
CN109977262B Method, apparatus and processing device for obtaining candidate segments from a video
CN109145766B Model training method and apparatus, recognition method, electronic device, and storage medium
US10002290B2 Learning device and learning method for object detection
JP2016072964A System and method for subject re-identification
CN106850338B Semantic-analysis-based R+1-class application layer protocol identification method and apparatus
CN111797762A Scene recognition method and system
JP2017062778A Method and device for classifying objects in an image, and corresponding computer program product and computer-readable medium
Li et al. Domain adaption of vehicle detector based on convolutional neural networks
EP3937076A1 Activity detection device, activity detection system, and activity detection method
CN111291887A Neural network training method, image recognition method, apparatus, and electronic device
CN111046971A Image recognition method, apparatus, device, and computer-readable storage medium
US11380133B2 Domain adaptation-based object recognition apparatus and method
KR102185979B1 Method and apparatus for determining the motion type of an object included in a video
Wang et al. Rethinking the learning paradigm for dynamic facial expression recognition
CN113012054A Matting-based sample enhancement method and training method, and system and electronic device thereof
US11423262B2 Automatically filtering out objects based on user preferences
CN111797763A Scene recognition method and system
KR20200018154A Acoustic information recognition method and system using semi-supervised learning based on a VAE model
CN110659631A License plate recognition method and terminal device
Baba et al. Stray dogs behavior detection in urban area video surveillance streams
JP2009122829A Information processing apparatus, information processing method, and program
WO2022228325A1 Behavior detection method, electronic device, and computer-readable storage medium
Nguyen et al. Real-time smile detection using deep learning
CN109145991B Image group generation method, image group generation apparatus, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21833875

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.04.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21833875

Country of ref document: EP

Kind code of ref document: A1