WO2016041442A1 - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
WO2016041442A1
WO2016041442A1 (PCT/CN2015/088832)
Authority
WO
WIPO (PCT)
Prior art keywords
target
data
scenario
perceptual
sensing data
Prior art date
Application number
PCT/CN2015/088832
Other languages
English (en)
French (fr)
Inventor
王靓伟 (Wang Liangwei)
陈嘉 (Chen Jia)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP15842270.9A priority Critical patent/EP3188081B1/en
Publication of WO2016041442A1 publication Critical patent/WO2016041442A1/zh
Priority to US15/460,339 priority patent/US10452962B2/en
Priority to US16/586,209 priority patent/US11093806B2/en
Priority to US17/403,315 priority patent/US20220036142A1/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/771Feature selection, e.g. selecting representative features from a multi-dimensional feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/06Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063Training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • Terminal devices such as mobile phones, wearable devices, and robots all need to recognize a variety of objects, sounds, and motions from sensing data such as images, videos, and sounds. For example, to perform a photo-based search, a mobile phone must first identify the target item in the captured photo before it can search for information related to that item. As another example, a robot tasked with grasping a target item must first obtain the item's position in the surrounding environment from its camera data.
  • A common method is to train, from a large amount of known sample data, a perceptual model that distinguishes various objects, sounds, or motions. For each new image, video, or sound input, the terminal device can compute the corresponding recognition result with the trained perceptual model.
  • The perceptual models used to recognize perceptual data are becoming more and more complex; in particular, the number of parameters in a perceptual model keeps increasing.
  • For example, the parameters of a Convolutional Neural Network (CNN) model for image recognition have reached tens of millions or even hundreds of millions.
  • The perceptual model must accurately identify a large number of objects, motions, and sounds across all given scenarios, which poses a great challenge to the model's accuracy.
  • If all recognition tasks are completed with a single fixed-parameter perceptual model, the complexity of the model will grow without bound as recognition requirements are refined, which brings great challenges to storage and computation.
  • Embodiments of the present invention provide a data processing method and device, which can resolve the contradiction between the computing power of a device and the complexity of the perceptual model.
  • A first aspect provides a data processing method, the method comprising:
  • Acquiring target sensing data which is any one of the following data: image data, video data, and sound data;
  • determining a target scenario to which the target sensing data belongs; determining a target perceptual model corresponding to the target scenario; and calculating, according to the target perceptual model, the recognition result of the target sensing data.
  • determining a target scenario to which the target sensing data belongs includes:
  • the target scene is determined by performing scene analysis on the target sensing data.
  • the target sensing data is data generated at a current location of the terminal
  • the scenario is determined by performing scene analysis on the target sensing data, including:
  • determining a target scenario to which the target sensing data belongs includes:
  • determining a target perceptual model corresponding to the target scenario includes:
  • the target perceptual model corresponding to the target scenario is determined from a pre-stored perceptual model library, where each perceptual model in the perceptual model library corresponds to a scenario.
  • the method further includes:
  • determining a target perceptual model corresponding to the target scenario includes:
  • sending, to the server, a second request for requesting a perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to a scenario;
  • A second aspect provides a data processing method, the method comprising:
  • receiving, from the terminal, a request message for requesting a perceptual model corresponding to the scenario to which target sensing data belongs, the target sensing data being any one of the following: image data, video data, and sound data;
  • before the receiving of the request message, the method further includes:
  • acquiring a perceptual data sample, where at least a part of the perceptual data in the sample carries scene annotation information and item annotation information;
  • the perceptual models corresponding to the different scenarios are stored in the perceptual model library, and the perceptual model library includes the target perceptual model.
  • determining, according to the request message, the target scenario to which the target sensing data belongs includes:
  • the target sensing data is data generated at a current location of the terminal
  • the determining the target scenario to which the target sensing data belongs includes:
  • In a possible implementation manner of the second aspect, determining, according to the request message, the target scenario to which the target sensing data belongs includes:
  • A third aspect provides a data processing device, the device comprising:
  • An obtaining module configured to acquire target sensing data, where the target sensing data is any one of the following data: image data, video data, and sound data;
  • a first determining module configured to determine a target scenario to which the target sensing data acquired by the acquiring module belongs
  • a second determining module configured to determine a target sensing model corresponding to the target scenario determined by the first determining module
  • a calculation module configured to calculate, according to the target sensing model determined by the second determining module, a recognition result of the target sensing data acquired by the acquiring module.
  • the first determining module is specifically configured to determine the target scenario by performing scene analysis on the target sensing data.
  • the target sensing data is data generated at a current location of the terminal
  • the first determining module is specifically configured to perform scene analysis on the target sensing data according to the positioning information of the current location of the terminal, and determine the target scenario.
  • a first sending unit configured to send, to the server, a first request for requesting a scenario to which the target sensing data belongs;
  • the first receiving unit is configured to receive the target scenario that is sent by the server according to the first request.
  • the second determining module is specifically configured to determine, from the pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to a scenario.
  • the device further includes:
  • An update module configured to update the perceptual model library according to a user historical scene sequence before the obtaining module acquires the target sensing data, where the updated perceptual model library includes the target perceptual model corresponding to the target scene.
  • the second determining module includes:
  • a second sending unit configured to: when determining that the pre-stored perceptual model library has no perceptual model corresponding to the target scenario, send to the server a second request for requesting the perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to a scenario;
  • a second receiving unit configured to receive the target sensing model corresponding to the target scenario that is sent by the server according to the second request.
  • A fourth aspect provides a data processing device, the device comprising:
  • a second determining module configured to determine, from the pre-stored perceptual model library, a target perceptual model corresponding to the target scenario determined by the first determining module, where each model in the perceptual model library corresponds to a scenario;
  • the device further includes:
  • An acquiring module configured to acquire, before the receiving module receives the request message, a sample of the sensing data, where the sensing data sample includes at least a part of the sensing data having the scene labeling information and the item labeling information;
  • a training module configured to train a perceptual model corresponding to different scenarios according to the perceptual data sample
  • the first determining module is specifically configured to determine the target scenario to which the target sensing data belongs by performing scene analysis on the target sensing data included in the request message.
  • the first determining module is specifically configured to perform scene analysis on the target sensing data according to the positioning information of the current location of the terminal, and determine the target scenario.
  • the first determining module is specifically configured to determine the target scene according to the identifier, included in the request message, that indicates the target scene.
  • In the embodiments of the present invention, the perceptual model corresponding to the scenario is used to compute the recognition result of the perceptual data, which, compared with the prior art, reduces computational complexity and improves the efficiency of data processing.
  • FIG. 1 is a schematic flowchart of a method of data processing according to an embodiment of the present invention.
  • FIG. 2 is a schematic flow chart showing a method of data processing according to another embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a training perception model provided by another embodiment of the present invention.
  • FIG. 4 shows a schematic block diagram of an apparatus for data processing according to an embodiment of the present invention.
  • FIG. 5 shows another schematic block diagram of an apparatus for data processing according to an embodiment of the present invention.
  • FIG. 6 is a schematic block diagram of an apparatus for data processing according to another embodiment of the present invention.
  • FIG. 7 is another schematic block diagram of an apparatus for data processing according to another embodiment of the present invention.
  • FIG. 8 is a schematic block diagram of an apparatus for data processing according to an embodiment of the present invention.
  • FIG. 9 is a schematic block diagram of an apparatus for data processing according to another embodiment of the present invention.
  • GSM Global System for Mobile communications
  • CDMA Code Division Multiple Access
  • WCDMA Wideband Code Division Multiple Access
  • GPRS General Packet Radio Service
  • LTE Long Term Evolution
  • FDD Frequency Division Duplex
  • TDD Time Division Duplex
  • UMTS Universal Mobile Telecommunication System
  • WiMAX Worldwide Interoperability for Microwave Access
  • the terminal may also be referred to as a user equipment (User Equipment, referred to as "UE"), a mobile station (Mobile Station, referred to as "MS”), a mobile terminal (Mobile Terminal), and the like.
  • The terminal may communicate with one or more core networks via a Radio Access Network (RAN); for example, the terminal may be a mobile phone (or "cellular" phone) or a computer with a mobile terminal.
  • The terminal may also be a portable, pocket-sized, handheld, computer-built-in, or in-vehicle mobile device that exchanges voice and/or data with the radio access network.
  • For example, the terminal may be a mobile phone, a wearable device, a robot, or the like.
  • In the prior art, a fixed-parameter perceptual model is generally used to complete all recognition tasks; for example, perceptual data generated in a supermarket, a hospital, or a kitchen are all processed with the same perceptual model. As the kinds of objects to be identified multiply and the requirements on recognition accuracy rise, the parameters of the perceptual model keep growing; for example, the parameters of CNN models that recognize tens of thousands of object types have reached tens of millions or even hundreds of millions. This inevitably increases the computational complexity of the perceptual model and, at the same time, challenges its storage space.
  • the present invention provides a data processing method.
  • When training perceptual models, the factors of the scene are taken into account, and a separate perceptual model is generated for each scenario; in the process of recognizing perceptual data, the scene to which the perceptual data belong is determined first, the perceptual model corresponding to that scene is then acquired, and finally the recognition result is computed with that model.
  • Under the premise of maintaining or even improving the accuracy of data recognition, the data processing method proposed by the present invention can greatly reduce the complexity and computation of the perceptual model, thereby resolving the contradiction between the terminal's computing power and the model's complexity and effectively improving recognition ability.
  • For example, a user takes a picture on a square, and the mobile phone can identify objects such as flower beds, cafés, and buses from the photo; or the user takes a picture in a forest, and the phone can recognize objects such as chrysanthemums and other flora from the photo. That is, the technical solution provided by the embodiments of the present invention can be applied as follows.
  • For example, a user takes a photo with a mobile phone in various scenarios, and the phone recognizes various objects in the photo and returns their names. It should be understood that the mobile phone here may also be another terminal device.
  • FIG. 1 illustrates a method 100 of data processing in accordance with an embodiment of the present invention, such as being performed by a terminal. As shown in FIG. 1, the method 100 includes:
  • S110: Acquire target sensing data, where the target sensing data is any one of the following: image data, video data, and sound data.
  • the target sensing data may be data measured or generated by a sensing device such as a camera, a microphone, an infrared sensor, or a depth sensor; for example, the target sensing data is a picture, or a video, or a recording.
  • the terminal that acquires the target sensing data may also be a device that generates the target sensing data.
  • Optionally, acquiring the target sensing data in S110 includes:
  • the target sensing data is generated at the current location of the terminal.
  • For example, the user takes a picture at scene A with a mobile phone and processes the picture directly on the phone.
  • Alternatively, the terminal that acquires the target sensing data may be a device different from the one that generates the data. For example, the user takes a photo with a mobile phone (the phone generates the sensing data) and then uploads it to a laptop (the laptop acquires the sensing data for subsequent processing), which processes it accordingly.
  • In S120, determining the target scenario to which the target sensing data belongs includes: determining the target scenario by performing scene analysis on the target sensing data.
  • the scene recognizer may perform scene analysis on the target sensing data to determine a target scene to which the target sensing data belongs.
  • the scene recognizer may be an existing scene classification model, such as a Support Vector Machine (SVM) multi-classifier.
  • The scene recognizer may extract, from the sensing data, features such as the GIST global feature, Dense SIFT (Dense Scale-Invariant Feature Transform), and Texton histograms for classification. A minimal sketch follows.
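  • As an illustration only, the following is a minimal sketch of such a scene recognizer: an SVM multi-classifier over precomputed image features, using scikit-learn. The feature extractor `extract_features` is a hypothetical placeholder and the label set an assumption; the patent does not prescribe this implementation.

```python
# Illustrative sketch of an SVM multi-classifier scene recognizer.
# Assumption: GIST / Dense SIFT / Texton-histogram features are precomputed
# and concatenated into one vector per image; extract_features is a
# hypothetical placeholder, not part of the patent.
import numpy as np
from sklearn.svm import SVC

def extract_features(image) -> np.ndarray:
    """Placeholder: concatenate GIST, Dense SIFT and Texton-histogram features."""
    raise NotImplementedError

def train_scene_recognizer(X: np.ndarray, y: list[str]) -> SVC:
    """X: (n_samples, n_features); y: scene labels such as 'square', 'indoor'."""
    clf = SVC(kernel="rbf", probability=True)  # multi-class handled one-vs-one
    clf.fit(X, y)
    return clf

def recognize_scene(clf: SVC, image) -> str:
    feats = extract_features(image).reshape(1, -1)
    return clf.predict(feats)[0]  # e.g. "square"
```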
  • Optionally, in S120, determining the target scenario by performing scene analysis on the target sensing data includes: performing scene analysis on the target sensing data in combination with the positioning information of the terminal's current location, and determining the target scenario.
  • For example, take the target perceptual data to be a photo taken in square A, where the terminal is currently located. The SVM multi-classifier analyzes the data, and the recognition result is that the target scene is a square; the positioning information of the terminal's current location (i.e., square A) is then obtained, and with it the target scene can be narrowed from "a square" to square A.
  • For another example, the target sensing data is a photo taken in the kitchen where the terminal is currently located. The SVM multi-classifier analyzes the data, and the recognition result is that the target scene is indoor; the positioning information of the terminal's current location (i.e., the kitchen) is then obtained, and with it the target scene can be narrowed from indoor to the kitchen.
  • By combining positioning information, the scene to which the target sensing data belong can be narrowed to a smaller spatio-temporal region. It should be understood that a more specific scene of smaller scope corresponds to a simpler perceptual model, so both the model's computational complexity and the amount of computation performed with it are relatively small, as the sketch below illustrates.
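  • The sketch below illustrates, under assumed scene labels and an assumed refinement table, how a coarse scene label could be narrowed with positioning information; it is not the patent's algorithm, only an illustration of the narrowing step.

```python
# Illustrative narrowing of a coarse scene label with positioning
# information; the refinement table is a made-up example.
REFINEMENTS = {
    ("square", "square A"): "square A",
    ("indoor", "kitchen"): "kitchen",
}

def refine_scene(coarse_scene: str, location: str | None) -> str:
    """Return a more specific scene when positioning information allows it."""
    if location is None:
        return coarse_scene
    return REFINEMENTS.get((coarse_scene, location), coarse_scene)

# refine_scene("indoor", "kitchen") -> "kitchen"
# refine_scene("square", None)      -> "square"
```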
  • The positioning information of the terminal's current location may be acquired by any one, or a combination, of the following methods: Wi-Fi positioning, or positioning by the Simultaneous Localization and Mapping (SLAM) function.
  • The Wi-Fi positioning method works as follows: the terminal scans and collects the signals of nearby Wi-Fi wireless access points and acquires their MAC addresses. Since an access point generally does not move for a certain period of time, the terminal can report the MAC addresses to a location server; the server retrieves the previously saved geographic locations of those access points, computes the terminal's geographic location by weighing the strength of each access-point signal, and sends the corresponding positioning information back to the terminal, so that the terminal obtains its current location, as sketched below.
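  • A minimal sketch of the Wi-Fi positioning step just described, assuming the location server keeps a lookup table of access-point positions (`AP_POSITIONS` is hypothetical data) and weights each access point by linearized signal strength:

```python
# Sketch of the Wi-Fi positioning step: the terminal reports (MAC, RSSI)
# pairs; the server looks up stored access-point positions and returns a
# signal-strength-weighted estimate. AP_POSITIONS is hypothetical data.
AP_POSITIONS = {
    "aa:bb:cc:dd:ee:01": (31.2304, 121.4737),
    "aa:bb:cc:dd:ee:02": (31.2310, 121.4741),
}

def locate(scans: list[tuple[str, float]]) -> tuple[float, float]:
    """scans: (mac, rssi in dBm). Returns (latitude, longitude)."""
    total_w = lat = lon = 0.0
    for mac, rssi in scans:
        if mac not in AP_POSITIONS:
            continue                      # unknown access point: skip
        w = 10 ** (rssi / 10.0)           # dBm -> linear power, used as weight
        ap_lat, ap_lon = AP_POSITIONS[mac]
        lat, lon, total_w = lat + w * ap_lat, lon + w * ap_lon, total_w + w
    if total_w == 0.0:
        raise ValueError("no known access points in scan")
    return lat / total_w, lon / total_w
```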
  • SLAM technology refers to constructing a map and determining one's own position within it through repeated observations made while the camera moves.
  • the SLAM technology is prior art and will not be described here for brevity.
  • Certainly, the positioning information of the terminal's current location may also be obtained by other positioning methods, such as GPS, which is not limited in this embodiment of the present invention.
  • Optionally, the target sensing data may also be sent to the server, which determines the target scenario to which the data belong.
  • S120 determines a target scenario to which the target sensing data belongs, including:
  • the first request includes the target sensing data, and may further include an identifier of the terminal.
  • the target perceptual model corresponding to the target scenario may be determined from the perceptual model library pre-stored by the terminal, or may be requested from the network-side server, which is not limited in this embodiment of the present invention.
  • S130 determines a target perception model corresponding to the target scenario, including:
  • the target perceptual model corresponding to the target scenario is determined from the pre-stored perceptual model library, where each perceptual model in the library corresponds to a scenario.
  • Optionally, the terminal may cache a received perceptual model together with its scene identifier in the perceptual model library for subsequent use; optionally, when the storage space of the model library is fully occupied, the earliest-cached perceptual model can be deleted first and the newly received perceptual model cached in its place, as in the sketch below.
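  • The cache behaviour just described could look like the following sketch: models keyed by scene identifier, with the earliest-cached model evicted first when the store is full. The capacity and byte-blob representation are assumptions.

```python
# Sketch of the terminal-side model cache: models keyed by scene
# identifier; when the cache is full, the first-cached model is evicted
# (FIFO), following the behaviour described in the text.
from collections import OrderedDict

class PerceptualModelCache:
    def __init__(self, capacity: int = 8):          # capacity is an assumption
        self.capacity = capacity
        self._models: OrderedDict[str, bytes] = OrderedDict()

    def get(self, scene_id: str) -> bytes | None:
        return self._models.get(scene_id)           # None -> request from server

    def put(self, scene_id: str, model_blob: bytes) -> None:
        if scene_id in self._models:
            self._models.move_to_end(scene_id)      # refresh existing entry
        elif len(self._models) >= self.capacity:
            self._models.popitem(last=False)        # evict the first-cached model
        self._models[scene_id] = model_blob
```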
  • When it is determined that the perceptual model corresponding to the target scenario is not in the terminal's locally cached perceptual model library, the target perceptual model may be requested from the network-side server.
  • S130 determines a target perception model corresponding to the target scenario, including:
  • a second request for requesting the perceptual model corresponding to the target scenario is sent to the server, where each perceptual model in the perceptual model library corresponds to a scenario;
  • the second request includes an identifier for indicating the target scenario, and may also include an identifier of the terminal.
  • Optionally, the received target perceptual model corresponding to the target scenario may be cached in the locally pre-stored perceptual model library, so that the next time the model for this scenario is needed, it can be obtained directly on the terminal without requesting it from the server again.
  • Taking a target scene of "square" as an example, a corresponding target perceptual model, e.g. <piazza.model>, is loaded from the pre-stored perceptual model library, and the camera's image data are analyzed and recognized.
  • The recognition process is as follows: read an image; generate a number of local image regions from the original image using a sliding window; input each local-region image into a Convolutional Neural Network (CNN) configured according to the link-weight parameters in the <piazza.model> file; the network outputs one or more recognition results, such as <flower bed, bench, tourist, bus, car, police, child, balloon, café>. A sketch of this flow follows.
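  • The following PyTorch sketch illustrates the sliding-window recognition flow just described; the window size, stride, threshold, and label list are assumptions, and loading the per-scene parameter file (e.g. <piazza.model>) is left out.

```python
# PyTorch sketch of the described flow: slide a window over the image,
# run each crop through a CNN whose weights come from a per-scene
# parameter file, and collect the confident labels. Window size, stride,
# threshold, and labels are assumptions.
import torch

def sliding_windows(img: torch.Tensor, size: int = 224, stride: int = 112):
    """img: (C, H, W). Yield square crops covering the image."""
    _, h, w = img.shape
    for top in range(0, max(h - size, 0) + 1, stride):
        for left in range(0, max(w - size, 0) + 1, stride):
            yield img[:, top:top + size, left:left + size]

def recognize(img: torch.Tensor, model: torch.nn.Module,
              labels: list[str], threshold: float = 0.5) -> set[str]:
    model.eval()
    found: set[str] = set()
    with torch.no_grad():
        for crop in sliding_windows(img):
            probs = torch.softmax(model(crop.unsqueeze(0)), dim=1)[0]
            conf, idx = probs.max(dim=0)
            if conf.item() >= threshold:
                found.add(labels[idx.item()])
    return found  # e.g. {"flower bed", "bench", "bus"}
```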
  • the calculation process of S140 can also calculate the recognition result according to an algorithm model such as Deep Neural Networks (DNN).
  • The calculation proceeds as follows: the perceptual data are input in turn, data blocks are selected, the cascaded convolution and sampling of each layer of the neural network are computed in sequence, and the classification matrix is computed to produce the classification result.
  • The calculation process in S140 includes, but is not limited to, being performed entirely on a Central Processing Unit (CPU); for example, when the input perceptual data are image data, part of the calculation may be performed on a Graphics Processing Unit (GPU) chip, and when the input data are sound or video data, part of the calculation may run on a corresponding dedicated chip.
  • Compared with a solution in which a network-side server must respond to every terminal's recognition request, the data processing method of the embodiments of the present invention effectively reduces the server's computational burden and the bandwidth required to transmit data, and also improves the speed of the recognition calculation.
  • In the embodiments of the present invention, the perceptual model corresponding to the scenario is used to compute the recognition result of the perceptual data, which, compared with the prior art, reduces computational complexity and thereby improves the efficiency of data processing.
  • S151: Predict the scene to which the perceptual data to be identified belong, according to the user's historical scene sequence.
  • For example, the scenes and their times recorded for user S on a weekday are: 06:00 bedroom; 7:00 living room; 7:20 street; 7:30 highway; 7:40 parking garage. Taking this scene/time sequence as the input of a Conditional Random Field (CRF) algorithm model, the next most likely scenes and their probabilities are predicted, for example: office, 0.83; conference room, 0.14.
  • A third request is sent to the server for the perceptual models corresponding to the office scene and the conference-room scene, where the third request includes an identifier indicating the office scene and an identifier indicating the conference-room scene.
  • The perceptual models corresponding to the office scene and the conference-room scene sent by the server are received and stored, in the form scene identifier (number or type) + perceptual model, in the locally pre-stored perceptual model library.
  • The perceptual model corresponding to the office can then be obtained directly from the locally stored perceptual model library, and the recognition result of the perceptual data is computed according to the updated model library, as sketched below.
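  • The sketch below illustrates the prediction-and-prefetch idea. For brevity, a first-order transition model stands in for the CRF algorithm model named in the text, and `request_model_from_server` is a hypothetical stub.

```python
# Sketch of predict-and-prefetch. A first-order transition model stands in
# for the CRF named in the text; request_model_from_server is a stub.
from collections import Counter, defaultdict

def request_model_from_server(scene_id: str) -> bytes:
    """Hypothetical stub for the model-download request."""
    raise NotImplementedError

class ScenePredictor:
    def __init__(self):
        self.transitions = defaultdict(Counter)  # scene -> Counter of next scenes

    def observe(self, scene_sequence: list[str]) -> None:
        """Learn from a historical sequence, e.g. ['bedroom', 'living room', ...]."""
        for prev, nxt in zip(scene_sequence, scene_sequence[1:]):
            self.transitions[prev][nxt] += 1

    def predict(self, current: str, top_k: int = 2) -> list[tuple[str, float]]:
        counts = self.transitions[current]
        total = sum(counts.values()) or 1
        return [(s, c / total) for s, c in counts.most_common(top_k)]

def prefetch(predictor: ScenePredictor, current_scene: str, cache) -> None:
    """Request likely-next models ahead of time; cache offers get/put as above."""
    for scene, prob in predictor.predict(current_scene):
        if prob > 0.1 and cache.get(scene) is None:
            cache.put(scene, request_model_from_server(scene))
```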
  • In this way, the perceptual models pre-stored on the terminal for the predicted scenes can be used to compute the recognition results of newly acquired perceptual data, which effectively improves the efficiency of data processing.
  • In one example, the executing body is a mobile phone, and the target sensing data is a photo taken in square A.
  • In another example, the target sensing data is a photo taken indoors, in the kitchen of user John's home.
  • The precise location of the mobile phone on the indoor map, i.e., the kitchen of user John's home, can be obtained from the Wi-Fi signal and the SLAM function.
  • The correspondingly configured convolutional neural network outputs one or more recognition results, such as <gas stove, range hood, cupboard, wok, spoon, seasoning box>.
  • In the embodiments of the present invention, the perceptual model corresponding to the scenario is used to compute the recognition result of the perceptual data, which, compared with the prior art, reduces computational complexity and thereby improves the efficiency of data processing.
  • the target sensing model for processing the target sensing data may also be requested directly by sending the target sensing data to the server.
  • It should be understood that the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of the present invention.
  • In the embodiments of the present invention, when target sensing data are to be recognized, the target scene to which they belong is determined first, and the recognition result of the data is then computed with the target perceptual model corresponding to that scene.
  • Because the perceptual model of a specific scene has a high recognition accuracy for objects in that scene, and because, relative to the prior-art practice of processing sensing data of different scenarios with one perceptual computing model, the computational complexity of each scene's model is greatly reduced, the embodiments of the present invention effectively lower the requirements on computing power and improve the efficiency of data processing.
  • the executor of the data processing method of the embodiment of the present invention may be a terminal, a server, or a combination of a terminal and a server, which is not limited by the embodiment of the present invention.
  • In the embodiments of the present invention, the perceptual model corresponding to the scenario is used to obtain the recognition result of the perceptual data, which, compared with the prior art, reduces computational complexity and thereby improves the efficiency of data processing.
  • the target sensing data may be data measured or generated by a sensing device such as a camera, a microphone, an infrared sensor, or a depth sensor; for example, the target sensing data is a picture, or a video, or a recording.
  • The request message may include only the target sensing data, i.e., directly request from the server a perceptual model for processing that data; it should be understood that in this case the server determines the target scenario to which the target sensing data belong and then the corresponding target perceptual model. Alternatively, the request message may directly include an identifier indicating the target scenario to which the target perceptual data belong, i.e., request the perceptual model of that scenario; in this case the server can determine the target scene directly from the request message and then the corresponding perceptual model.
  • That is, when the request message carries the target sensing data, the data may be analyzed and recognized to determine the target scenario to which they belong; when the request message sent by the terminal indicates the scenario to which the target sensing data belong, the target scenario can be determined directly from the request message. This is not limited in this embodiment of the present invention; a sketch of both request forms follows.
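  • A sketch of the two request forms, with assumed field names; the actual message format is not specified by the patent.

```python
# Sketch of the two request forms; field names are assumptions.
from dataclasses import dataclass

@dataclass
class ModelRequest:
    terminal_id: str
    scene_id: str | None = None        # set when the terminal knows the scene
    sensing_data: bytes | None = None  # set when the server must classify

def resolve_scene(req: ModelRequest, scene_classifier) -> str:
    """scene_classifier: callable mapping raw sensing data to a scene id."""
    if req.scene_id is not None:
        return req.scene_id                        # determined from the request
    if req.sensing_data is not None:
        return scene_classifier(req.sensing_data)  # scene analysis on the data
    raise ValueError("request carries neither a scene id nor sensing data")
```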
  • the S220 determines, according to the request message, a target scenario to which the target sensing data belongs, including:
  • the target scenario to which the target sensing data belongs is determined by performing scene analysis on the target sensing data included in the request message.
  • the scene recognizer may perform scene analysis on the target sensing data to determine a target scene to which the target sensing data belongs.
  • the scene recognizer may be an existing scene classification model, such as a Support Vector Machine (SVM) multi-classifier.
  • In classifying, the scene recognizer may use image features such as the GIST global feature, Dense SIFT (Dense Scale-Invariant Feature Transform), and Texton histograms.
  • The scene recognizer outputs the type of the scene; for example, it identifies the scene as "square".
  • Optionally, the scene to which the target sensing data belong may be further narrowed in combination with the positioning information of the terminal's current location.
  • the determining the target scenario to which the target sensing data belongs includes:
  • For example, take the target perceptual data to be a photo taken in square A, where the terminal is currently located. The SVM multi-classifier analyzes the data, and the recognition result is that the target scene is a square; combined with the positioning information of the terminal's current location (i.e., square A), the target scene can be narrowed to square A.
  • For another example, the target sensing data is a photo taken in the kitchen where the terminal is currently located; the SVM multi-classifier analyzes the data, and the recognition result is that the target scene is indoor. Combined with the positioning information of the terminal's current location (i.e., the kitchen), the target scene can be narrowed from indoor to the kitchen.
  • By combining positioning information, the scene to which the target sensing data belong can be narrowed to a smaller spatio-temporal region; a more specific scene of smaller scope corresponds to a simpler perceptual model, so both the model's computational complexity and the amount of computation performed with it are relatively small.
  • the positioning information of the current location of the terminal may be obtained by using any existing positioning method or a combination of multiple positioning methods, which is not limited by the present invention.
  • the target scenario may be directly determined according to the request message.
  • the S220 determines, according to the request message, a target scenario to which the target sensing data belongs, including:
  • The pre-stored perceptual model library stores the perceptual models corresponding to different scenarios, trained from perceptual data samples of those scenarios.
  • The perceptual models of different scenarios may be stored in the library in the form scene identifier + perceptual model. It should be understood that any other storage form may also be used, which is not limited in this embodiment of the present invention, as long as the corresponding perceptual model can be retrieved from the library according to the scene identifier (scene number or type).
  • S240 Send the target sensing model to the terminal according to the request message, so that the terminal calculates the recognition result of the target sensing data according to the target sensing model.
  • The data processing method of the embodiments of the present invention provides the terminal with the perceptual model corresponding to the scenario to which the perceptual data to be identified belong, so that the terminal processes the data with that model. Because the perceptual model of a specific scenario has relatively low complexity and relatively high accuracy, the computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
  • S250 Acquire a perceptual data sample, where the perceptual data sample includes at least a part of the perceptual data having the scene annotation information and the item annotation information;
  • S270 Store the perceptual model corresponding to the different scenarios into the perceptual model library, where the perceptual model library includes the target perceptual model.
  • An image sample is read; at least a part of the images in the sample carry scene annotation information and item annotation information.
  • For example, the scene annotation information of image Img00001 in the image sample is <scene: square>, and its item annotation information is <items: flower bed, bench, tourist, bus, bus stop, car, police, children, balloons, cafés, pigeons>.
  • The item annotation information of an image may also include the position of each item in the image, for example as a local rectangular region.
  • The general (universal) perceptual model is determined from the local-region image files and the item annotation information carried in the image sample; in other words, the parameter file of the perceptual model is determined.
  • The local-region image files of all the original images in the image sample generated in S320 are used as the input of the universal perceptual model, and the parameter file of the universal perceptual model is determined from the item-type predictions output by the model together with the item annotation information carried in the image sample.
  • The universal perceptual model can be regarded as a convolutional neural network (CNN) model combined with a logistic regression (Softmax) model supporting multi-classification.
  • The step of determining the universal perceptual model is, for example: use the local-region image files as the input of the CNN model, which outputs the corresponding matrix information; take the matrix information output by the CNN model as the input of the Softmax model, which outputs the item-type calculation result; then, from the degree of match (or error rate) between the item-type calculation result and the item annotation information carried in the image sample, compute the respective parameters of the CNN model and the Softmax model. The universal perceptual model is thus determined; a training sketch follows.
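  • The following PyTorch sketch shows a universal model of the described shape, a small CNN followed by a softmax classifier, trained on local-region crops against item annotations; the architecture and hyper-parameters are illustrative assumptions, not the patent's.

```python
# PyTorch sketch of a "universal" model: CNN feature extractor plus a
# multi-class (softmax) head, trained on local-region crops against item
# annotations. Architecture and hyper-parameters are assumptions.
import torch
import torch.nn as nn

class UniversalPerceptualModel(nn.Module):
    def __init__(self, num_item_types: int):
        super().__init__()
        self.cnn = nn.Sequential(                        # toy feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_item_types)  # softmax applied in loss

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.cnn(x).flatten(1))   # logits

def train(model: nn.Module, loader, epochs: int = 10, lr: float = 1e-3) -> None:
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()      # log-softmax + NLL over item types
    model.train()
    for _ in range(epochs):
        for regions, item_labels in loader:  # local-region crops + annotations
            opt.zero_grad()
            loss_fn(model(regions), item_labels).backward()
            opt.step()
```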
  • A target image sample is determined; the images in the target image sample carry both scene annotation information and item annotation information.
  • Images having both scene annotation information and item annotation information are selected, and this type of image is determined as the target image sample.
  • the labeling information of the first type of image is: <scene: restaurant; items: chair, table, wine bottle, plate, chopsticks>;
  • the labeling information of the second type of image is: <scene: square; items: flower bed, bench, tourists, buses, cars, police, children, balloons, cafés>;
  • the labeling information of the third type of image is: <scene: square; items: flower beds, benches, tourists, buses, bus stops, cars, balloons, cafés, pigeons>;
  • the labeling information of the fourth type of image is: <scene: ward; items: bed, monitor, ventilator, pager, bracket, dirt bucket>;
  • the labeling information of the fifth type of image is: <scene: kitchen; items: kettle, faucet, microwave, salt shaker, sugar bowl, tomato juice, plate, gas stove>.
  • It should be understood that each scene corresponds to an image set comprising a plurality of images, not to a single image; in other words, the first type of image above, with scene annotation <scene: restaurant>, is a collection of several images.
  • The local-region image files (already obtained in S320) of the third type of image, whose scene annotation information is <scene: square>, are used as the input of the universal perceptual model determined in S330, and the model outputs its item-type calculation information. The error rate of that information is then determined against the item types indicated by the item annotation carried by the third type of image, and the complexity of the universal perceptual model is measured. Considering the error rate and the complexity together, the parameters of the universal model are adjusted and simplified, where the simplification includes clustering and merging computing nodes with similar parameters and pruning parameters that contribute nothing to the output.
  • The parameter-reduced perceptual model then becomes the perceptual model corresponding to the scene <square>; a pruning sketch follows. It should be understood that in the process of determining the model corresponding to <square>, the universal perceptual model determined in S330 is backed up, so that the models corresponding to other scenarios can be determined later. It should also be understood that the perceptual models corresponding to the other scenarios can be acquired similarly.
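  • As a sketch of the parameter-reduction step, magnitude pruning below stands in for "cutting parameters that contribute nothing to the output"; the threshold is an assumption, and the clustering/merging of similar computing nodes is omitted for brevity.

```python
# Sketch of deriving a scene-specific model by parameter reduction.
# Magnitude pruning stands in for "cutting parameters that contribute
# nothing to the output"; the threshold is an assumption, and the
# clustering/merging of similar nodes is omitted for brevity.
import copy
import torch

def specialize(universal_model: torch.nn.Module,
               threshold: float = 1e-3) -> torch.nn.Module:
    scene_model = copy.deepcopy(universal_model)  # keep the universal model intact
    with torch.no_grad():
        for p in scene_model.parameters():
            p[p.abs() < threshold] = 0.0          # zero out near-useless weights
    return scene_model
```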
  • the perceptual models corresponding to the respective scenes are stored in the perceptual model library.
  • Under the premise of maintaining or even improving the accuracy of data recognition, the data processing method proposed by the present invention can greatly reduce the complexity and computation of the perceptual model, thereby resolving the contradiction between computing power and model complexity and effectively improving recognition ability.
  • It should be understood that the above process of training the perceptual model for each scenario is not limited to being performed only before receiving the terminal's request message for the target perceptual model of the target perceptual data.
  • The training process of the perceptual models can be performed periodically, so that the perceptual data requested by terminals can be drawn on to keep enriching the sample library of perceptual data and scenes, thereby refining the scene types and their corresponding perceptual models and continuously improving the recognition accuracy of each scene's model.
  • The data processing method of the embodiments of the present invention provides the terminal with the perceptual model corresponding to the scenario to which the perceptual data to be identified belong, so that the terminal processes the data with that model. Because the perceptual model of a specific scenario has relatively low complexity and relatively high accuracy, the computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
  • The executing entity of the data processing method 200 of the embodiments of the present invention is generally a server, and the "pre-stored perceptual model library" refers to the library in which the server stores the per-scenario perceptual models trained from the perceptual data samples of different scenarios.
  • With reference to FIG. 4 to FIG. 7, devices for data processing according to embodiments of the present invention are described in detail below.
  • FIG. 4 shows a schematic block diagram of a device 400 for data processing according to an embodiment of the present invention.
  • the device 400 includes:
  • the obtaining module 410 is configured to acquire target sensing data, where the target sensing data is any one of the following data: image data, video data, and sound data;
  • a first determining module 420 configured to determine a target scenario to which the target sensing data acquired by the acquiring module belongs;
  • a second determining module 430 configured to determine a target sensing model corresponding to the target scenario determined by the first determining module
  • the calculation module 440 is configured to calculate a recognition result of the target sensing data acquired by the acquiring module according to the target sensing model determined by the second determining module.
  • The device 400 for data processing of the embodiments of the present invention determines the scenario to which the perceptual data to be identified belong and computes the recognition result of the data with the perceptual model corresponding to that scenario, which, compared with the prior art, reduces computational complexity and thereby improves the efficiency of data processing.
  • The calculation module 440 may be a perceptual computing processor whose function is to perform perceptual calculation, for example recognition of the perceptual data according to an algorithm model such as a convolutional neural network (CNN) or a deep neural network (DNN).
  • The input sensing data undergo, in sequence, data-block selection, the cascaded convolution and sampling computation of each layer of the neural network, and classification-matrix computation, finally producing the recognition result.
  • The computation may, without limitation, be performed entirely on a general-purpose CPU, partially on a GPU acceleration chip, or on a dedicated chip.
  • the first determining module 420 is specifically configured to determine the target scenario by performing scene analysis on the target sensing data.
  • The first determining module 420 may be a scene recognizer whose function is to identify the scene in which the input perceptual data were produced: it takes the perceptual data as input and outputs the scene type, scene code, or another identifier that can represent the scene.
  • the target sensing data is data generated at a location where the terminal is currently located
  • the first determining module inputs the sensing data and the positioning information, and outputs the scene type or encoding.
  • a first sending unit configured to send, to the server, a first request for requesting a scenario to which the target sensing data belongs;
  • the first receiving unit is configured to receive the target scenario that is sent by the server according to the first request.
  • the second determining module is specifically configured to determine, according to the pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one Kind of scene.
  • the device includes:
  • the obtaining module 410 is configured to acquire target sensing data, where the target sensing data is any one of the following data: image data, video data, and sound data;
  • a first determining module 420 configured to determine a target scenario to which the target sensing data acquired by the acquiring module belongs;
  • a second determining module 430 configured to determine a target sensing model corresponding to the target scenario determined by the first determining module
  • The update module 450 may, after the second determining module 430 requests the target perceptual model from the server, store the model in the pre-stored perceptual model library to update the library; the update module 450 may also, before the obtaining module 410 acquires the target sensing data to be recognized, request from the server in advance, by means of a prediction algorithm, the perceptual models that will be needed, i.e., update the pre-stored perceptual model library ahead of time.
  • the second determining module includes:
  • the device 400 further includes: a cache module configured to cache the target perceptual model and the scene identifier (scene type or scene number) received by the second receiving unit in the pre-stored perceptual model library.
  • the cache module can be a high-speed access device such as a memory.
  • The device 400 for data processing of the embodiments of the present invention determines the scenario to which the perceptual data to be identified belong and computes the recognition result of the data with the perceptual model corresponding to that scenario, which, compared with the prior art, reduces computational complexity and thereby improves the efficiency of data processing.
  • The device 400 for data processing may correspond to the terminal in the data processing methods of the embodiments of the present invention, and the above and other operations and/or functions of the modules in the device 400 implement the corresponding processes of the methods in FIG. 1 to FIG. 3; for brevity, they are not described here again.
  • FIG. 6 shows a schematic block diagram of an apparatus 500 for data processing in accordance with an embodiment of the present invention.
  • the device 500 includes:
  • the receiving module 510 is configured to receive, from the terminal, a request message for requesting a perceptual model corresponding to the scenario to which target sensing data belongs, where the target sensing data is any one of the following: image data, video data, and sound data;
  • the first determining module 520 is configured to determine, according to the request message received by the receiving module, a target scenario to which the target sensing data belongs;
  • the second determining module 530 is configured to determine, from the pre-stored perceptual model library, the target perceptual model corresponding to the target scenario determined by the first determining module, where each model in the perceptual model library respectively corresponds to a scenario;
  • the sending module 540 is configured to send, according to the request message received by the receiving module, the target perceptual model determined by the second determining module to the terminal, so that the terminal computes the recognition result of the target sensing data according to the target perceptual model.
  • The device 500 for data processing of the embodiments of the present invention provides a terminal with the perceptual model corresponding to the scenario the terminal requires, so that the terminal processes the corresponding perceptual data with that model. Because the perceptual model of a specific scenario has relatively low complexity and relatively high accuracy, the computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
  • the device 500 includes:
  • the receiving module 510 is configured to receive, from the terminal, a request message for requesting a perceptual model corresponding to the scenario to which target sensing data belongs, where the target sensing data is any one of the following: image data, video data, and sound data;
  • the first determining module 520 is configured to determine, according to the request message received by the receiving module, a target scenario to which the target sensing data belongs;
  • the second determining module 530 is configured to determine, from the pre-stored perceptual model library, the target perceptual model corresponding to the target scenario determined by the first determining module, where each model in the perceptual model library respectively corresponds to a scenario;
  • the sending module 540 is configured to send, according to the request message received by the receiving module, the target sensing model determined by the second determining module to the terminal, so that the terminal calculates the recognition result of the target sensing data according to the target sensing model;
  • the obtaining module 550 is configured to acquire a sensing data sample, where the sensing data sample includes at least a part of the sensing data having the scene labeling information and the item labeling information, before the receiving module receives the request message;
  • the storage module 570 is configured to store the perceptual model corresponding to the different scenarios obtained by training the training module into the perceptual model library, where the perceptual model library includes the target perceptual model.
  • the training module 560 may be referred to as a model training server, and the function is to read the training sample database, and to train the perceptual model parameters required for various scenarios according to each scene classification description in the scene knowledge base.
  • The training module 560 takes the training sample data and a scene classification description file as input, and outputs a perceptual-model parameter file for each scene.
  • The scene knowledge base is a storage space for managing and saving the classification descriptions corresponding to the various scenarios.
  • The classification description of a scenario includes the categories that may appear in it, such as items, persons, actions, events, and text, and may also include the hierarchical relationships among categories, for example animal - dog - golden retriever, car - sedan - BMW - BMW 3 Series, and party - birthday party.
  • Optionally, the scene knowledge base may further include spatial structure information and the scene number corresponding to each spatial region, as in the sketch below.
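  • A sketch of how the scene knowledge base entries and category hierarchy described above might be represented; the field names are illustrative assumptions, and the hierarchy examples come from the text.

```python
# Sketch of scene knowledge base entries and a category hierarchy
# (examples taken from the text); field names are assumptions.
from dataclasses import dataclass

@dataclass
class SceneDescription:
    scene_id: str              # scene number or type, e.g. "square"
    categories: list[str]      # items, persons, actions, events, text, ...
    region: str | None = None  # optional spatial region bound to the scene

HIERARCHY = {                  # child -> parent
    "golden retriever": "dog", "dog": "animal",
    "BMW 3 Series": "BMW", "BMW": "sedan", "sedan": "car",
    "birthday party": "party",
}

def ancestors(category: str) -> list[str]:
    """Walk the hierarchy upward, e.g. 'dog' -> ['animal']."""
    chain = []
    while category in HIERARCHY:
        category = HIERARCHY[category]
        chain.append(category)
    return chain
```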
  • The storage module 570 is configured to save the model parameter file of each scene generated by the training module 560 (the model training server), including, for example, a scene identifier (type or number) and the corresponding model parameter file.
  • the storage module 570 can be referred to as a model parameter library.
  • the first determining module is configured to determine, by performing scene analysis on the target sensing data included in the request message, the target scenario to which the target sensing data belongs.
  • the target sensing data is data generated at a location where the terminal is currently located
  • the first determining module is specifically configured to perform scene analysis on the target sensing data according to the positioning information of the current location of the terminal, and determine the target scenario.
  • the first determining module is specifically configured to determine the target scenario according to the identifier included in the request message for indicating the target scenario.
  • The device 500 for data processing of the embodiments of the present invention provides a terminal with the perceptual model corresponding to the scenario the terminal requires, so that the terminal processes the corresponding perceptual data with that model. Because the perceptual model of a specific scenario has relatively low complexity and relatively high accuracy, the computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
  • The apparatus 500 for data processing may correspond to the server in the data processing methods of the embodiments of the present invention, and the above and other operations and/or functions of the modules in the apparatus 500 implement the corresponding processes of the methods in FIG. 1 to FIG. 3; for brevity, they are not described here again.
  • an embodiment of the present invention further provides a data processing device 600.
  • the device 600 includes a processor 610, a memory 620, a bus system 630, a receiver 640, and a transmitter 650.
  • the processor 610, the memory 620, the receiver 640, and the transmitter 650 are connected by a bus system 630.
  • the memory 620 is configured to store instructions, and the processor 610 is configured to execute the instructions stored in the memory 620, to control the receiver 640 to receive signals and control the transmitter 650 to send signals.
  • the processor 610 is configured to: acquire target sensing data, where the target sensing data is any one of the following: image data, video data, or sound data; determine the target scenario to which the target sensing data belongs; determine the target perceptual model corresponding to the target scenario; and calculate a recognition result of the target sensing data according to the target perceptual model.
  • the device 600 for data processing determines the scenario to which the to-be-recognized perceptual data belongs and calculates the recognition result of the perceptual data with the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.
  • the processor 610 is specifically configured to determine the target scenario by performing scene analysis on the target sensing data.
  • the target sensing data is data generated at the location where the terminal is currently located.
  • the processor 610 is specifically configured to perform scene analysis on the target sensing data in combination with positioning information of the terminal's current location, to determine the target scenario.
  • the transmitter 650 is configured to send, to the server, a first request for requesting the scenario to which the target sensing data belongs, and the receiver 640 is configured to receive the target scenario sent by the server according to the first request.
  • the processor 610 is specifically configured to determine, from the pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario.
  • the processor 610 is specifically configured to: before acquiring the target sensing data, update the perceptual model library according to the user's historical scenario sequence, where the updated perceptual model library includes the target perceptual model corresponding to the target scenario.
  • the transmitter 650 is configured to: when it is determined that the pre-stored perceptual model library has no perceptual model corresponding to the target scenario, send to the server a second request for requesting the perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario; the receiver 640 is configured to receive the target perceptual model corresponding to the target scenario that is sent by the server according to the second request.
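This cache-miss-and-fetch behavior can be sketched briefly. The sketch below is illustrative only, under the assumption of a simple dictionary cache; `request_model_from_server` stands in for the second request's transport, which the text leaves unspecified.

```python
def get_model_for_scenario(scenario_id, local_library, request_model_from_server):
    """Use the locally cached perceptual model if present; otherwise send
    the 'second request' to the server and cache the returned model."""
    model = local_library.get(scenario_id)
    if model is None:
        # cache miss: ask the server for this scenario's model
        model = request_model_from_server(scenario_id)
        local_library[scenario_id] = model   # keep it for later reuse
    return model

library = {}
fetch = lambda sid: f"<{sid}.model>"        # stubbed server call
print(get_model_for_scenario("piazza", library, fetch))  # fetched
print(get_model_for_scenario("piazza", library, fetch))  # served from cache
```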
  • the processor 610 may be a central processing unit ("CPU"), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 620 can include read only memory and random access memory and provides instructions and data to the processor 610. A portion of the memory 620 can also include a non-volatile random access memory. For example, the memory 620 can also store information of the device type.
  • the bus system 630 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. However, for clarity of description, various buses are labeled as bus system 630 in the figure.
  • each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 610 or by instructions in the form of software.
  • the steps of the method disclosed in the embodiments of the present invention may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 620, and the processor 610 reads the information in the memory 620 and completes the steps of the above method in combination with its hardware. To avoid repetition, it will not be described in detail here.
  • the device 600 for data processing determines the scenario to which the to-be-recognized perceptual data belongs and calculates the recognition result of the perceptual data with the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.
  • the device 600 for data processing according to an embodiment of the present invention may correspond to the terminal in the data processing method of the embodiments of the present invention, and may further correspond to the device 400 for data processing of the embodiments of the present invention; the above and other operations and/or functions of the respective modules in the device 600 implement the corresponding processes of the methods in FIG. 1 to FIG. 3, and for brevity are not described herein again.
  • an embodiment of the present invention further provides a data processing device 700.
  • the device 700 includes a processor 710, a memory 720, a bus system 730, a receiver 740, and a transmitter 750.
  • the processor 710, the memory 720, the receiver 740, and the transmitter 750 are connected by a bus system 730; the memory 720 is configured to store instructions, and the processor 710 is configured to execute the instructions stored in the memory 720, to control the receiver 740 to receive signals and control the transmitter 750 to send signals.
  • the receiver 740 is configured to receive a request message that is sent by a terminal and that requests the perceptual model corresponding to the scenario to which target sensing data belongs, where the target sensing data is any one of the following: image data, video data, or sound data;
  • the processor 710 is configured to determine, according to the request message, the target scenario to which the target sensing data belongs, and determine, from the pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each model in the perceptual model library corresponds to one scenario; the transmitter 750 is configured to send the target perceptual model to the terminal according to the request message, so that the terminal calculates a recognition result of the target sensing data according to the target perceptual model.
  • the device 700 for data processing provides a terminal with the perceptual model corresponding to the scenario that the terminal requires, so that the terminal processes the corresponding perceptual data according to that model; because the perceptual model corresponding to a specific scenario has relatively low complexity and relatively high accuracy, computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
  • the processor 710 is specifically configured to: before the receiver 740 receives the request message, acquire perceptual data samples, where at least a part of the perceptual data samples carries scenario annotation information and item annotation information; train, according to the perceptual data samples, the perceptual models corresponding to different scenarios; and store the perceptual models corresponding to the different scenarios into the perceptual model library, where the perceptual model library includes the target perceptual model.
  • the target sensing data is data generated at the location where the terminal is currently located.
  • the processor 710 is specifically configured to perform scene analysis on the target sensing data according to the positioning information of the current location of the terminal, and determine the target scenario.
  • the processor 710 is specifically configured to determine the target scenario according to the identifier included in the request message for indicating the target scenario.
  • the processor 710 may be a central processing unit ("CPU"), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the memory 720 can include read only memory and random access memory and provides instructions and data to the processor 710. A portion of the memory 720 can also include a non-volatile random access memory. For example, the memory 720 can also store information of the device type.
  • the bus system 730 may include, in addition to the data bus, a power bus, a control bus, a status signal bus, and the like. However, for clarity of description, the various buses are all labeled as the bus system 730 in the figure.
  • each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 710 or by instructions in the form of software.
  • the steps of the method disclosed in the embodiments of the present invention may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in memory 720, and processor 710 reads the information in memory 720 and, in conjunction with its hardware, performs the steps of the above method. To avoid repetition, it will not be described in detail here.
  • the device 700 for data processing provides a terminal with the perceptual model corresponding to the scenario that the terminal requires, so that the terminal processes the corresponding perceptual data according to that model; because the perceptual model corresponding to a specific scenario has relatively low complexity and relatively high accuracy, computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
  • the data processing device 700 may correspond to the server in the data processing method of the embodiments of the present invention, and may further correspond to the data processing device 500 of the embodiments of the present invention; the above and other operations and/or functions of the respective modules in the device 700 implement the corresponding processes of the methods in FIG. 1 to FIG. 3, and for brevity are not described herein again.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is merely a logical function division; in actual implementation there may be another division manner, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed onto multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
  • based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

Embodiments of the present invention provide a data processing method and device. The method includes: acquiring target perceptual data, where the target perceptual data is any one of the following: image data, video data, or sound data; determining the target scenario to which the target perceptual data belongs; determining the target perceptual model corresponding to the target scenario; and calculating a recognition result of the target perceptual data according to the target perceptual model. Therefore, the data processing method and device of the embodiments of the present invention determine the scenario to which perceptual data belongs and calculate the recognition result of the perceptual data with the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.

Description

Data processing method and device
This application claims priority to Chinese Patent Application No. 201410471480.X, filed with the Chinese Patent Office on September 16, 2014 and entitled "Data processing method and device", which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present invention relate to the field of data processing, and more specifically, to a data processing method and device.
Background
Terminal devices such as mobile phones, wearable devices, and robots all need to recognize a variety of objects, sounds, and actions from perceptual data such as images, videos, and sounds. For example, to perform a photo search, a mobile phone must first recognize the target item in the captured photo before it can search for information related to that item. As another example, for a robot to perform the task of grasping a target item, it must first obtain the target item's position in the surrounding environment from camera data.
To give terminal devices broad recognition capabilities, the usual approach is to train, from a large quantity of known sample data, a perceptual model that can distinguish various objects, sounds, or actions. For each newly input image, video, or sound, the terminal device can then calculate a corresponding recognition result based on the trained perceptual model.
As the number of categories to be recognized grows and recognition accuracy requirements rise, the perceptual models used to recognize perceptual data become increasingly complex; for example, they carry more and more parameters. The convolutional neural network ("CNN") models currently used for image recognition already have tens of millions, or even hundreds of millions, of parameters. At present, in many applications, to improve user experience, a perceptual model must accurately recognize a large number of objects, actions, and sounds in each given scenario, which poses a great challenge to model accuracy. Current technology usually uses a single perceptual model with fixed parameters for all recognition tasks, so model complexity grows without bound as recognition requirements become finer, bringing enormous challenges to storage and computation.
Summary
Embodiments of the present invention provide a data processing method and device, which can resolve the conflict between the computing capability of a device and the complexity of a perceptual model.
A first aspect provides a data processing method, the method including:
acquiring target perceptual data, where the target perceptual data is any one of the following: image data, video data, or sound data;
determining the target scenario to which the target perceptual data belongs;
determining the target perceptual model corresponding to the target scenario; and
calculating a recognition result of the target perceptual data according to the target perceptual model.
With reference to the first aspect, in a first possible implementation of the first aspect, determining the target scenario to which the target perceptual data belongs includes:
determining the target scenario by performing scene analysis on the target perceptual data.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the target perceptual data is data generated at the location where the terminal is currently located;
where determining the target scenario by performing scene analysis on the target perceptual data includes:
performing scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
With reference to the first aspect, in a third possible implementation of the first aspect, determining the target scenario to which the target perceptual data belongs includes:
sending, to a server, a first request for requesting the scenario to which the target perceptual data belongs; and
receiving the target scenario sent by the server according to the first request.
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, determining the target perceptual model corresponding to the target scenario includes:
determining, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, the method further includes:
updating the perceptual model library according to the user's historical scenario sequence, where the updated perceptual model library includes the target perceptual model corresponding to the target scenario.
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a sixth possible implementation of the first aspect, determining the target perceptual model corresponding to the target scenario includes:
when it is determined that the pre-stored perceptual model library has no perceptual model corresponding to the target scenario, sending to a server a second request for requesting the perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario; and
receiving the target perceptual model corresponding to the target scenario that is sent by the server according to the second request.
A second aspect provides a data processing method, the method including:
receiving a request message sent by a terminal for requesting the perceptual model corresponding to the scenario to which target perceptual data belongs, where the target perceptual data is any one of the following: image data, video data, or sound data;
determining, according to the request message, the target scenario to which the target perceptual data belongs;
determining, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each model in the perceptual model library corresponds to one scenario; and
sending the target perceptual model to the terminal according to the request message, so that the terminal calculates a recognition result of the target perceptual data according to the target perceptual model.
With reference to the second aspect, in a first possible implementation of the second aspect, before the request message is received, the method further includes:
acquiring perceptual data samples, where at least a part of the perceptual data samples carries scenario annotation information and item annotation information;
training, according to the perceptual data samples, the perceptual models corresponding to different scenarios; and
storing the perceptual models corresponding to the different scenarios into the perceptual model library, where the perceptual model library includes the target perceptual model.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, determining, according to the request message, the target scenario to which the target perceptual data belongs includes:
determining the target scenario to which the target perceptual data belongs by performing scene analysis on the target perceptual data included in the request message.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the target perceptual data is data generated at the location where the terminal is currently located;
where determining the target scenario to which the target perceptual data belongs includes:
performing scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
With reference to the second aspect or the first possible implementation of the second aspect, in a fourth possible implementation of the second aspect, determining, according to the request message, the target scenario to which the target perceptual data belongs includes:
determining the target scenario according to an identifier that is included in the request message and used to indicate the target scenario.
A third aspect provides a data processing device, the device including:
an acquiring module, configured to acquire target perceptual data, where the target perceptual data is any one of the following: image data, video data, or sound data;
a first determining module, configured to determine the target scenario to which the target perceptual data acquired by the acquiring module belongs;
a second determining module, configured to determine the target perceptual model corresponding to the target scenario determined by the first determining module; and
a calculating module, configured to calculate, according to the target perceptual model determined by the second determining module, a recognition result of the target perceptual data acquired by the acquiring module.
With reference to the third aspect, in a first possible implementation of the third aspect, the first determining module is specifically configured to determine the target scenario by performing scene analysis on the target perceptual data.
With reference to the first possible implementation of the third aspect, in a second possible implementation of the third aspect, the target perceptual data is data generated at the location where the terminal is currently located;
where the first determining module is specifically configured to perform scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
With reference to the third aspect, in a third possible implementation of the third aspect, the first determining module includes:
a first sending unit, configured to send, to a server, a first request for requesting the scenario to which the target perceptual data belongs; and
a first receiving unit, configured to receive the target scenario sent by the server according to the first request.
With reference to the third aspect or any one of the first to third possible implementations of the third aspect, in a fourth possible implementation of the third aspect, the second determining module is specifically configured to determine, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario.
With reference to the fourth possible implementation of the third aspect, in a fifth possible implementation of the third aspect, the device further includes:
an updating module, configured to: before the acquiring module acquires the target perceptual data, update the perceptual model library according to the user's historical scenario sequence, where the updated perceptual model library includes the target perceptual model corresponding to the target scenario.
With reference to the third aspect or any one of the first to third possible implementations of the third aspect, in a sixth possible implementation of the third aspect, the second determining module includes:
a second sending unit, configured to: when it is determined that a pre-stored perceptual model library has no perceptual model corresponding to the target scenario, send to a server a second request for requesting the perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario; and
a second receiving unit, configured to receive the target perceptual model corresponding to the target scenario that is sent by the server according to the second request.
A fourth aspect provides a data processing device, the device including:
a receiving module, configured to receive a request message sent by a terminal for requesting the perceptual model corresponding to the scenario to which target perceptual data belongs, where the target perceptual data is any one of the following: image data, video data, or sound data;
a first determining module, configured to determine, according to the request message received by the receiving module, the target scenario to which the target perceptual data belongs;
a second determining module, configured to determine, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario determined by the first determining module, where each model in the perceptual model library corresponds to one scenario; and
a sending module, configured to send, according to the request message received by the receiving module, the target perceptual model determined by the second determining module to the terminal, so that the terminal calculates a recognition result of the target perceptual data according to the target perceptual model.
With reference to the fourth aspect, in a first possible implementation of the fourth aspect, the device further includes:
an acquiring module, configured to acquire perceptual data samples before the receiving module receives the request message, where at least a part of the perceptual data samples carries scenario annotation information and item annotation information;
a training module, configured to train, according to the perceptual data samples, the perceptual models corresponding to different scenarios; and
a storage module, configured to store the perceptual models corresponding to the different scenarios, obtained through training by the training module, into the perceptual model library, where the perceptual model library includes the target perceptual model.
With reference to the fourth aspect or the first possible implementation of the fourth aspect, in a second possible implementation of the fourth aspect, the first determining module is specifically configured to determine the target scenario to which the target perceptual data belongs by performing scene analysis on the target perceptual data included in the request message.
With reference to the second possible implementation of the fourth aspect, in a third possible implementation of the fourth aspect, the target perceptual data is data generated at the location where the terminal is currently located;
where the first determining module is specifically configured to perform scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
With reference to the fourth aspect or the first possible implementation of the fourth aspect, in a fourth possible implementation of the fourth aspect, the first determining module is specifically configured to determine the target scenario according to an identifier that is included in the request message and used to indicate the target scenario.
Based on the foregoing technical solutions, in the data processing method and device of the embodiments of the present invention, the scenario to which perceptual data belongs is determined, and the recognition result of the perceptual data is calculated using the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.
Brief Description of Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings needed for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
FIG. 2 is a schematic flowchart of a data processing method according to another embodiment of the present invention.
FIG. 3 is a schematic flowchart of training a perceptual model according to another embodiment of the present invention.
FIG. 4 is a schematic block diagram of a data processing device according to an embodiment of the present invention.
FIG. 5 is another schematic block diagram of a data processing device according to an embodiment of the present invention.
FIG. 6 is a schematic block diagram of a data processing device according to another embodiment of the present invention.
FIG. 7 is another schematic block diagram of a data processing device according to another embodiment of the present invention.
FIG. 8 is a schematic block diagram of a data processing device according to an embodiment of the present invention.
FIG. 9 is a schematic block diagram of a data processing device according to another embodiment of the present invention.
Detailed Description
The following clearly describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be understood that the technical solutions of the embodiments of the present invention may be applied to various communication systems, for example: a Universal Mobile Telecommunication System ("UMTS"), a Global System of Mobile communication ("GSM") system, a Code Division Multiple Access ("CDMA") system, a Wideband Code Division Multiple Access ("WCDMA") system, a General Packet Radio Service ("GPRS"), a Long Term Evolution ("LTE") system, an LTE Frequency Division Duplex ("FDD") system, LTE Time Division Duplex ("TDD"), or a Worldwide Interoperability for Microwave Access ("WiMAX") communication system.
It should also be understood that, in the embodiments of the present invention, a terminal may also be referred to as user equipment ("UE"), a mobile station ("MS"), a mobile terminal, or the like. The terminal may communicate with one or more core networks through a radio access network ("RAN"). For example, the terminal may be a mobile phone (also called a "cellular" phone) or a computer with a mobile terminal; the terminal may also be a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile apparatus that exchanges voice and/or data with the radio access network. Specifically, the terminal may be a mobile phone, a wearable device, a robot, or the like.
Current technology generally uses one perceptual model with fixed parameters to complete all recognition tasks; for example, the same perceptual model performs recognition on perceptual data generated in a supermarket, a hospital, or a kitchen. As the number of object types to be recognized grows and accuracy requirements rise, perceptual models carry more and more parameters; the CNN models currently used to recognize tens of thousands of object classes in images already have tens of millions, or even hundreds of millions, of parameters. This inevitably increases the computational complexity of the perceptual model greatly and also challenges the storage space for the perceptual model.
In view of the foregoing problems, the present invention provides a data processing method: in the process of training perceptual models, scenario factors are taken into account, and a separate perceptual model is generated for each scenario; in the process of recognizing perceptual data, the scenario to which the perceptual data belongs is determined first, the perceptual model corresponding to that scenario is then acquired, and finally the recognition result of the perceptual data is calculated using the perceptual model corresponding to the scenario to which the perceptual data belongs.
Because the object types appearing in each scenario are limited (for example, an outdoor or urban scenario may require recognizing people, vehicles, buildings, or text, but essentially never various animals and plants), the number of object categories that frequently need to be recognized in each scenario is relatively small. Accordingly, the perceptual model corresponding to each scenario has relatively few model parameters, so its computational complexity is greatly reduced, and it does not place high demands on storage space. Therefore, the data processing method proposed by the present invention can greatly simplify the complexity and computation of perceptual models while maintaining or even improving recognition accuracy, thereby resolving the conflict between terminal computing capability and model complexity and effectively improving recognition capability.
To help a person skilled in the art better understand the technical solutions of the present invention, a specific application scenario of an embodiment of the present invention is introduced with a specific example. For example, a user takes a photo on a plaza and can use a mobile phone to recognize objects such as flower beds, cafes, and buses in the photo; as another example, a user takes a photo in the woods and can use a mobile phone to recognize objects such as alfalfa, chrysanthemums, and mantises in the photo. That is, the technical solutions provided in the embodiments of the present invention may be applied as follows: a user takes photos with a mobile phone in various scenarios, and the phone recognizes various objects in the photos and returns their names. It should be understood that the above mobile phone may also be another terminal device.
FIG. 1 shows a data processing method 100 according to an embodiment of the present invention, performed, for example, by a terminal. As shown in FIG. 1, the method 100 includes:
S110. Acquire target perceptual data, where the target perceptual data is any one of the following: image data, video data, or sound data.
Specifically, the target perceptual data may be data measured or generated by a sensing apparatus such as a camera, a microphone, an infrared sensor, or a depth sensor; for example, the target perceptual data is a picture, a video clip, or a sound recording.
It should be understood that the terminal acquiring the target perceptual data may also be the device that generates the target perceptual data. Optionally, in this embodiment of the present invention, acquiring target perceptual data in S110 includes:
generating the target perceptual data at the location where the terminal is currently located.
Specifically, for example, a user takes a photo in scenario A with a mobile phone and recognizes the captured photo directly on the phone.
It should also be understood that the terminal acquiring the target perceptual data may be a device different from the one that generates it. For example, a user takes a photo with a mobile phone (the phone generates the perceptual data) and then uploads the photo to a laptop (the laptop acquires the perceptual data for subsequent processing) for corresponding processing.
S120. Determine the target scenario to which the target perceptual data belongs.
Specifically, scene analysis and recognition may be performed on the target perceptual data to determine the target scenario to which it belongs; alternatively, a request for the target scenario to which the target perceptual data belongs may be sent to a server. This is not limited in this embodiment of the present invention.
Optionally, in this embodiment of the present invention, determining the target scenario to which the target perceptual data belongs in S120 includes: determining the target scenario by performing scene analysis on the target perceptual data.
Specifically, a scene recognizer may perform scene analysis on the target perceptual data to determine the target scenario to which it belongs. The scene recognizer may be an existing scene classification model, for example, a support vector machine ("SVM") multi-classifier. Specifically, taking camera data as the target perceptual data, features such as the image's global features (GIST), Dense Scale-invariant feature transform ("Dense SIFT") features, or Texton Histograms are extracted from the camera data; these features are input into a pre-trained SVM multi-classifier for scene recognition, and a scene type is output, for example, recognizing the scene as a "plaza".
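To make the scene-recognizer step concrete, here is a minimal sketch using scikit-learn's SVC as the SVM multi-classifier. It is illustrative only: the `extract_features` stub stands in for the GIST / Dense SIFT / Texton histogram extraction named above, and the training data is fabricated.

```python
import numpy as np
from sklearn.svm import SVC

def extract_features(image):
    """Stand-in for GIST / Dense SIFT / Texton histogram extraction;
    it only returns a deterministic fixed-length vector per image."""
    rng = np.random.default_rng(abs(hash(image.tobytes())) % (2**32))
    return rng.random(512)

# train a multi-class SVM scene classifier on labeled sample images
rng = np.random.default_rng(0)
train_images = [rng.random((32, 32)) for _ in range(20)]
train_labels = ["piazza", "kitchen", "office", "ward"] * 5
X = np.stack([extract_features(img) for img in train_images])
clf = SVC(kernel="rbf")            # SVM multi-classifier (one-vs-one)
clf.fit(X, train_labels)

# scene recognition on new perceptual data: output a scene type
query = rng.random((32, 32))
print(clf.predict([extract_features(query)])[0])
```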
Further, when the target perceptual data is generated at the location where the terminal is currently located, the scenario to which the target perceptual data belongs may be further constrained with the positioning information of the terminal's current location.
Optionally, in this embodiment of the present invention, the target perceptual data is data generated at the location where the terminal is currently located;
where determining the target scenario by performing scene analysis on the target perceptual data in S120 includes: performing scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
Specifically, taking as an example target perceptual data that is a photo taken on plaza A where the terminal is currently located, an SVM multi-classifier performs scene analysis on the target perceptual data and recognizes that the target scenario is a plaza; positioning information of the terminal's current location (namely plaza A) is then acquired, and based on it the target scenario can be further narrowed from "plaza" to "plaza A". As another example, if the target perceptual data is a photo taken in the kitchen where the terminal is currently located, an SVM multi-classifier recognizes that the target scenario is indoors; positioning information of the terminal's current location (namely the kitchen) is then acquired, and based on it the target scenario can be further narrowed from "indoors" to "kitchen".
It can be seen that, in this embodiment of the present invention, combining the terminal's positioning information can further narrow the scenario of the target perceptual data to a smaller spatio-temporal region. It should be understood that a more specific, smaller-scope scenario has a correspondingly simpler perceptual model with lower computational complexity, so the subsequent recognition computation based on that model is also smaller.
It should be understood that, in this embodiment of the present invention, the positioning information of the terminal's current location may be acquired by any one, or a combination, of the following methods: wifi-based positioning, and positioning through the simultaneous localization and mapping ("SLAM") function. In wifi-based positioning, the terminal scans and collects the wireless access point signals around it and obtains their MAC addresses. Because wireless access points generally do not move within a given period, the terminal can report the MAC addresses to a location server; the server retrieves the previously stored geographic positions of the access points, computes the terminal's geographic position in combination with the strength of each access point's signal, and delivers the corresponding positioning information to the terminal, so that the terminal obtains the positioning information of its current location. SLAM refers to building a map and determining one's own position simultaneously through repeated observations during camera motion; SLAM is an existing technology and, for brevity, is not described here. Other positioning methods, such as GPS, may also be used to obtain the positioning information of the terminal's current location, which is not limited in this embodiment of the present invention.
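The wifi positioning step can be sketched as follows. The text only says the server combines stored access-point locations with signal strengths; the signal-strength-weighted centroid below is one plausible combination rule assumed for illustration, and all names and coordinates are invented.

```python
def estimate_position(scanned_aps, ap_locations):
    """Sketch of wifi positioning: the terminal reports scanned
    (MAC address, RSSI) pairs; the location server looks up each access
    point's stored coordinates and combines them with a
    signal-strength-weighted centroid (an assumed rule, not the patent's)."""
    total_w, x, y = 0.0, 0.0, 0.0
    for mac, rssi in scanned_aps:
        if mac not in ap_locations:
            continue                   # unknown access point, skip it
        w = 10 ** (rssi / 10.0)        # convert dBm to a linear weight
        ax, ay = ap_locations[mac]
        x, y, total_w = x + w * ax, y + w * ay, total_w + w
    if total_w == 0:
        return None
    return (x / total_w, y / total_w)

aps = [("aa:bb:cc:01", -40), ("aa:bb:cc:02", -70)]
known = {"aa:bb:cc:01": (0.0, 0.0), "aa:bb:cc:02": (10.0, 0.0)}
print(estimate_position(aps, known))   # lands close to the stronger AP
```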
Alternatively, a request for the target scenario to which the target perceptual data belongs may be sent to a server.
Optionally, in this embodiment of the present invention, determining the target scenario to which the target perceptual data belongs in S120 includes:
sending, to a server, a first request for requesting the scenario to which the target perceptual data belongs; and
receiving the target scenario sent by the server according to the first request.
Specifically, the first request includes the target perceptual data, and may also include an identifier of the terminal.
S130. Determine the target perceptual model corresponding to the target scenario.
Specifically, the target perceptual model corresponding to the target scenario may be determined from a perceptual model library pre-stored locally on the terminal; the target perceptual model corresponding to the target scenario may also be requested from a network-side server, which is not limited in this embodiment of the present invention.
Optionally, in this embodiment of the present invention, determining the target perceptual model corresponding to the target scenario in S130 includes:
determining, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario.
Specifically, the pre-stored perceptual model library may be understood as a storage area on the terminal used to cache perceptual models. Optionally, each perceptual model may be stored in the library in the form of scenario identifier (scenario number or type) plus perceptual model, that is, each perceptual model in the library corresponds to one scenario. For example, if the target scenario of the target perceptual data is determined to be scenario D, and the locally cached perceptual model library contains the perceptual model d corresponding to scenario D, the perceptual model d for recognizing the target perceptual data can be obtained directly from the library.
It should be understood that each time the terminal receives a perceptual model delivered by the server, it may cache the received model and its corresponding scenario identifier into the perceptual model library for later use. Optionally, when it is determined that the storage space of the perceptual model library is fully occupied, the earliest cached perceptual model may be deleted, and the newly received perceptual model is then cached into the library.
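The delete-the-earliest eviction rule just described is first-in-first-out; the following sketch shows one way it could look, assuming a simple ordered dictionary as the cache (the class and capacity are invented for illustration).

```python
from collections import OrderedDict

class PerceptualModelCache:
    """Sketch of the terminal's local model library with the eviction
    policy described above: when the space is full, the model cached
    earliest is deleted to make room for the newest one (FIFO)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.models = OrderedDict()   # scenario id -> model, insertion order

    def put(self, scenario_id, model):
        if scenario_id in self.models:
            self.models.pop(scenario_id)
        elif len(self.models) >= self.capacity:
            evicted, _ = self.models.popitem(last=False)  # drop earliest
            print(f"evicting {evicted}")
        self.models[scenario_id] = model

    def get(self, scenario_id):
        return self.models.get(scenario_id)

cache = PerceptualModelCache(capacity=2)
cache.put("piazza", "<piazza.model>")
cache.put("kitchen", "<kitchen.model>")
cache.put("office", "<office.model>")   # evicts "piazza"
print(cache.get("piazza"))              # None
```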
When it is determined that the terminal's locally cached perceptual model library has no target perceptual model corresponding to the target scenario of the target perceptual data, the target perceptual model may be requested from a network-side server.
Optionally, in this embodiment of the present invention, determining the target perceptual model corresponding to the target scenario in S130 includes:
when it is determined that the pre-stored perceptual model library has no perceptual model corresponding to the target scenario, sending to a server a second request for requesting the perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario; and
receiving the target perceptual model corresponding to the target scenario that is sent by the server according to the second request.
It should be understood that the second request includes an identifier used to indicate the target scenario, and may also include an identifier of the terminal.
Optionally, in this embodiment of the present invention, the received target perceptual model corresponding to the target scenario may be cached into the terminal's locally pre-stored perceptual model library, so that the next time the target perceptual model corresponding to this target scenario is needed, it can be obtained directly from the terminal without requesting it from the server again.
S140. Calculate a recognition result of the target perceptual data according to the target perceptual model.
Specifically, taking camera image data whose target scenario is a plaza as an example, the corresponding target perceptual model, for example <piazza.model>, is loaded from the pre-stored perceptual model library to analyze and recognize the camera image data. The recognition process is as follows: read the image; generate multiple local image regions from the original image with a sliding window; input the local region images into a convolutional neural network ("CNN") configured with the link weight parameters in the <piazza.model> file; and output one or more recognition results, such as <flower bed, bench, tourist, bus, sedan, police officer, child, balloon, cafe>.
It should be understood that the calculation in S140 may also compute the recognition result according to algorithm models such as deep neural networks ("DNN"). For example, the calculation steps are: perform, on the input perceptual data in sequence, data block selection, cascaded convolution and sampling computation across the neural network layers, and classification matrix computation, to produce a classification result.
It should also be understood that the calculation in S140 is not limited to being executed entirely on a general-purpose central processing unit ("CPU"). For example, when the input perceptual data is image data, part of the calculation may be executed on a graphics processing unit ("GPU") chip; as another example, if the input perceptual data is sound or video data, part of the calculation may be executed on a corresponding dedicated chip, which is not limited in this embodiment of the present invention.
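The sliding-window recognition flow of S140 can be sketched as a toy example. Nothing here is the patented model: the tiny CNN architecture, label set, and window sizes are invented, and the real link weights would come from the scenario's parameter file such as <piazza.model> (shown as a commented-out load).

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Invented stand-in CNN; real link-weight parameters would be loaded
    from the scenario's model parameter file."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1))
        self.classifier = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

labels = ["flower bed", "bench", "tourist", "bus", "cafe"]
model = TinyCNN(num_classes=len(labels))
# model.load_state_dict(torch.load("piazza.model"))  # per-scenario weights
model.eval()

image = torch.rand(3, 256, 256)            # the photo to analyze
window, stride, found = 128, 64, set()
with torch.no_grad():
    for top in range(0, 256 - window + 1, stride):
        for left in range(0, 256 - window + 1, stride):
            region = image[:, top:top+window, left:left+window].unsqueeze(0)
            found.add(labels[model(region).argmax(dim=1).item()])
print(found)    # set of object types recognized across the local regions
```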
It should also be understood that, in this embodiment of the present invention, because the recognition result of the perceptual data is calculated using a perceptual model pre-stored locally on the terminal, the prior-art situation in which the server must respond to each terminal's recognition request and deliver each recognition result one by one is avoided. The data processing method of this embodiment can therefore effectively reduce the computational burden on the network-side server and the bandwidth needed for data transmission, while also increasing recognition speed.
Therefore, in the data processing method of this embodiment of the present invention, the scenario to which the to-be-recognized perceptual data belongs is determined, and the recognition result is calculated using the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.
Optionally, in this embodiment of the present invention, before the target perceptual data is acquired, the method further includes:
S150. Update the perceptual model library according to the user's historical scenario sequence, where the updated perceptual model library includes the target perceptual model corresponding to the target scenario.
The specific steps are as follows:
S151. Predict, according to the user's historical scenario sequence, the scenario to which the perceptual data about to be recognized will belong.
For example, it can be learned from the terminal's locally pre-stored perceptual model library that user S's scenarios and their time sequence on a weekday are: 06:00 bedroom; 7:00 living room; 7:20 street; 7:30 highway; 7:40 campus garage. Taking these scenarios and their time sequence as the input sequence of a conditional random field ("CRF") algorithm model, the next most likely scenarios and their probabilities are predicted, for example, office: 0.83; meeting room: 0.14 (a simplified sketch of this prediction step appears after S153 below).
S152. Request, from the server, the perceptual model corresponding to the scenario to which the perceptual data about to be recognized will belong.
For example, send to the server a third request for the perceptual models corresponding to the office scenario and the meeting room scenario, where the third request includes an identifier indicating the office scenario and an identifier indicating the meeting room scenario.
S153. Receive the perceptual model, sent by the server, corresponding to the scenario to which the perceptual data about to be recognized will belong, and update the locally pre-stored perceptual model library.
Specifically, upon receiving the perceptual models corresponding to the office and meeting room scenarios, store them into the locally pre-stored perceptual model library in the form of scenario identifier (number or type) plus perceptual model. In this way, when the scenario of subsequently acquired perceptual data is indeed the office, the perceptual model corresponding to the office can be obtained directly from the local perceptual model library, and the recognition result of the perceptual data can then be obtained according to the updated model library.
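As the sketch referenced in S151, the following toy code predicts the next scene from the historical sequence. A first-order transition-count model is used here purely as a simple stand-in for the CRF named in the text; the sequences are invented.

```python
from collections import Counter, defaultdict

def train_transitions(histories):
    """Count scene-to-scene transitions over the user's historical scene
    sequences (a stand-in for the CRF algorithm model in the text)."""
    counts = defaultdict(Counter)
    for seq in histories:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, current):
    """Return (scene, probability) candidates for the next scene."""
    total = sum(counts[current].values())
    return [(s, c / total) for s, c in counts[current].most_common()]

weekdays = [
    ["bedroom", "living room", "street", "highway", "garage", "office"],
    ["bedroom", "living room", "street", "highway", "garage", "office"],
    ["bedroom", "living room", "street", "highway", "garage", "meeting room"],
]
model = train_transitions(weekdays)
print(predict_next(model, "garage"))
# e.g. [('office', 0.67), ('meeting room', 0.33)]
```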
Therefore, in this embodiment of the present invention, the recognition result of perceptual data can be calculated using the perceptual models, pre-stored on the terminal, that correspond to the various scenarios, which can effectively improve the efficiency of data processing.
The present invention is further described below with specific embodiments. It should be understood that the following embodiments are merely intended to help better understand the present invention rather than to limit it.
Take a mobile phone as the executing entity, and take a photo taken on plaza A as the target perceptual data.
1) The phone acquires the photo for recognition processing.
2) Features such as GIST, Dense SIFT, and Texton Histograms are extracted from the photo and input into a pre-trained scene classification model (an SVM multi-classifier) for scene recognition; the scene is recognized as a plaza.
3) Further, positioning information of the terminal's current location is acquired.
4) Combined with the positioning information, the photo's scenario is further narrowed to plaza A.
5) Determine whether the terminal's locally cached perceptual model library has a perceptual model corresponding to plaza A; if yes, go to 6); if no, go to 7).
6) Obtain the perceptual model corresponding to plaza A, for example <piazza.model>, from the locally cached perceptual model library; go to 9).
7) Send the recognized scenario, that is, the identifier (number or type) of plaza A, the phone ID, and a request sequence number to the network-side server to request the perceptual model for plaza A.
8) Receive the perceptual model corresponding to plaza A, for example <piazza.model>, sent by the server, and cache the received model parameter file <piazza.model> into the phone's perceptual model library. If the cache is full, delete some previously cached perceptual models according to the update policy.
9) Analyze and recognize the photo's image data according to the perceptual model <piazza.model>. The specific recognition process is as follows: read the image; generate multiple local image regions from the original image with a sliding window; input the local region images into a convolutional neural network configured with the link weight parameters in the <piazza.model> file; and output one or more recognition results, such as <flower bed, bench, tourist, bus, sedan, police officer, child, balloon, cafe>.
Again taking a mobile phone as the executing entity, take as the target perceptual data a photo taken indoors in user John's kitchen.
1) The phone acquires the photo for recognition processing.
2) Features such as GIST, Dense SIFT, and Texton Histograms are extracted from the photo and input into a pre-trained scene classification model (an SVM multi-classifier) for scene recognition; the scene is recognized as indoors.
3) Further, the phone's precise position on the indoor map, namely the kitchen of user John's home, can be obtained through wifi signals and the SLAM function.
4) Combined with the positioning information, the photo's scenario is further narrowed to the kitchen.
5) Determine whether the terminal's locally cached perceptual model library has a perceptual model corresponding to the kitchen; if yes, go to 6); if no, go to 7).
6) Obtain the perceptual model corresponding to the kitchen, for example <kitchen.model>, from the locally cached perceptual model library; go to 9).
7) Send the recognized scenario, that is, the identifier (number or type) of the kitchen, the phone ID, and a request sequence number to the network-side server to request the kitchen's perceptual model.
8) Receive the perceptual model corresponding to the kitchen, for example <kitchen.model>, sent by the server, and cache the received model parameter file <kitchen.model> into the phone's perceptual model library. If the cache is full, delete some previously cached perceptual models according to the update policy.
9) Analyze and recognize the photo's image data according to the perceptual model <kitchen.model>. The specific recognition process is as follows: read the image; generate multiple local image regions from the original image with a sliding window; input the local region images into a convolutional neural network configured with the link weight parameters in the <kitchen.model> file; and output one or more recognition results, such as <gas stove, range hood, cupboard, wok, spoon, seasoning box>.
Therefore, in the data processing method of this embodiment of the present invention, the scenario to which the to-be-recognized perceptual data belongs is determined, and the recognition result is calculated using the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.
Optionally, as an embodiment, the target perceptual data may also be sent to the server directly to request the target perceptual model for processing it.
Specifically:
1) Acquire target perceptual data, where the target perceptual data may be any one of the following: image data, video data, or sound data.
2) Request, from the server, the perceptual model for processing the target perceptual data.
3) Receive the target perceptual model, corresponding to the target scenario, that the server sends after determining the target scenario to which the target perceptual data belongs.
4) Calculate a recognition result of the target perceptual data according to the target perceptual model.
It should be understood that, in the various embodiments of the present invention, the sequence numbers of the foregoing processes do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
It should also be understood that, in this embodiment of the present invention, when target perceptual data is recognized, the target scenario to which it belongs is determined first, and the recognition result is then calculated with the target perceptual model corresponding to that scenario. Because a specific scenario's perceptual model recognizes objects in that scenario with higher accuracy, and because, compared with the prior-art perceptual computation models used to process perceptual data across different scenarios, the computational complexity of each scenario's own perceptual model in this embodiment is greatly reduced, the requirement on computing capability can be effectively lowered while the efficiency of data processing is improved. It should also be understood that the data processing method of this embodiment may be executed by a terminal, by a server, or by a combination of the two, which is not limited in this embodiment of the present invention.
Therefore, in the data processing method of this embodiment of the present invention, the scenario to which the to-be-recognized perceptual data belongs is determined, and the recognition result is obtained using the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.
The data processing method according to an embodiment of the present invention has been described above in detail from the terminal's perspective with reference to FIG. 1; the data processing method according to an embodiment of the present invention is described below from the server's perspective with reference to FIG. 2 and FIG. 3.
As shown in FIG. 2, a data processing method 200 according to an embodiment of the present invention may be performed, for example, by a server. The method 200 includes:
S210. Receive a request message sent by a terminal for requesting the perceptual model corresponding to the scenario to which target perceptual data belongs, where the target perceptual data is any one of the following: image data, video data, or sound data.
Specifically, the target perceptual data may be data measured or generated by a sensing apparatus such as a camera, a microphone, an infrared sensor, or a depth sensor; for example, the target perceptual data is a picture, a video clip, or a sound recording.
The request message may include only the target perceptual data, that is, directly request from the server the perceptual model for processing it; it should be understood that, in this case, the server determines the target scenario to which the target perceptual data belongs and then determines the corresponding target perceptual model. The request message may instead directly include an identifier used to indicate the target scenario to which the target perceptual data belongs, that is, request the perceptual model corresponding to that target scenario; in this case, the server can determine the target scenario directly from the request message and then determine the corresponding perceptual model.
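The two request forms just described suggest a simple server-side branch, sketched below. The request field names and the stubbed scene analyzer are invented for illustration; the text does not specify a message format.

```python
def handle_model_request(request, model_library, analyze_scene):
    """Sketch of the server-side branching: if the request carries a
    scenario identifier, use it directly; if it carries raw perceptual
    data, run scene analysis first, then look up the model."""
    if "scenario_id" in request:
        scenario = request["scenario_id"]      # terminal already knows the scene
    else:
        scenario = analyze_scene(request["perceptual_data"])  # server-side analysis
    return {"scenario": scenario, "model": model_library.get(scenario)}

library = {"piazza": "<piazza.model>"}
print(handle_model_request({"scenario_id": "piazza"}, library, None))
print(handle_model_request({"perceptual_data": b"..."},
                           library, lambda d: "piazza"))
```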
S220. Determine, according to the request message, the target scenario to which the target perceptual data belongs.
Specifically, when the request message sent by the terminal includes the target perceptual data, scene analysis and recognition may be performed on the target perceptual data to determine its target scenario; when the request message indicates the scenario to which the target perceptual data belongs, the target scenario may be determined directly from the request message, which is not limited in this embodiment of the present invention.
Optionally, in this embodiment of the present invention, determining, according to the request message, the target scenario to which the target perceptual data belongs in S220 includes:
determining the target scenario to which the target perceptual data belongs by performing scene analysis on the target perceptual data included in the request message.
Specifically, a scene recognizer may perform scene analysis on the target perceptual data to determine the target scenario to which it belongs. The scene recognizer may be an existing scene classification model, for example, a support vector machine ("SVM") multi-classifier. Specifically, taking camera data as the target perceptual data, features such as the image's global features (GIST), Dense Scale-invariant feature transform ("Dense SIFT") features, or Texton Histograms are extracted from the camera data; these features are input into a pre-trained SVM multi-classifier for scene recognition, and a scene type is output, for example, recognizing the scene as a "plaza".
Further, when the target perceptual data is generated at the location where the terminal is currently located, the scenario to which the target perceptual data belongs may be further constrained with the positioning information of the terminal's current location.
Optionally, in this embodiment of the present invention, the target perceptual data is data generated at the location where the terminal is currently located;
where determining the target scenario to which the target perceptual data belongs includes:
performing scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
Specifically, taking as an example target perceptual data that is a photo taken on plaza A where the terminal is currently located, an SVM multi-classifier performs scene analysis on the target perceptual data and recognizes that the target scenario is a plaza; positioning information of the terminal's current location (namely plaza A) is then acquired, and based on it the target scenario can be further narrowed from "plaza" to "plaza A". As another example, if the target perceptual data is a photo taken in the kitchen where the terminal is currently located, an SVM multi-classifier recognizes that the target scenario is indoors; positioning information of the terminal's current location (namely the kitchen) is then acquired, and based on it the target scenario can be further narrowed from "indoors" to "kitchen".
It can be seen that, in this embodiment of the present invention, combining the terminal's positioning information can further narrow the scenario of the target perceptual data to a smaller spatio-temporal region. It should be understood that a more specific, smaller-scope scenario has a correspondingly simpler perceptual model with lower computational complexity, so the subsequent recognition computation based on that model is also smaller.
It should be understood that, in this embodiment of the present invention, any existing positioning method or combination of positioning methods may be used to acquire the positioning information of the terminal's current location, which is not limited in the present invention.
When the request message sent by the terminal indicates the scenario to which the target perceptual data belongs, the target scenario may be determined directly from the request message.
Optionally, in this embodiment of the present invention, determining, according to the request message, the target scenario to which the target perceptual data belongs in S220 includes:
determining the target scenario according to an identifier that is included in the request message and used to indicate the target scenario.
S230. Determine, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each model in the perceptual model library corresponds to one scenario.
The pre-stored perceptual model library stores the perceptual models, corresponding to different scenarios, that are trained from sample data of different perceptual data and different scenarios.
Specifically, the perceptual models of different scenarios may be stored in the perceptual model library in the form of scenario identifier plus perceptual model. It should be understood that the perceptual models of different scenarios may also be stored in the library in any other form, which is not limited in this embodiment of the present invention, as long as the corresponding perceptual model can be obtained from the library according to the scenario's identifier (scenario number or type).
S240. Send the target perceptual model to the terminal according to the request message, so that the terminal calculates a recognition result of the target perceptual data according to the target perceptual model.
Therefore, the data processing method of this embodiment of the present invention provides a terminal with the perceptual model corresponding to the scenario of the perceptual data to be recognized, so that the terminal processes the corresponding perceptual data according to that model; because the perceptual model corresponding to a specific scenario has relatively low complexity and relatively high accuracy, computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
In this embodiment of the present invention, the pre-stored perceptual model library stores the perceptual models, corresponding to different scenarios, that are trained from sample data of different perceptual data and different scenarios.
Optionally, in this embodiment of the present invention, before the request message is received, the method 200 further includes:
S250. Acquire perceptual data samples, where at least a part of the perceptual data samples carries scenario annotation information and item annotation information.
S260. Train, according to the perceptual data samples, the perceptual models corresponding to different scenarios.
S270. Store the perceptual models corresponding to the different scenarios into the perceptual model library, where the perceptual model library includes the target perceptual model.
Specifically, taking image data as the perceptual data, the specific steps of training the perceptual models corresponding to different scenarios are shown in FIG. 3:
In S310, read image samples, at least a part of which carries scenario annotation information and item annotation information.
Specifically, for example, the scenario annotation information of image Img00001 in the image samples is <scenario: plaza>, and its item annotation information is <items: flower bed, bench, tourist, bus, bus stop, sedan, police officer, child, balloon, cafe, pigeon>. Further, an image's item annotation information may also include each item's position in the image, for example expressed as a local rectangular region.
It should be understood that, in this embodiment of the present invention, one part of the image samples may carry both scenario annotation information and item annotation information, another part may carry only item annotation information, and yet another part may carry neither scenario annotation information nor item annotation information. It should also be understood that all images in the samples may carry their own scenario annotation information and item annotation information; this is not limited in this embodiment of the present invention, as long as at least a part of the image samples carries scenario annotation information and item annotation information.
In S320, obtain the local region image files of all images included in the image samples.
Specifically, local region extraction is performed on all images included in the read image samples. For example, rectangular sliding windows of different sizes are used to crop multiple local image regions from image P (any original image in the samples), from left to right and top to bottom, generating multiple local region image files. More specifically, rectangular sliding windows of sizes 200x200 and 400x400 are used to crop, from left to right and top to bottom, multiple 200x200 and 400x400 local image regions from a 3264x2448 original image, thereby generating multiple local region image files of that original image.
In S330, determine the generic perceptual model, or in other words determine the generic perceptual model's parameter file, according to the local region image files and the item annotation information carried in the image samples.
Specifically, the local region image files of all original images generated in S320 are used as the input of the generic perceptual model, and the generic perceptual model's parameter file is determined by combining the item type calculation information output by the model with the item annotation information carried in the image samples.
It should be understood that the generic perceptual model may be regarded as the combination of a convolutional neural network (CNN) model and a multi-class logistic regression (Softmax) model. The steps of determining the generic perceptual model include, for example: using the local region image files as the CNN model's input, with the CNN model outputting the related matrix information; then using the CNN's output matrix information as the Softmax model's input, with the Softmax model outputting item type calculation results; and, based on those results and the item annotation information carried in the image samples (the degree of match or error rate between the two), calculating the respective parameters of the CNN and Softmax models, thereby determining the generic perceptual model.
It should also be understood that existing methods may be used to generate the generic perceptual model in S330; for brevity, they are not described here.
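To picture the CNN-plus-Softmax combination of S330, here is a toy PyTorch training step. It is a sketch only: the architecture, the category count of 100, and the data are all invented, and it trains with cross-entropy (which applies the softmax internally) rather than any specific procedure from the patent.

```python
import torch
import torch.nn as nn

# CNN feature extractor followed by a multi-class logistic regression
# (Softmax) head, trained against the item annotations.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())
softmax_head = nn.Linear(16, 100)        # e.g. 100 item categories overall
model = nn.Sequential(cnn, softmax_head)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()          # applies log-softmax internally

# one toy training step on fake local-region images and item labels
regions = torch.rand(8, 3, 64, 64)       # batch of local region images
item_labels = torch.randint(0, 100, (8,))
optimizer.zero_grad()
loss = loss_fn(model(regions), item_labels)
loss.backward()
optimizer.step()
print(float(loss))
```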
In S340, determine target image samples, where the images in the target image samples carry scenario annotation information and item annotation information.
Specifically, from the image samples read in S310, images carrying both scenario annotation information and item annotation information are selected and determined as the target image samples. For example, the annotation information of the first class of images is <scenario: restaurant; items: chair, table, wine bottle, plate, chopsticks>; of the second class, <scenario: plaza; items: flower bed, bench, tourist, bus, sedan, police officer, child, balloon, cafe>; of the third class, <scenario: plaza; items: flower bed, bench, tourist, bus, bus stop, sedan, balloon, cafe, pigeon>; of the fourth class, <scenario: ward; items: hospital bed, monitor, ventilator, call device, stand, waste bin>; and of the fifth class, <scenario: kitchen; items: kettle, faucet, microwave oven, salt jar, sugar jar, tomato juice, plate, gas stove>.
It should be understood that, in the target image samples determined in S340, each scenario corresponds to a set that includes multiple images; it should not be understood that one scenario corresponds to only one image. In other words, the first class of images mentioned above, whose scenario annotation information is <scenario: restaurant>, is a set of several images.
In S350, determine, based on the generic perceptual model, the perceptual model corresponding to each scenario according to the images in the target image samples and their item annotation information and scenario annotation information.
Specifically, the local region image files (already obtained in S320) of the third class of images, whose scenario annotation information is <scenario: plaza>, are used as the input of the generic perceptual model determined in S330, and that model outputs the calculated item type calculation information. Then, according to the item types indicated by the item annotation information carried by the third class of images, the error rate of the item type calculation information is determined while the complexity of the generic perceptual model is measured; considering both the error rate and the complexity, the generic model's parameters are adjusted and simplified, where the simplification includes clustering and merging computation nodes with similar parameters, pruning parameters that contribute nothing to the output, and so on. After the above adjustment and simplification of the generic model's parameters bring both the error rate of the item type calculation information and the model's complexity to satisfy preset conditions, the parameter-simplified perceptual model can become the perceptual model corresponding to the scenario <plaza>. It should be understood that, in the above process of determining the perceptual model for the scenario <plaza>, the generic perceptual model determined in S330 is backed up, so that the perceptual models corresponding to other scenarios can subsequently be determined; similarly, the perceptual model corresponding to each of the other scenarios can be obtained.
Because the number of object categories to be recognized in a specific scenario is relatively small compared with globally generic object recognition, the number of parameters of each scenario's perceptual model is greatly reduced relative to the generic perceptual model determined in S330, which can effectively reduce computational complexity while improving recognition accuracy.
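One simple instance of the parameter simplification in S350 is shown below: keeping only the output units for categories that actually occur in the scenario, so weights contributing nothing to that scenario's outputs are discarded. This is an illustrative assumption; the clustering of similar computation nodes that the text also mentions is not shown.

```python
import torch
import torch.nn as nn

def specialize_head(generic_head, scenario_class_ids):
    """Keep only the output units for the categories that occur in the
    scenario, discarding the rest of the output-layer parameters."""
    head = nn.Linear(generic_head.in_features, len(scenario_class_ids))
    with torch.no_grad():
        head.weight.copy_(generic_head.weight[scenario_class_ids])
        head.bias.copy_(generic_head.bias[scenario_class_ids])
    return head

generic_head = nn.Linear(16, 100)            # 100 global item categories
piazza_classes = [3, 17, 42, 56, 77, 81]     # categories seen in 'plaza'
piazza_head = specialize_head(generic_head, piazza_classes)
print(generic_head.weight.numel(), "->", piazza_head.weight.numel())
# 1600 -> 96 parameters in the output layer
```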
In S360, store the perceptual model corresponding to each scenario into the perceptual model library.
It should be understood that the above embodiments are merely intended to help better understand the present invention rather than to limit it.
It should be understood that, because the object types appearing in each scenario are limited (for example, an outdoor or urban scenario may require recognizing people, vehicles, buildings, or text, but essentially never various animals and plants), the number of object categories that frequently need to be recognized in each scenario is relatively small. Accordingly, each scenario's perceptual model has relatively few model parameters, so its computational complexity is greatly reduced, and it does not place high demands on storage space. Therefore, the data processing method proposed by the present invention can greatly simplify the complexity and computation of perceptual models while maintaining or even improving recognition accuracy, thereby resolving the conflict between computing capability and model complexity and effectively improving recognition capability.
It should also be understood that the above process of training the perceptual model for each scenario (equivalent to updating the perceptual model library) is not limited to being performed only once before the request message for the target perceptual model for processing target perceptual data is received from the terminal; the training process may also be performed periodically, so that the perceptual data requested by terminals in real time can be referenced to continually enrich the sample library of perceptual data and scenarios, thereby enriching and refining the scenario types and their corresponding perceptual models, while continually improving the recognition accuracy of each scenario's perceptual model.
Therefore, the data processing method of this embodiment of the present invention provides a terminal with the perceptual model corresponding to the scenario of the perceptual data to be recognized, so that the terminal processes the corresponding perceptual data according to that model; because the perceptual model corresponding to a specific scenario has relatively low complexity and relatively high accuracy, computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
It should also be understood that the "pre-stored perceptual model library" is mentioned in both the data processing method 100 and the data processing method 200 above. In general, the "pre-stored perceptual model library" stores the perceptual models corresponding to the various scenarios used to process perceptual data, but its meaning differs slightly between method 100 and method 200, as follows. The data processing method 100 provided in the embodiments of the present invention is performed, for example, by a terminal, so there the "pre-stored perceptual model library" refers to a storage area inside the terminal for caching the perceptual models, corresponding to various scenarios, obtained from the server; in other words, the perceptual models it stores are those for the scenarios of perceptual data that the terminal has processed or is about to process, and it may not include the perceptual models corresponding to all scenarios. The data processing method 200 provided in the embodiments of the present invention is generally performed by a server, so there the "pre-stored perceptual model library" refers to a storage area in the server for the perceptual models of the various scenarios, generated from training samples of different perceptual data and different scenarios; it can be understood that the server's "pre-stored perceptual model library" includes the perceptual models corresponding to all scenarios.
The data processing method according to the embodiments of the present invention has been described above in detail with reference to FIG. 1 to FIG. 3; the data processing devices according to the embodiments of the present invention are described below in detail with reference to FIG. 4 to FIG. 7.
FIG. 4 is a schematic block diagram of a data processing device 400 according to an embodiment of the present invention. As shown in FIG. 4, the device 400 includes:
an acquiring module 410, configured to acquire target perceptual data, where the target perceptual data is any one of the following: image data, video data, or sound data;
a first determining module 420, configured to determine the target scenario to which the target perceptual data acquired by the acquiring module belongs;
a second determining module 430, configured to determine the target perceptual model corresponding to the target scenario determined by the first determining module; and
a calculating module 440, configured to calculate, according to the target perceptual model determined by the second determining module, a recognition result of the target perceptual data acquired by the acquiring module.
Therefore, the data processing device 400 of this embodiment of the present invention determines the scenario to which the to-be-recognized perceptual data belongs and calculates the recognition result with the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.
Specifically, the calculating module 440 may be a perceptual computation processor whose function is to perform perceptual computation, for example, performing recognition on perceptual data according to algorithm models such as convolutional neural networks (CNN) and deep neural networks (DNN). The input perceptual data undergoes, in sequence, data block selection, cascaded convolution and sampling computation across the neural network layers, and classification matrix computation, finally producing the recognition result. The computation process includes, but is not limited to, executing entirely on a general-purpose CPU, partially on a GPU acceleration chip, or on a dedicated chip.
Optionally, as an embodiment, the first determining module 420 is specifically configured to determine the target scenario by performing scene analysis on the target perceptual data.
Specifically, the first determining module 420 may be a scene recognizer whose function is to recognize the scenario of the input perceptual data: it takes perceptual data as input and outputs a scene type or scene code, or another identifier that can represent the scene.
Optionally, as an embodiment, the target perceptual data is data generated at the location where the terminal is currently located;
where the first determining module is specifically configured to perform scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
Specifically, the first determining module takes perceptual data and positioning information as input and outputs a scene type or code.
Optionally, as an embodiment, the first determining module includes:
a first sending unit, configured to send, to a server, a first request for requesting the scenario to which the target perceptual data belongs; and
a first receiving unit, configured to receive the target scenario sent by the server according to the first request.
Optionally, as an embodiment, the second determining module is specifically configured to determine, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario.
Optionally, as shown in FIG. 5, as an embodiment, the device includes:
an acquiring module 410, configured to acquire target perceptual data, where the target perceptual data is any one of the following: image data, video data, or sound data;
a first determining module 420, configured to determine the target scenario to which the target perceptual data acquired by the acquiring module belongs;
a second determining module 430, configured to determine the target perceptual model corresponding to the target scenario determined by the first determining module;
a calculating module 440, configured to calculate, according to the target perceptual model determined by the second determining module, a recognition result of the target perceptual data acquired by the acquiring module; and
an updating module 450, configured to: before the acquiring module acquires the target perceptual data, update the perceptual model library according to the user's historical scenario sequence, where the updated perceptual model library includes the target perceptual model corresponding to the target scenario.
It should be understood that the updating module 450 may store the target perceptual model into the pre-stored perceptual model library after the second determining module 430 requests and obtains it from the server, so as to update the library; the updating module 450 may also, before the acquiring module 410 acquires the target perceptual data that needs recognition computation, request from the server in advance, by means of a prediction algorithm, the perceptual models that will soon be needed, that is, update the pre-stored perceptual model library in advance.
Optionally, as an embodiment, the second determining module includes:
a second sending unit, configured to: when it is determined that a pre-stored perceptual model library has no perceptual model corresponding to the target scenario, send to a server a second request for requesting the perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario; and
a second receiving unit, configured to receive the target perceptual model corresponding to the target scenario that is sent by the server according to the second request.
Optionally, in this embodiment of the present invention, the device 400 further includes: a caching module, configured to cache the target perceptual model received by the second receiving unit and its scenario identifier (scene type or scene number) into the pre-stored perceptual model library. Specifically, the caching module may be a high-speed access device such as memory.
Therefore, the data processing device 400 of this embodiment of the present invention determines the scenario to which the to-be-recognized perceptual data belongs and calculates the recognition result with the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.
It should be understood that the data processing device 400 according to this embodiment of the present invention may correspond to the terminal in the data processing method of the embodiments of the present invention, and the above and other operations and/or functions of the respective modules in the device 400 implement the corresponding processes of the methods in FIG. 1 to FIG. 3; for brevity, details are not described herein again.
The data processing device 400 according to an embodiment of the present invention has been described above in detail with reference to FIG. 4 and FIG. 5; another data processing device according to an embodiment of the present invention is described below in detail with reference to FIG. 6 and FIG. 7.
FIG. 6 is a schematic block diagram of a data processing device 500 according to an embodiment of the present invention. As shown in FIG. 6, the device 500 includes:
a receiving module 510, configured to receive a request message sent by a terminal for requesting the perceptual model corresponding to the scenario to which target perceptual data belongs, where the target perceptual data is any one of the following: image data, video data, or sound data;
a first determining module 520, configured to determine, according to the request message received by the receiving module, the target scenario to which the target perceptual data belongs;
a second determining module 530, configured to determine, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario determined by the first determining module, where each model in the perceptual model library corresponds to one scenario; and
a sending module 540, configured to send, according to the request message received by the receiving module, the target perceptual model determined by the second determining module to the terminal, so that the terminal calculates a recognition result of the target perceptual data according to the target perceptual model.
Therefore, the data processing device 500 of this embodiment of the present invention provides a terminal with the perceptual model corresponding to the scenario that the terminal requires, so that the terminal processes the corresponding perceptual data according to that model; because the perceptual model corresponding to a specific scenario has relatively low complexity and relatively high accuracy, computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
As shown in FIG. 7, optionally, as an embodiment, the device 500 includes:
a receiving module 510, configured to receive a request message sent by a terminal for requesting the perceptual model corresponding to the scenario to which target perceptual data belongs, where the target perceptual data is any one of the following: image data, video data, or sound data;
a first determining module 520, configured to determine, according to the request message received by the receiving module, the target scenario to which the target perceptual data belongs;
a second determining module 530, configured to determine, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario determined by the first determining module, where each model in the perceptual model library corresponds to one scenario;
a sending module 540, configured to send, according to the request message received by the receiving module, the target perceptual model determined by the second determining module to the terminal, so that the terminal calculates a recognition result of the target perceptual data according to the target perceptual model;
an acquiring module 550, configured to acquire perceptual data samples before the receiving module receives the request message, where at least a part of the perceptual data samples carries scenario annotation information and item annotation information;
a training module 560, configured to train, according to the perceptual data samples, the perceptual models corresponding to different scenarios; and
a storage module 570, configured to store the perceptual models corresponding to the different scenarios, obtained through training by the training module, into the perceptual model library, where the perceptual model library includes the target perceptual model.
Specifically, the training module 560 may be referred to as a model training server; its function is to read the training sample database and, according to the classification description of each scenario in the scenario knowledge base, train the perceptual model parameters required for the various scenarios. The training module 560 takes training sample data and scenario classification description files as input and outputs a perceptual model parameter file for each scenario.
The scenario knowledge base is a storage space used to manage and save the classification descriptions corresponding to various scenarios. The classification description corresponding to a scenario includes the categories that may appear in that scenario, such as items, persons, actions, events, and text, and may also include hierarchical relationships between all categories, such as animal-dog-golden retriever, car-sedan-BMW-BMW 3 Series, or party-birthday party. In addition, for a specific scenario with a known spatial structure, the scenario knowledge base may also include spatial structure information and the scenario number corresponding to each spatial region.
Specifically, the storage module 570 is configured to save the model parameter file of each scenario generated by the training module 560 (the model training server), for example, including a scenario identifier (type or number) and the corresponding model parameter file. The storage module 570 may be referred to as a model parameter library.
Optionally, as an embodiment, the first determining module is specifically configured to determine the target scenario to which the target perceptual data belongs by performing scene analysis on the target perceptual data included in the request message.
Optionally, as an embodiment, the target perceptual data is data generated at the location where the terminal is currently located;
where the first determining module is specifically configured to perform scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
Optionally, as an embodiment, the first determining module is specifically configured to determine the target scenario according to an identifier that is included in the request message and used to indicate the target scenario.
Therefore, the data processing device 500 of this embodiment of the present invention provides a terminal with the perceptual model corresponding to the scenario that the terminal requires, so that the terminal processes the corresponding perceptual data according to that model; because the perceptual model corresponding to a specific scenario has relatively low complexity and relatively high accuracy, computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
It should be understood that the data processing device 500 according to this embodiment of the present invention may correspond to the server in the data processing method of the embodiments of the present invention, and the above and other operations and/or functions of the respective modules in the device 500 implement the corresponding processes of the methods in FIG. 1 to FIG. 3; for brevity, details are not described herein again.
As shown in FIG. 8, an embodiment of the present invention further provides a data processing device 600. The device 600 includes a processor 610, a memory 620, a bus system 630, a receiver 640, and a transmitter 650. The processor 610, the memory 620, the receiver 640, and the transmitter 650 are connected through the bus system 630; the memory 620 is configured to store instructions, and the processor 610 is configured to execute the instructions stored in the memory 620, to control the receiver 640 to receive signals and control the transmitter 650 to send signals. The processor 610 is configured to: acquire target perceptual data, where the target perceptual data is any one of the following: image data, video data, or sound data; determine the target scenario to which the target perceptual data belongs; determine the target perceptual model corresponding to the target scenario; and calculate a recognition result of the target perceptual data according to the target perceptual model.
Therefore, the data processing device 600 of this embodiment of the present invention determines the scenario to which the to-be-recognized perceptual data belongs and calculates the recognition result with the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.
Optionally, as an embodiment, the processor 610 is specifically configured to determine the target scenario by performing scene analysis on the target perceptual data.
Optionally, as an embodiment, the target perceptual data is data generated at the location where the terminal is currently located;
the processor 610 is specifically configured to perform scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
Optionally, as an embodiment, the transmitter 650 is configured to send, to a server, a first request for requesting the scenario to which the target perceptual data belongs; and the receiver 640 is configured to receive the target scenario sent by the server according to the first request.
Optionally, as an embodiment, the processor 610 is specifically configured to determine, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario.
Optionally, as an embodiment, the processor 610 is specifically configured to: before acquiring the target perceptual data, update the perceptual model library according to the user's historical scenario sequence, where the updated perceptual model library includes the target perceptual model corresponding to the target scenario.
Optionally, as an embodiment, the transmitter 650 is configured to: when it is determined that a pre-stored perceptual model library has no perceptual model corresponding to the target scenario, send, to a server, a second request for requesting the perceptual model corresponding to the target scenario, where each perceptual model in the perceptual model library corresponds to one scenario; and the receiver 640 is configured to receive the target perceptual model, corresponding to the target scenario, that is sent by the server according to the second request.
It should be understood that, in this embodiment of the present invention, the processor 610 may be a central processing unit ("CPU"), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 620 may include a read-only memory and a random access memory, and provides instructions and data to the processor 610. A part of the memory 620 may further include a non-volatile random access memory. For example, the memory 620 may also store information about the device type.
In addition to a data bus, the bus system 630 may include a power bus, a control bus, a status signal bus, and the like. However, for clarity of description, the various buses are all labeled as the bus system 630 in the figure.
In an implementation process, each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 610 or by instructions in the form of software. The steps of the method disclosed with reference to the embodiments of the present invention may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 620; the processor 610 reads the information in the memory 620 and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here.
Therefore, the data processing device 600 of this embodiment of the present invention determines the scenario to which the to-be-recognized perceptual data belongs and calculates the recognition result with the perceptual model corresponding to that scenario; compared with the prior art, this can reduce computational complexity and thereby improve the efficiency of data processing.
It should be understood that the data processing device 600 according to this embodiment of the present invention may correspond to the terminal in the data processing method of the embodiments of the present invention, and may further correspond to the data processing device 400 of the embodiments of the present invention; the above and other operations and/or functions of the respective modules in the device 600 implement the corresponding processes of the methods in FIG. 1 to FIG. 3, and for brevity are not described herein again.
As shown in FIG. 9, an embodiment of the present invention further provides a data processing device 700. The device 700 includes a processor 710, a memory 720, a bus system 730, a receiver 740, and a transmitter 750. The processor 710, the memory 720, the receiver 740, and the transmitter 750 are connected through the bus system 730; the memory 720 is configured to store instructions, and the processor 710 is configured to execute the instructions stored in the memory 720, to control the receiver 740 to receive signals and control the transmitter 750 to send signals.
The receiver 740 is configured to receive a request message sent by a terminal for requesting the perceptual model corresponding to the scenario to which target perceptual data belongs, where the target perceptual data is any one of the following: image data, video data, or sound data; the processor 710 is configured to determine, according to the request message, the target scenario to which the target perceptual data belongs, and determine, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, where each model in the perceptual model library corresponds to one scenario; the transmitter 750 is configured to send the target perceptual model to the terminal according to the request message, so that the terminal calculates a recognition result of the target perceptual data according to the target perceptual model.
Therefore, the data processing device 700 of this embodiment of the present invention provides a terminal with the perceptual model corresponding to the scenario that the terminal requires, so that the terminal processes the corresponding perceptual data according to that model; because the perceptual model corresponding to a specific scenario has relatively low complexity and relatively high accuracy, computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
Optionally, as an embodiment, the processor 710 is specifically configured to: before the receiver 740 receives the request message, acquire perceptual data samples, where at least a part of the perceptual data samples carries scenario annotation information and item annotation information; train, according to the perceptual data samples, the perceptual models corresponding to different scenarios; and store the perceptual models corresponding to the different scenarios into the perceptual model library, where the perceptual model library includes the target perceptual model.
Optionally, as an embodiment, the processor 710 is specifically configured to determine the target scenario to which the target perceptual data belongs by performing scene analysis on the target perceptual data included in the request message.
Optionally, as an embodiment, the target perceptual data is data generated at the location where the terminal is currently located;
the processor 710 is specifically configured to perform scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
Optionally, as an embodiment, the processor 710 is specifically configured to determine the target scenario according to an identifier that is included in the request message and used to indicate the target scenario.
It should be understood that, in this embodiment of the present invention, the processor 710 may be a central processing unit ("CPU"), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 720 may include a read-only memory and a random access memory, and provides instructions and data to the processor 710. A part of the memory 720 may further include a non-volatile random access memory. For example, the memory 720 may also store information about the device type.
In addition to a data bus, the bus system 730 may include a power bus, a control bus, a status signal bus, and the like. However, for clarity of description, the various buses are all labeled as the bus system 730 in the figure.
In an implementation process, each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 710 or by instructions in the form of software. The steps of the method disclosed with reference to the embodiments of the present invention may be directly embodied as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 720; the processor 710 reads the information in the memory 720 and completes the steps of the foregoing method in combination with its hardware. To avoid repetition, details are not described here.
Therefore, the data processing device 700 of this embodiment of the present invention provides a terminal with the perceptual model corresponding to the scenario that the terminal requires, so that the terminal processes the corresponding perceptual data according to that model; because the perceptual model corresponding to a specific scenario has relatively low complexity and relatively high accuracy, computational complexity can be effectively reduced while the speed and accuracy of data processing are improved.
It should be understood that the data processing device 700 according to this embodiment of the present invention may correspond to the server in the data processing method of the embodiments of the present invention, and may further correspond to the data processing device 500 of the embodiments of the present invention; the above and other operations and/or functions of the respective modules in the device 700 implement the corresponding processes of the methods in FIG. 1 to FIG. 3, and for brevity are not described herein again.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present invention.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the described apparatus embodiments are merely illustrative; for example, the division into units is merely a logical function division, and there may be another division manner in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed onto multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or a part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (24)

  1. A data processing method, comprising:
    acquiring target perceptual data, wherein the target perceptual data is any one of the following: image data, video data, or sound data;
    determining a target scenario to which the target perceptual data belongs;
    determining a target perceptual model corresponding to the target scenario; and
    calculating a recognition result of the target perceptual data according to the target perceptual model.
  2. The method according to claim 1, wherein the determining a target scenario to which the target perceptual data belongs comprises:
    determining the target scenario by performing scene analysis on the target perceptual data.
  3. The method according to claim 2, wherein the target perceptual data is data generated at a location where a terminal is currently located;
    wherein the determining the target scenario by performing scene analysis on the target perceptual data comprises:
    performing scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
  4. The method according to claim 1, wherein the determining a target scenario to which the target perceptual data belongs comprises:
    sending, to a server, a first request for requesting the scenario to which the target perceptual data belongs; and
    receiving the target scenario sent by the server according to the first request.
  5. The method according to any one of claims 1 to 4, wherein the determining a target perceptual model corresponding to the target scenario comprises:
    determining, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, wherein each perceptual model in the perceptual model library corresponds to one scenario.
  6. The method according to claim 5, wherein before the acquiring target perceptual data, the method further comprises:
    updating the perceptual model library according to a historical scenario sequence of a user, wherein the updated perceptual model library comprises the target perceptual model corresponding to the target scenario.
  7. The method according to any one of claims 1 to 4, wherein the determining a target perceptual model corresponding to the target scenario comprises:
    when it is determined that a pre-stored perceptual model library has no perceptual model corresponding to the target scenario, sending, to a server, a second request for requesting the perceptual model corresponding to the target scenario, wherein each perceptual model in the perceptual model library corresponds to one scenario; and
    receiving the target perceptual model, corresponding to the target scenario, that is sent by the server according to the second request.
  8. A data processing method, comprising:
    receiving a request message that is sent by a terminal and that requests a perceptual model corresponding to a scenario to which target perceptual data belongs, wherein the target perceptual data is any one of the following: image data, video data, or sound data;
    determining, according to the request message, a target scenario to which the target perceptual data belongs;
    determining, from a pre-stored perceptual model library, a target perceptual model corresponding to the target scenario, wherein each model in the perceptual model library corresponds to one scenario; and
    sending the target perceptual model to the terminal according to the request message, so that the terminal calculates a recognition result of the target perceptual data according to the target perceptual model.
  9. The method according to claim 8, wherein before the request message is received, the method further comprises:
    acquiring perceptual data samples, wherein at least a part of the perceptual data samples carries scenario annotation information and item annotation information;
    training, according to the perceptual data samples, perceptual models corresponding to different scenarios; and
    storing the perceptual models corresponding to the different scenarios into the perceptual model library, wherein the perceptual model library comprises the target perceptual model.
  10. The method according to claim 8 or 9, wherein the determining, according to the request message, a target scenario to which the target perceptual data belongs comprises:
    determining the target scenario to which the target perceptual data belongs by performing scene analysis on the target perceptual data included in the request message.
  11. The method according to claim 10, wherein the target perceptual data is data generated at a location where the terminal is currently located;
    wherein the determining the target scenario to which the target perceptual data belongs comprises:
    performing scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
  12. The method according to claim 8 or 9, wherein the determining, according to the request message, a target scenario to which the target perceptual data belongs comprises:
    determining the target scenario according to an identifier that is included in the request message and used to indicate the target scenario.
  13. A data processing device, comprising:
    an acquiring module, configured to acquire target perceptual data, wherein the target perceptual data is any one of the following: image data, video data, or sound data;
    a first determining module, configured to determine a target scenario to which the target perceptual data acquired by the acquiring module belongs;
    a second determining module, configured to determine a target perceptual model corresponding to the target scenario determined by the first determining module; and
    a calculating module, configured to calculate, according to the target perceptual model determined by the second determining module, a recognition result of the target perceptual data acquired by the acquiring module.
  14. The device according to claim 13, wherein the first determining module is specifically configured to determine the target scenario by performing scene analysis on the target perceptual data.
  15. The device according to claim 14, wherein the target perceptual data is data generated at a location where a terminal is currently located;
    wherein the first determining module is specifically configured to perform scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
  16. The device according to claim 13, wherein the first determining module comprises:
    a first sending unit, configured to send, to a server, a first request for requesting the scenario to which the target perceptual data belongs; and
    a first receiving unit, configured to receive the target scenario sent by the server according to the first request.
  17. The device according to any one of claims 13 to 16, wherein the second determining module is specifically configured to determine, from a pre-stored perceptual model library, the target perceptual model corresponding to the target scenario, wherein each perceptual model in the perceptual model library corresponds to one scenario.
  18. The device according to claim 17, wherein the device further comprises:
    an updating module, configured to: before the acquiring module acquires the target perceptual data, update the perceptual model library according to a historical scenario sequence of a user, wherein the updated perceptual model library comprises the target perceptual model corresponding to the target scenario.
  19. The device according to any one of claims 13 to 16, wherein the second determining module comprises:
    a second sending unit, configured to: when it is determined that a pre-stored perceptual model library has no perceptual model corresponding to the target scenario, send, to a server, a second request for requesting the perceptual model corresponding to the target scenario, wherein each perceptual model in the perceptual model library corresponds to one scenario; and
    a second receiving unit, configured to receive the target perceptual model, corresponding to the target scenario, that is sent by the server according to the second request.
  20. A data processing device, comprising:
    a receiving module, configured to receive a request message that is sent by a terminal and that requests a perceptual model corresponding to a scenario to which target perceptual data belongs, wherein the target perceptual data is any one of the following: image data, video data, or sound data;
    a first determining module, configured to determine, according to the request message received by the receiving module, a target scenario to which the target perceptual data belongs;
    a second determining module, configured to determine, from a pre-stored perceptual model library, a target perceptual model corresponding to the target scenario determined by the first determining module, wherein each model in the perceptual model library corresponds to one scenario; and
    a sending module, configured to send, according to the request message received by the receiving module, the target perceptual model determined by the second determining module to the terminal, so that the terminal calculates a recognition result of the target perceptual data according to the target perceptual model.
  21. The device according to claim 20, wherein the device further comprises:
    an acquiring module, configured to acquire perceptual data samples before the receiving module receives the request message, wherein at least a part of the perceptual data samples carries scenario annotation information and item annotation information;
    a training module, configured to train, according to the perceptual data samples, perceptual models corresponding to different scenarios; and
    a storage module, configured to store the perceptual models corresponding to the different scenarios, obtained through training by the training module, into the perceptual model library, wherein the perceptual model library comprises the target perceptual model.
  22. The device according to claim 20 or 21, wherein the first determining module is specifically configured to determine the target scenario to which the target perceptual data belongs by performing scene analysis on the target perceptual data included in the request message.
  23. The device according to claim 22, wherein the target perceptual data is data generated at a location where the terminal is currently located;
    wherein the first determining module is specifically configured to perform scene analysis on the target perceptual data in combination with positioning information of the terminal's current location, to determine the target scenario.
  24. The device according to claim 20 or 21, wherein the first determining module is specifically configured to determine the target scenario according to an identifier that is included in the request message and used to indicate the target scenario.
PCT/CN2015/088832 2014-09-16 2015-09-02 数据处理的方法和设备 WO2016041442A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP15842270.9A EP3188081B1 (en) 2014-09-16 2015-09-02 Data processing method and device
US15/460,339 US10452962B2 (en) 2014-09-16 2017-03-16 Recognition method and device for a target perception data
US16/586,209 US11093806B2 (en) 2014-09-16 2019-09-27 Recognition method and device for a target perception data
US17/403,315 US20220036142A1 (en) 2014-09-16 2021-08-16 Recognition method and device for a target perception data

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410471480.X 2014-09-16
CN201410471480.XA CN105488044A (zh) 2014-09-16 2014-09-16 数据处理的方法和设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/460,339 Continuation US10452962B2 (en) 2014-09-16 2017-03-16 Recognition method and device for a target perception data

Publications (1)

Publication Number Publication Date
WO2016041442A1 true WO2016041442A1 (zh) 2016-03-24

Family

ID=55532544

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/088832 WO2016041442A1 (zh) 2014-09-16 2015-09-02 数据处理的方法和设备

Country Status (4)

Country Link
US (3) US10452962B2 (zh)
EP (1) EP3188081B1 (zh)
CN (2) CN105488044A (zh)
WO (1) WO2016041442A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147711A (zh) * 2019-02-27 2019-08-20 腾讯科技(深圳)有限公司 视频场景识别方法、装置、存储介质和电子装置
EP3557846A4 (en) * 2016-12-26 2019-12-25 Huawei Technologies Co., Ltd. DATA PROCESSING METHOD, END DEVICE, CLOUD DEVICE AND END CLOUD COLLABORATION SYSTEM
CN112131916A (zh) * 2019-06-25 2020-12-25 杭州海康威视数字技术股份有限公司 目标抓拍方法、装置、电子设备及存储介质
CN113408319A (zh) * 2020-03-16 2021-09-17 广州汽车集团股份有限公司 一种城市道路异常感知处理方法、装置、系统及存储介质
CN113625599A (zh) * 2020-05-08 2021-11-09 未来穿戴技术有限公司 按摩仪控制方法、装置、系统、计算机设备和存储介质
CN112131916B (zh) * 2019-06-25 2024-06-04 杭州海康威视数字技术股份有限公司 目标抓拍方法、装置、电子设备及存储介质

Families Citing this family (44)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109478311A (zh) * 2016-07-30 2019-03-15 华为技术有限公司 一种图像识别方法及终端
CN106653057A (zh) * 2016-09-30 2017-05-10 北京智能管家科技有限公司 一种数据处理方法及装置
EP3340609B1 (en) 2016-12-22 2024-03-27 Samsung Electronics Co., Ltd. Apparatus and method for processing image
KR102407815B1 (ko) * 2016-12-22 2022-06-13 삼성전자주식회사 영상 처리 장치 및 방법
CN106997243B (zh) * 2017-03-28 2019-11-08 北京光年无限科技有限公司 基于智能机器人的演讲场景监控方法及装置
US11176632B2 (en) 2017-04-07 2021-11-16 Intel Corporation Advanced artificial intelligence agent for modeling physical interactions
WO2018201349A1 (zh) * 2017-05-03 2018-11-08 华为技术有限公司 一种应急车辆的识别方法及装置
CN107316635B (zh) * 2017-05-19 2020-09-11 科大讯飞股份有限公司 语音识别方法及装置、存储介质、电子设备
CN107329973A (zh) * 2017-05-24 2017-11-07 努比亚技术有限公司 一种实现通知处理的方法及设备
CN107221332A (zh) * 2017-06-28 2017-09-29 上海与德通讯技术有限公司 机器人的交互方法及系统
TWI636404B (zh) * 2017-07-31 2018-09-21 財團法人工業技術研究院 深度神經網路、使用深度神經網路的方法與電腦可讀媒體
CN107316035A (zh) * 2017-08-07 2017-11-03 北京中星微电子有限公司 基于深度学习神经网络的对象识别方法及装置
CN107832769A (zh) * 2017-11-09 2018-03-23 苏州铭冠软件科技有限公司 物体位于环境中的视觉识别方法
CN107832795B (zh) * 2017-11-14 2021-07-27 深圳码隆科技有限公司 物品识别方法、系统以及电子设备
CN107885855B (zh) * 2017-11-15 2021-07-13 福州掌易通信息技术有限公司 基于智能终端的动态漫画生成方法及系统
CN109934077B (zh) * 2017-12-19 2020-12-04 杭州海康威视数字技术股份有限公司 一种图像识别方法和电子设备
CN108307067A (zh) * 2018-01-25 2018-07-20 维沃移动通信有限公司 一种定时提醒方法及移动终端
CN108198559A (zh) * 2018-01-26 2018-06-22 上海萌王智能科技有限公司 一种可学习动作的语音控制机器人系统
CN108573279A (zh) * 2018-03-19 2018-09-25 精锐视觉智能科技(深圳)有限公司 图像标注方法及终端设备
CN108830235B (zh) * 2018-06-21 2020-11-24 北京字节跳动网络技术有限公司 用于生成信息的方法和装置
CN109117817B (zh) * 2018-08-28 2022-06-14 摩佰尔(天津)大数据科技有限公司 人脸识别的方法及装置
CN109189986B (zh) * 2018-08-29 2020-07-28 百度在线网络技术(北京)有限公司 信息推荐方法、装置、电子设备和可读存储介质
CN109461495B (zh) * 2018-11-01 2023-04-14 腾讯科技(深圳)有限公司 一种医学图像的识别方法、模型训练的方法及服务器
CN111209904A (zh) * 2018-11-21 2020-05-29 华为技术有限公司 一种业务处理的方法以及相关装置
CN109598885B (zh) * 2018-12-21 2021-06-11 广东中安金狮科创有限公司 监控系统及其报警方法
CN109815844A (zh) * 2018-12-29 2019-05-28 西安天和防务技术股份有限公司 目标检测方法及装置、电子设备和存储介质
CN111598976B (zh) * 2019-02-01 2023-08-22 华为技术有限公司 场景识别方法及装置、终端、存储介质
US11182890B2 (en) * 2019-03-01 2021-11-23 Husky Oil Operations Limited Efficient system and method of determining a permeability ratio curve
CN111796663B (zh) * 2019-04-09 2022-08-16 Oppo广东移动通信有限公司 场景识别模型更新方法、装置、存储介质及电子设备
CN111797865A (zh) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 数据处理方法、装置、存储介质和电子设备
KR20200132295A (ko) * 2019-05-16 2020-11-25 한국전자통신연구원 패킷 캡쳐 기반 실내 위치 추정 장치 및 방법
CN110517665B (zh) * 2019-08-29 2021-09-03 中国银行股份有限公司 获取测试样本的方法及装置
CN112818689B (zh) * 2019-11-15 2023-07-21 马上消费金融股份有限公司 一种实体识别方法、模型训练方法及装置
CN110991381B (zh) * 2019-12-12 2023-04-25 山东大学 一种基于行为和语音智能识别的实时课堂学生状态分析与指示提醒系统和方法
CN111639525A (zh) * 2020-04-22 2020-09-08 上海擎感智能科技有限公司 一种感知算法的训练方法、装置及计算机存储介质
CN112164401B (zh) * 2020-09-18 2022-03-18 广州小鹏汽车科技有限公司 语音交互方法、服务器和计算机可读存储介质
CN112277951B (zh) * 2020-10-29 2021-08-20 北京罗克维尔斯科技有限公司 车辆感知模型生成方法、车辆自动驾驶控制方法及装置
CN112580470A (zh) * 2020-12-11 2021-03-30 北京软通智慧城市科技有限公司 城市视觉感知方法、装置、电子设备和存储介质
CN114970654B (zh) * 2021-05-21 2023-04-18 华为技术有限公司 数据处理方法、装置和终端
CN113505743B (zh) * 2021-07-27 2023-07-25 中国平安人寿保险股份有限公司 关键视频数据提取方法、系统、计算机设备及存储介质
CN113657228A (zh) * 2021-08-06 2021-11-16 北京百度网讯科技有限公司 数据处理的方法、设备和存储介质
US11880428B2 (en) * 2021-11-12 2024-01-23 Toyota Motor Engineering & Manufacturing North America, Inc. Methods and systems for updating perception models based on geolocation features
CN114202709B (zh) * 2021-12-15 2023-10-10 中国电信股份有限公司 对象识别方法、装置及存储介质
CN116561604B (zh) * 2023-07-10 2023-09-29 北京千方科技股份有限公司 货车的识别方法、装置、电子设备及介质

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880879A (zh) * 2012-08-16 2013-01-16 北京理工大学 基于分布式和svm分类器的室外海量物体识别方法和系统
CN103942573A (zh) * 2014-02-18 2014-07-23 西安电子科技大学 一种基于空间关系的潜在狄利克雷模型自然场景图像分类方法

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011140701A1 (en) * 2010-05-11 2011-11-17 Nokia Corporation Method and apparatus for determining user context
US8645554B2 (en) 2010-05-27 2014-02-04 Nokia Corporation Method and apparatus for identifying network functions based on user data
CN102074231A (zh) * 2010-12-30 2011-05-25 万音达有限公司 语音识别方法和语音识别系统
US20120232993A1 (en) * 2011-03-08 2012-09-13 Bank Of America Corporation Real-time video image analysis for providing deepening customer value
CN104094287A (zh) * 2011-12-21 2014-10-08 诺基亚公司 用于情境识别的方法、装置以及计算机软件
US8687104B2 (en) 2012-03-27 2014-04-01 Amazon Technologies, Inc. User-guided object identification
US8694522B1 (en) * 2012-03-28 2014-04-08 Amazon Technologies, Inc. Context dependent recognition
CN103593643B (zh) 2012-08-16 2019-02-12 百度在线网络技术(北京)有限公司 一种图像识别的方法及系统
CN103035135B (zh) * 2012-11-27 2014-12-10 北京航空航天大学 基于增强现实技术的儿童认知系统及认知方法
US20140207873A1 (en) 2013-01-18 2014-07-24 Ford Global Technologies, Llc Method and Apparatus for Crowd-Sourced Information Presentation
US9589205B2 (en) * 2014-05-15 2017-03-07 Fuji Xerox Co., Ltd. Systems and methods for identifying a user's demographic characteristics based on the user's social media photographs
US9697235B2 (en) * 2014-07-16 2017-07-04 Verizon Patent And Licensing Inc. On device image keyword identification and content overlay
US20160239706A1 (en) * 2015-02-13 2016-08-18 Qualcomm Incorporated Convolution matrix multiply with callback for deep tiling for deep convolutional neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880879A (zh) * 2012-08-16 2013-01-16 北京理工大学 基于分布式和svm分类器的室外海量物体识别方法和系统
CN103942573A (zh) * 2014-02-18 2014-07-23 西安电子科技大学 一种基于空间关系的潜在狄利克雷模型自然场景图像分类方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3188081A4 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3557846A4 (en) * 2016-12-26 2019-12-25 Huawei Technologies Co., Ltd. DATA PROCESSING METHOD, END DEVICE, CLOUD DEVICE AND END CLOUD COLLABORATION SYSTEM
US11861499B2 (en) 2016-12-26 2024-01-02 Huawei Technologies Co., Ltd. Method, terminal-side device, and cloud-side device for data processing and terminal-cloud collaboration system
EP4280112A3 (en) * 2016-12-26 2024-02-14 Huawei Technologies Co., Ltd. Data processing method and end-cloud collaboration system
CN110147711A (zh) * 2019-02-27 2019-08-20 腾讯科技(深圳)有限公司 视频场景识别方法、装置、存储介质和电子装置
CN110147711B (zh) * 2019-02-27 2023-11-14 腾讯科技(深圳)有限公司 视频场景识别方法、装置、存储介质和电子装置
CN112131916A (zh) * 2019-06-25 2020-12-25 杭州海康威视数字技术股份有限公司 目标抓拍方法、装置、电子设备及存储介质
CN112131916B (zh) * 2019-06-25 2024-06-04 杭州海康威视数字技术股份有限公司 目标抓拍方法、装置、电子设备及存储介质
CN113408319A (zh) * 2020-03-16 2021-09-17 广州汽车集团股份有限公司 一种城市道路异常感知处理方法、装置、系统及存储介质
CN113625599A (zh) * 2020-05-08 2021-11-09 未来穿戴技术有限公司 按摩仪控制方法、装置、系统、计算机设备和存储介质
CN113625599B (zh) * 2020-05-08 2023-09-22 未来穿戴技术有限公司 按摩仪控制方法、装置、系统、计算机设备和存储介质

Also Published As

Publication number Publication date
EP3188081A4 (en) 2017-09-20
US20200097779A1 (en) 2020-03-26
US11093806B2 (en) 2021-08-17
US20220036142A1 (en) 2022-02-03
EP3188081A1 (en) 2017-07-05
CN115690558A (zh) 2023-02-03
EP3188081B1 (en) 2021-05-05
US10452962B2 (en) 2019-10-22
US20170185873A1 (en) 2017-06-29
CN105488044A (zh) 2016-04-13

Similar Documents

Publication Publication Date Title
WO2016041442A1 (zh) 数据处理的方法和设备
US11822600B2 (en) Content tagging
WO2020187153A1 (zh) 目标检测方法、模型训练方法、装置、设备及存储介质
US20210021551A1 (en) Content navigation with automated curation
TWI546519B (zh) 興趣點的展現方法及裝置
KR102567285B1 (ko) 모바일 비디오 서치 기법
US10291767B2 (en) Information presentation method and device
US8914393B2 (en) Search results using density-based map tiles
US9905051B2 (en) Context-aware tagging for augmented reality environments
KR20210137236A (ko) 머신 러닝 분류들에 기초한 디바이스 위치
US11809450B2 (en) Selectively identifying and recommending digital content items for synchronization
US20120011142A1 (en) Feedback to improve object recognition
US20090083275A1 (en) Method, Apparatus and Computer Program Product for Performing a Visual Search Using Grid-Based Feature Organization
US20190236450A1 (en) Multimodal machine learning selector
CN108764051B (zh) 图像处理方法、装置及移动终端
WO2020047261A1 (en) Active image depth prediction
US11763130B2 (en) Compact neural networks using condensed filters
CN104520848A (zh) 按照出席者搜索事件
KR20230013280A (ko) 클라이언트 애플리케이션 콘텐츠 분류 및 발견
US20200342366A1 (en) Recording Medium, Information Processing System, and Information Processing Method
WO2020115944A1 (ja) マップデータ生成装置
US20230239721A1 (en) Information processing apparatus, information processing method, and non-transitory storage medium
CN108121735B (zh) 语音搜索方法和装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15842270

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2015842270

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2015842270

Country of ref document: EP