WO2022179271A1 - Method, apparatus, and storage medium for feeding back search results - Google Patents

Method, apparatus, and storage medium for feeding back search results

Info

Publication number
WO2022179271A1
WO2022179271A1 PCT/CN2021/139762 CN2021139762W WO2022179271A1 WO 2022179271 A1 WO2022179271 A1 WO 2022179271A1 CN 2021139762 W CN2021139762 W CN 2021139762W WO 2022179271 A1 WO2022179271 A1 WO 2022179271A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
objects
search
score
scene
Prior art date
Application number
PCT/CN2021/139762
Other languages
English (en)
French (fr)
Inventor
郝磊
王育
王敏
许松岑
钟伟才
赵振华
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21927695.3A (EP4283491A1)
Priority to US18/548,039 (US20240126808A1)
Publication of WO2022179271A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G06F16/535 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5854 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects

Definitions

  • the present application relates to the technical field of terminals, and in particular, to a method, an apparatus and a storage medium for feeding back search results.
  • Visual search takes an image as the search input and returns search results such as related images and text for the objects in that image. More and more users rely on visual search on electronic devices to look up objects of interest.
  • an electronic device acquires an image, and then performs image detection processing on the image to determine one or more objects in the image.
  • the electronic device searches the determined object to obtain search results.
  • the electronic device feeds back the search result of each object to the user according to the position sequence of each object in the image.
  • however, the search results fed back in this way cover every object in the image, so the feedback is poorly targeted and the search results finally fed back to the user have low accuracy.
  • the present application provides a method, device and storage medium for feeding back search results, which solve the problem of low accuracy of search results fed back to users in the prior art.
  • a method for feeding back search results comprising:
  • acquiring a first image, where the first image includes M objects, and M is an integer greater than or equal to 2;
  • for N objects among the M objects, when N is greater than or equal to 2, determining the arrangement order of the N objects, where N is a positive integer less than or equal to M;
  • the arrangement order of any one of the N objects is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score, where the scene intent weight indicates the probability that the object is searched in the scene corresponding to the first image, the confidence score is the similarity between the object and images in an image library, and the object relationship score indicates the importance of the object in the first image;
  • feeding back the search results of some or all of the N objects according to the arrangement order of the N objects.
  • the search results of objects that search well and that the user may be interested in are fed back to the user first, so that the feedback is targeted and the accuracy of the search results finally fed back is improved.
  • any one or more of the scene intent weight, confidence score, and object relationship score of any one of the N objects are determined in the following manner:
  • a confidence score of the first object is determined based on the object area of the first object in the first image and the object category of the first object; and/or an object relationship score of the first object is determined based on the object area of the first object in the first image; and/or a scene intent weight of the first object is determined based on the object category of the first object.
  • the present application performs image detection processing on the first image through a target detection model to obtain the object area and/or object category of the first object in the first image, and determines at least one of the scene intent weight, confidence score, and object relationship score of the first object from that object area and/or object category. At least one of these values is then used, for each object, to filter and sort the objects in the first image, so that content likely to interest the user is fed back accurately.
  • the determining the confidence score of the first object based on the object area of the first object in the first image and the object category of the first object includes:
  • the largest similarity among the determined multiple similarities is used as the confidence score of the first object.
  • search objects are selected from the first image according to the confidence score, so that intent understanding is tied to the search results and the accuracy of the intent search can be improved.
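A minimal sketch of the "largest similarity" rule described above. The text only states that several similarities are computed and the largest one becomes the confidence score; the use of feature vectors, cosine similarity, and the names `cosine_similarity` and `confidence_score` are illustrative assumptions, not details taken from the application.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def confidence_score(object_feature, library_features):
    """Largest similarity between the object and the library images.

    `library_features` stands in for the images of the matching object
    category in the image library (hypothetical representation).
    """
    similarities = [cosine_similarity(object_feature, ref) for ref in library_features]
    return max(similarities, default=0.0)


# Hypothetical two-dimensional features for three library images.
library = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score = confidence_score([2.0, 0.0], library)
```

An object whose best match in the library is weak ends up with a low confidence score and can be dropped before any search is issued for it.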
  • the first image includes a plurality of regions, each region has a preset score indicating the positional importance of that region, and the preset scores of at least two of the regions are different;
  • the determining the object relationship score of the first object based on the object area of the first object in the first image includes:
  • An object relationship score of the first object is determined based on the positional importance value of the first object.
  • before the determining the object relationship score of the first object based on the position importance value of the first object, the method further includes:
  • when a reference object is included in the first image, the distance between the first object and the reference object in the first image is acquired, based on the object area of the first object in the first image and the object area of the reference object in the first image, and is used as the affiliation degree value of the first object;
  • the determining the object relationship score of the first object based on the position importance value of the first object includes:
  • An object relationship score of the first object is determined based on the positional importance degree value and the affiliation degree value of the first object.
  • the object relationship score of each object ties the search intent for that object to the object's attributes, so that search results for objects the user may be interested in are recommended first, thereby improving the user experience.
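The following sketch illustrates one way the object relationship score could be assembled from a positional importance value and an affiliation degree value. Several choices here are assumptions made only for illustration: the image is divided into a regular grid, the preset scores follow a Gaussian centred on the image, the positional importance is read from the cell containing the centre of the object area, the distance to the reference object is converted to a closeness value in [0, 1], and the two values are blended linearly with a hypothetical weight `alpha`.

```python
import math


def preset_scores(rows, cols, sigma=0.5):
    """Gaussian preset scores: cells nearer the image centre score higher."""
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Offset of the cell centre from the image centre, scaled to [-1, 1].
            dy = ((r + 0.5) / rows - 0.5) * 2
            dx = ((c + 0.5) / cols - 0.5) * 2
            row.append(math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)))
        grid.append(row)
    return grid


def box_centre(box):
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2


def position_importance(box, grid, width, height):
    """Preset score of the grid cell that contains the centre of the object area."""
    cx, cy = box_centre(box)
    rows, cols = len(grid), len(grid[0])
    r = min(int(cy / height * rows), rows - 1)
    c = min(int(cx / width * cols), cols - 1)
    return grid[r][c]


def affiliation(box, ref_box, width, height):
    """Closeness to the reference object: 1 at zero distance, 0 at the image diagonal."""
    cx, cy = box_centre(box)
    rx, ry = box_centre(ref_box)
    dist = math.hypot(cx - rx, cy - ry)
    return max(0.0, 1.0 - dist / math.hypot(width, height))


def object_relationship_score(box, grid, width, height, ref_box=None, alpha=0.5):
    """Blend positional importance with affiliation when a reference object exists."""
    pos = position_importance(box, grid, width, height)
    if ref_box is None:
        return pos
    return alpha * pos + (1 - alpha) * affiliation(box, ref_box, width, height)


grid = preset_scores(3, 3)
centre_score = object_relationship_score((40, 40, 60, 60), grid, 100, 100)
corner_score = object_relationship_score((0, 0, 10, 10), grid, 100, 100)
```

With these assumptions an object in the middle of the frame outranks one in a corner, and an object close to the reference object outranks one far from it.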
  • the determining the scene intent weight of the first object based on the object category of the first object includes:
  • the scene intent weight of the first object is determined based on the scene category of the first image, the object category of the first object, and the correspondence between the scene category, the object category, and the scene intent weight.
  • the search results of the search objects in the first image are displayed according to the search intent ranking.
  • the search results are combined with the scene, so that the results displayed to the user stay consistent with the scene of the first image instead of being detached from it.
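The correspondence between scene category, object category, and scene intent weight can be pictured as a lookup table. The sketch below assumes a plain dictionary keyed by (scene category, object category); every category name, weight value, and the fallback `DEFAULT_WEIGHT` is hypothetical.

```python
# Hypothetical correspondence table: (scene category, object category) -> weight.
SCENE_INTENT_WEIGHTS = {
    ("street", "clothing"): 0.8,
    ("street", "car"): 0.5,
    ("restaurant", "food"): 0.9,
    ("restaurant", "clothing"): 0.3,
}

# Assumed fallback for pairs missing from the table.
DEFAULT_WEIGHT = 0.1


def scene_intent_weight(scene_category, object_category,
                        table=SCENE_INTENT_WEIGHTS, default=DEFAULT_WEIGHT):
    """Look up how likely an object of this category is to be searched in this scene."""
    return table.get((scene_category, object_category), default)


weight = scene_intent_weight("restaurant", "food")
```

The same object category can carry different weights in different scenes, which is what keeps the ranking tied to the scene of the first image.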
  • the quality scores of the N objects in the first image are greater than or equal to a quality score threshold, and the quality score of any one of the objects is determined based on the blurriness and/or completeness of that object.
  • the intent understanding is associated with the object attributes, and the search processing of some objects that do not match the search intent can be avoided, the effectiveness and accuracy of the search can be improved, and the user experience can be improved.
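A small sketch of the quality-score filter. The text says only that the quality score depends on blurriness and/or completeness and is compared with a threshold; the product formula, the [0, 1] value ranges, and the threshold of 0.5 below are assumptions chosen for illustration.

```python
def quality_score(blurriness, completeness):
    """Assumed formula: sharper and more complete objects score higher.

    blurriness:   0.0 (sharp) to 1.0 (fully blurred)
    completeness: 0.0 (mostly cut off) to 1.0 (fully visible)
    """
    return (1.0 - blurriness) * completeness


def filter_by_quality(objects, threshold=0.5):
    """Keep only objects whose quality score reaches the threshold."""
    return [
        obj for obj in objects
        if quality_score(obj["blurriness"], obj["completeness"]) >= threshold
    ]


# Hypothetical detections.
detected = [
    {"name": "bag", "blurriness": 0.1, "completeness": 0.9},
    {"name": "cup", "blurriness": 0.7, "completeness": 0.9},
]
kept = filter_by_quality(detected)
```

Objects that are too blurred or too incomplete never reach the search stage, which is how the filter avoids searching for objects that are unlikely to match the user's intent.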
  • feeding back the search results of some or all of the N objects including:
  • based on the acquired object tags, generating query information corresponding to some or all of the N objects, where the query information is used to prompt whether information associated with the object tags needs to be acquired;
  • the query information of some or all of the N objects is displayed, so as to feed back the search results of some or all of the N objects.
  • the query information is generated based on the object tag, and the search result is fed back to the user through the query information, so that the user can quickly understand the feedback result, and the simplicity of the feedback interface can be improved.
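As an illustration of query information generated from object tags, the sketch below turns an already ranked list of tags into short prompts. The prompt wording, the `limit` parameter, and the function name `build_query_info` are hypothetical.

```python
def build_query_info(ranked_tags, limit=None):
    """Create one prompt per object tag, preserving the arrangement order."""
    selected = ranked_tags if limit is None else ranked_tags[:limit]
    return [f"Do you want to view information about {tag}?" for tag in selected]


prompts = build_query_info(["dress", "bag", "cup"], limit=2)
```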
  • the N objects include a second object and a third object, the second object precedes the third object in the arrangement order, and the search result corresponding to the second object is displayed on the display screen before the search result corresponding to the third object.
  • the search results of higher-ranked objects are displayed first, so that the user sees them first on the display screen.
  • an apparatus for feeding back search results comprising:
  • an acquisition module configured to acquire a first image, where the first image includes M objects, where M is an integer greater than or equal to 2;
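  • a determination module, configured to, for N objects among the M objects, determine the arrangement order of the N objects when N is greater than or equal to 2, where N is a positive integer less than or equal to M;
  • the arrangement order of any one of the N objects is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score, where the scene intent weight indicates the probability that the object is searched in the scene corresponding to the first image, the confidence score is the similarity between the object and images in an image library, and the object relationship score indicates the importance of the object in the first image;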
  • a feedback module configured to feed back the search results of some or all of the N objects according to the arrangement order of the N objects.
  • the determining module is used for:
  • a confidence score of the first object is determined based on the object area of the first object in the first image and the object category of the first object; and/or an object relationship score of the first object is determined based on the object area of the first object in the first image; and/or a scene intent weight of the first object is determined based on the object category of the first object.
  • the determining module is used for:
  • the largest similarity among the determined multiple similarities is used as the confidence score of the first object.
  • the first image includes a plurality of regions, each region has a preset score indicating the positional importance of that region, and the preset scores of at least two of the regions are different; the determining module is used for:
  • An object relationship score of the first object is determined based on the positional importance value of the first object.
  • the determining module is used for:
  • when a reference object is included in the first image, the distance between the first object and the reference object in the first image is acquired, based on the object area of the first object in the first image and the object area of the reference object in the first image, and is used as the affiliation degree value of the first object;
  • An object relationship score of the first object is determined based on the positional importance degree value and the affiliation degree value of the first object.
  • the determining module is used for:
  • the scene intent weight of the first object is determined based on the scene category of the first image, the object category of the first object, and the correspondence between the scene category, the object category, and the scene intent weight.
  • the quality scores of the N objects in the first image are greater than or equal to a quality score threshold, and the quality score of any one of the objects is determined based on the blurriness and/or completeness of that object.
  • the feedback module is used for:
  • based on the acquired object tags, generating query information corresponding to some or all of the N objects, where the query information is used to prompt whether information associated with the object tags needs to be acquired;
  • the query information of some or all of the N objects is displayed, so as to feed back the search results of some or all of the N objects.
  • the N objects include a second object and a third object, the second object precedes the third object in the arrangement order, and the search result corresponding to the second object is displayed on the display screen before the search result corresponding to the third object.
  • in a third aspect, an electronic device is provided, including a processor and a memory, where the memory is used to store a program that supports the electronic device in performing the method for feeding back search results according to any one of the first aspect, and to store data involved in implementing that method;
  • the processor is configured to execute the program stored in the memory;
  • the electronic device may also include a communication bus for establishing a connection between the processor and the memory.
  • a computer-readable storage medium is provided, storing instructions that, when run on a computer, cause the computer to execute the method described in any one of the first aspect.
  • a computer program product comprising instructions that, when executed on a computer, cause the computer to execute the method for feeding back search results described in the first aspect above.
  • a first image is acquired, where the first image includes M objects, where M is an integer greater than or equal to 2.
  • for N objects among the M objects, when there are multiple such objects, the arrangement order of the N objects is determined, and the search results of some or all of the N objects are fed back according to that arrangement order.
  • the arrangement order of any one of the N objects is determined based on any one or more of the scene intent weight, confidence score, and object relationship score. The scene intent weight indicates the probability that the object is searched in the scene corresponding to the first image, the confidence score is the similarity between the object and images in the image library, and the object relationship score indicates the importance of the object in the first image.
  • the N objects are filtered and sorted according to their arrangement order, so that the search results of objects that search well and that the user may be interested in are fed back to the user first. The feedback is therefore targeted, which improves the accuracy of the search results finally fed back to the user.
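The ranking step summarised above can be sketched as a sort over a combined score. The application allows any one or more of the three values to be used and does not say how they are combined; multiplying them, defaulting an unused value to 1.0, and the names `DetectedObject`, `ranking_key`, and `arrange` are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str
    scene_intent_weight: float = 1.0        # leave at 1.0 when not used
    confidence_score: float = 1.0           # leave at 1.0 when not used
    object_relationship_score: float = 1.0  # leave at 1.0 when not used


def ranking_key(obj):
    """Assumed combination rule: product of the three values."""
    return obj.scene_intent_weight * obj.confidence_score * obj.object_relationship_score


def arrange(objects, top_k=None):
    """Sort objects from highest to lowest combined score, optionally keeping the top k."""
    ordered = sorted(objects, key=ranking_key, reverse=True)
    return ordered if top_k is None else ordered[:top_k]


# Hypothetical detections from one image.
candidates = [
    DetectedObject("cup", 0.2, 0.9, 0.5),
    DetectedObject("dress", 0.8, 0.7, 0.9),
    DetectedObject("bag", 0.6, 0.8, 0.4),
]
ordered_labels = [obj.label for obj in arrange(candidates)]
top_two = [obj.label for obj in arrange(candidates, top_k=2)]
```

Passing `top_k` corresponds to feeding back the results of only some of the N objects, while omitting it feeds back all of them in arrangement order.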
  • FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 2 is a block diagram of a software structure of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is an interface display diagram of a browser running in a mobile phone according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an intent search process provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a feedback effect of a search result provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a feedback effect of another search result provided by an embodiment of the present application.
  • FIG. 7 is a system architecture diagram of a feedback search result provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an implementation framework for feeding back search results provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a method for feeding back search results provided by an embodiment of the present application.
  • FIG. 10 is a schematic diagram of an image detection process provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a region image of an object obtained after image processing according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of the blurriness and completeness of an object provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a confidence score of an object provided by an embodiment of the present application.
  • FIG. 15 is a schematic diagram of an image division effect provided by an embodiment of the present application.
  • FIG. 16 is a schematic diagram of a Gaussian distribution of a preset score provided by an embodiment of the present application.
  • FIG. 17 is a schematic flowchart of another method for feeding back search results provided by an embodiment of the present application.
  • FIG. 19 is a schematic structural diagram of an apparatus for feeding back search results provided by an embodiment of the present application.
  • the electronic device can install and run an application (application, APP) with a visual search function, for example, the APP can be a shopping APP, a celebrity search APP, a browser, and the like.
  • the electronic device may be a device such as a wearable device or a terminal device.
  • the wearable device may include, but is not limited to, a smart watch, a smart bracelet, a smart brooch, a smart eye mask, and smart glasses.
  • Terminal devices may include, but are not limited to, mobile phones, tablet computers, augmented reality (AR)/virtual reality (VR) devices, ultra-mobile personal computers (UMPC), laptops, netbooks, and personal digital assistants (PDA).
  • FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 100 .
  • the electronic device 100 may include more or fewer components than shown, or combine some components, or separate some components, or arrange the components differently.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic device 100 .
  • the controller can generate an operation control signal according to the instruction operation code and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory.
  • the memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs those instructions or data again, it can call them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and improves the efficiency of the system.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may contain multiple sets of I2C buses.
  • the processor 110 can be respectively coupled to the touch sensor 180K, the charger, the flash, the camera 193 and the like through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate with each other through the I2C bus interface, so as to realize the touch function of the electronic device 100 .
  • the I2S interface can be used for audio communication.
  • the processor 110 may contain multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communication, to sample, quantize, and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
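For readers unfamiliar with pulse code modulation, the sketch below shows the three steps named above (sampling, quantizing, encoding) on a software signal. It is a generic illustration of PCM rather than a description of the device's PCM interface; the 8-bit unsigned encoding, the [-1, 1] input range, and the function name `pcm_encode` are assumptions.

```python
import math


def pcm_encode(signal, sample_rate, num_samples):
    """Encode a continuous signal as 8-bit unsigned PCM.

    signal:      function mapping time in seconds to an amplitude in [-1, 1]
    sample_rate: samples per second
    num_samples: how many samples to take
    """
    codes = []
    for i in range(num_samples):
        t = i / sample_rate
        value = signal(t)                          # sampling
        value = max(-1.0, min(1.0, value))         # clip out-of-range input
        code = round((value + 1.0) / 2.0 * 255)    # quantizing to 256 levels
        codes.append(code)
    return bytes(codes)                            # encoding as one byte per sample


# One millisecond of a 1 kHz sine wave sampled at 8 kHz.
encoded = pcm_encode(lambda t: math.sin(2 * math.pi * 1000 * t), 8000, 8)
```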
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is typically used to connect the processor 110 with the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interfaces include camera serial interface (CSI), display serial interface (DSI), etc.
  • the processor 110 communicates with the camera 193 through a CSI interface, so as to realize the photographing function of the electronic device 100 .
  • the processor 110 communicates with the display screen 194 through the DSI interface to implement the display function of the electronic device 100 .
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like.
  • the GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones.
  • the interface can also be used to connect other electronic devices, such as AR devices, etc.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 .
  • the electronic device 100 may also adopt interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100 . While the charging management module 140 charges the battery 142 , it can also supply power to the electronic device through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
  • the modem processor may be a separate device.
  • the modulation and demodulation processor may be independent of the processor 110, and be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Miniled, MicroLed, Micro-oLed, quantum dot light-emitting diodes (QLED), and so on.
  • the electronic device 100 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the electronic device 100 can realize the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194 and the application processor.
  • the ISP is used to process the data fed back by the camera 193 .
  • the shutter is opened, the light is transmitted to the camera photosensitive element through the lens, the light signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin tone.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • an optical image of the object is generated through the lens and projected onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
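  • As an illustration of the format conversion mentioned above, the following is a minimal sketch of converting one RGB pixel to YUV. The text does not say which conversion matrix the DSP uses; the BT.601 analog coefficients and the function name `rgb_to_yuv` are assumptions chosen for the example.

```python
def rgb_to_yuv(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one RGB pixel to YUV using BT.601 analog coefficients (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b        # luma
    u = -0.14713 * r - 0.28886 * g + 0.436 * b   # blue-difference chroma
    v = 0.615 * r - 0.51499 * g - 0.10001 * b    # red-difference chroma
    return y, u, v


if __name__ == "__main__":
    # A neutral gray or white pixel should carry almost no chroma.
    print(rgb_to_yuv(255, 255, 255))
```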
  • the electronic device 100 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • a digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy, and so on.
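  • To make the frequency-point example concrete, the following is a minimal sketch that computes the energy of one frequency bin with a direct discrete Fourier transform. The function name `frequency_point_energy` and the sample signal are hypothetical; the text does not describe how the digital signal processor actually performs the transform.

```python
import cmath
import math


def frequency_point_energy(samples: list[float], k: int) -> float:
    """Return |X[k]|^2, the energy of DFT bin k of the sampled signal."""
    n = len(samples)
    coeff = sum(
        s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
    )
    return abs(coeff) ** 2


if __name__ == "__main__":
    # Eight samples of a cosine that completes two cycles over the window.
    signal = [math.cos(2 * math.pi * 2 * i / 8) for i in range(8)]
    for k in range(4):
        print(k, round(frequency_point_energy(signal, k), 6))
```

  • In this sketch the energy concentrates in the bin matching the signal frequency, which is the property a frequency-point selection step would rely on.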
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs.
  • the electronic device 100 can play or record videos in various encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, such as image recognition, face recognition, speech recognition, text understanding, and the like.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, saving files such as music and videos on the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device 100 by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like.
  • the storage data area may store data (such as audio data, phone book, etc.) created during the use of the electronic device 100 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the electronic device 100 may implement audio functions, such as music playback and recording, through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • the speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can play music or a hands-free call through the speaker 170A.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be heard by placing the receiver 170B close to the human ear.
  • the microphone 170C is used to convert sound signals into electrical signals.
  • the user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C.
  • the electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense pressure signals, and can convert the pressure signals into electrical signals.
  • the pressure sensor 180A may be provided on the display screen 194 .
  • the capacitive pressure sensor may be comprised of at least two parallel plates of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
  • the electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A.
  • the electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than the first pressure threshold acts on the short message application icon, the instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, the instruction to create a new short message is executed.
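  • The short message example above can be sketched as a simple threshold comparison. The numeric threshold and the instruction names below are hypothetical; the text only refers to "the first pressure threshold".

```python
# Hypothetical value; the text only names "the first pressure threshold".
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_instruction(
    intensity: float, threshold: float = FIRST_PRESSURE_THRESHOLD
) -> str:
    """Map the touch intensity on the short message icon to an instruction."""
    if intensity < threshold:
        return "view_short_message"
    # Intensity greater than or equal to the threshold.
    return "create_short_message"


if __name__ == "__main__":
    print(sms_icon_instruction(0.2))
    print(sms_icon_instruction(0.8))
```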
  • the gyro sensor 180B may be used to determine the motion attitude of the electronic device 100 .
  • the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined through the gyro sensor 180B.
  • the gyro sensor 180B can be used for image stabilization.
  • the gyro sensor 180B detects the shaking angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to offset the shaking of the electronic device 100 through reverse motion to achieve anti-shake.
  • the gyro sensor 180B can also be used for navigation and somatosensory game scenarios.
  • the magnetic sensor 180D includes a Hall sensor.
  • the electronic device 100 can detect the opening and closing of the flip holster using the magnetic sensor 180D.
  • the electronic device 100 can detect the opening and closing of the flip cover according to the magnetic sensor 180D. Further, features such as automatic unlocking when the flip cover is opened are set according to the detected opening and closing state of the holster or of the flip cover.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the electronic device 100 is stationary. It can also be used to recognize the posture of the electronic device 100, and can be used in applications such as horizontal and vertical screen switching, pedometers, and the like.
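  • One way to picture the horizontal and vertical screen switching mentioned above is to compare the gravity components along two device axes. This is a sketch only: the axis convention (x along the short edge, y along the long edge) and the function name are assumptions, and a real device would also filter the readings.

```python
def screen_orientation(ax: float, ay: float) -> str:
    """Classify orientation from gravity components in m/s^2.

    ax is assumed to lie along the short edge of the device and ay along
    the long edge; gravity mostly along y means the device is upright.
    """
    return "portrait" if abs(ay) >= abs(ax) else "landscape"


if __name__ == "__main__":
    print(screen_orientation(0.1, 9.8))
    print(screen_orientation(9.8, 0.2))
```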
  • the electronic device 100 can measure the distance through infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 can use the distance sensor 180F to measure the distance to achieve fast focusing.
  • Proximity light sensor 180G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • the electronic device 100 emits infrared light to the outside through the light emitting diode.
  • Electronic device 100 uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100 . When insufficient reflected light is detected, the electronic device 100 may determine that there is no object near the electronic device 100 .
  • the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
  • Proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
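  • The proximity decision described above reduces to comparing the detected reflected light with a level treated as "sufficient". The following sketch assumes a hypothetical threshold and hypothetical function names; the text does not quantify what counts as sufficient reflected light.

```python
# Hypothetical level above which reflected infrared light counts as "sufficient".
REFLECTED_LIGHT_THRESHOLD = 100


def object_nearby(reflected_light: int, threshold: int = REFLECTED_LIGHT_THRESHOLD) -> bool:
    """Return True when enough reflected infrared light is detected."""
    return reflected_light >= threshold


def screen_should_turn_off(in_call: bool, reflected_light: int) -> bool:
    """Turn the screen off only when the user is in a call with the device near the ear."""
    return in_call and object_nearby(reflected_light)


if __name__ == "__main__":
    print(screen_should_turn_off(True, 150))
    print(screen_should_turn_off(True, 20))
```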
  • the ambient light sensor 180L is used to sense ambient light brightness.
  • the electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, so as to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the electronic device 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, accessing application locks, taking pictures with fingerprints, answering incoming calls with fingerprints, and the like.
  • the temperature sensor 180J is used to detect the temperature.
  • the electronic device 100 uses the temperature detected by the temperature sensor 180J to execute a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold value, the electronic device 100 reduces the performance of the processor located near the temperature sensor 180J in order to reduce power consumption and implement thermal protection.
  • when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid abnormal shutdown of the electronic device 100 caused by the low temperature.
  • the electronic device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
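  • The temperature processing strategy above can be sketched as a set of threshold checks. All threshold values and action names here are hypothetical, and the condition for boosting the battery output voltage is assumed to be a third, lower threshold, which the text does not state.

```python
def temperature_actions(
    temp_c: float,
    high: float = 45.0,       # hypothetical over-temperature threshold
    low: float = 0.0,         # hypothetical threshold for heating the battery
    very_low: float = -10.0,  # assumed threshold for boosting the output voltage
) -> list[str]:
    """Return the protective actions for a temperature reported by the sensor."""
    actions = []
    if temp_c > high:
        actions.append("reduce_processor_performance")
    if temp_c < low:
        actions.append("heat_battery")
    if temp_c < very_low:
        actions.append("boost_battery_output_voltage")
    return actions


if __name__ == "__main__":
    for t in (50.0, 20.0, -5.0, -20.0):
        print(t, temperature_actions(t))
```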
  • the touch sensor 180K is also called a "touch panel".
  • the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device 100 , which is different from the location where the display screen 194 is located.
  • the bone conduction sensor 180M can acquire vibration signals.
  • the bone conduction sensor 180M can acquire the vibration signal of the bone vibrated by the human vocal part.
  • the bone conduction sensor 180M can also contact the pulse of the human body and receive the blood pressure beating signal.
  • the bone conduction sensor 180M can also be disposed in the earphone, combined with the bone conduction earphone.
  • the audio module 170 can parse out the voice signal based on the vibration signal of the bone vibrated by the vocal part, as obtained by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beat signal obtained by the bone conduction sensor 180M, and realize the function of heart rate detection.
  • the keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback.
  • touch operations acting on different applications can correspond to different vibration feedback effects.
  • the motor 191 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • Different application scenarios (for example: time reminder, receiving information, alarm clock, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be brought into contact with or separated from the electronic device 100 by being inserted into or pulled out of the SIM card interface 195.
  • the electronic device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • the SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 195 can also be compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the electronic device 100 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the electronic device 100 employs an eSIM, i.e., an embedded SIM card.
  • the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100 .
  • the software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiments of the present application take an Android system with a layered architecture as an example to exemplarily describe the software structure of the electronic device 100 .
  • FIG. 2 is a block diagram of the software structure of the electronic device 100 according to the embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, an Android runtime and system library, and a kernel layer.
  • the application layer can include a series of application packages.
  • the application package can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
  • the application package can also include an application with a visual search function.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer may include window managers, content providers, view systems, telephony managers, resource managers, notification managers, and the like.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide the communication function of the electronic device 100 .
  • for example, the management of call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and so on.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
  • Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
  • the core library consists of two parts: one is the functions that the Java language needs to call, and the other is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the Java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
  • a system library can include multiple functional modules, for example: a surface manager, media libraries, a 3D graphics processing library (e.g., OpenGL ES), a 2D graphics engine (e.g., SGL), etc.
  • the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • when the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into raw input events (including touch coordinates, timestamps of touch operations, etc.). Raw input events are stored at the kernel layer.
  • the application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example a touch operation that is a tap, where the control corresponding to the tap is the camera application icon: the camera application calls the interface of the application framework layer to start the camera application, then starts the camera driver by calling the kernel layer, and captures still images or video through the camera 193.
  • the following takes the case where the electronic device is a mobile phone as an example:
  • the poster may be captured by the mobile phone when triggered by the user, or it may be an image in the gallery of the mobile phone.
  • a browser with a visual search function is installed in the mobile phone.
  • if the user wants to perform a visual search based on the poster, the user can tap the browser on the mobile phone.
  • the mobile phone starts the browser.
  • the browser is provided with an add option 31 , and the user can trigger the add option 31 .
  • in response to the user's triggering operation on the add option 31, the mobile phone displays the “shoot” option and the “album” option.
  • if the user wants to capture the poster with the camera, the user can trigger the “shoot” option in (b) in FIG. 3; after detecting the user's triggering operation on the “shoot” option, the mobile phone turns on the camera so that the poster can be captured by the camera.
  • if the user wants to use a poster in the gallery, the user can tap the “album” option in (b) in FIG. 3; after detecting the user's triggering operation on the “album” option, the mobile phone opens the gallery, the user selects the poster from the gallery, and the mobile phone obtains the poster based on the user's selection.
  • the mobile phone can display the poster in the display interface after acquiring the poster.
  • the display effect of the poster is as shown in (a) of FIG. 4 .
  • a "search" option 32 may also be provided in the presentation interface. The user may click on the "Search" option 32 .
  • after detecting the user's triggering operation on the "search" option 32, the mobile phone performs a visual search based on the poster and obtains the search result of the search object in the poster.
  • the search object is an object in the poster that the user may be interested in.
  • the search object is determined by the mobile phone, and the specific determination process can refer to the embodiments described below.
  • the characteristics of the search object in the poster include at least one of the following (1)-(3): (1) clear outline, (2) high integrity, (3) location close to the center of the poster area, that is, the location is relatively centered.
  • the number of search objects in the poster may be one or more.
  • the search objects in the poster may include face B, or may also include face B and mobile phone A.
  • the user can also trigger the mobile phone to perform a visual search in other ways. For example, after the mobile phone displays the poster, the user can also "shake" the mobile phone. Correspondingly, the mobile phone detects that the user's "shake” ” operation, perform a visual search based on the acquired posters.
  • After obtaining the search results, the mobile phone feeds them back to the user.
  • For example, the mobile phone can feed back the search results by asking the user, that is, by presenting an inquiry scene.
  • The inquiry scene may include inquiry information related to the search objects in the poster.
  • For example, if the poster is as shown in (a) of FIG. 4 and the mobile phone determines that the search objects in the poster include mobile phone A and face B, the inquiry scene can include inquiry information related to mobile phone A and face B.
  • For example, the inquiry information fed back to the user includes "Do you want to know about celebrity Ah Xing?" and "Do you want to buy this mobile phone A?".
  • When there are multiple search objects, the mobile phone displays their inquiry information in descending order of priority. As shown in (b) of FIG. 4, because the priority of face B is higher than that of mobile phone A, the inquiry information of face B is displayed before that of mobile phone A.
  • The user may tap any displayed inquiry information so that the mobile phone displays other content in the search results related to it.
  • For example, if the user wants to know more about celebrity Ah Xing in the poster, the user can tap the inquiry information "Do you want to know about celebrity Ah Xing?".
  • After detecting the triggering operation on this inquiry information, the mobile phone can obtain and display other content in the search results for celebrity Ah Xing, such as introduction information and published works.
  • The display result is shown in (c) of FIG. 4.
  • For another example, if the user wants to buy mobile phone A in the poster, the user can tap the inquiry information "Do you want to buy this mobile phone A?".
  • After detecting the triggering operation on this inquiry information, the mobile phone can obtain and display other content in the search results for mobile phone A, such as multiple images of mobile phone A; the display result is shown in (d) of FIG. 4.
  • In addition, when the mobile phone detects a tap on the inquiry information "Do you want to buy this mobile phone A?" and purchase address information for mobile phone A is available, for example because the search results include it, the purchase address information may also be displayed to the user.
  • When the user's triggering operation on the purchase address information is detected, the mobile phone jumps to the purchase page so that the user can buy mobile phone A there. The user does not have to manually enter search terms for mobile phone A and search again, which improves the user experience.
  • In one example, the entire inquiry information is triggerable.
  • In another example, only part of the text in the inquiry information is triggerable. For example, in "Do you want to know about celebrity Ah Xing?", the text "celebrity Ah Xing" is triggerable: when the user wants to know more about celebrity Ah Xing, the user can tap that text to make the mobile phone display other content in the search results for celebrity Ah Xing.
  • For another example, in "Do you want to buy this mobile phone A?", the text "mobile phone A" and/or "buy" is triggerable.
  • If the user wants to know about mobile phone A in the poster, the user can tap "mobile phone A" or "buy" to make the mobile phone display other content in the search results for mobile phone A.
  • The search results may also be fed back in the form shown in FIG. 5: a triggerable item is identified in the area where each of the one or more search objects in the poster is located, and the triggerable item corresponding to each search object is also identified in an area of the display interface outside the poster.
  • When there are multiple search objects, the triggerable items in the area outside the poster can be arranged in descending order of the priorities of the search objects.
  • If the user wants to know more about a search object, the user can tap the triggerable item corresponding to that search object.
  • When the mobile phone detects the user's triggering operation on the triggerable item, it can display other content in the search results of that search object.
  • For example, the display effect is as shown in (c) or (d) of FIG. 4.
  • The triggerable items corresponding to different search objects can be distinguished by different colors and/or different shapes.
  • For example, the triggerable item corresponding to face B can be identified by a blue dot, and the triggerable item corresponding to mobile phone A by a red dot.
  • For another example, the triggerable item corresponding to face B may be identified by a dot, and the triggerable item corresponding to mobile phone A by a square.
  • The corresponding inquiry information can also be displayed near each triggerable item identified in the area outside the poster. For example, inquiry information about celebrity Ah Xing can be displayed after the triggerable item corresponding to face B, and the "Buy this mobile phone A" inquiry information after the triggerable item corresponding to mobile phone A. In this way, the user can see at a glance which search result each triggerable item corresponds to, which improves the user experience.
  • FIG. 5 is only an example in which triggerable items are identified both in the area of the poster where each search object is located and in the other area.
  • In another embodiment, the triggerable item may be identified only in the area where each search object in the poster is located, or only in the other area, which is not limited in this embodiment of the present application.
  • The mobile phone can also display the search results of the search objects in different rows.
  • When there are multiple search objects, the mobile phone can list the search results of the different search objects in different rows in descending order of priority; that is, the search results of the one or more search objects in the poster are each displayed in a separate row.
  • For example, the mobile phone displays the search result of face B in the first row and the search result of mobile phone A in the second row.
  • One row may not be able to display all the search results of a search object.
  • Therefore, the mobile phone may also identify a triggerable item for each search object.
  • For example, the corresponding triggerable item is identified in the area of the poster where face B is located, and the corresponding triggerable item is identified in the area of the poster where mobile phone A is located.
  • For another example, the triggerable item is displayed in the row corresponding to the search object. In this way, when the user wants to know about other content in the search results of a search object, the user can tap the corresponding triggerable item to make the mobile phone display that content.
  • The mobile phone may also feed back the search results of the search objects to the user in other ways, which are not limited in this embodiment of the present application.
  • FIG. 7 is a schematic diagram of a system architecture according to an exemplary embodiment.
  • The system architecture includes a terminal side and a cloud side.
  • The execution subject on the terminal side is an electronic device, the execution subject on the cloud side is a server, and a communication connection is established between the electronic device and the server.
  • The electronic device includes an intent recognition module, a data association module, and a content layout and processing module.
  • The electronic device acquires the first image.
  • The first image may be captured by a camera or uploaded from an album.
  • The intent recognition module is used to determine the search objects in the first image through operations such as image detection processing, and to perform an intent search on those search objects through the server, which returns the search results.
  • When there are multiple search objects, the intent recognition module is further configured to determine their priorities.
  • The data association module is used to obtain and display the search results returned by the server.
  • When there are multiple search objects, the data association module is further configured to obtain the priorities determined by the intent recognition module and sort the obtained search results based on those priorities; this can also be understood as sorting the multiple search objects.
  • The content layout and processing module is used to lay out and feed back the sorted search results.
  • The electronic device may further include an image preprocessing module, which preprocesses the image captured or uploaded by the electronic device to obtain the first image.
  • Preprocessing the captured or uploaded image makes the resulting first image better meet the processing requirements of the intent recognition module.
  • For example, size adjustment resolves the inconsistent image sizes caused by different camera specifications.
  • For another example, denoising improves the accuracy of the image detection processing in the intent recognition module and thereby the accuracy of the visual search.
  • The server is used to determine and return search results.
  • The server is provided with databases corresponding to different vertical domains.
  • One vertical domain corresponds to one object type.
  • For example, the vertical domains may include, but are not limited to, a celebrity vertical domain, a commodity vertical domain, and a scenic spot vertical domain.
  • The database corresponding to each vertical domain includes, but is not limited to, an image library, semantic information, and work information.
  • The image library includes at least one image of each search object; for example, the images may be taken of the search object from different angles.
  • The semantic information is a language description of the search object and may include, for example, introduction information of the search object.
  • The work information describes works such as articles and videos published by a person.
  • In the database, the at least one image, the semantic information, and the work information of each search object are associated with one another.
  • The search result returned by the server to the electronic device generally includes at least one image of the search object from the image library, together with the semantic information and work information associated with that image.
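  • The association between images, semantic information, and work information can be pictured as a simple record type. The following is a minimal sketch under stated assumptions, not the applicant's implementation: the class names `ObjectRecord` and `VerticalDomainDB`, the field names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ObjectRecord:
    """One search object in a vertical-domain database (hypothetical layout)."""
    label: str                                           # e.g. a person or product name
    images: List[str] = field(default_factory=list)      # image IDs, e.g. different angles
    semantic_info: str = ""                              # language description
    work_info: List[str] = field(default_factory=list)   # published works, if any


@dataclass
class VerticalDomainDB:
    """Database for one vertical domain, i.e. one object type."""
    object_type: str
    records: Dict[str, ObjectRecord] = field(default_factory=dict)

    def add(self, record: ObjectRecord) -> None:
        self.records[record.label] = record

    def result_for_image(self, image_id: str) -> Optional[ObjectRecord]:
        """Return the record associated with image_id, or None if there is none."""
        for record in self.records.values():
            if image_id in record.images:
                return record
        return None


# Hypothetical sample data: one database per vertical domain.
celebrity_db = VerticalDomainDB("celebrity")
celebrity_db.add(ObjectRecord("Ah Xing", ["axing_front.jpg", "axing_side.jpg"],
                              "Introduction of Ah Xing", ["Work 1"]))
databases = {"celebrity": celebrity_db, "commodity": VerticalDomainDB("commodity")}
```

  • Because each record keeps its images, description, and works together, finding one matching image is enough to return the associated semantic information and work information.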
  • The search result of a search object may include multiple pieces of data.
  • In that case, before feeding back the search result to the electronic device, the server can reorder the pieces of data to determine their display order, and then feed back the reordered search result to the electronic device.
  • For example, the server may sort the pieces of data in the search result according to a preset strategy.
  • The preset strategy may be set in advance according to actual requirements.
  • For example, the preset strategy may sort by shooting angle, such as ranking frontal images ahead of images taken from other angles.
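  • A frontal-first strategy of this kind can be sketched with a stable sort. This is only an illustration: the `angle` field, its values, and the file names are hypothetical, and the text does not say how the server records the shooting angle.

```python
def reorder_by_angle(items):
    """Stable sort: frontal images first; other angles keep their relative order."""
    return sorted(items, key=lambda item: 0 if item["angle"] == "front" else 1)


# Hypothetical pieces of data in one search result.
results = [
    {"image": "a_side.jpg", "angle": "side"},
    {"image": "a_front.jpg", "angle": "front"},
    {"image": "a_back.jpg", "angle": "back"},
]
ordered = reorder_by_angle(results)
```

  • Python's `sorted` is stable, so images that share the same key stay in the order in which the server originally retrieved them.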
  • While performing an intent search through the server, the intent recognition module may also perform an initial search, that is, an initial recall operation.
  • In the initial search, the server determines the similarity between an object in the first image and the images in the image library to determine whether the image library contains a search result matching the object. The objects in the first image are thus screened for the first time, and the search objects in the first image are then determined according to the screening result; for the specific implementation, refer to the embodiments shown below.
  • In one example, the server determines the similarity based on an initial image library in the database.
  • The initial image library is a sub-library of the corresponding image library.
  • For example, a subset of images can be extracted in advance from the image library of each vertical domain according to a preset strategy to obtain the initial image library of that vertical domain. For instance, for the image library of any vertical domain, one image of each object is extracted to obtain the corresponding initial image library. In this way, the object does not need to be matched against all the images in the image library when the similarity is determined, which reduces the computational load on the server.
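  • The "one image per object" extraction can be sketched as follows. The representation of the image library as `(image_id, label)` pairs and the choice of keeping the first image seen for each object are assumptions made for illustration; the text leaves the extraction strategy open.

```python
def build_initial_library(image_library):
    """Keep one image per object label (here: the first one encountered)."""
    first_image = {}
    for image_id, label in image_library:
        first_image.setdefault(label, image_id)
    return [(image_id, label) for label, image_id in first_image.items()]


# Hypothetical image library of one vertical domain.
image_library = [
    ("axing_front.jpg", "Ah Xing"),
    ("axing_side.jpg", "Ah Xing"),
    ("star2_front.jpg", "Star 2"),
]
initial_library = build_initial_library(image_library)
```

  • With k images per object on average, matching against the initial image library needs roughly 1/k of the comparisons required by the full image library.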
  • In the above, the embodiment of the present application is implemented through interaction between the electronic device and the server: the electronic device obtains the search results and performs the initial search through the server. Databases for the different vertical domains therefore do not need to be set up in the electronic device, which reduces the operating burden of the electronic device and increases its operating speed.
  • The above description takes interactive implementation between the terminal side and the cloud side as an example of the method provided in this embodiment of the present application.
  • In another embodiment, the method can be implemented by the electronic device alone.
  • In this case, the electronic device is provided with the databases corresponding to the different vertical domains, so that it can determine the search results and perform the initial search based on its own databases, without relying on the server for visual search.
  • In yet another embodiment, the method can be implemented by the server alone.
  • That is, the execution body of the embodiment of the present application may be an electronic device or a server. If the execution body is a server, the display process involved (for example, the display process in step 905 of the embodiment shown in FIG. 9) can be omitted, or it can be performed by another device equipped with a display.
  • In one example, the intent recognition module in the electronic device includes three branches, each corresponding to one processing unit; that is, the intent recognition module includes a first processing unit, a second processing unit, and a third processing unit.
  • The first processing unit executes the operations of the branch in the left dashed box.
  • The second processing unit executes the operations of the branch in the middle dashed box.
  • The third processing unit executes the operations of the branch in the right dashed box.
  • In one embodiment, the process of determining the arrangement order of objects involved in the method may be implemented independently by any one of the three processing units.
  • In another embodiment, the process of determining the arrangement order of objects may be implemented by any two of the three processing units in combination.
  • In yet another embodiment, the process of determining the arrangement order of objects may be implemented by all three processing units in combination. For ease of understanding, the implementations of the three processing units are introduced separately in the following embodiments.
  • FIG. 9 is a flow chart of a method for feeding back search results according to an exemplary embodiment.
  • This embodiment takes as an example the case where the process of determining the arrangement order of objects involved in the method is implemented independently by the second processing unit.
  • The method can include some or all of the following:
  • Step 901: Acquire a first image.
  • The first image is captured by the electronic device through a camera, or is an image from a gallery.
  • The first image may also be obtained by preprocessing a captured or uploaded image.
  • For details, refer to the foregoing description, which is not repeated here.
  • The first image includes M objects, where M is an integer greater than or equal to 2.
  • Step 902: Determine the object regions and/or the object categories of the N objects included in the first image.
  • N is a positive integer less than or equal to M.
  • The object region of an object is the region that the object occupies in the first image.
  • The electronic device performs image detection processing on the first image through the intent recognition module to determine the object region and/or the object category of each of the N objects included in the first image.
  • For example, the intent recognition module includes a target detection model, which performs image detection processing on the first image to obtain the object region and/or the object category of each object.
  • The target detection model may be a pre-trained detection model that can detect, in any image, the object region and/or the object category of each object in that image.
  • For example, after the first image is input into the target detection model, the model determines the position information and object category of each of the N objects included in the first image; the position information indicates the object region of the object in the first image.
  • The electronic device marks each object in the first image with an object frame according to the position information determined by the target detection model, thereby obtaining a second image.
  • For example, the second image is shown in (b) of FIG. 10: after image detection processing, the N objects included in the first image are determined to be a face, a mobile phone, and clothes, whose object categories are face, commodity, and clothing, respectively.
  • The target detection model may be obtained by training a model to be trained on an object detection sample set.
  • Each object detection sample in the set is an object image marked with an object category label.
  • For example, the object detection sample set may include face image samples, clothing image samples, commodity image samples, landmark image samples, animal image samples, plant image samples, and so on.
  • The model to be trained can be a combination of mobilenetV3 and a single shot multibox detector (SSD), that is, a model that uses mobilenetV3 as the base network and adopts the SSD structure.
  • The network structure of mobilenetV3 can be described by Table 1.
  • After determining the object regions, the electronic device may crop the object region of each of the N objects from the first image. For example, after the object region of each object is cropped from the first image shown in (b) of FIG. 10, the resulting object regions are shown in FIG. 11, where the object in (a) of FIG. 11 is a "mobile phone", the object in (b) of FIG. 11 is a "face", and the object in (c) of FIG. 11 is "clothes".
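  • Cropping object regions from detector output can be sketched in plain Python. This is an illustration only: the image is modeled as a row-major nested list, and the detection format (a `category` plus a `box` given as left, top, right, bottom) is a hypothetical stand-in for the position information produced by the target detection model.

```python
def crop_region(image, box):
    """Crop a rectangle from a row-major 2-D image.

    box is (left, top, right, bottom) in pixels, with right and bottom
    exclusive; it is clamped to the image bounds before cropping.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = box
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]


def crop_detections(image, detections):
    """Return (category, cropped region) for every detected object."""
    return [(d["category"], crop_region(image, d["box"])) for d in detections]


# Toy 4x6 "image" whose pixel value encodes its position (row * 10 + column).
image = [[r * 10 + c for c in range(6)] for r in range(4)]
detections = [
    {"category": "face", "box": (1, 0, 3, 2)},
    {"category": "commodity", "box": (4, 2, 8, 4)},  # partly outside the image
]
crops = crop_detections(image, detections)
```

  • Clamping the box keeps the crop valid when a predicted object frame extends slightly past the image border.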
  • In one embodiment, the N objects included in the first image are the M objects in the first image, that is, N is equal to M.
  • In another embodiment, the N objects are determined through quality screening: before the object regions and/or object categories of the N objects are determined, the M objects in the first image are screened by quality to determine the N objects.
  • The quality scores of the N objects are greater than or equal to a quality score threshold, and the quality score of an object is determined based on its blurriness and/or integrity.
  • The quality score threshold can be set according to actual needs.
  • For example, the quality screening process may include the following sub-steps (1)-(2):
  • Owing to factors such as the shooting angle, the image quality of one or more objects in the first image may be poor.
  • For example, the first image may include passers-by or other objects that are relatively blurred or only partly visible; for instance, the first image may include only part of a passer-by's body. In most cases such objects are not what the user is paying attention to. Therefore, the electronic device can determine, through the second processing unit, the quality score of each of the M objects included in the first image, and use it to filter out the objects with poor image quality.
  • The image quality of an object may be measured by its blurriness and/or integrity; that is, the second processing unit determines the blurriness and/or integrity of each of the M objects included in the first image in order to determine the quality score of each object.
  • Blurriness can be described by how clear the outline of the object is: if the outline is clear, the blurriness is low; conversely, if the outline is unclear, the blurriness is high.
  • Integrity is the proportion of the unoccluded part of the object to the whole object.
  • In one example, the second processing unit may use a Laplacian variance algorithm to determine the blurriness of an object.
  • Specifically, a Laplacian mask is convolved with the pixel values in the region of the first image where the object is located, the variance of the result is calculated, and the variance is mapped to a score to obtain the blurriness of the object.
  • For example, it can be implemented by the following code:
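  • The code referred to above does not appear in this text. As a stand-in, the following is a minimal pure-Python sketch of the Laplacian variance idea, not the original code. The 4-neighbour kernel, the function names, and the score mapping `1 / (1 + variance)` are assumptions; the text does not specify which mask or mapping is used.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian response over interior pixels.

    gray is a row-major 2-D list of grayscale pixel values.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Kernel [[0, 1, 0], [1, -4, 1], [0, 1, 0]] applied at (y, x).
            responses.append(
                gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def blurriness(gray):
    """Map the variance into (0, 1]: few edges give a value near 1 (assumed mapping)."""
    return 1.0 / (1.0 + laplacian_variance(gray))


# A flat patch has no edges; a checkerboard patch has strong edges.
flat = [[128] * 5 for _ in range(5)]
checker = [[255 * ((x + y) % 2) for x in range(5)] for y in range(5)]
```

  • A sharp region produces large positive and negative Laplacian responses and therefore a high variance, so under this mapping `blurriness(checker)` is much smaller than `blurriness(flat)`.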
  • In one example, the integrity of any one of the M objects may be determined by a pre-trained integrity prediction model.
  • For example, the object region of the object is input into the integrity prediction model, which performs prediction processing and outputs the integrity of the object.
  • The integrity prediction model may be obtained by training a model to be trained on an integrity training sample set. The set includes multiple integrity training samples, each consisting of an object sample image and the integrity of that image, and the model to be trained can be an untrained mobilenetV3 model. That is, the electronic device can obtain in advance a large number of object samples with different integrity levels and input them into the untrained mobilenetV3 model for training to obtain the integrity prediction model.
  • The integrity prediction model may be a separate model or may be integrated with the target detection model above, which is not limited in this embodiment of the present application.
  • The second processing unit may also determine the integrity of the object in other ways.
  • For example, the area of the occluded part of the object and the area enclosed by the object frame of the object may be determined, and the integrity of the object is then determined from these two areas, which is not limited in this embodiment of the present application.
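  • The area-based alternative can be sketched as a simple ratio. The text only says that integrity is determined from the two areas; taking it as the unoccluded share of the object frame is an assumption, chosen to agree with the definition of integrity as the proportion of the unoccluded part. The function names and box format are hypothetical.

```python
def box_area(box):
    """Area of a (left, top, right, bottom) box; degenerate boxes have area 0."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def integrity_from_areas(occluded_area, object_box):
    """Unoccluded share of the object frame, in the range [0, 1]."""
    total = box_area(object_box)
    if total == 0:
        return 0.0
    visible = max(0, total - occluded_area)
    return visible / total


# A 10x10 object frame with 25 occluded pixels.
example_integrity = integrity_from_areas(25, (0, 0, 10, 10))
```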
  • For example, the blurriness and integrity that the second processing unit determines for each of the M objects included in the first image are shown in FIG. 12.
  • The blurriness and integrity of the mobile phone, shown in (a) of FIG. 12, are 0.022 and 0.86, respectively; those of the face, shown in (b) of FIG. 12, are 0.042 and 0.92, respectively.
  • The blurriness and integrity of the clothes, shown in (c) of FIG. 12, are 0.037 and 0.34, respectively.
  • The second processing unit determines a quality score for each object based on the blurriness and/or integrity of that object in the first image. The quality score indicates the image quality of the object and is positively correlated with it: the higher the quality score, the better the image quality of the object.
  • In one example, the second processing unit may set a blurriness weight and an integrity weight and then determine the quality score of an object from its blurriness and integrity by the following formula (1):
  • where S_i represents the quality score of object i;
  • a represents the blurriness weight;
  • a is a negative value;
  • F_i represents the blurriness of object i;
  • b represents the integrity weight;
  • I_i represents the integrity of object i.
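  • Formula (1) itself does not appear in this text. The symbol definitions suggest a weighted sum, S_i = a·F_i + b·I_i with a < 0, and the sketch below assumes that form. The weights a = -1 and b = 1 are hypothetical; only the blurriness and integrity values are taken from the description of FIG. 12.

```python
def quality_score(blurriness, integrity, a=-1.0, b=1.0):
    """Assumed form of formula (1): S_i = a * F_i + b * I_i, with a < 0."""
    return a * blurriness + b * integrity


# (blurriness F_i, integrity I_i) as quoted in the text for FIG. 12.
objects = {
    "mobile phone": (0.022, 0.86),
    "face": (0.042, 0.92),
    "clothes": (0.037, 0.34),
}
scores = {name: quality_score(f, i) for name, (f, i) in objects.items()}
```

  • With these hypothetical weights the scores are about 0.838 for the mobile phone, 0.878 for the face, and 0.303 for the clothes. The negative weight makes blurrier objects score lower, which matches the stated positive correlation between quality score and image quality.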
  • The blurriness weight and the integrity weight may be set according to actual requirements, which are not limited in this embodiment of the present application.
  • The second processing unit performs filtering according to the quality score. For example, for any object among the M objects included in the first image, when its quality score is lower than the quality score threshold, the object is filtered out.
  • Alternatively, the M objects included in the first image may be sorted in descending order of quality score, and the top N objects in the ranking are taken as the N objects, where N is an integer greater than 1.
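  • The two screening variants, threshold filtering and top-N selection, can be sketched as follows. The function names, the example scores, and the threshold value 0.5 are hypothetical.

```python
def filter_by_threshold(scores, threshold):
    """Keep objects whose quality score is greater than or equal to the threshold."""
    return [name for name, score in scores.items() if score >= threshold]


def top_n(scores, n):
    """Keep the n objects with the highest quality scores, best first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


# Hypothetical quality scores for the M = 3 objects in the first image.
scores = {"mobile phone": 0.84, "face": 0.88, "clothes": 0.30}
kept_by_threshold = filter_by_threshold(scores, 0.5)
kept_top_two = top_n(scores, 2)
```

  • Threshold filtering can leave any number of objects, including none, whereas top-N selection always returns a fixed number as long as the image contains at least N objects.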
  • The quality screening process above is only an example.
  • In another embodiment, quality screening may be implemented in any one of the following manners for any object among the M objects included in the first image: if the blurriness of the object is higher than a blurriness threshold, the object is filtered out, that is, relatively blurred objects are filtered out; or, if the integrity of the object is lower than an integrity threshold, the object is filtered out, that is, incomplete objects are filtered out; or, if the blurriness of the object is higher than the blurriness threshold and its integrity is lower than the integrity threshold, the object is filtered out, that is, blurred and incomplete objects are filtered out.
  • The objects remaining after filtering are determined as the N objects included in the first image.
  • The blurriness threshold may be set by the user according to actual needs or set by default by the electronic device, which is not limited in this embodiment of the present application.
  • The integrity threshold may be set by the user according to actual needs or set by default by the electronic device, which is not limited in this embodiment of the present application.
  • If no object remains after filtering, the electronic device ends the intent search. That is, if all the M objects included in the first image are blurred or have low integrity, the electronic device does not perform the intent search; for example, it may prompt the user that the first image is unavailable.
  • Step 903: Perform an initial search on the N objects included in the first image.
  • In one example, the second processing unit sends an initial search request to the server; the request carries the object regions and object categories of the N objects.
  • The server parses the initial search request and obtains the object regions and object categories of the N objects.
  • Taking a first object among the N objects as an example, the server determines the corresponding vertical domain according to the object category of the first object and then determines the similarity between the object region of the first object and the images in the initial image library of that vertical domain.
  • For example, the server extracts the object features from the object region of the first object, obtains the object features of each image in the initial image library of the determined vertical domain, and determines the similarity between the extracted object features and the object features of each image, obtaining multiple similarities. The server takes the largest of these similarities as the confidence score of the first object.
  • Because the initial image library is a sub-library of the image library, a large similarity means that the image library, in other words the server's database, contains an image matching the first object.
  • The confidence score is thus the similarity between the first object and an image in the image library, and it indicates how well the search intent for the first object matches the available search results.
  • In addition, the server determines the label of the image corresponding to the maximum similarity and uses it as the object label of the first object.
  • The server sends the confidence score and the object label of the first object to the electronic device as the result of the initial search (the initial recall result).
  • The object label can be used by the electronic device for subsequent selective display.
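  • The confidence score and object label can be sketched as a maximum-similarity lookup. The text does not name the similarity measure or the feature extractor; cosine similarity over feature vectors, the `(label, feature)` library format, and the sample vectors are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def initial_search(object_feature, initial_library):
    """Return (confidence score, object label) for one object.

    initial_library is a list of (label, feature vector) pairs taken from the
    initial image library of the object's vertical domain.
    """
    if not initial_library:
        return 0.0, None
    best_label, best_feature = max(
        initial_library,
        key=lambda entry: cosine_similarity(object_feature, entry[1]),
    )
    return cosine_similarity(object_feature, best_feature), best_label


# Hypothetical features for two images in a celebrity initial image library.
library = [("Ah Xing", [0.6, 0.8]), ("Star 2", [0.8, -0.6])]
confidence, label = initial_search([3.0, 4.0], library)
```

  • Here the object feature points in the same direction as the "Ah Xing" entry, so the confidence score is close to 1 and the object label is "Ah Xing".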
  • The description here only takes as an example the case where the object region and object category of the object are sent to the server and the server extracts the object features.
  • In another embodiment, the second processing unit may itself extract the object features from the object region and send the object features and the object category to the server, so that the server does not need to perform feature extraction on the object region.
  • In another embodiment, the electronic device itself is provided with a database.
  • In that case, the second processing unit may determine the image library corresponding to the object category of the first object, determine the similarity between the object region of the first object in the first image and each of the images in that image library, and take the maximum of the determined similarities as the confidence score of the first object.
  • Step 904 Determine the arrangement order of the N objects according to the results of the initial search.
  • the electronic device determines the arrangement order of the N objects according to the size of the confidence score.
  • the arrangement order of any one of the N objects corresponds to the priority, and the higher the arrangement order is, the higher the priority.
  • Step 905 Feed back the search results of some or all of the N objects according to the arrangement order of the N objects.
  • the electronic device can feed back the search results of all of the N objects according to the arrangement order of the N objects, or can select some of the N objects and feed back the search results of the selected objects according to the arrangement order of the N objects.
  • For ease of description and understanding, the objects among the N objects whose search results are to be fed back are referred to as search objects. The number of search objects may be one or more. That is, in one embodiment, the search objects include some of the N objects, and in another embodiment, the search objects include all of the N objects.
  • the electronic device selects a preset number of top-ranked objects from the N objects as search objects, and the preset number can be set according to actual requirements, for example, the preset number is 2.
  • the electronic device may also determine the search objects according to the confidence scores of the N objects. For example, for the first object among the N objects, the electronic device compares the confidence score of the first object with a confidence score threshold. If the confidence score of the first object is lower than the confidence score threshold, a search result matching the first object may not exist on the server. In that case, in order to provide the user with good search results and thereby improve the user experience, the first object may be filtered out, that is, no intent search is subsequently performed for the first object. Conversely, if the confidence score of the first object is higher than the confidence score threshold, a search result matching the first object exists on the server, and the first object can be determined as a search object.
  • the search objects are selected from the first image according to the confidence scores, so that intent understanding is consistent with the search results and the accuracy of the intent search can be improved.
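  • The two selection strategies above (a preset number of top-ranked objects, and a confidence score threshold) can be sketched together as follows. The scores 0.886 and 0.998 are the example values used in this embodiment; the third object, the threshold 0.5 and the function name are hypothetical, and treating a score equal to the threshold as passing is an assumption.

```python
def select_search_objects(scores, threshold=None, top_k=None):
    """Rank objects by confidence score and pick the search objects.

    scores: dict mapping object name -> confidence score.
    threshold: if given, drop objects whose score is below it.
    top_k: if given, keep at most this many of the highest-ranked objects.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        # Assumption: a score exactly equal to the threshold is kept.
        ranked = [(name, s) for name, s in ranked if s >= threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [name for name, _ in ranked]

confidences = {"mobile phone A": 0.886, "face B": 0.998, "clothes C": 0.31}
search_objects = select_search_objects(confidences, threshold=0.5, top_k=2)
```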
  • the electronic device sends a search request to the server, where the search request carries the object area and object category of at least one search object.
  • the server parses the search request to obtain the object area and object category of at least one search object.
  • the server determines the corresponding vertical domain based on the object category of the search object, and obtains the search result of the search object from the database of the determined vertical domain based on the object region.
  • the database includes an image library and related information, and the server matches the object area of the search object with the images in the image library, so as to determine at least one image and related information of the search object.
  • the server takes at least one image and associated information of the search object as a search result of the search object. After that, the server returns the search result of each search object in the at least one search object to the electronic device.
  • the above description is given by taking an example of sending the object area and object category of at least one search object to the server when the electronic device performs an intention search.
  • the server may locally generate index information for each of the N objects, and establish a mapping between each piece of index information and the object area and object category of the corresponding object.
  • the server sends the index information of each of the N objects to the electronic device.
  • when the electronic device performs an intent search on a search object, it does not need to send the object area and object category of the search object to the server again. Instead, it sends the index information of the search object to the server, and the server can obtain the object area and object category of the search object locally based directly on the index information. This avoids uploading the object area and object category again, which saves traffic.
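  • One way to picture the index information mechanism is a server-side cache keyed by an opaque identifier. The sketch below is illustrative only: the class name, the use of a random hexadecimal identifier, and the bounding-box representation of the object area are all assumptions, not details given in this application.

```python
import uuid

class ObjectIndexRegistry:
    """Server-side cache mapping index information to (object area, object category)."""

    def __init__(self):
        self._entries = {}

    def register(self, object_area, object_category):
        """Store an object after the initial search and return its index information."""
        index_info = uuid.uuid4().hex
        self._entries[index_info] = (object_area, object_category)
        return index_info

    def resolve(self, index_info):
        """Look up the cached area and category for a later intent search."""
        return self._entries[index_info]

registry = ObjectIndexRegistry()
# Hypothetical object area given as (left, top, right, bottom) pixel coordinates.
phone_index = registry.register((120, 300, 260, 520), "commodity")
area, category = registry.resolve(phone_index)
```

  • With such a mapping, the later search request only needs to carry the short index string instead of the image data of the object area.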
  • the electronic device directly feeds back the search results of the search objects to the user.
  • the search results of the multiple search objects are fed back according to the arrangement order of the multiple search objects. That is, the search results of the search objects with higher priority are displayed first.
  • the specific implementation of feeding back the search results of the multiple search objects may include: acquiring the object tag included in the search result of each of the multiple search objects; generating, based on the object tag of each search object, query information corresponding to that search object, where the query information is used to prompt whether the information associated with the object tag needs to be obtained; and displaying the query information of the multiple search objects according to the arrangement order of the multiple search objects, so as to feed back the search results of the multiple search objects.
  • the information associated with the object tag of each search object refers to the information other than the object tag included in the search result of that search object, for example, semantic information, work information, and images of the search object.
  • the N objects include a second object and a third object, and the arrangement order of the second object is before that of the third object. Accordingly, the search result corresponding to the second object is displayed on the display screen before the search result corresponding to the third object.
  • the obtained search results include the search result of face B and the search result of mobile phone A in the first image, where the confidence score of mobile phone A is 0.886 and the confidence score of face B is 0.998. Since the confidence score of face B is higher than that of mobile phone A, face B is arranged before mobile phone A, and the search result of face B is fed back first. For example, the search result of face B can be displayed before the search result of mobile phone A, as shown in (b) of FIG. 4.
  • an initial search is performed on the N objects in the first image to determine their confidence scores, and the objects are then filtered according to the confidence scores, so as to filter out objects whose intent understanding does not match the search results, thereby improving the accuracy of the intent search. In addition, the remaining objects are sorted and displayed based on the confidence scores, so that the search results of the objects that the user may be interested in are displayed preferentially, which improves the user experience.
  • FIG. 14 is a flowchart of a method for feeding back search results according to an exemplary embodiment.
  • the related process of determining the arrangement order of objects involved in the method is independently implemented by a third processing unit.
  • the method may include some or all of the following:
  • For the specific implementation of steps 1401 to 1402, reference may be made to steps 901 to 902 in the above embodiment of FIG. 9.
  • Step 1403 Determine the object relationship score of each of the N objects in the first image.
  • the object relationship score of any one of the N objects is used to indicate the importance of any one object in the first image.
  • the first image may be divided into multiple regions. Exemplarily, the first image may be divided into N*N regions, where N is an integer greater than 1 and can be set according to actual needs.
  • a preset score is set for each region starting from the central region. The preset score of each region indicates the importance of the position of that region, and the preset scores of at least two of the multiple regions are different.
  • objects closer to the central region are more likely to be of interest to the user. Therefore, the preset score of the central region is set to the highest value, and the preset scores gradually decrease toward the surrounding regions.
  • the preset score of each region may be set according to a Gaussian distribution, and the Gaussian distribution of the preset score is shown in FIG. 16 .
  • the electronic device determines a position importance value for each of the N objects.
  • the first object is any one of the N objects.
  • the average value of the preset scores of the regions included in the object frame of the first object is determined, and the obtained score is used as the position importance value of the first object.
  • the N objects in the first image include a face, a mobile phone and clothes. By calculation, the position importance value of the face in the first image is determined to be 0.78, that of the mobile phone 0.7, and that of the clothes 0.68.
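  • The position importance computation can be sketched as follows: build a grid of preset scores that follow a Gaussian centered on the image, then average the scores of the grid cells covered by the object frame. The grid size, the spread parameter `sigma`, the integer pixel box format and the function names are assumptions made for illustration; the sketch does not reproduce the values 0.78, 0.7 and 0.68 above.

```python
import math

def gaussian_grid(grid_size, sigma=0.5):
    """Preset scores for a grid_size x grid_size grid, highest at the center.

    Cell centers are mapped to [-1, 1] on both axes; sigma is a hypothetical
    spread parameter.
    """
    scores = []
    for row in range(grid_size):
        y = (row + 0.5) / grid_size * 2.0 - 1.0
        scores.append([])
        for col in range(grid_size):
            x = (col + 0.5) / grid_size * 2.0 - 1.0
            scores[row].append(math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)))
    return scores

def position_importance(box, image_size, grid):
    """Average preset score of the grid cells covered by the object frame.

    box: (left, top, right, bottom) in integer pixels, right/bottom exclusive.
    image_size: (width, height) in pixels.
    """
    grid_size = len(grid)
    width, height = image_size
    left, top, right, bottom = box
    col_start = int(left * grid_size // width)
    col_end = min(grid_size - 1, int((right - 1) * grid_size // width))
    row_start = int(top * grid_size // height)
    row_end = min(grid_size - 1, int((bottom - 1) * grid_size // height))
    cells = [grid[r][c]
             for r in range(row_start, row_end + 1)
             for c in range(col_start, col_end + 1)]
    return sum(cells) / len(cells)

grid = gaussian_grid(5)
center_value = position_importance((40, 40, 60, 60), (100, 100), grid)
corner_value = position_importance((0, 0, 20, 20), (100, 100), grid)
```

  • In this sketch an object frame lying in the central cell receives the highest value, and a frame in a corner cell receives a much lower one, matching the intent that central objects are more likely to interest the user.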
  • an object relationship score for each object is determined based on the positional importance value of each object.
  • the position importance value of each object is determined as the object relationship score of each object, for example, the position importance value of the first object is determined as the object relationship score of the first object.
  • the first image may further include a reference object, and the reference object may be set according to actual requirements, for example, the reference object may be a human body.
  • taking the reference object being a human body as an example, when the first image includes a human body, objects closer to the human body are usually objects that the user may be interested in, while objects farther from the human body are usually objects that the user is not interested in. Therefore, when determining the object relationship scores of the N objects in the first image, if the first image includes a reference object, the distance between the first object and the reference object in the first image can also be obtained based on the object area of the first object in the first image and the object area of the reference object in the first image, and used as the affiliation degree value of the first object.
  • for example, when the reference object is a human body, the affiliation degree value of the mobile phone is determined to be 0.42.
  • when determining the distance between the first object and the reference object, the distance between the center point of the first object and the center point of the reference object may be determined.
  • when the size of the first image is adjusted, the distance between the center point of the first object and the center point of the reference object changes accordingly. Therefore, to ensure that the same affiliation degree value is determined for first images of different sizes, the distance can be normalized, for example, by the diagonal length of the first image.
  • its degree of affiliation can be set according to actual needs, for example, it can be set to 1.
  • the degree of affiliation of a face is 1.
  • when the first image includes multiple reference objects, the distance between the first object and each reference object in the first image is determined to obtain multiple distances, and the minimum of the multiple distances is used as the affiliation degree value of the first object.
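  • A minimal sketch of the affiliation computation, assuming object areas are axis-aligned boxes given as (left, top, right, bottom): the center-to-center distance to each reference object is normalized by the image diagonal, and the smallest normalized distance is returned. The box format, function names and coordinates are hypothetical.

```python
import math

def box_center(box):
    """Center point of a (left, top, right, bottom) box."""
    left, top, right, bottom = box
    return ((left + right) / 2.0, (top + bottom) / 2.0)

def affiliation_value(object_box, reference_boxes, image_size):
    """Smallest center-to-center distance to any reference object,
    normalized by the image diagonal so that it does not change when
    the image is resized."""
    width, height = image_size
    diagonal = math.hypot(width, height)
    ox, oy = box_center(object_box)
    distances = []
    for ref in reference_boxes:
        rx, ry = box_center(ref)
        distances.append(math.hypot(ox - rx, oy - ry) / diagonal)
    return min(distances)

# Hypothetical boxes in a 300 x 400 image with two reference objects.
phone_box = (0, 0, 60, 80)
bodies = [(60, 80, 120, 160), (200, 300, 280, 380)]
value = affiliation_value(phone_box, bodies, (300, 400))

# The same scene scaled by a factor of 2 yields the same value.
scaled_value = affiliation_value(
    (0, 0, 120, 160), [(120, 160, 240, 320), (400, 600, 560, 760)], (600, 800))
```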
  • the electronic device determines the object relationship score of the first object according to the position importance value and the affiliation value of the first object.
  • a certain weight may be set for the position importance degree value and the affiliation degree value respectively, and then the object relationship score of the first object is obtained by merging in an equal weight manner.
  • the weight of the position importance degree value and the weight of the affiliation degree value may be set according to actual requirements, which are not limited in this embodiment of the present application.
  • Step 1404 Determine the arrangement order of the N objects according to the object relationship scores of the N objects in the first image.
  • the arrangement order of the N objects is determined according to the size of the object relationship scores of the N objects.
  • Step 1405 Feed back the search results of some or all of the N objects according to the arrangement order of the N objects.
  • the search object may be determined from among N objects.
  • in one embodiment, the search objects include some of the N objects, and in another embodiment, the search objects include all of the N objects.
  • a preset number of objects that are ranked first from the N objects may be selected as the search objects.
  • the search object may be determined according to the object relationship score.
  • if the object relationship score is high, the corresponding object is more important and may be an object of interest to the user.
  • if the object relationship score is low, the corresponding object is less important, that is, it may not be an object of interest to the user. Therefore, the electronic device can filter out objects whose object relationship scores are lower than the scoring threshold, and determine objects whose object relationship scores are higher than the scoring threshold as search objects.
  • the scoring threshold may be set by the user according to actual needs, or may also be set by default by the electronic device, which is not limited in this embodiment of the present application.
  • the search objects are selected from the first image according to the object relationship scores of the objects in the first image, so that the intent search is performed only on the more important objects in the first image. This provides the user with the search results of the search objects of interest and can improve the user experience.
  • an intent search is performed on the search object, a search result is obtained, and the search result of the search object is fed back.
  • the search objects in the first image include face B and mobile phone A, and the obtained search results include the search result of face B and the search result of mobile phone A. The search result of face B is given priority. For example, the search result of face B can be displayed before the search result of mobile phone A.
  • the object relationship score of each object is determined based on the affiliation degree value, so that the search intent for an object is associated with the attributes of the object. In this way, the search results of the search objects that the user may be interested in can be preferentially recommended to the user, thereby improving the user experience.
  • FIG. 17 is a flowchart of a method for feeding back search results according to an exemplary embodiment.
  • the related process of determining the arrangement order of objects involved in the method is independently implemented by the first processing unit.
  • the method may include some or all of the following:
  • For the specific implementation of steps 1701 to 1702, reference may be made to steps 901 to 902 in the above-mentioned embodiment of FIG. 9.
  • Step 1703 Perform scene recognition on the first image.
  • scene categories may include, but are not limited to, character posters, bars, restaurants, coffee houses, cinemas, conference rooms, outdoor scenery, and outdoor landmarks.
  • the first processing unit may perform scene recognition on the first image through a scene recognition model. The scene recognition model may be a pre-trained model that can determine the corresponding scene category for any image.
  • the first image may be input into the scene recognition model, the scene recognition model performs scene recognition on the first image, and outputs the scene category corresponding to the first image.
  • the scene category corresponding to the first image is determined to be a character poster by the scene recognition model.
  • the training process of the scene recognition model may include: acquiring a scene training sample set, where the scene training sample set includes a plurality of scene training samples, and each scene training sample includes a scene image marked with a scene category label.
  • the scene training sample set is input into the model to be trained for training, and the training ends when the preset training conditions are met, so that the scene recognition model can be obtained.
  • Preset training conditions can be set according to actual needs.
  • the preset training condition may mean that the number of training times reaches the threshold of times, or the preset training condition may also mean that the scene recognition accuracy of the trained model reaches the accuracy threshold.
  • the number of times threshold may be set by the user according to actual needs, or may also be set by default by the electronic device, which is not limited in this embodiment of the present application.
  • the accuracy threshold may be set by the user according to actual needs, or may also be set by default by the electronic device, which is not limited in this embodiment of the present application.
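  • The two preset training conditions can be sketched as a loop that stops when either the number of training rounds reaches the times threshold or the measured accuracy reaches the accuracy threshold. Checking both conditions in one loop is a choice made for this sketch (the text presents them as alternatives), and the callables, parameter names and the stand-in model below are hypothetical.

```python
def train_until_done(train_one_round, evaluate_accuracy,
                     max_rounds=100, accuracy_threshold=0.95):
    """Run training rounds until either preset training condition is met.

    train_one_round: callable performing one pass over the scene training samples.
    evaluate_accuracy: callable returning the current scene recognition accuracy.
    Returns (rounds_run, final_accuracy).
    """
    accuracy = 0.0
    for round_index in range(1, max_rounds + 1):
        train_one_round()
        accuracy = evaluate_accuracy()
        if accuracy >= accuracy_threshold:
            return round_index, accuracy
    return max_rounds, accuracy

# Stand-in "model" whose accuracy improves by 0.25 per round, for illustration only.
state = {"accuracy": 0.0}

def fake_round():
    state["accuracy"] = min(1.0, state["accuracy"] + 0.25)

rounds_run, final_accuracy = train_until_done(
    fake_round, lambda: state["accuracy"], max_rounds=10, accuracy_threshold=0.7)
```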
  • the model to be trained may be a classification network model.
  • the classification network model may be a model combining mobilenetV3 and SSD, that is, a model formed by using mobilenetV3 as the network foundation and using the structure of SSD.
  • the model to be trained is a model combining mobilenetV3 and SSD.
  • the model to be trained may also be another classification network model, which is not limited in this embodiment of the present application.
  • the training of the scene recognition model may be performed by the electronic device, or may be performed by another device, after which the trained scene recognition model is stored in the first processing unit of the electronic device.
  • Step 1704 Determine the arrangement order of the N objects in the first image according to the scene category of the first image and the object categories of the N objects in the first image.
  • different scenes correspond to at least one scene intent weight.
  • the scene intent weight of any object is used to indicate the probability of any object being searched in the scene of the first image, and at least one scene intent weight corresponding to each scene can be preset according to actual requirements.
  • at least one scene intent weight corresponding to different scenes is shown in Table 2:
  • Each search intent corresponds to an object category.
  • the electronic device may determine at least one corresponding scene intent weight according to the scene category, and in addition, determine the corresponding search intent according to the object categories of the N objects in the first image, so that the scene intent weight of each of the N objects may be determined.
  • for example, the scene category of the first image is "character poster", and the N objects in the first image include a face, a mobile phone and clothes. The object categories of the N objects are people, commodities and items, and the corresponding search intents are celebrities, shopping, and recognizing things, respectively. According to Table 1 above, it can be determined that the scene intent weights of the objects are 1.0, 0.9, and 0.8, respectively.
  • the arrangement order of the N objects may be determined according to the size of the scene intent weights of the N objects.
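  • The lookup described above (scene category and object category to scene intent weight, then sorting by weight) can be sketched with two dictionaries. Only the "character poster" weights 1.0, 0.9 and 0.8 come from the example in this embodiment; the "restaurant" row, the dictionary names and the function name are invented for illustration.

```python
# Mapping from object category to search intent, following the example above.
INTENT_BY_CATEGORY = {
    "person": "celebrity",
    "commodity": "shopping",
    "item": "recognizing things",
}

# Scene intent weights per scene category. Only the "character poster" row
# uses values quoted in the text; the "restaurant" row is hypothetical.
SCENE_INTENT_WEIGHTS = {
    "character poster": {"celebrity": 1.0, "shopping": 0.9, "recognizing things": 0.8},
    "restaurant": {"celebrity": 0.3, "shopping": 0.6, "recognizing things": 0.9},
}

def rank_by_scene_intent(scene_category, object_categories):
    """Return (object, weight) pairs sorted by descending scene intent weight."""
    weights = SCENE_INTENT_WEIGHTS[scene_category]
    scored = [(name, weights[INTENT_BY_CATEGORY[category]])
              for name, category in object_categories.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

objects = {"mobile phone": "commodity", "face": "person", "clothes": "item"}
ranking = rank_by_scene_intent("character poster", objects)
```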
  • Step 1705 Feed back the search results of some or all of the N objects according to the arrangement order of the N objects.
  • the electronic device selects, from the first image, the objects whose scene intent weight is greater than a weight threshold, and determines the selected objects as search objects.
  • the weight threshold may be set by the user according to actual needs, or may also be set by default by the electronic device, which is not limited in this embodiment of the present application.
  • an intent search is performed on the search object in the first image, a search result is obtained, and the search result of the search object is fed back.
  • the electronic device may sort the search intent according to the scene intent weight of each search object.
  • the sorting results are: celebrity, shopping. That is, the most likely search intent corresponding to the first image is celebrity search, followed by shopping, for example, purchasing the mobile phone in the first image. Since the scene intent weight of celebrity search is higher than that of shopping, face B is arranged before mobile phone A, so in the process of feeding back the search results, the search result of face B is given priority. For example, the search result of face B can be displayed before the search result of mobile phone A.
  • the scene category of the first image is determined, the search intent ranking of the search objects in the first image is then determined according to the scene category, and the search results of the search objects in the first image are displayed according to the search intent ranking.
  • in this way, the search results are combined with the scene and are prevented from being detached from the scene of the first image, so that the search results displayed to the user are more in line with the scene of the first image and the results most relevant to that scene are presented to the user first.
  • the above embodiments are described by taking as an example the case in which the process of determining the arrangement order of objects is implemented independently by a single processing unit.
  • in other embodiments, the process of determining the arrangement order of objects involved in the method provided by the embodiments of the present application can be implemented by any two of the above three processing units in combination, or by all three processing units in combination.
  • next, the specific implementation of the method provided by the embodiments of the present application is introduced by taking the combined implementation of the three processing units as an example:
  • FIG. 18 is a schematic flowchart of a method for feeding back search results provided by an embodiment of the present application.
  • the method may include some or all of the following contents:
  • For the specific implementation of steps 1801 to 1802, reference may be made to steps 901 to 902 in the embodiment shown in FIG. 9 above.
  • Step 1803 Determine the scene intent weight of each of the N objects by the first processing unit.
  • the scene intent weight is used to indicate the probability that the first object is searched in the scene of the first image, and the first object is any one of the N objects. For its specific implementation, refer to the embodiment shown in FIG. 17.
  • Step 1804 Determine the confidence score of each of the N objects by the second processing unit.
  • the confidence score is the similarity between the first object and the image in the image library. For its specific implementation, refer to the embodiment shown in FIG. 9.
  • Step 1805 Determine the object relationship score of each of the N objects by the third processing unit.
  • the object relationship score is used to indicate the degree of importance of the first object in the first image.
  • the object relationship score is determined based on the position importance value, or based on the position importance value and the affiliation degree value. For its specific implementation, refer to the embodiment shown in FIG. 14.
  • the operations of the first processing unit, the second processing unit and the third processing unit do not need to be performed in any particular order. For example, they may be performed in parallel.
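  • Because the three scores do not depend on one another, they can be computed concurrently. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the three scoring functions are placeholders that simply read precomputed values and stand in for the first, second and third processing units. The dictionary keys and the use of 0.78 as the relationship score are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def scene_intent_weight(obj):
    return obj["scene_weight"]      # placeholder for the first processing unit

def confidence_score(obj):
    return obj["confidence"]        # placeholder for the second processing unit

def object_relationship_score(obj):
    return obj["relationship"]      # placeholder for the third processing unit

def score_object(obj):
    """Run the three independent scoring steps concurrently for one object."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {
            "scene_weight": pool.submit(scene_intent_weight, obj),
            "confidence": pool.submit(confidence_score, obj),
            "relationship": pool.submit(object_relationship_score, obj),
        }
        return {name: future.result() for name, future in futures.items()}

face = {"scene_weight": 1.0, "confidence": 0.998, "relationship": 0.78}
face_scores = score_object(face)
```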
  • Step 1806 Determine the priority of the first object based on the confidence score, the object relationship score, and the scene intent weight of the first object.
  • the priority of the first object corresponds to the sorting order of the first object, that is, the higher the priority of the first object, the higher the sorting order.
  • the first object is any one of the N objects, that is, the first object is used as an example for description.
  • the priority of each object is determined based on the confidence score, the object relationship score, and the scene intent weight of that object.
  • taking as an example the case in which the object relationship score is determined based on the position importance value and the affiliation degree value, with S denoting the scene intent weight, F denoting the confidence score of the object, and L and H denoting the position importance value and the affiliation degree value of the object respectively, the priority value of each object can be determined by the following formula (2), where the priority value is used to indicate the priority:
  • where f_n represents the priority value of object n, F_n represents the confidence score of object n, L_n represents the position importance value of object n, H_n represents the affiliation degree value of object n, S_ncI represents the scene intent weight corresponding to the object category of object n in the scene of the first image, and x, y, k and z are preset values.
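  • The expression of formula (2) is not reproduced in this text, so the following sketch should not be read as that formula. It assumes, purely for illustration, that the priority value is a linear combination of the four quantities with x, y, k and z as coefficients; the actual formula may combine them differently. The input values for face B and mobile phone A are the example values given in the embodiments above.

```python
def priority_value(F, L, H, S, x=1.0, y=1.0, k=1.0, z=1.0):
    """Assumed linear combination of the four scores; NOT the source's formula (2).

    F: confidence score, L: position importance value,
    H: affiliation degree value, S: scene intent weight.
    x, y, k, z: preset values, used here as hypothetical coefficients.
    """
    return x * F + y * L + k * H + z * S

objects = {
    "face B": {"F": 0.998, "L": 0.78, "H": 1.0, "S": 1.0},
    "mobile phone A": {"F": 0.886, "L": 0.7, "H": 0.42, "S": 0.9},
}
ranked = sorted(objects, key=lambda name: priority_value(**objects[name]), reverse=True)
```

  • Under this assumed combination, face B ranks ahead of mobile phone A, which agrees with the ordering used in the examples of the individual embodiments.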
  • Step 1807 Determine a search object from the first image based on the priorities of the N objects.
  • the search object includes some or all of the N objects included in the first image.
  • Step 1808 Obtain the search result of the search object.
  • Step 1809 Feed back the search result of the search object.
  • the scene category of the first image is determined, the confidence scores and the object relationship scores of the objects in the first image are determined, and the priority of each object is then determined based on the scene intent weight corresponding to the scene category, the confidence score, and the object relationship score of the object. The search results of the objects in the first image are displayed in order of priority, so that the search results of the objects that the user may be interested in are displayed as far as possible, which improves the accuracy of the intent search and thereby the user experience.
  • FIG. 19 is a structural block diagram of an apparatus for feeding back search results provided by the embodiments of the present application.
  • the device includes:
  • an acquisition module 1910 configured to acquire a first image, where the first image includes M objects, where M is an integer greater than or equal to 2;
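  • the determining module 1920 is configured to, for N objects among the M objects, determine the arrangement order of the N objects when N is greater than or equal to 2, where N is a positive integer less than or equal to M;
  • the arrangement order of any one of the N objects is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score, where the scene intent weight is used to indicate the probability of the object being searched in the scene corresponding to the first image, the confidence score is the similarity between the object and the images in the image library, and the object relationship score is used to indicate the importance of the object in the first image;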
  • the feedback module 1930 is configured to feed back the search results of some or all of the N objects according to the arrangement order of the N objects.
  • the determining module 1920 is used to:
  • a confidence score of the first object is determined based on the object area of the first object in the first image and the object category of the first object, and/or an object relationship score of the first object is determined based on the object area of the first object in the first image, and/or a scene intent weight of the first object is determined based on the object category of the first object.
  • the determining module 1920 is used to:
  • the largest similarity among the determined multiple similarities is used as the confidence score of the first object.
  • the first image includes a plurality of regions, each of the plurality of regions has a preset score indicating the degree of importance of the position of that region, and the preset scores of at least two of the plurality of regions are different; the determining module 1920 is used for:
  • An object relationship score of the first object is determined based on the positional importance value of the first object.
  • the determining module 1920 is used to:
  • When a reference object is included in the first image, the distance between the first object and the reference object in the first image is acquired, based on the object area of the first object in the first image and the object area of the reference object in the first image, as the affiliation degree value of the first object;
  • An object relationship score of the first object is determined based on the positional importance degree value and the affiliation degree value of the first object.
  • the determining module 1920 is used to:
  • the scene intent weight of the first object is determined based on the scene category of the first image, the object category of the first object, and the correspondence between the scene category, the object category, and the scene intent weight.
  • the quality scores of the N objects in the first image are greater than or equal to a quality score threshold, and the quality score of any one of the objects is determined based on the blurriness and/or completeness of that object.
  • the feedback module 1930 is used to:
  • Based on the acquired object tags, generate query information corresponding to some or all of the N objects, where the query information is used to prompt whether information associated with the object tags needs to be acquired;
  • the query information of some or all of the N objects is displayed, so as to feed back the search results of some or all of the N objects.
  • the N objects include a second object and a third object, and the arrangement order of the second object is before that of the third object. Accordingly, the search result corresponding to the second object is displayed on the display screen before the search result corresponding to the third object.
  • a first image is acquired, where the first image includes M objects, where M is an integer greater than or equal to 2.
  • For N objects among the M objects, when N is greater than or equal to 2, the arrangement order of the N objects is determined, and the search results of some or all of the N objects are fed back according to the arrangement order of the N objects.
  • the arrangement order of any one of the N objects is determined based on any one or more of the scene intent weight, the confidence score, and the object relationship score, where the scene intent weight is used to indicate the probability of the object being searched in the scene corresponding to the first image, the confidence score is the similarity between the object and the images in the image library, and the object relationship score is used to indicate the importance of the object in the first image.
  • the N objects are filtered and sorted according to their arrangement order, so that the search results of objects that have good search results and that the user may be interested in are preferentially fed back to the user. The feedback is therefore more targeted, which improves the accuracy of the search results that are finally fed back.
  • the disclosed apparatus and method may be implemented in other manners.
  • the system embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • all or part of the processes in the methods of the above embodiments can be implemented by the present application, which can be completed by instructing the relevant hardware through a computer program, and the computer program can be stored in a computer-readable storage medium.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form, and the like.
  • the computer-readable medium may include at least: any entity or device capable of carrying computer program code to an electronic device, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals, and software distribution media, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc.
  • computer readable media may not be electrical carrier signals and telecommunications signals.

Abstract

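A method, an apparatus, and a storage medium for feeding back search results, relating to the field of terminal technologies. The method includes: acquiring a first image that includes M objects and, for N objects among the M objects, determining an arrangement order of the N objects when N is greater than or equal to 2. The arrangement order of any one object is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score, where the scene intent weight indicates the probability that the object is searched in the scene corresponding to the first image, the confidence score is the similarity between the object and images in an image library, and the object relationship score indicates the importance of the object in the first image. Search results of some or all of the N objects are fed back according to the arrangement order of the N objects. Search results of objects that have good search results and that the user may be interested in are fed back to the user first, so that the feedback is targeted.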

Description

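Method, Apparatus, and Storage Medium for Feeding Back Search Results
This application claims priority to Chinese patent application No. 202110222937.3, entitled "Method, Apparatus, and Storage Medium for Feeding Back Search Results" and filed with the China National Intellectual Property Administration on February 26, 2021, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of terminal technologies, and in particular, to a method, an apparatus, and a storage medium for feeding back search results.
Background
Visual search uses an image as the search input in order to obtain various search results, such as related images and text, for the objects in the image. A growing number of users rely on visual search on electronic devices to search for objects.
Currently, in a visual search, an electronic device acquires an image and then performs image detection on it to determine one or more objects in the image. The electronic device searches for the determined objects and obtains search results. It then feeds back the search result of each object to the user in the order of the objects' positions in the image.
However, the search results fed back in this way cover every object in the image, so the feedback is poorly targeted and the accuracy of the search results finally fed back to the user is low.
Summary
This application provides a method, an apparatus, and a storage medium for feeding back search results, which address the problem in the prior art that the search results fed back to the user have low accuracy.
To achieve the foregoing objective, this application adopts the following technical solutions: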
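According to a first aspect, a method for feeding back search results is provided. The method includes:
acquiring a first image, where the first image includes M objects and M is an integer greater than or equal to 2;
for N objects among the M objects, determining an arrangement order of the N objects when N is greater than or equal to 2, where N is a positive integer less than or equal to M;
where the arrangement order of any one of the N objects is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score, the scene intent weight indicates the probability that the object is searched in the scene corresponding to the first image, the confidence score is the similarity between the object and images in an image library, and the object relationship score indicates the importance of the object in the first image; and
feeding back search results of some or all of the N objects according to the arrangement order of the N objects.
By filtering and sorting the N objects according to their arrangement order, this application feeds back first the search results of objects that have good search results and that the user may be interested in, so that the feedback is targeted and the accuracy of the search results finally fed back is improved.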
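As an example of this application, any one or more of the scene intent weight, the confidence score, and the object relationship score of any one of the N objects are determined as follows:
performing image detection on the first image with a target detection model to obtain the object region of a first object in the first image and/or the object category of the first object, where the first object is any one of the N objects; and
determining the confidence score of the first object based on the object region of the first object in the first image and the object category of the first object, and/or determining the object relationship score of the first object based on the object region of the first object in the first image, and/or determining the scene intent weight of the first object based on the object category of the first object.
This application performs image detection on the first image with the target detection model to obtain the object region and/or object category of the first object in the first image, determines at least one of the scene intent weight, confidence score, and object relationship score of the first object from the object region and/or object category, and then uses at least one of these values for each object to filter and sort the objects in the first image, so that objects the user may be interested in are accurately fed back to the user.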
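As an example of this application, determining the confidence score of the first object based on the object region of the first object in the first image and the object category of the first object includes:
determining the image library corresponding to the object category of the first object;
determining the similarity between the object region of the first object in the first image and each of multiple images included in the image library; and
taking the maximum of the determined similarities as the confidence score of the first object.
In this way, search objects are screened from the first image according to the confidence score, so that intent understanding is consistent with the search results, which can improve the accuracy of the intent search.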
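As an example of this application, the first image includes multiple regions, each region has a preset score indicating the positional importance of that region, and at least two of the regions have different preset scores;
determining the object relationship score of the first object based on the object region of the first object in the first image includes:
determining a positional importance value of the first object in the first image based on the preset score of each region included in the object region of the first object in the first image; and
determining the object relationship score of the first object based on the positional importance value of the first object.
By determining the positional importance values of the N objects in the first image in two-dimensional space, determining object relationship scores from those values, and screening search objects from the first image by object relationship score, the intent search is performed only on the more important objects in the first image, so that the user is given search results for the search objects of interest, which can improve the user experience.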
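The positional importance value can be illustrated with a minimal sketch. It assumes the image is divided into a 3x3 grid whose preset scores are highest in the center, and that an object's positional importance is the mean of the preset scores of the grid cells its bounding box overlaps. The grid size, the score values, the averaging rule, and the function names are all illustrative assumptions; this passage only states that the value is derived from the preset scores of the regions the object covers.

```python
# Hypothetical preset scores for a 3x3 division of the image (center is most important).
PRESET_SCORES = [
    [0.2, 0.5, 0.2],
    [0.5, 1.0, 0.5],
    [0.2, 0.5, 0.2],
]


def covered_cells(box, image_size, grid):
    """Return the (row, col) grid cells that the box (x0, y0, x1, y1) overlaps."""
    x0, y0, x1, y1 = box
    width, height = image_size
    rows, cols = len(grid), len(grid[0])
    cell_w, cell_h = width / cols, height / rows
    cells = []
    for r in range(rows):
        for c in range(cols):
            cx0, cy0 = c * cell_w, r * cell_h
            cx1, cy1 = cx0 + cell_w, cy0 + cell_h
            # Strict inequalities: touching an edge does not count as overlap.
            if x0 < cx1 and x1 > cx0 and y0 < cy1 and y1 > cy0:
                cells.append((r, c))
    return cells


def position_importance(box, image_size, grid=PRESET_SCORES):
    """Assumed rule: average the preset scores of all overlapped cells."""
    cells = covered_cells(box, image_size, grid)
    if not cells:
        return 0.0
    return sum(grid[r][c] for r, c in cells) / len(cells)


center_value = position_importance((110, 110, 190, 190), (300, 300))
corner_value = position_importance((0, 0, 50, 50), (300, 300))
```

Under these assumptions an object in the middle of the image receives a higher value than one in a corner, which matches the stated intent of favoring centrally placed objects.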
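As an example of this application, before the object relationship score of the first object is determined based on the positional importance value of the first object, the method further includes:
when the first image includes a reference object, obtaining, based on the object region of the first object in the first image and the object region of the reference object in the first image, the distance between the first object and the reference object in the first image as an affiliation degree value of the first object;
determining the object relationship score of the first object based on the positional importance value of the first object includes:
determining the object relationship score of the first object based on the positional importance value and the affiliation degree value of the first object.
By determining the positional importance values of the N objects in the first image in two-dimensional space and the affiliation degree value between each object and the reference object, and then determining each object's object relationship score from both values, the search intent for an object is tied to the object's attributes, so that search results for search objects the user may be interested in are recommended first, which improves the user experience.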
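As an example of this application, determining the scene intent weight of the first object based on the object category of the first object includes:
determining the scene category of the first image; and
determining the scene intent weight of the first object based on the scene category of the first image, the object category of the first object, and a correspondence among scene categories, object categories, and scene intent weights.
By determining the scene category of the first image, ordering the search objects in the first image by search intent according to the scene category, and displaying their search results in that order, the search results are combined with the scene. This brings the results closer to the scene of the first image and keeps them from being detached from it, so that search results the user may be interested in are recommended first, which improves the user experience.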
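The correspondence among scene categories, object categories, and scene intent weights can be pictured as a lookup table. The sketch below is only an illustration: the scene names, the weight values, and the choice to return 0.0 for a pair that is missing from the table are assumptions, not values given in this application.

```python
# Hypothetical correspondence: (scene category, object category) -> scene intent weight.
SCENE_INTENT_WEIGHTS = {
    ("poster", "face"): 0.6,
    ("poster", "commodity"): 0.3,
    ("poster", "clothing"): 0.1,
    ("street", "landmark"): 0.7,
    ("street", "face"): 0.1,
}


def scene_intent_weight(scene_category, object_category,
                        table=SCENE_INTENT_WEIGHTS, default=0.0):
    """Look up the weight for an object category in a given scene category."""
    return table.get((scene_category, object_category), default)


# Order the object categories detected in a "poster" scene by their weights.
detected = ["commodity", "clothing", "face"]
weights = {c: scene_intent_weight("poster", c) for c in detected}
ranked = sorted(detected, key=weights.get, reverse=True)
```

A table keeps the scene-dependent behavior in data rather than code, so the same object category can rank differently in different scenes.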
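As an example of this application, the quality scores of the N objects in the first image are greater than or equal to a quality score threshold, and the quality score of any one object is determined based on the blur degree and/or completeness of that object.
Filtering out objects with poor image quality ties intent understanding to object attributes and avoids searching for objects that do not match the search intent, which can improve the effectiveness and accuracy of the search and thereby the user experience.
As an example of this application, feeding back the search results of some or all of the N objects according to the arrangement order of the N objects includes:
obtaining the object labels included in the search results of some or all of the N objects;
generating, based on the obtained object labels, inquiry information corresponding to some or all of the N objects, where the inquiry information prompts whether information associated with an object label needs to be obtained; and
displaying the inquiry information of some or all of the N objects according to the arrangement order of the N objects, so as to feed back the search results of some or all of the N objects.
Generating inquiry information from object labels and feeding back search results through the inquiry information lets the user quickly understand the feedback and keeps the feedback interface simple.
As an example of this application, the N objects include a second object and a third object, the second object precedes the third object in the arrangement order, and the search result corresponding to the second object is displayed on the display screen before the search result corresponding to the third object.
Displaying first the search results of objects that are earlier in the arrangement order lets the user view those results first, so that search results of objects the user may be interested in are fed back first, which improves the user experience.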
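According to a second aspect, an apparatus for feeding back search results is provided. The apparatus includes:
an acquisition module, configured to acquire a first image, where the first image includes M objects and M is an integer greater than or equal to 2;
a determining module, configured to determine, for N objects among the M objects, an arrangement order of the N objects when N is greater than or equal to 2, where N is a positive integer less than or equal to M;
where the arrangement order of any one of the N objects is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score, the scene intent weight indicates the probability that the object is searched in the scene corresponding to the first image, the confidence score is the similarity between the object and images in an image library, and the object relationship score indicates the importance of the object in the first image; and
a feedback module, configured to feed back search results of some or all of the N objects according to the arrangement order of the N objects.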
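As an example of this application, the determining module is configured to:
perform image detection on the first image with a target detection model to obtain the object region of a first object in the first image and/or the object category of the first object, where the first object is any one of the N objects; and
determine the confidence score of the first object based on the object region of the first object in the first image and the object category of the first object, and/or determine the object relationship score of the first object based on the object region of the first object in the first image, and/or determine the scene intent weight of the first object based on the object category of the first object.
As an example of this application, the determining module is configured to:
determine the image library corresponding to the object category of the first object;
determine the similarity between the object region of the first object in the first image and each of multiple images included in the image library; and
take the maximum of the determined similarities as the confidence score of the first object.
As an example of this application, the first image includes multiple regions, each region has a preset score indicating the positional importance of that region, and at least two of the regions have different preset scores; the determining module is configured to:
determine a positional importance value of the first object in the first image based on the preset score of each region included in the object region of the first object in the first image; and
determine the object relationship score of the first object based on the positional importance value of the first object.
As an example of this application, the determining module is configured to:
when the first image includes a reference object, obtain, based on the object region of the first object in the first image and the object region of the reference object in the first image, the distance between the first object and the reference object in the first image as an affiliation degree value of the first object; and
determine the object relationship score of the first object based on the positional importance value and the affiliation degree value of the first object.
As an example of this application, the determining module is configured to:
determine the scene category of the first image; and
determine the scene intent weight of the first object based on the scene category of the first image, the object category of the first object, and a correspondence among scene categories, object categories, and scene intent weights.
As an example of this application, the quality scores of the N objects in the first image are greater than or equal to a quality score threshold, and the quality score of any one object is determined based on the blur degree and/or completeness of that object.
As an example of this application, the feedback module is configured to:
obtain the object labels included in the search results of some or all of the N objects;
generate, based on the obtained object labels, inquiry information corresponding to some or all of the N objects, where the inquiry information prompts whether information associated with an object label needs to be obtained; and
display the inquiry information of some or all of the N objects according to the arrangement order of the N objects, so as to feed back the search results of some or all of the N objects.
As an example of this application, the N objects include a second object and a third object, the second object precedes the third object in the arrangement order, and the search result corresponding to the second object is displayed on the display screen before the search result corresponding to the third object.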
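According to a third aspect, an electronic device is provided. The electronic device includes a processor and a memory. The memory is configured to store a program that supports the electronic device in performing the method for feeding back search results according to any implementation of the first aspect, and to store data involved in implementing that method. The processor is configured to execute the program stored in the memory. The electronic device may further include a communication bus configured to establish a connection between the processor and the memory.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to any implementation of the first aspect.
According to a fifth aspect, a computer program product containing instructions is provided. When run on a computer, the computer program product causes the computer to perform the method for feeding back search results according to the first aspect.
The technical effects of the second, third, fourth, and fifth aspects are similar to those of the corresponding technical means in the first aspect and are not repeated here.
The technical solutions provided in this application can bring at least the following beneficial effects:
A first image is acquired, where the first image includes M objects and M is an integer greater than or equal to 2. For N objects among the M objects, when there are multiple such objects, an arrangement order of the N objects is determined, and search results of some or all of the N objects are fed back according to that order. The arrangement order of any one of the N objects is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score, where the scene intent weight indicates the probability that the object is searched in the scene corresponding to the first image, the confidence score is the similarity between the object and images in an image library, and the object relationship score indicates the importance of the object in the first image. In this way, the N objects are filtered and sorted according to their arrangement order, so that search results of objects that have good search results and that the user may be interested in are fed back first; the feedback is therefore targeted, which improves the accuracy of the search results finally fed back.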
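Brief Description of Drawings
FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 2 is a block diagram of the software structure of an electronic device according to an embodiment of this application;
FIG. 3 is an interface display diagram of a browser running on a mobile phone according to an embodiment of this application;
FIG. 4 is a schematic diagram of an intent search process according to an embodiment of this application;
FIG. 5 is a schematic diagram of a feedback effect of search results according to an embodiment of this application;
FIG. 6 is a schematic diagram of another feedback effect of search results according to an embodiment of this application;
FIG. 7 is a system architecture diagram for feeding back search results according to an embodiment of this application;
FIG. 8 is a schematic diagram of an implementation framework for feeding back search results according to an embodiment of this application;
FIG. 9 is a schematic flowchart of a method for feeding back search results according to an embodiment of this application;
FIG. 10 is a schematic diagram of an image detection process according to an embodiment of this application;
FIG. 11 is a schematic diagram of region images of objects obtained after image processing according to an embodiment of this application;
FIG. 12 is a schematic diagram of the blur degree and completeness of objects according to an embodiment of this application;
FIG. 13 is a schematic diagram of confidence scores of objects according to an embodiment of this application;
FIG. 14 is a schematic flowchart of another method for feeding back search results according to an embodiment of this application;
FIG. 15 is a schematic diagram of an image division effect according to an embodiment of this application;
FIG. 16 is a schematic diagram of a Gaussian distribution of preset scores according to an embodiment of this application;
FIG. 17 is a schematic flowchart of another method for feeding back search results according to an embodiment of this application;
FIG. 18 is a schematic flowchart of another method for feeding back search results according to an embodiment of this application;
FIG. 19 is a schematic structural diagram of an apparatus for feeding back search results according to an embodiment of this application.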
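Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the following describes the implementations of this application in further detail with reference to the accompanying drawings.
It should be understood that "multiple" in this application means two or more. In the description of this application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, to describe the technical solutions of this application clearly, words such as "first" and "second" are used to distinguish between identical or similar items whose functions and effects are basically the same. A person skilled in the art will understand that words such as "first" and "second" do not limit quantity or execution order, and do not require the items to be different.
The method provided in the embodiments of this application may be applied to an electronic device. In one embodiment, the electronic device may install and run an application (APP) with a visual search function; for example, the APP may be a shopping APP, a celebrity search APP, or a browser. As an example, the electronic device may be a wearable device or a terminal device. For example, wearable devices may include but are not limited to smart watches, smart bands, smart brooches, smart eye masks, and smart glasses. Terminal devices may include but are not limited to mobile phones, tablet computers, augmented reality (AR)/virtual reality (VR) devices, ultra-mobile personal computers (UMPC), notebook computers, netbooks, and personal digital assistants (PDA).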
请参阅图1,图1是本申请实施例提供的一种电子设备的结构示意图。
电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
可以理解的是,本申请实施例示意的结构并不构成对电子设备100的具体限定。在本申请另一些实施例中,电子设备100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
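The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, and so on through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate over the I2C bus interface to implement the touch function of the electronic device 100.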
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM总线接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模 块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194,摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现电子设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现电子设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194,无线通信模块160,音频模块170,传感器模块180等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本申请实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对电子设备100的结构限定。在本申请另一些实施例中,电子设备100也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过电子设备100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为电子设备供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实 施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
In some embodiments, antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time division-synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
电子设备100通过GPU,显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像,视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。
电子设备100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用 处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当电子设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。电子设备100可以支持一种或多种视频编解码器。这样,电子设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现电子设备100的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储电子设备100使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170C发声,将声音信号输入到麦克风170C。电子设备100可以设置至少一个麦克风170C。在另一些实施例中,电子设备100可以设置两个麦克风170C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备100还可以设置三个,四个或更多麦克风170C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器180A,电极之间的电容改变。电子设备100根据电容的变化确定压力的强度。当有触摸操作作用于显示屏194,电子设备100根据压力传感器180A检测所述触摸操作强度。电子设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定电子设备100的运动姿态。在一些实施例中,可以通过陀螺仪传感器180B确定电子设备100围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器180B检测电子设备100抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消电子设备100的抖动,实现防抖。陀螺仪传感器180B还可以用于导航,体感游戏场景。
气压传感器180C用于测量气压。在一些实施例中,电子设备100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。电子设备100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当电子设备100是翻盖机时,电子设备100可以根据磁传感器180D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测电子设备100在各个方向上(一般为三轴)加速度的大小。当电子设备100静止时可检测出重力的大小及方向。还可以用于识别电子设备100的姿态,应用于横竖屏切换,计步器等应用。
距离传感器180F,用于测量距离。电子设备100可以通过红外或激光测量距离。在一些实施例中,拍摄场景,电子设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。 发光二极管可以是红外发光二极管。电子设备100通过发光二极管向外发射红外光。电子设备100使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定电子设备100附近有物体。当检测到不充分的反射光时,电子设备100可以确定电子设备100附近没有物体。电子设备100可以利用接近光传感器180G检测用户手持电子设备100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器180L用于感知环境光亮度。电子设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测电子设备100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。电子设备100可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器180J用于检测温度。在一些实施例中,电子设备100利用温度传感器180J检测的温度,执行温度处理策略。例如,当温度传感器180J上报的温度超过阈值,电子设备100执行降低位于温度传感器180J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,电子设备100对电池142加热,以避免低温导致电子设备100异常关机。在其他一些实施例中,当温度低于又一阈值时,电子设备100对电池142的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器180K,也称“触控面板”。触摸传感器180K可以设置于显示屏194,由触摸传感器180K与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180K也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM 卡接口195拔出,实现和电子设备100的接触和分离。电子设备100可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口195可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口195可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。电子设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,电子设备100采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在电子设备100中,不能和电子设备100分离。
电子设备100的软件系统可以采用分层架构,事件驱动架构,微核架构,微服务架构,或云架构。本申请实施例以分层架构的Android系统为例,示例性说明电子设备100的软件结构。
图2是本申请实施例的电子设备100的软件结构框图。
分层架构将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将Android系统分为四层,从上至下分别为应用程序层,应用程序框架层,安卓运行时(Android runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包。
如图2所示,应用程序包可以包括相机,图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等应用程序,另外,应用程序包还可以包括具有视觉搜索功能的应用程序。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层包括一些预先定义的函数。
如图2所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器等。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备100的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(Media Libraries),三维图形处理库(例如:OpenGL ES),2D图形引擎(例如:SGL)等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
下面结合捕获拍照场景,示例性说明电子设备100软件以及硬件的工作流程。
当触摸传感器180K接收到触摸操作,相应的硬件中断被发给内核层。内核层将触摸操作加工成原始输入事件(包括触摸坐标,触摸操作的时间戳等信息)。原始输入事件被存储在内核层。应用程序框架层从内核层获取原始输入事件,识别输入事件所对应的控件。以触摸操作是触摸单击操作,单击操作所对应的控件为相机应用图标的控件为例,相机应用调用应用框架层的接口,启动相机应用,进而通过调用内核层启动摄像头驱动,通过摄像头193捕获静态图像或视频。
基于上述图1和图2所示实施例提供的电子设备,接下来对本申请实施例涉及的应用场景进行简单介绍,这里以电子设备是手机为例:
手机获取一张海报。作为一种示例,海报可以是由用户触发手机拍摄得到的,或者,也可以是手机的图库中的一张图像。示例性地,假设手机中安装了具有视觉搜索功能的浏览器,当用户想要基于海报进行视觉搜索时可以点击手机中的浏览器,响应于用户对浏览器的点击操作,手机启动浏览器。作为一种示例,如图3中的(a)图所示,浏览器提供有添加选项31,用户可以触发添加选项31。如图3中的(b)图所示,响应于用户对添加选项31的触发操作,手机展示“拍摄”选项和“相册”选项。在一个实施例中,当用户想要通过手机拍摄海报时可以触发图3中的(b)图中的“拍摄”选项,手机检测到用户对“拍摄”选项的触发操作后开启摄像头,如此就可以通过摄像头拍摄得到海报。在另一个实施例中,当用户想要使用图库中的海报时可以点击图3中的(b)图中的“相册”选项,手机检测到用户对“相册”选项的触发操作后打开图库,用户从图库中选择海报,如此手机就可以基于用户的选择获取到海报。
作为一种示例,手机获取海报后可以在展示界面中展示海报,示例性地,海报的展示效果如图4中的(a)图所示。在一个实施例中,该展示界面中还可以提供有“搜索”选项32。用户可以点击“搜索”选项32。手机在检测到用户对“搜索”选项32的触发操作后, 基于海报进行视觉搜索,得到海报中的搜索对象的搜索结果。其中,搜索对象为用户在海报中可能感兴趣的对象,搜索对象由手机确定,具体确定过程可以参见下文所述的实施例。在一个实施例中,搜索对象在海报中的特征包括如下(1)-(3)中的至少一项:(1)轮廓清晰,(2)完整度较高,(3)位置靠近海报的中心区域,也即位置比较居中。海报中的搜索对象的数量可能为一个或者多个,譬如图4中的(a)图所示的海报中的搜索对象可能包括人脸B,或者,也可能包括人脸B和手机A。
需要说明的是,这里仅是以在展示界面中提供“搜索”选项32供用户触发手机进行视觉搜索为例。在另一实施例中,用户还可以通过其他方式触发手机进行视觉搜索,示例性地,手机展示海报后,用户还可以“摇一摇”手机,相应地,手机检测到用户的“摇一摇”操作后,基于所获取的海报进行视觉搜索。
得到搜索结果之后,手机向用户反馈搜索结果。作为本申请的一个示例,由于搜索对象的搜索结果内容可能比较多,所以为了在提高展示效果的简洁性的同时使得用户快速、直观地了解搜索结果,手机可以通过问询场景的方式向用户进行反馈。譬如,问询场景可以包括与海报中的搜索对象相关的问询信息,示例性地,假设海报如图4中的(a)图所示,且手机确定海报中的搜索对象包括手机A和人脸B,则问询场景可以包括手机A和人脸B相关的问询信息,如图4中的(b)图所示,手机向用户反馈的问询信息包括“您想了解名人阿星的相关信息吗?”和“您想购买这款手机A吗?”。在一个实施例中,当搜索对象的数量为多个时,手机按照多个搜索对象的优先级从高到低的顺序对多个搜索对象的问询信息进行展示,如图4中的(b)图所示,由于人脸B的优先级高于手机A的优先级,所以人脸B的问询信息展示在手机A的问询信息之前。
作为一种示例,用户可以点击所展示的问询信息中的任意一个,以使得手机展示与用户所点击的问询信息相关的搜索结果中的其他内容。示例性地,当用户想要了解海报中的名人阿星的更多相关信息时,可以点击“您想了解名人阿星的相关信息吗?”这条问询信息,手机检测到对“您想了解名人阿星的相关信息吗?”这条问询信息的触发操作后,可以获取并展示名人阿星的搜索结果中的其他内容,譬如展示名人阿星的介绍信息、发表过的作品等内容,展示结果如图4中的(c)图所示。再如,当用户想要购买海报中的手机A时,可以点击“您想购买这款手机A吗”这条问询信息,在一个实施例中,当手机检测到对“您想购买这款手机A吗”这条问询信息的触发操作后,可以获取并展示手机A的搜索结果中的其他内容,譬如展示手机A的多张图像,展示结果如图4中的(d)图所示。在另一个实施例中,手机检测到对“您想购买这款手机A吗”这条问询信息的点击操作后,若能够获取到手机A的购买地址信息,譬如搜索结果中包括手机A的购买地址信息,则还可以向用户展示购买地址信息。当检测到用户对购买地址信息的触发操作后跳转至购买页面,使得用户可以在购买页面中购买手机A,避免当用户想要购买手机A时需要手动输入手机A的检索信息进行再次搜索,从而可以提高用户体验。
需要说明的是,在上述实施例中仅是以整条问询信息是可触发的为例进行说明。在另一个实施例中,也可以是问询信息中的部分文字是可触发的,譬如,“您想了解名人阿星的相关信息吗?”这条问询信息中的“名人阿星”是可触发的,当用户想要了解名人阿星的更多相关信息时,可以点击“名人阿星”文字,以触发手机展示名人阿星的搜索结果中的其他内容。再如,“您想购买这款手机A吗?”这条问询信息中的“手机A”和/或“购 买”是可触发的,当用户想要了解海报中的手机A时,可以点击“手机A”或“购买”以触发手机展示海报中的手机A的搜索结果中的其他内容。
作为本申请的另一个示例,搜索结果的反馈形式还可以如图5所示,也即可以在海报中的一个或者多个搜索对象中的每个搜索对象所在区域标识可触发项,以及在展示界面中除海报之外的其他区域标识每个搜索对象对应的可触发项,作为一种示例,当搜索对象的数量为多个时,可以按照多个搜索对象的优先级从高到低的顺序标识多个搜索对象的可触发项。当用户想要了解与搜索对象相关的更多相关信息时,可以点击搜索对象对应的可触发项,手机检测到用户对可触发项的触发操作后,可以展示搜索对象的搜索结果中的其他内容,示例性地,展示效果如图4中的(c)图或(d)图所示。
作为一种示例,不同搜索对象对应的可触发项可以采用不同颜色和/或不同形状进行标识,譬如,人脸B对应的可触发项可以采用蓝色的圆点进行标识,手机A对应的可触发项可以采用红色的圆点进行标识。再如,人脸B对应的可触发项可以采用圆点进行标识,手机A对应的可触发项可以采用方块进行标识。
值得一提的是,通过可触发项向用户反馈搜索结果,可以提高界面展示的简洁性。
在一个实施例中,在展示界面中除海报之外的其他区域标识的可触发项附近区域还可以展示对应的问询信息,譬如,名人阿星对应的可触发项的后面可以展示“Know about star Xing”问询信息,海报中的手机A对应的可触发项的后面可以展示“Buy this phone A”问询信息。如此以便于用户可以直观感知可触发项对应的搜索结果,从而可以提高用户体验。
需要说明的是,上述图5仅是以在海报中的每个搜索对象所在区域和其他区域均标识可触发项为例。在另一实施例中,还可以仅在海报中的每个搜索对象所在区域标识可触发项,或者,还可以仅在该其他区域标识可触发项,本申请实施例对此不作限定。
作为本申请的又一个示例,手机还可以通过不同行展示搜索对象的搜索结果,当搜索对象的数量为多个时,手机可以将不同搜索对象的搜索结果按照多个搜索对象的优先级从高到低的顺序通过不同行进行罗列展示,也即将海报中的一个或者多个搜索对象的搜索结果通过不同行分别进行展示。示例性地,请参考图6,手机在第一行展示人脸B的搜索结果,在第二行展示手机A的搜索结果。在一个实施例中,由于每一行能够展示的内容有限,所以一行可能无法展示搜索对象的全部搜索结果,在该种情况下,手机还可以标识搜索对象的可触发项。譬如,在人脸B的面部所在区域标识对应的可触发项,在海报中的手机A所在区域标识对应的可触发项。再如,在搜索对象对应的行内展示可触发项。如此,当用户想要了解更多与搜索对象相关的搜索结果中的其他内容时,可以点击搜索对象对应的可触发项以使得手机展示其他内容。
需要说明的是,上述的搜索结果的几种反馈形式仅是示例性的,在另一个实施例中,手机还可能采用其他方式向用户反馈搜索对象的搜索结果,本申请实施例对此不作限定。
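Having introduced the application scenarios involved in the embodiments of this application, the following describes the system architecture. Referring to FIG. 7, FIG. 7 is a schematic diagram of a system architecture according to an exemplary embodiment. As an example of this application, the system architecture includes a device side and a cloud side. In one embodiment, the device side is implemented by an electronic device and the cloud side by a server, and a communication connection is established between the electronic device and the server.
As an example, as shown in FIG. 7, the electronic device includes an intent recognition module, a data association module, and a content layout and processing module. In implementation, the electronic device acquires a first image, which may, for example, be captured by the camera or be an image uploaded from the album. The intent recognition module determines the search objects in the first image through operations such as image detection, and performs an intent search on the search objects through the server, which returns the search results. In addition, when the first image includes multiple search objects, the intent recognition module also determines their priorities. The data association module obtains and displays the search results returned by the server. In one embodiment, when there are multiple search objects, the data association module also obtains the priorities determined by the intent recognition module and sorts the obtained search results by these priorities, which can also be understood as sorting the search objects. The content layout and processing module lays out the sorted search results and feeds them back.
Optionally, the electronic device may further include an image preprocessing module, which preprocesses the image captured or uploaded by the electronic device, for example by resizing and denoising, to obtain the first image. Preprocessing makes the first image better suited to the processing requirements of the intent recognition module: resizing addresses inconsistent image sizes caused by different camera specifications, and denoising can improve the accuracy of image detection by the intent recognition module, which in turn can improve the accuracy of the visual search.
The server determines and returns search results. The server holds databases corresponding to different vertical domains, with one vertical domain corresponding to one object type; for example, the vertical domains may include but are not limited to a celebrity domain, a commodity domain, and a scenic spot domain. In one embodiment, the database for each vertical domain includes but is not limited to an image library, semantic information, and work information. The image library includes at least one image of each search object, for example images of the search object taken from different angles. The semantic information is a verbal description of the search object and may include, for example, introductory information about it. The work information includes information about works such as articles published and videos released by a person. The at least one image, the semantic information, and the work information of each search object in the database are associated with one another. For example, the search result that the server returns to the electronic device typically includes at least one image of the search object in the image library, together with the semantic information and work information associated with the at least one image.
In one possible implementation, the search result of a search object may include multiple pieces of data. In this case, before feeding the search result back to the electronic device, the server may re-rank the pieces of data to determine their display order, and feed back the re-ranked search result to the electronic device. As an example, the server may sort the pieces of data according to a preset policy, which can be set in advance according to actual needs. For example, if the search result of the search object includes multiple images, the preset policy may be to sort by shooting angle, with frontal images placed before images taken from other angles.
In one possible implementation of this application, the intent recognition module may also perform an initial search, that is, an initial recall operation, while performing the intent search through the server. In the initial search, the server determines the similarity between an object in the first image and the images in the image library to determine whether the image library contains a search result matching the object. The objects in the first image are thereby screened through the initial search, and the search objects in the first image are determined from the screening result; for the specific implementation, see the embodiments described below. In one embodiment, the server determines the similarity based on an initial recall image library in the database. The initial recall image library is a sub-library of the corresponding image library. As an example, a portion of the images may be extracted in advance from the image library of each vertical domain according to a preset policy to obtain the initial recall image library of that vertical domain; for example, for the image library of any vertical domain, one image of each object is extracted to obtain the initial recall image library of that vertical domain. In this way, the object does not need to be matched against every image in the image library when the similarity is determined, which reduces the server's computation.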
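The extraction of an initial recall image library can be sketched as follows. The sketch assumes the image library of a vertical domain is held as a mapping from an object label to a list of image identifiers, and that the preset policy is simply "keep the first image of each object"; the data layout, the file names, and the function name are hypothetical.

```python
def build_initial_recall_library(image_library):
    """Keep one image per object (here: the first one) as the sub-library.

    image_library: dict mapping an object label to a list of image identifiers.
    Objects with no images are skipped.
    """
    return {label: images[0] for label, images in image_library.items() if images}


# Hypothetical image library of a "commodity" vertical domain.
commodity_library = {
    "phone A": ["phone_a_front.jpg", "phone_a_back.jpg", "phone_a_side.jpg"],
    "phone B": ["phone_b_front.jpg"],
    "tablet C": [],
}
initial_recall_library = build_initial_recall_library(commodity_library)
```

With this policy the initial search compares a query against one image per object instead of every stored view, which is the saving in server computation that the paragraph above describes.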
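It is worth mentioning that the embodiments of this application are implemented through interaction between the electronic device and the server; that is, the electronic device obtains search results and performs the initial search through the server. The electronic device therefore does not need to hold the databases for the different vertical domains, which reduces its operating load and improves its running speed.
Of course, the above describes the method as implemented through interaction between the device side and the cloud side. In another embodiment, the method may be implemented by the device side alone, that is, by the electronic device alone. For example, the electronic device holds the databases for the different vertical domains, so that it can determine search results and perform the initial search based on its own databases without relying on the server for visual search. In yet another embodiment, the method may be implemented by the server alone.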
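Having introduced the application scenarios and system architecture involved in the embodiments of this application, the following describes a specific implementation of the method for determining search intent provided in the embodiments of this application. The method may be applied to the system architecture shown in FIG. 7. The method may be performed by an electronic device or by a server. If it is performed by a server, the display process involved (for example, the display process in step 905 of the embodiment shown in FIG. 9) may be omitted, or may be performed by another device equipped with a display apparatus.
The following uses an electronic device as the execution body. As an example of this application, the intent recognition module in the electronic device includes three branches, each corresponding to one processing unit; that is, the intent recognition module includes a first processing unit, a second processing unit, and a third processing unit. Referring to FIG. 8, the first processing unit performs the operations of the branch in the left dashed box, the second processing unit performs the operations of the branch in the middle dashed box, and the third processing unit performs the operations of the branch in the right dashed box. In one embodiment, the process of determining the arrangement order of objects may be implemented independently by any one of the three processing units. In another embodiment, it may be implemented by any two of the three processing units in combination. In yet another embodiment, it may be implemented by all three processing units in combination. For ease of understanding, the implementations of the three processing units are described below through several embodiments.
Referring to FIG. 9, FIG. 9 is a flowchart of a method for feeding back search results according to an exemplary embodiment. Here, the process of determining the arrangement order of objects is implemented by the second processing unit alone. The method may include some or all of the following: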
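Step 901: Acquire a first image.
As an example, the first image is captured by the electronic device with a camera, or is an image from the gallery. As another example, the first image may be obtained by preprocessing a captured or uploaded image. For how the first image is acquired, see the foregoing description; details are not repeated here.
The first image includes M objects, where M is an integer greater than or equal to 2.
Step 902: Determine the object regions and/or object categories of N objects included in the first image.
N is a positive integer less than or equal to M.
The object region of each object is the region that the object occupies in the first image.
Referring to FIG. 8, the electronic device performs image detection on the first image through the intent recognition module to determine the object region and/or object category of each of the N objects included in the first image. As an example, the intent recognition module includes a target detection model, and the first image is processed by this model to obtain the object region and/or object category of each object. The target detection model may be a pre-trained detection model that can detect the object region and/or object category of an object in any image.
For example, assume the first image is as shown in part (a) of FIG. 10. After the first image is input into the target detection model, the model determines the position information and object category of each of the N objects in the first image, where the position information indicates the object region of the object in the first image. As an example, the electronic device uses an object box to mark each object in the first image according to the position information determined by the model, thereby obtaining a second image. For example, the second image is as shown in part (b) of FIG. 10; that is, image detection determines that the N objects in the first image are a face, a mobile phone, and clothes, whose object categories are face, commodity, and clothing, respectively.
In one embodiment, the target detection model may be obtained in advance by training a to-be-trained model on an object detection sample set. As an example, each object detection sample in the set is an object image labeled with an object category; for example, the set may include face image samples, clothing image samples, commodity image samples, landmark image samples, animal image samples, and plant image samples.
As an example of this application, the to-be-trained model may combine mobilenetV3 with a Single Shot MultiBox Detector (SSD), that is, a model that uses mobilenetV3 as the backbone network and the SSD structure. The network structure of mobilenetV3 can be described by Table 1.
Table 1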
Figure PCTCN2021139762-appb-000001
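As an example of this application, after determining the object regions and/or object categories of the N objects in the first image, the electronic device may crop the object region of each of the N objects from the first image. For example, cropping the object regions of the objects in the image shown in part (b) of FIG. 10 yields the object regions shown in FIG. 11, where part (a) of FIG. 11 shows the object "mobile phone", part (b) shows the object "face", and part (c) shows the object "clothes".
In one possible implementation, the N objects included in the first image are the M objects in the first image, that is, N equals M. In another possible implementation, the N objects are determined through quality screening; that is, before the object regions and/or object categories of the N objects are determined, quality screening may be performed on the M objects in the first image to determine the N objects.
As an example, the quality scores of the N objects in the first image are greater than or equal to a quality score threshold, and the quality score of any one of the N objects is determined based on the blur degree and/or completeness of that object. The quality score threshold can be set according to actual needs. In one embodiment, the quality screening process may include the following sub-steps (1) and (2):
(1) Determine the quality score of each of the M objects included in the first image.
In one embodiment, factors such as the shooting angle may cause one or more objects in the first image to have poor image quality. For example, the first image may include passers-by or other items that are blurred or only partially visible, such as only part of a passer-by's body. In most cases such objects are not what the user cares about. The electronic device can therefore use the second processing unit to determine the quality score of each of the M objects in the first image and filter out those with poor image quality.
As an example, the image quality of an object can be measured by its blur degree and/or completeness; that is, the second processing unit determines the blur degree and/or completeness of each of the M objects in the first image and thereby determines each object's quality score. The blur degree can be described in terms of the sharpness of the object's contour: if the contour is sharp, the blur degree is low; conversely, if the contour is not sharp, the blur degree is high. Completeness is the proportion of the whole object that is not occluded.
In one embodiment, the second processing unit may use the variance-of-Laplacian algorithm to determine the blur degree of an object. For example, the pixel values in the region of the first image where the object is located can be convolved with a Laplacian mask, the variance of the result computed, and the variance mapped to a score to obtain the object's blur degree. For example, this can be implemented with the following code: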
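import cv2  # OpenCV

frame = cv2.imread(img)  # img: path to the cropped object region
resImg = cv2.resize(frame, (800, 900), interpolation=cv2.INTER_CUBIC)  # resize to a fixed size
img2gray = cv2.cvtColor(resImg, cv2.COLOR_BGR2GRAY)  # convert to single-channel grayscale
res = cv2.Laplacian(img2gray, cv2.CV_64F)  # Laplacian response of the grayscale image
score = res.var()  # variance of the response, to be mapped to a blur degree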
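The code above reads the object region of any one of the M objects, resizes it to a fixed size, converts the resized region to a single-channel grayscale image, and computes the variance of the Laplacian of that grayscale image, from which the object's blur degree is obtained.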
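In one embodiment, the completeness of any one of the M objects can be determined by a pre-trained completeness prediction model. For example, for any one of the M objects in the first image, the object region of the object can be input into the completeness prediction model, which performs prediction and outputs the completeness of the object.
In one embodiment, the completeness prediction model may be obtained by training a to-be-trained model on a completeness training sample set. The set may include multiple completeness training samples, each including an object sample image and its completeness, and the to-be-trained model may be an untrained mobilenetV3 model. That is, the electronic device may collect in advance a large number of object samples with different completeness values and input them into the untrained mobilenetV3 model for training to obtain the completeness prediction model.
It should be noted that the completeness prediction model may be a standalone model or may be integrated with the target detection model described above; this is not limited in the embodiments of this application.
It should also be noted that determining completeness with a completeness prediction model is only an example. In another embodiment, the second processing unit may determine completeness in other ways. For example, it may determine the area of the occluded part of the object and the area of the region enclosed by the object's box, and then determine the completeness from these two areas; this is not limited in the embodiments of this application.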
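One way to turn the two areas into a completeness value is sketched below. The formula is an assumption: the text only says the completeness is determined from the occluded area and the area of the object box, so the sketch takes the completeness to be the unoccluded fraction of the box and clamps it to the range 0 to 1. The function name is hypothetical.

```python
def completeness_from_areas(box_area, occluded_area):
    """Assumed rule: completeness = (box area - occluded area) / box area."""
    if box_area <= 0:
        raise ValueError("box_area must be positive")
    visible_fraction = (box_area - occluded_area) / box_area
    # Clamp so that inconsistent area estimates cannot leave the [0, 1] range.
    return max(0.0, min(1.0, visible_fraction))


example = completeness_from_areas(box_area=200.0, occluded_area=50.0)
```

This is consistent with the earlier definition of completeness as the proportion of the object that is not occluded, provided the object box is a reasonable stand-in for the whole object.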
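For example, the blur degree and completeness that the second processing unit determines for each of the M objects in the first image are shown in FIG. 12. Part (a) of FIG. 12 shows the mobile phone, with a blur degree of 0.022 and a completeness of 0.86; part (b) shows the face, with 0.042 and 0.92; and part (c) shows the clothes, with 0.037 and 0.34.
After determining the blur degree and/or completeness of each object in the first image, the second processing unit determines each object's quality score from them. The quality score indicates the image quality of the object and is positively correlated with it; that is, a larger quality score means better image quality.
As an example, when the quality score of an object is determined from both its blur degree and its completeness, the second processing unit may set a blur weight and a completeness weight and then determine the quality score by the following formula (1):
$S_i = aF_i + bI_i$   (1)
where $S_i$ is the quality score of object i, a is the blur weight and is negative, $F_i$ is the blur degree of object i, b is the completeness weight, and $I_i$ is the completeness of object i. The blur weight and the completeness weight can be set according to actual needs; this is not limited in the embodiments of this application.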
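Formula (1) can be worked through with the blur degree and completeness values quoted from FIG. 12. In the sketch below the weights a = -1.0 and b = 1.0 and the quality score threshold of 0.5 are illustrative assumptions, since the text leaves all three to be set according to actual needs.

```python
def quality_score(blur, completeness, blur_weight=-1.0, completeness_weight=1.0):
    """Formula (1): S_i = a * F_i + b * I_i, with a negative blur weight a."""
    if blur_weight >= 0:
        raise ValueError("the blur weight must be negative")
    return blur_weight * blur + completeness_weight * completeness


# (blur degree F_i, completeness I_i) as quoted from FIG. 12.
objects = {
    "phone": (0.022, 0.86),
    "face": (0.042, 0.92),
    "clothes": (0.037, 0.34),
}
scores = {name: quality_score(f, i) for name, (f, i) in objects.items()}

# Hypothetical threshold: keep objects whose score is at least 0.5.
QUALITY_THRESHOLD = 0.5
kept = [name for name, s in scores.items() if s >= QUALITY_THRESHOLD]
```

With these assumed weights the clothes score about 0.30 because of their low completeness and fall below the threshold, while the phone and the face score about 0.84 and 0.88 and are kept.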
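(2) Take the objects screened out of the M objects in the first image based on each object's quality score as the N objects.
The second processing unit filters by quality score. For example, for any one of the M objects in the first image, the object is filtered out when its quality score is below the quality score threshold.
In another embodiment, the M objects in the first image may be sorted by quality score in descending order, and the top N objects taken as the N objects, where N is an integer greater than 1.
It should be noted that the quality screening process above is only an example. In another embodiment, quality screening may be implemented in any of the following ways for any one of the M objects in the first image: the object is filtered out if its blur degree is above a blur threshold, that is, objects that are too blurred are filtered out; or the object is filtered out if its completeness is below a completeness threshold, that is, objects with low completeness are filtered out; or the object is filtered out if its blur degree is above the blur threshold and its completeness is below the completeness threshold, that is, objects that are both blurred and incomplete are filtered out. The objects remaining after filtering are determined as the N objects included in the first image.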
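The three threshold-based alternatives can be compared side by side in a short sketch. The blur threshold of 0.04 and the completeness threshold of 0.5 are illustrative assumptions, as are the mode names; the object values are those quoted from FIG. 12.

```python
def passes(blur, completeness, mode, blur_threshold=0.04, completeness_threshold=0.5):
    """Return True if the object survives screening under the given mode."""
    too_blurry = blur > blur_threshold
    too_incomplete = completeness < completeness_threshold
    if mode == "blur":            # drop objects that are too blurred
        return not too_blurry
    if mode == "completeness":    # drop objects with low completeness
        return not too_incomplete
    if mode == "both":            # drop only objects that are blurred AND incomplete
        return not (too_blurry and too_incomplete)
    raise ValueError(f"unknown mode: {mode}")


# (blur degree, completeness) as quoted from FIG. 12.
objects = {
    "phone": (0.022, 0.86),
    "face": (0.042, 0.92),
    "clothes": (0.037, 0.34),
}


def screen(mode):
    return [name for name, (f, i) in objects.items() if passes(f, i, mode)]


by_blur = screen("blur")
by_completeness = screen("completeness")
by_both = screen("both")
```

The "both" mode is the most permissive of the three: under the assumed thresholds it keeps all three objects, because none of them is simultaneously too blurred and too incomplete.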
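The blur threshold may be set by the user according to actual needs or set by default by the electronic device; this is not limited in the embodiments of this application.
The completeness threshold may be set by the user according to actual needs or set by default by the electronic device; this is not limited in the embodiments of this application.
It is worth mentioning that the embodiments of this application filter out objects with poor image quality, so that intent understanding is tied to object attributes and objects that do not match the search intent are not searched. This can improve the effectiveness and accuracy of the search and thereby the user experience.
It should be noted that if the quality scores of all M objects in the first image are below the quality score threshold, the electronic device ends the intent search. That is, if all M objects in the first image are blurred or have low completeness, the electronic device does not perform an intent search and may, for example, prompt the user that the first image is unusable.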
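Step 903: Perform an initial search on the N objects included in the first image.
As an example, the second processing unit sends an initial recall request to the server, carrying the object regions and object categories of the N objects. After receiving the request, the server parses it to obtain the object regions and object categories of the N objects. This embodiment takes a first object, which is any one of the N objects, as an example. The server determines the corresponding vertical domain from the object category of the first object, and then determines the similarity between the object region of the first object and the images in the initial recall image library of that vertical domain. In implementation, the server extracts the object features in the object region of the first object, obtains the object features of each image in the initial recall image library of the determined vertical domain, and determines the similarity between the extracted features and the features of each image, obtaining multiple similarities. The server takes the maximum of these similarities as the confidence score of the first object. As described above, the initial recall image library is a sub-library of the image library, so a large similarity indicates that the image library contains an image matching the first object, or in other words, that the server's database contains a search result consistent with the first object. The confidence score is therefore the similarity between the first object and the images in the image library, and can indicate how well the search intent for the first object matches the search results. As an example and not a limitation, the server determines the label of the image with the maximum similarity and uses it as the object label of the first object. The server sends the confidence score and object label of the first object to the electronic device as the result of the initial search (also called the initial recall result). The object label can later be displayed selectively by the electronic device.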
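The confidence score computation can be sketched as a maximum over pairwise similarities. The text does not name a similarity measure or a feature extractor, so the sketch assumes feature vectors are already available as lists of numbers and uses cosine similarity purely for illustration; the function names and the toy gallery are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def confidence_and_label(query_feature, gallery):
    """Return (max similarity, label of the best-matching gallery image).

    gallery: list of (label, feature) pairs from the initial recall image library.
    """
    best_similarity, best_label = -1.0, None
    for label, feature in gallery:
        similarity = cosine_similarity(query_feature, feature)
        if similarity > best_similarity:
            best_similarity, best_label = similarity, label
    return best_similarity, best_label


# Toy gallery with two-dimensional features.
gallery = [("phone A", [1.0, 0.0]), ("phone B", [0.6, 0.8]), ("phone C", [0.0, 1.0])]
confidence, object_label = confidence_and_label([1.0, 0.0], gallery)
```

Returning the label together with the maximum similarity mirrors the description above, in which the label of the best-matching image becomes the object label of the first object.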
需要说明的是,这里仅是以将对象的对象区域和对象类别发送给服务器,由服务器提取对象的对象特征为例进行说明。在另一实施例中,第二处理单元还可以提取对象区域的对象特征,将对象特征和对象类别发送给服务器,如此可以避免需要服务器对对象区域执行特征提取的操作。
作为另一种示例,电子设备自身设有数据库,此时,可以由第二处理单元确定与第一对象的对象类别对应的图像库,确定第一对象在第一图像中的对象区域与图像库包括的多个图像中每个图像之间的相似度,之后,将所确定的多个相似度中的最大相似度作为第一对象的置信度评分。
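The confidence-score computation can be sketched as a maximum-similarity lookup over feature vectors. The source does not name a similarity measure or a feature extractor; cosine similarity over plain lists of floats is an assumption made here, and the library entries are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def confidence_score(object_feature, library):
    """Return (max similarity, label of the best-matching library image).

    `library` is a list of (label, feature) pairs for one vertical domain.
    """
    if not library:
        raise ValueError("image library is empty")
    best_label, best_similarity = max(
        ((label, cosine_similarity(object_feature, feature))
         for label, feature in library),
        key=lambda pair: pair[1],
    )
    return best_similarity, best_label

# Hypothetical two-dimensional features, for illustration only.
score, label = confidence_score(
    [1.0, 0.0],
    [("phone A", [1.0, 0.0]), ("phone B", [0.0, 1.0])],
)
```

Returning the label together with the score mirrors the initial-recall result, which carries both the confidence score and the object label.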
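Step 904: Determine the arrangement order of the N objects based on the result of the initial search.
The electronic device determines the arrangement order of the N objects based on their confidence scores. The arrangement order of any one of the N objects corresponds to its priority: the earlier an object is in the order, the higher its priority.
Step 905: Feed back search results of some or all of the N objects based on the arrangement order of the N objects.
That is, the electronic device may feed back search results of all of the N objects based on their arrangement order, or may select some of the N objects and feed back the search results of those objects based on the arrangement order of the N objects.
For ease of description and understanding, the objects among the N objects whose search results are to be fed back, whether some or all of them, are referred to as search objects. There may be one or more search objects. In one embodiment the search objects include some of the N objects; in another embodiment the search objects include all of the N objects.
As an example, the electronic device selects a preset number of top-ranked objects from the N objects as the search objects. The preset number may be set based on actual requirements, for example, 2.
As another example, the electronic device may determine the search objects based on the confidence scores of the N objects. For example, for a first object among the N objects, the electronic device compares the confidence score of the first object with a confidence score threshold. If the confidence score of the first object is lower than the threshold, the server may not hold a search result that matches the first object. To provide the user with good search results and thereby improve user experience, the first object may be filtered out, that is, no intent search is subsequently performed on it. Conversely, if the confidence score of the first object is higher than the threshold, the server holds a search result that matches the first object, and the first object may be determined as a search object.
In this way, search objects are selected from the first image based on confidence scores, so that intent understanding is consistent with the search results, which can improve the accuracy of the intent search.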
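Ordering by confidence score and then selecting search objects, either by a preset count or by a threshold, can be sketched as below. The scores for phone A and face B are the ones the source gives for Figure 13; the clothes score, the threshold, and the function names are hypothetical.

```python
def rank_by_confidence(confidence: dict) -> list:
    """Order object names by confidence score, highest first."""
    return sorted(confidence, key=confidence.get, reverse=True)

def select_search_objects(confidence: dict, threshold=None, preset_count=None) -> list:
    """Apply an optional score threshold and an optional preset count."""
    ranked = rank_by_confidence(confidence)
    if threshold is not None:
        # The source keeps scores above the threshold and drops scores below
        # it; how to treat an exact tie is not specified, so ties are dropped.
        ranked = [name for name in ranked if confidence[name] > threshold]
    if preset_count is not None:
        ranked = ranked[:preset_count]
    return ranked

confidence = {"phone A": 0.886, "face B": 0.998, "clothes": 0.41}
by_threshold = select_search_objects(confidence, threshold=0.5)
by_count = select_search_objects(confidence, preset_count=1)
```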
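As an example, after determining the search objects, the electronic device sends a search request to the server. The search request carries the object region and object category of at least one search object. After receiving the search request, the server parses it to obtain the object region and object category of the at least one search object. Taking one search object as an example: the server determines the corresponding vertical domain based on the object category of the search object, and obtains the search result of the search object from the database of that vertical domain based on the object region. In one embodiment, the database includes an image library and associated information. The server matches the object region of the search object against the images in the image library to determine at least one image and associated information for the search object, and uses them as the search result of the search object. The server then returns to the electronic device the search result of each of the at least one search object.
It should be noted that the foregoing description uses an example in which the electronic device sends the object region and object category of the at least one search object to the server when performing the intent search. In another embodiment, after parsing the initial-recall request to obtain the object regions and object categories of the N objects, the server may locally generate index information for each of the N objects and establish a mapping between each piece of index information and the object region and object category of the corresponding object. The server sends the index information of each of the N objects to the electronic device. In this case, when the electronic device performs an intent search on a search object, it does not need to send the object region and object category of the search object to the server again. Instead, it sends the index information of the search object, and the server can obtain the object region and object category locally based on the index information. This avoids uploading the object region and object category a second time and saves data traffic.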
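The index-information variant amounts to a server-side lookup table keyed by a generated index. A minimal sketch follows; the class name, the integer index format, and the region tuple layout are assumptions, since the source does not describe how index information is represented.

```python
import itertools

class InitialRecallCache:
    """Server-side mapping from index information to (region, category)."""

    def __init__(self):
        self._next_index = itertools.count(1)
        self._entries = {}

    def register(self, object_region, object_category):
        """Store one parsed object and return the index sent to the device."""
        index = next(self._next_index)
        self._entries[index] = (object_region, object_category)
        return index

    def lookup(self, index):
        """Recover region and category when the device sends only the index."""
        return self._entries[index]

cache = InitialRecallCache()
phone_index = cache.register((10, 20, 110, 220), "product")
face_index = cache.register((150, 30, 230, 130), "person")
```

In the later search request the device would then send only `phone_index` rather than the cropped region.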
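When there is one search object, the electronic device feeds back the search result of that search object to the user directly.
When there are a plurality of search objects, the search results of the search objects are fed back in the arrangement order of the search objects. That is, the search result of a search object with a higher priority is displayed first.
As an example, feeding back the search results of the plurality of search objects in their arrangement order may include: obtaining the object label included in the search result of each of the search objects; generating, based on the object label of each search object, inquiry information corresponding to that search object, where the inquiry information prompts the user whether information associated with the object label needs to be obtained; and displaying the inquiry information of the search objects in the arrangement order of the search objects, to feed back their search results.
The information associated with the object label of a search object is the information in the search result of that search object other than the object label, for example, semantic information, work information, and images of the search object.
As an example, the N objects include a second object and a third object. The second object precedes the third object in the arrangement order, and the search result corresponding to the second object is displayed on the display before the search result corresponding to the third object.
For example, assume that the obtained search results include a search result of face B and a search result of phone A in the first image. Referring to Figure 13, as shown in Figure 13(a), the confidence score of phone A is 0.886, and as shown in Figure 13(b), the confidence score of face B is 0.998. The confidence score of face B is higher than that of phone A, so face B precedes phone A in the arrangement order. When the search results are displayed, the search result of face B is fed back first; for example, it may be displayed before the search result of phone A. The display result is shown in Figure 4(b).
In this embodiment of this application, an initial search is performed on the N objects in the first image to determine their confidence scores, and the objects are then screened based on these confidence scores to remove objects for which intent understanding does not match the search results, which can improve the accuracy of the intent search. In addition, the screened objects are sorted and displayed based on their confidence scores, so that the search results of objects the user is likely to be interested in are displayed first, which improves user experience.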
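Refer to Figure 14. Figure 14 is a flowchart of a search result feedback method according to an example embodiment. Here, the process of determining the arrangement order of objects is described by using an example in which it is implemented by the third processing unit alone. The method may include some or all of the following:
For the specific implementation of steps 1401 and 1402, refer to steps 901 and 902 in the embodiment of Figure 9.
Step 1403: Determine the object relationship score of each of the N objects in the first image.
The object relationship score of any one of the N objects indicates the importance of that object in the first image.
As an example of this application, refer to Figure 15. The first image may be divided into a plurality of regions. For example, the first image is divided into N*N regions, where N is an integer greater than 1 that may be set based on actual requirements (here N denotes the grid dimension and is independent of the number N of objects). A preset score is then set for each region, starting from the central region. The preset score of a region indicates the position importance of that region, and at least two of the regions have different preset scores. In general, the closer an object is to the central region, the more likely the user is to be interested in it. Therefore, the region at the center is given the highest preset score, and the preset scores decrease gradually outward. For example, the preset scores may be set according to a Gaussian distribution, as shown in Figure 16.
The electronic device determines the position importance value of each of the N objects. Take a first object, which is any one of the N objects, as an example. The average of the preset scores of the regions included in the bounding box of the first object is determined, and the resulting value is used as the position importance value of the first object. For example, assume that the N objects in the first image include a face, a phone, and clothes. The calculation yields a position importance value of 0.78 for the face, 0.7 for the phone, and 0.68 for the clothes.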
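The position importance computation can be sketched as a grid of Gaussian-shaped preset scores averaged over the cells an object's bounding box touches. The grid size, the `sigma` value, the unnormalised Gaussian (peak 1.0 at the center), and treating any overlapped cell as "included" are all assumptions made for this sketch.

```python
import math

def gaussian_grid(n: int, sigma: float = 0.5) -> list:
    """Build an n x n grid of preset scores that peak at the center."""
    center = (n - 1) / 2.0
    half = n / 2.0
    grid = []
    for row in range(n):
        scores = []
        for col in range(n):
            # Offsets from the center, in units of half the grid width.
            dx = (col - center) / half
            dy = (row - center) / half
            scores.append(math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma)))
        grid.append(scores)
    return grid

def position_importance(box, image_size, grid) -> float:
    """Average the preset scores of the grid cells overlapped by the box.

    box = (x0, y0, x1, y1) in pixels; image_size = (width, height).
    """
    n = len(grid)
    width, height = image_size
    x0, y0, x1, y1 = box

    def cell_range(lo, hi, extent):
        first = max(0, min(n - 1, math.floor(lo * n / extent)))
        last = max(first, min(n - 1, math.ceil(hi * n / extent) - 1))
        return range(first, last + 1)

    cells = [grid[r][c]
             for r in cell_range(y0, y1, height)
             for c in cell_range(x0, x1, width)]
    return sum(cells) / len(cells)

grid = gaussian_grid(3)
center_value = position_importance((100, 100, 200, 200), (300, 300), grid)
corner_value = position_importance((0, 0, 100, 100), (300, 300), grid)
whole_value = position_importance((0, 0, 300, 300), (300, 300), grid)
```

A box confined to the central cell gets the peak score, a corner box gets the lowest, and a box covering the whole image lands in between.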
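The object relationship score of each object is then determined based on its position importance value. For example, the position importance value of each object is used as its object relationship score; for instance, the position importance value of the first object is determined as the object relationship score of the first object.
In an embodiment of this application, the first image may further include a reference object. The reference object may be set based on actual requirements; for example, it may be a human body. Taking a human body as the reference object, when the first image includes a human body, objects close to the human body are usually the ones the user may be interested in, and objects far from it are ones the user is not interested in. Therefore, when the object relationship scores of the N objects in the first image are determined, if the first image includes a reference object, the distance between the first object and the reference object in the first image may additionally be obtained, based on the object region of the first object in the first image and the object region of the reference object in the first image, and used as the affiliation degree value of the first object. For example, if the reference object is a human body and the distance between the phone and the human body is determined to be 0.42, the affiliation degree value of the phone is determined to be 0.42.
As an example, the distance between the first object and the reference object may be determined as the distance between the center point of the first object and the center point of the reference object. In some embodiments, because this center-to-center distance changes when the first image is resized, the distance may be normalized so that the same affiliation degree value is obtained for the first image at different sizes. For example, the distance may be normalized by the diagonal length of the first image.
For the reference object itself, the affiliation degree value may be set based on actual requirements, for example, to 1. For instance, when the reference object is a human body, the affiliation degree value of the face is 1.
In one embodiment, when the first image includes a plurality of reference objects, the distance between the first object and each reference object in the first image is determined to obtain a plurality of distances, and the minimum of these distances is used as the affiliation degree value.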
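The affiliation degree value described above, a center-to-center distance normalized by the image diagonal and minimized over reference objects, can be sketched as follows. The box layout `(x0, y0, x1, y1)` and the function names are assumptions.

```python
import math

def box_center(box):
    """Center point of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def affiliation_value(object_box, reference_boxes, image_size) -> float:
    """Smallest normalized center distance to any reference object."""
    if not reference_boxes:
        raise ValueError("at least one reference object is required")
    width, height = image_size
    diagonal = math.hypot(width, height)  # normalizer, invariant to resizing
    ox, oy = box_center(object_box)
    distances = []
    for reference in reference_boxes:
        rx, ry = box_center(reference)
        distances.append(math.hypot(ox - rx, oy - ry) / diagonal)
    return min(distances)

# 300 x 400 image (diagonal 500) with two hypothetical reference boxes.
value = affiliation_value(
    (0, 0, 60, 80),
    [(120, 160, 180, 240), (240, 320, 300, 400)],
    (300, 400),
)
```

Because both the distance and the diagonal scale with the image, doubling every coordinate leaves the value unchanged.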
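In one embodiment, when the first image includes a reference object (such as a human body), the electronic device determines the object relationship score of the first object based on the position importance value and the affiliation degree value of the first object. For example, a weight may be set for the position importance value and for the affiliation degree value, and the two values are then fused according to those weights (for example, with equal weights) to obtain the object relationship score of the first object. The weight of the position importance value and the weight of the affiliation degree value may both be set based on actual requirements. This is not limited in the embodiments of this application.
Step 1404: Determine the arrangement order of the N objects based on the object relationship scores of the N objects in the first image.
In implementation, the arrangement order of the N objects is determined based on the magnitudes of their object relationship scores.
Step 1405: Feed back search results of some or all of the N objects based on the arrangement order of the N objects.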
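In implementation, the search objects may be determined from the N objects. In one embodiment, the search objects include some of the N objects; in another embodiment, the search objects include all of the N objects.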
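As an example, a preset number of top-ranked objects may be selected from the N objects as the search objects based on the arrangement order of the N objects.
As another example, the search objects may be determined based on the object relationship scores. A high object relationship score indicates that the corresponding object is relatively important and may be of interest to the user; conversely, a low object relationship score indicates that the corresponding object is unimportant and is probably not of interest to the user. Therefore, the electronic device may filter out objects whose object relationship scores are lower than a score threshold and determine objects whose object relationship scores are higher than the score threshold as the search objects.
The score threshold may be set by the user based on actual requirements, or may be set by the electronic device by default. This is not limited in the embodiments of this application.
It is worth noting that the embodiments of this application select search objects from the first image based on the object relationship scores of the objects in the first image, so that an intent search is performed only on the more important objects in the first image. This provides the user with search results for the search objects of interest and can improve user experience.
An intent search is then performed on the search objects to obtain search results, and the search results of the search objects are fed back. For the specific implementation, refer to the description of step 905 in the embodiment shown in Figure 9.
For example, assume that the search objects in the first image include face B and phone A, and that the obtained search results include a search result of face B and a search result of phone A. If the object relationship score of face B is greater than that of phone A, face B precedes phone A in the arrangement order. When the search results are displayed, the search result of face B is fed back first; for example, it may be displayed before the search result of phone A.
In this embodiment of this application, the position importance value of each of the N objects in the two-dimensional space of the first image and the affiliation degree value between each object and the reference object are determined, and the object relationship score of each object is then determined based on the position importance value and the affiliation degree value. This ties the search intent for an object to the attributes of the object, so that the search results of search objects the user is likely to be interested in are recommended first, which improves user experience.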
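Refer to Figure 17. Figure 17 is a flowchart of a search result feedback method according to an example embodiment. Here, the process of determining the arrangement order of objects is described by using an example in which it is implemented by the first processing unit alone. The method may include some or all of the following:
For the specific implementation of steps 1701 and 1702, refer to steps 901 and 902 in the embodiment of Figure 9.
Step 1703: Perform scene recognition on the first image.
As an example, the electronic device performs scene recognition on the first image by using the first processing unit, to determine the scene category of the first image. As an example of this application, scene categories may include but are not limited to character poster, bar, restaurant, coffee shop, cinema, conference room, outdoor scenery, and outdoor landmark.
In a possible implementation of this application, the first processing unit may perform scene recognition on the first image by using a scene recognition model. The scene recognition model may be a pre-trained model that can determine the scene category of any given image. In implementation, the first image may be input into the scene recognition model, which performs scene recognition on the first image and outputs the corresponding scene category. For example, the scene recognition model determines that the scene category of the first image is character poster.
As an example, the training process of the scene recognition model may include: obtaining a scene training sample set that includes a plurality of scene training samples, where each scene training sample includes a scene image labeled with a scene category; and inputting the scene training sample set into a to-be-trained model for training. Training ends when a preset training condition is met, which yields the scene recognition model.
The preset training condition may be set based on actual requirements. For example, the preset training condition may be that the number of training iterations reaches a count threshold, or that the scene recognition accuracy of the trained model reaches an accuracy threshold.
The count threshold may be set by the user based on actual requirements, or may be set by the electronic device by default. This is not limited in the embodiments of this application.
The accuracy threshold may be set by the user based on actual requirements, or may be set by the electronic device by default. This is not limited in the embodiments of this application.
As an example, the to-be-trained model may be a classification network model. For example, the classification network model may be a model that combines mobilenetV3 and SSD, that is, a model that uses mobilenetV3 as the backbone network and adopts the SSD structure. This combination is only an example; in another embodiment, the to-be-trained model may be another classification network model. This is not limited in the embodiments of this application.
It should be noted that the scene recognition model may be trained by the electronic device or by another device, after which the trained scene recognition model is stored in the first processing unit of the electronic device.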
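Step 1704: Determine the arrangement order of the N objects in the first image based on the scene category of the first image and the object categories of the N objects in the first image.
Each scene corresponds to at least one scene intent weight. The scene intent weight of an object indicates the probability that the object is searched for in the scene of the first image. The at least one scene intent weight for each scene may be preset based on actual requirements. For example, Table 2 shows the scene intent weights for different scenes: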
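Table 2

| Scene category | Scene intent weight 1 | Scene intent weight 2 | Scene intent weight 3 | Scene intent weight n |
| --- | --- | --- | --- | --- |
| Character poster | Celebrity 1.0 | Shopping 0.9 | Object recognition 0.8 | ...... |
| Scenery | Landmark 1.0 | Object recognition 0.6 | Celebrity 0.5 | ...... |
| Home interior | Shopping 1.0 | Object recognition 0.6 | Celebrity 0.6 | ...... |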
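Each search intent corresponds to one object category. The electronic device may determine the at least one scene intent weight that corresponds to the scene category, and may determine the search intent that corresponds to the object category of each of the N objects in the first image, so that the scene intent weight of each of the N objects can be determined. For example, assume that the electronic device determines that the scene category of the first image is "character poster", that the N objects in the first image include a face, a phone, and clothes, and that the object categories of the N objects are person, product, and item, with the corresponding search intents being celebrity, shopping, and object recognition, respectively. Then, according to Table 2 above, the scene intent weights of the objects are 1.0, 0.9, and 0.8, respectively.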
The arrangement order of the N objects may then be determined based on the magnitudes of their scene intent weights.
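The lookup from scene category and object category to scene intent weight, followed by ordering, can be sketched with two dictionaries. The weights come from Table 2 and the category-to-intent mapping from the example in the text; the English key names, the default weight of 0.0 for unknown combinations, and the function names are assumptions.

```python
# Scene intent weights from Table 2 (English key names are illustrative).
SCENE_INTENT_WEIGHTS = {
    "character poster": {"celebrity": 1.0, "shopping": 0.9, "object recognition": 0.8},
    "scenery": {"landmark": 1.0, "object recognition": 0.6, "celebrity": 0.5},
    "home interior": {"shopping": 1.0, "object recognition": 0.6, "celebrity": 0.6},
}

# Each object category corresponds to one search intent.
CATEGORY_TO_INTENT = {
    "person": "celebrity",
    "product": "shopping",
    "item": "object recognition",
}

def scene_intent_weight(scene: str, category: str, default: float = 0.0) -> float:
    """Look up the weight for an object category in a given scene."""
    intent = CATEGORY_TO_INTENT.get(category)
    return SCENE_INTENT_WEIGHTS.get(scene, {}).get(intent, default)

def rank_by_scene_intent(scene: str, objects: dict) -> list:
    """Order object names (name -> category) by scene intent weight."""
    return sorted(objects,
                  key=lambda name: scene_intent_weight(scene, objects[name]),
                  reverse=True)

detected = {"clothes": "item", "phone": "product", "face": "person"}
order = rank_by_scene_intent("character poster", detected)
```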
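Step 1705: Feed back search results of some or all of the N objects based on the arrangement order of the N objects.
In one embodiment, the electronic device selects from the first image the objects whose scene intent weights are greater than a weight threshold and determines the selected objects as the search objects.
The weight threshold may be set by the user based on actual requirements, or may be set by the electronic device by default. This is not limited in the embodiments of this application.
An intent search is then performed on the search objects in the first image to obtain search results, and the search results of the search objects are fed back. For the specific implementation, refer to the description of step 905 in the embodiment shown in Figure 9.
For example, the electronic device may sort the search intents based on the scene intent weights of the search objects. Assume that the search objects include face B and phone A and that the sorting result is: celebrity, shopping. That is, the most likely search intent for the first image is a celebrity search, followed by shopping, for example, buying the phone in the first image. Because the scene intent weight of celebrity is higher than that of shopping, face B precedes phone A in the arrangement order. Therefore, when the search results are fed back, the search result of face B is fed back first; for example, it may be displayed before the search result of phone A.
In this embodiment of this application, the scene category of the first image is determined, the order of the search intents of the search objects in the first image is then determined based on the scene category, and the search results of the search objects in the first image are displayed in that order. In this way, the search results are combined with the scene, so that they fit the scene of the first image more closely and do not become detached from it. The search results that the user is likely to be interested in are therefore recommended first, which improves user experience.
It should be noted that the foregoing embodiments are described by using examples in which the process of determining the arrangement order of objects is implemented by a single processing unit independently. As described above, in one embodiment, this process may be implemented by any two of the three processing units in combination, or by all three processing units in combination. The following describes a specific implementation of the method provided in the embodiments of this application by using an example in which the three processing units are combined:
Refer to Figure 18. Figure 18 is a schematic flowchart of a search result feedback method according to an embodiment of this application. The method may include some or all of the following:
For steps 1801 and 1802, refer to steps 901 and 902 in the embodiment shown in Figure 9.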
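Step 1803: Determine the scene intent weight of each of the N objects by using the first processing unit.
The scene intent weight indicates the probability that a first object is searched for in the scene of the first image, where the first object is any one of the N objects. For the specific implementation, refer to the embodiment shown in Figure 17.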
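Step 1804: Determine the confidence score of each of the N objects by using the second processing unit.
The confidence score is the similarity between the first object and an image in the image library. For the specific implementation, refer to the embodiment shown in Figure 9.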
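Step 1805: Determine the object relationship score of each of the N objects by using the third processing unit.
The object relationship score indicates the importance of the first object in the first image. For example, the object relationship score is determined based on the position importance value, or based on the position importance value and the affiliation degree value. For the specific implementation, refer to the embodiment shown in Figure 14.
It should be noted that the operations of the first processing unit, the second processing unit, and the third processing unit in the foregoing steps have no required execution order. In one embodiment, the three processing units may run in parallel.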
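Step 1806: Determine the priority of the first object based on the confidence score, the object relationship score, and the scene intent weight of the first object.
The priority of the first object corresponds to its position in the arrangement order: the higher the priority of the first object, the earlier it is in the order.
The first object is any one of the N objects and is used here as an example. In implementation, the priority of each object is determined based on the confidence score, the object relationship score, and the scene intent weight of that object.
For example, assume that the object relationship score is determined based on the position importance value and the affiliation degree value. Let S denote the scene intent weight, F the confidence score of an object, and L and H the position importance value and the affiliation degree value of the object, respectively. The priority value of each object, which indicates its priority, may then be determined by using the following formula (2):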
$$f_n = xF_n + yL_n + kH_n + zS_{n\text{-}c\text{-}I} \qquad (2)$$
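where $f_n$ is the priority value of object $n$, $F_n$ is the confidence score of object $n$, $L_n$ is the position importance value of object $n$, $H_n$ is the affiliation degree value of object $n$, and $S_{n\text{-}c\text{-}I}$ is the scene intent weight that corresponds to the object category of object $n$ in the first image. $x$, $y$, $k$, and $z$ are preset values.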
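Formula (2) can be illustrated with values that appear elsewhere in the description for face B and phone A (confidence scores from Figure 13, position importance values 0.78 and 0.7, affiliation degree values 1 and 0.42, scene intent weights 1.0 and 0.9). The unit weights x = y = k = z = 1.0 are placeholders: the source states only that they are preset values and does not fix their magnitude or sign.

```python
def priority_value(F: float, L: float, H: float, S: float,
                   x: float = 1.0, y: float = 1.0,
                   k: float = 1.0, z: float = 1.0) -> float:
    """Formula (2): f_n = x*F_n + y*L_n + k*H_n + z*S_n."""
    return x * F + y * L + k * H + z * S

# (confidence, position importance, affiliation degree, scene intent weight)
features = {
    "face B": (0.998, 0.78, 1.0, 1.0),
    "phone A": (0.886, 0.70, 0.42, 0.9),
}
priorities = {name: priority_value(*values) for name, values in features.items()}
ranking = sorted(priorities, key=priorities.get, reverse=True)
```

With these placeholder weights, face B is ranked ahead of phone A, which matches the display order used in the single-unit examples.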
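Step 1807: Determine the search objects from the first image based on the priorities of the N objects.
The search objects include some or all of the N objects in the first image.
Step 1808: Obtain the search results of the search objects.
Step 1809: Feed back the search results of the search objects.
In this embodiment of this application, during an intent search based on the first image, the scene category of the first image is determined, and the confidence scores and object relationship scores of the objects in the first image are determined. The priorities of the objects are then determined based on the scene intent weights corresponding to the scene category together with the confidence scores and object relationship scores of the objects, and the objects in the first image are sorted and displayed by priority. In this way, the search results of objects the user is likely to be interested in are displayed as far as possible, which improves the accuracy of the intent search and thereby improves user experience.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application.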
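Corresponding to the method for feeding back search results described in the foregoing embodiments, Figure 19 is a structural block diagram of an apparatus for feeding back search results according to an embodiment of this application. For ease of description, only the parts related to the embodiments of this application are shown.
Referring to Figure 19, the apparatus includes:
an obtaining module 1910, configured to obtain a first image, where the first image includes M objects and M is an integer greater than or equal to 2;
a determining module 1920, configured to: for N objects among the M objects, when N is greater than or equal to 2, determine the arrangement order of the N objects, where N is a positive integer less than or equal to M;
where the arrangement order of any one of the N objects is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score; the scene intent weight indicates the probability that the object is searched for in the scene corresponding to the first image; the confidence score is the similarity between the object and an image in an image library; and the object relationship score indicates the importance of the object in the first image; and
a feedback module 1930, configured to feed back search results of some or all of the N objects based on the arrangement order of the N objects.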
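As an example of this application, the determining module 1920 is configured to:
perform image detection on the first image by using a target detection model, to obtain the object region of a first object in the first image and/or the object category of the first object, where the first object is any one of the N objects; and
determine the confidence score of the first object based on the object region of the first object in the first image and the object category of the first object, and/or determine the object relationship score of the first object based on the object region of the first object in the first image, and/or determine the scene intent weight of the first object based on the object category of the first object.
As an example of this application, the determining module 1920 is configured to:
determine the image library corresponding to the object category of the first object;
determine the similarity between the object region of the first object in the first image and each of a plurality of images included in the image library; and
use the maximum of the determined similarities as the confidence score of the first object.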
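As an example of this application, the first image includes a plurality of regions, each of the regions has a preset score that indicates the position importance of that region, and at least two of the regions have different preset scores; and the determining module 1920 is configured to:
determine the position importance value of the first object in the first image based on the preset score of each region included in the object region of the first object in the first image; and
determine the object relationship score of the first object based on the position importance value of the first object.
As an example of this application, the determining module 1920 is configured to:
when the first image includes a reference object, obtain, based on the object region of the first object in the first image and the object region of the reference object in the first image, the distance between the first object and the reference object in the first image as the affiliation degree value of the first object; and
determine the object relationship score of the first object based on the position importance value and the affiliation degree value of the first object.
As an example of this application, the determining module 1920 is configured to:
determine the scene category of the first image; and
determine the scene intent weight of the first object based on the scene category of the first image, the object category of the first object, and the correspondence among scene categories, object categories, and scene intent weights.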
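As an example of this application, the quality scores of the N objects in the first image are greater than or equal to a quality score threshold, and the quality score of any one of the objects is determined based on the blur degree and/or completeness of that object.
As an example of this application, the feedback module 1930 is configured to:
obtain the object labels included in the search results of some or all of the N objects;
generate, based on the obtained object labels, inquiry information corresponding to some or all of the N objects, where the inquiry information prompts whether information associated with an object label needs to be obtained; and
display the inquiry information of some or all of the N objects based on the arrangement order of the N objects, to feed back the search results of some or all of the N objects.
As an example of this application, the N objects include a second object and a third object, the second object precedes the third object in the arrangement order, and the search result corresponding to the second object is displayed on the display before the search result corresponding to the third object.
In this embodiment of this application, a first image is obtained, where the first image includes M objects and M is an integer greater than or equal to 2. For N objects among the M objects, when there are a plurality of such objects, the arrangement order of the N objects is determined, and search results of some or all of the N objects are fed back based on that order. The arrangement order of any one of the N objects is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score, where the scene intent weight indicates the probability that the object is searched for in the scene corresponding to the first image, the confidence score is the similarity between the object and an image in an image library, and the object relationship score indicates the importance of the object in the first image. In this way, the N objects are screened and sorted based on their arrangement order, so that the search results of objects that have good search results and that the user is likely to be interested in are fed back to the user first. The feedback is therefore targeted, which improves the accuracy of the search results that are finally fed back.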
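A person skilled in the art can clearly understand that, for convenience and brevity of description, the division into the foregoing functional units and modules is merely used as an example. In actual application, the foregoing functions may be allocated to different functional units or modules as required, that is, the internal structure of the apparatus may be divided into different functional units or modules to implement all or some of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are merely for ease of distinguishing them from one another and are not intended to limit the protection scope of this application. For the specific working processes of the units and modules in the foregoing system, refer to the corresponding processes in the foregoing method embodiments. Details are not described here again.
In the foregoing embodiments, each embodiment is described with its own emphasis. For a part that is not described or recorded in detail in one embodiment, refer to the related descriptions of other embodiments.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed in this specification can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of this application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the system embodiment described above is merely illustrative. For example, the division into modules or units is merely a division by logical function, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected based on actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, all or some of the processes of the methods in the foregoing embodiments of this application may be completed by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the foregoing method embodiments can be implemented. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to an electronic device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example, a USB flash drive, a removable hard disk, a magnetic disk, or an optical disc. In some jurisdictions, according to legislation and patent practice, the computer-readable medium may not be an electrical carrier signal or a telecommunications signal.
Finally, it should be noted that the foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.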

Claims (16)

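  1. A method for feeding back search results, wherein the method comprises:
    obtaining a first image, wherein the first image comprises M objects and M is an integer greater than or equal to 2;
    for N objects among the M objects, when N is greater than or equal to 2, determining an arrangement order of the N objects, wherein N is a positive integer less than or equal to M;
    wherein the arrangement order of any one of the N objects is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score, the scene intent weight indicates a probability that the object is searched for in a scene corresponding to the first image, the confidence score is a similarity between the object and an image in an image library, and the object relationship score indicates an importance of the object in the first image; and
    feeding back search results of some or all of the N objects based on the arrangement order of the N objects.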
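  2. The method according to claim 1, wherein any one or more of the scene intent weight, the confidence score, and the object relationship score of any one of the N objects are determined in the following manner:
    performing image detection on the first image by using a target detection model, to obtain an object region of a first object in the first image and/or an object category of the first object, wherein the first object is any one of the N objects; and
    determining the confidence score of the first object based on the object region of the first object in the first image and the object category of the first object, and/or determining the object relationship score of the first object based on the object region of the first object in the first image, and/or determining the scene intent weight of the first object based on the object category of the first object.
  3. The method according to claim 2, wherein the determining the confidence score of the first object based on the object region of the first object in the first image and the object category of the first object comprises:
    determining an image library corresponding to the object category of the first object;
    determining a similarity between the object region of the first object in the first image and each of a plurality of images comprised in the image library; and
    using a maximum similarity among the determined similarities as the confidence score of the first object.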
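  4. The method according to claim 2, wherein the first image comprises a plurality of regions, each of the plurality of regions has a preset score that indicates a position importance of that region, and at least two of the plurality of regions have different preset scores; and
    the determining the object relationship score of the first object based on the object region of the first object in the first image comprises:
    determining a position importance value of the first object in the first image based on the preset score of each region comprised in the object region of the first object in the first image; and
    determining the object relationship score of the first object based on the position importance value of the first object.
  5. The method according to claim 4, wherein before the determining the object relationship score of the first object based on the position importance value of the first object, the method further comprises:
    when the first image comprises a reference object, obtaining, based on the object region of the first object in the first image and an object region of the reference object in the first image, a distance between the first object and the reference object in the first image as an affiliation degree value of the first object; and
    the determining the object relationship score of the first object based on the position importance value of the first object comprises:
    determining the object relationship score of the first object based on the position importance value and the affiliation degree value of the first object.
  6. The method according to claim 2, wherein the determining the scene intent weight of the first object based on the object category of the first object comprises:
    determining a scene category of the first image; and
    determining the scene intent weight of the first object based on the scene category of the first image, the object category of the first object, and a correspondence among scene categories, object categories, and scene intent weights.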
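  7. The method according to any one of claims 1 to 6, wherein quality scores of the N objects in the first image are greater than or equal to a quality score threshold, and the quality score of any one of the objects is determined based on a blur degree and/or a completeness of that object.
  8. The method according to any one of claims 1 to 7, wherein the feeding back search results of some or all of the N objects based on the arrangement order of the N objects comprises:
    obtaining object labels comprised in the search results of some or all of the N objects;
    generating, based on the obtained object labels, inquiry information corresponding to some or all of the N objects, wherein the inquiry information prompts whether information associated with an object label needs to be obtained; and
    displaying the inquiry information of some or all of the N objects based on the arrangement order of the N objects, to feed back the search results of some or all of the N objects.
  9. The method according to any one of claims 1 to 8, wherein the N objects comprise a second object and a third object, the second object precedes the third object in the arrangement order, and a search result corresponding to the second object is displayed on a display before a search result corresponding to the third object.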
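  10. An apparatus for feeding back search results, wherein the apparatus comprises:
    an obtaining module, configured to obtain a first image, wherein the first image comprises M objects and M is an integer greater than or equal to 2;
    a determining module, configured to: for N objects among the M objects, when N is greater than or equal to 2, determine an arrangement order of the N objects, wherein N is a positive integer less than or equal to M;
    wherein the arrangement order of any one of the N objects is determined based on any one or more of a scene intent weight, a confidence score, and an object relationship score, the scene intent weight indicates a probability that the object is searched for in a scene corresponding to the first image, the confidence score is a similarity between the object and an image in an image library, and the object relationship score indicates an importance of the object in the first image; and
    a feedback module, configured to feed back search results of some or all of the N objects based on the arrangement order of the N objects.
  11. The apparatus according to claim 10, wherein the determining module is configured to:
    perform image detection on the first image by using a target detection model, to obtain an object region of a first object in the first image and/or an object category of the first object, wherein the first object is any one of the N objects; and
    determine the confidence score of the first object based on the object region of the first object in the first image and the object category of the first object, and/or determine the object relationship score of the first object based on the object region of the first object in the first image, and/or determine the scene intent weight of the first object based on the object category of the first object.
  12. The apparatus according to claim 11, wherein the determining module is configured to:
    determine an image library corresponding to the object category of the first object;
    determine a similarity between the object region of the first object in the first image and each of a plurality of images comprised in the image library; and
    use a maximum similarity among the determined similarities as the confidence score of the first object.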
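  13. The apparatus according to claim 11, wherein the first image comprises a plurality of regions, each of the plurality of regions has a preset score that indicates a position importance of that region, and at least two of the plurality of regions have different preset scores; and the determining module is configured to:
    determine a position importance value of the first object in the first image based on the preset score of each region comprised in the object region of the first object in the first image; and
    determine the object relationship score of the first object based on the position importance value of the first object.
  14. The apparatus according to claim 13, wherein the determining module is configured to:
    when the first image comprises a reference object, obtain, based on the object region of the first object in the first image and an object region of the reference object in the first image, a distance between the first object and the reference object in the first image as an affiliation degree value of the first object; and
    determine the object relationship score of the first object based on the position importance value and the affiliation degree value of the first object.
  15. An apparatus for feeding back search results, wherein the apparatus comprises a memory and a processor;
    the memory is configured to store a program that supports the apparatus in performing the method according to any one of claims 1 to 9, and to store data used to implement the method according to any one of claims 1 to 9; and
    the processor is configured to execute the program stored in the memory.
  16. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 9.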
PCT/CN2021/139762 2021-02-26 2021-12-20 Method and apparatus for feeding back search results, and storage medium WO2022179271A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21927695.3A EP4283491A1 (en) 2021-02-26 2021-12-20 Search result feedback method and device, and storage medium
US18/548,039 US20240126808A1 (en) 2021-02-26 2021-12-20 Search result feedback method and apparatus, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110222937.3A CN114969408A (zh) 2021-02-26 2021-02-26 反馈搜索结果的方法、装置及存储介质
CN202110222937.3 2021-02-26

Publications (1)

Publication Number Publication Date
WO2022179271A1 true WO2022179271A1 (zh) 2022-09-01

Family

ID=82974308

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/139762 WO2022179271A1 (zh) 2021-02-26 2021-12-20 反馈搜索结果的方法、装置及存储介质

Country Status (4)

Country Link
US (1) US20240126808A1 (zh)
EP (1) EP4283491A1 (zh)
CN (1) CN114969408A (zh)
WO (1) WO2022179271A1 (zh)

Also Published As

Publication number Publication date
CN114969408A (zh) 2022-08-30
EP4283491A1 (en) 2023-11-29
US20240126808A1 (en) 2024-04-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21927695

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2021927695

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 18548039

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2021927695

Country of ref document: EP

Effective date: 20230823

NENP Non-entry into the national phase

Ref country code: DE