WO2017196169A1 - System for determining user exposure to audio fragments - Google Patents


Info

Publication number
WO2017196169A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
user device
audio fragment
database
trigger
Application number
PCT/NL2017/050289
Other languages
French (fr)
Inventor
Stefan Petrus Reinier Maria VERHAGEN
Original Assignee
Audiocoup B.V.
Application filed by Audiocoup B.V. filed Critical Audiocoup B.V.
Priority to EP17727416.4A, published as EP3455814A1
Priority to US16/300,145, published as US20190146994A1
Publication of WO2017196169A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207: Discounts or incentives, e.g. coupons or rebates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor, of audio data
    • G06F16/64: Browsing; Visualisation therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor, of audio data
    • G06F16/63: Querying
    • G06F16/638: Presentation of query results
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems

Definitions

  • the invention relates to automated recognition of audio fragments.
  • the system includes a first database including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith.
  • the system includes a user device, such as a mobile user device.
  • the user device can e.g. be a smart phone.
  • the user device can include a microphone, a triggering unit and a communications unit.
  • the user device is arranged for receiving a trigger. After having received the trigger the user device is arranged to record ambient sound.
  • the user device is arranged to determine at least one fingerprint for the recorded sound.
  • the user device communicates with the first database for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound.
  • the system includes a second database.
  • the user device is arranged for communicating with the second database for storing therein data representative of the identified audio fragment having been heard.
  • the user device can be triggered, e.g. externally or internally, to start listening to ambient sound and automatically checks for potential audio fragments included in the first database.
  • the first database need not include the full audio fragments, it can also include data representative of the audio fragments, such as one or more fingerprints and e.g. an identification of the audio fragment.
  • the first database can be internal to the user device and/or external to the user device, e.g. at an internet server. If it has been determined that the user device has listened to an audio fragment included in the first database, a note of this can be made in the second database.
  • the note can include the identified audio fragment and/or an identification thereof.
  • the note can include a time of the identified audio fragment having been heard.
  • the note can include an identification of the user and/or user device.
  • the note can include information relating to the circumstances under which the identified audio fragment was heard, e.g. noisy background, quiet background, a location of the user device (e.g. GPS) when the fragment was heard, an indication of movement of the user device while the fragment was heard, connection of the user device to other devices while the fragment was heard (e.g. WiFi of a home router, Bluetooth of a car), which app was active on the user device.
  • the fingerprint includes one or more parameters representative of the audio fragment.
  • the parameters can include a (relative) volume, pitch, and/or spectrum.
  • the fingerprint can include a time evolution of the one or more parameters.
  • the fingerprint includes a spectral slice fingerprint, a multi-slice fingerprint, an LPC coefficient, a cepstral coefficient, frequency components of a spectrogram peak, or a combination thereof.
  • the fingerprint includes a time evolution of a spectral slice fingerprint, a multi-slice fingerprint, an LPC coefficient, a cepstral coefficient, frequency components of a spectrogram peak, or a combination thereof.
  • the fingerprint includes a transcription of a speech fragment included in the audio fragment.
  • the transcription is representative of the speech fragment.
  • the identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound includes comparing the at least one fingerprint of the recorded sound with the at least one fingerprint of each of the audio fragments of the plurality of audio fragments.
  • the identifying can include selecting the audio fragment of the plurality of audio fragments of which the at least one fingerprint best matches the at least one fingerprint of the recorded sound.
  • the audio fragment can include an audiovisual fragment, such as a video fragment including an audio track.
  • the audio fragment can e.g. be a commercial.
  • the first database can e.g. include data representative of a plurality of commercials.
  • the system includes a broadcasting unit for broadcasting one or more audio fragments of the plurality of audio fragments.
  • the broadcasting unit can e.g. be a radio station, television station, internet server, environmental broadcast unit in a store or the like.
  • the system includes a trigger transmitter arranged for transmitting the trigger to the user device.
  • the trigger transmitter can e.g. be an internet server, a telecommunications server, a GPS satellite, a beacon or the like.
  • the trigger transmitter is arranged for receiving a broadcast audio fragment, determining whether the received audio fragment corresponds to an audio fragment of the plurality of audio fragments and, if the received audio fragment corresponds to an audio fragment of the plurality of audio fragments, transmitting the trigger to the user device.
  • the trigger transmitter can, e.g. continuously, check whether an audio fragment of the plurality of audio fragments is being broadcast.
  • once the trigger transmitter determines that one such audio fragment is being broadcast, it can trigger the user device, or a plurality of user devices, to start listening whether the user device(s) hear the audio fragment as well.
  • the trigger transmitter may e.g. monitor a radio channel or a television channel to check whether one or more commercials included in the first database are being broadcast. Once the trigger transmitter detects such a commercial, it can trigger user devices to check whether they hear the commercial as well.
  • the user device includes a Speech-To-Text, STT, unit for converting the recorded sound to text.
  • the first database may include text fragments, each associated with an audio fragment in which that text is spoken.
  • the user device may record ambient sound and convert it into text using the STT unit. It can then be determined whether the recorded ambient sound includes one or more of the text fragments included in the first database.
  • the user device is arranged for receiving a trigger in the form of a transmission, a geolocation, a beacon, shaking of the user device, a predetermined time, a predetermined time interval, a keyword, a button activation, a touch activation, a cue sheet, or the like.
  • the user device is arranged for, in response to identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound, directing a web navigation to a predetermined URL.
  • the user device has a user account associated therewith, wherein in response to the user device storing in the second database the data representative of the identified audio fragment having been heard, credits are assigned to the user account. It is for example possible that a user that heard a certain commercial (as determined by his user device) is eligible for a discount or is provided a cashback.
  • a user device for determining user exposure to audio fragments.
  • the user device includes a microphone, a triggering unit and a communications unit.
  • the user device is arranged for receiving a trigger.
  • the user device is arranged for after having received the trigger recording ambient sound, determining at least one fingerprint for the recorded sound, and communicating with a first database, including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith, for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound.
  • the user device is arranged for communicating with a second database for storing therein data representative of the identified audio fragment having been heard.
  • the user device can be a mobile user device, such as a smart phone having dedicated software, such as an app, installed and running thereon.
  • the user device includes or is in communication with a Speech-To-Text unit, STT, for converting the recorded sound to text.
  • the user device is arranged for receiving a trigger in the form of a transmission, a geolocation, a beacon, shaking of the user device, a predetermined time, a predetermined time interval, a keyword, a button activation, a touch activation, a cue sheet, or the like.
  • the user device is arranged for communicating with the second database for storing therein, in relation to the data representative of the identified audio fragment having been heard, one or more of a time of the identified audio fragment having been heard, an identification of the user and/or user device.
  • a trigger transmitter is arranged for receiving a broadcast audio fragment, further arranged for communicating with a first database including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith.
  • the trigger transmitter is arranged for determining whether the received audio fragment corresponds to an audio fragment of the plurality of audio fragments and, if the received audio fragment corresponds to an audio fragment of the plurality of audio fragments, transmitting a trigger to a user device.
  • a method for determining user exposure to audio fragments using a user device includes: receiving a trigger; after having received the trigger recording ambient sound; determining at least one fingerprint for the recorded sound; communicating with a first database, including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith, for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound; and communicating with a second database for storing therein data representative of the identified audio fragment having been heard.
  • a computer program product including software code portions which, when run on a programmable apparatus, cause the apparatus to be ready to receive a trigger; after having received the trigger record ambient sound; determine at least one fingerprint for the recorded sound; communicate with a first database, including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith, for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound; and communicate with a second database for storing therein data representative of the identified audio fragment having been heard.
  • the computer program product can e.g. be an app designed to be run on a mobile device such as a smart phone, tablet, or vehicle such as a car.
  • the computer program product can be included on a non-transitory storage medium.
  • Fig. 1 shows a schematic representation of a system for determining user exposure to audio fragments.
  • Fig. 2 shows a schematic representation of a system for determining user exposure to audio fragments.
  • Fig. 3 shows a schematic representation of a system for determining user exposure to audio fragments.
  • Figure 1 shows a schematic representation of a system 1 for determining user exposure to audio fragments.
  • the system 1 includes a user device 2. It will be appreciated that the system 1 may include a plurality of user devices 2.
  • the system 1 further includes a broadcasting unit 4, here in the form of a television broadcasting station.
  • the broadcasting station broadcasts programming of a television channel.
  • the programming includes audiovisual fragments.
  • the audiovisual fragments may e.g. relate to programs, such as shows, movies, sports registrations, news programs and the like. Some of the audiovisual fragments may relate to commercials.
  • the system 1 includes a trigger transmitter 6.
  • the trigger transmitter includes an internet server.
  • the trigger transmitter 6 monitors television channel broadcasts by the broadcasting unit 4.
  • the trigger transmitter 6 monitors television channel broadcasts by the broadcasting unit 4 continuously.
  • the trigger transmitter 6 can monitor one or a plurality of television channels simultaneously.
  • the trigger transmitter 6 is arranged for recognizing certain audiovisual fragments in the programming of the television channel.
  • the trigger transmitter 6 performs video recognition on the broadcast videos.
  • the trigger transmitter 6 has a trigger database associated therewith which includes a plurality of audiovisual fragments (or data representative thereof) which the trigger transmitter 6 is to recognize. Once the trigger transmitter recognizes one of the audiovisual fragments included in the trigger database being broadcast, it generates a trigger which is transmitted to the user devices 2.
  • the audiovisual fragments in the trigger database relate to commercials.
  • the user device 2 in this example runs an app causing it to activate upon receipt of a trigger from the trigger transmitter 6. It will be appreciated that the app may thereto remain active in the background of the user device.
  • the user device 2 starts listening.
  • the user device 2 records ambient sound using a, e.g. built-in, microphone. If the user device 2 is in the neighborhood of a television set tuned to any television channel monitored by the trigger transmitter 6, it may record an audio fragment associated with the television programming of the monitored channel.
  • the user device 2 determines one or more fingerprints for the audio being recorded.
  • the user device 2 communicates with an audio fragment database 8.
  • the audio fragment database 8 includes data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith.
  • the audio fragments in the audio fragment database relate to commercials, e.g. to audio tracks of television commercials.
  • the fingerprint(s) determined by the user device 2 is compared with fingerprints in the audio fragment database 8. This comparing can be done by the user device 2 and/or by an internet server. If a match is found, it is determined that the associated audio fragment in the audio fragment database 8 has been heard by the user device 2.
  • the user device 2 then communicates with a result database 10.
  • the user device 2 transmits to the result database 10, for storing therein, an indication that the identified audio fragment has been heard.
  • the indication may include data representative of the identified audio fragment.
  • the indication can include a time of the identified audio fragment having been heard.
  • the indication can include an identification of the user and/or user device.
  • the indication can include information relating to the circumstances under which the identified audio fragment was heard, e.g. noisy background, quiet background, a location of the user device (e.g. GPS) when the fragment was heard, an indication of movement of the user device while the fragment was heard, connection of the user device to other devices while the fragment was heard (e.g. WiFi of a home router, Bluetooth of a car), or which app was active on the user device.
  • the result database 10 includes data representative of identified audio fragments having been heard.
  • An internet browser on the user device 2 may automatically be directed to a predetermined URL, e.g. related to the commercial.
  • An offer may be made to the user of the user device 2.
  • a credit may be awarded to the user of the user device, such as a discount, a store credit, or a cashback.
  • after the user device 2 has been triggered, it may remain active, i.e. recording ambient sound, for a predetermined time. It is also possible that the user device remains active as long as it recognizes audio fragments. It is also possible that the user device remains active for a predetermined time after having last recognized an audio fragment.
  • FIG. 2 shows a schematic representation of a system 1 for determining user exposure to audio fragments.
  • the system 1 includes a user device 2. It will be appreciated that the system 1 may include a plurality of user devices 2.
  • the user device is triggered manually, e.g. by pressing a button on the device 2 or by touch interaction with a touch screen of the device 2.
  • the user device 2 is associated with a service employee 12.
  • the service employee 12 may e.g. trigger the device 2 when approaching a customer 14.
  • the ambient sound can include a conversation between the service employee 12 and the customer 14.
  • the user device 2 determines fingerprints for fragments of the recorded audio.
  • the audio fragment database 8 includes predetermined sentences the service employee 12 may speak to the customer 14, such as "would you like a cup of coffee?", "would you like to use our restaurant?", "do you know our special offer?", "do you already have coupons?", "cola", "coffee with cake", "coffee with sugar", or the like.
  • Each predetermined sentence has at least one fingerprint associated therewith.
  • the indication may include an indication of the identity of the service employee, a time and/or date, a location in a store, or the like.
  • the system 1 includes a manager dashboard 16.
  • the manager dashboard can be a graphical user interface.
  • the manager dashboard 16 can present results included in the result database 10, statistics on the results and the like.
  • FIG. 3 shows a schematic representation of a system 1 for determining user exposure to audio fragments.
  • the system 1 includes a user device 2.
  • the system 1 may include a plurality of user devices 2.
  • the user device is triggered by a GPS signal, by a beacon, or from the cloud.
  • the user device 2 may be triggered manually, e.g. by shaking the device 2 and/or by pressing a button.
  • the user device 2 is arranged to listen continuously to ambient sound, e.g. when the user device is connected to an external power source such as a wall outlet.
  • the user device 2 may be triggered on a time basis.
  • after triggering, the user device 2 records ambient sound.
  • the ambient sound can include television programming, radio programming, commercials in a cinema, commercials in a shop, or the like.
  • the user device determines at least one fingerprint for the recorded sound.
  • the audio fragment database 8 includes audio fragments related to commercials. In this example it is determined whether fragments of the recorded audio correspond to one or more of the audio fragments in the audio fragment database.
  • the user device 2 can include a Speech-To-Text, STT, unit arranged for converting recorded speech into text.
  • the STT unit transcribes the recorded speech to a text string.
  • the audio fragment database 8 can thereto include text strings representative of the audio fragments.
  • the text strings here form the fingerprint representative of the audio fragments and the recorded speech. Comparing of the fingerprints can then include comparing of text string of the recorded speech with the text strings of the audio fragments in the database.
  • the indication may include an indication of the identity of the user and/or user device, a time and/or date, a location, or the like.
  • the trigger transmitter monitors the television channels on the basis of video recognition. It is noted that it is also possible that the trigger transmitter monitors the television channels on the basis of recognition of audio fragments. It is also possible that the trigger transmitter is arranged for detecting the onset of a commercial block in the programming.
  • the trigger database and the audio fragment database are one and the same.
  • the user device is triggered by the trigger transmitter.
  • trigger mechanisms such as the device being at a predetermined geolocation, the device identifying a beacon, shaking of the user device, a predetermined time, a predetermined time interval, a keyword, a button activation, a touch activation, a cue sheet, or the like.
  • the audio fragment database includes predetermined sentences. It will be appreciated that the audio fragment database may also include fingerprints associated with the predetermined sentences without actually including the predetermined sentences. It will be appreciated that it is also possible that the audio fragment database includes text sentences. Comparing of the fingerprints can then include comparing of text strings.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word 'comprising' does not exclude the presence of other features or steps than those listed in a claim.
  • the words 'a' and 'an' shall not be construed as limited to 'only one', but instead are used to mean 'at least one', and do not exclude a plurality.
  • the mere fact that certain measures are recited in mutually different claims does not indicate that a combination of these measures cannot be used to an advantage.
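
The method bullets above (receive a trigger; record ambient sound; determine a fingerprint; match against the first database; store the exposure in the second database) can be sketched end-to-end in a few lines. Everything in this sketch is an illustrative assumption, not part of the patent: the hash-based toy fingerprint, the dictionary databases and all names.

```python
import hashlib

def fingerprint(samples):
    """Toy fingerprint: hash of coarsely quantised samples (illustrative only)."""
    quantised = bytes(int(s * 8) % 256 for s in samples)
    return hashlib.sha1(quantised).hexdigest()

# First database: fragment id -> fingerprint. Second database: exposure records.
first_db = {"commercial_42": fingerprint([0.1, 0.5, -0.3, 0.2])}
second_db = []

def on_trigger(device_id, recorded_samples, timestamp):
    """Triggered flow: fingerprint the recording, match it against the first
    database, and store a note of the exposure in the second database."""
    fp = fingerprint(recorded_samples)
    for fragment_id, known_fp in first_db.items():
        if fp == known_fp:
            second_db.append({"device": device_id,
                              "fragment": fragment_id,
                              "time": timestamp})
            return fragment_id
    return None

# Simulated trigger: the device "hears" the known commercial.
matched = on_trigger("phone-1", [0.1, 0.5, -0.3, 0.2], "2017-05-09T12:00")
```

In practice the fingerprint would be robust to noise and time offsets (see the spectral-slice options above) rather than an exact hash; the control flow, however, follows the claimed steps.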

Abstract

System for determining user exposure to audio fragments. The system includes a first database including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith, a user device including a microphone, a triggering unit and a communications unit, wherein the user device is arranged for receiving a trigger, after having received the trigger recording ambient sound, determining at least one fingerprint for the recorded sound, and communicating with the first database for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound, and a second database, wherein the user device is arranged for communicating with the second database for storing therein data representative of the audio fragment having been identified.

Description

Title: System for determining user exposure to audio fragments
FIELD OF THE INVENTION
The invention relates to automated recognition of audio fragments.
BACKGROUND TO THE INVENTION
It is known to automatically recognize an audio fragment by determining one or more fingerprints of the audio fragment and comparing these with fingerprints of known audio fragments in a database. One such technology is described in US 6,990,453 B2.
SUMMARY OF THE INVENTION
It is an object to provide more useful and/or interactive use of recognizing audio fragments and/or audiovisual fragments.
Thereto, according to an aspect is provided a system for determining user exposure to audio fragments. The system includes a first database including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith. The system includes a user device, such as a mobile user device. The user device can e.g. be a smart phone. The user device can include a microphone, a triggering unit and a communications unit. The user device is arranged for receiving a trigger. After having received the trigger the user device is arranged to record ambient sound. The user device is arranged to determine at least one fingerprint for the recorded sound. The user device communicates with the first database for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound. The system includes a second database. The user device is arranged for communicating with the second database for storing therein data representative of the identified audio fragment having been heard.
Hence, the user device can be triggered, e.g. externally or internally, to start listening to ambient sound and automatically checks for potential audio fragments included in the first database. It will be appreciated that the first database need not include the full audio fragments, it can also include data representative of the audio fragments, such as one or more fingerprints and e.g. an identification of the audio fragment. It will be appreciated that the first database can be internal to the user device and/or external to the user device, e.g. at an internet server. If it has been determined that the user device has listened to an audio fragment included in the first database, a note of this can be made in the second database. The note can include the identified audio fragment and/or an identification thereof. The note can include a time of the identified audio fragment having been heard. The note can include an identification of the user and/or user device. The note can include information relating to the circumstances under which the identified audio fragment was heard, e.g. noisy background, quiet background, a location of the user device (e.g. GPS) when the fragment was heard, an indication of movement of the user device while the fragment was heard, connection of the user device to other devices while the fragment was heard (e.g. WiFi of a home router, Bluetooth of a car), which app was active on the user device.
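
As a concrete, hypothetical illustration, such a note in the second database could be a single record combining the fields just listed; every field name and value below is an assumption chosen for illustration only.

```python
# Hypothetical "note" stored in the second database after a match.
note = {
    "fragment_id": "commercial_42",       # identification of the audio fragment
    "heard_at": "2017-05-09T19:31:02Z",   # time the fragment was heard
    "user_id": "user-123",                # identification of user and/or device
    "circumstances": {                    # conditions under which it was heard
        "background": "quiet",
        "gps": (52.37, 4.90),             # location when the fragment was heard
        "moving": False,                  # movement indication
        "connections": ["wifi:home-router", "bluetooth:car"],
        "active_app": "radio-player",
    },
}
```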
Optionally, the fingerprint includes one or more parameters representative of the audio fragment. The parameters can include a (relative) volume, pitch, and/or spectrum. The fingerprint can include a time evolution of the one or more parameters.
Optionally, the fingerprint includes a spectral slice fingerprint, a multi-slice fingerprint, an LPC coefficient, a cepstral coefficient, frequency components of a spectrogram peak, or a combination thereof. Optionally, the fingerprint includes a time evolution of a spectral slice fingerprint, a multi-slice fingerprint, an LPC coefficient, a cepstral coefficient, frequency components of a spectrogram peak, or a combination thereof.
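
A minimal sketch of a spectral-slice fingerprint along these lines: split the recording into slices and keep, per slice, the dominant DFT bin (a crude "spectrogram peak"). The naive DFT and all parameters are illustrative assumptions, not the patent's method.

```python
import math

def spectral_slice_peaks(samples, slice_len=64):
    """Toy spectral-slice fingerprint: the index of the dominant positive-
    frequency DFT bin for each consecutive slice of the signal."""
    peaks = []
    for start in range(0, len(samples) - slice_len + 1, slice_len):
        frame = samples[start:start + slice_len]
        mags = []
        for k in range(1, slice_len // 2):  # naive magnitude DFT per bin
            re = sum(x * math.cos(2 * math.pi * k * n / slice_len)
                     for n, x in enumerate(frame))
            im = sum(-x * math.sin(2 * math.pi * k * n / slice_len)
                     for n, x in enumerate(frame))
            mags.append(math.hypot(re, im))
        peaks.append(mags.index(max(mags)) + 1)  # dominant bin of this slice
    return peaks

# A pure tone at bin 5 should dominate every slice of the fingerprint.
tone = [math.sin(2 * math.pi * 5 * n / 64) for n in range(128)]
fp = spectral_slice_peaks(tone)
```

The sequence of peaks over slices is exactly the "time evolution" of the parameter mentioned above; production systems typically use an FFT and more robust peak constellations.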
Optionally, the fingerprint includes a transcription of a speech fragment included in the audio fragment. The transcription is representative of the speech fragment.
Optionally, the identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound includes comparing the at least one fingerprint of the recorded sound with the at least one fingerprint of each of the audio fragments of the plurality of audio fragments. The identifying can include selecting the audio fragment of the plurality of audio fragments of which the at least one fingerprint best matches the at least one fingerprint of the recorded sound.
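
The best-match selection described here can be sketched as follows, using the peak-sequence fingerprints from the previous example; the position-agreement score is an illustrative assumption (real matchers score noise-robustly).

```python
def best_match(recorded_fp, database):
    """Select the fragment whose fingerprint best matches the recording's
    fingerprint: here, the most positions agreeing between peak sequences."""
    def score(a, b):
        return sum(1 for x, y in zip(a, b) if x == y)
    return max(database, key=lambda frag_id: score(recorded_fp, database[frag_id]))

db = {"ad_a": [5, 5, 9, 9], "ad_b": [3, 7, 9, 2]}
match = best_match([5, 5, 9, 2], db)  # 3 positions agree with "ad_a", 2 with "ad_b"
```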
The audio fragment can include an audiovisual fragment, such as a video fragment including an audio track. The audio fragment can e.g. be a commercial. The first database can e.g. include data representative of a plurality of commercials.
Optionally, the system includes a broadcasting unit for broadcasting one or more audio fragments of the plurality of audio fragments. The broadcasting unit can e.g. be a radio station, television station, internet server, environmental broadcast unit in a store or the like.
Optionally, the system includes a trigger transmitter arranged for transmitting the trigger to the user device. The trigger transmitter can e.g. be an internet server, a telecommunications server, a GPS satellite, a beacon or the like.
Optionally, the trigger transmitter is arranged for receiving a broadcast audio fragment, determining whether the received audio fragment corresponds to an audio fragment of the plurality of audio fragments and, if the received audio fragment corresponds to an audio fragment of the plurality of audio fragments, transmitting the trigger to the user device. Hence the trigger transmitter can, e.g. continuously, check whether an audio fragment of the plurality of audio fragments is being broadcast. Once the trigger transmitter determines that one such audio fragment is being broadcast, it can trigger the user device, or a plurality of user devices, to start listening whether the user device(s) hear the audio fragment as well. The trigger transmitter may e.g. monitor a radio channel or a television channel to check whether one or more commercials included in the first database are being broadcast. Once the trigger transmitter detects such a commercial, it can trigger user devices to check whether they hear the commercial as well.
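
The trigger transmitter's monitoring loop can be sketched as below. The stream of fragments, the identity fingerprint and the callback are all illustrative assumptions standing in for a real broadcast feed and a push channel to user devices.

```python
def monitor_broadcast(broadcast_stream, first_db, trigger_fn, fingerprint_fn):
    """Trigger-transmitter sketch: fingerprint each received broadcast
    fragment, compare it against the first database, and on a hit call
    trigger_fn (e.g. pushing a trigger to registered user devices)."""
    triggered = []
    for fragment in broadcast_stream:
        fp = fingerprint_fn(fragment)
        for frag_id, known_fp in first_db.items():
            if fp == known_fp:
                trigger_fn(frag_id)
                triggered.append(frag_id)
    return triggered

# Toy setup: the fragment itself serves as its fingerprint; one known
# commercial is in the database, and hits collect the triggers sent.
db = {"commercial_7": "jingle"}
hits = []
monitor_broadcast(["news", "jingle", "weather"], db,
                  trigger_fn=hits.append, fingerprint_fn=lambda f: f)
```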
Optionally the user device includes a Speech-To-Text, STT, unit for converting the recorded sound to text. The first database may include text fragments, each associated with an audio fragment in which that text is spoken. The user device may record ambient sound and convert it into text using the STT unit. It can then be determined whether the recorded ambient sound includes one or more of the text fragments included in the first database. Optionally, the user device is arranged for receiving a trigger in the form of a transmission, a geolocation, a beacon, shaking of the user device, a predetermined time, a predetermined time interval, a keyword, a button activation, a touch activation, a cue sheet, or the like.
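
Once the STT unit has produced a transcript, the check against the stored text fragments reduces to text matching. A minimal sketch, assuming case-insensitive substring matching (the patent does not prescribe a particular comparison):

```python
def heard_text_fragments(transcript, text_fragments):
    """Return the predetermined text fragments that occur in the STT
    transcript of the recorded ambient sound (case-insensitive substring)."""
    lowered = transcript.lower()
    return [t for t in text_fragments if t.lower() in lowered]

fragments = ["would you like a cup of coffee?", "do you know our special offer?"]
found = heard_text_fragments(
    "Hello! Would you like a cup of coffee? It is fresh.", fragments)
```

A deployed system would likely tolerate STT errors, e.g. via fuzzy or phonetic matching, rather than require exact substrings.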
Optionally, the user device is arranged for, in response to identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound, directing a web navigation to a predetermined URL.
Optionally, the user device has a user account associated therewith, wherein, in response to the user device storing in the second database the data representative of the identified audio fragment having been heard, credits are assigned to the user account. It is for example possible that the user that heard a certain commercial (as determined by his user device) is eligible for a discount or is provided a cashback.
According to an aspect is provided a user device for determining user exposure to audio fragments. The user device includes a microphone, a triggering unit and a communications unit. The user device is arranged for receiving a trigger. The user device is arranged for after having received the trigger recording ambient sound, determining at least one fingerprint for the recorded sound, and communicating with a first database, including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith, for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound. The user device is arranged for communicating with a second database for storing therein data representative of the identified audio fragment having been heard. The user device can be a mobile user device, such as a smart phone having dedicated software, such as an app, installed and running thereon.
Optionally, the user device includes or is in communication with a Speech-To-Text unit, STT, for converting the recorded sound to text.
Optionally, the user device is arranged for receiving a trigger in the form of a transmission, a geolocation, a beacon, shaking of the user device, a predetermined time, a predetermined time interval, a keyword, a button activation, a touch activation, a cue sheet, or the like. Optionally, the user device is arranged for communicating with the second database for storing therein, in relation to the data representative of the identified audio fragment having been heard, one or more of a time of the identified audio fragment having been heard, an identification of the user and/or user device.
According to an aspect is provided a trigger transmitter arranged for receiving a broadcast audio fragment, further arranged for communicating with a first database including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith. The trigger transmitter is arranged for determining whether the received audio fragment corresponds to an audio fragment of the plurality of audio fragments and, if the received audio fragment corresponds to an audio fragment of the plurality of audio fragments, transmitting a trigger to a user device.
According to an aspect is provided a method for determining user exposure to audio fragments using a user device. The method includes: receiving a trigger; after having received the trigger recording ambient sound; determining at least one fingerprint for the recorded sound; communicating with a first database, including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith, for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound; and communicating with a second database for storing therein data representative of the identified audio fragment having been heard.
According to an aspect is provided a computer program product including software code portions which, when run on a programmable apparatus, cause the apparatus to be ready to receive a trigger; after having received the trigger record ambient sound; determine at least one fingerprint for the recorded sound; communicate with a first database, including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith, for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound; and communicate with a second database for storing therein data representative of the identified audio fragment having been heard. The computer program product can e.g. be an app designed to be run on a mobile device such as a smart phone, tablet, or vehicle such as a car. The computer program product can be included on a non-transitory storage medium.
It will be appreciated that any of the aspects, features and options described in view of the system apply equally to the user device, trigger
transmitter, method and computer program product. It will also be clear that one or more of the above aspects, features and options can be combined.
BRIEF DESCRIPTION OF THE DRAWING
The invention will further be elucidated on the basis of exemplary embodiments which are represented in a drawing. The exemplary embodiments are given by way of non-limitative illustration. It is noted that the figures are only schematic representations of embodiments of the invention that are given by way of non-limiting example.
In the drawing:
Fig. 1 shows a schematic representation of a system for determining user exposure to audio fragments;
Fig. 2 shows a schematic representation of a system for determining user exposure to audio fragments, and
Fig. 3 shows a schematic representation of a system for determining user exposure to audio fragments.
DETAILED DESCRIPTION
Figure 1 shows a schematic representation of a system 1 for determining user exposure to audio fragments. The system 1 includes a user device 2. It will be appreciated that the system 1 may include a plurality of user devices 2. In this example, the system 1 further includes a broadcasting unit 4, here in the form of a television broadcasting station. In this example the broadcasting station broadcasts programming of a television channel. The programming includes audiovisual fragments. The audiovisual fragments may e.g. relate to programs, such as shows, movies, sports broadcasts, news programs and the like. Some of the audiovisual fragments may relate to commercials.
The system 1 includes a trigger transmitter 6. Here the trigger transmitter includes an internet server. The trigger transmitter 6 monitors television channel broadcasts by the broadcasting unit 4. Here the trigger transmitter 6 monitors television channel broadcasts by the broadcasting unit 4 continuously. The trigger transmitter 6 can monitor one or a plurality of television channels simultaneously. The trigger transmitter 6 is arranged for recognizing certain audiovisual fragments in the programming of the television channel. Here, the trigger transmitter 6 performs video recognition on the broadcasted videos. The trigger transmitter 6 has a trigger database associated therewith which includes a plurality of audiovisual fragments (or data representative thereof) which the trigger transmitter 6 is to recognize. Once the trigger transmitter recognizes one of the audiovisual fragments included in the trigger database being broadcast, it generates a trigger which is transmitted to the user devices 2. In this example the audiovisual fragments in the trigger database relate to commercials.
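By way of non-limiting illustration, the monitoring loop of such a trigger transmitter may be sketched as follows. The class name, the fingerprint representation and the in-memory list of registered devices are assumptions made for this sketch only; a real system would use fuzzy fingerprint matching and a network transport to the user devices.

```python
# Sketch of a trigger transmitter: it inspects fingerprinted chunks of the
# monitored broadcast and, on recognizing a fragment from its trigger
# database, sends a trigger to every registered user device.
from dataclasses import dataclass, field

@dataclass
class TriggerTransmitter:
    trigger_db: dict                               # fragment id -> stored fingerprint
    devices: list = field(default_factory=list)    # registered user devices
    sent: list = field(default_factory=list)       # triggers sent (for inspection)

    def on_broadcast_chunk(self, chunk_fp):
        """Called for every fingerprinted chunk of the monitored channel."""
        for fragment_id, stored_fp in self.trigger_db.items():
            if chunk_fp == stored_fp:              # real systems use fuzzy matching
                for device in self.devices:
                    self.sent.append((device, fragment_id))

tx = TriggerTransmitter(trigger_db={"ad-1": (14, 14, 32)},
                        devices=["phone-A", "phone-B"])
tx.on_broadcast_chunk((7, 7, 7))       # ordinary programming: no trigger
tx.on_broadcast_chunk((14, 14, 32))    # known commercial: trigger both devices
print(tx.sent)  # [('phone-A', 'ad-1'), ('phone-B', 'ad-1')]
```

Note that only the trigger is pushed to the devices; the devices themselves then record and identify the audio, as described below.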
The user device 2 in this example runs an app causing it to activate upon receipt of a trigger from the trigger transmitter 6. It will be appreciated that the app may thereto remain active in the background of the user device. When activated the user device 2 starts listening. The user device 2 records ambient sound using an, e.g. built-in, microphone. If the user device 2 is in the neighborhood of a television set tuned to any television channel monitored by the trigger transmitter 6 it may record an audio fragment associated with the television programming of the monitored channel. The user device 2 determines one or more fingerprints for the audio being recorded. The user device 2 communicates with an audio fragment database 8. The audio fragment database 8 includes data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith. In this example the audio fragments in the audio fragment database relate to commercials, e.g. to audio tracks of television commercials. The fingerprint(s) determined by the user device 2 are compared with fingerprints in the audio fragment database 8. This comparing can be done by the user device 2 and/or by an internet server. If a match is found, it is determined that the associated audio fragment in the audio fragment database 8 has been heard by the user device 2.
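The fingerprinting and comparing step can be sketched, by way of non-limiting example, as follows. Here a crude zero-crossing count per frame stands in for a spectral fingerprint (a production system would use e.g. spectrogram-peak features); the frame size, tolerance and threshold are illustrative assumptions.

```python
# Toy fingerprint matching: a fragment is "heard" when the per-frame
# zero-crossing counts of the recording align with a window of the stored
# fingerprint within a tolerance, for a sufficient fraction of frames.
import math
import random

FRAME = 200    # samples per analysis frame
RATE = 8000    # samples per second

def fingerprint(samples):
    """Zero-crossing count per frame -- a crude stand-in for a spectral fingerprint."""
    fp = []
    for i in range(0, len(samples) - FRAME + 1, FRAME):
        frame = samples[i:i + FRAME]
        fp.append(sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)))
    return fp

def matches(recorded, stored, tol=2, min_ratio=0.8):
    """True when the recorded fingerprint aligns with a window of the stored one."""
    n = len(recorded)
    for offset in range(len(stored) - n + 1):
        agree = sum(abs(r - s) <= tol
                    for r, s in zip(recorded, stored[offset:offset + n]))
        if agree / n >= min_ratio:
            return True
    return False

def tone(freq, frames):
    return [math.sin(2 * math.pi * freq * t / RATE) for t in range(FRAME * frames)]

stored_fp = fingerprint(tone(440, 40))           # the stored "commercial"
rng = random.Random(0)                           # deterministic noise for the demo
recording = [s + rng.gauss(0, 0.02)              # noisy recording of its middle part
             for s in tone(440, 40)[FRAME * 10:FRAME * 30]]
print(matches(fingerprint(recording), stored_fp))       # True: fragment recognized
print(matches(fingerprint(tone(1000, 20)), stored_fp))  # False: different audio
```

The offset loop illustrates why a recording of only part of the broadcast fragment, as will typically be obtained, can still be identified.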
The user device 2 then communicates with a result database 10. The user device 2 transmits to the result database 10, for storing therein, an indication that the identified audio fragment has been heard. The indication may include data representative of the identified audio fragment. The indication can include a time of the identified audio fragment having been heard. The indication can include an identification of the user and/or user device. The indication can include information relating to the circumstances under which the identified audio fragment was heard, e.g. noisy background, quiet background, a location of the user device (e.g. GPS) when the fragment was heard, an indication of movement of the user device while the fragment was heard, connection of the user device to other devices while the fragment was heard (e.g. WiFi of a home router, Bluetooth of a car), or which app was active on the user device.
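A non-limiting sketch of such an indication record, as it might be sent to the result database 10, is given below. All field names are illustrative assumptions; the patent does not prescribe a schema.

```python
# Sketch of the "indication" that the user device sends to the result
# database: the identified fragment plus the circumstances of listening.
import json
import time

def build_indication(fragment_id, user_id, location=None, background="quiet",
                     connected=(), active_app=None):
    return {
        "fragment_id": fragment_id,            # which audio fragment was heard
        "heard_at": int(time.time()),          # time the fragment was heard
        "user_id": user_id,                    # identification of user/device
        "location": location,                  # e.g. GPS fix, if available
        "background": background,              # e.g. noisy or quiet background
        "connected_devices": list(connected),  # e.g. home WiFi, car Bluetooth
        "active_app": active_app,              # app active while listening
    }

record = build_indication("commercial-042", "user-7",
                          location=(52.37, 4.90),
                          connected=["home-wifi"], active_app="radio-app")
print(record["fragment_id"], record["active_app"])  # commercial-042 radio-app
json.dumps(record)  # the record serializes for transmission to database 10
```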
Thus, in this example the result database 10 includes data
representative of commercials having been heard by user devices 2. In response to the user device 2 hearing an audio fragment included in the audio fragment database actions can be taken. An internet browser on the user device 2 may automatically be directed to a predetermined URL, e.g. related to the commercial. An offer may be made to the user of the user device 2. A credit may be awarded to the user of the user device, such as a discount, a store credit or a cashback.
After the user device 2 has been triggered, it may remain active, i.e. recording ambient sound for a predetermined time. It is also possible that the user device remains active as long as it recognizes audio fragments. It is also possible that the user device remains active for a predetermined time after having last recognized an audio fragment.
Figure 2 shows a schematic representation of a system 1 for determining user exposure to audio fragments. The system 1 includes a user device 2. It will be appreciated that the system 1 may include a plurality of user devices 2. In this example, the user device is triggered manually, e.g. by pressing a button on the device 2 or by touch interaction with a touch screen of the device 2. In this example the user device 2 is associated with a service employee 12. The service employee 12 may e.g. trigger the device 2 when approaching a customer 14.
After triggering the user device 2 records ambient sound. The ambient sound can include a conversation between the service employee 12 and the customer 14. The user device 2 determines fingerprints for fragments of the recorded audio. In this example the audio fragment database 8 includes
predetermined sentences the service employee 12 may speak to the customer 14 such as "would you like a cup of coffee?", "would you like to use our restaurant?", "do you know our special offer?", "do you already have coupons?", "cola", "coffee with cake", "coffee with sugar", or the like. Each predetermined sentence has at least one fingerprint associated therewith. In this example it is determined whether fragments of the recorded audio correspond to one or more of the predetermined sentences stored in database 8. If one of the predetermined sentences has been identified in the recorded audio, an indication thereof is stored in the result database 10. The indication may include an indication of the identity of the service employee, a time and/or date, a location in a store, or the like. In this example, the system 1 includes a manager dashboard 16. The manager dashboard can be a graphical user interface. The manager dashboard 16 can present results included in the result database 10, statistics on the results and the like.
Figure 3 shows a schematic representation of a system 1 for determining user exposure to audio fragments. The system 1 includes a user device 2. It will be appreciated that the system 1 may include a plurality of user devices 2. In this example, the user device is triggered by a GPS signal, a beacon, or from the cloud. Alternatively, or additionally, the user device 2 may be triggered manually, e.g. by shaking the device 2 and/or by pressing a button. It is also possible that the user device 2 is arranged to listen continuously to ambient sound, e.g. when the user device is connected to an external power source such as a wall outlet socket. Alternatively, or additionally, the user device 2 may be triggered on a time basis.
After triggering the user device 2 records ambient sound. The ambient sound can include television programming, radio programming, commercials in a cinema, commercials in a shop, or the like. The user device determines at least one fingerprint for the recorded sound. In this example the audio fragment database 8 includes audio fragments related to commercials. In this example it is determined whether fragments of the recorded audio correspond to one or more of the audio fragments in the audio fragment database.
Alternatively, or additionally, the user device 2 can include a Speech-To-Text, STT, unit arranged for converting recorded speech into text. The STT unit transcribes the recorded speech to a text string. The audio fragment database 8 can thereto include text strings representative of the audio fragments. The text strings here form the fingerprint representative of the audio fragments and the recorded speech. Comparing the fingerprints can then include comparing the text string of the recorded speech with the text strings of the audio fragments in the database.
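By way of non-limiting illustration, the text-string comparison may be sketched as follows. The STT output is assumed to be available as a plain string; the stored sentences mirror the examples given in view of Figure 2, and the normalization rule is an assumption for the sketch.

```python
# Sketch of the STT-based comparison: the transcript is normalized and
# checked against stored sentences that act as textual fingerprints.
import re
from typing import Optional

SENTENCES = {
    "would you like a cup of coffee",
    "do you know our special offer",
    "coffee with cake",
}

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so transcripts compare robustly."""
    return re.sub(r"[^a-z ]", "", text.lower()).strip()

def identify(transcript: str) -> Optional[str]:
    """Return the stored sentence found in the transcript, if any."""
    clean = normalize(transcript)
    for sentence in SENTENCES:
        if sentence in clean:
            return sentence
    return None

print(identify("Hello! Would you like a cup of coffee?"))
# -> would you like a cup of coffee
print(identify("The weather is nice today."))
# -> None
```

A substring check suffices for this sketch; a practical system might instead use fuzzy matching to tolerate STT transcription errors.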
If one of the audio fragments has been identified in the recorded audio, an indication thereof is stored in the result database 10. The indication may include an indication of the identity of the user and/or user device, a time and/or date, a location, or the like.
Herein, the invention is described with reference to specific examples of embodiments of the invention. It will, however, be evident that various
modifications and changes may be made therein, without departing from the essence of the invention. For the purpose of clarity and a concise description features are described herein as part of the same or separate embodiments, however, alternative embodiments having combinations of all or some of the features described in these separate embodiments are also envisaged.
In the example of Figure 1 the trigger transmitter monitors the television channels on the basis of video recognition. It is noted that it is also possible that the trigger transmitter monitors the television channels on the basis of recognition of audio fragments. It is also possible that the trigger transmitter is arranged for detecting the onset of a commercial block in the programming.
It is possible that the trigger database and the audio fragment database are one and the same.
In the example of Figure 1 the user device is triggered by the trigger transmitter. It will be appreciated that also alternative trigger mechanisms are possible such as the device being at a predetermined geolocation, the device identifying a beacon, shaking of the user device, a predetermined time, a predetermined time interval, a keyword, a button activation, a touch activation, a cue sheet, or the like.
In the example of Figure 2 the audio fragment database includes predetermined sentences. It will be appreciated that the audio fragment database may also include fingerprints associated with the predetermined sentences without actually including the predetermined sentences. It will be appreciated that it is also possible that the audio fragment database includes text sentences
representative of the spoken sentences, and that the user device uses the STT unit for converting recorded fragments of a conversation to text as described in view of Figure 3. Comparing of the fingerprints can then include comparing of text strings.
However, other modifications, variations, and alternatives are also possible. The specifications, drawings and examples are, accordingly, to be regarded in an illustrative sense rather than in a restrictive sense.
For the purpose of clarity and a concise description features are described herein as part of the same or separate embodiments, however, it will be appreciated that the scope of the invention may include embodiments having combinations of all or some of the features described.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word 'comprising' does not exclude the presence of other features or steps than those listed in a claim. Furthermore, the words 'a' and 'an' shall not be construed as limited to 'only one', but instead are used to mean 'at least one', and do not exclude a plurality. The mere fact that certain measures are recited in mutually different claims does not indicate that a combination of these measures cannot be used to an advantage.

Claims

1. System for determining user exposure to audio fragments, including:
a first database including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith;
- a user device including a microphone, a triggering unit and a
communications unit, wherein the user device is arranged for receiving a trigger, after having received the trigger recording ambient sound, determining at least one fingerprint for the recorded sound, and communicating with the first database for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound; and
a second database, wherein the user device is arranged for
communicating with the second database for storing therein data representative of the identified audio fragment having been heard.
2. System according to claim 1, further including a broadcasting unit for broadcasting one or more audio fragments of the plurality of audio fragments.
3. System according to claim 1 or 2, further including a trigger transmitter arranged for transmitting the trigger to the user device.
4. System according to claims 2 and 3, wherein the trigger transmitter is arranged for receiving a broadcast audio fragment, determining whether the received audio fragment corresponds to an audio fragment of the plurality of audio fragments and if the received audio fragment corresponds to an audio fragment of the plurality of audio fragments transmitting the trigger to the user device.
5. System according to any one of claims 1-4, wherein the audio fragment includes an audiovisual fragment.
6. System according to any one of claims 1-5, wherein the fingerprint includes one or more parameters representative of the audio fragment, such as a volume, a relative volume, a pitch, and/or a spectrum, and/or a time evolution of the one or more parameters.
7. System according to any one of claims 1-6, wherein the fingerprint includes a spectral slice fingerprint, a multi-slice fingerprint, an LPC coefficient, a cepstral coefficient, frequency components of a spectrogram peak, or a combination thereof, and/or a time evolution thereof.
8. System according to any one of claims 1-7, wherein the fingerprint includes a transcription of a speech fragment included in the audio fragment.
9. System according to any one of claims 1-8, wherein the user device includes a Speech-To-Text unit, STT, for converting the recorded sound to text.
10. System according to any one of claims 1-9, wherein the user device is arranged for receiving a trigger in the form of a transmission, a geolocation, a beacon, shaking of the user device, a predetermined time, a predetermined time interval, a keyword, a button activation, a touch activation, a cue sheet, or the like.
11. System according to any one of claims 1-10, wherein the user device is arranged for communicating with the second database for storing therein, in relation to the data representative of the identified audio fragment having been heard, one or more of a time of the identified audio fragment having been heard, an identification of the user and/or user device.
12. System according to any one of claims 1-11, wherein the user device has a user account associated therewith, wherein in response to the user device storing in the second database the data representative of the identified audio fragment having been heard credits are assigned to the user account.
13. User device for determining user exposure to audio fragments, including: a microphone, a triggering unit and a communications unit, wherein the user device is arranged for receiving a trigger, after having received the trigger recording ambient sound, determining at least one fingerprint for the recorded sound, and communicating with a first database, including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith, for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound;
the user device being arranged for communicating with a second database for storing therein data representative of the identified audio fragment having been heard.
14. User device according to claim 13, including a Speech-To-Text unit,
STT, for converting the recorded sound to text.
15. User device according to claim 13 or 14, wherein the user device is arranged for receiving a trigger in the form of a transmission, a geolocation, a beacon, shaking of the user device, a predetermined time, a predetermined time interval, a keyword, a button activation, a touch activation, a cue sheet, or the like.
16. User device according to claim 13, 14 or 15, wherein the user device is arranged for communicating with the second database for storing therein, in relation to the data representative of the identified audio fragment having been heard, one or more of a time of the identified audio fragment having been heard, an identification of the user and/or user device.
17. Trigger transmitter arranged for receiving a broadcast audio fragment, further arranged for communicating with a first database including data
representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith,
the trigger transmitter being arranged for determining whether the received audio fragment corresponds to an audio fragment of the plurality of audio fragments and if the received audio fragment corresponds to an audio fragment of the plurality of audio fragments transmitting a trigger to a user device.
18. Method for determining user exposure to audio fragments using a user device, including:
receiving a trigger,
after having received the trigger recording ambient sound,
- determining at least one fingerprint for the recorded sound,
communicating with a first database, including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith, for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound; and
- communicating with a second database for storing therein data representative of the identified audio fragment having been heard.
19. Computer program product including software code portions which, when run on a programmable apparatus, cause the apparatus to:
be ready to receive a trigger,
after having received the trigger record ambient sound,
determine at least one fingerprint for the recorded sound,
communicate with a first database, including data representative of a plurality of audio fragments, each audio fragment having at least one fingerprint associated therewith, for identifying which audio fragment of the plurality of audio fragments is included in the recorded ambient sound; and
communicate with a second database for storing therein data representative of the identified audio fragment having been heard.
PCT/NL2017/050289 2016-05-09 2017-05-09 System for determining user exposure to audio fragments WO2017196169A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17727416.4A EP3455814A1 (en) 2016-05-09 2017-05-09 System for determining user exposure to audio fragments
US16/300,145 US20190146994A1 (en) 2016-05-09 2017-05-09 System for determining user exposure to audio fragments

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
NL2016742 2016-05-09
NL2016742A NL2016742B1 (en) 2016-05-09 2016-05-09 System for determining user exposure to audio fragments.

Publications (1)

Publication Number Publication Date
WO2017196169A1 (en) 2017-11-16

Family ID: 56889142

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/NL2017/050289 WO2017196169A1 (en) 2016-05-09 2017-05-09 System for determining user exposure to audio fragments

Country Status (4)

Country Link
US (1) US20190146994A1 (en)
EP (1) EP3455814A1 (en)
NL (1) NL2016742B1 (en)
WO (1) WO2017196169A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040199387A1 (en) * 2000-07-31 2004-10-07 Wang Avery Li-Chun Method and system for purchasing pre-recorded music
US6990453B2 (en) 2000-07-31 2006-01-24 Landmark Digital Services Llc System and methods for recognizing sound and music signals in high noise and distortion
US20160092926A1 (en) * 2014-09-29 2016-03-31 Magix Ag System and method for effective monetization of product marketing in software applications via audio monitoring

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160378747A1 (en) * 2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback


Also Published As

Publication number Publication date
NL2016742B1 (en) 2017-11-16
EP3455814A1 (en) 2019-03-20
US20190146994A1 (en) 2019-05-16


Legal Events

Date Code Title Description
NENP Non-entry into the national phase; Ref country code: DE
121 Ep: the epo has been informed by wipo that ep was designated in this application; Ref document number: 17727416; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase; Ref document number: 2017727416; Country of ref document: EP; Effective date: 20181210