GB2616512A - Cloud service platform system for speakers - Google Patents

Cloud service platform system for speakers

Info

Publication number
GB2616512A
GB2616512A
Authority
GB
United Kingdom
Prior art keywords
speakers
retrieval matching
value
speech feature
target image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
GB2300697.6A
Other versions
GB202300697D0 (en)
Inventor
Zhang Xuejun
Li Bin
Zeng Hongjie
Xu Xianfu
Zhang Susu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi University
Original Assignee
Guangxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University filed Critical Guangxi University
Publication of GB202300697D0 publication Critical patent/GB202300697D0/en
Publication of GB2616512A publication Critical patent/GB2616512A/en


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/635 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/165 Management of the audio stream, e.g. setting of volume, audio stream path
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/34 Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 Network streaming of media packets
    • H04L65/61 Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/52 Network services specially adapted for the location of the user terminal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08 Feature extraction
    • G06F2218/10 Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A cloud service platform system for smart speakers, each speaker comprising a speech input module, a network connection unit and a player, receives positioning data from working speakers, marks two speakers within a threshold distance of each other as a “suspected same group” and sends a “suspected same group acknowledgment message” to the corresponding speakers. If the feedback result is “yes”, data for the same group of speakers is unified and transmitted to any speaker within the group, and any speaker may relay data within the group. If the result is “no”, play data is transmitted successively according to a weight priority and the volume is controlled. This allows a single platform to manage conflicts between nearby speakers.

Description

CLOUD SERVICE PLATFORM SYSTEM FOR SPEAKERS
TECHNICAL FIELD
[0001] The present invention relates to the field of cloud services for speakers, and more particularly to a cloud service platform system for speakers.
BACKGROUND
[0002] A smart speaker is an upgraded version of a conventional speaker, through which a household consumer can surf the Internet via speech, e.g., requesting a song, shopping online, or checking a weather forecast. A smart speaker may also be used to control smart home devices, such as opening curtains, setting a refrigerator temperature, or warming a water heater in advance. Baidu released its first own-brand smart speaker, the "Xiaodu Smart Speaker", in Beijing on June 11, 2018. Baidu's artificial intelligence (AI) assistant "Xiaodu Smart Speaker Donkey Kong" was released on the Xiaodu Store on June 1, 2019. The Huawei Sound X smart speaker, developed jointly by Huawei and Devialet, was released officially on November 25 of the same year.
[0003] Existing cloud service platform systems have the technical problems of supporting only a single application and of failing to effectively regulate the relationship between speakers. The present invention solves these problems by providing a cloud service platform system for speakers.
SUMMARY
[0004] The technical problems to be solved by the present invention are those of a single application and of failing to effectively regulate the relationship between speakers in the prior art. There is provided a novel cloud service platform system for speakers, which has the characteristics of serving multiple purposes and of effectively handling conflicts and relationships between speakers.
[0005] To solve the above-mentioned technical problems, the following technical solutions are adopted.
[0006] A cloud service platform system for speakers is provided, where the speakers each include a speech input module, a network connection unit and a player; and the cloud service platform system for the speakers includes a cloud server, and a network connection unit for linking the cloud server and the speakers. The speakers are each provided with a positioning detection unit, and the cloud server receives data from the positioning detection unit in real time; and the cloud server performs the following steps of detecting a conflict between the speakers:
[0007] step 1: receiving positioning data from speakers in a working state;
[0008] step 2: determining states of the speakers in operation based on the positioning data, and, if the positioning distance between two speakers is smaller than a predefined threshold, marking the corresponding speakers as "a suspected same group" and sending a "suspected same group acknowledgment message" to the corresponding speakers; and
[0009] step 3: receiving a feedback result for the "suspected same group acknowledgment message"; if the result is "yes", unifying data of the same group of speakers, transmitting the unified data to any speaker in the same group and controlling any speaker to transmit data within the same group; and if the result is "no", transmitting play data successively according to a weight priority and controlling the volume.
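A minimal sketch of this conflict-detection flow follows. The Speaker structure, the 5 m threshold value, and the ask_same_group callback are assumptions for illustration; the patent specifies neither concrete data structures nor a distance value.

```python
import math
from dataclasses import dataclass

DISTANCE_THRESHOLD_M = 5.0  # hypothetical value; the source only says "predefined threshold"

@dataclass
class Speaker:
    speaker_id: str
    position: tuple      # (x, y) reported by the positioning detection unit
    working: bool = True

def detect_conflicts(speakers, ask_same_group):
    """Steps 1-3 of the conflict-detection flow (simplified sketch).

    ask_same_group(a, b) stands in for sending the "suspected same group
    acknowledgment message" to speakers a and b and returning the
    "yes"/"no" feedback result.
    """
    # step 1: positioning data comes only from speakers in a working state
    working = [s for s in speakers if s.working]

    groups = []                                   # confirmed same groups
    ungrouped = {s.speaker_id for s in working}
    for i, a in enumerate(working):
        for b in working[i + 1:]:
            # step 2: closer than the threshold -> "suspected same group"
            if math.dist(a.position, b.position) < DISTANCE_THRESHOLD_M:
                # step 3: on "yes", unify the group's play data; on "no",
                # the remaining speakers are scheduled by weight priority
                if ask_same_group(a.speaker_id, b.speaker_id) == "yes":
                    groups.append({a.speaker_id, b.speaker_id})
                    ungrouped -= {a.speaker_id, b.speaker_id}
    return groups, ungrouped

# e.g. detect_conflicts([Speaker("s1", (0, 0)), Speaker("s2", (2, 1))],
#                       ask_same_group=lambda a, b: "yes")
```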
[0010] The working principle of the present invention is as follows: the present invention uses the positioning information of the speakers as the basis for determining whether the speakers are in a "suspected same group", and effectively regulates the relationship between speakers that are possibly associated or in conflict according to the feedback result. Meanwhile, the cloud server can control a plurality of Internet applications.
[0011] As an optimization of the above solution, the weight priority is determined by the cloud server through the following steps:
[0012] step 1.1: determining the networking starting time of the speakers, with an earlier time corresponding to a higher priority; and
[0013] step 1.2: determining the self-check states of the speakers, with a better self-check state corresponding to a higher priority.
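One plausible reading of this priority rule is a composite sort key, sketched below. How the two criteria are combined is not stated in the source; a lexicographic key (networking start time first, then self-check state) is an assumption.

```python
def weight_priority(networking_start_time, self_check_score):
    """Sort key for steps 1.1 and 1.2: an earlier networking start time and
    a better self-check state both rank a speaker higher. Sorting ascending
    on this key puts the highest-priority speaker first."""
    # smaller start time wins; negate the score so a higher score wins
    return (networking_start_time, -self_check_score)

# e.g. sorted(speakers, key=lambda s: weight_priority(s.start_time, s.self_check))
```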
[0014] Further, the cloud server is also capable of invoking other applications on the Internet according to instructions from the speakers.
[0015] Further, the cloud server also receives speech control signals from the speakers to perform speech recognition, including the following steps:
[0016] step I: creating a historical speech feature map library, where a historical speech feature map is created by extracting features from statement speech input in advance or recorded as history and drawing a statement speech feature map including word, phrase and sentence feature maps;
[0017] step II: extracting features from statement speech acquired by the speakers in real time, and drawing a target statement speech feature map; selecting and defining any statement speech feature map in the historical speech feature map library as a reference image, and defining the target statement speech feature map as a target image;
[0018] step III: binarizing the target image $I_T$, and defining that a value of 1 indicates having a speech feature and that a value of 0 indicates having no speech feature; meshing the binarized feature map into a grid map, defining a first point (x1, y1) of the grid map as an origin, defining a retrieval matching stride as L, and performing retrieval from the origin in the x direction; if a point having the value of 1 is retrieved, recording the position and value of the point and numbering the point in order; otherwise, continuing the retrieval matching;
[0019] step IV: updating point (x1, y1+N*L) as the origin and performing step III again until the retrieval matching in the x direction and the y direction is completed, thereby completing initial positioning retrieval matching, where N is an integer, and L is a constant;
[0020] step V: successively extracting points having the value of 1, updating the currently extracted point having the value of 1 as the origin, updating the retrieval matching stride to L/2, performing the retrieval matching successively in the x direction without performing the retrieval matching on points previously subjected to the retrieval matching, automatically halving the retrieval matching stride when the retrieval matching extends beyond the range of the target image, and continuing the retrieval matching until the retrieval matching stride comes to a minimum; defining a new point having the value of 1 appearing during the retrieval matching as a new point needing to be subjected to the retrieval matching in the y direction, and performing step VI; otherwise, performing step VII;
[0021] step VI: performing the retrieval matching successively in the y direction while keeping the retrieval matching stride of L/2 unchanged and without performing the retrieval matching on points previously subjected to the retrieval matching, automatically halving the retrieval matching stride when the retrieval matching extends beyond the range of the target image, and continuing the retrieval matching until the retrieval matching stride comes to a minimum; defining a new point having the value of 1 appearing during the retrieval matching as a new point needing to be subjected to the retrieval matching in the x direction, and performing step V; otherwise, performing step VII;
[0022] step VII: ending the retrieval matching when no new point needs to be subjected to the retrieval matching, and defining the region with the points having the value of 1 obtained by the retrieval matching as an effective target image;
[0023] step VIII: performing a search matching analysis on the effective target image in the historical speech feature map library; and
[0024] step IX: invoking a corresponding strategy according to the recognition result.
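A simplified sketch of the coarse-to-fine retrieval matching in steps III-VII. It collapses the alternating x/y passes into a single recursive refinement around each hit, and the minimum stride of 1 is an assumption; the patent does not state the minimum.

```python
def retrieval_matching(grid, stride, min_stride=1):
    """Coarse-to-fine scan of a binarized feature map (steps III-VII, simplified).

    grid[y][x] is 1 where a speech feature is present. The initial pass scans
    rows y0 + N*stride with the given stride; each 1-point found then seeds a
    finer pass at half the stride around it, skipping points already visited,
    until the stride reaches its minimum. Returns the 1-points found, i.e.
    the region forming the "effective target image".
    """
    h, w = len(grid), len(grid[0])
    visited, hits = set(), []

    def scan(x0, y0, step):
        for y in range(y0, h, step):          # y direction: rows y0 + N*step
            for x in range(x0, w, step):      # x direction within the row
                if (x, y) in visited:
                    continue                   # never re-match a visited point
                visited.add((x, y))
                if grid[y][x] == 1:
                    hits.append((x, y))
                    if step // 2 >= min_stride:
                        # halve the stride and refine around the new point
                        scan(max(0, x - step), max(0, y - step), step // 2)

    scan(0, 0, stride)
    return hits
```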
[0025] Further, step VIII further includes image correction, which includes the following steps:
[0026] step a: defining the effective target image as $I_T$, and selecting and defining any reference image in the historical speech feature map library as $I_C$;
[0027] step b: defining the association relationship between the reference image $I_C$ and the target image $I_T$ obtained by polar coordinate transformation: $I_T(r, \varphi) = I_C(a_z r, \varphi - \varphi_z)$, where $a_z$ is a scale offset parameter, and $\varphi_z$ is a rotation offset parameter;
[0028] step c: calculating, in the radial direction in a polar coordinate system, a projection $K_C(i)$ of the reference image $I_C$: $K_C(i) = \Omega_i \sum_{j=1}^{\bar{n}_i} I_C^P(i, j)$, and a projection $K_T(i)$ of the target image $I_T$: $K_T(i) = \Omega_i \sum_{j=1}^{\bar{n}_i} I_T^P(i, j)$; taking logarithms of $K_C(i)$ and $K_T(i)$ to obtain $LK_C(i)$ and $LK_T(i)$, and using the translational difference between $LK_C(i)$ and $LK_T(i)$ as the scale offset parameter $a_z$, where
$I^P(i, j) = I\bigl(K_{\max} + K_i \sin(2\pi j / n_\varphi),\ K_{\max} + K_i \cos(2\pi j / n_\varphi)\bigr)$, $i = 1, 2, \ldots, n_r$, $j = 1, 2, \ldots, \bar{n}_i$;
$\Omega_i = \bar{n}_i / n_\varphi$; and
$\eta_i^1 = \Omega_i (j - 1) - fl[\Omega_i (j - 1)]$, $\eta_i^2 = 1 - \eta_i^1$;
where $\bar{n}_i$ is the number of samples in the angular direction when $K_i = K_{\max}$; $fl(\cdot)$ represents the maximum integer which is less than or equal to the value within the bracket; the target image has a size of $2K_{\max} \times 2K_{\max}$; $n_r = K_{\max}$, representing the number of samples in the radial direction; and $n_\varphi = 8K_i$, representing the number of samples in the angular direction;
[0029] step d: calculating projections of the reference image $I_C$ and the target image $I_T$ in the radial direction and the angular direction according to the scale offset parameter in step c:
$$\Theta_C(j) = \begin{cases} \sum_{i=1}^{n_r} \bigl[\eta_i^1 I_C^P(i, ce(j/a_z)) + \eta_i^2 I_C^P(i, ce(j/a_z) + 1)\bigr], & a_z > 1 \\ 0, & a_z < 1 \end{cases}$$
and
$$\Theta_T(j) = \begin{cases} \sum_{i=1}^{n_r} \bigl[\eta_i^1 I_T^P(i, ce(j a_z)) + \eta_i^2 I_T^P(i, ce(j a_z) + 1)\bigr], & a_z < 1 \\ 0, & a_z > 1 \end{cases}$$
performing a normalization calculation on $\Theta_C$ and $\Theta_T$ to obtain the translation amount $d$ of the highest point, and calculating the rotation offset parameter $\varphi_z$ according to $\varphi_z = 2\pi d / \bar{n}_\varphi$, where $ce(\cdot)$ represents the minimum integer which is greater than or equal to the value within the bracket;
[0030] step e: putting the rotation offset parameter $\varphi_z$ and the scale offset parameter $a_z$ into step A to correct the target image, and calculating, as the center point of the target image, the position point $P_z^M$ corresponding to the minimum of $E_z$ by $E_z = \sum_{i=1}^{\bar{n}_\varphi} \bigl[\Theta_T^z(i) - \Theta_C(i - d)\bigr]^2$, thereby completing image correction.
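A sketch of one way to recover the scale offset $a_z$ from the radial projections in step c. It assumes that the "translational difference" between the log projections is the lag of a cross-correlation peak on a log-resampled radial axis; the sign convention and interpolation details are assumptions, not the patent's prescription.

```python
import numpy as np

def scale_offset_from_projections(K_C, K_T):
    """Estimate the scale offset a_z from two radial projections (step c).

    Resamples both projections onto a uniform grid in log-radius, so that a
    radial scaling of the image becomes a translation of the projection, then
    takes the lag of the cross-correlation peak as the translational
    difference between LK_C and LK_T.
    """
    n = len(K_C)
    r = np.arange(1, n)                          # radii 1..n-1 (skip r = 0)
    log_r = np.linspace(0.0, np.log(n - 1), n)   # uniform grid in log-radius
    lc = np.interp(np.exp(log_r), r, np.asarray(K_C)[1:])
    lt = np.interp(np.exp(log_r), r, np.asarray(K_T)[1:])
    lc, lt = lc - lc.mean(), lt - lt.mean()      # normalize before correlating

    corr = np.correlate(lt, lc, mode="full")
    shift = int(corr.argmax()) - (n - 1)         # lag of the correlation peak
    step = log_r[1] - log_r[0]                   # log-radius units per sample
    return float(np.exp(shift * step))           # a_z = exp(shift * d(log r))
```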
[0031] Further, the search matching analysis in step VIII further includes the following steps:
[0032] step A: making concentric circles with the center point of the target image $I_T$ as the center to divide the speech feature image into B annular regions, and finally dividing each annular region into K sectors, K and B both being predefined constants;
[0033] step B: calculating, as Code1, a sector speech feature value $V_{sq\theta}$ of each sector $S_{sq}$: $V_{sq\theta} = \frac{1}{n_{sq}} \sum_{S_{sq}} \| F_{sq\theta}(x, y) - \bar{P}_{sq\theta} \|$,
[0034] where $F_{sq\theta}(x, y)$ represents the gray value of each pixel of the sector $S_{sq}$; $\bar{P}_{sq\theta}$ represents the average value of the gray values of the pixels in the sector $S_{sq}$; $n_{sq}$ represents the number of pixels in the annular region $S_{sq}$; $0 \leq sq \leq B \times K - 1$; $\theta = \{0°, (360°/K), 2 \times (360°/K), 3 \times (360°/K), \ldots < 180°\}$;
[0035] step C: rotating the speech feature image by (180°/K), repeating step B, and extracting the sector speech feature value $V_{sq\theta}$ of each sector $S_{sq}$ as Code2;
[0036] step E: rotating Code1 and Code2 by R×(360°/K) (R = 0, 1, 2, ..., K-1) to obtain Code1' and Code2', respectively; and
[0037] step F: inputting Code1 and Code2, and Code1' and Code2' in step E, to the historical speech feature map library for matching.
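A sketch of the sector feature computation in steps A and B. The uniform radius step ($r_{\max}/B$) and the pixel-mask bookkeeping are assumptions; the patent fixes only the constants B and K.

```python
import numpy as np

def sector_feature_values(image, center, B, K):
    """Sector speech feature values V_sq (steps A and B, simplified sketch).

    Divides the feature image into B annular regions around `center`, splits
    each annulus into K sectors, and returns the mean absolute deviation of
    the gray values in every sector:
        V_sq = (1 / n_sq) * sum over the sector of |F(x, y) - mean_sq|.
    """
    h, w = image.shape
    cy, cx = center
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - cy, xs - cx)                       # radius of each pixel
    theta = np.mod(np.arctan2(ys - cy, xs - cx), 2 * np.pi)

    r_max = r.max()
    ring = np.minimum((r / r_max * B).astype(int), B - 1)             # 0..B-1
    sector = np.minimum((theta / (2 * np.pi) * K).astype(int), K - 1)  # 0..K-1

    values = np.zeros(B * K)
    for sq in range(B * K):                  # sector index sq = ring*K + sector
        px = image[(ring * K + sector) == sq]
        if px.size:
            values[sq] = np.abs(px - px.mean()).mean()   # V_sq
    return values

# e.g. sector_feature_values(np.random.rand(64, 64), (32, 32), B=4, K=8)
```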
[0038] The present invention has the following beneficial effects: the present invention uses the positioning information of the speakers as the basis for determining whether the speakers are in a "suspected same group", and the relationship between speakers that are possibly associated or in conflict is then effectively regulated according to the feedback result. Meanwhile, the cloud server can control a plurality of Internet applications. Feature recognition of speech is converted into global recognition of a feature map, so that higher recognition efficiency can be achieved. The accuracy and efficiency of control can be improved by performing correction and positioning processing on the feature image.
BRIEF DESCRIPTION OF THE DRAWINGS
[0039] The present invention will be further described below in conjunction with the accompanying drawings and an example.
[0040] FIG. 1 is a flowchart of detection of a conflict between the speakers.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0041] To make the objective, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the embodiments. It will be understood that the specific example described herein is merely used to explain, rather than limit, the present invention.
Example 1
[0042] This example provides a cloud service platform system for speakers, where each speaker includes a speech input module, a network connection unit and a player; and the cloud service platform system for speakers includes a cloud server, and a network connection unit for linking the cloud server and the speakers. The speakers are each provided with a positioning detection unit, and the cloud server receives data from the positioning detection unit in real time; and the cloud server performs the following steps of detecting a conflict between the speakers:
[0043] step 1: receiving positioning data from speakers in a working state;
[0044] step 2: determining states of the speakers in operation based on the positioning data, and, if the positioning distance between two speakers is smaller than a predefined threshold, marking the corresponding speakers as "a suspected same group" and sending a "suspected same group acknowledgment message" to the corresponding speakers; and
[0045] step 3: receiving a feedback result for the "suspected same group acknowledgment message"; if the result is "yes", unifying data of the speakers in the same group, transmitting the unified data to any speaker in the same group and controlling any speaker to transmit data within the same group; and if the result is "no", transmitting play data successively according to a weight priority and controlling the volume.
[0046] In this example, the positioning information of the speakers is used as the basis for determining whether the speakers are in a "suspected same group", and the relationship between speakers that are possibly associated or in conflict is then effectively regulated according to the feedback result. Meanwhile, the cloud server can control a plurality of Internet applications.
[0047] Specifically, the weight priority is determined by the cloud server through the following steps:
[0048] step 1.1: determining the networking starting time of the speakers, with an earlier time corresponding to a higher priority; and
[0049] step 1.2: determining the self-check states of the speakers, with a better self-check state corresponding to a higher priority.
[0050] Specifically, the cloud server is also capable of invoking other applications on the Internet according to instructions from the speakers.
[0051] In one embodiment, the cloud server further receives speech control signals from the speakers to perform speech recognition, including the following steps:
[0052] step I: creating a historical speech feature map library, where a historical speech feature map is created by extracting features from statement speech input in advance or recorded as history and drawing a statement speech feature map including word, phrase and sentence feature maps;
[0053] step II: extracting features from statement speech acquired by the speakers in real time, and drawing a target statement speech feature map; selecting and defining any statement speech feature map in the historical speech feature map library as a reference image, and defining the target statement speech feature map as a target image;
[0054] step III: binarizing the target image $I_T$, and defining that a value of 1 indicates having a speech feature and that a value of 0 indicates having no speech feature; meshing the binarized feature map into a grid map, defining a first point (x1, y1) of the grid map as the origin, defining a retrieval matching stride as L, and performing retrieval from the origin along the x direction; if a point having the value of 1 is retrieved, recording the position and value of the point and numbering the point in order; otherwise, continuing the retrieval matching;
[0055] step IV: updating point (x1, y1+N*L) as the origin and performing step III again until the retrieval matching in the x direction and the y direction is completed, thereby completing initial positioning retrieval matching, where N is an integer, and L is a constant;
[0056] step V: successively extracting points having the value of 1, updating the currently extracted point having the value of 1 as the origin, updating the retrieval matching stride to L/2, performing the retrieval matching successively in the x direction without performing the retrieval matching on points previously subjected to the retrieval matching, automatically halving the retrieval matching stride when the retrieval matching extends beyond the range of the target image, and continuing the retrieval matching until the retrieval matching stride comes to a minimum; defining a new point having the value of 1 appearing during the retrieval matching as a new point needing to be subjected to the retrieval matching in the y direction, and performing step VI; otherwise, performing step VII;
[0057] step VI: performing the retrieval matching successively in the y direction while keeping the retrieval matching stride of L/2 unchanged and without performing the retrieval matching on points previously subjected to the retrieval matching, automatically halving the retrieval matching stride when the retrieval matching extends beyond the range of the target image, and continuing the retrieval matching until the retrieval matching stride comes to a minimum; defining a new point having the value of 1 appearing during the retrieval matching as a new point needing to be subjected to the retrieval matching in the x direction, and performing step V; otherwise, performing step VII;
[0058] step VII: ending the retrieval matching when no new point needs to be subjected to the retrieval matching, and defining the region with the points having the value of 1 obtained by the retrieval matching as an effective target image;
[0059] step VIII: performing a search matching analysis on the effective target image in the historical speech feature map library; and
[0060] step IX: invoking a corresponding strategy according to the recognition result.
[0061] In one embodiment, step VIII further includes image correction, which includes the following steps:
[0062] step a: defining the effective target image as $I_T$, and selecting and defining any reference image in the historical speech feature map library as $I_C$;
[0063] step b: defining the association relationship between the reference image $I_C$ and the target image $I_T$ obtained by polar coordinate transformation as follows: $I_T(r, \varphi) = I_C(a_z r, \varphi - \varphi_z)$, where $a_z$ is a scale offset parameter, and $\varphi_z$ is a rotation offset parameter;
[0064] step c: calculating, in the radial direction in a polar coordinate system, a projection $K_C(i)$ of the reference image $I_C$: $K_C(i) = \Omega_i \sum_{j=1}^{\bar{n}_i} I_C^P(i, j)$, and a projection $K_T(i)$ of the target image $I_T$: $K_T(i) = \Omega_i \sum_{j=1}^{\bar{n}_i} I_T^P(i, j)$; taking logarithms of $K_C(i)$ and $K_T(i)$ to obtain $LK_C(i)$ and $LK_T(i)$, and taking the translational difference between $LK_C(i)$ and $LK_T(i)$ as the scale offset parameter $a_z$, where
$I^P(i, j) = I\bigl(K_{\max} + K_i \sin(2\pi j / n_\varphi),\ K_{\max} + K_i \cos(2\pi j / n_\varphi)\bigr)$, $i = 1, 2, \ldots, n_r$, $j = 1, 2, \ldots, \bar{n}_i$;
$\Omega_i = \bar{n}_i / n_\varphi$; and
$\eta_i^1 = \Omega_i (j - 1) - fl[\Omega_i (j - 1)]$, $\eta_i^2 = 1 - \eta_i^1$;
where $\bar{n}_i$ is the number of samples in the angular direction when $K_i = K_{\max}$; $fl(\cdot)$ represents the maximum integer which is less than or equal to the value within the bracket; the target image has a size of $2K_{\max} \times 2K_{\max}$; $n_r = K_{\max}$, representing the number of samples in the radial direction; and $n_\varphi = 8K_i$, representing the number of samples in the angular direction;
[0065] step d: calculating projections of the reference image $I_C$ and the target image $I_T$ in the radial direction and the angular direction according to the scale offset parameter in step c:
$$\Theta_C(j) = \begin{cases} \sum_{i=1}^{n_r} \bigl[\eta_i^1 I_C^P(i, ce(j/a_z)) + \eta_i^2 I_C^P(i, ce(j/a_z) + 1)\bigr], & a_z > 1 \\ 0, & a_z < 1 \end{cases}$$
and
$$\Theta_T(j) = \begin{cases} \sum_{i=1}^{n_r} \bigl[\eta_i^1 I_T^P(i, ce(j a_z)) + \eta_i^2 I_T^P(i, ce(j a_z) + 1)\bigr], & a_z < 1 \\ 0, & a_z > 1 \end{cases}$$
and performing a normalization calculation on $\Theta_C$ and $\Theta_T$ to obtain the translation amount $d$ of the highest point, and calculating the rotation offset parameter $\varphi_z$ according to $\varphi_z = 2\pi d / \bar{n}_\varphi$, where $ce(\cdot)$ represents the minimum integer which is greater than or equal to the value within the bracket;
[0066] step e: putting the rotation offset parameter $\varphi_z$ and the scale offset parameter $a_z$ into step A to correct the target image, and calculating the position point $P_z^M$ corresponding to the minimum of $E_z$ by $E_z = \sum_{i=1}^{\bar{n}_\varphi} \bigl[\Theta_T^z(i) - \Theta_C(i - d)\bigr]^2$ as the center point of the target image, thereby completing image correction.
[0067] In one embodiment, the search matching analysis in step VIII further includes the following steps:
[0068] step A: making concentric circles with the center point of the target image $I_T$ as the center to divide the speech feature image into B annular regions, and finally, dividing each annular region into K sectors, where K and B are both predefined constants;
[0069] step B: calculating a sector speech feature value $V_{sq\theta}$ of each sector $S_{sq}$ as Code1: $V_{sq\theta} = \frac{1}{n_{sq}} \sum_{S_{sq}} \| F_{sq\theta}(x, y) - \bar{P}_{sq\theta} \|$, where $F_{sq\theta}(x, y)$ represents the gray value of each pixel of the sector $S_{sq}$; $\bar{P}_{sq\theta}$ represents the average value of the gray values of the pixels in the sector $S_{sq}$; $n_{sq}$ represents the number of pixels in the annular region $S_{sq}$; $0 \leq sq \leq B \times K - 1$; $\theta = \{0°, (360°/K), 2 \times (360°/K), 3 \times (360°/K), \ldots < 180°\}$;
[0070] step C: rotating the speech feature image by (180°/K), repeating step B, and extracting the sector speech feature value $V_{sq\theta}$ of each sector $S_{sq}$ as Code2;
[0071] step E: rotating Code1 and Code2 by R×(360°/K) (R = 0, 1, 2, ..., K-1) to obtain Code1' and Code2', respectively; and
[0072] step F: inputting Code1 and Code2, and Code1' and Code2' in step E, to the historical speech feature map library for matching.
[0073] In this example, the positioning information of the speakers is used as the basis for determining whether the speakers are in a "suspected same group", and the relationship between speakers that are possibly associated or in conflict is then effectively regulated according to the feedback result. Meanwhile, the cloud server can control a plurality of Internet applications. Feature recognition of speech is converted into global recognition of a feature map, so that higher recognition efficiency can be achieved. The accuracy and efficiency of control can be improved by performing correction and positioning processing on the feature image.
[0074] While an illustrative specific embodiment of the present invention is described above so that those skilled in the art can understand the present invention, the present invention is not limited to the scope of this specific embodiment. For those skilled in the art, all invention-creations using the concept of the present invention shall fall within the protection scope of the present invention, as long as various alterations are made within the spirit and scope of the present invention as defined and determined by the appended claims.

Claims (6)

  1. A cloud service platform system for speakers, the speakers each comprising a speech input module, a network connection unit and a player, the cloud service platform system for speakers comprising a cloud server, and a network connection unit for linking the cloud server and the speakers, wherein the speakers each are provided with a positioning detection unit, and the cloud server receives data from the positioning detection unit in real time; and the cloud server performs the following steps of detecting a conflict between the speakers:
     step 1: receiving positioning data from speakers in a working state;
     step 2: determining states of the speakers in operation based on the positioning data, and if a positioning distance between two speakers is smaller than a predefined threshold, marking corresponding speakers as "a suspected same group" and sending a "suspected same group acknowledgment message" to the corresponding speakers; and
     step 3: receiving a feedback result for the "suspected same group acknowledgment message"; when the result is "yes", unifying data of a same group of speakers, transmitting the unified data to any speaker in the same group of speakers and controlling any speaker to transmit data within the same group; and when the result is "no", transmitting play data successively according to a weight priority and controlling volume.
  2. The cloud service platform system for the speakers according to claim 1, wherein the weight priority is determined by the cloud server through the following steps:
     step 1.1: determining a networking starting time of the speakers, with an earlier time corresponding to a higher priority; and
     step 1.2: determining self-check states of the speakers, with a better self-check state corresponding to a higher priority.
  3. The cloud service platform system for the speakers according to claim 1, wherein the cloud server is also capable of invoking other applications on the Internet according to instructions from the speakers.
  4. The cloud service platform system for the speakers according to any one of claims 1 to 3, wherein the cloud server further receives speech control signals from the speakers to perform speech recognition, comprising:
     step I: creating a historical speech feature map library, wherein a historical speech feature map is created by extracting features from statement speech input in advance or recorded as history and drawing a statement speech feature map comprising word, phrase and sentence feature maps;
     step II: extracting features from statement speech acquired by the speakers in real time, and drawing a target statement speech feature map; selecting and defining any statement speech feature map in the historical speech feature map library as a reference image, and defining the target statement speech feature map as a target image;
     step III: binarizing the target image $I_T$, and defining that a value of 1 indicates having a speech feature and that a value of 0 indicates having no speech feature; meshing the binarized feature map into a grid map, defining a first point (x1, y1) of the grid map as an origin, defining a retrieval matching stride as L, and performing retrieval from the origin in an x direction; when a point having the value of 1 is retrieved, recording a position and the value of the point and numbering the point in order; when a point having the value of 0 is retrieved, continuing retrieval matching;
     step IV: updating point (x1, y1+N*L) as the origin, and performing step III again until the retrieval matching in the x direction and a y direction is completed, thereby completing initial positioning retrieval matching, wherein N is an integer, and L is a constant;
     step V: successively extracting points having the value of 1, updating a currently extracted point having the value of 1 as the origin, updating the retrieval matching stride to L/2, performing the retrieval matching successively in the x direction without performing the retrieval matching on points previously subjected to the retrieval matching, automatically halving the retrieval matching stride when the retrieval matching extends beyond a range of the target image, and continuing the retrieval matching until the retrieval matching stride comes to a minimum; defining a new point having the value of 1 appearing during the retrieval matching as a new point needing to be subjected to the retrieval matching in the y direction, and performing step VI; when no new point having the value of 1 appears, performing step VII;
     step VI: performing the retrieval matching successively in the y direction while keeping the retrieval matching stride of L/2 unchanged and without performing the retrieval matching on points previously subjected to the retrieval matching, automatically halving the retrieval matching stride when the retrieval matching extends beyond the range of the target image, and continuing the retrieval matching until the retrieval matching stride comes to a minimum; defining a new point having the value of 1 appearing during the retrieval matching as a new point needing to be subjected to the retrieval matching in the x direction, and performing step V; when no new point having the value of 1 appears, performing step VII;
     step VII: ending the retrieval matching when no new point needs to be subjected to the retrieval matching, and defining a region with the points having the value of 1 obtained by the retrieval matching as an effective target image;
     step VIII: performing a search matching analysis on the effective target image in the historical speech feature map library; and
     step IX: invoking a corresponding strategy according to a recognition result.
  5. The cloud service platform system for the speakers according to claim 4, wherein step VIII further comprises image correction comprising the following steps:
     step a: defining the effective target image as $I_T$, and selecting and defining any reference image in the historical speech feature map library as $I_C$;
     step b: defining an association relationship between the reference image $I_C$ and the target image $I_T$ obtained by polar coordinate transformation: $I_T(r, \varphi) = I_C(a_z r, \varphi - \varphi_z)$, wherein $a_z$ is a scale offset parameter, and $\varphi_z$ is a rotation offset parameter;
     step c: calculating, in the radial direction in a polar coordinate system, a projection $K_C(i)$ of the reference image $I_C$: $K_C(i) = \Omega_i \sum_{j=1}^{\bar{n}_i} I_C^P(i, j)$, and a projection $K_T(i)$ of the target image $I_T$: $K_T(i) = \Omega_i \sum_{j=1}^{\bar{n}_i} I_T^P(i, j)$; taking logarithms of $K_C(i)$ and $K_T(i)$ to obtain $LK_C(i)$ and $LK_T(i)$, and using a translational difference between $LK_C(i)$ and $LK_T(i)$ as the scale offset parameter $a_z$, wherein $I^P(i, j) = I\bigl(K_{\max} + K_i \sin(2\pi j / n_\varphi),\ K_{\max} + K_i \cos(2\pi j / n_\varphi)\bigr)$, $i = 1, 2, \ldots, n_r$, $j = 1, 2, \ldots, \bar{n}_i$; $\Omega_i = \bar{n}_i / n_\varphi$; and $\eta_i^1 = \Omega_i (j - 1) - fl[\Omega_i (j - 1)]$, $\eta_i^2 = 1 - \eta_i^1$; wherein $\bar{n}_i$ is a number of samples in an angular direction when $K_i = K_{\max}$; $fl(\cdot)$ represents a maximum integer which is less than or equal to the value within the bracket; the target image has a size of $2K_{\max} \times 2K_{\max}$; $n_r = K_{\max}$, representing a number of samples in the radial direction; and $n_\varphi = 8K_i$, representing a number of samples in the angular direction;
     step d: calculating projections of the reference image $I_C$ and the target image $I_T$ in the radial direction and the angular direction according to the scale offset parameter in step c:
     $$\Theta_C(j) = \begin{cases} \sum_{i=1}^{n_r} \bigl[\eta_i^1 I_C^P(i, ce(j/a_z)) + \eta_i^2 I_C^P(i, ce(j/a_z) + 1)\bigr], & a_z > 1 \\ 0, & a_z < 1 \end{cases}$$
     and
     $$\Theta_T(j) = \begin{cases} \sum_{i=1}^{n_r} \bigl[\eta_i^1 I_T^P(i, ce(j a_z)) + \eta_i^2 I_T^P(i, ce(j a_z) + 1)\bigr], & a_z < 1 \\ 0, & a_z > 1 \end{cases}$$
     and performing a normalization calculation on $\Theta_C$ and $\Theta_T$ to obtain a translation amount $d$ of the highest point, and calculating the rotation offset parameter $\varphi_z$ according to $\varphi_z = 2\pi d / \bar{n}_\varphi$, wherein $ce(\cdot)$ represents a minimum integer which is greater than or equal to the value within the bracket;
     step e: putting the rotation offset parameter $\varphi_z$ and the scale offset parameter $a_z$ into step A to correct the target image, and calculating, as a center point of the target image, a position point $P_z^M$ corresponding to a minimum of $E_z$ by $E_z = \sum_{i=1}^{\bar{n}_\varphi} \bigl[\Theta_T^z(i) - \Theta_C(i - d)\bigr]^2$, thereby completing image correction.
  6. The cloud service platform system for the speakers according to claim 4, wherein the search matching analysis in step VIII further comprises the following steps:
     step A: making concentric circles with a center point of the target image $I_T$ as the center to divide the speech feature image into B annular regions, and finally dividing each annular region into K sectors, K and B both being predefined constants;
     step B: calculating, as Code1, a sector speech feature value $V_{sq\theta}$ of each sector $S_{sq}$: $V_{sq\theta} = \frac{1}{n_{sq}} \sum_{S_{sq}} \| F_{sq\theta}(x, y) - \bar{P}_{sq\theta} \|$, wherein $F_{sq\theta}(x, y)$ represents a gray value of each pixel of the sector $S_{sq}$; $\bar{P}_{sq\theta}$ represents an average value of gray values of pixels in the sector $S_{sq}$; $n_{sq}$ represents a number of pixels in the annular region $S_{sq}$; $0 \leq sq \leq B \times K - 1$; $\theta = \{0°, (360°/K), 2 \times (360°/K), 3 \times (360°/K), \ldots < 180°\}$;
     step C: rotating the speech feature image by (180°/K), repeating step B, and extracting a sector speech feature value $V_{sq\theta}$ of each sector $S_{sq}$ as Code2;
     step E: rotating Code1 and Code2 by R×(360°/K) (R = 0, 1, 2, ..., K-1) to obtain Code1' and Code2', respectively; and
     step F: inputting Code1 and Code2, and Code1' and Code2' in step E, to the historical speech feature map library for matching.
GB2300697.6A 2022-01-26 2023-01-17 Cloud service platform system for speakers Pending GB2616512A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210095507.4A CN114550715A (en) 2022-01-26 2022-01-26 Sound box cloud service platform system

Publications (2)

Publication Number Publication Date
GB202300697D0 GB202300697D0 (en) 2023-03-01
GB2616512A true GB2616512A (en) 2023-09-13

Family

ID=81672941

Family Applications (1)

Application Number Title Priority Date Filing Date
GB2300697.6A Pending GB2616512A (en) 2022-01-26 2023-01-17 Cloud service platform system for speakers

Country Status (2)

Country Link
CN (1) CN114550715A (en)
GB (1) GB2616512A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9898250B1 (en) * 2016-02-12 2018-02-20 Amazon Technologies, Inc. Controlling distributed audio outputs to enable voice output
EP3541094A1 (en) * 2018-03-15 2019-09-18 Harman International Industries, Incorporated Smart speakers with cloud equalizer
US20210029452A1 (en) * 2019-07-22 2021-01-28 Apple Inc. Modifying and Transferring Audio Between Devices

Also Published As

Publication number Publication date
GB202300697D0 (en) 2023-03-01
CN114550715A (en) 2022-05-27

Similar Documents

Publication Publication Date Title
Zhang et al. Distributed dynamic map fusion via federated learning for intelligent networked vehicles
KR102611454B1 (en) Storage device for decentralized machine learning and machine learning method thereof
US20150005901A1 (en) Iterative learning for reliable sensor sourcing systems
KR101109379B1 (en) Network fingerprinting
JP2022084814A (en) Distributed data collection method in wireless sensor network in which first node can publish itself or sensor data as collector to another node
JP5844920B2 (en) Image location method and system based on navigation function of mobile terminal
WO2019128355A1 (en) Method and device for determining accurate geographic location
CN111935820B (en) Positioning implementation method based on wireless network and related equipment
US9640074B2 (en) Permissions-based tracking of vehicle positions and arrival times
CN111353106A (en) Recommendation method and device, electronic equipment and storage medium
CN112860811A (en) Method and device for determining data blood relationship, electronic equipment and storage medium
CN113075648A (en) Clustering and filtering method for unmanned cluster target positioning information
GB2616512A (en) Cloud service platform system for speakers
WO2023226448A1 (en) Method and apparatus for generating logistics point-of-interest information, and device and computer-readable medium
WO2022068558A1 (en) Map data transmission method and apparatus
KR20220080051A (en) Method and apparatus for map query and electronic device
CN112380314B (en) Road network information processing method and device, storage medium and electronic equipment
US20190347807A1 (en) Information processing apparatus, data collection method, and data collection system
Faheem et al. Indexing in wot to locate indoor things
KR20210030136A (en) Apparatus and method for generating vehicle data, and vehicle system
Ye et al. Federated Learning-Enabled Cooperative Localization in Multi-agent System
CN114820955B (en) Symmetric plane completion method, device, equipment and storage medium
CN115242704B (en) Network topology data updating method and device and electronic equipment
JP2019128611A (en) Generation apparatus, generation method, and generation program
CN113627561B (en) Data fusion method and device, electronic equipment and storage medium