GB2616512A - Cloud service platform system for speakers - Google Patents
Cloud service platform system for speakers
- Publication number
- GB2616512A (application GB2300697.6A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- speakers
- retrieval matching
- value
- speech feature
- target image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/63—Querying
- G06F16/635—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/165—Management of the audio stream, e.g. setting of volume, audio stream path
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/34—Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/61—Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/52—Network services specially adapted for the location of the user terminal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/08—Feature extraction
- G06F2218/10—Feature extraction by analysing the shape of a waveform, e.g. extracting parameters relating to peaks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Abstract
A cloud service platform system for smart speakers, each of which comprises a speech input module, a network connection unit and a player, receives positioning data from working speakers, marks two speakers within a threshold distance of each other as a "suspected same group" and sends a "suspected same group acknowledgment message" to the corresponding speakers. If the feedback result is "yes", data for the same group of speakers is unified and transmitted to any speaker within the group, and that speaker relays the data within the group. If a "NO" result is fed back, play data is transmitted successively according to a weight priority and the volume is controlled. The system can thus serve multiple purposes and manage conflicts between nearby speakers.
Description
CLOUD SERVICE PLATFORM SYSTEM FOR SPEAKERS
TECHNICAL FIELD
[0001] The present invention relates to the field of cloud service for speakers, and more particularly to a cloud service platform system for speakers.
BACKGROUND
[0002] A smart speaker is an upgraded form of the conventional speaker, through which a household consumer can surf the Internet via speech, e.g., requesting a song, doing online shopping, or checking a weather forecast. A smart speaker may also be used to control smart home devices, such as opening curtains, setting a refrigerator temperature, or warming a water heater in advance. Baidu released its first own-brand smart speaker, the "Xiaodu Smart Speaker", in Beijing on June 11, 2018. Baidu's artificial intelligence (AI) assistant "Xiaodu Smart Speaker Donkey Kong" was released on the Xiaodu Store on June 1, 2019. The Huawei SoundX smart speaker, developed jointly by Huawei and Devialet, was officially released on November 25 of the same year.
[0003] Existing cloud service platform systems have the technical problems of serving only a single application and failing to effectively regulate the relationship between speakers. The present invention solves these problems by providing a cloud service platform system for speakers.
SUMMARY
[0004] The technical problems to be solved by the present invention are that prior-art systems serve only a single application and fail to effectively regulate the relationship between speakers. A novel cloud service platform system for speakers is therefore provided, which serves multiple purposes and can effectively handle conflicts and relationships between speakers.
[0005] To solve the above-mentioned technical problems, the following technical solutions are adopted.
[0006] A cloud service platform system for speakers is provided, where the speakers each include a speech input module, a network connection unit and a player; and the cloud service platform system for the speakers includes a cloud server, and a network connection unit for linking the cloud server and the speakers. The speakers are each provided with a positioning detection unit, and the cloud server receives data from the positioning detection unit in real time; and the cloud server performs the following steps of detecting a conflict between the speakers: [0007] step 1: receiving positioning data from speakers in a working state; [0008] step 2: determining states of the speakers in operation based on the positioning data, and if the positioning distance between two speakers is smaller than a predefined threshold, marking the corresponding speakers as a "suspected same group" and sending a "suspected same group acknowledgment message" to the corresponding speakers; and [0009] step 3: receiving a feedback result for the "suspected same group acknowledgment message"; if the result is "yes", unifying data of the same group of speakers, transmitting the unified data to any speaker in the same group and controlling that speaker to transmit the data within the same group; and if the result is "NO", transmitting play data successively according to a weight priority and controlling the volume.
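For illustration only, the following Python sketch shows one way the conflict-detection steps 1 to 3 could be organised on the cloud server. The distance threshold value, the data structures and the helper names are assumptions for the sketch; the patent does not prescribe an implementation.

```python
import math

DISTANCE_THRESHOLD_M = 5.0  # a predefined threshold; the value is an assumption

def detect_suspected_groups(positions):
    """Steps 1-2: positions is a dict {speaker_id: (x, y)} of positioning data
    reported by speakers in a working state. Pairs closer than the threshold
    are marked as a "suspected same group" (the acknowledgment message would
    be sent to both speakers at this point)."""
    ids = list(positions)
    groups = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(positions[a], positions[b]) < DISTANCE_THRESHOLD_M:
                groups.append((a, b))
    return groups

def dispatch_play_data(group, feedback_is_yes, play_data, weight_priority):
    """Step 3: if the feedback is "yes", unify the data and hand it to any one
    speaker, which relays it within the group; if the feedback is "NO",
    transmit play data successively by weight priority (volume limiting would
    also be applied here)."""
    if feedback_is_yes:
        return [("unified", group[0], play_data)]
    ordered = sorted(group, key=lambda s: weight_priority[s], reverse=True)
    return [("individual", s, play_data) for s in ordered]

# usage
positions = {"spk-A": (0.0, 0.0), "spk-B": (2.0, 1.0), "spk-C": (40.0, 0.0)}
print(detect_suspected_groups(positions))            # [('spk-A', 'spk-B')]
print(dispatch_play_data(("spk-A", "spk-B"), False, "song.mp3",
                         {"spk-A": 2, "spk-B": 1}))
```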
[0010] The working principle of the present invention is as follows: the positioning information of the speakers is used as the basis for determining whether speakers belong to a "suspected same group", and the relationship between speakers that are possibly associated or in conflict with one another is regulated effectively according to the feedback result. Meanwhile, the cloud server can control a plurality of Internet applications.
[0011] As a further optimization of the above solution, the weight priority is determined by the cloud server through the following steps: [0012] step 1.1: determining the networking starting time of the speakers, with an earlier time corresponding to a higher priority; and [0013] step 1.2: determining the self-check states of the speakers, with a better self-check state corresponding to a higher priority.
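As an illustration of steps 1.1 and 1.2, the sketch below orders speakers by a tuple key: earlier networking start time first, then better self-check state. Treating the second rule as a tie-breaker, and the field names used here, are assumptions; the patent only states the two ordering criteria.

```python
def priority_key(speaker):
    # Earlier networking start time -> higher priority (step 1.1);
    # higher self-check score -> higher priority (step 1.2, used as a tie-breaker here).
    return (speaker["networking_start"], -speaker["self_check_score"])

speakers = [
    {"id": "spk-A", "networking_start": 1_700_000_100, "self_check_score": 0.9},
    {"id": "spk-B", "networking_start": 1_700_000_050, "self_check_score": 0.7},
]
ordered = sorted(speakers, key=priority_key)
print([s["id"] for s in ordered])  # ['spk-B', 'spk-A']: spk-B joined the network earlier
```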
[0014] Further, the cloud server is also capable of invoking other applications on the Internet according to instructions from the speakers.
[0015] Further, the cloud server also receives speech control signals from the speakers to perform speech recognition, which includes the following steps: [0016] step I: creating a historical speech feature map library, where a historical speech feature map is created by extracting features from statement speech input in advance or recorded as history and drawing a statement speech feature map including word, phrase and sentence feature maps; [0017] step II: extracting features from statement speech acquired by the speakers in real time, and drawing a target statement speech feature map; selecting and defining any statement speech feature map in the historical speech feature map library as a reference image, and defining the target statement speech feature map as a target image; [0018] step III: binarizing the target image, where a value of 1 indicates the presence of a speech feature and a value of 0 indicates no speech feature; meshing the binarized feature map into a grid map, defining the first point (x1, y1) of the grid map as the origin, defining a retrieval matching stride as L, and performing retrieval from the origin in the x direction; if a point having the value of 1 is retrieved, recording the position and value of the point and numbering the point in order; otherwise, continuing the retrieval matching; [0019] step IV: updating point (x1, y1+N*L) as the origin and performing step III again until the retrieval matching in the x direction and the y direction is completed, thereby completing initial positioning retrieval matching, where N is an integer and L is a constant; [0020] step V: successively extracting points having the value of 1, updating the currently extracted point having the value of 1 as the origin, updating the retrieval matching stride to L/2, performing the retrieval matching successively in the x direction without repeating the retrieval matching on points previously subjected to it, automatically halving the retrieval matching stride when the retrieval matching extends beyond the range of the target image, and continuing the retrieval matching until the retrieval matching stride reaches a minimum; if a new point having the value of 1 appears during the retrieval matching, defining it as a new point to be subjected to the retrieval matching in the y direction and performing step VI; otherwise, performing step VII; [0021] step VI: performing the retrieval matching successively in the y direction while keeping the retrieval matching stride of L/2 unchanged and without repeating the retrieval matching on points previously subjected to it, automatically halving the retrieval matching stride when the retrieval matching extends beyond the range of the target image, and continuing the retrieval matching until the retrieval matching stride reaches a minimum; if a new point having the value of 1 appears during the retrieval matching, defining it as a new point to be subjected to the retrieval matching in the x direction and performing step V; otherwise, performing step VII; [0022] step VII: ending the retrieval matching when no new point needs to be subjected to the retrieval matching, and defining the region formed by the points having the value of 1 obtained by the retrieval matching as the effective target image; [0023] step VIII: performing a search matching analysis on the effective target image in the historical speech feature map library; and [0024] step IX: invoking a corresponding strategy according to a recognition result.
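The sketch below illustrates the coarse-to-fine idea of steps III to VII on a binarized feature map: a coarse scan at stride L finds initial 1-points, and each found point then seeds a finer scan at stride L/2 that alternates between the x and y directions and halves the stride at the image border. The minimum stride, the 2-D array representation and the traversal details are assumptions made for the sketch, not the patent's exact procedure.

```python
import numpy as np

MIN_STRIDE = 1  # assumed minimum retrieval matching stride

def coarse_scan(grid, stride):
    """Steps III-IV: scan every stride-th row of the binarized grid along x,
    recording the positions of points having the value 1."""
    found = set()
    for y in range(0, grid.shape[0], stride):
        for x in range(0, grid.shape[1], stride):
            if grid[y, x] == 1:
                found.add((x, y))
    return found

def refine(grid, seeds, stride):
    """Steps V-VII: from each seed point, scan with half the stride, switching
    between the x and y directions whenever a new 1-point appears, halving the
    stride when the scan runs beyond the image, and stopping once no new point
    needs to be visited."""
    visited, found = set(), set(seeds)
    queue = [(p, "x") for p in seeds]
    while queue:
        (x0, y0), axis = queue.pop()
        step, pos = stride // 2, 0
        while step >= MIN_STRIDE:
            pos += step
            x, y = (x0 + pos, y0) if axis == "x" else (x0, y0 + pos)
            if y >= grid.shape[0] or x >= grid.shape[1]:
                pos -= step
                step //= 2           # halve the stride beyond the image range
                continue
            if (x, y) in visited:
                continue
            visited.add((x, y))
            if grid[y, x] == 1 and (x, y) not in found:
                found.add((x, y))
                queue.append(((x, y), "y" if axis == "x" else "x"))
    return found  # 1-points defining the effective target image

grid = np.zeros((16, 16), dtype=int)
grid[4, 4] = grid[4, 8] = grid[8, 8] = 1
print(sorted(refine(grid, coarse_scan(grid, 4), 4)))
```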
[0025] Further, step VIII further includes image correction, which includes the following steps: [0026] step a: defining the effective target image as I_T, and selecting and defining any reference image in the historical speech feature map library as I_C; [0027] step b: defining the association relationship between the reference image I_C and the target image I_T obtained by polar coordinate transformation: I_T(r, φ) = I_C(a_z·r, φ − φ_z), where a_z is a scale offset parameter and φ_z is a rotation offset parameter; [0028] step c: calculating, in the radial direction in a polar coordinate system, a projection K_C(i) of the reference image I_C, K_C(i) = Ω_i·Σ_{j=1..n_φ} I_P^C(i, j), and a projection K_T(i) of the target image I_T, K_T(i) = Ω_i·Σ_{j=1..n_φ} I_P^T(i, j); taking logarithms of K_C(i) and K_T(i) to obtain LK_C(i) and LK_T(i), and using the translational difference between LK_C(i) and LK_T(i) as the scale offset parameter a_z, where I_P(i, j) = I(K_max + K_i·sin(2πj/n_φ), K_max + K_i·cos(2πj/n_φ)), i = 1, 2, ..., n_r, j = 1, 2, ..., n_φ; Ω_i = n̄_φ/n_φ; and η_1^{ij} = fl(j·Ω_i) − fl((j−1)·Ω_i), η_2^{ij} = 1 − η_1^{ij}; where n̄_φ is the number of samples in the angular direction when K_i = K_max; fl(·) represents the largest integer that is less than or equal to the value within the bracket; the target image has a size of 2K_max × 2K_max; n_r = K_max, representing the number of samples in the radial direction; and n_φ = 8·K_i, representing the number of samples in the angular direction; [0029] step d: calculating projections of the reference image I_C and the target image I_T in the radial direction and the angular direction according to the scale offset parameter obtained in step c: Θ_j^C = Σ_{i=1..n_r} [η_1^{ij}·I_P^C(i, fl(j/a_z)) + η_2^{ij}·I_P^C(i, ce(j/a_z))] for a_z > 1, and Θ_j^C = 0 for a_z < 1; Θ_j^T = Σ_{i=1..n_r·a_z} [η_1^{ij}·I_P^T(i, fl(j·a_z)) + η_2^{ij}·I_P^T(i, ce(j·a_z))] for a_z < 1, and Θ_j^T = 0 for a_z > 1; performing a normalization calculation on Θ^C and Θ^T to obtain the translation amount d of the highest point, and calculating the rotation offset parameter φ_z according to φ_z = 2πd/n̄_φ, where ce(·) represents the smallest integer that is greater than or equal to the value within the bracket; [0030] step e: putting the rotation offset parameter φ_z and the scale offset parameter a_z into step A to correct the target image, and calculating, as the center point of the target image, the position point P_z^T corresponding to the minimum of E_z, where E_z = Σ_{i=1..n_r} [Θ_z^T(i) − Θ^C(i − d)]², thereby completing image correction.
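To make the role of the scale and rotation offsets concrete, the following sketch estimates a_z and φ_z from two images using a log-polar sampling: a shift of the radial projection along the log-radius axis gives the scale offset, and a circular correlation of the angular projections gives the translation d and hence φ_z = 2πd/n_φ. The log-polar resampling and argmax correlation used here are a standard simplification, not the interpolation formulas of steps c and d.

```python
import numpy as np

def log_polar_sample(img, n_r, n_phi):
    """Sample a square image (side 2*K_max) on an (n_r, n_phi) log-polar grid."""
    k_max = img.shape[0] // 2
    log_r = np.linspace(0.0, np.log(k_max - 1), n_r)
    angles = np.linspace(0.0, 2 * np.pi, n_phi, endpoint=False)
    r = np.exp(log_r)[:, None]
    ys = np.clip((k_max + r * np.sin(angles)).astype(int), 0, img.shape[0] - 1)
    xs = np.clip((k_max + r * np.cos(angles)).astype(int), 0, img.shape[1] - 1)
    return img[ys, xs], log_r

def estimate_offsets(ref, tgt, n_r=64, n_phi=128):
    p_ref, log_r = log_polar_sample(ref, n_r, n_phi)
    p_tgt, _ = log_polar_sample(tgt, n_r, n_phi)
    # radial projections (averaged over the angular direction), cf. K_C(i) and K_T(i)
    k_ref, k_tgt = p_ref.mean(axis=1), p_tgt.mean(axis=1)
    lags = np.arange(-(n_r - 1), n_r)
    shift = lags[np.argmax(np.correlate(k_tgt - k_tgt.mean(),
                                        k_ref - k_ref.mean(), mode="full"))]
    a_z = float(np.exp(shift * (log_r[1] - log_r[0])))  # translational difference -> scale
    # angular projections: circular correlation gives the translation d of the peak
    t_ref, t_tgt = p_ref.mean(axis=0), p_tgt.mean(axis=0)
    spectrum = np.fft.fft(t_tgt) * np.conj(np.fft.fft(t_ref))
    d = int(np.argmax(np.real(np.fft.ifft(spectrum))))
    phi_z = 2 * np.pi * d / n_phi
    return a_z, phi_z

# usage: identical images should give a_z close to 1 and phi_z equal to 0
img = np.zeros((64, 64)); img[20:30, 34:44] = 1.0
print(estimate_offsets(img, img.copy()))
```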
[0031] Further, the search matching analysis in step VIII further includes the following steps: [0032] step A: making concentric circles with the center point of the target image I_T as the center so as to divide the speech feature image into B annular regions, and dividing each annular region into K sectors, K and B both being predefined constants; [0033] step B: calculating, as Code1, a sector speech feature value V_sqθ of each sector S_sq: V_sqθ = (1/n_sq)·Σ_{S_sq} |F_sqθ(x, y) − P̄_sqθ|, [0034] where F_sqθ(x, y) represents the gray value of each pixel of the sector S_sq; P̄_sqθ represents the average value of the gray values of the pixels in the sector S_sq; n_sq represents the number of pixels in the annular region S_sq; 0 ≤ sq ≤ B×K−1; θ = {0°, (360°/K), 2·(360°/K), 3·(360°/K), ... < 180°}; [0035] step C: rotating the speech feature image by (180°/K), repeating step B, and extracting the sector speech feature value V_sqθ of each sector S_sq as Code2; [0036] step E: rotating Code1 and Code2 by R×(360°/K) (R = 0, 1, 2, ..., K−1) to obtain Code1' and Code2', respectively; and [0037] step F: inputting Code1 and Code2, and Code1' and Code2' from step E, to the historical speech feature map library for matching.
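The sector feature values of step B can be illustrated as follows: the feature image is divided into B annular regions around the centre point, each annulus into K sectors, and each sector value is the mean absolute deviation of its pixel gray values from the sector mean, matching the V_sqθ formula above. The values of B and K, and the toy image, are assumptions for the sketch; rotating the image by 180°/K and repeating would yield Code2.

```python
import numpy as np

def sector_feature_values(img, center, B, K):
    """Divide img into B annuli around `center`, each annulus into K sectors,
    and return the mean absolute deviation of gray values per sector (Code1)."""
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(xx - center[0], yy - center[1])
    angle = np.mod(np.arctan2(yy - center[1], xx - center[0]), 2 * np.pi)
    r_edges = np.linspace(0.0, radius.max() + 1e-9, B + 1)
    values = np.zeros(B * K)
    for b in range(B):
        for k in range(K):
            in_sector = ((radius >= r_edges[b]) & (radius < r_edges[b + 1]) &
                         (angle >= 2 * np.pi * k / K) & (angle < 2 * np.pi * (k + 1) / K))
            pix = img[in_sector]
            if pix.size:                              # n_sq pixels fall in sector S_sq
                values[b * K + k] = np.abs(pix - pix.mean()).mean()
    return values

img = np.random.default_rng(0).random((32, 32))
print(sector_feature_values(img, center=(16, 16), B=4, K=8).round(3))
```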
[0038] The present invention has the following beneficial effects: the present invention uses the positioning information of the speakers as a basis for determining whether the speakers are in the "suspected same group", and a relationship between the speakers that are possibly associated or have a conflict therebetween is then regulated effectively according to the feedback result. Meanwhile, the cloud server can control a plurality of Internet applications. Feature recognition of speech is converted to global recognition of a feature map so that higher recognition efficiency can be achieved. The accuracy and efficiency of control can be improved by performing correction and positioning processing on a feature image.
BRIEF DESCRIPTION OF THE DRAWINGS
[0039] The present invention will be further described below in conjunction with the accompanying drawings and an example.
[0040] FIG. 1 is a flowchart of detection of a conflict between the speakers.
DETAILED DESCRIPTION OF THE EMBODIMENTS
[0041] To make the objective, technical solutions and advantages of the present invention clearer, the present invention will be further described in detail below with reference to the embodiments. It will be understood that the specific example described herein is merely used to explain, rather than limit, the present invention.
Example 1
[0042] This example provides a cloud service platform system for speakers, where the speakers each include a speech input module, a network connection unit and a player; and the cloud service platform system for speakers includes a cloud server, and a network connection unit for linking the cloud server and the speakers. The speakers are each provided with a positioning detection unit, and the cloud server receives data from the positioning detection unit in real time; and the cloud server performs the following steps of detecting a conflict between the speakers: [0043] step 1: receiving positioning data from speakers in a working state; [0044] step 2: determining states of the speakers in operation based on the positioning data, and if the positioning distance between two speakers is smaller than a predefined threshold, marking the corresponding speakers as a "suspected same group" and sending a "suspected same group acknowledgment message" to the corresponding speakers; and [0045] step 3: receiving a feedback result for the "suspected same group acknowledgment message"; if the result is "yes", unifying data of the speakers in the same group, transmitting the unified data to any speaker in the same group and controlling that speaker to transmit the data within the same group; and if the result is "NO", transmitting play data successively according to a weight priority and controlling the volume.
[0046] In this example, the positioning information of the speakers is used as a basis for determining whether the speakers are in the "suspected same group", and a relationship between the speakers that are possibly associated or have a conflict therebetween is then regulated effectively according to the feedback result. Meanwhile, the cloud server can control a plurality of Internet applications.
[0047] Specifically, the weight priority is determined by the cloud server through the following steps: [0048] step 1.1: determining networking starting time of the speakers, with an earlier time corresponding to a higher priority; and [0049] step 1.2: determining self-check states of the speakers, with a better self-check state corresponding to a higher priority.
[0050] Specifically, the cloud server is also capable of invoking other applications on the Internet according to instructions from the speakers.
[0051] In one embodiment, the cloud server further receives speech control signals from the speakers to perform speech recognition, including the following steps: [0052] step I: creating a historical speech feature map library, where a historical speech feature map is created by extracting features from statement speech input in advance or recorded as history and drawing a statement speech feature map including word, phrase and sentence feature maps; [0053] step II: extracting features from statement speech acquired by the speakers in real time, and drawing a target statement speech feature map; selecting and defining any statement speech feature map in the historical speech feature map library as a reference image, and defining the target statement speech feature map as a target image; [0054] step III: binarizing the target image, and defining that a value of 1 indicates having a speech feature and that a value of 0 indicates having no speech feature; meshing the binarized feature map into a grid map, defining the first point (x1, y1) of the grid map as the origin, defining a retrieval matching stride as L, and performing retrieval from the origin along the x direction; if a point having the value of 1 is retrieved, recording the position and value of the point and numbering the point in order; otherwise, continuing the retrieval matching; [0055] step IV: updating point (x1, y1+N*L) as the origin and performing step III again until the retrieval matching in the x direction and the y direction is completed, thereby completing initial positioning retrieval matching, where N is an integer and L is a constant; [0056] step V: successively extracting points having the value of 1, updating the currently extracted point having the value of 1 as the origin, updating the retrieval matching stride to L/2, performing the retrieval matching successively in the x direction without repeating the retrieval matching on points previously subjected to it, automatically halving the retrieval matching stride when the retrieval matching extends beyond the range of the target image, and continuing the retrieval matching until the retrieval matching stride reaches a minimum; if a new point having the value of 1 appears during the retrieval matching, defining it as a new point to be subjected to the retrieval matching in the y direction and performing step VI; otherwise, performing step VII; [0057] step VI: performing the retrieval matching successively in the y direction while keeping the retrieval matching stride of L/2 unchanged and without repeating the retrieval matching on points previously subjected to it, automatically halving the retrieval matching stride when the retrieval matching extends beyond the range of the target image, and continuing the retrieval matching until the retrieval matching stride reaches a minimum; if a new point having the value of 1 appears during the retrieval matching, defining it as a new point to be subjected to the retrieval matching in the x direction and performing step V; otherwise, performing step VII; [0058] step VII: ending the retrieval matching when no new point needs to be subjected to the retrieval matching, and defining the region formed by the points having the value of 1 obtained by the retrieval matching as the effective target image; [0059] step VIII: performing a search matching analysis on the effective target image in the historical speech feature map library; and [0060] step IX: invoking a corresponding strategy according to a recognition result.
[0061] In one embodiment, step VIII further includes image correction, which includes the following steps: [0062] step a: defining the effective target image as I_T, and selecting and defining any reference image in the historical speech feature map library as I_C; [0063] step b: defining an association relationship between the reference image I_C and the target image I_T obtained by polar coordinate transformation as follows: I_T(r, φ) = I_C(a_z·r, φ − φ_z), where a_z is a scale offset parameter, and φ_z is a rotation offset parameter; [0064] step c: calculating, in the radial direction in a polar coordinate system, a projection K_C(i) of the reference image I_C, K_C(i) = Ω_i·Σ_{j=1..n_φ} I_P^C(i, j), and a projection K_T(i) of the target image I_T, K_T(i) = Ω_i·Σ_{j=1..n_φ} I_P^T(i, j); taking logarithms of K_C(i) and K_T(i) to obtain LK_C(i) and LK_T(i), and taking the translational difference between LK_C(i) and LK_T(i) as the scale offset parameter a_z, where I_P(i, j) = I(K_max + K_i·sin(2πj/n_φ), K_max + K_i·cos(2πj/n_φ)), i = 1, 2, ..., n_r, j = 1, 2, ..., n_φ; Ω_i = n̄_φ/n_φ; and η_1^{ij} = fl(j·Ω_i) − fl((j−1)·Ω_i), η_2^{ij} = 1 − η_1^{ij}; where n̄_φ is the number of samples in the angular direction when K_i = K_max; fl(·) represents the largest integer which is less than or equal to the value within the bracket; the target image has a size of 2K_max × 2K_max; n_r = K_max, representing the number of samples in the radial direction; and n_φ = 8·K_i, representing the number of samples in the angular direction; [0065] step d: calculating projections of the reference image I_C and the target image I_T in the radial direction and the angular direction according to the scale offset parameter in step c: Θ_j^C = Σ_{i=1..n_r} [η_1^{ij}·I_P^C(i, fl(j/a_z)) + η_2^{ij}·I_P^C(i, ce(j/a_z))] for a_z > 1, and Θ_j^C = 0 for a_z < 1; Θ_j^T = Σ_{i=1..n_r·a_z} [η_1^{ij}·I_P^T(i, fl(j·a_z)) + η_2^{ij}·I_P^T(i, ce(j·a_z))] for a_z < 1, and Θ_j^T = 0 for a_z > 1; performing a normalization calculation on Θ^C and Θ^T to obtain the translation amount d of the highest point, and calculating the rotation offset parameter φ_z according to φ_z = 2πd/n̄_φ, where ce(·) represents the smallest integer which is greater than or equal to the value within the bracket; [0066] step e: putting the rotation offset parameter φ_z and the scale offset parameter a_z into step A to correct the target image, and calculating, as the center point of the target image, the position point P_z^T corresponding to the minimum of E_z, where E_z = Σ_{i=1..n_r} [Θ_z^T(i) − Θ^C(i − d)]², thereby completing image correction.
[0067] In one embodiment, the search matching analysis in step VIII further includes the following steps: [0068] step A: making concentric circles with the center point of the target image I_T as the center to divide the speech feature image into B annular regions, and dividing each annular region into K sectors, where K and B are both predefined constants; [0069] step B: calculating a sector speech feature value V_sqθ of each sector S_sq as Code1: V_sqθ = (1/n_sq)·Σ_{S_sq} |F_sqθ(x, y) − P̄_sqθ|, where F_sqθ(x, y) represents the gray value of each pixel of the sector S_sq; P̄_sqθ represents the average value of the gray values of the pixels in the sector S_sq; n_sq represents the number of pixels in the annular region S_sq; 0 ≤ sq ≤ B×K−1, θ = {0°, (360°/K), 2·(360°/K), 3·(360°/K), ... < 180°}; [0070] step C: rotating the speech feature image by (180°/K), repeating step B, and extracting the sector speech feature value V_sqθ of each sector S_sq as Code2; [0071] step E: rotating Code1 and Code2 by R×(360°/K) (R = 0, 1, 2, ..., K−1) to obtain Code1' and Code2', respectively; and [0072] step F: inputting Code1 and Code2, and Code1' and Code2' from step E, to the historical speech feature map library for matching.
[0073] In this example, the positioning information of the speakers is used as a basis for determining whether the speakers are in the "suspected same group", and a relationship between the speakers that are possibly associated or have a conflict therebetween is then regulated effectively according to the feedback result. Meanwhile, the cloud server can control a plurality of Internet applications. Feature recognition of speech is converted to global recognition of a feature map so that higher recognition efficiency can be achieved. The accuracy and efficiency of control can be improved by performing correction and positioning processing on a feature image.
[0074] While the illustrative specific embodiment of the present invention is described above so that those skilled in the art can understand the present invention, the present invention is not limited to the scope of the specific embodiment. For those skilled in the art, all invention-creations using the concept of the present invention shall fall within the protection scope of the present invention as long as various alterations are made within the spirit and scope of the present invention defined and determined by the appended claims.
Claims (6)
- CLAIMS
- 1. A cloud service platform system for speakers, the speakers each comprising a speech input module, a network connection unit and a player, the cloud service platform system for speakers comprising a cloud server, and a network connection unit for linking the cloud server and the speakers, wherein the speakers each are provided with a positioning detection unit, and the cloud server receives data from the positioning detection unit in real time; and the cloud server performs the following steps of detecting a conflict between the speakers: step 1: receiving positioning data from speakers in a working state; step 2: determining states of the speakers in operation based on the positioning data, and if a positioning distance between two speakers is smaller than a predefined threshold, marking corresponding speakers as "a suspected same group" and sending a "suspected same group acknowledgment message" to the corresponding speakers; and step 3: receiving a feedback result for the "suspected same group acknowledgment message"; when the result is "yes", unifying data of a same group of speakers, transmitting the unified data to any speaker in the same group of speakers and controlling any speaker to transmit data within the same group; and when the result is "NO", transmitting play data successively according to a weight priority and controlling volume.
- 2. The cloud service platform system for the speakers according to claim 1, wherein the weight priority is determined by the cloud server through the following steps: step 1.1: determining networking starting time of the speakers, with an earlier time corresponding to a higher priority; and step 1.2: determining self-check states of the speakers, with a better self-check state corresponding to a higher priority.
- 3. The cloud service platform system for the speakers according to claim 1, wherein the cloud server is also capable of invoking other applications on the Internet according to instructions from the speakers.
- 4. The cloud service platform system for the speakers according to any one of claims 1 to 3, wherein the cloud server further receives speech control signals from the speakers to perform speech recognition, comprising: step I: creating a historical speech feature map library, wherein a historical speech feature map is created by extracting features from statement speech input in advance or recorded as history and drawing a statement speech feature map comprising word, phrase and sentence feature maps; step II: extracting features from statement speech acquired by the speakers in real time, and drawing a target statement speech feature map; selecting and defining any statement speech feature map in the historical speech feature map library as a reference image, and defining the target statement speech feature map as a target image; step III: binarizing the target image, and defining that a value of 1 indicates having a speech feature and that a value of 0 indicates having no speech feature; meshing the binarized feature map into a grid map, defining a first point (x1, y1) of the grid map as an origin, defining a retrieval matching stride as L, and performing retrieval from the origin in an x direction; when a point having the value of 1 is retrieved, recording a position and the value of the point and numbering the point in order; when a point having the value of 0 is retrieved, continuing retrieval matching; step IV: updating point (x1, y1+N*L) as the origin, performing step III again until the retrieval matching in the x direction and a y direction is completed, thereby completing initial positioning retrieval matching, wherein N is an integer, and L is a constant; step V: successively extracting points having the value of 1, updating a currently extracted point having the value of 1 as the origin, updating the retrieval matching stride to L/2, performing the retrieval matching successively in the x direction without performing the retrieval matching on points previously subjected to the retrieval matching, automatically halving the retrieval matching stride when the retrieval matching extends beyond a range of the target image, and continuing the retrieval matching until the retrieval matching stride comes to a minimum; defining a new point having the value of 1 appearing during the retrieval matching as a new point needing to be subjected to the retrieval matching in the y direction, and performing step VI; when no new point having the value of 1 appears, performing step VII; step VI: performing the retrieval matching successively in the y direction while keeping the retrieval matching stride of L/2 unchanged and without performing the retrieval matching on points previously subjected to the retrieval matching, automatically halving the retrieval matching stride when the retrieval matching extends beyond the range of the target image, and continuing the retrieval matching until the retrieval matching stride comes to a minimum; defining a new point having the value of 1 appearing during the retrieval matching as a new point needing to be subjected to the retrieval matching in the x direction, and performing step V; when no new point having the value of 1 appears, performing step VII; step VII: ending the retrieval matching until no new point needs to be subjected to the retrieval matching, and defining a region with the points having the value of 1 obtained by the retrieval matching as an effective target image; step VIII: performing a search matching analysis on the effective target image in the
historical speech feature map library; and step IX: invoking a corresponding strategy according to a recognition result.
- 5. The cloud service platform system for the speakers according to claim 4, wherein step VIII further comprises image correction comprising the following steps: step a: defining the effective target image as I_T, and selecting and defining any reference image in the historical speech feature map library as I_C; step b: defining an association relationship between the reference image I_C and the target image I_T obtained by polar coordinate transformation: I_T(r, φ) = I_C(a_z·r, φ − φ_z), wherein a_z is a scale offset parameter, and φ_z is a rotation offset parameter; step c: calculating, in the radial direction in a polar coordinate system, a projection K_C(i) of the reference image I_C, K_C(i) = Ω_i·Σ_{j=1..n_φ} I_P^C(i, j), and a projection K_T(i) of the target image I_T, K_T(i) = Ω_i·Σ_{j=1..n_φ} I_P^T(i, j), taking logarithms of K_C(i) and K_T(i) to obtain LK_C(i) and LK_T(i), and using a translational difference between LK_C(i) and LK_T(i) as the scale offset parameter a_z, wherein I_P(i, j) = I(K_max + K_i·sin(2πj/n_φ), K_max + K_i·cos(2πj/n_φ)), i = 1, 2, ..., n_r, j = 1, 2, ..., n_φ; Ω_i = n̄_φ/n_φ; and η_1^{ij} = fl(j·Ω_i) − fl((j−1)·Ω_i), η_2^{ij} = 1 − η_1^{ij}; wherein n̄_φ is a number of samples in an angular direction when K_i = K_max; fl(·) represents a maximum integer which is less than or equal to the value within the bracket; the target image has a size of 2K_max × 2K_max; n_r = K_max, representing a number of samples in the radial direction; and n_φ = 8·K_i, representing a number of samples in the angular direction; step d: calculating projections of the reference image I_C and the target image I_T in the radial direction and the angular direction according to the scale offset parameter in step c: Θ_j^C = Σ_{i=1..n_r} [η_1^{ij}·I_P^C(i, fl(j/a_z)) + η_2^{ij}·I_P^C(i, ce(j/a_z))] when a_z > 1, and Θ_j^C = 0 when a_z < 1; Θ_j^T = Σ_{i=1..n_r·a_z} [η_1^{ij}·I_P^T(i, fl(j·a_z)) + η_2^{ij}·I_P^T(i, ce(j·a_z))] when a_z < 1, and Θ_j^T = 0 when a_z > 1; and performing a normalization calculation on Θ^C and Θ^T to obtain a translation amount d of the highest point, and calculating the rotation offset parameter φ_z according to φ_z = 2πd/n̄_φ, wherein ce(·) represents a minimum integer which is greater than or equal to the value within the bracket; step e: putting the rotation offset parameter φ_z and the scale offset parameter a_z to step A to correct the target image, and calculating, as a center point of the target image, a position point P_z^T corresponding to a minimum of E_z, wherein E_z = Σ_{i=1..n_r} [Θ_z^T(i) − Θ^C(i − d)]², thereby completing image correction.
- 6. The cloud service platform system for the speakers according to claim 4, wherein the search matching analysis in step VIII further comprises the following steps: step A: making concentric circles with a center point of the target image I_T as the center to divide the speech feature image into B annular regions, and dividing each annular region into K sectors, K and B both being predefined constants; step B: calculating, as Code1, a sector speech feature value V_sqθ of each sector S_sq: V_sqθ = (1/n_sq)·Σ_{S_sq} |F_sqθ(x, y) − P̄_sqθ|; wherein F_sqθ(x, y) represents a gray value of each pixel of the sector S_sq; P̄_sqθ represents an average value of gray values of pixels in the sector S_sq; n_sq represents a number of pixels in the annular region S_sq; 0 ≤ sq ≤ B×K−1, θ = {0°, (360°/K), 2·(360°/K), 3·(360°/K), ... < 180°}; step C: rotating the speech feature image by (180°/K), repeating step B, and extracting a sector speech feature value V_sqθ of each sector S_sq as Code2; step E: rotating Code1 and Code2 by R×(360°/K) (R = 0, 1, 2, ..., K−1) to obtain Code1' and Code2', respectively; and step F: inputting Code1 and Code2, and Code1' and Code2' in step E to the historical speech feature map library for matching.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210095507.4A CN114550715A (en) | 2022-01-26 | 2022-01-26 | Sound box cloud service platform system |
Publications (2)
Publication Number | Publication Date |
---|---|
GB202300697D0 GB202300697D0 (en) | 2023-03-01 |
GB2616512A true GB2616512A (en) | 2023-09-13 |
Family
ID=81672941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB2300697.6A Pending GB2616512A (en) | 2022-01-26 | 2023-01-17 | Cloud service platform system for speakers |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114550715A (en) |
GB (1) | GB2616512A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9898250B1 (en) * | 2016-02-12 | 2018-02-20 | Amazon Technologies, Inc. | Controlling distributed audio outputs to enable voice output |
EP3541094A1 (en) * | 2018-03-15 | 2019-09-18 | Harman International Industries, Incorporated | Smart speakers with cloud equalizer |
US20210029452A1 (en) * | 2019-07-22 | 2021-01-28 | Apple Inc. | Modifying and Transferring Audio Between Devices |
Also Published As
Publication number | Publication date |
---|---|
GB202300697D0 (en) | 2023-03-01 |
CN114550715A (en) | 2022-05-27 |