CN102546542A

CN102546542A - Electronic system and embedded device and transit device of electronic system

Info

Publication number: CN102546542A
Application number: CN2010105967785A
Authority: CN
Inventors: 卢廉瑾; 冯锐; 郭峰
Original assignee: Fujian Star Net eVideo Information Systems Co Ltd
Current assignee: Fujian Star Net eVideo Information Systems Co Ltd
Priority date: 2010-12-20
Filing date: 2010-12-20
Publication date: 2012-07-04
Anticipated expiration: 2030-12-20
Also published as: CN102546542B

Abstract

The invention discloses an electronic system, and an embedded device and a transit device for the electronic system. The system comprises a voice collection device, an embedded client, a transit device and a server. The voice collection device is connected to the embedded client, and the transit device is connected between the embedded client and the server. The embedded client controls the voice collection device to collect voice and obtain voice data. The transit device sends the voice data to the server for speech recognition and feeds the recognition result obtained by the server back to the embedded client. The electronic system, embedded device and transit device make it easy to apply speech recognition technology to embedded devices and place low demands on the embedded device. They also shield each side from changes in the embedded device and in the speech recognition server, so that a general-purpose speech recognition engine can be used in different embedded application systems.

Description

Electronic system, embedded device thereof, and transit device

Technical field

The present invention relates to the field of electronic technology, and in particular to embedded entertainment products.

Background technology

Pattern recognition technology is the science and technology of simulating the recognition functions (including vision, hearing, touch and judgment) applied to objects, processes and phenomena in a particular external environment. In recent years it has produced systematic research results and developed rapidly in the field of computer intelligence.

Speech recognition is a typical application of pattern recognition and is gradually becoming a key technology for the human-computer interface (HCI) in information technology. As an emerging high-tech industry, speech recognition already offers many relatively mature recognition engines; the Zhongke Xinli speech platform of the Institute of Acoustics, Chinese Academy of Sciences, is one of them.

Speech recognition technology enables direct, natural communication between the user and the computer. Applying it to entertainment products would therefore greatly improve the user experience and could give rise to many new entertainment applications.

First, current pattern recognition technology consumes substantial hardware and software resources. Its complex floating-point operations require a high-performance processor and a large memory, and speech recognition needs a large sample database, which occupies considerable storage space. The technology therefore places very demanding requirements on the operating platform, which digital entertainment equipment built mainly on embedded devices cannot meet. How to break through this hardware and software bottleneck and apply the technology on embedded devices is a problem facing embedded software developers.

Second, current recognition engines are all developed for x86 hardware and the Windows platform, whereas the architectures and operating systems of embedded product platforms vary widely, and a single speech recognition engine cannot be adapted to every hardware platform. How to let a speech recognition engine serve a variety of embedded devices while keeping the two relatively independent, so that neither is affected by changes in the other, is another problem facing embedded software developers.

For applications of speech recognition technology, see also Chinese invention patent application No. 00109844.6, published on October 3, 2001 and entitled "Client-server speech information transfer system and method". That system comprises at least one server station and a client station. The client station comprises a device for receiving a speech input signal from the user and a device for sending a signal representing the received speech to the server station over the public Internet. The server station comprises a device for receiving the speech-equivalent signal from the public Internet and a large/huge-vocabulary speech recognizer for recognizing the received speech-equivalent signal. The client station further comprises a local speech recognizer and a speech controller; the speech controller routes at least part of the speech input signal to the local speech recognizer and, depending on the recognition result, selectively routes part of the speech input signal to the server station over the public Internet.

Summary of the invention

The main technical problem solved by the present invention is to provide an electronic system, an embedded device and a transit device that make it easy to apply speech recognition technology to embedded devices, place low demands on the embedded device, shield the embedded device and the speech recognition server from each other's changes, and allow a general-purpose speech recognition engine to be used in different embedded application systems.

To solve the above technical problem, the present invention adopts the following technical solution. An electronic system is provided, comprising a voice capture device, an embedded client, a transit device and a server. The voice capture device is connected to the embedded client, and the transit device is connected between the embedded client and the server. The embedded client controls the voice capture device to collect voice and obtain voice data; the transit device sends the voice data to the server for speech recognition and feeds the recognition result obtained by the server back to the embedded client.

The transit device and the embedded client are connected via TCP/IP over a local area network (LAN), and the transit device and the server are likewise connected via TCP/IP over a LAN.

The transit device has an independent host and comprises a network card connecting the independent host to the embedded client.

The independent host comprises: a speech recognition engine interface and initialized speech recognition resources; a connection unit, configured to receive a TCP connection request from the embedded client through the network card and to establish, through the network card, a TCP/IP connection between the transit device and the embedded client; a control packet receiving unit, configured to receive, after the TCP/IP connection is established, a UDP control packet from the embedded client through the network card, the UDP control packet containing the sample rate, the number of channels and the speech coding format and thereby requesting the start of speech recognition; an initialization unit, configured to call the speech recognition engine interface upon receiving the request to start speech recognition, initialize the speech recognition resources and, once initialization succeeds, reply to the embedded client through the network card with a notification corresponding to the UDP control packet; a data receiving unit, configured to receive voice data from the embedded client through the network card after the notification has been sent; a data transmission unit, configured to call the speech recognition engine interface to send the voice data to the server; and a result return unit, configured to forward the recognition result from the server to the embedded client over UDP.

The independent host further comprises a format conversion unit, configured to perform sample rate conversion on the voice data after the data receiving unit receives it and before the data transmission unit sends it, converting it into a voice data format the server can recognize and passing it to the data transmission unit for sending.

The present invention also provides an embedded device comprising an embedded client. The embedded client has a first interface for connecting an external voice capture device and a second interface for connecting an external transit device. The embedded client receives, through the first interface, the voice data collected by the voice capture device, sends the voice data to the external transit device through the second interface, and receives the recognition result of the voice data through the second interface.

The interface of the embedded client connects to the external transit device via TCP/IP over a LAN.

The present invention further provides a transit device for use with an embedded device, comprising an independent host and a network card connecting the independent host to the embedded client. The independent host receives voice data from the embedded client through the network card, sends the voice data to an external server for speech recognition, and feeds the recognition result obtained by the server back to the embedded client.

The independent host comprises: a speech recognition engine interface and initialized speech recognition resources; a connection unit, configured to receive a TCP connection request from the embedded client through the network card and to establish, through the network card, a TCP/IP connection between the independent host and the embedded client; a control packet receiving unit, configured to receive, after the TCP/IP connection is established, a UDP control packet from the embedded client through the network card, the UDP control packet containing the sample rate, the number of channels and the speech coding format and thereby requesting the start of speech recognition; an initialization unit, configured to call the speech recognition engine interface upon receiving the request to start speech recognition, initialize the speech recognition resources and, once initialization succeeds, reply to the embedded client through the network card with a notification corresponding to the UDP control packet; a data receiving unit, configured to receive voice data from the embedded client through the network card after the notification has been sent; a data transmission unit, configured to call the speech recognition engine interface to send the voice data to the server; and a result return unit, configured to forward the recognition result from the server to the embedded client over UDP.

The independent host further comprises a format conversion unit, configured to perform sample rate conversion on the voice data after the data receiving unit receives it and before the data transmission unit sends it, converting it into a voice data format the server can recognize and passing it to the data transmission unit for sending.

The beneficial effects of the present invention are as follows. Unlike prior-art electronic systems, in which speech recognition technology is difficult to apply, the present invention physically separates the collection of voice data from its processing and recognition into three subsystems. The embedded device, which has few resources and limited capability, is responsible only for collecting and sending data and receiving results. The transit device sends the voice data to the server running the speech recognition engine and, after receiving the recognition result, returns it to the embedded device. Because the recognition engine is installed on the server, abundant hardware and software resources are available, and any shortage of resources can be resolved by adding servers. The speech recognition engine can be provided by a third party. Moreover, the subsystems only need to observe an agreed protocol to connect to one another, which reduces the coupling between them: the transit device shields the embedded front end and the speech recognition back end from each other's changes, so that a general-purpose speech recognition engine can be used in different embedded application systems. With this scheme, the embedded device bypasses its resource bottleneck and escapes the limitation of a fixed operating platform, so that speech recognition technology, originally expensive, can practically be applied to entertainment equipment on embedded platforms.

Description of drawings

Fig. 1 is a functional block diagram of Embodiment 1 of the electronic system of the present invention;

Fig. 2 is a functional block diagram of Embodiment 2 of the electronic system of the present invention;

Fig. 3 is a functional block diagram of Embodiment 3 of the electronic system of the present invention;

Fig. 4 is a schematic diagram of the structure of the control information communication packet in the present invention;

Fig. 5 is a schematic diagram of the structure of the data information communication packet in the present invention.

Embodiment

Referring to Fig. 1, an embodiment of the electronic system of the present invention comprises:

A voice capture device (not shown), an embedded client, a transit device and a server;

The voice capture device is connected to the embedded client, and the transit device is connected between the embedded client and the server. In the figure, the voice capture device and the embedded client are shown together as the embedded device;

The embedded client controls the voice capture device to collect voice and obtain voice data; the transit device sends the voice data to the server for speech recognition and feeds the recognition result obtained by the server back to the embedded client.

The present invention physically separates the collection of voice data from its processing and recognition into three subsystems. The embedded device, which has few resources and limited capability, is responsible only for collecting and sending data and receiving results. The transit device sends the voice data to the server running the speech recognition engine and, after receiving the recognition result, returns it to the embedded device. Because the recognition engine is installed on the server, abundant hardware and software resources are available, and any shortage of resources can be resolved by adding servers. The speech recognition engine can be provided by a third party;

Moreover, the subsystems only need to observe an agreed protocol to connect to one another, which reduces the coupling between them: the transit device shields the embedded front end and the speech recognition back end from each other's changes, so that a general-purpose speech recognition engine can be used in different embedded application systems;

With this scheme, the embedded device bypasses its resource bottleneck and escapes the limitation of a fixed operating platform, so that speech recognition technology, originally expensive, can practically be applied to entertainment equipment on embedded platforms.

In another embodiment, the transit device and the embedded client are connected via TCP/IP over a LAN, and the transit device and the server are connected via TCP/IP over a LAN. Of course, the transit device and the embedded client need not be connected over a LAN, nor via TCP/IP; any connection method is permissible, such as a wireless connection.

In another embodiment, the transit device has an independent host and comprises a network card connecting the independent host to the embedded client. Of course, an independent host is not required; for example, the transit device may share a system with other equipment.

Referring to Fig. 2, in another embodiment, the independent host comprises:

A speech recognition engine interface and initialized speech recognition resources;

A connection unit, configured to receive a TCP connection request from the embedded client through the network card and to establish, through the network card, a TCP/IP connection between the transit device and the embedded client;

A control packet receiving unit, configured to receive, after the TCP/IP connection is established, a UDP control packet from the embedded client through the network card; the UDP control packet contains the sample rate, the number of channels and the speech coding format, and thereby requests the start of speech recognition;

An initialization unit, configured to call the speech recognition engine interface upon receiving the request to start speech recognition, initialize the speech recognition resources and, once initialization succeeds, reply to the embedded client through the network card with a notification corresponding to the UDP control packet;

A data receiving unit, configured to receive voice data from the embedded client through the network card after the notification has been sent to the embedded client;

A data transmission unit, configured to call the speech recognition engine interface to send the voice data to the server;

A result return unit, configured to forward the recognition result from the server to the embedded client over UDP.

Referring to Fig. 3, another embodiment further comprises a format conversion unit, configured to perform sample rate conversion on the voice data after the data receiving unit receives it and before the data transmission unit sends it, converting it into a voice data format the server can recognize and passing it to the data transmission unit for sending. This data format conversion further improves the isolation between the subsystems.

Referring again to Fig. 1, the present invention also provides an embodiment of an embedded device, comprising:

An embedded client;

The embedded client has a first interface (not shown) for connecting an external voice capture device, such as a microphone, and a second interface (not shown), such as a network card, for connecting an external transit device;

The embedded client receives, through the first interface, the voice data collected by the voice capture device, sends the voice data to the external transit device through the second interface, and receives the recognition result of the voice data through the second interface.

The above embodiment makes it easy to apply speech recognition technology to embedded devices, places low demands on the embedded device, and shields the embedded device and the speech recognition server from each other's changes, so that a general-purpose speech recognition engine can be used in different embedded application systems.

In another embodiment, the interface of the embedded client connects to the external transit device via TCP/IP over a LAN. Other connection methods, such as a metropolitan area network or the Internet, may of course also be used.

Referring again to Fig. 1, the present invention further provides an embodiment of a transit device for use with an embedded device, comprising:

An independent host (not shown) and a network card (not shown) connecting the independent host to the embedded client;

The independent host receives voice data from the embedded client through the network card, sends the voice data to an external server for speech recognition, and feeds the recognition result obtained by the server back to the embedded client.

While enabling speech recognition technology to be applied to embedded devices, the above transit device shields the embedded device and the speech recognition server from each other's changes, so that a general-purpose speech recognition engine can be used in different embedded application systems.

Placing a transit device, such as a transfer gateway, between the embedded device and the specific recognition engine to handle interfacing and scheduling makes the function of the embedded device independent of the specific recognition engine (server) in use, so that the engine can easily be replaced.
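The decoupling described above is essentially the adapter pattern. The following Python sketch is illustrative only: `RecognitionEngine`, `VendorEngineAdapter` and `relay` are hypothetical names, and the vendor call signature is invented and does not correspond to any real engine SDK. Replacing the engine then means writing a new adapter, while the code facing the embedded device stays unchanged:

```python
from typing import Callable, Dict, Protocol


class RecognitionEngine(Protocol):
    """The only contract the transit device depends on (hypothetical)."""

    def recognize(self, pcm: bytes) -> str: ...


class VendorEngineAdapter:
    """Wraps a hypothetical third-party call whose signature differs."""

    def __init__(self, vendor_call: Callable[..., Dict[str, str]]) -> None:
        self._vendor_call = vendor_call

    def recognize(self, pcm: bytes) -> str:
        # The keyword arguments and the "text" key are invented for illustration.
        reply = self._vendor_call(audio=pcm, encoding="pcm16")
        return reply["text"]


def relay(engine: RecognitionEngine, pcm: bytes) -> str:
    """Transit-device side: never touches vendor-specific details."""
    return engine.recognize(pcm)
```

For example, `relay(VendorEngineAdapter(some_vendor_call), audio)` works for any engine once an adapter exists for it.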

Referring again to Fig. 2, in another embodiment, the independent host further comprises:

A connection unit, configured to receive a TCP connection request from the embedded client through the network card and to establish, through the network card, a TCP/IP connection between the independent host and the embedded client;

Referring again to Fig. 3, the independent host may further comprise:

A format conversion unit, configured to perform sample rate conversion on the voice data after the data receiving unit receives it and before the data transmission unit sends it, converting it into a voice data format the server can recognize and passing it to the data transmission unit for sending.

The specific operation of each subsystem may be as follows. For the embedded device side:

1) The embedded device establishes a session with the transit device according to the agreed protocol;

2) The embedded device captures audio data from the microphone;

Each session consists of one control connection implemented over UDP and one raw-data connection implemented over TCP. The TCP data connection ensures reliable data transmission, and because the data packets carry only raw data, packet sticking (several sends arriving coalesced in one read) does not affect data accuracy. The UDP control connection reduces the number of connections and lightens the load on the transit device under concurrent use; moreover, testing showed UDP communication within a LAN to be essentially reliable and stable;
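Because the audio payload is raw PCM with no framing, TCP read boundaries carry no meaning; a receiver still has to avoid splitting an audio frame across reads. The following is a minimal Python sketch, assuming 16-bit PCM; the `PcmStreamAssembler` class is hypothetical and not part of the source. It accepts arbitrarily split or coalesced TCP reads and emits only whole frames:

```python
class PcmStreamAssembler:
    """Turn arbitrary TCP reads of raw PCM into whole audio frames.

    Hypothetical helper: assumes 16-bit samples, so one frame is
    2 bytes per channel.
    """

    def __init__(self, channels: int = 1, bytes_per_sample: int = 2) -> None:
        self.frame_size = channels * bytes_per_sample
        self._pending = bytearray()  # tail bytes of an incomplete frame

    def feed(self, chunk: bytes) -> bytes:
        """Append one TCP read and return every complete frame received so far."""
        self._pending.extend(chunk)
        usable = len(self._pending) - len(self._pending) % self.frame_size
        frames = bytes(self._pending[:usable])
        del self._pending[:usable]
        return frames

    @property
    def pending(self) -> int:
        """Number of buffered bytes still waiting for the rest of their frame."""
        return len(self._pending)
```

Concatenating the outputs of `feed` reproduces the sender's byte stream regardless of how the network split or merged it, which is why the unframed design tolerates packet sticking.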

For the transit device side:

1) The transit device receives connection requests from embedded devices, and manages and forwards all sessions;

2) After receiving the voice data sent by the embedded device, it performs a single resampling pass, converting the PCM data into data that meets the requirements of the speech recognition engine (server);

3) It sends the resampled data to the recognition server cluster through the recognition interface and collects the recognition results they return;

4) It returns the recognition result to the embedded device;
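Step 2) does not specify the resampling method. One simple possibility is linear interpolation; the Python sketch below assumes mono, 16-bit little-endian PCM, and `resample_pcm16` is a hypothetical name. A production resampler would normally low-pass filter before downsampling to avoid aliasing, which this sketch omits:

```python
import struct


def resample_pcm16(data: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample mono 16-bit little-endian PCM by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    n_in = len(data) // 2  # a trailing odd byte is ignored
    if n_in == 0 or src_rate == dst_rate:
        return data[: n_in * 2]
    samples = struct.unpack(f"<{n_in}h", data[: n_in * 2])
    n_out = n_in * dst_rate // src_rate
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[j + 1] if j + 1 < n_in else a  # hold the last sample
        value = round(a + (b - a) * frac)
        out.append(max(-32768, min(32767, value)))  # stay in int16 range
    return struct.pack(f"<{len(out)}h", *out)
```

For example, halving the rate of the samples 0, 100, 200, 300 keeps every second sample (0, 200), while doubling the rate of 0, 100 yields 0, 50, 100, 100.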

For the recognition server cluster and the recognition engines:

1) All of the truly resource-intensive work is done by the recognition engines in the recognition server cluster, and the cluster is transparent to the embedded device;

2) When the recognition technology is upgraded or the server capacity becomes insufficient, only the server cluster needs to be maintained; no changes to the front end are involved.
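The text states only that the cluster is transparent to the embedded device and that capacity can be increased by adding servers; it does not say how the transit device spreads requests across the cluster. Assuming round-robin, the simplest policy, a Python sketch could look like this; `RecognitionServerPool` and the `Recognizer` callable type are hypothetical stand-ins for the real engine interface:

```python
from typing import Callable, List

# Hypothetical signature for one recognition server: audio bytes in, text out.
Recognizer = Callable[[bytes], str]


class RecognitionServerPool:
    """Round-robin dispatch over a growable cluster of recognition servers."""

    def __init__(self) -> None:
        self._servers: List[Recognizer] = []
        self._turn = 0  # number of requests dispatched so far

    def add_server(self, server: Recognizer) -> None:
        """Scale out: new requests start reaching this server immediately."""
        self._servers.append(server)

    def recognize(self, audio: bytes) -> str:
        if not self._servers:
            raise RuntimeError("no recognition server available")
        server = self._servers[self._turn % len(self._servers)]
        self._turn += 1
        return server(audio)
```

Since embedded devices only ever talk to the transit device, calling `add_server` changes nothing at the front end.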

Regarding the transit device:

One, Physical connection

1) The transit device (transfer gateway) and the embedded clients are in the same LAN. Physically, the transit device can be an independent host connected through a network card to multiple embedded clients in the LAN;

2) The other end of the transit device is connected in some manner to the recognition server cluster. The manner depends on the design of the speech recognition engine adopted and is independent of the present application. This is precisely one of the purposes of the transit device: to shield the dependence between the embedded device and the specific third-party speech recognition engine adopted, so that changes on either side do not affect the other;

3) The transit device and the embedded clients communicate via TCP/IP over the LAN, which ensures a sufficient transmission rate. The communication protocol is a simple custom protocol suited to LAN characteristics that combines UDP control packets with TCP data connections. In one embodiment, the protocol details may be as follows:

One, communication interface definition:

1. System communication mode:

(1) Control information communication packets: UDP;

(2) Data information communication packets: TCP;

2. Packet size: at most 4096 bytes;

3. Port numbers:

(1) UDP control information communication ports:

Transmit port: 10010;

Receiving port: 10011;

(2) TCP data information communication ports (each corresponding to one input audio stream):

Channel 1 port: 10020;

Channel 2 port: 10022;

4. Applicable system: the system in this application that uses the Zhongke Xinli speech platform;
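The interface definition above can be captured as a handful of constants. The following Python sketch is illustrative only: the helper names are hypothetical, and because the text does not say from which side's perspective the UDP ports are "transmit" and "receiving", those labels are carried over verbatim. Applying the 4096-byte limit to the size of each TCP send is likewise an assumption:

```python
from typing import Dict, List

MAX_PACKET_SIZE = 4096         # bytes, from the interface definition
UDP_CONTROL_SEND_PORT = 10010  # control "transmit port"
UDP_CONTROL_RECV_PORT = 10011  # control "receiving port"
TCP_AUDIO_PORTS: Dict[int, int] = {10020: 1, 10022: 2}  # data port -> channel


def channel_for_port(port: int) -> int:
    """Map a TCP data port to the input audio stream it carries."""
    try:
        return TCP_AUDIO_PORTS[port]
    except KeyError:
        raise ValueError(f"port {port} is not a TCP audio data port") from None


def split_payload(data: bytes, limit: int = MAX_PACKET_SIZE) -> List[bytes]:
    """Cut a payload into pieces no larger than the packet size limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [data[i:i + limit] for i in range(0, len(data), limit)]
```

For example, a 9000-byte buffer is sent as pieces of 4096, 4096 and 808 bytes.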

Two, Brief description of the communication modes:

1. Control information communication:

(1) Control information refers to requests to start voice transmission, requests to end voice transmission, result feedback, back-end status notifications (start or end of speech detected), forced termination by the back end, and so on;

(2) Control information is communicated over UDP, which keeps the boundaries of each packet independent;

2. Data information communication:

(1) Data information refers to the captured audio data stream;

(2) Data information is communicated over TCP. The transmitted data is raw data with no encapsulating structure, so reliable transmission is ensured while data sticking has no adverse effect;

(3) Each audio data stream corresponds to one TCP connection;
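The split between the two channels can be summarized in code. In this Python sketch the message names paraphrase the control information listed in item 1 (1). Their numeric values belong to the main and sub-message number definitions, which are not reproduced in this text, so `auto()` placeholders are used. The sender direction in `FROM_CLIENT` is an inference rather than something the text states:

```python
from enum import Enum, auto


class ControlMessage(Enum):
    """Control information kinds listed in the text; values are placeholders."""
    START_TRANSMISSION = auto()     # client applies to start voice transmission
    END_TRANSMISSION = auto()       # client applies to end voice transmission
    RESULT = auto()                 # recognition result feedback
    SPEECH_START_DETECTED = auto()  # back-end status: start of speech detected
    SPEECH_END_DETECTED = auto()    # back-end status: end of speech detected
    FORCED_TERMINATION = auto()     # back end forcibly terminates the session


# Assumed direction: only the two "apply for" requests originate at the client.
FROM_CLIENT = frozenset({ControlMessage.START_TRANSMISSION,
                         ControlMessage.END_TRANSMISSION})


def transport_for(payload: object) -> str:
    """Control messages travel over UDP; raw audio bytes travel over TCP."""
    if isinstance(payload, ControlMessage):
        return "udp"
    if isinstance(payload, (bytes, bytearray)):
        return "tcp"
    raise TypeError(f"unsupported payload type: {type(payload).__name__}")
```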

Three, Protocol format diagrams:

Voice platform communication uses the following two protocol formats:

1. Control information communication packet, see Fig. 4;

2. Data information communication packet, see Fig. 5.

Four, Brief description of the protocol formats:

(1) Control information communication packet:

The protocol consists of a header and a body. The header has a fixed length of 32 bytes, and the length of the body is given in the "body length" field of the header.
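Only a few facts about the header survive in the text: it is a fixed 32 bytes, it carries a "body length" field, and main and sub-message numbers are defined for it. The Python sketch below therefore uses a hypothetical layout (little-endian; 2-byte main message number, 2-byte sub-message number, 4-byte session id, 4-byte body length, 20 reserved bytes). The session id field and the byte order are assumptions; the real field order and widths are given in Fig. 4:

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: main msg no. (2 B), sub msg no. (2 B), session id (4 B),
# body length (4 B), 20 reserved bytes -> 32 bytes in total.
HEADER_FORMAT = "<HHII20x"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
MAX_PACKET_SIZE = 4096  # packet size limit from the interface definition


@dataclass
class ControlPacket:
    main_msg: int
    sub_msg: int
    session_id: int
    body: bytes = b""

    def encode(self) -> bytes:
        """Serialize header + body, writing the body length into the header."""
        header = struct.pack(HEADER_FORMAT, self.main_msg, self.sub_msg,
                             self.session_id, len(self.body))
        packet = header + self.body
        if len(packet) > MAX_PACKET_SIZE:
            raise ValueError("control packet exceeds 4096 bytes")
        return packet

    @classmethod
    def decode(cls, packet: bytes) -> "ControlPacket":
        """Parse one UDP datagram; the body length comes from the header."""
        if len(packet) < HEADER_SIZE:
            raise ValueError("datagram shorter than the 32-byte header")
        main_msg, sub_msg, session_id, body_len = struct.unpack_from(
            HEADER_FORMAT, packet)
        body = packet[HEADER_SIZE:HEADER_SIZE + body_len]
        if len(body) != body_len:
            raise ValueError("body shorter than the declared body length")
        return cls(main_msg, sub_msg, session_id, body)
```

Because each UDP datagram keeps its own boundary, `decode` can treat one received datagram as exactly one packet.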

0. Communication mode: UDP

1. Brief description of each header field:

2. Definitions of main message numbers and sub-message numbers:

3. Message body definition:

(2) Data information communication packet:

0. Communication mode: TCP

1. Message body definition:

Although the connection between the transit device and the third-party recognition server cluster depends on the design of the speech recognition engine, it must meet certain requirements, such as convenient connection and sufficient data communication speed; therefore, a LAN connection is usually adopted here as well.

Two, Purpose of setting up the transit device

1) To separate the embedded client and the third-party speech recognition engine both physically and logically, without coupling them directly, so that changes at one end, such as changes in data format, recognition interface or data communication protocol, affect the other end as little as possible;

2) On the one hand, the embedded client only needs to capture voice and observe a few communication protocols with the transit device to use the speech recognition function, with no further requirements placed on the embedded device; this further lowers the threshold for embedded devices to use speech recognition services;

3) On the other hand, different third-party recognition engines have different requirements for the parameters of the input data, such as coding type, sampling rate and number of channels, which the embedded client cannot necessarily satisfy. Another function of the transit device is therefore to perform a small amount of adaptation processing on the raw data, referred to as resampling, converting the raw data sent by the embedded device into recognition data that meets the specifications required by the speech recognition engine;
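The adaptation step can be seen as comparing the client's audio format with what the engine expects and applying only the conversions that differ. This Python sketch is illustrative: `AudioFormat`, `plan_adaptation` and `stereo_to_mono_pcm16` are hypothetical names, and channel downmixing by averaging is shown as one concrete conversion, whereas the text itself only names resampling:

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AudioFormat:
    codec: str        # e.g. "pcm16"
    sample_rate: int  # Hz
    channels: int


def plan_adaptation(client: AudioFormat, engine: AudioFormat) -> List[str]:
    """List the conversions needed to turn client audio into engine input."""
    steps = []
    if client.codec != engine.codec:
        steps.append(f"transcode {client.codec}->{engine.codec}")
    if client.channels != engine.channels:
        steps.append(f"remix {client.channels}->{engine.channels} channels")
    if client.sample_rate != engine.sample_rate:
        steps.append(f"resample {client.sample_rate}->{engine.sample_rate} Hz")
    return steps


def stereo_to_mono_pcm16(data: bytes) -> bytes:
    """Average left/right of interleaved 16-bit little-endian stereo PCM."""
    frames = len(data) // 4  # 2 channels x 2 bytes; a partial frame is dropped
    samples = struct.unpack(f"<{2 * frames}h", data[: 4 * frames])
    mono = [(samples[2 * i] + samples[2 * i + 1]) // 2 for i in range(frames)]
    return struct.pack(f"<{frames}h", *mono)
```

Remixing is planned before resampling so that the more expensive resampling step processes one channel instead of two.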

Three, Working mechanism

The transfer gateway and the embedded device communicate over a combination of a control flow using UDP and an audio data stream using TCP. The connection between the transfer gateway and the third-party recognition engine follows the interface requirements of the recognition engine.

A typical communication process is as follows:

1) When the embedded device wants to perform speech recognition, it first sends a TCP connection request to the transit device;

2) After the TCP connection request is accepted and the connection is established, the embedded device sends a UDP control packet to the transit device containing information such as the sample rate, number of channels and speech coding format, thereby requesting the start of voice data transmission;

3) On receiving the "start speech recognition" request from the set-top box (STB), the transit device calls the speech recognition engine interface and initializes the speech recognizer resources. Once initialization succeeds, it replies with the corresponding UDP control packet, notifying the embedded device that it may transmit audio data;

4) The embedded device, such as the STB, starts transmitting audio data to the transit device over the TCP connection;

5) The transit device receives the voice data and performs sample rate conversion on it, converting it into a voice data format the speech recognition engine can recognize;

6) After the conversion, the transit device calls the interface provided by the recognition engine, sends the converted voice data to the engine, and waits for the engine to return the recognition result;

7) The result is forwarded to the embedded device over UDP, completing one speech recognition.
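The seven steps above form a small per-session state machine on the transit device. The Python sketch below models it without real sockets. `TransitSession`, the reply strings and the three injected callables (`engine_init`, `engine_recognize`, `send_udp`) are hypothetical stand-ins for the engine interface and the UDP reply path. The end-of-audio trigger uses the "end voice transmission" control message listed earlier. For simplicity the sketch buffers the audio and converts it once at the end, whereas a real gateway might convert and forward it while streaming:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable


class State(Enum):
    CONNECTED = auto()  # step 1: TCP connection accepted
    STREAMING = auto()  # step 3: engine initialized, client told to send audio
    DONE = auto()       # step 7: result returned over UDP


@dataclass
class TransitSession:
    """One client session on the transit device (hypothetical interfaces)."""
    engine_init: Callable[[int, int, str], bool]  # (rate, channels, codec) -> ok
    engine_recognize: Callable[[bytes], str]      # converted audio -> result text
    send_udp: Callable[[str], None]               # UDP reply to the embedded client
    convert: Callable[[bytes], bytes] = lambda pcm: pcm  # step 5 hook
    state: State = field(default=State.CONNECTED, init=False)
    _audio: bytearray = field(default_factory=bytearray, init=False, repr=False)

    def on_start_request(self, sample_rate: int, channels: int, codec: str) -> None:
        """Steps 2-3: UDP start request arrives; initialize the engine and reply."""
        if self.state is not State.CONNECTED:
            raise RuntimeError("start request outside the CONNECTED state")
        if not self.engine_init(sample_rate, channels, codec):
            self.send_udp("init-failed")
            return
        self.state = State.STREAMING
        self.send_udp("ready")

    def on_audio(self, chunk: bytes) -> None:
        """Step 4: raw audio arrives on the TCP connection."""
        if self.state is not State.STREAMING:
            raise RuntimeError("audio received before the engine was ready")
        self._audio.extend(chunk)

    def on_end_request(self) -> None:
        """Steps 5-7: convert, recognize, and return the result over UDP."""
        if self.state is not State.STREAMING:
            raise RuntimeError("end request outside the STREAMING state")
        result = self.engine_recognize(self.convert(bytes(self._audio)))
        self.send_udp(f"result:{result}")
        self.state = State.DONE
```

Injecting the engine and the UDP sender as callables keeps the session logic independent of any particular engine, which mirrors the isolation the transit device is meant to provide.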

The system and devices of the present invention can be applied to the entertainment field, the science and education field, conference scenarios, and so on.

The above are merely embodiments of the present invention and do not thereby limit the scope of its claims. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present invention, and any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims

1. An electronic system, characterized by comprising:

A voice capture device, an embedded client, a transit device and a server;

The voice capture device is connected to the embedded client, and the transit device is connected between the embedded client and the server;

2. The electronic system according to claim 1, characterized in that:

The transit device and the embedded client are connected via TCP/IP over a LAN, and the transit device and the server are connected via TCP/IP over a LAN.

3. The electronic system according to claim 2, characterized in that:

The transit device has an independent host and comprises a network card connecting the independent host to the embedded client.

4. The electronic system according to claim 3, characterized in that:

The independent host comprises:

5. The electronic system according to claim 4, characterized by further comprising:

6. An embedded device, characterized by comprising:

An embedded client;

The embedded client has a first interface for connecting an external voice capture device and a second interface for connecting an external transit device;

7. The device according to claim 6, characterized in that:

The interface of the embedded client connects to the external transit device via TCP/IP over a LAN.

8. A transit device for use with an embedded device, characterized by comprising:

An independent host and a network card connecting the independent host to the embedded client;

9. The device according to claim 8, characterized in that the independent host comprises:

10. The device according to claim 9, characterized by further comprising: