CN115798479A

CN115798479A - Method and device for determining session information, electronic equipment and storage medium

Info

Publication number: CN115798479A
Application number: CN202211485243.XA
Authority: CN
Inventors: 崔晓亮; 刘已杨
Original assignee: Jingdong Technology Information Technology Co Ltd
Current assignee: Jingdong Technology Information Technology Co Ltd
Priority date: 2022-11-24
Filing date: 2022-11-24
Publication date: 2023-03-14

Abstract

The invention discloses a method and a device for determining session information, an electronic device, and a storage medium. The method comprises: when it is detected that voice information is collected based on a fixed-line device, acquiring the field contents of at least two user fields in an encapsulation protocol; determining a user identifier according to the field contents; and determining a session text corresponding to the voice information and generating a session record based on the user identifier and the session text. The technical solution of the embodiment of the invention solves the problem in the prior art that, when communication uses a fixed-line device that does not support the Media Resource Control Protocol or that uses an encrypted Session Initiation Protocol (SIP), the speaking user cannot be determined and the user's voice information cannot be converted into a session record, so that subsequent services cannot be provided and the user experience is poor. It achieves the technical effect that the speaking user can be determined based on the encapsulation protocol and the corresponding session record generated, which facilitates the provision of subsequent services.

Description

Method and device for determining session information, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for determining session information, an electronic device, and a storage medium.

Background

The Media Resource Control Protocol (MRCP) is a communication protocol through which a voice server provides voice services, such as voice recognition and voice synthesis, to a client.

At present, when a customer calls for a consultation, the operator answering the call mostly uses a fixed-line device, and the call is mostly transcribed either by converting the call stream to text through the Media Resource Control Protocol (MRCP) or by port mirroring through the Session Initiation Protocol (SIP).

In implementing the present invention, the inventors found the following problems with the above approaches:

Old fixed-line devices do not support the Media Resource Control Protocol, and their SIP traffic is also encrypted, so SIP cannot be parsed by port mirroring. In other words, the real-time voice stream cannot be transcribed in either of the above ways, subsequent processing based on the transcribed content cannot be performed, and the user experience is greatly reduced.

Disclosure of Invention

The invention provides a method, a device, electronic equipment and a storage medium for determining session information, which can achieve the technical effects of effectively distinguishing speaking users and converting voice information into corresponding session records.

According to an aspect of the present invention, there is provided a method of determining session information, the method including:

when detecting that voice information is collected based on fixed line equipment, acquiring field contents of at least two user fields in an encapsulation protocol;

determining a user identifier according to the field content;

determining a session text corresponding to the voice information, and generating a session record based on the user identification and the session text.

Further, the method further comprises:

and acquiring a microphone data stream of the fixed telephone equipment, acquiring a sound card data stream of the fixed telephone equipment, and determining the voice information based on the microphone data stream and/or the sound card data stream.

Further, before the obtaining of the field contents of at least two user fields in the encapsulation protocol, the method further includes:

and encapsulating a user field in a target communication protocol so as to determine the field content of the user field according to the acquisition mode corresponding to the voice information when the voice information is acquired.

Further, when detecting that the fixed line telephone equipment collects the voice information, the method further comprises the following steps:

and determining the field content of the user field in the encapsulation protocol based on the data source corresponding to the voice information.

Further, the determining the conversation text corresponding to the voice information includes:

carrying out segmentation processing on the voice information to obtain at least one voice fragment;

and performing character conversion on the at least one voice segment based on a character conversion module to obtain the conversation text.

Further, the generating a session record based on the user identifier and the session text includes:

and processing the user identification and the session text based on the identification result field encapsulated in the target communication protocol to obtain the session record.

Further, the method further comprises:

and updating the conversation record on a target display device in real time, so that when the trigger operation on the conversation text is detected, a feedback text corresponding to the conversation text is called, and feedback is performed based on the feedback text.

Further, when the trigger operation on the session text is detected, invoking a feedback text corresponding to the session text, including:

when the session text is detected to be triggered, performing semantic analysis on the session text to determine target semantics;

and determining the feedback text according to the target semantics.

According to another aspect of the present invention, there is provided an apparatus for determining session information, the apparatus including:

the field acquisition module is used for acquiring the field contents of at least two user fields in the encapsulation protocol when detecting that voice information is acquired based on fixed line equipment;

the identification determining module is used for determining the user identification according to the field content;

and the record generating module is used for determining the session text corresponding to the voice information and generating a session record based on the user identification and the session text.

According to another aspect of the present invention, there is provided an electronic apparatus including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform a method of determining session information according to any of the embodiments of the present invention.

According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement the method for determining session information according to any one of the embodiments of the present invention when executed.

According to the technical solution of the embodiment of the invention, when it is detected that voice information is collected based on a fixed-line device, the field contents of at least two user fields in the encapsulation protocol are acquired, a user identifier is determined according to the field contents, a session text corresponding to the voice information is determined, and a session record is generated based on the user identifier and the session text. This solves the problem in the prior art that, when communication uses a fixed-line device that does not support the Media Resource Control Protocol or that uses an encrypted SIP protocol, the speaking user cannot be determined and the user's voice information cannot be converted into a session record, so that subsequent services cannot be provided and the user experience is poor. It achieves the technical effect that the speaking user can be determined based on the encapsulation protocol and the corresponding session record generated, which facilitates the provision of subsequent services.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a flowchart of a method for determining session information according to an embodiment of the present invention;

Fig. 2 is a diagram of a system architecture for determining session information, to which embodiments of the present invention are applicable;

Fig. 3 is a schematic structural diagram of an apparatus for determining session information according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device implementing the method for determining session information according to the embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Before the technical solution of the present embodiment is described, an application scenario is described by way of example. The solution of the embodiment can be applied to any scenario in which voice communication is carried out using a fixed-line device. For example, when a user wants to make a complaint or a suggestion, the user can dial a hotline of the service platform from a terminal device to resolve the issue. The customer service of the service platform usually uses a fixed-line device, and the user making the call is the customer user. In this scenario, the customer service user may need to understand the call content or other requirements more clearly in order to determine feedback information from the customer user's voice information, or the customer user may want to know about some policy that the customer service user is not fully familiar with and needs to look up. In such cases the voice information of the call can be converted into text, so that a downstream system can quickly retrieve the corresponding policy based on the text and feed it back, and the customer service user can answer the customer user based on the feedback content.

Fig. 1 is a flowchart of a method for determining session information according to an embodiment of the present invention. This embodiment is applicable to the situation in which, when the answering device is a fixed-line device, the user corresponding to the voice information and the corresponding text are determined based on an encapsulation protocol. The method may be executed by an apparatus for determining session information, which may be implemented in hardware and/or software and may be configured at a PC end or at a terminal of a telephone customer service system.

As shown in fig. 1, the method includes:

S110, when detecting that voice information is collected based on the fixed-line device, acquiring field contents of at least two user fields in an encapsulation protocol.

The encapsulation protocol may be a custom-defined communication protocol used to transmit voice information. It includes at least two user fields, and the field content is the information carried in a user field. When a call comes in, the field contents of the two user fields can be filled according to the user information corresponding to the voice information of the call. For example, the at least two user fields are a from field and a to field: the from field indicates which user the voice information comes from, and the to field indicates which user the voice information is sent to. It can be understood that during a voice call based on the fixed-line device, the corresponding voice is transmitted through the corresponding device.

Specifically, the voice information of the customer service user and the customer user during the voice call can be collected based on the fixed-line device, and the voice information is transmitted based on an encapsulation protocol, where the protocol is obtained by custom encapsulation based on a from field and a to field in an MRCP protocol. The encapsulation protocol can be parsed to obtain the field contents of at least two fields in the protocol that relate to the information of the customer service user and the customer user.
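As a minimal sketch of the parsing step described above, the following Python example models an encapsulated message with a from field and a to field and extracts their contents. The JSON serialization, the class name `EncapsulatedMessage`, and the helper `parse_user_fields` are assumptions made for illustration; the text does not specify the wire format of the encapsulation protocol.

```python
import json
from dataclasses import dataclass


@dataclass
class EncapsulatedMessage:
    """Hypothetical message layout; only the `from` and `to` fields come from the text."""
    from_user: str  # serialized under the key "from"
    to_user: str    # serialized under the key "to"

    def to_json(self) -> str:
        # JSON is an assumed encoding, chosen only to keep the sketch readable.
        return json.dumps({"from": self.from_user, "to": self.to_user})


def parse_user_fields(raw: str) -> dict:
    """Parse a serialized message and return the contents of the two user fields."""
    message = json.loads(raw)
    return {"from": message["from"], "to": message["to"]}
```

For example, a message created for speech from a customer service user to a customer would round-trip through `to_json()` and `parse_user_fields()` and yield the two field contents used in the next step.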

In this embodiment, a microphone data stream of the fixed-line device and a sound card data stream of the fixed-line device may be collected, and the voice information may be determined based on the microphone data stream and/or the sound card data stream.

The fixed-line device is provided with a microphone collection array, which collects the voice of the user speaking into the current fixed-line device; the collected voice information is taken as the microphone data stream. Correspondingly, the current fixed-line device plays the voice of the dialing user through the sound card, and the voice information played by the sound card is taken as the sound card data stream. That is, the data stream corresponding to the audio of the customer service user is the microphone data stream of the fixed-line device, and the data stream corresponding to the audio of the customer user is the sound card data stream of the fixed-line device.

Specifically, a microphone data stream and a sound card data stream on the fixed-line telephone device corresponding to the customer service user may be collected, the data stream may be analyzed, and voice information corresponding to the data stream, that is, voices corresponding to the customer service user and the client user may be obtained.

In this embodiment, before obtaining the field contents of at least two user fields in the encapsulation protocol, the method further includes: encapsulating the user field in the target communication protocol so as to determine the field content of the user field according to the acquisition mode corresponding to the voice information when the voice information is acquired.

Specifically, the user field is encapsulated in the target protocol, and when the fixed-line device is used to collect the call voice information, the voice of different users is collected in different ways. On the fixed-line device of the customer service user, the voice information of the customer user is acquired through the sound card, and the voice information uttered by the customer service user is acquired through the microphone. Accordingly, when the sound card collects the voice information, information related to the customer user is filled into the user field content, and when the microphone collects the voice information, information related to the customer service user is filled into the user field content.

In this embodiment, when it is detected that the fixed-line device collects the voice information, the field content of the user field in the encapsulation protocol may be determined based on the data source corresponding to the voice information.

The data source refers to the source of the audio information. As described above, the collection modes include microphone collection and sound card collection, so the corresponding data sources are a microphone data source and a sound card data source. From the data source, it can be determined which user the voice information comes from and which user's device it needs to be sent to.

Specifically, the field content may be determined according to the data source corresponding to the voice data. If the data comes from the sound card of the fixed-line device used by the customer service user, the voice was uttered by the customer user in the call with the customer service user, and the corresponding field content is filled with the customer information. If the data comes from the microphone of the customer service user's fixed-line device, the voice was uttered by the customer service user, and the field content is filled with the customer service information.
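The rule above (sound card means the customer spoke, microphone means the customer service user spoke) can be sketched as follows. The enum `DataSource` and the function `fill_user_fields` are hypothetical names introduced for illustration, and returning the fields as a dictionary is an assumption.

```python
from enum import Enum


class DataSource(Enum):
    MICROPHONE = "microphone"  # audio spoken into the agent's fixed-line device
    SOUND_CARD = "sound_card"  # audio played back on the agent's fixed-line device


def fill_user_fields(source: DataSource, agent_info: str, customer_info: str) -> dict:
    """Fill the from/to field contents according to where the audio was captured."""
    if source is DataSource.MICROPHONE:
        # The customer service user is speaking to the customer.
        return {"from": agent_info, "to": customer_info}
    if source is DataSource.SOUND_CARD:
        # The customer is speaking to the customer service user.
        return {"from": customer_info, "to": agent_info}
    raise ValueError(f"unknown data source: {source!r}")
```

Because the speaker is inferred from the capture path rather than from the audio itself, no speaker-recognition model is needed in this sketch.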

S120, determining the user identifier according to the field content.

The user identifier is used to distinguish the different users on the call; for example, user identifier A corresponds to user A, and user identifier B corresponds to user B.

Specifically, the user information contained in the field content may be processed by extracting one of the user's name, ID, or telephone number as the user identifier; based on this identifier, it can be determined which user sent the corresponding voice information.
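A minimal sketch of this extraction is shown below. It assumes, purely for illustration, that the field content is a `key=value` list separated by semicolons and that the ID is preferred over the telephone number and the name; neither the layout nor the preference order is stated in the text, and `determine_user_identifier` is a hypothetical helper.

```python
def determine_user_identifier(field_content: str,
                              preferred=("id", "phone", "name")) -> str:
    """Pick one of ID, phone number, or name from the field content as the identifier."""
    info = {}
    for part in field_content.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            info[key.strip()] = value.strip()
    # Return the first non-empty attribute in the assumed preference order.
    for key in preferred:
        if info.get(key):
            return info[key]
    raise ValueError("field content contains no usable identifier")
```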

S130, determining a conversation text corresponding to the voice information, and generating a conversation record based on the user identification and the conversation text.

The session text is the text obtained by converting the content of the voice information; the voice information may be converted into text by Automatic Speech Recognition (ASR). The session record includes a plurality of session texts, and each entry may take the form: user identifier - timestamp - spoken content.

Specifically, the voice information may be transcribed into a text, that is, a session text, by using a voice transcription technology, and a user identifier corresponding to each sentence of the session text, that is, an identifier of a voice sender corresponding to the text, is associated with the session text to generate a session record, where the session record may be stored in a format such as word, txt, and the like.

Illustratively, when the customer service user communicates with the customer user, the customer service user says from second 2 to second 6: "Hello, I am customer service No. 1, XXX. How can I help you?" The customer user with ID 00056 says from second 8 to second 14: "Hello, I would like to consult about a service." Based on this voice information, the speech can be converted into text and, based on the user identifiers, a session record can be generated whose content may be: Customer service 1 - 2 s to 6 s - Hello, I am customer service No. 1, XXX. How can I help you? User 00056 - 8 s to 14 s - Hello, I would like to consult about a service.

It should be noted that the advantage of this arrangement is that the session content is presented as text, which makes it convenient to recommend services to the customer based on the session record; in conversation scenarios between other ordinary users, the session record can likewise guide the users' subsequent operations.
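The record layout in this example (user identifier, time span, spoken content) can be sketched as below. The class `SessionText`, its field names, and the plain-text rendering are assumptions for illustration; the text only states that a record contains these three pieces of information and may be stored in formats such as word or txt.

```python
from dataclasses import dataclass


@dataclass
class SessionText:
    user_id: str   # identifier of the speaker
    start_s: int   # second at which the utterance starts
    end_s: int     # second at which the utterance ends
    text: str      # transcribed content

    def render(self) -> str:
        return f"{self.user_id} - {self.start_s}s to {self.end_s}s - {self.text}"


def build_session_record(entries) -> str:
    """Order the utterances by start time and render one line per utterance."""
    ordered = sorted(entries, key=lambda entry: entry.start_s)
    return "\n".join(entry.render() for entry in ordered)
```

Sorting by start time keeps the record in conversational order even if transcription results for the two audio streams arrive out of order.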

In practical application, the mode of determining the conversation text corresponding to the voice information may be to process the voice information in segments to obtain at least one voice segment; and performing character conversion on at least one voice segment based on a character conversion module to obtain a conversation text.

It should be noted that the voice information includes the voice of the customer service user and the voice of the customer user. In order to distinguish, after transcription, whether a text corresponds to the customer or to the customer service, the voice information needs to be divided into a plurality of segments.

Specifically, in a call between a customer user and a customer service user, usually only one user is speaking during any given period while the other listens. When the current speaker switches from the customer to the customer service user, or the reverse, there is a pause in the voice information, so the voice information can be segmented using Voice Activity Detection (VAD), and each resulting voice segment corresponds to either the customer service user or the customer user. On this basis, the ASR speech-to-text service is called to process each voice segment and obtain the corresponding session text.
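The pause-based segmentation can be illustrated with a deliberately simple energy-threshold detector. This is a simplified stand-in for VAD, not the detector used in the embodiment; the function name `split_on_silence`, the frame length, the energy threshold, and the minimum pause length are all assumed values chosen for illustration.

```python
def split_on_silence(samples, frame_len=160, energy_threshold=500.0,
                     min_silence_frames=3):
    """Return (start, end) sample indices of voiced segments separated by pauses."""
    segments = []
    start = None       # sample index where the current segment began
    silent_run = 0     # consecutive silent frames seen inside a segment
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len  # mean squared amplitude
        if energy >= energy_threshold:
            if start is None:
                start = i * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first frame of the pause.
                segments.append((start, (i - silent_run + 1) * frame_len))
                start, silent_run = None, 0
    if start is not None:
        # Audio ended before a full pause; drop any trailing silent frames.
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments
```

Each returned segment would then be passed to the speech-to-text service on its own, so that every transcribed sentence belongs to a single speaker.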

In this embodiment, the session record is generated based on the user identifier and the session text, and the session record may be obtained by processing the user identifier and the session text based on the identification result field encapsulated in the target communication protocol.

The identification result field is a field in the target protocol that is used to store the session text corresponding to a voice segment. For example, the identification result field is a content field.

Specifically, the identification result field may be encapsulated in the target protocol, and the text corresponding to each voice segment is filled into the field content of the identification result field. When information is transmitted based on the target protocol, the identification result field in the target protocol can be extracted to obtain the content of the field, namely the session text obtained after voice transcription. On this basis, each session text is associated with its corresponding user identifier to generate the session record.
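A minimal sketch of filling and reading the result field follows. Representing a protocol message as a dictionary with `from`, `to`, and `content` keys is an assumption for illustration, and `fill_result_field` and `to_session_record` are hypothetical helpers.

```python
def fill_result_field(message: dict, text: str) -> dict:
    """Return a copy of the message with the transcribed text in the content field."""
    filled = dict(message)
    filled["content"] = text
    return filled


def to_session_record(messages) -> list:
    """Pair each sender identifier with its transcribed text, skipping empty results."""
    return [(m["from"], m["content"]) for m in messages if m.get("content")]
```

Copying the message before filling it keeps the original user fields untouched, which makes it easy to reuse one template message per speaker.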

Fig. 2 is an exemplary architecture diagram of a system for determining session information to which the embodiment of the present invention is applicable. As shown in Fig. 2, a system for determining session information is developed based on the solution of this embodiment. The system includes an agent (seat) terminal, an ASR server configured to provide the ASR service, and a downstream system service configured to use the session record generated in this embodiment. At the agent terminal, the microphone data stream and the sound card data stream are collected, the voice information is determined, and the user identifier is determined according to the collection mode. Further, the from and to fields of the protocol are encapsulated and the user identifiers are filled into these fields. On this basis, the collected audio is split into sentences by VAD, ASR is called to recognize each sentence, and the resulting text is filled into the content field of the protocol. Custom voice transcription protocol fields may also be added for personalization. Finally, the user identifier is associated with the text to generate a session record, and the session record is sent to the downstream system service, so that the downstream system consumes text information that carries the speaker's role, bringing a better experience to the user.
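The data flow of this architecture can be sketched end to end as follows. Everything here is a stand-in: the `asr` callable represents the ASR server, a `queue.Queue` represents the downstream system service, and the tuple layout of `captures` and the function name `run_agent_terminal` are assumptions made for illustration.

```python
import queue
from typing import Callable, Iterable, Tuple


def run_agent_terminal(captures: Iterable[Tuple[str, int, int, bytes]],
                       asr: Callable[[bytes], str],
                       agent_id: str,
                       customer_id: str,
                       downstream: "queue.Queue") -> None:
    """Turn already segmented audio captures into role-tagged text messages."""
    for source, start_s, end_s, audio in captures:
        if source == "microphone":      # the agent spoke
            sender, receiver = agent_id, customer_id
        elif source == "sound_card":    # the customer spoke
            sender, receiver = customer_id, agent_id
        else:
            raise ValueError(f"unknown source: {source}")
        downstream.put({
            "from": sender,
            "to": receiver,
            "start_s": start_s,
            "end_s": end_s,
            "content": asr(audio),      # recognized text for this segment
        })
```

Passing the recognizer in as a callable keeps the sketch independent of any particular ASR service and lets a fake recognizer be substituted in tests.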

In practical applications, the session record may be updated on the target display device in real time, so that when the trigger operation on the session text is detected, the feedback text corresponding to the session text is called to perform feedback based on the feedback text.

The target display device may be any device having a function of displaying text, for example, the target display device may be a PC terminal. The feedback text is text information associated with the text content of the session.

It can be understood that during a voice call between the customer service user and the customer user, the customer user usually asks some service questions that the customer service user may be unable to answer. In this case, the session record of the communication is displayed on the target display device, and a session text corresponding to the customer user can be triggered by a click operation, so that an explanatory text corresponding to the session text is matched from the database and the customer user's question is answered based on the explanatory text. When it is detected that the session text is triggered, the system may also link to a search engine, search for text information related to the session text to serve as the feedback text, and inform the customer user of the content of the feedback text. The advantage is that questions from the customer user can be answered more accurately and quickly.

In this embodiment, the feedback text corresponding to the session text may be retrieved as follows: when it is detected that the session text is triggered, performing semantic analysis on the session text to determine target semantics; and determining the feedback text according to the target semantics.

The target semantics may be the semantic analysis result obtained by applying semantic analysis to the session text.

Specifically, when it is detected that the session text is triggered, a semantic analysis technique such as Natural Language Processing (NLP) may be called to segment the session text into words, remove meaningless words, determine the event or meaning expressed by the session text from the remaining words, and take that event or meaning as the target semantics. Further, based on the semantic result, the text information matching the semantics, namely the feedback text, is determined through a search engine or an AI-based matching technique. Using the target semantics allows the related feedback text to be matched more quickly. The customer service user can then read out the feedback text, so that the customer user clearly understands the feedback corresponding to the voice information, which improves the user experience.
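The steps above (word segmentation, removal of meaningless words, matching on what remains) can be sketched with a simple keyword-overlap matcher. This stands in for the NLP and search-engine components, which the text does not specify; the stop-word list, the function names, and the shape of the knowledge base (keyword sets mapped to feedback texts) are assumptions for illustration.

```python
import re

# Illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"i", "a", "the", "to", "would", "like", "about", "hello",
              "please", "my", "is", "of"}


def target_semantics(session_text: str) -> set:
    """Segment the text into words and drop the stop words."""
    words = re.findall(r"[a-z0-9]+", session_text.lower())
    return {word for word in words if word not in STOP_WORDS}


def find_feedback_text(session_text: str, knowledge_base: dict):
    """Return the feedback text whose keywords overlap most with the session text."""
    keywords = target_semantics(session_text)
    best_text, best_score = None, 0
    for entry_keywords, feedback in knowledge_base.items():
        score = len(keywords & entry_keywords)
        if score > best_score:
            best_text, best_score = feedback, score
    return best_text  # None when nothing in the knowledge base matches
```

The knowledge base keys would be `frozenset` objects so that they can be used as dictionary keys, for example `{frozenset({"refund", "policy"}): "..."}` with made-up sample content.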

According to the technical solution of the embodiment of the invention, when it is detected that voice information is collected based on a fixed-line device, the field contents of at least two user fields in the encapsulation protocol are acquired, a user identifier is determined according to the field contents, a session text corresponding to the voice information is determined, and a session record is generated based on the user identifier and the session text. This solves the problem in the prior art that, when communication uses a fixed-line device that does not support the Media Resource Control Protocol or that uses an encrypted SIP protocol, the speaking user cannot be determined and the user's voice information cannot be converted into a session record, so that subsequent services cannot be provided and the user experience is poor. It achieves the technical effect that the speaking user can be determined based on the encapsulation protocol and the corresponding session record generated, which facilitates the provision of subsequent services.

Fig. 3 is a schematic structural diagram of an apparatus for determining session information according to an embodiment of the present invention.

As shown in fig. 3, the apparatus includes:

a field obtaining module 210, configured to obtain field contents of at least two user fields in an encapsulation protocol when it is detected that voice information is collected based on a fixed line device;

an identifier determining module 220, configured to determine a user identifier according to the field content;

a record generating module 230, configured to determine a session text corresponding to the voice information, and generate a session record based on the user identifier and the session text.

On the basis of the above technical solution, the apparatus for determining session information further includes:

and the data stream acquisition module is used for acquiring a microphone data stream of the fixed telephone equipment, acquiring a sound card data stream of the fixed telephone equipment, and determining the voice information based on the microphone data stream and/or the sound card data stream.

and the user field encapsulation module is used for encapsulating the user fields in the target communication protocol before the field contents of at least two user fields in the encapsulation protocol are acquired, so that the field contents of the user fields are determined according to the acquisition mode corresponding to the voice information when the voice information is acquired.

On the basis of the above technical solution, the field obtaining module 210 includes:

and the data source determining module is used for determining the field content of the user field in the encapsulation protocol based on the data source corresponding to the voice information.

On the basis of the above technical solution, the record generating module 230 includes:

the voice segmentation module is used for carrying out segmentation processing on the voice information to obtain at least one voice segment;

and the character conversion module is used for carrying out character conversion on the at least one voice fragment based on the character conversion module to obtain the conversation text.

On the basis of the above technical solution, the record generating module 230 further includes:

a generation module, configured to process the user identifier and the session text based on the identification result field encapsulated in the target communication protocol to obtain the session record;

a display feedback module, configured to update the session record on a target display device in real time, so that, when a trigger operation on the session text is detected, a feedback text corresponding to the session text is retrieved and feedback is performed based on the feedback text.

On the basis of the above technical solution, the display feedback module includes:

a semantic analysis module, configured to perform semantic analysis on the session text when it is detected that the session text is triggered, and determine target semantics;

a feedback module, configured to determine the feedback text according to the target semantics.
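
The two sub-modules can be illustrated with a keyword-based sketch. Keyword matching stands in for semantic analysis here, and the semantics labels and feedback texts are invented for illustration; a practical system would more likely rely on a trained language-understanding model.

```python
from typing import Optional

# Hypothetical keyword-to-semantics table.
KEYWORDS = {
    "refund": "refund_request",
    "delivery": "delivery_inquiry",
}

# Hypothetical feedback texts keyed by target semantics.
FEEDBACK = {
    "refund_request": "Refunds are processed after the return is received.",
    "delivery_inquiry": "Please provide the order number to check delivery status.",
}


def analyze_semantics(session_text: str) -> Optional[str]:
    """Semantic analysis module: return the first matching target semantics, if any."""
    lowered = session_text.lower()
    for keyword, semantics in KEYWORDS.items():
        if keyword in lowered:
            return semantics
    return None


def feedback_for(session_text: str) -> Optional[str]:
    """Feedback module: look up the feedback text for the target semantics."""
    semantics = analyze_semantics(session_text)
    return FEEDBACK.get(semantics) if semantics else None
```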

According to the technical solution of the embodiment of the present invention, when it is detected that voice information is collected based on a fixed line device, the field contents of at least two user fields in the encapsulation protocol are obtained, a user identifier is determined according to the field contents, a session text corresponding to the voice information is determined, and a session record is generated based on the user identifier and the session text. This solves the problems in the prior art that, when communication is carried out using a fixed line device that does not support a media resource control protocol or encryption-based SIP protocol communication, the corresponding speaking user cannot be determined and the voice information of the speaking user cannot be converted into a corresponding session record, so that subsequent services cannot be provided and the user experience is poor. It achieves the technical effect that the corresponding speaking user can be determined based on the encapsulation protocol and the corresponding session record can be generated, which facilitates the provision of subsequent services. The apparatus for determining session information provided by the embodiment of the present invention can execute the method for determining session information provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.

Fig. 4 is a schematic structural diagram of an electronic device 30 implementing the method for determining session information according to the embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

As shown in Fig. 4, the electronic device 30 includes at least one processor 31 and a memory communicatively connected to the at least one processor 31, such as a Read Only Memory (ROM) 32, a Random Access Memory (RAM) 33, and the like. The memory stores a computer program executable by the at least one processor, and the processor 31 may perform various suitable actions and processes according to the computer program stored in the ROM 32 or the computer program loaded from the storage unit 38 into the RAM 33. Various programs and data necessary for the operation of the electronic device 30 can also be stored in the RAM 33. The processor 31, the ROM 32, and the RAM 33 are connected to each other via a bus 34. An input/output (I/O) interface 35 is also connected to the bus 34.

A plurality of components in the electronic device 30 are connected to the I/O interface 35, including: an input unit 36 such as a keyboard, a mouse, etc.; an output unit 37 such as various types of displays, speakers, and the like; a storage unit 38 such as a magnetic disk, an optical disk, or the like; and a communication unit 39 such as a network card, modem, wireless communication transceiver, etc. The communication unit 39 allows the electronic device 30 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The processor 31 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 31 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The processor 31 performs the various methods and processes described above, such as the method of determining session information.

In some embodiments, the method of determining session information may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as the storage unit 38. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 30 via the ROM 32 and/or the communication unit 39. When the computer program is loaded into the RAM 33 and executed by the processor 31, one or more steps of the method of determining session information described above may be performed. Alternatively, in other embodiments, the processor 31 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of determining session information.

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SoCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable storage medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.

The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and VPS services.

It should be understood that the various forms of flow shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in a different order, which is not limited herein as long as the desired results of the technical solution of the present invention can be achieved.

The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of determining session information, comprising:

determining a user identifier according to the field content;

2. The method of claim 1, further comprising:

acquiring a microphone data stream of the fixed line device, acquiring a sound card data stream of the fixed line device, and determining the voice information based on the microphone data stream and/or the sound card data stream.

3. The method of claim 1, further comprising, prior to obtaining field contents of at least two user fields in an encapsulation protocol:

encapsulating a user field in a target communication protocol, so that, when the voice information is collected, the field content of the user field is determined according to a collection mode corresponding to the voice information.

4. The method of claim 1, wherein, when it is detected that the fixed line device collects the voice information, the method further comprises:

5. The method of claim 1, wherein the determining the session text corresponding to the voice information comprises:

carrying out segmentation processing on the voice information to obtain at least one voice segment;

6. The method of claim 1, wherein generating a session record based on the user identifier and the session text comprises:

7. The method of claim 1, further comprising:

updating the session record on a target display device in real time, so that, when a trigger operation on the session text is detected, a feedback text corresponding to the session text is retrieved and feedback is performed based on the feedback text.

8. The method according to claim 7, wherein retrieving the feedback text corresponding to the session text when the trigger operation on the session text is detected comprises:

and determining the feedback text according to the target semantics.

9. An apparatus for determining session information, comprising:

10. An electronic device, characterized in that the electronic device comprises:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of determining session information as recited in any one of claims 1-8.

11. A storage medium containing computer-executable instructions for performing the method of determining session information of any one of claims 1-8 when executed by a computer processor.