CN1157679A

CN1157679A - Method and device for transmitting audio signal between communication system terminal equipment

Info

Publication number: CN1157679A
Application number: CN95195059A
Authority: CN
Inventors: I·塞贝什真
Original assignee: Siemens Corp
Current assignee: Siemens Corp
Priority date: 1994-07-25
Filing date: 1995-07-25
Publication date: 1997-08-20
Also published as: WO1996003829A1; KR970705278A; EP0772933A1

Abstract

A method for transmitting audio signals from one communication terminal equipment (KES) to another communication terminal equipment (KEE) over a communication network using a suitable packet-switched transmission protocol (HDLC LAP or packet-mode media multiplexing), having the following steps: a) the digital input audio signal is compressed in the sender's communication terminal equipment; b) the compressed audio signal is divided into data blocks; c) each data block is marked as an audio information data block or a command data block; d) using a protocol that provides error detection and the required secured transmission, the audio information data blocks are transmitted from the sender's communication terminal equipment (KES) to the receiving communication terminal equipment (KEE) and stored there; e) the stored audio information data blocks are decompressed and output as a digital audio signal or processed further; f) to control the communication process, command data blocks are exchanged in secured form between sender and receiver using a protocol that provides error detection and the required secured transmission. Furthermore, a device for carrying out such a method has application-specific additional equipment.

Description

Method and device for transmitting audio signals between communication terminal equipment

The present invention relates to a method for transmitting audio signals from one communication terminal equipment (KES) to another communication terminal equipment (KEE) over a switched digital or analog communication network, within the scope of ITU-T V.8/V.8bis initialization and an ITU-T V.34 modem, using a secured transmission protocol.

Terminal equipment connected through an analog interface to an analog communication network, for example the telephone network, or to a digital communication network, for example a mobile radio network, is mainly used to transmit analog audio/speech signals in quasi real time. After a connection has been set up between two telephone terminals, the original acoustic signal at the transmitter's microphone is converted into analog electrical oscillations, which are transmitted in analog form and synchronously over the analog telephone network to the receiving telephone terminal. In digital communication, for example communication through a modem, the analog signal must additionally be converted into a digital signal before transmission and converted back into an analog signal afterwards. In the receiving telephone terminal, the received electrical audio/speech signal is likewise converted synchronously back into an acoustic signal at the earpiece, which approximates the original audio/speech message.

The object of the present invention is to provide a simple method for the secured transmission of audio signals over a communication network, which allows connection or control paths to be set up not only through analog network units but also through digital network units belonging to the respective communication network, and which makes additional communication services possible.

In addition, a device for carrying out this method is to be specified.

The advantageous effects of the invention are in particular that the transmission delay is shortened relative to the real-time duration of the transmitted audio information, or that the usable bandwidth for transmission over an analog communication network is increased.

This object is achieved by a method having the features of claim 1. Advantageous developments are the subject of the dependent claims. In addition, claim 8 specifies an arrangement for carrying out the method according to the invention.

In the sender's audio communication terminal equipment, the audio/speech signal is converted into a digital code. To reduce the amount of information to be transmitted, the signal is first compressed and then divided into message data blocks, which are transmitted one after another over the telecommunications network in secured form by means of a corresponding protocol. Virtual channels are defined using the HDLC transmission method ("High-Level Data Link Control") in its HDLC LAP ("Link Access Protocol") variants, which are described in detail in the ITU-T G, Q, T, V and X series, and using the packet-mode multiplexing methods of the ITU-T series. Through these channels the different information categories are transmitted block by block over mutually independent "virtual connections", optionally in secured or unsecured mode.

In the recipient's audio communication terminal equipment, the received information is decompressed and decoded into the original digital code, yielding digital audio/speech information identical to that of the signal source. So that the received information can be made available when required, for example after digital-to-analog conversion, it is generally stored temporarily in the receiving terminal's own digital storage, such as a disk.

When the method according to the invention is used, information can in principle be exchanged in both directions. In half-duplex mode, audio/speech information is exchanged alternately between the terminals, either within the same connection or after a new connection has been set up each time. In full-duplex mode, audio/speech information is exchanged between the terminals simultaneously.

The following characteristics of speech compression allow particularly advantageous embodiments of the invention:

Current standardization of speech coding at very low bit rates, for example for video telephony over the public switched telephone network (PSTN) in the ITU-T, has led to high-quality speech coding at transmission rates of 4-8 kbit/s. Such rates make a secured type of transmission possible, for example according to HDLC LAP, at the speeds available for digital data transmission, while achieving almost the quality of CCITT Recommendation G.726.

Audio compression methods for high-quality audio signals currently being standardized by ITU-T SG 15 and ISO/IEC JTC1 SC29, for example for high-quality audio up to 16 kHz, require a transmission rate of 24-32 kbit/s with a secured type of transmission.

The method according to the invention makes it possible, for example, to implement a simple "voice storage telephone" at an analog subscriber network interface; that is, with a suitable communication protocol, the telephone terminal offers a voice mail application.

In this application, speech and audio information can be transmitted from a sender to a recipient, or to several recipients in the case of audio/speech broadcasting, in addition to ordinary telephony. Depending on the audio/speech compression algorithm used, the speech and/or audio information is transmitted with different sound quality.

With currently known compression methods, the associated time base can be compressed by a factor of four. When a modem with a higher transmission rate of, for example, 28 to 30 kbit/s is used, real-time audio information of, for example, 7 kbit/s can be sent over the public switched telephone network (PSTN), using a suitable protocol, in a transmission time four times shorter than real-time transmission. For example, the transmission delay of a voice message that would take 4 minutes under ordinary call conditions drops to 1 minute. Highly compressed telephone speech is also very well suited to compact digital storage. In the example above, 1 minute of speech occupies approximately 52 Kbytes of storage in a "voice storage server". A dedicated digital compression/decompression circuit for storing the speech signal is therefore unnecessary.
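The arithmetic of this example can be checked with a short calculation. The sketch below is illustrative only: it assumes an ideal channel without protocol overhead or retransmissions, takes 28 kbit/s as the modem rate, and uses hypothetical function names.

```python
def transfer_time_s(duration_s: float, codec_bps: float, modem_bps: float) -> float:
    """Seconds needed to send a recording of duration_s seconds.

    Assumption: the modem carries only coder payload (no framing overhead).
    """
    payload_bits = duration_s * codec_bps
    return payload_bits / modem_bps


def storage_bytes(duration_s: float, codec_bps: float) -> float:
    """Bytes occupied by duration_s seconds of compressed speech."""
    return duration_s * codec_bps / 8


if __name__ == "__main__":
    # 4 minutes of 7 kbit/s speech over a 28 kbit/s modem
    print(transfer_time_s(240, 7_000, 28_000))  # 60.0 seconds
    # 1 minute of 7 kbit/s speech
    print(storage_bytes(60, 7_000))  # 52500.0 bytes
```

Under these assumptions the ratio of modem rate to coder rate (28/7) gives the factor of four, and one minute of speech comes to about 52 Kbytes.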

A further advantage of the invention is that, for example by means of the audio compression methods being developed by ITU-T SG15 and ISO/IEC JTC1 SC29, high-quality audio signals, for example 7 kHz or 16 kHz audio, can be transmitted synchronously with the method according to the invention even over an analog telephone network with a bandwidth of 0.3 to 3.4 kHz.

The invention is described in detail below with reference to the figures, on the basis of particularly advantageous embodiments.

The figures show:

Fig. 1: a block diagram of audio communication terminal equipment KES, KEE for implementing "voice mail communication" (referred to below as a voice storage telephone).

Fig. 2: a block diagram of audio communication terminal equipment KES, KEE as in Fig. 1, with the additional possibility of accessing a voice storage server.

Fig. 3: different ways of connecting two voice storage telephones KES, KEE, namely a direct connection between two voice storage telephones KES1, KEE1, and a connection between two voice storage telephones KES2, KEE2 through a voice storage server VMS, each within the communication network KN.

Fig. 4: the V.8 "start-up" procedure for call initialization. It is not strictly necessary in connection with a V.34 modem; the V.8 "start-up" procedure is run once at the beginning of a connection.

Fig. 5: Table 3/V.8, the "call function" categories of an embodiment of the extended ITU-T V.8 initialization procedure.

Fig. 6: Table 4/V.8, the "modulation modes" category of the extended ITU-T V.8 initialization procedure.

Fig. 7: Table 5/V.8, the protocol type coding of the extended ITU-T V.8 initialization procedure.

During a connection, a voice storage telephone is connected either to another "voice storage telephone" terminal (KEE1 to KES1) or to a "voice storage telephone" server (Fig. 3) (KEE2 to VMS, or KES2 to VMS). The KEE1-to-KES1 connection is a direct connection between "voice storage telephones", in which information is exchanged without delay. The KEE2-to-VMS connection is a direct connection of a "voice storage telephone" (KEE2) to the "voice storage server" (VMS), in which information is sent without delay from KEE2 to the VMS. By contrast, the "voice storage telephone" KES2 accesses the VMS at some later point in time, whereupon the information is sent without further delay from the VMS to KES2.

Fig. 1 shows a simplified block diagram of such audio communication terminal equipment; KES (sender) and KEE (recipient) have the same configuration. The network interface is the physical connection of KES or KEE to the communication network (KN). The system control can switch back and forth between ordinary telephone mode and voice storage telephone mode. It also contains the system elements found in conventional telephone terminal equipment (for example ringing, dialing, connecting the microphone/loudspeaker (earpiece) on off-hook and disconnecting it on on-hook). Switch S1 switches the microphone from ordinary telephone operation to "voice storage telephone" operation (VMP). Voice storage telephone mode is initiated by ITU-T V.8bis/V.8. At the sender (KES), the audio/speech coder (for example ITU-T G.723 G.8K bit/s) converts the analog speech signal into a digital code and compresses it. In the multiplexer, the compressed signal is prepared for transmission; both speech information and control information are exchanged. In the block labeled HDLC LAP X, the multiplexed information is transmitted in secured form through an ITU-T V.34 modem. At the recipient (KEE), this process runs in reverse order. The HDLC LAP X block manages the correct reception of the multiplexed audio/control information. The demultiplexer separates the signal into control data and speech data. The speech information is decoded in the audio/speech coder and passed through changeover switch (S2) to the loudspeaker/earpiece.

Fig. 2 shows a simplified block diagram of a "voice storage telephone" that communicates with a "voice storage server". Most of the description of Fig. 1 also applies to Fig. 2. A "voice storage telephone server" (VMS) is connected to the communication network (KN). At the sender (KES), an additional changeover switch (S3) allows a mailbox address (subaddress) to be entered on the telephone keypad. The multiplexer converts this address into the corresponding control information and inserts it into the transmitted data stream. Using this information, the "voice storage telephone" message is stored temporarily in the VMS, in the "mailbox" of KEE. To retrieve it, KEE supplies its "mailbox" number through changeover switch S3; the multiplexer converts this number into the relevant control information and sends it in the data stream. The VMS then delivers the stored information to KEE. The information is output as in Fig. 1.
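The store-and-forward role of the VMS can be pictured with a minimal sketch. Everything here is hypothetical illustration: the class and method names are invented, messages are plain byte strings, and the choice to empty a mailbox on retrieval is an assumption that the text does not make.

```python
from collections import defaultdict


class VoiceStorageServer:
    """Toy model of a VMS holding compressed messages per mailbox subaddress."""

    def __init__(self) -> None:
        self._mailboxes: dict[str, list[bytes]] = defaultdict(list)

    def deposit(self, subaddress: str, message: bytes) -> None:
        """Store a message sent by KES in the mailbox named by the subaddress."""
        self._mailboxes[subaddress].append(message)

    def retrieve(self, subaddress: str) -> list[bytes]:
        """Hand over all stored messages to the polling terminal.

        Assumption: the mailbox is emptied once its content is delivered.
        """
        return self._mailboxes.pop(subaddress, [])


if __name__ == "__main__":
    vms = VoiceStorageServer()
    vms.deposit("4711", b"compressed speech")
    print(vms.retrieve("4711"))
```

The sender's deposit and the recipient's later retrieval are independent connections, which is what decouples the two terminals in time.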

The following telecommunications elements, which are standardized or are to be standardized in the future, are used first and foremost:

The ITU-T V.34 modem with its higher transmission speed (up to 28000-33000 bit/s) and the associated initialization procedure according to ITU-T V.8/V.8bis.

The extensions to the ITU-T V.8/V.8bis recommendations that are necessary for the application described.

The (optionally) secured transmission of information blocks (a packet-mode transmission protocol according to HDLC LAP, ITU-T V.gmax, or the packet-mode multiplexer ITU-T H.223).

Current standardization in the ITU-T of speech coders with very low bit rates for video telephony over the public switched telephone network (ITU-T G.723, G.dsvd, G.729) has led to speech coders (with quality close to CCITT G.726) at transmission rates of 5-10 kbit/s, which allow the best possible transmission. Future 4 kbit/s speech coders are likewise currently being standardized by ITU-T SG15 together with ISO/IEC JTC1 SC29.

Current standardization in the ITU-T and ISO/IEC of high-quality audio coders (up to 16 kHz audio bandwidth) with bit rates of 24-32 kbit/s for the public switched telephone network, whose transmission is not secured.

To implement the invention, parts of the above standards are adopted, combined and extended to form a complete system.

Several features of communication terminal equipment designed to implement the method according to the invention are described in detail below.

To implement a voice mail service with the method according to the invention, a telephone with a "voice storage" function is required, a so-called "voice storage telephone".

In this application, speech or audio information can be transmitted from sender to recipient in "terminal-to-terminal" operation, in addition to ordinary telephony, or to many recipients in "voice mail broadcast" operation. Depending on the audio/speech compression algorithm used, the speech and/or audio information can be transmitted with different sound quality.

For this purpose, typical terminal equipment comprises at least the telephone elements for the public switched telephone network (PSTN) or for radio telephony, and an additional communication control unit that controls data transmission by means of HDLC LAP according to a transmission protocol for secured communication. It also comprises an ITU-T V.34 modem and means for carrying out the appropriate ITU-T V.8/V.8bis initialization procedure. In addition, the sender's communication terminal equipment comprises at least one audio coder for compressing the audio input data signal, and the recipient's communication terminal equipment comprises at least one audio decoder for decompressing received audio information data. Fig. 1 shows a simplified block diagram of such audio communication terminal equipment. During a connection, a "voice storage telephone" is connected either to another "voice storage telephone" or to a "voice storage server" (Fig. 3). Fig. 2 shows a simplified block diagram of a "voice storage telephone" that communicates with a "voice storage server".

For communication between two "voice storage telephones", the audio signal to be transmitted must first be prepared in the sending audio communication terminal KES.

The "voice storage telephone" message is prepared "off line", before the connection is set up. The sending user switches the communication terminal equipment KES to "voice storage mode", lifts the handset and speaks the message into the telephone. In the sender's audio communication terminal equipment KES, the message is compressed and stored locally. With suitable equipment, listening to the message and editing it locally are possible. The message length is limited by the telephone's internal memory. A memory capacity of at least 256 KByte (about 5 minutes of speech) should be provided. The actual communication takes place either immediately after the message has been completed or later under a pre-programmed delay, for example to use the cheaper night tariff.
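As a rough plausibility check of the memory figure, the sketch below computes how much speech fits into a given memory. It assumes the 7 kbit/s coder rate used in the earlier example and 1 KByte = 1024 bytes; neither value is stated in this passage, and the function name is hypothetical.

```python
def recording_seconds(memory_bytes: int, codec_bps: int) -> float:
    """Seconds of compressed speech that fit into memory_bytes."""
    return memory_bytes * 8 / codec_bps


if __name__ == "__main__":
    seconds = recording_seconds(256 * 1024, 7_000)  # assumed 7 kbit/s coder
    print(f"{seconds / 60:.1f} minutes")
```

With these assumed values the result is close to the "about 5 minutes" quoted above; a lower-rate coder would store proportionally more.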

The connection is set up as a voice storage telephone connection. This means that the sender's "voice storage telephone" first attempts, without delay, to prompt the called device to respond as the recipient's "voice storage telephone" by signaling according to ITU-T V.8/V.8bis; an extension of ITU-T V.8/V.8bis is provided for this purpose. An example implementation of the extended ITU-T V.8/V.8bis initialization is described further below. If the called device responds as a "voice storage telephone" (likewise by ITU-T V.8/V.8bis), a digital connection for information exchange can be set up by means of ITU-T V.34. If the called device is an "ordinary telephone" or, for example, a facsimile machine, the connection is released after a predetermined time interval expires. Where applicable, if the called telephone is answered manually, a user who hears the signal tones (according to ITU-T V.8/V.8bis) sent by the calling "voice storage telephone" until the time interval expires can switch manually to ordinary telephone operation. In this case, the connection continues as an ordinary telephone call.
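The three possible outcomes of connection setup described here can be summarized as a small decision function. This is a simplified sketch under stated assumptions: the enum names are hypothetical, the timer is reduced to a final fallback, and the manual switch is modeled as a single flag.

```python
from enum import Enum


class Callee(Enum):
    VOICE_STORAGE_TELEPHONE = "voice storage telephone"
    ORDINARY_TELEPHONE = "ordinary telephone"
    FACSIMILE = "facsimile"


class Outcome(Enum):
    V34_DATA_CONNECTION = "digital connection via V.34"
    ORDINARY_CALL = "continues as ordinary telephone call"
    RELEASED = "released after time interval expired"


def setup_outcome(callee: Callee, manual_switch: bool = False) -> Outcome:
    """Outcome of a call placed by a voice storage telephone.

    manual_switch: a user switched to ordinary telephone operation
    before the predetermined time interval expired (assumed flag).
    """
    if callee is Callee.VOICE_STORAGE_TELEPHONE:
        return Outcome.V34_DATA_CONNECTION
    if callee is Callee.ORDINARY_TELEPHONE and manual_switch:
        return Outcome.ORDINARY_CALL
    return Outcome.RELEASED


if __name__ == "__main__":
    for device in Callee:
        print(device.value, "->", setup_outcome(device).value)
```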

If two "voice storage telephones" reach each other, terminal capabilities are exchanged between sender and recipient according to ITU-T V.8/V.8bis in order to signal that "voice storage telephone" communication is possible. It is also determined whether half-duplex and/or full-duplex ITU-T V.34 connections are supported.

After ITU-T V.8/V.8bis signaling has been completed, the ITU-T V.34 modem procedures are started. After the "line test" (that is, determining line quality and measuring the maximum suitable modem speed) and the "training" in full- or half-duplex mode, the highest modem speed is selected according to the ITU-T V.34 rules. If half-duplex mode is selected, the V.34 parameters are then exchanged again according to ITU-T V.34, and the control commands necessary for the communication are exchanged at 1200 bit/s. In full-duplex mode, control commands must be exchanged as part of the data transfer. A "supervisory control channel" must be defined for this purpose. The data packets of this channel are marked (CON). At least the following control commands are provided:

Additional exchange of sender and recipient terminal capabilities (for example, identification of the audio/speech compression, mailbox capacity, half-/full-duplex capability, notification of receive preferences)

Setting of common, preferred transmission or reception parameters according to the announced sender or recipient terminal capabilities; opening or closing of data channels

Send message

Retrieve message

Address a voice mailbox

Normal termination of communication

Abort communication
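The command list above, together with the marking of control-channel packets, can be sketched as a small data model. This is only an illustration: the numeric command codes, the three-byte markers (including "AUD" for audio blocks) and the byte layout are assumptions; the text specifies only that control packets are marked (CON).

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class BlockKind(Enum):
    AUDIO = b"AUD"    # hypothetical marker for audio information blocks
    CONTROL = b"CON"  # marker of the supervisory control channel


class ControlCommand(IntEnum):
    # Numeric codes are invented for this sketch.
    EXCHANGE_CAPABILITIES = 1
    SET_PARAMETERS = 2
    SEND_MESSAGE = 3
    RETRIEVE_MESSAGE = 4
    ADDRESS_MAILBOX = 5
    END_NORMAL = 6
    ABORT = 7


@dataclass(frozen=True)
class Block:
    kind: BlockKind
    payload: bytes

    def encode(self) -> bytes:
        """Prefix the payload with its three-byte marker."""
        return self.kind.value + self.payload

    @staticmethod
    def decode(raw: bytes) -> "Block":
        """Split a received block back into marker and payload."""
        return Block(BlockKind(raw[:3]), raw[3:])


def control_block(cmd: ControlCommand, argument: bytes = b"") -> Block:
    """Build a command data block: one command byte plus optional argument."""
    return Block(BlockKind.CONTROL, bytes([cmd]) + argument)


if __name__ == "__main__":
    block = control_block(ControlCommand.ADDRESS_MAILBOX, b"4711")
    print(block.encode())
```

Because both block kinds share one framing, audio and control information can travel over the same data stream in full-duplex mode and be separated again at the receiver.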

The actual communication between sender and recipient takes place in the ITU-T V.34 data phase, following the ITU-T V.34 resynchronization. Transmission proceeds at the previously determined maximum speed. For "voice storage telephone" communication, error detection and, where necessary, error correction are required. This can be achieved using HDLC LAP (G.vmax) or packet-mode multiplexing. For this purpose, the digitized and compressed speech information is stored in blocks of 200 bytes each, and the blocks are numbered according to HDLC LAP. If a block is flagged as erroneous, it is retransmitted.
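The block handling described here (200-byte blocks, numbering, retransmission of erroneous blocks) can be illustrated with a simplified stop-and-wait sketch. It is not an HDLC LAP implementation: the frame layout, the use of `zlib.crc32` as a stand-in for the frame check sequence, and all function names are assumptions made for illustration.

```python
import zlib
from typing import Callable, Optional

BLOCK_SIZE = 200  # bytes per block, as stated in the text


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide the compressed speech into blocks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def frame(seq: int, block: bytes) -> bytes:
    """Number the block and append a CRC-32 checksum (stand-in for the FCS)."""
    body = seq.to_bytes(2, "big") + block
    return body + zlib.crc32(body).to_bytes(4, "big")


def check(raw: bytes) -> Optional[tuple[int, bytes]]:
    """Return (sequence number, block) or None if the checksum fails."""
    body, fcs = raw[:-4], raw[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != fcs:
        return None
    return int.from_bytes(body[:2], "big"), body[2:]


def send_all(data: bytes, channel: Callable[[bytes], bytes], max_tries: int = 5) -> bytes:
    """Send every block over `channel`, repeating blocks received in error."""
    received: dict[int, bytes] = {}
    for seq, block in enumerate(split_blocks(data)):
        for _ in range(max_tries):
            result = check(channel(frame(seq, block)))
            if result is not None:
                received[result[0]] = result[1]
                break
        else:
            raise RuntimeError(f"block {seq} could not be delivered")
    return b"".join(received[i] for i in sorted(received))
```

A usage example would pass a function that occasionally corrupts a frame as `channel`; the reassembled output still equals the input because every damaged block is sent again.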

In an analog telephone network, depending on the actual transmission characteristics and the error detection conditions, the transmission of a given voice message takes about one quarter of the connection time of an ordinary telephone call. During reception, the recipient's "voice storage telephone" indicates that a voice message is being received.

After the message has been received, and once all blocks have reached the recipient, the connection is released by a control command according to the end of the ITU-T V.34 data phase. An indicator is provided on the recipient's "voice storage telephone" to signal that a complete "voice message" has arrived. The recipient can then lift the handset at any time and, at the press of a button, have the message decompressed off line, without being connected to the telecommunications network, and converted back into an audible signal. Through an optional external interface on the "voice storage telephone", for example according to ITU-T V.24, the received message can where applicable be transferred to a connected personal computer for temporary storage and further processing. Likewise, where applicable, a "voice storage telephone" with the same or a similar interface can be connected to data terminal equipment in the communication network. In that case, a separate modem for the telephone terminal and the data terminal equipment is not necessary.

Further embodiments of the invention provide for preparation of the "voice storage telephone" message. For example, the recipient's mailbox address can additionally be entered on the telephone's numeric keypad. In half-duplex mode, this is communicated to the receiver during the 1200 bit/s "handshake" through the V.34 parameter exchange. In full-duplex mode, it is transmitted by means of a control command together with the ordinary V.34 data.

Once the message is ready, the connection to the "voice storage server" can be set up immediately. Alternatively, the connection can be set up later under a pre-programmed delay, for example to use the cheaper night tariff.

Here, each connection is set up as a connection according to ITU-T V.8/V.8bis. In ITU-T V.8/V.8bis, the "voice storage server" likewise presents itself as a "voice storage telephone", and the exchange of recipient and sender terminal capabilities determines whether "voice storage telephone" communication can take place between the sender's "voice storage telephone" and the receiving "voice storage server". If so, the maximum possible modem transmission rate is selected according to the ITU-T V.34 rules after the ITU-T V.34 "handshake" procedures ("line test", "phase adjustment", "training").

The actual connection between sender and recipient uses the ITU-T V.34 digital mode. To ensure the best possible transmission, HDLC LAP or a packet-mode multiplexing protocol is assumed. The voice message is delivered to the "voice mailbox" of the recipient's "voice storage telephone" using a suitable subaddress ("Subaddress").

After the message has been received correctly, the connection between the sending "voice storage telephone" and the "voice storage server" used for the transfer is released.

At a later time, the recipient's "voice storage telephone" retrieves the voice message received by the "voice storage server", using the relevant "selective polling" protocol element for "voice storage telephones" (according to ITU-T V.8/V.8bis). This requires a "voice storage telephone" with polling capability ("inquiry").

After the message has been retrieved from the "voice storage server", a continuously flashing red light, for example, can indicate that a received voice message is available. The receiving user can then have the message decompressed and converted back into an audible speech signal. In this case the recipient's communication terminal equipment is "off line". Through the optional external interface on the recipient's "voice storage telephone", connected to data terminal equipment such as a personal computer, the received message can be transferred for further storage or processing.

In the following, embodiments of the ITU-T V.8 (Appendix 1) and ITU-T V.34 parameter notation are presented, including the extensions. The V.8bis initialization procedure is similar (Appendix 2).

Appendix 1 and Appendix 2 are part of the description.

Fig. 4 shows the V.8 "start-up" procedure for call initialization. It is not strictly necessary in connection with a V.34 modem. The V.8 "start-up" procedure is run once at the beginning of a connection.

As Fig. 2 clearly shows, in a connection according to V.8 (followed by V.34), after the connection has been switched through there is a delay of about 1 second on the calling terminal (DCE) side, after which the CI signal (CI = "call indicator") is transmitted at 300 bit/s until the called terminal (DCE) replies with the ANSam signal (the V.8 answer signal). The calling DCE then sends the CM signal ("call menu" signal), which selects the so-called "call function" category (for example "voice storage telephone"), the modem modulation (for example V.34 full duplex) and optionally the protocol type (LAP "?"). The called party replies with a "JM" signal carrying information of the same kind, which declares the called terminal's capabilities; on the basis of both menus it is determined whether communication between the two sides is possible. If so, the V.34 procedures generally begin within 2 seconds.
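The comparison of the two menus can be pictured as a set intersection per category. The sketch below is a conceptual model only; it does not reproduce the V.8 bit coding, and the category names, capability strings and function names are illustrative assumptions.

```python
Menu = dict[str, set[str]]


def joint_menu(cm: Menu, callee_capabilities: Menu) -> Menu:
    """Per category, keep only the modes offered in CM that the callee supports."""
    return {
        category: offered & callee_capabilities.get(category, set())
        for category, offered in cm.items()
    }


def can_proceed(jm: Menu) -> bool:
    """Communication is possible only if every category has a common mode."""
    return all(jm.values())


if __name__ == "__main__":
    cm = {
        "call function": {"voice storage telephone"},
        "modulation": {"V.34 full duplex", "V.34 half duplex"},
        "protocol": {"LAP"},
    }
    callee = {
        "call function": {"voice storage telephone"},
        "modulation": {"V.34 half duplex"},
        "protocol": {"LAP"},
    }
    jm = joint_menu(cm, callee)
    print(jm, can_proceed(jm))
```

If any category ends up empty, for example because the called device is an ordinary telephone, no common mode exists and the V.34 procedures are not started.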

If the called DCE is not a "voice storage telephone" but, in particular, an ordinary telephone, the V.8 procedure is aborted, for example when a defined time interval (timer) expires. This time interval is not specified in V.8; it is specified as part of the "voice storage telephone" application.

The current version of ITU-T V.8 lacks a CM/JM code assignment for "voice storage telephone". Such an assignment would have to be requested from the ITU. An example of how it could be adopted is given below.

Table 3/V.8 (see Fig. 5) describes the V.8 "call function" category in detail. In the first octet, the ITU has reserved some code points. Two code points are reserved for "voice storage telephone". The code points concerned are used here by way of example.

The modulation modes are described in detail in Table 4/V.8 (see Fig. 6).

If only one protocol is specified for "voice storage telephone" servers, the optional "protocol" octet can be omitted. If several protocols are used, for example for different network types (PSTN, mobile), this octet is used and the corresponding code points are defined for the protocol types.

Table 6/V.8 (not shown) defines the GSTN category. It indicates whether the calling terminal equipment is connected through a mobile telephone network.

Because V.8 is very slow (300 bit/s) for exchanging large amounts of information, no exchange of additional information through ITU-T V.8 is provided. All additional protocol or control commands are therefore handled by the ITU-T V.34 modem procedures.

After the V.8 "start-up" has run successfully, the V.34 "start-up" follows. This comprises:

- V.34 "line test" (line quality assessment)

- V.34 HDX (half-duplex) or FDX (full-duplex) "training"

- V.34 parameter exchange

- V.34 resynchronization

The actual user data are then exchanged in the V.34 digital transmission mode provided for this purpose.

If the line quality becomes inadequate during data communication, a so-called V.34 "retrain" (new training) is started. The data transfer is interrupted, and each "retrain" sets a new transmission speed with adequate quality. The data transfer then continues. In the current V.34 version, synchronous transmission of application data (for example audio information) is not possible during a "retrain".

After the ITU-T V.8 signaling has been completed successfully, the ITU-T V.34 modem procedures are started. After the "line test" (that is, determining line quality and measuring the maximum suitable modem speed) and the "training" in full- or half-duplex mode, the highest modem speed according to ITU-T V.34 is selected. Then (in half-duplex mode only) the V.34 parameters are exchanged again according to ITU-T V.34, and the control commands necessary for the communication are exchanged at 1200 bit/s. In full-duplex mode, control commands must be exchanged as part of the data transfer. A virtual "supervisory control" channel must be defined for this purpose. Its data packets are specially marked. At least the following control commands are provided:

Exchange of additional sender and recipient equipment capabilities, for example identification of the audio/speech compression, mailbox capacity, half-/full-duplex capability, notification of receive preferences;

Setting of common, preferred transmission or reception parameters according to the declared sender and recipient equipment capabilities; opening and closing of virtual application channels (for example for audio);

Send message;

Retrieve message;

Address a voice mailbox;

Normal termination of communication;

Abort communication.

Claims

1. A method for transmitting audio signals from one communication terminal equipment (KES) to another communication terminal equipment (KEE) over a communication network, within the scope of ITU-T V.8 or ITU-T V.8bis initialization and an ITU-T V.34 modem, using a secured transmission protocol, the method having the following steps:

a) the digital input audio signal is compressed in the sender's communication terminal equipment (KES);

b) the compressed audio signal is divided into data blocks;

c) each data block is marked as an audio information data block or a command data block;

d) the audio information data blocks are transmitted from the sender's communication terminal equipment (KES) to the receiving communication terminal equipment (KEE) and stored there, using a protocol that provides error detection and the required secured transmission;

e) the stored audio information data blocks are decompressed and output as a digital audio signal or processed further;

f) to control the communication process, command data blocks are exchanged in secured form between sender and receiver using a protocol that provides error detection and the required secured transmission.

2. The method according to claim 1, characterized in that the data compression method is chosen depending on the transmission speed of the data blocks, so that the transmission delay of the compressed audio signal between the communication terminal equipment (KES, KEE) is shorter than the duration of the digital input audio signal.

3. The method as claimed in one of the preceding claims, characterized in that the audio signal is a speech signal.

4. The method as claimed in one of the preceding claims, characterized in that the identifiers marking a block as an audio information data block or as a command data block are placed at the front, in the control field.

5. The method as claimed in claim 4, characterized in that an identifier for the audio information data block is specified and in addition an identifier for the speech information data block and an identifier for the command data block are specified.

6. The method according to claim 4 or 5, characterized in that, in full-duplex mode, the predefined control data blocks are transmitted together with the audio information data blocks from the sending communication terminal equipment (KES) over the network data channel to the receiving communication terminal equipment (KEE).

7. The method according to any one of the preceding claims, characterized in that, within the data transmission range of the ITU-T V.34 modem, an initialization procedure according to ITU-T V.8/V.8bis and a transmission protocol providing error detection and the required secured transmission are used.

8. A device for carrying out the method according to one of the preceding claims, comprising a communication network (KN), a sending communication terminal equipment (KES) and a receiving communication terminal equipment (KEE), each of which has at least one ordinary telephone terminal, characterized in that the communication terminal equipment (KES, KEE) additionally has communication control equipment that controls data transmission by means of an ITU-T V.8/V.8bis initialization procedure, a transmission protocol providing error detection and the required secured transmission, and an ITU-T V.34 modem; the sending communication terminal equipment (KES) additionally has at least one audio coding device for compressing the digital audio input signal; and the receiving communication terminal equipment (KEE) has at least one audio decoding device for decompressing received audio information data.