WO1994018667A1 - Voice recording electronic scheduler - Google Patents

Publication number
WO1994018667A1
Authority
WO
WIPO (PCT)
Prior art keywords
speech
information
data
time
message
Application number
PCT/US1994/001597
Other languages
English (en)
Inventor
Ari B. Naim
Thomas J. O'Brien
Original Assignee
Naim Ari B
Brien Thomas J O
Application filed by Naim Ari B, Brien Thomas J O

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/02 Digital computers in general; Data processing equipment in general manually operated with input through keyboard and computation using a built-in program, e.g. pocket calculators
    • G06F15/025 Digital computers in general; Data processing equipment in general manually operated with input through keyboard and computation using a built-in program, e.g. pocket calculators adapted to a specific application
    • G06F15/0266 Digital computers in general; Data processing equipment in general manually operated with input through keyboard and computation using a built-in program, e.g. pocket calculators adapted to a specific application for time management, e.g. calendars, diaries
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/109 Time management, e.g. calendars, reminders, meetings or time accounting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Definitions

  • the present invention relates generally to digital voice recording devices coupled with a programmable daily scheduler that possess alarm reminder options.
  • operation of the device is accomplished either by means of external switches on a keypad or through spoken voice commands.
  • the device also has the feature of being able to present information either on a visual display or in audio (voice synthesis).
  • the prior art comprises various types of scheduling systems: hand-held (U.S. Pat. No. 4,117,542), desk-top (U.S. Pat. No. 4,548,510), and operating through communication lines (U.S. Pat. No. 4,783,800).
  • the present invention relates most closely to the hand-held device.
  • the device disclosed in U.S. Pat. No. 4,117,542 is intended for storing and retrieving telephone numbers, street addresses, appointments, and agenda. It has the option of offering normal calculator functions as well.
  • the main difficulty with this device is that it is limited to the laborious task of manually typing in messages and other information on the keypad. Entering information by means of typing on the keypad has many disadvantages.
  • voice as a means for entering information.
  • Voice or audio
  • audio recording permits any language or sound to be recorded.
  • Prior art in voice recording and reproducing for use in wrist watches appears in U.S. Pat. No. 4,391,530, where voice is coupled to alarm time entries to render context to the alarm times.
  • An additional feature of a confirming external switch is disclosed in U.S. Pat. No. 4,405,241.
  • the design size constraint of both of these designs compromises the extent of memory storage and the number and types of functions for data manipulation.
  • Although voice as an input means reduces the extent of manual input, some manual input of information and functional operation is still necessary. By using voice to control a device, the simplicity of use is advanced even further and, in some instances, manual demands are completely eliminated. Voice control also eases the process of learning to operate the device.
  • a primary goal of the present invention is to provide a device in which voice entry is used not only to enter messages and alarm times but also to control functions of the device, such as switching between modes of operation (message entry and alarm time entry).
  • the proposed method of accomplishing this goal removes the need for visual contact as well as reducing memory and voice recognition requirements.
  • This invention is intended to offer a means of keeping a daily schedule/agenda in a simple and easy-to-use fashion. Messages and appointments are stored either by recording an audio signal (e.g. voice) or by typing manually on an alphanumeric keypad. The device is fully operational either by voice commands or through a keypad.
  • An electronic scheduler in accordance with the present invention comprises: (a) a real time clock, comprising means for keeping current time and date; an alarm time register; an alarm date register; means for identifying a match between the current time and a set alarm time stored in the alarm time register; means for identifying a match between the current date and a set alarm date stored in the alarm date register; and means for outputting an alarm time reached signal or prerecorded message or sound when the current alarm time matches the set alarm time and the current alarm date matches the set alarm date; (b) a random access memory (RAM) for storing units of compressed digital audio data defining a message; (c) an audio storage/retrieval processor comprising: a microphone for receiving audio signals; amplifier means for amplifying the audio signals; a first low pass filter; an A/D converter for converting the amplified audio signals into digital audio data; data compression means for compressing digital audio data from the A/D converter into compressed digital audio data to be stored in the RAM; means for retrieving digital audio data from the RAM; means
  • the real time clock comprises an oscillator, a time counting unit incrementing continuously based on a reference signal provided by the oscillator, a date counting unit, and means for making a periodic comparison between the alarm time stored in the alarm time register and the current time provided by the time counting unit.
  • an electronic scheduler in accordance with the present invention may comprise means for initiating a programmed sequence that sounds an alarm or plays back a recorded message in response to the alarm time reached signal.
  • Preferred embodiments may also comprise means for marking selected memory addresses with an alarm time such that corresponding data is to be played back or logged into a scheduling network along with other messages to be played back.
  • Preferred embodiments may advantageously include means for storing context information packets associated with selected messages, the context information indicating the time the associated message was entered; alarm time(s) associated with the associated message; the number of times the message has been played; and a date on which the message may be automatically erased.
  • An electronic scheduler in accordance with the present invention may also comprise a read only memory
  • ROM read only memory
  • the addressing means may include a microprocessor or a digital signal processor controlling information flow between all components.
  • preferred embodiments may include a read only memory (ROM) storing firmware, application software, screen message data and prerecorded voice message data.
  • Preferred embodiments may also include means for grouping entered information into data groups where a group can include name, telephone number, address, and message; and search logic means for retrieving from memory all the stored information in a group when only a portion of the information is provided.
  • an electronic scheduler in accordance with the present invention may include: (i) speech patterning means for extracting identifying parameters from the digital audio data; (j) speech recognition means for comparing the extracted identifying parameters to each of a group of reference identifying parameters associated with a first reference vocabulary, and producing a match indication as a function of the comparing; (k) command logic means to effect the performance of predetermined functions of the electronic scheduler upon receiving the match indication; and (l) interactive speech control means for controlling the interaction of the command logic means with the voice synthesis means such that the voice synthesis means synthesizes prompts indicating when speech commands are to be input and which options are available at a given instant.
  • the first reference vocabulary may be either factory installed and speaker-independent or be created through a training process with spoken utterances or sounds by extracting from the utterances or sounds identifying parameters and storing the identifying parameters as the group of identifying parameters for the first reference vocabulary.
  • the speech recognition means may include means for producing a nonmatch indication when the match indication does not result from the input of a given spoken utterance, the nonmatch indication indicating that the given spoken utterance was not recognized.
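The match and nonmatch indications described above can be illustrated with a small template-matching sketch. This is not the recognition method of the specification: the feature vectors, the Euclidean distance measure, the `max_distance` threshold, and the names `recognize` and `vocabulary` are all assumptions introduced here for illustration.

```python
import math


def recognize(features, vocabulary, max_distance):
    """Compare extracted parameters with every reference in the vocabulary.

    Returns the closest reference word (the match indication), or None
    (the nonmatch indication) when nothing is within max_distance.
    The distance measure and threshold are illustrative assumptions.
    """
    best_word, best_dist = None, math.inf
    for word, reference in vocabulary.items():
        dist = math.dist(features, reference)  # Euclidean distance
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= max_distance else None


# Hypothetical two-word reference vocabulary with 2-D parameter vectors.
vocabulary = {"yes": (1.0, 0.0), "no": (0.0, 1.0)}
```

Rejecting utterances that are not close to any reference is one inexpensive way to reduce misrecognition, since the device can simply prompt the user to repeat the command.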
  • the predetermined functions include: turning on and off the electronic scheduler, retrieving specified stored information, setting the alarm time associated with a particular recorded message, and setting a secret code for limited data access.
  • Preferred embodiments may also include: (m) a second reference vocabulary containing time and date information for use by the speech recognition means to extract time and date information from the extracted identifying parameters; (n) alarm time logic means to allow entry of an alarm time including time of day and date upon the speech recognition means producing a match indication; and (o) interactive speech recording means for controlling the interaction of the command logic means with the voice synthesis means, whereby the voice synthesis means synthesizes speech or sound prompts for indicating the required delivery time of an audio message input and which options are available at a given instant.
  • an electronic scheduler in accordance with the present invention may comprise means for audibly confirming the content of the information entered by the speech recognition means.
  • audio input is converted from analog to digital and stored in random access memory (RAM) for later retrieval.
  • RAM random access memory
  • digital memory storage offers the control integrity and access that is necessary for a scheduling/agenda system.
  • Other digital mass storage devices, such as optical or magnetic disk drives, can be used either as a replacement for or in addition to the RAM. For example, to be able to record "Call Joe at 555 1212" and have this message alert one to this task shortly before it must be executed requires that one be able to program an alarm and have this particular message ready for play at that instant. Any audio input can thus be automatically incorporated into the scheduling/agenda system along with typed-in information. The digitized audio information simply receives a different storage location in the memory.
  • An advantage that the audio input offers is in lowering the level of the user's required sophistication and familiarity with technology.
  • Yet another advantage of the audio input is that information other than what can be expressed as alphanumeric characters can be recorded, such as music or an individual's voice.
  • the output of the device encompasses both visual display and audio output.
  • the display shows previously entered messages, messages in the process of being typed in, commands, functions and more. When an audio message is searched and found the display will indicate something to the effect of "Audio Information, press <PLAY> to listen."
  • the audio output is achieved by accessing the particular block of data stored in the memory, passing it through a digital to analog converter, filtering it, amplifying it and outputting it through the speaker.
  • the audio playback can be halted at any instant, played back repeatedly or saved for future reference.
  • Audio commands and readouts are accomplished by extracting prestored digitized commands and digits from the ROM concurrent with the display of these commands on the display.
  • the scheduling aspect of the device offers a method of logging into and retrieving from memory telephone numbers, addresses, appointments, meetings, and other information and daily activities. These entries can be classified into user-defined or factory-predefined categories (e.g., personal entries, business entries). Schedule inquiries can be made by date, by time, or by any key-word present in the stored information.
  • a programmable alarm is coupled with the scheduling aspects of the device.
  • the alarm time, i.e., the time the alarm turns on, can be appended to any of the entries made, be it an appointment, a meeting, a phone call that must be made at a specific hour or any other alarm-related need.
  • Alarm times can be appended to audio inputs as well, extending the utility of the audio input in an important way. For example, an individual can quickly leave himself a note to remind himself of a task to be performed by simply speaking into the device and then keying in the hour for the alarm to turn on. An option is also available whereby the actual message entered will be played back instead of an alarm.
  • the search capability offers access to stored entries by providing only a portion of a particular entry to be found.
  • the method used to search is what is known in the field of artificial intelligence as a "top-down" search. This involves first searching all the name fields, then all the telephone number fields, then all the address fields, then all the message fields, and finally all the search index fields. The first item found that satisfies the search is displayed. The system continues to search through additional fields to locate another match. If the appropriate key is pressed, the system displays the next match found, until the message "search complete" is displayed to announce that the entire memory has been searched and all matching fields have been found. If additional matches are requested while the system is still in the process of searching, the message "searching" flashes on the screen to indicate this status.
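The field-by-field search order described above can be sketched as a generator, so that each request for the "next match" resumes where the previous one stopped and exhaustion corresponds to "search complete". The record layout (dictionaries keyed by field name), the case-insensitive substring test, and the names `FIELD_ORDER` and `search` are assumptions made for this illustration.

```python
# Field order taken from the description; the key names are hypothetical.
FIELD_ORDER = ("name", "telephone", "address", "message", "search_index")


def search(records, fragment):
    """Yield (field, record) pairs, scanning one field type at a time.

    All name fields are checked first, then all telephone fields, and so
    on. A record matches when the fragment occurs anywhere in the field.
    """
    needle = fragment.lower()
    for field in FIELD_ORDER:
        for record in records:
            if needle in record.get(field, "").lower():
                yield field, record
```

Calling `next()` on the generator models pressing the key for the next match; a `StopIteration` would be the point at which the device displays "search complete".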
  • Voice operation of the device is available through user spoken commands.
  • Before a command is entered, the device provides an audio prompt to indicate when the command should be spoken and what command options are available at that instant. For example, after a message is entered through the audio means, a reminder alarm can be set for a specific time at which the message will be played back. After the message is entered, the system prompts the user by announcing: "alarm ?". If the user says "No", the entry is complete and no alarm time is appended. If the user says "Yes", the system prompts: "hour ?"; the user then says one word indicating the hour. The system then prompts: "minute ?"; the user then says two additional digits specifying the minute.
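The prompt sequence in this example can be sketched as a short function. It assumes the recognizer has already turned each utterance into a lowercase word or a digit string; that representation, and the name `alarm_dialogue`, are hypothetical and not taken from the specification.

```python
def alarm_dialogue(answers):
    """Walk through the alarm prompts using already-recognized answers.

    `answers` is an iterable of recognized words, e.g. ["yes", "9", "3", "0"].
    Returns (prompts_spoken, alarm), where alarm is (hour, minute) or None.
    """
    answers = iter(answers)
    prompts = ["alarm ?"]
    if next(answers) != "yes":
        return prompts, None              # "No": entry complete, no alarm

    prompts.append("hour ?")
    hour = int(next(answers))             # one word indicating the hour

    prompts.append("minute ?")
    tens, ones = int(next(answers)), int(next(answers))  # two digits
    return prompts, (hour, tens * 10 + ones)
```

Because only a handful of words are valid at each prompt, the recognizer needs to distinguish among very few candidates at any instant, which is consistent with the stated aim of keeping recognition requirements low.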
  • the main objective of this device is to use voice as a means for entering information, such as appointments, which will offer a user a more efficient and less demanding mechanism for maintaining a schedule.
  • the present innovation introduces a method of exploiting voice recognition for controlling the device's functionality. Because the dominant constraint of the device is its size, the use of many sophisticated voice recognition systems that demand high computational power is prohibitive.
  • the proposed method offers a unique and practical solution by which more simple, portable and less computationally demanding voice recognition algorithms can be taken advantage of.
  • this specification also describes, in detail, one of many possible hardware implementations of the proposed objectives.
  • the design pays special attention to power consumption, memory backup features, memory management and voice recognition error minimization.
  • Power consumption is an important consideration for extended portable usage, since voice synthesis, recording and playback components consume a relatively large amount of power.
  • Storage of audio data in digital form requires relatively large amounts of memory and so memory management is vital through data compression and automatic erasing features.
  • Through audio confirmation and automatic rejection of poorly received voice inputs, a means for reducing recognition errors, at little computational burden, is effected.
  • FIG. 1 is a block diagram of one embodiment of a voice controlled appointment keeper according to the present invention.
  • FIG. 2 depicts the audio storage/retrieval processor 11 portion of the block diagram shown in FIG. 1.
  • FIG. 3 depicts the circuitry for the keypad interface component 31 shown in FIG. 1.
  • FIG. 4 depicts the circuitry for the real-time clock component 5 shown in FIG. 1.
  • FIG. 5 depicts the circuitry for the power control component 3 shown in FIG. 1.
  • FIG. 6 is a flowchart of the voice recognition and training sequence used to recognize spoken commands for function execution and spoken information for data entry; this algorithm is implemented in software and is located in the voice recognition processor 43 in FIG. 1.
  • FIGS. 7A, 7B and 7C are flowcharts for the interactive voice control and voice information entry dialogue between the device and the user.
  • FIG. 8 is an example of interactive dialogue between the device and the user for entering an alarm time, corresponding to block 136 in FIG. 7A.
  • FIG. 9 is one possible outer appearance design of the device for handheld size.
  • FIG. 1 shows the key building blocks of a device suited for carrying out the present invention.
  • Component interconnectivity is specified by lines; arrows designate the direction of information flow, and lines without arrows indicate bi-directional flow.
  • the information entered into the device is conveniently divided into two types of information.
  • the first is "control information", intended to control the device in performing such functions as playback, search the data base, set alarm time and the like.
  • the second type is "message information", which includes telephone numbers, names, notes and the like, and is usually intended for storage and retrieval purposes.
  • Voice recognition is performed mostly on the control type information, while the audio inputted message type information is limited to a stored digitized audio message. Voice recognition can, however, be used to input some types of message information, such as phone numbers, that are stored in text format, while the name and other information associated with that message information can be entered and retrieved as message information. All typed text information (names, numbers, addresses, memos, and functional commands) entered through the keypad provides both control and message type information.
  • Audio range frequency signals enter the Microphone 29 where they are transformed into electrical signals and transferred to the Audio Storage/Retrieval Processor 11.
  • the electrical signals from the microphone are amplified by the input amplifier 50 (FIG. 2) to raise the signal level, and then passed through a low pass filter 51 to remove aliasing frequencies.
  • the amplified anti-aliased audio signal is then digitized by an A/D converter 52, which converts the analog signal into a binary representation capable of being stored, retrieved, and manipulated by digital hardware. Compression of the binary representation is then performed by the data compressor 53 in order to increase the amount of recording time available for a given digital memory size. It is possible to achieve the same results using software to perform the data compression.
  • the functional block is represented as hardware to show functional necessity.
  • the data rate (the amount of information, in bits, required per unit time to effectively represent the signal that is to be reproduced) is a limiting factor in the amount of recording time available for a given digital memory size.
  • ADPCM Adaptive Differential Pulse Code Modulation
  • Other compression techniques can be used in addition to, or as a replacement of, ADPCM.
  • Some algorithms are capable of compressing random binary data at ratios of 1.5:1 to 10:1 (depending on the redundancy factor of the data). Through the combined use of more than one compression algorithm, the data rate can be reduced to yield longer recording times without expanding memory resources or significantly affecting audio sound quality.
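The relationship between data rate, memory size and recording time can be made concrete with a short calculation. The figures used in the note below (1 MiB of memory, 8 kHz sampling, 8-bit samples versus 4-bit codes) are illustrative assumptions and do not appear in the specification; `recording_seconds` is a hypothetical helper name.

```python
def recording_seconds(memory_bytes, sample_rate_hz, bits_per_sample):
    """Recording time available for a given memory size and data rate."""
    data_rate_bps = sample_rate_hz * bits_per_sample  # bits per second
    return memory_bytes * 8 / data_rate_bps
```

Under these assumed figures, `recording_seconds(1_048_576, 8000, 8)` gives about 131 seconds, and halving the bits per sample to 4 doubles that to about 262 seconds, which is why lowering the data rate extends recording time without adding memory.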
  • the compressed data is stored in the RAM 9 (FIG. 1).
  • Each converted input binary sequence defining a message is assigned an address so that it can be retrieved at any time, marked with an alarm time to be played back when the alarm time sets off, or logged into the scheduling network along with other typed in messages.
  • a data "header" or "footer" is stored with each message to indicate the message length or, alternatively, an "end of message" or "beginning of message" sequence is stored with each message to define its memory location.
  • An alternative and more restrictive addressing method involves reserving a portion of memory where a table listing the addresses of the messages is located. In addition to this storage addressing information, a context information packet is also stored with the actual message.
  • the context information contains the time the message was entered; the alarm(s), if any, that are associated with the message; the number of times the message was played; a date by which the message may be automatically erased; and any other information (including a text reference message) that may be used to control or track the message.
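The length-header framing and the context information packet described above can be sketched as follows. The 4-byte big-endian header, the string timestamps, and the names `ContextPacket`, `pack_message` and `unpack_messages` are assumptions chosen for illustration, not the storage format of the specification.

```python
import struct
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextPacket:
    """Context information kept alongside a stored message."""
    entered_at: str                                  # time the message was entered
    alarms: List[str] = field(default_factory=list)  # associated alarm time(s)
    play_count: int = 0                              # times the message was played
    erase_after: Optional[str] = None                # automatic erase date
    text_reference: str = ""                         # optional text note


def pack_message(audio: bytes) -> bytes:
    """Prefix compressed audio with a header giving the message length."""
    return struct.pack(">I", len(audio)) + audio


def unpack_messages(memory: bytes) -> List[bytes]:
    """Recover every message from memory by following the length headers."""
    messages, offset = [], 0
    while offset < len(memory):
        (length,) = struct.unpack_from(">I", memory, offset)
        offset += 4
        messages.append(memory[offset:offset + length])
        offset += length
    return messages
```

With a length header, messages can be packed back to back and located by walking the headers, whereas the table-based alternative reserves a fixed region of memory for addresses.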
  • Inputting information via the keypad is possible.
  • There are alphabet keys, numeric keys, and control keys (see FIG. 9). Pressing the alphanumeric keys serves the purpose of entering alphabetic letters and numbers as indicated by the labels closest to each alphanumeric key.
  • Function keys are also available to execute such operations as: search by various categories (e.g.
  • the CPU periodically selects a row to be read through a latch 70.
  • This latch is also used to control other system functions (such as volume control in this application) if there are leftover outputs.
  • the latch data is decoded by a decode circuit 71, a data selector, and the selected row is read by selecting another latch 73.
  • the data read by the CPU is a bit pattern of the keypad, such that any key can be checked individually for a depression.
  • the ON key is decoded separately so as to provide a switch that works without the CPU.
  • a latch 74 is used to read this key during power-on as well as other signals.
  • This circuit can be implemented in many different ways, and is shown here to provide functional completeness. The essential functionality afforded by this circuit is the sensing of keypresses.
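The row-by-row keypad scan described above can be sketched in software. Here `read_row` is a hypothetical callable standing in for the hardware steps of selecting a row through the latch and reading back its bit pattern; the matrix dimensions and bit ordering are likewise assumptions.

```python
def scan_keypad(read_row, num_rows, num_cols):
    """Return the set of (row, col) key positions currently pressed.

    read_row(row) models selecting one row and reading its column bits;
    bit `col` of the returned integer is 1 when that key is depressed.
    """
    pressed = set()
    for row in range(num_rows):
        bits = read_row(row)
        for col in range(num_cols):
            if bits & (1 << col):   # check each key individually
                pressed.add((row, col))
    return pressed
```

Running such a scan from a periodic interrupt would also generate the regular keypad interface accesses that the power control circuit relies on to keep the unit on.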
  • two output means are possible: display means on the face of the device and/or synthesized voice means.
  • each alphanumeric character that is displayed also triggers the transmission of a particular prestored digitized acoustic sound.
  • the sequence of these sounds, each sounding out one character, produces the sound of the complete word/number.
  • These digitized acoustic sounds are prestored in the ROM 7 (FIG. 1).
  • There are other ways of producing the synthesized voice output, and many commercial packages are readily available (e.g. AT&T DSP16 series). See further discussion below on voice synthesis.
  • FIG. 2 depicts the audio storage/retrieval processor 11 of FIG. 1.
  • outputting audio information is accomplished by first addressing in the RAM the particular memory block that is associated with the message to be heard. Then the data is decompressed in the data expansion unit 54 into binary data for the D/A converter 55.
  • the D/A Converter converts the digital signal to a sampled audio signal, which is passed through a Low Pass Filter 56 to remove unwanted harmonics and then through an Amplifier 57, which drives a Speaker 25.
  • the message can be deleted or saved in memory, replayed, or tagged with a new alarm for future reference, among other options.
  • the LCD (liquid crystal display) 15 in this example device is a text or graphic display for providing information to the user. Information such as status, options, or recorded information (previously typed into the unit) can be shown on the display.
  • the display is controlled by the CPU as defined by the firmware stored in the ROM.
  • the CPU 37 (FIG. 1) controls the information flow and operational functions of the unit. At each CPU instruction cycle, an instruction is received from the ROM. The CPU then transfers data between itself and an external device or between two external devices.
  • the CPU operation can be interrupted periodically to handle maintenance functions such as reading the keypad for key presses, checking to see if any of the alarm settings has reached its term (i.e., comparing the stored alarm times with the current time), and checking to see if the "ON" button is depressed or if the main battery level is low.
  • the Oscillator 13 (FIG. 1) provides a time base for the internal functions of the CPU. As shown in FIG. 1, this same time base may be used as a reference for the audio storage/retrieval processor 11.
  • the address decode circuit 35 shown in FIG. 1 controls the CPU's access to the peripherals or devices of the system. This unit effectively divides the address space of the CPU into portions big enough for the individual peripherals or devices.
  • the read only memory (ROM) 7 shown in FIG. 1 contains the firmware (hardware drivers), application software, screen message data, pre-recorded voice messages (user alerts, etc.), and other data required for operation of the device. Its function is to provide preprogrammed instructions for the CPU. At each instruction cycle of the CPU, the ROM receives a binary address from the CPU, and presents the data corresponding to that address.
  • the random access memory (RAM) 9 shown in FIG. 1 stores text and voice data along with some statistical information about the data.
  • the information contained within the RAM can be accessed by the CPU at any time, and its organization is entirely dependent upon the software driving the system.
  • Additional digital mass storage devices can be used, such as optical or magnetic disk drives.
  • the real time clock 5 shown in FIG. 1 stores and counts the time of day, day of the week, day of the month, month, and year with the use of an oscillator.
  • the alarm function of the real time clock is used to wake up the unit during stand-by mode when one of the alarm times that were set matches the current time of the clock.
  • the periodic interrupt function of this component is used to provide the CPU with interrupts at regular intervals in order to scan the Keypad or check the alarm times, etc.
  • the input to the real time clock is primarily for setting the clock time and for setting the alarm time. Refer to FIG. 4 for the circuitry of the real time clock 5 in FIG. 1.
  • the time counting unit 91 and date counting unit 92 increment continuously based on the reference frequency provided by the oscillator 90.
  • the current time and date for these units can be set by the CPU at the user's discretion.
  • the time and date set and updated in these units are the source from which the CPU can obtain current time and date information for display to the user.
  • the alarm times and dates set by the user are addressed and stored in the memory unit.
  • the CPU extracts the most current pending alarm time and date and sets the alarm time register 95 and the alarm date register 96 to that time and date.
  • a comparison is made on a periodic basis between the time register 95 and the time counter 91 by the comparator 93 and comparison is also made in the same periodic manner between the date register 96 and the date counter 92 by the comparator 94.
  • When both comparisons match, the alarm time signal 98 indicates this occurrence to the power control circuit 3 (FIG. 1 or FIG. 5).
  • the clock address decoder 99 controls the operation of the registers and counters of this circuit by transmitting select signals 97 to the unit that is to be activated.
  • the instruction for activating the whole of the real time clock component is received from the address decode circuit 35 (FIG. 1).
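The alarm register and comparator behavior of the real time clock can be modeled in a few lines. This sketch assumes that "the most current pending alarm" means the earliest alarm that has not yet passed, and that the alarm signal requires both the time and the date comparison to match at minute resolution; the class and method names are hypothetical.

```python
from datetime import datetime


class RealTimeClock:
    """Software model of the counters, alarm registers and comparators."""

    def __init__(self, now):
        self.now = now        # time and date counting units
        self.alarm = None     # alarm time and alarm date registers

    def load_next_alarm(self, pending):
        """Load the earliest alarm that has not yet passed (assumption)."""
        future = [a for a in pending if a >= self.now]
        self.alarm = min(future) if future else None

    def tick(self, new_now):
        """Advance the counters and return the alarm-reached signal."""
        self.now = new_now
        if self.alarm is None:
            return False
        date_match = self.now.date() == self.alarm.date()
        time_match = (self.now.hour, self.now.minute) == (
            self.alarm.hour, self.alarm.minute)
        return date_match and time_match
```

Keeping only one alarm in the registers means the hardware comparison stays simple, while the full list of alarms remains in memory for the CPU to reload after each one fires.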
  • the memory back-up circuit 17 assures data retention in stand-by mode (power off), and protects the RAM from false writes during power on/off.
  • a lithium battery 39 is used to provide the power necessary for retaining RAM data and running the real time clock during stand-by mode.
  • the power control circuit 3 boosts the voltage of the batteries 41 (either replaceable or rechargeable) to 5V in order to power the system during active operation (power on), and automatically removes the 5V in order to enter stand-by mode when the CPU does not access the keypad interface 31 in a predefined time frame.
  • the system is turned off in the event of a CPU failure, and can also be turned off (at the user's request, or automatically when user input has ceased for an extended period) by halting regular keypad interface accesses.
  • Refer to FIG. 5 for a more detailed circuit description of the power control circuit block of FIG. 1.
  • the unit can be turned on by two sources: the user via a simple momentary switch, and the reaching of an alarm time.
  • Flip-flop 101 is SET by the alarm signal 98 (FIG. 4) or the ON signal 76 (FIG. 3).
  • the output of the flip-flop controls the DC/DC converter 102, and turns it on or off, which applies or disconnects the power to the circuit in order to conserve battery usage while the unit is not in use.
  • the flip-flop is reset by the time-out of the retriggerable One Shot 103, which occurs in the absence of keypad scans.
  • the keypad scans can be stopped purposely, or by a CPU failure.
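The interplay of the flip-flop and the retriggerable one-shot can be modeled as a small watchdog. Time is represented by plain numbers and the timeout value is arbitrary; the class `PowerControl` and its method names are assumptions introduced for this sketch.

```python
class PowerControl:
    """Model of the flip-flop plus retriggerable one-shot power logic."""

    def __init__(self, timeout):
        self.timeout = timeout   # one-shot period (arbitrary time units)
        self.powered = False     # flip-flop output driving the converter
        self.last_scan = None

    def wake(self, now):
        """ON key or alarm signal sets the flip-flop."""
        self.powered = True
        self.last_scan = now

    def keypad_scan(self, now):
        """Each keypad scan retriggers the one-shot while powered."""
        if self.powered:
            self.last_scan = now

    def update(self, now):
        """Clear the flip-flop if the one-shot has timed out."""
        if self.powered and now - self.last_scan >= self.timeout:
            self.powered = False
        return self.powered
```

The same mechanism covers both an intentional shutdown and a CPU failure, because in either case the scans simply stop arriving.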
  • The low battery detect circuit 105 provides the system with a signal (low battery) that indicates a battery voltage lower than a predefined value. This indication can be used to alert the user or shut down the system during an out-of-tolerance condition.
  • The reset circuit 104 provides the system with a momentary set-up time after power-on to allow for oscillator stabilization and low level hardware initialization, and is also necessary for the device start-up (power on).
  • The reset function is built into some CPUs, and is therefore shown only for functional completeness. This power method is replaceable with a simple switch, or can be implemented without a DC/DC converter.
  • The external interface 16 shown in FIG. 1 is for the purpose of transferring the memory contents to a personal computer or to provide additional memory to the device.
  • This interface is a means by which the information stored within the device can be archived by the user through downloading the data to a personal computer or storing in external removable non-volatile memory. Alternatively, this interface can be used to upload information into the device's memory (e.g., voice mail). The expansion of storage capability is also facilitated by this interface for extending the available audio recording time and text storage capacity of the device.
  • Voice recognition of spoken command and spoken data information is performed in the following manner. As FIG. 6 shows, the A/D converter 52 (FIG.
  • Each reference word in the vocabulary has identifying parameters that distinguish it from the other words in the same vocabulary.
  • The feature extractor is followed by the edge detector 111, which locates the points in the recorded time frame where the particular utterance actually began and ended (there may be some instant of delay before speaking the word) and uses those points as the points of reference for the beginning and ending of the relevant identifying parameters. Ascertaining the beginning and ending boundaries of the utterance is an important element in forming identifying parameter sets that are comparable to those in the reference vocabulary.
  • A parameter set formed from a time shifted version of the same uttered word, or an expanded or compressed version of the same uttered word, can lead to very different identifying parameters, and consequently, an error in the classification of the utterance.
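As an illustration of locating utterance boundaries, the sketch below marks the first and last frames whose short-time energy exceeds a threshold. The source does not say how edge detector 111 works, so the energy criterion, the function name `find_utterance_edges`, and the default frame length and threshold are all assumptions.

```python
def find_utterance_edges(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample indices of the utterance, or None if silent.

    `end` is exclusive. A frame counts as speech when its mean squared
    amplitude exceeds `threshold` (an assumed criterion, not from the source).
    """
    active = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            active.append(i)
    if not active:
        return None
    return active[0], min(active[-1] + frame_len, len(samples))
```

Trimming every recording to these boundaries before extracting parameters removes the leading delay, so that two utterances of the same word line up with each other.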
  • The identifying parameters extracted from the incoming spoken utterance are compared 112 to each group of identifying parameters belonging to each of the words in the reference vocabulary 117, and a measure of distance is made for each one (e.g., the Hamming distance for binary code words).
  • The comparison that produces the best (e.g., the smallest) distance is considered to be the best match, and its measured distance is then compared to a defined threshold 114 to ascertain whether this best match is close enough to be considered the same.
  • The final decision 115 is reached as follows: if the best distance measured does not pass the threshold, the utterance is not considered to match any of the words in the reference vocabulary and the final decision is to request a retry (that the same utterance be produced again for another attempt at recognition); if the best distance measured does pass the threshold, the utterance is considered to be the word whose identifying parameters it was found closest to.
  • The final word can be either a command word, such as "YES" or "NO", or an information word, such as a digit for alarm time entry.
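The compare, threshold, and decide steps can be sketched as follows, using the Hamming distance that the text offers as an example. The function names, the integer encoding of the binary code words, and the rule that a distance "passes" when it is at most the threshold are assumptions made for illustration.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary code words differ."""
    return bin(a ^ b).count("1")


def recognize(features: int, vocabulary: dict, threshold: int):
    """Return the closest vocabulary word, or None to request a retry."""
    best_word, best_dist = None, None
    for word, reference in vocabulary.items():
        dist = hamming(features, reference)
        if best_dist is None or dist < best_dist:
            best_word, best_dist = word, dist
    # The best match must also be close enough to count as the same word.
    if best_dist is None or best_dist > threshold:
        return None
    return best_word
```

With a hypothetical vocabulary such as `{"YES": 0b11110000, "NO": 0b00001111}`, an input that differs from "YES" in a single bit is accepted, while an input equally far from both words is rejected and the user is asked to repeat the utterance.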
  • The reference vocabulary, which is characterized by the identifying parameters of the words it contains, is formed in a fashion similar to the recognition process described above.
  • The feature extractor 110 and the edge detector 111 are used to construct the reference identifying parameters when the user of the device utters words into the microphone. Because this training process for the reference identifying parameters is performed by a particular user of the device, this method is referred to as speaker-dependent. Training is done by initially placing the device in training mode (the answer to the training 112 question would be YES) through an external switch, then pressing one of the ten digit keys and speaking the number that corresponds to the pressed key into the microphone. (The device may ask for more than one trial for each word spoken in order to construct a more tolerant identifying parameter set.) Several more keys can be programmed to be available as spoken commands, including the playback function, the alarm function, the memo function, the secret function and the telephone function.
  • The device is trained to associate the spoken utterance with the particular key that was pressed. For example, the user presses the key marked "3" and says "THREE".
  • The training unit 116 now adjusts the parameter set in the reference vocabulary to regard the spoken word "THREE" as representing the same function as the pressing of the key marked "3".
  • Blocks 112 and 116 can be removed when the reference vocabulary identifying parameters have been factory installed and cannot be changed.
  • This case would pertain to speaker-independent systems that are capable of recognizing the spoken utterances independent of which speaker spoke them. This is usually accomplished by having a large number of speakers utter the words in the defined vocabulary. Each speaker will produce identifying parameters for each of the words in the defined vocabulary that are slightly different than those produced by any other speaker. The different identifying parameters produced by each of the speakers can then be averaged and used as the "speaker-independent identifying parameters".
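The averaging step can be sketched as below, assuming each speaker's identifying parameters for a word form a numeric vector of the same length. The function name `average_templates` and the data layout (one dictionary per speaker) are hypothetical.

```python
def average_templates(per_speaker):
    """Average identifying parameters across speakers, word by word.

    `per_speaker` is a list with one dict per speaker, each mapping a word
    to its parameter vector; all speakers are assumed to cover the same words.
    """
    averaged = {}
    for word in per_speaker[0]:
        vectors = [speaker[word] for speaker in per_speaker]
        count = len(vectors)
        # Average each parameter position across all speakers.
        averaged[word] = [sum(column) / count for column in zip(*vectors)]
    return averaged
```

For example, two speakers with vectors `[1.0, 2.0]` and `[3.0, 4.0]` for "YES" yield the averaged reference `[2.0, 3.0]`.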
  • The voice output, for the purpose of sounding information and prompts (e.g., data required from the user) to the user, is shown as the voice synthesis processor 12 block in FIG. 1.
  • The functionality necessary for this task is that required for storage of voice patterns, such as the audio message storage means, and that required for audio playback, such as said audio retrieval means 11 (FIG. 1 or FIG. 2).
  • The voice synthesized data patterns, after data compression 56, are transferred through the CPU data bus to the ROM for storage, along with the other data necessary for system operation.
  • Output of the voice synthesis operation is initiated in different situations: it can be automatic when reading the display output, if such an option is selected by the user; it can be part of a dialogue (user prompts) for voice or keypad control command entry; it can be part of a dialogue for voice or keypad data information entry; or other situations, such as sounding of voice alarms.
  • The times at which this occurs are controlled by the software of the system.
  • The data for producing the intended words or sounds is read from the ROM and sent through the CPU data bus to the audio storage/retrieval processor 11 (FIG. 1 or FIG. 2) where it is expanded 54 (FIG. 2), D/A converted 55 (FIG. 2), filtered 56 (FIG. 2) and amplified 57 (FIG. 2) for listening by the user.
  • This method for voice synthesis is accomplished without additional hardware requirement.
  • The drawback to this method is a significant increase in ROM size needed to store the extensive data for the vocabulary's speech patterns.
  • The use of a separate voice synthesizer employing allophone, phoneme, LPC, or other comparable methods would reduce the storage requirement of the needed vocabulary at the expense of additional hardware.
  • The memory requirement is reduced because only the pointers to the pattern sequences needed to reproduce the intended utterances are stored.
  • The patterns for the allophones or phonemes can be stored in the system ROM, along with the vocabulary look-up tables (where the needed allophones or phonemes for each word are listed).
  • The particular implementation would dictate the necessity of additional hardware for the voice synthesis requirement.
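The look-up table approach can be illustrated with a toy example in which each word stores only a list of phoneme names and the pattern data is stored once per phoneme. The phoneme symbols, the placeholder pattern bytes, and the names `PHONEME_PATTERNS`, `VOCABULARY`, and `synthesize` are hypothetical; the source gives no actual pattern data.

```python
# Hypothetical pattern store: one short byte string per phoneme.
# Real pattern data and phoneme symbols are not given in the source.
PHONEME_PATTERNS = {
    "S": b"\x01\x02",
    "EH": b"\x03\x04",
    "T": b"\x05\x06",
    "AH": b"\x07\x08",
    "L": b"\x09\x0a",
    "AA": b"\x0b\x0c",
    "R": b"\x0d\x0e",
    "M": b"\x0f\x10",
}

# Vocabulary look-up table: each word lists the phonemes it needs.
VOCABULARY = {
    "SET": ["S", "EH", "T"],
    "ALARM": ["AH", "L", "AA", "R", "M"],
}


def synthesize(word: str) -> bytes:
    """Concatenate the stored patterns for each phoneme of the word."""
    return b"".join(PHONEME_PATTERNS[name] for name in VOCABULARY[word])
```

Adding a word to this table costs only a short list of phoneme names, because the pattern data is shared by every word that uses the same phonemes.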
  • An interactive method is devised.
  • This interactive method employs the voice synthesis means and the voice recognition means, both discussed above, in the following fashion.
  • Refer to FIG. 7 for a flowchart of one example of a software implementation of the interactive method.
  • The recognition process can be initiated by either engaging an external switch 130 or entering an audio message 160. (Other uses for the voice recognition system include voice recognition training 116 (FIG. 6).) Using the external switch to initiate recognition, the device prompts the user through voice synthesis means to enter a command by sounding "ENTER ALARM".
  • FIG. 8 illustrates an example of an alarm time entry 135 dialogue.
  • The interactive method effectively reduces the vocabulary size for voice recognition and therefore also reduces the memory requirement and raises the recognition performance (i.e., fewer words to recognize is a simpler problem).
  • The device will sound "set alarm?" and the user must answer either YES or NO; thus the device does not need to know the words "set" or "alarm".
  • The burden on the user to remember the different available commands is reduced, since options are presented and a mere selection must be made.
  • The response input time is approximately known, so the recognition algorithm need only operate on a specific sector of the recorded memory, improving recognition performance.
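A minimal sketch of such a prompt-and-answer loop follows, assuming the recognizer returns "YES", "NO", or `None` when no match passes the threshold. The function `run_dialogue`, its `listen` and `speak` callbacks, and the "RETRY" prompt wording are hypothetical and are not the flowchart of FIG. 7.

```python
def run_dialogue(options, listen, speak):
    """Offer each option as a yes/no question; return the accepted one or None.

    `listen` returns "YES", "NO", or None (unrecognized); `speak` sounds a
    prompt. Both are supplied by the caller.
    """
    for option in options:
        speak(option + "?")
        answer = listen()
        while answer not in ("YES", "NO"):
            speak("RETRY")      # ask the user to repeat the utterance
            answer = listen()
        if answer == "YES":
            return option
    return None
```

Because the recognizer only ever has to tell "YES" from "NO", the option names themselves never need to be in the recognition vocabulary.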
  • Yet another variant of the present invention would include a personal computer link for downloading inputted information, for loading into the device information entered through a personal computer, or for nonvolatile memory expansion.
  • Such a link could of course be used for facsimile transmission of information if it was needed. Accordingly, the scope of protection of the following claims is intended to be broad enough to cover all such modifications and variations.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Marketing (AREA)
  • Acoustics & Sound (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Electric Clocks (AREA)
  • Calculators And Similar Devices (AREA)

Abstract

Electronic scheduling devices are widely distributed, but none uses speech as a means of message storage and functional control. Using speech rather than a written message is a new approach, since it makes the device language independent (i.e., not limited to a particular alphabet or keyboard), removes the burden of manual message entry, and makes the device accessible to the visually impaired and to others who cannot use written language. The use of voice recognition (43) and voice synthesis (12) allows spoken commands and information to be interpreted through an interactive process. Integrating voice storage and control technology into an electronic organizer introduces a significant new product that allows users to improve their work efficiency and daily productivity.
PCT/US1994/001597 1993-02-11 1994-02-10 Programmateur electronique a enregistrement vocal WO1994018667A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US1651993A 1993-02-11 1993-02-11
US08/016,519 1993-02-11

Publications (1)

Publication Number Publication Date
WO1994018667A1 (fr) 1994-08-18

Family

ID=21777547

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1994/001597 WO1994018667A1 (fr) 1993-02-11 1994-02-10 Programmateur electronique a enregistrement vocal

Country Status (1)

Country Link
WO (1) WO1994018667A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2372864A (en) * 2001-02-28 2002-09-04 Vox Generation Ltd Spoken language interface
EP0847003A3 (fr) * 1996-12-03 2004-01-02 Texas Instruments Inc. Système de mémo audio et sa méthode de fonctionnement
US7200210B2 (en) 2002-06-27 2007-04-03 Yi Tang Voice controlled business scheduling system and method
EP1884921A1 (fr) * 2006-08-01 2008-02-06 Bayerische Motoren Werke Aktiengesellschaft Procédé destiné à la prise en charge de l'utilisateur d'un système d'entrée vocale

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3946157A (en) * 1971-08-18 1976-03-23 Jean Albert Dreyfus Speech recognition device for controlling a machine
US4408096A (en) * 1980-03-25 1983-10-04 Sharp Kabushiki Kaisha Sound or voice responsive timepiece
US4449232A (en) * 1979-03-22 1984-05-15 Sharp Kabushiki Kaisha Audibly announcing apparatus
US4737976A (en) * 1985-09-03 1988-04-12 Motorola, Inc. Hands-free control system for a radiotelephone
US5014317A (en) * 1987-08-07 1991-05-07 Casio Computer Co., Ltd. Recording/reproducing apparatus with voice recognition function
US5199009A (en) * 1991-09-03 1993-03-30 Geno Svast Reminder clock

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0847003A3 (fr) * 1996-12-03 2004-01-02 Texas Instruments Inc. Système de mémo audio et sa méthode de fonctionnement
GB2372864A (en) * 2001-02-28 2002-09-04 Vox Generation Ltd Spoken language interface
GB2372864B (en) * 2001-02-28 2005-09-07 Vox Generation Ltd Spoken language interface
US7200210B2 (en) 2002-06-27 2007-04-03 Yi Tang Voice controlled business scheduling system and method
EP1884921A1 (fr) * 2006-08-01 2008-02-06 Bayerische Motoren Werke Aktiengesellschaft Procédé destiné à la prise en charge de l'utilisateur d'un système d'entrée vocale

Similar Documents

Publication Publication Date Title
US5602963A (en) Voice activated personal organizer
TWI525532B (zh) Set the name of the person to wake up the name for voice manipulation
US4368988A (en) Electronic timepiece having recording function
US6711543B2 (en) Language independent and voice operated information management system
US5583965A (en) Methods and apparatus for training and operating voice recognition systems
JPH08194500A (ja) 後でテキストを生成するためのスピーチ記録装置および記録方法
JPH0651941A (ja) 音声注釈を備えた携帯型コンピュータ
CN103051770A (zh) 一种移动终端的语音提醒方法及系统
US5721537A (en) Pager-recorder and methods
US20080040106A1 (en) Digital Recording and playback system with voice recognition capability for concurrent text generation
US7349844B2 (en) Minimizing resource consumption for speech recognition processing with dual access buffering
KR100719776B1 (ko) 휴대형 코드인식 음성 합성출력장치
WO1994018667A1 (fr) Programmateur electronique a enregistrement vocal
KR100554397B1 (ko) 대화형 음성 인식 시스템 및 방법
US8280734B2 (en) Systems and arrangements for titling audio recordings comprising a lingual translation of the title
US20020131564A1 (en) Portable electronic device capable of pre-recording voice data for notification
JPH08116385A (ja) 個人情報端末装置および音声応答システム
Stifelman VoiceNotes--an application for a voice-controlled hand-held computer
JP5402102B2 (ja) スケジュール管理装置およびスケジュール管理プログラム
JP3879192B2 (ja) 記録再生装置
JP2001228897A (ja) 音声入力装置及びその制御方法並びにプログラムコードを格納した記憶媒体
KR100466520B1 (ko) 텍스트 데이터의 편집 및 재생 시스템
JPH0685704A (ja) 音声受信表示装置
KR20010067635A (ko) 음성메모 기록, 재생 및 음성메시지 자동전달 기능을 가진음성시계
JP2001188562A (ja) 音声記録再生装置、周辺装置、これら装置を備える音声記録再生システムおよび音声データ処理プログラムを記録した記録媒体

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): BR CA CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: CA