WO2001058165A2 - System and method for integrated delivery of media and associated characters, such as audio and synchronized text transcription - Google Patents
System and method for integrated delivery of media and associated characters, such as audio and synchronized text transcription
- Publication number
- WO2001058165A2 (PCT/US2001/003499)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- media
- audio
- computer
- text
- stream
- Prior art date
Classifications
- 
        - G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
 
- 
        - G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/40—Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
 
- 
        - G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval; Database structures therefor; File system structures therefor of audio data
- G06F16/68—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/683—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/685—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using automatically derived transcript of audio data, e.g. lyrics
 
- 
        - H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/61—Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
- H04L65/612—Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio for unicast
 
- 
        - H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/75—Media network packet handling
- H04L65/762—Media network packet handling at the source
 
- 
        - H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2201/00—Electronic components, circuits, software, systems or apparatus used in telephone systems
- H04M2201/40—Electronic components, circuits, software, systems or apparatus used in telephone systems using speech recognition
 
- 
        - H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/42391—Systems providing special services or facilities to subscribers where the subscribers are hearing-impaired persons, e.g. telephone devices for the deaf
 
Definitions
- the invention relates to the field of communications and, more particularly, to providing audio or other media having an aural component with associated textual streams.
- the robust growth in demand for both media content and delivery channels has increased the need for novel types of information, news, financials and other services.
- the Internet and other network technologies have enabled a variety of multipoint media streams, such as news web sites containing streamable video clips, audio clips and other media combinations.
- One frequent type of news source or media event is a collective meeting or proceeding, in which one or a few speakers discuss information of interest to a wide audience.
- Those types of settings include sessions of Congress, presidential and other news conferences, corporate analysts' meetings, media conferences and other group events.
- Timely delivery of information content may be particularly valuable, such as with sessions of Congress and other governmental bodies. Many interested parties could benefit from prompt knowledge of pending provisions in legislation, rulings in court cases and other deliberations. For instance, individuals or organizations that would be affected by the enactment of pending legislation may want to furnish input to their representatives, or constituents may want to take other actions to contribute or adjust to new statutory, regulatory or other programs.
- the federal government deploys a host of communications facilities situated at a variety of sources, often issuing permits for access to those resources. For instance, Congress permits press access to its chambers and hearing rooms, from which live video and audio feeds are generated for delivery to commercial networks and to news and other organizations.
- Figure 1 illustrates an overall network architecture or system for delivering media and text under one embodiment of the invention.
- Figure 2 illustrates an example of a subscriber interface used to view the output produced by the system of Figure 1.
- Figures 3 and 4 together illustrate a flow chart of media and textual processing under the system of Figure 1.
- Figure 5A is a block diagram illustrating two alternative methods of implementing aspects of the invention described, for example, with respect to Figure 1.
- Figure 5B is a flow diagram illustrating a routine for providing simultaneous transmission of audio and synchronized text to a subscriber over a network, using software tools provided by RealNetworks of Seattle, Washington.
- Figure 5C is a block diagram illustrating a data flow and processing diagram for delivering audio voice and associated text to a subscriber under a Microsoft Windows Media environment.
- Figure 5D is a block diagram illustrating a system front end under the embodiment of Figure 5C.
- Figure 5E is a flow diagram illustrating a routine for delivering voice and associated text to a subscriber under the system of Figure 5C.
- Figure 6 is a schematic diagram illustrating production and development environments for implementing aspects of the invention.
- Figure 7 is a process flow diagram illustrating the flow of data and functions performed by aspects of the invention.
- Figure 8 is a data flow diagram illustrating flow of data through the system of Figure 1.
- Figure 9 is a schematic diagram illustrating timing calculations performed by the system of Figure 1.
- Figure 10 is an example of a home web page for use by the system of Figure 1.
- Figure 11 is an example of a login dialogue box for use by the system in Figure 1.
- Figure 12 is an example of a customized web page for an individual subscriber.
- Figure 13 is an example of a hearings calendar web page.
- Figure 14A is an example of the web page of Figure 13 showing Senate hearings for Thursday, May 25, 2000.
- Figure 14B is an example of a user settings web page.
- Figure 15 is a web page showing a selection of a breakdown of Energy and Natural Resources hearings from the web page of Figure 14.
- Figure 16 is an example of a hearing subscription web page.
- Figure 17 shows the web page of Figure 16, whereby a subscriber has selected live streaming receipt of a hearing.
- Figure 18 is an example of the web page of Figure 15 with a keywords selection box.
- Figure 19 is an example of a web page showing a subscriber's input of keywords after selecting the keywords box of Figure 18.
- Figure 20 is an example of an email message provided to the subscriber based on the selected keywords.
- Figure 21 is an example of a web page showing live receipt of audio and associated transcribed text from a hearing.
- Figure 22 is an example of a web page listing background resources for an associated Senate hearing.
- Figure 23 is an example of a web page listing committee members.
- Figure 24 is an example of a web page listing a press release.
- Figure 25 is an example of a web page for permitting a subscriber to search for keywords within previously transcribed hearings.
- Figure 26 is an example of a web page provided as a result of the query entered in the web page of Figure 25.
- Figure 27 is an example of a web page providing summaries of the context in which the query term is located in the transcribed hearing.
- Figure 28 is an example of a web page providing the full hearing transcript with the query term highlighted.
- Figure 29 is an example of a web page providing the hearing text with associated audio.
- Figure 30 is an example of a web page showing subscriber selection of a block of text.
- Figure 31 is an example of an email message generated by the subscriber that includes the block of text selected in Figure 30.
- Figure 32 is a data model relationship diagram that may be employed with, for example, an SQL database.
- Figure 33 is a data model relationship diagram representing an alternative embodiment that may be employed by, for example, an Oracle database.
- Figure 34 is a flow diagram illustrating a suitable routine for encoding content for query by a user.
- Figure 35 is a flow diagram illustrating a suitable routine for generating recorded content and character files, such as for storage on a CD-ROM.
- the invention relates to a system and method for the integrated delivery of media and associated characters such as synchronized text transcription, in which a computing system or dedicated network may collect, process and deliver unified audio, video and textual content on a live basis to subscribers.
- front-end audio or video servers receive and encode the audible or video activities, for example, of a legislature, press conference, musical/multimedia composition, corporate analyst meeting, town meeting or other event.
- the raw, digitized media feeds from the event are transmitted to a centralized distribution server, which, in turn, delivers the digitized stream of the event to a transcription facility, where automated and/or human transcription facilities decode the language component.
- the textual content is synchronized with the original audio, video or other media and delivered to subscribers, for instance via a Web site interface.
- Subscribers may configure the delivery modes according to their preference, for instance, to silently parse the textual stream for keywords, triggering full-screen, audible, wireless or other delivery of the audio or video content when a topic of interest is discussed.
- Subscribers may alternatively choose to view and hear the media and textual output continuously, and they may access archives for the purpose of reproducing text for research or editorial activities.
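The keyword-triggered delivery mode described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `keyword_alerts`, the `(timestamp, text)` segment shape and the sample sentences are all assumptions, and only single-word keywords are matched.

```python
import re
from typing import Iterable, Iterator, Set, Tuple


def keyword_alerts(
    segments: Iterable[Tuple[float, str]], keywords: Set[str]
) -> Iterator[Tuple[float, str, str]]:
    """Yield (timestamp, keyword, text) whenever a segment mentions a keyword."""
    wanted = {k.lower() for k in keywords}
    for timestamp, text in segments:
        # Tokenize the segment so that matching is on whole words only.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        for hit in sorted(wanted & words):
            yield timestamp, hit, text


# Hypothetical transcribed segments: (seconds from start, text).
stream = [
    (0.0, "The committee will come to order."),
    (12.5, "We turn now to the pipeline safety provisions."),
    (31.0, "Energy prices remain a concern."),
]
alerts = list(keyword_alerts(stream, {"pipeline", "energy"}))
```

Each alert carries the timestamp of the matching text, which a client could use to start audible or full-screen delivery at the point where the topic is raised.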
- the system stores or archives all encoded audio/video content and associated transcriptions (when such transcriptions are created). Thus, subscribers may retrieve the archived content and transcriptions after the live event.
- Subscribers may perform queries of archived content and transcriptions to retrieve portions of content files and associated transcriptions that match the query, so that a subscriber may view portions of transcriptions and listen to associated audio/video content related to a query.
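A query over archived transcriptions might look like the following sketch. The `Segment` record, the hearing identifiers and the substring match are assumptions made for illustration; the source does not specify how archived text is indexed or searched.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    hearing: str      # hypothetical hearing identifier
    start_sec: float  # offset of this text within the archived audio
    text: str


def search_archive(archive: List[Segment], term: str) -> List[Segment]:
    """Return the segments whose text contains the term, ignoring case."""
    needle = term.lower()
    return [s for s in archive if needle in s.text.lower()]


archive = [
    Segment("energy-2000-05-25", 0.0, "The hearing will come to order."),
    Segment("energy-2000-05-25", 95.0, "Natural gas supply is tightening."),
    Segment("finance-2000-05-25", 40.0, "Gas taxes were discussed briefly."),
]
hits = search_archive(archive, "gas")
```

Because each hit keeps its `start_sec` offset, a player could seek the archived audio to the passage that matched the query.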
- First described below is a suitable computing platform for implementing aspects of the invention.
- Second, audio and text processing performed by the system are described.
- Third, details regarding the system are provided.
- Fourth, a suitable user interface is described. Thereafter, a suitable data model is described.
- enhancements and some alternative embodiments are presented.
- Figures 1 and 2 and the following discussion provide a brief, general description of a suitable computing environment in which aspects of the invention can be implemented.
- embodiments of the invention will be described in the general context of computer-executable instructions, such as routines executed by a general purpose computer, e.g., a server or personal computer.
- Those skilled in the relevant art will appreciate that aspects of the invention can be practiced with other computer system configurations, including Internet appliances, hand-held devices, wearable computers, cellular or mobile phones, multiprocessor systems, microprocessor-based or programmable consumer electronics, set-top boxes, network PCs, minicomputers, mainframe computers and the like.
- aspects of the invention can be embodied in a special purpose computer or data processor that is specifically programmed, configured or constructed to perform one or more of the computer-executable instructions explained in detail below.
- the term "computer," as used generally herein, refers to any of the above devices, as well as any data processor.
- aspects of the invention can also be practiced in distributed computing environments, where certain tasks or modules are performed by remote processing devices, that are linked through a communications network, such as a Local Area Network (LAN), Wide Area Network (WAN) or the Internet.
- program modules or subroutines may be located in both local and remote memory storage devices.
- Aspects of the invention described herein may be stored or distributed on computer-readable media, including magnetic and optically readable and removable computer discs, as well as distributed electronically over the Internet or over other networks (including wireless networks).
- Those skilled in the relevant art will recognize that portions of the invention reside on a server computer, while corresponding portions reside on a client computer. Data structures and transmission of data particular to aspects of the invention are also encompassed within the scope of the invention.
- a legislative session or other event is intended to be recorded and delivered to subscribers with a corresponding textual stream.
- an audio signal transducer such as a microphone array 102
- the microphone array 102 is connected directly or indirectly to an audio server or encoder 104 located at the event site.
- the encoder is connected to receive the audio signal by way of multi-patch terminals, pool feeds, web casts, and the like.
- the audio encoder 104 may be or include a computer workstation having one or more high-resolution audio digitizer boards along with sufficient CPU, memory and other resources to capture raw sounds and other data for processing in digital form.
- the audio encoder 104 may use as an encoding platform the commercially available RealProducer™ software, by RealNetworks, to produce a digitized audio stream.
- the audio encoder 104 may include or be coupled to a multiport central hub that receives inputs from microphones in one or more hearing rooms.
- one audio encoder may be located in a Senate office building and receive 16 inputs associated with microphones located in various hearing rooms, while two similar servers may be located in House office buildings and be coupled to 24 inputs associated with microphones in House hearing rooms.
- the audio encoder may include a single interface for managing all encoding sessions or all input audio streams from the microphones.
- the audio encoder is a quad Pentium III 550 MHz computer with two gigabytes of RAM, running the Windows NT™ operating system. Unnecessary applications that consume processing overhead, such as screensavers and power management features, are deleted.
- the audio encoder also includes an analog-to-digital converter, such as the Audi/o analog-to-digital converter by Sonorus of Newburg, New York, that can accommodate eight audio inputs simultaneously.
- the audio encoder may include a sound card, such as the Studi/o sound card by Sonorus, that may accommodate 16 digital audio channels. Further details may be found at http://www.sonorus.com.
- the resulting raw, digitized audio stream is transmitted over a communications link 106 to a remote distribution server 108 acting as a distribution and processing hub.
- the communications link 106 joining the audio encoder 104 and the distribution server 108 may be or include any one or more of, for instance, the Internet, an intranet, a LAN (Local Area Network), a WAN (Wide Area Network) or a MAN (Metropolitan Area Network), a frame relay connection, an Advanced Intelligent Network (AIN) connection, a synchronous optical network (SONET) connection, a digital T1, T3 or E1 line, Digital Data Service (DDS) connection, DSL (Digital Subscriber Line) connection, an Ethernet connection, an ATM (Asynchronous Transfer Mode) connection, a FDDI (Fiber Distributed Data Interface) connection or a CDDI
- the communications link 106 may be or include any one or more of a WAP (Wireless Application Protocol) link, a GPRS (General Packet Radio Service) link, a GSM (Global System for Mobile Communication) link, or other wired or wireless, digital or analog interfaces or connections.
- the distribution server 108 incorporates a database 110 for the mass storage of synchronized collections of audio, video and textual information related to individual media events collected by one or more audio servers 104 or other front-end sources.
- additional sources may include a portable text-scanning or OCR device, such as the Hewlett-Packard CapShare™ device, to capture and transmit textual information, such as press releases, schedules, transcripts or other data, from the event site along with other media using infrared or other connections to communications link 106.
- Any type of scanner may be employed, such as a flatbed scanner for scanning documents related to an event and converting them into an electronic format (such as ASCII, .pdf, etc.), which can then be electronically forwarded to the distribution server.
- the distribution server 108 includes an Apache web server and RealG2 server by RealNetworks.
- the system may run the Linux 6.1 operating system by RedHat and the MySQL database.
- the distribution server 108 employs PERL, C/C++ and Personal Home Page (PHP) by Zend Technology.
- the distribution server 108 may be or include, for instance, a workstation running the Microsoft Windows NT, Unix, Linux, Xenix, Solaris™, OS/2™, BeOS™, Mach, Apache, OpenStep™ or other operating system or platform software.
- the distribution server 108 directs the raw, digitized audio stream via a communications link 112, which may be or include similar connections as communications link 106, to a processing facility 140.
- the processing facility 140 may be a separate facility or other internal, local or remote engine dedicated to transcribing the raw media input into character or other format, such as ASCII English or other text or other forms.
- the processing facility 140 may incorporate a voice recognition server 114 to receive the digitized audio or other media streams for processing and conversion.
- the voice recognition server 114 may in one embodiment include one or more speech recognition modules 146, such as the commercially available Dragon™ Professional or IBM ViaVoice™ products.
- a separate interface to the Dragon speech recognition module may be developed using a Software Development Kit ("SDK").
- the interface or overlay may be similar to existing court-reporting system interfaces. Thus, if a transcription agent is familiar with and performs court reporting functions, they may readily employ the system.
- the interface may further allow agents to create speaker macros to facilitate transcription processes, such as identifying speakers.
- a transcription agent may say "one Mac", which pulls up text prepared before the hearing that identifies Speaker 1 as "Senator Orrin Hatch". This spares the transcription agent from having to provide speakers' titles, and possibly spell out their names, each time they speak.
- the interface also automatically wraps and time stamps the generated text according to a format needed to encode and display it.
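The speaker macros and the automatic wrapping and time stamping described above can be sketched as follows. This is an illustrative assumption, not the interface built on the Dragon SDK: the macro table, the label text, the 40-character width and the `HH:MM:SS` stamp format are all hypothetical.

```python
import textwrap
from typing import Dict, List, Tuple

# Hypothetical macro table prepared before the hearing.
SPEAKER_MACROS: Dict[str, str] = {
    "one mac": "Senator Hatch:",
    "two mac": "Witness:",
}


def expand_macros(utterance: str, macros: Dict[str, str]) -> str:
    """Replace a leading spoken macro with its prepared speaker label."""
    lowered = utterance.lower()
    for spoken, label in macros.items():
        if lowered.startswith(spoken):
            return f"{label} {utterance[len(spoken):].strip()}"
    return utterance


def stamp_and_wrap(
    utterance: str, seconds: int, width: int = 40
) -> List[Tuple[str, str]]:
    """Expand macros, wrap to a display width, and time stamp each line."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    stamp = f"{hours:02d}:{minutes:02d}:{secs:02d}"
    text = expand_macros(utterance, SPEAKER_MACROS)
    return [(stamp, line) for line in textwrap.wrap(text, width=width)]


lines = stamp_and_wrap(
    "one mac the committee will come to order this morning", 3725
)
```

Every wrapped line carries the same stamp here; a real encoder would more likely stamp each line as it is produced.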
- the speech recognition module 146 may be capable of speaker-independent operation. Different or specialized versions of the speech recognition module 146 may be employed within the voice recognition server 114 to enhance accuracy, upgrade functionality or provide special foreign language or other features, according to the transcription needs.
- the voice recognition server 114 may be attended by a human transcription agent or "voice writer" to monitor and operate the speech recognition module 146 and other components in order to ensure the smooth flow of first-stage conversion from voice to text. It may be advantageous to program the speech recognition module 146 with particular vocabulary words likely to be spoken at the event and with the speech profile of the human transcription agent before processing the media stream.
- the voice recognition server 114 may be a dual motherboard computer having approximately one-half a gigabyte of RAM, and where unnecessary utilities that consume processing overhead (such as screensavers, energy management functions, and the like) are removed to make the computer more stable. Peripheral inputs unnecessary to the voice writing function are eliminated.
- the audio server 104, speech recognition module 146 and other elements may cooperate to recognize and split different voices or other audible sources into separate channels, which, in turn, are individually processed to output distinct textual streams.
- the voice recognition server 114 thus invokes one or more speech recognition modules 146 preferably with oversight or monitoring by a human transcription agent to resolve the digitized verbal content generated by the audio server 104 into a raw textual stream, for instance ASCII-coded characters. Output in other languages and formats, such as 16-bit Unicode output, is also possible.
- the role of the transcription agent may include the maintenance and operation of the speech recognition module 146, monitoring the raw textual stream and other service tasks.
- the human transcription agent may listen to the incoming audio stream via headphones and substantially simultaneously repeat the received audio into a microphone. The transcription agent's voice is then digitized and input to the speech recognition module 146, where the speech recognition module has been trained to the particular transcription agent's voice.
- the speech recognition module 146 has improved accuracy because the incoming audio stream is converted from the speech patterns and speech signatures of one or more speakers to the trained voice of the transcription agent.
- the transcription agent's role is intended to be comparatively limited and, generally, is to involve semantic judgments or substantive modifications to the raw textual stream rarely, if at all. It may be noted that the role of or need for the transcription agent may be reduced or eliminated in implementations of the invention, depending on the sophistication and accuracy of the speech recognition module 146, as presently known or as developed in the future.
- the raw textual stream may be delivered over a local connection 118, such as an RS232 serial, FireWire™ or USB cable, to a scopist workstation 120, which may also be located within the processing facility 140 or elsewhere.
- the scopist workstation 120 may be a personal or server computer executing text editing software presented on a Graphical User Interface (GUI) 122 for review by a human editorial agent, whose role is intended to involve a closer parsing of the raw textual stream and comparison with the received audio stream.
- the tasks of the editorial agent stationed at scopist workstation 120 include review of the raw textual stream produced by the voice recognition server 114 to correct mistakes in the output of the speech recognition module 146, to resolve subtleties, such as foreign language phrases, to make judgments about grammar and semantics, to add emphasis or other highlights and generally to increase the quality of the output provided by the system.
- the editorial agent at the scopist workstation 120 may be presented with the capability, for instance, on the agent GUI 122 to stop/play/rewind the streaming digitized audio or other media in conjunction with the text being converted to compare the audible event to the resulting text.
- data compression technology known in the art may be employed to fast-forward the media and textual stream for editing or other actions while still listening to audible output at a normal or close to normal pitch.
- the editorial agent at the scopist workstation 120 may attempt to enhance textual accuracy to as close to 100% as possible.
- the system outputs the audio and text streams with as little lag time from event to reception as is possible to provide an experience akin to a "live" radio (or television broadcast) for the subscriber.
- some degree of delay including that resulting from processing time in the servers, network lag and human response time of the transcriber, editorial agent or other attendants, may be inevitable.
- the total amount of delay from event to reception may vary according to the nature of the input, network conditions, amount of human involvement and other factors, but may generally be in the range of 15 minutes or less.
- the voice recognition server 114 may provide timestamps to the received audio or generated text to synchronize the audio with the text, as described herein.
- time stamps may be added to the audio by the distribution server before forwarding it to the processing facility, and to the text after it is received from the processing facility.
- the voice recognition server 114 and/or scopist workstation 120 scan the text for increased accuracy.
- the Dragon software allows a user to provide electronic files (Word, notepad, etc.) to help increase the accuracy of the voice recognition.
- the word is compared with an internal word list.
- Such input files help to create the internal word list. If the word is not found in the internal word list, the software will prompt the agent to train that word into the system. The next time that word is input, the system will recognize the word and text will be generated correctly.
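The word-list check described above can be sketched as follows. This does not use or reproduce the Dragon software's API; the function names, the tokenizing pattern and the sample text are assumptions for illustration only.

```python
import re
from typing import Iterable, List, Set


def build_word_list(documents: Iterable[str]) -> Set[str]:
    """Collect the vocabulary found in the supplied preparation files."""
    words: Set[str] = set()
    for doc in documents:
        words.update(re.findall(r"[a-z']+", doc.lower()))
    return words


def words_needing_training(transcript: str, word_list: Set[str]) -> List[str]:
    """Return unknown words in first-seen order, without duplicates."""
    unknown: List[str] = []
    for word in re.findall(r"[a-z']+", transcript.lower()):
        if word not in word_list and word not in unknown:
            unknown.append(word)
    return unknown


word_list = build_word_list(["Hearing on energy and natural resources"])
unknown = words_needing_training(
    "Hearing on geothermal energy resources", word_list
)
word_list.update(unknown)  # stands in for the agent training the new words
```

After the update, the same transcript yields no unknown words, which mirrors the behavior described: a word is trained once and recognized from then on.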
- the processing facility 140 may provide automated task assignment functions to individual transcription and editorial agents.
- the processing facility receives a notification from the distribution server of hearings to be transcribed for each day.
- the processing facility may automatically assign hearings to individual agents, or post all hearing transcription assignments to a central web page or file that all agents may access and choose to accept to transcribe/edit for that day.
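One simple way to assign hearings automatically is round-robin rotation, sketched below. The source does not say which assignment policy is used, so the rotation, the function name `assign_hearings` and the sample hearing and agent names are all assumptions.

```python
from itertools import cycle
from typing import Dict, List


def assign_hearings(hearings: List[str], agents: List[str]) -> Dict[str, List[str]]:
    """Distribute the day's hearings across agents in rotation."""
    if not agents:
        raise ValueError("at least one agent is required")
    assignments: Dict[str, List[str]] = {agent: [] for agent in agents}
    for hearing, agent in zip(hearings, cycle(agents)):
        assignments[agent].append(hearing)
    return assignments


# Hypothetical daily notification from the distribution server.
schedule = assign_hearings(
    ["Energy 9:30", "Judiciary 10:00", "Finance 14:00"],
    ["agent_a", "agent_b"],
)
```

The alternative described in the text, where agents pick assignments from a shared page, would replace the rotation with a claim step per hearing.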
- the edited textual stream is delivered via a communications link 124, which may likewise be or include a similar link to the communications link 106, to a text encoder module 126 incorporated within the distribution server 108.
- the communications link 124 may also be or include, for instance, a telnet connection initiated over the Internet or other network links.
- the text encoder module 126 receives the corrected textual stream and converts the stream into, in an illustrated embodiment, a RealText™ stream adhering to the commercially known Real encoding standard for further processing.
- the converted RealText™ stream may be transmitted back to the distribution server 108 via a connection 128, which may be, for instance, a 100baseT connection to a processor 142.
- the finished, edited, corrected, converted RealText™ stream representing the audible, visible or other events being transcribed is then sent to the distribution server 108, synchronized and stored in database 110 with the raw digitization of the media from the event for delivery to subscribers.
- the synchronization may be implemented, for instance, using the Wall Clock function of the commercially available Real software.
- the Wall Clock function allows multiple media streams to be synchronized using internal timestamps encoded into each stream. As the streams are received on the client or recipient side, they are buffered until all streams are at the same internal time in relation to each other. Once the streams are aligned in time using timestamps and other information, the player within the client workstation 136 may start playing the streams simultaneously.
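The buffering behavior described above can be illustrated with a small sketch. It is not the Real software's Wall Clock implementation: the packet shape, the function names and the rule of starting at the latest first timestamp are assumptions, and each buffer is assumed to be sorted by timestamp.

```python
from typing import Dict, List, Tuple

Packet = Tuple[float, str]  # (internal timestamp in seconds, payload)


def common_start_time(buffers: Dict[str, List[Packet]]) -> float:
    """Earliest timestamp at which every buffered stream has data."""
    if any(not packets for packets in buffers.values()):
        raise ValueError("every stream needs at least one buffered packet")
    return max(packets[0][0] for packets in buffers.values())


def align(buffers: Dict[str, List[Packet]]) -> Dict[str, List[Packet]]:
    """Drop packets that precede the common start so playback begins in step."""
    start = common_start_time(buffers)
    return {
        name: [p for p in packets if p[0] >= start]
        for name, packets in buffers.items()
    }


buffers = {
    "audio": [(0.0, "a0"), (1.0, "a1"), (2.0, "a2")],
    "text": [(1.0, "t1"), (2.0, "t2")],
}
aligned = align(buffers)
```

Here the audio stream has a one-second head start, so its first packet is discarded and both streams begin playing at the 1.0-second mark.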
- the distribution server 108 may store the finished composite stream or portions thereof in database 110 in RealText™ or a variety of other formats, for instance, in XML, HTML, ASCII, WAV, AIFF, MPEG, MP3, Windows™ Media or others.
- the distribution server 108 may include web serving functionality, such as the ability to deliver web pages requested by other client computers.
- a distribution server may be or include a G2 RealServer produced by RealNetworks.
- a separate web server may be employed.
- the distribution server 108 may have the ability to broadcast, rebroadcast or hold one or more media streams simultaneously.
- the distribution server 108 may accept any type of media stream, including audio, video, multimedia or other data stream capable of being sensed by a human or processed by a computer. Additionally, the distribution server 108 may synchronize any combination of streams and types of media. The arrival of finished RealText or other stream into the database
- the dissemination link 130 may, again, be or include a similar link to communications link 106, such as a single or multiple digital T1 or other communications channel.
- the dissemination link 130 may furthermore be or include a Personal Area Network (PAN), a Family Area Network (FAN), a cable modem connection, an analog modem connection such as a V.90 or other protocol connection, an Integrated Service Digital Network (ISDN) or Digital Subscriber Line (DSL) connection, a BlueTooth wireless link, a WAP (Wireless Application Protocol) link, a Symbian™ link, a GPRS (General Packet Radio Service) link, a GSM (Global System for Mobile Communication) link, a CDMA (Code Division Multiple Access) or TDMA (Time Division Multiple Access) link such as a cellular phone channel, a GPS (Global Positioning System) link, a CDPD (Cellular Digital Packet Data), a RIM (Research in Motion, Limited) duplex paging type device, an IEEE 802.11-based radio frequency link, or other wired or wireless links.
- the dissemination link 130 includes TCP/IP connections over the Internet 132 to one or more subscriber links 134, which in turn may be or include links similar to communications link 106, for delivery to one or more client workstations 136.
- any one or more of the communications link 106, communications link 112, communications link 124, communications link 130, communications link 134 or other communications links may be or include self-healing or self-adjusting communication sockets that permit dynamic allocation of bandwidth and other resources according to local or global network conditions.
- the client workstation 136 may be or include, for instance, a personal computer running the Microsoft Windows 95, 98, 2000, Millennium, NT™, Windows CE™, Palm™ OS, Unix, Linux, Solaris™, OS/2™, BeOS™, MacOS™ or other operating system or platform.
- the client workstation 136 may also be or include any microprocessor-based machine such as an Intel x86-based device or Motorola 68K or PowerPC device, microcontroller or other general or special purpose device operating under programmed control.
- the client workstation 136 may also include electronic memory such as RAM (random access memory) or EPROM (erasable programmable read-only memory), storage such as a hard drive, CD-ROM or rewritable CD-ROM or other magnetic, optical or other media, and other associated components connected over an electronic bus (not shown), as will be appreciated by persons skilled in the art.
- the client workstation 136 may also be or include a network-enabled appliance such as a WebTV unit, radio-enabled Palm Pilot or similar palm-top computer, a set-top box, a game-playing console such as Sony Playstation or Sega Dreamcast, a browser-equipped cellular telephone, other TCP/IP client or other wireless appliance or communication device.
- the combined, synchronized media and finished textual stream arriving over the subscriber links 134 from the database 110 may be viewed on a client GUI 144 in conjunction with a browser and/or an administrative module 138 running on the client workstation 136 permitting authentication of subscribers, and access to and manipulation of the information content delivered by the invention. More particularly, a subscriber may use the client GUI 144 on the client workstation 136 to invoke or log on to a web site for his or her information subscription, and enter password and other information to view the synchronized output stream according to his or her delivery preference. Schedules of different types of media events, in searchable database or other form, may, in another embodiment, be presented on the client GUI 144 to assist in event selection, as described herein.
- a subscriber may choose to view the entire information stream produced by the system, including audio, video and synchronized textual output on the client GUI 144 using speakers 148, headphones and other output devices for further review, as shown in Figure 2.
- a subscriber may enter commands using the administrative module 138 and client GUI 144 to have the information stream delivered silently or in a background process, with an alert function activated.
- the alert function may scan the incoming textual stream at the point of the distribution server 108 or client workstation 136 for the presence of keywords chosen by the subscriber, upon the detection of which a full screen may pop up showing the surrounding text, video or other content.
- the alert function may deliver other information such as a message or notice via email, an Inbox message in Microsoft Outlook™, an online instant message, an IRC (Internet Relay Chat) message, a pager message, a telephone call or other electronic notification.
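The keyword scan described above can be sketched as follows. This is only an illustrative Python sketch, not the system's actual implementation: the `Alert` record, the `scan_for_alerts` function, the whole-word case-insensitive matching rule and the one-line context window are all assumptions made for the example.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    keyword: str        # subscriber keyword that matched
    line_index: int     # index of the matching transcript line
    context: List[str]  # the matching line plus its neighbours


def scan_for_alerts(lines, keywords, context=1):
    """Return one Alert per (line, keyword) match, with surrounding lines."""
    # Assumption: keywords match as whole words, ignoring case.
    patterns = {
        kw: re.compile(r"\b" + re.escape(kw) + r"\b", re.IGNORECASE)
        for kw in keywords
    }
    alerts = []
    for i, line in enumerate(lines):
        for kw, pattern in patterns.items():
            if pattern.search(line):
                start = max(0, i - context)
                alerts.append(Alert(kw, i, lines[start:i + context + 1]))
    return alerts


transcript = [
    "The committee will come to order.",
    "Today we consider the energy appropriations bill.",
    "The chair recognizes the ranking member.",
]
alerts = scan_for_alerts(transcript, ["energy", "tariff"])
```

Each resulting alert carries the surrounding text, which a client could show in a pop-up or forward through one of the notification channels listed above.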
- the user may choose to receive the informational content in a silent mode while viewing the entire textual stream, but with the ability to highlight portions of the textual stream to hear the audio output associated with that portion. This, for instance, may be useful for a subscriber wishing to discern emphasis, inquiry, irony or other inflections or subtleties that may not be evident in textual form.
- a subscriber operating the client workstation 136 may likewise choose to highlight, cut, paste, stream to hard or removable drive or otherwise store or archive one or more portions of the information content delivered for later processing, word processing, retransmission or other uses.
- subscriber access via the subscriber links 134 may permit a web site or other entry portal to allow a subscriber to access prior audio events or content for archival or research purposes.
- the subscriber may manipulate the administrative module 138 to schedule the delivery of the streaming service according to specified dates and times, events of interest and associated delivery modes, as well as other settings.
- the database 110 within distribution server 108 may be configured to be searchable according to discrete search terms, particular fields related to header descriptions of the event, or on other bases.
- the database 110 may be configured with a decision support or data mining engine to facilitate the research functions.
- An example of subscriber choices for manipulating the client GUI 144 and associated administrative choices is illustrated in Figure 2.
- processing begins.
- audio or other input from an event is collected and delivered to the audio server 104.
- the raw audio, video or other signals are digitized.
- the digitized audio data is transmitted to the distribution server 108.
- the digitized audio stream in RealAudio™ format, or otherwise, is transmitted to the processing facility 140.
- the speech recognition module 146 is invoked to output an ASCII text or other stream corresponding to the audio content.
- the ASCII text stream is output to the scopist workstation 120.
- the ASCII text stream is edited by an editorial agent at the scopist workstation 120 using the agent GUI 122.
- the edited or corrected textual stream is transmitted to the text encoder module 126.
- the corrected or edited ASCII text is converted to an advanced text format, such as RealText™, using software tools provided by RealNetworks.
- the reformatted textual stream is stored and synchronized with the audio or other media source within the database 110.
- the integrated media/textual information is then prepared for subscriber access.
- one or more subscribers access the distribution server 108 and are validated for use.
- a subscriber's delivery profile is checked to set delivery mode, such as full streaming content, background execution while searching for alert terms, or other formats or modes described herein.
- the integrated audio or other media along with the textual stream are delivered according to the subscriber's service profile, whether triggering an alert or other mode.
- subscriber requests for archival linking to related sources or other non-streaming services may be processed as desired.
- FIG. 5A depicts a system for receiving incoming audio and adding synchronized text to it using software tools provided by RealNetworks.
- the right-hand side of Figure 5A depicts an alternative or additional embodiment using software tools provided by Microsoft Corporation of Redmond, Washington.
- a vertical dashed line schematically differentiates between the two embodiments.
- referring first to the left-hand side of Figure 5A, the audio server 104 includes a RealAudio™ encoder 502 that receives the incoming audio stream from an audio source and converts it into an encoded audio stream to be transmitted over the communications link 106 to a RealServer 504.
- the encoded audio is broken into discrete subfiles at, for example, one-minute intervals represented by file segments 506 in Figure 5A (numbers 1, 2 . . .), which are stored in the archive or database 110.
- the Wall Clock tool provided by RealAudio employs timestamps in each of the file segments 506, and in the RealText™ encoded text stream (provided over the communications link 124) to synchronize them.
- the Wall Clock function is supposed to maintain synchronism between a RealText™ file and a RealAudio file, regardless of the length or duration of each file. However, tests have shown that the function does not work and thus synchronism is lost. Therefore, as described herein, a solution employed by an aspect of the invention creates several individual one-minute files and stitches them together to provide appropriate synchronism.
- the distribution server 108 retrieves each stored file segment 506, stitches them together, and combines them with the encoded text stream, to generate a synchronously combined audio and text file representing a single audio event or discrete media content.
- a human operator may associate a title or name to the combined file, which forms one of several titles or event names of a playlist 508.
- the client workstation 136 includes a browser or communication software to establish a communication link with the distribution server 108 via the network 132.
- the client workstation 136 also includes a Real Player application for receiving a subscriber-selected title or audio event from the playlist 508 for replay and display, as described herein.
- the RealServer receives the encoded audio stream and, under block 524, stores the encoded audio stream in discrete one-minute RealAudio format files in the database 110.
- the system receives the incoming encoded audio stream, parses it into discrete blocks, and stores each block as a separate file. "Parsing" simply refers to breaking the incoming information signal into smaller chunks. While one-minute chunks are described, the system may divide the stream into larger chunks; however, the resulting synchronism will be correspondingly coarser. If blocks shorter than one minute are used, then the system may provide even closer synchronism between the audio and text streams; however, greater processing is required to store and retrieve the resulting greater number of discrete files.
- the RealServer constructs the playlist 508 that identifies each consecutive one-minute file segment.
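As a rough illustration of the parsing and playlist steps above, the following Python sketch divides a stream of known duration into consecutive fixed-length segments and lists them in order. The function name `build_playlist`, the file naming scheme and the `.rm` extension are hypothetical; the sketch only computes segment boundaries and does not touch any RealNetworks software.

```python
def build_playlist(total_seconds, segment_seconds=60, prefix="hearing"):
    """List (filename, start_offset, duration) for consecutive segments."""
    playlist = []
    start = 0
    index = 1
    while start < total_seconds:
        # The final segment may be shorter than the nominal length.
        duration = min(segment_seconds, total_seconds - start)
        playlist.append((f"{prefix}_{index:04d}.rm", start, duration))
        start += duration
        index += 1
    return playlist


# A 150-second stream yields two full one-minute segments and a 30-second tail.
playlist = build_playlist(150)
```

A smaller `segment_seconds` value gives finer synchronization at the cost of more files to store and retrieve, which is the trade-off noted above.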
- the RealServer receives the initial line of text (such as in ASCII format) from the scopist workstation 120 during a telnet session.
- a RealServer executes a script to initiate two substantially simultaneous processes.
- the first process starts the RealText process to encode the received text into RealText format.
- the second starts a G2SLTA process to simulate a live broadcast of audio by identifying from the playlist and retrieving the first one-minute file.
- the G2SLTA, RealText and other functions are provided by the software development kit (the RealG2 SDK) distributed by RealNetworks.
- the RealServer transmits the RealText encoded text and RealAudio file to a requesting subscriber.
- the RealServer determines whether an end of hearing flag is received in the current RealAudio file. If it is, then the routine ends in block 536.
- the RealServer retrieves the next audio file identified in the playlist and receives the next lines of ASCII text. The routine then moves back to block 530 where the RealServer encodes the received text and retrieves for transmission the next audio file.
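The transmit-and-check loop described above might look roughly like the following Python sketch. It is an assumption-laden illustration: the `Segment` record, its `end_of_hearing` field and the `stream_hearing` generator are invented names, and no actual encoding or network transmission is performed.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    filename: str
    end_of_hearing: bool = False  # stand-in for the end of hearing flag


def stream_hearing(playlist, text_batches):
    """Yield (audio file, text lines) pairs until the end flag is seen."""
    for index, segment in enumerate(playlist):
        # Pair each audio segment with the text received for it, if any.
        lines = text_batches[index] if index < len(text_batches) else []
        yield segment.filename, lines
        if segment.end_of_hearing:
            break  # the routine ends once the flagged file has been sent


playlist = [
    Segment("seg_0001.rm"),
    Segment("seg_0002.rm", end_of_hearing=True),
    Segment("seg_0003.rm"),  # never sent: it follows the end flag
]
batches = [["Line one."], ["Line two.", "Line three."]]
delivered = list(stream_hearing(playlist, batches))
```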
- the stitching together of the consecutive one-minute audio files may be performed to generate a single complete stored audio file representing the complete hearing.
- the hearing may be broken into two or more separate files, where each file is considerably longer than a single minute.
- a lengthy hearing may be broken into separate parts, e.g. to ease transcription.
- Each audio file (and associated text or other character transcription file) includes a start of hearing flag indicating the beginning of the hearing or other content and an end of hearing flag indicating the end of the hearing/content.
- a Windows Media™ audio encoder 510 receives the incoming audio stream and converts it into an encoded audio stream, which is transmitted over the communication link 106 to a Windows Media™ server 512.
- the Windows Media™ server 512 routes the encoded audio for storage in the archive or database 110 over a communications link 514, and forwards the audio over the communications link 112 to the processing facility 140.
- the distribution server 108 then retrieves the stored audio (represented by line 516) and combines it with the text received over the communications link 124.
- the client workstation 136, employing not only a browser or communication facility but also a Windows Media™ player, requests and receives the audio and associated and synchronized text for replay on the workstation.
- the incoming voice audio is split and sent to the transcription agent at the voice recognition server 114, and the raw text is then transmitted to an editing agent at the scopist workstation 120.
- the incoming audio is also delayed for approximately five minutes under a delay block 540. Such a delay may simply involve storing the audio in the archive 110 for five minutes and then retrieving it.
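The delay block can be pictured as a simple time-stamped queue. The Python sketch below is one possible reading, not the described implementation: the `DelayLine` class and its method names are hypothetical, and an in-memory queue stands in for storing the audio in the archive and retrieving it later.

```python
from collections import deque


class DelayLine:
    """Hold audio chunks and release them after a fixed delay."""

    def __init__(self, delay_seconds=300):  # roughly five minutes
        self.delay = delay_seconds
        self.buffer = deque()  # (arrival_time, chunk) in arrival order

    def push(self, arrival_time, chunk):
        self.buffer.append((arrival_time, chunk))

    def pop_ready(self, now):
        """Return every chunk that has been held for at least the delay."""
        ready = []
        while self.buffer and now - self.buffer[0][0] >= self.delay:
            ready.append(self.buffer.popleft()[1])
        return ready


line = DelayLine(300)
line.push(0, "chunk-a")
line.push(60, "chunk-b")
early = line.pop_ready(299)   # nothing has waited long enough yet
first = line.pop_ready(300)   # chunk-a is released at exactly five minutes
second = line.pop_ready(400)  # chunk-b has now waited 340 seconds
```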
- An encoder block 542 receives the delayed audio and combines it with the edited text from the workstation 120 to create an encoded ASF (Advanced Streaming Format) stream.
- the Windows Media™ server 512 receives the merged audio and text ASF stream and streams it to a client browser 544, such as Internet Explorer or Netscape Navigator, running on the subscriber workstation 136.
- the client browser 544 plays the stream with a media player, such as Windows Media™ Player, and displays the text via Java scripts.
- the encoder 542 may be implemented in Visual Basic. The encoder receives the audio input from a sound card in the audio server 104, and receives the text from a TCP/IP connection with the processing facility 140.
- a voice or other audio stream, after being digitized, is input to a voice recorder block 546 running on the Windows Media™ audio encoder 510 of the audio server 104.
- the voice recorder encodes the voice signal, through a sound card, into an ASF voice stream.
- the ASF voice stream is received by the media server 512 and broadcast to the encoder 542.
- the encoder may be a Visual Basic program that encodes voice and text, so that the client browser, using the Media Player, may play the voice ASF broadcast.
- the encoder uses a TCP/IP connection to get the text input from the scopist workstation 120.
- the encoder then merges the text with the delayed voice into a combined ASF stream and sends it to the media server 512 for broadcast.
- a basic process flow is shown as beginning in a block 550 where the incoming voice audio is encoded by the voice recorder 546 into an audio ASF stream.
- the media server 512 receives the encoded stream and broadcasts it to the encoder 542.
- the encoder combines the audio with a received text stream to create a combined audio and text ASF stream.
- the media server 512 receives the combined ASF stream and broadcasts it to a client browser.
- the client browser 544 receives the combined audio and text ASF stream and replays it on the subscriber workstation 136.
- an example of a hardware platform for developing and distributing stored media and associated text or character files to subscribers includes a development environment 602 as a separate computer platform for testing content distribution.
- a Redundant Array of Inexpensive Disks ("RAID") 604 stores the encoded media (audio, video, etc.) and associated text or character files.
- a RAID 5 system is employed.
- a RAID 5 system stripes data across the drives in large sectors, with a rotating parity stripe.
- a RAID 5 system does not employ a dedicated parity drive, and therefore write performance is better than in RAID systems that do.
- the RAID 5 system also provides better storage efficiency than RAID systems that mirror data. Further details regarding RAID systems may be found, for example, at http://www.vogon.co.uk.
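To make the rotating parity stripe concrete, the following Python sketch lays out one stripe across several drives and rebuilds a lost block by XOR. It is a toy model under stated assumptions: the function names are hypothetical, and the particular rotation rule used to place the parity block is just one common convention, not something specified in the text.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def write_stripe(data_blocks, stripe_index):
    """Lay out one stripe over len(data_blocks) + 1 drives.

    Assumption: the parity block moves one drive to the left on each
    successive stripe, so no single drive holds all of the parity.
    """
    drives = len(data_blocks) + 1
    parity_drive = (drives - 1 - stripe_index) % drives
    stripe = list(data_blocks)
    stripe.insert(parity_drive, xor_blocks(data_blocks))
    return stripe, parity_drive


def rebuild(stripe, lost_drive):
    """Recover the block on a failed drive by XOR-ing the survivors."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_drive]
    return xor_blocks(survivors)


stripe0, parity0 = write_stripe([b"AUDI", b"TEXT", b"META"], stripe_index=0)
stripe1, parity1 = write_stripe([b"MORE", b"DATA", b"HERE"], stripe_index=1)
recovered = rebuild(stripe0, lost_drive=1)  # drive 1 held b"TEXT"
```

With three data blocks per stripe, one quarter of the raw capacity goes to parity, compared with one half for a mirrored pair.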
- the RAID systems store archives of transcripts and associated audio for hearings or other content. While RAID systems are depicted, alternative embodiments may employ any known data storage system, such as automated tape cartridge facilities produced by Storage Technology of Louisville, Colorado or by ADIC of Redmond, Washington. Additionally, some subscribers may maintain their own secure database and archives.
- the data model (described below) and other aspects of the system may be licensed or sold to third parties or subscribers, particularly those wishing to maintain their own secure database of archived content.
- the system functionality described herein may be provided by a secure subscriber server, independent from the publicly accessible system described herein. Users of such a secure system may simply access the secure database and receive synchronized streaming media and text after being first authorized by the system. Indeed, the development environment of Figure 6 represents such an alternative embodiment of a separate, secure system.
- a G2 server (sold by RealNetworks) and database server 606 provide an interface with the RAID system 604, and communication with a web server 610, through a firewall 608.
- the firewall 608 may be implemented by a secure server or computer running any variety of firewall or network security products.
- the web server 610 may include an authentication server or implement authentication services.
- the web server 610 communicates with the network 132 to receive requests for web pages and files stored in the RAID system 604, and delivers results of such requests to a test base of workstations or computers operated by test subscribers who are coupled to the network.
- a separate production system 614 is quite similar to the development environment 602, but it includes substantially more storage and processing capabilities. As shown in Figure 6, the production system employs three RAID systems 604. Two G2 servers and a separate single database server 606 are coupled between the firewall computer 608 and the RAID systems 604. A Domain Name Server (“DNS”) 616 is coupled between the web and authentication server 610 and the network 132. The DNS converts domain names into Internet protocol (“IP”) addresses. IP addresses consist of a string of four numbers up to three digits each, which are associated with IP address text names (e.g., www.hearing.com). A separate DNS helps speed resolving IP addresses and facilitates content exchange with subscribers' workstations.
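The name-to-address mapping performed by the DNS can be illustrated with a small offline Python sketch. Everything specific here is an assumption: the `HOSTS` table and `resolve` function are hypothetical, and the address `192.0.2.10` comes from a range reserved for documentation rather than from any real record for the example name.

```python
import ipaddress

# Hypothetical lookup table standing in for a DNS zone.
HOSTS = {"www.hearing.com": "192.0.2.10"}


def resolve(name, table=HOSTS):
    """Return the dotted-quad address for a name, plus its four numbers."""
    # IPv4Address validates that the text is a well-formed IPv4 address.
    address = ipaddress.IPv4Address(table[name.lower()])
    parts = [int(part) for part in str(address).split(".")]
    return str(address), parts


dotted, numbers = resolve("WWW.Hearing.com")
```

The four numbers each fall between 0 and 255, which is why none is longer than three digits.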
- the DNS, web and authentication servers and/or G2 and database servers may be located close to an Internet backbone, such as at UUNET facilities or facilities provided by Akamai.
- the production environment 614 represents a suitable implementation of the distribution server 108.
- a separate web server, such as an Apache web server, is provided for handling web page delivery functions, while a separate server handles audio and text file retrieval and streaming, such as G2 servers by RealNetworks.
- a single server may perform all the functions, or even more servers may be employed to further compartmentalize various functions performed by the system.
- Referring to FIG. 7, a programming schematic illustrates functionality performed by the above hardware and represents how the system starts processes, manages functions and stores data.
- the system includes five main processes: broadcasting under block 702, audio encoding under block 704, processing under block 706, archiving and synchronizing under block 708 and web interfacing under block 710.
- live audio or other media is received from an event, such as from Senate and House hearing rooms, where the audio is controlled through audio technicians or operators and multiport patch panels, which may be positioned at or near the hearing rooms.
- an audio encoder 712 receives the live audio feed, converts it from analog to digital, and then encodes the digitized audio for distribution or routing to the processing block 706.
- the audio encoder may be the RealMedia server described above.
- a G2 server 714 in the processing block 706 receives the encoded audio and rebroadcasts the live encoded audio stream in RealMedia format to a voice writing block 716.
- the voice writing block 716 receives the live audio stream and, using voice recognition software, creates a text stream that is routed to a scoping or editing block 718.
- the scoping block 718 cleans up or edits the text stream to fix errors and provide additional content or value to the text.
- the scoping block 718 then feeds or streams the accurate or edited text stream to an RT receiver 720.
- the RT receiver receives the edited text stream and routes it to a keyword alerts block 722, a RealText™ encoder 724 and a database server 726.
- the keyword alerts block 722 scans the text stream for keywords that have been previously submitted by subscribers, as described herein.
- the RealText encoder block 724 employs the RealNetworks RealText encoder to encode the text into a RealMedia text stream for use by Real Player client software residing on subscriber workstations.
- the database server block 726 stores the text stream in a predefined data structure, which forms part of the archiving process.
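The routing role of the RT receiver, which passes each edited line to three downstream blocks, can be sketched as a simple fan-out. This Python sketch uses invented names (`RTReceiver`, `attach`, `receive`), and the three sinks are trivial stand-ins: the "encoder" merely tags the line and does not produce real RealText markup.

```python
class RTReceiver:
    """Forward each edited text line to every attached downstream block."""

    def __init__(self):
        self.sinks = []

    def attach(self, sink):
        self.sinks.append(sink)

    def receive(self, line):
        for sink in self.sinks:
            sink(line)


alert_hits = []    # stand-in for the keyword alerts block
encoded = []       # stand-in for the text encoder block
stored_lines = []  # stand-in for the database server block

receiver = RTReceiver()
receiver.attach(lambda line: alert_hits.append(line) if "budget" in line.lower() else None)
receiver.attach(lambda line: encoded.append(("encoded", line)))
receiver.attach(stored_lines.append)

receiver.receive("The hearing will come to order.")
receiver.receive("We turn to the budget resolution.")
```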
- An archiving block 728 forming part of the archiving and synchronizing process 708 stores the audio and text streams for all processed hearings or other events received from the processing block 706.
- the database server block 726 interfaces with the archiving block 728, stores text and media streams for accurate retrieval and subscriber querying, and maintains an updated index of all stored and archived audio, text and other content files (such as a "playlist" in the RealAudio embodiment).
- a synchronizing block 730 receives the audio and text streams from the archiving block 728, and synchronizes the text and audio streams for transmission to the web interface process 710. Audio and text stream blocks 732 in the web interface process 710 receive the audio and text streams from the synchronizing block 730 and provide such synchronized streams to requesting subscribers' workstations.
- Referring to FIG. 8, a schematic block diagram illustrates data flow between processes under the system of Figure 1.
- the encoded media is routed by the media server or distribution server 108 to the transcription generation facility or processing facility 140.
- the distribution server stores the encoded media and retrieves it after some delayed time y.
- the time delay corresponds to the time it takes the processing facility to generate the transcription for the encoded media.
- the voice writing process 716 receives the encoded media at the time x where x corresponds to the start time of the hearing, audio event or beginning of the aural media.
- the scoping process 718 receives the raw text produced by the voice writing process, edits it and provides it to the RT receiver process 720.
- the incoming encoded media includes a hearing ID that uniquely identifies each hearing or other audio content, which may be provided by the G2 server 714.
- the RT receiver process sets the time x to the current system time.
- the RT receiver process receives a line from the scopist workstation and sets a delay value y to the current system time.
- the incoming spoken words are parsed into lines, where each line may represent, for example, approximately 85 character spaces, approximately 15 words or an average line of text on a document, although many other methods of parsing incoming audio and transcribed text may be performed.
- each line will represent parsing of the incoming audio into smaller increments.
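One simple way to break transcribed text into lines of roughly 85 character spaces is shown in the Python sketch below. It is only an illustration: the function name `parse_into_lines` is hypothetical, and wrapping at word boundaries with the standard `textwrap` module is an assumed parsing method, one of the many that the text allows.

```python
import textwrap


def parse_into_lines(text, width=85):
    """Split text at word boundaries into lines of at most `width` characters."""
    return textwrap.wrap(text, width=width)


# Thirty 9-letter words: eight fit on an 85-character line, a ninth would not.
sample = " ".join(["testimony"] * 30)
lines = parse_into_lines(sample)
```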
- the RT receiver process 720 determines a delta value indicating how long from the beginning of the hearing the currently received line was spoken. Delta represents the current system time y minus the initial time x.
- the RT receiver process stores in the database the currently transcribed line, the value y and/or the value delta. Thereafter, the process loops back to block 904, and blocks 904 through 908 are repeated until the transcription is completed.
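The timestamping loop above reduces to recording, for every received line, the current time y and the offset delta = y - x from the start time x. A minimal Python sketch follows, assuming a hypothetical `LineTimestamper` class, an in-memory list in place of the database, and an injectable clock so the example is repeatable.

```python
import time


class LineTimestamper:
    """Record each transcribed line with y and delta = y - x."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.x = clock()   # initial time, set when the process starts
        self.records = []  # stand-in for the database

    def receive_line(self, text):
        y = self.clock()    # system time when the line arrives
        delta = y - self.x  # offset from the beginning of the hearing
        self.records.append({"line": text, "y": y, "delta": delta})
        return delta


# A fake clock makes the example deterministic: start at 100.0 seconds,
# then lines arrive at 103.5 and 109.0 seconds.
ticks = iter([100.0, 103.5, 109.0])
stamper = LineTimestamper(clock=lambda: next(ticks))
first_delta = stamper.receive_line("The committee will come to order.")
second_delta = stamper.receive_line("The chair recognizes the witness.")
```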
- the transcribed text is routed back to the distribution server 108, from the processing facility 140, with the timestamp indications (such as the value delta), as shown in Figure 8.
- the distribution server then combines the received text with the audio or multimedia content that has been delayed by the value y to thereby synchronize the text with the original media, and delivers it over the communication line 130.
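On the delivery side, the stored delta values can be used to decide which lines of text should already be visible at a given point in the media. The Python sketch below is an assumption about how that lookup could work, not the described implementation: the `lines_due` function is hypothetical, and it treats delta as an offset in seconds into the delayed media.

```python
import bisect


def lines_due(records, playback_seconds):
    """Return the text of every line whose delta has been reached.

    `records` is a list of (delta, text) pairs sorted by delta.
    """
    deltas = [delta for delta, _ in records]
    count = bisect.bisect_right(deltas, playback_seconds)
    return [text for _, text in records[:count]]


records = [
    (3.5, "The committee will come to order."),
    (9.0, "The chair recognizes the witness."),
    (61.0, "Thank you, Mr. Chairman."),
]
visible = lines_due(records, playback_seconds=10)
```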
- PHP is a scripting language similar to JavaScript or Microsoft's VB script.
- PHTML contains programming executed at a web server rather than at a web client (such as a browser).
- a file suffix ".phtml” indicates an HTML page having a PHP script.
- PHP is an embedded, server-side scripting language similar in concept to Microsoft's ASP, but with a syntax similar to Perl. PHP is used on many web sites, particularly with Apache web servers.
- PHTML files may represent user interface pages, such as web pages described below, while PHP files represent files and executables running on the server. Interaction between different PHTML scripts in web page forms provided from subscribers via subscriber workstations 136 is handled by the distribution server 108 (or web servers 610) with "categories functions." Categories, as generally referred to herein, refer to character codes (such as two-character codes) that distinguish functional code groups. Functions, as generally referred to herein for pages, are actions within a category. Most functions originate as either a hidden item in a form (e.g., automatically executed when the page is served) or when the submit buttons on web page forms are selected. Categories need only be unique within one PHTML file. Functions need only be unique within a category.
- in each PHTML file, functions specific to a category are prepended with a file identifier and category identifier. For example, all functions in a company administration file (cadmin.phtml) for a Contact Information category are prefixed with "CAO."
- Each category has a dispatch function that takes the function variable (e.g., "$func") and passes it to the particular function that handles it.
- the dispatch function is prefixed with the file's prefix code and suffixed with the category's code (e.g., "function CADispatchCI($func)").
- Each file also has a main category dispatch function that calls the individual dispatch functions, such as in the form "function CACatDispatch($cat, $func)," where "$cat" refers to a category variable.
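The two-level dispatch described above (a per-file category dispatcher that hands off to a per-category function dispatcher) is sketched below in Python rather than PHP, purely to show the control flow. The handler names, the "show" and "save" function codes and the form fields are hypothetical; only the shape of `CACatDispatch($cat, $func)` and `CADispatchCI($func)` is taken from the text.

```python
def ca_ci_show(form):
    return "show contact info"


def ca_ci_save(form):
    return f"saved {form.get('email', '')}"


# Functions need only be unique within their category.
CI_FUNCTIONS = {"show": ca_ci_show, "save": ca_ci_save}


def ca_dispatch_ci(func, form):
    """Category-level dispatcher, analogous to CADispatchCI($func)."""
    handler = CI_FUNCTIONS.get(func)
    if handler is None:
        raise KeyError(f"unknown function {func!r} in category CI")
    return handler(form)


# Categories need only be unique within one file.
CATEGORY_DISPATCH = {"CI": ca_dispatch_ci}


def ca_cat_dispatch(cat, func, form):
    """File-level dispatcher, analogous to CACatDispatch($cat, $func)."""
    dispatcher = CATEGORY_DISPATCH.get(cat)
    if dispatcher is None:
        raise KeyError(f"unknown category {cat!r}")
    return dispatcher(func, form)


result = ca_cat_dispatch("CI", "save", {"email": "clerk@example.com"})
```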
- the web pages may also be implemented in XML (Extensible Markup Language) or HTML (HyperText Markup Language) scripts that provide information to a subscriber or user.
- the web pages provide facilities to receive input data, such as in the form of fields of a form to be filled in, pull-down menus or entries allowing one or more of several entries to be selected, buttons, sliders or other known user interface tools for receiving user input in a web page.
- the terms "screen," "web page" and "page" are generally used interchangeably herein. While PHP, PHTML, XML and HTML are described, various other methods of creating displayable data may be employed, such as the Wireless Application Protocol ("WAP").
- the web pages are stored as display descriptions, graphical user interfaces or other methods of depicting information on a computer screen (e.g., commands, links, fonts, colors, layouts, sizes and relative positions, and the like), where the layout and information or content to be displayed on the page is stored in a database.
- a "link” refers to any resource locator identifying a resource on a network, such as a display description provided by an organization having a site or node on the network.
- a "display description," as generally used herein, refers to any method of automatically displaying information on a computer screen in any of the above-noted formats, as well as other formats, such as email or character/code-based formats, algorithm-based formats (e.g., vector generated), or matrix or bit-mapped formats. While aspects of the invention are described herein using a networked environment, some or all features may be implemented within a single-computer environment. To more easily describe aspects of the invention, the depicted embodiments and web pages are at times described in terms of a subscriber's interaction with the system. In implementation, however, data input by a subscriber is received by the subscriber workstation 136, where the workstation then transmits such input data to the distribution server 108. The distribution server 108 then performs computations or queries or provides output back to the subscriber workstation, typically for visual display to the subscriber, as those skilled in the relevant art will recognize.
- a home page 1000 for beginning a session or interaction with a web site for accessing legislative hearings online (such as at the URL http://www.hearingroom.com).
- a new user may navigate to various additional pages (not shown) that provide information regarding the system, without requiring the user to become a subscriber.
- Such information may include information regarding subscription levels.
- Such information may include the name of the subscription level (gold, silver, platinum, and the like), maximum number of users per subscription level, maximum number of live hearings per week per subscription level, and the like.
- the subscription levels may include a "View all Live" premium subscription level that would permit such a subscriber to view all hearings live (at a significantly higher annual fee than other subscription levels).
- Clicking a login button 1002 causes the distribution server 108 to provide or serve up to the requesting computer (such as the subscriber workstation 136) a login dialog box 1002 shown at Figure 11.
- the subscriber enters a user name and password in input fields 1002 and 1004 before clicking an okay button to be authorized to log on to the system.
- a new session begins.
- the system uses session management routines provided by the personal home page version 4 software ("php4").
- a script may provide similar functionality.
- the subscriber's browser running on his or her computer (e.g., workstation 136) sends a cookie to the distribution server 108. If the cookie does not exist, the server creates one.
- the cookie contains a unique identifier that the web server software (such as php) reads in a local file, which contains a session state for that session.
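The cookie-based session handling above can be approximated with the Python sketch below. It is not the php4 session code: the `SessionStore` class is hypothetical, an in-memory dictionary stands in for the local session file, and the identifier is generated with Python's `secrets` module as an assumed stand-in for whatever the web server software uses.

```python
import secrets


class SessionStore:
    """Map cookie identifiers to per-session state."""

    def __init__(self):
        self._sessions = {}  # stand-in for the server's local session files

    def open(self, cookie=None):
        """Return (cookie, state), creating a new session when needed."""
        if cookie is None or cookie not in self._sessions:
            cookie = secrets.token_hex(16)  # new unique identifier
            self._sessions[cookie] = {}
        return cookie, self._sessions[cookie]


store = SessionStore()
cookie, state = store.open()             # first request: no cookie yet
state["user"] = "demo"
same_cookie, same_state = store.open(cookie)   # later request, same cookie
fresh_cookie, fresh_state = store.open("bogus")  # unknown cookie: new session
```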
- Subscriber authentication may be handled by a web server, such as an Apache web server.
- Referring to FIG. 12, an example of a custom web page as displayed to the logged-in subscriber is shown.
- the page displays the subscriber's name in a name portion 1202 ("Demonstration, Inc.”), and an introductory message 1204 ("Hearing lineup for Thursday, July 20th, 2000"). Displayed below in a portion 1206 are short descriptions of the hearings requested to be delivered to the subscriber.
- a background link 1208 allows the subscriber to click thereon to receive background information with respect to each listed hearing, as described below.
- An archive button 1210 links to a page to permit a subscriber to search for and/or order an archived transcript (such as the page of Figure 25).
- a subscribe button 1212 allows the subscriber to subscribe to and receive a transcript of an upcoming hearing (such as by providing a page as shown in Figure 13).
- a hearings calendar link 1302 allows the subscriber to click thereon to receive a chronological list of upcoming hearings.
- a future hearings by committee link 1304 allows the subscriber to click thereon to receive a list of future hearings sorted by committee.
- a calendar 1306 displays the days of the current month, with links for each day. Clicking on the link for a particular day allows the subscriber to receive a web page displaying a list of all hearings for that day. For example, clicking on the day May 20th in the calendar 1306 causes the system to display a day listing 1308 that provides the subscriber with access to Senate and House of Representatives hearings information.
- a Committee Details link 1312 provides information on the committee, although this link may be omitted.
- the system provides a web page 1500, shown in Figure 15, that displays particular Energy and Natural Resources hearings for that day. As shown in Figure 15, each hearing is listed with its title, the Senate committee or subcommittee conducting the meeting, and its time and date.
- An order button 1502 allows the subscriber to order a transcript of the hearing, while a background link 1504 allows the subscriber to receive background information, as described below.
- Referring to FIG. 14B, an example of a settings page 1450 is shown, which may be retrieved by clicking on the settings button 1214 (Figure 12).
- a subscriber may edit contact information, change passwords, change keyword groups (described below), determine account information (under “Assets”), and edit company information as shown.
- Clicking on the order button 1502 causes the system to provide a subscription web page, such as the page 1600 shown in Figure 16.
- the subscriber may click one of three radio buttons: a live streaming hearing button 1602 to receive the transcript in near real time, a two-hour transcript button 1604 to receive the transcript with a two-hour time delay, and a next day transcript button 1606 to receive a transcript the day after the hearing.
- the two-hour option may be replaced by a "same day" option.
- a buy button 1608 allows the subscriber to purchase or subscribe to the selected hearing, while a cancel button 1610 cancels the transaction and returns the subscriber to the previous screen. Clicking the buy button causes the system to deduct the appropriate amount from the subscriber's account, while clicking the cancel button credits the subscriber's account.
- After the subscriber clicks the live streaming hearing button 1602 and then the buy button 1608, as shown in Figure 17, the subscriber is returned to the hearing listings page 1500 as shown in Figure 18. As shown, the hearing ordered by the subscriber no longer has associated with it the order button 1502, but now has associated with it a keywords button 1802. When the subscriber clicks the keywords button, the system provides a keywords page 1900, as shown in Figure 19. Keyword entry fields 1902 allow the subscriber to enter one or more keywords that the subscriber wishes the system to identify within a hearing, and to provide notification to the subscriber when such terms are uttered during the hearing. As shown in Figure 19, five keywords are entered in five keyword fields. The subscriber may enter as few as one word, and the system can provide more than five fields.
- An update keywords button 1904 allows the subscriber to select the keywords and return to the previous screen.
- a duplicate group button 1906 allows the subscriber to copy the words in these fields for use as keywords for other hearings.
- a delete group button 1908 allows the subscriber to delete all entries within the keyword entry fields 1902.
- the system provides an email message 2000 to the subscriber, providing the subscriber with notification that one or more of the subscriber's keywords have been uttered during a selected hearing.
- the email message includes a subject line 2002 that identifies the email message as an alert ("LiveWireAlert"), and the title of the hearing.
- the body of the message indicates to the subscriber which keywords have been uttered (in this case, "Richardson” and "Exxon”), and provides a link 2006 to allow the subscriber to click thereon to immediately listen to the hearing in progress.
- the system provides an email notification when two of the subscriber's keywords were identified within the transcript of the hearing.
- as few as one, or as many as all, of the fields entered by the subscriber may be required to be located within the transcript before the system provides an email notification to the subscriber.
- more detailed query constructs may be created, such as using Boolean connectors, and the like.
- the keyword alerts process 722 scans the text generated by the processing facility 140 for any keywords entered by one or more subscribers.
- the system accumulates multiple mentions of the same keyword until a threshold is exceeded. For example, the system may require five mentions of one keyword, and a single mention of a second keyword in a subscriber's list of keywords before providing an alert to the subscriber. Thus, the system does not provide alerts every time a keyword is mentioned, but only for occurrences where a keyword is mentioned a sufficient number of times to indicate that the substance of the hearing may correspond to that keyword.
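The threshold behavior described above might be sketched as follows. This is only an illustration: the class name `KeywordAlert`, the per-keyword threshold mapping, and the simple word tokenizer are assumptions for the example, and keywords are assumed to be single words.

```python
import re
from collections import Counter

class KeywordAlert:
    """Accumulate keyword mentions across transcript lines until every threshold is met."""

    def __init__(self, thresholds):
        # thresholds maps each keyword to the minimum number of mentions required.
        self.thresholds = {k.lower(): n for k, n in thresholds.items()}
        self.counts = Counter()

    def feed(self, line):
        """Count keyword mentions in one line; return True once all thresholds are reached."""
        for word in re.findall(r"[a-z0-9']+", line.lower()):
            if word in self.thresholds:
                self.counts[word] += 1
        return all(self.counts[k] >= n for k, n in self.thresholds.items())
```

With `KeywordAlert({"Exxon": 5, "Richardson": 1})`, a single mention of either keyword does not trigger an alert; the alert fires only after the fifth mention of the first keyword, provided the second has been mentioned at least once.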
- the system provides not only an alert to the subscriber, but also a portion of the transcript that includes the one or more keywords and associated text to provide a context for the located keywords within the transcript.
- the system may provide the line of transcript before and after the line containing the one or more keywords (similar to that shown in Figure 27). Based on this alert and portion of transcript text, the subscriber may wish to purchase or obtain the entire transcript.
- the alert may include additional information to permit the subscriber to obtain the hearing and transcript (such as the link 2006).
- future hearings not specifically ordered by a subscriber may be scanned, and the system may provide corresponding alerts.
- the system may provide an augmented version of alert notification to that described above.
- the system may provide a smaller context for the alert terms (such as only the five words that precede and follow the keyword).
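A smaller context of this kind, limited to a few words on either side of the keyword, could be extracted along the following lines. The function name `keyword_context` and the punctuation handling are assumptions made for this sketch.

```python
def keyword_context(text, keyword, window=5):
    """Return each occurrence of keyword with up to `window` words before and after it."""
    words = text.split()
    snippets = []
    for i, word in enumerate(words):
        # Ignore surrounding punctuation and case when comparing.
        if word.strip(".,;:!?\"'()").lower() == keyword.lower():
            start = max(0, i - window)
            snippets.append(" ".join(words[start:i + window + 1]))
    return snippets
```

Each snippet keeps the original wording and punctuation, so it can be placed directly in a short alert message.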
- the subscriber may order a full copy of the transcript to be received over the portable device, or receive a much greater summary or larger context of the proceedings that include the search terms.
- the subscriber may be able to listen to the audio over a cell phone, for example.
- the subscriber may have to pay additional amounts for such service.
- a web page or window such as a window 2100 shown in Figure 21, is provided to the subscriber.
- the window includes a heading 2102 that identifies the hearing, time and date, and provides the background link 1504 for additional information.
- the conventional Real Player controls 2104 provide a graphical user interface for the user to control the delivery of audio (such as pausing, stopping, rewinding, adjusting volume, etc.), although speed, pause and other functions may not be available for the delivery of live audio.
- a transcription portion 2106 displays the text transcription previously created by the processing facility 140.
- a hearing details section 2202 provides details regarding the date, time, committee, location and full name of the hearing.
- a member list link 2204 links to a list of committee members, while a supporting materials section 2206 provides related information regarding the particular hearing identified in the hearing details section, such as provided by a link 2208.
- a list of committee members is presented as organized by Republicans and Democrats.
- Each committee member has an associated link 2302, whereby clicking on such link causes the system to display details regarding that particular committee member.
- Clicking on any of the supporting materials links 2206, such as a link 2208 causes the system to display the associated materials, such as a press release displayed in a window 2400 shown in Figure 24.
- All background research is preferably from a primary source, and includes original material retrieved from well-respected and reliable sources.
- the research provided by the system may be in-depth public information with numerous citations. General background information may be avoided, but instead information that is relevant and not redundant is provided to subscribers in a single format.
- researchers may also be responsible for retrieving prepared testimony from hearing rooms and scanning that testimony into electronic form to be posted with other background information on the system. Such researchers may also be charged with the task of making physical audio connections in certain hearing rooms and testing functionality of these connections. Such researchers or "legislative interns" who have need to access hearing rooms are issued a Capitol Hill Press Pass from the Senate Radio/TV Gallery to carry out such duties. Furthermore, researchers may also do internal research for the system that does not get published to subscribers. For example, researchers may train the voice recognition software, where such training focuses on jargon, abbreviations, scientific terms, foreign language usage, proper names and any other terms that may be received in the incoming audio stream. Such training may be particular to an upcoming hearing.
- the system displays an archive page 2500 to permit the subscriber to search through stored hearing transcripts for keywords.
- a keyword search field 2504 allows the subscriber to enter a keyword to be searched, while a find button 2506 initiates the query.
- An advanced search link 2508 allows the user to access enhanced search tools, such as multifield Boolean searching, and the like.
- Figure 26 shows an example of a query result screen 2600 that lists three hearings that include the search term "AOL."
- a hearing listed in the query results without an order button indicates that the subscriber has previously ordered that hearing, and thus need not pay additional costs to access the full transcript.
- a listen link 2602 allows the subscriber to listen to the hearing as it has been previously stored in the database of the system.
- a read link 2604 allows the subscriber to retrieve from the database only the text portion of the transcript to view.
- a view results button 2606 allows the subscriber to view the line of the transcript containing the search term, and the single lines that precede and succeed it, as shown in a screen 2700 of Figure 27.
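The view-results behavior, which shows the matching line together with the single lines that precede and succeed it, might be sketched as below. The function name and the case-insensitive substring match are assumptions made for illustration.

```python
def matching_lines_with_context(lines, term):
    """For each line containing term, return that line with its neighbors before and after."""
    results = []
    for i, line in enumerate(lines):
        if term.lower() in line.lower():
            # Slicing clamps at the ends, so first and last lines simply get fewer neighbors.
            results.append(lines[max(0, i - 1):i + 2])
    return results
```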
- the subscriber may click the listen link 2602 to view both the transcript text and listen to the audio, as shown in Figure 29.
- the subscriber may select portions of the transcript text, such as a block of text 3000. The subscriber may then cut and paste the selected text, such as pasting the text into an email message 3100 shown in Figure 31.
- the subscriber may readily employ such transcribed hearings for use in other documents (such as word processing documents), for routing to other individuals (such as via email), or many other purposes.
- a client-side search routine may provide local searching capability to the subscriber.
- the workstation 136 may provide a search module, with a user interface similar to that in Figure 25 or 19, that permits the subscriber to enter key words or query search terms.
- the workstation analyzes the streaming text received for the key words.
- the system provides an indication or alert to the subscriber.
- the subscriber may perform additional tasks on the workstation, and then receive a pop up dialog box or other notification to permit the subscriber to redirect his or her attention back to the streaming audio and transcription.
- the subscriber may also employ other known search tools, such as the common "Find” function (often available under a key combination of "Alt"- “F”). Thus, the subscriber may search in retrieved text for one or more search terms.
- Figure 26 shows an example of a listing of three hearings and important information regarding each hearing, such as the hearing title, its date, time, and the like. This information is captured and stored in the database. The information is stored in the database as separate fields for each hearing to permit subscribers to readily search and retrieve archived hearings. Of course, a greater number of fields may be stored for each hearing to permit greater searching capabilities of the database, although fewer fields may be employed.
Suitable Data Model
- the tables below identify various data objects (identified by names within single quotation marks preceding each table), as well as variables or fields within each table (under the "Column Name”).
- the tables also identify the "Data Type” for the variable (e.g., integer ("int”), small integer (“tinyint”), variable character (“varchar”), etc.).
- the "Constraints” column represents how the column number is generated.
- a “Description” column provides a brief description regarding some of the fields.
- Each table includes a "Primary Key” that is a unique value or number within each column to ensure no duplicates exist, and one or more "Key” values representing how extensively the tables are indexed.
- the data objects in the database may be implemented as linked tables as those skilled in the relevant art will appreciate.
- the "Excuse” field in the hearings table above identifies why audio could not be received or recorded from a hearing, such as lack of working microphones in the hearing room, problems with the audio server 104, and the like.
- the "Keywords” field allows the system to identify keywords, such as metadata, that can be used by search engines to identify a particular hearing. Rather than employing the Keywords field, the system may simply use the "Name" field for searches, where the name represents the title of the hearing.
- the "ResearchDone" field represents whether a human intern or other individual has performed the required research regarding a hearing, such as obtaining the members' names, any list of witnesses for the hearing, and background research regarding any relevant documents for the hearing (such as scans in .pdf or other format of previously prepared testimony). This research is used when the subscriber clicks on the background link 1504.
- the "Status" field represents one of eight flags indicating status of a particular hearing: (1) whether audio is to be recorded and stored for a hearing; (2) whether a transcript is to be obtained from the recorded audio; (3) whether the audio recording is in progress; (4) whether transcription of the audio is in progress; (5) whether the audio encoding storage is complete; (6) whether the audio files have been stitched together to create a complete single audio file (as described above); (7) whether transcription of the audio has been completed; and (8) whether the hearing is complete.
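One way to hold the eight status indications in a single small integer column is a set of bit flags, sketched below. The description does not say whether the flags may be combined, so treating them as combinable bits is an assumption here, as are the flag names.

```python
from enum import IntFlag

class HearingStatus(IntFlag):
    """Hypothetical bit flags mirroring the eight status indications, in the order listed."""
    RECORD_AUDIO = 1                 # (1) audio is to be recorded and stored
    WANT_TRANSCRIPT = 2              # (2) a transcript is to be obtained
    RECORDING_IN_PROGRESS = 4        # (3) audio recording is in progress
    TRANSCRIPTION_IN_PROGRESS = 8    # (4) transcription is in progress
    ENCODING_STORED = 16             # (5) audio encoding storage is complete
    AUDIO_STITCHED = 32              # (6) audio files stitched into a single file
    TRANSCRIPTION_COMPLETE = 64      # (7) transcription has been completed
    HEARING_COMPLETE = 128           # (8) the hearing is complete

# Example: a hearing that is to be recorded and transcribed, with recording under way.
status = HearingStatus.RECORD_AUDIO | HearingStatus.WANT_TRANSCRIPT
status |= HearingStatus.RECORDING_IN_PROGRESS
```

All eight flags set together give the value 255, which fits in one unsigned byte.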
- Keywords should be deleted after a hearing is concluded.
- An "Assets” table may store a list of a company's or subscriber's total assets, where a company may have multiple assets with different expiration dates due to different subscription programs (as described herein). Higher level logic may ensure that the assets list makes sense, whereby if one asset has a negative value, assets of a positive value are applied to offset the negative one.
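The offsetting logic for the assets list might look like the following sketch. Applying the earliest-expiring positive assets first is an assumption made for this example, as are the function name and the `(expiration, amount)` tuple layout.

```python
def offset_negative_assets(assets):
    """Apply positive assets against negative ones.

    assets is a list of (expiration, amount) tuples. Returns the remaining
    positive assets and any debt that could not be covered.
    """
    debt = -sum(amount for _, amount in assets if amount < 0)
    remaining = []
    # Assumption: consume the assets that expire soonest before later ones.
    for expires, amount in sorted(a for a in assets if a[1] > 0):
        used = min(amount, debt)
        debt -= used
        if amount - used > 0:
            remaining.append((expires, amount - used))
    return remaining, debt
```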
- a "Billing" table allows subscribers to create separate billing entries for separate billing events for themselves. A zero or negative hearing ID value means that the billing event was canceled, and that the funds were restored to the company's assets; entries should not be removed from the billing table.
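Canceling a billing event without removing its entry could be sketched as follows. Negating the hearing ID, rather than setting it to zero, is an assumption chosen so the original hearing stays recoverable; the row layout and function name are likewise hypothetical.

```python
def cancel_billing(billing_rows, row_id):
    """Mark a billing entry as canceled and return the amount to restore to the company's assets."""
    row = billing_rows[row_id]
    if row["hearing_id"] <= 0:
        return 0  # already canceled; nothing further to restore
    # The entry stays in the table; a non-positive hearing ID marks it as canceled.
    row["hearing_id"] = -row["hearing_id"]
    return row["amount"]
```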
- a "BillingNotes” table stores notes regarding a subscriber's billing entry. It is a separate table from the billing table to allow the billing table to use a fixed row size.
- a “Building” table may keep track of building abbreviations and names, such as "RHOB” representing an abbreviation for the Rayburn House Office Building.
- a “ClientCodes” table may represent subscriber code numbers.
- a "CmteLabel" table may store labels for committees, so that committee names will not have to all start with "Committee on" or "Subcommittee on."
- a “CmteRanks” table may store rankings of members in a committee.
- a “Cmtes” table provides a list of committees, including fields to indicate whether the committee is for the House or Senate, an internally determined committee number, and an internally determined subcommittee letter.
- a “Company” table describes a company or subscriber to the system.
- a “Contacts” table stores internal contact lists for the system, including names, phone numbers, addresses, etc.
- a “Costs” table may store a pricing structure for the system, and may include an access field representing an access level of the subscriber.
- a "FileTypes” table may store hearing background information to allow the system to properly display it to subscribers.
- a "Hearcmte” table may associate hearings to one or more committees. The hearcmte table stores the associations between hearings and committees. A similar table is used to store the relations between members and committees.
- the hearcmte table structure is: row number, hearing number, committee number.
- the row number is unique within the table; hearing number points to a hearing, and committee number points to a committee. In this way, a hearing can easily have many committees associated with it.
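The association table can be illustrated with a small in-memory SQLite database. The column names (`row_num`, `hearing`, `cmte`), the simplified `hearings` and `cmtes` tables, and the sample rows are assumptions for this sketch, not the schema of the described system.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with one hearing linked to two committees."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hearings (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE cmtes (id INTEGER PRIMARY KEY, name TEXT);
        -- One row per hearing/committee pair; row_num is unique within the table.
        CREATE TABLE hearcmte (
            row_num INTEGER PRIMARY KEY,
            hearing INTEGER REFERENCES hearings(id),
            cmte    INTEGER REFERENCES cmtes(id)
        );
        INSERT INTO hearings VALUES (1, 'Demo hearing');
        INSERT INTO cmtes VALUES (10, 'Energy and Natural Resources'), (20, 'Commerce');
        INSERT INTO hearcmte (hearing, cmte) VALUES (1, 10), (1, 20);
    """)
    return conn

def committees_for_hearing(conn, hearing_id):
    """Return the names of all committees associated with a hearing."""
    rows = conn.execute(
        "SELECT c.name FROM hearcmte hc JOIN cmtes c ON c.id = hc.cmte "
        "WHERE hc.hearing = ? ORDER BY hc.row_num",
        (hearing_id,),
    )
    return [name for (name,) in rows]
```

Adding another committee to a hearing is a single insert into `hearcmte`; neither the hearing row nor the committee row changes.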
- a "HearingInfo" table may store background information for hearings, including links to appropriate stored documents or other background materials described herein.
- a “MoreAccess” table may provide finer-grained access controls to an administrative section of the system.
- a “MoreAccessNames” table may contain English names for the MoreAccess table's field names.
- "Politparty” and “states” tables may list political party affiliations and states, respectively.
- a "Rooms” table may list buildings and room numbers in which a hearing may be located.
- a “SubscriptionLevels” table may list subscription levels for subscribers.
- a “Subscriptions” table may list all the subscriptions made on the system.
- a “Tasks” table may list tasks for processing by the system.
- a “TransactionTypes” table may list different purchase options for hearings provided to subscribers by the system.
- An “UnavailableReasons” table may list reasons why a hearing cannot be covered.
- a "Users” table may list users or subscribers to the system.
- a "WhatsNew” table may list information new to the web site or provided by the system, which may be displayed to subscribers.
- Figure 32 is a data model relationship diagram that shows the relationship between the various tables described above.
- the system constructs a dynamically expandable database that links audio or other media, associated transcriptions with respect to that media, and other associated content, such as the background information described herein, and subscriber and business information.
- the system overall acts as a production, storage and retrieval system.
- Those skilled in the relevant art can create a suitable database from the schema described herein using any known database, such as MySQL, Access by Microsoft or any Oracle or Sequel database.
- Figure 33 shows an example of a database schema for implementing the data model described herein within an Oracle database.
- the hardware platform, and associated software and software tools may, of course, employ future versions or upgrades thereto.
- a more robust search capability may be provided than that described above. For example, subscribers may be able to search not only hearing transcripts, but also search through all background materials associated with such hearings. Users may be able to provide additional refinements of searches, such as by searching sessions of Congress (e.g., "second session of the 106th Congress").
- each one-minute audio file need not be replayed for each one minute of audio to create the resulting complete file.
- the system may provide simply the audio of a hearing to a subscriber, without the associated transcription. Of course, providing such audio will be at a lower price, and may be offered on a pay per listen basis.
- the system may be more cookie-based for each session, whereby a password may be used only once.
- Live-wire alerts, or alerts regarding the system's recognition of a subscriber's key term in received audio, may be delivered using various telecommunications means, such as paging a subscriber, placing a prerecorded call to the subscriber's cell phone, sending an email message over a wireless link to a subscriber's palm-top or hand-held computer, and the like. Audio may be split so that one audio source may be effectively duplicated in real time to be sent to two or more locations, such as to the archiving facility and to the processing facility.
- the system may require subscribers to indicate at a specified time before a hearing whether the subscriber wishes to receive the transcription of the hearing to thereby provide sufficient time to gather background information.
- the audio server 104 may include automated or manual gain control to adjust line level and improve signal-to-noise ratio of audio input thereto.
- a digital signal processor may be employed to help split voices apart when multiple speakers are talking at once.
- the processing facility 140 may include speech recognition modules trained for individual voices of House or Senate speakers (such as training separate modules to the voices of separate senators). In other words, the archive stores "famous" voice files relating to speeches or other recorded audio with respect to particular people. These files may be used to train the voice recognition system. As a result, fewer transcription agents would be required. Furthermore, these files may be sold to others. Voice-over Internet protocol (IP) functionality may be employed by the distribution server 108 and processing facility.
- the distribution server may employ data mining techniques and audio mining.
- Dragon™ provides search tools for searching for words or phrases within audio files.
- subscribers may be able to search for keywords in audio files that have not been transcribed.
- the audio mining tools may review an audio file and generate an index. The index could then be published to subscribers who may perform text searches of the index and retrieve archived audio files.
- Known data mining techniques may be employed to search for patterns of desired data in stored transcripts. Such data mining tools and techniques may be provided to subscribers over a web interface.
- subscription to the system is sold on a yearly, declining-balance subscription model. Subscribers choose the hearing coverage and the timing for receiving transcripts (real time, two-hour delay, next day, etc.). The cost of each service ordered is deducted from the annual subscription fee. For example, live streaming of a legislative hearing may cost $500 for a "silver subscriber" (who pays a $5,000-a-year subscription fee), $400 for a "gold subscriber" (who pays a $10,000-a-year subscription fee) and only $300 for a "platinum subscriber" (who pays a $15,000-a-year subscription fee).
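The declining-balance arithmetic in this example can be worked through in a short sketch. The prices and fees are the example figures quoted above; the `Account` class and the function names are assumptions for illustration.

```python
from dataclasses import dataclass

# Example figures from the text: annual fee and live-streaming price per level.
ANNUAL_FEE = {"silver": 5000, "gold": 10000, "platinum": 15000}
LIVE_STREAM_PRICE = {"silver": 500, "gold": 400, "platinum": 300}

@dataclass
class Account:
    level: str
    balance: int

    @classmethod
    def open(cls, level):
        """Start the year with the full subscription fee as the balance."""
        return cls(level, ANNUAL_FEE[level])

    def order_live_stream(self):
        """Deduct the price of one live-streamed hearing and return it."""
        price = LIVE_STREAM_PRICE[self.level]
        self.balance -= price
        return price

def live_hearings_covered(level):
    """Number of live-streamed hearings the annual fee covers at this level."""
    return ANNUAL_FEE[level] // LIVE_STREAM_PRICE[level]
```

Under these figures the annual fee covers 10 live hearings at the silver level, 25 at gold, and 50 at platinum, so the higher levels lower the per-hearing price and raise the number of hearings covered.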
- Subscribers or other users of the system may employ a pay-per-view (or pay-per-listen) model where access to a single hearing and associated transcript may cost, for example, $750.
- the database schema includes customer administrative functions and fields to ensure that subscribers are billed correctly.
- the system may also employ additional fields to permit subscribers to track client and matter numbers and time spent for a matter to a subscriber's clients.
- a subscriber to the system may be a law firm that, in turn, may be required to track and bill its clients on a client and matter level.
- the database is used to populate web pages and provide information regarding hearings to subscribers.
- the database must keep an accurate list of committees, including committee names, member names, committee descriptions, a list of subcommittees and links to all committee web sites.
- for committee members, the database must include accurate information regarding the title (such as senator, representative, etc.), party affiliation, full name, role (e.g., chairman, secretary, etc.), member web sites, member biographies, etc.
- Hearing locations must also be maintained, such as location date, location description, room number, etc.
- background information regarding the hearing must be maintained by the database, including opening statements, prepared testimony, member list, witness list, and related materials (e.g., web sites, charts, diagrams, scanned text, etc.).
- the database must maintain accurate information regarding subscribers, such as the subscriber's name (typically a company name and logo), company address, account contact information, technical contact information (such as a technical support person at the subscriber's location), subscription level, subscription length (such as in months), etc.
- the database may furthermore maintain individual subscriber accounts and associated information. While the transcription of legislative hearings is generally described herein, aspects of the invention have broad applicability to various other types of content or media.
- aspects of the system in Figure 1 may be used to search and recall recorded music that contains sung lyrics, specific recorded events derived from original audio proceedings (or video proceedings having an audio component), such as plays and other performances, speeches, motion pictures and the like.
- Live media events may be improved by providing streaming text together with the live event.
- Each of these media events or "content" is digitally encoded and stored in archives or the database of the system. Such stored archives may be accessible to subscribers via the Internet. Users may search databases of text or other characters associated with the stored original content, receive and display variable length "blocks" of text corresponding to their search terms, and click on such recalled terms to hear and/or see the original content.
- the system described above creates an interactive link between the text and the original content.
- Figure 34 shows an example of such an alternative embodiment as a routine 3400.
- the system receives and encodes original content, such as any recorded or live proceeding or performance in which at least a portion of that content is to be encoded into a character stream. For example, any event having spoken, sung, chanted or otherwise uttered word may be used.
- the encoded content is stored in discrete files in the database 110 by the distribution server 108. Two or more files may be created for a particular event, such as separately recording not only the event itself, but individual components of the events (such as recording a concert, and recording individual files representing vocals, guitar, drums and other microphone outputs).
- a database index is updated to reflect the new file or files stored therein.
- the system creates and stores a character file or files based on the newly stored content file (such as a text transcription of lyrics in a recorded song).
- the system links the newly created character file with the original content file.
- the system updates the database index to reflect the newly created character file and its link with the original content file.
- the system may also, or alternatively, create a new index representing the character file, such as an index reflecting lyrics of the song. All this information is stored by the system.
- the system receives a user query, such as several words from the song's lyrics.
- the system may also provide additional search tools or filters (such as in a web page) that allow the user to further refine a search. For example, if the user is searching for a particular song, additional filters may indicate the type of music (e.g., rock 'n roll, blues, jazz, soundtrack, etc.) and/or artist. Such additional search or query refinements or filters help speed the user's search and provide more relevant search results.
- the system searches the database based on the query and, in block 3418, provides the search results: one or more character files with linked and associated content files.
- the user may view the character file to see, for example, the lyrics, and/or receive the content, such as streaming audio for the song.
- the system may also permit the user to request or order a copy of the content file.
- if the content file is a song, the user may, after providing payment or authorization, receive a download version of the song, or order a physical version (e.g., CD, cassette, etc.) of the song.
- a user may use search tools provided by the system to identify songs containing those words within the lyrics of the song, listen to some or all of the song, and view associated text (lyrics). While music has been used as an example, various other types of content may similarly be stored, encoded, linked, searched and requested/purchased.
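The query step of such a routine, which searches character files and returns them with their linked content files, might be sketched as follows. The `CharacterFile` record, its field names, the file paths, and the sample lyrics are all made up for this example.

```python
from dataclasses import dataclass

@dataclass
class CharacterFile:
    text: str            # e.g., the transcribed lyrics
    content_path: str    # link to the encoded content file
    genre: str = ""
    artist: str = ""

def search(library, query, genre=None, artist=None):
    """Return character files whose text contains every query word, after optional filters."""
    terms = query.lower().split()
    hits = []
    for cf in library:
        if genre and cf.genre != genre:
            continue
        if artist and cf.artist != artist:
            continue
        if all(term in cf.text.lower() for term in terms):
            hits.append(cf)
    return hits

# Hypothetical two-song library.
library = [
    CharacterFile("river runs to the morning sea", "audio/song1.rm", "blues", "Artist A"),
    CharacterFile("morning train across the plain", "audio/song2.rm", "jazz", "Artist B"),
]
```

Each hit carries its `content_path`, so the user can go from matching lyrics directly to the linked audio.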
- the above processing system may be modified to create or produce content and associated character files for storage and distribution on a permanent recordable medium, such as CD-ROMs.
- Such an application may apply to a broad variety of text materials, such as poetry, plays, speeches, language learning audio recordings, literature and other types of multimedia content from which text was originally derived under the voice recognition process described above.
- the associated text may be read from a computer screen or other device, and such text file may be searched so that a user may click on a search line of text to hear the actual audio associated with that portion of text.
- Figure 35 is an example of a routine 3500 for producing such CD-ROMs.
- the system receives and encodes original content as one or more new files. If the original content is already encoded, then the system need not perform any encoding functions.
- the system creates a character file from the original content file. For example, if the original content is a speech, movie or play, then the system creates a text transcription of the words spoken. In block 3506, the system links the character file with the content file and, in block 3508, creates an index for the file or files. In block 3510, the content file, character file and index are recorded on a recordable medium, such as a CD-ROM. Of course, various other recordable mediums are possible, such as magnetic tape, or other optical, magnetic or electrical storage devices.
- the index may include not only an association between the spoken word and the time it occurred during the original content, but other information, such as discrete portions of the original content. If the original content is a play, then the index may include act and scene designations. Thus, the user may employ search tools to retrieve not only specific lines in a screenplay but certain portions of the play. For music, the index may include not only lyrics, but also refrains, bridges, movements or other portions within the music.
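An index of this kind, pairing each spoken word with its time and recording where each act and scene begins, could be sketched as below. The data layout, the function names, and the sample times and labels are assumptions for illustration only.

```python
import bisect

# Hypothetical index entries: (time in seconds, spoken word).
word_index = [(0.0, "who's"), (0.5, "there"), (125.0, "though"), (125.6, "there")]
# Hypothetical section starts, sorted by time: (start in seconds, label).
sections = [(0.0, "Act 1, Scene 1"), (120.0, "Act 1, Scene 2")]

def find_word(index, word):
    """Return every time at which the word was spoken."""
    return [t for t, w in index if w.lower() == word.lower()]

def section_at(section_starts, t):
    """Return the label of the section that contains time t."""
    starts = [start for start, _ in section_starts]
    i = bisect.bisect_right(starts, t) - 1
    return section_starts[max(i, 0)][1]
```

Searching for a word yields its times, and `section_at` then places each occurrence within an act and scene, so a result can be shown as a line within a named portion of the play.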
- a subset of CD-ROMs or other created physical medium may be used for teaching, such as teaching a foreign language.
- a foreign language audio file may be linked with two text files: a foreign language text file and an English (or primary language) file.
- the text may be presented to a student in English and the equivalent or comparable foreign language text simultaneously presented with audio content. By clicking on segments of the foreign language text, a student may hear the actual spoken word corresponding to the foreign language text.
- the foreign language spoken word may be output to the student, together with scrolling or streaming text simultaneously provided in synchronism.
- the text and audio files may become part of a larger text document or file, such as an entire curriculum or larger book with associated audio.
- audio and linked text files may be provided to students on a CD-ROM
- students may log into a system and receive such files via the Internet or other network.
- aspects of the invention may be applied to interactive text and audio technology in distance learning or training via the Internet.
- the system can be used to create interactive lectures, classroom discussions, etc., that are accessible by students via the Internet, both in real time and as on-demand lectures archived by the system. Students may receive not only the audio and associated text but also video with respect to the event.
- static displays of information such as Powerpoint presentations, diagrams from an electronic whiteboard, slides and other images can be included and linked with the event.
- the audio, text and any additional content may all be stored not only in the database 110 but also on recordable media such as CD-ROMs. Students or users may search for relevant issues and subjects by performing text searches in the text files, or searches of the audio files (as noted above).
- Various communication channels may be used, such as a LAN, WAN, or a point-to-point dial-up connection, instead of the Internet.
- the server system may comprise any combination of hardware or software that can support these concepts.
- a web server may actually include multiple computers.
- a client system may comprise any combination of hardware and software that interacts with the server system.
- the client systems may include television-based systems, Internet appliances and various other consumer products through which auctions may be conducted, such as wireless computers (palm-top, wearable, mobile phones, etc.).
- a system for capturing audio, video or other media from events or recordings combines digitized delivery of the media with accompanying high-accuracy textual or character streams, synchronized with the content.
- Live governmental, corporate and other group events may be captured using microphones, video cameras and other equipment, whose output is digitized and sent to a transcription facility containing speech recognition workstations.
- Human transcription agents may assist in the initial conversion to text data, and human editorial agents may further review the audio and textual streams contemporaneously, to make corrections, add highlights, identify foreign phrases and otherwise increase the quality of the transcription service.
- Subscribers to the service may access a web site or other portal to view the media and text in real time or near real time relative to the original event, and access archival versions of other events for research, editing and other purposes.
- Subscribers may configure their accounts to deliver the streaming content in different ways, including full content delivery and background execution that triggers on keywords for pop-up text, audio, video or other delivery of important portions in real time.
- Subscribers may set up their accounts to stream different events at different dates and times, using different keywords and other settings.
- Various live media events may be archived for later transcription. All archived content files are indexed in a database for rapid or accurate retrieval by subscribers, who may order transcriptions of such archived files if no transcription had been performed earlier.
- transcription files associated with content files are indexed to permit efficient access and retrieval.
- Subscribers may construct a query to search the database and receive, as query results, a list of one or more files. Subscribers may access portions of such files to view transcriptions associated with the query, and listen to associated audio or other content corresponding to and synchronized with the transcription.
- transcription refers to converting any aural content into a corresponding character file. The character file is generated from and relates to the original aural file.
- characters refers not only to text (such as ASCII characters), but also pictographs (such as pictures representing a word or idea, hieroglyph, etc.), ideograms (e.g., a character or symbol representing an idea or thing without expressing a particular word or phrase for it) or the like.
- characters, as generally used herein, includes any symbols, ciphers, symbolizations, phonograms, logograms, and the like.
- language generally refers to any organized information communication system employing characters or series of characters, both human and machine-readable characters. Machine-readable characters include computer codes such as ASCII, Unicode, and the like, computer languages and scripts, as well as computer-readable symbols such as bar codes.
- content refers to any information
- media refers to any human generated content, including the human generated content described herein.
- the system may be modified to receive audio music files or streams, and convert the music into notes and other musical notation reflecting the music.
- the system may generate separate audio signals corresponding to each instrument in the music, measure frequency and duration of each note, and compose a representation of a musical score associated with the music.
- any oral audio component may be transcribed to create a corresponding (and synchronized) character file that may be in any language (not necessarily the language of the speaker).
- the system may accept as input a group conversation between several individuals, each speaking a different language.
- the system may create separate transcription files for each speaker, so that separate transcription files in each of the different languages are created for each of the speakers.
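The index described in the bullets above (an association between each spoken word and the time it occurred, plus designations for discrete portions such as acts and scenes) can be illustrated with a minimal Python sketch. The class name `TranscriptIndex`, its methods, and the sample timings are hypothetical and are not taken from the patent; the sketch only assumes that word start times and section start times are available in time order.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class TranscriptIndex:
    """Hypothetical index linking transcript words to media time offsets."""
    words: list = field(default_factory=list)     # (start_seconds, word), in time order
    sections: list = field(default_factory=list)  # (start_seconds, label), in time order

    def add_word(self, start, word):
        self.words.append((start, word))

    def add_section(self, start, label):
        self.sections.append((start, label))

    def section_at(self, t):
        """Return the label of the section containing time t, or None."""
        starts = [s for s, _ in self.sections]
        i = bisect_right(starts, t)
        return self.sections[i - 1][1] if i else None

    def find(self, phrase):
        """Return (start_seconds, section) for each occurrence of phrase."""
        target = [w.lower() for w in phrase.split()]
        if not target:
            return []
        # Compare case-insensitively and ignore trailing punctuation.
        tokens = [w.lower().strip(".,;:!?") for _, w in self.words]
        hits = []
        for i in range(len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                t = self.words[i][0]
                hits.append((t, self.section_at(t)))
        return hits


# Illustrative data only: two scenes of a play with a repeated line.
index = TranscriptIndex()
index.add_section(0.0, "Act 1, Scene 1")
index.add_section(60.0, "Act 1, Scene 2")
for start, word in [(1.0, "To"), (1.4, "be"), (1.8, "or"), (2.2, "not"),
                    (61.0, "To"), (61.5, "be,")]:
    index.add_word(start, word)

print(index.find("to be"))
# [(1.0, 'Act 1, Scene 1'), (61.0, 'Act 1, Scene 2')]
```

Each hit carries a time offset, so a player could seek the linked audio to that point, and the section label lets a search return a portion of the work rather than only a single line.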
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Library & Information Science (AREA)
- Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Signal Processing (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| AU2001233269A AU2001233269A1 (en) | 2000-02-03 | 2001-02-02 | System and method for integrated delivery of media and associated characters, such as audio and synchronized text transcription | 
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| US18014300P | 2000-02-03 | 2000-02-03 | |
| US60/180,143 | 2000-02-03 | | |
| US09/498,233 | 2000-02-03 | | |
| US09/498,233 US6513003B1 (en) | 2000-02-03 | 2000-02-03 | System and method for integrated delivery of media and synchronized transcription | 
Publications (2)
| Publication Number | Publication Date | 
|---|---|
| WO2001058165A2 (en) | 2001-08-09 |
| WO2001058165A3 (en) | 2001-10-25 |
Family
ID=26876035
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| PCT/US2001/003499 WO2001058165A2 (en) | 2000-02-03 | 2001-02-02 | System and method for integrated delivery of media and associated characters, such as audio and synchronized text transcription | 
Country Status (2)
| Country | Link | 
|---|---|
| AU (1) | AU2001233269A1 (en) | 
| WO (1) | WO2001058165A2 (en) | 
Cited By (26)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| WO2005043914A1 (en) * | 2003-10-23 | 2005-05-12 | Caption It Pty. Ltd | Transcription system and method | 
| WO2007068119A1 (en) | 2005-12-13 | 2007-06-21 | Audio Pod Inc. | Segmentation and transmission of audio streams | 
| EP1818837A1 (en) * | 2006-02-10 | 2007-08-15 | Harman Becker Automotive Systems GmbH | System for a speech-driven selection of an audio file and method therefor | 
| WO2008084023A1 (en) * | 2007-01-10 | 2008-07-17 | Nuance Communications, Inc. | Method for communication management | 
| WO2008028029A3 (en) * | 2006-08-31 | 2008-09-04 | At & T Corp | Method and system for providing an automated web transcription service | 
| EP1901284A3 (en) * | 2006-09-12 | 2009-07-29 | Storz Endoskop Produktions GmbH | Audio, visual and device data capturing system with real-time speech recognition command and control system | 
| US7987492B2 (en) | 2000-03-09 | 2011-07-26 | Gad Liwerant | Sharing a streaming video | 
| US8185132B1 (en) | 2009-07-21 | 2012-05-22 | Modena Enterprises, Llc | Systems and methods for associating communication information with a geographic location-aware contact entry | 
| US8416925B2 (en) | 2005-06-29 | 2013-04-09 | Ultratec, Inc. | Device independent text captioned telephone service | 
| US8515024B2 (en) | 2010-01-13 | 2013-08-20 | Ultratec, Inc. | Captioned telephone service | 
| US8819149B2 (en) | 2010-03-03 | 2014-08-26 | Modena Enterprises, Llc | Systems and methods for notifying a computing device of a communication addressed to a user based on an activity or presence of the user | 
| US8908838B2 (en) | 2001-08-23 | 2014-12-09 | Ultratec, Inc. | System for text assisted telephony | 
| US9222798B2 (en) | 2009-12-22 | 2015-12-29 | Modena Enterprises, Llc | Systems and methods for identifying an activity of a user based on a chronological order of detected movements of a computing device | 
| US9729907B2 (en) | 2005-12-13 | 2017-08-08 | Audio Pod Inc | Synchronizing a plurality of digital media streams by using a descriptor file | 
| US10225584B2 (en) | 1999-08-03 | 2019-03-05 | Videoshare Llc | Systems and methods for sharing video with advertisements over a network | 
| US10389876B2 (en) | 2014-02-28 | 2019-08-20 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US10805111B2 (en) | 2005-12-13 | 2020-10-13 | Audio Pod Inc. | Simultaneously rendering an image stream of static graphic images and a corresponding audio stream | 
| US10878721B2 (en) | 2014-02-28 | 2020-12-29 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US10917519B2 (en) | 2014-02-28 | 2021-02-09 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US10938886B2 (en) | 2007-08-16 | 2021-03-02 | Ivanti, Inc. | Scripting support for data identifiers, voice recognition and speech in a telnet session | 
| US11258900B2 (en) | 2005-06-29 | 2022-02-22 | Ultratec, Inc. | Device independent text captioned telephone service | 
| US11539900B2 (en) | 2020-02-21 | 2022-12-27 | Ultratec, Inc. | Caption modification and augmentation systems and methods for use by hearing assisted user | 
| US11664029B2 (en) | 2014-02-28 | 2023-05-30 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US20230229702A1 (en) * | 2020-08-18 | 2023-07-20 | Dish Network L.L.C. | Methods and systems for providing searchable media content and for searching within media content | 
| WO2024052705A1 (en) * | 2022-09-09 | 2024-03-14 | Trint Limited | Collaborative media transcription system with failed connection mitigation | 
| US12335327B2 (en) | 2007-06-28 | 2025-06-17 | Voxer Ip Llc | Telecommunication and multimedia management method and apparatus | 
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US7698061B2 (en) | 2005-09-23 | 2010-04-13 | Scenera Technologies, Llc | System and method for selecting and presenting a route to a user | 
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| GB9504376D0 (en) * | 1995-03-04 | 1995-04-26 | Televitesse Systems Inc | Automatic broadcast monitoring system | 
| US5815196A (en) * | 1995-12-29 | 1998-09-29 | Lucent Technologies Inc. | Videophone with continuous speech-to-subtitles translation | 
| US6122613A (en) * | 1997-01-30 | 2000-09-19 | Dragon Systems, Inc. | Speech recognition using multiple recognizers (selectively) applied to the same input sample | 
- 2001-02-02: AU application AU2001233269A (AU2001233269A1), not active (Abandoned)
- 2001-02-02: WO application PCT/US2001/003499 (WO2001058165A2), active (Application Filing)
Cited By (68)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US10362341B2 (en) | 1999-08-03 | 2019-07-23 | Videoshare, Llc | Systems and methods for sharing video with advertisements over a network | 
| US10225584B2 (en) | 1999-08-03 | 2019-03-05 | Videoshare Llc | Systems and methods for sharing video with advertisements over a network | 
| US7987492B2 (en) | 2000-03-09 | 2011-07-26 | Gad Liwerant | Sharing a streaming video | 
| US10523729B2 (en) | 2000-03-09 | 2019-12-31 | Videoshare, Llc | Sharing a streaming video | 
| US10277654B2 (en) | 2000-03-09 | 2019-04-30 | Videoshare, Llc | Sharing a streaming video | 
| US8917822B2 (en) | 2001-08-23 | 2014-12-23 | Ultratec, Inc. | System for text assisted telephony | 
| US8908838B2 (en) | 2001-08-23 | 2014-12-09 | Ultratec, Inc. | System for text assisted telephony | 
| US9131045B2 (en) | 2001-08-23 | 2015-09-08 | Ultratec, Inc. | System for text assisted telephony | 
| US9967380B2 (en) | 2001-08-23 | 2018-05-08 | Ultratec, Inc. | System for text assisted telephony | 
| US9961196B2 (en) | 2001-08-23 | 2018-05-01 | Ultratec, Inc. | System for text assisted telephony | 
| WO2005043914A1 (en) * | 2003-10-23 | 2005-05-12 | Caption It Pty. Ltd | Transcription system and method | 
| US11190637B2 (en) | 2004-02-18 | 2021-11-30 | Ultratec, Inc. | Captioned telephone service | 
| US10491746B2 (en) | 2004-02-18 | 2019-11-26 | Ultratec, Inc. | Captioned telephone service | 
| US10587751B2 (en) | 2004-02-18 | 2020-03-10 | Ultratec, Inc. | Captioned telephone service | 
| US11005991B2 (en) | 2004-02-18 | 2021-05-11 | Ultratec, Inc. | Captioned telephone service | 
| US12335437B2 (en) | 2005-06-29 | 2025-06-17 | Ultratec, Inc. | Device independent text captioned telephone service | 
| US10972604B2 (en) | 2005-06-29 | 2021-04-06 | Ultratec, Inc. | Device independent text captioned telephone service | 
| US8416925B2 (en) | 2005-06-29 | 2013-04-09 | Ultratec, Inc. | Device independent text captioned telephone service | 
| US11258900B2 (en) | 2005-06-29 | 2022-02-22 | Ultratec, Inc. | Device independent text captioned telephone service | 
| US10015311B2 (en) | 2005-06-29 | 2018-07-03 | Ultratec, Inc. | Device independent text captioned telephone service | 
| US10469660B2 (en) | 2005-06-29 | 2019-11-05 | Ultratec, Inc. | Device independent text captioned telephone service | 
| WO2007068119A1 (en) | 2005-12-13 | 2007-06-21 | Audio Pod Inc. | Segmentation and transmission of audio streams | 
| US10237595B2 (en) | 2005-12-13 | 2019-03-19 | Audio Pod Inc. | Simultaneously rendering a plurality of digital media streams in a synchronized manner by using a descriptor file | 
| US10805111B2 (en) | 2005-12-13 | 2020-10-13 | Audio Pod Inc. | Simultaneously rendering an image stream of static graphic images and a corresponding audio stream | 
| EP1961154A4 (en) * | 2005-12-13 | 2016-03-09 | Audio Pod Inc | TRANSMISSION OF DIGITAL DATA | 
| US10735488B2 (en) | 2005-12-13 | 2020-08-04 | Audio Pod Inc. | Method of downloading digital content to be rendered | 
| US9729907B2 (en) | 2005-12-13 | 2017-08-08 | Audio Pod Inc | Synchronizing a plurality of digital media streams by using a descriptor file | 
| US9930089B2 (en) | 2005-12-13 | 2018-03-27 | Audio Pod Inc. | Memory management of digital audio data | 
| US9954922B2 (en) | 2005-12-13 | 2018-04-24 | Audio Pod Inc. | Method and system for rendering digital content across multiple client devices | 
| US10091266B2 (en) | 2005-12-13 | 2018-10-02 | Audio Pod Inc. | Method and system for rendering digital content across multiple client devices | 
| JP2007213060A (en) * | 2006-02-10 | 2007-08-23 | Harman Becker Automotive Systems Gmbh | System for speech-driven selection of audio file and method therefor | 
| US8106285B2 (en) | 2006-02-10 | 2012-01-31 | Harman Becker Automotive Systems Gmbh | Speech-driven selection of an audio file | 
| EP1818837A1 (en) * | 2006-02-10 | 2007-08-15 | Harman Becker Automotive Systems GmbH | System for a speech-driven selection of an audio file and method therefor | 
| US7842873B2 (en) | 2006-02-10 | 2010-11-30 | Harman Becker Automotive Systems Gmbh | Speech-driven selection of an audio file | 
| WO2008028029A3 (en) * | 2006-08-31 | 2008-09-04 | At & T Corp | Method and system for providing an automated web transcription service | 
| EP1901284A3 (en) * | 2006-09-12 | 2009-07-29 | Storz Endoskop Produktions GmbH | Audio, visual and device data capturing system with real-time speech recognition command and control system | 
| US8502876B2 (en) | 2006-09-12 | 2013-08-06 | Storz Endoskop Produktions GmbH | Audio, visual and device data capturing system with real-time speech recognition command and control system |
| WO2008084023A1 (en) * | 2007-01-10 | 2008-07-17 | Nuance Communications, Inc. | Method for communication management | 
| US8712757B2 (en) | 2007-01-10 | 2014-04-29 | Nuance Communications, Inc. | Methods and apparatus for monitoring communication through identification of priority-ranked keywords | 
| US12335327B2 (en) | 2007-06-28 | 2025-06-17 | Voxer Ip Llc | Telecommunication and multimedia management method and apparatus | 
| US10938886B2 (en) | 2007-08-16 | 2021-03-02 | Ivanti, Inc. | Scripting support for data identifiers, voice recognition and speech in a telnet session | 
| US9026131B2 (en) | 2009-07-21 | 2015-05-05 | Modena Enterprises, Llc | Systems and methods for associating contextual information and a contact entry with a communication originating from a geographic location | 
| US8478295B1 (en) | 2009-07-21 | 2013-07-02 | Modena Enterprises, Llc | Systems and methods for associating communication information with a geographic location-aware contact entry | 
| US9473886B2 (en) | 2009-07-21 | 2016-10-18 | Modena Enterprises, Llc | Systems and methods for associating communication information with a geographic location-aware contact entry |
| US8185132B1 (en) | 2009-07-21 | 2012-05-22 | Modena Enterprises, Llc | Systems and methods for associating communication information with a geographic location-aware contact entry | 
| US9222798B2 (en) | 2009-12-22 | 2015-12-29 | Modena Enterprises, Llc | Systems and methods for identifying an activity of a user based on a chronological order of detected movements of a computing device | 
| US8515024B2 (en) | 2010-01-13 | 2013-08-20 | Ultratec, Inc. | Captioned telephone service | 
| US9253804B2 (en) | 2010-03-03 | 2016-02-02 | Modena Enterprises, Llc | Systems and methods for enabling recipient control of communications | 
| US8819149B2 (en) | 2010-03-03 | 2014-08-26 | Modena Enterprises, Llc | Systems and methods for notifying a computing device of a communication addressed to a user based on an activity or presence of the user | 
| US9215735B2 (en) | 2010-03-03 | 2015-12-15 | Modena Enterprises, Llc | Systems and methods for initiating communications with contacts based on a communication specification | 
| US11368581B2 (en) | 2014-02-28 | 2022-06-21 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US12136425B2 (en) | 2014-02-28 | 2024-11-05 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US10917519B2 (en) | 2014-02-28 | 2021-02-09 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US10542141B2 (en) | 2014-02-28 | 2020-01-21 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US12400660B2 (en) | 2014-02-28 | 2025-08-26 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US11627221B2 (en) | 2014-02-28 | 2023-04-11 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US11664029B2 (en) | 2014-02-28 | 2023-05-30 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US10742805B2 (en) | 2014-02-28 | 2020-08-11 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US11741963B2 (en) | 2014-02-28 | 2023-08-29 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US10878721B2 (en) | 2014-02-28 | 2020-12-29 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US10389876B2 (en) | 2014-02-28 | 2019-08-20 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US12136426B2 (en) | 2014-02-28 | 2024-11-05 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US12137183B2 (en) | 2014-02-28 | 2024-11-05 | Ultratec, Inc. | Semiautomated relay method and apparatus | 
| US12035070B2 (en) | 2020-02-21 | 2024-07-09 | Ultratec, Inc. | Caption modification and augmentation systems and methods for use by hearing assisted user | 
| US11539900B2 (en) | 2020-02-21 | 2022-12-27 | Ultratec, Inc. | Caption modification and augmentation systems and methods for use by hearing assisted user | 
| US20230229702A1 (en) * | 2020-08-18 | 2023-07-20 | Dish Network L.L.C. | Methods and systems for providing searchable media content and for searching within media content | 
| US12346369B2 (en) * | 2020-08-18 | 2025-07-01 | Dish Network L.L.C. | Methods and systems for providing searchable media content and for searching within media content | 
| WO2024052705A1 (en) * | 2022-09-09 | 2024-03-14 | Trint Limited | Collaborative media transcription system with failed connection mitigation | 
Also Published As
| Publication number | Publication date | 
|---|---|
| AU2001233269A1 (en) | 2001-08-14 | 
| WO2001058165A3 (en) | 2001-10-25 | 
Similar Documents
| Publication | Publication Date | Title | 
|---|---|---|
| WO2001058165A2 (en) | | System and method for integrated delivery of media and associated characters, such as audio and synchronized text transcription |
| US6122617A (en) | | Personalized audio information delivery system |
| US7366979B2 (en) | | Method and apparatus for annotating a document |
| US6192340B1 (en) | | Integration of music from a personal library with real-time information |
| US6513003B1 (en) | | System and method for integrated delivery of media and synchronized transcription |
| US7191023B2 (en) | | Method and apparatus for sound and music mixing on a network |
| US20020091658A1 (en) | | Multimedia electronic education system and method |
| KR100361680B1 (en) | | On demand contents providing method and system |
| US20150113410A1 (en) | | Associating a generated voice with audio content |
| US20020085030A1 (en) | | Graphical user interface for an interactive collaboration system |
| US20100223314A1 (en) | | Apparatus and method for creating and transmitting unique dynamically personalized multimedia messages |
| US20020085029A1 (en) | | Computer based interactive collaboration system architecture |
| US20020087592A1 (en) | | Presentation file conversion system for interactive collaboration |
| US20080189099A1 (en) | | Customizable Delivery of Audio Information |
| US20060136556A1 (en) | | Systems and methods for personalizing audio data |
| MX2007003646A (en) | | Method and apparatus for remote voice-over or music production and management. |
| WO2001053922A2 (en) | | System, method and computer program product for collection of opinion data |
| US20080033725A1 (en) | | Methods and a system for providing digital media content |
| TW200425710A (en) | | Method for distributing contents |
| US20040011187A1 (en) | | Method and system for group-composition in internet, and business method therefor |
| US20090240734A1 (en) | | System and methods for the creation, review and synchronization of digital media to digital audio data |
| KR20020008647A (en) | | Song Registration System, and Accompaniment Apparatus and Singing Room System Suitable for the Same |
| KR20000059119A (en) | | internet based method of providing song contest service and apparatus for the same |
| US20040034548A1 (en) | | Apparatus and method of implementing an internet radio community health support system |
| Sladek et al. | | Speech-to-text transcription in support of pervasive computing |
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| | AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | AK | Designated states | Kind code of ref document: A3 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A3 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| | REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
| | 32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established | Free format text: COMMUNICATION PURSUANT TO RULE 69 EPC (EPO FORM 1205A OF 151102) |
| | 122 | Ep: pct application non-entry in european phase | |
| | NENP | Non-entry into the national phase | Ref country code: JP |