GB2472057A - FIFO queue coupling device for communication between systems using Linux pipe semantics - Google Patents
FIFO queue coupling device for communication between systems using Linux pipe semantics
- Publication number
- GB2472057A (application GB0912795A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- queues
- data
- read
- entry
- fifo
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/167—Interprocessor communication using a common memory, e.g. mailbox
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/54—Indexing scheme relating to G06F9/54
- G06F2209/548—Queue
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A coupling device for communication between operating systems in a cluster has a number of FIFO queues. The operating systems have an application programming interface which uses Linux pipe semantics to read from and write to the queues. The coupling device maintains the data structures and locks needed to represent Linux pipes. This allows standard mount, open, write, read and close function calls to the operating system kernel to be used to access the FIFO queues on the coupling device. A write function call with more data than will fit in a single queue entry is split into several entries. A read function call for less data than will fit into a single queue entry will return an error code indicating that it has failed and will not alter the queue. The operating systems may run on a single computer system or be distributed across several computer systems.
Description
INTELLECTUAL PROPERTY OFFICE
Application No. GB0912795.2 RTM Date: 29 October 2009
The following terms are registered trademarks and should be read as such wherever they occur in this document: Unix, Linux, System z, Parallel Sysplex, Infiniband, z/OS
Intellectual Property Office is an operating name of the Patent Office www.ipo.gov.uk
METHOD TO IMPLEMENT A ROBUST AND MOSTLY LINUX COMPLIANT
CLUSTER FIFO WITH THE COUPLING FACILITY FOR RECORD BASED
COMMUNICATION
BACKGROUND OF THE INVENTION
Field of the Invention
The present invention generally relates to systems and methods for implementing a FIFO file system to allow multiple operating systems to efficiently communicate via a coupling device in a robust manner.
Description of the Related Art
A coupling device is a cluster communication medium connected to multiple operating systems.
The coupling device has one or more queues, allocated on the device, to which the multiple operating system instances can write data and from which they can read data. When communication is performed through a queue on a coupling device, that queue retains data sent via the channel until the data has been fetched by the receiver. A set of queues can be grouped into a queue area. Each queue in a queue area consists of queue entries that each may hold a maximal amount of data. Each entry comprises standard data parts and adjunct data parts.
In addition to queues, a queue area can also contain lock structures on which atomic lock operations can be performed. Atomic lock operations atomically test the current lock value or modify a lock value only if a caller knows the current lock value.
Coupling devices offer the following synchronous atomic operations on their queues: writing an entry to the end of a queue, and reading and deleting an entry from the beginning of a queue.
These coupling device operations can be combined with one lock operation into a single atomic operation such that the queue operation only succeeds if the lock operation succeeds.
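The combination of a lock test with a queue operation can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class `QueueArea` and its method names are hypothetical, and a Python mutex stands in for the atomicity that a real coupling device would provide in hardware or firmware.

```python
import threading
from collections import deque


class QueueArea:
    """Toy in-memory stand-in for a coupling device queue area (hypothetical)."""

    def __init__(self, lock_value=1):
        self._mutex = threading.Lock()  # models the device's atomicity
        self.lock_value = lock_value    # current value of the lock structure
        self.queue = deque()            # one FIFO queue of entries

    def compare_and_set_lock(self, expected, new):
        """Modify the lock only if the caller knows its current value."""
        with self._mutex:
            if self.lock_value != expected:
                return False
            self.lock_value = new
            return True

    def append_if_lock(self, expected, entry):
        """Append to the end of the queue only if the lock test succeeds."""
        with self._mutex:
            if self.lock_value != expected:
                return False
            self.queue.append(entry)
            return True

    def pop_if_lock(self, expected):
        """Read and delete the first entry only if the lock test succeeds.

        Returns (lock_ok, entry); entry is None when the queue is empty.
        """
        with self._mutex:
            if self.lock_value != expected:
                return False, None
            if not self.queue:
                return True, None
            return True, self.queue.popleft()
```

Because the lock test and the queue update happen under one mutex, a failed lock test leaves the queue untouched, which is the property the text relies on.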
In addition to queues and locks, a queue area provides some general storage area to store information.
While coupling devices do exist, for example, the system disclosed in U.S. Patent No. 6,999,997 B2 (the entirety of which is incorporated herein by reference), they have limited capabilities. For example, the software for interfacing with such coupling devices is proprietary and thus not freely adaptable to different operating systems. In another example, each of the multiple operating systems requires a local queue manager, which increases the complexity of setting up such communication and introduces extra steps in the communication path.
Conventional software often requires that a program reliably transfer data to a log or another program before it may continue. Such requirements occur often in the context of failure resilient programs that need a method to recover from failures. In such cases, the latency of the reliable data transfer is of great importance because this latency has a direct impact on the program's performance.
The communication required for a cluster infrastructure (heartbeats, membership service, 2 phase commits, voting protocols, etc.) is another area where reliable fast communication is required.
Conventional communication methods for cluster infrastructure also require that the communication be robust. Robust communications refer to communication that requires that the failure of any member of a cluster must not impact the communication among other members in the cluster and no recovery method for such a communication channel may rely on the global state of the cluster. In particular, this requirement means there may not be any connection status associated with a communication channel.
Conventionally, present cluster communication devices or methods do not provide solutions to all the above requirements. That is, UNIX inter process communication (including FIFOs or pipes) is restricted to communication within an operating system image and is therefore not useful for cluster communication.
The following conventional communication types are used within clusters: network communication, communication via storage (disks), communication via RDMA (remote direct memory access) and communication via the System z Coupling Facility in a Parallel Sysplex.
Yet none of these conventional connections fulfills all of the above requirements.
For example, network communication (point to point or multicast via any kind of media like Ethernet, Myrinet, Infiniband, etc.) trades reliability against latency (complex protocols must be run to achieve reliability, otherwise messages get lost if the receiver is not available).
Storage based communication trades the ordering of messages against latency: storing and retrieving messages atomically in an ordered fashion on a disk requires an expensive protocol. It is also not robust, because a node that fails may leave an intermediate state on disk that must be cleaned up.
RDMA (e.g. via Infiniband) only supports point to point communication and requires a minimal connection semantics. That is, a message cannot be sent if the receiver HW is not available.
System z Coupling Facility based queues are only accessible to z/OS and TPF through a proprietary low level interface (see MVS Programming: Sysplex Services Reference, SA22-7618-09). Current z/OS cluster implementations (Parallel Sysplex) do not rely on coupling facility based communication alone; they also need communication via the XCF network for the cluster infrastructure communication.
SUMMARY OF THE INVENTION
In view of the foregoing and other exemplary problems, drawbacks, and disadvantages of the conventional methods and structures, an exemplary feature of the present invention is to provide a method for multiple operating systems to robustly communicate with a coupling device.
The present invention exemplarily provides a method of implementing a first-in, first-out (FIFO) system for basic cluster communication, comprising a plurality of operating systems and a coupling device, the coupling device including a plurality of queues, each operating system including an application programming interface configured to read from and write to the plurality of queues using FIFO semantics mostly compliant with Linux FIFOs. The method includes initializing one lock and data structures on the coupling device to represent the FIFOs; executing a "mount()" function on one of the plurality of operating systems to mount a directory tree representing the queues in the coupling device into a directory tree of the one of the plurality of operating systems; and executing an "open(O_RDWR|O_NONBLOCK)" function and an "open(O_RDWR)" function on the one of the plurality of operating systems to open one of the plurality of queues for reading and writing.
The method further includes executing a "write()" function, including n bytes of data, on the one of the plurality of operating systems to write the n bytes of data to an entry at an end of the one of the plurality of queues after executing one of the "open(O_RDWR|O_NONBLOCK)" function and the "open(O_RDWR)" function. Executing the "write()" function includes determining whether the n bytes of data are greater than or less than a maximum byte size of an entry of the one of the plurality of queues, and if the n bytes of data are less than the maximum byte size of the entry, writing the n bytes of data to an entry at an end of the one of the plurality of queues, or if the n bytes of data are greater than the maximum byte size of the entry, writing the n bytes of data to multiple entries at the end of the one of the plurality of queues, each of the multiple entries including no more than the maximum byte size of the entry.
The method further includes executing a "read()" function on the one of the plurality of operating systems to read data from an entry at a beginning of the one of the plurality of queues after executing one of the "open(O_RDWR|O_NONBLOCK)" function and the "open(O_RDWR)" function, provided the read buffer passed to the read function has at least the size of a queue entry. Executing the "read()" function includes determining whether a read buffer size of the one of the plurality of operating systems is greater than or less than a maximum byte size of an entry of the one of the plurality of queues, and if the read buffer size is greater than the maximum byte size of the entry, reading data from an entry at a beginning of the one of the plurality of queues, or if the read buffer size is less than the maximum byte size of the entry, returning a read fail response to the one of the plurality of operating systems. The method also includes executing a "close()" function on the one of the plurality of operating systems to close the one of the plurality of queues, executing a "umount()" function to unmount the directory tree representing the queues in the coupling device from the directory tree of the one of the plurality of operating systems, and deallocating data structures representing POSIX FIFOs from the coupling device.
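The splitting of a large write into several queue entries can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper name `split_into_entries`; it models only the chunking rule, not the coupling device itself.

```python
def split_into_entries(data, max_entry_size):
    """Split a write buffer into consecutive chunks of at most max_entry_size bytes.

    Each chunk would become one entry appended to the end of the queue,
    in order, so the reader sees the bytes in the order they were written.
    """
    if max_entry_size <= 0:
        raise ValueError("max_entry_size must be positive")
    return [data[i:i + max_entry_size]
            for i in range(0, len(data), max_entry_size)]
```

For example, a 10-byte write with a 4-byte entry limit yields three entries of 4, 4 and 2 bytes; a write that fits within the limit yields a single entry.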
The "open(O_RDONLY|O_NONBLOCK)", "open(O_RDONLY)", "open(O_WRONLY|O_NONBLOCK)", "open(O_WRONLY)" and "write()" functions are implemented compliant to the Linux FIFO semantics. The "read()" function is implemented compliant to the Linux FIFO semantics if its read buffer is greater than or equal to the entry size of the queue representing the FIFO being read from.
The "read()" command fails if its read buffer is smaller than the entry size of the queue representing the FIFO being read from, wherein the "close()" function removes all traces of the FIFO on the operating system instance it is called on if no more processes on that operating system instance have that FIFO open, and the "close()" function does not modify any data or state on the coupling device.
The "mount()", "open(O_RDWR|O_NONBLOCK)", "open(O_RDWR)", "read()", "write()", "close()" and "umount()" functions do not leave any intermediate state on the coupling device if they fail while being executed. The "mount()", "open(O_RDWR|O_NONBLOCK)", "open(O_RDWR)", "write()", "read()", "close()" and "umount()" functions are implemented as parts of the operating system kernels.
The claimed invention exemplarily provides a method for a reliable and robust cluster communication. Exemplarily, the present invention relies on low latency coupling technology (such as a System z Coupling Facility) and provides an interface that is in accordance with UNIX paradigms and compliant with first in, first out (FIFO) operations as defined for Linux.
The coupling technology must provide support to store queue data structures grouped in queue areas. Exemplarily, the present invention shows that the coupling technology need only provide a minimal set of basic operations to implement such a communication channel. One basic operation would exemplarily be to atomically append records to the end of a queue data structure. Another basic operation would exemplarily be to atomically dequeue a record from the beginning of a queue data structure.
Exemplarily, the present invention provides a Linux compliant communication interface that exploits a coupling device to provide connectionless, reliable, and ordered cluster communication with minimal send/receive latency that is also fault tolerant with regard to node failures.
Exemplarily, Linux compliant means that, in order to be acceptable to both the Linux open source development community and UNIX/Linux software developers, application development interfaces must adhere to API standards. UNIX-like operating systems conventionally offer interfaces according to the file paradigm to access external devices. Some inter process communication (IPC) interfaces are also available in UNIX-like operating systems, but are restricted to communication among processes within a single operating system image.
Another exemplary embodiment of the present invention includes a computer readable medium tangibly embodying a program for executing any method set forth herein.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing and other exemplary purposes, aspects and advantages will be better understood from the following detailed description of an exemplary embodiment of the invention with reference to the drawings, in which:
Figure 1 illustrates a system including multiple operating systems and a coupling device according to an exemplary embodiment of the invention;
Figures 2, 3, 4 illustrate a coupling device of the system of Fig. 1;
Figures 5-11 illustrate operating steps of the claimed invention;
Figure 12 illustrates a typical hardware configuration which may be used for implementing the computer system and method according to the exemplary aspects of the present invention; and
Figure 13 illustrates a magnetic data storage diskette 1200 and CD-ROM 1202 to store the system 1001.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS OF THE INVENTION
Exemplarily, the invention would operate in an operating system that provides a virtual file system (VFS) application programming interface (API) and supports implementations of different file system (FS) types.
Exemplarily, the present invention would include systems physically connected to a coupling device (CDev).
Exemplarily, an FS of type cf_rfifo comprises a root directory containing a set of FIFOs.
Exemplarily, cf_rfifo supports open, read, write and close functions according to Linux fifo semantics (semantics also implemented by other Unix variants like AIX and OpenSolaris) with the following exceptions: the content of a cf_rfifo fifo is not erased when the last process has closed the fifo; reading from the fifo will fail if the read buffer is smaller than PIPE_BUF bytes or, alternatively, reading from the fifo may only be allowed by a single reader; cf_rfifo supports the close operation such that within an operating system instance all local resources associated with an open fifo are deleted if its last file descriptor in said operating system instance is closed; and open on a cf_rfifo fifo only supports the options O_RDWR and O_RDWR|O_NONBLOCK.
Exemplarily, an FS of type cf_rfifo is stored in a CDev queue area. The FS of type cf_rfifo stored in a CDev queue area uses CDev queue area storage (cf_rfifo_sb) to describe the attributes of the FS (e.g. the superblock) and the attributes of the fifos. That storage may be a queue or a control block in the CDev queue area. The FS of type cf_rfifo stored in a CDev queue area uses, for each fifo in the FS, one list (cf_data_list) to describe the contents of the fifo such that each list entry has a maximal size of PIPE_BUF bytes.
Exemplarily, a kernel module implementing the FS type cf_rfifo includes a set of kernel data structures to control each cf_rfifo fifo in a mounted cf_rfifo FS, having standard structures (e.g. Linux struct file) for each "opening" of a cf_rfifo fifo in a mounted cf_rfifo FS, standard structures (e.g. Linux struct inode) for each fifo in a mounted cf_rfifo FS, and an extension to the mount operation to mount a cf_rfifo FS described by a CDev and queue area identifier into the file hierarchy of the operating system.
In another exemplary embodiment, on a System z Coupling Facility, each queue entry has an adjunct data field that can be used to store the record size. Exemplarily, operations such as read and delete, or write access to both normal and adjunct data, can be performed in a single atomic instruction. If such a feature is available on the CDev, then with every write operation the number of valid bytes (<= PIPE_BUF) may be stored in the adjunct data field of each queue entry.
Alternatively some bytes of the normal data entry must be reserved to store the number of valid bytes in the data entry.
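The two ways of recording the number of valid bytes can be compared in a short sketch. The entry size of 4096 bytes, the 2-byte little-endian header, and all function names below are assumptions made for illustration; the text does not specify an encoding.

```python
import struct

ENTRY_SIZE = 4096             # assumed size of one queue entry (PIPE_BUF on Linux)
HEADER = struct.Struct("<H")  # assumed 2 reserved bytes holding the valid-byte count


def pack_with_adjunct(payload):
    """Variant 1: the count travels in a separate adjunct field."""
    if len(payload) > ENTRY_SIZE:
        raise ValueError("payload exceeds entry size")
    return payload.ljust(ENTRY_SIZE, b"\0"), len(payload)


def unpack_with_adjunct(data, adjunct):
    """Return only the valid bytes, as indicated by the adjunct value."""
    return data[:adjunct]


def pack_inband(payload):
    """Variant 2: reserve the first bytes of the normal data for the count."""
    capacity = ENTRY_SIZE - HEADER.size
    if len(payload) > capacity:
        raise ValueError("payload exceeds usable entry size")
    return (HEADER.pack(len(payload)) + payload).ljust(ENTRY_SIZE, b"\0")


def unpack_inband(data):
    """Read the count from the reserved bytes, then slice out the payload."""
    (count,) = HEADER.unpack_from(data)
    return data[HEADER.size:HEADER.size + count]
```

The in-band variant costs a few bytes of payload capacity per entry, which is the trade-off for not needing an adjunct field on the device.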
In another exemplary embodiment, a local read cache can be used to avoid the restriction that the read buffer has to be at least of size PIPE_BUF. If a local read cache is used, only one reader is allowed in order to preserve the Linux FIFO semantics. The local read cache buffer must have a size of exactly PIPE_BUF bytes for storing the biggest possible cf_rfifo entry.
Figs. 1-11 disclose an exemplary embodiment of system 1 according to an aspect of the present invention. System 1 may include coupling device 10 and a plurality of operating systems 2-1, 2-2, 2-3, 2-4, 2-5, 2-6. System 1 may be contained within one computer system or its components may be distributed across several computer systems. Each of coupling device 10 and plurality of operating systems 2-1, 2-2, 2-3, 2-4, 2-5, 2-6 may be any computer system or system component, for example, a server or a plurality of servers.
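A local read cache of this kind might look like the sketch below. The class `CachedReader` is hypothetical, a `deque` of byte strings stands in for the queue on the coupling device, and the empty-queue case simply returns no data rather than blocking or raising an error.

```python
from collections import deque

PIPE_BUF = 4096  # assumed maximum entry size


class CachedReader:
    """Single reader that serves arbitrary read sizes from a one-entry cache."""

    def __init__(self, entries):
        self._entries = entries  # deque of bytes objects, each <= PIPE_BUF long
        self._cache = b""        # local read cache; holds at most one entry

    def read(self, nbytes):
        if not self._cache:
            if not self._entries:
                return b""       # sketch only: no blocking and no EAGAIN here
            # Refill: read and delete the first entry of the queue.
            self._cache = self._entries.popleft()
        # Hand out as many bytes as fit and drop them from the cache.
        out, self._cache = self._cache[:nbytes], self._cache[nbytes:]
        return out
```

With a second reader on another operating system instance, bytes of one entry could be stranded in one reader's cache while the other reader sees later entries, which is why the text restricts this variant to a single reader.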
Referring to Figure 1 each operating system 2-x may include an application programming interface 2-x-1.
Referring to Figure 2, coupling device 10 may include queue area 12. Queue area 12 may be a memory configured to represent a first-in/first-out (FIFO) file system. A FIFO file system is a set of FIFOs. Queue area 12 may include cf_fifo_sb_lock 13, FIFO file system meta data cf_fifo_sb 14, and a plurality of queue subareas 12-1, 12-2... 12-x.
Exemplarily, each queue subarea contains one list. Each queue subarea describes a FIFO. A FIFO signifies that the first entry written into a FIFO 12-x will be the first entry read from that FIFO 12-x.
Referring to Figures 3 and 4, the list of data in a FIFO, cf_data_list 18, may include a plurality of data entries 18-1a, 18-2a... 18-xa that each hold data, each with a corresponding adjunct data entry 18-1b, 18-2b... 18-xb for metadata. Data entries 18-1a, 18-2a... 18-xa (42) may hold a maximal amount of data of PIPE_BUF bytes. Adjunct data entry 18-1b, 18-2b... 18-xb (44) may include an integer value end_idx (46) which determines the last valid byte of data in the corresponding data entry 18-1a, 18-2a... 18-xa, such that all bytes in the data entry 18-xa (42) up to the one referred to by end_idx (46) in 18-xb (44) are valid bytes.
Figure 5 illustrates exemplary methods to access a FIFO for read and/or write operations.
Referring to Figure 5, in Step 320, an operating system, for example, operating system 2-1, mounts the directory of FIFOs represented by the queue area 12 in the coupling device 10.
Executing the "mount()" function causes the directory tree for queue area 12 in coupling device 10, including the FIFOs represented by queue subareas 12-1, 12-2... 12-x, to appear in the directory tree for operating system 2-1 as a standard file interface. Multiple operating systems 2-1, 2-2, 2-3, 2-4, 2-5, 2-6 may simultaneously have queue area 12 from coupling device 10 mounted thereto.
Then in Step 321, if operating system 2-1 needs to access, or anticipates it may need to access, a particular queue subarea 12-1, 12-2... 12-x, for example, queue subarea 12-1, operating system 2-1 executes an OPEN function in application programming interface 2-1-1.
After operating system 2-1 opens queue subarea 12-1, in Step 322 operating system 2-1 executes a WRITE function from application programming interface 2-1-1 to place n bytes in one or more entries at the end of queue subarea 12-1. At this point, operating system 2-1 has the option of repeating Step 322 if more data is to be written to queue subarea 12-1.
Alternatively, after operating system 2-1 opens queue subarea 12-1, in Step 332 operating system 2-1 executes a READ function from application programming interface 2-1-1 to read data from the beginning of queue subarea 12-1. Application programming interface 2-1-1 then reads and deletes the entry at the beginning of queue subarea 12-1.
After each WRITE and READ function, operating system 2-1 has the option of continuing with Step 322 if data is to be written to queue subarea 12-1 or with Step 332 if data is to be read from queue subarea 12-1.
Once all WRITE and READ functions have ended, in Step 323/333 operating system 2-1 executes a CLOSE function from application programming interface 2-1-1. At that point, application programming interface 2-1-1 closes access of operating system 2-1 to queue subarea 12-1.
Any number of operating systems 2-1, 2-2, 2-3, 2-4, 2-5, 2-6 may simultaneously have any number of queues 12-1, 12-2, 12-3 open for reading and writing at any point in time.
Once all operations between operating system 2-1 and the fifo file system represented by queue area 12 of coupling device 10 are completed, operating system 2-1 may issue a "umount()" command to coupling device 10 to disconnect the directory tree represented by queue area 12 of coupling device 10 from the directory tree of operating system 2-1, as set forth in Step 340.
There are many advantages to system 1 as set forth above and in Figs. 1-4. For example, by using a widely known UNIX or Linux application programming interface 11, system 1 may be easily adapted to any system that runs UNIX or Linux or is capable of communicating with hardware that runs UNIX or Linux. Thus, system 1 may be used by a broader audience than previous systems, which were proprietary.
In another example, by using coupling device 10 to connect operating systems 2-1, 2-2, 2-3, 2-4, 2-5, 2-6, latency is minimized because operating systems 2-1, 2-2, 2-3, 2-4, 2-5, 2-6 are unaware of each other, as none of the systems' timing of operations is dependent on the other systems.
Each of operating systems 2-1, 2-2, 2-3, 2-4, 2-5, 2-6 may simply read from or write to queues 12-1, 12-2, 12-3 when they are ready.
In a further example, by issuing only single line commands to API 11 of coupling device 10, system 1 has a high ease of use.
In yet another example, by directly mounting the directory tree of queues 12-1, 12-2, 12-3 to the directory tree of operating systems 2-1, 2-2, 2-3, 2-4, 2-5, 2-6, those systems have no need for a local queue manager, and instead may connect directly to coupling device 10 and directly write to or read from queues 12-1, 12-2, 12-3. This also may reduce the latency of system 1.
In a yet further example, because coupling device 10 does not differentiate between which of operating systems 2-1, 2-2, 2-3, 2-4, 2-5, 2-6 have opened which of queues 12-1, 12-2, 12-3 for reading and/or writing, all the operations are much simpler and thus more robust.
Figure 6 illustrates method 500 for opening the data structures. In particular, functions open(..., O_RDWR) and open(..., O_RDWR|O_NONBLOCK) would be included in method 500. First, in Step 510, the CDev, queue area, and queue subarea cf_data_list for the relevant fifo are determined.
Step 520 determines whether Step 510 was successful in determining the CDev, queue area, and queue subarea information for the relevant fifo. If Step 520 determines that these determinations were unsuccessful, method 500 proceeds to Step 530 to return with an error. Otherwise, method 500 proceeds to Step 540, wherein kernel data structures are created for the opened fifo. These kernel data structures store, among other things, whether the FIFO was opened in blocking (O_RDWR) or non-blocking (O_RDWR|O_NONBLOCK) mode. After the creation of the kernel data structures, method 500 proceeds to Step 540 to successfully return a file descriptor.
Figure 7 illustrates method 600 for closing the data structures. In Step 610, the kernel data structures for the opened fifo are released and in Step 620, method 600 returns with a success indicator.
Figure 8 illustrates method 700 for writing a write buffer to a fifo that was opened with the open(..., O_RDWR) method 500. For a write function writing a buffer to a fifo opened by the open(..., O_RDWR) function, Step 710 would exemplarily include taking at most PIPE_BUF bytes from the beginning of the write buffer and putting the taken bytes into a buffer to be sent to the CDev. In Step 720, the CDev buffer is written as a new entry to the end of the cf_data_list representing the fifo, provided cf_fifo_sb_lock 13 has a value of 1. In Step 725, if the value of cf_fifo_sb_lock 13 was not 1 in Step 720, then method 700 returns with an I/O error in Step 770.
Otherwise method 700 proceeds to Step 730. In Step 730, method 700 tests whether the cf_data_list was full in Step 720. If the cf_data_list was full, then method 700 would continue by repeating Step 720. Otherwise, if the cf_data_list was not full, method 700 would proceed with Step 740. In Step 740, if the write buffer is empty, method 700 would return the number of bytes written to the FIFO represented by the cf_data_list in Step 750. Otherwise method 700 would continue with Step 710.
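The blocking write loop of method 700 can be sketched against a toy model of the coupling device. `ToyCDev`, its status strings, the capacity limit, and the short sleep between retries are all assumptions made for illustration; on a real device the lock test and the append would be one atomic operation.

```python
import errno
import time
from collections import deque

PIPE_BUF = 4096  # assumed maximum entry size


class ToyCDev:
    """Hypothetical bounded list plus superblock lock value."""

    def __init__(self, capacity, lock_value=1):
        self.entries = deque()
        self.capacity = capacity
        self.lock_value = lock_value

    def append_if_unlocked(self, entry):
        """Return 'locked', 'full' or 'ok' (atomic on a real device)."""
        if self.lock_value != 1:
            return "locked"
        if len(self.entries) >= self.capacity:
            return "full"
        self.entries.append(entry)
        return "ok"


def write_blocking(cdev, buf, retry_delay=0.001):
    """Write all of buf, retrying while the list is full (Steps 710-770).

    Unlike the flow described in the text, an empty buf returns 0 at once.
    """
    written = 0
    while written < len(buf):                        # Step 740: buffer not empty
        chunk = buf[written:written + PIPE_BUF]      # Step 710
        while True:
            status = cdev.append_if_unlocked(chunk)  # Step 720
            if status == "locked":                   # Step 725 -> Step 770
                raise OSError(errno.EIO, "superblock lock is not 1")
            if status == "ok":
                break
            time.sleep(retry_delay)                  # Step 730: full, repeat 720
        written += len(chunk)
    return written                                   # Step 750
```

The inner loop only terminates on a full list once some reader removes an entry, which is what makes this variant blocking.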
Figure 9, in contrast, illustrates method 800 for writing a write buffer to a fifo that was opened with the open(..., O_RDWR|O_NONBLOCK) method 500. In Step 810, at most PIPE_BUF bytes are taken from the beginning of the write buffer and put into a buffer to be sent to the CDev. Method 800 then proceeds to Step 812, where the CDev buffer is written as a new entry to the end of the cf_data_list representing the fifo if the value of cf_fifo_sb_lock 13 is 1. In Step 815, if the value of cf_fifo_sb_lock 13 was not 1 in Step 812, then method 800 returns with an I/O error. Otherwise method 800 proceeds to Step 820. In Step 820, method 800 tests whether the cf_data_list was full in Step 812 in Step 870. If the cf_data_list was not full, then method 800 would proceed to Step 850 to determine whether the write buffer is empty. If the write buffer is not empty, method 800 would proceed to Step 810. Otherwise, if the write buffer is empty, method 800 would return the number of bytes written to the FIFO represented by the cf_data_list.
If, in method 800, Step 820 determines that cf_data_list was full, Step 830 then determines whether any data has yet been written to the fifo. If no data has been written to the fifo, method 800 proceeds to Step 840 to return with an error message EAGAIN. On the other hand, if data has been written, method 800 would return the number of bytes written to the FIFO represented by the cf_data_list in Step 860.
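The non-blocking write of method 800 differs from the blocking one only in how a full list is handled. The sketch below assumes a hypothetical function `write_nonblocking` operating on a plain `deque` with an explicit capacity; the lock test and the append are separate statements here, whereas a real device would perform them atomically.

```python
import errno
from collections import deque

PIPE_BUF = 4096  # assumed maximum entry size


def write_nonblocking(entries, capacity, lock_value, buf):
    """Write as much of buf as fits; never wait for space (Steps 810-860)."""
    written = 0
    while written < len(buf):                       # Step 850: buffer not empty
        chunk = buf[written:written + PIPE_BUF]     # Step 810
        if lock_value != 1:                         # Steps 812/815
            raise OSError(errno.EIO, "superblock lock is not 1")
        if len(entries) >= capacity:                # Step 820: list was full
            if written == 0:                        # Step 830: nothing written
                raise BlockingIOError(errno.EAGAIN, "fifo is full")  # Step 840
            return written                          # Step 860: partial write
        entries.append(chunk)
        written += len(chunk)
    return written
```

A caller therefore sees either an EAGAIN error (nothing was written) or a byte count that may be smaller than the buffer, matching the usual non-blocking write contract.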
Figure 10 illustrates method 900 for reading into a read buffer from a fifo that was opened with the command open(..., O_RDWR|O_NONBLOCK). In Step 910, it is determined whether the size of the read buffer is greater than or equal to PIPE_BUF. If the size of the read buffer is greater than or equal to PIPE_BUF, the method proceeds to Step 920. On the other hand, if the size of the read buffer is less than PIPE_BUF, the method proceeds to Step 930. In Step 930, method 900 returns with error value EINVAL. In Step 920, the first entry is read and deleted from the cf_data_list representing the fifo, provided cf_fifo_sb_lock 13 is 1.
In Step 940, if the value of cf_fifo_sb_lock 13 was not 1 in Step 920, then method 900 proceeds to Step 960, where method 900 returns with an I/O error. On the other hand, if cf_fifo_sb_lock 13 was at a value of 1, then method 900 determines, at Step 950, whether an entry was available in cf_data_list in Step 920. If an entry was available in cf_data_list, then method 900 would proceed to Step 980. Otherwise, if an entry was not available in cf_data_list, method 900 would return an error value EAGAIN in Step 970. In Step 980, all valid bytes from the entry read in Step 920 are copied to the read buffer and the number of valid bytes in the read buffer is returned.
Figure 11 illustrates an exemplary embodiment of method 1000 for reading into a read buffer from a fifo opened with the command open(..., O_RDWR). In Step 1010, it is determined whether the size of the read buffer is greater than or equal to PIPE_BUF. If the size of the read buffer is greater than or equal to PIPE_BUF, the method proceeds to Step 1030. On the other hand, if the size of the read buffer is less than PIPE_BUF, the method proceeds to Step 1020. In Step 1020, method 1000 returns with error value EINVAL. In Step 1030, the first entry is read and deleted from the cf_data_list representing the fifo, provided cf_fifo_sb_lock 13 is 1.
In Step 1040, if the value of cf_fifo_sb_lock 13 was not 1 in Step 1030, then method 1000 proceeds to Step 1050, where method 1000 returns with an I/O error. On the other hand, if cf_fifo_sb_lock 13 was 1, then method 1000 determines, at Step 1060, whether an entry was available in cf_data_list in Step 1030. If an entry was available in cf_data_list, then method 1000 would proceed to Step 1070. Otherwise, if an entry was not available in cf_data_list, method 1000 would continue with Step 1030. In Step 1070, all valid bytes from the entry read in Step 1030 are copied to the read buffer and the number of valid bytes in the read buffer is returned.
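Methods 900 and 1000 share the same checks and differ only in what happens when no entry is available, so both can be sketched with one function and a `blocking` flag. `ToyFifo` and `fifo_read` are hypothetical names, the function returns the valid bytes directly (in the style of Python's `os.read`) rather than copying into a caller's buffer, and the retry delay is an assumption.

```python
import errno
import time
from collections import deque

PIPE_BUF = 4096  # assumed maximum entry size


class ToyFifo:
    """Hypothetical stand-in for one cf_data_list plus the superblock lock."""

    def __init__(self, lock_value=1):
        self.lock_value = lock_value
        self.entries = deque()


def fifo_read(fifo, bufsize, blocking, retry_delay=0.001):
    """Read and delete the first entry (method 900 or 1000, by mode)."""
    if bufsize < PIPE_BUF:                               # Step 910 / 1010
        raise OSError(errno.EINVAL, "read buffer smaller than PIPE_BUF")
    while True:
        if fifo.lock_value != 1:                         # Step 940 / 1040
            raise OSError(errno.EIO, "superblock lock is not 1")
        if fifo.entries:                                 # Step 950 / 1060
            return fifo.entries.popleft()                # Step 980 / 1070
        if not blocking:
            raise BlockingIOError(errno.EAGAIN, "no entry available")  # Step 970
        time.sleep(retry_delay)                          # blocking: retry Step 1030
```

Checking the buffer size before touching the queue means a failed read leaves the queue unaltered.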
A kernel module implementing the new file system (FS) type cf_fifo is exemplarily provided. It would include: a set of kernel data structures to control each cf_fifo fifo in a mounted cf_fifo FS; standard structures (e.g. Linux struct file) for each "opening" of a cf_fifo fifo in a mounted cf_fifo FS; standard structures (e.g. Linux struct inode) for each fifo in a mounted cf_fifo FS; and an extension to the mount() operation to mount a cf_fifo FS, described by a CDev and queue area identifier, into the file hierarchy of the operating system. Default file system and fifo attributes will be read from cf_fifo_sb, and these defaults may be overwritten by arguments provided to mount().
A maintenance program to allocate a fifo file system with a certain number N of fifos on a coupling device may allocate data structures to store a queue area (see Figure 2), which holds a file system super block lock (cf_fifo_sb_lock), a meta data queue and N queue sub areas (see Figure 3). Exemplarily, the meta data may describe a name, the maximal size of the fifos in the fifo file system, ownership and access right information. The file system super block lock may be reserved during the allocation and during any maintenance operation on the file system. A value of 0 for cf_fifo_sb_lock indicates the reservation. The lock will be released by setting cf_fifo_sb_lock to 1 when the allocation is finished. The queue sub areas will be initialized to contain empty cf_data_list lists. The size of the data entries of the cf_write_list elements will be at least PIPE_BUF bytes.
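The allocation order described above (reserve the lock, build the structures, then release the lock) can be illustrated with an in-memory model. The sketch below is an assumption-laden stand-in: the struct layouts, field names and the functions `cf_fifo_fs_alloc` and `cf_fifo_fs_free` are hypothetical, the meta data queue is reduced to a plain struct, and ordinary heap memory takes the place of the coupling device.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct cf_entry {                 /* one element of a cf_data_list */
    struct cf_entry *next;
    size_t valid;                 /* number of valid bytes in data */
    unsigned char data[];         /* at least PIPE_BUF bytes when allocated */
};

struct cf_data_list {             /* one queue sub area (see Figure 3) */
    struct cf_entry *first;
};

struct cf_fifo_meta {             /* hypothetical layout of the meta data */
    char name[64];
    size_t max_fifo_size;
    unsigned owner;
    unsigned access;
};

struct cf_queue_area {            /* queue area (see Figure 2) */
    int sb_lock;                  /* cf_fifo_sb_lock: 0 = reserved, 1 = released */
    struct cf_fifo_meta meta;
    size_t nfifos;
    struct cf_data_list *sub;     /* N queue sub areas */
};

/* Allocate a file system with nfifos fifos; returns NULL on failure. */
struct cf_queue_area *cf_fifo_fs_alloc(const char *name, size_t nfifos,
                                       size_t max_fifo_size)
{
    struct cf_queue_area *qa;

    if (nfifos == 0)
        return NULL;
    qa = calloc(1, sizeof *qa);
    if (qa == NULL)
        return NULL;
    qa->sb_lock = 0;                            /* reserved during allocation */
    qa->sub = calloc(nfifos, sizeof *qa->sub);  /* every list starts empty */
    if (qa->sub == NULL) {
        free(qa);
        return NULL;
    }
    qa->nfifos = nfifos;
    qa->meta.max_fifo_size = max_fifo_size;
    snprintf(qa->meta.name, sizeof qa->meta.name, "%s", name);
    qa->sb_lock = 1;                            /* allocation finished: release */
    return qa;
}

/* Deallocate: reserve the lock first, then free all data structures. */
void cf_fifo_fs_free(struct cf_queue_area *qa)
{
    size_t i;

    assert(qa != NULL);
    qa->sb_lock = 0;
    for (i = 0; i < qa->nfifos; i++) {
        struct cf_entry *e = qa->sub[i].first;
        while (e != NULL) {
            struct cf_entry *next = e->next;
            free(e);
            e = next;
        }
    }
    free(qa->sub);
    free(qa);
}
```

Readers and writers that find `sb_lock` at 0 would refuse to touch the lists, which is why the lock is only set to 1 as the very last step of allocation.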
An alternative implementation of the read functions may behave as follows: regardless of the size of the read buffer, a read returns up to PIPE_BUF bytes, but no more bytes than fit in the read buffer.
To this end, each opening of a fifo associates an internal buffer of size PIPE_BUF with the opened fifo. This internal buffer is initially empty. Each read operation then checks whether the associated internal buffer is empty. If the internal buffer is empty, it is filled by reading and deleting an entry of the cf_data_list representing the FIFO. The read function then moves as many bytes as fit into the read buffer from the internal buffer into the read buffer, thereby deleting all bytes read from the internal buffer.
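The buffered variant can be sketched as below. The names `struct cf_open_fifo`, `cf_fetch_fn`, `cf_read_buffered` and `cf_demo_fetch` are hypothetical, and the fetch callback is an assumed abstraction of "read and delete an entry of the cf_data_list"; lock and error handling are left out to keep the focus on the internal buffer.

```c
#include <assert.h>
#include <limits.h>   /* PIPE_BUF on most POSIX systems */
#include <stddef.h>
#include <string.h>

#ifndef PIPE_BUF
#define PIPE_BUF 4096 /* assumption: Linux value, used if <limits.h> hides it */
#endif

/* Per-opening state: an internal buffer of size PIPE_BUF, initially empty. */
struct cf_open_fifo {
    unsigned char ibuf[PIPE_BUF];
    size_t start;     /* offset of the first unread byte */
    size_t len;       /* number of unread bytes */
};

/* Hypothetical hook: read and delete the next entry into dst.
 * Returns the number of valid bytes (at most PIPE_BUF), or 0 if none. */
typedef size_t (*cf_fetch_fn)(void *ctx, unsigned char *dst);

/* Read up to len bytes; smaller read buffers simply get a partial entry. */
size_t cf_read_buffered(struct cf_open_fifo *o, cf_fetch_fn fetch, void *ctx,
                        void *buf, size_t len)
{
    size_t n;

    if (o->len == 0) {               /* internal buffer empty: refill it */
        o->start = 0;
        o->len = fetch(ctx, o->ibuf);
        assert(o->len <= PIPE_BUF);
    }
    n = len < o->len ? len : o->len; /* as many bytes as fit */
    memcpy(buf, o->ibuf + o->start, n);
    o->start += n;                   /* bytes handed out are deleted */
    o->len -= n;
    return n;
}

/* Hypothetical demo source: delivers one string once, then reports empty. */
size_t cf_demo_fetch(void *ctx, unsigned char *dst)
{
    const char **msg = ctx;
    size_t n;

    if (*msg == NULL)
        return 0;
    n = strlen(*msg);
    if (n > PIPE_BUF)
        n = PIPE_BUF;
    memcpy(dst, *msg, n);
    *msg = NULL;
    return n;
}
```

With an eight-byte entry and a five-byte read buffer, the first call returns five bytes and the second returns the remaining three from the internal buffer without touching the queue again.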
A maintenance program may be provided to deallocate a fifo file system from a coupling device, that is, to deallocate all configuration and data needed to store the fifo file system on the coupling device. Before starting the actual deallocation of data structures on the CDev, the maintenance program will set cf_fifo_sb_lock to 0 in the queue area of the file system to be deallocated.
Referring now to Fig. 12, system 1100 illustrates a typical hardware configuration which may be used for implementing the inventive system and method for buying and selling merchandise. The configuration preferably has at least one processor or central processing unit (CPU) 1110. The CPUs 1110 are interconnected via a system bus 1114 to a random access memory (RAM) 1114, read-only memory (ROM) 1116, input/output (I/O) adapter 1118 (for connecting peripheral devices such as disk units 1121 and tape drives 1140 to the bus 1114), user interface adapter 1122 (for connecting a keyboard 1124, mouse 1126, speaker 1128, microphone 1132, and/or other user interface device to the bus 1114), a communication adapter 1134 for connecting an information handling system to a data processing network, the Internet, an intranet, a personal area network (PAN), etc., and a display adapter 1136 for connecting the bus 1114 to a display device 1138 and/or printer 1139. Further, an automated reader/scanner 1141 may be included.
Such readers/scanners are commercially available from many sources.
Such a method may be implemented, for example, by operating a computer, as embodied by a digital data processing apparatus, to execute a sequence of machine-readable instructions. These instructions may reside in various types of signal-bearing media.
Thus, this aspect of the present invention is directed to a programmed product, including signal-bearing media tangibly embodying a program of machine-readable instructions executable by a digital data processor to perform the above method.
Such a method may be implemented, for example, by operating the CPU 1110 to execute a sequence of machine-readable instructions. These instructions may reside in various types of signal-bearing media.
Thus, this aspect of the present invention is directed to a programmed product, comprising signal-bearing media tangibly embodying a program of machine-readable instructions executable by a digital data processor incorporating the CPU 1110 and hardware above, to perform the method of the invention.
This signal-bearing media may include, for example, a RAM contained within the CPU 1110, as represented by the fast-access storage for example. Alternatively, the instructions may be contained in another signal-bearing media, such as a magnetic data storage diskette 1200 or CD-ROM 1102 (Fig. 13), directly or indirectly accessible by the CPU 1110.
Whether contained in the computer server/CPU 1110, or elsewhere, the instructions may be stored on a variety of machine-readable data storage media, such as DASD storage (e.g., a conventional "hard drive" or a RAID array), magnetic tape, electronic read-only memory (e.g., ROM, EPROM, or EEPROM), an optical storage device (e.g., CD-ROM, WORM, DVD, digital optical tape, etc.), paper "punch" cards, or other suitable signal-bearing media including transmission media such as digital and analog communication links and wireless. In an illustrative embodiment of the invention, the machine-readable instructions may comprise software object code, compiled from a language such as "C," etc.
While the invention has been described in terms of several exemplary embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims. Further, it is noted that Applicant's intent is to encompass equivalents of all claim elements, even if amended later during prosecution.
Claims (1)
1. A method of implementing a first-in, first-out (FIFO) system for basic cluster communication, comprising a plurality of operating systems and a coupling device including a plurality of queues, each operating system including an application programming interface configured to read from and write to the plurality of queues via the application programming interface using FIFO semantics mostly compliant with Linux FIFOs, the method comprising:
initializing one lock and data structures on the coupling device to represent the FIFOs;
executing a "mount()" function on one of the plurality of operating systems to mount a directory tree representing the queues in the coupling device into a directory tree of the one of the plurality of operating systems;
executing an "open(O_RDWR|O_NONBLOCK)" function and an "open(O_RDWR)" function on the one of the plurality of operating systems to open one of the plurality of queues for reading and writing;
executing a "write()" function, including n bytes of data, on the one of the plurality of operating systems to write the n bytes of data to one or more entries to be appended to the end of the one of the plurality of queues after executing one of the "open(O_RDWR|O_NONBLOCK)" function and the "open(O_RDWR)" function, the executing the "write()" function including:
determining whether the n bytes of data are greater than or less than a maximum byte size of an entry of the one of the plurality of queues,
wherein if the n bytes of data are less than the maximum byte size of the entry of the one of the plurality of queues, writing the n bytes of data to an entry at an end of the one of the plurality of queues,
wherein if the n bytes of data are greater than the maximum byte size of the entry of the one of the plurality of queues, writing the n bytes of data to multiple entries at the end of the one of the plurality of queues, each of the multiple entries including less than the maximum byte size of the entry;
executing a "read()" function on the one of the plurality of operating systems to read data from an entry at a beginning of the one of the plurality of queues and deleting the read entry after executing one of the "open(O_RDWR|O_NONBLOCK)" function and the "open(O_RDWR)" function, provided the read buffer passed to the read function has at least the size of a queue entry, the executing the "read()" function including:
determining whether a read buffer size of the one of the plurality of operating systems is greater than or less than a maximum byte size of an entry of the one of the plurality of queues,
wherein if the read buffer size is greater than the maximum byte size of the entry of the one of the plurality of queues, reading data from an entry at a beginning of the one of the plurality of queues,
wherein if the read buffer size is less than the maximum byte size of the entry of the one of the plurality of queues, returning a read fail response to the one of the plurality of operating systems;
executing a "close()" function on the one of the plurality of operating systems to close the one of the plurality of queues;
executing a "umount()" function to unmount the directory tree representing the queues in the coupling device from the directory tree of the one of the plurality of operating systems; and
deallocating data structures representing the FIFOs from the coupling device,
wherein the "open(O_RDWR|O_NONBLOCK)", "open(O_RDWR)" and "write()" functions are implemented compliant to the Linux FIFO semantics,
wherein the "read()" function is implemented compliant to the Linux FIFO semantics if its read buffer is greater than or equal to the entry size of the queue representing the FIFO being read from,
wherein the "read()" command fails if its read buffer is smaller than the entry size of the queue representing the FIFO being read from,
wherein the "close()" function removes all traces of the FIFO on the operating system instance it is called on if no more processes on that operating system instance have that FIFO open,
wherein the "close()" does not modify any data or state on the coupling device,
wherein the "mount()", "open(O_RDWR|O_NONBLOCK)", "open(O_RDWR)", "read()", "write()", "close()" and "umount()" functions do not leave any intermediate state on the coupling device if they fail while being executed,
wherein the "mount()", "open(O_RDWR|O_NONBLOCK)", "open(O_RDWR)", "write()", "read()", "close()" and "umount()" functions are implemented as a new file system type in the operating system kernels.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0912795.2A GB2472057B (en) | 2009-07-23 | 2009-07-23 | Method to implement a robust cluster FIFO with the coupling facility for record based communication |
Publications (3)
| Publication Number | Publication Date |
|---|---|
| GB0912795D0 | 2009-08-26 |
| GB2472057A | 2011-01-26 |
| GB2472057B | 2016-01-27 |
Family
ID=41058405
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0912795.2A, GB2472057B (en), Expired - Fee Related | 2009-07-23 | 2009-07-23 | Method to implement a robust cluster FIFO with the coupling facility for record based communication |
Country Status (1)
| Country | Link |
|---|---|
| GB (1) | GB2472057B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109426572B (en) * | 2017-08-29 | 2021-07-02 | 杭州海康威视数字技术股份有限公司 | Task processing method, device and electronic device |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5163156A (en) * | 1988-07-27 | 1992-11-10 | At&T Bell Laboratories | Method for distributing messages through a mapping table which includes for each originating device a sequential list of corresponding destination devices |
| WO1996023317A1 (en) * | 1995-01-23 | 1996-08-01 | Tandem Computers Incorporated | A method for accessing a file in a multi-processor computer system using pipes and fifos |
| US6092166A (en) * | 1997-04-30 | 2000-07-18 | International Business Machines Corporation | Cross-system data piping method using an external shared memory |
| US6868437B1 (en) * | 2001-05-18 | 2005-03-15 | Agilent Technologies, Inc. | System and method for interprocess communication of remote procedure call messages utilizing shared memory |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 746 | Register noted 'licences of right' (sect. 46/1977) | Effective date: 20160202 |
| | PCNP | Patent ceased through non-payment of renewal fee | Effective date: 20180723 |