JP2006185312A - Failure analysis apparatus and failure analysis method

Info

Publication number: JP2006185312A
Application number: JP2004380071A
Authority: JP
Inventors: Tetsuya Teramachi; 哲也寺町
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2004-12-28
Filing date: 2004-12-28
Publication date: 2006-07-13

Abstract

[Problem] To identify the failure location on an access route accurately and automatically, and to switch to an access route that has no failure.
[Solution] When a failure is detected, all components on the access route over which the normal access was executed are treated as failure location candidates. Test accesses are then executed toward the failure location candidates. After test accesses have been executed over one or more access routes, the failure identification means 11c takes the single remaining candidate component as the failure location. The access route switching means 11d then switches to an access route that does not pass through the component identified as the failure location.
[Selected drawing] FIG. 1

Description

The present invention relates to a failure analysis apparatus and a failure analysis method for a computer system, and more particularly to a failure analysis apparatus and a failure analysis method for a computer system that includes redundant access routes and a plurality of devices.

A computer system stores data in storage devices and retrieves data from them. Storage devices are therefore indispensable to a computer system.

Moreover, the amount of data used by computer systems grows every year, so large-capacity storage devices are required. Low-priced, large-capacity storage devices are now available and easy to obtain.

In addition, to support round-the-clock operation, computer systems use a plurality of storage devices logically as a single storage device (a disk array). A computer system that employs a disk array keeps a spare storage device within the system and, during normal operation, automatically transfers the data of the storage devices in service to the spare. This allows the system to cope when a failure makes one of its storage devices unusable.

Computer systems that use a disk array and have a plurality of access routes have also appeared. When a failure occurs on an access route, either an operator identifies the failure location or the system identifies it automatically, but only roughly. This is because computer systems are generally multi-vendor, and their interfaces are unified only at a coarse level. The access route in use is then changed, manually or automatically, so that the failure location is no longer used (see, for example, Patent Document 1).
Patent Document 1: JP-A-9-259001

However, when the access route is changed manually, the operator is prone to making connection mistakes.
When the access route is changed automatically, all that is known is that a failure exists somewhere. As described below, an active part that is functioning without any problem may therefore be stopped, which lowers the operating efficiency of the computer system. For example, in a network that contains repeaters, the failure location is difficult to identify, which leads to inefficient practices such as replacing every related component. Moreover, once an active part has been stopped, recovery takes time. This is explained concretely with reference to FIG. 30.

FIG. 30 is a diagram showing an example of how a failure is handled in a conventional system configuration.
The conventional system comprises a first computer 110, a second computer 210, repeaters 310 and 410, a storage device 530, and transmission lines L31, L32, L33, L34, L35, L36, L37, and L38. The first computer 110 and the second computer 210 are user terminal devices; each either responds to a user's service request or forwards the request to a server. The repeaters 310 and 410 interconnect the first computer 110, the second computer 210, and the storage device 530. The storage device 530 stores data written by the first computer 110 and the second computer 210. The first computer 110, the second computer 210, the repeaters 310 and 410, and the storage device 530 communicate with one another via the transmission lines L31 to L38.

The first computer 110 includes adapters 111 and 112, which connect the first computer 110 to the network. The adapters 111 and 112 communicate with each other.

The second computer 210 includes adapters 211 and 212. Components that have the same name in the first computer 110 and the second computer 210 have the same function.

The storage device 530 includes controllers 531 and 532 and a storage device main body 533. The controllers 531 and 532 connect the storage device 530 to the network. The storage device main body 533 is the main body of the storage device 530. The controllers 531 and 532 and the storage device main body 533 communicate with one another.

Conventionally, in order to keep the environments of the first computer 110 and the second computer 210 uniform, when a failure is detected on the access route of the first computer 110 consisting of the adapter 111, the transmission line L31, the repeater 310, the transmission line L35, and the controller 531, the access route of the second computer 210 consisting of the adapter 211, the transmission line L33, the repeater 310, the transmission line L35, and the controller 531 is also taken out of use.

In this case, if the failure location is the adapter 111, the second computer 210 loses its redundancy and its reliability decreases.
The present invention has been made in view of these points. Its object is to provide a failure analysis apparatus and a failure analysis method that, for the access route in use, automatically and accurately identify the failure location on the access route and switch to an access route that has no failure.

To solve the above problem, the present invention provides, as shown in FIG. 1: a failure detection means 11a that, upon detecting a failure in a normal access between a plurality of devices, treats all components on the access route over which the normal access was executed as failure location candidates; a failure diagnosis means 11b that, when the failure detection means 11a detects a failure, executes test accesses toward the failure location candidates, excludes from the candidates all components on the access route of a test access in which no failure was detected, and excludes from the candidates all components not located on the access route of a test access in which a failure was detected; a failure identification means 11c that takes as the failure location the single candidate component that remains after the failure diagnosis means 11b has executed test accesses over one or more access routes; and an access route switching means 11d that switches the access route for normal access between the plurality of devices to an access route that does not pass through the component identified as the failure location by the failure identification means 11c.

With this configuration, when a failure is detected in a normal access between a plurality of devices, the failure detection means 11a treats all components on the access route over which the normal access was executed as failure location candidates. Once the failure detection means 11a has detected a failure, test accesses are executed toward the failure location candidates. When a test access detects no failure, the failure diagnosis means 11b excludes from the candidates all components on the access route over which that test access was executed. When a test access detects a failure, the failure diagnosis means 11b excludes from the candidates all components that are not located on the access route over which that test access was executed. After the failure diagnosis means 11b has executed test accesses over one or more access routes, the failure identification means 11c takes the single remaining candidate component as the failure location. The access route switching means 11d then switches the access route for normal access between the plurality of devices to an access route that does not pass through the component identified as the failure location by the failure identification means 11c.

In the present invention, when a failure is detected in a normal access, test accesses are repeated. The failure location is thereby identified, and the access route is switched to one that does not pass through the component identified as the failure location.

As a result, the failure location can be identified accurately, and the failed component can be replaced easily.

Embodiments of the present invention are described below with reference to the drawings.
The concept of the present invention is described first, followed by the specific details of the embodiments.
FIG. 1 is a conceptual diagram of the present invention.

The failure analysis system in which the failure analysis apparatus of the present invention is used comprises a computer 10, repeaters 20 and 30, a device 40, and transmission lines L1, L2, L3, L4, L5, L6, L7, and L8. The computer 10 is a user terminal device; it either responds to a user's service request or forwards the request to a server. The repeaters 20 and 30 connect the computer 10 and the device 40 to each other. The device 40 stores data written by the computer 10. The computer 10, the repeaters 20 and 30, and the device 40 communicate with one another via the transmission lines L1 to L8.

The computer 10 includes a failure analysis apparatus 11 and adapters 12 and 13. For the access route in use, the failure analysis apparatus 11 identifies the failure location on the access route and switches to an access route that has no failure. The adapters 12 and 13 connect the computer 10 to the network. The failure analysis apparatus 11 and the adapters 12 and 13 communicate with one another.

The device 40 includes controllers 41 and 42 and a device main body 43. The controllers 41 and 42 connect the device 40 to the network. The device main body 43 is the main body of the device 40. The controllers 41 and 42 and the device main body 43 communicate with one another.

The failure analysis apparatus 11 includes a failure detection means 11a, a failure diagnosis means 11b, a failure identification means 11c, and an access route switching means 11d.
Upon detecting a failure in a normal access between a plurality of devices, the failure detection means 11a treats all components on the access route over which the normal access was executed as failure location candidates.

When the failure detection means 11a detects a failure, the failure diagnosis means 11b executes test accesses toward the failure location candidates. If a test access detects no failure, it excludes from the candidates all components on the access route over which that test access was executed. If a test access detects a failure, it excludes from the candidates all components that are not located on the access route over which that test access was executed.

The failure identification means 11c takes as the failure location the single candidate component that remains after the failure diagnosis means 11b has executed test accesses over one or more access routes.

The access route switching means 11d switches the access route for normal access between the plurality of devices to an access route that does not pass through the component identified as the failure location by the failure identification means 11c.
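
The exclusion rules of the four means can be sketched with ordinary set operations. The following is only an illustrative sketch, not the patented implementation: the function names, the representation of a route as a sequence of component names, and the component names used in the test are all assumptions made for the example. Note that the rule applied to a failing test access presupposes that a single component is at fault.

```python
from typing import Iterable, Optional, Sequence, Set, Tuple

Route = Sequence[str]  # ordered component names along one access route (assumed form)


def narrow_candidates(failed_route: Route,
                      test_results: Iterable[Tuple[Route, bool]]) -> Set[str]:
    """Apply the candidate rules attributed to means 11a and 11b.

    test_results yields (route, ok) pairs, where ok is True when the test
    access over that route showed no failure.
    """
    # 11a: every component on the failed normal-access route is a candidate.
    candidates = set(failed_route)
    for route, ok in test_results:
        if ok:
            # 11b: a clean test access clears every component it passed through.
            candidates -= set(route)
        else:
            # 11b: a failing test access clears every component it did not use.
            candidates &= set(route)
        if len(candidates) <= 1:
            break  # nothing left to narrow
    return candidates


def identify_failure(candidates: Set[str]) -> Optional[str]:
    """11c: the failure location is the single remaining candidate, if any."""
    return next(iter(candidates)) if len(candidates) == 1 else None


def pick_route(routes: Iterable[Route], faulty: str) -> Optional[Route]:
    """11d: return the first known route that avoids the faulty component."""
    return next((r for r in routes if faulty not in r), None)
```

Keeping the candidates as a set makes both rules one-line operations (difference for a clean test, intersection for a failing one), and the result is independent of the order in which the test accesses are run.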

The failure detection means 11a, the failure diagnosis means 11b, the failure identification means 11c, and the access route switching means 11d communicate with one another.
For example, suppose that communication between the computer 10 and the device 40 is carried out over the transmission lines L1 and L5. When an error occurs on the route using the transmission lines L1 and L5, the route is switched to the transmission lines L3 and L5; if no error occurs there, the failure location is either the transmission line L1 or the adapter 12. The route is then switched to the transmission lines L2 and L7; if no error occurs there either, the failure location is the transmission line L1. After that, normal access is moved from the transmission lines L1 and L5 to the error-free transmission lines L2 and L7.
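
The L1/L5 example can be traced step by step as follows. FIG. 1 is not reproduced in the text, so the wiring table below (which adapter, repeater, and controller each transmission line joins) is an assumption chosen to be consistent with the conclusions stated in the example; the names `LINKS` and `route` are likewise hypothetical.

```python
# Assumed wiring of FIG. 1: the two components each transmission line joins.
LINKS = {
    "L1": ("adapter12", "repeater20"),
    "L2": ("adapter12", "repeater30"),
    "L3": ("adapter13", "repeater20"),
    "L4": ("adapter13", "repeater30"),
    "L5": ("repeater20", "controller41"),
    "L6": ("repeater20", "controller42"),
    "L7": ("repeater30", "controller41"),
    "L8": ("repeater30", "controller42"),
}


def route(first_line: str, second_line: str) -> list:
    """Expand two transmission lines into the ordered components of a route."""
    adapter, repeater = LINKS[first_line]
    repeater_again, controller = LINKS[second_line]
    if repeater != repeater_again:
        raise ValueError("the two lines do not meet at the same repeater")
    return [adapter, first_line, repeater, second_line, controller]


# Normal access over L1 and L5 reports an error: all five components are suspects.
candidates = set(route("L1", "L5"))

# Test access over L3 and L5 succeeds, clearing repeater 20, L5, and controller 41.
candidates -= set(route("L3", "L5"))
after_first_test = set(candidates)   # adapter 12 and L1 remain

# Test access over L2 and L7 succeeds, clearing adapter 12.
candidates -= set(route("L2", "L7"))
after_second_test = set(candidates)  # only L1 remains

print(sorted(after_first_test), sorted(after_second_test))
```

Under this assumed wiring, the replacement route over L2 and L7 does not contain L1, which is why normal access can safely be moved to it.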

In this way, the component that caused the failure can be identified accurately and automatically, and can be replaced easily.
The specific details of the embodiments are described below.

[First Embodiment]
FIG. 2 is a system configuration diagram of the first embodiment. For the access routes they use, the first computer 100 and the second computer 200 identify the failure location on an access route and switch to an access route that has no failure.

The system of the first embodiment comprises a first computer 100, a second computer 200, repeaters 300 and 400, a storage device 500, and transmission lines L11, L12, L13, L14, L15, L16, L17, and L18. The first computer 100 and the second computer 200 write data to the storage device 500 and read data from it. The repeaters 300 and 400 interconnect the first computer 100, the second computer 200, and the storage device 500. The storage device 500 stores the data sent from the first computer 100 and the second computer 200. The first computer 100, the second computer 200, the repeaters 300 and 400, and the storage device 500 communicate with one another via the transmission lines L11 to L18.

The first computer 100 includes a failure identification unit 103, an access route switching unit 104, and adapters 101 and 102.
When the failure identification unit 103 detects a failure in a normal access to the storage device 500, it treats all components on the access route over which the normal access was executed as failure location candidates. It then executes test accesses toward the failure location candidates. If a test access detects no failure, all components on the access route over which that test access was executed are excluded from the candidates. If a test access detects a failure, all components that are not located on the access route over which that test access was executed are excluded from the candidates. Finally, the single candidate component that remains after test accesses have been executed over one or more access routes is taken as the failure location.

The access route switching unit 104 switches from the access route on which a failure was detected during normal access to the storage device 500 to an access route that does not pass through the component identified as the failure location.

The adapters 101 and 102 connect the first computer 100 to the network.
The failure identification unit 103, the access route switching unit 104, and the adapters 101 and 102 communicate with one another.

The second computer 200 includes a failure identification unit 203, an access route switching unit 204, and adapters 201 and 202. Components that have the same name in the first computer 100 and the second computer 200 have the same function.

The storage device 500 includes controllers 501 and 502 and a storage device main body 503. The controllers 501 and 502 connect the storage device 500 to the network. The storage device main body 503 is the main body of the storage device 500. The controllers 501 and 502 and the storage device main body 503 communicate with one another.

The failure identification unit 103 of the first computer 100 and the failure identification unit 203 of the second computer 200 communicate with each other and operate in cooperation.
FIG. 3 is a diagram showing an example of the failure identification unit. The failure identification unit 103 has a plurality of failure confirmation units for the components on all access routes. The failure identification unit 103 monitors each component by means of a daemon: the daemon issues a command to each component, and the failure identification unit 103 obtains the response from each component.

As its failure confirmation units for the components, the failure identification unit 103 includes a failure confirmation unit 103a for the storage device main body, a failure confirmation unit 103b for the adapter, a failure confirmation unit 103c for the controller (own route), a failure confirmation unit 103d for the controller (other route), a failure confirmation unit 103e for the repeater and the pre-repeater transmission line, and a failure confirmation unit 103f for the repeater and the post-repeater transmission line. The failure confirmation units 103a to 103f communicate with one another.

The failure confirmation unit for each component is described below.
FIG. 4 is a diagram showing an example of the processing of the failure confirmation unit for the storage device main body.
The failure confirmation unit 103a for the storage device main body checks the access route shown in bold in FIG. 4. If the result is normal, the access route to the storage device main body 503 is normal. If the result is abnormal, the access route to the storage device main body 503 is abnormal, and one of the adapter 101, the transmission line L11, the repeater 300, the transmission line L15, the controller 501, and the storage device main body 503 is abnormal.

In principle, the storage device main body 503 has a function for managing its own failures, so when the main body itself is abnormal it explicitly reports the abnormality to the outside. In other words, the abnormality may lie either in the access route to the storage device main body 503 or in the main body itself, but the latter case is unambiguous.

FIG. 5 is a diagram showing an example of the processing of the failure confirmation unit for the adapter. Here, a test access is executed over part of the access route over which the normal access was executed.
The failure confirmation unit 103b for the adapter checks the access route shown in bold in FIG. 5. If the result is normal, the access route to the adapter 101 is normal. If the result is abnormal, the access route to the adapter 101 is abnormal, which means that the adapter 101 itself is abnormal.

FIG. 6 is a diagram showing an example of the processing of the failure confirmation unit for the controller (own route). Here, a test access is executed over part of the access route over which the normal access was executed.

The failure confirmation unit 103c for the controller (own route) checks the access route shown in bold in FIG. 6. If the result is normal, the access route to the controller 501 is normal. If the result is abnormal, the access route to the controller 501 is abnormal, and one of the adapter 101, the transmission line L11, the repeater 300, the transmission line L15, and the controller 501 is abnormal.

FIG. 7 is a diagram showing an example of the processing of the failure confirmation unit for the controller (other route). Here, a test access is executed over an access route different from the one over which the normal access was executed.

The failure confirmation unit 103d for the controller (other route) checks the access route shown in bold in FIG. 7. If the result is normal, the access route to the controller 501 is normal. If the result is abnormal, the access route to the controller 501 is abnormal.

If both the own route and the other route are abnormal, the controller 501 is abnormal.
Note that the check performed by the failure confirmation unit 103d for the controller (other route) involves processing inside the storage device 500. In this case, for example, the controller 501 and the controller 502 communicate with each other using SEND DIAGNOSTIC / RECEIVE DIAGNOSTIC of the UNIX (registered trademark) USCSI commands.
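
For orientation, the two SCSI commands named here are carried in 6-byte command descriptor blocks (CDBs). The sketch below only builds those bytes, using the standard operation codes 0x1D (SEND DIAGNOSTIC) and 0x1C (RECEIVE DIAGNOSTIC RESULTS). The text does not say which diagnostic page or payload the controllers exchange, nor how the CDB is handed to the operating system, so the function names, the page code, and the lengths are placeholders assumed for illustration.

```python
import struct

SEND_DIAGNOSTIC = 0x1D             # SCSI operation code
RECEIVE_DIAGNOSTIC_RESULTS = 0x1C  # SCSI operation code


def send_diagnostic_cdb(param_list_len: int, page_format: bool = True) -> bytes:
    """Build a 6-byte SEND DIAGNOSTIC CDB for a parameter list of the given length."""
    flags = 0x10 if page_format else 0x00  # PF bit (byte 1, bit 4)
    # Layout: opcode, flags, reserved, parameter list length (big-endian), control.
    return struct.pack(">BBBHB", SEND_DIAGNOSTIC, flags, 0, param_list_len, 0)


def receive_diagnostic_cdb(page_code: int, alloc_len: int) -> bytes:
    """Build a 6-byte RECEIVE DIAGNOSTIC RESULTS CDB requesting one diagnostic page."""
    pcv = 0x01  # PCV bit: the page code field is valid
    # Layout: opcode, PCV, page code, allocation length (big-endian), control.
    return struct.pack(">BBBHB", RECEIVE_DIAGNOSTIC_RESULTS, pcv, page_code, alloc_len, 0)
```

On a real system these bytes would be passed to a pass-through SCSI interface together with a data buffer; that step is platform-specific and is deliberately left out.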

FIG. 8 is a diagram showing an example of the processing of the failure confirmation unit for the repeater and the pre-repeater transmission line. Here, one test access is executed over part of the access route over which the normal access was executed, and another is executed over an access route different from the one over which the normal access was executed.

The failure confirmation unit 103e for the repeater and the pre-repeater transmission line checks the two access routes shown in bold in FIG. 8. If the left side is normal, the left access route to the controller 501 is normal; if the left side is abnormal, the left access route to the controller 501 is abnormal. If the right side is normal, the right access route to the controller 501 is normal; if the right side is abnormal, the right access route to the controller 501 is abnormal.

If the left side is normal and the right side is abnormal, either the adapter 201 or the transmission line L13 is abnormal. If the right side is normal and the left side is abnormal, either the adapter 101 or the transmission line L11 is abnormal.

FIG. 9 is a diagram showing an example of the processing of the failure confirmation unit for the repeater and the post-repeater transmission line. Here, one test access is executed over part of the access route over which the normal access was executed, and another is executed over an access route different from the one over which the normal access was executed.

The failure confirmation unit 103f for the repeater and the post-repeater transmission line checks the two access routes shown in bold in FIG. 9. If the left side is normal, the left access route to the controller 501 is normal; if the left side is abnormal, the left access route to the controller 501 is abnormal. If the right side is normal, the right access route to the controller 502 is normal; if the right side is abnormal, the right access route to the controller 502 is abnormal.

If the left side is normal and the right side is abnormal, either the transmission line L16 or the controller 502 is abnormal. If the right side is normal and the left side is abnormal, either the transmission line L15 or the controller 501 is abnormal.
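
The conclusions drawn from the paired checks of FIG. 8 and FIG. 9 can be written as small lookup tables keyed on the two results. This is a sketch under assumptions: the table and function names are hypothetical, and the text does not say what is concluded when both sides are normal or both are abnormal, so the function returns an empty set in those cases to mean that this check alone narrows nothing.

```python
from typing import Dict, FrozenSet, Tuple

Outcome = Tuple[bool, bool]  # (left_ok, right_ok)

# FIG. 8: repeater and pre-repeater transmission line (unit 103e).
PRE_REPEATER: Dict[Outcome, FrozenSet[str]] = {
    (True, False): frozenset({"adapter 201", "L13"}),
    (False, True): frozenset({"adapter 101", "L11"}),
}

# FIG. 9: repeater and post-repeater transmission line (unit 103f).
POST_REPEATER: Dict[Outcome, FrozenSet[str]] = {
    (True, False): frozenset({"L16", "controller 502"}),
    (False, True): frozenset({"L15", "controller 501"}),
}


def suspects(table: Dict[Outcome, FrozenSet[str]],
             left_ok: bool, right_ok: bool) -> FrozenSet[str]:
    """Return the components the text names for this pair of results.

    An empty set means the text draws no conclusion for that combination.
    """
    return table.get((left_ok, right_ok), frozenset())
```

In both tables the repeater 300 is shared by the left and right routes, which is why a one-sided failure points only at the components that are unique to the failing side.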

In the above, the failure confirmation units other than the failure confirmation unit 103d for the controller (other route) use, for example, TEST UNIT READY of the UNIX USCSI commands.
The process of identifying the failure location using the failure confirmation units described above is explained below.

FIG. 10 is the first half of a flowchart illustrating an example of the processing of the first embodiment.
[S11] The failure identification unit 103 acquires the access route information already stored as an initial setting. The access route information can be specified in a setting file. Here, the access route information describes what exists on the access route from the start point to the end point.

[S12] The failure identification unit 103 acquires the failure confirmation units for the respective components, which are already stored as an initial setting. The failure confirmation units for the components can be listed, in the order in which they are to be executed, in the setting file described later.

[S13] The failure confirmation unit 103a for the storage device body, within the failure identification unit 103, determines whether there is a failure up to the storage device body 503. If a failure exists, the process proceeds to S15 to check the other components. If no failure exists, the process returns to S11 to check the next access route.

[S15] If the failure is in the storage device body 503 itself, the failure identification unit 103 notifies the access route switching unit 104 that a failure exists.
[S16] The failure confirmation unit 103b for the adapter, within the failure identification unit 103, determines whether there is a failure up to the adapter 101. If a failure exists, a failure of the adapter 101 has been found, and the process proceeds to S17. If no failure exists, the process proceeds to S18 to check the other components.

[S17] The failure identification unit 103 confirms the failure of the adapter 101. The process then returns to S11 to check the next access route.
[S18] The failure confirmation unit 103c for the controller on the own route, within the failure identification unit 103, determines whether there is a failure up to the controller 501. If a failure exists, the process proceeds to S20 to check the other components. If no failure exists, a failure of the storage device body 503 has been found, and the process proceeds to S19.

[S19] The failure identification unit 103 confirms the failure of the storage device body 503. The process then returns to S11 to check the next access route.
[S20] The failure identification unit 103 confirms that the failure lies in one of the transmission lines L11 and L15, the repeater 300, and the controller 501. The process then proceeds to A in FIG. 11.

FIG. 11 is the second half of the flowchart illustrating an example of the processing of the first embodiment.
[S21] Continuing from A in FIG. 10, the failure confirmation unit 103d for the controller on another route, within the failure identification unit 103, determines whether there is a failure up to the controller 501. If a failure exists, a failure of the controller 501 has been found, and the process proceeds to S22. If no failure exists, the process proceeds to S23 to check the other components.

[S22] The failure identification unit 103 confirms the failure of the controller 501. The process then returns to S11 via B in FIG. 10 to check the next access route.
[S23] The failure identification unit 103 confirms that the failure lies in one of the transmission lines L11 and L15 and the repeater 300.

[S24] The failure confirmation unit 103e for the repeater and the pre-repeater transmission line, within the failure identification unit 103, determines whether there is a failure up to the transmission line L13. If a failure exists, the process proceeds to S26 to check the other components. If no failure exists, a failure of the transmission line L11 has been found, and the process proceeds to S25.

[S25] The failure identification unit 103 confirms the failure of the transmission line L11. The process then returns to S11 via B in FIG. 10 to check the next access route.
[S26] The failure identification unit 103 confirms that the failure lies in either the transmission line L15 or the repeater 300.

[S27] The failure confirmation unit 103f for the repeater and the post-repeater transmission line, within the failure identification unit 103, determines whether there is a failure up to the transmission line L16. If a failure exists, a failure of the repeater 300 has been found, and the process proceeds to S28. If no failure exists, a failure of the transmission line L15 has been found, and the process proceeds to S29.

[S28] The failure identification unit 103 confirms the failure of the repeater 300. The process then returns to S11 via B in FIG. 10 to check the next access route.
[S29] The failure identification unit 103 confirms the failure of the transmission line L15. The process then returns to S11 via B in FIG. 10 to check the next access route.
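The branching in steps S13 to S29 amounts to a fixed decision tree over the six failure confirmation units. The sketch below is one way to write that tree down. The function name `identify_fault`, the `probes` dictionary, and the convention that a probe returns `True` when it detects a failure are illustrative assumptions; the actual test accesses are not modelled.

```python
from typing import Callable, Dict, Optional

# Each probe stands in for one failure confirmation unit (103a to 103f)
# and returns True when that unit reports a failure.
Probes = Dict[str, Callable[[], bool]]


def identify_fault(probes: Probes) -> Optional[str]:
    """Walk steps S13 to S29 for one access route; None means no failure."""
    if not probes["103a"]():        # S13: no failure up to the storage device body
        return None
    if probes["103b"]():            # S16 -> S17
        return "adapter 101"
    if not probes["103c"]():        # S18 -> S19
        return "storage device body 503"
    if probes["103d"]():            # S21 -> S22
        return "controller 501"
    if not probes["103e"]():        # S24 -> S25
        return "transmission line L11"
    if probes["103f"]():            # S27 -> S28
        return "repeater 300"
    return "transmission line L15"  # S27 -> S29


def fixed(results: Dict[str, bool]) -> Probes:
    """Build probes that return canned results (for illustration only)."""
    return {name: (lambda value=value: value) for name, value in results.items()}
```

With canned results such as `fixed({"103a": True, "103b": False, "103c": True, "103d": False, "103e": False})`, the walk ends at S25 and reports the transmission line L11.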

In this way, the failure location can be identified automatically and accurately. This prevents the maintainer of the computer system from misidentifying the failure location, and it makes maintenance considerably more efficient.

The following describes the case of changing an access route that includes a failure location to an access route that does not.
FIG. 12 is a diagram illustrating an example of a communication table.

The communication table 60 consists of a name, the transmission lines used, and a communication status. The name is the name of an access route. The transmission lines used are the transmission lines that the named access route uses. The communication status is the status of the named access route: "operational" when the access route is in operation, "standby" when the access route is waiting, and "diagnostic" when the access route serves to diagnose access routes.

Here, the operational, diagnostic, and standby entries shown in FIG. 12 are the initial values of the first embodiment. Communication 1 uses the transmission lines L11 and L15 and is operational. Communication 2 uses the transmission lines L11 and L16 and is diagnostic. Communication 3 uses the transmission lines L12 and L17 and is diagnostic. Communication 4 uses the transmission lines L12 and L18 and is standby. Communication 5 uses the transmission lines L13 and L15 and is operational. Communication 6 uses the transmission lines L13 and L16 and is diagnostic. Communication 7 uses the transmission lines L14 and L17 and is diagnostic. Communication 8 uses the transmission lines L14 and L18 and is standby.

The access route switching units 104 and 204 create the communication table 60 when the first computer 100, the second computer 200, and the storage device 500 are connected. Once input/output to the storage device 500 begins, they use the failure information from the failure identification units 103 and 203 to switch an access route containing a failure location to an access route that does not, so that the computers do not access the route containing the failure location while the failed component is being replaced.

FIG. 13 is a diagram illustrating an example of access route information.
The access route information 80 is represented by a first computer expression unit 81 and a second computer expression unit 82. The first computer expression unit 81 represents the access routes related to the first computer 100. The second computer expression unit 82 represents the access routes related to the second computer 200.

Further, the access route information 80 is represented by an adapter expression unit 83, a repeater expression unit 84, and a controller expression unit 85. The adapter expression unit 83 represents the adapter used by the access route. The repeater expression unit 84 represents the repeater used by the access route. The controller expression unit 85 represents the controller used by the access route.

The access route information 80 is defined in the OS (operating system).
The failure information uses the information in FIG. 13.
FIG. 14 is a diagram illustrating an example of access route selection information. Access routes that actually exist in plurality are presented as a single virtual device.

The access route selection information 90 is represented by a virtual device name 91, a virtual device number 92, a first name 93, a second name 94, a user name 95, a first spare name 96, and a second spare name 97.

The virtual device name 91 is the name of the virtual device. The virtual device number 92 is the number of the virtual device identified by the virtual device name 91. The first name 93 and the second name 94 are each the name of an access route that is presented as that virtual device; normally, the access route identified by the first name 93 serves as the virtual device. The user name 95 is the user-facing name of the access route presented as the virtual device. The first spare name 96 is the user-facing name of the access route identified by the first name 93 when the virtual device is not used. The second spare name 97 is the user-facing name of the access route identified by the second name 94 when the virtual device is not used.

FIG. 15 is a diagram illustrating an example of how failure information is represented.
The failure information 600 is represented by a failure name 601, a failure description start part 602, a first failure description part 603, a second failure description part 604, and a failure description end part 605.

The failure name 601 is the name of the access route on which the failure was detected. The failure description start part 602 declares the start of the failure description. In the first failure description part 603, cmd_flag=3 indicates a failure other than in the adapter 101, and cmd_flag=1 indicates a failure of the adapter 101. In the second failure description part 604, es_key=0x4 indicates a failure other than in the storage device 500, and es_key=0x3 indicates a failure of the storage device 500. In the es_key=0x4 and es_key=0x3 cases, the controller of the storage device 500 reports to the failure identification unit 103; the storage device 500 has this function natively. The failure description end part 605 declares the end of the failure description.
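The two fields of the failure information can be read as a pair of independent flags. The sketch below assumes the values of `cmd_flag` and `es_key` have already been extracted as integers, since the on-disk syntax of FIG. 15 is not reproduced here. The function name `classify` and the returned keys are hypothetical, and any value other than the four named in the text is reported as `None` rather than guessed.

```python
from typing import Dict, Optional


def classify(cmd_flag: int, es_key: int) -> Dict[str, Optional[bool]]:
    """Interpret cmd_flag and es_key as described for parts 603 and 604."""
    # cmd_flag: 1 means adapter 101 failed, 3 means the failure is elsewhere.
    adapter_fault = {1: True, 3: False}.get(cmd_flag)
    # es_key: 0x3 means storage device 500 failed, 0x4 means elsewhere.
    storage_fault = {0x3: True, 0x4: False}.get(es_key)
    return {
        "adapter_101_fault": adapter_fault,
        "storage_device_500_fault": storage_fault,
    }
```

For instance, `classify(3, 0x3)` reports that the adapter 101 is not at fault and the storage device 500 is.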

FIG. 16 is a diagram illustrating an example of the processing of the access route switching unit for an adapter.
When the failure identification unit 103 identifies the adapter 101 as the failure location, the affected access routes are communication 1 and communication 2. In this case, the access route switching units 104 and 204 stop communication 1 and make communication 4 operational. They also stop communication 2.

FIG. 17 is a diagram illustrating an example of the processing of the access route switching unit for a controller.
When the failure identification unit 103 identifies the controller 501 as the failure location, the affected access routes are communication 1, communication 3, communication 5, and communication 7. In this case, the access route switching units 104 and 204 stop communication 1 and make communication 4 operational, and stop communication 3. They also stop communication 5 and make communication 8 operational, and stop communication 7.

FIG. 18 is a diagram illustrating an example of the processing of the access route switching unit for a pre-repeater transmission line.
When the failure identification unit 103 identifies the transmission line L11 as the failure location, the affected access routes are communication 1 and communication 2. In this case, the access route switching units 104 and 204 stop communication 1 and make communication 4 operational. They also stop communication 2.

FIG. 19 is a diagram illustrating an example of the processing of the access route switching unit for a repeater.
When the failure identification unit 103 identifies the repeater 300 as the failure location, the affected access routes are communication 1, communication 2, communication 5, and communication 6. In this case, the access route switching units 104 and 204 stop communication 1 and make communication 4 operational, and stop communication 2. They also stop communication 5 and make communication 8 operational, and stop communication 6.

FIG. 20 is a diagram illustrating an example of the processing of the access route switching unit for a post-repeater transmission line.
When the failure identification unit 103 identifies the transmission line L15 as the failure location, the affected access routes are communication 1 and communication 5. In this case, the access route switching units 104 and 204 stop communication 1 and make communication 4 operational. They also stop communication 5 and make communication 8 operational.
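The switching behaviour of FIGS. 16 to 20 can be reproduced with a small model of the communication table 60. This is a sketch under stated assumptions, not the patented implementation. The `host` field (communications 1 to 4 belonging to the first computer and 5 to 8 to the second) and the `TOUCHES` mapping from a component to the transmission lines it sits on are inferred from the affected-route lists in the text. The rule "promote the first standby route of the same computer that avoids the fault" is likewise an assumption that happens to match the described results.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Route:
    host: int        # assumed owner: computer 100 or 200
    lines: Set[str]  # transmission lines used by this communication
    status: str      # "operational", "diagnostic", "standby" or "stopped"


def initial_table() -> Dict[int, Route]:
    """Initial values of communication table 60 as listed for FIG. 12."""
    rows = [
        (1, 100, {"L11", "L15"}, "operational"),
        (2, 100, {"L11", "L16"}, "diagnostic"),
        (3, 100, {"L12", "L17"}, "diagnostic"),
        (4, 100, {"L12", "L18"}, "standby"),
        (5, 200, {"L13", "L15"}, "operational"),
        (6, 200, {"L13", "L16"}, "diagnostic"),
        (7, 200, {"L14", "L17"}, "diagnostic"),
        (8, 200, {"L14", "L18"}, "standby"),
    ]
    return {n: Route(host, lines, status) for n, host, lines, status in rows}


# Assumed mapping, inferred from which communications the text lists as affected.
TOUCHES: Dict[str, Set[str]] = {
    "adapter 101": {"L11"},
    "controller 501": {"L15", "L17"},
    "repeater 300": {"L11", "L13", "L15", "L16"},
    "L11": {"L11"},
    "L15": {"L15"},
}


def switch(table: Dict[int, Route], faulty: str) -> List[int]:
    """Stop every affected route; replace operational ones with a standby route."""
    bad = TOUCHES[faulty]
    affected = [n for n, route in table.items() if route.lines & bad]
    for n in affected:
        was_operational = table[n].status == "operational"
        table[n].status = "stopped"
        if not was_operational:
            continue
        for candidate in table.values():
            if (candidate.host == table[n].host
                    and candidate.status == "standby"
                    and not candidate.lines & bad):
                candidate.status = "operational"
                break
    return affected
```

Calling `switch(initial_table(), "controller 501")` marks communications 1, 3, 5, and 7 as stopped and promotes communications 4 and 8, which is the outcome described for FIG. 17.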

In this way, no operation on the computer system is needed when a component is replaced, so operating errors cannot occur. In addition, even a person with little knowledge of the computer system can readily carry out the replacement.

Furthermore, a highly reliable computer system can be realized that continues operating in normal business without any loss of processing performance.

[Second Embodiment]
The following describes, in comparison with the first embodiment, the case of using a setting file that stores the order in which the failure confirmation units for the respective components are started.

FIG. 21 is a system configuration diagram of the second embodiment.
Compared with the first embodiment, in the system configuration of the second embodiment the first computer 100 becomes a first computer 100z, the second computer 200 becomes a second computer 200z, the failure identification unit 103 becomes a failure identification unit 103z, and the failure identification unit 203 becomes a failure identification unit 203z. In addition, a setting file 50 is added to each of the first computer 100z and the second computer 200z.

Between the first and second embodiments, components with the same name, other than the failure identification units 103z and 203z, also have the same function. The setting file 50 stores the order in which the failure confirmation units for the respective components are started, and is referenced by the failure identification units 103z and 203z.

FIG. 22 is a diagram illustrating an example of the setting file. The order in which the failure confirmation units for the respective components are started is specified in the setting file.
The setting file 50 consists of a component name, the abbreviation of a failure confirmation unit, the failure location on an abnormal response, and the failure location on a normal response. The component name is the name of a component. The abbreviation of the failure confirmation unit is the short name of the means that checks the named component for a failure. The failure location on an abnormal response is the component causing the failure when the named component is abnormal. The failure location on a normal response is the component causing the failure when the named component is normal.

Specifically, the failure confirmation units for the respective components are listed in the setting file 50 in the order in which they are to be executed. Writing the processing of FIG. 10 and FIG. 11 into the setting file 50 gives FIG. 22.

The setting file 50 consists of a failure confirmation unit 51 for the storage device body, a failure confirmation unit 52 for the adapter, a failure confirmation unit 53 for the controller (own route), a failure confirmation unit 54 for the controller (other route), a failure confirmation unit 55 for the repeater and the pre-repeater transmission line, and a failure confirmation unit 56 for the repeater and the post-repeater transmission line.

The failure confirmation unit 51 for the storage device body performs failure confirmation on the component named storage device body 503 and is abbreviated as (a).
The failure confirmation unit 52 for the adapter performs failure confirmation on the component named adapter 101, is abbreviated as (b), and on an abnormal response takes the adapter 101 as the failure location.

The failure confirmation unit 53 for the controller (own route) performs failure confirmation on the component named controller 501, is abbreviated as (c), and on a normal response takes the storage device body 503 as the failure location.

The failure confirmation unit 54 for the controller (other route) performs failure confirmation on the component named controller 501 and is abbreviated as (d). On an abnormal response it takes the controller 501 as the failure location, and on a normal response it takes the transmission lines L11 and L15 and the repeater 300 as the failure location.

The failure confirmation unit 55 for the repeater and the pre-repeater transmission line performs failure confirmation on the component named repeater 300 and is abbreviated as (e). On an abnormal response it takes the repeater 300 as the failure location, and on a normal response it takes the transmission line L11 as the failure location.

The failure confirmation unit 56 for the repeater and the post-repeater transmission line performs failure confirmation on the component named transmission line L16 and is abbreviated as (f). On an abnormal response it takes the repeater 300 as the failure location, and on a normal response it takes the transmission line L15 as the failure location.
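The six entries described above can be held as ordered rows and looked up by abbreviation. The sketch below is illustrative only: the actual syntax of the setting file 50 is not given in this passage, so the `Row` type, the `SETTING_FILE` list, and the `fault_location` function are hypothetical names, and an entry the description leaves unstated is stored as `None`. How the failure identification units 103z and 203z chain from one row to the next is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Row:
    component: str                          # component name
    abbrev: str                             # abbreviation of the confirmation unit
    on_abnormal: Optional[Tuple[str, ...]]  # failure location on abnormal response
    on_normal: Optional[Tuple[str, ...]]    # failure location on normal response


# Rows in execution order, following the descriptions of units 51 to 56.
SETTING_FILE: List[Row] = [
    Row("storage device body 503", "a", None, None),
    Row("adapter 101", "b", ("adapter 101",), None),
    Row("controller 501", "c", None, ("storage device body 503",)),
    Row("controller 501", "d", ("controller 501",), ("L11", "L15", "repeater 300")),
    Row("repeater 300", "e", ("repeater 300",), ("L11",)),
    Row("L16", "f", ("repeater 300",), ("L15",)),
]


def fault_location(abbrev: str, abnormal: bool) -> Optional[Tuple[str, ...]]:
    """Look up the failure location recorded for one unit and one response."""
    for row in SETTING_FILE:
        if row.abbrev == abbrev:
            return row.on_abnormal if abnormal else row.on_normal
    raise KeyError(abbrev)
```

Keeping the rows in a list preserves the execution order, which is the property the setting file is said to store.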

Thus, using the setting file 50 allows test accesses to be arranged freely, and even in a computer system whose components are configured in a complex way, the failure location can easily be pinned down in detail. The cause of a component failure also becomes easier to identify.

Specifically, when a repeater such as a hub is added to an existing computer system, the failure location can be identified automatically by adding that component to the setting file 50, provided the component has a function that lets a computer check it for failures.

[Third Embodiment]
The following describes, in comparison with the first embodiment, the case where the number of storage devices increases from one to two.

FIG. 23 is a system configuration diagram of the third embodiment.
The system configuration of the third embodiment consists of the first computer 100, the second computer 200, repeaters 300 and 400, a first storage device 510, a second storage device 520, and transmission lines L11, L12, L13, L14, L15, L16, L17, L18, L19, L20, L21, and L22. The first computer 100 and the second computer 200 are user terminal devices that either respond to a user's service request or forward it to a server. The repeaters 300 and 400 connect the first computer 100, the second computer 200, the first storage device 510, and the second storage device 520 to one another. The first storage device 510 stores data written by the first computer 100 and the second computer 200, as does the second storage device 520. The first computer 100, the second computer 200, the repeaters 300 and 400, the first storage device 510, and the second storage device 520 communicate with one another via the transmission lines L11 to L22.

The first computer 100 is as described in the first embodiment.
The second computer 200 is as described in the first embodiment.
The first storage device 510 consists of controllers 511 and 512 and a storage device body 513. The controllers 511 and 512 connect the first storage device 510 to the network. The storage device body 513 is the main body of the first storage device 510. The controllers 511 and 512 and the storage device body 513 communicate with one another.

The second storage device 520 consists of controllers 521 and 522 and a storage device body 523. The controllers 521 and 522 connect the second storage device 520 to the network. The storage device body 523 is the main body of the second storage device 520. The controllers 521 and 522 and the storage device body 523 communicate with one another.

FIG. 24 is a diagram illustrating an example of a communication table.
The communication table 70 consists of a name, the transmission lines used, and a communication status. The name is the name of an access route. The transmission lines used are the transmission lines that the named access route uses. The communication status is the status of the named access route: "operational" when the access route is in operation, "standby" when the access route is waiting, and "diagnostic" when the access route serves to diagnose access routes.

Here, the operational, diagnostic, and standby entries shown in FIG. 24 are the initial values of the third embodiment. Communication 1 uses the transmission lines L11 and L15 and is operational. Communication 2 uses L11 and L16 and is diagnostic. Communication 3 uses L11 and L17 and is operational. Communication 4 uses L11 and L18 and is diagnostic. Communication 5 uses L12 and L19 and is diagnostic. Communication 6 uses L12 and L20 and is standby. Communication 7 uses L12 and L21 and is diagnostic. Communication 8 uses L12 and L22 and is standby. Communication 9 uses L13 and L15 and is operational. Communication 10 uses L13 and L16 and is diagnostic. Communication 11 uses L13 and L17 and is operational. Communication 12 uses L13 and L18 and is diagnostic. Communication 13 uses L14 and L19 and is diagnostic. Communication 14 uses L14 and L20 and is standby. Communication 15 uses L14 and L21 and is diagnostic. Communication 16 uses L14 and L22 and is standby.

The access route switching units 104 and 204 create the communication table 70 when the first computer 100, the second computer 200, the first storage device 510, and the second storage device 520 are connected. Once input/output to the first storage device 510 and the second storage device 520 begins, they use the failure information from the failure identification units 103 and 203 to switch an access route containing a failure location to an access route that does not, so that the computers do not access the route containing the failure location while the failed component is being replaced.

図２５は、中継器後の伝送路に対するアクセスルート切替部の処理の例を示す図である。 FIG. 25 is a diagram illustrating an example of processing by the access route switching unit for a transmission path after the repeater.
障害特定部１０３により伝送路Ｌ１５が障害箇所として特定された場合、影響を受けるアクセスルートは、通信１及び通信９である。この場合、アクセスルート切替部１０４、２０４は、通信１を停止させ、通信６を運用用にする。アクセスルート切替部１０４、２０４は、通信９を停止させ、通信１４を運用用にする。 When the failure identification unit 103 identifies the transmission path L15 as the failure location, the affected access routes are communication 1 and communication 9. In this case, the access route switching units 104 and 204 stop communication 1 and make communication 6 operational. The access route switching units 104 and 204 also stop communication 9 and make communication 14 operational.

図２６は、図２５の場合による通信テーブルの変化を示す図である。 FIG. 26 is a diagram illustrating the changes in the communication table in the case of FIG. 25.
通信テーブル７０において、通信１が停止に、通信６が運用用に、通信９が停止に、通信１４が運用用に変化する。 In the communication table 70, communication 1 changes to stopped, communication 6 changes to operation, communication 9 changes to stopped, and communication 14 changes to operation.

図２７は、コントローラに対するアクセスルート切替部の処理の例を示す図である。 FIG. 27 is a diagram illustrating an example of processing by the access route switching unit for a controller.
障害特定部１０３によりコントローラ５１１が障害箇所として特定された場合、影響を受けるアクセスルートは、通信１、通信５、通信９及び通信１３である。この場合、アクセスルート切替部１０４、２０４は、通信１を停止させ、通信６を運用用にする。アクセスルート切替部１０４、２０４は、通信５を停止させる。アクセスルート切替部１０４、２０４は、通信９を停止させ、通信１４を運用用にする。アクセスルート切替部１０４、２０４は、通信１３を停止させる。 When the failure identification unit 103 identifies the controller 511 as the failure location, the affected access routes are communication 1, communication 5, communication 9, and communication 13. In this case, the access route switching units 104 and 204 stop communication 1 and make communication 6 operational. The access route switching units 104 and 204 stop communication 5. The access route switching units 104 and 204 stop communication 9 and make communication 14 operational. The access route switching units 104 and 204 stop communication 13.

図２８は、図２７の場合による通信テーブルの変化を示す図である。 FIG. 28 is a diagram illustrating the changes in the communication table in the case of FIG. 27.
通信テーブル７０において、通信１が停止に、通信６が運用用に、通信５が停止に、通信９が停止に、通信１４が運用用に、通信１３が停止に変化する。 In the communication table 70, communication 1 changes to stopped, communication 6 changes to operation, communication 5 changes to stopped, communication 9 changes to stopped, communication 14 changes to operation, and communication 13 changes to stopped.
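The two switching cases described for FIG. 25 and FIG. 27 can be sketched as a small table update. This is an illustrative sketch under stated assumptions, not the patented implementation: the dictionary layout, `REPLACEMENT`, `affected_by_path`, and `switch_routes` are hypothetical names, and the replacement pairs cover only what the text states (communication 1 is replaced by 6, and 9 by 14). The text does not say how a replacement is chosen in general.

```python
# Initial table: number -> (transmission paths, role), as described for FIG. 24.
INITIAL_TABLE = {
    1: (("L11", "L15"), "operation"), 2: (("L11", "L16"), "diagnosis"),
    3: (("L11", "L17"), "operation"), 4: (("L11", "L18"), "diagnosis"),
    5: (("L12", "L19"), "diagnosis"), 6: (("L12", "L20"), "standby"),
    7: (("L12", "L21"), "diagnosis"), 8: (("L12", "L22"), "standby"),
    9: (("L13", "L15"), "operation"), 10: (("L13", "L16"), "diagnosis"),
    11: (("L13", "L17"), "operation"), 12: (("L13", "L18"), "diagnosis"),
    13: (("L14", "L19"), "diagnosis"), 14: (("L14", "L20"), "standby"),
    15: (("L14", "L21"), "diagnosis"), 16: (("L14", "L22"), "standby"),
}

# Replacement pairs stated in the text; pairs for other communications
# are not given there and are deliberately left out.
REPLACEMENT = {1: 6, 9: 14}


def affected_by_path(table, path):
    """Return the communications that use the given transmission path."""
    return sorted(n for n, (paths, _role) in table.items() if path in paths)


def switch_routes(table, affected):
    """Stop each affected communication and promote the stated replacement
    of every stopped communication that was operational."""
    updated = dict(table)
    for n in affected:
        paths, role = updated[n]
        updated[n] = (paths, "stopped")
        if role == "operation" and n in REPLACEMENT:
            r = REPLACEMENT[n]
            updated[r] = (updated[r][0], "operation")
    return updated


# Case of FIG. 25: transmission path L15 is the failure location.
after_l15 = switch_routes(INITIAL_TABLE, affected_by_path(INITIAL_TABLE, "L15"))

# Case of FIG. 27: controller 511 is the failure location. The affected
# communications are taken directly from the text rather than derived.
after_511 = switch_routes(INITIAL_TABLE, [1, 5, 9, 13])
```

In both cases the operational communications 3 and 11 are untouched, which is consistent with the stated aim of continuing operation while the failed component is replaced.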

そして、通常業務において、処理性能を落とすことのない継続運用可能な高信頼性のコンピュータシステムを実現できる。 Further, it is possible to realize a highly reliable computer system that can operate continuously without degrading processing performance in normal business.
図２９は、コンピュータのハードウェア構成の例を示す図である。コンピュータ８００は、ＣＰＵ(Central Processing Unit)８０１によって装置全体が制御されている。ＣＰＵ８０１には、バス８０７を介してＲＡＭ(Random Access Memory)８０２、ハードディスクドライブ（ＨＤＤ:Hard Disk Drive）８０３、グラフィック処理装置８０４、入力インタフェース８０５、および通信インタフェース８０６が接続されている。 FIG. 29 is a diagram illustrating an example of a hardware configuration of a computer. The entire computer 800 is controlled by a CPU (Central Processing Unit) 801. A RAM (Random Access Memory) 802, a hard disk drive (HDD) 803, a graphic processing device 804, an input interface 805, and a communication interface 806 are connected to the CPU 801 via a bus 807.

ＲＡＭ８０２には、ＣＰＵ８０１に実行させるＯＳのプログラムやアプリケーションプログラムの少なくとも一部が一時的に格納される。また、ＲＡＭ８０２には、ＣＰＵ８０１による処理に必要な各種データが格納される。ＨＤＤ８０３には、ＯＳやアプリケーションプログラムが格納される。 The RAM 802 temporarily stores at least part of an OS program and application programs to be executed by the CPU 801. The RAM 802 stores various data necessary for processing by the CPU 801. The HDD 803 stores an OS and application programs.

グラフィック処理装置８０４には、モニタ９０１が接続されている。グラフィック処理装置８０４は、ＣＰＵ８０１からの命令に従って、画像をモニタ９０１の画面に表示させる。入力インタフェース８０５には、キーボード９０２とマウス９０３とが接続されている。入力インタフェース８０５は、キーボード９０２やマウス９０３から送られてくる信号を、バス８０７を介してＣＰＵ８０１に送信する。 A monitor 901 is connected to the graphic processing device 804. The graphic processing device 804 displays an image on the screen of the monitor 901 in accordance with a command from the CPU 801. A keyboard 902 and a mouse 903 are connected to the input interface 805. The input interface 805 transmits a signal transmitted from the keyboard 902 or the mouse 903 to the CPU 801 via the bus 807.

通信インタフェース８０６は、ネットワーク９０４に接続されている。通信インタフェース８０６は、ネットワーク９０４を介して、他のコンピュータとの間でデータの送受信を行う。 The communication interface 806 is connected to the network 904. The communication interface 806 transmits / receives data to / from other computers via the network 904.

以上のようなハードウェア構成によって、本実施の形態の処理機能を実現することができる。 With the hardware configuration described above, the processing functions of the present embodiment can be realized.
なお、上記の処理機能は、コンピュータによって実現することができる。その場合、障害解析装置が有すべき機能の処理内容を記述したプログラムが提供される。そのプログラムをコンピュータで実行することにより、上記処理機能がコンピュータ上で実現される。処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリなどがある。磁気記録装置には、ハードディスク装置（ＨＤＤ）、フレキシブルディスク（ＦＤ）、磁気テープなどがある。光ディスクには、ＤＶＤ(Digital Versatile Disc)、ＤＶＤ−ＲＡＭ(Random Access Memory)、ＣＤ−ＲＯＭ(Compact Disc Read Only Memory)、ＣＤ−Ｒ(Recordable)／ＲＷ(ReWritable)などがある。光磁気記録媒体には、ＭＯ(Magneto-Optical disk)などがある。 The above processing functions can be realized by a computer. In that case, a program is provided that describes the processing contents of the functions the failure analysis apparatus should have. By executing the program on a computer, the above processing functions are realized on the computer. The program describing the processing contents can be recorded on a computer-readable recording medium. Examples of the computer-readable recording medium include a magnetic recording device, an optical disc, a magneto-optical recording medium, and a semiconductor memory. Examples of the magnetic recording device include a hard disk device (HDD), a flexible disk (FD), and a magnetic tape. Examples of the optical disc include a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), and a CD-R (Recordable) / RW (ReWritable). Examples of the magneto-optical recording medium include an MO (Magneto-Optical disk).

プログラムを流通させる場合には、例えば、そのプログラムが記録されたＤＶＤ、ＣＤ−ＲＯＭなどの可搬型記録媒体が販売される。また、プログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することもできる。 When distributing the program, for example, a portable recording medium such as a DVD or a CD-ROM in which the program is recorded is sold. It is also possible to store the program in a storage device of a server computer and transfer the program from the server computer to another computer via a network.

プログラムを実行するコンピュータは、例えば、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、自己の記憶装置に格納する。そして、コンピュータは、自己の記憶装置からプログラムを読み取り、プログラムに従った処理を実行する。なお、コンピュータは、可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することもできる。また、コンピュータは、サーバコンピュータからプログラムが転送される毎に、逐次、受け取ったプログラムに従った処理を実行することもできる。 The computer that executes the program stores, for example, the program recorded on the portable recording medium or the program transferred from the server computer in its own storage device. Then, the computer reads the program from its own storage device and executes processing according to the program. The computer can also read the program directly from the portable recording medium and execute processing according to the program. In addition, each time the program is transferred from the server computer, the computer can sequentially execute processing according to the received program.

（付記１）複数の情報機器及び前記情報機器間を接続する１個以上の伝送路を構成要素とする複数のアクセスルートで通信可能な複数の装置間の通信の障害を解析する障害解析装置において、
前記複数の装置間の通常アクセスにおいて障害を検出すると、前記通常アクセスを実行したアクセスルート上の全ての構成要素を障害箇所候補とする障害検出手段と、
前記障害検出手段で障害が検出されると、前記障害箇所候補へテスト用アクセスを実行し、前記テスト用アクセスで障害を検出しなかった場合、前記テスト用アクセスを実行したアクセスルート上の全ての構成要素を、前記障害箇所候補から除外し、前記テスト用アクセスで障害を検出した場合、前記テスト用アクセスを実行したアクセスルート上に配置されていない全ての構成要素を、前記障害箇所候補から除外する障害診断手段と、
前記障害診断手段において、１以上のアクセスルートで前記テスト用アクセスが実行された結果、最後の１つとなった前記障害箇所候補の構成要素を障害箇所とする障害特定手段と、
前記複数の装置間の前記通常アクセス用のアクセスルートを、前記障害特定手段で障害箇所として特定された構成要素を経由しないアクセスルートに切り替えるアクセスルート切替手段と、
を有することを特徴とする障害解析装置。 (Supplementary note 1) A failure analysis apparatus that analyzes a failure in communication between a plurality of devices capable of communicating over a plurality of access routes, each access route having as its components a plurality of information devices and one or more transmission paths connecting the information devices, the apparatus comprising:
failure detection means that, upon detecting a failure in normal access between the plurality of devices, sets all components on the access route used for the normal access as failure location candidates;
failure diagnosis means that, when the failure detection means detects a failure, executes test access to the failure location candidates, excludes all components on the access route used for the test access from the failure location candidates when the test access detects no failure, and excludes all components not on the access route used for the test access from the failure location candidates when the test access detects a failure;
failure identification means that identifies, as the failure location, the one failure location candidate that remains after the failure diagnosis means has executed the test access over one or more access routes; and
access route switching means that switches the access route for the normal access between the plurality of devices to an access route that does not pass through the component identified as the failure location by the failure identification means.
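The candidate-narrowing rule in supplementary note 1 amounts to repeated set operations: a test access that succeeds clears every component on its route, and a test access that fails clears every component off its route. The sketch below illustrates that rule only. The function name `narrow_candidates`, the component labels, the route layouts, and the simulated fault are all hypothetical and are not taken from the figures; the sketch also assumes a single faulty component.

```python
def narrow_candidates(failed_route, test_routes, test_fails):
    """Narrow the failure location candidates using test accesses.

    failed_route: components on the route whose normal access failed.
    test_routes:  routes to try, in order (each a list of components).
    test_fails:   callable returning True if test access over a route fails.
    """
    candidates = set(failed_route)
    for route in test_routes:
        if len(candidates) <= 1:
            break  # only one candidate left: it is the failure location
        if test_fails(route):
            # Failure detected: components NOT on this route are excluded.
            candidates &= set(route)
        else:
            # No failure detected: components ON this route are excluded.
            candidates -= set(route)
    return candidates


# Hypothetical topology and fault, for illustration only.
normal_route = ["adapter12", "L1", "repeater20", "L5", "controller41", "body43"]
test_routes = [
    ["adapter12", "L1", "repeater20", "L5", "controller41"],  # stops at controller
    ["adapter12", "L1", "repeater20"],                        # stops at repeater
    ["adapter13", "L3", "repeater30", "L6", "controller41"],  # different path
]
faulty = "L5"  # simulated single fault


def simulated_test_fails(route):
    return faulty in route


result = narrow_candidates(normal_route, test_routes, simulated_test_fails)
print(result)  # {'L5'}
```

The first test fails and removes `body43`, the second succeeds and removes the adapter, `L1`, and the repeater, and the third succeeds and removes `controller41`, leaving `L5` as the single remaining candidate.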

（付記２）前記障害診断手段は、テスト用アクセスの起動の順序が記憶されている設定ファイルを参照し、前記設定ファイルで示される順序で前記テスト用アクセスを実行することを特徴とする付記１記載の障害解析装置。 (Supplementary note 2) The failure analysis apparatus according to supplementary note 1, wherein the failure diagnosis means refers to a setting file that stores the order in which test accesses are started, and executes the test accesses in the order indicated by the setting file.

（付記３）前記障害診断手段は、前記通常アクセスを実行したアクセスルートの一部の構成要素を経由したアクセスルートで前記テスト用アクセスを実行することを特徴とする付記１記載の障害解析装置。 (Supplementary note 3) The failure analysis apparatus according to supplementary note 1, wherein the failure diagnosis means executes the test access over an access route that passes through some of the components of the access route used for the normal access.

（付記４）前記障害診断手段は、前記通常アクセスを実行したアクセスルートと異なる構成要素を経由したアクセスルートで前記テスト用アクセスを実行することを特徴とする付記１記載の障害解析装置。 (Supplementary note 4) The failure analysis apparatus according to supplementary note 1, wherein the failure diagnosis means executes the test access over an access route that passes through components different from those of the access route used for the normal access.

（付記５）前記複数の装置は、記憶装置と前記記憶装置に対してネットワーク経由でアクセスするコンピュータであり、前記情報機器には、前記記憶装置内でデータを記憶する装置本体、前記装置本体を制御するコントローラ、前記ネットワーク上でデータを中継する中継器及び前記コンピュータにおいて通信を制御するアダプタが含まれることを特徴とする付記１記載の障害解析装置。 (Supplementary note 5) The failure analysis apparatus according to supplementary note 1, wherein the plurality of devices are a storage device and a computer that accesses the storage device via a network, and the information devices include a device main body that stores data in the storage device, a controller that controls the device main body, a repeater that relays data on the network, and an adapter that controls communication in the computer.

（付記６）コンピュータにより、複数の情報機器及び前記情報機器間を接続する１個以上の伝送路を構成要素とする複数のアクセスルートで通信可能な複数の装置間の通信の障害を解析する障害解析方法において、
障害検出手段が、前記複数の装置間の通常アクセスにおいて障害を検出すると、前記通常アクセスを実行したアクセスルート上の全ての構成要素を障害箇所候補とし、
障害診断手段が、前記障害検出手段で障害が検出されると、前記障害箇所候補へテスト用アクセスを実行し、前記テスト用アクセスで障害を検出しなかった場合、前記テスト用アクセスを実行したアクセスルート上の全ての構成要素を、前記障害箇所候補から除外し、前記テスト用アクセスで障害を検出した場合、前記テスト用アクセスを実行したアクセスルート上に配置されていない全ての構成要素を、前記障害箇所候補から除外し、
障害特定手段が、前記障害診断手段において、１以上のアクセスルートで前記テスト用アクセスが実行された結果、最後の１つとなった前記障害箇所候補の構成要素を障害箇所とし、
アクセスルート切替手段が、前記複数の装置間の前記通常アクセス用のアクセスルートを、前記障害特定手段で障害箇所として特定された構成要素を経由しないアクセスルートに切り替える、
ことを特徴とする障害解析方法。 (Supplementary note 6) A failure analysis method in which a computer analyzes a failure in communication between a plurality of devices capable of communicating over a plurality of access routes, each access route having as its components a plurality of information devices and one or more transmission paths connecting the information devices, wherein:
failure detection means, upon detecting a failure in normal access between the plurality of devices, sets all components on the access route used for the normal access as failure location candidates;
failure diagnosis means, when the failure detection means detects a failure, executes test access to the failure location candidates, excludes all components on the access route used for the test access from the failure location candidates when the test access detects no failure, and excludes all components not on the access route used for the test access from the failure location candidates when the test access detects a failure;
failure identification means identifies, as the failure location, the one failure location candidate that remains after the failure diagnosis means has executed the test access over one or more access routes; and
access route switching means switches the access route for the normal access between the plurality of devices to an access route that does not pass through the component identified as the failure location by the failure identification means.

（付記７）複数の情報機器及び前記情報機器間を接続する１個以上の伝送路を構成要素とする複数のアクセスルートで通信可能な複数の装置間の通信の障害を解析する障害解析プログラムにおいて、
コンピュータに、
障害検出手段は、前記複数の装置間の通常アクセスにおいて障害を検出すると、前記通常アクセスを実行したアクセスルート上の全ての構成要素を障害箇所候補とし、
障害診断手段は、前記障害検出手段で障害が検出されると、前記障害箇所候補へテスト用アクセスを実行し、前記テスト用アクセスで障害を検出しなかった場合、前記テスト用アクセスを実行したアクセスルート上の全ての構成要素を、前記障害箇所候補から除外し、前記テスト用アクセスで障害を検出した場合、前記テスト用アクセスを実行したアクセスルート上に配置されていない全ての構成要素を、前記障害箇所候補から除外し、
障害特定手段は、前記障害診断手段において、１以上のアクセスルートで前記テスト用アクセスが実行された結果、最後の１つとなった前記障害箇所候補の構成要素を障害箇所とし、
アクセスルート切替手段は、前記複数の装置間の前記通常アクセス用のアクセスルートを、前記障害特定手段で障害箇所として特定された構成要素を経由しないアクセスルートに切り替える、
処理を実行させることを特徴とする障害解析プログラム。 (Supplementary note 7) A failure analysis program for analyzing a failure in communication between a plurality of devices capable of communicating over a plurality of access routes, each access route having as its components a plurality of information devices and one or more transmission paths connecting the information devices, the program causing a computer to execute processing in which:
failure detection means, upon detecting a failure in normal access between the plurality of devices, sets all components on the access route used for the normal access as failure location candidates;
failure diagnosis means, when the failure detection means detects a failure, executes test access to the failure location candidates, excludes all components on the access route used for the test access from the failure location candidates when the test access detects no failure, and excludes all components not on the access route used for the test access from the failure location candidates when the test access detects a failure;
failure identification means identifies, as the failure location, the one failure location candidate that remains after the failure diagnosis means has executed the test access over one or more access routes; and
access route switching means switches the access route for the normal access between the plurality of devices to an access route that does not pass through the component identified as the failure location by the failure identification means.

（付記８）複数の情報機器及び前記情報機器間を接続する１個以上の伝送路を構成要素とする複数のアクセスルートで通信可能な複数の装置間の通信の障害を解析する障害解析プログラムを記録したコンピュータ読み取り可能な記録媒体において、
コンピュータに、
障害検出手段は、前記複数の装置間の通常アクセスにおいて障害を検出すると、前記通常アクセスを実行したアクセスルート上の全ての構成要素を障害箇所候補とし、
障害診断手段は、前記障害検出手段で障害が検出されると、前記障害箇所候補へテスト用アクセスを実行し、前記テスト用アクセスで障害を検出しなかった場合、前記テスト用アクセスを実行したアクセスルート上の全ての構成要素を、前記障害箇所候補から除外し、前記テスト用アクセスで障害を検出した場合、前記テスト用アクセスを実行したアクセスルート上に配置されていない全ての構成要素を、前記障害箇所候補から除外し、
障害特定手段は、前記障害診断手段において、１以上のアクセスルートで前記テスト用アクセスが実行された結果、最後の１つとなった前記障害箇所候補の構成要素を障害箇所とし、
アクセスルート切替手段は、前記複数の装置間の前記通常アクセス用のアクセスルートを、前記障害特定手段で障害箇所として特定された構成要素を経由しないアクセスルートに切り替える、
処理を実行させることを特徴とする障害解析プログラムを記録したコンピュータ読み取り可能な記録媒体。 (Supplementary note 8) A computer-readable recording medium on which is recorded a failure analysis program for analyzing a failure in communication between a plurality of devices capable of communicating over a plurality of access routes, each access route having as its components a plurality of information devices and one or more transmission paths connecting the information devices, the program causing a computer to execute processing in which:
failure detection means, upon detecting a failure in normal access between the plurality of devices, sets all components on the access route used for the normal access as failure location candidates;
failure diagnosis means, when the failure detection means detects a failure, executes test access to the failure location candidates, excludes all components on the access route used for the test access from the failure location candidates when the test access detects no failure, and excludes all components not on the access route used for the test access from the failure location candidates when the test access detects a failure;
failure identification means identifies, as the failure location, the one failure location candidate that remains after the failure diagnosis means has executed the test access over one or more access routes; and
access route switching means switches the access route for the normal access between the plurality of devices to an access route that does not pass through the component identified as the failure location by the failure identification means.

本発明の概念図である。 A conceptual diagram of the present invention.
第１の実施の形態のシステム構成図である。 A system configuration diagram of the first embodiment.
障害特定部の例を示す図である。 A diagram showing an example of the failure identification unit.
記憶装置本体に対する障害確認部の処理の例を示す図である。 A diagram showing an example of processing by the failure confirmation unit for the storage device main body.
アダプタに対する障害確認部の処理の例を示す図である。 A diagram showing an example of processing by the failure confirmation unit for an adapter.
コントローラ（自経路）に対する障害確認部の処理の例を示す図である。 A diagram showing an example of processing by the failure confirmation unit for a controller (own route).
コントローラ（他経路）に対する障害確認部の処理の例を示す図である。 A diagram showing an example of processing by the failure confirmation unit for a controller (other route).
中継器及び中継器前伝送路に対する障害確認部の処理の例を示す図である。 A diagram showing an example of processing by the failure confirmation unit for a repeater and a transmission path before the repeater.
中継器及び中継器後伝送路に対する障害確認部の処理の例を示す図である。 A diagram showing an example of processing by the failure confirmation unit for a repeater and a transmission path after the repeater.
第１の実施の形態の処理の例を示すフローチャートの前半である。 The first half of a flowchart showing an example of processing in the first embodiment.
第１の実施の形態の処理の例を示すフローチャートの後半である。 The second half of a flowchart showing an example of processing in the first embodiment.
通信テーブルの例を示す図である。 A diagram showing an example of the communication table.
存在するアクセスルートの表現の例を示す図である。 A diagram showing an example of how existing access routes are represented.
アクセスルート情報の表現の例を示す図である。 A diagram showing an example of how access route information is represented.
障害情報の表現の例を示す図である。 A diagram showing an example of how failure information is represented.
アダプタに対するアクセスルート切替部の処理の例を示す図である。 A diagram showing an example of processing by the access route switching unit for an adapter.
コントローラに対するアクセスルート切替部の処理の例を示す図である。 A diagram showing an example of processing by the access route switching unit for a controller.
中継器前の伝送路に対するアクセスルート切替部の処理の例を示す図である。 A diagram showing an example of processing by the access route switching unit for a transmission path before the repeater.
中継器に対するアクセスルート切替部の処理の例を示す図である。 A diagram showing an example of processing by the access route switching unit for a repeater.
中継器後の伝送路に対するアクセスルート切替部の処理の例を示す図である。 A diagram showing an example of processing by the access route switching unit for a transmission path after the repeater.
第２の実施の形態のシステム構成図である。 A system configuration diagram of the second embodiment.
設定ファイルの例を示す図である。 A diagram showing an example of the setting file.
第３の実施の形態のシステム構成図である。 A system configuration diagram of the third embodiment.
通信テーブルの例を示す図である。 A diagram showing an example of the communication table.
中継器後の伝送路に対するアクセスルート切替部の処理の例を示す図である。 A diagram showing an example of processing by the access route switching unit for a transmission path after the repeater.
図２５の場合による通信テーブルの変化を示す図である。 A diagram showing the changes in the communication table in the case of FIG. 25.
コントローラに対するアクセスルート切替部の処理の例を示す図である。 A diagram showing an example of processing by the access route switching unit for a controller.
図２７の場合による通信テーブルの変化を示す図である。 A diagram showing the changes in the communication table in the case of FIG. 27.
コンピュータのハードウェア構成の例を示す図である。 A diagram showing an example of the hardware configuration of a computer.
従来のシステム構成図における障害への対応の例を示す図である。 A diagram showing an example of how a failure is handled in a conventional system configuration.

Explanation of symbols

１０コンピュータ 10 Computer
１１障害解析装置 11 Failure analysis apparatus
１１ａ障害検出手段 11a Failure detection means
１１ｂ障害診断手段 11b Failure diagnosis means
１１ｃ障害特定手段 11c Failure identification means
１１ｄアクセスルート切替手段 11d Access route switching means
１２アダプタ 12 Adapter
１３アダプタ 13 Adapter
２０中継器 20 Repeater
３０中継器 30 Repeater
４０装置 40 Device
４１コントローラ 41 Controller
４２コントローラ 42 Controller
４３装置本体 43 Device main body
Ｌ１伝送路 L1 Transmission path
Ｌ２伝送路 L2 Transmission path
Ｌ３伝送路 L3 Transmission path
Ｌ４伝送路 L4 Transmission path
Ｌ５伝送路 L5 Transmission path
Ｌ６伝送路 L6 Transmission path
Ｌ７伝送路 L7 Transmission path
Ｌ８伝送路 L8 Transmission path

Claims

A failure analysis apparatus that analyzes a failure in communication between a plurality of devices capable of communicating over a plurality of access routes, each access route having as its components a plurality of information devices and one or more transmission paths connecting the information devices, the apparatus comprising:
failure detection means that, upon detecting a failure in normal access between the plurality of devices, sets all components on the access route used for the normal access as failure location candidates;
failure diagnosis means that, when the failure detection means detects a failure, executes test access to the failure location candidates, excludes all components on the access route used for the test access from the failure location candidates when the test access detects no failure, and excludes all components not on the access route used for the test access from the failure location candidates when the test access detects a failure;
failure identification means that identifies, as the failure location, the one failure location candidate that remains after the failure diagnosis means has executed the test access over one or more access routes; and
access route switching means that switches the access route for the normal access between the plurality of devices to an access route that does not pass through the component identified as the failure location by the failure identification means.

The failure analysis apparatus according to claim 1, wherein the failure diagnosis means refers to a setting file that stores the order in which test accesses are started, and executes the test accesses in the order indicated by the setting file.

The failure analysis apparatus according to claim 1, wherein the failure diagnosis unit executes the test access with an access route that passes through some components of the access route that has executed the normal access.

The failure analysis apparatus according to claim 1, wherein the failure diagnosis unit executes the test access using an access route that passes through a component different from the access route that executed the normal access.

A failure analysis method in which a computer analyzes a failure in communication between a plurality of devices capable of communicating over a plurality of access routes, each access route having as its components a plurality of information devices and one or more transmission paths connecting the information devices, wherein:
failure detection means, upon detecting a failure in normal access between the plurality of devices, sets all components on the access route used for the normal access as failure location candidates;
failure diagnosis means, when the failure detection means detects a failure, executes test access to the failure location candidates, excludes all components on the access route used for the test access from the failure location candidates when the test access detects no failure, and excludes all components not on the access route used for the test access from the failure location candidates when the test access detects a failure;
failure identification means identifies, as the failure location, the one failure location candidate that remains after the failure diagnosis means has executed the test access over one or more access routes; and
access route switching means switches the access route for the normal access between the plurality of devices to an access route that does not pass through the component identified as the failure location by the failure identification means.