JP7447674B2

JP7447674B2 - Information processing program, information processing method, and information processing device

Info

Publication number: JP7447674B2
Application number: JP2020090137A
Authority: JP
Inventors: 孝広齊藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2020-05-22
Filing date: 2020-05-22
Publication date: 2024-03-12
Anticipated expiration: 2040-05-22
Also published as: JP2021184224A

Description

The present invention relates to an information processing program, an information processing method, and an information processing apparatus.

For manufacturers that ship various products to the market, managing market quality after shipment has become an important management issue. For example, a malfunction that occurs in the field, where a product is actually used, is reported as a failure report (MR: Maintenance Report). Such failure reports are created, for example, by field SEs who perform field support work; depending on the product, they are also created in call center work that provides support by telephone or email.

In market quality management, failure reports on products operating in the field are therefore used to identify what occurred, and the cause and the countermeasure are determined from past cases and the like. The failure for which the report was submitted is then handled according to the determined countermeasure.

In recent years, efforts to improve the quality of products operating in the field have followed the cycle below. Failure reports arrive from various places, and the large volume of reported failure reports is analyzed. This analysis detects, for example, failures that have recently increased. Such recently increasing failures are called trend failures and are distinguished from random failures. Because trend failures may continue to increase, formulating countermeasures for them is highly urgent. For example, if an OS (Operating System) update causes a problem with the firmware used in a product operating in the field, the same problem will eventually occur in every product that uses that firmware, so measures must be taken as soon as possible.

When such a trend failure is detected, the product manufacturer documents the details of the detected failure and its countermeasure and makes them known to the market, thereby preventing the failure before it occurs. For example, the manufacturer detects a recent increase in "malfunctions caused by a bug in specific firmware" and informs the field of the content of that trend failure together with its countermeasure. The manufacturer can thus have users take preventive measures on products equipped with the relevant firmware and prevent the failure from occurring.

Conventionally, trend failures have been detected manually. With the rapid increase in the number of failure reports, however, adequate manual detection has become difficult. Detection of trend failures using information processing devices has therefore been introduced, such as code-information-based detection and keyword-based detection. Code-information-based detection is a method that, when code information indicating the nature of the problem is attached to a failure report, uses that code information to detect trend failures. Keyword-based detection is a method that treats words written in failure reports, or subject-predicate dependency pairs in their sentences, as code information, detects keywords that are increasing, and thereby detects trend failures. A document-based detection method, which handles language in units larger than keywords, is also conceivable.

As techniques for analyzing and searching documents, there is a conventional technique that changes the granularity of the clusters to be presented based on geographical location. There is also a conventional technique that calculates the similarity between a representative vector at each branch point of a hierarchical clustering result and a vector generated from a search target object, and presents the clusters with high calculated similarity.

Japanese Patent Application Publication No. 2011-118784
Japanese Patent Application Publication No. 2003-316819

However, with code-information-based detection, the granularity of the failure content represented by the code information is coarse, so failures that are on the rise are often overlooked. For example, if a failure with an increasing trend and a failure with a decreasing trend are classified under the same code, the increasing failure may be missed.

With keyword-based detection, it is preferable to take spelling variations, synonyms, and synonymous expressions into account. For example, the terms "manufacturer logo screen," "manufacturer screen," and "BIOS (Basic Input Output System) screen" may all refer to the same screen, in which case they are synonyms. Likewise, "停止する" (to halt) and "停まる" (to stop) are synonymous, and the negated expression "進まない" (does not proceed) is also a synonym. However, maintaining a dictionary that identifies spelling variations, synonyms, and synonymous expressions becomes more laborious and more costly as the amount of information grows. In the current situation, where the number of failure reports is increasing rapidly, keyword-based detection is therefore difficult to carry out. As a result, it is difficult to detect trend failures with these detection methods, the response to trend failures is delayed, and improving product quality becomes difficult.

In contrast, a document-based detection method can reduce the cost of handling spelling variations and synonyms. Clustering is one conceivable way of grouping documents for this purpose. When clustering is performed, processing parameters such as the number of clusters are given in advance. If the parameters are inappropriate, however, appropriate clusters may not be generated, and the increasing trend of a cluster may be overlooked.

For example, in clusters that are too small, the content similarity among the elements of each cluster is high, but when the failure reports are aggregated over time the counts become small, making it hard to detect a significant trend. For example, if the monthly counts for the past six months are (0, 1, 0, 2, 2, 3), the Mann-Kendall trend test at a significance level of 5% finds no significant increasing trend. Here, the numbers in parentheses are the number of occurrences in each of the past six months, ordered from left to right with the oldest month first.

Conversely, in a cluster that is too large and is made up of multiple failures, the increasing trend of the failure to be detected may be diluted by the other failures, so the test may fail to reach significance. For example, a failure whose monthly counts for the past six months are (0, 3, 6, 10, 15, 20) shows a significant increasing trend. A failure whose monthly counts are (52, 48, 44, 48, 46, 47), on the other hand, shows neither an increasing nor a decreasing trend. The monthly counts of a cluster containing both failures are the sum of the two, (52, 51, 50, 58, 61, 67), and in that case no significant increasing trend is found for the cluster.
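The dilution effect described above can be checked numerically. The following is a minimal sketch, assuming the common two-sided Mann-Kendall test with a tie-corrected variance and a continuity-corrected normal approximation; the text does not specify which variant is used, and the function name `mann_kendall` and the variable names are hypothetical.

```python
import math


def mann_kendall(counts):
    """Return (S, Z, two-sided p) for counts ordered oldest to newest.

    Assumption: normal approximation with continuity correction and the
    usual variance correction for tied values.
    """
    n = len(counts)
    # S = (number of increasing pairs) - (number of decreasing pairs)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = counts[j] - counts[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S, reduced for groups of tied values
    tie_term = 0
    for value in set(counts):
        t = counts.count(value)
        tie_term += t * (t - 1) * (2 * t + 5)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    # Continuity correction moves S one step toward zero
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return s, z, p


rising = [0, 3, 6, 10, 15, 20]     # failure with a clear upward trend
flat = [52, 48, 44, 48, 46, 47]    # failure with no trend
merged = [a + b for a, b in zip(rising, flat)]  # one cluster holding both

for name, series in (("rising", rising), ("flat", flat), ("merged", merged)):
    s, z, p = mann_kendall(series)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name}: S={s}, Z={z:.2f}, p={p:.3f} ({verdict} at 5%)")
```

Under these assumptions only the first series is significant at the 5% level; once it is merged with the flat series, the upward trend no longer passes the test.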

Thus, when clustering is simply performed on a document basis, it is difficult to choose appropriate parameters for the clustering, and an increasing trend may be overlooked.

With the conventional technique that changes the granularity of the presented clusters based on geographical location, it is difficult to detect cases with an increasing trend appropriately according to location, and it is difficult to detect trend failures without information equivalent to a geographic thesaurus. With the conventional technique that presents clusters based on the vector similarity between a target and a result, it is difficult to present clusters without a search query as input, and it is difficult to detect trend failures. With either conventional technique, therefore, it is difficult to improve product quality.

The disclosed technology has been made in view of the above, and aims to provide an information processing program, an information processing method, and an information processing apparatus that improve product quality.

In one aspect of the information processing program, information processing method, and information processing apparatus disclosed in the present application, a computer is caused to execute the following processing. A plurality of pieces of document information are acquired. The content of each piece of document information is quantified to calculate quantified information. Hierarchical clustering is performed based on the quantified information to generate tree diagram information indicating a tree diagram. For the clusters corresponding to the plurality of branch points of the tree diagram indicated by the tree diagram information, a feature of the document information belonging to each cluster is identified. From among those clusters, a specific cluster is extracted whose feature is significant and is stronger than the features of the clusters corresponding to the upper branch point and the lower branch points in the hierarchical structure containing the corresponding branch point. Information on the extracted specific cluster is output.
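The extraction step above can be sketched as follows. This is only an illustration under stated assumptions: the "feature" of each cluster is taken to be a P value from a trend test, "stronger" is read as "smaller P value," only the immediate upper and lower branch points are compared, and the tree, its node names, and its P values are all hypothetical.

```python
# Hypothetical tree diagram: branch point -> (upper branch point, P value).
# Single documents (the leaves of the real tree diagram) are omitted.
tree = {
    "root": (None, 0.40),
    "A": ("root", 0.03),
    "A1": ("A", 0.008),
    "A2": ("A", 0.60),
    "A1a": ("A1", 0.09),
    "A1b": ("A1", 0.30),
    "B": ("root", 0.70),
}


def extract_specific_clusters(tree, alpha=0.05):
    """Pick branch points that are significant and locally strongest."""
    # Build the list of lower branch points for every branch point
    lower = {name: [] for name in tree}
    for name, (upper, _) in tree.items():
        if upper is not None:
            lower[upper].append(name)

    selected = []
    for name, (upper, p) in tree.items():
        if p >= alpha:
            continue  # the feature is not significant
        neighbours = lower[name] + ([upper] if upper is not None else [])
        # Assumption: a smaller P value means a stronger feature
        if all(p < tree[nb][1] for nb in neighbours):
            selected.append(name)
    return selected


print(extract_specific_clusters(tree))
```

In this toy tree, "A" is significant but its lower branch point "A1" is stronger, so only "A1" is selected; the granularity is therefore chosen per branch rather than fixed in advance by a cluster-count parameter.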

In one aspect, the present invention can improve product quality.

FIG. 1 is a diagram showing a trend failure detection system.
FIG. 2 is a diagram showing an example of a failure report.
FIG. 3 is a block diagram of the server device.
FIG. 4 is a block diagram showing details of the hierarchical clustering unit and the trend failure detection unit.
FIG. 5 is a diagram showing an example of a dendrogram.
FIG. 6 is a diagram showing an example of increasing-trend evaluation values for each branch point.
FIG. 7 is a diagram for explaining cluster extraction.
FIG. 8 is a flowchart of the trend failure calculation process according to the first embodiment.
FIG. 9 is a diagram showing part of a hierarchical cluster in the process of detecting failures that are specifically frequent for a particular customer.
FIG. 10 is a hardware configuration diagram of the server device.

Embodiments of the information processing program, information processing method, and information processing apparatus disclosed in the present application are described in detail below with reference to the drawings. The information processing program, information processing method, and information processing apparatus disclosed in the present application are not limited by the following embodiments.

FIG. 1 is a diagram showing a trend failure detection system. As shown in FIG. 1, the trend failure detection system 10 includes a server device 1, a failure report input terminal 2, a failure report database 3, a word vector database 4, and a search result output client terminal 5.

The server device 1 and the failure report input terminal 2 are connected via a network 6. The failure report database 3 and the word vector database 4 are connected to the server device 1. The search result output client terminal 5 may be connected to the server device 1 or to the network 6.

The failure report input terminal 2 is a device with which a worker who has handled a failure enters information about that failure when a failure occurs in a product that is in operation at a company or the like. The failure report input terminal 2 acquires the failure information entered by the worker and transmits it as a failure report to the server device 1 via the network 6. The failure information includes, for example, the issue date on which the failure information was registered, the name of the customer at which the failure occurred, the type and model name of the product in which the failure occurred, the phenomenon of the failure, the cause of the failure, the countermeasure taken, and the date on which the countermeasure was completed.

The failure report database 3 receives the failure report information that the server device 1 has collected from each failure report input terminal 2. The failure report database 3 then compiles the received failure reports into a failure report 100 as shown in FIG. 2. FIG. 2 is a diagram showing an example of a failure report. The failure report 100 contains a plurality of individual failure reports 101. Each failure report 101 is document information and corresponds to a single failure report transmitted from a failure report input terminal 2.

Each failure report 101 has fields for fixed-form information such as the issue date, and free-text fields such as the phenomenon, the cause, and the countermeasure. Although this embodiment records an issue date, an issue date and time that includes the time of day may be used instead. That is, the failure report 101 is document information associated with a time, where "time" covers both a date and a date and time. For example, as shown in FIG. 2, the failure report 101 contains fixed-form information such as the issue date, customer name, type, model name, and countermeasure completion date, and free-text information such as the phenomenon, cause, and countermeasure. The information in the failure report 100 is not limited to the items listed here and may include other information such as the part model names of replaced parts, so that clusters can be generated from various viewpoints and not only for trend failures.

The failure report database 3 holds the failure report 100. The failure report 100 held by the failure report database 3 is updated by registering new failure reports 101, either when they are entered from a failure report input terminal 2 or periodically.

The word vector database 4 holds word vectors, in which each word is expressed as a vector called a distributed representation. With distributed representations, words that are used in similar ways are expressed as similar vectors even without being given correct answers. Because words with similar usage are treated alike, spelling variants and synonyms are treated alike as well.

The server device 1 collects the failure reports 101 transmitted from each failure report input terminal 2 via the network 6, and stores them in the failure report database 3.

The server device 1 then uses the word vectors stored in the word vector database 4 to vectorize each failure report 101 contained in the failure report 100, and clusters the reports by similarity to generate a plurality of document clusters. Next, the server device 1 counts the failures that occurred within each document cluster and detects trend failures. The server device 1 then transmits the detected trend failures to the search result output client terminal 5.

The search result output client terminal 5 is a device that displays the trend failure detection results and provides them to the user. The search result output client terminal 5 receives the information on the detected trend failures from the server device 1, and notifies the user by outputting that information, for example by displaying it on a monitor.

Next, the server device 1 is described in detail with reference to FIG. 3. FIG. 3 is a block diagram of the server device. As shown in FIG. 3, the server device 1 includes a failure report information acquisition unit 11, a hierarchical clustering unit 12, a trend failure detection unit 13, and an output unit 14.

The failure report information acquisition unit 11 collects the failure reports transmitted from the failure report input terminal 2. It then transmits the collected failure reports 101 to the failure report database 3 to update the failure report 100.

The hierarchical clustering unit 12 creates word vectors for the words contained in the failure reports 101, vectorizes the failure reports 101 using the created word vectors, and clusters the vectorized failure reports 101. FIG. 4 is a block diagram showing details of the hierarchical clustering unit and the trend failure detection unit. As shown in FIG. 4, the hierarchical clustering unit 12 includes a sentence analysis unit 121, a word vector creation unit 122, a document vector creation unit 123, and a dendrogram creation unit 124.

The sentence analysis unit 121 acquires all failure reports 101 contained in the failure report 100 stored in the failure report database 3. It analyzes the acquired failure reports 101 to extract words, and outputs the extracted words to the word vector creation unit 122.

The sentence analysis unit 121 also acquires, from the failure report 100, the plurality of failure reports 101 to be used for detecting trend failures. For example, to determine the trend over the past six months, the sentence analysis unit 121 acquires all failure reports 101 that record failures occurring within the past six months. It then outputs the acquired failure reports 101 to the document vector creation unit 123.

The word vector creation unit 122 converts the words contained in the failure report 100 into numerical form. Specifically, it receives the words extracted from the failure report 100 from the sentence analysis unit 121 and expresses them as distributed representations to generate word vectors. For example, the word vector creation unit 122 generates the word vectors using Word2Vec: it learns how words are used in the free-text sentences with a neural network and converts each word into an n-dimensional vector. The number of dimensions n is a value specified by the operator. The word vector creation unit 122 then stores the generated word vectors in the word vector database 4.

The document vector creation unit 123 quantifies the failure reports 101. Specifically, it receives the failure reports 101 to be used for detecting trend failures from the sentence analysis unit 121. Using the word vectors stored in the word vector database 4, it vectorizes the text of each failure report 101, such as the phenomenon, cause, and countermeasure, and generates a document vector for each failure report 101. It then outputs each failure report 101, represented by its document vector, to the dendrogram creation unit 124. This vectorization is an example of "quantification," and the document vector is an example of "quantified information." The document vector creation unit 123 is an example of a "quantification unit."

An example of the document vector creation process performed by the document vector creation unit 123 is now described in detail. In a failure report 101, the input text is divided into fields such as phenomenon, cause, and countermeasure. For each field, the document vector creation unit 123 calculates the centroid vector of the word vectors of the words contained in that field's free text. The free text of each field is thereby expressed as an n-dimensional vector.

Alternatively, the document vector creation unit 123 may use a weighted centroid vector that takes the weight of each word in the free text into account. For example, it may give greater weight to words whose occurrence is concentrated in the document in question. In this case, for a word w, the weight can be calculated as: weight of w = (frequency of w in the document) × log(total number of documents / number of documents containing w). The document vector creation unit 123 may also normalize the calculated centroid vectors, for example by scaling each centroid vector to length 1.
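The weighted centroid and the length normalization can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the toy documents, the two-dimensional word vectors, and the function names are hypothetical, and dividing by the sum of the weights is one assumed way of forming the weighted centroid.

```python
import math


def tfidf_weight(word, doc_words, all_docs):
    """weight = (frequency of word in the document)
                x log(total documents / documents containing word).
    Assumes the document itself is one of all_docs, so the count is >= 1."""
    tf = doc_words.count(word)
    df = sum(1 for d in all_docs if word in d)
    return tf * math.log(len(all_docs) / df)


def weighted_centroid(doc_words, all_docs, word_vectors):
    """Weighted centroid of the word vectors of one free-text field."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in sorted(set(doc_words)):
        if word not in word_vectors:
            continue  # skip words that have no vector
        w = tfidf_weight(word, doc_words, all_docs)
        weight_sum += w
        for k in range(dim):
            total[k] += w * word_vectors[word][k]
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]


def normalize(vec):
    """Scale a vector to length 1 (left unchanged if it is all zeros)."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length > 0 else vec


# Hypothetical toy data: three tokenized fields and 2-dimensional vectors
docs = [["screen", "freeze", "screen"], ["screen", "fan"], ["fan", "noise"]]
word_vectors = {
    "screen": [1.0, 0.0],
    "freeze": [0.0, 1.0],
    "fan": [1.0, 1.0],
    "noise": [0.5, 0.5],
}

centroid = weighted_centroid(docs[0], docs, word_vectors)
unit = normalize(centroid)
print(centroid, unit)
```

In the first document, "freeze" appears once but in no other document, so it receives a larger weight than "screen," which appears twice but is also used elsewhere; the centroid is pulled toward the "freeze" vector accordingly.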

After calculating the centroid vector of each field, the document vector creation unit 123 obtains the document vector using, for example, one of the following two methods. In the first method, the document vector creation unit 123 uses the centroid of the n-dimensional vectors obtained from the fields as the document vector. In this case, one document is expressed as an n-dimensional vector. The document vector creation unit 123 may normalize the resulting document vector before use.

In the second method, the document vector creation unit 123 may use the vector obtained by concatenating the n-dimensional vectors of the fields as the document vector. For example, when the three fields phenomenon, cause, and countermeasure are used, the failure report 101 is expressed as a 3n-dimensional vector. In this case too, the document vector creation unit 123 may normalize the resulting document vector before use.
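The two ways of combining field vectors into a document vector can be compared with a short sketch. The field vectors below are hypothetical values with n = 2; the function names are illustrative only.

```python
def centroid(vectors):
    """Method 1: average the field vectors; the result keeps n dimensions."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def concatenate(vectors):
    """Method 2: join the field vectors end to end; three n-dimensional
    field vectors give a vector with 3n components."""
    return [x for v in vectors for x in v]


# Hypothetical per-field centroid vectors of one failure report (n = 2)
field_vectors = {
    "phenomenon": [0.6, 0.8],
    "cause": [1.0, 0.0],
    "countermeasure": [0.0, 1.0],
}
vectors = list(field_vectors.values())

doc_by_centroid = centroid(vectors)
doc_by_concat = concatenate(vectors)
print(len(doc_by_centroid), doc_by_centroid)
print(len(doc_by_concat), doc_by_concat)
```

Averaging keeps the vector small but blends the fields together, whereas concatenation preserves which field each component came from at the cost of a longer vector.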

The dendrogram creation unit 124 receives each failure report 101, represented by its document vector, from the document vector creation unit 123. Using the similarity of the document vectors, it then performs hierarchical cluster analysis, which successively groups together the failure reports 101 that are most similar. For example, the dendrogram creation unit 124 uses complete linkage: each failure report 101 starts as its own cluster, and clusters are merged into progressively larger clusters according to the distance between clusters as represented by the document vectors.
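Complete-linkage agglomerative clustering can be sketched in a few lines. This is a deliberately naive illustration, assuming Euclidean distance between document vectors (the text does not name a distance measure); the toy vectors and the function names are hypothetical, and a production system would more likely rely on an optimized library routine.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(vectors):
    """Return the merge history as (left_id, right_id, distance, new_id).

    Ids 0..len(vectors)-1 are the single documents; each merge creates a
    new id, which corresponds to one branch point of the dendrogram.
    """
    clusters = {i: [i] for i in range(len(vectors))}  # id -> member indices
    merges = []
    next_id = len(vectors)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Complete linkage: the distance between two clusters is the
                # distance between their two farthest members
                d = max(
                    euclidean(vectors[i], vectors[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges


# Hypothetical document vectors for five failure reports
doc_vectors = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.0, 1.2], [5.0, 5.0]]
merges = complete_linkage(doc_vectors)
for left, right, dist, new_id in merges:
    print(f"merge {left} + {right} -> {new_id} at distance {dist:.3f}")
```

The merge history is exactly the information a dendrogram draws: N documents always produce N - 1 merges, and the merge distance gives the height of each branch point.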

The dendrogram creation unit 124 creates a dendrogram 200 of the failure reports 101, as shown in FIG. 5. FIG. 5 is a diagram showing an example of a dendrogram. The dendrogram 200 is also called a "tree diagram," and information indicating the dendrogram 200 is an example of "tree diagram information."

Each cluster 201 at the lowest level in FIG. 5 corresponds to one failure report 101. Each branch point 202 in the dendrogram 200 corresponds to one cluster formed by merging two clusters. The dendrogram creation unit 124 outputs the created dendrogram 200 to the trend failure detection unit 13.

In practice, clustering is performed on a very large number of failure reports 101, so the number of branch points 202 in the dendrogram 200 becomes very large. Hierarchical clustering is also computationally expensive, so it is preferable to limit the amount of computation. For example, the dendrogram creation unit 124 may first divide the failure reports 101 into broad groups using a coarse-grained, low-cost method such as k-means, and then perform hierarchical clustering. This shortens the clustering time and keeps the computational cost low.

図４に戻って説明を続ける。傾向障害検出部１３は、分岐点評価部１３１及び抽出部１３２を有する。 Returning to FIG. 4, the explanation will be continued. The trend failure detection unit 13 includes a branch point evaluation unit 131 and an extraction unit 132.

分岐点評価部１３１は、デンドログラム２００の入力をデンドログラム作成部１２４から受ける。次に、分岐点評価部１３１は、図５に示すデンドログラム２００の各分岐点２０２を抽出する。そして、分岐点評価部１３１は、分岐点２０２毎に増加傾向評価値を算出する。 The branch point evaluation unit 131 receives the dendrogram 200 from the dendrogram creation unit 124. Next, the branch point evaluation unit 131 extracts each branch point 202 of the dendrogram 200 shown in FIG. 5. Then, the branch point evaluation unit 131 calculates an increasing trend evaluation value for each branch point 202.

例えば、分岐点評価部１３１は、図６に示すように各分岐点２０２について増加傾向評価値を算出する。図６は、分岐点毎の増加傾向評価値の一例を表す図である。図６におけるクラスタ２１１～２１３がそれぞれデンドログラム２００における分岐点２０２におけるクラスタにあたる。 For example, the branch point evaluation unit 131 calculates an increasing trend evaluation value for each branch point 202, as shown in FIG. 6. FIG. 6 is a diagram showing an example of the increasing trend evaluation value for each branch point. Clusters 211 to 213 in FIG. 6 each correspond to a cluster at a branch point 202 in the dendrogram 200.

例えば、分岐点評価部１３１は、クラスタ２１２に属する障害の過去６か月の発生件数として（０，１，０，２，２，３）を取得する。ここで、括弧内の数字は、紙面に向かって左から古い順に６か月の古い順の各月の発生件数を表す。次に、分岐点評価部１３１は、過去６か月の発生件数に対してＭａｎｎ－Ｋｅｎｄａｌｌの傾向検定を行い単調性の指標となるｔａｕ統計量を算出する。ここでは、分岐点評価部１３１は、クラスタ２１２においてｔａｕ＝０．７８８と算出する。次に、分岐点評価部１３１は、ｔａｕを用いて統計的検定に用いられる増加傾向評価値であるＰ値を算出する。Ｐ値は、偏りがないと考えられる帰無仮説が成立する場合に観測結果以上の偏りが発生する確率である。Ｍａｎｎ－Ｋｅｎｄａｌｌの傾向検定の場合、帰無仮説下では、ｔａｕ統計量が標準正規分布にしたがうため、分岐点評価部１３１は、ｔａｕ統計量からＰ値を算出することができる。例えば、分岐点評価部１３１は、クラスタ２１２のＰ値を０．７８８と算出する。 For example, the branch point evaluation unit 131 obtains (0, 1, 0, 2, 2, 3) as the number of occurrences of failures belonging to cluster 212 in the past six months. The numbers in parentheses are the monthly counts for the six months, listed from oldest to newest, left to right. Next, the branch point evaluation unit 131 performs a Mann-Kendall trend test on the counts for the past six months and calculates the tau statistic, which is an index of monotonicity. Here, the branch point evaluation unit 131 calculates tau = 0.788 for cluster 212. Next, the branch point evaluation unit 131 uses tau to calculate the P value, which is the increasing trend evaluation value used in the statistical test. The P value is the probability that a bias at least as large as the observed one would occur if the null hypothesis of no bias were true. In the Mann-Kendall trend test, the tau statistic follows a standard normal distribution under the null hypothesis, so the branch point evaluation unit 131 can calculate the P value from the tau statistic. For example, the branch point evaluation unit 131 calculates the P value of cluster 212 as 0.788.

同様に、分岐点評価部１３１は、クラスタ２１３に属する障害の過去６か月の発生件数として（１，０，１，１，２，２）を取得する。そして、分岐点評価部１３１は、クラスタ２１３のｔａｕ統計量を０．７０１と算出し、Ｐ値を０．１００と算出する。 Similarly, the branch point evaluation unit 131 obtains (1, 0, 1, 1, 2, 2) as the number of occurrences of failures belonging to the cluster 213 in the past six months. Then, the branch point evaluation unit 131 calculates the tau statistic of the cluster 213 as 0.701, and calculates the P value as 0.100.

次に、分岐点評価部１３１は、クラスタ２１２とクラスタ２１３との過去６か月の発生件数を合計して、クラスタ２１１に属する障害の過去６か月の発生件数として（１，１，１，３，４，５）を取得する。そして、分岐点評価部１３１は、クラスタ２１１のｔａｕ統計量を０．８９４と算出し、Ｐ値を０．０２７と算出する。 Next, the branching point evaluation unit 131 adds up the number of occurrences in the past six months in the clusters 212 and 213, and calculates the number of occurrences in the past six months of failures belonging to the cluster 211 as (1, 1, 1, 3, 4, 5). Then, the branch point evaluation unit 131 calculates the tau statistic of the cluster 211 as 0.894, and calculates the P value as 0.027.
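The evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the branch point evaluation unit 131: the function name `mann_kendall` is hypothetical, tau is computed as Kendall's tau-b with a tie correction, and the P value is a two-sided value from a normal approximation with a continuity correction. The text does not say which variant of the test is used; this variant was chosen because it reproduces tau and P for clusters 213 and 211. For cluster 212 it reproduces tau = 0.788 but gives a P value of about 0.051 rather than the 0.788 stated in the text, which is still above the 5% level.

```python
import math

def mann_kendall(counts):
    """Return (tau, two-sided P value) for a sequence of counts ordered oldest first."""
    n = len(counts)
    s = 0  # concordant pairs minus discordant pairs
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = counts[j] - counts[i]
            s += (diff > 0) - (diff < 0)
    pairs = n * (n - 1) // 2
    tie_sizes = [counts.count(v) for v in set(counts)]
    tied_pairs = sum(t * (t - 1) // 2 for t in tie_sizes)
    if pairs == tied_pairs:
        return 0.0, 1.0  # all counts equal: no trend can be measured
    tau = s / math.sqrt(pairs * (pairs - tied_pairs))
    # Variance of S under the null hypothesis, corrected for ties.
    var = (n * (n - 1) * (2 * n + 5)
           - sum(t * (t - 1) * (2 * t + 5) for t in tie_sizes)) / 18
    # Continuity correction (an assumption of this sketch).
    z = (s - 1) / math.sqrt(var) if s > 0 else (s + 1) / math.sqrt(var) if s < 0 else 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return tau, p

counts_212 = [0, 1, 0, 2, 2, 3]
counts_213 = [1, 0, 1, 1, 2, 2]
# The parent cluster's monthly counts are the sums of its children's counts.
counts_211 = [a + b for a, b in zip(counts_212, counts_213)]

tau_212, p_212 = mann_kendall(counts_212)
tau_213, p_213 = mann_kendall(counts_213)
tau_211, p_211 = mann_kendall(counts_211)
```

The merged series for cluster 211 is (1, 1, 1, 3, 4, 5). Because it contains more reports, its trend is significant at the 5% level even though neither child cluster is.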

分岐点評価部１３１は、このようにＰ値の算出をデンドログラム２００の全ての分岐点２０２に対応するクラスタついて行う。そして、分岐点評価部１３１は、各クラスタの増加傾向評価値であるＰ値を各クラスタの識別情報とともに抽出部１３２へ出力する。 The branch point evaluation unit 131 thus calculates the P value for the clusters corresponding to all the branch points 202 of the dendrogram 200. Then, the branch point evaluation unit 131 outputs the P value, which is the increasing tendency evaluation value of each cluster, to the extraction unit 132 together with the identification information of each cluster.

ここで、本実施例では、分岐点評価部１３１は、デンドログラム２００の全ての分岐点２０２に対応する全てのクラスタに対して増加傾向評価値を算出した。しかし、下層側のクラスタは粒度が細かすぎて有意な増加傾向が認められないことが多い。そこで、分岐点評価部１３１は、予め決められた数以上の障害レポート１０１を含むクラスタを選択して、その選択したクラスタに限定して増加傾向評価値を求めて、その選択したクラスタの中から抽出部１３２に傾向障害を表すクラスタを抽出させても良い。これにより、計算コストを削減することができる。 In this embodiment, the branch point evaluation unit 131 calculates increasing trend evaluation values for all clusters corresponding to all branch points 202 of the dendrogram 200. However, clusters on the lower layers are often too fine-grained to show a significant increasing trend. The branch point evaluation unit 131 may therefore select only the clusters that contain at least a predetermined number of failure reports 101, calculate the increasing trend evaluation value for those clusters only, and have the extraction unit 132 extract the clusters representing trend failures from among them. This reduces the calculation cost.

抽出部１３２は、デンドログラム２００における全ての分岐点２０２に対応するそれぞれのクラスタの識別情報及び増加傾向評価値であるＰ値の入力を分岐点評価部１３１から受ける。そして、抽出部１３２は、統計的検定として所定の危険率と算出したＰ値とを比較して各クラスタに有意な増加傾向が認められるか否かを判定する。例えば、危険率を５％と設定した場合、Ｐ値＜０．０５であれば、抽出部１３２は、有意な増加傾向が認められると判定する。 The extraction unit 132 receives from the branch point evaluation unit 131 the identification information of each cluster corresponding to the branch points 202 in the dendrogram 200, together with the P value that is its increasing trend evaluation value. Then, as a statistical test, the extraction unit 132 compares the calculated P value with a predetermined significance level (risk rate) to determine whether each cluster shows a significant increasing trend. For example, when the significance level is set to 5%, the extraction unit 132 determines that a significant increasing trend is recognized if the P value is less than 0.05.

例えば、図６におけるクラスタ２１２のＰ値は０．７８８であり危険率５％より大きいので、抽出部１３２は、クラスタ２１２において有意な増加傾向は認められないと判定する。同様に、クラスタ２１３のＰ値は０．１００であり危険率５％より大きいので、抽出部１３２は、クラスタ２１３において有意な増加傾向は認められないと判定する。これに対して、クラスタ２１１のＰ値は０．０２７であり危険率５％以下であるので、抽出部１３２は、クラスタ２１１において有意な増加傾向が認められると判定する。このように、含まれる障害レポート１０１の件数が少ないために有意な増加傾向が認められなかったクラスタ２１２及び２１３が統合されることにより、有意な増加傾向が認められるクラスタ２１１が生成される場合がある。 For example, the P value of cluster 212 in FIG. 6 is 0.788, which is greater than the 5% risk rate, so the extraction unit 132 determines that no significant increasing trend is recognized in cluster 212. Similarly, the P value of cluster 213 is 0.100, which is greater than the 5% risk rate, so the extraction unit 132 determines that no significant increasing trend is recognized in cluster 213. In contrast, the P value of cluster 211 is 0.027, which is below the 5% risk rate, so the extraction unit 132 determines that a significant increasing trend is recognized in cluster 211. In this way, merging clusters 212 and 213, neither of which shows a significant increasing trend because it contains too few failure reports 101, can produce a cluster 211 that does show a significant increasing trend.

次に、抽出部１３２は、特定の分岐点２０２を選択した場合に、その分岐点２０２を含む階層構造において増加傾向が最大となる分岐点２０２のクラスタを抽出する。図７は、クラスタの抽出を説明するための図である。例えば、クラスタ２２３を選択した場合に、クラスタ２２３に対応する分岐点２０２を含む階層構造の分岐点２０２にはクラスタ２２１、２２２及び２２４～２２９が対応する。その中で、クラスタ２２１及び２２７～２２９は有意な増加傾向が認められず、クラスタ２２２～２２６は有意な増加傾向が認められる場合で説明する。なお、この場合のクラスタ２２１は、クラスタ２２２を含む下位のクラスタの統合により有意性が消失したと言える。この場合に、クラスタ２２３の増加傾向がクラスタ２２２及び２２４～２２６の増加傾向よりも大きい、すなわち増加傾向が強ければ、抽出部１３２は、クラスタ２２３を抽出する。抽出部１３２は、このようなクラスタをデンドログラム２００の中の全ての分岐点２０２から抽出する。この抽出部１３２により抽出されたクラスタが、「特定クラスタ」の一例にあたる。 Next, when a specific branch point 202 is selected, the extraction unit 132 extracts a cluster of branch points 202 having the maximum increasing tendency in the hierarchical structure that includes that branch point 202. FIG. 7 is a diagram for explaining cluster extraction. For example, when cluster 223 is selected, clusters 221, 222, and 224 to 229 correspond to branch point 202 in a hierarchical structure that includes branch point 202 corresponding to cluster 223. Among them, clusters 221 and 227 to 229 have no significant increasing tendency, and clusters 222 to 226 have a significant increasing tendency. Note that the cluster 221 in this case can be said to have lost its significance due to the integration of lower-order clusters including the cluster 222. In this case, if the increasing tendency of the cluster 223 is larger than the increasing tendency of the clusters 222 and 224 to 226, that is, if the increasing tendency is strong, the extraction unit 132 extracts the cluster 223. The extraction unit 132 extracts such clusters from all branch points 202 in the dendrogram 200. The cluster extracted by this extraction unit 132 is an example of a "specific cluster."

この抽出部１３２によるクラスタの抽出方法の具体例を以下に説明する。抽出部１３２は、増加傾向評価値であるＰ値が高い順に各分岐点２０２に対応するクラスタの識別子をソートした選択リストを作成する。 A specific example of the method for extracting clusters by the extraction unit 132 will be described below. The extraction unit 132 creates a selection list in which cluster identifiers corresponding to each branch point 202 are sorted in descending order of P value, which is an increasing tendency evaluation value.

次に、抽出部１３２は、リストの先頭から順にエントリを抽出する。そして、抽出したエントリのクラスタに有意な増加傾向が認められるか否かを判定する。有意な増加傾向が認められれば、抽出部１３２は、抽出したエントリを抽出リストに追加する。そして、抽出部１３２は、抽出したエントリのクラスタに対応する分岐点２０２の下位の分岐点２０２に対応するクラスタのエントリを選択リストから削除する。次に、抽出部１３２は、削除したエントリの間を詰めて選択リストのエントリをソートしなおす。その後、抽出部１３２は、次の位置のエントリの抽出を行う。 Next, the extraction unit 132 sequentially extracts entries from the top of the list. Then, it is determined whether a significant increasing trend is observed in the clusters of the extracted entries. If a significant increasing trend is recognized, the extraction unit 132 adds the extracted entry to the extraction list. Then, the extraction unit 132 deletes the entry of the cluster corresponding to the branch point 202 below the branch point 202 corresponding to the cluster of the extracted entry from the selection list. Next, the extraction unit 132 re-sorts the entries in the selection list by closing the spaces between the deleted entries. After that, the extraction unit 132 extracts the entry at the next position.

これに対して、抽出したエントリのクラスタに有意な増加傾向が認められなければ、抽出部１３２は、選択リストのそれ以下のエントリに有意な増加傾向が認められるエントリが存在しないので、抽出リストの作成を終了する。そして、抽出部１３２は、作成した抽出リストに登録されたエントリに対応するクラスタの情報を出力部１４へ出力する。ここで、クラスタの情報としては、そのクラスタがどのような障害を表すグループであるかが識別できる情報であればよい。例えば、クラスタの情報は、抽出されたクラスタに属する障害レポート１０１の情報であっても良いし、そのクラスタに含まれる障害レポート１０１間で類似度の高い障害に関する単語であっても良い。 On the other hand, if no significant increasing trend is recognized in the cluster of the extracted entry, no entry further down the selection list can show a significant increasing trend either, so the extraction unit 132 finishes creating the extraction list. The extraction unit 132 then outputs information on the clusters corresponding to the entries registered in the extraction list to the output unit 14. The cluster information may be any information that identifies what kind of failure the cluster represents. For example, it may be information on the failure reports 101 belonging to the extracted cluster, or failure-related words that are highly similar across the failure reports 101 in the cluster.
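The selection-list procedure described above can be sketched as follows. This is a minimal illustration, not the implementation of the extraction unit 132. The function name `extract_clusters`, the `children` mapping, and the example tree are hypothetical. The sketch assumes the selection list is ordered from the strongest increasing trend, that is, the smallest P value first, because that ordering is what makes it valid to stop at the first non-significant entry. As in the description, only entries below an extracted branch point are removed from the selection list.

```python
def extract_clusters(p_values, children, alpha=0.05):
    """p_values: cluster id -> P value; children: cluster id -> list of child cluster ids."""
    def descendants(node):
        # Collect every cluster below `node` in the dendrogram.
        stack = list(children.get(node, ()))
        found = set()
        while stack:
            child = stack.pop()
            if child not in found:
                found.add(child)
                stack.extend(children.get(child, ()))
        return found

    # Selection list: strongest trend (smallest P value) first -- an assumption.
    selection = sorted(p_values, key=p_values.get)
    extracted = []
    i = 0
    while i < len(selection):
        cluster = selection[i]
        if p_values[cluster] >= alpha:
            break  # no later entry can be significant either
        extracted.append(cluster)
        lower = descendants(cluster)
        # Delete lower branch points and close the gaps in the list.
        selection = [c for c in selection if c not in lower]
        i = selection.index(cluster) + 1
    return extracted

# Clusters 211 to 213 with the P values given in the text.
result_fig6 = extract_clusters({211: 0.027, 212: 0.788, 213: 0.100},
                               {211: [212, 213]})

# A hypothetical larger tree: R is the root with children A and B.
p_values = {"R": 0.30, "A": 0.01, "A1": 0.03, "A2": 0.20,
            "B": 0.40, "B1": 0.02, "B2": 0.60}
children = {"R": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
result_tree = extract_clusters(p_values, children)
```

In the hypothetical tree, A is extracted in preference to its significant child A1 because A shows the stronger trend, while B1 is extracted on its own because merging it into B loses significance.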

また、抽出部１３２は、評価基準に対応させて抽出するクラスタ数の下限や上限を決定しても良い。本実施例では、ある分岐点２０２におけるクラスタの増加傾向評価値がその分岐点２０２を含む階層構造において最大の増加傾向評価値であるという条件にあう分岐点２０２に対応するクラスタを抽出したが、抽出したクラスタの数が下限に達しない場合がある。その場合は、抽出部１３２は、抽出したクラスタを除いて選択リストを再度作成して、上述した抽出リストを再度作成して前のリストに加えても良い。また、上限を超えた場合には、抽出部１３２は、増加傾向評価値の高い順に上限に収まる数のクラスタを抽出しても良い。 Further, the extraction unit 132 may determine a lower limit or an upper limit of the number of clusters to be extracted in accordance with the evaluation criteria. In this example, clusters corresponding to a branch point 202 that meet the condition that the increasing tendency evaluation value of the cluster at a certain branch point 202 is the largest increasing tendency evaluation value in the hierarchical structure that includes that branch point 202 are extracted. The number of extracted clusters may not reach the lower limit. In that case, the extraction unit 132 may create the selection list again excluding the extracted cluster, create the above-mentioned extraction list again, and add it to the previous list. Furthermore, when the upper limit is exceeded, the extraction unit 132 may extract the number of clusters that fall within the upper limit in descending order of increasing tendency evaluation value.

出力部１４は、発生率の増加傾向が認められる傾向障害のグループであるクラスタの情報の入力を抽出部１３２から受ける。そして、出力部１４は、取得した各クラスタを発生率の増加傾向が認められる傾向障害のグループであるクラスタとして、その情報を検索結果出力用クライアント端末５へ出力する。 The output unit 14 receives input from the extraction unit 132 of information on clusters, which are groups of trend failures in which an increasing trend in incidence is observed. Then, the output unit 14 outputs the information on each acquired cluster to the search result output client terminal 5 as a cluster that is a group of trend failures in which an increasing tendency of occurrence rate is recognized.

利用者は、検索結果出力用クライアント端末５を用いて、発生率の増加傾向が認められる傾向障害のグループであるクラスタの情報を取得する。そして、利用者は、各クラスタの情報を用いてどのような傾向障害が発生しているかを確認する。これにより、利用者は、発生している傾向障害に対する対処を迅速に行うことが可能となる。 The user uses the search result output client terminal 5 to obtain information on clusters, which are groups of trend failures whose incidence rates are increasing. Then, the user uses the information of each cluster to check what kind of trend failures are occurring. This allows the user to quickly take measures against the trending failures that are occurring.

次に、図８を参照して、本実施例に係る傾向障害検出処理の流れを説明する。図８は、実施例１に係る傾向障害算出処理のフローチャートである。 Next, with reference to FIG. 8, the flow of the trend failure detection process according to this embodiment will be described. FIG. 8 is a flowchart of the trend failure calculation process according to the first embodiment.

障害レポート情報取得部１１は、障害レポート入力端末２から送信された障害レポート１０１を収集する。そして、障害レポート情報取得部１１は、収集した障害レポートを障害レポートデータベース３へ送信して、障害レポート１０１のそれぞれを各エントリとする障害レポート１００を障害レポートデータベース３に格納する（ステップＳ１）。 The failure report information acquisition unit 11 collects failure reports 101 sent from the failure report input terminal 2. Then, the failure report information acquisition unit 11 transmits the collected failure reports to the failure report database 3, and stores the failure report 100 having each failure report 101 as an entry in the failure report database 3 (step S1).

階層クラスタリング部１２の文解析部１２１は、傾向障害の検出処理に用いるデータを有する障害レポート１００を障害レポートデータベース３から取得する。そして、文解析部１２１は、障害レポート１００に含まれる各障害レポート１０１を分析して、障害レポート１００に含まれる単語を抽出する。単語ベクトル作成部１２２は、文解析部１２１により抽出された単語を、分散表現を用いて表すことで、単語ベクトルを生成する（ステップＳ２）。その後、単語ベクトル作成部１２２は、生成した単語ベクトルを単語ベクトルデータベース４に格納する。 The sentence analysis unit 121 of the hierarchical clustering unit 12 acquires a failure report 100 having data used for trend failure detection processing from the failure report database 3. The sentence analysis unit 121 then analyzes each failure report 101 included in the failure report 100 and extracts words included in the failure report 100. The word vector generation unit 122 generates a word vector by representing the words extracted by the sentence analysis unit 121 using a distributed representation (step S2). Thereafter, the word vector creation unit 122 stores the generated word vectors in the word vector database 4.

文書ベクトル作成部１２３は、傾向障害の検出に用いる障害レポート１０１である対象文書を、文解析部１２１から取得する。そして、文書ベクトル作成部１２３は、単語ベクトルデータベース４に登録された単語ベクトルを用いて、傾向障害の検出処理の対象文書である各障害レポート１０１の文書ベクトルを作成する（ステップＳ３）。その後、文書ベクトル作成部１２３は、作成した文書ベクトルをデンドログラム作成部１２４へ出力する。 The document vector creation unit 123 acquires the target document, which is the failure report 101 used for detecting trend failures, from the sentence analysis unit 121. Then, the document vector creation unit 123 uses the word vectors registered in the word vector database 4 to create a document vector for each failure report 101, which is a target document for the trend failure detection process (step S3). Thereafter, the document vector creation unit 123 outputs the created document vector to the dendrogram creation unit 124.

デンドログラム作成部１２４は、対象文書群に含まれる各障害レポート１０１の文ベクトルを文書ベクトル作成部１２３から取得する。次に、デンドログラム作成部１２４は、各障害レポート１０１の文書ベクトルを用いての対象文書群の階層クラスタリングを実行して、デンドログラム２００を作成する（ステップＳ４）。その後、デンドログラム作成部１２４は、作成したデンドログラム２００を傾向障害検出部１３の分岐点評価部１３１へ出力する。 The dendrogram creation unit 124 acquires the sentence vector of each failure report 101 included in the target document group from the document vector creation unit 123. Next, the dendrogram creation unit 124 executes hierarchical clustering of the target document group using the document vectors of each failure report 101 to create the dendrogram 200 (step S4). Thereafter, the dendrogram creation unit 124 outputs the created dendrogram 200 to the branch point evaluation unit 131 of the trend failure detection unit 13.

分岐点評価部１３１は、デンドログラム作成部１２４から取得したデンドログラム２００の各分岐点２０２に対応するクラスタの増加傾向評価値を算出する（ステップＳ５）。そして、分岐点評価部１３１は、算出した増加傾向評価値とともに各クラスタの識別情報を抽出部１３２へ出力する。 The branch point evaluation unit 131 calculates the increasing tendency evaluation value of the cluster corresponding to each branch point 202 of the dendrogram 200 obtained from the dendrogram creation unit 124 (step S5). Then, the branch point evaluation unit 131 outputs the identification information of each cluster together with the calculated increasing tendency evaluation value to the extraction unit 132.

抽出部１３２は、デンドログラム２００の分岐点２０２のそれぞれに対応する各クラスタの識別情報及び増加傾向評価値を分岐点評価部１３１から取得する。そして、抽出部１３２は、増加傾向評価値の高い順にクラスタをソートして並べた選択リストを生成する（ステップＳ６）。 The extraction unit 132 acquires the identification information and increasing tendency evaluation value of each cluster corresponding to each of the branch points 202 of the dendrogram 200 from the branch point evaluation unit 131. Then, the extraction unit 132 generates a selection list in which clusters are sorted and arranged in descending order of increasing tendency evaluation value (step S6).

次に、抽出部１３２は、選択リストにおける選択対象のエントリの先頭からの順番を表すｉを１に設定する（ステップＳ７）。 Next, the extraction unit 132 sets i to 1, which represents the order from the beginning of the entries to be selected in the selection list (step S7).

次に、抽出部１３２は、選択リストのｉ番目のエントリの抽出を行う（ステップＳ８）。 Next, the extraction unit 132 extracts the i-th entry in the selection list (step S8).

次に、抽出部１３２は、ｉ番目のエントリの抽出が成功したか否かを判定する（ステップＳ９）。ｉ番目のエントリの抽出に失敗した場合（ステップＳ９：否定）、抽出部１３２は、ステップＳ１３へ進む。 Next, the extraction unit 132 determines whether the i-th entry has been successfully extracted (step S9). If the extraction of the i-th entry fails (step S9: negative), the extraction unit 132 proceeds to step S13.

これに対して、ｉ番目のエントリの抽出に成功した場合（ステップＳ９：肯定）、抽出部１３２は、そのエントリに対応するクラスタの増加傾向評価値を用いて、そのクラスタにおいて有意な増加傾向が存在するか否かを判定する（ステップＳ１０）。そのクラスタにおいて有意な増加傾向が存在しない場合（ステップＳ１０：否定）、抽出部１３２は、ステップＳ１３へ進む。 On the other hand, if the extraction of the i-th entry is successful (step S9: affirmative), the extraction unit 132 uses the increasing tendency evaluation value of the cluster corresponding to that entry to determine whether there is a significant increasing tendency in that cluster. It is determined whether it exists (step S10). If there is no significant increasing trend in that cluster (step S10: negative), the extraction unit 132 proceeds to step S13.

これに対して、そのクラスタにおいて有意な増加傾向が存在する場合（ステップＳ１０：肯定）、抽出部１３２は、抽出したエントリを抽出リストに追加する（ステップＳ１１）。 On the other hand, if there is a significant increasing trend in that cluster (step S10: affirmative), the extraction unit 132 adds the extracted entry to the extraction list (step S11).

次に、抽出部１３２は、抽出したエントリに対応する分岐点２０２の下位の分岐点２０２に対応するエントリを選択リストから削除する（ステップＳ１２）。さらに、抽出部１３２は、選択リストにおけるエントリが削除された部分を詰めて、選択リストに含まれる各エントリに先頭から順に番号を振り直す。その後、抽出部１３２は、ステップＳ８へ戻る。 Next, the extraction unit 132 deletes from the selection list the entries corresponding to the branch points 202 below the branch point 202 of the extracted entry (step S12). The extraction unit 132 then closes the gaps left by the deleted entries and renumbers the entries in the selection list in order from the top. After that, the extraction unit 132 returns to step S8.

一方、エントリの抽出が失敗した場合（ステップＳ９：否定）及び抽出したエントリで表されるクラスタにおいて有意な増加傾向が認められなかった場合（ステップＳ１０：否定）、抽出部１３２は、以下の処理を行う。抽出部１３２は、抽出リストに登録されたエントリに対応するクラスタの情報を出力部１４へ出力する。出力部１４は、抽出部１３２から取得したクラスタを、各クラスタの情報を検索結果出力用クライアント端末５へ傾向障害を表すクラスタとして出力する（ステップＳ１３）。 On the other hand, if entry extraction fails (step S9: negative) or if no significant increasing trend is recognized in the cluster represented by the extracted entry (step S10: negative), the extraction unit 132 performs the following processing. The extraction unit 132 outputs information on the clusters corresponding to the entries registered in the extraction list to the output unit 14. The output unit 14 outputs the information on each cluster acquired from the extraction unit 132 to the search result output client terminal 5 as clusters representing trend failures (step S13).

以上に説明したように、本実施例に係る傾向障害検出処理では、サーバ装置は、障害レポートのそれぞれの文書ベクトルを求め、その文書ベクトルを用いて階層クラスタリングを行ってデンドログラムを作成する。その後、サーバ装置は、デンドログラムの分岐点に対応するクラスタのうち、有意な増加傾向が認められるクラスタであって、その分岐点を含む階層構造において増加傾向評価値が最大となるクラスタを抽出してその情報を通知する。 As described above, in the trend failure detection process according to this embodiment, the server device obtains a document vector for each failure report and performs hierarchical clustering on the document vectors to create a dendrogram. The server device then extracts, from among the clusters corresponding to the branch points of the dendrogram, each cluster that shows a significant increasing trend and has the largest increasing trend evaluation value in the hierarchical structure that includes its branch point, and reports information on that cluster.

これにより、増加傾向が最も強く現れる内容粒度のクラスタを自動的に作成することができ、増加傾向にある不具合を高精度に検出することが可能となる。また、同義語辞書などの整備が不要なため人的コストを抑えることができる。したがって、高精度な傾向障害の検出により、不具合の発生に迅速かつ適切に対処することができ、製品の品質を向上させることが可能となる。 As a result, clusters at the content granularity where the increasing trend appears most strongly can be created automatically, and defects that are on the rise can be detected with high accuracy. In addition, because there is no need to maintain resources such as synonym dictionaries, labor costs can be kept down. Highly accurate detection of trend failures therefore makes it possible to respond to defects quickly and appropriately and to improve product quality.

次に、実施例２について説明する。実施例１では、増加傾向にある障害の検出処理を例に説明したが、デンドログラムの分岐点２０２で算出する評価値を変えることにより他の検出処理にも、本実施例で説明した手法を適用することも可能である。その場合、評価値は、特定のクラスタにおいて、クラスタ内の障害レポート１０１及びクラスタ外の障害レポート１０１と顧客名とのクロス集計結果から算出されるカイ二乗統計量である。本実施例では、特定顧客に特異的に多い障害を発見する処理について説明する。本実施例に係るサーバ装置１も図３及び４で表される。以下の説明では、実施例１と同様の各部の動作は説明を省略する場合がある。 Next, Example 2 will be described. Example 1 described the detection of failures that are on the rise, but the same technique can be applied to other detection processes by changing the evaluation value calculated at each branch point 202 of the dendrogram. In that case, the evaluation value for a given cluster is the chi-square statistic calculated from a cross-tabulation of the failure reports 101 inside the cluster and the failure reports 101 outside the cluster against the customer name. This embodiment describes a process for discovering failures that are unusually frequent at a specific customer. The server device 1 according to this embodiment is also represented by FIGS. 3 and 4. In the following description, the operation of units that is the same as in Example 1 may be omitted.

図９は、特定顧客に特異的に多い障害の検出処理における階層クラスタの一部を表す図である。図９では、クラスタ毎に、顧客Ａで発生した障害と、顧客Ａ以外の顧客で発生した障害と、そのクラスタに含まれない顧客Ａで発生した障害と、そのクラスタに含まれない顧客Ａ以外の顧客で発生した障害と、そのクラスタでのカイ二乗統計量が示される。 FIG. 9 is a diagram showing part of the hierarchical clusters in the process of detecting failures that are unusually frequent at a specific customer. For each cluster, FIG. 9 shows the failures in the cluster that occurred at customer A, the failures in the cluster that occurred at customers other than customer A, the failures outside the cluster that occurred at customer A, the failures outside the cluster that occurred at customers other than customer A, and the chi-square statistic of the cluster.

例えば，クラスタ３０１は、顧客Ａで発生した８件の障害及びそれ以外の顧客で発生した２件の障害の計１０件の障害を含むクラスタである。そして、クラスタ３０１に含まれない障害には、顧客Ａで発生した１７件の障害及びそれ以外の顧客で発生した９９９７３件の障害が含まれる。そして、クラスタ３０１のけるカイ二乗統計量は、２２４９３である。 For example, cluster 301 is a cluster that includes a total of 10 failures: 8 failures that occurred with customer A and 2 failures that occurred with other customers. The failures not included in the cluster 301 include 17 failures that occurred with customer A and 99973 failures that occurred with other customers. The chi-square statistic of the cluster 301 is 22,493.

階層クラスタリング部１２のデンドログラム作成部１２４は、実施例１と同様に、検出処理で用いられる障害レポート１０１の文書ベクトルを用いて階層クラスタリングを実行して、図９で示すような階層を一部に有するデンドログラム２００を作成する。 As in Example 1, the dendrogram creation unit 124 of the hierarchical clustering unit 12 executes hierarchical clustering using the document vectors of the failure reports 101 used in the detection process, and creates a dendrogram 200 that includes, as one part, a hierarchy such as that shown in FIG. 9.

傾向障害検出部１３の分岐点評価部１３１は、デンドログラム２００の各分岐点２０２に対応するクラスタ毎の評価値であるカイ二乗統計量を算出する。 The branch point evaluation unit 131 of the trend failure detection unit 13 calculates a chi-square statistic that is an evaluation value for each cluster corresponding to each branch point 202 of the dendrogram 200.

カイ二乗統計量は、値が大きいほど顧客Ａの特異性が強いことを表す。すなわち、図９では、クラスタ３０２が、顧客Ａの特異性が最も強く現れている。抽出部１３２は、顧客Ａで発生した障害において有意な特異性が認められ、且つ、そのクラスタを含む階層構造において、そのクラスタのカイ二乗統計量で表される評価値が最大となる分岐点２０２に対応するクラスタを抽出する。これにより、顧客Ａに特異的に多い障害を表すクラスタが抽出される。 The larger the chi-square statistic, the stronger the specificity to customer A. In FIG. 9, the specificity to customer A appears most strongly in cluster 302. The extraction unit 132 extracts the cluster corresponding to a branch point 202 at which significant specificity is recognized for the failures that occurred at customer A and at which the evaluation value given by the chi-square statistic is the largest in the hierarchical structure that includes that cluster. As a result, clusters representing failures that are unusually frequent at customer A are extracted.
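The evaluation value for one cluster can be sketched from its 2x2 cross-tabulation as follows. This is a minimal illustration, not the implementation of the branch point evaluation unit 131. The function name `chi_square_2x2` is hypothetical, and the use of Yates' continuity correction is an assumption: the text does not name it, but with the correction the counts given for cluster 301 round to the stated value of 22493.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square statistic for the 2x2 table
           customer A   other customers
    in        a              b
    out       c              d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty: no association can be measured
    diff = abs(a * d - b * c)
    if yates:
        # Yates' continuity correction (assumed; see the note above).
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / denom

# Cluster 301: 8 failures at customer A and 2 at other customers inside the
# cluster; 17 at customer A and 99973 at other customers outside it.
chi_301 = chi_square_2x2(8, 2, 17, 99973)
```

Replacing the P value with this statistic, and treating larger values as stronger, lets the same extraction procedure pick the granularity at which the specificity to customer A is greatest.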

利用者は、サーバ装置１により抽出されたクラスタで表される障害を確認することで、特定の顧客において特異的に多い障害を発見することができ、特定の顧客に対して迅速で適切な対応を行うことができる。 By checking the failures represented by the clusters extracted by the server device 1, the user can discover failures that are unusually frequent at a particular customer and can respond to that customer quickly and appropriately.

（ハードウェア構成） (Hardware configuration)
図１０は、サーバ装置のハードウェア構成図である。図１０に示すように、サーバ装置１は、ＣＰＵ９１、メモリ９２、ハードディスク９３及びネットワークインタフェース９４を有する。ＣＰＵ９１は、バスを介して、メモリ９２、ハードディスク９３及びネットワークインタフェース９４に接続される。 FIG. 10 is a hardware configuration diagram of the server device. As shown in FIG. 10, the server device 1 includes a CPU 91, a memory 92, a hard disk 93, and a network interface 94. The CPU 91 is connected to the memory 92, the hard disk 93, and the network interface 94 via a bus.

ネットワークインタフェース９４は、ネットワーク６及び検索結果出力用クライアント端末５との通信用のインタフェースである。ＣＰＵ９１は、ネットワークインタフェース９４を介して、障害レポート入力端末２や検索結果出力用クライアント端末５と通信を行う。 The network interface 94 is an interface for communication with the network 6 and the search result output client terminal 5. The CPU 91 communicates with the failure report input terminal 2 and the search result output client terminal 5 via the network interface 94 .

ハードディスク９３は、補助記憶装置である。ハードディスク９３は、図３に例示した障害レポート情報取得部１１、階層クラスタリング部１２、傾向障害検出部１３及び出力部１４の機能を実現するためのプログラムを含む各種プログラムを格納する。 The hard disk 93 is an auxiliary storage device. The hard disk 93 stores various programs, including programs for realizing the functions of the failure report information acquisition unit 11, the hierarchical clustering unit 12, the trend failure detection unit 13, and the output unit 14 illustrated in FIG. 3.

また、本実施例では、障害レポートデータベース３及び単語ベクトルデータベース４をサーバ装置１の外部に配置したが、サーバ装置１がそれらを保持する構成でもよい。その場合、ハードディスク９３が、障害レポートデータベース３及び単語ベクトルデータベース４の機能を実現する。 Further, in this embodiment, the failure report database 3 and the word vector database 4 are placed outside the server device 1, but the server device 1 may have a configuration in which they are retained. In that case, the hard disk 93 implements the functions of the failure report database 3 and the word vector database 4.

ＣＰＵ９１は、ハードディスク９３に格納された各種プログラムを読み出してメモリ９２に展開して実行する。これにより、ＣＰＵ９１及びメモリ９２は、図３に例示した障害レポート情報取得部１１、階層クラスタリング部１２、傾向障害検出部１３及び出力部１４の機能を実現する。 The CPU 91 reads the various programs stored on the hard disk 93, loads them into the memory 92, and executes them. In this way, the CPU 91 and the memory 92 realize the functions of the failure report information acquisition unit 11, the hierarchical clustering unit 12, the trend failure detection unit 13, and the output unit 14 illustrated in FIG. 3.

１ サーバ装置 1 Server device
２ 障害レポート入力端末 2 Failure report input terminal
３ 障害レポートデータベース 3 Failure report database
４ 単語ベクトルデータベース 4 Word vector database
５ 検索結果出力用クライアント端末 5 Client terminal for outputting search results
６ ネットワーク 6 Network
１０ 傾向障害検出システム 10 Trend failure detection system
１１ 障害レポート情報取得部 11 Failure report information acquisition unit
１２ 階層クラスタリング部 12 Hierarchical clustering unit
１３ 傾向障害検出部 13 Trend failure detection unit
１４ 出力部 14 Output unit
１２１ 文解析部 121 Sentence analysis unit
１２２ 単語ベクトル作成部 122 Word vector creation unit
１２３ 文書ベクトル作成部 123 Document vector creation unit
１２４ デンドログラム作成部 124 Dendrogram creation unit
１３１ 分岐点評価部 131 Branch point evaluation unit
１３２ 抽出部 132 Extraction unit

Claims

An information processing program that causes a computer to execute a process comprising:
acquiring a plurality of pieces of document information;
digitizing the content of each piece of the document information to calculate digitized information;
performing hierarchical clustering based on the digitized information to generate tree diagram information indicating a tree diagram;
identifying features of the document information belonging to clusters corresponding to a plurality of branch points of the tree diagram indicated by the tree diagram information;
extracting, from among the clusters corresponding to the plurality of branch points included in the tree diagram indicated by the tree diagram information, a specific cluster in which the feature has significance and is stronger than in the clusters corresponding to the upper branch point and the lower branch point in the hierarchical structure including the corresponding branch point; and
outputting information on the extracted specific cluster.

The information processing program according to claim 1, wherein the document information is information associated with a time, and the program causes the computer to execute a process comprising:
identifying, as the feature, an increase/decrease trend in events indicated by the document information, based on the time associated with the document information; and
extracting the specific cluster in which the increase/decrease trend has significance and is stronger than in the clusters corresponding to the upper branch point and the lower branch point in the hierarchical structure including the corresponding branch point.

The information processing program according to claim 1, wherein the document information is information indicating an event associated with an occurrence location, and the program causes the computer to execute a process comprising:
identifying, as the feature, the occurrence tendency of the event indicated by the document information at a specific occurrence location; and
extracting the specific cluster in which the occurrence tendency has significance and is stronger than in the clusters corresponding to the upper branch point and the lower branch point in the hierarchical structure including the corresponding branch point.

The information processing program according to any one of claims 1 to 3, wherein the program causes the computer to execute a process of calculating a statistical evaluation value indicating the feature and determining the significance of the feature and the strength of the feature based on the calculated evaluation value.

The information processing program according to any one of claims 1 to 4, causing the computer to execute processing of performing natural language processing on the content of the document information and obtaining, as the numerical information, a document vector for each piece of the document information.

An information processing method characterized in that a computer executes processing of:
acquiring a plurality of pieces of document information;
converting the content of each piece of the document information into numerical information;
performing hierarchical clustering based on the numerical information to generate dendrogram information indicating a dendrogram;
identifying features of the document information belonging to the clusters corresponding to a plurality of branch points of the dendrogram indicated by the dendrogram information;
extracting, from among the clusters corresponding to the plurality of branch points included in the dendrogram indicated by the dendrogram information, a specific cluster in which the feature has significance and in which the feature is stronger than in the clusters corresponding to the upper branch point and the lower branch point in the hierarchical structure including the corresponding branch point; and
outputting information on the extracted specific cluster.

An information processing device comprising:
a digitization unit that acquires a plurality of pieces of document information and converts the content of each piece of the document information into numerical information;
a dendrogram creation unit that performs hierarchical clustering based on the numerical information to generate dendrogram information indicating a dendrogram;
a branching point evaluation unit that identifies features of the document information belonging to the clusters corresponding to a plurality of branch points of the dendrogram indicated by the dendrogram information;
an extraction unit that extracts, from among the clusters corresponding to the plurality of branch points included in the dendrogram indicated by the dendrogram information, a specific cluster in which the feature has significance and in which the feature is stronger than in the clusters corresponding to the upper branch point and the lower branch point in the hierarchical structure including the corresponding branch point; and
an output unit that outputs information on the extracted specific cluster.
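
The processing flow recited in the claims above (quantification, hierarchical clustering, evaluation of each branch point, extraction, output) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation described in this publication: bag-of-words counts stand in for the document vectors, average-linkage clustering with cosine distance stands in for the hierarchical clustering, and a one-sided z-score on the share of recent documents stands in for the statistical evaluation value. All function names, the `recent` flags, and the 1.645 threshold are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words counts (assumed stand-in for the quantification step)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0

def build_dendrogram(vectors):
    """Naive average-linkage agglomerative clustering.
    Returns every node of the tree; each node records its member document
    indices, its two children (empty for a leaf) and its parent."""
    nodes = [{"members": [i], "children": [], "parent": None}
             for i in range(len(vectors))]
    active = list(nodes)

    def linkage(x, y):
        pairs = [(i, j) for i in x["members"] for j in y["members"]]
        return sum(cosine_distance(vectors[i], vectors[j])
                   for i, j in pairs) / len(pairs)

    while len(active) > 1:
        # Merge the closest pair of currently active clusters.
        a, b = min(((x, y) for k, x in enumerate(active) for y in active[k + 1:]),
                   key=lambda p: linkage(*p))
        merged = {"members": a["members"] + b["members"],
                  "children": [a, b], "parent": None}
        a["parent"] = b["parent"] = merged
        active = [n for n in active if n is not a and n is not b] + [merged]
        nodes.append(merged)
    return nodes

def trend_score(members, recent, p0):
    """Assumed evaluation value: z-score of the number of recent documents in
    the cluster against the overall share p0 (requires 0 < p0 < 1)."""
    n = len(members)
    k = sum(recent[i] for i in members)
    return (k - n * p0) / math.sqrt(n * p0 * (1 - p0))

def extract_specific_clusters(nodes, recent, threshold=1.645):
    """Keep branch points whose score is significant and strictly higher than
    the scores of both the parent cluster and the child clusters."""
    p0 = sum(recent) / len(recent)
    for n in nodes:
        n["score"] = trend_score(n["members"], recent, p0)
    picked = []
    for n in nodes:
        if not n["children"]:
            continue  # a leaf is a single document, not a branch point
        neighbours = list(n["children"])
        if n["parent"] is not None:
            neighbours.append(n["parent"])
        if n["score"] >= threshold and all(n["score"] > m["score"]
                                           for m in neighbours):
            picked.append(sorted(n["members"]))
    return picked

def find_trend_clusters(texts, recent):
    nodes = build_dendrogram([vectorize(t) for t in texts])
    return extract_specific_clusters(nodes, recent)
```

Calling `find_trend_clusters(texts, recent)` with a list of report texts and a parallel list of booleans marking the recent reports returns the member indices of each extracted cluster. Comparing a branch point against both its parent and its children is what selects the granularity at which the trend is most pronounced, rather than reporting every nested cluster that happens to pass the significance threshold.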