JP2016133899A

JP2016133899A - Information processor, information processing method, and program

Info

Publication number: JP2016133899A
Application number: JP2015006910A
Authority: JP
Inventors: 将史瀧本; Masafumi Takimoto
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-01-16
Filing date: 2015-01-16
Publication date: 2016-07-25
Anticipated expiration: 2035-01-16
Also published as: JP6602013B2

Abstract

[Problem] To make it possible to generate a chart (graph) that allows easy comparison and recognition, taking into account the relatedness and similarity between its constituent axes.
[Solution] An information processing apparatus includes: an inter-feature distance calculation unit that calculates similarities between a plurality of feature quantities extracted from input multivariate data; a dimension reduction unit that reduces the number of dimensions based on the calculated similarities; a chart shape determination unit that determines the arrangement of the axes corresponding to the feature quantities based on the result of the dimension reduction; and an output unit that outputs a chart in which the multivariate data is drawn according to the determined axis arrangement. Even multivariate data that is hard to grasp at a glance can thus be displayed in an intuitively understandable form.
[Selected drawing] FIG. 1

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

When people analyze data, they generally do not stare at raw lists of numbers; instead they use various graphs or charts as visual representations that present the data in a form people can readily grasp. A visualization method chosen to suit the target data and the nature of the information sought makes it possible to read trends and information that a bare list of numbers would hide. Providing new problem-solving methods and added value through the use of big data is becoming the mainstream of recent technology. For example, it has become important to visualize multivariate data effectively so as to support analysis by the human eye.

In general, pie charts, bar charts (bar graphs), radar charts, parallel coordinate plots, and the like are used when analyzing data across multiple items. Of these, bar charts, radar charts, and parallel coordinate plots show the relationships in multivariate data directly. These three chart types are data visualization methods usable for a wide range of evaluations and are typically used to compare multiple variables. Items within one chart can be compared with each other, and one chart can be compared with other charts that report evaluation values for the same items.

Bar charts, parallel coordinate plots, and radar charts are data visualization methods used in multivariate analysis. However, as the number of variables grows, the number of axes grows and the chart shape becomes more complex, so it is known to become difficult for a person to take in the whole chart at once. A well-known finding in cognitive neuroscience is the "magical number seven, plus or minus two" (Non-Patent Document 1): people recognize information well when it fits within 7 ± 2 chunks but have difficulty beyond that. Under this view, even a radar chart, which is well suited to intuitive shape comparison, has one chunk per axis when adjacent axes form no semantic group (chunk), as in an ordinary radar chart. When several chart shapes are compared, a radar chart with more than 7 ± 2 axes is therefore likely to exceed human shape-recognition ability.

Patent Document 1 below proposes a technique for parallel coordinate plots, which extend horizontally as the number of feature quantities representing the data increases during visualization of multivariate data: partial parallel coordinate plots are extracted and arranged to create a graph that people can recognize easily. Non-Patent Document 2 below argues that the order of the variables assigned to the axes of a radar chart is important, and explains assigning variables that PCA places close together to nearby axes of the radar chart.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2013-161226 (JP2013-161226A)

Non-Patent Document 1: George A. Miller, "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information", The Psychological Review, 1956
Non-Patent Document 2: 『多変量解析事例集第１集』 (Multivariate Analysis Case Studies, Vol. 1), ed. Yoshizawa and Haga, 日科技連 (JUSE), 1992

The problem common to radar charts, parallel coordinate plots, bar charts, and bar charts rendered as line charts is that, although they make data easy to grasp as a shape, the relative positions and spacing (widths and angles) of the axes carry no information. The shape of these charts is formed by lines connecting the multivariate items (or, in a bar chart, by the sequence of bars), yet that shape is fixed by joining items that merely happen to be adjacent. As a result, the viewer mainly perceives the relationships between adjacent axes. For example, in a bar chart the height difference between distant bars is harder to judge intuitively than the difference between nearby bars, and a parallel coordinate plot only reveals whether neighboring variables are correlated. Effective visualization of multivariate data therefore requires devices such as separating items whose scores need not be compared in detail and placing close together items whose small differences should be compared in detail.

The method of Patent Document 1, which extracts and arranges parts of a parallel coordinate plot, narrows down the portions a person should attend to, making the partial parallel coordinate plots easier to compare. However, when the data is to be grasped by surveying the whole graph, multiple partial combinations of the variables are visualized together in one graph, so the graph may be unsuited to grasping information intuitively at a glance and may make oversights more likely. The variable-ordering method for radar charts in Non-Patent Document 2 likewise only indicates which variable components should be grouped together; it cannot produce a chart that reflects, for example, the degree of similarity between the axes representing those variables.

The present invention has been made in view of these circumstances, and its object is to make it possible to generate a chart (graph) that allows easy comparison and recognition, taking into account the relatedness and similarity between its constituent axes.

An information processing apparatus according to the present invention includes: calculation means for calculating similarities between a plurality of feature quantities extracted from input multivariate data; dimension reduction means for reducing the number of dimensions based on the similarities between the feature quantities calculated by the calculation means and on the multivariate data; shape determination means for determining, based on the result of the dimension reduction by the dimension reduction means, the arrangement of the axes corresponding to each of the plurality of feature quantities; and output means for outputting a figure in which the multivariate data is drawn according to the axis arrangement determined by the shape determination means.

According to the present invention, even multivariate data that is hard to grasp at a glance can be displayed in an intuitively understandable form, which makes data analysis easier.

FIG. 1 is a diagram showing a configuration example of a multivariate data visualization apparatus serving as the information processing apparatus of the present embodiment.
FIG. 2 is a flowchart showing an example of the processing operation in the present embodiment.
FIG. 3 is a diagram showing an example in which the shape of the radar chart differs for each abnormality pattern.
FIG. 4 is a diagram showing an example of radar charts unsuited to distinguishing data of different natures.
FIG. 5 is a diagram showing an example in which the radar charts of FIG. 4 have been converted so that data of different natures is easy to distinguish.
FIG. 6 is a diagram showing an example of the radar chart generation procedure according to the first embodiment.
FIG. 7 is a diagram showing an example of a graph according to the first embodiment.
FIG. 8 is a diagram showing another example of the radar chart generation procedure according to the first embodiment.
FIG. 9 is a diagram showing another example of the radar chart generation procedure according to the first embodiment.
FIG. 10 is a diagram showing an example of a GUI in which the display format according to the first embodiment can be selected.
FIG. 11 is a diagram showing computer functions capable of implementing the information processing apparatus of the present embodiment.

Embodiments of the present invention are described below with reference to the drawings.

(First Embodiment)
A first embodiment of the present invention is described. The first embodiment is a method of generating a radar chart, or a similar chart, in a form that a person can understand intuitively when there are multiple items of data, each expressed by multiple feature quantities, and the tendency of each item among them is to be analyzed.

As an example, consider a use case in appearance inspection, in which multiple feature quantities are extracted from an input image and inspection is performed automatically. In appearance inspection that automatically determines whether an input image is normal or abnormal, multiple feature quantities needed for the decision are extracted from the input image, and their scores (evaluation values) are judged comprehensively to decide whether the image is a normal image or a defect image. The scores of the extracted feature quantities correlate with the type of abnormality detected; at a production site, in addition to the normal/abnormal label that is the decision result, the decision score and the scores on the original extracted feature axes can be used to analyze the tendencies of defects that are difficult to measure. This allows, for example, feedback to production line design.

In particular, in appearance inspection apparatuses that automate sensory visual inspection by the human eye, abnormal data comes in complex patterns and at many levels, so it is difficult to work out from the image alone on what basis an image was judged abnormal. Visualizing the extracted features in a chart (graph) that can be taken in at a glance makes it easier to grasp tendencies that are hard to see from the image alone.

FIG. 3 shows, as an example in which differences in the score tendencies of the feature quantities extracted from each input image can be seen intuitively, how the shape of the radar chart differs for each abnormality pattern. The number of extracted features is five, and each extracted feature is given a numeric ID. FIG. 3(A) shows a normal image as the input image and its corresponding radar chart; FIG. 3(B) shows an unevenness (mura) defect image and its radar chart; FIG. 3(C) shows a scratch defect image and its radar chart; and FIG. 3(D) shows a foreign-matter defect image and its radar chart. As FIGS. 3(A) to 3(D) show, the radar charts have different shapes.

Thus, once a suitable radar chart is set up, the user can grasp defect tendencies intuitively just by looking at its shape. For readability, the defect examples here use input images whose defects are visible in the image itself, but defect samples arising at a production site include ones that look almost identical to normal samples to the human eye. Even then, when checking or analyzing which kind of abnormality a sample tends toward, the shape of the radar chart alone gives an intuitive answer. In addition, visualizing this information lets a user who does not understand the complex decision algorithm itself check intuitively how the algorithm is judging images, which helps improve the reliability of the inspection apparatus and detect malfunctions early.

This works when, as in FIG. 3, there are only about five kinds of extracted features and each suffices to capture the tendency of a defect. In general, however, a classifier intended to handle diverse abnormality patterns often has tens to hundreds of extracted features. If a radar chart is created for such a case without any special measures, the result is often unlike FIG. 3, where the abnormality tendency can be grasped at a glance.

As an example, FIG. 4 shows a typical case with 35 extracted features in which a radar chart drawn by simply placing the axes clockwise in feature order makes it hard to see how the graph shape differs with the defect tendency. The radar charts in FIGS. 4(A) to 4(D) were generated from images of inspection targets that were judged defective for different reasons. At a production site, where a large number of inspection targets pass along the line every day, it is difficult for a person to grasp the differences among these four shapes at a glance and understand instantly which defect each one signifies.

In the present embodiment, therefore, the processing described below converts the graph into one whose shape differences are easy to grasp intuitively. FIGS. 5(A) to 5(D) show radar charts obtained from FIGS. 4(A) to 4(D) by reordering the axes so that each chart takes a readily recognizable shape while the score on every axis is preserved. Properties that were unclear in FIGS. 4(A) to 4(D) become evident. For example, it is immediately apparent that FIGS. 5(A), 5(B), and 5(C) correspond to images showing different defect types, and that FIG. 5(D) is likely a combination of the two defect types shown in FIGS. 5(A) and 5(B).

In other words, even when the number of extracted features increases, simply placing the axes of feature quantities that respond to the same defect signal close together, as in FIGS. 5(A) to 5(D), yields a radar chart whose shape alone reveals the defect tendency. An example of an algorithm for generating such an easy-to-read radar chart is described in detail below.

As stated above, to generate an easy-to-read radar chart it is preferable to place closely related axes near each other. If certain feature quantities respond in a fixed way to input defect signals of a given tendency, then in a radar chart with those axes placed together the corresponding part of the graph always deforms in unison. By assigning one meaning to each such group of axes, a person can read the graph while attending to only a few partial regions. Conversely, in a radar chart where axes that respond together are scattered, it is difficult for a person to recognize a set of feature quantities with similar meaning as a group.

The flow of processing is described with reference to the block diagram of FIG. 1, which shows a configuration example of a multivariate data visualization apparatus serving as the information processing apparatus of the present embodiment, and the flowchart of FIG. 2. The multivariate data input unit 101 receives data in which n feature quantities have been extracted from target samples such as images (S201). Based on the n input feature quantities, the inter-feature distance calculation unit 102 calculates the similarity (distance) between every pair of feature quantities (S202), yielding n vectors, one representing each feature quantity, expressed in a feature space whose measure is the similarity between feature quantities.

The dimension reduction unit 103 reduces these n vectors to two (or three) dimensions such that the resulting points lie at equal distance from the origin (S203). If the user wishes to specify parameters needed for visualization, such as the number of dimensions after reduction or which feature quantities should be brought closer together (or moved apart), these can be entered through the data visualization parameter input unit 104 so that the changes are reflected in the result.

Based on the resulting vectors representing the n feature quantities, arranged at equal distance from the origin, and on any parameters the user entered through the data visualization parameter input unit 104, the chart shape determination unit 106 determines the arrangement of the chart axes (S204). Following this arrangement, the evaluation values of each item of data received from the multivariate data input unit 101 are plotted on the respective axes to complete the chart, which is output (displayed) by the output unit 105, such as a display.

The axis arrangement determined by the chart shape determination unit 106 is stored in the chart shape storage unit 108. Multivariate data that was not used in determining the axis arrangement is accepted by the multivariate data additional input unit 107 and plotted on the chart axes held in the chart shape storage unit 108, creating a new chart that is output to the output unit 105.

Next, a concrete example of calculating the similarity (distance) between feature quantities in step S202 of FIG. 2 is described. How strongly two feature quantities are related can be defined by, for example, the Kullback-Leibler divergence. To compute the similarity this way, the distribution of the data along each feature quantity is estimated by an algorithm such as kernel density estimation (KDE), and the similarity is defined as the distance between the estimated distributions. Let P_i and P_j be the probability distributions estimated from the spread of the data along the two feature quantities F_i and F_j with feature IDs i and j; the similarity between the two distributions is then defined by Equation (1) below.
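The KDE-plus-divergence step can be sketched as follows. This is a minimal illustration, not the patent's Equation (1), which does not appear in this text: it assumes the standard one-directional divergence D_KL(P_i || P_j), a Gaussian kernel, a rule-of-thumb bandwidth, and feature scores already normalized to a common scale. The function names `gaussian_kde_pdf` and `kl_divergence` are hypothetical.

```python
import numpy as np

def gaussian_kde_pdf(samples, grid, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate on `grid`."""
    samples = np.asarray(samples, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

def kl_divergence(scores_i, scores_j, num_points=512, eps=1e-12):
    """Approximate D_KL(P_i || P_j) from two 1-D samples of feature scores."""
    scores_i = np.asarray(scores_i, dtype=float)
    scores_j = np.asarray(scores_j, dtype=float)
    pooled = np.concatenate([scores_i, scores_j])
    # Rule-of-thumb bandwidth on the pooled sample (an assumption of this sketch).
    bandwidth = 1.06 * pooled.std() * len(pooled) ** (-0.2) + eps
    grid = np.linspace(pooled.min() - 3 * bandwidth,
                       pooled.max() + 3 * bandwidth, num_points)
    dx = grid[1] - grid[0]
    # eps keeps the logarithm finite where a density is numerically zero.
    p = gaussian_kde_pdf(scores_i, grid, bandwidth) + eps
    q = gaussian_kde_pdf(scores_j, grid, bandwidth) + eps
    p /= p.sum() * dx  # renormalize both densities on the grid
    q /= q.sum() * dx
    return float(np.sum(p * np.log(p / q)) * dx)
```

Because this divergence is asymmetric, a symmetric variant such as the sum of both directions may be preferable when a true distance-like measure is needed; which form the original equation uses cannot be determined from this text.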

The distance between distributions need not be that of Equation (1); other measures may of course be used, such as the Bhattacharyya distance under the assumption that the two probability distributions are multivariate normal, or a definition based on mutual information. Similar results can also be obtained with density ratio estimation, which estimates the probability distributions and computes the distance between them simultaneously. Another usable measure is the sum of the squared differences between the score rankings of the data under each feature quantity. Here too, the larger the score, the less similar the two feature quantities are taken to be.
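Two of these alternatives can be sketched briefly. The rank-based measure follows the description directly (it assumes no tied scores); the Bhattacharyya function is the univariate special case of the normal-distribution formula, chosen here for brevity, whereas the text speaks of multivariate normals. Both function names are hypothetical.

```python
import numpy as np

def rank_distance(scores_i, scores_j):
    """Sum of squared differences between the ranks each sample receives
    under two features (0 when both features order the samples identically)."""
    ranks_i = np.argsort(np.argsort(scores_i))  # rank of every sample, 0-based
    ranks_j = np.argsort(np.argsort(scores_j))
    return int(np.sum((ranks_i - ranks_j) ** 2))

def bhattacharyya_gaussian(scores_i, scores_j):
    """Bhattacharyya distance between two 1-D normal distributions fitted
    to the samples (univariate simplification, an assumption of this sketch)."""
    mu_i, mu_j = np.mean(scores_i), np.mean(scores_j)
    var_i, var_j = np.var(scores_i), np.var(scores_j)
    mean_term = 0.25 * (mu_i - mu_j) ** 2 / (var_i + var_j)
    var_term = 0.5 * np.log((var_i + var_j) / (2.0 * np.sqrt(var_i * var_j)))
    return float(mean_term + var_term)
```

For instance, `rank_distance([1, 2, 3], [3, 2, 1])` gives 8, the largest possible value for three samples, since the two features order the samples in exactly opposite ways.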

In this way, the similarities among all n input feature quantities F = {F_i}, i = 1, ..., n, are calculated. As a result, for each of the n input feature quantities, a vector is obtained whose elements are its similarities to the n feature quantities, itself included. The vector obtained is expressed by Equation (2).
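As a rough sketch of this step, the n vectors can be collected as the rows of an n × n matrix. The layout (samples in rows, features in columns), the name `feature_distance_matrix`, and the use of a rank-based distance as the pairwise measure are all assumptions made for illustration; any of the measures described above could be passed in instead.

```python
import numpy as np

def rank_distance(scores_i, scores_j):
    """Sum of squared rank differences (one possible pairwise measure)."""
    ranks_i = np.argsort(np.argsort(scores_i))
    ranks_j = np.argsort(np.argsort(scores_j))
    return float(np.sum((ranks_i - ranks_j) ** 2))

def feature_distance_matrix(data, distance):
    """data: (num_samples, n) array of feature scores.
    Returns an (n, n) matrix whose row i plays the role of the vector G_i:
    the distances from feature i to every feature, itself included."""
    data = np.asarray(data, dtype=float)
    n = data.shape[1]
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            G[i, j] = distance(data[:, i], data[:, j])
    return G
```

With this layout, two features that always rank the samples in the same order get identical rows, so they start out as the same point in the n-dimensional space that the dimension reduction then works on.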

Next, a concrete example of the calculation performed in step S203, which follows step S202, is described. In step S203, the axes representing the n feature quantities are reduced to two dimensions to serve as the axes of a two-dimensional radar chart; the coordinates of the point used to draw each axis after dimension reduction are expressed by Equation (3). Equation (3) covers reduction to m dimensions, and m = 2 suffices for an ordinary radar chart. In Equation (3), B is the embedding matrix for generating the radar chart, defined as shown in Equation (4).

Several methods for obtaining the embedding matrix B are given below. The embedding matrix to be found is denoted B^*, and it is defined by Equation (5) using a similarity matrix W that describes which feature quantities should be brought close together.

Here W_{i,j}, an element of the similarity matrix W, encodes the rule for bringing feature quantities together; it may be set by any function that gives 1 when the i-th and j-th features are to be brought together and 0 when they are to be kept apart. That is, B^* is found such that when W_{i,j} is 1 the distance after dimension reduction is minimized in Equation (5), and when it is 0 the pair is ignored. Naturally, when the priority of bringing features together should vary from pair to pair, W_{i,j} may take a value between 0 and 1. Equations (6) and (7) give examples of expressions usable for W_{i,j}.

Equation (6) sets W_{i,j} to 0 or 1 according to whether G_i is among the k nearest neighbors of G_j in the n-dimensional feature space, and Equation (7) sets a distance-based value governed by a constant γ. As another example, W_{i,j} may be set to a value reflecting prior knowledge about the extracted feature quantities. If, given the nature of the extracted features, there are feature quantities that the user does not want placed close together in the visualization, the user specifies the relevant feature IDs and the similarity W_{i,j} between those feature quantities is set to 0, giving the result the user wants. Conversely, features the user wants placed close together can be specified in the same way, and W_{i,j} between them set to 1.
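The two ways of filling W, plus the user overrides, might look like the sketch below. Equations (6) and (7) do not appear in this text, so the details are assumptions: the k-nearest-neighbor rule is symmetrized (a pair counts if either point is among the other's neighbors), and the γ-based rule is taken to be a heat kernel exp(-d² / γ). All function names are hypothetical.

```python
import numpy as np

def pairwise_sq_dists(G):
    """Squared Euclidean distances between the rows of G."""
    diff = G[:, None, :] - G[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def knn_affinity(G, k):
    """W[i, j] = 1 if row i or row j is among the other's k nearest neighbours."""
    d2 = pairwise_sq_dists(G)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros_like(d2)
    rows = np.repeat(np.arange(len(G)), k)
    W[rows, nearest.ravel()] = 1.0
    return np.maximum(W, W.T)  # symmetrize (an assumption of this sketch)

def heat_kernel_affinity(G, gamma):
    """W[i, j] = exp(-||G_i - G_j||^2 / gamma), a value between 0 and 1."""
    W = np.exp(-pairwise_sq_dists(G) / gamma)
    np.fill_diagonal(W, 0.0)
    return W

def apply_user_overrides(W, pull_together=(), keep_apart=()):
    """Force selected pairs to 1 (bring together) or 0 (ignore), per user input."""
    W = W.copy()
    for i, j in pull_together:
        W[i, j] = W[j, i] = 1.0
    for i, j in keep_apart:
        W[i, j] = W[j, i] = 0.0
    return W
```

The override step returns a copy, so the automatically computed W stays available if the user later withdraws a preference.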

Dimension reduction is performed based on these measures. Because of the constraint in Equation (5) that the distance from the origin after dimension reduction be 1, the points representing the feature IDs end up on a circle of radius 1 centered on the origin. FIG. 6 shows conceptual diagrams before and after this processing, together with an example of the resulting chart.
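Equations (3) and (5) themselves do not appear in this text. A form consistent with the surrounding description (a linear embedding of each G_i, a W-weighted sum of distances after reduction to be minimized, and a unit-distance constraint) would be the following. The orientation of B as an n × m matrix and the use of squared Euclidean distance are assumptions, so this should be read as a plausible reconstruction rather than the original equations.

```latex
% Embedded point for feature i (cf. Equation (3)); B is assumed to be n x m
g_i = B^{\top} G_i \in \mathbb{R}^{m}, \qquad i = 1, \dots, n

% Embedding matrix to be found (cf. Equation (5))
B^{*} = \operatorname*{arg\,min}_{B} \sum_{i=1}^{n} \sum_{j=1}^{n}
        W_{i,j} \, \bigl\lVert B^{\top} G_i - B^{\top} G_j \bigr\rVert^{2}
\quad \text{subject to} \quad
\bigl\lVert B^{\top} G_i \bigr\rVert = 1 \quad (i = 1, \dots, n)
```

Under this reading, pairs with W_{i,j} = 0 drop out of the objective entirely, and with m = 2 the constraint places every g_i on the unit circle, matching the behavior described above.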

FIG. 6(A) shows how the vectors G_i representing the 10 feature quantities are distributed in the 10-dimensional space defined, for all input feature quantities, by the similarity-based distances computed from the Kullback-Leibler divergence as described above. FIG. 6(B) shows that the vectors g_i representing the feature quantities, reduced to two dimensions by the embedding matrix calculated as described above, are distributed on a circle of radius 1 around the origin. This serves as the guide to the positional relationship in which the axes are drawn in the radar chart.

Each point g_i on this circle is the position index for setting the i-th axis, that is, the axis of feature amount ID i. Drawing an axis from the origin toward each of the 10 points therefore yields a chart such as that shown in FIG. 6(C). Unlike a radar chart, this chart does not space the axes at equal intervals. Instead, the similarity between axes is expressed as the angle between them, which makes semantic groupings easier to grasp and can make the chart easier to understand than a general radar chart. This new chart, which compensates for the shortcoming of radar charts that the relative positions and angles of the axes carry no information, is referred to as a Clumsy Spider Chart (CSC).
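
The step from the points g_i to axis directions can be sketched as follows, assuming each g_i is given as an (x, y) pair on the unit circle. The function names `axis_angles` and `axis_order` are hypothetical and are used only for illustration.

```python
import math

def axis_angles(g):
    """Angle in radians, in [0, 2*pi), of the axis drawn from the origin toward each g_i."""
    return [math.atan2(y, x) % (2 * math.pi) for x, y in g]

def axis_order(angles):
    """Feature amount IDs listed counter-clockwise, starting from the smallest angle."""
    return sorted(range(len(angles)), key=lambda i: angles[i])
```

Axes whose angles are close in this list correspond to feature amounts that the dimension reduction placed near each other.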

FIG. 6(D) shows an example of a CSC drawn by plotting the score of arbitrary data on each axis of this chart. For simplicity of explanation, FIG. 6(D) shows a CSC with a small number of axes, but a CSC becomes most useful when handling higher-dimensional multivariate data. Even in high dimensions, axes whose scores change together are placed close to one another. When focusing on a local region, it is therefore enough to observe the score variations of nearby axes as a group, and looking at the overall shape of the graph gives an overview of the balance among these locally grouped score movements.
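
A minimal sketch of how the plotted outline might be computed is given below. It assumes that the axis angles are already known in radians and that each score is a non-negative distance along its axis. The name `csc_polygon` is hypothetical, and no particular drawing library is implied.

```python
import math

def csc_polygon(angles, scores):
    """Vertices of the closed outline obtained by plotting scores[i] on the axis at angles[i]."""
    order = sorted(range(len(angles)), key=lambda i: angles[i])
    verts = [(scores[i] * math.cos(angles[i]), scores[i] * math.sin(angles[i]))
             for i in order]
    # Repeat the first vertex so that the outline closes on itself.
    return verts + verts[:1]
```

The returned vertex list can be passed to any line-drawing routine; the vertices are connected in order of increasing axis angle rather than in order of feature amount ID.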

As described above, generating a chart (graph) with the axis arrangement (order and spacing) set in consideration of the relevance and similarity between axes makes it possible to grasp intuitively the semantic groupings of multiple axes when viewing the whole. This yields a chart (graph) that is easy for people to perceive: from a macroscopic viewpoint the shapes of entire graphs are easy to compare, and from a microscopic viewpoint the comparison can focus on differences among closely related items on neighboring axes.

FIGS. 7(A) and 7(B) show examples of CSCs illustrating that high-dimensional (108-dimensional) multivariate data can be compared intuitively. In FIGS. 7(A) and 7(B), it is intuitively clearer than in a general radar chart which axes are strongly correlated and move together, so the points of interest in the graph are implicitly indicated and the shape remains easy to recognize even in high dimensions. When the charts in FIG. 7(A) and FIG. 7(B) are compared, it is easy to see that the overall shape tendency is the same but that the local shapes of the upper-right and upper-left parts differ. Needless to say, with a general radar chart whose axes are not arranged in a suitable order, it would be difficult to grasp the tendency at a glance from a graph on as many as 108 axes.

For users who prefer a display in which the angles between axes are equal, as in an ordinary radar chart, the angles between axes may be made equal while the order of the axes is kept unchanged, so that only the appearance matches a general radar chart. The concept of this conversion process is shown in FIGS. 8(A) to 8(D), and the graph corresponding to FIG. 6(D) is shown in FIG. 8(E).
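
One possible way to perform this conversion is sketched below, assuming the axis angles are given in radians. The name `equalize_angles` is hypothetical.

```python
import math

def equalize_angles(angles):
    """Keep the circular order of the axes but space them at equal intervals of 2*pi/n."""
    n = len(angles)
    order = sorted(range(n), key=lambda i: angles[i])
    equal = [0.0] * n
    for rank, i in enumerate(order):
        equal[i] = 2 * math.pi * rank / n
    return equal
```

The result is indexed by feature amount ID, so it can be used in place of the original angles when drawing. The similarity information carried by the angles is lost, and only the ordering remains.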

To grasp data tendencies intuitively and in a shorter time, rather than to perform detailed analysis, it may be better to remove parts that are likely to be redundant before presenting the chart to the user. For this purpose, a simplified chart display mode or the like may be made selectable, in which axes whose feature amounts have a similarity higher than a predetermined value are combined and displayed as a single axis. The processing procedure is shown in FIGS. 9(A) to 9(D). For example, when the angle between axes after dimension reduction is at or below a set threshold, those axes are combined into one axis, and the score on that axis is calculated, for example, as the average of the scores of the original axes. Alternatively, points in a neighborhood relationship may be combined into one point before dimension reduction, for example by averaging, and dimension reduction performed afterwards. FIG. 9(E) is an example in which the same data as in FIG. 6(D) and FIG. 8(E) is displayed on the chart of FIG. 9(D). For users who want to view FIG. 9(E) in a form closer to a general radar chart, the angles between all axes may be adjusted to be equal.
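
The angle-threshold variant of this merging might look like the following sketch. The name `merge_close_axes` is hypothetical. Placing the merged axis at the mean angle of its group is an assumption, since the text specifies only how the score is averaged, and the wrap-around between the last and first axes is ignored for brevity.

```python
def merge_close_axes(angles, scores, threshold):
    """Group angularly adjacent axes whose gap is <= threshold and average their scores.

    Returns a list of (merged_angle, merged_score, member_ids) tuples.
    """
    order = sorted(range(len(angles)), key=lambda i: angles[i])
    groups = []
    for i in order:
        # Compare with the last axis already placed in the current group.
        if groups and angles[i] - angles[groups[-1][-1]] <= threshold:
            groups[-1].append(i)
        else:
            groups.append([i])
    merged = []
    for grp in groups:
        mean_angle = sum(angles[i] for i in grp) / len(grp)  # assumption: mean direction
        mean_score = sum(scores[i] for i in grp) / len(grp)  # average of original scores
        merged.append((mean_angle, mean_score, grp))
    return merged
```

Because each axis is compared with its immediate neighbor, a long chain of closely spaced axes can end up in one group even when its two ends are further apart than the threshold. A stricter grouping rule could be substituted if that is undesirable.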

The display of the various charts described above may be switched, for example, as shown in FIG. 10: charts corresponding to a plurality of images are displayed, and the user selects a preferred display format simply by clicking a button while viewing the charts. This makes it easy to perform the analysis best suited to the data.

The description above assumed display forms close to a general radar chart and therefore used examples in which the number of dimensions after reduction is two. When more complex analysis is required, the number of dimensions after reduction may be three. In that case the result is a three-dimensional version of the chart described above, and a graph that expresses the relationships among the feature amounts more intuitively can be created. However, when the three-dimensional chart is viewed on a two-dimensional display, for example, its shape may be harder to grasp than that of a two-dimensional chart. For a three-dimensional chart, therefore, the chart may be displayed in color, for example with the RGB (display color) values varied according to latitude and longitude. Although this embodiment has been described using appearance inspection as an example, it goes without saying that the method can be used for data visualization in general whenever a plurality of feature amounts is handled.
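
The text does not specify how latitude and longitude are mapped to RGB values. One hypothetical mapping, shown only as a sketch, uses longitude as hue and latitude as brightness. The name `sphere_point_to_rgb` and the particular brightness range are assumptions.

```python
import colorsys
import math

def sphere_point_to_rgb(x, y, z):
    """Map a point on the unit sphere to an (R, G, B) tuple of integers in 0..255."""
    longitude = math.atan2(y, x) % (2 * math.pi)       # 0 .. 2*pi
    latitude = math.asin(max(-1.0, min(1.0, z)))       # -pi/2 .. pi/2
    hue = longitude / (2 * math.pi)                    # assumption: hue follows longitude
    value = 0.25 + 0.75 * (latitude + math.pi / 2) / math.pi  # assumption: brighter toward +z
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, value)
    return (round(r * 255), round(g * 255), round(b * 255))
```

With this mapping, axes pointing in similar directions receive similar colors, which may help compensate for the depth information lost on a two-dimensional display.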

(Second Embodiment)
Next, a second embodiment of the present invention will be described. In the second embodiment, the dimension reduction for chart generation described in the first embodiment is performed using, as its criterion, the degree to which the distance relationships between feature amounts are preserved before and after the reduction, and a radar chart or the like is generated in a form that is intuitively easy for a person to understand. The following describes the points in which the second embodiment differs from the first embodiment.

When dimension reduction is performed using the degree of preservation of the distance relationships between feature amounts as the criterion, the quantity to be minimized is the error caused by the change in distance relationships before and after the reduction. The sum of squared errors may be defined, for example, as shown in Equation (8).

Here, the feature amounts G (before dimension reduction) and g (after dimension reduction) are the same as in the first embodiment, and Δ_ij is defined as in Equation (9) and represents the distance between G_i and G_j.

Similarly, δ_ij represents the distance between g_i and g_j after dimension reduction. The constraint shown in Equation (10) is imposed on δ_ij so that, after reduction to the two-dimensional plane, all points are at the same distance from the origin.

With these definitions, a solution can be obtained by the steepest descent method. The solution gives the representative points of the feature amounts after dimension reduction as points arranged on a circle centered at the origin of the two-dimensional plane. The procedure for obtaining the chart from these points is the same as in the first embodiment. Examples of sum-of-squared-error functions other than Equation (8) are shown in Equations (11) and (12). All of these are functions used in multidimensional scaling and similar methods.
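
The following Python sketch illustrates one way such a steepest descent could be set up. It is not the formulation of the embodiment: it assumes a plain unweighted sum of squared differences between Δ_ij and δ_ij, which may differ from Equation (8), and it enforces the equal-distance constraint by writing each g_i as (r cos θ_i, r sin θ_i) and descending on the angles θ_i. The names `circle_stress` and `descend_on_circle`, the radius `r`, the step size `lr`, and the iteration count are all hypothetical.

```python
import math

def circle_stress(theta, Delta, r=1.0):
    """Sum over pairs i < j of (Delta_ij - delta_ij)^2 with every g_i on a circle of radius r."""
    total = 0.0
    for i in range(len(theta)):
        for j in range(i + 1, len(theta)):
            delta = 2 * r * abs(math.sin((theta[i] - theta[j]) / 2))  # chord length
            total += (Delta[i][j] - delta) ** 2
    return total

def descend_on_circle(Delta, theta0, r=1.0, lr=0.01, steps=2000):
    """Fixed-step steepest descent on the angles theta_i; Delta is assumed symmetric."""
    theta = list(theta0)
    n = len(theta)
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = theta[i] - theta[j]
                delta = 2 * r * abs(math.sin(d / 2))
                if delta < 1e-12:
                    continue  # coincident points: derivative undefined, skip the pair
                # d(delta_ij)/d(theta_i) = r^2 * sin(d) / delta_ij
                grad[i] -= 2 * (Delta[i][j] - delta) * (r * r * math.sin(d) / delta)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta
```

Parameterizing by angle keeps every point on the circle at each iteration, so no separate projection or penalty term is needed. The result depends on the initial angles, because the error function is generally not convex.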

The sum-of-squared-error functions to be minimized in Equations (8), (11), and (12) are errors for preserving all the relationships among the feature amounts G before dimension reduction. If a parameter is added so that only points up to the k-th nearest neighbor are kept close, dimension reduction can be performed by a procedure similar to ISOMAP. The chart can then be generated from g, the representative points of the feature amounts after dimension reduction obtained by the above procedure, in the same way as in the first embodiment, and an appropriate chart can be obtained.
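
In ISOMAP-style procedures, the distances to be preserved are usually replaced by shortest-path distances over a k-nearest-neighbor graph. A minimal sketch of that replacement step is shown below, assuming a symmetric distance matrix `D` as input. The name `knn_geodesic_distances` is hypothetical, and the Floyd-Warshall algorithm is used only because it is short, not because the text prescribes it.

```python
import math

def knn_geodesic_distances(D, k):
    """Shortest-path distances over the graph linking each point to its k nearest neighbors."""
    n = len(D)
    geo = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i), key=lambda j: D[i][j])[:k]
        for j in nbrs:
            geo[i][j] = geo[j][i] = D[i][j]  # undirected edge between neighbors
    # Floyd-Warshall: allow paths through every intermediate point m in turn.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if geo[i][m] + geo[m][j] < geo[i][j]:
                    geo[i][j] = geo[i][m] + geo[m][j]
    return geo
```

The resulting matrix could then be used in place of Δ_ij in the error function. Pairs that remain at infinite distance indicate a disconnected neighbor graph, which would need a larger k.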

(Third Embodiment)
Next, a third embodiment of the present invention will be described. The third embodiment addresses the case where there are multiple data items expressed by high-dimensional feature amounts and the user wants to analyze what tendency each item has among them. It displays a parallel coordinate plot or a bar graph in a form that is intuitively easy for a person to understand. The algorithm described in the first or second embodiment for determining the spacing and order of radar chart axes can be used in the same way to determine the order and spacing of the items in a parallel coordinate plot or bar graph. Furthermore, because the axes of a radar chart are arranged in a circle, when a parallel coordinate plot or bar graph is generated with n items, the graph may be drawn again starting from the first item next to the n-th item at the end.
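
A sketch of how circular axis angles might be unrolled into positions along a straight line is given below. The name `linear_layout` is hypothetical, and using the angle in radians directly as the horizontal position is an assumption made for illustration.

```python
import math

def linear_layout(angles, repeat_first=True):
    """Unroll circular axis angles into (feature_id, position) pairs along a line.

    Order and spacing follow the angles; the first item is optionally repeated
    after the last one to mimic the wrap-around of the circular chart.
    """
    if not angles:
        return []
    order = sorted(range(len(angles)), key=lambda i: angles[i])
    start = angles[order[0]]
    layout = [(i, angles[i] - start) for i in order]
    if repeat_first:
        # One full turn after its first appearance, the first item comes around again.
        layout.append((order[0], 2 * math.pi))
    return layout
```

Each position can be scaled to the plot width as needed. The repeated entry lets a viewer compare the last item with the first, which are neighbors on the circle but would otherwise sit at opposite ends of the plot.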

(Other Embodiments of the Present Invention)
The present invention can also be realized by executing the following processing: software (a program) that realizes the functions of the embodiments described above is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of that system or apparatus reads and executes the program.

For example, the information processing apparatus in each of the embodiments described above has a computer function 1100 as shown in FIG. 11, and the operations in each embodiment are carried out by its CPU 1101. As shown in FIG. 11, the computer function 1100 includes a CPU 1101, a ROM 1102, and a RAM 1103. It also includes a controller (CONSC) 1105 for an operation unit (CONS) 1109 and a display controller (DISPC) 1106 for a display (DISP) 1110 serving as a display unit such as an LCD. It further includes a controller (DCONT) 1107 for a hard disk (HD) 1111 and a storage device (STD) 1112 such as a flexible disk, and a network interface card (NIC) 1108. These functional units 1101, 1102, 1103, 1105, 1106, 1107, and 1108 are connected via a system bus 1104 so that they can communicate with one another.

The CPU 1101 performs overall control of the components connected to the system bus 1104 by executing software stored in the ROM 1102 or the HD 1111, or software supplied from the STD 1112. That is, the CPU 1101 reads a processing program for performing the operations described above from the ROM 1102, the HD 1111, or the STD 1112 and executes it, thereby performing control to realize the operations in each embodiment. The RAM 1103 functions as the main memory, work area, or the like of the CPU 1101.

The CONSC 1105 controls instruction input from the CONS 1109. The DISPC 1106 controls display on the DISP 1110. The DCONT 1107 controls access to the HD 1111 and the STD 1112, which store a boot program, various applications, user files, a network management program, the processing program for realizing the operations in each embodiment, and the like. The NIC 1108 exchanges data bidirectionally with other devices on a network 1113.

The embodiments described above are merely specific examples of carrying out the present invention, and the technical scope of the present invention should not be construed as limited by them. That is, the present invention can be carried out in various forms without departing from its technical idea or its main features.

101: Multivariate data input unit; 102: Inter-feature distance calculation unit; 103: Dimension reduction unit; 104: Data visualization parameter input unit; 105: Output unit; 106: Chart shape determination unit; 107: Multivariate data additional input unit; 108: Chart shape storage unit

Claims

1. An information processing apparatus comprising:
calculation means for calculating a similarity between a plurality of feature amounts extracted from input multivariate data;
dimension reduction means for reducing the number of dimensions based on the multivariate data and the similarity between the feature amounts calculated by the calculation means;
shape determining means for determining an arrangement of axes corresponding to each of the plurality of feature amounts based on a result of the dimension reduction by the dimension reduction means; and
output means for outputting a diagram in which the multivariate data is drawn according to the arrangement of the axes determined by the shape determining means.

2. The information processing apparatus according to claim 1, wherein the calculation means calculates the similarity between the feature amounts based on a distribution of data included in the multivariate data.

3. The information processing apparatus according to claim 1, wherein the shape determining means arranges the axes corresponding to the plurality of feature amounts so as to form angles corresponding to the similarity between the feature amounts.

4. The information processing apparatus according to claim 1, wherein the arrangement of the axes corresponding to each of the plurality of feature amounts is changed according to an input from a user.

5. The information processing apparatus according to claim 1, wherein axes corresponding to feature amounts whose mutual similarity is higher than a predetermined value are arranged together.

6. An information processing method comprising:
a calculation step of calculating a similarity between a plurality of feature amounts extracted from input multivariate data;
a dimension reduction step of reducing the number of dimensions based on the multivariate data and the calculated similarity between the feature amounts;
a shape determining step of determining an arrangement of axes corresponding to each of the plurality of feature amounts based on a result of the dimension reduction; and
an output step of outputting a diagram in which the multivariate data is drawn according to the determined arrangement of the axes.

7. A program for causing a computer to execute:
a calculation step of calculating a similarity between a plurality of feature amounts extracted from input multivariate data;
a dimension reduction step of reducing the number of dimensions based on the multivariate data and the calculated similarity between the feature amounts;
a shape determining step of determining an arrangement of axes corresponding to each of the plurality of feature amounts based on a result of the dimension reduction; and
an output step of outputting a diagram in which the multivariate data is drawn according to the determined arrangement of the axes.