TWI659321B - System and method for analyzing industry relevance - Google Patents
- Publication number
- TWI659321B TW107102052A
- Authority
- TW
- Taiwan
- Prior art keywords
- text features
- industry
- related words
- word
- overlap rate
- Prior art date
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an industry correlation analysis system and method. The industry correlation analysis system includes a user interface, a server, a memory, and a processor. The industry correlation analysis method includes: searching for a plurality of related words according to a keyword; searching for multiple news articles related to the related words within a preset time period and, based on those articles, computing and generating a word cloud for each related word, where each word cloud includes multiple text features; comparing the text features to calculate an overlap rate for each text feature, and sorting the text features by overlap rate in descending order; and extracting several text features with the highest overlap rates, and generating a correlation analysis diagram from the extracted text features and the relationships between the extracted text features and the related words.
Description
The present invention relates to an industry correlation analysis system and method, and in particular to an industry correlation analysis system and method capable of performing correlation analysis on multiple companies in different industries.
Industry researchers often need to identify key topics from recent news about each industry and from recent social media articles that discuss those industries. In addition, such news and articles often reveal differences or similarities between companies in the same industry or between different industries.
However, because the volume of information is large, it is not easy to compile key topics from recent industry news and recent social media articles, and then to consolidate the differences or similarities between companies in the same industry or between different industries. If key topics could be compiled effectively from these sources, and the differences or similarities consolidated, industry researchers would be able to judge the characteristics, transformation opportunities, or potential crises of a company or an industry based on current events.
The industry correlation analysis system provided by the present invention includes a user interface, a server, a memory, and a processor. The user interface is configured to receive a keyword and display an analysis result. The server is configured to run at least one database. The memory is configured to store an analysis program. The processor is connected to the user interface, the server, and the memory, and is configured to execute the analysis program to perform the following operations: according to the keyword, searching the database for a plurality of related words; searching the database for multiple news articles related to the related words within a preset time period and, based on those articles, computing and generating a word cloud for each related word, where each word cloud includes multiple text features; comparing the text features of the word clouds to calculate an overlap rate for each text feature, and sorting the text features of the word clouds by overlap rate in descending order; and extracting several text features with the highest overlap rates, and generating a correlation analysis diagram as the analysis result from the extracted text features and the relationships between the extracted text features and the related words.
In one embodiment of the industry correlation analysis system provided by the present invention, the processor further executes the analysis program to perform the following operation: extracting several text features with the lowest overlap rates, and generating another correlation analysis diagram as the analysis result from the extracted text features and the relationships between the extracted text features and the related words. In another embodiment of the industry correlation analysis system provided by the present invention, the processor further executes the analysis program to perform the following operation: extracting several text features with the highest overlap rates and several text features with the lowest overlap rates, and generating another correlation analysis diagram as the analysis result from the extracted text features and the relationships between the extracted text features and the related words.
In addition, the industry correlation analysis method provided by the present invention is applicable to the aforementioned industry correlation analysis system. The method includes: according to a keyword, searching a database for a plurality of related words; searching the database for multiple news articles related to the related words within a preset time period and, based on those articles, computing and generating a word cloud for each related word, where each word cloud includes multiple text features; comparing the text features of the word clouds to calculate an overlap rate for each text feature, and sorting the text features of the word clouds by overlap rate in descending order; and extracting several text features with the highest overlap rates, and generating a correlation analysis diagram as the analysis result from the extracted text features and the relationships between the extracted text features and the related words.
In one embodiment of the industry correlation analysis method provided by the present invention, the method further includes: extracting several text features with the lowest overlap rates, and generating another correlation analysis diagram as the analysis result from the extracted text features and the relationships between the extracted text features and the related words. In another embodiment of the industry correlation analysis method provided by the present invention, the method further includes: extracting several text features with the highest overlap rates and several text features with the lowest overlap rates, and generating another correlation analysis diagram as the analysis result from the extracted text features and the relationships between the extracted text features and the related words.
With the industry correlation analysis system and method provided by the present invention, once multiple word clouds have been generated from a keyword and its related news reports, the word clouds can be analyzed and the analysis result can show the correlations or differences among them. This helps industry researchers analyze and consolidate a large volume of news information.
10‧‧‧processor
11‧‧‧user interface
12‧‧‧server
13‧‧‧database
14‧‧‧memory
15‧‧‧analysis program
S210, S220, S230, S240‧‧‧steps
S201, S202, S203, S204, S205, S206‧‧‧steps
S242, S244‧‧‧steps
FIG. 1 is a block diagram of an industry correlation analysis system according to an exemplary embodiment of the present invention.
FIG. 2 is a block diagram of an industry correlation analysis method according to an exemplary embodiment of the present invention.
FIG. 3 is a block diagram of an industry correlation analysis method according to another exemplary embodiment of the present invention.
FIG. 4A is a schematic diagram of a correlation analysis diagram according to an exemplary embodiment of the present invention.
FIG. 4B is a schematic diagram of a correlation analysis diagram according to another exemplary embodiment of the present invention.
FIG. 4C is a schematic diagram of a correlation analysis diagram according to yet another exemplary embodiment of the present invention.
Various exemplary embodiments are described more fully below with reference to the accompanying drawings, in which some exemplary embodiments are shown. The inventive concept may, however, be embodied in many different forms and should not be construed as limited to the exemplary embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the inventive concept to those skilled in the art. In the drawings, like numerals refer to like elements throughout.
In general, in order to analyze and consolidate a large volume of news information and produce analysis results with reference value, the industry correlation analysis system and method provided by the present invention generate multiple word clouds from that news information, then analyze and consolidate the word clouds to produce a correlation analysis diagram that users can understand at a glance. Several embodiments are used below to describe the system and method.
The architecture of the industry correlation analysis system is described first. Please refer to FIG. 1, which is a block diagram of an industry correlation analysis system according to an exemplary embodiment of the present invention.
As shown in FIG. 1, the industry correlation analysis system provided in this embodiment includes a processor 10, a user interface 11, a server 12, and a memory 14. The user interface 11 is configured to receive a keyword and display an analysis result. The server 12 is configured to run at least one database 13. The memory 14 is configured to store an analysis program 15. The processor 10 is connected to the user interface 11, the server 12, and the memory 14. The processor 10, the user interface 11, and the memory 14 of the industry correlation analysis system provided in this embodiment may be implemented by an electronic device such as a personal computer or a smartphone. The server 12 of the industry correlation analysis system provided in this embodiment may be implemented by a server device capable of network communication with the electronic device.
Please refer to FIG. 2, which is a block diagram of an industry correlation analysis method according to an exemplary embodiment of the present invention.
The industry correlation analysis method provided in this embodiment is implemented by the processor 10 of the industry correlation analysis system shown in FIG. 1 executing the analysis program 15 stored in the memory 14, so please refer to FIG. 1 and FIG. 2 together. As shown in FIG. 2, the method generally includes the following steps: according to a keyword, searching the database for a plurality of related words (step S210); searching the database for multiple news articles related to the related words within a preset time period and, based on those articles, computing and generating a word cloud for each related word, where each word cloud includes multiple text features (step S220); comparing the text features of the word clouds to calculate an overlap rate for each text feature, and sorting the text features of the word clouds by overlap rate in descending order (step S230); and extracting several text features with the highest overlap rates, and generating a correlation analysis diagram as the analysis result from the extracted text features and the relationships between the extracted text features and the related words (step S240).
The details of each step of the industry correlation analysis method provided in this embodiment are described next.
In step S210, when the user enters a keyword through the user interface 11, the processor 10 searches the database 13 through the server 12 for a plurality of related words according to that keyword. In this embodiment, the keyword is an industry name and the related words are company names, but the invention is not limited thereto. For example, if the user enters "display" through the user interface 11, the processor 10 searches the database 13 through the server 12 for related words corresponding to the keyword "display", such as "Company A", "Company B", and "Company C".
Next, in step S220, the processor 10 searches the database 13 through the server 12 for multiple news articles related to each related word within a preset time period. In this embodiment, the server 12 runs at least one database 13, and the data in the database 13 may come from, for example, news published by major news websites. The news articles related to a related word are those whose content contains that related word. Assuming the related words are "Company A", "Company B", and "Company C", the processor 10 computes and generates one word cloud from the articles related to "Company A", one from the articles related to "Company B", and one from the articles related to "Company C". Note that each word cloud generated in step S220 includes multiple text features, and a text feature here means one of the words that make up the word cloud.
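Step S220 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_word_cloud`, the article layout (a dict with `date` and `text`), the whitespace tokenizer, and the `top_n` cutoff are all assumptions, since the description does not specify how articles are stored or how words are segmented.

```python
from collections import Counter
from datetime import date


def build_word_cloud(related_word, articles, start, end, top_n=50):
    """Count words in articles that mention related_word within [start, end].

    Assumption: each article is a dict with a `date` and a `text` field, and
    words are separated by whitespace (real news text would need a proper
    word segmenter).
    """
    counts = Counter()
    for article in articles:
        in_window = start <= article["date"] <= end
        # An article is "related" when its content contains the related word.
        if in_window and related_word in article["text"]:
            tokens = article["text"].split()
            counts.update(t for t in tokens if t != related_word)
    # The text features of the cloud are the most frequent words.
    return dict(counts.most_common(top_n))


# Hypothetical sample data for illustration only.
articles = [
    {"date": date(2018, 1, 5), "text": "CompanyA glass order glass"},
    {"date": date(2018, 1, 9), "text": "CompanyB panel order"},
    {"date": date(2017, 6, 1), "text": "CompanyA lens"},
]
cloud = build_word_cloud("CompanyA", articles, date(2018, 1, 1), date(2018, 1, 31))
```

Calling the function once per related word yields one word cloud per company, which is the input that step S230 compares.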
Next, in step S230, the processor 10 compares the text features of the word clouds to calculate an overlap rate for each text feature, and sorts the text features of the word clouds by overlap rate in descending order.
Simply put, the overlap rate is the probability that a text feature appears in the word clouds. It depends first on how many word clouds the text feature appears in, and second on how many times the text feature appears in those word clouds. Suppose that in step S220 word cloud I, word cloud II, and word cloud III are generated for the related words "Company A", "Company B", and "Company C". If a text feature appears in word cloud I, word cloud II, and word cloud III, and appears many times in those word clouds, it has a high overlap rate. Conversely, if a text feature appears in only one of word cloud I, word cloud II, or word cloud III, and appears there only once or twice, it has a low overlap rate.
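The ranking in step S230 can be sketched as below. The description gives no formula, so this is one possible reading: the overlap rate is taken as the fraction of word clouds containing the feature, and the total occurrence count is used as a secondary sort key. The name `rank_by_overlap` and the sample counts are hypothetical.

```python
def rank_by_overlap(word_clouds):
    """Rank text features across word clouds, highest overlap first.

    word_clouds maps each related word to {text feature: occurrence count}.
    Returns a list of (feature, overlap_rate, total_count) tuples.
    Assumption: overlap_rate = clouds containing the feature / all clouds,
    with total_count breaking ties.
    """
    stats = {}
    for cloud in word_clouds.values():
        for feature, count in cloud.items():
            n_clouds, total = stats.get(feature, (0, 0))
            stats[feature] = (n_clouds + 1, total + count)

    n = len(word_clouds)
    # Sort by number of clouds, then by total count, then alphabetically.
    ordered = sorted(stats.items(), key=lambda kv: (-kv[1][0], -kv[1][1], kv[0]))
    return [(feature, n_clouds / n, total) for feature, (n_clouds, total) in ordered]


# Hypothetical counts for illustration only.
clouds = {
    "Company A": {"glass": 3, "panel": 1},
    "Company B": {"glass": 2, "order": 5},
    "Company C": {"glass": 1, "order": 1},
}
ranked = rank_by_overlap(clouds)
```

In this sample, "glass" appears in all three clouds and ranks first, "order" appears in two, and "panel" appears in one and ranks last.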
Finally, in step S240, the processor 10 extracts several text features with the highest overlap rates, and generates a correlation analysis diagram as the analysis result from the extracted text features and the relationships between the extracted text features and the related words.
For the correlation analysis diagram generated in step S240, please refer to FIG. 4A, which is a schematic diagram of a correlation analysis diagram according to an exemplary embodiment of the present invention. Suppose that ten text features with the highest overlap rates are to be extracted in step S240, and that these ten text features are "Company D", "Company E", "foreign investment", "shareholding", "glass", "panel", "order", "contract manufacturing", "lens", and "mobile phone". The correlation analysis diagram generated in step S240 may then look like the example in FIG. 4A.
As FIG. 4A shows, among these ten text features, "glass", "contract manufacturing", and "Company E" have the highest overlap rates, and they represent the association among Company A, Company B, and Company C. The other text features have lower overlap rates than "glass", "contract manufacturing", and "Company E", but each of them can still be regarded as an association between Company A and Company B, between Company A and Company C, or between Company B and Company C.
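One way to render such a diagram is to emit an undirected graph in which each related word is linked to every extracted text feature found in its word cloud. The sketch below writes Graphviz DOT text; the choice of DOT, the function name `to_dot`, and the sample clouds are assumptions for illustration, and the description does not say how the diagram is actually drawn.

```python
def to_dot(word_clouds, selected_features):
    """Return Graphviz DOT text linking related words to selected features.

    word_clouds maps each related word to {text feature: occurrence count}.
    An edge is emitted only when the feature occurs in that word's cloud.
    Assumption: names contain no double quotes, so no escaping is done.
    """
    lines = ["graph correlation {"]
    for related_word, cloud in word_clouds.items():
        for feature in selected_features:
            if feature in cloud:
                lines.append(f'  "{related_word}" -- "{feature}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical clouds for illustration only.
sample_clouds = {
    "Company A": {"glass": 2, "lens": 1},
    "Company B": {"glass": 1, "panel": 1},
    "Company C": {"glass": 4, "panel": 2},
}
dot_text = to_dot(sample_clouds, ["glass", "panel"])
```

A feature node connected to every company node, such as "glass" here, corresponds to an association shared by all companies, while a node connected to only two companies marks a pairwise association.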
In other words, the analysis result produced by the industry correlation analysis method provided in this embodiment (that is, the correlation analysis diagram) makes the correlations among multiple companies in an industry clear.
Please refer to FIG. 3, which is a block diagram of an industry correlation analysis method according to another exemplary embodiment of the present invention.
The industry correlation analysis method provided in this embodiment is implemented by the processor 10 of the industry correlation analysis system shown in FIG. 1 executing the analysis program 15 stored in the memory 14, so please refer to FIG. 1 and FIG. 3 together. Note that steps S220 to S240 of this method are similar to steps S220 to S240 of the method shown in FIG. 2. The following description therefore covers only the differences between the two; for the details of steps S220 to S240, please refer to the preceding embodiment.
One difference between the method provided in this embodiment and the method shown in FIG. 2 is that the method provided in this embodiment further includes steps S201 to S206.
In step S201, the processor 10 determines whether the keyword entered by the user is an industry name. In this embodiment, the database 13 also stores an industry-entity association table that contains multiple industry names and multiple company names. In this table, each industry name corresponds to one or more company names, and each company name likewise corresponds to one or more industry names.
If the processor 10 determines that the keyword entered by the user is an industry name, the method proceeds to step S202. In step S202, the processor 10 searches for a plurality of related words according to the keyword entered by the user and the industry-entity association table. For example, if the keyword entered by the user is "display", the related words found through the industry-entity association table are "Company A", "Company B", and "Company C".
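The association table is a many-to-many mapping, which can be sketched as a dictionary from industry names to company names together with its inverse. The layout, the names `INDUSTRY_TO_COMPANIES` and `invert`, and the "LED" and "mobile phone" rows are assumptions for illustration; the description does not specify how the table is stored.

```python
from collections import defaultdict

# Hypothetical table contents for illustration only.
INDUSTRY_TO_COMPANIES = {
    "display": ["Company A", "Company B", "Company C"],
    "LED": ["Company A"],
    "mobile phone": ["Company A"],
}


def invert(industry_to_companies):
    """Build the company -> industries view of the same table."""
    company_to_industries = defaultdict(list)
    for industry, companies in industry_to_companies.items():
        for company in companies:
            company_to_industries[company].append(industry)
    return dict(company_to_industries)


COMPANY_TO_INDUSTRIES = invert(INDUSTRY_TO_COMPANIES)

# Step S202: an industry keyword maps directly to its related words.
related_words = INDUSTRY_TO_COMPANIES["display"]
```

Keeping both directions makes the industry-name check of step S201 and a company-name lookup equally cheap.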
On the other hand, if the processor 10 determines that the keyword entered by the user is not an industry name, the method proceeds to step S203. In step S203, the processor 10 further determines whether the keyword is a company name. If the keyword is not a company name either, the method proceeds to step S204 and waits for the user to enter a new keyword. If the keyword is a company name, the method proceeds to step S205.
In step S205, the processor 10 searches the database 13 for a plurality of auxiliary related words according to the keyword. For example, if the keyword entered by the user is "Company A" and, according to the industry-entity association table, "Company A" corresponds to the three industries "display", "LED", and "mobile phone", then in this step the processor 10 finds the three auxiliary related words "display", "LED", and "mobile phone".
Finally, in step S206, the processor 10 searches the database 13 for a plurality of related words according to one of the auxiliary related words. Continuing the example, the processor 10 searches the database 13 for related words according to one of the auxiliary related words "display", "LED", and "mobile phone". For instance, using the auxiliary related word "display", the processor 10 finds the related words "Company A", "Company B", and "Company C".
Note that in step S206 the user can select one of the auxiliary related words through the user interface 11, so that the processor 10 searches for the related words according to the selected auxiliary related word. If the user does not select one through the user interface 11, the processor 10 selects one of the auxiliary related words either according to a preset order or at random, and searches for the related words on that basis. For example, the preset order of the auxiliary related words for the keyword "Company A" may be "display" → "LED" → "mobile phone". Depending on the design of the analysis program 15, the processor 10 may search for the related words using the auxiliary related word ranked N-th, where N is an integer greater than or equal to 1.
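The branching of steps S201 to S206 can be sketched as a single lookup function. This is an illustrative sketch under assumptions: the function name `resolve_related_words`, its parameters, the table contents, and the use of `None` to signal "wait for a new keyword" are all hypothetical, and the preset order is taken to be the table's insertion order.

```python
import random


def resolve_related_words(keyword, industry_to_companies, chosen=None, rank=1):
    """Map a keyword to related words (company names), or None if unknown.

    chosen: auxiliary related word picked by the user, if any.
    rank:   1-based position in the preset order; None means pick at random.
    """
    # S201/S202: the keyword is an industry name.
    if keyword in industry_to_companies:
        return list(industry_to_companies[keyword])

    # S203/S205: the keyword is a company name; collect its industries.
    auxiliary = [ind for ind, comps in industry_to_companies.items() if keyword in comps]
    if not auxiliary:
        return None  # S204: neither kind of name, wait for a new keyword.

    # S206: use the user's choice, else the preset order, else a random pick.
    if chosen in auxiliary:
        industry = chosen
    elif rank is not None:
        industry = auxiliary[min(rank, len(auxiliary)) - 1]
    else:
        industry = random.choice(auxiliary)
    return list(industry_to_companies[industry])


# Hypothetical table for illustration only.
table = {
    "display": ["Company A", "Company B", "Company C"],
    "LED": ["Company A", "Company D"],
    "mobile phone": ["Company A"],
}
by_default = resolve_related_words("Company A", table)
by_choice = resolve_related_words("Company A", table, chosen="LED")
```

With this table, the default call falls back to the first auxiliary related word, "display", while the second call follows the user's selection.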
The processor 10 can then perform steps S220 to S240 on the related words produced by step S202 or step S206 to generate a correlation analysis diagram such as the one shown in FIG. 4A.
Another difference between the method provided in this embodiment and the method shown in FIG. 2 is that the method provided in this embodiment further includes steps S242 and S244.
In step S242, the processor 10 extracts several text features with the lowest overlap rates, and generates a correlation analysis diagram as the analysis result from the extracted text features and the relationships between the extracted text features and the related words.
For the correlation analysis diagram generated in step S242, please refer to FIG. 4B, which is a schematic diagram of a correlation analysis diagram according to another exemplary embodiment of the present invention. Suppose that the related words produced by step S202 or step S206 are "Company F" and "Company G", that ten text features with the lowest overlap rates are to be extracted in step S242, and that these ten text features are "Company H", "petroleum", "supplier", "engineering", "plastic", "mold", "order", "lighting fixture", "export", and "contract manufacturing". The correlation analysis diagram generated in step S242 may then look like the example in FIG. 4B.
As FIG. 4B shows, among these ten text features, "Company H", "petroleum", "supplier", "engineering", "lighting fixture", "export", and "contract manufacturing" have very low overlap rates, in some cases close to zero. In other words, these seven text features are where Company F and Company G differ. By contrast, "plastic", "mold", and "order" also have low overlap rates, yet they still represent an association between Company F and Company G.
In addition, in step S244, the processor 10 extracts several text features with the highest overlap rates and several text features with the lowest overlap rates, and generates another correlation analysis diagram as the analysis result from the extracted text features and the relationships between the extracted text features and the related words.
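The extraction in steps S240, S242, and S244 differs only in which end of the sorted list is taken, which can be sketched with one helper. The name `select_features` and its parameters are hypothetical; the input is assumed to be a list already sorted by overlap rate in descending order.

```python
def select_features(ranked_features, top_k=0, bottom_k=0):
    """Split a descending-sorted feature list into high and low overlap picks.

    Returns (top, bottom). The bottom picks are taken from what remains after
    the top picks, so no feature is returned twice.
    """
    top = ranked_features[:top_k]
    remaining = ranked_features[top_k:]
    # Guard against bottom_k == 0, because remaining[-0:] is the whole list.
    bottom = remaining[-bottom_k:] if bottom_k else []
    return top, bottom


# Hypothetical ranking for illustration only, highest overlap first.
ranking = ["order", "processor", "wafer", "revenue", "media", "panel"]
high_only = select_features(ranking, top_k=2)              # as in step S240
low_only = select_features(ranking, bottom_k=2)            # as in step S242
combined = select_features(ranking, top_k=2, bottom_k=3)   # as in step S244
```

The numbers of features to extract are left as parameters because the description treats them as configurable (for example ten in one case, five plus ten in another).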
關於步驟S244中所產生的關聯性分析圖請參照圖4C,圖4C為根據本發明又一例示性實施例繪示之關聯性分析圖的示意圖。假設根據步驟S202與步驟S206所產生的複數個關聯詞為「J公司」、「K公司」與「L公司」,且假設於步驟S242中需取出五個重疊率較大的文字特徵與十個重疊率較小的文字特徵,這五個重疊率較大的文字特徵分別為「訂單」、「處理器」、「手機」、「晶圓」與「顯示器」,這十個重疊率較小的文字特徵分別為「M公司」、「財報」、媒體」、「陸資」、「營收」、「面板」、「代工」、「股東會」、「電腦」與「通訊」。此時,步驟S244中所產生的關聯性分析圖便可舉例如圖4C所示。 For the correlation analysis diagram generated in step S244, please refer to FIG. 4C. FIG. 4C is a schematic diagram of the correlation analysis diagram according to another exemplary embodiment of the present invention. Assume that the plurality of related words generated according to steps S202 and S206 are "company J", "K company", and "L company", and it is assumed that in step S242, five text features with a large overlap rate and ten overlaps need to be extracted The text features with lower overlap rate. The five text features with higher overlap rate are "Order", "Processor", "Mobile Phone", "Wafer" and "Display". The ten texts with lower overlap rate are The characteristics are "M Company", "Financial Report", "Media", "Land Capital", "Revenue", "Panel", "OEM", "Shareholders' Meeting", "Computer" and "Communication". At this time, an example of the correlation analysis diagram generated in step S244 is shown in FIG. 4C.
As FIG. 4C shows, among these fifteen text features, "order", "processor", "mobile phone", "wafer", and "display" have higher overlap rates: "processor" and "wafer" are associations among Company J, Company K, and Company L; "order" is an association between Company J and Company K; "display" is an association between Company J and Company L; and "mobile phone" is an association between Company K and Company L. By contrast, "Company M", "financial report", "media", "mainland Chinese capital", "revenue", "panel", "OEM", "shareholders' meeting", "computer", and "communication" have very low or even near-zero overlap rates; in other words, they mark the differences among Company J, Company K, and Company L.
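The associations read off FIG. 4C can be restated as a small Python sketch that groups the higher-overlap text features by the companies sharing them. The links are the ones described in the text; the helper names `group_by_companies` and `overlap_rate` are hypothetical, and the overlap rate is again assumed to be the share of word clouds containing a feature.

```python
from collections import defaultdict

# Links between companies and higher-overlap text features, as described for FIG. 4C.
LINKS = {
    "processor": {"J", "K", "L"},
    "wafer": {"J", "K", "L"},
    "order": {"J", "K"},
    "display": {"J", "L"},
    "mobile phone": {"K", "L"},
}


def group_by_companies(links):
    """Group text features by the exact set of companies that share them."""
    groups = defaultdict(list)
    for feature, companies in links.items():
        groups[tuple(sorted(companies))].append(feature)
    return {key: sorted(features) for key, features in groups.items()}


def overlap_rate(companies, total):
    """Assumed definition: share of word clouds in which the feature appears."""
    return len(companies) / total


groups = group_by_companies(LINKS)
for companies, features in sorted(groups.items()):
    print(" & ".join(companies), "->", ", ".join(features))
```

Under the assumed definition, a feature shared by all three companies has an overlap rate of 1, while a feature found in only one word cloud has a rate of 1/3, which is why the latter reads as a difference rather than an association.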
As described above, besides generating the correlation analysis diagram of FIG. 4A from several text features with higher overlap rates, which shows the associations among multiple companies in an industry, the industry correlation analysis method of this embodiment can also generate the correlation analysis diagram of FIG. 4B from several text features with lower overlap rates, which shows the differences among those companies. Alternatively, the method can generate the correlation analysis diagram of FIG. 4C from text features with both higher and lower overlap rates, showing the associations and the differences among multiple companies in an industry at the same time.
Note that, for ease of explanation, the word clouds used to generate the correlation analysis diagrams in the foregoing embodiments are each associated with one of several company names under the same industry name; however, the present invention is not limited thereto. In other embodiments, each industry name in the industry-individual association table may also correspond to multiple other industry names. In that case, when the keyword entered by the user through the user interface 11 is an industry name, the processor 10 searches the industry-individual association table for the other industry names corresponding to that industry name. The processor 10 then generates a word cloud for the industry name and for each of the corresponding other industry names. Finally, the processor 10 generates a correlation analysis diagram from these word clouds to show the associations and/or differences among different industries.
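The industry-to-industry variant can be sketched in Python as a table lookup followed by one word cloud per name. Everything named here is an assumption for illustration: the table contents, the placeholder industry names, the canned text features, and the helpers `related_names` and `build_word_clouds`; a real system would derive the text features from news articles.

```python
def related_names(keyword, association_table):
    """Return the keyword followed by every name the table associates with it."""
    return [keyword, *association_table.get(keyword, [])]


def build_word_clouds(names, fetch_features):
    """Build one word cloud (a set of text features) per name."""
    return {name: set(fetch_features(name)) for name in names}


# Hypothetical industry-individual association table and canned text features.
ASSOCIATION_TABLE = {"Industry A": ["Industry B", "Industry C"]}
CANNED_FEATURES = {
    "Industry A": ["order", "export"],
    "Industry B": ["order", "panel"],
    "Industry C": ["export", "wafer"],
}

names = related_names("Industry A", ASSOCIATION_TABLE)
clouds = build_word_clouds(names, lambda name: CANNED_FEATURES.get(name, []))

# Features shared by two industries suggest an association between them.
common_ab = clouds["Industry A"] & clouds["Industry B"]
```

Passing the feature source in as `fetch_features` keeps the lookup independent of how the news search is implemented, so the same flow would serve company names and industry names alike.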
Note also that, although specific methods are described with reference to the flowcharts depicted herein, a person having ordinary skill in the art to which the invention pertains will readily understand that the order in which the steps of the industry correlation analysis method are executed is not limited thereby. That is, in the industry correlation analysis methods provided by other embodiments of the present invention, the order of the steps may be changed, some steps may be combined, or some steps may be omitted.
In summary, the industry correlation analysis system and method provided by the present invention can quickly generate a correlation analysis diagram from recent news about each industry and from recent social media articles discussing each industry, showing the associations and/or differences among different industries or among multiple companies within the same industry.
In other words, the industry correlation analysis system and method provided by the present invention can effectively distill key issues from recent news about each industry and from recent social media articles discussing each industry, and further consolidate the differences or similarities between companies within the same industry or between different industries.
Finally, although the concepts of the present technology have been specifically shown and described above with reference to several exemplary embodiments, a person having ordinary skill in the art will understand that various changes in form and detail may be made without departing from the scope of the concepts of the present technology as defined by the following claims.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW107102052A TWI659321B (en) | 2018-01-19 | 2018-01-19 | System and method for analyzing industry relevance |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TWI659321B (en) | 2019-05-11 |
| TW201933143A (en) | 2019-08-16 |
Family
ID=67347971
Country Status (1)
| Country | Link |
|---|---|
| TW (1) | TWI659321B (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN113434666A (en) * | 2021-04-06 | 2021-09-24 | 西安理工大学 | Keyword relevance analysis method |
| TWI765645B (en) * | 2021-04-07 | 2022-05-21 | 元智大學 | Investment scoring method of financial text |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| TW201033823A (en) * | 2008-12-09 | 2010-09-16 | Ibm | Systems and methods for analyzing electronic text |
| US20140081706A1 (en) * | 2012-06-04 | 2014-03-20 | Unmetric, Inc. | Industry Specific Brand Benchmarking System Based On Social Media Strength Of A Brand |
| TW201640383A (en) * | 2015-05-07 | 2016-11-16 | Dataa Dev Co Ltd | Internet events automatic collection and analysis method and system thereof |
| CN107342976A (en) * | 2017-05-18 | 2017-11-10 | 辛柯俊 | For the mobile solution platform and method of enterprise's Analysis on Industry Chain |
2018-01-19: TW application TW107102052A (patent TWI659321B); status: not active, IP right cessation
Also Published As
| Publication number | Publication date |
|---|---|
| TW201933143A (en) | 2019-08-16 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | MM4A | Annulment or lapse of patent due to non-payment of fees | |