CN107710239A - Parzen window feature selection algorithm for formal concept analysis (FCA) - Google Patents
- Publication number
- CN107710239A (application CN201680033746.XA)
- Authority
- CN
- China
- Prior art keywords
- class
- section
- distribution curve
- feature
- known object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
- G06N5/022—Knowledge engineering; Knowledge acquisition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Medical Informatics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Described is a system for feature selection for formal concept analysis (FCA). A set of data points having features is separated into object classes. For each object class, the data points are convolved with a Gaussian function, resulting in a class distribution curve for each known object class. For each class distribution curve, a binary array is generated that has ones on the intervals where the class distribution curve has the maximum data value relative to all other class distribution curves, and zeros on all other intervals. For each object class, a binary class curve is generated that indicates the intervals for which the known object class outperforms all other known object classes. The intervals are ranked with respect to a predetermined confidence threshold. The ranking of the intervals is used to select which features to extract from the set of data points in FCA lattice construction.
Description
Government license rights
This invention was made with government support under U.S. Government Contract Number FA8650-13-C7356. The government has certain rights in the invention.
Cross-reference to related applications
This is a Continuation-in-Part application of U.S. Non-Provisional Application No. 14/807,083, filed in the United States on July 23, 2015, entitled "A General Formal Concept Analysis (FCA) Framework for Classification," the entirety of which is incorporated herein by reference.
This is also a Non-Provisional patent application of U.S. Provisional Application No. 62/195,876, filed in the United States on July 23, 2015, entitled "A Parzen Window Feature Selection Algorithm for Formal Concept Analysis (FCA)," the entirety of which is incorporated herein by reference.
Technical field
The present invention relates to a system for feature extraction for formal concept analysis (FCA) and, more particularly, to a system for feature extraction for FCA using Parzen windows.
Background
Many forms of information can be described as a set of objects, each with a set of attributes and/or values. In these cases, any hierarchical structure remains implicit. Often, the set of objects can relate to two or more completely different domains of attributes and/or values. Formal concept analysis (FCA) is a principled way of deriving a partial order on a set of objects, each defined by a set of attributes. It is a technique in data and knowledge processing with applications in data visualization, data mining, information retrieval, and information management (see the List of Incorporated References, Reference No. 2). The principle by which it organizes data is a partial order induced by an inclusion relation between the objects' attributes. Additionally, FCA admits rule mining from structured data.
FCA is widely used for data analysis. FCA relies on binary features to construct lattices. Techniques exist for converting scalar data into a binary format, but they often produce too many attributes to be used efficiently in lattice construction. Feature selection on scalar data is typically done by scaling or by creating uniform bins. Existing methods for selecting features from scalar data in FCA amount to a blind selection strategy that produces too many features, many of which are not useful. This is problematic because the computation time required for lattice construction grows exponentially with the number of features.
Thus, a continuing need exists for a way to reduce the number of features in FCA to those that are most useful, so that larger lattices can be built without weakening the power of FCA.
Summary of the invention
The present invention relates to a system for feature extraction for formal concept analysis (FCA) and, more particularly, to a system for feature extraction for FCA using Parzen windows. The system comprises one or more processors and a non-transitory computer-readable medium having executable instructions encoded thereon such that, when the instructions are executed, the one or more processors perform multiple operations. The system separates a set of data points having features into a set of known object classes. For each known object class, the data points are convolved with a Gaussian function, resulting in a class distribution curve for each known object class. For each class distribution curve, the intervals on which the class distribution curve has the maximum data value relative to all other class distribution curves are identified. The intervals are ranked with respect to a predetermined confidence threshold. The ranking of the intervals is used to select which features to extract from the set of data points in FCA lattice construction, and the selected features are extracted from the set of data points.
In another aspect, the selected features are used to interpret neural data.
In another aspect, the selected features are applied to functional magnetic resonance imaging (fMRI) responses to classify a person's thought processes.
In another aspect, the system generates a binary array of ones and zeros, the binary array having ones on the intervals where the class distribution curve is maximal and zeros on all other intervals.
In another aspect, for each known object class, a binary class curve is generated that indicates the intervals for which the known object class outperforms all other known object classes.
In another aspect, the set of data points comprises data from a neural sensor.
In another aspect, the predetermined confidence threshold is used to eliminate intervals having low confidence values.
In another aspect, the ranking of the intervals is determined by taking the ratio of the area under each class distribution curve along each interval to the sum of the areas under all other class distribution curves along that interval.
In another aspect, the present invention also comprises a method for causing a processor to perform the operations described herein.
Finally, in yet another aspect, the present invention also comprises a computer program product comprising computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having a processor for causing the processor to perform the operations described herein.
Brief description of the drawings
The file of this patent or patent application publication contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawings will be provided by the Patent Office upon request and payment of the necessary fee.
The objects, features, and advantages of the present invention will be apparent from the following detailed description of the various aspects of the invention, taken in conjunction with the following drawings, where:
Fig. 1 is a block diagram depicting the components of a system for feature extraction for formal concept analysis (FCA) according to embodiments of the present invention;
Fig. 2 is an illustration of a computer program product according to embodiments of the present invention;
Fig. 3 is an illustration of a first context table according to embodiments of the present invention;
Fig. 4A is an illustration of a second context table according to embodiments of the present invention;
Fig. 4B is an illustration of the lattice resulting from the data in the second context table according to embodiments of the present invention;
Fig. 5 is an illustration of the process flow of feature extraction for FCA according to embodiments of the present invention;
Fig. 6 is an illustration of the growth in the number of lattice nodes required for high classification accuracy using uniform bins compared with Parzen windows according to embodiments of the present invention;
Fig. 7 is an illustration of the growth in the number of lattice edges required for high classification accuracy using uniform bins compared with Parzen windows according to embodiments of the present invention;
Fig. 8 is an illustration of classification accuracy as a function of threshold and Parzen window size σ according to embodiments of the present invention;
Fig. 9 is an illustration of the number of lattice nodes built as a function of threshold and Parzen window size σ according to embodiments of the present invention;
Fig. 10A is an illustration of class distribution curves according to embodiments of the present invention;
Fig. 10B is an illustration of the separate binary class curves for each object class according to embodiments of the present invention;
Fig. 11 is an illustration of the confidence values of the class distribution curves according to embodiments of the present invention; and
Fig. 12 is an illustration of the recording of neural responses and the FCA classification of the neural responses according to embodiments of the present invention.
Detailed description
The present invention relates to a system for feature extraction for formal concept analysis (FCA) and, more particularly, to a system for feature extraction for FCA using Parzen windows. The following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications. The application discussed here uses FCA to analyze brain activity in response to different stimuli by building lattices with the feature extraction method of the present invention. Various modifications, as well as a variety of uses in different applications, will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of aspects. Thus, the present invention is not intended to be limited to the aspects presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without necessarily being limited to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
The reader's attention is directed to all papers and documents that are filed concurrently with this specification and that are open to public inspection with this specification; the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent, or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is only one example of a generic series of equivalent or similar features.
Furthermore, any element in a claim that does not explicitly state "means for" performing a specified function, or "step for" performing a specific function, is not to be interpreted as a "means" or "step" clause as specified in 35 U.S.C. Section 112, Paragraph 6. In particular, the use of "step of" or "act of" in the claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.
Please note that, if used, the labels left, right, front, back, top, bottom, forward, reverse, clockwise, and counter-clockwise are used for convenience only and are not intended to imply any particular fixed direction. Instead, they are used to reflect relative locations and/or directions between various portions of an object. As such, as the present invention is changed, the above labels may change their orientation.
Before describing the invention in detail, a list of the incorporated references used in this description is provided first. Next, a description of the various principal aspects of the present invention is provided. Subsequently, an introduction gives the reader a general understanding of the present invention. Finally, specific details of the present invention are provided to give an understanding of the specific aspects.
(1) List of incorporated references
The following references are cited and incorporated throughout this application. For clarity and convenience, the references are listed here as a central resource for the reader. The following references are hereby incorporated by reference as though fully set forth herein. The references are cited in the application by referring to the corresponding reference number, as follows:
1. V. Arulmozhi. Classification task by using Matlab Neural Network Tool Box - A beginners. International Journal of Wisdom Based Computing, 2011.
2. C. Carpineto and G. Romano. Concept Data Analysis: Theory and Applications. Wiley, Chapter 2, 2004.
3. Richard O. Duda, Peter E. Hart, and David G. Stork. Pattern Classification. Wiley-Interscience, 2nd edition, Chapter 4, Section 3, 2001.
4. B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, Chapter 1, 1998.
5. M. Swain, S. K. Dash, S. Dash, and A. Mohapatra. An approach for IRIS plant classification using neural network. International Journal of Soft Computing, 2012.
6. K. Bache and M. Lichman. UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences, 2013, available at http://archive.ics.uci.edu/ml/datasets/Iris, taken on July 17, 2015.
(2) Principal aspects
Various embodiments have three "principal" aspects. The first is a system for Parzen window feature selection for formal concept analysis (FCA). The system typically takes the form of a computer system operating software or the form of a "hard-coded" instruction set. This system may be incorporated into a wide variety of devices that provide different functionalities, such as a robot or other device. The second principal aspect is a method, typically in the form of software, operated using a data processing system (computer). The third principal aspect is a computer program product. The computer program product generally represents computer-readable instructions stored on a non-transitory computer-readable medium such as an optical storage device (e.g., a compact disc (CD) or digital versatile disc (DVD)) or a magnetic storage device such as a floppy disk or magnetic tape. Other, non-limiting examples of computer-readable media include hard disks, read-only memory (ROM), and flash-type memories. These aspects are described in more detail below.
A block diagram depicting an example of a system (i.e., computer system 100) of the present invention is provided in Fig. 1. The computer system 100 is configured to perform calculations, processes, operations, and/or functions associated with a program or algorithm. In one aspect, certain processes and steps discussed herein are realized as a series of instructions (e.g., a software program) that reside within computer-readable memory units and are executed by one or more processors of the computer system 100. When executed, the instructions cause the computer system 100 to perform specific actions and exhibit specific behavior, as described herein.
The computer system 100 may include an address/data bus 102 that is configured to communicate information. Additionally, one or more data processing units, such as a processor 104 (or processors), are coupled with the address/data bus 102. The processor 104 is configured to process information and instructions. In one aspect, the processor 104 is a microprocessor. Alternatively, the processor 104 may be a different type of processor, such as a parallel processor or a field programmable gate array.
The computer system 100 is configured to utilize one or more data storage units. The computer system 100 may include a volatile memory unit 106 (e.g., random access memory ("RAM"), static RAM, dynamic RAM, etc.) coupled with the address/data bus 102, wherein the volatile memory unit 106 is configured to store information and instructions for the processor 104. The computer system 100 may also include a non-volatile memory unit 108 (e.g., read-only memory ("ROM"), programmable ROM ("PROM"), erasable programmable ROM ("EPROM"), electrically erasable programmable ROM ("EEPROM"), flash memory, etc.) coupled with the address/data bus 102, wherein the non-volatile memory unit 108 is configured to store static information and instructions for the processor 104. Alternatively, the computer system 100 may execute instructions retrieved from an online data storage unit, such as in "cloud" computing. In one aspect, the computer system 100 may also include one or more interfaces, such as an interface 110, coupled with the address/data bus 102. The one or more interfaces are configured to enable the computer system 100 to interface with other electronic devices and computer systems. The communication interfaces implemented by the one or more interfaces may include wireline (e.g., serial cables, modems, network adaptors, etc.) and/or wireless (e.g., wireless modems, wireless network adaptors, etc.) communication technology.
In one aspect, the computer system 100 may include one or more input devices 112 coupled with the address/data bus 102, wherein the input device 112 is configured to communicate information and command selections to the processor 104. In accordance with one aspect, the input device 112 is an alphanumeric input device, such as a keyboard, that may include alphanumeric and/or function keys. Alternatively or additionally, the input device 112 may be an input device other than an alphanumeric input device. For example, the input device 112 may include one or more sensors, such as a camera for video or still images, a microphone, or a neural sensor. Other example input devices 112 may include an accelerometer, a GPS sensor, or a gyroscope.
In one aspect, the computer system 100 may also include one or more optional computer-usable data storage devices, such as a storage device 116, coupled with the address/data bus 102. The storage device 116 is configured to store information and/or computer-executable instructions. In one aspect, the storage device 116 is a storage device such as a magnetic or optical disk drive (e.g., hard disk drive ("HDD"), floppy diskette, compact disc read-only memory ("CD-ROM"), digital versatile disc ("DVD")). Pursuant to one aspect, a display device 118 is coupled with the address/data bus 102, wherein the display device 118 is configured to display video and/or graphics. In one aspect, the display device 118 may include a cathode ray tube ("CRT"), liquid crystal display ("LCD"), field emission display ("FED"), plasma display, or any other display device suitable for displaying video and/or graphic images and alphanumeric characters recognizable to a user.
The computer system 100 presented herein is an example computing environment in accordance with one aspect. However, the non-limiting example of the computer system 100 is not strictly limited to being a computer system. For example, one aspect provides that the computer system 100 represents a type of data processing analysis that may be used in accordance with the various aspects described herein. Moreover, other computing systems may also be implemented. Indeed, the spirit and scope of the present technology is not limited to any single data processing environment. Thus, in one aspect, one or more operations of various aspects of the present technology are controlled or implemented using computer-executable instructions, such as program modules, executed by a computer. In one implementation, such program modules include routines, programs, objects, components, and/or data structures that are configured to perform particular tasks or implement particular abstract data types. In addition, one aspect provides that one or more aspects of the present technology are implemented by utilizing one or more distributed computing environments, such as where tasks are performed by remote processing devices that are linked through a communications network, or where various program modules are located in both local and remote computer storage media, including memory storage devices.
An illustrative diagram of a computer program product (i.e., storage device) embodying the present invention is shown in Fig. 2. The computer program product is depicted as a floppy disk 200 or an optical disk 202 such as a CD or DVD. However, as mentioned previously, the computer program product generally represents computer-readable instructions stored on any compatible non-transitory computer-readable medium. The term "instructions" as used with respect to this invention generally indicates a set of operations to be performed on a computer, and may represent pieces of a whole program or individual, separable software modules. Non-limiting examples of "instructions" include computer program code (source or object code) and "hard-coded" electronics (i.e., computer operations coded into a computer chip). The "instructions" are stored on any non-transitory computer-readable medium, such as in the memory of a computer or on a floppy disk, a CD-ROM, or a flash drive. In either event, the instructions are encoded on a non-transitory computer-readable medium.
(3) Introduction
Formal concept analysis (FCA) is a principled way of deriving a concept hierarchy or formal ontology from a collection of objects and their properties or attributes. It creates a partial order of the objects based on an ordering relation defined by set inclusion of attributes. Formally, a context $(G, M, I)$ consists of two sets $G$ and $M$ and a relation $I$ between them, called the incidence relation. The elements of $G$ are called the objects, and the elements of $M$ are called the attributes (see Reference No. 4). If an object $g \in G$ has the attribute $m \in M$, one writes $gIm$ or $(g, m) \in I$. A context can be represented by a cross table, or context table, which is a rectangular table whose rows are headed by the objects and whose columns are headed by the attributes; an example is shown in Fig. 3. An "X" at the intersection of row $g$ and column $m$ means that object $g$ has attribute $m$. For a set of objects $A \subseteq G$, one can define $A' = \{m \in M \mid gIm \text{ for all } g \in A\}$. In other words, for some subset of objects $A$, $A'$ represents the set of attributes shared by all objects in $A$. Correspondingly, for a set of attributes $B \subseteq M$, one can define $B' = \{g \in G \mid gIm \text{ for all } m \in B\}$. In other words, for some subset of attributes $B$, $B'$ represents the set of objects that have all the attributes in $B$.
A formal concept can now be defined. A formal concept of the context $(G, M, I)$ is a pair $(A, B)$ with $A \subseteq G$, $B \subseteq M$, $A' = B$, and $B' = A$. $A$ is called the extent and $B$ the intent of the concept $(A, B)$. $\mathfrak{B}(G, M, I)$ denotes the set of all concepts of the context $(G, M, I)$. After any rearrangement of rows and columns, a concept is represented in the context table by a maximal contiguous block of "X"s, as shown in Fig. 3. Algorithms for determining concept lattices are described in References No. 2 and No. 4. Mathematically, the key aspect of concept lattices is that the concept lattice $\mathfrak{B}(G, M, I)$ is a complete lattice, in which the infimum and supremum are given, respectively, by:
$\bigwedge_{t \in T} (A_t, B_t) = \left( \bigcap_{t \in T} A_t, \left( \bigcup_{t \in T} B_t \right)'' \right)$ and
$\bigvee_{t \in T} (A_t, B_t) = \left( \left( \bigcup_{t \in T} A_t \right)'', \bigcap_{t \in T} B_t \right)$.
Referring to Fig. 3, an object (e.g., lion) has the attributes of the columns marked with an "X" in its row (e.g., preying, mammal). The contiguous grey block 300 is maximal under any rearrangement of rows and columns, and forms a formal concept. The supremum is called the join and is written $x \vee y$, or sometimes $\bigvee S$ (the join of the set $S$). The infimum is called the meet and is written $x \wedge y$, or sometimes $\bigwedge S$ (the meet of the set $S$). An extensive description of formal concept analysis is given in Reference No. 4.
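To make the two derivation operators concrete, the following is a minimal Python sketch on a hypothetical three-object context. The objects, attributes, and incidence pairs are illustrative assumptions and are not the contents of Fig. 3; the function names `common_attributes`, `common_objects`, and `is_formal_concept` are likewise chosen here only for illustration.

```python
# Hypothetical toy context (G, M, I); the entries are illustrative only.
G = {"lion", "eagle", "hare"}
M = {"preying", "mammal", "flying"}
I = {
    ("lion", "preying"), ("lion", "mammal"),
    ("eagle", "preying"), ("eagle", "flying"),
    ("hare", "mammal"),
}


def common_attributes(objects):
    """A': the attributes shared by every object in `objects`."""
    return {m for m in M if all((g, m) in I for g in objects)}


def common_objects(attributes):
    """B': the objects that have every attribute in `attributes`."""
    return {g for g in G if all((g, m) in I for m in attributes)}


def is_formal_concept(extent, intent):
    """(A, B) is a formal concept exactly when A' = B and B' = A."""
    return (common_attributes(extent) == set(intent)
            and common_objects(intent) == set(extent))


print(sorted(common_attributes({"lion", "eagle"})))        # ['preying']
print(sorted(common_objects({"preying"})))                 # ['eagle', 'lion']
print(is_formal_concept({"lion", "eagle"}, {"preying"}))   # True
print(is_formal_concept({"lion"}, {"preying"}))            # False
```

The last call returns False because the lion alone also shares "mammal", so {"lion"} is not the full extent of the intent {"preying"}.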
(3.1) Example of a context and concept lattice
A concept lattice is a mathematical object represented by $(G, M, I)$, as described above. A concept lattice can be visualized by a Hasse diagram, a directed acyclic graph in which the nodes represent concepts and the lines represent the inclusion relations between nodes. In the case of formal concept analysis, the Hasse diagram has a single top node representing all objects (given by G) and a single bottom node representing all attributes (given by M). All nodes in between represent the various concepts, each composed of some subset of the objects and attributes. A line between two nodes represents the order information: the node above is considered greater than the node below. In a Hasse diagram, a node n with attribute set m and object set g has the following properties:
m = g', the set of all attributes shared by every object in g.
g = m', the set of all objects that have all of the attributes in m.
Every child node of n has all of m in its intent.
Every parent node of n has all of g in its extent.
Thus, the ordering n > k of nodes in the lattice means that the extent of k is contained in the extent of n and, equivalently, that the intent of n is contained in the intent of k. The upset of a node n consists of all of its ancestor nodes in the lattice. The downset of n consists of all of its descendant nodes in the lattice.
Figs. 4A and 4B show, respectively, a context table and the corresponding Hasse diagram of the concept lattice induced by the formal context. The objects are nine planets, and the attributes are properties such as size, distance from the sun, and the presence or absence of moons. Each node (represented by a circle, such as elements 400 and 402) corresponds to a concept whose objects are the union of all objects from the nodes connected from above and whose attributes are the intersection of all attributes of the nodes connected from below. Ultimately, the topmost node 404 contains all of the objects G and no attributes. Correspondingly, the bottommost node 406 contains all of the attributes M and no objects.
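The structure of such a lattice can be explored with a brute-force sketch. The example below closes every subset of objects to enumerate all concepts of a tiny context and then checks the top node, the bottom node, and the ordering. The three-planet context is a hypothetical illustration and is not the table of Fig. 4A; the brute-force enumeration is an assumption made for brevity and is not one of the lattice algorithms of References No. 2 and No. 4.

```python
from itertools import combinations

# Hypothetical context: object -> set of attributes (illustrative, not Fig. 4A).
CONTEXT = {
    "Mercury": {"small", "near"},
    "Earth": {"small", "near", "moon"},
    "Jupiter": {"large", "far", "moon"},
}
ALL_OBJECTS = frozenset(CONTEXT)
ALL_ATTRIBUTES = frozenset().union(*CONTEXT.values())


def intent_of(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(ALL_ATTRIBUTES)
    for g in objects:
        shared &= CONTEXT[g]
    return frozenset(shared)


def extent_of(attributes):
    """Objects that have every attribute in `attributes`."""
    return frozenset(g for g, has in CONTEXT.items() if attributes <= has)


def all_concepts():
    """Brute force: close every subset of objects; only viable for tiny contexts."""
    found = set()
    for size in range(len(CONTEXT) + 1):
        for objects in combinations(sorted(CONTEXT), size):
            intent = intent_of(objects)
            found.add((extent_of(intent), intent))
    return found


def is_above(n, k):
    """True if concept n lies at or above concept k: extent(k) is within extent(n)."""
    return k[0] <= n[0]


concepts = all_concepts()
top = max(concepts, key=lambda c: len(c[0]))      # largest extent
bottom = max(concepts, key=lambda c: len(c[1]))   # largest intent
print(len(concepts), "concepts")
print("top:", sorted(top[0]), sorted(top[1]))
print("bottom:", sorted(bottom[0]), sorted(bottom[1]))
```

Because the number of object subsets doubles with every added object, this enumeration only serves to illustrate the definitions; it also shows why the number of binary features fed into lattice construction matters.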
(4) Specific details of the invention
In a system according to some embodiments of the present invention, feature selection is performed on scalar data from BOLD (blood oxygen level dependent) responses measured using functional magnetic resonance imaging (fMRI). fMRI is a functional neuroimaging procedure using MRI technology that measures brain activity by detecting changes associated with blood flow. This technique relies on the fact that cerebral blood flow and neuronal activation are coupled. When an area of the brain is in use, blood flow to that region also increases. In response to a stimulus, fMRI typically provides a data set that can comprise samples of brain activity (inferred from the BOLD signal) from 20k-100k voxels (where k denotes "thousand"). Feature selection is performed on this high-dimensional scalar data to extract the signal from the noise in the voxel responses. The selected features can then be analyzed further using methods such as FCA to understand their structure and their contribution to the activity in response to stimuli (hereinafter referred to as object classes), and further used to decode brain activity into the stimulus dimensions.
Fig. 5 is a flow diagram depicting Parzen window feature selection for FCA according to embodiments of the present invention. In a first operation 500, a data set is separated into known object classes. Non-limiting examples of data sets that can be separated into known object classes include fMRI BOLD responses and data from sensors in an environment (such as imaging data from cameras, radar, and LIDAR). In a second operation 502, a class distribution curve is generated for each object class. Thereafter, in a third operation 504, a binary array is generated for each object class. In a fourth operation 506, binary class curves are generated from the binary arrays. Next, in a fifth operation 508, the intervals are ranked with respect to a confidence threshold. Finally, in a sixth operation 510, the ranking is used to select the features to extract from the data set for FCA lattice construction. Each of these operations is described in further detail below.
(4.1) Feature selection
In determining appropriate bins for scalar data values, Parzen window density estimation is used (see Literature Reference No. 3 for a description of Parzen window density estimation). The method according to some embodiments of the present invention comprises dividing the data points into separate known object classes. For each class, the data points are convolved with a Gaussian function. The resulting curve is referred to as a class distribution curve, which is shown in FIG. 10A. For each class, the corresponding class distribution curve is compared with the other class distribution curves. A binary array is created that contains ones on the intervals where the class distribution curve is maximum (with respect to all other class distribution curves) and zeros on the other intervals. This is a binary class curve, which indicates those intervals for which the class has the highest probability, relative to all other classes, of containing the data values in the interval. An illustration of the separate binary class curve for each object class is shown in FIG. 10B. The intervals are then ranked with respect to their confidence, where confidence is calculated as the ratio of the inclusion of the given class in the interval to the sum of the inclusion of all classes. The confidence values for the example of FIG. 10A are shown in FIG. 11.
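As a compact restatement of the construction just described, the three quantities can be written as follows. This is a sketch under stated assumptions: the symbols $x_i$, $o_i$, $v$, $I$, $B_o$, and $c_o(I)$ are notation introduced here, the unnormalized sum of Gaussians is one possible reading of convolving the data points with a Gaussian function, and the confidence uses the class's share of the total over all classes.

```latex
% Class distribution curve of class o: the data points x_i whose label o_i
% equals o, each smoothed by a Gaussian of standard deviation sigma.
\[
C_o(v) = \sum_{i \,:\, o_i = o} \exp\!\left( -\frac{(v - x_i)^2}{2\sigma^2} \right)
\]

% Binary class curve: one where class o dominates every other class.
\[
B_o(v) =
\begin{cases}
1 & \text{if } C_o(v) > C_k(v) \text{ for all } k \neq o, \\
0 & \text{otherwise.}
\end{cases}
\]

% Confidence of an interval I on which B_o = 1.
\[
c_o(I) = \frac{\int_I C_o(v)\, dv}{\sum_k \int_I C_k(v)\, dv}
\]
```

Under this reading, an interval in which only class $o$ has appreciable density receives a confidence close to 1, and the confidence drops toward the chance level as the curves of other classes overlap the interval.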
Formally, the algorithm ParzenFeatureSelection is as follows. Let Gauss(μ) be a Gaussian with mean μ and standard deviation σ. C_o is the class curve for object o, and the resulting bins are b_o. The corresponding confidence values are c_o. The outputs are bins (a list of the start and end values of the intervals) and confs (a list of the confidence for each interval).
Require: x, a vector of scalar data from an input (e.g., fMRI BOLD voxel activity); obj, the corresponding object classes; thresh, the confidence cutoff threshold.
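The algorithm listing itself is not reproduced in this text. A minimal Python sketch consistent with the inputs and outputs named above might look as follows. It is not the patented listing: the function name `parzen_feature_selection`, the `sigma` and `grid_size` parameters, the evaluation grid, the extra `owners` return value, and the use of NumPy are all assumptions made for illustration, and confidence is computed here as the dominant class's share of the total area in the interval.

```python
import numpy as np


def parzen_feature_selection(x, obj, thresh, sigma=0.1, grid_size=512):
    """Hypothetical sketch: return (bins, confs, owners) for one scalar feature."""
    x = np.asarray(x, dtype=float)
    obj = np.asarray(obj)
    classes = np.unique(obj)

    # Evaluation grid, padded by 3 sigma so the Gaussian tails are covered.
    grid = np.linspace(x.min() - 3 * sigma, x.max() + 3 * sigma, grid_size)

    # Class distribution curve per class: sum of Gaussians centered on its points.
    curves = np.stack([
        np.exp(-0.5 * ((grid[:, None] - x[obj == o][None, :]) / sigma) ** 2).sum(axis=1)
        for o in classes
    ])

    # Index of the dominant class at each grid point (the binary class curves
    # are the one-hot encoding of this array).
    winner = curves.argmax(axis=0)

    # Split the grid into maximal runs on which the same class dominates.
    cuts = np.flatnonzero(np.diff(winner)) + 1
    starts = np.concatenate(([0], cuts))
    stops = np.concatenate((cuts, [grid_size]))

    bins, confs, owners = [], [], []
    for lo, hi in zip(starts, stops):
        # Riemann sums; the grid spacing is uniform, so it cancels in the ratio.
        area = curves[:, lo:hi].sum(axis=1)
        total = area.sum()
        if total == 0.0:
            continue  # no class has measurable density here
        k = winner[lo]
        conf = area[k] / total
        if conf >= thresh:  # discard low-confidence intervals
            bins.append((float(grid[lo]), float(grid[hi - 1])))
            confs.append(float(conf))
            owners.append(classes[k])

    # Rank the surviving intervals from most to least confident.
    order = np.argsort(confs)[::-1]
    return ([bins[i] for i in order],
            [confs[i] for i in order],
            [owners[i] for i in order])
```

A call such as `parzen_feature_selection(x, obj, thresh=0.7, sigma=0.05)` would return only the intervals whose confidence is at least 0.7, most confident first. Evaluating the curves on a fixed grid is a design shortcut; an exact implementation could instead solve for the crossing points of the curves.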
The ranking by confidence can be done in several ways, non-limiting examples of which are described below. The rank is established by taking the ratio of the area under the class distribution curve along the interval to the sum of the areas under all other class distribution curves along the interval. In our application, fMRI experiments repeatedly measure brain activity as voxel values in response to different classes of stimuli (e.g., classes A and B) to generate multiple measurement samples. For example, if an input data voxel value reaches 3.7 for 10 different samples, of which 7 are associated with elements of class A and 3 with other classes, then if a value of 3.7 is observed in another sample, one can be 70% confident that it is an instance of class A. A predetermined threshold is used to discard intervals with low confidence values. Depending on the statistics of the data (number of samples, distribution of sample values), other methods of calculating the confidence level may prove useful. The following are non-limiting examples of confidence level calculations:
Incorporate the size of the bin, assigning higher confidence to larger bins.
Divide the bin into multiple segments, where the center segments are given higher confidence and the edge segments are given lower confidence.
Use a different non-linear confidence calculation, for example the Fisher discriminant. Consider the mean and scatter of the response samples from a voxel for each class. Define the mean ($m_A$) and scatter ($s_A$) of the response for class A, where, for the responses $x_i$ of the voxel to class A, the scatter is determined by $s_A = \sum_i (x_i - m_A)^2$. Similarly, the mean of the rest ($m_R$) and the scatter of the rest ($s_R$) are defined over all responses to the other classes. Given these definitions, the Fisher discriminant is defined as $F(A) = \frac{(m_A - m_R)^2}{s_A + s_R}$.
The stability of the voxel can then be defined as $\max_A F(A)$. The advantage of this measure is that it maximizes the distance between the mean of class A and the mean of the rest, while minimizing the variance of the responses to class A and of the responses to the other classes.
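The Fisher-style stability measure can be sketched in a few lines of Python. This is an illustration under assumptions, not the patent's implementation: it takes the scatter of a group to be the sum of squared deviations from the group mean and the discriminant to be the squared difference of the means divided by the sum of the two scatters, which is the standard Fisher criterion. The function names `fisher_score` and `voxel_stability` are hypothetical.

```python
import numpy as np


def fisher_score(responses, labels, target):
    """Fisher criterion of one voxel for class `target` against all other classes."""
    responses = np.asarray(responses, dtype=float)
    labels = np.asarray(labels)
    a = responses[labels == target]      # responses to the target class
    r = responses[labels != target]      # responses to the rest
    m_a, m_r = a.mean(), r.mean()
    s_a = ((a - m_a) ** 2).sum()         # scatter of the target class
    s_r = ((r - m_r) ** 2).sum()         # scatter of the rest
    # Assumes the combined scatter is non-zero.
    return (m_a - m_r) ** 2 / (s_a + s_r)


def voxel_stability(responses, labels):
    """Largest Fisher score over all classes for one voxel."""
    return max(fisher_score(responses, labels, c) for c in np.unique(labels))
```

A voxel whose responses to one class sit far from its responses to all other classes, with little spread in either group, receives a high score and would be ranked as more reliable.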
(4.2) Experimental studies
Two data sets were studied for classification. The first is the Iris data set available in the University of California, Irvine (UCI) Machine Learning Repository (see Literature Reference No. 7 for the Iris data set). In this problem, the goal is to classify the type of iris based on "sepal length", "sepal width", "petal length", and "petal width". The second data set consists of fMRI BOLD responses.
(4.2.1) Iris
Classification of the Iris data set was performed using the algorithm described in U.S. Non-Provisional Application No. 14/807,083, which is hereby incorporated by reference as though fully set forth herein. Using the present invention, the data set can be classified with a more compact lattice than with prior-art approaches such as uniform binning of the data, which makes classification faster.
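To make the contrast between selected intervals and uniform binning concrete, the sketch below turns a scalar feature into the binary object-by-attribute table (formal context) that FCA operates on. It is an assumption about how intervals could be used as attributes, not the procedure of the cited application; the helper names `to_formal_context` and `uniform_bins` are hypothetical.

```python
import numpy as np


def to_formal_context(x, bins):
    """Boolean table: rows are samples, columns are intervals (closed on both ends)."""
    x = np.asarray(x, dtype=float)[:, None]
    lo = np.array([b[0] for b in bins], dtype=float)[None, :]
    hi = np.array([b[1] for b in bins], dtype=float)[None, :]
    return (x >= lo) & (x <= hi)


def uniform_bins(x, n):
    """n equal-width intervals spanning the range of x (the baseline binning)."""
    edges = np.linspace(np.min(x), np.max(x), n + 1)
    return list(zip(edges[:-1], edges[1:]))


# Two selected intervals yield two attributes, whereas uniform binning
# yields one attribute per bin regardless of how informative each bin is.
values = [0.1, 0.5, 0.9]
selected = to_formal_context(values, [(0.0, 0.3), (0.8, 1.0)])
baseline = to_formal_context(values, uniform_bins(values, 4))
```

Fewer attribute columns generally mean fewer formal concepts, which is one way to read the claim that selected intervals give a more compact lattice.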
FIG. 6 and FIG. 7 illustrate the growth required for high classification accuracy with uniform bins (represented as rectangles 600) compared with Parzen windows (referred to as Gaussian bins, represented as diamonds 602), according to various embodiments of the present invention. Note that 90% accuracy is achieved with a lattice of fewer than 50 concepts (or nodes) (as shown in FIG. 6) and fewer than 100 edges (as shown in FIG. 7).
In addition, a study was carried out to see whether classification accuracy could be improved while still keeping the lattice structure compact. The results are shown as three-dimensional (3D) plots in FIG. 8 and FIG. 9, where the color value corresponds to the z-axis value in each plot. Blue represents the minimum value on the z-axis (e.g., for % accuracy, the z-axis minimum in FIG. 8 is 30), and red represents the maximum. FIG. 8 illustrates the classification accuracy (z-axis and color, labeled % Accuracy) as a function of the threshold (x-axis, labeled Confidence Threshold) and the Parzen window size σ (y-axis, labeled Gaussian Sigma).
FIG. 9 illustrates the number of nodes in the constructed lattice (z-axis and color, labeled # Nodes) as a function of the threshold (x-axis, labeled Confidence Threshold) and the Parzen window size σ (y-axis, labeled Gaussian Sigma). The points in the two plots correspond to each other, so the region with x = 0.7-0.8 (confidence threshold) and y = 0.02-0.06 (Gaussian sigma) corresponds to z = 97% (% accuracy in FIG. 8) and z = 50 (# nodes in FIG. 9). As shown, the results indicate that 97% accuracy can be achieved with fewer than 50 nodes. This is better than the state-of-the-art classification methods published for this data set (see Literature Reference Nos. 1 and 5).
(4.2.2) Functional magnetic resonance imaging (fMRI) blood-oxygen-level-dependent (BOLD) responses
(4.2.2.1) Voxel binning
fMRI BOLD responses are used to represent the level of neural activity within the brain in a non-invasive manner. Various stimuli (e.g., spoken words, written words, images) are presented, representing semantic or conceptual input. During presentation of the stimuli, the brain's responses are recorded. A baseline of inactivity is subtracted, and the difference between this neutral brain state and the brain state in response to the stimulus is extracted.
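The baseline subtraction can be illustrated with a short Python sketch. The array layout (one row per scan, one column per voxel), the averaging of several resting scans into a single baseline, and the function name `baseline_corrected` are assumptions; the text states only that a baseline of inactivity is subtracted.

```python
import numpy as np


def baseline_corrected(stimulus_scans, baseline_scans):
    """Subtract the mean resting-state response from every stimulus scan."""
    stimulus_scans = np.asarray(stimulus_scans, dtype=float)          # (n_trials, n_voxels)
    baseline = np.asarray(baseline_scans, dtype=float).mean(axis=0)   # (n_voxels,)
    # Broadcasting subtracts the per-voxel baseline from each trial.
    return stimulus_scans - baseline
```

Each column of the result is then a vector of scalar values for one voxel across trials, which is the kind of input the binning step expects.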
The set of stimuli (whether individual words, words in sentences, spoken words, images, etc.) represents the objects of formal concept analysis (FCA), and the fMRI BOLD responses extracted for voxels within the brain represent the attributes of the objects. FCA classification (described in U.S. Non-Provisional Application No. 14/807,083) can then be applied to the fMRI BOLD responses in an effort to classify a person's thought process. To this end, features are extracted via the Parzen window binning algorithm of the present invention.
FIG. 12 illustrates a human subject 1200 being presented with a set of stimuli 1202 (e.g., spoken words, written words, images). During presentation of the set of stimuli 1202, fMRI BOLD responses 1204 are recorded in response to the set of stimuli 1202. Because the set of stimuli 1202 represents the objects of FCA and the extracted fMRI BOLD responses 1204 represent the attributes of the objects, FCA classification 1206 can then be applied to the fMRI BOLD responses 1204 in an effort to classify the person's thought process 1208.
The invention described herein has a variety of applications. For example, as described above, FCA classification facilitates the classification of fMRI BOLD responses to presented stimuli. In addition, the method according to some embodiments of the present invention can be used to classify inefficiencies in production lines or circuit designs, since many such inefficiencies are based on dependencies that result in hidden structure in the process.
Claims (20)
1. A system for feature selection for formal concept analysis (FCA), the system comprising:
one or more processors having associated memory with executable instructions encoded thereon such that, when the executable instructions are executed, the one or more processors perform operations of:
separating a set of data points having features into a set of known object classes;
for each known object class, convolving the data points with a Gaussian function, resulting in a class distribution curve for each known object class;
for each class distribution curve, identifying intervals of data values on which the class distribution curve is maximum with respect to all other class distribution curves;
ranking the intervals with respect to a predetermined confidence threshold;
using the ranking of the intervals to select which features to extract from the set of data points for FCA lattice construction; and
extracting the selected features from the set of data points.
2. The system according to claim 1, wherein the selected features are used to interpret neural data.
3. The system according to claim 2, wherein the selected features are applied to functional magnetic resonance imaging (fMRI) responses to classify a person's thought process.
4. The system according to claim 1, wherein the one or more processors further perform an operation of generating a binary array comprising ones and zeros, the binary array having ones on the intervals on which the class distribution curve is maximum and zeros on the other intervals.
5. The system according to claim 4, wherein a binary class curve is generated for each known object class, the binary class curve indicating for which intervals the performance of the known object class exceeds that of all other known object classes.
6. The system according to claim 1, wherein the set of data points comprises data from a neural sensor.
7. The system according to claim 1, wherein the predetermined confidence threshold is used to eliminate intervals having low confidence values.
8. The system according to claim 1, wherein the ranking of the intervals is determined by taking the ratio of the area under each class distribution curve along each interval to the sum of the areas under all other class distribution curves along each interval.
9. A computer-implemented method for feature selection for formal concept analysis (FCA), the method comprising the following step:
an act of causing one or more processors to execute instructions stored on a non-transitory memory such that, upon execution, the one or more processors perform operations of:
separating a set of data points having features into a set of known object classes;
for each known object class, convolving the data points with a Gaussian function, resulting in a class distribution curve for each known object class;
for each class distribution curve, identifying intervals of data values on which the class distribution curve is maximum with respect to all other class distribution curves;
ranking the intervals with respect to a predetermined confidence threshold;
using the ranking of the intervals to select which features to extract from the set of data points for FCA lattice construction; and
extracting the selected features from the set of data points.
10. The method according to claim 9, wherein the selected features are used to interpret neural data.
11. The method according to claim 10, wherein the selected features are applied to functional magnetic resonance imaging (fMRI) responses to classify a person's thought process.
12. The method according to claim 9, wherein the one or more processors further perform an operation of generating a binary array comprising ones and zeros, the binary array having ones on the intervals on which the class distribution curve is maximum and zeros on the other intervals.
13. The method according to claim 12, wherein a binary class curve is generated for each known object class, the binary class curve indicating for which intervals the performance of the known object class exceeds that of all other known object classes.
14. The method according to claim 9, wherein the predetermined confidence threshold is used to eliminate intervals having low confidence values.
15. A computer program product for feature selection for formal concept analysis (FCA), the computer program product comprising computer-readable instructions stored on a non-transitory computer-readable medium, the computer-readable instructions being executable by a computer having one or more processors to cause the processors to perform operations of:
separating a set of data points having features into a set of known object classes;
for each known object class, convolving the data points with a Gaussian function, resulting in a class distribution curve for each known object class;
for each class distribution curve, identifying intervals of data values on which the class distribution curve is maximum with respect to all other class distribution curves;
ranking the intervals with respect to a predetermined confidence threshold;
using the ranking of the intervals to select which features to extract from the set of data points for FCA lattice construction; and
extracting the selected features from the set of data points.
16. The computer program product according to claim 15, wherein the selected features are used to interpret neural data.
17. The computer program product according to claim 16, wherein the selected features are applied to functional magnetic resonance imaging (fMRI) responses to classify a person's thought process.
18. The computer program product according to claim 15, further comprising instructions for causing the one or more processors to perform an operation of generating a binary array comprising ones and zeros, the binary array having ones on the intervals on which the class distribution curve is maximum and zeros on the other intervals.
19. The computer program product according to claim 18, wherein a binary class curve is generated for each known object class, the binary class curve indicating for which regions the performance of the known object class exceeds that of all other known object classes.
20. The computer program product according to claim 15, wherein the predetermined confidence threshold is used to eliminate intervals having low confidence values.
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201562195876P | 2015-07-23 | 2015-07-23 | |
| US14/807,083 | 2015-07-23 | ||
| US14/807,083 US10360506B2 (en) | 2014-07-23 | 2015-07-23 | General formal concept analysis (FCA) framework for classification |
| US62/195,876 | 2015-07-23 | ||
| PCT/US2016/031644 WO2017014826A1 (en) | 2015-07-23 | 2016-05-10 | A parzen window feature selection algorithm for formal concept analysis (fca) |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN107710239A (en) | 2018-02-16 |
Family
ID=57834502
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201680033746.XA (Pending, CN107710239A) | Parzen window feature selection algorithm for formal concept analysis (FCA) | 2015-07-23 | 2016-05-10 |
Country Status (3)
| Country | Link |
|---|---|
| EP (1) | EP3326118A4 (en) |
| CN (1) | CN107710239A (en) |
| WO (1) | WO2017014826A1 (en) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111353498A (en) * | 2018-12-21 | 2020-06-30 | Samsung Electronics Co., Ltd. | System and method for providing dominant scene classification through semantic segmentation |
| CN114245912A (en) * | 2019-09-24 | 2022-03-25 | HRL Laboratories, LLC | System and method for perceptual error assessment and correction by solving optimization problems under constraints based on probabilistic signal temporal logic |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10866992B2 (en) | 2016-05-14 | 2020-12-15 | Gratiana Denisa Pol | System and methods for identifying, aggregating, and visualizing tested variables and causal relationships from scientific research |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA2279359C (en) * | 1999-07-30 | 2012-10-23 | Basantkumar John Oommen | A method of generating attribute cardinality maps |
| WO2010022505A1 (en) * | 2008-08-29 | 2010-03-04 | Peter Sweeney | Systems and methods for semantic concept definition and semantic concept relationship synthesis utilizing existing domain definitions |
| WO2010147010A1 (en) * | 2009-06-17 | 2010-12-23 | 日本電気株式会社 | Module classification analysis system, module classification analysis method, and module classification analysis program |
| US9495454B2 (en) * | 2012-03-08 | 2016-11-15 | Chih-Pin TANG | User apparatus, system and method for dynamically reclassifying and retrieving target information object |
-
2016
- 2016-05-10 EP EP16828171.5A patent/EP3326118A4/en not_active Withdrawn
- 2016-05-10 WO PCT/US2016/031644 patent/WO2017014826A1/en unknown
- 2016-05-10 CN CN201680033746.XA patent/CN107710239A/en active Pending
Also Published As
| Publication number | Publication date |
|---|---|
| WO2017014826A1 (en) | 2017-01-26 |
| EP3326118A4 (en) | 2019-03-27 |
| EP3326118A1 (en) | 2018-05-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11514369B2 (en) | Systems and methods for machine learning model interpretation | |
| Hussain et al. | Classification, clustering and association rule mining in educational datasets using data mining tools: A case study | |
| US20180247156A1 (en) | Machine learning systems and methods for document matching | |
| Bellinger et al. | Manifold-based synthetic oversampling with manifold conformance estimation | |
| US11170249B2 (en) | Identification of fields in documents with neural networks using global document context | |
| US20190303535A1 (en) | Interpretable bio-medical link prediction using deep neural representation | |
| US11947506B2 (en) | Method and system for mapping a dataset from a Hilbert space of a given dimension to a Hilbert space of a different dimension | |
| CN106575380B (en) | System and method for data classification using formal concept analysis | |
| US20140279727A1 (en) | Sparse Factor Analysis for Analysis of User Content Preferences | |
| US10360506B2 (en) | General formal concept analysis (FCA) framework for classification | |
| EP3832491A1 (en) | Methods for processing a plurality of candidate annotations of a given instance of an image, and for learning parameters of a computational model | |
| Man et al. | Hydraulic flow unit classification and prediction using machine learning techniques: A case study from the Nam Con Son basin, offshore Vietnam | |
| Asiri et al. | Enhancing brain tumor diagnosis: transitioning from convolutional neural network to involutional neural network | |
| CN107710239A (en) | Parzen window feature selection algorithm for formal concept analysis (FCA) | |
| Sardar et al. | Big data computing: advances in technologies, methodologies, and applications | |
| Anderson | Visual Data Mining: The VisMiner Approach | |
| US10360993B2 (en) | Extract information from molecular pathway diagram | |
| Johnson et al. | On experimenting large dataset for visualization using distributed learning and tree plotting techniques | |
| US20240028828A1 (en) | Machine learning model architecture and user interface to indicate impact of text ngrams | |
| Schatzmann et al. | Using self-organizing maps to visualize clusters and trends in multidimensional datasets | |
| Bruch et al. | Evaluation of semi-supervised learning using sparse labeling to segment cell nuclei | |
| Habrat et al. | Identification of AI-generated rock thin-section images by feature analysis under data scarcity | |
| Babu et al. | A statistician teaches deep learning | |
| Saiful et al. | MRI-Based Brain Tumor Classification Using Various Deep Learning Convolutional Networks and CNN | |
| CN116415632A (en) | Method and system for local interpretability of neural network prediction domains |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180216 |