2 posts matching 'Hyoungjin Kim'

  1. 2009.03.03 Random forest (3)
  2. 2006.08.14 CBIR (1)
2009.03.03 11:35
from http://durl.kr/bnz

Leo Breiman and Adele Cutler

RF is an example of a tool that is useful in doing analyses of scientific data.
But the cleverest algorithms are no substitute for human intelligence and knowledge of the data in the problem.
Take the output of random forests not as absolute truth, but as smart computer generated guesses that may be helpful in leading to a deeper understanding of the problem.


We assume that the user knows about the construction of single classification trees. Random Forests grows many classification trees. To classify a new object from an input vector, put the input vector down each of the trees in the forest. Each tree gives a classification, and we say the tree "votes" for that class. The forest chooses the classification having the most votes (over all the trees in the forest).
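
The voting step above can be sketched in a few lines. This is only an illustration: the "trees" here are hypothetical stand-in functions rather than grown classification trees, and `forest_vote` is a name made up for this example. Ties are broken in favor of the class that was voted for first, which is an assumption of the sketch and not something the text specifies.

```python
from collections import Counter

def forest_vote(trees, x):
    """Put x down every tree and return the class with the most votes."""
    votes = Counter(tree(x) for tree in trees)
    # most_common(1) gives [(class, count)] for the top class
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for trained trees: each maps an input vector to a class.
toy_trees = [
    lambda x: "a" if x[0] > 0 else "b",
    lambda x: "a" if x[1] > 0 else "b",
    lambda x: "b",
]

prediction = forest_vote(toy_trees, (1.0, 2.0))  # votes: a, a, b
```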

Each tree is grown as follows:

  1. If the number of cases in the training set is N, sample N cases at random, with replacement, from the original data. This sample will be the training set for growing the tree.
  2. If there are M input variables, a number m<<M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant during the forest growing.
  3. Each tree is grown to the largest extent possible. There is no pruning.
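
The two sources of randomness in steps 1 and 2 can be sketched as follows. The function names and the toy sizes (N = 10, M = 9, m = 3) are assumptions made for illustration; in practice m would be much smaller than M, and the split search and tree growing themselves are not shown.

```python
import random

def bootstrap_indices(n_cases, rng):
    """Step 1: draw N case indices at random, with replacement."""
    return [rng.randrange(n_cases) for _ in range(n_cases)]

def candidate_variables(n_variables, m, rng):
    """Step 2: choose m distinct variables out of M to try at one node."""
    return rng.sample(range(n_variables), m)

rng = random.Random(0)   # fixed seed only so the sketch is repeatable
N, M, m = 10, 9, 3       # toy sizes, chosen for illustration

in_bag = bootstrap_indices(N, rng)                 # training cases for one tree
out_of_bag = sorted(set(range(N)) - set(in_bag))   # cases this tree never saw
node_vars = candidate_variables(M, m, rng)         # redrawn at every node
```

Because sampling is with replacement, some cases appear more than once in `in_bag` and others are left out entirely; the left-out cases are what the out-of-bag (oob) error estimate relies on.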

In the original paper on random forests, it was shown that the forest error rate depends on two things:

  • The correlation between any two trees in the forest. Increasing the correlation increases the forest error rate.
  • The strength of each individual tree in the forest. A tree with a low error rate is a strong classifier. Increasing the strength of the individual trees decreases the forest error rate.

Reducing m reduces both the correlation and the strength. Increasing it increases both. Somewhere in between is an "optimal" range of m - usually quite wide. Using the oob error rate (see below) a value of m in the range can quickly be found. This is the only adjustable parameter to which random forests is somewhat sensitive.
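
As a rough sketch of how an oob error rate can be computed: each case is classified only by the trees whose bootstrap sample did not contain it, and the oob error is the fraction of cases that this restricted vote gets wrong. The trees, in-bag sets and data below are hypothetical toy values, and `oob_error` is a name invented for this example. To choose m, one would repeat this for a few candidate values and keep one with a low oob error.

```python
from collections import Counter

def oob_error(trees, in_bag_sets, X, y):
    """Fraction of cases misclassified by the vote of trees that did not train on them."""
    wrong = 0
    counted = 0
    for i, (x, label) in enumerate(zip(X, y)):
        votes = Counter(
            tree(x) for tree, bag in zip(trees, in_bag_sets) if i not in bag
        )
        if not votes:
            continue  # case was in-bag for every tree, so it has no oob vote
        counted += 1
        if votes.most_common(1)[0][0] != label:
            wrong += 1
    return wrong / counted if counted else float("nan")

# Toy data: four one-variable cases with two classes.
X = [(0,), (1,), (2,), (3,)]
y = [0, 0, 1, 1]

# Hypothetical stand-ins for trees, with the case indices each one trained on.
trees = [
    lambda x: int(x[0] >= 2),
    lambda x: int(x[0] >= 1),
    lambda x: 1,
]
in_bag_sets = [{0, 1}, {2, 3}, {0, 2}]

error = oob_error(trees, in_bag_sets, X, y)
```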

Features of Random Forests

  • It is unexcelled in accuracy among current algorithms.
  • It runs efficiently on large data bases.
  • It can handle thousands of input variables without variable deletion.
  • It gives estimates of what variables are important in the classification.
  • It generates an internal unbiased estimate of the generalization error as the forest building progresses.
  • It has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
  • It has methods for balancing error in data sets with unbalanced class populations.
  • Generated forests can be saved for future use on other data.
  • Prototypes are computed that give information about the relation between the variables and the classification.
  • It computes proximities between pairs of cases that can be used in clustering, locating outliers, or (by scaling) give interesting views of the data.
  • The capabilities of the above can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier detection.
  • It offers an experimental method for detecting variable interactions.
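
The proximities mentioned in the list can be illustrated with a small sketch. It assumes the usual definition for random forests: the proximity of two cases is the fraction of trees in which they end up in the same terminal node. The `leaf_ids` table is made-up toy data and `proximity_matrix` is a hypothetical helper name.

```python
def proximity_matrix(leaf_ids):
    """leaf_ids[t][i] is the terminal node that case i reaches in tree t."""
    n_trees = len(leaf_ids)
    n_cases = len(leaf_ids[0])
    counts = [[0] * n_cases for _ in range(n_cases)]
    for leaves in leaf_ids:
        for i in range(n_cases):
            for j in range(n_cases):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1  # cases i and j share a terminal node in this tree
    # Normalize by the number of trees so every entry lies in [0, 1].
    return [[c / n_trees for c in row] for row in counts]

# Toy example: 4 trees, 3 cases.
leaf_ids = [
    [0, 0, 1],
    [0, 0, 1],
    [2, 2, 2],
    [0, 1, 0],
]
prox = proximity_matrix(leaf_ids)
```

Cases with high mutual proximity can then be grouped for clustering, while a case with low proximity to all others is a candidate outlier.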

Posted by myditto
2006.08.14 14:08

External links

  • CIRES developed by the University of Texas at Austin.
  • Tiltomo: Image Visual Search Engine - CBIR (content based image retrieval) system that uses advanced proprietary Subject, Color & Texture recognition algorithms to analyze image composition.
  • our.imgSeek - Site for social photo bookmarking: search images by similarity, sketch, tag, rate and get recommendations.
  • IKONA - Online demonstration - Generic CBIR system - INRIA - IMEDIA
  • SIMPLIcity and ALIP online Demos developed by Stanford and Penn State Universities
  • GIFT - The GNU Image Finding Tool - an open source query by example CBIRS
    • Viper Demo - an online demonstration of the GIFT
    • Perl MRML Client - another GIFT demo, using a different client, and combining textual annotation with visual features
  • SIMBA - demo of the Search Images By Appearance system by the Albert-Ludwigs-Universität Freiburg (Germany) - Inst. for Pattern Recognition and Image Processing
  • FIRE online demo, FIRE homepage - FIRE (Flexible Image Retrieval Engine) is another open source query by example CBIRS
  • LCPD: Leiden 19th-Century Portrait Database - an online database of 19th century studio portraits searchable via CBIR and commonly referenced in the literature
  • imgSeek - open source photo collection manager and viewer with content-based search and many other features
  • Video Google demo - search movies for specific objects
  • Cortina - Content Based Image Retrieval for 3 Million Images. From UCSB.
  • eVision - Go Beyond Keywords! Perform a Visual Image Search.
  • Octagon - Free Java based Content-Based Image Retrieval software.
  • Retrievr - search and explore in a selection of Flickr images by drawing a rough sketch or uploading an image.
  • LTU technologies - LTU tech has deployed CBIR and automatic image classification applications in the media market, the IP protection market and the law enforcement / computer forensics market. Online demo on Corbis images.
  • PicSOM CBIR tool, developed in the Laboratory of Computer and Information Science, Helsinki University of Technology.
  • LIRE - Lucene Image Retrieval Java CBIR library, which uses the Lucene search engine
  • MUVIS - MUVIS Image and Video Retrieval CBIR System at TUT- Tampere University of Technology.
  • xcavator - an interactive image search demo integrated with Flickr. Powered by technology developed by CogniSign.
  • IN2 intelligent indexing - provides multimedia content management solutions including content-based image and video retrieval.


Posted by myditto