by
Leo Breiman and Adele Cutler
RF is an example of a tool that is useful in doing analyses of scientific data.
But the cleverest algorithms are no substitute for human intelligence and knowledge of the data in the problem.
Take the output of random forests not as absolute truth, but as smart computer generated guesses that may be helpful in leading to a deeper understanding of the problem.
Overview
We assume that the user knows about the construction of single classification trees. Random Forests grows many classification trees. To classify a new object from an input vector, put the input vector down each of the trees in the forest. Each tree gives a classification, and we say the tree "votes" for that class. The forest chooses the classification having the most votes (over all the trees in the forest).
Each tree is grown as follows:
- If the number of cases in the training set is N, sample N cases at random, but with replacement, from the original data. This sample will be the training set for growing the tree.
- If there are M input variables, a number m<<M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant during the forest growing.
- Each tree is grown to the largest extent possible. There is no pruning.
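The sampling and voting steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the sizes N, M and m, and the example votes are all hypothetical, and the tree-growing step itself (finding the best split) is left out.

```python
import random
from collections import Counter


def bootstrap_sample(n_cases, rng):
    """Draw N case indices at random, with replacement (one tree's training set)."""
    return [rng.randrange(n_cases) for _ in range(n_cases)]


def candidate_variables(n_variables, m, rng):
    """Pick m distinct variables out of the M inputs to consider at one node."""
    return rng.sample(range(n_variables), m)


def forest_vote(tree_votes):
    """Return the class with the most votes over all trees.

    Ties go to the class that was seen first (an assumption of this sketch).
    """
    return Counter(tree_votes).most_common(1)[0][0]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
N, M, m = 10, 9, 3       # hypothetical sizes chosen for illustration

sample = bootstrap_sample(N, rng)            # indices may repeat
node_vars = candidate_variables(M, m, rng)   # variables tried at one node
winner = forest_vote(["a", "b", "a", "c", "a"])  # hypothetical tree votes
```

Because the sample is drawn with replacement, some cases appear more than once and others not at all, so each tree sees a different training set.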
In the original paper on random forests, it was shown that the forest error rate depends on two things:
- The correlation between any two trees in the forest. Increasing the correlation increases the forest error rate.
- The strength of each individual tree in the forest. A tree with a low error rate is a strong classifier. Increasing the strength of the individual trees decreases the forest error rate.
Reducing m reduces both the correlation and the strength. Increasing it increases both. Somewhere in between is an "optimal" range of m, usually quite wide. Using the out-of-bag (oob) error rate, a value of m in the range can quickly be found. This is the only adjustable parameter to which random forests is somewhat sensitive.
Features of Random Forests
- It is unexcelled in accuracy among current algorithms.
- It runs efficiently on large databases.
- It can handle thousands of input variables without variable deletion.
- It gives estimates of what variables are important in the classification.
- It generates an internal unbiased estimate of the generalization error as the forest building progresses.
- It has an effective method for estimating missing data and maintains accuracy when a large proportion of the data are missing.
- It has methods for balancing error in data sets with unbalanced class populations.
- Generated forests can be saved for future use on other data.
- Prototypes are computed that give information about the relation between the variables and the classification.
- It computes proximities between pairs of cases that can be used in clustering, locating outliers, or (by scaling) give interesting views of the data.
- The capabilities of the above can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier detection.
- It offers an experimental method for detecting variable interactions.
from: http://www.promotionworld.com/news/editors/080317MobileVisualSearch.html
What is the future of mobile internet usage?
The world of mobile search is evolving extremely fast. It began with keyword-based search and moved on to voice search; now users can send a photo from their cell phone to find information on the Internet relevant to that photo. Mobile search is a developing field that lets users find mobile content interactively on mobile websites. Over the years, mobile content has shifted toward mobile multimedia. Mobile search is not simply PC web search moved to mobile equipment, however; it is tied to specialized segments of mobile broadband and mobile content, both of which have been evolving quickly.
The major search engines are aggressively creating applications and relationships in order to take advantage of the mobile ad market. According to the market research firm eMarketer, strong competition for the US mobile search market can be anticipated, given the large US online ad market and strong pushes by portals. By 2011, mobile search is expected to account for around $715 million.
The mobile directory search industry is almost as old as the telecom industry. Its services let people enter a word or phrase on their phone to find local services based on their current location. For example, a person might look for a local hotel after a tiring journey, or for a taxi company after a night out. The services can also provide a map and directions.
What was the next step? GOOG-411, a voice-activated mobile search service. The free service lets callers access Google’s local information by voice. There is no doubt that mobile voice search is simpler and more convenient for callers than typing on the phone’s buttons. “I’d have to be a visionary to be vindicated, and I’m making no such claim. It’s just hard to ignore that most people prefer talking in their phones to typing on them, and a mobile search engine that made voice search possible might have an easier time finding an audience”, said Bryson Meunier, Product Champion, Natural Search, in a posting at www.findresolution.com. For the same reasons, Meunier believes that mobile visual search could be bigger than voice search.
How do searchers initiate a visual query? Simply by snapping a photo of something with their phone. The mobile search engine processes the photo with algorithms and returns relevant digital content based on its interpretation of the user’s visual query.
Visual search is now gaining popularity. At the Cebit trade show in Germany, Vodafone demonstrated Otello, a search engine that uses images as input. Users send pictures via MMS (Multimedia Messaging Service) from their mobile phones. Otello then returns information relevant to the picture to the mobile phone, just like a normal search engine. Other companies, such as SnapNow and Mobot, have been offering this kind of service for a few years. Google has its own mobile visual search technology in the form of Neven Vision. Of course, the audience for mobile visual search is currently not very large, but it might be just a matter of time, predicted Meunier.
Google makes extensive use of OpenCV internally for their street view, map stitching and other image processing needs. They have recently contributed some of their code back to OpenCV in the form of Daniel Filip's C++ image wrapper class cvwimage. Look for cvwimage.h in the .../cxcore/include directory. This code came too late to be documented in the book, but we will document it on the website after we ourselves get more familiar with it. Google is interested in speed and minimizing bugs, so this class wraps IplImage and concentrates on these things:
1. Images are explicitly owned to avoid memory leak problems.
2. This class provides fast access to subregions of an image, especially lines. The member functions are limited to operations that are very fast. To call most OpenCV functionality, you expose the pointer to the image and call the OpenCV functions as usual. That means there is little learning curve with this class.
3. You can derive window pointers to sub-regions of the original image.
Typically at Google, they allocate a huge image space and then put all the images that they are processing into sub-regions of this huge window (see the Region of Interest "ROI" discussion starting on the bottom of page 43 in the book). This huge window becomes their processing buffer, and sub-regions are allocated and passed to OpenCV or Google functions for processing. Memory management isn't much of a problem because only one routine "owns" the huge window, so it's easy to manage when this is allocated or deallocated.
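The idea of a single owned buffer with windows onto its sub-regions can be illustrated with a small Python analogy. This is a sketch under stated assumptions and not the cvwimage API: the class ImageBuffer, its window method and the fill routine are hypothetical names, and the image is assumed to be 8-bit, single-channel and stored row by row.

```python
class ImageBuffer:
    """One object owns a large 8-bit, single-channel pixel buffer."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.data = bytearray(width * height)  # the only allocation

    def window(self, x, y, w, h):
        """Return zero-copy row views of the w x h sub-region at (x, y)."""
        if x < 0 or y < 0 or w < 0 or h < 0:
            raise ValueError("window lies outside the buffer")
        if x + w > self.width or y + h > self.height:
            raise ValueError("window lies outside the buffer")
        view = memoryview(self.data)
        rows = []
        for r in range(h):
            start = (y + r) * self.width + x
            rows.append(view[start:start + w])  # a slice of a memoryview copies nothing
        return rows


def fill(rows, value):
    """Stand-in for a processing routine: set every pixel in the window."""
    for row in rows:
        row[:] = bytes([value]) * len(row)


big = ImageBuffer(8, 4)        # the "huge window"
roi = big.window(2, 1, 3, 2)   # a 3 x 2 sub-region starting at column 2, row 1
fill(roi, 255)                 # writes land directly in big.data
```

The routine that receives the window never allocates or frees pixel memory; only the object holding the big buffer does, which mirrors the ownership argument made above.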
from http://fyi.oreilly.com/2008/10/gary-bradskis-top-ten-tips-for.html


