FUN in S&T: April 2011

Thursday, April 28, 2011

Reinforcement Learning / Successes of Reinforcement Learning

Wednesday, April 27, 2011

Measuring Measures - Learning about Machine Learning, 2nd Ed.

Classifier Showdown « Synaptic

Takes a closer look at three different types of classifiers: naive Bayesian classifiers, support vector machines, and modular multilayer perceptron neural networks.

Saturday, April 23, 2011

Ninth Workshop on Mining and Learning with Graphs (MLG 2011)

Friday, April 22, 2011

Stanford School of Engineering - Stanford Engineering Everywhere

Tuesday, April 19, 2011

math - Mathematics for AI/Machine learning ? - Stack Overflow

Fuzzy Logic,

Bayes Theorem

Logic - An Investigation of the Laws of Thought (Boole) and Set Theory and Logic (Stoll)
Computation - Introduction to the Theory of Computation (Sipser)
Probability -

Math for Programmers.

algorithm - Help Understanding Cross Validation and Decision Trees - Stack Overflow

The purpose of cross validation is not to help select a particular instance of the classifier (or decision tree, or whatever automatic learning application) but rather to qualify the model,
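To make the point about cross validation concrete, here is a minimal sketch in plain Python. It scores a learning procedure by its average held-out accuracy over k folds; no single fitted instance is kept. The helper names (`k_fold_indices`, `train_majority`, `cross_validate`) and the toy majority-class learner are assumptions for illustration, not taken from the linked answer.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_majority(labels):
    """Toy learner (hypothetical): always predict the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k):
    """Return the mean held-out accuracy of the learner over k folds."""
    scores = []
    for fold in k_fold_indices(len(labels), k):
        held_out = set(fold)
        # Train only on the examples outside the current fold.
        train = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_majority(train)
        # Score the freshly trained model on the held-out fold.
        test = [labels[i] for i in fold]
        scores.append(sum(y == model for y in test) / len(test))
    return sum(scores) / len(scores)


example_labels = ["a"] * 8 + ["b"] * 2
estimate = cross_validate(example_labels, 5)
```

Each fold trains a fresh model that is discarded after scoring, so the result describes the procedure as a whole. In practice the data would normally be shuffled before splitting.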

Monday, April 18, 2011

Text-learning Group - Resources

Notes on Path Finding problem

Some good sources on Path-finding:

# A* path finding: ( googling for astar c.)

By Patrick Lester (Updated July 18, 2005)

http://www.policyalmanac.org/games/aStarTutorial.htm

http://www.heyes-jones.com/astar.html

http://stackoverflow.com/questions/3412157/a-implemented-in-c

http://en.wikipedia.org/wiki/A*

# Path matrix for required length

Create the adjacency matrix where matrix[u][v] = 1 denotes an edge between u and v, and matrix[u][v] = 0 denotes no edge between u and v. Then (matrix)^3 (plain matrix exponentiation) is the path matrix for length exactly 3: its [u][v] entry counts the walks of length 3 from u to v.

The theorem that addresses the random walk problem is this:

Let A = [aij] be the adjacency matrix of a graph G having points n1, …, nn. Let k be any positive integer. Then the number of distinct ni-nj walks of length k in G is equal to the i, j element of A^k.
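A small check of the theorem, as a sketch in plain Python. The four-vertex graph (a triangle 0-1-2 with an extra vertex 3 attached to 2) and the helper names `mat_mul` and `mat_pow` are made up for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(a, k):
    """Return a**k for a positive integer k by repeated multiplication."""
    result = a
    for _ in range(k - 1):
        result = mat_mul(result, a)
    return result


# Hypothetical example graph: triangle 0-1-2 plus vertex 3 attached to 2.
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

# walks3[i][j] is the number of walks of length exactly 3 from i to j.
walks3 = mat_pow(adjacency, 3)
```

Here `walks3[0][3]` counts the single walk 0-1-2-3, and `walks3[3][3]` is 0 because vertex 3 lies on no triangle. Note that these are walks, so vertices and edges may repeat.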

Check

A) book: Classic Data Structures, by D. Samanta at page 367

http://books.google.com/books?id=_lDjLlasQycC&lpg=PA368&ots=LRtfxKppaq&dq=path%20matrix%20of%20exactly%20length&pg=PA367#v=onepage&q=path%20matrix%20of%20exactly%20length&f=false

B) http://oneweb.utc.edu/~Christopher-Mawata/petersen/lesson7.htm

# Some other related topics:

Python code: Dijkstra to find the two points furthest away from each other

# Find the longest path:

http://stackoverflow.com/questions/477591/algorithm-to-find-two-points-furthest-away-from-each-other

From the record:

An efficient implementation of Dijkstra's algorithm takes O(Elog V) time for a graph with E edges and V vertices.
Hosam Aly's "flood fill" is a breadth first search, which is O(V). This can be thought of as a special case of Dijkstra's algorithm in which no vertex can have its distance estimate revised.
The Floyd-Warshall algorithm takes O(V^3) time, is very easy to code, and is still the fastest for dense graphs (those graphs where vertices are typically connected to many other vertices). But it's not the right choice for the OP's task, which involves very sparse graphs.
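For a sparse, unweighted graph, one simple approach consistent with the notes above is to run a breadth-first search from every vertex and keep the largest distance found, which costs O(V * (V + E)) overall. The following is only a sketch; the adjacency-list format, the function names and the example graph are assumptions.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return shortest hop counts from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:  # first visit is the shortest in an unweighted graph
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def furthest_pair(graph):
    """Return (distance, start, end) for a pair of vertices furthest apart."""
    best = (0, None, None)
    for s in graph:
        dist = bfs_distances(graph, s)
        t = max(dist, key=dist.get)
        if dist[t] > best[0]:
            best = (dist[t], s, t)
    return best


# Hypothetical example: path a-b-c-d with an extra branch c-e.
example_graph = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}
longest = furthest_pair(example_graph)
```

Distances are only measured within each connected component, so pairs in different components are ignored.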

Raimund Seidel gives a simple method using matrix multiplication to compute the all-pairs distance matrix on an unweighted, undirected graph (which is exactly what you want) in the first section of his paper On the All-Pairs-Shortest-Path Problem in Unweighted Undirected Graphs [pdf].

# This one is not exactly the path finding problem, but it is also very interesting: it is more like generating the possible moves

http://stackoverflow.com/questions/2111934/algorithm-to-find-most-efficient-moves-to-arrive-at-a-given-point

(This is not exactly the problem that I have, but it's isomorphic, and I think that this explanation will be easiest for others to understand.)

Suppose that I have a set of points in an n-dimensional space. Using 3 dimensions for example:

  A : [1,2,3] B : [4,5,6] C : [7,8,9]

I also have a set of vectors that describe possible movements in this space:

  V1 : [+1,0,-1] V2 : [+2,0,0]

Now, given a point dest, I need to find a starting point p and a set of vectors moves that will bring me to dest in the most efficient manner. Efficiency is defined as "fewest number of moves", not necessarily "least linear distance": it's permissible to select a p that's further from dest than other candidates if the move set is such that you can get there in fewer moves. The vectors in moves must be a strict subset of the available vectors; you can't use the same vector more than once unless it appears more than once in the input set.

My input contains ~100 starting points and maybe ~10 vectors, and my number of dimensions is ~20. The starting points and available vectors will be fixed for the lifetime of the app, but I'll be finding paths for many, many different dest points. I want to optimize for speed, not memory. It's acceptable for the algorithm to fail (to find no possible paths to dest).

Update w/ Accepted Solution

I adopted a solution very similar to the one marked below as "accepted". I iterate over all points and vectors and build a list of all reachable points with the routes to reach them. I convert this list into a hash of <dest, p+vectors>, selecting the shortest set of vectors for each destination point. (There is also a little bit of optimization for hash size, which isn't relevant here.) Subsequent dest lookups happen in constant time.
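A rough sketch of that precomputation in Python, using the example points and vectors from the question. The function name `build_lookup` and the table layout are assumptions, and this version enumerates every subset of the vectors (including the full set), which may differ from the poster's actual code.

```python
from itertools import combinations


def build_lookup(points, vectors):
    """Map each reachable destination to (start point, moves) using the fewest moves."""
    dims = len(points[0])
    table = {}
    # Visit subsets in order of increasing size, so the first route
    # recorded for a destination is one with the fewest moves.
    for size in range(len(vectors) + 1):
        for moves in combinations(vectors, size):
            offset = [sum(m[d] for m in moves) for d in range(dims)]
            for p in points:
                dest = tuple(a + b for a, b in zip(p, offset))
                table.setdefault(dest, (p, moves))
    return table


points = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]   # A, B, C from the question
vectors = [(1, 0, -1), (2, 0, 0)]            # V1, V2 from the question
table = build_lookup(points, vectors)

# Lookups are then plain dictionary accesses; a missing key means "no path".
route = table.get((4, 2, 2))
```

With about 10 vectors and 100 starting points the table has at most 2^10 * 100 entries, which fits the stated preference for speed over memory.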

Sunday, April 17, 2011

Combining Learning Strategies to Reduce Label Cost: ICML 2011 Workshop

Regularization for high dimensional learning Course

www.disi.unige.it/dottorato/corsi/RegMet2011/

Exploration & Exploitation Challenge | Machine Learning for Website Optimisation

Bayesian Modelling Applications Workshop

Welcome to Social Web Mining Workshop, co-located with IJCAI 2011

KDD 2011: 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

SWSM 2011

ICML 2011 Structured Sparsity: Learning and Inference Workshop

Thursday, April 7, 2011

Welcome to Social Web Mining Workshop, co-located with IJCAI 2011

Social Web Mining Workshop, co-located with IJCAI 2011

International Workshop on Social Web Mining

Co-located with IJCAI, 18 July 2011, Barcelona, Spain

Louhi 2011

The third Louhi one-day workshop is a multidisciplinary international workshop, bringing together researchers studying syntactic, semantic and pragmatic aspects of healthcare documents. Healthcare documents encompass, but are not limited to, electronic patient records, clinical documentation, discharge letters, care guidelines, scientific texts and other textual data related to biomedicine and healthcare. The Third Louhi Workshop follows Louhi’08, the First Conference on Text and Data Mining of Health Documents in Turku, Finland, 2008, and Louhi '10 in Los Angeles, collocated with the 11th NAACL conference. The workshop aims to gather representatives from clinical practice, research and IT industry.

Wednesday, April 6, 2011

2011 IEEE GRSS Data Fusion Contest

There are only 50 days left for downloading 5 high-resolution WorldView-2 multi-spectral multi-angular acquisitions and participating in the Contest. The deadline for the paper submission is May 31, 2011. Final results will be announced in Vancouver (Canada) at the 2011 IEEE International Geoscience and Remote Sensing Symposium.
Check the IGARSS 2011 abstract at http://slidesha.re/gLagLW
About the IEEE GRSS Data Fusion Contest:
The Data Fusion Contest is organized by the Data Fusion Technical Committee of the Geoscience and Remote Sensing Society of the Institute of Electrical and Electronics Engineers and has been held annually since 2006. It is a contest open not only to IEEE members, but to everyone.
This year the Data Fusion Contest aims at exploiting multi-angular acquisitions over the same target area.
Five WorldView-2 multi-sequence images have been provided by DigitalGlobe. This unique data set is composed of five Ortho Ready Standard Level-2 WorldView-2 multi-angular acquisitions, including both 16 bit panchromatic and multi-spectral 8-band images. The imagery was collected over Rio de Janeiro (Brazil) in January 2010 within a three minute time frame. The multi-angular sequence contains the downtown area of Rio, including a number of large buildings, commercial and industrial structures, the airport and a mixture of community parks and private housing.
Since there is a large variety of possible applications, each participant can decide the research topic to work on. Each participant is required to submit a full paper in English of no more than 4 pages including illustrations and references by May 31, 2011.
Final results will be announced in Vancouver (Canada) at the 2011 IEEE International Geoscience and Remote Sensing Symposium.
2011 DigitalGlobe - IEEE GRSS Data Fusion Contest

Monday, April 4, 2011

Greg Mankiw's Blog: Advice for Grad Students

Don Davis gives some guidance about finding research topics.
John Cochrane tells grad students how to write a paper.
Michael Kremer provides a checklist to make sure your paper is as good as it can be.
David Romer gives you the rules to follow to finish your PhD.
David Laibson offers some advice about how to navigate the job market for new PhD economists.
John Cawley covers the same ground as Laibson but in more detail.
Kwan Choi offers advice about how to publish in top journals.
Dan Hamermesh offers advice on, well, just about everything.
Assar Lindbeck tells you how, after getting that first academic post, to win the Nobel prize.