New academic developments that have immediate applications in industry?
What are some new academic developments in ML + NLP that have immediate applications in industry?
I prefer developments that are one step away from application (e.g. question answering), instead of approaches (e.g. deep learning for your application).
I am giving a talk on this to thought leaders in industry at the O'Reilly Strata Conference. If I hear interesting suggestions, I'll try to include them in my talk.
If you were forming a startup, either a new company or inside an existing company, and you had a year of R&D time to transition something from academia to industry, what would it be?
I'll get the ball rolling with the topics I propose below. These are not meant to be comprehensive in any way; I just zoomed in on some work that I think is particularly cool.
Please make one separate answer per idea. Please indicate if it is an application or an approach. Please upvote the ideas that you consider most promising and closest to immediate application.
Application: Knowledge extraction
Wick, McCallum, Miklau. Scalable probabilistic databases with factor graphs and MCMC and other recent papers by Andrew McCallum's group.
The idea is that you can use factor graphs for inference in information extraction, relation extraction, coreference, and many other interesting problems, while leveraging a traditional database and answering any SQL query by sampling. You keep both the observed data and the inferred values in the database, and continually resample the inferred values with MCMC. To answer a query, you create a materialized view, watch it change for a while as the database's sampler explores the answer space, and read off the answer. Because query answering is completely decoupled from the inference, you can ask questions that don't fit at all into the graphical-model formalism used under the hood. Some details for more complex applications are still a bit fuzzy (like how to average over inferred data with an identifiability problem), but for simple things this can work really well.
See Andrew McCallum's talk on VideoLectures; it's really interesting, and this should be an easy sell in the enterprise environment because it is database-conscious, Java-friendly, and very abstractable (as in, users don't have to know there's inference machinery in there).
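To make the workflow concrete, here is a toy sketch (not the authors' code) of the pattern: inferred values live in the database next to the observed data, an MCMC sampler keeps resampling them, and an ordinary SQL query is answered by averaging it over samples. The coreference-style table, the token-overlap affinity factor, and the proposal move below are all invented for illustration.

```python
import math
import random
import sqlite3

random.seed(0)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mentions (id INTEGER PRIMARY KEY, text TEXT, entity INTEGER)")

# Observed data: mention strings.  The 'entity' column holds the *inferred* value
# (a coreference cluster id), initialised to "every mention is its own entity".
mentions = ["Barack Obama", "Obama", "B. Obama", "Mitt Romney", "Romney"]
for i, m in enumerate(mentions):
    db.execute("INSERT INTO mentions VALUES (?, ?, ?)", (i, m, i))

def affinity(a, b):
    # Toy pairwise factor: a log-potential that rewards shared tokens.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 2.0 * len(ta & tb) - 1.0

def log_score():
    # Sum of pairwise factors over mentions currently assigned to the same entity.
    rows = db.execute("SELECT text, entity FROM mentions").fetchall()
    return sum(affinity(rows[i][0], rows[j][0])
               for i in range(len(rows)) for j in range(i + 1, len(rows))
               if rows[i][1] == rows[j][1])

def mcmc_step():
    # Metropolis-Hastings: propose moving one mention into another mention's entity.
    i, j = random.sample(range(len(mentions)), 2)
    old = db.execute("SELECT entity FROM mentions WHERE id=?", (i,)).fetchone()[0]
    new = db.execute("SELECT entity FROM mentions WHERE id=?", (j,)).fetchone()[0]
    before = log_score()
    db.execute("UPDATE mentions SET entity=? WHERE id=?", (new, i))
    if math.log(random.random() + 1e-300) > log_score() - before:  # reject: undo the move
        db.execute("UPDATE mentions SET entity=? WHERE id=?", (old, i))

# Answer an ordinary SQL query by watching it while the sampler runs, then averaging.
# Here: "how many distinct entities are mentioned?"
samples = []
for step in range(2000):
    mcmc_step()
    if step > 500:  # discard burn-in
        samples.append(db.execute(
            "SELECT COUNT(DISTINCT entity) FROM mentions").fetchone()[0])
print("expected number of entities:", sum(samples) / len(samples))
```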
Application: Semantic search.
Semantic hashing (Salakhutdinov + Hinton, 2007)
Keyword search and its variants, like the search Google performs, can easily scale to billions of documents, but they often miss relevant results.
What if your search is missing relevant results because simple keyword matching skips documents that don't contain those exact keywords? This issue is especially acute for short text, like tweets. Tweets about the MTV Video Music Awards, for example, rarely contain the term VMA or the hashtag #vma. But wouldn't it be useful to retrieve all the relevant results?
Semantic hashing allows you to search just as fast as keyword matching, but it searches semantically and finds relevant documents that don't necessarily contain the search keywords. It is also completely automatic, requiring no ontologies or other human annotation, and it scales to billions of documents, just like keyword search.
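For concreteness, here is a minimal sketch of the lookup side of the idea. The paper learns the binary codes with a deep autoencoder; below, a fixed random projection of bag-of-words vectors stands in for the trained encoder (so these particular codes are not actually semantic), purely to show how constant-time retrieval by probing a small Hamming ball works once you have short binary codes.

```python
from collections import defaultdict
from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
NUM_BITS = 16
vocab = {}  # word -> dimension index (grown on the fly, capped at 4096)

def bow(text):
    v = np.zeros(4096)
    for w in text.lower().split():
        v[vocab.setdefault(w, len(vocab))] += 1.0
    return v

# Stand-in for the trained deep autoencoder from the paper: a fixed random projection.
projection = rng.normal(size=(4096, NUM_BITS))

def code(text):
    # Map a document to an integer whose bits act as its "semantic address".
    bits = bow(text) @ projection > 0
    return int("".join("1" if b else "0" for b in bits), 2)

index = defaultdict(list)  # code -> documents stored at that address

def add(doc):
    index[code(doc)].append(doc)

def search(query, radius=2):
    # Return documents whose codes are within `radius` bit flips of the query's code.
    q = code(query)
    hits = list(index[q])
    for r in range(1, radius + 1):
        for flips in combinations(range(NUM_BITS), r):
            probe = q
            for f in flips:
                probe ^= 1 << f
            hits.extend(index[probe])
    return hits

for d in ["mtv video music awards tonight", "vma red carpet photos", "stock market rally"]:
    add(d)
print(search("mtv music awards"))
```

With a learned encoder in place of the random projection, nearby addresses really do correspond to semantically similar documents, which is what makes the Hamming-ball probe behave like a semantic search.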
This is indeed a promising development for performing web-scale similarity lookups over documents.
I would also add that this approach should be amenable to indexing non-text multimedia content (images, sounds, audio) if you stack it on top of a good (unsupervised?) feature-extraction layer. For instance, for images you can extract convolutional code-words (using convolutional DBNs or simpler convolutional soft k-means) or more holistic features (eigen-scenes with PCA, GIST scene descriptors, ...).
Application: Recommendation.
How about the latent factor log-linear models of Menon and Elkan?
http://arxiv.org/abs/1006.2156
A very nice representation for recommendation that accounts for side (extra) features very well. Training is very simple and the results are state of the art.
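For a sense of what this looks like at prediction time, here is a hedged sketch: each rating level gets its own user and item factors plus weights on side features, and a softmax over the per-level scores gives the rating distribution. All dimensions and parameters below are made up, and training (gradient ascent on the log-likelihood) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_levels, k, n_side = 100, 50, 5, 8, 3

U = rng.normal(scale=0.1, size=(n_levels, n_users, k))   # per-level user factors
V = rng.normal(scale=0.1, size=(n_levels, n_items, k))   # per-level item factors
W = rng.normal(scale=0.1, size=(n_levels, n_side))       # side-feature weights

def rating_probs(u, i, side_features):
    # P(rating = r | user u, item i, side features x) for each rating level r.
    scores = np.array([U[r, u] @ V[r, i] + W[r] @ side_features
                       for r in range(n_levels)])
    scores -= scores.max()                    # numerical stability
    p = np.exp(scores)
    return p / p.sum()

x = np.array([1.0, 0.0, 0.3])                 # e.g. a user age bucket and item genre flags
p = rating_probs(u=3, i=7, side_features=x)
print("predicted rating distribution:", p)
print("expected rating:", (np.arange(1, n_levels + 1) * p).sum())
```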
Application: Question answering.
Unsupervised Semantic Parsing (Poon + Domingos, 2009+2010)
A lot of work has gone into building natural language search engines and question-answering systems. However, these efforts have been only moderately successful. In particular, previous approaches (like those of Powerset and Wolfram Alpha) have required sophisticated linguistic expertise and extensive ontology and knowledge-base construction. Essentially, there has been a lot of human engineering in the loop, and these techniques still don't work very well.
Unsupervised semantic parsing is a highly ambitious and successful technique that attacks the problem of reading text and understanding its meaning. It requires no human annotation; it just learns by reading text. It has been applied to question answering and is far more successful than competing academic baselines. By combining this automatic technique with current human-engineered tricks, one could significantly improve deployed NL search and question-answering systems.
Source code is available that implements this technique.
Applications: Machine translation + spam detection.
Large scale recurrent character-level language models (unpublished)
Machine translation requires accurate language models to choose correct translations. Detecting spam also requires large-scale, accurate language models. However, most large-scale language models are not very accurate, and most accurate language models are not large scale.
Large-scale recurrent character-level language models are both accurate and large scale, and have immediate applications in machine translation and spam detection.
Source code for this technique has NOT been published.
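Since no reference implementation is available, here is a toy sketch of the basic character-level recurrence (with random weights standing in for trained ones); it is not the unpublished large-scale system, but it shows the kind of per-character log-probability score that an MT decoder or a spam filter would consume.

```python
import numpy as np

rng = np.random.default_rng(0)
chars = [chr(c) for c in range(32, 127)]
char_to_id = {c: i for i, c in enumerate(chars)}
V, H = len(chars), 64

Wxh = rng.normal(scale=0.01, size=(H, V))   # input-to-hidden weights
Whh = rng.normal(scale=0.01, size=(H, H))   # hidden-to-hidden (recurrent) weights
Why = rng.normal(scale=0.01, size=(V, H))   # hidden-to-output weights
bh, by = np.zeros(H), np.zeros(V)

def log_prob(text):
    # Sum of log P(next char | history) under the RNN; higher = more fluent text.
    h = np.zeros(H)
    total = 0.0
    ids = [char_to_id[c] for c in text if c in char_to_id]
    for prev, nxt in zip(ids, ids[1:]):
        x = np.zeros(V)
        x[prev] = 1.0
        h = np.tanh(Wxh @ x + Whh @ h + bh)
        logits = Why @ h + by
        logits -= logits.max()
        total += logits[nxt] - np.log(np.exp(logits).sum())
    return total

# With trained weights, you would rank candidate translations (or score messages
# for spam filtering) by their per-character log-probability:
for s in ["this is a normal sentence", "v1agra fr33 cl1ck n0w!!!"]:
    print(s, "->", log_prob(s) / max(len(s) - 1, 1))
```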
Approach: Parallel training.
[Note: This one is a weaker fit, since the technology isn't mature yet and it is several steps away from being applied in practice.]
Graphlab, a new parallelism abstraction (Low et al, 2010)
There are two ways to achieve significant improvements in predictive analytics and ML tasks like recommendation, sentiment analysis, credit risk assessment, and financial forecasting: you can throw more data at the problem, or you can use more sophisticated learning algorithms.
MapReduce, and its implementation Hadoop, have been highly successful at promoting distributed computing. MapReduce is good for single-iteration, embarrassingly parallel distributed tasks like feature processing, which means a lot more data can be processed. However, MapReduce is too high-level to implement sophisticated learning algorithms.
What kind of gains could you see if you could have the best of both worlds? Large data AND sophisticated learning algorithms? GraphLab might offer those gains.
GraphLab is only slightly lower-level than MapReduce, but significantly more powerful. It is good for iterative algorithms with computational dependencies or complex asynchronous schedules, and has been tested on a variety of sophisticated machine learning algorithms.
Source code is available that implements GraphLab.
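To give a flavor of the abstraction (this is not GraphLab's actual API, which is C++, just a single-threaded Python toy), the sketch below runs PageRank with a vertex update function and a dynamic scheduler that requeues only vertices whose neighborhoods are still changing: the asynchronous, dependency-aware pattern that is awkward to express in MapReduce.

```python
from collections import deque

edges = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
in_nbrs = {v: [u for u in edges if v in edges[u]] for v in edges}
rank = {v: 1.0 for v in edges}

def pagerank_update(v):
    # Update one vertex from its in-neighbors; return vertices to reschedule if it moved.
    new = 0.15 + 0.85 * sum(rank[u] / len(edges[u]) for u in in_nbrs[v])
    changed = abs(new - rank[v]) > 1e-4
    rank[v] = new
    return edges[v] if changed else []   # only push work where it is still needed

# The "scheduler": vertices are (re)queued only while their neighborhood keeps changing.
queue, queued = deque(edges), set(edges)
while queue:
    v = queue.popleft()
    queued.discard(v)
    for w in pagerank_update(v):
        if w not in queued:
            queue.append(w)
            queued.add(w)

print({v: round(r, 3) for v, r in rank.items()})
```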