Microsoft's Asia research team has released the Microsoft Distributed Machine Learning Toolkit (DMTK) as open source. The DMTK is designed to make machine learning on big data more scalable and efficient, and to do so with a smaller cluster of computers. This is particularly useful for machine learning researchers and developers who work with large datasets.
Distributed machine learning involves linking a number of computers together so that they can work on a complex problem jointly. Here's what the DMTK contains:
- The DMTK framework, a parameter server that supports storing models built from hybrid data structures.
- Two distributed machine learning algorithms, which, according to Microsoft, "can be used to train the fastest and largest topic model and largest word-embedded model in the world".
- APIs to reduce the barrier to entry for distributed machine learning "so researchers and developers can focus on core machine learning tasks like data, model and training".
- The ability to quickly handle complex tasks involving computer vision, speech recognition and textual understanding.
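To give a feel for what a parameter server does, here is a minimal single-process sketch in Python. It is not the DMTK API: the class `ParameterServer`, its `pull_dense`/`push_dense`/`push_sparse` methods, the helper functions and the toy data are all hypothetical names invented for illustration. It also assumes that "hybrid data structure" means mixing dense and sparse tables in one model store, which the announcement does not spell out.

```python
from collections import defaultdict


class ParameterServer:
    """Toy in-process stand-in for a parameter server (hypothetical, not DMTK)."""

    def __init__(self):
        self.dense = {}   # table name -> list of floats (dense storage)
        self.sparse = {}  # table name -> dict of key -> float (sparse storage)

    def create_dense(self, name, size):
        self.dense[name] = [0.0] * size

    def create_sparse(self, name):
        self.sparse[name] = defaultdict(float)

    def pull_dense(self, name):
        # Workers receive a copy, never a reference to the shared state.
        return list(self.dense[name])

    def push_dense(self, name, delta):
        # The server applies worker updates additively.
        table = self.dense[name]
        for i, d in enumerate(delta):
            table[i] += d

    def push_sparse(self, name, delta):
        table = self.sparse[name]
        for key, d in delta.items():
            table[key] += d


def worker_step(server, xs, ys, tokens, lr=0.02):
    """One worker round on its own data shard: pull, compute locally, push."""
    w = server.pull_dense("w")[0]
    # Gradient of the mean squared error for the one-parameter model y ~ w * x.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    server.push_dense("w", [-lr * grad])

    # Sparse update: each worker only sends the keys it actually saw.
    counts = defaultdict(float)
    for token in tokens:
        counts[token] += 1.0
    server.push_sparse("token_counts", counts)


def train(rounds=50):
    server = ParameterServer()
    server.create_dense("w", 1)
    server.create_sparse("token_counts")
    # Two shards of toy data generated from y = 2 * x (made up for this sketch).
    shards = [
        ([1.0, 2.0], [2.0, 4.0], ["big", "data"]),
        ([3.0], [6.0], ["big", "model"]),
    ]
    for _ in range(rounds):
        for xs, ys, tokens in shards:
            worker_step(server, xs, ys, tokens)
    return server


if __name__ == "__main__":
    trained = train()
    print(trained.pull_dense("w"), dict(trained.sparse["token_counts"]))
```

In a real deployment the workers would run concurrently on separate machines and talk to the server over the network; the sketch runs them one after another only to keep the pull/push pattern easy to follow.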
The toolkit is particularly beneficial to researchers and developers working on big-data, big-model machine learning problems, which they can use the DMTK to solve much faster and with smaller clusters of computers. They can also build their own distributed machine learning algorithms on top of the toolkit.
[Via Microsoft Research Blog]