Microsoft Open Sources Distributed Machine Learning Toolkit For Easier Big Data Research

Microsoft's Asia research team has open-sourced the Microsoft Distributed Machine Learning Toolkit (DMTK). The toolkit is designed to make machine learning on big data more scalable and efficient, so that large jobs can run on a smaller cluster of computers. That is particularly useful for machine learning researchers and developers who work with large datasets.

Distributed machine learning involves spreading a training job across a number of networked computers to solve problems too large for a single machine. Here's what the DMTK contains:

  • DMTK framework, which is a parameter server that supports storing a hybrid data-structure model.
  • Two distributed machine learning algorithms, which, according to Microsoft, "can be used to train the fastest and largest topic model and largest word-embedded model in the world".
  • APIs to reduce the barrier to entry for distributed machine learning "so researchers and developers can focus on core machine learning tasks like data, model and training".
  • The ability to quickly handle complex tasks involving computer vision, speech recognition and textual understanding.
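
To make the parameter server idea from the list above concrete, here is a minimal single-process sketch in Python. It is not DMTK's API: the class name ToyParameterServer, its pull and push methods, and the word-counting workers are all hypothetical, and threads stand in for the separate machines of a real cluster. The "hybrid" model is approximated by keeping a dense list and a sparse dictionary side by side.

```python
import threading
from collections import defaultdict


class ToyParameterServer:
    """Hypothetical in-process stand-in for a parameter server (not DMTK's API)."""

    def __init__(self, dense_size):
        self._lock = threading.Lock()
        self._dense = [0] * dense_size     # dense part of the model
        self._sparse = defaultdict(int)    # sparse part, keyed by token

    def pull(self, dense_indices, sparse_keys):
        """Return the current values of the requested parameters."""
        with self._lock:
            dense = [self._dense[i] for i in dense_indices]
            sparse = {k: self._sparse.get(k, 0) for k in sparse_keys}
            return dense, sparse

    def push(self, dense_delta, sparse_delta):
        """Add a worker's updates into the shared model."""
        with self._lock:
            for i, delta in dense_delta.items():
                self._dense[i] += delta
            for key, delta in sparse_delta.items():
                self._sparse[key] += delta


def worker(server, shard):
    """Count tokens in one data shard locally, then push the counts once."""
    sparse_delta = {}
    for token in shard:
        sparse_delta[token] = sparse_delta.get(token, 0) + 1
    # Dense slot 0 holds the total number of tokens seen by all workers.
    server.push({0: len(shard)}, sparse_delta)


def run_demo():
    shards = [["big", "data", "big"], ["data", "model"]]
    server = ToyParameterServer(dense_size=1)
    threads = [threading.Thread(target=worker, args=(server, s)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server.pull([0], ["big", "data", "model"])


if __name__ == "__main__":
    print(run_demo())
```

Each worker touches only its own shard and talks to the shared model solely through pull and push, which is the division of labour a parameter server is meant to provide. A real system would add networking, partitioning of the model across servers, and fault tolerance, none of which is attempted here.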

Researchers and developers facing big-data, big-model machine learning problems stand to benefit most, as they can use the DMTK to solve such problems faster and on smaller clusters of computers. They can also build their own distributed machine learning algorithms on top of the toolkit.

Microsoft has made DMTK available on GitHub and plans to add more components over time. You can find out more at the DMTK website.

[Via Microsoft Research Blog]

