July 31, 2015
Ashu Garg
The open source movement is made up of developers who publish free source code for others to use. As a result, open source is eating software, redefining how technology companies build products and operate their businesses. Adding legitimacy to the movement are large companies like MySQL and Red Hat, which are increasingly using and contributing to open source software. At the same time, machine learning is revolutionizing how we use data by applying complex algorithms to drive predictive modeling and enable use cases like image recognition. Because open source solutions are free, companies that provide machine learning as a service are racing to turn the unique difficulties of machine learning into a way to thrive in an open source ecosystem.
Here are three open source challenges that provide opportunities for machine learning startups.
Challenge 1: Machine learning is difficult to implement in open source.
Open source machine learning algorithms lag behind research innovations and are difficult to program. Even when they are coded correctly, getting good performance from them requires expertise: data scientists need techniques like hyperparameter tuning and data augmentation to fine-tune them.
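To make "hyperparameter tuning" concrete, here is a minimal sketch of a grid search in plain Python. Everything in it is an assumption made for illustration: the synthetic data, the one-parameter model, the helper names (`make_data`, `train`, `mse`, `grid_search`), and the grid values. None of it comes from a particular open source library.

```python
import itertools
import random


def make_data(n, seed):
    """Synthetic 1-D regression data, y = 3x + noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [3.0 * x + rng.gauss(0, 0.1) for x in xs]
    return xs, ys


def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x by plain gradient descent on mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


def mse(w, xs, ys):
    """Mean squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train_set, val_set, grid):
    """Try every hyperparameter combination; keep the lowest validation error."""
    best = None
    for lr, epochs in itertools.product(grid["learning_rate"], grid["epochs"]):
        w = train(*train_set, lr, epochs)
        score = mse(w, *val_set)
        if best is None or score < best["val_mse"]:
            best = {"learning_rate": lr, "epochs": epochs, "w": w, "val_mse": score}
    return best


train_set = make_data(200, seed=0)
val_set = make_data(100, seed=1)  # held-out data used only for scoring

# Hypothetical grid; real grids are chosen per problem.
grid = {"learning_rate": [0.001, 0.01, 0.1, 0.5], "epochs": [10, 100]}

best = grid_search(train_set, val_set, grid)
print(best)
```

Even in this toy setting, a poor learning rate or too few epochs leaves the model far from the true slope of 3. Real models have many more settings, which is part of why tuning takes expertise.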
The Opportunity: A hosted solution that learns based on user data or provides pre-trained models.
Companies that lack a data science team to work with open source tools will look for “black box” hosted solutions from startups. Startups can succeed by making machine learning easy, either by generating models from raw customer data or by providing pre-trained models for common use cases like image recognition. These startups give the power of machine learning to companies that don’t have the resources to hire a data science team.
Challenge 2: Machine learning is not easily scalable in open source.
Machine learning at scale requires massive amounts of data and specialized hardware. Large companies also require different algorithms that can process streaming data. Although open source projects focused on machine learning at scale exist, they are difficult to use.
The Opportunity: A platform built on top of open source that easily deploys machine learning at scale.
Startups can provide value by simplifying open source projects that are focused on data science at scale. Many companies use these projects, such as Hadoop and Spark, but spend many hours deploying and maintaining overly complex systems. There is a clear, valuable market for startups that can adapt the technology big companies already use and make it easier and cheaper to maintain. For example, Databricks provides a cloud-based platform that manages large-scale Spark clusters, allowing companies to deploy machine learning at scale without worrying about hardware or complicated Spark configurations.
Challenge 3: Getting enough data for good machine learning.
When the algorithms are open source, the main differentiator between a good machine learning model and a bad one is the quality of the training data. Deep learning methods need a lot of high-quality, domain-specific data to be effective, and that data can be hard to obtain.
The Opportunity: Aggregating high-quality training data in a sector to dominate performance.
Startups can provide better machine learning solutions than open source if they have high-quality data in a specific sector. For example, a startup that has collected the most training images of bone CT scans will be able to provide industry-leading performance on problems such as identifying osteoporosis. Thus, startups can dominate a sector by being first to aggregate high-quality training data in a category through partnerships and customers. There is a first-mover advantage here, as leading performance attracts more customers, who in turn provide more data. Data begets data.
As technology evolves, machine-learning-as-a-service companies have the opportunity to create huge businesses, even in an open source ecosystem. By deploying hosted machine learning solutions, solving problems of scale, and controlling the data required to make good predictions for companies that don’t have these capabilities, these startups can begin revolutionizing existing sectors and creating new ones.