Artificial Intelligence (AI) and Machine Learning (ML) are undoubtedly among the top technology trends of the past decade, transforming how businesses and consumers interact and operate. A near-limitless volume of training data and falling storage and cloud computing costs, paired with the rapid adoption of ML platforms, will revolutionize the tech industry for years to come. While the spread of these platforms and the abundance of available data give businesses unparalleled insights and the ability to drive new digitization, there is still a gap between the promises of AI and the ROI companies are seeing from ML implementations.
Building an ML model is like software development. It is an iterative process that works best with a repository or system of record that can track changes over time, provide a highly secure real-time collaboration environment, offer automation capabilities, and scale to achieve the highest level of performance. ML models rely on properly labeled training data to produce a useful output (or prediction). For example, if you uploaded a picture of a cat and a dog and asked an untrained ML model to identify the cat, it would not be able to determine which animal is the cat, or even that there are two animals in the picture. Aggregating and labeling thousands of images, videos, and texts is crucial to training the ML model to make accurate predictions, such as understanding which image contains a cat or identifying cancer on a radiology scan.
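To make the role of labels concrete, here is a minimal sketch in Python of a toy classifier that can only tell a cat from a dog after it has seen labeled examples. Everything in it is an assumption for illustration: the two numeric features (weight and ear length), the sample values, and the `train` and `predict` helpers are hypothetical, and a nearest-centroid rule stands in for a real image model. It is not how Labelbox or any production system works.

```python
from collections import defaultdict
from math import dist

# Hypothetical labeled training data: (weight_kg, ear_length_cm) -> label.
# The feature values are made up for illustration only.
labeled_examples = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((4.5, 7.0), "cat"),
    ((20.0, 11.0), "dog"),
    ((25.0, 12.0), "dog"),
    ((30.0, 13.0), "dog"),
]


def train(examples):
    """Average the feature vectors of each label into one centroid per label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(axis) / len(axis) for axis in zip(*vectors))
        for label, vectors in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest, or None if untrained."""
    if not centroids:
        return None
    return min(centroids, key=lambda label: dist(centroids[label], features))


untrained = {}                      # a model that has seen no labeled data
trained = train(labeled_examples)   # a model built from labeled data
new_animal = (4.2, 6.8)

print(predict(untrained, new_animal))  # no labels, so no answer
print(predict(trained, new_animal))    # closest centroid decides the label
```

The untrained model has nothing to compare the new example against, while the trained one answers by proximity to what the labels taught it. Mislabeling a few of the examples shifts the centroids, which is the toy version of why label quality matters.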
The data labeling and training process is extremely tedious. Cognilytica Research estimates that 80% of an ML data science team’s resources are focused just on the process of data aggregation and labeling. Traditionally, companies have either built internal labeling teams or outsourced the work to business process outsourcing firms (BPOs), both of which pose significant issues. Building an internal team is costly, time-consuming, and difficult to scale, while offshore BPOs present data security problems, poor accuracy that can lead to expensive un-training and re-training of AI models, difficulty collaborating with labelers in real time, and difficulty tracking labeler performance.
As we researched the ML data labeling space and its challenges in depth, two things became clear to us. First, companies need a software-based solution focused on collaboration, quality of labeled training data, speed of labeling, and automation. Second, there is a massive untapped opportunity in building an end-to-end MLOps platform, similar to what has emerged in the DevOps landscape.
This is why we are thrilled to announce our investment in Labelbox, the most advanced training data platform for AI/ML. Labelbox’s industry-leading, end-to-end software platform allows AI/ML teams to create and manage high-quality training data all in one place with robust security and compliance capabilities. It supports customers’ production pipelines with powerful APIs, offers labeling automation tools, provides full visibility into labeling teams’ performance, and allows the flexibility to use any mix of internal or outsourced labeling teams. Labelbox also enables MLOps teams to continuously create better training data through collaboration and automation, identify labeling and training errors faster to predict model accuracy, and iterate rapidly on ML models.
When we first met Manu Sharma and Brian Rieger, it became clear that they were uniquely suited to build what has now become the industry-leading solution. Both Manu and Brian understand the challenges of taking an ML model from scratch to successful production deployment, drawing on advanced ML roles at NASA, Planet Labs, and Boeing. Since starting Labelbox in 2018, they have built a phenomenal team with deep experience in building highly successful startups.
Labelbox’s rapidly growing customer base includes several Fortune 1000 companies, federal agencies, and high-growth technology companies, as well as customers across agriculture, insurance, transportation, healthcare, retail, financial services, hi-tech, and manufacturing. This customer diversity demonstrates the flexibility of Labelbox’s platform and the ability of data science teams of all sizes and industry sectors to benefit from it.
For customers such as Blue River Technology, Labelbox has significantly improved the speed and efficiency of ML development and operations while lowering their cost. Blue River, the John Deere-owned maker of See & Spray agricultural machines, uses computer vision to help sprayers identify weeds in farmland. While the common practice is to broadcast-spray herbicides across the entire farm, Blue River uses computer vision and ML to apply herbicide only where the weeds are, reducing herbicide application by up to 80%. This is a complex problem given the multitude of weed species that vary by location and type of crop, as well as the need to work with cross-functional teams of data scientists, agronomists, data engineers, and QA. Blue River uses Labelbox as a core tool in their data operations to coordinate workflows between in-house and outsourced labelers, assess quality of labeling, and evaluate model and data quality. By using Labelbox, Blue River was able to cut their ML spend by 50% and create 2x more training data with the same budget.
As a global venture fund with expertise investing in leading growth-stage companies, we are focused on partnering with category leaders that have the potential to scale globally and transform industries, and lives, through innovation. Many publications have referred to data as the “new oil” of our economy. If this is true, then the labeling process is the critical refinery that will make AI and ML technology successful. We believe Labelbox will be a key enabler in helping enterprises increase ROI from AI/ML deployments without compromising data security and compliance. Our partnership with BCG gives us the opportunity to drive awareness and visibility of Labelbox’s offering among BCG’s Global 2000 clients across 50+ countries. We are excited to partner with Labelbox and look forward to working together in their next phase of growth to create a global training data infrastructure platform for all AI/ML applications.