Creating an AI-Ready Data Architecture in the Cloud

Sophisticated ML-as-a-service platforms such as Facebook’s FBLearner Flow are ideal for delivering AI-at-scale. But the data architectures that feed into them are just as vital.

Automation is what AI algorithms do best. So, it should be no surprise that the world’s most advanced AI-using enterprises are using the technology to automate the process of experimenting with and scaling AI itself.

But with Facebook, Uber and Lyft all unveiling bespoke platforms to manage end-to-end AI deployment in recent years, it’s now crystal clear that the data architectures needed to do so effectively are highly sophisticated.

“Traditional pipeline systems we evaluated did not appear to be a good fit for our uses,” explains Facebook Software Engineer Jeffrey Dunn. “We decided to build a brand-new platform, FBLearner Flow, capable of easily reusing algorithms in different products.”

Of course, few companies are as far down the path to AI maturity as Facebook. Most are still laying the foundations required to apply AI effectively in business contexts. The latest polling from Gartner shows that just 37% of organizations have implemented any kind of AI at all.

But it’s fair to assume that many enterprises aspire to deploy sophisticated AIs at scale one day – and to achieve this, CDAOs must take steps to put AI-ready data architectures in place.

Start Small and Plan to Scale

An enterprise’s first foray into AI always starts with a question: which processes can either be fully automated, or made more efficient through the introduction of AI-powered decision aids?

Different use cases require different data architectures, which in turn require different levels of technical know-how to execute. As such, enterprises with less AI experience may prefer to cut their teeth on a more foundational AI project that doesn’t depend on online machine learning.

Guy Taylor, Head of Data and Data-Driven Intelligence at Nedbank, explains: “Just getting a model that is working and producing answers into an API-type format is a good way to do the initial start.”

“Once you start running into technical and data pipeline challenges, it’s figuring out what those constraints are and then taking a generalized perspective. You can quickly get to value through iteration.”

– Guy Taylor, Head of Data and Data-Driven Intelligence, Nedbank
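As a rough illustration of what putting a model "into an API-type format" can mean, the sketch below wraps a stand-in model in a function that accepts a JSON request and returns a JSON response. Everything in it is an assumption made for illustration: the `predict` scorer, its weights, the feature names and the request shape are hypothetical, and a real project would mount the handler in whichever web framework or cloud service the team already uses.

```python
import json


def predict(features):
    """Stand-in model: a hypothetical fixed linear scorer, not a trained model."""
    weights = {"monthly_spend": 0.002, "tenure_years": -0.05}
    score = 0.5 + sum(
        weights[name] * float(features.get(name, 0.0)) for name in weights
    )
    # Clamp the result so callers always receive a value between 0 and 1.
    return max(0.0, min(1.0, score))


def handle_prediction_request(body):
    """Turn a JSON request body into an (HTTP status, JSON response) pair."""
    try:
        payload = json.loads(body)
        score = predict(payload["features"])
    except (ValueError, KeyError, TypeError, AttributeError):
        # Malformed JSON, a missing "features" key or non-numeric values.
        return 400, json.dumps({"error": "expected JSON with a 'features' object"})
    return 200, json.dumps({"score": round(score, 4)})


# Example call with a hypothetical customer record.
status, response = handle_prediction_request(
    '{"features": {"monthly_spend": 100, "tenure_years": 2}}'
)
```

Keeping the request handling separate from the model itself means the scorer can later be swapped for a trained model without changing what callers send or receive.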

Data scientists often run into challenges when taking AI models and hooking them up to live business data to put them into production. What’s more, an enterprise’s computing requirements can grow rapidly as it starts to operationalize more AI use cases.

Cloud-based data infrastructure is a common solution to these challenges. It provides the flexibility data teams need to quickly spin up new projects with minimal initial outlay.

Pool Data in Cloud-Based Data Lakes

Once a CDAO understands what types of data are needed to fuel their AI projects, they must find the right data sources and copy the data across into one or several data lakes so it’s available to their data scientists.

“I always say, do your data homework before you start with AI,” quips Dr Susan Wegner, VP AI and Data Analytics at Lufthansa Industry Solutions. “If you are unable to access the data, then you can have the nicest deep learning stuff, but you cannot operationalize it.”

Cloud-based data storage is generally the preferred solution when it comes to ensuring data is readily available to feed AI models, although some organizations do choose to store certain data types on premises.

Creating a data management layer will typically also be necessary to ensure that all data is standardized and searchable in a data catalogue.
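To make the idea of a standardized, searchable catalogue more concrete, here is a minimal in-memory sketch. It is not modelled on any particular catalogue product: the `DataCatalogue` class, its fields, the dataset names and the lake paths are all hypothetical, and a production catalogue would persist its entries and track far richer metadata.

```python
from dataclasses import dataclass, field


def standardize_name(name):
    """Normalize a column name to lower snake_case so sources agree."""
    cleaned = "".join(ch if ch.isalnum() else "_" for ch in name.strip().lower())
    return "_".join(part for part in cleaned.split("_") if part)


@dataclass
class CatalogueEntry:
    dataset: str
    location: str  # hypothetical path inside the data lake
    owner: str
    columns: list = field(default_factory=list)
    tags: list = field(default_factory=list)


class DataCatalogue:
    def __init__(self):
        self._entries = {}

    def register(self, dataset, location, owner, columns, tags=()):
        """Store an entry, standardizing column names and tags on the way in."""
        entry = CatalogueEntry(
            dataset=dataset,
            location=location,
            owner=owner,
            columns=[standardize_name(c) for c in columns],
            tags=[t.lower() for t in tags],
        )
        self._entries[dataset] = entry
        return entry

    def search(self, keyword):
        """Return names of datasets whose name, columns or tags mention the keyword."""
        needle = keyword.lower()
        hits = []
        for entry in self._entries.values():
            haystack = [entry.dataset.lower(), *entry.columns, *entry.tags]
            if any(needle in item for item in haystack):
                hits.append(entry.dataset)
        return sorted(hits)


catalogue = DataCatalogue()
catalogue.register("crm_customers", "lake/raw/crm/customers", "sales-data",
                   ["Customer ID", "Sign-Up Date"], tags=["PII"])
catalogue.register("web_clicks", "lake/raw/web/clicks", "digital",
                   ["Customer ID", "Page URL"])
```

With both sources registered, a search for "customer" finds either dataset because the differently formatted source columns were standardized to the same `customer_id` name.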

“What seems to be happening a lot of times is that companies are moving from collecting the data to basically putting the data in one place (or not) and analyzing the data,” notes Tomas Sanchez, Chief Data Architect at the UK’s Office for National Statistics.

“There needs to be a paradigm shift or a thinking shift in these organizations,” he continues. “They really can’t do data seriously unless they have some sort of data management in place.”

Invest in Identity and Access Management Skills

Cybersecurity and governance concerns have held up many organizations’ migration to the cloud in recent years.

“In terms of data storage, we have that solved on the cloud,” explains Chubb CDAO Dante Tellez. “But the entire flow is not yet done because of the governance and privacy issues.”

However, there is a growing consensus that these concerns can be addressed and that the future of AI-at-scale is in the cloud. In fact, in RightScale’s 2017 State of the Cloud survey, 95% of the 1,000 IT professionals polled confirmed that they are now using the cloud for some of their IT or data needs.

The challenge for enterprises is that most data professionals aren’t certified to do proper security design or identity and access management (IAM). Without these skills, it’s easy to misconfigure a cloud platform in a way that leaves its data open to improper access.

As such, enterprises must be careful to recruit staff with the proper certifications or invest in training existing staff members in IAM.

This will no doubt be challenging for some companies, given the scarcity of data professionals with these qualifications. But overcoming this challenge is essential for ensuring the right access rules and guard rails are in place to protect an enterprise’s data lake.
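One common misconfiguration of the kind described above is an access policy that grants permissions to everyone. The sketch below scans a policy document for 'Allow' statements with a wildcard principal. The JSON shape is loosely modelled on AWS-style policy documents, while the bucket name, statement IDs and account number are made up. It is a simplified check under those assumptions: it ignores `Condition` blocks and is no substitute for a cloud provider's own policy analysis tools or a trained security reviewer.

```python
def find_public_statements(policy):
    """Return the 'Allow' statements in a policy that grant access to any principal."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear unwrapped
        statements = [statements]

    risky = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        values = principal.values() if isinstance(principal, dict) else [principal]
        # Flatten so {"AWS": "*"} and {"AWS": ["*", "..."]} are both caught.
        flat = []
        for value in values:
            flat.extend(value if isinstance(value, list) else [value])
        if "*" in flat:
            risky.append(statement)
    return risky


# Hypothetical data lake bucket policy with one overly broad statement.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-data-lake/*",
        },
        {
            "Sid": "AnalystsOnly",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:role/analyst"},
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-data-lake/curated/*",
        },
    ],
}

flagged = find_public_statements(example_policy)
```

Here only the `PublicRead` statement is flagged, since the second statement names a specific role rather than a wildcard.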

Optimize Workflows with ML-as-a-Service Platforms

The final piece of the architectural puzzle when it comes to AI-at-scale is the technology an enterprise will use to apply AIs in new business contexts.

As we touched on at the start of this article, ML-as-a-service platforms help enterprises like Facebook streamline the process of developing, testing and reusing AI programs and features across their organizations.

Dunn explains: “In some of our early work to leverage AI and ML, we noticed that the largest improvements in accuracy often come from quick experiments, feature engineering and model training – rather than applying fundamentally different algorithms.”

“Eliminating the manual work required for experimentation allows machine learning engineers to spend more time on feature engineering,” he concludes. “Which in turn can produce greater accuracy improvements.”

ML-as-a-service platforms do more than just empower data scientists to take models from their laptops, wrap them for use and upload them to a common library of AI tools and features. They can also keep records of training data for compliance purposes and even handle AI workflows, taking data from core systems and translating it into ‘features’ that models can use.
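The three capabilities mentioned above (a shared model library, a record of training data and a step that turns raw records into features) can be sketched in a few lines. This is a toy illustration only, not a description of how FBLearner Flow or any real platform works: the `ModelRegistry` class, the `to_features` function and the record fields are all hypothetical.

```python
import hashlib
import json


class ModelRegistry:
    """A toy shared library of models, with a training-data record per version."""

    def __init__(self):
        self._models = {}

    def register(self, name, predict_fn, training_rows):
        # Fingerprint the training data so it can be referenced in later audits.
        fingerprint = hashlib.sha256(
            json.dumps(training_rows, sort_keys=True).encode("utf-8")
        ).hexdigest()
        record = {
            "name": name,
            "version": len(self._models.get(name, [])) + 1,
            "predict": predict_fn,
            "training_data_sha256": fingerprint,
            "training_row_count": len(training_rows),
        }
        self._models.setdefault(name, []).append(record)
        return record

    def latest(self, name):
        """Return the most recently registered version of a model."""
        return self._models[name][-1]


def to_features(raw_record):
    """Translate a raw core-system record into numeric features a model can use."""
    orders = raw_record.get("orders", [])
    return {
        "order_count": len(orders),
        "total_spend": sum(order["amount"] for order in orders),
    }


registry = ModelRegistry()
registry.register(
    "big_spender",
    lambda features: features["total_spend"] > 100,  # stand-in for a real model
    training_rows=[{"total_spend": 50, "label": 0}, {"total_spend": 150, "label": 1}],
)

# Another team reuses the shared model on a raw record from a core system.
features = to_features({"customer": "c-1", "orders": [{"amount": 80}, {"amount": 40}]})
is_big_spender = registry.latest("big_spender")["predict"](features)
```

Because the feature step and the registry are separate, a second team can reuse the registered model without needing to know how it was trained.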

When organizations reach the point in their AI journeys where they must prioritize driving ROI through reusing, repurposing and optimizing AI tools, giving data scientists tools so they can do these things efficiently is key.

Few companies are truly ready to make the most out of these platforms today. But as more and more enterprises reach the advanced stages of their AI programs, we will no doubt see many of them add ML-as-a-service platforms to their existing data architectures.

This is an extract from our Scaling AI in the Cloud report, which offers more in-depth insights to help your organization successfully harness the power of AI.