Big data is not just about scaling your data analytics processing platforms to keep up with the onslaught of new information. Just as important, big data is about bringing together your best and brightest minds–your data scientists–and giving them the tools they need to interactively and collaboratively explore rich information sets.
Data scientist productivity is a critical concern, especially when you’re talking about high-priced talent in short supply. If you don’t provide your data scientists with scalable modeling platforms, you won’t realize the full value of your investment in big data.
Today’s statistical modelers and business analysts need high-performance cloud-centric development platforms–often known as “sandboxes”–where they can aggregate and prepare data sets, tweak segmentations and decision trees, and iterate through statistical models as they look for deep statistical patterns.
Big data sandboxes are where you develop advanced analytic models, the all-important intellectual property that extracts intelligence from otherwise inchoate gobs of content. To be as productive as possible, teams of data scientists must have massively parallel cloud-computing resources (including CPU, memory, storage, and I/O capacity) at their fingertips, available both within their sandboxing platforms and in the operational cloud environments to which they will deploy their models.
If you fail to provide them with the cloud-based scalability they need to run a growing range of jobs, you’ll be wasting their time as they queue up for access to limited processing and storage resources.
Sandbox scalability is critical, but it’s about more than just raw horsepower. Your sandboxing platform must also embed comprehensive, extensible libraries of reusable algorithms and models for advanced analytics. Today your data science requirements may revolve around traditional statistical analysis, data mining, and predictive modeling, and libraries for all three should be included in every one of your sandboxing environments. But your data scientists will increasingly need to incorporate libraries for MapReduce, R, geospatial analysis, matrix manipulation, natural language processing, sentiment analysis, and other advanced analytic algorithms as well.
And don’t skimp on training and other skills-enhancement initiatives to ensure that you have enough of the right kinds of data scientists for your big-data projects. Data science’s learning curve is formidable. Your organization may need to establish a data-science center of excellence and a structured training curriculum to develop professionals who’ve mastered this demanding discipline.
Here, for your inspiration, are several IBM resources on the topic of data scientists in the business:
- IBM webpage defining the data scientist.
- An article by IBM VP Anjul Bhambhri providing her perspective.
And here are several blog posts that I authored examining various aspects of data scientists in modern business:
- Data Scientist: Exploration in the Age of the Unstructured
- Data Scientist: Closing the Talent Gap
- Data Scientist: Sexy Is as Sexy Does
- Data Scientist: Potential Superstars in Prediction Markets
- Data Scientist: Bringing True Science into the Business Process
- Data Scientist: Chart The Customer Journey
- Data Scientist: Consider the Curriculum
- Data Scientist: Mastering the Methodology, Learning the Lingo
- Data Scientists: Credentialed or Otherwise
- Data Scientists: Explore Game Theory to Boost Customer Engagement
- Data Scientists: Bridge the Cultural Divide with BI Practitioners
- Data Scientists: Myths and Mathemagical Superpowers
- Data Scientists: Run Your Mad Experiments
- Data Scientists: Illuminate Your Patterns with Pictures
- Data Scientists: Grow and Sustain a Center of Excellence
- Data Scientists: How Big is Your Big Data Sandbox?
Last but not least, we will be holding a Twitter chat on “The Rise of the Data Scientist” on May 9 from 4-5 p.m. ET. We invite you to join us for this chat, using hashtag #cloudchat. I will be one of the panelists, along with bit.ly’s chief scientist Hilary Mason and STORM Insights founder & CEO Adrian Bowles. More info on the chat can be found here.