How do you solve a problem like the data scientist?

In which Jill reminds us that managing the data is the hard part.

The newest aphorism-du-jour in the big data world is this: “It’s not big data—it’s just data.” As debates on both the definition and value of big data continue, this argument has some validity. The struggle to find, assess, cleanse, annotate, integrate, standardize, and provision data predates not only the big data trend, but computing itself.

InformationWeek warned readers of the consequences of “infoglut” back in 1995 under the headline “New Tools that Can Help Tame an Ocean of Data.” Mixed metaphor aside, the article confirmed what had been keeping business and IT managers up at night: how to harness data that was proliferating and growing ever more complex. The rise of big data means that these challenges will only get harder.

“Most of the complex problems we tackle should involve some sort of initial data exploration,” explains Bill Rand, Assistant Professor of Marketing at the University of Maryland and Director at the university’s Center for Complexity in Business. Rand personifies the expanding role of the data scientist: a professional who not only explores diverse data sets but also determines how the data can help the company compete.

Rand and his team have been applying analytical skills to diverse social media data to understand behavior patterns and propensities that could aid marketers. “Social media players aren’t a bunch of people working on a common problem,” he explains. “They’re individuals working on separate problems. Data scientists need to explore large volumes of detailed data to understand the realm of possible social media actions. Only after the initial analysis can they determine how to apply subsequent analytic models.”

The keyword here isn’t “analyze” but “apply.” Data scientists, whose role Thomas H. Davenport, Ph.D., and D.J. Patil dubbed “The Sexiest Job of the 21st Century,” are no longer expected simply to run mathematical models against diverse data sets. They’re now just as likely to suggest how to use the data to drive cross-selling, identify supply chain efficiencies, predict fraud, and determine a customer’s next likely purchase.

“Data scientists, by definition, combine business acumen with data acumen,” explains P.K. Kannan, Professor of Marketing Science and Marketing Department Chair at the University of Maryland’s Robert H. Smith School of Business (and Rand’s boss). “From a knowledge perspective, a data scientist has keen insights into the business models driving the firm, its products and services, while simultaneously possessing mastery of data creation and data analysis. In that sense, they’re different from traditional statisticians not only in their business domain knowledge but also in terms of their broader scope.”

This is a lofty job description, and one that, without the right guidelines, standards, and skills, is primed for failure. On the IT side, staff are likely to have begun implementing data governance, establishing clear policies for the access, usage, and deployment of information from a variety of sources. They may also have adopted enabling technologies such as data quality, master data management, and metadata repository tools to help automate repeatable tasks. Depending on how it’s defined, the data scientist’s role could erode those data governance policies or, worse, contradict them.

On the business side, the phenomenon of data hoarding is alive and well and making no apologies. Even in the age of big data, knowledge is (still) power, and line-of-business staff are loath to share data that might bestow the sheen of indispensability. So customer address data is shared, but online behavior data is withheld from customer support reps. Or the electronic health record is shared with clinicians, but the patient’s survey data goes only to administrators. A data scientist (or business analyst, or visualization tool user) can hardly deliver value if she can access only a portion of the data she needs to do her job, however big that portion is.

Managers have to do the hard and sometimes unpleasant work of inventorying incumbent skills and even consolidating data management roles or functions. Circumscribing role boundaries is key, not only to prevent duplication of effort but also to stem confusion among incumbent data experts. Failing to do so can breed staff disaffection. “I guess I always assumed I was one of the firm’s data authorities,” an actuary at an insurance company confided recently. “Now I’m being ‘coached’ on how to do the job I’ve done for twelve years. Maybe if I called myself a data scientist I’d have more clout.”

As the number of systems generating data grows, both inside and outside the firewall, operationalizing the flow and usage of information is the biggest barrier to becoming a data-driven organization. At Baseline Consulting we called it “the data supply chain,” and it’s an apt term for big data’s interdisciplinary skill sets and cross-functional reach. No matter how big or complex the data is, the “it’s not the size, but how you use it” aphorism is as true as it ever was.

This post is an excerpt I wrote for Phil Simon’s new book, Too Big to Ignore: The Business Case for Big Data (John Wiley & Sons, 2013). The book is geared to business managers and executives seeking to understand big data’s value proposition, admonishing them to Think Big.
