Matt Asay
Contributor

People and Python in AI

analysis
Sep 25, 2023 | 4 mins
Data Science, Generative AI, Programming Languages

If you want to squeeze the most value from your data, teach your employees Python and Excel instead of specialized programming languages.

In yet another installment of “everyone is doing it, but no one knows how,” a recent NewVantage Partners survey found that while 93.9% of executives surveyed expect to increase their data investments in 2023, just 23.9% of organizations characterize themselves as data-driven. Where is all that investment going, if not to change the way their companies operate? What’s stopping these executives from imposing this vision of a glorious data future on their companies?

People. The problem is always people. Of these same executives, 79% cite cultural issues as the biggest impediment to embracing a data-driven future. It turns out to be easy to say “data-driven” but much harder to implement because people ultimately animate a business, not data. The key, then, is to ensure that data enables and augments people rather than replaces them.

Python and friends

More than a decade ago, Gartner analyst Svetlana Sicular posited two fundamental truths about (big) data that we too often forget: “Organizations already have people who know their own data better than mystical data scientists” and “learning Hadoop is easier than learning the company’s business.” One way to boost the intelligent use of data is by lowering the bar to programming literacy. As arcane as data tools can be, the much more valuable “tool” is an employee’s grasp of the company’s business, because expert employees can ask more intelligent questions of the company’s data.

To that end, the focus for every enterprise should be to make data tools more accessible to a greater population of employees. Efforts to make Microsoft Excel a key component of data analytics should be encouraged, including recent attempts to use Excel for data transformation initiatives. There are far more people proficient with Excel than with, say, TensorFlow or Hugging Face models. Helping them do more with a tool they already know is a big win.

Same with Python. Although R and other more specialized languages continue to be valuable, Python is the single-biggest driver of AI productivity for a swelling army of would-be data engineers. As I’ve written, if Nick Elprin’s projection holds and data science becomes an enterprisewide capability with far-reaching implications, then “the language most likely to dominate is the one that is most accessible to the broadest population within the enterprise.”

Namely, Python.

And SQL, of course. It’s telling that a recent IEEE Spectrum analysis of programming language popularity found that Python and SQL are the two most popular languages right now. Python is on top with a lead that keeps widening. For employers looking to hire, SQL tops the list (with Python a close second). The two together are a solid combination given that both tap into skills that many employees already have rather than forcing people (and their employers) to learn new ways of dealing with data.
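To make the pairing concrete, here is a minimal sketch of Python and SQL working together, using only Python's built-in sqlite3 module. The `orders` table, its columns, the sample rows, and the helper names `revenue_by_region` and `revenue_share` are all hypothetical and chosen purely for illustration; nothing here comes from the surveys cited above. SQL does the aggregation, and a few lines of Python turn the result into something an analyst might paste into a report.

```python
import sqlite3

# Hypothetical sample data: (region, revenue). Invented for illustration only.
ORDERS = [
    ("East", 120.0),
    ("West", 80.0),
    ("East", 60.0),
    ("West", 40.0),
]


def revenue_by_region(rows):
    """Load rows into an in-memory SQLite table and total revenue per region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (region TEXT, revenue REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        # SQL handles the grouping and summing.
        cursor = conn.execute(
            "SELECT region, SUM(revenue) FROM orders "
            "GROUP BY region ORDER BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


def revenue_share(totals):
    """Convert per-region totals into each region's fraction of the whole."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {region: total / grand_total for region, total in totals.items()}


totals = revenue_by_region(ORDERS)
shares = revenue_share(totals)
for region, share in shares.items():
    print(f"{region}: {totals[region]:.2f} ({share:.0%})")
```

Someone who already knows a bit of SQL can read the query, and someone who already knows a bit of Python can read the rest. Neither half demands a specialized toolchain.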

Generative AI (GenAI) is another way we’ll see more employees empowered to work with data. I’ve tried using GenAI tools like ChatGPT to automate some of the work my team does answering questions on our public forums, but the output is still not good enough: It takes more work to fix ChatGPT’s answers than to simply write a better answer to start with. (Beware of GenAI when it comes up with great prose at the expense of technical accuracy. Users may like it, as one recent analysis found, but that enthusiasm will dim when they try some of those AI-suggested answers in production.)

The point, however, isn’t the technology. It’s the people using it. This is where most companies continue to get things wrong.

Power to the people

As the NewVantage report notes, every year “a great majority of respondents report that the principal challenges to becoming a data-driven organization are human—culture, people, process, or organization—rather than technological,” but each year the survey uncovers little progress toward overcoming these human issues. “Too much of the focus of data executives is on non-human issues” like “data modernization, data products, AI and ML, data quality, and various data architectures.”

In other words, we seem to realize we have a people problem, yet we keep trying to fix it with tech. The technologies I’ve mentioned let developers and others work with data using familiar tools. The alternative, imposing new technologies that force people to change how they work and think to conform to the strictures of the tool, is a losing strategy.

The crowning asset of a company is the people who interpret the data, not the data itself. These people already work for you; the key is to figure out how to leverage data tools they already know or can easily learn.


Matt Asay runs developer relations at MongoDB. Previously, Asay was a Principal at Amazon Web Services and Head of Developer Ecosystem for Adobe. Prior to Adobe, Asay held a range of roles at open source companies: VP of business development, marketing, and community at MongoDB; VP of business development at real-time analytics company Nodeable (acquired by Appcelerator); VP of business development and interim CEO at mobile HTML5 start-up Strobe (acquired by Facebook); COO at Canonical, the Ubuntu Linux company; and head of the Americas at Alfresco, a content management startup. Asay is an emeritus board member of the Open Source Initiative (OSI) and holds a J.D. from Stanford, where he focused on open source and other IP licensing issues.
