Simon Bisson
Contributor

Visual Studio Code joins the Anaconda Python data science toolkit

analysis
Feb 27, 2018 · 5 mins
Analytics, Data Science, Python

Microsoft’s Anaconda support is the next step in its open source analytics expansion

There’s growing support for Python at Microsoft, with tools like Azure Notebooks taking advantage of it as a way of trying out ideas and sharing algorithms. Python is also one of the more popular languages in Visual Studio Code, with additional tools in the Visual Studio Code extensions marketplace.

One result of Visual Studio Code’s Python support is the recent announcement that the Anaconda data science platform will ship with Visual Studio Code, along with the R Open distribution as its default R language libraries.

Founded by Travis Oliphant, the original author of NumPy, Anaconda has become an essential data science tool, with a set of more than a thousand libraries and plug-ins that cover most analytic cases. Because Python is an interpreted language, with support for the familiar read-evaluate-print loop (REPL), you can test code snippets against your data sources from the command line before building more complex scripts and programs.
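As a rough illustration of that snippet-first workflow, here is a minimal sketch of the kind of code you might type at a Python prompt before committing to a full script. The sample data, the column names, and the load_temps helper are all hypothetical, and the sketch uses only the standard library rather than any Anaconda-specific package.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a real data source.
SAMPLE_CSV = """city,temp_c
Seattle,11.5
Redmond,12.0
Austin,24.5
"""


def load_temps(text):
    """Parse CSV text into a dict mapping city name to temperature in Celsius."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["city"]: float(row["temp_c"]) for row in reader}


# Each of these lines could be entered one at a time in the REPL,
# inspecting the result before moving on to the next step.
temps = load_temps(SAMPLE_CSV)
mean_temp = statistics.mean(list(temps.values()))
warmest = max(temps, key=temps.get)

print(f"mean: {mean_temp:.1f} C, warmest: {warmest}")
```

Once the individual steps behave as expected on a small sample, the same lines can be moved into a script that reads from the real data source.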

Microsoft has written an Anaconda package manager extension for Visual Studio Code, allowing direct download of packages from inside the editor. The interactive nature of Visual Studio Code fits well with Python’s role in Anaconda.

Microsoft’s relationship with Anaconda is intended to go further than Anaconda using R Open and Visual Studio Code. Microsoft is also working with Anaconda to embed its data science tools inside SQL Server. Bringing interactive analytics tooling into the heart of a database is a sensible approach, and Microsoft has already started to put its own analytic tools there.

But making that service dependent on an open source project that it doesn’t control is a big step for Microsoft. SQL Server is one of its flagship enterprise products, so bringing in a set of tools that update on a very different schedule could be an issue for many of Microsoft’s corporate customers. Still, with Anaconda already a popular tool on data scientists’ desktops, it shouldn’t be too much of a stretch for users. If you don’t need it in a production database, you can simply choose not to install it, leaving the SQL Server/Anaconda combination for your data science team’s development environment.

Azure will also get access to Anaconda, as part of its Azure Machine Learning platform. To get the most from a machine learning platform, you need to be able to build and test your statistical models before you deploy them at scale. By using Anaconda to construct Python and R analytical models, you can test them on sample data in Visual Studio Code before embedding them in an Azure ML pipeline.
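To make the build-and-test step concrete, here is a minimal sketch of fitting a simple model on sample data and checking it against held-out values before deciding to deploy. It is only an illustration: the data, the fit_line and mean_abs_error helpers, and the 0.5 error threshold are all hypothetical, and nothing here calls Azure Machine Learning or any Anaconda library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and observed values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical sample data: the training points follow y = 2x + 1.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
model = fit_line(train_x, train_y)

# Held-out points the model has not seen during fitting.
holdout_x = [5.0, 6.0]
holdout_y = [11.0, 13.0]
error = mean_abs_error(model, holdout_x, holdout_y)

# Hypothetical acceptance threshold checked before any deployment step.
ready_to_deploy = error < 0.5
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} error={error:.2f}")
```

The point of the sketch is the order of operations: fit locally, measure on data the model has not seen, and only then hand the model to a larger pipeline.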

Microsoft and open source analytics

Microsoft’s open source efforts started before its Anaconda support, of course. Its 2015 acquisition of Revolution Analytics gave it stewardship over one of the more popular distributions of the R statistical language: R Open. Microsoft has kept R Open under the GPL, a decision that seemed at odds with Microsoft’s position on the GPL’s viral nature, though it has kept it licensed under the less-strict GPL v2. R Open’s multiprocessor support has made it an important tool, because it performs well with large amounts of data.

In Azure, R has become a key component of Microsoft’s machine learning platform, with R offering a set of functional programming elements for statistical analysis that can be dropped into Azure’s machine learning pipeline.

But R isn’t the only analytical language out there, and its inherent complexity has left it a tool for experienced data scientists, where a deep background in statistical analysis is essential. The alternative to R is Python, a language that began life as a scripting tool intended to rival Perl. Python is an interpreted language, and its extensibility (along with its early adoption within Google) has made it a popular development tool, especially where statistical and mathematical operations are needed. Python’s libraries include several popular analytic packages, among them the NumPy scientific computing toolkit.

That’s why Microsoft followed up its open source analytics efforts in R with support for Python and now Anaconda.

Microsoft and open source

If there’s a line to be drawn between Satya Nadella’s tenure as Microsoft CEO and that of his predecessors, it’s likely to be Microsoft’s relationship with the open source community. That’s not to say that Microsoft didn’t previously engage with the open source world, but in today’s cloud-first environment, working well with open source developers isn’t only good for code, it’s also a competitive advantage for Microsoft.

That improved relationship is also a two-way street. Microsoft is using open source in its tools, and it’s committing code back into the projects it uses. It’s begun to open-source its own platforms, in the shape of Visual Studio Code, .NET, PowerShell, and the Chakra JavaScript engine. But we’re now starting to see the other side of the relationship, where open source developers are taking dependencies on Microsoft’s open source projects.

Microsoft’s engineers now speak the language of open source, if not fluently, then certainly competently. They’re much more aware of the requirements of open source developers and have a flexibility to respond to them that wasn’t there in earlier eras. The result is a new set of relationships that, like the one with Anaconda, look set to do what Microsoft has always tried to do: make life easier for anyone building solutions on Microsoft’s own platforms.

Author of InfoWorld's Enterprise Microsoft blog, Simon Bisson prefers to think of “career” as a verb rather than a noun, having worked in academic and telecoms research, as well as having been the CTO of a startup, running the technical side of UK Online (the first national ISP with content as well as connections), before moving into consultancy and technology strategy. He’s built plenty of large-scale web applications, designed architectures for multi-terabyte online image stores, implemented B2B information hubs, and come up with next generation mobile network architectures and knowledge management solutions. In between doing all that, he’s been a freelance journalist since the early days of the web and writes about everything from enterprise architecture down to gadgets.
