Python “preeminent” in O’Reilly learning platform usage analysis
The more you know
Anyone doing much work with data these days knows Python is hot, and that it’s become the programming language of choice for data science. So while the news of Python’s popularity may not be earth-shattering, there’s lots of interesting data that goes beyond that headline. For example, it’s not just that Python is popular; it’s that it’s still growing, in comparison to a similar analysis last year of search and usage data from 2018. And in a one-two punch, interest in Python’s “rivals” is decreasing.
One place Python code shows up frequently is in notebooks, and another language that shows up frequently in notebooks, especially for code running on Apache Spark, is Scala. But based on search frequency, 2019 interest in Scala was down over 20% from 2018 levels, indicating that Python may be making inroads in the data engineering world, where Scala has been popular. And interest in the R programming language, which arguably had been Python’s chief competitor in the machine learning (ML) realm, is down from 2018 too, almost as much as Scala’s. Even beyond the data world, Python’s increased popularity is noteworthy, given that interest in Java, JavaScript, C# and C++ is on the decline in O’Reilly’s study.
Other categories
Other categories on the upswing include data engineering, data science, and AI + ML generally, and Kafka specifically. Even interest in SQL showed growth. Meanwhile, interest in data management and Spark is down, and interest in Hadoop has dropped even more precipitously. According to O’Reilly’s report on the data, “Hadoop and its ecosystem of related projects (such as Hive) are in the midst of a protracted, years-long decline.” In fact, year-over-year, Hadoop and Hive are each down 34%, and Spark is down by 21%.
Cloud’s growing in general. Among the major public cloud providers, AWS, Azure and Google Cloud Platform (GCP) come in first, second and third in platform content usage, respectively, and all three are growing. But the growth in their numbers ranks in the opposite order, with usage of GCP-related content growing at almost 40%, Azure at 30% and AWS at 15%. And in the different, but related, world of container technology, content usage on Docker is down slightly while that of Kubernetes is up almost 40%.
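To make the "opposite order" point concrete, here is a minimal Python sketch that ranks the three providers by usage and by growth. The growth figures are the approximate ones quoted above ("almost 40%" is rounded to 40), and the dictionary layout and field names are just an illustrative assumption, not anything from O’Reilly’s data.

```python
# Usage rank (1 = most-used content) and approximate year-over-year growth,
# taken from the figures quoted in the article. "Almost 40%" is rounded to 40.
providers = {
    "AWS": {"usage_rank": 1, "growth_pct": 15},
    "Azure": {"usage_rank": 2, "growth_pct": 30},
    "GCP": {"usage_rank": 3, "growth_pct": 40},
}

# Order providers from most-used to least-used content.
by_usage = sorted(providers, key=lambda name: providers[name]["usage_rank"])

# Order providers from fastest-growing to slowest-growing content.
by_growth = sorted(
    providers, key=lambda name: providers[name]["growth_pct"], reverse=True
)

print("By usage: ", by_usage)
print("By growth:", by_growth)
```

Sorting on the two fields yields the same three names in exactly reversed order, which is the pattern the paragraph describes.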
Speaking more broadly, 2019 seems to have been a year for maturity and rigor. Usage of O’Reilly’s security-related content was up 26%. Enterprise architecture, though small in absolute terms, was up roughly 50%. Architectural patterns and serverless architecture were up significantly as well. Interestingly, DevOps, while still the top infrastructure + ops topic, was down 5% year-over-year.
Extensibility and popularity
Meanwhile, back to Python. The core language is a versatile one, but the real fuel for its momentum may be its extensibility and huge ecosystem support, resulting in a dizzying array of packages that extend the language. That’s certainly true in the machine learning arena, where Python packages like NumPy, Pandas, Scikit-learn and PyTorch seem to make up much of the de facto ML stack.
As a result, Python sample code is everywhere. And while I’m not much of a Python developer, I find Python code easy to read, and even to modify. That’s both good and bad: it means even novices may find the language approachable, but it may also mean that a chunk of the Python code out there isn’t the best in terms of quality, performance or efficient use of the language and its packages. All the more reason, then, for the industry to be focusing on architecture, design and good development practices. Let’s all hope interest in those topics of technological robustness stays strong and grows in 2020.