this set by intersecting the time intervals of the updates
to package.json with the release dates of the packages.
We find that on average, within the lifetime of a commit,
a package query will resolve to 1.88 different versions.
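As an illustration of the interval intersection described above, the following Python sketch counts the versions a single query resolves to during the lifetime of a commit. It is not the tooling used in this study: the function names, the release history, and the simplified matcher (exact, tilde, and caret queries only, for major versions of at least 1) are all assumptions made for the example.

```python
from datetime import date


def parse(version):
    """Turn 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, query):
    """Simplified matcher (assumption): exact, '~' and '^' queries only."""
    v = parse(version)
    if query.startswith("^"):   # same major version, not older than the bound
        low = parse(query[1:])
        return v >= low and v[0] == low[0]
    if query.startswith("~"):   # same major.minor, not older than the bound
        low = parse(query[1:])
        return v >= low and v[:2] == low[:2]
    return v == parse(query)


def versions_resolved(query, releases, start, end):
    """Versions that `query` resolves to while a commit is live in [start, end).

    `releases` is a list of (version, release_date) pairs.
    """
    resolved = []
    best = None
    for version, released in sorted(releases, key=lambda r: r[1]):
        if released >= end:
            break
        if not satisfies(version, query):
            continue
        if best is not None and parse(version) <= parse(best):
            continue  # a late release of an older version changes nothing
        best = version
        if released <= start:
            resolved = [best]      # resolution at the time of the commit
        else:
            resolved.append(best)  # a release during the commit's lifetime
    return resolved


# Hypothetical release history, for illustration only.
RELEASES = [
    ("1.0.0", date(2015, 1, 1)),
    ("1.0.1", date(2015, 2, 1)),
    ("1.1.0", date(2015, 3, 1)),
    ("2.0.0", date(2015, 4, 1)),
]

lifetime = (date(2015, 1, 15), date(2015, 3, 15))
for q in ("1.0.0", "~1.0.0", "^1.0.0"):
    print(q, len(versions_resolved(q, RELEASES, *lifetime)))
```

Averaging such counts over all queries and commits would give a figure comparable to the one reported above, under the stated simplifications.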
Finally, our data also allows us to answer a ques-
tion that package developers may find crucial as they
issue releases: given that many package consumers use
flexible queries, what fraction of them will obtain a
new version when it is released, without changing their
package.json? We call this measure “implicit adoption
ratio”, and obtain it by computing, for each package ver-
sion at its release date, the size of the set of projects re-
solving to the latest version divided by the size of the
set of projects resolving to any of the versions. Fig-
ure 12 shows the implicit adoption trends for the popular
express package for building web applications. Note
that the second part of the graph indicates that releases
are continually issued for both the 3.x.x and 4.x.x ver-
sion families. From it we get several insights: first, patch
versions have higher implicit adoption ratios than mi-
nor versions, which have higher ratios than the two ma-
jor versions visible in the chart. This is explained by
the tendency to adopt version queries which minimize
incompatible updates. Second, as new releases come out
in the 4.x.x family, the implicit adoption ratio increases, indi-
cating that the fraction of projects configured to accept
these new releases grows over time. Finally and as a
complement to the second observation, the fraction of
projects implicitly resolving to the latest version in the
3.x.x family shrinks gradually and decisively over time.
The last two points can be explained by a combi-
nation of 1) the continuously growing number of projects
using express, which tend to use the latest version when
they are created (not visible on the graph), and 2) exist-
ing express projects that migrate to the 4.x.x series when
they can afford to.
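The definition of the implicit adoption ratio can be made concrete with a small Python sketch. It rests on assumptions that are not taken from the study: the helper names are hypothetical, the version list and project queries are invented, only exact, tilde, and caret queries (for major versions of at least 1) are handled, and "latest version" is read as the version being released.

```python
def parse(version):
    """Turn 'MAJOR.MINOR.PATCH' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, query):
    """Simplified matcher (assumption): exact, '~' and '^' queries only."""
    v = parse(version)
    if query.startswith("^"):   # same major version, not older than the bound
        low = parse(query[1:])
        return v >= low and v[0] == low[0]
    if query.startswith("~"):   # same major.minor, not older than the bound
        low = parse(query[1:])
        return v >= low and v[:2] == low[:2]
    return v == parse(query)


def resolve(query, available):
    """Highest available version matching the query, or None."""
    matching = [v for v in available if satisfies(v, query)]
    return max(matching, key=parse) if matching else None


def implicit_adoption_ratio(new_version, available, project_queries):
    """Share of resolving projects that pick up `new_version` unchanged.

    `available` lists the versions published at the release date,
    including `new_version`; `project_queries` holds one query per project.
    """
    resolved = [resolve(q, available) for q in project_queries]
    resolving = [r for r in resolved if r is not None]
    if not resolving:
        return 0.0
    return sum(r == new_version for r in resolving) / len(resolving)


# Hypothetical data: versions on the day 4.0.1 is released, and the
# queries found in five projects' package.json files on that day.
AVAILABLE = ["3.0.0", "3.1.0", "4.0.0", "4.0.1"]
QUERIES = ["^3.0.0", "~3.1.0", "^4.0.0", "~4.0.0", "4.0.0"]

print(implicit_adoption_ratio("4.0.1", AVAILABLE, QUERIES))
```

In this invented example, two of the five projects (those using ^4.0.0 and ~4.0.0) receive the patch release without editing package.json, while the pinned project and the two 3.x.x projects do not.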
Takeaways. Through declaring dependencies with
queries, application developers can benefit from auto-
mated upgrades, at various levels of granularity. This
mechanism is used widely in practice, and new releases,
particularly patch ones, have high implicit immediate
adoption ratios.
6 Related Work
Empirical analysis of software ecosystems is an im-
portant aspect of software ecosystem research as a
whole [26]. Correspondingly, related work focuses on
specific aspects like visualization [13], depicting ecosys-
tem maturity [1], or how to aggregate software quality
metrics [17].
Some works empirically analyze software ecosys-
tems that evolve around specific programming lan-
guages, as we did for npm. Raemaekers et al. present a
crawled dataset containing basic metrics, dependencies,
and changes with some aggregate statistics about Maven,
a popular package manager for Java [23]. Another
work runs software to identify bugs in source code of li-
braries shared in the same ecosystem [16]. In contrast to
these works, our study of the npm ecosystem focuses on
the ecosystem evolution, popularity measures, and pack-
age versioning. An analysis of the statistical computing
project R [8] finds super-linear growth in packages, as
we report for npm in Section 3. In addition, the study focuses
on characterizing contributions to user-contributed ver-
sus core packages. We refrain from running a similar
analysis as npm does not differentiate packages explic-
itly in such a way, although we did identify different
types of packages based on our analysis of popularity
measures (see Section 4.2). In [11], the authors present
results of a quantitative study of the Ruby ecosystem.
The paper presents a graph visualization of the whole
ecosystem as well as some descriptive statistics and his-
tograms about selected characteristics of packages, in-
cluding downloads and package size. In contrast to our
work, the dataset is much smaller, having only around
10K gem nodes and 13.1K dependencies. Furthermore,
the paper does not go into the dynamics of the ecosystem,
considering instead a single point in time. We did not
find any published empirical analyses of the npm ecosys-
tem.
Some works have studied the evolution of versions and
the corresponding changes in software projects. For example,
in a recent empirical study [5] regarding two ecosystems
(including npm) the authors find that developers struggle
with changing versions as they might break dependent
code. Similar assessments on the effects of changes have
been made regarding the Apache ecosystem [2] or the
Maven ecosystem [24]. In contrast to these works, we
assess versions in npm from a black-box perspective: we
do not assess how version changes are reflected in the
implementation of individual packages, but focus on the
occurrence of version numbers and how they are adopted
by application developers.
Finally, npm has occasionally been analyzed outside
of peer-reviewed venues. The project npm packages pager-
ank provides a keyword-based search for packages, and
presents the results as recommendations based on their
PageRank [21]. While we also consider the PageRank as
a possible popularity measure, we have shown that this
metric may not be adequate for packages most useful to
application developers (Section 4.2). The project npm
by numbers analyzes a snapshot of the npm ecosystem
from September 2015 and presents various statistics on
it, including the distribution of version numbers and re-
leases of packages and the dependencies between pack-
ages [25]. In contrast to our work, npm by numbers con-