We should work on creating benchmarks for crucial and performance-heavy tasks, so that we can track performance over time and easily see how a change improves or degrades it.
We can even add a CI step at a later stage that automatically reports results on PRs.
All benchmarks should mock their network requests (except perhaps a few specific benchmarks that test download speeds). They should basically be treated like unit tests: anything external that may change and affect the benchmark results should be mocked.
For example, when a benchmark builds the dependency tree for a package, that tree should be identical across benchmark runs so that comparisons stay accurate; dependent packages get updated over time, so the metadata they are built from has to be frozen, as in the sketch below.
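A minimal sketch of the mocking idea, assuming a module along the lines of `mypkg.resolver` with `fetch_metadata` and `build_dependency_tree` functions (both hypothetical names standing in for whatever the real code exposes); the frozen fixture is a hand-captured snapshot committed alongside the benchmark:

```python
from unittest import mock

# Snapshot of package metadata captured once and committed with the benchmark,
# so the dependency tree being built never changes between runs.
FROZEN_METADATA = {
    "requests": ["urllib3", "idna", "charset-normalizer", "certifi"],
    "urllib3": [],
    "idna": [],
    "charset-normalizer": [],
    "certifi": [],
}


def fake_fetch_metadata(package_name):
    # Serve the frozen snapshot instead of querying the package index.
    return FROZEN_METADATA[package_name]


def build_tree_benchmark():
    # Patch the network-facing call for the duration of the benchmark.
    with mock.patch("mypkg.resolver.fetch_metadata", side_effect=fake_fetch_metadata):
        from mypkg.resolver import build_dependency_tree  # hypothetical API
        build_dependency_tree("requests")
```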
We can utilize tools like pyperf to manage and run the benchmarks.
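For instance, a pyperf runner wrapping the hypothetical `build_tree_benchmark` function from the sketch above could look like this; running the file with `-o result.json` writes results that can later be compared with `python -m pyperf compare_to old.json new.json`:

```python
import pyperf


def main():
    # pyperf handles warmups, process spawning, and statistics for us;
    # we only register the function to measure.
    runner = pyperf.Runner()
    runner.bench_func("build_dependency_tree", build_tree_benchmark)


if __name__ == "__main__":
    main()
```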