On Wed, Dec 4, 2019 at 4:58 PM Andrew Engelbrecht <andrew(a)fsf.org> wrote:
> Hello Pagure devel list,
> Sorry if this isn't quite the right place for this question.
> The FSF is looking into hosting our own Web-based SCM service, possibly
> using Pagure. We would like to know how the pagure.io team handles spam,
> abuse, and overly large repos on your site. We expect that this could be
> an issue, and that it would be critical for keeping our new site alive
> and well. Any input on this topic would be very helpful.
There are two issues with "large" Pagure systems that you might be
concerned with:
* Gitolite configuration regeneration is slow with lots of
repositories. In the default configuration, Gitolite is the backend
used for managing Git ACLs. Pagure has an alternative integrated
backend that is much more performant. This will likely be better for
heavier instances. We are exploring changing Pagure's default from
Gitolite to the integrated backend to simplify
installation/configuration and improve performance. The Pagure servers
run by Fedora (src.fedoraproject.org, git.centos.org, pagure.io) don't
use Gitolite anymore for this reason. This is configurable in
/etc/pagure/pagure.cfg (as installed by the Fedora, Mageia, or
openSUSE packages) as documented[1].
* Very large repos tend to be mostly fine in Pagure. The slowest
actions are currently listing commits and computing the various stats.
This is because that data is more-or-less processed on demand, and
really large repos like the Linux kernel tend to be a struggle to
process on-demand. Even GitHub and GitLab struggle with some of this;
the difference is that we don't have any background tasks that do this
and cache the results regularly, so it's more obvious. It does work
and doesn't cause the backend to die; it just takes longer to compute
them than for your average repository.
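For the first point, here is a minimal sketch of what the switch might look like in /etc/pagure/pagure.cfg, which is read as a Python file. GIT_AUTH_BACKEND is the key covered by the documentation section linked in [1]; the "pagure" value below is my assumption for the name of the integrated backend, so please check it against the documentation for the version you deploy.

```python
# /etc/pagure/pagure.cfg uses Python syntax, so settings are plain assignments.

# Assumption: "pagure" selects the integrated (non-Gitolite) ACL backend.
# Confirm the accepted values in the configuration docs for your release.
GIT_AUTH_BACKEND = "pagure"
```

The Pagure services would need a restart after changing this for it to take effect.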
I also found this page about the forge evaluations on LibrePlanet[2],
and I think I can provide some answers to your questions there (as I
run a few Pagure instances myself and I'm the maintainer of the pagure
package in Fedora, Mageia, and openSUSE).
> There is only a top level namespace, so there might be some conflicts
> in terms of who reserves a repo name first. With the import/export
> functionality, at least a fork does not need to be stuck under /fork.
This is actually configurable. You can set this in
/etc/pagure/pagure.cfg with the USER_NAMESPACE option as
documented[3].
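As a rough sketch, assuming the option is a simple boolean that is off by default, the setting would look something like this in pagure.cfg:

```python
# Assumption: USER_NAMESPACE is a boolean toggle. When enabled, new
# projects are placed under the owner's username instead of the single
# top-level namespace, much like github.com or gitlab.com do.
USER_NAMESPACE = True
```

I would expect this to affect only newly created projects, but that is worth verifying on a test instance first.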
> Very bad usability with no js from site (gnu criteria A): login,
> logout, create issue etc. Would need to investigate adding
> functionality without js. Could package js into an extension, but this
> is not ideal for users.
At least for login and logout, this depends on the auth backend used.
Fedora's instances all leverage OpenID Connect through Ipsilon[4], and
our theme does include JavaScript. You could just as easily support a
JS-free login with Ipsilon. Or, if you want to use Pagure's local auth
backend, there is no JavaScript in the register/login/logout flow that
I recall. As for issues/PRs/etc., it would be possible to restore
JS-free usability. There is an RFE for this[5], but no work has been
done. Pull requests to improve this are welcome!
> Ease of deployment, upgrading, debugging
The basic setup of Pagure is pretty easy. I've documented quick-start
processes in the Fedora, Mageia, and openSUSE packages[6][7][8] that
do reliably work. These are just simplified versions of pingou's blog
post about setting up Pagure on a Banana Pi[9], adapted for the distro
and the newest versions of Pagure. The Pagure documentation does a
good job of explaining deployment strategies[10] in more detail.
Upgrade procedures mostly consist of updating the files and restarting
the services. In cases where database migrations or other steps are
required, they are documented[11].
> Resource requirements / performance under load
A heavily loaded system will probably want to configure Celery with
separate queues for processing various types of tasks[12]. This will
help mitigate issues with the user experience. In addition, you will
want to run Redis on a separate server[13] and use a remote
SQL server[14] to eliminate the contention between Pagure and the
database servers. Fedora uses a separate PostgreSQL server for its
instances, while CentOS uses a separate MariaDB server, so either works.
I personally use PostgreSQL. You may also want to consider using
repoSpanner[15][16] to have distributed storage for Git. This is a
somewhat extreme option, but it is an option for scaling out Pagure.
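To make the queue, Redis, and database points a bit more concrete, here is a hedged sketch of the relevant pagure.cfg settings. The key names are the ones I believe the configuration documentation uses for these sections[12][13][14]; the hostnames, credentials, and queue names are placeholders I made up for illustration.

```python
# Remote SQL server, given as an SQLAlchemy-style URL.
# Hypothetical host and credentials; replace with your own.
DB_URL = "postgresql://pagure:CHANGE_ME@db.example.org/pagure"

# Redis on its own machine (hypothetical hostname).
REDIS_HOST = "redis.example.org"
REDIS_PORT = 6379
REDIS_DB = 0

# Separate Celery queues so slow jobs do not hold up quick ones.
# The queue names are arbitrary labels chosen for this example.
FAST_CELERY_QUEUE = "fast_workers"
MEDIUM_CELERY_QUEUE = "medium_workers"
SLOW_CELERY_QUEUE = "slow_workers"
```

Setting the queue names alone is not enough: each queue needs at least one worker process consuming from it, so the worker services have to be set up to match.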
> How featureful are the administration interface/tools (GUI or CLI)
The pagure-admin CLI tool is the primary administrative interface, and
provides a relatively intuitive way to do administrative actions for a
Pagure server. I honestly don't have to use it much, but the
admin-centric functionality is present there.
Hopefully this additional information helps!
[1]: https://docs.pagure.org/pagure/configuration.html#git-auth-backend
[2]: https://libreplanet.org/wiki/Fsf_2019_forge_evaluation
[3]: https://docs.pagure.org/pagure/configuration.html#user-namespace
[4]: https://ipsilon-project.org/
[5]: https://pagure.io/pagure/issue/3507
[6]: https://src.fedoraproject.org/rpms/pagure/blob/master/f/pagure-README.Fedora
[7]: http://svnweb.mageia.org/packages/cauldron/pagure/current/SOURCES/pagure-...
[8]: https://build.opensuse.org/package/view_file/openSUSE:Factory/pagure/pagu...
[9]: http://blog.pingoured.fr/index.php?post/2016/01/05/Setting-up-pagure-on-a...
[10]: https://docs.pagure.org/pagure/install.html
[11]: https://pagure.io/pagure/blob/master/f/UPGRADING.rst
[12]: https://docs.pagure.org/pagure/configuration.html#celery-queue-options
[13]: https://docs.pagure.org/pagure/configuration.html#redis-options
[14]: https://docs.pagure.org/pagure/configuration.html#db-url
[15]: https://github.com/repoSpanner/repoSpanner
[16]: https://docs.pagure.org/pagure/configuration.html#repospanner-options
--
真実はいつも一つ!/ Always, there's only one truth!