Hey folks!
I'd like to open a can of worms: SQLAlchemy integration in Flask. It's a long read but I hope you'll like it. I try not to be on the ranty side.
First, some context: we have been plagued by tech debt for a very long time, maybe more than other development projects because web tech has evolved very quickly in the past 15 years (after the big flatline of "I need to be IE6-compatible", but I digress), and also because we're a small team maintaining a ton of apps.
In the past, we've had to switch web frameworks a couple of times. Turbogears 1, then Turbogears 2, then Flask, but we also have apps in CherryPy (to this day!) and Pyramid (and somewhat Django although we're not maintaining them). As a result we've grown very suspicious of framework integration because that means more work when we (inevitably) have to migrate to a different framework. I think it's partly why we chose Flask at the time: it's a minimalist framework, therefore there will be less integration bits to migrate away from. I don't agree entirely with this point of view but I think the standardization has done us a lot of good.
We've also standardized on SQLAlchemy for our DB, which is great. But we haven't standardized on how to integrate the two. There's an integration library called Flask-SQLAlchemy, but our apps don't use it (to my knowledge). I think that the reason is that Flask-SQLAlchmemy makes your models unusable without Flask, and that triggers our "what if the framework goes away" PTSD. Here's an example of what models look like with Flask-SQLAlchemy:
from flask import Flaskfrom flask_sqlalchemy import SQLAlchemy
app = Flask(__name__) app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db' db = SQLAlchemy(app) class User(db.Model): id = db.Column(db.Integer, primary_key=True) username = db.Column(db.String(80), unique=True, nullable=False
As you can see, the User class inherits from db.Model, something that the Flask-SQLAlchemy extension provides. Same for the Column and the DB types. I do understand why it would make us flinch to "tie" our models to the web framework, even indirectly. (that said, if we want to migrate those models from Flask-SQLAlchemy to native SQLAlchemy, it would be so trivial that plain sed can handle it)
So we've tried to do the SQLAlchemy integration in Flask ourselves. This has two downsides: - it's not as easy as it looks, especially when migrations and Alembic come into play - we end up with many slightly different integrations, written by different people or even the same person at different points in time. Ironically, our attempt at avoiding tech debt has caused us more tech debt.
Flask-SQLAlchemy has however some pretty strong points: - it's widely used and of good quality - it's maintained by the same people who maintain Flask (although the same cannot be said of Flask-Migrate, the Alembic integration) - it gives us some shortcuts very common in web apps, like get_or_404 or pagination support, that we also had to write ourselves with differing implementations.
But if we don't want to go this route, I've extracted a lightweight way of integrating SQLAlchemy in Flask that leaves the models usable without Flask. I think it's a reasonable concern to separate entirely the models from the web app framework, so I'm not advocating for Flask-SQLAlchemy here. I'm advocating against each app having its own integration.
On top of that, while I don't see us moving away from Flask anytime soon, I do see a reason why we would want to look at other web frameworks in the near future, and that reason is async support. Flask 2.x has grown support for async views, but it's not as good https://flask.palletsprojects.com/en/2.0.x/async-await/#when-to-use-quart-instead as other async-native frameworks such as FastAPI or Starlette, and extensions are unlikely to support it. That said, using async with SQLAlchemy requires doing away with quite a few convenience features of the ORM, such as implicit queries on model attributes, so moving to an async framework would require us to rewrite all the database access layer anyway. Maybe this is not a good example after all.
Anyway, this long email is about finding a common ground for SQLAlchemy integration in Flask, while taking into account our difficult experiences with webframewoks in the past, but not being locked in them. Is there something that I misrepresented here? Do you have opinions? Preferences?
Thanks!
Aurélien
On Mon, Dec 06, 2021 at 06:36:56PM +0100, Aurelien Bompard wrote:
Hey folks!
I'd like to open a can of worms: SQLAlchemy integration in Flask. It's a long read but I hope you'll like it. I try not to be on the ranty side.
...snip...
Anyway, this long email is about finding a common ground for SQLAlchemy integration in Flask, while taking into account our difficult experiences with webframewoks in the past, but not being locked in them. Is there something that I misrepresented here? Do you have opinions? Preferences?
I don't have much of a horse in this race, but I think standardizing our applications more is a really good thing. Doing that would allow us to make changes accross the board on apps instead of just here and there when something changes.
We have been using flask for quite a while, so I don't know that we need to fear being more tied to it.
kevin
On Mon, 2021-12-06 at 18:36 +0100, Aurelien Bompard wrote: <snip>
Anyway, this long email is about finding a common ground for SQLAlchemy integration in Flask, while taking into account our difficult experiences with webframewoks in the past, but not being locked in them. Is there something that I misrepresented here? Do you have opinions? Preferences?
So, full disclosure, I'm normally just lurking on this list and am not currently writing or maintaining code for the infrastructure team, so my 2¢ probably isn't worth much more than that.
Having said that, in my day job, I've been writing a Flask API to correspond with a massive database restructure using SQLAlchemy. When I started writing the API, I originally used Flask-SQLAlchemy for all the reasons you listed above. However, a couple of months ago I stripped it out for a couple of reasons.
1. We're using a clustered database (CockroachDB, for those who care) that uses optimistic concurrency, so automatic transaction retries are a must, and we need control over how those retries are done. 2. We are using the same models for a couple of different projects (the API itself and a script that is synchronizing between the old database and the new), and not all the projects are built on Flask. Initially, I was able to get the sync script working with Flask-SQLAlchemy, but things got ugly quickly when I started doing multithreading, so I abandoned it and am now using Flask and SQLAlchemy separately.
In short, Flask-SQLAlchemy does a great job of tying together Flask and SQLAlchemy if you're 100% sure that your project models will never be required outside of Flask. The minute you step outside of the Flask- SQLAlchemy way of doing things, things start to go very wrong very quickly.
Jonathan
Thanks for your input!
1. We're using a clustered database (CockroachDB, for those who care)
that uses optimistic concurrency, so automatic transaction retries are a must, and we need control over how those retries are done.
Interesting, we don't use that, but then again we've recently started using more funky stuff on the database side (TimescaleDB) so maybe one day...
- We are using the same models for a couple of different projects (the
API itself and a script that is synchronizing between the old database and the new), and not all the projects are built on Flask. Initially, I was able to get the sync script working with Flask-SQLAlchemy, but things got ugly quickly when I started doing multithreading, so I abandoned it and am now using Flask and SQLAlchemy separately.
When I thought about that use case, I supposed it would be OK to instantiate the app and start the app context from within the script, as it would also give you access to Flask's config file. But I did not think about multithreading. Would you recommend against creating the app instance and the app context in a command-line script?
Is the code you wrote to integrate Flask and SQLAlchemy opensource, and available somewhere?
Thanks again!
Aurélien
Sorry for taking so long to reply. I'm afraid I don't check this mailing list as often as I should. :)
On Tue, 2021-12-07 at 08:52 +0100, Aurelien Bompard wrote:
Thanks for your input!
- We're using a clustered database (CockroachDB, for those who
care) that uses optimistic concurrency, so automatic transaction retries are a must, and we need control over how those retries are done.
Interesting, we don't use that, but then again we've recently started using more funky stuff on the database side (TimescaleDB) so maybe one day...
Unfortunately CockroachDB has gone the route of MongoDB in its licensing, so it's not really open. YugabyteDB looks like it has most of the same features and is Apache 2.0 licensed, so would probably be a better fit for Fedora (and, if it wasn't for the fact that it's missing GIN indexes, we would probably be using it too).
- We are using the same models for a couple of different projects
(the API itself and a script that is synchronizing between the old database and the new), and not all the projects are built on Flask. Initially, I was able to get the sync script working with Flask-SQLAlchemy, but things got ugly quickly when I started doing multithreading, so I abandoned it and am now using Flask and SQLAlchemy separately.
When I thought about that use case, I supposed it would be OK to instantiate the app and start the app context from within the script, as it would also give you access to Flask's config file. But I did not think about multithreading. Would you recommend against creating the app instance and the app context in a command-line script?
Well, that was what I tried to do first, but, as I said, everything broke down when I tried to do multithreading (and got worse when I tried to setup multiprocessing). The problem is that Flask-SQLAlchemy tries to manage the DB session for you, and, since SQLAlchemy sessions aren't thread-safe, my command-line script kept crashing, and a few hours of poking around couldn't fix it. If I'd been willing to poke around more in Flask-SQLAlchemy's, I might have figured something out, but it just didn't seem to be worth the effort, when manually managing my sessions fixed the problem completely.
Is the code you wrote to integrate Flask and SQLAlchemy opensource, and available somewhere?
Unfortunately not, but there was actually very little integration code written.
Our code follows the following pattern (we're using Flask-RESTX, and I've omitted serializers to keep it simple):
endpoint: import business from util import run_transaction
@ns.route("/user/string:id") class UserLink: def get(self, id): return run_transaction(lambda s: business.get_user(s, id))
business: from database.model import *
def get_user(session, id): return session.query(User).filter(User.id == id).one()
util: from sqlalchemy import create_engine from sqlalchemy.orm import sessionmaker import sqlalchemy_cockroachdb
engine = create_engine('postgresql://admin:swordfish@localhost/')
SessionMaker = sessionmaker(engine)
def run_transaction(func): sqlalchemy_cockroachdb.run_transaction(SessionMaker, func)
The purpose of the run_transaction function is to repeat transactions if there's a conflict, rather than trying to lock the record, which is a CockroachDB paradigm.
I hope the above is at least somewhat helpful in explaining how we're working without Flask-SQLAlchemy
Jonathan
Sorry for taking so long to reply. I'm afraid I don't check this mailing list as often as I should. :)
Totally fine, thanks for the reply!
When I thought about that use case, I supposed it would be OK to
instantiate the app and start the app context from within the script, as it would also give you access to Flask's config file. But I did not think about multithreading. Would you recommend against creating the app instance and the app context in a command-line script?
Well, that was what I tried to do first, but, as I said, everything broke down when I tried to do multithreading (and got worse when I tried to setup multiprocessing). The problem is that Flask-SQLAlchemy tries to manage the DB session for you, and, since SQLAlchemy sessions aren't thread-safe, my command-line script kept crashing, and a few hours of poking around couldn't fix it. If I'd been willing to poke around more in Flask-SQLAlchemy's, I might have figured something out, but it just didn't seem to be worth the effort, when manually managing my sessions fixed the problem completely.
Yeah, maybe something like scoped_session()[1] could have help, but no point in rewriting history (unless there's a mistake in your git reflog ;-) ) Indeed your integration layer seems slim enough. Thanks!
[1] https://docs.sqlalchemy.org/en/14/orm/contextual.html
While most of our flask apps have command-line scripts, I don't think many of them go full multithreading or multiprocessing, with the notable exception of Fedora-Messaging listeners, since the callback is run in a thread. I may end up publishing the sqlalchemy integration layer that I have written so we can use it in our apps without re-inventing a slightly different wheel every time. It's one more project to keep track of, but it should reduce the overall amount of code we maintain and tech debt we have to deal with.
Thoughts? Name ideas before I pick some french word nobody can pronounce?
Aurélien
On Mon, Dec 6, 2021 at 6:37 PM Aurelien Bompard abompard@fedoraproject.org wrote:
Hey folks!
I'd like to open a can of worms: SQLAlchemy integration in Flask. It's a long read but I hope you'll like it. I try not to be on the ranty side.
First, some context: we have been plagued by tech debt for a very long time, maybe more than other development projects because web tech has evolved very quickly in the past 15 years (after the big flatline of "I need to be IE6-compatible", but I digress), and also because we're a small team maintaining a ton of apps.
In the past, we've had to switch web frameworks a couple of times. Turbogears 1, then Turbogears 2, then Flask, but we also have apps in CherryPy (to this day!) and Pyramid (and somewhat Django although we're not maintaining them). As a result we've grown very suspicious of framework integration because that means more work when we (inevitably) have to migrate to a different framework. I think it's partly why we chose Flask at the time: it's a minimalist framework, therefore there will be less integration bits to migrate away from. I don't agree entirely with this point of view but I think the standardization has done us a lot of good.
We've also standardized on SQLAlchemy for our DB, which is great. But we haven't standardized on how to integrate the two. There's an integration library called Flask-SQLAlchemy, but our apps don't use it (to my knowledge). I think that the reason is that Flask-SQLAlchmemy makes your models unusable without Flask, and that triggers our "what if the framework goes away" PTSD. Here's an example of what models look like with Flask-SQLAlchemy:
from flask import Flaskfrom flask_sqlalchemy import SQLAlchemy
app = Flask(__name__) app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db' db = SQLAlchemy(app) class User(db.Model): id = db.Column(db.Integer, primary_key=True) username = db.Column(db.String(80), unique=True, nullable=False
As you can see, the User class inherits from db.Model, something that the Flask-SQLAlchemy extension provides. Same for the Column and the DB types. I do understand why it would make us flinch to "tie" our models to the web framework, even indirectly. (that said, if we want to migrate those models from Flask-SQLAlchemy to native SQLAlchemy, it would be so trivial that plain sed can handle it)
So we've tried to do the SQLAlchemy integration in Flask ourselves. This has two downsides:
- it's not as easy as it looks, especially when migrations and Alembic
come into play
- we end up with many slightly different integrations, written by
different people or even the same person at different points in time. Ironically, our attempt at avoiding tech debt has caused us more tech debt.
One approach could be to build your data-models as a dedicated python module. This will require being more rigorous on writing generic objects and binding SQLAlchemy there, and specifically closely tracking code change.
Also you might want to check how deep SQLAlchemy is tied to your project (model, relationship, backrefs, sub_query, join, clause function, sesion, bulk_insert, etc) to see if this is such a big deal. Do all the apps require relational db or not, or even at which percent?
Flask-SQLAlchemy has however some pretty strong points:
- it's widely used and of good quality
- it's maintained by the same people who maintain Flask (although the same
cannot be said of Flask-Migrate, the Alembic integration)
- it gives us some shortcuts very common in web apps, like get_or_404 or
pagination support, that we also had to write ourselves with differing implementations.
But if we don't want to go this route, I've extracted a lightweight way of integrating SQLAlchemy in Flask that leaves the models usable without Flask. I think it's a reasonable concern to separate entirely the models from the web app framework, so I'm not advocating for Flask-SQLAlchemy here. I'm advocating against each app having its own integration.
On top of that, while I don't see us moving away from Flask anytime soon, I do see a reason why we would want to look at other web frameworks in the near future, and that reason is async support. Flask 2.x has grown support for async views, but it's not as good https://flask.palletsprojects.com/en/2.0.x/async-await/#when-to-use-quart-instead as other async-native frameworks such as FastAPI or Starlette, and extensions are unlikely to support it. That said, using async with SQLAlchemy requires doing away with quite a few convenience features of the ORM, such as implicit queries on model attributes, so moving to an async framework would require us to rewrite all the database access layer anyway. Maybe this is not a good example after all.
I use since a while Aiohttp coupled with aiohttp-slqalchemy. those two work pretty well. might want to look them up.
Anyway, this long email is about finding a common ground for SQLAlchemy integration in Flask, while taking into account our difficult experiences with webframewoks in the past, but not being locked in them. Is there something that I misrepresented here? Do you have opinions? Preferences?
Thanks!
Aurélien
infrastructure mailing list -- infrastructure@lists.fedoraproject.org To unsubscribe send an email to infrastructure-leave@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedorapro... Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
- we end up with many slightly different integrations, written by
different people or even the same person at different points in time. Ironically, our attempt at avoiding tech debt has caused us more tech debt.
One approach could be to build your data-models as a dedicated python module.
I mean different implementations of the integration between SQLAlchemy and Flask, in our applications. Not different implementations of the models, as each app has its own and that's expected. I'm talking about the setting up of the DB, reading the URI from the config file, the script that calls MetaData.create_all() and then alembic.commands.stamp(), the creation of the session, the removal of the session when the request is done, the activation of foreign keys on SQLite, ways to check that the schema is up-to-date, etc etc. All this stuff that all of our DB-using apps need, and almost all of them have done differently.
Also you might want to check how deep SQLAlchemy is tied to your project (model, relationship, backrefs, sub_query, join, clause function, sesion, bulk_insert, etc) to see if this is such a big deal.
I think it's very very tied to the applications. You'll find SQLAlchemy-specific code in the model methods and in the views, but the app is also usually manipulating the model instances, calling their attributes on-demand, etc.
Do all the apps require relational db or not, or even at which percent?
I think a lot of them do, I'd say that non-DB using web apps are the exception (like webhook2fedmsg handlers)
I use since a while Aiohttp coupled with aiohttp-slqalchemy. those two work pretty well. might want to look them up.
Interesting, I haven't tried those two. I don't suppose it lets you use SQLAlchemy's transparent attribute access in a non-blocking way? For example when calling Child.parents, SQLAlchemy will emit a query and populate the parents attribute with the result list, but that's a DB query and in the async world it should not block.
Thanks!
Aurélien
On Monday, December 6, 2021 6:36:56 PM CET Aurelien Bompard wrote:
Hey folks!
I'd like to open a can of worms: SQLAlchemy integration in Flask. It's a long read but I hope you'll like it. I try not to be on the ranty side.
First, some context: we have been plagued by tech debt for a very long time, maybe more than other development projects because web tech has evolved very quickly in the past 15 years (after the big flatline of "I need to be IE6-compatible", but I digress), and also because we're a small team maintaining a ton of apps.
In the past, we've had to switch web frameworks a couple of times. Turbogears 1, then Turbogears 2, then Flask, but we also have apps in CherryPy (to this day!) and Pyramid (and somewhat Django although we're not maintaining them). As a result we've grown very suspicious of framework integration because that means more work when we (inevitably) have to migrate to a different framework. I think it's partly why we chose Flask at the time: it's a minimalist framework, therefore there will be less integration bits to migrate away from. I don't agree entirely with this point of view but I think the standardization has done us a lot of good.
We've also standardized on SQLAlchemy for our DB, which is great. But we haven't standardized on how to integrate the two. There's an integration library called Flask-SQLAlchemy, but our apps don't use it (to my knowledge).
FTR, Copr uses Flask-SQLAlchemy.
Pavel
I think that the reason is that Flask-SQLAlchmemy makes your models unusable without Flask, and that triggers our "what if the framework goes away" PTSD. Here's an example of what models look like with Flask-SQLAlchemy:
from flask import Flaskfrom flask_sqlalchemy import SQLAlchemy
app = Flask(__name__) app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db' db = SQLAlchemy(app) class User(db.Model): id = db.Column(db.Integer, primary_key=True) username = db.Column(db.String(80), unique=True, nullable=False
As you can see, the User class inherits from db.Model, something that the Flask-SQLAlchemy extension provides. Same for the Column and the DB types. I do understand why it would make us flinch to "tie" our models to the web framework, even indirectly. (that said, if we want to migrate those models from Flask-SQLAlchemy to native SQLAlchemy, it would be so trivial that plain sed can handle it)
So we've tried to do the SQLAlchemy integration in Flask ourselves. This has two downsides:
- it's not as easy as it looks, especially when migrations and Alembic come
into play
- we end up with many slightly different integrations, written by different
people or even the same person at different points in time. Ironically, our attempt at avoiding tech debt has caused us more tech debt.
Flask-SQLAlchemy has however some pretty strong points:
- it's widely used and of good quality
- it's maintained by the same people who maintain Flask (although the same
cannot be said of Flask-Migrate, the Alembic integration)
- it gives us some shortcuts very common in web apps, like get_or_404 or
pagination support, that we also had to write ourselves with differing implementations.
But if we don't want to go this route, I've extracted a lightweight way of integrating SQLAlchemy in Flask that leaves the models usable without Flask. I think it's a reasonable concern to separate entirely the models from the web app framework, so I'm not advocating for Flask-SQLAlchemy here. I'm advocating against each app having its own integration.
On top of that, while I don't see us moving away from Flask anytime soon, I do see a reason why we would want to look at other web frameworks in the near future, and that reason is async support. Flask 2.x has grown support for async views, but it's not as good https://flask.palletsprojects.com/en/2.0.x/async-await/#when-to-use-quart-instead as other async-native frameworks such as FastAPI or Starlette, and extensions are unlikely to support it. That said, using async with SQLAlchemy requires doing away with quite a few convenience features of the ORM, such as implicit queries on model attributes, so moving to an async framework would require us to rewrite all the database access layer anyway. Maybe this is not a good example after all.
Anyway, this long email is about finding a common ground for SQLAlchemy integration in Flask, while taking into account our difficult experiences with webframewoks in the past, but not being locked in them. Is there something that I misrepresented here? Do you have opinions? Preferences?
Thanks!
Aurélien
infrastructure@lists.fedoraproject.org