Db engine cache refactor by PGijsbers · Pull Request #328 · openml/server-api

PGijsbers · 2026-05-08T10:26:58Z

Use functools to cache the database engine instead of private module variables.

coderabbitai · 2026-05-08T10:27:10Z

Walkthrough

This PR refactors database engine lifecycle management in src/database/setup.py. A new _create_engine helper consolidates SQLAlchemy async engine construction with URL logging and standard configuration (echo and pool_recycle=3600). The user_database() and expdb_database() accessor functions replace manual module-level singleton variables with @functools.cache decoration, caching AsyncEngine instances per configuration. The close_databases() function was updated to dispose engines by checking cached accessor function state and clearing caches, removing direct global variable access.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Db engine cache refactor' accurately captures the main change - refactoring database engine caching from module variables to functools.cache.
Description check	✅ Passed	The description directly relates to the changeset, explaining the shift from private module variables to functools for database engine caching.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch db-refactor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

sourcery-ai

Hey - I've found 2 issues, and left some high level feedback:

In close_databases, you call db() twice per iteration (for url and dispose()), which is unnecessary; consider assigning engine = db() once and using engine for both logging and disposal before clearing the cache.
The new log line in _create_engine logs the full db_url, which may include credentials; consider using db_url.render_as_string(hide_password=True) or similar to avoid leaking sensitive information in logs.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `close_databases`, you call `db()` twice per iteration (for `url` and `dispose()`), which is unnecessary; consider assigning `engine = db()` once and using `engine` for both logging and disposal before clearing the cache.
- The new log line in `_create_engine` logs the full `db_url`, which may include credentials; consider using `db_url.render_as_string(hide_password=True)` or similar to avoid leaking sensitive information in logs.

## Individual Comments

### Comment 1
<location path="src/database/setup.py" line_range="20" />
<code_context>
         database=db_config.database,
     )
+
+    logger.info("Creating database engine for {db_url}", db_url=db_url)
     return create_async_engine(
         db_url,
</code_context>
<issue_to_address>
**🚨 issue (security):** Avoid logging the full DB URL as it may expose credentials or other sensitive details.

`db_url`’s string/repr may include the username and password. Please log only non-sensitive parts (e.g., driver, host, database name) by masking the password (e.g., `db_url.set(password="***")`) or constructing a sanitized string instead.
</issue_to_address>

### Comment 2
<location path="src/database/setup.py" line_range="42" />
<code_context>
-        _expdb_engine = None
+    for db in (user_database, expdb_database):
+        if db.cache_info().currsize == 1:
+            logger.info("Disposing of engine connected to {db_url}", db_url=db().url)
+            await db().dispose()
+            db.cache_clear()
</code_context>
<issue_to_address>
**🚨 issue (security):** The dispose log message may leak the DB URL; consider masking as with the creation log.

As with the creation log, `db().url` can include credentials and internal host details. Please log a sanitized/partial value instead (e.g., mask password or log only host and database name).
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/database/setup.py`:
- Line 20: The info-level logs currently print the full db_url via
logger.info("Creating database engine for {db_url}", db_url=db_url) (and the
similar call around line 42); change these to avoid exposing the full URL:
either log a redacted identifier/alias (e.g., derive a short db_name or masked
string from db_url) at info level, and move the full db_url logging to
logger.debug(...), or simply change the present logger.info(...) calls to
logger.debug(...) and add a new logger.info(...) that prints only the redacted
alias; update both occurrences that reference db_url.
- Around line 40-44: The loop over (user_database, expdb_database) calls db()
multiple times and can abort if await db().dispose() raises; update the loop to
call db() once per iteration (e.g., engine = db()), check engine and
cache_info(), then perform await engine.dispose() inside a try/except to isolate
errors per engine and always call db.cache_clear() in a finally block so the
cache is cleared even if dispose fails; reference the existing symbols
user_database, expdb_database, db.cache_info(), await ...dispose(), and
db.cache_clear().

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ff4b47cb-c9f0-40d5-8e23-95284baed773

📥 Commits

Reviewing files that changed from the base of the PR and between 1d44c05 and 8a7beea.

📒 Files selected for processing (1)

src/database/setup.py

coderabbitai · 2026-05-08T10:28:29Z

        database=db_config.database,
    )
+
+    logger.info("Creating database engine for {db_url}", db_url=db_url)


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reduce database URL exposure in info logs.

Line 20 and Line 42 log connection URLs at info level. Even with password masking, this can leak infrastructure/user metadata into normal production logs. Prefer a redacted identifier (e.g., db name alias) or move full URL logging to debug.

Also applies to: 42-42

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/database/setup.py` at line 20, The info-level logs currently print the full db_url via logger.info("Creating database engine for {db_url}", db_url=db_url) (and the similar call around line 42); change these to avoid exposing the full URL: either log a redacted identifier/alias (e.g., derive a short db_name or masked string from db_url) at info level, and move the full db_url logging to logger.debug(...), or simply change the present logger.info(...) calls to logger.debug(...) and add a new logger.info(...) that prints only the redacted alias; update both occurrences that reference db_url.

coderabbitai · 2026-05-08T10:28:29Z

+    for db in (user_database, expdb_database):
+        if db.cache_info().currsize == 1:
+            logger.info("Disposing of engine connected to {db_url}", db_url=db().url)
+            await db().dispose()
+            db.cache_clear()


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make shutdown resilient per engine and always clear cache state.

If await db().dispose() fails for one engine, the loop exits early and the other cached engine is not disposed. Also, db() is called multiple times in the same iteration. Capture the engine once, dispose with error isolation, and clear cache in finally.

Proposed fix

async def close_databases() -> None: """Close all database connections.""" for db in (user_database, expdb_database): if db.cache_info().currsize == 1: - logger.info("Disposing of engine connected to {db_url}", db_url=db().url) - await db().dispose() - db.cache_clear() + engine = db() + try: + logger.info("Disposing database engine") + await engine.dispose() + except Exception: + logger.exception("Failed disposing database engine") + finally: + db.cache_clear()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/database/setup.py` around lines 40 - 44, The loop over (user_database, expdb_database) calls db() multiple times and can abort if await db().dispose() raises; update the loop to call db() once per iteration (e.g., engine = db()), check engine and cache_info(), then perform await engine.dispose() inside a try/except to isolate errors per engine and always call db.cache_clear() in a finally block so the cache is cleared even if dispose fails; reference the existing symbols user_database, expdb_database, db.cache_info(), await ...dispose(), and db.cache_clear().

codecov · 2026-05-08T10:28:45Z

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 94.51%. Comparing base (1d44c05) to head (8a7beea).

Files with missing lines	Patch %	Lines
src/database/setup.py	91.66%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #328      +/-   ##
==========================================
+ Coverage   94.48%   94.51%   +0.02%     
==========================================
  Files          68       68              
  Lines        3157     3155       -2     
  Branches      229      227       -2     
==========================================
- Hits         2983     2982       -1     
  Misses        111      111              
+ Partials       63       62       -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

PGijsbers added 2 commits May 8, 2026 12:05

use functools for caching instead of a module variable

8dea27c

Log creation and disposing of database engines

8a7beea

PGijsbers added the maintenance improvements or changes to existing systems label May 8, 2026

sourcery-ai Bot reviewed May 8, 2026

View reviewed changes

Comment thread src/database/setup.py

Comment thread src/database/setup.py

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Db engine cache refactor#328

Db engine cache refactor#328
PGijsbers wants to merge 2 commits intomainfrom
db-refactor

PGijsbers commented May 8, 2026

Uh oh!

coderabbitai Bot commented May 8, 2026 •

edited

Loading

❌ Failed checks (1 warning)

Uh oh!

sourcery-ai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 8, 2026

Uh oh!

coderabbitai Bot May 8, 2026

Uh oh!

codecov Bot commented May 8, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

PGijsbers commented May 8, 2026

Uh oh!

coderabbitai Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

❌ Failed checks (1 warning)

Uh oh!

sourcery-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 8, 2026

Choose a reason for hiding this comment

Uh oh!

codecov Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented May 8, 2026 •

edited

Loading

codecov Bot commented May 8, 2026 •

edited

Loading