Data Persistence
The data persistence layer in this project is designed around an "In-Memory First" architecture. While the current implementation is volatile, it uses a clean repository pattern that abstracts the underlying storage from the service layer, providing a blueprint for future migration to persistent databases like SQLite or PostgreSQL.
The Repository Layer
The BookmarkRepository class in app/db/repository.py serves as the central data store. It manages three primary collections—bookmarks, tags, and collections—using standard Python dictionaries. This choice allows for $O(1)$ lookups by ID and simplifies the initial development of the API.
Storage Structure
The repository initializes three private dictionaries to hold the application's state:
```python
# From app/db/repository.py
class BookmarkRepository:
    def __init__(self) -> None:
        self._bookmarks: Dict[str, Bookmark] = {}
        self._tags: Dict[str, Tag] = {}
        self._collections: Dict[str, Collection] = {}
```
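The dict-backed design can be illustrated in isolation. The sketch below uses simplified, hypothetical names (InMemoryStore and a pared-down Bookmark) rather than the project's actual models; it shows only the O(1) insert and lookup that the dictionaries provide:

```python
# Minimal sketch of a dict-backed store (hypothetical names, not the
# project's models). Dict access by ID gives the O(1) lookups noted above.
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Bookmark:
    url: str
    title: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

class InMemoryStore:
    def __init__(self) -> None:
        self._bookmarks: Dict[str, Bookmark] = {}

    def add(self, bookmark: Bookmark) -> str:
        # Insert keyed by the generated ID.
        self._bookmarks[bookmark.id] = bookmark
        return bookmark.id

    def get(self, bookmark_id: str) -> Optional[Bookmark]:
        # Constant-time lookup; returns None for unknown IDs.
        return self._bookmarks.get(bookmark_id)
```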
Retrieval and Filtering
The repository implements basic filtering and pagination logic. For example, list_bookmarks supports 1-indexed pagination and status-based filtering. A notable design choice here is the silent handling of invalid status strings; if a status filter is provided that does not match the BookmarkStatus enum, the filter is ignored rather than raising an error.
```python
# From app/db/repository.py
def list_bookmarks(
    self,
    page: int = 1,
    per_page: int = 25,
    status: Optional[str] = None,
) -> Tuple[List[Bookmark], int]:
    items = list(self._bookmarks.values())
    if status:
        try:
            target = BookmarkStatus(status)
            items = [b for b in items if b.status == target]
        except ValueError:
            pass  # Silent ignore of invalid status
    items.sort(key=lambda b: b.created_at, reverse=True)
    total = len(items)
    start = (page - 1) * per_page
    return items[start : start + per_page], total
```
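The filtering and pagination behavior can be exercised on its own. This is a simplified, self-contained sketch rather than the project's code: a hypothetical Status enum and (name, status) tuples stand in for BookmarkStatus and the Bookmark model, but the offset arithmetic and the silent handling of invalid status strings mirror the excerpt above:

```python
from enum import Enum
from typing import List, Optional, Tuple

class Status(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"

def list_items(
    items: List[Tuple[str, Status]],
    page: int = 1,
    per_page: int = 25,
    status: Optional[str] = None,
) -> Tuple[List[Tuple[str, Status]], int]:
    if status:
        try:
            target = Status(status)
            items = [i for i in items if i[1] == target]
        except ValueError:
            pass  # unknown status string: filter silently ignored
    total = len(items)
    # 1-indexed pages map to 0-based offsets: page 1 starts at offset 0.
    start = (page - 1) * per_page
    return items[start : start + per_page], total
```

A page past the end of the data simply yields an empty slice, so callers never see an IndexError.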
Search Indexing
To support full-text search without a dedicated search engine (like Elasticsearch), the project implements a SearchIndex in app/services/search_service.py. This is an inverted index that maps tokens (words) to bookmark IDs.
Index Lifecycle
The SearchIndex is tightly coupled with the BookmarkRepository. Upon initialization, it performs a full rebuild by scanning the repository's contents. It also supports incremental updates via index_bookmark and remove_bookmark methods, which are called by the service layer during mutation operations.
```python
# From app/services/search_service.py
class SearchIndex:
    def __init__(self, repository: "BookmarkRepository") -> None:
        self._repo = repository
        self._index: Dict[str, Set[str]] = defaultdict(set)
        self._rebuild()

    def _rebuild(self) -> None:
        """Rebuild the entire index from the repository."""
        self._index.clear()
        # Scans up to 10,000 items for the initial index
        all_bookmarks, _ = self._repo.list_bookmarks(page=1, per_page=10000)
        for bookmark in all_bookmarks:
            self.index_bookmark(bookmark)
```
Search Logic
The search implementation uses a simple boolean AND logic: all tokens in a query must be present in a bookmark's title or description for it to be considered a match. Results are then ranked based on the frequency of token occurrences.
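As a rough illustration of that matching and ranking strategy, the sketch below implements a generic inverted index with AND-intersection and frequency-based ordering. The tokenizer, class, and method names here are hypothetical, not taken from search_service.py:

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

_TOKEN = re.compile(r"[a-z0-9]+")

def tokenize(text: str) -> List[str]:
    return _TOKEN.findall(text.lower())

class InvertedIndex:
    def __init__(self) -> None:
        # token -> set of document IDs containing it
        self._index: Dict[str, Set[str]] = defaultdict(set)
        # document ID -> per-token occurrence counts, used for ranking
        self._token_counts: Dict[str, Dict[str, int]] = defaultdict(dict)

    def index_document(self, doc_id: str, text: str) -> None:
        for token in tokenize(text):
            self._index[token].add(doc_id)
            counts = self._token_counts[doc_id]
            counts[token] = counts.get(token, 0) + 1

    def search(self, query: str) -> List[str]:
        tokens = tokenize(query)
        if not tokens:
            return []
        # Boolean AND: a document must contain every query token.
        matches = set.intersection(*(self._index.get(t, set()) for t in tokens))
        # Rank by total occurrences of the query tokens in each document.
        def score(doc_id: str) -> int:
            return sum(self._token_counts[doc_id].get(t, 0) for t in tokens)
        return sorted(matches, key=score, reverse=True)
```

A single absent token empties the intersection, which is what makes the AND semantics strict: "python rust" matches nothing unless both words appear.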
Future Persistence Blueprint
While the current repository is in-memory, the codebase includes internal stubs in app/db/_connection.py that demonstrate how a real database connection pool would be managed.
The _ConnectionPool and _Connection classes provide thread-safe management of database connections, including support for nested transactions and pool sizing. Although these are currently unused by the BookmarkRepository, they serve as a template for a future SQL-based implementation.
```python
# From app/db/_connection.py
class _Connection:
    def begin_transaction(self) -> None:
        """Start a new transaction (supports nesting via savepoints)."""
        self._transaction_depth += 1

class _ConnectionPool:
    def acquire(self) -> _Connection:
        """Borrow a connection from the pool."""
        with self._lock:
            # Logic for managing available vs in-use connections
            ...
```
Design Tradeoffs and Constraints
The current persistence implementation involves several deliberate tradeoffs:
- Volatility: Because data is stored in-memory, all bookmarks, tags, and collections are lost when the application process terminates. This makes the current version suitable for testing and ephemeral environments but not for production use without modification.
- Initialization Overhead: The SearchIndex rebuilds itself entirely on startup. While efficient for small datasets, this could lead to significant latency as the number of bookmarks grows toward the 10,000-item limit defined in _rebuild.
- Consistency: The BookmarkService is responsible for keeping the BookmarkRepository, SearchIndex, and its internal LRUCache in sync. Since the repository lacks transaction support, a failure during a multi-step update (e.g., saving a bookmark but failing to update the search index) could lead to temporary inconsistencies.
- Pagination: The repository uses 1-based indexing for pagination, which aligns with standard API consumer expectations but requires careful offset calculation ((page - 1) * per_page) in the internal logic.