Data Persistence
The data persistence layer in this project is designed around an "In-Memory First" architecture. While the current implementation is volatile, it uses a clean repository pattern that abstracts the underlying storage from the service layer, providing a blueprint for future migration to persistent databases like SQLite or PostgreSQL.
The Repository Layer
The BookmarkRepository class in app/db/repository.py serves as the central data store. It manages three primary collections—bookmarks, tags, and collections—using standard Python dictionaries. This choice allows for $O(1)$ lookups by ID and simplifies the initial development of the API.
Storage Structure
The repository initializes three private dictionaries to hold the application's state:
```python
# From app/db/repository.py
class BookmarkRepository:
    def __init__(self) -> None:
        self._bookmarks: Dict[str, Bookmark] = {}
        self._tags: Dict[str, Tag] = {}
        self._collections: Dict[str, Collection] = {}
```
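The dict-backed design can be illustrated in isolation. The sketch below uses simplified, hypothetical names (InMemoryStore and a pared-down Bookmark) rather than the project's actual models; it shows only the O(1) insert and lookup that the dictionaries provide:

```python
# Minimal sketch of a dict-backed store (hypothetical names, not the
# project's models). Dict access by ID gives the O(1) lookups noted above.
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Bookmark:
    url: str
    title: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)

class InMemoryStore:
    def __init__(self) -> None:
        self._bookmarks: Dict[str, Bookmark] = {}

    def add(self, bookmark: Bookmark) -> str:
        # Insert keyed by the generated ID.
        self._bookmarks[bookmark.id] = bookmark
        return bookmark.id

    def get(self, bookmark_id: str) -> Optional[Bookmark]:
        # Constant-time lookup; returns None for unknown IDs.
        return self._bookmarks.get(bookmark_id)
```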
Retrieval and Filtering
The repository implements basic filtering and pagination logic. For example, list_bookmarks supports 1-indexed pagination and status-based filtering. A notable design choice here is the silent handling of invalid status strings; if a status filter is provided that does not match the BookmarkStatus enum, the filter is ignored rather than raising an error.
```python
# From app/db/repository.py
def list_bookmarks(
    self,
    page: int = 1,
    per_page: int = 25,
    status: Optional[str] = None,
) -> Tuple[List[Bookmark], int]:
    items = list(self._bookmarks.values())
    if status:
        try:
            target = BookmarkStatus(status)
            items = [b for b in items if b.status == target]
        except ValueError:
            pass  # Silent ignore of invalid status
    items.sort(key=lambda b: b.created_at, reverse=True)
    total = len(items)
    start = (page - 1) * per_page
    return items[start : start + per_page], total
```
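The filtering and pagination behavior can be exercised on its own. This is a simplified, self-contained sketch rather than the project's code: a hypothetical Status enum and (name, status) tuples stand in for BookmarkStatus and the Bookmark model, but the offset arithmetic and the silent handling of invalid status strings mirror the excerpt above:

```python
from enum import Enum
from typing import List, Optional, Tuple

class Status(Enum):
    ACTIVE = "active"
    ARCHIVED = "archived"

def list_items(
    items: List[Tuple[str, Status]],
    page: int = 1,
    per_page: int = 25,
    status: Optional[str] = None,
) -> Tuple[List[Tuple[str, Status]], int]:
    if status:
        try:
            target = Status(status)
            items = [i for i in items if i[1] == target]
        except ValueError:
            pass  # unknown status string: filter silently ignored
    total = len(items)
    # 1-indexed pages map to 0-based offsets: page 1 starts at offset 0.
    start = (page - 1) * per_page
    return items[start : start + per_page], total
```

A page past the end of the data simply yields an empty slice, so callers never see an IndexError.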
Search Indexing
To support full-text search without a dedicated search engine (like Elasticsearch), the project implements a SearchIndex in app/services/search_service.py. This is an inverted index that maps tokens (words) to bookmark IDs.
Index Lifecycle
The SearchIndex is tightly coupled with the BookmarkRepository. Upon initialization, it performs a full rebuild by scanning the repository's contents. It also supports incremental updates via index_bookmark and remove_bookmark methods, which are called by the service layer during mutation operations.
```python
# From app/services/search_service.py
class SearchIndex:
    def __init__(self, repository: "BookmarkRepository") -> None:
        self._repo = repository
        self._index: Dict[str, Set[str]] = defaultdict(set)
        self._rebuild()

    def _rebuild(self) -> None:
        """Rebuild the entire index from the repository."""
        self._index.clear()
        # Scans up to 10,000 items for the initial index
        all_bookmarks, _ = self._repo.list_bookmarks(page=1, per_page=10000)
        for bookmark in all_bookmarks:
            self.index_bookmark(bookmark)
```
Search Logic
The search implementation uses a simple boolean AND logic: all tokens in a query must be present in a bookmark's title or description for it to be considered a match. Results are then ranked based on the frequency of token occurrences.
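As a rough illustration of that matching and ranking strategy, the sketch below implements a generic inverted index with AND-intersection and frequency-based ordering. The tokenizer, class, and method names here are hypothetical, not taken from search_service.py:

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

_TOKEN = re.compile(r"[a-z0-9]+")

def tokenize(text: str) -> List[str]:
    return _TOKEN.findall(text.lower())

class InvertedIndex:
    def __init__(self) -> None:
        # token -> set of document IDs containing it
        self._index: Dict[str, Set[str]] = defaultdict(set)
        # document ID -> per-token occurrence counts, used for ranking
        self._token_counts: Dict[str, Dict[str, int]] = defaultdict(dict)

    def index_document(self, doc_id: str, text: str) -> None:
        for token in tokenize(text):
            self._index[token].add(doc_id)
            counts = self._token_counts[doc_id]
            counts[token] = counts.get(token, 0) + 1

    def search(self, query: str) -> List[str]:
        tokens = tokenize(query)
        if not tokens:
            return []
        # Boolean AND: a document must contain every query token.
        matches = set.intersection(*(self._index.get(t, set()) for t in tokens))
        # Rank by total occurrences of the query tokens in each document.
        def score(doc_id: str) -> int:
            return sum(self._token_counts[doc_id].get(t, 0) for t in tokens)
        return sorted(matches, key=score, reverse=True)
```

A single absent token empties the intersection, which is what makes the AND semantics strict: "python rust" matches nothing unless both words appear.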
Future Persistence Blueprint
While the current repository is in-memory, the codebase includes internal stubs in app/db/_connection.py that demonstrate how a real database connection pool would be managed.
The _ConnectionPool and _Connection classes provide thread-safe management of database connections, including support for nested transactions and pool sizing. Although these are currently unused by the BookmarkRepository, they serve as a template for a future SQL-based implementation.
```python
# From app/db/_connection.py
class _Connection:
    def begin_transaction(self) -> None:
        """Start a new transaction (supports nesting via savepoints)."""
        self._transaction_depth += 1

class _ConnectionPool:
    def acquire(self) -> _Connection:
        """Borrow a connection from the pool."""
        with self._lock:
            # Logic for managing available vs in-use connections
            ...
```
Design Tradeoffs and Constraints
The current persistence implementation involves several deliberate tradeoffs:
- Volatility: Because data is stored in-memory, all bookmarks, tags, and collections are lost when the application process terminates. This makes the current version suitable for testing and ephemeral environments but not for production use without modification.
- Initialization Overhead: The SearchIndex rebuilds itself entirely on startup. While efficient for small datasets, this could lead to significant latency as the number of bookmarks grows toward the 10,000-item limit defined in _rebuild.
- Consistency: The BookmarkService is responsible for keeping the BookmarkRepository, SearchIndex, and its internal LRUCache in sync. Since the repository lacks transaction support, a failure during a multi-step update (e.g., saving a bookmark but failing to update the search index) could lead to temporary inconsistencies.
- Pagination: The repository uses 1-based indexing for pagination, which aligns with standard API consumer expectations but requires careful offset calculation ((page - 1) * per_page) in the internal logic.