Maintaining Index Consistency
To maintain search index consistency in this project, you must ensure that the SearchIndex is updated whenever a bookmark is created, modified, or deleted. The SearchIndex is an in-memory inverted index that maps tokens from titles and descriptions to bookmark IDs.
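As a rough illustration of the data structure, the sketch below builds a token-to-ID mapping from two made-up bookmarks. The tokenize and build_index helpers, and the lowercase/whitespace splitting rule, are assumptions for illustration only; the project's actual tokenizer in search_service.py may behave differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return set(text.lower().split())


def build_index(bookmarks: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each token in a title or description to the IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for bookmark_id, title, description in bookmarks:
        for token in tokenize(title) | tokenize(description):
            index[token].add(bookmark_id)
    return index


# Each tuple is (id, title, description); the data is invented for the example.
index = build_index([
    ("b1", "Python Tips", "Notes on typing"),
    ("b2", "Rust Tips", "Notes on ownership"),
])
print(sorted(index["tips"]))    # ['b1', 'b2']
print(sorted(index["python"]))  # ['b1']
```

A lookup is then a dictionary access per query token, which is why every write to a bookmark has to be mirrored in the mapping.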
Incremental Updates on Creation and Modification
When a bookmark is created or its searchable fields (title, description) are updated, use the index_bookmark method. This method acts as an "upsert": it first removes any existing entries for the bookmark ID, then re-indexes the new content.
In app/services/bookmark_service.py, this is handled within the create_bookmark and update_bookmark methods:
from typing import Any, Dict, Optional, Tuple

from app.models.bookmark import Bookmark
from app.services.search_service import SearchIndex

# Example: Updating the index during bookmark creation
def create_bookmark(self, data: Dict[str, Any]) -> Tuple[Optional[Bookmark], Optional[str]]:
    # ... validation and persistence ...
    bookmark = Bookmark.from_dict(data)
    self._repo.save_bookmark(bookmark)
    # Update the search index
    self._search.index_bookmark(bookmark)
    self._cache.invalidate(bookmark.id)
    return bookmark, None

# Example: Updating the index during a partial update
def update_bookmark(self, bookmark_id: str, data: Dict[str, Any]) -> Tuple[Optional[Bookmark], Optional[str]]:
    # ... retrieval and field updates ...
    bookmark.title = data.get("title", bookmark.title)
    bookmark._touch()
    self._repo.save_bookmark(bookmark)
    # Re-index the bookmark with updated title/description
    self._search.index_bookmark(bookmark)
    self._cache.invalidate(bookmark.id)
    return bookmark, None
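To make the upsert behavior concrete, here is a minimal sketch of why removing old entries before re-indexing matters. TinyIndex is a hypothetical stand-in, not the project's SearchIndex: it takes plain strings instead of a Bookmark object and splits on whitespace, both of which are simplifying assumptions.

```python
from collections import defaultdict
from typing import Dict, Set


class TinyIndex:
    """Hypothetical stand-in for SearchIndex, reduced to the upsert logic."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = defaultdict(set)

    def remove_bookmark(self, bookmark_id: str) -> None:
        # Drop the ID from every token set, then discard sets left empty.
        for token in list(self._index):
            self._index[token].discard(bookmark_id)
            if not self._index[token]:
                del self._index[token]

    def index_bookmark(self, bookmark_id: str, title: str, description: str = "") -> None:
        # Upsert: clear old entries first so stale tokens do not linger.
        self.remove_bookmark(bookmark_id)
        for token in f"{title} {description}".lower().split():
            self._index[token].add(bookmark_id)

    def search(self, token: str) -> Set[str]:
        # .get avoids creating empty entries in the defaultdict on lookup.
        return set(self._index.get(token.lower(), set()))


idx = TinyIndex()
idx.index_bookmark("b1", "Old Title")
idx.index_bookmark("b1", "New Title")  # re-index the same ID
print(idx.search("old"))  # set()
print(idx.search("new"))  # {'b1'}
```

Without the remove step, a search for "old" would still return b1 after the rename.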
Removing Bookmarks from the Index
To remove a bookmark from the search results entirely, use the remove_bookmark method. This is essential for maintaining consistency if a bookmark is permanently deleted from the repository.
def hard_delete_bookmark(self, bookmark_id: str) -> bool:
    # Remove from repository
    success = self._repo.delete_bookmark(bookmark_id)
    if success:
        # Remove from search index to prevent stale results
        self._search.remove_bookmark(bookmark_id)
        self._cache.invalidate(bookmark_id)
    return success
Automatic Index Rebuilding
The SearchIndex is entirely in-memory and does not persist to disk. It is automatically rebuilt from the BookmarkRepository whenever the service is initialized. This occurs in the SearchIndex.__init__ method via the private _rebuild helper:
# app/services/search_service.py
from collections import defaultdict
from typing import Dict, Set


class SearchIndex:
    def __init__(self, repository: "BookmarkRepository") -> None:
        self._repo = repository
        self._index: Dict[str, Set[str]] = defaultdict(set)
        self._rebuild()

    def _rebuild(self) -> None:
        """Rebuild the entire index from the repository."""
        self._index.clear()
        # Fetch all bookmarks (up to 10,000) and index them
        all_bookmarks, _ = self._repo.list_bookmarks(page=1, per_page=10000)
        for bookmark in all_bookmarks:
            self.index_bookmark(bookmark)
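Because the rebuild fetches a single page of at most 10,000 items, a larger repository would presumably be only partially indexed. One possible way around that is to walk the pages until one comes back empty, as sketched below. FakeRepo and iter_all_bookmarks are hypothetical, and the contract assumed for list_bookmarks (1-based pages, a tuple of items and a total count) is a guess based on the call shown above rather than the repository's documented behavior.

```python
from typing import Iterator, List, Tuple


class FakeRepo:
    """Hypothetical in-memory repository used only for this sketch."""

    def __init__(self, ids: List[str]) -> None:
        self._ids = ids

    def list_bookmarks(self, page: int, per_page: int) -> Tuple[List[str], int]:
        # Assumed contract: 1-based pages, returns (items, total_count).
        start = (page - 1) * per_page
        return self._ids[start:start + per_page], len(self._ids)


def iter_all_bookmarks(repo: FakeRepo, per_page: int = 1000) -> Iterator[str]:
    """Yield every bookmark by walking pages until one comes back empty."""
    page = 1
    while True:
        items, _ = repo.list_bookmarks(page=page, per_page=per_page)
        if not items:
            return
        yield from items
        page += 1


repo = FakeRepo([f"b{i}" for i in range(2500)])
print(len(list(iter_all_bookmarks(repo))))  # 2500
```

A rebuild written this way would loop over iter_all_bookmarks and call index_bookmark for each item, at the cost of several repository calls instead of one.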
Consistency Gotchas
- Soft-Deletes (Trash/Archive): In the current implementation of BookmarkService, calling delete_bookmark (which moves a bookmark to the trash) or archive_bookmark does not remove the bookmark from the search index. Search results will still include trashed or archived bookmarks unless the search query logic explicitly filters them out after retrieval.
- In-Memory Lifecycle: Because the index is in-memory, any manual updates made directly to the BookmarkRepository (bypassing the BookmarkService) will not be reflected in search until the application restarts or _rebuild() is called.
- Tokenization Strategy: The index only stores tokens from the title and description. Changes to other fields (like url or tags) do not require an index update as they are not currently indexed for full-text search.
- Stop Words: Common words (defined in _STOP_WORDS within search_service.py) are filtered out during indexing. Updating a bookmark to only contain stop words will effectively remove it from all search results.
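Two of these gotchas can be illustrated with a short sketch: filtering soft-deleted bookmarks after retrieval, and a stop-words-only title producing no tokens. Everything here is hypothetical: the FakeBookmark class, its status field and values, the four-word STOP_WORDS set, and the whitespace tokenizer are assumptions, and the search scans titles directly instead of using an inverted index to keep the example short.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Hypothetical stop-word list; the real one lives in _STOP_WORDS.
STOP_WORDS = {"the", "a", "of", "and"}


@dataclass
class FakeBookmark:
    id: str
    title: str
    status: str = "active"  # assumed values: "active", "trashed", "archived"


def tokens(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return {t for t in text.lower().split() if t not in STOP_WORDS}


def search(query: str, store: Dict[str, FakeBookmark]) -> List[str]:
    """Match on tokens, then filter out soft-deleted bookmarks."""
    wanted = tokens(query)
    hits = [b for b in store.values() if wanted and wanted <= tokens(b.title)]
    # Post-retrieval filter: trashed and archived bookmarks are dropped here.
    return [b.id for b in hits if b.status == "active"]


store = {
    "b1": FakeBookmark("b1", "The Art of Rust"),
    "b2": FakeBookmark("b2", "Rust and Go", status="trashed"),
    "b3": FakeBookmark("b3", "The A of And"),  # stop words only
}
print(search("rust", store))      # ['b1']
print(tokens(store["b3"].title))  # set()
```

The trashed bookmark b2 matches the query but is filtered out afterwards, and b3 can never match because its title yields no tokens.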