
Maintaining Index Consistency

To maintain search index consistency in this project, you must ensure that the SearchIndex is updated whenever a bookmark is created, modified, or deleted. The SearchIndex is an in-memory inverted index that maps tokens from titles and descriptions to bookmark IDs.
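To make "inverted index" concrete, the sketch below maps each token to the set of bookmark IDs whose title or description contains it. It is a minimal illustration, not the code in search_service.py. The tokenizer (lowercase, split on non-alphanumerics), the absence of stop-word filtering, and the AND semantics of `search` are all assumptions made for this example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercase, split on runs of non-alphanumeric characters
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


# token -> set of bookmark IDs whose title/description contain that token
index: Dict[str, Set[str]] = defaultdict(set)


def add(bookmark_id: str, title: str, description: str) -> None:
    for token in tokenize(f"{title} {description}"):
        index[token].add(bookmark_id)


def search(query: str) -> Set[str]:
    # Assumed AND semantics: only IDs that contain every query token match
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


add("b1", "Python Tips", "Handy tricks for Python")
add("b2", "Rust Book", "Learning Rust")
print(search("python"))       # {'b1'}
print(search("python rust"))  # set()
```

Lookups never touch bookmarks that lack a query token, which is why the index must be kept in step with every write: a bookmark whose tokens are missing or outdated simply cannot be found.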

Incremental Updates on Creation and Modification

When a bookmark is created, or its searchable fields (title, description) are updated, call the index_bookmark method. It behaves as an "upsert": it first removes any existing entries for the bookmark ID, then indexes the current content.
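The upsert behaviour can be sketched as follows. This is a hypothetical reconstruction rather than the project's implementation: it assumes the index is only a `Dict[str, Set[str]]` (the single structure created in SearchIndex.__init__), so removal has to scan every posting set for the ID. `MiniSearchIndex`, `ids_for`, the tokenizer, and the small `STOP_WORDS` set (a stand-in for the real _STOP_WORDS) are all illustrative names.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Set

# Placeholder stop words; the real list is _STOP_WORDS in search_service.py
STOP_WORDS = {"a", "an", "and", "the", "of", "to"}


@dataclass
class Bookmark:
    id: str
    title: str
    description: str = ""


class MiniSearchIndex:
    """Hypothetical minimal index illustrating index_bookmark's upsert semantics."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(bookmark: Bookmark) -> Set[str]:
        text = f"{bookmark.title} {bookmark.description}".lower()
        return {t for t in re.split(r"[^a-z0-9]+", text) if t and t not in STOP_WORDS}

    def remove_bookmark(self, bookmark_id: str) -> None:
        # No reverse map here, so every posting set is checked for the ID
        for token in list(self._index):
            self._index[token].discard(bookmark_id)
            if not self._index[token]:
                del self._index[token]  # drop postings that became empty

    def index_bookmark(self, bookmark: Bookmark) -> None:
        # Upsert: clear old entries first so tokens from the old title do not linger
        self.remove_bookmark(bookmark.id)
        for token in self._tokens(bookmark):
            self._index[token].add(bookmark.id)

    def ids_for(self, token: str) -> Set[str]:
        return set(self._index.get(token, set()))


idx = MiniSearchIndex()
bm = Bookmark("b1", "Flask tutorial")
idx.index_bookmark(bm)
bm.title = "Django tutorial"  # title edited
idx.index_bookmark(bm)        # upsert replaces the old tokens
print(idx.ids_for("flask"))   # set()
print(idx.ids_for("django"))  # {'b1'}
```

Without the removal step, the bookmark would keep matching "flask" after the rename. A reverse map (ID to tokens) would make removal proportional to one bookmark's tokens instead of the whole vocabulary, at the cost of a second structure to keep in sync.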

In app/services/bookmark_service.py, this is handled within the create_bookmark and update_bookmark methods:

```python
from typing import Any, Dict, Optional, Tuple

from app.models.bookmark import Bookmark
from app.services.search_service import SearchIndex


class BookmarkService:
    # Example: updating the index during bookmark creation
    def create_bookmark(self, data: Dict[str, Any]) -> Tuple[Optional[Bookmark], Optional[str]]:
        # ... validation and persistence ...
        bookmark = Bookmark.from_dict(data)
        self._repo.save_bookmark(bookmark)

        # Add the new bookmark's tokens to the search index
        self._search.index_bookmark(bookmark)

        self._cache.invalidate(bookmark.id)
        return bookmark, None

    # Example: updating the index during a partial update
    def update_bookmark(self, bookmark_id: str, data: Dict[str, Any]) -> Tuple[Optional[Bookmark], Optional[str]]:
        # ... retrieval and field updates ...
        bookmark.title = data.get("title", bookmark.title)
        bookmark._touch()
        self._repo.save_bookmark(bookmark)

        # Re-index the bookmark so its updated title/description are searchable
        self._search.index_bookmark(bookmark)

        self._cache.invalidate(bookmark.id)
        return bookmark, None
```
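update_bookmark as shown re-indexes on every update, which is always correct. If re-indexing ever becomes a bottleneck, one possible refinement is to skip it when no searchable field actually changes. The `needs_reindex` helper below is hypothetical (it is not part of BookmarkService) and assumes, as stated under Consistency Gotchas, that only title and description are indexed.

```python
from types import SimpleNamespace
from typing import Any, Mapping

# Assumption: these are the only fields that feed the search index
SEARCHABLE_FIELDS = ("title", "description")


def needs_reindex(bookmark: Any, changes: Mapping[str, Any]) -> bool:
    """Return True if applying `changes` would alter an indexed field."""
    # Compare each incoming searchable value with the bookmark's current value
    return any(
        field in changes and changes[field] != getattr(bookmark, field, None)
        for field in SEARCHABLE_FIELDS
    )


bm = SimpleNamespace(title="Flask tutorial", description="Intro", url="https://example.com")
print(needs_reindex(bm, {"url": "https://example.org"}))  # False
print(needs_reindex(bm, {"title": "Django tutorial"}))    # True
print(needs_reindex(bm, {"title": "Flask tutorial"}))     # False
```

The check must run before the new values are assigned to the bookmark, since it compares the incoming data against the bookmark's current state.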

Removing Bookmarks from the Index

To remove a bookmark from search results entirely, call the remove_bookmark method. This is required whenever a bookmark is permanently deleted from the repository; otherwise the index keeps returning an ID that no longer exists.

```python
def hard_delete_bookmark(self, bookmark_id: str) -> bool:
    # Remove from repository
    success = self._repo.delete_bookmark(bookmark_id)
    if success:
        # Remove from search index to prevent stale results
        self._search.remove_bookmark(bookmark_id)
        self._cache.invalidate(bookmark_id)
    return success
```
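As a defensive complement to removing IDs on delete, search callers can resolve index hits against the repository and skip any ID that no longer exists, so a missed removal degrades into a dropped result rather than an error. A minimal sketch, assuming a hypothetical `get_bookmark(id)` lookup that returns None for missing IDs (the repository's actual lookup method is not shown in this section):

```python
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def resolve_hits(
    hit_ids: Iterable[str],
    get_bookmark: Callable[[str], Optional[T]],
) -> List[T]:
    """Turn index hits into bookmarks, skipping IDs the repository no longer has."""
    resolved: List[T] = []
    for bookmark_id in hit_ids:
        bookmark = get_bookmark(bookmark_id)
        if bookmark is not None:  # a missing ID means the index missed a removal
            resolved.append(bookmark)
    return resolved


store = {"b1": "Flask tutorial"}
print(resolve_hits(["b1", "b2"], store.get))  # ['Flask tutorial']
```

This does not replace calling remove_bookmark; it only hides stale IDs from users. Logging the skipped IDs would help spot code paths that bypass the index.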

Automatic Index Rebuilding

The SearchIndex is entirely in-memory and is not persisted to disk. It is rebuilt from the BookmarkRepository every time a SearchIndex is constructed (normally once, at application startup): SearchIndex.__init__ calls the private _rebuild helper:

```python
# app/services/search_service.py
from collections import defaultdict
from typing import Dict, Set


class SearchIndex:
    def __init__(self, repository: "BookmarkRepository") -> None:
        self._repo = repository
        # token -> set of bookmark IDs whose title/description contain it
        self._index: Dict[str, Set[str]] = defaultdict(set)
        self._rebuild()

    def _rebuild(self) -> None:
        """Rebuild the entire index from the repository."""
        self._index.clear()
        # Fetch a single page of up to 10,000 bookmarks and index them;
        # any bookmarks beyond the first 10,000 are not indexed
        all_bookmarks, _ = self._repo.list_bookmarks(page=1, per_page=10000)
        for bookmark in all_bookmarks:
            self.index_bookmark(bookmark)
```
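Because _rebuild requests only page 1, a repository with more than 10,000 bookmarks would be partially indexed at startup. If that limit matters, the rebuild could walk every page. The sketch below assumes `list_bookmarks(page=..., per_page=...)` returns a tuple whose first element is the page's items (as the unpacking above implies) and that pages are 1-indexed; it stops at the first short or empty page, so it does not rely on the meaning of the discarded second element. `iter_all_bookmarks` and `FakeRepo` are hypothetical names.

```python
from typing import Any, Callable, Iterator, List, Tuple


def iter_all_bookmarks(
    list_bookmarks: Callable[..., Tuple[List[Any], Any]],
    per_page: int = 1000,
) -> Iterator[Any]:
    """Yield every bookmark by requesting pages until a short (or empty) page comes back."""
    if per_page < 1:
        raise ValueError("per_page must be at least 1")  # avoid an endless loop
    page = 1
    while True:
        items, _ = list_bookmarks(page=page, per_page=per_page)
        yield from items
        if len(items) < per_page:
            break
        page += 1


class FakeRepo:
    """Stand-in repository returning string IDs, for demonstration only."""

    def __init__(self, n: int) -> None:
        self._ids = [f"b{i}" for i in range(n)]

    def list_bookmarks(self, page: int, per_page: int) -> Tuple[List[str], int]:
        start = (page - 1) * per_page
        return self._ids[start:start + per_page], len(self._ids)


print(len(list(iter_all_bookmarks(FakeRepo(2500).list_bookmarks, per_page=1000))))  # 2500
```

Inside _rebuild, the loop would become `for bookmark in iter_all_bookmarks(self._repo.list_bookmarks): self.index_bookmark(bookmark)`. Offset-based paging can skip or repeat items if bookmarks are written while the rebuild runs, which is one reason to rebuild only at startup.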

Consistency Gotchas

  • Soft-Deletes (Trash/Archive): In the current implementation of BookmarkService, calling delete_bookmark (which moves a bookmark to the trash) or archive_bookmark does not remove the bookmark from the search index. Search results will still include trashed or archived bookmarks unless the search query logic explicitly filters them out after retrieval.
  • In-Memory Lifecycle: Because the index is in-memory, any manual updates made directly to the BookmarkRepository (bypassing the BookmarkService) will not be reflected in search until the application restarts or _rebuild() is called.
  • Tokenization Strategy: The index only stores tokens from the title and description. Changes to other fields (like url or tags) do not require an index update as they are not currently indexed for full-text search.
  • Stop Words: Common words (defined in _STOP_WORDS within search_service.py) are dropped during indexing. If a bookmark's title and description are updated to contain only stop words, it produces no tokens and can no longer match any search query.
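For the soft-delete gotcha, one option is to filter hits after retrieval. The sketch below is illustrative only: `is_trashed` and `is_archived` are hypothetical stand-ins for however the Bookmark model actually records those states, and the `include_archived` switch is a design choice, not an existing parameter.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Bookmark:
    id: str
    title: str
    is_trashed: bool = False   # hypothetical flag name
    is_archived: bool = False  # hypothetical flag name


def visible_results(hits: Iterable[Bookmark], include_archived: bool = False) -> List[Bookmark]:
    """Always drop trashed bookmarks; drop archived ones unless explicitly requested."""
    return [
        b for b in hits
        if not b.is_trashed and (include_archived or not b.is_archived)
    ]


hits = [
    Bookmark("b1", "Flask tutorial"),
    Bookmark("b2", "Old Flask notes", is_trashed=True),
    Bookmark("b3", "Flask cheatsheet", is_archived=True),
]
print([b.id for b in visible_results(hits)])                         # ['b1']
print([b.id for b in visible_results(hits, include_archived=True)])  # ['b1', 'b3']
```

The alternative is to call remove_bookmark from delete_bookmark and archive_bookmark, and index_bookmark again if a bookmark is ever restored. Filtering at query time keeps the write paths unchanged but means every search pays for the extra check.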