Performing Text Searches
To execute free-text queries against the bookmark index, use the SearchIndex.search method. This method performs an "AND" search across tokens found in bookmark titles and descriptions, returning ranked results.
Executing a Search via BookmarkService
The most common way to perform a search is through the BookmarkService, which manages the SearchIndex singleton.
```python
from app.services.bookmark_service import BookmarkService

# Assumes bookmark_service is an already-initialized BookmarkService instance.
# search() returns a List[Bookmark], ranked by relevance.
results = bookmark_service.search(query="python tutorial", limit=10)
for bookmark in results:
    print(f"Found: {bookmark.title} ({bookmark.url})")
```
Searching via the REST API
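You can also execute searches by sending a GET request to the /bookmarks/search endpoint. Pass the query in the q parameter and, optionally, a maximum number of results in limit (default 20).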
```bash
# Search for bookmarks containing both "python" and "tutorial"
curl "http://localhost:5000/bookmarks/search?q=python+tutorial&limit=5"
```
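In the curl example, the `+` in `q=python+tutorial` encodes a space, so the server receives the query "python tutorial". When you build the URL in Python, the standard library's `urllib.parse.urlencode` applies the same form encoding. The sketch below assumes the server runs at `http://localhost:5000`, as in the curl example. `build_search_url` is a hypothetical helper and is not part of the app.

```python
from urllib.parse import urlencode

# Base URL taken from the curl example; adjust for your deployment.
SEARCH_URL = "http://localhost:5000/bookmarks/search"


def build_search_url(query: str, limit: int = 20) -> str:
    """Hypothetical helper: build a /bookmarks/search URL.

    urlencode() uses quote_plus, so spaces become '+' and a literal '+'
    in the query is escaped as '%2B'.
    """
    return f"{SEARCH_URL}?{urlencode({'q': query, 'limit': limit})}"


print(build_search_url("python tutorial", limit=5))
# http://localhost:5000/bookmarks/search?q=python+tutorial&limit=5
```

If you type a literal `+` directly into a URL, it is decoded as a space. A query such as "c++" must therefore be sent as `c%2B%2B`, which the helper does automatically.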
The route handler in app/routes/bookmarks.py processes this request:
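```python
from flask import jsonify, request

@bookmarks_bp.route("/search", methods=["GET"])
def search_bookmarks():
    # Free-text query; defaults to an empty string if q is absent
    query = request.args.get("q", "")
    # Falls back to 20 if limit is missing or cannot be parsed as an int
    limit = request.args.get("limit", 20, type=int)
    results = _service.search(query, limit=limit)
    return jsonify({"results": [b.to_dict() for b in results], "count": len(results)})
```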
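The handler's defaults affect what clients get back. A missing `q` searches for the empty string. A missing `limit` falls back to 20. Werkzeug's `MultiDict.get` also returns the default when the `type` callable raises `ValueError`, so `limit=abc` silently becomes 20. The stdlib-only sketch below mirrors that behavior with `urllib.parse.parse_qs` to show how a raw query string maps onto the handler's arguments. `parse_search_args` is hypothetical and is not used by the app.

```python
from typing import Tuple
from urllib.parse import parse_qs


def parse_search_args(query_string: str) -> Tuple[str, int]:
    """Hypothetical stdlib analogue of the handler's argument parsing.

    Mirrors request.args.get("q", "") and request.args.get("limit", 20, type=int):
    the first value wins, and an unparsable limit falls back to the default.
    """
    args = parse_qs(query_string)  # decodes '+' and %XX escapes
    query = args.get("q", [""])[0]
    try:
        limit = int(args["limit"][0])
    except (KeyError, ValueError):
        limit = 20
    return query, limit


print(parse_search_args("q=python+tutorial&limit=5"))  # ('python tutorial', 5)
print(parse_search_args("q=python&limit=abc"))         # ('python', 20)
```

The handler does not put any upper bound on `limit`.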
How Search Works
The SearchIndex class in app/services/search_service.py implements the search logic using an in-memory inverted index.
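An inverted index maps each token to the set of bookmark IDs whose text contains that token. The toy class below is a minimal sketch of that structure. It assumes bookmarks have string IDs and are indexed on title plus description, and it uses the tokenization rules described in step 1. `ToyIndex`, `add`, `rebuild` and `ids_for` are hypothetical names, not the real `SearchIndex` API. `rebuild` only loosely illustrates the role of `SearchIndex._rebuild()`, whose implementation is not shown here.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

_TOKEN_RE = re.compile(r"[a-z0-9]+")
_STOP_WORDS = {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "is", "it"}


class ToyIndex:
    """Hypothetical minimal inverted index: token -> set of bookmark IDs."""

    def __init__(self) -> None:
        self._index: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> Set[str]:
        # Same normalization as _tokenize, deduplicated into a set
        return {t for t in _TOKEN_RE.findall(text.lower()) if t not in _STOP_WORDS}

    def add(self, bookmark_id: str, title: str, description: str = "") -> None:
        # Register the ID under every distinct token of title + description
        for token in self._tokens(f"{title} {description}"):
            self._index[token].add(bookmark_id)

    def rebuild(self, rows: Iterable[Tuple[str, str, str]]) -> None:
        # Discard everything and re-index every (id, title, description) row;
        # the cost grows with the total amount of indexed text
        self._index.clear()
        for bookmark_id, title, description in rows:
            self.add(bookmark_id, title, description)

    def ids_for(self, token: str) -> Set[str]:
        # Return a copy so callers cannot mutate the index
        return set(self._index.get(token, set()))


idx = ToyIndex()
idx.add("b1", "Python Tutorial", "Learn Python basics")
idx.add("b2", "Rust Tutorial")
print(sorted(idx.ids_for("tutorial")))  # ['b1', 'b2']
print(sorted(idx.ids_for("python")))    # ['b1']
```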
1. Tokenization and Filtering
The query is first tokenized by the _tokenize method. This method converts the text to lowercase, extracts runs of ASCII letters and digits with the pattern [a-z0-9]+, and filters out the common stop words defined in _STOP_WORDS (e.g., "the", "and", "is").
```python
import re
from typing import List
_TOKEN_RE = re.compile(r"[a-z0-9]+")
_STOP_WORDS = {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "is", "it"}

def _tokenize(self, text: str) -> List[str]:  # method of SearchIndex
    tokens = _TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in _STOP_WORDS]
```
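To see what the tokenizer keeps, the same logic can be written as a standalone function and applied to a few inputs. The function is named `tokenize` here for illustration only. Punctuation splits words, so "don't" becomes two tokens and "C++" is reduced to "c". Because the pattern is ASCII-only, accented letters also act as separators, so "café" yields "caf". Duplicate tokens are not removed.

```python
import re
from typing import List

_TOKEN_RE = re.compile(r"[a-z0-9]+")
_STOP_WORDS = {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "is", "it"}


def tokenize(text: str) -> List[str]:
    # Same steps as SearchIndex._tokenize: lowercase, split, drop stop words
    tokens = _TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in _STOP_WORDS]


print(tokenize("The Python Tutorial: Learning Python 3 in 2024"))
# ['python', 'tutorial', 'learning', 'python', '3', '2024']
print(tokenize("Don't use C++ for this!"))
# ['don', 't', 'use', 'c', 'this']
print(tokenize("Café au lait"))
# ['caf', 'au', 'lait']
print(tokenize("the and"))
# []
```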
2. AND Logic
The search requires all non-stop-word tokens to be present in a bookmark for it to be included in the results. This is achieved by intersecting the sets of bookmark IDs associated with each token.
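```python
# Copy first so the in-place &= below does not mutate the index itself
candidate_ids: Set[str] = self._index.get(tokens[0], set()).copy()
for token in tokens[1:]:
    # Keep only IDs that also appear under this token
    candidate_ids &= self._index.get(token, set())
```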
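The intersection can be traced on a small hand-built index. The sketch below is a standalone function that takes an already-tokenized query. `and_search` is a hypothetical name. The function adds an explicit guard for an empty token list, because `tokens[0]` would otherwise raise `IndexError`. The example also shows why the first set is copied: `&=` updates a set in place, so without `.copy()` the index entry itself would shrink.

```python
from typing import Dict, List, Set


def and_search(index: Dict[str, Set[str]], tokens: List[str]) -> Set[str]:
    # Guard: with no tokens, tokens[0] would raise IndexError
    if not tokens:
        return set()
    # Copy so the in-place &= below leaves the index entry untouched
    candidate_ids = index.get(tokens[0], set()).copy()
    for token in tokens[1:]:
        # An unknown token intersects with the empty set, emptying the result
        candidate_ids &= index.get(token, set())
    return candidate_ids


index = {
    "python": {"b1", "b2", "b3"},
    "tutorial": {"b1", "b3", "b4"},
    "flask": {"b3"},
}
print(sorted(and_search(index, ["python", "tutorial"])))           # ['b1', 'b3']
print(sorted(and_search(index, ["python", "tutorial", "flask"])))  # ['b3']
print(and_search(index, ["python", "django"]))                     # set()
print(sorted(index["python"]))  # ['b1', 'b2', 'b3']: the index is unchanged
```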
3. Relevance Ranking
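Results are ranked by a simple frequency score. For each query token, the code counts how often the token occurs as a substring of the bookmark's lowercased title and description, then sums these counts. Bookmarks with higher scores come first.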
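```python
@staticmethod
def _rank_results(bookmarks: List[Bookmark], tokens: List[str]) -> List[Bookmark]:
    def score(b: Bookmark) -> int:
        # str.count counts substring occurrences in title + description
        text = f"{b.title} {b.description}".lower()
        return sum(text.count(t) for t in tokens)
    # Highest score first; sorted() is stable, so ties keep their prior order
    return sorted(bookmarks, key=score, reverse=True)
```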
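The sketch below reproduces the scoring on a stand-in `Bookmark` dataclass to show the resulting order. The dataclass is a hypothetical substitute for the app's model, with only `title` and `description` fields. Because matching is by substring, "pythonic" also counts as an occurrence of "python". Bookmarks with equal scores stay in their input order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bookmark:
    # Hypothetical stand-in for the app's model
    title: str
    description: str = ""


def rank_results(bookmarks: List[Bookmark], tokens: List[str]) -> List[Bookmark]:
    def score(b: Bookmark) -> int:
        # Substring counts, so "python" also matches inside "pythonic"
        text = f"{b.title} {b.description}".lower()
        return sum(text.count(t) for t in tokens)
    # Highest score first; ties keep their input order (sorted is stable)
    return sorted(bookmarks, key=score, reverse=True)


docs = [
    Bookmark("Rust tutorial", "Python bindings"),             # score 2
    Bookmark("Python tutorial", "Writing pythonic code"),     # score 3
    Bookmark("Tutorial index", "Python and tutorial links"),  # score 3
]
for b in rank_results(docs, ["python", "tutorial"]):
    print(b.title)
# Python tutorial
# Tutorial index
# Rust tutorial
```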
Troubleshooting and Limitations
- Stop Words: If your query consists only of stop words (e.g., searching for "the and"), no tokens remain after filtering, and the search method returns an empty list immediately.
- Strict Matching: Because of the "AND" logic, adding more terms to a query can only return the same number of results or fewer, never more. A single term that appears in no bookmark therefore empties the whole result. Stop words are filtered out, so adding them has no effect.
- In-Memory Index: The index is rebuilt from the database in SearchIndex._rebuild() every time the application starts. For very large datasets, this may delay service initialization.
- Case Insensitivity: All searches are case-insensitive, because both the index and the queries are lowercased during tokenization.