Performing Text Searches

To execute free-text queries against the bookmark index, use the SearchIndex.search method. This method performs an "AND" search across tokens found in bookmark titles and descriptions, returning ranked results.

Executing a Search via BookmarkService

The most common way to perform a search is through the BookmarkService, which manages the SearchIndex singleton.

from app.services.bookmark_service import BookmarkService
from app.db.repository import BookmarkRepository

# Assumes bookmark_service is an already-initialized BookmarkService instance.
# search() returns a List[Bookmark], ranked by relevance.
results = bookmark_service.search(query="python tutorial", limit=10)

for bookmark in results:
    print(f"Found: {bookmark.title} ({bookmark.url})")

Searching via the REST API

You can also execute searches by sending a GET request to the /bookmarks/search endpoint.

# Search for bookmarks containing both "python" and "tutorial"
curl "http://localhost:5000/bookmarks/search?q=python+tutorial&limit=5"

The route handler in app/routes/bookmarks.py processes this request:

@bookmarks_bp.route("/search", methods=["GET"])
def search_bookmarks():
    query = request.args.get("q", "")
    limit = request.args.get("limit", 20, type=int)
    results = _service.search(query, limit=limit)
    return jsonify({"results": [b.to_dict() for b in results], "count": len(results)})
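
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names build_search_url and search_bookmarks are hypothetical, and the base URL is assumed to match the curl example above. The response keys ("results" and "count") are the ones returned by the route handler.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the API is served at the same host and port as the curl example.
BASE_URL = "http://localhost:5000"


def build_search_url(query: str, limit: int = 20) -> str:
    """Return the search URL with the query string safely encoded."""
    params = urlencode({"q": query, "limit": limit})
    return f"{BASE_URL}/bookmarks/search?{params}"


def search_bookmarks(query: str, limit: int = 20) -> dict:
    """Send the GET request and decode the JSON body (needs a running server)."""
    with urlopen(build_search_url(query, limit)) as response:
        return json.load(response)


print(build_search_url("python tutorial", limit=5))
# http://localhost:5000/bookmarks/search?q=python+tutorial&limit=5
```

With the application running, search_bookmarks("python tutorial", limit=5)["count"] would give the number of matches returned.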

How Search Works

The SearchIndex class in app/services/search_service.py implements the search logic using an in-memory inverted index.
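
To make the term "inverted index" concrete, here is a minimal sketch of how such a structure might be built. It is not the SearchIndex implementation: build_index is a hypothetical helper, bookmarks are represented as plain (id, title, description) tuples, and stop-word filtering is left out for brevity.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

TOKEN_RE = re.compile(r"[a-z0-9]+")


def build_index(bookmarks: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Map each token to the IDs of bookmarks whose title or description contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for bookmark_id, title, description in bookmarks:
        for token in TOKEN_RE.findall(f"{title} {description}".lower()):
            index[token].add(bookmark_id)
    return dict(index)


index = build_index([
    ("b1", "Python Tutorial", "Learn Python step by step"),
    ("b2", "Flask Guide", "A Python web framework"),
])
print(sorted(index["python"]))  # ['b1', 'b2']
print(sorted(index["flask"]))   # ['b2']
```

Looking up a token is then a single dictionary access, which is what makes the set intersection described below cheap.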

1. Tokenization and Filtering

The query is first tokenized using the _tokenize method. It converts text to lowercase, extracts alphanumeric words using [a-z0-9]+, and filters out common stop words defined in _STOP_WORDS (e.g., "the", "and", "is").

_STOP_WORDS = {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "is", "it"}

def _tokenize(self, text: str) -> List[str]:
    tokens = _TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in _STOP_WORDS]
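
The effect of tokenization is easiest to see on sample input. The standalone sketch below mirrors the method above as a plain function; the names TOKEN_RE, STOP_WORDS, and tokenize are local to this example, and the pattern is the one described in the text.

```python
import re
from typing import List

TOKEN_RE = re.compile(r"[a-z0-9]+")
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "is", "it"}


def tokenize(text: str) -> List[str]:
    """Lowercase the text, extract alphanumeric runs, and drop stop words."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(tokenize("The Python-3 Tutorial for Beginners"))
# ['python', '3', 'tutorial', 'beginners']
print(tokenize("the and"))
# []
```

Note that punctuation such as the hyphen splits a word into separate tokens, and a query made up only of stop words yields no tokens at all.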

2. AND Logic

The search requires all non-stop-word tokens to be present in a bookmark for it to be included in the results. This is achieved by intersecting the sets of bookmark IDs associated with each token.

candidate_ids: Set[str] = self._index.get(tokens[0], set()).copy()
for token in tokens[1:]:
    candidate_ids &= self._index.get(token, set())
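
The intersection step can be tried in isolation. The sketch below applies the same logic to a hand-written index; the function name candidates and the sample data are illustrative only, and the empty-token guard is an assumption about how the surrounding method handles that case.

```python
from typing import Dict, List, Set


def candidates(index: Dict[str, Set[str]], tokens: List[str]) -> Set[str]:
    """Return the IDs of bookmarks that contain every token."""
    if not tokens:
        return set()
    # Copy the first set so the index itself is never mutated by &=.
    result = index.get(tokens[0], set()).copy()
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


index = {
    "python": {"b1", "b2", "b3"},
    "tutorial": {"b1", "b3"},
    "flask": {"b3"},
}
print(sorted(candidates(index, ["python", "tutorial"])))           # ['b1', 'b3']
print(sorted(candidates(index, ["python", "tutorial", "flask"])))  # ['b3']
print(sorted(candidates(index, ["python", "django"])))             # []
```

A token that is missing from the index contributes an empty set, so a single unknown term is enough to empty the result.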

3. Relevance Ranking

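Results are ranked by the total number of times the query tokens occur in the bookmark's title and description combined, with the highest score first. Occurrences are counted with str.count on the lowercased text, so substring matches count too (for example, "python" is also counted inside "pythonic").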

@staticmethod
def _rank_results(bookmarks: List[Bookmark], tokens: List[str]) -> List[Bookmark]:
    def score(b: Bookmark) -> int:
        text = f"{b.title} {b.description}".lower()
        return sum(text.count(t) for t in tokens)

    return sorted(bookmarks, key=score, reverse=True)
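
A small worked example shows how the scores come out. This sketch reuses the scoring logic above with a hypothetical two-field Bookmark dataclass standing in for the real model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bookmark:
    """Hypothetical stand-in for the application's Bookmark model."""
    title: str
    description: str


def rank_results(bookmarks: List[Bookmark], tokens: List[str]) -> List[Bookmark]:
    def score(b: Bookmark) -> int:
        text = f"{b.title} {b.description}".lower()
        # str.count counts substring occurrences, not whole-word matches.
        return sum(text.count(t) for t in tokens)

    return sorted(bookmarks, key=score, reverse=True)


docs = [
    Bookmark("Flask notes", "A short python tutorial"),               # score 2
    Bookmark("Python Tutorial", "Python tutorial for pythonic code"),  # score 5
]
ranked = rank_results(docs, ["python", "tutorial"])
print([b.title for b in ranked])  # ['Python Tutorial', 'Flask notes']
```

The second bookmark scores 5 because "python" matches three times (including once inside "pythonic") and "tutorial" matches twice. Since sorted is stable, bookmarks with equal scores keep their original relative order.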

Troubleshooting and Limitations

  • Stop Words: If your query consists only of stop words (e.g., searching for "the and"), the search method will return an empty list immediately.
  • Strict Matching: Because of the "AND" logic, adding more terms to a query can only narrow the result set or leave it unchanged; it never returns more results.
  • In-Memory Index: SearchIndex._rebuild() rebuilds the index from the database every time the application starts. For very large datasets, this may delay service initialization.
  • Case Insensitivity: All searches are case-insensitive as both the index and the queries are normalized to lowercase during tokenization.