Text Search and Geo Queries: A Deep Dive into Location-Aware Search in Cloud Firestore


Every useful app eventually needs to answer this question:

"Find me something specific, near where I am."

Coffee shops with WiFi. Events happening this weekend within 5km. Drivers available in your neighbourhood. It sounds like a simple feature request. But if you have tried to build this in Cloud Firestore before, you know it has never been simple.

You would end up with something like this:

  • Firestore as your source of truth

  • Algolia or Typesense for text search

  • A Cloud Function to sync data between them

  • Geohash libraries and bounding box math for proximity

Three moving parts to answer one user query. And when the Algolia sync breaks at 2am because of a malformed document, you are the one debugging it.

Cloud Firestore now supports native text search and geospatial querying. This changes the architecture for a large class of applications — not all of them, but enough that it is worth understanding deeply. This article walks through how to combine both in a real implementation, covers three strategies with different performance profiles, and is honest about where you will still hit walls.

Native text search and geospatial querying require Firestore Enterprise edition in Native mode. These features are currently in Preview. If you are on the standard Firestore tier, the Pipeline-based search and geo stages will not be available — you will be limited to the geohash + array-contains approaches in Strategies A, B, and C below. Check your database edition in the Google Cloud console before building against these APIs.

The Problem

We are trying to answer queries of this shape:

Given a collection D, return documents matching text T within radius R of point P — ideally ranked by relevance.

That involves three distinct problems:

  1. Text relevance — does this document match what the user typed?

  2. Spatial relevance — is this document close to the user?

  3. Ranking — of everything that matches both, which result comes first?

Firestore has always been excellent at deterministic indexed field lookups. It was not designed for ranking or spatial math. These new capabilities close that gap meaningfully, but the implementation strategy you choose still defines your performance, cost, and correctness ceiling.

Why This Was Hard Before

Each part of the typical stack solved only one piece of the problem.

Cloud Firestore was great at querying by exact field values. It had no concept of text relevance or distance. The closest you could get to geo queries was the geohash trick — precomputing a hash string at write time and using range queries to approximate a radius.

Algolia had excellent text relevance and good developer experience. It also had geo search. But it was a separate system, which meant maintaining a sync pipeline, paying for two services, and debugging the moments they fell out of sync.

Elasticsearch solved both text and geo natively and at scale. It also required you to run and tune a cluster, which is a significant operational commitment for a team that just wants to ship a feature.

The result for most Firebase-based teams: a multi-service architecture before they had even validated the product. Native text and geo in Firestore does not eliminate the complexity entirely — but it collapses it for enough use cases that the tradeoff conversation changes.

The Data Model

I will use a places collection throughout. A document looks like this:

{
  name: "Cafe Kimi",
  description: "Good espresso, fast WiFi, power sockets everywhere",
  tags: ["coffee", "wifi", "remote-work"],
  location: new GeoPoint(0.3476, 32.5825),
  geohash: "k9f3m7"
}

A few decisions worth calling out:

Use GeoPoint, not separate lat/lng fields. Firestore's GeoPoint type stores coordinates as an atomic unit. It works cleanly with the geo libraries you will use later and signals intent clearly in your schema.

Precompute the geohash at write time. This is a string like "k9f3m7" that encodes a geographic region. You will need it for Strategy C. Generate it using geofire-common when you create or update a document — do not try to compute it at query time.

Keep text fields intentional. The more text fields you add, the broader your search surface. Be deliberate about what you are matching against — name, description, and tags cover most use cases without over-indexing.
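
One lightweight way to keep that search surface deliberate is to derive a single lowercase keywords array at write time and match against it with array-contains. The buildKeywords helper below is a hypothetical sketch, not part of any SDK:

```javascript
// Derive a deduplicated, lowercase keywords array from the fields you
// have chosen to search on. Stored once at write time, queried with
// array-contains at read time.
function buildKeywords(name, tags) {
  const fromName = name.toLowerCase().split(/\s+/).filter(Boolean);
  const fromTags = tags.map(tag => tag.toLowerCase());
  return [...new Set([...fromName, ...fromTags])];
}
```

Storing this alongside name and tags keeps matching case-insensitive without widening the set of fields you index.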

Here is a helper to write a place with the geohash included:

import { geohashForLocation } from "geofire-common";
import { collection, addDoc, GeoPoint } from "firebase/firestore";
import { db } from "./firebase";

async function addPlace(place) {
  const geohash = geohashForLocation([
    place.lat,
    place.lng
  ]);

  await addDoc(collection(db, "places"), {
    name: place.name,
    description: place.description,
    tags: place.tags,
    location: new GeoPoint(place.lat, place.lng),
    geohash
  });
}

If you are retrofitting geo onto an existing collection, write a one-time migration script that adds the geohash field to all existing documents before you change your query logic.
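
Firestore batched writes are capped at 500 operations, so that migration has to walk the collection in chunks. The chunk helper below is plain JavaScript; the write loop around it needs a live Firestore connection, so it is only sketched in the note that follows:

```javascript
// Split an array into fixed-size chunks. Firestore batched writes max
// out at 500 operations, so a migration commits one batch per chunk.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

In the migration itself: fetch the collection's document snapshots, call chunk(snapshot.docs, 500), then for each chunk create a writeBatch(db), call batch.update(doc.ref, { geohash: geohashForLocation([lat, lng]) }) per document, and await batch.commit() before moving to the next chunk.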

The Strategies

There is no single correct approach. Each strategy has a different performance profile and a different breaking point. The right one depends on your dataset size, query patterns, and how much text relevance matters to your users.

Strategy A: Text-First, Filter by Distance

Run a text query against Firestore, then filter the results by distance in your application layer.

import { collection, query, where, getDocs } from "firebase/firestore";
import { distanceBetween } from "geofire-common";

const userLocation = [0.3136, 32.5811]; // Kampala city centre
const RADIUS_KM = 3;

async function searchNearby(tags) {
  const q = query(
    collection(db, "places"),
    where("tags", "array-contains-any", tags)
  );

  const snapshot = await getDocs(q);

  return snapshot.docs
    .map(doc => ({ id: doc.id, ...doc.data() }))
    .filter(place => {
      const distance = distanceBetween(
        [place.location.latitude, place.location.longitude],
        userLocation
      );
      return distance <= RADIUS_KM;
    });
}

// Usage
const results = await searchNearby(["wifi", "coffee"]);

This is the simplest approach and it works. The issue is read amplification — you are fetching every document that matches the text query, regardless of location, and discarding the ones too far away in your application code.

Performance characteristics:

  • Simple mental model, easy to debug

  • Read count scales with the size of your text match, not your geographic constraint

  • Distance calculation happens client-side

Use this when your dataset is small (under 10k documents), geo is a loose secondary filter, or you are still in the prototyping stage and want to move fast.

Break point: large collections. At 50k documents, a common tag like "wifi" might match 8,000 documents. You are paying for 8,000 reads to serve 15 results.

Strategy B: Geo-First, Filter by Text

Flip the order. Query by geographic boundary first, then apply text filtering to the results.

import { geohashQueryBounds, distanceBetween } from "geofire-common";
import { collection, query, orderBy, startAt, endAt, getDocs } from "firebase/firestore";

const center = [0.3136, 32.5811];
const radiusInM = 3000;

async function searchByLocationFirst(textTag) {
  const bounds = geohashQueryBounds(center, radiusInM);

  const promises = bounds.map(bound => {
    const q = query(
      collection(db, "places"),
      orderBy("geohash"),
      startAt(bound[0]),
      endAt(bound[1])
    );
    return getDocs(q);
  });

  const snapshots = await Promise.all(promises);

  return snapshots
    .flatMap(snap => snap.docs.map(doc => ({ id: doc.id, ...doc.data() })))
    .filter(place => {
      // Geohash bounds are approximate — confirm actual distance
      const distance = distanceBetween(
        [place.location.latitude, place.location.longitude],
        center
      );
      return distance <= radiusInM / 1000;
    })
    .filter(place => place.tags.includes(textTag));
}

// Usage
const results = await searchByLocationFirst("wifi");

This is significantly more read-efficient for location-dense queries. You are scoping to a geographic bucket first, which means far fewer documents make it to the text filter.

The tradeoff: text relevance becomes a hard post-filter, not a ranking signal. If your user types "espresso" and your documents only have the tag "coffee", they get nothing back. The geo boundary is also approximate — geohash regions are squares, not circles. You will see results just outside your intended radius that need a second distance check, which is why the distanceBetween confirmation is in the code above.

Performance characteristics:

  • Lower read counts, especially in geographically dense datasets

  • Multiple parallel queries (one per geohash bound) — typically 2 to 4

  • Text filtering happens client-side after geo fetch

Use this when location is the primary constraint. Logistics apps, delivery, ride-hailing — anything where proximity matters more than keyword precision.

Break point: when text matching quality starts to matter. You are doing exact tag matching here, not relevance scoring.

Strategy C: Hybrid Indexed Query

Combine geohash range queries with Firestore's text filtering in a single server-side query. Both filters happen before Firestore sends results back.

import { geohashQueryBounds, distanceBetween } from "geofire-common";
import { collection, query, where, orderBy, startAt, endAt, getDocs } from "firebase/firestore";

const center = [0.3136, 32.5811];
const radiusInM = 3000;

async function hybridSearch(tag) {
  const bounds = geohashQueryBounds(center, radiusInM);

  const promises = bounds.map(bound => {
    const q = query(
      collection(db, "places"),
      where("tags", "array-contains", tag),  // text filter
      orderBy("geohash"),                      // geo range
      startAt(bound[0]),
      endAt(bound[1])
    );
    return getDocs(q);
  });

  const snapshots = await Promise.all(promises);

  return snapshots
    .flatMap(snap => snap.docs.map(doc => ({ id: doc.id, ...doc.data() })))
    .filter(place => {
      const distance = distanceBetween(
        [place.location.latitude, place.location.longitude],
        center
      );
      return distance <= radiusInM / 1000;
    });
}

// Usage
const results = await hybridSearch("wifi");

This requires a composite index on (tags, geohash). Firestore will throw an error with a direct link to create it the first time you run the query — click the link and it will be ready in a few minutes.
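
If you manage indexes in source control rather than through the console link, the equivalent entry in firestore.indexes.json looks roughly like this (field names assume the places schema above):

```json
{
  "indexes": [
    {
      "collectionGroup": "places",
      "queryScope": "COLLECTION",
      "fields": [
        { "fieldPath": "tags", "arrayConfig": "CONTAINS" },
        { "fieldPath": "geohash", "order": "ASCENDING" }
      ]
    }
  ]
}
```

Deploy it with `firebase deploy --only firestore:indexes` so the index exists before the first production query hits it.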

Performance characteristics:

  • Lowest read count of the three strategies

  • Both filters applied server-side

  • Composite index required per tag/geohash combination

Use this when you need production-level performance and are dealing with a growing dataset. This is the strategy that does not fall apart at scale.

Break point: compound query constraints. Firestore does not allow combining array-contains-any with a range query (orderBy, startAt, endAt) on the same collection. You are limited to array-contains — one tag at a time. Multi-tag search requires client-side merging of multiple queries.
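
A sketch of that client-side merge: run one array-contains query per tag, then union the result lists by document id. The merge itself is plain JavaScript and operates on the mapped { id, ...data } objects used throughout this article:

```javascript
// Union several per-tag result lists, deduplicating by document id.
// The first occurrence of each id wins.
function mergeResults(resultLists) {
  const seen = new Map();
  for (const list of resultLists) {
    for (const doc of list) {
      if (!seen.has(doc.id)) seen.set(doc.id, doc);
    }
  }
  return [...seen.values()];
}
```

Note this gives OR semantics across tags (matches any tag); if you need AND semantics, intersect the id sets instead of unioning them.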

Ranking: The Layer None of These Strategies Give You

Here is the part most tutorials skip: all three strategies above produce a filtered list. None of them produce a ranked list.

A real user expects:

"The most relevant result near me should come first — not just any result within 3km."

That requires a scoring function. The simplest useful model looks like this:

score(d) = α × text_relevance(d) + β × (1 / distance(d, P))

Where α and β are weights you tune based on how much text relevance vs. proximity should influence the result order.

Firestore does not support this server-side. The ranking has to happen in your application layer, after you have the candidates.

In practice, this means two things:

Fetch enough candidates to rank meaningfully. If you only fetch the first 10 results before ranking, you might be discarding the most relevant result. Fetch a larger pool (50–100 documents), rank them, then slice to your display limit.

Your ranking quality is bounded by your text signal. If your text "relevance" is just a binary tag match, ranking by text score is not very useful. If you are storing a relevance score from an embedding or using a richer text index, the ranking becomes meaningful.

Here is a simple distance-weighted sort you can apply after any of the three strategies:

function rankResults(results, userLocation) {
  return results
    .map(place => {
      const distance = distanceBetween(
        [place.location.latitude, place.location.longitude],
        userLocation
      );
      return { ...place, distance };
    })
    .sort((a, b) => a.distance - b.distance);
}
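
When you do have a usable text signal, the α/β model above can be sketched like this. Here textScore is an assumed per-document number you computed yourself (for example, the count of matched tags) — it is not a field Firestore returns:

```javascript
// Weighted ranking: higher score first. alpha weights text relevance,
// beta weights proximity. The small floor on distance avoids dividing
// by zero for a place at the user's exact location.
function rankWeighted(results, { alpha = 1, beta = 1 } = {}) {
  return results
    .map(place => ({
      ...place,
      score: alpha * place.textScore +
             beta * (1 / Math.max(place.distance, 0.01))
    }))
    .sort((a, b) => b.score - a.score);
}
```

Tune alpha and beta empirically: start with equal weights, then shift toward beta for proximity-driven products and toward alpha for keyword-driven ones.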

For most apps, sorting by distance alone is enough. The moment your users expect the best match rather than the nearest match, you are in territory where a dedicated search engine's relevance model starts to show its value.

Cost Implications

Firestore pricing is directly tied to document reads and index storage. The strategy you choose has a material impact on your bill at scale.

Strategy A has the highest read amplification. If your text query matches 5,000 documents and only 80 are within your radius, you are paying for 5,000 reads.

Strategy B brings reads down significantly by geo-scoping first. The cost of geo queries is roughly proportional to the density of your data within the radius, not the total collection size.

Strategy C gives you the lowest read count but the highest index cost. Every composite index you add takes up storage and must be maintained on every write. For a collection with many tag values and high write volume, this adds up.

A rough rule: if reads are your bottleneck, use Strategy C. If writes are your bottleneck and your collection is large, profile the index storage cost before committing.

Putting It Together: A Complete Example

Here is a full implementation of the hybrid approach for a "find cafés with WiFi near me" feature:

import { db } from "./firebase";
import { collection, query, where, orderBy, startAt, endAt, getDocs } from "firebase/firestore";
import { geohashQueryBounds, distanceBetween } from "geofire-common";

const RADIUS_KM = 3;

async function findNearbyPlaces({ lat, lng }, tags) {
  const center = [lat, lng];
  const radiusInM = RADIUS_KM * 1000;
  const bounds = geohashQueryBounds(center, radiusInM);

  // Run one query per geohash bound, with text filter applied server-side
  const promises = bounds.map(bound =>
    getDocs(
      query(
        collection(db, "places"),
        where("tags", "array-contains", tags[0]), // primary tag
        orderBy("geohash"),
        startAt(bound[0]),
        endAt(bound[1])
      )
    )
  );

  const snapshots = await Promise.all(promises);

  const candidates = snapshots.flatMap(snap =>
    snap.docs.map(doc => ({ id: doc.id, ...doc.data() }))
  );

  // Confirm distance (geohash bounds are approximate)
  // Then sort by proximity
  return candidates
    .map(place => ({
      ...place,
      distance: distanceBetween(
        [place.location.latitude, place.location.longitude],
        center
      )
    }))
    .filter(place => place.distance <= RADIUS_KM)
    .sort((a, b) => a.distance - b.distance);
}

// Example: find coffee shops with WiFi near Kampala
const results = await findNearbyPlaces(
  { lat: 0.3136, lng: 32.5811 },
  ["wifi"]
);

console.log(`Found ${results.length} places within ${RADIUS_KM}km`);
results.forEach(place => {
  console.log(`${place.name} — ${place.distance.toFixed(2)}km away`);
});
