BiqQuery query cost

5 BigQuery Query Cost Mistakes Draining Your Cloud Budget

google biq query blog

How to Avoid Costly BigQuery Query Mistakes

Get in Touch
BiqQuery query cost

1. Using SELECT * and Scanning Excess Data

Querying all columns in a table (SELECT *) is not good practise in any system but can be a frequent but costly mistake. Since BigQuery charges based on the data scanned, not returned, this approach can significantly inflate query costs.

Tip: Specify only the columns you need. Use tools like the BigQuery query validator or cost calculator to preview the data scanned before running queries.

-- Common costly query (scans all columns)
SELECT *
FROM `project.dataset.sales`
WHERE sale_date >= '2025-01-01';

-- Optimised version (scans only needed columns)
SELECT sale_id, sale_date, amount
FROM `project.dataset.sales`
WHERE sale_date >= '2025-01-01';

2. Skipping Partitioning and Clustering

Without partitioning and clustering, queries can scan entire tables unnecessarily, driving up costs.

BigQuery partitioning supports ingestion-time and column-based partitioning (DATE/TIMESTAMP). Each partitioned table can have up to 4000 partitions, and clustering is limited to up to 4 columns. Partition filters are required in queries to avoid full table scans, and certain data types (e.g. STRING) cannot be used directly for partitioning.

Tip: Partition tables on fields like DATE or TIMESTAMP, and cluster by frequently filtered columns. This drastically reduces BigQuery query cost by narrowing data scans.

-- Common costly query (scans all columns)
SELECT *
FROM `project.dataset.sales`
WHERE sale_date >= '2025-01-01';

-- Optimised version (scans only needed columns)
SELECT sale_id, sale_date, amount
FROM `project.dataset.sales`
WHERE sale_date >= '2025-01-01';
-- Costly query with LIMIT (still scans all data)
SELECT *
FROM `project.dataset.sales`
LIMIT 10;

-- Optimised alternative using TABLESAMPLE to reduce data scanned
SELECT *
FROM `project.dataset.sales`
TABLESAMPLE SYSTEM (10 PERCENT);
-- Costly JOIN without filtering or pre-aggregation
SELECT a.user_id, a.session_id, b.purchase_amount
FROM `project.dataset.sessions` a
JOIN `project.dataset.purchases` b
ON a.user_id = b.user_id;

-- Optimised JOIN with pre-filtering and aggregation
WITH filtered_sessions AS (
  SELECT user_id, session_id
  FROM `project.dataset.sessions`
  WHERE event_date >= '2025-01-01'
),
aggregated_purchases AS (
  SELECT user_id, SUM(purchase_amount) AS total_purchase
  FROM `project.dataset.purchases`
  GROUP BY user_id
)
SELECT s.user_id, s.session_id, p.total_purchase
FROM filtered_sessions s
JOIN aggregated_purchases p
ON s.user_id = p.user_id;

5. Not Tracking Query Costs

Many teams lack visibility into who is running expensive queries or what logic is driving costs.

Tip: Use maximum_bytes_billed to cap scan limits in each query. Implement a BigQuery cost tracker or dashboard to monitor usage by user, project, or data source.

Bonus Tip: Consider Temporary Tables

For multi-step queries, temporary tables can reduce recomputation. Use CREATE TEMP TABLE in scripts to hold intermediate results, then reference them in follow-up queries.

Bonus Tip: Monitor Scheduled Query Costs

Scheduled queries can quietly contribute to growing BigQuery query costs—especially if they run frequently or scan large datasets.

Tip: Audit scheduled queries regularly. Ensure they use partition filters, avoid unnecessary scans, and run at appropriate intervals. Use dry runs to estimate their cost before scheduling.

Bonus Tip: Use APPROX_DISTINCT to Save on Count Queries

Counting distinct values across large datasets can be computationally expensive. The standard COUNT(DISTINCT column) performs a full shuffle and scan, which increases BigQuery query costs significantly.

Tip: Where absolute precision isn’t required, use APPROX_DISTINCT(column) instead. It dramatically reduces processing cost and time with minimal accuracy trade-offs—ideal for dashboards and summary views.

Ready to reduce your BigQuery query cost? Contact our team for a tailored optimisation review.

Read more about Google BiqQuery.

Learn about how we supported Paddle UK with their Google BiqQuery data strategy

Reduce cloud costs