
Organizations today are collecting data at record speed. From sensor measurements to consumer behavior, the need for tools that can efficiently store and analyze large amounts of data has never been greater. Google Cloud offers tailored solutions, most notably Google BigQuery.
The right analysis tools make data-driven decisions much easier. Google BigQuery is one such tool, and this article explains how to use it step by step.
What is Google BigQuery?
BigQuery is a fully managed and serverless data warehouse offered by Google Cloud Platform (GCP). It allows you to analyze terabytes of data in seconds.
BigQuery is based on Dremel, a distributed system developed by Google to quickly query very large datasets. Dremel divides query execution into “slots” to fairly distribute resources among multiple users. This system uses Jupiter (Google’s internal network) to access storage, which is based on Colossus, a distributed file system that ensures data replication and recovery.
Data is stored in a columnar format, which allows high compression and fast scans. BigQuery can also query data held in other services such as Bigtable, Cloud Storage, Cloud SQL, Google Analytics, or Google Drive.
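To make the benefit of columnar storage concrete, here is a minimal Python sketch. It is not BigQuery's actual storage format: it only keeps the same toy records in a row layout and a column layout, then counts how many values a query touching a single column has to read. The sample records and helper names are made up for illustration.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# This only illustrates the idea; it is not how BigQuery stores data.

rows = [
    {"country": "France", "cases": 120, "deaths": 3},
    {"country": "Italy", "cases": 200, "deaths": 7},
    {"country": "Spain", "cases": 150, "deaths": 5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_layout(records):
    """A row store reads every field of every row, even for one column."""
    return sum(len(record) for record in records)


def values_read_column_layout(columns, wanted):
    """A column store reads only the columns the query references."""
    return sum(len(columns[name]) for name in wanted)


columns = to_columns(rows)
print(sum(columns["cases"]))                          # 470
print(values_read_row_layout(rows))                   # 9 values touched
print(values_read_column_layout(columns, ["cases"]))  # 3 values touched
```

The gap grows with the number of columns in the table, which is why selecting only the columns you need tends to matter more in a columnar warehouse than in a row store.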
BigQuery is ideal for analyzing very large volumes of data, especially when datasets are mostly read rather than updated. It is not suited to transactional (OLTP) workloads or to small databases.
Finally, BigQuery requires no infrastructure management: you pay only for the storage you use and for the volume of data your queries process. Note, however, that the data must be hosted on Google Cloud, which may limit your architectural flexibility.
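As a rough illustration of this pay-for-storage-plus-queries model, the sketch below adds a storage charge to a query charge. It assumes billing per GiB stored per month and per TiB scanned by queries; the two prices are placeholder parameters, not Google's price list, so check the current pricing page before relying on any figure.

```python
# Back-of-the-envelope monthly cost estimate.
# Both prices are hypothetical placeholders passed in by the caller.

GIB = 1024 ** 3
TIB = 1024 ** 4


def estimate_monthly_cost(stored_bytes, scanned_bytes,
                          price_per_gib_month, price_per_tib_scanned):
    """Return storage cost plus query cost for one month."""
    storage_cost = stored_bytes / GIB * price_per_gib_month
    query_cost = scanned_bytes / TIB * price_per_tib_scanned
    return storage_cost + query_cost


# Example: 500 GiB stored and 2 TiB scanned, with made-up prices.
cost = estimate_monthly_cost(
    stored_bytes=500 * GIB,
    scanned_bytes=2 * TIB,
    price_per_gib_month=0.02,    # assumption, for illustration only
    price_per_tib_scanned=5.0,   # assumption, for illustration only
)
print(round(cost, 2))  # 20.0
```

Because the query term depends on bytes scanned, limiting the columns and rows a query reads lowers the bill directly.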
Practical Guide: How to Use Google BigQuery
BigQuery is accessible from the Google Cloud Platform web interface, or via API, SDK, or CLI.
Even without your own data, you can start with public datasets offered by Google Cloud. An interesting example is the COVID-19 dataset, which is freely accessible.
Here’s how to proceed:
Step 1: Download the dataset to your computer
Download an up-to-date version of the dataset in CSV format to your local machine.
Step 2: Import and store the dataset in Google BigQuery
- Log in to GCP and go to the BigQuery console (Big Data section).
- Click on CREATE DATASET to create a new dataset. Give it a unique identifier and choose a storage region.
- Once the dataset is created, click on CREATE TABLE:
  - Source: Upload
  - File format: CSV
  - Select your local file.
  - Name your table (for example: worldwide_cases).
  - Enable the Auto Detect option to automatically detect the schema.
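The same import can be scripted rather than clicked through. The sketch below assumes the `google-cloud-bigquery` Python client library is installed, that application default credentials are configured, and that the dataset already exists; `full_table_id` and `load_csv_file` are illustrative helper names, and the CSV is assumed to have one header row.

```python
# Sketch: load a local CSV file into a BigQuery table with schema auto-detection.
# Assumes google-cloud-bigquery is installed and credentials are configured.


def full_table_id(project, dataset, table):
    """Build the 'project.dataset.table' identifier BigQuery expects."""
    return f"{project}.{dataset}.{table}"


def load_csv_file(csv_path, table_id):
    """Upload csv_path into table_id and return the resulting row count."""
    # Imported inside the function so this module can be imported
    # even where the client library is not installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: the file starts with a header row
        autodetect=True,      # same role as the Auto Detect option
    )
    with open(csv_path, "rb") as source_file:
        job = client.load_table_from_file(
            source_file, table_id, job_config=job_config
        )
    job.result()  # wait for the load job to finish
    return client.get_table(table_id).num_rows
```

Calling `load_csv_file("worldwide_cases.csv", full_table_id("project_name", "dataset_name", "worldwide_cases"))` would then mirror the console steps, with the file name and identifiers replaced by your own.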

Step 3: Query the data stored in BigQuery
Once the table is created, you can run your first SQL queries:
- To display 1000 rows:
    SELECT *
    FROM `project_name.dataset_name.worldwide_cases`
    LIMIT 1000
- To get the total number of cases and deaths by country:
    SELECT countriesAndTerritories,
           SUM(cases) AS N_Cases,
           SUM(deaths) AS N_Deaths,
           COUNT(*) AS N_Rows
    FROM `project_name.dataset_name.worldwide_cases`
    GROUP BY countriesAndTerritories
    LIMIT 1000
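These queries can also be run from code instead of the console. The following sketch assumes the `google-cloud-bigquery` client library and configured credentials; `cases_by_country_sql` and `run_query` are illustrative helper names, and the table identifier is the placeholder used in this guide.

```python
# Sketch: build the per-country aggregation query and run it with the
# BigQuery Python client. Assumes google-cloud-bigquery is installed.


def cases_by_country_sql(table_id, limit=1000):
    """Return the aggregation query for the given fully qualified table."""
    return (
        "SELECT countriesAndTerritories, "
        "SUM(cases) AS N_Cases, SUM(deaths) AS N_Deaths, COUNT(*) AS N_Rows "
        f"FROM `{table_id}` "
        "GROUP BY countriesAndTerritories "
        f"LIMIT {int(limit)}"
    )


def run_query(sql):
    """Execute sql and return the result rows as a list of dicts."""
    # Imported inside the function so this module can be imported
    # even where the client library is not installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(sql).result()  # blocks until the query finishes
    return [dict(row.items()) for row in rows]


sql = cases_by_country_sql("project_name.dataset_name.worldwide_cases")
print(sql)
```

Passing `sql` to `run_query` would return one dictionary per country, with the keys `countriesAndTerritories`, `N_Cases`, `N_Deaths`, and `N_Rows`.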

Step 4: Add the dataset to Google Cloud Storage
You can also store your files in Google Cloud Storage (GCS):
- Create a GCS bucket.
- Upload your CSV file to this bucket.
(You can refer to Google’s documentation to create a bucket if needed.)
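For a scripted version of the upload, here is a minimal sketch that assumes the `google-cloud-storage` Python client library, configured credentials, and a bucket that already exists. The helper names `gcs_uri` and `upload_csv` are illustrative, and any bucket or file name you pass in is your own.

```python
# Sketch: upload a local CSV file to an existing Cloud Storage bucket.
# Assumes google-cloud-storage is installed and credentials are configured.


def gcs_uri(bucket_name, blob_name):
    """Return the gs:// URI that identifies an object in a bucket."""
    return f"gs://{bucket_name}/{blob_name}"


def upload_csv(local_path, bucket_name, blob_name):
    """Upload local_path to the bucket and return the object's gs:// URI."""
    # Imported inside the function so this module can be imported
    # even where the client library is not installed.
    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket(bucket_name)  # assumption: the bucket already exists
    blob = bucket.blob(blob_name)
    blob.upload_from_filename(local_path)
    return gcs_uri(bucket_name, blob_name)
```

The returned `gs://` URI is the file path BigQuery asks for when a table is created from Cloud Storage.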

Step 5: Use BigQuery with a dataset in Google Cloud Storage
- In BigQuery, create a new table:
  - Source: Google Cloud Storage
  - Specify the file path in GCS.
  - Format: CSV
  - Give it a new name (for example: worldwide_cases_in_bucket).
- You will be able to query this new table in the same way as before.
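Creating the table from a Cloud Storage file can be sketched in code as well. This example assumes the `google-cloud-bigquery` client library and configured credentials, and it loads the file into a native table; whether the console steps above produce a native or an external table depends on the table type you select there. `is_gcs_uri` and `load_csv_from_gcs` are illustrative helper names.

```python
# Sketch: load a CSV object from Cloud Storage into a BigQuery table.
# Assumes google-cloud-bigquery is installed and credentials are configured.


def is_gcs_uri(uri):
    """Check that uri looks like gs://bucket/object."""
    if not uri.startswith("gs://"):
        return False
    bucket, _, blob = uri[len("gs://"):].partition("/")
    return bool(bucket) and bool(blob)


def load_csv_from_gcs(uri, table_id):
    """Load the CSV at uri into table_id and return the resulting row count."""
    if not is_gcs_uri(uri):
        raise ValueError(f"expected a gs://bucket/object URI, got {uri!r}")
    # Imported inside the function so this module can be imported
    # even where the client library is not installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: the file starts with a header row
        autodetect=True,
    )
    job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    job.result()  # wait for the load job to finish
    return client.get_table(table_id).num_rows
```

Once the load job finishes, the table named by `table_id` can be queried exactly like the one created from the local upload.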


Conclusion
BigQuery is an extremely powerful solution for quickly exploring and analyzing large amounts of data. It allows you to go from zero to advanced analysis in no time.
However, despite its advantages, BigQuery is not perfect: it is less suited for frequently changing data and requires the use of Google Cloud storage. For greater flexibility, it is recommended to keep your raw data elsewhere.
To store large volumes of data efficiently while keeping the freedom to choose your cloud, Cloud Volumes ONTAP from NetApp is one alternative. Available on AWS, Azure, and Google Cloud, it aims to optimize costs, improve storage efficiency, clone datasets easily, tier data automatically, and protect data.



