Not all big data projects need a complex architecture and an engineering team before they can start making sense of the data. So what should you do if you just need some good old analysis and want to get started right away? Say you’re part of a small company starting up a project: you need to analyze lots of data without spending extra time planning and building an architecture, hiring an architect or engineer, or managing infrastructure… you just need to see through your data and make sense of it. This is where Google’s BigQuery comes into play (there are many other potential uses, of course, but let’s stick with this one for the moment). It is billed (a bit pretentiously, maybe) as an Enterprise Cloud Data Warehouse solution, which in my opinion scares off many potential users upfront; in fact, BigQuery is helping many of them at least quick-start their path into the Big Data world.
As part of the preparation for our next workshop, Data Analytics with BigQuery, we interviewed Gabriel Preda – trainer for the workshop but, most importantly, an enthusiastic user of the solution for the last couple of years – to give us a glimpse of what we should expect from it.
Why BigQuery – why did it make sense to you?
Usually in a startup each person wears more than one hat. You put on the sysadmin’s hat… you’re the sysadmin. Later you might need to wear the hat that says „innovation”… and start collecting GBs of daily data and, of course, process them in a timely fashion. Being short on people, it was clear that we needed a SaaS solution.
In which use cases should we use BigQuery (analytical, data migration, cloud requirements)?
BigQuery is designed for OLAP (Online Analytical Processing) and BI workloads. You should not use it for OLTP. The best use cases for BigQuery are ad hoc, trial-and-error interactive queries over large datasets for quick analysis and troubleshooting.
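To make the ad hoc style concrete, here is a minimal sketch of an interactive analytical query using the google-cloud-bigquery client library, assuming default GCP credentials are configured in the environment. The public Shakespeare sample dataset is used only as an illustration; it is not from the interview.

```python
# Example ad hoc query: the ten most frequent words across
# Shakespeare's works, from a public BigQuery sample dataset.
TOP_WORDS_SQL = """
SELECT word, SUM(word_count) AS total
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY word
ORDER BY total DESC
LIMIT 10
"""

def run_ad_hoc_query(sql):
    """Run a query and return its rows as plain dicts."""
    # Imported lazily: requires `pip install google-cloud-bigquery`
    # plus GCP credentials (e.g. via `gcloud auth application-default login`).
    from google.cloud import bigquery
    client = bigquery.Client()  # picks up project/credentials from the environment
    return [dict(row) for row in client.query(sql).result()]
```

Usage would be as simple as `run_ad_hoc_query(TOP_WORDS_SQL)` – no cluster to provision, which is exactly the trial-and-error workflow described above.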
Can you list the best fit scenarios for it?
I have used it successfully for in-house analytics solutions. But I think it’s one of the best candidates on the market for data fishing, because of its ability to perform ad hoc queries on large amounts of data…
Is it more feasible to be used in projects where the data has been already natively stored in the cloud (e.g. Google Cloud Storage)?
Data transfer towards BigQuery is free. You might have some costs in transforming the data, as there are some requirements on the data BigQuery can ingest. If you already have data in CSV or Avro (and soon Parquet), you can import it directly.
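As a hedged sketch of such a direct import, the snippet below loads a file from Google Cloud Storage into BigQuery with the google-cloud-bigquery client, limited to the CSV and Avro formats mentioned above. Any bucket, dataset, or table names used with it would be hypothetical.

```python
# Formats named in the interview as directly ingestible (Parquet was upcoming).
SUPPORTED_FORMATS = {".csv": "CSV", ".avro": "AVRO"}

def source_format_for(path):
    """Map a file extension to a BigQuery source-format name."""
    import os
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {ext}")
    return SUPPORTED_FORMATS[ext]

def load_from_gcs(uri, table_id):
    """Start a load job for a GCS file and wait for it to finish."""
    # Requires `pip install google-cloud-bigquery` and GCP credentials.
    from google.cloud import bigquery
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=source_format_for(uri),
        autodetect=True,  # let BigQuery infer the schema from the file
    )
    return client.load_table_from_uri(uri, table_id, job_config=job_config).result()
```

For example, `load_from_gcs("gs://my-bucket/events.csv", "my_project.my_dataset.events")` (hypothetical names) would create or append to the table without any transformation step, since CSV is ingested natively.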
Which are the BigQuery alternatives/competitors?
I don’t know what to say about this… it is quite a unique beast of a product!
Can you control where your data is, in case you have some requirements regarding location of your data?
You can choose between the US and the EU, but that is where it ends. There is some awesome news, though: an experimental extension to the BigQuery client offers client-side (homomorphic) encryption for a subset of query types. That is: you can encrypt your data, upload the encrypted data to BigQuery, run queries, fetch the results and decrypt them locally. It’s magic!
How do you visualize the results of the analysis or the correlations in the data in BigQuery?
In the worst-case scenario, when you can’t use the existing integrations, you can retrieve the results and use any visualization tool you are accustomed to. There are now a lot of available integrations: Tableau, Qlik, Talend, Informatica, SnapLogic, newcomers like Chartio, or even free and open-source BI tools like Metabase. There is also a Google solution (for now in beta) called Data Studio, which covers more than BigQuery. I’ll do my best to add details about Data Studio during the workshop.
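For that worst-case scenario, a small sketch of the "retrieve and hand off" approach: serialize query rows to CSV text with the standard library, so any external visualization tool can import them. The query shown in the comment is an assumption about how the rows would be obtained.

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize query rows (a list of dicts with identical keys)
    to CSV text that any external visualization tool can import."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# With the client library installed, rows could come straight from a query:
#   rows = [dict(r) for r in bigquery.Client().query(sql).result()]
# (or use client.query(sql).to_dataframe() for tools that speak pandas)
```

This keeps the analysis in BigQuery and only the final, already-aggregated result set leaves it, which is usually small enough for any desktop BI tool.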
Interview by: Valentina Crisan – bigdata.ro