How To Fragment BigQuery Response Into 10000 In Every Request?
Solution 1:
One option would be to write the result of your query into a destination table and then use the tabledata.list API to retrieve data from that table in a paged manner, either using maxResults and pageToken to retrieve it page by page, or maxResults and startIndex to retrieve a specific set of rows.
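As a rough sketch of this first option, using the Python google-cloud-bigquery client (the my-project.my_dataset.temp_result table name and the process() handler are placeholders, not anything from the question):

from google.cloud import bigquery

client = bigquery.Client()

# Step 1: write the query result into a destination (temp) table
destination = bigquery.TableReference.from_string("my-project.my_dataset.temp_result")
job_config = bigquery.QueryJobConfig(destination=destination, use_legacy_sql=True)
sql = "SELECT visitorId, totals.visits FROM [12123333.ga_sessions_20160602]"
client.query(sql, job_config=job_config).result()  # wait for the query job to finish

# Step 2: read the temp table back via tabledata.list in pages of 10,000 rows;
# page_size maps to maxResults and the iterator handles pageToken for you
rows = client.list_rows(destination, page_size=10000)
for page in rows.pages:
    for row in page:
        process(row)  # placeholder for your own per-row handling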
Another option would be to add a row number to your query (something like below)
SELECT visitorId, totals.visits,
ROW_NUMBER() OVER() as num
FROM [12123333.ga_sessions_20160602]
while still writing the result into a destination temp table, and then retrieve data from that table using the new num field for grouping, for example as num % 10000 = {group_number}. Or you can use INTEGER(num / 10000) = {group_number} - whichever you like more
SELECT visitorId, totals.visits
FROM tempTable
WHERE num % 10000 = 0
The next group will be
WHERE num % 10000 = 1
and so on ...
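For completeness, this second option could be driven by a small loop like the sketch below (again Python with google-cloud-bigquery; the my_dataset.tempTable reference, the process() handler, and the 50 groups derived from the ~500K rows in the question are assumptions). It uses the INTEGER(num / 10000) form so that each group holds 10,000 consecutive rows:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(use_legacy_sql=True)

# One query per group - note that each query scans the whole temp table (see the cost note below)
for group_number in range(50):  # ~500K rows / 10,000 rows per group
    sql = (
        "SELECT visitorId, totals.visits "
        "FROM [my_dataset.tempTable] "
        "WHERE INTEGER(num / 10000) = {0}".format(group_number)
    )
    for row in client.query(sql, job_config=job_config).result():
        process(row)  # placeholder for your own per-row handling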
Please note: the second option uses the expensive (execution-wise, not billing-wise) ROW_NUMBER() function, which requires all data for each partition (in this case there is only one partition - all rows) to be on the same node, so depending on the number of rows it may or may not work. For your specific example with just 500K rows it is going to work, but if you extend it to a table with millions and millions of rows it might not (depending on how much data you output in each row and on the number of rows).
One more note:
- In the first option you pay only once, when you generate the result and save it into the temp table. After that it is free, in the sense that the tabledata.list API does not run a BigQuery query per se but rather reads directly from the underlying data.
- In the second option you pay both when you generate the temp table and each time you retrieve yet another group, because these are all BigQuery queries. Moreover, each time you get the data for a specific group you are charged for scanning the whole temp table, so in your case that is an extra 50 scans.
This makes (in your case) the first option around 51 times cheaper than the second one :o)
Solution 2:
It sounds like you are asking for data pagination, where the page size is 10,000. You could use the following query:
SELECT visitorId, totals.visits
FROM (
SELECT visitorId, totals.visits, ROW_NUMBER() OVER() as rownum
FROM [12123333.ga_sessions_20160602]
) WHERE rownum BETWEEN 1 AND 10000
and so on for the next page:
SELECT visitorId, totals.visits
FROM (
SELECT visitorId, totals.visits, ROW_NUMBER() OVER() as rownum
FROM [12123333.ga_sessions_20160602]
) WHERE rownum BETWEEN 10001 AND 20000
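If you generate these page queries programmatically, the bounds for page n (1-based) are (n - 1) * 10000 + 1 and n * 10000 - a quick Python sketch of that arithmetic (the helper name page_query is made up for illustration):

PAGE_SIZE = 10000

def page_query(page_number):
    # Bounds for a 1-based page number: page 1 -> 1..10000, page 2 -> 10001..20000, ...
    low = (page_number - 1) * PAGE_SIZE + 1
    high = page_number * PAGE_SIZE
    return (
        "SELECT visitorId, totals.visits "
        "FROM (SELECT visitorId, totals.visits, ROW_NUMBER() OVER() as rownum "
        "FROM [12123333.ga_sessions_20160602]) "
        "WHERE rownum BETWEEN {0} AND {1}".format(low, high)
    )

print(page_query(1))  # rows 1 .. 10000
print(page_query(2))  # rows 10001 .. 20000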