tf.contrib.cloud.BigQueryReader
A Reader that outputs keys and tf.Example values from a BigQuery table.
Inherits From: ReaderBase
tf.contrib.cloud.BigQueryReader( project_id, dataset_id, table_id, timestamp_millis, num_partitions, features=None, columns=None, test_end_point=None, name=None )
Example use:
# Assume a BigQuery has the following schema, # name STRING, # age INT, # state STRING # Create the parse_examples list of features. features = dict( name=tf.io.FixedLenFeature([1], tf.string), age=tf.io.FixedLenFeature([1], tf.int32), state=tf.io.FixedLenFeature([1], dtype=tf.string, default_value="UNK")) # Create a Reader. reader = bigquery_reader_ops.BigQueryReader(project_id=PROJECT, dataset_id=DATASET, table_id=TABLE, timestamp_millis=TIME, num_partitions=NUM_PARTITIONS, features=features) # Populate a queue with the BigQuery Table partitions. queue = tf.compat.v1.train.string_input_producer(reader.partitions()) # Read and parse examples. row_id, examples_serialized = reader.read(queue) examples = tf.io.parse_example(examples_serialized, features=features) # Process the Tensors examples["name"], examples["age"], etc...
Note that to create a reader a snapshot timestamp is necessary. This will enable the reader to look at a consistent snapshot of the table. For more information, see 'Table Decorators' in BigQuery docs.
See ReaderBase for supported methods.
Args | |
---|---|
project_id | GCP project ID. |
dataset_id | BigQuery dataset ID. |
table_id | BigQuery table ID. |
timestamp_millis | timestamp to snapshot the table in milliseconds since the epoch. Relative (negative or zero) snapshot times are not allowed. For more details, see 'Table Decorators' in BigQuery docs. |
num_partitions | Number of non-overlapping partitions to read from. |
features | parse_example compatible dict from keys to VarLenFeature and FixedLenFeature objects. Keys are read as columns from the db. |
columns | list of columns to read, can be set iff features is None. |
test_end_point | Used only for testing purposes (optional). |
name | a name for the operation (optional). |
Raises | |
---|---|
TypeError |
|
Attributes | |
---|---|
reader_ref | Op that implements the reader. |
supports_serialize | Whether the Reader implementation can serialize its state. |
Methods
num_records_produced
num_records_produced( name=None )
Returns the number of records this reader has produced.
This is the same as the number of Read executions that have succeeded.
Args | |
---|---|
name | A name for the operation (optional). |
Returns | |
---|---|
An int64 Tensor. |
num_work_units_completed
num_work_units_completed( name=None )
Returns the number of work units this reader has finished processing.
Args | |
---|---|
name | A name for the operation (optional). |
Returns | |
---|---|
An int64 Tensor. |
partitions
partitions( name=None )
Returns serialized BigQueryTablePartition messages.
These messages represent a non-overlapping division of a table for a bulk read.
Args | |
---|---|
name | a name for the operation (optional). |
Returns | |
---|---|
1-D string Tensor of serialized BigQueryTablePartition messages. |
read
read( queue, name=None )
Returns the next record (key, value) pair produced by a reader.
Will dequeue a work unit from queue if necessary (e.g. when the Reader needs to start reading from a new file since it has finished with the previous file).
Args | |
---|---|
queue | A Queue or a mutable string Tensor representing a handle to a Queue, with string work items. |
name | A name for the operation (optional). |
Returns | |
---|---|
A tuple of Tensors (key, value). | |
key | A string scalar Tensor. |
value | A string scalar Tensor. |
read_up_to
read_up_to( queue, num_records, name=None )
Returns up to num_records (key, value) pairs produced by a reader.
Will dequeue a work unit from queue if necessary (e.g., when the Reader needs to start reading from a new file since it has finished with the previous file). It may return less than num_records even before the last batch.
Args | |
---|---|
queue | A Queue or a mutable string Tensor representing a handle to a Queue, with string work items. |
num_records | Number of records to read. |
name | A name for the operation (optional). |
Returns | |
---|---|
A tuple of Tensors (keys, values). | |
keys | A 1-D string Tensor. |
values | A 1-D string Tensor. |
reset
reset( name=None )
Restore a reader to its initial clean state.
Args | |
---|---|
name | A name for the operation (optional). |
Returns | |
---|---|
The created Operation. |
restore_state
restore_state( state, name=None )
Restore a reader to a previously saved state.
Not all Readers support being restored, so this can produce an Unimplemented error.
Args | |
---|---|
state | A string Tensor. Result of a SerializeState of a Reader with matching type. |
name | A name for the operation (optional). |
Returns | |
---|---|
The created Operation. |
serialize_state
serialize_state( name=None )
Produce a string tensor that encodes the state of a reader.
Not all Readers support being serialized, so this can produce an Unimplemented error.
Args | |
---|---|
name | A name for the operation (optional). |
Returns | |
---|---|
A string Tensor. |
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/contrib/cloud/BigQueryReader