tf.contrib.cloud.BigQueryReader

A Reader that outputs keys and tf.Example values from a BigQuery table.

Inherits From: ReaderBase

tf.contrib.cloud.BigQueryReader(
    project_id, dataset_id, table_id, timestamp_millis, num_partitions,
    features=None, columns=None, test_end_point=None, name=None
)

Example use:

# Assume a BigQuery table has the following schema,
#     name      STRING,
#     age       INT,
#     state     STRING

# Create the feature spec used to parse each row.
# Integer features must be tf.int64; the Example parsers do not accept tf.int32.
features = dict(
  name=tf.io.FixedLenFeature([1], tf.string),
  age=tf.io.FixedLenFeature([1], tf.int64),
  state=tf.io.FixedLenFeature([1], dtype=tf.string, default_value="UNK"))

# Create a Reader. PROJECT, DATASET, TABLE, TIME, and NUM_PARTITIONS are
# placeholders for your own values.
reader = tf.contrib.cloud.BigQueryReader(project_id=PROJECT,
                                         dataset_id=DATASET,
                                         table_id=TABLE,
                                         timestamp_millis=TIME,
                                         num_partitions=NUM_PARTITIONS,
                                         features=features)

# Populate a queue with the BigQuery Table partitions.
queue = tf.compat.v1.train.string_input_producer(reader.partitions())

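# Read and parse one example. reader.read returns a scalar string, so use
# parse_single_example rather than the batch-oriented parse_example.
row_id, examples_serialized = reader.read(queue)
examples = tf.io.parse_single_example(examples_serialized, features=features)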

# Process the Tensors examples["name"], examples["age"], etc...

Creating a reader requires a snapshot timestamp, which lets the reader see a consistent snapshot of the table. For more information, see 'Table Decorators' in the BigQuery docs.
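As a minimal sketch of how a value for `timestamp_millis` might be computed, the following uses only the Python standard library. The helper name `to_timestamp_millis` and the chosen date are illustrative assumptions, not part of the TensorFlow API.

```python
import datetime


def to_timestamp_millis(moment):
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    epoch = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
    # Dividing two timedeltas with // yields an int.
    return (moment - epoch) // datetime.timedelta(milliseconds=1)


# Hypothetical snapshot time: midnight UTC on 2020-01-01.
snapshot = datetime.datetime(2020, 1, 1, tzinfo=datetime.timezone.utc)
TIME = to_timestamp_millis(snapshot)  # 1577836800000
```

The result is an absolute, positive value, which matches the requirement that relative (negative or zero) snapshot times are not allowed.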

See ReaderBase for supported methods.

Args
`project_id`	GCP project ID.
`dataset_id`	BigQuery dataset ID.
`table_id`	BigQuery table ID.
`timestamp_millis`	Timestamp at which to snapshot the table, in milliseconds since the epoch. Relative (negative or zero) snapshot times are not allowed. For more details, see 'Table Decorators' in the BigQuery docs.
`num_partitions`	Number of non-overlapping partitions to read from.
`features`	parse_example-compatible dict mapping keys to `VarLenFeature` and `FixedLenFeature` objects. Keys are read as columns from the table.
`columns`	List of columns to read. Must be set if and only if features is None.
`test_end_point`	Used only for testing purposes (optional).
`name`	A name for the operation (optional).

Raises
`TypeError`	If features is neither None nor a dict, if columns is neither None nor a list, or if features and columns are both None or both set.
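The rules above can be sketched in plain Python. This is an illustration of the documented behavior only, not the library's actual implementation; the function name `resolve_columns` and the error messages are hypothetical.

```python
def resolve_columns(features=None, columns=None):
    """Return the column names to read, enforcing the documented argument rules."""
    if features is not None and not isinstance(features, dict):
        raise TypeError("features must be None or a dict")
    if columns is not None and not isinstance(columns, list):
        raise TypeError("columns must be None or a list")
    # Exactly one of the two arguments may be set.
    if (features is None) == (columns is None):
        raise TypeError("exactly one of features and columns must be set")
    # When features is given, its keys name the columns to read.
    return list(features.keys()) if features is not None else columns


from_columns = resolve_columns(columns=["name", "age"])
from_features = resolve_columns(features={"name": object(), "state": object()})
```

Passing both arguments, or neither, raises `TypeError` in this sketch, mirroring the table above.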

Attributes
`reader_ref`	Op that implements the reader.
`supports_serialize`	Whether the Reader implementation can serialize its state.

Methods

`num_records_produced`

num_records_produced(
    name=None
)

Returns the number of records this reader has produced.

This is the same as the number of Read executions that have succeeded.

Args
`name`	A name for the operation (optional).

Returns
An int64 Tensor.

`num_work_units_completed`

num_work_units_completed(
    name=None
)

Returns the number of work units this reader has finished processing.

Args
`name`	A name for the operation (optional).

Returns
An int64 Tensor.

`partitions`

partitions(
    name=None
)

Returns serialized BigQueryTablePartition messages.

These messages represent a non-overlapping division of a table for a bulk read.

Args
`name`	A name for the operation (optional).

Returns
A 1-D string `Tensor` of serialized `BigQueryTablePartition` messages.
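To illustrate what a non-overlapping division of a table means, here is a pure-Python sketch that splits a row range into contiguous pieces. How the real op chooses partition boundaries is not described on this page; the function `split_rows` and its even-split strategy are assumptions made purely for illustration.

```python
def split_rows(num_rows, num_partitions):
    """Return inclusive (start, end) row ranges covering 0..num_rows-1 without overlap."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    base, extra = divmod(num_rows, num_partitions)
    ranges = []
    start = 0
    for i in range(num_partitions):
        # The first `extra` partitions each take one additional row.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # Skip empty partitions when there are fewer rows than partitions.
        ranges.append((start, start + size - 1))
        start += size
    return ranges


ranges = split_rows(10, 3)  # [(0, 3), (4, 6), (7, 9)]
```

Every row index falls in exactly one range, so partitions can be read independently without duplicating or dropping rows.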

`read`

read(
    queue, name=None
)

Returns the next record (key, value) pair produced by a reader.

Dequeues a work unit from queue if necessary (e.g., when the Reader needs to start reading from a new file because it has finished with the previous one).

Args
`queue`	A Queue or a mutable string Tensor representing a handle to a Queue, with string work items.
`name`	A name for the operation (optional).

Returns
A tuple of Tensors (key, value).
`key`	A string scalar Tensor.
`value`	A string scalar Tensor.
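The read contract can be modeled in plain Python to show when a work unit is dequeued and how the two counters advance. This is a toy model, not TensorFlow code: the class `ToyReader`, its key format, and the exact moment at which a work unit is counted as completed are all simplifying assumptions.

```python
import collections


class ToyReader:
    """Pure-Python model of a reader that pulls work units from a queue."""

    def __init__(self, tables):
        # `tables` maps a work-unit name to the list of records it holds.
        self._tables = tables
        self._current = None
        self._pending = collections.deque()  # records left in the current unit
        self.num_records_produced = 0
        self.num_work_units_completed = 0

    def read(self, queue):
        # Dequeue a new work unit only when the current one is exhausted.
        while not self._pending:
            if not queue:
                raise IndexError("queue is empty")
            self._current = queue.popleft()
            self._pending.extend(enumerate(self._tables[self._current]))
            if not self._pending:
                self.num_work_units_completed += 1  # empty work unit
        index, value = self._pending.popleft()
        self.num_records_produced += 1
        key = "%s:%d" % (self._current, index)
        if not self._pending:
            self.num_work_units_completed += 1
        return key, value


queue = collections.deque(["part-0", "part-1"])
reader = ToyReader({"part-0": ["a", "b"], "part-1": ["c"]})
records = [reader.read(queue) for _ in range(3)]
```

In this model, three successful reads yield three records and consume both work units, which parallels how `num_records_produced` counts successful reads and `num_work_units_completed` counts finished work units.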

`read_up_to`

read_up_to(
    queue, num_records, name=None
)

Returns up to num_records (key, value) pairs produced by a reader.

Dequeues a work unit from queue if necessary (e.g., when the Reader needs to start reading from a new file because it has finished with the previous one). It may return fewer than num_records even before the last batch.

Args
`queue`	A Queue or a mutable string Tensor representing a handle to a Queue, with string work items.
`num_records`	Number of records to read.
`name`	A name for the operation (optional).

Returns
A tuple of Tensors (keys, values).
`keys`	A 1-D string Tensor.
`values`	A 1-D string Tensor.
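Because a batch may be shorter than requested, callers that need a minimum number of records typically loop. The sketch below shows that pattern in plain Python. The callable `read_batch` is a hypothetical stand-in for evaluating the `read_up_to` op in a session, and signaling exhaustion with an empty batch is an assumption of this sketch rather than the op's documented behavior.

```python
def collect_at_least(read_batch, minimum):
    """Call read_batch() until at least `minimum` records are gathered or input ends."""
    keys, values = [], []
    while len(keys) < minimum:
        batch_keys, batch_values = read_batch()
        if len(batch_keys) == 0:
            break  # The stand-in returns an empty batch when nothing is left.
        keys.extend(batch_keys)
        values.extend(batch_values)
    return keys, values


def make_fake_read_batch(batches):
    """Build a stand-in reader that replays the given (keys, values) batches."""
    remaining = list(batches)

    def read_batch():
        if not remaining:
            return [], []
        return remaining.pop(0)

    return read_batch


fake = make_fake_read_batch([
    (["k0", "k1"], ["v0", "v1"]),
    (["k2"], ["v2"]),  # a short batch in the middle of the stream
    (["k3", "k4"], ["v3", "v4"]),
])
keys, values = collect_at_least(fake, minimum=4)
```

The loop keeps whole batches, so it can return more than `minimum` records; trimming would discard records that have already been consumed from the reader.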

`reset`

reset(
    name=None
)

Restore a reader to its initial clean state.

Args
`name`	A name for the operation (optional).

Returns
The created Operation.

`restore_state`

restore_state(
    state, name=None
)

Restore a reader to a previously saved state.

Not all Readers support being restored, so this can produce an Unimplemented error.

Args
`state`	A string Tensor. The result of a `serialize_state` call on a Reader of matching type.
`name`	A name for the operation (optional).

Returns
The created Operation.

`serialize_state`

serialize_state(
    name=None
)

Produce a string tensor that encodes the state of a reader.

Not all Readers support being serialized, so this can produce an Unimplemented error.

Args
`name`	A name for the operation (optional).

Returns
A string Tensor.

© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/contrib/cloud/BigQueryReader