tf.contrib.factorization.KMeans
Creates the graph for k-means clustering.
tf.contrib.factorization.KMeans(inputs, num_clusters, initial_clusters=RANDOM_INIT, distance_metric=SQUARED_EUCLIDEAN_DISTANCE, use_mini_batch=False, mini_batch_steps_per_iteration=1, random_seed=0, kmeans_plus_plus_num_retries=2, kmc2_chain_length=200)
Args | |
---|---|
inputs | An input tensor or list of input tensors. It is assumed that the data points have been previously randomly permuted. |
num_clusters | An integer tensor specifying the number of clusters. This argument is ignored if initial_clusters is a tensor or numpy array. |
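initial_clusters | Specifies the clusters used during initialization. One of the following: a tensor or numpy array with the initial cluster centers; a function f(inputs, k) that returns up to k centers from inputs; "random" (choose centers randomly from inputs); "kmeans_plus_plus" (use kmeans++ to choose centers from inputs); "kmc2" (use the fast k-MC2 algorithm to choose centers from inputs). |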
distance_metric | Distance metric used for clustering. Supported options: "squared_euclidean", "cosine". |
use_mini_batch | If true, use the mini-batch k-means algorithm. Else assume full batch. |
mini_batch_steps_per_iteration | Number of steps after which the updated cluster centers are synced back to a master copy. |
random_seed | Seed for the PRNG used to choose the initial cluster centers (the seeds). |
kmeans_plus_plus_num_retries | For each point that is sampled during kmeans++ initialization, this parameter specifies the number of additional points to draw from the current distribution before selecting the best. If a negative value is specified, a heuristic is used to sample O(log(num_to_sample)) additional points. |
kmc2_chain_length | Determines how many candidate points are used by the k-MC2 algorithm to produce one new cluster center. If a (mini-)batch contains fewer points, one new cluster center is generated from the (mini-)batch. |
Raises | |
---|---|
ValueError | An invalid argument was passed to initial_clusters or distance_metric. |
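The role of kmeans_plus_plus_num_retries can be illustrated with a small pure-Python sketch. It is not the TensorFlow implementation: the function names are hypothetical, squared Euclidean distance is assumed, and the negative-value heuristic is not modeled (negative values are treated as zero retries here).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_plus_plus(points, num_clusters, num_retries=2, seed=0):
    """Pick num_clusters initial centers from points (illustrative only)."""
    rng = random.Random(seed)
    # Assumption: negative num_retries is treated as 0 in this sketch.
    draws = 1 + max(num_retries, 0)
    centers = [points[rng.randrange(len(points))]]
    # closest[i] is the squared distance from points[i] to its nearest center.
    closest = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < num_clusters:
        best = None
        for _ in range(draws):
            if sum(closest) > 0:
                # Sample proportionally to squared distance (D^2 sampling).
                idx = rng.choices(range(len(points)), weights=closest)[0]
            else:
                # Every point already coincides with a center.
                idx = rng.randrange(len(points))
            candidate = points[idx]
            new_closest = [
                min(d, squared_distance(p, candidate))
                for p, d in zip(points, closest)
            ]
            cost = sum(new_closest)
            # Keep the candidate that leaves the lowest total cost.
            if best is None or cost < best[0]:
                best = (cost, new_closest, candidate)
        closest = best[1]
        centers.append(best[2])
    return centers


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (20.0, 1.0)]
print(kmeans_plus_plus(data, num_clusters=3, num_retries=2, seed=0))
```

With num_retries=0 this reduces to plain kmeans++ sampling. Larger values trade extra distance computations for initial centers with a lower total cost.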
Methods
training_graph
training_graph()
Generates a training graph for the k-means algorithm.
This returns, among other things, an op that chooses initial centers (init_op), a boolean variable that is set to True when the initial centers are chosen (cluster_centers_initialized), and an op to perform either an entire Lloyd iteration or a mini-batch of a Lloyd iteration (training_op). The caller should use these components as follows. A single worker should execute init_op multiple times until cluster_centers_initialized becomes True. Then multiple workers may execute training_op any number of times.
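The calling protocol described above can be sketched in plain Python. ToyKMeans is a hypothetical stand-in, not part of TensorFlow: init_step plays the role of init_op, the initialized attribute plays the role of cluster_centers_initialized, and train_step plays the role of training_op for the full-batch case. The sketch assumes that one initialization call can only take centers from the batch it sees, which is one plausible reason the op may need to run more than once.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ToyKMeans:
    """Hypothetical stand-in for the graph components (not the TF API)."""

    def __init__(self, num_clusters):
        self.num_clusters = num_clusters
        self.centers = []
        self.initialized = False  # role of cluster_centers_initialized

    def init_step(self, batch):
        """Role of init_op: take centers from batch until k are chosen."""
        needed = self.num_clusters - len(self.centers)
        self.centers.extend(batch[:needed])
        self.initialized = len(self.centers) == self.num_clusters

    def train_step(self, points):
        """Role of training_op: one full Lloyd iteration."""
        sums = [[0.0] * len(points[0]) for _ in self.centers]
        counts = [0] * len(self.centers)
        for p in points:
            # Assign the point to its nearest center.
            nearest = min(
                range(len(self.centers)),
                key=lambda c: squared_distance(p, self.centers[c]),
            )
            counts[nearest] += 1
            for d, x in enumerate(p):
                sums[nearest][d] += x
        # Move each non-empty cluster's center to the mean of its points.
        self.centers = [
            tuple(s / n for s in total) if n else center
            for total, n, center in zip(sums, counts, self.centers)
        ]


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
batches = [points[0:1], points[2:3]]  # each batch is smaller than k

model = ToyKMeans(num_clusters=2)
step = 0
while not model.initialized:  # a single worker repeats the init step
    model.init_step(batches[step % len(batches)])
    step += 1
for _ in range(5):  # training steps may then run any number of times
    model.train_step(points)
print(model.centers)  # [(0.0, 0.5), (10.0, 10.5)]
```

In this toy run, initialization needs two calls because each batch contributes only one center. Training begins only after the flag becomes True.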
Returns | |
---|---|
A tuple consisting of: | |
all_scores | A matrix (or list of matrices) of dimensions (num_input, num_clusters) where each value is the distance between an input vector and a cluster center. |
cluster_idx | A vector (or list of vectors). Each element in the vector corresponds to an input row in inputs and specifies the cluster id assigned to that input. |
scores | Similar to cluster_idx but specifies the distance to the assigned cluster instead. |
cluster_centers_initialized | A scalar indicating whether the cluster centers have been initialized. |
init_op | An op that initializes the cluster centers. |
training_op | An op that runs one iteration of training. |
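The relationship between all_scores, cluster_idx, and scores can be shown with a short pure-Python sketch. The helper names are hypothetical and the code does not call TensorFlow. Cosine distance is taken here as 1 minus cosine similarity, which is an assumption about the "cosine" option.

```python
import math


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """Assumed definition: 1 - cosine similarity (non-zero vectors only)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def assign(inputs, centers, distance=squared_euclidean):
    """Return (all_scores, cluster_idx, scores) for the given centers."""
    # all_scores has one row per input and one column per cluster center.
    all_scores = [[distance(x, c) for c in centers] for x in inputs]
    # cluster_idx holds the column index of the smallest distance in each row.
    cluster_idx = [min(range(len(row)), key=row.__getitem__) for row in all_scores]
    # scores holds the distance to the assigned (nearest) center.
    scores = [row[i] for row, i in zip(all_scores, cluster_idx)]
    return all_scores, cluster_idx, scores


inputs = [(0, 0), (3, 4), (10, 10)]
centers = [(0, 0), (10, 10)]
all_scores, cluster_idx, scores = assign(inputs, centers)
print(all_scores)   # [[0, 200], [25, 85], [200, 0]]
print(cluster_idx)  # [0, 0, 1]
print(scores)       # [0, 25, 0]
```

The second input (3, 4) is at squared distance 25 from the first center and 85 from the second, so it is assigned cluster id 0 with score 25.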
© 2020 The TensorFlow Authors. All rights reserved.
Licensed under the Creative Commons Attribution License 3.0.
Code samples licensed under the Apache 2.0 License.
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/contrib/factorization/KMeans