tf.keras.datasets.imdb.load_data
Loads the IMDB dataset.
tf.keras.datasets.imdb.load_data(
path='imdb.npz', num_words=None, skip_top=0, maxlen=None, seed=113,
start_char=1, oov_char=2, index_from=3, **kwargs
)
Arguments |
path | where to cache the data (relative to ~/.keras/dataset ). |
num_words | max number of words to include. Words are ranked by how often they occur (in the training set) and only the most frequent words are kept |
skip_top | skip the top N most frequently occurring words (which may not be informative). |
maxlen | sequences longer than this will be filtered out. |
seed | random seed for sample shuffling. |
start_char | The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character. |
oov_char | words that were cut out because of the num_words or skip_top limit will be replaced with this character. |
index_from | index actual words with this index and higher. |
**kwargs | Used for backwards compatibility. |
Returns |
Tuple of Numpy arrays: (x_train, y_train), (x_test, y_test) . |
Raises |
ValueError | in case maxlen is so low that no input sequence could be kept. |
Note that the 'out of vocabulary' character is only used for words that were present in the training set but are not included because they're not making the num_words
cut here. Words that were not seen in the training set but are in the test set have simply been skipped.