Crash recovery
While we strive to give you perfect uptime, RethinkDB, like other complex applications, is not immune to crashing. Here are some tips on how to recover from a crash, how to submit a bug report, and how to maximize availability.
What to do after a crash
Check if you ran out of memory
You may be able to check if the kernel’s out-of-memory killer is responsible for the crash by checking the system message buffer:
sudo dmesg | grep oom
This may show you messages similar to this:
rethinkdb invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
[<ffffffff8111d272>] ? oom_kill_process+0x82/0x2a0
If this is the case, you may be able to avoid crashes by changing RethinkDB’s cache size. For information on in-memory caches, how to check their current size, and how to change them, read Understanding RethinkDB memory requirements.
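The check above can be wrapped in a small filter so that it also catches differently capitalized messages. This is a minimal sketch: the function name oom_lines is hypothetical, and the "out of memory" pattern is an assumption about how your kernel words these messages.

```shell
# Filter kernel messages read from stdin for out-of-memory killer activity.
# -i makes the match case-insensitive, so "oom-killer" and
# "Out of memory" (assumed wording) are both caught.
oom_lines() {
    grep -i -E 'oom|out of memory'
}

# Typical use (reading the kernel buffer may require root):
#   sudo dmesg | oom_lines
```

If the filter prints nothing, the out-of-memory killer is probably not what stopped the server, and the log is the next place to look.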
Check the log
The log file’s location is dependent on your system configuration and how you started RethinkDB.
- If you started rethinkdb on a terminal rather than from a startup script, it will log to the rethinkdb_data directory. By default it will write to log_file, but this may be overridden with the --log-file startup option.
- If your Linux system uses systemd, use journalctl to view the log: journalctl -u rethinkdb@<instance>
- If you installed RethinkDB through a package manager on a system that does not use systemd, then you may have to check where it’s configured to log. It’s very likely this will be in the /var/log/ directory (i.e., /var/log/rethinkdb).
The log may give you information as to what caused the crash.
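When you are not sure which of these locations applies, a short helper can probe them in order. This is a sketch only: the function name first_existing_log is hypothetical, and the candidate paths in the usage comment are the defaults mentioned above, which may not match your installation.

```shell
# Print the first argument that is a readable regular file.
# Returns non-zero if none of the candidates qualifies.
first_existing_log() {
    for candidate in "$@"; do
        if [ -f "$candidate" ] && [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Typical use (adjust the candidate paths to your setup):
#   log=$(first_existing_log ./rethinkdb_data/log_file /var/log/rethinkdb) \
#       && tail -n 50 "$log"
```

On systemd systems the log lives in the journal rather than in a file, so the helper will find nothing there and journalctl is the right tool.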
Community support
If it doesn’t appear to be a memory issue and the log provides no clue, you can try asking for support on our official IRC channel, #rethinkdb on freenode, or on our Google Group. If your problem is a crash that we or our users have seen before, this may get you a quick answer.
How to submit a bug report
We use GitHub for issue tracking: https://github.com/rethinkdb/rethinkdb/issues. If you want to report a suspected bug in RethinkDB, open an issue there.
The most important things for you to provide for us are:
- The full output from rethinkdb --version, something like: rethinkdb 1.13.3 (CLANG 5.1 (clang-503.0.40))
- The full output from uname -a, something like: Darwin rethink.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun 3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64
- The backtrace from the crash, if it’s available in the logs.
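The first two items can be gathered in one step. The following is a minimal sketch, not an official tool: the function name bug_report_basics and the output file name in the usage comment are hypothetical, and the backtrace still has to be copied from the log by hand.

```shell
# Print the version and system details requested above to stdout.
bug_report_basics() {
    echo "== rethinkdb --version =="
    if command -v rethinkdb >/dev/null 2>&1; then
        rethinkdb --version
    else
        # Keep going so the uname output is still captured.
        echo "rethinkdb not found on PATH"
    fi
    echo "== uname -a =="
    uname -a
}

# Typical use:
#   bug_report_basics > rethinkdb-bug-report.txt
```

Run it on the server that crashed, since the version and kernel details of another machine in the cluster may differ.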
Other things that might be helpful to us, if you have them:
- A dump of the system tables (see below)
- A dump of the rethinkdb._debug_table_status table (a “hidden” table in the rethinkdb system database)
- The core file, if it was dumped on crash
- The data files if RethinkDB cannot restart¹
- The output of rethinkdb on startup
- Your cluster configuration (number of servers, basic network topology, etc.)
- Information about the server:
  - How much memory it has
  - The file system it’s using
  - Are you running RethinkDB in a VM?
  - Other unusual configuration details
- Is the crash reproducible, and if so, under what conditions?
Dumping the system tables
In the Data Explorer, the following command will output the contents of all the configuration/status tables and the most recent 50 lines of the logs table:
r.expr(["current_issues", "jobs", "stats", "server_config", "server_status",
        "table_config", "table_status", "db_config", "cluster_config"]).map(
    [r.row, r.db('rethinkdb').table(r.row).coerceTo('array')]
).coerceTo('object').merge(
    {logs: r.db('rethinkdb').table('logs').limit(50).coerceTo('array')}
)
Setting up high availability
RethinkDB supports replication of data: every table in a database can be replicated as many times as you have servers in a cluster. Setting up replication is a simple operation with the web interface or the command line tool. For details, read Sharding and replication.
RethinkDB does not have fully automatic failover (yet), but if a server in a cluster crashes it can be manually removed from the cluster. In most cases, RethinkDB will recover from such a situation automatically. For information on this process, read Failover.
¹ We’ll sign an NDA if necessary, and can set up an FTP server for you to transfer the file to if it’s large.
© RethinkDB contributors
Licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License.
https://rethinkdb.com/docs/crashes/