MySQL · MyRocks · MyRocks Parameter Overview

on 2018-01-09 | by 数据库内核月报

The following parameters are DB-level and take effect globally.

| Parameter | Description | Notes |
|---|---|---|
| rocksdb_block_cache_size | Caches uncompressed blocks. The cache is sharded; the number of shards is controlled by table_cache_numshardbits, which defaults to 6 (64 shards). Each shard is at least 512 KB (rocksdb::LRUCache::LRUCache). | Default 512M |
| rocksdb_max_total_wal_size | When the total WAL size exceeds rocksdb_max_total_wal_size, RocksDB switches the memtable and flushes it. | Default 0, meaning the limit is 4 times the combined write_buffer size of all column families |
| rocksdb_wal_size_limit_mb | Maximum total size of WAL files that may be retained when WAL is purged (DBOptions::WAL_size_limit_MB). | Default 0, meaning the amount of retained WAL is not controlled: a WAL file can be purged as soon as its memtable has been flushed |
| rocksdb_wal_ttl_seconds | Controls how often WAL is purged: once every rocksdb_wal_ttl_seconds/2 seconds. If rocksdb_wal_size_limit_mb > 0, WAL is purged every 600 s instead (kDefaultIntervalToDeleteObsoleteWAL). | Default 0 |
| rocksdb_manual_wal_flush | If true, the WAL is not flushed automatically after each write. Instead it relies on manual invocation of FlushWAL to write the WAL buffer to its file. | Default true |
| rocksdb_deadlock_detect | Whether deadlock detection is enabled. | Disabled by default |
| rocksdb_wal_bytes_per_sync | Syncs the WAL once every rocksdb_wal_bytes_per_sync bytes (WritableFileWriter::Flush). | Default 0, which turns incremental syncing off |
| rocksdb_wal_recovery_mode | WAL recovery mode at restart. 0: if the last entry is corrupted, truncate it and start. 1: fail to start, do not recover. 2: truncate everything after the corrupted entry, including entries that are not corrupted; acceptable on slaves. 3: truncate only the corrupted entry; the most dangerous option. | |
| rocksdb_strict_collation_exceptions | Tables that are allowed to use collations other than the memcomparable ones. | The value is a regular expression, e.g. "t1,t2*" |
| rpl_skip_tx_api | Use write batches for the replication thread instead of the tx API. | Takes effect on the replica |
| rocksdb_master_skip_tx_api | Disables the Transaction API and enables the WriteBatch API. There are no row locks, so UPDATEs and DELETEs are faster. | You must ensure that no concurrent operations are running |
| rocksdb_read_free_rpl_tables | Regular expression selecting the tables that use read-free replication, e.g. .* or t.* | Default empty |
| rocksdb_info_log_level | Log level; the smaller the value, the more detailed the log. 0: debug_level, 1: info_level, 2: warn_level, 3: error_level, 4: fatal_level, 5: header_level. | |
| rocksdb_perf_context_level | Level of the perf context. 0, 1: disabled. 2: enable only count stats. 3: in addition to count stats, also enable time stats except for mutexes. 4: enable count and time stats. | Default 0 |
| rocksdb_max_background_jobs | Number of background worker threads. | Older versions had separate rocksdb_max_background_flushes and rocksdb_max_background_compactions settings; newer versions merge them into this one and divide the threads between the two automatically. https://github.com/facebook/rocksdb/wiki/Thread-Pool |
| rocksdb_commit_in_the_middle | Commit rows implicitly every rocksdb_bulk_load_size rows. Commit-in-the-middle is applied automatically when rocksdb_bulk_load is on. | Default OFF. Not recommended as a global setting; set it at session level |
| rocksdb_blind_delete_primary_key | When deleting by primary key from a table whose only index is the primary key, the row is deleted directly by the given key without reading the data first. | Default OFF. Only DELETEs by primary key qualify. Works: DELETE FROM t WHERE id IN (1, 2, 3, 4, 5, 6, ...., 10000). Does not work: DELETE .. WHERE id < 10 |
| rocksdb_use_direct_reads | Use O_DIRECT for reading data. | Default OFF |
| rocksdb_use_direct_io_for_flush_and_compaction | Use O_DIRECT for flush and compaction. | Default OFF |
| rocksdb_skip_fill_cache | Skip filling the block cache on read requests. | Default OFF; used during DDL load |
| gap_lock_raise_error | Using Gap Lock without full unique key in multi-table or multi-statement transactions is not allowed. With this option on, a query that uses gap lock in violation of this rule fails with an error. | Default false |
| gap_lock_write_log | Using Gap Lock without full unique key in multi-table or multi-statement transactions is not allowed. With this option on, a query that uses gap lock in violation of this rule is recorded in the file named by gap_lock_log_file. | Default false |
| gap_lock_log_file | File in which gap lock usage is recorded. | |
| rocksdb_stats_dump_period_sec | How often statistics are written to the LOG (DBImpl::PrintStatistics). | Default 600. Note that currently it is only dumped after a compaction. So if the database doesn't serve any write for a long time, statistics may not be dumped, despite options.stats_dump_period_sec. |
| rocksdb_compaction_readahead_size | If non-zero, we perform bigger reads when doing compaction. If you're running RocksDB on spinning disks, you should set this to at least 2MB. That way RocksDB's compaction is doing sequential instead of random reads. | Default 0 |
| rocksdb_advise_random_on_open | If set true, will hint the underlying file system that the file access pattern is random when an SST file is opened. | Default ON |
| rocksdb_max_row_locks | Maximum number of locks a transaction can hold. | Default 1M |
| rocksdb_bytes_per_sync | Syncs SST files once every rocksdb_bytes_per_sync bytes (WritableFileWriter::Flush). | Default 0, which turns incremental syncing off. You may consider using rate_limiter to regulate the write rate to the device. When the rate limiter is enabled, it automatically sets bytes_per_sync to 1MB. |
| rocksdb_enable_ttl | Enable expired TTL records to be dropped during compaction. | Default ON |
| rocksdb_enable_ttl_read_filtering | For tables with TTL, expired records are skipped/filtered out during processing and in query results. Disabling this will allow these records to be seen, but as a result rows may disappear in the middle of transactions as they are dropped during compaction. Use with caution. | Default ON |
| rocksdb_bulk_load | Switch for bulk load. | Default OFF. https://github.com/facebook/mysql-5.6/wiki/data-loading |
| rocksdb_bulk_load_allow_unsorted | Allows bulk load of data that is not sorted by primary key. | Default OFF |
| rocksdb_bulk_load_size | A bulk load is performed once every rocksdb_bulk_load_size writes. | Default 1000 |
| rocksdb_enable_bulk_load_api | Enables using SstFileWriter for bulk loading. | Default ON |
| rocksdb_enable_2pc | Whether two-phase commit (2PC) is enabled. | Default ON |
| rocksdb_rate_limiter_bytes_per_sec | Limits the rate of reading and writing SST files (DBOptions::rate_limiter bytes_per_sec for RocksDB). | Default 0 |
| rocksdb_sst_mgr_rate_bytes_per_sec | Limits the rate at which SST files are deleted (DBOptions::sst_file_manager rate_bytes_per_sec for RocksDB). | Default 0 |
| rocksdb_delayed_write_rate | Rate to which writes are throttled during a write stall, in bytes per second (DBOptions::delayed_write_rate). | Default 0 |
| rocksdb_write_disable_wal | Whether the WAL is disabled. | Default OFF |
| rocksdb_flush_log_at_trx_commit | Sync the WAL on transaction commit, similar to innodb_flush_log_at_trx_commit. 1: sync on commit. 0, 2: do not sync on commit. | Default 1 |
| rocksdb_cache_index_and_filter_blocks | Whether index and filter blocks are cached in the block cache. | Default ON |
| rocksdb_pin_l0_filter_and_index_blocks_in_cache | If cache_index_and_filter_blocks is true and this option is true, then filter and index blocks are stored in the cache, but a reference is held in the "table reader" object so the blocks are pinned and only evicted from cache when the table reader is freed. | Default ON |

The parameters above can be viewed with show variables.

For more detail, see db_options_type_info in the source code:

include/rocksdb/options.h
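The WAL purge rules described for rocksdb_wal_ttl_seconds and rocksdb_wal_size_limit_mb can be sketched as a small function. This is only a restatement of the rules given in this article, not RocksDB's code; the function name wal_purge_interval is hypothetical.

```python
# Interval named kDefaultIntervalToDeleteObsoleteWAL in the article, in seconds.
DEFAULT_PURGE_INTERVAL_SEC = 600


def wal_purge_interval(wal_ttl_seconds, wal_size_limit_mb):
    """Return the purge interval in seconds, or None when no WAL is retained.

    Assumption: both settings at 0 means WAL files are purged as soon as
    their memtable is flushed, so there is no periodic interval to report.
    """
    if wal_ttl_seconds == 0 and wal_size_limit_mb == 0:
        return None
    if wal_size_limit_mb > 0:
        # A size limit switches purging to the fixed 600 s interval.
        return DEFAULT_PURGE_INTERVAL_SEC
    # Only a TTL is set: purge runs every ttl/2 seconds.
    return wal_ttl_seconds / 2


print(wal_purge_interval(3600, 0))    # TTL only
print(wal_purge_interval(3600, 100))  # TTL plus size limit
```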

The following parameters are column-family-level and can be set separately for each column family.

| Parameter | Description | Notes |
|---|---|---|
| write_buffer_size | Memory size of a memtable. | |
| max_write_buffer_number | Maximum number of memtables. | Default 2 |
| min_write_buffer_number_to_merge | The minimum number of memtables to be merged before flushing to storage. For example, if this option is set to 2, immutable memtables are only flushed when there are two of them. | Default 1 |
| target_file_size_base | SST file size at level 1. | Default 64M |
| target_file_size_multiplier | SST file size at level L (L > 1) is target_file_size_base * target_file_size_multiplier ^ (L-1). | Default 1. For example, if target_file_size_base is 2MB and target_file_size_multiplier is 10, then each file on level-1 will be 2MB, each file on level-2 will be 20MB, and each file on level-3 will be 200MB. |
| max_bytes_for_level_base | Total SST size at level 1. | Default 256M |
| max_bytes_for_level_multiplier | Total SST size at level L is max_bytes_for_level_base * max_bytes_for_level_multiplier ^ (L-1) * max_bytes_for_level_multiplier_additional(L-1) (VersionStorageInfo::CalculateBaseBytes). | Default 10 |
| max_bytes_for_level_multiplier_additional | Different max-size multipliers for different levels (VersionStorageInfo::CalculateBaseBytes). | Default 1:1:1:1:1:1:1 |
| num_levels | Number of levels. | Default 7 |
| level0_file_num_compaction_trigger | A level-0 compaction is triggered when the number of level-0 files exceeds this value. | Default 4 |
| level0_slowdown_writes_trigger | Writes are delayed when the number of level-0 files exceeds this value. | Default 20 |
| level0_stop_writes_trigger | Writes are stopped when the number of level-0 files exceeds this value. | Default 36 |
| pin_l0_filter_and_index_blocks_in_cache | If cache_index_and_filter_blocks is true and this option is true, then filter and index blocks are stored in the cache, but a reference is held in the "table reader" object so the blocks are pinned and only evicted from cache when the table reader is freed. | Default 1. A value set on the column family overrides rocksdb_pin_l0_filter_and_index_blocks_in_cache |
| cache_index_and_filter_blocks | Whether index and filter blocks are cached in the block cache. | Default 1. A value set on the column family overrides rocksdb_cache_index_and_filter_blocks |
| optimize_filters_for_hits | When set to true, no filter is stored for the last level. The last-level bloom filter is of little use when most lookups hit an existing key. | Default OFF |
| filter_policy | Filter policy. filter_policy=bloomfilter:10:false means: use a bloom filter with bits_per_key_=10, so the number of hash functions is 10*ln2; the trailing false sets use_block_based_builder_=false, which selects the full filter. | |
| prefix_extractor | Prefix used by the filter. prefix_extractor=capped:24 takes at most the first 24 bytes of the key as the prefix. The alternative fixed:n takes exactly the first n bytes and ignores keys shorter than n bytes. See CappedPrefixTransform and FixedPrefixTransform. | |
| partition_filters | Whether to use partitioned filters. | Default false. Filter settings take priority in the order block based > partitioned > full. For example, if use_block_based_builder_=true and partition_filters=true are both specified, the block based filter is used. |
| whole_key_filtering | If true, place whole keys in the filter (not just prefixes). | Default 1 |
| level_compaction_dynamic_level_bytes | In this mode, the size targets of levels are changed dynamically based on the size of the last level. This reduces write amplification. | http://rocksdb.org/blog/2015/07/23/dynamic-level.html |
| memtable | Memtable type (skiplist/vector/hash_linkedlist/prefix_hash/cuckoo). | Default skiplist |
| compaction_pri | Policy for choosing which files to compact. kByCompensatedSize: slightly prioritize larger files by size compensated by #deletes. kOldestLargestSeqFirst: first compact files whose data's latest update time is oldest. kOldestSmallestSeqFirst: first compact files whose range hasn't been compacted to the next level for the longest. kMinOverlappingRatio: first compact files whose ratio between overlapping size in next level and its size is the smallest. | Default kByCompensatedSize |
| compression_per_level | Compression policy for each level. | It usually makes sense to avoid compressing levels 0 and 1 and to compress data only in higher levels. You can even set slower compression in highest level and faster compression in lower levels (by highest we mean Lmax). |
| bottommost_compression | Compression policy for the bottommost level. | |
| arena_block_size | RocksDB's memory allocation unit, kBlockSize, is set by arena_block_size. | When not specified, it defaults to 1/8 of write_buffer_size |
| soft_pending_compaction_bytes_limit | All writes will be slowed down to at least delayed_write_rate if the estimated bytes needing compaction exceed this threshold. | Default 64G |
| hard_pending_compaction_bytes_limit | All writes are stopped if the estimated bytes needing compaction exceed this threshold. | Default 256G |
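The level-sizing formulas given above for target_file_size_multiplier and max_bytes_for_level_multiplier can be checked with a short calculation. The sketch below follows the formulas as written in this article (RocksDB's own VersionStorageInfo::CalculateBaseBytes may differ in detail); the function names are hypothetical.

```python
MB = 1024 * 1024


def target_file_size(level, base, multiplier):
    """SST file size target for level >= 1: base * multiplier^(level-1)."""
    return base * multiplier ** (level - 1)


def max_bytes_for_level(level, base, multiplier, additional=None):
    """Total size target for level >= 1, per the formula in this article.

    `additional` is the per-level list from
    max_bytes_for_level_multiplier_additional; None means all ones.
    """
    extra = 1 if additional is None else additional[level - 1]
    return base * multiplier ** (level - 1) * extra


# The article's example: base 2MB, multiplier 10.
file_sizes = [target_file_size(lvl, 2 * MB, 10) for lvl in (1, 2, 3)]
# Defaults from the table: base 256M, multiplier 10.
level_sizes = [max_bytes_for_level(lvl, 256 * MB, 10) for lvl in (1, 2, 3)]

print([s // MB for s in file_sizes])   # [2, 20, 200]
print([s // MB for s in level_sizes])  # [256, 2560, 25600]
```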

The parameters above can be viewed with select * from information_schema.rocksdb_cf_options.

For more detail, see ParseColumnFamilyOption and cf_options_type_info in the source code:

include/rocksdb/table.h
rocksdb/util/options_helper.h
rocksdb/options/options_helper.cc
include/rocksdb/advanced_options.h
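To make the filter_policy=bloomfilter:10:false setting more concrete, the sketch below computes the ideal number of hash functions (bits_per_key * ln2, as stated above) and the textbook false-positive rate of a bloom filter that uses it. The false-positive formula is the standard theoretical estimate, not a measurement of RocksDB, and a real implementation has to round the hash count to an integer.

```python
import math


def bloom_params(bits_per_key):
    """Return (ideal hash count, theoretical false-positive rate)."""
    ideal_hashes = bits_per_key * math.log(2)
    # With the ideal hash count, each lookup bit is set with probability 1/2,
    # so a non-member passes all checks with probability 0.5 ** hashes.
    false_positive_rate = 0.5 ** ideal_hashes
    return ideal_hashes, false_positive_rate


hashes, fp_rate = bloom_params(10)
print(round(hashes, 2))         # 6.93
print(round(fp_rate * 100, 2))  # 0.82 (percent)
```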

Configuration example

rocksdb_default_cf_options=memtable=vector;
arena_block_size=10M;
disable_auto_compactions=1;
min_write_buffer_number_to_merge=1;
write_buffer_size=100000m;
target_file_size_base=32m;
max_bytes_for_level_base=512m;
level0_file_num_compaction_trigger=20;
level0_slowdown_writes_trigger=30;
level0_stop_writes_trigger=30;
max_write_buffer_number=5;
compression_per_level=kNoCompression:kNoCompression:kNoCompression:kNoCompression:kNoCompression:kNoCompression;
bottommost_compression=kNoCompression;
block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};
level_compaction_dynamic_level_bytes=false;
optimize_filters_for_hits=true
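A long option string like the one above is easy to get wrong, so it can help to parse it before deploying and check a few relationships. The sketch below is a hypothetical helper, not MyRocks's own parser (that is ParseColumnFamilyOption); it assumes the name=value;name={...} syntax shown in the example and uses a shortened copy of it.

```python
def parse_cf_options(text):
    """Parse 'k=v;k={a=b;c=d}' into a (possibly nested) dict of strings."""
    text = "".join(text.split())  # drop spaces and line breaks
    result = {}
    i = 0
    while i < len(text):
        eq = text.index("=", i)
        key = text[i:eq]
        if text[eq + 1:eq + 2] == "{":
            # Find the brace that closes this group, allowing nesting.
            depth = 0
            j = eq + 1
            while True:
                if text[j] == "{":
                    depth += 1
                elif text[j] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            result[key] = parse_cf_options(text[eq + 2:j])
            i = j + 1
        else:
            end = text.find(";", eq)
            if end == -1:
                end = len(text)
            result[key] = text[eq + 1:end]
            i = end
        if text[i:i + 1] == ";":
            i += 1  # skip the separator
    return result


sample = """memtable=vector;
level0_slowdown_writes_trigger=30;
level0_stop_writes_trigger=30;
block_based_table_factory={cache_index_and_filter_blocks=1;filter_policy=bloomfilter:10:false;whole_key_filtering=1};
optimize_filters_for_hits=true"""

opts = parse_cf_options(sample)
slowdown = int(opts["level0_slowdown_writes_trigger"])
stop = int(opts["level0_stop_writes_trigger"])
# Writes should slow down no later than they stop.
print(slowdown <= stop)
print(opts["block_based_table_factory"]["filter_policy"])
```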

Parameter modification example

SET @@global.rocksdb_update_cf_options='cf1={write_buffer_size=8m;target_file_size_base=2m};cf2={write_buffer_size=16m;max_bytes_for_level_multiplier=8};cf3={target_file_size_base=4m};';

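Note: this method changes the options dynamically, but the change is not persisted to the OPTIONS file; the OPTIONS file has to be edited manually.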
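Because the dynamic change is not persisted, it can be convenient to keep the intended per-column-family changes in one structure and generate the statement from it, so the same values can later be copied into the OPTIONS file by hand. The helper below is a hypothetical sketch; the only thing taken from MyRocks is the cf={name=value;...}; syntax shown in the example above.

```python
def build_update_cf_options(changes):
    """Render {cf_name: {option: value}} as 'cf={opt=val;...};...;'."""
    parts = []
    for cf_name, options in changes.items():
        body = ";".join(f"{name}={value}" for name, value in options.items())
        parts.append(f"{cf_name}={{{body}}}")
    return ";".join(parts) + ";"


changes = {
    "cf1": {"write_buffer_size": "8m", "target_file_size_base": "2m"},
    "cf2": {"write_buffer_size": "16m", "max_bytes_for_level_multiplier": 8},
    "cf3": {"target_file_size_base": "4m"},
}

value = build_update_cf_options(changes)
statement = f"SET @@global.rocksdb_update_cf_options='{value}';"
print(statement)
```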

Source:

Author: 数据库内核月报
link: http://10.101.233.47:4000/monthly/2018/01/09/
