Jenny Chen's weblog
MySQL Innodb Performance Tuning For Disk-Bound Workloads On CMT Servers
MySQL is one of the world's most popular open source databases, and it is widely used and becoming the database-of-choice for many leading Web 2.0 sites. Like most database servers, the most common bottleneck in the enterprise environment encountered by MySQL is disk I/O. To maximize the performance of MySQL for disk I/O bound workloads on the Solaris operating system on CMT servers (e.g. the Sun Fire T2000 Server), configuration and tuning of MySQL server variables is critical and can make a big difference to performance, as does the optimization of the Solaris filesystem for MySQL, and the configuration of storage arrays.
Configuration Issues
MySQL server's performance can be optimized using various configuration settings. The first step is to read the configuration and system variables by running the command:
mysql> show variables;
Once you have done this, you can take appropriate action to configure/tune the variables for better performance. This action can be one of the following:
1. Change a value in the my.cnf configuration file, or mysql start up options
2. Configure the optimum number of user connections
3. Optimize Solaris file system performance
4. Setup and configure storage disk array
5. Make database schema changes, such as changing design of one or more tables, or adding or modifying indexes
6. Optimize the queries used by the application
This document will only concentrate on changing MySQL configuration settings and file system tuning. Storage configuration and changes at the database design and application level are not covered.
MySQL Server Variables
MySQL has many variables that can be adjusted to change MySQL behavior or for performance purpose. For I/O bound workloads, the most important parameters are memory related variables. MySQL includes several storage engines, including MyISAM, InnoDB, HEAP, and Berkeley DB (BDB), some variables apply to one of the storage engines only, some variables are used in the SQL layer applying to all the storage engines. While using the ACID transaction supported Innodb storage engine, first, we don’t need to configure the following memory-related variables, saving precious memory in the disk I/O bound workload:
· bulk_insert_buffer_size
· key_buffer_size
· key_cache_age_threshold, key_cache_block_size, key_cache_division_limit
· read_buffer_size, read_rnd_buffer_size
There are several memory-related variables that apply to all storage engines
· join_buffer_size – A buffer used for full join. When there are large joins without indexes, increase this buffer size to improve the efficiency.
· sort_buffer_size – A buffer used for the sort result set allocated by each thread. This can speed up ORDER BY and GROUP BY queries.
· query_cache_size – Set this variable to a nonzero value to enable query caching
· query_cache_limit – The maximum size of the cached result set, the larger result set won’t be cached
· query_cache_min_res_unit –query cache allocate memory blocks with the minimum size set by this variable. When the application has a lot of queries with small results,the default block size (4KB) may lead to memory fragmentation. So with small resultsets, decreasing it to 2048 or 1024 bytes might improve performance; with large query resultsets, increasing it to 8192, 16384 or more may improve performance query_cache_type: 0=OFF, 1=ON
The MySQL query cache stores the identical SELECT queries issued by clients to the database server. This makes it possible to locate and re-issue the same queries without repetitive hard parsing activities. MySQL also stores the query's result set in the query cache, which can significantly reduce the overhead of creating complex result sets for queries from the disk or memory caches, reducing both physical and logical I/O. This can speed up applications where repetitive queries of products are being issued. If you see a high value for qcache_hits compared to your total queries at runtime or a low value for qcache_free_memory seen from the mysql>show status; you probably need to increase the value of the query_cache_size parameter accordingly. Otherwise, you would decrease the value of the query_cache_size parameter to save memory resources for the other MySQL cache buffers. If qcache_hit is 0 in the runtime, you would completely turn off the query cache by setting query_cache_type as 0, together with setting query_cache_size as 0, since there is some overhead caused by having the query cache enabled besides wasting the memory resource. If the application uses many simple SELECT queries without them being repeated, having the query cache enabled may actually impede performance by 5-10%. However, for applicationswith many repeated SELECT queries with large resultsets, the performance increase set by the query cache can be 200% or more.
· tmp_table_size – set the maximum memory to allocate to a temporary table automatically created during query execution before MySQL converts it into an on-disk MyISAM table. When you see a lot of queries with the state value shown as “copying to tmp table on disk” when running the mysql>SHOW PROCESSLIST(or mysqladmin –i10 processlist extended-status) command, this means that the temporary resultset was larger than the value set by tmp_table_size, so that the MySQL thread copies the temporary table from RAM to disk. , In such cases, increasing the value can speed up execution of large queries; otherwise, decrease the value to save memory for the MySQL I/O bound workload.
· table_cache – Size this cache to keep most tables open since opening tables can be expensive. The optimum value for table_cache is directly related to the number of tables that need to be opened simultaneously in order to perform multiple-table joins. The table_cache value should be no less than the number of concurrent connections times the largest number tables involved in any one join. 1024 is a good value for applications with a couple of hundred tables (each connection has its own entry). You should check the Open_tables status variable to see if it is large compared to table_cache
MySQL Innodb Only Memory-related variables
· innodb_buffer_pool_size – Set the amount of memory allocated to both Innodb data and index buffer cache. If the server requests data available in the cache, the data can be processed right away. Otherwise, the operating system will request that the data be loaded from the disk into the buffer. It is important to set this value as high as possible to use the more efficient innodb data and index buffer cache instead of operating system buffer. For the sysbench I/O bound workload on a T2000 server with 8G RAM, increasing innodb_buffer_pool_size from 4G to 5G can improve performance by around 11%.
· innodb_additional_mem_pool_size - Sets the amount of memory allocated to the buffer storing the InnoDB internal data dictionary and other internal data structures. This parameter does not affect performance much, so set it to 20M (For applications with more tables, more memory needs to be allocated here) for the sysbench OLTP I/O bound test case.
· innodb_log_buffer_size - Set the amount of memory allocated to the buffer storing InnoDB write-ahead log entries. For large transactions, the log can be loaded into the log buffer instead of writing log to the log files on disk untill the log buffer is flushed on each transaction commit. If you see large log I/Os in the show innodb status output at runtime, you probably need to set a larger value for the innodb_log_buffer_size parameter to save disk I/O. For workloads which don’t have long transactions like sysbench, it is not necessary to waste memory resources by setting a higher value for the log buffer; it is fine to set it to 8Mbytes.
Other MySQL Innodb variables impacting I/O Performance
· innodb_flush_log_at_trx_commit - InnoDB flushes the transaction log to disk approximately once per second in the background. As a default, innodb_flush_log_at_trx_commit is set to 1, meaning the log is flushed to the disk at a transaction commit, and modifications made by the transaction won’t be lost during a MySQL, OS, or HW crash. For workloads running with many small transactions, you can reduce disk I/O to the logs to improve performance by setting the innodb_flush_log_at_trx_commit parameter to 0, meaning no log flushing on each transaction commit. However, the transaction might be lost if MySQL crashes. In the sysbench OLTP I/O bound workload test on a T2000 server, setting innodb_flush_log_at_trx_commit =0 in the read-only test can improve performance by 4%. You can set this value to 2 to flush the log to the OS cache to save disk I/O on each transaction commit.
· innodb_log_file_size – Set the size of each log file in a log group. InnoDB writes to the log files in a circular fashion, so the bigger innodb_log_file_size, the less checkpoint flush activity, reducing disk I/O, but increasing recovery time. In the show innodb status output, if there are large page writes in the BUFFER POOL AND MEMORY part, you will need to increase this parameter.
Configure an Optimum Number of User Threads
MySQL is a single-process, multithreaded application. There is one master thread with highest priority to control the server. For every client request, it creates a dedicated user thread running at normal priority in the thread pools to process the user request and send back the result to each client once the result is ready. And there is one single user thread that waits for input from the console, and a group of utility threads running at lower priority to handle some background tasks. Currently, MySQL cannot scale well with the number of concurrent user connections. On a T2000 server, in the OLTP I/O bound read-write sysbench test, MySQL can scale from 2 up to 64 concurrent user threads to reach the peak performance point. After that, increasing the number of user connections will increase the user level lock contention observed from prstat –mL output(LCK) to reduce MySQL performance. For applications where the number of user connections is tunable, you need to test to get the optimum number of user connections for peak performance. For applications where the number of user connections is not tunable, the innodb_thread_concurrency parameter can be configured to set the number of threads working inside the InnoDB engine. You need to increase this value when you see many queries in the queue in show innodb status. Setting this value at 0 will disable it. On the T2000 server, we set it to be around 2*(Num of disks) in the sysbench OLTP I/O bound workload test. Testing and tuning the optimal value for the innodb_thread_concurrency parameter according to the kind of workload, and behavior of your system at runtime, can affect performance significantly.
Optimize File system Performance on T2000
File system performance have a big impact on system performance -- particularly when running an I/O bound workload with a database size much bigger than system memory. How to configure the file system for better performance depends on the workload access pattern: random or sequential. For a sequential workload, we can increase the file system cluster size (the maxcontig parameter) to allow read ahead or writing back more data from/to the disk to reduce the total number of I/O operations. For random workloads, we can reduce the file system cluster size to match the innodb I/O size. On the Solaris platform, maxcontig is set as 128 by default, which will trigger read-ahead for the whole file system cluster length (128*8 Kbytes on Solaris Sparc, 128*4Kbytes on Solaris x86) or the maximum size of physical I/O set in maxphys system variable. In the sysbench OLTP I/O bound test (a random workload,) for example, it can saturate a disk and significantly degrade performance because of it. One way to solve this problem is to reduce the value of the maxcontig parameter. In the sysbench OLTP I/O bound tests on the T2000, changing the maxcontig value to be 2 by using the tunefs –a 2 /dev/dsk/c4t1d0s6 command on the file system improved performance by 10%-13%. The shortcoming of this solution is that it will impact the performance of other sequential workloads on your system. The other way to improve performance is to disable file system caching with the UFS mount option: forcedirectio (mount –o remount,forcedirectio /data). Since innodb has its own buffer cache for the data and index(set by innodb_buffer_pool_size) which is more efficient than the operating system cache, we can use filesystem directio to save the double buffering and automatically disable read-ahead to benefit the random workload. On a T2000 installed with Solaris 10 update 1 to update 3, it is important to add set auto_lpg_maxszc=1 to the /etc/system suggested by Aleksandr Guzovskiy to reduce max pagesize for heap/stack/mmap to be 64k(default is up to be 256M) while using filesystem directio, otherwise, you may encounter a big performance drop with directio while multiple concurrent user threads are accessing the MySQL database. On the T2000 in the sysbench OLTP I/O bound tests, using directio improved performance by 14%-17%. The shortcoming of using filesystem directio is that it will significantly impact the performance of other applications on your system which don’t have internal caching (like the MySQL MyISAM engine which does not have its own data buffer cache) but instead depend on the filesytem caching to implement buffering.
Expected MySQL Performance On Niagara 2 UltraSPARC T2 Server
For disk I/O workload, MySQL can benefit the new features on Niagara 2 servers compared to Niagara 1 servers:
Larger L2 Cache: The on-chip 4MB L2 cache on Niagara 2 server can cache frequently accessed memory to get better MySQL performance.
Larger memory: Up to 512GB memory of fully buffered DIMMs in the integrated memory controller with an aggregated memory bandwidth of 64 gbps on the Niagara 2 server can buffer larger databases into the memory and reduce the time the CPU spends waiting for data to arrive. By adjusting how much memory MySQL innodb uses, we can expect to get significant performance improvements.
X8 PCI Express: PCI-E directly on-chip can reduce latency to speed up MySQL disk I/O performance as expected.
More paper on UltraSPARC T2 Server Technology, performance, etc.
Allan Pack's Weblog: CMT Comes Of Age
Example MySQL options
Here is the example of /etc/my.cnf on T2000(32x1200MHz, 8GB RAM, Solaris 10 11/06) in sysbench OLTP I/O bound test(100M-row):
[mysqld]
port = 3306
socket = /tmp/mysql.sock
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
log-error = /data/error.txt
user=root
skip-locking
max_connections = 3000
table_cache = 1024
max_allowed_packet = 1M
sort_buffer_size = 64K
thread_cache = 8
thread_concurrency = 32
query_cache_size = 0M
query_cache_type = 0
default-storage-engine = innodb
transaction_isolation = REPEATABLE-READ
tmp_table_size = 1M
innodb_data_file_path = ibdata1:100M:autoextend
innodb_buffer_pool_size = 5500M
innodb_additional_mem_pool_size = 20M
innodb_log_file_size =1900M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit =1
innodb_lock_wait_timeout = 300
innodb_max_dirty_pages_pct = 90
innodb_thread_concurrency =32
Example /etc/system on T2000
set pcie:pcie_aer_ce_mask=0x1
set ip:dohwcksum=0
set autoup=900
set tune_t_fsflushr=1
set auto_lpg_maxszc=1
Power-Efficiency Study For MySQL Database Server
Introduction
The purpose of this document is to explain the power-efficeincy analysis proess, power-efficiency issues and solutions in MySQL database server as an example to help identify the right tools and procedures that support the development of energy-efficient application.
Analysis Process
Install and start test application(MySQL) on idle system, create workload to simulate the scenario that multiple concurrent clients connecting to the MySQL server without executing any queries.
Run PowerTop on Solaris to see how much time the CPUs are spending running in lower power states, and how much time the CPU are spending running in C0 state(meaning CPU is actually executing instructions.)
Use DTrace to profile application understand the source of the wakeups causing power-consuming activity.
The remainder of the document covers the detail information on each of the above procedures
Test Setup And MySQL Configurations
HW Installation:
Host Server Type
SunFire x4150Server
CPU: 8x2826MHz Intel-Xeon
Memory: 16GB
Client Server Type
SunFire v20z server
CPU: 2x1793MHz AMD
Memory:2G
SW Installation:
Host OS
Solaris snv_96 x86
MySQL Database Server
5.0.51
Test App
Sysbench
commenting query execution:
//rc = mysql_stmt_execute(stmt->ptr);
Sysbench test Installation Steps:
shell> cd /sysbench-version
shell> ./configure
shell> make
shell> make install
MySQL Configurations:
a. Standalone MySQL server
default configuration
b. Master-slave MySQL Server
master db server: /etc/my.cnf
[mysqld]
server-id=1
log-bin= /usr/local/mysql/data/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 500M
slave db server: /etc/my.cnf
[mysqld]
server-id=2
master-host = v20z-241-30
master-user = slave_v20z
master-password = passw0rd
master-connect-retry = 60
C. Master-master MySQL Server
master db server: /etc/my.cnf
[mysqld]
replicate-same-server-id = 0
server-id=1
auto-increment-increment = 2
auto-increment-offset = 1
master-host = 10.6.241.106
master-user = slave_x4150
master-password = passw0rd
master-connect-retry = 60
log-bin= /usr/local/mysql/data/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 500M
Command To run 16 concurrent user sysbench oltp test(no queries):
standalone mysql server configuration:
shell> sysbench –test=oltp –oltp-table-size=100000 –mysql-db=sbtest –max-requests=0 --oltp-dist-type=special —num-threads=16 run
replication mysql server(M/S, M/M) configuration:
shell> sysbench –test=oltp –oltp-table-size=100000 –mysql-db=sbtest –max-requests=0 --oltp-dist-type=special –mysql-host=x4150-241-06 —num-threads=16 run
Test Observations and Results
Before starting MySQL server, the system is idle.
shell>powertop
Solaris PowerTOP version 1.0 (C) 2007 Intel Corporation Cn Avg residency P-states (frequencies) C0 (cpu running) (3.7%) 2826 Mhz 100.0% C1 2.5ms (96.3%) Wakeups-from-idle per second: 384.9 interval: 8.0s no ACPI power usage estimate available Top causes for wakeups: 26.0% (100.0) <kernel> : genunix`clock 25.9% ( 99.8) <kernel> : uhci`uhci_handle_root_hub_status_change 19.1% ( 73.4) sched : <scheduled timeout expiration> 8.7% ( 33.4) <kernel> : ehci`ehci_handle_root_hub_status_change 4.4% ( 16.8) java : <scheduled timeout expiration> 1.0% ( 4.0) <kernel> : uhci`uhci_cmd_timeout_hdlr 1.0% ( 4.0) <kernel> : genunix`schedpaging The powertop tool for Solaris x86 reports how much time CPUs spending in lower power states(C1), and how much time the CPUs are runing on C0 state(executing instructions). Even though the system show idle from processor monitor tools(vmstat, mpstat), the powertop can show that not 100% of its time running at the C1 state, but arround 3.7% time running on C0 state with wakeups mainly from some kernel activities.Test 1: Standalone MySQL server Configuration
When running 16-concurrent users sysbench test, the powertop report the similar data as expected since there is no queries executed and the MySQL server didn't do any work: Solaris PowerTOP version 1.0 (C) 2007 Intel Corporation Cn Avg residency P-states (frequencies) C0 (cpu running) (3.9%) 2826 Mhz 100.0% C1 2.5ms (96.1%) Wakeups-from-idle per second: 389.3 interval: 8.0s no ACPI power usage estimate available Top causes for wakeups: 25.8% (100.2) <kernel> : uhci`uhci_handle_root_hub_status_change 25.7% (100.0) <kernel> : genunix`clock 18.8% ( 73.1) sched : <scheduled timeout expiration> 8.6% ( 33.3) <kernel> : ehci`ehci_handle_root_hub_status_change 4.3% ( 16.6) java : <scheduled timeout expiration> 1.0% ( 4.0) <kernel> : uhci`uhci_cmd_timeout_hdlr 1.0% ( 4.0) <kernel> : genunix`schedpagingTest 2: Master-slave MySQL server configuration
On slave server, the powertop report :
Solaris PowerTOP version 1.0 (C) 2007 Intel Corporation Cn Avg residency P-states (frequencies) C0 (cpu running) (27.3%) 2826 Mhz 100.0% C1 0.1ms (72.7%) Wakeups-from-idle per second: 9846.8 interval: 8.0s no ACPI power usage estimate available Top causes for wakeups: 33.5% (3301.1) <interrupt> : e1000g#0 15.6% (1532.4) <interrupt> : aac#0 1.0% (100.2) <kernel> : uhci`uhci_handle_root_hub_status_change 1.0% (100.0) <kernel> : genunix`clock 0.7% ( 73.2) sched : <scheduled timeout expiration> 0.3% ( 33.3) <kernel> : ehci`ehci_handle_root_hub_status_change 0.2% ( 16.7) java : <scheduled timeout expiration>
Use DTrace tracing MySQL's hot function calls to help understand why MySQL slave server is causing CPU wake up from idle state to waste power:
#!/usr/sbin/dtrace -qs
pid$1:::entry
{
self->ts = vtimestamp;
}
pid$1:::return
/self->ts/
{
@a["Count",probefunc] = count();
@b["Time",probefunc] = sum(vtimestamp - self->ts);
self->ts = 0;
}
tick-10sec
{
exit(0);
}
Identified that the cuase of the wakeups was the known MySQL's bug: http://bugs.mysql.com/bug.php?id=33815
Re-configure MySQL's slave server's “server-id = 3” in /etc/my.cnf, start 16 con-current users sysbench test, the powertop report close to 96% low power state(C1) and there is no wakeups from MySQL.
Test 3: Master-master MySQL server configuration
When running 16-concurrent users sysbench test connecting to the MySQL master server, the powertop report no wakeups from MySQL as expected since there is no queries executed by the MySQL server: Solaris PowerTOP version 1.0 (C) 2007 Intel Corporation Cn Avg residency P-states (frequencies) C0 (cpu running) (4.7%) 2826 Mhz 100.0% C1 1.9ms (95.3%) Wakeups-from-idle per second: 496.1 interval: 8.0s no ACPI power usage estimate available Top causes for wakeups: 20.2% (100.2) <kernel> : uhci`uhci_handle_root_hub_status_change 20.2% (100.0) <kernel> : genunix`clock 14.8% ( 73.4) sched : <scheduled timeout expiration> 6.7% ( 33.3) <kernel> : ehci`ehci_handle_root_hub_status_change 3.4% ( 16.7) java : <scheduled timeout expiration> 0.8% ( 4.1) <interrupt> : e1000g#0 0.8% ( 4.0) <kernel> : uhci`uhci_cmd_timeout_hdlr 0.8% ( 4.0) <kernel> : genunix`schedpaging 0.4% ( 2.0) <kernel> : fcip`fcip_timeoutPower and Performance Measurement
With the different MySQL configuration(single server, replication server) in the tests above, we can see when MySQL's running on idle system, CPU can spend the lowest power states. In order to reduce the energy costs, we can work on improving MySQL applicaiton's performance(response time) to keep system in idle state longer saving the CPU resource.
There are different places that we can make changes to improve performance of a MySQL application: schema optimization and indexing, query performance optimization, tuning database server settings. The more detail and deep information on MySQL performance was covered in the guide book of High Perormance MySQL,2nd edition written by the MySQL performance experts. In this document, I have research test results bellow showing that increasing MySQL performance with these methods can actually reduce CPU utilization to save energy.
Performance Test Tool
mysqlslap: emulate client load for MySQL server. This tool is bundled with MySQL 5.1 server binary on Solaris OS
database schema: world schema(availabe at: http://dev.mysql.com/doc)
Execute custom query file with mysqlslap:
shell>mysqlslap -uroot -p -q ./myqueries.sql --create-schema=world -i 8
Test 1: Indexing
Create a new table city_huge based on City table in world database
mysql>create table city_huge select * from City;
mysql>insert into city_huge select * from City; (4 times)
mysql>alter table city_huge add index name_btree(Name); (create index on Name column)
Execute the query file using the index:
shell>cat scan-index.sql
...
SELECT count(*) FROM city_huge force index (name_btree) where name = 'Amsterdam'
...
shell>mysqlslap -uroot -p -q ./scan-index.sql --create-schema=world -i 8
Benchmark Average number of seconds to run all queries: 0.110 seconds Minimum number of seconds to run all queries: 0.109 seconds Maximum number of seconds to run all queries: 0.111 seconds Number of clients running queries: 1Average number of queries per client: 1000
shell>powertop Solaris PowerTOP version 1.0 (C) 2007 Intel Corporation Cn Avg residency P-states (frequencies) C0 (cpu running) (8.6%) 2826 Mhz 100.0% C1 0.3ms (91.4%) Wakeups-from-idle per second: 2633.3 interval: 8.0s no ACPI power usage estimate available Top causes for wakeups: 3.8% (100.0) <kernel> : genunix`clock 3.8% ( 99.8) <kernel> : uhci`uhci_handle_root_hub_status_change 2.8% ( 73.3) sched : <scheduled timeout expiration> 1.3% ( 33.3) <kernel> : ehci`ehci_handle_root_hub_status_change 0.6% ( 16.8) java : <scheduled timeout expiration> 0.2% ( 6.4) <interrupt> : aac#0 0.2% ( 6.4) <interrupt> : e1000g#Execute the query file without using the index:
shell>cat scan-noindex.sql
...
SELECT count(*) FROM city_huge ignore index (name_btree) where name = 'Amsterdam'
..
For all the database application, adding index is a great method to boost read-access performance. It allows MySQL spend less time find and retrieve the records from indexing instead of scanning the whole table. The time saving of table scan can boost the power usage of the database system. In the above test, we can see that using index can significantly improve the response time of queries and save power by increasing CPUs' spending in lower power states(C1) for arround 12%.
Test 2: Query Performance
Execute query file running join with two tables
#cat join.sql
select Country.Name from Country join city_huge on Country.Code=city_huge.CountryCode where
city_huge.Population> 8000000;
select Country.Name from Country join city_huge on Country.Code=city_huge.CountryCode where
city_huge.Population> 9000000;
#mysqlslap -uroot -p -q ./join.sql --create-schema=world -i 4
Benchmark Average number of seconds to run all queries: 0.003 seconds Minimum number of seconds to run all queries: 0.003 seconds Maximum number of seconds to run all queries: 0.005 seconds Number of clients running queries: 1 Average number of queries per client: 2 shell>powertop Cn Avg residency P-states (frequencies) C0 (cpu running) (6.3%) 2826 Mhz 100.0% C1 1.3ms (93.7%) Wakeups-from-idle per second: 726.3 interval: 8.0s no ACPI power usage estimate available Top causes for wakeups: 13.8% (100.0) <kernel> : genunix`clock 13.7% ( 99.8) <kernel> : uhci`uhci_handle_root_hub_status_change 10.1% ( 73.4) sched : <scheduled timeout expiration> 4.6% ( 33.3) <kernel> : ehci`ehci_handle_root_hub_status_change 2.3% ( 16.8) java : <scheduled timeout expiration> 0.9% ( 6.9) <interrupt> : e1000g#0 0.5% ( 4.0) <kernel> : uhci`uhci_cmd_timeout_hdlrExecute query file replacing the join quereis with subqueries
#cat subquery.sql
SELECT distinct Country.name FROM Country WHERE code IN (SELECT CountryCode FROM city_huge WHERE population > 8000000);
SELECT distinct Country.name FROM Country WHERE code IN (SELECT CountryCode FROM city_huge WHERE population > 9000000);
Test 3: MySQL Server configurations
MySQL's query cache stores the identical SELECT queries issued by client to the database server. By default, MySQL set the query_cache_size as 0. In this test , configure MySQL server to set query_cache_size as 1M mysql>set global query_cache_size=1m # mysqlslap -uroot -p -q ./subquery.sql --create-schema=world -i 4 Enter password: Benchmark Average number of seconds to run all queries: 2.630 seconds Minimum number of seconds to run all queries: 0.000 seconds Maximum number of seconds to run all queries: 10.522 seconds Number of clients running queries: 1 Average number of queries per client: 2 #powertop Solaris PowerTOP version 1.0 (C) 2007 Intel Corporation Cn Avg residency P-states (frequencies) C0 (cpu running) (10.4%) 2826 Mhz 100.0% C1 1.5ms (89.6%) Wakeups-from-idle per second: 585.4 interval: 8.0s no ACPI power usage estimate available Top causes for wakeups: 17.1% (100.0) <kernel> : genunix`clock 17.0% ( 99.8) <kernel> : uhci`uhci_handle_root_hub_status_change 12.5% ( 73.3) sched : <scheduled timeout expiration> 5.7% ( 33.4) <kernel> : ehci`ehci_handle_root_hub_status_change 2.9% ( 16.8) java : <scheduled timeout expiration> 1.1% ( 6.4) <interrupt> : e1000g#0 0.7% ( 4.0) <kernel> : uhci`uhci_cmd_timeout_hdl
After configure MySQL server to enable query cache, MySQL can save time and CPU resource on repetitivly parsing, optimization and execution of the same queries. The test above showed that the queries' response time can be redcued by nearly 4 times by retrieving query result set from the query cache directly and the power usage can be saved by incresing CPUs' spending in lower power states(C1) for arround 11%.
Conclusion
In this document, I use MySQL as an example to show the steps to measure the energy efficiency of the application software. By using the powertop tool on Solaris, we can see whether the system can take advantage of power management features when it is idle; or we can use dtrace tool on Solaris to analyze the source of the power management issues report in the powertop tool, and further improve power efficiency by developing and tuning high performance application.
References
DTrace Integration with MySQL 5.0 - Chime demo in MySQL Users Conference 2008
On April 15-16, we demoed a few DTrace probes for MySQL 5.0 integrated with Chime visualization Tool at MySQL users conference 2008. Here is an snapshot of the DTrace probes in chime showing the query execution time/count in Chime tool:
The DTrace probes inserted into MySQL 5.0 in the demo are:
provider mysql {
probe data__receive__start(int);
probe data__receive__finish(int);
probe query__plan__start(char *);
probe query__plan__finish(char *);
probe query__execute__start(void *, char *, char *, const char *, char *);
probe query__execute__finish(void *, char *, char *, const char *, char *,int);
probe query__cache__hit(void *, char *, char *, const char *, char *);
probe query__cache__miss(void *, char *, char *, const char *, char *);
probe myisam__wrlck__start();
probe myisam__wrlck__finish();
probe innodb__index__next__start(char *);
probe innodb__index__next__finish(char *);
probe innodb__index__next__same__start(char *);
probe innodb__index__next__same__finish(char *);
probe innodb__index__prev__start(char *);
probe innodb__index__prev__finish(char *);
probe innodb__index__first__start(char *);
probe innodb__index__first__finish(char *);
probe innodb__index__last__start(char *);
probe innodb__index__last__finish(char *);
probe innodb__rnd__init__start(char *);
probe innodb__rnd__init__finish(char *);
probe innodb__rnd__end__start(char *);
probe innodb__rnd__end__finish(char *);
probe innodb__rnd__next__start(char *);
probe innodb__rnd__next__finish(char *);
probe innodb__rnd__pos__start(char *);
probe innodb__rnd__pos__finish(char *);
probe flush__log__start();
probe flush__log__finish();
probe innodb__wait__buff__start();
probe innodb__wait__buff__finish();
};
Many people seeing the demo showed interest to try with the chime tool, per their requests, I'm putting the DTrace patch and chime tool for MySQL DTrace with the instructions on how to use the tool:
1. Apply the DTrace patch for MySQL 5.0.41
#gpatch -p2 -i mysql-5.0.41-dtrace.patch
2. Compile the MySQL src with the DTrace patch integrated:
#
./configure --enable-dtrace --prefix=/usr/local/mysql --with-innodb ->32 bit
# CFLAGS="-m64", CXXFLAGS="-m64" ./configure --enable-dtrace
DTRACEFLAGS='-64'
--prefix=/usr/local/mysql --with-innodb ->64 bit
In MySQL 6.0, there are a few DTrace probes inserted into the source code, you can try with the probes by compiling with "--enable-dtrace" configure option.
3. Install the chime tool for MySQL DTrace: chime.demo.tar by uncompress the chime.demo.tar from /
# tar xvf chime.demo.tar
4. Start Chime Tool as root:
#/opt/OSOL0chime/bin/chime
Instead of using Chime to show the probes in the user interface, we can also use the test scripts: dtrace_test.tar to run the DTrace probes. For example, by running the following script, we can get the time spending by each SQL statements waiting for reading page synchronous from disk, so that we can tuning the innodb_buffer_pool_size or SQL accordingly.
#cat innodb_buffer_wait.d
#!/usr/sbin/dtrace -qs
mysql*:::query-execute-start
{
this->query = copyinstr(arg4);
}
mysql*:::innodb-wait-buff-start
{
self->init = vtimestamp;
}
mysql*:::innodb-wait-buff-finish
/self->init/
{
@innittime[this->query] = sum(vtimestamp - self->init);
self->init = 0;
}
profile:::tick-5s
{
printf("------------------------------------------\n");
printf("Date: %Y\n",timestamp);
printf("Time wait for innodb buffer pool available");
printa(@inittime):
printf("------------------------------------------\n");
}
# ./innodb_buffer_wait.d
----------------------------------------------------------------------------------------
Date: 2008 April 17 13:34:43
Time wait for innodb buffer pool available
SELECT c from sbtest where id between ? and ? order by c 1976800
SELECT SUM(K) from sbtest where id between ? and ? 6184600
...
-----------------------------------------------------------------------------------------
Installation and Configuration Sun Cluster Data Service For MySQL
MySQL data service running on a clustered-server model provide orderly startup, shutdown, fault monitoring, and high availability(failover) mechanism compared to the single server model. It can be free donwloaded, and Sun also released Open HA cluster, derived from the Sun Cluster 3.2 agents including HA MySQL data service. The open HA cluster also provide build tools necessary to develop new features, build and use the code. In addition, Sun cluster 3.2 Data service for MySQL has the advantages including:
It is easy to install and configure with straightforward GUI and command line interfaces.
Expanded support for Solaris Containers(Solaris zones)
Expanded support for SMF
ZFS is fully supported as a local highly available fileystem
The following is the OS and HW setup for two-node cluster:
Solaris 10 11/06 installed on the two nodes
Each node has two network interfaces to be used as point-to-point private interconnects, and one network interface connect to the public network interface.
Two storage device(SE6120) connected to the two nodes
Figure
1:MySQL Replication Clustered Model
In the above MySQL clustered-server configuration, logical host name is set as the failover IP address within the same subnet. When the MySQL resource online, the failover IP is plumbed on the node where the MySQL resource is running. If a failover happen, the IP address moves along with the MySQL resource to the failover node. In this example, the configuration is:
Name
Interface
IP address
Logical Hostname
10.6.241.210
Node1
Sunfire-x64-241-02
bge0
10.6.241.208
Node2
Sunfire-x64-241-03
bge0
10.6.241.209
For the private network interfaces as cluster interconnection, Sun cluster installation will configure the network assigning the private network addresses. Note: Do NOT configure the private network interfaces before installing Sun cluster, otherwise, you will fail to install Sun cluster and get the error message as:
Adapter “ce0” is already in use as a public network adapter
After the HW and network is setup as the above, the general task flow of MySQL data service installation and configuration is:
Figure 2:MySQL Data Service Installation & Configuration Flow chart
Step 1: Plan Installation
Sun cluster requires to set aside a special file system named as: /globaldevices on one of the local disks for use in managing global devices on the two nodes.This file system is later mounted as a cluster file system as:
(optional) Setup cluster environment
PATH=/usr/bin:/usr/cluster/bin:/usr/local/mysql/bin:/usr/sbin:/usr/ccs/bin:/usr/cluster/man:$PATH
On both nodes, update the /etc/inet/ipnodes file with all public hostnames for the cluster
On both nodes, add the following entry in the /etc/system file
The first entry supports ce adapters for the private interconnect, and the second entry disable the loopback file system(LOFS)
5. local-mac-address? variable must set as true for Ethenet adapters. On Solaris x86, use the command: #eeprom local-mac-address?=true; On Solaris SPARC, change the local-mac-address? Variable to be ture from OBP OK> promptStep 2: Sun Cluster and MySQL data service Installation
Solaris cluster(sun cluster, Sun cluster Geographic Edition and Sun cluster agents) 3.2 is downloadable at:
http://www.sun.com/download/products.xml?id=4581ab9e
On both nodes, run the installer command to bring up the installer GUI . The installation of Sun cluster and Sun cluster for HA MySQL is straightforward to follow the instructions on the screen. Choose Sun cluster Core 3.2 core software, Sun cluster HA for MySQL to install, and choose Configure later before the installation.
Step 3: Configure Sun Cluster Software
Perform this procedure from one node of the cluster to configure Sun Cluster software on all nodes of the cluster:
1. # /usr/cluster/bin/scinstall
2. From the main menu, pick the “Option 1” to “Create a new cluster or add a cluster node”
3. From the new cluster and Cluster Node menu, pick the “Option 1” to “Create a new cluster”
4. From the Typical or Custom Mode, pick the “Option 1” for typical mode
5. From the Cluster Nodes menu, type the node name: “Sunfire-x64-241-02”, and “Sunfire-x64-241-03”
6. Provide the first and the second private adapter name: ce0, bge1
7. Type “no” for “ disable automatic quorum device selection”
8. Type “yes” for “ create the new cluster”
9. Type “no” for “ Interrupt cluster creation for sccheck errors”
At this point, the scinstall utility configures all cluster nodes and reboot the cluster. The cluster is established when all nodes have successfully booted into the cluster. Sun Cluster installation output is logged in /var/cluster/logs/install/scinstall.log.N file.
10. Verify cluster setup:
Sunfire-x64-241-03# clquorum list Sunfire-x64-241-03 Sunfire-x64-241-02 Sunfire-x64-241-03# clnode status Cluster Nodes === --- Node Status --- Node Name Status --------- ------ Sunfire-x64-241-03 Online Sunfire-x64-241-02 OnlineWhen the scinstall utility finishes, The cluster is now ready to configure the components you will use to support highly available MySQL, including device groups, and file systems.
Step 4: Configure Volume Manager and File System
In a shared disk set configuration in this example, two hosts are physically connected to the same set of disks. When one node fails, another node has exclusive access to the disks. Each node can control a shared disk set, but only one host can control it at a time.
From one node: Sunfire-x64-241-02, create one disk set for MySQL data service, so that Sunfire-x64-241-02 is make as the primary node:
Sunfire-x64-241-02# metaset -s mysqlset -a -h Sunfire-x64-241-02 Sunfire-x64-241-03
Verify that the configuration of the disk sets is correct and visible to both nodes.
From the primary node: Sunfire-x64-241-02, list the DID mappings:
Sunfire-x64-241-03
-
Sunfire-x64-241-02# cldevice show |grep Device
-
...
DID Device Name: /dev/did/rdsk/d5
Full Device Path: Sunfire-x64-241-02:/dev/rdsk/c3t60003BACCC90200046264D58000A22E3d0
Full Device Path: Sunfire-x64-241-03:/dev/rdsk/c5t60003BACCC90200046264D58000A22E3d0
Add /dev/did/rdsk/d5 to the MySQL disk setup
...
Sunfire-x64-241-02# metaset -s mysqlset -a /dev/did/rdsk/d5
Verify that the configuration of the disk set is correct
Sunfire-x64-241-02# metaset -s mysqlset
On both nodes, create the /etc/lvm/md.tab file with the following entries
-
mysqlset1/d0 -m mysqlset1/d10
mysqlset1/d10 1 1 /dev/did/rdsk/d5s0
mysqlset1/d1 -p mysqlset1/d0 50G
mysqlset1/d3 -p mysqlset1/d0 50G
7. From Sunfire-x64-241-02, take ownership for the mysql disk set and activate the volume-
Sunfire-x64-241-02# cldevicegroup switch -n Sunfire-x64-241-02 mysqlset
Sunfire-x64-241-02# metainit -s mysqlset -a
-
Sunfire-x64-241-02# metastat
...
Status: Okay
...
10. On both node, create the mount-point directory for the file systems
-
# mkdir -p /global/mysql - mysql master and slave servers
11. On both node, add entries to the /etc/vfstab file for the above mount points
12. On both nodes, mount the file systems, and verify that the file systems are mounted-
# mount
/global/mysql on /dev/md/mysqlset/dsk/d1 read/write/setuid/devices/intr/largefiles/logging/noquota/global/xattr/nodfratime/onerror=panic/dev=1544001 on Tue Aug 14 17:12:45 2007
/global/mysql-data1 on /dev/md/mysqlset/dsk/d2 read/write/setuid/devices/intr/forcedirectio/largefiles/logging/noquota/global/xattr/nodfratime/onerror=panic/dev=1544002 on Tue Aug 14 17:12:50 2007
-
/global/mysql-data2 on /dev/md/mysqlset/dsk/d3 read/write/setuid/devices/intr/forcedirectio/largefiles/logging/noquota/global/xattr/nodfratime/onerror=panic/dev=1544003 on Tue Aug 14 17:12:56 2007
-
Sunfire-x64-241-02# cluster check
Step 5: Configure Sun Cluster Resource for MySQL
Register SUNW.gds, SUNW.HAStoragePlus resource type
Sunfire-x64-241-02# scrgadm -a -t SUNW.gds
Sunfire-x64-241-02# scrgadm -a -t SUNW.HAStoragePlus
Create MySQL resource group named MySQL-failover-resource-group
Sunfire-x64-241-02# scrgadm -a -g MySQL-failover-resource-group
Create the HAStoragePlus resource named MySQL-has-resource in the MySQL-failover-resource-group resource group for MySQl disk storage
Sunfire-x64-241-02# scrgadm -a -j MySQL-has-resource -g MySQL-failover-resource-group -t SUNW.HAStoragePlus -x FilesystemMountPoints=/global/mysql-data1,/global/mysql-data2
Create a logical hostname resource named MySQL-lh-resource
Sunfire-x64-241-02# scrgadm -a -L -j MySQL-lh-resource -g MySQL-failover-resource-group -l 10.6.241.210
To verify the logical hostname resource is online, you can run “ifconfig -a” to see if the virtual IP address is configured on the network interface.
Enable the failover resource group including the MySQL disk storage and logical hostname resources.
Sunfire-x64-241-02# scswitch -Z -g MySQL-failover-resource-group
Step 6: Install and Configure MySQL
On both nodes, install MySQL under /usr/local/mysql, which is symbol link to /global/mysql
Sunfire-x64-241-02# mysql_install_db –datadir=/global/mysql-data1
Sunfire-x64-241-02# chown -R root .
Sunfire-x64-241-02# chown -R mysql /global/mysql-data1
Sunfire-x64-241-02# chgrp -R mysql .
Copy the sample “my.cnf_sample_master” and “my.cnf_sample_slave” under “/opt/SUNWscmys/etc” to the MySQl data directory(mysql-data1, mysql-data2) of the MySQL master and slave machine.
Modify the sample my.cnf file to point to the right directories for the data and log files. “bind-address” must be set with the logical hostname as “10.6.241.210” in this example. Please note: it need to set bind-address = ip number, because of the unfixed MySQL bug on Solaris amd64 OS.
Step 7: Modify MySQL Configuration Files
Go to the directory /opt/SUNWscmys, add cluster's information in the mysql_config file.
-
Sunfire-x64-241-02# vi mysql_config
...
MYSQL_USER=root
MYSQL_PASSWD=password
MYSQL_HOST=10.6.241.210 ->Logical hostname(IP)
FMUSER=fmuser
FMPASS=fmuser
MYSQL_SOCK=/tmp/10.6.241.212.sock
MYSQL_NIC_HOSTNAME="Sunfire-x64-241-02 Sunfire-x64-241-03" ->Physical hostname
Add cluster's information in the ha_mysql_config file
Sunfire-x64-241-02# vi ha_mysql_config ... RS=MySQL-failover-resource-group RG=MySQL-failover-resource-group PORT=3306 LH=10.6.241.210 HAS_RS=mysql-has-resource
-
...
BASEDIR=/usr/local/mysql
DATADIR=/global/mysql-data1
MYSQLUSER=mysql
MYSQLHOST=Sunfire-x64-241-02
FMUSER=fmuser
FMPASS=fmuser
LOGDIR=/global/mysql-data1/logs
CHECK=YES
Step 8: Enable Sun cluster HA For MySQL
Start MySQL server with “–skip-grant-table” option
Sunfire-x64-241-02# mysqld_safe –defaults-file=/global/mysql-data1/my.cnf –datadir=/global/mysql-data1 –skip-grant-table –user=mysql &
Change the password for root as “password”
Sunfire-x64-241-02# mysql -S /tmp/10.6.241.210.sock -u root mysql>UPDATE mysql.user set Password=PASSWORD('password') -> where User='root'; mysql>FLUSH PRIVILEGES;Shutdown and restart the MySQL servers without the “–skip-grant-table” option on the node where the resource group is online(check it with the “scstat -g” command)
Prepare the Sun cluster specific test database.
Sunfire-x64-241-02# cd /opt/SUNWscmys Sunfire-x64-241-02# ./mysql_register -f /opt/SUNWscmys/util/mysql_config sourcing /opt/SUNWscmys/util/mysql_config and create a working copy under /opt/SUNWscmys/util/mysql_config.work
Shutdown the mysql server
Sunfire-x64-241-02# mysqladmin -S /tmp/10.6.241.210.sock shutdown -p
Register resource
Sunfire-x64-241-02# cd /opt/SUNWscmys Sunfire-x64-241-02# ./ha_mysql_register -f /opt/SUNWscmys/util/ha_mysql_configEnable each MySQL resource
scswitch -e -j MySQL-has-resource scswitch -e -j MySQL-lh-resource
Step 9: Verify Sun cluster HA for MySQL configuration
Once all the MySQL resources are created and configured, and online(check with the “scstat -g” command), you should go ahead to see if the MySQL database can successfully fail over to each node configured in the resource group(MySQL-failover-resource-group). This can be verified by running with “scswitch” to switch MySQL resource group to another node to fail the resouce group to:
#scswitch -z -g MySQL-failover-resource-group -h Sunfire-x64-241-03
If you can successfully migrate the database to each node in the cluster, you now have highly available MySQL database.
At this point, by following the above nine steps, I have completed the basics of deploying highly available MySQL database. To deploy the Sun Cluster MySQL data service to achieve the maximum availability into the production environment, I would highly recommend you read through the Sun cluster documents and MySQL data service guides and verify everything in a test environment in advance.
DTrace Probes In MySQL
Inserting user-defined DTrace probes into MySQL source code is very useful to help user identify the performance problems in the application level and the database server, In addition, the cost of the USDT probe is basically neglectable. Each probes inserted into the src can be enabled by adding the code like:
If (PROVIDER_PROBE_ENABLED()
{
PROVIDER_PROBE(arg0,…);
}
The steps to add DTrace probes into MySQL is very straightforward.
Step 1: Figure out what probes are needed to insert into the source code
This is the difficult part that requires you understand the MySQL implementation details. Generally, it is good to insert probes to clarify the DB response time distribution including processing query, waiting on locks and latches, doing disk I/O, receiving/sending back data. You can certainly define more probes deep into each of the MySQL engines (such as: define probes to measure the cost of innodb sync spin wait)
Step 2: Define Provider and probes
Create a mysqlprovider.d file as:
provider mysql {
probe query__execute__start(int);
probe query__execute__finish(int);
…
};
It is required to define the probes with easy to understand name. The two underscore(__) is translated to hyphen(-) in the D script file, so the above two probes are called query-execute-start and query-execute-finish
Step 3: Define header file for probes
Create mysqlprovider.h file as:
#ifndef _MYSQLPROVIDER_H
#define _MYSQLPROVIDER_H
#ifdef ENABLE_DTRACE
#define MYSQL_QUERY_EXECUTE_START(arg0) \
__dtrace_mysql__query_execute__start(arg0)
#define MYSQL_QUERY_EXECUTE_START_ENABLED() \
__dtraceenabled_mysql__query_execute__start()
#define MYSQL_QUERY_EXECUTE_FINISH(arg0) \
__dtrace_mysql__query_execute__finish(arg0)
#define MYSQL_QUERY_EXECUTE_FINISH_ENABLED() \
__dtraceenabled_mysql__query_execute__finish()
extern void __ dtrace_mysql__query_execute__start(int)
extern int __ dtraceenabled_mysql__query_execute__start(void)
extern void __ dtrace_mysql__query_execute__finish(int)
extern int __ dtraceenabled_mysql__query_execute__finish(void)
#else
/*
*Unless DTrace is explicitly enabled with –enable-dtrace, the MYSQL macros will expand to no-ops.
*/
#define MYSQL_QUERY_EXECUTE_START(arg0) \
__dtrace_mysql__query_execute__start(arg0)
#define MYSQL_QUERY_EXECUTE_START_ENABLED() \
__dtraceenabled_mysql__query_execute__start()
#define MYSQL_QUERY_EXECUTE_FINISH(arg0) \
__dtrace_mysql__query_execute__finish(arg0)
#define MYSQL_QUERY_EXECUTE_FINISH_ENABLED()
#endif
#endif /* _MYSQLPROVIDER_H */
Step 4: Insert the probes into source code
You need to include the header file created for DTrace probes before inserting the probe macro. And in order to monitor the server behavior as expected, it requires the knowledge of the MySQL source code to add the probe macro into the right place.
#include <mysqlprovider.h>
mysql_parse {
…
bool
mysql_execute_command(THD *thd)
{
MYSQL_QUERY_EXECUTE_START(thd->thread_id);
…
case SQLCOM_EXECUTE:
{
mysql_sql_stmt_execute(thd);
MYSQL_QUERY_EXECUTE_FINISH(thd->thread_id);
Break;
}
….
}
Step 5: Build MySQL with DTrace
You will need to specify the “—enable-dtrace” as the configure option to make the DTrace probes available in MySQL on Solaris 10 and above. On the other operating system without the DTrace facility, the DTrace probes are disabled as default.
In the Makefile, you can compile the 64-bit MySQL with DTrace probes as bellow:
mysqlproviders.o: mysqlproviders.d $(mysqld_OBJECTS)
dtrace -G -64 -s mysqlproviders.d $(mysqld_OBJECTS)
Now, at this point, you have completed inserting the DTrace probes into MySQL, and the probes are ready to use. For example, to use the query-execute-start and query-execute-stop probes, you can write a simple D script(query-execute.d) to measure the time spending on the query execution for each session.
#!/usr/sbin/dtrace –qs
mysql*:::query-execute-start
{
self->init = timestamp;
}
mysql*:::query-execute-finish
/self->init/
{
@inittime[args[0]] = sum(timestamp – self->init);
self->init = 0;
}
profile:::tick-5s
{
printf("--------------------------------------------------\n");
printf("Date: %Y\n", walltimestamp);
printf("Query execution time\n");
printa(@inittime);
printf("--------------------------------------------------\n");
}
Now, you can execute the script to get the data for query execution:
#./query_execute.d
--------------------------------------------------
Date: 2007 May 25 19:18:59
Query execution time
149 4542802785
146 4577178817
148 4586742308
147 4602289846
--------------------------------------------------
Please let me know if you find this is useful, any suggestions on which/where probes would be useful in the MySQL server and client application. You can contact me by email: Luojia.chen@sun.com, or comment on this blog.
Resources:
- Statically Defined Tracing for User Applications chapter of DTrace manual
- More USDT enhancements
- PostgreSQL DTrace User Guide
- Open Solaris Community:DTrace
Connect/J and MySQL's Tuning Tips in SpecjAppserver2004
Sun just post SpecjAppserver2004 with a very excellent result of 712.87 SPECjAppserver 2004 JOPS@Standard using the latest SJSAS 9.0 Platform Edition on Sun Fire X4100 Cluster with MySQL 5.0.20 yesterday(Also check out Tom Daly's blog, Scott Oak's blog and Robert Lee's blog for more information)
SPECjAppserver is industry standard benchmark for performance testing J2EE application server. As part of important component of SPECjAppserver testing, database's performance and scalability also greatly affect the overall performance result in the SPECjAppserver test.
On the database side, the MySQL's performance in the SpecjAppserver test was significantly impacted by the feature of server-side prepared Statement added to MySQL 4.1 and above. According to MySQL's article on Prepared Statement, from the performance perspective, the advantages of using server-side prepared statement are:
1. It parses the query only single time, so that for the following same queries(with different parameters), it save the CPU resource to directly execute the queries without parsing.
2. It uses the new binary protocol to reduce the CPU usage on converting everything into strings before sending them across network.
However, the disadvantage of using server-side prepared statement is that there is two round-trips to the server for the prepared statements in order to gain the security benefits of prepared statement, so that it could impact performance in some workloads with simple queries executed few times. In the SPECjAppserver test, with the help from MySQL engineers(Peter Zaitsev and Mark Mathew,), we tried with the correct URL settings for Connector/J 3.1.13 to disable the server-side prepared statement and enable caching prepared statement in domain.xml as bellow:
<property
name="cachePreparedStatements" value="true"
/>
<property name="useServerPreparedStmts"
value="false" />
After these changes, we solved the CPU bottleneck on the database system by reducing the CPU utilization from 100% to 57% in the same load(Dealer Injection Rate), so that we could get much better performance result by further utilized the free CPU resource after increasing the load.
Besides the Connector/J's setting, there were a few key MySQL server's configuration and tunning in the SPECjAppserver
1. MySQL Innodb buffer
and maximum dirty page percentage size
According to Peter
Zaitsev, there was a MySQL's
bug in
the Innodb's fuzzy Checkpointing implementation. This caused a sharp
I/O spike we observed in the SPECjAppserver test after increasing the
Dealer Injection Rate, which slowed down the response time of the
Manufacturing transaction in the test. Fortunately, we could solve
this problem by tuning the "innodb_pool_buffer_size" and
"innodb_max_dirty_pages_pct" accordingly, so that the
checkpoint could write more blocks to the disk in average, and flush
less blocks at the point when the log file was full.
2. MySQL query
cache
MySQL 4.0 and later version also has a nice feature
called query cache that stores the identical SELECT queries issued by
clients to the database server. This makes it possible to locate and
re-issue the same queries without repetitive hard parsing activities.
MySQL also stores the query's result set in the query cache, which
can reduce the overhead of creating complex result sets for queries
from the disk or memory caches, reducing both physical and logical
I/O. However,in the SPECjAppserver test, we observed the qcache_hit
as 0 at runtime, which indicated none query was serviced from the
query cache. In such case, we completely turn off the query cache by
setting query_cache_type as 0 to save the CPU as well as Memory
resource on the query cache.
Scale-out MySQL Study on Solaris Nevada
HW configuration:
Server A: v20z Server(CPU: 2x2193 Mhz, Memory: 4032 MB)
Server B: x4150 Server(CPU: 8x2826 Mhz, Memory: 16384 MB)
SW configuration:
MySQL5.1.24-rc, MySQL proxy 0.61, Amoeba for MySQL 0.31, DBT2 for MySQL
OS versions:
Solaris Nevada snv_91
Test: MySQL DBT2 Test using Amoeba For MySQL
amoeba configuration and query routing rule:
#cat amoeba.xml
<?xml version="1.0" encoding="gbk"?>
<!DOCTYPE amoeba:configuration SYSTEM "amoeba.dtd">
<amoeba:configuration xmlns:amoeba="http://amoeba.meidusa.com/">
<server>
<property name="port">2066</property>
<property name="ipAddress">HostA</property>
<property name="readThreadPoolSize">600</property>
<property name="clientSideThreadPoolSize">600</property>
<property name="serverSideThreadPoolSize">600</property>
<property name="netBufferSize">300</property>
<property name="tcpNoDelay">no</property>
<property name="user">root</property>
</server>
<connectionManagerList>
<connectionManager name="defaultManager">
<className>com.meidusa.amoeba.net.AuthingableConnectionManager</className>
</connectionManager>
</connectionManagerList>
<dbServerList>
<dbServer name="HostB">
<factoryConfig>
<className>com.meidusa.amoeba.mysql.net.MysqlServerConnectionFactory</className>
<property name="manager">defaultManager</property>
<property name="port">3306</property>
<property name="ipAddress">HostB</property>
<property name="schema">dbt2</property>
<property name="user">root</property>
</factoryConfig>
<poolConfig>
<className>com.meidusa.amoeba.net.poolable.PoolableObjectPool</className>
<property name="maxActive">200</property>
<property name="maxIdle">200</property>
<property name="minIdle">10</property>
<property name="minEvictableIdleTimeMillis">600000</property>
<property name="timeBetweenEvictionRunsMillis">600000</property>
<property name="testOnBorrow">true</property>
<property name="testWhileIdle">true</property>
</poolConfig>
</dbServer>
<dbServer name="HostA">
<factoryConfig>
<className>com.meidusa.amoeba.mysql.net.MysqlServerConnectionFactory</className>
<property name="manager">defaultManager</property>
<property name="port">3306</property>
<property name="ipAddress">HostA</property>
<property name="schema">dbt2</property>
<property name="user">root</property>
</factoryConfig>
<poolConfig>
<className>com.meidusa.amoeba.net.poolable.PoolableObjectPool</className>
<property name="maxActive">200</property>
<property name="maxIdle">200</property>
<property name="minIdle">10</property>
<property name="minEvictableIdleTimeMillis">600000</property>
<property name="timeBetweenEvictionRunsMillis">600000</property>
<property name="testOnBorrow">true</property>
<property name="testWhileIdle">true</property>
</poolConfig>
</dbServer>
<dbServer name="multiPool" virtual="true">
<poolConfig>
<className>com.meidusa.amoeba.server.MultipleServerPool</className>
<property name="loadbalance">1</property>
<property name="poolNames">HostB,HostA</property>
</poolConfig>
</dbServer>
</dbServerList>
<queryRouter>
<className>com.meidusa.amoeba.mysql.parser.MysqlQueryRouter</cla
<property name="needParse">true</property>
</queryRouter>
</amoeba:configuration>
#cat rule.xml
<?xml version="1.0" encoding="gbk"?>
<!DOCTYPE amoeba:rule SYSTEM "rule.dtd">
<amoeba:rule xmlns:amoeba="http://amoeba.meidusa.com/">
<tableRule name="warehouse" schema="dbt2" defaultPools="HostA">
</tableRule>
<tableRule name="item" schema="dbt2" defaultPools="HostA">
</tableRule>
<tableRule name="customer" schema="dbt2" defaultPools="HostA">
</tableRule>
<tableRule name="district" schema="dbt2" defaultPools="HostB">
</tableRule>
<tableRule name="history" schema="dbt2" defaultPools="HostA">
</tableRule>
<tableRule name="new_order" schema="dbt2" defaultPools="HostA">
</tableRule>
<tableRule name="orders" schema="dbt2" defaultPools="HostA">
</tableRule>
<tableRule name="order_line" schema="dbt2" defaultPools="HostB">
</tableRule>
<tableRule name="stock" schema="dbt2" defaultPools="HostB">
</tableRule>
</amoeba:rule>
Test 1. number of warehouse: 3
DBT2 Test results using single MySQL vs. two MySQLs using Amoeba:
Threads#
1 mysqld(Server A)
CPU(usr/sys/idle)
Threads#
2 mysqld(Server A,B)
16
9576 TPM
80/19/1
16
6928TPM
32
9628 TPM
79/21/0
32
9310TPM
64
9574 TPM
78/20/2
64
9357TPM
128
8918 TPM
80/19/1
128
9629 TPM
256
8425 TPM
81/19/0
256
9792 TPM
512
8012 TPM
82/18/0
512
9564 TPM
CPU statistics on Server A(DBT2 app, MySQL server)
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 64 0 5 2392 2099 9095 149 1124 130 0 42926 45 30 0 25
1
60 0 1 1059 1 10658 329 1125 124 0 48039 50 23 0
26
CPU statistics on Server B(Amoeba for MySQL: bind with CPU 0-5, MySQL Server:Use CPU6,7)
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 3 0 1 10916 10413 13315 375 4122 1368 0 30541 35 38 0 28
1 4 0 3 501 0 18041 588 4673 891 0 42709 41 28 0 31
2 2 0 0 435 46 14415 380 3665 1022 0 26819 36 17 0 46
3 3 0 0 365 0 14087 362 3689 739 0 29129 39 20 0 41
4 2 0 1 319 2 11197 299 2582 793 0 19806 34 14 0 52
5 4 0 1 434 3 15149 457 3935 750 0 33424 39 22 0 39
6 3 0 0 33 0 4206 32 446 134 0 20105 22 7 0 71
7
1 0 0 37 0 4202 34 448 143 0 20119 21 7 0
71
Test
2. number of warehouse: 100
DBT2 Test results using single MySQL vs. two MySQLs using Amoeba:
Disk statistics on Server A(Single MySQL server Test)
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
95.8 3.8 28064.7 22.1 0.0 28.0 0.0 281.6 0 100 c1t1d0
extended device statistics
Disk statistics on Server A(Two MySQL servers Test)
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
62.6
3.2 32931.2 227.2 0.0 7.6 0.0 115.5 0 100 c1t1d0
Conclusion:
Use two mysql servers with amoeba proxy to divide duties can significantly improve MySQL performance when the single MySQL server encountered disk I/O bottleneck
The limitation of Amoeba proxy for MySQL is that it consumed many CPU resources and had overhead on extra network I/Os. In DBT2 benchmark test, using two MySQL servers can't achieve more performance gain when the single MySQL server encountered CPU bottleneck, however, it scale better with number of concurrent users than the single MySQL server.
