How to Capture SQL Statements with AWS RDS MySQL

Ronald Bradford
DoiT International
Published in
19 min readNov 9, 2023

--

Image from Tee11 / Shutterstock

Capturing your SQL Statement with RDS MySQL

In this article, I am going to demonstrate all native capabilities in MySQL for capturing SQL statements when using AWS Relational Database Service (RDS). Identifying SQL statements is an important design and operations principle to help understand your application usage, load, performance impacts, and configuration bottlenecks with the database.

The purpose of this article is to identify how to “capture” SQL statements information. I will not be covering the additional supporting information that you can also obtain with an SQL statement. This would include information such as a Query Execution Plan (QEP), the table structure, index cardinality, optimizer configuration, and statistics. This will be covered in a subsequent post.

SQL statement gathering

MySQL offers multiple means to collect executing or executed SQL statements. No one method is exclusively the best, and multiple methods are often needed to gather sufficient information for any detailed observability of a system and more detailed analysis of SQL statements. You can use these different methods. Each can also include multiple ways to obtain the same details.

  1. Processlist
  2. Performance Schema
  3. Sys Schema
  4. Slow query log
  5. General query log
  6. Binary log

Unless otherwise stated the mysql command-line client is used for the following example output.

NOTE: The SQL examples shown here are compatible with both MySQL 5.7 and MySQL 8.0. Due to the end-of-life (EOL) of MySQL 5.7 community edition, and the upcoming EOL of AWS RDS MySQL 5.7, all example output and documentation references use the current MySQL 8.0 version. Some output shown in this article may differ from the output produced when using MySQL 5.7.

1. SQL with the MySQL processlist

The MySQL processlist shows the currently running SQL statements. These are SQL statements that are running at the time this processlist statement is executed. For long-running queries, this is an easy approach to find SQL statements. For high frequency, low latency SQL statements, it can be very difficult to observe SQL statements via the MySQL processlist. In addition to the executing SQL statement, you can obtain details of the source user and host, the database schema used, the execution time (in seconds), and the current internal state the query is performing.

You can retrieve the MySQL processlist output via these different means.

  • performance schema
  • sys schema
  • information schema
  • SHOW statement
  • mysqladmin command

There is no specific advantage of using one of the above options over the other when the output provides the same columns of information. Some of these options were not available in older versions of MySQL and have been added for generally accepted industry standards that are valid ISO SQL statements while SHOW is not. The mysqladmin command offers different administration options and enables the combination of multiple outputs as demonstrated in the example.

Processlist via Performance Schema

When enabled, the performance_schema threads table and the processlist table provide this information.

mysql> SELECT * FROM performance_schema.processlist;
+-----+-----------------+--------------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO |
+-----+-----------------+--------------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 5 | event_scheduler | localhost | NULL | Daemon | 456 | Waiting on empty queue | NULL |
| 9 | rdsadmin | localhost | NULL | Sleep | 0 | | NULL |
| 11 | rdsadmin | localhost | NULL | Sleep | 0 | | NULL |
| 19 | rdsadmin | localhost | NULL | Sleep | 3 | | NULL |
| 23 | rdsadmin | localhost | NULL | Sleep | 65 | | NULL |
| 263 | rdsadmin | localhost | NULL | Sleep | 0 | | NULL |
| 404 | dba | 172.31.16.14:43464 | airport | Query | 0 | executing | SELECT SQL_NO_CACHE a.country_code, c.name, COUNT(*) AS cnt FROM airport a INNER JOIN country c USING (country_code) GROUP BY country_code, name ORDER BY 3 DESC LIMIT 10 |
| 406 | dba | 172.31.16.14:43484 | airport | Query | 0 | executing | SELECT * FROM performance_schema.processlist |
+-----+-----------------+--------------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Processlist via Sys Schema

The sys schema processlist and session views provide access to SQL statements and additional supporting information. These views are built on top of the performance schema and must be enabled to capture SQL via the sys schema. These views also provide additional attributes of metadata with the SQL statement. This is the example using the sys.x$processlist view.

mysql> SELECT * FROM sys.x$processlist WHERE command = 'Query'\G
*************************** 1. row ***************************
thd_id: 675
conn_id: 650
user: dba@172.31.16.14
db: airport
command: Query
state: executing
time: 0
current_statement: SELECT a.country_code, c.name, COUNT(*) AS cnt FROM airport a INNER JOIN country c USING (country_code) GROUP BY country_code, name ORDER BY 3 DESC LIMIT 10
statement_latency: 116727428000
progress: NULL
lock_latency: 506000000
rows_examined: 0
rows_sent: 0
rows_affected: 0
tmp_tables: 1
tmp_disk_tables: 0
full_scan: YES
last_statement: NULL
last_statement_latency: NULL
current_memory: 165019
last_wait: NULL
last_wait_latency: NULL
source: NULL
trx_latency: 116250140000
trx_state: ACTIVE
trx_autocommit: YES
pid: 15730
program_name: mysql

The above statement terminator produces a vertical table format, i.e. column: value per row, rather than the traditional horizontal table format of columns and rows. The SQL statement terminator ‘\G’ is used here rather than the traditional semicolon ‘;’. This syntax is specific to the mysql client and is not supported when using other 3rd party GUI clients.

NOTE: If you receive the following error message you can safely execute the statement SET SQL_BIG_SELECTS=1; prior to the above SQL statement.

ERROR 1104 (42000) at line 1: The SELECT would examine more than 
MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1
or SET MAX_JOIN_SIZE=# if the SELECT is okay

Processlist via Information Schema

While the Information Schema is a defacto standard used in many different RDBMS products, the processlist table is non-standard. In MySQL, you can view information with the following statement.

mysql > SELECT * FROM information_schema.processlist;
+----+-----------------+--------------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ID | USER | HOST | DB | COMMAND | TIME | STATE | INFO |
+----+-----------------+--------------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 88 | dba | 172.31.16.14:59474 | airport | Query | 0 | executing | SELECT SQL_NO_CACHE a.country_code, c.name, COUNT(*) AS cnt FROM airport a INNER JOIN country c USING (country_code) GROUP BY country_code, name ORDER BY 3 DESC LIMIT 10 |
| 9 | rdsadmin | localhost | NULL | Sleep | 0 | | NULL |
| 26 | rdsadmin | localhost | NULL | Sleep | 1 | | NULL |
| 11 | rdsadmin | localhost | NULL | Sleep | 0 | | NULL |
| 19 | rdsadmin | localhost | NULL | Sleep | 9 | | NULL |
| 5 | event_scheduler | localhost | NULL | Daemon | 352 | Waiting on empty queue | NULL |
| 94 | dba | 172.31.16.14:59522 | airport | Query | 0 | executing | SELECT * FROM information_schema.processlist |
| 23 | rdsadmin | localhost | NULL | Sleep | 261 | | NULL |
+----+-----------------+--------------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Processlist via SHOW statement

Before introducing the Information Schema in version 5.1, and the Performance Schema in version 5.5, the primary means of capturing the processlist in MySQL was with the SHOW command.

mysql> SHOW FULL PROCESSLIST;

This output is identical to the result of SELECT * FROM performance_schema.processlist; shown above. The FULL keyword is necessary to obtain non-truncated statements.

For more information see the MySQL 8.0 Reference Manual.

Processlist via mysqladmin client

An additional MySQL client tool mysqladmin which is used to perform various administrative operations can also produce a running processlist output in tabular format.

$ mysqladmin --verbose processlist status
+-----+-----------------+--------------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+-----------------+--------------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 5 | event_scheduler | localhost | | Daemon | 1419 | Waiting on empty queue | |
| 9 | rdsadmin | localhost | | Sleep | 1 | | |
| 11 | rdsadmin | localhost | | Sleep | 1 | | |
| 19 | rdsadmin | localhost | | Sleep | 6 | | |
| 23 | rdsadmin | localhost | | Sleep | 128 | | |
| 771 | rdsadmin | localhost | | Sleep | 3 | | |
| 830 | dba | 172.31.16.14:54000 | airport | Query | 0 | executing | SELECT a.country_code, c.name, COUNT(*) AS cnt FROM airport a INNER JOIN country c USING (country_code) GROUP BY country_code, name ORDER BY 3 DESC LIMIT 10 |
| 832 | dba | 172.31.16.14:54026 | | Query | 0 | init | show full processlist |
+-----+-----------------+--------------------+---------+---------+------+------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
Uptime: 1421 Threads: 8 Questions: 14058 Slow queries: 1290 Opens: 261 Flush tables: 3 Open tables: 181 Queries per second avg: 9.893

If you do not specify --verbose you will receive truncated SQL statements, even for this short statement. For example:

| 786 | dba             | 172.31.16.14:59446 | airport | Query   | 0    | executing              | SELECT  a.country_code, c.name, COUNT(*) AS cnt FROM airport a INNER JOIN country c USING (country_c |

For more information see the MySQL 8.0 Reference Manual

2. SQL with the Performance Schema

When the Performance Schema is enabled and the applicable consumers are enabled, multiple tables can be used to provide detailed information on executed and executing SQL statements.

Configuration options

In native MySQL, you would enable the performance schema with the following.

mysql> SET GLOBAL performance_schema=1;

With the AWS RDS-managed service, you must use parameter groups to make parameter changes. For example:

$ modify-parameter ${PG_NAME} performance_schema 1 pending-reboot

NOTE: For an RDS instance this refers to the instance parameter group. When an instance participates as part of an AWS RDS Aurora cluster there is a cluster parameter group which is the default unless the parameter values are overridden by the instance parameter group.

When the performance schema is enabled for the first time, the MySQL instance, and thus an AWS RDS instance must be restarted.

$ aws rds reboot-db-instance --db-instance-identifier ${INSTANCE_ID}

The performance schema consumers and instruments for capturing SQL statements will be enabled by default, however, these can be modified for a running MySQL instance. You can verify the present configuration with:

mysql> SELECT * FROM performance_schema.setup_instruments WHERE NAME LIKE 'statement/%';
mysql> SELECT * FROM performance_schema.setup_consumers WHERE NAME LIKE '%statements%';

Validation of settings

The following statement can confirm if the performance schema is enabled.

mysql> SELECT @@performance_schema;
+----------------------+
| @@performance_schema |
+----------------------+
| 1 |
+----------------------+

Performance Schema processlist table

As previously demonstrated the processlist table provides basic information on currently executing statements.

Performance Schema threads table

The threads table shows additional attributes for currently executing SQL Statements.

mysql> SELECT * FROM performance_schema.threads WHERE PROCESSLIST_STATE="executing"\G
*************************** 1. row ***************************
THREAD_ID: 960
NAME: thread/sql/one_connection
TYPE: FOREGROUND
PROCESSLIST_ID: 935
PROCESSLIST_USER: dba
PROCESSLIST_HOST: 172.31.16.14
PROCESSLIST_DB: airport
PROCESSLIST_COMMAND: Query
PROCESSLIST_TIME: 0
PROCESSLIST_STATE: executing
PROCESSLIST_INFO: SELECT a.country_code, c.name, COUNT(*) AS cnt FROM airport a INNER JOIN country c USING (country_code) GROUP BY country_code, name ORDER BY 3 DESC LIMIT 10
PARENT_THREAD_ID: NULL
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: SSL/TLS
THREAD_OS_ID: 353
RESOURCE_GROUP: NULL

Performance Schema events_statements_* tables

Where the performance schema shines is the ability to capture recently executed SQL statements, an essential tool for very fast executing SQL statements.

mysql> SELECT * FROM performance_schema.events_statements_current WHERE SQL_TEXT like '%airport%'\G
*************************** 1. row ***************************
THREAD_ID: 981
EVENT_ID: 4
END_EVENT_ID: NULL
EVENT_NAME: statement/sql/select
SOURCE: init_net_server_extension.cc:110
TIMER_START: 2471111163284000
TIMER_END: 2471111453481000
TIMER_WAIT: 290197000
LOCK_TIME: 175000000
SQL_TEXT: SELECT * FROM performance_schema.events_statements_current WHERE SQL_TEXT like '%airport%'
DIGEST: 2fdfe28204e80c382a14597873be575ea8c7ede6e4757e77700a333bbe50b025
DIGEST_TEXT: SELECT * FROM `performance_schema` . `events_statements_current` WHERE `SQL_TEXT` LIKE ?
CURRENT_SCHEMA: NULL
OBJECT_TYPE: NULL
OBJECT_SCHEMA: NULL
OBJECT_NAME: NULL
OBJECT_INSTANCE_BEGIN: NULL
MYSQL_ERRNO: 0
RETURNED_SQLSTATE: NULL
MESSAGE_TEXT: NULL
ERRORS: 0
WARNINGS: 0
ROWS_AFFECTED: 0
ROWS_SENT: 0
ROWS_EXAMINED: 0
CREATED_TMP_DISK_TABLES: 0
CREATED_TMP_TABLES: 0
SELECT_FULL_JOIN: 0
SELECT_FULL_RANGE_JOIN: 0
SELECT_RANGE: 0
SELECT_RANGE_CHECK: 0
SELECT_SCAN: 1
SORT_MERGE_PASSES: 0
SORT_RANGE: 0
SORT_ROWS: 0
SORT_SCAN: 0
NO_INDEX_USED: 1
NO_GOOD_INDEX_USED: 0
NESTING_EVENT_ID: NULL
NESTING_EVENT_TYPE: NULL
NESTING_EVENT_LEVEL: 0
STATEMENT_ID: 22413
1 row in set (0.01 sec)

Other performance tables include executed SQL statements including events_statements_current, events_statements_history, and events_statements_long. For more information see the MySQL 8.0 Reference Manual.

3. SQL with the Sys Schema

The MySQL sys schema is a set of objects and views that interpret data available in the Performance Schema and provide a more user-friendly view to the end user. This also shows executed SQL statements and aggregated statistics of executions. The first view to consider is the statement_analysis or x$statement_analysis.

mysql> select * from sys.x$statement_analysis where query like '%airport%'\G
*************************** 1. row ***************************
query: SELECT `a` . `country_code` , `c` . `name` , COUNT ( * ) AS `cnt` FROM `airport` `a` INNER JOIN `country` `c` USING ( `country_code` ) GROUP BY `country_code` , NAME ORDER BY ? DESC LIMIT ?
db: airport
full_scan: *
exec_count: 8
err_count: 0
warn_count: 0
total_latency: 1330746923000
max_latency: 304276507000
avg_latency: 166343365000
lock_latency: 1937000000
rows_sent: 80
rows_sent_avg: 10
rows_examined: 1223680
rows_examined_avg: 152960
rows_affected: 0
rows_affected_avg: 0
tmp_tables: 8
tmp_disk_tables: 0
rows_sorted: 80
sort_merge_passes: 0
digest: 616aa31e4f99b22a194c91ae9f1cfbfb95d6957621b55393b4a01603d8e1fd47
first_seen: 2023-10-12 14:55:56.021003
last_seen: 2023-10-12 15:23:14.225503

Many other sys schema tables include executed SQL statements and aggregations of subsets of SQL statements. A few examples include statements_with_full_table_scans and statements_with_temp_tables. For more information see the MySQL 8.0 Reference Manual.

4. SQL with the slow query log

When enabled, the slow query log performs exactly as the title suggests. This logs all slow queries longer than a pre-defined execution time. In older versions, this was a unit of seconds. In MySQL 5.7 and newer this is now in microseconds. The value of 0 is also valid, meaning log all SQL statements executed. However, note this is a performance overhead and an additional cost on a highly concurrent database.

Configuration options

In native MySQL you can enable the slow query log with:

mysql> SET GLOBAL slow_query_log=1;
mysql> SET GLOBAL long_query_time=0.001;
mysql> SET GLOBAL log_output=FILE; /* This is the default */

With the AWS RDS-managed service, you must use parameter groups to make parameter changes. These changes are immediate. For example:

$ modify-parameter ${PG_NAME} slow_query_log 1
$ modify-parameter ${PG_NAME} long_query_time 0.001
$ modify-parameter ${PG_NAME} log_output FILE

There are several additional configuration parameters we will not discuss including slow_query_log_file, log_queries_not_using_indexes, and log_slow_admin_statements.

NOTE: You should carefully evaluate the impact of enabling these additional slow query log options. Each option can produce large volumes of SQL statements that add to your log file size. Queries that intentionally perform a full-table scan due to a small number of rows, or frequent administrative statements essential for instrumentation and observability with monitoring tools may be logged.

Validation of settings

To confirm your slow query log is enabled and configured.

mysql> SELECT @@slow_query_log, @@long_query_time, @@log_output;
+------------------+-------------------+--------------+
| @@slow_query_log | @@long_query_time | @@log_output |
+------------------+-------------------+--------------+
| 1 | 0.001000 | FILE |
+------------------+-------------------+--------------+

For more information see the MySQL 8.0 Reference Manual.

Downloading the slow query log from AWS RDS

When enabled, the slow query log will write to a log file on your RDS Instance. You can retrieve this with the AWS CLI.

$ INSTANCE_ID=
$ aws rds download-db-log-file-portion \
--db-instance-identifier ${INSTANCE_ID} \
--starting-token 0 \
--output text \
--log-file-name slowquery/mysql-slowquery.log > slowquery.log

The RDS slow log files are rotated hourly. You can use the following commands to view all log files available for download.

$ aws rds describe-db-log-files --db-instance-identifier ${INSTANCE_ID}

{
"DescribeDBLogFiles": [
...
{
"LogFileName": "slowquery/mysql-slowquery.log",
"LastWritten": 1697125638054,
"Size": 761426
},
{
"LogFileName": "slowquery/mysql-slowquery.log.2023-10-12.14",
"LastWritten": 1697121600696,
"Size": 60517
},
{
"LogFileName": "slowquery/mysql-slowquery.log.2023-10-12.15",
"LastWritten": 1697122801087,
"Size": 343085
}
]
}

Slow query log available information

An individual SQL Statement in the slow query log will include the following information:

  • Execution time
  • User and Host
  • Thread Id
  • Query time
  • Lock time
  • Rows sent
  • Rows examined
  • Full SQL statement

Slow query log example output

# Time: 2023-10-05T15:56:58.831165Z
# User@Host: dba[dba] @ [172.31.16.14] Id: 295
# Query_time: 0.259097 Lock_time: 0.000178 Rows_sent: 10 Rows_examined: 76733
SET timestamp=1696521418;
SELECT a.country_code, c.name, COUNT(*) AS cnt
FROM airport a
INNER JOIN country c USING (country_code)
GROUP BY country_code, name
ORDER BY 3 DESC
LIMIT 10;

NOTE: The slow query log is well structured with all non-commented lines being actual SQL statements, and these statements are correctly terminated, allowing for a very easy way to collect, log, and execute these statements generally. The only limitation is when tables are not qualified with the schema, knowledge of your table's database schema is required.

Slow query log via SQL table

While not generally recommended it is possible to collect slow queries via an SQL statement when the slow query log is enabled and using the configuration setting of log_output=TABLE.

SQL gathering of slow query log information

The following SQL statement can retrieve the queries that have been logged to the mysql.slow_log table.

# Table does not have a PK, so this is a crud hack
SELECT sl.*,
CONVERT(q.sql_text USING utf8mb4)AS query
FROM mysql.slow_log sl
INNER JOIN mysql.slow_log q USING (start_time)
WHERE CONVERT(q.sql_text USING utf8mb4) like '%airport%'\G

Slow query log via table example output

    start_time: 2023-10-05 15:59:30.878226
user_host: dba[dba] @ [172.31.16.14]
query_time: 00:00:00.295653
lock_time: 00:00:00.000164
rows_sent: 10
rows_examined: 76733
db: airport
last_insert_id: 0
insert_id: 0
server_id: 1843089174
sql_text: 0x53454C45435420612E636F756E7472795F636F64652C20632E6E616D652C20434F554E54282A2920415320636E742046524F4D20616972706F72742061202020494E4E4552204A4F494E20636F756E7472792063205553494E472028636F756E7472795F636F6465292047524F555020425920636F756E7472795F636F64652C206E616D65204F5244455220425920332044455343204C494D4954203130
thread_id: 295
query: SELECT a.country_code, c.name, COUNT(*) AS cnt FROM airport a INNER JOIN country c USING (country_code) GROUP BY country_code, name ORDER BY 3 DESC LIMIT 10

5. SQL with the general query log

When configured, MySQL can log all SQL statements to the general query log.

NOTE: You should carefully consider which environment to enable the general query log and for what duration. The following example is of a 100% idle system and the 12 additional SQL statements executed within the same second as our example SQL statement.

WARNING: As with the slow query log it is possible to log this output to a MySQL table. This is discouraged as it can have a significant impact on the performance of a running system.

Configuration options

In native MySQL you can enable the general query log with:

mysql> SET GLOBAL general_log=1;
mysql> SET GLOBAL log_output=FILE; /* This is the default */

With the AWS RDS-managed service, you must use parameter groups to make parameter changes. These changes are immediate. For example:

$ modify-parameter ${PG_NAME} general_log 1
$ modify-parameter ${PG_NAME} log_output FILE

There is one additional configuration parameter we will not discuss here. that is the general_log_file variable.

Validation of settings

To confirm your general query log is enabled and configured.

mysql> SELECT @@general_log, @@log_output, @@general_log_file;
+---------------+--------------+------------------------------------------+
| @@general_log | @@log_output | @@general_log_file |
+---------------+--------------+------------------------------------------+
| 1 | FILE | /rdsdbdata/log/general/mysql-general.log |
+---------------+--------------+------------------------------------------+

For more information see the MySQL 8.0 Reference Manual.

Downloading the general query log from AWS RDS

When enabled, the general query log will write to a log file on your RDS Instance. You can retrieve this with the AWS CLI.

$ INSTANCE_ID=
$ aws rds download-db-log-file-portion \
--db-instance-identifier ${INSTANCE_ID} \
--starting-token 0 \
--output text \
--log-file-name general/mysql-general.log > general.log

As mentioned in the prior section you can use the argument describe-db-log-files to list general query log files that have been rotated.

General query log example output

2023-10-05T17:31:06.006853Z         9 Query     set local oscar_local_only_replica_host_status=1;
2023-10-05T17:31:06.006963Z 9 Query SELECT durable_lsn, current_read_point, server_id, last_update_timestamp FROM information_schema.replica_host_status;
2023-10-05T17:31:06.007500Z 9 Query set local oscar_local_only_replica_host_status=0
2023-10-05T17:31:06.836552Z 540 Query SELECT
a.country_code, c.name, COUNT(*) AS cnt
FROM airport a
INNER JOIN country c USING (country_code)
GROUP BY country_code, name ORDER BY 3 DESC LIMIT 10
2023-10-05T17:31:06.932194Z 9 Query set local oscar_local_only_replica_host_status=1;
2023-10-05T17:31:06.932368Z 9 Query SELECT durable_lsn, current_read_point, server_id, last_update_timestamp FROM information_schema.replica_host_status;
2023-10-05T17:31:06.933104Z 9 Query set local oscar_local_only_replica_host_status=0
2023-10-05T17:31:06.935817Z 11 Query SET @@sql_log_bin=off
2023-10-05T17:31:06.936615Z 11 Query select @@session.transaction_read_only
2023-10-05T17:31:06.937118Z 11 Query INSERT INTO mysql.rds_heartbeat2(id, value) values (1,1696527066936) ON DUPLICATE KEY UPDATE value = 1696527066936
2023-10-05T17:31:06.947853Z 11 Query select @@session.transaction_read_only
2023-10-05T17:31:06.948293Z 11 Query COMMIT
2023-10-05T17:31:06.948725Z 11 Query SET @@sql_log_bin=on

NOTE: The general query log is semi-structured. SQL statements can span multiple lines, preserving the format given to MySQL from your client program. It is not sufficient to strip the first 41 characters from each line to gather SQL statements.

For more information see the MySQL 8.0 Reference Manual.

6. SQL with the binary log

Often overlooked, the MySQL binary log when enabled is a wealth of SQL statement information for SQL that performs writes to your database. The binary log can be critical in identifying transaction boundaries and the sequence of SQL statements within a transaction to assist when reviewing locking. The binary log also provides the full column details of INSERT, UPDATE, and DELETE statements.

Any RDS instance configuration that uses read replicas has binary logging enabled by default.

Validation of settings

To confirm your binary log is enabled and configured.

mysql> SELECT @@log_bin, @@binlog_format, @@log_statements_unsafe_for_binlog, @@log_bin_trust_function_creators;
+-----------+-----------------+------------------------------------+-----------------------------------+
| @@log_bin | @@binlog_format | @@log_statements_unsafe_for_binlog | @@log_bin_trust_function_creators |
+-----------+-----------------+------------------------------------+-----------------------------------+
| 0 | ROW | 1 | 0 |
+-----------+-----------------+------------------------------------+-----------------------------------+

You can identify the binary logs that are available on your MySQL instance with:

mysql> SHOW BINARY LOGS;

You can download the binary logs using the mysqlbinlog command. For example.

$ TMP_DIR=${TMP_DIR:-/tmp}
$ AUTHENTICATION="-h<host> -u<user> -p<password"
$ BINLOG_FILE="binlog.00042"
mysqlbinlog --read-from-remote-server --raw --result-file=${TMP_DIR}/ ${AUTHENTICATION} ${BINLOG_FILE} 2>/dev/null # Kill warning message and not mess with $?

You can view the contents of the binary log with the mysqlbinlog command. Depending on your configuration options some arguments are more applicable, however, this is generally best for all formats:

$ mysqlbinlog -vvv --base64-output=DECODE-ROWS ${TMP_DIR}/${BINLOG_FILE}

Analysis of the binary log is a more detailed process that we will not discuss here. This analysis also depends on the binary log row format configuration variable value. You can obtain table summary information for a given file, or period with one or more files using additional arguments. This snippet provides a table summary when binlog_format=ROW is used.

mysqlbinlog ${BINLOG_ADDITIONAL_ARGS} ${TMP_DIR}/${BINLOG_FILE}  | \
sed -e "s/^### //;" | \
sed -e "/^#/d;s/\/\*.*\*\/[;]//;/^$/d" | \
cut -c1-100 | \
tr '[A-Z]' '[a-z]' | \
egrep "^(insert|update|delete|replace|commit|alter|drop|create)" | \
sed -e "s/\t/ /g;s/\`//g;s/(.*$//;s/ set .*$//;s/ as .*$//;s/ join .*$//;s/ values .*$//;" | sed -e "s/ where .*$//;s/ignore //g;s/ inner//g;s/ left//g;s/ right//g;s/ from//g;s/ into//g" | \
sed -e "s/or replace//;s/sql security definer//;s/algorithm=.* //;s/definer=.* //g" | \
sed -e "s/ \w.*\.\*//" | \
awk '{ print $1,$2 }' | \
sort | uniq -c | sort -nr

You can obtain a copy of the full script that includes downloading and processing a binary log file from this Github gist.

For more information see the MySQL 8.0 Reference Manual.

Summary of SQL collection options

Here are the primary benefits and limitations of each option. The best approach to determine what option is right for your needs is to try each one using the steps outlined in this article.

Processlist benefits

  • Enabled by default

Processlist limitations

  • Only captures executing SQL statements in-flight
  • Required elevated privileges to see all queries by all users

Performance Schema benefits

  • Captures previously executed statements in addition to currently executing statements
  • Provides aggregated results which is ideal for high-frequency queries

Performance Schema limitations

  • Must be enabled. requires a restart for the first time activation
  • Needs tuning in highly performant systems to limit resource overhead

Sys Schema benefits

  • Extends performance schema with highly usable views and objects that are very well-named and for specific purposes

Sys Schema limitations

  • Requires performance schema to be enabled

Slow query log benefits

  • Captures all slow queries that exceed a pre-defined execution time

Slow query log limitations

  • Not really any limitations. This should always be enabled with a suitable threshold in all MySQL instances.

General query log benefits

  • Provides a complete sequential list of all SQL statements. This is ideal for transaction analysis and assessment of data access antipatterns.

General query log limitations

  • Significant overhead for high-usage production systems. It is recommended you use this with a test suite, ideally using a single thread.

Binary log benefits

  • Generally always enabled to ensure HA and PITR capabilities

Binary log limitations

  • Only captures writes when enabled
  • More complicated format to process capturing full SQL statements

Conclusion

There are multiple native methods using MySQL client tools to collect SQL statements from MySQL running on AWS RDS. There are additional tools and techniques that can also collect SQL statements in MySQL that will not be discussed in this article.

The collection of an SQL statement is the first essential step in gathering sufficient information from your application for data access patterns. This can be critical information for architectural design, testing, and performance purposes.

In a subsequent article, I will be discussing the collection of supporting metadata that is necessary with any SQL statements for performance analysis. This includes the Query Execution Plan (QEP), table and index metadata, column distribution statistics, and specific configuration options that can impact the execution of an SQL statement.

Appendix

Example data used

For this article, we used the following airport dataset available at https://github.com/ronaldbradford/data/tree/main/mysql-data/airport

Example SQL used

For each method of collection, the following example SQL statement was used.

SELECT a.country_code, c.name, COUNT(*) AS cnt
FROM airport a
INNER JOIN country c USING (country_code)
GROUP BY country_code, name ORDER BY 3 DESC LIMIT 10;

This SQL statement will return a set of results like:

+--------------+----------------+-------+
| country_code | name | cnt |
+--------------+----------------+-------+
| US | United States | 30581 |
| BR | Brazil | 6849 |
| JP | Japan | 3430 |
| CA | Canada | 3075 |
| AU | Australia | 2576 |
| MX | Mexico | 2288 |
| RU | Russia | 1556 |
| KR | South Korea | 1400 |
| GB | United Kingdom | 1398 |
| DE | Germany | 1042 |
+--------------+----------------+-------+
10 rows in set (0.47 sec)

NOTE: The actual counts of airports and the ordering of countries may vary depending on when you retrieved the dataset.

SQL collection

As this is a test system with minimal usage some statements retrieved all possible rows, and others were restricted with a WHERE clause applicable to capture this specific example SQL statement. When capturing SQL in your system you may need to use applicable filtering qualifications to limit the output of statements.

Bash modify-parameter helper function

The following function is used in this article to simplify the syntax of modifying AWS RDS parameter group variables.

modify-parameter() {
[[ $# -lt 3 ]] || [[ $# -gt 4 ]] && echo "ERROR: <parameter-group> <parameter> <value> [pending-reboot]" && return 1

local PARAMETER_GROUP="$1"
local PARAMETER_NAME="$2"
local PARAMETER_VALUE="$3"
local APPLY_METHOD="immediate"

[[ $# -eq 4 ]] && APPLY_METHOD="$4"
aws rds modify-db-parameter-group --db-parameter-group-name ${PARAMETER_GROUP} --parameters "ParameterName=${PARAMETER_NAME},ParameterValue='${PARAMETER_VALUE}',ApplyMethod=${APPLY_METHOD}"
aws rds describe-db-parameters --db-parameter-group-name ${PARAMETER_GROUP} --query 'Parameters[] | [?ParameterName == `'${PARAMETER_NAME}'`]'
}

For more information see the AWS CLI command reference for modify-db-parameter-group and describe-db-parameters syntax.

SQL identification tip

MySQL enables you to embed a comment into an SQL statement however not all tools capture and log that information. It is a good practice for engineering to create some sort of unique label for every SQL statement. The value in the comment is not important, its uniqueness is. The ability to see a query with a comment and locate that in the source code reduces the time for identification and mean-time-to-resolve (MTTR) any issues.

SELECT /* Countries with the most airports */  
a.country_code, c.name, COUNT(*) AS cnt
FROM airport a
INNER JOIN country c USING (country_code)
GROUP BY country_code, name
ORDER BY 3 DESC LIMIT 10;

For more information see the MySQL 8.0 Reference Manual.

--

--

Senior Cloud Data Architect at DoiT specializing in MySQL, RDS, Aurora, AWS and other open source products.