跨 PostgreSQL 实例移动表 ylc3000 2025-11-13 0 浏览 0 点赞 长文 # Moving tables across PostgreSQL instances / 跨 PostgreSQL 实例移动表 **02 Nov 2025** --- At work, we recently had to move a few tables from one PostgreSQL instance to another. In my previous [post](https://ananthakumaran.in/2025/06/11/postgresql-upgrade.html), I discussed how to use Google’s Database Migration Service ([DMS](https://cloud.google.com/database-migration/docs)) to migrate data from one instance to another. Unfortunately, that option was not available here, since DMS only allows the migration of an entire database, not specific tables within a database. 在工作中,我们最近需要将几个表从一个 PostgreSQL 实例移动到另一个实例。在我之前的[文章](https://ananthakumaran.in/2025/06/11/postgresql-upgrade.html)中,我讨论了如何使用谷歌的数据库迁移服务([DMS](https://cloud.google.com/database-migration/docs))将数据从一个实例迁移到另一个实例。不幸的是,这个选项在这里不可用,因为 DMS 只允许迁移整个数据库,而不支持数据库内的特定表。 We chose the native logical replication option. It’s a much more involved process compared to using DMS, but it provides greater flexibility and allows replication of specific tables only. 我们选择了原生的逻辑复制选项。与使用 DMS 相比,这是一个更复杂的过程,但它提供了更大的灵活性,并且只允许复制特定的表。 ### Grant access to user accounts / 授予用户帐户访问权限 Let’s assume you already have both the source and destination PostgreSQL instances ready. You need to grant replication access to the user accounts on both the source and destination databases. In the case of Cloud SQL, we had to grant the `REPLICATION` role to the user account; this may vary for other instances. 假设您已经准备好了源和目标 PostgreSQL 实例。您需要在源和目标数据库上为用户帐户授予复制权限。在 Cloud SQL 的情况下,我们必须为用户帐户授予 `REPLICATION` 角色;这在其他实例中可能会有所不同。 ```sql ALTER USER "sql" with REPLICATION -- run \du to verify ``` ### Copy over schema / 复制 Schema The next step is to copy over the schema. The table schema needs to be identical on both the source and destination. We used `pg_dump` to dump and restore the schema, as we had to move more than 50 tables. 下一步是复制 schema。源和目标实例上的表结构需要完全相同。由于我们需要移动超过50个表,我们使用 `pg_dump` 来导出和恢复 schema。 There are a few nuances related to constraints as well, which we will cover soon. Before that, let’s try to understand a bit more about how logical replication works. 关于约束也有一些微妙之处,我们很快会讨论。在此之前,让我们先多了解一下逻辑复制的工作原理。 Logical replication runs in two modes: 逻辑复制以两种模式运行: 1) **Initial dump:** It copies the data from the source to the destination. **初始转储:** 它将数据从源复制到目标。 2) **CDC:** Once the initial dump is done, it switches to CDC mode and applies changes to the destination in real time. **CDC(变更数据捕获):** 初始转储完成后,它会切换到 CDC 模式,并实时将变更应用到目标。 The key point here is that during the initial dump, some constraints can’t be enforced. For example, if you have a foreign key relationship between two tables, dumping one of the tables will throw an error if the referenced column hasn’t been dumped yet. 这里的关键点是,在初始转储期间,某些约束无法强制执行。例如,如果您在两个表之间有外键关系,如果引用的列尚未转储,那么转储其中一个表将会抛出错误。 To solve this problem and to speed up the initial dump process, we first create tables without constraints and indexes. Indexes slow down the dump stage, and it’s easier to rebuild them once the initial dump is complete. 为了解决这个问题并加快初始转储过程,我们首先创建没有约束和索引的表。索引会减慢转储阶段的速度,并且在初始转储完成后重建它们会更容易。 `pg_dump` provides a useful flag to dump only the table definitions without indexes and constraints. `pg_dump` 提供了一个有用的标志,可以仅转储不带索引和约束的表定义。 ```sh pg_dump URL --no-owner --no-acl --section=pre-data -s \ -t users \ -t events \ > /tmp/pre-data.sql ``` The `--section` flag allows us to control what is dumped. `pre-data` dumps only the table definitions, while `post-data` dumps all constraints and indexes. `--section` 标志允许我们控制转储的内容。`pre-data` 仅转储表定义,而 `post-data` 转储所有约束和索引。 ```sh pg_dump URL --no-owner --no-acl --section=post-data -s \ -t users \ -t events \ > /tmp/post-data.sql ``` #### Restore table definition / 恢复表定义 ```sh psql URL --echo-all --single-transaction -v ON_ERROR_STOP=1 -f /tmp/pre-data.sql ``` There’s one more catch here: logical replication depends on the primary key, so you need to create the primary key constraint in addition to the table definition. However, primary key constraints are in the `post-data.sql` file. Open it in an editor and remove everything except the primary key constraints. 这里还有一个问题:逻辑复制依赖于主键,所以除了表定义之外,您还需要创建主键约束。然而,主键约束位于 `post-data.sql` 文件中。在编辑器中打开它,并删除除主键约束之外的所有内容。 #### Restore primary key constrain / 恢复主键约束 ```sh psql URL --echo-all --single-transaction -v ON_ERROR_STOP=1 -f /tmp/primary-key-only.sql ``` #### Functions and others / 函数及其他 If you use PostgreSQL functions, enums, or anything that isn’t covered by `pg_dump`, you might have to handle those manually. When you specify the `-t table` option, it only copies objects directly related to the tables. Enums and functions don’t fall under that. 如果您使用 PostgreSQL 函数、枚举或任何未被 `pg_dump` 覆盖的内容,您可能需要手动处理它们。当您指定 `-t table` 选项时,它只复制与表直接相关的对象。枚举和函数不在此范围内。 ### Set up publication and subscription / 设置发布和订阅 At this point, the table structure should be identical on both instances for the tables that are going to be migrated. The publication should be created on the source instance. 此时,对于要迁移的表,两个实例上的表结构应该是相同的。发布(publication)应在源实例上创建。 ```sql CREATE PUBLICATION migration_publication FOR TABLE users, events ``` A corresponding subscription needs to be created on the destination instance. 相应的订阅(subscription)需要在目标实例上创建。 ```sql CREATE SUBSCRIPTION migration_subscription CONNECTION 'host={IP} port=5432 user={USER} password={SECRET} dbname={DBNAME} sslmode=require' PUBLICATION migration_publication ``` Replace `{variable}` with the respective values. If the verify CA option is enabled on the source instance, you need to disable it and enable only SSL mode. 请将 `{variable}` 替换为相应的值。如果源实例上启用了验证 CA 选项,您需要禁用它并仅启用 SSL 模式。 Once the subscription is created, PostgreSQL starts copying data from the source to the destination instance. It performs an initial data dump for each table and then switches to CDC mode. 一旦订阅创建成功,PostgreSQL 就会开始将数据从源实例复制到目标实例。它会对每个表执行初始数据转储,然后切换到 CDC 模式。 You need to wait until the initial dump is complete and it moves to the CDC state with near-zero lag. PostgreSQL exposes this information through several tables such as `pg_replication_slots`, `pg_stat_replication`, and `pg_stat_subscription`. 您需要等到初始转储完成,并且它进入延迟接近零的 CDC 状态。PostgreSQL 通过 `pg_replication_slots`、`pg_stat_replication` 和 `pg_stat_subscription` 等几个表来公开这些信息。 ```sql SELECT slot_name, confirmed_flush_lsn, pg_current_wal_lsn(), (pg_current_wal_lsn() - confirmed_flush_lsn) AS lsn_distance FROM pg_replication_slots ``` ```sql SELECT subscription_name, active, pg_size_pretty(pg_current_wal_lsn() - replay_lsn) AS lag_bytes FROM pg_stat_subscription ``` ```sql SELECT relid::regclass AS table_name, srel.srsubstate AS replication_state, CASE srel.srsubstate WHEN 'i' THEN 'Initializing' WHEN 'd' THEN 'Initial Dump' WHEN 's' THEN 'Synchronized' WHEN 'r' THEN 'Replicating' ELSE 'Unknown' END AS state_description FROM pg_subscription sub JOIN pg_subscription_rel srel ON sub.oid = srel.srsubid ORDER BY table_name ``` ### Add indexes and foreign keys / 添加索引和外键 Once the replication moves to the CDC state, you can create indexes and foreign keys. The `post-data.sql` file contains all indexes and foreign keys. Remove the primary key constraints and keep the rest. 一旦复制进入 CDC 状态,您就可以创建索引和外键了。`post-data.sql` 文件包含所有索引和外键。移除主键约束并保留其余部分。 ```sh psql URL --echo-all --single-transaction -v ON_ERROR_STOP=1 -f /tmp/indexes.sql ``` This will take quite some time if you have a lot of data. 如果您的数据量很大,这将花费相当长的时间。 ### Analyze / 分析 `analyze` is one of the most often overlooked steps when moving or upgrading PostgreSQL instances. PostgreSQL depends on the statistics generated by this command to create an efficient query plan. Without these statistics, it might choose an inefficient plan, and a query that used to take 50 ms could turn into a 1-second query on your new instance. So make sure to run `analyze` once the indexes are created. If you have more time, you can also run a `vacuum` as well. 在移动或升级 PostgreSQL 实例时,`analyze` 是最常被忽视的步骤之一。PostgreSQL 依赖此命令生成的统计信息来创建高效的查询计划。如果没有这些统计信息,它可能会选择一个低效的计划,一个曾经需要 50 毫秒的查询在您的新实例上可能会变成一个 1 秒的查询。所以请确保在创建索引后运行 `analyze`。如果您有更多时间,也可以运行 `vacuum`。 ```sql -- start with analyze (will be done faster compared to vacuum) analyze (verbose, BUFFER_USAGE_LIMIT '64MB') -- vacuum analyze vacuum (verbose, analyze) ``` ### Switchover / 切换 #### Sequence / 序列 Your data is now available on both systems, and you are nearly ready to switch your traffic from the source to the destination. 现在您的数据在两个系统上都可用,您几乎可以准备好将流量从源切换到目标了。 PostgreSQL copies the data and keeps the indexes in sync, but it doesn’t sync the sequences. You have to do that manually. PostgreSQL 会复制数据并保持索引同步,但它不会同步序列(sequences)。您必须手动进行同步。 ```sql -- view current value SELECT s.schemaname, s.sequencename, s.last_value FROM pg_sequences s WHERE s.sequencename in ('users_id_seq', 'events_id_seq'); ``` You can view the current value on the source instance, and then set it to a higher value on the destination instance. 您可以在源实例上查看当前值,然后在目标实例上将其设置为一个更高的值。 ```sql SELECT 'SELECT setval(' || quote_literal(s.schemaname || '.' || s.sequencename) || ', ' || CASE WHEN s.sequencename = 'users_id_seq' THEN s.last_value + 100 ELSE s.last_value + 10000 END || ', true);' FROM pg_sequences s WHERE s.sequencename IN ( 'events_id_seq', 'users_id_seq' ); ``` You can run the snippet above on the source instance; the output is a SQL query that you can run on the destination instance. The buffer value is up to you. The key point is that after running the query on the destination instance, you need to perform the switchover before the sequence values on the source instance exceed those on the destination instance. 您可以在源实例上运行上面的代码片段;其输出是一个 SQL 查询,您可以在目标实例上运行它。缓冲值由您决定。关键点是,在目标实例上运行查询后,您需要在源实例上的序列值超过目标实例上的序列值之前执行切换。 #### Disable writes / 禁用写入 Once the sequence is updated, stop sending writes to the source PostgreSQL instance. Wait for the replication lag to reach zero, and then switch all writes to the destination PostgreSQL instance. You can monitor the replication lag using `pg_replication_slots`. 序列更新后,停止向源 PostgreSQL 实例发送写入请求。等待复制延迟降至零,然后将所有写入切换到目标 PostgreSQL 实例。您可以使用 `pg_replication_slots` 来监控复制延迟。 #### PgBouncer The amount of downtime depends on how your app is architected, whether your application can run in read-only mode, and other factors. [PgBouncer](https://www.pgbouncer.org/) can help significantly in this regard, and it’s what we used to achieve near-zero downtime. 停机时间的长短取决于您的应用程序架构、应用程序是否可以在只读模式下运行以及其他因素。[PgBouncer](https://www.pgbouncer.org/) 在这方面可以提供很大帮助,我们就是用它来实现近乎零停机的。 PgBouncer is a PostgreSQL proxy. The key feature relevant to our situation is that it allows configuration changes without requiring a restart. Assume you have a database named `myapp` that’s configured to connect to the source instance. You can edit the PgBouncer config file to update the connection details to point to the destination instance. Then, connect to the PgBouncer [admin console](https://www.pgbouncer.org/usage.html#admin-console) and run the following commands. PgBouncer 是一个 PostgreSQL 代理。与我们情况相关的关键特性是它允许在不重启的情况下更改配置。假设您有一个名为 `myapp` 的数据库,它被配置为连接到源实例。您可以编辑 PgBouncer 配置文件,更新连接详细信息以指向目标实例。然后,连接到 PgBouncer 的[管理控制台](https://www.pgbouncer.org/usage.html#admin-console)并运行以下命令。 ```plaintext pgbouncer> PAUSE myapp; pgbouncer> RELOAD; pgbouncer> SHOW DATABASES; pgbouncer> RESUME myapp; ``` The first command pauses all connections from PgBouncer to the source PostgreSQL instance. The command blocks until all in-flight queries are completed, and new queries are queued. `RELOAD` reloads the configuration from disk. You can run `SHOW DATABASES` to quickly verify that the new configuration has been loaded. `RESUME` then resumes connectivity, now to the new destination PostgreSQL instance. 第一个命令会暂停从 PgBouncer到源 PostgreSQL 实例的所有连接。该命令会阻塞,直到所有正在进行的查询完成,并且新的查询被排队。`RELOAD` 从磁盘重新加载配置。您可以运行 `SHOW DATABASES` 来快速验证新配置是否已加载。`RESUME` 然后恢复连接,现在连接到新的目标 PostgreSQL 实例。 If you don’t have any long-running queries, this process can result in near-zero downtime, as no queries are dropped. As long as `RESUME` is executed quickly, users will notice at most a slight increase in latency. 如果您没有任何长时间运行的查询,这个过程可以实现近乎零的停机时间,因为没有查询被丢弃。只要 `RESUME` 被快速执行,用户最多只会注意到延迟略有增加。 You can use the following query to check for long-running queries and terminate them if needed. 您可以使用以下查询来检查长时间运行的查询,并根据需要终止它们。 ```sql SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE (now() - pg_stat_activity.query_start) > interval '2 minutes'; ``` In the worst-case scenario, if the `PAUSE` command hangs because of a long-running query, you can forcefully restart PgBouncer. However, this will result in errors for any active connections. 在最坏的情况下,如果 `PAUSE` 命令因为一个长时间运行的查询而挂起,您可以强制重启 PgBouncer。然而,这将导致任何活动连接出错。 ### Cleanup / 清理 Once you’re confident everything is working correctly, you can clean up the logical replication setup. 一旦您确信一切正常,就可以清理逻辑复制设置了。 Drop the subscription on the destination instance. 在目标实例上删除订阅。 ```sql DROP SUBSCRIPTION migration_subscription ``` Drop the publication on the source instance. 在源实例上删除发布。 ```sql DROP PUBLICATION migration_publication `` 网闻录 跨 PostgreSQL 实例移动表