V2EX › Q&A

Help: syncing AWS MySQL data to ClickHouse Cloud

jackLoveDota · 6 hours 9 minutes ago · 130 views

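Yesterday a fellow V2EX user recommended SeaTunnel, so I read through the official documentation carefully. Syncing our internal-network MySQL to ClickHouse works perfectly, but as soon as I configure the ClickHouse Cloud HTTPS endpoint, the connection fails. The error log shows it always uses plain HTTP, and I can't find a way to switch it to HTTPS. The ClickHouse port is 8443, yet it still doesn't seem to switch to the HTTPS protocol. Is there a configuration option that changes the protocol?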
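One way to narrow this down is to test the ClickHouse Cloud endpoint over HTTPS outside SeaTunnel. The sketch below is only an illustration, not SeaTunnel's API: it uses the Python standard library to send `SELECT 1` to the ClickHouse HTTP interface with basic-auth credentials. The host `xxx.clickhouse.cloud`, port 8443 and user `default` are the redacted placeholders from the post's config; `build_query_request` and `run_query` are hypothetical helper names. If this query succeeds while SeaTunnel fails, that points at the connector's protocol handling (it builds `http://` URLs) rather than at the endpoint, network or credentials.

```python
import base64
import ssl
import urllib.parse
import urllib.request


def build_query_request(host: str, port: int, user: str, password: str,
                        query: str = "SELECT 1") -> urllib.request.Request:
    """Build an HTTPS GET request for the ClickHouse HTTP interface.

    Hypothetical helper: the scheme is hard-coded to https because
    ClickHouse Cloud serves its HTTP interface over TLS on port 8443.
    """
    # quote_via=quote encodes the space as %20 instead of '+'
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    url = f"https://{host}:{port}/?{params}"
    # HTTP basic auth: base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def run_query(req: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request with certificate verification enabled; return the body."""
    ctx = ssl.create_default_context()
    with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
        return resp.read().decode("utf-8").strip()
```

To try it, call `run_query(build_query_request("xxx.clickhouse.cloud", 8443, "default", "<password>"))` with the real host and password; if TLS, credentials and network all work, it should return `"1"`.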

2 replies
#1 · jackLoveDota (OP) · 4 hours 33 minutes ago
```
env {
  execution.parallelism = 8
  job.mode = "STREAMING"
  # Checkpoint settings
  checkpoint.interval = 60000
  # Allow multiple runs
  restart.strategy = "fixed-delay"
  restart.attempts = 3
}
source {
  # MySQL source table settings
  MySQL-CDC {
    result_table_name = "mysql_source"
    server-id = 5400-5408
    hostname = "localhost"
    port = 3306
    username = "root"
    password = "root"
    database-name = "abc"
    # t_test receives about 10 million rows per day and is sharded into one table per day.
    # Here the shards are merged into a single ClickHouse table to simplify statistics. Is this reasonable?
    table-names = ["abc.t_test","abc.t_test_*"]
    base-url = "jdbc:mysql://localhost:3306/abc"
    # Full snapshot plus incremental sync is required
    startup.mode = "INITIAL"
    driver = "com.mysql.cj.jdbc.Driver"
    # CDC settings
    monitor.interval = 1000
    chunk.size = 32768
    exactly-once = true
    # Parallel read settings
    split.size = 50000
    split.even-distribution.factor.lower-bound = 0.05
    split.even-distribution.factor.upper-bound = 0.95
    # Parallel read tuning
    connection.pool.size = 8 # larger connection pool
    fetch.size = 10000 # more records per fetch

    # Enable batch mode
    enable.batch = true
    batch.size = 10000

    # Persist the sync offset
    offset.storage = "filesystem"
    offset.storage.path = "/tmp/seatunnel/offset"
  }
}
transform {
  Sql {
    query = """
      SELECT
        id,
        CASE
          WHEN statistic_finished = 1 THEN 1
          ELSE 0
        END as statistic_finished,
        ip,
        created_at,
        updated_at
      FROM mysql_source
    """
  }
}
sink {
  Clickhouse {
    # ClickHouse Cloud connection settings
    host = "xxx.clickhouse.cloud:8443"
    database = "local_test"
    table = "t_test"
    username = "default"
    password = "123456"
    # Table operation settings
    primary_key = "id"
    # Write settings tuned for public-internet latency
    bulk_size = 5000 # slightly smaller batches to allow for network latency
    flush_interval = 5000 # longer flush interval
    # Retry settings
    retry_codes = [429, 500, 503]
    max_retries = 5
    retry_interval = 10000
    # Timeout settings
    connect_timeout = 60000
    socket_timeout = 300000
    # Connection pool settings
    connection_pool {
      max_size = 16
      core_size = 8
      min_evictable_idle_time_millis = 300000
    }
    # Write optimizations
    enable_partition = true
    partition_strategy = "balanced"
    # Compression settings
    compression = true
    compression_type = "gzip"
    # Time zone setting
    server_time_zone = "UTC"
  }
}
```
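The `table-names` entry `abc.t_test_*` reads like a shell-style wildcard, but the post does not show how SeaTunnel interprets it, and that may vary by connector version. The short comparison below uses only Python's `fnmatch` and `re`; the shard names such as `abc.t_test_20240101` are hypothetical. It shows why the interpretation matters: read as a glob, the pattern matches the daily shards, but read as a regular expression, `_*` means "zero or more underscores" and `.` matches any character, so it matches `abc.t_test` and none of the dated shards. If the connector takes regular expressions, something like the assumed `INTENDED_REGEX` below is closer to the intent; if it only takes exact names, each shard would have to be listed.

```python
import fnmatch
import re

PATTERN = "abc.t_test_*"  # the entry from the post's table-names list

# Hypothetical table names: the base table plus two daily shards.
TABLES = ["abc.t_test", "abc.t_test_20240101", "abc.t_test_20240102"]

# Assumed naming scheme: base table, optionally followed by _<digits>.
INTENDED_REGEX = r"abc\.t_test(_\d+)?"


def glob_matches(pattern: str, names: list[str]) -> list[str]:
    """Names matched when the pattern is read as a shell-style glob."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


def regex_matches(pattern: str, names: list[str]) -> list[str]:
    """Names matched when the pattern is read as a full-match regular expression."""
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(n)]
```

With these inputs, `glob_matches(PATTERN, TABLES)` returns the two dated shards, `regex_matches(PATTERN, TABLES)` returns only `["abc.t_test"]`, and `regex_matches(INTENDED_REGEX, TABLES)` returns all three names.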
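#2 · jackLoveDota (OP) · 4 hours 32 minutes ago
This is the config file with sensitive details redacted. If anyone who has used this can help me solve it, leave your address once it's solved and I'll send 100 USDT to buy you a coffee. Thanks!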