V2EX › Q&A

How can I apply multiple Spark ML models to the partitions of a single Dataset/DataFrame, training one model per partition in a single run?

kimibob · Aug 27, 2021 · 1203 views

    Something like the following: group the rows by a key and, for each group, train a model with an MLlib algorithm:

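    val input = spark.read.load(..)      // path elided
    val models = input
      .groupByKey(x => x.age)            // assumes a typed Dataset with an `age` field
      .mapGroups {
        (k, v) =>
          // `v` is a local Iterator holding this key's rows on one executor,
          // not a distributed dataset
          val subset = v.toList.toDS
          someModel.fit(subset)          // `someModel`: a spark.ml Estimator
      }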
      
    

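    The MLlib algorithms seem to require distributed input (an RDD for the RDD-based `spark.mllib` API, a Dataset for `spark.ml` `fit`), but after grouping, each group is only a local `Iterator`. How can this be done?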
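    A minimal sketch of one possible direction, written in plain Scala so it runs without a cluster. A spark.ml Estimator launches its own Spark jobs and cannot be called from inside a task, so the per-group function below swaps in a hand-rolled simple linear regression (y = slope * x + intercept) that consumes the group's `Iterator` in a single pass and returns a plain case class. `Record`, its `x`/`y` columns, `LineFit` and `fitLine` are hypothetical names, and Scala's `groupBy` stands in for `groupByKey`/`mapGroups`.

```scala
// Hypothetical row type standing in for `input`: `age` is the grouping key
// from the question; `x` and `y` are assumed feature/label columns.
final case class Record(age: Int, x: Double, y: Double)

// Hypothetical result type: a case class of primitives, so Spark could derive
// an Encoder for it (a fitted spark.ml Model has no built-in Encoder).
final case class LineFit(age: Int, slope: Double, intercept: Double, n: Long)

// Ordinary least squares for one group, computed in ONE pass over the
// iterator (an Iterator can be traversed only once) by accumulating
// sufficient statistics instead of materialising the group with toList.
def fitLine(age: Int, rows: Iterator[Record]): LineFit = {
  var n = 0L
  var sx, sy, sxx, sxy = 0.0
  rows.foreach { r =>
    n += 1
    sx += r.x; sy += r.y
    sxx += r.x * r.x; sxy += r.x * r.y
  }
  val denom = n * sxx - sx * sx
  // Empty group or a single distinct x value: fall back to a flat line.
  val slope = if (n == 0 || denom == 0.0) 0.0 else (n * sxy - sx * sy) / denom
  val intercept = if (n == 0) 0.0 else (sy - slope * sx) / n
  LineFit(age, slope, intercept, n)
}

// Local stand-in for groupByKey(_.age).mapGroups(fitLine): one model per key.
val records = Seq(
  Record(20, 1.0, 3.0), Record(20, 2.0, 5.0), Record(20, 3.0, 7.0), // y = 2x + 1
  Record(30, 0.0, 4.0), Record(30, 2.0, 0.0)                        // y = -2x + 4
)
val fits = records.groupBy(_.age).toSeq.sortBy(_._1).map {
  case (k, rows) => fitLine(k, rows.iterator)
}
fits.foreach(println)
```

    Under the same assumptions, and untested here, the Spark version would be roughly `input.as[Record].groupByKey(_.age).mapGroups(fitLine)` with `spark.implicits._` imported: `mapGroups` hands each key's rows to the function as an `Iterator` and needs an Encoder for the result type. The trade-off is that every group has to fit in memory on a single executor, and the learner must be a local (non-Spark) one.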
