Answer by jeroenstiers

jeroenstiers - Friday, January 27, 2017 18:59:19 GMT

How did you get on? If you store file B in a database and use the Joiner to do the matching, the speed could improve a lot!

Answer by mark2atsafe

mark2atsafe - Thursday, January 26, 2017 20:33:20 GMT

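Hmm... I think this partly depends on what the workspace is doing. You say it "matches variables from A against another static file (file B)". Does that mean a join on an attribute key? Perhaps with a FeatureMerger? How many features are there? Does A have fewer features than B? Say, 1,000 features in A being matched against 1,000,000 features in B.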

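If so, you could replace the FeatureMerger with the Joiner transformer. You would have to put B into some sort of database format, even if only temporarily, but that way you wouldn't have to read all 1,000,000 features from B to match just 1,000 of them.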

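Similarly, you could use a FeatureReader or SQLExecutor or any other transformer that is capable of retrieving selected features from dataset B. So that's one technique: read B the same number of times, but reduce the amount of data you need to read from it.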

That's the first law of performance, by the way. Performance is defined as "useful work" carried out, so if you're reading data that isn't used, it's not useful work!
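To make that first technique concrete outside FME, here is a minimal Python sketch using the standard-library `sqlite3` module. It is only an analogy, not how the Joiner or FeatureReader work internally. It assumes B has been loaded once into a table with an index on the join key; the table name `b_features`, the column names, and the helpers `build_b_database` and `lookup_matches` are all hypothetical.

```python
import sqlite3


def build_b_database(rows):
    """Load dataset B once into an in-memory SQLite table with an indexed key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE b_features (join_key TEXT, value TEXT)")
    conn.executemany("INSERT INTO b_features VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX idx_b_key ON b_features (join_key)")
    conn.commit()
    return conn


def lookup_matches(conn, a_keys):
    """Fetch only the B rows whose key occurs in A, one indexed lookup per key."""
    matches = {}
    for key in a_keys:
        row = conn.execute(
            "SELECT value FROM b_features WHERE join_key = ?", (key,)
        ).fetchone()
        if row is not None:
            matches[key] = row[0]
    return matches


# Hypothetical data: B is large, A is small.
b_rows = [(f"id{i}", f"value{i}") for i in range(100_000)]
conn = build_b_database(b_rows)
a_keys = ["id5", "id42", "missing"]
result = lookup_matches(conn, a_keys)
print(result)
```

The point is that the cost of each run depends on the size of A, not the size of B, once B sits behind an index.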

Anyway, to read B only once, I think you would need just one run of the workspace, loading all of the data from all of the A files at one time. Basically one big process. How many features are in each A file? Are they non-spatial, or do they have very complex geometry? That will be the big issue.

I'm assuming you have your current setup (reading each A file separately) because combined there is too much data. But if not, and you are only doing it as a quick way to batch process data, there are other solutions. Consider reading all of the A files at once, then using a Dataset Fanout on the output to split them back out again. Group-by parameters in transformers can help keep each A file separate if necessary.

So that's a second technique: reading all of the A files at once, so you only need to read B once.
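The shape of that second technique can be sketched in plain Python. This is only an illustration under assumptions: the file names, the `key` attribute, and the `merge_all` helper are hypothetical, and the in-memory dictionaries stand in for real readers. B is loaded once, every A file is processed in the same run, and the results stay grouped by source file so they can be written back out separately, much as a fanout would do.

```python
from collections import defaultdict


def merge_all(a_files, b_lookup):
    """Match features from every A file against B, grouped by source file."""
    outputs = defaultdict(list)
    for file_name, features in a_files.items():
        for feature in features:
            match = b_lookup.get(feature["key"])
            if match is not None:
                outputs[file_name].append({**feature, "b_value": match})
    return dict(outputs)


# B is read once (hypothetical in-memory data standing in for the file).
b_lookup = {"k1": "alpha", "k2": "beta", "k3": "gamma"}

# Hypothetical A files, each a list of features.
a_files = {
    "a1.csv": [{"key": "k1"}, {"key": "k9"}],
    "a2.csv": [{"key": "k3"}],
}

outputs = merge_all(a_files, b_lookup)
for file_name, features in outputs.items():
    print(file_name, features)
```

Whether this is practical depends on the combined size of the A files, as noted above.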

Finally, and this is where my imagination starts to exceed my knowledge, I wonder if you can set up a continuously running workspace. That workspace reads file B and then receives features from all of the file As one at a time. It keeps running as long as you are passing file A features to it.

We do have transformers like that: TCPIPReceiver, SQSReceiver, etc. I'm partly unsure because I don't know how a workspace like that deals with a reader. I assume it reads the data once and then, if you have a group-based transformer, holds that data for processing against any incoming feature, but I don't know for sure. Also, of course, I don't really know what action you are carrying out, how many features you have, or how often you run this process (because this would take a lot of setting up, but could save you lots of time if you repeat the process daily).

So, that's the third technique: a continuous process that just listens for new A features to compare against B.
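As a rough picture of that third idea, here is a small Python sketch of a long-running worker that holds B in memory and matches whatever arrives. It says nothing about how TCPIPReceiver or SQSReceiver behave inside a workspace; the queue, the `STOP` sentinel, and the `worker` function are assumptions made purely for illustration.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to shut down


def worker(b_lookup, inbox, results):
    """Hold B in memory and match each incoming A feature until STOP arrives."""
    while True:
        feature = inbox.get()
        if feature is STOP:
            break
        results.append((feature, b_lookup.get(feature)))


b_lookup = {"k1": "alpha", "k2": "beta"}  # B is loaded once, up front
inbox = queue.Queue()
results = []

thread = threading.Thread(target=worker, args=(b_lookup, inbox, results))
thread.start()

for key in ["k2", "k7", "k1"]:  # A features arriving over time
    inbox.put(key)
inbox.put(STOP)
thread.join()

print(results)
```

The setup cost is paid once when the worker starts; each later feature only pays for its own lookup.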

There are some other things like this, but they're fairly specialized; for example if you are doing something with a DEM, the SurfaceModeller can save its internal workings as a file for re-use at a later point. And I think there are some web formats/transformers that will cache data locally, so you don't have to keep rereading it from a remote source again and again. But I don't know if they will help you here.

Anyway, I hope something here helps. If you can give us a few more details about what you are doing exactly, then it might be that there's something we can do to help. Other than that, as others have said, use a good, fast format. I seem to recall trying out different formats at one time, and MicroStation v8 was the fastest, but whether that's still true, and whether I included a comparison with FFS, I don't know.

If I have any more bright (or even not-so-bright) ideas, I'll let you know.

Regards

Mark

PS: Thanks for letting me know about this one Bruce. Not quite a blog, but getting on for blog length!

Answer by bruceharold

bruceharold - Thursday, January 26, 2017 17:05:45 GMT

I can sense a blog post coming from Mark Ireland about memoizing workspaces via FFS or SQLite files...

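Answer by erik_jan

erik_jan - Thursday, January 26, 2017 16:49:30 GMT

As an addition, you might want to look at the FeatureReader transformer.

That transformer lets you read only the part of file B that is needed for the comparison with file A (possibly not the complete file B).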

Answer by david_r

david_r - Thursday, January 26, 2017 16:48:44 GMT

No, you can't, unfortunately.

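But you should consider the format of file B, as it can make a big difference: some formats are an order of magnitude faster than others. You could try preprocessing file B into the native FFS format and see whether that makes a difference.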

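If that doesn't improve performance enough, a more thorough analysis of your workflow might be in order. For example, if you are repeatedly using a FeatureMerger on large datasets, it could make a big difference to first load the data into a relational database and do a SQL join query on indexed fields (either as a view or from a SQLExecutor/SQLCreator).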
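A minimal sketch of such an indexed join, using SQLite from Python so that it runs without a database server. The table and column names are hypothetical, and in FME the SQL statement itself would go into a SQLExecutor or SQLCreator rather than a script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_features (join_key TEXT, name TEXT)")
conn.execute("CREATE TABLE b_features (join_key TEXT, value TEXT)")

# Hypothetical data: a small A and a larger B.
conn.executemany(
    "INSERT INTO a_features VALUES (?, ?)",
    [("k1", "first"), ("k3", "third"), ("k9", "orphan")],
)
conn.executemany(
    "INSERT INTO b_features VALUES (?, ?)",
    [(f"k{i}", f"v{i}") for i in range(1, 6)],
)

# The index lets the database find B rows by key without scanning the table.
conn.execute("CREATE INDEX idx_b_join_key ON b_features (join_key)")

joined = conn.execute(
    """
    SELECT a.name, b.value
    FROM a_features AS a
    JOIN b_features AS b ON b.join_key = a.join_key
    ORDER BY a.name
    """
).fetchall()
print(joined)
```

The inner join drops A rows with no partner in B; a `LEFT JOIN` would keep them with a NULL value instead.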

Answer by erik_jan

erik_jan - Thursday, January 26, 2017 16:47:17 GMT

Could you read file B and create an FFS file? FFS is FME's internal file format, and it reads faster than any other format.

Create a (temporary) workspace that reads file B and writes to an FFS writer. Then use the FFS file in your process instead of file B.
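The same "convert once, then read the fast copy" idea can be sketched in Python. This is only an analogy: a pickle file stands in for FFS, a CSV stands in for the original file B, and the `load_b` helper and file names are hypothetical. The cache is rebuilt only when it is missing or older than the source.

```python
import csv
import os
import pickle
import tempfile


def load_b(source_path, cache_path):
    """Return B as a dict, using the cache when it is at least as new as the source."""
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)

    # Slow path: parse the original file, then save the parsed result.
    with open(source_path, newline="") as fh:
        data = {row["key"]: row["value"] for row in csv.DictReader(fh)}
    with open(cache_path, "wb") as fh:
        pickle.dump(data, fh)
    return data


# Demonstration with a small hypothetical file B.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "b.csv")
cache = os.path.join(workdir, "b.pickle")
with open(source, "w", newline="") as fh:
    fh.write("key,value\nk1,alpha\nk2,beta\n")

first = load_b(source, cache)   # parses the CSV and writes the cache
second = load_b(source, cache)  # reads the cache instead
print(first == second, os.path.exists(cache))
```

Because file B is static, the conversion cost is paid once and every later run benefits.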

Answers to "Speed up workspace runner"