Comments and answers for "Speeding up the WorkspaceRunner"

Comment by jneujens on jneujens's answer

jneujens - Fri, 08 Sep 2017 12:35:03 GMT

Great suggestions indeed!

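Answer by jeroenstiers

jeroenstiers - Fri, 27 Jan 2017 18:59:19 GMT

How are you doing the matching? If you store file B in a database, you could use a Joiner to do the match, and the speed could increase a lot!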

Comment by jeroenstiers

jeroenstiers - Fri, 27 Jan 2017 18:56:59 GMT

A very good question indeed. Upvoted!

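Comment by osarhan on osarhan's answer

osarhan - Fri, 27 Jan 2017 17:48:51 GMT

Thank you for taking the time to write such a detailed response @Mark2AtSafe!

I took your first two paragraphs (and the others' comments) on board; you guessed right about the parameters and scale of my problem. Each file A is anything from 10 to 100 records, and file B is close to 500k records.

I created a simplified version of file B as a CSV and used the Joiner, and I'm getting much better results; it is chugging along quite quickly now.

I will definitely explore using a database if this becomes a regular task. For now I am happy to let my 18,000 files take 24 hours to run. This is a once-a-year task at the moment.

Reading all of the files at once was my initial plan, but because they are XML and contain many nodes that I want to extract, it got a bit tricky, and I found the system couldn't handle my ambitions. I put that down to the system I have FME on being under-resourced. That is how I ended up with my current WorkspaceRunner setup!

Thanks again!

Omar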

Comment by erik_jan on erik_jan's comment

erik_jan - Thu, 26 Jan 2017 20:37:36 GMT

OK Mark, I got the hint and followed your suggestion.

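Comment by mark2atsafe

mark2atsafe - Thu, 26 Jan 2017 20:35:22 GMT

This is a great question, by the way. I'd like to think folks here should be voting up the most interesting ones (hint, hint). I see a lot of votes for answers, but very few for questions, which is a shame because I think voting encourages quality questions.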

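Answer by mark2atsafe

mark2atsafe - Thu, 26 Jan 2017 20:33:20 GMT

Hmm... I think this depends to some extent on what the workspace is doing. You say "matching a variable in A to another static file, file B" - does that mean a join on an attribute key? Perhaps with a FeatureMerger? How many features are there? Are there fewer features in A than in B? For example, are 1,000 features in A being matched against a million in B?

If so, you could replace the FeatureMerger with a Joiner transformer. You would have to put B into some sort of database format, even if only temporarily, but then you wouldn't be reading all 1 million features of B, just the 1,000 that match.

Similarly you could use a FeatureReader or SQLExecutor, or possibly any other transformer that retrieves features from dataset B. So that's one technique: read B the same number of times, but reduce the amount of data that needs to be read from it.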
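To illustrate the principle outside FME, here is a minimal Python sketch, assuming B has been loaded into a SQLite table with an index on the join key. The table and column names (`b_records`, `match_key`, `payload`) and the sample data are hypothetical, and this is not how the Joiner works internally; it only shows the same idea of fetching the matching rows instead of scanning all of B.

```python
import sqlite3

def load_b(conn, rows):
    """Load dataset B once into an indexed table (hypothetical schema)."""
    conn.execute("CREATE TABLE b_records (match_key TEXT, payload TEXT)")
    conn.executemany("INSERT INTO b_records VALUES (?, ?)", rows)
    # The index is what lets each lookup avoid a full scan of B.
    conn.execute("CREATE INDEX idx_b_key ON b_records (match_key)")
    conn.commit()

def fetch_matches(conn, keys):
    """Return only the B rows whose key appears in `keys`."""
    keys = list(keys)
    if not keys:
        return []
    # One placeholder per key; fine for small A files (10 to 100 keys),
    # but SQLite caps the number of bound parameters per statement.
    placeholders = ",".join("?" for _ in keys)
    sql = (
        "SELECT match_key, payload FROM b_records "
        f"WHERE match_key IN ({placeholders}) ORDER BY match_key"
    )
    return conn.execute(sql, keys).fetchall()

conn = sqlite3.connect(":memory:")
load_b(conn, ((f"k{i}", f"value {i}") for i in range(100_000)))

a_keys = ["k5", "k42", "k99999", "missing"]  # keys taken from one A file
matches = fetch_matches(conn, a_keys)
print(matches)
```

Keys from A that have no counterpart in B simply return nothing, so the amount of data read scales with the size of A, not with the size of B.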

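That's the first law of performance, by the way. Performance is defined as the amount of "useful work" carried out, so if you are reading data that isn't used, it's not useful work!

Anyway, to read B only once, I think you would need to have just a single run of the workspace, and load all of the data from all of the A files at the same time. Basically one big process. How many features are in each A file? Are they non-spatial or very complex geometry? That will be the big issue.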

I'm assuming you have your current setup - to read each A file separately - because combined there is too much data. But if not, and you are only doing it as a quick way to batch process data, there are other solutions. Consider reading all of the A files at once, then using a Dataset Fanout on the output to split them back out again. Group-by parameters in transformers can help keep each A file separate if necessary.

So that's a second technique: reading all of the A files at once, so you only need to read B once.
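As a rough illustration of that second technique, here is a minimal Python sketch, assuming B fits in memory as a dictionary. The field names (`key`, `value`) and the file names are hypothetical; grouping the results by source file stands in for the group-by and fanout step, and is not FME's own mechanism.

```python
from collections import defaultdict

def build_lookup(b_rows):
    """Read B once and index it by its join key."""
    return {row["key"]: row for row in b_rows}

def match_all(a_files, b_lookup):
    """Match every A file against the single in-memory copy of B.

    `a_files` maps a source file name to its list of rows, so the
    results stay grouped per A file and can be written out separately.
    """
    results = defaultdict(list)
    for name, rows in a_files.items():
        for row in rows:
            hit = b_lookup.get(row["key"])
            if hit is not None:
                results[name].append({**row, "b_value": hit["value"]})
    return dict(results)

b_lookup = build_lookup(
    [{"key": "k1", "value": "one"}, {"key": "k2", "value": "two"}]
)
a_files = {
    "a_001.xml": [{"key": "k1"}, {"key": "k9"}],
    "a_002.xml": [{"key": "k2"}],
}
grouped = match_all(a_files, b_lookup)
print(grouped)
```

The cost of reading B is paid once, however many A files there are; the trade-off is that everything has to fit in one process.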

Finally, and this is where my imagination starts to exceed my knowledge, I wonder if you can set up a continuously running workspace. That workspace reads file B and then receives features from all of the file As one at a time. It keeps running as long as you are passing file A features to it.

We do have transformers like that - TCPIPReceiver, SQSReceiver, etc. I'm partly unsure because I don't know how a workspace like that deals with a reader. I assume it reads the data once and then - if you have a group-based transformer - holds that data for processing against any incoming feature, but I don't know for sure. Also, of course, I don't really know what action you are carrying out or how many features you have, or how often you run this process (cause this would take a lot of setting up, but could save you lots of time if you repeat the process daily).

So, that's the third technique: a continuous process that just listens for new A features to compare against B.
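A minimal Python sketch of that idea, under the assumption that an in-process queue can stand in for whatever delivers the A features (it is not how TCPIPReceiver or SQSReceiver behave, which I can't vouch for). All names here are hypothetical. B is loaded once, and a worker keeps matching incoming A records until it is told to stop.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to shut down

def worker(b_lookup, inbox, outbox):
    """Hold B in memory and match each incoming A record against it."""
    while True:
        item = inbox.get()
        if item is STOP:
            break
        outbox.put((item["key"], b_lookup.get(item["key"])))

b_lookup = {"k1": "one", "k2": "two"}  # B, loaded a single time
inbox, outbox = queue.Queue(), queue.Queue()
listener = threading.Thread(target=worker, args=(b_lookup, inbox, outbox))
listener.start()

# A records arrive one at a time, possibly from many A files.
incoming = ["k2", "nope", "k1"]
for key in incoming:
    inbox.put({"key": key})
inbox.put(STOP)
listener.join()

results = [outbox.get() for _ in incoming]
print(results)
```

With a single worker the results come back in arrival order; unmatched keys are reported with `None`.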

There are some other things like this, but they're fairly specialized; for example if you are doing something with a DEM, the SurfaceModeller can save its internal workings as a file for re-use at a later point. And I think there are some web formats/transformers that will cache data locally, so you don't have to keep rereading it from a remote source again and again. But I don't know if they will help you here.

Anyway, I hope something here helps. If you can give us a few more details about what you are doing exactly, then it might be that there's something we can do to help. Other than that, as others have said, use a good, fast format. I seem to recall trying out different formats at one time, and MicroStation v8 was the fastest, but whether that's still true, and whether I included a comparison with FFS, I don't know.

If I have any more bright (or even not-so-bright) ideas, I'll let you know,

Regards

Mark

PS: Thanks for letting me know about this one Bruce. Not quite a blog, but getting on for blog length!

Comment by osarhan on osarhan's answer

osarhan - Thu, 26 Jan 2017 17:36:35 GMT

Thanks, will do.

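Comment by osarhan on osarhan's answer

osarhan - Thu, 26 Jan 2017 17:36:16 GMT

Thank you, I will keep this in mind for the future, and may change my process if this becomes something we do more than once a year.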

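Comment by osarhan on osarhan's answer

osarhan - Thu, 26 Jan 2017 17:35:10 GMT

Thanks, I'll give this a go. I started out reading file B as a SHP, realized that was a silly thing to do, and switched to a CSV, which is a lot faster. I will try FFS and report back.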

Answer by bruceharold

bruceharold - Thu, 26 Jan 2017 17:05:45 GMT

I can feel a blog post coming on from Mark Ireland about memoisation of workspaces via FFS or SQLite files.....

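Answer by erik_jan

erik_jan - Thu, 26 Jan 2017 16:49:30 GMT

In addition, you might want to look at the FeatureReader transformer.

This transformer can read just the part of file B that you need to compare with file A (and so potentially not the complete file B).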

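Answer by david_r

david_r - Thu, 26 Jan 2017 16:48:44 GMT

No, you can't, unfortunately.

But you should consider the format of file B, as it can make a big difference: some formats are an order of magnitude faster than others. You could try preprocessing file B into the native FFS format and see if it makes a difference.

If that doesn't improve performance, a more general analysis of your workspace is in order. For example, if you are repeatedly using a FeatureMerger with a large dataset, you might benefit from first loading the data into a relational database and using a SQL join on indexed fields (either as a view or from a SQLExecutor / SQLCreator). It can make a significant difference.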

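Answer by erik_jan

erik_jan - Thu, 26 Jan 2017 16:47:17 GMT

Could you read file B and create an FFS file? FFS is the FME internal file format and reads faster than any other format.

Create a (temporary) workspace that reads file B and writes it to an FFS writer. Then use the FFS file in the process instead of file B.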
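The same "convert once, reuse many times" pattern can be sketched in Python, purely as an analogy: here a pickle file plays the role of the fast intermediate format, which is an assumption for illustration and says nothing about FFS itself. The function name and file names are hypothetical.

```python
import csv
import os
import pickle
import tempfile

def load_b_cached(csv_path, cache_path):
    """Return (rows, came_from_cache).

    Parses the CSV on the first call and writes a binary cache; later
    calls reuse the cache as long as it is not older than the CSV.
    """
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(csv_path)):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh), True
    with open(csv_path, newline="") as fh:
        rows = list(csv.DictReader(fh))
    with open(cache_path, "wb") as fh:
        pickle.dump(rows, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return rows, False

with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "b.csv")
    cache_path = os.path.join(tmp, "b.pickle")
    with open(csv_path, "w", newline="") as fh:
        fh.write("key,value\nk1,one\nk2,two\n")

    first, first_cached = load_b_cached(csv_path, cache_path)
    second, second_cached = load_b_cached(csv_path, cache_path)

print(first_cached, second_cached)
```

Whether the cached form is actually faster depends on the data and the formats involved, so it is worth timing both before committing to it.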