How are you doing the matching? If you store file B in a database, you could use a Joiner to do the match, and that could speed things up a lot!
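Hmm... I think it depends to some extent on what the workspace is doing. You say "matching a variable in A to another static file, file B". Does that mean a join on an attribute key? Perhaps with a FeatureMerger? How many features are there? Does A have fewer features than B? For example, are 1,000 features in A being matched against a million in B?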
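If so, you could replace the FeatureMerger with a Joiner transformer. You would have to put B into some type of database format, even if only temporarily, but then you wouldn't have to read all 1 million features in B, just the 1,000 that match.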
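Similarly, you could use a FeatureReader or SQLExecutor, or possibly any other transformer that retrieves selected features from dataset B. So that's one technique: read B the same number of times, but reduce the amount of data that needs to be read from it.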
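That's the first law of performance, by the way. Performance is defined as the "useful work" carried out, so if you are reading data that isn't used, it's not useful work!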
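To illustrate that first technique outside FME, here is a minimal Python sketch using the standard-library `sqlite3` module. The table name `b`, the columns `join_key` and `attr`, and the sample data are all made up for illustration. It is not what the Joiner does internally, only the general idea of an indexed lookup that touches just the matching rows.

```python
import sqlite3


def build_b_database(b_rows):
    """Load B into a SQLite table and index the join column (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")  # assumption: a real setup would use a file on disk
    conn.execute("CREATE TABLE b (join_key TEXT, attr TEXT)")
    conn.executemany("INSERT INTO b VALUES (?, ?)", b_rows)
    conn.execute("CREATE INDEX idx_b_join_key ON b (join_key)")
    conn.commit()
    return conn


def join_a_to_b(conn, a_features):
    """For each A feature, fetch only its matching B row through the index."""
    joined = []
    for feature in a_features:
        row = conn.execute(
            "SELECT attr FROM b WHERE join_key = ?", (feature["join_key"],)
        ).fetchone()
        if row is not None:
            joined.append({**feature, "attr": row[0]})
    return joined


# Illustrative data: a large B and a small A.
b_rows = [(f"id{i}", f"value{i}") for i in range(100000)]
a_features = [
    {"join_key": "id5", "name": "a1"},
    {"join_key": "id99999", "name": "a2"},
    {"join_key": "missing", "name": "a3"},
]

conn = build_b_database(b_rows)
result = join_a_to_b(conn, a_features)
```

In this sketch the table is rebuilt in memory on every run, so it only shows the lookup pattern. The saving comes when B already sits in an indexed database on disk and each run queries only the keys it needs.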
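Anyway, to read B only once, I think you need to have just a single run of the workspace and load all of the data from all of the A files at the same time. Basically one big process. How many features are in each A file? Are they non-spatial or very complex geometry? That will be the big issue.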
I'm assuming you have your current setup - to read each A file separately - because combined there is too much data. But if not, and you are only doing it as a quick way to batch process data, there are other solutions. Consider reading all of the A files at once, then using a Dataset Fanout on the output to split them back out again. Group-by parameters in transformers can help keep each A file separate if necessary.
So that's a second technique: reading all of the A files at once, so you only need to read B once.
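The shape of that second technique can be sketched in plain Python. This is only an analogy, not FME code: the file names, the `join_key` field, and the in-memory dictionaries standing in for readers and writers are all assumptions. B is loaded once, every A file is processed in the same run, and the results stay grouped by source so they can be written back out separately.

```python
def load_b(b_rows):
    """Read B a single time into a lookup keyed on the join attribute."""
    return {join_key: attr for join_key, attr in b_rows}


def process_all(a_files, b_lookup):
    """Match every A file against the same B lookup, keeping outputs per source file.

    a_files maps a (hypothetical) source file name to its list of features.
    """
    outputs = {source_name: [] for source_name in a_files}
    for source_name, features in a_files.items():
        for feature in features:
            attr = b_lookup.get(feature["join_key"])
            if attr is not None:
                outputs[source_name].append({**feature, "attr": attr})
    return outputs


# Illustrative data only.
b_lookup = load_b([("k1", "v1"), ("k2", "v2")])
a_files = {
    "a1.csv": [{"join_key": "k1"}, {"join_key": "k9"}],
    "a2.csv": [{"join_key": "k2"}],
}
outputs = process_all(a_files, b_lookup)
```

Keying the output on the source name plays roughly the role of the group-by and fanout steps: everything runs together, but each A file's results remain separable.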
Finally, and this is where my imagination starts to exceed my knowledge, I wonder if you can set up a continuously running workspace. That workspace reads file B and then receives features from all of the file As one at a time. It keeps running as long as you are passing file A features to it.
We do have transformers like that - TCPIPReceiver, SQSReceiver, etc. I'm partly unsure because I don't know how a workspace like that deals with a reader. I assume it reads the data once and then - if you have a group-based transformer - holds that data for processing against any incoming feature, but I don't know for sure. Also, of course, I don't really know what action you are carrying out or how many features you have, or how often you run this process (because this would take a lot of setting up, but could save you lots of time if you repeat the process daily).
So, that's the third technique: a continuous process that just listens for new A features to compare against B.
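Purely as a conceptual sketch of that listening pattern, and saying nothing about how an FME workspace or its receiver transformers actually behave, here is a small Python version using a worker thread and queues from the standard library. The names `matcher`, `inbox`, and `outbox`, and the `None` shutdown signal, are illustrative assumptions.

```python
import queue
import threading


def matcher(b_lookup, inbox, outbox):
    """Keep B in memory and match A features as they arrive, until None is received."""
    while True:
        feature = inbox.get()
        if feature is None:  # assumption: None is the agreed shutdown signal
            break
        outbox.put((feature["join_key"], b_lookup.get(feature["join_key"])))


# B is loaded once, before any A features arrive (illustrative data).
b_lookup = {"k1": "v1", "k2": "v2"}
inbox, outbox = queue.Queue(), queue.Queue()

worker = threading.Thread(target=matcher, args=(b_lookup, inbox, outbox))
worker.start()

# A features can be sent at any time while the worker is running.
for feature in [{"join_key": "k2"}, {"join_key": "zz"}, {"join_key": "k1"}]:
    inbox.put(feature)

inbox.put(None)  # stop the worker
worker.join()

results = [outbox.get() for _ in range(outbox.qsize())]
```

The point of the design is that the cost of loading B is paid once, when the process starts, rather than once per A file.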
There are some other things like this, but they're fairly specialized; for example if you are doing something with a DEM, the SurfaceModeller can save its internal workings as a file for re-use at a later point. And I think there are some web formats/transformers that will cache data locally, so you don't have to keep rereading it from a remote source again and again. But I don't know if they will help you here.
Anyway, I hope something here helps. If you can give us a few more details about what you are doing exactly, then it might be that there's something we can do to help. Other than that, as others have said, use a good, fast format. I seem to recall trying out different formats at one time, and MicroStation v8 was the fastest, but whether that's still true, and whether I included a comparison with FFS, I don't know.
If I have any more bright (or even not-so-bright) ideas, I'll let you know,
Regards
Mark
PS: Thanks for letting me know about this one Bruce. Not quite a blog, but getting on for blog length!
I can sense a blog post coming from Mark Ireland about workspace memoisation via FFS or SQLite files.....
As an addition, you might want to look at the FeatureReader transformer.
This transformer can read just the parts of file B that you need to compare with file A (and potentially not the complete file B).
No, you can't, unfortunately.
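However, you should consider the format of file B, as it can make a big difference: some formats are an order of magnitude faster than others. You could try preprocessing file B into the native FFS format to see if it makes a difference.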
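If that doesn't improve performance, a more general analysis of your workspace is in order. For example, if you repeatedly use a FeatureMerger on a large dataset, your data might benefit from first being loaded into a relational database and then queried using SQL joins on indexed fields (either as a view or from a SQLExecutor / SQLCreator). It can make a significant difference.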
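A minimal sketch of that database approach, written in Python against SQLite so it can be run anywhere. The table names `a` and `b`, the columns, the index, and the view `a_joined` are all hypothetical. In FME the `SELECT` would be issued from something like a SQLExecutor or SQLCreator against whatever database you actually use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: a real setup would use a persistent database

# Hypothetical schema: both datasets share a join_key column, and B's is indexed.
conn.executescript(
    """
    CREATE TABLE a (join_key TEXT, name TEXT);
    CREATE TABLE b (join_key TEXT, attr TEXT);
    CREATE INDEX idx_b_join_key ON b (join_key);
    CREATE VIEW a_joined AS
        SELECT a.join_key AS join_key, a.name AS name, b.attr AS attr
        FROM a
        JOIN b ON a.join_key = b.join_key;
    """
)

# Illustrative data only.
conn.executemany("INSERT INTO a VALUES (?, ?)", [("k1", "first"), ("k3", "third")])
conn.executemany(
    "INSERT INTO b VALUES (?, ?)", [("k1", "v1"), ("k2", "v2"), ("k3", "v3")]
)
conn.commit()

# The join is done by the database, which can use the index on b.join_key.
rows = conn.execute(
    "SELECT join_key, name, attr FROM a_joined ORDER BY join_key"
).fetchall()
```

Wrapping the join in a view means the workspace only has to read one already-joined table, and the database decides how best to use the index.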
Could you read file B and create an FFS file? FFS is FME's internal file format and reads faster than any other format.
Create a (temporary) workspace that reads file B and writes it out with an FFS writer. Then use the FFS file in the process instead of file B.
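FFS is specific to FME, so it cannot be shown directly here, but the convert-once, reuse-many-times pattern can be sketched by analogy in Python. In this sketch a CSV file stands in for the original file B and a pickle file stands in for the fast copy. The function name `load_b`, the file names, and the choice of pickle are all assumptions made for illustration.

```python
import csv
import os
import pickle
import tempfile


def load_b(source_path, cache_path):
    """Return B's rows, rebuilding the fast copy only if it is missing or stale."""
    cache_is_fresh = (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    )
    if cache_is_fresh:
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)

    # Slow path: parse the original file, then save a binary copy for next time.
    with open(source_path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    with open(cache_path, "wb") as handle:
        pickle.dump(rows, handle)
    return rows


# Illustrative setup: a tiny stand-in for file B in a temporary directory.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "file_b.csv")
cache_path = os.path.join(workdir, "file_b.cache")
with open(source_path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["join_key", "attr"])
    writer.writerow(["k1", "v1"])

first = load_b(source_path, cache_path)   # parses the CSV and writes the cache
second = load_b(source_path, cache_path)  # served from the cache
```

Whether the converted copy is actually faster depends on the formats involved, so it is worth timing both before committing to the extra step.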