How are you getting on with this? If you stored file B in a database and could use a Joiner to carry out the match, it might be a lot faster!
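If that's possible, you could replace the FeatureMerger with a Joiner transformer. You would have to put B into some sort of database format, even if only temporarily, but then you wouldn't have to read all 1,000,000 features from B just to match 1,000 of them.
Similarly, you could use a FeatureReader, an SQLExecutor, or any other transformer that can retrieve selected features from dataset B. So that's one technique: read B the same number of times, but reduce the amount of data you need to read from it.
That's the first law of performance, by the way. Performance is defined as "useful work" carried out, so if you're reading data that isn't used, it's not useful work!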
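FME's Joiner is configured in Workbench rather than written as code, so the following is only a rough Python analogue of the idea, assuming B has been loaded into a SQLite table once: each A feature triggers an indexed lookup that fetches just the matching B rows instead of reading all of B. The table name `b`, the columns `key`/`value`, and both helper functions are hypothetical.

```python
import sqlite3


def build_b_table(conn, b_rows):
    """Load dataset B into a table with an index on the join key (done once)."""
    conn.execute("CREATE TABLE b (key TEXT, value TEXT)")
    conn.executemany("INSERT INTO b (key, value) VALUES (?, ?)", b_rows)
    # The index lets each lookup touch only matching rows instead of scanning all of B.
    conn.execute("CREATE INDEX idx_b_key ON b (key)")
    conn.commit()


def join_a_to_b(conn, a_features):
    """For each A feature, fetch only the B rows whose key matches (Joiner-style)."""
    results = []
    for a_key, a_attr in a_features:
        rows = conn.execute("SELECT value FROM b WHERE key = ?", (a_key,)).fetchall()
        for (b_value,) in rows:
            results.append((a_key, a_attr, b_value))
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for B: 100,000 rows keyed k0..k99999.
    build_b_table(conn, ((f"k{i}", f"b{i}") for i in range(100_000)))
    # Only three lookups are made, however large B is; "missing" matches nothing.
    print(join_a_to_b(conn, [("k5", "a5"), ("k42", "a42"), ("missing", "aX")]))
```

The useful work here is proportional to the number of A features, not to the size of B, which is the point being made about the 1,000 versus 1,000,000 features.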
Anyway, to read B only once, I think you would need a single run of the workspace that loads all of the data from all of the A files at one time. Basically, one big process. How many features are in each A file? Are they non-spatial, or do they have very complex geometry? That will be the big issue.
I'm assuming you have your current setup (reading each A file separately) because there is too much data when it's all combined. But if not, and you are only doing it as a quick way to batch process data, there are other solutions. Consider reading all of the A files at once, then using a Dataset Fanout on the output to split them back out again. Group-by parameters in transformers can help keep each A file separate if necessary.
So that's a second technique: reading all of the A files at once, so you only need to read B once.
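A minimal Python sketch of the second technique, assuming each feature can be reduced to a `(key, attribute)` pair and that the match is a simple key lookup: B is read into memory once, the features from every A file are processed in the same run, and the results are grouped by source file, which plays the role of the Dataset Fanout. The file names and helper functions are hypothetical.

```python
from collections import defaultdict


def load_b(b_rows):
    """Read dataset B exactly once into a lookup keyed by the match attribute."""
    lookup = defaultdict(list)
    for key, value in b_rows:
        lookup[key].append(value)
    return lookup


def process_all_a(a_files, b_lookup):
    """Process features from every A file in one pass, tagging each with its source.

    a_files maps a (hypothetical) source file name to its list of (key, attr) features.
    Returns the output grouped by source file, mimicking a dataset fanout.
    """
    fanout = defaultdict(list)
    for source_name, features in a_files.items():
        for key, attr in features:
            # .get() does not add missing keys to the defaultdict.
            for b_value in b_lookup.get(key, []):
                fanout[source_name].append((key, attr, b_value))
    return dict(fanout)


if __name__ == "__main__":
    b = load_b([("k1", "b1"), ("k2", "b2")])
    a_files = {"a_north.csv": [("k1", "n1")], "a_south.csv": [("k2", "s2"), ("k9", "s9")]}
    for name, rows in process_all_a(a_files, b).items():
        print(name, rows)
```

Keying the output by source name is the code equivalent of a group-by: features from different A files never get mixed, even though they all share the single copy of B.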
Finally, and this is where my imagination starts to exceed my knowledge, I wonder if you can set up a continuously running workspace. That workspace reads file B and then receives features from all of the file As one at a time. It keeps running as long as you are passing file A features to it.
We do have transformers like that: TCPIPReceiver, SQSReceiver, etc. I'm partly unsure because I don't know how a workspace like that deals with a reader. I assume it reads the data once and then, if you have a group-based transformer, holds that data for processing against each incoming feature, but I don't know for sure. Also, of course, I don't really know what action you are carrying out, how many features you have, or how often you run this process (this would take a lot of setting up, but could save you a lot of time if you repeat the process daily).
So, that's the third technique: a continuous process that just listens for new A features to compare against B.
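The third technique can be sketched in Python with a worker thread and an in-process `queue.Queue` standing in for a message channel such as the one TCPIPReceiver or SQSReceiver would listen on. This only illustrates the pattern, not how FME implements those transformers: B is loaded once when the listener starts, and A features are handled as they arrive until a sentinel value shuts it down. All names here are hypothetical.

```python
import queue
import threading

SENTINEL = None  # signals the listener to shut down


def listener(b_lookup, inbox, outbox):
    """Long-running worker: B is already in memory; handle A features as they arrive."""
    while True:
        feature = inbox.get()  # blocks until the next A feature is sent
        if feature is SENTINEL:
            break
        key, attr = feature
        # Missing keys produce None rather than an error.
        outbox.put((key, attr, b_lookup.get(key)))


def start_listener(b_rows):
    """Load B once, then start the worker thread that waits for A features."""
    b_lookup = dict(b_rows)
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=listener, args=(b_lookup, inbox, outbox), daemon=True)
    worker.start()
    return inbox, outbox, worker


if __name__ == "__main__":
    inbox, outbox, worker = start_listener([("k1", "b1"), ("k2", "b2")])
    # Feed A features from several "files" over time; B is never reread.
    for feature in [("k1", "from_file_1"), ("k2", "from_file_2"), ("k3", "from_file_3")]:
        inbox.put(feature)
    inbox.put(SENTINEL)
    worker.join()
    while not outbox.empty():
        print(outbox.get())
```

Because there is a single worker reading from a FIFO queue, results come out in the order the A features were sent.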
There are some other things like this, but they're fairly specialized; for example, if you are doing something with a DEM, the SurfaceModeller can save its internal workings as a file for re-use later. And I think there are some web formats/transformers that will cache data locally, so you don't have to keep rereading it from a remote source again and again. But I don't know if they will help you here.
Anyway, I hope something here helps. If you can give us a few more details about exactly what you are doing, there might be something more we can do to help. Other than that, as others have said, use a good, fast format. I seem to recall trying out different formats at one time, and MicroStation v8 was the fastest, but I don't know whether that's still true, or whether I included FFS in the comparison.
If I have any more bright (or even not-so-bright) ideas, I'll let you know,
Regards
Mark
PS: Thanks for letting me know about this one, Bruce. Not quite a blog, but getting on for blog length!
I can feel a blog post from Mark Ireland coming on, about memoizing workspaces via FFS or SQLite files...
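As an addition, you might want to look at the FeatureReader transformer.
It lets you read just the parts of file B you need for comparison with file A (which may not be the complete file B).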
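No, you can't, unfortunately.
But you should consider the format of file B, because it can make a big difference: some formats are an order of magnitude faster than others. You could try preprocessing file B into the native FFS format and see whether that makes a difference.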
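If that still doesn't improve performance, a more thorough analysis of your workflow is called for. For example, if you are using FeatureMerger repeatedly on large datasets, first loading the data into a relational database and performing an SQL join query on indexed fields (as a view, or from an SQLExecutor/SQLCreator) could make a big difference.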
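As a rough illustration of letting the database do the matching, here is a Python sketch using an in-memory SQLite database (an assumption; any relational database with indexes would do): A and B are loaded into tables, the join field of B is indexed, and a view performs the join. Table, column, and view names are hypothetical.

```python
import sqlite3


def sql_join(a_rows, b_rows):
    """Load A and B into SQLite, index the join field, and let the database do the match."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (id TEXT, a_attr TEXT)")
    conn.execute("CREATE TABLE b (id TEXT, b_attr TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?)", a_rows)
    conn.executemany("INSERT INTO b VALUES (?, ?)", b_rows)
    # Index the join field of the large table so the join can seek instead of scan.
    conn.execute("CREATE INDEX idx_b_id ON b (id)")
    # Expose the join as a view, one of the two options mentioned in the post.
    conn.execute(
        "CREATE VIEW a_with_b AS "
        "SELECT a.id, a.a_attr, b.b_attr FROM a JOIN b ON a.id = b.id"
    )
    rows = conn.execute("SELECT id, a_attr, b_attr FROM a_with_b ORDER BY id").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    a = [("1", "x"), ("2", "y"), ("3", "z")]
    b = [("1", "p"), ("3", "q"), ("4", "r")]
    print(sql_join(a, b))
```

An inner join keeps only A rows that have a match in B; a `LEFT JOIN` would keep unmatched A rows as well, closer to how unmerged features are still output by a merge.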
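Can you read file B and create an FFS file? FFS is FME's internal file format, and it reads faster than any other format.
Create a (temporary) workspace that reads file B and writes it out with the FFS writer. Then use the FFS file in your process instead of file B.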
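FFS itself is FME's own format and isn't reproduced here; this Python sketch only illustrates the convert-once, reuse-many-times pattern, with a pickle file standing in for the FFS file and a CSV file standing in for B. Whether the cached copy actually loads faster in a given case is an assumption worth measuring. The cache is rebuilt whenever the source file is newer than it.

```python
import csv
import os
import pickle
import tempfile


def load_b_cached(csv_path, cache_path):
    """Parse B from CSV only if the cache is missing or stale; otherwise load the cache.

    The pickle file stands in for FFS here: a one-off conversion that later runs reuse.
    """
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= os.path.getmtime(csv_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    with open(csv_path, newline="") as f:
        rows = [tuple(r) for r in csv.reader(f)]
    with open(cache_path, "wb") as f:
        pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)
    return rows


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "b.csv")  # hypothetical file B
    cache = os.path.join(workdir, "b.pkl")  # hypothetical cached copy
    with open(src, "w", newline="") as f:
        csv.writer(f).writerows([["k1", "v1"], ["k2", "v2"]])
    print(load_b_cached(src, cache))  # first run: parses the CSV and writes the cache
    print(load_b_cached(src, cache))  # later runs: load the cache instead
```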