Hi FME'ers,
Depending on when I press the Post button, FME2019.2 is either just out, or going to be released imminently, and I wanted to discuss an interesting new parameter that has gone into several transformers: Preserve Feature Order.
It's not so much the parameter that I have to explain, as why the parameter is needed, and that's at a fairly detailed level of FME use. Not every user will care and I expect nearly no-one to use this parameter!
So why did we add it? To ensure FME is backwards compatible and to give our advanced users maximum flexibility. So you can skip to the summary section and I won't be offended; but you might find this interesting, and I'll try to explain as simply as possible.
The reason for this new parameter revolves around the flow of features in a workspace.
Knowing how features flow through an FME workspace is not vital, but it's often useful to know. For this reason we did a workshop on the subject at the last user conference. Attendees got to their feet and pretended to be features, while the Safe folk were readers, writers, and transformers. It was awesome and I wish the video recording had been good enough to publish. It wasn't though, and so here's a brief intro to the flow of features...
One key aspect of FME has always been that features pass through a workspace, one at a time, in the same order that they are read. That order doesn't change and so - because you can rely on it - you can design your workspace to take advantage of that.
For example, you know that feature X (a clip outline) will reach a Clipper transformer before feature Y (the feature to clip), so you can set the mode parameter to "Clippers First", which improves performance.
Similarly, when features emerge from a transformer, they will emerge in the same order. For example, say I have six features representing roads:
ID | Road Type |
A | Highway |
B | Residential |
C | Highway |
D | Private |
E | Highway |
F | Residential |
If I pass them through a Tester transformer, testing for RoadType = Highway, then the order of the output is:
ID | Result |
A | Passed |
B | Failed |
C | Passed |
D | Failed |
E | Passed |
F | Failed |
So the order of the features as they are processed is retained in the output: A, B, C, D, E, F.
In short, features are processed and output in a predictable order, and that can be useful.
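To make that concrete, here's a minimal Python sketch of the one-at-a-time model. It is not FME code and says nothing about how FME is implemented internally; the `ROADS` list and the `tester_one_at_a_time` function are hypothetical stand-ins for the road features and the Tester above.

```python
# Hypothetical stand-ins for the six road features: (ID, RoadType).
ROADS = [
    ("A", "Highway"),
    ("B", "Residential"),
    ("C", "Highway"),
    ("D", "Private"),
    ("E", "Highway"),
    ("F", "Residential"),
]


def tester_one_at_a_time(features, wanted="Highway"):
    """Test each feature and hand it downstream before reading the next one."""
    for feature_id, road_type in features:
        port = "Passed" if road_type == wanted else "Failed"
        yield feature_id, port


results = list(tester_one_at_a_time(ROADS))

# The order downstream matches the read order, whichever port was used.
output_order = [feature_id for feature_id, _ in results]
print(output_order)
```

Because each feature leaves the transformer before the next one arrives, the port it leaves through has no effect on the overall order.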
However... starting around FME2017 we introduced a "bulk mode" to FME. It's a performance-enhancing technique that for the most part remains invisible. It just happens and you don't need to really know about it. FME is faster and we're all happy.
But the word "bulk" should suggest that features aren't always processed one-at-a-time any more, which means that they won't pass through the workspace the same way.
In bulk mode, features now pass through FME in blocks. You can see this by looking at the feature counts. While a workspace is running they often increment by large numbers, not one at a time. Here the features emerge from this StatisticsCalculator in blocks of 100,000 features at a time:
By itself, this doesn't really make any difference to your work; but is the feature order the same...?
Actually, yes, feature order remains the same in bulk mode. The features passed into the StatisticsCalculator transformer (as above) emerge in the same order.
The block sizes might not be the same (one block of 1.5m features entered, multiple blocks of 100,000 features emerged) but the order of features is unchanged. If the StatisticsCalculator were connected to a writer, the data would be written in the same order as it was read.
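A small Python sketch of that idea, assuming nothing about FME's internals: the hypothetical `rechunk` function takes blocks of any size and re-emits them in blocks of at most 100,000, and flattening the result shows that the feature order is untouched. Integers stand in for features here.

```python
from itertools import islice


def rechunk(blocks, size):
    """Flatten the incoming blocks and re-emit them in blocks of at most `size`."""
    flat = (feature for block in blocks for feature in block)
    while True:
        chunk = list(islice(flat, size))
        if not chunk:
            return
        yield chunk


# One incoming block of 1.5m "features" (integers, in read order).
incoming = [list(range(1_500_000))]
outgoing = list(rechunk(incoming, 100_000))

block_count = len(outgoing)
flattened = [feature for block in outgoing for feature in block]
print(block_count, flattened == incoming[0])
```

The block boundaries move, but a writer consuming the blocks in sequence would still see the features in the order they were read.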
So nothing has really changed. Except... there is a difference for transformers where multiple output ports are connected.
So features now pass through FME in blocks. When there are multiple output ports, then features emerge from individual ports in blocks too. This Tester shows what I mean:
If you watch a few times like I did, you can see that (except for the first number update) there are approximately 100,000 features each time again, but split into passed and failed.
For example, the fifth time the numbers update, 99,930 features were added to the feature counts; 54,170 from the PASSED port and 45,760 from the FAILED port.
So... let's go back to our previous example of features representing roads:
ID | Road Type |
A | Highway |
B | Residential |
C | Highway |
D | Private |
E | Highway |
F | Residential |
If I again pass them through a Tester transformer, again testing for RoadType = Highway, then this time the output will be in a different order:
ID | Result |
A | Passed |
C | Passed |
E | Passed |
B | Failed |
D | Failed |
F | Failed |
That's because blocks are output one port at a time. Features A,C,E form one block, B,D,F form the other. The order within a block of features remains the same, but each port now outputs its features separately.
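Is this a problem? Well, for 99.9% of users it's just a subtle difference that won't affect them. However, we want to be backwards compatible in 100% of cases. So, for the 0.1% of users who might rely on - or want to make use of - the old behaviour, we've added a new parameter.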
Here's the new parameter in a Tester transformer:
What it does is set the transformer to either use the new behaviour (Preserve Order: Per Output Port) or the old behaviour (Preserve Order: Across Output Ports).
If I set the above workspace to use Across Output Ports (the legacy method) then I basically turn off Bulk Mode. This is the result:
So features are now exiting individually again, not by port. It's potentially useful in preserving feature order regardless of ports, but, without Bulk Mode there may well be a performance penalty to pay.
For example, notice that the GIF above is approximately 25% longer without bulk mode. So it's a choice of less control, but faster; or more control, but slower.
Log messages indicate when the (legacy) method is in use, for example:
Tester (TestFactory): Splitting bulk mode features into features
To which transformers does this apply? Not to all of them. Just ones where you would expect the order to be preserved (mostly Feature-Based, filtering transformers):
Plus we will add shortly:
Incidentally, when you open a previous workspace in a newer FME, these transformers get the new parameter with it set to the Across Output Ports mode. That's the legacy setting and it's in order to be backwards compatible. The workspace will work as it always did.
So if you open an older workspace, you can safely change this to Per Output Port, if you don't rely on feature order in the transformer output.
Newly placed transformers default to the new behaviour, to take advantage of bulk mode.
Not an "FAQ" because I don't think anyone will ask this question frequently, but it might be worth knowing about.
Q) You mention that "without Bulk Mode there may well be a performance penalty to pay". What do you mean by "may"? Won't you always lose bulk mode and pay a performance cost whenever you use this parameter in Across Output Ports mode?
A) Not always, no. Consider a Tester transformer where 20,000 features enter and they are ALL passes. In that case they all emerge from the same port, so there's no need for FME to exit bulk mode to retain feature order. Similarly, if the first 10,000 features were passes and the final 10,000 features were fails, there would be no need to exit bulk mode because there is no back and forth between output ports.
It's only when features are exiting multiple ports alternately (Pass, Fail, Pass, Fail, Pass, Pass, Fail) that it becomes necessary to exit bulk mode, causing a performance penalty.
The exceptions to this are the AttributeFilter and DuplicateFilter transformers, which will always lose bulk mode when set to Across Output Ports.
As you can see, there are different scenarios in which performance may or may not be affected; that's why I wrote "there may be a performance penalty", because outlining all of these scenarios is not feasible.
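One way to picture why alternation matters is to count the runs of consecutive features that leave through the same port. The Python sketch below is only a mental model (the `emission_blocks` function is hypothetical, and the AttributeFilter and DuplicateFilter exceptions are not modelled): the more runs there are, the more a block has to be broken up to keep the read order across ports.

```python
from itertools import groupby


def emission_blocks(ports):
    """Group consecutive features leaving the same port into (port, count) runs."""
    return [(port, len(list(run))) for port, run in groupby(ports)]


all_pass = ["Pass"] * 20_000
pass_then_fail = ["Pass"] * 10_000 + ["Fail"] * 10_000
alternating = ["Pass", "Fail", "Pass", "Fail", "Pass", "Pass", "Fail"]

print(emission_blocks(all_pass))
print(emission_blocks(pass_then_fail))
print(emission_blocks(alternating))
```

In the first two cases the features fall into one or two large runs, so order is preserved without any back and forth; in the third, seven features break into six runs.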
If that's a little confusing - or you skipped all of that content from the intro downwards - here's an analogy.
Think of a workspace in terms of sorting data in an Excel spreadsheet. In Excel, data is sorted using a "sort key". In FME data is sorted using the order of the incoming features. The order of features (you could say) is literally the FME sort key:
Like a spreadsheet, the data in FME stays in the sorted order throughout, unless you specifically do something (like use a Sorter transformer) to change it.
But - because of Bulk Mode - FME2019 now also sorts data by the order of output ports in a transformer. It's like having a second sort key:
So this new parameter (Preserve Feature Order) appears in both the Tester and other transformers to ask whether you want data in FME to be sorted by PortOrder+ReadOrder (Per Output Port/the newer default) or just by ReadOrder (Across Output Ports/the legacy behaviour):
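The two-sort-key analogy translates almost directly into Python's `sorted` with a tuple key. This is just the analogy written out, not FME behaviour: the `Row` type, the `PORT_ORDER` mapping (Passed before Failed) and the sample rows are all assumptions for illustration.

```python
from typing import NamedTuple


class Row(NamedTuple):
    read_order: int
    feature_id: str
    port: str


# Assumption: the Passed port sorts ahead of the Failed port.
PORT_ORDER = {"Passed": 0, "Failed": 1}

rows = [
    Row(1, "A", "Passed"),
    Row(2, "B", "Failed"),
    Row(3, "C", "Passed"),
    Row(4, "D", "Failed"),
    Row(5, "E", "Passed"),
    Row(6, "F", "Failed"),
]

# Per Output Port: sort by PortOrder first, then ReadOrder.
per_output_port = sorted(rows, key=lambda r: (PORT_ORDER[r.port], r.read_order))

# Across Output Ports: sort by ReadOrder only.
across_output_ports = sorted(rows, key=lambda r: r.read_order)

print([r.feature_id for r in per_output_port])      # ['A', 'C', 'E', 'B', 'D', 'F']
print([r.feature_id for r in across_output_ports])  # ['A', 'B', 'C', 'D', 'E', 'F']
```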
I can't emphasize enough that it's really not a big deal, because so few of you have workspaces that need this very precise level of feature ordering. But we like to be proactive and backwards compatible at all times, hence this post.
If you do have any questions, please do post them below and I'll do my best to answer them.
Preserving feature order is a hugely powerful aspect of FME and I'm very happy to see Safe providing an option for backwards compatibility in this regard.
I had my "ah hah!" moment one day working with FME and heading down a rabbit hole of recursion and looping before I suddenly realised that I didn't need any of that because of the way FME handles features one at a time - and it only took me about 15 years of working with FME to realise that!
I try to get this concept across whenever I teach an advanced FME course (perhaps it should be added to the official course content?) but have always found it a bit esoteric to explain. Your post helps to explain the mechanics of it all. Thanks Mark!
Thanks Nic. I think, yes, we should add this to the main course content. Previously we've talked about feature-based and group-based transformers, but I think we've gone beyond that now. Bulk Mode is just so different. It's an advanced topic for sure, but one that users at that level should be aware of.
Hi @mark2atsafe, thanks for the great post. However, I would suggest renaming the option Across Output Ports to "Ignore feature order" (or perhaps I do not understand it correctly). Otherwise I would recommend adding such a value to make really clear what happens.
I'm a bit confused by the Counter behavior when the TestFilter's Preserve Feature Order option is set to Across Output Ports; see the example below:
So I decided to compare the output result between the Counter and the function @Count(). The Counter shows undetermined values:
while the @Count() function outputs a somewhat more accurate count result:
Note that I added a section with a question about feature order and bulk mode. To save you looking for it above, here it is again:
Not an "FAQ" because I don't think anyone will ask this question frequently, but it might be worth knowing about.
Q) You mention that "without Bulk Mode there may well be a performance penalty to pay". What do you mean by "may"? Won't you always lose bulk mode and pay a performance cost whenever you use this parameter in Across Output Ports mode?
A) Not always, no. Consider a Tester transformer where 20,000 features enter and they are ALL passes. In that case they all emerge from the same port, so there's no need for FME to exit bulk mode to retain feature order. Similarly, if the first 10,000 features were passes and the final 10,000 features were fails, there would be no need to exit bulk mode because there is no back and forth between output ports.
It's only when features are exiting multiple ports alternately (Pass, Fail, Pass, Fail, Pass, Pass, Fail) that it becomes necessary to exit bulk mode, causing a performance penalty.
The exceptions to this are the AttributeFilter and DuplicateFilter transformers, which will always lose bulk mode when set to Across Output Ports.
As you can see, there are different scenarios in which performance may or may not be affected; that's why I wrote "there may be a performance penalty", because outlining all of these scenarios is not feasible.
Good news. I sometimes felt it would be nice if there were a "FeatureTableResolver" transformer that just splits a feature table into individual features to prevent the side effect.
Awesome enhancement to add to FME.
I actually noticed this happening in several of my workspaces, and I had to add a Sorter to overcome the out-of-order feature behavior.
I can think of a few cases where feature order is both implicit and required. Particularly in regards to splitting raster bands and recombining them (with a tester/testFilter to allow for specific band processing).
But for the most part I look forward to the performance gains in bulk mode.
Thanks @mark2atsafe. That was a great explanation and I definitely feel it will come in handy. We are working on workspaces to validate DWG files and need the order of features to be preserved as it also helps to define the order and direction of the points and lines that make up polygon features. Thanks again and keep up the great work.
Thanks. I'm glad to hear that we made a good decision to add this parameter and that it will be useful.
© 2019 Safe Software Inc