Streaming processing of Assemble flow (Technical Preview)
In Assemble flow, streaming allows a flow activity to begin processing and producing output as soon as it receives partial input from another activity (or activities). This is important for distributed data access: because the flow does not need to load the full data into memory before processing, streaming can reduce memory consumption and improve performance.
In M2, Assemble flow provides a technical preview of XML-oriented streaming support. Currently, it provides a set of streaming operators for feeds: fetch feed, aggregate, filter, truncate, and unique. The design could be extended to other types of XML documents.
Let us use a very simple sample to see how the streaming operators work. We will aggregate the Yahoo and CNN news feeds and keep all entries whose content contains "U.S."
This flow can be found in zero.assemble.flow.samples/public/samples/filterFeed/index.flow:
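To make the idea of streaming operators concrete, here is a minimal Python sketch of aggregate, filter, truncate, and unique written as lazy generators. It is not the Assemble flow implementation: the function names, the dictionary representation of an entry, and the choice to aggregate by simple concatenation are all assumptions made for illustration.

```python
from itertools import islice
from typing import Callable, Dict, Hashable, Iterable, Iterator

# Hypothetical entry representation: a plain dict with "id" and "content".
Entry = Dict[str, str]


def aggregate(*feeds: Iterable[Entry]) -> Iterator[Entry]:
    """Yield entries from each feed in turn (assumed: plain concatenation)."""
    for feed in feeds:
        yield from feed


def filter_entries(entries: Iterable[Entry],
                   predicate: Callable[[Entry], bool]) -> Iterator[Entry]:
    """Pass through only the entries for which predicate is true."""
    return (entry for entry in entries if predicate(entry))


def truncate(entries: Iterable[Entry], limit: int) -> Iterator[Entry]:
    """Stop after `limit` entries without reading the rest of the input."""
    return islice(entries, limit)


def unique(entries: Iterable[Entry],
           key: Callable[[Entry], Hashable]) -> Iterator[Entry]:
    """Drop entries whose key has already been seen."""
    seen = set()
    for entry in entries:
        k = key(entry)
        if k not in seen:
            seen.add(k)
            yield entry


if __name__ == "__main__":
    yahoo = [{"id": "1", "content": "U.S. economy"},
             {"id": "2", "content": "Europe"}]
    cnn = [{"id": "1", "content": "U.S. economy"},
           {"id": "3", "content": "U.S. election"}]
    pipeline = truncate(
        unique(
            filter_entries(aggregate(yahoo, cnn),
                           lambda e: "U.S." in e["content"]),
            key=lambda e: e["id"]),
        10)
    for entry in pipeline:
        print(entry["id"], entry["content"])
```

Because every operator is a generator, no entry is read from a source until the end of the pipeline asks for it, which is the property the streaming operators are meant to provide.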
<process name="Sample">
    <receiveGET name="rssRcv"/>
    <feed name="YahooFeed" url="http://rss.news.yahoo.com/rss/topstories"/>
    <feed name="CNNFeed" url="http://rss.cnn.com/rss/cnn_topstories.rss"/>
    <aggregateFeeds name="aggregate">
        <input value="${YahooFeed}"/>
        <input value="${CNNFeed}"/>
    </aggregateFeeds>
    <filterFeed name="feedfilter" condition="contains(atom:content, 'U.S.')">
        <input value="${aggregate}"/>
    </filterFeed>
    <replyGET name="rssRply">
        <input value="${feedfilter}"/>
    </replyGET>
</process>
The "feed" activities fetch a feed and convert it to an XML DOM node. The "aggregateFeeds" and "filterFeed" activities accept DOM nodes as input and generate a DOM node as output, so each complete feed is held in memory before the next activity starts.
Compare this with the streaming version, which can be found in zero.assemble.flow.samples/public/samples/filterFeedStream:
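The tree-based behaviour of the non-streaming flow can be approximated in Python as follows. This is only an illustrative sketch, not the Assemble flow code: it assumes the feeds are already available as Atom documents in strings, and the function names aggregate_feeds and filter_feed are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
NS = {"atom": ATOM}


def aggregate_feeds(*feed_docs: str) -> ET.Element:
    """Parse every feed completely, then merge all entries into one tree."""
    merged = ET.Element(f"{{{ATOM}}}feed")
    for doc in feed_docs:
        root = ET.fromstring(doc)  # the whole document is now in memory
        merged.extend(root.findall("atom:entry", NS))
    return merged


def filter_feed(feed: ET.Element, needle: str) -> ET.Element:
    """Build a new feed tree holding only entries whose content has needle."""
    result = ET.Element(f"{{{ATOM}}}feed")
    for entry in feed.findall("atom:entry", NS):
        content = entry.findtext("atom:content", default="", namespaces=NS)
        if needle in content:
            result.append(entry)
    return result
```

Each step here receives a complete tree and returns a complete tree, so filtering cannot begin until every input feed has been fully parsed and merged.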
<process name="Sample">
    <receiveGET name="rssRcv"/>
    <feedStream name="YahooFeed" url="http://rss.news.yahoo.com/rss/topstories"/>
    <feedStream name="CNNFeed" url="http://rss.cnn.com/rss/cnn_topstories.rss"/>
    <aggregateFeedStreams name="aggregate">
        <input value="${YahooFeed}"/>
        <input value="${CNNFeed}"/>
    </aggregateFeedStreams>
    <filterFeedStream name="feedfilter" condition="contains(atom:content, 'U.S.')">
        <input value="${aggregate}"/>
    </filterFeedStream>
    <replyGET name="rssRply">
        <input value="${feedfilter}"/>
    </replyGET>
</process>
The feedStream activities only open input streams of XML events. The subsequent activities iterate over the events from those streams and process them as they arrive, without building a full DOM tree first.
Please read the attached presentation for more detailed information.
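The event-based approach can be sketched in Python with the standard library's incremental parser. Again this is an illustration under assumptions rather than the Assemble flow implementation: the inputs are assumed to be Atom documents readable as binary streams, and feed_stream, aggregate_feed_streams, and filter_feed_stream are hypothetical names.

```python
import xml.etree.ElementTree as ET
from itertools import chain
from typing import BinaryIO, Iterable, Iterator

ATOM = "http://www.w3.org/2005/Atom"
ENTRY = f"{{{ATOM}}}entry"
CONTENT = f"{{{ATOM}}}content"


def feed_stream(source: BinaryIO) -> Iterator[ET.Element]:
    """Yield each entry as soon as its closing tag has been parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == ENTRY:
            yield elem
            # Runs when the consumer asks for the next entry: drop the
            # entry's children and text so memory use stays bounded.
            elem.clear()


def aggregate_feed_streams(
        *streams: Iterable[ET.Element]) -> Iterator[ET.Element]:
    """Concatenate entry streams lazily (assumed aggregation order)."""
    return chain(*streams)


def filter_feed_stream(entries: Iterable[ET.Element],
                       needle: str) -> Iterator[ET.Element]:
    """Forward only entries whose content text contains needle."""
    for entry in entries:
        if needle in (entry.findtext(CONTENT) or ""):
            yield entry
```

A consumer of this sketch has to use each entry before requesting the next one, because feed_stream clears an entry once iteration moves on. That restriction is the price of keeping only one entry in memory at a time.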
-- yili - 23 Oct 2007