partitionBy

Rob_Ashton · March 6, 2013, 7:32pm

That would be appreciated. I want to update those blog entries, and I want to run these darned projections over the data (more than half a million events on my server now) to see what a proper statistically relevant sample looks like :D

Rob_Ashton · March 8, 2013, 2:37pm

Does this work yet?

I really want to do a pile of processing on my last.fm data, I thought of some wicked temporal queries

Yuri_Solodkyy · March 8, 2013, 2:47pm

Rob,

The only significant change is that you need outputTo(,) to persist results. Otherwise results stay in memory and are only available via HTTP requests.

-yuriy

Rob_Ashton · March 8, 2013, 2:48pm

I was using outputTo, but I was only getting 33 results (one for each stream, each with a count of 1).

Greg_Young1 · March 8, 2013, 2:49pm

@yuriy see thread. It appears to be a bit of a bug. With outputTo it seems the query was not seeing the events in the stream.

Yuri_Solodkyy · March 8, 2013, 2:53pm

Do you have this commit in your sources?

86819bff16ad106324863a93ec0d15a7b0076474

Yuri_Solodkyy · March 8, 2013, 2:56pm

Anyway, let me run a very similar projection to check.

Rob_Ashton · March 8, 2013, 2:57pm

Apparently I did not - I’ll give it another whirl then. (I updated the day you told me it had been fixed. Had you not merged to dev yet?)

Anyway, no matter. I’ll report back, hopefully with success.

Rob_Ashton · March 8, 2013, 2:58pm

Oh, I see that was yesterday evening and after the thread - okay - fine, well I’ll definitely be reporting back then!

Yuri_Solodkyy · March 8, 2013, 2:59pm

It probably fixes a different issue, but if you only have the previous commit, fromStream/fromCategory are completely broken.

Yuri_Solodkyy · March 8, 2013, 3:10pm

Rob,

Is it possible that you tried to output into the same stream or streams from a different projection? That is, did you first create output streams from one projection and then attempt to write to these streams from another projection?

If so, it is possible that the second projection ignored output to this stream.

I’m adding an option to restart projections (in development) and it will address this problem as well.

-yuriy

Rob_Ashton · March 8, 2013, 3:15pm

Probably not, see the sequential numbers?

It’s most likely I just had the buggy build, I’ll try again in an hour or so now I’ve done a pull.

An option to restart projections would be great for development

My plan for production is to version them and disable projections of a previous version if I don’t need them any more - does that sound reasonable?

Yuri_Solodkyy · March 8, 2013, 3:15pm

I tried

fromCategory()
    .foreachStream()
    .when()
    .outputTo()

and I can see more than just one event handled in different streams (on current dev)

Rob_Ashton · March 8, 2013, 3:16pm

great, I can’t wait to see it for myself

Yuri_Solodkyy · March 8, 2013, 3:21pm

Rob,

I foresee a problem if you run one projection with

outputTo('test1', 'test1-{0}')

then disable it and run another projection with the same

outputTo('test1', 'test1-{0}')

At least with the current implementation, it will either fail or not write any results until the new projection reaches the same position as the disabled one. (It fails if the second projection reads from an incompatible source.)

Restarting will make sure that we ignore any contents of any stream written by the previous version of the projection.

-yuriy

Rob_Ashton · March 8, 2013, 3:26pm

So I’d have to version the output streams too - that sounds reasonable.

I’m thinking algorithm improvements and such for outputs over time.
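A minimal sketch of what versioning the output streams could look like, assuming the version is simply baked into both arguments passed to outputTo. The helper name versionedOutput, the field names, and the "_v" naming scheme are hypothetical; nothing here is part of the projections API.

```javascript
// Hypothetical helper (not part of Event Store): derives versioned names
// for the two outputTo arguments so that a new version of a projection
// never writes into streams created by an older version.
function versionedOutput(baseName, version) {
  // The "_v<number>" suffix is an arbitrary naming assumption.
  const name = baseName + '_v' + version;
  return {
    name: name,                  // first outputTo argument
    namePattern: name + '-{0}'   // second outputTo argument, {0} kept as-is
  };
}

const v1 = versionedOutput('totaleventsbylanguage', 1);
const v2 = versionedOutput('totaleventsbylanguage', 2);
console.log(v1.name, v1.namePattern);
console.log(v2.name, v2.namePattern);
```

Inside a projection the two values would then be passed along as outputTo(v2.name, v2.namePattern), so version 2 gets its own set of output streams and the old ones can be left alone.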

Rob_Ashton · March 8, 2013, 5:08pm

This really isn’t working for me. I’ve pulled the latest dev and written the following two projections:

fromStream('github')
    .when({
        PushEvent: function(s, e) {
            linkTo('pelanguage-' + e.body.repo.language, e)
        }
    })

And

fromCategory('pelanguage')
    .foreachStream()
    .when({
        $init: function() { return { count: 0 } },
        PushEvent: function(s, e) {
            s.count++
        }
    }).outputTo('totaleventsbylanguage', 'totaleventsbylanguage-{0}')

The result is that I end up with a load of streams named pelanguage-LANGUAGE, and the fromCategory projection does absolutely nothing at all.

What am I doing wrong?
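For reference, the result these two projections are meant to produce can be sketched in plain JavaScript, outside Event Store. This is only a simulation under stated assumptions: linkTo, fromCategory, foreachStream and outputTo are imitated with a Map, the function names partitionByLanguage and countPerStream are made up, and the sample events (including the eventType field) are invented for illustration.

```javascript
// Plain-JavaScript simulation of the intended behaviour.
// This is NOT the Event Store API; the sample data is made up.

// Step 1: imitate linkTo('pelanguage-' + language, e) by routing every
// PushEvent into a per-language stream.
function partitionByLanguage(events) {
  const streams = new Map();
  for (const e of events) {
    if (e.eventType !== 'PushEvent') continue; // other event types are ignored
    const name = 'pelanguage-' + e.body.repo.language;
    if (!streams.has(name)) streams.set(name, []);
    streams.get(name).push(e);
  }
  return streams;
}

// Step 2: imitate foreachStream().when({...}) by folding each stream
// separately, starting from a fresh $init state per stream.
function countPerStream(streams) {
  const handlers = {
    $init: function () { return { count: 0 }; },
    PushEvent: function (s, e) { s.count++; return s; }
  };
  const results = {};
  for (const [name, events] of streams) {
    let state = handlers.$init();
    for (const e of events) {
      const handler = handlers[e.eventType];
      if (handler) state = handler(state, e);
    }
    results[name] = state;
  }
  return results;
}

// Invented sample events.
const sample = [
  { eventType: 'PushEvent', body: { repo: { language: 'JavaScript' } } },
  { eventType: 'PushEvent', body: { repo: { language: 'C#' } } },
  { eventType: 'WatchEvent', body: { repo: { language: 'Ruby' } } },
  { eventType: 'PushEvent', body: { repo: { language: 'JavaScript' } } }
];

const totals = countPerStream(partitionByLanguage(sample));
console.log(totals);
```

With the sample events, the pelanguage-JavaScript stream ends with a count of 2 and the pelanguage-C# stream with a count of 1, which is the kind of per-stream total the second projection should be writing out (rather than a count of 1 per stream, or nothing at all).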

Rob_Ashton · March 8, 2013, 5:09pm

This is on a completely fresh set of data, I rm -rfed the entire data directory and logs directory before running this isolated test.

Rob_Ashton · March 8, 2013, 5:11pm

I’m confused though, I appear to have some artifacts left over from the last test - is there something else I have to delete other than the data directory?

Rob_Ashton · March 8, 2013, 5:14pm

There are warnings appearing in the log due to faulty de-serialisations, is this relevant?