Day II of the evented GitHub adventure

So, yesterday we left off with the knowledge that

fromAll().whenAny(function(s, e) {
  if(e.body && e.body.repo) {
    var date = new Date(e.body.created_at)
    var dateString = date.getUTCFullYear() + '' + date.getUTCMonth() + '' + date.getUTCDate()
    linkTo('day-' + dateString, e)
  }
})

combined with

fromCategory('day')
  .foreachStream()
  .whenAny(function(s, e) {
    if(e.body && e.body.repo) {
      var date = new Date(e.body.created_at)
      var hour = date.getUTCHours()
      if(typeof s[hour] === 'undefined')
        s[hour] = 0
      s[hour]++
    }
  })

wasn’t going to work, although

fromStream('day-20121112')
  .whenAny(function(s, e) {
    if(e.body && e.body.repo) {
      var date = new Date(e.body.created_at)
      var hour = date.getUTCHours()
      if(typeof s[hour] === 'undefined')
        s[hour] = 0
      s[hour]++
    }
  })

would - so that left me with a sadface as my per-day statistics can’t happen otherwise.

Moving on, I have some more queries to look at, so for now I can ignore that these don’t work and look at how to do

  • Most active users

  • Most active repos

This would ordinarily be done per re-partitioned day-stream, but I’ll have a stab at doing them on the main github event stream instead

I’ll also go on and see if I can do some temporal work, such as

“how long does it usually take before an issue gets closed”

“which projects have their owners respond to new issues fastest”

“which projects have their owners respond to new issues slowest”

I’m not sure how to do any of this, although with some digging around I’m sure I’ll manage. As per yesterday, any tips for the above would be appreciated, as well as “that’s not possible” or “that’s not what you’d use that for”

For issues being closed… Stream per issue. Then, for each issue, correlating the open event to the close event would be an easy way…
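A minimal sketch of that open-to-close correlation, written as plain JavaScript that runs outside the Event Store rather than as a projection. The event shape is an assumption based on the GitHub events feed (`body.type`, `body.payload.action`, `body.payload.issue.number`, `body.repo.name`); `averageTimeToClose` and `issueEvent` are hypothetical names used only for illustration.

```javascript
// Sketch: correlate "opened" and "closed" events per issue.
// Assumed event shape (not confirmed by the thread):
//   e.body.type, e.body.created_at, e.body.repo.name,
//   e.body.payload.action, e.body.payload.issue.number
function averageTimeToClose(events) {
  var openedAt = {};   // issue key -> time the issue was opened (ms)
  var totalMs = 0;
  var closedCount = 0;

  events.forEach(function (e) {
    var body = e.body;
    if (!body || body.type !== 'IssuesEvent' || !body.payload || !body.repo) return;

    // One logical "stream" per issue: repo name plus issue number
    var key = body.repo.name + '#' + body.payload.issue.number;
    var at = new Date(body.created_at).getTime();

    if (body.payload.action === 'opened') {
      openedAt[key] = at;
    } else if (body.payload.action === 'closed' && key in openedAt) {
      totalMs += at - openedAt[key];
      closedCount++;
      delete openedAt[key];
    }
  });

  // null when no issue was both opened and closed in the input
  return closedCount === 0 ? null : totalMs / closedCount;
}

// Hypothetical helper that builds a sample event in the assumed shape
function issueEvent(repo, number, action, at) {
  return {
    body: {
      type: 'IssuesEvent',
      created_at: at,
      repo: { name: repo },
      payload: { action: action, issue: { number: number } }
    }
  };
}

var sampleIssues = [
  issueEvent('a/b', 1, 'opened', '2012-12-13T10:00:00Z'),
  issueEvent('a/b', 2, 'opened', '2012-12-13T11:00:00Z'),
  { body: { type: 'PushEvent', created_at: '2012-12-13T11:30:00Z', repo: { name: 'a/b' } } },
  issueEvent('a/b', 1, 'closed', '2012-12-13T12:00:00Z'),
  issueEvent('a/b', 2, 'closed', '2012-12-13T15:00:00Z')
];

var avgCloseMs = averageTimeToClose(sampleIssues);
console.log(avgCloseMs / 3600000 + ' hours');
```

Issues that were opened before the first event in the input are ignored here, since there is nothing to correlate the close against.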

Btw for most active users and most active projects these are queries where a read model would be better (simple aggregations)

I could even be devilish and build up that read model inside the event store as a projection, but you’d probably tell me off for that
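As a rough illustration of how simple these aggregations are, here is a plain JavaScript sketch of the counting a read model would do. It assumes each event carries `body.actor.login` and `body.repo.name`, as in the GitHub events feed; `countBy`, `topN`, `userOf` and `repoOf` are hypothetical helpers, not part of any Event Store API.

```javascript
// Count events per key; keyOf returns undefined for events to skip.
function countBy(events, keyOf) {
  var counts = {};
  events.forEach(function (e) {
    var key = keyOf(e);
    if (key === undefined) return;
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}

// Return the n largest entries as [key, count] pairs, ties broken by key.
function topN(counts, n) {
  return Object.keys(counts)
    .map(function (k) { return [k, counts[k]]; })
    .sort(function (a, b) { return b[1] - a[1] || (a[0] < b[0] ? -1 : 1); })
    .slice(0, n);
}

// Assumed field names from the GitHub events feed
function userOf(e) { return e.body && e.body.actor ? e.body.actor.login : undefined; }
function repoOf(e) { return e.body && e.body.repo ? e.body.repo.name : undefined; }

var sampleActivity = [
  { body: { actor: { login: 'alice' }, repo: { name: 'a/one' } } },
  { body: { actor: { login: 'alice' }, repo: { name: 'a/two' } } },
  { body: { actor: { login: 'bob' }, repo: { name: 'a/one' } } },
  {} // events without a body are skipped
];

var topUsers = topN(countBy(sampleActivity, userOf), 1);
var topRepos = topN(countBy(sampleActivity, repoOf), 2);
console.log(topUsers, topRepos);
```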

Btw you are using event type there … Try

when({ 'type': function()}) pattern match

Rob,

in any case it is better to use

.when({ IssuesEvent: function

syntax. It runs faster.

-yuriy

Oops, forgot about that :)

Ah, I notice that on the branch ‘projections’ the above

fromCategory('user')
  .foreachStream()

Works as I’d expect it to, so I’m now working off that branch until somebody tells me that it’s the wrong thing to do or that dev/master have caught up

Or it doesn’t work as expected, it just works “differently”

I’ve found threads in the mailing list where people have been using the fromCategory stuff with apparent success (Nov 21st), although it was called fromEachStream rather than foreachStream back then

I can’t do any of the projections or charts I want to do in the current state of things, so Imma gonna go away and write some other JavaScript until this works

Can you be more clear about what is not working as you expect?

Well, I expect that

fromAll().whenAny(function(s, e) {
  if(e.body && e.body.repo) {
    var date = new Date(e.body.created_at)
    var dateString = date.getUTCFullYear() + '' + date.getUTCMonth() + '' + date.getUTCDate()
    linkTo('day-' + dateString, e)
  }
})

Will create a stream for each day containing links to the original events, so

/stream/day-20121113

(Yes, it’s an 11 because that’s December in JavaScript, where getUTCMonth() counts from 0)

This works as expected
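A short sketch of what that date key does in practice. `dayKeyAsWritten` reproduces the concatenation used in the projection; `dayKeyPadded` and `pad2` are hypothetical alternatives, not something from the thread. Because the key is unpadded, two different days can also produce the same key, which a zero-padded, one-based month avoids.

```javascript
// The key exactly as built in the projection: zero-based month, no padding.
function dayKeyAsWritten(date) {
  return date.getUTCFullYear() + '' + date.getUTCMonth() + '' + date.getUTCDate();
}

// Hypothetical alternative: one-based month, two digits for month and day.
function pad2(n) {
  return (n < 10 ? '0' : '') + n;
}

function dayKeyPadded(date) {
  return date.getUTCFullYear() + pad2(date.getUTCMonth() + 1) + pad2(date.getUTCDate());
}

var dec13 = new Date('2012-12-13T14:00:00Z');
var feb11 = new Date('2012-02-11T00:00:00Z');
var dec01 = new Date('2012-12-01T00:00:00Z');

console.log(dayKeyAsWritten(dec13)); // 20121113
console.log(dayKeyPadded(dec13));    // 20121213

// Unpadded keys collide: 11 February and 1 December both give 2012111
console.log(dayKeyAsWritten(feb11) === dayKeyAsWritten(dec01)); // true
console.log(dayKeyPadded(feb11) === dayKeyPadded(dec01));       // false
```

Switching to the padded form would change the stream names, so `day-20121113` in the rest of this thread would become `day-20121213`.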

I then expect if I want to build a projection for each of these new partitions I can write

fromCategory('day')
  .foreachStream()
  .whenAny(function(s, e) {
    if(e.body && e.body.repo) {
      var date = new Date(e.body.created_at)
      var hour = date.getUTCHours()
      if(typeof s[hour] === 'undefined')
        s[hour] = 0
      s[hour]++
    }
  })

And this should mean I can go to

/projection/events_per_hour/state?partition=day-20121113

and see the state for that partition, however looking at the stats it doesn’t even bother running the code in the projection above

The code in the projection itself is fine, as if I do something like

fromStream('day-20121112')
  .whenAny(function(s, e) {
    if(e.body && e.body.repo) {
      var date = new Date(e.body.created_at)
      var hour = date.getUTCHours()
      if(typeof s[hour] === 'undefined')
        s[hour] = 0
      s[hour]++
    }
  })

I get a nice JSON that looks like

{ 9: 1000, 10: 2000, 11: 2500, etc }
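One way to check the handler logic independently of the Event Store is to fold it over a few hand-made events in plain JavaScript. This is only a sketch: `fold` is a hypothetical stand-in for what a single-stream projection does with its state, and the sample events carry just the fields the handler reads.

```javascript
// The same handler body as in the projection above.
function eventsPerHour(s, e) {
  if (e.body && e.body.repo) {
    var date = new Date(e.body.created_at);
    var hour = date.getUTCHours();
    if (typeof s[hour] === 'undefined')
      s[hour] = 0;
    s[hour]++;
  }
}

// Hypothetical stand-in for a single-stream projection:
// start from an empty state and let the handler mutate it per event.
function fold(events, handler) {
  var state = {};
  events.forEach(function (e) { handler(state, e); });
  return state;
}

var sampleDay = [
  { body: { repo: { name: 'a/one' }, created_at: '2012-12-12T09:05:00Z' } },
  { body: { repo: { name: 'a/two' }, created_at: '2012-12-12T09:40:00Z' } },
  { body: { repo: { name: 'a/one' }, created_at: '2012-12-12T10:15:00Z' } },
  { body: { created_at: '2012-12-12T10:20:00Z' } } // no repo, so not counted
];

var hourly = fold(sampleDay, eventsPerHour);
console.log(JSON.stringify(hourly)); // {"9":2,"10":1}
```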

So either I massively misunderstand how fromCategory/etc works, in which case I’d like a pointer on how to do the above properly, or it’s broken, or it’s somewhere in the middle.

Yuriy alluded to this yesterday, but I wasn’t quite clear what the conclusion was

Rob,

there is a problem with link-on-link which is a little hard to solve quickly. However, I’m almost ready with another feature which will help in similar scenarios. See the sample below. The idea is that you can partition state by any rule (not only by stream). In this case you can apply your projections directly to fromAll() partitioned by date.

We’ll discuss how to approach the link-on-link issue as well.

fromCategory("account").partitionBy(
  function(e) {
    return e.body.kind;
  }).when({
    $init: function() {
      return {
        count: 0,
        balance: 0,
      }
    },

    "AccountDebited": function (s, e) {
      s.count++;
      s.balance += e.body.debitedAmount;
      return s;
    },
    "AccountCredited": function (s, e) {
      s.count++;
      s.balance -= e.body.creditedAmount;
      return s;
    },
  });

The code for this feature is in the foreachBy branch for now.

–yuriy

This will be great, thanks!

So this says there is a state per kind, if I read it correctly… I’m thinking how else this can be used

fromAll()
  .partitionBy(function(e) {
    if(e.body && e.body.repo) {
      var date = new Date(e.body.created_at)
      return date.getUTCFullYear() + '' + date.getUTCMonth() + '' + date.getUTCDate()
    }
  })
  .when({
    "PushEvent": function(s, e) {
      var date = new Date(e.body.created_at)
      var hour = date.getUTCHours()
      if(typeof s[hour] === 'undefined')
        s[hour] = 0
      s[hour]++
    }
  })

Works a charm

projection/hourly_totals_per_day/state?partition=20121113

{"14":125,"17":29}

(I've not been running the event-pumping service as I've been disconnected)
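To make the "state per partition" idea concrete, here is a plain JavaScript simulation of partitioned state, kept entirely outside the Event Store. It is a sketch under assumptions: `runPartitioned` is a hypothetical helper, the `eventType` field used for handler dispatch is assumed, and skipping events whose key function returns nothing is a choice made for this simulation, not documented behaviour.

```javascript
// Simulate per-partition state: one state object per key,
// with handlers looked up by event type.
function runPartitioned(events, keyOf, handlers) {
  var states = {};
  events.forEach(function (e) {
    var key = keyOf(e);
    if (key === undefined) return;        // simulation choice: skip unkeyed events
    var handler = handlers[e.eventType];  // assumed field name for the event type
    if (!handler) return;
    if (!(key in states)) states[key] = {};
    // Accept both handlers that return the state and handlers that mutate it
    var result = handler(states[key], e);
    if (result !== undefined) states[key] = result;
  });
  return states;
}

// Same key as in the thread: zero-based month, no padding
function dayOf(e) {
  if (e.body && e.body.repo) {
    var date = new Date(e.body.created_at);
    return date.getUTCFullYear() + '' + date.getUTCMonth() + '' + date.getUTCDate();
  }
}

var pushHandlers = {
  PushEvent: function (s, e) {
    var hour = new Date(e.body.created_at).getUTCHours();
    if (typeof s[hour] === 'undefined')
      s[hour] = 0;
    s[hour]++;
  }
};

function push(at) {
  return { eventType: 'PushEvent', body: { repo: { name: 'a/one' }, created_at: at } };
}

var partitioned = runPartitioned([
  push('2012-12-13T14:01:00Z'),
  push('2012-12-13T14:30:00Z'),
  push('2012-12-13T17:10:00Z'),
  { eventType: 'WatchEvent', body: { repo: { name: 'a/one' }, created_at: '2012-12-13T17:20:00Z' } },
  push('2012-12-14T08:00:00Z')
], dayOf, pushHandlers);

console.log(JSON.stringify(partitioned['20121113'])); // {"14":2,"17":1}
```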


I imagine this is a lot less efficient than partitioning by day and then applying several projections against that re-partitioned stream, but it'll be easy enough to port across if link-on-link stuff can be made to work

Why would I be getting this?


Attempt to relock the '20121113' partition state locked at the '25766014/25765164' position at the earlier position '25766014/25765164'

This is a bug in getting state, but it is already fixed in dev. It only happens when you try to retrieve state which is currently being processed.

Rob,

>>I imagine this is a lot less efficient than partitioning by day and then applying several projections against that re-partitioned stream, but it'll be easy enough to port across if link-on-link stuff can be made to work
It may be true or not. Your query can actually run using an event type index (if enabled with an options statement; it will be enabled by default later) to get PushEvents.
However, more important is that once a projection catches up with the head of the transaction file, it runs by directly receiving events from the transaction file reader. So, if you run projections continuously, you need to compare the time required to run the JS function with the time/space required to write link events and then resolve them when reading them back. I cannot say what wins without measuring.


-yuriy

Good to know, thanks

Interesting, I’ll carry on doing it this way then - at some point I’ll dig into how this stuff actually works (I’ve not yet braved the C# codebase, I’ve heard it’s interesting)

Seems we’re going to end up with several different ways of processing and re-partitioning streams - not a bad thing at all