Day 3.5 of the evented Github adventure

So, I spent all day reviewing rubbish games on ludumdare.com and am going to get back on the horse for a few hours before hitting bed.

I’m wondering about house-keeping: we’ve already discussed that I can throw my projection code in a folder and curl the lot up to an event store when testing, but what do I do about all my test data?

My test data comes from GitHub: I keep ploughing a few hours’ data into the event store, then realising I’ve made a mess of the projections. I can’t really delete the projections (I can mark them as deleted of course, but that means all my lovely names are buggered).

How best to do this sort of thing? It’s clearly going to be a scenario that teams would hit too…

  1. Have a script that can take my main event stream(s), such as /streams/github, and plough it from a “buggered” event store into a clean and fresh one (see the sketch after this list)

  2. Keep a secondary store of all those events and have a script to plough them into an event store whenever I start fresh dev (would prefer not)

  3. Just version my projections and deal with the fact that I’m going to be deleting them over and over again until I get them right
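For what it’s worth, option 1 doesn’t need to be much of a script. Here’s a minimal sketch in Node (a recent version with built-in fetch), assuming both stores are on localhost with made-up ports 2113 and 2114, and assuming the JSON media types and payload shapes documented for later EventStore builds; the exact wire format on the build discussed here may well differ:

const { randomUUID } = require('crypto');

const SOURCE = 'http://127.0.0.1:2113';   // the "buggered" store
const TARGET = 'http://127.0.0.1:2114';   // the clean, fresh store
const STREAM = 'github';
const PAGE = 20;

// Read one page of the stream with event bodies embedded
async function readPage(from) {
    const res = await fetch(SOURCE + '/streams/' + STREAM + '/' + from + '/forward/' + PAGE + '?embed=body', {
        headers: { Accept: 'application/vnd.eventstore.atom+json' }
    });
    const feed = await res.json();
    // feed pages list entries newest-first, so sort them back into event order
    return feed.entries.sort((a, b) => a.eventNumber - b.eventNumber);
}

// Re-post a page of events into the same-named stream on the target store
async function writePage(entries) {
    const events = entries.map(e => ({
        eventId: randomUUID(),        // fresh ids for the copies
        eventType: e.eventType,
        data: JSON.parse(e.data)      // embed=body returns the body as a JSON string
    }));
    await fetch(TARGET + '/streams/' + STREAM, {
        method: 'POST',
        headers: { 'Content-Type': 'application/vnd.eventstore.events+json' },
        body: JSON.stringify(events)
    });
}

(async () => {
    for (let from = 0; ; from += PAGE) {
        const entries = await readPage(from);
        if (entries.length === 0) break;
        await writePage(entries);
        console.log('copied events ' + from + '..' + (from + entries.length - 1));
    }
})();

Point that at a freshly wiped target store and you get the clean slate without touching the original data.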

I can understand that there isn’t really such a thing as ‘deleting’ a projection - because they can have side effects such as stream/projection creation and that would be a PITA to manage (because a clean-up would involve deleting all of that too and there goes all the immutability).

Just curious how you plan/advise on handling this sort of thing in the real world

Also, I’m now in the process of getting Mono onto my production server and have a question.

You mentioned max-age or whatever; is it possible, over an entire stream, to say “Don’t keep data that is more than ‘x’ days old”?

I don’t really want to have to fork over for storage when actually I’m just building a fun little stats site :)

Hi Rob,

That’s exactly what $maxAge is for. When you create a stream, you can provide $maxAge, and EventStore will ensure it never returns events from that stream that are older than the number of seconds you specify. To physically remove that data, you will need to run scavenging occasionally (by cron or by hand) so the storage can actually be compacted.

You can also specify $maxCount (even simultaneously with $maxAge) – the maximum number of events that are allowed to be in the stream. If both $maxAge and $maxCount are specified, the stricter restriction wins, obviously.
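To make the unit concrete (it is seconds, as above), the metadata for a stream that should keep roughly a month of data and never more than a million events could look like the following object; the values here are purely illustrative, not anything from the thread:

{
    "$maxAge": 2592000,    // 30 * 24 * 3600 seconds = 30 days
    "$maxCount": 1000000   // and never more than a million events; the stricter limit wins
}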

Rob,

Regarding the development process with projections.

  1. There is a hidden option aimed at helping a little with projection debugging.

options({
    $forceProjectionName: 'NAME'
});

It is not for production use and it may stop working, but for now, if you specify this option, the projection starts behaving as if it were named differently. It changes all the names of the internal and published streams, so effectively you can restart a projection while keeping the list of projections short. I know it is not what you are looking for. (A sketch of how it might be used follows after this list.)

  2. We discussed adding a DESTROY stream option which would let us add a debug feature to completely remove a projection. As you already noted, it is quite a dangerous option, but it may enable some development scenarios. Probably this option should be enabled in configuration first.

  3. Any other ideas?
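For reference, the option from point 1 just sits at the top of an ordinary projection body. A sketch of how it might look, where the stream name, event type, state shape and the ev.body property carrying the parsed JSON payload are all assumptions for the example rather than anything from this thread:

options({
    $forceProjectionName: 'commits-per-repo-v7'    // bump the name to effectively restart the projection
});

fromStream('github')
    .when({
        $init: function () {
            return { commits: 0 };                  // starting state
        },
        PushEvent: function (state, ev) {           // hypothetical GitHub event type
            state.commits += ev.body.payload.size;  // assumes the parsed body is exposed as ev.body
            return state;
        }
    });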

-yuriy

Rob, let me explain why we tombstone (don’t let you recreate a stream). Most people would be using HTTP on top, and allowing recreation could completely and totally break HTTP caching, since feed pages are cached on the assumption that a stream’s history never changes. We have discussed adding a flag to kill the tombstone as well, but it has some really nasty side effects some people may not immediately see.

I would find option 2 very useful. Certainly it should be a debug-only, “opt-in” thing, and I’m sure people will abuse it, but it would make debugging and experimenting a lot simpler.

Yep, there is a card for it… it’s a bit trickier than that but not too bad.

Yeah, I understand that, which is why I mentioned immutability.

Having DESTROY would be really useful in dev, but you know that people would abuse it in the end and that would cause support issues.

Perhaps option 2 is the sanest for now: just run two event stores, feed the second off the first and use that for testing projections, knowing that if I restart it, it’ll be okay.

Perhaps that’s an answer: rather than attempt a clean-up of a projection’s side effects, you have the ability to delete everything except explicitly mentioned streams.

In my case, that would be “delete everything except the main GitHub stream”, then “re-post all projections”.

Sounds a bit dumb on writing it down, but the dev story does need to be a bit friendlier and that would do it

How big is the GitHub stream?

Does this work for implicitly created streams? I guess not - so I should be creating my GitHub stream explicitly and setting a $maxAge; I haven’t seen the docs for that. I see there is a C# API for it, so I’ll have a look once I’m out of bed!

Not huge, about 5000 events per hour tops, hence why having a second store of some sort wouldn’t be too painful

Rob, re: $maxAge. Yes, you can specify that only on an explicitly created stream. To specify $maxAge, pass metadata on the StreamCreate operation as a JSON object serialized to a UTF-8 string: { "$maxAge": 3600 } (for a one-hour expiration).

As for repopulating your DB from a second instance: another possible solution during development would be to create an instance of EventStore with only the GitHub data, then copy it into another folder and start a second EventStore instance from that directory. Your development would happen in the second instance. Once you want to start from a clean slate, you just overwrite the development instance’s DB folder with the one from the instance that holds only the GitHub data. That’s just xcopy. Maybe not the nicest solution, but the easiest for sure.

Yeah, that’s pretty much where I’m at - pulling GitHub data in constantly and then copying the folder to do work with - it’s not ideal but it’ll do.

I should be explicitly creating streams before I do a linkTo then too, I guess - that’ll be a bit annoying for my per-hour streams.
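For context, those per-hour streams would typically come out of a linkTo fan-out along these lines; the bucket naming, the GitHub created_at field and the ev.body property holding the parsed payload are assumptions for the example rather than anything from the thread:

fromStream('github')
    .when({
        $any: function (state, ev) {
            // bucket each event into an hourly stream, e.g. github-2012-11-05T14
            var hour = ev.body.created_at.substring(0, 13);
            linkTo('github-' + hour, ev);   // each of these streams gets created implicitly,
                                            // hence the metadata annoyance above
        }
    });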

Yes, we have a note to support metadata on these creates.

Okay, well it’s no bother for now

I can live with a few weeks’ data before restarting and re-clearing everything manually if size becomes an issue.