"Multiple projections emitting to the same stream detected" after recreating a projection

For some reason I naively thought that deleting and recreating a projection would solve a problem I had in my application. Now I have two problems… :frowning:

The projection is meant to project from a large stream of events (“hub”) to a per-aggregate instance stream. The original projection was not created with “track emitted”, so “delete emitted” would not have helped even if I had known about it. Regardless, I am now in a state where I cannot recreate the projection — or rather, a state in which I cannot enable the recreated projection.

I would like to follow this advice and attempt to DELETE the streams that were created by the projection. However, I am unable to easily list all the streams that exist. I wrote a script to list all streams from $streams, but that appears to list all events from all streams, and after deleting a page of streams I now appear to have orphaned events (they are listed in $streams and have a streamId, but that stream does not exist). Perhaps they will disappear if I scavenge?

Based on an off-the-cuff comment in one of the various posts I have read, I believe the “Multiple projections…detected” error arises because the projection checks the head event on a target stream, compares a projection ID, and raises that message.

  • If I were able to publish an event to a stream with a projectionId that matches the projection, would that trick the validation logic for enabling the projection? What shape should that event take? (Although I will still have to deal with the issue of listing all streams.)

  • How can I see which projection a given stream/event is associated with? I cannot see anything in the “rich” JSON of a stream. Likewise, I am unable to see any projectionId in the metadata about the projection.

  • Is there a way I can change the projectionId of my new projection to a desired value?

  • Is there an easier way to list all streams than iterating over all the events in $streams? I have several orders of magnitude fewer streams than events, so processing each event seems excessive.

  • In deleting a stream, should I soft delete or hard delete? (The hard delete documentation indicates that the stream cannot be recreated and indeed that it does not get scavenged - is that a mistake?) Will a soft delete be enough, given that the events aren’t deleted?

  • Is there a better way to fix the mess I have got myself into?

Sincerely

Pete

Hi Pete,

Answers inline:

  • If I were able to publish an event to a stream with a projectionId that matches the projection, would that trick the validation logic for enabling the projection? What shape should that event take?

A projection writes a bit of metadata with each event it writes. The metadata looks something like this:

```json
{
  "$v": "1:-1:1:3",
  "$c": 2073612,
  "$p": 2073612,
  "$causedBy": "5b65b64d-705d-4070-9a32-16bce3659934"
}
```

The value that is checked against is the $v (version) parameter, which contains the projection id, epoch id, and projection version.

It would be possible to trick the projection into thinking that it is able to write to a stream by writing an event with the same metadata that the projection expects, but deleting the stream would be easier and much more reliable.

If you were to try this, I would recommend that you emit an event from your projection first to a test stream and use that metadata as a template rather than trying to build your own.
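To make the tag a little more concrete, here is a rough Python sketch of splitting it apart. The field mapping follows the description above (projection id, epoch, version); the third field is not covered by that description, so it is left unnamed here:

```python
# Sketch: split the $v tag from projection-emitted event metadata.
# Field meanings are inferred from the explanation above; the third
# field's purpose is not documented here, so it is labelled "unknown".
import json

def parse_version_tag(metadata_json: str) -> dict:
    """Split the $v tag of a projection-emitted event into its parts."""
    meta = json.loads(metadata_json)
    parts = meta["$v"].split(":")
    return {
        "projection_id": int(parts[0]),
        "epoch": int(parts[1]),     # -1 in the sample above
        "unknown": int(parts[2]),   # undocumented third field
        "version": int(parts[3]),
    }

sample = ('{"$v": "1:-1:1:3", "$c": 2073612, "$p": 2073612, '
          '"$causedBy": "5b65b64d-705d-4070-9a32-16bce3659934"}')
print(parse_version_tag(sample))
```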

  • How can I see which projection a given stream/event is associated with? I cannot see anything in the “rich” JSON of a stream. Likewise, I am unable to see any projectionId in the metadata about the projection.

You can see this in the UI as “Link Metadata”, or by viewing the stream with rich JSON and resolve-link-tos disabled. For example:

```shell
curl -i "http://localhost:2113/streams/%24streams?embed=TryHarder" -H "accept:application/json" -H "ES-ResolveLinkTos:false" -u admin:changeit
```

  • Is there a way I can change the projectionId of my new projection to a desired value?

No. The projection id is determined by the projection’s position in the $projections-$all stream.

  • Is there an easier way to list all streams than iterating over all the events in $streams? I have several orders of magnitude fewer streams than events, so processing each event seems excessive.

$streams only contains the first event of every stream. You can list all these events and grab the stream id from each of them to build a list of streams.
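As a sketch of that: each event in $streams is a link of the form `<eventNumber>@<streamName>`, so the stream name can be recovered by splitting on the first `@`. The sample titles below are illustrative:

```python
# Sketch: derive stream names from $streams link events, which take
# the form "<eventNumber>@<streamName>". Splitting on the first '@'
# keeps stream names that themselves contain '@' intact.

def stream_name_from_link(link: str) -> str:
    """'0@user-123' -> 'user-123'"""
    return link.split("@", 1)[1]

titles = ["0@user-123", "0@order-42", "0@hub"]   # illustrative sample page
streams = [stream_name_from_link(t) for t in titles]
print(streams)   # ['user-123', 'order-42', 'hub']
```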

  • In deleting a stream, should I soft delete or hard delete? (The hard delete documentation indicates that the stream cannot be recreated and indeed that it does not get scavenged - is that a mistake?). Will soft delete be enough given that the events aren’t deleted?

Always soft delete unless you really do not want to write to that stream ever again. A hard delete effectively locks down a stream and prevents you from writing to it.

A soft delete is sufficient: while you can still see the events in $all, Event Store recognises that the stream is deleted.

The deleted events will be removed during a scavenge when they are eligible.
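For reference, over the HTTP API the only difference between the two is the ES-HardDelete header. Here is a Python sketch that builds (but does not send) both requests; the base URL and stream name are illustrative:

```python
# Sketch: construct soft- and hard-delete requests for a stream over
# the HTTP API. Omitting the ES-HardDelete header gives a soft delete;
# setting it to "true" gives the irreversible hard delete.
from urllib.request import Request

def delete_request(base_url: str, stream: str, hard: bool = False) -> Request:
    req = Request(f"{base_url}/streams/{stream}", method="DELETE")
    if hard:
        req.add_header("ES-HardDelete", "true")  # irreversible!
    return req

soft = delete_request("http://localhost:2113", "user-123")
hard = delete_request("http://localhost:2113", "user-123", hard=True)
print(soft.get_method(), hard.get_header("Es-harddelete"))
```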

  • Is there a better way to fix the mess I have got myself into?

You are on the right track - if you can find all the streams your projection emitted to and soft delete them, then you would have done precisely what happens internally when you select “delete emitted streams.”

Haley,
Thanks for your detailed reply - it really helps… I’ll double down on the delete approach. I’m putting together some reusable PowerShell pipeline functions so that I can do things like this:

```powershell
With-AllStreamIds | ?{ $_ -match 'StreamName' } | With-Stream | Delete
```

And that is useful in the longer term for us… Although, 10,000 pages in, it’s still churning away…!

While this digs me out of a hole for now on my development box, I won’t be able to do this on our production system: once events are projected from the hub into their respective aggregate streams, they can be scavenged (we set max-age to a week), so recreating emitted streams will not be possible. (No fear, our project hasn’t gone live yet, and it’s just as well I discovered this issue early.) I’m going to explore the mechanism you suggested for emitting a single test event and copying that into the existing streams.

Thanks again! I’ll update here…

Pete

Hi Haley,
Just to follow up a little more about the $v metadata and how to construct it. I have managed to fix a projection by writing a dummy event with the correct $v value; the projection then resets and starts. So now I’d like to explore how to construct the correct $v value. While I can write a test event from a projection to another stream, that does mean changing the projection definition and is a little convoluted - especially to automate. I’m really aiming for a script that, given a projection name and a stream regex, can send the dummy event to all streams matching the regex. So I would really like to construct the $v based on data available from existing streams.
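For anyone curious, here is a sketch of the kind of dummy-event body I am writing (the event type name is something I made up, and the $v value is copied from a real emitted event):

```python
# Sketch: build the body for POST /streams/<stream> with
# Content-Type: application/vnd.eventstore.events+json. The metadata
# carries a $v tag copied from an event the projection actually emitted.
import json
import uuid

def dummy_event_body(version_tag: str) -> str:
    """JSON array of one event, in the events media-type shape."""
    return json.dumps([{
        "eventId": str(uuid.uuid4()),
        "eventType": "projection-reset-marker",   # illustrative name
        "data": {},
        "metadata": {"$v": version_tag},          # copied from a real emitted event
    }])

body = dummy_event_body("1:-1:1:3")
print(body)
```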

I see that I can get the epoch and version of a projection from the ‘projections/any’ endpoint - however, that information does not contain the projectionId. I can see that there is a “create-and-prepare” event in the $all stream that does contain projectionId, epoch, and version properties - that event lives on a stream called $projections-$ - and eventually that event seems to get pushed off that stream (and even if it didn’t, I would have a new problem: figuring out how to get that guid).

I wonder if you are aware of any documentation that lists all the internal streams that come into play that contain the information about a projection?

Lastly, the $v value seems to have four values a:b:c:d - from your explanation, “a” is the projectionId and “d” is the version. I think that “b” is the epoch value (as described in /projections/any) - but what is the “c” value?

I appreciate your (or anyone else’s) input to this problem and I promise to share the scripts that I manage to cobble together…

Sincerely

Pete

Hi Pete,

Before I answer your questions, I would first like to understand why you want to go this route.

If your projection is not live yet, could you not simply use the TrackEmittedStreams option when you create the projection?

Apart from that, having a max age set on a stream does not prevent you from deleting the stream as well.

Even if the stream was scavenged, the last event in the stream always remains so that you are still able to perform such actions on the stream.

The reason details such as the projection streams and version tags are not documented is that they are internal to Event Store.

While I understand that knowing what they are and how they are created is useful when you find yourself in a specific situation and need a quick fix, we would prefer to not have people rely on them.

This is because they can change without warning, and we need to be free to change such things if necessary without having to worry about breaking functionality for people depending on them.

Hi Haley-Jean,
OK - so your answer prompted us to run a 5-whys internally, and we’ve come to the conclusion that the root cause of this problem is that our application architecture uses GES in an incorrect way and that we need to restructure. We are relying on GES projections to create events of record, and that is probably a bad thing.

Here is our pattern:

  • We realize that GES is not “transactional” when writing to multiple streams, so all application events are written to a single stream: “hub”. If a business process needs to update multiple aggregates, we can save those events in a single save operation and it’s all-or-nothing.

  • Having a single stream retaining all events for all time is probably not a good idea, so we use projections to emit events from the “hub” into aggregate-type streams. The events are marked as “canonical” and the stream is now the source of record for those events. We no longer need the events on the hub, and they can be scavenged after a reasonable period of time.

  • We also use projections to create indexing streams (linkTo) of each aggregate instance from the canonical stream. These streams are used by the application to populate views of an individual aggregate.

If for any reason a problem occurs with a projection, it will affect the application. For the canonical streams, we cannot drop and recreate them as the source events will have been scavenged.

You are correct, we are still in development at this time, but we are preparing to go live and need to harden our knowledge of the entire system, understand what it will take to administer the system, and know what to do in case of failure. One issue that has cropped up from time to time is someone in the dev team mistakenly dropping their projections and recreating them: it’s not unreasonable to liken them to RDBMS “views” or “persistent indexes” or some kind of transformation lambda that can be recreated. Another cause may be writing to a stream that should only be written to by a projection.

Yes, we should be updating existing projections rather than dropping and recreating them, but the UI makes deleting really easy to do; yes, we should prevent manual writes to a stream that a projection writes to. To date, we have been saying “don’t do that” and re-installing and initializing from scratch. However, that is not a production solution, and I have been trying to figure out how to fix a faulted projection. Another use case may be where we need to manually add a compensating event to a stream that a projection writes to.

Hence all of this research into how to fix the “Multiple projections emitting to the same stream…” message without deleting all the streams that it ever created. (You may have seen me reviving a post that Pieter answered a year ago about the structure of the $v value and how to get projectionId from a projection name).

I absolutely take your points on board about internals of the $v value and you have convinced me to back away! This has prompted internal discussion and we are going to change the GES pattern that we had adopted. Instead, the app will write directly to the canonical aggregate type streams and to the hub after - it may be that we lose the transactional nature of updating multiple aggregate types, but we’ll have to figure out a way to address that. We will probably keep the indexing projections - dropping and recreating those streams will be possible (although the app will be in a strange state until they are recreated).

As an aside (and to put a curious mind at rest), I really would love to understand why GES wants to enforce that projections are the exclusive owners of the streams they create?

Thanks for taking the time to respond…

Pete

“GES wants to enforce that projections are the exclusive owners of the streams they create”

@Hayley

^^ Is it something that may change in the future?

The reason projections exclusively own their streams is that, if they didn’t, they would lose all predictability: we would no longer have any idea what should be in that stream.

An example of this is when a projection starts up from a checkpoint. It first goes through any events after that checkpoint and checks them against the emitted stream.

By doing this, we can tell where the projection got up to last and can continue from where we left off. On top of that, the projection can verify that everything is in order, no events missing etc.

If anyone could write to the emitted streams, then the projection would have no idea where it last got to in its processing: we could no longer trust that the projection itself emitted a given event rather than something else.