Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Should generate valid SCXML state ids #11

Closed
jdrew1303 opened this issue Apr 11, 2018 · 10 comments
Closed

Bug: Should generate valid SCXML state ids #11

jdrew1303 opened this issue Apr 11, 2018 · 10 comments
Assignees

Comments

@jdrew1303
Copy link

jdrew1303 commented Apr 11, 2018

The ids generated in {output: 'scxml'} mode can be invalid. Spaces are not supported in ID tokens. According to the xml specs:

ID tokens must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens (-), underscores (_), colons (:), and periods (.).

Expected Behavior

The generator should take quoted states and camelCase, kebab-case or snake_case them (or any other casing strategy that would meet the requirements). It should also check for invalid chars.

Current Behavior

It generates the ids without any processing or checking for invalid chars. Below is an example of the parse error caused by the incorrect output when the tape example in the README is taken as input.

runkit___npm__state-machine-cat___runkit

Possible Solution

One would be to restrict the input for the names to valid scxml names. This would be to the detriment of the diagrams though. Another options would be to process all names. I dont think the scheme matters too much. For simplicity it may be best to stick to javascript notation (camelCase).

Steps to Reproduce (for bugs)

Below is a simple demo app that can be run to generate a scion state machine from the scxml exported. I test these on runkit (with the blow code prepopulated) to reproduce (https://runkit.com/embed/16wxk3gr6h6w)

const R = require('ramda');

const smcat = require('state-machine-cat');
const scxml = require('scxml');

const scxmlToMachine = (xml) => new Promise((resolve, reject) => {
    scxml.documentStringToModel(null, xml, (err, model) => {
        if (err) return reject(err);
        return resolve(model);
    });
}).then((model) => new Promise((resolve, reject) => {
    model.prepare((err, preparedModel) => {
        if (err) return reject(err);
        return resolve(preparedModel);
    })
})).then(R.constructN(1, scxml.scion.Statechart));

const dslDefinition = `
# yep, notes get rendered here as well
# multiple notes translate into multiple
# lines in notes in the diagram
initial,
"tape player off",
"tape player on" {
  stopped => playing : play;
  playing => stopped : stop;
  playing => paused  : pause;
  paused  => playing : pause;
  paused  => stopped : stop;
};

initial           => "tape player off";
"tape player off" => stopped           : power;
"tape player on"  => "tape player off" : power;
`;

// generate state machine diagram
const svgDiagram = smcat.render(dslDefinition, {outputType: 'svg'});
console.log(svgDiagram);

// generate scxml and a scion state machine to test validity
const scxmlDocument = smcat.render(dslDefinition, {outputType: 'scxml'});
console.log(scxmlDocument)
scxmlToMachine(scxmlDocument).then(console.log.bind(console))
                   .catch(console.warn.bind(console))

Context

Your Environment

PS Thanks for the great project. 👍

@sverweij sverweij self-assigned this Apr 12, 2018
@sverweij
Copy link
Owner

hi @jdrew1303, thanks for reporting this. I was already afraid I'd have to process id's into xml conformity. IIRC there's similar requirements (/ recommendations) for event descriptors - that might need a similar treatment.

Your suggestion to use camelCasing is a very good first one. I'll try to look into it over the weekend.

@sverweij
Copy link
Owner

sverweij commented Apr 12, 2018

hmm - I now see there's quite lot that's not allowed as ID's. I'll probably use a replacement algorithm in the solution - e.g. all map all non-allowed characters => _, flattening diacritics to their base variant (à => a).

(I'll also look if a known mapping solution like punycode would work).

(some notes for implementation)

@jdrew1303
Copy link
Author

jdrew1303 commented Apr 12, 2018

Hey @sverweij,

Thanks for the quick reply. 👍Yah, you're right about the event descriptors too. If there's a space in them they're registering as multiple transition events in 2 of the state machine frameworks Im working with (Apache Commons SCXML for Java and SCION for javascript). 🙈

Punycode would do the trick. The only thing is that the developer would have to be able to recreate the ids and event names on their end too (to trigger them and to switch on the current state). I took a look at the fold-to-ascii repo. It looks like exactly what's needed for a first pass. If you were to that as a first pass and then on the second convert the casing.

There is another option. If the core use case for SMC is to generate diagrams. Then it may be appropriate to just provide a hook for developers to add their own name serializers. They can take the default and force the BAs or whoever to use a naming scheme or they can have a generic transformation that can convert it to their own schema for naming. 🤔It might be overkill but two libs that I use quite a lot for this sort of thing are ware and attach-ware. You end up with express js like middleware that you can plug in.

Either one does the job really.

@sverweij
Copy link
Owner

hi @jdrew1303 I've dug into the latest xml specification and the situation there is a lot less grim. It allows a large charter of unicode: (spec for id which points to spec for name). state-machine-cat still needs to handle spaces (and a few other characters) differently, but it doesn't seem necessary to transform diacritics (🎉 ). (If scion and Apache's SCXML implement this correctly, that is.)

I found xml-name-validator - which looks solid and will be helpful.

@ hooks/ plugins and especially ware & attach-ware are an interesting suggestion - I'll need some time to grok them, though. They might be useful for more than transforming names into conformity...

sverweij added a commit that referenced this issue Apr 13, 2018
sverweij added a commit that referenced this issue Apr 13, 2018
@sverweij sverweij mentioned this issue Apr 13, 2018
11 tasks
@sverweij
Copy link
Owner

sverweij commented Apr 13, 2018

hey @jdrew1303 I've published a beta version of state-machine-cat on npm with the ~~~first~~~ second version of the fix for this issue. If you ~~~npm i state-machine-cat@2.6.3-beta-0~~~ npm i state-machine-cat@2.6.3-beta-1 you can give it a try - let me know what you think!

For what to expect:

A fix for event names will follow in a separate PR.

@jdrew1303
Copy link
Author

Hey @sverweij . Sorry about the delay in replying.

I wasn't expecting the turnaround so quickly on the issue 👍 😄It solves the issues with the IDs from my testing (both with apache commons and scion). Would it be possible to apply the same serializer to the transition event names (would it be better to close this issue off and raise a seperate one)?

@sverweij
Copy link
Owner

@jdrew1303 don't worry I had an unexpected windfall in time to work on it, so it went faster then prognosed.

I've merged the PR to master (which automatically releases to state-machine-cat.js.org) and released to npm (state-machine-cat@2.6.3)

@ event names: We can apply the same serializer to the event names - but the rules for scxml event names are slightly different from those for state ids, so it's probably wise to make a separate conversion thing for it, if any:

  • the xsd-flavored regular expression for events is (\i|\d|\-)+(\.(\i|\d|\-)+)*(scxml-datatypes, search for EventType.datatype); a state id is a simpler affair: \i(\c)* (TBH the difference between \i|\d\- and \c is not very big, and characters you probably won't use anyway: #xB7 | [#x0300-#x036F] | [#x203F-#x2040])
  • spaces (#x20) are meaningful (they separate multiple events) -> we should probably leave spaces in place and convert illegal space-likes characters (tab, linefeed, ...) to #x20 so you can still have > 1 event per transition.
  • ...as are dots within an event (they play a role in event selection) - if scion and apache are as faithful implementations of scxml as I expect, they're using that...

-> What do you think?

@sverweij
Copy link
Owner

@jdrew1303 - I'v published state-machine-cat@2.6.4-beta-0 to npm with a throw at generating valid scxml event descriptors. Give it a spin and let me know if it works for you.

Details & rationale here.

Main takeaways:

  • similar translation as state id's - but with the characters allowed for event descriptors.
  • end of lines (\n) separate multiple events. Rationale:
    • keeping spaces as event separators would hamper readability of diagrams
    • end of lines as separators for multiple events look quite natural
    • even though in the scxml's I've seen multiple events per event attribute are not that common we still need event separators to keep the same expressivity as scxml

example:

meow -> eat: 
  human provides food
  human leaves food on table;

screen shot 2018-04-18 at 21 54 06

<?xml version="1.0" encoding="UTF-8"?>
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0">
    <state id="meow">
        <transition event="human_provides_food human_leaves_food_on_table" target="eat"/>
    </state>
    <state id="eat">
    </state>
</scxml>

@jdrew1303
Copy link
Author

Hey @sverweij . Sorry I've been swamped for the past week or so.

The use of \n keeps the scm definitions easy to read. Nice call 👍The changes fix most of the issues I've encountered so far.

I'm going through the spec in a lot more detail at the moment for a piece Im working on. It'll exercise a lot more of the API surface. I'll try any that make sense on smc as Im going through. I think the naming issues are going to come up in a few more sections (conditions, entry and exit states) but I'll check with the engines also to see what they support.

I've been working with XML quite a bit over the past week or so and it's a pain to work with in JS. I'm not sure if it's of any use with helping structuring the SCXML generation for SMC, it'll depend on how far down the rabbit hole you go 😛:

I've said it before but thanks for the quick turn around and a great little lib 👍

@sverweij
Copy link
Owner

I'll publish the PR tonight to npm as latest (and update the online interpreter) and close this issue. If you encounter any other issues - let me know in a new issue, and I'll see how I can help out.

@ other sections of SCXML - I've checked these when working on state ids & event descriptors. My conclusion was that 'a boolean expression' (for cond) and 'executable content' (for onentry and onexit) can be taken as "anything goes" as far as state-machine-cat's scope is considered; it depends on the target language anyway. I'm glad to be corrected though - my specification exegesis skills have gotten rusty.

@ XML in js: I agree it's a pain. The SCXML exporting in smc is mostly javascript objects juggling. Only at the last possible moment it serialises to XML with two simple handlebars templates.

Thanks for the links to your gists. Using jsx seems like an interesting approach as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants