ytplayer? (issues with live stream) #1

Open
vgoklani opened this issue Jan 13, 2020 · 14 comments
@vgoklani

vgoklani commented Jan 13, 2020

Hey there, thanks for releasing this!

I followed your instructions but got this error:

VM26770:4 Uncaught ReferenceError: ytplayer is not defined

Where does this get defined?

@benwiley4000
Owner

@vgoklani in which context are you running the script? You should be on the page of a YouTube video (e.g. https://www.youtube.com/watch?v=OraxqbUjpHw) and if you navigated to that video from another page, I would refresh the page to make sure the JavaScript globals refer to the video you chose (apparently the old globals stick around otherwise).

Next you should open the developer tools JavaScript console and from there I would just follow the instructions in the readme... paste in the script, run the function to save the file(s), and it should just work. I just tried it. Let me know if anything else seems unclear. And let me know if you think anything in the readme should be updated.
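If the ReferenceError persists, it may help to check whether the page exposes the global at all before pasting the script. This is only a minimal sketch: hasPlayerResponse is a hypothetical helper name, and the config.args.player_response path is an assumption taken from the live-stream function posted later in this thread, so it may not match every version of the YouTube page.

```javascript
// Hypothetical helper: reports whether the given global object exposes
// ytplayer.config.args.player_response as a string.
// In the browser console, pass `window`.
function hasPlayerResponse(globalObject) {
  var player = globalObject.ytplayer;
  return Boolean(
    player &&
      player.config &&
      player.config.args &&
      typeof player.config.args.player_response === 'string'
  );
}

// In the console: hasPlayerResponse(window)
```

If this returns false on a video page, refreshing the page as described above is the first thing to try.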

Thanks!

@vgoklani
Author

Thanks for the response @benwiley4000 !

I wasn't able to get it to work for this video since it's a live stream.

https://www.youtube.com/watch?v=dp8PhLsUcFE

By any chance, do you know how to get the callback that gets called every time the captions get updated? I'm trying to process the real-time stream. Thanks!

@benwiley4000
Owner

Oh, I have no clue. If it updates the same info that we use to grab the captions I would try a loop using requestAnimationFrame or a setInterval. Otherwise I'm not sure off the top of my head. If you learn anything let me know! I'd love to improve this script to support all types of YouTube videos/streams.
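The polling idea could be sketched roughly as follows. This is an illustration rather than a tested solution for live streams: createChangePoller is a hypothetical helper, and readCaptionText in the usage comment is a placeholder for whatever function reads the current caption data, which this thread has not identified yet.

```javascript
// Hypothetical helper: wraps a reader function and calls onChange only
// when the value it returns differs from the previous poll.
function createChangePoller(readValue, onChange) {
  var lastValue;
  var hasValue = false;
  return function poll() {
    var value = readValue();
    if (!hasValue || value !== lastValue) {
      hasValue = true;
      lastValue = value;
      onChange(value);
    }
  };
}

// Usage in the browser console (readCaptionText is a placeholder):
// var poll = createChangePoller(readCaptionText, console.log);
// var intervalId = setInterval(poll, 500);
// clearInterval(intervalId); // stops polling
```

Keeping the comparison separate from the timer means the same poll function can be driven by setInterval or by a requestAnimationFrame loop.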

@benwiley4000 changed the title from "ytplayer?" to "ytplayer? (issues with live stream)" on May 3, 2020

@dirxiang

dirxiang commented Oct 2, 2020

Hi,

Thank you for sharing this! It worked for me to download a file for the English captions. But is it possible to extract a file for another language? I was hoping to get a file that has the captions in Chinese, which are auto-translated by YouTube. Is it possible if they're auto-translated? Thanks!

@benwiley4000
Owner

@dirxiang I believe that should work. Could you share the video URL?

@dirxiang

dirxiang commented Oct 2, 2020

> @dirxiang I believe that should work. Could you share the video URL?

the link is: https://www.youtube.com/watch?v=D4g8MmICJ8g&ab_channel=BCPSMagnetPrograms
Thanks!

@benwiley4000
Owner

@dirxiang ah, that's a new feature I don't think was available before. The script doesn't support this currently, but I just tested and it can be added. I'll open a new issue for this.

@dirxiang

dirxiang commented Oct 2, 2020 via email

@benwiley4000
Owner

Done (see the other thread, #2).

@benwiley4000
Owner

benwiley4000 commented Oct 7, 2020

@vgoklani I finally found a solution for consuming captions from YouTube live streams as they become available. For now the API is quite different from the VTT download script for completed videos, but perhaps it can be adapted into the same tool.

Here is the usage. I'll paste the function below.

// starts consuming captions beginning now
var callback = console.log;
handleCaptionsStream(callback);

// starts consuming captions at a point in the past up until the present,
// and continues consuming captions as they become available
var callback = console.log;
var date = new Date();
// start 3 hours ago (YouTube seems to allow you to request
// up to a bit more than 7 days in the past if needed)
date.setHours(date.getHours() - 3);
handleCaptionsStream(callback, date);

Your callback will be triggered with the following data:

[Image: Screen Shot 2020-10-07 at 1 15 53 PM]

A few things to note about the response:

  • The optional startTimestamp passed into the handleCaptionsStream call reflects real time (e.g. 2:17PM EST 7 October 2020), expressed as the number of milliseconds since the unix epoch (midnight UTC, 1 January 1970). A Date object also works, as in the usage above, since it is coerced to that number in the arithmetic.
  • unixTimestampRelative and the start and end properties for captions are millisecond values measured from the beginning of the stream, not from the unix epoch. If you convert them into Date objects, you'll get dates in 1969/1970 (when unix time begins).
  • Note that all the time values in streamProperties are in microseconds (Us) so they need to be divided by 1000 to be converted to milliseconds.
  • You may notice that the Sequence-Number multiplied by the Target-Duration-Us is equal to the microsecond (Us) timestamp of that sequence index. Divide that result by 1000 to get its millisecond timestamp (unixTimestampRelative).
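The arithmetic in these notes can be written out as two small functions. This is a sketch for illustration only: the function names are hypothetical, and the 5-second target duration and the two timestamps in the example are assumed values, not numbers taken from a real stream.

```javascript
// Converts a sequence index to a millisecond offset from the start of the stream.
function sequenceToRelativeMs(sequenceNumber, targetDurationUs) {
  return (sequenceNumber * targetDurationUs) / 1000;
}

// Finds the sequence index closest to a real-world time (ms since the epoch),
// given the real-world time (ms since the epoch) at which the stream began.
function timestampToSequence(timestampMs, streamStartMs, targetDurationUs) {
  return Math.round(((timestampMs - streamStartMs) * 1000) / targetDurationUs);
}

// Example with an assumed target duration of 5,000,000 microseconds (5 seconds):
var relativeMs = sequenceToRelativeMs(720, 5000000); // 3600000 ms, one hour in
var sequence = timestampToSequence(1602094620000, 1602091020000, 5000000); // 720
```

The second function mirrors how the script below picks its starting sequence number when a startTimestamp is given.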

Here's the function to be pasted into the JS console:

function handleCaptionsStream(callback, startTimestamp) {
  var playerResponse = JSON.parse(ytplayer.config.args.player_response);
  var captionsUrl = playerResponse.streamingData.adaptiveFormats.find(function (
    format
  ) {
    return format.mimeType.indexOf('text/') === 0;
  }).url;
  var domParser = new window.DOMParser();

  fetchCaptions().then(function (primaryInfo) {
    var beginningTimestamp =
      Date.now() - primaryInfo.streamProperties['Stream-Duration-Us'] / 1000;
    var startSequenceNumber = startTimestamp
      ? Math.round(
          ((startTimestamp - beginningTimestamp) * 1000) /
            primaryInfo.streamProperties['Target-Duration-Us']
        )
      : primaryInfo.streamProperties['Sequence-Number'];
    return fetchCaptionsUntilEnd(startSequenceNumber);
    function fetchCaptionsUntilEnd(sequenceNumber) {
      // estimated real-world time at which this sequence becomes available
      var timestamp =
        beginningTimestamp +
        (primaryInfo.streamProperties['Target-Duration-Us'] * sequenceNumber) /
          1000;
      return (timestamp > Date.now()
        ? waitUntil(timestamp)
        : Promise.resolve()
      ).then(function () {
        return fetchCaptions(sequenceNumber).then(function (info) {
          callback(info);
          if (info.streamProperties['Stream-Finished'] === 'F') {
            return fetchCaptionsUntilEnd(
              info.streamProperties['Sequence-Number'] + 1
            );
          }
        });
      });
    }
  });

  function fetchCaptions(sequenceNumber) {
    return fetchTextUntilContentReturned(
      captionsUrl +
        (sequenceNumber === undefined ? '' : '&sq=' + sequenceNumber)
    ).then(function (text) {
      var streamPropertiesContent = text.slice(
        text.indexOf('Sequence-Number:'),
        text.indexOf('\r\n\r\n')
      );
      var streamProperties = {};
      streamPropertiesContent.split('\n').forEach(function (line) {
        var lineParts = line.trim().split(': ');
        var key = lineParts[0];
        var value = lineParts[1];
        streamProperties[key] = isNaN(value) ? value : Number(value);
      });
      var xmlIndex = text.indexOf('<?xml ');
      var xmlContent = xmlIndex !== -1 ? text.slice(xmlIndex) : null;
      var xmlTree =
        xmlContent && domParser.parseFromString(xmlContent, 'text/xml');
      var unixTimestampRelative =
        (streamProperties['Sequence-Number'] *
          streamProperties['Target-Duration-Us']) /
        1000;
      var captions =
        xmlTree &&
        Array.prototype.map
          .call(xmlTree.querySelectorAll('p'), function (p) {
            var textContent = p.textContent;
            if (textContent.trim()) {
              var t = Number(p.getAttribute('t'));
              var d = Number(p.getAttribute('d'));
              var start = t + unixTimestampRelative;
              var end = start + d;
              return {
                text: textContent,
                start: start,
                end: end
              };
            }
          })
          .filter(Boolean);
      var webVttContent =
        captions &&
        captions
          .map(function (caption) {
            return (
              formatTime(caption.start / 1000) +
              ' --> ' +
              formatTime(caption.end / 1000) +
              '\n' +
              caption.text +
              '\n'
            );
          })
          .concat('')
          .join('\n');
      return {
        streamProperties: streamProperties,
        xmlContent: xmlContent,
        xmlTree: xmlTree,
        unixTimestampRelative: unixTimestampRelative,
        captions: captions,
        webVttContent: webVttContent
      };
    });
  }

  // for some reason we get an empty response sometimes
  function fetchTextUntilContentReturned(url) {
    return fetch(url)
      .then(function (res) {
        return res.text();
      })
      .then(function (text) {
        return text || fetchTextUntilContentReturned(url);
      });
  }

  function waitUntil(unixTime) {
    return new Promise(function (resolve) {
      setTimeout(resolve, Math.max(0, unixTime - Date.now()));
    });
  }

  function pad2(number) {
    // thanks https://www.electrictoolbox.com/pad-number-two-digits-javascript/
    return (number < 10 ? '0' : '') + number;
  }

  function pad3(number) {
    return number >= 100 ? number : '0' + pad2(number);
  }

  // time: seconds
  function formatTime(time) {
    // work in whole milliseconds so rounding can never produce ".1000"
    var totalMilliseconds = Math.round(time * 1000);
    var hours = Math.floor(totalMilliseconds / 3600000);
    var minutes = Math.floor((totalMilliseconds % 3600000) / 60000);
    var seconds = Math.floor((totalMilliseconds % 60000) / 1000);
    var milliseconds = totalMilliseconds % 1000;
    return (
      pad2(hours) +
      ':' +
      pad2(minutes) +
      ':' +
      pad2(seconds) +
      '.' +
      pad3(milliseconds)
    );
  }
}

@benwiley4000
Owner

Also note that the script on master should work for you if you just need captions for a live stream that has already completed.

@benwiley4000
Owner

I just updated the script above to include a webVttContent property which looks like this:

[Image: Screen Shot 2020-10-07 at 2 59 29 PM]
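Since each callback only carries the cues for one segment, collecting them into a single file takes one more step. A minimal sketch, assuming each webVttContent chunk already ends with a blank line (as the script above produces) and that segments without captions yield null or an empty string; buildWebVttFile is a hypothetical helper name.

```javascript
// Hypothetical helper: joins per-segment webVttContent strings into one
// WebVTT document, which must begin with a "WEBVTT" line and a blank line.
function buildWebVttFile(chunks) {
  var body = chunks
    .filter(Boolean) // skip null or empty chunks from caption-less segments
    .join('');
  return 'WEBVTT\n\n' + body;
}

// Usage in the browser console, alongside handleCaptionsStream:
// var chunks = [];
// handleCaptionsStream(function (info) {
//   chunks.push(info.webVttContent);
// });
// ...later: var fileContent = buildWebVttFile(chunks);
```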

@benwiley4000
Owner

Hm, seems like some of the math might be wrong now that I look at the timestamps in the vtt content. I can try to give this another look later.

@benwiley4000
Owner

The script now reflects correct timestamps in the WebVTT content. There may still be a problem with overlapping captions (maybe captions whose time ranges overlap should be combined), but they should work with the HTML video element.

[Image: Screen Shot 2020-10-07 at 4 51 01 PM]
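One possible way to combine overlapping captions is sketched below. Whether merging is the right fix is still an open question in this thread, so treat this as an illustration: mergeOverlappingCaptions is a hypothetical helper, joining the texts with a newline is an assumption, and repeated text across overlapping cues is not deduplicated.

```javascript
// Hypothetical helper: merges captions whose [start, end) ranges overlap.
// Each caption is { text, start, end }, the shape produced by the script above.
function mergeOverlappingCaptions(captions) {
  var sorted = captions.slice().sort(function (a, b) {
    return a.start - b.start;
  });
  var merged = [];
  sorted.forEach(function (caption) {
    var last = merged[merged.length - 1];
    if (last && caption.start < last.end) {
      // overlap: extend the previous cue and append the text on a new line
      last.end = Math.max(last.end, caption.end);
      last.text += '\n' + caption.text;
    } else {
      // copy so the input objects are left untouched
      merged.push({
        text: caption.text,
        start: caption.start,
        end: caption.end
      });
    }
  });
  return merged;
}
```

Captions that merely touch (one ends exactly where the next begins) are kept as separate cues.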
