There are JavaScript parsers written in at least C and Java (Mozilla), JavaScript (Mozilla again), and Ruby. Is there one currently out there for Python?
I don’t need a JavaScript interpreter, per se, just a parser that’s up to ECMA-262 standards.
A quick Google search revealed no immediate answers, so I’m asking the SO community.
Answers:
Method 1
Nowadays, there is at least one better tool, called slimit:
SlimIt is a JavaScript minifier written in Python. It compiles
JavaScript into more compact code so that it downloads and runs
faster.
SlimIt also provides a library that includes a JavaScript parser,
lexer, pretty printer and a tree visitor.
Demo:
Imagine we have the following JavaScript code:
$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: 'abc@g.com',
        phone: '9999999999',
        name: 'XYZ'
    }
});
And now we need to get email, phone and name values from the data object.
The idea here would be to instantiate a slimit parser, visit all nodes, filter all assignments and put them into the dictionary:
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor

data = """
$.ajax({
    type: "POST",
    url: 'http://www.example.com',
    data: {
        email: 'abc@g.com',
        phone: '9999999999',
        name: 'XYZ'
    }
});
"""

parser = Parser()
tree = parser.parse(data)

# Collect every "key: value" pair in the tree. Nested objects have no
# 'value' attribute, so they map to an empty string.
fields = {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
          for node in nodevisitor.visit(tree)
          if isinstance(node, ast.Assign)}
print(fields)
It prints:
{'name': "'XYZ'",
 'url': "'http://www.example.com'",
 'type': '"POST"',
 'phone': "'9999999999'",
 'data': '',
 'email': "'abc@g.com'"}
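Note that the values in that output still carry their JavaScript quotes. Here is a minimal sketch for stripping them, assuming the values are plain quoted string literals without escape sequences. `strip_js_quotes` is a hypothetical helper written for this illustration, not part of slimit, and the dictionary is a hand-copied subset of the printed result so the snippet runs on its own:

```python
def strip_js_quotes(value):
    """Remove one pair of matching single or double quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        return value[1:-1]
    return value


# A few entries copied from the dictionary printed above.
fields = {
    'name': "'XYZ'",
    'url': "'http://www.example.com'",
    'type': '"POST"',
    'phone': "'9999999999'",
    'data': '',
}

# Values without surrounding quotes (such as the empty 'data') pass through.
cleaned = {key: strip_js_quotes(value) for key, value in fields.items()}
print(cleaned['type'], cleaned['name'])
```

Literals that contain escape sequences would need a proper unescaping step, which this sketch does not attempt.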
Method 2
ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.
The ANTLR site provides many grammars, including one for JavaScript.
As it happens, there is a Python API available – so you can call the lexer (recognizer) generated from the grammar directly from Python (good luck).
Method 3
I have translated esprima.js to Python:
https://github.com/PiotrDabkowski/pyjsparser
>>> from pyjsparser import parse
>>> parse('var $ = "Hello!"')
{
    "type": "Program",
    "body": [
        {
            "type": "VariableDeclaration",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {
                        "type": "Identifier",
                        "name": "$"
                    },
                    "init": {
                        "type": "Literal",
                        "value": "Hello!",
                        "raw": '"Hello!"'
                    }
                }
            ],
            "kind": "var"
        }
    ]
}
It’s a manual translation, so it’s very fast: it takes about 1 second to parse the angular.js file (roughly 100k characters per second). It supports all of ECMAScript 5.1 and parts of version 6, for example arrow functions, const, and let.
If you need support for all the newest ES6 features, you can translate esprima on the fly with Js2Py:
import js2py
esprima = js2py.require("esprima@4.0.1")
esprima.parse("a = () => {return 11};")
# {'body': [{'expression': {'left': {'name': 'a', 'type': 'Identifier'}, 'operator': '=', 'right': {'async': False, 'body': {'body': [{'argument': {'raw': '11', 'type': 'Literal', 'value': 11}, 'type': 'ReturnStatement'}], 'type': 'BlockStatement'}, 'expression': False, 'generator': False, 'id': None, 'params': [], 'type': 'ArrowFunctionExpression'}, 'type': 'AssignmentExpression'}, 'type': 'ExpressionStatement'}], 'sourceType': 'script', 'type': 'Program'}
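Both results are nested dictionaries and lists in which every node carries a "type" key, so pulling information out is mostly a matter of walking the structure. Here is a minimal sketch of such a walk, assuming the plain-dict shape printed for the pyjsparser example. `iter_nodes` is a hypothetical helper written for this illustration, and the tree is hard-coded from that output so the snippet runs without pyjsparser installed:

```python
def iter_nodes(node):
    """Yield every dict that has a "type" key, depth first."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


# The tree shown for parse('var $ = "Hello!"'), copied by hand.
tree = {
    "type": "Program",
    "body": [
        {
            "type": "VariableDeclaration",
            "declarations": [
                {
                    "type": "VariableDeclarator",
                    "id": {"type": "Identifier", "name": "$"},
                    "init": {"type": "Literal", "value": "Hello!", "raw": '"Hello!"'},
                }
            ],
            "kind": "var",
        }
    ],
}

identifiers = [n["name"] for n in iter_nodes(tree) if n["type"] == "Identifier"]
literals = [n["value"] for n in iter_nodes(tree) if n["type"] == "Literal"]
print(identifiers, literals)
```

The same generator can be filtered on any other node type, for example "ArrowFunctionExpression", as long as the tree is available as ordinary dicts and lists.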
Method 4
As pib mentioned, pynarcissus is a JavaScript tokenizer written in Python. It seems to have some rough edges, but so far it has been working well for what I want to accomplish.
Updated: I took another crack at pynarcissus, and below is a working starting point for using PyNarcissus in a visitor-pattern-like system. Unfortunately, my current client bought the next iteration of my experiments and has decided not to make it public source. A cleaner version of the code below is available as a gist.
from pynarcissus import jsparser
from collections import defaultdict


class Visitor(object):

    # Node attributes that may hold child nodes not reached by iteration
    CHILD_ATTRS = ['thenPart', 'elsePart', 'expression', 'body', 'initializer']

    def __init__(self, filepath):
        self.filepath = filepath
        # List of functions by line # and set of names
        self.functions = defaultdict(set)
        with open(filepath) as myFile:
            self.source = myFile.read()

        self.root = jsparser.parse(self.source, self.filepath)
        self.visit(self.root)

    def look4Childen(self, node):
        for attr in self.CHILD_ATTRS:
            child = getattr(node, attr, None)
            if child:
                self.visit(child)

    def visit_NOOP(self, node):
        pass

    def visit_FUNCTION(self, node):
        # Named functions
        if node.type == "FUNCTION" and getattr(node, "name", None):
            print(str(node.lineno) + " | function " + node.name + " | " + self.source[node.start:node.end])

    def visit_IDENTIFIER(self, node):
        # Anonymous functions declared with var name = function() {};
        try:
            if node.type == "IDENTIFIER" and hasattr(node, "initializer") and node.initializer.type == "FUNCTION":
                print(str(node.lineno) + " | function " + node.name + " | " + self.source[node.start:node.initializer.end])
        except Exception:
            pass

    def visit_PROPERTY_INIT(self, node):
        # Anonymous functions declared as a property of an object
        try:
            if node.type == "PROPERTY_INIT" and node[1].type == "FUNCTION":
                print(str(node.lineno) + " | function " + node[0].value + " | " + self.source[node.start:node[1].end])
        except Exception:
            pass

    def visit(self, root):
        # Dispatch on the node type; unknown types fall back to visit_NOOP
        call = lambda n: getattr(self, "visit_%s" % n.type, self.visit_NOOP)(n)
        call(root)
        self.look4Childen(root)
        for node in root:
            self.visit(node)


filepath = r"C:UsersdwardDropboxjuggernaut2juggernautparsertestdatajasmine.js"
outerspace = Visitor(filepath)
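The core of the class above is the name-based dispatch in visit: it looks up a method called visit_<TYPE> and falls back to visit_NOOP when none exists. Here is a minimal sketch of just that pattern, runnable without pynarcissus. The `Node` class is a hypothetical stand-in for a parser node (a type tag, an optional name, and iterable children), and `NameCollector` is an illustrative name, so neither reflects the real pynarcissus API:

```python
class Node(object):
    """Hypothetical stand-in for a parser node: a type tag plus children."""

    def __init__(self, type_, name=None, children=()):
        self.type = type_
        self.name = name
        self.children = list(children)

    def __iter__(self):
        return iter(self.children)


class NameCollector(object):
    """Collects the names of FUNCTION nodes using visit_<TYPE> dispatch."""

    def __init__(self):
        self.functions = []

    def visit_NOOP(self, node):
        # Fallback for node types without a dedicated handler
        pass

    def visit_FUNCTION(self, node):
        if node.name:
            self.functions.append(node.name)

    def visit(self, node):
        # Look up visit_<TYPE>; use visit_NOOP when no such method exists
        handler = getattr(self, "visit_%s" % node.type, self.visit_NOOP)
        handler(node)
        for child in node:
            self.visit(child)


tree = Node("SCRIPT", children=[
    Node("FUNCTION", name="outer", children=[Node("FUNCTION", name="inner")]),
    Node("VAR"),
])
collector = NameCollector()
collector.visit(tree)
print(collector.functions)
```

Supporting another node type then only takes one more visit_<TYPE> method; the traversal itself stays unchanged.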
Method 5
You can try python-spidermonkey.
It is a wrapper around SpiderMonkey, which is the codename for Mozilla’s C implementation of JavaScript.
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.