

Source code after development process
Obfuscated source code with readable layout (intermediate step)
Final JS obfuscated delivery source code version

Web Shrinker

1. Problems

Problem #1

Demanding Web 2.0 applications are complex to create and therefore expensive. Owners want to protect their property so that they can recoup their investment, for example through license sales, without being undercut by illegal copies. However, JavaScript is not compiled before delivery; it is interpreted in the browser, so the unobscured source code is sent to every client. Protection does not seem possible.

Problem #2

Generated HTML pages and JavaScript code can quickly become large and cause unnecessarily long loading times. Increasing bandwidth on the internet reduces this problem, but it remains significant, especially on mobile networks and in regions with poorly developed network infrastructure.

2. Solution

The solution to the above problems is an automatic masking of the JavaScript source code after the development phase. In practice, this means disguising the source code so that (1) it becomes unreadable to unauthorized third parties, so that the concepts behind it can only be reconstructed with great effort, and (2) it becomes significantly smaller in size.

To implement this idea, the given source code must be analyzed automatically to identify the code structure and to pick out variables, instructions and other language elements. The obfuscation therefore takes place in two separate stages:

  1. Syntactic analysis of the given source code in the languages ASP, HTML and JavaScript. Based on this analysis, a syntax tree of the program is created.
  2. Manipulation of the syntax tree according to clearly defined rules, followed by output of the result as a masked file.

The obfuscation sequence resembles the compilation of source code files, except that in this case the source and target languages are the same. The syntactic analysis is based on the specifications of the respective languages (e.g. ECMA-262 for JavaScript, published by Ecma International, formerly the European Computer Manufacturers Association). With these specifications, the source code can be scanned for the defined tokens of the language (lexical analysis). The resulting token stream is then used as input to a parser for that language. The parser checks the token stream against the grammatical patterns of the language, such as statements, assignments or expressions. If the source code is a valid derivation under the relevant language specification, a matching syntax tree is created as the last stage of syntactic analysis. Otherwise, the given source code is syntactically incorrect.
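The lexical analysis step can be illustrated with a minimal sketch. This is not the Web Shrinker implementation: the names `TOKEN_PATTERNS`, `KEYWORDS` and `tokenize`, as well as the token shape `{ type, text }`, are assumptions made for illustration, and the patterns cover only a small subset of ECMA-262 (for example, regular expression literals and template strings are not handled).

```javascript
// Illustrative lexer for a small JavaScript subset (hypothetical names throughout).
// Each entry pairs a token type with a pattern anchored at the start of the input.
const TOKEN_PATTERNS = [
  ["whitespace", /^\s+/],
  ["comment", /^\/\/[^\n]*|^\/\*[\s\S]*?\*\//],
  ["number", /^\d+(?:\.\d+)?/],
  ["string", /^"(?:[^"\\\n]|\\.)*"|^'(?:[^'\\\n]|\\.)*'/],
  ["identifier", /^[A-Za-z_$][A-Za-z0-9_$]*/],
  ["punctuator", /^(?:===|!==|==|!=|<=|>=|&&|\|\||\+\+|--|[-+*\/%=<>!(){}\[\];,.])/],
];

// Only a handful of reserved words, enough for the example.
const KEYWORDS = new Set([
  "var", "let", "const", "function", "return", "if", "else", "for", "while",
]);

// Turns source text into a flat token stream; whitespace is discarded.
function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    let matched = false;
    for (const [type, pattern] of TOKEN_PATTERNS) {
      const match = pattern.exec(rest);
      if (match === null) continue;
      const text = match[0];
      if (type !== "whitespace") {
        const isKeyword = type === "identifier" && KEYWORDS.has(text);
        tokens.push({ type: isKeyword ? "keyword" : type, text });
      }
      rest = rest.slice(text.length);
      matched = true;
      break;
    }
    if (!matched) {
      // No pattern applies: the input is not valid in this subset.
      throw new SyntaxError("Unexpected character: " + rest[0]);
    }
  }
  return tokens;
}

console.log(tokenize("var LoopCount = 10; // counter"));
```

A parser would consume this token stream and build the syntax tree; input that matches no token pattern is rejected here with a `SyntaxError`, mirroring the "syntactically incorrect" case described above.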

Clearly defined rules can now be applied to the syntax tree created in the first stage in order to obfuscate it. These rules are called obfuscation rules. Some of the rules used are listed below:

  1. Rename meaningful variables as meaningless variables (irreversible)
    Descriptive, meaningful variable names such as "LoopCount" are transformed into meaningless strings that have no connection with the actual code, e.g. "x_".
  2. Remove comments from source documentation (irreversible)
    Each comment in the source code helps the reader understand the execution of the program. This help is removed by deleting all documentation completely.
  3. Destruction of the well-structured block layout
    A clear code layout shows at a glance which statements belong to which sub-programs, functions or loops. Destroying this block structure makes it difficult for unauthorized third parties to understand the source code.
  4. Restructuring of the program sequence
    The most demanding form of obfuscation: actively changing the program flow, such as transforming loops into recursive function calls, produces the highest level of opacity.
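Rules 1 to 3 can be sketched on a small, hand-written token list. The code below is an illustration under simplifying assumptions, not the Web Shrinker's method: the token shape `{ type, text }` and the functions `obfuscateTokens` and `emit` are hypothetical, every identifier is treated as a renamable local variable, and scope, globals such as `document`, and property names are ignored. A real tool would work on the syntax tree to handle those cases.

```javascript
// Hand-written token list standing in for the output of a lexer (assumed shape).
const tokens = [
  { type: "comment", text: "// count the loop iterations" },
  { type: "keyword", text: "var" },
  { type: "identifier", text: "LoopCount" },
  { type: "punctuator", text: "=" },
  { type: "number", text: "0" },
  { type: "punctuator", text: ";" },
  { type: "identifier", text: "LoopCount" },
  { type: "punctuator", text: "=" },
  { type: "identifier", text: "LoopCount" },
  { type: "punctuator", text: "+" },
  { type: "number", text: "1" },
  { type: "punctuator", text: ";" },
];

// Rule 1 (rename) and rule 2 (remove comments).
function obfuscateTokens(input) {
  const renamed = new Map(); // original name -> meaningless name
  const output = [];
  for (const token of input) {
    if (token.type === "comment") continue; // rule 2: drop documentation
    if (token.type === "identifier") {
      if (!renamed.has(token.text)) {
        renamed.set(token.text, "x" + renamed.size + "_");
      }
      output.push({ type: "identifier", text: renamed.get(token.text) });
    } else {
      output.push(token);
    }
  }
  return output;
}

// Rule 3: emit without layout. A space is inserted only where two
// word-like tokens would otherwise merge into one (e.g. "var" + "x0_").
function emit(tokenList) {
  const isWordLike = (t) =>
    t.type === "keyword" || t.type === "identifier" || t.type === "number";
  let result = "";
  let previous = null;
  for (const token of tokenList) {
    if (previous && isWordLike(previous) && isWordLike(token)) result += " ";
    result += token.text;
    previous = token;
  }
  return result;
}

console.log(emit(obfuscateTokens(tokens))); // prints: var x0_=0;x0_=x0_+1;
```

The output is both shorter and harder to read than the input, which corresponds to the two goals named in the Solution section.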

After the syntax tree has been manipulated using these rules, the result is written to the (now) obfuscated output file. The key requirement is that every transformation preserves behavior, so that the masked code has exactly the same functionality as its unmasked counterpart.
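The behavior-preserving requirement can be made concrete with a small, hand-made example of restructuring the program sequence. The functions below are hypothetical and were written by hand for illustration; they are not output of the Web Shrinker. `sumLoop` is the readable original, `a_` is a recursive rewrite with meaningless names, and `behavesIdentically` is an assumed helper that compares both on sample inputs.

```javascript
// Original: sums the integers 1..n with a loop.
function sumLoop(n) {
  var total = 0;
  for (var i = 1; i <= n; i++) {
    total += i;
  }
  return total;
}

// Restructured: the loop becomes a recursive call.
// b_ is the upper bound, c_ the loop counter, d_ the running total.
function a_(b_, c_, d_) {
  return c_ > b_ ? d_ : a_(b_, c_ + 1, d_ + c_);
}

// Wrapper with the original signature, kept readable here for comparison.
function sumObfuscated(n) {
  return a_(n, 1, 0);
}

// Spot check: both versions must return the same value for every input.
function behavesIdentically(f, g, inputs) {
  return inputs.every((x) => f(x) === g(x));
}

console.log(behavesIdentically(sumLoop, sumObfuscated, [0, 1, 2, 10, 100])); // prints: true
```

Such a spot check only samples the input space. Note also that the recursive form can exceed the call stack for very large `n`, which is one reason why rewrites of this kind have to be applied with care.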

3. Conclusion

A particular difficulty arises from the usual mix of different web languages, for example when an ASP script for an HTML/CSS page generates several JavaScript variables. The CEITON Web Shrinker is able to process ASP, JavaScript and HTML across language boundaries and to keep the CSS in sync. The masking can be made even more secure if external libraries are included as well. The Web Shrinker is suitable not only for small web tools but especially for large software projects in various languages.