Jan Fan

The Software Abstraction Layer Behind Scripting Languages

At first I didn't think much of scripting languages. I knew plenty of them, after all: Python, Lua, Tcl, Scheme, JavaScript, Bash. Surely that made me at least half a scripting expert!

Then I recently dug a little deeper, and... wait, what?!

Well... it turns out a little humility goes a long way!

This post is about scripting languages and the software abstraction layer behind them.

Scripting language

Once I started studying advanced game scripting (游戏脚本高级编程), I realized that scripts are everywhere in my world.

Before that, my idea of a script was limited to relatively complex general-purpose scripting languages such as Perl and Python.

The term “scripting language” is also used loosely to refer to dynamic high-level general-purpose language, such as Perl, Tcl, and Python

Later I learned that a script can also serve as an extension language, and the whole world lit up.

In the case of scripting an application, this is also known as an extension language.

Typical examples include:

As well as:

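And many, many more. So this is a script, and that is a script too; my notion of scripting suddenly became much richer! A scripting system can be large or small. As long as a language can be embedded in the "big thing" underneath it, and can call and control the functionality that lower layer provides, it qualifies: I can treat even a complex language like Java as a scripting language, and I can build a run-time environment for C to turn it into one as well.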

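From something as complex as Python down to something as simple as a command line, they are all scripting languages: interpreted, embedded, and used to program the computer operations you want from a higher, more abstract level.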
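
To make the idea of an extension language concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the `Host` class stands in for the "big thing" underneath, and `run_script` is a toy line-oriented command language that can only call what the host chooses to expose.

```python
class Host:
    """Hypothetical host application: a tiny text buffer that can be scripted."""

    def __init__(self):
        self.buffer = []

    def insert(self, text):
        self.buffer.append(text)

    def upper(self):
        self.buffer = [line.upper() for line in self.buffer]


def run_script(host, script):
    """Interpret a line-oriented script; each line is `command [argument]`."""
    # The script can only reach the functionality the host exposes here.
    commands = {"insert": host.insert, "upper": host.upper}
    for line in script.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, arg = line.partition(" ")
        if name not in commands:
            raise ValueError(f"unknown command: {name}")
        if arg:
            commands[name](arg)
        else:
            commands[name]()


host = Host()
run_script(host, "insert hello\ninsert world\nupper")
print(host.buffer)  # prints ['HELLO', 'WORLD']
```

Even this toy has the three properties named above: it is interpreted, it is embedded in a host, and it drives the host from a more abstract level than the host's own code.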

Run-time environment

Every scripting language has its own runtime. Ruby code, for example, cannot run on the Python runtime.

Every programming language has some form of a runtime system, whether the language is a compiled language, interpreted language, embedded domain-specific language, or is invoked via an API as is pthreads.

But what exactly is this runtime? Let's look at a few examples.

  • The runtime system of the C language is a particular set of instructions inserted into the executable image by the compiler.
  • The OS kernel can be viewed as a runtime system, and that the set of OS calls that invoke OS behaviors may be viewed as an API invoked language.
  • For assembly language, the physical CPU itself can be viewed as an implementation of the runtime system for a programming language.

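What a runtime concretely is turns out to be very broad, and what the term denotes differs greatly from one language to another. But for every language, the runtime provides one essential function: it interprets the behavior that the language describes.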

A run-time system, also called runtime system, or just runtime, exhibits the behavior of the constructs of a computer language.

In addition to the behavior of the language constructs, a runtime system may also perform support services such as type checking, debugging, or code generation and optimization.

For scripting languages, how that behavior gets interpreted depends on the specific case.

A scripting language or script language is a programming language that supports scripts, programs written for a special run-time environment that can interpret (rather than compile) and automate the execution of tasks that could alternatively be executed one-by-one by a human operator.

Environments that can be automated through scripting include

  • software applications,
  • web pages within a web browser,
  • the shells of operating systems (OS), and
  • embedded systems.

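For a domain-specific scripting language (Domain-specific programming language), i.e. one tied to a particular application such as a game engine, a browser, or MS Office, the script's behavior is interpreted as: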

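For a general-purpose scripting language (General-purpose programming language), the interpretation is clear-cut: emulate a computer processor. The script is translated into the instruction set of a virtual machine's CPU, which then performs general computation or controls hardware devices.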

In the limit, the run-time system may provide services such as a P-code machine or virtual machine, that hide even the processor’s instruction set. This arrangement greatly simplifies the task of language implementation and its adaptation to different machines, and allows sophisticated language features such as reflection.

Virtual machine

A virtual machine emulates a real computer system at the software level.

In computing, a virtual machine (VM) is an emulation of a particular computer system.

What is a real computer made of? A CPU, registers, memory, and I/O. All of these can be "faked" on a software abstraction layer:

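It really is that crude, and what you sacrifice is efficiency. Take a look at the very simple virtual machine implemented in this article to get a rough idea.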

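A virtual machine also has its own instruction set, which defines the range of things the VM can do. A single VM instruction may be translated into several instructions on the real CPU. What you get in return is almost unlimited freedom from the hardware, plus portability: whatever physical architecture you want can be emulated in software.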
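
As a rough illustration of a VM with its own instruction set, here is a minimal stack machine sketched in Python. The opcodes (`PUSH`, `ADD`, `MUL`, `PRINT`, `HALT`) are invented for this example and do not correspond to any real VM; the list `stack` stands in for registers, `output` for an I/O device, and `pc` for the instruction pointer.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a toy stack machine."""
    stack = []   # operand stack, standing in for CPU registers
    output = []  # collected output, standing in for an I/O device
    pc = 0       # program counter, the VM's instruction pointer
    while pc < len(program):
        op, arg = program[pc]  # fetch and decode
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif op == "PRINT":
            output.append(stack.pop())
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return output


# (2 + 3) * 4 expressed in the toy instruction set
program = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None),
    ("PUSH", 4), ("MUL", None),
    ("PRINT", None), ("HALT", None),
]
print(run(program))  # prints [20]
```

Note how a single `ADD` costs two pops, an addition, and an append in the host language, each of which becomes many real CPU instructions. That is the efficiency being traded for portability.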

A VM has an execution pipeline that parallels that of a real CPU.

    Real          Virtual

 +--------+ | +------------+
 |Assembly| | |Intermediate|
 |language| | |  language  |
 +--------+ | +------------+
     |      |        |
 +--------+ |   +--------+
 |Machine | |   |Bytecode|
 |language| |   +--------+
 +--------+ |
     |      |        |
   +---+    |      +--+
   |CPU|    |      |VM|
   +---+    |      +--+
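
The right-hand column of the diagram can be observed in CPython, which is used here only as one familiar example of a bytecode VM. The sketch relies on the built-in `compile` and `exec` and the standard `dis` module; the exact opcode names vary between Python versions, so none are listed.

```python
import dis

source = "result = 2 + 3 * 4"

# Source text -> code object holding bytecode for CPython's VM
code = compile(source, "<script>", "exec")

# Decode the bytecode into opcode names (version dependent)
opnames = [ins.opname for ins in dis.get_instructions(code)]

# The VM executes the bytecode; results land in the given namespace
namespace = {}
exec(code, namespace)
print(namespace["result"])  # prints 14
```

Calling `dis.dis(code)` prints a readable listing of the same bytecode, which is a convenient way to see what the VM is actually asked to do.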

Interpreter

In computer science, an interpreter is a computer program that directly executes, i.e. performs, instructions written in a programming or scripting language, without previously compiling them into a machine language program.

An interpreter generally executes a program in one of the following ways:

  1. parse the source code and perform its behavior directly
  2. translate source code into some efficient intermediate representation and immediately execute this
  3. explicitly execute stored precompiled code made by a compiler which is part of the interpreter system

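Strategy 1 shows that a script does not have to be compiled into intermediate code: the interpreter can execute the high-level source directly, as early Lisp did. With strategy 3, the interpreter effectively plays the role of a virtual machine.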
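
The first two strategies can be contrasted on a toy arithmetic language. This is only a sketch under stated assumptions: it borrows Python's standard `ast` module as the parser, and the names `evaluate`, `translate`, and `execute`, as well as the `PUSH`/`APPLY` instructions, are invented for the example.

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate(node):
    """Strategy 1: walk the parse tree and perform its behavior directly."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("unsupported syntax")


def translate(node, code=None):
    """Strategy 2, step 1: flatten the tree into postfix instructions."""
    code = [] if code is None else code
    if isinstance(node, ast.Expression):
        translate(node.body, code)
    elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        code.append(("PUSH", node.value))
    elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
        translate(node.left, code)
        translate(node.right, code)
        code.append(("APPLY", OPS[type(node.op)]))
    else:
        raise ValueError("unsupported syntax")
    return code


def execute(code):
    """Strategy 2, step 2: run the intermediate representation on a stack."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:  # APPLY: pop two operands, push the result
            right, left = stack.pop(), stack.pop()
            stack.append(arg(left, right))
    return stack.pop()


tree = ast.parse("1 + 2 * (3 - 1)", mode="eval")
print(evaluate(tree))            # prints 5
print(execute(translate(tree)))  # prints 5
```

Both paths give the same answer. The difference is that the intermediate form produced by `translate` can be stored and executed repeatedly without touching the source again, which is where strategy 2 starts to resemble strategy 3.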

So, how exactly do an interpreter and a compiler differ?

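Interpretation and compilation can share a large part of their pipeline, for example lexical analysis, parsing, intermediate code, and assembly. But compared with directly compiled code, interpreted code always has one extra software abstraction layer between it and the lowest-level machine code. Even when an interpreter ultimately emits machine code, it is machine code for that software abstraction layer.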

Both compilers and interpreters generally turn source code (text files) into tokens, both may (or may not) generate a parse tree, and both may generate immediate instructions (for a stack machine, quadruple code, or by other means).

The basic difference is that a compiler system, including a (built in or separate) linker, generates a stand alone machine code program, while an interpreter system instead performs the actions described by the high level program.

Finally

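This post briefly covered the concepts of scripting languages, runtimes, VMs, and interpreters. The point I want to make is this: scripting languages, as a more efficient means of controlling computers, have changed the traditional idea of programming. Together with virtual machines and interpreters, they gradually free people from the constraints of physical hardware, letting us design languages and software more freely and more efficiently on a software abstraction layer.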
