Dear Eliza, matrix string theory may be viewed as just a variation of BFSS Matrix Theory, although arguably an important one, and the original papers serve as full introductions at the same time.
http://arxiv.org/abs/hep-th/9701025
http://arxiv.org/abs/hep-th/9702187
http://arxiv.org/abs/hep-th/9703030
Some of the few hundred followups deal with some more technical issues.
The last paper in the list above, which is the newest one, should be the most polished one. At the very least, it contains the most detailed treatment of the interactions. There are also a couple of reviews of BFSS Matrix Theory. Some of them dedicate some time to matrix string theory, some of them don't. For example, see
http://arxiv.org/abs/hep-th/9712072
http://arxiv.org/abs/hep-th/0101126
A derivation of BFSS Matrix Theory was given by Seiberg:
http://arxiv.org/abs/hep-th/9710009
M-theory in 11 dimensions may be compactified on a nearly light-like (slightly space-like) circle - which is still consistent. $X^-$ becomes a periodic variable, $X^-\approx X^-+2\pi R$. (This light-cone treatment was automatically used in my paper above but it was Lenny Susskind who took credit for it months later - much ado about nothing. The original BFSS paper was using the "infinite momentum frame".) In the lightlike limit, a Lorentz boost may map the compactification to a compactification of M-theory on a very short spatial circle in 11D Planck units (because the proper length of the nearly light-like circle was tiny) - which is type IIA string theory. Units of momenta along the compact light-like direction become D0-branes.
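The scaling behind this boost argument can be sketched as follows. The symbols $R_s$ (radius of the short spatial circle), $M_P$ (the original 11D Planck mass) and $\tilde M_P$ (the Planck mass of the boosted, rescaled theory) are notation assumed here for illustration and should be checked against Seiberg's paper; the type IIA relations used are the standard ones between the M-theory circle and the string parameters. One sends $R_s\to 0$ while holding the light-cone energy scale fixed,

$$ R_s \tilde M_P^2 = R M_P^2 , $$

so that the string coupling and the string scale behave as

$$ g_s = (R_s \tilde M_P)^{3/2} = R_s^{3/4}\,(R M_P^2)^{3/4} \to 0, \qquad M_s^2 = R_s \tilde M_P^3 = R_s^{-1/2}\,(R M_P^2)^{3/2} \to \infty . $$

In this limit the strings are weakly coupled and infinitely heavy, which is why only the D0-brane degrees of freedom are expected to survive.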
The kinematic regime guarantees that these D0-branes are non-relativistic. They're well-described by the non-relativistic supersymmetric quantum mechanics - the matrix model - which is the dimensional reduction of the 10D supersymmetric Yang-Mills theory to 0+1 dimensions. The gauge group is $U(N)$. It has 16 non-trivial real supercharges.
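To make the model concrete, here is the Hamiltonian in one common convention, as a sketch: normalizations and signs differ between papers, the 11D Planck length is set to one up to numerical factors, and the symbols $\Pi_i$ (conjugate momenta) and $\theta$ (a 16-component real spinor) are notation assumed here.

$$ H = R\,\mathrm{Tr}\left( \frac{1}{2}\Pi_i\Pi_i - \frac{1}{4}[X^i,X^j]^2 + \frac{1}{2}\theta^T\gamma_i[X^i,\theta] \right), \qquad i,j=1,\dots,9 , $$

where $X^i$, $\Pi_i$ and the components of $\theta$ are $N\times N$ Hermitian matrices. Because the commutator of two Hermitian matrices is anti-Hermitian, the potential $-\frac{1}{4}\mathrm{Tr}\,[X^i,X^j]^2$ is non-negative, and it vanishes exactly when all the $X^i$ commute, i.e. when they can be simultaneously diagonalized.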
So one can show that all of the physics of M-theory, if studied in the light-cone gauge, is equivalent to an ordinary non-gravitational matrix model - a quantum mechanical model with matrix degrees of freedom. The eigenvalues of the $X^i$ matrices may be viewed as the positions of the gravitons (or their superpartners) in 11 dimensions; threshold (zero-binding-energy) bound states of several such eigenvalues (which can be proved to exist, a remarkable property of $SU(N)$ supersymmetric quantum mechanics) are gravitons that carry a higher number of units of the quantized light-like (longitudinal) momentum.
All interactions are encoded in the off-diagonal elements of the matrices which are classically zero but whose virtual quantum effects make the eigenvalues interact so that the resulting picture is indistinguishable from 11D supergravity at low energies; much like AdS/CFT, it is an equivalence of a gravitational theory and a non-gravitational one (in some sense, the compact light-like direction $X^-$ of the matrix model is the holographic direction). The model contains black holes and all other expected objects, too: extended branes may be added. The identical nature of gravitons and gravitinos - with the right Bose-Einstein and Fermi-Dirac statistics - appears because the permutation group is embedded into the $U(N)$ gauge group of the quantum mechanical model, and all physical states must therefore be invariant under this $U(N)$, and hence also under $S_N$. The compact M2-branes (membranes) appear most directly because the whole BFSS matrix model may be viewed as a discretization of the M2-brane world volume theory in M-theory - assuming that the world volume coordinates generate a non-commutative geometry. This equivalence may be derived in a straightforward way, especially for the toroidal and spherical topology of the M2-branes. M5-branes are harder to see but they must be there, too.
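For the toroidal membrane, the non-commutative geometry can be illustrated with the $N\times N$ "clock" and "shift" matrices $U$ and $V$, which obey $UV=\omega VU$ with $\omega=e^{2\pi i/N}$ and play the role of the Fourier modes $e^{ix}$ and $e^{iy}$ on a torus whose two coordinates no longer commute. The snippet below is a minimal sketch that only checks this algebra numerically; the helper names (`clock_shift`, `matmul`, `matpow`, `scale`, `close`, `identity`) are made up for the illustration and are not taken from any of the papers.

```python
import cmath

def identity(n):
    """n x n identity matrix as nested lists of complex numbers."""
    return [[1 + 0j if r == c else 0j for c in range(n)] for r in range(n)]

def matmul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def scale(z, A):
    """Multiply every entry of A by the number z."""
    return [[z * a for a in row] for row in A]

def matpow(A, k):
    """k-th power of A by repeated multiplication (k >= 0)."""
    result = identity(len(A))
    for _ in range(k):
        result = matmul(result, A)
    return result

def close(A, B, tol=1e-9):
    """True if all entries of A and B agree within tol."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def clock_shift(n):
    """Clock matrix U = diag(1, w, ..., w^(n-1)), cyclic shift V, and w = exp(2*pi*i/n)."""
    omega = cmath.exp(2j * cmath.pi / n)
    U = [[omega ** r if r == c else 0j for c in range(n)] for r in range(n)]
    # V maps the basis vector e_c to e_(c+1 mod n)
    V = [[1 + 0j if r == (c + 1) % n else 0j for c in range(n)] for r in range(n)]
    return U, V, omega

N = 5
U, V, omega = clock_shift(N)
# Check the defining relation U V = omega V U
print(close(matmul(U, V), scale(omega, matmul(V, U))))
# Check that both matrices are N-th roots of the identity
print(close(matpow(U, N), identity(N)), close(matpow(V, N), identity(N)))
```

The products $U^aV^b$ with $a,b=0,\dots,N-1$ then stand in for the $N^2$ lowest Fourier modes on the torus, and for large $N$ the phase $\omega$ approaches one, so the matrices commute more and more like ordinary functions.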
The BFSS Matrix Theory above gave the first complete definition of M-theory in 11 dimensions (the whole superselection sector of the Hilbert space) that was valid at all energies. It's a light-cone-gauge description where sectors with different values of $p^+ = N/R$ are separated and separately described by the $U(N)$ quantum mechanical models. I forgot to say - to really decompactify the $X^-$ coordinate, one needs to send its radius $R$ to infinity. Because $p^+=N/R$ is fixed (physical momentum), $N$ has to be sent to infinity, too. The infinite-space physics is always obtained as the large $N$ limit of calculations in $U(N)$ matrix models.
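As a worked formula (a sketch; factors of 2 in the definition of the light-cone components vary between references), the light-cone Hamiltonian of a state with transverse momentum $p_i$ and mass $m$ in the sector with $p^+=N/R$ is

$$ H = P^- = \frac{p_i p_i + m^2}{2p^+} = \frac{R}{2N}\left(p_i p_i + m^2\right). $$

For a graviton, $m=0$, this is the kinetic energy of a non-relativistic particle of mass $N/R$, which is how a relativistic 11D dispersion relation can come out of a non-relativistic quantum mechanics. Holding $p^+$ fixed while $R\to\infty$ means $N=p^+R\to\infty$.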
Matrix string theory
One may apply the same derivation to find the matrix models of other superselection sectors besides the 11D vacuum of M-theory, too. These include some (simple) compactifications; the right matrix model isn't known for all compactifications. In particular, matrix models for type IIA string theory and heterotic $E_8\times E_8$ string theory have a very simple form. Instead of a quantum mechanical model i.e. 0+1-dimensional field theory arising from the D0-branes, one ends up with a 1+1-dimensional supersymmetric gauge theory originating from D1-branes of type IIB (an extra T-duality is added to the derivation), compactified on a cylinder, the so-called matrix string theory (although the historically more correct name is "screwing string theory").
In matrix string theory, again, the eigenvalues of the $U(N)$ matrices $X^i$ are interpreted as positions of points on strings in the transverse 8-dimensional space (the two light-like directions are treated separately in light-cone gauge: one of them, $X^+$, is the light-like time and the other, $X^-$, is compactified). Those eigenvalues $X^i_{nn}(\sigma)$ still depend on $\sigma$, the spatial coordinate of the cylinder on which the gauge theory is defined.
However, one may obtain strings of an arbitrary length by applying permutations to the eigenvalues: the length determines the light-like longitudinal momentum $p^+=N/R$ which is quantized because $X^-$ is compactified. All these permutations are allowed because $U(N)$ is gauged as a symmetry in the matrix model. Consequently, perturbative type IIA and HE string theory with arbitrary numbers of strings are defined by an orbifold conformal field theory - a single string propagating on the orbifold $R^{8N}/S_N$, if you wish (with the extra fermionic degrees of freedom, too). The permutations now guarantee not only the indistinguishability of strings in the same vibration states but also the existence of strings with higher values of $p^+$ - it looks like your configuration II on the world volume if you wish (but the path in the spacetime is generic) - as well as the validity of the $L_0=\tilde L_0$ condition in the continuum limit, among other things. Interactions work as expected, too.
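The bookkeeping of long strings can be sketched in a few lines. When $\sigma$ goes once around the cylinder, the $N$ eigenvalues may come back permuted; each cycle of length $n$ in that permutation glues $n$ eigenvalues into one string carrying $p^+=n/R$. The function names below (`cycle_lengths`, `longitudinal_momenta`) and the sample permutation are hypothetical, chosen only for the illustration.

```python
from fractions import Fraction

def cycle_lengths(perm):
    """Cycle lengths of a permutation given as a list, where i is mapped to perm[i]."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        # Follow the cycle that contains `start` until it closes
        length, i = 0, start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        lengths.append(length)
    return lengths

def longitudinal_momenta(perm, R):
    """p^+ = n/R for each string, where n is the length of the corresponding cycle."""
    return [Fraction(n) / R for n in cycle_lengths(perm)]

# N = 6 eigenvalues, boundary condition with cycles (0 1 2)(3 4)(5)
perm = [1, 2, 0, 4, 3, 5]
print(cycle_lengths(perm))            # three strings, of lengths 3, 2 and 1
print(longitudinal_momenta(perm, 2))  # their p^+ values for R = 2
```

The cycle lengths always add up to $N$, so the individual momenta add up to the total $p^+=N/R$ of the sector, whatever the permutation is.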
The perturbative string theories always emerge in the light-cone gauge Green-Schwarz description. In the heterotic case, the $E_8$ groups arise from the fermionic representation of the $E_8$ current algebra: the extra fermions transform in the fundamental representation of $U(N)$, sixteen of them per single Hořava-Witten boundary, i.e. per single $E_8$. The gauge group has to be changed to $O(N)$, and some degrees of freedom (originally Hermitian matrices) become symmetric real tensors of $O(N)$ while others are antisymmetric; see the paper below and its followups:
http://arxiv.org/abs/hep-th/9612198
The main advantage of matrix string theory is that while it may be explicitly shown to agree with type IIA or HE string theory at weak coupling, it provides one with the exact non-perturbative description at any value of the string coupling. In particular, one may see that when the coupling is sent to infinity, matrix string theory reduces to the original BFSS matrix model for M-theory in large 11 dimensions (with an $E_8$ domain wall, in the heterotic case).
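The dependence on the coupling is easiest to see from the action of the 1+1-dimensional gauge theory. In the type IIA case it has the following form, quoted here as a sketch: signs and normalizations are convention-dependent and should be checked against the third paper in the list at the top.

$$ S = \frac{1}{2\pi}\int d\tau\, d\sigma\ \mathrm{Tr}\left( (D_\mu X^i)^2 + \theta^T\gamma^\mu D_\mu\theta + g_s^2 F_{\mu\nu}^2 - \frac{1}{g_s^2}[X^i,X^j]^2 + \frac{1}{g_s}\theta^T\gamma_i[X^i,\theta] \right), \qquad i,j=1,\dots,8 . $$

For $g_s\to 0$ the commutator term is multiplied by $1/g_s^2$, so the matrices $X^i$ are forced to commute; what remains are their eigenvalues together with the permutations acting on them, i.e. the free strings of the orbifold description.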
Similar matrix models exist for type IIB in ten dimensions, too: one needs the maximally supersymmetric $2+1$-dimensional superconformal field theory which became relevant for the BLG construction (which later transmuted to the ABJM membrane minirevolution). The methods of matrix models become more complicated for backgrounds with additional compact dimensions - when compactifying spacetime dimensions (dimensional reduction), one needs to add dimensions to the matrix model ("dimensional oxidation") - and no matrix models are known if more than 5 transverse spacetime coordinates are compactified (which is why we can't define matrix models for phenomenologically interesting compactifications, at least as of 2011).
By the way, a long list of introductory literature about all kinds of string-theoretical topics, most recently updated in 2004, is here:
http://arxiv.org/abs/hep-th/0311044
This post imported from StackExchange Physics at 2014-03-24 03:35 (UCT), posted by SE-user Luboš Motl