We can safely conclude that purely manual key-framing and direct parameterisation are technologies of the past. They were adequate in their time, when limited computer hardware meant that one could not afford to waste clock cycles and had to cut every possible corner to achieve the desired effect. Even the most elementary techniques, such as smooth shading, had to be implemented manually in software. Such techniques are now significantly faster and are implemented directly in the graphics hardware. They are also easily accessible through graphics libraries such as OpenGL and DirectX. Direct parameterisation has an advantage over pure key-framing because it dramatically reduces the amount of data required for the animation. However, frequent undesirable effects due to parameter clashing and the lack of skin deformation mechanisms cause scientists and developers to avoid this technique. The pseudo-muscle-based approach is viewed as little more than a bridge between these earlier approaches and the muscle-based approach.
2.2.1 Muscle-based approach
While the development of the first pure muscle-based facial model by Waters (1987) constituted a significant milestone and was fairly advanced for its time, the model is relatively simple by modern criteria. It represents the skin as a geometric surface with no underlying structure. The deformations are implemented by means of a simple geometric distortion of the surface, which fails to reproduce subtle tissue deformation. Terzopoulos and Waters (1990) alleviate some of the mentioned problems by introducing anatomically-based arrangements of muscle models, along with a physically-based tissue model. This tissue model allows for more realistic surface deformations than the previous attempts. Zhang, Prakash and Sung (2001) model the skin using non-linear spring frames that are able to simulate the dynamics of real skin. The advantage of this approach is that the model does not need to be treated as a continuous surface since each mass point and each spring can be accessed individually.
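The idea of a skin built from individually addressable mass points and springs can be illustrated with a minimal sketch. It is not the model of Zhang, Prakash and Sung (2001): it assumes linear (Hooke) springs rather than non-linear ones, works in two dimensions, and uses a crude velocity-scaling damping term. The function name `step` and all parameter names are hypothetical.

```python
import math


def step(positions, velocities, springs, mass=1.0, damping=0.1, dt=0.01):
    """Advance a 2D mass-spring system by one semi-implicit Euler step.

    positions, velocities: lists of (x, y) tuples, one per mass point.
    springs: list of (i, j, rest_length, stiffness) tuples.
    Assumption: linear springs and a simple per-step velocity damping.
    """
    forces = [[0.0, 0.0] for _ in positions]
    for i, j, rest, stiffness in springs:
        dx = positions[j][0] - positions[i][0]
        dy = positions[j][1] - positions[i][1]
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue  # direction undefined for coincident points
        magnitude = stiffness * (length - rest)  # Hooke's law
        fx, fy = magnitude * dx / length, magnitude * dy / length
        forces[i][0] += fx
        forces[i][1] += fy
        forces[j][0] -= fx
        forces[j][1] -= fy

    new_positions, new_velocities = [], []
    for p, v, f in zip(positions, velocities, forces):
        vx = (v[0] + dt * f[0] / mass) * (1.0 - damping)
        vy = (v[1] + dt * f[1] / mass) * (1.0 - damping)
        new_velocities.append((vx, vy))
        new_positions.append((p[0] + dt * vx, p[1] + dt * vy))
    return new_positions, new_velocities
```

Because every point and spring is an explicit entry in a list, any of them can be inspected or modified between steps, which is the property the paragraph above highlights.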
Improved muscle action control was the goal of Pasquariello and Pelachaud (2001) and Bui (2004). This was achieved by dividing their respective models into a number of areas. For skin simulation purposes, both Pasquariello and Pelachaud (2001) and Bui (2004) departed from the physics-based approach. Although Pasquariello and Pelachaud animated the skin in a realistic way, they did not use physical simulation of muscles and the visco-elastic behaviour of the skin. The two alternative techniques they used to simulate furrows, bulges and wrinkles were bump mapping and physical displacement of vertices.
Alternatively, Bui created wrinkles by displacing the affected vertices along the normal to the direction of muscle action. He also addressed the artefacts that occur on the skin surface when a vertex is under the influence of two or more of Waters’s (1987) vector muscles. Waters’s way of handling this was to add the displacements sequentially. Bui proposed simulating parallelism by calculating the resultant displacement internally, then applying it to the vertex.
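The difference between sequential and resultant (parallel) application of muscle displacements can be sketched as follows. The muscle function `pull` is a hypothetical toy, not the vector muscle formulation of Waters (1987), and the code is not Bui's implementation; it only assumes that a muscle's displacement depends on where the vertex currently is.

```python
import math


def pull(vertex, attachment, strength, radius):
    """Toy muscle (assumption): displacement of `vertex` towards `attachment`,
    with a linear falloff that reaches zero at distance `radius`."""
    dx = attachment[0] - vertex[0]
    dy = attachment[1] - vertex[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist >= radius:
        return (0.0, 0.0)
    scale = strength * (1.0 - dist / radius)
    return (dx * scale, dy * scale)


def apply_sequential(vertex, muscles):
    """Apply each muscle in turn; later muscles see the already moved vertex."""
    x, y = vertex
    for attachment, strength, radius in muscles:
        dx, dy = pull((x, y), attachment, strength, radius)
        x, y = x + dx, y + dy
    return (x, y)


def apply_resultant(vertex, muscles):
    """Evaluate every muscle at the original position, sum the displacements,
    then move the vertex once."""
    total_x = total_y = 0.0
    for attachment, strength, radius in muscles:
        dx, dy = pull(vertex, attachment, strength, radius)
        total_x += dx
        total_y += dy
    return (vertex[0] + total_x, vertex[1] + total_y)
```

In this toy setting the sequential result depends on the order in which the muscles are processed, whereas the resultant form does not.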
Tang, Liew and Yan (2004) introduced a NURBS muscle-based system, defined by three to five control points. Using this system, muscle deformation is achieved by modifying the weighting of these control points. Internally, the weight modification forces the knots to move, which in turn moves the vertices of the model. To enhance realism, the authors attempted to simulate the fatty tissue reaction to deformation by adding control points between the two end control points. A promising anatomical model, offering unique versatility, was also described by Kahler, Haber and Seidel (2001) and Kahler et al. (2002). This model fitted the muscles and calculated the skull mesh based on the face geometry, thus greatly reducing manual intervention. A result of this technique can be seen in Figure 85 (p. 123), where a model of a boy has been automatically adapted to his different ages. The animation is based on a mass-spring system.
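How changing a control point weight reshapes a rational curve, the mechanism behind the NURBS muscle of Tang, Liew and Yan (2004) mentioned above, can be illustrated with a much simpler stand-in. The sketch assumes a rational quadratic Bézier curve with exactly three control points, which is a special case of a NURBS curve; it is not the authors' muscle model, and the function name is hypothetical.

```python
def rational_bezier_point(control_points, weights, t):
    """Evaluate a 2D rational quadratic Bezier curve at parameter t in [0, 1].

    control_points: three (x, y) tuples; weights: three positive numbers.
    Assumption: a stand-in for a general NURBS curve, chosen for brevity.
    """
    if len(control_points) != 3 or len(weights) != 3:
        raise ValueError("exactly three control points and weights are expected")
    # Quadratic Bernstein basis functions.
    basis = ((1.0 - t) ** 2, 2.0 * t * (1.0 - t), t ** 2)
    denom = sum(b * w for b, w in zip(basis, weights))
    x = sum(b * w * p[0] for b, w, p in zip(basis, weights, control_points)) / denom
    y = sum(b * w * p[1] for b, w, p in zip(basis, weights, control_points)) / denom
    return (x, y)
```

With control points (0, 0), (1, 2) and (2, 0), raising the middle weight from 1 to 3 moves the curve point at t = 0.5 from (1, 1) to (1, 1.5), that is, closer to the middle control point, without moving any control point.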
Sifakis et al. (2005) constructed an anatomically accurate facial muscle model, using principles derived from the more general muscle construction work of Teran et al. (2005). Their technique is based on finite element algorithms (for a detailed exposition of finite elements, the reader is referred to Fish and Belytschko, 2007). A significant feature of their model is that its muscle action can interact with the environment, that is, the muscle forces can be combined with external forces such as collision, producing the resultant effect shown in Figure 66 (p. 103).
Mass-spring and finite element algorithms seem to be the two dominant technologies for muscle-based animation today. The two schools of thought co-exist and the superiority of one over the other has yet to be established.
2.2.2 Image-based techniques
Image-based techniques remain the methods of choice in the movie industry, due to that industry’s photorealistic requirements. Motion picture special effects are usually subject to post-production, so quality takes precedence over rendering speed. Examples of sophisticated image-based modelling and animation techniques are those used in the movie The Matrix Reloaded, as described in Borshukov and Lewis (2003) and Borshukov et al. (2003).
Another technique that may be classified as belonging to the image-based group of methods, and that has survived to this day, is so-called ‘blendshape interpolation’. The basic principle consists of a series of static photos and the interpolation between them. Important recent work on blendshape modelling was carried out by Joshi et al. (2003), Lewis et al. (2005) and Deng et al. (2006).
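The interpolation principle behind blendshapes can be sketched in a few lines. The sketch assumes that each static shape is stored as a list of 3D vertex positions with identical vertex ordering and that shapes are combined linearly as offsets from a neutral shape; these representation choices and the function name `blend` are illustrative assumptions, not details taken from the works cited above.

```python
def blend(neutral, targets, weights):
    """Linearly combine blendshape targets.

    neutral: list of (x, y, z) vertices of the neutral shape.
    targets: list of shapes, each a list of (x, y, z) vertices in the same order.
    weights: one number per target; 0 leaves the neutral shape, 1 reaches the target.
    Returns neutral + sum_k weights[k] * (targets[k] - neutral), per vertex.
    """
    if len(targets) != len(weights):
        raise ValueError("one weight per target shape is required")
    result = []
    for index, base in enumerate(neutral):
        vertex = list(base)
        for target, weight in zip(targets, weights):
            for axis in range(3):
                vertex[axis] += weight * (target[index][axis] - base[axis])
        result.append(tuple(vertex))
    return result
```

For example, with a single target and a weight of 0.5, every vertex lands halfway between its neutral and target positions; animating the weight over time produces the in-between frames.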
Joshi et al. (2003) designed a method for automating the blendshape segmentation, greatly reducing the amount of manual work required. Due to the complexity of the human face, the blendshapes have to be segmented into smaller regions. Lewis et al. (2005) presented a new algorithm for solving the problem of blendshape interference. This undesirable effect appears when two interacting parameters are individually adjusted, and subsequently interfere with one another throughout the process.
Deng et al. (2006) describe a semi-automatic method for cross-mapping facial data to pre-designed blendshape models. They also improve on the blendshape weight-solving algorithm.
2.2.3 Performance-driven methods
In performance-driven animation there are several distinguishable data acquisition research problems. Regarding implementation, the research problems are ‘shared’ between image- and geometry-based animation techniques. Here it should be mentioned that performance-driven techniques sometimes still use key-frames and interpolation. One of the main reasons for this is the fact that the large amount of key-frame data, which would be a major issue if constructed manually, can now be derived via an automatic acquisition method.
At the top of the list of the abovementioned performance-related problems is the issue of perfecting the method of capturing the performance data, that is, reducing human intervention to a minimum, diminishing or eliminating face markers and reducing the need for custom acquisition hardware. Borshukov et al. (2003) use an optical flow and photogrammetric technique to record a live actor’s performance. Optical flow refers to a technique of tracking each pixel in time using multiple cameras. The spatial position of each pixel can later be determined using triangulation. Blanz et al. (2003) combine image- and geometry-based technologies to augment the performance by simulating motion that has not yet been performed.
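The triangulation step mentioned above can be sketched for the two-camera case. The sketch assumes calibrated cameras, so that each tracked pixel has already been converted into a viewing ray (camera centre plus direction) in a common world frame, and it uses the midpoint method, which is one of several possible triangulation schemes and not necessarily the one used by Borshukov et al. (2003). The function name is hypothetical.

```python
def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two viewing rays c1 + s*d1 and c2 + t*d2.

    Returns the midpoint of the shortest segment joining the two rays.
    Assumption: ray origins and directions are given in the same world frame.
    """
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; no unique closest point")
    s = (b * e - c * d) / denom  # parameter of the closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of the closest point on ray 2
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((u + v) / 2.0 for u, v in zip(p1, p2))
```

With noisy measurements the two rays rarely intersect exactly, which is why the midpoint of their closest approach is returned rather than an exact intersection.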
Zhang et al. (2004) designed a system using several video cameras positioned around the subject (performer) at an angle. No facial markers were used, so the footage was also suitable for texture and lighting purposes. Video cameras are relatively inexpensive and non-intrusive acquisition hardware. Once the videos had been produced, the computer derived the geometry of the subject using machine vision techniques. Zhang et al. (2006) also combined image- and geometry-based technologies, but for the purpose of simulating subtle facial details – such as wrinkles – that cannot be identified through performance.
Gutierrez-Osuna et al. (2005) created an interesting mixture of existing approaches in their performance-driven audio/visual synthetic system. The generic model contained a number of polygons with identified (standardised) MPEG-4 facial points (FPs). Facial expressions were achieved using muscle actions, each of which conformed to MPEG-4 FPs. Although the model represented all of the ‘anatomy’ of a muscle-based system (mass-spring-based muscles, skull and jaw), the animation was not a free Newtonian physics system. The forces that acted on the muscles were compiled or defined in such a way that they conformed to MPEG-4 FPs.
There is an increasing trend towards the use of machine learning techniques in data-driven approaches to computer graphics, and in particular to facial animation (Hertzmann, 2003). Notable here is the work of Steinke, Schölkopf and Blanz (2005), which presents the use of support vector machine algorithms for 3D shape processing. One of the case studies concerns the reconstruction of scans of human faces.
2.3 Future developments and conclusion