Results are notoriously difficult to obtain in the field of knowledge based systems, particularly where experts are involved, as it is typically not possible to run long trials, investigate the opinions of multiple experts, or allow for multiple runs, simply because expert time is by definition valuable and hard to come by. Given these constraints, it is necessary to draw what insights we can from the data we can collect. To this end, every expert action in the system is logged, with all relevant parameters recorded in a database for later evaluation. In addition to this logged data, some insights can be gleaned from examining the resultant knowledge base structure itself. Between these two sources of information, and in light of the MCRDR methodology, it is possible to determine at least approximate measures for a good range of the relevant factors.
The first prototype was trained on 129 high-risk medication review cases from nursing homes by Dr. Tenni, while the second prototype was trained on 244 home medicine review cases, which are in a slightly lower risk category. It is unfortunate that neither trial could be allowed to run longer, but limitations in funding and expert time made it necessary to end the study at this point. Consequently, all insights and conclusions must be treated with some scepticism, as some observations may well reflect random variation rather than true trends.
3.4.1 Growth of the Knowledge Base – Rules per Case
By measuring the number of rules created for each case it is possible to determine the rate at which the knowledge base is growing. Typically this rate will be quite high early on, with a gradual slow-down as the knowledge base progressively encapsulates a greater percentage of the domain. Clearly, if the knowledge base encompassed the entirety of a domain it would stop growing, as the expert would never need to define new rules or exceptions, since the system would always provide the correct answers. In practice, for complex domains this point is never reached: past studies and simulations show that growth levels off once the system is providing a very high level of accuracy, but some rare cases still require new knowledge. It can be seen in Figure 3-2 that both knowledge bases did, perhaps, appear to reach the beginnings of this slow-down period, where the growth rate begins to plateau, although the data is simply not complete enough to draw this conclusion with any certainty. What may be interesting here, however, is that the first prototype reached this point at around the 100 case mark, whilst the second prototype reached it closer to the 200 case mark. It has already been noted that the second prototype was using marginally “simpler” cases, which were expected to have slightly fewer classifications overall, but this should have little bearing on when the system’s growth rate began to slow. One likely explanation for this phenomenon is that the second prototype, being a substantially improved domain model, allowed the expert to define a far wider range of problems, which encouraged them to tackle a larger portion of the overall domain of medication review than was covered by the first prototype.
Another possible cause for this could be that the first expert was defining more general rules, which have been shown in simulation studies to learn more rapidly, but also have a longer slowing-down period. This might also account for the fact that the first prototype did appear to learn at a faster rate than the second prototype, although this again might simply be because the cases inherently had more classifications to provide. A closer look at other aspects will tell us more about which causes are more or less relevant.
Figure 3-2 The growth charts of both knowledge bases (total rules against cases seen, for the first and second prototypes).
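The rules-per-case measure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the tooling used in the study: the function names, the window size and the example counts are all hypothetical, and the only assumption is that the log yields one "rules created" count per case, in the order the cases were seen.

```python
from itertools import accumulate


def cumulative_rules(rules_per_case):
    """Running total of rules after each case, i.e. the curve of a growth chart."""
    return list(accumulate(rules_per_case))


def windowed_growth_rate(rules_per_case, window):
    """Mean number of rules added per case over consecutive windows of cases."""
    if window <= 0:
        raise ValueError("window must be positive")
    rates = []
    for start in range(0, len(rules_per_case), window):
        chunk = rules_per_case[start:start + window]
        rates.append(sum(chunk) / len(chunk))
    return rates


# Hypothetical log (not study data): rules created for eight consecutive cases.
example = [4, 3, 3, 2, 1, 1, 0, 1]
totals = cumulative_rules(example)
rates = windowed_growth_rate(example, window=4)
```

A falling sequence of windowed rates corresponds to the flattening of the growth curve; with so few cases per window, individual values should be read with the same caution as the charts themselves.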
3.4.2 Specificity of the rules – Conditions per Rule
In order to determine whether one expert was creating more general rules than the other, it is possible to look at the number of conditions each expert was using per rule. This can only be a rough measure of the specificity of the rules, as the experts are not likely to be creating the same rule at the same time, and because the second prototype was designed to allow both greater specificity and greater generality in a given condition. However, since both experts are creating rules to cover the same domain, it is expected that they will ultimately create similar rule sets. As such, the expert with the lower average number of conditions per rule can fairly reliably be said to be creating more general rules than the other, and the difference between the averages gives some indication of how much more general.
Although it is difficult to see in Figure 3-3, it was found that the first expert was creating somewhat more general rules, with an average of 1.79 conditions per rule compared to the second expert’s average of 2.29. This is consistent with the observation that the first prototype learned faster, and also offers some explanation as to why the second prototype remained in a heavy learning phase for so much longer. With further evaluation it should become clear whether or not this is because the second expert was attempting to encapsulate a larger portion of the medication review domain.
Figure 3-3 Conditions per rule (number of conditions against rule number, for the first and second prototypes).
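The conditions-per-rule comparison can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each rule can be represented as a list of its conditions; the rule contents and the function name below are hypothetical and are not taken from either knowledge base.

```python
from statistics import mean


def mean_conditions_per_rule(rules):
    """Average number of conditions per rule; each rule is a list of conditions."""
    if not rules:
        raise ValueError("no rules to average")
    return mean(len(conditions) for conditions in rules)


# Hypothetical rule sets (not study data), one list of conditions per rule.
general_rules = [["age > 80"], ["on_warfarin", "on_aspirin"], ["egfr < 30"]]
specific_rules = [["age > 80", "on_warfarin"],
                  ["egfr < 30", "on_metformin", "dose > 1000"]]

general_avg = mean_conditions_per_rule(general_rules)
specific_avg = mean_conditions_per_rule(specific_rules)
```

Under this measure the rule set with the lower average is the more general one, subject to the caveats about comparability already noted.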
3.4.3 Accuracy of the system – Correct classifications provided
Although standard measures of accuracy cannot be determined for these MCRDR systems, as there are no further classified test cases to evaluate the knowledge bases against, it is still possible to determine a measure of accuracy. This can be done under the assumption that once the expert has finished evaluating a case it is correctly classified. A measure of accuracy can then be found by calculating how many correct classifications were provided by the system and dividing this figure by the total number of classifications the case had when the expert had finished with it. The equation for this can be seen in Equation 1, where Cf is the number of classifications found, Crem is the number of classifications removed, Crep is the number of classifications replaced, and Ra is the number of rules added for a given case.
This approach to calculating accuracy operates under the assumption that every classification the expert states is wrong, and every classification the expert states is missing, is a system “error”, while every classification the system provides which is still present when the expert has finished assessing the case is correct. With this formula we are comparing the number of correct classifications the system found with the number of classifications the expert has asserted the system should have found. By then averaging the accuracy results across a period, it is possible to see a rough indication of how many correct classifications the system is providing at stages throughout the training process.
This is slightly different information from what might be garnered by simply looking at the number of rules being added, since it takes into consideration the number of classifications each individual case should have. The alternative might be to argue that a case is classified incorrectly unless it has all, or a specific portion, of the correct classifications applied to it before expert intervention. This approach would seem unreasonable though, as the system in this case is trying to detect potentially multiple distinct problems, rather than components of a single problem.
Equation 1 Calculating the accuracy of the system's suggestions.
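The accuracy measure described above can be sketched in Python as follows. This is one reading of the prose description and should not be taken as Equation 1 itself. In particular, the `added` parameter is an assumption: it stands for the number of classifications the expert added because they were missing, which may or may not coincide with Ra depending on whether Ra also counts rules created for removals and replacements. All function names and example values are hypothetical.

```python
def case_accuracy(found, removed, replaced, added):
    """Fraction of a case's final classifications that the system got right.

    found    -- classifications the system provided (Cf)
    removed  -- classifications the expert removed (Crem)
    replaced -- classifications the expert replaced (Crep)
    added    -- classifications the expert added as missing (assumed mapping
                from Ra; not confirmed by the source)
    """
    correct = found - removed - replaced
    # Final set: surviving correct ones, the replacements, and the additions.
    final_total = correct + replaced + added
    if final_total == 0:
        return None  # accuracy is undefined when the case ends with no classifications
    return correct / final_total


def mean_accuracy_by_period(accuracies, period):
    """Average per-case accuracy over consecutive periods, skipping undefined cases."""
    if period <= 0:
        raise ValueError("period must be positive")
    averages = []
    for start in range(0, len(accuracies), period):
        chunk = [a for a in accuracies[start:start + period] if a is not None]
        averages.append(sum(chunk) / len(chunk) if chunk else None)
    return averages


# Hypothetical cases (not study data).
first = case_accuracy(found=5, removed=1, replaced=1, added=2)   # 3 correct of 6
second = case_accuracy(found=4, removed=0, replaced=0, added=0)  # 4 correct of 4
by_period = mean_accuracy_by_period([first, second], period=2)
```

Averaging per case, rather than pooling all classifications, gives each case equal weight regardless of how many classifications it carries; pooling would be an equally defensible choice under the same assumptions.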