Now that the structure and content of the training data have been understood, the ontology and the RST components can be implemented in the system. This section explains the steps taken to transform the free-form reports into a structured form. Figure 6.1 summarises these steps.
Figure 6.1: The steps in transforming free-form reports to structured form
Pre-processing the Reports
The first step in transforming the free-form reports into a structured form is to pre-process them to remove the many obvious errors introduced by the radiologists, as mentioned in Chapter 5. This was done by manually going through the reports and correcting the errors. Once this has been completed, the reports are submitted to the system to be transformed into a structured form. After submission, the system automatically performs a second pre-processing phase in which the main titles of the reports, such as “US Abdomen” and “Ultrasound: Abdomen”, are removed, because the produced reports will be assigned a standardised title.
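The automatic title-removal phase can be illustrated with a minimal sketch. The regular expression and the function name `strip_main_title` are assumptions made for illustration; they cover only the two title variants quoted above and are not the system's actual implementation.

```python
import re

# Hypothetical pattern: "US" or "Ultrasound", an optional colon, the word
# "Abdomen", and an optional trailing colon. Only the two variants quoted
# in the text are covered.
TITLE_PATTERN = re.compile(
    r"^\s*(?:US|Ultrasound)\s*:?\s*Abdomen\s*:?\s*",
    flags=re.IGNORECASE,
)


def strip_main_title(report: str) -> str:
    """Remove a leading main title so a standardised one can be assigned later."""
    return TITLE_PATTERN.sub("", report, count=1)


print(strip_main_title("US Abdomen : Normal examination."))
# prints: Normal examination.
```

A report that does not begin with one of these titles is returned unchanged.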
Annotating and Splitting the Reports
Subsequently, relevant words in the reports are annotated with classes from AUO. The annotation uses the same method as presented in Subsection 5.5.1: the paragraphs in the report are split into single words, which are then combined into two- and three-word combinations before being compared with the relevant classes from AUO. The relevant classes are those that appear in the rules for identifying rhetorical relations in ultrasound reports, as described in Section 5.4. When all the words have been annotated, they are merged back into a paragraph, which is then split at each sentence boundary, such as a full stop or a colon, to create a list of annotated sentences. The reports are split into single sentences because the system must recognise which type of information each sentence contains in order to know how to process it.
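The two operations described above, matching one- to three-word combinations against ontology classes and splitting at full stops and colons, can be sketched as follows. The dictionary `LEXICON`, the class label "Organ" and both function names are hypothetical stand-ins for AUO and the system's own code; the longest-match-first strategy is also an assumption.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical mini-lexicon standing in for AUO: phrase -> class label.
LEXICON: Dict[str, str] = {
    "liver": "Organ",
    "bile ducts": "Organ",
    "head of pancreas": "Organ",
}


def annotate(text: str, lexicon: Dict[str, str],
             max_n: int = 3) -> List[Tuple[str, Optional[str]]]:
    """Label word combinations of up to max_n words, trying the longest first."""
    words = text.split()
    annotated: List[Tuple[str, Optional[str]]] = []
    i = 0
    while i < len(words):
        for n in range(min(max_n, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            key = phrase.lower().strip(".,:;()")
            if key in lexicon:
                annotated.append((phrase, lexicon[key]))
                i += n
                break
        else:
            # No combination starting at this word matched the lexicon.
            annotated.append((words[i], None))
            i += 1
    return annotated


def split_sentences(text: str) -> List[str]:
    """Split after a full stop or colon that is followed by whitespace."""
    return [s for s in re.split(r"(?<=[.:])\s+", text.strip()) if s]


print(split_sentences("Normal pancreas, aorta and spleen. Conclusion: Normal examination."))
# prints: ['Normal pancreas, aorta and spleen.', 'Conclusion:', 'Normal examination.']
```

Requiring whitespace after the full stop keeps decimal measurements such as "12.6cm" intact.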
Identifying the Sentence Type
As mentioned in Subsection 6.3.1, four different types of information were recognised in the sample ultrasound reports. The next step is therefore to identify which of these types each sentence in the reports belongs to. A sentence can be classified as clinical history, finding / observation, conclusion or further management. The classification uses signal words, AUO or both, as well as the identification of RST relations. Since the clinical history information was always implicit, it can only be identified using a list of signal words such as “previous” and “history”, together with the AUO class “Month of the Year”.
Conclusions, on the other hand, were all explicit; they can therefore be identified using the PREPARATION relation, where they appear under the title “Conclusion:” or “Comment:”. A RESTATEMENT relation can also be used to identify conclusions written as the “Main or Principal Diagnosis:”. In contrast, sentences of the type further management can be identified using cue words such as “advise”, “suggest”, “consider” and “recommend”. Since this information is also sometimes written explicitly, it can also be identified using the PREPARATION relation, where it appears under the title “Further Management:”. Table 6.2 lists several examples of the signal words and titles in the PREPARATION relation and the type of information they signal. All sentences of these three types are extracted from the free-form report and moved under suitable headings in the structured report. All remaining sentences that do not fall under these three types are regarded as finding / observation and are processed further.

Type of Information     Signal Word / Title
Clinical history        previous, history, “Month of the Year” class
Conclusion              “Conclusion:”, “Comment:”, “Main or Principal Diagnosis:”
Further management      advise, suggest, consider, recommend

Table 6.2: The different types of information and their signal words
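A minimal sketch of this classification step is given below. It rests on several assumptions that the text does not specify: a title applies to every sentence that follows it until the next title, cue words are matched as word prefixes, further management cues are checked before clinical history cues, and Python's `calendar.month_name` stands in for the AUO class “Month of the Year”. The function name `classify` and the first two example sentences are invented for illustration.

```python
import calendar
import re
from typing import List, Tuple

# Titles that signal a PREPARATION (or RESTATEMENT) relation -> information type.
TITLES = {
    "conclusion:": "conclusion",
    "comment:": "conclusion",
    "main or principal diagnosis:": "conclusion",
    "further management:": "further management",
}
HISTORY_WORDS = {"previous", "history"}
MANAGEMENT_WORDS = ("advise", "suggest", "consider", "recommend")
# Stand-in for the AUO class "Month of the Year" (an assumption).
MONTHS = {m.lower() for m in calendar.month_name if m}


def classify(sentences: List[str]) -> List[Tuple[str, str]]:
    """Assign each sentence one of the four information types."""
    result = []
    current_title = None
    for sentence in sentences:
        key = sentence.strip().lower()
        if key in TITLES:
            # Assumed: the title governs all sentences until the next title.
            current_title = TITLES[key]
            continue
        tokens = re.findall(r"[a-z]+", key)
        if current_title is not None:
            label = current_title
        elif any(t.startswith(MANAGEMENT_WORDS) for t in tokens):
            label = "further management"
        elif any(t in HISTORY_WORDS or t in MONTHS for t in tokens):
            label = "clinical history"
        else:
            label = "finding / observation"
        result.append((sentence, label))
    return result


example = [
    "Previous cholecystectomy in March.",       # invented example sentence
    "Suggest CT for further evaluation.",       # invented example sentence
    "Normal liver echo pattern with no focal lesion demonstrated.",
    "Conclusion:",
    "Normal examination.",
]
for sentence, label in classify(example):
    print(label, "|", sentence)
```

Plain token matching has obvious limits: the modal verb "may", for instance, would be taken for a month, which is one reason an ontology-based annotation with context is preferable to a bare word list.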
Applying RST on the Findings and Observations
The aim of applying RST in the transformation process is to group the findings in the reports according to the area examined under the “Findings / Observations” heading, as shown in Figure 6.2. For example, if the sonographer has examined the liver, pancreas and spleen of the patient, the findings and observations should be recorded per area examined instead of being written in a single paragraph. This is achieved by identifying three of the seven rhetorical relations, namely the JOINT, LIST and ELABORATION relations, in the findings and observations.
The JOINT relation is used to separate several examined areas that were initially reported in one sentence into separate sentences. It was the most frequently used relation in transforming the free-form reports into structured reports. For example, one of the sample reports stated that “The kidney, spleen and pancreas is normal”. In this example, three areas were clearly examined: “kidney”, “spleen” and “pancreas”. When RST is applied to this sentence, it recognises that a JOINT relation exists between the three organs because of the cue word “and” and the commas between the organs, even though the sentence is grammatically incorrect (the radiographer wrote “is normal” where it should be “are normal”). This allows the system to separate the three organs into separate sentences without losing their observation.
The existence of the JOINT relation tells the system to produce three different sentences, each starting with one of the three organs and all sharing the same observation, “is normal”. This produces the three sentences “The kidney is normal”, “Spleen is normal” and “Pancreas is normal”. The JOINT relation only separates a sentence into several sentences when more than one area is being reported; if there is only one, the sentence is not separated. For example, it was reported that “The liver has smooth contour and normal echogenicity”. Even though the word “and” signals a JOINT relation, the sentence reports on only one area, the liver, so it is not split into two.
In a sentence that contains the word “and” but reports two organs and two observations, no JOINT relation is identified under the rules defined in Section 5.4. Instead, the sentence is segmented into two text spans that have a LIST relation between them, and is therefore reproduced as two sentences that do not share an observation. An example is the sentence “The gallbladder wall is very thickened and the liver appear prominent”. Two areas are reported in this sentence, the gallbladder and the liver. However, because there are also two observations (“very thickened” and “appear prominent”), the system recognises the sentence as having a LIST relation and separates it into two sentences, each with its own observation. The decision whether or not to separate a finding into more than one sentence is summarised in the flowchart in Figure 6.3.
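The decision between not splitting, a JOINT split and a LIST split can be sketched as below. This is a simplified illustration, not the rule set of Section 5.4: the list `ORGANS` is a hypothetical stand-in for the AUO organ classes, the function name `split_finding` is invented, and only sentences in which the organs precede the observation are handled.

```python
import re
from typing import List

# Hypothetical stand-in for the organ classes in AUO.
ORGANS = ["kidney", "spleen", "pancreas", "liver", "gallbladder"]


def mentions(text: str, organs: List[str]) -> List[str]:
    """Return the organs that occur as whole words in the text."""
    return [o for o in organs
            if re.search(rf"\b{re.escape(o)}\b", text, re.IGNORECASE)]


def split_finding(sentence: str, organs: List[str] = ORGANS) -> List[str]:
    text = sentence.strip().rstrip(".")
    if len(mentions(text, organs)) < 2:
        return [sentence.strip()]  # only one area reported: never split

    # JOINT: an enumeration of organs followed by one shared observation.
    alt = "|".join(re.escape(o) for o in organs)
    joint = re.match(
        rf"^(?:the\s+)?((?:{alt})(?:\s*,\s*(?:{alt}))*\s+and\s+(?:{alt}))\s+(.+)$",
        text,
        re.IGNORECASE,
    )
    if joint:
        areas = re.split(r"\s*,\s*|\s+and\s+", joint.group(1))
        observation = joint.group(2)
        return [f"{area.capitalize()} {observation}." for area in areas]

    # LIST: each clause around "and" reports its own area and observation.
    clauses = re.split(r"\s+and\s+", text)
    if len(clauses) > 1 and all(mentions(c, organs) for c in clauses):
        return [c[0].upper() + c[1:] + "." for c in clauses]

    return [sentence.strip()]


print(split_finding("The kidney, spleen and pancreas is normal"))
# prints: ['Kidney is normal.', 'Spleen is normal.', 'Pancreas is normal.']
print(split_finding("The gallbladder wall is very thickened and the liver appear prominent"))
# prints: ['The gallbladder wall is very thickened.', 'The liver appear prominent.']
```

The single-area sentence “The liver has smooth contour and normal echogenicity” passes through unchanged, because the organ count is checked before any cue word is considered.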
Figure 6.3: Separating a finding based on JOINT and LIST relation

Another relation used in the transformation process is the ELABORATION relation. This relation ensures that no extra information is lost when a sentence is separated from or joined with another sentence. For example, consider the sentence “Normal appearance of spleen (measuring 12.6cm), head and body of pancreas and aorta (measuring 1.4cm inner to inner)”. This sentence covers three areas: the spleen, the head and body of the pancreas, and the aorta. There is only one observation, “normal appearance”. The JOINT relation that exists between the three areas separates the sentence into three: “Normal appearance of spleen”, “Normal appearance of head and body of pancreas” and “Normal appearance of aorta”. Without the ELABORATION relation, it would be hard to retain the measurements of the organs, which is why this relation is also needed in the transformation process. It allows the sentence to be separated without losing any information, so that the two sentences that carry measurements become “Normal appearance of spleen (measuring 12.6cm)” and “Normal appearance of aorta (measuring 1.4cm inner to inner)”.
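How an elaboration can stay attached to its area during a JOINT split is sketched below. The sketch assumes that the elaboration is a parenthetical immediately following the area, as in the example, and that the area names are known in advance; the list `AREAS` and the function name `split_with_elaboration` are hypothetical.

```python
import re
from typing import List

# Hypothetical stand-in for AUO area classes. Multi-word areas are listed so
# that the "and" inside "head and body of pancreas" is not taken as a separator.
AREAS = ["head and body of pancreas", "spleen", "aorta"]


def split_with_elaboration(sentence: str, observation: str,
                           areas: List[str] = AREAS) -> List[str]:
    """Split a shared observation over its areas, keeping each parenthetical."""
    body = sentence.strip().rstrip(".")
    if not body.lower().startswith(observation.lower()):
        return [sentence.strip()]
    rest = body[len(observation):]
    # Try longer area names first so that they win over shorter ones.
    alt = "|".join(re.escape(a) for a in sorted(areas, key=len, reverse=True))
    pattern = re.compile(rf"({alt})(\s*\([^)]*\))?", re.IGNORECASE)
    result = []
    for match in pattern.finditer(rest):
        area = match.group(1)
        elaboration = match.group(2) or ""  # empty when no parenthetical follows
        result.append(f"{observation} {area}{elaboration}")
    return result or [sentence.strip()]


report_sentence = ("Normal appearance of spleen (measuring 12.6cm), head and body "
                   "of pancreas and aorta (measuring 1.4cm inner to inner)")
for line in split_with_elaboration(report_sentence, "Normal appearance of"):
    print(line)
# prints:
# Normal appearance of spleen (measuring 12.6cm)
# Normal appearance of head and body of pancreas
# Normal appearance of aorta (measuring 1.4cm inner to inner)
```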
Displaying the Structured Report
When all the sentences in the free-form report have been grouped under suitable headings, the complete structured report is displayed to the radiologists so that they can verify its accuracy. Figure 6.4 shows the structured report generated for the following free-form report:
“US Abdomen : Normal liver echo pattern with no focal lesion demonstrated. No evidence of gall stones or dilatation of the bile ducts. Both kidneys are normal in size and echo pattern with no mass lesion or evidence of obstruction. Normal pancreas, aorta and spleen. Conclusion: Normal examination.”