UCR. Progress report - 1999



The main preconditions for developing information technology for the Ukrainian population-based cancer registry were the existing state system of cancer registration and the oncological service in Ukraine, which comprises a number of oblast, city and regional oncological clinics or dispensaries. The state system of cancer registration in Ukraine (based on paper files) has been functioning since 1932. At present cancer patients are registered by 46 oncological dispensaries, including 25 oblast dispensaries.

The basic principle of cancer registration is that all oncological information concerning a case of the disease is aggregated in the regional oncological establishment at the patient’s place of residence.

All medical documents about a cancer patient have to be sent to the oncological dispensary at the patient’s place of residence, irrespective of where the diagnosis was made or where the patient was treated.

The basic sources of the information are the following medical registration documents:

Thus the basic registration document was the “Notification...”, on the basis of which the annual cancer incidence reports in Ukraine were compiled.

On receiving a “Notification...” or “Abstract...”, an oncological dispensary had to fill in a “Control card” for the patient, except for patients diagnosed after death and so-called departmental patients.

During the transition to computerised processing of oncological patients’ data, a number of shortcomings of the existing paper technology were revealed and ways of improving it were identified.

  1. Absence of a single registration document for a patient with a malignant neoplasm. Using the “Notification...” as the registration document results in over-registration, because every medical establishment in contact with the patient filled it in, and did so for each tumour diagnosed. There were cases of up to 5 copies of the “Notification...” for one patient (in the Odessa area). “Control cards” were filled in only for patients under the supervision of the oncological dispensary, and not for those diagnosed after death or for departmental patients.

  2. The existing medical registration documents did not provide for recording the detailed clinical diagnosis, which was frequently reduced to an ICD code (without specification of the topography and morphology of the tumour). The list of morphological types subject to registration, as authorised by the Ministry of Health of Ukraine, was restricted to 25 rather general terms (!).

  3. Registration documents were, as a rule, filled in by physicians who were not familiar with the basic ideas and principles of ICD coding. Manual coding of diseases frequently produces many mistakes (even experienced coders make from 35% up to 48% errors, according to a study done in the USA).

  4. The official cancer incidence information of the Ministry of Health of Ukraine is based only on the data of the first year of the account. But for a territory as extensive as Ukraine it is not possible to receive the information on all new cases within one year. As a result the official incidence data were incomplete and were never subsequently refined (because this could not be done manually every year).

  5. Official cancer mortality data for the population of Ukraine are based on the data of the State Statistic Bureau, which also used only the first year of the account. Moreover, the basic principle of cancer registration is the accumulation of all information about the patient in the oncological dispensary at the place of residence, whereas the State Statistic Bureau registers deaths at the place of burial. Thus, if a patient from, for instance, Luganskaya oblast dies on the territory of Donetskaya oblast, the case is counted in the State Statistic Bureau data under Donetskaya oblast (instead of Luganskaya, as would be appropriate for calculating cancer mortality). Every year we find a difference of about 5-6 thousand patients between the official cancer mortality data and the mortality data of the cancer registry, owing to those who died and were counted in other oblasts. (Do not forget that these are only the data of the first year of the account.) In addition, large discrepancies in cause-of-death codes are observed.

  6. There were no precise recommendations on the registration of multiple neoplasms, which resulted in over-registration of cases.

  7. Low efficiency of manual search for duplicates.

Naturally, the shortcomings identified became a matter for discussion in the HM. They were taken into account in developing an improved information technology for cancer registration. In developing the new computerised information technology (IT) we also proceeded from the specific features of the present situation in Ukraine, namely understaffing of the cancer registries, frequent staff changes, the impossibility of regular training, etc. Therefore the new IT was assigned control functions to supervise the observance of the existing international standards in cancer registration.

It is evident that for a territory as extensive as Ukraine, with a large number of cases registered annually (about 160 thousand) and a large contingent of patients (about 750 thousand), a centralised cancer registry is not reasonable. A distributed principle of cancer registration, data updating and data quality checking was therefore chosen. Information on cancer cases is collected in oblast-level cancer registries in the oblast oncological dispensaries. After quality control and updating, the data are transferred to the central registry at the Ukrainian Research Institute of Oncology and Radiology. The large oblasts with a population of 3-5 million (Dnepropetrovskaya, Donetskaya and Lvovskaya) have their own regional distributed structure. The universal software can be adjusted to the regional oncological structure, and the uniformity of the codificators makes it possible to integrate the initial cancer data at the state level. At present the new IT has been introduced in 20 oblast centres and 3 more are in the process of introduction. The latest state of the integrated database is shown on a slide. ( DIS_ENTR )

How did we address the indicated shortcomings of the established state system for registering oncological information?

  1. A formalised “Registration card for patient with malignant neoplasm” (RC) was developed and approved for use by the HM in 1998 ( KONTR_E2 ).

  2. The card has to be filled in for all cancer patients resident on the oblast’s territory (those under follow-up in the oncological dispensary, as well as departmental patients and those diagnosed after death). The “Notification...”, “Abstract...” and other registration documents serve an informative function and supplement the RC ( SL1 ).

  3. Description of the topography and morphological type of the tumour in the RC is made obligatory.

  4. Computerised coding of the diagnosis in ICD and ICD-O codes has been introduced.


The diagnosis is the major statistical unit for calculating incidence, mortality and other related rates. Hence the presentation and coding of the diagnosis in a cancer registry require special accuracy. Software support for this procedure is useful, and indeed necessary if coders are not sufficiently qualified. It is known that even with the most careful coding of diagnoses the error rate is quite substantial (in some cases as much as 38%, according to researchers from the USA).

We have to indicate the main reasons for introducing automated ICD coding in our country:

In keeping with the purposes of cancer registration, cancer registries must comply with certain standards of data presentation.

The programmes initiated by the WHO to ensure comparability of incidence data led to the development of the ICD and of a special adaptation of the ICD for oncology (ICD-O). The ICD-O allows more detailed coding of oncological diagnoses, recording the site, morphology and behaviour of each tumour.

An oncological diagnosis is quite easy to structure and abstract, which substantially facilitates working with it in a computerised system. It is always possible to identify, directly or indirectly, its topography (primary site) and morphology (histological type) components.

This makes it possible to present each oncological diagnosis as a structured combination of specific values for each of these characteristics, and to give each characteristic its own place in the diagnosis file of the cancer registry database. The specific value of each component of the diagnosis is selected from a defined set of values admissible for that characteristic.

It often happens, however, that clinical diagnoses in medical records include rather special wordings which are not present in the ICD-O alphabetical list and whose equivalents in the ICD-O can only be found by a specialist.

Therefore, to enable the coding of clinical diagnoses, special codificators for the UCR system were developed jointly with the histopathologists and clinicians of the URIOR. The topography section of the codificator is a systematised, ICD-oriented list of organs and their parts, anatomical sites, tissues, etc., containing over 1600 units (including synonyms). This section may be used both for adequately detailed coding of the primary site in ICD-O and for solving other tasks of the cancer registry. The topography section has a hierarchical (tree-type) structure, with leaves for the most detailed sites, which are subordinated to more general concepts in the nodes of the tree.

For morphology coding, a quite extensive list of pathological conditions has also been developed, containing various clinical and pathological wordings for oncological diagnoses and tumour-like diseases (over 2000 terminology units).

To computerise the coding of diagnoses by the ICD-O, the next step was to reconcile the UCR codificator lists with the ICD-O nomenclature lists for topography and morphology, taking the synonymy of terms into account. Consider the morphology section of the codificator as an example. Linking the two lists produced a hierarchical tree-type structure with the ICD-O morphology terms and codes in its nodes and all synonymous wordings of the corresponding morphology from the UCR codificator in its leaves (values subordinated to the nodes). The codificator list of sites was treated in a similar way. A program function that moves from a leaf value to the value of the corresponding node makes it possible to computerise the coding of the oncological diagnosis by the ICD-O. ( DIC_1 , DIC_2 )
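The transition from a leaf value to its node can be illustrated with a minimal Python sketch. The data layout, the function name `code_morphology` and the choice of synonyms are assumptions made for illustration only; the two ICD-O codes are standard ones, but the fragment does not reproduce the actual UCR codificator.

```python
# Hypothetical fragment of a morphology codificator: each node holds an
# ICD-O term and code, each leaf is a synonymous wording subordinated to it.
NODES = {
    "8140/3": {"term": "Adenocarcinoma, NOS",
               "leaves": ["adenocarcinoma", "glandular cancer"]},
    "8070/3": {"term": "Squamous cell carcinoma, NOS",
               "leaves": ["squamous cell carcinoma", "epidermoid carcinoma"]},
}

# Invert the tree once: leaf wording -> code of the parent node.
LEAF_TO_NODE = {
    leaf: code for code, node in NODES.items() for leaf in node["leaves"]
}


def code_morphology(wording):
    """Return (ICD-O code, ICD-O term) for a clinical wording, or None."""
    code = LEAF_TO_NODE.get(wording.strip().lower())
    if code is None:
        return None  # unknown wording: left for the operator to resolve
    return code, NODES[code]["term"]


print(code_morphology("Epidermoid carcinoma"))
```

In this layout the operator only ever picks a wording; the code is derived, never typed, which is the property the text relies on.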

The codificators are provided with a special service which ensures correct coding of diagnoses. The service functions are based on the coding rules stated in the ICD-O. We took practically all the rules into account; some of them are described here in more detail. (SL_2)

For instance, some tumours have more than one histological type. The most frequent combinations, singled out in the ICD-O as separate morphology units, are also present in the UCR codificator with individual codes. Hence, when entering the diagnosis it is necessary to check various possible combinations of prefixes or compound terms in order to find a suitable version. For this purpose the UCR codificator offers a contextual search for terms containing a letter combination specified by the operator at the time of input. The contextual search returns the list of all terminology units containing the specified text sub-string in the codificator section being used (for example, the morphology section) (Rule 10).
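A contextual search of this kind amounts to a case-insensitive sub-string filter over the terms of one codificator section. The sketch below is only an illustration under that assumption; the function name `contextual_search` and the short term list are hypothetical.

```python
def contextual_search(terms, fragment):
    """Return all terms containing the fragment, ignoring letter case."""
    needle = fragment.strip().lower()
    if not needle:
        return []  # an empty fragment would match everything; return nothing
    return [term for term in terms if needle in term.lower()]


# Small illustrative excerpt of a morphology section.
MORPHOLOGY_TERMS = [
    "Adenocarcinoma, NOS",
    "Adenosquamous carcinoma",
    "Squamous cell carcinoma, NOS",
    "Carcinosarcoma, NOS",
]

print(contextual_search(MORPHOLOGY_TERMS, "squamous"))
```

Because the match is on any sub-string, the compound term "Adenosquamous carcinoma" is found together with the simple one, which is what makes the search useful for compound morphologies.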

If a compound morphological type is not found in the UCR morphology codificator, multiple primaries are probable. Our cancer registry program can analyse a complex morphological type by its components, prompt the operator on how to register the given diagnosis, and determine the most suitable morphology code for the case (Rule 11).

We believe that a most important advantage of the UCR cancer registration technology is the automatic check of whether several diagnoses in one patient are actually multiple primaries (based on the recommendations made by the IARC and IACR in 1994, used instead of Rule 14).

The ICD and ICD-O codes are functionally dependent: given the ICD-O codes for the topography, morphology and behaviour of a neoplasm, we can always convert them into the ICD code. The program for coding the diagnosis by ICD in the UCR system is based on the ICD-O codes which, in their turn, are logically linked with the codes of the UCR codificators.

The transition to exclusively computerised coding in ICD and ICD-O is now under way. This coding technology preserves the continuity of the data during the transition to the new version, ICD-10. The transition to the new coding system will require only the preparation of a new special file and function associating ICD-10 codes with ICD-O codes, and its integration into the UCR data input program.
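A table-driven conversion of this kind might look like the following Python sketch. The tables are a tiny illustrative fragment and not the UCR association file; the function name `icdo_to_icd10` and the table layout are assumptions. The melanoma entry shows why morphology, and not only site and behaviour, is needed for the conversion.

```python
# Illustrative fragment of a conversion table (an assumption, not the UCR file).
# Key: (3-character ICD-O topography, behaviour digit) -> ICD-10 category.
SITE_BEHAVIOUR_TO_ICD10 = {
    ("C50", "3"): "C50",  # malignant neoplasm of breast
    ("C50", "2"): "D05",  # carcinoma in situ of breast
    ("C50", "0"): "D24",  # benign neoplasm of breast
    ("C44", "3"): "C44",  # other malignant neoplasms of skin
}

# Morphology-specific entries take precedence over the site table.
MORPHOLOGY_OVERRIDES = {
    ("8720", "3", "C44"): "C43",  # malignant melanoma of skin
}


def icdo_to_icd10(topography, morphology):
    """Convert e.g. ('C50.9', '8500/3') to an ICD-10 category, or None."""
    site = topography[:3]
    histology, behaviour = morphology.split("/")
    override = MORPHOLOGY_OVERRIDES.get((histology, behaviour, site))
    if override is not None:
        return override
    return SITE_BEHAVIOUR_TO_ICD10.get((site, behaviour))


print(icdo_to_icd10("C50.9", "8500/2"))
```

With this design, moving to another ICD revision means replacing the two tables while the stored ICD-O codes and the conversion function stay unchanged.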


The automatic control of data quality in the UCR computerised system includes the following items ( SL-3 ):

In addition, the data are analysed using the following methods:

Functions (1) to (4) are applied both at the time of data input and when the total control procedure is run over the registry records. Once it has finished, the check procedure provides an interface for visual review of errors and suspected errors.

Practically all data entered into the UCR database are checked. Naturally, the sets of control functions differ for hospital and population-based registries, though they often coincide.


It is worth describing the checks of type (4) concerning the diagnosis data. They comprise the following items:

- correspondence between behaviour and morphology;

- consistency of neoplasm topography and the patient sex;

- consistency of neoplasm topography and the patient age;

- consistency of neoplasm morphology and the patient sex;

- consistency of neoplasm morphology and the patient age;

- consistency of neoplasm morphology and topography;

- correctness of registering the multiple primary;

- analysis of unspecified and indefinitely specified primary tumours;

- presence of TNM for the histologically verified diagnosis;

- validity of TNM indices for specific morphology and site;

- correspondence of TNM and stage.

The check functions of the UCR are being improved continuously.
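Two of the listed checks (topography against sex, and presence of TNM for a histologically verified diagnosis) can be sketched in Python as follows. The record layout, field names and function name are assumptions for illustration; the site sets use the ICD-O topography groups for male and female genital organs, and the real UCR checks are certainly more extensive.

```python
# ICD-O topography groups restricted to one sex.
MALE_ONLY_SITES = {"C60", "C61", "C62", "C63"}
FEMALE_ONLY_SITES = {"C51", "C52", "C53", "C54", "C55", "C56", "C57", "C58"}


def check_diagnosis(record):
    """Return a list of warnings for one diagnosis record (a dict)."""
    warnings = []
    site = record["topography"][:3]
    if record["sex"] == "F" and site in MALE_ONLY_SITES:
        warnings.append("topography inconsistent with sex")
    if record["sex"] == "M" and site in FEMALE_ONLY_SITES:
        warnings.append("topography inconsistent with sex")
    if record.get("histologically_verified") and not record.get("tnm"):
        warnings.append("TNM missing for histologically verified diagnosis")
    return warnings


suspect = {"sex": "F", "topography": "C61.9",
           "histologically_verified": True, "tnm": ""}
print(check_diagnosis(suspect))
```

Returning warnings rather than raising an error fits the workflow described above, in which errors and suspected errors are collected and then reviewed by the operator.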


While developing the technology of the Ukrainian cancer registry, we analysed the use of automated linkage procedures in the international practice of cancer registration. Linkage procedures are used to solve the following tasks: (S_LINK_0)

Probabilistic linkage methods are applicable, for example, when there are two rather large (more than ten thousand records) independent sources of personal computerised information and records about the same patients have to be found in both. Unfortunately, computer databases are not yet widespread enough in the public health services of Ukraine for automated linkage to be used for such cancer registry tasks as, for example, obtaining data about patients’ deaths.

The task of searching for duplicate records is more urgent for us, for the following reasons: (S_LINK_2)

Applying the probabilistic linkage procedures usual in world practice is complicated by the fact that the problems of probabilistic linkage have been studied sufficiently only for the English language, for which NYSIIS coding, methods of batching, etc. have been developed. Similar studies for Russian have either not been carried out or their results are inaccessible. Simply carrying English-language algorithms over to the Russian language worsens the quality of linkage. We are attempting to adapt these algorithms, but the results so far are far from what is desired.

Moreover, probabilistic linkage accepts some proportion of false links (links are usually established with a probability of 99.5%), which is quite acceptable when two registries are linked for scientific research. However, fully automated use of these algorithms for duplicate search or automated data transfer would mean that, at this rate of false links, 1 patient in 200 would have another patient’s data added to their record.
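The figure of 1 patient in 200 follows directly from the 99.5% probability: the expected share of false links is 1 - 0.995 = 0.005. A short Python check, with a hypothetical function name:

```python
def expected_false_links(accepted_links, link_probability=0.995):
    """Expected number of false links among the accepted ones."""
    return accepted_links * (1.0 - link_probability)


# Among 200 automatically accepted links about one is expected to be false.
print(round(expected_false_links(200), 6))
```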

Therefore an original linkage algorithm was developed for use in the Ukrainian cancer registry. Its essence is an automated search for suspected duplicates followed by interactive review of the pairs found (as shown in the picture, by phases) (S_LINK_3):

Phase 1. The procedure for finding pairs of cards suspected of being duplicates is similar to that used in probabilistic linkage. The search uses the patient’s surname, name, patronymic and date of birth, and both complete and partial coincidence of these parameters is taken into account. At the present stage this procedure gives a higher probability of revealing duplicates than the probabilistic linkage procedure we adapted for the Russian language. In future, however, it may be replaced by a probabilistic procedure, or the two may be used concurrently.
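The search for suspected duplicate pairs described above can be sketched in Python as follows. This is not the UCR algorithm, whose details the report does not give: the scoring weights, the similarity cut-off of 0.8, the threshold of 3.0 and all names are assumptions, and transliterated names are used for readability. Exact agreement of a field scores 1, partial agreement 0.5.

```python
from difflib import SequenceMatcher
from itertools import combinations


def text_score(a, b):
    """1.0 for identical strings, 0.5 for near-identical ones, else 0.0."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return 1.0
    return 0.5 if SequenceMatcher(None, a, b).ratio() >= 0.8 else 0.0


def date_score(a, b):
    """Dates are (year, month, day) tuples; two equal parts count as partial."""
    same = sum(x == y for x, y in zip(a, b))
    return 1.0 if same == 3 else 0.5 if same == 2 else 0.0


def suspected_duplicates(cards, threshold=3.0):
    """Return (id, id, score) for every pair of cards scoring at least threshold."""
    pairs = []
    for c1, c2 in combinations(cards, 2):
        score = (text_score(c1["surname"], c2["surname"])
                 + text_score(c1["name"], c2["name"])
                 + text_score(c1["patronymic"], c2["patronymic"])
                 + date_score(c1["birth"], c2["birth"]))
        if score >= threshold:
            pairs.append((c1["id"], c2["id"], score))
    return pairs


cards = [
    {"id": 1, "surname": "Ivanov", "name": "Petr",
     "patronymic": "Sergeevich", "birth": (1940, 5, 12)},
    {"id": 2, "surname": "Ivanow", "name": "Petr",
     "patronymic": "Sergeevich", "birth": (1940, 5, 12)},
    {"id": 3, "surname": "Sidorova", "name": "Anna",
     "patronymic": "Ivanovna", "birth": (1955, 1, 30)},
]
print(suspected_duplicates(cards))
```

Comparing every pair is quadratic in the number of cards, so a registry of realistic size would need to restrict comparisons to groups of cards sharing some key; the sketch leaves that out. The pairs returned are only suspicions, to be confirmed or rejected by a person.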

Phase 2. The final decision that a pair found is a true link is taken by a person. Note that only the search for suspected duplicate pairs is automated. The responsible registry worker draws the final conclusion about whether the two cards of a pair belong to the same patient or to different patients. Sometimes this conclusion cannot be drawn from the computer files alone, and the primary paper forms have to be examined or the staff of the oncological department at the patient’s place of residence consulted. The benefit is that the conclusion reached may be considered practically reliable.

Phase 3. After a pair of cards is recognised as a duplicate, the information contained in both cards is joined automatically. Almost always each of the duplicate cards contains some part of the pertinent data to be pieced together. For example, if a duplicate appeared because of incorrect registration of a multiple cancer, each of the diagnoses is most likely in a separate card, and the resulting card should contain both. A special algorithm has been developed to analyse and transfer the information from each relevant field of the cards so that the merged card is as reliable as possible.

The card recognised by the operator as more reliable should be chosen as the source. For each record of diagnosis, treatment, etc. in the second card, a matching record is sought in the source card. If there is none, the record is transferred as a whole. If a similar record (in date and other content) is present in the source card, only those fields that are empty in the source, or filled in less detail, are transferred.

For example, if one diagnosis record specifies the morphological type "Malignant neoplasm", and the duplicate card, with a similar tumour site and date of diagnosis, specifies the morphological type "Alveolar adenocarcinoma", the result will be "Alveolar adenocarcinoma" irrespective of which card was recognised as the source. (S_LINK_4)
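The joining rule can be illustrated with a minimal Python sketch. It makes several simplifying assumptions that are not from the report: records are matched on exactly equal site and date (the report speaks of similar records), and "less detailed" is modelled by a small hypothetical set of unspecified values. All names are illustrative.

```python
UNSPECIFIED = {"", "Malignant neoplasm"}  # hypothetical 'least detailed' values


def merge_record(source, other):
    """Keep source fields unless they are empty or less detailed than other's."""
    merged = dict(source)
    for field, value in other.items():
        current = merged.get(field) or ""
        if current in UNSPECIFIED and (value or "") not in UNSPECIFIED:
            merged[field] = value
    return merged


def merge_cards(source_records, other_records, key=("site", "date")):
    """Join the records of two duplicate cards into one list of records."""
    result = [dict(record) for record in source_records]
    for record in other_records:
        record_key = tuple(record.get(k) for k in key)
        for i, existing in enumerate(result):
            if tuple(existing.get(k) for k in key) == record_key:
                result[i] = merge_record(existing, record)
                break
        else:
            result.append(dict(record))  # no counterpart: transfer as a whole
    return result


card_a = [{"site": "C34.1", "date": "1998-03-02",
           "morphology": "Malignant neoplasm"}]
card_b = [{"site": "C34.1", "date": "1998-03-02",
           "morphology": "Alveolar adenocarcinoma"},
          {"site": "C50.9", "date": "1999-01-15",
           "morphology": "Adenocarcinoma"}]
print(merge_cards(card_a, card_b))
```

In this example the merged lung diagnosis carries "Alveolar adenocarcinoma" whichever card is passed as the source, and the breast diagnosis present in only one card is carried over whole.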

Phase 4. After the automated data transfer the unwanted card is deleted, and the operator can edit the resulting card if necessary.

Phase 5. In every case, once the data have been joined, a complete logical check of the resulting card is performed.

Phase 6. Only after the computerised check has been completed and any errors corrected does the resulting card become valid in the database.

Thus, on the one hand, we automate all the laborious and routine work of finding and eliminating duplicates; on the other hand, the registry worker controls this process and takes part in resolving the non-standard situations and logical contradictions that sometimes arise.

The mechanism used in the duplicate search has also proved useful for automated data transfer from the Hospital Cancer Registry into the Population-based Cancer Registry. The “Abstract from medical in-patient card for patient with malignant diagnosis” is sent to the oncological dispensary at the patient’s place of residence in order to register the patient and update the information already stored in the oblast cancer registry. Many patients receive treatment in the oncological dispensary at their place of residence, where they are under follow-up. Their data are stored in the unified computerised system of the hospital cancer registry.

For the automated data transfer the following problems had to be solved (S_LINK-):

In the hospital cancer registry a procedure for creating a computer file of abstracts has been developed. The file contains the same information as the paper copies (Form 027/onco) and is created at the same moment, that is, on completion of the input of the medical in-patient card into the computerised hospital cancer registry. The electronic abstracts are stored in files of the same format as in the population-based cancer registry.

On transfer to the population-based cancer registry, the data of the electronic abstracts are placed in a so-called exchange data buffer, from which the population registry workers transfer them into the database. The same preliminary search on key fields is carried out as when a new card is created. If a matching card is found in the population registry database, the same procedure as for the automated joining of duplicates is performed, with the hospital registry cards from the exchange data buffer playing the role of duplicates. On completion of the transfer the card in the buffer is removed and the resulting card in the population registry database is checked.

The automated procedures for transferring data from the hospital registry into the population-based one have reduced the time needed to handle a new card for a patient from the same dispensary from several minutes to several seconds. They also significantly reduce the working hours needed to run the registry and the flow of paper documents within the dispensary. In addition, the new mistakes likely to be caused by repeated input from paper documents are reduced as well. All actions taken while joining the data, both automated and made by the operator, are recorded in a log.
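The flow through the exchange data buffer can be sketched in Python as below. This is a simplified illustration under stated assumptions: the registry is a dictionary keyed by the exact key fields, diagnoses are plain strings, and the operator's confirmation and the final logical check of the card are omitted. The function and field names are hypothetical.

```python
def transfer_from_buffer(buffer, registry, log):
    """Move electronic abstracts from the buffer into the registry database."""
    while buffer:
        abstract = buffer.pop(0)  # the card is removed from the buffer
        key = (abstract["surname"], abstract["name"],
               abstract["patronymic"], abstract["birth"])
        card = registry.get(key)  # preliminary search on key fields
        if card is None:
            registry[key] = {"diagnoses": list(abstract["diagnoses"])}
            log.append(("created", key))
        else:
            for diagnosis in abstract["diagnoses"]:
                if diagnosis not in card["diagnoses"]:
                    card["diagnoses"].append(diagnosis)
            log.append(("updated", key))
    return registry


ivanov = ("Ivanov", "Petr", "Sergeevich", (1940, 5, 12))
registry = {ivanov: {"diagnoses": ["C34.1 8140/3"]}}
buffer = [
    {"surname": "Ivanov", "name": "Petr", "patronymic": "Sergeevich",
     "birth": (1940, 5, 12), "diagnoses": ["C34.1 8140/3", "C61.9 8140/3"]},
    {"surname": "Sidorova", "name": "Anna", "patronymic": "Ivanovna",
     "birth": (1955, 1, 30), "diagnoses": ["C50.9 8500/3"]},
]
log = []
transfer_from_buffer(buffer, registry, log)
print([action for action, _ in log])
```

Every action is appended to the log, so each created or updated card can be traced back to the abstract that caused it.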

Work on introducing the automated data exchange technology is now under way in all cancer registries of Ukraine. For this purpose we use e-mail. Last year the Ukrainian Research Institute of Oncology and Radiology transferred to regional dispensaries not only current electronic abstracts but also abstracts for all patients treated in it during the last 10 years. After processing these data it was found that a significant number of patients treated in the URIOR had not yet been registered at their place of residence (because the paper “Abstracts” and “Notifications” had not been sent).

The development and introduction of the automated data exchange technology became possible thanks to the development of unified software for the population-based cancer registries in all oblasts and to the wide introduction of a compatible computerised hospital cancer registry information system.

We now have an opportunity to create a common oncological information environment in Ukraine.