Spatial data mining and data analysis

Advances in technology, the expansion of research areas, and the adoption of commercial and open source GIS have produced massive collections of data stored in many different databases. We now generate trillions of bytes of data every day; such data, characterized by high dimensionality and large sample size, is commonly called Big Data. Yet volume alone tells us little: we are rich in data but poor in information. Data mining (DM) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad et al., 1996).
Spatial data mining is a distinct type of data mining. The main difference is that spatial data mining tasks use spatial attributes in addition to non-spatial ones. It is often said that spatial data is special and therefore needs special methods and techniques to process and analyze it. This idea appears in several review papers and articles, although a few authors dispute it. Most of these reviews argue that extracting interesting patterns from a geographic dataset is much harder than extracting them from traditional data, because geographic or spatial data involves complex spatial data types, spatial relationships, spatial heterogeneity, spatial autocorrelation, the ecological fallacy, and the modifiable areal unit problem (MAUP). Under these conditions, traditional data mining techniques are difficult to apply and lose much of their effectiveness. To decide whether spatial data is special or not, I suggest a short tour: first a description of the term spatial analysis, then a look at just two characteristics of spatial data.
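Spatial autocorrelation, one of the properties listed above, can be made concrete with a small numerical sketch. The example computes global Moran's I, a widely used autocorrelation statistic that this text does not itself name, so treat the choice as an illustration rather than the method the cited reviews rely on. The helper names `rook_neighbors` and `morans_i`, the binary rook-contiguity weights, and the two toy 4 x 4 grids are all assumptions made for the example.

```python
def rook_neighbors(rows, cols):
    """Map each cell index to the cells that share an edge with it."""
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            adjacent = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adjacent.append(rr * cols + cc)
            neighbors[r * cols + c] = adjacent
    return neighbors


def morans_i(values, neighbors):
    """Global Moran's I with binary weights (w_ij = 1 when j neighbors i)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # S0 is the sum of all weights, i.e. the number of directed neighbor pairs.
    s0 = sum(len(adj) for adj in neighbors.values())
    # Cross-products of deviations over every neighboring pair.
    cross = sum(dev[i] * dev[j] for i, adj in neighbors.items() for j in adj)
    total_sq_dev = sum(d * d for d in dev)
    return (n / s0) * cross / total_sq_dev


grid = rook_neighbors(4, 4)

# Hypothetical data: left half of the map is 1, right half is 0.
clustered = [1, 1, 0, 0] * 4
# Hypothetical data: every cell differs from all of its edge neighbors.
checkerboard = [(r + c) % 2 for r in range(4) for c in range(4)]

print(morans_i(clustered, grid))     # positive (about 0.67): similar values cluster
print(morans_i(checkerboard, grid))  # about -1: neighbors are systematically unlike
```

Both grids contain the same values, eight ones and eight zeros, so any statistic that ignores location cannot tell them apart. Only the arrangement in space differs, and that is exactly what the spatial attributes add to the analysis.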
Spatial analysis is a family of methods that identify or describe a pattern in order to understand the process that produced it. The results of a spatial analysis change when the positions of the analyzed objects change. Tobler (1970) captured this in his First Law of Geography: "everything is related to everything else, but near things are more related than distant things". The law emphasizes spatial dependence, or spatial autocorrelation: a phenomenon observed in one place is more likely to recur in a nearby place than in a distant one. Dealing with this requires special techniques, the first of which is to compare the pattern observed in the data (e.g., locations in point pattern analysis, values at locations in spatial autocorrelation) with the pattern expected if space were irrelevant (Anselin, 1989). Such data sets are also scale dependent, and the queries needed to extract information from them are more advanced and more complex. Traditional statistical techniques, by contrast, assume that observations are independent, so they cannot be applied uncritically to data that exhibit spatial dependence.
Spatial data has another distinctive feature, spatial heterogeneity: relationships are not stable across space but vary from one area of the map to another. A realistic view of spatial data must therefore assume that most spatial processes are nonstationary and anisotropic. Heterogeneity and non-stationarity