Using Entity Identification and Classification for Automated Integration of Spatial-Temporal Data

R. Ahsan | R. Neamtu | E. Rundensteiner

(Both authors equally contributed to the work) Worcester Polytechnic Institute, USA

OPEN ACCESS

https://www.witpress.com/elibrary/dne-volumes/11/3/1190

Abstract:

Big data, crucial to answering economic, social, and political questions facing our society, tend to be diverse and distributed across many sites on the Internet. Tools that integrate and analyze such data are of paramount interest, yet automating these processes remains a great challenge. Our work rests on the observation that many public data sources in domains ranging from economic to demographic, although complex in structure, often share key similarities, namely the presence of Time and Location. Our proposed Data Integration through Object Modeling (DIOM) framework tackles the critical problem of automating data integration from a variety of public websites by abstracting key features of multi-dimensional tables and interpreting them in the context of a knowledge-centered Unified Spatial Temporal Model. Our classification-driven extractors are trained to identify and classify entities from both the structured and unstructured parts of spreadsheets. The unstructured part, contained in titles, headers, and footers, reveals critical information, so-called Implicit Knowledge, that is crucial to the correct interpretation of the data. Our experimental results on real-world datasets from heterogeneous public data sources show a 25% increase in accuracy compared to state-of-the-art approaches.

Keywords:

big data, data extraction, data integration, information retrieval
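
To make the idea of classifying spreadsheet cells into Time and Location entities concrete, the following is a minimal sketch. It is not the DIOM implementation: the abstract describes trained, classification-driven extractors, whereas this example substitutes simple hand-written rules. The names `classify_cell`, `classify_cells`, and `KNOWN_LOCATIONS`, as well as the year and quarter patterns, are assumptions made purely for illustration.

```python
import re

# Hypothetical gazetteer (an assumption for this sketch); a real system
# would rely on a far larger location list or a trained classifier.
KNOWN_LOCATIONS = {"massachusetts", "texas", "france", "boston"}

# Assumed time formats: a four-digit year (1900-2099) or "Q<n> <year>".
YEAR_PATTERN = re.compile(r"^(19|20)\d{2}$")
QUARTER_PATTERN = re.compile(r"^Q[1-4]\s+(19|20)\d{2}$", re.IGNORECASE)


def classify_cell(text: str) -> str:
    """Label a title, header, or footer cell as 'time', 'location', or 'other'."""
    cleaned = text.strip()
    if YEAR_PATTERN.match(cleaned) or QUARTER_PATTERN.match(cleaned):
        return "time"
    if cleaned.lower() in KNOWN_LOCATIONS:
        return "location"
    return "other"


def classify_cells(cells):
    """Pair every cell with its predicted label, preserving input order."""
    return [(cell, classify_cell(cell)) for cell in cells]
```

Calling `classify_cells(["2010", "Texas", "Population"])` labels the first cell as a time entity, the second as a location, and the third as neither. A rule-based stand-in like this only hints at the task; learning the labels from annotated spreadsheets, as the abstract describes, is what would let the approach generalize across heterogeneous sources.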
