Research Article

Deep Learning Based Video Event Classification

Year 2023, Volume: 26 Issue: 3, 1155 - 1165, 01.10.2023
https://doi.org/10.2339/politeknik.775185

Abstract

In recent years, the growth of digital libraries and video databases has pushed the automatic detection of activities in videos and the extraction of patterns from large datasets to the fore. Object detection in images serves as a tool for a variety of applications and forms the basis of video classification. Objects are harder to identify in videos than in single images because the information in a video is subject to a time-continuity constraint. Alongside advances in computer vision, open-source machine learning and deep learning software packages and improvements in hardware have enabled new approaches. In this study, a deep learning-based model was developed for classifying sports branches in videos. The model was built using a CNN, and transfer learning was applied with VGG-19. Experiments on 32,827 frames showed that VGG-19 achieves a more successful classification performance than the CNN, with an accuracy of 83%.

References

  • [1] Çiğdem A.C.I. and Çırak A., “Türkçe Haber Metinlerinin Konvolüsyonel Sinir Ağları ve Word2Vec Kullanılarak Sınıflandırılması”, Bilişim Teknolojileri Dergisi, 12(3): 219-228, (2019).
  • [2] Ma S., Sigal L. and Sclaroff S., “Learning activity progression in LSTMs for activity detection and early detection”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 1942-1950, (2016).
  • [3] Ribeiro P.C., Santos-Victor J. and Lisboa P., “Human activity recognition from video: modeling, feature selection and classification architecture”, Proceedings of International Workshop on Human Activity Recognition and Modelling, 61-78, (2005).
  • [4] Ribeiro P.C., Santos-Victor J. and Lisboa P., “Human activity recognition from video: modeling, feature selection and classification architecture”, Proceedings of International Workshop on Human Activity Recognition and Modelling, 61-78, (2005).
  • [5] Kim E., Helal S. and Cook D., “Human activity recognition and pattern discovery”, IEEE Pervasive Computing, 9(1): 48-53, (2009).
  • [6] Anguita D., Ghio A., Oneto L., Parra X. and Reyes-Ortiz J.L., “A public domain dataset for human activity recognition using smartphones”, Proceedings of the 21st International European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Belgium, 437-442, (2013).
  • [7] Lin W., Sun M.T., Poovandran R. and Zhang Z., “Human activity recognition for video surveillance”, 2008 IEEE International Symposium on Circuits and Systems, Washington, USA, 2737-2740, (2008).
  • [8] Dai X., Singh B., Zhang G., Davis L.S. and Qiu Chen Y., “Temporal context network for activity localization in videos”, Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 5793-5802, (2017).
  • [9] Kay W., Carreira J., Simonyan K., Zhang B., Hillier C., Vijayanarasimhan S. and Suleyman M., “The kinetics human action video dataset”, arXiv preprint arXiv:1705.06950, (2017).
  • [10] Soomro K., Zamir A.R. and Shah M., “UCF101: A dataset of 101 human actions classes from videos in the wild”, arXiv preprint arXiv:1212.0402, (2012).
  • [11] Kuehne H., Jhuang H., Garrote E., Poggio T. and Serre T., “HMDB: a large video database for human motion recognition”, 2011 International Conference on Computer Vision, Barcelona, Spain, 2556-2563, (2011).
  • [12] Sigurdsson G.A., Varol G., Wang X., Farhadi A., Laptev I. and Gupta A., “Hollywood in homes: Crowdsourcing data collection for activity understanding”, European Conference on Computer Vision, Amsterdam, Netherlands, 510-526, (2016).
  • [13] Gu C., Sun C., Ross D.A., Vondrick C., Pantofaru C., Li Y. and Schmid C., “Ava: A video dataset of spatio-temporally localized atomic visual actions”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 6047-6056, (2018).
  • [14] Idrees H., Zamir A.R., Jiang Y.G., Gorban A., Laptev I., Sukthankar R. and Shah M., “The THUMOS challenge on action recognition for videos in the wild”, Computer Vision and Image Understanding, 155: 1-23, (2017).
  • [15] Schuldt C., Laptev I. and Caputo B., “Recognizing human actions: a local SVM approach”, Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 32-36, (2004).
  • [16] Blank M., Gorelick L., Shechtman E., Irani M. and Basri R., “Actions as space-time shapes”, Tenth IEEE International Conference on Computer Vision (ICCV'05), Beijing, China, 1395-1402, (2005).
  • [17] Rodriguez M.D., Ahmed J. and Shah M., “Action MACH a spatio-temporal maximum average correlation height filter for action recognition”, 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, Alaska, 1-8, (2008).
  • [18] Weinland D., Boyer E. and Ronfard R., “Action recognition from arbitrary views using 3d exemplars”, 2007 IEEE 11th International Conference on Computer Vision, Rio De Janeiro, Brazil, 1-7, (2007).
  • [19] Marszalek M., Laptev I. and Schmid C., “Actions in context”, 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami Beach, FL, 2929-2936, (2009).
  • [20] Soomro K., Zamir A.R. and Shah M., “UCF101: A dataset of 101 human actions classes from videos in the wild”, arXiv preprint arXiv:1212.0402, (2012).
  • [21] Valueva M.V., Nagornov N.N., Lyakhov P.A., Valuev G.V. and Chervyakov N.I., “Application of the residue number system to reduce hardware costs of the convolutional neural network implementation”, Mathematics and Computers in Simulation, (2020).
  • [22] Van den Oord A., Dieleman S. and Schrauwen B., “Deep content-based music recommendation”, Advances in neural information processing systems, 2643-2651, (2013).
  • [23] Collobert R. and Weston J., “A unified architecture for natural language processing: Deep neural networks with multitask learning”, Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 160-167, (2008).
  • [24] Tsantekidis A., Passalis N., Tefas A., Kanniainen J., Gabbouj M. and Iosifidis A., “Forecasting stock prices from the limit order book using convolutional neural networks”, 2017 IEEE 19th Conference on Business Informatics (CBI), Thessaloniki, Greece, 7-12, (2017).
  • [25] Fukushima K., “Neocognitron”, Scholarpedia, 2(1): 1717, (2007).
  • [26] Hubel D.H. and Wiesel T.N., “Receptive fields and functional architecture of monkey striate cortex”, The Journal of physiology, 195(1): 215-243, (1968).
  • [27] Fukushima K. and Miyake S., “Neocognitron: A self-organizing neural network model for a mechanism of visual pattern recognition”, Competition and cooperation in neural nets, 267-285, (1982).
  • [28] Li S., Li W., Cook C., Zhu C. and Gao Y., “Independently recurrent neural network (IndRNN): Building a longer and deeper RNN”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, USA, 5457-5466, (2018).
  • [29] Sundermeyer M., Ney H. and Schlüter R., “From feedforward to recurrent LSTM neural networks for language modeling”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23(3): 517-529, (2015).


Details

Primary Language Turkish
Subjects Engineering
Journal Section Research Article
Authors

Serim Gençaslan 0000-0001-8404-3099

Anıl Utku 0000-0002-7240-8713

M. Ali Akcayol 0000-0002-6615-1237

Publication Date October 1, 2023
Submission Date July 28, 2020
Published in Issue Year 2023 Volume: 26 Issue: 3

Cite

APA Gençaslan, S., Utku, A., & Akcayol, M. A. (2023). Derin Öğrenme Tabanlı Video Üzerinde Olay Sınıflandırma. Politeknik Dergisi, 26(3), 1155-1165. https://doi.org/10.2339/politeknik.775185
AMA Gençaslan S, Utku A, Akcayol MA. Derin Öğrenme Tabanlı Video Üzerinde Olay Sınıflandırma. Politeknik Dergisi. October 2023;26(3):1155-1165. doi:10.2339/politeknik.775185
Chicago Gençaslan, Serim, Anıl Utku, and M. Ali Akcayol. “Derin Öğrenme Tabanlı Video Üzerinde Olay Sınıflandırma”. Politeknik Dergisi 26, no. 3 (October 2023): 1155-65. https://doi.org/10.2339/politeknik.775185.
EndNote Gençaslan S, Utku A, Akcayol MA (October 1, 2023) Derin Öğrenme Tabanlı Video Üzerinde Olay Sınıflandırma. Politeknik Dergisi 26 3 1155–1165.
IEEE S. Gençaslan, A. Utku, and M. A. Akcayol, “Derin Öğrenme Tabanlı Video Üzerinde Olay Sınıflandırma”, Politeknik Dergisi, vol. 26, no. 3, pp. 1155–1165, 2023, doi: 10.2339/politeknik.775185.
ISNAD Gençaslan, Serim et al. “Derin Öğrenme Tabanlı Video Üzerinde Olay Sınıflandırma”. Politeknik Dergisi 26/3 (October 2023), 1155-1165. https://doi.org/10.2339/politeknik.775185.
JAMA Gençaslan S, Utku A, Akcayol MA. Derin Öğrenme Tabanlı Video Üzerinde Olay Sınıflandırma. Politeknik Dergisi. 2023;26:1155–1165.
MLA Gençaslan, Serim et al. “Derin Öğrenme Tabanlı Video Üzerinde Olay Sınıflandırma”. Politeknik Dergisi, vol. 26, no. 3, 2023, pp. 1155-65, doi:10.2339/politeknik.775185.
Vancouver Gençaslan S, Utku A, Akcayol MA. Derin Öğrenme Tabanlı Video Üzerinde Olay Sınıflandırma. Politeknik Dergisi. 2023;26(3):1155-65.